Highly Conserved Non-Coding Sequences are Associated with Vertebrate Development
PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.
Yvonne Li
Paper presentation for MEDG505, Jan 27, 2005

Outline
- Motivation
- Method and Results
- Discussion

Motivation
- Gene regulatory networks for development have been described in invertebrates but not characterized for vertebrates
- Studies have shown that:
  - a number of developmental genes are regulated by highly conserved enhancer regions at distances of hundreds of kb
  - ultra-conserved elements are more frequent than expected by chance
  - there is a significant association between these highly conserved elements and DNA-binding proteins
- Goal: look for all such elements in the entire human genome and see how they relate to development.

Method
- Computationally identify
- Computationally analyze
- Experimentally validate

Sequence Data
- CNE: highly conserved non-coding element
- Which two species to use for whole-genome alignment?
  - Human and Fugu
  - Fugu has 1/8 the genome size of human but a similar gene repertoire
  - Fugu's developmental blueprint is very similar to that of human
- Two ways to detect CNEs:
  1. Whole-genome alignment
  2. Regional alignments

Obtaining CNEs
- Start with the Fugu genome assembly
- MegaBLAST against the Ensembl human genome, v18.34.1
- Remove alignments < 100 bp in length
- Mask coding and non-coding RNA content
- Remove telomere-like sequences and transposons

Stats
- Core set of 1373 elements
- Length: average 199 bp, maximum 736 bp
- Identity: average 84%, maximum 98%
- Number of the 1373 CNEs also conserved in:
    Mouse       1365
    Rat         1316
    Chicken     1310
    Zebrafish   1093

CNE Distribution
- CNEs in the human genome are found on all chromosomes except 21 and Y
- The distribution of CNEs is highly clustered

Clustered CNEs by genomic location
- 165 clusters
- The 20 largest clusters each have ≥ 20 CNEs

Analyzing CNE-associated genes
- For each CNE, extract the closest gene from Ensembl
- Find the most statistically over-represented GO terms
  - 12 of the 13 terms relate to transcriptional regulation and development ("transdev" genes)
- How many clusters are situated near such transdev genes?
  - Over 93% of clusters have a transdev gene within 500 kb of their CNEs; 15% have 2 or more
- CNEs are generally located at large distances from the nearest gene
  - The average distance between a CNE and the 5' end of the closest human gene is 182 kb; 93 CNEs lie > 500 kb away and 12 lie > 1 Mb away
- Transdev genes are located in regions of low gene density
  - The average number of genes within 500 kb upstream or downstream is 16 for all human genes but only 6 for transdev genes

Obtaining rCNEs
- Use MLAGAN (localized multiple alignment) to identify additional conserved sequences around specific genes
- MLAGAN is more sensitive than the whole-genome alignment:
  - Species: human, Fugu, mouse, rat
  - The algorithm itself is more sensitive
  - Requires only a 40 bp window with 60% identity
- Chose 4 cluster regions containing different types of developmental genes: SOX21, PAX6, HLXB9, SHH
- Sometimes, the CNEs are more conserved than the gene's coding exons!
[Figure: MLAGAN alignment of the SOX21 region]

Vertebrates vs Invertebrates
- Are the CNEs also found in invertebrates?
- Used all CNEs and rCNEs to search the whole-genome sequences of:
  - Ciona intestinalis
  - Drosophila melanogaster
  - Caenorhabditis elegans
  - Anopheles gambiae
- No significant matches (however, the genes have clear homologs)
- 43 CNEs show significant similarity to at least one other CNE (their genes have clear paralogous relationships)

Method
- Computationally identify CNEs
- Computationally analyze CNEs
- Experimentally validate a few CNEs

Experimental Validation
- Co-inject CNEs with a green fluorescent protein (GFP) reporter into zebrafish embryos
- Idea:
  - CNEs contain something that affects the transcription of a transdev gene
  - The transdev gene affects development
  - So, examine the ability of CNEs to up-regulate GFP reporter expression
- Chose 25 regions for the GFP assay: 10 CNEs, 15 rCNEs
- Look for GFP expression in live embryos
- Controls: an average of 200 embryos screened per control
  - No upregulation
- Elements: an average of 188 embryos screened per element
  - GFP expression for all but 2 elements; varied from 4% to 44%

SOX21-associated elements
- Known:
  - SRY-related box gene
  - Acts as a transcriptional repressor during early development
  - Expressed in a complex manner in the CNS, and in the nasal epithelium, the lens and retina of the eye, and the inner ear

PAX6-associated elements
- Known:
  - Paired-box-containing transcription factor, known to be influenced by cis-acting elements in upstream, intronic and downstream positions
  - Expressed in the developing eye, forebrain, hindbrain and spinal cord

HLXB9-associated elements
- Known:
  - Homeobox gene associated with autosomal dominant effects
  - Zebrafish ortholog is expressed in the notochord, hypochord, tail mesoderm and tailbud

SHH-associated elements
- Known:
  - A signaling molecule
  - Zebrafish ortholog is expressed mainly in midline structures such as the floorplate and notochord, but also in the branchial arches, pectoral fin buds and retina

Limitations
- CNE-gene misassociation, especially in gene-rich regions
  - Can partly tell from the results of the assays
- CNEs missed due to the stringent whole-genome analysis
- Downregulation of expression will not be detected
- Elements were assayed out of context and individually
  - Each element had cases of unexpected expression
- Tissues from few cells are underrepresented
- Late-developing tissues or cell types (after 24 h) will be missed completely

Summary
- Identified a set of 1373 vertebrate CNEs
- Experimentally showed CNE-transdev gene association
- CNEs are found in clusters, in front of transdev genes
- CNEs act at large distances from coding sequence
- The relative order and positions of CNEs are conserved
- No vertebrate CNEs were found in invertebrates, even though the genes had clear homologs
- Many of these results are paralleled by a similar paper (Sandelin et al. 2004):
  - > 50 bp, > 95% human/mouse identity
  - 3583 human/mouse/pufferfish UCRs; average length 125 bp

Discussion
- Almost all CNEs are associated with developmental regulators
  - Do most transdev genes have associated CNEs?
- CNEs act at large distances from the gene
  - They could be enhancers or silencers
- The relative positioning and order of CNEs are completely conserved
  - Do they play a role in structuring the genomic architecture around transdev genes?

Discussion
- No vertebrate CNEs are found in invertebrates
  - Are there CNEs in invertebrates?
  - But PAX6 in Drosophila has been shown to have a highly effective LE9 enhancer that is also well conserved in vertebrates (The Interactive Fly)
  - Why is it not found in this analysis?
    - It is only 52 bp in length! (but MLAGAN should have found it...)
    - So maybe invertebrate enhancers/CNEs are shorter
    - Maybe we should look for shorter CNEs in vertebrates

Discussion
- Missing whole-genome CNEs due to the stringency of parameters
  - Try discontinuous MegaBLAST, which does not require an exact word match of 20
  - Only 109 of the 256 non-coding ultraconserved regions from Bejerano et al. are identified

Discussion
- What is in the CNE?
  - Modules of transcription factor binding sites?
    - Hard to account for the high level of conservation
    - Perform assays on portions of the CNEs
    - Use computational methods
  - Regulatory RNAs (e.g. microRNAs)?
    - Lack of EST evidence
    - Use regulatory RNA gene finders?
  - Something else entirely?
- One thing is agreed: more functional studies are needed.

Discussion
- Do CNEs work together?
  - How to robustly test combinations of elements?
- Mutations in CNEs can cause human disease
  - Studies are showing that mutations in CNEs cause disorders
  - CNEs at very distal locations can still affect transcription
  - CNEs may be candidates for genetic screens seeking sequence variation associated with disease
  - Check it out with dbSNP!

References & Acknowledgements
- Thanks to Misha Bilenky for lots of fun discussion

- Woolfe A, Goodson M, Goode DK, Snell P, McEwen GK, Vavouri T, Smith SF, North P, Callaway H, Kelly K, Walter K, Abnizova I, Gilks W, Edwards YJ, Cooke JE, Elgar G. Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol. 2005 Jan;3(1):e7. Epub 2004 Nov 11.
- Elgar G. Identification and analysis of cis-regulatory elements in development using comparative genomics with the pufferfish, Fugu rubripes. Semin Cell Dev Biol. 2004 Dec;15(6):715-9.
- Venkatesh B, Yap WH. Comparative genomics using fugu: a tool for the identification of conserved vertebrate cis-regulatory elements. Bioessays. 2005 Jan;27(1):100-7.
- Sandelin A, Bailey P, Bruce S, Engstrom PG, Klos JM, Wasserman WW, Ericson J, Lenhard B. Arrays of ultraconserved non-coding regions span the loci of key developmental genes in vertebrate genomes. BMC Genomics. 2004 Dec 21;5(1):99.
- The Interactive Fly. http://www.sdbonline.org/fly/aimain/1aahome.htm
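The filtering steps on the "Obtaining CNEs" slide (drop alignments shorter than 100 bp, exclude coding, non-coding RNA and repeat content) and the summary figures on the "Stats" slide can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record, the `masked` interval dictionary and the function names are hypothetical, coordinates are assumed half-open, and masking is approximated by discarding any alignment that overlaps a masked interval rather than masking the sequence before alignment.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One human-Fugu alignment on human coordinates (hypothetical record)."""
    chrom: str
    start: int        # half-open interval [start, end)
    end: int
    identity: float   # fraction of identical aligned bases, 0..1


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def filter_cne_candidates(hits, masked, min_length=100):
    """Keep hits of at least min_length bp that overlap no masked interval.

    masked maps chrom -> list of (start, end) intervals standing in for
    coding exons, non-coding RNAs, telomere-like repeats and transposons.
    """
    kept = []
    for h in hits:
        if h.end - h.start < min_length:
            continue  # alignment too short
        if any(overlaps(h.start, h.end, s, e) for s, e in masked.get(h.chrom, [])):
            continue  # touches masked content
        kept.append(h)
    return kept


def summarize(hits):
    """Count plus mean/max length and mean/max identity of the kept elements."""
    if not hits:
        return {"n": 0}
    lengths = [h.end - h.start for h in hits]
    identities = [h.identity for h in hits]
    return {
        "n": len(hits),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "mean_identity": sum(identities) / len(identities),
        "max_identity": max(identities),
    }


if __name__ == "__main__":
    # Toy input; the coordinates and identities are invented for illustration.
    hits = [Hit("chr1", 0, 150, 0.90), Hit("chr1", 500, 550, 0.99),
            Hit("chr1", 1000, 1200, 0.80), Hit("chr2", 0, 300, 0.85)]
    masked = {"chr2": [(100, 200)]}
    print(summarize(filter_cne_candidates(hits, masked)))
```

In the toy run, the 50 bp hit is dropped for length and the chr2 hit is dropped for touching a masked interval, leaving two elements to summarize.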
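The slides do not say exactly how CNEs were grouped into the 165 clusters or how "closest gene" distance was measured. The sketch below shows one plausible approach, assuming single-linkage clustering with a hypothetical `max_gap` threshold and measuring distance from the CNE midpoint to the nearest gene 5' end (TSS). The function names and the gene names in the example are made up for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def cluster_cnes(cnes, max_gap):
    """Single-linkage clustering of CNEs along each chromosome.

    cnes: list of (chrom, start, end). A CNE joins the current cluster when it
    starts no more than max_gap bp after the cluster's furthest end.
    Returns a list of (chrom, [(start, end), ...]).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnes:
        by_chrom[chrom].append((start, end))
    clusters = []
    for chrom in sorted(by_chrom):
        current, cluster_end = [], None
        for start, end in sorted(by_chrom[chrom]):
            if current and start - cluster_end > max_gap:
                clusters.append((chrom, current))  # gap too large: close cluster
                current, cluster_end = [], None
            current.append((start, end))
            cluster_end = end if cluster_end is None else max(cluster_end, end)
        if current:
            clusters.append((chrom, current))
    return clusters


def closest_gene(cne, tss_by_chrom):
    """Return (gene, distance) for the gene 5' end nearest the CNE midpoint.

    tss_by_chrom maps chrom -> list of (tss_position, gene) sorted by position.
    Returns None when the chromosome has no genes.
    """
    chrom, start, end = cne
    mid = (start + end) // 2
    sites = tss_by_chrom.get(chrom, [])
    if not sites:
        return None
    i = bisect_left([pos for pos, _ in sites], mid)
    # The nearest site is either just before or just after the insertion point.
    candidates = [sites[j] for j in (i - 1, i) if 0 <= j < len(sites)]
    pos, gene = min(candidates, key=lambda site: abs(site[0] - mid))
    return gene, abs(pos - mid)


if __name__ == "__main__":
    cnes = [("chr2", 1_000, 1_200), ("chr2", 150_000, 150_300), ("chr2", 900_000, 900_150)]
    print(cluster_cnes(cnes, max_gap=300_000))  # two clusters on chr2
    tss = {"chr2": [(100_000, "GENE_A"), (1_200_000, "GENE_B")]}  # hypothetical genes
    print(closest_gene(cnes[2], tss))  # ('GENE_B', 299925)
```

The same `closest_gene` lookup, applied to every CNE, is what an average-distance figure like the 182 kb on the slides would be computed from, although the paper's exact distance definition may differ.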
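The "most statistically over-represented GO terms" step is commonly done with a one-sided hypergeometric test. The slides do not name the test that was used, so treat this as an assumed, illustrative choice. Here `N` is the number of annotated genes in the genome, `K` the number carrying a given term, `n` the number of CNE-associated genes, and `k` how many of those carry the term.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K with the term, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when asked to choose more items than exist.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def rank_terms(term_counts, N, n):
    """Sort GO terms by over-representation p-value, smallest first.

    term_counts maps term -> (k, K): k of the n CNE-associated genes carry
    the term, and K of all N genes carry it.
    """
    scored = [(hypergeom_sf(k, N, K, n), term) for term, (k, K) in term_counts.items()]
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the paper).
    counts = {"term_A": (40, 400), "term_B": (5, 1000)}
    for p, term in rank_terms(counts, N=20_000, n=300):
        print(f"{term}\tp = {p:.3g}")
```

When many terms are tested at once, the raw p-values would normally be corrected for multiple testing (for example with a Bonferroni correction) before a term is called over-represented.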
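The MLAGAN criterion on the "Obtaining rCNEs" slide (a 40 bp window with at least 60% identity) is a sliding-window threshold applied to an alignment. The sketch below applies only that threshold to two rows of an alignment that is assumed to already exist (gap character `-`). It does not implement MLAGAN's alignment algorithm, and merging overlapping passing windows into regions is an assumption about how hits are reported.

```python
def conserved_windows(aln_a, aln_b, window=40, min_identity_pct=60):
    """Return merged (start, end) column ranges covered by passing windows.

    aln_a and aln_b are two rows of an existing alignment (equal length, '-'
    for gaps). A window of `window` columns passes when identical, non-gap
    columns make up at least min_identity_pct percent of it.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned rows must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    match = [int(x == y and x != "-") for x, y in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    score = sum(match[:window])  # identical columns in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one column: add the new right column, drop the old left one.
            score += match[start + window - 1] - match[start - 1]
        # Integer comparison avoids floating-point edge cases at exactly 60%.
        if score * 100 >= min_identity_pct * window:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1][1] = end  # overlaps the previous passing window
            else:
                regions.append([start, end])
    return [tuple(r) for r in regions]


if __name__ == "__main__":
    a = "A" * 100
    b = "A" * 50 + "C" * 50  # identical for 50 columns, then diverged
    print(conserved_windows(a, b))  # [(0, 66)]
```

The reported region extends past column 50 because a 40-column window still reaches 60% identity while it contains at least 24 of the identical columns.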
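The suggestion to try discontinuous MegaBLAST rests on seed sensitivity. An exact word of 20 needs 20 consecutive identical bases, whereas a spaced ("discontinuous") seed checks only selected positions and so tolerates mismatches elsewhere. The toy comparison below uses a hypothetical repeating `110` template with 20 checked positions over 30 bp. It is not one of the templates BLAST actually uses, and it tests seeds on a single diagonal (the same offset in both sequences) instead of hashing words across all offsets.

```python
def has_seed_hit(a, b, template):
    """True if, at some offset, a and b agree at every '1' position of template.

    Simplification: a and b are compared on one diagonal only (a[i] vs b[i]).
    """
    care = [j for j, c in enumerate(template) if c == "1"]
    span = len(template)
    for i in range(min(len(a), len(b)) - span + 1):
        if all(a[i + j] == b[i + j] for j in care):
            return True
    return False


CONTIGUOUS_20 = "1" * 20      # exact 20-base word
SPACED_20_OF_30 = "110" * 10  # hypothetical: 20 checked bases over 30 bp


if __name__ == "__main__":
    a = "ACG" * 20
    b = "ACT" * 20  # every third base differs
    print(has_seed_hit(a, b, CONTIGUOUS_20))    # False
    print(has_seed_hit(a, b, SPACED_20_OF_30))  # True
```

Both seeds require 20 matching bases, so they are comparably selective on random sequence; the spaced one simply spreads those bases out, which is why it can still fire on diverged sequence that never contains 20 identical bases in a row.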
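"Check it out with dbSNP!" amounts to intersecting known variant positions with CNE intervals. A minimal sketch follows. It assumes the variants have already been parsed into `(chrom, pos, id)` tuples in the same coordinate system as the CNEs (dbSNP VCF positions are 1-based, so a conversion may be needed) and that CNEs on one chromosome do not overlap each other. The identifiers in the example are placeholders.

```python
from bisect import bisect_right


def snps_in_cnes(cnes, snps):
    """Return (snp_id, cne) pairs for variants that fall inside a CNE.

    cnes: list of (chrom, start, end), half-open and non-overlapping per chrom.
    snps: list of (chrom, pos, snp_id) in the same coordinate system.
    """
    by_chrom = {}
    for chrom, start, end in sorted(cnes):
        by_chrom.setdefault(chrom, []).append((start, end))
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in by_chrom.items()}
    hits = []
    for chrom, pos, snp_id in snps:
        if chrom not in by_chrom:
            continue
        # Last CNE starting at or before pos is the only one that can contain it.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            start, end = by_chrom[chrom][i]
            if start <= pos < end:
                hits.append((snp_id, (chrom, start, end)))
    return hits


if __name__ == "__main__":
    cnes = [("chr7", 100, 300), ("chr7", 1000, 1200), ("chr11", 50, 80)]
    snps = [("chr7", 150, "snpA"), ("chr7", 300, "snpB"), ("chr7", 1199, "snpC")]
    for snp_id, cne in snps_in_cnes(cnes, snps):
        print(snp_id, cne)
```

Sorting the intervals once and using binary search keeps each lookup logarithmic, which matters when intersecting millions of dbSNP entries with a few thousand CNEs.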