V4 Genome of Arabidopsis thaliana

Review of lecture V3 ...
- What are tandem repeats?
- How does one find CpG islands?
- What are Gardiner-Frommer and Takai-Jones parameters? Why do we need t-tests?
- What are the findings of (Hutter et al. 2006)?

Biological Sequence Analysis SS 2008 lecture 4

Arabidopsis thaliana

Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish. Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology. (TAIR)

Some useful statistics for Arabidopsis thaliana
- Small genome (114.5 Mb/125 Mb total), sequenced in the year 2000.
- Extensive genetic and physical maps of all 5 chromosomes.
- A rapid life cycle (about 6 weeks from germination to mature seed).
- Prolific seed production and easy cultivation in restricted space.
- Efficient transformation methods utilizing Agrobacterium tumefaciens.
- A large number of mutant lines and genomic resources, many of which are available from Stock Centers.
- Multinational research community of academic, government and industry laboratories.
(TAIR)

Such advantages have made Arabidopsis a model organism for studies of the cellular and molecular biology of flowering plants. TAIR collects and makes available the information arising from these efforts.

Arabidopsis thaliana genome sequence

Representation of the Arabidopsis chromosomes. Sequenced portions are red, telomeric and centromeric regions are light blue, heterochromatic knobs are shown in black, and the rDNA repeat regions are magenta. Left: DAPI-stained chromosomes.
Gene density ('Genes') ranged from 38 per 100 kb to 1 gene per 100 kb; expressed sequence tag matches ('ESTs') ranged from more than 200 per 100 kb to 1 per 100 kb. Transposable element densities ('TEs') ranged from 33 per 100 kb to 1 per 100 kb. Mitochondrial and chloroplast insertions ('MT/CP') were assigned black and green tick marks, respectively. Transfer RNAs and small nucleolar RNAs ('RNAs') were assigned black and red tick marks, respectively.
Nature 408, 796 (2000)

Arabidopsis thaliana genome sequence

The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of 2 to 3 depending on the functional category. Only 8-23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant transcription factors. In contrast, 48-60% of genes involved in protein synthesis have counterparts in the other eukaryotic genomes, reflecting highly conserved gene functions.

The relatively high proportion of matches between Arabidopsis and bacterial proteins in the categories 'metabolism' and 'energy' reflects both the acquisition of bacterial genes from the ancestor of the plastid and high conservation of sequences across all species.

Finally, a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in yeast, reflecting the need for sets of genes for communication in multicellular organisms.
Nature 408, 796 (2000)

Many genes were duplicated
Nature 408, 796 (2000)

Segmental duplication

Segmentally duplicated regions in the Arabidopsis genome.
Individual chromosomes are depicted as horizontal grey bars (with chromosome 1 at the top); centromeres are marked in black. Coloured bands connect corresponding duplicated segments. Similarities between the rDNA repeats are excluded. Duplicated segments in reversed orientation are connected with twisted coloured bands.
Nature 408, 796 (2000)

Membrane channels and transporters

Transporters in the plasma and intracellular membranes of Arabidopsis are responsible for the acquisition, redistribution and compartmentalization of organic nutrients and inorganic ions, as well as for the efflux of toxic compounds and metabolic end products, and for energy and signal transduction.

Unlike animals, which use a sodium ion P-type ATPase pump to generate an electrochemical gradient across the plasma membrane, plants and fungi use a proton P-type ATPase pump to form a large membrane potential. Accordingly, plant secondary transporters are typically coupled to protons rather than to sodium.
- Almost half of the Arabidopsis channel proteins are aquaporins, which emphasizes the importance of hydraulics in a wide range of plant processes.
- Compared with other sequenced organisms, Arabidopsis has 10-fold more predicted peptide transporters, primarily of the proton-dependent oligopeptide transport (POT) family, emphasizing the importance of peptide transport or indicating that there is broader substrate specificity than previously realized.
- There are nearly 1,000 Arabidopsis genes encoding Ser/Thr protein kinases, suggesting that peptides may have an important role in plant signalling.
Nature 408, 796 (2000)

What is TAIR*?
- NSF-funded project begun in 1999
- Web resource for Arabidopsis data and stocks
- Literature-based manual annotation of gene function
- Genome annotation (gene structure, computational gene function)
URL
* The following slides were borrowed from a talk at the TAIR7 workshop by Eva Huala & Donghui Li

[Screenshot slides of TAIR web pages; only the slide labels survive:]
Portals
Tools
Search
Names, Description
GO annotations, Expression
Sequences, Maps
Mutations, Seed lines
Seed lines, Links to other sites
Comments, References
GBrowse - coming soon

Overview of releases to date

                              Nature   TIGR1    TIGR2    TIGR3    TIGR4    TIGR5    TAIR6    TAIR7
                              (12/00)  (8/01)   (1/02)   (8/02)   (4/03)   (1/04)   (11/05)  (4/07)
Protein coding genes          25,498   25,554   26,156   27,117   27,170   26,207   26,541   26,819
Transposons and pseudogenes   NA       1,274    1,305    1,967    2,218    3,786    3,818    3,889
Alternatively spliced genes   NA       0        28       162      1,267    2,330    3,159    3,866
Gene density (kb per gene)    4.50     4.55     4.48     4.32     4.38     4.54     4.48     4.44
Avg. exons per gene           5.20     5.23     5.25     5.24     5.31     5.42     5.64     5.79
Avg. exon length (bp)         250      256      265      266      279      276      269      268
Avg. intron length (bp)       168      168      167      166      166      164      164      165

TAIR7: 26,819 protein coding genes, 3,866 alternatively spliced

T (1