V4 Genome of Arabidopsis thaliana
Review of lecture V3 ...
- What are tandem repeats?
- How does one find CpG islands?
- What are the Gardiner-Garden & Frommer and Takai & Jones parameters?
- Why do we need t-tests?
- What are the findings of Hutter et al. (2006)?
Biological Sequence Analysis
SS 2008
lecture 4
1
Arabidopsis thaliana
Arabidopsis thaliana is a small flowering plant that is widely used as a model organism in plant biology. Arabidopsis is a member of the mustard (Brassicaceae) family, which includes cultivated species such as cabbage and radish.
Arabidopsis is not of major agronomic significance, but it offers important advantages for basic research in genetics and molecular biology.
TAIR
Some useful statistics for Arabidopsis thaliana
– Small genome (125 Mb total, of which 114.5 Mb was sequenced in 2000).
– Extensive genetic and physical maps of all 5 chromosomes.
– A rapid life cycle (about 6 weeks from germination to mature seed).
– Prolific seed production and easy cultivation in restricted space.
– Efficient transformation methods using Agrobacterium tumefaciens.
– A large number of mutant lines and genomic resources, many of which are available from stock centers.
– A multinational research community of academic, government and industry laboratories.
TAIR
• Such advantages have made Arabidopsis a model organism for studies of the cellular and molecular biology of flowering plants. TAIR collects and makes available the information arising from these efforts.
Arabidopsis thaliana genome sequence
Representation of the Arabidopsis chromosomes. Sequenced portions are red, telomeric and centromeric regions are light blue, heterochromatic knobs are black and the rDNA repeat regions are magenta.
Left: DAPI-stained chromosomes.
Gene density ('Genes') ranged from 38 per 100 kb to 1 per 100 kb; expressed sequence tag matches ('ESTs') ranged from more than 200 per 100 kb to 1 per 100 kb.
Transposable element densities ('TEs') ranged from 33 per 100 kb to 1 per 100 kb.
Mitochondrial and chloroplast insertions ('MT/CP') were assigned black and green tick marks, respectively.
Transfer RNAs and small nucleolar RNAs ('RNAs') were assigned black and red tick marks, respectively.
Nature 408, 796 (2000)
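The densities in this caption are feature counts per 100-kb window. A minimal sketch of that binning, assuming each feature is given as a single 1-based position on one chromosome (for example a gene start coordinate); the function name and input layout are illustrative and not taken from the Nature analysis.

```python
def density_per_window(positions, chrom_length, window=100_000):
    """Count features in consecutive, non-overlapping windows along one chromosome.

    positions: 1-based feature positions (hypothetical input, e.g. gene starts).
    Returns one count per window; the last window may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} is outside 1..{chrom_length}")
        counts[(pos - 1) // window] += 1  # window 0 covers positions 1..window
    return counts


counts = density_per_window([1, 50_000, 100_000, 100_001, 250_000], chrom_length=250_000)
print(counts)                    # [3, 1, 1]
print(max(counts), min(counts))  # highest and lowest count per window
```

Because the last window can be shorter than 100 kb, its count would need rescaling by its actual length before being compared as a per-100-kb density.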
Arabidopsis thaliana genome sequence
The proportion of Arabidopsis proteins having related counterparts in eukaryotic genomes varies by a factor of 2 to 3 depending on the functional category. Only 8 to 23% of Arabidopsis proteins involved in transcription have related genes in other eukaryotic genomes, reflecting the independent evolution of many plant transcription factors.
In contrast, 48 to 60% of genes involved in protein synthesis have counterparts in the other eukaryotic genomes, reflecting highly conserved gene functions. The relatively high proportion of matches between Arabidopsis and bacterial proteins in the categories 'metabolism' and 'energy' reflects both the acquisition of bacterial genes from the ancestor of the plastid and high conservation of sequences across all species. Finally, a comparison between unicellular and multicellular eukaryotes indicates that Arabidopsis genes involved in cellular communication and signal transduction have more counterparts in multicellular eukaryotes than in yeast, reflecting the need for sets of genes for communication in multicellular organisms.
Nature 408, 796 (2000)
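Percentages like these are per-category fractions: the number of proteins in a functional category that have a related counterpart, divided by all proteins in that category; the "factor of 2 to 3" compares such fractions between categories. A minimal sketch, assuming the counterpart call (similarity tool and score cut-off) has already been made for every protein; the function name, category labels and counts in the example are hypothetical toy values, not the Nature data.

```python
from collections import Counter


def fraction_with_counterparts(records):
    """Fraction of proteins per functional category that have a related counterpart.

    records: iterable of (category, has_counterpart) pairs, one per protein.
    How a 'counterpart' is called (similarity tool, score cut-off) is assumed
    to have been decided upstream.
    """
    total, hits = Counter(), Counter()
    for category, has_counterpart in records:
        total[category] += 1
        hits[category] += int(bool(has_counterpart))
    return {category: hits[category] / total[category] for category in total}


# Toy input (hypothetical numbers, not the Nature data): 2 of 10 transcription
# proteins and 5 of 10 protein-synthesis proteins have counterparts.
toy = [("transcription", i < 2) for i in range(10)] + [("protein synthesis", i < 5) for i in range(10)]
fractions = fraction_with_counterparts(toy)
print(fractions)                                          # {'transcription': 0.2, 'protein synthesis': 0.5}
print(max(fractions.values()) / min(fractions.values()))  # fold difference between categories
```

If a category has no counterparts at all, the fold-difference line divides by zero and would need a guard.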
Many genes were duplicated
Nature 408, 796 (2000)
Segmental duplication
Segmentally duplicated regions in the Arabidopsis genome.
Individual chromosomes are depicted as horizontal grey bars (with chromosome 1 at the top); centromeres are marked black.
Coloured bands connect corresponding duplicated segments. Similarities between the rDNA repeats are excluded. Duplicated segments in reversed orientation are connected with twisted coloured bands.
Nature 408, 796 (2000)
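A toy illustration of the direct/reversed distinction in this figure: it compares exact k-mers between two DNA sequences and reports those shared in the same orientation and those shared as reverse complements. This is only a sketch under simplifying assumptions (exact matches, plain DNA strings, no chaining of matches into segments); genome-wide duplication analyses rely on more sensitive similarity searches, and this is not the procedure behind the figure. All function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seq_a, seq_b, k=8):
    """Split k-mers of seq_a that are shared with seq_b by orientation.

    direct:   the k-mer occurs in seq_b as-is (same orientation).
    reversed: its reverse complement occurs in seq_b (reversed orientation).
    """
    kmers_a, kmers_b = kmers(seq_a, k), kmers(seq_b, k)
    direct = kmers_a & kmers_b
    reversed_hits = {km for km in kmers_a if reverse_complement(km) in kmers_b}
    return direct, reversed_hits


print(shared_kmers("GATTACA", "CCGATTACACC", k=7))  # ({'GATTACA'}, set())
print(shared_kmers("GATTACA", "TGTAATC", k=7))      # (set(), {'GATTACA'})
```

Palindromic k-mers (equal to their own reverse complement) land in both sets, which a real analysis would have to resolve.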
Membrane channels and transporters
Transporters in the plasma and intracellular membranes of Arabidopsis are responsible for the acquisition, redistribution and compartmentalization of organic nutrients and inorganic ions, as well as for the efflux of toxic compounds and metabolic end products, and for energy and signal transduction.
Unlike animals, which use a sodium ion P-type ATPase pump to generate an electrochemical gradient across the plasma membrane, plants and fungi use a proton P-type ATPase pump to form a large membrane potential. As a result, plant secondary transporters are typically coupled to protons rather than to sodium.
- Almost half of the Arabidopsis channel proteins are aquaporins, which emphasizes the importance of hydraulics in a wide range of plant processes.
- Compared with other sequenced organisms, Arabidopsis has 10-fold more predicted peptide transporters, primarily of the proton-dependent oligopeptide transport (POT) family, emphasizing the importance of peptide transport or indicating that there is broader substrate specificity than previously realized.
- There are nearly 1,000 Arabidopsis genes encoding Ser/Thr protein kinases, suggesting that peptides may have an important role in plant signalling.
Nature 408, 796 (2000)
What is TAIR*?
• NSF-funded project begun in 1999
• Web resource for Arabidopsis data and stocks
• Literature-based manual annotation of gene function
• Genome annotation (gene structure, computational gene function)
URL
The following slides were borrowed from a talk at the TAIR7 workshop by Eva Huala & Donghui Li.
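Gene-structure annotation of this kind is commonly exchanged as GFF3 (nine tab-separated columns, 1-based inclusive coordinates). The sketch below derives summary figures like those in the release overview (average exons, exon length, intron length) from exon records, assuming every exon line carries a `Parent=` attribute naming its transcript. It averages per transcript rather than per gene, and the function and helper names are hypothetical.

```python
from collections import defaultdict


def exon_intron_stats(gff_lines):
    """Return (avg exons per transcript, avg exon length, avg intron length) in bp.

    Assumes GFF3-style lines: 9 tab-separated columns, 1-based inclusive
    coordinates, and exon features with a Parent=<transcript id> attribute.
    """
    exons = defaultdict(list)  # transcript id -> [(start, end), ...]
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end = int(cols[3]), int(cols[4])
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        for parent in attrs["Parent"].split(","):  # an exon may be shared by isoforms
            exons[parent].append((start, end))

    exon_lengths, intron_lengths = [], []
    for coords in exons.values():
        coords.sort()
        exon_lengths.extend(end - start + 1 for start, end in coords)
        # an intron is the gap between consecutive exons of the same transcript
        intron_lengths.extend(s2 - e1 - 1 for (_, e1), (s2, _) in zip(coords, coords[1:]))

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    avg_exons = len(exon_lengths) / len(exons) if exons else 0.0
    return avg_exons, mean(exon_lengths), mean(intron_lengths)


def exon(start, end, parent):
    """Build one hypothetical GFF3 exon line for the example."""
    return "\t".join(["Chr1", "example", "exon", str(start), str(end), ".", "+", ".", f"Parent={parent}"])


lines = ["##gff-version 3", exon(100, 199, "tx1"), exon(300, 349, "tx1"),
         exon(500, 599, "tx1"), exon(1000, 1099, "tx2")]
print(exon_intron_stats(lines))  # (2.0, 87.5, 125.0)
```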
[Screenshots of the TAIR website: Portals; Tools; Search; a gene detail page with Names, Description, GO annotations, Expression, Sequences, Maps, Mutations, Seed lines, Links to other sites, Comments and References]
GBrowse - coming soon
Overview of releases to date
Release                        Nature   TIGR1    TIGR2    TIGR3    TIGR4    TIGR5    TAIR6    TAIR7
(date)                         (12/00)  (8/01)   (1/02)   (8/02)   (4/03)   (1/04)   (11/05)  (4/07)
Protein coding genes           25,498   25,554   26,156   27,117   27,170   26,207   26,541   26,819
Transposons and pseudogenes    NA       1,274    1,305    1,967    2,218    3,786    3,818    3,889
Alternatively spliced genes    NA       0        28       162      1,267    2,330    3,159    3,866
Gene density (kb per gene)     4.50     4.55     4.48     4.32     4.38     4.54     4.48     4.44
Avg. exons per gene            5.20     5.23     5.25     5.24     5.31     5.42     5.64     5.79
Avg. exon length (bp)          250      256      265      266      279      276      269      268
Avg. intron length (bp)        168      168      167      166      166      164      164      165
26,819 protein coding genes
3,866 alternatively spliced
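The callout numbers can be turned into a proportion: in TAIR7, 3,866 of 26,819 protein-coding genes (about 14.4%) are annotated as alternatively spliced. A short sketch computing the same share for each release, assuming (as the callout suggests) that the alternatively spliced genes are a subset of the protein-coding genes; the constant and function names are illustrative.

```python
# (protein-coding genes, alternatively spliced genes) per release, copied from the
# release overview table; the Nature release is omitted because its value is NA.
RELEASES = {
    "TIGR1": (25_554, 0),
    "TIGR2": (26_156, 28),
    "TIGR3": (27_117, 162),
    "TIGR4": (27_170, 1_267),
    "TIGR5": (26_207, 2_330),
    "TAIR6": (26_541, 3_159),
    "TAIR7": (26_819, 3_866),
}


def percent_alternatively_spliced(releases):
    """Share of protein-coding genes annotated as alternatively spliced, in percent."""
    return {name: 100 * spliced / coding for name, (coding, spliced) in releases.items()}


for name, pct in percent_alternatively_spliced(RELEASES).items():
    print(f"{name}: {pct:.1f}%")
```

The printed shares rise from 0.0% (TIGR1) to 14.4% (TAIR7).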
T
(1