Chapter 3: How many genes are there?
3.1 Introduction
Total number of genes at four levels:
• The genome is the complete set of genes of an organism.
• The transcriptome is the complete set of genes expressed under particular conditions.
• The proteome is the complete set of proteins.
• Proteins may function independently or as part of multiprotein assemblies.
Identify the coding potential of a genome directly:
• open reading frames
• transcriptome: mRNAs
• proteome: all the proteins
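As a rough illustration of reading coding potential directly from sequence, the Python sketch below scans the three forward reading frames for ATG-to-stop open reading frames. The function name `find_orfs`, the `min_codons` cutoff, and the example sequence are assumptions made for illustration. A real survey would also scan the reverse strand and, in eukaryotes, would have to account for introns.

```python
# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each ATG...stop ORF on the forward strand.

    start is the index of the A of ATG, end is one past the stop codon,
    and min_codons (a hypothetical cutoff) counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the frame one complete codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATAGCC"))  # [(2, 11, 2)]
```

The cutoff matters in practice: short ORFs occur by chance, so only reasonably long ones are taken as candidate genes.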
3.2 Why are genomes so large?
C-value
Figure 3.1 DNA content of the haploid genome is related to the morphological complexity of lower eukaryotes, but varies extensively among the higher eukaryotes. The range of DNA values within a phylum is indicated by the shaded area.
Figure 3.2 The minimum genome size found in each phylum increases from prokaryotes to mammals.
Figure 3.3 The genome sizes of some
common experimental animals.
Figure 3.4 The proportions of different sequence components vary in eukaryotic genomes. The absolute content of nonrepetitive DNA increases with genome size, but reaches a plateau at ~2 × 10^9 bp.
3.3 Total gene number is known for several organisms
Figure 3.5 Genome sizes and gene numbers are known from complete sequences for several organisms (Arabidopsis, Drosophila, and man are estimated from partial data). Lethal loci are estimated from genetic data.
Figure 3.6 ~20% of Drosophila genes code for proteins
concerned with maintaining or expressing genes, ~20% for
enzymes, <10% for proteins concerned with the cell cycle or
signal transduction. Half of the genes of Drosophila code for
products of unknown function.
Figure 3.7 Because many genes are duplicated, the number
of different gene families is much less than the total
number of genes.
Figure 3.8 The fly genome can be divided into genes that are (probably) present in all eukaryotes, additional genes that are (probably) present in all multicellular eukaryotes, and genes that are more specific to subgroups of species that include flies.
3.4 How many genes are essential?
Figure 3.9 Genome sizes and gene numbers are known from complete sequences for several organisms (Arabidopsis, Drosophila, and man are estimated from partial data). Lethal loci are estimated from genetic data.
3.5 How many genes are expressed?
Figure 3.10 Hybridization between excess mRNA and cDNA identifies several components in chick oviduct cells, each characterized by the Rot½ of reaction.
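The Rot½ values in such an experiment can be related to the shape of the hybridization curve with a simple model. The sketch below assumes ideal pseudo-first-order kinetics with mRNA in excess, so that each component of the cDNA is half-hybridized at its own Rot½. The function name and the component values are hypothetical and are not read from the figure.

```python
import math


def fraction_hybridized(rot, components):
    """Fraction of cDNA hybridized at a given Rot value.

    components is a list of (fraction_of_cDNA, rot_half) pairs.
    Each component follows 1 - exp(-ln2 * Rot / Rot_half), an idealized
    pseudo-first-order curve assumed here for illustration.
    """
    return sum(
        fraction * (1.0 - math.exp(-math.log(2) * rot / rot_half))
        for fraction, rot_half in components
    )


# Hypothetical three-component population (abundant, intermediate, scarce).
example = [(0.5, 0.001), (0.2, 0.1), (0.3, 10.0)]
for rot in (0.001, 0.1, 10.0, 1000.0):
    print(rot, round(fraction_hybridized(rot, example), 3))
```

Components whose Rot½ values differ by orders of magnitude show up as separate steps in the curve, which is how abundance classes are distinguished.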
Figure 3.11 HDA (high-density oligonucleotide array) analysis allows the change in expression of each gene to be measured. Each square represents one gene (top left is the first gene on chromosome I, bottom right is the last gene on chromosome XVI). Change in expression relative to wild type is indicated by red (reduction), white (no change), or blue (increase).
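The colour coding in the array figure can be sketched as a threshold on the log2 ratio of signal relative to wild type. This is only an illustration: the function name, the two-fold threshold, and the signal values are assumptions, not the procedure used to produce the figure.

```python
import math


def expression_colour(mutant_signal, wild_type_signal, threshold=1.0):
    """Classify a gene as 'red', 'white', or 'blue'.

    threshold is in log2 units; 1.0 (a two-fold change) is a hypothetical
    cutoff chosen for illustration. Both signals must be positive.
    """
    log2_ratio = math.log2(mutant_signal / wild_type_signal)
    if log2_ratio <= -threshold:
        return "red"    # expression reduced relative to wild type
    if log2_ratio >= threshold:
        return "blue"   # expression increased relative to wild type
    return "white"      # no change beyond the threshold


print(expression_colour(50, 200))   # red
print(expression_colour(110, 100))  # white
print(expression_colour(400, 100))  # blue
```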
3.6 Organelles have DNA
Figure 3.12 Mitochondrial genomes have genes
coding for proteins (mostly of complexes I-IV), rRNAs,
and tRNAs.
3.8 Mitochondrial DNA codes for few proteins
Figure 3.13 Human mitochondrial DNA has 22 tRNA genes, 2 rRNA genes, and 13 protein-coding regions. 14 of the 15 protein-coding or rRNA-coding regions are transcribed in the same direction. 14 of the tRNA genes are expressed in the clockwise direction and 8 are read counterclockwise.
Figure 3.14 The mitochondrial genome of S. cerevisiae contains both interrupted and uninterrupted protein-coding genes, rRNA genes, and tRNA genes (positions not indicated). Arrows indicate direction of transcription.
3.9 The chloroplast genome codes for ~100 proteins and RNAs
Figure 3.15 The chloroplast genome codes for 4 rRNAs, 30 tRNAs, and ~50 proteins.
3.10 Summary
The sequences comprising a eukaryotic genome can be classified in three groups:
• nonrepetitive sequences: unique;
• moderately repetitive sequences: dispersed and repeated a small number of times in the form of related but not identical copies;
• highly repetitive sequences: short and usually repeated as a tandem array.
The proportions of the types of sequence are
characteristic for each genome, although larger
genomes tend to have a smaller proportion of
nonrepetitive DNA.
The complexity of any class describes the length of
unique sequences in it; the repetition frequency
describes the number of times each sequence is
repeated.
The C-value paradox describes the discrepancy
between coding potential and DNA content in
eukaryotic genomes
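The definitions of complexity and repetition frequency above are linked by simple arithmetic: the amount of DNA in a sequence class equals its complexity multiplied by its repetition frequency. The Python sketch below makes that explicit. The function name and the genome values are made up for illustration and do not describe any particular organism.

```python
def repetition_frequency(genome_size_bp, class_fraction, complexity_bp):
    """Average number of copies of each sequence in a class.

    genome_size_bp * class_fraction is the total DNA in the class;
    dividing by the complexity (length of unique sequence in the class)
    gives the repetition frequency.
    """
    return genome_size_bp * class_fraction / complexity_bp


# Hypothetical genome of 1e9 bp.
genome = 1_000_000_000

# Nonrepetitive class: 70% of the genome, complexity equal to its own length.
print(repetition_frequency(genome, 0.7, 0.7 * genome))

# Moderately repetitive class: 20% of the genome, complexity of 1e6 bp.
print(repetition_frequency(genome, 0.2, 1_000_000))
```

A class with a frequency close to 1 is nonrepetitive by definition. Higher values mean the same unique length accounts for more of the genome.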
Most structural genes are located in
nonrepetitive DNA. The complexity of
nonrepetitive DNA is a better reflection of the
complexity of the organism than the total
genome complexity; nonrepetitive DNA reaches
a maximum complexity of ~2 × 10^9 bp.
The total number of genes:
<1000 for Mycoplasma and intracellular
parasites,
2000-4000 for bacteria,
>6000 for yeast,
>12,000 for insects,
>100,000 for mammals.
Genes are expressed at widely varying
levels. There may be 10^5 copies of mRNA
for an abundant gene whose protein is the
principal product of the cell, 10^3 copies of
each mRNA for <10 moderately abundant
messages, and <10 copies of each mRNA
for >10,000 scarcely expressed genes.
Overlaps between the mRNA populations of
cells of different phenotypes are extensive;
the majority of mRNAs are present in most
cells.
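The abundance classes above imply that a handful of genes can account for as much mRNA as thousands of scarce ones. The sketch below works through that arithmetic. The gene counts and copy numbers are the bounds quoted above used as illustrative point values (an assumption, since the text gives only limits), and the function name is hypothetical.

```python
def class_shares(classes):
    """Share of total mRNA molecules contributed by each abundance class.

    classes maps a class name to (number_of_genes, copies_per_gene).
    """
    totals = {
        name: n_genes * copies for name, (n_genes, copies) in classes.items()
    }
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Illustrative point values taken at the quoted bounds.
example = {
    "abundant": (1, 100_000),   # one gene at 10^5 copies
    "moderate": (10, 1_000),    # ten genes at 10^3 copies each
    "scarce": (10_000, 10),     # 10,000 genes at 10 copies each
}
for name, share in class_shares(example).items():
    print(name, round(share, 3))
```

With these values the single abundant message and the whole scarce class each contribute about the same number of molecules, although the scarce class represents almost all of the expressed genes.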
Not all genes are essential (essential genes are
recognized by the lethal or devastating effects
of mutating them). The numbers of nonessential genes
and essential genes could be comparable.
yeast: only 60% of genes appear to be
essential;
D. melanogaster: <5000 essential genes.
We do not understand how nonessential genes
are maintained; they may provide selective
advantages that are not evident.
Non-Mendelian inheritance is explained by
the presence of DNA in organelles in the
cytoplasm. Mitochondria and chloroplasts
both represent membrane-bounded systems
in which some proteins are synthesized
within the organelle, while others are
imported. The organelle genome is usually a
circular DNA that codes for all of the RNAs
and for some of the proteins that are required.
 Mitochondrial genomes vary greatly in size
from the 16 kb minimalist mammalian genome
to the 570 kb genome of higher plants. It is
assumed that the larger genomes code for
additional functions.
Chloroplast genomes range from 120 to 200 kb.
Those that have been sequenced have a similar
organization and coding functions. In both
mitochondria and chloroplasts, many of the
major proteins contain some subunits
synthesized in the organelle and some subunits
imported from the cytosol.
 Mammalian mtDNAs are transcribed into a
single transcript from the major coding strand,
and individual products are generated by
RNA processing.
Rearrangements occur rather frequently in yeast
mitochondrial DNA, and recombination
between mitochondrial genomes or between chloroplast
genomes has been found.
There are some tantalizing homologies
between mitochondrial and chloroplast
genomes.