Genome Organization
… all started with heating up DNA
↑ pH
Renaturation kinetics
Upon heating (or treatment with alkali), the two strands of the DNA double helix separate from
one another into two single-stranded molecules. This process, called DNA denaturation or
melting, can be assayed by several different methods, but most commonly the 40% increase in
optical density (termed hyperchromicity) that accompanies denaturation is monitored. When
optical density is measured as the DNA is heated, and temperature is plotted against absorbance,
a melting curve is obtained.
[Figure: UV absorption maxima; labels: Trp, Nucleic Acid; λ max 253, 259, 262, 267, 271 and 280 nm]
Which are the chemical groups involved in H-bonding ?
Nitrogenous Bases DO NOT Base Pair in Solution
Effects of temperature
One of the simplest ways of denaturing nucleic acids is to raise the temperature
until all the hydrogen bonds and stacking forces have been broken.
The strands then dissociate or “melt”.
ss-nucleic acids absorb UV light (max. at 260 nm) more strongly than ds-DNA
→ a phenomenon called hyperchromicity.
The absorption of DNA itself is some 40%
less than would be displayed by a mixture of
free nucleotides of the same composition.
The increase in absorption is proportional to
the extent of denaturation, so dissociation
can be followed by monitoring absorbance as
the temperature is being raised.
Thermal melting curve for DNA.
As the DNA is heated, the weak forces holding the ds-DNA together are
broken.
The midpoint of the increase in absorption is defined as the melting
temperature (Tm) or transition temperature.
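The midpoint definition of Tm can be sketched in a few lines of Python. This is only an illustration: the helper name `melting_temperature` and the absorbance readings are made up for the example (a 40% rise in A260, as for full hyperchromicity), and the curve is assumed to rise monotonically with temperature.

```python
def melting_temperature(temps, absorbances):
    """Return the temperature at the midpoint of the absorbance increase.

    Linear interpolation between neighbouring readings; assumes the
    absorbance rises monotonically as the temperature is raised.
    """
    a_mid = (min(absorbances) + max(absorbances)) / 2.0
    points = list(zip(temps, absorbances))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= a_mid <= a1 and a1 > a0:
            return t0 + (a_mid - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("absorbance never crosses the midpoint")


# Hypothetical readings (temperature in degrees C, relative A260), not measured data.
temps = [60, 70, 80, 85, 90, 95, 100]
a260 = [1.00, 1.01, 1.05, 1.15, 1.30, 1.38, 1.40]

tm = melting_temperature(temps, a260)
print(f"Estimated Tm: {tm:.1f} degrees C")
```

With these invented readings the midpoint (A260 = 1.20) falls between the 85 and 90 degree points, so the estimate lands between them.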
DNA denaturation/renaturation experiments
⇒ effect of temperature or pH
⇒ hyperchromic effect (max. 40%)
⇒ free nitrogen bases versus stacked nitrogen
bases
⇒ simple and cheap experiment. The effect of
temperature on nucleic acids and proteins
(reversible versus irreversible denaturation)
⇒ small amounts of DNA can be detected
⇒ in addition to A260 nm one can also use
Ethidium Bromide to monitor DNA
⇒ salt conc. affects the renaturation.
10X SSC (1.5 M NaCl; 0.15 M Na-citrate, pH 7.0) as standard buffer
⇒ use of sonicated DNA
(fragments = 400 - 500 bp)
Analysis of the GC content of the DNA extracted from
different organisms.
⇒ No conclusive results on the organization of
GENOMES (e.g. E. coli 50%, S. cerevisiae 70%)
Pneumococcus (40 % GC content)
Serratia marcescens
(60 % GC content)
It is more difficult to separate a GC base pair than an AT base pair: it takes a higher temperature (higher energy input).
Denatured DNA will renature to re-form the duplex
structure if the denaturing conditions are removed (that
is, if the solution is cooled, the pH is returned to
neutrality, or the denaturants are diluted out).
Renaturation requires reassociation of the DNA strands
into a double helix, a process termed reannealing.
For this to occur, the strands must realign themselves
so that their complementary bases are once again in
register and the helix can be zippered up.
Many of the realignments are imperfect, and thus the
strands must dissociate again to allow for proper
pairings to be formed.
Renaturation and Hybridization
The reverse of nucleic acid denaturation is renaturation, also referred to as
reassociation or hybridization.
When heat-denatured DNA is cooled slowly, the complementary strands reassociate
to form a double-stranded molecule.
This is a two-step process: - random collisions
- rapid zippering
Single strands of DNA undergo
numerous random collisions until
eventually a collision occurs that places
complementary sequences in the
correct register.
This event is known as NUCLEATION.
Formation of a base-paired duplex
quickly follows
At the Tm of a long nucleic acid, 50% of the bp have been broken and the
molecules contain both single-stranded and double-stranded regions
Heat denaturation of hybrids containing oligonucleotides.
These hybrids are too short to contain both single and double-stranded
regions, and they melt over a very sharp temperature range.
At Tm half the hybrids have dissociated.
Satellite Sequences (1961)
At the time, several laboratories were carrying out
equilibrium density gradient centrifugation studies on DNA
from higher organisms. Two of them simultaneously found that a
number of such organisms, including two species of crab,
guinea pig and mouse, had DNA that -- unlike prokaryotic
DNA -- did not behave as a single entity upon
centrifugation. The DNA showed a major band, similar to
that of prokaryotic DNA, and a minor band (or bands) that
had a different density.
The minor band was termed "satellite".
Main band (92%): mouse DNA, GC content = 42% (typical for a mammal)
Minor band (8%, lower density): mouse satellite DNA, GC content = 30%
Satellite DNA in Drosophila melanogaster
Satellite            Sequence                   % of genome
1.672, 1.686 g/cc    (A A T A T)n               3.0%
                     (A A T A C)n               <0.5%
                     (A A T A G)n               <0.5%
                     (A A T A A C A T A G)n     2.0%
                     (A A G A C)n               2.4%
1.688 g/cc           359 base pair repeat       6.8%
1.705 g/cc           (A A G A G)n               6.0%
                     (A A G A G A G)n           1.5%
SCIENCE
Carnegie Institute of Washington
Time-course of an ideal reaction
Cot curves
DNA denaturation/renaturation experiments
C/C0 = 1/(1 + k·C0·t)     (at 25°C below Tm)
C = conc. of ssDNA at time t
C0 = initial ssDNA conc. (t = 0)
k = rate constant
t = incubation time
When C/C0 = ½:
½ = 1/(1 + k·C0·t½)
1 + k·C0·t½ = 2
→ k·C0·t½ = 1
The meaning of Cot
C0t½ = 1/k
A greater Cot implies a slower reaction.
Cot units: nucleotide-moles × sec / liter
The renaturation of the DNA of any
genome should display a Cot ½ that is
proportional to its complexity.
[ 2nd-order kinetics ]
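The ideal second-order renaturation equation, C/C0 = 1/(1 + k·C0·t), can be tabulated with a short Python sketch. The function names and the rate constant `k = 0.25` are assumptions chosen for illustration (they give C0t½ = 4); they are not values from a particular experiment.

```python
def fraction_single_stranded(cot, k):
    """C/C0 for an ideal second-order reaction: 1 / (1 + k * C0t)."""
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """C0t value at which half of the DNA has renatured: C0t½ = 1/k."""
    return 1.0 / k


k = 0.25  # hypothetical rate constant (liter / (nucleotide-mole x sec))

# A Cot curve is plotted on a log scale, so sample C0t over several decades.
for cot in (0.04, 0.4, 4.0, 40.0, 400.0):
    frac = fraction_single_stranded(cot, k)
    print(f"C0t = {cot:7.2f}   C/C0 = {frac:.3f}")
```

Most of the transition takes place within about two decades of C0t centred on C0t½, which is why a single sequence component gives one sigmoid step on a Cot curve.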
(1966) Renaturation experiments carried out with mouse DNA.
If mouse DNA was denatured, and then subjected to renaturation, a fraction
of the DNA (about 8 - 10%) renatured in very short times -- many times
shorter than the simplest natural DNA known at the time (red line in the
figure).
Conclusion: this fraction of the mouse genome was present in all mouse
tissues examined, and was a simple sequence of about 200 - 400
nucleotides (ultimately it was found to be a sequence of 234 bp).
It was present in tandem arrays and was repeated about a million times in
the mouse genome. Subsequent analysis showed that this DNA formed a
satellite upon density gradient centrifugation.
Satellite DNA in general has the following characteristics:
- It is found in almost all eukaryotes.
- It consists of short sequences, typically 2 to 200 bp in length, repeated
many times in one or more tandem arrays.
- The bulk of the sequences seem to be located in a fraction of chromatin
called heterochromatin. When its location is determined, it is found mostly
around the centromeres, and at the telomeres.
- The satellite sequences are seldom transcribed.
Some of these satellites are cryptic; that
is, they do not have a density that is
different from main band DNA and so are
not visible after CsCl centrifugation as
distinct bands.
They become visible when actinomycin
D is added to the density gradients.
This and similar antibiotics bind at GC
sites, and shift the density of DNA.
Renaturation rates depend on temperature, salt
concentration and the length of the DNA molecules
involved.
If these factors are held
constant, then the rate of
renaturation (how much of
the DNA forms a duplex per
time period) is dependent
only on the concentration of
the two strands of DNA.
DNA fingerprinting in Forensic Science
Minisatellites, 10-100 bp repeated several times in tandem (VNTR = Variable
Number Tandem Repeats)
Microsatellites, 2-4 bp repeated several times in tandem (STR = Short Tandem
Repeat Polymorphisms). They represent 3% of the human genome.
Tandem nucleotide repeats are generated
by slippage mutations occurring during
DNA replication.
In most of the cases, DNA fingerprinting is
based on minisatellite or microsatellite markers.
When different probes are used to make several
fingerprints, the likelihood that any two
individuals chosen at random will have identical
matches in all of them is extremely small (less
than 1 in 1 trillion).
Nucleic Acid Hybridization
If DNA from two different species are mixed, denatured, and allowed to
cool slowly so that reannealing can occur, artificial hybrid duplexes may
form, provided the DNA from one species is similar in nucleotide sequence
to the DNA of the other.
The degree of hybridization is a measure of the sequence similarity or
relatedness between the two species.
Used for:
- Evolutionary relationships
Zuckerkandl, E. & Pauling, L.
Molecules as documents of evolutionary
history. J. Theor. Biol. (1965), 8:357-366
- Identification of specific genes,
using oligonucleotides or
polynucleotide probes
- Quantitative expression of genes
(amount of mRNA synthesized)
ss DNA
hybrid
Conclusion: each RNA molecule is derived from a specific DNA sequence
Re-Annealing or Hybridization
Works with:
• DNA - DNA
• DNA - RNA
• RNA - RNA
Basis of many techniques in
molecular biology
DNA-RNA HYBRID
DNA renaturation curve for mouse genomic DNA
[Figure: renaturation curve; axes: % of ssDNA, % of dsDNA; component labels: 10 %, 15 %, 75 %]
Chemical and kinetic complexity
Kinetic Complexity
C0t½ (DNA of any genome) / C0t½ (E. coli) = Complexity of any genome / 4.2 × 10^6 bp
This genome has a haploid DNA content of 7 × 10^8 bp (chemical complexity)
Assuming C0t½ (E. coli) = 4:
Kinetic complexity, fast component = 0.0013 × 0.25 × 4.2 × 10^6 bp / 4 = 340 bp
Kinetic complexity, intermediate component = 1.9 × 0.30 × 4.2 × 10^6 bp / 4 = 6 × 10^5 bp
Kinetic complexity, slow component = 630 × 0.45 × 4.2 × 10^6 bp / 4 = 3 × 10^8 bp
Repetition frequency = Total DNA content (bp) • % (fraction) / kinetic complexity (bp)
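The arithmetic for the three components can be checked with a short Python sketch. The numbers (C0t½ values, fractions, the E. coli standard and the 7 × 10^8 bp genome) are the ones quoted on the slide; the function and variable names are illustrative only.

```python
ECOLI_COMPLEXITY_BP = 4.2e6   # E. coli genome used as the standard
ECOLI_COT_HALF = 4.0          # assumed C0t½ of E. coli DNA
GENOME_SIZE_BP = 7e8          # haploid DNA content (chemical complexity)

# component name -> (observed C0t½, fraction of the genome)
components = {
    "fast": (0.0013, 0.25),
    "intermediate": (1.9, 0.30),
    "slow": (630.0, 0.45),
}


def kinetic_complexity(cot_half, fraction):
    """Complexity (bp) of one component, scaled against the E. coli standard."""
    return cot_half * fraction * ECOLI_COMPLEXITY_BP / ECOLI_COT_HALF


def repetition_frequency(fraction, complexity_bp):
    """Copies per genome = DNA in the component (bp) / kinetic complexity (bp)."""
    return GENOME_SIZE_BP * fraction / complexity_bp


for name, (cot_half, fraction) in components.items():
    complexity = kinetic_complexity(cot_half, fraction)
    copies = repetition_frequency(fraction, complexity)
    print(f"{name:12s} complexity = {complexity:.3g} bp, copies = {copies:.3g}")
```

Using the unrounded complexities, the fast component comes out at roughly 5 × 10^5 copies, the intermediate component at about 350 copies, and the slow component at about 1 copy per haploid genome.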
Genes are expressed at widely differing
levels
Abundant mRNAs consist of a small number of
individual species, each present in a large number of
copies per cell.
Scarce mRNA (Complex mRNA) consists of a large
number of individual mRNA species, each present in
very few copies per cell. This accounts for most of the
sequence complexity in RNA.
• The first component is the ovalbumin mRNA which
indeed occupies about 50% of the messenger mass
in oviduct tissue.
• The next component provides 15% of the reaction,
with a total complexity of 15 kb. This corresponds to
7-8 mRNA species of average length 2000 bases.
• The last component provides 35% of the reaction,
which corresponds to a complexity of 26 Mb. This
corresponds to ~13,000 mRNA species of average
length 2000 bases.
Hybridization between excess
mRNA and cDNA identifies
several components in chick
oviduct cells, each characterized
by the Rot½ of reaction.
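The number of mRNA species in each component follows from dividing its complexity by the average mRNA length. A minimal Python sketch of that division, using the figures quoted above (the function name is illustrative):

```python
AVERAGE_MRNA_LENGTH_BASES = 2000


def species_count(complexity_bases, average_length=AVERAGE_MRNA_LENGTH_BASES):
    """Estimated number of distinct mRNA species in one kinetic component."""
    return complexity_bases / average_length


middle = species_count(15_000)        # component with 15 kb total complexity
scarce = species_count(26_000_000)    # component with 26 Mb total complexity
print(middle, scarce)
```

The 15 kb component gives 7.5, that is 7-8 species, and the 26 Mb component gives 13,000 species.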
Elliot Volkin (1957) ORNL (Oak Ridge National Laboratories,TN)
Infection of E.coli with bacteriophage T2 + 32PO4 labeling
Alkaline hydrolysis – paper chromatography
Quantitative measurement of C, G, U, A
(Do you remember G. Mendel and E. Chargaff ?)
Nucleotide composition of the host E.coli
RNA ( C:G:U:A ratios of 1 : 1.4 : 1 : 1 )
Nucleotide composition of the infecting phage
RNA ( C:G:U:A ratios of 1 : 1 : 1.9 : 1.9 )
Nucleotide composition of the infecting phage
DNA ( C:G:T:A ratios of 1 : 1 : 1.9 : 1.9 )
This molecule was originally called
“DNA-like RNA”
Messenger RNA (mRNA) … the metabolic intermediate X
Features: 1) the same base composition as DNA; 2) it renews itself very quickly
F. Jacob, M. Meselson, J. Monod, S. Brenner,
F. Crick (1960)
1) Bacteria grown for several generations on
heavy-isotope labeled media ⇒ “heavy”-labeled
bacteria
2) Bacteria infected with T4 virus, which destroys
bacterial DNA, and substitutes viral DNA.
Bacterial cells are simultaneously transferred to a
“light” (non heavy-isotope labeled) medium
3) Radioactive RNA precursor (14C – Uracil) was
added and time was allowed for virus-directed
synthesis to proceed
4) Bacteria are lysed, and ribosomes are
centrifuged on a CsCl gradient
5) Only heavy ribosomes were present after
centrifugation
Conclusion: ribosomes are not carriers of genetic
information; instead they are protein synthesis
machines
6) Any newly synthesized RNA is detected
because it is radioactive.
New radioactive [14C] RNA is associated with
“old” bacterial ribosomes
7) New radioactive RNA is removed from
ribosomes.
It hybridizes with viral single-stranded DNA and
not with bacterial ss-DNA.
These experiments established that:
1. The expression of the viral genes is associated
with the formation of new virus-specific RNA
molecules (mRNA);
2. Ribosomes are NOT involved in viral gene
expression except as passive sites of synthesis;
3. The new messenger RNA has a base sequence
complementary to DNA, and presumably originated
there;
4. The new mRNA can be isolated complexed to
ribosomes. It follows that these new RNA
molecules are indeed the genetic messenger,
carrying information from DNA to ribosome, as
envisaged by Francis Crick.
(M.I.T. & CSHL, New York)
Human adenovirus type 41 (Ad41) is
involved in the aetiology of some
enteric diseases.
The major capsid protein (hexon)
contributes to the pseudo-hexagonal
shape of the virus (Icosahedral-type).
R-looping experiments
The electron microscope photo shows
the formation of a DNA-mRNA hybrid.
This hybrid contains loops.
A TESTABLE HYPOTHESIS
As a consequence of the discovery that genes are often split, it
seems likely that higher organisms in addition to undergoing
mutations may utilize another mechanism to speed up
evolution: rearrangement (or shuffling) of gene segments to
new functional units. This can take place in the germ cells
through crossing-over during pairing of chromosomes.
This hypothesis seems even more attractive following the
discovery that individual exons in several cases correspond to
building modules in proteins, so-called domains, to which
specific functions can be attributed. An exon in the genome
would thus correspond to a particular subfunction in the protein
and the rearrangement of exons could result in a new
combination of subfunctions in a protein.
This kind of process could drive evolution considerably by
rearranging modules with specific functions.
Structure of eukaryotic genes
Genomic DNA (20 – 50 Kbp)
RNA pol
Nuclear Heterogeneous RNA (hn RNA)
Precursor RNA
Mature messenger RNA (average length = 2.2 kb in human)
( approx. 5% of the length of the gene )
Notes: 1. Alternative splicing
2. mini-RNAs with regulatory function(s) produced from intronic sequences
Prokaryotic and Eukaryotic Genes
Uninterrupted genes
Interrupted genes are expressed via a
precursor RNA.
Introns are removed when the exons are
spliced together.
The mRNA contains only the sequences of
the exons.
The exons coding for stretches of protein
tend to be fairly small. In higher eukaryotes,
the average exon codes for ~50 amino
acids, and the general distribution fits well
with the idea that genes have evolved by the
slow addition of units that code for small,
individual domains of proteins.
The introns vary widely in size. There are
no very long introns in worms, but flies
contain a significant proportion. In
vertebrates, the size distribution is much
wider, extending from approximately the
same length as the exons (<200 bp) to
lengths measured in 10s of kbs, and
extending up to 50-60 kb in extreme cases.
Eukaryotic genome organization
Very long genes are the result of very long introns, not the result of coding for longer
products.
Species            Average exon N°   Average gene length (kb)   Average mRNA length (kb)
S. cerevisiae      1                 1.6                        1.6
Fungi              3                 1.5                        1.5
C. elegans         4                 4.0                        3.0
D. melanogaster    4                 11.3                       2.7
Chicken            9                 13.9                       2.4
Mammals            7                 16.6                       2.2
Annotations: "10 times"; "Uninterrupted genes"
There are virtually no S. cerevisiae genes with more than 4 exons.
BACTERIAL GENOMES
Species                                 Genome size (kb)   No. of genes   Gene density (per 1 kb)   Fraction of noncoding DNA
Aeropyrum pernix                        1670               1688           1.01                      0.14
Sulfolobus solfataricus                 2592               3012           1.16                      0.15
Sulfolobus tokodaii                     2695               2956           1.10                      0.15
Methanococcus jannaschii                1665               1828           1.10                      0.12
Methanobacterium thermoautotrophicum    1751               1917           1.09                      0.09
Pyrococcus horikoshii                   1739               1796           1.03                      0.09
Pyrococcus abyssi                       1765               1802           1.02                      0.08
Archaeoglobus fulgidus                  2178               2467           1.13                      0.07
Thermoplasma acidophilum                1565               1528           0.98                      0.12
Thermoplasma volcanium                  1585               1548           0.98                      0.14
Halobacterium sp.                       2570               2640           1.03                      0.14
Escherichia coli K12                    4639               4375           0.94                      0.12
Buchnera sp.                            641                610            0.95                      0.12
Salmonella typhi                        4809               4696           0.98                      0.13
Vibrio cholerae                         4033               3949           0.98                      0.13
Yersinia pestis                         4654               4096           0.88                      0.19
Haemophilus influenzae                  1830               1746           0.96                      0.11

Organism           GC %    Number of Genes Annotated
Buchnera           26.2    564
B. burgdorferi     28.6    857
C. jejuni          30.6    1654
M. jannaschii      31.4    1715
M. genitalium      31.7    483
H. influenzae      38.0    1754
H. pylori          38.9    1593
A. aeolicus        43.3    1517
B. subtilis        43.5    4220
Synechocystis      47.6    3169
Y. pestis          47.6    4043
E. coli            50.8    4290
D. radiodurans     67.0    2622
R. solanacearum    67.0    3442
S. coelicolor      72.1    7851
The C-value paradox
The total amount of DNA in the
(haploid) genome is a characteristic of
each living species known as its C-value.
The C-value paradox describes the
lack of relationship between the DNA
content (C-value) of an organism and
its coding potential.
In some phyla (insects, amphibians,
and plants) there are extremely large
variations in DNA content between
organisms that do not vary very much in
complexity.
… a short tour into GENOMICS
“GENOM”
(Hans Winkler, Univ. Hamburg, 1920)
gen "gene" + (chromos)om "chromosome”
He referred to the complete set of
chromosomes, intended as carriers of
hereditary factors.
1. WHAT IS GENOMICS
GENOMICS:
The study of genomes, starting from
the determination of the nucleotide
sequence of the chromosome/s of a
given organism.
G-A-T-C
GENOMES:
- How big are they?
- How are they organized?
- Why sequence them?
2. OBJECTIVES
• Construct physical and genetic maps
• Determine the DNA sequence
• Identify all the genes, and their regulatory sequences
• Characterize non-coding DNA sequences
• Recognize the molecular bases of cellular processes and the differences
among organisms.
- “Holistic approach”
- The amount of data obtained requires the use of
algorithms and computers that are able to assemble,
organize and analyze all the information.
- Results should be made available to the scientific
community.
3. STRATEGIES USED FOR GENOMIC SEQUENCING PROJECTS
• What is DNA sequencing?
Genomes and genes are “restricted” to small
fragments.
Subsequently, the chemical composition of these
fragments is determined, as a sequence of bases.
[ Reductionist approach ]
Genetic Map → measuring recombination frequencies of “linked markers”
(genes or polymorphisms, whose pattern of transmission can be tracked) → low
resolution
Restriction Map → alignment of 1-2 Mbp DNA fragments → medium resolution
Libraries → 40-400 Kbp DNA fragments inserted into artificial chromosomes
(YAC, BAC, cosmids) → high resolution
Nucleotide Sequence → “is the ultimate physical map”
Translation initiation factor IF-1 [Escherichia coli str. K-12 substr. MG1655]
>EG10504 infA Protein chain initiation factor IF1
925665
5’ATGGCCAAAGAAGAC AATATTGAAATGCAA GGTACCGTTCTTGAA
ACGTTGCCTAATACC ATGTTCCGCGTAGAG TTAGAAAACGGTCAC
GTGGTTACTGCACAC ATCTCCGGTAAAATG CGCAAAAACTACATC
CGCATCCTGACGGGC GACAAAGTGACTGTT GAACTGACCCCGTAC
925466
GACCTGAGCAAAGGC CGCATTGTCTTCCGT AGTCGCTGA 3’
219 nucleotides
>EG10504 infA Protein chain initiation factor IF1
ATG GCC AAA GAA GAC AAT ATT GAA ATG CAA GGT ACC GTT CTT
 M   A   K   E   D   N   I   E   M   Q   G   T   V   L
GAA ACG TTG CCT AAT ACC ATG TTC CGC GTA GAG TTA GAA AAC GGT CAC
 E   T   L   P   N   T   M   F   R   V   E   L   E   N   G   H
GTG GTT ACT GCA CAC ATC TCC GGT AAA ATG CGC AAA AAC TAC ATC
 V   V   T   A   H   I   S   G   K   M   R   K   N   Y   I
CGC ATC CTG ACG GGC GAC AAA GTG ACT GTT GAA CTG ACC CCG TAC
 R   I   L   T   G   D   K   V   T   V   E   L   T   P   Y
GAC CTG AGC AAA GGC CGC ATT GTC TTC CGT AGT CGC TGA
 D   L   S   K   G   R   I   V   F   R   S   R  (stop)
MAKEDNIEMQ GTVLETLPNT MFRVELENGH VVTAHISGKM
RKNYIRILTG DKVTVELTPY DLSKGRIVFR SR
calculated_mol_wt=8118
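The step from the 219-nucleotide infA coding sequence to the 72-residue IF1 protein can be reproduced with a small Python sketch. It assumes the standard genetic code and builds the codon table from the conventional T, C, A, G ordering; the names `CODON_TABLE`, `INFA_CDS` and `translate` are illustrative and are not part of any database tool.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# infA coding sequence as shown on the slide (5' to 3', 219 nucleotides).
INFA_CDS = (
    "ATGGCCAAAGAAGACAATATTGAAATGCAAGGTACCGTTCTTGAA"
    "ACGTTGCCTAATACCATGTTCCGCGTAGAGTTAGAAAACGGTCAC"
    "GTGGTTACTGCACACATCTCCGGTAAAATGCGCAAAAACTACATC"
    "CGCATCCTGACGGGCGACAAAGTGACTGTTGAACTGACCCCGTAC"
    "GACCTGAGCAAAGGCCGCATTGTCTTCCGTAGTCGCTGA"
)


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


protein = translate(INFA_CDS)
print(len(INFA_CDS), "nucleotides ->", len(protein), "amino acids")
print(protein)
```

The 73 codons give 72 amino acids plus the TGA stop, and the printed sequence should match the protein listed above (MAKEDNIEMQ ... SR).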
[Figure: 3D structure; labels: fMet-tRNA, IF2, IF3, mRNA, rRNA, tRNA]
• Escherichia coli genome:
ds circular DNA (4 639 221 bp)
Genome: 12.1 Mbp
1. Not classified
N° of estimated genes = 6,100
Genome: 167 Mbp
5 chromosomes (2N = 10)
N° of estimated genes = 25,706
Working Draft Sequence of the
Human Genome
(published in Feb. 2001)
[Figure label: Unknown functions]
► Human Genome: 3.4 Gbp
N° of estimated genes: 32,000-35,000
• ≅ 2% of the genome corresponds to EXONS (< 2%)
• 23% INTRONS (Intervening Seq.)
• 75% intergenic DNA
Red Bars → repeated sequences
Blue Bars → EXONS
GenBank is an international archive where researchers can submit and retrieve partial
or complete genomic sequences.
Seats: NCBI (Bethesda Maryland, USA)
Established in 1988 as a national resource for molecular biology information, NCBI
creates public databases, conducts research in computational biology, develops
software tools for analyzing genome data, and disseminates biomedical information.
EMBL (Heidelberg, Germany)
DDBJ DNA Data Bank of Japan (Mishima, Japan)
BLAST is a program used to compare a sequence
of interest against all known sequences available
Types of DNA markers present in genomic DNA
- SNP (Single Nucleotide Polymorphism) [ > 1% in a population]
- RFLP
Allele 1:  5’ GCTCTATCGTT 3’
           3’ CGAGATAGCAA 5’
Allele 2:  5’ GCTCTACCGTT 3’
           3’ CGAGATGGCAA 5’
The SNP defines two alleles for which there
could be three genotypes among individuals
in the population.
In the example shown above:
- Homozygous with T-A in both homologous
chromosomes
- Homozygous with C-G in both homologous
chromosomes
- Heterozygous with T-A in one
chromosome and C-G in the homologous
chromosome
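The SNP example can be sketched in Python: compare the two top-strand sequences position by position, then list the genotypes that two alleles allow. The helper name `find_snps` is illustrative, and positions are reported 0-based.

```python
from itertools import combinations_with_replacement

# Top strands (5' to 3') of the two alleles shown in the example.
allele_1 = "GCTCTATCGTT"
allele_2 = "GCTCTACCGTT"


def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch between two
    aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


snps = find_snps(allele_1, allele_2)
position, base_1, base_2 = snps[0]

# Two alleles give three possible diploid genotypes.
genotypes = [
    "/".join(pair)
    for pair in combinations_with_replacement((base_1, base_2), 2)
]
print(snps)
print(genotypes)
```

The single difference is the T/C position on the top strand (T-A versus C-G as base pairs), and the three genotypes are the two homozygotes and the heterozygote.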
► SNPs need not be in a coding
sequence, or even in a gene.
► They are the most common
form of genetic differences among
people.
► They are distributed approx.
uniformly along the chromosomes
(> 4 × 10^6 identified).
A marker is like a signpost on the genetic highway - a
spot that is observed in everyone and that can be used
as a reference point among people. (http://snp.cshl.org/
http://www.hgvbaseg2p.org/)
The marker itself (a SNP, for example) may or may not
cause the disease, medicine response or other
phenotype that is being examined.
In some cases, it may be directly linked to the
phenotype, but it is useful as a signpost in either case.
Using the information SNPs provide, it
may be possible to predict your genetic risk
of developing a certain disease, to diagnose
a disease more accurately, or to predict how
you most likely will respond to a medicine.
How might a doctor's knowledge of your genetic data affect your everyday
life in the future?
Just as you carry your medical insurance card with you, you may also one day
carry a wallet-sized card that has your genetic data coded on it.
Doctors would be able to use this data to predict your risk of developing a
disease and your likely response to a medicine before they prescribe it for you
→ → towards personalized medicine.
Connecting phenotype with genotype is the fundamental aim of Genetics
Some polymorphic variants
Relative risk is the ratio of the disease rate in exposed persons
to that in people who are unexposed. This ratio can be <1 or > 1
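As a minimal illustration of the definition, the ratio can be computed in Python. The counts below are hypothetical and the function name is made up for the example; here "exposed" stands for carriers of a given variant.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Disease rate in exposed persons divided by the rate in unexposed persons."""
    rate_exposed = exposed_cases / exposed_total
    rate_unexposed = unexposed_cases / unexposed_total
    return rate_exposed / rate_unexposed


# Hypothetical counts: a variant that raises risk and one that lowers it.
risk_variant = relative_risk(30, 1000, 10, 1000)
protective_variant = relative_risk(5, 1000, 10, 1000)
print(risk_variant, protective_variant)
```

A value above 1 means the disease is more frequent among carriers; a value below 1 suggests a protective variant.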
Cytogenetic Location of APOE gene: 19q13.2
Molecular Location on chromosome 19: base pairs 50,100,878 to 50,104,489
Lipoproteins are responsible for packaging cholesterol and other fats and
carrying them through the bloodstream.
Apolipoprotein E is a major component of a specific type of lipoprotein called
very low-density lipoproteins (VLDLs). VLDLs remove excess cholesterol from
the blood and carry it to the liver for processing.
Maintaining normal levels of cholesterol is essential for the prevention of
disorders that affect the heart and blood vessels (cardiovascular diseases),
including heart attack and stroke.
The ε4 version of the APOE gene (Cys112→ Arg112; Cys158 → Arg158)
increases an individual's risk for developing late-onset Alzheimer disease.
People who inherit one copy of the APOE ε4 allele have an increased chance of
developing the disease; those who inherit two copies of the allele are at even
greater risk. The APOE ε4 allele may also be associated with an earlier onset of
memory loss and other symptoms.
5’TGT3’ → 5’CGT3’
Cys → Arg
The number of genes in a eukaryote varies from
6,000 to 40,000 but does not correlate with the
genome size nor with the complexity of the
organism.
S. cerevisiae → 6,000 genes
S. pombe → 5,000 genes
C. elegans → 18,500 genes
D. melanogaster → 13,600 genes
Arabidopsis thaliana → 25,000 genes
Oryza sativa → 40,000 genes
Mouse → 33,000 genes
Man → 35,000 genes
Mycoplasma genitalium → 470 genes
Archaea → 1,500-2,700 genes
H. influenzae → 1,743 genes
E. coli → 4,288 genes
Nitrogen-fixing bacteria → 6,000 genes
The bacteria with genome sizes below 1.5 Mb
are obligate intracellular parasites. Their
genomes identify the minimum number of
functions required to construct a cell.
The MINIMUM GENE NUMBER
required for any type of organism
increases with its complexity.
Key words: Genome size – Gene Density - Complexity of
the Genome - Complexity of the Organism
Complexity of a genome is
defined as the total length (bp) of
different DNA sequences.
Gene density is the average
number of genes per Mb of
genomic DNA.
There is an inverse correlation
between organism complexity and
gene density; the less complex the
organism, the higher the gene
density.
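The inverse relationship can be illustrated by computing gene density for two genomes quoted in these slides: E. coli (4,288 genes, 4,639,221 bp) and human (about 35,000 genes, 3.4 Gbp). A minimal Python sketch; the function name is illustrative.

```python
def gene_density_per_mb(gene_count, genome_size_bp):
    """Average number of genes per Mb of genomic DNA."""
    return gene_count / (genome_size_bp / 1e6)


e_coli = gene_density_per_mb(4288, 4_639_221)
human = gene_density_per_mb(35_000, 3.4e9)

print(f"E. coli: {e_coli:.0f} genes per Mb")
print(f"Human:   {human:.1f} genes per Mb")
```

With these figures E. coli carries on the order of 900 genes per Mb, while the human genome carries about 10 genes per Mb.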