Yoni Toker - School of Natural Sciences

The Human Genome, and Human Complexity
Yoni Toker
Kolmogorov:
Complexity of an object is the length of the shortest computer program that creates the object
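Kolmogorov complexity cannot be computed exactly, but the size of a compressed file is an upper bound on it (up to a constant, the size of the decompressor). A minimal sketch, assuming Python's `zlib` as the stand-in "program": a highly repetitive DNA string has a short description, while a random one of the same length does not. Both sequences are made up for illustration.

```python
import random
import zlib


def compressed_size(seq: str) -> int:
    """Bytes needed to store `seq` after zlib compression (an upper-bound proxy
    for its Kolmogorov complexity)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


repetitive = "ACGT" * 2500                  # 10,000 bases, a tiny "program" describes it
rng = random.Random(0)                      # fixed seed so the run is reproducible
random_seq = "".join(rng.choice("ACGT") for _ in range(10_000))

print(len(repetitive), compressed_size(repetitive))
print(len(random_seq), compressed_size(random_seq))
```

The repetitive string shrinks to a few dozen bytes; the random one cannot go much below 2 bits per base (about 2,500 bytes).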
Viewpoint
GENE NUMBER: What If There Are Only 30,000 Human Genes?
Jean-Michel Claverie, Science, Vol. 291, no. 5507, pp. 1255-1257 (16 February 2001)
Humans: ~30,000 genes
Worm (Caenorhabditis elegans): ~20,000 genes
Are we not much more complicated than worms?
Mapping of the Human genome
1953
Rosalind Franklin, James Watson and Francis Crick discover the double-helical structure of DNA.
Mid-1980s
Human Genome Project proposed
Objections to the Human Genome Project
•Too hard:
The human genome is 3e+9 base pairs long. A lab (in the 1980s) could sequence 500 base pairs a day:
$\frac{3\times10^{9}\ \text{base pairs}}{500\ \text{base pairs/day} \times 365\ \text{days/year}} \approx 16{,}000\ \text{years}$
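The estimate is easy to check. A minimal Python sketch using only the two figures on the slide (a 3e+9 base pair genome and a 1980s rate of 500 base pairs per day for one lab):

```python
GENOME_BP = 3e9      # human genome length in base pairs (from the slide)
BP_PER_DAY = 500     # 1980s sequencing rate of a single lab (from the slide)
DAYS_PER_YEAR = 365

# Total days of sequencing, converted to years.
years = GENOME_BP / BP_PER_DAY / DAYS_PER_YEAR
print(f"{years:,.0f} years")  # 16,438 years
```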
•Too expensive!
•Not the way to do biology:
Biology means hypothesis-driven experiments, not a fishing expedition
Mapping of the Human genome
1990
Human Genome Project announced. Goal: sequence the entire human genome in 15 years, with a budget of $3 billion
Comparison:
LHC budget: ~$5 billion
Aircraft carrier: ~$10 billion
Mapping of the Human genome
1998
Only 5% of the genome sequenced
Celera: "I will decode the entire human genome in just 3 years, with a budget of only $300 million"
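To put the two plans side by side, a minimal sketch that turns the quoted budgets and schedules into cost per base pair and required throughput. It uses only the numbers on the slides and deliberately ignores that, in practice, each base has to be read several times; `plan_rates` is a hypothetical helper name.

```python
GENOME_BP = 3e9      # human genome length in base pairs (from the slides)
DAYS_PER_YEAR = 365


def plan_rates(budget_dollars: float, years: float) -> tuple[float, float]:
    """Return (dollars per base pair, base pairs per day) for a sequencing plan."""
    dollars_per_bp = budget_dollars / GENOME_BP
    bp_per_day = GENOME_BP / (years * DAYS_PER_YEAR)
    return dollars_per_bp, bp_per_day


plans = {
    "Human Genome Project (1990 plan)": (3e9, 15),
    "Celera (1998 claim)": (300e6, 3),
}

for name, (budget, years) in plans.items():
    cost, rate = plan_rates(budget, years)
    print(f"{name}: ${cost:.2f} per base pair, {rate:,.0f} base pairs per day")
```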
Sequencing small pieces of DNA
[Figure: a primer annealed to a DNA template strand]
F. Sanger et al., Nature 265, 687 (1977).
E. C. Strauss, J. A. Kobori, G. Siu, L. E. Hood, Anal. Biochem. 154, 353 (1986).
Sequencing small pieces of DNA
[Figure: several copies of the primer, each extended by a different number of bases]
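The figures show a primer extended into fragments of different lengths. A minimal simulation, assuming the chain-termination (dideoxy) form of Sanger sequencing with ideal chemistry: every possible stopping point occurs, each of four reactions stops only at one base, and sorting fragments by length reads off the new strand. The template string and all function names are hypothetical, and the template is written in the order the polymerase reads it.

```python
# Watson-Crick pairing used by the polymerase when it copies the template.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """All (length, last_base) fragments made by extending a primer along
    `template`, assuming some copies stop at every position because a
    chain-terminating dideoxy base was incorporated there."""
    new_strand = "".join(COMPLEMENT[b] for b in template)
    return [(i + 1, new_strand[i]) for i in range(len(new_strand))]


def run_lanes(fragments):
    """Four reactions, one per dideoxy base: each lane collects the lengths
    of the fragments that end in that base."""
    lanes = {base: set() for base in "ACGT"}
    for length, base in fragments:
        lanes[base].add(length)
    return lanes


def read_gel(lanes) -> str:
    """Read bands from shortest to longest, noting which lane each band is in."""
    longest = max((max(lengths) for lengths in lanes.values() if lengths), default=0)
    return "".join(
        base
        for length in range(1, longest + 1)
        for base, lengths in lanes.items()
        if length in lengths
    )


template = "TACGGATC"  # hypothetical template
print(read_gel(run_lanes(terminated_fragments(template))))  # ATGCCTAG
```

The read is the complement of the template, which is why a known primer site is all that is needed to start.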
Sequencing Large DNAs
The whole-genome shotgun method
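In the shotgun approach the genome is broken into many random pieces, each piece is sequenced, and the genome is reassembled from overlaps between the reads. The sketch below uses a simple greedy overlap merge, a textbook strategy swapped in for illustration rather than the assembler actually used; real assemblers must also cope with sequencing errors and repeats. The reads, the genome and the minimum overlap of 3 are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len`), or 0 if there is none."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no overlaps left: several separate contigs remain
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


genome = "ATGGCGTACCTTAGGAC"                            # hypothetical genome
reads = ["ATGGCGTA", "CGTACCTT", "CCTTAGGA", "TAGGAC"]  # overlapping random pieces
print(greedy_assemble(reads))  # ['ATGGCGTACCTTAGGAC']
```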
Fierce competition...
...comes to a draw
June 26, 2000: President Clinton, with J. Craig Venter (left) and Francis Collins, announces completion of "the first survey of the entire human genome."
Technology is getting better:
Solexa sequencing
Technology is getting better!
[Figure: size of the largest sequencing and synthesis projects (bp, log scale) versus year of publication, 1960-2000]
Oligonucleotide Synthesis
• 1) De-Blocking: the DMT (dimethoxytrityl) protecting group is removed with dichloroacetic acid (DCA) or trichloroacetic acid in dichloromethane (DCM)
• 2) Base Condensation
• 3) Capping
• 4) Oxidation
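Each pass through the four steps adds one base, and a chain that fails to couple is capped so it never grows again. That is why synthesized DNA stays short: the full-length fraction falls geometrically with length. A minimal sketch, assuming the first nucleotide is already attached to the support (so an n-mer needs n - 1 couplings) and using illustrative coupling efficiencies that are not from the slides:

```python
def full_length_fraction(coupling_efficiency: float, length: int) -> float:
    """Fraction of chains that reach `length` bases, if every coupling
    succeeds with probability `coupling_efficiency` and failed chains
    are capped (never extended again)."""
    return coupling_efficiency ** (length - 1)


for efficiency in (0.98, 0.99, 0.995):      # illustrative values
    for length in (20, 100, 200):
        fraction = full_length_fraction(efficiency, length)
        print(f"efficiency {efficiency}, {length}-mer: {fraction:.3f} full length")
```

At 99% per step, only about 37% of the chains in a 100-mer synthesis are full length.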
DNA Synthesis
Genetic Code
4 bases
20 amino acids
Every 3 bases (a codon) code for one amino acid
Example: CCG → Proline
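Why triplets: pairs of 4 bases give only 4^2 = 16 combinations, too few for 20 amino acids, while triplets give 4^3 = 64 codons, so most amino acids have several codons and three codons mean "stop". A minimal sketch of reading a DNA coding strand codon by codon with the standard genetic code (one-letter amino acid codes, `*` for stop); the example sequence is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Read the coding strand three bases at a time until a stop codon."""
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(CODON_TABLE), len(set(CODON_TABLE.values())))  # 64 21
print(CODON_TABLE["CCG"])                                # P (proline)
print(translate("ATGCCGTGGTAA"))                         # MPW
```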
From DNA to Proteins
Some of the things we learned
•The human genome contains 3e+9 base pairs
•Less than 2% of the genome codes for proteins
•Average gene length: 3,000 base pairs
•Number of genes: ~30,000
•98% of genes are identical between all people: only 1-2% of genes are responsible for eye color, genetic diseases…
Genome Size
Species | Size of genome (base pairs) | Number of genes
Human | 2900e+6 | 30,000
Fruit fly (Drosophila melanogaster) | 120e+6 | 13,601
Baker's yeast (Saccharomyces cerevisiae) | 12e+6 | 6,275
Worm (Caenorhabditis elegans) | 97e+6 | 19,000
E. coli | 4.1e+6 | 4,800
Arabidopsis (Arabidopsis thaliana) | 125e+6 | 25,000
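Dividing the two columns shows how loosely genome size tracks gene number. A minimal sketch computing genes per million base pairs from the table values, taking the yeast entry as 6,275 genes; the dictionary layout is just one convenient encoding.

```python
# species: (genome size in millions of base pairs, number of genes), from the table
GENOMES = {
    "Human": (2900, 30_000),
    "Fruit fly": (120, 13_601),
    "Baker's yeast": (12, 6_275),
    "Worm": (97, 19_000),
    "E. coli": (4.1, 4_800),
    "Arabidopsis": (125, 25_000),
}

density = {name: genes / mbp for name, (mbp, genes) in GENOMES.items()}

for name, genes_per_mbp in sorted(density.items(), key=lambda item: item[1]):
    print(f"{name:15s} {genes_per_mbp:8.1f} genes per million base pairs")
```

Humans come out lowest (about 10 genes per million base pairs) and E. coli highest (about 1,170).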
Viewpoint
GENE NUMBER:
What If There Are Only 30,000 Human Genes?
Jean-Michel Claverie
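•Are we really more complicated than flies and worms?
• 30,000 is much more complicated than 20,000
• Gene number isn’t everything
30,000 is much more complicated than 20,000: if each gene can be either on or off, N genes give $2^{N}$ possible states, so
$2^{30{,}000} / 2^{20{,}000} = 2^{10{,}000} \approx 10^{3000}$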
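The on/off counting argument can be checked with exact integer arithmetic. A minimal sketch, under the same simplifying assumption that every gene is simply on or off:

```python
import math

HUMAN_GENES = 30_000
WORM_GENES = 20_000

# Assumption: each gene is on or off, so N genes have 2**N expression states.
ratio = 2 ** HUMAN_GENES // 2 ** WORM_GENES          # exactly 2**10_000
digits = len(str(ratio))                             # decimal digits in the ratio
log10_ratio = (HUMAN_GENES - WORM_GENES) * math.log10(2)

print(digits)               # 3011
print(f"{log10_ratio:.1f}")  # 3010.3
```

So the ratio is about 10^3010, which the slide rounds to 10^3000.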
Gene Number isn’t everything
mRNA
30,000 genes, but more than 85,000 mRNA species
Alternative splicing
mRNA editing
Vertebrate Immune System
[Figure: gene sites and antibody]
Complexity comes from more sophisticated regulation mechanisms!
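One way a single gene yields several mRNAs is alternative splicing. A minimal sketch, assuming the simplest "cassette exon" model (first and last exons always kept, each middle exon independently kept or skipped, order preserved); real splicing also uses alternative splice sites and mutually exclusive exons. The exon names are hypothetical.

```python
from itertools import combinations


def isoforms(first: str, optional: list[str], last: str) -> list[list[str]]:
    """Every mRNA obtained by including any subset of the optional exons,
    keeping their original order."""
    result = []
    for k in range(len(optional) + 1):
        for chosen in combinations(optional, k):  # combinations preserve order
            result.append([first, *chosen, last])
    return result


variants = isoforms("E1", ["E2", "E3", "E4"], "E5")  # hypothetical exons
for mrna in variants:
    print("-".join(mrna))
print(len(variants))  # 8
```

With k optional exons one gene can give up to 2^k mRNAs, so a modest number of genes can produce many more mRNA species.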
More sophisticated methods of gene expression and regulation
mRNA
mRNA editing
…
Proteins change their function:
•Number of sugars attached
•Folding/Unfolding
•…
Genetic Networks
Claverie:
Every gene is connected, on average, to 4-5 other genes.
We are not much more complicated than an airplane!
But: genetic networks follow a power-law distribution
[Figure: distribution of the number of connections per gene]
The average is not very meaningful!
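Why the average misleads: two networks can share the same mean number of connections yet look nothing alike. A minimal sketch, assuming a power law P(k) ∝ k^-2.5 for degrees 2 to 10,000 (exponent and range are hypothetical, chosen so the mean lands near 4-5) and comparing it with a Poisson distribution of the same mean, the degree distribution of a purely random network:

```python
import math

K_MIN, K_MAX = 2, 10_000   # hypothetical range of connections per gene
GAMMA = 2.5                # hypothetical power-law exponent

degrees = range(K_MIN, K_MAX + 1)
weights = [k ** -GAMMA for k in degrees]
total = sum(weights)
pmf = {k: w / total for k, w in zip(degrees, weights)}  # normalized P(k)

mean = sum(k * p for k, p in pmf.items())
power_tail = sum(p for k, p in pmf.items() if k >= 100)  # P(k >= 100)


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability, computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


poisson_tail = sum(poisson_pmf(k, mean) for k in range(100, K_MAX + 1))

print(f"mean connections         {mean:.2f}")
print(f"P(k >= 100), power law   {power_tail:.2e}")
print(f"P(k >= 100), Poisson     {poisson_tail:.2e}")
```

Both have a mean near 4.7, but in the power-law network roughly 1 gene in 500 has 100 or more connections, while in the random network such hubs essentially never occur.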
Summary
Human Genome Project
•Decoding the “parts list” of humans
•Extraordinary technological advances
Complexity: the genome is just the beginning
Aim High!
Dream On!
Aim High Dream On!
•Sequence more and more organisms
•Find the genes for genetic diseases
•Reconstruct the tree of life
•Learn more of nature’s tricks
•Creation of synthetic life
• DNA nanotechnology
• Producing clean energy, sequestering CO2…