HUMAN GENOME PROJECT 101
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
Human Genome Project
Begun in 1990, the U.S. Human Genome Project is a 13-year effort coordinated by
the U.S. Department of Energy and the National Institutes of Health. The
project originally was planned to last 15 years, but effective resource and
technological advances have accelerated the expected completion date to
2003.
HGP goals are to:
■ identify all the approximately 35,000* genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs
that make up human DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may
arise from the project.
Human Genome Data
• Derived from the Human Genome Project
• Sequence freeze date in anticipation of data release: 22 July 2000
• Release of First Draft Sequence of Human Genome:
  Nature 409 (6822), 15 February 2001
  Science 291 (5507), 16 February 2001
• Release of “Complete” Draft Sequence of Human Genome: April 2003
Fine Structure of Human Genomic DNA
[Figure: a stretch of raw sequence annotated with two genes (exons and introns), an intragenic region, tandem repeats, and interspersed repeats]
The Human Genome
3.2 billion nucleotides
How many genes?
< 40,000
> 100,000
But think of all our
traits, Jim-bo!
Ours?! Are you of
my species?
Get lost, punk!
Ouch!
The Human Genome
ACGTTGTGTCGCTGATTAGCTAGACCAAGATAG
TTCGCTATAGGCTATAGCGATATAACCCAGGGG
GGATACGCWHENISAGENEAGENETATTAGGAG
GAGAGATATAGGATAGATTACATGTGATATATA
GGAGAGAGAATATATAAGAGAGAGAGAGATTTT
TTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGC
Experimental Discovery (Genetics)
Comparative Genomics (Alignment)
Gene Prediction
Alignment
CTCGCTGACTCAATCGGATTATGCTAGTCG
TCATATGACTCAATCGGATTATGCTAGTCG
TGACTCAATCGGATTATGCTAGTCG
ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG
ATTTTTTTGACTCAATCGGATTA
CGGGGTGACTCAATCGGA
GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG
AAAAATATATTGACTCAATCGGATTATGCTAGTCG
GTCGTAGCTTGACTCAATCGGATTATGCTAGTCG
CTCGCTGACTCAATCGGATTATGCTAGTCG
TCATATGACTCAATCGGATTATGCTAGTCG
TGACTCAATCGGATTATGCTAGTCG
ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG
ATTTTTTTGACTCAATCGGATTA
CGGGGTGACTCAATCGGA
GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG
AAAAATATATTGACTCAATCGGATTATGCTAGTCG
GTCGTAGCTTGACGGAATCGGATTATGCTAGTCG
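Most of the stacked sequences on the Alignment slide share a common core. As a rough illustration of how such a stack can be lined up, the sketch below pads each sequence from the first stack so that one exact shared substring falls in the same column. The seed `TGACTCAATCGGA` was picked by eye from the slide, and the name `stack_on_seed` is hypothetical. Real alignment tools score mismatches and gaps instead of requiring an exact seed, so this is a simplification and not the method of any genome project.

```python
# Sequences copied from the first stack on the Alignment slide.
READS = [
    "CTCGCTGACTCAATCGGATTATGCTAGTCG",
    "TCATATGACTCAATCGGATTATGCTAGTCG",
    "TGACTCAATCGGATTATGCTAGTCG",
    "ATAGCCTAATAGCTGACTCAATCGGATTATGCTAGTCG",
    "ATTTTTTTGACTCAATCGGATTA",
    "CGGGGTGACTCAATCGGA",
    "GCCCCCCCCCCCCTGAGTCAGGGGGGCTCGCTGCTGTGCTG",
    "AAAAATATATTGACTCAATCGGATTATGCTAGTCG",
    "GTCGTAGCTTGACTCAATCGGATTATGCTAGTCG",
]

# Assumption: seed chosen by eye, not taken from the slides.
SEED = "TGACTCAATCGGA"


def stack_on_seed(reads, seed):
    """Pad each read so the first exact occurrence of `seed` shares one column.

    Returns (rows, unanchored): the padded reads that contain the seed, and
    the reads in which the seed was not found.
    """
    hits = [(read, read.find(seed)) for read in reads]
    anchored = [(read, pos) for read, pos in hits if pos >= 0]
    unanchored = [read for read, pos in hits if pos < 0]
    if not anchored:
        return [], unanchored
    column = max(pos for _, pos in anchored)
    rows = [" " * (column - pos) + read for read, pos in anchored]
    return rows, unanchored


rows, unanchored = stack_on_seed(READS, SEED)
for row in rows:
    print(row)
print("no exact seed match:", unanchored)
```

One sequence has no exact copy of the seed and is reported separately. Handling that kind of near match is what a scoring-based aligner adds over this sketch.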
Gene Prediction
TTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATACGCTATTAGGAG
GAGAGAATATAAAGGATAGATTACATGTGATATATGGAGAGAGAATATAT
AAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTATGGATTGC
GCTTCGCTATAGGCTATAGCGATATAACCCAGGGGGGATACGCTATTAGG
AGGAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATAT
AAGAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCG
CTTCGCTATAGGCTATGCGATATAACCCAGGGGGGATACGCTATTAGGAG
GAGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAA
GAGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCT
TCGCTATAGGCTATAGCGATATGACCCAGGGGGGATACGCTATTAGGAGG
AGAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAATATATAAG
AGAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGCTT
CGCTATAGGCTATAGCGATATAACCCAGGGGGGATATGATATTAGGAGGA
GAGATATAGGATAGATTACATGTGATATATAGGAGAGAGAAATAATATAA
GAGAGAGAGATTTTTTCTCCTGGTAAAAAGCTCGCTTAGGATTGCGC
[Figure: the sequence above annotated with gene, exon, intron, intragenic region, tandem repeat, and interspersed repeat labels]
Gene Prediction Algorithms
Based on consensus nucleotide sequences of:
• TATA boxes and start codons
• stop codons
• splice junctions
• CpG islands
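The signals listed above can be searched for directly in a sequence. The sketch below is a naive scan using textbook consensus patterns (TATA box as TATA[A/T]A[A/T], ATG start, the three stop codons, GT and AG at intron ends, and the CG dinucleotide). The function names and the toy sequence are made up for illustration. Actual gene predictors combine such signals statistically, so exact matching like this would flag far too many sites on real data.

```python
import re

# Textbook consensus patterns; real predictors weigh these signals
# statistically instead of matching them exactly.
SIGNALS = {
    "TATA box": "TATA[AT]A[AT]",
    "start codon": "ATG",
    "stop codon": "TAA|TAG|TGA",
    "splice donor (intron start)": "GT",
    "splice acceptor (intron end)": "AG",
    "CpG dinucleotide": "CG",
}


def scan_signals(seq):
    """Return {signal name: [0-based start positions]}, overlaps included."""
    seq = seq.upper()
    return {
        # The lookahead makes each match zero-width, so overlapping
        # occurrences are all reported.
        name: [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)]
        for name, pattern in SIGNALS.items()
    }


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


toy = "TATAAATGCGTAA"  # made-up sequence, not from the slides
for name, positions in scan_signals(toy).items():
    print(f"{name:30s} {positions}")
```

The observed/expected ratio is one common ingredient of CpG-island definitions. Thresholds and window sizes vary between tools and are left out here.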
Comparative Gross Results from Model Genome Projects
Humans have about 35,000 genes!
You were right.
So what’s new!
Human Genes
Surprising Findings = !!
• !! Only 35,000 genes
• most genes in euchromatin
• GC/AT patchiness
• !! Gene density higher & intron size smaller in GC-rich patches
• !! 1.4% translated, 28% transcribed
• !! Origins of genes
Some Origins of Human Genes
• Most from distant evolutionary past (basic metabolism, transcription, translation, and replication fixed since appearance of bacteria and yeast)
• Only 94 of 1,278 families vertebrate-specific
• 740 are nonprotein-encoding RNA genes
• Many derive from partial genomes of viruses and virus-like elements (genomic fossils)
• Some acquired directly from bacteria (rather than by evolution from bacteria)
Genomic Fossils (also known as Molecular Fossils)
• interspersed repeats
• generated by integration of transposable elements or retrotransposable RNAs
• active contemporary modifier of some vertebrate genomes (mouse)
• formerly active modifier of human genome
• some as prevalent as 1.5 million copies
Alu Elements
Type of Short Interspersed Nuclear Element (SINE)
[Figure: structure of an Alu element, drawn 5’ to 3’. Labels: direct repeats; RNA polymerase III promoter (A and B); A-rich region; 31 bp; A/T-rich region and 3’-UTR; AAAn; 50-300 bp]
• transcribed by RNA polymerase III
• 3’ oligo dA-rich tail
• found only in primates
• 1,500,000 copies
• derived from 7SL RNA gene
• dimer-like structure
• most retroposition occurred 40 mya
Reverse Transcription
Essential for Retroposition and Proliferation of Retroelements
• Converts primed RNAs into cDNAs
• Catalyzed by RNA-dependent DNA polymerase (reverse transcriptase)
• Polymerase encoded by retroviruses and active LINEs
[Figure labels: Retroviral genomic RNA; Alu RNA; LINE RNA]
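To make the RNA-to-cDNA conversion concrete, the sketch below applies the base-pairing rules (A pairs with T, U with A, G with C, C with G) and reverses the result, because the copied strand runs antiparallel to its template. The function name and the example transcript are invented for illustration, and the enzymology (priming, template switching, second-strand synthesis) is not modeled.

```python
# Base-pairing rules for copying RNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna):
    """Return first-strand cDNA for an RNA template, both written 5' to 3'."""
    rna = rna.upper()
    # Complement each base while walking the template backwards:
    # the cDNA strand is antiparallel to the RNA.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


# Made-up transcript ending in an A-rich tail, like the 3' end of Alu RNA.
print(first_strand_cdna("GGCUAAAA"))  # TTTTAGCC
```

The A-rich 3’ end of the template becomes a T-rich 5’ end in the cDNA.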
Alu Elements as Genomic Fossils
Alu Subfamily Structure (millions of years)
• Oldest [J] (65): Jo, Jb
• Intermediate [S] (50), 450,000 copies: Sg, Sc, Sx, Sq, Sp
• Youngest [Y] (25), 50,000 copies: Yb8, Ya5, Ya8
Alu Subfamily Structure
• PS [J]: Primate-specific. Abundant in all primates. 65-70 mya: early prosimian (Strepsirhini)
• AS [S]: Anthropoid-specific (Haplorhini). 50-60 mya. Differs from PS by one mutation.
• CS [S]: Catarrhine-specific. Nine mutations arising 30-40 mya. Platyrrhines (FN) (marmoset); Catarrhine (DFN) (macaque)
• HS [Y]: Human-specific. Five or more additional mutations. 20-25 mya: almost exclusively hominids
Master Gene Model of Retroposition
P. Deininger, M. Batzer, Trends in Genetics 8:307, 1992
1. Amplification
2. Master mutation
[Figure: master gene diagram with 5’ and 3’ ends marked; axis: TIME (m.y.)]
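A toy simulation can show why the two steps named above would produce nested subfamilies. In the sketch below, a single master element makes copies each round, the master occasionally gains a new diagnostic mutation, and every later copy inherits all mutations the master carried at that point. The function name, the round counts, and the mutation labels `m1`, `m2` are all invented for illustration. This is a cartoon of the idea on the slide, not a reproduction of the analysis in the cited paper.

```python
from collections import Counter


def simulate_master_gene(rounds, copies_per_round, mutation_rounds):
    """Return the list of copies made by a single master element.

    Each copy is recorded as the frozenset of diagnostic mutations the
    master carried when the copy was made. Copies never copy themselves.
    """
    master = frozenset()
    copies = []
    for rnd in range(rounds):
        if rnd in mutation_rounds:
            # 2. Master mutation: the source element gains a new change.
            master = master | {f"m{len(master) + 1}"}
        # 1. Amplification: new copies inherit the master's current state.
        copies.extend([master] * copies_per_round)
    return copies


copies = simulate_master_gene(rounds=6, copies_per_round=2, mutation_rounds={2, 4})
for mutations, count in Counter(copies).items():
    print(sorted(mutations), count)
```

Each group of copies carries every mutation of the group before it plus one more, which mirrors the oldest-to-youngest ordering of subfamilies.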
ALU INSERTIONS AND DISEASE
LOCUS | DISTRIBUTION | SUBFAMILY | DISEASE | REFERENCE
BRCA2 | de novo | Y | Breast cancer | Miki et al, 1996
Mlvi-2 | de novo (somatic?) | Ya5 | Associated with leukemia | Economou-Pachnis and Tsichlis, 1985
NF1 | de novo | Ya5 | Neurofibromatosis | Wallace et al, 1991
APC | Familial | Yb8 | Hereditary desmoid disease | Halling et al, 1997
PROGINS | about 50% | Ya5 | Linked with ovarian carcinoma | Rowe et al, 1995
Btk | Familial | Y | X-linked agammaglobulinaemia | Lester et al, 1997
IL2RG | Familial | Ya5 | XSCID | Lester et al, 1997
Cholinesterase | one Japanese family | Yb8 | Cholinesterase deficiency | Muratani et al, 1991
CaR | familial | Ya4 | Hypocalciuric hypercalcemia and neonatal severe hyperparathyroidism | Janicic et al, 1995
C1 inhibitor | de novo | Y | Complement deficiency | Stoppa Lyonnet et al, 1990
ACE | about 50% | Ya5 | Linked with protection from heart disease | Cambien et al, 1992
Factor IX | a grandparent | Ya5 | Hemophilia | Vidaud et al, 1993
2 x FGFR2 | De novo | Ya5 | Apert’s Syndrome | Oldridge et al, 1997
GK | ? | Sx | Glycerol kinase deficiency | McCabe et al, (personal comm.)
What’s New About Old Fossils?
In the Human Genome
• Comprise nearly 50% of genome
• 50% more Alu elements than were predicted by molecular biology
• scarce in highly-regulated regions (detrimental?)
• enriched in GC regions (beneficial?)
• little activity, but little scouring
• occur frequently within exons
• contribute to formation of genes encoding novel proteins
The Human Genome
FEATURES
3.2 billion bases
28% transcribed
<1.4% encodes protein
50% repeats
not many modern protein families
Only ~35,000 genes!
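The percentages above are easier to judge as absolute numbers. The short calculation below converts them to base counts using only the figures on this slide. The per-gene value is a plain average over the slide's 35,000-gene estimate and says nothing about how coding sequence is actually distributed among genes.

```python
GENOME_BASES = 3_200_000_000  # 3.2 billion bases
GENES = 35_000                # slide estimate

# Integer arithmetic (per-mille fractions) avoids floating-point rounding.
transcribed = GENOME_BASES * 280 // 1000  # 28%
coding = GENOME_BASES * 14 // 1000        # 1.4%, which the slide gives as an upper bound
repeats = GENOME_BASES * 500 // 1000      # 50%

print(f"transcribed:    {transcribed:,} bases")
print(f"protein-coding: {coding:,} bases")
print(f"repeats:        {repeats:,} bases")
print(f"coding bases per gene (average): {coding // GENES:,}")
print(f"genes per million bases: {GENES / (GENOME_BASES / 1_000_000):.1f}")
```

On these figures, fewer than 45 million of the 3.2 billion bases encode protein, while repeats account for 1.6 billion.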
Humans have about 35,000 genes!
Well, then…
How can you explain human complexity?