Goals of the Human Genome Project
• determine the entire sequence of human DNA
• identify all the genes in human DNA
• store this information in databases
• improve tools for data analysis
• transfer related technologies to the private sector
• address the ethical, legal, and social issues (ELSI) that may arise from the project.
Sequencing a genome
Obtain Genomic DNA Sample
Sequence genomic DNA
Assemble sequences in order
Annotate sequence
Sanger Sequencing
Chemical reaction that includes:
DNA polymerase
DNA primer
Nucleotide bases (A, T, G, C)
Nucleotide bases that are ‘labeled’
Addition of labeled bases stops reaction.
Repeated many times.
DNA separated by size using a gel and an electric current
[Figure: gel electrophoresis. The sequenced sample is put in a well at the negative (−) end. DNA moves towards the positive (+) charge, and short DNA moves faster.]
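The way a sequence is read off the gel can be illustrated with a toy model. This sketch assumes an idealized reaction that produces exactly one terminated fragment per position, each ending in its labeled base; the function names are hypothetical and are not part of any real sequencing software.

```python
import random


def terminated_fragments(new_strand):
    """Idealized reaction: one fragment per position, as (length, labeled end base)."""
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_gel(fragments):
    """Short fragments travel furthest, so reading from shortest to longest
    gives the labeled end bases in sequence order."""
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)


fragments = terminated_fragments("GATTACA")
random.shuffle(fragments)   # fragments are mixed together in the reaction tube
print(read_gel(fragments))  # size separation restores the order
```

Sorting by length stands in for the gel: the order of fragment sizes is the only information needed to recover the order of the bases.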
How do we sequence a genome?
For the HGP, two approaches were used:
1. Hierarchical sequencing
2. Shotgun sequencing
How do we put the sequences together in the right order?
Genome assembly - based on finding regions of overlap between individual sequencing fragments
CCCATTAGATGCGATGGGTTAAAA
GGTTAAAAATCGATCCCATTTTACG
Very, very difficult problem for complex genomes!!
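The overlap idea can be sketched for the two fragments shown above. This is a minimal illustration that assumes error-free reads and a single pair of fragments; the function names and the `min_len` threshold are hypothetical, and real assemblers must also cope with sequencing errors, repeats, and millions of reads.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right, min_len=3):
    """Join two fragments on their overlap; return None if they do not overlap."""
    size = longest_overlap(left, right, min_len)
    if size == 0:
        return None
    return left + right[size:]


left = "CCCATTAGATGCGATGGGTTAAAA"
right = "GGTTAAAAATCGATCCCATTTTACG"
print(longest_overlap(left, right))  # the shared region is GGTTAAAA
print(merge(left, right))
```

Repetitive DNA is what makes the problem hard: when the same stretch occurs in many places, an overlap no longer identifies a unique neighbor.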
Genome Annotation
Annotation – identifying what part of DNA corresponds to genes, etc.
Compare to known genes:
• Gene already described and sequenced
• Expressed Sequence Tags (EST), essentially randomly sequenced mRNA
Predict genes:
• Computer predictions
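One very simple form of computer prediction is scanning for open reading frames (ORFs): a start codon followed, in the same reading frame, by a stop codon. The sketch below is an illustration only, not the method used for the human genome. It assumes a single forward strand with no introns, and the function name and `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ATG...stop stretches on the forward strand."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return orfs


dna = "CCATGAAACCCGGGTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Human genes are interrupted by introns, so practical gene predictors combine signals like this with statistical models and with the comparisons to known genes and ESTs listed above.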
Genome made of two types of DNA
• Euchromatic DNA
– Comprises 93% of your DNA
– Contains most of the genes in your genome
– 99% has been sequenced
• Heterochromatic DNA
– Comprises ~7% of your DNA
– Highly repetitive
– Some parts are structural: contains centromeres and telomeres
– Gene sparse
– Very difficult to sequence, largely unexplored.
Euchromatic DNA
• 2.8 Billion base pairs
• ~30,000 genes
– Far fewer than expected; initial guesses were ~100,000 genes
– 50% have unknown function
– Genes make up less than 2% of the total genome
• 98% “junk” DNA
– Does not code for genes
– Function is unknown, but potentially very important!
– Many (~50%) repeated sequences (e.g. AGAGAGAGAGAG) and transposable elements
What does the draft human genome sequence tell us?
How the genome is arranged
• Genes occur in gene-dense “jungles” and gene-poor “deserts”.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (~231).
HapMap
An NIH program to map genetic variation within the human genome
• Begun in 2002
• Construct a map of the patterns of variation that occur across human populations.
• Facilitate the discovery of genes involved in complex human traits and diseases.
Evolutionary Genomics - comparing genomes of different species to learn about genome evolution and function

Organism                              Genome size (bases)   Estimated genes
Human (Homo sapiens)                  3 billion             30,000
Laboratory mouse (M. musculus)        2.6 billion           30,000
Mustard weed (A. thaliana)            100 million           25,000
Roundworm (C. elegans)                97 million            19,000
Fruit fly (D. melanogaster)           137 million           13,000
Yeast (S. cerevisiae)                 12.1 million          6,000
Bacterium (E. coli)                   4.6 million           3,200
Human immunodeficiency virus (HIV)    9,700                 9

Gene number does not directly scale with complexity of organism!
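The point can be made concrete by computing gene density from the figures in the table. The sketch below uses only those approximate slide values; the dictionary and function names are hypothetical.

```python
# (genome size in bases, estimated genes), as listed in the table
GENOMES = {
    "Human": (3_000_000_000, 30_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def genes_per_megabase(size_bases, genes):
    """Estimated genes per million bases of genome."""
    return genes / (size_bases / 1_000_000)


for organism, (size, genes) in GENOMES.items():
    print(f"{organism:22s} {genes_per_megabase(size, genes):8.1f} genes per million bases")
```

By this measure the human genome is the least gene-dense in the table, consistent with most of it being noncoding DNA.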
What do evolutionary comparisons tell us?
How the Human Compares with Other Organisms
• Humans have 3X as many kinds of proteins as the fly or worm, because of mRNA transcript "alternative splicing" and chemical modifications to the proteins.
• These processes can yield different protein products from the same gene.
• Large portions of non-genic DNA are highly conserved, suggesting they serve some function.