HGP, Fragment Assembly, and Physical Mapping
蔡懷寬
E-mail: [email protected]
Human Genome Project (HGP)
Human Genome
Genome
• All the genetic material in the chromosomes of a particular organism
• Its size is generally given as its total number of base pairs.
Genome sizes
• Human: 3000 million bases
• Mouse: 3000 million bases
• Drosophila (fruit fly): 165 million bases
• Nematode (roundworm): 100 million bases
• Yeast (fungus): 14 million bases
• E. coli (bacterium): 4.67 million bases
Human Genome Project
Abbreviated as HGP (Human Genome Project)
Its main goals are to:
• identify all the genes in human DNA
• determine the sequences of the 3 billion chemical bases that make up human DNA
• store this information in databases
• develop tools for data analysis
• transfer related technologies to the private sector
• address the ethical, legal, and social issues (ELSI) that may arise from the project
Human Genome Project (HGP)
Who did the work?
The work began formally in 1990 and was carried out in 16 centers across the
world. The project was originally planned to last 15 years, but rapid
technological advances brought the expected completion date forward to 2003.
The international Human Genome Mapping Consortium
includes researchers in France, Germany, Japan, China, Great
Britain, Canada, and the US.
Celera Genomics (www.celera.com)
Human Genome Project (HGP)
Whose genome was sequenced?
Celera sequenced the genomes of five anonymous individuals: one African-American, one Asian-Chinese, one Hispanic-Mexican, and two Caucasians. One individual's genome was sequenced 3.5 times; about half the genome of each of the remaining individuals was sequenced.
The HGP study is based on data collected from many individuals around the world over a longer period of time, so it is more difficult to estimate the exact size of the HGP pool (though it is significantly more than Celera's five).
History and Progress of the HGP (continued)
February 2001:
• Initial sequencing and analysis of the human genome (Nature, Vol. 409, 15 Feb. 2001, by the International Human Genome Sequencing Consortium)
• The sequence of the human genome (Science, Vol. 291, 16 Feb. 2001, by J. C. Venter, et al.)
Publication of the Draft Human Genome Sequence
February 12, 2001
Science, 16 February 2001, Vol. 291, No. 5507, pages 1145-1434
Nature, 15 February 2001, Vol. 409, pages 813-960
Strategies For Sequencing the Human Genome
The HGP has primarily relied on a "map-based approach": sequencing an overlapping series of large chunks of human DNA cloned into bacteria. These sequences, represented by overlapping bacterial clones, are then assembled by computer software using knowledge of each clone's position on the map. In this way, the HGP has read each letter of DNA an average of five times.
Celera's genome sequencing approach relied on "shotgun sequencing", a method in which small bits of the genome are sequenced and assembled by computers into intermediary "scaffolds" and, ultimately, whole chromosomes and the genome.
How to Sequence a Genome by Mapped Clones
(clone & clone & clone)
Sequencing Strategies (1)
• Map-Based Assembly:
  • Create a detailed, complete fragment map
  • Time-consuming and expensive
  • Provides a scaffold for assembly
  • Original strategy of the Human Genome Project
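The way a fragment map can serve as a scaffold for assembly can be sketched in a few lines. This is a minimal illustration, assuming each clone's sequence and its start offset on the map are already known; the function name `assemble_from_map` and the toy clones are hypothetical and are not taken from any HGP software.

```python
def assemble_from_map(clones):
    """Merge clone sequences using their known start offsets on the map.

    clones: list of (offset, sequence) pairs, where offset is the 0-based
    position of the clone's first base on the final sequence (assumed known
    from the physical map). Overlapping bases must agree.
    """
    length = max(offset + len(seq) for offset, seq in clones)
    consensus = [None] * length
    for offset, seq in sorted(clones):
        for i, base in enumerate(seq):
            pos = offset + i
            if consensus[pos] is not None and consensus[pos] != base:
                raise ValueError(f"clones disagree at position {pos}")
            consensus[pos] = base
    if None in consensus:
        raise ValueError("map has a gap: some positions are not covered")
    return "".join(consensus)


# Toy example: three overlapping clones with hypothetical map offsets.
toy_clones = [(0, "ATGCGT"), (4, "GTACCA"), (8, "CATTGA")]
assembled = assemble_from_map(toy_clones)
print(assembled)
```

Because the map fixes where each clone belongs, no overlap search is needed. That is the benefit the map buys, at the cost of building the map first.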
Interactive Presentation: How to Sequence a Genome by Mapped Clones
Introduction
1. Mapping (1 min 16 s)
2. Building Libraries (28 s)
3. Subclones (35 s)
4. E. coli to store and copy DNA (54 s)
5. Preparing DNA for sequencing (16 s)
6. Sequencing Reaction (1 min 44 s)
7. Products of Sequencing Reaction (40 s)
8. Separating the Sequencing Reaction (30 s)
9. Reading the Sequencing Reaction (28 s)
10. Assembling the Results (55 s)
11. Working Draft Sequence
12. Conclusion
How to Sequence a Genome by Shotgun Sequencing
Sequencing Strategies (2)
• Shotgun:
  • Quick, highly redundant: requires 7-9X coverage for sequencing reads of 500-750 bp. This means that for the human genome of 3 billion bp, 21-27 billion bases need to be sequenced to provide adequate fragment overlap.
  • Computationally intensive
  • Troubles with repetitive DNA
  • Original strategy of Celera Genomics
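The coverage arithmetic on this slide can be checked with a short sketch. The function names are hypothetical. The uncovered-fraction estimate uses the idealized Lander-Waterman assumption that reads land uniformly at random, which the slide does not state and which real genomes (with repeats and cloning bias) only approximate.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # human genome size quoted on the slide


def bases_needed(coverage, genome_size=GENOME_SIZE_BP):
    """Total bases to sequence for a given fold coverage."""
    return coverage * genome_size


def reads_needed(coverage, read_length, genome_size=GENOME_SIZE_BP):
    """Number of reads of a fixed length needed to reach that coverage."""
    total = bases_needed(coverage, genome_size)
    return (total + read_length - 1) // read_length  # ceiling division


def expected_uncovered_fraction(coverage):
    """Idealized estimate (assumption): fraction of bases hit by no read."""
    return math.exp(-coverage)


for cov in (7, 9):
    print(f"{cov}X coverage: {bases_needed(cov):,} bases to sequence, "
          f"uncovered fraction ~{expected_uncovered_fraction(cov):.1e}")
```

At 7X and 9X this reproduces the 21 and 27 billion bases quoted above. With 500-750 bp reads, that corresponds to tens of millions of individual reads.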
Shotgun Sequencing: Assembly of
Random Sequence Fragments
• To sequence a Bacterial Artificial Chromosome (BAC, 100-300 kb),
millions of copies are sheared randomly, inserted into plasmids,
and then sequenced. If enough fragments are sequenced, the BAC
can be reconstructed from the overlaps between fragments.
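Reconstruction from overlapping fragments can be illustrated with a toy greedy assembler: repeatedly merge the two fragments with the longest suffix-prefix overlap. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names are hypothetical, and it is not the algorithm used by Celera or the HGP.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Greedily merge fragments by largest overlap; returns remaining contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the fragments stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Toy reads sheared from one short sequence, given in scrambled order.
reads = ["CCATTGA", "ATGCGTAC", "GTACCATT"]
contigs = greedy_assemble(reads)
print(contigs)
```

With repeated sequence, several merges can tie or mislead the greedy choice. This is one way to see why repetitive DNA is troublesome for shotgun assembly.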
What Does the Draft Human Genome Sequence Tell Us?
By the Numbers
The Wheat from the Chaff
How It's Arranged
How the Human Compares with Other Organisms
Variations and Mutations
By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000 to 35,000, much lower than previous estimates of 80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a composite of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
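A quick back-of-the-envelope check puts these figures in proportion. It uses only the numbers on this slide; the variable names are illustrative, and reading "99.9% the same" as "about 0.1% of positions can vary" is an interpretation.

```python
GENOME_BASES = 3_164_700_000   # 3164.7 million bases, from the slide
GENE_ESTIMATE = (30_000, 35_000)

# 99.9% identical means roughly 1 base in 1000 can vary between people.
variable_positions = GENOME_BASES // 1000

# Average gene density in genes per million bases (Mb).
genes_per_mb_low = GENE_ESTIMATE[0] / (GENOME_BASES / 1_000_000)
genes_per_mb_high = GENE_ESTIMATE[1] / (GENOME_BASES / 1_000_000)

print(f"~{variable_positions:,} variable positions")
print(f"{genes_per_mb_low:.1f} to {genes_per_mb_high:.1f} genes per Mb on average")
```

So a 0.1% difference still corresponds to roughly three million positions, and genes are sparse on average, at about ten per million bases.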
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on chromosome structure and dynamics. Over time, these repeats reshape the genome by rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• During the past 50 million years, a dramatic decrease seems to have occurred in the rate of accumulation of repeats in the human genome.
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and AT-rich regions usually can be seen through a microscope as light and dark bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
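The contrast between GC-rich and AT-rich regions can be made concrete by computing GC content in fixed-size windows along a sequence. This is a minimal sketch with hypothetical function names and a made-up toy sequence; real analyses use much larger windows and real chromosome data.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step=None):
    """GC fraction for each window; returns (start, fraction) pairs."""
    step = step or window  # default: non-overlapping windows
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGCGGCC" + "ATATTATAAT"
profile = windowed_gc(toy, window=10)
print(profile)
```

A profile like this, computed over a whole chromosome, is one simple way to locate the GC-rich "urban centers" and AT-rich "deserts" described above.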
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the number of gene family members has expanded in humans, especially in proteins involved in development and immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard weed (11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years ago, there seems to be no such decline in rodents. This may account for some of the fundamental differences between hominids and rodents, although gene estimates are similar in these species.
Variations and Mutations
• Scientists have identified about 1.4 million locations where single-base DNA differences (SNPs) occur in humans. This information promises to revolutionize the processes of finding chromosomal locations for disease-associated sequences and tracing human history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers point to several reasons for the higher mutation rate in the male germline, including the greater number of cell divisions required for sperm formation than for eggs.
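A single-base difference (SNP) is easiest to picture as a position where two aligned sequences disagree. The sketch below assumes the two sequences are already aligned and of equal length, with no insertions or deletions; the function name `find_snps` and the toy sequences are hypothetical.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) where the bases differ.

    Assumes both sequences are already aligned and have the same length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ATGCGTACCA"
sample = "ATGCATACCG"
snps = find_snps(reference, sample)
print(snps)
```

Real SNP discovery must also handle sequencing errors and alignment gaps, which this sketch ignores.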
Other Model Organisms
Organism          Genome size   Completion date   Estimated no. of genes
H. influenzae     1.8 Mb        1995              1,740
S. cerevisiae     12.1 Mb       1996              6,034
C. elegans        97 Mb         1998              19,099
A. thaliana       100 Mb        2000              25,000
D. melanogaster   180 Mb        2000              13,061
M. musculus       3000 Mb       -                 -
H. sapiens        3000 Mb       -                 35,000-45,000
ELSI
Ethical, Legal, and Social Implications of Human Genetics Research
• Discrimination in insurance and employment based on genetic information
• When and how new genetic tests should be integrated into mainstream health care services
• Informed consent in genetic research protocols
• Public and professional education about genetics research and bioethics
The Next Step:
Functional Genomics
• Transcriptomics (microarray)
• Proteomics
• Structural genomics
• Knockout studies
• Comparative genomics
• Transcriptomics (microarray) involves large-scale analysis of messenger RNAs transcribed from active genes to follow when, where, and under what conditions genes are expressed.
• Studying protein expression and function (proteomics) can bring researchers closer to what is actually happening in the cell than gene-expression studies. This capability has applications to drug design.
• Structural genomics initiatives are being launched worldwide to generate the 3-D structures of one or more proteins from each protein family, thus offering clues to function and biological targets for drug design.
• Experimental methods for understanding the function of DNA sequences and the proteins they encode include knockout studies to inactivate genes in living organisms and monitor any changes that could reveal their functions.
• Comparative genomics, which analyzes DNA sequence patterns of humans and well-studied model organisms side by side, has become one of the most powerful strategies for identifying human genes and interpreting their function.