The Human Genome
Project
Dr. Jim Whitfield, Ph.D.
REMEMBER!
It is the sequence of base pairs in DNA
that determines the genetic make-up of
a given organism
Remember
If two individuals differ (and we all do) then those
individuals should have different DNA sequences – at
least in some portions
These two ideas
1- The DNA sequence determines the genetic
information
2- Everybody has a different DNA sequence
were the driving forces behind the HGP and its quest to
map all the genes on each human chromosome
Getting Started
With the discovery of genetic engineering techniques
such as the polymerase chain reaction (PCR) (Kary
Mullis, 1983) it became possible to isolate, clone and
sequence specific sections of DNA strands
Because of these simple, straightforward procedures,
the project to sequence the entire human genome
began in 1990
Getting Started
The HGP remains the world's largest
collaborative biological project
Getting Started
The HGP was funded by the US Department of
Energy, the National Institutes of Health (NIH), the
Wellcome Trust in the UK, as well as several
European governments and China
The project was officially completed in 2003
The Magnitude of the
Project
The human genome is estimated to have 3 x 10^9 base
pairs
Your book states that the initial estimate was a cost of
about $3 (US), about Rs 192, to sequence each base pair,
putting the total at about 9 billion US dollars. However,
the US Department of Energy projected the cost to be
3 billion US dollars.
The final cost was 2.7 billion US dollars and the project
was completed 2 years ahead of schedule
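The cost figures on this slide follow from simple multiplication. The sketch below only re-derives them from the numbers quoted above; the variable names are illustrative and are not part of the original slides.

```python
# Figures quoted on the slide (treated here as given, not independently sourced)
BASE_PAIRS = 3 * 10**9            # estimated size of the human genome
EARLY_COST_PER_BP_USD = 3         # early per-base-pair estimate from the book
FINAL_COST_USD = 2_700_000_000    # reported final cost of the project

# Early estimate: cost per base pair times number of base pairs
estimated_total_usd = BASE_PAIRS * EARLY_COST_PER_BP_USD

# What the project actually paid per base pair, on average
actual_cost_per_bp = FINAL_COST_USD / BASE_PAIRS

print(f"Early estimate: {estimated_total_usd / 1e9:.0f} billion USD")
print(f"Actual average cost: {actual_cost_per_bp:.2f} USD per base pair")
```

In other words, the finished project came in at under one dollar per base pair, well below the early $3 estimate.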
The Magnitude of the
Project
Consider the massive amount of data to be
generated: if each base pair represented a letter,
there were a thousand letters on a page and each
book contained a thousand pages, it would take
about 3000 books to store the DNA sequence from a
single person (3300 books if the genome is taken as
3.3 x 10^9 base pairs)
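The book estimate can be checked with a few lines of arithmetic. This is a minimal sketch using only the page and book sizes stated above; the function name is made up for illustration. Note that 3 x 10^9 base pairs gives 3000 books, while the 3300 figure corresponds to a genome of about 3.3 x 10^9 base pairs.

```python
LETTERS_PER_PAGE = 1000   # one letter per base pair, as on the slide
PAGES_PER_BOOK = 1000

def books_needed(base_pairs: int) -> int:
    """Number of books needed to print a genome, rounding up to whole books."""
    letters_per_book = LETTERS_PER_PAGE * PAGES_PER_BOOK
    # Integer ceiling division: a partly filled book still counts as a book
    return -(-base_pairs // letters_per_book)

print(books_needed(3_000_000_000))   # genome size quoted on the previous slide
print(books_needed(3_300_000_000))   # genome size implied by the 3300-book figure
```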
Fortunately, as the HGP advanced so did
computer technology, which allowed for high speed
storage, retrieval and analysis
Spin-Offs
The HGP spawned the field of Bioinformatics
Bioinformatics is the application of computer
technology to the management of biological
information. Computers are used to gather, store,
analyze and integrate biological and genetic
information which can then be applied to gene-based
drug discovery and development. More about that
later!
Goals
Some of the important goals of the HGP were
Identify all of the estimated 20,000 – 25,000 genes in
human DNA
Determine the sequence of the estimated 3 billion base
pairs that make up human DNA
Develop a method to store this information in widely
searchable databases (bioinformatics)
Develop advanced data analysis techniques
Goals
Continued
Transfer these new technologies to other fields of
industry
Address the ethical, legal, and social issues (ELSI) that
would arise from the project
ELSI
The ELSI program at NHGRI, which is considered unprecedented in
biomedical science in terms of scope and level of priority, provides
an effective basis from which to assess the implications of genome
research, and has resulted in several notable improvements to the
HGP.
An example is the decision to sequence the DNA of several
anonymous individuals, rather than a known individual, in order to
protect privacy. Craig Venter, then head of Celera Genomics (the
private US company that sequenced the genome in parallel with the
HGP), eventually announced that he was one of the 5 people whose
DNA Celera sequenced
Another example is the development of widely used genetic
privacy guidelines and draft legislation. The ELSI program at NHGRI
now serves as a model for large, publicly funded science efforts.
Methodology
Two distinctly different methods of sequencing were
used
The first method focused on identifying only those
genes that are expressed as RNA – these sequences are
referred to as Expressed Sequence Tags (ESTs) – remember
that most DNA is non-coding ("junk" DNA) and is not expressed
This method limits the fragment length to between
500 and 800 nucleotides
Methodology
The identification of ESTs has proceeded rapidly, with
approximately 74.2 million ESTs now available in public
databases
ESTs have become a tool to refine the predicted
transcripts for those genes, which leads to the prediction
of their protein products and ultimately their function.
Also, the situation in which those ESTs are obtained
(tissue, organ, disease state - e.g. cancer) gives
information on the conditions in which the corresponding
gene is acting.
Methodology
The second was more of a shotgun approach whose
goal was to sequence the entire genome, including all
coding and non-coding regions. The functions of each
region would be determined at a later date – this step is
called Sequence Annotation
It is also known as DNA or Genome Annotation
Sequencing on its own does not explain what any of the
DNA actually does
Methodology
Therefore, before the DNA can be sequenced it must
be broken up into smaller fragments – remember
DNA is a very long polymer and there are technical
limitations to how much can be sequenced at once
These shorter fragments are then cloned and
amplified in a suitable host, usually yeast or bacteria,
using vectors called YACs (yeast artificial chromosomes)
and BACs (bacterial artificial chromosomes)
Methodology
Once the fragments have been sequenced they are
arranged based on overlapping regions that are
aligned by specialized computer programs
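To make the idea of arranging fragments by their overlapping regions concrete, here is a toy greedy assembler. It is only a sketch of the principle: it is not the software the HGP used, the fragments are invented, and real assemblers must also cope with sequencing errors, repeats and both DNA strands.

```python
from __future__ import annotations

def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]

# Three made-up reads taken from one short sequence
reads = ["ATGCGTAC", "GTACCTGA", "CTGATTAG"]
print(greedy_assemble(reads))
```

With these reads each neighbouring pair shares four bases, so the three fragments collapse into one continuous sequence.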
Methodology
The sequences were then “annotated” – their functions
were described
The last of the human chromosomes to be completed,
chromosome 1, was sequenced and annotated in 2006
Methodology
Another major challenge encountered by the HGP
researchers was assigning physical and
genetic maps to the genome
This was accomplished using a technique known as
restriction fragment length polymorphism (RFLP)
RFLP
RFLP is a technique that exploits variations in
homologous DNA sequences. It refers to a difference
between samples of homologous DNA molecules that
arises from differing locations of restriction enzyme sites
Mapping also involved looking at microsatellites, or Short
Tandem Repeats (STRs) – short motifs of 3-5
nucleotides repeated as many as 50 times – which often
exhibit high mutation rates
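A short tandem repeat is easy to picture in code: the same motif appears back to back, and individuals differ in how many copies they carry. The sketch below counts the longest run of a motif; the GATA motif and both allele sequences are invented for illustration.

```python
import re

def longest_tandem_repeat(sequence: str, motif: str) -> int:
    """Largest number of back-to-back copies of motif found in sequence."""
    # (?:motif)+ matches one or more consecutive copies of the motif
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)

# Two hypothetical alleles of the same STR locus, differing only in repeat count
allele_a = "TTCG" + "GATA" * 7 + "CCAT"
allele_b = "TTCG" + "GATA" * 11 + "CCAT"

print(longest_tandem_repeat(allele_a, "GATA"))
print(longest_tandem_repeat(allele_b, "GATA"))
```

The repeat count at a locus is what distinguishes one allele from another.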
RFLP
Analysis of RFLP variation in genomes was a vital tool
in genome mapping and genetic disease analysis. If
researchers were trying to initially determine the
chromosomal location of a particular disease gene,
they would analyze the DNA of members of a family
afflicted by the disease, and look for RFLP alleles that
show a similar pattern of inheritance as that of the
disease. Once a disease gene was localized, RFLP
analysis of other families could reveal who was at
risk for the disease, or who was likely to be a carrier
of the mutant genes.
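The RFLP idea can be illustrated by cutting two versions of the same DNA region with one restriction enzyme and comparing fragment lengths. This is a simplified sketch: the sequences are invented, the DNA is treated as a single linear strand, and the enzyme is EcoRI, which recognizes GAATTC and cuts after the G.

```python
from __future__ import annotations

def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Cut a linear sequence at every occurrence of site; return fragment lengths."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)          # position of the cut within the site
        start = sequence.find(site, start + 1)   # look for the next site
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]

# Hypothetical alleles: allele_2 carries a single base change (A -> G)
# that destroys the second EcoRI site
allele_1 = "ACGT" * 3 + "GAATTC" + "TTGCA" * 2 + "GAATTC" + "CCGG" * 2
allele_2 = "ACGT" * 3 + "GAATTC" + "TTGCA" * 2 + "GAGTTC" + "CCGG" * 2

print(digest(allele_1, "GAATTC", 1))
print(digest(allele_2, "GAATTC", 1))
```

Losing one restriction site turns two shorter fragments into one longer fragment. That length difference is the polymorphism that can be followed through a family alongside the disease.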
RFLP
RFLP analysis was also the basis for early methods of
genetic fingerprinting, useful in the identification of
samples retrieved from crime scenes, in the
determination of paternity, and in the
characterization of genetic diversity or breeding
patterns in animal populations.
America is introduced to DNA analysis and RFLP
What did the Human Genome Project
Discover?
The human genome contains 3,164.7 million nucleotide
bases
The average gene contains about 3000 bases,
however there is a great range, up to the largest known
human gene, which codes for the protein dystrophin and
contains 2.4 million bases (about 0.08% of the entire genome)
Dystrophin connects the cytoskeleton of muscle cells
to the surrounding extracellular matrix – deficiencies can
lead to muscular dystrophy, and mutations in the dystrophin
gene cause Duchenne muscular dystrophy
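The 0.08% figure for the dystrophin gene can be reproduced directly. This is a minimal check, assuming the genome size is read as 3164.7 million bases and the gene as 2.4 million bases; the constant names are illustrative.

```python
GENOME_BASES = 3_164_700_000       # 3164.7 million bases (assumed reading of the slide)
DYSTROPHIN_GENE_BASES = 2_400_000  # size of the dystrophin gene quoted above

# Share of the whole genome taken up by this single gene, as a percentage
percent_of_genome = 100 * DYSTROPHIN_GENE_BASES / GENOME_BASES

print(f"{percent_of_genome:.2f}% of the genome")
```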
What did the Human Genome Project
Discover?
The total number of genes was estimated at approximately 30,000
(later analyses revised this to about 20,000 – 25,000). This is
significantly lower than the initial estimates. More than
99% of all nucleotide bases are exactly the same among all
people
The functions of many genes still remain unknown
(currently about 30%)
Only about 1% of the three billion letters directly codes for
proteins. Of the rest, about 25% make up genes and their
regulatory elements. The function of the remaining letters
is still unclear. Some of it may be redundant information
left over from our evolutionary past.
What did the Human Genome
Project Discover?
Repeated sequences make up a large portion of the
genome – roughly half – while non-coding DNA as a whole
approaches 98%
These repeating sections have no direct coding
function. Many may be evolutionary relics, while others
function in the production of ribosomal and
transfer RNA
What did the Human Genome Project
Discover?
Chromosome 1 has the highest number of genes, and so of encoded
proteins, and the Y chromosome has the lowest of each
What did the Human Genome
Project Discover?
Scientists discovered about 1.4 million locations that
contained single nucleotide polymorphisms (SNPs),
positions where individuals differ by a single
nucleotide. These differences can be used to study
disease-associated sequences and human
evolutionary history
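Finding SNPs amounts to comparing aligned sequences position by position. The sketch below does this for two short, made-up sequences that are assumed to be already aligned and of equal length; real SNP discovery works on many individuals and has to handle alignment and sequencing errors.

```python
from __future__ import annotations

def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, base_in_a, base_in_b) for every single-base difference.

    Positions are 1-based. Both sequences must already be aligned.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b]

# Hypothetical stretch of the same region from two people
person_1 = "ATGCCGTAAGCTTACG"
person_2 = "ATGCCGTAGGCTTACA"

for position, base_1, base_2 in find_snps(person_1, person_2):
    print(f"position {position}: {base_1} -> {base_2}")
```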
The Future
The challenge to researchers and scientists now is to
determine how to read the contents of the sequenced
genome and then to understand how the parts work
together to discover the genetic basis for health and the
pathology of human disease. In this respect, genome-based
research will eventually enable medical science to
develop highly effective diagnostic tools, to better
understand the health needs of people based on their
individual genetic make-ups, and to design new and highly
effective treatments for disease – so-called “Precision
Medicine”
The Future
Precision medicine has led to the new field of
Pharmacogenomics. Pharmacogenomics is the study
of how genes affect a person’s response to particular
drugs. This relatively new field combines
pharmacology (the science of drugs) and genomics
(the study of genes and their functions) to develop
effective, safe medications and doses that will be
tailored to variations in a person’s genes.
India has a Bright
Future
Consider that