Whose DNA was sequenced for the Human Genome Project?
The Human Genome Project
Human Genome Project
The project that has identified and located all of the genes in human DNA, and
determined the sequences of the chemical bases that make up human DNA. This
information is stored in databases.
The aims of the project were to:
• determine the sequences that comprise human DNA,
• identify all of the genes in human DNA,
• store this information in databases and improve tools available for its analysis,
• transfer technologies gained from the project to private industry (e.g. biotechnology
companies) to develop new medical applications,
• address the ethical, legal and social issues that may arise from the project.
The sequence of a ‘working draft’ of the human genome was published in the science journals
Science and Nature in early 2001. This was a special event, as the public and private research efforts
were publishing their results at the same time.
Analysis of the draft sequence revealed a vast amount of information including:
• the average human gene consists of 3,000 nucleotide bases, but sizes vary greatly; the
largest known human gene has 2.4 million bases,
• the order of 99.9% of nucleotide bases is exactly the same in all people,
• the functions of over 50% of discovered genes remain unknown,
• less than 2% of the genome codes for the production of proteins,
• much of the genome consists of repetitive base sequences. These repeats appear to have no
direct function, but over time reshape the genome by rearranging it, creating new genes or
modifying and reshuffling existing genes,
• gene-rich areas of the genome are predominantly made up of G and C bases, whereas
gene-poor regions are mainly composed of A and T bases,
• chromosome 1 has the most genes (2968) whereas the Y chromosome has the fewest (231).
Much is still unknown about our genome. Some of the things we still don’t know are:
• the exact number of genes in the human genome,
• the exact location, function and regulation of these genes,
• the amount, distribution, information content and functions of ‘non-coding’ DNA, that is, DNA
that does not code for a protein product,
• how gene expression, protein expression and post-translational events are orchestrated,
• evolutionary conservation of genes and proteins amongst different organisms,
• correlation of genetic variation between individuals with respect to health and disease.
How many genes did you say?
• The size of genomes differs from one organism to the next. The human genome contains
more than 3.2 billion base pairs and about 30,000 genes.
• The largest known genome belongs to a microscopic amoeba, Amoeba dubia, which is closely
followed in size by the lungfish and the Easter lily.
• Three quarters of the Japanese pufferfish's 31,000 genes have direct human counterparts.
genomics
The study of the DNA sequence in the chromosomes of an organism. This includes the genes
that code for proteins, the regulatory sequences that control the genes and the non-coding
DNA segments.
Now that we have a map of the human genome, we have to learn how to read it. That means
figuring out which gene does what. We still have very little idea what each of the estimated
30,000 genes in the human genome does. One way of studying genes is to compare the entire
genome directly with the genomes of other organisms. This study is called comparative genomics.
The human genome is extremely complicated and so, by comparing it with others, such as the
mouse or fruit fly genome, we gain insights into the similarities and differences. Scientists can learn
much about the function of human genes by comparing them with their mouse counterparts.
All the organisms scientists are using for genome comparison are known as model organisms, in that
they are a model against which the human genome can be studied.
So how can we compare mouse genes with human genes when we have 2 legs and mice have 4, when
we have opposable thumbs and mice have claws? At the DNA level humans and other organisms
aren’t that different: on average, mouse and human genes are 85% similar.
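A figure such as "85% similar" is a percent identity: the share of aligned positions at which two sequences carry the same base. The sketch below shows the calculation in Python. It assumes the two sequences have already been aligned to the same length; the function name `percent_identity` and both example sequences are made up for illustration and are not real mouse or human genes.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions where both sequences match.

    Assumes the sequences are already aligned and equal in length;
    positions with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Made-up 20-base fragments that differ at 3 positions.
human_like = "ATGGCCTTAGCTAGGCTAAC"
mouse_like = "ATGGCATTAGCTCGGCTATC"
print(percent_identity(human_like, mouse_like))  # 85.0
```

Real comparisons first need an alignment step to line up corresponding positions, which this sketch leaves out.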
Completely mapped ‘model organism’ genomes so far include chimpanzee, mouse, rat, pufferfish,
fruit fly, sea squirt, roundworm, baker's yeast, the bacterium Escherichia coli and, as of
February 2005, the kangaroo.
All of these organisms are being used by comparative genomics researchers to further understanding
of the human genome. Around the world scientists from different nations are sequencing genomes.
In early 2005 research was underway into the dog, chicken, honey bee and sea urchin genomes.
What We've Learned So Far
What Does the Draft Human Genome Sequence Tell Us?
By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known
human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000, much lower than previous estimates of
80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to a
composite of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
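The figures in this list can be combined with simple arithmetic. The short Python sketch below uses only the numbers quoted above. The variable names are illustrative, and the results are rough estimates that treat every gene as having the average size, not measured values.

```python
GENOME_BASES = 3_164_700_000   # 3164.7 million bases
AVERAGE_GENE_BASES = 3_000
ESTIMATED_GENES = 30_000
SHARED_FRACTION = 0.999        # 99.9% identical between people

# Bases expected to differ between two people: the remaining 0.1%.
differing_bases = round(GENOME_BASES * (1 - SHARED_FRACTION))

# Total bases spanned by genes if every gene had the average size.
gene_bases = AVERAGE_GENE_BASES * ESTIMATED_GENES
gene_share = 100 * gene_bases / GENOME_BASES

print(f"Differing bases: about {differing_bases / 1e6:.1f} million")
print(f"Bases in genes: {gene_bases / 1e6:.0f} million ({gene_share:.1f}% of the genome)")
```

So a 0.1% difference still amounts to roughly 3 million bases, and 30,000 average-sized genes span about 90 million bases, only a few percent of the whole genome.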
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the
human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on
chromosome structure and dynamics. Over time, these repeats reshape the genome by
rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• During the past 50 million years, a dramatic decrease seems to have occurred in the rate of
accumulation of repeats in the human genome.
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA
building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and
AT-rich regions usually can be seen through a microscope as light and dark bands on
chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of
noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to
gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG islands
are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
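The contrast between GC-rich and AT-rich regions is usually expressed as GC content, the fraction of bases in a stretch of DNA that are G or C. The Python sketch below computes it for consecutive windows along a sequence. The function names `gc_content` and `windowed_gc` and the toy sequence are assumptions made for illustration; real analyses use much longer windows on real sequence data.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C (0.0 to 1.0)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC content of consecutive, non-overlapping windows along seq."""
    return [
        gc_content(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Made-up sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GCGCGGCCGC" + "ATATTAAATT"
print(windowed_gc(toy, 10))  # [1.0, 0.0]
```

Plotting such window values along a chromosome is one way to picture the "urban centers" and "deserts" described above.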
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other organisms'
genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because of
mRNA transcript "alternative splicing" and chemical modifications to the proteins. This
process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the
number of gene family members has expanded in humans, especially in proteins involved in
development and immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard
weed (11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years
ago, there seems to be no such decline in rodents. This may account for some of the
fundamental differences between hominids and rodents, although gene estimates are similar
in these species. Scientists have proposed many theories to explain evolutionary contrasts
between humans and other organisms, including those of life span, litter sizes, inbreeding,
and genetic drift.
4.1 Discuss the Benefits of the Human Genome Project (HGP)
1. The genome of an organism is all the genetic material of an individual or species. E.g. in a haploid human cell
there are about 3 billion DNA bases arranged along 23 chromosomes.
2. The HGP is an international project that aimed to identify all the human genes and to determine the sequence of
the 3 billion bases in human DNA.
3. The project began in 1990 and was completed in 2003. About 97% of human DNA is called “junk” DNA because it
serves no obvious purpose. The outcome was made possible by advances in technology, molecular techniques,
electronics and computer software.
4. The goals of the project were:
a) Genetic mapping of the human genome, i.e. locating 3000 genetic markers in human DNA (genes);
b) Physical mapping, i.e. cutting each chromosome into fragments and then determining the
correct order of the pieces;
c) DNA sequencing, i.e. determining the exact order of the nucleotides on each chromosome;
d) Analysing the genomes of other organisms, e.g. bacteria, yeast.
Benefits
1. Medical Benefits
• Improved diagnosis of disease, e.g. cancer, high blood pressure
• Detection of genetic predisposition to disease
• Drug design and gene therapy
• When faulty genes are detected, the disease can be treated early; detection also leads to diagnostic tests,
which can be accompanied by genetic counselling
• Normal genes may be cloned and the products of their expression used for treatment.
2. Non-Medical Benefits
• Creates understanding of human evolution, e.g. comparison between species, understanding of evolutionary
relationships (e.g. 1% difference between us and chimps)
• Development in forensics
• Understanding how genes mutate and are expressed, and the function of some genes
Possible Ethical Issues
• Who controls the genetic information?
• Will researchers be prevented by rival commercial interests from developing new tests and therapies?
• Will knowledge of our genetic profile allow us to avoid disease, or inform us about our possible early death?
• Will our genetic profile be used against us by employers, insurers or governments (discrimination)?
4.2 Describe and Explain the Limitations of Data Obtained from the HGP (see above), also:
1. Some scientists believe some complex processes, e.g. brain function, will never be understood even with the HGP.
2. Some genes, when they interact and function together, produce qualities that can’t be explained by the
individual genes’ activities (knowing the base sequence of DNA does not reveal the function of every gene).
3. Criticism of the money spent on the project: could it have been spent elsewhere?
Why was a portion of the NHGRI budget set aside for ethical considerations?
Since the beginning of the Human Genome Project, it has been clear that science's expanding
knowledge of the genome would have a profound impact upon humanity. To maximize the potential
for beneficial effects while minimizing the risk of detrimental effects, it was essential that research
be conducted to investigate a wide range of issues related to the acquisition and use of genomic
information.
Five percent of the annual budget of the NHGRI is dedicated to examining ethical, legal and social
implications (ELSI) related to human genome research, incorporating specific recommendations into
the activities of NHGRI and providing guidance to policymakers and the public. The ELSI program
at NHGRI, which is considered unprecedented in biomedical science in terms of scope and level of
priority, provides an effective basis from which to assess the implications of genome research, and
has resulted in several notable improvements to the HGP.
An example is the decision to sequence the DNA of several anonymous individuals, rather than a
known individual, in order to protect privacy. Another example is the development of widely used
genetic privacy guidelines and draft legislation. The ELSI program at NHGRI now serves as a model
for large, publicly funded science efforts.
What will the next 50 years of medical science look like?
Having the essentially complete sequence of the human genome is similar to having all the pages of
a manual needed to make the human body. The challenge to researchers and scientists now is to
determine how to read the contents of all these pages and then understand how the parts work
together and to discover the genetic basis for health and the pathology of human disease. In this
respect, genome-based research will eventually enable medical science to develop highly effective
diagnostic tools, to better understand the health needs of people based on their individual genetic
make-ups, and to design new and highly effective treatments for disease.
Individualized analysis based on each person's genome will lead to a very powerful form of
preventive medicine. We'll be able to learn about risks of future illness based on DNA analysis.
Physicians, nurses, genetic counselors and other health-care professionals will be able to work with
individuals to focus efforts on the things that are most likely to maintain health for a particular
individual. That might mean diet or lifestyle changes, or it might mean medical surveillance. But
there will be a personalized aspect to what we do to keep ourselves healthy. Then, through our
understanding at the molecular level of how things like diabetes or heart disease or schizophrenia
come about, we should see a whole new generation of interventions, many of which will be drugs
that are much more effective and precise than those available today.
When can we expect new and better drugs?
It's important to be careful about raising expectations. Most new drugs based on the completed
genome are still perhaps 10 to 15 years in the future, although more than 350 biotech products,
many based on genetic research, are currently in clinical trials, according to the Biotechnology
Industry Organization. It usually takes more than a decade for a company to conduct the kinds of
clinical studies needed to win marketing approval from the Food and Drug Administration.
Testing, however, will arrive more quickly, especially the ability to predict individual future health
risks, and the ability to implement an enhanced approach to preventive medicine. In the next decade,
we may also be better able to determine which drugs work best for individuals, based on their
genetic make-up.
How has the Human Genome Project affected biological research?
Biological research has traditionally been a very individualistic enterprise, with researchers pursuing
medical investigations more or less independently. The magnitude of both the technological
challenge and the necessary financial investment prompted the Human Genome Project to assemble
interdisciplinary teams, encompassing engineering and informatics as well as biology; automate
procedures wherever possible; and concentrate research in major centers to maximize economies of
scale.
As a result, research involving other genome-related projects (e.g., the International HapMap Project
to study human genetic variation and the Encyclopedia of DNA Elements, or ENCODE, project) is
now characterized by large-scale, cooperative efforts involving many institutions, often from many
different nations, working collaboratively. The era of team-oriented research in biology is here.
In addition to introducing large-scale approaches to biology, the Human Genome Project has
produced all sorts of new tools and technologies that can be used by individual scientists to carry out
smaller scale research in a much more effective manner.
Will the era of the genome begin in April?
Yes. We are entering a new age of discovery that will transform human health. Our eventual
knowledge about the workings of the genome has the potential to fundamentally change our most
basic perceptions of our biological world. It is difficult to predict what will be learned and how
future knowledge will be applied, but there can be little doubt that understanding the genome will
revolutionize our concept of health and improve the human condition in remarkable ways.
Now that the genome is complete, what's next for NHGRI?
NHGRI's vision for the future, which is being published April 24, 2003 in the journal Nature, details
a diverse and exciting landscape of new possibilities. NHGRI will particularly focus on opportunities
to translate the results of the Human Genome Project into advances in medicine, including projects
that build upon the completed human genome sequence. This is particularly true of projects of a
large international scope that require extensive coordination and public investment to ensure that
results and discoveries remain freely available in the public domain.
An example is NHGRI's genetic variation mapping project, or HapMap, which will speed the
discovery of genes related to common illnesses such as asthma, cancer, diabetes and heart disease.
The HapMap should also be a powerful resource for studying the genetic factors contributing to
variation in response to environmental influences, in susceptibility to infection, and in the
effectiveness of drugs and vaccines. Another example is the ENCODE project, which aims to create
a comprehensive encyclopedia of the functional elements encoded in the DNA sequence, by
cataloging the identity and precise location of all of the protein-encoding and non-protein-encoding
genes within the genome.
The benefits of the Human Genome Project
Completed in 2003, the Human Genome Project (HGP) was a collaborative project that
lasted for 13 years. The goals of the project were to:
o identify all of the approximately 25,000-30,000 genes in human DNA
o determine the sequences of the 3 billion chemical base pairs that make up human
DNA
o store this information in databases
o improve and develop tools for data analysis
o address the ethical, legal and social issues that may arise from the project.
Analysis of the data will continue for many years. By licensing technologies to private
companies and awarding grants for innovative research, the project accelerated the
biotechnology industry and aided the development of new medical applications.
Current and potential applications and benefits of the HGP include:
Improvements in Molecular Medicine:
• improved diagnosis of inheritable diseases;
• earlier detection of genetic predispositions to disease;
• rational drug design; gene therapy and control systems for drugs.
More accurate risk assessment:
• assess health damage and risks caused by exposure to both high and low doses of radiation;
• assess health damage and risks caused by exposure to mutagenic chemicals and
cancer-causing toxins;
• reduce the likelihood of heritable mutations.
Better understanding of evolution and human migration (Bioarchaeology, Anthropology,
Evolution and Human Migration):
• study evolution through germline mutations in lineages;
• study migration of different population groups based on female genetic inheritance;
• study mutations on the Y chromosome to trace lineage and migration of males;
• compare breakpoints in the evolution of mutations with ages of populations and historical
events.
DNA Forensics:
• identify potential suspects whose DNA may match evidence left at crime scenes through
DNA fingerprinting of samples such as blood or skin;
• exonerate persons wrongly accused of crimes;
• identify crime and catastrophe victims;
• establish paternity and other family relationships;
• match organ donors with recipients in transplant programs.
Describe and explain the limitations of data obtained from the Human Genome Project
• It is now believed that only approximately 3% of the DNA in human chromosomes codes for
proteins. The other 97% of the DNA consists of non-coding regions (sometimes called ‘junk
DNA’), whose functions may include providing chromosomal structural integrity and
regulating where, when, and in what quantity proteins are made. The use of about 50% of
this ‘junk DNA’ is not known.
• Some genes are found inside other genes, thus making their identification difficult.
• Non-coding DNA is used in DNA fingerprinting.
• It may be a long time before scientists totally understand the role of every gene, its
interaction with other genes and how DNA relates to such things as behaviour, brain function
and other aspects of neurobiology.
• Some other limitations of the Human Genome Project involve ethical, legal and social
implications such as:
o fairness in the use of genetic information
o privacy and confidentiality
o psychological impact and stigmatisation
o education, standards and quality control
o commercialisation
o conceptual and philosophical implications.
What is sequencing and how do you sequence a genome?
Sequencing means determining the exact order of the base pairs in a segment of DNA. Human
chromosomes range in size from about 50,000,000 to 300,000,000 base pairs. Because the bases
exist as pairs, and the identity of one of the bases in the pair determines the other member of the pair,
scientists do not have to report both bases of the pair.
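The pairing rule explains why one strand is enough: A always pairs with T, and C with G, so the partner strand can be computed. A minimal Python sketch of that rule follows; the names `COMPLEMENT`, `complement` and `reverse_complement` are illustrative, and only the four standard bases are handled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of strand, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

The reversed form is included because the two strands of DNA run in opposite directions, so the partner strand is conventionally read back to front.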
The primary method used by the HGP to produce the finished version of the human genetic code is
map-based, or BAC-based, sequencing. BAC is the acronym for "bacterial artificial chromosome."
Human DNA is fragmented into pieces that are relatively large but still manageable in size (between
150,000 and 200,000 base pairs). The fragments are cloned in bacteria, which store and replicate the
human DNA so that it can be prepared in quantities large enough for sequencing. If carefully chosen
to minimize overlap, it takes about 20,000 different BAC clones to contain the 3 billion pairs of
bases of the human genome. A collection of BAC clones containing the entire human genome is
called a "BAC library."
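The figure of about 20,000 clones follows from dividing the genome size by the clone size. The sketch below repeats that arithmetic in Python using only the numbers quoted above. It assumes no overlap at all between clones, which is an idealization; the variable names are illustrative.

```python
import math

GENOME_BASE_PAIRS = 3_000_000_000
BAC_MIN, BAC_MAX = 150_000, 200_000   # clone sizes quoted above

# With no overlap at all, fewer clones are needed when each clone is larger.
fewest_clones = math.ceil(GENOME_BASE_PAIRS / BAC_MAX)
most_clones = math.ceil(GENOME_BASE_PAIRS / BAC_MIN)
print(fewest_clones, most_clones)  # 15000 20000
```

Under this assumption the genome needs between 15,000 and 20,000 clones, consistent with the "about 20,000" quoted for a library chosen to minimize overlap.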
In the BAC-based method, each BAC clone is "mapped" to determine where the DNA in BAC
clones comes from in the human genome. Using this approach ensures that scientists know both the
precise location of the DNA letters that are sequenced from each clone and their spatial relation to
sequenced human DNA in other BAC clones.
For sequencing, each BAC clone is cut into still smaller fragments that are about 2,000 bases in
length. These pieces are called "subclones." A "sequencing reaction" is carried out on these
subclones. The products of the sequencing reaction are then loaded into the sequencing machine
(sequencer). The sequencer generates about 500 to 800 base pairs of A, T, C and G from each
sequencing reaction, so that each base is sequenced about 10 times. A computer then assembles
these short sequences into contiguous stretches of sequence representing the human DNA in the
BAC clone.
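The assembly step relies on overlaps: because each base is read many times, neighbouring reads share stretches of sequence that show how they fit together. The Python sketch below illustrates the idea with greedy overlap merging on a toy example. This is a simplified stand-in, not the software used by the HGP, which the text does not describe; the function names, the `min_len` parameter and the reads are all made up, and the sketch ignores sequencing errors and repeats.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap(left, right, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Made-up target "ATGGCGTACGTTAGC" broken into overlapping toy reads.
reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTAGC']
```

Knowing which BAC clone the reads came from keeps this puzzle small: only reads from the same 150,000 to 200,000 base-pair clone have to be fitted together at once.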
Whose DNA was sequenced for the Human Genome Project?
This is intentionally not known to protect the volunteers who provided DNA samples for
sequencing. The sequence is derived from the DNA of several volunteers. To ensure that the
identities of the volunteers cannot be revealed, a careful process was developed to recruit the
volunteers and to collect and maintain the blood samples that were the source of the DNA.
The volunteers responded to local public advertisements near the laboratories where the DNA
"libraries" were prepared. Candidates were recruited from a diverse population. The volunteers
provided blood samples after being extensively counseled and then giving their informed consent.
About 5 to 10 times as many volunteers donated blood as were eventually used, so that not even the
volunteers would know whether their sample was used. All labels were removed before the actual
samples were chosen.
What does it mean when you say you've completed the Human Genome Project?
The main goals of the Human Genome Project were first articulated in 1988 by a special committee
of the U.S. National Academy of Sciences, and later adopted through a detailed series of five-year
plans jointly written by the National Institutes of Health and the Department of Energy. At this time,
the principal goals laid out by the National Academy of Sciences have been achieved, including the
essential completion of a high-quality version of the human sequence. Other goals included the
creation of physical and genetic maps of the human genome, which were accomplished in the
mid-1990s, as well as the mapping and sequencing of a set of five model organisms, including the mouse.
All of these goals have been achieved within the time frame and budget first estimated by the NAS
committee.
Notably, quite a number of additional goals not considered possible in 1988 have been added along
the way and successfully achieved. Examples include advanced drafts of the sequences of the mouse
and rat genomes, as well as a catalog of variable bases in the human genome.