BIOLOGY 220W - Lecture Notes Packet - Claude dePamphilis
Chapter 1 The central importance of variation
Objective
The stunning degree of matching of DNA sequences of different species makes it
very difficult to imagine how they could attain this similarity without actually being
related to one another. Fossil evidence makes it clear that species exist today that did not
exist in the past, and that there are species that were abundant in the past which are totally
gone today. The challenge for Charles Darwin was to piece these observations together
with a proposed mechanism whereby one species could give rise to another. His
argument was fairly simple, and as we will see it hinges on the observation that all
species exhibit variation. In this chapter we will see why this variation plays such a critical
role, and why an absence of variation is extremely hazardous to a species. We will
then spend a good bit of time learning how genetic variation is measured in populations,
and how that variation is organized.
The simplest statement of how evolution works
Charles Darwin recognized the central role of variation in his theory of
evolution by Natural Selection. He came to this conclusion after many long hours of
observation of many different species. He noted that some birds were more adept or
stronger fliers than others. Some plants had deeper roots than others. He reasoned that
these differences must result in different chances of survival and reproduction. If the
differences among organisms are in some way passed on from parents to offspring, then
there is a means for the differences to accumulate over generations. In Darwin’s book
On the Origin of Species, published in 1859, he presents a coherent theory of evolution that is
based on three principles:
1. There is VARIATION in populations.
2. To some degree, the variation is passed on from parents to offspring (the
variation is INHERITED).
3. Some variants are more successful at surviving and rearing offspring
(NATURAL SELECTION discriminates among the variants).
These three features are sufficient to result in adaptive changes in populations of
organisms over time. These lectures will document the changes that have been brought
about by this process of evolution. For now, the point that needs emphasis is that in the
absence of variation, the whole process comes grinding to a stop. There must be variation
in order for natural selection to choose the fit from the less fit.
Variation at the level of the phenotype and genotype
It is crucial to distinguish between two levels of variation. Variation among the
phenotypes in a population refers to variation among individuals in their appearance or
morphology (like height, weight) or their physiology (like running speed, jumping
height). Natural selection acts on this level of variation, because it is differences in
phenotypes that result in differences in survival and reproduction.
The other type of variation is variation at the level of the genotype. Of course,
some genotypic differences result in clear phenotypic differences, but in general,
variation in the genotype is what is passed along from parents to offspring, and the
phenotypes then result from those transmitted genotypes and the effect of environmental
variation on their development and expression. Natural selection acts on genotypes only
indirectly, through the effect of the genes on the phenotype. If a gene (or a variant of a
gene) has no effect on the phenotype at all, then selection does not act on that gene or
variant and it is said to be neutral.
Discrete vs. continuous variation
It is also useful to distinguish between variation in discrete and continuous traits.
Discrete traits are those that fall into easily separable and countable classes, the simplest
being presence or absence of a morphological structure. Gregor Mendel chose seven
discrete traits to study, including smooth vs. wrinkled peas, and yellow vs. green peas. In
the case of Mendel’s peas, the discrete phenotypes were associated with discrete, single-locus genotypes. If natural selection were to act on the trait smooth-vs-wrinkled peas,
then the effect of natural selection would be seen immediately in the frequencies of
the alleles of the gene that determines smooth vs. wrinkled. Not all discrete traits,
however, are determined by a single gene. One might think of disease as a discrete trait: one can be either healthy or diseased. Many common diseases in humans seem to have
some degree of genetic basis because the incidence of the disease tends to cluster in
families. Arthritis is a good example. When we try to find the genes for susceptibility to
arthritis, it immediately becomes clear that many genes are involved, and generally there
are variables in the environment that also affect the trait.
Continuous traits are those that do not fall into separate categories, but which lie
along a continuous axis of measurement. Examples include height, weight, hair color, and
shoe size. There is a statistical tendency for the offspring of very tall parents to be tall,
and for offspring of very short parents to be short. Even though this correlation is not
perfect, it suggests that there are genes involved in the familial resemblance.
The relationship between the discreteness of underlying genes and the
continuousness of many characters is analogous to recorded music. Music is inherently a
continuous flow of analog sound, but engineers have figured out that an efficient way to
store, manipulate, and play back sound is to chop it up into discrete time slices which are
represented digitally. Changes in some of those bits would make an obvious audible
change to the music, but changes in other bits would make no difference that you could
hear. Similarly, many genetic changes have a major effect on the phenotype, whereas
others have no detectable effect.
How do we characterize and quantify genetic variation in populations?
Now let’s get away from the hypothetical thinking about variation, and go into the
lab to actually see how DNA sequence variation can be quantified. We want to be able to
measure the amount of variation in the DNA among a sample of individuals. It turns out
there are many ways to do this, and most of them depend on first performing a reaction
on the DNA known as the polymerase chain reaction. This reaction provides a way to
take total genomic DNA and to make millions of copies of just one tiny part of that DNA,
also known as an amplification. The reaction depends on the use of a DNA polymerase,
which you should recall is the enzyme that catalyzes the assembly of deoxynucleoside
triphosphates into a new DNA strand. The DNA polymerase from most organisms is
denatured permanently when it is heated to the temperature that melts apart the two DNA
strands. However, the DNA polymerase from heat-loving (thermophilic) bacteria like Thermus
aquaticus, first isolated from hot springs, can withstand such high temperatures, so that is the enzyme we’ll use.
Polymerase Chain Reaction
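The basic ingredients of the reaction include: 1. Sample DNA, 2. Two short (about 20
nucleotides long) single-stranded DNA primers whose sequences match the two ends of the target region to be amplified (one primer for each strand), 3.
Temperature-stable DNA polymerase, 4. The four deoxynucleoside triphosphates (dNTPs) that serve as building blocks for the new strands, 5. Buffer. There are three steps to the reaction, and
they are repeated many times by an instrument called a thermocycler, or a PCR machine.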
The first step is to denature the sample DNA at 94 degrees C (almost boiling!). This
typically takes 30 seconds or so. The second step is to anneal the primers to the sample
DNA by lowering the temperature to 55-60 degrees C. The third step is to allow DNA
polymerase to synthesize new DNA strands by raising the temperature to 72 degrees C.
Each round theoretically doubles the amount of DNA between the primer sequences. If it
is repeated 30 times, we should end up with $2^{30}$, or about 1 billion, copies of the fragment
of DNA that lies between the two primers. In summary, the steps Denature, Anneal,
Synthesis are repeated simply by cycling the temperature between 94, 55, and 72 degrees.
This clever process won its inventor, Kary Mullis, the 1993 Nobel Prize in Chemistry.
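The doubling arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name and the `efficiency` parameter are hypothetical additions (not from the notes) for cycles in which not every template gets copied; with `efficiency=1.0` it reproduces the idealized doubling described in the text.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies of the target fragment after `cycles` rounds of PCR.

    efficiency = 1.0 means every template is copied in every cycle (perfect
    doubling); smaller values are a hypothetical way to model imperfect cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle multiplies the number of copies by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting template and 30 perfect cycles gives 2**30 copies.
print(pcr_copies(1, 30))  # 1073741824.0
# The same 30 cycles at a hypothetical 90% efficiency per cycle.
print(round(pcr_copies(1, 30, 0.9)))
```

Real reactions eventually level off as primers and nucleotides are used up, so this idealized formula describes only the earlier cycles.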
Agarose gel electrophoresis separates DNA fragments of different lengths
After the PCR reaction is done, we want to know whether we in fact made many
copies of one small piece of DNA. This is generally done by running an agarose gel in a
procedure called gel electrophoresis. To make an agarose gel, the investigator measures
out the needed volume of buffer solution and adds anywhere from 0.8% to 1.4% agarose
by weight. Agarose does not dissolve readily at room temperature, and the easiest way to
get it to dissolve is to heat the mixture in a microwave oven. Once the agarose is
thoroughly melted and mixed, the solution is poured into a gel mold. A half-hour or so
later the gel is placed into a buffer chamber, where it is completely immersed in buffer.
The gel has small slots in it, and we place some of the PCR reaction solution into these
slots with a pipettor. An electric current is then run through the gel, and the negative
charge of DNA makes it move toward the positive electrode. Smaller DNA fragments can
work their way through the agarose gel matrix faster than can larger fragments, and after
an hour or so we can see the location of the DNA fragments by staining the gel with
ethidium bromide and looking under an ultraviolet light. The size of the DNA fragments
(measured in base pairs) is estimated by also running a length standard on the same gel
(in another lane). The length standard, also called a DNA “ladder,” produces a series of
bands of known lengths.
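Reading a fragment size off the ladder is usually done by interpolation. The sketch below assumes, as a common approximation over a ladder's useful range, that log10(size) falls roughly linearly with migration distance; the ladder sizes, distances, and function names are all hypothetical.

```python
import math


def fit_ladder(sizes_bp, distances_mm):
    """Least-squares fit of log10(size) = slope * distance + intercept."""
    xs = list(distances_mm)
    ys = [math.log10(s) for s in sizes_bp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_size(distance_mm, slope, intercept):
    """Convert a band's migration distance into an estimated length in bp."""
    return 10 ** (slope * distance_mm + intercept)


# Hypothetical ladder: band sizes (bp) and how far each band migrated (mm).
ladder_sizes = [1000, 500, 250, 125]
ladder_distances = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_ladder(ladder_sizes, ladder_distances)

# An unknown PCR product that migrated 25 mm.
print(round(estimate_size(25.0, slope, intercept)))  # 354
```

Only bands that fall within the ladder's range should be estimated this way; extrapolating beyond the largest or smallest ladder band is unreliable.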
How to score DNA polymorphism
After we run the PCR reaction, there are two possibilities. Either the fragments
are of different sizes, or they are all the same size. If they are different sizes, then we can
detect the difference in size by separating the fragments on an electrophoretic gel. An
example of a kind of polymorphism where there are many differences in length is called a
microsatellite, also called a Short Tandem Repeat Polymorphism (STRP). Microsatellites
are runs of simple repeats, like CACACACACACACA, and it happens that such runs
have a high error rate when DNA polymerase copies them. This results in a high mutation
rate, and the end result is that populations tend to be highly variable for these runs.
Microsatellites are of enormous utility in human genetics, and over 5500 have been
identified and mapped onto the human chromosomes.
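To make the idea of a length polymorphism concrete, the hypothetical helper below counts the longest uninterrupted run of a repeat motif in a sequence. The two sequences are invented alleles that share the same flanking DNA and differ only in the number of CA repeats, so PCR products from them would differ in length by 4 bp, the kind of size difference that electrophoresis is used to detect.

```python
def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of back-to-back copies of `motif` found anywhere in `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Step forward one motif length at a time while the repeat continues.
        while seq[pos:pos + k] == motif:
            count += 1
            pos += k
        best = max(best, count)
    return best


# Hypothetical alleles: identical flanks, different numbers of CA repeats.
short_allele = "ggatCACACACAttgc"      # 4 copies of CA
long_allele = "ggatCACACACACACAttgc"   # 6 copies of CA
print(longest_tandem_run(short_allele, "CA"), len(short_allele))  # 4 16
print(longest_tandem_run(long_allele, "CA"), len(long_allele))    # 6 20
```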
If the DNA fragments are all the same size after the PCR reaction, then we need
to do some more work. When the fragments are all the same size, they may still be
different in sequence, so any method that can detect sequence differences should do the
trick. One possibility is to sequence the fragments. This is somewhat expensive, but the
time, cost, and effort in sequencing is rapidly decreasing, so this method is becoming
more acceptable. I will show an example of DNA sequencing and sequence data in class.
Another possibility is to run the fragments out on an electrophoretic gel that lets the DNA
molecules fold up into their native conformation. This is generally done after making the
DNA single stranded by heating it, so that single DNA strands fold up in a way that
differs if the sequence differs. This method is called Single Strand Conformation
Polymorphism (or SSCP). A third approach is to cut the DNA with a restriction
endonuclease, an enzyme that cuts DNA in a sequence-specific manner. If there is
variation in the DNA sequence, then some DNA molecules will be cut and others will
not. The result is fragments of different sizes, which can then be identified by
electrophoresis. Variation at individual nucleotide positions in the DNA sequence is referred to as Single Nucleotide
Polymorphisms (SNPs), and this is the kind of variation that has the human genome
investigators very excited. SNPs may be very useful for mapping genes that affect the
risk of diseases like cancer and heart disease, so they have important medical application.
Let’s look at an example of how restriction endonucleases work.
Restriction endonucleases cut DNA at specific target sites
When DNA in solution is treated with the restriction endonuclease called EcoRI,
it finds all locations of its particular recognition site GAATTC, and cuts the DNA:
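5’ - gatgctacgGAATTCcatgca - 3’
3’ - ctacgatgcCTTAAGgtacgt - 5’
⇓
EcoRI digestion
5’ - gatgctacgG
3’ - ctacgatgcCTTAA
AATTCcatgca - 3’
Ggtacgt - 5’
EcoRI cuts each strand between the G and the first A of GAATTC, so each of the two fragments
ends in a short single-stranded AATT overhang (a “sticky end”).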
There are hundreds of different restriction endonucleases, and each cuts DNA at
its own specific recognition site. The original experiments on cloning of DNA depended
heavily on restriction enzymes to cut DNA in prescribed ways, and they are still
extremely useful in molecular genetics. Daniel Nathans shared the 1978 Nobel Prize (with Werner Arber
and Hamilton Smith) for work on restriction endonucleases. The amazing activity of restriction
endonucleases may seem puzzling at first, until you realize that they are a defense
mechanism that bacteria use to recognize and destroy foreign DNA, such as that of an
invading bacteriophage. The bacterium protects its own genome by chemically modifying
(methylating) the recognition sites in its own DNA, so the enzyme cuts only unmodified
invading DNA that carries the recognition site.
Recognition sites are typically 4 or 6 nucleotides in length.
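How a single base change can create or destroy a restriction site, and so change the fragment sizes seen on a gel, can be sketched in Python. This is an illustrative sketch: the function name is hypothetical, it searches only the top strand, and it reports fragment lengths along that strand (ignoring the 4-nucleotide overhangs). The first sequence is the one from the EcoRI example above; the second is a made-up variant with one C changed to T, which abolishes the site.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Top-strand fragment lengths after cutting at every occurrence of `site`.

    cut_offset = 1 places the cut after the first base of the site,
    as for EcoRI (G^AATTC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries run from the start of the sequence, through each cut, to the end.
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(digest("gatgctacgGAATTCcatgca"))  # [10, 11]  site present: two fragments
print(digest("gatgctacgGAATTTcatgca"))  # [21]      site destroyed: uncut
```

Scoring a SNP this way amounts to asking which of these two band patterns each individual shows.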
Mutations are the original source of genetic variation
Although organisms are generally very good at replicating their DNA, and fixing
most of the mistakes (mutations) they make along the way, mutations that are not
repaired are the ultimate source of genetic variation. Without mutation, there would be
no genetic variation, and without genetic variation, no evolution. Mutations can alter the
base sequence (A, C, T, or G) or may involve structural 'mistakes' such as a DNA
insertion, deletion, inversion, or duplication. Mutations (at least to a good first
approximation) occur at random, meaning that mutations at one location are independent
of mutations at another location. Mutations occur without regard to whether they are
beneficial or deleterious to the organism. All of these different types of mutations are the
"stuff" of evolution and provide genetic variation for natural selection to work on.
Example of a survey of genetic variation
Suppose we extract DNA from 100 classmates. There are kits to do this, and it is
probably harder work to get the blood than it is to extract the DNA! Samples of around
10 ng of the DNA are added to small reaction tubes along with the other needed
ingredients for PCR. We use a pair of oligonucleotide primers to amplify a part of a
chromosome that includes a microsatellite repeat, like CACACACACACA. After the
PCR reaction, we take about 1/4 of the PCR product and load it onto a gel and perform
electrophoresis to separate fragments based on their size. Suppose there are only two
fragment sizes, S and L. Each individual will have either a band at the S location, or a
band at the L location, or they will have both the S and the L bands. In this case, we
assume that the people with only the S band are actually homozygotes, SS, and the
people only with the L band are homozygous LL, and the people with both bands are
heterozygous, SL.
We count up the three types and see 16 SS, 48 SL, and 36 LL. The next thing we
want to know is the frequency of each allele, S and L. First we count up the total
number of S alleles in our sample. We do this by noting that the S homozygotes have 2
copies, so these 16 individuals have a total of 32 copies of the S allele. Adding the 48 S
alleles that the heterozygotes have, we get (16 x 2) + 48 = 80 copies of the S allele.
Similarly, for the count of the L alleles we get 48 + (36 x 2) = 120. In our sample of 100
people there are 200 gene copies, because everybody gets an allele from their mother and
one from their father (except for the case of X chromosomes in males). To calculate the
allele frequency, we divide the allele counts by the total of all alleles in the sample. Thus,
the frequency of the S allele is 80/200 = 0.40, and the frequency of the L allele is 120/200
= 0.60. Note that 0.4 + 0.6 = 1, that is, the sum of the frequencies of all alleles is 1. It
turns out that this simple process of calculating allele frequencies is of enormous
importance in using DNA evidence in rape and murder cases. Let’s see why.
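The allele counting in the class survey generalizes to any number of alleles. A minimal sketch, assuming each genotype is written as two one-character allele symbols (a convention chosen here, not in the notes); the function name is hypothetical:

```python
from collections import Counter


def allele_frequencies(genotype_counts: dict[str, int]) -> dict[str, float]:
    """Allele frequencies from counts of diploid genotypes such as 'SS' or 'SL'."""
    allele_counts = Counter()
    for genotype, n_people in genotype_counts.items():
        for allele in genotype:           # two alleles per diploid individual
            allele_counts[allele] += n_people
    total = sum(allele_counts.values())   # 2 x number of individuals
    return {allele: count / total for allele, count in allele_counts.items()}


freqs = allele_frequencies({"SS": 16, "SL": 48, "LL": 36})
print(freqs)  # {'S': 0.4, 'L': 0.6}
```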
Forensic uses of DNA variation
Variation is not only important in evolution -- it is also important in solving many
violent crimes. The basic idea behind forensic DNA methods is that an individual’s DNA
is as unique to that individual as a fingerprint, and matches of samples from a crime
scene to the DNA of a suspect can be very incriminating. In practice, the way the method
works is as follows. First samples of blood or other fluids or tissues are taken from the
crime scene and DNA is extracted. Then DNA is extracted from a blood sample of the
suspect. These DNA samples are subjected to PCR to amplify a piece of the genome that
is highly variable. One kind of highly variable marker frequently used for this purpose is the
short tandem repeat polymorphism (STRP), the same kind of microsatellite described above. Repeats
like these frequently produce errors when the DNA is replicated, so that the
number of repeat copies increases or decreases. The heterozygosity of regions of DNA like this
can be as high as 95%.
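One common way to summarize how variable a marker is, is its expected heterozygosity: the chance that a randomly chosen individual carries two different alleles, which under random mating (the Hardy-Weinberg assumptions discussed below) is $1 - \sum_i p_i^2$. The sketch below uses a hypothetical STR locus with 20 equally common alleles to show how a figure like 95% can arise; the function name is also hypothetical.

```python
def expected_heterozygosity(freqs):
    """Chance a random individual carries two different alleles,
    assuming random union of gametes: H = 1 - sum(p_i ** 2)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical STR locus with 20 equally common alleles.
print(round(expected_heterozygosity([0.05] * 20), 4))  # 0.95
# The two-allele S/L example from the class survey.
print(round(expected_heterozygosity([0.4, 0.6]), 4))   # 0.48
```

More alleles, and more even allele frequencies, both push heterozygosity toward 1, which is what makes such markers informative in forensics.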
After the PCR is done, the resulting DNA fragments are separated by gel
electrophoresis, as we described above. If the electrophoretic gel patterns do not match,
the suspect is innocent, or at least the suspect’s DNA does not match the DNA from the
crime scene. But, what if the banding patterns on the gel match? Does this guarantee that
the suspect is guilty? Does it even guarantee that the suspect was at the crime scene?
Does it even guarantee that the DNA from the crime scene must have come from the
suspect? The answer to all three questions is NO, but it might be possible to say that it is
extremely likely that the DNA came from the suspect. In order to determine this, we need
to calculate the probability that a random person’s DNA would match by chance. To do
this, we need the Hardy-Weinberg principle!
The Hardy-Weinberg Principle
Independently in 1908 Godfrey Hardy and Wilhelm Weinberg determined that the
expected frequencies of genotypes should settle to stable values in just one generation.
This was of great importance for evolutionary biology, because it meant that genetic
variation is not rapidly lost in a population by a simple blending process (as assumed in
Darwin’s time), but rather that the discrete allelic types will continue to segregate in the
population.
The assumptions that Hardy and Weinberg made were:
Infinite population size
No mutation
No selection
Mendelian segregation
Random mating among the genotypes
The last assumption is, in this case, equivalent to assuming that the gametes come together at
random. If we consider alleles A and a, whose frequencies are $p$ and $q$ (where $p + q = 1$), then
the Hardy-Weinberg principle states that the offspring generation will have genotypes AA, Aa,
and aa in frequencies $p^2$, $2pq$, and $q^2$, and that these relative genotype frequencies will also be
found in all subsequent generations. To see where the Hardy-Weinberg principle comes from,
consider a Punnett square whose rows are the male gametes and whose columns are the female
gametes. Male gametes of type A have frequency $p$, and female gametes
of type A also have frequency $p$. The only way to get an AA offspring is to have a union of an A
gamete from the father and an A gamete from the mother. This happens with chance $p \times p = p^2$.
Similar reasoning is used for the other genotypes; Aa offspring arise in two ways (A from the
father and a from the mother, or the reverse), which is why their frequency is $2pq$. Although our
discussion has focused on two alleles at a locus, many loci have more than two alleles. The
Hardy-Weinberg principle is easily extended to more alleles; e.g., the terms of $(p + q + r)^2$ give
the genotype frequencies for a locus with 3 alleles under Hardy-Weinberg.
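The Punnett-square reasoning can be written as a short function that multiplies gamete frequencies. This sketch, whose function name is hypothetical, handles any number of alleles by expanding $(p + q + r + \dots)^2$: homozygotes get $p_i^2$ and heterozygotes get $2p_ip_j$.

```python
from itertools import combinations_with_replacement


def hardy_weinberg(freqs: dict[str, float]) -> dict[str, float]:
    """Expected genotype frequencies after one round of random mating."""
    genotypes = {}
    # Each unordered pair of alleles is one genotype (AA, Aa, aa, ...).
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        if a == b:
            genotypes[a + b] = freqs[a] ** 2              # homozygote: p_i^2
        else:
            genotypes[a + b] = 2 * freqs[a] * freqs[b]    # heterozygote: 2 p_i p_j
    return genotypes


two = hardy_weinberg({"A": 0.4, "a": 0.6})
print({g: round(f, 4) for g, f in two.items()})  # {'AA': 0.16, 'Aa': 0.48, 'aa': 0.36}
```

With $p = 0.4$ the expected frequencies are 0.16, 0.48, and 0.36, which match exactly the 16 SS, 48 SL, and 36 LL individuals in the hypothetical class survey, so that sample is in Hardy-Weinberg proportions.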
A convenient way to understand the Hardy-Weinberg principle is to plot a graph showing
the frequencies of the genotypes as a function of the allele frequencies. It turns out that the
frequencies of all 3 genotypes are parts of parabolas. This is easiest to see in the plot of the
heterozygote frequency. Note that the frequency of the heterozygotes is maximal when $p = q = 1/2$.
Also note that there is a point where there are equally many AA homozygotes as there are
heterozygotes. This happens when $p^2 = 2pq = 2p(1-p) = 2p - 2p^2$, or $3p^2 = 2p$. Dividing by $p$
(for $p > 0$), we get $3p = 2$, or $p = 2/3$.
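The claim that heterozygotes are most common when $p = q = 1/2$ follows from treating the heterozygote frequency $H$ as a function of $p$ and finding where its derivative is zero:

```latex
\begin{aligned}
H(p) &= 2pq = 2p(1-p) = 2p - 2p^2 \\
\frac{dH}{dp} &= 2 - 4p = 0 \quad\Longrightarrow\quad p = q = \tfrac{1}{2} \\
\frac{d^2H}{dp^2} &= -4 < 0, \qquad H_{\max} = 2 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{2}
\end{aligned}
```

So at a two-allele locus in Hardy-Weinberg proportions, at most half of the population can be heterozygous.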
Returning to our crime scene, we want to know the chance that a random individual
drawn from the population has the same genotype as does the blood from the crime scene. To
determine this, we must first calculate the population frequency of each allele. For the kind of
markers used in forensics, there are almost always more than 2 alleles. Suppose the frequency of
allele $A_i$ is $p_i$, and the frequency of allele $A_j$ is $p_j$. Then the Hardy-Weinberg principle tells us that
the expected frequency of the heterozygous genotype $A_iA_j$ ($i \neq j$) is $2p_ip_j$ (and that of a
homozygote $A_iA_i$ is $p_i^2$). For example, suppose the accused has genotype
$A_{10}A_{32}$, and the blood from the crime scene is also $A_{10}A_{32}$. The frequency of the $A_{10}$ allele in the
population is 0.02, and the frequency of the $A_{32}$ allele is 0.01. The expected frequency of the
genotype $A_{10}A_{32}$ in the general population is $2 \times 0.02 \times 0.01 = 0.0004$, or 1 in 2,500. This number, 0.0004, is
the probability that a random person drawn from the population has the genotype $A_{10}A_{32}$
(assuming that the population is in Hardy-Weinberg equilibrium). The critical question is: Is this
a low enough chance to convict the suspect? Typically several loci are examined at once, so
that the probability of a random match can be vanishingly low. DNA evidence is widely accepted
in courts today, but until fairly recently the questions of how to calculate population allele
frequencies, and what to do with the match probability, were a source of heated legal debate.
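The single-locus calculation, and the way several loci combine into a much smaller random-match probability, can be sketched as follows. The function names and the two extra loci are hypothetical, and multiplying across loci assumes they are inherited independently (unlinked and in linkage equilibrium), on top of the Hardy-Weinberg assumption the text already makes.

```python
import math


def locus_match_probability(p_i: float, p_j: float, same_allele: bool) -> float:
    """Hardy-Weinberg frequency of the matching genotype at one locus.

    Heterozygote AiAj: 2 * p_i * p_j; homozygote AiAi: p_i ** 2.
    """
    return p_i ** 2 if same_allele else 2 * p_i * p_j


def random_match_probability(loci) -> float:
    """Product of single-locus genotype frequencies.

    ASSUMPTION: the loci are inherited independently, so the product rule applies.
    """
    return math.prod(locus_match_probability(*locus) for locus in loci)


# The A10A32 example from the text: p10 = 0.02, p32 = 0.01.
one_locus = random_match_probability([(0.02, 0.01, False)])
print(f"{one_locus:.4g}")  # 0.0004

# Adding two hypothetical loci (one heterozygous, one homozygous).
three_loci = random_match_probability(
    [(0.02, 0.01, False), (0.10, 0.05, False), (0.08, 0.08, True)]
)
print(f"{three_loci:.3g}")  # 2.56e-08
```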
So how much DNA sequence variation is there?
Returning to the issue of evolution by natural selection (we did stray just a bit), the
methods of PCR, restriction enzymes, and DNA sequencing have been used to determine the
amount of DNA sequence variation in a wide variety of organisms. This variation is the
fundamental material upon which natural selection works, so it would seem to be very important
for the long-term evolutionary prospects for species. It turns out that the amount of variation in
populations varies enormously. DNA sequence variation is quantified by a term called the
nucleotide diversity, calculated as the chance that two copies of a gene will differ at a site (or
you can think of it as the fraction of sites that differ). The fruit fly, Drosophila melanogaster, has
an average nucleotide diversity of 0.005, meaning that about 1 base in every 200 differs from
individual to individual. Humans have an average nucleotide diversity of 0.001, or about 1
difference in every 1000 bases. In both humans and flies, enough different genes have been studied
to see that the level of nucleotide diversity actually varies by over ten-fold from one gene to
another. Reasons for this relate to mutation rates, levels of selective constraint, and rates of
recombination within the genes. The cheetah has a nucleotide diversity of less than 0.0001, which
is so low because this species underwent a severe bottleneck, reaching very low numbers. In
contrast, the bacterium Escherichia coli has an astronomically large population size, and its
average nucleotide diversity is about 0.03.
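Nucleotide diversity can be computed directly from aligned sequences as the average, over all pairs of sequences in a sample, of the fraction of sites at which the pair differs. A minimal sketch with four made-up 20-bp sequences and a hypothetical function name (real estimates use much longer sequences and larger samples, and some definitions apply small-sample corrections):

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average fraction of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites in every pair, then divide by (pairs x sites).
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


sample = [
    "ATGCTACGGAATTCCATGCA",
    "ATGCTACGGAATTCCATGCA",
    "ATGCTACGGAATTTCATGCA",  # differs from the first at one site
    "ATGCTACGCAATTCCATGCA",  # differs from the first at a different site
]
print(nucleotide_diversity(sample))  # 0.05
```

Here 6 pairwise differences spread over 6 pairs × 20 sites give a diversity of 0.05; by the same measure, the Drosophila value of 0.005 means about 1 difference per 200 sites between two randomly chosen copies.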
Many things influence the amount of DNA sequence variation, including the rate of
mutation, the population size, and the past effects of natural selection, and we will be spending
time in the next few lectures learning more about the inferences we can make about how
evolution works by determining the role that these forces play in the maintenance of genetic
variation.
Summary
1. Heritable variation is central to Darwin’s theory of evolution.
2. Variation occurs at the phenotypic and genotypic levels. Natural selection acts at the
phenotypic level, thereby changing the underlying gene frequencies.
3. DNA sequence variation ultimately originates with mutation, which can cause base
substitutions, or insertions, deletions, inversions, or duplications of DNA sequences. Many
methods exist for scoring DNA sequence variation directly. The Polymerase Chain Reaction
has proven remarkably versatile as a method for making many copies of a small segment of a
genome. It is useful for scoring restriction site variation and for scoring short tandem repeat
polymorphisms.
4. Genetic variation is quantified by allele frequencies.
5. The Hardy-Weinberg principle states that a population with allele frequencies $p$ and $q$ will
have stable genotype frequencies $p^2$ of AA, $2pq$ of Aa, and $q^2$ of aa if there is random mating
(and other assumptions are met). This principle can easily be extended to multiple alleles at
a locus.
6. The Hardy-Weinberg principle is used in forensic DNA analysis to calculate the probability
that a random individual’s DNA would match the forensic sample.