This presentation was originally prepared by
C. William Birky, Jr.
Department of Ecology and Evolutionary Biology
The University of Arizona
It may be used with or without modification for
educational purposes but not commercially or for profit.
The author does not guarantee accuracy and will not
update the lectures, which were written when the course
was given during the Spring 2007 semester.
Population and Evolutionary Genetics
The Nature and Origin of Biological Diversity
Subject of evolutionary biology is nature and origin of biological diversity:
• diversity of individuals within species
• diversity of species in the biosphere
Biological diversity matters! Population and evolutionary genetic data and
theory are used intensively in conservation, agriculture, medicine, physical
anthropology, and genomics.
Population and evolutionary genetics studies diversity at the level of genes:
• measure amount and kind and look for patterns. (How many genes or bp are
different in two humans? Are some genes more variable than others?)
• explain it in terms of mutation, random drift, natural selection, sexual
reproduction, migration, etc. (Why are the genes for fibrinopeptides more
variable than the genes for cytochrome c? Why do introns evolve faster than
exons?)
Population and evolutionary genetics reconstructs the history of life. (Are
humans more closely related to chimpanzees or gorillas? Did Homo sapiens and
H. neanderthalensis interbreed? When did the multicellular animals arise?)
Population genetics is the study of
• genetic diversity among individuals, within species (or populations)
• mechanisms that determine the amount of diversity
• short-term evolutionary changes in organisms (short-term = thousands or
tens of thousands of generations)
Evolutionary genetics is the study of the
• genetic differences between species
• long-term evolution of genes and genomes
Roughly speaking
• population genetics compares different copies of a gene within a species
• evolutionary genetics compares a gene in two different species
Both are governed by the same basic forces, of which the most important are:
• mutation (universal)
• random genetic drift (universal)
• natural selection (universal)
• sexual reproduction (in sexual species)
• migration (in subdivided populations)
Although some processes are universal, they are not equally important!
Charles Darwin and Alfred Russel Wallace
• Knew about evolution (as did many others at that time).
• Knew that all organisms share a common ancestor and that life can be
portrayed as a tree.
• Knew that there is great variation among species, but didn’t know that
variation originates by mutation.
• Knew that variation is inherited, but didn’t know about genes, had poor
theory of inheritance.
• Knew that natural selection acts on variation to cause adaptation in
organisms, but didn’t know about some other forms of selection, and didn’t
know about random drift.
Darwin would be happily amazed to see what we have learned about
evolutionary genetics. For example, my colleague Michael Nachman and his
students, especially Hopi Hoekstra (now at Harvard University), showed that
pocket mice living on lava flows are very dark while those living on sandy areas
are light, and that this adaptive difference is due to one or a few mutations in a
specific pigmentation gene.
Start with an example.
Marty Kreitman cloned and sequenced 11 different copies of the Adh gene
encoding alcohol dehydrogenase from different strains of Drosophila melanogaster.
Nowadays the genes are amplified by PCR and the sample sizes are much larger,
e.g. 100 flies. The sequences must be aligned before comparison, but this is easily
done because the differences between them are small, and the alignments are
unambiguous.
Computers will translate the DNA sequences into protein sequences. Below is an
alignment of exon 4 from three of Kreitman's sequences and from two different
species, D. willistoni and D. virilis.

The D. simulans sequence is identical to one of the D. melanogaster sequences. D.
simulans is a close relative of D. melanogaster, based on similarity of other genes and
of morphology. Phylogeny:
Note that the evolutionary lineages leading to melanogaster and virilis split before the ones
leading to melanogaster and simulans. Differences reflect mutations that accumulated
along the branches. Differences are proportional to branch lengths:
melanogaster-simulans = a + b = 2a
melanogaster-virilis = a + c + d = 2(a + c)
Variation within a species:

Drosophila melanogaster is polymorphic = has ≥ 2 alleles of Adh. The sequences
fall into two classes, those with threonine (T) and those with lysine (K) at site
#25. This difference was previously detected by using starch gel
electrophoresis. All D. melanogaster populations are polymorphic, with both
Fast and Slow alleles.
[Figure: electrophoresis gel of Adh showing Slow (S) and Fast (F) bands for the genotypes AdhF/AdhF and AdhF/AdhS.]

In the entire coding sequence (exons 1 + 2 + 3 + 4), there is only one site where different
copies of the Adh gene from D. melanogaster differ in amino acid sequence.
Now look at the DNA sequence:
The bp difference at site 74, C
vs. A, is the one that causes the
difference between threonine
and lysine at amino acid 25.
This is a change in the second
codon position.
There are a number of other
differences between the three
Adh genes from D.
melanogaster, and also between
the three species. Most of these
are synonymous differences =
differences that change from
one codon to a synonymous
codon and hence don’t change an
amino acid. E.g. site 15: codon
difference ATC vs. ATT, both code
for isoleucine. Change in third
codon position, where most
changes are synonymous/silent.
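The difference between synonymous and nonsynonymous changes can be checked against the standard genetic code. The sketch below builds the standard codon table and classifies a pair of codons. ATC vs. ATT is the site-15 example from the slide; the pair ACG vs. AAG for the threonine/lysine difference is an illustrative assumption, since the slide gives only the amino acids and the codon position, not the full codons.

```python
from itertools import product

# Standard genetic code, written in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify(codon_a, codon_b):
    """Return 'synonymous' if both codons encode the same amino acid."""
    same = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return "synonymous" if same else "nonsynonymous"


# Third-position change from the slide: both codons encode isoleucine (I).
print("ATC vs ATT:", classify("ATC", "ATT"))

# Assumed codons for a second-position C/A difference: threonine (T) vs lysine (K).
print("ACG vs AAG:", classify("ACG", "AAG"))
```

The second pair differs only at the second codon position, which is where a C vs. A difference turns threonine into lysine.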
Patterns in molecular genetic variation to explain:
• DNA sequences > amino acid sequences.
• synonymous > nonsynonymous
Adh gene has introns and flanking sequences as well as exons.
• flanking ≈ introns > exons
Conserved sequences = (nearly) invariant
e.g.
ATG start codon
Promoter
All mutations very detrimental or lethal.
• Functionally important regions tend to be conserved.
So we can look for conserved regions, and they are likely to be important.
This illustrates recurring theme:
3 kinds of mutations with respect to natural selection:
• Neutral: no effect on fitness (number of offspring produced by individual
with mutation)
• Detrimental (= deleterious): decrease fitness, usually eliminated by natural
selection
• Advantageous: increase fitness, favored by natural selection, rare
Neutral variation is most common type, because most non-neutral mutations
are detrimental and individuals carrying them reproduce less.
Mutations are more often detrimental in genes or regions that are less variable
(more conserved).
We will spend the remainder of my lectures talking about biological diversity
within species and about differences between species: how biological diversity is
measured; and the mechanisms that govern it: mutation, random genetic drift,
and natural selection.
Outline of remaining lectures:
Population genetics
• Parameters used to describe diversity within species
• More on patterns and phenomena
• Factors determining amount and patterns of diversity
    Mutation
    Random drift
    Selection (directional, balancing)
    Sex (if have time)
Evolution
• Measuring rates of evolution
• Patterns seen in rates
• How rates are determined by mutation, drift, and selection
• Genome evolution
    Variation in number and arrangement of genes
    How new genes arise by gene duplication
Why some people think this subject is hard:
1) Must be comfortable with stochastic models as well as deterministic; hopefully
we have already got past that hurdle.
2) Must learn to think about populations of individuals and genes instead of
individuals.
3) Mathematical models … but we will consider only simple ones and try to get
intuitive understanding of them.
MEASURING DIVERSITY WITHIN SPECIES (POPULATIONS)
Defining populations:
A population is usually defined as a group of individuals of the same species. In
a sexual species, the members of the population are usually able to mate with
each other, at least potentially. Otherwise the definition is somewhat arbitrary,
being whatever group of organisms one is studying at the moment. When my
student Jody Banks studied Texas bluebonnets, the population was the entire
species. Some people study the population of bacteria in a chemostat, while
others analyze the people in one small religious group.
Populations may be strongly subdivided into local populations connected by
infrequent migration. We won’t cover this.
Gene (Allele) and Genotype Frequencies
A gene or population is polymorphic if there are ≥ 2 alleles of the gene in the
population. Otherwise it is monomorphic.
If a locus is polymorphic, then we must ignore individuals and treat genes and
genotypes in a new way.
1908
G. H. Hardy (prominent British mathematician, who responded to a question
raised by a geneticist)
Wilhelm Weinberg (German physician interested in human hereditary diseases)
gene frequency = frequency of a particular allele of a gene in the population
Alleles could be identified by electrophoresis or by sequencing.
Hypothetical example:
•Population of Drosophila melanogaster is examined for genetic variation at the Adh locus.
•Sample of 100 flies subjected to electrophoresis.
•Calculate genotype and gene frequencies:
Genotype   Number of flies   Genotype frequency   Number of genes   Contribution to allele frequencies
FF         40                0.40                 80 F              0.4 F
FS         40                0.40                 40 F, 40 S        0.2 F, 0.2 S
SS         20                0.20                 40 S              0.2 S
Total      100               1                    200               0.6 F, 0.4 S (sum = 1)
The allele frequencies could have been calculated in two ways:
(1) There are 2 × 100 = 200 genes in the sample gene pool. 80 + 40 = 120 are F and
40 + 40 = 80 are S. Frequency of F = f(F) = 120/200 = 0.6; frequency of S = f(S) = 80/200
= 0.4.
(2) f(F) = f(FF) + f(FS)/2 = 0.4 + 0.2 = 0.6; f(S) = f(SS) + f(FS)/2 = 0.2 + 0.2 = 0.4
Either way, it is absolutely crucial to check that the frequencies add up to 1: 0.6 +
0.4 = 1. If they don't, either you made an error in the calculations, or there are
more than two alleles and you forgot to count some of them; i.e. you screwed up.
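The gene-counting method (1) can be written as a short function. This is only a sketch of the arithmetic for a two-allele locus in a diploid; the function and argument names are made up for illustration.

```python
def allele_frequencies(n_ff, n_fs, n_ss):
    """Return (f(F), f(S)) from the numbers of FF, FS and SS individuals."""
    total_genes = 2 * (n_ff + n_fs + n_ss)  # two gene copies per diploid fly
    f_genes = 2 * n_ff + n_fs               # FF carries two F, FS carries one
    s_genes = 2 * n_ss + n_fs               # SS carries two S, FS carries one
    return f_genes / total_genes, s_genes / total_genes


# Hypothetical Adh sample from the slide: 40 FF, 40 FS, 20 SS.
f_F, f_S = allele_frequencies(40, 40, 20)

# The crucial check: the frequencies must add up to 1.
assert abs(f_F + f_S - 1.0) < 1e-12
print(f_F, f_S)
```

For the sample above this gives 120/200 and 80/200, the same values as method (2).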
In population and evolutionary genetics, we never think or talk about
individuals or families or individual crosses or mating, only about populations
and gene or genotype frequencies (and occasionally about frequencies of different
types of matings or other events).
The abstraction process can be visualized as first making the flies disappear,
leaving only the genes (two from each fly). Then the genes are mixed up.
This collection of genes is sometimes called the gene pool. You can visualize it as
a swimming pool filled with genes if you wish.
“The trouble with the gene pool is that there is no life guard.”
Measures of Allelic Diversity
(1) The observed heterozygosity of a gene in a population is the frequency of
individuals that are heterozygous for the gene.
Problem: it depends on whether the population is inbreeding or outbreeding.
If Drosophila melanogaster were an extreme inbreeder, it would have mainly two
genotypes, FF and SS, and observed heterozygosity would be ≈ 0.
(2) The expected heterozygosity of a gene is the probability that two copies of
the gene, drawn at random from the population, are different alleles.
Terminology:
f(x) = frequency of x
P(x) = probability of x
e.g. 2 alleles:
f(A in gene pool) = p
f(a in gene pool) = q
We can calculate the probability of drawing different genotypes (pairs of alleles)
as follows:
P(draw A) = f(A) = p
P(draw a) = f(a) = q
P(draw A & A) = p^2
P(draw a & a) = q^2
P(draw A & a) = 2pq
Note that this is equivalent to a real population that is random mating. (Hardy-Weinberg law)
expected heterozygosity = h = 2pq = 1 - (p^2 + q^2)
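As a quick check on the algebra, the three Hardy-Weinberg probabilities can be computed for any allele frequency. The sketch below uses an arbitrary value of p chosen only for illustration; the function name is hypothetical.

```python
def hardy_weinberg(p):
    """Probabilities of drawing AA, Aa and aa when f(A) = p and f(a) = 1 - p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


p = 0.6  # arbitrary allele frequency, assumed for this example
expected = hardy_weinberg(p)
h = expected["Aa"]

# The three genotype probabilities cover every possible draw, so they sum to 1,
# which is why h can also be written as 1 - (p^2 + q^2).
assert abs(sum(expected.values()) - 1.0) < 1e-12
assert abs(h - (1.0 - expected["AA"] - expected["aa"])) < 1e-12
print(expected)
```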
Inbreeding Produces Homozygotes
e.g. selfing, starting with all Aa
Each selfed Aa parent produces 1/4 AA, 1/2 Aa, 1/4 aa; AA and aa parents breed true.
Generation   AA     Aa    aa
0            0      1     0
1            1/4    1/2   1/4
2            3/8    1/4   3/8
3            7/16   1/8   7/16
How many heterozygotes after n generations?
(1/2)^n
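The selfing series can be reproduced by applying the same rule every generation: homozygotes breed true, and each Aa parent gives 1/4 AA, 1/2 Aa, 1/4 aa. The sketch below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def self_one_generation(freqs):
    """Apply one generation of selfing to the (AA, Aa, aa) frequencies."""
    hom_A, het, hom_a = freqs
    # Homozygotes breed true; heterozygotes split 1/4 : 1/2 : 1/4.
    return (hom_A + het / 4, het / 2, hom_a + het / 4)


freqs = (Fraction(0), Fraction(1), Fraction(0))  # generation 0: all Aa
for generation in range(1, 4):
    freqs = self_one_generation(freqs)
    print(generation, *freqs)
```

The heterozygote frequency is halved each generation, which is where (1/2)^n comes from.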
When there are just two alleles, it is common to designate the two frequencies as
p and q. But especially in molecular biology, we often deal with more than two
alleles, and so we often use x1, x2, ... xn for n different alleles.
When there are > 2 alleles, it is easier to calculate h "backwards". If one has m
different alleles and x_i is the frequency of the ith allele,
h = 1 - \sum_{i=1}^{m} x_i^2 = 1 - (x_1^2 + x_2^2 + ... + x_m^2)
e.g. for imaginary Adh data above:
observed heterozygosity = 0.40
h = (2)(0.6)(0.4) = 0.48 or 1 - (0.6^2 + 0.4^2) = 0.48
Note: The fact that h is greater than the observed heterozygosity in this example
suggests that this population may be slightly inbred.
This is why expected heterozygosity is a better measure of diversity; a
population could have many different alleles and genotypes and still have zero
observed heterozygosity if it was strongly inbred.
Kreitman's sample of 11 copies of Adh gene had 6 S and 5 F alleles, so gene
frequencies were f(S) = 6/11 ≈ 0.55 and f(F) = 5/11 ≈ 0.45.
h = 0.495
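The "backwards" formula works for any number of alleles, so it is convenient to compute h directly from allele counts. A minimal sketch, with a hypothetical function name:

```python
def expected_heterozygosity(counts):
    """h = 1 - sum of squared allele frequencies, from a list of allele counts."""
    total = sum(counts)
    return 1.0 - sum((count / total) ** 2 for count in counts)


# Imaginary Adh sample: 120 F genes and 80 S genes.
print(round(expected_heterozygosity([120, 80]), 3))

# Kreitman's sample: 6 S and 5 F alleles.
print(round(expected_heterozygosity([6, 5]), 3))
```

With the unrounded frequencies 6/11 and 5/11, h = 60/121, or about 0.496; the value 0.495 is what you get from 2pq with the rounded frequencies 0.55 and 0.45.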
Human population is obviously not perfectly random mating, but is close
enough so that in many cases, the observed and expected heterozygosity are very
similar.
Expected heterozygosity is high for many genes, on the order of 0.1 – 0.5.
h = 0.1 means P(two random alleles differ in charge) = 0.1.
Sequence Diversity
Allelic heterozygosity based on electrophoresis or amino acid sequences or
morphology actually underestimates genetic diversity. In particular, as we saw
above, amino acid sequences don't detect synonymous base sequence
differences.
Kreitman actually sequenced over 2.6 kb from a larger sample of 11 genes and
found 8 different alleles, 7 singlets and 1 represented 3 times.
h = 0.86
cf. h = 0.50 detected with electrophoresis
Nucleotide Diversity, a Measure of Sequence Diversity
Use parameter analogous to expected heterozygosity:
π = P(a site has a different bp in 2 random copies of a gene)
= proportion of bps different in 2 random copies of a gene
= mean pairwise sequence difference
π is calculated by aligning the sequences of a sample of genes A, B, C, etc. and
comparing all possible pairs (A and B, A and C, B and C, etc.). For each pair,
determine the proportion of sites that are different. Then π is the average of
these proportions.
Kreitman’s 11 Adh genes: π = 0.007 differences/bp
π is smaller than h because it is differences per bp, and there are many bp in the
gene.
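The recipe for nucleotide diversity (compare all pairs, take the proportion of differing sites, then average) is easy to sketch in code. The four short sequences below are invented for illustration and are not real Adh data; the sketch also assumes the sequences are already aligned and contain no gaps.

```python
from itertools import combinations


def proportion_different(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def nucleotide_diversity(sequences):
    """Mean of the pairwise proportions over all possible pairs."""
    pairs = list(combinations(sequences, 2))
    return sum(proportion_different(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 10-bp alignment, made up for this example.
toy_alignment = ["ATGGCCATTA", "ATGGCCATTA", "ATGGCTATTA", "ATGACTATTG"]
print(nucleotide_diversity(toy_alignment))
```

Four sequences give six pairs; identical copies contribute zero to the average, just as identical alleles contribute nothing to h.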
Calculation of π is tedious by hand for large samples. Best to use calculator with statistical
functions or use Excel spreadsheet. Example of data from small sample of freshwater invertebrate
Keratella cochlearis. This is the nucleotide diversity of 590 bp of the mitochondrial cox1 gene.
Pairwise proportion of sites that differ:
      2         3         4         5
1     0.00338   0.00169   0.03723   0.04230
2     -         0.00169   0.04061   0.04569
3               -         0.03892   0.04399
4                         -         0.00508
mean = 0.026058
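The mean for the Keratella sample can be checked by averaging the ten pairwise values. The assignment of each value to a pair of sequences below is my reading of the flattened table and should be treated as an assumption; the mean itself does not depend on that assignment.

```python
# Pairwise proportions of differing sites among the five cox1 sequences.
pairwise = {
    (1, 2): 0.00338, (1, 3): 0.00169, (1, 4): 0.03723, (1, 5): 0.04230,
    (2, 3): 0.00169, (2, 4): 0.04061, (2, 5): 0.04569,
    (3, 4): 0.03892, (3, 5): 0.04399,
    (4, 5): 0.00508,
}

# Five sequences give 5 * 4 / 2 = 10 pairs.
assert len(pairwise) == 10

pi = sum(pairwise.values()) / len(pairwise)
print(round(pi, 6))
```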
Sequence diversity is high. If diversity is high when measured at the gene level,
it is not surprising that it is also high at the sequence level. For humans,
nucleotide diversity π ≈ 7 × 10^-4 differences/bp
Interpretation:
For any 2 random individuals:
0.07% of bp's differ
0.7 bp differs in gene of 1 kbp
2.1 × 10^6 bp's differ in genome of 3 × 10^9 bp's
We have long known that no two individuals of a species are genetically
identical, unless they are members of a clone (and even then they will differ in
several mutations). But these data suggest that two humans chosen at random
will differ in a large proportion of all genes, perhaps more than 1/3, and in about
two million base pairs!
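The interpretation is just nucleotide diversity multiplied by sequence length. A small sketch of that arithmetic, using the values quoted on the slide (the function name is hypothetical):

```python
def expected_differences(pi, length_bp):
    """Expected number of differing bp between two random copies of a sequence."""
    return pi * length_bp


pi_human = 7e-4      # differences per bp, as quoted on the slide
gene_length = 1_000  # a gene of 1 kbp
genome_size = 3e9    # a genome of 3 × 10^9 bp

print("percent of bp that differ:", pi_human * 100)
print("differences in a 1 kbp gene:", expected_differences(pi_human, gene_length))
print("differences in the genome:", expected_differences(pi_human, genome_size))
```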
What is the expected difference between the two copies of the genome that
you got from Mom and Dad?
2.1 × 10^6 bp's
Phenomena to be explained:
• Different species have different diversities.
E.g. humans > cheetahs
• Different genomes have different diversities.
E.g. human mitochondrial genes > human nuclear genes
• Different genes or regions of genome have different diversity.
E.g. pseudogenes > noncoding regions > genes
E.g. 3rd codon position > 1st codon position > 2nd codon position