Making sense of genetic variation
• Is there an association between DNA sequence variation and the disease phenotype?
• What do the sequences tell us about human history?
• How has natural selection shaped diversity in the gene?
[Figure: DNA sequence diversity in a 9.7-kb region of the human lipoprotein lipase gene (sample labels: Native Americans, Finns, African Americans). Nickerson, et al. 1998 Nature Genetics 19, 233-240]
Population and Quantitative Genetics
Population genetics describes variation within and between species.
There are two major areas of interest:
• Describing the degree of genetic variation within and between individuals and/or populations
• Inferring the evolutionary mechanisms responsible for the origin and maintenance of genetic variation
Mutation is the source of the variation on which stochastic and deterministic factors can act.
The aims of population genetics
• To understand the link between genetic variation and phenotypic variation
– Is variation at this gene associated with disease susceptibility?
– Which loci contribute to the variation in hair colour?
• To investigate the evolutionary history of a species
– How long have these populations been separate?
– Which genes have experienced recent adaptive evolution?
• To learn about fundamental biological processes
– How does the recombination rate vary along the genome?
– What determines the mutation rate?
Genetic variation is of several types
(1) Visible, discrete variation
• Melanic and non-melanic Biston betularia moths in Great Britain
• Fruit color variation between different cultivars (populations) of chile peppers
• Color variation between snail shells (Amphidromus floresianus)
(2) Quantitative variation
The variation is of degree rather than kind.
(3) Chromosomal or cytogenetic variation
Genetic variation in chromosomal structure or constitution between individuals
Diversity of chromosomal structure:
• Insertions
• Deletions
• Inversions
• Translocations
Example: the classic third-chromosome inversion polymorphisms in Drosophila pseudoobscura
Diversity in chromosome number:
• Variation in entire chromosome sets (polyploidization)
• Variation in numbers of single chromosomes
Polyploidy is fairly common in plant species. Example: Tragopogon species in Washington, Oregon and Idaho, where three diploid species co-exist with tetraploid individuals that result from interspecific hybridization.
Transposable elements
• Movement of genomic elements throughout the genome, which may or may not be site-specific
(4) Protein variation
Allozymes are the protein products of allelic variants of a gene.
Allozymes can sometimes be discriminated by electrophoresis; the distinguishable forms are called electromorphs.
Electromorphs differ in charge (+ or -) and can be observed by starch gel electrophoresis.
Allozymes are codominant markers: the products of both alleles can be seen in a heterozygote.
Isozymes are like allozymes, but the variants may come from more than one gene.
Example: variation at the GPI locus in snail species
Another example: the Fast (F) and Slow (S) electromorphs of Drosophila melanogaster Adh
The use of allozyme data in population genetics was revolutionary in two ways:
• It showed that populations contain a significant amount of genetic variation
• The techniques used to detect this diversity could be applied to virtually any organism
Another method of detecting protein variation is immunological (e.g., blood group detection, the ABO allele system).
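(5) DNA sequence variation
All genetic variation stems from variation in DNA sequence.
Several techniques are used to detect it:
• Restriction fragment length polymorphisms (RFLP)
Variation in DNA sequence due to mutations in restriction sites, which changes the lengths of the fragments a restriction enzyme produces.
Normal EcoRI site: GAATTC
Mutant EcoRI site: GAGTTC (no longer cut)
[Figure: EcoRI sites giving 300-bp and 500-bp fragments in individuals A and B, with the corresponding gel lanes A and B]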
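A minimal sketch of why a site mutation changes fragment lengths, assuming a linear molecule and two made-up sequences (not the slide's data). EcoRI recognizes GAATTC and cuts between G and A; the helper names are hypothetical.

```python
# Minimal RFLP sketch: simulate an EcoRI digest of a linear DNA sequence.
# EcoRI recognizes GAATTC and cuts the top strand between G and A (G^AATTC).

SITE = "GAATTC"
CUT_OFFSET = 1  # the cut falls after the first base of the recognition site


def cut_positions(seq: str) -> list:
    """Return the 0-based positions where EcoRI cuts the top strand."""
    positions = []
    start = seq.find(SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = seq.find(SITE, start + 1)
    return positions


def fragment_lengths(seq: str) -> list:
    """Return the lengths of the fragments left after cutting at every EcoRI site."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: allele_b carries the GAATTC -> GAGTTC mutation in its second site.
allele_a = "A" * 10 + "GAATTC" + "T" * 10 + "GAATTC" + "C" * 10
allele_b = "A" * 10 + "GAATTC" + "T" * 10 + "GAGTTC" + "C" * 10

print(fragment_lengths(allele_a))  # [11, 16, 15]
print(fragment_lengths(allele_b))  # [11, 31]
```

Losing one site merges two fragments into one longer fragment, which is the band-size difference an RFLP gel detects.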
• Microsatellite sequences
Simple sequence repeats of di-, tri- or tetranucleotides
Scattered randomly throughout the genome (ubiquitous)
Very polymorphic, because replication slippage causes high mutation rates
For example:
AGGTCGGT(CTG)nGGTATCGG, with n = 1 to >100
[Figure: microsatellite gel of a willow population]
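A microsatellite allele is essentially its repeat count n. The sketch below, assuming reads that span both flanks of the example locus AGGTCGGT(CTG)nGGTATCGG, reads n with a regular expression; the reads themselves are hypothetical.

```python
import re
from typing import Optional

# Flanking sequences of the example locus AGGTCGGT(CTG)nGGTATCGG.
LEFT_FLANK = "AGGTCGGT"
RIGHT_FLANK = "GGTATCGG"
# Capture one or more tandem CTG units sitting between the two flanks.
LOCUS = re.compile(LEFT_FLANK + r"((?:CTG)+)" + RIGHT_FLANK)


def repeat_count(read: str) -> Optional[int]:
    """Return n, the number of CTG repeats at the locus, or None if the locus is absent."""
    match = LOCUS.search(read)
    if match is None:
        return None
    return len(match.group(1)) // 3


# Two hypothetical alleles of one diploid individual (n = 5 and n = 12).
allele_1 = "TT" + LEFT_FLANK + "CTG" * 5 + RIGHT_FLANK + "AA"
allele_2 = "TT" + LEFT_FLANK + "CTG" * 12 + RIGHT_FLANK + "AA"
genotype = sorted((repeat_count(allele_1), repeat_count(allele_2)))
print(genotype)  # [5, 12] -> a heterozygote
```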
An example: structuring of human populations
• Questions
– Is there significant natural structuring of genetic variation in humans?
– Does this structuring coincide with geographical boundaries?
• Data
– 377 autosomal microsatellite loci in 1056 individuals from 52 populations (Rosenberg et al. 2002)
• Model
– K 'hidden' populations, each in linkage equilibrium and Hardy-Weinberg equilibrium
• Estimation
– Estimate the population allele frequencies
– Estimate the most likely value of K
– Estimate the posterior probability of population membership for each individual
[Figure: inferred population membership of individuals grouped by region: Africa, Europe, Middle East, Asia, Oceania, America. Rosenberg et al. 2002, Science 298: 2381]
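A much simplified sketch of the "posterior probability for each individual" step, assuming the allele frequencies of the K populations are already known, no admixture, and equal priors. The real analysis estimates frequencies and K jointly; the frequencies below are invented for illustration. Within a population, Hardy-Weinberg equilibrium gives p² or 2pq per locus, and linkage equilibrium lets loci be multiplied.

```python
import math

# Hypothetical allele frequencies for K = 2 populations at two loci:
# freqs[k][locus] maps each allele to its frequency in population k.
freqs = [
    [{"A": 0.9, "B": 0.1}, {"X": 0.8, "Y": 0.2}],  # population 0
    [{"A": 0.2, "B": 0.8}, {"X": 0.3, "Y": 0.7}],  # population 1
]


def log_genotype_prob(freq: dict, allele1: str, allele2: str) -> float:
    """Log probability of an unordered diploid genotype under Hardy-Weinberg equilibrium.

    Assumes both allele frequencies are nonzero.
    """
    p1, p2 = freq[allele1], freq[allele2]
    if allele1 == allele2:
        return 2 * math.log(p1)      # homozygote: p^2
    return math.log(2 * p1 * p2)     # heterozygote: 2pq


def posterior(freqs: list, genotype: list) -> list:
    """Posterior probability that an individual comes from each population.

    Loci are combined by multiplication (linkage equilibrium) and every
    population gets the same prior probability.
    """
    log_liks = [
        sum(log_genotype_prob(pop[locus], a, b) for locus, (a, b) in enumerate(genotype))
        for pop in freqs
    ]
    best = max(log_liks)
    weights = [math.exp(ll - best) for ll in log_liks]  # rescale to avoid underflow
    total = sum(weights)
    return [w / total for w in weights]


individual = [("A", "A"), ("X", "Y")]
print(posterior(freqs, individual))  # about [0.939, 0.061]
```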
• Direct DNA sequencing
The ultimate assay for genetic variation is direct sequencing of the DNA.
The first systematic assay of variation using direct sequencing was of the Drosophila melanogaster Adh gene.
Variable single-base sites found this way are single nucleotide polymorphisms, or SNPs.
Levels of Genetic Variation:
Genetic variation can always be viewed at different levels, from molecular to phenotypic variation.
Adh gene DNA sequence variation
(a single exon 4 nucleotide change; a threonine-to-lysine codon change requires C to A at the second codon position)
↓
ADH protein sequence variation
(threonine to lysine change in the amino acid sequence of the protein)
↓
Adh allozyme variation in electrophoretic mobility
(Fast [F] and Slow [S] polymorphism)
MUTATIONS:
The Ultimate Source of Genetic Variation
• Genetic variation ultimately traces back to mutations
• Some mutations are large-scale
• Others occur at the smaller, DNA scale
Three types of mutations at the DNA level:
• Insertions
• Deletions
• Substitutions
Insertions:
AGGTCGT → AGGGTCGTATCGT
Large insertions (>100 bp) can be caused by mobile transposable element sequences; these include SINEs (such as Alu elements), LINEs, transposons and retrotransposons.
Deletions:
AGGTCGTGCTCGT → AGGTCGT
Caused largely by unequal crossing over or by excision of inserted transposable elements.
Substitutions:
• Transitions: nucleotide changes between similar nucleotide types
purine ↔ purine: A ↔ G
pyrimidine ↔ pyrimidine: C ↔ T
• Transversions: nucleotide changes between different nucleotide types
purine ↔ pyrimidine: A, G ↔ C, T
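The transition/transversion rule above can be checked mechanically. This is a small sketch, assuming upper- or lower-case A/C/G/T input and two already-aligned, equal-length sequences; the example sequences are made up.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(old: str, new: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    old, new = old.upper(), new.upper()
    pair = {old, new}
    if old == new or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a substitution between two different bases: {old}->{new}")
    # Transitions stay within one chemical class; transversions cross classes.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_substitutions(seq1: str, seq2: str) -> dict:
    """Count transitions and transversions between two aligned, equal-length sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        if a != b:
            counts[substitution_type(a, b)] += 1
    return counts


# Hypothetical aligned sequences differing at three sites.
print(count_substitutions("AGGTCGT", "GGCTCAT"))  # {'transition': 2, 'transversion': 1}
```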
• Mutations according to functional impact (coding-region mutations)
Synonymous or silent mutation:
. . . GGG . . . → . . . GGC . . .
Gly → Gly
Nonsynonymous or replacement mutation:
. . . GGG . . . → . . . GCG . . .
Gly → Ala
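Whether a coding change is synonymous depends only on the genetic code. A minimal sketch, assuming the standard genetic code and single codons written 5' to 3' in DNA letters; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids (one-letter codes) for codons ordered
# TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(old: str, new: str) -> str:
    """Label a codon change synonymous (same amino acid) or nonsynonymous (different)."""
    old_aa, new_aa = CODON_TABLE[old.upper()], CODON_TABLE[new.upper()]
    return "synonymous" if old_aa == new_aa else "nonsynonymous"


print(classify_codon_change("GGG", "GGC"))  # synonymous (Gly -> Gly)
print(classify_codon_change("GGG", "GCG"))  # nonsynonymous (Gly -> Ala)
```

The same table shows that GGG → GAG would give glutamate (E) rather than alanine, and that a threonine/lysine difference such as the Adh F/S change needs a second-position change (for example ACG vs AAG).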
QUANTIFYING THE VARIATION
Variation at single loci
To give a sufficient description of the genetic constitution of a population at a single locus, we need to specify two things:
• which genotypes are present
• how many of each genotype there are in the population
Example: an Australian Aborigine population typed for the MN blood group. This is protein variation in red blood cell antigens, detected using immunological techniques. Here are the data:
Blood group   Number of individuals   Genotype frequency   Notation
MM                      22                   0.030             P
MN                     216                   0.296             H
NN                     492                   0.674             Q
Total                  730                   1.000             P + H + Q = 1
Genotype frequencies give a description of a population at a single instant in time (the present).
• They are NOT a description of the population as a breeding group, since whole genotypes are not transmitted between generations.
• Alleles (sometimes incorrectly referred to by molecular biologists as genes) are what is transmitted across generations.
• We can therefore describe the genetic constitution of a population by specifying the allele frequencies.
Blood group   Number of individuals   Number of M alleles   Number of N alleles
MM                      22                     44                     0
MN                     216                    216                   216
NN                     492                      0                   984
Total                  730                    260                  1200
Computing allele frequencies (730 diploid individuals carry 2 × 730 = 1460 alleles):
• frequency of M allele
f(M) = p = 260/1460 = 0.178
• frequency of N allele
f(N) = q = 1200/1460 = 0.822
You can also compute the allele frequencies directly from the genotype frequencies:
p = P + H/2  (e.g. 0.030 + 0.296/2 = 0.178)
q = Q + H/2
Note that p + q = 1.
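Both routes to p can be sketched with the MN counts from the table; the function names are illustrative only.

```python
def genotype_frequencies(n_mm: int, n_mn: int, n_nn: int) -> tuple:
    """Return (P, H, Q): the frequencies of MM, MN and NN genotypes."""
    total = n_mm + n_mn + n_nn
    return n_mm / total, n_mn / total, n_nn / total


def allele_frequencies(n_mm: int, n_mn: int, n_nn: int) -> tuple:
    """Return (p, q): the frequencies of the M and N alleles, counted directly."""
    n_alleles = 2 * (n_mm + n_mn + n_nn)    # every diploid individual carries two alleles
    n_m = 2 * n_mm + n_mn                   # MM contributes two M alleles, MN one
    p = n_m / n_alleles
    return p, 1 - p


P, H, Q = genotype_frequencies(22, 216, 492)
p, q = allele_frequencies(22, 216, 492)
print(round(P, 3), round(H, 3), round(Q, 3))  # 0.03 0.296 0.674
print(round(p, 3), round(q, 3))               # 0.178 0.822
print(round(P + H / 2, 3))                    # 0.178, the same p from genotype frequencies
```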
Variation at many loci (multilocus measures)
There are two ways to readily quantify variation when you are looking at more than one locus:
• the fraction of loci that are polymorphic
P = proportion of polymorphic loci
• the heterozygosity of the population averaged over all loci
H = mean heterozygosity
Suppose we have data for 5 allozyme loci from Zea mays.

                Genotype
Locus     FF    FS    SS    Total    f(F)     f(FS)
Adh       32    16     2     50      0.80     0.32
Mdh        1     9    40     50      0.11     0.18
Pgi-A     50     0     0     50      1.00*    0.00
Pgm        8    24    18     50      0.40     0.48
Ldh        0     1    49     50      0.01*    0.02
* monomorphic locus (frequency of the less common allele < 5%)
The frequencies of FS heterozygotes are also known as the "observed heterozygosities".
• Proportion of polymorphic loci:
The loci that show variation are Adh, Mdh and Pgm.
Both Pgi and Ldh are considered monomorphic (frequency of the less common allele < 5%).
P = 3/5 = 0.60
• Mean heterozygosity
Take the average of the frequencies of the heterozygous genotypes across all loci:
H = [f(FS)Adh + f(FS)Mdh + f(FS)Pgi + f(FS)Pgm + f(FS)Ldh]/5
= (0.32 + 0.18 + 0.00 + 0.48 + 0.02)/5 = 0.20
From Hartl and Clark (chapter 2)
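Both multilocus measures can be computed from the Zea mays genotype counts. This is a sketch, assuming two alleles (F and S) per locus and the 5% criterion stated above, with a locus counted as polymorphic when its less common allele is at or above 5% (the exact boundary handling is an assumption).

```python
# Genotype counts (FF, FS, SS) for the five Zea mays allozyme loci in the example.
COUNTS = {
    "Adh": (32, 16, 2),
    "Mdh": (1, 9, 40),
    "Pgi-A": (50, 0, 0),
    "Pgm": (8, 24, 18),
    "Ldh": (0, 1, 49),
}


def locus_summary(ff: int, fs: int, ss: int) -> tuple:
    """Return (f(F), f(FS)) for one locus: the F allele frequency and observed heterozygosity."""
    n = ff + fs + ss
    return (2 * ff + fs) / (2 * n), fs / n


def is_polymorphic(f_f: float, threshold: float = 0.05) -> bool:
    """5% criterion: polymorphic only if the less common allele reaches `threshold`."""
    return min(f_f, 1 - f_f) >= threshold


summaries = {locus: locus_summary(*counts) for locus, counts in COUNTS.items()}
polymorphic = [locus for locus, (f_f, _) in summaries.items() if is_polymorphic(f_f)]
P = len(polymorphic) / len(summaries)
H = sum(f_fs for _, f_fs in summaries.values()) / len(summaries)
print(polymorphic)               # ['Adh', 'Mdh', 'Pgm']
print(round(P, 2), round(H, 2))  # 0.6 0.2
```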
Measures of DNA polymorphisms
• Measures of gene diversity such as the proportion of polymorphic loci and heterozygosity are difficult to interpret for DNA sequence data.
• DNA-level variation can be quite extensive.
• Some measures of variation are particularly suited to DNA sequence data:
• Number of alleles in a sample, na
Count the number of different alleles ("haplotypes") in your sample.
• Number of segregating sites in the sample
A segregating site is a nucleotide site that is polymorphic in your sample.
K ("big K") is the total number of segregating sites.
Note that K depends on the length of the sequence, L: the longer the sequence, the larger K can be.
We can normalize K by dividing it by the sequence length:
S = K/L
This gives a length-normalized estimate of the number of segregating sites, S.
Both na and S are used to develop more sophisticated measures of DNA variation in molecular population genetics.
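na, K and S can all be read off an alignment. A minimal sketch, assuming the sequences are already aligned to the same length and treating every character (including any gap symbol) as an ordinary state, with no missing-data handling; the four sequences are hypothetical.

```python
def sequence_diversity(haplotypes: list) -> tuple:
    """Return (n_a, K, S) for aligned sequences of equal length.

    n_a: number of distinct alleles (haplotypes) in the sample
    K:   number of segregating (polymorphic) sites
    S:   K divided by the sequence length L
    """
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all sequences must be aligned to the same length")
    n_a = len(set(haplotypes))
    # A column of the alignment is segregating if it holds more than one base.
    K = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    return n_a, K, K / length


# Four hypothetical aligned sequences of length L = 10.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACCTAT"]
print(sequence_diversity(sample))  # (3, 3, 0.3)
```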
The raw material of population genetics
• Population genetics is the study of naturally occurring genetic variation
• Genetic variation comes in all shapes and sizes:
– Re-sequencing v. SNPs v. microsatellites
– Recombining v. partially linked v. unlinked markers
– Single gene v. multiple loci v. whole genome
– Single population v. multiple populations v. multiple species
• Which data type you collect (and analyse) depends on the questions you want to ask