BIO227
Introduction to Statistical Genetics
Lecture 3:
Introduction to population genetics
What have we studied
Background
• Structure of Human Genome
• DNA Variants and disease
Mendelian Inheritance
• Mendel’s first law
• Mendel’s second law
• Mode of inheritance
• Genetic models for Mendelian and complex disease
Overview of Today’s Material
Population Genetics Concepts:
• Estimation and Inference About Allele Frequencies
• Hardy Weinberg Equilibrium
• Population Substructure
Measuring Genetic Contribution to Traits:
• Recurrence Risk Ratios
• Heritability
Allele Frequencies
• Definition: Allele frequency = proportion of chromosomes in the population carrying the allele of interest (e.g. a disease allele).
• Allele frequencies are compared in association studies to detect disease genes.
• Allele frequencies tell us about the probability of observed genotypes.
Hardy Weinberg Equilibrium (HWE)
Theorem: Allele frequencies in a population remain constant if no evolutionary forces exist.
Requirements for Hardy-Weinberg equilibrium:
• Large population
• Random mating
• No mutation
• No migration
• No selection
Departures from HW equilibrium provide a mechanism to study evolution.
Hardy Weinberg Equilibrium (HWE)
Rule: If you know the allele frequency, use HWE to calculate genotype probabilities.
Last week: Use Mendel’s laws to compute P(offspring genotype = G_o | parents’ genotypes = G_m, G_f).
Suppose we do not know the parents’ genotypes, but we do know P(allele) = p.
How to compute: P(person’s genotype = G | allele frequencies at the locus)?
HWE
Two-allele system: A and a, with probabilities p and q = 1-p.
If HWE holds, genotype probabilities are:
Genotype      AA (X=2)   Aa (X=1)   aa (X=0)
Probability   p^2        2pq        q^2
X is B(2,p): E(X) = 2p, Var(X) = 2pq
Many assumptions are required for HWE to hold.
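The mean and variance of X follow directly from the three genotype probabilities. A short worked derivation, assuming only that HWE holds and q = 1-p:

```latex
\begin{align*}
E(X)   &= 0\cdot q^2 + 1\cdot 2pq + 2\cdot p^2 = 2p(q + p) = 2p \\
E(X^2) &= 0\cdot q^2 + 1\cdot 2pq + 4\cdot p^2 = 2pq + 4p^2 \\
\mathrm{Var}(X) &= E(X^2) - E(X)^2 = 2pq + 4p^2 - 4p^2 = 2pq
\end{align*}
```

These agree with the usual Binomial(2, p) results, np and np(1-p) with n = 2.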
IMPLICATIONS OF HWE
• Suppose the population is in HWE; then it will remain in HWE after a round of random mating.
• Suppose the population is not in HWE; then it will reach HWE after one round of random mating.
• The allele frequency does not change from one generation to the next.
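These implications can be checked numerically. The sketch below is only an illustration: it enumerates all parental genotype pairs under random mating and applies Mendel's first law. The function names and the starting genotype frequencies are hypothetical, not taken from the lecture.

```python
from itertools import product

# P(parent transmits allele A), indexed by number of A alleles in the genotype.
TRANSMIT_A = {2: 1.0, 1: 0.5, 0: 0.0}

def next_generation(freqs):
    """freqs maps 2, 1, 0 (copies of A) to P(AA), P(Aa), P(aa).

    Returns offspring genotype frequencies after one round of random mating.
    """
    child = {2: 0.0, 1: 0.0, 0: 0.0}
    for gm, gf in product(freqs, repeat=2):
        pair = freqs[gm] * freqs[gf]  # random mating: parents pair independently
        tm, tf = TRANSMIT_A[gm], TRANSMIT_A[gf]
        child[2] += pair * tm * tf
        child[1] += pair * (tm * (1 - tf) + (1 - tm) * tf)
        child[0] += pair * (1 - tm) * (1 - tf)
    return child

def allele_freq(freqs):
    """Frequency of allele A implied by the genotype frequencies."""
    return freqs[2] + 0.5 * freqs[1]

# Hypothetical starting population with no heterozygotes, far from HWE.
start = {2: 0.5, 1: 0.0, 0: 0.5}
gen1 = next_generation(start)  # HWE proportions for p = 0.5
gen2 = next_generation(gen1)   # unchanged from gen1
```

Here `gen1` has the p^2, 2pq, q^2 proportions for p = 0.5, `gen2` equals `gen1`, and `allele_freq` returns 0.5 in every generation.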
Bio 227 lecture 3
How to Detect Failure of HWE:
Testing for HWE in a Sample
• Estimate allele frequencies
• Compute expected genotype frequencies assuming HWE holds
• Use Pearson chi-square test
Hardy-Weinberg Equilibrium (HWE)
• Test for HWE based on Pearson chi-square test:
Genotype   AA     Aa         aa         Total
Observed   n_AA   n_Aa       n_aa       n
Expected   np^2   2np(1-p)   n(1-p)^2   n
• Estimate p as (2n_AA + n_Aa) / (2n)
• The chi-square test has 1 degree of freedom. (Why?)
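The test can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `hwe_chi_square` is hypothetical, the counts are those of the unselected CCR5 sample discussed in this lecture, and the p-value uses the identity P(chi-square on 1 df > x) = erfc(sqrt(x/2)). It assumes both alleles are observed, so that no expected count is zero.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square test of HWE from genotype counts.

    Returns (p_hat, expected counts, chi-square statistic, p-value on 1 df).
    """
    n = n_AA + n_Aa + n_aa
    p_hat = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q_hat = 1 - p_hat
    expected = (n * p_hat ** 2, 2 * n * p_hat * q_hat, n * q_hat ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_hat, expected, chi2, p_value

# Counts for ++, +Δ32, Δ32Δ32 in the unselected CCR5 sample (n = 1000).
p_hat, expected, chi2, p_value = hwe_chi_square(795, 190, 15)
```

On the degrees of freedom: there are three genotype categories, one constraint because the counts sum to n, and one parameter (p) estimated from the data, which leaves 3 - 1 - 1 = 1.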
When is HWE useful?
The failure of HWE can reveal a lot about sample features:
- Selection of subjects related to genotype
- Population Substructure
- Genotyping errors
Subject Selection: CCR5 and HIV
(Hartl and Jones)
• CCR5 is a protein on the surface of white blood cells (T cells), involved in the immune system
• With ‘normal’ genotypes, it enables HIV entry into T cells
• A deletion of 32 base pairs causes incorrect amino acids to be encoded; this inhibits HIV binding and infection
• The deletion is associated with a lack of AIDS
• Variant is present in many subpopulations world-wide
HWE at CCR5 Receptor: Unselected sample of 1000 subjects
➢Test for HWE based on Pearson chi-square test:
Genotype   ++      +Δ32    Δ32Δ32   Total
Observed   795     190     15       1000
Expected   792.1   195.8   12.1     1000
Δ32 allele frequency (MAF) = 0.11
➢Chi-square = 0.9, 1 df
Test subjects without AIDS
Genotype   ++    +Δ32   Δ32Δ32   Total
Observed   175   33     4        212
Expected   173   37     2        212
Frequency of Δ32 allele: p = 0.096
• Chi-square is marginal (2.5), but AIDS-free individuals show an excess of Δ32 homozygotes (4 observed vs. 2 expected).
• Sample sizes in the table above are small; an exact test should be used.
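One common form of exact test conditions on the observed allele counts and uses the distribution of the number of heterozygotes under HWE. The sketch below assumes that approach; the slide does not say which exact test is intended, and the function names are hypothetical. The p-value sums the probabilities of all heterozygote counts that are no more likely than the observed one.

```python
from fractions import Fraction
from math import factorial

def het_probability(n_het, n_A, n):
    """P(n_het heterozygotes | n_A copies of allele A among n people) under HWE."""
    n_B = 2 * n - n_A
    n_AA = (n_A - n_het) // 2
    n_BB = (n_B - n_het) // 2
    # Number of ways to assign the three genotypes to n individuals.
    arrangements = factorial(n) // (
        factorial(n_AA) * factorial(n_het) * factorial(n_BB)
    )
    numerator = arrangements * 2 ** n_het * factorial(n_A) * factorial(n_B)
    return Fraction(numerator, factorial(2 * n))

def hwe_exact_test(n_AA, n_Aa, n_aa):
    """Returns (exact p-value, distribution of heterozygote counts)."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_minor = min(n_A, 2 * n - n_A)
    # The heterozygote count has the same parity as the allele count.
    probs = {
        h: het_probability(h, n_A, n)
        for h in range(n_minor % 2, n_minor + 1, 2)
    }
    p_obs = probs[n_Aa]
    p_value = float(sum(pr for pr in probs.values() if pr <= p_obs))
    return p_value, probs

# Counts for ++, +Δ32, Δ32Δ32 among the 212 subjects without AIDS.
p_value, probs = hwe_exact_test(175, 33, 4)
```

Exact rational arithmetic via `Fraction` avoids overflow and rounding in the large factorials; for much larger samples a log-scale computation would be the more practical choice.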
Population Substructure:
Stratification / Admixture / Inbreeding
• Population stratification: distinct subgroups within a population.
• Population admixture: mating among individuals of different genetic origin over multiple generations. Usually occult.
• Inbreeding: mating between ‘close’ relatives.
Stratification
Dog Breeds and the Albumin Alleles
Test for HWE using entire population:
Genotype   SS      SF      FF      Total
Observed   463     376     301     1140
Expected   371.8   558.4   209.8   1140
p_F = 0.43
Highly significant χ^2 (not shown)
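The table shows far fewer heterozygotes than HWE predicts (376 observed vs. 558.4 expected). Pooling subgroups with different allele frequencies produces exactly this kind of deficit, even when each subgroup is itself in HWE. The sketch below illustrates the effect with two hypothetical subpopulations; the allele frequencies 0.1 and 0.9 and the equal weights are invented for illustration and are not the dog breed data.

```python
def hwe_genotype_freqs(p):
    """(P(AA), P(Aa), P(aa)) under HWE when allele A has frequency p."""
    q = 1 - p
    return (p * p, 2 * p * q, q * q)

def pool(subpops):
    """subpops: list of (weight, allele frequency) pairs, each in HWE.

    Returns the pooled genotype frequencies and the pooled allele frequency.
    """
    total = sum(weight for weight, _ in subpops)
    pooled = [0.0, 0.0, 0.0]
    for weight, p in subpops:
        for i, g in enumerate(hwe_genotype_freqs(p)):
            pooled[i] += (weight / total) * g
    p_pooled = pooled[0] + 0.5 * pooled[1]
    return tuple(pooled), p_pooled

# Hypothetical: two equally sized subgroups with allele frequencies 0.1 and 0.9.
observed, p_bar = pool([(1, 0.1), (1, 0.9)])
expected = hwe_genotype_freqs(p_bar)
het_deficit = expected[1] - observed[1]  # 0.5 - 0.18 = 0.32
```

In this example the pooled allele frequency is 0.5, so HWE predicts 50% heterozygotes, yet only 18% are present in the pooled population.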
New Topic: How do we measure the extent to which a trait is genetic?
Two primary measures:
• Recurrence Risk Ratios (dichotomous traits)
• Heritability (quantitative traits)
Recurrence Risk Ratio
Definitions:
• Proband: Subject selected into the sample because of disease status.
• P(disease) = K
• Relative of type R (parent, sib, etc.)
Recurrence risk ratio, defined for a dichotomous disease trait:
λ_R = P(relative of type R diseased | proband diseased) / P(disease)
• If the disease has a genetic basis, what should λ_R be?
• How should λ_R vary with R?
• If the disease is NOT genetic, what should λ_R be?
How do we use λ_R?
• Justifies doing a genetic study of the disease
• λ_R is the basis for power calculations for many types of linkage analysis
• Compare estimated λ_R to different genetic models
• We will look at how λ_R is calculated in simple Mendelian models
Notation
Disease phenotype: Y (Y=1 is affected; Y=0 is unaffected)
Genotype at disease locus: X = 0, 1, 2 (dd, Dd, DD)
Penetrance functions: f_x = P(Y=1 | genotype = x)
R: Denotes a relative of the proband
p: Frequency of the D allele
p(X): Frequency of genotypes, p(DD, Dd or dd genotype)
Hardy Weinberg Equilibrium (HWE): X is Binomial(2,p)
p(dd) = (1-p)^2
p(Dd) = 2p(1-p)
p(DD) = p^2
For simple Mendelian models, P(disease) depends only on the genotype at a single locus; no other factors influence disease.
Denominator: K = P(disease) = f_0*(1-p)^2 + f_1*2p(1-p) + f_2*p^2 = ∑ f_x * p_x
Assumes penetrance functions, allele frequency, HWE
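The formula for K translates directly into code. A minimal sketch, where the function name `prevalence`, the allele frequency 0.1 and the fully penetrant models are hypothetical choices for illustration:

```python
def prevalence(p, f0, f1, f2):
    """K = P(disease) under HWE, with penetrances f_x = P(Y=1 | X=x)."""
    q = 1 - p
    genotype_probs = (q * q, 2 * p * q, p * p)  # X = 0, 1, 2 copies of D
    return sum(f * g for f, g in zip((f0, f1, f2), genotype_probs))

# Hypothetical disease allele frequency p = 0.1, fully penetrant models.
K_recessive = prevalence(0.1, 0, 0, 1)  # only DD affected: p^2
K_dominant = prevalence(0.1, 0, 1, 1)   # Dd and DD affected: 2p(1-p) + p^2
```

With p = 0.1 the recessive model gives K = 0.01 and the dominant model gives K = 0.19, which shows how strongly the assumed genetic model drives the denominator of λ_R.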
What does λ_R depend on?
What about the numerator?
P(relative of type R diseased | proband diseased) = P(both diseased)/K
λ_R = P(both diseased)/K^2
What does P(both diseased) depend on?
Calculating λ_R
Depends on degree of relationship R, penetrance functions and Mendel’s Laws
Example: Consider the sibling recurrence risk ratio and a recessive Mendelian model:
Show that λ_S = [(1+p)/(2p)]^2
Step 1: Calculate K
Step 2: Calculate P(both sibs have disease)
Step 3: Calculate λ_S
Calculating λ_S
Denominator: K^2 = ?
Numerator: (table not reproduced in this transcript) Values in table represent probability of an affected child
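One way to work through the three steps is sketched below. It assumes a fully penetrant recessive model (f_2 = 1, f_0 = f_1 = 0), random mating and HWE in the parents, with q = 1-p. Only parental mating types that can produce a DD child contribute, and the two sibs are independent given the parental genotypes.

```latex
\begin{align*}
K &= P(\text{disease}) = p^2 \\
P(\text{both sibs affected})
  &= \underbrace{p^4}_{DD \times DD}\cdot 1^2
   + \underbrace{4p^3 q}_{DD \times Dd}\cdot \left(\tfrac12\right)^2
   + \underbrace{4p^2 q^2}_{Dd \times Dd}\cdot \left(\tfrac14\right)^2 \\
  &= p^2\left(p^2 + pq + \tfrac{q^2}{4}\right)
   = p^2\left(p + \tfrac{q}{2}\right)^2
   = p^2\left(\tfrac{1+p}{2}\right)^2 \\
\lambda_S &= \frac{P(\text{both sibs affected})}{K^2}
   = \frac{p^2\left(\frac{1+p}{2}\right)^2}{p^4}
   = \left[\frac{1+p}{2p}\right]^2
\end{align*}
```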
Recurrence Risk Ratio
Recurrence risk to relatives of type R: How to calculate?
1) Assume a specific genetic model (e.g. single gene, dominant)
2) Assume a frequency p for the disease allele
3) Assume 3 penetrance functions: f_0, f_1, f_2
4) Simple to compute K = P(disease in population)
5) Assume random mating and HWE to get all possible genotypes for common ancestors
6) Use Mendel’s Laws to get offspring genotypes and phenotypes, and to compute P(both relatives affected)
7) Easiest when using Parent-Offspring or Sibling for R, and deterministic Mendelian models
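The steps above can be sketched for siblings as follows. This is an illustrative implementation under the stated assumptions (single locus, random mating, HWE in the parents, sibs independent given parental genotypes); the function names and the value p = 0.1 are hypothetical.

```python
from itertools import product

def genotype_probs(p):
    """HWE probabilities for X = 0, 1, 2 copies of the disease allele D."""
    q = 1 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}

def child_probs(gm, gf):
    """P(child has x copies of D | parental genotypes), by Mendel's first law."""
    tm, tf = gm / 2, gf / 2  # probability each parent transmits D
    return {
        0: (1 - tm) * (1 - tf),
        1: tm * (1 - tf) + (1 - tm) * tf,
        2: tm * tf,
    }

def sibling_lambda(p, f):
    """Sibling recurrence risk ratio for penetrances f = (f_0, f_1, f_2)."""
    g = genotype_probs(p)
    K = sum(f[x] * g[x] for x in g)
    both = 0.0
    for gm, gf in product(g, repeat=2):
        # Probability that one child of this mating is affected.
        affected = sum(f[x] * pr for x, pr in child_probs(gm, gf).items())
        both += g[gm] * g[gf] * affected ** 2
    return both / K ** 2

p = 0.1  # hypothetical disease allele frequency
lam_recessive = sibling_lambda(p, (0, 0, 1))
closed_form = ((1 + p) / (2 * p)) ** 2  # recessive-model formula from the lecture
```

For the fully penetrant recessive model the enumeration matches the closed form [(1+p)/(2p)]^2. Setting all three penetrances equal, so that genotype has no effect on disease, gives a ratio of 1.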
Heritability
➢Originally defined for continuous traits; can be adapted to dichotomous disease traits
➢Heritability is defined as the percent of total trait variance ‘explained’ by genes
➢Requires a very specific genetic model explaining how genes affect outcome
➢Requires data on relatives to estimate
➢Can also be estimated using GWAS data
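As one illustration of what "a very specific genetic model" can mean, consider a single-locus additive model. This is an assumption made here for the example, not a model stated on the slide: the trait is Y = μ + aX + ε, where X is Binomial(2, p) under HWE, a is the hypothetical per-allele effect, and ε is an independent residual with variance σ_e^2.

```latex
\begin{align*}
\mathrm{Var}(Y) &= a^2\,\mathrm{Var}(X) + \sigma_e^2 = 2p(1-p)\,a^2 + \sigma_e^2 \\
h^2 &= \frac{\text{variance explained by the locus}}{\text{total trait variance}}
     = \frac{2p(1-p)\,a^2}{2p(1-p)\,a^2 + \sigma_e^2}
\end{align*}
```

Under this model the heritability depends on the allele frequency as well as the effect size, so the same locus can explain different fractions of variance in different populations.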