BIO227 Introduction to Statistical Genetics
Lecture 3: Introduction to population genetics

What have we studied
• Background
• Structure of the human genome
• DNA variants and disease
• Mendelian inheritance
• Mendel’s first law
• Mendel’s second law
• Mode of inheritance
• Genetic models for Mendelian and complex disease

Overview of Today’s Material
Population genetics concepts:
• Estimation and inference about allele frequencies
• Hardy-Weinberg equilibrium
• Population substructure
Measuring genetic contribution to traits:
• Recurrence risk ratios
• Heritability

Allele Frequencies
• Definition: allele frequency = proportion of chromosomes in the population carrying the allele of interest (e.g. a disease allele)
• Allele frequencies are compared in association studies to detect disease genes
• Allele frequencies tell us about the probability of observed genotypes

Hardy-Weinberg Equilibrium (HWE)
Theorem: Allele frequencies in a population remain constant if no evolutionary forces exist.
Requirements for Hardy-Weinberg equilibrium:
• Large population
• Random mating
• No mutation
• No migration
• No selection
Departures from HW equilibrium provide a mechanism to study evolution.

Hardy-Weinberg Equilibrium (HWE)
Rule: If you know the allele frequency, use HWE to calculate genotype probabilities.
Last week: use Mendel’s laws to compute P(offspring genotype = Go | parents’ genotypes = Gm, Gf).
Suppose we do not know the parents, but we do know P(allele) = p.
How to compute: P(person’s genotype = G | allele frequencies at the locus)?

HWE
Two-allele system: A and a, with probabilities p and q = 1 - p.
If HWE holds, the genotype probabilities are:

Genotype      AA (X=2)   Aa (X=1)   aa (X=0)
Probability   p^2        2pq        q^2

X, the number of A alleles, is Binomial(2, p), so E(X) = 2p and Var(X) = 2pq.
Many assumptions are required for HWE to hold.

Implications of HWE
• If a population is in HWE, it will remain in HWE after a round of random mating.
• If a population is not in HWE, it will reach HWE after one round of random mating.
• The allele frequency does not change from one generation to the next.
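The HWE genotype probabilities and the binomial moments of the allele count X can be checked numerically. The following is a minimal Python sketch, not part of the original lecture; the function names (`hwe_genotype_probs`, `allele_count_moments`, `random_mating_step`) are hypothetical and chosen only for illustration. It assumes a single autosomal two-allele locus with the same allele frequency in both sexes.

```python
def hwe_genotype_probs(p):
    """Return P(AA), P(Aa), P(aa) under HWE, where p is the frequency of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def allele_count_moments(p):
    """Mean and variance of X = number of A alleles, computed from the genotype table."""
    probs = hwe_genotype_probs(p)
    counts = {"AA": 2, "Aa": 1, "aa": 0}
    mean = sum(counts[g] * probs[g] for g in probs)
    var = sum((counts[g] - mean) ** 2 * probs[g] for g in probs)
    return mean, var


def random_mating_step(g_AA, g_Aa, g_aa):
    """Genotype frequencies after one round of random mating.

    Assumption (illustrative): the input frequencies sum to 1 and are the
    same in both sexes. The allele frequency is p = g_AA + g_Aa / 2, and
    random union of gametes gives the HWE proportions for that p.
    """
    p = g_AA + 0.5 * g_Aa
    return hwe_genotype_probs(p)


if __name__ == "__main__":
    print(hwe_genotype_probs(0.11))
    print(allele_count_moments(0.11))        # compare with 2p and 2pq
    print(random_mating_step(0.5, 0.0, 0.5)) # population with no heterozygotes
```

Starting from a population with no heterozygotes (frequencies 0.5, 0, 0.5), `random_mating_step` returns the HWE proportions for p = 0.5, and the allele frequency recomputed from the new genotype frequencies is still 0.5.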
How to Detect Failure of HWE: Testing for HWE in a Sample
• Estimate allele frequencies
• Compute expected genotype frequencies assuming HWE holds
• Use the Pearson chi-square test

Hardy-Weinberg Equilibrium (HWE)
• Test for HWE based on the Pearson chi-square test:

Genotype    AA      Aa          aa          Total
Observed    n_AA    n_Aa        n_aa        n
Expected    np^2    2np(1-p)    n(1-p)^2    n

• Estimate p as (2 n_AA + n_Aa) / (2n)
• The chi-square test has 1 degree of freedom. (Why?)

When is HWE useful?
The failure of HWE can reveal a lot about sample features:
- Selection of subjects related to genotype
- Population substructure
- Genotyping errors

Subject Selection: CCR5 and HIV (Hartl and Jones)
• CCR5 is a protein on the surface of white blood cells (T cells), involved in the immune system
• With the ‘normal’ genotype, CCR5 enables HIV entry into T cells
• A deletion of 32 base pairs (Δ32) causes incorrect amino acids to be encoded, which inhibits HIV binding and infection
• The deletion is associated with absence of AIDS
• The variant is present in many subpopulations worldwide

HWE at CCR5 Receptor: Unselected sample of 1000 subjects
➢ Test for HWE based on the Pearson chi-square test:

Genotype    ++       +Δ32     Δ32Δ32    Total
Observed    795      190      15        1000
Expected    792.1    195.8    12.1      1000

Estimated Δ32 frequency (minor allele frequency, MAF): p-hat = 0.11
➢ Chi-square = 0.9, 1 df

Test subjects without AIDS

Genotype    ++     +Δ32    Δ32Δ32    Total
Observed    175    33      4         212
Expected    173    37      2         212

Frequency p = 0.096
• The chi-square is marginal (2.5), but AIDS-free individuals show an excess of Δ32 homozygotes (4 observed against 2 expected).
• The sample sizes in the table above are small, so an exact test should be used.

Population Substructure: Stratification / Admixture / Inbreeding
• Population stratification: distinct subgroups within a population.
• Population admixture: mating among individuals of different genetic origin over multiple generations. Usually occult.
• Inbreeding: mating between ‘close’ relatives

Stratification

Dog Breeds and the Albumin Alleles
Test for HWE using the entire population:

Genotype    SS       SF       FF       Total
Observed    463      376      301      1140
Expected    371.8    558.4    209.8    1140

p_F = 0.43
Highly significant chi-square (not shown)

New Topic: How do we measure the extent to which a trait is genetic?
Two primary measures:
• Recurrence risk ratios (dichotomous traits)
• Heritability (quantitative traits)

Recurrence Risk Ratio
Definitions:
• Proband: subject selected into the sample because of disease status
• P(disease) = K
• Relative of type R (parent, sib, etc.)
The recurrence risk ratio is defined for a dichotomous disease trait as

λ_R = P(relative of type R diseased | proband diseased) / P(disease)

If the disease has a genetic basis, what should λ_R be? How should λ_R vary with R?
If the disease is NOT genetic, what should λ_R be?

How do we use λ_R?
• It justifies doing a genetic study of the disease
• λ_R is the basis for power calculations for many types of linkage analysis
• Compare the estimated λ_R to different genetic models
• We will look at how λ_R is calculated in simple Mendelian models

Notation
• Disease phenotype: Y (Y=1 is affected; Y=0 is unaffected)
• Genotype at disease locus: X = 0, 1, 2 (dd, Dd, DD)
• Penetrance functions: f_x = P(Y=1 | genotype = x)
• R: denotes a relative of the proband
• p: frequency of the D allele
• p(X): frequency of genotypes, p(DD, Dd or dd genotype)
• Hardy-Weinberg equilibrium (HWE): X is Binomial(2, p), so
  p(dd) = (1-p)^2
  p(Dd) = 2p(1-p)
  p(DD) = p^2

For Simple Mendelian Models
P(disease) depends only on the genotype at a single locus; no other factors influence disease.
Denominator:

K = P(disease) = f_0 (1-p)^2 + f_1 2p(1-p) + f_2 p^2 = Σ_x f_x p(x)

This assumes known penetrance functions, a known allele frequency, and HWE.

What does λ_R depend on?
What about the numerator?
P(relative of type R diseased | proband diseased) = P(both diseased) / K
λ_R = P(both diseased) / K^2
What does P(both diseased) depend on?

Calculating λ_R
λ_R depends on the degree of relationship R, the penetrance functions, and Mendel’s laws.
Example: consider the sibling recurrence risk ratio and a recessive Mendelian model.
Show that λ_S = [(1+p) / (2p)]^2
Step 1: Calculate K
Step 2: Calculate P(both sibs have disease)
Step 3: Calculate λ_S

Calculating λ_S
Denominator: K^2 = ?
Numerator: values in the table represent the probability of an affected child

Recurrence Risk Ratio
Recurrence risk to relatives of type R: how to calculate?
1) Assume a specific genetic model (single gene, dominant)
2) Assume a frequency p for the disease allele
3) Assume 3 penetrance functions: f_0, f_1, f_2
4) It is then simple to compute K = P(disease in population)
5) Assume random mating and HWE to get all possible genotypes for common ancestors
6) Use Mendel’s laws to get offspring genotypes and phenotypes, and to compute P(both relatives affected)
7) The calculation is easiest when R is parent-offspring or sibling, with deterministic Mendelian models

Heritability
➢ Originally defined for continuous traits; can be adapted to dichotomous disease traits
➢ Heritability is defined as the percentage of total trait variance ‘explained’ by genes
➢ Requires a very specific genetic model explaining how genes affect the outcome
➢ Requires data on relatives to estimate
➢ Can also be estimated using GWAS data
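
The recipe for the sibling recurrence risk ratio λ_S can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own code: the function names are hypothetical, and it assumes a single biallelic disease locus, random mating with parental genotypes in HWE, and penetrances f = (f_0, f_1, f_2) that depend only on the genotype, so that siblings are independent given their parents' genotypes.

```python
from itertools import product


def genotype_probs(p):
    """HWE probabilities for X = number of D alleles (0, 1, 2)."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def offspring_probs(x_mother, x_father):
    """Mendelian P(child genotype = 0, 1, 2 | parental genotypes)."""
    tm = x_mother / 2.0  # probability the mother transmits D
    tf = x_father / 2.0  # probability the father transmits D
    return [(1 - tm) * (1 - tf), tm * (1 - tf) + (1 - tm) * tf, tm * tf]


def prevalence(p, f):
    """K = sum over genotypes x of f_x * p(x)."""
    return sum(fx * px for fx, px in zip(f, genotype_probs(p)))


def sibling_recurrence_risk_ratio(p, f):
    """lambda_S = P(both sibs affected) / K^2, summing over parental mating types."""
    g = genotype_probs(p)
    both = 0.0
    for xm, xf in product(range(3), repeat=2):
        child = offspring_probs(xm, xf)
        # Probability that one child of this mating is affected
        p_affected = sum(fx * cx for fx, cx in zip(f, child))
        # Sibs are independent given the parents, hence the square
        both += g[xm] * g[xf] * p_affected ** 2
    return both / prevalence(p, f) ** 2


if __name__ == "__main__":
    p = 0.1
    recessive = (0.0, 0.0, 1.0)  # fully penetrant recessive model
    print(sibling_recurrence_risk_ratio(p, recessive))
    print(((1 + p) / (2 * p)) ** 2)  # closed form from the slide
```

For a fully penetrant recessive model with p = 0.1, both printed values are 30.25 up to floating-point rounding, in agreement with λ_S = [(1+p) / (2p)]^2. Setting all three penetrances equal (a non-genetic disease) gives λ_S = 1.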