Nathaniel Dang
403221488
CSM124, Spring 2008

Motivation

Many factors go into calculating disease risk: environmental, genetic, and sometimes chance. If we can effectively approximate an individual’s increased disease risk due to genetic variation, we can take several actions (where possible), including:
- More frequent or earlier screening
- Pre-symptomatic medication aimed at reducing one or more risk factors
- Reducing the environmental risk factors, including through lifestyle changes
- Studying these variations in depth: finding correlated SNPs and studying the proteins encoded by the genes, which leads to a greater understanding

Goals

- Given the probabilities of mutations, the relative risks of those mutations, and a baseline risk level, what are the prevalence and the genetic contribution to risk?
- What has more of a genetic effect on overall risk: more SNPs of lower relative risk or probability, or fewer SNPs of higher relative risk or probability?

Background

We learned in class:

$$p_a^+ = \frac{\gamma\, p_a}{(\gamma - 1)\, p_a + 1}$$

where γ is the relative risk. We then used p_a and p_a^+ to find the power of the association, given a specific value for γ. This assumes a very low prevalence F.

Now, we wish to find how SNPs affect our disease risk: how much more likely does a SNP make it that we will develop a disease? To simplify the problem, assume that SNPs are not correlated, and thus that their effects on disease risk are completely independent.

Single-SNP Case

The probability of an individual having the disease without the mutation is given by R = P(+|g1=0), where R is our baseline level of risk.
Then, the probability of having the disease given a mutation is:
P(+|g1=1) = γ1R
Thus, with P1 the probability of the mutation, the prevalence in the entire population is:
F = P1γ1R + (1-P1)R
Then, the genetic contribution can be found by dividing by the baseline risk:
F/R = P1γ1 + (1-P1)

Two SNP Case

The probability of an individual having the disease without either mutation is given by P(+ | g1=0, g2=0) = R, where R is our baseline level of risk.
Then, the probability of having the disease given a mutation at SNP1 but not SNP2 is:
P(+ | g1=1, g2=0) = γ1R
And the probability of having the disease given a mutation at SNP2 but not SNP1 is:
P(+ | g1=0, g2=1) = γ2R
Because the effects are assumed independent, mutations at both SNPs multiply the risk:
P(+ | g1=1, g2=1) = γ1γ2R
Thus, the prevalence in the entire population is:
F = P1P2γ1γ2R + P1(1-P2)γ1R + P2(1-P1)γ2R + (1-P1)(1-P2)R
Then, the genetic contribution can be found by dividing by the baseline risk:
F/R = P1P2γ1γ2 + P1(1-P2)γ1 + P2(1-P1)γ2 + (1-P1)(1-P2)

What about n-SNP cases?

Again, holding the p’s and γ’s constant across all SNPs, it can be shown that for n SNPs,

$$F/R = \sum_{g \in \{0,1\}^n} \prod_{i=1}^{n} \bigl(p_i g_i + (1 - p_i)(1 - g_i)\bigr)\, \gamma_i^{g_i}$$

where g_i is the genotype for SNP i.

Implementation

Treat each SNP’s genotype as a binary value: 1 for mutation, 0 for normal.
For example, applying the formula from the previous slide to a 2-SNP case gives, as seen before:
F/R = (1-P1)(1-P2) + P1γ1(1-P2) + P2γ2(1-P1) + P1γ1P2γ2
where the four terms correspond to the genotype vectors g1g2 = 00, 10, 01, and 11. Treating all pi’s and γi’s as the same values P and γ, this simplifies to:
F/R = (1-P)^2 + Pγ(1-P) + (1-P)Pγ + (Pγ)^2

So, to find F/R for n SNPs, we generate a binary matrix with n columns whose rows are the binary representations of all values from 0 to 2^n - 1 (decimal). Each row represents one of the terms in the F/R equation. We iterate over the rows, counting the number of ‘1’ and ‘0’ values: for each ‘1’ in the row, the term is multiplied by Pγ, and for each ‘0’, the term is multiplied by (1-P). Finally, we sum the terms for all rows to get the final value.

Methods

We calculated F/R (the genetic contribution) for several cases:
- Holding the number of SNPs constant, how do varying minor allele frequencies affect F/R?
- Holding the number of SNPs constant, how do varying relative risks for each SNP affect F/R?
- Finally, holding the minor allele frequencies and relative risks constant for all SNPs, how does the number of SNPs affect F/R?
Compare and contrast the three scenarios.

Results: Varying γ and p (same values for all SNPs)

In each table, rows are relative risks (γ), columns are minor allele frequencies (p), and entries are F/R.

One SNP:

γ \ p     0.1     0.2     0.3     0.4     0.5
1.0       1.0     1.0     1.0     1.0     1.0
2.0       1.1     1.2     1.3     1.4     1.5
3.0       1.2     1.4     1.6     1.8     2.0
5.0       1.4     1.8     2.2     2.6     3.0
10.0      1.9     2.8     3.7     4.6     5.5
25.0      3.4     5.8     8.2    10.6    13.0

Two SNPs:

γ \ p      0.1      0.2      0.3      0.4      0.5
1.0        1.0      1.0      1.0      1.0      1.0
2.0       1.21     1.44     1.69     1.96     2.25
3.0       1.44     1.96     2.56     3.24      4.0
5.0       1.96     3.24     4.84     6.76      9.0
10.0      3.61     7.84    13.69    21.16    30.25
25.0     11.56    33.64    67.24   112.36   169.00

Ten SNPs:

γ \ p         0.1         0.2         0.3         0.4         0.5
1.0           1.0         1.0         1.0         1.0         1.0
2.0          2.59        6.19       13.78       28.92       57.66
3.0          6.19        28.9       109.9      357.04     1024.00
5.0         28.92      357.04     2655.99    14116.70    59049.00
10.0     6.13e+02    2.96e+04    4.81e+05    4.24e+06    2.53e+07
25.0     2.06e+05    4.31e+07   1.374e+09    1.79e+10    1.38e+11

[Figure: F/R vs. relative risk, for 1 and 2 SNPs, p=0.1. Series: 1 SNP, 2 SNPs.]

[Figure: F/R vs. number of SNPs, for γ=2, 3, and 5, p constant at 0.1. Series: Relative Risk = 2, Relative Risk = 3, Relative Risk = 5.]

Conclusions

As expected, holding all else constant, larger minor allele frequencies lead to an increased genetic contribution to disease risk. Similarly, all else constant, larger relative risks lead to an increased genetic contribution to disease risk, as do larger numbers of SNPs.

However, they do not scale equivalently! For example, holding p=0.1, increasing γ from 1 to 10 increased F/R by factors of 1.9, 3.61, and 613 for one, two, and ten SNPs, respectively. Yet holding p=0.1, increasing the number of SNPs from 1 to 10 increased F/R by factors of 2.35, 5.16, 20.65, and 322 for γ=2, 3, 5, and 10, respectively.
We can see that increasing the number of SNPs by a factor of 10 usually has a greater effect on F/R than increasing γ by the same factor.

As another example, say we had two SNPs, each with p=0.1 and a relative risk of 10, which gives an F/R of 3.61. A single SNP with the same p would need a relative risk of about 27 to reach the same F/R value (a relative risk of 25 gives only 3.4)!

It appears that going from p=0.1 to p=0.5, holding all else equal, results in greater F/R gains than going from γ=1 to γ=5, holding all else equal, most of the time. However, for low γ values, an increased p does not result in a large increase in F/R.

Future Work

- Scalability: currently, the program in R cannot create large matrices (n > 20) because of memory limits.
- Allow for differing values of p and γ for each SNP (read in vectors of p and γ values?).
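
Both future-work items can be explored with a short sketch. It is written in Python rather than the original R program, and the function names `f_over_r_enumerated` and `f_over_r_product` are hypothetical, chosen only for illustration. The first function follows the row-by-row enumeration described under Implementation, but takes per-SNP vectors of p and γ. The second relies on the stated independence assumption: the sum over all genotype vectors of a product of per-SNP terms factorises into a product of per-SNP sums, so F/R equals the product over i of (p_i γ_i + 1 - p_i), and no 2^n-row matrix is needed.

```python
from itertools import product
from math import prod


def f_over_r_enumerated(p, gamma):
    """Sum the n-SNP formula term by term over all 2^n genotype vectors.

    p and gamma are equal-length sequences of per-SNP mutation
    probabilities and relative risks (an assumed input format).
    """
    total = 0.0
    for g in product((0, 1), repeat=len(p)):
        term = 1.0
        for p_i, gamma_i, g_i in zip(p, gamma, g):
            # A '1' contributes p_i * gamma_i; a '0' contributes (1 - p_i).
            term *= p_i * gamma_i if g_i else (1.0 - p_i)
        total += term
    return total


def f_over_r_product(p, gamma):
    """Same quantity via the factorised form, linear in the number of SNPs."""
    return prod(p_i * gamma_i + (1.0 - p_i) for p_i, gamma_i in zip(p, gamma))


if __name__ == "__main__":
    # Two SNPs, p = 0.1, relative risk 10 (compare with the Two SNPs table).
    print(round(f_over_r_enumerated([0.1, 0.1], [10.0, 10.0]), 2))
    # Ten SNPs, p = 0.1, relative risk 2 (compare with the Ten SNPs table).
    print(round(f_over_r_product([0.1] * 10, [2.0] * 10), 2))
```

The enumeration is useful as a check on small cases, while the product form stays cheap for hundreds of SNPs. If the independence assumption were dropped, the factorisation would no longer hold and only the enumeration (or a different model) would apply.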