Single nucleotide polymorphisms
Usman Roshan

SNPs
• DNA sequence variations that occur when a single nucleotide is altered.
• A variation must be present in at least 1% of the population to be considered a SNP.
• Occur every 100 to 300 bases along the 3 billion-base human genome.
• Many have no effect on cell function, but some could affect disease risk and drug response.

Toy example

SNPs on the chromosome
[Figure: SNP, chromosome, gene]

Perl exercise
• Determining SNPs from a pairwise genome alignment:
  – Can we solve this problem with a Perl script?

Bi-allelic SNPs
• Most SNPs have one of two nucleotides at a given position.
• For example:
  – A/G denotes the varying nucleotide as either A or G. We call each of these an allele.
  – Most SNPs have two alleles (bi-allelic).

Perl exercise
• Determining SNP type from a multiple genome alignment.

SNP genotype
• We inherit two copies of each chromosome (one from each parent).
• For a given SNP, the genotype defines the type of alleles we carry.
• Example: for the SNP A/G, one's genotype may be
  – AA if both copies of the chromosome have A
  – GG if both copies of the chromosome have G
  – AG or GA if one copy has A and the other has G
• The first two cases are called homozygous and the latter two are heterozygous.

SNP genotyping

Perl exercise
• SNP encoding:
  – Convert a SNP genotype from a character sequence to a numeric one.

Real SNPs
• SNP consortium: snp.cshl.org
• SNPedia: www.snpedia.com

Application of SNPs: association with disease
• Experimental design to detect cancer-associated SNPs:
  – Pick random humans with and without cancer (say, breast cancer)
  – Perform SNP genotyping
  – Look for associated SNPs
  – Also called a genome-wide association study

Case-control example
• Study of 100 people:
  – Case: 50 subjects with cancer
  – Control: 50 subjects without cancer
• Count the number of dominant and recessive alleles and form a contingency table.

            #Recessive alleles   #Dominant alleles
Case                10                   40
Control              2                   48

Perl exercise
• Contingency table:
  – Compute the contingency table given case and control SNP genotype data.

Odds ratio
• Odds of recessive in cancer = a/b = e
• Odds of recessive in no-cancer = c/d = f
• Odds ratio of recessive in cancer vs no-cancer = e/f

            #Recessive alleles   #Dominant alleles
Cancer               a                    b
No cancer            c                    d

Risk ratio (Relative risk)
• Probability of recessive in cancer = a/(a+b) = e
• Probability of recessive in no-cancer = c/(c+d) = f
• Risk ratio of recessive in cancer vs no-cancer = e/f

            #Recessive alleles   #Dominant alleles
Cancer               a                    b
No cancer            c                    d

Odds ratio vs Risk ratio
• Risk ratio has a natural interpretation since it is based on probabilities.
• In a case-control model we cannot calculate the probability of cancer given the recessive allele, because subjects are chosen based on disease status and not allele type.
• Odds ratio shows up in logistic regression models.

Example
• Odds of recessive in case = 15/35
• Odds of recessive in control = 2/48
• Odds ratio of recessive in case vs control = (15/35)/(2/48) = 10.3
• Risk of recessive in case = 15/50
• Risk of recessive in control = 2/50
• Risk ratio of recessive in case vs control = (15/50)/(2/50) = 15/2 = 7.5

            #Recessive alleles   #Dominant alleles
Case                15                   35
Control              2                   48

Odds ratios in genome-wide association studies
• A higher odds ratio means a stronger association.
• Therefore SNPs with the highest odds ratios should be used as predictors or risk estimators of disease.
• The odds ratio is generally higher than the risk ratio (when both exceed 1).
• The two are similar when the probabilities involved are small, since a/(a+b) is then close to a/b.

Statistical test of association (P-values)
• P-value = probability of the observed data (or more extreme data) under the null hypothesis
• Example:
  – Suppose we are given a series of coin tosses.
  – We feel that a biased coin produced the tosses.
  – We can ask the following question: what is the probability that a fair coin produced the tosses?
  – If this probability is very small, then we can say there is a small chance that a fair coin produced the observed tosses.
  – In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin.
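
The odds ratio, risk ratio, and coin-toss P-value described above can be checked with a short script. The following is a minimal sketch in Python (not the Perl named in the exercises). The function names are hypothetical. The coin-toss numbers (9 heads in 10 tosses) and the one-sided definition of "more extreme" are assumptions chosen for illustration; they do not come from the slides.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]: (a/b) / (c/d)."""
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """Risk ratio for the table [[a, b], [c, d]]: (a/(a+b)) / (c/(c+d))."""
    return (a / (a + b)) / (c / (c + d))


def fair_coin_p_value(heads, tosses):
    """One-sided P-value under the fair-coin null hypothesis.

    Probability of seeing at least `heads` heads in `tosses` tosses
    of a fair coin (assumption: "more extreme" means more heads).
    """
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses


if __name__ == "__main__":
    # Case/control counts from the "Example" slide.
    a, b, c, d = 15, 35, 2, 48
    print("odds ratio:", round(odds_ratio(a, b, c, d), 1))   # 10.3
    print("risk ratio:", round(risk_ratio(a, b, c, d), 1))   # 7.5

    # Illustrative (assumed) data: 9 heads in 10 tosses.
    print("P-value:", fair_coin_p_value(9, 10))              # 11/1024
```

With the counts from the Example slide the script reproduces 10.3 and 7.5. For the assumed coin data the P-value is 11/1024, about 0.011, so a fair coin would rarely produce that many heads.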