Multilocus Genetics

Molecular Evolution: Selection
• The KA/KS ratio is the ratio between the number of nonsynonymous substitutions (KA) and synonymous substitutions (KS) in a gene during a specific evolutionary period.
• Assuming that KS provides an index of the random mutation rate, the KA/KS ratio measures whether the rate of protein evolution differs from the rate expected under neutral drift.
  – If KA > KS, this is taken to indicate accelerated amino acid change, which might be due to positive selection.
  – Conversely, if KA < KS, this suggests purifying selection.

Brain Development in Primates
• MCPH1 (the gene that encodes microcephalin)
• ASPM (abnormal-spindle-like, microcephaly associated)
• Both MCPH1 and ASPM are evolutionarily ancient, with orthologues that are likely to be present in all chordates

MCPH
(A) Schematic representation of the alignment. Promoter regions, exons, and introns are marked in gray, red, and blue, respectively. White segments correspond to gaps.
(B) Positions of long (50 bp or longer) insertions/deletions. “O” denotes orangutan, “M” macaque, “OGCH” the orangutan–gorilla–chimpanzee–human clade, and “GCH” the gorilla–chimpanzee–human clade.
(C) Positions of polymorphic bases derived from the GenBank single nucleotide polymorphism (SNP) database.
(D) Position of the CpG island. The approximately 800-bp-long CpG island includes the promoter, 5′ UTR, first exon, and a small portion of the first intron.
(E) Location of an approximately 3-kb-long segmental duplication.
(F) Positions of selected motifs associated with genomic rearrangements in the human sequence. Numbers in parentheses reflect the number of allowed differences from the consensus motif (zero for short or two ambiguous motifs, two for longer sites).
(G) Distribution of repetitive elements. The individual ASPM genes share the same repeats except for the indels marked in (B).
(H) DNA identity and GC content. Both plots were made using a 1-kb-long sliding window with 100-bp overlaps.
The GC profile corresponds to the consensus sequence; the individual sequences have nearly identical profiles.

Linkage Studies
Monogenic and Complex Studies

Genetic and physical markers
• Genetic map
  – Markers ordered based on recombination distances
  – Measured in centiMorgans (cM)
• Physical map
  – Markers ordered based on sequence distance
  – Measured in number of bases
• Can integrate physical and genetic maps
[Figure: genetic map (markers D11Mit35, D11Mit260, D11Mit199; positions 17.0, 34.5, and 62.0 cM) aligned with a physical map (IL5, IL3) scaled from 0 to 1,000 kilobases (kb).]
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

Linkage analysis
• Find markers that are genetically linked to disease phenotype
• Calculate the odds of linkage vs. nonlinkage
• Threshold for significant linkage typically taken at 1,000:1 odds of being linked vs. nonlinked
• Usually converted into LOD (logarithm of the odds)
• LOD threshold is 3

An example of linkage analysis
• Cystic fibrosis found to be linked to well-known “met” marker on chromosome 7
  – But linkage was not strong enough to start sequencing
• Finer genetic mapping
  – Markers D7S122 and D7S340 showed very strong linkage
• This finding allowed researchers to narrow down search area for a chromosome walk

Chromosome walking
• In 1989, vectors could not carry the large DNA inserts that are possible today
• Example: finding the cystic fibrosis gene
  – Walk began with probes constructed from the DNA markers D7S122 and D7S340
  – Clones (~30 kb) are pulled out, beginning with marker probe
  – Clones used for new probes to find next sequence
  – Chromosome jumping facilitated the process and avoided unclonable regions
  – Process repeated for 280-kb region

Analyzing candidate genes
• Four genes found by chromosome walking
• Associate gene with disease
  – CF affects lung and pancreas, not testes or brain
• Northern blot analysis
  – One gene expressed in affected tissues, not expressed in unaffected tissues
[Figure: Northern blot, lanes 1–17, with 28S and 18S size markers. Tissue labels: sweat gland, colon, pancreas, nasal polyp, lung, T84, brain, placenta, liver, adrenal gland, testis, parotid gland, kidney.]

Using genomics to find Mendelian disease genes
• Linkage analysis
• Finer genetic mapping
• Physical mapping
• Identify candidate genes in human genome databases
• Gene cloning
• Mutation identification

Postgenome approach for finding Mendelian disease genes
• High-density SNP map makes linkage analysis easier
  – Pre–human genome: ~6,000 chromosomal markers
  – Post–human genome: millions of SNPs
• Complete sequence of human genome provides all possible candidate genes
  – Homology searches now much easier
  – Characteristics of disease may suggest what disease gene should look like (e.g., a brain-specific ion channel for a neurological disease)

Complex disease
• Monogenic disease traits
  – Mendelian disease
  – Mitochondrial disease
  – Typically rare: < 0.1%
• Complex disease
  – Common: > 0.1%
  – Polygenic or oligogenic
  – Environmental factors contribute

Factors complicating the analysis of complex-disease traits
• Incomplete penetrance
  – Inheritance of gene predisposing individual to disease is not sufficient to cause disease
• Phenocopies
  – People may develop disease without necessarily possessing a gene that predisposes them to that illness

Common multifactorial diseases
• Frequency of some well-known complex-disease traits
  – Hypertension (~23%)
  – Diabetes (~5%)
  – Epilepsy (~1%)
  – Schizophrenia (~1%)
  – Bipolar disorder (~1%)
  – Multiple sclerosis (~0.1%)
  – Autism (~0.1%)

Environmental contribution to complex disease
• Environmental factors can affect expression of a complex-disease phenotype
• Example: diabetes
  – Large increase in adult-onset diabetes in the U.S. from 1990 to 2000
  – Likely due to sedentary lifestyle and poor eating habits
[Figure: maps of the percentage of adults with diabetes in 1990 and 2000. Legend: no data, < 4%, 4%–6%, > 6%.]

Quantifying the genetic component of complex diseases
• Familial clustering
  – Frequency of complex disease in relatives (λs)
• Twin studies
  – Frequency of disease traits in twins
• Studies of adoptees
  – Example: twins separated at birth
• Studies of populations with similar genetics, but different environments
  – Example: members of the Pima tribe in the U.S.A. and Mexico

A genetic component for epilepsy
• Epilepsy affects 1% of the population
  – Diverse etiology
• Genetics of epilepsy
  – Concordance of epilepsy in monozygotic twin pairs: 62%
  – Can also be used as a measure of heritability
  – Concordance of epilepsy in dizygotic twins: 18%
  – Probability that the child of an epileptic parent will have the illness: 4%

The Pima tribe and diabetes
• Pimas of Arizona have highest rate of type 2 diabetes in world
  – ~50% of Pimas over 35 have type 2 diabetes
  – High-fat American diet
  – 2 hrs/week hard labor
• Pimas of Mexico
  – Normal rates of diabetes
  – Low-fat diet
  – 23 hrs/week hard labor

Understanding polygenic traits
• Phenotypes associated with polygenic traits
  – Continuous
  – Threshold effect
• Distribution of phenotype explained by multiple genes, each with multiple alleles

        AB     Ab     aB     ab
  AB    AABB   AABb   AaBB   AaBb
  Ab    AABb   AAbb   AaBb   Aabb
  aB    AaBB   AaBb   aaBB   aaBb
  ab    AaBb   Aabb   aaBb   aabb

More genes means a normal distribution of phenotype
• A continuous trait
• More underlying genes create bell-shaped distribution of phenotype
• Bell curve also called normal distribution, or Gaussian
• Shape of distribution is expected from statistics
[Figure: histogram of frequency (0 to 6) against height (4'0" to 6'0").]

The threshold effect
• Mendelian disease
  – Example: autosomal-dominant disease
  – Almost all individuals carrying M allele will have disease
• Complex disease
  – Disease risk small for cc genotype
  – Greater for Cc
  – Greater still for CC
[Figure: healthy/sick diagrams for the Mendelian genotypes (mm, Mm) and the complex-disease genotypes (cc, Cc, CC).]

Strict criteria for complex-disease gene studies
• Statistical significance for genomewide association or linkage
  – No genomewide association studies to date; only linkage
  – Resolution of linkage insufficient for identifying gene
• Fine mapping of locus
  – Linkage disequilibrium
• Analysis of the sequence
  – Find nucleotide variants consistent with disease phenotype
• Testing of candidate-gene function
  – Complement deficient gene to restore healthy state
  – Circumstantial evidence

Complex-disease genes that satisfy the criteria
• At end of 2002: seven complex-disease gene studies in humans satisfied the strict criteria
  – HLA-DQA (Type 1 diabetes)
  – HLA-DQB (Type 1 diabetes)
  – CAPN10 (Type 2 diabetes)
  – NOD2 (Crohn’s disease)
  – ApoE (Alzheimer’s disease)
  – ADAM33 (Asthma)
  – ACE (Cardiovascular disease)
• No complementation studies for these genes

The impact of genomics on finding complex-disease genes
• Improved annotation
  – Better candidate genes
  – Finding human homologues of nonhuman genes
• High-resolution marker map makes linkage analyses easier
• Density of SNP markers makes linkage-disequilibrium studies possible
[Figure: percentage of windows with 1 or more SNP (0 to 100%) against size of sequence windows (1 to 80 kb).]

Nail-Patella Syndrome
• Nail-patella syndrome (also called Fong's disease or hereditary onycho-osteodysplasia, 'HOOD') is characterized by several typical abnormalities of the arms and legs as well as kidney disease and glaucoma.

Recombination Frequency
• The goal is to determine the linkage distance between the two genes (the B/O blood type gene and the NP gene). The original mating in generation I and the first two matings in generation II are test crosses. The third mating in generation II is not informative because it involves the A allele, which we are not following. We have a total of 16 informative offspring, of which three were recombinant. As with all test crosses, this gives a genetic distance of 18.8 cM [100*(3/16)].
http://www.ndsu.edu/instruct/mcclean/plsc431/linkage/linkage6.htm

Lod Score Method of Estimating Linkage Distances
The following pedigree will be used to demonstrate a method developed to determine the distance between genes. This approach has been widely adapted to various systems, and genetics programs have been developed based on this technique.

Pedigree
• Even though we are working with the same two genes, nail-patella and blood type, in this pedigree the dominant allele seems to be coupled with the A blood type allele.
• Remember that in the previous example the dominant nail-patella allele was linked with the B allele. This is an important point in genetics: not all linkages between alleles of two genes are constant throughout a species.
• Why? Because at some point in the lineage of this family, the disease (nail-patella) allele recombined and became linked to a different blood type allele. In still other lineages, the nail-patella allele is linked to the O blood type allele.

Recombination Frequency
• We have one recombinant among the eight progeny. This gives us a recombination frequency of 0.125 and a distance of 12.5 cM.

LOD Score Method
• The method was developed by Newton E. Morton. It is an iterative approach in which a series of lod scores is calculated from a number of proposed linkage distances.
• A linkage distance is estimated, and given that estimate, the probability of a given birth sequence is calculated. That value is then divided by the probability of the same birth sequence assuming that the genes are unlinked. The base-10 logarithm of this ratio is the lod score for this linkage distance estimate.

LOD Score Method Example
• In this first birth sequence, we have an individual with a parental genotype. The probability of a parental type is (1 - 0.125). Because there are two parental types, this value is divided by two to give 0.4375. In this pedigree we have a total of seven parental types.
• We also have one recombinant type. The probability of a recombinant type is 0.125, which is divided by two because two recombinant types exist, giving 0.0625.
• What would the probability of the birth sequence be if these genes were unlinked? When two genes are unlinked, the recombination frequency is 0.5. Therefore, the probability of any given genotype would be 0.25.

Linkage Probability
• The probability of a given birth sequence is the product of the probabilities of the independent events. So the probability of the birth sequence based on our estimate of 0.125 as the recombination frequency is $(0.4375)^7 (0.0625)^1 = 0.0001917$.

Non-linkage Probability
• The probability of the birth sequence based on no linkage is $(0.25)^8 = 0.0000153$.

Calculation of LOD score
• Divide the linkage probability by the non-linkage probability to get 12.566. Then take the base-10 logarithm of this value to obtain 1.099. This value is the lod score.
• $\mathrm{LOD} = \log_{10}(0.0001917 / 0.0000153) = \log_{10}(12.566) = 1.099$
• In practice, we would like to see a lod score greater than 3.0. This means that linkage at this distance is 1,000 times more likely than no linkage.
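The lod score arithmetic above can be sketched in Python. This is a minimal illustration for the simple phase-known case described in the example (each offspring is scored as parental or recombinant), not a general pedigree-analysis tool; the function names `lod_score` and `best_theta` and the list of candidate recombination fractions are assumptions made for this sketch.

```python
import math


def lod_score(n_parental, n_recombinant, theta):
    """Lod score for a proposed recombination fraction theta.

    Each parental type has probability (1 - theta) / 2 and each
    recombinant type theta / 2; under no linkage every genotype
    has probability 0.25.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must satisfy 0 < theta <= 0.5")
    n_total = n_parental + n_recombinant
    # Work in log space to avoid underflow with many offspring.
    log_linked = (n_parental * math.log10((1.0 - theta) / 2.0)
                  + n_recombinant * math.log10(theta / 2.0))
    log_unlinked = n_total * math.log10(0.25)
    return log_linked - log_unlinked


def best_theta(n_parental, n_recombinant, candidates):
    """Return the candidate theta with the highest lod score."""
    return max(candidates,
               key=lambda t: lod_score(n_parental, n_recombinant, t))


if __name__ == "__main__":
    # Pedigree from the example: seven parental types, one recombinant.
    print(round(lod_score(7, 1, 0.125), 3))
    # Hypothetical grid of proposed linkage distances to scan.
    grid = [0.05, 0.10, 0.125, 0.20, 0.30, 0.40, 0.50]
    for theta in grid:
        print(theta, round(lod_score(7, 1, theta), 3))
    print("best:", best_theta(7, 1, grid))
```

Scanning several proposed values of theta mirrors the iterative character of the method: the lod score is computed for each proposed distance, and the distance with the highest score is taken as the estimate.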
What we can learn from LD
• Finding disease genes
  – Fine mapping of genes
  – Genomewide association studies
• Other uses
  – Revealing the history of human populations
  – Understanding human origins
  – Studying patterns of recombination in the human genome

Finding a gene associated with asthma
• Use classic linkage analysis to localize disease gene to area on the chromosome
• Avoid the problem of multiple etiologies by restricting disease definition
• Use linkage disequilibrium to narrow down area where disease gene might be located
• Look for candidate genes
• Perform gene expression study to ensure that disease gene is expressed in appropriate tissues

Case Control Studies
Modified from Iris A. Granek, M.D., M.S.

Case-control studies
• Search for differences in allele frequency between disease carriers (cases) and noncarriers (controls), on the assumption that differences in frequency are associated with disease outcome.
• Can be applied to exposure to a chemical or a carcinogen instead of alleles (genotypes).

Case Selection
• Define the source population
  – residents of a geographic region
  – hospital inpatient or clinic
• Strict case definition
  – inclusion criteria

Control Selection
• Same source population as the cases
• Choose the controls at random from the source population
  – spouses
  – associates
  – patients within the same facility
  – matched for certain criteria

Hospital Controls
• Without regard to diagnosis
• Excluding certain diseases
• Including only diseases believed to be unrelated to the exposures (or alleles) being studied
• Clinic patients from same hospital

Case Control Study Design
Compares the distribution of exposure in cases (with disease) vs. controls (without disease)

Exposure History Cases & Controls

                        CASES       CONTROLS
  Exposed               a           b
  Not exposed           c           d
  Totals                a + c       b + d
  Proportion exposed    a/(a + c)   b/(b + d)

Distribution of past benzene exposure among leukemia cases vs. controls
• 20 leukemia cases found among large group of chemical workers
• 16 cases had past benzene exposure
• Proportion of cases exposed to benzene: 16/20 = 80%
• 100 healthy controls randomly selected from same group of chemical workers
• 12 controls had past benzene exposure
• Proportion of controls exposed to benzene: 12/100 = 12%

Odds Ratio: Unmatched Analysis

                 CASES   CONTROLS
  Exposed        a       b
  Not exposed    c       d

• Odds of exposure in cases: a/c
• Odds of exposure in controls: b/d
• Odds ratio: $\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$

Odds Ratio: Unmatched Analysis (benzene example)

                 LEUKEMIA CASES   CONTROLS
  Benzene        16               12
  No benzene     4                88

• Odds of exposure in cases: 16/4
• Odds of exposure in controls: 12/88
• Odds ratio: $\mathrm{OR} = \frac{16 \times 88}{4 \times 12} = 29.3$

Odds Ratios
• OR > 1 indicates a positive association between the factor and the disease
  – The odds of past benzene exposure were about 29 times higher among the cases than among the controls
• OR < 1 indicates the factor is protective
• OR = 1 indicates no association

95% Confidence Limits
• 95% probability that the true value lies within the confidence interval, that is, between the confidence limits
• An odds ratio is statistically significant if its confidence interval does not include 1
• OR = 7 (0.5 - 15.0) is not statistically significant
• OR = 7 (3.0 - 12.0) is statistically significant

Advantages of Case Control
• Quick and inexpensive
• Optimal for rare diseases
• Useful for diseases of long latency from exposure to disease development
• Can evaluate multiple risk factors

Bias in Case Control Studies
• Bias is a systematic error in the study that distorts the results and limits the validity of the conclusions.
  – Selection bias
  – Confounding
  – Observation bias (recall bias, interviewer bias, misclassification)

Selection Bias
• Systematic errors arising from the way the subjects are selected
  – Study subjects are selected in a way that can misleadingly increase or decrease the magnitude of an association
• Exposure of the selected cases differs from exposure of all cases in the source population, or exposure of the selected controls differs from exposure of the nondiseased in the source population
[Figure: exposed (E) and unexposed (X) subjects in the source population versus the study sample, shown for cases (with disease) and controls (without disease).]

Confounding
• Distortion of the true relationship between the exposure and outcome due to a mutual relationship with another factor
• Can be the reason for an apparent association, and may also cause a true association to go unobserved
• Confounder must be associated with both the outcome and the exposure

Confounding Factors
[Diagram: benzene exposure and lung cancer, with cigarette smoking as the confounder.]

Controlling for Confounding
The effect of confounding variables
• can be controlled during the data analysis by various methods
  – stratification
  – multivariate analysis
• can be controlled during the study design by matching controls and cases for the factor

Matched Case Control Design
• Controls selected matched to cases on factors associated with the disease
  – age, sex, race, socioeconomic status
• Makes the two groups similar on factors other than the exposure of interest
• Cannot compare groups on matched factors
• Must use matched analysis

Observation Bias
• Interviewer (data collection) bias
  – keep data collection the same for cases and controls
• Misclassification bias
  – incorrect characterization of exposure
• Recall bias
  – recall of exposures may be influenced by current disease status

Calculate the Odds Ratio
Esophageal cancer and alcohol
Fisher’s Exact Test
http://www.matforsk.no/ola/fisher.htm

Circumstantial evidence for disease-gene function
• ADAM33 gene is expressed in lung tissue, bronchial smooth muscle (BSM), and other tissues
• Gene codes for a putative protease
  – Perhaps ADAM33 processes proteins in affected tissues
• Still no definitive theory for cause of asthma
[Figure: expression blot with lanes for BSM, lung, and nonlung tissues; bands at 5.0 kb and 3.5 kb, with actin as control.]

Progress in finding complex-disease genes
• Mendelian disease rarer, but genes much easier to find than complex-disease genes
• Progress in finding complex-disease genes is slow, but increasing
[Figure: progress from 1980 to 2000 for Mendelian traits and complex traits. Legend: human Mendelian, human complex, all complex.]

Genomewide association studies
• Based on concept of linkage disequilibrium
• Basic concept
  – Characterize SNPs in a variety of populations
  – Use SNP associations with disease
• Presently prohibitively expensive
• New technologies being developed
  – DNA pooling
  – More efficient genotyping methods

Linkage disequilibrium
[Figure: allele frequencies at two markers, A (A1/A2) and B (B1/B2), at distances of 100 kb and 100 Mb from a mutation M. Frequencies are shown for the whole population (50%/50%) and for mutation-carrying chromosomes, after 100 years and after 1,000 years.]

Quantifying LD
• Measures of linkage disequilibrium (LD) depend on the frequency of alleles
• D often used as a measure of LD
• What is the linkage disequilibrium between loci A and B?
  – $D = p_{AB} - p_A p_B$
  – $p_{AB}$ is the frequency of alleles A and B occurring together on the same chromosome
  – $p_A$ is the overall frequency of allele A
  – $p_B$ is the overall frequency of allele B

Scaling for allele frequency
• Value of D will depend strongly on allele frequency
  – D can be positive or negative
• Developing a more consistent measure of LD
  – What is $D_{max}$ for given allele frequencies?
  – $D_{max} = \min(p_A p_b,\ p_a p_B)$ if D is positive
  – $D_{max} = \min(p_A p_B,\ p_a p_b)$ if D is negative
  – $D' = D / D_{max}$
  – $|D'| = 0$ for linkage equilibrium
  – $|D'| = 1$ for 100% co-occurrence of alleles

Square of the correlation coefficient
• Another common measure of linkage disequilibrium is the square of the correlation coefficient, $r^2$
  – $r^2 = D^2 / (p_A p_a p_B p_b)$
• Advantages of using $r^2$
  – Existing body of literature in population genetics already uses $r^2$
  – Correlation coefficient is standard statistical measure
  – Sample size needed to detect statistically significant LD is inversely proportional to $r^2$

LD and recombination distance
[Figure: $r^2$ (0 to 1.0) plotted against recombination distance (0 to 0.30 cM).]

Caveats about linkage disequilibrium
• Population stratification
  – Arises when examining populations with different genetic backgrounds
• Need population-specific strategies
  – 300,000 SNPs likely needed for Europeans
  – 1,000,000 SNPs likely needed for Africans
• False associations
  – Marker in LD with the ability to speak Icelandic
  – Have you found the Icelandic-language gene?
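
The three LD measures defined in the slides (D, D′, and r²) can be computed together. The following Python sketch is an illustration under stated assumptions: the function name `ld_measures` and the example haplotype frequencies are hypothetical and do not come from the slides, and the sign rule for Dmax follows the definitions given above.

```python
def ld_measures(p_AB, p_A, p_B):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_AB is the frequency of chromosomes carrying both A and B;
    p_A and p_B are the overall frequencies of alleles A and B.
    """
    p_a = 1.0 - p_A
    p_b = 1.0 - p_B
    if min(p_A, p_a, p_B, p_b) <= 0.0:
        raise ValueError("both loci must be polymorphic")

    D = p_AB - p_A * p_B

    # Largest |D| attainable for these allele frequencies,
    # chosen according to the sign of D.
    if D >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)

    d_prime = D / d_max
    r_squared = D * D / (p_A * p_a * p_B * p_b)
    return D, d_prime, r_squared


if __name__ == "__main__":
    # Hypothetical frequencies chosen only for illustration.
    D, d_prime, r_squared = ld_measures(p_AB=0.4, p_A=0.5, p_B=0.5)
    print(f"D = {D:.3f}, D' = {d_prime:.3f}, r^2 = {r_squared:.3f}")
```

Returning all three values from one function makes it easy to see how D′ and r² rescale the same underlying D; with the hypothetical inputs above, the loci are in partial but not complete disequilibrium.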
