Lecture 2: Foundations of Genetic Variation
January 10, 2014

Last Time
Class introduction
Basic probability theory: sample space, counting rules, permutations, combinations

Mathematical Tools for Population Genetics
Basic algebra
Basic calculus
Basic statistics
Probability, e.g., combining per-locus probabilities across m loci, $P = \prod_{k=1}^{m} P_k$, where $P_k$ is the value at locus k (such as $PID_{sib,k}$, the probability of identity among siblings), and expected heterozygosity, $H_e = 1 - \sum_i p_i^2$

Population Genetics and Probability
Probability is at the core of much of population genetics.
Reproduction is a sampling process.
Effects of mutation, gene flow, selection, and nonrandom mating must be seen as departures from expectations based on random processes.
Example: a single genetic locus with two alleles determines foliage color in a forest of 20 trees. Green is dominant. One allele is present in 4 copies and the other in 36 copies. What proportion of offspring will have white foliage?

Overview
Review of genetic variation and Mendelian genetics
Methods for detecting variation
Applications of probability

What Is Genetic Variation?
Chromosome: structural unit of genetic material, containing DNA and protein.
Homologous: genetic material that pairs during meiosis in diploid cells.
Diploid: two sets of homologous chromosomes (one from each parent).
Haploid: one set of chromosomes (the genome).
Locus: a position on a chromosome.
Allele: an alternative form of a gene at a given locus.

Organelle Genomes
Mitochondria (most eukaryotes) and chloroplasts (most plants) are ancient endosymbionts.
They maintain their own genomes, but with greatly reduced numbers of genes, and depend on proteins encoded in the nucleus and imported.
Mostly maternally inherited and haploid: no recombination.

Phenotypes versus Genotypes
Phenotype: any observable characteristic of an organism
  External morphology: height, weight, color
  Physiology: metabolic rate, photosynthetic rate, salt sensitivity
  Biochemical: enzymatic rates, chemical composition
Genotype: the hereditary or genetic constitution of an individual
(Source: http://en.wikipedia.org/wiki/)
Why can’t you directly infer the genotype from the phenotype?
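The forest example under Population Genetics and Probability can be worked as a short calculation, and it also hints at why genotype cannot always be read from phenotype when one allele is dominant. This is a minimal sketch that assumes (i) the 4 copies are the recessive white allele and the 36 copies the dominant green allele (4 + 36 = 40 gene copies in 20 diploid trees), and (ii) offspring form by random union of gametes drawn from that pool. The allele labels G and w and the function name are hypothetical.

```python
from fractions import Fraction


def offspring_genotype_freqs(white_copies: int, green_copies: int) -> dict:
    """Expected offspring genotype frequencies, ASSUMING random union of
    gametes drawn from the parental allele pool (Hardy-Weinberg proportions)."""
    total = white_copies + green_copies
    q = Fraction(white_copies, total)  # assumed recessive white allele (w)
    p = 1 - q                          # assumed dominant green allele (G)
    return {"GG": p * p, "Gw": 2 * p * q, "ww": q * q}


# Assumption: 4 white (recessive) copies and 36 green (dominant) copies
freqs = offspring_genotype_freqs(4, 36)
white_fraction = freqs["ww"]                  # only ww offspring show white foliage
green_fraction = freqs["GG"] + freqs["Gw"]    # GG and Gw share the green phenotype
carrier_share = freqs["Gw"] / green_fraction  # green offspring hiding a white allele

print(white_fraction)  # 1/100
print(carrier_share)   # 2/11
```

Under these assumptions about 1% of offspring are white, while about 18% of green offspring are heterozygous carriers, so a green phenotype alone does not identify a unique genotype.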
Why can’t you directly infer the phenotype from the genotype?

Genetics vs. Environment
Many advances in evolutionary theory were based on morphology.
Problem: the amount of variation could be exaggerated, because only variable “loci” were scored.

Phenotype vs. Genotype
Var(phenotype) = Var(genotype) + Var(environment) (assuming no genotype-by-environment interaction or covariance)
Heritability (broad-sense): Var(genotype) / Var(phenotype)
Phenotypic plasticity: organisms with the same genotype have different phenotypes under different conditions.
Solution: control environmental variance by raising organisms in a common environment.

Early Models of Inheritance
Lamarck (1744-1829): inheritance of acquired characteristics
Developed the first fully coherent evolutionary theory.
A “complexifying force” drives organisms to higher levels of complexity.
Use and disuse of organs affects their development and inheritance.
(Sources: http://en.wikipedia.org/, http://morriscourse.com)

Blending Inheritance
Offspring have phenotypes that are intermediate between those of their parents.
Originally explored by Francis Galton and favored by the “biometricians,” such as Karl Pearson (1857-1936) and Weldon.
This work was the origin of modern statistics.
(Source: http://en.wikipedia.org/; Hamilton 2009)

Darwin’s Theory: Pangenesis (Darwin, 1809-1882)
Explains variation among individuals and gradual evolutionary change in response to selection.
Hereditary material consists of “gemmules” distributed throughout the body that accumulate in the reproductive organs.
Contains elements of Lamarckian inheritance.
(Source: http://en.wikipedia.org/)

Discontinuous Variation
Darwin’s cousin Francis Galton (1822-1911) performed experiments to disprove pangenesis.
“Sports,” or mutations with large effects, were considered key drivers of evolution by Galton, William Bateson, and others.
(Source: http://en.wikipedia.org/)

Mendel and Particulate Inheritance
Gregor Mendel conducted a large number of experiments with peas and other plants at the Augustinian Abbey of St Thomas in Brno between 1857 and 1863.
Studied over 29,000 pea plants to determine
how traits were inherited.
Why peas? Self-fertile, with little or no outcrossing.
Bred pure lines, then intercrossed them and followed advanced generations.
(Source: http://www.schoolnotes.com/32233/tss8.htm)

Mendel’s Observations: F1 and F2
When intercrossed, pure-bred lines produce only one phenotype in the F1.
The F2 generation has a 3:1 ratio of dominant:recessive phenotypes.
(Hamilton 2009)

Two Types of F2s
When F2s are selfed, all recessive F2s and one-third of dominant F2s breed true, while the other two-thirds of dominant F2s (the heterozygotes) produce a 3:1 ratio of offspring phenotypes.

Mendel’s “Law” of Independent Segregation
Based on analyzing simply inherited traits.
During gamete formation, the two members of a gene pair (alleles) segregate separately, so that half of the gametes carry one allele and half carry the other.

Mendel’s “Law” of Independent Assortment
Based on analyzing ratios of two traits segregating simultaneously.
During gamete formation, the segregation of alleles of one gene is independent of the segregation of alleles of another gene.

Mendel’s “Laws” of Independent Segregation and Assortment
Phenotype ratio: (3:1) x (3:1) = 9:3:3:1
Genotype ratio: (1:2:1) x (1:2:1) = 1:2:1:2:4:2:1:2:1
AABB:AABb:AAbb:AaBB:AaBb:Aabb:aaBB:aaBb:aabb

Morphological Markers
Traditionally used to measure genetic variation.
Mendel’s laws were derived from simply inherited morphological markers in peas: genotype directly inferred from phenotype.
Genetic maps were originally constructed from such characteristics (e.g., the corn genetic map).

Isozymes and Allozymes
Mutations can change the content of basic and acidic amino acids (and thus the net charge of an enzyme) without changing enzyme function.
Small changes in primary structure can alter secondary and quaternary structure.
Isozymes: different forms of an enzyme.
Allozymes: allelic isozymes, i.e., different forms of an enzyme that are coded at the same locus.
Example: lactate dehydrogenase (Dym et al. 2000, PNAS 97:9413–9418)

Detection
Separated by electrophoresis in starch gels.
Isozymes are detected based on enzyme action: the stain contains substrate
for the enzyme, cofactors, and an oxidized salt (dye).
The resulting pattern is a zymogram.
There is often a direct link between phenotype (spots on the gel) and genotype (genes encoding the enzyme).
(Hillis, D.M., C. Moritz and B. K. Mable. 1996. Molecular Systematics, 2nd ed. Sinauer Assoc. Inc., Sunderland, Mass.)

Allozymes Revolutionized Population Genetics
Richard Lewontin
Landmark 1966 papers by Lewontin and Hubby.
A simple and unbiased way of detecting genetic variation.
An explosion of studies of genetic variation in natural populations followed.
Levels of diversity in natural populations were MUCH higher than predicted by the prevailing theory at the time.
Neutral Theory: selection is not the most important factor determining genetic diversity.
(Source: http://www.patentdocs.us/patent_docs/2007/05/the_as_yet_unfu.html)

PCR and the Molecular Revolution
PCR: Polymerase Chain Reaction
Invented by Kary Mullis in 1983.
Exponential amplification of a specific DNA sequence.
Most important molecular marker techniques involve PCR.
Components: primers, nucleotides, template, thermostable polymerase.
(Source: http://www.dnalc.org/ddnalc/resources/pcr.html)

Molecular Markers
Molecular markers provide a closer link between phenotype and genotype.
“Anonymous” molecular markers (RFLP, RAPD, AFLP, and GBS): no knowledge of the underlying sequence polymorphism or its location in the genome.
“Sequence-tagged” markers, such as microsatellites or SNPs, are derived from defined locations in the genome.
They often reveal higher levels of polymorphism than allozymes and morphological markers.
They allow studies of neutral variation in natural populations.

Anonymous and Sequence-Tagged Markers
Anonymous markers often have short “primer” sequences (e.g., 10 bp primer sequences in RAPD) that randomly amplify portions of the genome:
TCAAGTCTCA AGTTCAGAGT agctggactacctctacgtcagcTGAGACTTGA ACTCTGAACT
Sequence-tagged markers have longer primers (e.g., 20 bp for microsatellite primers); two alleles differing in CT repeat number:
ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG
ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG

DNA Sequencing
Direct determination of the sequence of bases at a location in the genome.
Shotgun versus PCR sequencing.
Dye terminators (Sanger) and capillaries revolutionized DNA sequencing.
Modern sequencing methods (sequencing by synthesis, pyrosequencing) have catapulted sequencing into the realm of population genetics.
The human genome originally took 10 years and hundreds of millions of dollars to sequence; now we can do it in a week for <$2,000.

SNPs
A single nucleotide polymorphism (SNP) is a single-base mutation in DNA.
The most common source of genetic polymorphism (e.g., 90% of all human DNA polymorphisms).
Identify SNPs by screening a sample of individuals from the study population: usually 16 to 48.
Once identified, SNPs are assayed in populations using high-throughput methods.

Genotyping by Sequencing
New sequencing methods generate tens of millions of short sequences per run.
Combine restriction digests with sequencing and pooling to genotype thousands of markers covering the genome at very high density.
Marker types: presence-absence polymorphisms and SNPs.
Generate tens of thousands of markers for <$100 per sample.
(Source: http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf)

Genotyping by Sequencing Cost Example
(Source: http://www.maizegenetics.net/gbs-overview)

If nucleotides occur randomly in a genome, which sequence should occur more frequently?
AGTTCAGAGT
AGTTCAGAGTAACTGATGCT
What is the expected probability of each sequence occurring once?
How many times would each sequence be expected to occur by chance in a 100 Mb genome?

What is the expected probability of each sequence occurring once?
AGTTCAGAGT
What is the sample space for the first position? {A, T, G, C}
Probability of “A” at that position? 1/4
Probability of “A” at position 1, “G” at position 2, “T” at position 3, etc.?
$\frac{1}{4} \times \frac{1}{4} \times \cdots \times \frac{1}{4} = 0.25^{10} = 9.54 \times 10^{-7}$
AGTTCAGAGTAACTGATGCT
$0.25^{20} = 9.09 \times 10^{-13}$

How many times would each sequence be expected to occur in a 100 Mb genome?
AGTTCAGAGT: $9.54 \times 10^{-7} \times 10^{8} = 95.4$
AGTTCAGAGTAACTGATGCT: $9.09 \times 10^{-13} \times 10^{8} = 9.1 \times 10^{-5}$
Why is this calculation wrong?
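The random-sequence calculation above can be sketched in a few lines. This is a minimal, hypothetical sketch (the function names are illustrative, not from any library) that assumes bases are drawn independently. With equal base frequencies it reproduces the slide's naive numbers (per-position probability times genome length). It then shows some factors that the naive version ignores: the result is an expected count rather than a "probability of occurring once," a k-mer can start at L − k + 1 positions and can match either strand, and real genomes do not have equal base frequencies. The 40% GC composition used here is an illustrative assumption, not data.

```python
def prob_at_position(seq: str, base_freqs: dict) -> float:
    """Probability that `seq` starts at one specific position on one strand,
    assuming each base is drawn independently with the given frequencies."""
    prob = 1.0
    for base in seq:
        prob *= base_freqs[base]
    return prob


def reverse_complement(seq: str) -> str:
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))


def expected_occurrences(seq: str, genome_length: int, base_freqs: dict,
                         both_strands: bool = False) -> float:
    """Expected number of chance matches of `seq` in a random genome."""
    positions = genome_length - len(seq) + 1  # possible start sites for a k-mer
    count = positions * prob_at_position(seq, base_freqs)
    if both_strands:
        # A match on the opposite strand reads as the reverse complement here.
        # (A palindromic k-mer would be counted twice at the same site;
        # neither example sequence is palindromic.)
        count += positions * prob_at_position(reverse_complement(seq), base_freqs)
    return count


GENOME = 100_000_000                             # 100 Mb, as in the slide
uniform = {b: 0.25 for b in "ACGT"}              # slide's assumption: equal bases
gc40 = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}  # hypothetical 40% GC genome

for s in ("AGTTCAGAGT", "AGTTCAGAGTAACTGATGCT"):
    p = prob_at_position(s, uniform)
    naive = GENOME * p  # the slide's calculation
    refined = expected_occurrences(s, GENOME, uniform, both_strands=True)
    skewed = expected_occurrences(s, GENOME, gc40, both_strands=True)
    print(f"{s}: p={p:.3g}, naive={naive:.3g}, "
          f"both strands={refined:.3g}, 40% GC={skewed:.3g}")
```

For AGTTCAGAGT the naive column matches the slide (9.54 × 10⁻⁷ per position, about 95.4 expected hits), and counting both strands roughly doubles that expectation. Because this sequence contains only four G or C bases, the hypothetical AT-rich composition makes it somewhat more likely to occur by chance.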