Leveraging Genetic variability across populations for
Human Genetic Variation
Genetics of Complex Diseases

Challenges

Challenge 2: Correcting genotyping errors
How can we detect genotyping errors?
- Check for deviations from Hardy-Weinberg equilibrium.
- If we have mother-father-child trios, we can check Mendelian consistency.

Challenge 3: Population Substructure
Imagine that all the cases are collected from Africa and all the controls are from Europe.
- Many association signals are going to be found.
- The vast majority of them are false. Why?
- Different evolutionary forces: drift, selection, mutation, migration, population bottleneck.

Shaping Genetic Variation
- Mutations add to genetic variation.
- Natural selection controls the frequency of certain traits and alleles.
- Genetic drift.
[Figure: an ancestral population, migration, and genetic drift leading to different allele frequencies]

Population Substructure
With cases from Africa and controls from Europe, the vast majority of the association signals found are false. What can we do about it?

Ancestry Inference
- To what extent can population structure be detected from SNP data?
- What can we learn from these inferences?
- Can we build the tree of life?
- How do we analyze complex (mixed) populations?
Novembre et al., Nature, 2008

Principal Component Analysis
- Dimensionality reduction.
- Based on linear algebra.
- Intuition: find the ‘most important’ features of the data.
- Plot the data on a one-dimensional line for which the spread is maximized.
- In our case, we want to look at two dimensions at a time.
- The original data points have many dimensions: each SNP corresponds to one dimension.

Data Available
The International HapMap Project: an international consortium that aims to genotype the genomes of 270 individuals from four different populations.
HUJI 2006
- Launched in 2002.
- First phase (2005): ~1 million SNPs for 270 individuals from four populations
- Second phase (2007): ~3.1 million SNPs for 270 individuals from four populations
- Third phase (ongoing): > 1 million SNPs for 1115 individuals across 11 populations

HapMap Populations
MKK, LWK, YRI, GIH, ASW, CHD, CEU, MEX, JPT, CHB, TSI

[Figures: HapMap PCA, components 1-2; components 1-3; components 1, 2 and 4]

Lessons from the HapMap
- African populations have higher genetic diversity than other populations.
- Evidence for bottlenecks or founder effects in the other populations.
- Evidence for the out-of-Africa theory.
HapMap was used to detect:
- Common deletions across the genome
- Regions under selection
- Recombination rates, hotspots
- Associations of SNPs with disease

Example: detection of deletions using SNPs
Conrad et al., Nature Genetics, 2006
Conrad et al. applied the method on the HapMap and found:
- Typical individuals have roughly 30-50 deletions larger than 5kb (500kb-750kb total sequence length).
- Deletions tend to be gene-poor.
- The deletions detected in the HapMap span 267 known and predicted genes.
- Deletions were found to be related to different conditions such as schizophrenia (Stefansson et al., 2008), lupus glomerulonephritis (Aitman et al., Nature, 2006), and others.
[Figure: Distribution of deletion length. Conrad et al., Nature Genetics, 2006]

Significant Region
- Why do we have differences between data1 and data2?
- How come so many SNPs seem to be associated in this region?
• Maybe there are multiple ‘causal SNPs’?
• Or maybe there are correlations between the SNPs…?

Linkage Disequilibrium
Signatures of History

Haplotypes vs. Genotypes
Haplotypes (one chromosome from the mother, one from the father): ATCCGA, AGACGC
Genotype: A, T/G, C/A, C, G, A/C
Cost-effective genotyping technology gives genotypes and not haplotypes.
Possible phases: ATACGA / AGCCGC, AGACGA / ATCCGC, ….

Haplotypes cluster naturally
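
Returning to the "Possible phases" example in Haplotypes vs. Genotypes: the following is a minimal Python sketch, not a phasing method from the slides, that enumerates every haplotype pair consistent with an unphased genotype. The function name `possible_phases` and the representation of a genotype as a list of allele pairs are assumptions made for illustration. A genotype with k heterozygous sites has 2^(k-1) distinct unordered phases, because swapping the two chromosomes at every site gives the same pair.

```python
from itertools import product


def possible_phases(genotype):
    """Enumerate the haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele, allele) tuples, one per SNP (hypothetical
    representation chosen for this sketch). Returns a list of
    (haplotype1, haplotype2) string pairs.
    """
    # Heterozygous sites are the only ones where the phase is ambiguous.
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    phases = []
    # Keep the first heterozygous site fixed so each unordered pair
    # of haplotypes is produced exactly once.
    for flips in product([False, True], repeat=max(len(het) - 1, 0)):
        flip_at = dict(zip(het[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        phases.append(("".join(h1), "".join(h2)))
    return phases


# Genotype from the slide: A, T/G, C/A, C, G, A/C
genotype = [("A", "A"), ("T", "G"), ("C", "A"),
            ("C", "C"), ("G", "G"), ("A", "C")]
phases = possible_phases(genotype)
for h1, h2 in phases:
    print(h1, h2)
```

For the slide's genotype, with three heterozygous sites, this lists four candidate phases, including the true pair ATCCGA / AGACGC. The genotype alone cannot tell them apart, which is why the observation that haplotypes cluster naturally is useful for phasing.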