Computational Biology and Genomics at Boston College Biology
Gabor T. Marth
Department of Biology, Boston College
[email protected]
http://clavius.bc.edu/~marthlab/MarthLab

Computational research labs
Prof. Peter Clote
• RNA secondary structure and energy landscape
• Protein motif recognition
Prof. Jeffrey Cheung
• Human mutation landscape
• Regulatory networks
Prof. Gabor Marth
• Genetic polymorphism discovery
• Population Genetics
• Medical Genetics

Resources
• CLAVIUS – a multi-CPU UNIX computer cluster
• UNIX development servers
• A teaching laboratory equipped with PC laptop computers running LINUX over VMWARE
• A professional new server room under construction

The CompBio teaching program
• Currently part of the Biology graduate program (PhD only)
• We have 2 Bioinformatics graduate students with a larger class expected for Fall 2006
• Curriculum combines Biology, Computer Science, Math and Statistics courses
• We are working towards an inter-departmental Bioinformatics / Computational Biology PhD program

The Computational Genetics Lab
http://clavius.bc.edu/~marthlab/MarthLab

Sequence variations (polymorphisms)
The Human Genome Project has determined a reference sequence of the human genome.
However, every individual is unique, and is different from others at millions of nucleotide locations: sequence polymorphisms.

Why are sequence variations important?
• source of phenotypic difference
• cause inherited diseases
• allow tracking ancestral human history

1. Polymorphism discovery tools

Polymorphism discovery in clonal sequences

\[
P(\mathrm{SNP}) =
\sum_{\substack{\text{all variable} \\ [S_1,\ldots,S_N]}}
\frac{
  \dfrac{P(S_1 \mid R_1)}{P_{\mathrm{Prior}}(S_1)} \cdots \dfrac{P(S_N \mid R_N)}{P_{\mathrm{Prior}}(S_N)}
  \; P_{\mathrm{Prior}}(S_1,\ldots,S_N)
}{
  \displaystyle\sum_{S_{i_1} \in [A,C,G,T]} \cdots \sum_{S_{i_N} \in [A,C,G,T]}
  \dfrac{P(S_{i_1} \mid R_1)}{P_{\mathrm{Prior}}(S_{i_1})} \cdots \dfrac{P(S_{i_N} \mid R_N)}{P_{\mathrm{Prior}}(S_{i_N})}
  \; P_{\mathrm{Prior}}(S_{i_1},\ldots,S_{i_N})
}
\]

[Figure labels: Homozygous C, Heterozygous C/T, Homozygous T]
Marth et al. Nature Genetics 1999

Automated detection of somatic mutations in diploid individual samples

2.
Mining genetic variation data

Cataloguing all naturally occurring normal sequence polymorphisms
Marth et al. Nature Genetics 2001

Genetic and epigenetic changes in cancer
• nucleotide changes, short insertions / deletions
• copy number changes, chromosomal rearrangements
• DNA methylation, histone modification

3. Demographic inference

Data – statistical distributions

1. marker density (MD): distribution of number of SNPs in pairs of sequences

Clone 1    Clone 2    # SNPs
AL00675    AL00982    8
AS81034    AK43001    0
CB00341    AL43234    2

[Figure: MD histogram; horizontal axis ticks 0 to 10, vertical axis ticks 0 to 0.3]

2. allele frequency spectrum (AFS): distribution of SNPs according to allele frequency in a set of samples

SNP    Minor allele    Allele count
A/G    A               1
C/T    T               9
A/G    G               3

[Figure: AFS histogram; horizontal axis ticks 1 ("rare") to 10 ("common"), vertical axis ticks 0 to 0.1]

Models – mathematical and simulation

[Figure: four demographic models (stationary, collapse, expansion, bottleneck), each drawn from past history to the present, with the corresponding MD (simulation) and AFS (direct form) histograms]
Marth et al. PNAS 2003

Conclusions based on model fitting
• European data: bottleneck
• African data: modest but uninterrupted expansion
Marth et al. Genetics 2004

4. Medical Genetics

The polymorphism structure of individuals follows strong patterns
http://pga.gs.washington.edu/

3.
An international project is under way to map out human polymorphism structure…

However, the variation structure observed in the reference DNA samples…
… often does not match the structure in another set of samples such as those used in a clinical case-control association study to find disease genes and disease-causing genetic variants
… we build computational tools to test sample-to-sample variability for clinical studies

Instead of genotyping additional sets of (clinical) samples with costly experimentation, and comparing the variation structure of these consecutive sets directly…
… we generate additional samples with computational means, based on our Population Genetic models of demographic history. We then use these samples to test the efficacy of gene-mapping approaches for clinical research.

5. We develop methods to connect genotype and clinical outcome in simple gene systems
• genetic marker (haplotype) in genome regions of drug metabolizing enzyme (DME) genes
• computational prediction based on haplotype structure
• functional allele (known metabolic polymorphism)
• clinical endpoint (adverse drug reaction)
• molecular phenotype (drug concentration measured in blood plasma)
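
Illustration for section 1 (polymorphism discovery in clonal sequences). The P(SNP) formula sums, over every assignment of a base to each of the N reads at one alignment position, the product of the per-read terms P(S|R) / P_Prior(S) weighted by a joint prior, and then takes the share contributed by the variable (non-identical) assignments. The Python sketch below is only a minimal brute-force rendering of that sum, not the lab's software. The function name, the flat per-base prior of 0.25, the `poly_rate` parameter and the way the joint prior is split between identical and variable assignments are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"

def snp_probability(read_posteriors, poly_rate=0.001):
    """Brute-force P(SNP) at one alignment position.

    read_posteriors: one dict per read, mapping each base to P(S | R).
    poly_rate: assumed prior probability that the site is polymorphic.
    """
    n = len(read_posteriors)
    base_prior = 0.25                 # assumption: flat per-base prior
    n_variable = 4 ** n - 4           # assignments with at least two distinct bases
    variable_sum = 0.0
    total_sum = 0.0
    for config in product(BASES, repeat=n):
        is_variable = len(set(config)) > 1
        # Assumed joint prior: poly_rate spread evenly over variable
        # assignments, the remainder spread evenly over the 4 identical ones.
        if is_variable:
            joint_prior = poly_rate / n_variable
        else:
            joint_prior = (1.0 - poly_rate) / 4
        term = joint_prior
        for base, posterior in zip(config, read_posteriors):
            term *= posterior[base] / base_prior
        total_sum += term
        if is_variable:
            variable_sum += term
    return variable_sum / total_sum

def confident_read(base, p=0.997):
    """Hypothetical helper: a read that supports one base with probability p."""
    rest = (1.0 - p) / 3
    return {b: (p if b == base else rest) for b in BASES}

same = snp_probability([confident_read("C")] * 4)
mixed = snp_probability([confident_read(b) for b in "CCTT"])
```

With these assumed inputs, four reads agreeing on C give a value close to 0, while two C reads and two T reads give a value well above 0.5. The enumeration grows as 4^N, so the sketch is only practical for a handful of reads.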
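
Illustration for section 3 (demographic inference). The allele frequency spectrum is the fraction of SNPs observed at each allele count, so it can be tabulated directly from a list of counts such as the three in the example table (1, 9, 3). The Python sketch below does that and also computes a reference shape for the stationary model, assuming the standard neutral constant-size result that the expected number of sites with allele count i is proportional to 1/i. That result is stated for derived-allele counts, so applying it to the slide's minor-allele counts is an approximation. The function names and the cutoff of 10 are illustrative assumptions, and no model fitting is attempted.

```python
from collections import Counter

def allele_frequency_spectrum(allele_counts, max_count):
    """Fraction of SNPs at each allele count 1..max_count."""
    if any(c < 1 or c > max_count for c in allele_counts):
        raise ValueError("allele counts must lie between 1 and max_count")
    tally = Counter(allele_counts)
    n_snps = len(allele_counts)
    return [tally[i] / n_snps for i in range(1, max_count + 1)]

def stationary_expectation(max_count):
    """Spectrum proportional to 1/i, normalized over counts 1..max_count."""
    weights = [1.0 / i for i in range(1, max_count + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Allele counts taken from the three-SNP example table.
observed = allele_frequency_spectrum([1, 9, 3], max_count=10)
expected = stationary_expectation(10)
```

Comparing `observed` with `expected` bin by bin is the kind of contrast the model-fitting slides describe. A real analysis would need many more SNPs and the spectra predicted under the collapse, expansion and bottleneck models as well.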