Tools in Human Genomics
Brett Bowman
March 3rd, 2013

Summary
• Brief Bio
• Review of Genotyping Technologies
  – SNP-chips (23andMe)
  – Exome Sequencing (In Clinical Use)
  – Whole Genome Sequencing
• Review of SNP Analysis tools
  – SNP Databases
  – Report Tools
  – OMIM

My (Sort of) Karyotype

GENOTYPING TECHNOLOGIES

23andMe – How They Do It

23andMe – How it Works
• Attach unlabeled sequence probes to array surface
• Extract and Amplify sample DNA
• Fragment
• Wash over and bind to array probes
• Extend probe 1 bp with polymerase and labeled dNTPs
• Photograph!

23andMe – Processed Output
• rsid == refSNP id == dbSNP id
• Two-letter genotype representing both alleles
• NOT phased data
• No quality information

SNP-Chips Limitations
• Requires a priori knowledge of SNPs of interest
• Requires individual probes be designed and manufactured for each SNP
• SNP-Chips limited by size in the number of probes they can contain
• Cannot determine phase
• Cannot determine copy number
• Small Error Rate * Large Number = High Error Count

Exome Sequencing – How it Works
• Prepare labeled sequence probes
• Extract, Shear, and clean up DNA
• Mix probes with DNA
• Wash away unbound DNA
• Digest probes
• Sequence!
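The processed output described on the "23andMe – Processed Output" slide (an rsid plus an unphased two-letter genotype, with no quality column) can be read with a few lines of Python. This is only a minimal sketch: it assumes the export is tab-separated with the columns rsid, chromosome, position, genotype and "#"-prefixed comment lines, and the function name `parse_genotypes` and the sample rows are hypothetical.

```python
import io
from typing import Dict, Iterable, Tuple


def parse_genotypes(lines: Iterable[str]) -> Dict[str, Tuple[str, int, str]]:
    """Map rsid -> (chromosome, position, genotype).

    Assumes tab-separated columns: rsid, chromosome, position, genotype.
    """
    calls = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chrom, pos, genotype = line.split("\t")
        # The data is not phased, so "AG" and "GA" describe the same call;
        # sorting the letters gives one canonical spelling.
        calls[rsid] = (chrom, int(pos), "".join(sorted(genotype)))
    return calls


# Made-up rows for illustration only (not real genotype calls).
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs1\t1\t1000\tGA\n"
    "rs2\t2\t2000\tCC\n"
)

calls = parse_genotypes(io.StringIO(SAMPLE))
print(calls["rs1"])
```

Because the file carries no phase or quality information, the sketch keeps nothing beyond the genotype itself; any confidence filtering would have to come from another source.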
Exome Sequencing – Raw Output

PHRED Quality Scores
Encoded Score (E) = chr(Q + 33)
Numerical Score (Q) = ord(E) - 33

Exome Sequencing – Processed Output

Exome Sequencing Limitations
• Requires a priori knowledge of Genes of interest
• Requires individual probes be designed and manufactured for each exon/gene
• Hard to infer copy number
• Very limited ability to phase data
• Hard to make sense of novel data
• Contains very little regulatory data
• Complicated, unstandardized, computationally intensive analysis processes

SNP-Chip vs Exome
SNP-Chip
• Cheaper (~$100)
• Lower Accuracy
• Requires precise knowledge a priori
• At best gets 10-20% of known variants
• No phasing data
• No structural data
• Simple analysis tools
Exome
• Expensive (~$1000)
• Higher Accuracy
• Requires general knowledge a priori
• At best gets 80-90% of known variants
• Some phasing data
• Some structural data
• Complex analysis tools

Full Genome Sequencing
How many human genomes have been completely sequenced end-to-end?

Full Genome Sequencing – Challenges
• Sequence Repeats
• Secondary Structure
• Particularly
  – Telomeric Region
  – Centromeric Region
• Methylation
• Regulation
• Interpretation

ANALYSIS TOOLS

SNPedia

SNPedia – Promethease

openSNP

OMIM

23andMe API

Genomes Unzipped

Questions?
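The two formulas on the "PHRED Quality Scores" slide map directly onto Python's built-in `chr` and `ord`. The sketch below assumes the offset of 33 given there; the helper names and the example quality string are hypothetical. The conversion to an error probability uses the standard Phred definition, P = 10^(-Q/10), which the slides do not state.

```python
from typing import List

PHRED_OFFSET = 33  # offset used in the slide's formulas


def encode_quality(scores: List[int]) -> str:
    """Encoded Score (E) = chr(Q + 33), applied to each base."""
    return "".join(chr(q + PHRED_OFFSET) for q in scores)


def decode_quality(encoded: str) -> List[int]:
    """Numerical Score (Q) = ord(E) - 33, applied to each character."""
    return [ord(ch) - PHRED_OFFSET for ch in encoded]


def error_probability(q: int) -> float:
    """Standard Phred definition (an addition, not taken from the slides)."""
    return 10 ** (-q / 10)


# Hypothetical three-base quality string.
scores = decode_quality("!5I")
print(scores)  # → [0, 20, 40]
print([error_probability(q) for q in scores])
```

Read this way, a score of 20 corresponds to roughly a 1-in-100 chance that the base call is wrong, which is the kind of per-call quality information the SNP-chip output lacks.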