Keynote for 2008 Genomics Workshop

cheap sequencing for regular Joes and Janes: the demand for more bioinformatics software

Gane Ka-Shu Wong: iCORE Chair in BioSystems Informatics, The University of Alberta, Biological Sciences and Medicine; also Associate Director of the Beijing Genomics Institute

the essence of the human genome project and its offspring, the human HapMap project, was really about improving the core technologies of genetics: sequencing and genotyping

historical costs to sequence the 3 billion bp of a human genome
[Figure: costs to sequence a human genome, log10 (US dollar) versus year sequenced, 1989 to 2009; cost levels marked at $3 billion, $300 million and $300,000; annotations "Gordon Moore", "two vendors" and "competition"]

Roche-454 (pyro)sequencing
1. fragment and denature
2. add adapters to both ends
3. one fragment per bead
4. emulsion PCR amplification
5. sequencing by synthesis
6. analyze image of bead array
chemiluminescent signal generation: dNTP incorporation releases PPi; sulfurylase converts PPi to ATP; luciferase converts ATP to visible light

Illumina-Solexa sequencing
1. fragment, denature, and add adapters
2. bind randomly to primer lawn, perform bridge amplification
3. sequencing by synthesis, four-color labeled dNTPs
4. computer analysis of lawn image
in contrast to Roche-454, the Illumina-Solexa technology generates multi-colored fluorescent signals on a randomly arrayed 2D surface

capillaries versus next generation (massively parallel) DNA sequencing
                    PE-ABI (3730xl)              Roche-454 (FLX)              Illumina-Solexa (GA)
template            amplify before loading       single molecule sensitivity  single molecule sensitivity
DNA required        ng of DNA per read           μg of DNA per run            μg of DNA per run
run time            1 hour for 96 reads          7.5 hours for 0.4M reads     2.5 days for 80M reads
read length         1000 bp                      200 bp                       40 bp
daily throughput    2.3 Mb                       0.25 Gb                      1.28 Gb
accuracy            higher                       lower                        higher
cost per 1000 bp    $2                           $0.20                        $0.005
the excitement about Pacific Biosciences is based on read lengths of many kb, albeit with lower base pair accuracy

BGI Offers Next-Gen Sequencing Service; Kicks Off 100-Genome Sequencing Project [8 January 2008]
Knome, BGI Forge Sequencing Alliance; GATC Spins Off Personal Genomics Unit [15 January 2008]
Google: 580,000 SNPs; BGI-Shenzhen: 1 million SNPs, whole genome

BGI-Shenzhen and allies in the US and UK will be sequencing 1000 human genomes in the next 3 years (Nature, 17 January 2008; Science, 25 January 2008)

1000 human genomes will turn the medical genetics world upside down
PHENOTYPE TO GENOTYPE
cystic fibrosis → CFTR → disease affects less than a percent of the population
breast cancer → BRCA1+BRCA2 → genes affect only a few percent of patients
GENOTYPE TO PHENOTYPE
functional polymorphisms identified in 1000 individuals → linked to disease by association studies → information of value to policy makers in public health

Rommens JM, … Tsui L-C, Collins FS (1989). Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245: 1059-1065.
Riordan JR, … Collins FS, Tsui L-C (1989). Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245: 1066-1073.
Kerem B, … Tsui L-C (1989). Identification of the cystic fibrosis gene: genetic analysis. Science 245: 1073-1080.
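The daily throughput row of the capillary versus next generation comparison follows directly from run time, reads per run and read length, and the cost row scales to a whole genome the same way. This sketch redoes that arithmetic using only the figures quoted on the slide; the per-genome cost assumes the 3 billion bp genome at 1x coverage, a simplification the slide does not make, and the `Platform` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass

# Sketch: recompute the "daily throughput" row from reads per run, read length
# and run time, then scale the quoted cost per 1000 bp to one genome.
# genome_cost_usd assumes a 3 billion bp genome at 1x coverage (a simplification).


@dataclass
class Platform:
    name: str
    reads_per_run: float
    read_length_bp: float
    run_hours: float
    cost_per_kb_usd: float

    def daily_throughput_bp(self) -> float:
        """Bases produced per 24 hours of back-to-back runs."""
        bases_per_run = self.reads_per_run * self.read_length_bp
        return bases_per_run * 24.0 / self.run_hours

    def genome_cost_usd(self, genome_bp: float = 3e9, coverage: float = 1.0) -> float:
        """Cost of `coverage` x a genome at the quoted price per 1000 bp."""
        return genome_bp * coverage / 1000.0 * self.cost_per_kb_usd


# Figures as quoted on the slide.
PLATFORMS = [
    Platform("PE-ABI (3730xl)", 96, 1000, 1.0, 2.00),
    Platform("Roche-454 (FLX)", 0.4e6, 200, 7.5, 0.20),
    Platform("Illumina-Solexa (GA)", 80e6, 40, 2.5 * 24, 0.005),
]

if __name__ == "__main__":
    for p in PLATFORMS:
        print(f"{p.name:22s} {p.daily_throughput_bp() / 1e9:8.4f} Gb/day "
              f"${p.genome_cost_usd():>12,.0f} per 1x genome")
```

Running it gives 0.0023, 0.2560 and 1.2800 Gb per day, matching the slide's 2.3 Mb, 0.25 Gb and 1.28 Gb, and roughly $6 million, $600,000 and $15,000 for a single 1x pass over the genome; real projects need many-fold redundancy on top of that.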
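In the Roche-454 chemistry described earlier, nucleotides are washed over the bead array one type at a time in a repeating order, and the light from the sulfurylase/luciferase cascade scales with how many bases were incorporated in that flow, so a run of n identical bases shows up as roughly n units of signal. The naive caller below simply rounds each normalized intensity to a homopolymer length. The TACG flow order, the `call_bases` name and the intensities are assumptions for illustration; production base callers model signal noise far more carefully.

```python
from typing import Sequence

# Assumed nucleotide flow cycle for this sketch.
FLOW_ORDER = "TACG"


def call_bases(intensities: Sequence[float], flow_order: str = FLOW_ORDER) -> str:
    """Naive flowgram base caller (hypothetical helper).

    intensities[i] is the normalized light signal for flow i, which delivers
    nucleotide flow_order[i % len(flow_order)]. Light is taken to be roughly
    proportional to the PPi released, so a signal near n means n identical
    bases were incorporated in that flow.
    """
    read = []
    for i, signal in enumerate(intensities):
        n = max(0, round(signal))  # estimated homopolymer length
        read.append(flow_order[i % len(flow_order)] * n)
    return "".join(read)


if __name__ == "__main__":
    flows = [1.02, 0.05, 2.10, 0.96, 0.03, 2.88, 0.10, 1.05]
    print(call_bases(flows))
```

This prints TCCGAAAG. An intensity near 2.5 could be a run of two or three bases, which is one commonly cited reason the table lists Roche-454 accuracy as lower, with errors concentrated in homopolymers.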
8 September 1989
after 19 years (and 1000 genes) we have not cured a genetic disease

"Maynard, I just decided that I hate your generation. You made all those promises about the human genome sequence improving health care, but my generation will have to deliver."
"That’s right. One of these days one of you will have to actually cure something!"
Prof. Maynard Olson

YanHuang and the panda genome (raising awareness for the new technologies)
Emperors Yan and Huang were the first rulers of ancient China, so modern Chinese say that they are descendants of YanHuang. The panda is a Chinese national treasure and the logo of the World Wildlife Fund. While the panda is not the first endangered species to be sequenced (the chimp was first), it will be the first sequenced with a conservation focus. Whole genome shotgun assembly is nontrivial for 45 bp reads, even with paired-end information and 50x redundancy.

[Figures: YanHuang genome: data collected; read coverage; SNP accuracy; disease alleles]

[Photos: aftermath of the 12 May 2008 earthquake in Sichuan, measuring 7.9 on the Richter scale]

our plans for the panda genome (whole genome assembly using short reads)
50x of paired-end data using Solexa
average read lengths 40 to 50 bp
estimated scaffold sizes 10 to 100 kbp
anchored by synteny to human
first assembly by end of August ’08
graph and overlap-layout based

molecular censusing doubles giant panda population estimate in a key nature reserve
Zhan X, Li M, Zhang Z, Goossens B, Chen Y, Wang H, Bruford MW, Wei F (2006). Curr Biol 16(12): R451-2.
redo the experiments on a more comprehensive population from every panda reserve, and with 1536 SNPs rather than just 9 microsatellites

expressed gene sequences of 1000 medicinal plants for only $2 million
There are 96 plant species with more than 20,000 expressed sequence tags (ESTs), but most are crop plants.
If we count only medicinal plants, generously defined to include makers of secondary metabolites with purported health benefits, such as lycopene in tomatoes and resveratrol in grapes, there are 16 plant species with more than 20,000 ESTs. If we use a strict definition of medicinal, just 4 plant species have more than even a mere 5000 ESTs: artemisia, Madagascar periwinkle, ginkgo, and ginseng.

1/1000 of the proposed data has launched the field of phylogenomics
10 April 2008: 40 Mb total from ESTs in 29 animals
27 June 2008: 5.4 Mb total from the genomes of 169 birds

artemisinin: poster child for the synthetic biology investment world
synthesized by Jay Keasling
$40M from the Gates Foundation
Amyris is now into biofuels
$600M to UC Berkeley
CYP71AV1 identified by cross-species EST analysis in leaves of sweet wormwood
FPP pathway
most effective anti-malarial

a proposal to crowd source the writing of bioinformatics software
OPEN SOURCING: the classic example is Linux, sophisticated software developed by a small handful of talented programmers (comparable sophistication in bioinformatics is whole genome shotgun assembly)
CROWD SOURCING: the alternative is Wikipedia, with millions of contributors each writing a small article on a specific topic; this is similar to much (but not all) of bioinformatics, as it does not require PhDs and can be done by students who will work for free
and how would we incentivize them to do so?
biologist with data to analyze → technical specification of issues → talented bioinformatics student → contributions recorded on a website open to prospective employers
young people need a chance to prove themselves; we will provide a web-based mechanism for them to do so, on a high-profile international scale

Alberta and China: where is this happening and who is paying for it
BGI: Jian Wang, Jun Wang, Huanming Yang, Jun Yu
UofA Biological Sciences: Michael Deyholos
UofA Medicine: Andrew Mason, Richard Fedorak, Lorne Tyrrell
UofA Computing Science: Paul Lu, Guohui Lin
Research funding from the Alberta Informatics Circle of Research Excellence and the Government of Shenzhen. Additional support from UofA Biological Sciences.
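The panda genome plan earlier in the talk calls for 50x of paired-end Solexa data with 40 to 50 bp reads. One way to see why that redundancy is not, by itself, the hard part is the classic Lander-Waterman model, which treats read start positions as Poisson-distributed along a repeat-free genome. The sketch implements only that idealized model; the 2.4 Gb genome size and the 25 bp minimum detectable overlap are assumptions chosen for illustration, not figures from the talk.

```python
import math

# Idealized Lander-Waterman coverage model: Poisson read starts, no repeats,
# no sequencing errors. Genome size and minimum overlap are assumptions.

PANDA_GENOME_BP = 2.4e9   # assumed genome size, for illustration only
READ_LEN_BP = 45          # within the 40 to 50 bp range quoted for Solexa
COVERAGE = 50.0           # the planned redundancy
MIN_OVERLAP_BP = 25       # hypothetical minimum overlap needed to join reads


def reads_needed(genome_bp: float, read_len_bp: float, coverage: float) -> float:
    """Number of reads N such that coverage c = N * L / G."""
    return coverage * genome_bp / read_len_bp


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases covered by no read: exp(-c)."""
    return math.exp(-coverage)


def expected_contigs(n_reads: float, read_len_bp: float, coverage: float,
                     min_overlap_bp: float) -> float:
    """Lander-Waterman expected number of contigs: N * exp(-c * (1 - T / L))."""
    sigma = 1.0 - min_overlap_bp / read_len_bp
    return n_reads * math.exp(-coverage * sigma)


if __name__ == "__main__":
    n = reads_needed(PANDA_GENOME_BP, READ_LEN_BP, COVERAGE)
    print(f"reads needed:       {n:.3e}")
    print(f"fraction uncovered: {fraction_uncovered(COVERAGE):.1e}")
    print(f"expected contigs:   "
          f"{expected_contigs(n, READ_LEN_BP, COVERAGE, MIN_OVERLAP_BP):.2f}")
```

Under these assumptions the model predicts essentially no gaps and fewer than one expected contig break, which is plainly not what short-read assemblers deliver. The model ignores repeats and sequencing errors, and repeats longer than a 45 bp read are what make whole genome shotgun assembly nontrivial.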
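The panda census follow-up in the talk proposes 1536 SNPs in place of 9 microsatellites. Molecular censusing counts individuals by matching multilocus genotypes, so one relevant quantity is the probability of identity (PID): the chance that two different animals share the same genotype at every marker. This sketch uses the standard PID formula for unrelated individuals, assuming Hardy-Weinberg proportions and independent loci; the helper names and the allele frequencies in both marker panels are hypothetical, chosen only to show how marker count drives PID.

```python
import math
from typing import Sequence


def pid_locus(freqs: Sequence[float]) -> float:
    """Probability that two unrelated individuals share a genotype at one locus,
    under Hardy-Weinberg proportions: sum p_i^4 + sum_{i<j} (2 p_i p_j)^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    hom = sum(p ** 4 for p in freqs)
    het = sum((2 * freqs[i] * freqs[j]) ** 2
              for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return hom + het


def log10_pid(loci: Sequence[Sequence[float]]) -> float:
    """log10 of the multilocus PID, treating loci as independent.
    Summing logs avoids floating-point underflow for large SNP panels."""
    return sum(math.log10(pid_locus(f)) for f in loci)


if __name__ == "__main__":
    # Hypothetical marker panels, for illustration only.
    microsats = [[0.25, 0.25, 0.25, 0.25]] * 9   # 9 loci, 4 equally common alleles
    snps = [[0.2, 0.8]] * 1536                    # 1536 biallelic SNPs, MAF 0.2
    print(f"9 microsatellites: PID ~ 10^{log10_pid(microsats):.1f}")
    print(f"1536 SNPs:         PID ~ 10^{log10_pid(snps):.1f}")
```

Working in log10 matters here: 1536 per-locus probabilities multiply to a number far smaller than the smallest positive double. Related animals share genotypes much more often than this unrelated-pair formula suggests, so a sibling-aware variant would normally be checked as well.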
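The panda plan also names graph based assembly for short reads. One common graph formulation for short reads, assumed here since the talk does not say which graph is meant, is the de Bruijn graph: every k-mer in a read becomes an edge between its (k-1)-mer prefix and suffix, and maximal non-branching paths spell out contigs. This toy sketch assumes error-free reads from one strand and ignores reverse complements, paired ends and coverage; `build_debruijn` and `unitigs` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_debruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths (contigs) of the graph."""
    indeg: Dict[str, int] = defaultdict(int)
    nodes = set(graph)
    for dsts in graph.values():
        for d in dsts:
            indeg[d] += 1
            nodes.add(d)

    def one_in_one_out(node: str) -> bool:
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs: List[str] = []
    for node in sorted(nodes):
        if one_in_one_out(node):
            continue  # interior node: reached by walking from a branch or source
        for nxt in sorted(graph.get(node, ())):
            path = node + nxt[-1]
            while one_in_one_out(nxt):
                nxt = next(iter(graph[nxt]))
                path += nxt[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    # Overlapping error-free reads from ATGGCGTGCA assemble into one contig.
    print(unitigs(build_debruijn(["ATGGCG", "GGCGTG", "CGTGCA"], k=4)))
    # Two reads sharing the 3-mer GCT: the branch splits them into short contigs.
    print(unitigs(build_debruijn(["AAGCTT", "CCGCTA"], k=3)))
```

The first call prints ['ATGGCGTGCA']. In the second, the shared 3-mer GCT joins the two reads in the graph, so it breaks into five short contigs (AAGC, CCGC, CTA, CTT, GCT) rather than reproducing either read; at toy scale this is the effect that repeats longer than k have on a whole genome. Isolated cycles in which every node has exactly one edge in and one out are not reported by this sketch.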
