Genomics of Microbial Eukaryotes
Igor Grigoriev
Fungal Genomics Program Head
US DOE Joint Genome Institute, Walnut Creek, CA
<[email protected]>

Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm

Are you in the right room?
• IMG
• MycoCosm
• 100+ annotated eukaryotic genomes
• genome.jgi.doe.gov

Started with Human Genome Project

Eukaryotic Gene Prediction
• Ab initio methods: train on known genes, then use knowledge of known genes' structures to predict start, stop, and splice sites. Gene model in CDS only. (Fgenesh, GeneMark)
• Transcript-based methods: map or assemble transcripts on the genome, including UTRs. (EST_map, Combest)
• Protein-based methods: build CDS exons around known protein alignments. (Fgenesh+, GeneWise)
[Diagram: gene structure with promoter, 5'UTR, ATG start, exons, introns with GT and AG splice sites, TGA stop, 3'UTR, and polyA; an EST contig and a GenBank protein are each aligned to the genome and used to predict a model]

Protein Annotation
Predicted protein:
• Signal peptide (signalP)
• Domain (InterPro, tmhmm)
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralogs (Blastp+MCL)
Higher order assignments:
• Gene Ontology terms
• EC numbers --> KEGG pathways
• Gene families, with and without other species

EST Support is Critical for Eukaryotes
[Chart: genes supported by ESTs versus other genes, 0% to 100%, for A. aculeatus, G. trabeum, W. cocos, N. discreta*, A. niger*, M. graminicola, S. roseus, L. bicolor, P. blakesleeanus, and N. haematococca; EST sources: Sanger, 454, Illumina]
[Browser view: CombEST gene models and EST profile]

Best Models
• Multiple gene predictors (FGENESH, GENEWISE, external models) offer several different gene models at each gene locus.
• A single best model from each locus is automatically selected based on homology and EST support.
• These compose a non-redundant (or Filtered) gene set, the representative set, for further analysis.
• This set is further improved during community-driven manual curation.

Bring it all together
Annotation pipeline:
• Genomic assembly and EST contigs
• Repeat mask
• Transcript + protein maps
• Gene predictions
• Protein annotations
Analysis:
• Manual curation
• Gene families
• Gene expression
• Phylogenomics
• Proteomics
• Protein targeting
• etc.

Many Genes of Eco-responsive Daphnia pulex
• First crustacean, aquatic animal sequenced, new model organism
• 30,940 predicted D. pulex genes in ~200 Mb genome
• 85% supported by 1+ lines of evidence
Colbourne et al, Science, 2011

Half of Daphnia Genes have no Homologs
* Of 716 highly conserved single copy orthologs, Daphnia is missing only two
With Evgeny Zdobnov's group (Univ. Genève)

Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm

Fungal Genomics for Energy and Environment
Bio-refinery
• Grow: plant symbionts and pathogens
• Degrade: lignocellulose degradation
• Ferment: sugar fermentation
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications

Genomic Encyclopedia of Fungi Launched
• Plant feedstock health
  • Symbiosis
  • Plant pathogenicity
  • Biocontrol
• Biorefinery fungi
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms
• Fungal diversity
  • Phylogenetic
  • Ecologic
www.jgi.doe.gov/fungi
100+ fungal genomes, 600+ registered users, 5000+ visitors/month

Distinct Mechanisms of Cellulose Degradation
No cellulose binding domain CBM1 in brown rot!
[Diagram: white rot (P. chrysosporium) compared with brown rot (Postia placenta)]
• Cellobiohydrolase II: GH6 (CBH50)
• Cellobiohydrolase I: GH7 (CBH58, 62)
• Endoglucanases: GH5-CBM1, GH12
• β-glucosidase: GH3 (cellulose --> glucose)
• Glucose oxidases, copper radical oxidases
• Fe2+ + H2O2 --> Fe3+ + HO- + HO.
• Iron reductase (Fe3+)
Martinez et al, PNAS 2009

Diverse Basidiomycota
• FGP09 pilots
• Basidio jam (Mar 2010)
• 3 CSP11 proposals
• Basidio jam (Mar 2011)

Future Grand Challenges
1. 1000 fungal genomes: sampling fungal diversity
2. Model fungi: sampling 100s of conditions
3. Fungal ecosystems: bioenergy crops symbionts & pathogens, biorefinery, fungal metagenomes
[Diagram labels: SEQUENCE, FUNCTION, MODELING; fungal isolates & groups, systems of interacting organisms, systems in wild]

Leadership in Sequencing Fungi
[Pie chart: fungal genome sequencing by center (DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ, other); slice labels 31%, 41%, 5%, 10%, 13%]
[Pie chart: sequenced fungi by phylum (Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Unknown, Zygomycota); slice labels 23%, 68%]

Annotation and Analysis Tools
• Automated Annotation Pipeline
• Genomics Analysis Platform
  • Genome Centric
  • Comparative Genomics
• Community Resource
  • Integrated data
  • User tools
  • Training

Comparative View and Genome-Centric View
www.jgi.doe.gov/fungi

Genome-Centric View
Focus: functional genomics, user data deposition and curation

New Comparative View

Community Building Tools
• Jamborees: genome analysis for publications
• MycoCosm Tutorials: on-line video, MGM, workshops w/ large meetings (Asilomar, JGI Users, MSA)
• Preparation for CSP: large meetings and focused groups

Summary
Eukaryotic Annotation Recipe:
• Combined gene predictors, experimental data, and community annotation
Fungal Genomics Program:
• Scaled-up sequencing & comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)
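
The "Best Models" slide describes picking a single best gene model per locus based on homology and EST support. A minimal Python sketch of that idea follows. The GeneModel fields, the 0-to-1 score scales, and the equal weighting are assumptions made for illustration; the talk does not specify how the pipeline actually scores models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """One candidate gene model at a locus (all fields are hypothetical)."""
    locus: str
    predictor: str          # e.g. "Fgenesh", "GeneWise", "external"
    homology_score: float   # assumed scale: 0 (no homology) to 1 (strong)
    est_support: float      # assumed scale: fraction of the model covered by ESTs


def select_best_models(models, homology_weight=0.5, est_weight=0.5):
    """Return one model per locus, chosen by a weighted sum of the two scores.

    The weights are illustrative assumptions. On a tie, the model seen
    first for that locus is kept.
    """
    best = {}
    for model in models:
        score = (homology_weight * model.homology_score
                 + est_weight * model.est_support)
        current = best.get(model.locus)
        if current is None or score > current[0]:
            best[model.locus] = (score, model)
    return {locus: model for locus, (_, model) in best.items()}


# Toy input: several predictors propose models at two loci.
candidates = [
    GeneModel("locus1", "Fgenesh", 0.6, 0.5),
    GeneModel("locus1", "GeneWise", 0.9, 0.7),
    GeneModel("locus1", "external", 0.4, 0.9),
    GeneModel("locus2", "Fgenesh", 0.2, 0.8),
    GeneModel("locus2", "external", 0.5, 0.25),
]

filtered = select_best_models(candidates)
for locus, model in sorted(filtered.items()):
    print(locus, model.predictor)
```

The result is a non-redundant set with exactly one model per locus, which is the property the slide attributes to the Filtered gene set. Manual curation would then refine this set.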