Download JGI_Grigoriev - YSU Proteomics/Genomics Research Group

Genomics of Microbial
Eukaryotes
Igor Grigoriev
Fungal Genomics Program Head
US DOE Joint Genome Institute, Walnut Creek, CA
<[email protected]>
Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm
2
Are you in the right room?
IMG
MycoCosm
100+ annotated
eukaryotic genomes
genome.jgi.doe.gov
3
Started with Human Genome Project
4
Eukaryotic Gene Prediction
• Ab initio methods: use knowledge of known genes' structures to predict start, stop, and splice sites, in CDS only (Fgenesh, GeneMark)
• Transcript-based methods: map or assemble transcripts on the genome, including UTRs (EST_map, CombEST)
• Protein-based methods: build CDS exons around known protein alignments (Fgenesh+, GeneWise)
[Diagram: gene model with promoter, 5'UTR, ATG start, exons and introns with GT/AG splice sites, TGA stop, 3'UTR, and polyA. Ab initio predictors are trained on known genes; an EST contig and a GenBank protein are each aligned to the genome to predict a model.]
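The signals named on this slide (ATG start, GT/AG intron boundaries, stop codon) can be illustrated with a toy consistency check on a candidate gene model. This is a minimal sketch, not how Fgenesh, GeneMark, or GeneWise work: the function name `check_gene_model`, the sequence, and the exon coordinates are all invented for illustration, and only the forward strand is handled.

```python
# Toy check of a gene model against canonical eukaryotic signals.
# All names, sequences, and coordinates here are hypothetical.

STOP_CODONS = {"TGA", "TAA", "TAG"}

def check_gene_model(genome, cds_exons):
    """Check a forward-strand model.

    cds_exons: sorted list of (start, end) pairs, 0-based, end-exclusive.
    """
    # Spliced coding sequence: concatenate the exon slices.
    cds = "".join(genome[s:e] for s, e in cds_exons)
    # Introns are the gaps between consecutive exons.
    introns = [genome[e1:s2]
               for (_, e1), (s2, _) in zip(cds_exons, cds_exons[1:])]
    return {
        "starts_with_atg": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "in_frame": len(cds) % 3 == 0,
        "canonical_introns": all(
            i.startswith("GT") and i.endswith("AG") for i in introns
        ),
    }

# flank + exon 1 + intron (GT...AG) + exon 2 + flank
genome = "CC" + "ATGGCT" + "GTAAGTTTCAG" + "GACTGA" + "CC"
good = check_gene_model(genome, [(2, 8), (19, 25)])
bad = check_gene_model(genome, [(2, 9), (19, 25)])  # exon 1 one base too long
print(good)
print(bad)
```

Real predictors score these signals probabilistically rather than requiring exact matches; the hard checks above only show which features a CDS-only model is expected to carry.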
5
Protein Annotation
Predicted protein:
• Signal peptide (signalP)
• Domain (InterPro, tmhmm)
• Possible orthologs (in nr, SwissProt, KEGG, KOG)
• Possible paralog (Blastp+MCL)
Higher order assignments:
• Gene Ontology terms
• EC numbers --> KEGG pathways
• Gene families, with and without other species
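The grouping of possible paralogs from pairwise similarity hits can be sketched as follows. Note the substitution: the slide names Blastp+MCL, while this example uses plain single-linkage clustering (connected components via union-find) as a much simpler stand-in for MCL. The function name `cluster_by_hits`, the score threshold, and the hit list are hypothetical.

```python
# Single-linkage clustering of proteins from pairwise hits.
# A simplified stand-in for MCL, not an MCL implementation.

def cluster_by_hits(hits, min_score=50.0):
    """hits: iterable of (protein_a, protein_b, score) tuples."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        ra, rb = find(a), find(b)  # registers both proteins
        if score >= min_score and ra != rb:
            parent[ra] = rb        # merge the two clusters

    clusters = {}
    for protein in parent:
        clusters.setdefault(find(protein), set()).add(protein)
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: c[0])

# Invented hits: (query, subject, bit score)
hits = [("p1", "p2", 120.0), ("p2", "p3", 80.0), ("p4", "p5", 30.0)]
families = cluster_by_hits(hits)
print(families)
```

Single linkage tends to chain loosely related proteins into one family, which is one reason graph-flow methods such as MCL are preferred in practice.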
6
EST Support is Critical for Eukaryotes
[Chart: per-genome percentage of genes "Supported by ESTs" versus "Other Genes", y-axis 0% to 100%; x-axis species labels garbled in extraction (they include A. niger* and L. bicolor). EST sequencing platforms shown: Sanger, 454, Illumina.]
[Panel labels: CombEST gene models; EST profile. The values 5531 and 34 appear without recoverable context.]
7
Best Models
[Diagram: representative set selected from FGENESH, GENEWISE, and EXTERNAL MODELS]
• Multiple gene predictors offer several different gene models at each gene locus.
• A single best model from each locus is automatically selected based on homology and EST support.
• These compose a non-redundant (or Filtered) gene set for further analysis.
• This set is further improved during community-driven manual curation.
8
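The selection rule on the Best Models slide (one model per locus, chosen on homology and EST support) can be sketched as below. The slide does not give the actual scoring function, so the weighted sum, the 0 to 1 score scale, the field names, and the example records are all assumptions made for illustration.

```python
# Pick one representative model per locus from competing predictions.
# The scoring formula and all records are hypothetical.
from collections import defaultdict

def select_best_models(models, w_homology=1.0, w_est=1.0):
    """models: list of dicts with keys
    'locus', 'predictor', 'homology', 'est_support' (scores in 0..1)."""
    by_locus = defaultdict(list)
    for model in models:
        by_locus[model["locus"]].append(model)

    def score(model):
        return (w_homology * model["homology"]
                + w_est * model["est_support"])

    # One best-scoring model per locus gives a non-redundant set.
    return {locus: max(candidates, key=score)
            for locus, candidates in by_locus.items()}

models = [
    {"locus": "locus1", "predictor": "FGENESH", "homology": 0.4, "est_support": 0.5},
    {"locus": "locus1", "predictor": "GENEWISE", "homology": 0.9, "est_support": 0.6},
    {"locus": "locus2", "predictor": "EXTERNAL", "homology": 0.2, "est_support": 0.9},
    {"locus": "locus2", "predictor": "FGENESH", "homology": 0.3, "est_support": 0.1},
]
filtered = select_best_models(models)
for locus, model in sorted(filtered.items()):
    print(locus, model["predictor"])
```

Keeping the weights as parameters makes it easy to see how the chosen representative shifts when EST evidence is scarce for a genome.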
Bring it all together
Annotation Pipeline
Annotation: Genomic assembly and EST contigs --> Repeat mask --> Transcript + protein maps --> Gene predictions --> Protein annotations --> Manual curation
Analysis: Gene families, Gene expression, Phylogenomics, Proteomics, Protein targeting, etc.
9
Many Genes of Eco-responsive Daphnia pulex
First crustacean, aquatic animal sequenced, new model organism
30,940 predicted D. pulex genes in ~200 Mb genome
85% supported by 1+ lines of evidence
Colbourne et al, Science, 2011
10
Half of Daphnia Genes have no Homologs
* Of 716 highly conserved single copy orthologs, Daphnia is missing only two
With Evgeny Zdobnov’s group (Univ. Genève)
11
Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm
12
Fungal Genomics for Energy and Environment
Bio-refinery:
• Grow: plant symbionts and pathogens
• Degrade: lignocellulose degradation
• Ferment: sugar fermentation
GOAL: Scale up sequencing and analysis of fungal diversity for DOE science and applications
13
14
Genomic Encyclopedia of Fungi Launched
• Plant feedstock health
  • Symbiosis
  • Plant Pathogenicity
  • Biocontrol
• Biorefinery fungi
  • Lignocellulose degradation
  • Sugar fermentation
  • Industrial organisms
• Fungal diversity
  • Phylogenetic
  • Ecologic
www.jgi.doe.gov/fungi
100+ fungal genomes
600+ registered users
5000+ visitors/month
15
Distinct Mechanisms of Cellulose Degradation
No cellulose binding domain CBM1 in brown rot!
White rot: P. chrysosporium
Brown rot: Postia placenta
Cellulose --> Glucose:
• Cellobiohydrolase II: GH6 (CBH50)
• Cellobiohydrolase I: GH7 (CBH58, 62)
• Endoglucanases: GH5-CBM1, GH12
• β-glucosidase: GH3
Oxidative reactions:
• Glucose oxidases, copper radical oxidases
• Fe2+ + H2O2 --> Fe3+ + HO- + HO.
• Iron reductase: Fe3+ --> Fe2+
Martinez et al, PNAS 2009
16
Diverse Basidiomycota
• FGP09 pilots
• Basidio jam (Mar 2010)
• 3 CSP11 proposals
• Basidio jam (Mar 2011)
17
Future Grand Challenges
1. 1000 fungal genomes: sampling fungal diversity
2. Model fungi: sampling 100s of conditions
3. Fungal ecosystems:
• Bioenergy crops symbionts & pathogens
• Biorefinery
• Fungal metagenomes
[Diagram: axes SEQUENCE, FUNCTION, MODELING; scale from fungal isolates & groups, to systems of interacting organisms, to systems in wild]
18
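Leadership in Sequencing Fungi
[Pie chart: fungal genomes by sequencing center: DOE Joint Genome Institute, Broad Institute, Sanger Institute, Washington Univ, other. Values shown: 31%, 41%, 5%, 10%, 13%; the value-to-center assignment is not recoverable.]
[Pie chart: fungal genomes by taxonomic group: Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Glomeromycota, Microsporidia, Neocallimastigomycota, Zygomycota, Unknown. Values shown: 23%, 68%.]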
19
Annotation and Analysis Tools
• Automated Annotation
  • Pipeline
• Genomics Analysis Platform
  • Genome Centric
  • Comparative Genomics
• Community Resource
  • Integrated data
  • User tools
  • Training
20
Comparative View
Genome-Centric View
www.jgi.doe.gov/fungi
21
Genome-Centric View
Focus: functional genomics,
user data deposition and curation
22
New Comparative View
23
Community Building Tools
• Jamborees:
  • Genome analysis for publications
• MycoCosm Tutorials:
  • On-line video, MGM, workshops w/ large meetings (Asilomar, JGI Users, MSA)
• Preparation for CSP:
  • Large meetings and focused groups
24
Summary
Eukaryotic Annotation Recipe:
• Combined gene predictors, experimental data, and community annotation
Fungal Genomics Program:
• Scaled-up sequencing & comparative analysis of fungi relevant for energy & environment (jgi.doe.gov/fungi)
25
Outline
• Eukaryotic Genome Annotation
• Fungal Genomics Program
• MycoCosm
26