Center for Integrated Fungal Research
Fungal Genomics Laboratory

Industrial applications
• glutamic acid
• citric acid
• amylases
• proteases
• lipases
Bioterrorism
Biologically interesting and genetically tractable
Insight into eukaryotic gene regulation and development

Framework of rice blast genome
Deep (25X), large-insert (130 kb), single-enzyme (HindIII) BAC library from rice-infecting strain 70-15 (9,216 clones)
A. BAC-end sequence provides “Sequence Tag Connectors” (STC: ~500 bp of sequence every 3-4 kb across the genome)
B. BAC fingerprints used to create contigs
C. BAC contigs anchored to genetic map
[Figure: BACs 1-6 with forward and reverse end sequences (1f-6f, 1r-6r), anchored by markers RFLP 1-3]

USDA-IFAFS project, Oct 2000
“Gene discovery in the rice blast fungus: ESTs and sequence of chromosome 7”
1. Generate ~5X draft sequence of chromosome 7 (4.2 Mb).
2. Generate 35,000 ESTs and create a set of ~5,000 ESTs representing unique genes.
3. Provide basic sequence analysis and integration of data into the physical map of chromosome 7.

NSF-IFAFS project, Oct 2001
Whole genome sequence, host-pathogen function analysis
• Generate ~7X draft sequence of M. grisea
• Generate 50,000 knockouts
• Analyze host-pathogen interaction
• Provide basic sequence analysis

Consequences of Scaling
[Chart, 1994-2002: Base Pairs and Sequences plotted against lab processing and Moore's law]
• Moore’s law has allowed labs to keep ahead of data
• Sequence data is now outpacing processing capability
• Bioinformatics processing will be a real problem

Computational platforms
• Modern biology requires robust computational platforms
• Computer technology implementation is expensive (from a biologist's viewpoint)
• Computer technology development is even more expensive (you want how much?!)
• This detracts from research for small labs

On the brink
• Significant investment in off-the-shelf components and cross-training people
• Moderate-sized genomes
• 20 to 50 megabases
• Takes 2 weeks for initial analyses
• Homology searches take days

Local blast (www.fungalgenomics.ncsu.edu)

Federated database
Select a chromosome
Link to genetic information (blue)
Link to marker data and other data at http://ascus.cit.cornell.edu/blastdb/

High Throughput Genomic Processing and Display

Rice blast N. crassa synteny
[Figure: M. grisea BAC 6J18 aligned against N. crassa Contigs 1.515, 1.13, 1.513 and 1.841, with segment lengths from 0.5 kb to 185 kb]
97 out of 179 unique ESTs from chromosome 7 gave a significant (E < 10^-5) tBlastX match to the N. crassa genome shotgun assembly

CIFR BioInformatics
[Diagram of the CIFR bioinformatics architecture. Layers: Foundation BioInformatics, Advanced BioInformatics, Research BioInformatics, Higher Order BioInformatics. Component labels, in extraction order: Biological results, GRL, Rube, Sequence Pipe line, mask, Phred, Phrap, Artemis, Curation, consed, extract, load, synteny, Cluster analysis, Pathway analysis, In-silico mutation, Cellular models, Blast Report db, Data Loading, Relational Data Model, Curation Work area, High Throughput WebBlaster, Sequence Data, OO Genomic Analysis, Public Http Exposure, AlkaEST, Data mining, Http Blast Report, Genome browser, BioPerl Interface, Submissions, Extraction, homology, Repeat analysis, Gene prediction, EST analysis, PBS/LSF Grid Access, NC BioInformatics Super computing Grid, Genbank. Legend: Developed at CIFR; Ongoing work at CIFR; Open source and others]

And over the . . .
• Our whole genome arrives Spring 2002
• Everyone wants immediate results
• Host (rice) genome size far greater than the pathogen
• Comparative genomics likely to require N-way analyses
• And then there’s proteomics …
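The synteny slide above reports a count of ESTs with a significant tBlastX match (E < 10^-5). A minimal sketch of that kind of filtering follows, assuming the hits are available in BLAST's standard 12-column tab-delimited output, where the E-value is the eleventh column. The slides do not say which output format or tooling was actually used, and the function name, EST identifiers, and example rows are hypothetical.

```python
from typing import Iterable

E_VALUE_CUTOFF = 1e-5  # significance threshold quoted on the synteny slide


def queries_with_significant_hits(lines: Iterable[str],
                                  cutoff: float = E_VALUE_CUTOFF) -> set:
    """Return the query IDs that have at least one hit with E-value below cutoff.

    Assumes 12-column tab-delimited BLAST output:
    query, subject, %identity, length, mismatches, gap opens,
    q.start, q.end, s.start, s.end, E-value, bit score.
    """
    significant = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # skip rows that do not have the expected columns
        query_id = fields[0]
        e_value = float(fields[10])
        if e_value < cutoff:
            significant.add(query_id)
    return significant


# Hypothetical rows for illustration only; these are not data from the project.
example_rows = [
    "est001\tcontig_1.515\t62.5\t120\t45\t0\t1\t360\t1000\t1359\t3e-20\t95.1",
    "est001\tcontig_1.13\t40.0\t60\t36\t0\t10\t189\t500\t679\t0.002\t35.0",
    "est002\tcontig_1.841\t38.0\t50\t31\t0\t5\t154\t20\t169\t0.5\t28.0",
    "est003\tcontig_1.513\t71.0\t200\t58\t0\t1\t600\t40\t639\t1e-45\t180.0",
]

hits = queries_with_significant_hits(example_rows)
all_queries = {row.split("\t")[0] for row in example_rows}
print(f"{len(hits)} of {len(all_queries)} ESTs have a significant match")
```

Counting distinct queries rather than rows matters here, because one EST can hit several contigs. For reference, the figure on the slide, 97 of 179 unique ESTs, is about 54%.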

Research Biology: NCSU GRL
• Romulus
• Remus
Excellent foundation work
~6 years to sequence M. grisea

Industrial Scale Biology: High Throughput Sequence Centers (Whitehead)
~4 days to sequence M. grisea

Research Bioinformatics: CIFR FGL
• Mycelial mat
Excellent foundation work
Est. 4 years to analyze M. grisea

Industrial Scale Bioinformatics: North Carolina BioGrid
Hopefully 4 hours to analyze M. grisea

Islands of Capability
• There are not enough resources for every lab to re-implement technologies
• Individual centers specialize according to their research focus
• Grid ties together disparate systems
• Share knowledge and capabilities
• Standards-based for interoperability

Future directions 5 years*
• Organized distributed research - “Virtual Centers”
• Bioinformatics
• Tool development
• Gene prediction algorithms for filamentous fungi
• Gene indexing
• “Distributed Annotation Systems (DAS)”
• Develop better search features (“Queries”)
• Integrate sequenced and annotated BAC clones
• Integrate ESTs and expression profiles etc.
• Functional Genomics
• Comparative studies - saprophyte vs pathogen etc.
• Coordinate IRBGC and PGI etc.
• Complete nucleotide sequence, full-length ESTs
• Knock out/silence all genes
• Transcriptional profiling in various backgrounds (path mutants)
• Construct protein-protein linkage maps (signaling pathways)
* The biologist's view

Future Directions 5 years*
• Collaborative knowledge sharing
• New data mining approaches
• New ways of visualizing the information
• In-silico experimentation
  • Gene knockouts
  • Regulatory modification
  • Pathway models
  • Cellular models
* The bioinformatician's view

Finding solutions to practical problems
• Seeking answers requires asking questions
• Takes 1-2 weeks per question
• BioGrid may give near real-time response
• BioGrid will bridge the islands of capability
• Focus resources back on our work
• Consequently, we are going to further accelerate the rate of discovery