A Database of Drosophila Genes & Genomes
Genetic Literature Curation at FlyBase-Cambridge
Steven Marygold
ABC meeting, December 2007
www.flybase.org [email protected]

Talk Outline
1. Group structure
2. The FlyBase bibliography
3. Prioritizing curation
4. Curation practice
5. Curation support
6. Future directions

Group structure
FlyBase
• FB-Harvard
  – database
  – genome annotation
  – expression curation
• FB-Cambridge
  – bibliography
  – gene and phenotype curation
  – ontologies
• FB-Indiana
  – website
  – fly stocks
  – image curation

FB-Cambridge
• Principal Investigators: Michael Ashburner, Nick Brown
• Group Manager: Steven Marygold
• Reactome Curator: 1 FTE
• GO Curator: 1 FTE
• Literature Curators: 3.25 FTEs
• Developer: 1 FTE
• FB Ontology Editor: 0.25 FTE

Bibliography
• Search for string ‘Drosophil*’ in title, abstract or keywords
• Semi-automated search of publication databases
  – Medline, BIOSIS, ZooRec
• Manual searches of journal issues
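The ‘Drosophil*’ search can be illustrated with a minimal Python sketch. It is not FlyBase's search tooling: the function name `mentions_drosophila`, the record field names, and the choice to match case-insensitively are all assumptions made for illustration. It only shows how a trailing wildcard can be applied across title, abstract and keywords.

```python
import re

# Assumption: the trailing '*' in 'Drosophil*' is a wildcard for any word
# ending, so "Drosophila", "Drosophilidae" and "drosophilid" all match.
# Case-insensitive matching is also an assumption.
DROSOPHIL = re.compile(r"\bDrosophil\w*", re.IGNORECASE)

# Hypothetical field names; real database exports will differ.
SEARCH_FIELDS = ("title", "abstract", "keywords")


def mentions_drosophila(record):
    """Return True if any searched field has a word starting with 'Drosophil'."""
    for field in SEARCH_FIELDS:
        value = record.get(field, "")
        # Keywords may arrive as a list; join them into one string.
        if isinstance(value, (list, tuple)):
            value = " ".join(value)
        if value and DROSOPHIL.search(value):
            return True
    return False
```

A record that passes this filter would still need a curator's judgment; the sketch covers only the string match, not the semi-automated database queries or the manual journal searches.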
Curation prioritization
Types of publication curated:
– Primary research papers
– Supplemental information
– Errata
– Personal communications to FlyBase
– Conference abstracts
– Reviews
– Books/Book chapters
– Miscellaneous others

1. Prioritization of selected journals:
• Set of (~50) journals publishing on Drosophila biology
• Chronological, issue by issue curation
2. Prioritization of selected papers:
• Flagged by ‘skim curation’
• Flagged by stock center
• Genes prioritized by GO project
• Alerted to by research community

Curation practice
1. Identify/select relevant paper
2. Access pdf
3. Read abstract; skim-read intro
4. Highlight curatable material within Results, Methods, Figures & legends, Tables
5. Curate material into individual ‘proformae’ to form a ‘curation record’
6. Error-checking:
  – spelling
  – consistency
  – validity
7. Completed records submitted for loading into Chado database
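The error-checking step can be illustrated with a small Python sketch that parses a proforma field and checks one kind of value. It is not PeeVeS and does not reproduce its rules. The field layout `<code>. <label> :<value>` is inferred from the example fields in this talk (for instance `G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO:0005262 | IDA`); the function names and the short evidence-code list are hypothetical.

```python
import re

# Assumed layout, inferred from the example fields in this talk:
#   "<code>. <label> :<value>", optionally preceded by "!".
FIELD = re.compile(r"^(?:!\s*)?([A-Z]+\d+[a-z]?)\.\s+(.*?)\s*:(.*)$")

# Assumed GO value layout: "<term> ; GO:<7 digits> | <evidence code>".
GO_VALUE = re.compile(r"^(.+?)\s*;\s*(GO:\d{7})\s*\|\s*([A-Z]+)$")

# A subset of GO evidence codes, for illustration only.
EVIDENCE_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP", "ISS",
                  "TAS", "NAS", "IC", "ND", "IEA"}


def parse_field(line):
    """Split a proforma line into (code, label, value), or return None."""
    match = FIELD.match(line.strip())
    if match is None:
        return None  # header or separator line, not a field
    code, label, value = match.groups()
    return code, label, value.strip()


def check_go_value(value):
    """Return a list of problems found in a GO field value (empty if none)."""
    if not value:
        return []  # an unfilled field is not treated as an error here
    match = GO_VALUE.match(value)
    if match is None:
        return ["expected '<term> ; GO:<7 digits> | <evidence code>'"]
    if match.group(3) not in EVIDENCE_CODES:
        return ["unknown evidence code: " + match.group(3)]
    return []
```

For example, `parse_field` applied to the `G24b` line above yields the code `G24b` and a value that `check_go_value` accepts, while a value with a malformed GO identifier is reported as a problem. Checking a term name against the ontology files themselves is outside this sketch.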
Curated data classes (proforma types):
– Publication
– Gene
– Allele
– Aberration
– Transgenic constructs
– Transgenic insertions
– Natural transposons

Gene-level curated data:
– valid FlyBase gene symbol/name
– gene symbol/name used in paper
– action: gene rename or merge
– action: creation or deletion of gene
– etymology of gene name
– Sequence Ontology (SO) terms
– cytological map position
– relationship to cDNA/genomic clone
– Gene Ontology (GO) terms
– y/n flags to indicate paper has expression or annotation information

Allele-level curated data:
– valid FlyBase allele symbol/name
– allele symbol/name used in paper
– action: allele rename or merge
– action: creation or deletion of allele
– allele class
– mutagen
– nucleotide/amino acid changes
– phenotype: class, anatomy, free text
– genetic interaction: class, anatomy, free text
– complementation data
– associated transgenic construct/insertion
– associated tag

Example proformae:
! GENE PROFORMA Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO:0005262 | IDA
! G24c. GO -- Biological process | evidence [CV] :eye-antennal disc development ; GO:0035214 | IMP
!
! ALLELE PROFORMA Version 39: 6 July 2007
! GA1a. Allele symbol to use in database :ey[46]
! GA1b. Allele symbol used in paper :ey[461]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
! GA17. Phenotype [CV, body part(s) where manifest] :eye anterior vertical bristle

Curation support
• Curation support files
  – Text files of data from latest DB instance
• Ontology files
  – GO, SO, FB-anatomy, FB-phenotypes etc.
• PeeVeS
  – Proforma Validation Software
• Other custom scripts

Future directions
• More paper-by-paper prioritization
• ‘Skim curation’
  – Manual curation
  – Automated curation?
  – User-submitted data
• Use of text-mining aids for ‘deep curation’
• Review breadth and depth of curation
• Enhanced curation interface

Acknowledgements
FB-Cambridge:
– Michael Ashburner (co-PI)
– Nick Brown (co-PI)
– Steven Marygold (Manager)
– Gillian Millburn (Literature curator)
– David Osumi-Sutherland (Ontology Editor and Literature curator)
– Ruth Seal (Literature curator)
– Peter McQuilton (Literature curator)
– Paul Leyland (Developer)
– Susan Tweedie (GO curator)
– Mark Williams (Reactome curator)
– Rachel Drysdale (former FB-Cambridge co-PI)
Genetics Dept., University of Cambridge, UK
The FlyBase Consortium
NHGRI at the NIH