A Database of Drosophila Genes & Genomes
Genetic Literature Curation at
FlyBase-Cambridge
Steven Marygold
ABC meeting, December 2007
www.flybase.org
[email protected]
Talk Outline
1. Group Structure
2. The FlyBase bibliography
3. Prioritizing curation
4. Curation practice
5. Curation support
6. Future directions
Group structure
FlyBase
FB-Harvard:
- database
- genome annotation
- expression curation
FB-Cambridge:
- bibliography
- gene and phenotype curation
- ontologies
FB-Indiana:
- website
- fly stocks
- image curation
Principal Investigators: Michael Ashburner, Nick Brown
Group Manager: Steven Marygold
Reactome Curator: 1 FTE
GO Curator: 1 FTE
Literature Curators: 3.25 FTEs
Developer: 1 FTE
FB Ontology Editor: 0.25 FTE
Bibliography
• Search for string ‘Drosophil*’ in title, abstract or keywords
• Semi-automated search of publication databases
– Medline, BIOSIS, ZooRec
• Manual searches of journal issues
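The wildcard search described on this slide can be illustrated with a short Python sketch. It is only an illustration: the `Citation` record layout and the sample entries are hypothetical, and the real semi-automated searches run against Medline, BIOSIS and ZooRec rather than an in-memory list.

```python
import re
from dataclasses import dataclass, field

# Case-insensitive equivalent of the wildcard 'Drosophil*':
# the stem followed by any run of word characters.
PATTERN = re.compile(r"\bdrosophil\w*", re.IGNORECASE)


@dataclass
class Citation:
    # Hypothetical record layout; real database exports differ.
    title: str
    abstract: str = ""
    keywords: list[str] = field(default_factory=list)


def mentions_drosophila(citation: Citation) -> bool:
    """Return True if the stem occurs in the title, abstract or keywords."""
    searchable = [citation.title, citation.abstract, *citation.keywords]
    return any(PATTERN.search(text) for text in searchable)


# Made-up sample citations, for illustration only.
candidates = [
    Citation("Eye development in Drosophila melanogaster"),
    Citation("Wing patterning", keywords=["drosophilid", "evolution"]),
    Citation("Mouse limb development", abstract="A study of Mus musculus."),
]
hits = [c.title for c in candidates if mentions_drosophila(c)]
print(hits)
```

Matching on the stem rather than the full genus name also picks up forms such as "drosophilid", which is presumably why the search string ends in a wildcard.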
Curation prioritization
Types of publication curated:
– Primary research papers
– Supplemental information
– Errata
– Personal communications to FlyBase
– Conference abstracts
– Reviews
– Books/Book chapters
– Miscellaneous others
Curation prioritization
1. Prioritization of selected journals:
• Set of (~50) journals publishing on Drosophila biology
• Chronological, issue by issue curation
2. Prioritization of selected papers:
• Flagged by ‘skim curation’
• Flagged by stock center
• Genes prioritized by GO project
• Alerted to by research community
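One way to picture how the two prioritization routes could combine into a single work queue is sketched below. The ordering rule is an assumption made for illustration (flagged papers first, then papers from the prioritized journal set, oldest issue first within each group); the slide does not say how the two routes are weighed against each other. The `Paper` class, the flag names and the journal names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels for the four flagging routes listed on the slide.
FLAG_SOURCES = {"skim_curation", "stock_center", "go_project", "community"}


@dataclass(frozen=True)
class Paper:
    title: str
    journal: str
    issue_date: date
    flags: frozenset = frozenset()


def curation_order(papers, priority_journals):
    """Sort papers into an assumed curation order.

    Assumption: flagged papers come first, then papers from the
    prioritized journals, and each group is worked oldest issue first.
    """
    def key(paper):
        flagged = bool(paper.flags & FLAG_SOURCES)
        in_priority_journal = paper.journal in priority_journals
        # False sorts before True, so negate to put matches first.
        return (not flagged, not in_priority_journal, paper.issue_date)

    return sorted(papers, key=key)


# Placeholder journal names and papers.
priority_journals = {"Journal A"}
papers = [
    Paper("A", "Journal A", date(2007, 3, 1)),
    Paper("B", "Journal B", date(2006, 1, 1), frozenset({"stock_center"})),
    Paper("C", "Journal A", date(2007, 1, 1)),
    Paper("D", "Journal B", date(2005, 1, 1)),
]
ordered_titles = [p.title for p in curation_order(papers, priority_journals)]
print(ordered_titles)
```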
Curation practice
1. Identify/select relevant paper
2. Access PDF
3. Read abstract; skim-read intro
4. Highlight curatable material within Results, Methods, Figures & legends, Tables
5. Curate material into individual ‘proformae’ to form a ‘curation record’
6. Error-checking: spelling, consistency, validity
7. Completed records submitted for loading into Chado database
Curation practice
Curated data classes (proforma types):
– Publication
– Gene
– Allele
– Aberration
– Transgenic constructs
– Transgenic insertions
– Natural transposons
Curation practice
Gene-level curated data:
– valid FlyBase gene symbol/name
– gene symbol/name used in paper
– action: gene rename or merge
– action: creation or deletion of gene
– etymology of gene name
– Sequence Ontology (SO) terms
– cytological map position
– relationship to cDNA/genomic clone
– Gene Ontology (GO) terms
– y/n flags to indicate paper has expression or annotation information
Curation practice
Allele-level curated data:
– valid FlyBase allele symbol/name
– allele symbol/name used in paper
– action: allele rename or merge
– action: creation or deletion of allele
– allele class
– mutagen
– nucleotide/amino acid changes
– phenotype: class, anatomy, free text
– genetic interaction: class, anatomy, free text
– complementation data
– associated transgenic construct/insertion
– associated tag
Curation practice
! GENE PROFORMA Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO:0005262 | IDA
! G24c. GO -- Biological process | evidence [CV] :eye-antennal disc development ; GO:0035214 | IMP
! ALLELE PROFORMA Version 39: 6 July 2007
!
! GA1a. Allele symbol to use in database :ey[46]
! GA1b. Allele symbol used in paper :ey[461]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
! GA17. Phenotype [CV, body part(s) where manifest] :eye
anterior vertical bristle
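Proforma fields like those in the excerpt pair a field code (G1a, GA17, ...) with one or more values after a colon. The Python sketch below shows how such text could be read into a dictionary. The line pattern is inferred from this excerpt only, not from a FlyBase specification, and `parse_proforma` and `split_go` are hypothetical helper names, not part of any FlyBase tool.

```python
import re

# "! <code>. <label> :<value>", as inferred from the slide excerpt.
# Assumes the label itself contains no colon.
FIELD_LINE = re.compile(
    r"^!\s*(?P<code>GA?\d+[a-z]?)\.\s+(?P<label>.*?)\s*:(?P<value>.*)$"
)


def parse_proforma(text):
    """Map each field code to its list of values (empty if left blank)."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = FIELD_LINE.match(line)
        if match:
            current = match["code"]
            value = match["value"].strip()
            fields[current] = [value] if value else []
        elif line and not line.startswith("!") and current is not None:
            # A line without the leading '!' is treated as a further
            # value of the preceding field.
            fields[current].append(line)
    return fields


def split_go(value):
    """Split 'term ; GO:id | evidence' into its three parts."""
    term_and_id, evidence = (part.strip() for part in value.split("|"))
    term, go_id = (part.strip() for part in term_and_id.split(";"))
    return term, go_id, evidence


SAMPLE = """\
! GENE PROFORMA Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO:0005262 | IDA
! GA17. Phenotype [CV, body part(s) where manifest] :eye
anterior vertical bristle
"""

fields = parse_proforma(SAMPLE)
go_function = split_go(fields["G24b"][0])
print(fields)
print(go_function)
```

Keeping every value as a list lets a blank field (G24a here) and a multi-valued field (GA17) share one representation.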
Curation support
• Curation support files
– Text files of data from latest DB instance
• Ontology files
– GO, SO, FB-anatomy, FB-phenotypes etc.
• PeeVeS
– Proforma Validation Software
• Other custom scripts
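To give a flavour of what validating a proforma against support and ontology files might involve, here is a minimal Python sketch. It is not how PeeVeS works internally, which the talk does not describe; the function names are hypothetical, and the symbol set and GO lookup are small stand-ins for data that would be loaded from the curation support and ontology files.

```python
import re

# GO identifiers are 'GO:' followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")


def check_gene_symbol(symbol, valid_symbols):
    """Report a symbol that is absent from the list of valid symbols."""
    if symbol in valid_symbols:
        return []
    return [f"unknown gene symbol: {symbol!r}"]


def check_go_annotation(term, go_id, go_terms):
    """Check that a GO id is well formed, known, and paired with its term."""
    problems = []
    if not GO_ID.match(go_id):
        problems.append(f"malformed GO id: {go_id!r}")
    elif go_id not in go_terms:
        problems.append(f"GO id not in ontology file: {go_id}")
    elif go_terms[go_id] != term:
        problems.append(
            f"term {term!r} does not match {go_id} ({go_terms[go_id]!r})"
        )
    return problems


# Stand-ins for data read from support files (assumption, for illustration).
valid_symbols = {"ey", "w", "N"}
go_terms = {
    "GO:0005262": "calcium channel activity",
    "GO:0035214": "eye-antennal disc development",
}

ok = check_gene_symbol("ey", valid_symbols) + check_go_annotation(
    "calcium channel activity", "GO:0005262", go_terms
)
bad = check_gene_symbol("eyy", valid_symbols) + check_go_annotation(
    "eye development", "GO:0035214", go_terms
)
print(ok)
print(bad)
```

Returning a list of problem messages, rather than raising on the first error, lets a curator see every issue in a record in one pass.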
Future directions
• More paper-by-paper prioritization
• ‘Skim curation’
– Manual curation
– Automated curation?
– User-submitted data
• Use of text-mining aids for ‘deep curation’
• Review breadth and depth of curation
• Enhanced curation interface
Acknowledgements
FB-Cambridge:
Michael Ashburner (co-PI)
Nick Brown (co-PI)
Steven Marygold (Manager)
Gillian Millburn (Literature curator)
David Osumi-Sutherland (Ontology Editor and Literature curator)
Ruth Seal (Literature curator)
Peter McQuilton (Literature curator)
Paul Leyland (Developer)
Susan Tweedie (GO curator)
Mark Williams (Reactome curator)
Rachel Drysdale (former FB-Cambridge co-PI)
Genetics Dept., University of Cambridge, UK
The FlyBase Consortium
NHGRI at the NIH