* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Poster - Protein Information Resource
Histone acetylation and deacetylation wikipedia , lookup
Immunoprecipitation wikipedia , lookup
Silencer (genetics) wikipedia , lookup
Artificial gene synthesis wikipedia , lookup
Index of biochemistry articles wikipedia , lookup
List of types of proteins wikipedia , lookup
Gene expression wikipedia , lookup
G protein–coupled receptor wikipedia , lookup
Magnesium transporter wikipedia , lookup
Ancestral sequence reconstruction wikipedia , lookup
Expression vector wikipedia , lookup
Protein design wikipedia , lookup
Protein folding wikipedia , lookup
Protein moonlighting wikipedia , lookup
Phosphorylation wikipedia , lookup
Interactome wikipedia , lookup
Protein structure prediction wikipedia , lookup
Western blot wikipedia , lookup
Protein (nutrient) wikipedia , lookup
Proteolysis wikipedia , lookup
Protein adsorption wikipedia , lookup
Protein purification wikipedia , lookup
Nuclear magnetic resonance spectroscopy of proteins wikipedia , lookup
iProLINK – A Literature Mining Resource at PIR (integrated Protein Literature INformation and Knowledge ) Hu ZZ1, Liu H2, Vijay-Shanker K3, Mani I4, and Wu CH1 1Protein Information Resource, 2Department of Biostatistics, Bioinformatics, and Biomathematics, 4Department of Computational Linguistics, Georgetown University, Washington, DC 20007; 3University of Delaware, DE 19716 Introduction: With the increasing volume of scientific literature available electronically, efficient text mining tools will greatly facilitate the extraction of information buried in free text and will assist in database annotation and scientific inquiry. Many methods, including natural language processing, machine learning, and rule-based approaches have been employed for biological literature mining, especially in areas of entity recognition, information retrieval and extraction. The Protein Information Resource (PIR) group, actively collaborating with several other groups, conducts research and provides resources on literature mining in the above three areas. iProLINK is a public resource provided by PIR that aims at providing annotated literature data sets for development of new literature mining algorithms, such as protein named entity recognition, text categorization, and protein annotation extraction, and of protein ontology. iProLINK also provides literature mining tools for scientific users and curators. (Comp Biol Chem, 28:409-416, 2004) iProLINK Resource Overview Bibliography mapping: 1. Bibliography mapping - UniProtKB mapped citations 2. Annotation extraction - annotation tagged literature 3. Protein entity recognition - dictionary, tagged literature 4. Protein ontology development Contains curated literature citations for UniProtKB protein entries from multiple sources including GeneRIF, SGD, and MGI, in addition to current UniProt literature citations. Also included are user-submitted and computationally mapped citations. - PIRSF-based ontology Annotation tagged literature sets: e.g. acetylation, glycosylation, hydroxylation, phosphorylation, methylation in abstract or full text. Protein entity recognition: name dictionaries, tagged abstracts and tagging guidelines Search and browse tagged features Tagging guideline versions 1.0 and 2.0 2 sets of tagged corpora Inter-coder reliability Data sets for the five PTMs are being used for developing machine learning algorithms for text categorization (classification). A substringbased approach is developed that is highly effective in biomedical document classification (Bioinformatics, submitted, 2006) Data sets for protein phosphorylation were used for testing and benchmarking a rulebased text mining program for phosphorylation – RLIMS-P (Bioinformatics 21:2759-65, 2005.) PIRSF-Based Protein Ontology RLIMS-P PIRSF family hierarchy based on evolutionary relationships Standardized PIRSF family names and relations as protein ontology DAG Network structure for PIRSF family classification system (left) PIRSF-based protein ontology can complement Gene Ontology (right) PIRSF in DAG View Details in a separate RLIMS-P poster Guideline v1.0 Guideline v2.0 Bioinformatics. 2006 Apr 27 Protein name tagging guidelines: lessons learned – Comp. Funct Genomics, 6(1-2): 72-76, 2005 RLIMS-P and BioThesaurus combined can be used for UniProt protein feature annotations. BioThesaurus • Comprehensive collection of protein/gene names from multiple molecular databases • Associates names with UniProtKB entries • Primary usage: • Retrieve synonymous names • Resolve ambiguous names • Evaluate name coverage Synonyms for Metalloproteinase inhibitor 3 Bioinformatics, 21(11): 2759-2765, 2005 http://pir.georgetown.edu/iprolink Name ambiguity of TIMP-3 Summary - iProLINK is a public resource for literature mining and ontology development. - RLIMS-P is a text-mining tool for protein phosphorylation. - BioThesaurus is for gene and protein name mapping to solve name ambiguity. - BioThesaurus and RLIMS-P can be used to assist UniProtKB protein annotations. - PIRSF-based protein ontology can complement GO. Acknowledgements: NIH (UniProt), NSF (Entity Tagging, Ontology). PIR team: Hermoso V, Fang C, Yuan X, Huang H, Zhang J, Natale D, Nikolskaya A. Temple University: Han B, Obradovic Z, Vucetic S. Contact: [email protected]