Fungal Semantic Web

Stephen Scott, Scott Henninger, Leen-Kiat Soh (CSE)
Etsuko Moriyama, Ken Nickerson, Audrey Atkin (Biological Sciences)
Steve Harris (Plant Pathology)

Motivation

Many fungal genomes being sequenced
  – 100s in the next few years
Important fungal genetics work done by large and strong UNL group
  – Have prototype fungal genome database
Numerous groups around the world are developing disparate fungal genome databases
  – Databases dissimilar and widely distributed
  – Difficult to unify others’ results with one’s own
  – Unnecessarily complicates research
But it’s impractical to unify the databases!

Semantic Web

Develop an ontology to describe the fungal genome data
  – An ontology is a formal, explicit specification of shared concepts
  – Allows both human and machine processing
  – Concepts shared between ontology files on the WWW
  – Ontology describes properties of genes, relations between genes, and operations useful in analyzing them
Participants keep their own data locally, but represent it in a way consistent with this framework
Captures the semantic meaning of the data, facilitating automatic processing
  – This is where the fun starts

Semantic Web Architecture

[Figure: architecture diagram]

What we can do with it

Can do transitive reasoning on genes
  – E.g. if genes A and B are related via property 1 and B and C are related via property 2, then perhaps A and C are related as well
Inverse relationships to reduce data entry
  – E.g. entering “EvolvedFrom” data automatically implies the “EvolvedTo” relation
Consistency checking
  – E.g. verify that UNL’s assertions about fungal genomes don’t contradict those by others on the same genomes
Hypothesis building and testing
  – E.g. identification of genes that function in specific cellular processes
Knowledge discovery and data mining
  – Ontology includes appropriate techniques for users to apply to extract new knowledge from the data

What we can do with it (cont’d)

Uniform interface to the world’s collection of genomic resources
  – Visualization, query & search
  – Instructional tool: train postdocs/students as bi/trilingual scientists who can understand molecular (fungal) biology/genetics, bioinformatics, and computer science
Can add active machine learning component to facilitate querying of database to classify new sequences
  – Computer learns how to classify biological sequences via labeled examples, interaction with the user, and interaction with other experts

Prior Work

Application of semantic web technology to bioinformatics is not new
Gene Ontology (http://www.geneontology.org)
  – Collection of ontologies related to molecular functions, biological processes, and cellular components
  – Takes a rather limited view of ontologies
      Little (if any) use of quantifiers, shared concepts, etc.

Prior Work (cont’d)

Fungal Web (http://www.cs.concordia.ca/~baker/)
  – Built a fungal gene ontology based on GO
  – Developing technologies to parse on-line scientific literature to add data to database
  – Tools to query databases and perform analysis
Similar to what we propose, but:
  – Their extensions to GO do not suit the needs of UNL scientists or the broader fungal community
      They focus on fungi that degrade cellulose
      Their annotations are too limited to represent the entire fungal kingdom
  – They support machine learning, but not active learning

Extending Other Repositories

Use existing ontologies (e.g. GO) and data stores as a basis for fungal ontologies
  – Utilize existing concepts in other gene ontologies
  – Extend to meet needs of fungal genomes
  – Extensions can in turn be utilized by other researchers, both fungal and other kingdoms
      Because we use common concepts where applicable

Funding Opportunities

NSF
  – Frontiers in Integrative Biological Research (FIBR): Oct preproposal, Feb full
      http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf05597
  – Science and Engineering Information Integration and Informatics (SEIII): December
      http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf04528
NIH
  – Innovations in Biomedical Computational Science and Technology: Sept LOI, Oct full
      http://grants1.nih.gov/grants/guide/pa-files/PAR-03-106.html
Nebraska Research Initiative: November

Conclusions

Semantic web now popular within bioinformatics, but no support for the work of UNL’s fungal research community
We plan to build the necessary infrastructure to unify disparate data sources and provide an interface conducive to knowledge discovery, hypothesis testing, and collaboration
  – Will build on existing fungal database here at UNL
  – Contributions: distributed infrastructure, means for querying, drawing inferences
We should send someone to the Knowledge-Based Bioinformatics Workshop to learn more about the state of the art
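
Sketch: inference over gene relations

The transitive reasoning, inverse relationships, and consistency checking listed under "What we can do with it" can be illustrated with a small, self-contained Python sketch. Everything specific here is an assumption made for illustration: the gene names are invented, facts are stored as plain (subject, property, object) triples rather than in a real ontology language, and "EvolvedFrom" is treated as both transitive and asymmetric, which the slides do not state. A production system would more likely rely on an ontology reasoner than on hand-written rules.

```python
from itertools import product

# Hypothetical facts as (subject, property, object) triples; gene names are invented.
facts = {
    ("geneB", "EvolvedFrom", "geneA"),
    ("geneC", "EvolvedFrom", "geneB"),
}

# Assumption: each property is the inverse of the other.
INVERSE = {"EvolvedFrom": "EvolvedTo", "EvolvedTo": "EvolvedFrom"}
# Assumption: both properties are transitive for the purposes of this sketch.
TRANSITIVE = {"EvolvedFrom", "EvolvedTo"}


def infer(triples):
    """Return the triples plus everything implied by the inverse and transitive rules."""
    known = set(triples)
    while True:
        new = set()
        # Inverse rule: (s, p, o) implies (o, inverse(p), s).
        for s, p, o in known:
            if p in INVERSE:
                new.add((o, INVERSE[p], s))
        # Transitive rule: (a, p, b) and (b, p, c) imply (a, p, c).
        for (s1, p1, o1), (s2, p2, o2) in product(known, repeat=2):
            if p1 == p2 and p1 in TRANSITIVE and o1 == s2:
                new.add((s1, p1, o2))
        if new <= known:  # nothing new was derived, so the closure is complete
            return known
        known |= new


def contradictions(known, prop="EvolvedFrom"):
    """Gene pairs asserted in both directions for a property assumed to be asymmetric."""
    return {
        (s, o)
        for s, p, o in known
        if p == prop and s < o and (o, p, s) in known
    }


closure = infer(facts)

# A second, hypothetical source asserts the opposite direction for one pair.
other_source = {("geneA", "EvolvedFrom", "geneC")}
conflicts = contradictions(infer(facts | other_source))
```

Only "EvolvedFrom" facts are entered by hand; the "EvolvedTo" facts and the indirect geneC/geneA relation are derived. Merging the second source makes the contradiction check flag the conflicting pairs instead of silently accepting them.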