Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Beespace Prototype Design Meeting Entity Recognition Jing Jiang 09/28/2005 Entity Recognition in Prototype V1 Target entities: gene names Supervised learning: LingPipe (word trigram and tag bigram model) Training data: BioCreative (manually annotated) Drosophila (generated from gene lists) Sample Results http://sifaka.cs.uiuc.edu/jiang4/Beespace Performance Some gene names without explicit mention of “gene” can be captured E.g., “glutathione S-transferase” Problems Gene-like phrases, e.g., “China 2”, “13.8” Mismatch of gene name boundaries and noun phrase boundaries, e.g., “nicotinic” in “nicotinic pathway” V2 -- Entity Types Annotation guideline for BioCreative Guideline for Beespace? Ontology? (GENIA ontology) What to tag? Genes and proteins Family of genes Gene descriptions Entity boundaries and noun phrase boundaries Tag only noun phrases that refer to genes or tag any occurrence of a gene name inside a noun phrase? Sample Sentences A dose-dependent transactivation of human hARE-mediated chloramphenicol acetyltransferase (cat) gene expression was observed upon treatments of the Hepa-1 transfectants with TPA, a known inducer, as well as with CAPE. In the present study, we identified its preferred binding sequence as 5'-CCCTATCGATCGATCTCTACCT-3' and characterized its DNA binding properties using truncated Mblk-1 mutants. Sample Sentences (cont.) At least two kinds of nicotinic receptors seem to be involved in honeybee memory, an alpha-bungarotoxin-sensitive and an alpha-bungarotoxin-insensitive receptor. The involvement of nicotinic pathways in memory formation and retrieval processes was tested by injecting… Sample Sentences (cont.) We report the cloning of a honeybee CSP gene called ASP3c, as well as the structural and functional characterization of the encoded protein. Natural occurring variatioin in npr-1, a gene encoding a putative receptor for an NPY-like molecule, causes variation in feeding behaviour. Sample Sentences (cont.) The gene encoding ZENK, an EARLY IMMEDIATE GENE well known in other learning and memory contexts, has figured prominently in molecular songbird research thus far. This is because frequent contacts of these types cause an increase in the expression of the gene encoding a glucocortocoid receptor in the hippocampus, and… Training Data Dictionary Rules/guidelines Bootstrapping Cross-domain training Can training data in other domains (fly, human, etc.) still be useful?