Beespace Prototype Design Meeting
Entity Recognition
Jing Jiang
09/28/2005
Entity Recognition in Prototype V1
 Target entities: gene names
 Supervised learning: LingPipe (word trigram and tag bigram model)
 Training data:
    BioCreative (manually annotated)
    Drosophila (generated from gene lists)
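
As an illustration of how training data might be "generated from gene lists", here is a minimal sketch that labels tokens by longest match against a list of names. The function name `label_with_gene_list`, the B-GENE/I-GENE/O tag set, and the toy token sequence are assumptions made for this example; the slides do not describe the actual procedure or the format LingPipe was trained on.

```python
def label_with_gene_list(tokens, gene_names):
    """Assign B-GENE / I-GENE / O tags by longest match against a gene list.

    Hypothetical helper: matching is case-insensitive and token-based,
    which is an assumption, not the procedure used for the prototype.
    """
    # Index each gene name as a tuple of lowercase tokens.
    entries = {tuple(name.lower().split()) for name in gene_names}
    max_len = max((len(e) for e in entries), default=0)
    lowered = [t.lower() for t in tokens]

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first so multi-word names win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-GENE"
            for j in range(i + 1, i + matched):
                tags[j] = "I-GENE"
            i += matched
        else:
            i += 1
    return tags


# Toy input built from names that appear elsewhere in these slides.
tokens = ["variation", "in", "npr-1", "and", "glutathione", "S-transferase"]
tags = label_with_gene_list(tokens, ["npr-1", "glutathione S-transferase"])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Exact dictionary matching of this kind produces noisy labels (ambiguous names, unlisted variants), so data generated this way is best treated as weaker than the manually annotated BioCreative set.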
Sample Results
 http://sifaka.cs.uiuc.edu/jiang4/Beespace
Performance
 Some gene names without explicit mention of “gene” can be captured
    E.g., “glutathione S-transferase”
 Problems
    Gene-like phrases, e.g., “China 2”, “13.8”
    Mismatch of gene name boundaries and noun phrase boundaries, e.g., “nicotinic” in “nicotinic pathway”
V2 -- Entity Types
 Annotation guideline for BioCreative
    Guideline for Beespace?
    Ontology? (GENIA ontology)
 What to tag?
    Genes and proteins
    Family of genes
    Gene descriptions
 Entity boundaries and noun phrase boundaries
    Tag only noun phrases that refer to genes or tag any occurrence of a gene name inside a noun phrase?
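
The two boundary options in the last bullet can be made concrete with a small sketch. Everything here is hypothetical: the function `select_gene_tags`, the policy names, and in particular the heuristic that a noun phrase "refers to a gene" when a gene name ends at its final (head) token. The slides leave the actual criterion open.

```python
def select_gene_tags(np_spans, gene_spans, policy):
    """Pick the token spans to annotate under one of two boundary policies.

    Spans are (start, end) token offsets with an exclusive end.
    """
    if policy == "any_occurrence":
        # Tag every gene name, even when it only modifies another noun.
        return sorted(gene_spans)
    if policy == "np_only":
        # Assumption: an NP refers to a gene when a gene name lies inside
        # the NP and ends at the NP's final (head) token.
        return sorted(
            (np_start, np_end)
            for np_start, np_end in np_spans
            if any(np_start <= g_start and g_end == np_end
                   for g_start, g_end in gene_spans)
        )
    raise ValueError(f"unknown policy: {policy}")


# "nicotinic pathways in memory formation" (from the sample sentences)
np_spans = [(0, 2), (3, 5)]   # "nicotinic pathways", "memory formation"
gene_spans = [(0, 1)]         # "nicotinic" flagged as gene-like

any_tags = select_gene_tags(np_spans, gene_spans, "any_occurrence")
np_tags = select_gene_tags(np_spans, gene_spans, "np_only")
print(any_tags)  # the modifier "nicotinic" is tagged
print(np_tags)   # nothing is tagged: the NP's head is "pathways"
```

Under the NP-only policy the “nicotinic pathway” mismatch disappears, at the cost of missing gene names that occur only as modifiers.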
Sample Sentences
 A dose-dependent transactivation of human hARE-mediated chloramphenicol acetyltransferase (cat) gene expression was observed upon treatments of the Hepa-1 transfectants with TPA, a known inducer, as well as with CAPE.
 In the present study, we identified its preferred binding sequence as 5'-CCCTATCGATCGATCTCTACCT-3' and characterized its DNA binding properties using truncated Mblk-1 mutants.
Sample Sentences (cont.)
 At least two kinds of nicotinic receptors seem to be involved in honeybee memory, an alpha-bungarotoxin-sensitive and an alpha-bungarotoxin-insensitive receptor.
 The involvement of nicotinic pathways in memory formation and retrieval processes was tested by injecting…
Sample Sentences (cont.)
 We report the cloning of a honeybee CSP gene called ASP3c, as well as the structural and functional characterization of the encoded protein.
 Naturally occurring variation in npr-1, a gene encoding a putative receptor for an NPY-like molecule, causes variation in feeding behaviour.
Sample Sentences (cont.)
 The gene encoding ZENK, an EARLY IMMEDIATE GENE well known in other learning and memory contexts, has figured prominently in molecular songbird research thus far.
 This is because frequent contacts of these types cause an increase in the expression of the gene encoding a glucocorticoid receptor in the hippocampus, and…
Training Data
 Dictionary
 Rules/guidelines
 Bootstrapping
 Cross-domain training
    Can training data in other domains (fly, human, etc.) still be useful?
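
One way the dictionary and bootstrapping items could fit together is sketched below: start from a seed dictionary, collect the immediate left/right contexts in which known names occur, and harvest unknown tokens that appear in the same contexts. The function `bootstrap_gene_names`, the single-token restriction, the context definition, and the toy sentences are all assumptions for illustration; the slides only list bootstrapping as an option.

```python
def bootstrap_gene_names(sentences, seed_names, rounds=2):
    """Grow a set of single-token gene names from shared word contexts.

    Hypothetical sketch: a context is the (previous token, next token)
    pair around a known name; any unknown token seen in such a context
    is added to the dictionary.
    """
    known = set(seed_names)
    for _ in range(rounds):
        # Step 1: collect contexts around names already in the dictionary.
        contexts = set()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in known:
                    contexts.add((tokens[i - 1], tokens[i + 1]))
        # Step 2: harvest unknown tokens that occur in those contexts.
        new_names = set()
        for tokens in sentences:
            for i in range(1, len(tokens) - 1):
                if tokens[i] not in known and (tokens[i - 1], tokens[i + 1]) in contexts:
                    new_names.add(tokens[i])
        if not new_names:
            break  # nothing new was found, so further rounds cannot help
        known |= new_names
    return known


# Toy, made-up sentences reusing names from the sample slides.
sentences = [
    ["the", "npr-1", "gene", "is", "expressed"],
    ["the", "ASP3c", "gene", "was", "cloned"],
    ["the", "nicotinic", "pathway", "was", "tested"],
]
names = bootstrap_gene_names(sentences, {"npr-1"})
print(sorted(names))
```

A loop like this can drift quickly once a wrong name is admitted, which is one reason the rules/guidelines item matters alongside it.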