Project Descriptions
Class Projects
Future Work and Possible Project Topics in Gene Regulatory Networks
• Learning from multiple data sources
• Learning causality in motifs
• Learning GRNs with feedback loops
Learning from Multiple Data Sources
• We have gene expression data and topological ordering information.
• Incorporate other data sources as prior knowledge for the learning, e.g.:
  - Transcription factor binding location data
  - …
Example: Partial regulatory network recovered using expression data and location data.
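A minimal sketch of how location data might act as prior knowledge, assuming expression profiles are given as lists of numbers and binding evidence as a set of (TF, target) pairs. It is a simplified stand-in, not the method behind the figure: each candidate edge is scored by absolute Pearson correlation, and a hypothetical bonus `prior_weight` is added when binding data supports the edge. The gene names and values are toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def score_edges(expression, bound_pairs, prior_weight=0.5):
    """Score each ordered (regulator, target) pair.

    expression:   dict gene -> list of expression values
    bound_pairs:  set of (tf, target) pairs supported by location data
    prior_weight: hypothetical bonus for edges with binding evidence
    """
    scores = {}
    for tf, tf_profile in expression.items():
        for target, target_profile in expression.items():
            if tf == target:
                continue
            score = abs(pearson(tf_profile, target_profile))
            if (tf, target) in bound_pairs:
                score += prior_weight
            scores[(tf, target)] = score
    return scores


# Toy data (not real measurements).
expression = {
    "TF1": [1.0, 2.0, 3.0, 4.0],
    "G1": [2.0, 4.0, 6.0, 8.0],
    "G2": [1.0, 3.0, 2.0, 4.0],
}
scores = score_edges(expression, bound_pairs={("TF1", "G2")})
```

With this toy data, binding evidence ranks TF1 → G2 (0.8 + 0.5) above TF1 → G1 (1.0), even though G1 is more strongly correlated with TF1.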
Learning Causality in Motifs
• Network motifs are the simplest units of network architecture.
• They can be used to assemble a transcriptional regulatory network.
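A minimal sketch of motif counting, assuming the network is given as directed (regulator, target) edges. It counts feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z), one well-known network motif; the choice of motif and the function name are illustrative assumptions. Learning causality within motifs would build on counts like these rather than stop at them.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z in a directed network.

    edges: iterable of (regulator, target) pairs.
    Brute force over ordered node triples, O(n^3); fine for small networks.
    """
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    count = 0
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
            count += 1
    return count


# Hypothetical network: A regulates B and C, B regulates C, C regulates D.
network = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ffl_count = count_feed_forward_loops(network)
```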
Learning GRNs with Feedback Loops
Protein-Protein Interactions
Future Work and Possible Project Topics in Protein Interactions
• Learning from multiple data sources
• Disease-related protein-protein interactions
• Learning from different species
Learning from Multiple Data Sources
(a) Gene Neighbor: identifies protein pairs encoded in close proximity across multiple genomes.
(b) Rosetta Stone
(c) Phylogenetic Profile
(d) Gene Clustering: identifies closely spaced genes and assigns a probability P of observing a particular gap distance.
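The phylogenetic profile idea can be made concrete with a co-occurrence test. A sketch, assuming each protein's profile is a 0/1 list marking its presence in each of N genomes: the hypergeometric tail gives the probability that two unrelated proteins, present in n and m genomes, would share at least k genomes by chance. This is one common scoring choice, named here as an assumption rather than the method on the slide; a small probability suggests a functional link.

```python
from math import comb


def cooccurrence_pvalue(profile_a, profile_b):
    """Hypergeometric probability of sharing at least the observed number
    of genomes by chance.

    Profiles are equal-length 0/1 lists, one entry per genome:
        P(>= k) = sum_{j=k}^{min(n, m)} C(n, j) * C(N - n, m - j) / C(N, m)
    where N = genomes, n and m = genomes containing each protein,
    k = genomes containing both.
    """
    N = len(profile_a)
    n = sum(profile_a)
    m = sum(profile_b)
    k = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # math.comb returns 0 when asked for more items than available.
    tail = sum(comb(n, j) * comb(N - n, m - j) for j in range(k, min(n, m) + 1))
    return tail / comb(N, m)


# Toy profiles over six genomes.
same = cooccurrence_pvalue([1, 1, 1, 0, 0, 0], [1, 1, 1, 0, 0, 0])
disjoint = cooccurrence_pvalue([1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1])
```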
Disease-Related Protein-Protein Interactions
Is an interaction disease-related? Query the NCBI OMIM database.
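One way to script the OMIM lookup is through NCBI's E-utilities ESearch endpoint with `db=omim`. The sketch below only builds the query URL (no network access); the helper name and the choice to AND the two protein symbols together are assumptions. Fetching the URL (e.g. with `urllib.request.urlopen`) returns XML whose ID list points to matching OMIM records, which still need manual review.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; "omim" is the Entrez database name.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def omim_search_url(symbols, retmax=20):
    """Build an ESearch URL for OMIM records matching all given symbols."""
    params = {"db": "omim", "term": " AND ".join(symbols), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example pair of symbols; any interacting pair could be checked this way.
url = omim_search_url(["BRCA1", "BRCA2"])
```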
Learning from Different Species
BioQA-Related Projects
Projects for BioQA
1. Learning
• Given a set of relevant abstracts, what kinds of features can we obtain to enhance our queries?
• Given a set of questions from users, how can we identify keywords in the questions to form queries?
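A baseline for the second question, as a sketch: tokenize the user's question, drop stopwords, and AND the remaining terms together. The stopword list and function names are hypothetical; a real system would add a larger stopword list and domain dictionaries (gene symbols, disease names).

```python
import re

# Hypothetical, deliberately small stopword list.
STOPWORDS = {
    "what", "which", "how", "who", "why", "is", "are", "the", "a", "an",
    "of", "in", "to", "do", "does", "and", "or", "for", "with", "on", "by",
    "can", "we",
}


def question_keywords(question):
    """Tokenize a question and drop stopwords, keeping the original case."""
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]


def to_query(question):
    """Join the keywords into a Boolean AND query."""
    return " AND ".join(question_keywords(question))


query = to_query("What is the role of histones in inducible gene expression?")
```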
2. Answer Presentation
Given a relevant abstract or article:
• How can we retrieve the passage relevant to the user's question?
• How can we extract answers?
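A simple baseline for passage retrieval, sketched under the assumption that a "passage" is a single sentence: split the abstract into sentences and return the one sharing the most keywords with the question. Scoring by raw keyword overlap is an illustrative choice; weighting terms (e.g. by inverse document frequency) would be a natural refinement.

```python
import re


def best_passage(text, keywords):
    """Return the sentence of `text` sharing the most keywords (ties: first)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    wanted = {k.lower() for k in keywords}

    def overlap(sentence):
        words = set(re.findall(r"[a-z0-9\-]+", sentence.lower()))
        return len(words & wanted)

    return max(sentences, key=overlap)


# The two sentences quoted from PMID 10024886 in item 4.
abstract = (
    "In this report, we show that virus infection of cells results in a "
    "dramatic hyperacetylation of histones H3 and H4 that is localized to "
    "the IFN-beta promoter. Thus, coactivator-mediated localized "
    "hyperacetylation of histones may play a crucial role in inducible "
    "gene expression."
)
passage = best_passage(abstract, ["role", "histones", "inducible", "gene", "expression"])
```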
3. Automatic Extraction
• Extract gene-disease and gene-biological process relations (along with their corresponding organisms).
• Uniquely identify the genes:
  - A gene symbol can be associated with multiple gene identifiers. Which gene identifier is the right one?
• Can these extraction processes be generalized?
4. Sortal Resolution
• Given an abstract and a query, perform sortal resolution (but not on pronouns).
• Example: given the abstract
  “In this report, we show that virus infection of cells results in a dramatic hyperacetylation of histones H3 and H4 that is localized to the IFN-beta promoter. … Thus, coactivator-mediated localized hyperacetylation of histones may play a crucial role in inducible gene expression.” [PMID: 10024886]
  and a query about histones, perform resolution on “histones”.
  Result: “histones” refers to H3 and H4.
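For this particular example, a pattern-based sketch can recover the referents: find explicit mentions of the form "histones H3 and H4", collect the named members, and treat a later bare "histones" as referring to them. The regular expressions and function name are hypothetical and tailored to histone naming; general sortal resolution would need a proper term lexicon and parsing.

```python
import re

# "histone(s)" followed by a list of specific histones, e.g.
# "histones H3 and H4" or "histone H2A, H2B and H4".
MENTION = re.compile(r"\b[Hh]istones?\s+((?:H\d[A-Z]?(?:,\s*|\s+and\s+)?)+)")
MEMBER = re.compile(r"H\d[A-Z]?")


def resolve_histones(text):
    """Return the specific histones named in the text, in order of first
    mention; a bare "histones" elsewhere is resolved to this list."""
    members = []
    for mention in MENTION.finditer(text):
        for member in MEMBER.findall(mention.group(1)):
            if member not in members:
                members.append(member)
    return members


abstract = (
    "In this report, we show that virus infection of cells results in a "
    "dramatic hyperacetylation of histones H3 and H4 that is localized to "
    "the IFN-beta promoter. Thus, coactivator-mediated localized "
    "hyperacetylation of histones may play a crucial role in inducible "
    "gene expression."
)
referents = resolve_histones(abstract)
```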
5. Semantics of Words
• Deal with the semantics of words to improve the retrieval of answers.
• Example: the semantic relation between “role” and “play”
6. Gene Symbol Variants, Gene Symbol Disambiguation, Entity Recognition
• Generate gene symbol synonyms and variants for a gene symbol in a query.
  - Example: variants of “CDC28” can be written as “Cdc28”, “Cdc28p”, “cdc-28”.
  - “GSS” is a synonym of “PRNP”, but “GSS” itself is also a gene that is unrelated to “PRNP”.
• Improve the recognition of diseases and biological processes.
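Spelling variants like those of “CDC28” can be generated by rule. A sketch, assuming the symbol is letters followed by digits: emit case variants, hyphenated forms, and the yeast-style “p” protein suffix. The function name and rule set are illustrative; true synonyms such as GSS/PRNP cannot be derived this way and need a curated synonym table plus context to disambiguate.

```python
import re


def symbol_variants(symbol):
    """Generate case, hyphen, and protein-suffix variants of a gene symbol."""
    variants = {symbol, symbol.upper(), symbol.lower(), symbol.capitalize()}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", symbol)
    if match:
        letters, digits = match.groups()
        # Hyphenated forms such as "cdc-28".
        variants.add(f"{letters.lower()}-{digits}")
        variants.add(f"{letters.upper()}-{digits}")
    # Yeast convention: capitalized symbol plus "p" names the protein.
    variants.add(symbol.capitalize() + "p")
    return sorted(variants)


variants = symbol_variants("CDC28")
```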
7. Extension of Ontology
• Capture biological processes and their possible relations to diseases.
• Examples:
  - Learning and/or memory can influence Alzheimer’s disease.
  - Degradation of the ubiquitin cycle can cause an extra long/short half-life of genes.
  - An extra long/short half-life of genes can cause cancer.
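Once such relations are stored as (subject, relation, object) triples, chained statements like the last two examples can be linked automatically. A sketch, assuming a plain list of triples transcribed from the examples above: breadth-first search returns the shortest chain connecting a process to a disease. The triple format and function name are assumptions, not an existing ontology API.

```python
from collections import deque

# Relation triples transcribed from the examples above.
RELATIONS = [
    ("learning/memory", "influences", "Alzheimer's disease"),
    ("degradation of ubiquitin cycle", "causes", "extra long/short half-life of genes"),
    ("extra long/short half-life of genes", "causes", "cancer"),
]


def causal_path(relations, start, goal):
    """Breadth-first search for the shortest chain of relations linking
    start to goal; returns a list of triples, or None if unconnected."""
    graph = {}
    for subj, rel, obj in relations:
        graph.setdefault(subj, []).append((rel, obj))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


chain = causal_path(RELATIONS, "degradation of ubiquitin cycle", "cancer")
```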
CBioC Class Projects
1. Extraction of organism info for each entity in a relationship (KALPESH)
   High priority. Use existing software for extraction, but biological databases and algorithms are needed to deduce organism info that is not stated explicitly, and users should be able to correct this info. Example: PMID 16107876.
2. Image extension: extract images and information about images, and allow collaborative curation.
   Take PDFs and other structured documents, extract the images with their captions and references within the text, then let users polish the results. Related.
3. Use ontologies and some automated tools to ensure consistency and cross-link info (2 people)
   Information entered by users needs to be validated against the existing DB and ontologies. We also need to tag our data for cross-reference. Example
Other Projects
Build an Ontology
• Build an ontology for a domain that does not yet have one.
• Verify its consistency.
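One basic consistency check for a new ontology is that its is-a hierarchy contains no cycles (a term cannot be its own ancestor). A sketch, assuming the hierarchy is a list of (child, parent) pairs: depth-first search with three-color marking returns one offending cycle, or None. This covers only one kind of inconsistency; checks such as disjointness violations would need a reasoner.

```python
def find_isa_cycle(isa_edges):
    """Return one cycle in the is-a hierarchy as a list of terms,
    or None if the hierarchy is acyclic.

    isa_edges: iterable of (child, parent) pairs.
    """
    parents = {}
    for child, parent in isa_edges:
        parents.setdefault(child, []).append(parent)
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in parents.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:  # back edge: parent is already on the path
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical fragments of an ontology under construction.
consistent = [("kinase", "enzyme"), ("enzyme", "protein")]
broken = [("a", "b"), ("b", "c"), ("c", "a")]
```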
Various Kinds of Text Extraction Systems
• TREC-suggested ones:
  - Which method/protocol is used in which experiment/procedure
  - Gene – disease – role
  - Gene – biological process – role
  - Gene – mutation type – biological impact
  - Gene – interaction – gene – function – organ
  - Gene – interaction – gene – disease – organ
• Protein Lounge inspired:
  - Kinase-phosphatase
  - Transcription factor
  - Peptide antigen
Drug Classification in Pharmacogenetics
Experimental data available:
• Drug response on cell lines; gene expression data; gene copy number data; mutation analysis data; RNAi data
Data from literature:
• Mutation data (Sanger lab); NCI-60 drug response data; mutation analysis data; pathway data (e.g., BIND); Gene Ontology
Proprietary data:
• Where does the drug physically interact? (600 kinases; IC50)
• Gene expression data of patients after treatment
Goal:
• Given a patient, what kinds of data do we need to determine whether a drug is applicable to that patient? How do we develop a classifier using these kinds of data?
• Find gene and protein interaction networks (or components) using these data.
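As a starting point for the classifier question, a nearest-centroid model is easy to inspect. A sketch, assuming each patient or cell line has already been reduced to a fixed-length numeric feature vector; the three features (an expression level, a copy number, a mutation flag), the labels, and all values are hypothetical toy data. Real expression data would need feature selection and scaling first, and a stronger classifier could be swapped in later.

```python
from math import dist


def train_centroids(samples, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


# Toy features: [expression of gene A, copy number of gene B, mutation flag].
samples = [[2.0, 1.0, 1.0], [2.2, 0.9, 1.0], [0.5, 2.0, 0.0], [0.4, 2.1, 0.0]]
labels = ["responder", "responder", "non-responder", "non-responder"]
centroids = train_centroids(samples, labels)
new_patient = [1.9, 1.1, 1.0]
```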