Function and evolution of enhancers in Drosophila development

1
(PRONOUNCED “KNOWING”)
KNOWLEDGE ENGINE FOR
GENOMICS
Saurabh Sinha
Co-Director & Research Lead, KnowEnG Center.
Associate Professor of Computer Science
Faculty, Carl Woese Institute of Genomic Biology
University of Illinois at Urbana-Champaign
2
Knowledge-guided analysis of user’s genomic data
A Common Paradigm in Genomics
[Diagram: user data is analyzed against a knowledge base. Labels: USER DATA; KNOWLEDGE BASE; Gene Set 1 to Gene Set n.]
CURRENT PROCESS: DAVID, GSEA, GREAT, MSigDB, etc.
FUTURE PROCESS: KnowEnG
3
Basic idea
• Knowledge Network (KN): a heterogeneous graph whose nodes and edges represent genes/proteins, their properties and relationships
• User data: a spreadsheet (rows = genes or proteins)
[Diagram: knowledge network + user spreadsheet, analyzed with graph mining and machine learning]
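To make the two inputs on this slide concrete, here is a minimal sketch of a heterogeneous graph and a user spreadsheet. The class `KnowledgeNetwork`, the gene and pathway identifiers, and the numeric values are all hypothetical and chosen for illustration; the slides do not describe how KnowEnG stores its network.

```python
from collections import defaultdict


class KnowledgeNetwork:
    """Toy heterogeneous graph: typed nodes and typed, undirected edges."""

    def __init__(self):
        self.node_type = {}            # node id -> "gene" or "property"
        self.adj = defaultdict(set)    # node id -> {(neighbor, edge_type)}

    def add_node(self, node, kind):
        self.node_type[node] = kind

    def add_edge(self, u, v, edge_type):
        # Store the edge in both directions because it is undirected.
        self.adj[u].add((v, edge_type))
        self.adj[v].add((u, edge_type))

    def neighbors(self, node, edge_type=None):
        # Optionally restrict to one edge type (e.g. only "pathway" edges).
        return sorted(
            n for n, t in self.adj[node] if edge_type is None or t == edge_type
        )


kn = KnowledgeNetwork()
for gene in ("GENE_A", "GENE_B", "GENE_C"):      # hypothetical gene ids
    kn.add_node(gene, "gene")
kn.add_node("PATHWAY_X", "property")              # hypothetical property node
kn.add_edge("GENE_A", "GENE_B", "protein-protein interaction")
kn.add_edge("GENE_A", "PATHWAY_X", "pathway")
kn.add_edge("GENE_C", "PATHWAY_X", "pathway")

# User spreadsheet: rows = genes, columns = samples (values are made up).
spreadsheet = {
    "GENE_A": [2.1, 0.3],
    "GENE_B": [1.7, 0.9],
    "GENE_D": [0.2, 0.1],   # not present in the toy network
}

# Only rows whose gene appears in the network can be analyzed jointly with it.
mapped_genes = sorted(g for g in spreadsheet if g in kn.node_type)
```

Keeping the edge type on every edge is what makes the graph heterogeneous: the same pair of nodes can be queried by one relationship type at a time.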
4
Example
Mayo Clinic Drug Response Data in LCLs
• 284 lymphoblastoid cell lines
• 26 drug treatments
• Gene expression, genetic variation, CG methylation data from cell lines
Genes determining drug response
5
Specific Aims
[Diagram: Aims I to VII. Labels: Data Science Research; Development; Implementation; Building the Knowledge Network; Analysis Algorithms; Scalable computing; User interfaces & visualization; Cancer Pharmacogenomics; (with Mayo Clinic); Genomics of behavior; Mining of microbial genomes.]
Other challenges
Data sharing, data objects, privacy
Widespread access to cyberinfrastructure, payment models
6
Knowledge Network Construction
• Public data from different species
  • Homo sapiens
  • Mus musculus
  • Drosophila melanogaster
  • Saccharomyces cerevisiae
  • Caenorhabditis elegans
  • Arabidopsis thaliana
• Nodes: genes and properties
• Edges:
  • protein-protein interactions
  • genetic interactions
  • co-expression
  • protein domains
  • Gene Ontology
  • pathways
  • TF binding motif presence
  • homology
• 200K nodes
  • 150K gene nodes
  • 50K property nodes
• 60M edges and 30 edge types
  • 58M Gene-Gene edges
  • 2M Gene-Property edges
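The counts on this slide are internally consistent: 150K gene nodes + 50K property nodes = 200K nodes, and 58M gene-gene edges + 2M gene-property edges = 60M edges. As a rough sketch of how such a breakdown could be tallied, the example below stores edges as (source, target, edge type) triples and counts them by class and by type. The identifiers and the tiny edge list are hypothetical, and the triple layout is an assumption rather than the format KnowEnG uses.

```python
from collections import Counter

# (source, target, edge_type) triples; all identifiers are hypothetical.
edges = [
    ("GENE_A", "GENE_B", "protein-protein interaction"),
    ("GENE_A", "GENE_C", "co-expression"),
    ("GENE_B", "GENE_C", "homology"),
    ("GENE_A", "GO_TERM_1", "Gene Ontology"),
    ("GENE_C", "PATHWAY_X", "pathway"),
]
property_nodes = {"GO_TERM_1", "PATHWAY_X"}


def edge_class(source, target):
    """Classify an edge by whether either endpoint is a property node."""
    if source in property_nodes or target in property_nodes:
        return "gene-property"
    return "gene-gene"


# Tally edges by class (gene-gene vs gene-property) and by edge type.
by_class = Counter(edge_class(s, t) for s, t, _ in edges)
by_type = Counter(t for _, _, t in edges)
```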
7
Example Analytics: Predicting drug response
• Motivation: Predicting best treatment strategy for cancer patients
• Goal: Network-guided prediction of cancer drug response using genomic and epigenomic profiling data
• Datasets:
• Genotype (~1.3M), gene expression (~17K), and DNA methylation (~450K)
• ~300 human lymphoblastoid cell lines (LCL)
• Drug response from dosage-response curves of 26 cytotoxic treatments
• Publicly available datasets in the form of a Knowledge Network (KN)
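The slides do not say which algorithm performs the network-guided prediction. As one illustration of the general idea, the sketch below scores each gene by the absolute Pearson correlation of its expression with drug response across cell lines, then smooths those scores over a gene-gene network with a random walk with restart. Random walk with restart is a commonly used network-propagation technique swapped in here as an assumption; the function names, the toy data, and the restart probability of 0.5 are all hypothetical.

```python
import numpy as np


def gene_response_scores(expression, response):
    """Absolute Pearson correlation of each gene (row) with drug response."""
    x = expression - expression.mean(axis=1, keepdims=True)
    y = response - response.mean()
    denom = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())
    return np.abs(x @ y) / denom


def random_walk_with_restart(adjacency, restart_vector, restart_prob=0.5,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero
    transition = adjacency / col_sums      # column-normalized adjacency
    p0 = restart_vector / restart_vector.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart_prob) * transition @ p + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy data: 4 genes x 5 cell lines (made-up values).
expression = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 5.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.0, 3.0, 2.0, 3.0, 1.0],
])
response = np.array([0.1, 0.2, 0.3, 0.4, 0.5])   # one drug, 5 cell lines

# Toy gene-gene network: a chain 0-1-2-3 (symmetric adjacency matrix).
adjacency = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

scores = gene_response_scores(expression, response)
smoothed = random_walk_with_restart(adjacency, scores)
ranking = np.argsort(-smoothed)   # gene indices, highest smoothed score first
```

The restart term keeps each gene's smoothed score anchored to its own data-derived score, while the walk term lets a gene borrow evidence from its network neighbors.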
8
Thank you!