1 KnowEnG (PRONOUNCED “KNOWING”): KNOWLEDGE ENGINE FOR GENOMICS
Saurabh Sinha
Co-Director & Research Lead, KnowEnG Center
Associate Professor of Computer Science
Faculty, Carl Woese Institute for Genomic Biology
University of Illinois at Urbana-Champaign

2 Knowledge-guided analysis of user's genomic data
A common paradigm in genomics
[Figure: user data (gene sets 1 to n) analyzed against a knowledge base. Current process: DAVID, GSEA, GREAT, MSigDB, etc. Future process: KnowEnG.]

3 Basic idea
• Knowledge Network (KN): a heterogeneous graph whose nodes and edges represent genes/proteins, their properties, and their relationships
• User data: a spreadsheet (rows = genes or proteins)
[Figure: knowledge network + user spreadsheet, analyzed with graph mining and machine learning]

4 Example: Mayo Clinic drug response data in LCLs
• 284 lymphoblastoid cell lines
• 26 drug treatments
• Gene expression, genetic variation, and CG methylation data from the cell lines
[Figure: genes determining drug response]

5 Specific Aims
[Figure: seven specific aims (I to VII) arranged under Development, Data Science Research, and Implementation: building the Knowledge Network; analysis algorithms; scalable computing; user interfaces & visualization; cancer pharmacogenomics (with Mayo Clinic); genomics of behavior; mining of microbial genomes. Other challenges: data sharing, data objects, privacy; widespread access to cyberinfrastructure, payment models.]

6 Knowledge Network Construction
• Public data from different species:
  • Homo sapiens
  • Mus musculus
  • Drosophila melanogaster
  • Saccharomyces cerevisiae
  • Caenorhabditis elegans
  • Arabidopsis thaliana
• Nodes: genes and properties (200K nodes: 150K gene nodes, 50K property nodes)
• Edges (60M edges, 30 edge types; 58M gene-gene and 2M gene-property edges):
  • protein-protein interactions
  • genetic interactions
  • co-expression
  • protein domains
  • Gene Ontology
  • pathways
  • TF binding motif presence
  • homology

7 Example Analytics: Predicting drug response
• Motivation: predicting the best treatment strategy for cancer patients
• Goal: network-guided prediction of cancer drug response using genomic and epigenomic profiling data
• Datasets:
  • Genotype (~1.3M), gene expression (~17K), and DNA methylation (~450K) features
  • ~300 human lymphoblastoid cell lines (LCLs)
  • Drug response from dose-response curves of 26 cytotoxic treatments
  • Publicly available datasets in the form of the Knowledge Network (KN)

8 Thank you!
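Slide 2 lists DAVID, GSEA, GREAT and MSigDB as the current process, in which a user gene set is compared one at a time against gene sets in a knowledge base. A minimal sketch of the core comparison, assuming a one-sided hypergeometric (Fisher's exact) over-representation test of the kind DAVID-style tools use, is below. GSEA instead uses a rank-based statistic and GREAT works on genomic regions, so this is not a model of every tool named. The function name, the 20-gene background and the gene identifiers are hypothetical, and no multiple-testing correction across gene sets 1 to n is applied.

```python
from math import comb


def enrichment_p_value(user_genes, annotated_genes, background):
    """One-sided hypergeometric p-value P(X >= k) for gene-set over-representation."""
    universe = set(background)
    user = set(user_genes) & universe            # user genes, restricted to the background
    annotated = set(annotated_genes) & universe  # background genes carrying the annotation
    N, K, n = len(universe), len(annotated), len(user)
    k = len(user & annotated)                    # observed overlap
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


background = [f"G{i}" for i in range(20)]  # hypothetical 20-gene universe
pathway = background[:5]                   # hypothetical knowledge-base gene set
user_hits = ["G0", "G1", "G2", "G10"]      # hypothetical user gene set
p = enrichment_p_value(user_hits, pathway, background)
print(f"p = {p:.4f}")  # prints p = 0.0320 (exactly 155/4845)
```

Each knowledge-base gene set is scored in isolation here, which is the limitation the "future process" on Slide 2 contrasts with: the test never sees how the genes relate to one another.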
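Slide 3 pairs the Knowledge Network with a user spreadsheet and feeds both to graph mining. One common way to combine them, shown here as an assumption rather than as KnowEnG's actual pipeline, is network propagation by random walk with restart: nonnegative scores from one spreadsheet column (for example, a hypothetical per-gene association with drug response) are spread over gene-gene and gene-property edges, so genes well connected to high-scoring genes move up the ranking. The edge list, edge-type labels, the `type_weights` knob and the restart probability of 0.5 are all illustrative.

```python
from collections import defaultdict


def build_network(edges, type_weights=None):
    """Build an undirected weighted adjacency map from typed edges.

    edges: iterable of (node_a, node_b, edge_type, weight).
    type_weights: optional {edge_type: multiplier}, a hypothetical knob for
    re-weighting whole edge types (default multiplier 1.0).
    """
    type_weights = type_weights or {}
    adj = defaultdict(dict)
    for a, b, etype, w in edges:
        w = w * type_weights.get(etype, 1.0)
        adj[a][b] = adj[a].get(b, 0.0) + w
        adj[b][a] = adj[b].get(a, 0.0) + w
    return dict(adj)


def random_walk_with_restart(adj, seed_scores, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate nonnegative per-gene seed scores over the network."""
    if any(s < 0 for s in seed_scores.values()):
        raise ValueError("seed scores must be nonnegative")
    total = sum(seed_scores.values())
    if total == 0:
        raise ValueError("at least one seed score must be positive")
    nodes = list(set(adj) | set(seed_scores))
    p0 = {v: seed_scores.get(v, 0.0) / total for v in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, jump back to the seed distribution.
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            nbrs = adj.get(u, {})
            deg = sum(nbrs.values())
            if deg == 0:
                # Dangling node (e.g. a seed gene absent from the network):
                # return its walking mass to the seed distribution.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
                continue
            for v, w in nbrs.items():
                # Otherwise step to a neighbor in proportion to edge weight.
                nxt[v] += (1 - restart) * p[u] * w / deg
        diff = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if diff < tol:
            break
    return p


kn_edges = [
    ("A", "B", "protein-protein", 1.0),
    ("B", "C", "co-expression", 1.0),
    ("C", "D", "protein-protein", 1.0),
    ("A", "GO:1", "gene-ontology", 1.0),  # gene-property edge
    ("D", "GO:1", "gene-ontology", 1.0),  # gene-property edge
]
spreadsheet_column = {"A": 1.0}  # hypothetical spreadsheet column: gene -> score
scores = random_walk_with_restart(build_network(kn_edges), spreadsheet_column)
ranking = sorted(scores, key=scores.get, reverse=True)
print([(node, round(scores[node], 3)) for node in ranking])
```

On this toy five-node cycle the seed gene A converges to 11/19 ≈ 0.579, its neighbors B and GO:1 to 3/19 each, and C and D to 1/19 each. A larger restart probability keeps scores close to the raw spreadsheet values; a smaller one lets the network structure dominate.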