Gene Co-expression Network Analysis BMI 730 Kun Huang Department of Biomedical Informatics Ohio State University Announcement • No class this Wed • Change of schedule – miRNA lecture moved to a later time • More time for project – only the last class is used for presentation • Today – lecture more relevant to the projects – Discuss possible class projects – Decide on the groups • Decide on the project topic by next Monday – meeting with me later this week is recommended. http://www.rithme.eu/img/storage_cost.gif Gene Expression Microarray Gene Networks/Pathways • • • • • • Regulatory network Metabolic pathways Signaling pathways Protein-protein interaction networks Gene interaction networks Co-expression network Networks/Pathways Resources • • • • • • www.pathguide.org KEGG HPRD MIMI BIND … Networks/Pathways in Research • Genes don’t act alone • One gene – one disease model is not sufficient • Need to understand how genes coordinate and work together as a system Networks/Pathways • How to build the network? • Manual curation – e.g., IPA • Automatic inference from literature – e.g., NLP based method • Inference from data – e.g., co-expression network • Integration from multiple resources – e.g., STRING database (http://string.embl.de/) Networks/Pathways • How to build the network? • Manual curation – e.g., IPA • Automatic inference from literature – e.g., NLP based method • Inference from data – e.g., co-expression network • Integration from multiple resources – e.g., STRING database (http://string.embl.de/) Networks/Pathways • How to use the network? • Functional inference • Identify new candidate for further investigation • Dynamical simulation • Other types of inferences MicroRNA (miRNA) a Myc E2F3 E2F1 E2F2 17-5p 17-3p 18a 19a 20a 19b 92-1 b c Myc p E2F 1 2 mir-17-92 m Reviewed by: Coller et al. (2008), PLoS Genet 3(8): e146 Figures from Dr. 
Baltz Agula Gene Co-Expression HMMR siRNA Gene Co-Expression Network • Expansion – Negative correlation – Multiple breast cancer datasets – More anchor genes –… • Is there a way to find all highly correlated genes in multiple datasets? • Do these genes form clusters? Gene Co-Expression Network • Step 1: Compute pairwise PCC values • Step 2: Weighted or unweighted? – Unweighted – need to select a cutoff on PCC – Weighted – need to consider transformation of the data – Keep the scale-free topology • Step 3: Identify “dense” networks (subgraphs) from the overall graph – Hierarchical clustering – Graph mining Graph Mining • Definition of “dense” – Ratio of connectivity: for a subgraph with K nodes and L edges r = L/(K(K-1)/2). – K-core: a subgraph in which every node is connected to at least K other nodes (within this subgraph). • Identification of all the “dense” networks is usually an NP-complete problem. – Heuristic or approximate algorithms are used – e.g., greedy algorithm Frequent network mining • CODENSE – Originally applied to yeast microarray data, later expanded to cancers – Used for functional annotation Data selection and correlation • Selected 23 datasets from Gene Expression Omnibus (GEO) – Search term “human metastatic cancer” – Contain both control and tumor, # sample > 8 – Only primary biopsy • Correlation – PCC > 0.75 (really high similarity) • For CODENSE – Edge support in at least 4 datasets – Connectivity ratio r > 40% (L > r∙n(n-1)/2) – # of nodes > 20 Results from CODENSE • 44 networks are identified • # of nodes: 21 ~ 74 (average 44) • Connectivity: 0.41 ~ 0.78 Finding New Functions Relation to BRCA1 Comparing ER- and ER+ breast cancer patients • Estrogen receptor status is one of the key biomarkers for breast cancer prognosis (ER- indicates poor prognosis) • Select a dataset (GSE2034, Wang et al) from GEO containing 286 samples (77 ER-, 209 ER+) • Compare the ER- group vs ER+ group, select the networks that is most perturbed • The network 
containing HMMR is most perturbed – more than half of the genes are differentially regulated Select gene signature from a network to predict survival • Use the genes in this network as features to cluster patients in the Rosseta data (295 breast cancer patients) and compare the survival between the two groups. Log-rank test p < 1e-8 Possible Project Topics: 1. Compare the gene expression profiles between tumor and its microenvironment – differential expression, gene co-expression network, and tissue-tissue expression network. 2. Similarly compare the co-expression network between different types of tissues. 3. Herpes virus and cancer; predict human gene targets for virus (Herpes virus) microRNAs. 4. Gene expression “stalling” prediction using “stalling index” from ChIP-seq data for RNA polymerase II. 5. TF binding motif prediction using graph theoretical method. 6. MicroRNA co-expression network to predict microRNA transcription regulation. 7. Your own research problem …
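
For projects that build a co-expression network, the three steps from the "Gene Co-Expression Network" and "Graph Mining" slides can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the CODENSE algorithm: the gene names G1 to G4, the toy expression values, and all function names are hypothetical; the 0.75 PCC cutoff is the one quoted for the CODENSE analysis; and the k-core routine uses simple iterative peeling as one possible greedy heuristic.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (sx * sy)


def coexpression_edges(expr, cutoff=0.75):
    """Steps 1 and 2 (unweighted): keep gene pairs whose PCC exceeds the cutoff."""
    return {
        frozenset((g1, g2))
        for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) > cutoff
    }


def connectivity_ratio(nodes, edges):
    """r = L / (K(K-1)/2) for the subgraph induced by `nodes`."""
    nodes = set(nodes)
    k_nodes = len(nodes)
    if k_nodes < 2:
        return 0.0
    n_edges = sum(1 for e in edges if e <= nodes)  # edges with both ends inside
    return n_edges / (k_nodes * (k_nodes - 1) / 2)


def k_core(nodes, edges, k):
    """Step 3 (one heuristic): peel off nodes with fewer than k neighbours."""
    core = set(nodes)
    while True:
        degree = {v: 0 for v in core}
        for e in edges:
            if e <= core:
                for v in e:
                    degree[v] += 1
        weak = {v for v, d in degree.items() if d < k}
        if not weak:
            return core
        core -= weak


# Hypothetical toy data: rows are genes, columns are samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [1.1, 2.0, 2.9, 4.2, 5.0],
    "G3": [2.0, 4.1, 6.0, 8.0, 9.9],
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(expr, cutoff=0.75)
dense = k_core(expr, edges, k=2)
print(sorted(dense), connectivity_ratio(dense, edges))
```

In this toy example G1, G2 and G3 are mutually correlated and form a fully connected subgraph, while G4 is dropped by the peeling step. On real microarray data the pairwise loop grows quadratically with the number of genes, so a vectorized correlation routine would normally replace `pearson`.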