Regulatory Signatures Inferred From Gene Expression Data
Jayanth (Jay) Krishnan, SBCNY Fellow
Mahopac High School; Mount Sinai School of Medicine
Bio-Engineering/Bioinformatics

Central Questions
- What causes cells to become malignant?
- How can we reverse the harmful effects of cancer?

The Wetlab Approach
Example: "Onconase and Amphinase, the Antitumor Ribonucleases from Rana pipiens Oocytes" (Ardelt W, Shogen K, Darzynkiewicz Z.), New York Medical College: Cancer Biology
- X mol of drug + chemotherapeutic agent + cancer cells = observation of cytostatic and cytotoxic properties
- Accurate, but the search space is too large

New Methodology: My Approach
Use bioinformatics and mathematical modeling to prune the search space:
- Efficient
- Faster
- Economically sound
- Easily reproducible
Wetlab biology then serves as verification.

Experimental Goals
- Use bioinformatics to identify the regulatory signatures (transcription factors, protein subnetworks, kinases) for 60 different cancer cell lines
- Identify relationships between cancers and regulatory components
- Implement a quantitative method to predict drugs for each cancer cell line

Work Flow: Phase 1
NCI-60 database mRNA profile analysis: use statistical techniques to compute over- and under-expressed genes.

Phase 1: NCI-60 database
The database gives gene expression values for each gene/cancer cell line pair, measured with several experimental probes. Standard statistics are computed, and a Perl program was used to process the data from the NCI-60 database.
NCI Citation: "DTP - Cell Lines in the In Vitro Screen." Developmental Therapeutics Program NCI/NIH. Web. 10 June 2010. <http://dtp.nci.nih.gov/docs/misc/common_files/cell_list.html>.

Phase 1: Representation of the NCI-60 (identifying over- and under-expressed genes)

Gene "G"    Cancer 1   Cancer 2   Cancer 3   …   Cancer 59   Cancer 60
Probe 1     N(1,1)     N(1,2)     N(1,3)     …   N(1,59)     N(1,60)
Probe 2     N(2,1)     N(2,2)     N(2,3)     …   N(2,59)     N(2,60)
Probe 3     N(3,1)     N(3,2)     N(3,3)     …   N(3,59)     N(3,60)
…           …          …          …          …   …           …
Probe S     N(S,1)     N(S,2)     N(S,3)     …   N(S,59)     N(S,60)

Table 1: Depiction of the NCI-60 database for a single gene. The columns indicate the cancer cell lines and the rows the probes; each entry N(i,c) is the mRNA expression value measured by probe i in cancer cell line c.

Statistics
A two-sided Z test with a p value of .025 was used to determine whether the gene is dysregulated.

Sample mean for cancer cell line c (over its S probes):
$$\bar{X}(c) = \frac{1}{S}\sum_{i=1}^{S} N(i,c)$$
Population mean (the mean across all 60 cancer cell lines):
$$\mu = \frac{1}{60}\sum_{i=1}^{60} \bar{X}(i)$$
Standard deviation:
$$\sigma = \sqrt{\frac{1}{59}\sum_{i=1}^{60} \left(\bar{X}(i) - \mu\right)^2}$$
Test statistic for cancer cell line c:
$$Z(c) = \frac{\bar{X}(c) - \mu}{\sigma}$$
Assuming a significance level of α:
- Gene G is over-expressed in cancer cell line c if $Z(c) > Z_{\alpha/2}$
- Gene G is under-expressed in cancer cell line c if $Z(c) < -Z_{\alpha/2}$

Top 223 Over-Expressed Genes for MDA_N (figure)

Work Flow: Phase 2
Phase 1 (NCI-60 mRNA profile analysis) feeds Phase 2:
- ChIP Enrichment Analysis (ChEA, created database): determine the top-ranked transcription factors responsible for the over- and under-expressed genes
- Genes2Networks (existing database): identify protein subnetworks that "connect" the transcription factors through additional proteins
- Kinase Enrichment Analysis (KEA, existing database): find the top-ranked protein kinases regulating the protein subnetworks

Phase 2: Creation of a system to predict transcription factors
- ChIP-on-chip and ChIP-Seq data are gathered from prior experiments
- The data are extracted from supplemental Excel spreadsheets and PDF tables
- A database of mammalian ChIP data is created

Phase 2: ChIP Enrichment Analysis
ChIP Enrichment Analysis (ChEA) is built from 100,000 TF-to-gene interactions extracted from over 60 publications.
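The Phase 1 statistics described earlier can be sketched as a small, dependency-free Python function. It follows the slide's formulas for a single gene: the mean over the probes for each cell line, the mean and n - 1 standard deviation of those means, and a comparison of each Z(c) against Z(α/2). The function name `call_dysregulated` and the input layout (probe rows, one column per cell line, as in Table 1) are assumptions for illustration; the original processing used a Perl program. The slide's ".025" could be read as either α or α/2, so `alpha` is a parameter, and its default of 0.05 (0.025 per tail) is an assumption.

```python
from statistics import NormalDist, mean, stdev


def call_dysregulated(matrix, alpha=0.05):
    """Two-sided Z test for one gene (hypothetical helper, not the original code).

    matrix[i][c] is N(i, c): probe i's reading in cancer cell line c.
    Returns {cell_line_index: "over" | "under"} for dysregulated lines.
    """
    # Xbar(c): mean over the S probes of each cell line (column means)
    xbar = [mean(column) for column in zip(*matrix)]
    # mu: mean of the per-cell-line means
    mu = mean(xbar)
    # sigma: standard deviation of the means with an n - 1 denominator
    # (59 for the 60 NCI-60 lines)
    sigma = stdev(xbar, mu)
    # Critical value Z(alpha/2) of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    calls = {}
    for c, x in enumerate(xbar):
        z = (x - mu) / sigma
        if z > z_crit:
            calls[c] = "over"
        elif z < -z_crit:
            calls[c] = "under"
    return calls


# Toy data: 2 probes x 10 cell lines (the real table has 60 columns)
probes = [
    [1, -1, 1, -1, 1, -1, 1, -1, 9, -9],    # probe 1
    [-1, 1, -1, 1, -1, 1, -1, 1, 11, -11],  # probe 2
]
print(call_dysregulated(probes))               # {8: 'over', 9: 'under'}
print(call_dysregulated(probes, alpha=0.025))  # {}
```

With this toy data the two outlying lines have |Z| of about 2.12, so they pass the 1.96 cutoff for α = 0.05 but not the stricter cutoff of about 2.24 for α = 0.025, which shows how much the reading of ".025" matters.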
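ChEA ranks transcription factors with Fisher's exact test. Below is a minimal sketch of that kind of enrichment test, assuming a one-sided (over-representation) test against a fixed background gene count. The right tail of the hypergeometric distribution is the one-sided Fisher's exact p-value for the 2 × 2 table "in input list / is a TF target". The helpers `fisher_right_tail` and `rank_tfs`, the background size, and the `TF_A`/`TF_B` target sets are hypothetical and do not reproduce ChEA's actual implementation.

```python
from math import comb


def fisher_right_tail(k, n_input, n_targets, n_background):
    """One-sided Fisher's exact test via the hypergeometric upper tail.

    k            -- input genes that are known targets of the TF (overlap)
    n_input      -- size of the input gene list
    n_targets    -- known targets of the TF within the background
    n_background -- total number of genes considered (assumed fixed)
    Returns P(overlap >= k) if the input list were drawn at random.
    """
    total = comb(n_background, n_input)
    upper = min(n_input, n_targets)
    hits = sum(
        comb(n_targets, x) * comb(n_background - n_targets, n_input - x)
        for x in range(k, upper + 1)
    )
    return hits / total


def rank_tfs(input_genes, tf_targets, n_background):
    """Return (p_value, tf) pairs sorted from most to least enriched."""
    genes = set(input_genes)
    scores = []
    for tf, targets in tf_targets.items():
        targets = set(targets)
        k = len(genes & targets)
        p = fisher_right_tail(k, len(genes), len(targets), n_background)
        scores.append((p, tf))
    return sorted(scores)


tf_targets = {  # hypothetical TF -> target-gene sets
    "TF_A": {"G1", "G2", "G3", "G4"},
    "TF_B": {"G5", "G6", "G7", "G8"},
}
for p, tf in rank_tfs(["G1", "G2", "G3", "G9"], tf_targets, n_background=20):
    print(tf, round(p, 4))
# TF_A 0.0134
# TF_B 1.0
```

The same scoring could in principle be applied to kinase-substrate sets in a KEA-style step; whether KEA uses exactly this variant is not stated in the slides.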
The ChEA database covers 80 transcription factors and the thousands of target genes they potentially regulate. The accumulated data are queried through a user-friendly system that implements Fisher's exact test.

Software Inputs
- The over- and under-expressed genes from the NCI-60 are input into ChEA to get transcription factors
- The top transcription factors are input into Genes2Networks (Ma'ayan Lab) to get protein subnetworks
- The subnetworks are input into Kinase Enrichment Analysis (Ma'ayan Lab) to get kinases

Materials and Methods: Phase 1
Berger SI, Posner JM, Ma'ayan A. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinformatics. 2007 Oct 4;8:372.
Lachmann A, Ma'ayan A. KEA: Kinase Enrichment Analysis. Bioinformatics. 2009;25:684-6. PMID: 19176546.

Work Flow: Phase 3 and 4
After Phases 1 and 2:
- Phase 3: Compute integrated matrices for transcription factors, protein complexes, and kinases vs. cancer cell lines
- Phase 4: Use MATLAB to form heat maps and dendrograms, and use principal component analysis to determine clusters

Phase 3: Creation of Integrated Matrices (figure)

MATLAB: Results and Analysis
- MATLAB code was written to find relationships between the regulatory signatures and the cancer cell lines
- Boxplots, dendrograms, principal component analysis, and similarity heat maps were created

Principal Component Analysis
Principal component analysis (PCA) converts a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components.
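The slides used MATLAB for PCA. As a dependency-free Python sketch of the same idea, the function below centers the data, builds the covariance matrix, finds the leading eigenvectors by power iteration with deflation (a simple substitute for a full eigendecomposition), and replots the observations with those eigenvectors as axes. The function name `pca`, the row/column layout, the starting vector, and the iteration count are assumptions for illustration.

```python
import math


def pca(rows, k=2, iters=1000):
    """Minimal PCA sketch: covariance -> leading eigenvectors -> scores.

    rows: one list per observation (e.g. a cancer cell line), one column
    per variable (e.g. a regulatory signature).
    Returns (eigenvalues, components, scores).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # d x d sample covariance matrix: one entry per pair of variables
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    eigenvalues, components = [], []
    for _ in range(k):
        # Power iteration converges to the eigenvector with the largest
        # eigenvalue of a symmetric positive semi-definite matrix
        v = [1.0 + 0.1 * j for j in range(d)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        eigenvalues.append(lam)
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]

    # Replot the data with the principal components as axes
    scores = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
              for c in centered]
    return eigenvalues, components, scores


cell_lines = [  # toy data: 5 observations x 2 signatures
    [2.0, 2.2],
    [1.0, 0.8],
    [-1.0, -1.1],
    [-2.0, -1.9],
    [0.5, 0.4],
]
values, axes, scores = pca(cell_lines, k=2)
print([round(v, 3) for v in values])
```

Because the toy points lie close to the line y = x, the first component points roughly along (1, 1)/√2 and captures almost all of the variance; the variance of the PC1 scores equals the first eigenvalue.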
- An n × n covariance matrix is computed, with one entry for each pair of signatures (or cancer cell lines)
- Its eigenvectors are computed
- The eigenvectors with the largest eigenvalues are the principal components
- The data are replotted with the principal components as axes

Work Flow: Phase 5 & 6 (identifying drugs to reverse the effects of cancer)
After Phases 1 through 4:
- Phase 5: Using Jaccard coefficients, find the top FDA-approved drugs for each cancer cell line; correlate the expression changes induced by these drugs with the discovered pathways; corroborate the top kinases and transcription factors against prior research
- Phase 6 (future research): Conduct wet lab experiments to corroborate the results

Predicting Drugs
- The CMAP database contains 500 drugs and the genes associated with each drug
- For each cancer cell line, the genes down-regulated by a drug are intersected with the genes up-regulated in the cancer, and a Jaccard coefficient is calculated: J = |down(drug) ∩ up(cancer)| / |down(drug) ∪ up(cancer)|
- The drug with the highest Jaccard coefficient is chosen
- The coefficient can be calculated at the gene, transcription factor, or kinase level

Case Studies and Future Research
MG-132 was identified as the top drug for the BR:T47D (breast) cancer cell line.
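The Jaccard-based drug ranking described under Predicting Drugs can be sketched as follows. `jaccard` and `rank_drugs` are hypothetical helpers, and the gene sets and drug names are toy placeholders rather than CMAP data; the same functions would accept transcription factor or kinase sets in place of genes.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_drugs(cancer_up, drug_down):
    """Rank drugs for one cancer cell line, highest Jaccard coefficient first.

    cancer_up -- genes up-regulated in the cancer cell line
    drug_down -- {drug_name: genes the drug down-regulates}
    """
    up = set(cancer_up)
    scores = {drug: jaccard(up, set(genes)) for drug, genes in drug_down.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


cancer_up = {"G1", "G2", "G3", "G4"}   # toy up-regulated genes
drug_down = {                          # toy drug signatures
    "drug_X": {"G1", "G2", "G5"},
    "drug_Y": {"G4", "G6", "G7", "G8"},
}
print(rank_drugs(cancer_up, drug_down)[0])  # ('drug_X', 0.4)
```

Dividing by the union rather than using the raw intersection size keeps drugs with very large signatures from winning simply because they touch many genes.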
Six case studies were performed, comparing the predicted regulatory signatures and drugs with wet lab data; these comparisons confirmed the predictions. The predicted drugs are being submitted to the Mount Sinai wet lab department.

Conclusion: What was accomplished?
1) A web interface was developed and published to identify transcription factors
2) Entire regulatory signatures were identified for 60 cancer cell lines
3) MATLAB analysis was used to group cancer cell lines and regulatory components
4) Drugs were predicted for all 60 cancer cell lines
5) Case studies were performed; wet lab verification is being done

Acknowledgements
- Dr. Avi Ma'ayan, Science Research Mentor
- Mr. Mark Langella, Adult Sponsor, Mahopac High School
- Mr. Bilyeu, Principal, Mahopac High School
- Mr. Manko, Superintendent of Mahopac Schools
- Board of Education
- Art Department