Regulatory Signatures Inferred From
Gene Expression Data
Jayanth (Jay) Krishnan
SBCNY Fellow
Mahopac High School
Mount Sinai School of Medicine
Bio-Engineering/Bioinformatics
Central Questions

What causes cells to become malignant?

How can we reverse the harmful effects of
cancer?
The Wetlab Approach


Onconase and Amphinase, the Antitumor Ribonucleases from Rana pipiens Oocytes
– Ardelt W, Shogen K, Darzynkiewicz Z.
– New York Medical College: Cancer Biology
– X mol of drug + chemotherapeutic agent + cancer cells = observation of cytostatic and cytotoxic properties
Accurate, but search space too large
New Methodology: My Approach

Bioinformatics and mathematical modeling to prune search space
– Efficient
– Faster
– Economically Sound
– Easily Reproducible
Wetlab biology = verification
Experimental Goals

Use Bioinformatics to identify the regulatory signatures for 60 different Cancer Cell Lines
– Transcription Factors, Protein subnetworks, Kinases
Identify relationships between cancers/regulatory components
Implement a quantitative method to predict drugs for each cancer cell line
Work Flow: Phase 1
NCI-60 database mRNA profile analysis
Use statistical techniques to compute
over/under expressed genes
Phase 1
Phase 1: NCI-60 database



The database gives gene expression values
for each gene – cancer line pair using
several experimental probes.
Standard statistics are computed
Perl program used to process the data from
the NCI-60 database.
NCI Citation: "DTP - Cell Lines in the In Vitro Screen." Developmental Therapeutics
Program NCI/NIH. Web. 10 June. 2010.
<http://dtp.nci.nih.gov/docs/misc/common_files/cell_list.html>.
Phase 1: Representation of the NCI-60
Identifying over and under expressed genes
Gene “G”   Cancer 1   Cancer 2   Cancer 3   ...   Cancer 59   Cancer 60
Probe 1    N(1,1)     N(1,2)     N(1,3)     ...   N(1,59)     N(1,60)
Probe 2    N(2,1)     N(2,2)     N(2,3)     ...   N(2,59)     N(2,60)
Probe 3    N(3,1)     N(3,2)     N(3,3)     ...   N(3,59)     N(3,60)
...
Probe S    N(S,1)     N(S,2)     N(S,3)     ...   N(S,59)     N(S,60)
Table 1: Depiction of the NCI-60 database for a single gene.
The columns indicate the cancer cell lines while the rows show the probes.
The intersections show the mRNA or expression value.
Statistics














A two-sided Z test with a 0.025 p-value threshold was used to determine whether the gene is
dysregulated
Sample mean for cancer cell line “c”: $\bar{X}(c) = \frac{1}{S} \sum_{i=1}^{S} N(i,c)$
Population mean (the mean across all 60 cancer cell lines): $\mu = \frac{1}{60} \sum_{i=1}^{60} \bar{X}(i)$
Standard deviation: $\sigma = \sqrt{\frac{1}{59} \sum_{i=1}^{60} \left(\bar{X}(i) - \mu\right)^2}$
Test statistic for cancer cell line “c”: $\text{Test statistic}(c) = \frac{\bar{X}(c) - \mu}{\sigma}$
Assuming a significance level of $\alpha$,
Gene G over expressed for cancer cell line “c”: $\text{Test statistic}(c) > Z_{\alpha/2}$
Gene G under expressed for cancer cell line “c”: $\text{Test statistic}(c) < -Z_{\alpha/2}$
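The statistics on this slide can be sketched in a few lines of Python. This is an illustration only, not the Perl program used in the project. The function name, the input layout (a dictionary from cell line to probe readings for one gene), the default alpha of 0.05 (which reads the slide's .025 as the area in each tail), and the toy values are all assumptions.

```python
import math
from statistics import NormalDist


def call_expression(probe_values, alpha=0.05):
    """Label one gene as over/under/unchanged in every cell line.

    probe_values: dict mapping cell line name -> list of probe readings
    N(1,c)..N(S,c) for a single gene (hypothetical input layout).
    """
    # Sample mean per cell line: Xbar(c) = sum_i N(i,c) / S
    xbar = {line: sum(vals) / len(vals) for line, vals in probe_values.items()}
    n = len(xbar)
    # Mean of the per-line means across all cell lines
    mu = sum(xbar.values()) / n
    # Standard deviation of the per-line means, with n - 1 in the denominator
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xbar.values()) / (n - 1))
    if sigma == 0:
        raise ValueError("all cell lines have the same mean; the test is undefined")
    # Two-sided critical value Z(alpha/2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    calls = {}
    for line, x in xbar.items():
        stat = (x - mu) / sigma
        if stat > z_crit:
            calls[line] = "over"
        elif stat < -z_crit:
            calls[line] = "under"
        else:
            calls[line] = "unchanged"
    return calls


# Toy data: seven similar cell lines and one with much higher readings.
toy = {f"line{i}": [2.0, 2.0] for i in range(1, 8)}
toy["line8"] = [9.0, 11.0]
calls = call_expression(toy)
print(calls["line8"], calls["line1"])
```

In the toy data only the outlying cell line crosses the critical value; the other seven stay within it.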
Top 223 Over Expressed Genes for
MDA_N
Work Flow: Phase 2
NCI-60 database mRNA profile analysis
Use statistical techniques to compute over/under expressed genes
Phase 1
ChIP Enrichment Analysis - ChEA (created database)
Determine top ranked transcription factors responsible for the over/under expressed genes
Genes2Networks
Identify protein sub-networks that “connect” the transcription factors through additional proteins
Kinase Enrichment Analysis - KEA (existing database)
Top ranked protein kinases regulating the protein subnetworks
Phase 2
Phase 2: Creation of a system to
predict transcription factors



ChIP-on-chip and ChIP-Seq data is gathered
from prior experiments
Extraction of data from the supplemental
Excel spreadsheets and PDF tables
Creation of a database of mammalian ChIP
data
Phase 2: ChIP Enrichment Analysis

ChIP Enrichment Analysis (ChEA)
– 100,000 (TF-to-gene) interactions extracted from over 60 publications
– 80 transcription factors and the thousands of target genes which they potentially regulate
The accumulated data is then queried through a user-friendly system which
implements Fisher’s exact test
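As a rough illustration of how Fisher's exact test can rank transcription factors for an input gene list, here is a minimal Python sketch. It is not ChEA's code: the one-sided (enrichment) form of the test, the function names, the background size, and the toy target sets TF1 and TF2 are assumptions made for the example.

```python
from math import comb


def fisher_enrichment_p(overlap, list_size, target_size, universe_size):
    """One-sided Fisher's exact test: probability of seeing at least
    `overlap` shared genes by chance (hypergeometric upper tail)."""
    max_overlap = min(list_size, target_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(target_size, j) * comb(universe_size - target_size, list_size - j)
        for j in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_tfs(gene_list, tf_targets, universe_size):
    """Return (tf, overlap, p_value) rows sorted by ascending p-value.

    tf_targets: dict mapping a transcription factor to its set of target
    genes; all genes are assumed to belong to a background of
    `universe_size` genes.
    """
    genes = set(gene_list)
    rows = []
    for tf, targets in tf_targets.items():
        k = len(genes & set(targets))
        p = fisher_enrichment_p(k, len(genes), len(targets), universe_size)
        rows.append((tf, k, p))
    return sorted(rows, key=lambda row: row[2])


# Toy example with hypothetical gene and factor names.
ranking = rank_tfs(
    ["A", "B", "C"],
    {"TF1": {"A", "B", "C"}, "TF2": {"X", "Y", "Z"}},
    universe_size=10,
)
print(ranking[0][0])
```

A smaller p-value means the overlap between the input list and a factor's targets is less likely to be due to chance, so that factor ranks higher.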
Software Inputs



The over and under expressed genes from
the NCI-60 are inputted into ChEA to get
transcription factors
The top transcription factors are inputted to
Genes2Networks (Ma’ayan Lab) to get
protein subnetworks
The subnetworks are inputted into Kinase
Enrichment Analysis (Ma’ayan Lab) to get
kinases
Materials and Methods: Phase 1
Berger SI, Posner JM, Ma'ayan A. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinformatics. 2007 Oct 4;8:372.
Lachmann A, Ma'ayan A. KEA: Kinase Enrichment Analysis. Bioinformatics 25:684-6 (2009). PMID: 19176546.
Work Flow: Phase 3 and 4
NCI-60 database mRNA profile analysis
Use statistical techniques to compute over/under expressed genes
Phase 1
ChIP Enrichment Analysis - ChEA (created database)
Determine top ranked transcription factors responsible for the over/under expressed genes
Genes2Networks
Identify protein sub-networks that “connect” the transcription factors through additional proteins
Kinase Enrichment Analysis - KEA (existing database)
Top ranked protein kinases regulating the protein subnetworks
Phase 2
Compute integrated matrices for transcription factors, protein complexes and kinases vs. cancer cell lines
Phase 3
Use MATLAB to form heat maps and dendrograms and use principal component analysis to determine clusters
Phase 4
Phase 3: Creation of Integrated Matrices
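One way to picture an integrated matrix of regulators vs. cancer cell lines is a binary membership table. The Python sketch below is illustrative only: the slides do not state what the matrix entries are, so the 0/1 encoding, the function name, and the toy regulator lists are assumptions.

```python
def integrated_matrix(top_hits):
    """Build a regulators x cell-lines matrix of 0/1 flags.

    top_hits: dict mapping cell line -> iterable of regulator names
    (transcription factors, kinases, ...) reported for that line.
    Entry [r][c] is 1 when regulator r was reported for cell line c.
    """
    lines = sorted(top_hits)
    hit_sets = {line: set(top_hits[line]) for line in lines}
    regulators = sorted(set().union(*hit_sets.values())) if hit_sets else []
    matrix = [
        [1 if reg in hit_sets[line] else 0 for line in lines]
        for reg in regulators
    ]
    return regulators, lines, matrix


# Toy input with hypothetical cell lines and regulators.
regs, cols, mat = integrated_matrix({
    "lineA": ["TF1", "KIN1"],
    "lineB": ["TF1"],
    "lineC": ["TF2", "KIN1"],
})
for reg, row in zip(regs, mat):
    print(reg, row)
```

A matrix in this shape can be passed directly to clustering or principal component routines, with either rows or columns treated as the observations.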
MATLAB: Results and Analysis


MATLAB code written to find relationships
between the regulatory signatures and
cancer cell lines
Boxplots, dendrograms, principal component
analysis, and similarity heat maps were
created
Principal Component Analysis





Convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components.
An (n x n) covariance matrix is computed, with one entry for each pair of signatures or cancer cell lines
Eigenvectors of the covariance matrix are computed
The eigenvectors with the highest eigenvalues are the principal components.
Data replotted with principal components as axes
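The steps on this slide can be sketched in plain Python. The project used MATLAB; the sketch below is a stand-in that finds only the first principal component, using power iteration rather than a full eigendecomposition. The function name, the starting vector, and the toy data are assumptions.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, eigenvalue, scores) for the top principal component.

    rows: list of observations, each a list of n variable values.
    """
    m, n = len(rows), len(rows[0])
    # Center each variable on its mean
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    # n x n sample covariance matrix
    cov = [
        [sum(c[a] * c[b] for c in centered) / (m - 1) for b in range(n)]
        for a in range(n)
    ]
    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue (assumes the start vector
    # is not orthogonal to it).
    v = [1.0 / (j + 1) for j in range(n)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v
    cov_v = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
    eigenvalue = sum(v[a] * cov_v[a] for a in range(n))
    # Coordinates of each observation along the principal component
    scores = [sum(c[j] * v[j] for j in range(n)) for c in centered]
    return v, eigenvalue, scores


# Toy data lying exactly on the line y = 2x.
direction, eigenvalue, scores = first_principal_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
)
print(direction, eigenvalue)
```

Because the toy points lie on a single line, all of the variance falls on one component whose direction is proportional to (1, 2).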
Work Flow: Phase 5 & 6 – Identifying drugs to reverse the effects of cancer
Chart labels: Results; Results and Analysis; Future Research
NCI-60 database mRNA profile analysis
Use statistical techniques to compute over/under expressed genes
Phase 1
ChIP Enrichment Analysis - ChEA (created database)
Determine top ranked transcription factors responsible for the over/under expressed genes
Genes2Networks
Identify protein sub-networks that “connect” the transcription factors through additional proteins
Kinase Enrichment Analysis - KEA (existing database)
Top ranked protein kinases regulating the protein subnetworks
Phase 2
Compute integrated matrices for transcription factors, protein complexes and kinases vs. cancer cell lines
Phase 3
Use MATLAB to form heat maps and dendrograms and use principal component analysis to determine clusters
Phase 4
Using Jaccard coefficients, find the top FDA approved drugs for each cancer cell line
Correlate changes in expression induced by these drugs and the discovered pathways
Corroborate top kinases and transcription factors found with prior research
Phase 5
Conduct wet lab experiments to corroborate results
Phase 6
Predicting Drugs



CMAP database contains 500 drugs and the associated genes for each drug
Intersection of the genes down-regulated by the drug and the genes up-regulated in the cancer
Jaccard coefficient was calculated for each cancer cell line
– Drug with the highest Jaccard coefficient is chosen
– Can be calculated at the gene/transcription factor/kinase levels
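The Jaccard-based drug ranking can be sketched as follows. This is a minimal illustration, not the project's code or the CMAP interface: the function names and the toy gene sets are hypothetical, and the Jaccard coefficient is taken in its standard form, |A ∩ B| / |A ∪ B|.

```python
def jaccard(a, b):
    """Jaccard coefficient of two collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_drugs(cancer_up_genes, drug_down_genes):
    """Rank drugs by how well the genes they down-regulate match the
    genes that are up-regulated in one cancer cell line.

    drug_down_genes: dict mapping drug name -> set of down-regulated genes.
    Returns (drug, score) pairs, highest score first.
    """
    scores = {
        drug: jaccard(cancer_up_genes, genes)
        for drug, genes in drug_down_genes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy example with hypothetical drugs and genes.
ranked = rank_drugs(
    {"G1", "G2", "G3"},
    {"drugA": {"G1", "G2", "G4"}, "drugB": {"G5"}},
)
print(ranked[0])
```

The same two functions work unchanged at the transcription factor or kinase level: pass sets of factors or kinases instead of sets of genes.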
Case Studies and Future Research



MG-132 was identified as the top drug for the
BR:T47D (Breast) cancer cell line.
Six case studies were performed, confirming
our predictions of the regulatory signatures
and drugs by comparing them with wet lab data
Drugs are being submitted to Mount Sinai
wet lab department
Conclusion: What was accomplished?





1) A web interface was developed and published to
identify transcription factors
2) Entire regulatory signatures identified for 60
cancer cell lines
3) MATLAB analysis to group cancer lines and
regulatory components
4) Drugs predicted for all 60 cancer cell lines
5) Case studies performed; wet lab verification
being done
Acknowledgements






Dr. Avi Ma’ayan - Science Research Mentor
Mr. Mark Langella – Adult Sponsor, Mahopac
High School
Mr. Bilyeu – Principal, Mahopac High School
Mr. Manko – Superintendent of Mahopac
Schools
Board of Education
Art Department