About genewise regulation analysis with regression
Janne Nikkilä
Contents of the talk
• Biological background of gene expression regulation
• Previous work in statistical modeling of gene regulation
• Our motivation
• Our analysis approach
• Some results
• Discussion
Biological background for gene expression regulation
• Gene expression is regulated at several stages:
  – DNA unpacking (demethylation, histone acetylation)
  – Transcription
  – Alternative RNA splicing
  – mRNA degradation
  – Translation initiation
  – Protein processing and degradation
• Transcription is believed to be the most important one
Biological background for gene transcription regulation
• In transcription, the mRNA corresponding to the coding DNA sequence is formed
• Transcription initiation is mainly controlled by the binding of specific protein complexes, transcription factors (tfs), to the gene promoter region
• Tfs may enhance, suppress, or do both
• As tfs are composed of proteins, which are coded by genes, tf activities can be analyzed by studying the expressions of the genes that code for the tfs
Analysis methods used in the literature
• Modelling of gene interactions by
  – Boolean networks
  – Differential equations
  – Linear regression
  – Clustering
  – Probabilistic models (e.g. Bayesian networks)
Our motivation
• None of the previous methods seems to work adequately
• This may be due to the methods, to the quality of the data, or perhaps to the cumulative effect of these two factors
➔ We wish to find some evidence that gene regulation mechanisms can be inferred from gene expression data
A simple approach
• Study the expression of one gene at a time and try to explain it with the sum of the transcription factor component activities
  – Intuitive interpretation of the setup and the results
• Regression as the model
  – Easy to interpret, computationally feasible
• Evaluate the results statistically
  – Somewhat quantitative interpretation of the results
Data
• Expression data
  – 300 different knockout mutations of yeast (300 arrays)
  – Over 6000 yeast genes on each cDNA array
• Binding data
  – Binding activity of 147 transcription factors to all yeast genes (147 arrays)
  – Roughly the same genes on the arrays as above
  – Used to choose a set of candidate tfs for each gene
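
The slides do not say how the binding data were turned into a candidate set, so the following is only a minimal sketch under an assumed rule: for each gene, keep the k transcription factors with the highest binding score. The function name `candidate_tfs`, the dictionary layout, the "higher score means stronger binding" convention, and the toy numbers are all hypothetical.

```python
def candidate_tfs(binding, k):
    """Return, for each gene, the k tfs with the highest binding score.

    binding: dict mapping gene name -> dict of tf name -> binding score
             (higher = stronger binding; an assumed convention).
    """
    candidates = {}
    for gene, scores in binding.items():
        # Sort tfs by descending score; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda tf: (-scores[tf], tf))
        candidates[gene] = ranked[:k]
    return candidates


# Toy scores, invented purely for illustration.
toy_binding = {
    "GENE_A": {"TF1": 0.2, "TF2": 2.5, "TF3": 1.1},
    "GENE_B": {"TF1": 3.0, "TF2": 0.1, "TF3": 0.4},
}
selected = candidate_tfs(toy_binding, k=2)
```

With the real data the same idea would run over 147 tfs per gene; a threshold on the binding score would be an equally plausible alternative to a fixed k.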
Preprocessing of the data
• Normal quantity provided by cDNA arrays is the log-ratio of the sample and the control intensities from each spot
  – May hinder the discovery of normal regulation mechanisms
• Plain log-intensities separately?
  • Not possible because of spotwise variation
➔ Only the arraywise and genewise averages were removed and the normal log-ratios were used in the analysis
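
Removing the arraywise and genewise averages can be sketched as a two-step centering of the genes x arrays log-ratio matrix. This is an illustration only: the order of the two steps, the absence of missing values, and the name `center_log_ratios` are assumptions, and the toy matrix is invented.

```python
def center_log_ratios(log_ratios):
    """Remove arraywise (column) and genewise (row) averages.

    log_ratios: list of rows, one row per gene, one column per array.
    """
    n_genes = len(log_ratios)
    n_arrays = len(log_ratios[0])
    # Arraywise step: subtract each array's mean over all genes.
    array_means = [
        sum(row[j] for row in log_ratios) / n_genes for j in range(n_arrays)
    ]
    centered = [
        [row[j] - array_means[j] for j in range(n_arrays)] for row in log_ratios
    ]
    # Genewise step: subtract each gene's mean over all arrays.
    result = []
    for row in centered:
        gene_mean = sum(row) / n_arrays
        result.append([value - gene_mean for value in row])
    return result


# Toy 3 genes x 3 arrays matrix of log-ratios, invented for illustration.
toy = [[0.5, -0.2, 0.1], [1.0, 0.3, -0.4], [-0.6, 0.8, 0.2]]
centered = center_log_ratios(toy)
```

For a complete matrix, both the row means and the column means are zero after the two steps, whichever step runs first.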
The regression model
• The expression of a gene, y, is modelled as a weighted sum of x, the expressions of a set of transcription factor genes
• The error e is assumed to be normally distributed
• As a result, each transcription factor gene is assigned a coefficient, which denotes its role in gene regulation
• Fitted with the robust fit method
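
In symbols, the model described above could be written as y = b0 + b1*x1 + ... + bp*xp + e, where the names b0..bp are chosen here for illustration. The slides do not state which robust estimator was used, so the sketch below assumes one common choice: iteratively reweighted least squares with Tukey bisquare weights and a median-based scale estimate. The helper names, the tuning constant 4.685, and the toy data are assumptions, and the code uses only the standard library to stay self-contained.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X'WX) beta = X'Wy."""
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtwx, xtwy)


def robust_fit(X, y, tune=4.685, iterations=50):
    """Return [intercept, coefficients...] from bisquare-weighted IRLS."""
    X = [[1.0] + list(row) for row in X]  # prepend an intercept column
    beta = weighted_least_squares(X, y, [1.0] * len(y))  # ordinary LS start
    for _ in range(iterations):
        resid = [yi - sum(b * x for b, x in zip(beta, xi)) for xi, yi in zip(X, y)]
        # Robust scale: median absolute residual divided by 0.6745.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        # Bisquare weights: zero weight beyond tune * scale.
        w = [(1 - (r / (tune * scale)) ** 2) ** 2 if abs(r) < tune * scale else 0.0
             for r in resid]
        beta = weighted_least_squares(X, y, w)
    return beta


# Toy data: y = 1 + 2*x1 - x2 plus small noise, with one gross outlier.
x1 = list(range(12))
x2 = [1, 0, 2, 1, 3, 0, 2, 4, 1, 3, 0, 2]
noise = [0.05, -0.03, 0.02, -0.04, 0.01, 0.03, -0.02, 0.04, -0.01, -0.05, 0.02, -0.02]
y = [1 + 2 * a - b + e for a, b, e in zip(x1, x2, noise)]
y[6] += 30.0  # outlier
beta = robust_fit(list(zip(x1, x2)), y)
```

The point of the robust weighting is that the single outlying array contributes almost nothing to the final coefficients, whereas an ordinary least-squares fit would be pulled towards it.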
Statistical analysis of the results
• A subset of nine genes: some confirmed transcription factors, the binding activities of the 50 tfs, and the significances of the same 50 tfs in the regression model
• Tests:
  – Test whether the binding activity and the regression model produce the same kind of information about the roles of the tfs for each gene -> no statistical significance
  – Test whether the confirmed tfs are found among the most significant ones in either binding or regression -> no statistical significance
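
The slides do not name the tests that were used. As a hedged illustration of how the two questions could be posed, the sketch below assumes a Spearman rank correlation for the agreement between binding activity and regression significance, and a hypergeometric tail probability for finding confirmed tfs among the top-ranked ones. The function names, the no-ties assumption, and all numbers except the 50 tfs are hypothetical.

```python
from math import comb


def spearman_rho(a, b):
    """Spearman rank correlation of two equal-length lists without ties."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


def hypergeom_tail(total, confirmed, top, overlap):
    """P(at least `overlap` confirmed tfs among `top` drawn from `total`)."""
    upper = min(confirmed, top)
    hits = sum(
        comb(confirmed, k) * comb(total - confirmed, top - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(total, top)


# Toy values for five tfs of one gene, invented for illustration.
binding_scores = [2.1, 0.3, 1.5, 0.9, 3.2]
regression_significance = [0.8, 0.1, 0.5, 0.7, 0.9]
rho = spearman_rho(binding_scores, regression_significance)

# Chance of seeing at least 1 of 3 confirmed tfs in a top-10 list out of 50.
p_value = hypergeom_tail(total=50, confirmed=3, top=10, overlap=1)
```

A rho near zero across genes, or a large tail probability, would correspond to the "no statistical significance" outcomes reported on the slide.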
Discussion
• Clearly, there is no linear association between the regulator genes and the regulated genes in this data set
• The biggest problem is perhaps the type of the data: cDNA data without a time dimension -> changing to Affymetrix and/or time series data might help
• Another problem may be the oversimplified model, but with this kind of data statistical models for gene interactions seem to be fruitless