A New Approach to Analyzing Gene Expression Time Series Data
Ziv Bar-Joseph
Georg Gerber
David K. Gifford
Tommi S. Jaakkola
Itamar Simon
Learning Seminar: Bioinformatics & Other Applications
Prof. Nathan Intrator
Presented By: Adam Segoli Schubert
May 16, 2005
Overview
Gene Expression
Time-Series
Statistical Analysis of Time-Series
DNA Microarray
Gene Expression Time-Series
Analyzing Gene Expression Time-Series Data
Estimating Unobserved Expression Values and Time Points
What is a Spline?
Using the Splines
Parameters Analysis
Aligning Time-Series Data
Aligning Temporal Data Using Splines
Results – Unobserved Data Estimation
Results – Aligning Temporal Data
References
Gene Expression
Time-Series
A series of values of variables taken in successive periods of time.
Time points
Sampling intervals (constant / non-constant)
A well-established area of statistical data analysis is dedicated to the study of time-series.
Statistical Analysis of Time-Series
Two main goals:
Identifying the nature of the phenomenon
Predicting unobserved values of the time-series variable
DNA Microarray
Allows the monitoring of expression levels of thousands of genes under a variety of conditions.
The data from microarray experiments is usually in the form of a large matrix.
Very expensive.
Gene Expression Time-Series
Determined by measuring mRNA levels or protein concentrations.
Commonly very short (e.g. 4 to 20 samples).
Usually unevenly sampled.
The measuring techniques are extremely noise-prone and/or subject to bias in the biological measurements.
Analyzing Gene Expression Time-Series Data
Estimating Unobserved Expression Values and Time Points
Aligning Time-Series Data
Estimating Unobserved Expression Values and Time Points
Row average or filling with zeros
Singular Value Decomposition (SVD): $A_{m \times n} = U_{m \times m} \Sigma_{m \times n} V^{T}_{n \times n}$
Weighted K-nearest neighbors
Linear interpolation
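Three of the baseline methods listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in any of the cited work: the function names, the use of `None` for a missing value, the RMS distance and the inverse-distance weights in the KNN variant are all assumptions made for the example.

```python
import math

def row_average_impute(row):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in row if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in row]

def linear_interpolate(times, values):
    """Fill None entries by linear interpolation in time.

    Assumes `times` is sorted and at least one value is observed.
    Missing values before the first or after the last observation
    are filled with the nearest observed value.
    """
    obs = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        left = [p for p in obs if p[0] < t]
        right = [p for p in obs if p[0] > t]
        if left and right:
            (t0, v0), (t1, v1) = left[-1], right[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        else:
            filled.append((left[-1] if left else right[0])[1])
    return filled

def knn_impute(matrix, gene, col, k=2):
    """Estimate matrix[gene][col] from the k most similar genes.

    Similarity is the RMS distance over columns observed in both genes;
    neighbours are weighted by inverse distance (an assumed weighting).
    """
    target = matrix[gene]
    candidates = []
    for g, row in enumerate(matrix):
        if g == gene or row[col] is None:
            continue
        shared = [(a, b) for j, (a, b) in enumerate(zip(target, row))
                  if j != col and a is not None and b is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        candidates.append((dist, row[col]))
    if not candidates:
        raise ValueError("no neighbour has an observed value in this column")
    nearest = sorted(candidates)[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Hypothetical toy data: rows are genes, columns are time points.
times = [0, 1, 3, 4]
profile = [0.0, None, 6.0, 8.0]
matrix = [[1.0, None, 3.0],
          [1.0, 2.0, 3.0],
          [5.0, 9.0, 7.0]]
```

Note that linear interpolation uses only the gene's own neighbouring time points, while KNN borrows information from other genes; neither produces a continuous curve that can be resampled at arbitrary times.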

A New Analysis Approach
Using cubic splines.
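What is a Spline?
A special curve defined piecewise by polynomials.
Given k points $t_i$, called knots, in an interval $[a,b]$ with $a = t_0 < t_1 < \dots < t_{k-1} = b$, the parametric curve $S : [a,b] \to \mathbb{R}$ is called a spline of degree n if $S \in C^{n-1}[a,b]$ and $S|_{[t_i, t_{i+1}]} \in P_n$ for $i = 0, \dots, k-2$, where $P_n$ is the set of polynomials of degree at most n.
A cubic spline if n = 3.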
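To make the definition concrete, here is a minimal sketch of a natural cubic spline (n = 3, with second derivative set to zero at both ends) that interpolates a handful of unevenly spaced points. The natural boundary condition and the function name `natural_cubic_spline` are assumptions chosen for the example; the slides do not say which boundary conditions or spline basis are used.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing. On each interval [xs[j], xs[j+1]] the
    spline is the cubic ys[j] + b[j]*dx + c[j]*dx^2 + d[j]*dx^3.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the c coefficients.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3.0 * (ys[i + 1] - ys[i]) / h[i]
                    - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward sweep of the tridiagonal solve.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution; c[0] = c[n] = 0 is the natural boundary condition.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the polynomial piece containing x (last piece if x >= xs[n]).
        j = n - 1
        for i in range(n):
            if x < xs[i + 1]:
                j = i
                break
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate

# Hypothetical unevenly sampled expression profile.
sample_times = [0.0, 1.0, 3.0, 4.0, 7.0]
sample_values = [0.2, 0.9, 0.4, -0.3, 0.1]
curve = natural_cubic_spline(sample_times, sample_values)
```

The returned `curve` can be evaluated at any time in the sampled range, which is what makes a spline representation useful for estimating values at unobserved time points.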
Using the Splines
We obtain a continuous-time formulation by using cubic splines to represent gene expression curves.
Spline control points are uniformly spaced.
We constrain the spline coefficients of co-expressed genes to have the same covariance matrix.
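Estimating Unobserved Data Using Splines
Given c gene classes, the value of gene i (of class j) as observed at time t, $Y_i(t)$, can be written as
$Y_i(t) = s(t)(\mu_j + \gamma_i) + \epsilon_i$
where $s(t)$ is the vector of spline basis functions evaluated at t, $\mu_j$ is the class average of the spline coefficients, $\gamma_i$ is the gene-specific variation (with class covariance matrix $\Gamma_j$), and $\epsilon_i$ is a noise term.
Resampling gene i at any unobserved time point t':
$Y_i(t') = s(t')(\mu_j + \gamma_i)$
Estimating missing values: averaging of the observed values using the class covariance matrix $\Gamma_j$, the class average $\mu_j$ and the gene-specific variation $\gamma_i$, where $\mu_j$, $\gamma_i$ and $\Gamma_j$ are determined by a probabilistic model.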
Parameters Analysis
$Y_i$: vector of observed expression values for gene i.
$S_i$: $m \times q$ matrix for the m observations of gene i.
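One plausible reading of the $m \times q$ matrix, assumed here rather than stated on the slide, is that each row holds q cubic B-spline basis functions evaluated at one of the gene's m observation times. The sketch below builds such a matrix with the Cox-de Boor recursion on a uniform knot vector; the function names and the choice of q = 7 are hypothetical.

```python
def bspline_basis(j, degree, knots, t):
    """Value at t of the j-th B-spline basis function (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[j] <= t < knots[j + 1] else 0.0
    left = right = 0.0
    span_left = knots[j + degree] - knots[j]
    if span_left > 0:
        left = ((t - knots[j]) / span_left
                * bspline_basis(j, degree - 1, knots, t))
    span_right = knots[j + degree + 1] - knots[j + 1]
    if span_right > 0:
        right = ((knots[j + degree + 1] - t) / span_right
                 * bspline_basis(j + 1, degree - 1, knots, t))
    return left + right

def basis_matrix(times, q, t_min, t_max, degree=3):
    """Build the m x q matrix whose row r is the q basis values at times[r].

    Knots are uniformly spaced and extended beyond [t_min, t_max] so that
    the q basis functions sum to one everywhere inside that interval.
    """
    step = (t_max - t_min) / (q - degree)
    knots = [t_min + (i - degree) * step for i in range(q + degree + 1)]
    return [[bspline_basis(j, degree, knots, t) for j in range(q)]
            for t in times]

# Hypothetical unevenly spaced observation times for one gene.
obs_times = [0.0, 2.5, 7.0, 10.0]
S = basis_matrix(obs_times, q=7, t_min=0.0, t_max=10.0)
```

With a matrix like `S`, an expression profile is a linear function of q spline coefficients, so the same coefficients can be used to evaluate the curve at times that were never sampled.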
Aligning Time-Series Data
Dynamic Time Warping
Developed for voice recognition purposes in the 1970s.
Based on dynamic programming.
John Aach & George M. Church: operates on individual genes.
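For orientation, the textbook dynamic-programming form of dynamic time warping is sketched below. It is a generic version with an absolute-difference cost and is not the specific time-warping algorithm of Aach and Church; the function name `dtw_distance` is an assumption.

```python
def dtw_distance(a, b):
    """Minimum cumulative |a[i] - b[j]| cost over all monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical profiles: the second is a time-stretched copy of the first.
series_1 = [1.0, 2.0, 3.0]
series_2 = [1.0, 2.0, 2.0, 3.0]
```

Because the alignment is computed from one pair of discrete series at a time, a short and noisy profile of a single gene can easily be over-fitted, which motivates aligning a set of genes together.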
Aligning Temporal Data Using Splines
Operates on a set of genes.
Assume we have two spline curves for gene i: $g_i^1(t)$ in the reference time series and $g_i^2(s)$ in the time series to be aligned.
We define a mapping function $T(s) = t$.
We define the alignment error for each gene:
$e_i^2 = \frac{\int_{\alpha}^{\beta} [g_i^2(s) - g_i^1(T(s))]^2 \, ds}{\beta - \alpha}$
Alignment limits: $\alpha$ is the starting point and $\beta$ is the ending point of the alignment.
We define the error for a set of genes S of size n as:
$E_S = \sum_{i=1}^{n} w_i e_i^2$
where the $w_i$ are weighting coefficients that sum to one (uniform / nonuniform).
The mapping function $T(s) = t$ can then be found by minimizing the value of $E_S$, using standard non-linear optimization techniques.
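The minimization can be illustrated with a small sketch under several assumptions that are not on the slides: the mapping is taken to be linear, T(s) = (s - b) / a with stretch a and shift b; the alignment limits are taken as the overlap of the two observed ranges; the integral is approximated by the trapezoid rule; and a plain grid search stands in for a non-linear optimizer. The gene curves are hypothetical Python functions rather than fitted splines.

```python
import math

def alignment_error(g1, g2, a, b, t_range, s_range, steps=200, min_overlap=1.0):
    """Mean squared difference between g2(s) and g1(T(s)), T(s) = (s - b) / a."""
    t_min, t_max = t_range
    s_min, s_max = s_range
    # Assumed alignment limits: s values observed in series 2 that map
    # into the observed range of series 1 (requires a > 0).
    alpha = max(s_min, a * t_min + b)
    beta = min(s_max, a * t_max + b)
    if beta - alpha < min_overlap:
        return float("inf")
    h = (beta - alpha) / steps
    total = 0.0
    for k in range(steps + 1):
        s = alpha + k * h
        diff = g2(s) - g1((s - b) / a)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid rule
        total += weight * diff * diff
    return total * h / (beta - alpha)

def set_error(pairs, weights, a, b, t_range, s_range):
    """Weighted sum of per-gene alignment errors for a set of genes."""
    return sum(w * alignment_error(g1, g2, a, b, t_range, s_range)
               for w, (g1, g2) in zip(weights, pairs))

def grid_search(pairs, weights, t_range, s_range, a_grid, b_grid):
    """Return (error, a, b) for the grid point with the smallest set error."""
    best = None
    for a in a_grid:
        for b in b_grid:
            err = set_error(pairs, weights, a, b, t_range, s_range)
            if best is None or err < best[0]:
                best = (err, a, b)
    return best

# Hypothetical data: series 2 is series 1 stretched by 2 and shifted by 1.
true_a, true_b = 2.0, 1.0
reference_curves = [math.sin, lambda t: math.cos(0.5 * t)]
pairs = [(g1, (lambda s, g1=g1: g1((s - true_b) / true_a)))
         for g1 in reference_curves]
weights = [0.5, 0.5]  # uniform weights summing to one
a_grid = [1.0 + 0.25 * k for k in range(9)]
b_grid = [-2.0 + 0.5 * k for k in range(11)]
best_err, best_a, best_b = grid_search(
    pairs, weights, (0.0, 10.0), (0.0, 20.0), a_grid, b_grid)
```

Using more than one gene in `pairs` is the point of the set-based error: a single periodic curve can match itself under several shifts, while a set of genes with different shapes constrains the mapping more tightly.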
Results – Unobserved Data Estimation
Comparison of the new approach with:
Linear interpolation
Spline interpolation using individual genes
K-nearest neighbors (KNN), k = 20
Results – Aligning Temporal Data
Aligned three yeast cell-cycle gene expression time series.
Thank You!
Any Questions?
References
C. S. Moller-Levet. Clustering of Gene Expression Time-Series Data.
N. A. Campbell, J. B. Reece, and L. G. Mitchell. Biology. Fifth Edition.
J. Aach and G. M. Church. Aligning gene expression time series with time warping algorithms. Bioinformatics, 17:495-508, 2001.
C. de Boor. A Practical Guide to Splines. Springer, 1978.
P. D’haeseleer, X. Wen, S. Fuhrman, and R. Somogyi. Linear modeling of mRNA expression levels during CNS development and injury. In PSB99, 1999.
G. James and T. Hastie. Functional linear discriminant analysis for irregularly sampled curves. Journal of the Royal Statistical Society, to appear, 2001.
R. Sharan and R. Shamir. Algorithmic approaches to clustering gene expression data. Current Topics in Computational Biology, to appear.
O. Troyanskaya, M. Cantor, et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17:520-525, 2001.