Clustered alignments of gene-expression time series data
Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and
Mark Craven
Department of Biostatistics & Medical Informatics,
Department of Computer Sciences and Department of
Oncology, University of Wisconsin, Madison, USA
Bioinformatics, Vol. 25, pages i119-i127, 2009
Outline
• Introduction
• Method
– SCOW
– Clustered alignments
• Results and Discussion
• Conclusion
Introduction
• Characterizing and comparing temporal gene-expression
responses is an important computational task for answering
a variety of questions in biological studies.
• One application: toxicogenomics,
characterizing the potential toxicity of
chemicals
Introduction
• Answering similarity queries:
assess similarity by determining the temporal
correspondence between the query and each treatment
Introduction
• Two issues:
– First:
(Treatment B) all genes should be aligned together.
(Treatment C) some genes need to be warped
separately.
– Second:
• The best alignment does not account for the complete
extent of both time series.
• Allow a type of local alignment in which the end of
one series is unaligned.
• This is called shorting the alignment.
Introduction
• Multi-segment alignment method :
Shorting : The alignment path that represents shorting ends in the top row or
the right column of the alignment space diagram, but not in the top-right cell.
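The end-point condition for shorting can be written as a small check. This is only an illustrative sketch: the function name `is_shorted` and the zero-indexed (row, column) cell convention are assumptions, not part of the original slides.

```python
def is_shorted(end_row, end_col, n_rows, n_cols):
    """Return True if a path ending at (end_row, end_col) is a shorted alignment.

    Assumption: cells are zero-indexed, so the top-right cell of the
    alignment space is (n_rows - 1, n_cols - 1).
    """
    on_top_row = end_row == n_rows - 1
    on_right_col = end_col == n_cols - 1
    # Shorted: the path reaches the end of exactly one of the two series.
    return (on_top_row or on_right_col) and not (on_top_row and on_right_col)
```

A path that ends in the top-right cell uses both series in full, so it is an ordinary global alignment rather than a shorted one.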
Introduction
• To solve “all genes are assumed to be aligned in lockstep
with one another”
– Calculate clustered alignments
– Find clusters of genes such that genes within a cluster share a
common alignment
– Each cluster is aligned independently of the others
– Similar to k-means
• Alternates between assigning genes to clusters and recomputing the
alignment for each cluster using the genes assigned to it
• To solve “alignment for the complete extent of both time
series”
– Multi-segment alignment
– Shorting
Method – SCOW (Shorting COW)
• COW (Nielsen et al., 1998)
– A dynamic programming algorithm designed to find an
optimal alignment between two series with multiple
channels of information (such as genes).
– Briefly, it aligns and scores two given time series based on
their similarity.
– The two series are denoted q (the query series) and d (the database series).
– The series are partitioned into m segments, in which the ith segments of the two series correspond to each other.
– The score of a given alignment is the sum of correlations
between corresponding segments.
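The scoring rule above (sum of correlations between corresponding segments) can be sketched for a single channel as follows. The names `resample` and `alignment_score`, the inclusive knot convention, and the use of linear interpolation to bring both segments to the same length are assumptions made for illustration; the slides do not specify them.

```python
import numpy as np

def resample(segment, length):
    """Linearly interpolate a segment to the given number of points."""
    old = np.linspace(0.0, 1.0, num=len(segment))
    new = np.linspace(0.0, 1.0, num=length)
    return np.interp(new, old, segment)

def alignment_score(q, d, q_knots, d_knots):
    """Sum of Pearson correlations between corresponding segments.

    q_knots and d_knots list the segment endpoints (inclusive indices),
    so m segments need m + 1 knots in each series.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    score = 0.0
    for i in range(1, len(q_knots)):
        q_seg = q[q_knots[i - 1]: q_knots[i] + 1]
        d_seg = d[d_knots[i - 1]: d_knots[i] + 1]
        d_seg = resample(d_seg, len(q_seg))  # warp d's segment onto q's length
        score += np.corrcoef(q_seg, d_seg)[0, 1]
    return score

# Toy example: d is q stretched to twice the length.
q = [0, 1, 2, 1, 0]
d = [0, 0.5, 1, 1.5, 2, 1.5, 1, 0.5, 0]
toy_score = alignment_score(q, d, q_knots=[0, 2, 4], d_knots=[0, 4, 8])
```

With two perfectly matching segments, the toy score reaches the maximum of one correlation unit per segment.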
Method – SCOW
COW searches for good segment boundaries in only a limited area of the alignment space.
The segments are assumed to
be of constant length and are
usually evenly spaced in q.
The vector K contains the
coordinates of the knots
(segment endpoints) in q.
The knot positions are variable in d.
Method – SCOW
– The zero-indexed matrix Γ is of dimensions (m+1) by
(|d|+1).
– The element Γ_{k,x} contains the score of the best alignment
of d from zero to x and q from zero to the kth knot.
– The score of each pair of segments is their Pearson correlation.
– The predecessor function lists valid starting
locations in d for segments ending at x.
– q(a,b): subseries of q from a to b;
d(a,b) is defined likewise.
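A minimal sketch of how such a matrix could be filled by dynamic programming is given below, for a single channel. It is not the authors' implementation. The predecessor rule used here (a segment in d may be at most `slack` points longer or shorter than its counterpart in q), the helper names, and the choice to index columns by time-point position are all assumptions made for illustration.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two equal-length arrays (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def resample(segment, length):
    """Linearly interpolate a segment to the given number of points."""
    old = np.linspace(0.0, 1.0, num=len(segment))
    new = np.linspace(0.0, 1.0, num=length)
    return np.interp(new, old, segment)

def cow_fill(q, d, knots, slack):
    """Fill gamma[k, x]: best score aligning q up to knot k with d up to index x.

    knots: fixed segment endpoints in q (inclusive indices), length m + 1.
    slack: assumed limit on how much a segment may stretch or shrink in d.
    """
    q = np.asarray(q, dtype=float)
    d = np.asarray(d, dtype=float)
    m = len(knots) - 1
    gamma = np.full((m + 1, len(d)), -np.inf)  # one column per time point of d
    gamma[0, 0] = 0.0
    for k in range(1, m + 1):
        q_seg = q[knots[k - 1]: knots[k] + 1]
        seg_len = knots[k] - knots[k - 1]
        for x in range(1, len(d)):
            # Assumed predecessor function: start points within the slack window.
            for length in range(max(1, seg_len - slack), seg_len + slack + 1):
                x_prev = x - length
                if x_prev < 0 or np.isinf(gamma[k - 1, x_prev]):
                    continue
                d_seg = resample(d[x_prev: x + 1], len(q_seg))
                candidate = gamma[k - 1, x_prev] + pearson(q_seg, d_seg)
                gamma[k, x] = max(gamma[k, x], candidate)
    return gamma

q = [0, 1, 2, 1, 0]
d = [0, 0.5, 1, 1.5, 2, 1.5, 1, 0.5, 0]
gamma = cow_fill(q, d, knots=[0, 2, 4], slack=2)
```

In this toy case the entry for the last knot and the last point of d holds the score of the full alignment. Recovering the knot positions in d would additionally require storing back-pointers, which the sketch omits.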
Method – SCOW
– The best score
– a one-channel time series : the expression profile
of a single gene
a multi-channel time series : the expression profile
of a set of genes
The only difference between these two cases is in
how the correlations are calculated.
– COW is apt to align segments which differ greatly
in magnitude.
Method – SCOW
• SCOW
– Search for optimal knots in both dimensions
First step: search independently
in both dimensions.
Second step: SCOW alternates
horizontal and vertical movement
of each knot until it converges.
Method – SCOW
(Figure: knot placement after the first step and after the second step.)
Method – SCOW
– The matrix Γ^q is calculated when the algorithm searches
for knots with respect to q and holds them constant with
respect to d, while Γ^d is calculated during the opposite
case.
– The predecessor function: a cone-shaped search space
Method – SCOW
– Score function:
• Includes terms that incur penalties for segments that involve
stretching and significant differences in amplitude.
The stretching s_i is defined as the ratio of lengths
between q_i and d_i, and a_i is the amplitude ratio between
the two as determined by a weighted least-squares fitting
procedure.
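The two per-segment quantities can be illustrated with a short sketch. Several details are assumptions, since the slides do not give them: the stretch is taken as the length of q_i divided by the length of d_i, the amplitude fit uses a no-intercept model q_i ≈ a_i · d_i on segments already resampled to equal length, and the weights default to uniform. How s_i and a_i enter the penalty terms is not shown here because the slides omit the formula.

```python
import numpy as np

def stretch_ratio(q_len, d_len):
    """Ratio of segment lengths (assumed direction: q over d)."""
    return q_len / d_len

def amplitude_ratio(q_seg, d_seg, weights=None):
    """Weighted least-squares estimate of a in q_seg ≈ a * d_seg (no intercept).

    Both segments must already have the same number of points.
    """
    q_seg = np.asarray(q_seg, dtype=float)
    d_seg = np.asarray(d_seg, dtype=float)
    w = np.ones_like(q_seg) if weights is None else np.asarray(weights, dtype=float)
    # Closed-form minimiser of sum(w * (q - a * d) ** 2).
    return float(np.sum(w * q_seg * d_seg) / np.sum(w * d_seg * d_seg))
```

For example, a query segment that is exactly twice the database segment yields an amplitude ratio of 2, and a 4-point query segment matched to an 8-point database segment yields a stretch of 0.5.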
Method – Clustered alignment
• Find sets of genes that would have very similar
alignments if they were aligned independently.
• A variant of traditional k-means clustering
– Identifying clusters in which the genes have
similar warpings
– The genes in one of our clusters may have very
different expression profiles.
Method – Clustered alignment
The first step is to assign the initial alignment centroids by selecting a
representative set of gene alignments as the centroids.
Subroutine Align returns the best
alignment between two series based on a
given set of genes.
ScoreGene returns the score
of two series when aligned
using a given alignment and
a specified gene.
Record the best score so far for that gene
using one of the current centroids.
Method – Clustered alignment
It alternates between assigning genes to clusters and recomputing the alignment
for each cluster using the genes assigned to it.
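The alternation described above can be sketched as a k-means-style loop. This is a schematic outline, not the authors' pseudocode: `align` and `score_gene` stand in for the Align and ScoreGene subroutines and are passed in as hypothetical callbacks, the initial centroids are assumed to be given, and the toy example replaces real alignments with a single number per gene.

```python
def clustered_alignments(genes, centroids, align, score_gene, max_iters=20):
    """Alternate between assigning genes to clusters and re-aligning clusters.

    align(members)         -> best alignment for a list of genes (hypothetical)
    score_gene(alignment, gene) -> score of one gene under an alignment (hypothetical)
    """
    centroids = list(centroids)
    assignment = {}
    for _ in range(max_iters):
        # Assignment step: each gene goes to the centroid alignment that scores it best.
        new_assignment = {
            g: max(range(len(centroids)), key=lambda c: score_gene(centroids[c], g))
            for g in genes
        }
        if new_assignment == assignment:
            break  # converged: no gene changed cluster
        assignment = new_assignment
        # Update step: recompute each cluster's alignment from its member genes.
        for c in range(len(centroids)):
            members = [g for g in genes if assignment[g] == c]
            if members:
                centroids[c] = align(members)
    return assignment, centroids

# Toy stand-in: each gene's "alignment" is a single time shift.
shift = {"a": 0.0, "b": 1.0, "c": 10.0, "d": 11.0}
toy_align = lambda members: sum(shift[g] for g in members) / len(members)
toy_score = lambda alignment, g: -abs(alignment - shift[g])
toy_assignment, toy_centroids = clustered_alignments(
    list(shift), [0.0, 10.0], toy_align, toy_score
)
```

In the toy run, genes with similar shifts end up in the same cluster even though nothing is assumed about their expression profiles, which mirrors the point that clusters group genes by warping rather than by profile.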
Results and Discussion
• SCOW experiments
– We construct queries for which we know the correct
matching database treatments and their correct
alignments.
– The data we use come from the EDGE toxicology
database (http://edge.oncology.wisc.edu)
– The dataset consists of 216 unique observations of
microarray data, each of which represents the
values for 1600 different genes.
– Time points range from 6 h up to 96 h.
– The data span 11 different treatments.
Results and Discussion
– Assemble 10 queries for each treatment by
randomly sub-sampling time series in our dataset
– We measure two kinds of accuracy:
• Treatment accuracy : identify the treatment from which
each query series was extracted
• Alignment accuracy : align the query points to their
actual time points in the treatment.
Results and Discussion
The top line : treatment accuracy with different orders of splines
The middle line : alignment accuracy by adding the criterion that the average time
error in the mapping is less than or equal to 24 h
The bottom line : alignment accuracy where this tolerance is decreased to 12 h.
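The alignment-accuracy criterion can be made concrete with a short sketch. The function names and the reading of "average time error" as the mean absolute difference, in hours, between mapped and actual time points are assumptions.

```python
def mean_time_error(predicted_hours, true_hours):
    """Mean absolute difference (in hours) between mapped and actual time points."""
    errors = [abs(p - t) for p, t in zip(predicted_hours, true_hours)]
    return sum(errors) / len(errors)

def alignment_correct(predicted_treatment, true_treatment,
                      predicted_hours, true_hours, tolerance_hours):
    """A query counts as correct only if the treatment matches AND the
    average time error is within the tolerance (24 h or 12 h in the plots)."""
    return (predicted_treatment == true_treatment
            and mean_time_error(predicted_hours, true_hours) <= tolerance_hours)
```

Because the time-error criterion is added on top of correct treatment identification, alignment accuracy at either tolerance can never exceed treatment accuracy, which is consistent with the ordering of the three lines described above.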
Results and Discussion
– Conclusion :
• Multi-segment alignments computed by SCOW, COW
and the generative multi-segment method are superior to the
alignments determined by ordinary dynamic time
warping and the linear alignment method.
• SCOW finds more accurate alignments than the other
two multi-segment algorithms.
Results and Discussion
• Clustered alignment experiments
Conclusion
• New methods are presented that advance the alignment of
gene-expression time series in two ways:
– Computing clustered alignments
– A new multi-segment alignment method, called
SCOW