Clustered alignments of gene-expression time series data
Adam A. Smith, Aaron Vollrath, Christopher A. Bradfield and Mark Craven
Department of Biostatistics & Medical Informatics, Department of Computer Sciences and Department of Oncology, University of Wisconsin, Madison, USA
Bioinformatics, Vol. 25, pages i119-i127, 2009

Outline
• Introduction
• Method
  – SCOW
  – Clustered alignments
• Results and Discussion
• Conclusion

Introduction
• Characterizing and comparing temporal gene-expression responses is an important computational task for answering a variety of questions in biological studies.
• One application: toxicogenomics, which characterizes the potential toxicity of chemicals.
• Answering similarity queries: similarity is assessed by determining the temporal correspondence between the query and each treatment.
• Two issues:
  – First: in some cases (Treatment B) all genes should be aligned together; in others (Treatment C) some genes need to be warped separately.
  – Second: the best alignment does not always account for the complete extent of both time series. A type of local alignment is allowed in which the end of one series is left unaligned. This is called shorting the alignment.
• Multi-segment alignment method with shorting: an alignment path that represents shorting ends in the top row or the right column of the alignment-space diagram, but not in the top-right cell.
• To remove the assumption that "all genes are aligned in lockstep with one another":
  – Calculate clustered alignments.
  – Find clusters of genes such that genes within a cluster share a common alignment.
  – Each cluster is aligned independently of the others.
  – Similar to k-means: the method alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.
• To remove the assumption that the alignment covers "the complete extent of both time series":
  – Multi-segment alignment
  – Shorting

Method – SCOW (Shorting COW)
• COW (Nielsen et al., 1998)
  – A dynamic programming algorithm designed to find an optimal alignment between two series with multiple channels of information (such as genes).
  – Briefly, it aligns and scores two given time series based on their similarity.
  – The two series are denoted q (the query series) and d (the database series).
  – The series are partitioned into m segments, in which the ith segments of the two series correspond to each other.
  – The score of a given alignment is the sum of the correlations between corresponding segments.
  – COW searches for good segment boundaries in only a limited area of the alignment space.
  – The segments are assumed to be of constant length, and usually evenly spaced, in q; they are of variable length in d.
  – The vector K contains the coordinates of the knots (segment endpoints) in q.
  – The scores are stored in a zero-indexed matrix of dimensions m+1 by |d|+1.
  – The element (k, x) contains the score of the best alignment of d from zero to x and q from zero to knot k.
  – Each pair of corresponding segments is scored with the Pearson correlation.
  – The predecessor function lists the valid starting locations in d for segments ending at x.
  – q(a,b) denotes the subseries of q from a to b; d(a,b) is defined likewise.
  – The best score for the complete alignment is the element at (m, |d|).
  – A one-channel time series is the expression profile of a single gene; a multi-channel time series is the expression profile of a set of genes. The only difference between these two cases is in how the correlations are calculated.
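The dynamic program described above can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the published COW implementation: the names `cow_align`, `segment_score`, `resample` and `min_len` are hypothetical, the multi-channel score is assumed to be the mean of the per-gene Pearson correlations, and every earlier point in d is accepted as a predecessor, whereas COW restricts the search to a limited area of the alignment space.

```python
import numpy as np


def pearson(u, v):
    """Pearson correlation of two equal-length 1-D arrays (0.0 if either is constant)."""
    u = u - u.mean()
    v = v - v.mean()
    denom = np.sqrt((u * u).sum() * (v * v).sum())
    return float((u * v).sum() / denom) if denom > 0 else 0.0


def resample(seg, n):
    """Linearly interpolate each channel (row) of seg onto n evenly spaced points."""
    old = np.linspace(0.0, 1.0, seg.shape[1])
    new = np.linspace(0.0, 1.0, n)
    return np.vstack([np.interp(new, old, row) for row in seg])


def segment_score(q_seg, d_seg):
    """Assumed multi-channel score: mean of the per-gene Pearson correlations."""
    d_res = resample(d_seg, q_seg.shape[1])
    return float(np.mean([pearson(a, b) for a, b in zip(q_seg, d_res)]))


def cow_align(q, d, m, min_len=2):
    """Align q and d (arrays of shape genes x time points) using m segments.

    Knots are fixed and evenly spaced in q and free in d.
    min_len is the minimum number of intervals per segment in d (hypothetical parameter).
    Returns (best score, list of m+1 knot positions in d).
    """
    nq, nd = q.shape[1] - 1, d.shape[1] - 1
    K = np.round(np.linspace(0, nq, m + 1)).astype(int)  # knots in q
    F = np.full((m + 1, nd + 1), -np.inf)                # score matrix, (m+1) x (|d|+1)
    back = np.zeros((m + 1, nd + 1), dtype=int)          # best predecessor for each cell
    F[0, 0] = 0.0
    for k in range(1, m + 1):
        q_seg = q[:, K[k - 1]:K[k] + 1]
        for x in range(nd + 1):
            # Simplified predecessor function: any earlier point at least min_len back.
            for xp in range(0, x - min_len + 1):
                if np.isinf(F[k - 1, xp]):
                    continue
                s = F[k - 1, xp] + segment_score(q_seg, d[:, xp:x + 1])
                if s > F[k, x]:
                    F[k, x], back[k, x] = s, xp
    # Trace the knots in d back from the final cell (m, |d|).
    knots = [nd]
    for k in range(m, 0, -1):
        knots.append(int(back[k, knots[-1]]))
    return float(F[m, nd]), knots[::-1]
```

Passing a single-row array gives the one-channel case and a multi-row array gives the multi-channel case; only `segment_score` changes behaviour between the two, which mirrors the remark that the cases differ only in how the correlations are calculated.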
  – COW is apt to align segments that differ greatly in magnitude.
• SCOW
  – Searches for optimal knots in both dimensions.
  – First step: search independently in both dimensions.
  – Second step: SCOW alternates horizontal and vertical movement of each knot until it converges.
  – The q matrix is calculated when the algorithm searches for knots with respect to q and holds them constant with respect to d, while the d matrix is calculated in the opposite case.
  – The predecessor function defines a cone-shaped search space.
  – Score function: includes terms that incur penalties for segments that involve stretching or significant differences in amplitude. The stretching s_i is defined as the ratio of lengths between q_i and d_i, and a_i is the amplitude ratio between the two, as determined by a weighted least-squares fitting procedure.

Method – Clustered alignment
• Find sets of genes that would have very similar alignments if they were aligned independently.
• A variant of traditional k-means clustering:
  – It identifies clusters in which the genes have similar warpings.
  – The genes in one of these clusters may have very different expression profiles.
• The first step is to assign the initial alignment centroids by selecting a representative set of gene alignments.
• Subroutine Align returns the best alignment between two series based on a given set of genes.
• ScoreGene returns the score of two series when aligned using a given alignment and a specified gene.
• For each gene, the best score obtained so far with one of the current centroids is recorded.
• The algorithm then alternates between assigning genes to clusters and recomputing the alignment for each cluster using the genes assigned to it.

Results and Discussion
• SCOW experiments
  – We construct queries for which we know the correct matching database treatments and their correct alignments.
  – The data come from the EDGE toxicology database (http://edge.oncology.wisc.edu).
  – The dataset consists of 216 unique observations of microarray data, each of which represents the values for 1600 different genes.
  – Times range from 6 h up to 96 h.
  – The data span 11 different treatments.
  – We assemble 10 queries for each treatment by randomly sub-sampling the time series in our dataset.
  – We measure two kinds of accuracy:
    • Treatment accuracy: identifying the treatment from which each query series was extracted.
    • Alignment accuracy: aligning the query points to their actual time points in the treatment.
  – The top line shows treatment accuracy with different orders of splines. The middle line shows alignment accuracy, which adds the criterion that the average time error in the mapping is less than or equal to 24 h. The bottom line shows alignment accuracy when this tolerance is decreased to 12 h.
  – Conclusions:
    • The multi-segment alignments computed by SCOW, COW and the generative multi-segment method are superior to the alignments determined by ordinary dynamic time warping and the linear alignment method.
    • SCOW finds more accurate alignments than the other two multi-segment algorithms.
• Clustered alignment experiments

Conclusion
• We present a new method that advances on earlier work in two ways:
  – It computes clustered alignments.
  – It introduces a new multi-segment alignment method, called SCOW.
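
The clustered-alignment procedure summarized in the Method section (choose initial centroids, then alternate between assigning genes and re-aligning each cluster) can be sketched in Python as follows. This is a toy sketch under loud assumptions, not the authors' implementation: an "alignment" is reduced to a single integer shift of q against d rather than a SCOW multi-segment warping, `align` and `score_gene` are simplified stand-ins for the Align and ScoreGene subroutines, and the centroid initialization (start from the alignment of all genes, then repeatedly add the alignment of the gene with the lowest best score so far) is one reading of the slides.

```python
import numpy as np


def pearson(u, v):
    """Pearson correlation of two equal-length 1-D arrays (0.0 if either is constant)."""
    u = u - u.mean()
    v = v - v.mean()
    denom = np.sqrt((u * u).sum() * (v * v).sum())
    return float((u * v).sum() / denom) if denom > 0 else 0.0


def score_gene(q, d, shift, g):
    """Toy ScoreGene: correlation of gene g in q with d offset by `shift` points."""
    n = q.shape[1]
    return pearson(q[g], d[g, shift:shift + n])


def align(q, d, genes, shifts):
    """Toy Align: the single shift that maximizes the summed score over `genes`."""
    return max(shifts, key=lambda s: sum(score_gene(q, d, s, g) for g in genes))


def clustered_alignment(q, d, k, shifts, max_iter=20):
    """Return (centroid alignments, cluster index per gene) for k clusters."""
    n_genes = q.shape[0]
    all_genes = list(range(n_genes))

    # Initial centroids (assumed scheme): begin with the alignment of all genes,
    # then add the alignment of whichever gene is served worst so far.
    centroids = [align(q, d, all_genes, shifts)]
    best = [score_gene(q, d, centroids[0], g) for g in all_genes]
    while len(centroids) < k:
        worst = int(np.argmin(best))
        c = align(q, d, [worst], shifts)
        centroids.append(c)
        best = [max(best[g], score_gene(q, d, c, g)) for g in all_genes]

    assign = None
    for _ in range(max_iter):
        # Assignment step: each gene joins the centroid that scores it best.
        new_assign = [
            max(range(k), key=lambda c: score_gene(q, d, centroids[c], g))
            for g in all_genes
        ]
        if new_assign == assign:
            break  # converged: no gene changed cluster
        assign = new_assign
        # Update step: re-align each non-empty cluster using only its own genes.
        for c in range(k):
            members = [g for g in all_genes if assign[g] == c]
            if members:
                centroids[c] = align(q, d, members, shifts)
    return centroids, assign
```

Because genes are grouped by the alignment that suits them rather than by the shape of their profiles, two genes with very different expression profiles can end up in the same cluster, as noted in the Method section.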