A model selection approach to discover age-dependent gene expression patterns using quantile regression models

Joshua W.K. Ho (1,2)
(1) School of IT, The University of Sydney
(2) NICTA, Australian Technology Park, NSW
[email protected]
Joint work with Maurizio Stefani, Cristobal dos Remedios and Michael Charleston

Ageing Microarray Dataset
[Figure: data matrix with labels Arrays, Gene and Age]

InCoB 2009

Discover age-dependent patterns
How to find differential expression (DE) patterns?

Standard approach
• Linear regression using the method of least squares
• Second-order linear regression (quadratic regression)

Quantile regression
Solve a different optimization problem: [equation not recovered]

Check function
[equation not recovered]
We obtain a median regression line when the quantile level is 0.5.

A quantile regression line
How do we know whether the slope of the quantile regression line is 0?

Solution: a model selection approach

Quantile regression models
• Constant Model (C)
• Linear Model (L)
• Piecewise Model (PL)
[model equations not recovered]
Model complexity increases from C to L to PL.

Model selection basics
We measure the goodness-of-fit of a model by its Residual Sum of Absolute Differences (RSAD):
• The smaller the RSAD, the better the fit.
• A more complex model never yields a higher RSAD than a simpler model nested within it.

Discovering DE patterns

Simulation results – ROC analysis

What about this type of pattern?
A change in the variability of gene expression?

Differential Variability (DV) – a missing pattern in microarray analysis
• DV analysis is useful in human disease studies
• DV is related to differential coexpression
Ho et al. (2008) Bioinformatics (ISMB’08 issue), 24, i390-i398

Solution – DV analysis

DV and Non-DV QR models
Upper quantile: [equation not recovered], where each f(.) is a piecewise linear function
Lower quantile: [equation not recovered], where each f(.) is a piecewise linear function
Differences between the models:
• Non-DV model: the slopes are identical in f_upper and f_lower
• DV model: the slopes are all independent
A gene is said to be DV if [criterion not recovered]

Simulation results – ROC analysis

Analysis of two brain ageing datasets
Lu dataset
• 12625 genes
• 30 individuals, aged 26-106
Colantuoni dataset
• 31 schizophrenia susceptibility genes
• 72 individuals, aged 18-67

Selection of alpha based on FDR
FDR estimation based on randomization of the dataset

DV – Colantuoni dataset

DV genes – Lu dataset (1)
Observation: Different individuals age at different rates w.r.t. gene expression changes.

DV genes – Lu dataset (2)

Extension to multi-class problems
Using RSAD as the goodness-of-fit measure, we can extend our approach to discover DE and DV genes in multi-class datasets.

Summary
• Novel application of quantile regression models to identify DE and DV patterns in ageing microarray datasets
• Our approach is more robust than the standard least-squares linear regression approach
• Application to human brain ageing

Acknowledgement
Supervisors
• Dr. Michael Charleston (School of IT, USyd)
• Prof. Cristobal dos Remedios (School of Med Sci, USyd)
Collaborator
• Maurizio Stefani (USyd)
Funding
• Travel fellowship from InCoB’09
• The University of Sydney
• NICTA
http://www.it.usyd.edu.au/~joshua
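
Supplementary note: quantile regression in symbols
For reference, the standard textbook form of the quantile regression problem and its check function is sketched below. The notation (tau for the quantile level, rho_tau for the check function, beta for the coefficients, x_i and y_i for the covariates and expression values) is an assumption and may differ from the symbols used on the slides.

```latex
% Quantile regression estimate at quantile level tau (0 < tau < 1)
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i^{\top}\beta\bigr)

% Check function: positive residuals are weighted by tau, negative ones by (1 - tau)
\rho_\tau(u) = u\,\bigl(\tau - \mathbf{1}[u < 0]\bigr)

% Special case tau = 0.5: the loss is half the absolute residual (median regression)
\rho_{0.5}(u) = \tfrac{1}{2}\,\lvert u \rvert
```

At tau = 0.5 the objective is proportional to the sum of absolute residuals, which is why the median regression line is less sensitive to outlying individuals than a least-squares line.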
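
Supplementary note: comparing the Constant and Linear models
The model selection idea on the slides can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (check_loss, total_loss, fit_constant, fit_linear) and the toy data are invented for illustration, the summed check loss stands in for the slides' RSAD (an assumption), and the fit uses a brute-force search over lines through pairs of data points rather than the linear-programming solvers used in practice.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) function: tau*u if u >= 0, (tau - 1)*u if u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(xs, ys, tau, intercept, slope):
    """Summed check loss of the line intercept + slope*x (assumed fit measure)."""
    return sum(check_loss(y - (intercept + slope * x), tau)
               for x, y in zip(xs, ys))


def fit_constant(xs, ys, tau):
    """Constant model (C): some observed y value always attains the minimum."""
    level = min(ys, key=lambda c: total_loss(xs, ys, tau, c, 0.0))
    return level, 0.0


def fit_linear(xs, ys, tau):
    """Linear model (L), by brute force over lines through two data points.

    When the x values are not all equal, an optimal quantile regression
    line exists among these candidates.
    """
    best = fit_constant(xs, ys, tau)  # slope-0 fallback: L never fits worse than C
    best_loss = total_loss(xs, ys, tau, *best)
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # vertical pair defines no finite slope
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = total_loss(xs, ys, tau, intercept, slope)
        if loss < best_loss:
            best, best_loss = (intercept, slope), loss
    return best


# Toy data (invented): expression rises with age, plus one outlying individual.
ages = [20, 30, 40, 50, 60, 70, 80]
expr = [1.0, 1.4, 2.1, 2.4, 3.1, 3.4, 9.0]

tau = 0.5  # median regression
c_fit = fit_constant(ages, expr, tau)
l_fit = fit_linear(ages, expr, tau)
c_loss = total_loss(ages, expr, tau, *c_fit)
l_loss = total_loss(ages, expr, tau, *l_fit)
print(f"constant: level={c_fit[0]:.2f}, loss={c_loss:.2f}")
print(f"linear:   slope={l_fit[1]:.3f}, loss={l_loss:.2f}")
```

Because the Linear model contains the Constant model as a special case, its loss can never be higher, so raw goodness-of-fit alone cannot decide between them. Some threshold on the improvement is needed, which is presumably the role of the alpha that the slides select via FDR.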