International Biometric Society

COVARIATE SELECTION IN A MULTIVARIATE COX REGRESSION USING A GENETIC ALGORITHM

Dr. Katrin Kupas (1), Simon Fink (2), Dr. Hendrik Schmidt (3)

1. Statistical Consultant, Frankfurt, Germany
2. Ludwig-Maximilians-Universität, München, Germany
3. Boehringer Ingelheim Pharma GmbH & Co KG, Biberach, Germany

In clinical trials, time to event is often used as a robust and patient-relevant endpoint. Identifying meaningful prognostic factors for the time to event is important for predicting the progress of the disease and the outcome of the therapy, and can contribute to the recognition of patient subgroups. Hence, this technique will also play a more prominent role in health economic evaluations and personalized medicine.

Usually, a number of patient characteristics that may contribute to the patient's risk of experiencing the event are collected at baseline. These characteristics are used to identify potential prognostic factors and predictors by adding them as covariates to a multivariate Cox regression model according to a selection algorithm.

The genetic algorithm makes it possible to search a very large space of candidate models for the best one. Unlike the selection methods built into SAS® PROC PHREG, such as backward and stepwise selection, this algorithm does not include or exclude variables based on a chi-square statistic with a p-value threshold; instead, it optimizes Akaike's Information Criterion (AIC) of the Cox model. In a first step, a number of randomly combined subsets of covariates are analysed. The covariate subsets of the best Cox models with respect to AIC become the parents for the next generation of covariate subsets. Mutation and crossover introduce random variations of the covariate subsets into the next generation, enlarging the search space of possible models and helping to avoid local minima of the AIC.
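The selection loop described above can be illustrated with a minimal Python sketch. It is not the authors' SAS implementation. The function `genetic_select`, its parameters, the "no improvement for `patience` generations" stopping rule, and the reading of the mutation rate as a per-covariate flip probability are all assumptions made for illustration. Fitting the Cox model is left outside the sketch: `aic_of` is a hypothetical callable that would return the AIC of a Cox model fitted on a given covariate subset, and `toy_aic`, with its made-up covariate names, only mimics the shape of an AIC (a fit term plus a penalty of 2 per covariate).

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence

Subset = FrozenSet[str]


def genetic_select(
    covariates: Sequence[str],
    aic_of: Callable[[Subset], float],  # hypothetical: AIC of a Cox fit on the subset
    pop_size: int = 20,
    n_parents: int = 6,
    mutation_rate: float = 0.2,   # assumed: per-covariate flip probability
    crossover_rate: float = 0.2,  # assumed: probability a child mixes two parents
    patience: int = 10,           # assumed stopping rule: generations without improvement
    max_generations: int = 200,
    seed: int = 0,
) -> Subset:
    rng = random.Random(seed)
    cache: Dict[Subset, float] = {}

    def score(subset: Subset) -> float:
        # Model fits are expensive, so each subset is scored only once.
        if subset not in cache:
            cache[subset] = aic_of(subset)
        return cache[subset]

    def crossover(a: Subset, b: Subset) -> Subset:
        # Uniform crossover: each covariate's in/out status comes from one parent.
        return frozenset(c for c in covariates if c in (a if rng.random() < 0.5 else b))

    def mutate(subset: Subset) -> Subset:
        # Flip the inclusion of each covariate with probability mutation_rate.
        return frozenset(
            c for c in covariates if (c in subset) != (rng.random() < mutation_rate)
        )

    # First generation: randomly combined covariate subsets.
    population: List[Subset] = [
        frozenset(c for c in covariates if rng.random() < 0.5) for _ in range(pop_size)
    ]
    best = min(population, key=score)
    stale = 0
    for _ in range(max_generations):
        # The subsets with the lowest AIC become the parents (duplicates dropped).
        parents = sorted(dict.fromkeys(population), key=score)[:n_parents]
        children = list(parents)  # keep the parents so the best model is never lost
        while len(children) < pop_size:
            child = rng.choice(parents)
            if rng.random() < crossover_rate:
                child = crossover(child, rng.choice(parents))
            children.append(mutate(child))
        population = children
        generation_best = min(population, key=score)
        if score(generation_best) < score(best):
            best, stale = generation_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best


# Hypothetical stand-in for a Cox model AIC, using made-up covariate names.
COVARIATES = ["age", "sex", "stage", "ecog", "bmi", "smoker", "region", "site"]
INFORMATIVE = frozenset({"age", "stage", "ecog"})


def toy_aic(subset: Subset) -> float:
    fit_term = 100.0 - 10.0 * len(subset & INFORMATIVE)  # plays the role of -2 log L
    return fit_term + 2.0 * len(subset)                   # penalty of 2 per covariate


if __name__ == "__main__":
    selected = genetic_select(COVARIATES, toy_aic, seed=1)
    print(sorted(selected), toy_aic(selected))
```

In real use, `toy_aic` would be replaced by a function that fits the Cox model on the chosen covariates and returns its AIC. Keeping the parents in each new generation is one design choice among several; it guarantees that the best AIC found so far never gets worse from one generation to the next.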
A predefined stopping criterion ends the genetic algorithm once the parsimonious model with regard to the AIC has been found. Owing to its flexibility, the genetic selection algorithm is a powerful way to search the space of possible Cox models. However, convergence within an acceptable time frame is not guaranteed if the random selection rate is too high.

Several blinded data sets from clinical trials, with different numbers of possible covariates, were modelled using multivariate Cox regression with the genetic selection algorithm. In most cases the genetic algorithm converged to a parsimonious model with respect to the AIC value. The optimal mutation and crossover rate was found to be between 15% and 30%. All results were compared with classical covariate selection methods for validation. For each data set, several runs of the algorithm were performed to check its robustness in finding the parsimonious model. Especially in studies with many potential covariates and borderline significance of the chi-square statistics, the genetic algorithm leads to a robust selection of covariates for the optimal model in the sense of AIC.

References:
1. Tsai JS. Optimal Model Selection by a Genetic Algorithm Using SAS®. Proceedings of the Western Users of SAS® Software Conference, 2009, Cary, NC.
2. Wiegand RE. Performance of using multiple stepwise algorithms for variable selection. Statistics in Medicine 2010; 29:1647-1659.

International Biometric Conference, Florence, ITALY, 6-11 July 2014