Scientific Methods for Health Sciences (SMHS): Special Topics (HS853)
SOCR Team
http://www.socr.umich.edu/people/dinov/2016/Fall/HS853/

Contents
Methods for Studying Heterogeneity of Treatment Effects, Case-Studies of Comparative Effectiveness Research
Methods for Studying Heterogeneity of Treatment Effects
Examine the Fifth Dutch growth study (2009) fdgs
Classification and Regression Trees (CART)
Random Forests
Latent growth and growth mixture modeling (LGM/GMM)
Results interpretation:
Meta-analysis
Series of "N of 1" trials
Quantile Treatment Effect (QTE)
Nonparametric Regression Methods
Predictive risk models
HIV Example
Comparative Effectiveness Research: Case-Studies
Case-Study 1: The Cetuximab Study
Case-Study 2: The Rosiglitazone Study
Case-Study 3: The Nurses' Health Study

Methods for Studying Heterogeneity of Treatment Effects, Case-Studies of Comparative Effectiveness Research

#install.packages("rpart")
#install.packages("rpart.plot")
library("rpart")
library(rpart.plot)

Methods for Studying Heterogeneity of Treatment Effects

Examine the Fifth Dutch growth study (2009) fdgs

Is it true that "the world's tallest nation has stopped growing taller: the height of Dutch children from 1955 to 2009"?

#install.packages("mice")
library("mice")
## Loading required package: Rcpp
## mice 2.25 2015-11-09
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# ?fdgs # to see more info about the FDGS
head(fdgs)
##       id  reg      age  sex   hgt  wgt  hgt.z  wgt.z
## 1 100001 West 13.09514  boy 175.5 75.0  1.751  2.410
## 2 100003 West 13.81793  boy 148.4 40.0 -2.292 -1.494
## 3 100004 West 13.97125  boy 159.9 42.0 -1.000 -1.315
## 4 100005 West 13.98220 girl 159.7 46.5 -0.743 -0.783
## 5 100006 West 13.52225 girl 160.3 47.8 -0.414 -0.355
## 6 100018 East 10.21492  boy 157.8 39.7  2.025  0.823
summary(fdgs)
##        id              reg            age                sex
##  Min.   :100001   North: 732   Min.   : 0.008214   boy :4829
##  1st Qu.:106353   East :2528   1st Qu.: 1.618754   girl:5201
##  Median :203855   South:2931   Median : 8.084873
##  Mean   :180091   West :2578   Mean   : 8.157936
##  3rd Qu.:210591   City :1261   3rd Qu.:13.547570
##  Max.   :401955                Max.   :21.993155
##
##       hgt             wgt              hgt.z               wgt.z
##  Min.   : 46.0   Min.   :  2.585   Min.   :-4.470000   Min.   :-5.04000
##  1st Qu.: 83.8   1st Qu.: 11.600   1st Qu.:-0.678000   1st Qu.:-0.62475
##  Median :131.5   Median : 27.500   Median :-0.019000   Median : 0.02600
##  Mean   :123.9   Mean   : 32.385   Mean   :-0.006054   Mean   : 0.04573
##  3rd Qu.:162.3   3rd Qu.: 51.100   3rd Qu.: 0.677000   3rd Qu.: 0.70700
##  Max.   :208.0   Max.   :135.300   Max.   : 3.900000   Max.   : 4.74100
##  NA's   :23      NA's   :20        NA's   :23          NA's   :20

Classification and Regression Trees (CART)

CART models start with an a priori regression model 𝑦 ∼ 𝑥1 + 𝑥2 + ... + 𝑥𝑛 + 𝜖, where 𝑦 is the response variable we are modeling as a function of the predictors {𝑥1, 𝑥2, ..., 𝑥𝑛}, a collection of independent variables.
• If 𝑦 is numeric, the resulting tree will be a regression tree.
• If 𝑦 is categorical, the resulting tree will be a classification tree.

The rpart package allows independent variables of all types for both classification and regression trees. To decide whether an observation with a missing value for the primary split variable should go left or right, rpart uses surrogate splits; if no surrogate is available, the observation is sent to the child node with the largest relative frequency.
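As a minimal sketch of the two points above, the following fits one regression tree and one classification tree to simulated data in which one predictor has missing values. The data frame d, its columns, and the seed are all hypothetical and chosen only for illustration; they are not part of the FDGS example.

```r
library(rpart)

set.seed(1)                              # hypothetical seed, for reproducibility
n <- 200
d <- data.frame(x1 = runif(n), x2 = runif(n))
d$y_num <- 2 * d$x1 + rnorm(n, sd = 0.1)                  # numeric response
d$y_cat <- factor(ifelse(d$x1 > 0.5, "high", "low"))      # categorical response
d$x2[sample(n, 20)] <- NA                # inject missing values in one predictor

# rpart chooses the tree type from the class of the response
fit.reg <- rpart(y_num ~ x1 + x2, data = d)   # numeric y -> regression tree
fit.cls <- rpart(y_cat ~ x1 + x2, data = d)   # factor y  -> classification tree

fit.reg$method       # tree type used for the numeric response
fit.cls$method       # tree type used for the factor response
fit.cls$frame$n[1]   # observations at the root: rows with a missing x2 are kept
```

Rows with a missing predictor stay in the fit because the default na.action of rpart only drops observations whose response (or every predictor) is missing.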
The following libraries are helpful for:
• library(rpart) # CART modeling
• library(randomForest) # Random forest modeling
• library(rpart.plot) # rpart graphics
• library(caret) # model assessment using confusion matrices

Let's use the data frame fdgs to predict Region from Age, Height, and Weight. CP = complexity parameter.

# grow tree
fit.1 <- rpart(reg ~ age + hgt + wgt, method="class", data= fdgs[,-1])
par(mar=c(3,6,3,6))
printcp(fit.1) # display the results
##
## Classification tree:
## rpart(formula = reg ~ age + hgt + wgt, data = fdgs[, -1], method = "class")
##
## Variables actually used in tree construction:
## [1] age
##
## Root node error: 7099/10030 = 0.70778
##
## n= 10030
##
##         CP nsplit rel error  xerror      xstd
## 1 0.029582      0   1.00000 1.00000 0.0064159
## 2 0.021834      1   0.97042 0.97619 0.0065193
## 3 0.010000      2   0.94858 0.95140 0.0066161
plotcp(fit.1) # visualize cross-validation results
rpart.plot(fit.1, extra=103)
prp(fit.1, type=1, extra=103, branch=1)
summary(fit.1) # detailed summary of splits
## Call:
## rpart(formula = reg ~ age + hgt + wgt, data = fdgs[, -1], method = "class")
##   n= 10030
##
##           CP nsplit rel error    xerror        xstd
## 1 0.02958163      0 1.0000000 1.0000000 0.006415919
## 2 0.02183406      1 0.9704184 0.9761938 0.006519283
## 3 0.01000000      2 0.9485843 0.9514016 0.006616142
##
## Variable importance
## age wgt hgt
##  49  26  25
##
## Node number 1: 10030 observations, complexity param=0.02958163
##   predicted class=South expected loss=0.7077767 P(node) =1
##     class counts:   732  2528  2931  2578  1261
##    probabilities: 0.073 0.252 0.292 0.257 0.126
##   left son=2 (4207 obs) right son=3 (5823 obs)
##   Primary splits:
##       age < 5.397673 to the left, improve=61.38357, (0 missing)
##       hgt < 115.15   to the left, improve=53.45438, (23 missing)
##       wgt < 19.55    to the left, improve=51.41483, (20 missing)
##   Surrogate splits:
##       hgt < 115.05 to the left, agree=0.983, adj=0.959, (0 split)
##       wgt < 20.05  to the left, agree=0.973, adj=0.935, (0 split)
##
## Node number 2: 4207 observations
##   predicted class=East expected loss=0.6893273 P(node) =0.4194417
##     class counts:   266  1307  1097  1228   309
##    probabilities: 0.063 0.311 0.261 0.292 0.073
##
## Node number 3: 5823 observations, complexity param=0.02183406
##   predicted class=South expected loss=0.6850421 P(node) =0.5805583
##     class counts:   466  1221  1834  1350   952
##    probabilities: 0.080 0.210 0.315 0.232 0.163
##   left son=6 (1118 obs) right son=7 (4705 obs)
##   Primary splits:
##       age < 17.11294 to the right, improve=82.03326, (0 missing)
##       hgt < 166.85   to the right, improve=52.81994, (4 missing)
##       wgt < 52.95    to the right, improve=51.27184, (13 missing)
##   Surrogate splits:
##       wgt < 63.9   to the right, agree=0.848, adj=0.207, (0 split)
##       hgt < 179.65 to the right, agree=0.840, adj=0.165, (0 split)
##
## Node number 6: 1118 observations
##   predicted class=East expected loss=0.5805009 P(node) =0.1114656
##     class counts:    50   469   314   224    61
##    probabilities: 0.045 0.419 0.281 0.200 0.055
##
## Node number 7: 4705 observations
##   predicted class=South expected loss=0.6769394 P(node) =0.4690927
##     class counts:   416   752  1520  1126   891
##    probabilities: 0.088 0.160 0.323 0.239 0.189
# plot tree
par(oma=c(0,0,2,0))
plot(fit.1, uniform=TRUE, margin=0.3, main="Classification Tree for Region (FDGS Data)")
text(fit.1, use.n=TRUE, all=TRUE, cex=1.0)
# create a better plot of the classification tree
post(fit.1, title = "Classification Tree for Region (FDGS Data)", file = "")

The printcp() call generates a cost complexity parameter table including:
* the complexity parameter (CP), which controls the size of the tree; greater CP values correspond to fewer splits in the tree. The optimal CP value is determined by a 10-fold cross validation. The optimal size of the tree is then the row in the CP table that minimizes the error with the fewest branches,
* the relative model error (1 − relative error ∼ variance explained),
* the error estimated from a 10-fold cross validation (xerror), and
* the standard error of the xerror (xstd).

Pruning the tree

Pruning back the resulting classification tree aims to reduce overfitting. A tree size that minimizes the cross-validation error, the xerror column printed by printcp(), would be preferred. The following command automatically chooses the complexity parameter (CP) associated with the smallest cross-validation error (xerror).

pruned.fit.1 <- prune(fit.1, cp=fit.1$cptable[which.min(fit.1$cptable[,"xerror"]),"CP"])
# plot the pruned tree
plot(pruned.fit.1, uniform=TRUE, margin = 0.4, main="Pruned Classification Tree for Region (FDGS Data)")
text(pruned.fit.1, use.n=TRUE, all=TRUE, cex=1.0)
post(pruned.fit.1, title = "Pruned Classification Tree for Region (FDGS Data)")
# not much change, as the initial tree is not complex!

Assessing the Tree Classification Accuracy

First assess the model pruned with the suboptimal (largest) 𝐶𝑃 = 0.029582, using a random 80%/20% train/test split:

fdgs.train80.ind <- sort(sample(nrow(fdgs), nrow(fdgs)*0.8)) # 80% training
fdgs.train80 <- fdgs[fdgs.train80.ind,]
fdgs.test20 <- fdgs[-fdgs.train80.ind,]
fit.1 <- rpart(reg ~ age + hgt + wgt, method="class", data= fdgs.train80[,-1])
printcp(fit.1)
##
## Classification tree:
## rpart(formula = reg ~ age + hgt + wgt, data = fdgs.train80[, -1], method = "class")
##
## Variables actually used in tree construction:
## [1] age
##
## Root node error: 5700/8024 = 0.71037
##
## n= 8024
##
##         CP nsplit rel error  xerror      xstd
## 1 0.030526      0   1.00000 1.00000 0.0071283
## 2 0.025789      1   0.96947 0.98895 0.0071842
## 3 0.010000      2   0.94368 0.94561 0.0073796
pruned.1 <- prune(fit.1, cp=0.029582) # suboptimal CP (cp=0.01 is the optimal CP)
pred.1 <- predict(pruned.1, newdata=fdgs.test20, type="class") # predicting class of the test data using the pruned model
confusionMatrix(pred.1, fdgs.test20$reg) # compute confusion matrix and summary statistics
## Confusion Matrix and Statistics
##
##           Reference
## Prediction North East South West City
##      North     0    0     0    0    0
##      East     54  256   231  252   66
##      South    79  241   376  264  187
##      West      0    0     0    0    0
##      City      0    0     0    0    0
##
## Overall Statistics
##
##                Accuracy : 0.3151
##                  95% CI : (0.2948, 0.3359)
##     No Information Rate : 0.3026
##     P-Value [Acc > NIR] : 0.1171
##
##                   Kappa : 0.0499
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: North Class: East Class: South Class: West Class: City
## Sensitivity                0.0000      0.5151       0.6194      0.0000      0.0000
## Specificity                1.0000      0.6004       0.4489      1.0000      1.0000
## Pos Pred Value                NaN      0.2980       0.3278         NaN         NaN
## Neg Pred Value             0.9337      0.7899       0.7311      0.7428      0.8739
## Prevalence                 0.0663      0.2478       0.3026      0.2572      0.1261
## Detection Rate             0.0000      0.1276       0.1874      0.0000      0.0000
## Detection Prevalence       0.0000      0.4282       0.5718      0.0000      0.0000
## Balanced Accuracy          0.5000      0.5577       0.5342      0.5000      0.5000
# sensitivity = producer's accuracy and positive predictive value = user's accuracy

Next assess the model with the optimal (smallest) 𝐶𝑃 = 0.01:

pruned.2 <- prune(fit.1, cp=0.01) # cp=0.01 is the optimal CP
pred.2 <- predict(pruned.2, newdata=fdgs.test20, type="class") # predicting class of the test data using the pruned model
confusionMatrix(pred.2, fdgs.test20$reg) # compute confusion matrix and summary statistics
## Confusion Matrix and Statistics
##
##           Reference
## Prediction North East South West City
##      North     0    0     0    0    0
##      East     62  331   299  300   76
##      South    71  166   308  216  177
##      West      0    0     0    0    0
##      City      0    0     0    0    0
##
## Overall Statistics
##
##                Accuracy : 0.3185
##                  95% CI : (0.2982, 0.3394)
##     No Information Rate : 0.3026
##     P-Value [Acc > NIR] : 0.0634
##
##                   Kappa : 0.0621
##  Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: North Class: East Class: South Class: West Class: City
## Sensitivity                0.0000      0.6660       0.5074      0.0000      0.0000
## Specificity                1.0000      0.5116       0.5497      1.0000      1.0000
## Pos Pred Value                NaN      0.3099       0.3284         NaN         NaN
## Neg Pred Value             0.9337      0.8230       0.7200      0.7428      0.8739
## Prevalence                 0.0663      0.2478       0.3026      0.2572      0.1261
## Detection Rate             0.0000      0.1650       0.1535      0.0000      0.0000
## Detection Prevalence       0.0000      0.5324       0.4676      0.0000      0.0000
## Balanced Accuracy          0.5000      0.5888       0.5285      0.5000      0.5000
# sensitivity = producer's accuracy and positive predictive value = user's accuracy

Notice that the classification accuracy, albeit not great, increased slightly from 0.3151 (pruned.1) to 0.3185 (pruned.2) in the run shown above. Both pruned trees predict only the East and South regions, and the exact values change with each random train/test split.

Random Forests

Random forests may improve predictive accuracy by generating a large number of bootstrapped trees (based on random samples of variables). The method classifies cases using each tree in this new "forest", and decides the final predicted outcome by combining the results across all of the trees (an average in regression, a majority vote in classification). See the randomForest package.

The randomForest algorithm fits many (hundreds to thousands of) CART models to random subsets of the input data and combines the predictions of the individual trees. Again, all data types may be used as independent variables, regardless of whether the model is a classification or regression tree. Missing values, however, are handled with rfImpute(), which uses a proximity matrix from the randomForest to impute (populate) incomplete data by:
• for continuous predictors: the weighted average of the non-missing observations (weighted by the proximities),
• for categorical predictors: the category with the largest average proximity.

library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
##     margin
fit.2 <- randomForest(reg ~ age + hgt + wgt, method="class", na.action = na.omit, data= fdgs[,-1])
print(fit.2) # view results
##
## Call:
##  randomForest(formula = reg ~ age + hgt + wgt, data = fdgs[, -1], method = "class", na.action = na.omit)
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 1
##
##         OOB estimate of  error rate: 68.06%
## Confusion matrix:
##       North East South West City class.error
## North    23  176   247  183  100   0.9684499
## East     48  951   728  628  165   0.6226190
## South    48  640  1246  739  248   0.5734338
## West     59  712   844  731  219   0.7150097
## City     50  265   396  302  239   0.8091054
importance(fit.2) # importance of each predictor
##     MeanDecreaseGini
## age         2623.875
## hgt         2304.597
## wgt         2370.084

Note on missing values/incomplete data: If the data have missing values, we have 3 choices:
• Use a different tool (rpart handles missing values well)
• Impute the missing values
• For a small number of missing cases, we can use na.action = na.omit

Latent growth and growth mixture modeling (LGM/GMM)

We can illustrate the latent class linear mixed models implemented in hlme through a study of the quadratic trajectories of the response (remission) with TumorSize, adjusting for the CO2*Pain interaction and assuming correlated random-effects for the functions of SmokingHx and Sex.
To estimate the corresponding standard linear mixed model using 1 latent class where CO2 interacts with Pain:

# install.packages("lcmm")
library("lcmm")
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
##
##     cluster
hdp <- read.csv("http://www.ats.ucla.edu/stat/data/hdp.csv")
hdp <- within(hdp, {
  Married <- factor(Married, levels = 0:1, labels = c("no", "yes"))
  DID <- factor(DID)
  HID <- factor(HID)
})
# add a new subject ID column (last column in the data, "ID"), this is necessary for the hlme call
hdp$ID <- seq.int(nrow(hdp))
model.hlme <- hlme(remission ~ IL6 + CRP + LengthofStay + Experience + I(tumorsize^2) + co2*pain + I(tumorsize^2)*pain, random=~ SmokingHx + Sex, subject='ID', data=hdp, ng=1)
## Be patient, hlme is running ...
## The program took 22.05 seconds
summary(model.hlme)
## Heterogenous linear mixed model
##      fitted by maximum likelihood method
##
## hlme(fixed = remission ~ IL6 + CRP + LengthofStay + Experience +
##     I(tumorsize^2) + co2 * pain + I(tumorsize^2) * pain, random = ~SmokingHx +
##     Sex, subject = "ID", ng = 1, data = hdp)
##
## Statistical Model:
##      Dataset: hdp
##      Number of subjects: 8525
##      Number of observations: 8525
##      Number of latent classes: 1
##      Number of parameters: 21
##
## Iteration process:
##      Convergence criteria satisfied
##      Number of iterations: 34
##      Convergence criteria: parameters= 1.2e-09
##                          : likelihood= 8.3e-06
##                          : second derivatives= 2.7e-05
##
## Goodness-of-fit statistics:
##      maximum log-likelihood: -5223.9
##      AIC: 10489.79
##      BIC: 10637.86
##
## Maximum Likelihood Estimates:
##
## Fixed effects in the longitudinal model:
##
##                         coef      Se    Wald p-value
## intercept            0.28636 0.24314   1.178 0.23890
## IL6                 -0.01134 0.00183  -6.184 0.00000
## CRP                 -0.00674 0.00167  -4.043 0.00005
## LengthofStay        -0.04834 0.00463 -10.436 0.00000
## Experience           0.01695 0.00119  14.263 0.00000
## I(tumorsize^2)       0.00000 0.00001  -0.076 0.93953
## co2                 -0.03549 0.16204  -0.219 0.82663
## pain                 0.03930 0.04278   0.919 0.35832
## co2:pain            -0.01489 0.02871  -0.519 0.60395
## I(tumorsize^2):pain  0.00000 0.00000   0.553 0.58045
##
## Variance-covariance matrix of the random-effects:
##                 intercept SmokingHxformer SmokingHxnever Sexmale
## intercept         0.19311
## SmokingHxformer  -0.10618         0.20916
## SmokingHxnever   -0.12389         0.06834        0.22627
## Sexmale          -0.08131        -0.00735       -0.00002 0.17302
##
##                             coef      Se
## Residual standard error: 0.12998 1.18743

Results interpretation:
• The first part of the summary provides information about the dataset, the number of subjects, observations, observations deleted (since by default, missing observations are deleted), number of latent classes and number of parameters.
• Next, details about the algorithm convergence are provided along with the number of iterations, the convergence criteria, and the information indicating if the model converged correctly: "convergence criteria satisfied".
• The maximum log-likelihood, Akaike criterion (AIC) and Bayesian Information criterion (BIC) are reported.
• Estimates of the parameters, the estimated standard errors, the Wald test statistics (with Normal approximation) and the corresponding p-values are reported next.
• For the random-effect distribution, the estimated covariance matrix of the random-effects is displayed.
• The standard error of the residuals is given along with its estimated standard error.
• The squared TumorSize term and Pain do not appear to be associated with Remission (both individual Wald p-values are large). This may be formally assessed using a multivariate Wald test:

WaldMult(model.hlme, pos=c(6,8))
##                           Wald Test p_value
## I(tumorsize^2) = pain = 0   0.85562 0.65193
# pos - a vector containing the indices in model.hlme of the parameters to test

We may consider the model with an adjustment for CRP only on the intercept.
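Returning briefly to the multivariate Wald test reported above: a rough sanity check, assuming the statistic is referred to a chi-square distribution whose degrees of freedom equal the number of coefficients tested (an assumption here, though the printed values appear consistent with it), reproduces the p-value from the statistic alone. The variable names below are hypothetical.

```r
# Values copied from the WaldMult() output above
wald_stat <- 0.85562   # joint Wald statistic for I(tumorsize^2) = pain = 0
n_tested  <- 2         # number of coefficients tested jointly

# Upper-tail chi-square probability with n_tested degrees of freedom
p_value <- pchisq(wald_stat, df = n_tested, lower.tail = FALSE)
p_value                # approximately 0.6519, in line with the reported p_value
```

For two degrees of freedom the upper tail has the closed form exp(-W/2), so the check can also be done by hand.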
Below we estimate the corresponding models for a varying number of latent classes (from 1 to 3) using the default initial values:

# Initial Model: model.hlme <- hlme(remission ~ IL6 + CRP + LengthofStay + Experience + I(tumorsize^2) + co2*pain + I(tumorsize^2)*pain, random=~ SmokingHx + Sex, subject='ID', data=hdp, ng=1)
model.hlme.1 <- hlme(tumorsize ~ IL6 + CRP + LengthofStay, subject='ID', data=hdp, ng=1)
## Be patient, hlme is running ...
## The program took 0.57 seconds
model.hlme.2 <- hlme(tumorsize ~ IL6 + CRP + LengthofStay + SmokingHx, mixture=~ SmokingHx, subject='ID', data=hdp, ng=2)
## Be patient, hlme is running ...
## The program took 29.53 seconds
model.hlme.3 <- hlme(tumorsize ~ IL6 + CRP + LengthofStay + SmokingHx, mixture=~ SmokingHx, subject='ID', data=hdp, ng=3) # this may take over 6 minutes to complete!
## Be patient, hlme is running ...
## The program took 375.19 seconds

The estimation process for a varying number of latent classes can be summarized with summarytable, which gives the log-likelihood, the number of parameters, the Bayesian Information Criterion, and the posterior proportion of each class:

summarytable(model.hlme.1, model.hlme.2, model.hlme.3)
##              G    loglik npm      BIC    %class1    %class2  %class3
## model.hlme.1 1 -33301.82   5 66648.89 100.000000
## model.hlme.2 2 -31592.79  11 63285.15  99.214076  0.7859238
## model.hlme.3 3 -31589.55  15 63314.86   6.357771 82.2991202 11.34311

(This program may take over 400 seconds to complete!)

In this example, the optimal number of latent classes according to the BIC is two (the smallest BIC).
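The BIC column can be re-derived from the other columns of the table. The sketch below assumes the usual definition BIC = -2 loglik + npm * log(N) with N taken as the number of subjects (8525, from the earlier model summary); the vector names are hypothetical.

```r
# Values copied from the summarytable() output above
loglik <- c(model.hlme.1 = -33301.82,
            model.hlme.2 = -31592.79,
            model.hlme.3 = -31589.55)
npm        <- c(5, 11, 15)   # number of parameters per model
n_subjects <- 8525           # number of subjects in hdp

# BIC penalizes each parameter by log(number of subjects)
bic <- -2 * loglik + npm * log(n_subjects)
round(bic, 2)                # agrees with the table up to rounding of loglik

# The preferred model is the one with the smallest BIC
names(which.min(bic))
```

The third class improves the log-likelihood by only about 3 units while adding 4 parameters, which is why the penalty term makes the two-class model preferable.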
The posterior classification is described with:

postprob(model.hlme.2)
##
## Posterior classification:
##    class1 class2
## N 8458.00  67.00
## %   99.21   0.79
##
## Posterior classification table:
##      --> mean of posterior probabilities in each class
##         prob1  prob2
## class1 0.8555 0.1445
## class2 0.4362 0.5638
##
## Posterior probabilities above a threshold (%):
##          class1 class2
## prob>0.7  92.48   2.99
## prob>0.8  77.38   0.00
## prob>0.9  38.53   0.00

In this example, the first class includes a posteriori 8458 subjects (99.21%) while class 2 includes 67 subjects (0.79%). Subjects were classified in class 1 with a mean posterior probability of 0.8555. In class 1, 92.48% of the subjects were classified with a posterior probability above 0.7, while 2.99% of the subjects in class 2 were classified with a posterior probability above 0.7.

Goodness-of-fit of the model can be assessed by displaying the residuals (left panel) and the mean predictions of the model (right panel), according to the time variable given in var.time:

plot(model.hlme.2) # Figure (left panel)
plot(model.hlme.2, which="fit", var.time="Age", bty="l", ylab=" Remission ", xlab="Age", lwd=2) # Figure (right panel)
plot(model.hlme.2, which="fit", var.time="Age", bty="l", ylab=" Remission ", xlab="Age", lwd=2, marg=FALSE)

The latent process mixed models implemented in lcmm are illustrated through the study of the linear trajectory of ntumors with Age, adjusted for Sex and assuming correlated random-effects for the intercept and Age. The calls below estimate the corresponding latent process mixed model with different link functions:

library("lcmm")
model.hlme.lin <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', data=hdp)
## Be patient, lcmm is running ...
## The program took 27.02 seconds
model.hlme.beta <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', data=hdp, link='beta')
## Be patient, lcmm is running ...
## The program took 109.44 seconds
model.hlme.spl <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', data=hdp, link='splines')
## Be patient, lcmm is running ...
## The program took 54.36 seconds
model.hlme.spl5q <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', data=hdp, link='5-quant-splines') # takes over 4 minutes
## Be patient, lcmm is running ...
## The program took 53.15 seconds

Link function: An optional family of link functions.
* "linear" (the default) specifies a linear link function leading to a standard linear mixed model (homogeneous or heterogeneous as estimated in hlme),
* "beta" estimates a link function from the family of Beta cumulative distribution functions,
* "thresholds" uses a threshold model to describe the correspondence between each level of an ordinal outcome and the underlying latent process, and
* "splines" approximates the link function by I-splines.

For this latter case, the number of nodes and the location of the nodes should also be specified. The number of nodes is entered first, followed by -, then the location is specified with "equi", "quant" or "manual" for, respectively, equidistant nodes, nodes at quantiles of the marker distribution, or interior nodes entered manually in the argument intnodes. It is followed by - and finally "splines" is indicated. For example, "7-equi-splines" means I-splines with 7 equidistant nodes, "6-quant-splines" means I-splines with 6 nodes located at the quantiles of the marker distribution, and "9-manual-splines" means I-splines with 9 nodes, the vector of 7 interior nodes being entered in the argument intnodes.

summary(model.hlme.lin)

The objects model.hlme.lin, model.hlme.beta, model.hlme.spl and model.hlme.spl5q are latent process mixed models that assume the exact same trajectory for the underlying latent process but, respectively, a linear link, a Beta CDF link, I-splines with 5 equidistant knots (default with link='splines') and I-splines with 5 knots at percentiles. model.hlme.lin reduces to a standard linear mixed model (link='linear' by default). The only difference from an hlme object is the parameterization of the intercept and the residual standard error, which are considered as rescaling parameters.

col <- rainbow(4)
plot(model.hlme.lin, which="linkfunction", bty='l', ylab="Number-of-Tumors", col=col[1], lwd=2, xlab="underlying latent process")
plot(model.hlme.beta, which="linkfunction", add=TRUE, col=col[2], lwd=2)
plot(model.hlme.spl, which="linkfunction", add=TRUE, col=col[3], lwd=2)
plot(model.hlme.spl5q, which="linkfunction", add=TRUE, col=col[4], lwd=2)
legend(x="topleft", legend=c("linear", "beta", "splines (5 equidistant)", "splines (5 at quantiles)"), lty=1, col=col, bty="n", lwd=2)
# to obtain confidence bands use function predictlink
link.lin <- predictlink(model.hlme.lin, ndraws=2000)

You would most likely get an error like:
Error in predictlink.lcmm(model.hlme.spl, ndraws = 2000): No confidence intervals can be produced since the program did not converge properly

To fix that, change the convergence criteria. Note that relaxing the thresholds this much only loosens the stopping rule, so the resulting confidence bands should be interpreted with caution.

model.hlme.lin <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', epsY = 0.5, convB = 1e-01, convL = 1e-01, convG = 1e-01, maxiter=200, data=hdp)
model.hlme.lin$conv
# Now that we have convergence, we can obtain CI's!!!
link.lin <- predictlink(model.hlme.lin, ndraws=2000)
# plot(model.hlme.lin, which="linkfunction", bty='l', ylab="Number-of-Tumors", col=col[1], lwd=2, xlab="underlying latent process")
plot(link.lin, add=TRUE, col=col[1], lty=2, lwd=2)
legend(x="left", legend=c("95% confidence bands", "for linear fit"), lty=c(2,NA), col=c(col[1],NA), bty="n", lwd=2)
# Repeat using the other link functions (model.hlme.beta, model.hlme.spl, ...)
model.hlme.beta <- lcmm(ntumors ~ Age*Sex, random=~ Age, subject='ID', data=hdp, link='beta', convB = 1e-01, convL = 1e-01, convG = 1e-01, maxiter=200)
model.hlme.beta$conv
link.beta <- predictlink(model.hlme.beta, ndraws=2000)
plot(link.beta, add=TRUE, col=col[2], lty=2, lwd=2)
legend(x="left", legend=c("95% confidence bands", "for BETA fit"), lty=c(2,NA), col=c(col[2],NA), bty="n", lwd=2)

Meta-analysis

Meta-analysis is an approach to combine treatment effects across trials or studies into an aggregated treatment effect with higher statistical power than observed in each individual trial. It may detect HTE by testing for differences in treatment effects across similar RCTs. It requires that the individual treatment effects are similar to ensure pooling is meaningful. In the presence of large clinical or methodological differences between the trials, it may be best to avoid meta-analyses. The presence of HTE across studies in a meta-analysis may be due to differences in the design or execution of the individual trials (e.g., randomization methods, patient selection criteria). Cochran's Q is a method for detecting heterogeneity, which is computed as the weighted sum of squared differences between each study's treatment effect and the pooled effect across the studies. It is a barometer of inter-trial differences impacting the observed study result. A possible source of error in a meta-analysis is publication bias. Trial size may introduce publication bias since larger trials are more likely to be published. Language and accessibility represent other potential confounding factors. When the heterogeneity is not due to poor study design, it may be useful to optimize the treatment benefits for different cohorts of participants.

Cochran's Q statistic is the weighted sum of squares on a standardized scale. The corresponding P value indicates the strength of the evidence for the presence of heterogeneity.
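The definition of Cochran's Q given above can be sketched in a few lines of base R. This sketch assumes inverse-variance weights (w = 1/se^2), which is one common choice and not necessarily what a given meta-analysis routine uses; the function name cochran_q and the five study effects are hypothetical.

```r
# Cochran's Q with inverse-variance weights (an assumption of this sketch)
cochran_q <- function(effect, se) {
  w      <- 1 / se^2                        # study weights
  pooled <- sum(w * effect) / sum(w)        # pooled (fixed-effect) estimate
  q      <- sum(w * (effect - pooled)^2)    # weighted sum of squared deviations
  df     <- length(effect) - 1
  list(pooled  = pooled,
       Q       = q,
       df      = df,
       p.value = pchisq(q, df = df, lower.tail = FALSE),
       I2      = max(0, (q - df) / q))      # share of variation beyond chance
}

# Hypothetical log odds ratios and standard errors for 5 studies
log_or <- c(-0.40, -0.55, -0.30, -0.62, -0.45)
se     <- c( 0.20,  0.25,  0.18,  0.30,  0.22)
res <- cochran_q(log_or, se)
res$Q; res$df; res$p.value; res$I2
```

Under the null hypothesis of homogeneity, Q is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of studies; I2 is truncated at zero whenever Q falls below its degrees of freedom.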
This test may sometimes have low power to detect heterogeneity, and it is suggested to use a value of 0.10 as a cut-off for significance (Higgins et al., 2003). The Q statistic may also have too much power as a test of heterogeneity when the number of studies is large.

Simulation Example 1

# Install and Load library
# install.packages("meta")
library(meta)
## Loading 'meta' package (version 4.6-0).
## Type 'help("meta-package")' for a brief overview.
# Set number of studies
n.studies = 15
# number of treatments: case1, case2, control
n.trt = 3
# number of outcomes
n.event = 2
# simulate the (balanced) number of cases (case1 and case2) and controls in each study
ctl.group = rbinom(n = n.studies, size = 200, prob = 0.3)
case1.group = rbinom(n = n.studies, size = 200, prob = 0.3)
case2.group = rbinom(n = n.studies, size = 200, prob = 0.3)
# Simulate the number of outcome events (e.g., deaths) and no events in the control group
event.ctl.group = rbinom(n = n.studies, size = ctl.group, prob = rep(0.1, length(ctl.group)))
noevent.ctl.group = ctl.group - event.ctl.group
# Simulate the number of events and no events in the case1 group
event.case1.group = rbinom(n = n.studies, size = case1.group, prob = rep(0.5, length(case1.group)))
noevent.case1.group = case1.group - event.case1.group
# Simulate the number of events and no events in the case2 group
event.case2.group = rbinom(n = n.studies, size = case2.group, prob = rep(0.6, length(case2.group)))
noevent.case2.group = case2.group - event.case2.group
# Run the univariate meta-analysis using metabin(), Meta-analysis of binary outcome data
# Calculation of fixed and random effects estimates (risk ratio, odds ratio, risk difference or arcsine
# difference) for meta-analyses with binary outcome data. Mantel-Haenszel (MH),
# inverse variance and Peto method are available for pooling.
# method = A character string indicating which method is to be used for pooling of studies,
# one of "MH", "Inverse", or "Peto"
# sm = A character string indicating which summary measure ("OR", "RR", "RD" = risk difference) is to be
# used for pooling of studies
# Control vs. Case1, n.e and n.c are numbers in experimental and control groups
meta.ctr_case1 <- metabin(event.e = event.case1.group, n.e = case1.group, event.c = event.ctl.group, n.c = ctl.group, method = "MH", sm = "OR")
# in this case we use the Odds Ratio, the ratio of the odds of death in the experimental and control groups
forest(meta.ctr_case1)
# Control vs. Case2
meta.ctr_case2 <- metabin(event.e = event.case2.group, n.e = case2.group, event.c = event.ctl.group, n.c = ctl.group, method = "MH", sm = "OR")
forest(meta.ctr_case2)
# Case1 vs. Case2
meta.case1_case2 <- metabin(event.e = event.case1.group, n.e = case1.group, event.c = event.case2.group, n.c = case2.group, method = "MH", sm = "OR")
forest(meta.case1_case2)
summary(meta.case1_case2)
## Number of studies combined: k = 15
##
##                          OR           95%-CI     z  p-value
## Fixed effect model   0.6178 [0.5133; 0.7435] -5.10 < 0.0001
## Random effects model 0.6183 [0.5131; 0.7450] -5.06 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0; H = 1.00 [1.00; 1.36]; I^2 = 0.0% [0.0%; 45.8%]
##
## Test of heterogeneity:
##      Q d.f. p-value
##  11.98   14  0.6076
##
## Details on meta-analytical method:
## - Mantel-Haenszel method
## - DerSimonian-Laird estimator for tau^2

The forest plot and the heterogeneity statistics (I^2 = 0.0%, Q = 11.98 on 14 d.f., p-value = 0.6076) provide no evidence to reject the null hypothesis of no study heterogeneity, so the fixed effects model should be used.

Series of "N of 1" trials

This technique combines (a "series of") n-of-1 trial data to identify HTE. An n-of-1 trial is a repeated crossover trial for a single patient, which randomly assigns the patient to one treatment vs. another for a given time period, after which the patient is re-randomized to treatment for the next time period, usually repeated for 4-6 time periods.
Such trials are most feasibly done in chronic conditions, where little or no washout period is needed between treatments and treatment effects are identifiable in the short term, such as pain or reliable surrogate markers. Combining data from identical n-of-1 trials across a set of patients enables statistical analysis controlling for patient fixed or random effects, covariates, centers, or sequence effects, see Figure below. These combined trials are often analyzed within a Bayesian context using shrinkage estimators that combine individual and group mean treatment effects to create a "posterior" individual mean treatment effect estimate, which is a form of inverse variance-weighted average of the individual and group effects. Such trials are typically more expensive than standard RCTs on a per-patient basis; however, they require much smaller sample sizes, often fewer than 100 patients (due to the efficient individual-as-own-control design), and create individual treatment effect estimates that are not possible in a non-crossover design. For the individual patient, the treatment effect can be re-estimated after each time period, and the trial stopped at any point when the more effective treatment is identified with reasonable statistical certainty.

Example

A study involving 8 participants collected data across 30 days, in which 15 treatment days and 15 control days were randomly assigned within each participant. The treatment effect is represented as a binary variable (control day=0; treatment day=1). The outcome variable represents the response to the intervention within each of the 8 participants. The study employed fixed-effects modeling, creating N-1 dummy-coded variables representing the N=8 participants, where the last (i=8) participant serves as the reference (i.e., as the model intercept). So, each dummy-coded variable represents the difference between participant i and the 8th participant.
Thus, all other patients' values will be relative to the values of the 8th (reference) subject. The overall differences across participants in fixed effects can be evaluated with multiple degree-of-freedom F-tests. Note that the data file loaded below contains 900 observations from 30 participant IDs over 30 days, as the model summaries show.

Variable     Description
Constant     Intercept
PhyAct       Physical Activity
Tx           Intervention
WPSS         WP Social Support
PMss3        PM Social Support (0=3)
SelfEff25    Self Efficacy (0=25)

rm(list=ls())
Nof1 <- read.table("https://umich.instructure.com/files/330385/download?download_frd=1", sep=",", header = TRUE) # 02_Nof1_Data.csv
attach(Nof1)
head(Nof1)
##   ID Day Tx SelfEff SelfEff25  WPSS SocSuppt PMss PMss3 PhyAct
## 1  1   1  1      33         8  0.97     5.00 4.03  1.03     53
## 2  1   2  1      33         8 -0.17     3.87 4.03  1.03     73
## 3  1   3  0      33         8  0.81     4.84 4.03  1.03     23
## 4  1   4  0      33         8 -0.41     3.62 4.03  1.03     36
## 5  1   5  1      33         8  0.59     4.62 4.03  1.03     21
## 6  1   6  1      33         8 -1.16     2.87 4.03  1.03      0

df.1 = data.frame(PhyAct, Tx, WPSS, PMss3, SelfEff25)
library("lme4")
## Loading required package: Matrix
##
## Attaching package: 'lme4'
## The following objects are masked from 'package:lcmm':
##
##     fixef, ranef

# Mixed model with random intercepts for Day and ID
# (SelfEff, Day and ID are not columns of df.1; they are found via attach(Nof1))
lm.1 = model.lmer <- lmer(PhyAct ~ Tx + SelfEff + Tx*SelfEff + (1|Day) + (1|ID), data= df.1)
summary(lm.1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: PhyAct ~ Tx + SelfEff + Tx * SelfEff + (1 | Day) + (1 | ID)
##    Data: df.1
##
## REML criterion at convergence: 8820
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.7012 -0.6833 -0.0333  0.6542  3.9612
##
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Day      (Intercept) 4.808e-13 6.934e-07
##  ID       (Intercept) 6.015e+02 2.453e+01
##  Residual             9.690e+02 3.113e+01
## Number of obs: 900, groups:  Day, 30; ID, 30
##
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  38.3772    14.4738   2.651
## Tx            4.0283     6.3745   0.632
## SelfEff       0.5818     0.5942   0.979
## Tx:SelfEff    0.9702     0.2617   3.708
##
## Correlation of Fixed Effects:
##            (Intr) Tx     SlfEff
## Tx         -0.220
## SelfEff    -0.946  0.208
## Tx:SelfEff  0.208 -0.946 -0.220

# Model: PhyAct = Tx + WPSS + PMss3 + Tx*WPSS + Tx*PMss3 + SelfEff25 + Tx*SelfEff25 + error
lm.2 = lm(PhyAct ~ Tx + WPSS + PMss3 + Tx*WPSS + Tx*PMss3 + SelfEff25 + Tx*SelfEff25, df.1)
summary(lm.2)
## Call:
## lm(formula = PhyAct ~ Tx + WPSS + PMss3 + Tx * WPSS + Tx * PMss3 +
##     SelfEff25 + Tx * SelfEff25, data = df.1)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -102.39  -28.24   -1.47   25.16  122.41
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   52.0067     1.8080  28.764  < 2e-16 ***
## Tx            27.7366     2.5569  10.848  < 2e-16 ***
## WPSS           1.9631     2.4272   0.809 0.418853
## PMss3         13.5110     2.7853   4.851 1.45e-06 ***
## SelfEff25      0.6289     0.2205   2.852 0.004439 **
## Tx:WPSS        9.9114     3.4320   2.888 0.003971 **
## Tx:PMss3       8.8422     3.9390   2.245 0.025025 *
## Tx:SelfEff25   1.0460     0.3118   3.354 0.000829 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.03 on 892 degrees of freedom
## Multiple R-squared:  0.2394, Adjusted R-squared:  0.2334
## F-statistic: 40.11 on 7 and 892 DF,  p-value: < 2.2e-16

Quantile Treatment Effect (QTE)

QTE employs quantile regression estimation (QRE) to examine the central tendency and statistical dispersion of the treatment effect in a population. These may not be revealed by the conventional mean estimation in RCTs. For instance, patients with different comorbidity scores may respond differently to a treatment. Quantile regression has the ability to reveal HTE according to the ranking of patients' comorbidity scores or some other relevant covariate by which patients may be ranked.
Therefore, in an attempt to inform patient-centered care, quantile regression provides more information on the distribution of the treatment effect than typical conditional mean treatment effect estimation. QTE characterizes the heterogeneous treatment effect on individuals and groups across various positions in the distributions of different outcomes of interest. This unique feature has drawn substantial attention to quantile regression analysis, which has been employed across a wide range of applications, particularly when evaluating the economic effects of welfare reform.

One caveat of applying QRE in clinical trials for examining HTE is that the QTE does not demonstrate the treatment effect for a given patient. Instead, it focuses on the treatment effect among subjects within the qth quantile, such as those who are exactly at the top 10 percent in terms of blood pressure or a depression score, for some covariate of interest (for example, comorbidity score). It is not uncommon for the qth quantiles to be two different sets of patients before and after the treatment. For this reason, we have to assume that these two groups of patients are homogeneous if they are in the same quantiles.

Income-Food Expenditure Example

Let's examine the Engel data (N=235) on the relationship between food expenditure (foodexp) and household income (income). We can plot the data and then explore the superposition of the six fitted quantile regression lines.
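Before working with the Engel data, the idea of a quantile treatment effect can be illustrated on simulated data. The following is a minimal sketch, not part of the original analysis: the variable names (tx, y, qte, ate) and the data-generating model are assumptions chosen for illustration. The treatment shifts the outcome and also widens its spread, so the effect differs across quantiles even though a single mean difference summarizes it poorly.

```r
# Minimal sketch on simulated data (all names and parameters are hypothetical)
set.seed(853)
n  <- 2000
tx <- rep(c(0, 1), each = n / 2)        # 0 = control, 1 = treatment

# Treatment shifts the outcome by 2 and doubles its standard deviation,
# so the treatment effect varies across the outcome distribution
y <- ifelse(tx == 1, 2 + 2 * rnorm(n), rnorm(n))

taus <- c(0.10, 0.25, 0.50, 0.75, 0.90)

# Unconditional QTE: treated quantile minus control quantile at each tau
qte <- quantile(y[tx == 1], taus) - quantile(y[tx == 0], taus)

# Conventional mean treatment effect, for comparison
ate <- mean(y[tx == 1]) - mean(y[tx == 0])

print(round(qte, 2))
print(round(ate, 2))
```

In this setup the estimated QTE grows from the lower to the upper quantiles, while the mean difference reports a single value near 2. With a binary treatment and no other covariates, a call such as rq(y ~ tx, tau = taus) from the quantreg package should give essentially the same treatment coefficients.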
#install.packages("quantreg")
library(quantreg)
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
##     backsolve
##
## Attaching package: 'quantreg'
## The following object is masked from 'package:survival':
##
##     untangle.specials

data(engel)
attach(engel)
head(engel)
##     income  foodexp
## 1 420.1577 255.8394
## 2 541.4117 310.9587
## 3 901.1575 485.6800
## 4 639.0802 402.9974
## 5 750.8756 495.5608
## 6 945.7989 633.7978

Note: If $Y$ is a real-valued random variable with cumulative distribution function $F_Y(y) = P(Y \le y)$, then the $\tau$-quantile of $Y$ is given by $Q_Y(\tau) = F_Y^{-1}(\tau) = \inf\{y : F_Y(y) \ge \tau\}$, where $0 \le \tau \le 1$.

# (1) Graphics
plot(income, foodexp, cex=.25, type="n", xlab="Household Income", ylab="Food Expenditure")
points(income, foodexp, cex=.5, col="blue")
# tau - the quantile(s) to be estimated, in the range from 0 to 1. An object "rq.process" and an object "rqs"
# are returned containing the matrix of coefficient estimates at the specified quantiles.
abline(rq(foodexp ~ income, tau=.5), col="blue")          # quantile regression model (median, tau=0.5)
abline(lm(foodexp ~ income), lty=2, lwd=3, col="red")     # linear model
taus <- c(0.05, 0.1, 0.25, 0.75, 0.90, 0.95)
colors <- rainbow(length(taus))
models <- vector(mode = "list", length = length(taus))        # list to store the QR fits for the different taus
model.names <- vector(mode = "list", length = length(taus))   # list of model names
for (i in 1:length(taus)) {
  models[[i]] <- rq(foodexp ~ income, tau=taus[i])
  var <- taus[i]
  model.names[[i]] <- paste("Model [", i, "]: tau=", var)
  abline(models[[i]], lwd=2, col=colors[[i]])
}
# the legend uses line symbols matching the fitted lines (pch=taus is not a valid set of plotting symbols)
legend(3000, 1100, unlist(model.names), col=colors, lty=1, lwd=2, bty='n', cex=.75)

Inference about quantile regression coefficients

As an alternative to the rank-inversion confidence intervals, we can obtain a table of coefficients, standard errors, t-statistics, and p-values using the summary function:

summary(models[[3]], se = "nid")
## Call: rq(formula = foodexp ~ income, tau = taus[i])
##
## tau: [1] 0.25
##
## Coefficients:
##             Value    Std. Error t value  Pr(>|t|)
## (Intercept) 95.48354 21.39237    4.46344  0.00001
## income       0.47410  0.02906   16.31729  0.00000

Alternatively, we can call summary.rq directly. With se = "nid", as below, it reproduces the table above; specifying se = "boot" instead computes bootstrapped standard errors.

summary.rq(models[[3]], se = "nid")
## Call: rq(formula = foodexp ~ income, tau = taus[i])
##
## tau: [1] 0.25
##
## Coefficients:
##             Value    Std. Error t value  Pr(>|t|)
## (Intercept) 95.48354 21.39237    4.46344  0.00001
## income       0.47410  0.02906   16.31729  0.00000

Nonparametric Regression Methods

Nonparametric regression enables dealing with HTE in RCTs. Different nonparametric methods, such as kernel smoothing methods and series methods, can be used to generate test statistics for examining the presence of HTE. A kernel method is a weighting scheme based on a kernel function (e.g., uniform, Gaussian).
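As a concrete instance of such a weighting scheme, the local-constant (Nadaraya-Watson) kernel regression estimator can be written as below. The notation is assumed for illustration and does not appear in the original text: $(X_i, Y_i)$ are the observed covariate-outcome pairs, $K$ is the kernel function, and $h > 0$ is the bandwidth (smoothing parameter).

```latex
\begin{align*}
\hat{m}(x) &= \frac{\sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h}\right) Y_i}
                   {\sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h}\right)}
  && \text{(weighted average of the outcomes)} \\
K(u) &= \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)
  && \text{(Gaussian kernel)} \\
K(u) &= \tfrac{1}{2}\, \mathbf{1}\{|u| \le 1\}
  && \text{(uniform kernel)}
\end{align*}
```

Observations whose covariate values $X_i$ are close to $x$ (relative to $h$) receive the largest weights. A small $h$ yields a wiggly fit driven by few neighbors, while a large $h$ averages over nearly all observations, which is why results can be sensitive to the chosen bandwidth.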
When evaluating the treatment effect of a patient in RCTs, the kernel method assigns larger weights to those observations with similar covariates. This is done because it is assumed that patients with similar covariates provide more relevant data on predicted treatment response. Kernel smoothing methods still utilize information from highly divergent participants (e.g., those with different demographic or clinical backgrounds) when estimating a particular subject's treatment effect, but lower weights are assigned to very different subjects. Kernel methods require choosing a set of smoothing parameters (bandwidths) to group patients according to their relative degree of similarity. A drawback is that the corresponding proposed test statistics may be sensitive to the chosen bandwidths, which complicates the interpretation of the results.

Series methods use approximating functions (splines or power series of the explanatory variables) to construct test statistics. Compared to kernel smoothing methods, series methods normally have the advantage of computational convenience; however, the precision of the test statistics depends on the number of terms selected in the series.

Canadian Wage Data Example

Nonparametric regression extends the classical parametric regression (e.g., lm, lmer) involving one continuous dependent variable, y, and one or more continuous explanatory variables, x. Let's start with a popular parametric model of a wage equation that we can extend to a fully nonparametric regression model. First, we will compare and contrast the parametric and nonparametric approaches to univariate regression and then proceed to multivariate regression.

Let's use the Canadian cross-section wage data (cps71), consisting of a random sample taken from the 1971 Canadian Census for male individuals having common education (High-School). It contains N=205 observations on 2 variables: the logarithm of the individual's wage (logwage) and their age (age).
The classical wage equation model includes a quadratic term of age.

# install.packages("np")
library("np")
## Nonparametric Kernel Methods for Mixed Datatypes (version 0.60-2)
## [vignette("np_faq",package="np") provides answers to frequently asked questions]
data("cps71")

# (1) Linear model, R2 = 0.2308
model.lin <- lm(logwage ~ age + I(age^2), data = cps71)
summary(model.lin)
## Call:
## lm(formula = logwage ~ age + I(age^2), data = cps71)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.4041 -0.1711  0.0884  0.3182  1.3940
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.0419773  0.4559986  22.022  < 2e-16 ***
## age          0.1731310  0.0238317   7.265 7.96e-12 ***
## I(age^2)    -0.0019771  0.0002898  -6.822 1.02e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5608 on 202 degrees of freedom
## Multiple R-squared:  0.2308, Adjusted R-squared:  0.2232
## F-statistic: 30.3 on 2 and 202 DF,  p-value: 3.103e-12

# (2) Next, we consider the nonparametric method employing cross-validated
# bandwidth selection and estimation. Start with computing the least-squares
# cross-validated bandwidths for the local constant estimator (default).
# Note that R2 = 0.3108675
bandwidth <- npregbw(formula = logwage ~ age, data = cps71)
## Multistart 1 of 1

# Although regtype = "ll" is requested here, the summary below reports "Local-Constant",
# i.e., the estimator type recorded with the bandwidth object computed above
model.np <- npreg(bandwidth, regtype = "ll", bwmethod = "cv.aic", gradients = TRUE, data = cps71)
summary(model.np)
## Regression Data: 205 training points, in 1 variable(s)
##                    age
## Bandwidth(s): 1.892157
##
## Kernel Regression Estimator: Local-Constant
## Bandwidth Type: Fixed
## Residual standard error: 0.5307943
## R-squared: 0.3108675
##
## Continuous Kernel Type: Second-Order Gaussian
## No. Continuous Explanatory Vars.: 1

# NP model significance may be tested by
npsigtest(model.np)
## Kernel Regression Significance Test
## Type I Test with IID Bootstrap (399 replications, Pivot = TRUE, joint = FALSE)
## Explanatory variables tested for significance:
## age (1)
##
##                    age
## Bandwidth(s): 1.892157
##
## Individual Significance Tests
## P Value:
## age < 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So, as was the case for the linear parametric model, age is significant in the nonparametric model.

# (3) Graphical comparison of parametric and nonparametric models.
plot(cps71$age, cps71$logwage, xlab = "age", ylab = "log(wage)", cex=.1)
lines(cps71$age, fitted(model.lin), lty = 2, col = "red")
lines(cps71$age, fitted(model.np), lty = 1, col = "blue")
# points for the data, dashed/solid lines for the two fits
legend("topright", c("Data", "Linear", "Non-linear"), col=c("Black", "Red", "Blue"), pch = c(1, NA, NA), lty = c(NA, 2, 1), bty='n', cex=.75)

# some additional plots presenting the parametric (quadratic, dashed line) and the nonparametric estimates
# (solid line) of the regression function for the cps71 data.
plot(model.np, plot.errors.method = "asymptotic")
plot(model.np, gradients = TRUE)
# derivative of the quadratic fit with respect to age: b1 + 2*b2*age
lines(cps71$age, coef(model.lin)[2]+2*cps71$age*coef(model.lin)[3], lty = 2, col = "red")
plot(model.np, gradients = TRUE, plot.errors.method = "asymptotic")

# (4) Use the linear and nonparametric models, with the bandwidths obtained above, to generate
# predictions. We need to create a set of explanatory
# variables for which to generate predictions. These can be part of the original dataset or be
# outside its scope. Typically, we don't have the outcome for the evaluation data and need only
# provide the explanatory variables for which predicted values are generated by the models.
# Occasionally, splitting the dataset into two independent samples (training/testing) allows estimation
# of a model on one sample, and evaluation of its performance on another.
cps.eval.data <- data.frame(age = seq(10,70, by=10))     # simulate some explanatory X values (ages)
pred.lin <- predict(model.lin, newdata = cps.eval.data)  # linear prediction of log(wage)
pred.np <- predict(model.np, newdata = cps.eval.data)    # nonparametric prediction of log(wage)
plot(pred.lin, pred.np)
abline(lm(pred.np ~ pred.lin))

Predictive risk models

Predictive risk models represent a class of methods for identifying potential for HTE when the individual patient risk for disease-related events at baseline depends on observed factors. For instance, common measures are disease staging criteria, such as those used in COPD or heart failure, Framingham risk scores for cardiovascular event risk, or genetic variations, e.g., HER2 for breast cancer. Initial predictive risk modeling, also known as risk function estimation, is often performed without accounting for treatment effects. Least squares or Cox proportional hazards regression methods are appropriate in many cases and provide relatively more interpretable risk functions, but rely on linearity assumptions and may not provide optimal predictive metrics. Partial least squares is an extension of least squares methods that can reduce the dimensionality of the predictor space by interposing latent variables, predicted by linear combinations of observable characteristics, as the intermediate predictors of one or more outcomes. Recursive partitioning methods (such as random forests), support vector machines, and neural networks represent later methods that may offer better predictive power than linear methods. Risk function estimation can range from highly exploratory analyses to near meta-analytic model validation, and may be useful at any stage of product development.

HIV Example

The "hmohiv" dataset represents a study of HIV positive patients examining whether there was a difference in survival times between a cohort using intravenous drugs (drug=1) and a cohort not using IV drugs (drug=0).
The hmohiv data includes the following variables: ID, time, age, drug, censor, entdate, enddate.

# cleaning up environment
rm(list=ls())
# load survival library
library(survival)
# load hmohiv data
hmohiv <- read.table("http://www.ats.ucla.edu/stat/r/examples/asa/hmohiv.csv", sep=",", header = TRUE)
attach(hmohiv)
## The following object is masked from Nof1:
##
##     ID

# construct a frame of the 2 cohorts IV_drug and no-IV-drug
drug.new <- data.frame(drug=c(0,1))
# Fit Cox proportional hazards regression model
cox.model <- coxph(Surv(time, censor) ~ drug, method="breslow")
fit.1 <- survfit(cox.model, newdata=drug.new)
# plot results
plot(fit.1, xlab="Survival Time (Months)", ylab="Survival Probability")
points(fit.1$time, fit.1$surv[,1], pch=1)
points(fit.1$time, fit.1$surv[,2], pch=2)
legend(40, .8, c("Drug Absent", "Drug Present"), pch=c(1,2))
# to inspect the resulting Cox Proportional Hazard Model
cox.model
## Call:
## coxph(formula = Surv(time, censor) ~ drug, method = "breslow")
##
##       coef exp(coef) se(coef)    z      p
## drug 0.779     2.180    0.242 3.22 0.0013
##
## Likelihood ratio test=10.2  on 1 df, p=0.00141
## n= 100, number of events= 80

Comparative Effectiveness Research: Case-Studies [1]

Observational Studies: Tips for the CER Practitioners

• Different study types can offer different understandings; neither should be discounted without closer examination.
• RCTs provide an accurate understanding of the effect of a particular intervention in a well-defined patient group under "controlled" circumstances.
• Observational studies provide an understanding of real-world care and its impact, but can be biased due to uncontrolled factors.
• Observational studies differ in the types of databases used. These databases may lack clinical detail and contain incomplete or inaccurate data.
• Before accepting the findings from an observational study, consider whether confounding factors may have influenced the results.
• In this scenario, subgroup analysis was vital in clarifying both study designs; what is true for the many (e.g., overall, estrogen appeared to be detrimental) may not be true for the few (e.g., that for the younger post-menopausal woman, the benefits were greater and the harms less frequent).
• Carefully examine the generalizability of the study. Do the study's patients and intervention match those under consideration?
• Observational studies can identify associations but cannot prove cause-and-effect relationships.

Case-Study 1: The Cetuximab Study [2]

[1] Based on 2009 NPC report, www.npcnow.org/publication/demystifying-comparative-effectiveness-research-case-study-learning-guide
[2] http://www.cancer.gov/cancertopics/druginfo/fda-cetuximab

What was done and what was found?

Cetuximab, an anti-epidermal growth factor receptor (EGFR) agent, has recently been added to the therapeutic armamentarium. Two important RCTs examined its impact in patients with metastatic colorectal cancer (mCRC). In the first one, 56 centers in 11 European countries investigated the outcomes associated with cetuximab therapy in 329 mCRC patients who experienced disease progression either on irinotecan therapy or within 3 months thereafter. The study reported that the group on a combination of irinotecan and cetuximab had a significantly higher rate of overall response to treatment (primary endpoint) than the group on cetuximab alone: 22.9% (95% CI, 17.5-29.1%) vs. 10.8% (95% CI, 5.7-18.1%) (P=0.007), respectively. Similarly, the median time to progression was significantly longer in the combination therapy group (4.1 vs. 1.5 months, P<0.001). As these patients had already progressed on irinotecan prior to the study, any response was viewed as positive. Safety between the two treatment arms was similar: approximately 80% of patients in each arm experienced a rash.
Grade 3 or 4 (the more severe) toxic effects on the skin were slightly more frequent in the combination-therapy group compared to cetuximab monotherapy, observed in 9.4% and 5.2% of participants, respectively. Other side effects such as diarrhea and neutropenia observed in the combination-therapy arm were considered to be in the range expected for irinotecan alone. Data from this study demonstrated the efficacy and safety of cetuximab and were instrumental in the FDA's 2004 approval.

A second RCT (2007) examined 572 patients and suggested efficacy of cetuximab in the treatment of mCRC. This study was a randomized, nonblinded, controlled trial that examined cetuximab monotherapy plus best supportive care compared to best supportive care alone in patients who had received and failed prior chemotherapy regimens. It reported that median overall survival (the primary endpoint) was significantly higher in patients receiving cetuximab plus best supportive care compared to best supportive care alone (6.1 vs. 4.6 months, respectively) (hazard ratio for death=0.77; 95% CI: 0.64-0.92, P=0.005). This RCT described a greater incidence of adverse events in the cetuximab plus best supportive care group compared to best supportive care alone, including (most significantly) rash, as well as edema, fatigue, nausea and vomiting.

Was this the right answer?

These RCTs had fairly broad enrollment criteria and the cetuximab benefits were modest. Emerging scientific theories raised the possibility that genetically defined population subsets might experience a greater-than-average treatment benefit. One such area of inquiry entailed examining "biomarkers," or genetic indicators of a patient's greater response to therapy. Even as the above RCTs were being conducted, data emerged showing the importance of the KRAS gene.
Emerging Data

Based on the emerging biochemical evidence that the epidermal growth factor receptor (EGFR) treatment mechanism (cetuximab) was even more finely detailed than previously understood, the study authors of the 2007 RCT undertook a retrospective subgroup analysis using tumor tissue samples preserved from their initial study. Following laboratory analysis, all viable tissue samples were classified as having a wild-type (non-mutated) or a mutated KRAS gene. Instead of the previous two study arms (cetuximab plus best supportive care vs. best supportive care alone), there were 4 for this new analysis: each of the two original study arms was further divided by wild-type vs. mutated KRAS status. Laboratory evaluation determined that 40.9% of patients in the cetuximab plus best supportive care group and 42.3% of patients in the best supportive care alone group had a KRAS mutation. The efficacy of cetuximab was found to be significantly correlated with KRAS status: in patients with wild-type (non-mutated) KRAS genes, cetuximab plus best supportive care compared to best supportive care alone improved overall survival (median 9.5 vs. 4.8 months, respectively; hazard ratio for death=0.55; 95% CI, 0.41-0.74, P<0.001), and progression-free survival (median 3.7 vs. 1.9 months, respectively; hazard ratio for progression or death=0.40; 95% CI, 0.30-0.54, P<0.001). Meanwhile, in patients with mutated KRAS tumors, the authors found no significant difference in outcome between cetuximab plus best supportive care vs. best supportive care alone.

What next?

Based on these and similar results from other studies, the FDA narrowed its product labeling in July 2009 to indicate that cetuximab is not recommended for mCRC patients with mutated KRAS tumors. This distinction reduces the relevant population by approximately 40%.
Similarly, the American Society of Clinical Oncology released a provisional clinical recommendation that all mCRC patients have their tumors tested for KRAS status before receiving anti-EGFR therapy. The benefits of targeted treatment are many. Patients who previously underwent cetuximab therapy without knowing their genetic predisposition would no longer have to be exposed to the drug's toxic effects if unnecessary, as the efficacy of cetuximab is markedly higher in the genetically defined appropriate patients. In a less-uncertain environment, clinicians can be more confident in advocating a course of action in their care of patients. And finally, knowledge that targeted therapy is possible suggests the potential for further innovation in treatment options. In fact, research continues to demonstrate options for targeted cetuximab treatment of mCRC at an even finer scale than seen with KRAS; and similar genetic targeting is being investigated, and advocated, in other cancer types.

Lessons Learned From this Case Study

Although RCTs are generally viewed as the gold standard, results of one or even a series of trials may not accurately reflect the benefits experienced by an individual patient. This case-study suggests that cetuximab initially appeared to have rather modest clinical benefits. However, new information that became available and subsequent genetic subgroup assessments led to very different conclusions. Clinicians should be aware that the current knowledge is likely to evolve and any decisions about patient care should be carefully considered with that sense of uncertainty in mind. As in this case study, subgroup analyses (e.g., genetic subtypes) need a theoretical rationale. Ideally, the analyses should be determined at the time of original RCT design and should not just occur as explorations of the subsequent data. When improperly employed, post hoc analyses may lead to incorrect patient care conclusions.
RCTs: Tips for the CER Practitioners

o RCTs can determine whether an intervention can provide benefit in a very controlled environment.
o The controlled nature of an RCT may limit its generalizability to a broader population.
o No results are permanent; advances in scientific knowledge and understanding can influence how we view the effectiveness (or safety) of a therapeutic intervention.
o Targeted therapy illuminated by carefully thought out subgroup analyses can improve the efficacious and safe use of an intervention.

Case-Study 2: The Rosiglitazone Study [3]

[3] http://www.nejm.org/doi/full/10.1056/NEJMoa072761

Meta-analysis

Often the results for the same intervention differ across clinical trials and it may not be clear whether one therapy provides more benefit than another. As CER increases and more studies are conducted, clinicians and policymakers are more likely to encounter this scenario. In a systematic review, a researcher identifies similar studies and displays their results in a table, enabling qualitative comparisons across the studies. With a meta-analysis, the data from included studies are statistically combined into a single "result." Merging the data from a number of studies increases the effective sample size of the investigation, providing a statistically stronger conclusion about the body of research. By so doing, investigators may detect low frequency events and demonstrate more subtle distinctions between therapeutic alternatives.

When studies have been properly identified and combined, the meta-analysis produces a summary estimate of the findings and a confidence interval that can serve as a benchmark in medical opinion and practice. However, when done incorrectly, the quantitative and statistical analysis can create impressive "numbers" but biased results. The following are important criteria for properly conducted meta-analyses:

1. Carefully defining unbiased inclusion or exclusion criteria for study selection
2. Including only those studies that have similar design elements, such as patient population, drug regimen, outcomes being assessed, and timeframe
3. Applying correct statistical methods to combine and analyze the data

Reporting this information is essential for the reader to determine whether the data were suitable to combine, and if the meta-analysis draws unbiased conclusions. Meta-analyses of randomized clinical trials are considered to be the highest level of medical evidence as they are based upon a synthesis of rigorously controlled trials that systematically reduce bias and confounding. This technique is useful in summarizing available evidence and will likely become more common in the era of publicly funded comparative effectiveness research. The following case study will examine several key principles that will be useful as the reader encounters these publications.

Clinical Application

Heart disease is the leading cause of mortality in the United States, resulting in approximately 20% of all deaths. Diabetics are particularly susceptible to heart disease, with more than 65% of deaths attributable to it. The nonfatal complications of diabetes are wide-ranging and include kidney failure, nerve damage, amputation, stroke and blindness, among other outcomes. In 2007, the total estimated cost of diabetes in the United States was $174B; $116B was derived from direct medical expenditures and the rest from the indirect cost of lost productivity due to the disease. With such serious health effects and heavy direct and indirect costs tied to diabetes, proper disease management is critical. Historically, diabetes treatment has focused on strict blood sugar control, assuming that this goal not only targets diabetes but also reduces other serious comorbidities of the disease. Anti-diabetic agents have long been associated with key questions as to their benefits/risks in the treatment of diabetes.
The sulfonylurea tolbutamide, a first-generation anti-diabetic drug, was found in a landmark study in the 1970s to significantly increase the CV mortality rate compared to patients not on this agent. Further analysis by external parties concluded that the methods employed in this trial were significantly flawed (e.g., use of an “arbitrary” definition of diabetes status, heterogeneous baseline characteristics of the populations studied, and incorrect statistical methods). Since these early studies, CV concerns have continued to be an issue with selected oral hypoglycemic agents that have subsequently entered the marketplace. A class of drugs, the thiazolidinediones (TZDs), was approved in the late 1990s as a solution to the problems associated with the older generation of sulfonylureas. Rosiglitazone, a member of the TZD class, was approved by the FDA in 1999 and was widely prescribed for the treatment of type 2 diabetes. A number of RCTs supported the benefit of rosiglitazone as an important new oral antidiabetic agent. However, safety concerns developed as the FDA received reports of adverse cardiac events potentially associated with rosiglitazone. It was in this setting that a meta-analysis by Nissen and Wolski was published in the New England Journal of Medicine in June 2007.

What was done?
Nissen and Wolski conducted a meta-analysis examining the impact of rosiglitazone on cardiac events and mortality compared to alternative therapeutic approaches. The study began with a broad search to locate potential studies for review. The authors screened published phase II, III, and IV trials; the FDA website; and the drug manufacturer’s clinical-trial registry for applicable data relating to rosiglitazone use. When the initial search was complete, the studies were further categorized by prestated inclusion criteria.
Meta-analysis inclusion criteria were simple: studies had to include rosiglitazone and a randomized comparator group treated with either another drug or placebo, study arms had to show similar length of treatment, and all groups had to have received more than 24 weeks of exposure to the study drugs. The studies had to contain outcome data of interest, including the rate of myocardial infarction (MI) or death from all CV causes. Out of 116 studies surveyed by the authors, 42 met their inclusion criteria and were included in the meta-analysis. Of the studies they included, 23 had durations of 26 weeks or less, and only five studies followed patients for more than a year. Until this point, the study’s authors were following a path similar to that of any reviewer interested in CV outcomes, examining the results of these 42 studies and comparing them qualitatively. Quantitatively combining the data, however, required the authors to make choices about the studies they could merge and the statistical methods they should apply for analysis. Those decisions greatly influenced the results that were reported.

What was found?
When the studies were combined, the meta-analysis contained data from 15,565 patients in the rosiglitazone group and 12,282 patients as comparators. To analyze their data, the authors chose one particular statistical method (the Peto odds ratio method, a fixed-effect statistical approach), which estimates a pooled odds ratio and is intended for settings where the outcomes of interest are rare and small in number. In comparing rosiglitazone with a “control” group that included other drugs or placebo, the authors reported odds ratios of 1.43 (95% CI, 1.03-1.98; P=0.03) and 1.64 (95% CI, 0.98-2.74; P=0.06) for MI and death from CV causes, respectively. In other words, the odds of an MI or death from a CV cause are higher for rosiglitazone patients than for patients on other therapies or placebo.
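To make the Peto fixed-effect calculation described above concrete, the following is a minimal Python sketch, not the authors' actual code. It pools 2x2 tables using the standard observed-minus-expected form of the Peto estimator. The function name `peto_odds_ratio` and the trial counts in `toy_trials` are hypothetical and chosen only for illustration; they are not data from the Nissen and Wolski analysis.

```python
import math


def peto_odds_ratio(tables, z=1.96):
    """Pooled Peto odds ratio with an approximate 95% confidence interval.

    tables: iterable of (events_trt, n_trt, events_ctl, n_ctl), one per trial.
    Returns (odds_ratio, ci_lower, ci_upper).
    """
    sum_o_minus_e = 0.0  # sum over trials of observed minus expected events
    sum_v = 0.0          # sum over trials of hypergeometric variances
    for a, n1, c, n2 in tables:
        n = n1 + n2       # total patients in the trial
        m = a + c         # total events in the trial
        expected = n1 * m / n
        variance = n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    if sum_v == 0:
        raise ValueError("no events in any trial; the Peto estimate is undefined")
    log_or = sum_o_minus_e / sum_v
    se = 1 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts (NOT from the rosiglitazone trials):
# (events on treatment, treatment arm size, events on control, control arm size)
toy_trials = [
    (2, 200, 1, 200),
    (5, 1000, 3, 500),
    (0, 150, 0, 150),   # a trial with no events in either arm
    (3, 300, 1, 150),
]

odds_ratio, ci_low, ci_high = peto_odds_ratio(toy_trials)
print(f"Peto OR = {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Note that a trial with zero events in both arms has both an observed-minus-expected term and a variance of zero, so under this estimator it contributes nothing to the pooled result regardless of its size.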
The authors reported that rosiglitazone was significantly associated with an increase in the risk of MI and had borderline significance in increasing the risk of death from all CV causes. These findings appeared online on the same day that the FDA issued a safety alert regarding rosiglitazone. Discussion of the meta-analysis was immediately featured prominently in the news media. By December 2007, prescription claims for the drug at retail pharmacies had fallen by more than 50%. As diabetic patients and their clinicians reacted to the news, a methodologic debate also ensued. This discussion included statistical issues pertaining to the conduct of the analysis, its implications for clinical care, and finally the roles of the FDA and the drug manufacturer in overseeing and regulating rosiglitazone. The concern among patients with diabetes regarding treatment continues in the medical community today.

Was this the right answer?

Should the studies have been combined?
Commentators faulted the authors for including several studies that were not originally intended to investigate diabetes, and for combining both placebo and drug therapy data into one comparator arm. Some critics noted that despite the stated inclusion criteria, some data were derived from studies where the rosiglitazone arm was allowed a longer follow-up than the comparator arm. By failing to account for this longer follow-up period, commentators felt that the authors may have overestimated the effect of rosiglitazone on CV outcomes. Many reviewers were concerned that this meta-analysis excluded trials in which no patients suffered an MI or died from CV causes, the outcomes of greatest interest. Some reviewers also noted that the exclusion of zero-event trials from the pooled dataset not only gave an incomplete picture of the impact of rosiglitazone but could have increased the odds ratio estimate.
In general, the pooled dataset was criticized by many for being a faulty microcosm of the information available regarding rosiglitazone. It is essential that a meta-analysis be based on similarity in the data sources. If studies differ in important areas such as the patient populations, interventions, or outcomes, combining their data may not be suitable. The researchers accepted studies and populations that were clinically heterogeneous, yet pooled them as if they were not. The study reported that the results were combined from a number of trials that were not initially intended to investigate CV outcomes. Furthermore, the available data did not allow for time-to-event analysis, an essential tool in comparing the impact of alternative treatment options. Reviewers considered the data to be insufficiently homogeneous, and the line of cause and effect to be murkier than the authors described.

Were the statistical methods optimal?
The statistical methods for this meta-analysis also came under significant criticism. The critiques focused on the authors’ use of the Peto method as an incorrect choice because data were pooled from both small and very large studies, resulting in a potential overestimation of treatment effect. Other reviewers pointed out that the Peto method should not have been used, as a number of the underlying studies did not have patients assigned equally to rosiglitazone and comparator groups. Finally, critics suggested that the heterogeneity of the included studies required an altogether different set of analytic techniques. Demonstrating the sensitivity of the authors’ initial analysis to the inclusion criteria and statistical tests used, a number of researchers reworked the data from this study. One researcher used the same studies but analyzed the data with a more commonly used statistical method (Mantel-Haenszel), and found no significant increase in the relative risk or common odds ratio with MI or CV death.
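The sensitivity concerns raised above (the choice of pooling method and the handling of trials with no events) can be illustrated with a short Python sketch of the Mantel-Haenszel pooled odds ratio. This is an illustration under stated assumptions, not a reconstruction of any published reanalysis: the function name `mantel_haenszel_or`, the `correction` parameter, and the counts in `toy_trials` are hypothetical, and adding 0.5 to each cell of a zero-event trial is only one of several debated continuity corrections.

```python
def mantel_haenszel_or(tables, correction=0.0):
    """Pooled Mantel-Haenszel odds ratio (point estimate only).

    tables: iterable of (events_trt, n_trt, events_ctl, n_ctl), one per trial.
    correction: value added to every cell of trials that have no events in
        either arm (0.0 leaves such trials uninformative).
    """
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n2 in tables:
        b = n1 - a  # treatment patients without the event
        d = n2 - c  # control patients without the event
        if a == 0 and c == 0 and correction > 0:
            # Assumed continuity correction for double-zero trials.
            a, b, c, d = (x + correction for x in (a, b, c, d))
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("no control-arm events; the pooled odds ratio is undefined")
    return numerator / denominator


# Hypothetical counts (NOT from the rosiglitazone trials).
toy_trials = [
    (4, 400, 2, 400),
    (6, 900, 2, 450),
    (0, 300, 0, 300),   # zero events in both arms
    (0, 250, 0, 250),   # zero events in both arms
]

print("MH OR, zero-event trials uninformative:", round(mantel_haenszel_or(toy_trials), 2))
print("MH OR, 0.5 continuity correction:      ", round(mantel_haenszel_or(toy_trials, 0.5), 2))
```

In this toy example the corrected estimate is closer to 1 than the uncorrected one, which mirrors the reviewers' point that leaving out zero-event trials can inflate the pooled odds ratio. With unequal arm sizes a constant correction can itself introduce bias, so the choice should be reported and varied.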
When the pool of studies was expanded to include those originally eliminated because they had zero CV events, the odds ratios for MI and death from CV causes dropped from 1.43 to 1.26 (95% CI, 0.93-1.72) and from 1.64 to 1.14 (95% CI, 0.74-1.74), respectively. Neither of the recalculated odds ratios was significant. Finally, several newer long-term studies have been published since the Nissen meta-analysis. Incorporating their results with the meta-analysis data showed that rosiglitazone is associated with an increased risk of MI but not of CV death. Thus, the findings from these meta-analyses varied with the methods employed, the studies included, and the addition of later trials.

Emerging Data
The controversy surrounding the rosiglitazone meta-analysis authored by Nissen and Wolski forced an unplanned interim analysis of a long-term, randomized trial investigating the CV effects of rosiglitazone among patients with type 2 diabetes. The authors of the RECORD trial noted that even though the follow-up at 3.75 years was shorter than expected, rosiglitazone, when added to standard glucose-lowering therapy, was found to be associated with an increase in the risk of heart failure but was not associated with any increase in death from CV or other causes. Data at the time were found to be insufficient to determine the effect of rosiglitazone on the risk of MI. The final report of that trial, published in June 2009, confirmed the elevated risk of heart failure in people with type 2 diabetes treated with rosiglitazone in addition to glucose-lowering drugs, but continued to show inconclusive results about the effect of the drug therapy on the risk of MI. Further, the RECORD trial clarified that rosiglitazone does not result in an increased risk of CV morbidity or mortality compared to standard glucose-lowering drugs.
Other trials conducted since the publishing of the meta-analysis have corroborated these results, casting further doubt on the findings of the meta-analysis published by Nissen and Wolski.

Now what?
Some sources suggest that the original Nissen meta-analysis delivered more harm than benefit, and that a well-recognized medical journal may have erred in its process of peer review. Despite this criticism, it is important to note that subsequent publications support the risk of adverse CV events associated with rosiglitazone, although rosiglitazone use does not appear to increase deaths. These results and emerging data point to the need for further rigorous research to clarify the benefits and risks of rosiglitazone on a variety of outcomes, and the importance of directing the drug to the population that will maximally benefit from its use.

Lessons Learned From This Case Study
Results from initial randomized trials that seem definitive at one time may not be conclusive, as further trials may emerge to clarify, redirect, or negate previously accepted results. A meta-analysis of those trials can lead to varying results based upon the timing of the analysis and the choices made in its performance.

Meta-Analysis: Tips for CER Practitioners
o The results of a meta-analysis are highly dependent on the studies included (and excluded). Are these criteria properly defined and relevant to the purposes of the meta-analysis? Were the combined studies sufficiently similar? Can results from this cohort be generalized to other populations of interest?
o The statistical methodology can impact study results. Have there been reviews critiquing the methods used in the meta-analysis?
o A variety of statistical tests should be considered, and perhaps reported, in the analysis of results. Do the authors mention their rationale in choosing a statistical method? Do they show the stability of their results across a spectrum of analytical methods?
o Nothing is permanent. Emerging data may change the playing field, and meta-analysis results are only as good as the data and statistics from which they are derived.

Case-Study 3: The Nurses’ Health Study [4]

An observational study
An observational study is a very common type of research design in which the effects of a treatment or condition are studied without formally randomizing patients in an experimental design. Such studies can be done prospectively, wherein data are collected about a group of patients going forward in time; or retrospectively, in which the researcher looks into the past, mining existing databases for data that have already been collected. Retrospective studies are frequently performed by using an electronic database that contains, for example, administrative, “billing,” or claims data. Less commonly, observational research uses electronic health records, which have greater clinical information that more closely resembles the data collected in an RCT. Observational studies often take place in “real-world” environments, which allow researchers to collect data for a wide array of outcomes. Patients are not randomized in these studies, but the findings can be used to generate hypotheses for investigation in a more constrained experimental setting. Perhaps the best known observational study is the “Framingham study,” which collected demographic and health data for a group of individuals over many years (and continues to do so) and has provided an understanding of the key risk factors for heart disease and stroke. Observational studies present many advantages to the comparative effectiveness researcher. The study design can provide a unique glimpse of the use of a health care intervention in the “real world,” an essential step in gauging the gap between efficacy (can a treatment work in a controlled setting?) and effectiveness (does the treatment work in a real-life situation?).
Furthermore, observational studies can be conducted at low cost, particularly if they involve the secondary analysis of existing data sources. CER often uses administrative databases, which are based upon the billing data submitted by providers during routine care. These databases typically have limited clinical information, may have errors in them, and generally do not undergo auditing. The uncontrolled nature of observational studies leaves them subject to bias and confounding. For example, doctors may prescribe a new medication only for the sickest patients. Comparing these outcomes (without careful statistical adjustment) with those from less ill patients receiving alternative treatment may lead to misleading results. Observational studies can identify important associations but cannot prove cause and effect. These studies can generate hypotheses that may require RCTs for fuller demonstration of those relationships. Secondary analysis can also be problematic if researchers overwork datasets by doing multiple exploratory analyses (e.g., data dredging): the more we look, the more we find, even if those findings are merely statistical aberrations. Unfortunately, the growing need for CER and the wide availability of administrative databases may lead to selection of research of poor quality with inaccurate findings.

[4] http://jech.bmj.com/content/59/9/740.short

In comparative effectiveness research, observational studies are typically considered to be less conclusive than RCTs and meta-analyses. Nonetheless, they can be useful, especially because they examine typical care. Due to lower cost and improvements in health information, observational studies will become increasingly common. Critical assessment of whether the described results are helpful or biased (based upon how the study was performed) is necessary. This case will illustrate several characteristics of these types of studies that will assist in evaluating newly published work.
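The confounding example above (a new medication prescribed mainly to the sickest patients) can be sketched numerically. The Python snippet below is an illustration only: the cohort in `hypothetical_cohort` is invented, and the function names are assumptions rather than part of any study discussed here. It contrasts a crude risk ratio, which ignores disease severity, with a Mantel-Haenszel risk ratio stratified by severity, one simple form of the "careful statistical adjustment" mentioned in the text.

```python
def crude_risk_ratio(strata):
    """Risk ratio after collapsing all severity strata into one table."""
    events_new = sum(s[0] for s in strata.values())
    n_new = sum(s[1] for s in strata.values())
    events_alt = sum(s[2] for s in strata.values())
    n_alt = sum(s[3] for s in strata.values())
    return (events_new / n_new) / (events_alt / n_alt)


def mantel_haenszel_risk_ratio(strata):
    """Severity-adjusted risk ratio using Mantel-Haenszel weights."""
    numerator = 0.0
    denominator = 0.0
    for events_new, n_new, events_alt, n_alt in strata.values():
        total = n_new + n_alt
        numerator += events_new * n_alt / total
        denominator += events_alt * n_new / total
    return numerator / denominator


# Invented cohort: stratum -> (events on new drug, patients on new drug,
#                              events on alternative, patients on alternative)
# The new drug is given mostly to severe patients.
hypothetical_cohort = {
    "mild":   (2, 100, 30, 900),
    "severe": (180, 900, 25, 100),
}

print("Crude risk ratio:   ", round(crude_risk_ratio(hypothetical_cohort), 2))
print("Adjusted risk ratio:", round(mantel_haenszel_risk_ratio(hypothetical_cohort), 2))
```

In this invented cohort the crude risk ratio is about 3.3, suggesting harm, while the severity-adjusted ratio is about 0.78, because within each severity stratum patients on the new drug do better. Adjustment only removes confounding from factors that were measured, which is why such findings are usually treated as hypotheses for RCTs.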
Clinical Applications
Cardiovascular diseases (CVD) are the leading cause of death in women older than the age of 50. Epidemiologic evidence suggests that estrogen is a key mediator in the development of CVD. Estrogen is an ovarian hormone whose production decreases as women approach menopause. The steep increase in CVD in women at menopause and older and in women who have had hysterectomies further supports a relationship between estrogen and CVD. Building on this evidence of biologic plausibility, epidemiological and observational studies suggested that estrogen replacement therapy (a form of hormone replacement therapy, or HRT) had positive effects on the risk of CVD in postmenopausal women (albeit with some negative effects in its potential to increase the risk for breast cancer and stroke) [65]. Based on these findings, in the 1980s and 1990s HRT was routinely employed to treat menopausal symptoms and serve as prophylaxis against CVD.

What was done?
The Nurses’ Health Study (NHS) began collecting data in 1976. In the study, researchers intended to examine a broad range of health effects in women over a long period of time, and a key goal was to clarify the role of HRT in heart disease. The cohort (i.e., the group being followed) included married registered nurses aged 30-55 in 1976 who lived in the 11 most populous states. To collect data, the researchers mailed the study participants a survey every 2 years that asked questions about topics such as smoking, hormone use, menopausal status, and, less frequently, diet. Data were collected for key end points that included MI, coronary-artery bypass grafting or angioplasty, stroke, total CVD mortality, and deaths from all causes.

What was found?
At a 10-year follow-up point, the NHS had a study pool of 48,470 women. The researchers found that estrogen use (alone, without progestin) in postmenopausal women was associated with a reduction in the incidence of CVD as well as in CVD mortality compared to non-users.
Later, estrogen-progestin combination therapy was shown to be even more cardioprotective than estrogen monotherapy, and lower doses of estrogen replacement therapy were found to deliver equal cardioprotection and lower the risk for adverse events. NHS researchers were alert to the potential for bias in observational studies. Adjustment for risk factors such as age (a typical practice to reduce confounding) did not change the reported findings.

Was this the right answer?
The NHS was not unique in reporting the benefits associated with HRT; other observational studies corroborated the NHS findings. A secondary retrospective data analysis of the UK primary care electronic medical record database, for example, also showed the protective effect associated with HRT use. Researchers were aware of the fundamental limitations of observational studies, particularly with regard to selection bias. They and practicing clinicians were also aware of the potential negative health effects of HRT, which had to be constantly weighed against the potential cardioprotective benefits in deciding a patient’s course of treatment. As a large section of the population could experience the health effects of HRT, researchers began planning RCTs to verify the promising observational study results. It was highly anticipated that those RCTs would corroborate the belief that estrogen replacement can reduce CVD risk.

Randomized Controlled Trial: The Women’s Health Initiative
The Women’s Health Initiative (WHI) was a major study established by the National Institutes of Health in 1992 to assess a broad range of health effects in postmenopausal women. The trial was intended to follow these women for 8 years, at a cost of millions of dollars in federal funding. Among its many facets, it included an RCT to confirm the results from the observational studies discussed above. To fully investigate earlier findings, the WHI had two subgroups.
One subgroup consisted of women with prior hysterectomies; they received estrogen monotherapy. The second group consisted of women who had not undergone hysterectomy; they received estrogen in combination with progestin. The WHI enrolled 27,347 women in its HRT investigation: 10,739 in the estrogen-alone arm and 16,608 in the estrogen plus progestin arm. Within each arm, women were randomly assigned to receive either HRT or placebo. All women in the trial were postmenopausal and aged 50-79 years; the mean age was 63.6 years (a fact that would be important in later analysis). Some participants had experienced previous CV events. The primary outcome of both subgroups was coronary heart disease (CHD), defined as nonfatal MI or death due to CHD. The estrogen-progestin arm of the WHI was halted after a mean follow-up of 5.2 years, 3 years earlier than expected, as the HRT users in this arm were found to be at increased risk for CHD compared to those who received placebo. The study also noted elevated rates of breast cancer and stroke, among other poor outcomes. The estrogen-alone arm continued for an average follow-up of 6.8 years before being similarly discontinued ahead of schedule. Although this part of the study did not find an increased risk of CHD, it also did not find any cardioprotective effect. Beyond failing to locate any clear CV benefits, the WHI also found real evidence of harm, including increased risk of blood clots, breast cancer and stroke. Initial WHI publications therefore recommended against HRT being prescribed for the secondary prevention of CVD.

What Next?
Scientists, and the clinicians who relied on their data for guidance in treating patients, were faced with conflicting data: epidemiological and observational studies suggested that HRT was cardioprotective, while the higher-quality evidence from RCTs strongly suggested the opposite.
Clinicians primarily followed the WHI results, so prescriptions for HRT in postmenopausal women quickly declined. Meanwhile, researchers began to analyze the studies for potential discrepancies, and found that the women being followed in the NHS and the WHI differed in several important characteristics. First, the WHI population was older than the NHS cohort, and many had entered menopause at least 10 years before they enrolled in the RCT. Thus, the WHI enrollees experienced a long duration from the onset of menopause to the commencement of HRT. By contrast, many in the NHS population were closer to the onset of menopause and were still displaying hormonal symptoms when they began HRT. Second, although the NHS researchers adjusted the data for various confounding effects, their results could still have been subject to bias. In general, the NHS cohort was more highly educated and of a higher socioeconomic status than the WHI participants, and therefore more likely to see a physician regularly. The NHS women were also leaner and generally healthier than their RCT counterparts, and had been selected for their evident lack of pre-existing CV conditions. This selection bias in the NHS enrollment may have led to a “healthy woman” effect that in turn led to an overestimation of the benefits of therapy in the observational study. Third, researchers noted that dosing differences between the two study types may have contributed to the divergent results. The NHS reported beneficial results following low-dose estrogen therapy. The WHI, meanwhile, used a higher estrogen dose, exposing women to a larger dosage of hormones and increasing their risk for adverse events. The increased risk profile of the WHI women (e.g., older, more comorbidities, higher estrogen dose) could have contributed to the evidence of harm seen in the WHI results.
Emerging Data
In addition to identifying the inherent differences between the two study populations, researchers began a secondary analysis of the NHS and WHI data. NHS researchers reported that women who began HRT close to the onset of menopause had a significantly reduced risk of CHD. In the subgroups of women who were older and had a similar duration after menopause compared with the WHI women, they found no significant relationship between HRT and CHD. Also, the WHI study further stratified these results by age, and found that women who began HRT close to their onset of menopause experienced some cardioprotection, while women who were further from the onset of menopause had a slightly elevated risk for CHD. Secondary analysis of both studies was therefore necessary to show that age and a short duration from the onset of menopause are crucial to HRT success as a cardioprotective agent. Neither study type provided “truth”; or rather, both studies provided “truth” if viewed carefully (i.e., both produced valid and important results). The differences seen in the studies were rooted in the timing of HRT and the populations being studied [81].

Lessons Learned From This Case Study
Although RCTs are given a higher evidence grade, observational studies provide important clinical insights. In this example, the study populations differed. For policymakers and clinicians, it is crucial to examine whether the CER was based upon patients similar to those being considered. Any study with a dissimilar population may provide non-relevant results. Thus, readers of CER need to carefully examine the generalizability of the findings being reported.