Speed Dating with Regression Procedures
David J Corliss, PhD
Wayne State University
Physics and Astronomy / Public Outreach

Model Selection Flowchart
[Figure: flowchart with branches NON-LINEAR, LINEAR, MIXED, NON-PARAMETRIC; decision node "Continuous or Discrete Outcome"; procedures shown: PROC LOGISTIC, PROC REG]

Simple Linear Regression
• Regression Type: Continuous, linear
• General regression procedure with a number of options but limited specialized capabilities, for which other procedures or packages have been developed
• Choice of model variable selection methods (e.g., Forward, Backwards, Best Subsets); can be coded for polynomial regression; supports multiple model statements and features interactive capability
• SAS = REG, R = lm function, regress

Simple Linear Regression Example: Homeless Students by State
[Figure: Actual Percent vs. Model - Percent of Student Population; r2 = .652]
Solid performance of the model across the range from low to high homelessness states indicates consistency of factors correlated with the number of homeless students

Special Data Needs: Problems with Outliers
Robust Regression
• Regression Type: Continuous, linear
• Robust regression is achieved by identifying outliers, limiting their influence by assigning weights and then performing standard regression
• Choice of methods for outlier detection, e.g.
M, LTS, S and MM estimation; robust ANOVA
• SAS = ROBUSTREG, R = robustbase, robust

PROC ROBUSTREG Example: Log-Log Regression With Weighted Outliers
SAS/STAT® 9.2 User’s Guide, support.sas.com
In Robust Regression, the outliers need not be disregarded: weights can be assigned and incorporated in the regression

Special Data Needs: Ill-Conditioned Data
Regression Using Givens Rotations
• Regression Type: Continuous, linear
• Regression using the Gentleman-Givens procedure instead of collecting crossproducts
• For ill-conditioned data, where small errors in the data may cause large errors in the results; more accurate than simple regression
• SAS = ORTHOREG, R = givens

Givens Rotation Regression Example: Fitting a Higher-Order Polynomial
SAS/STAT® 9.2 User’s Guide, support.sas.com
An example of fitting a 9th-degree polynomial, where near singularities must be distinguished from true ones

Special Data Needs: Transformation
Regression with Data Transformation
• Regression Type: Continuous, linear
• Regression with a number of data transformations, including smooth, spline, Box-Cox and other nonlinear forms
• Supports fitting splines with a user-specified degree and number of knots; capable of piece-wise solutions
• SAS = TRANSREG, R = reg, betareg

Regression with Data Transformation Example: Spline Regression to a Complex Form
Splines used to fit a spectrographic line profile to determine the radial velocity of erupting gas from a star

Special Model Types: General Linear
General Linear Models
• Regression Type: Continuous, linear
• General purpose procedure for continuous least squares regression using classification predictor variables as well as continuous ones
• While capable of many types of models and analysis, another procedure is often better for a specific task
• SAS = GLM, R = glm function

General Linear Model Example: Age Group as a Categorical Predictor Variable
[Figure: Distribution of Response]
An Overview of ODS Statistical Graphics in SAS® 9.3, Robert N.
Rodriguez, SAS Institute Inc., Cary, NC
[Figure axis label: agegroup]
GLM used with Box and Whisker output

Special Model Types: By Quantile
Quantile Regression
• Regression Type: Continuous, linear
• While other procedures model the mean, quantile regression models the median and other specified quantiles to provide a more complete picture of the response variable
• Uncertainties for individual quantiles can be estimated by bootstrapping
• SAS = QUANTREG, R = quantreg

Quantile Regression Example: 5/10/25/50/75/90/95% Quantiles
[Figure: Predicted birth weight by maternal weight gain]
Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY
An example of Quantile Regression demonstrating greater detail than is possible with ordinary regression

Special Model Types: PLS, PCA Regression
Partial Least Squares & Principal Components
• Regression Type: Continuous, linear
• Partial Least Squares and Principal Component regression: predictor and response variables are projected into a new coordinate system, possibly with reduced complexity
• Supports reduced rank regression with cross validation of the number of components
• SAS = PLS, R = pls

Partial Least Squares / Principal Components Example: Variable Importance Plot
Quantile regression with PROC QUANTREG, Peter L.
Flom, Peter Flom Consulting, New York, NY
Principal Component variables derived from the original, observed variables

Special Model Types: Survey Data
Survey Regression
• Regression Type: Continuous, linear
• Special capabilities for analysis in the presence of common survey data features, including stratification, clustering and weighting
• Supports several methods for sampling and estimation of sampling error using either Taylor series or primary sample units
• SAS = SURVEYREG, R = survey

Survey Regression Example: Regression with Stratified Sampling

Stratum Information
Stratum Index  State     Region  N Obs  Population Total  Sampling Rate
1              Iowa      1       3      100               3.00%
2                        2       5      50                10.0%
3                        3       3      15                20.0%
4              Nebraska  1       6      30                20.0%
5                        2       2      40                5.00%

Estimated Regression Coefficients
Parameter  Estimate    Standard Error  t Value  Pr > |t|
Intercept  11.8162978  5.31981027      2.22     0.0433
FarmArea   0.2126576   0.04560949      4.66     0.0004

Tests of Model Effects
Effect     Num DF  F Value  Pr > F
Model      1       21.74    0.0004
Intercept  1       4.93     0.0433
FarmArea   1       21.74    0.0004
Note: The denominator degrees of freedom for the F tests is 14.

Covariance of Estimated Regression Coefficients
           Intercept     FarmArea
Intercept  28.300381277  -0.146471538
FarmArea   -0.146471538  0.0020802259

PROC SURVEYREG, sas.support.com, example 98.4
Example output from application to survey data, with summary statistics and model parameters

Special Model Types: PH on Survey Data
Proportional Hazards with Survey Data
• Regression Type: Continuous, linear
• Performs Cox Proportional Hazards modeling on survey data with truncation, supporting stratification, clustering and weighting
• Estimates the variance of model parameters by Taylor series, BRR or Jackknife
• SAS = SURVEYPHREG, R = survey

Proportional Hazards with Survey Data Example: Stratified Sampling with Truncated Data

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error  t Value  Pr > |t|  Hazard Ratio
BodyWeight  586  0.011920   0.003155        3.78     0.0002    1.012
Smoke -1    586  -1.174048  0.739450        -1.59    0.1129    0.309
Smoke 1     586  -1.006515  0.578810        -1.74    0.0826    0.365
Smoke 2     586  -0.674183  0.558412        -1.21    0.2278    0.510
Smoke 3     586  0          .               .        .         1.000

Type III Tests of Model Effects
Effect      Num DF  Den DF  F Value  Pr > F
BodyWeight  1       586     14.27    0.0002
Smoke       3       586     1.49     0.2160

Estimate
Label  Estimate  Standard Error  DF   t Value  Pr > |t|  Exponentiated
Row 1  -0.7532   0.3870          586  -1.95    0.0521    0.4709

PROC SURVEYPHREG, sas.support.com, example 97.2
Example output for Proportional Hazards regression on survey data with truncation: summary statistics and model parameters

Special Model Types: Categorical
Regression on Categorical Data
• Regression Type: Continuous, linear
• A generalization of continuous methods to categorical data; performs linear regression and other analyses on data that can be expressed in a contingency table
• Supports both ordinary and logistic regression, loglinear and repeated measures
• SAS = CATMOD, R = catdata, vgam

Regression on Categorical Data Example: Bartlett's Data, No 3-Variable Interaction

Data Summary
Response         Length*Time*Status    Response Levels  8
Weight Variable  wt                    Populations      1
Data Set         BARTLETT              Total Frequency  960
Frequency Missing  0                   Observations     8

Response Profiles
Response  Length  Time  Status
1         1       1     1
2         1       1     2
3         1       2     1
4         1       2     2
5         2       1     1
6         2       1     2
7         2       2     1
8         2       2     2

Maximum Likelihood Analysis of Variance
Source            DF  Chi-Square  Pr > ChiSq
Length            1   2.64        0.1041
Time              1   5.25        0.0220
Length*Time       1   5.25        0.0220
Status            1   48.94       <.0001
Length*Status     1   48.94       <.0001
Time*Status       1   95.01       <.0001
Likelihood Ratio  1   2.29        0.1299

PROC CATMOD, sas.support.com, example 28.4
Example output from regression on categorical data, with summary statistics and model parameters

Special Model Types: Complex Optimization
Response Surface Regression
• Regression Type: Continuous, linear
• Linear regression for fitting quadratic Response Surface Models, a type of general linear model that identifies where optimal response values occur more efficiently than ordinary regression or GLM
• Output displays the Response Surface and identifies ridges of optimum response
• SAS = RSREG, R = rsm

Response Surface Regression Example: A Response Surface with Optimal Solution
Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY
An example of a response surface with the optimal solution found at the minimum; multiple minima and maxima are possible

Special Model Types: Time to Failure
Survival Analysis
• Regression Type: Continuous, linear
• Models time to failure data as a linear combination of predictors and a random disturbance term, which can be described by many different distributions
• Supports standard survival analysis data censored on the right, left, both or neither
• SAS = LIFEREG, R = survival

Survival Analysis Example: A Cumulative Hazard Model
Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY
This example plots the log-logistic vs. the Kaplan-Meier Cumulative Hazard

Special Model Types: Time-dependent Risk
Proportional Hazards Model
• Regression Type: Continuous, linear
• Cox Proportional Hazards modeling, where a unit increase in a predictor multiplies the risk by a factor determined by the model
• Supports proportional hazards models with data censored on the right, left, both or neither; variable selection by multiple methods incl.
best subset
• SAS = PHREG, R = coxph

Proportional Hazards Model Example: Model With Time-Dependent Predictors
Example output from a Proportional Hazards model, with summary statistics and model parameters

Special Model Types: Simultaneous Outcomes
Structural Equation Models
• Regression Type: Continuous, linear
• In Structural Equation Modeling, a linear combination of predictors describes a vector equal to a linear combination of outcome variables
• Supports latent variables, multiple and multivariate regression, path analysis and canonical correlation
• SAS = CALIS, R = sem

Structural Equation Model Example: Linear Relations among Factor Loadings
Example output from a Structural Equation model, with matrices of model parameters

Discrete Outcomes: Simple Logistic
Logistic Regression
• Regression Type: binary & ordinal outcomes, linear
• General procedure for logistic regression with a number of options; other procedures may offer more capabilities for specific types of discrete models
• Supports many model variable selection methods and diagnostic tests
• SAS = LOGISTIC, R = glm function

Discrete Outcomes: Simple Logistic
Logistic Regression
Data: IDRE / UCLA
Example data and output from a Logistic Regression model, with summary statistics and model parameters

Discrete Outcomes: Generalized
Generalized Linear Models
• Regression Type: discrete outcomes, linear
• Generalized linear models with discrete outcomes, appropriate where the data are not normally distributed or the variance is not the same for all observations
• Supports Poisson Regression and Repeated Measures
• SAS = GENMOD, R = glm function

Discrete Outcomes: Generalized
Generalized Linear Models
Example output from a Generalized Linear Regression model of a discrete outcome, with summary statistics and model parameters

Discrete Outcomes: Outcome Probability
PROBIT Models
• Regression Type: discrete outcomes, linear
• Models the probability that an observation will have a particular outcome
• Supports probit,
logit, ordinal logistic, and extreme value / gompit
• SAS = PROBIT, R = glm, family = binomial(link = "probit")

Discrete Outcomes: Outcome Probability
PROBIT Models
Example data and output from a PROBIT model, with summary statistics and model parameters

Non-Linear Models: General
Non-Linear Models
• Regression Type: non-linear
• Performs non-linear regression with the dependent variable divided into a mean component and a (random) error component; process is iterative
• Supports steepest-descent, Newton, modified Gauss-Newton and Marquardt methods
• SAS = NLIN, R = nls function, nleqslv

Non-Linear Models Example: Fitting a Model to a Complex Curve
In this example observations are normally distributed about a non-linear function, in this case a Morlet wavelet

Non-Linear Models: Mixed Effects
Non-Linear Mixed-Effects Models
• Regression Type: non-linear
• Performs non-linear regression where both the mean and error components of the dependent variable are non-linear; process uses a Taylor series expansion about zero
• Supports normal, binomial and Poisson distributions and capability for programming a general distribution
• SAS = NLMIXED, R = nlme

Non-Linear Mixed-Effects Models Example: Plot of Profile of Trees Over Time
In this example, variability in the shape of observed trees increases over time

Linear Mixed: Fixed and Random Effects
Mixed Models
• Regression Type: linear, fixed and random effects
• Performs linear regression using a linear combination of fixed effects added to a second linear combination of random effects
• Supports repeated measures in longitudinal studies; especially useful for dealing with missing data
• SAS = MIXED, R = lme4, coxme

Linear Mixed-Effects Models Example: Repeated Measures
Example of a Mixed Effects Model, incorporating both fixed and random effects to improve the predictive power

Linear Mixed: General
General Mixed Models
• Regression Type: linear mixed
• Generalization of mixed models to permit normally distributed random
effects and non-normal error terms
• Supports fitting models to correlated data or where the variability is not constant
• SAS = GLIMMIX, R = lme4

General Mixed Models Example: Crossed Random Effects
LOESS with crossed random effects analyzes in-breeding in an isolated population, allowing generalization to all populations

Non-Parametric Models: Localized
Local Regression
• Regression Type: linear, non-parametric
• Develops a model by applying non-parametric regression to segments of data and calculates confidence limits for the outcome; computationally intensive
• Supports multiple dependent variables, multidimensional predictors and interpolation using kd trees
• SAS = LOESS, R = locfit

Local Regression Example: Periodicities in Weather Data
In this example, Local Regression is used to identify potential periodicities at 12 and 42 months

Non-Parametric Models: Additive
Generalized Additive Models
• Regression Type: linear, non-parametric
• Generalized Additive Models, with multiple independent non-parametric predictors; univariate smoothing provides finer details than is possible with the piece-wise LOESS procedure
• Supports non-parametric and semi-parametric models, multidimensional predictors
• SAS = GAM, R = gam

Additive Model Example: Segmented Response Surface
An Additive Model used to fit a complex response surface without loss of detail due to piece-wise fitting in local regression

Questions
[email protected]
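
Supplementary sketch: Local Regression
The Local Regression slide describes fitting a regression to segments of the data. A minimal Python sketch of that idea follows; it is not PROC LOESS or locfit, which offer far more (confidence limits, multidimensional predictors, kd-tree interpolation). The function names, the span parameter, the tricube weighting and the straight-line local fit are all assumptions made for illustration, and the sketch assumes at least three distinct x values.

```python
import math


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(xs, ys, x0, span=0.5):
    """Estimate the response at x0 from a weighted straight-line fit to nearby points.

    Assumes at least three distinct x values. `span` (hypothetical name) is
    the fraction of the data used in each local fit.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))      # neighbourhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # distance to the k-th nearest point

    # Weighted sums for the normal equations, with x centred on x0 so that
    # the fitted intercept is the estimate at x0.
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        dx = x - x0
        w = tricube(dx / h)
        s0 += w
        s1 += w * dx
        s2 += w * dx * dx
        t0 += w * y
        t1 += w * dx * y

    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        # Degenerate neighbourhood (one point carries all the weight):
        # fall back to the weighted mean.
        return t0 / s0
    return (s2 * t0 - s1 * t1) / det


def loess_curve(xs, ys, span=0.5):
    """Smoothed value at every observed x."""
    return [local_linear_fit(xs, ys, x0, span) for x0 in xs]
```

A smaller span follows the data more closely, while a larger span gives a smoother curve. Because each local fit is a straight line, data that are exactly linear are reproduced exactly.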