Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Secondary Data Analysis Linda K. Owens, PhD Assistant Director for Sampling and Analysis Survey Research Laboratory University of Illinois What is secondary data? • Data collected by a person or organization other than the users of the data Survey Research Laboratory 2 of 27 Advantages of Secondary Data • Unobtrusive • Fast & inexpensive • Avoid data collection problems • Provide bases for comparison Survey Research Laboratory 3 of 27 Disadvantages of Secondary Data • Data availability • Level of observation • Quality of documentation • Data quality control • Outdated data Survey Research Laboratory 4 of 27 Data Sources Inter-university Consortium for Political and Social Research (ICPSR) http://www.icpsr.umich.edu/index-medium.html National Center for Health Statistics (NCHS) http://www.cdc.gov/nchs/default.htm Center for Medicare and Medicaid Services (CMS) http://cms.hhs.gov/researchers/ US Census Bureau http://www.census.gov/main/www/access.html Survey Research Laboratory 5 of 27 Data Sources (cont.) Examples of Directly Downloadable Data from NCHS: National Health and Nutrition Examination Survey (NHANES) National Ambulatory Medical Care Survey (NAMCS) National Hospital Ambulatory Medical Care Survey (NHAMCS) National Hospital Discharge Survey (NHDS) National Home and Hospice Care Survey (NHHCS) National Nursing Home Survey (NNHS) National Survey of Ambulatory Surgery (NSAS) National Employer Health Insurance Survey (NEHIS) National Vital Statistics System (NVSS) National Health Interview Survey (NHIS) Survey Research Laboratory 6 of 27 Survey Documentation & Analysis Web-based analysis and documentation • • • • http://sda.berkeley.edu/ http://www.icpsr.umich.edu/access/sda.html http://www.icpsr.umich.edu/NACJD/das.html http://www.icpsr.umich.edu/SAMHDA/ Survey Research Laboratory 7 of 27 Data Sources (cont.) Data Available for Use with Survey Documentation and Analysis (SDA): Aging Data • Longitudinal Study of Aging, 70 Years and Older, 1984-1990 • National Survey of Self-Care and Aging: Follow-Up, 1994 • National Health and Nutrition Examination Survey II: Mortality Study, 1992 • National Hospital Discharge Survey, 1994-1997 • National Health Interview Survey, 1994, Second Supplement on Aging Criminal Justice Data • International Crime Data • Homicide Data • National Crime Victimization Survey Data • Corrections Data Survey Research Laboratory 8 of 27 Data Sources (cont.) Data Available for Use with Survey Documentation and Analysis (continued): Substance Abuse Data • Drug Abuse Warning Network • Monitoring the Future • National Household Survey on Drug Abuse • National Pregnancy and Health Survey • National Treatment Improvement Evaluation Study • Treatment Episode Data Set • Uniform Facility Data Set • Washington, DC Metropolitan Area Drug Study (DC*MADS) Survey Research Laboratory 9 of 27 Evaluation of Data Sources • • • • • Purpose of the study Sponsor/collector of the data Mode of data collection Sampling procedures Consistency of data with other sources Survey Research Laboratory 10 of 27 Evaluation of Data Sources (cont.) • • • • • Documentation Number of observations Number of variables Coding scheme Summary statistics Survey Research Laboratory 11 of 27 Types of Survey Sample Design • Simple Random Sampling • Systematic Sampling • Complex sample designs ▪ stratified designs ▪ cluster designs ▪ mixed mode designs Survey Research Laboratory 12 of 27 Types of Survey Sample Design • Simple Random Sampling Each member of the population has an equal and known chance of being selected Simple Random Sample With Replacement (SRSWR) Simple Random Sample Without Replacement (SRSWOR) Survey Research Laboratory 13 of 27 Types of Survey Sample Design • Systematic Random Sampling the selection of every kth element from a sampling frame with the sampling interval k (=N/n). Survey Research Laboratory 14 of 27 Types of Survey Sample Design • Stratified sample The population is first divided into nonoverlapping subpopulations: strata such as gender, race or SES. Sample from each stratum. Proportionate vs. disproportionate Works most effectively when the variance of the dependent variable is smaller within the stratum than in the sample as a whole. Survey Research Laboratory 15 of 27 Types of Survey Sample Design • Cluster sample Elements are selected in groups or clusters PSU: Primary Sampling Unit. This is the first unit that is sampled in the design. For example, school districts from Chicago may be sampled and then schools within districts may be sampled. Homogeneity within cluster: Intracluster correlation (ICC) Survey Research Laboratory 16 of 27 Why complex survey design? • Increased efficiency • Decreased costs • Sometimes the only option available Survey Research Laboratory 17 of 27 Complex Survey Design • Complex designs with clustering and unequal selection probabilities generally increase the sampling variance. • Not accounting for the impact of complex sample design can lead to Type I error. Survey Research Laboratory 18 of 27 Sample Weights • “pweight” or selection weight: Used to adjust for differing probabilities of selection (=N/n). • In theory, simple random samples are self-weighted • In practice, simple random samples are likely to also require adjustments for non-response Survey Research Laboratory 19 of 27 Types of Sample Weights Post-stratification weights: • Typically used to adjust for minor differences in nonresponse by demographic subgroup. • Bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups. • Requires auxiliary dataset to use as a comparison. • Not a fix for bad sample design Survey Research Laboratory 20 of 27 Post-Stratification Weights Example Sample Population Percent Percent Weight Male 42% 49% 1.16 Female 58% 51% .879 Survey Research Laboratory 21 of 27 Types of Sample Weights (cont.) Non-response weights: • Designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics. • Only useful if nonresponse varies by stratum (unless inflating sample size to population size). Survey Research Laboratory 22 of 27 Types of Sample Weights (cont.) “Blow-up” (expansion) weights: • Weights sum to population total • Provide estimates for the total population of interest Survey Research Laboratory 23 of 27 Types of Sample Weights (cont.) Replicate weights: • A series of weight variables that are used instead of PSUs and strata in an effort to protect the respondents' identity. Pweight and the replicate weights must be used for the correct calculation of the point estimate and its standard error. Survey Research Laboratory 24 of 27 Summary of Weights • • • • Weight for probability of selection Adjust for non-response Post-stratify Expand or contract to population/sample totals Survey Research Laboratory 25 of 27 Syntax Examples of Design-based Analysis in STATA, SUDAAN & SAS svyset svyset svyset svyreg STATA strata strata psu psu pweight finalwt fatitk age male black hispanic SUDAAN proc regress data=”c:\nhanes.sav” filetype=spss desgn=wr; nest strata psu; weight finalwt subpgroup sex race; levels 2 3; model fatintk = age sex race; Survey Research Laboratory 26 of 27 Syntax Examples of Design-based Analysis in STATA, SUDAAN & SAS SAS proc surveyreg data=nhanes; strata strata; cluster psu; class sex race; model fatintk = age sex race; weight finalwt; Survey Research Laboratory 27 of 27