Summer Research Opportunities Program Statistics Boot Camp
May 22-27, 2011
Instructor: Dr. Kimberly Maier

What We Will Learn This Week
How to describe data (quantitatively and graphically).
How to select and compute statistical estimates and hypothesis tests.
How to use SPSS to accomplish these tasks.
How to interpret and write about the results of the estimates and tests.

Introduction to SPSS and the Course Dataset

SPSS
Software designed to perform a wide variety of statistical analyses. Other commonly used packages include SAS, Stata, R, and Minitab. Runs in the Windows environment. The most recent version (19.0) is called IBM SPSS (the earlier version, 18.0, is called PASW). Older versions of SPSS will work just fine for the analyses we will do this week.
Two important files: the Data Editor and the Viewer.

Data Editor
This type of file contains the data and has the extension .sav. New data files can be created and old data files can be opened. The data file has two views: data view and variable view.

Data Editor Menus
File: Select New to start a new SPSS data file; select Open to access an existing SPSS data file. Many different types of files can be opened by SPSS, including Excel and SAS data files.
Edit: Cut, Copy, and Paste; quick navigation to a particular case or variable. Options allows you to change a variety of SPSS environment options, such as how variables are displayed and where files are saved.
View: Can change fonts; can display value labels instead of values.
Data: Can insert a new variable or new value; can split files; can select cases; can weight cases.
Transform: Compute calculates data values according to an expression you enter and has a variety of functions. Recode assigns discrete values to a variable based on the present values of the variable being recoded.
Analyze: Used to select various statistical procedures, such as Crosstabs, Chi-square, One-way ANOVA, and Linear Regression.
Data Editor Menus
Graphs: Used to create bar charts, pie charts, histograms, scatterplots, and many other graphs. Chart Builder and Legacy Dialogs will produce the same graphs; they are two different ways of specifying them.
Utilities: Use this menu to get information about your variables and go to them quickly in your data file.
Window: Use this menu to switch between open windows in the current SPSS session.
Help: Access Help, including general topics, a tutorial, the statistics coach, and a syntax guide.

You try it…
1. Open SPSS on your computer.
2. Download one of the course datasets from Angel: log into Angel, click on 'Content', then 'DATA SETS (Add Health and NHanes) (2011)', and save to the p:\ drive (your personal storage space, referred to as AFS).
3. Open the file in SPSS…

Data Editor – Data view
Actual data values are displayed. Rows represent cases or observations; columns represent variables (the different items of information you collect on each case). Each cell holds the value entered for one variable on one case.

You try it…
Click on the 'Variable View' tab at the bottom of the Data Editor…

Data Editor – Variable view
Displays variable definition information, including: variable and value labels, data type, measurement scale, user-defined missing values, and the number of decimal places and digits. Section headings have been added to simplify navigation.

Data Editor – Variable view
Name: Enter up to eight characters; must begin with a letter.
Type: Click once in this column to get a white box with a gray square at the right. Click on the gray square and select the data format. Formats include numeric, dollar, and string (rare).
Width: Enter a number to change the text field width.
Decimals: Assign the number of decimal places by entering a number or using the arrows in the box.

Data Editor – Variable view
Labels: Important to label variables!
A descriptive label for the variable; can be up to 256 characters.
Values: Enter value labels for categorical variables. Click once in the column to get a white box with a gray square on the right; click on the gray square to open the Value Labels dialog box.

Data Editor – Variable view
Missing: Specify codes for missing data. Examples include "98=Don't Know" or "99=Not Applicable". Click once in this column to get a white box with a gray square on the right; click on the gray square to open the Missing Values dialog box. Enter all the missing data values. Up to three missing data value codes can be entered for a variable (a range of values counts as two value codes).
Columns: Enter a number to change the column width.

Data Editor – Variable view
Align: Click once in this column to get a white box with a gray square on the right; click on the gray square to choose Right, Left, or Center to align text in the columns.
Measure: Click once in this column to get a white box with a gray square on the right; click on the gray square to choose the appropriate level of measurement. Scale is used for continuously valued variables; Ordinal is used for categorical variables that have ordering; Nominal is used for categorical variables that do not have ordering.

You try it…
Open a blank PASW data file…

You try it…
1. Enter the following data in PASW:

You try it…
2. Label variables:
ID=identification number
Gender=Gender
Reading=scores on reading test
PreMath=scores on math pre-test
PostMath=scores on math post-test
Readlev=reading level group
3. Label variable values for categorical variables:
Gender: 1=female, 0=male
Readlev: 0=low group, 1=high group

You try it…
4.
Compute a new variable: 'Change' is defined as "PostMath-PreMath" scores…

Viewer
Displays the statistics, graphics, and other output from your work in SPSS. You can have more than one Output Viewer file open at the same time. Use caution when multiple windows are open: SPSS will save output to whichever Viewer is active. The active Output Viewer is called the designated file and has a red exclamation point at the bottom of the window. You can change the designation by clicking on the red exclamation point in the toolbar.

Viewer
Output includes charts, pivot tables, text output, titles, and notes. This file type has the extension .spv (.spo in earlier versions) and opens after you run the first analysis. The window is split into two parts, or panes. The left side is the Outline pane, which indicates the items contained in the pane to the right; the right side is the Contents (or Output) pane. To move around, either click on an item in the Outline pane to see it on the right, or scroll down in the Contents pane to get to an item.
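The Compute step above (creating 'Change' as PostMath minus PreMath) can also be sketched outside SPSS in plain Python. The score values below are made-up illustration data, not the course dataset:

```python
# Hypothetical rows mimicking the Educ.sav exercise data (values invented).
students = [
    {"ID": 1, "Gender": 1, "PreMath": 40, "PostMath": 55},
    {"ID": 2, "Gender": 0, "PreMath": 52, "PostMath": 50},
]

# Compute: Change = PostMath - PreMath, the same expression entered in SPSS.
for s in students:
    s["Change"] = s["PostMath"] - s["PreMath"]
```

This is exactly what SPSS does behind the Compute dialog: evaluate the expression once per case and store the result in a new column.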
You try it…
Calculate descriptive statistics (mean, standard deviation) for the variable 'Change':

You try it…
Generate a scatterplot displaying the relationship between PreMath and PostMath scores for all students…

You try it…
Save the Data Editor and the Viewer files as Educ.sav and Educ.spv (you can save to either the p:\ drive or to your flash drive)…

Workshop Datasets

Add Health
A subset of the National Longitudinal Study of Adolescent Health (Add Health). Goal of the program: data that provide "…opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood." Data are from Wave 3, for 18-26 year old respondents. Variables contain information on the social, economic, psychological, and physical well-being of individuals, as well as information on family, neighborhood, community, and relationships.

Add Health
Sample size: 4,882 people. Because some groups were under-sampled and some were over-sampled, a weighting variable was constructed that, when used in the statistical models, allows us to make inferences from the results of the dataset to the population. We must specify the weighting variable in SPSS. The weighting variable is DATAWEIGHT. It can be turned on and off in SPSS via Data > Weight Cases, selecting DATAWEIGHT. The status of the weighting variable is displayed in the lower right-hand corner of the dataset in SPSS: Weight On is displayed.

You try it…
1. Download the Add Health dataset from Angel: log into Angel, click on 'Content', then 'DATA SETS (Add Health and NHanes) (2011)', and save to the p:\ drive (your personal storage space, referred to as AFS).
2. Open the file in SPSS.
3. Apply the weighting variable DATAWEIGHT…

NHanes
National Health and Nutrition Examination Study, conducted by the National Center for Health Statistics (NCHS). Goal of the program: a "…program of studies designed to assess the health and nutritional status of adults and children in the United States…", with a changing focus on a variety of health and nutrition measurements designed to meet current and emerging concerns. This dataset contains data collected from 10,149 adults and children in 2007-2008. Data were collected using personal interviews and physical examinations; all but the very young provided blood samples.

NHanes
Variables contain information on the social, economic, psychological, and physical well-being of individuals, with a focus on health. Older subjects have more extensive physical examination data. There are two weighting variables for this dataset: ALL13PLWT and EX13PLWT. Use EX13PLWT for analyses involving examination data; use ALL13PLWT for all other analyses. These weight variables set the weight for all subjects younger than 13 years old to zero, so all analyses will include only people 13 years old or older (even though you will find younger people in the dataset).

You try it…
1. Download the NHanes dataset from Angel: log into Angel, click on 'Content', then 'DATA SETS (Add Health and NHanes) (2011)', and save to the p:\ drive.
2. Open the file in SPSS.
3. Apply the weighting variable ALL13PLWT…

Introduction to Statistics

Why Statistics?
We never truly know the 'exact' answer to anything.
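Weighting cases, as DATAWEIGHT and ALL13PLWT do in SPSS, amounts to computing weighted statistics: each case counts in proportion to its sampling weight. A minimal sketch of a weighted mean (the ages and weights below are invented, not values from the datasets):

```python
def weighted_mean(values, weights):
    """Mean in which each case counts in proportion to its sampling weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# An over-sampled group gets weight < 1 so it counts less; an
# under-sampled group gets weight > 1 so it counts more.
ages = [18, 20, 26]
weights = [0.5, 1.0, 2.0]
```

With all weights equal to 1, this reduces to the ordinary (unweighted) mean.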
We often cannot collect data from a whole population: it is too expensive, too time-consuming, and unrealistic to survey all of the population. There is always some measurement error, and some processes may have more measurement error than others. Instead of gathering data from the whole population, we use a sample. A sample is a subset of the population; measurements made on a sample are a subset of the measurements that could have been made on the population.

Why Statistics?
Questions we can ask using numerical data include:
Do two groups of students have different average test scores?
Does increased calcium intake reduce blood pressure?
Do stock prices (when adjusted for inflation) show only random variation?

Research Paradigms
Classification by the whys of research:
Basic (Pure) Research. Goal: advance theory. Setting: controlled laboratory.
Applied Research. Goal: test theories; develop and test research hypotheses. Setting: less controlled than the laboratory.
Action Research. Goal: improve a teacher's professional practice; provide understanding to the teacher. Setting: the context of application.

Research Paradigms
Classification by measurement procedure:
Qualitative research. Goal: understanding of individuals and events in their natural states. Data: case study, historical research, ethnographic research. The results of a qualitative study are not usually generalizable to other cases.
Quantitative research. Goal: objective explanation of cause and effect relationships, and determining whether the effects can be manipulated. Data: numerical. Findings are usually generalizable to other cases.
Mixed Methods research: a combination of the two.

Research Paradigms
Classification by intervention:
Experimental Research. Researchers plan an intervention and study its effects on individuals or groups of individuals. The researchers can manipulate the intervention. Study of cause and effect.
Nonexperimental Research. Researchers do not manipulate the intervention, or no intervention is implemented. Two kinds of nonexperimental research:
Causal comparative: the intervention is not manipulated, but it is implemented, and cause and effect are studied.
Descriptive: no intervention is implemented and there is no interest in studying cause and effect, only description.

Some basic concepts about sampling
Population: the entire collection of events in which we are interested.
Sample: a subset of the population.
Random sample: each sample of the population has an equal probability (chance) of being selected.
External validity: a 'measure' of the extent to which the sample accurately reflects the entire population; generalizability.
Random assignment: a method of assigning members of a sample to either a treatment or control group.

Sampling
The purpose of sampling is to achieve generalizability: the ability to study a subset of the population we are interested in and still draw conclusions about the whole population. To ensure a high level of generalizability, the sample should be a good representation of the population.
Sampling frame (not always possible to create): the list of individuals from which a sample is actually selected. Ideally, the frame should list every individual in the population; a frame that leaves out part of the population is a common source of under-coverage. Make sure that the sample represents the population you intend to study.

Sampling
Biased samples can result from: 'opportunistic' samples or 'substitutions', non-response, and a faulty sampling frame. If some sampled individuals do not provide data, the results based on the data that were provided could be biased.
Some strategies to help prevent biased sampling: incentives, follow-ups for missing data, novel data collection (computer vs. pen/paper), and not relying on volunteers.

Creating a representative sample
Simple random sample: every sample has the same probability of being chosen.
For example, if the population consists of 100 people and the sample will consist of 20 people, every one of the possible samples of 20 people in the population has an equal chance of being chosen.
Systematic sample: every kth person is chosen, where k = the number of people in the population divided by the number of people in the sample. Each sample still has an equal probability of being chosen, provided the list of the population is randomly ordered.

Creating a representative sample
Stratified sample: a random sample is taken from each stratum (group). The number sampled from each stratum is determined by the proportion that the stratum makes up of the population and the sample size required.
Cluster sampling: a cluster is a group whose members share common characteristics. Groups of people, not individuals, are selected.
In contrast, incidental (convenience) sampling: the sample is not drawn by the researcher and is not necessarily random. Common in action research, and more common than you think!

Gathering Data
Surveys administered to a sample; the data are used to make inferences about the population.
Designs for gathering data:
Experimental: completely randomized design, randomized block design, Latin square. These designs look at one factor at a time.
Factorial experimental: designs that can identify interactions between multiple factors.
Observational study.

Data Collection
Data that have already been collected:
HSB (High School and Beyond)
NELS (National Education Longitudinal Study)
PISA (Program for International Student Assessment)
See http://nces.ed.gov/surveys/ for more data sets.
Data that you collect: pilot-study surveys; generate a sample for data collection from the appropriate population; collect data; follow up on missing data; offer incentives to study participants; enter and clean data.
Pros and cons of each approach?

Planning Your Research

Research Plan
Describes the methods and procedures you will use to carry out your research.
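The simple random and systematic sampling schemes described above can be sketched in Python. The population here is just a list of ID numbers; the `seed` parameter is included only to make the illustration reproducible:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every subset of size n has the same chance of being selected."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n):
    """Take every kth person, where k = population size // sample size.
    The population list must be randomly ordered for this to be unbiased."""
    k = len(population) // n
    return population[::k][:n]
```

For a population of 100 and a sample of 20, the systematic scheme takes every 5th person from the list.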
Advantages of a research plan:
Forces you to think through every aspect of the research.
Provides a way for others to critique your research.
Helps guide the research.

Parts of a Research Plan
Introduction: sets the stage for the rest of the document.
Statement of the topic: be clear and concise; provides direction.
Literature review: sets the context for your topic. Show what has been done before, compare and contrast works, and show strengths and weaknesses.
Hypothesis statement: each hypothesis should have an underlying explanation for its prediction, based on the literature.

Parts of a Research Plan
Method, a detailed description of:
Research participants.
Instruments: describe how they will measure the variables in your study.
Materials/Apparatus.
Design: the general strategy for conducting the data gathering.
Procedure: include the sampling technique.
Assumptions & limitations.

Parts of a Research Plan
Data Analysis: describes the statistical techniques to be used. Is the analysis possible? The hypothesis determines the appropriate statistical technique to be used.
Time Schedule: something will always go wrong, and everything takes twice as long as you think.
The hypotheses (null & alternative) that you have formed provide the direction of the research plan and encompass the research questions. We will talk about hypotheses in more detail in a later module.

Validity

Validity
One of two important characteristics of a test, assessment, or measure: a measure of the accuracy of the inferences and interpretations that can be made using the measurements. A test or measurement is valid to the extent that it lives up to the claims the researcher has made for it. Different types of claims are appropriate for different measurement situations, so different types of validity need to be addressed.

Types of Validity
Content validity: the extent to which the test items represent the content that the test is designed to measure.
Determined systematically by comparing the test content with the course content (or reference content). Important in achievement testing and tests of skill and proficiency; particularly important in selecting tests to use in experiments comparing different instructional methods or programs.
Concurrent validity: the extent to which individuals' scores on the test correlate with their scores on another test administered at the same time or within a short interval of time. Carried out to locate simple, easy-to-use tests to use in place of complex, expensive tests. Calculate the correlation between scores on test A and test B (given within a short time).

Types of Validity
Predictive validity: the extent to which scores on the test predict individuals' subsequent performance on a criterion measure. To measure: administer a test to a group, wait until the behavior has occurred, and determine the degree of relationship between the test score and the occurrence of the behavior.
Construct validity: the extent to which a test can be shown to measure a particular hypothetical construct. To measure: begin with hypotheses about the characteristics of people who would obtain high scores as opposed to the characteristics of those who would obtain low scores, then see how well the test conforms to these hypotheses. Most tests use multiple sources of evidence.

Types of Validity
Face validity: the extent to which the test appears to measure what it purports to measure. Claims of this validity are not too convincing; it differs from content validity because it is determined subjectively.

Reliability

Reliability
Refers to the consistency of the measurements we obtain for people on a test. The notation is r (the same as the Pearson correlation coefficient); r ranges from 0.0 to 1.0, with 1.0 being the highest possible reliability. In the physical sciences, reliability is synonymous with 'accuracy'; in the social sciences, reliability is a characteristic of a survey, questionnaire, or assessment.
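Concurrent validity above is assessed by correlating two tests, and reliability is reported on the same r scale. A minimal sketch of the Pearson correlation coefficient (the paired scores in the usage note are invented):

```python
def pearson_r(xs, ys):
    """Pearson correlation between paired scores on test A and test B."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

Perfectly agreeing tests give r = 1.0; perfectly reversed rankings give r = -1.0.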
Reliability
Standard Error of Measurement (SEM): tells you the range within which the true scores should fall, given the observed scores. The calculation of SEM:

SEM = s · √(1 − reliability)

where s = the standard deviation of the test or instrument. To use it, calculate the SEM for the test and the mean of the observed scores, and apply the empirical rule for the normal distribution (i.e., 68% of true scores should fall within 1 SEM of the mean).

Reliability
Factors affecting reliability (psychological measures):
Heterogeneity of the subjects tested: the more heterogeneous the subjects, the more reliable the test.
Test length: the longer the test, the more reliable.
Difficulty of the items: the more mixed the difficulty on the test, the higher the reliability.
Quality of the items: the higher the quality, the higher the reliability.
How high should the reliability be? It depends on what you are using the test results for (what decisions will be made about the students?). Usually a minimum of 0.5, with a reliability of 0.9 for high-stakes decisions.

Reliability
How high should the reliability be? It depends on what you are using the measurements for: what decisions will be made, and are they high-stakes?
Rule of thumb for psychological measures: usually a minimum of 0.7, with a reliability of at least 0.9 for high-stakes decisions.
Rule of thumb for physical science measures: likely more demanding.

Levels of Measurement

Measurement Terminology
Variable: a property of an object or event that can take on different values. Examples: age, weight, height, IQ, math achievement.
Independent vs.
Dependent variables:
An independent variable, according to theory, has a causal influence on the dependent variable. Also known as: predictor, explanatory variable.
A dependent variable is of greatest substantive interest to the researcher; it is the variable with real-world implications. Also known as: predicted variable, outcome, response variable.

Measurement Terminology
The variable is also defined by the nature of the measurement.
Discrete variable: measures made by placing observations into mutually exclusive and exhaustive categories.
Ordinal variable: ordered categories.
Nominal variable: unordered categories.
Continuous variable: measures made by positioning observations on a linear continuum.
Interval variable: the continuum does not have an absolute zero.
Ratio variable: the continuum has an absolute zero.
Interval and ratio variables can be used interchangeably in statistical calculations when a continuous-valued variable is required.

Measurement Terminology
Nominal (categorical): a set of labels applied to groups composed of individuals with similar characteristics. Typically the labels are exhaustive (cover everything) and mutually exclusive (don't overlap). Characterized by the mathematical relations = and ≠. Example: Department: CEP, EAD, TE, KIN.
Binary (also known as dichotomous): a special nominal variable consisting of two groups. Example: Gender: Male, Female.

Measurement Terminology
Ordinal (categorical): a set of labels applied to groups composed of individuals with similar characteristics. The labels indicate more or less of a quality and can be rank ordered. Characterized by the mathematical relations =, ≠, <, and >. Example: Liking of math: high, medium, low.
Interval/Ratio (continuous): numeric values assigned to individuals. The measures can be characterized by the mathematical relations =, ≠, <, >, +, and −. Intervals are assumed to be equal across the range of the measures. Example: IQ measure.
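The Standard Error of Measurement formula from the Reliability section, SEM = s·√(1 − reliability), can be sketched and checked in Python (the example numbers in the comment are invented):

```python
import math

def sem(sd, reliability):
    """Standard Error of Measurement: test sd times sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# A test with sd = 10 and reliability 0.91 has SEM = 3, so roughly 68% of
# true scores should fall within 3 points of the observed-score mean.
```

Note the sanity check built into the formula: a perfectly reliable test (reliability = 1.0) has SEM = 0.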
In SPSS: Scale.

You try it…
Assign a measurement level to each of the variables (except ID) in Educ.sav…

Descriptive Statistics

Statistics Terminology
Descriptive statistic (also known as a sample statistic): a number used to describe some aspect of a sample (data set). It may refer to the location, dispersion, symmetry, or flatness/peakedness of the distribution of data for a particular variable, or to the association/relationship between two variables. Descriptive statistics are used to estimate parameters.
Parameters: numbers used to describe some aspect of the population. Typically not observable.
Inferential statistic: a number used to generalize from observations in a sample data set to the population that the sample represents.

Descriptives
Data summaries can be accomplished with graphical or numerical methods.
Graphical methods: pie charts, boxplots, scatterplots, stem-and-leaf plots, frequency histograms, dotplots.
Numerical methods: frequency tables (contingency tables), measures of central tendency, measures of variance, measures of association.

Graphs for Categorical Data: Pie Chart
You try it… Create a pie chart for where the respondent lives (H3HR2).

Graphs for Categorical Data: Bar Chart
You try it… Create a bar chart for where the respondent lives (H3HR2).

Numerical Summaries for Categorical Data: Frequency Table
You try it… Create a frequency table for where the respondent lives.

Numerical Summaries for Categorical Data: Crosstabs Table (for 2 variables)
You try it… Create a crosstabs table for gender and where the respondent lives.

Graphs for Continuous Data: Histogram
You try it… Create a histogram for age at first paying job.

Measures of Central Tendency
For the population: parameters (e.g., μ). For the sample: sample statistics (e.g., X̄).
Measures of central tendency:
Mode (Mo) – for both categorical and continuous data.
Median (Me) –
for ordinal or continuous data. The mode is the most frequently occurring value; the median is the middle value of an ordered listing of values.
Arithmetic mean – for continuous and ordinal data:

μ = (1/N) Σ Xi (population), X̄ = (1/n) Σ Xi (sample)

Measures of Central Tendency
A measure of skewness indicates the extent and degree of asymmetry about the mean.
Positive skew (skewness > 0): mode < median < mean.
Negative skew (skewness < 0): mean < median < mode.
General rule of thumb: you can assume normality if −1 < skewness < 1.
From: Huck, S. (2004). Reading Statistics & Research (4th ed.). Allyn & Bacon.

Measures of Variability
For the sample: a sample statistic (e.g., s²). For the population: a parameter (e.g., σ²).
Measures of variability:
Range – for ordinal and continuous data. The difference between the two most extreme data points (maximum − minimum). Sensitive to outliers.
Interquartile range (IQR) – for ordinal and continuous data. The difference between the 25th and 75th percentiles. Insensitive to the outer 50% of the data.

Measures of Variability
Variance – for continuous data. The average squared deviation of scores from the mean. Sensitive to outliers.

σ² = Σ (Xi − μ)² / N (population), s² = Σ (Xi − X̄)² / (n − 1) (sample)

Measures of Variability
Standard deviation – for continuous data. Roughly, the typical deviation of scores from the mean; the square root of the variance. Sensitive to outliers.

σ = √[ Σ (Xi − μ)² / N ] (population), s = √[ Σ (Xi − X̄)² / (n − 1) ] (sample)

You try it…
1. Use SPSS to generate descriptive statistics for 'age at first paying job'.
2. Use SPSS to generate descriptive statistics for where the respondent lives (H3HR2).
3. Use SPSS to generate descriptive statistics for 'gender'.

Inferences – Confidence Intervals

Making Inferences
Parameter estimation and confidence intervals: what is the value of the parameter?
Hypothesis testing: is the parameter equal to a specific value?
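The measures of central tendency and variability defined above map directly onto Python's standard `statistics` module; note the population formula (divide by N) versus the sample formula (divide by n − 1). The scores are invented:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # invented example scores

mode = statistics.mode(scores)                 # most frequently occurring value
median = statistics.median(scores)             # middle of the ordered listing
mean = statistics.mean(scores)                 # arithmetic mean
pop_variance = statistics.pvariance(scores)    # divides by N (population formula)
sample_variance = statistics.variance(scores)  # divides by n - 1 (sample formula)
sample_sd = statistics.stdev(scores)           # square root of the sample variance
```

For these scores the mode is 4, the median is 4.5, and the mean is 5; the sample variance (32/7) is larger than the population variance (4) because of the n − 1 divisor.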
Making Inferences – Confidence Intervals
A 100(1 − α)% level confidence interval is an interval estimator of the population parameter. It has the form:

Lower bound ≤ Population parameter ≤ Upper bound

There are formulas for computing the upper and lower bounds from the data the sample provides. The confidence level, 100(1 − α)%, is typically a percentage between 90% and 99%. Example: a choice of α = 0.10 gives a 90% confidence interval.

Making Inferences – Confidence Intervals
Sample statistics are used as estimates of population parameters, and confidence intervals are constructed to reflect the uncertainty of the estimate. The confidence interval is based on knowledge of the sampling distribution of the statistic; all statistics (mean, variance, etc.) have sampling distributions. This is also how we do hypothesis testing: does the sample come from a population that has a parameter equal to a particular value?

Sampling Distribution – The Mean
Central Limit Theorem (sampling distribution of the mean): when the sample size n is large, the sampling distribution of the mean will be approximately normal. When the population is normally distributed, the sampling distribution of the mean is exactly normal for any sample size. The mean and standard deviation (standard error) of the sampling distribution of the mean are:

μ_X̄ = μ,  σ_X̄ = σ/√n

Let X̄ be the sample mean of n measurements. Then the mean and the standard deviation (standard error) of the sampling distribution of the mean can be estimated by:

μ̂_X̄ = X̄,  σ̂_X̄ = s/√n

Sampling Distribution – The Mean
The sampling distribution gives a probability model for the distribution of values of a statistic (for example, the mean) in repeated sampling of a population having a particular parameter value.
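The σ/√n standard-error claim of the Central Limit Theorem can be checked with a small simulation. The population here is normal with mean 0 and σ = 1, and the seed is fixed only to make the run reproducible:

```python
import random
import statistics

rng = random.Random(0)
sigma, n, reps = 1.0, 25, 2000

# Draw many samples of size n and record each sample mean.
sample_means = [
    statistics.mean(rng.gauss(0.0, sigma) for _ in range(n))
    for _ in range(reps)
]

# The spread of the sample means should match sigma / sqrt(n).
observed_se = statistics.stdev(sample_means)
predicted_se = sigma / n ** 0.5  # CLT prediction: 1 / sqrt(25) = 0.2
```

With 2000 replications the observed standard deviation of the means lands very close to the predicted 0.2.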
Sampling Distribution – The Mean
For an interactive demonstration of a parent population versus the sampling distribution of the mean, see http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Sampling Distribution – The Mean
When the population variance is unknown, the sampling distribution for the mean is a t-distribution. Use of this distribution takes account of the sampling error in the variance estimate. (Compare the z distribution with, for example, the t(10) distribution: the t distribution has heavier tails.)

Confidence Interval for the Mean, σ known
To construct a 100(1 − α)% confidence interval for the population mean μ for any confidence coefficient 1 − α:

X̄ ± z_(α/2) · σ_X̄,  where σ_X̄ = σ/√n

The confidence interval for μ is (X̄ − z_(α/2) σ_X̄, X̄ + z_(α/2) σ_X̄), where z_(α/2) is the value of z having a tail area of α/2 to its right. Common values:

α = .10: z_(α/2) = 1.645
α = .05: z_(α/2) = 1.96
α = .01: z_(α/2) = 2.575
α = .001: z_(α/2) = 3.291

Confidence Interval for the Mean, σ unknown
To construct a 100(1 − α)% confidence interval for the population mean μ for any confidence coefficient 1 − α:

X̄ ± t_(α/2, df=n−1) · s_X̄,  where s_X̄ = s/√n

The confidence interval for μ is (X̄ − t_(α/2, df=n−1) s_X̄, X̄ + t_(α/2, df=n−1) s_X̄), where t_(α/2, df=n−1) is the value of t having a tail area of α/2 to its right. The critical values change with the sample size, so you need to use statistical tables.

You try it…
Using descriptive statistics, calculate the confidence interval for the mean of the raw Peabody scores (raw_ah)…

Inferences – Hypothesis Testing

Making Inferences – Hypotheses
What's a hypothesis? An 'educated guess': based on theory and previous research; different from research questions or objectives; very specific; drives the research.

Hypothesis Tests
A hypothesis test consists of two hypotheses:
Null hypothesis, H0: the status quo.
Alternative hypothesis, HA: the research hypothesis.
We wish to test the null hypothesis versus the alternative hypothesis. The set-up is not symmetric: it takes strong evidence to reject the null in favor of the alternative hypothesis.
The alternative hypothesis can be:
Two-tailed (testing for change in either direction)
One-tailed: upper tail (testing for an increase) or lower tail (testing for a decrease)
The decision about the form of the alternative hypothesis must be made before you look at the data (a priori).

Hypothesis Tests

A statistical test (at a selected level of significance, usually labeled α) rejects the null hypothesis if the test statistic falls in the critical/rejection region. The level of significance is the Type I error rate of the test procedure. A Type I error is the error of rejecting the null hypothesis when you shouldn't. Example: a test procedure with a .05 level of significance would reject the null in 5% of possible samples even though the null is true. Example: for H0: μ = 15 versus HA: μ ≠ 15, a Type I error would be to claim that the mean age at first paying job is not equal to 15 when in reality it was equal to 15.

Hypothesis Tests

Typical values of the significance level (α) are .05, .10, .02, and .01, although the choice is often field-dependent. When our data reject H0 at a significance level of α, we say that the data are significant at level α. When using statistical software, testing is often approached via the p-value, which we then compare to α. The p-value gives more information than simply testing at a fixed significance level: it measures how extreme the data are in favor of HA when H0 is true, that is, how far out the data fall in the tail(s) of the H0 distribution.
Example: p-value = .01: the data fall at the 1% cut; the data are unlikely if H0 is true.
Example: p-value = .45: the data fall at the 45% cut; the data are fairly likely if H0 is true.

Hypothesis Tests

Small p-values are evidence against H0. The smaller the p-value, the more evidence against the null hypothesis. To tie this together with the significance level: if the data give a p-value ≤ α, then the data reject H0 at significance level α.
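This decision rule can be sketched in a few lines of Python. This is an illustration, not the SPSS workflow: the observed z statistic (2.5) is a hypothetical value, and the two-tailed p-value is computed from the normal CDF via the error function.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-tailed p-value for a z test: 2 * P(Z >= |z|) under the standard normal."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def reject(p: float, alpha: float) -> bool:
    """Reject H0 when the p-value is at or below the significance level."""
    return p <= alpha

p = two_sided_p(2.5)                      # hypothetical observed z statistic
decisions = (reject(p, 0.05), reject(p, 0.01))
```

Here p is about .012, so the same data reject H0 at α = .05 but not at α = .01, which is why reporting the p-value is more informative than reporting a single accept/reject decision.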
Example: a p-value of .06 would not reject H0 at the α = 0.05 level. Another way to think about it: the data are not in the 5% tail; they have not passed the critical value for α = 5%. The data are not extreme enough to reject the null.

Hypothesis Tests

What happens if the assumptions of the statistical test are violated? The reported p-value may not be the actual p-value, so the conclusions from the statistical test are invalid. It is hard to know whether the actual p-value is bigger or smaller than the reported one.

Inferences – t-tests

Hypothesis Testing – The Mean

Format of hypotheses (two-tailed):
Null hypothesis: H0: μ = μ0
Alternative hypothesis: HA: μ ≠ μ0
Sampling distribution: the z distribution if the population variance is known; the t distribution if the population variance is unknown.
Test statistic for known population variance (z-test for the mean):

  z = (x̄ − μ0) / σ_x̄

Test statistic for unknown population variance (t-test for the mean):

  t = (x̄ − μ0) / s_x̄

Hypothesis Testing – The Mean

Assumption of the tests: normality of the population, which matters for smaller sample sizes.

You try it…

Test the null hypothesis that the mean of the raw Peabody test scores is equal to 67.

Hypothesis Testing – Equality of Two Means

Means must be from two independent groups.
Format of hypotheses (two-tailed):
Null hypothesis: H0: μ1 = μ2
Alternative hypothesis: HA: μ1 ≠ μ2
Sampling distribution: t distribution with mean zero and standard error

  s_(x̄1−x̄2) = √{ [((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)] · (1/n1 + 1/n2) }

Independent-samples t-test statistic:

  t = (x̄1 − x̄2) / s_(x̄1−x̄2),  df = n1 + n2 − 2

Hypothesis Testing – Equality of Two Means

Assumptions of the test:
Normality of the sampling distribution of the means. For equal sample sizes, violating this assumption has only a small impact on the difference between the assumed and true Type I error rates, provided the distribution shapes are similar or both symmetric.
If the distributions are skewed, serious problems arise unless the variances are similar.
Homogeneity of variance. For equal sample sizes, violating this assumption has only a small impact on the difference between the assumed and true Type I error rates.
Equality of sample size. When sample sizes are unequal and variances are non-homogeneous, there are large differences between the assumed and true Type I error rates.

What if the population variances are not equal?

Assess whether the variances are equal using hypothesis testing (the null is that σ1² = σ2², i.e., that the two sample variances could have come from the same population). This approach is not recommended when the data are not normally distributed. SPSS performs Levene's Test for the Equality of Variances: if the null hypothesis is rejected, you must use an approximate t-test with corrected degrees of freedom or use a nonparametric test.

Hypothesis Testing – Equality of Two Means

Effect size: the standardized mean difference, calculated as

  d = (x̄1 − x̄2) / √{ [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) }

This represents the difference between the two means standardized in standard deviation units. Cohen provided guidelines for interpreting this particular effect size: small = 0.25, medium = 0.50, large = 1.0 or greater. These are only guidelines: "These qualitative adjectives…may not be reasonably descriptive in any specific area. Thus, what a sociologist may consider a small effect may be appraised as medium by a clinical psychologist." (Cohen, 1977, p. 278)
Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. New York: Academic Press.

You try it…

Test the null hypothesis that the means of the raw Peabody test scores for boys and girls are equal.
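The pooled-variance t statistic and the standardized mean difference above can be computed by hand. Below is a minimal Python sketch; the two groups of scores are made-up numbers for illustration, not the Peabody data, and in practice SPSS reports these quantities for you.

```python
import math
import statistics

def independent_t(x1: list[float], x2: list[float]) -> tuple[float, int, float]:
    """Pooled-variance t statistic, df, and standardized mean difference d."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    d = (m1 - m2) / math.sqrt(pooled_var)   # effect size from the slides
    return t, n1 + n2 - 2, d

# Hypothetical scores for two independent groups (not the course data).
group1 = [10.0, 12.0, 11.0, 13.0, 9.0]
group2 = [14.0, 13.0, 15.0, 12.0, 16.0]
t, df, d = independent_t(group1, group2)
```

With these made-up scores both groups have variance 2.5, so the pooled standard error works out to exactly 1.0 and t = −3.0 on 8 degrees of freedom.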
Hypothesis Testing – Equality of Two Means

Means must be from two dependent groups.
Format of hypotheses (two-tailed):
Null hypothesis: H0: μ1 = μ2
Alternative hypothesis: HA: μ1 ≠ μ2
Sampling distribution: t distribution with mean zero and standard error

  s_d̄ = s_d / √n,  where d̄ = (Σ d_i)/n and s_d = √{ Σ(d_i − d̄)² / (n − 1) }

or, equivalently,

  s_(x̄1−x̄2) = √{ (s1² + s2² − 2 s1 s2 r) / n }

Paired-samples t-test statistic:

  t = (x̄1 − x̄2) / √{ (s1² + s2² − 2 s1 s2 r) / n },  df = n_pairs − 1

You try it…

Test the null hypothesis that the means of the standardized Peabody test scores for waves 1 (PVTSTD1) and 3 (PVTSTD3C) are equal.

Chi-square Goodness of Fit Test

Chi-square goodness of fit test

When we have a single categorical variable and we want to determine whether the observed classifications are consistent with a theory, we use a chi-square goodness of fit test. It allows us to compare observed relative frequencies (percentages) to theory-based relative frequencies within the hypothesis testing framework. Our observed statistics are the proportions associated with each classification, and our null parameters are the expected proportions for each classification.

Chi-square goodness of fit test

Example: We are interested in determining whether a purposive sample that we have drawn is comparable to the US population with respect to ethnicity. Research question: Is our sample representative of the proportions of racial groups in the general population? The null hypothesis: The observed proportions of Asians, blacks, Hispanics, and whites in our sample reflect the proportions of these groups in the general population.

Chi-square goodness of fit test

Example: In this case, the frequencies and proportions calculated from the data we collect are the observed proportions. The theory-based null parameters (shown in the bottom row of the table), obtained from the US Census, are the expected proportions.
                         Asian   Black   Hispanic   White
  Observed frequencies     30      50       30       200
  Observed proportions    .10     .16      .10       .65
  Census proportions      .04     .12      .10       .74

(Observed proportion = n_g / N.)

Chi-square goodness of fit test

The chi-square goodness of fit statistic:

  χ² = Σ_{i=1}^{k} (O_i − E_i)² / E_i

where O_i is the observed frequency, k is the number of classifications in the table (i.e., the number of cells), and E_i is the expected frequency given the sample size. The numerator is the squared difference (deviation) between the observed frequency and that predicted by theory. The denominator weights each difference by its expected value (cells with larger expected values will have larger deviations). The degrees of freedom are k − 1.

Chi-square goodness of fit test

The expected frequencies (E) are what we would expect the observed frequencies (O) to be if our theory were indeed true. That is, the expected number of cases in each group should be consistent with p (our theory-based proportions). We compute the expected values as

  E_i = p_i N

The expected value for each cell (E_i) equals the total number of participants in the study (N) times the theory-based proportion for that group (p_i).

Chi-square goodness of fit test

The chi-square statistic tells us how far, on average, the observed cell frequencies are from the theory-based expectations. From our example (N = 310):

                         Asian    Black   Hispanic   White
  Observed frequencies    30       50       30        200
  Expected frequencies    12.4     37.2     31        229.4
  Observed − expected     17.6     12.8     −1.0      −29.4
  (O − E)²               309.76   163.84    1.00     864.36
  (O − E)² / E            24.98     4.40    0.03       3.77

  χ² = Σ (O − E)²/E ≈ 33.18,  df = 3

Chi-square goodness of fit test

The critical value for the chi-square distribution with 3 degrees of freedom at α = .05 is 7.82.
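The worked example above can be reproduced in a few lines. This Python sketch uses the same observed counts and Census proportions from the table; only the comparison against the tabled critical value is done in code.

```python
observed = [30, 50, 30, 200]            # Asian, Black, Hispanic, White
census_p = [0.04, 0.12, 0.10, 0.74]     # theory-based proportions
n = sum(observed)                       # N = 310 respondents

expected = [p * n for p in census_p]    # E_i = p_i * N
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                  # k - 1 = 3
critical = 7.82                         # chi-square critical value, alpha = .05, df = 3
reject_null = chi_sq > critical
```

The computed statistic (about 33.18) far exceeds the critical value of 7.82, matching the conclusion drawn on the next slide.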
Hence, the observed differences between our sample and our expected values are extremely unlikely if the null hypothesis is true, namely that the vector of observed probabilities equals the vector of theory-based probabilities. Substantively, we conclude that our sample does not match the population it was intended to represent.

Chi-square goodness of fit test

Summary of test:
1. Determine which test statistic is required for your problem and data. The chi-square goodness-of-fit statistic is relevant when you want to compare the observed frequencies or proportions for a single categorical variable to the frequencies predicted by a theory.
2. State your research hypothesis: that the observed frequencies were not generated by the population described by your theory.
3. State the alternative hypothesis: that the observed proportions are not equal to the theory-based proportions (i.e., p_observed ≠ p_theory; this is a non-directional test).
4. State the null hypothesis: that the observed proportions are equal to the theory-based proportions (i.e., p_observed = p_theory; here, p is the population parameter estimated by the observed proportion in each group, not the p-value).
Chi-square goodness of fit test

Summary of test, continued:
5. Compute your observed chi-square value.
6. Determine the critical value for your test based on your degrees of freedom and desired α level, OR determine the p-value for the observed chi-square value based on its degrees of freedom.
7. Compare the observed chi-square value to your critical value, OR compare the p-value for the observed chi-square statistic to your chosen α, and make a decision to reject or retain your null hypothesis.
8. Make a substantive interpretation of your test results.

You try it…

Run a chi-square goodness of fit test on birth month (what is your null hypothesis?)…

Chi-square Test of Association

Chi-square test of association

Also known as a contingency test or test of independence. This test examines whether two categorical variables are independent of one another. If the pattern of frequencies of outcomes in one variable is not related to the pattern of frequencies in the other variable, the variables are independent. The counts of observations for two categorical variables can be displayed in a contingency table, or crosstab table.

You try it…

Create a crosstabs table for gender and 'Ages 5-12 did not listen' (H3RA4)…

Chi-square test of association

To determine whether the two categorical variables are independent, we can examine the expected and observed frequency counts in each cell and use the chi-square statistic to determine whether they differ statistically. The observed counts are the number of occurrences in each cell. The expected counts must be calculated under the premise that the two variables are independent.

You try it…

Display the expected counts for the crosstabs table…

Chi-square test of association

The null hypothesis that is tested is that the two categorical variables are independent. The test statistic is the Pearson chi-square, calculated by

  χ² = Σ (O_ij − E_ij)² / E_ij,  df = (R − 1)(C − 1)

where O_ij are the observed counts and E_ij = (R_i C_j)/N are the expected counts (R_i and C_j are the row and column totals).

You try it…

Calculate the Pearson chi-square test of association for the crosstabs table…

Assumptions for chi-square test of association

Normality: the distribution of possible values for any single cell in the table is approximately normal, given that the sample size is large enough and the probability of an observation falling in that cell is not extreme.
Also, recall that the expected cell frequencies for the chi-square test are defined as Np (total sample size times the probability of being in that cell). Hence, the requirement of normality can be satisfied if the expected cell frequencies are of sufficient size. A rule of thumb is that all of the expected cell frequencies should be 5 or greater.

Assumptions for chi-square test of association

Inclusion of non-occurrences: another requirement of the chi-square test is that all cases in the data set be included in the contingency table. That is, the coding system must be exhaustive: it must represent all elements of the sample.

Measures of association for categorical data

Phi coefficient: applies only to 2 × 2 tables.

  φ = √(χ² / N)

The absolute value of this measure ranges from 0 to 1; 0 indicates no association and 1 indicates a perfect relationship between the two variables in the contingency table. As a rule, values less than .2 indicate a negligible relationship, values from .2 up to .5 indicate an important relationship, and values from .5 up to 1 indicate a very strong relationship.

Measures of association for categorical data

Cramér's phi coefficient, also known as Cramér's V: same range and rules of thumb as the phi coefficient, but applies to any two-way table.

  V = √( χ² / [N(k − 1)] ),  where k = min(R, C)

Please note: a two-way table is a contingency table for two variables (each variable can have 2 or more categories). A 2 × 2 table is a contingency table for two variables where each variable has only two categories (resulting in four cells).

You try it…

Calculate the effect size for the Pearson chi-square test of association…

Exact test for chi-square test of association

Recall that normality is an important assumption for the chi-square test of association. When the expected cell sizes are 5 or greater, we usually assume that we've met this assumption.
When this requirement is not met, you can use exact statistics to perform the hypothesis test. The exact statistic is based on the empirical probability of observing a certain configuration of cell frequencies with fixed marginal frequencies.

Exact test for chi-square test of association

To perform an exact test, you rank order the possible tables based on the value of one of the cells, determine the probability of observing a value in that cell equal to or less than the observed value, and declare that probability the p-value for your hypothesis test. You can request that SPSS perform an exact test. Typically the p-value for an exact test will be greater than that from a chi-square test that relies on normality.

You try it…

Rerun your previous chi-square test of association and request an exact test (labeled Fisher's Exact Test in the output)…

Chi-square test of association

So far, we've used categorical data without any attention to the level of measurement. If one or more variables are ordinal, then it's a good idea to attend to this in the analyses. The test can have higher power when the pattern of association is determined by the order of an ordinal variable.
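Before moving on, the Pearson chi-square and Cramér's V formulas from the preceding slides can be tied together in one short sketch. The 2 × 2 table below is hypothetical (not the course crosstab), and SPSS produces all of these numbers from the Crosstabs procedure.

```python
import math

def chi_square_association(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square, df, and Cramer's V for an R x C table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n      # E_ij = R_i * C_j / N
            chi_sq += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_tot) - 1)     # (R - 1)(C - 1)
    k = min(len(table), len(col_tot))
    v = math.sqrt(chi_sq / (n * (k - 1)))          # Cramer's V
    return chi_sq, df, v

# Hypothetical 2 x 2 table, e.g. gender by a yes/no response.
chi_sq, df, v = chi_square_association([[20, 30], [40, 10]])
```

For a 2 × 2 table, k − 1 = 1, so Cramér's V here reduces to the phi coefficient √(χ²/N).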
Chi-square test of association

If you have two ordinal variables, you can test the significance of a linear relationship. This is the 'Linear by Linear' association chi-square, also known as the Mantel-Haenszel test for linear association. To use it, you should have 5 or more expected counts per cell. Use Gamma or Kendall's tau-b (more conservative) as measures of the strength of the relationship.

Chi-square test of association

If you have one ordinal variable that you can assume has an underlying interval/ratio variable and one nominal variable, use Eta as a measure of association. Eta ranges from 0 to 1; values close to 1 indicate a strong relationship. To use it, you should have 5 or more expected counts per cell.

Inferences – One-Way ANOVA

Hypothesis Testing – Equality of 3+ Means

One-way ANOVA (Analysis of Variance) involves a single factor (categorical variable) and an interval/ratio variable. A series of t-tests is not applicable because Type I errors propagate and the approach is not efficient.
Format of hypotheses:
Null hypothesis: H0: μ1 = μ2 = … = μk
Alternative hypothesis: HA: at least one mean differs
The model:

  X_ij = μ + α_i + ε_ij

Sampling distribution: the F distribution.

Hypothesis Testing – Equality of 3+ Means

One-way ANOVA involves 3 sums of squares.
SS_total: the sum of the squared differences between each observation and the mean of all observations, ignoring group membership:

  SS_total = Σ_{j=1}^{J} Σ_{i=1}^{n_j} (X_ij − X̄..)²

SS_treatment: the sum of the squared differences between each group's mean and the mean of all observations (the grand mean):

  SS_treatment = Σ_{j=1}^{J} n_j (X̄_j − X̄..)²

Hypothesis Testing – Equality of 3+ Means

SS_error: the sum of the squared differences between each observation and the mean of its group (the sum of the sums of squared deviations of scores around each group's mean).
  SS_error = Σ_{j=1}^{J} Σ_{i=1}^{n_j} (X_ij − X̄_j)² = Σ_{j=1}^{J} (n_j − 1) s_j²

These 3 sums of squares are related:

  SS_total = SS_treatment + SS_error

Each has its own degrees of freedom:

  df_treatment = J − 1,  df_error = N − J,  df_total = N − 1,  df_total = df_treatment + df_error

Hypothesis Testing – Equality of 3+ Means

We can create a mean square for each sum of squares by dividing the relevant sum of squares by its degrees of freedom. This results in three indicators of variance:
Mean square total: the variance of all observations, ignoring group membership:

  MS_total = SS_total / (N − 1)

Mean square treatment: the variance of the group means around the grand mean, an indication of the degree to which observations in one group differ from observations in another group, on average:

  MS_treatment = SS_treatment / (J − 1)

Hypothesis Testing – Equality of 3+ Means

Mean square error: the variance within the groups, on average, an indication of the degree to which observations vary relative to their group's mean:

  MS_error = SS_error / (N − J)

Hypothesis Testing – Equality of 3+ Means

The ANOVA test statistic is the ratio of the mean square treatment to the mean square error:

  F(df1, df2) = MS_treatment / MS_error,  with df1 = J − 1 and df2 = N − J

If variation between groups (MS_treatment) is about the same as variation within groups (MS_error), then we don't have much evidence of group differences:

  MS_treatment ≈ MS_error  ⇒  F ≈ 1, retain the null

Hypothesis Testing – Equality of 3+ Means

But if variation between groups (MS_treatment) is greater than variation within groups (MS_error), then we can conclude that group differences exist:

  MS_treatment > MS_error  ⇒  F > 1, reject the null

We conclude that at least one of the population means is not equal to at least one other population mean (i.e., this is an omnibus test). This is the only information we have from the F test.
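The sums-of-squares decomposition and the F ratio above can be verified with a short sketch. The three groups of scores below are hypothetical; SPSS's One-Way ANOVA procedure produces the same table from real data.

```python
import statistics

def one_way_anova(groups: list[list[float]]) -> tuple[float, float, float]:
    """Return SS_treatment, SS_error, and the F ratio for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_scores)
    n_total, j = len(all_scores), len(groups)
    # Between-groups: each group's mean vs. the grand mean, weighted by n_j.
    ss_treat = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: each score vs. its own group mean.
    ss_error = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    ms_treat = ss_treat / (j - 1)          # df1 = J - 1
    ms_error = ss_error / (n_total - j)    # df2 = N - J
    return ss_treat, ss_error, ms_treat / ms_error

# Hypothetical scores for three groups (not the course data).
groups = [[2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
ss_treat, ss_error, f = one_way_anova(groups)
```

Note that SS_treatment + SS_error (54 + 6 = 60) equals SS_total, the sum of squared deviations of all nine scores from the grand mean, as the identity on the slide requires.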
To determine which mean or means are different, we need to do multiple comparisons.

Hypothesis Testing – Equality of 3+ Means

Assumptions of the test:
Homogeneity of variances: the group variances need to be equal, and all need to be equal to the error variance. This assumption can be evaluated using Hartley's Fmax statistic (see the next slide). You can also use the following rule of thumb: if the largest standard deviation is less than twice the smallest, you can assume the assumption of equal population standard deviations has been met.
Normality of the dependent variable within each group: this can be simplified as normality of the residuals of each observation from its group mean. Sometimes observations are transformed in order to "normalize" them. This assumption can be evaluated by examining the univariate plots of the dependent variable for each independent variable group.
Independence of observations: again, this simplifies to independence of errors within a group. This assumption is evaluated by thinking about the quality of the design of the study.
Equal cell sizes make for the most powerful test.

Hypothesis Testing – Equality of 3+ Means

Assessment of homogeneity of variance: use Hartley's Fmax statistic. The null hypothesis for Hartley's Fmax is

  H0: σ_j² = σ² (a common variance for all J groups)

To test this hypothesis, compute the sample variances for each level of the independent variable (there will be J levels), and find the largest and smallest of those variances:

  Fmax = s²_largest / s²_smallest

You try it…

Run a one-way ANOVA to examine the equality of mean years of education (Educ) for 3 groups of raw Peabody test scores from Wave 3...
First, create a three-category variable using raw scores…

Hypothesis Testing – Equality of 3+ Means

Effect size η² (eta squared): analogous to R² in regression. Calculated as

  η² = SS_between / SS_total

Easy to calculate, but very biased. Interpretation: η² is the proportion of the variance in [outcome variable] explained by the effects of [factor]. For example, if η² = .865, we would say that 86.5% of the variance in [outcome variable] is explained by the effects of [factor].

Hypothesis Testing – Equality of 3+ Means

Effect size ω² (omega squared): also analogous to R² in regression. Calculated as

  ω² = (SS_between − df_between · MS_error) / (SS_total + MS_error)

Less biased than η². Interpretation is the same as for η².

You try it…

Calculate η² and ω² for the one-way ANOVA you just ran…

Hypothesis Testing – Equality of 3+ Means

When the F test is rejected, multiple comparisons must be made among the means.
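The two effect-size formulas above can be checked numerically. The sums of squares below are hypothetical placeholders (ω² is not printed by SPSS's One-Way ANOVA output, so it is typically computed by hand like this from the ANOVA table).

```python
# Hypothetical ANOVA table values: J = 3 groups, N = 9 cases,
# SS_between = 54, SS_total = 60 (so SS_error = 6 and MS_error = 6 / 6 = 1).
ss_between, ss_total, n_total, j = 54.0, 60.0, 9, 3
ms_error = (ss_total - ss_between) / (n_total - j)

eta_sq = ss_between / ss_total                                   # biased upward
omega_sq = (ss_between - (j - 1) * ms_error) / (ss_total + ms_error)
```

As expected, ω² (about .85 here) comes out smaller than η² (.90), reflecting its correction for bias.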
Planned contrasts are a priori (differences theorized before the data are collected and before the F-test for the one-way ANOVA is run). Post hoc contrasts are a posteriori comparisons (made after the F-test for the one-way ANOVA is run).

Hypothesis Testing – Equality of 3+ Means

Type I errors: tests using contrasts are built to protect us from overly large chances of making a Type I error. To do this, the tests consider Type I error rates in two different ways, by examining:
the "per comparison" (PC, or per-contrast) rate, and
the "familywise" (FW) rate, which pertains to a set of comparisons.
Comparison procedures differ because some of them limit the per-contrast error rate and others limit the familywise rate.

Hypothesis Testing – Equality of 3+ Means

The familywise (FW) rate pertains to a set of comparisons. If we are making several comparisons, the familywise rate may apply. The FW rate tells us the chance of making at least one Type I error in the set of comparisons. Suppose α is the error rate of one comparison and c comparisons are made:

  Per-contrast (PC) error rate = α
  Familywise (FW) error rate = 1 − (1 − α)^c

Hypothesis Testing – Equality of 3+ Means

Example: suppose α = .05 is the error rate of one comparison and we are making 3 comparisons:

  PC error rate = α = .05
  FW error rate = 1 − (1 − .05)³ ≈ .143

If we just add the rates of the three comparisons we get cα = 3(.05) = .15. In general, α_PC ≤ α_FW ≤ cα, with α_FW close to cα.

Hypothesis Testing – Equality of 3+ Means

Planned contrasts: questions about the population means are expressed as hypotheses about contrasts. A contrast should express a specific question that is driven by our research when designing the study. When contrasts are formulated before seeing the data, inference about contrasts is valid whether or not the ANOVA null hypothesis of equality of means is rejected.
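The per-contrast versus familywise arithmetic above takes one line to compute. This sketch reproduces the three-comparison example and also shows the Bonferroni-style per-contrast level α/c mentioned in the contrast tables later in the deck.

```python
def familywise_rate(alpha: float, c: int) -> float:
    """Chance of at least one Type I error across c independent comparisons."""
    return 1 - (1 - alpha) ** c

fw = familywise_rate(0.05, 3)     # the slide example: three comparisons at alpha = .05
bonferroni_alpha = 0.05 / 3       # per-contrast level that keeps FW near .05
```

The familywise rate (about .143) sits just below the simple sum cα = .15, illustrating the inequality α_PC ≤ α_FW ≤ cα.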
Hypothesis Testing – Equality of 3+ Means

Planned contrasts: because the tests for planned comparisons are more powerful than the omnibus F test, it is possible to retain the null hypothesis using the F-test and still find a statistically significant contrast. If you have planned some comparisons of means ahead of time because you expect specific means to differ, then you do not really need to do the F test to see if you can reject the omnibus null hypothesis that all means are equal.

Hypothesis Testing – Equality of 3+ Means

Planned contrasts: a contrast is a combination of population means of the form

  ψ = Σ a_i μ_i

The coefficients of the contrast must sum to zero. The corresponding sample contrast is

  c = Σ a_i x̄_i

The standard error of c is

  se_c = s_p √( Σ a_i² / n_i )

Hypothesis Testing – Equality of 3+ Means

Planned contrasts: the null hypothesis for each contrast says that a combination (contrast) of population means is 0:

  H0: ψ = 0

Choose to define the contrast so that it will be a positive number when the alternative hypothesis of interest is true (this makes some computations easier). The t-test statistic:

  t = c / se_c,  df = df_error

Hypothesis Testing – Equality of 3+ Means

Planned contrasts: example of null and alternative hypotheses for a contrast:

  H0: ψ = μ1 − ½(μ2 + μ3) = μ1 − ½μ2 − ½μ3 = 0
  HA: ψ = μ1 − ½(μ2 + μ3) = μ1 − ½μ2 − ½μ3 ≠ 0

Hypothesis Testing – Equality of 3+ Means

Example: Planned Orthogonal Contrasts (POC). Planned orthogonal comparisons are contrasts of a certain type. For any one-way ANOVA with k groups, there are k − 1 POCs. POCs are simply contrasts that are orthogonal, or independent, of one another; we can determine this by looking at each pair of contrasts and seeing whether they are independent. Each set of k − 1 POCs provides tests of all of the unique information in the k means. There may be more than one set of POCs for any set of k means.
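The contrast formulas above can be sketched directly. The group means, sample sizes, and MS_error below are hypothetical; the weights implement the example contrast μ1 − ½(μ2 + μ3).

```python
import math

def contrast_t(means, ns, weights, ms_error):
    """t statistic for a planned contrast c = sum(a_i * xbar_i)."""
    assert abs(sum(weights)) < 1e-9, "contrast weights must sum to zero"
    c = sum(a, ) if False else sum(a * m for a, m in zip(weights, means))
    # se_c = sqrt( MS_error * sum(a_i^2 / n_i) ); sqrt(MS_error) plays the role of s_p.
    se = math.sqrt(ms_error * sum(a * a / n for a, n in zip(weights, ns)))
    return c / se

# Hypothetical three-group example: group 1 vs. the average of groups 2 and 3.
t = contrast_t(means=[10.0, 7.0, 5.0], ns=[10, 10, 10],
               weights=[1.0, -0.5, -0.5], ms_error=4.0)
```

Here c = 10 − ½(7 + 5) = 4 and the contrast is defined so it comes out positive under the alternative of interest, as the slide recommends; t is compared against the t distribution with df_error degrees of freedom.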
Hypothesis Testing – Equality of 3+ Means

Example: Planned Orthogonal Contrasts (POC). To tell whether two contrasts are orthogonal, we multiply together the weights (the c_j values) from the contrasts. For example, assume we are comparing five groups and have the following set of contrasts:

  L1 = X̄1 − X̄3  (group 1 vs. group 3)
  L2 = (X̄1 + X̄3)/2 − (X̄2 + X̄4 + X̄5)/3  (groups 1 and 3 vs. the rest)

Hypothesis Testing – Equality of 3+ Means

The weights applied to the 5 group means are (1, 0, −1, 0, 0) for L1 and (1/2, −1/3, 1/2, −1/3, −1/3) for L2. Each pair of contrasts within the set is orthogonal; for example, for L1 and L2,

  (1)(1/2) + (0)(−1/3) + (−1)(1/2) + (0)(−1/3) + (0)(−1/3) = 0

Hypothesis Testing – Equality of 3+ Means

Example: Planned Orthogonal Contrasts (POC). The test statistic:

  t = L / se(L),  where L = Σ c_j X̄_j and se(L) = √( MS_W Σ c_j²/n_j ),  df = df_within

Hypothesis Testing – Equality of 3+ Means

Sample of planned contrasts (test; contrasts; Type I error and power information):
Planned Orthogonal Contrasts (POC): k − 1 contrasts in an orthogonal set; can have multiple sets. Per-contrast α; can apply familywise control, α/c, if c is large (Bonferroni correction). The most powerful contrast tests.
Trend contrasts: k − 1 independent trend tests; useful for quantitative factors that are equally spaced. Per-contrast α; can apply familywise control, α/c, if c is large (Bonferroni correction).

Hypothesis Testing – Equality of 3+ Means

Sample of planned contrasts, continued:
Dunn or Bonferroni: any number of contrasts, c; use if contrasts are not orthogonal. Familywise α; use a per-contrast level of α/c.
Dunnett: paired contrasts of 1 mean with the (k − 1) other means (e.g., one control vs. the other treatments). Familywise α.

You try it…

Set up a contrast that tests whether the mean education of the respondents in the two highest Peabody groups is greater than the mean education of the respondents in the lowest Peabody group…
Note: the order of the coefficients is important because it corresponds to the ascending order of the category values of the factor variable.

You try it…

Set up an additional contrast that tests whether the mean education of the respondents in the middle Peabody group is less than the mean education of the respondents in the highest Peabody group…

Hypothesis Testing – Equality of 3+ Means

Post-hoc contrasts: used when hypotheses cannot be formulated a priori; used after we analyze our data using ANOVA and after rejecting the null hypothesis of the ANOVA. When we look at the data before doing comparisons, we increase our chances of making a Type I error (beyond what we talked about before), because we may decide to test only the differences among means that look big. This is why most post-hoc procedures examine all possible comparisons, and also why post-hoc tests are not as powerful as planned tests.

Hypothesis Testing – Equality of 3+ Means

Post-hoc contrasts: the idea of the t-test is used to perform multiple comparisons. The t statistic is calculated for each pair of means. The Type I error is controlled by using an appropriate standard error in the test statistic equation and by making the Type I error level more stringent for each individual test.
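A pairwise post-hoc comparison of this kind can be sketched as follows. The means, group size, and MS_error are hypothetical, and the Bonferroni-style adjustment α/c stands in for the procedure-specific critical values discussed on the next slides.

```python
import math

def pairwise_t(mean_i, mean_j, n_i, n_j, ms_error):
    """Post-hoc pairwise t using the pooled error term from the whole ANOVA."""
    se = math.sqrt(ms_error * (1 / n_i + 1 / n_j))   # sqrt(MS_error) acts as s_p
    return (mean_i - mean_j) / se

# Hypothetical one-way ANOVA with three groups, n = 10 each, MS_error = 4.0.
means, n, ms_error = [10.0, 7.0, 5.0], 10, 4.0
ts = [pairwise_t(means[i], means[j], n, n, ms_error)
      for i in range(3) for j in range(i + 1, 3)]    # pairs (1,2), (1,3), (2,3)

# With 3 pairwise tests, a Bonferroni-style per-test level would be .05 / 3.
per_test_alpha = 0.05 / 3
```

Each t here uses MS_error pooled over all three groups, not just the two being compared, which is the power advantage the slide describes.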
Hypothesis Testing – Equality of 3+ Means

Post-hoc contrasts: in general, the null hypothesis is

  H0: μ_i = μ_j

The general test statistic:

  t_ij = (X̄_i − X̄_j) / [ s_p √(1/n_i + 1/n_j) ] = (X̄_i − X̄_j) / [ Root MSE · √(1/n_i + 1/n_j) ]

The null hypothesis is rejected if |t_ij| ≥ t**.

Hypothesis Testing – Equality of 3+ Means

Post-hoc contrasts: the value of t** depends upon which multiple comparisons procedure we choose. Note that we use the pooled estimator from all groups rather than the pooled estimator from just the 2 groups being compared; the additional information in the common pooled estimator increases the power of the tests. The degrees of freedom for all of these statistics are the same: df_error. Because we don't have any specific ordering of the means in mind as an alternative to equality, we must use a two-sided approach to hypothesis testing.

Hypothesis Testing – Equality of 3+ Means

Post-hoc contrasts, example: the Scheffé test examines all possible differences while controlling the Type I error rate:

  F_Scheffé = (X̄_i − X̄_j)² / [ MS_W (K − 1)(1/n_i + 1/n_j) ],  df1 = K − 1, df2 = N − K

Hypothesis Testing – Equality of 3+ Means

Sample of post-hoc contrasts, assuming equal variances (test; contrasts; Type I error and power information):
Fisher's LSD: all pairs of means. Familywise α.
Tukey's HSD: all pairs of means; the same critical value is used for each test. Familywise α.
Newman-Keuls: all pairs of means; the critical value changes from pair to pair. The overall α is unknown; more powerful but less conservative than Tukey's test.

Hypothesis Testing – Equality of 3+ Means

Sample of post-hoc contrasts, assuming equal variances, continued:
Ryan: all pairs of means. Controls α by using different levels for each pair of means.
Scheffé: any number of post hoc contrasts. Familywise α; low power if a large number of contrasts is used.
Šidák: all pairs of means. For the same α as Bonferroni, provides tighter bounds.
Sample of Post-hoc Contrasts – assume equal variances (continued)

Test | Contrasts | Type I error and power information
Dunn or Bonferroni | Any number of c contrasts; use if the contrasts are not orthogonal. | Familywise α; use a per-contrast level of α/c.
Dunnett | Paired contrasts of 1 mean with the (k − 1) other means (e.g., one control vs. the other treatments). | Familywise α.

Sample of Post-hoc Contrasts – assume unequal variances

Test | Contrasts | Type I error and power information
Tamhane's T2 | Conservative pairwise comparison test based on a t test. | Familywise α.
Games–Howell | Pairwise comparison test that is sometimes liberal. | Familywise α.

Inferences – Two-Way ANOVA

Hypothesis Testing – Two-Way ANOVA
The general class of factorial ANOVAs is designed for situations where our cases have been categorized according to two or more factors or characteristics.
Two-way ANOVAs are a subset of factorial ANOVAs. Two-way ANOVAs have two factors, which are called "main effects." Each main effect represents a factor that could be examined in a one-way ANOVA.
We'd like to examine them together, as well as their interaction, which tells us how the two factors work together to affect the outcome. Another way to think of the interaction is that it represents whether the effect of one factor depends on the second factor.

Hypothesis Testing – Two-Way ANOVA
As with the one-way ANOVA, F statistics are used to test statistical significance:
There is an F-test of each main effect.
There is also an F-test of the interaction of the main effects.
Planned comparisons and post-hoc tests are also used with two-way ANOVAs.
As with one-way ANOVA, strive for equal group sizes:
It is the most powerful design.
In the case of 2+ factors, it ensures the factors are independent (not confounded).

Hypothesis Testing – Two-Way ANOVA
Advantages of two-way ANOVA (vs.
one-way):
It is more efficient to study 2 factors at once than separately.
Including a second factor thought to influence the response variable helps reduce the residual variation in a model of the data. In a one-way ANOVA for factor A, any effect of factor B is assigned to the residual ("error") term. In a two-way ANOVA, both factors contribute to the fit part of the model.
Interactions between factors can be investigated. The two-way ANOVA breaks down the fit part of the model between each of the main components (the 2 factors) and an interaction effect. The interaction cannot be tested with a series of one-way ANOVAs.

Hypothesis Testing – Two-Way ANOVA
Notation:
X_ijk = score of the kth subject in group i of factor A and group j of factor B
X̄_ij = mean for subjects in group i of factor A and group j of factor B
X̄_i = mean score for group i of factor A, i = 1, …, a
X̄_j = mean score for group j of factor B, j = 1, …, b
X̄ = grand mean of all scores
n_ij = number of subjects in group i of factor A and group j of factor B
n_i = sample size for group i of factor A
n_j = sample size for group j of factor B
n = total sample size

Hypothesis Testing – Two-Way ANOVA
The model: X_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk
where:
X_ijk = score of the kth subject in group i of factor A and group j of factor B
μ = grand mean in the population
α_i = population factor A treatment effect for group i
β_j = population factor B treatment effect for group j
(αβ)_ij = interaction of factors A and B
ε_ijk = residual of person k in group ij

Hypothesis Testing – Two-Way ANOVA
Compare the two: the one-way ANOVA model vs. the two-way ANOVA model. What was "error" in the one-way model is now being explained by the second factor (B) and the interaction. We hope the addition of the second factor will allow us to explain more (and get a larger η²). We may still have some error, but ε_ijk is not the same residual as for the one-way model (that's why I renamed the one-way error term ε′).
Hypothesis Testing – Two-Way ANOVA
μ is estimated by the sample grand mean X̄.
α_i is estimated by a_i = X̄_i − X̄.
β_j is estimated by b_j = X̄_j − X̄.
(αβ)_ij is estimated by X̄_ij − X̄_i − X̄_j + X̄, which is X̄_ij − a_i − b_j − X̄,
where:
X̄_ij = Σ_k X_ijk / n_ij,  X̄_i = Σ_j Σ_k X_ijk / n_i,  X̄_j = Σ_i Σ_k X_ijk / n_j,  X̄ = Σ_i Σ_j Σ_k X_ijk / n

Hypothesis Testing – Two-Way ANOVA
Three sets of hypotheses are tested:
1. Null hypothesis for factor A: H0: α_1 = α_2 = … = α_a = 0, or H0: μ_1 = μ_2 = … = μ_a
2. Null hypothesis for factor B: H0: β_1 = β_2 = … = β_b = 0, or H0: μ_1 = μ_2 = … = μ_b
3. Null hypothesis for the interaction of factors A and B: H0: (αβ)_11 = (αβ)_12 = … = (αβ)_ab = 0

Hypothesis Testing – Two-Way ANOVA
Sums of squares for the model: [the slide shows the two-way ANOVA sums-of-squares table]

Hypothesis Testing – Two-Way ANOVA
Again, as for the one-way ANOVA, the F tests get big when the population treatment effects are nonzero and MSA, MSB, and MSAB are larger than MSW.
For two-way ANOVA we will do three tests: the two main effects and the interaction.
In general:
1. Examine the interaction test first. It makes sense to examine it first, to determine how much attention we devote to the main effects; if the interaction is significant, we may need to be especially cautious in interpreting the main effects.
2. Examine the cell means.
3. Go back to the other F tests.

Hypothesis Testing – Two-Way ANOVA: Interactions
Plots of means are important in two-way ANOVA. Tables of means are acceptable, but plots of the means often show patterns very quickly – patterns that may not be apparent from simple tables of means.
Plots only suggest the presence of interactions; the ANOVA F test tells us whether the suggested interaction is significant or not.
Main effects are shown by lines at different heights or by lines that slant; when the lines are parallel, there is no interaction.
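The effect estimates above (a_i, b_j, and the interaction term) can be computed from cell means. The sketch below is a NumPy illustration for a balanced 2×2 design with invented data; the simple subtraction formulas shown here assume equal cell sizes.

```python
import numpy as np

# Hypothetical balanced 2x2 layout: cells[i][j] holds the scores for
# group i of factor A and group j of factor B (values invented).
cells = np.array([[[4.,  6.], [8.,  10.]],
                  [[6.,  8.], [12., 14.]]])   # shape (a, b, n_ij)

cell_means = cells.mean(axis=2)               # X-bar_ij
grand = cells.mean()                          # X-bar (grand mean)
a_eff = cells.mean(axis=(1, 2)) - grand       # a_i = X-bar_i - X-bar
b_eff = cells.mean(axis=(0, 2)) - grand       # b_j = X-bar_j - X-bar
# Interaction estimate: X-bar_ij - X-bar_i - X-bar_j + X-bar
ab_eff = cell_means - a_eff[:, None] - b_eff[None, :] - grand
```

As a check, the pieces reassemble the cell means exactly: grand mean + row effect + column effect + interaction equals X̄_ij in every cell.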
Hypothesis Testing – Two-Way ANOVA: Interactions
Ordinal interactions: the size of the effect (or mean difference) for one factor is not the same at all levels of the second factor.
Disordinal interactions: the means for the levels of one factor (say B) are not in the same order within each of the levels of A.

You try it…
Run a two-way ANOVA to examine the equality of mean years of education (Educ) for 3 groups of raw Peabody test scores and gender from Wave 3…

Fixed and Random Effects

Hypothesis Testing – Two-Way ANOVA
Fixed effects:
What we have considered thus far; each level of the factor is represented in the model.
Random effects:
The factor levels for the analysis are sampled from a population of levels.
Common factors that appear as random effects include schools, classrooms, manufacturing lines, and labs.
Appropriate in cases where there are many factor levels (too many to include) and we want to generalize to a broader set of cases than the ones we will study.

Hypothesis Testing – Two-Way ANOVA
The factor levels must be randomly sampled because the variation among the means of the population of factor levels is estimated. Because we wish to generalize to the population of levels, estimation must take this into account.
If both factors are random, the model is a random-effects model. If only one factor is random, the model is a mixed-effects model.
Generally, a random factor is a factor manipulated by the researcher; random factors in observational data are not common.
Hypothesis Testing – Two-Way ANOVA
If a factor is random, the extra variation must be built into the model. Compared to a fixed factor, a random factor has:
A different mean square.
A different F test (e.g., MSA/MSAB instead of MSA/MSW).

You try it…
Run a two-way ANOVA to examine the equality of mean years of education (Educ) for 3 groups of raw Peabody test scores and gender from Wave 3… IF Peabody test scores were considered to be a random effect (an incorrect assumption for these data).

Multiple Regression

Multiple Regression
A linear model with an interval/ratio outcome variable:
Y_i = β_0 + β_1 X_1i + β_2 X_2i + … + β_p X_pi + ε_i
where:
Y_i = outcome for person i
β_1, …, β_p = regression coefficients
ε_i = residual for person i, ε_i ~ N(0, σ²); ε_i = Y_i − Ŷ_i, where Ŷ_i is the predicted outcome for person i
Goal: explain the variation in the outcome variable with independent variables, which in turn reduces the error (residual) variance.

Multiple Regression
How do we find 'good' independent variables?
A statistically significant correlation coefficient suggests a linear relationship between two variables.
A scatterplot with a linear 'trend' for the points suggests a linear relationship exists between two variables.
The independent variables should not be highly correlated with one another; high correlation among them is referred to as multicollinearity, which affects the estimation of all of the slopes.
Indicators of multicollinearity:
The signs of slopes change when new Xs are added.
The magnitude of the slope for a predictor changes greatly when another variable is added to the model.
The standard errors of the slopes increase when new Xs are added.
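The multiple regression model above can be fit by ordinary least squares. This week's tool is SPSS, but as a language-neutral sketch, the NumPy snippet below simulates data from a known model (coefficients and seed are invented) and recovers the coefficients; the residuals are then Y_i − Ŷ_i as defined above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = rng.normal(size=(n, p))                    # two predictors (simulated)
beta_true = np.array([1.0, 2.0, -1.0])         # intercept, beta_1, beta_2 (assumed)
y = beta_true[0] + X @ beta_true[1:] + 0.1 * rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept column
b, *_ = np.linalg.lstsq(Xd, y, rcond=None)     # OLS estimates of beta
yhat = Xd @ b                                  # predicted outcomes Y-hat_i
resid = y - yhat                               # residuals e_i = Y_i - Y-hat_i
```

Because the design includes an intercept, the OLS residuals sum to zero, mirroring the "residual" definition in the model.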
You try it…
Run a bivariate correlation table that includes raw Peabody scores (AH_RAW), highest grade completed (EDUC), BMI (BMI), annual income: wages/salaries (H3EC1A), number of hours/week spent at work (H3DA31), cumulative math GPA (EAMGPAC), and cumulative overall GPA (EAOGPAC)…

Multiple Regression
Modeling steps:
Examine the omnibus F test for overall model significance.
Examine the individual t-tests for coefficient significance.
Check the residuals:
A histogram for assessment of normality.
A scatterplot of standardized residuals versus predicted values to assess the randomness of the residuals.
Test homogeneity of variance.

Multiple Regression
Indicators of overall regression model quality:
MSE – the mean squared residual from the regression; compare this to S²_Y. Similar to MSW in ANOVA.
Adjusted R² – "variance accounted for," adjusted for model size; like η² and ω² in ANOVA. Calculated as:
adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1)
F test of H0: β_1 = β_2 = … = β_p = 0; compare to critical F values with df1 = p, df2 = n − p − 1. This is also a test of the correlation between observed and predicted values, with H0: ρ²(Y, Ŷ) = 0.

Outliers

Multiple Regression
Sums of squares:
Similar to their use in ANOVA. The total sum of squares (SS Total) is partitioned into two parts: the explained variation (also called SS Regression, SS Model, or SS Explained) and SS Residual (or SS Error).
SS Total = SS Regression + SS Residual
Σ(Y_i − Ȳ)² = Σ(Ŷ_i − Ȳ)² + Σ(Y_i − Ŷ_i)²

Multiple Regression
Sums of squares table for regression: [shown as a table on the slide]

Multiple Regression
Regression coefficients:
Each regression coefficient has a related t-test of H0: β_j = 0; the null tests whether the slope is zero. Calculated as:
t = b_j / se(b_j), df = n − p − 1
Standardized regression coefficient, calculated as:
B = b_i (s_xi / s_y)
Interpreted as the number of standard deviations of change in Y_i to expect given one standard deviation of change in X_i.

You try it…
Run a linear regression using standardized Peabody in Wave 3 (pvtstd3c) as the
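The sums-of-squares identity, R², adjusted R², and the omnibus F above can be verified numerically. The sketch below uses NumPy with a tiny invented dataset (not course data) purely to show that the pieces fit together.

```python
import numpy as np

# Toy data (invented): outcome and two predictors, n = 6, p = 2
y = np.array([1., 3., 2., 5., 4., 6.])
X = np.column_stack([np.ones(6),
                     np.array([1., 2., 3., 4., 5., 6.]),
                     np.array([0., 1., 0., 1., 0., 1.])])
n, p = 6, 2

b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b

ss_total = ((y - y.mean()) ** 2).sum()     # SS Total
ss_resid = ((y - yhat) ** 2).sum()         # SS Residual
ss_reg = ss_total - ss_resid               # SS Regression

r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
F = (ss_reg / p) / (ss_resid / (n - p - 1))   # df1 = p, df2 = n - p - 1
```

Note that adjusted R² is always below R² whenever the fit is imperfect, since the penalty factor (n − 1)/(n − p − 1) exceeds 1.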
outcome variable and math GPA (eamgpac), math course sequence (eamsqh), and age as the independent variables (reg1)…

You try it…
Examine the residuals for normality and randomness…

Multiple Regression
Categorical independent variables:
These variables require that a dummy code set be constructed.
Dummy codes are a common way to code categorical variables for use in regression; effects coding and orthogonal coding are also options, although less common.
An ANOVA can be run as a regression with a corresponding dummy code set.

Multiple Regression
A categorical variable that has 2 categories is referred to as a dichotomy or a binary (dichotomous) variable. The dummy code set consists of one member (e.g., X = 0 if male, X = 1 if female).
If X = 0, then: Ŷ_i = b_0 + b_1 X_i = b_0 = Ȳ_males
If X = 1, then: Ŷ_i = b_0 + b_1 X_i = Ȳ_males + (Ȳ_females − Ȳ_males) = Ȳ_females

Multiple Regression
In general, a categorical variable with k categories will have a dummy code set of k − 1 members. Example:

Subject | Group | T | X1 | X2
Jim | Physical | 1 | 1 | 0
John | Mental | 2 | 0 | 1
Joe | Control | 3 | 0 | 0

The intercept is the expected value of the reference group (in this example, the control group).
The slopes are the differences in means between the control group and the corresponding group (in this example, the slope of X1 is the difference between the means of the control and physical groups).

You try it…
Add female to your linear regression (reg2)…

Multiple Regression
To test whether the addition of extra independent variables 'adds' to the prediction of the outcome variable, use an 'increment to R²' F test:
F = [(R²_L − R²_S) / (p_L − p_S)] / [(1 − R²_L) / (n − p_L − 1)]
where the subscript L indicates the larger model and the subscript S indicates the subset model. Model S must be 'nested' within L.

You try it…
Using the F-test, did the model improve with the addition of female?
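The 'increment to R²' F test above can be sketched as a small helper. The toy check at the end (data invented) adds a two-group dummy to an intercept-only model, which also illustrates the ANOVA-via-dummy-coding equivalence: the resulting F equals the one-way ANOVA F for comparing the two group means.

```python
import numpy as np

def r2(X, y):
    """R-squared of an OLS fit; X must include the intercept column."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return 1 - ((y - X @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def increment_F(X_small, X_large, y):
    """'Increment to R2' F test: H0 is that the extra predictors in the
    larger model add nothing. X_small must be nested in X_large."""
    n = len(y)
    pL, pS = X_large.shape[1] - 1, X_small.shape[1] - 1
    r2L, r2S = r2(X_large, y), r2(X_small, y)
    return ((r2L - r2S) / (pL - pS)) / ((1 - r2L) / (n - pL - 1))

# Toy check: add a two-group dummy to an intercept-only model.
y = np.array([1., 2., 3., 4.])
ones = np.ones((4, 1))
dummy = np.array([[0.], [0.], [1.], [1.]])
F = increment_F(ones, np.hstack([ones, dummy]), y)
```

Here R²_S = 0 (intercept only) and R²_L = 0.8, so F = (0.8/1)/(0.2/2) = 8.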
Analysis of Covariance (ANCOVA)

ANCOVA
In order to compare groups, it's helpful if the groups are comparable in terms of the independent variables. This control can be created in a variety of ways:
Sampling – random sampling helps create this control; this is the 'gold standard'.
Design control – groups are 'forced' to be equivalent.
Statistical control – the group differences are 'controlled for' in the statistical model.
ANCOVA is a statistical model used for statistical control.

ANCOVA
The idea of analysis of covariance (ANCOVA) is very similar to that of ANOVA: we wish to look for possible differences among group means, BUT in ANCOVA we have some additional variable we want to "control for," hold constant, or account for in our analysis. This additional variable is known as the covariate, and we will denote it as X.
We'd like to remove the covariate X from having an influence on our outcome. If we could hold constant the values of X for our subjects, we would have a clearer picture of the differences on our outcome Y.

ANCOVA
Gives us results that allow us to estimate what the group means on our outcome WOULD HAVE BEEN if the groups had had the same means (or "were equivalent") on the covariate.

ANCOVA
Choosing an appropriate covariate:
The covariate X should be linearly related to the outcome Y, and we sometimes hope (or expect) that the groups of interest will show mean differences on the covariate (though that is not a requirement).
If there is a treatment involved, we also have to know that the treatment did not affect X and, similarly, that X did not affect the treatment. So, for instance, if subjects are assigned to treatment groups on the basis of a variable, that variable would not be a good covariate.
X should relate to Y in exactly the same way for all of the groups in our analysis; there should be "no covariate-group interaction."
ANCOVA
Additional assumptions – assume that the errors:
are independent,
are normally distributed (check using a histogram of the residuals),
have homogeneous variances across groups (use Levene's test to check for equality of variances).

ANCOVA
The ANCOVA model:
Y_ij = μ + α_j + β_w (X_ij − μ_X) + ε_ij
where:
Y_ij = outcome score of person i in group j
X_ij = covariate score of person i in group j
μ = grand mean of Y in the population
α_j = jth treatment effect in the population, with X held constant; α_j = μ_j − μ
β_w = slope of the covariate in the population, which is the predicted change in Y given a one-unit change in X, with group membership held constant
ε_ij = residual, or unexplained variance, for person i in group j

ANCOVA
The ANCOVA model:
The w label on β represents the within-group slope, which implies all groups must share the same population slope for X predicting Y.
β_w is multiplied not simply by the X score but by the deviation of the score from the mean of X, μ_X.

ANCOVA
ANCOVA modeling steps:
Test whether the covariate has the same slope for each group in the factor (or factors, in a multi-way ANCOVA); one of the key assumptions of ANCOVA is that this interaction does not exist.
Plot the slopes for the different groups in a scatterplot.
Run an ANCOVA-like model, but with the addition of the interaction between factor and covariate:
If the interaction is nonsignificant, the covariate does not interact with group.
If the interaction is significant, the covariate cannot be used.
Check the homogeneity of variance assumption.
Check the residuals for normality.

You try it…
Pick an appropriate covariate for your ANCOVA model and evaluate whether it's appropriate for use…

You try it…
Evaluate the assumptions of the ANCOVA model…

Other Models from Experimental Design: Repeated Measures, MANOVA

Repeated Measures
Uses the ideas of ANOVA: testing differences between different means.
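The ANCOVA model above can be sketched as a regression with a group dummy and a mean-centered covariate. The data below are invented and constructed to fit exactly, so the adjusted group means (the predicted outcome at the grand mean of X) come out cleanly; this is a NumPy illustration, not SPSS output.

```python
import numpy as np

# Hypothetical two-group data with a covariate (values invented)
y     = np.array([3., 4., 5., 6., 7., 8.])   # outcome
x     = np.array([1., 2., 3., 1., 2., 3.])   # covariate
group = np.array([0., 0., 0., 1., 1., 1.])   # group-membership dummy

xc = x - x.mean()                             # center X at its grand mean
X = np.column_stack([np.ones(6), group, xc])  # intercept, group effect, b_w
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Adjusted group means: predicted Y at the grand mean of the covariate (xc = 0)
adj_mean_0 = b[0]
adj_mean_1 = b[0] + b[1]
```

Centering the covariate is what makes the intercept terms directly interpretable as adjusted means, mirroring the (X_ij − μ_X) term in the model.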
Each group of people (units) had a mean, which we compared. In repeated measures ANOVA, the same individuals can contribute to the different means.
More formal definition: participants take part in all conditions of an experiment.
It might be the measurement of the same thing over time.
It might be exposure to multiple treatments, with one measure per treatment.
The assumption of independence that ANOVA relied on is violated.

Repeated Measures
The implication is that the F-test from ANOVA will lack accuracy in this situation. Instead, we assume that the variances of the differences between treatment levels are homogeneous. This assumption is called sphericity.
Sphericity is evaluated using Mauchly's test:
The null is that there is sphericity.
If the null is rejected, we can't depend on the F-test.
Big samples will likely have significant Mauchly tests.
What to do if sphericity is violated:
Apply a correction factor to the F-test (e.g., Greenhouse–Geisser, Huynh–Feldt).
Use MANOVA – it does not rely on the assumption of sphericity (although it has less power).

Multivariate Analysis of Variance (MANOVA)
When to use MANOVA:
Similar to ANOVA.
When repeated measures ANOVA is inappropriate because of a violation of sphericity.
When you are interested in modeling several dependent variables that are correlated with one another.
Tests differences between group means; multiple factors can be examined, and interactions can be examined.
Used rather than multiple ANOVAs so that the familywise error rate is not inflated.
Detects differences between groups along a dimensional space rather than on one outcome.

Multivariate Analysis of Variance (MANOVA)
MANOVA should not be used if the dependent variables are not correlated. The power of MANOVA depends on the correlation between the dependent variables and the effect size to be detected. Use theory to guide you on what variables to include in your analyses.
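The sphericity assumption — equal variances of the pairwise condition differences — can be examined informally before running Mauchly's test in SPSS. The sketch below (invented data) simply computes those difference variances; it is an eyeball check, not Mauchly's test itself.

```python
import numpy as np

# Hypothetical repeated-measures data: rows = subjects, cols = conditions
# (values invented for illustration)
scores = np.array([[3., 5., 7.],
                   [4., 6., 8.],
                   [2., 5., 9.],
                   [5., 7., 8.]])

k = scores.shape[1]
# Sample variance of each pairwise difference between condition columns;
# under sphericity these should all be similar.
diff_vars = [np.var(scores[:, i] - scores[:, j], ddof=1)
             for i in range(k) for j in range(i + 1, k)]
```

If these variances differ substantially, a correction (Greenhouse–Geisser, Huynh–Feldt) or MANOVA is the safer route, as the slide notes.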
Generalized Linear Models: Dealing with Categorical Outcomes

Generalized Linear Models
Different from the General Linear Model, which usually refers to regression models with interval/ratio outcomes.
Applies many of the ideas of linear regression.
The big difference is that the outcome variable is modeled by a distribution other than the normal distribution.
The outcome is 'linked' to a linear model by the canonical link function. What kind of generalized linear model you have is determined by the canonical link that you use.

Generalized Linear Models
Some examples:
Logistic regression: the data are binary (0/1), a possible distribution function is the binomial, and the link is the logit.
Poisson regression: the data consist of counts, the distribution function is the Poisson, and the link is the log.
Linear regression: the data are interval/ratio, the distribution function is the normal, and the link is the identity function.
Negative binomial regression: the data consist of counts, the distribution function is the negative binomial, and a possible canonical link is the log.
Other models: probit (binary data), ordered logit (ordinal data), gamma (positive continuous data)…

A Few Useful References
Agresti, Alan (2007). An Introduction to Categorical Data Analysis. Wiley.
Dean, A. M. and Voss, D. (1998). Design and Analysis of Experiments. Springer.
Draper, Norman R. and Smith, Harry (1998). Applied Regression Analysis. Wiley.
Field, Andy (2009). Discovering Statistics. Thousand Oaks, CA: Sage.
Kennedy, Peter (2008). A Guide to Econometrics. Wiley.
Kirk, Roger E. (1995). Experimental Design: Procedures for the Behavioral Sciences, 3rd ed. Pacific Grove, CA: Brooks/Cole.
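As a small coda to the link functions discussed above, the sketch below writes out the logit, log, and identity links (and the logit's inverse) in plain Python; this is only an illustration of the link idea, not a full GLM fit.

```python
import math

# Canonical links for common generalized linear models (illustrative only)

def logit(p):
    """Logit link (logistic regression, binomial outcome)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse logit: maps the linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))

def log_link(mu):
    """Log link (Poisson regression, count outcome)."""
    return math.log(mu)

def identity(mu):
    """Identity link (ordinary linear regression, normal outcome)."""
    return mu
```

The link maps the mean of the outcome onto the scale of the linear predictor, and its inverse maps predictions back to the outcome scale.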