Summer Research Opportunities Program
Statistics Boot Camp
May 22-27, 2011
Instructor: Dr. Kimberly Maier
SROP 2011 1
What We Will Learn This Week
• How to describe data (quantitatively and graphically)
• How to select and compute statistical estimates and hypothesis tests
• How to use SPSS to accomplish these tasks
• How to interpret and write about the results of the estimates and tests
Introduction of SPSS and
Course Dataset
SPSS
• Software designed to perform a wide variety of statistical analyses
• Other commonly used packages include SAS, Stata, R, and Minitab
• Runs in the Windows environment
• Most recent version (19.0) is called IBM SPSS (earlier version, 18.0, is called PASW).
• Older versions of SPSS will work just fine for the analyses we will do this week.
• Two important files:
  • Data Editor
  • Viewer
Data Editor
• This type of file contains the data
• This file has the extension .sav
• New data files can be created
• Old data files can be opened
• The data file has two views
Data view:
Data Editor
variable view:
Data Editor Menus
• File:
  • Select New to start a new SPSS Data file
  • Select Open to access an existing SPSS Data file
  • Many different types of files can be opened by SPSS, including Excel and SAS data files
• Edit:
  • Cut, Copy, and Paste
  • Quick navigation to a particular case or variable
  • Options allows you to change a variety of SPSS environment options such as how variables are displayed and where files are saved
• View
  • Can change fonts
  • Can display value labels instead of values
Data Editor Menus
• Data
  • Can insert new variable or new value
  • Can split files
  • Can select cases
  • Can weight cases
• Transform
  • Compute: calculates data values according to an expression you enter, has a variety of functions
  • Recode: assigns discrete values to a variable based on the present values of the variable being recoded
• Analyze
  • Used to select various statistical procedures, such as Crosstabs, Chi-square, One-way ANOVA, and Linear Regression
Data Editor Menus
• Graphs
  • Used to create bar charts, pie charts, histograms, scatterplots, and many other graphs
  • Chart Builder and Legacy Dialogs produce the same graphs; they are two different ways of specifying them
• Utilities
  • Use this menu to get information about your variables and go to them quickly in your data file
• Window
  • Use this menu to switch between open windows in the current SPSS session
• Help
  • Access Help including general topics, tutorial, statistics coach, and a syntax guide
You try it…
1. Open SPSS on your computer
2. Download one of the course datasets from Angel: log into Angel, click on ‘Content’, then ‘DATA SETS (Add Health and NHanes) (2011)’, and save to the p:\ drive (your personal storage space, referred to as AFS).
3. Open the file in SPSS…
Data Editor – Data view
• Actual data values are displayed
• Rows represent cases or observations
• Columns represent variables (different items of information you collect on each case)
• Each cell, at the intersection of a row and a column, holds the value entered for that case on that variable.
You try it…
Click on ‘Variable View’ tab at the bottom of Data Editor…
Data Editor – Variable view
• Displays variable definition information, including:
  • Variable and value labels
  • Data type
  • Measurement scale
  • User-defined missing values
  • Number of decimal places, number of digits
• Section headings have been added to simplify navigation
Data Editor – Variable view
• Name
  • Enter up to eight characters
  • Must begin with a letter
• Type
  • Click once in this column to get a white box with a gray square at the right.
  • Click on the gray square and select the data format
  • Formats include numeric, dollar, and string (rare)
• Width
  • Enter a number to change the text field width
• Decimals
  • Assign the number of decimal places by entering a number or using the arrows in the box
Data Editor – Variable view
• Labels
  • Important to label variables!
  • This is the descriptive label of the variable
  • Can be up to 256 characters
• Values
  • Enter value labels for categorical variables
  • Click once in the column to get a white box with a gray square on the right
  • Click on the gray square to open the Value Labels dialog box
Data Editor – Variable view
• Missing
  • Specify codes for missing data
  • Examples include “98=Don’t Know” or “99=Not Applicable”
  • Click once in this column to get a white box with a gray square on the right
  • Click on the gray square to open the Missing Values dialog box
  • Enter all the missing data values
  • Up to three missing data value codes can be entered for a variable
  • A range of values would count as two value codes
• Columns
  • Enter a number to change the column width
Data Editor – Variable view
• Align
  • Click once in this column to get a white box with a gray square on the right
  • Click on the gray square to choose Right, Left, or Center to align text in the columns
• Measure
  • Click once in this column to get a white box with a gray square on the right
  • Click on the gray square to choose the appropriate level of measurement
  • Scale is used for continuously valued variables
  • Ordinal is used for categorical variables that have ordering
  • Nominal is used for categorical variables that do not have ordering
You try it…
Open a blank PASW data file…
You try it…
1. Enter the following data in PASW:
You try it…
2. Label Variables:
   ID=identification number
   Gender=Gender
   Reading=scores on reading test
   PreMath=scores on math pre-test
   PostMath=scores on math post-test
   Readlev=reading level group
3. Label variable values for categorical variables:
   Gender: 1=female, 0=male
   Readlev: 0=low group, 1=high group
You try it…
4. Compute new variable:
   ‘Change’ is defined as “PostMath-PreMath” scores…
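For readers who want to check the arithmetic outside SPSS, the Compute step can be sketched in a few lines of Python. The scores below are made up for illustration (the slide's data table was an image and is not reproduced here), and the variable names are assumptions that mirror the labels above.

```python
from statistics import mean, stdev

# Hypothetical scores, NOT the course data.
pre_math = [52, 61, 47, 70, 66]
post_math = [58, 65, 55, 72, 75]

# Change = PostMath - PreMath, computed case by case.
change = [post - pre for pre, post in zip(pre_math, post_math)]

change_mean = mean(change)
change_sd = stdev(change)  # sample standard deviation (divides by N - 1)

print(change, change_mean, round(change_sd, 3))
```

The same mean and standard deviation are what a descriptive statistics request for ‘Change’ would report for these five cases.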
Viewer
• Displays the statistics, graphics, or output from your work in SPSS
• Can have more than one Output Viewer file open at the same time
  • Use caution when multiple windows are open: SPSS will save in whichever Viewer is active.
  • The active Output Viewer is called the designated file and will have a red exclamation point at the bottom of the window.
  • You can change the designation by clicking on the red exclamation point in the toolbar
Viewer
• Output includes charts, Pivot tables, Text output, Titles, and Notes
• This file type has the extension .spv (.spo in earlier versions)
• Opens after you run the first analysis
• The window is split into two parts, or panes
• The left side is the Outline pane, which indicates the items contained in the Contents pane to the right
• The right side is the Contents pane or the Output pane
• To maneuver around, either click on the item in the Outline pane to see it on the right or scroll down in the Contents pane to get to an item.
You try it…
Calculate descriptive statistics (mean, standard deviation) for variable ‘Change’:
You try it…
Generate a scatterplot displaying the relationship between PreMath and PostMath scores for
all students…
You try it…
Save the Data Editor and the Viewer files as Educ.sav and Educ.spv (you can save to either
the p:\ drive or to your flash drive)…
Workshop Datasets
Add Health
• A subset of the National Longitudinal Study of Adolescent Health (Add Health)
• Goal of program: data that provides “…opportunities to study how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood.”
• Data from Wave 3 for 18-26 year old respondents.
• Variables containing information on social, economic, psychological and physical well-being for individuals.
• Information on family, neighborhood, community, and relationships.
Add Health
• Sample size: 4,882 people
• Because some groups were under-sampled and some were over-sampled, a weighting variable was constructed. When it is used in the statistical models, it allows us to make inferences from the results of the dataset to the population.
• We must specify the weighting variable in SPSS.
• The weighting variable: DATAWEIGHT
• The weighting variable can be turned on and off in SPSS through Data > Weight Cases, selecting DATAWEIGHT.
• Whether the weighting variable is in use is shown in the lower right-hand corner of the dataset window in SPSS; when it is in use, Weight On is displayed.
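To see why turning the weight on changes results, here is a minimal Python sketch of a weighted mean. The ages and weights are invented for illustration and are not taken from the Add Health file; `weighted_mean` is a hypothetical helper, not an SPSS function.

```python
def weighted_mean(values, weights):
    """Mean of values where each case counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical ages and case weights (stand-ins for a variable and DATAWEIGHT).
ages = [18, 20, 22, 26]
dataweight = [2.0, 1.0, 1.0, 0.5]

unweighted = sum(ages) / len(ages)
weighted = weighted_mean(ages, dataweight)

print(unweighted, round(weighted, 2))
```

Cases from under-sampled groups carry larger weights, so they pull the weighted estimate toward their values.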
You try it…
1. Download the Add Health dataset from Angel: log into Angel, click on ‘Content’, then ‘DATA SETS (Add Health and NHanes) (2011)’, and save to the p:\ drive (your personal storage space, referred to as AFS).
2. Open the file in SPSS.
3. Apply the weighting variable DATAWEIGHT…
NHanes
• National Health and Nutrition Examination Survey.
• Goal of program: “…program of studies designed to assess the health and nutritional status of adults and children in the United States…” conducted by the National Center for Health Statistics (NCHS), with a changing focus on a variety of health and nutrition measurements designed to meet current and emerging concerns.
• This dataset contains data collected from 10,149 adults and children, 2007-2008.
• Data were collected using personal interviews and physical examinations.
• All but the very young provided blood samples.
NHanes
• Variables contain information on social, economic, psychological and physical well-being for individuals, with a focus on health.
• Older subjects have more extensive physical examination data.
• There are two weighting variables for this dataset: ALL13PLWT and EX13PLWT.
• Use EX13PLWT for analyses involving examination data.
• Use ALL13PLWT for all other analyses.
• These weight variables set the weight for all subjects younger than 13 years old to zero, hence all analyses will only include people 13 years old or older (even though you will find younger people in the dataset).
You try it…
1. Download the NHanes dataset from Angel: log into Angel, click on ‘Content’, then ‘DATA SETS (Add Health and NHanes) (2011)’, and save to the p:\ drive (your personal storage space, referred to as AFS).
2. Open the file in SPSS.
3. Apply the weighting variable ALL13PLWT…
Introduction to Statistics
Why Statistics?
• We never truly know the ‘exact’ answer to anything.
  • There is always some measurement error
  • Some processes may have more measurement error than others
• We cannot often collect data from a whole population
  • Too expensive
  • Unrealistic to survey all of the population
  • Too time-consuming
• Instead of gathering data from the whole population, we use a sample.
  • A sample is a subset of the population
  • Measurements made on a sample are subsets of measurements that could have been made on the population
Why Statistics?
Questions we can ask using numerical data can include those such as:
• Do two groups of students have different average test scores?
• Does increased calcium intake reduce blood pressure?
• Do stock prices (when adjusted for inflation) show only random variation?
Research Paradigms
Classification by the whys of research
• Basic (Pure) Research
  • Goal – advance theory
  • Setting – controlled laboratory
• Applied Research
  • Goal – test theories; develop and test research hypotheses
  • Setting – less controlled than laboratory
• Action Research
  • Goal – improve a teacher’s professional practice; provide understanding to teacher
  • Setting – the context of application
Research Paradigms
Classification by measurement procedure
• Qualitative research
  • Goal – understanding of individuals & events in their natural states
  • Data – case study, historical research, ethnographic research
  • Results of a qualitative study are not usually generalizable to other cases.
• Quantitative research
  • Goal – objective explanation of cause & effect relationships & determining whether the effects can be manipulated.
  • Data – numerical
  • Findings are usually generalizable to other cases.
• Mixed Methods research – combination of the two
Research Paradigms
Classification by intervention
• Experimental Research
  • Researchers plan an intervention and study its effects on individuals or groups of individuals.
  • The researchers can manipulate the intervention.
  • Study of cause and effect.
• Nonexperimental Research
  • Researchers do not manipulate intervention or no intervention is implemented.
  • Two kinds of nonexperimental research
    • Causal comparative – intervention is not manipulated, but it is implemented & cause and effect is studied.
    • Descriptive – no intervention is implemented, there is no interest in study of cause and effect, only description.
Some basic concepts about sampling
• Population: The entire collection of events in which we are interested
• Sample: A subset of the population
• Random sample: Each possible sample of a given size has an equal probability (chance) of being selected
• External Validity: A ‘measure’ of the extent to which the sample accurately reflects the entire population; generalizability
• Random Assignment: A method of assigning members of a sample to either a treatment or control group
Sampling
• The purpose of sampling is to achieve generalizability: the ability to study a subset of the population we are interested in and draw conclusions about the whole population.
• To ensure a high level of generalizability, the sample should be a good representation of the population.
• Sampling frame (not always possible to create):
  • List of individuals from which a sample is actually selected
  • Ideally, the frame should list every individual in the population
  • A frame that leaves out part of the population is a common source of under-coverage
• Make sure that the sample represents the population you intend to study.
Sampling
• Biased samples can result from:
  • ‘Opportunistic’ samples or ‘substitutions’
  • Non-response
  • Faulty sampling frame
• Some strategies to help prevent biased sampling:
  • Incentives
  • Follow ups for missing data
  • Novel data collection (computer vs. pen/paper)
  • Do not rely on volunteers
• If some sampled individuals do not provide data, the results using the data provided could be biased.
Creating a representative sample
• Simple random sample
  • Every sample has the same probability of being chosen.
  • For example, if the population consists of 100 people and the sample will consist of 20 people, every one of the possible samples of 20 people in the population has an equal chance of being chosen.
• Systematic sample
  • Every kth person is chosen
  • k = number of people in population divided by number of people in sample.
  • Each individual still has an equal probability of being chosen.
  • The list of the population must be randomly ordered.
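The two schemes above can be sketched in Python using the slide's 100-person, 20-person example. The labels 1 to 100, the fixed seed, and the random starting position for the systematic sample are illustrative assumptions.

```python
import random

population = list(range(1, 101))  # 100 people, labeled 1..100
sample_size = 20

rng = random.Random(2011)  # fixed seed so the sketch is repeatable

# Simple random sample: every possible set of 20 people is equally likely.
simple_sample = rng.sample(population, sample_size)

# Systematic sample: randomly order the list, then take every kth person,
# starting from a random position within the first k entries.
k = len(population) // sample_size  # 100 / 20 = 5
shuffled = population[:]
rng.shuffle(shuffled)
start = rng.randrange(k)
systematic_sample = shuffled[start::k]

print(sorted(simple_sample))
print(sorted(systematic_sample))
```

Shuffling first reflects the requirement that the population list be randomly ordered; without it, any periodic pattern in the list could bias the systematic sample.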
Creating a representative sample
• Stratified sample
  • A random sample is taken from each stratum (group).
  • The number sampled from each stratum is determined by the proportion of the population that falls in that stratum and the sample size required.
• Cluster sampling:
  • A cluster is a group whose members share common characteristics.
  • Groups of people, not individuals, are selected.
• In contrast: Incidental (convenience) sampling:
  • Sample is not drawn by researcher.
  • Not necessarily random.
  • Common in action research.
  • More common than you think!
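The proportional allocation described for stratified samples can be sketched as follows. The strata (60 freshmen, 40 seniors) and the helper name `stratified_sample` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import random

def stratified_sample(strata, sample_size, rng):
    """Draw a random sample from each stratum, sized by its population share."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        n_stratum = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical population: 60 freshmen and 40 seniors.
strata = {
    "freshman": [f"F{i}" for i in range(60)],
    "senior": [f"S{i}" for i in range(40)],
}
chosen = stratified_sample(strata, sample_size=10, rng=random.Random(0))

print({name: len(members) for name, members in chosen.items()})
```

With a sample of 10, the 60/40 split yields 6 freshmen and 4 seniors. When the shares do not divide evenly, rounding can make the total differ slightly from the requested sample size, so real designs adjust the allocation by hand.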
Gathering Data



Surveys administered to a sample
Data are used to make inferences about the population
Designs for gathering data

Experimental




Factorial Experimental



Completely randomized design
Randomized block design
Latin square
Designs that look at one factor at a time
Designs that can identify interactions between multiple factors
Observational study
Data Collection
• Data that has already been collected
  • HSB (High School and Beyond)
  • NELS (National Education Longitudinal Study)
  • PISA (Program for International Student Assessment)
  • See http://nces.ed.gov/surveys/ for more data sets
• Data that you collect
  • Pilot study surveys
  • Generate sample for data collection from the appropriate population
  • Collect data
    • Follow up on missing data
    • Offer incentives to study participants
  • Enter and clean data
• Pros and cons of each approach?
Planning Your Research
Research Plan
• Describes the methods and procedures you will use to carry out your research.
• Advantages of a research plan
  • Forces you to think through every aspect of the research.
  • Provides a way that others can critique your research.
  • Helps guide the research.
Parts of a Research Plan
• Introduction
  • Sets the stage for the rest of the document.
• Statement of the topic
  • Be clear and concise
  • Provides direction
• Literature review
  • Sets the context for your topic
  • Show what’s been done before, compare and contrast works, and show strengths and weaknesses
• Hypothesis statement
  • Each hypothesis should have an underlying explanation for its prediction
  • This is based on the literature.
Parts of a Research Plan
• Method – detailed description
  • Research participants
  • Instruments – describe how they will measure your variables in the study
  • Materials/Apparatus
  • Design – general strategy for conducting the data gathering
  • Procedure
    • Include sampling technique.
  • Assumptions & limitations
Parts of a Research Plan
• Data Analysis
  • Describes the statistical techniques to be used.
  • Is the analysis impossible?
  • Hypothesis determines the appropriate statistical technique to be used.
• Time Schedule
  • Something will always go wrong.
  • Everything takes twice as long as you think.
• The hypotheses (null & alternative) that you have formed provide the direction of the research plan
  • Encompass the research questions
  • We will talk about hypotheses in more detail in a later module
Validity
Validity
• One of two important characteristics of a test, assessment, or measure.
• A measure of the accuracy of inferences and interpretations that can be made using measurements.
• A test or measurement is valid to the extent to which it lives up to the claims that the researcher has made for it.
• Different types of claims are appropriate for different measurement situations, so different types of validity need to be addressed.
Types of Validity
• Content validity
  • The extent that the test items represent the content that the test is designed to measure.
  • Determined systematically by comparing the test content with the course content (or reference content).
  • Important in achievement testing and tests of skill and proficiency.
  • Particularly important in selecting tests to use in experiments/different instructional methods/programs.
• Concurrent validity
  • The extent that individuals’ scores on the test correlate with their scores on another test administered at the same time or within a short interval of time.
  • Carried out to locate simple, easy-to-use tests in place of complex, expensive tests.
  • Calculate correlation between scores on test A and test B (given within a short time).
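The correlation mentioned for concurrent validity is the Pearson correlation. A minimal sketch, assuming made-up scores for five people on two tests given in the same week (`pearson_r` is a hypothetical helper written out here so each step is visible):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores on test A and test B.
test_a = [10, 12, 15, 18, 20]
test_b = [11, 14, 14, 19, 22]

r = pearson_r(test_a, test_b)
print(round(r, 3))
```

A value of r close to 1 would suggest the simpler test ranks people much as the more complex test does.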
Types of Validity
• Predictive validity
  • The extent that scores on the test predict individuals’ subsequent performance on a criterion measure.
  • To measure: administer a test to a group, wait until behavior has occurred, and determine degree of relationship between test score and occurrence of behavior.
• Construct validity
  • The extent that a test can be shown to measure a particular hypothetical construct.
  • To measure, begin with hypotheses about characteristics of people who would obtain high scores as opposed to the characteristics of those who would have low scores.
  • See how well the test conforms to these hypotheses.
  • Most tests use multiple sources of evidence.
Types of Validity
• Face validity
  • The extent that the test appears to measure what it purports to measure.
  • Claims of this validity are not too convincing.
  • Different from content validity because it is determined subjectively.
Reliability
Reliability
• Refers to the consistency of the measurement we obtain for people on a test.
• The notation is r (same as the Pearson Correlation coefficient).
• r ranges from 0.0 to 1.0, with 1.0 being the highest possible reliability.
• In physical sciences, synonymous with ‘accuracy’
• In social sciences, reliability is a characteristic of a survey, questionnaire, assessment.
Reliability
Standard Error of Measurement (SEM)
• Tells you what range the True Scores should fall within from the Observed Scores.
• The calculation of SEM:
  $$SEM = s\sqrt{1 - \text{Reliability}}$$
  where s = standard deviation of the test or instrument
• To use, calculate SEM for the test, calculate the mean of the observed scores, and use the empirical rule for the normal distribution (i.e., 68% of true scores should be within 1 SEM of the mean).
Reliability
Factors affecting reliability – Psychological Measures
• Heterogeneity of the subjects tested – the more heterogeneous the subjects, the more reliable the test.
• Test length – the longer the test, the more reliable.
• The difficulty of the items – the more mixed difficulty on the test, the higher the reliability.
• Quality of items – the higher the quality, the higher the reliability.
• How high should the reliability be?
  • Depends on what you’re using the test results for (what decisions will be made about the students?).
  • Usually a minimum of 0.5, with a reliability of 0.9 for high-stakes decisions.
Reliability
How high should the reliability be?
• Depends on what you’re using the measurements for
  • What decisions will be made?
  • High-stakes?
• Rule of thumb for psychological measures: usually a minimum of 0.7, with a reliability of at least 0.9 for high-stakes decisions.
• Rule of thumb for physical science measures: likely more demanding
Levels of Measurement
Measurement Terminology
• Variable: A property of an object or event that can take on different values.
  • Examples: age, weight, height, IQ, math achievement
• Independent vs. Dependent variables
  • An independent variable, according to theory, has a causal influence on the dependent variable
    • Also known as: predictor, explanatory variable
  • A dependent variable is of greatest substantive interest to the researcher: the variable with real world implications
    • Also known as: predicted variable, outcome, response variable
Measurement Terminology
• The variable is also defined by the nature of the measurement
  • Discrete variable
    • Measures that are made by placing observations into mutually exclusive and exhaustive categories
      • Ordinal variable: ordered categories
      • Nominal variable: unordered categories
  • Continuous variable
    • Measures that are made by positioning observations on a linear continuum.
      • Interval variable: continuum does not have an absolute zero
      • Ratio variable: continuum has an absolute zero
      • Interval and ratio variables can be used interchangeably in statistical calculations when a continuous-valued variable is required
Measurement Terminology
• Nominal (categorical)
  • A set of labels applied to groups composed of individuals with similar characteristics.
  • Typically the labels are exhaustive (cover everything) and mutually exclusive (don’t overlap).
  • Characterized by mathematical functions of = and ≠
  • Example: Department: CEP, EAD, TE, KIN.
• Binary (also known as dichotomous)
  • Special nominal variable consisting of two groups
  • Example: Gender: Male, Female
Measurement Terminology
• Ordinal (categorical)
  • A set of labels applied to groups composed of individuals with similar characteristics
  • The labels indicate more or less of a quality
  • Labels can be rank ordered.
  • Characterized by mathematical functions of =, ≠, <, and >
  • Example: Liking of math: high, medium, low
• Interval/Ratio (continuous):
  • Numeric values assigned to individuals
  • The measures can be characterized by mathematical functions of =, ≠, <, >, +, and –.
  • Intervals are assumed to be equal across the range of the measures
  • Example: IQ measure.
  • In SPSS: Scale
You try it…
Assign a measurement level to each of the variables (except ID) in Educ.sav…
Descriptive Statistics
Statistics Terminology
• Descriptive Statistic (also known as a sample statistic)
  • A number used to describe some aspect of a sample (data set).
  • May refer to the location, dispersion, symmetry, or flatness/peakedness of the distribution of data for a particular variable
  • Or may refer to the association/relationship between two variables
  • Descriptive statistics are used to estimate parameters
• Parameters
  • Numbers used to describe some aspect of the population
  • Typically not observable
• Inferential Statistic
  • A number used to generalize from observations in a sample data set to a population that the sample represents.
Descriptives
• Data summaries can be accomplished with graphical or numerical methods
• Graphical methods
  • Pie charts
  • Bar charts
  • Boxplots
  • Scatterplots
  • Stem-and-leaf plots
  • Frequency histograms
  • Dotplots
• Numerical methods
  • Frequency table (Contingency table)
  • Measures of central tendency
  • Measures of variance
  • Measures of association
Graphs for Categorical Data
Pie Chart
You try it…
Create a pie chart for where the
respondent lives (H3HR2)
Graphs for Categorical Data
Bar Chart
You try it…
Create a bar chart for where the respondent lives
(H3HR2)
Numerical Summaries for Categorical Data
Frequency Table
You try it…
Create a frequency table for where the
respondent lives
Numerical Summaries for Categorical Data
Crosstabs Table – for 2 variables
You try it…
Create a crosstabs for gender and where the
respondent lives
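What the frequency table and the crosstabs table count can be sketched in Python. The six cases and their category labels are invented for illustration; they are not values from H3HR2 or the course datasets.

```python
from collections import Counter

# Hypothetical cases: (gender, where the respondent lives).
cases = [
    ("female", "parents' home"), ("male", "own place"),
    ("female", "own place"), ("male", "parents' home"),
    ("female", "own place"), ("male", "own place"),
]

# Frequency table: counts for one categorical variable.
residence_counts = Counter(residence for _, residence in cases)

# Crosstab: counts for each combination of two categorical variables.
crosstab = Counter(cases)

for residence, count in residence_counts.items():
    print(residence, count)
for (gender, residence), count in sorted(crosstab.items()):
    print(gender, residence, count)
```

The crosstab cells in each column sum to the matching entry in the frequency table, which is a handy consistency check on SPSS output as well.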
Graphs for Continuous Data
Histogram
You try it…
Create a histogram for age at first paying
job
Measures of Central Tendency
• For population: parameters (e.g., $\mu$)
• For sample: sample statistics (e.g., $\bar{X}$)
• Measures of central tendency:
  • Mode (Mo) – for both categorical and continuous data
    • Most frequently occurring value
  • Median (Me) – for ordinal or continuous data
    • Middle value of an ordered listing of values
  • Arithmetic Mean – for continuous and ordinal data
    $$\mu = \frac{\sum_{i=1}^{N} X_i}{N}, \qquad \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
    where N is the population size and n is the sample size
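The three measures can be computed with Python's standard `statistics` module. The ages below are hypothetical values standing in for a variable such as age at first paying job.

```python
from statistics import mean, median, mode

# Hypothetical ages at first paying job.
ages = [14, 15, 15, 16, 16, 16, 17, 18, 21]

age_mode = mode(ages)      # most frequently occurring value
age_median = median(ages)  # middle value of the ordered list
age_mean = mean(ages)      # sum of the values divided by how many there are

print(age_mode, age_median, round(age_mean, 2))
```

The single large value (21) pulls the mean above the median and mode, a small preview of how skew separates the three measures.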
Measures of Central Tendency
• A measure of ‘Skewness’ will indicate the extent and the degree of asymmetry about the mean.
[Figure: two skewed distributions with the mode, median, and mean marked. Positive skew (skewness > 0): typically mode < median < mean. Negative skew (skewness < 0): typically mean < median < mode.]
General rule of thumb: you can assume normality if -1 < skewness < 1
From: Huck, S. (2004) Reading Statistics & Research (4th ed.). Allyn & Bacon.
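The slides do not give a skewness formula. As an assumption for illustration, the sketch below uses the simple moment-based version (third central moment divided by the 1.5 power of the second); SPSS reports a sample-adjusted version, so its values will differ somewhat on small samples.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (an assumed, unadjusted formula)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data sets.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]    # long right tail
left_tailed = [-v for v in right_tailed]    # mirror image
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("sym", symmetric)]:
    print(name, round(skewness(data), 3))
```

The sign follows the direction of the long tail: positive for the right-tailed data, negative for its mirror image, and zero for the symmetric set.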
Measures of Variability
• For sample: sample statistic (e.g., $s^2$)
• For population: parameter (e.g., $\sigma^2$)
• Measures of variability
  • Range – for ordinal and continuous data
    • The difference between the two most extreme data points (maximum – minimum).
    • Sensitive to outliers
  • Interquartile Range (IQR) – for ordinal and continuous data
    • The difference between the 25th and 75th percentiles.
    • Insensitive to the outer 50% of the data
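A short Python sketch of both measures, using hypothetical scores with one deliberate outlier. `statistics.quantiles` uses one of several common percentile conventions, so SPSS may report slightly different quartiles for the same data.

```python
from statistics import quantiles

# Hypothetical scores; 40 is an outlier.
scores = [2, 4, 4, 5, 7, 8, 9, 10, 11, 40]

data_range = max(scores) - min(scores)

# quantiles(..., n=4) returns the three quartile cut points.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

The outlier stretches the range to 38, while the IQR, which ignores the outer 50% of the data, stays small.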
Measures of Variability
• Variance – for continuous data
  • The average squared deviation of scores from the mean
  • Sensitive to outliers
  $$\sigma_X^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \qquad s_X^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$
Measures of Variability
• Standard deviation – for continuous data
  • Roughly, the typical distance of scores from the mean
  • The square root of the variance
  • Sensitive to outliers
  $$\sigma_X = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}, \qquad s_X = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$$
You try it…
1. Use SPSS to generate descriptive statistics for ‘age at first paying job’
2. Use SPSS to generate descriptive statistics for where respondent lives (H3HR2)
3. Use SPSS to generate descriptive statistics for ‘gender’
Inferences – Confidence Intervals
Making Inferences
• Parameter estimation and confidence intervals
  • What is the value of the parameter?
• Hypothesis testing
  • Is the parameter equal to a specific value?
Making Inferences – Confidence Intervals
• A 100(1 − α)% level confidence interval is an interval estimator of the population parameter.
• It has the form:
  Lower bound ≤ Population Parameter ≤ Upper bound
• There are formulas for computing the upper and lower bounds from the data the sample provides.
• The confidence level, 100(1 − α)%, is typically a percentage between 90% and 99%
• Example: a choice of α = 0.10 gives a 90% confidence interval.
Making Inferences – Confidence Intervals
• Sample statistics are used as estimates of population parameters
• Confidence intervals are constructed to reflect the uncertainty of the estimate
• The confidence interval is based on knowledge of the sampling distribution of the sample statistic
• All statistics (mean, variance, etc.) have a sampling distribution.
  • This is how we do hypothesis testing
  • Does the sample come from a population that has a parameter equal to a particular value?
Sampling Distribution – The Mean
• Central Limit Theorem – sampling distribution of the mean
  • When the sample size n is large, the sampling distribution of the mean will be approximately normal.
  • When the population is normally distributed, the sampling distribution of the mean is exactly normal for any sample size.
  • The mean and standard deviation (standard error) of the sampling distribution of the mean are:
    $$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
  • Let $\bar{x}$ = sample mean of n measurements. Then the mean and the standard deviation (standard error) of the sampling distribution of the mean can be estimated:
    $$\hat{\mu}_{\bar{x}} = \bar{x}, \qquad \hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$$
    where s is the sample standard deviation
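A small simulation can make the Central Limit Theorem concrete. This sketch assumes a made-up, strongly skewed parent population (exponential with mean 1) and repeatedly draws samples of n = 50; the seed, population size, and number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical skewed parent population.
population = [rng.expovariate(1.0) for _ in range(20000)]
mu = mean(population)
sigma = pstdev(population)

# Draw many samples of size n and record each sample mean.
n = 50
sample_means = [mean(rng.sample(population, n)) for _ in range(2000)]

theoretical_se = sigma / n ** 0.5   # sigma / sqrt(n)
observed_se = pstdev(sample_means)  # spread of the simulated sample means

print(round(mu, 3), round(mean(sample_means), 3))
print(round(theoretical_se, 3), round(observed_se, 3))
```

Even though the parent population is far from normal, the sample means center on the population mean and their spread comes out close to sigma divided by the square root of n.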
Sampling Distribution – The Mean
• The sampling distribution gives a probability model for the distribution of values of the statistic (for example, the mean) in repeated sampling of a population having a particular parameter value.
[Figure: parent population and the sampling distribution of the mean]
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
Sampling Distribution – The Mean
• When the population variance is unknown, the sampling distribution for the mean is a t-distribution
• Use of this distribution takes account of the sampling error in the variance estimate
[Figure: z distribution compared with the t(10) distribution]
Confidence Interval for the Mean, σ known
• To construct a 100(1 − α)% confidence interval for the population mean for any confidence coefficient (1 − α):
  $$\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}, \quad \text{where } \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
• The confidence interval for μ is: $\left(\bar{X} - z_{\alpha/2}\,\sigma_{\bar{X}},\ \bar{X} + z_{\alpha/2}\,\sigma_{\bar{X}}\right)$
• $z_{\alpha/2}$ is the value of z having a tail area of α/2 to its right.
• Common values:
  α       z_{α/2}
  .10     1.645
  .05     1.96
  .01     2.575
  .001    3.29
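The interval above can be computed directly, with the critical value looked up from the standard normal distribution instead of a table. The numbers (a mean of 100 from n = 36 cases, known σ = 15) are hypothetical, and `z_confidence_interval` is an illustrative helper name.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Interval for the mean when the population SD (sigma) is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z with tail area alpha/2 to its right
    standard_error = sigma / n ** 0.5
    return sample_mean - z * standard_error, sample_mean + z * standard_error

# Hypothetical numbers: standard error = 15 / sqrt(36) = 2.5.
lower, upper = z_confidence_interval(100, sigma=15, n=36, alpha=0.05)

print(round(lower, 2), round(upper, 2))
```

Passing a smaller `alpha` widens the interval, which is the price of a higher confidence level.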
Confidence Interval for the Mean,  unknown


To construct a 100(1 - α)% confidence interval for the
population mean for any confidence coefficient (1 - α):
$\bar{X} \pm t_{\alpha/2,\,df=n-1}\, s_{\bar{X}}$, where $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
The confidence interval for μ is:
$\left(\bar{X} - t_{\alpha/2,\,df=n-1}\, s_{\bar{X}},\ \bar{X} + t_{\alpha/2,\,df=n-1}\, s_{\bar{X}}\right)$
$t_{\alpha/2,\,df=n-1}$ is the value of t having a tail area of α/2 to its
right.
Critical values change with sample size, so we need to use
statistical tables.
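As a rough sketch of the calculation, the interval can be computed once the critical value has been looked up. The data below are hypothetical (they are not the Peabody scores), and the critical value 2.262 is the tabled two-tailed value for α = .05 with 9 degrees of freedom; it is passed in by hand because Python's standard library has no t quantile function.

```python
import math
import statistics

def t_confidence_interval(data, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    t_crit is t_{alpha/2, df = n - 1}, taken from a t table or from software.
    """
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)   # s / sqrt(n)
    return mean - t_crit * se, mean + t_crit * se

scores = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]          # hypothetical data, n = 10
low, high = t_confidence_interval(scores, t_crit=2.262)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```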
SROP 2011 94
Confidence Interval for the Mean, σ unknown
SROP 2011 95
You try it…
Using descriptive statistics, calculate the confidence interval for the mean of the raw
Peabody scores (raw_ah)…
SROP 2011 96
Inferences – Hypothesis Testing
SROP 2011 97
Making Inferences – Hypotheses
What’s a Hypothesis?
 An ‘educated guess’
 Based on theory & previous research
 Different from research questions or objectives
 Very specific
 Drives the research
SROP 2011 98
Hypothesis Tests

A hypothesis test consists of two hypotheses:





Null Hypothesis: H0: status quo
Alternative Hypothesis: HA: research hypothesis
We wish to test the null hypothesis versus the alternative
hypothesis.
The set-up is not symmetric: it takes strong evidence to
reject the null in favor of the alternative hypothesis.
The alternative hypothesis can be:


Two-tailed (testing for change either way)
One-tailed



Upper tail (testing for an increase)
Lower tail (testing for a decrease)
The decision about the form of the alternative hypothesis
must be made before you look at the data (a priori).
SROP 2011 99
Hypothesis Tests





A statistical test (at a selected level of significance,
usually labeled α) will reject the null hypothesis if the test
statistic is in the critical/rejection region.
The level of significance is the type I error rate of the test
procedure.
Type I error is the error of rejecting the null hypothesis
when it is actually true.
Example: a test procedure with .05 level of significance
would reject the null in 5% of the possible samples even
though the null is true.
Example: H0: μ = 15 versus HA: μ ≠ 15, a type I error
would be to claim that the mean age at first paying job is
not equal to 15 when in reality it is equal to 15.
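The 5% claim can be illustrated by simulation. In this sketch the null hypothesis (mean age 15) is true by construction; the population standard deviation of 2, the sample size of 25, and the use of a z-test with known sigma are assumptions made only to keep the example short.

```python
import math
import random
from statistics import NormalDist, fmean

def type_i_error_rate(mu0, sigma, n, alpha, replications, seed=0):
    """Share of samples, drawn with H0 true, in which a two-tailed z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(replications):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / replications

rate = type_i_error_rate(mu0=15, sigma=2, n=25, alpha=0.05, replications=5000)
print(f"Proportion of true nulls rejected: {rate:.3f}")
```

The proportion of rejections should land near the chosen significance level, which is what "type I error rate" means.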
SROP 2011 100
Hypothesis Tests







Typical values of significance (α) are .05, .10, .02, and .01, although the
choice is often field-dependent.
When our data rejects H0 at a significance level of α, then we say
that the data is significant at level α.
When using statistical software, testing is often approached via the
p-value, which we then compare to α.
The p-value gives more information than simply testing at a fixed
significance level.
The p-value is the probability, when H0 is true, of data at least as extreme
(in the direction of HA) as the data observed; it tells us how far out the
data is in the tail(s) of the H0 distribution.
Example: p-value = .01: data at the 1% cut, data is unlikely if H0 is
true.
Example: p-value = .45: data at the 45% cut, data is fairly likely if H0
is true.
SROP 2011 101
Hypothesis Tests




Small p-values are evidence against H0.
The smaller the p-value, the more evidence against the
null hypothesis.
To tie this together with significance level, if the data
gives a p-value ≤ α, then the data would reject H0 at
significance level α.
Example: p-value = .06 would not reject H0 at the α =
0.05 level.
Another way to think about it is that the data isn’t in the 5% tail, it
hasn’t passed the critical value for α = 5%.
Data is not extreme enough to reject null.
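The decision rule can be sketched in a few lines. This example assumes a two-tailed z-test so that the p-value can be computed from the standard normal distribution in Python's standard library; the function names and the z values are illustrative.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability, under H0, of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def reject_null(p_value, alpha=0.05):
    """Decision rule: reject H0 when the p-value is at or below alpha."""
    return p_value <= alpha

for z in (1.50, 1.96, 2.58):
    p = two_tailed_p_value(z)
    print(f"z = {z:.2f}   p = {p:.4f}   reject at alpha = .05: {reject_null(p)}")
```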
SROP 2011 102
Hypothesis Tests

What happens if assumptions of the statistical test are
violated?


The p-value reported may not be the actual p-value, so the
conclusions from the statistical test may be invalid.
It is hard to know whether the actual p-value is larger or smaller
than the one reported.
SROP 2011 103
Inferences – t-tests
SROP 2011 104
Hypothesis Testing – The Mean


Format of hypotheses (two-tailed):
• Null Hypothesis: $H_0: \mu_x = \mu_0$
• Alternative hypothesis: $H_A: \mu_x \neq \mu_0$
Sampling distribution:
z distribution if population variance is known
t distribution if population variance is unknown.
Test statistic for known population variance (z-test for
mean):
$z = \dfrac{\bar{X} - \mu_0}{\sigma_{\bar{X}}}$
Test statistic for unknown population variance (t-test for
mean):
$t = \dfrac{\bar{X} - \mu_0}{s_{\bar{X}}}$
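A minimal sketch of the t-test for a single mean follows. The data and the hypothesized mean of 5 are hypothetical; the critical value 2.262 is the tabled two-tailed value for α = .05 with 9 degrees of freedom and is entered by hand.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (t, df) for H0: the population mean equals mu0, variance unknown."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)   # s / sqrt(n)
    return (statistics.fmean(data) - mu0) / se, n - 1

scores = [4, 8, 6, 5, 3, 7, 9, 5, 6, 7]          # hypothetical data
t, df = one_sample_t(scores, mu0=5)
t_crit = 2.262                                   # table value: alpha = .05, two-tailed, df = 9
decision = "reject H0" if abs(t) > t_crit else "retain H0"
print(f"t({df}) = {t:.3f}: {decision}")
```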
SROP 2011 105
Hypothesis Testing – The Mean
Assumptions of the tests
 Normality of the population for smaller sample sizes
SROP 2011 106
You try it…
Test the null hypothesis that the mean of the raw Peabody test scores is equal to 67.
SROP 2011 107
Hypothesis Testing – Equality of Two Means



Means must be from two independent groups
Format of hypotheses (two-tailed):
• Null Hypothesis: $H_0: \mu_1 = \mu_2$
• Alternative hypothesis: $H_A: \mu_1 \neq \mu_2$
Sampling distribution:
t distribution with mean of zero and standard error of:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\ \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Independent samples t-test statistic:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\ \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
$df = n_1 + n_2 - 2$
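The pooled-variance calculation can be sketched as follows. The two small groups are hypothetical, and the function names are chosen for this example only; SPSS reports the same statistic in its "equal variances assumed" row.

```python
import math
import statistics

def pooled_variance(x1, x2):
    """Weighted average of the two sample variances."""
    n1, n2 = len(x1), len(x2)
    return ((n1 - 1) * statistics.variance(x1)
            + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)

def independent_t(x1, x2):
    """Return (t, df) for the independent samples t-test with pooled variance."""
    n1, n2 = len(x1), len(x2)
    se = math.sqrt(pooled_variance(x1, x2)) * math.sqrt(1 / n1 + 1 / n2)
    return (statistics.fmean(x1) - statistics.fmean(x2)) / se, n1 + n2 - 2

group1 = [1, 2, 3, 4, 5]      # hypothetical scores
group2 = [2, 4, 6, 8, 10]     # hypothetical scores
t, df = independent_t(group1, group2)
print(f"t({df}) = {t:.3f}")
```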
SROP 2011 108
Hypothesis Testing – Equality of Two Means
Assumptions of the test:



Normality of the sampling distribution of the means. For equal
sample sizes, violating this assumption has only a small impact on
the difference between the assumed and true Type I error rate
provided the distribution shapes are similar or are both symmetric. If
the distributions are skewed, then serious problems arise unless the
variances are similar.
Homogeneity of variance. For equal sample sizes, violating this
assumption has only a small impact on the difference between the
assumed and true Type I error rate.
Equality of sample size. When sample sizes are unequal and
variances are non-homogeneous, there are large differences
between the assumed and true Type I error rates
SROP 2011 109
What if the population variances are not equal?

Assess whether the variances are equal using hypothesis
testing (null is that $\sigma_1^2 = \sigma_2^2$).




The null hypothesis is that the two sample variances could have
come from the same population.
This approach is not recommended when the data are not
normally distributed.
SPSS performs Levene’s Test for Equality of Variances.
If the null hypothesis is rejected, you must use an approximate t-test
with corrected degrees of freedom or use a nonparametric test.
SROP 2011 110
Hypothesis Testing – Equality of Two Means
Effect size, standardized mean difference
• Calculated as:
$\dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}}$
Represents the difference between the two means as
standardized, in terms of standard deviation units.
Cohen provided guidelines for interpreting this particular
effect size:
Small = 0.20, Medium = 0.50, Large = 0.80 or greater

These are only guidelines: “These qualitative
adjectives…may not be reasonably descriptive in any
specific area. Thus, what a sociologist may consider a
small effect may be appraised as medium by a clinical
psychologist.” Cohen, 1977, p. 278.
Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. New York: Academic Press.
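The standardized mean difference is simple to compute by hand or in code. This sketch uses two hypothetical groups; the function name `standardized_mean_difference` is an assumption for the example.

```python
import math
import statistics

def standardized_mean_difference(x1, x2):
    """Difference between two means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return (statistics.fmean(x1) - statistics.fmean(x2)) / math.sqrt(pooled_var)

group1 = [1, 2, 3, 4, 5]      # hypothetical scores
group2 = [2, 4, 6, 8, 10]     # hypothetical scores
d = standardized_mean_difference(group1, group2)
print(f"standardized mean difference = {d:.2f}")
```

The sign only reflects which group is listed first; the size of the effect is judged from the absolute value.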
SROP 2011 111
SROP 2011 112
You try it…
Test the null hypothesis that the means of the raw Peabody test scores for boys and girls
are equal.
SROP 2011 113
SROP 2011 114
Hypothesis Testing – Equality of Two Means



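Means must be from two dependent groups
Format of hypotheses (two-tailed):
• Null Hypothesis: $H_0: \mu_1 = \mu_2$
• Alternative hypothesis: $H_A: \mu_1 \neq \mu_2$
Sampling distribution:
t distribution with mean of zero and standard error of:
$s_{\bar{y}_1 - \bar{y}_2} = \sqrt{\dfrac{s_1^2 + s_2^2 - 2 s_1 s_2 r}{n}} = \sqrt{\dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n(n - 1)}}$
where $d_i$ is the difference between the two scores in pair i, r is the correlation between the paired scores, and n is the number of pairs.
Paired samples t-test statistic:
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2 - 2 s_1 s_2 r}{n}}}$
$df = n_{pairs} - 1$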
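A minimal sketch of the paired test, working directly from the difference scores, is shown below. The wave 1 and wave 3 values are hypothetical stand-ins, not the PVTSTD1 and PVTSTD3C variables.

```python
import math
import statistics

def paired_t(x1, x2):
    """Return (t, df) for the paired samples t-test, computed from the differences."""
    if len(x1) != len(x2):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.fmean(diffs) / se, n - 1

wave3 = [12, 14, 10, 13, 13]   # hypothetical later scores
wave1 = [10, 12, 9, 11, 13]    # hypothetical earlier scores for the same people
t, df = paired_t(wave3, wave1)
print(f"t({df}) = {t:.2f}")
```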
SROP 2011 115
You try it…
Test the null hypothesis that the means of the standardized Peabody test scores for waves 1
(PVTSTD1) and 3 (PVTSTD3C) are equal.
SROP 2011 116
Chi-square Goodness of Fit Test
SROP 2011 117
Chi-square goodness of fit test



When we have a single categorical variable, and we want
to determine whether observed classifications are
consistent with a theory, use a chi-square goodness of fit
test
Allows us to compare observed relative frequencies
(percentages) to theory-based relative frequencies using
the hypothesis testing framework.
Our observed statistics are the proportions associated
with each classification, and our null parameters are the
expected proportions for each classification.
SROP 2011 118
Chi-square goodness of fit test
Example:
 We are interested in determining whether a purposive
sample that we have drawn is comparable to the US
population with respect to ethnicity.
 Research question: Is our sample representative of the
proportions of racial groups in the general population?
 The null hypothesis: The observed proportions of
Asians, blacks, Hispanics, and whites in our sample
reflect the proportions of these groups in the general
population.
SROP 2011 119
Chi-square goodness of fit test
Example:
• In this case, frequencies and proportions calculated using
the data we collect are the observed proportions.
• The theory-based null parameters (shown in the bottom
row of the table), obtained from the US Census, are the
expected proportions.
                                 Asian   Black   Hispanic   White
Observed Frequencies               30      50       30       200
Observed Proportions (n_g / N)    .10     .16      .10       .65
Census Proportions                .04     .12      .10       .74
SROP 2011 120
Chi-square goodness of fit test

The chi-square goodness of fit statistic:
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
O is the observed frequency, E is the expected frequency given
the sample size, and k equals the number of
classifications in the table (i.e., the number of cells).
The numerator is the squared difference (deviation)
between the observed frequency and that predicted by
theory.
The denominator weights each squared difference by its expected
value (cells with larger expected values tend to have larger
deviations).
The degrees of freedom are k - 1



SROP 2011 121
Chi-square goodness of fit test



The expected frequencies (E) constitute what we would
expect to be the values of the observed frequencies (O)
if, indeed, our theory was true.
That is, the expected number of cases in each group
should be consistent with π (our theory-based
proportions).
We compute our expected values (E) as:
$E_i = \pi_i N$
The expected value for each cell ($E_i$) should equal the
total number of participants in the study (N) times the
theory-based proportion for that group ($\pi_i$).
SROP 2011 122
Chi-square goodness of fit test


The chi-square statistic tells us how far, on average, the
observed cell frequencies are from the theory-based
expectations.
From our example:
                        Asian     Black    Hispanic    White
Observed Frequencies      30        50        30        200
Expected Frequencies     12.4      37.2      31.0      229.4
Observed – Expected      17.6      12.8      -1.0      -29.4
(O – E)^2               309.76    163.84     1.00      864.36
(O – E)^2 / E            24.98      4.40     0.03       3.77
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} \approx 33.19, \quad df = 3$
SROP 2011 123
Chi-square goodness of fit test



From the table of critical values, the chi-square distribution
with 3 degrees of freedom has a critical value of 7.82 for α = .05.
Our observed chi-square value is far larger than this critical value.
Hence, the observed differences between our sample
and our expected values are extremely unlikely if the null
hypothesis is true (that the vector of observed
probabilities equals the vector of theory-based
probabilities).
Substantively, we conclude that our sample does not
match the population that it was intended to represent.
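The worked example can be reproduced in a few lines. This is a sketch using the observed frequencies and Census proportions from the example; the function name is an assumption, and the critical value 7.82 is the tabled value quoted above rather than something the code computes.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square, df, expected frequencies) for a goodness of fit test."""
    n = sum(observed)
    expected = [p * n for p in expected_props]    # E_i = proportion_i * N
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected

observed = [30, 50, 30, 200]          # Asian, Black, Hispanic, White
census = [0.04, 0.12, 0.10, 0.74]     # theory-based proportions
chi2, df, expected = chi_square_gof(observed, census)

critical = 7.82                       # table value for df = 3, alpha = .05
print("expected:", [round(e, 1) for e in expected])
print(f"chi-square({df}) = {chi2:.2f}, reject H0: {chi2 > critical}")
```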
SROP 2011 124
Chi-square goodness of fit test
Summary of test:
1. Determine which test statistic is required for your problem and
data. The chi-square goodness-of-fit statistic is relevant when you
want to compare the observed frequencies or proportions for a
single categorical variable to the frequencies predicted by a
theory.
2. State your research hypothesis: that the observed frequencies
were not generated by the population described by your theory.
3. State the alternative hypothesis: that the observed proportions
are not equal to the theory-based proportions (i.e., π_observed ≠
π_theory; this is a non-directional test).
4. State the null hypothesis: that the observed proportions are equal
to the theory-based proportions (i.e., π_observed = π_theory; here, π is
the population parameter estimated by p, which is not the p-value
but the observed proportion in each group).
SROP 2011 125
Chi-square goodness of fit test
Summary of test, continued:
5. Compute your observed chi-square value.
6. Determine the critical value for your test based on your
degrees of freedom and desired α level OR determine the
p-value for the observed chi-square value based on its
degrees of freedom.
7. Compare the observed chi-square value to your critical
value OR compare the p-value for the observed chi-square
statistic to your chosen α, and make a decision to reject or
retain your null hypothesis.
8. Make a substantive interpretation of your test results.
SROP 2011 126
You try it…
Run a chi-square goodness of fit test on birth month (what is your null hypothesis?)…
SROP 2011 127
Chi-square Test of Association
SROP 2011 129
Chi-square test of association

Also known as:





Contingency test
Test of independence
This test examines whether two categorical variables are
independent of one another.
If the pattern of frequencies of outcomes in one variable
is not related to the pattern of frequencies in the other
variable, they are independent.
The counts of observations for two categorical variables
can be displayed in a contingency table, or cross-tab
table
SROP 2011 130
You try it…
Create a crosstabs table for gender and ‘Ages 5-12 did not listen’ (H3RA4)…
SROP 2011 131
Chi-square test of association



To determine if the two categorical variables are
independent, we can examine the expected and
observed frequency counts in each cell and use the chi-square
statistic to determine if they differ statistically.
The observed counts are the number of occurrences for
each cell, $n_i$.
The expected counts must be calculated under the
premise that the two variables are independent.
SROP 2011 132
You try it…
Display the expected counts for the crosstabs table…
SROP 2011 133
Chi-square test of association


The null hypothesis that is tested is that the two
categorical variables are independent.
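The test statistic is the Pearson chi-square, calculated by:
$\chi^2 = \sum_{i} \sum_{j} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad df = (R - 1)(C - 1)$
where:
$O_{ij}$ = observed counts
$E_{ij} = \dfrac{R_i C_j}{N}$ = expected counts ($R_i$ = total for row i, $C_j$ = total for column j)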
SROP 2011 134
You try it…
Calculate the Pearson chi-square test of association for the crosstabs table…
SROP 2011 135
Assumptions for chi-square test of association

Normality:



The distribution of possible values for any single cell in the table is
approximately normal, provided that the sample size is large enough
and the probability of an observation falling in that cell is not
extreme.
Also, recall that the expected cell frequencies for the chi-square
test are defined as Np (total sample size times the probability of
being in that cell).
Hence, the requirement of normality can be satisfied if the
expected cell frequencies are of sufficient size. A rule of thumb is
that all of the expected cell frequencies should be 5 or greater.
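The expected counts, the Pearson chi-square, and the rule-of-thumb check can be sketched together. The 2 x 2 table below is hypothetical (it is not the gender by H3RA4 crosstab), and the function name is an assumption for the example.

```python
def chi_square_association(table):
    """Return (chi-square, df, expected counts) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

table = [[10, 20],
         [30, 40]]                      # hypothetical observed counts
chi2, df, expected = chi_square_association(table)
smallest_expected = min(min(row) for row in expected)

print("expected counts:", expected)
print(f"chi-square({df}) = {chi2:.3f}")
print("all expected counts >= 5:", smallest_expected >= 5)
```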
SROP 2011 136
Assumptions for chi-square test of association

Inclusion of non-occurrences:

Another requirement of the chi-square test is that all cases in the
data set be included in the contingency table. That is, the coding
system must be exhaustive—it must represent all elements of the
sample.
SROP 2011 137
Measures of association for categorical data

Phi coefficient – applies only to 2 X 2 tables



The absolute value of this measure ranges from 0 to 1.
0 indicates no association and 1 indicates a perfect relationship
between the two variables in the contingency table.
As a rule, values less than .2 indicate a negligible relationship,
values from .2 up to .5 indicate an important relationship, and
values from .5 up to 1 indicate a very strong relationship.
phi coefficient: $\phi = \sqrt{\dfrac{\chi^2}{N}}$
SROP 2011 138
Measures of association for categorical data

Cramer’s phi coefficient



Also known as Cramer’s V
Same range and rules of thumb as phi coefficient
Applies to any two-way table.
Cramer's phi: $\phi_C = \sqrt{\dfrac{\chi^2}{N(k - 1)}}$
where k = min(R, C)

Please note:


A two-way table is a contingency table for two variables (each
variable can have 2 or more categories)
A 2X2 table is a contingency table for two variables, where each
of the variables has only two categories (resulting in four cells).
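Both measures are one-line calculations once the chi-square value is known. In this sketch the chi-square values, sample sizes, and table dimensions are hypothetical inputs.

```python
import math

def phi_coefficient(chi2, n):
    """Phi for a 2 x 2 table: square root of chi-square over N."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's phi (V) for any two-way table, with k = min(R, C)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical 2 x 2 result: chi-square = 0.794 with N = 100.
print(f"phi = {phi_coefficient(0.794, 100):.3f}")
# Hypothetical 3 x 4 result: chi-square = 12.0 with N = 150.
print(f"Cramer's V = {cramers_v(12.0, 150, 3, 4):.3f}")
```

For a 2 x 2 table k - 1 equals 1, so the two measures give the same value.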
SROP 2011 139
You try it…
Calculate the effect size for the Pearson chi-square test of association…
SROP 2011 140
Exact test for chi-square test of association

Recall that normality is an important assumption for the
chi-square test of association.



When the expected cell sizes are 5 or greater, we usually assume
that we’ve met this assumption.
When this requirement is not met, you can use exact statistics to
perform the hypothesis test.
The exact statistic is based on the empirical probability of
observing a certain configuration of cell frequencies with
fixed marginal frequencies.
SROP 2011 141
Exact test for chi-square test of association



To perform an exact test, you would rank order the tables
based on the value of one of the cells, determine the
probability of observing a value in that cell equal to or
less than the observed value, and declare that probability
as the p-value for your hypothesis test.
You can request SPSS to perform an exact test.
Typically the p-value for an exact test will be greater than
that from a chi-square test that relies on normality.
SROP 2011 142
You try it…
Rerun your previous chi-square test of association and request an exact test (labeled Fisher’s
Exact Test in output)…
SROP 2011 143
Chi-square test of association



So far, we’ve used categorical data without any attention
to the level of measurement.
If one or more variables are ordinal, then it’s a good thing
to attend to this in the analyses.
The test can potentially have higher power when the
pattern of association is determined by the order of an
ordinal variable.
SROP 2011 144
Chi-square test of association

If you have two ordinal variables, you can test the
significance of a linear relationship




This is the ‘Linear by Linear’ association chi-square
Also known as the Mantel-Haenszel test for linear association
To use, should have 5 or more expected counts per cell
Use Gamma or Kendall’s tau-b (more conservative) as
measures of the strength of the relationship.
SROP 2011 145
Chi-square test of association

If you have one ordinal variable that you can assume has
an underlying interval/ratio variable and one nominal
variable, use Eta as a measure of association



Eta ranges from 0 to 1
Values close to 1 indicate a strong relationship
To use, should have 5 or more expected counts per cell
SROP 2011 146
Inferences – One-Way ANOVA
SROP 2011 147
Hypothesis Testing – Equality of 3+ Means
One-way ANOVA (Analysis of Variance)
• Involves a single factor (categorical variable) and an
interval/ratio variable
• Running a separate t-test for every pair of groups is not appropriate because:
Type I errors propagate across the tests
Not efficient
Format of hypotheses:
Null Hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$
Alternative hypothesis: $H_A$: at least one mean differs
The model: $X_{ij} = \mu + \alpha_i + \varepsilon_{ij}$
Sampling distribution: F distribution
SROP 2011 148
Hypothesis Testing – Equality of 3+ Means
Involves 3 sums of squares:
• SS_total: the sum of the squared differences between each
observation and the mean of all observations, ignoring
group membership:
$SS_{total} = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_{..}\right)^2$
• SS_treatment: the sum of the squared differences between
each group’s mean and the mean of all observations (the
grand mean), weighted by each group’s sample size:
$SS_{treatment} = \sum_{j=1}^{J} n_j \left(\bar{X}_{j} - \bar{X}_{..}\right)^2$
SROP 2011 149
Hypothesis Testing – Equality of 3+ Means

SS_error: the sum of the squared differences between each
observation and the mean of its group (the sum of the
sums of squared deviations of scores around each
group’s mean).
$SS_{error} = \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_{j}\right)^2 = \sum_{j=1}^{J} (n_j - 1)\, s_j^2$
These 3 sums of squares are related:
$SS_{total} = SS_{treatment} + SS_{error}$
Each has its own degrees of freedom:
$df_{treatment} = J - 1, \quad df_{error} = N - J, \quad df_{total} = N - 1$
$df_{total} = df_{treatment} + df_{error}$
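The decomposition can be verified numerically. The sketch below uses three small hypothetical groups; the function name is an assumption for the example.

```python
from statistics import fmean

def anova_sums_of_squares(groups):
    """Return (SS_total, SS_treatment, SS_error) for a list of groups of scores."""
    all_obs = [x for g in groups for x in g]
    grand_mean = fmean(all_obs)
    ss_total = sum((x - grand_mean) ** 2 for x in all_obs)
    ss_treatment = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return ss_total, ss_treatment, ss_error

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]    # hypothetical scores, J = 3, N = 9
ss_total, ss_treatment, ss_error = anova_sums_of_squares(groups)

print(f"SS_total = {ss_total:.2f}")
print(f"SS_treatment + SS_error = {ss_treatment:.2f} + {ss_error:.2f}")
```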
SROP 2011 150
Hypothesis Testing – Equality of 3+ Means


We can create a mean square for each sum of squares by
dividing the relevant sum of squares by its degrees of
freedom.
Results in three indicators of variance depicted by the
three mean squares:
Mean square of total – the variance of all observations, ignoring
group membership
$MS_{total} = \dfrac{SS_{total}}{N - 1}$
Mean square of treatment – the variance between group means,
relative to the “grand mean”; an indication of the degree to which
observations in one group differ from observations in another
group, on average
$MS_{treat} = \dfrac{SS_{treat}}{J - 1}$
SROP 2011 151
Hypothesis Testing – Equality of 3+ Means

Results in three indicators of variance depicted by the
three mean squares:

Mean square of error – the variance within the groups, on
average; an indication of the degree to which observations vary,
relative to their group’s mean:
$MS_{error} = \dfrac{SS_{error}}{N - J}$
SROP 2011 152
Hypothesis Testing – Equality of 3+ Means

ANOVA test statistic is the ratio of the mean square of
treatment and the mean square of error:
$F_{df_1, df_2} = \dfrac{MS_{treatment}}{MS_{error}}$, with $df_1 = J - 1$ and $df_2 = N - J$
If comparisons between groups (MS_treatment) are about the
same as comparisons within groups (MS_error), then we
don’t have much evidence of group differences:
$MS_{error} \approx MS_{treat} \Rightarrow F = \dfrac{MS_{treat}}{MS_{error}} \approx 1$, retain null
SROP 2011 153
Hypothesis Testing – Equality of 3+ Means

But, if comparisons between groups (MS_treatment) are
sufficiently greater than comparisons within groups (MS_error), then we
can conclude that group differences exist:
$MS_{error} < MS_{treat} \Rightarrow F = \dfrac{MS_{treat}}{MS_{error}} > 1$, reject null when F exceeds the critical value



We conclude that at least one of the population means is
not equal to at least one other population mean (i.e., this
is an omnibus test).
This is the only information we have from the F test.
To determine which mean or means are different, need to
do multiple comparisons.
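The omnibus F ratio follows directly from the sums of squares. In this sketch the sums of squares (26 and 6) come from a hypothetical data set with 3 groups and 9 observations, and the critical value 5.14 is the tabled F value for α = .05 with 2 and 6 degrees of freedom, entered by hand.

```python
def one_way_f(ss_treatment, ss_error, n_groups, n_total):
    """Return (F, df1, df2) from the treatment and error sums of squares."""
    df1 = n_groups - 1
    df2 = n_total - n_groups
    ms_treatment = ss_treatment / df1
    ms_error = ss_error / df2
    return ms_treatment / ms_error, df1, df2

f_ratio, df1, df2 = one_way_f(ss_treatment=26.0, ss_error=6.0, n_groups=3, n_total=9)
f_crit = 5.14                  # table value for alpha = .05, df = (2, 6)
print(f"F({df1}, {df2}) = {f_ratio:.2f}, reject H0: {f_ratio > f_crit}")
```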
SROP 2011 154
Hypothesis Testing – Equality of 3+ Means
Assumptions of test:
Homogeneity of variances: Variances between groups need to be
equal, and all need to be equal to the error variance. This
assumption can be evaluated using Hartley’s Fmax statistic (see the
next slide). Also can use the following rule of thumb: If the largest
standard deviation is less than twice the smallest, you can assume
the assumption of equal population standard deviations has been
met.
 Normality of the dependent variable within each group: This can be
simplified as normality of residuals of each observation from its group
mean. Sometimes observations are transformed in order to
“normalize” them. This assumption can be evaluated by examining
the univariate plots of the dependent variable for each independent
variable group.
 Independence of Observations: Again, this simplifies to
independence of errors within a group. This assumption is evaluated
by thinking about the quality of the design of the study.
--Equal cell sizes make for the most powerful test

SROP 2011 155
Hypothesis Testing – Equality of 3+ Means
Assessment of Homogeneity of Variance:


Use Hartley’s Fmax statistic
The null hypothesis for Hartley’s Fmax is:
$H_0: \sigma_j^2 = \sigma_{common}^2$ for all j
To test this hypothesis, compute the sample variances for
each level of the independent variable (there will be J
levels), find the largest and the smallest of those
variances, and take their ratio:
$F_{max} = \dfrac{s_{largest}^2}{s_{smallest}^2}$
SROP 2011 156
You try it…
Run a one-way anova to examine the equality of mean years of education (Educ) for 3
groups of raw Peabody test scores from Wave 3...
First, create a three category variable using raw scores…
SROP 2011 157
Hypothesis Testing – Equality of 3+ Means
Effect size, η²
• Analogous to R² in regression
• Calculated as:
$\eta^2 = \dfrac{SS_B}{SS_T}$ (SS_B is the between-groups, or treatment, sum of squares; SS_T is the total sum of squares)
Easy to calculate, but very biased.
• Interpretation:
η² × 100% of the variance in [outcome variable] is explained by
the effects of [factor].
For example, if η² = .865, we would say that 86.5% of the
variance in [outcome variable] is explained by the effects of
[factor]
SROP 2011 161
Hypothesis Testing – Equality of 3+ Means
Effect size, ω²
• Analogous to R² in regression
• Calculated as:
$\omega^2 = \dfrac{SS_B - df_B \, MS_{error}}{SS_T + MS_{error}}$
Less biased than η².
• Interpretation is the same as η².

SROP 2011 162
You try it…
Calculate η² and ω² for the one-way anova you just ran…
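Both effect sizes can be computed by hand from the ANOVA table. The sketch below assumes hypothetical values (between-groups SS of 26, total SS of 32, 2 between-groups degrees of freedom, mean square error of 1); substitute the numbers from your own SPSS output.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variation associated with the factor."""
    return ss_between / ss_total

def omega_squared(ss_between, ss_total, df_between, ms_error):
    """Less biased estimate of the proportion of variance explained."""
    return (ss_between - df_between * ms_error) / (ss_total + ms_error)

# Hypothetical ANOVA table values.
eta2 = eta_squared(ss_between=26.0, ss_total=32.0)
omega2 = omega_squared(ss_between=26.0, ss_total=32.0, df_between=2, ms_error=1.0)
print(f"eta squared = {eta2:.3f}, omega squared = {omega2:.3f}")
```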
SROP 2011 163
Hypothesis Testing – Equality of 3+ Means
When the F test is significant (the omnibus null hypothesis is
rejected), multiple comparisons must be made among the means.
• Planned contrasts are a priori (differences
theorized before data are collected and the F-test for
the one-way ANOVA is run)
• Post hoc contrasts are a posteriori comparisons
(after the F-test for the one-way ANOVA is run)

SROP 2011 164
Hypothesis Testing – Equality of 3+ Means
Type I Errors
 Tests using contrasts are built to protect us from
having overly large chances of making a Type I
error
 To do this the tests consider Type I error rates in
two different ways, by examining:



The rate “per comparison” (PC or “per-contrast”)
the so-called “familywise” (FW) rate, which pertains to
a set of comparisons
Comparison procedures differ because some of
them limit the per-contrast error rate and the
others limit the familywise rate.
SROP 2011 165
Hypothesis Testing – Equality of 3+ Means

Familywise (FW) rate




pertains to a set of comparisons
If we are making several comparisons, the familywise
rate may apply.
The FW rate tells us what the chance is of making “at
least one Type I error” in the set of comparisons
Suppose α′ is the error rate of one comparison and c is the number of comparisons:
Per-contrast (PC) error rate = α′
Familywise (FW) error rate = $1 - (1 - \alpha')^c$ (for c independent comparisons)
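The growth of the familywise rate with the number of comparisons is easy to tabulate. This sketch assumes independent comparisons, as the formula does; the function names and the choice of a .05 per-contrast rate are illustrative. The second function shows the Bonferroni idea of dividing the desired familywise rate by the number of comparisons.

```python
def familywise_error_rate(alpha_pc, c):
    """Chance of at least one Type I error in c independent comparisons."""
    return 1 - (1 - alpha_pc) ** c

def bonferroni_per_contrast(alpha_fw, c):
    """Per-contrast level that keeps the familywise rate at or below alpha_fw."""
    return alpha_fw / c

for c in (1, 3, 6, 10):
    fw = familywise_error_rate(0.05, c)
    print(f"c = {c:>2}   FW = {fw:.3f}   upper bound c * alpha = {c * 0.05:.2f}")

print(f"Bonferroni per-contrast level for 3 comparisons: {bonferroni_per_contrast(0.05, 3):.4f}")
```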
SROP 2011 166
Hypothesis Testing – Equality of 3+ Means
Example:
• Suppose α′ = .05 is the error rate of one
comparison and we are making 3 comparisons:
Per-contrast (PC) error rate = α′ = .05
Familywise (FW) error rate = $1 - (1 - \alpha')^c = 1 - (1 - .05)^3 = .143$
If we just add the rates of the three comparisons
we get: cα′ = 3(.05) = .15
• In general, PC ≤ FW ≤ cα′, with FW close to cα′

SROP 2011 167
Hypothesis Testing – Equality of 3+ Means
Planned Contrasts
 Questions about the population means are expressed as
hypotheses about contrasts
 A contrast should express a specific question that is
driven by our research when designing the study.
 When contrasts are formulated before seeing the data,
inference about contrasts is valid whether or not the
ANOVA null hypothesis of equality of means is rejected.
SROP 2011 168
Hypothesis Testing – Equality of 3+ Means
Planned Contrasts
 Because the tests for planned comparisons are more
powerful than the F omnibus test, it is possible to retain
the null hypothesis using the F-test and to find a
statistically significant contrast.
 If you have planned some comparisons of means ahead
of time because you expect specific means to differ, then
you do not really need to do the F test to see if you can
reject the omnibus null hypothesis that all means are
equal.
SROP 2011 169
Hypothesis Testing – Equality of 3+ Means
Planned Contrasts
• A contrast is a combination of population means of the
form:
$\psi = \sum a_i \mu_i$
The coefficients of the contrasts must sum to zero.
The corresponding sample contrast is:
$c = \sum a_i \bar{x}_i$
The standard error of c is:
$SE_c = s_p \sqrt{\sum \dfrac{a_i^2}{n_i}}$
SROP 2011 170
Hypothesis Testing – Equality of 3+ Means
Planned Contrasts
• The null hypothesis for each contrast says that a
combination (contrast) of population means is 0:
$H_0: \psi = 0$
Choose to define the contrast so that it will be a positive
number when the alternative hypothesis of interest is true
(makes some computations easier).
The t-test statistic:
$t = \dfrac{c}{SE_c}, \quad df = df_{error}$
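A contrast test can be sketched from summary statistics. This example assumes that the pooled standard deviation is the square root of the ANOVA mean square error; the group means, sample sizes, mean square error, and coefficients are all hypothetical.

```python
import math

def contrast_t(coefficients, means, ns, ms_error):
    """Return (t, c, se) for a planned contrast among group means."""
    if abs(sum(coefficients)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    c = sum(a * m for a, m in zip(coefficients, means))
    # Assumption: pooled standard deviation = sqrt(MS_error) from the ANOVA.
    se = math.sqrt(ms_error) * math.sqrt(sum(a ** 2 / n for a, n in zip(coefficients, ns)))
    return c / se, c, se

# Hypothetical: average of groups 2 and 3 versus group 1.
t, c, se = contrast_t(
    coefficients=[-1, 0.5, 0.5],
    means=[2.0, 3.0, 6.0],
    ns=[3, 3, 3],
    ms_error=1.0,
)
df_error = 9 - 3               # N - J for this hypothetical design
print(f"c = {c:.2f}, SE = {se:.3f}, t({df_error}) = {t:.3f}")
```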
SROP 2011 171
Hypothesis Testing – Equality of 3+ Means
Planned Contrasts
• Example of null and alternative hypotheses for a contrast:
$H_0: \tfrac{1}{2}(\mu_1 + \mu_2) = \mu_3 \iff \psi = \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_2 - 1\mu_3 = 0$
$H_A: \tfrac{1}{2}(\mu_1 + \mu_2) > \mu_3 \iff \psi = \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_2 - 1\mu_3 > 0$
SROP 2011 172
Hypothesis Testing – Equality of 3+ Means
Example: Planned Orthogonal Contrasts (POC)
 Planned orthogonal comparisons (POC) are contrasts of
a certain type.
 For any one-way anova with k groups, there are k-1
POCs.
 POCs are simply contrasts that are “orthogonal” or
independent of one another – and we can determine this
by looking at each pair of contrasts and seeing if they are
independent.
 Each set of k-1 POCs provides tests of all of the unique
information in the k means.
 There may be more than one set of POCs for any set of k
means.
SROP 2011 173
Hypothesis Testing – Equality of 3+ Means
Example: Planned Orthogonal Contrasts (POC)
• To tell whether two contrasts are orthogonal, we multiply
together the corresponding weights (the c_j values) from the two contrasts
and sum the products; the contrasts are orthogonal when the sum is zero.
• For example, assume we are comparing five groups and
have the following set of contrasts:
$L_1 = \bar{X}_1 - \bar{X}_3$ (Group 1 vs. 3)
$L_2 = \dfrac{\bar{X}_1 + \bar{X}_3}{2} - \dfrac{\bar{X}_2 + \bar{X}_4 + \bar{X}_5}{3}$ (Groups 1 and 3 vs. the rest)
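The orthogonality check can be sketched in code. The weights below are read off from the descriptions of the two contrasts (Group 1 vs. 3, and Groups 1 and 3 vs. the rest), which is an interpretation of the slide rather than something it tabulates; the third set of weights is a hypothetical counterexample. The rule of summing the products applies to equal group sizes.

```python
from fractions import Fraction

def are_orthogonal(weights_a, weights_b):
    """Two contrasts are orthogonal when the products of their weights sum to zero."""
    return sum(a * b for a, b in zip(weights_a, weights_b)) == 0

# Weights for five group means, in group order 1 to 5.
L1 = [1, 0, -1, 0, 0]                                    # Group 1 vs. 3
L2 = [Fraction(1, 2), Fraction(-1, 3), Fraction(1, 2),
      Fraction(-1, 3), Fraction(-1, 3)]                  # Groups 1 and 3 vs. the rest
L3 = [1, -1, 0, 0, 0]                                    # hypothetical: Group 1 vs. 2

print("L1 and L2 orthogonal:", are_orthogonal(L1, L2))
print("L3 and L2 orthogonal:", are_orthogonal(L3, L2))
```

Exact fractions are used so that the sum of products is compared with zero without rounding error.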
SROP 2011 174
Hypothesis Testing – Equality of 3+ Means
Example: Planned Orthogonal Contrasts (POC)
 The following weights are being applied to the 5 group
means:
 For example, assume we are comparing five groups and
have the following set of contrasts:

Each pair of contrasts within the set is orthogonal; for
example, for L1 and L2 ,
(4  0)+(-1 1)  (1 1)  (1 1)  (1 1)  0
SROP 2011 175
Hypothesis Testing – Equality of 3+ Means
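Example: Planned Orthogonal Contrasts (POC)
• The test statistic:
$t = \dfrac{L}{se(L)} = \dfrac{\sum c_j \bar{X}_j}{\sqrt{MS_W \sum \left(c_j^2 / n_j\right)}}, \quad df = df_{within}$
The statistic is compared against the critical value $t_{\alpha/2,\,df}$.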
SROP 2011 176
Hypothesis Testing – Equality of 3+ Means
Sample of Planned Contrasts
Test: Planned Orthogonal Contrasts (POC)
Contrasts: k – 1 contrasts in an orthogonal set. Can have multiple sets
Type I error and power information: Per-contrast α. Can apply familywise error, α/c if c is large (Bonferroni correction). Most powerful contrast tests.
Test: Trend Contrasts
Contrasts: k – 1 ind. trend tests. Useful for quantitative factors that are equally spaced
Type I error and power information: Per-contrast α. Can apply familywise error, α/c if c is large (Bonferroni correction).
SROP 2011 177
Hypothesis Testing – Equality of 3+ Means
Sample of Planned Contrasts
Test: Dunn or Bonferroni
Contrasts: Any number of c contrasts. Use if contrasts are not orthogonal.
Type I error and power information: Familywise α. Use per-contrast level of α/c.
Test: Dunnett
Contrasts: paired contrasts of 1 mean with (k-1) other means (e.g., one control vs other treatments)
Type I error and power information: Familywise α.
SROP 2011 178
You try it…

Set up a contrast that tests whether the means of education of the respondents in the
two highest Peabody groups are greater than the mean education of the respondents in
the lowest Peabody group…
Note: order of the coefficients is important
because it corresponds to the ascending order
of the category values of the factor variable
SROP 2011 179
You try it…

Set up an additional contrast that tests whether the mean of education of the
respondents in the middle Peabody group is less than the mean education of the
respondents in the highest Peabody group…
SROP 2011 180
Hypothesis Testing – Equality of 3+ Means
Post-hoc Contrasts
 Used when hypotheses cannot be formulated a priori
 Used after we analyze our data using ANOVA and after
rejecting the null hypothesis of the ANOVA.
 When we look at the data before doing comparisons we
are increasing our chances of making a Type I error
(beyond what we talked about before), because we may
decide to only test the differences among means that
look big.
 This is why most post-hoc comparisons examine all
possible comparisons, and also why post-hoc tests are
not as powerful as planned tests.
SROP 2011 181
Hypothesis Testing – Equality of 3+ Means
Post-hoc Contrasts
 The idea of the t-test is used to perform multiple
comparisons.
 The t statistic is calculated for each pair of means.
 The type I error is controlled using an appropriate
standard error in the test statistic equation and by making
the type I error level more stringent for each individual
test.
SROP 2011 182
Hypothesis Testing – Equality of 3+ Means
Post-hoc Contrasts
 In general, the null hypothesis:
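$$H_0: \mu_i = \mu_j$$
 The general test statistic:
$$t_{ij} = \frac{\bar{X}_i - \bar{X}_j}{s_p\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} = \frac{\bar{X}_i - \bar{X}_j}{\text{Root MSE}\,\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}}$$
 The null hypothesis is rejected if $|t_{ij}| \ge t^{**}$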
SROP 2011 183
Hypothesis Testing – Equality of 3+ Means
Post-hoc Contrasts
 The value of t** depends upon which multiple
comparisons procedure we choose.
 Note that we use the pooled estimator from all groups
rather than the pooled estimator from just the 2 groups
being compared.
 The additional information about the common pooled
estimator increases the power of the tests.
 The degrees of freedom for all of these statistics are the
same: dferror
 Because we don’t have any specific ordering of the
means in mind as an alternative to equality, we must use
a two-sided approach to hypothesis testing.
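The pooled-estimator idea can be sketched in a few lines of Python. This is an illustration with made-up scores, not the course data; `pooled_mse` and `pairwise_t` are hypothetical helper names. Root MSE is computed from all groups, and every pairwise statistic shares the same error degrees of freedom.

```python
import math
from itertools import combinations

def pooled_mse(groups):
    """Mean square error pooled over ALL groups, with its degrees of freedom."""
    ss_within = 0.0
    for g in groups:
        m = sum(g) / len(g)
        ss_within += sum((x - m) ** 2 for x in g)
    df_error = sum(len(g) for g in groups) - len(groups)
    return ss_within / df_error, df_error

def pairwise_t(groups):
    """t_ij for every pair of groups, using Root MSE from all groups."""
    mse, df_error = pooled_mse(groups)
    means = [sum(g) / len(g) for g in groups]
    stats = {}
    for i, j in combinations(range(len(groups)), 2):
        se = math.sqrt(mse) * math.sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
        stats[(i, j)] = (means[i] - means[j]) / se
    return stats, df_error

# Hypothetical scores for three groups.
groups = [[10, 12, 11, 13], [14, 15, 13, 14], [16, 18, 17, 17]]
t_stats, df_error = pairwise_t(groups)
for pair, t in t_stats.items():
    print(pair, round(t, 3))
```

Each |t_ij| would then be compared with the t** value of whichever multiple comparisons procedure is chosen.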
SROP 2011 184
Hypothesis Testing – Equality of 3+ Means
Post-hoc Contrasts
 Example: the Scheffe Test – examines all possible
differences while controlling for Type I error:
$$F_{\text{Scheffe}} = \frac{(\bar{X}_i - \bar{X}_j)^2}{MS_W\left(\frac{1}{n_i} + \frac{1}{n_j}\right)(K - 1)}$$
$$df_1 = K - 1, \quad df_2 = N - K$$
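As a rough illustration, the Scheffe statistic for one pair of means can be computed directly from summary values. The numbers are hypothetical, and `scheffe_f` is an illustrative name rather than an SPSS function.

```python
def scheffe_f(mean_i, mean_j, n_i, n_j, msw, k):
    """Scheffe F for the difference between two group means among k groups."""
    return (mean_i - mean_j) ** 2 / (msw * (1 / n_i + 1 / n_j) * (k - 1))

# Hypothetical summary statistics: two of K = 3 groups, MSW = 1.0.
f = scheffe_f(mean_i=11.5, mean_j=14.0, n_i=4, n_j=4, msw=1.0, k=3)
print(f)
```

For a pairwise difference this equals the squared t statistic divided by K - 1, which is one way to see why the Scheffe test is more conservative than an unadjusted t-test.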
SROP 2011 185
Hypothesis Testing – Equality of 3+ Means
Sample of Post-hoc Contrasts – assume equal variances
Test | Contrasts | Type I error and power information
Fisher’s LSD | All pairs of means | Familywise α.
Tukey’s HSD | All pairs of means. Same critical value is used for each test. | Familywise α.
Newman-Keuls | All pairs of means. Critical value changes. | Mystery α, but power is higher than Tukey’s test because it is less conservative.
SROP 2011 186
Hypothesis Testing – Equality of 3+ Means
Sample of Post-hoc Contrasts – assume equal variances
Test | Contrasts | Type I error and power information
Ryan | All pairs of means | Controls α by using different levels for each pair of means.
Scheffe | Any number of post hoc contrasts. | Familywise α. Low power if large number of contrasts are used.
Sidak | All pairs of means. | For same α as Bonferroni, provides tighter bounds.
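A small sketch of the per-comparison levels behind the Bonferroni and Sidak adjustments, assuming c independent comparisons and a familywise level of 0.05. The function names are illustrative only.

```python
def bonferroni_alpha(alpha, c):
    """Per-comparison level under the Bonferroni correction."""
    return alpha / c

def sidak_alpha(alpha, c):
    """Per-comparison level under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / c)

def familywise_error(per_test_alpha, c):
    """Familywise error rate for c independent tests at the given level."""
    return 1 - (1 - per_test_alpha) ** c

alpha, c = 0.05, 6   # e.g., all pairs among 4 group means
bonf = bonferroni_alpha(alpha, c)
sidak = sidak_alpha(alpha, c)
print(round(bonf, 6), round(sidak, 6))
print(round(familywise_error(bonf, c), 6), round(familywise_error(sidak, c), 6))
```

The Sidak per-comparison level is slightly larger than α/c while still holding the familywise rate at α for independent tests, which is the sense in which it gives tighter bounds.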
SROP 2011 187
Hypothesis Testing – Equality of 3+ Means
Sample of Post-hoc Contrasts – assume equal variances
Test | Contrasts | Type I error and power information
Dunn or Bonferroni | Any number of c contrasts. Use if contrasts are not orthogonal. | Familywise α. Use per-contrast level of α/c.
Dunnett | Paired contrasts of 1 mean with (k-1) other means (e.g., one control vs. other treatments) | Familywise α.
SROP 2011 188
Hypothesis Testing – Equality of 3+ Means
Sample of Post-hoc Contrasts – assume unequal variances
Test | Contrasts | Type I error and power information
Tamhane's T2 | Conservative pairwise comparisons test based on a t test. | Familywise α
Games-Howell | Pairwise comparison test that is sometimes liberal. | Familywise α
SROP 2011 189
Inferences – Two-Way ANOVA
SROP 2011 190
Hypothesis Testing – Two-Way ANOVA



The general class of factorial ANOVAs are designed for
situations where our cases have been categorized
according to two or more factors or characteristics.
Two-way ANOVAs are a subset of factorial ANOVAs.
Two-way ANOVAs have two factors, which are called
"main effects".



Each main effect represents a factor that could be examined in a
one-way ANOVA.
We’d like to examine them together, as well as their interaction,
which tells us how the two factors work together to impact the
outcome.
Another way to think of the interaction is that it represents
whether the effect of one factor depends on the second
factor.
SROP 2011 191
Hypothesis Testing – Two-Way ANOVA

As with the one-way ANOVA, F statistics are used to test
statistical significance




There is an F-test of the main effects
There is also an F-test of the interaction of the main effects.
Planned comparisons and post-hoc tests are also used
with two-way ANOVAs.
As with one-way ANOVA, strive for equal group sizes


Most powerful design
In the case of 2+ factors, ensures factors are independent (not
confounded).
SROP 2011 192
Hypothesis Testing – Two-Way ANOVA
Advantages of two-way ANOVA (vs. one-way)
 It is more efficient to study 2 factors at once than
separately.
 Including a second factor thought to influence the
response variable helps reduce the residual variation in a
model of the data.



In a one-way ANOVA for factor A, any effect of factor B is
assigned to the residual (“error” term).
In a 2-way ANOVA, both factors contribute to the fit part of the
model.
Interactions between factors can be investigated.


The 2-way ANOVA breaks down the fit part of the model between
each of the main components (the 2 factors) and an interaction
effect.
The interaction cannot be tested with a series of one-way
ANOVAs.
SROP 2011 193
Hypothesis Testing – Two-Way ANOVA
Notation:
$X_{ijk}$ = score of the kth subject in group i of factor A and group j of factor B
$\bar{X}_{ij}$ = mean for subjects in group i of factor A and group j of factor B
$\bar{X}_{i\cdot}$ = mean score for group i of factor A, $i = 1, \ldots, a$
$\bar{X}_{\cdot j}$ = mean score for group j of factor B, $j = 1, \ldots, b$
$\bar{X}_{\cdot\cdot}$ = grand mean of all scores
$n_{ij}$ = number of subjects in group i of factor A and group j of factor B
$n_{i\cdot}$ = sample size for group i of factor A
$n_{\cdot j}$ = sample size for group j of factor B
$n$ = total sample size
SROP 2011 194
Hypothesis Testing – Two-Way ANOVA
The model:
$$X_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where:
$X_{ijk}$ = score of the kth subject in group i of factor A and group j of factor B
$\mu$ = grand mean in the population
$\alpha_i$ = population factor A treatment effect for group i
$\beta_j$ = population factor B treatment effect for group j
$(\alpha\beta)_{ij}$ = interaction of factors A and B
$\varepsilon_{ijk}$ = residual of person k in group ij
SROP 2011 195
Hypothesis Testing – Two-Way ANOVA
Compare the two:
One-Way ANOVA
Two-Way ANOVA



What was "error" in the one-way model is now being
explained by the second factor (B) and the interaction.
We hope the addition of the second factor will allow us to
explain more (and get a larger $\eta^2$).
We may still have some error, but $\varepsilon_{ijk}$ is not the same
residual as for the one-way model (that's why I renamed
the one-way error term as $\varepsilon'$).
SROP 2011 196
Hypothesis Testing – Two-Way ANOVA
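$\mu$ is estimated by the sample grand mean, $\bar{X}_{\cdot\cdot}$
$\alpha_i$ is estimated by $(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = a_i$
$\beta_j$ is estimated by $(\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}) = b_j$
$(\alpha\beta)_{ij}$ is estimated by $(\bar{X}_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})$,
which is $\bar{X}_{ij} - a_i - b_j - \bar{X}_{\cdot\cdot}$
where:
$$\bar{X}_{ij} = \frac{\sum_k X_{ijk}}{n_{ij}}, \quad \bar{X}_{i\cdot} = \frac{\sum_j \sum_k X_{ijk}}{n_{i\cdot}}, \quad \bar{X}_{\cdot j} = \frac{\sum_i \sum_k X_{ijk}}{n_{\cdot j}}, \quad \bar{X}_{\cdot\cdot} = \frac{\sum_i \sum_j \sum_k X_{ijk}}{n}$$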
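A minimal sketch of these estimates for a small balanced design, using made-up scores (2 levels of factor A, 3 levels of factor B, 2 subjects per cell). The variable names are illustrative and not part of the course materials.

```python
def mean(values):
    return sum(values) / len(values)

# cells[i][j] holds the scores in level i of factor A and level j of factor B.
cells = [
    [[4, 6], [7, 9], [10, 12]],
    [[6, 8], [9, 11], [18, 20]],
]
a, b = len(cells), len(cells[0])

cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
grand = mean([x for row in cells for cell in row for x in cell])
row_mean = [mean([x for cell in cells[i] for x in cell]) for i in range(a)]
col_mean = [mean([x for i in range(a) for x in cells[i][j]]) for j in range(b)]

# Estimated effects: factor A, factor B, and the interaction.
a_hat = [row_mean[i] - grand for i in range(a)]
b_hat = [col_mean[j] - grand for j in range(b)]
ab_hat = [[cell_mean[i][j] - row_mean[i] - col_mean[j] + grand
           for j in range(b)] for i in range(a)]

print(a_hat, b_hat)
print(ab_hat)
```

The estimated main effects sum to zero across levels, and the interaction estimates sum to zero across each row and column, which is a quick sanity check on hand calculations.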
SROP 2011 197
Hypothesis Testing – Two-Way ANOVA
Three sets of hypotheses are tested:
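1. Null hypothesis for factor A:
$H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_a = 0$ or
$H_0: \mu_{1\cdot} = \mu_{2\cdot} = \ldots = \mu_{a\cdot}$
2. Null hypothesis for factor B:
$H_0: \beta_1 = \beta_2 = \ldots = \beta_b = 0$ or
$H_0: \mu_{\cdot 1} = \mu_{\cdot 2} = \ldots = \mu_{\cdot b}$
3. Null hypothesis for interaction of factors A and B:
$H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = \ldots = (\alpha\beta)_{ab} = 0$ or
$H_0: \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} = 0$ for all i, j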
SROP 2011 198
Hypothesis Testing – Two-Way ANOVA
Sums of squares for the model:
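For reference, a sketch of the usual decomposition for a balanced design, written with the means defined in the notation above. It assumes every cell has the same number of observations, denoted $n_c$ here (a symbol that does not appear in the original slides); unbalanced designs need a different treatment.

```latex
\begin{align*}
SS_A    &= b\,n_c \sum_{i=1}^{a} (\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot})^2, & df_A &= a - 1 \\
SS_B    &= a\,n_c \sum_{j=1}^{b} (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})^2, & df_B &= b - 1 \\
SS_{AB} &= n_c \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})^2, & df_{AB} &= (a - 1)(b - 1) \\
SS_W    &= \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n_c} (X_{ijk} - \bar{X}_{ij})^2, & df_W &= ab(n_c - 1) \\
SS_T    &= SS_A + SS_B + SS_{AB} + SS_W
\end{align*}
```

Each mean square is the sum of squares divided by its degrees of freedom, and each F statistic in the fixed-effects case is the corresponding mean square divided by MSW.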
SROP 2011 199
Hypothesis Testing – Two-Way ANOVA
 Again, as for the one-way ANOVA, the F tests get big when the
population treatment effects are nonzero and MSA,
MSB, and MSAB are larger than MSW.
 For two-way ANOVA we will do three tests
     Two main effects
     Interaction
 In general:
    1. Examine the interaction test
         Makes sense to examine first, to determine how much attention we
        devote to main effects
         If the interaction is significant, we may need to be especially cautious
        in interpreting the main effects.
    2. Examine the cell means
    3. Go back to the other F tests.
SROP 2011 200
Hypothesis Testing – Two-Way ANOVA
Interactions
 Plots of means are important in two-way ANOVA.
 Tables of means are acceptable but often plots of the
means show patterns very quickly – patterns that may
not be apparent from simple tables of means.
 Plots only suggest the presence of interactions; the
ANOVA F test tells us whether the suggested interaction
is significant or not.
SROP 2011 201
Hypothesis Testing – Two-Way ANOVA
Interactions

Main effects are shown by lines at different heights or lines
that slant. When the lines are parallel, there is no interaction.
SROP 2011 202
Hypothesis Testing – Two-Way ANOVA
Interactions
 Ordinal interactions
     When the size of the effect (or mean difference) for one factor is
    not the same at all levels of the second factor.
 Disordinal interactions
     The means for levels of one factor (say B) are not in the same
    order within each of the levels of A.
SROP 2011 203
You try it…
Run a two-way ANOVA to examine the equality of mean years of education (Educ) for 3
groups of raw Peabody test scores and gender from Wave 3…
SROP 2011 204
SROP 2011 205
SROP 2011 206
SROP 2011 207
SROP 2011 208
SROP 2011 209
Fixed and Random Effects
SROP 2011 210
Hypothesis Testing – Two-Way ANOVA
 Fixed effects
     What we have considered thus far
     Each level of the factor is represented in the model
 Random effects
     Factor levels for the analysis are sampled from a population of
    levels
     Common factors that appear as random effects include:
         Schools
         Classrooms
         Manufacturing lines
         Labs
     Appropriate in cases where:
         There are many factors (too many to include) and
         we want to generalize to a broader set of cases than the ones we will
        study
SROP 2011 211
Hypothesis Testing – Two-Way ANOVA





The factor levels must be randomly sampled because the
variation among the means of the population of factor
levels is estimated.
Because we wish to generalize to the population of
levels, estimation must take this into account.
If both factors are random, the model is a random-effects
model.
If only one factor is random, the model is a mixed-effects
model.
Generally, a random factor is a factor manipulated by the
researcher; random factors in observational data are not
common.
SROP 2011 212
Hypothesis Testing – Two-Way ANOVA


If a factor is random, the extra variation must be built into
the model.
As compared to a fixed factor, a random factor has:


A different mean square
The F test for the random factor is different (e.g., MSA/MSAB
instead of MSA/MSW)
SROP 2011 213
You try it…
Run a two-way ANOVA to examine the equality of mean years of education (Educ) for 3
groups of raw Peabody test scores and gender from Wave 3… IF Peabody test scores
were considered to be a random effect (an incorrect assumption for these data).
SROP 2011 214
Multiple Regression
SROP 2011 215
Multiple Regression

A linear model with an interval/ratio outcome variable:
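$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_p X_{pi} + \varepsilon_i$$
where:
$Y_i$ = outcome for person i
$\beta_1, \ldots, \beta_p$ = regression coefficients
$\varepsilon_i$ = residual for person i, $\varepsilon \sim N(0, \sigma^2)$
$\varepsilon_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ = predicted outcome for person i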

Goal: explain the variation in the outcome variable with
independent variables, which in turn reduces the error
(residual) variance.
SROP 2011 216
Multiple Regression
How do we find ‘good’ independent variables?
 A statistically significant correlation coefficient suggests a
linear relationship between two variables.
 A scatterplot with a linear ‘trend’ for the points suggests a
linear relationship exists between two variables.
 Independent variables should not be highly correlated with one
another.
     High correlation among independent variables is referred to as multicollinearity
     Affects estimation of all slopes
     Indicators of multicollinearity
         Sign of slopes changes when new Xs are added
         Magnitude of slope for a predictor changes greatly when another
        variable is added to our model
         Increase in standard error of slope when new Xs are added.
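One quick screen for this problem is to look at the correlation between predictors before fitting the model. The sketch below uses made-up predictor values; `pearson_r` and `vif_two_predictors` are illustrative helper names. The variance inflation factor (VIF) is not discussed on the slides and is included here only as a common companion diagnostic; the formula shown holds for the special case of exactly two predictors.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1 / (1 - r ** 2)

# Hypothetical predictors that move almost in lockstep.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 5, 8, 9, 13]

r = pearson_r(x1, x2)
vif = vif_two_predictors(r)
print(round(r, 4), round(vif, 2))
```

A correlation this close to 1 suggests that the two slopes would be estimated with inflated standard errors if both predictors were entered together.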
SROP 2011 217
You try it…
Run a bivariate correlation table that includes
Raw Peabody scores (AH_RAW), Highest grade completed (EDUC), BMI (BMI),
annual income: wages/salaries (H3EC1A), number of hours/week spent at work
(H3DA31), cumulative Math GPA (EAMGPAC), cumulative overall GPA (EAOGPAC)
SROP 2011 218
Multiple Regression
Modeling steps
 Examine omnibus F test for overall model significance.
 Examine individual t-tests for coefficient significance.
 Check residuals



Histogram for assessment of normality
Standardized residuals versus predicted values scatterplot to
assess randomness of residuals
Test homogeneity of variance.
SROP 2011 219
Multiple Regression
Indicators of overall regression model quality.
 MSE -- the mean squared residual from the regression -- compare this to $S_Y^2$.
     Similar to MSW in ANOVA.
 Adjusted $R^2$ -- "variance accounted for", adjusted for size
     Like $\eta^2$ and $\omega^2$ in ANOVA
     Calculated as:
$$R^2_{adj} = 1 - (1 - R^2)\frac{(n - 1)}{(n - p - 1)}$$
 F test, $H_0: \beta_1 = \beta_2 = \ldots = \beta_p = 0$
     Compare to critical F values, $df_1 = p$, $df_2 = n - p - 1$
     This is also a test of the correlation between observed and
    predicted, $\rho^2(Y, \hat{Y})$, with $H_0: \rho^2 = 0$
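As a rough illustration, both quantities can be computed from R², the sample size n, and the number of predictors p. The input values below are hypothetical, and the function names are illustrative. The F expression is the standard way of writing the omnibus test in terms of R².

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p predictors fit to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def omnibus_f(r2, n, p):
    """Omnibus F statistic written in terms of R^2 (df1 = p, df2 = n - p - 1)."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))

# Hypothetical model summary.
r2, n, p = 0.40, 104, 3
print(round(adjusted_r2(r2, n, p), 3))
print(round(omnibus_f(r2, n, p), 3))
```

Adjusted R² is always a little smaller than R², and the gap grows as more predictors are added relative to the sample size.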
SROP 2011 220
Outliers
SROP 2011 221
Multiple Regression
Sums of squares
 Similar to use in ANOVA
 Total sum of squares (SS Total) is partitioned into two
parts


Explained variation (also called SS Regression, SS Model or SS
Explained)
SS Residual (or SS Error)
SS Total = SS Regression + SS Residual
$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$
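The partition can be verified numerically. The sketch below fits a one-predictor least-squares line to made-up data (a simplification of the multiple regression case) and checks that the two pieces add up to SS Total. `fit_simple` is an illustrative helper name.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple(x, y)
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(round(ss_total, 4), round(ss_reg, 4), round(ss_res, 4))
print(round(ss_reg / ss_total, 4))   # R^2
```

The ratio of SS Regression to SS Total is R², the proportion of variation in the outcome explained by the model.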
SROP 2011 222
Multiple Regression

Sums of squares table for regression
SROP 2011 223
Multiple Regression
Regression coefficients
 Each regression coefficient has a related t-test, $H_0: \beta_j = 0$
 The null hypothesis is that the slope is zero
 Calculated as:
$$t = \frac{b_j}{se(b_j)}, \quad df = n - p - 1$$
Standardized regression coefficient
 Calculated as:
$$B = \frac{b_i s_{x_i}}{s_y}$$
 Interpreted as the number of standard deviations of change in $Y_i$
to expect given one standard deviation of change in $X_i$
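A minimal sketch of the standardized coefficient for the one-predictor case, using made-up data. With a single predictor the standardized slope equals the Pearson correlation, which gives a convenient check. The helper names are illustrative.

```python
import math
import statistics

def slope(x, y):
    """Unstandardized least-squares slope for one predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = slope(x, y)
beta = b * statistics.stdev(x) / statistics.stdev(y)   # B = b * s_x / s_y
print(round(b, 4), round(beta, 4), round(pearson_r(x, y), 4))
```

In a model with several predictors the standardized coefficients no longer equal the simple correlations, but the same rescaling by the ratio of standard deviations applies.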
SROP 2011 224
You try it…
Run a linear regression using standardized Peabody in Wave 3 (pvtstd3c) as the outcome
variable and math gpa (eamgpac), math course sequence (eamsqh), and age as the
independent variables (reg1)…
SROP 2011 225
You try it…
Examine the residuals for normality and randomness…
SROP 2011 226
Multiple Regression
Categorical independent variables
 These variables require that a dummy code set be
constructed.



Dummy codes are a common way to code categorical variables
for use in regression
Effects coding and orthogonal coding are also options, although
less common
ANOVA can be run using a regression with a
corresponding dummy code set
SROP 2011 227
Multiple Regression
A categorical variable that has 2 categories



Referred to as a dichotomy or a binary (dichotomous) variable
The dummy code set consists of one member (e.g., X=0 if male,
X=1 if female).
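If X = 0, then:
$$\hat{Y}_i = b_0 + b_1 X_i = b_0 = \bar{Y}_{\text{males}}$$
If X = 1, then:
$$\hat{Y}_i = b_0 + b_1 X_i = b_0 + b_1 = \bar{Y}_{\text{males}} + (\bar{Y}_{\text{females}} - \bar{Y}_{\text{males}}) = \bar{Y}_{\text{females}}$$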
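This can be checked with a tiny example: regressing an outcome on a single 0/1 dummy code reproduces the two group means. The data are made up, and `fit_simple` is an illustrative helper name.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical data: X = 0 if male, X = 1 if female.
female = [0, 0, 0, 1, 1, 1]
y = [10, 12, 14, 15, 17, 19]

b0, b1 = fit_simple(female, y)
mean_males = sum(v for v, f in zip(y, female) if f == 0) / female.count(0)
mean_females = sum(v for v, f in zip(y, female) if f == 1) / female.count(1)

print(b0, b1)                     # intercept and slope
print(mean_males, mean_females)   # group means
```

The intercept matches the mean of the group coded 0, and the slope matches the difference between the two group means.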
SROP 2011 228
Multiple Regression




In general, a categorical variable with k categories will
have a dummy code set of k – 1 members.
Example:
Subject | Group | T | X1 | X2
Jim | Physical | 1 | 1 | 0
John | Mental | 2 | 0 | 1
Joe | Control | 3 | 0 | 0
The intercept is the expected value of the reference
group (in this example, the control group)
The slopes are the difference in means between the
control group and the corresponding group (in this
example, the slope of X1 is the difference between the
means of the control and physical groups)
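A short sketch of how a dummy code set with k – 1 members could be built from group labels, with the control group as the reference category. `dummy_codes` is a hypothetical helper, not an SPSS feature.

```python
def dummy_codes(groups, reference):
    """Return the non-reference levels and a k - 1 dummy code row per case."""
    # dict.fromkeys keeps the first-seen order of the group labels.
    levels = [g for g in dict.fromkeys(groups) if g != reference]
    codes = [[1 if g == level else 0 for level in levels] for g in groups]
    return levels, codes

subjects = ["Jim", "John", "Joe"]
groups = ["Physical", "Mental", "Control"]

levels, codes = dummy_codes(groups, reference="Control")
for name, row in zip(subjects, codes):
    print(name, row)
```

The reference group is the one coded 0 on every dummy variable, which is why the intercept estimates its mean.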
SROP 2011 229
You try it…
Add female to your linear regression (reg2)…
SROP 2011 230
Multiple Regression

To test whether the addition of extra independent
variables adds to the prediction of the outcome variable,
use an ‘Increment to $R^2$’ F test:
$$F = \frac{(R_L^2 - R_S^2)/(p_L - p_S)}{(1 - R_L^2)/(n - p_L - 1)}$$
where the subscript L indicates larger model,
the subscript S indicates the subset model.
Model S must be 'nested' within L.
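A minimal sketch of the increment test with hypothetical R² values (not output from the course dataset). The function name is illustrative. The resulting statistic would be compared with an F distribution on p_L – p_S and n – p_L – 1 degrees of freedom.

```python
def r2_increment_f(r2_large, r2_small, p_large, p_small, n):
    """F statistic for the gain in R^2 from a nested model S to a larger model L."""
    numerator = (r2_large - r2_small) / (p_large - p_small)
    denominator = (1 - r2_large) / (n - p_large - 1)
    return numerator / denominator

# Hypothetical: model S has 3 predictors, model L adds 1 more, n = 205 cases.
f = r2_increment_f(r2_large=0.30, r2_small=0.25, p_large=4, p_small=3, n=205)
print(round(f, 3))
```

When only one predictor is added, this F equals the square of the t statistic for that predictor's coefficient in the larger model.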
SROP 2011 231
You try it…
Using the F-test, did the model improve with the addition of female?
SROP 2011 232
Analysis of Covariance (ANCOVA)
SROP 2011 233
ANCOVA


In order to compare groups, it’s helpful if the groups are
comparable in terms of the independent variables.
This control can be created in a variety of ways:




Sampling – random sampling helps create this control; this is the
‘gold standard’
Design control – groups are ‘forced’ to be equivalent
Statistical control – the group differences are ‘controlled for’ in the
statistical model.
ANCOVA is a statistical model used for statistical control.
SROP 2011 234
ANCOVA



The idea of analysis of covariance (ANCOVA) is very
similar to that of ANOVA.
We wish to look for possible differences among group
means BUT in ANCOVA we have some additional
variable we want to “control for”, hold constant, or
account for in our analysis.
This additional variable is known as the covariate, and
we will denote it as X.


We’d like to remove the covariate X from having an influence on
our outcome.
If we could hold constant the values of X for our subjects, we
would have a clearer picture of the differences on our outcome Y.
SROP 2011 235
ANCOVA

Gives us results that allow us to estimate what the group
means on our outcome WOULD HAVE BEEN if the
groups had the same means (or “were equivalent”) on
the covariate.
SROP 2011 236
ANCOVA
Choosing an appropriate covariate
 The covariate X should be linearly related to the outcome
Y, and we sometimes hope (or expect) that the groups of
interest will show mean differences on the covariate
(though that is not a requirement).
 If there is a treatment involved, we also have to know that
the treatment did not affect X and similarly that X did not
affect the treatment.


So, for instance, if subjects are assigned to treatment groups on
the basis of a variable, that variable would not be a good
covariate.
X should relate to Y in exactly the same way for all of the
groups in our analysis; there should be “no covariate-group interaction”.
SROP 2011 237
ANCOVA
Additional assumptions
 Assume that the errors:
     are independent,
     are normally distributed
         Check using a histogram of the residuals
     have homogeneous variances across groups
         Use Levene’s test to check for equality of variances
SROP 2011 238
ANCOVA
The ANCOVA model:
$$Y_{ij} = \mu + \alpha_j + \beta_w (X_{ij} - \mu_X) + \varepsilon_{ij}$$
where:
$Y_{ij}$ = outcome score of person i in group j
$X_{ij}$ = covariate score of person i in group j
$\mu$ = grand mean of Y in the population
$\alpha_j$ = jth treatment effect in the population, with X held constant
$= \mu_j - \mu$
$\beta_w$ = slope of covariate in the population, which is the
predicted change in Y given one unit change in X,
with group membership held constant
$\varepsilon_{ij}$ = residual (unexplained part of the outcome) for person i in group j
SROP 2011 239
ANCOVA
The ANCOVA model:
 The w label on $\beta$ represents the within-group slope
     Implies all groups must share the same population slope for X
    predicting Y.
 $\beta_w$ (the slope of the covariate) is multiplied not by the raw X score, but by the deviation
of the score from the mean of X, $\mu_X$.
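The adjustment implied by this model can be sketched as follows: estimate one pooled within-group slope, then shift each group's outcome mean to where it would sit if the group had the grand mean on the covariate. The data and helper names are hypothetical; this is an illustration of the idea, not the SPSS procedure.

```python
def mean(values):
    return sum(values) / len(values)

def pooled_within_slope(groups):
    """Slope of Y on X pooled across groups (deviations from each group's own means)."""
    num = den = 0.0
    for g in groups.values():
        mx, my = mean(g["x"]), mean(g["y"])
        num += sum((x - mx) * (y - my) for x, y in zip(g["x"], g["y"]))
        den += sum((x - mx) ** 2 for x in g["x"])
    return num / den

def adjusted_means(groups):
    """Group means on Y adjusted to the grand mean of the covariate X."""
    b_w = pooled_within_slope(groups)
    grand_x = mean([x for g in groups.values() for x in g["x"]])
    return {name: mean(g["y"]) - b_w * (mean(g["x"]) - grand_x)
            for name, g in groups.items()}

# Hypothetical data: the treatment group starts out higher on the covariate.
groups = {
    "control":   {"x": [1, 2, 3], "y": [2, 4, 6]},
    "treatment": {"x": [3, 4, 5], "y": [9, 11, 13]},
}

b_w = pooled_within_slope(groups)
adj = adjusted_means(groups)
print(b_w, adj)
```

In this made-up example the unadjusted difference in outcome means is 7, while the adjusted difference is 3, because part of the raw gap is attributable to the groups differing on the covariate.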
SROP 2011 240
ANCOVA
SROP 2011 241
ANCOVA
ANCOVA modeling steps:
 Test whether the covariate has the same slope for each
group in the factor (or factors in a multi-way ANCOVA).
     One of the key assumptions of ANCOVA is that this interaction
    does not exist
     Plot the slopes for the different groups in a scatterplot
     Run an ANCOVA-like model, but with the addition of the interaction
    between factor and covariate
         If the interaction is nonsignificant, the covariate does not interact with group
         If the interaction is significant, the covariate cannot be used
 Check the homogeneity of variance assumption
 Check residuals for normality
SROP 2011 242
You try it…
Pick an appropriate covariate for your ANCOVA model and evaluate whether it’s
appropriate for use…
SROP 2011 243
You try it…
Evaluate the assumptions of the ANCOVA model…
SROP 2011 244
Other Models from Experimental
Design:
Repeated Measures
MANOVA
SROP 2011 245
Repeated Measures




Uses ideas of ANOVA: testing differences between
different means.
Each group of people (units) had a mean, which we
compared.
In Repeated Measures ANOVA, the same individuals can
contribute to the different means.
More formal definition: participants participate in all
conditions of an experiment



It might be the measurement of the same thing over time
It might be exposure to multiple treatments, with one measure per
treatment
The assumption of independence that ANOVA relied on
is violated
SROP 2011 246
Repeated Measures




The implication is that the F-test from ANOVA will lack
accuracy in this situation.
Instead, we assume that the variances of the differences
between treatment levels are homogeneous.
This assumption is called sphericity.
Sphericity is evaluated using Mauchly’s Test




The null is that there is sphericity
If the null is rejected, we can’t depend on the F-test
Big samples will likely have significant Mauchly Tests
What to do if sphericity is violated:


Apply correction factor to the F-test (e.g., Greenhouse and
Geisser, Huynh and Feldt)
Use MANOVA – it does not rely on the assumption of sphericity
(although it does have less power).
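To make the sphericity assumption concrete, the sketch below computes the variance of the difference scores for every pair of treatment levels. The scores are made up, and this is only a descriptive look at the quantities involved; it is not Mauchly's test.

```python
import statistics
from itertools import combinations

def difference_variances(scores):
    """Sample variance of the difference scores for each pair of conditions."""
    result = {}
    for a, b in combinations(scores, 2):
        diffs = [x - y for x, y in zip(scores[a], scores[b])]
        result[(a, b)] = statistics.variance(diffs)
    return result

# Hypothetical scores for the same 5 participants under 3 conditions.
scores = {
    "t1": [8, 7, 6, 9, 5],
    "t2": [10, 9, 7, 12, 6],
    "t3": [12, 12, 9, 13, 9],
}

variances = difference_variances(scores)
for pair, v in variances.items():
    print(pair, round(v, 3))
```

Sphericity holds in the population when these variances are equal; large discrepancies in the sample are what Mauchly's test and the correction factors respond to.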
SROP 2011 247
Multivariate Analysis of Variance (MANOVA)
 When to use MANOVA
     Repeated measures ANOVA inappropriate because of a violation
    of sphericity
     Interested in modeling several dependent variables that are
    correlated to one another.
 Similar to ANOVA
     Tests differences between group means
     Multiple factors can be examined
     Interactions can be examined
 Used rather than multiple ANOVAs so that familywise
error rate is not inflated.
 Detects differences between groups along a dimensional
space rather than with one outcome.
SROP 2011 248
Multivariate Analysis of Variance (MANOVA)



MANOVA should not be used if the dependent variables
are not correlated.
The power of MANOVA depends on the correlation
between the dependent variables and the effect size to
be detected.
Use theory to guide you on what variables to include in
your analyses.
SROP 2011 249
Generalized Linear Models:
Dealing with Categorical Outcomes
SROP 2011 250
Generalized Linear Models





Different from General Linear Model, which usually refers
to regression models with interval/ratio outcomes
Applies many of the ideas of Linear Regression
Big difference is that the outcome variable is something
that is modeled by a distribution other than the normal
distribution.
The outcome is ‘linked’ to a linear model by the canonical
link function.
What kind of generalized linear model you have is
determined by the canonical link that you use.
SROP 2011 251
Generalized Linear Models
Some examples:
 Logistic regression: data is binary (0/1), distribution
function is binomial and the link is the logit.
 Poisson regression: data consists of counts, distribution
function is the Poisson and the link is the log.
 Linear regression: data is interval/ratio, distribution
function is normal and the link is the identity function.
 Negative binomial regression: data consists of counts,
the distribution function is the negative binomial and a
possible canonical link is the log.
 Other models:
     Probit (binary data), Ordered Logit (ordinal data), Gamma
    (positive continuous data)…
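A small sketch of how the link connects a linear predictor to the mean of the outcome for three of the models listed. The coefficients and predictor value are hypothetical, and the dictionary layout is just one convenient way to organize the illustration.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

def identity(value):
    return value

# Each entry pairs a link function with its inverse.
LINKS = {
    "logistic": (logit, inv_logit),    # mean is a probability
    "poisson": (math.log, math.exp),   # mean is a positive count rate
    "linear": (identity, identity),    # mean is on the outcome's own scale
}

# Hypothetical linear predictor: b0 + b1 * x.
b0, b1, x = -1.0, 0.5, 3.0
eta = b0 + b1 * x

means = {name: inverse(eta) for name, (link, inverse) in LINKS.items()}
for name, mu in means.items():
    print(name, round(mu, 4))
```

The same linear predictor yields a probability, a count rate, or a value on the original scale depending on the link, which is the sense in which the link determines the kind of generalized linear model.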
SROP 2011 252
A Few Useful References
Agresti, Allan (2007). An Introduction to Categorical Data
Analysis. Wiley.
Dean, A.M. and Voss, D. (1998). Design and Analysis of
Experiments. Springer.
Draper, Norman R. and Smith, Harry (1998). Applied
Regression Analysis. Wiley.
Field, Anthony (2009). Discovering Statistics. Thousand
Oaks, CA: Sage.
Kennedy, R. (2008). A Guide to Econometrics. Wiley.
Kirk, R. (1995). Experimental Design: Procedures for the
Behavioral Sciences. 3rd ed. Pacific Grove, CA:
Brooks/Cole.
SROP 2011 253