WORKSHOP 2
MBA 9: RESEARCH AND QUANTITATIVE METHODS
Understand your data
You need to understand your research question/objective/aim to determine what type and level of data is required.
• Quantitative (QN) data is numerical
• Deductive
• Properties have implications for the type of analysis:
– Internal/external
– Discrete/Continuous
– Primary/Secondary
Levels of Measurement
Nominal → Ordinal → Interval → Ratio
(moving up the levels brings an increase in the mathematical and statistical operations that can be applied)
Bar Chart – Nominal/Ordinal
Histogram – Interval/Ratio
Line Chart – Temporal
Descriptive statistics organise and summarise the data.
Inferential statistics infer from a sample to the population of interest.
Summarising Data
There are three basic ways in which we summarise data:
1. The centre of the distribution, or measure of central tendency – the single value that best describes the sample (Mean, Median, Mode)
2. The spread of the distribution (Variance and Standard Deviation)
3. The shape of the distribution (Skewness and Kurtosis)
1. Measure of central tendency
• Mean, median, mode
• Examine outliers
Measure: Mean
Usage: Most familiar average. Use mainly for ratio/interval data.
Advantages: Exists for each dataset; takes every score into account; works well with many statistical methods.
Disadvantages: Is affected by extreme scores.

Measure: Median
Usage: Commonly used. Use mainly for ordinal data.
Advantages: Always exists; is not affected by extreme scores; is often a good choice if there are some extreme scores in the dataset.
Disadvantages: Does not take every score into account.

Measure: Mode
Usage: Sometimes used. Use mainly for nominal data.
Advantages: Is not affected by extreme scores; is appropriate for data at the nominal level.
Disadvantages: It might not exist, or there may be more than one mode; it does not take every score into account.
Example dataset: 12 22 12 42 29 10 33 40 12
Mode: the most frequently occurring number in a dataset. Here Mo = 12 (it occurs 3 times).
Median: sort the data in ascending order, then take the middle number: 10 12 12 12 22 29 33 40 42, so Me = 22.
Mean: add all of the numbers together and divide by the total number of numbers: 12 + 22 + 12 + 42 + 29 + 10 + 33 + 40 + 12 = 212, and 212 / 9 = 23.56, so μ = 23.56.
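These three values can be checked quickly with Python's standard library; a minimal sketch using the dataset from the slide (for illustration only):

```python
from statistics import mean, median, mode

scores = [12, 22, 12, 42, 29, 10, 33, 40, 12]  # the dataset from the slide

print("Mode:", mode(scores))             # 12 (occurs 3 times)
print("Median:", median(scores))         # 22 (middle value of the sorted data)
print("Mean:", round(mean(scores), 2))   # 23.56 (212 / 9)
```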
2. Spread of the data
Level of measurement – Representation
Nominal: Table or frequency distribution showing frequencies.
Ordinal: Tables/frequency distribution, but choosing a single measure is problematic; use the interquartile range if a single measure is chosen.
Interval/Ratio: Graphic dispersion; standard deviation, provided cases have an approximately normal distribution.
Standard Deviation and Variance
• The most important and commonly used
measure of variability in statistics is the
variance.
• Variance is a measure of the dispersion in a
set of scores, and is calculated by determining
the ‘average distance’ of a set of scores from
its ‘centre’ or mean, by the formula:
s² = SS / df, where:
SS (Sum of Squares) = Σ(x − x̄)²
df (Degrees of Freedom) = n − 1 for a sample
Standard Deviation
• The standard deviation σ, in the case of a population, is the square root of the variance (σ²), so the formula is the same as for the variance, except that a square root is calculated
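A minimal sketch of these steps on a handful of hypothetical scores, computing SS, df, the sample variance and the standard deviation:

```python
import math

scores = [4, 7, 6, 9, 4]                     # hypothetical scores
n = len(scores)
mean = sum(scores) / n

ss = sum((x - mean) ** 2 for x in scores)    # Sum of Squares (SS)
df = n - 1                                   # Degrees of Freedom (df) for a sample
variance = ss / df                           # sample variance, s^2 = SS / df
sd = math.sqrt(variance)                     # standard deviation = square root of the variance

print(round(variance, 2), round(sd, 2))
```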
3. Shape of the distribution
• Look at:
– Symmetry (Skewness)
– Peakedness (Kurtosis)
Data Analysis
• Enhance knowledge
– Break knowledge down into elements
– Elements contain concepts
• Can look at relationship between concepts
• Can look for differences or associations
• Test hypotheses
Term – Definition
Data: Groups of observations.
Attribute/Value: Characteristic of the studied phenomenon, e.g. female.
Variable: Logical collection of attributes or characteristics, e.g. gender, with attributes male and female.
Response Variables: The variables we are interested in.
Cases: The individuals, people, things or events from which we get our information.
• To test hypotheses we need variables
• A variable or construct can be defined theoretically or operationally (an operational definition explains how we are going to measure the variable)
• Operationalisation is the process of defining the measurement of a phenomenon that is not directly measurable
• Two types of variables: Independent Variables (IVs) and Dependent Variables (DVs)
• IVs
– the thing we think causes the DV
• DVs
– the effect
VARIABLE/CONCEPT
Hypothesis Testing
• Hypothesis testing uses sample evidence to
statistically test whether a claim made about a
population is valid. The results of the sample are
used to make an inference about the population
as a whole. Requires the identification of
Independent, and Dependent variables.
– Null Hypothesis: assumes no difference in the state of affairs
– Alternate Hypothesis: states that our theory is, in fact, true beyond reasonable doubt
If I hypothesised that the more cups of coffee a person had, the more alert they would be:
• IV and DV?
IV: Number of cups of coffee
DV: Level of alertness
Hypothesis:
𝑯𝟎 : There is no significant difference in the level of alertness between those
people drinking coffee, and not drinking coffee
𝑯𝟏 : There is a significant difference in the level of alertness between those
people drinking coffee, and those not drinking coffee
What's
normal ??
Normal Distribution
• Smooth continuous curve representing the
form a binomial distribution would take for an
infinite number of events with equiprobable
outcomes
• Bell-shaped curve
• Symmetrical
• Unimodal (Mean, Median, Mode all coincide)
• Tails extend indefinitely to the left and right
Normal Distribution cont…
• The area under the curve of a normal
distribution represents probability
• Allows us to determine where an individual
score lies in relation to other scores
• Model of the shape of the frequency
distribution of many naturally occurring
phenomena
• Help us understand the “relative position” of a
case relative to other cases
How do you calculate where an individual
stands relative to the normal distribution
The Standard Normal Distribution
• Distributions allow us to predict probability or
proportion from an individual score…
• But to do so we need three pieces of info..
– Mean
– Variance
– Shape
• Every phenomenon has a different distribution
(different means and variances), but all have
the same shape (normal shape)
So what’s the problem?
Because there are so many different types of
distributions – each distribution has a different
proportion of cases falling below any particular
score
The Standard Normal Distribution
• Measuring standard for all distributions
• Mean = 0
• Variance = 1
• Defined in standard deviation units (z-scores): z = (x − μ) / σ
Difference between Normal and
Standard Normal
• Normal distributions have x-values along x-axis
(individual scores); standard normal distribution
has z-values
• z-scores:
– Standardised scores
– Do not depict real values of individuals
– Hypothetical values to show where an individual case
lies relative to other cases
– Indicate the number of SD units a score lies above or
below the mean
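A small sketch of the conversion on hypothetical raw scores; each z value is simply the number of SD units the score lies above or below the mean:

```python
import statistics

scores = [55, 62, 70, 71, 80, 92]            # hypothetical raw scores
mu = statistics.mean(scores)
sigma = statistics.pstdev(scores)            # population SD, matching z = (x - mu) / sigma

z_scores = [(x - mu) / sigma for x in scores]
print([round(z, 2) for z in z_scores])       # standardised scores: mean 0, SD 1
```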
Parametric vs non-Parametric Tests
Parametric tests assume:
• Normality
• Independence
• Homogeneity of variance
But how do I decide how to test this?
• Frequency?
• Relationships?
• Difference?
• Association?
• Correlation?
• Am I trying to forecast?
Determined by?.....
YOUR AIM!
Decision tree
Number 1…
CHI-SQUARE
Nominal/Categorical: Chi-Square
• Significance test used where the data consists of counts rather than
scores
• Most MBA dissertations will involve the use of categorical data
• Best analysed:
– Basic descriptives
• Frequency tables
• Crosstabs
Classifications:
• Dichotomous classifications: married and single, children and
adults, politically active and politically indifferent, etc.
• Multiple classifications: Sheldon’s classification of body types as
ectomorphic (thin), mesomorphic (muscular), and endomorphic
(fat)
Classifications are of interest to a researcher mainly when they are
exhaustive and mutually exclusive
Chi-Square cont…
Contingency tables, or Crosstabulations
• Frequency tables summarise a single categorical variable
• Cross-tabulations summarise the relationship between two
categorical variables
• When data are classified with respect to two or more variables
Notice something?
Mutually exclusive!
Exhaustive!
Cell
Frequency/Count (count rather than a continuous measurement)
The χ² significance test
• The significance test appropriate for the analysis of counts is the χ² test
• Used as a goodness of fit test (i.e. does the
existing data fit a theoretical distribution, such
as a normal distribution?)
• The null hypothesis would be that no
association exists between the sets of
categories
The χ² significance test
• The key concept in this test is the notion of an
expected frequency:
– What we would expect if only chance variation
were operating across the categories of interest,
and the category frequencies were in fact equal in
the population
It is not difficult to calculate expected frequencies for the tea-drinker data.
The population as a whole contains 84 moderate tea-drinkers. If the sample of long sleepers and the sample of short sleepers were from the same population, we would expect the moderate tea-drinkers to be distributed in proportion to the number of people in each sample.
Since the two samples actually contain the same number of people, we would expect the same number of moderate tea-drinkers in both the long and short sleepers, i.e. 42 people.
We can work out the other columns in a similar fashion. The general principle for working out the expected frequency in each cell of a contingency table is:
expected frequency = (row total × column total) / grand total
• Once we have expected and observed values for each cell, we are in a position to calculate the χ² statistic.
• To calculate χ², we apply the formula χ² = Σ (O − E)² / E, summing over every cell of the table. The resulting total is the χ² value, and we can look up its significance in the χ² tables.
• Obviously, it will always have a positive value because of the squaring of the differences.
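A hedged sketch of both steps, using an invented 2 x 3 table of counts (the full tea-drinker table is not reproduced in these slides):

```python
# Hypothetical contingency table: rows = long/short sleepers,
# columns = low/moderate/high tea drinking (counts invented for illustration).
observed = [
    [30, 42, 28],
    [22, 42, 36],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # expected frequency = (row total x column total) / grand total
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e       # sum of (O - E)^2 / E over all cells

print(round(chi_square, 3))
```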
Alternative measures for contingency
tables
• Why?
– size of the sample, confounding sample size and
effect size
• The simplest measure of effect size, the mean square contingency coefficient (usually denoted by φ²), simply divides χ² by the size of the sample: φ² = χ² / n
In the case of our tea-drinking study, φ² = χ²/n = 10.68/200 = 0.0534, which indicates a very small effect.
Cramer’s V
• φ² is, however, not considered a good measure of
association, largely because it does not generate
scores that fall between 0 and 1 in the same way as a
correlation does
• Nevertheless φ² is used, with some modifications, in
meta-analytic studies
• A measure of association in contingency tables with somewhat better properties is Cramer's V, usually denoted by φc: φc = √( χ² / ( n × (k − 1) ) ), where k is the smaller of the number of rows and the number of columns.
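Using the χ² and n quoted for the tea-drinking study, both effect-size measures are a one-line calculation; the number of rows and columns in that table is assumed here, so the V value is illustrative only:

```python
import math

chi_square = 10.68   # from the tea-drinking example
n = 200              # sample size in that example

phi_squared = chi_square / n                        # mean square contingency coefficient
k = 2                                               # assumed: smaller of (rows, columns) in the table
cramers_v = math.sqrt(chi_square / (n * (k - 1)))   # Cramer's V

print(round(phi_squared, 4), round(cramers_v, 3))   # 0.0534 and roughly 0.23
```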
Odds Ratio
• Unaffected by sample size or by unequal row
or column totals
• 2 x 2 tables
– Collapse over one of the categories to generate a
2 × 2 table.
– Collapsing over categories is in general not a good
idea because it can (sometimes) alter the meaning
of the data, either obscuring or exaggerating the
association between the categories.
Assumptions of the χ² test
There are two assumptions that must be satisfied if a χ² test is to be used appropriately:
1. Expected frequency minimum:
• The number of subjects expected in each cell must reach a
certain minimum
• A rule of thumb that is frequently used is that the expected
frequency should be no less than 5 in at least 80% of the
cells
2. All the items or people involved in the test are
independent of each other:
• Each observation comes from a different subject.
• No subject should be omitted from the table.
Interpreting Chi-Square
Case Processing Summary
Gender * Level_of_satisfaction: Valid N = 100 (100.0%); Missing N = 0 (0.0%); Total N = 100 (100.0%)

Gender * Level_of_satisfaction Crosstabulation (2 x 2), Count
            Satisfied   Dissatisfied   Total
Male            26           26          52
Female          27           21          48
Total           53           47         100

Chi-Square Tests
Pearson Chi-Square:             Value .391a,  df 1,  Asymp. Sig. (2-sided) .532
Continuity Correction b:        Value .181,   df 1,  Asymp. Sig. (2-sided) .671
Likelihood Ratio:               Value .392,   df 1,  Asymp. Sig. (2-sided) .531
Fisher's Exact Test:            Exact Sig. (2-sided) .554,  Exact Sig. (1-sided) .336
Linear-by-Linear Association c: Value .387,   df 1,  Asymp. Sig. (2-sided) .534
N of Valid Cases:               100
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 22.56.
b. Computed only for a 2x2 table
c. The standardized statistic is -.622. (Point Probability = .131.)
Slide callouts: the Linear-by-Linear Association row is the one to read with an ordinal IV; read the Pearson Chi-Square row if the assumptions are met and the exact/corrected rows if the assumptions are violated – so check the assumptions first.
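The output above can be reproduced (approximately) with scipy; a sketch, assuming the 2 x 2 counts read off the crosstabulation:

```python
from scipy.stats import chi2_contingency

# Gender x Level_of_satisfaction counts from the crosstabulation
observed = [[26, 26],    # Male: Satisfied, Dissatisfied
            [27, 21]]    # Female: Satisfied, Dissatisfied

chi2, p, df, expected = chi2_contingency(observed, correction=False)  # Pearson chi-square
chi2_cc, p_cc, _, _ = chi2_contingency(observed, correction=True)     # continuity correction

print(round(chi2, 3), round(p, 3))         # about .391 and .532 - not significant
print(round(chi2_cc, 3), round(p_cc, 3))   # about .181 and .671
print(expected.min())                       # minimum expected count, about 22.56
```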
Decision Making Tree
Number 2…
REGRESSION
Regression
• Paired Data
– Allows us to measure the relationship between
two measures
– Collected from two INDEPENDENT measurements
• Refined way of analysing scatterplots
X-axis: Predictor or IV
Y-axis: Criterion or DV
Trend: Overall shape of plotted points
Best fitting line or Regression line
• Best fitting line that can be drawn through the
points on a scatterplot
• Linear Regression: Straight line
• Non-linear Regression: Curved line
Finding Regression Coefficients
• To define a straight line, 2 pieces of information
are required:
– Slope
– Intercept: the point on the graph where the line crosses the y-axis
The general equation of the line is y = a + bx, where:
y represents the value on the criterion variable (in the slide's example, a percentage of people)
x represents the predictor variable
a and b represent the two pieces of information required to fit the line (i.e. b is the slope and a is the intercept); these are the regression coefficients
Calculating Regression Coefficients
n: the number of pairs of values
Σx: the sum of the x values
Σy: the sum of the y values
Σx²: the sum of the squares of the x values
Σxy: the sum of the products of the paired x and y values
These intermediate values are substituted into the following equations to find the covariance, s_xy, and following this, the slope, b:
s_xy = ( Σxy − (Σx · Σy)/n ) / (n − 1)
b = s_xy / s_x², where s_x² is the variance of the x values
Calculating a
• Having calculated b, we can find the intercept
a.
• The best-fitting line always passes through the 'midpoint' of all the points on the scattergraph, i.e. the point whose coordinates are the means of x and y
• Substitute these mean values into the general equation for the line (y = a + bx) and then rearrange to solve for a: a = ȳ − b·x̄
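A minimal sketch of these calculations on a few hypothetical (x, y) pairs, computing the intermediate sums, the covariance, the slope b and the intercept a exactly as described above:

```python
# Hypothetical paired data (predictor x, criterion y)
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 6, 9, 10, 13]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)    # covariance of x and y
s_x2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)       # variance of x

b = s_xy / s_x2                      # slope
a = sum_y / n - b * (sum_x / n)      # intercept: a = y-bar minus b times x-bar

print(round(b, 3), round(a, 3))
```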
Making predictions
• Regression equation is essentially a
mathematical summary of what we think the
relationship between the two variables might
be
• We can use this mathematical relationship to
make predictions, though not without some
danger of making a mistake
Linear Regression
Regression applied
Descriptive Statistics
                      Mean    Std. Deviation    N
Level_of_alertness   21.35        12.175       55
Cups_of_coffee        2.69         1.942       55
The “descriptives” command
also gives you a correlation
matrix, showing you the
Pearson rs between the
variables (in the top part of
this table).
Correlations
                                         Level_of_alertness   Cups_of_coffee
Pearson Correlation  Level_of_alertness        1.000               .989
                     Cups_of_coffee             .989              1.000
Sig. (1-tailed)      Level_of_alertness           .                .000
                     Cups_of_coffee             .000                  .
N                    Level_of_alertness          55                  55
                     Cups_of_coffee              55                  55
This table tells you what % of
variability in the DV is accounted
for by all of the IVs together (it’s a
multiple R-square). The footnote
on this table tells you which
variables were included in this
equation
Model Summary
Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .989a      .978           .977                   1.826
a. Predictors: (Constant), Cups_of_coffee
ANOVA a
Model 1          Sum of Squares   df   Mean Square      F        Sig.
Regression          7827.647       1     7827.647    2346.666    .000b
Residual             176.789      53        3.336
Total               8004.436      54
a. Dependent Variable: Level_of_alertness
b. Predictors: (Constant), Cups_of_coffee
This table gives you an F-test to determine whether the model is a good fit for the data. According to this p-value, it is.
Regression applied cont…
Coefficients a
Model 1            Unstandardized Coefficients    Standardized Coefficients
                     B          Std. Error               Beta                 t        Sig.
(Constant)         4.666           .423                                    11.024     .000
Cups_of_coffee     6.198           .128                  .989              48.442     .000
a. Dependent Variable: Level_of_alertness
Finally, here are the beta coefficients – one to go with each predictor.
(Use the "unstandardized coefficients," because the constant [beta zero] is included.)
Based on this table, the equation for the regression line is:
y = 4.666 + 6.198 (cups of coffee)
Using this equation, given values for “cups of coffee,” you can come up with a
prediction for the “level of alertness” variable
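As a brief illustration of that prediction step (the coefficients are the ones reported in the Coefficients table; the cup counts below are hypothetical):

```python
def predict_alertness(cups_of_coffee: float) -> float:
    """Predicted level of alertness from the fitted line y = 4.666 + 6.198x."""
    return 4.666 + 6.198 * cups_of_coffee

for cups in [0, 2, 5]:                            # hypothetical predictor values
    print(cups, round(predict_alertness(cups), 2))
```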
So what’s the problem?
• The regression line is a useful statement of the
underlying trend, but it tells us nothing about
the strength of the relationship.
• Correlation is a measure of the strength of
linear association between two variables
Decision Making Tree
Number 3……
CORRELATION
Correlations
• Useful to gauge the strength of a relationship
by looking at a scatterplot
• More formal manners….. Correlation!
Parametric Correlations
Product-moment coefficient of correlation
OR
Pearson’s correlation coefficient
Correlations
• Calculated on the basis of how far the points
lie from the ‘best-fit’ regression line
• Symbolised by the small letter r
• r will fall within the range –1 to +1.
• –1 means a perfect negative correlation (a perfect
inverse relationship, where, as the value of x rises, so
the value of y falls)
• +1 means a perfect positive correlation (where the
values of x and y rise or fall together)
• An r of 0 means zero correlation, which means that
there is no relationship between x and y
Calculating r
r = s_xy / (s_x × s_y), where:
x is the variable on the horizontal axis
y is the variable on the vertical axis
s_x and s_y are the standard deviations of x and y, respectively
s_xy is the covariance between x and y
Strength of correlation
0.0 to 0.2 – Very weak to negligible correlation
0.2 to 0.4 – Weak, low correlation (not very significant)
0.4 to 0.7 – Moderate correlation
0.7 to 0.9 – Strong, high correlation
0.9 to 1.0 – Very strong correlation
Non-Parametric Correlations
Spearman’s Rho
• Spearman’s ƿ is a statistic for measuring the
relationship between two variables
• It is a non-parametric measure avoids assumptions that the variables have a
straight-line relationship
• Used when one or both measures
is measured on ordinal scale.
Spearman’s Rho
• A value of 0 indicates no relationship and valu
es of +1 or 1 indicate a one-toone relationship between the variables or ‘per
fect correlation’
• The difference is that Spearman’s rho refers to
the ranked values rather than the original
measurements.
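Both coefficients are available in scipy; a sketch with hypothetical paired scores, contrasting the raw-value (Pearson) and rank-based (Spearman) measures:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired observations
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 7, 8, 6, 9]

r, p_r = pearsonr(x, y)          # works on the raw measurements
rho, p_rho = spearmanr(x, y)     # works on the ranked values

print(round(r, 3), round(p_r, 3))
print(round(rho, 3), round(p_rho, 3))
```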
Correlations Applied
Correlations
                                             Cups_of_coffee   Level_of_alertness
Cups_of_coffee        Pearson Correlation          1                .989**
                      Sig. (2-tailed)                               .000
                      N                           55                  55
Level_of_alertness    Pearson Correlation        .989**                1
                      Sig. (2-tailed)            .000
                      N                           55                  55
**. Correlation is significant at the 0.01 level (2-tailed).
Significant (two-tailed); strong, positive correlation.
Decision Making Tree
Up next……..
DIFFERENCES
But first…
SOME KEY CONCEPTS
Key Concepts - Effect Size
• Effect size: measures of how strong the
association is between the two sets of
categories that define a table
Effect Size
• Effect size is an estimate of the proportion of
total variance explained by differences among the
treatment means, and is thus an indication of the
strength of the effect.
• The meaning of effect size is evident in the formula to compute eta-squared (η²), a widely used index of effect size: η² = SS(between groups) / SS(total)
• Although η² is a biased estimate of effect size, it is simple to calculate by hand and quite easy to understand
Degrees of freedom
• Degrees of freedom are commonly discussed in
relation to chi-square and other forms of hypothesis
testing statistics
• A degree of freedom is each of a number of independently variable factors affecting the range of states in which a system may exist
• Degrees of freedom are then used to determine whether a particular null hypothesis can be rejected, based on the number of variables and samples in the experiment. For example, while a sample size of 50 students might not be large enough to obtain significant information, obtaining the same results from a study of 500 samples can be judged as being valid.
df
• The concept of degrees of freedom is central
to the principle of estimating statistics of
populations from samples of them.
• "Degrees of freedom" is commonly
abbreviated to df.
• Think of df as a mathematical restriction that
needs to be put in place when estimating one
statistic from an estimate of another.
Parametric vs. non-Parametric Tests
• All statistical tests need to estimate
probabilities, and if distribution-free tests do
not use the well-understood characteristics of
the normal curve, how do they estimate
probabilities?
• Most distribution-free tests use either the
characteristics of ranked data or they use
randomisation procedures to calculate
probabilities.
Parametric Tests
1. The assumption of normality
• It is assumed that all the samples you are analysing have been drawn from
populations that are normally distributed.
• You can get a rough idea if data is normally distributed by drawing a
histogram of the data and examining the shape of the distribution.
• If the histogram has a bell shape, then it is probably normally distributed.
2. The assumption of homogeneity of variance
• If your samples have variances that are highly different, then it is difficult
to get accurate results from a t-test
• This can be formally checked for, but is quite complex. We can ‘cheat’ and
say that if the two variances differ by a factor of less than 4, the variance is
probably homogenous. This is a rule of thumb, so it is not perfect, but
seems to work a lot of the time.
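Both assumptions can also be checked formally; a hedged sketch on hypothetical groups, using the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance:

```python
from scipy.stats import shapiro, levene

# Hypothetical scores for two groups
group_a = [12, 15, 14, 10, 13, 16, 12, 11, 14, 15]
group_b = [22, 19, 24, 21, 23, 20, 25, 22, 21, 24]

for name, group in [("A", group_a), ("B", group_b)]:
    stat, p = shapiro(group)          # p > .05 suggests normality is plausible
    print(name, round(stat, 3), round(p, 3))

stat, p = levene(group_a, group_b)    # p > .05 suggests the variances are homogeneous
print("Levene:", round(stat, 3), round(p, 3))
```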
More assumptions
The assumption of Independence
• The majority of t-tests (with the exception of
the repeated measures t-test) assume that the
samples the means were calculated from did
not influence each other’s scores in any way.
For example, if you collect two datasets from
the same group of people (as in a pretest/post-test design), then these two
datasets are not independent.
Assumption of Normality
Tests of Normality
1. Department
2. Gender
3. Highest level of education
4. Age:
5. Length of service at Mhlathuze Water:
6. Marital Status
7. How satisfied are you, working at Mhlathuze Water?
8. How satisfied are you working in your current department?
9.1 “Mhlathuze Water is an Employer of Choice” when speaking to your friends and
family about your employer?
9.2 "Mhlathuze Water provides you with job security"?
9.3 Mhlathuze Water does a good job of placing competent people in key positions.
10.1 Health and Safety is a key priority for management and staff at Mhlathuze
Water.
10.2 The work environment at Mhlathuze Water is non-discriminatory and promotes diversity
10.3 The organisational policies promote a healthy and conducive work
environment.
10.4 Mhlathuze Water respects and supports employees trying to balance work,
career and their personal life.
10.5 Employee social activities, team buildings and sport and corporate events
promote a positive, community like work environment which is pleasant to
employees.
10.6 The work environment at Mhlathuze Water is a key factor in retaining your
services at Mhlathuze Water
11.1 Mhlathuze Water provides opportunities for training and development for its
employees
11.2 Mhlathuze Water is committed to the long term career development of its
employees?
11.3 Gaps in employee performance have been addressed through training and
development initiatives.
11.4 Supervisors play a key role in training and development.
11.5 The provisions of Training and Development policies are clearly communicated
to and promoted by management to employees.
11.6 Training, development and career advancement opportunities are key factors in
retaining your services at Mhlathuze Water.
12.1 Your supervisor enables you to perform at your best
12.2 Your supervisor provides you with regular feedback and is clear about what
he/she expects from you in terms of performance
12.3 Your supervisor is able to address questions and concerns when they arise
12.4 Your supervisor is fair in the application of company policies and procedures
12.5 Your supervisor practices and encourages open communication and
information sharing
12.6 Your supervisor has a positive impact/influence in management being able to
retain your services at Mhlathuze Water.
12.7 You can trust the leadership to lead the organisation towards the attainment of
the vision
12.8 The leadership of the organisation promotes the values of the organisation
12.9 There is good communication from the leadership to employees
12.10 The leadership does a good job of aligning the organisation’s objectives to
individual performance
12.11 The leadership recognises and values employees for their contribution to the
organisation.
12.12 The leadership style at Mhlathuze Water is a factor that contributes towards
the retention of your services with the organisation.
13.1 The salary you receive is competitive with similar jobs in other organisations
13.2 Mhlathuze Water benefits sufficiently meets your needs
13.3 The PMS system is open, transparent and fair, rewarding those who have
performed.
Kolmogorov-Smirnov and Shapiro-Wilk tests were run for every item above, each with df = 88. The Kolmogorov-Smirnov statistics range from about .15 to .35 and the Shapiro-Wilk statistics from about .64 to .93, and the Sig. value is .000 for every item under both tests, so the assumption of normality is rejected for all of the variables.
Assumption of Homogeneity of
Variance
Non-Parametric Tests
• Non-parametric tests are used when
assumptions of parametric tests are not met:
– level of measurement (e.g. interval or ratio data);
– normal distribution; and
– homogeneity of variances across groups
• They make fewer assumptions about the type
of data on which they can be used
• Many of these tests will use “ranked” data
Decision Making Tree
Next up…
Z-TESTS AND T-TESTS
z-and t-tests
• The z-test is used to determine whether a sample mean differed from a
population mean.
• t-tests are used to determine the difference between means in
situations where we have to estimate the population standard deviation
from sample data.
• The difference between the two tests is that with z-tests the population parameters (σ and µ) are known, whereas with t-tests they are unknown.
• The one-sample t-test uses a similar formula to the z-test, but the
standard error is estimated from the sample standard deviation.
• The aim of the t-test is to compare distributions that are normally distributed. We can represent such distributions with a bell curve.
• The t-test formula always has the same general form: t = (difference between the means) / (estimated standard error of that difference)
Assumptions of the z- and t-tests
1. The assumption of normality
2. The assumption of homogeneity of variance
3. The assumption of independence
Different types of t-tests…
1. One-sample t-test
• Standard error is estimated from the sample standard
deviation.
2. Independent samples t-test
• Used to compare two distributions that are independent of
each other
• Suitable in most situations where you have created two
separate groups by random assignment.
• It is not necessary to have equal sample sizes for your
samples.
• It is quite important to ensure that the assumption of
homogeneity of variance is not violated for this test
Types of t-tests
3. Repeated measures t-test
• used to compare means when the samples are
not independent. It is also known as the
related samples t-test
T-tests Applied
One-Sample Statistics
                       N     Mean    Std. Deviation   Std. Error Mean
Cups_of_coffee        55     2.69        1.942             .262
Level_of_alertness    55    21.35       12.175            1.642

One-Sample Test (Test Value = 0)
                        t       df   Sig. (2-tailed)   Mean Difference   95% CI Lower   95% CI Upper
Cups_of_coffee       10.274     54        .000              2.691            2.17           3.22
Level_of_alertness   13.002     54        .000             21.345           18.05          24.64
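The same kind of output can be produced with scipy; a sketch on hypothetical raw scores (the slide's raw data are not available), testing against a population value of 0 as SPSS did:

```python
from scipy.stats import ttest_1samp

cups_of_coffee = [0, 1, 2, 2, 3, 4, 5, 1, 3, 6, 2, 4]   # hypothetical sample

result = ttest_1samp(cups_of_coffee, popmean=0)          # Test Value = 0
print(round(result.statistic, 3), round(result.pvalue, 4))

ci = result.confidence_interval(confidence_level=0.95)   # 95% CI (needs a recent scipy)
print(round(ci.low, 2), round(ci.high, 2))
```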
Decision Making Tree
Next up…
ANOVA
ANOVA – Analysis of Variance
• ANOVA is used to test for differences between the
means of more than two groups
• Allows us to test the difference between more
than two groups of subjects and the influence of
more than one independent variable
• Because we are examining a set of possible
differences, instead of testing for a difference
between two means, we test for an effect.
• A significant effect is present in the data when at
least one of the possible comparisons between
group means is significant.
How does it work?
• As the name suggests, the procedure involves
analysing variance.
• Variance is a measure of the dispersion in a set of scores, and is calculated by determining the 'average distance' of a set of scores from its 'centre' or mean, by the formula s² = Σ(x − x̄)² / (n − 1)
ANOVA cont…
• In ANOVA terminology, the independent variables are
called factors.
• Instead of talking about variance, in ANOVA
terminology we talk about Mean Squares (abbreviated
to MS).
• This is essentially what variance is – the mean or
average of the sum of squared differences between
each score in a set of scores and the mean of those
scores.
• In ANOVA we need to distinguish between, and estimate, two different types of variance – random/error variance, and systematic variance.
Error variance and systematic
variance
• Error variance is random or unexplained variance between the means of samples drawn from the same population.
• Such variance between sample means is unexplained, random variance, and is also commonly known as error variance.
• Systematic variance is the variance in a set of
scores that we can explain in terms of the
independent variable.
The whole aim of computing ANOVA is to determine whether there is
systematic variance present. If there is systematic variance present in a
dataset, we have a significant effect.
Detecting systematic variance
• To determine whether or not there is
systematic variance present in a dataset, we
have to follow a rather indirect path by
comparing the variance within the groups to
the variance between the groups.
If you compare the distribution of scores within the cells of Dataset 2 with the differences between the means, you will note a very different pattern.
• Little variance within the cells (look at the range of scores within each cell), but
there are large differences between the cell means.
• Here it appears as though there may be systematic differences between the groups,
since, although there is error variance present, it appears to be relatively small in
comparison with the differences between the group means.
It is quite likely that a significant effect would be found for this pattern of data since the
difference in group means is large in comparison with the error variance between
individual scores within the groups.
This is a situation where the null hypothesis, H₀: μ1 = μ2 = μ3, is very likely to be false.
• If the variance between the group means (error
variance + systematic variance) is much greater than
the variance within the cells (error variance), then this
must be due to the presence of systematic variance
• In technical language, the variance within the cells is
known as MSError - estimate of error variance.
• The variance between the groups is known as
MSGroup - estimate of error variance plus systematic
variance.
• To determine whether an effect is present in an
ANOVA, we should estimate mathematically the size of
MSGroup and MSError, and then compare them. To the
extent that MSGroup (error variance + systematic
variance) is larger than MSError (error variance), it is
likely that there is a significant effect.
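That comparison of MSGroup with MSError is exactly what the F ratio expresses; a minimal sketch with three hypothetical groups, using scipy's one-way ANOVA:

```python
from scipy.stats import f_oneway

# Hypothetical scores for three independent groups
group_1 = [12, 14, 11, 13, 12, 15]
group_2 = [18, 20, 19, 21, 17, 20]
group_3 = [25, 27, 24, 26, 28, 25]

f_stat, p_value = f_oneway(group_1, group_2, group_3)   # F = MS(between) / MS(within)
print(round(f_stat, 2), round(p_value, 5))               # a small p indicates a significant effect
```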
Post-hoc tests
• Useful for determining precisely where the
differences between the means lie
• Tukey’s Honestly Significantly Difference test
(HSD):
– The HSD statistic is a critical range applied to pairwise comparisons between
groups. What this means is that if any of the differences between the group
means is greater than this critical range, we can conclude that there are
significant differences between these groups.
Multiple Comparisons
Dependent Variable: Stress_Levels, Tukey HSD
(I) Subject   (J) Subject   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Statistics    Management        -10.50000*            1.48499    .000     -14.1819        -6.8181
Statistics    HR                -22.20000*            1.48499    .000     -25.8819       -18.5181
Management    Statistics         10.50000*            1.48499    .000       6.8181        14.1819
Management    HR                -11.70000*            1.48499    .000     -15.3819        -8.0181
HR            Statistics         22.20000*            1.48499    .000      18.5181        25.8819
HR            Management         11.70000*            1.48499    .000       8.0181        15.3819
*. The mean difference is significant at the 0.05 level.
There is a significant difference in the mean student stress levels between the Statistics and Management subjects (p < 0.0001) and between Statistics and HR (p < 0.0001). There is also a significant difference in stress levels between Management and HR (p < 0.0001).
Factorial analysis of
variance
• Used for research designs that have more than
one independent variable.
Why use factorial designs?
• Factorial designs are preferable to one-way
designs for three related reasons:
1. They are realistic, capturing the complexity of social and
psychological phenomena.
2. They allow us to analyse interactions between variables.
3. They are economical, allowing many hypotheses to be
tested simultaneously.
Assumptions
1. Normality
• The populations represented by the data should be normally distributed,
making the mean an appropriate measure of central tendency.
• Estimate the distribution of the parent populations from the data at hand.
When we have small cell numbers, therefore, we should tolerate
deviations from normality, appreciating that our estimates are unreliable.
• In addition, ANOVA is a robust statistical procedure: the assumption of
normality can be violated with relatively minor effects. Nevertheless,
ANOVA is inappropriate in situations where you have unequal cell sizes
and distributions skewed in different directions.
2. Homogeneity of variance
• The populations from which the data are sampled should have the same
variance. With balanced designs (i.e. equal numbers of subjects per cell)
this assumption can be violated without major effects on the final results.
ANOVA Applied…
Descriptives: Level_of_alertness
Cups    N    Mean    Std. Deviation   Std. Error   95% CI Lower   95% CI Upper   Minimum   Maximum
0        5    4.80        .837            .374          3.76           5.84          4         6
1       14   11.36       1.216            .325         10.66          12.06         10        14
2       12   17.17       1.267            .366         16.36          17.97         15        19
3        6   21.67       1.366            .558         20.23          23.10         20        24
4        6   28.00       2.098            .856         25.80          30.20         24        30
5        5   36.20       3.899           1.744         31.36          41.04         32        40
6        6   43.00        .894            .365         42.06          43.94         42        44
7        1   48.00          .                .             .              .          48        48
Total   55   21.35      12.175           1.642         18.05          24.64          4        48
Test of Homogeneity of Variances: Level_of_alertness
Levene Statistic   df1   df2   Sig.
     6.875a          6    47   .000
a. Groups with only one case are ignored in computing the test of homogeneity of variance for Level_of_alertness.
Levene’s is significant – homogeneity
cannot be assumed
ANOVA Applied 2…
ANOVA: Level_of_alertness
                  Sum of Squares   df   Mean Square      F       Sig.
Between Groups       7868.622       7     1124.089    389.003    .000
Within Groups         135.814      47        2.890
Total                8004.436      54
Significant result
(p < 0.0001)
Decision Making Tree
Repeated Measures ANOVA
• Equivalent to the one-way ANOVA, except:
– For related, not independent groups
• Extension of the repeated t-test
• Test to detect any overall differences between related means
• Requires one IV and one DV
• DV = continuous (interval or ratio)
• IV = categorical (either nominal or ordinal)
• Because data is tested more than once, the assumption of independence is not relevant
• Makes the assumption of Sphericity
– relationship between pairs of experimental conditions is similar i.e. the level
of dependence between pairs of groups is equal
– SPSS tests for this through Mauchly’s test for Sphericity
– If the assumption of Sphericity is not met i.e. violated: use a correction factor
Epsilon(𝜀)
• 𝜀 > 0.75, then use Huynh-Feldt
• 𝜀 < 0.75, then use the Greenhouse-Geisser
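A tiny helper expressing that rule of thumb (the 0.75 threshold is the one quoted above; which corrected row you read still comes from the SPSS output itself):

```python
def sphericity_correction(epsilon: float) -> str:
    """Suggest which corrected row to read when sphericity is violated."""
    return "Huynh-Feldt" if epsilon > 0.75 else "Greenhouse-Geisser"

print(sphericity_correction(0.638))   # Greenhouse-Geisser (epsilon below 0.75)
print(sphericity_correction(0.80))    # Huynh-Feldt
```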
Repeated measures ANOVA
Mauchly's Test of Sphericity a (Measure: MEASURE_1)
Within Subjects Effect: Time
Mauchly's W = .434, Approx. Chi-Square = 3.343, df = 2, Sig. = .188
Epsilon b: Greenhouse-Geisser = .638, Huynh-Feldt = .760, Lower-bound = .500
Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
a. Design: Intercept; Within Subjects Design: Time
b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.
Mauchly’s test of Sphericity has been met (χ² = 3.343, df = 2, p = 0.188)
Tests of Within-Subjects Effects (Measure: MEASURE_1)
Source: Time
  Sphericity Assumed:   Type III SS = 143.444, df = 2,     Mean Square = 71.722,  F = 12.534, Sig. = .002, Partial Eta Squared = .715, Noncent. Parameter = 25.068, Observed Power a = .975
  Greenhouse-Geisser:   Type III SS = 143.444, df = 1.277, Mean Square = 112.350, F = 12.534, Sig. = .009, Partial Eta Squared = .715, Noncent. Parameter = 16.003, Observed Power = .886
  Huynh-Feldt:          Type III SS = 143.444, df = 1.520, Mean Square = 94.351,  F = 12.534, Sig. = .005, Partial Eta Squared = .715, Noncent. Parameter = 19.056, Observed Power = .930
  Lower-bound:          Type III SS = 143.444, df = 1.000, Mean Square = 143.444, F = 12.534, Sig. = .017, Partial Eta Squared = .715, Noncent. Parameter = 12.534, Observed Power = .806
Source: Error(Time)
  Sphericity Assumed:   SS = 57.222, df = 10,    Mean Square = 5.722
  Greenhouse-Geisser:   SS = 57.222, df = 6.384, Mean Square = 8.964
  Huynh-Feldt:          SS = 57.222, df = 7.602, Mean Square = 7.528
  Lower-bound:          SS = 57.222, df = 5.000, Mean Square = 11.444
a. Computed using alpha = .05
There is a significant difference in safety behaviours as a result of having
undergone safety training (F(2, 10) = 12.534, p = 0.002).
Pairwise Comparisons (Measure: MEASURE_1)
(I) Time   (J) Time   Mean Difference (I-J)   Std. Error   Sig. b   95% CI Lower b   95% CI Upper b
1           2               -2.500              1.522      .484        -7.879            2.879
1           3               -6.833*             1.701      .030       -12.846            -.821
2           1                2.500              1.522      .484        -2.879            7.879
2           3               -4.333*              .715      .005        -6.860           -1.807
3           1                6.833*             1.701      .030          .821           12.846
3           2                4.333*              .715      .005         1.807            6.860
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.
The differences lie between the pre-training and 6 months (p = 0.030),
and between 3 and 6 months (p = 0.005).
Decision Making Tree
Next Up……
NON-PARAMETRIC TESTS
Mann-Whitney U-test
• Perhaps the most common distribution-free
test for differences between unrelated
samples
• Used for research designs similar to those for
which the independent samples t-test is used
• This means that it can be used whenever you
have two groups of scores that are
independent of each other
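A sketch of the equivalent scipy call, using hypothetical completion times for two independent groups:

```python
from scipy.stats import mannwhitneyu

# Hypothetical completion times (minutes) for two independent groups
males = [21, 25, 19, 30, 28]
females = [24, 27, 33, 29, 35]

u_stat, p_value = mannwhitneyu(males, females, alternative="two-sided")
print(u_stat, round(p_value, 3))      # report as U = ..., p = ...
```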
Test Statistics a
                                   Time_taken
Mann-Whitney U                        6.500
Wilcoxon W                           21.500
Z                                    -1.261
Asymp. Sig. (2-tailed)                 .207
Exact Sig. [2*(1-tailed Sig.)]         .222b
a. Grouping Variable: Gender
b. Not corrected for ties.
You can see that Sig. is > 0.05 in both the Asymp. and Exact Sig. rows, which means that there is no significant difference in time taken based on respondents' gender. You report your findings as: U = 6.500, p = 0.207.
Kruskal-Wallis test
• Test the difference between three or more
groups in much the same way as ANOVA does
in parametric statistical procedures
• Extension of the Mann-Whitney U-test for
three or more independent samples
• Omnibus test for the equality of independent
population medians
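A minimal sketch with three hypothetical independent samples:

```python
from scipy.stats import kruskal

# Hypothetical alertness scores for three coffee-consumption groups
low = [5, 7, 6, 8, 5]
medium = [12, 15, 14, 11, 13]
high = [20, 22, 19, 24, 21]

h_stat, p_value = kruskal(low, medium, high)   # omnibus test on the ranked scores
print(round(h_stat, 3), round(p_value, 4))
```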
Kruskal-Wallis Applied…
Ranks: Level_of_alertness by Cups_of_coffee
Cups_of_coffee    N    Mean Rank
1                19      10.00
2                12      25.50
3                 6      34.58
4                 6      40.42
5                 5      46.00
6                 6      51.50
7                 1      55.00
Total            55

Test Statistics a,b: Level_of_alertness
Chi-Square     51.055
df                  6
Asymp. Sig.      .000
a. Kruskal Wallis Test
b. Grouping Variable: Cups_of_coffee
Significant result (p < 0.0001)
Related Samples: The sign test
• Related samples occur when the same group
of people is measured more than once, such
as in ‘before and after’ research designs.
• Non-parametric equivalent of related samples
t-test
• Considers the difference between two related
samples
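scipy has no dedicated sign test, but the same result can be obtained from an exact binomial test on the counts of positive and negative differences; a sketch with hypothetical before/after times:

```python
from scipy.stats import binomtest

# Hypothetical paired times (before, after); ties are discarded, as in the sign test
before = [25, 30, 22, 28, 26, 31, 27, 24, 29, 23]
after  = [22, 27, 22, 25, 24, 28, 25, 21, 29, 20]

diffs = [a - b for b, a in zip(before, after) if a != b]
negatives = sum(d < 0 for d in diffs)

# Under H0, positive and negative differences are equally likely (p = 0.5)
result = binomtest(negatives, n=len(diffs), p=0.5)
print(negatives, len(diffs), round(result.pvalue, 3))
```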
Frequencies
Time_taken2 - Time_taken1        N
Negative Differences a           8
Positive Differences b           0
Ties c                           2
Total                           10
a. Time_taken2 < Time_taken1
b. Time_taken2 > Time_taken1
c. Time_taken2 = Time_taken1
You can see how many participants decreased (the "Negative Differences" row), improved (the "Positive Differences" row) or witnessed no change (the "Ties" row) in their performance between the two trials.
Test Statistics a
                         Time_taken2 - Time_taken1
Exact Sig. (2-tailed)             .008b
a. Sign Test
b. Binomial distribution used.
The statistical significance of the sign test is found in the "Exact Sig. (2-tailed)" row of the table above. However, if you had more than a total of 25 positive and negative differences, an "Asymp. Sig. (2-sided test)" row would be displayed instead. You report your findings as:
An exact sign test was used to compare the differences in the speed with which the two trials were completed. The respondents elicited a statistically significant median decrease in time between the two tests (p = 0.008).
Related samples: The Wilcoxon
matched pairs test
• Similar to the sign test except that when we
have obtained the difference scores between
the two samples we must rank-order the
differences, ignoring the sign of the difference
• Tests whether two related samples have the
same median
• More powerful than the sign test.
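The equivalent scipy call, sketched with the same style of hypothetical before/after times:

```python
from scipy.stats import wilcoxon

# Hypothetical paired times (before, after) for the same respondents
before = [25, 30, 22, 28, 26, 31, 27, 24, 29, 23]
after  = [22, 27, 21, 25, 24, 28, 25, 21, 27, 20]

stat, p_value = wilcoxon(before, after)   # ranks the differences, ignoring their sign
print(stat, round(p_value, 3))            # report alongside the Z value SPSS gives
```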
Ranks
Time_taken2 - Time_taken1        N     Mean Rank   Sum of Ranks
Negative Ranks                   8a       4.50         36.00
Positive Ranks                   0b        .00           .00
Ties                             2c
Total                           10
a. Time_taken2 < Time_taken1
b. Time_taken2 > Time_taken1
c. Time_taken2 = Time_taken1
The Ranks table provides some interesting data on the comparison of participants' before (pre) and after (post) times for completing the test. We can see from the table's legend that 8 participants had a higher pre-intervention time than post-intervention time, 0 participants had a higher time after the intervention, and 2 participants saw no change in their time taken.
Test Statistics a
                         Time_taken2 - Time_taken1
Z                                -2.536b
Asymp. Sig. (2-tailed)             .011
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
We report it as follows:
The Wilcoxon signed-rank test revealed that
the intervention yielded a statistically
significant difference in the time taken to
complete the test (Z = -2.536, p = 0.011).
Three or more groups of scores: Friedman
(the economist’s) rank test for related
samples
• Friedman test is an extension of the Wilcoxon test for
three or more related samples
• Analogue of one-way repeated measures analysis of
variance
• It is used for the analysis of within-subjects designs
where more than two conditions are being compared
• In general, the degrees of freedom for an estimate is
equal to the number of values minus the number of
parameters estimated en route to the estimate in
question
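A sketch of the scipy equivalent, with three related (within-subjects) sets of hypothetical times:

```python
from scipy.stats import friedmanchisquare

# Hypothetical times for the same 10 people measured on three occasions
time_1 = [23, 26, 29, 22, 25, 28, 24, 27, 21, 30]
time_2 = [21, 24, 26, 20, 23, 25, 22, 24, 19, 27]
time_3 = [19, 22, 24, 18, 21, 23, 20, 22, 17, 25]

chi2, p_value = friedmanchisquare(time_1, time_2, time_3)
print(round(chi2, 3), round(p_value, 5))   # report as chi-square(df) = ..., p = ...
```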
Ranks
               Mean Rank
Time_taken1       2.90
Time_taken2       1.85
Time_taken3       1.25

Descriptive Statistics
               N     Mean      Std. Deviation   Minimum   Maximum    25th      50th (Median)    75th
Time_taken1   10   22.5000        5.46199         15.00     30.00   18.2500       21.0000      27.7500
Time_taken2   10   20.2000        4.10420         15.00     28.00   16.5000       20.5000      22.5000
Time_taken3   10   18.9000        4.74810         12.00     29.00   15.0000       18.0000      22.0000
The Friedman test compares the mean ranks between the related groups and indicates how
the groups differed, and it is included for this reason. However, you are not very likely to
actually report these values in your results section, but most likely will report the median
value for each related group.
Test Statistics a
N              10
Chi-Square     15.943
df              2
Asymp. Sig.    .000
a. Friedman Test
If you look at the contents of this table – you
can see that Sig. is significant (< 0.05),
therefore there is a significant difference in the
amount of time taken to complete the three
tests (χ² (2) = 15.943, p < 0.0001).
Some considerations
• Extraneous variables
– These are external and uncontrolled variables that impact on a relationship. They interfere in multiple ways:
– Where the change in the DV is attributable to a variable other than the IV (the third-variable problem)
– Results are "confounded" by interactions between the DVs
– Additional variables outside of the "experiment" enter the experimental condition and impact on the results
• Confounding or "third" variables
– This is where a third variable could be accountable for the observed relationship between two variables
Moderator variables
Moderator variables affect the strength of the relationship between two variables.
Mediator variables
A mediator variable carries the relationship between two variables: the IV affects the mediator, which in turn affects the DV.