Introduction to Statistics
Class Overheads
for
APA 3381 - part 2
“Measurement and Data Analysis
in Human Kinetics”
by
D. Gordon E. Robertson, PhD, FCSB
School of Human Kinetics
University of Ottawa
Copyright © D.G.E. Robertson, October 2015
Hypothesis Testing
Hypothesis: conjecture, proposition or statement based on
published literature, data or a theory which may or may not be
true.
Statistical Hypothesis: conjecture about a population
parameter.
• usually stated in mathematical terms
• two types, null and alternate
Null Hypothesis (H0): states that there is NO
difference between a parameter and a specific value or
among several different parameters
Alternate Hypothesis (H1): states that there is a
“significant” difference between a parameter and a
specific value or among several different parameters
Examples:
• H0: μ = 82 kg    H1: μ ≠ 82 kg*
• H0: μ ≤ 150 cm   H1: μ > 150 cm
• H0: μ ≥ 65.0 s   H1: μ < 65.0 s
• H0: μ0 = μ1      H1: μ0 ≠ μ1*
• H0: μ0 ≥ μ1      H1: μ0 < μ1
Notice that the equality symbols are always with the
null hypotheses.
* These are called two-tailed tests; others are all
“directional” or one-tailed tests.
Two-tailed vs One-tailed Tests
Two-tailed: also called a non-directional test.
• null hypothesis is "disproved" if the sample mean falls in either tail
• most appropriate test, especially with no previous experimentation
• less powerful than one-tailed
One-tailed: also called a directional test.
• researcher must have a reason that permits selecting in which tail the test will be done, i.e., will the experimental protocol increase or decrease the sample statistic
• more powerful than two-tailed since it is easier to achieve a significant difference
• fails to handle the situation when the sample mean falls in the "wrong" tail
[Figures: critical regions for a left-tailed and a right-tailed (one-tailed) test]
Statistical Testing
To determine the veracity (truth) of a hypothesis, a statistical test must be undertaken that yields a test value. This value is then evaluated to determine whether it falls in the critical region of an appropriate probability distribution for a given significance or alpha (α) level.
The critical region is the region of the probability distribution in which the null hypothesis is rejected. Its limit(s), called the critical value(s), are defined by the specified confidence level.
The confidence level must be selected in advance of computing the test value. To do otherwise is statistical dishonesty. When in doubt one should always use a two-tailed test.
Truth table:

                               H0 is true and         H0 is false and
                               H1 is false            H1 is true
Test rejects H0                Error (α)              Correct (1 – β)
(accepts H1)                   Type I error           (experiment succeeded)
Test does not reject H0        Correct (1 – α)        Error (β)
(accepts H0)                   (experiment failed)    Type II error
z-Test and t-Test
Test for a Single Mean:
• used to test a single sample mean (x̄) when the population mean (μ) is known
• Is the sample taken from the population or is it different (greater, lesser, or either)?
z-Test:
• when the population s.d. (σ) is known
Test value:  z = (x̄ – μ) / (σ / √n)
• if z is in the critical region defined by the critical value(s), then the sample mean is "significantly different" from the population mean, μ
• if the sample size is less than 30, the data must be normally or approximately normally distributed
t-Test:
• if σ is unknown, use s with the t-test and t-distribution with d.f. = n – 1
• if the sample size is less than 30, the data must be normally or approximately normally distributed
Test value:  t = (x̄ – μ) / (s / √n)
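For illustration only (not from the original overheads), a minimal Python sketch of the single-mean tests using scipy.stats; the sample values and the assumed σ are made up:

    # Hypothetical example: does the sample mean differ from mu = 82 kg?
    import math
    from scipy import stats

    sample = [79.5, 84.2, 81.0, 86.3, 80.1, 83.7, 78.9, 85.0]   # made-up data
    mu = 82.0

    # t-test (population sigma unknown): returns t and the two-tailed P-value
    t_value, p_value = stats.ttest_1samp(sample, popmean=mu)

    # z-test (population sigma known; sigma = 3.0 assumed for illustration)
    sigma = 3.0
    mean = sum(sample) / len(sample)
    z_value = (mean - mu) / (sigma / math.sqrt(len(sample)))
    print(t_value, p_value, z_value)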
Flow Diagram for Choosing
the Correct Statistical Test
Same as flow diagram used for confidence intervals. Generally
the sample’s mean and standard deviation are used with the
t-distribution. The t-distribution becomes indistinguishable from
the z-distribution (normal distribution) when n is 30 or greater.
P-values of a Statistical Test
Instead of reporting significance levels (α = 0.05) or equivalent probabilities (P < 0.05), many researchers report the test values as probabilities or P-values (e.g., P = 0.0455, P = 0.253, P < 0.001).
Advanced statistical programs report P-values; if not, use P < 0.05 or P < 0.01. If the test shows P = 0.000, then report P < 0.0005.
Power of a Statistical Test
Power: ability of a statistical test to detect a real difference.
• probability of rejecting the null hypothesis when it is false (i.e., there is a real difference)
• equal to 1 – β (1 – probability of a Type II error)
Ways of increasing power
• Increasing α will increase power, but it also increases the chance of a Type I error (usually not > 0.10).
• Increasing sample size is always a good choice, but costs increase.
• Using ratio or interval data versus nominal or ordinal. Tests involving ratio/interval data are called parametric tests. Tests involving nominal and ordinal data are called nonparametric tests. Parametric tests are more powerful. Use them when you can.
• Using repeated-measures tests, such as the repeated-measures t-test or repeated-measures ANOVA. By using the same subjects repeatedly, variability is reduced. But subjects could improve because of practice or become worse because of fatigue or boredom.
• If variances are equal, use pooled estimates and the appropriate test.
• Using samples that represent extremes. Reduces the generalizability of the experiment's results.
• Standardizing testing procedures reduces variability.
• Using one-tailed vs. two-tailed tests. A serious problem occurs if the results are in the wrong tail. Not recommended.
Testing Differences between Two Means
Large Independent Sample Means: used to test whether the
data from two samples come from the same population or two
different populations.
Assumptions:
• data were randomly sampled
• samples are independent, i.e., there can be no relationship between the two samples
• standard deviations are known and, if sample size < 30, population(s) is/are normally distributed
• if more than two sample means are tested, adjustments must be made to the significance level (e.g., Bonferroni correction, αBonferroni = α / number of tests)
z-test:
Test value:  z = (x̄1 – x̄2) / √(σ1²/n1 + σ2²/n2)
Critical value comes from the standard normal (z) distribution. Use a one- or two-tailed test. Conservatively, choose the two-tailed test. Values are also available at the bottom of the t-distribution table.
The Step-by-Step Approach
Step 1: State hypotheses
Two-tailed:  H0: μ1 = μ2    H1: μ1 ≠ μ2
One-tailed:  H0: μ1 ≤ μ2 with H1: μ1 > μ2,  or  H0: μ1 ≥ μ2 with H1: μ1 < μ2
Step 2: Find critical value
Look up the z-score for the specified significance (α) level and for a one- or two-tailed test (selected in advance). Usually use α = 0.05 and a two-tailed test, i.e., zcritical = ±1.960. For one-tailed tests use zcritical = –1.645 or +1.645.
Step 3: Compute test value
Step 4: Make decision
Draw a diagram of the normal distribution and critical regions. If the test value is in the critical region, reject the null hypothesis; otherwise do not reject.
Step 5: Summarize results
Restate hypothesis (null or alternate)
accepted in step 4.
If null is rejected:
There is enough evidence to reject the null hypothesis.
If null is not rejected:
There is not enough evidence to reject the null hypothesis.
Optionally, reword hypothesis in “lay” terms. E.g., There is or is
not a difference between the two populations or one population is
greater/lesser than the other for the independent variable.
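As a supplementary illustration (not part of the original overheads), a minimal Python sketch of the five steps using made-up summary numbers; σ1 and σ2 are assumed known:

    import math
    from scipy.stats import norm

    mean1, sigma1, n1 = 71.2, 8.0, 50    # hypothetical sample 1 summary
    mean2, sigma2, n2 = 68.4, 7.5, 60    # hypothetical sample 2 summary
    alpha = 0.05

    # Step 2: two-tailed critical value (1.960 for alpha = 0.05)
    z_crit = norm.ppf(1 - alpha / 2)

    # Step 3: test value
    z = (mean1 - mean2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

    # Step 4: decision
    print(z, z_crit, abs(z) > z_crit)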
Testing Differences between Two Means
Small Independent Sample Means: when population
standard deviations are unknown and sample size is < 30 use
t-distribution for critical values and t-test for test values. Use
F-ratio to determine whether sample variances are equal or
unequal. Then choose the appropriate t-test.
Assumptions
• samples are random samples
• the two samples must be independent, i.e., different subjects; if not, use the "dependent groups t-test"
• population must be normally distributed
If sample variances are NOT equal:
Use test value:  t = (x̄1 – x̄2) / √(s1²/n1 + s2²/n2)
For degrees of freedom (df) use the smaller of n1 – 1 and n2 – 1 (i.e., the conservative choice, higher critical value).
If sample variances are equal:
Use test value:  t = (x̄1 – x̄2) / √[ sp² (1/n1 + 1/n2) ],  where sp² = [ (n1 – 1)s1² + (n2 – 1)s2² ] / (n1 + n2 – 2)
and df = n1 + n2 – 2
Uses a "pooled" estimate of variance that, combined with the increased degrees of freedom, increases the test's power.
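A minimal Python sketch with made-up data; note that scipy's equal_var=False option uses Welch's approximation for the degrees of freedom rather than the conservative "smaller of n1 – 1 and n2 – 1" rule above:

    from scipy import stats

    group1 = [12.1, 14.3, 11.8, 13.5, 12.9, 15.0]   # hypothetical scores
    group2 = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]

    # equal variances: pooled-variance t-test (df = n1 + n2 - 2)
    t_pooled, p_pooled = stats.ttest_ind(group1, group2, equal_var=True)
    # unequal variances: Welch's t-test
    t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)
    print(t_pooled, p_pooled, t_welch, p_welch)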
Test for Equal Variances
Also called Homogeneity of Variance
• used primarily to determine which t-test to use
• uses the F-distribution and F-test (later used for ANOVA)
• assume the variances are equal and test whether they are unequal
• SPSS uses "Levene's Test for Equality of Variances." If P (Sig.) < α, the variances are NOT equal.
Step 1: Always a two-tailed test.
H0: σ1² = σ2²
H1: σ1² ≠ σ2²
Step 2:
Find the critical value (FCV) from the F-distribution. Use the degrees of freedom of the larger variance as the numerator (dfN = nlarger – 1) and the degrees of freedom of the smaller variance as the denominator (dfD = nsmaller – 1).
Step 3:
Compute the test value:  FTV = s²(larger) / s²(smaller)
Note, FTV will always be ≥ 1.
Steps 4 and 5:
If FTV > FCV then reject H0 and conclude the variances are unequal.
If FTV ≤ FCV then do NOT reject H0 and conclude the variances are equal, i.e., you have homogeneity of variances.
You can now select the appropriate “Independent Groups t-test”.
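A minimal Python sketch of the variance-ratio F-test with made-up data (this is the simple F-ratio test described above, not SPSS's Levene procedure):

    import numpy as np
    from scipy.stats import f

    group1 = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 15.0])   # hypothetical data
    group2 = np.array([10.2, 11.5, 9.8, 12.0, 10.9, 11.1])

    var1, var2 = group1.var(ddof=1), group2.var(ddof=1)        # sample variances
    # larger variance in the numerator so that F >= 1
    if var1 >= var2:
        F, dfN, dfD = var1 / var2, len(group1) - 1, len(group2) - 1
    else:
        F, dfN, dfD = var2 / var1, len(group2) - 1, len(group1) - 1

    alpha = 0.05
    F_crit = f.ppf(1 - alpha / 2, dfN, dfD)   # two-tailed: alpha split between tails
    print(F, F_crit, F > F_crit)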
Flow Diagram for Choosing the
Correct Independent Samples t-Test
Similar to flow diagram used for single sample means. But
requires a test for equality of variances (homogeneity of
variance). Generally the sample’s mean and standard deviation
are used with the t-distribution. The t-distribution becomes
indistinguishable from the z-distribution (i.e., normal distribution)
when n is 30 or greater. Samples must be random and
independent.
Testing Differences between Two Means
Dependent Sample Means:
Used when two samples are not independent. More powerful
than independent groups t-test and easier to perform (no variance
test required). Simplifies research protocol (i.e., fewer subjects)
but dependence may limit generalizability.
Examples:
• repeated measures (test/retest, before/after)
• matched pairs t-test (subjects matched by a relevant variable: height, weight, shoe size, IQ score, age)
• twin studies (identical, heterozygotic, living apart)
Step 1:
Two-tailed:  H0: μD = 0    H1: μD ≠ 0
One-tailed:  H0: μD ≤ 0 with H1: μD > 0,  or  H0: μD ≥ 0 with H1: μD < 0
Step 2:
Critical value from the t-distribution with degrees of freedom equal to the number of data pairs minus one (df = n – 1).
Step 3:
Compute the differences between pairs (D), then the mean difference (D̄) and sD.
Test value:  t = D̄ / (sD / √n)
Steps 4 and 5:
If the test value > critical value, reject H0; otherwise there is no difference between the two trials/groups.
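A minimal Python sketch of the dependent (paired) t-test with made-up before/after scores:

    from scipy import stats

    before = [65.2, 70.1, 63.8, 68.4, 72.0, 66.5]   # hypothetical scores
    after  = [63.0, 68.8, 62.9, 65.1, 70.2, 64.0]

    t_value, p_value = stats.ttest_rel(before, after)   # df = number of pairs - 1
    print(t_value, p_value)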
Correlation and Regression
Linear Correlation:
• Does one variable increase or decrease linearly with another?
• Is there a linear relationship between two or more variables?
Types of linear relationships (scattergram examples): positive linear, negative linear, none or weak (no relationship).
Scattergrams
[Scattergram examples: weak linear, strong linear. Other relationships: nonlinear or curvilinear, linear and exponential?]
Correlation
Pearson Product Moment Correlation Coefficient:
• simply called the correlation coefficient, PPMC, or r-value
• linear correlation between two variables
Examples:
Weight increases with height.
IQ with brain size?!
Used for calibration of instruments, force transducers, spring
scales, electrogoniometers (measure joint angles).
Multiple Correlation:
• used when several independent variables influence a dependent variable
• R-value (capital R vs r)
Defined as: Y = A + B1 X1 + B2 X2 + B3 X3 + ... + Bn Xn
Examples:
• Heart disease is affected by family history, obesity, smoking, diet, etc.
• Academic performance is affected by intelligence, economics, experience, memory, etc.
• Lean body mass is predicted by a combination of body mass, thigh, triceps and abdominal skinfold measures.
Significance of Correlation Coefficient
Method 1
Step 1: H0: ρ = 0; H1: ρ ≠ 0
Step 2: Look up rcrit for n – 2 degrees of freedom (Table I)
Step 3: Compute sample r (as above)
Step 4: Sample r is significant if its absolute value is greater than rcrit
Step 5: If significance occurs, the data are linearly correlated; otherwise they are not.
If a table of significant correlation coefficients is not available, or the significance level (α) is not 0.05 or 0.01, use Method 2.
Method 2
Step 1: H0: ρ = 0; H1: ρ ≠ 0
Step 2: Look up tcrit for n – 2 degrees of freedom
Step 3: Compute sample r, then  t = r √[ (n – 2) / (1 – r²) ]
Step 4: Sample t is significant if its absolute value is greater than tcrit
Step 5: If significance occurs, the data are linearly correlated; otherwise they are not.
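A minimal Python sketch with made-up data; scipy's pearsonr returns both r and the P-value for H0: ρ = 0:

    from scipy import stats

    height = [150, 160, 165, 170, 175, 180, 185]   # cm (hypothetical)
    weight = [55, 60, 63, 68, 72, 80, 83]          # kg (hypothetical)

    r, p_value = stats.pearsonr(height, weight)
    print(r, p_value, p_value < 0.05)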
Regression
Regression: can only be done if a significant correlation
exists.
• Equation of the line or curve which defines the relationship between the variables.
• The "line of best fit".
• The mathematical technique is called the "least squares" method. This technique computes the line that minimizes the sum of the squared deviations of the data from the line.
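A minimal Python sketch of the least-squares line of best fit, using the same made-up data as the correlation example above:

    from scipy import stats

    x = [150, 160, 165, 170, 175, 180, 185]   # hypothetical predictor
    y = [55, 60, 63, 68, 72, 80, 83]          # hypothetical response

    fit = stats.linregress(x, y)
    # slope and intercept of the line of best fit, plus r squared
    print(fit.slope, fit.intercept, fit.rvalue**2)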
Coefficient of Determination and
Standard Error of Estimate
Coefficient of Determination
• Measures the strength of the relationship between the two variables.
• Equal to the explained variation divided by the total variation = r².
• Usually given as a percentage, i.e., coefficient of determination = r² × 100%
For example, an r of 0.90 has 81% of the total variation explained, but an r of 0.60 has only 36% of its variation. A correlation may be significant but explain very little.
Standard Error of Estimate
• Measure of the variability of the observed values about the regression line
• Can be used to compute a confidence interval for a predicted value
Standard error of estimate:  sest = √[ Σ(y – y′)² / (n – 2) ]
Possible Reasons for a Significant Correlation
1. There is a direct cause-and-effect relationship between the
variables. That is, x causes y. For example, positive reinforcement
improves learning, smoking causes lung cancer and heat causes ice to
melt.
2. There is a reverse cause-and-effect relationship between the
variables. That is, y causes x. For example, suppose a researcher
believes excessive coffee consumption causes nervousness, but the
researcher fails to consider that the reverse situation may occur. That is,
it may be that nervous people crave coffee.
3. The relationship between the variables may be caused by a third
variable. For example, if a statistician correlated the number of deaths
due to drowning and the number of cans of soft drinks consumed
during the summer, he or she would probably find a significant
relationship. However, the soft drink is not necessarily responsible for
the deaths, since both variables may be related to heat and humidity.
4. There may be a complexity of interrelationships among many
variables. For example, a researcher may find a significant relationship
between students’ high school grades and college grades. But there
probably are many other variables involved, such as IQ, hours of study,
influence of parents, motivation, age and instructors.
5. The relationship may be coincidental. For example, a researcher
may be able to find a significant relationship between the increase in
the number of people who are exercising and the increase in the number
of people who are committing crimes. But common sense dictates that
any relationship between these two variables must be due to
coincidence.
Comparing Frequencies using
Chi-square
Chi-square or χ²: pronounced “ki squared”.
• Used to test whether the frequency of nominal
data fit a certain pattern (goodness of fit) or
whether two variables have a dependency (test
for independence).
• Can be used to test whether data are normally
distributed and for homogeneity of proportions
• Frequency of each nominal category is computed
and compared to an expected frequency.
Goodness of Fit:
• Need to know expected pattern of frequencies.
• If not known assume equal distribution among all
categories.
Assumptions:
• data are from a random sample
• expected frequency for each category must be
5 or more
Examples:
• test for product / procedure preference (each
is assumed equally likely to be selected)
• test for “fairness” of coin, die, roulette wheel
(expect each outcome equally)
• test for expected frequency distribution (need
theoretically expected pattern)
Goodness of Fit Test
Step 1
H0: data fit the expected pattern
H1: data do not fit expected pattern
Step 2
Find the critical value from the χ² table. The test is always a one-tailed right test with n – 1 degrees of freedom, where n is the number of categories.
Step 3
Compute the test value from:  χ² = Σ [ (O – E)² / E ]
where O = observed frequency and E = expected frequency.
Step 4
Make decision. If χ² > critical value, reject H0.
Step 5
Summarize the results.
E.g., There is (not) enough evidence to
accept/reject the claim that there is a preference
for ________.
E.g.,
Coin is fair / unfair
Die is fair / “loaded”
Wheel is fair / flawed
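A minimal Python sketch of a goodness-of-fit test for a fair die, using hypothetical counts; scipy assumes equal expected frequencies when none are given:

    from scipy.stats import chisquare

    observed = [8, 12, 11, 9, 13, 7]     # counts from 60 hypothetical die rolls
    chi2, p_value = chisquare(observed)  # expected = 10 per face by default
    print(chi2, p_value)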
Test for Independence
Step 1
H0: two variables are independent
H1: two variables are dependent
Step 2
Find the critical value from the χ² table. The test is always one-tailed right with (nrow – 1)(ncol – 1) degrees of freedom, where nrow and ncol are the number of categories of each variable. These correspond to the number of rows and columns in the contingency table.
Step 3
Create the contingency table to derive the expected values (see next page).
Compute the test value from:  χ² = Σ [ (O – E)² / E ]
where O = observed frequency and E = expected frequency.
Step 4
Make decision. If χ² > critical value, reject H0.
Step 5
Summarize the results.
E.g., getting a cold is dependent upon whether you took a cold vaccine.
– Smoking and lung disease are dependent.
– Is a cure dependent upon placebo vs. drug?
Contingency Table
First, enter the observed (O) scores and compute the row and column totals.

          Col.1    Col.2    Col.3    totals
Row 1     25       10       5        40
Row 2     10       20       5        35
Row 3     5        30       15       50
totals    40       60       25       125* / 125*

* Notice the sums of the row and column totals must be equal.

Second, compute the expected (E) values based on the row and column totals.

          Col.1               Col.2               Col.3
Row 1     40×40/125 = E11     40×60/125 = E12     40×25/125 = E13
Row 2     35×40/125 = E21     35×60/125 = E22     35×25/125 = E23
Row 3     50×40/125 = E31     50×60/125 = E32     50×25/125 = E33

Finally, compute the test value:  χ² = Σ [ (O – E)² / E ]
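For illustration, a short Python check of the same contingency table; chi2_contingency computes the expected values and the χ² test value:

    from scipy.stats import chi2_contingency

    observed = [[25, 10, 5],
                [10, 20, 5],
                [ 5, 30, 15]]

    chi2, p_value, df, expected = chi2_contingency(observed)
    print(chi2, df, p_value)   # 'expected' holds the E values computed above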
Analysis of Variance (ANOVA)
One-way ANOVA:
• used to test for significant differences among sample means
• differs from the t-test since more than 2 groups are tested simultaneously
• one factor (independent variable) is analyzed, also called the "grouping" variable
• dependent variable should be interval or ratio, but the factor is nominal
Factorial Design: groups must be independent (i.e., subjects in each group are different and unrelated)
Assumptions:
• data must be normally distributed, or nearly so
• variances must be equal (i.e., homogeneity of variance)
Examples:
• Does fitness level (VO2max) depend on province of residence? Fitness level is a ratio variable; residence is a nominal variable.
• Does statistics grade depend on the highest level of mathematics course taken?
• Does hand grip strength vary with gender? (Can be done with a t-test; the t-test can handle equal or unequal variances.)
One-way ANOVA
An ANOVA tests whether one or more sample means are significantly different from each other. To determine which or how many sample means are different requires post hoc testing.
[Illustrations: (1) two samples whose means are significantly different; (2) two sample means that are NOT significantly different because of a smaller difference and high variability; (3) with the same difference between means, if the variances are reduced the means can be significantly different.]
One-way ANOVA
Step 1
H0: all sample means are equal
H1: at least one mean is different
Step 2
Find critical value from F table (Table H). Tables are
for one-tailed test. ANOVA is always one-tailed.
Step 3
Compute the test value from:  F = sB² / sW²
(the between-groups mean square divided by the within-groups mean square)
Step 4
Make decision.
If F > critical value reject H0.
Step 5
Summarize the results with an ANOVA table.
Either all means are the same (i.e., come from the same population) or at least one mean is significantly different.
Step 6 If a significant difference is found, perform post hoc
testing to determine which mean(s) is/are different.
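A minimal Python sketch of a one-way ANOVA with made-up scores for three hypothetical groups:

    from scipy.stats import f_oneway

    group1 = [42.1, 45.3, 44.0, 47.2, 43.5]   # hypothetical VO2max scores
    group2 = [39.8, 41.2, 40.5, 38.9, 42.0]
    group3 = [48.3, 50.1, 47.8, 49.5, 51.0]

    F, p_value = f_oneway(group1, group2, group3)
    print(F, p_value)   # if significant, follow up with post hoc testing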
ANOVA Summary Table
Source                    Sums of squares   d.f.                 Mean square         F          P
Between (Main effect)     SSB               k – 1                SSB/(k–1) = sB²     sB²/sW²
Within (Error term)       SSW               N – k                SSW/(N–k) = sW²
Total                     SSB + SSW         (k–1)+(N–k) = N–1

Examples:

One-way Factorial
Source     Sums of squares   d.f.   Mean square   F       P
Between    160.13            2      80.07         9.17    <0.01
Within     104.80            12     8.73
Total      264.93            14

Two-way Factorial
Source     Sums of squares   d.f.   Mean square   F        P
Factor A   3.920             1      3.920         4.752    NS
Factor B   9.690             1      9.690         11.733   <0.025
A × B      54.080            1      54.080        65.552   <0.005
Within     3.300             4      0.825
Total      70.980            7
Post Hoc Testing
Post Hoc testing
• used to determine which mean or group of means is/are significantly different from the others
• many different choices depending upon the research design and research question (Duncan's, Scheffé's, Tukey's HSD, ...)
• only done when the ANOVA yields a significant F
Scheffé test:
• when sample sizes are unequal
• when a conservative test is desirable
• when all comparisons are to be tested
Critical value: use the critical value from the ANOVA and multiply by k – 1, where k = number of groups (means):
F′critical = (k – 1) Fcritical
Test value:  Fs = (x̄i – x̄j)² / [ sW² (1/ni + 1/nj) ]
Decision:
If Fs > F′critical, then the two means are significantly different.
Summary:
Graph the sample means.
Post Hoc Testing 2
Tukey HSD test:
• sample sizes must be equal
• used when a less conservative test is desirable, i.e., more powerful
• when only some comparisons are to be tested
Critical value:
Use Table N, where k = number of groups and v = degrees of freedom of sW²
Test value:  q = (x̄i – x̄j) / √(sW² / n)
Decision:
If q > critical value, then the means are significantly different.
Summary:
Graph the results and summarize.
Nonparametric Statistics
Nonparametric or Distribution-free statistics:
• used when data are ordinal (i.e., rankings)
• used when ratio/interval data are not normally distributed (data are converted to ranks)
• for studies not involving population parameters
Advantages:
• no assumptions about the distribution of the data
• suitable for ordinal data
• can test hypotheses that do not involve population parameters
• computations are easier to perform
• results are easier to understand
Disadvantages:
• less powerful (less sensitive) than parametric tests
• use less information than parametric tests (ratio data are reduced to an ordinal scale)
• less efficient, therefore larger sample sizes are needed to detect significance
Examples:
• Is there a bias in the rankings of judges from different countries?
• Is there a correlation between the rankings of two judges?
• Do different groups rank a professor’s teaching differently?
Paired Sign Test
• used for repeated-measures tests
Step 1
H0: no change/increase/decrease between the before and after tests
H1: there was a change/increase/decrease
Step 2
Find the critical value from the table (Table J) for the given α level, sample size (>7) and whether the hypothesis is one- or two-tailed.
Step 3
Subtract the "after" from the "before" scores, then count the number of positive (+) AND negative (–) differences. Zeros (pairs that are equal) do not count.
Step 4
Make decision. If the smallest count (+ or –) is less than the critical value, reject H0.
Step 5
Summarize the result.
I.e., there was/was not a change, or there was/was not an increase/decrease in the dependent variable.
Wilcoxon Rank Sum Test
• also called the Mann-Whitney U test
• used to compare two independent groups
• replacement for the independent groups t-test
Step 1
H0: no difference/increase/decrease between groups
H1: there was a difference/increase/decrease between groups
Step 2
Find the critical value from the z-table (Table E) for the given α level and whether the hypothesis is one- or two-tailed.
Step 3
• Rank all data together.
• Sum the ranks of the group with the smaller size (n1). Call this R.
• Compute the test value, z:
  z = (R – μR) / σR,  where μR = n1(n1 + n2 + 1)/2  and  σR = √[ n1 n2 (n1 + n2 + 1) / 12 ]
  (Note, n1 and n2 must be 10 or greater, and n2 is the larger of the two sample sizes or equal to n1.)
Step 4
If test value (z) > critical value, reject H0.
Step 5 Summarize the result.
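A minimal Python sketch with made-up data; scipy reports the Mann-Whitney U statistic and P-value rather than the z value above:

    from scipy.stats import mannwhitneyu

    group1 = [12, 15, 9, 20, 17, 11, 14, 16, 10, 18]   # hypothetical scores
    group2 = [22, 25, 19, 30, 27, 21, 24, 26, 23, 28]

    U, p_value = mannwhitneyu(group1, group2, alternative='two-sided')
    print(U, p_value)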
Wilcoxon Signed-Rank Test
• more powerful than the Paired Sign Test
• used to compare two dependent samples (e.g., repeated measures)
• replaces the dependent groups t-test
Step 1
H0: no change/increase/decrease between groups
H1: there is a change/increase/decrease
Step 2
Find the critical value from the table (Table K) for the given α level, sample size (5 or greater) and whether the hypothesis is one- or two-tailed. Use the z-table (Table E) and the z test value if n > 30.
Step 3
• compute the differences
• find the absolute values of the differences
• rank the differences
• sum the positive and negative ranks separately and call the smaller absolute value the test value, ws
• if n > 30 use the test value, z, from:
  z = [ ws – n(n + 1)/4 ] / √[ n(n + 1)(2n + 1) / 24 ]
Step 4
If ws is less than the critical value, or if z > the critical z value, reject H0.
Step 5 Summarize the result.
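A minimal Python sketch of the Wilcoxon signed-rank test with made-up before/after scores:

    from scipy.stats import wilcoxon

    before = [18, 22, 17, 25, 20, 23, 19, 21]   # hypothetical scores
    after  = [16, 20, 18, 22, 19, 20, 17, 19]

    w, p_value = wilcoxon(before, after)   # ranks the paired differences internally
    print(w, p_value)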
Kruskal-Wallis Test
• similar to the Wilcoxon Rank Sum test but for more than 2 groups
• replacement for the one-way ANOVA
Step 1
H0: there is no difference in the groups
H1: at least one group is different
Step 2
Find the critical value from the χ²-table (Table G) for the given α level and degrees of freedom (k – 1). The test is always one-tailed (right-tailed).
Step 3
• Rank all data together.
• Sum the ranks within each group; call them R1, R2, R3, ..., Rk.
• Compute the test value, H, from:
  H = [ 12 / (N(N + 1)) ] (R1²/n1 + R2²/n2 + ... + Rk²/nk) – 3(N + 1)
  where N = n1 + n2 + n3 + ... + nk
Step 4
If test value > critical value, reject H0.
Step 5
Summarize the result. I.e., there is a difference in at least
one sample.
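A minimal Python sketch of the Kruskal-Wallis test with made-up ratings from three independent groups:

    from scipy.stats import kruskal

    group1 = [3, 4, 2, 5, 4, 3]   # hypothetical ordinal ratings
    group2 = [1, 2, 2, 3, 1, 2]
    group3 = [4, 5, 5, 4, 5, 3]

    H, p_value = kruskal(group1, group2, group3)
    print(H, p_value)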
Spearman Correlation
• similar to Pearson except the data are ordinal vs. ratio/interval
• data are ranked, then correlated
Step 1
H0: ρ = 0
H1: ρ ≠ 0
Step 2
Find the critical value from the table (Table L) for the given α level and sample size, n (number of pairs); n must be greater than 5 for α = 0.05.
Step 3
• Rank the data within each group
• Compute the differences between pairs, Di
• Compute the correlation coefficient from:
  rs = 1 – [ 6 Σ Di² ] / [ n(n² – 1) ]
  where n is the number of pairs (5 or more)
Step 4
If absolute value of test value > critical value, reject H0.
Step 5 Summarize the result. I.e., data are correlated or
uncorrelated. Note, no regression line is possible since
data were converted to ranks.
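A minimal Python sketch of the Spearman rank correlation with made-up rankings from two judges:

    from scipy.stats import spearmanr

    judge1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # hypothetical rankings
    judge2 = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]

    rs, p_value = spearmanr(judge1, judge2)
    print(rs, p_value)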
Comparison of Statistical Tests
Data type                           Parametric            Nonparametric              Frequency
                                    (Ratio, Interval)     (Ordinal)                  (Nominal)

Single sample                       z-test, t-test        Sign test*, K-S test*      χ² Goodness-of-fit
Two independent samples             z-test, t-test        Wilcoxon Rank Sum          χ² Test of Independence
                                    (2 types)             (Mann-Whitney U)
Two dependent samples               Paired t-test         Paired Sign*,
                                                          Wilcoxon signed-rank
More than two independent samples   One-way ANOVA         Kruskal-Wallis
Two factors                         Two-way ANOVA*
Correlation                         Pearson               Spearman                   Phi*

* not studied in this course