Download Chapter 6 Hypothesis Testing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Chapter 6
Hypothesis Testing
What is Hypothesis Testing?
•  … the use of statistical procedures to answer research
questions
•  Typical research question (generic):
•  For hypothesis testing, research questions are statements:
•  This is the null hypothesis (assumption of “no difference”)
•  Statistical procedures seek to reject or accept the null
hypothesis (details to follow)
2
• 
Thus far:
–  You have generated a hypothesis (E.g. The mean of group A is different
than the mean of group B)
–  You have collected some data (samples in group A, samples in group B)
–  Now you want to know if this data supports your hypothesis
–  Formally:
–  H0 (null hypothesis): there is no difference in the mean values of group A
and group B
–  H1 (experimental hypothesis): there is a difference in the mean of group A
and group B
3
A practitioner’s point of view
•  Test statistic
–  Inferential statistics tell us what is the likelihood that the experimental
hypothesis is true à by computing a test statistic.
–  Typically, if the likelihood of obtaining a value of a test statistic is <0.05,
then we can reject the null hypothesis
–  “…significant effect of …”
•  Non-significant results
–  Does not mean that the null hypothesis is true
–  Interpreted to mean that the results you are getting could be a chance
finding
•  Significant result
–  Means that the null hypothesis is highly unlikely
4
A practitioner’s point of view
•  Errors:
–  Type 1 error (False positive) : we believe that there is an
effect when there isn’t one
–  Type 2 error (False negative) : we believe that there isn’t an
effect, when there is one
–  If p<0.05, then the probability of a Type 1 error is < 5%
(alpha level)
•  Typically, we deal with two types of hypotheses
–  The mean of group A is different from the mean of group B
(one-tailed test)
–  The mean of group A is larger than the mean of group B
(two-tailed test)
5
Statistical Procedures
•  Two types:
–  Parametric
•  Data are assumed to come from a distribution, such as the
normal distribution, t-distribution, etc.
–  Non-parametric
•  Data are not assumed to come from a distribution
–  Lots of debate on assumptions testing and what to do if
assumptions are not met (avoided here, for the most
part)
–  A reasonable basis for deciding on the most appropriate
test is to match the type of test with the measurement
scale of the data (next slide)
6
Measurement Scales vs. Statistical Tests
Examples
M=Male,
F=Female
Preference
ranking
Likert scale
responses
Task
completion
time
•  Parametric tests most appropriate for…
–  Ratio data, interval data
•  Non-parametric tests most appropriate for…
–  Ordinal data, nominal data (although limited use for ratio and
interval data)
7
Tests Presented Here
•  Parametric
–  T-test
–  Analysis of variance (ANOVA)
–  Most common statistical procedures in HCI research
8
T-test
• 
• 
• 
Goal: To ascertain if the difference in the means of two groups is significant
Assumptions
–  Data are normally distributed (you checked for this by looking at the
histograms, reporting the mean/median/standard deviation, and by running
Shapiro-Wilks)
–  If data come from different groups of people à Independent t-test
(assumes scores are independent and variances in the populations are
roughly equal … check your table of descriptive statistics)
–  If data come from same group of people à dependent t-test
Practioner’s point of view: When in doubt, consult a book! Let’s do an
example in R
9
Tests Presented Here
•  Parametric
–  Analysis of variance (ANOVA)
•  Used for ratio data and interval data
•  Most common statistical procedure in HCI research
•  Non-parametric
–  Chi-square test
•  Used for nominal data
–  Mann-Whitney U, Wilcoxon Signed-Rank, KruskalWallis, and Friedman tests
•  Used for ordinal data
10
Analysis of Variance
•  The analysis of variance (ANOVA) is the most
widely used statistical test for hypothesis testing in
factorial experiments
•  Goal à determine if an independent variable has a
significant effect on a dependent variable
•  Remember, an independent variable has at least
two levels (test conditions)
•  Goal (put another way) à determine if the test
conditions yield different outcomes on the
dependent variable (e.g., one of the test conditions
is faster/slower than the other)
11
Why Analyse the Variance?
•  Seems odd that we analyse the variance, but the
research question is concerned with the overall
means:
•  Let’s explain through two simple examples (next
slide)
12
Example #1
“Significant” implies that in all
likelihood the difference observed
is due to the test conditions
(Method A vs. Method B).
Example #2
“Not significant” implies that the
difference observed is likely due
to chance.
File: 06-AnovaDemo.xlsx
13
Example #1 - Details
Note: Within-subjects design
Error bars show
±1 standard deviation
Note: SD is the square root of the variance
14
Example #1 – ANOVA1
Probability of obtaining the observed
data if the null hypothesis is true
Reported as…
F1,9 = 9.80, p < .05
1 ANOVA table
Thresholds for “p”
•  .05
•  .01
•  .005
•  .001
•  .0005
•  .0001
created by StatView (now marketed as JMP, a product of SAS; www.sas.com)
How to Report an F-statistic
•  Notice in the parentheses
–  Uppercase for F
–  Lowercase for p
–  Italics for F and p
–  Space both sides of equal sign
–  Space after comma
–  Space on both sides of less-than sign
–  Degrees of freedom are subscript, plain, smaller font
–  Three significant figures for F statistic
–  No zero before the decimal point in the p statistic (except in Europe)
Example #2 - Details
Error bars show
±1 standard deviation
Example #2 – ANOVA
Probability of obtaining the observed
data if the null hypothesis is true
Reported as…
F1,9 = 0.626, ns
Note: For non-significant
effects, use “ns” if F < 1.0,
or “p > .05” if F > 1.0.
Example #2 - Reporting
19
More Than Two Test Conditions
20
ANOVA
•  There was a significant effect of Test Condition on the
dependent variable (F3,45 = 4.95, p < .005)
•  Degrees of freedom
–  If n is the number of test conditions and m is the number of
participants, the degrees of freedom are…
–  Effect à (n – 1)
–  Residual à (n – 1)(m – 1)
–  Note: single-factor, within-subjects design
21
Post Hoc Comparisons Tests
•  A significant F-test means that at least one of the test
conditions differed significantly from one other test
condition
•  Does not indicate which test conditions differed
significantly from one another
•  To determine which pairs differ significantly, a post hoc
comparisons tests is used
•  Examples:
–  Fisher PLSD, Bonferroni/Dunn, Dunnett, Tukey/Kramer, Games/
Howell, Student-Newman-Keuls, orthogonal contrasts, Scheffé
22
Related documents