Major Points
• Formal Tests of Mean Differences
• Review of Concepts: Means, Standard Deviations, Standard Errors, Type I Errors
• New Concepts: One- and Two-Tailed Tests
• Significance of Differences
Important Concepts
• Concepts critical to hypothesis testing
– Decision
– Type I error
– Type II error
– Critical values
– One- and two-tailed tests
Decisions
• When we test a hypothesis we draw a conclusion that is either correct or incorrect.
– Type I error
• Reject the null hypothesis when it is actually true.
– Type II error
• Retain the null hypothesis when it is actually false.
Possible Scenarios

                          Null hypothesis is actually
Results show              TRUE                  FALSE
TRUE (retain null)        Correct acceptance    Type II error
FALSE (reject null)       Type I error          Correct rejection
Type I Errors
• Assume there are no differences (null hypothesis is true)
• Assume our results show that they are not the same (we reject the null hypothesis)
• This is a Type I error
– Probability set at alpha (α)
• α usually set at .05
– Therefore, probability of Type I error = .05
Type II Errors
• Assume there are differences (alternative hypothesis is true)
• Assume that we conclude they are the same (we accept the null hypothesis)
• This is also an error
– Probability denoted beta (β)
• We can’t set beta easily.
• We’ll talk about this issue later.
• Power = (1 - β) = probability of correctly rejecting a false null hypothesis.
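These error rates can also be checked by simulation. Below is a minimal sketch (not from the slides; the sample size, effect size, and number of repetitions are arbitrary illustrative choices) that draws repeated samples and counts how often a one-sample t-test rejects: with a true null the rejection rate should land near α = .05, and with a false null the retention rate estimates β.

```python
# Minimal simulation sketch: estimate Type I and Type II error rates
# for a one-sample t-test. All settings here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, reps = .05, 25, 10_000

# Type I error: the null is true (population mean really is 0),
# so every rejection is a false rejection.
false_rejections = sum(
    stats.ttest_1samp(rng.normal(0.0, 1.0, n), 0.0).pvalue < alpha
    for _ in range(reps)
)
print("estimated Type I rate:", false_rejections / reps)   # ~ .05

# Type II error: the null is false (true mean is 0.5),
# so every retention is a false retention.
false_retentions = sum(
    stats.ttest_1samp(rng.normal(0.5, 1.0, n), 0.0).pvalue >= alpha
    for _ in range(reps)
)
beta = false_retentions / reps
print("estimated beta:", beta, "  power:", 1 - beta)
```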
Critical Values
• These represent the point at which we decide to reject the null hypothesis.
• e.g., we might decide to reject the null when the probability of our result, given the null, is < .05.
– Our test statistic has some value with p = .05
– We reject when we exceed that value.
– That value is the critical value.
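Where the slides rely on printed tables, software returns the same cutoffs directly. A quick sketch with scipy (the df value of 8 is only an illustrative choice):

```python
# Critical t values from the t distribution's inverse CDF.
from scipy import stats

alpha, df = .05, 8

# Two-tailed: alpha is split across both tails.
print(stats.t.ppf(1 - alpha / 2, df))   # ~2.306
# One-tailed: all of alpha sits in one tail.
print(stats.t.ppf(1 - alpha, df))       # ~1.860
```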
One- and Two-Tailed Tests
• A two-tailed test rejects the null when the obtained value is too extreme in either direction
– Decide on this before collecting data.
• A one-tailed test rejects the null only if the obtained value is too low (or too high)
– We set aside only one direction for rejection.
One- & Two-Tailed Example
• One-tailed test
– Reject the null if the number of red Halloween candies is higher than expected
• Two-tailed test
– Reject the null if the number of red Halloween candies is different from expected (whether higher or lower)
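In code the choice of tails is just a parameter. The sketch below uses made-up candy counts and a made-up null mean of 10; only the `alternative` argument differs between the two tests.

```python
# One- vs. two-tailed versions of the same one-sample t-test.
from scipy import stats

reds = [12, 9, 14, 11, 13, 10, 15, 12]   # hypothetical counts of red candies
mu0 = 10                                  # hypothetical null mean

# One-tailed: only "higher than 10" counts against the null.
print(stats.ttest_1samp(reds, mu0, alternative='greater').pvalue)
# Two-tailed: "different from 10" in either direction counts.
print(stats.ttest_1samp(reds, mu0, alternative='two-sided').pvalue)
```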
Within-Subjects t Tests
• Related samples
• Difference scores
• t tests on difference scores
• Advantages and disadvantages
Related Samples
• The same participant / thing gives us data on two measures
– e.g., Before and After treatment
– Usability problems before training on PP and after training
– Darts and Pros during the same time period
• With related samples, someone high on one measure is probably high on the other (individual variability).
Related Samples--cont.
• Correlation between before and after scores
– Causes a change in the statistic we can use
• Sometimes called matched samples or repeated measures
Difference Scores
• Calculate the difference between the first and second score
– e.g., Difference = Before - After
• Base subsequent analysis on the difference scores
– Ignoring the original Before and After data
Difference between Darts and Pros

TIMENO  TIME                           PROS     DARTS
1       January-June 1990              12.70    .00
2       February-July 1990             26.40    1.80
3       March-August 1990              2.50     -14.30
4       April-September 1990           -20.00   -7.20
5       May-October 1990               -37.80   -16.30
6       June-November 1990             -33.30   -27.40
7       July-December 1990             -10.20   -22.50
8       August 1990-January 1991       -20.30   -37.30
9       September 1990-February 1991   38.90    -2.50
10      October 1990-March 1991        20.20    11.20
11      November 1990-April 1991       50.60    72.90
12      December 1990-May 1991         66.90    16.60
13      January-June 1991              7.50     28.70
14      February-July 1991             17.50    44.80
15      March-August 1991              39.60    71.30
Results
• The Pros showed greater gains than the Darts
• Was this difference large enough to be significant?
• If there were no real difference, the mean of the computed differences should be zero
– So, test the obtained mean of the difference scores against μ = 0.
– Use the same test as in the one-sample case
t test
$\bar{D}$ and $s_D$ = mean and standard deviation of the difference scores.

$$t = \frac{\bar{D} - \mu}{s_D / \sqrt{n}} = \frac{8.22}{3.6 / \sqrt{9}} = \frac{8.22}{1.2} = 6.85$$

df = n - 1 = 9 - 1 = 8
t test--cont.
• With 8 df, the two-tailed critical value is t.01 = ±3.36 (Table E.6)
• We calculated t = 6.85
• Since 6.85 > 3.36, reject H0
• Conclude that the Pros gained significantly more than the Darts
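The slide's computation can be reproduced in a few lines from its summary statistics (mean difference 8.22, s_D = 3.6, n = 9); the exact p-value is an addition beyond the table lookup.

```python
# Within-subjects t from the slide's summary statistics.
import math
from scipy import stats

d_bar, s_d, n = 8.22, 3.6, 9     # mean and SD of the difference scores
se = s_d / math.sqrt(n)          # 3.6 / 3 = 1.2
t = (d_bar - 0) / se             # 8.22 / 1.2 = 6.85
df = n - 1                       # 8

p = 2 * stats.t.sf(abs(t), df)   # two-tailed p-value
print(t, df, p)                  # p well below .01, so reject H0
```

Given the raw paired scores, `scipy.stats.ttest_rel(pros, darts)` runs the same test in a single call.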
Advantages of Related Samples
• Eliminate subject-to-subject variability
• Control for extraneous variables
• Need fewer subjects
Disadvantages of Related Samples
• Order effects
• Carry-over effects
• Subjects no longer naïve
• Change may just be a function of time
• Sometimes not logically possible
Between-Subjects t Test
• Distribution of differences between means
• Heterogeneity of Variance
• Nonnormality
Pros during Ups and Downs in the DOW
• Effect of fluctuations in the DOW: did they affect the Pros?
– A different question than the previous one
• Now we have two independent groups of data
– Pros during positive DOW periods, and Pros during negative DOW periods
– We want to compare the means of the two groups
Effect of Changes in the DOW

TIMENO  TIME                           PROS     DARTS    DJIA     DowTrend
1       January-June 1990              12.70    .00      2.50     1
2       February-July 1990             26.40    1.80     11.50    1
3       March-August 1990              2.50     -14.30   -2.30    0
4       April-September 1990           -20.00   -7.20    -9.20    0
5       May-October 1990               -37.80   -16.30   -8.50    0
6       June-November 1990             -33.30   -27.40   -12.80   0
7       July-December 1990             -10.20   -22.50   -9.30    0
8       August 1990-January 1991       -20.30   -37.30   -.80     0
9       September 1990-February 1991   38.90    -2.50    11.00    1
10      October 1990-March 1991        20.20    11.20    15.80    1
11      November 1990-April 1991       50.60    72.90    16.20    1
12      December 1990-May 1991         66.90    16.60    17.30    1
13      January-June 1991              7.50     28.70    17.70    1
14      February-July 1991             17.50    44.80    7.60     1
15      March-August 1991              39.60    71.30    4.40     1
16      April-September 1991           15.60    2.80     3.40     1
Differences from the Within-Subjects Test
• We cannot compute pairwise differences, since the observations in the two groups are not paired
• We want to test the difference between the two sample means (not between a sample and a population)
Analysis
• How are the sample means distributed if H0 is true?
• We need the sampling distribution of differences between means
– Same idea as before, except the statistic is $(\bar{X}_1 - \bar{X}_2)$, the difference between the two sample means
Sampling Distribution of Mean Differences
• Mean of the sampling distribution = $\mu_1 - \mu_2$
• Standard deviation of the sampling distribution (the standard error of mean differences):

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Sampling Distribution--cont.
• The distribution approaches normal as n increases.
• Later we will modify this to “pool” variances.
Analysis--cont.
• Same basic formula as before, but accommodating two groups:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

• Note the parallels with the earlier t
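As an illustration, the sketch below applies this formula to the Pros column of the table above, split by the DowTrend flag (10 "up" periods vs. 6 "down" periods). The by-hand version leaves the variances unpooled, matching the formula; `equal_var=False` asks scipy for the same unpooled test.

```python
# Two independent groups: Pros' returns when the DOW was up vs. down.
import numpy as np
from scipy import stats

pros_up   = np.array([12.70, 26.40, 38.90, 20.20, 50.60,
                      66.90, 7.50, 17.50, 39.60, 15.60])   # DowTrend = 1
pros_down = np.array([2.50, -20.00, -37.80,
                      -33.30, -10.20, -20.30])             # DowTrend = 0

# By hand, with unpooled sample variances (ddof=1).
se = np.sqrt(pros_up.var(ddof=1) / len(pros_up) +
             pros_down.var(ddof=1) / len(pros_down))
t_by_hand = (pros_up.mean() - pros_down.mean()) / se

# Same test via scipy; equal_var=False keeps the variances unpooled.
result = stats.ttest_ind(pros_up, pros_down, equal_var=False)
print(t_by_hand, result.statistic, result.pvalue)
```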
Degrees of Freedom
• Each group has 5 data points.
• Each group has n - 1 = 5 - 1 = 4 df
• Total df = (n1 - 1) + (n2 - 1) = n1 + n2 - 2 = 5 + 5 - 2 = 8 df
• t.01(8) = ±3.36 (approx.)
Assumptions
• Two major assumptions
– Both groups are sampled from populations with the same variance
• “homogeneity of variance”
– Both groups are sampled from normal populations
• assumption of normality
– Frequently violated with little harm.
Heterogeneous Variances
• Refers to the case of unequal population variances.
• We don’t pool the sample variances.
• We adjust the df and look up t using the adjusted df.
• Minimum df = smaller n - 1.
– Most software calculates the optimal df.
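The "optimal df" most packages report comes from the Welch-Satterthwaite approximation, which is also what scipy's `equal_var=False` test uses internally. A minimal sketch (the standard deviations and group sizes in the example call are illustrative):

```python
# Welch-Satterthwaite adjusted degrees of freedom.
def welch_df(s1, n1, s2, n2):
    """Adjusted df for a two-sample t-test with unequal variances."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Always falls between min(n1, n2) - 1 and n1 + n2 - 2.
print(welch_df(19.0, 10, 15.0, 6))   # ~12.7
```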