Lesson 6: Section 9.3 (part 2)
• Recognize paired data and use one-sample t procedures to perform significance tests for such data.
• Comparative inference is used more often than one-sample inference because comparative studies are more convincing.
• Paired data involve making two observations on the same individual or one observation on each of two similar individuals.
• If the conditions for inference are met, we can use one-sample t procedures to perform inference about the mean difference µd.
• These methods are sometimes called paired t procedures.

For their second semester project in AP Statistics,
Leighton and Brooke decided to investigate which line
was faster in the supermarket: the express lane or the
regular lane. To collect their data, they randomly
selected 15 times during a week, went to the same
store, and bought the same item. However, one of
them used the express lane and the other used a
regular lane. To decide which lane each of them would
use, they flipped a coin. If it was heads, Leighton used
the express lane and Brooke used the regular lane. If it
was tails, Leighton used the regular lane and Brooke
used the express lane. They entered their randomly
assigned lanes at the same time, and each recorded the
time in seconds it took them to complete the
transaction.
Time in express lane (seconds)    Time in regular lane (seconds)
337                               342
226                               472
502                               456
408                               529
151                               181
284                               339
150                               229
357                               263
349                               332
257                               352
321                               341
383                               397
565                               694
363                               324
85                                127
Carry out a test to see if there is convincing evidence that the express lane is faster.
• Since these data are paired, we will consider the difference in time (regular – express). Here are the 15 differences. In this case, a positive difference means the express lane was faster.
Regular - Express (seconds): 5, 246, -46, 121, 30, 55, 79, -94, -17, 95, 20, 14, 129, -39, 42
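The differences and their summary statistics can be checked with a few lines of code. This is a minimal sketch using only Python's standard library; the variable names are illustrative, and the two lists are the express and regular lane times in the order they appear in the table.

```python
from statistics import mean, stdev

# Times in seconds, one entry per shopping trip, in table order.
express = [337, 226, 502, 408, 151, 284, 150, 357, 349, 257, 321, 383, 565, 363, 85]
regular = [342, 472, 456, 529, 181, 339, 229, 263, 332, 352, 341, 397, 694, 324, 127]

# Paired data: subtract within each pair (regular - express).
# A positive difference means the express lane was faster on that trip.
differences = [r - e for r, e in zip(regular, express)]

mean_diff = mean(differences)  # sample mean of the differences
sd_diff = stdev(differences)   # sample standard deviation (divides by n - 1)

print(differences)
print(round(mean_diff, 1), round(sd_diff, 1))
```

Once the differences are computed, the rest of the analysis treats them as a single sample of 15 values.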
STATE
• We want to test the following hypotheses at the α = 0.05 significance level:
H0: µd = 0
Ha: µd > 0
where µd = the true mean difference (regular – express) in time required to purchase an item at the supermarket.

PLAN
• If conditions are met, we will perform a paired t test for µd.
• Random: A random sample of times to make the purchases was selected, and the students were assigned to lanes at random.
• Normal: We don’t know if the population distribution of differences is approximately Normal and we don’t have a large sample size, so we will graph the differences and look for any departures from Normality (draw a dotplot, boxplot, or Normal probability plot). The graph is slightly skewed right with no outliers, so it is reasonable to use t procedures for these data.
• Independent: One difference in times should not affect another, since we randomly selected the times to conduct the study.
DO
• Using One-Variable STATS in the calculator, the sample mean difference is x̄ = 42.7 seconds with a standard deviation of sx = 84.0 seconds.
• Test statistic: t = (42.7 − 0) / (84.0 / √15) = 1.97
• P-value: P(t > 1.97) using the t distribution with 15 − 1 = 14 degrees of freedom. Using Table B, I find that the P-value is between 0.025 and 0.05. Using technology, P-value = tcdf(1.97, 100, 14) = 0.034.
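The DO step can also be reproduced without a calculator. The sketch below is one way to do it, assuming only Python's standard library. `t_upper_tail` is a hypothetical helper written for this example (not a library function): it approximates the upper-tail area by integrating the t density with Simpson's rule, which plays the same role as tcdf(1.97, 100, 14).

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0, where T has a t distribution with df degrees of freedom."""
    # Log of the density's normalizing constant; lgamma avoids overflow for large df.
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule on [0, t]; steps must be even.
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area_0_to_t = total * h / 3

    # The density is symmetric about 0, so the area to the right of 0 is 0.5.
    return 0.5 - area_0_to_t

n = 15            # number of paired differences
mean_diff = 42.7  # sample mean difference (seconds)
sd_diff = 84.0    # sample standard deviation of the differences (seconds)

t_stat = (mean_diff - 0) / (sd_diff / math.sqrt(n))
p_value = t_upper_tail(t_stat, n - 1)

print(round(t_stat, 2), p_value)
```

The result may differ from the calculator value in the third decimal place because the code uses the unrounded test statistic rather than 1.97. If SciPy is installed, `scipy.stats.t.sf(t_stat, n - 1)` computes the same tail area.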
CONCLUDE
• Since the P-value is less than α (0.034 < 0.05), we reject the null hypothesis. There is convincing evidence that the express lane is faster than the regular lane, on average.
• When you’re working with paired data, the Independent condition refers to the differences.
• When calculating the P-value, be sure to report the degrees of freedom and please be specific about your method (how you used your calculator or Table B).
• Significant does not mean important!!
• Example: Suppose that a large school district implemented a new math curriculum for the current school year. To see if the new curriculum is effective, the district randomly selects 500 students and compares their scores on a standardized test after the curriculum change with their scores on the same test before the curriculum change. The mean improvement was x̄d = 0.9 with a standard deviation of sd = 12.0.
(a) Calculate the test statistic and P-value for a test of H0: µd = 0 and Ha: µd > 0.
• t = (0.9 − 0) / (12 / √500) = 1.68
• P-value = tcdf(1.68, 100, 499) = 0.047
(b) Are the results significant at the 5% level?
• Since the P-value is less than 0.05 (barely), the results ARE significant at the 5% level. We reject the null hypothesis of no difference and conclude that there was an improvement.
(c) Can we conclude that the new curriculum is the cause of the apparent increase in scores?
• No! Even though the increase was statistically significant, we cannot conclude that the new curriculum was the cause of the increase (we would need a control group for comparison and random assignment of treatments!).
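To see why such a small improvement can still be statistically significant, it helps to recompute the test statistic at different sample sizes. The sketch below makes two assumptions that are not in the example: the sample sizes 125 and 2000 are hypothetical, and the P-value uses the standard Normal distribution as an approximation to the t distribution (reasonable here only because the degrees of freedom are large), not tcdf itself.

```python
import math
from statistics import NormalDist

mean_improvement = 0.9  # mean improvement from the example
sd_improvement = 12.0   # standard deviation of the improvements

def t_and_approx_p(n):
    """Return the test statistic and a Normal-approximation upper-tail P-value."""
    t = (mean_improvement - 0) / (sd_improvement / math.sqrt(n))
    p = 1 - NormalDist().cdf(t)  # approximation to P(T > t) for large df
    return t, p

t_500, p_500 = t_and_approx_p(500)  # the sample size used in the example

# 125 and 2000 are hypothetical sample sizes, included only for comparison.
for n in (125, 500, 2000):
    t, p = t_and_approx_p(n)
    print(n, round(t, 2), round(p, 4))
```

The same 0.9-point improvement is far from significant with 125 students and highly significant with 2000, so the P-value says as much about the sample size as about the size of the improvement.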
• If there is a lack of significance, don’t ignore it! Acknowledge that you have an absence of evidence!
• Example: When an AP Statistics class did a caffeine and pulse rate experiment, the confidence interval for the difference in average pulse rate changes was (-2.52, 4.59).
• This means that drinking soda with caffeine could increase pulse rates by as much as 4.59 beats per minute, on average, OR decrease pulse rates by as much as 2.52 beats per minute, on average (compared with drinking soda with no caffeine).
• Because the interval includes 0, we don’t have convincing evidence that caffeine affects pulse rates (our results are not statistically significant), but we need more data to estimate this effect.
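The interval itself shows why more data are needed. The short sketch below assumes the interval has the usual form estimate ± margin of error, so its midpoint is the point estimate and half its width is the margin of error; the variable names are illustrative.

```python
lower, upper = -2.52, 4.59  # confidence interval from the caffeine example

# 0 inside the interval means "no difference" is a plausible value.
contains_zero = lower <= 0 <= upper

# Assuming the interval is estimate +/- margin of error:
point_estimate = (lower + upper) / 2
margin_of_error = (upper - lower) / 2

print(contains_zero, round(point_estimate, 3), round(margin_of_error, 3))
```

Under that assumption, the margin of error is more than three times the size of the estimate, so the data are too imprecise to pin down the effect.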
• Do not use inference when…
  ◦ Your experimental design is poor (you can’t make inferences when your data collection was biased!)
  ◦ You already have the entire population’s parameters!
• Example: Suppose that you wanted to know the average GPA for students at your school who are enrolled in AP Statistics. Since this isn’t a large population, you conduct a census and record the GPA for each student. Is it appropriate to construct a one-sample t interval for the population mean GPA?
• NO!! You already have all the GPAs for every student. We only use inference to make a conclusion about something unknown about the population from SAMPLES.
• Search data for patterns; however, do NOT search for significance!
  ◦ Running one test and reaching the significance level is reasonably good evidence that you found something.
  ◦ Running 20 tests and finding ONLY ONE out of 20 reaching the significance level is NOT reasonably good evidence.
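A quick calculation shows why one significant result out of 20 is unconvincing. This sketch assumes that all 20 null hypotheses are true, that the tests are independent, and that each uses α = 0.05; those assumptions are added for illustration and are not stated in the lesson.

```python
alpha = 0.05    # significance level for each test
num_tests = 20  # number of tests run

# Chance that at least one test is "significant" purely by chance,
# assuming independent tests and every null hypothesis true.
p_at_least_one = 1 - (1 - alpha) ** num_tests

# Average number of false positives expected across the tests.
expected_false_positives = alpha * num_tests

print(round(p_at_least_one, 3), expected_false_positives)
```

Under these assumptions, there is about a 64% chance of at least one significant result, and one false positive is expected on average, even when nothing is really going on.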
• Assigned reading: p. 565-587
• HW problems:
  ◦ Lesson 5: p. 564 #57-60; p. 586 #63, 67, 71, 73
  ◦ Lesson 6: p. 588 #75, 77, 80, 82, 89, 94-97, 99-104
• Check answers to odd problems.