Worksheet on Statistical Tests
Testing Proportions
A genetic marker is expected to be present in approximately 1/3 of the population. A simple random sample of 70
individuals is tested, and 20 turn out to have the marker. Do a test for the null hypothesis p = 1/3:
H0: p = 1/3
H1: p ≠ 1/3
• Choose a level, and reject or not the hypothesis on the basis of your chosen level
• Find the p-value for this test
Solutions
The sample mean (that is, the sample proportion) for this sample is 20/70 ≈ 0.28571.
We assume p = 1/3, which implies a standard deviation of √(p(1 − p)) = √((1/3) · (2/3)) = √2/3 ≈ 0.4714 for the
Bernoulli random variable, and a “standard error” (that is, √(p(1 − p)/n)) of
0.05634
Assuming that we can use the normal approximation, the z-score for our observations turns out to be
−0.8452
which corresponds to a left-tail probability of
0.19901
(for the two-sided alternative, the p-value is twice that, about 0.398).
That’s very high, which implies that we cannot reject the null hypothesis at any reasonable significance level.
Indeed, the acceptance regions for the usual levels are
90%: 0.24066 to 0.42601
95%: 0.22290 to 0.44376
99%: 0.18820 to 0.47846
all of which include our sample mean.
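As a cross-check, the whole computation (z-score, left-tail probability, and acceptance regions) can be reproduced with the Python standard library. This is only a sketch of what the spreadsheet did, using the summaries given above:

```python
from math import sqrt
from statistics import NormalDist

# One-sample z-test for a proportion, using the worksheet's numbers.
n, successes = 70, 20
p0 = 1 / 3                       # null-hypothesis proportion
p_hat = successes / n            # sample proportion, ~0.28571

# Standard error under H0: sqrt(p0 * (1 - p0) / n)
se = sqrt(p0 * (1 - p0) / n)     # ~0.05634
z = (p_hat - p0) / se            # ~ -0.8452

left_tail = NormalDist().cdf(z)  # ~0.19901
two_sided_p = 2 * left_tail      # ~0.398 for H1: p != 1/3

# Acceptance regions p0 ± z* · se for the usual two-sided levels
for level, z_star in [(0.90, 1.6449), (0.95, 1.9600), (0.99, 2.5758)]:
    lo, hi = p0 - z_star * se, p0 + z_star * se
    print(f"{level:.0%}: {lo:.5f} to {hi:.5f}")
```

The printed intervals agree with the table above to within rounding of the critical values.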
p-value when working with tables
The p-value above was calculated by a spreadsheet. If you are using tables, you will have to look up the probability
corresponding to the z-score −0.845. This score is rounded to three decimal places, while tables are rounded to two,
but you can take the midpoint between the entries for 0.84 and 0.85. Using the negative-score table, or subtracting
the values read for 0.84 and 0.85 in the positive-score table from 1, we get 0.1991, which misses the computer value
by 0.0001.
“Real” Data
BMI
A table (taken from another textbook) lists values of the Body Mass Index for a sample of females. The sample is unlikely to
have been a true simple random one, and some of the data are by definition extraneous since the BMI standard for teenagers and
younger is different from the standard for adults 20 and over. After purging the table of these spurious entries, we have the
following summaries:
Sum: 874.1
Count: 33
Sum of squares: 24471.3
The BMI standard for adults is the following:
BMI              Weight Status
Below 18.5       Underweight
18.5 – 24.9      Normal
25.0 – 29.9      Overweight
30.0 and above   Obese
We want to test whether the mean female BMI falls in the overweight range according to this standard.
1. Null Hypothesis: the mean is not overweight
H0: μ = 24.9
H1: μ > 24.9
2. Null Hypothesis: the mean is overweight
H0: μ = 24.9
H1: μ < 24.9
Solutions
The essential statistics for our sample are
Mean: 26.4879
Standard Error: 1.11729
Standard Deviation: 6.41837
Sample Variance: 41.1955
Recall: the “standard error” is s/√n, the term multiplying the critical t value in interval estimates and in computing
critical values.
We can compute the t-score, (x̄ − μ)/(s/√n), where μ is the mean indicated in the null hypothesis (in both our cases,
24.9). This turns out to be equal to
1.42118
corresponding to a p-value of
0.08247
Of course, since our t-score is positive, it cannot fall in the left tail, so if we are testing according to test #2 we have
no way of rejecting the null hypothesis (that the mean is overweight). If we are following test #1, where the critical
region is the right tail, we are in the acceptance region at any level lower than 8.247%, such as the usual 1%, 5%, or
even 8%; that is, at these levels, we cannot reject the hypothesis that the mean of the population is not overweight.
We would reject this hypothesis (hence, decide that there is enough evidence to suggest that the mean is overweight)
if we tested at the 10% level.
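The t-score can be recovered directly from the three reported summaries (sum, count, sum of squares) with a few lines of standard-library Python. Since the summaries are rounded, the last digits may differ slightly from the spreadsheet's values:

```python
from math import sqrt

# One-sample t-score for the BMI data, from the reported summaries.
total, n, sum_sq = 874.1, 33, 24471.3
mu0 = 24.9                                    # null-hypothesis mean

mean = total / n                              # ~26.4879
variance = (sum_sq - total**2 / n) / (n - 1)  # ~41.195
sd = sqrt(variance)                           # ~6.418
se = sd / sqrt(n)                             # ~1.117

t = (mean - mu0) / se                         # ~1.421, with n - 1 = 32 d.o.f.
```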
p-value when working with tables
The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up
with a precise p-value, but you will be able to produce bounds. In our case, the t-score of 1.42118 lies (in the row for
32 degrees of freedom) between 1.694 (probability 0.05) and 1.309 (probability 0.1). Your answer would then be
that the p-value lies between 0.05 and 0.1, which for acceptance/rejection purposes is often enough, as significance
levels between these two are not as commonly used.
Screws
These are measurements reportedly made by the author on a box of screws. Assuming it is reasonable to treat this as a simple
random sample from the manufacturer’s production (a shaky assumption), which is assumed to be normally distributed around the
nominal value of 0.75 (not unreasonable), we can test whether this is indeed the true mean:
H0: μ = 0.75
H1: μ ≠ 0.75
Count: 50
Sum: 37.341
Sum of Squares: 27.8944
Solution
This is a two-tailed test, and the statistics for our sample are
Mean: 0.74682
Standard Error: 0.00174
Standard Deviation: 0.01232
Sample Variance: 0.00015
The t-score resulting from these numbers (referring to an assumed true mean of 0.75) is
−1.82491
(negative, since the sample mean lies below 0.75; only its absolute value matters for a two-tailed test), and the
corresponding one-tail probability is
0.03706
This is a two-tailed test, so that, at level α, each tail has an area (probability) of α/2: at 10%, each tail has
probability 5% (we reject the null hypothesis); at 8%, a probability of 4% (we reject the null hypothesis); at 5%, a
probability of 2.5% (we cannot reject the null hypothesis). In other words, we will reject the null hypothesis (that
the screws are of the nominal size) only at levels higher than about 7.4%.
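Again, the t-score can be recomputed from the summaries alone. Because the reported sums are rounded, the result (about −1.83) differs slightly from the spreadsheet's value, which was presumably computed from the raw data:

```python
from math import sqrt

# One-sample t-score for the screws data, from the reported summaries.
total, n, sum_sq = 37.341, 50, 27.8944
mu0 = 0.75                                    # nominal screw size

mean = total / n                              # 0.74682
variance = (sum_sq - total**2 / n) / (n - 1)
sd = sqrt(variance)
se = sd / sqrt(n)

t = (mean - mu0) / se                         # negative: sample mean < 0.75
```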
p-value when working with tables
The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up
with a precise p-value, but you will be able to produce bounds. We should deal with 49 degrees of freedom, which
are not usually listed explicitly. In our case, the t-score of 1.82491 lies approximately between 2.009 (probability
0.025) and 1.676 (probability 0.05) in the 50 d.o.f. row, and your probability bounds are the same if you look at the
45 and 40 d.o.f. rows. Your answer would then be that the p-value lies between 0.025 and 0.05, which for
acceptance/rejection purposes is often enough, as significance levels between these two are not as commonly used.
Freshmen weights (paired samples)
The following data summarize the weight of a group of freshmen in September and April. The two sets are clearly not
independent, and the test that we can apply refers to the differences between the two measurements, which we will
assume to be normally distributed (an assumption that, as usual, we can only hope holds at least approximately).
Count: 67
Sum of weights, September: 4359
Sum of weights, April: 4438
Sum of differences (equal to the difference of the sums): 79
Sum of squares of differences: 1091
As usual, we can set up various tests, depending on our goals. For example, writing d for the population mean of the
differences,
H0: d = 0; H1: d > 0
H0: d = 0; H1: d < 0
H0: d = 0; H1: d ≠ 0
Solutions
The statistics for these data are
Mean: 0.45418
Standard Error: 0.15388
Standard Deviation: 1.25954
Sample Variance: 1.58643
The t-score from these numbers is
2.95
which is high: we are going to reject the null hypothesis in case 1 (right tail is the critical region), or 3 (two-tailed
test), since the p-value is
.00219
That indicates that we would fail to reject the null hypothesis in the first and third tests only if the right critical
region were smaller than 0.22%, which is incompatible with any “normal” significance level. The second test has the
left tail as its critical region, so we would not reject the hypothesis that, essentially, says that the average
difference in weight is greater than or equal to zero.
p-value when working with tables
The p-value above was calculated by a spreadsheet. If you are using tables, you will have a hard time coming up
with a precise p-value, but you will be able to produce bounds. It is unlikely that any table will have a row for 66
degrees of freedom, but a quick inspection will show you that 2.95 must correspond to a p-value smaller than
0.005 for any number of degrees of freedom larger than 14, let alone for 66. Thus, your answer would be that the
p-value is lower than 0.005, which is usually more than enough for rejection purposes.
Remark
For practical purposes, we did not report the raw data, which consist of 67 pairs of values, but only the summaries
of the differences within each pair. This underlines the fact that “paired data” tests are simply run-of-the-mill
t-tests, where the basic data are the differences between pairs of data points - but otherwise not any different from
tests on simple samples. Traditional books (including our book) and software place this topic in the “two samples”
chapter, but this is somewhat misleading since, in fact, we are dealing with one sample (the collection of
differences). We will quickly look at the protocol for true “two sample” tests - comparing two independent samples -
and, although, in the end, this still results in a t-test (assuming normality), the statistic to be used is computed in a
significantly different way.
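The point that a paired test is just a one-sample t-test on the differences can be sketched as follows; the September/April weights here are made-up illustrative values, not the worksheet's data:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired-samples t statistic: an ordinary one-sample t-test
    run on the pairwise differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # standard error of the mean difference
    return mean(diffs) / se       # t with n - 1 degrees of freedom

# Hypothetical weights for five students (illustration only)
september = [150, 162, 141, 178, 155]
april = [153, 161, 145, 181, 158]
t = paired_t(september, april)    # compare to a t table with 4 d.o.f.
```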