Download HW3 Solutions - uf statistics

HW # 3
STA 6166
Fall 2005
DUE: 11 October 2005 at start of class
There are several problems listed in this HW. Please do the first 2 problems (gasoline and cattle)
as your assignment. The rest are practice problems to be tried and whose answers will be put up
on the website after the HW is handed in.
1. Each time I purchase gasoline, I calculate the miles per gallon I got on the last tank. Here
are the mpg values for my last 20 fills:
20.4, 22.6, 22.1, 23.5, 21.7, 21.2, 25.3, 23.1, 22.6, 24.6, 25.3, 22.4, 21.2, 23.4, 22.4, 24.1, 21.8, 23.7, 22.1, and 22.9.
a. Can these be considered a random sample from the population of values for mpg
for my car that could have been recorded? Explain. If this is not a random sample,
how might I take a true random sample?
No, they are not a true random sample in that the values were not randomly selected from the
population of values for my car. Unfortunately, I failed to keep good records so we cannot
actually take a random sample. We could have if I had recorded mpg for every tank of gas I put
into the car since I got it. Then we could randomly select from that set.
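As a minimal sketch of the procedure just described, suppose a complete log of mpg values did exist. The log below (`full_log`) is invented purely for illustration, since no such record is available; the point is only that a true random sample is drawn from the full set of recorded values, not taken as the most recent 20 fills.

```python
import random

# Hypothetical complete log: one mpg value per tank since the car was bought.
# These numbers are simulated for illustration; the real log does not exist.
random.seed(1)  # fixed seed so the sketch is reproducible
full_log = [round(random.gauss(23.0, 1.3), 1) for _ in range(150)]

# Simple random sample of 20 tanks, drawn without replacement,
# so every tank in the log has the same chance of being chosen.
sample = random.sample(full_log, k=20)
print(sample)
```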
For the following we shall assume that the sample was randomly selected.
b. Describe the sampling distribution of the sample mean when a random sample of
size 20 is taken from the population. What assumptions did you have to make for
this sampling distribution to be correct?
For a sample of size 20, I will need to assume that the population of mpg values is approximately
normally distributed so that the sample mean would be normally distributed. If so, the mean of
the sampling distribution is $\mu_{\bar{X}} = \mu$, the population mean mpg, and the standard deviation is
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
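The claim that the sample mean has standard deviation σ/√n can be checked with a small simulation. This is only a sketch: the population values `MU` and `SIGMA` are hypothetical (the true values for the car are unknown), and the population is assumed to be exactly Normal.

```python
import math
import random
import statistics

random.seed(0)                   # reproducible sketch
MU, SIGMA, N = 23.0, 1.3, 20     # MU and SIGMA are hypothetical; N matches the problem
REPS = 20_000                    # number of simulated samples of size N

# Draw many samples of size N and keep each sample mean.
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

theoretical_sd = SIGMA / math.sqrt(N)           # sigma / sqrt(n)
simulated_sd = statistics.stdev(sample_means)   # spread of the simulated means
print(theoretical_sd, simulated_sd)
```

The two printed values should agree closely, and the average of `sample_means` should sit near `MU`.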
c. Describe the sampling distribution of $\chi^2 = (n-1)s^2/\sigma^2$ when a random sample
of size 20 is taken from the population. What assumptions did you have to make
for this sampling distribution to be correct?
If the population of mpg values is normally distributed, then the chi-square statistic has a chi-square
distribution on n − 1 = 20 − 1 = 19 degrees of freedom. The mean of the sample variance is
$\mu_{S^2} = \sigma^2$ and the variance of the sample variance is $\sigma^2_{S^2} = 2\sigma^4/(n-1)$.
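The mean and variance of the sample variance quoted above can be illustrated by simulation. This is a sketch under stated assumptions: `SIGMA` is a hypothetical population standard deviation, the population is exactly Normal, and `sample_variance` is a helper defined here for illustration.

```python
import random
import statistics

random.seed(0)                        # reproducible sketch
SIGMA, N, REPS = 1.3, 20, 20_000      # SIGMA is hypothetical; N matches the problem

def sample_variance(values):
    """Unbiased sample variance s^2 (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# One s^2 per simulated sample of size N from a Normal population.
variances = [
    sample_variance([random.gauss(0.0, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_s2 = statistics.fmean(variances)       # compare with sigma^2
var_s2 = statistics.variance(variances)     # compare with 2 * sigma^4 / (n - 1)
print(mean_s2, SIGMA ** 2)
print(var_s2, 2 * SIGMA ** 4 / (N - 1))
```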
d. Use JMP to obtain the frequency distribution, sample mean, and sample variance
for this dataset. Describe the distribution of the sample data (not the sample
statistics!).
Distributions: mpg
[Histogram of mpg; horizontal axis from 20 to 26]
The data appear to come from a Normal distribution since the histogram is not very
skewed. The sample mean is 22.82 and the sample standard deviation is 1.3308, so the
sample variance is 1.3308² ≈ 1.771.
Moments
Mean             22.82
Std Dev          1.3308486
Std Err Mean     0.2975868
upper 95% Mean   23.442856
lower 95% Mean   22.197144
N                20
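For readers without JMP, the Moments values can be reproduced with Python's standard library. This is a sketch, not the method used in the solution: the t quantile 2.093 (0.975 quantile, 19 df) is a tabulated value supplied here as an assumption, since the standard library has no t distribution.

```python
import math
import statistics

# The 20 mpg values from problem 1.
mpg = [20.4, 22.6, 22.1, 23.5, 21.7, 21.2, 25.3, 23.1, 22.6, 24.6,
       25.3, 22.4, 21.2, 23.4, 22.4, 24.1, 21.8, 23.7, 22.9, 22.1]

n = len(mpg)
mean = statistics.mean(mpg)
std_dev = statistics.stdev(mpg)        # sample standard deviation (divisor n - 1)
variance = statistics.variance(mpg)    # sample variance
std_err = std_dev / math.sqrt(n)       # standard error of the mean

T_975_19DF = 2.093   # assumed table value: 0.975 quantile of t with 19 df
lower_95 = mean - T_975_19DF * std_err
upper_95 = mean + T_975_19DF * std_err

print(n, mean, std_dev, variance, std_err, lower_95, upper_95)
```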
e. Do any of the results in (d) support any of the assumptions that you list in (b) or
(c)? Explain.
The assumptions that I made were that the sampling was random (we are told to consider it true)
and that the population was normally distributed. From the histogram this does not seem
unreasonable.
f. Suppose the true mean mpg for my car when it is running correctly is 26 mpg.
Calculate the z-score for the sample mean you obtained in (d). Does this imply
that the car is in good running order right now?
I cannot calculate a Z-score since I do not know the population standard deviation σ. I can
calculate a T-score instead: $T = (22.82 - 26)/0.29759 \approx -10.69$. This value is far out in the
lower tail, implying that it is very unlikely that I am still getting 26 mpg on average. As a result,
I think maybe my car is not in good running order and I need to get a tune-up!
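The T-score arithmetic can be checked directly. This short sketch uses the sample mean and "Std Err Mean" reported in part (d); the variable names are chosen here for illustration.

```python
sample_mean = 22.82
std_err_mean = 0.2975868   # "Std Err Mean" from the JMP output in part (d)
claimed_mean = 26.0        # mpg when the car is running correctly

# Number of standard errors between the sample mean and the claimed mean.
t_score = (sample_mean - claimed_mean) / std_err_mean
print(round(t_score, 2))   # prints -10.69
```

A sample mean more than ten standard errors below 26 is far outside what a t distribution with 19 degrees of freedom would produce by chance.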
2. The incidence of paratuberculosis in Florida’s beef cattle is believed to be 10%. Suppose
you take a random sample of 100 animals and find 12 infected.
a. Is the sample size sufficiently large for you to use the Central Limit Theorem and
claim that the sample proportion is approximately Normally distributed?
Yes, n = 100 and if π = 0.10 is true then we would expect 10 infected cattle and 90 uninfected
cattle in our sample. Both meet the condition for approximate Normality.
b. If your answer to (a) is yes, what is the sampling distribution of the sample
proportion of infected cattle?
The sampling distribution is approximately Normal with a mean equal to the population
proportion π and a standard deviation of $\sqrt{\pi(1-\pi)/n}$. If π = 0.10, then we can assign values to
the mean and standard deviation of 0.10 and 0.03, respectively.
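The sample-size check from part (a) and the standard deviation from part (b) can be computed together. This is a minimal sketch: the helper name `proportion_sampling_sd` is hypothetical, and the threshold of 10 expected successes and failures is the rule of thumb implied by the answer to part (a).

```python
import math

def proportion_sampling_sd(pi, n):
    """Standard deviation of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

pi, n = 0.10, 100
MIN_COUNT = 10   # assumed rule of thumb for the Normal approximation

expected_infected = n * pi
expected_uninfected = n * (1 - pi)
# Round before comparing so floating-point noise cannot flip a borderline case.
large_enough = (round(expected_infected, 6) >= MIN_COUNT
                and round(expected_uninfected, 6) >= MIN_COUNT)

sd = proportion_sampling_sd(pi, n)
print(large_enough, sd)
```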
c. Using your answer in (b), calculate the probability that you would observe a
sample proportion of 0.12 or higher if the true proportion is 10%.
$\Pr(p > 0.12 \mid \pi = 0.10) = \Pr\left(Z > \frac{0.12 - 0.10}{0.03}\right) = \Pr(Z > 0.67) = 1 - \Pr(Z < 0.67) = 1 - 0.7486 = 0.2514$
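The table lookup can be replaced by the standard Normal distribution in Python's `statistics` module. In this sketch, `upper_tail` is a hypothetical helper; the second calculation rounds z to two decimals to mimic reading a printed Z table, which is why it lands on the table-based answer while the unrounded version differs slightly.

```python
from statistics import NormalDist

def upper_tail(p_hat, pi, sd):
    """Return the z-score and Pr(Z > z) under the Normal approximation."""
    z = (p_hat - pi) / sd
    return z, 1.0 - NormalDist().cdf(z)

z, prob = upper_tail(0.12, 0.10, 0.03)

# Same calculation with z rounded to two decimals, as in a Z table.
prob_table = 1.0 - NormalDist().cdf(round(z, 2))
print(z, prob, prob_table)
```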
d. Does your answer in (c) imply that the true population proportion is 10%?
Explain.
Observing a value of 0.12 if the true proportion is 0.10 is not particularly unusual, so the
sample is consistent with a true proportion of 10% but does not prove it.
e. Suppose the sample had yielded 23 animals infected. What is the probability that
you would observe sample proportion of 0.23 or higher if the true proportion is
10%? Would this result imply that the true population proportion is 10%?
$\Pr(p > 0.23 \mid \pi = 0.10) = \Pr\left(Z > \frac{0.23 - 0.10}{0.03}\right) = \Pr(Z > 4.33) \approx 0$. This implies that it would be
extremely unlikely that our sample would yield a 23% infection rate if the true rate is only 10%.
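As a cross-check on the Normal approximation, the same tail probabilities can be computed from the exact Binomial distribution. Note that this is a different technique from the one used in the solution, shown here only for comparison; `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_upper_tail(k, n, pi):
    """Exact Pr(X >= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(k, n + 1))

# 12 or more, and 23 or more, infected animals out of 100 when pi = 0.10.
p_12_or_more = binomial_upper_tail(12, 100, 0.10)
p_23_or_more = binomial_upper_tail(23, 100, 0.10)
print(p_12_or_more, p_23_or_more)
```

Both calculations lead to the same conclusions: 12 infected animals is unremarkable under a 10% rate, while 23 would be very rare.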
3. Read the short article on bee dances, Pennisi, E. 2001. Bee Dance Reveals Bee’s-Eye
View. Science, 292: 1628-1629, that is available online through UF’s library. After doing
so, please answer the following questions. For our purposes we are interested in the
Bernoulli variable Y = “the bee flew up to the observer at the 70-meter distance from the
hive” that was recorded for each of the 220 bees that were followed by the experimenter.
Note that the researchers summarized their results by reporting that X=165 of the bees
were recorded as a Yes (the success).
a. Discuss the sampling requirements that must be met for X to have a Binomial
distribution. Can we tell from this write-up whether they have been met in this
experiment?
Binomial experiment sampling requirements are:
1. The sampling is random and the observations are independent.
2. The true probability that the bee would fly up to an observer 70 meters from the hive is
constant over the length of the experiment.
3. The population size is much larger than the sample size (or sampling is with
replacement).
From the article it is difficult to discern whether any of the conditions have been met
although it doesn’t seem unreasonable that they have.
Let’s assume that the requirements have been met.
b. What is the sampling distribution for the sample proportion of successes for this
experiment where the sample size is n=220? What assumptions did you make for
this sampling distribution to be correct?
Given a sample of n=220 and a sample proportion of successes of 0.75, it seems reasonable to
believe that we would observe on average at least 10 successes and 10 failures. Further, we
require that the sampling be done according to a Binomial experiment. If true, the sampling
distribution is approximately Normal with a mean of π and a standard deviation of $\sqrt{\pi(1-\pi)/n}$.
c. Suppose that the population proportion of bees that would fly up to the observer at
the 70-meter distance from the hive is 0.81. Given this piece of information you
can now check if one of the assumptions listed in (b) has been met. Do so.
nπ = 220*(0.81) = 178.2 and n(1-π) = 220*(0.19) = 41.8. Hence, the sample size is sufficiently
large.
d. If the population proportion is 0.81, is observing a sample proportion of 0.75, as
was done in this experiment, unusual? To answer this, calculate the z-score for
this value. Also, use your answer in (b) to calculate the probability that, if we
repeated the experiment, we would see a sample proportion bigger than the 0.75
observed this time.
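$Z = \frac{0.75 - 0.81}{\sqrt{0.81(1 - 0.81)/220}} = \frac{-0.06}{0.0264} = -2.27$
Pr(Z < −2.27) = 0.0116. From these results, I would say it is fairly unlikely we would see a sample
proportion as small as 0.75 if the true proportion is 0.81. Equivalently, if we repeated the
experiment, the probability of seeing a sample proportion bigger than 0.75 would be
1 − 0.0116 = 0.9884.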
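The checks in parts (c) and (d) can be reproduced in a few lines. This is a sketch using the Normal approximation from part (b); the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

n, pi = 220, 0.81
p_hat = 165 / n               # observed sample proportion, 0.75

# Part (c): expected successes and failures under pi = 0.81.
expected_yes = n * pi
expected_no = n * (1 - pi)

# Part (d): z-score of the observed proportion and the two tail areas.
sd = math.sqrt(pi * (1 - pi) / n)
z = (p_hat - pi) / sd
lower_tail = NormalDist().cdf(z)    # Pr(sample proportion <= 0.75)
upper_tail = 1.0 - lower_tail       # Pr(sample proportion > 0.75)
print(expected_yes, expected_no, sd, z, lower_tail, upper_tail)
```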
4. Calculate a 95% confidence interval for the population proportion of Florida cattle
infected with paratuberculosis based on the results from the random sample of 100 cattle.
Interpret this interval in the context of the problem.
$p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 0.75 \pm 1.96\sqrt{\frac{0.75 \times 0.25}{220}} = 0.75 \pm 1.96 \times 0.029 = 0.75 \pm 0.057 = (0.693, 0.807)$
This can be interpreted as “we are 95% confident that the true proportion of bees that would fly
to an observer 70 meters from the hive when given the type of false information given in this
experiment is captured within the interval of 0.693 to 0.807.”
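The worked interval above plugs in the bee data (165 of 220), while the question text refers to the cattle sample (12 of 100). As a hedged sketch, the same large-sample formula can be applied to either data set; `proportion_interval` is a hypothetical helper name, and 1.96 is the usual Normal multiplier for 95% confidence.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Large-sample interval: p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

bees_low, bees_high = proportion_interval(165, 220)      # data used in the solution
cattle_low, cattle_high = proportion_interval(12, 100)   # data named in the question
print(bees_low, bees_high)
print(cattle_low, cattle_high)
```

Under these assumptions the cattle interval runs from roughly 0.056 to 0.184, which contains the believed incidence of 10%.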
5. Calculate a 90% confidence interval for the true mean mpg of my car at the current time
based on the sample of 20 values (see problem (1)) taken most recently. Interpret this
interval in the context of the problem.
T for 19 df and a confidence level of 0.90 is T=1.729. So, the 90% confidence interval is
$\bar{X} \pm t_{n-1,0.90}\left(\frac{s}{\sqrt{n}}\right) = 22.82 \pm 1.729(0.298) = 22.82 \pm 0.5152 = (22.305, 23.335)$. We are 90%
confident that my car’s true current average mpg is somewhere between 22.3 and 23.3 mpg.
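The interval arithmetic can be verified with the values already reported. This sketch takes the t multiplier 1.729 from the solution rather than computing it, because Python's standard library does not include a t distribution.

```python
mean = 22.82
std_err = 0.2975868     # s / sqrt(n), from the JMP output in problem 1(d)
T_90_19DF = 1.729       # t multiplier for 90% confidence with 19 df (from the solution)

margin = T_90_19DF * std_err
low, high = mean - margin, mean + margin
print(round(low, 1), round(high, 1))   # prints 22.3 23.3
```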
6. Consider the following three statements:
1) a confidence level of 95% says that the probability that a calculated 95% confidence
interval includes the true value of the population mean is 0.95;
2) a confidence level of 95% says that on average 95 out of 100 samples would have
confidence intervals that include the true value of the true population mean; and
3) a confidence level of 95% says that there is a 95% probability that the true mean is
contained within a particular calculated 95% confidence interval.
a. Are these three statements saying the same thing? Explain.
Sort of. The second and third statements are saying the same thing but the first statement could
be interpreted differently. It could be implying that the mean has a probability associated with it
when in fact the random factor here is the confidence interval, not the mean. So the probability
refers to whether the interval falls over the mean, not whether the mean falls in the interval. Yes, I
know it is subtle, and you may not have interpreted it as the mean being variable, but if you
had, the distinction is important. The population mean does not move around from sample to
sample; the intervals do.
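The idea that the intervals move while the mean stays fixed can be illustrated with a simulation. This is a sketch under stated assumptions: the population values `MU` and `SIGMA` are hypothetical, the population is exactly Normal, and 2.093 is the tabulated 0.975 quantile of t with 19 df.

```python
import math
import random
import statistics

random.seed(0)                    # reproducible sketch
MU, SIGMA, N = 22.8, 1.3, 20      # hypothetical fixed population; N as in problem 1
T_975_19DF = 2.093                # assumed table value for a 95% interval, 19 df
REPS = 5_000

covered = 0
for _ in range(REPS):
    # A new sample gives a new interval; MU never changes.
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    center = statistics.fmean(sample)
    half_width = T_975_19DF * statistics.stdev(sample) / math.sqrt(N)
    if center - half_width <= MU <= center + half_width:
        covered += 1

coverage = covered / REPS         # fraction of intervals that captured MU
print(coverage)
```

The printed fraction should be close to 0.95: about 95 out of every 100 intervals land over the fixed mean, and which ones do varies from sample to sample.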