HW #3, STA 6166, Fall 2005
DUE: 11 October 2005 at start of class

There are several problems listed in this HW. Please do the first 2 problems (gasoline and cattle) as your assignment. The rest are practice problems to try; their answers will be posted on the website after the HW is handed in.

1. Each time I purchase gasoline, I calculate the miles per gallon I got on the last tank. Here are the mpg values for my last 20 fills: 20.4, 22.6, 22.1, 23.5, 21.7, 21.2, 25.3, 23.1, 22.6, 24.6, 25.3, 22.4, 21.2, 23.4, 22.4, 24.1, 21.8, 23.7, 22.1, and 22.9.

a. Can these be considered a random sample from the population of values for mpg for my car that could have been recorded? Explain. If this is not a random sample, how might I take a true random sample?
No, they are not a true random sample, because the values were not randomly selected from the population of values for my car. Unfortunately, I failed to keep good records, so we cannot actually take a random sample. We could have if I had recorded mpg for every tank of gas I put into the car since I got it; then we could randomly select from that set. For the following we shall assume that the sample was randomly selected.

b. Describe the sampling distribution of the sample mean when a random sample of size 20 is taken from the population. What assumptions did you have to make for this sampling distribution to be correct?
For a sample of size 20, I need to assume that the population of mpg values is approximately normally distributed so that the sample mean is normally distributed. If so, the mean of the sampling distribution is $\mu_{\bar{X}} = \mu$, the population mean mpg, and the standard deviation is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

c. Describe the sampling distribution of $\chi^2 = (n-1)s^2/\sigma^2$ when a random sample of size 20 is taken from the population. What assumptions did you have to make for this sampling distribution to be correct?
If the population of mpg values is normally distributed, then the chi-square statistic has a chi-square distribution on n - 1 = 20 - 1 = 19 degrees of freedom. The mean of the sample variance is $\mu_{S^2} = \sigma^2$ and the variance of the sample variance is $\sigma^2_{S^2} = 2\sigma^4/(n-1)$.

d. Use JMP to obtain the frequency distribution, sample mean, and sample variance for this dataset. Describe the distribution of the sample data (not the sample statistics!).
The data appear to come from a Normal distribution since the histogram is not very skewed. The sample mean is 22.82 and the sample standard deviation is 1.3308 (sample variance about 1.771).

[JMP output: Distributions, mpg; histogram with axis from 20 to 26]

Moments
Mean              22.82
Std Dev           1.3308486
Std Err Mean      0.2975868
upper 95% Mean    23.442856
lower 95% Mean    22.197144
N                 20

e. Do any of the results in (d) support any of the assumptions that you list in (b) or (c)? Explain.
The assumptions that I made were that the sampling was random (we are told to take this as true) and that the population was normally distributed. From the histogram, the normality assumption does not seem unreasonable.

f. Suppose the true mean mpg for my car when it is running correctly is 26 mpg. Calculate the z-score for the sample mean you obtained in (d). Does this imply that the car is in good running order right now?
I cannot calculate a Z-score since I do not know the population standard deviation $\sigma$. I can calculate a T-score, and that is $T = (22.82 - 26)/0.29759 \approx -10.69$. This value is far below zero, implying that it is very unlikely that I am still getting 26 mpg on average. As a result, I think maybe my car is not in good running order and I need to get a tune-up!

2. The incidence of paratuberculosis in Florida’s beef cattle is believed to be 10%. Suppose you take a random sample of 100 animals and find 12 infected.

a. Is the sample size sufficiently large for you to use the Central Limit Theorem and claim that the sample proportion is approximately Normally distributed?
Yes, n = 100 and if $\pi = 0.10$ is true then we would expect 10 infected cattle and 90 uninfected cattle in our sample. Both meet the condition for approximate Normality.

b. If your answer to (a) is yes, what is the sampling distribution of the sample proportion of infected cattle?
The sampling distribution is approximately Normal with a mean equal to the population proportion $\pi$ and a standard deviation of $\sqrt{\pi(1-\pi)/n}$. If $\pi = 0.10$, then the mean and standard deviation are 0.10 and 0.03, respectively.

c. Using your answer in (b), calculate the probability that you would observe a sample proportion of 0.12 or higher if the true proportion is 10%.
$\Pr(p > 0.12 \mid \pi = 0.10) = \Pr\left(Z > \frac{0.12 - 0.10}{0.03}\right) = \Pr(Z > 0.67) = 1 - \Pr(Z < 0.67) = 1 - 0.7486 = 0.2514$

d. Does your answer in (c) imply that the true population proportion is 10%? Explain.
Observing a value of 0.12 if the true proportion is 0.10 is not particularly unusual, so the sample is consistent with a true proportion of 10%, although it does not prove it.

e. Suppose the sample had yielded 23 animals infected. What is the probability that you would observe a sample proportion of 0.23 or higher if the true proportion is 10%? Would this result imply that the true population proportion is 10%?
$\Pr(p > 0.23 \mid \pi = 0.10) = \Pr\left(Z > \frac{0.23 - 0.10}{0.03}\right) = \Pr(Z > 4.33) \approx 0$ (slightly above zero). This implies that it would be extremely unlikely that our sample would yield a 23% infection rate if the true rate is only 10%, so such a result would cast serious doubt on the 10% figure.

3. Read the short article on bee dances, Pennisi, E. 2001. Bee Dance Reveals Bee’s-Eye View. Science, 292: 1628-1629, that is available online through UF’s library. After doing so, please answer the following questions. For our purposes we are interested in the Bernoulli variable Y = “the bee flew up to the observer at the 70-meter distance from the hive” that was recorded for each of the 220 bees that were followed by the experimenter. Note that the researchers summarized their results by reporting that X = 165 of the bees were recorded as a Yes (the success).

a.
Discuss the sampling requirements that must be met for X to have a Binomial distribution. Can we tell from this write-up whether they have been met in this experiment?
Binomial experiment sampling requirements are:
  1. The sampling is random and the observations are independent.
  2. The true probability that the bee would fly up to an observer 70 meters from the hive is constant over the length of the experiment.
  3. The population size is much larger than the sample size (or sampling is with replacement).
From the article it is difficult to discern whether any of the conditions have been met, although it doesn’t seem unreasonable that they have. Let’s assume that the requirements have been met.

b. What is the sampling distribution for the sample proportion of successes for this experiment where the sample size is n = 220? What assumptions did you make for this sampling distribution to be correct?
Given a sample of n = 220 and a sample proportion of successes of 165/220 = 0.75, it seems reasonable to believe that we would observe on average at least 10 successes and 10 failures. Further, we require that the sampling be done according to a Binomial experiment. If true, the sampling distribution is approximately Normal with a mean of $\pi$ and a standard deviation of $\sqrt{\pi(1-\pi)/n}$.

c. Suppose that the population proportion of bees that would fly up to the observer at the 70-meter distance from the hive is 0.81. Given this piece of information you can now check if one of the assumptions listed in (b) has been met. Do so.
$n\pi = 220(0.81) = 178.2$ and $n(1-\pi) = 220(0.19) = 41.8$. Both exceed 10; hence, the sample size is sufficiently large.

d. If the population proportion is 0.81, is observing a sample proportion of 0.75, as was done in this experiment, unusual? To answer this, calculate the z-score for this value. Also, use your answer in (b) to calculate the probability that, if we repeated the experiment, we would see a sample proportion bigger than the 0.75 observed this time.
$Z = \frac{0.75 - 0.81}{\sqrt{0.81(1 - 0.81)/220}} = \frac{-0.06}{0.0264} = -2.27$
$\Pr(Z < -2.27) = 0.0116$, so the probability of seeing a sample proportion bigger than 0.75 is 1 - 0.0116 = 0.9884. From these results, I would say it is fairly unlikely we would see a sample proportion as small as 0.75 if the true proportion is 0.81.

4. Calculate a 95% confidence interval for the population proportion of Florida cattle infected with paratuberculosis based on the results from the random sample of 100 cattle. Interpret this interval in the context of the problem.
Using the bee data from problem 3 (p = 0.75, n = 220):
$p \pm 1.96\sqrt{\frac{p(1-p)}{n}} = 0.75 \pm 1.96\sqrt{\frac{0.75 \times 0.25}{220}} = 0.75 \pm 1.96(0.029) = 0.75 \pm 0.057 = \{0.693, 0.807\}$
This can be interpreted as “we are 95% confident that the true proportion of bees that would fly to an observer 70 meters from the hive, when given the type of false information given in this experiment, is captured within the interval of 0.693 to 0.807.”
For the cattle sample named in the question (p = 12/100 = 0.12, n = 100), the same formula gives $0.12 \pm 1.96\sqrt{0.12(0.88)/100} = 0.12 \pm 1.96(0.0325) = 0.12 \pm 0.064 = \{0.056, 0.184\}$, so we are 95% confident that the true proportion of infected Florida cattle lies between 0.056 and 0.184.

5. Calculate a 90% confidence interval for the true mean mpg of my car at the current time based on the sample of 20 values (see problem (1)) taken most recently. Interpret this interval in the context of the problem.
T for 19 df and a confidence level of 0.90 is T = 1.729. So, the 90% confidence interval is
$\bar{X} \pm t_{n-1,0.90}\left(\frac{s}{\sqrt{n}}\right) = 22.82 \pm 1.729(0.298) = 22.82 \pm 0.5152 = \{22.305, 23.335\}$.
We are 90% confident that my car’s true current average mpg is somewhere between 22.3 and 23.3 mpg.

6. Consider the following three statements:
  1) a confidence level of 95% says that the probability that a calculated 95% confidence interval includes the true value of the population mean is 0.95;
  2) a confidence level of 95% says that on average 95 out of 100 samples would have confidence intervals that include the true value of the population mean; and
  3) a confidence level of 95% says that there is a 95% probability that the true mean is contained within a particular calculated 95% confidence interval.

a. Are these three statements saying the same thing? Explain.
Sort of.
The second and third statements are saying the same thing, but the first statement could be interpreted differently. It could be implying that the mean has a probability associated with it, when in fact the random factor here is the confidence interval, not the mean. So the probability refers to whether the interval falls over the mean, not to whether the mean falls in the interval. Yes, I know it is subtle, and you may not have interpreted it as the mean being variable, but if you had, the distinction is important. The population mean does not move around from sample to sample; the intervals do.
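
The point that the intervals move while the mean stays put can be checked with a small simulation. The sketch below is only an illustration, not part of the assignment: the population mean of 23.0 and standard deviation of 1.3 are made-up values, the function name `coverage` is hypothetical, and 2.093 is the tabled t value for 19 df at 95% confidence. It draws many samples of size 20 from a Normal population with a fixed mean, builds a 95% interval from each sample, and counts how often the interval covers that fixed mean.

```python
import random
import statistics


def coverage(mu=23.0, sigma=1.3, n=20, t_crit=2.093, reps=5000, seed=1):
    """Fraction of simulated 95% t-intervals that contain the fixed mean mu.

    mu and sigma are hypothetical population values chosen for illustration;
    t_crit is the tabled two-sided 95% t value for n - 1 = 19 df.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # A fresh random sample: this is the part that changes every time.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        std_err = statistics.stdev(sample) / n ** 0.5
        # The interval endpoints are random; mu never moves.
        if xbar - t_crit * std_err <= mu <= xbar + t_crit * std_err:
            hits += 1
    return hits / reps


rate = coverage()
print(f"Share of intervals covering the true mean: {rate:.3f}")
```

The printed share should land near 0.95, which is what statement 2 describes: about 95 of every 100 intervals cover the mean, while any single calculated interval either covers it or does not.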