* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download STAT303: Fall 2003
Survey
Document related concepts
Transcript
STAT303: Fall 2003 Exam #2, Form A 1. Which of the following is/are true? A. A simple random sample is always the best method of sampling. There are other sampling methods such as stratified or cluster sampling that are appropriate in some situations. B. Blocking is used to eliminate confounding variables. Blocking is a method of controlling the variable of the data due to some known source such as gender, location or classification. *C. One type of control within an experiment is the use of comparison of several treatments. See IPS p. 241. To eliminate the effect of ‘being tested’, control groups are included in studies. D. All of the above are true statements. E. Exactly two of the above are true statements. 2. The weight of a package of mints is believed to be normally distributed with mean 21.37 grams and standard deviation 0.4, X ~ N(21.37, 0.42). What is the chance that a sample of 4 packages of mints has an average weight, X 4 , between 21 and 22 grams? P(21< X 4 < 22) = P((2121.37)/(0.4/4) < Z < (2221.37)/(0.4/4)) = P(1.85<Z<3.15) = 0.9992 0.0322 = 0.9670 A. 0.0314 B. 0.9032 C. 0.9426 *D. 0.9670 E. 0.9938 3. Using the rules for shift and scale changes, if X ~ N(25, 32), what is the distribution of W = 10*X + 5? A. W = 255, W = 30, but we can't say the shape is normal. Shift changes affect only the mean, where scale changes affect both the mean and the standard deviation: W = 10*X + 5 = 10*25 + 5 = 255, W = 10*X = 10*3 = 30. Shift and scales changes do not affect the shape of a distribution, so the shape is still normal. B. N(250, 302) C. N(255, 352) *D. N(255, 302) E. N(25, 32) 4. Suppose the IQ of children with Fetal Alcohol Syndrome is normally distributed with true mean = 70 and true standard deviation = 12. What does P(X < 50), where X is the IQ's of these children, actually mean? P = how likely, X = a child with FAS’s IQ, < 50 = 50 or less A. how likely an average child with FAS would have an IQ of 50 or less an average would mean an X *B. how likely any child with FAS would have an IQ of 50 or less C. how likely a sample of children with FAS would have an average of 50 or less not a sample, just one D. how likely one child out of a sample of children with FAS would have an IQ of 50 or less close but not quite, it’s one out of the whole population E. how likely the true mean IQ of children with FAS is 50 or less this is talking about 5. The term ‘sampling distribution’ refers to A sampling distribution is a distribution of a statistic, each observation requiring a sample, hence the inclusion of the word ‘sampling’ in the name. A. the distribution of the sample data No, this is just a plot of the data. B. the standard normal curve This is just the Z curve. C. the distribution of parameters Parameters are FIXED numbers, so they don’t have distributions. D. the distribution from which we took our sample This is the parent population. *E. the distribution of our sample statistic 6. There exists at least 100 years of weather data, but suppose we only took a sample of years and calculated a 95% confidence interval for the true yearly average total rainfall of (16.5,22) inches (obviously this isn't for Texas A&M!). Which of the following is best interpretation of this interval? The confidence interval (16.5, 22) is one particular interval out of innumerable ones possible. It either contains the true mean, , are not. A. There is a 95% probability that the true yearly average total rainfall is between 16.5 and 22 inches. Never, never, never does a particular interval have any probability!! B. If we sampled many years, 95% of the yearly totals would be between 16.5 and 22 inches. Again, this is just one particular interval. Who knows how many actual yearly totals would fall in this range. It could be a lot if is near 19.25 = x for this data, or not many if it’s far away. C. As long as we sampled enough years (at least 30), we will be 95% confident that the true yearly average total rainfall between 16.5 and 22 inches. See B. *D. If we took many different samples of years, about 95% of the con_dence intervals created from these samples would contain the true yearly average total rainfall. E. If we took many different samples of years, about 95% of the yearly average totals would equal the true yearly average total rainfall. Remember the homework! The probability of a sample mean exactly equaling is 0 for continuous data. In reality, although that value would be the most likely (the mode or peak of the distribution), there’s tons of other values. 7. Referring to the confidence interval in the last problem, (16.5,22), which of the following statements are plausible for this data (with 95% confidence)? NOTE: I rewrote these answers after the comments I got. This will be more what you will see on the final. *A. It is plausible that the true yearly average total rainfall is 20 inches. Since 20 falls within the interval, 20 is plausible. B. It is plausible that the true yearly average total rainfall is 16 inches. Since 16 does not fall within the interval, 16 is NOT plausible C. It is plausible that the true yearly average total rainfall is NOT 22 inches. Since 22 is within the interval, 22 is plausible. D. All of the above are plausible. E. Only two of the above are plausible. 8. We want to know the percentage of former students that use Statistics after graduation. What is the “best” sampling scheme to apply? A. We ask the Association of Former Students for a list of graduates. We chose the first 1000 names on the list to interview. Not a random sample. B. We put an advertisement in the 3 major newspapers in U.S. asking former students to contact us regarding a survey. Not a random sample or a representative one. *C. We ask the Association of Former Students for a list of graduates. We took a simple random sample of 1000 persons fron the list to interview them. Although some of you don’t realize this, YES, they keep records of all of us. D. We send 3 students to different gates in Kyle Field before the A&M-UT game. They ask the people at the gates if they are former students and if they answer yes, we ask the question of interest. The students stop after 200 answers have been recorded. Not a random sample not representative even though our football attract lots of former students. E. None of the above would give us the proper sample. 9. Suppose you have 3 confidence intervals for from the same data: 90% - (0.45,0.55), 95% - (0.4,0.6) and 99% (0.2,0.8). If I wanted to know whether the true proportion was 75% or not, what could I conclude? For the two-sided alternative, HA: 0.75, use the following ‘rules’: If the hypothesized value falls within the confidence interval, we cannot reject that value, so the p-value > . If the hypothesized value falls outside the confidence interval, we reject (H 0) that value, so the p-value < . A. Since 75% is only in the 99%, I would reject 75% as a plausible value for at the 1% level. *B. Since 75% is only in the 99%, I would reject 75% as a plausible value for at the 5 and 10% levels. Since 0.75 is NOT in the 90 or 95%, we would reject at = 0.05 and 0.10, so the p-value < 0.05 < 0.10. Since 0.75 is IN the 99%, we would fail to reject at = 0.01, so the p-value > 0.01 0.01 < the p-value < 0.05. C. Since 75% is only in the 99%, I would reject 75% as a plausible value for at the 10% level only. D. 75% is plausible since we can't guarantee the true proportion will fall in any interval. Although we can’t guarantee any probability, our data above agrees with answer B. Remember confidence intervals give us a range of plausible values. If the hypothesized value is plausible we can’t reject it. E. You can't make conclusions with confidence intervals, only hypothesis tests. 10. A new TAAS math test is being developed. It is believed that the sample proportion p that will pass in a sample of 40 students is normally distributed with = 0.73 and p = 0.07, p ~ N(0.73, 0.072). How likely are you to see less than half of a class of 40 students (assuming they make up a random sample) pass? how likely we get a sample prop of 50% or less P( p40 < 0.5) = P(Z < (0.5 0.73)/0.07) = P(Z < 3.29) = 0.0005 *A. 0.0005 B. 5% C. 0.4840 D. 0 E. 0.23 11. Assuming that my samples are always random, what could I do to reduce the standard deviation of my sample means, x by a factor of 3, i.e., make it one third as large? x x n x 3 n = 3, so n = 9 A. use 3 times as much data B. use one third as much data C. use 4 times as much data D. use 6 times as much data *E. use 9 times as much data 12. Ok, I just made these up, but suppose that the distribution of quiz grades is the following: X = X*p(X) = 0*0. + 10*0.25 + 20*0.2 + 30*0.1 + 40*0.25 + 50*0.1 = 24.5 X | 0 | 10 | 20 | 30 | 40 | 50 -------|--------|---------|---------|----------|---------|-------p(X) | 0.10 | 0.25 | 0.20 | 0.10 | 0.25 | 0.10 What is the true mean score of the quiz grades? A. We can only estimate the true mean. B. 25 C. 20 D. 30 *E. 24.5 13. What are the z critical values, the z/2, for a 64% confidence interval? Look up (1 0.64)/2 = 0.18 in the body of the table and read off z/2 A. 0.36 B. 0.57 *C. 0.915 D. 0.895 E. 0.74 14. “How would you describe your own physical health at this time? Would you say your health is excellent, good, only fair or poor?" If the true proportion of people who think their health is good is = 0.78, how many people must we sample to make the sampling distribution of the sample proportion, p, approximately normal? n 10 AND n(1) 10 n 10/0.78 AND n 10/(0.22) n 13 AND n 46 You MUST use the larger of the two. A. 10 B. 13 C. 30 *D. 46 E. 50 15. Suppose the measurements on your bathroom scale are normally distributed with true mean, =your true weight and true standard deviation, = 0.25lbs. If we look at all possible samples of 10 measurements of your weight on the bathroom scale, and calculated the sample mean for each, which of the following would be true about those sample means? The distribution of all possible sample means, X , is normal if the original population is normal (or if the sample size is large enough) with mean, ( X ) = (X) and standard deviation, ( X ) = (X)/n. *A. They would be normally distributed. B. They would not be normally distributed since the sample sizes are only 10. C. The standard deviation of the sample means would be 0.25. D. The true average of the sample means would be normally distributed. E. Exactly two of the above are true. 16. If we constructed 95% confidence intervals from the samples above, which of the following COULD be true about the intervals? In general, we expect 95% of the confidence intervals to include the true mean, but this fluxuate around 95%. Sometimes we can get all of the confidence intervals to contain and sometimes only 85%(or less) of them will. Remember the homework problem using the confidence interval applet where you had to create A. All of the confidence intervals contain all of the sample means. B. All of the confidence intervals contain your true weight. C. 95% of the confidence intervals contain your true weight. *D. All of the above could be true (but obviously not at the same time). E. Only two of the above could ever be true. 17. Which of the following is TRUE regarding the variation of a sampling distribution of a sample proportion, p? The sampling distribution of p ~ N( p = , p2 = (((1)/n))2) IF n AND n(1) 10. A. Its variation depends on the population size as well as the sample size. There is no N, the population size, in the statement above. (the N says the shape is normal) *B. The variance of its sampling distribution depends on the true population proportion, but not the sample proportion, p. There’s no p in p2 = (((1)/n))2. C. As the size of the sample increases, the variation of the sampling distribution approaches the variation of the population. p2 = (((1)/n))2 gets SMALLER as n increases, moving FURTHER from the parent standard deviation. NOTE: as n increases, p gets closer to (Law of Large of Numbers). *D. For a given sample size, the variation of the sampling distribution of a sample proportion, p, is larger when the population proportion is = 0.5 than when it is = 0.9. *E. Two of the above are true. 18. Suppose X N(50, 122) and Y ~ N(30, 152). If we take samples of size 25 from each, what is the distribution of the difference of the means, X 25 Y25 ? ‘Formulas for Sampling Distributions’ Handout: The distribution of the difference of two normally distributed random variables, X and Y is normal with mean, XY = X Y, and standard deviation, X-Y X2 Y2 X Y ~ N( X Y, 2 X2 Y2 ). *A. The shape is normal and independent. B. X Y = 20 and 25 C. X 25 25 Y25 = 20 and X 25 Y25 X 25 Y25 X 25 Y25 = 20, but we can't calculate X 25 Y25 without knowing that X and Y are = 3.84, but we can't say the shape is normal. = 3, but we can't say the shape is normal. D. N(20, 32) E. N(20, 3.842) 19. What do we mean by the term confidence in reference to confidence intervals? We are confidence in the method will produced an interval that contains the parameter of interest (1)100% of the time. See IPS p. 420 A. We are confident that our data is random. Randomness is part of the requirements of our method. *B. We are confident that our method produces intervals that contain the parameter (1)100% of the time. C. We are confident that our method produces intervals that contain the parameter *100% of the time. 1, not D. We are confident that our interval contains the parameter. Close, but we are actually confident that what we did will contain the parameter (1)100% of the time. E. We are confident that our method produces intervals that contain the statistic (1)100% of the time. Not the statistics, it’s the center of our interval. 20. What is the name of the following theorem “In repeated independent samples from any population, the sample mean of the observed values will get as close to the population mean as you choose"? A. The Central Limit Theorem See IPS, Ch 5, p. 397: Draw an SRS of size n from any population with mean and standard deviation . When n is large, the sampling distribution of the sample mean x is approximately normal. *B. The Law of Large Numbers See IPS, Ch 4, p.322 and 333: the law of large numbers says that the average of the values of X observed in many trials must approach . C. Bayes theorem See IPS p. 349. Bayes rule is used to calculate conditional probabilities. D. Sampling Distribution See number 5 above. E. Simpson's Paradox See IPS p. 618: An association or comparison that holds for all of several groups can reverse direction when th e data are combined to form a single group.