Unit 3 Review solutions

1. a. The sample space consists of the following pairs: AB, AC, AD, AE, AF, AG, BC, BD, BE, BF, BG, CD, CE, CF, CG, DE, DF, DG, EF, EG, FG (21 equally likely outcomes).
b. The outcomes for which both are female are AD, AG, DG. This probability is 3/21, which is about .143.
c. The outcomes for which the average age exceeds 50 are BF, BG, CF, CG, DF, DG, EF, EG, FG. This probability is 9/21, which is about .429.
d. The only outcome in which the two oldest employees are selected is FG, so the probability is 1/21, which is about .048.
e. If the company really selected the two employees by random chance, there would be less than a 5% chance of choosing the two oldest employees. This is certainly possible, but the probability is small enough to cast considerable doubt on the claim that the two oldest employees were selected by random chance. Therefore, there is at least moderate evidence to support the allegation of age discrimination.

2. a. The z-score is (70 − 75)/5 = −1.00. The area to the left of −1.00 under the standard normal curve is .1587. A sketch follows: [sketch not reproduced]
b. To reduce this probability to .05 requires a z-score of −1.645. (This comes from your table: looking up an area of .05 places the z-score between −1.64 and −1.65, so we estimate it at the midpoint, −1.645. You could also have used −1.64 or −1.65.) If you let k denote the new drying time cutoff value, you need (k − 75)/5 = −1.645, so k = 75 − 1.645(5) = 66.775 minutes. A sketch follows: [sketch not reproduced]
c. With a smaller standard deviation, the error probability in part a would be much smaller. The normal curve would be less spread out (taller and skinnier), so there would be less area to the left of 70. With a smaller standard deviation, the revised cutoff value in part b would be larger. The normal curve would be less spread out (taller and skinnier), so the point with 5% of the area to its left would be closer to the mean and would therefore be a larger cutoff value.

3. a. The 80% value is a parameter because it is a number that describes the entire population (all incoming email messages at this college).
b. The symbol used to represent the .8 would be π.
c. The mean of the sampling distribution of sample proportions equals the population proportion of messages that are spam, π, which is specified as .8. The standard deviation of the sampling distribution can be calculated as $\sqrt{\pi(1-\pi)/n} = \sqrt{.8(1-.8)/200} \approx .028$.
d. The z-score would be $z = (.5 - .8)/.028 \approx -10.71$, so yes, it would be incredibly surprising if the sample turned out to have 50% or less spam, as this would essentially never happen when the population consists of 80% spam. That is, the CLT predicts that almost all sample proportions would fall within 3 standard deviations of the mean of the sampling distribution. A sample of 200 messages with only 50% spam, given a population proportion of 80%, would be more than 10 standard deviations below the mean, and that would not be expected to ever happen.

4. a. According to the Central Limit Theorem, the sample proportion who die in a sample of 371 patients will vary according to a normal distribution, with mean equal to the population proportion of deaths (.20) and with standard deviation equal to $\sqrt{.2(1-.2)/371} \approx .0208$. A sketch follows: [sketch not reproduced]
b. The z-score is (.213 − .20)/.0208, or 0.63. The area to the left of 0.63 under the standard normal curve is .7357, so the probability is 1 − .7357, or .2643, that the sample proportion of deaths would be .213 or higher.

5. You cannot draw a reasonable conclusion about whether this is a fair coin without knowing the sample size. If the 75% heads was based on a sample of only four flips, then there is no reason to suspect that the coin is not fair. But if the 75% heads was based on a large number of flips, that would suggest that the probability of heads is close to .75 and so the coin is not fair.

6. No, the distribution of house prices would still be skewed to the right.
This is one sample of 1000 homes, so its distribution will be approximately the same as the population distribution (prices of all homes for sale). The Central Limit Theorem tells us that the sampling distribution of the sample mean house price would be very close to normal, but that result does not apply to the distribution of individual house prices. That is, the CLT refers to repeatedly taking samples and calculating the sample mean of each. The distribution of those sample means will be approximately normal, and closer to normal as the sample size increases.

7. a. The value .45 is a parameter, and you would use the symbol π to represent it.
b. Sampling variability refers to the variability in a statistic, in this case the variability in sample proportions of orange candies. A parameter pertains to the entire population and so has a fixed value; it does not vary.
c. You are more likely to get between 35% and 55% orange candies if you take a random sample of 400 candies than if you take a random sample of 40 candies. The Central Limit Theorem tells you that the sampling distribution of the sample proportion of orange candies will be centered around .45 and that the spread of the distribution will decrease as the sample size increases. So results from a larger sample will be more likely to fall close to 45% than results from a smaller sample.
d. You are more likely to get more than 60% orange candies if you take a random sample of 40 candies than if you take a random sample of 400 candies, because .60 is an extreme value, far from the mean of .45. The larger the sample size, the smaller the sampling variability. If 45% of all Reese's Pieces are colored orange, it would be very unusual to find a sample of 400 candies in which 60% or more were orange. This result, however, is much more likely to happen in a small sample of only 40 candies.
e. The observational units are the samples of 40 candies, and the variable is the proportion of orange candies in each sample.

8. a. No, it would not be reasonable to model the duration of cell phone calls with a normal distribution, because the distribution could not be symmetric with these mean and standard deviation values. Two standard deviations below the mean would indicate negative call lengths, whereas two standard deviations above the mean would indicate calls lasting 4.5 minutes, which is a very reasonable (not very extreme) call length. It seems more plausible to model cell phone call lengths with a distribution that is skewed to the right.
b. Yes, it would be reasonable to use the Central Limit Theorem to describe the distribution of the sample mean call duration because the sample size is large (> 30).
c. The CLT says the sampling distribution of the sample mean call duration will be approximately normal, with mean 1.7 minutes and standard deviation $1.4/\sqrt{60} \approx .181$ minutes.
d. Here is a sketch of the sampling distribution: [sketch not reproduced]
e. If the sample size were 160 calls rather than 60 calls, the curve would be much narrower, with less horizontal spread. The area would be much more concentrated around the center (1.7) because the standard deviation of the curve would be smaller.

9. Saying that the probability of winning one dollar is .474 means that if you play this game a very large number of times, the long-run proportion of spins on which you win one dollar will be very close to .474. In other words, you will win one dollar on very close to 47.4% of the spins if you play a very large number of times.
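
The arithmetic in problems 1, 2, 3, 4, and 8 can be checked with a short script. What follows is only a sketch using Python's standard library, not part of the original solutions. The variable names and the helper proportion_sd are made up for illustration, and the set of female employees (A, D, G) is inferred from the pairs listed in problem 1b rather than taken from the problem statement. The script uses statistics.NormalDist in place of the printed normal table.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def proportion_sd(pi, n):
    """Standard deviation of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)


# Problem 1: all unordered pairs chosen from seven employees labeled A-G.
pairs = list(combinations("ABCDEFG", 2))
females = {"A", "D", "G"}  # assumption: inferred from the pairs in part b
both_female = [p for p in pairs if set(p) <= females]
p_both_female = len(both_female) / len(pairs)
p_two_oldest = 1 / len(pairs)  # only the pair FG qualifies

# Problem 2: drying time modeled as normal with mean 75 and sd 5.
drying = NormalDist(mu=75, sigma=5)
p_below_70 = drying.cdf(70)          # area to the left of 70
cutoff_5pct = drying.inv_cdf(0.05)   # value with 5% of the area to its left

# Problem 3: sample proportion of spam, pi = .8, n = 200.
sd_spam = proportion_sd(0.8, 200)
z_spam = (0.5 - 0.8) / sd_spam

# Problem 4: sample proportion of deaths, pi = .2, n = 371.
sd_deaths = proportion_sd(0.2, 371)
z_deaths = (0.213 - 0.20) / sd_deaths
p_deaths = 1 - std_normal.cdf(z_deaths)  # area to the right of z

# Problem 8: sd of the sample mean call duration, sigma = 1.4, n = 60.
sd_mean_call = 1.4 / sqrt(60)

print(f"1a. number of pairs:        {len(pairs)}")
print(f"1b. P(both female):         {p_both_female:.3f}")
print(f"1d. P(two oldest):          {p_two_oldest:.3f}")
print(f"2a. P(time < 70):           {p_below_70:.4f}")
print(f"2b. cutoff for 5%:          {cutoff_5pct:.3f}")
print(f"3c. sd of sample prop.:     {sd_spam:.4f}")
print(f"3d. z-score:                {z_spam:.2f}")
print(f"4a. sd of sample prop.:     {sd_deaths:.4f}")
print(f"4b. P(prop. >= .213):       {p_deaths:.4f}")
print(f"8c. sd of sample mean:      {sd_mean_call:.3f}")
```

Because the script does not round intermediate values, a few results differ slightly from the table-based answers. For example, the z-score in 3d comes out near −10.6 rather than −10.71, since the solution divides by the rounded standard deviation .028. The conclusions are unchanged.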