Homework 3 Solution

This HW reviews the normal distribution, confidence intervals and the central limit theorem.

(1) Suppose that X is a normally distributed random variable where X ∼ N(75, 3²) (mean 75 and standard deviation 3).

(i) Calculate P(X > 67).

We make the z-transform z = (67 − 75)/3 = −8/3 ≈ −2.67 and look this up in the tables, to get 0.0038. This is the area to the left of −2.67; the area to the right (which matches P(X > 67)) is 1 − 0.0038 = 0.9962.

(ii) Find the x such that P(75 − x ≤ X ≤ 75 + x) = 0.99.

This means the area in the tails should in total be 1%. By symmetry that is 0.5% in each tail. Looking up 0.5% inside the tables gives 2.57 on the outside. This means x should be 2.57 standard deviations from the mean. Therefore x = 2.57 × 3 = 7.71.

(2) A patient is classified as having gestational diabetes if the glucose level is above 140 milligrams per deciliter one hour after ingesting a sugary drink. Lucy's measured sugar level varies according to a normal distribution with mean µ = 125mg/dl and standard deviation 10mg/dl. Since her mean level is below 140mg/dl she does not have gestational diabetes. However, in reality the mean level is unknown; all that is known are readings taken from blood samples. Therefore, below we want to evaluate the chance of wrongly diagnosing gestational diabetes based on the samples taken.

(a) Suppose one single measurement is made (one blood sample). What is the probability that she will be misdiagnosed as having gestational diabetes (in other words, what is the chance that her measurement will be above 140mg/dl given that a single measurement is normally distributed with mean µ = 125mg/dl and standard deviation 10mg/dl)?

We want to calculate the chance that her measurement will be over 140. As measurements are close to normal we use the normal distribution to calculate this. It is easiest to understand with a picture. Draw a normal distribution centered about 125 with standard deviation 10.
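If no plotting software is at hand, the picture can be roughed out as text. The following is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`). The function name `sketch_density`, the half-standard-deviation grid and the bar width are illustrative choices and not part of the homework. The curve is drawn sideways, with x running down the page, and the region at or above 140 is marked with `#`.

```python
from statistics import NormalDist

def sketch_density(mu, sigma, cutoff, width=40):
    """Return text rows showing the N(mu, sigma^2) density from mu-3*sigma to mu+3*sigma.

    Rows whose x value is at or above `cutoff` are drawn with '#'
    (the tail of interest); all other rows are drawn with '*'.
    """
    dist = NormalDist(mu, sigma)
    peak = dist.pdf(mu)  # the density is highest at the mean
    rows = []
    for step in range(-6, 7):  # half-sigma steps from -3 sigma to +3 sigma
        x = mu + step * sigma / 2
        bar_len = round(width * dist.pdf(x) / peak)  # bar length relative to the peak
        mark = "#" if x >= cutoff else "*"
        rows.append(f"{x:6.1f} | {mark * bar_len}")
    return rows

# Lucy's single measurement: mean 125, standard deviation 10, threshold 140
rows = sketch_density(mu=125, sigma=10, cutoff=140)
print("\n".join(rows))
```

The longest bar sits at 125, and the `#` rows from 140 onward show how little of the distribution lies above the diagnostic threshold.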
We want to calculate the area to the right of 140 (this is the probability). To do this, make a z-transform of 140: z = (140 − 125)/10 = 1.5 (remember we subtract 125 since this is her mean level). The area to the right of the z-score 1.5 is 1 − 0.9332 = 0.0668. So the chance of her being misdiagnosed based on just one measurement is about 6.7%.

(b) Instead suppose that on three separate days measurements are made and the average measurement is taken over these three days. What is the probability that she will be misdiagnosed as having gestational diabetes (in other words, what is the chance that her average over these three measurements will be above 140mg/dl)? Hint: What is the distribution of the sample mean based on three measurements given that a single measurement is normally distributed with mean µ = 125mg/dl and standard deviation 10mg/dl?

This is the same as the above; the main difference is that we use the average of three measurements, so the standard error changes from 10 to 10/√3 = 5.77. The distribution of the sample mean is normal with mean 125 (as before) but with standard error 5.77. As before, we want to calculate the area to the right of 140 (but using this new standard error). The z-transform is z = (140 − 125)/5.77 = 2.60. The area to the right of 2.60 is 0.0047. Thus the chance of her falsely being diagnosed using the average of three measurements goes down to about 0.5%.

(c) Compare your solutions from part (a) and part (b). What do you notice about the probability of false diagnosis as a larger sample is used?

As the sample size increases, the standard error of the sample mean goes down. Therefore the chance of a wrong diagnosis decreases.

(3) Suppose the scores of the high school ACT test have mean 19.2 and standard deviation 5.1. As we discussed in class, ACT scores are only very approximately normally distributed.
(a) Using the normal distribution, what is the approximate probability that a single randomly selected student will score 23 or higher?

The population mean is µ = 19.2 and the standard deviation is σ = 5.1. In order to calculate the probability we assume normality (even though this is not strictly true) and calculate the z-transform z = (x − µ)/σ = (23 − 19.2)/5.1 = 0.75. Thus the probability P(Z > 0.75) = 0.2266. In other words, the probability of a student scoring 23 or higher is approximately 22.66% (approximately because we assumed normality of the distribution of scores).

(b) A simple random sample of 25 students is taken. What is the mean and standard deviation of the average score (sample mean x̄) of these 25 students?

The mean of the sample mean is the same as the population mean µ = 19.2. The standard deviation of the sample mean is the standard error, which is σ/√n = 5.1/√25 = 1.02.

(c) Using the normal distribution, what is the approximate probability that the sample mean score of these 25 randomly selected students will be 23 or higher?

Like part (a), we make a z-transform z = (x̄ − µ)/(σ/√n) = (23 − 19.2)/1.02 = 3.73. Looking this up in the tables gives P(Z > 3.73) = 0.0001.

(d) Which of your Normal probability calculations, (a) or (c), will be the most accurate? Give a reason for your answer.

The central limit theorem tells us that the distribution of the sample mean becomes closer to normal as the sample size grows. Both probabilities in (a) and (c) were calculated under the assumption of normality. Since (c) concerns the mean of 25 scores rather than a single score, the probability in part (c) will be the more accurate estimate.

(4) (i) 300 different samples are drawn, each sample is of size 50. For each sample a 90% confidence interval (CI) for the mean µ is constructed. On average, how many of the intervals will contain the mean?

300 × 0.9 = 270.

(ii) Suppose it is known that the smallest adult is 1.5 feet tall and the tallest known adult is 8.5 feet tall.
A sample of size 50 people is drawn; the average height using this sample is 5.5 feet. Give a 100% CI for the mean adult height.

This is a slightly trick question. 100% means we need to be completely sure the interval will contain the mean. We know that the smallest person is 1.5 and the tallest is 8.5. Therefore, the mean height must lie somewhere in this interval. Therefore a 100% CI is [1.5, 8.5]. Some of you used the absolute end points of the normal tables, which is a very reasonable solution, but technically this is still not quite 100%.

(iii) Suppose a random sample of size 40 is drawn from a population which has mean µ and variance σ². I evaluate the sample mean X̄ = (1/40) Σ_{i=1}^{40} X_i. It is known the standard error of the sample mean is 0.5. What is the standard deviation of the original population?

To answer this question we use the formula standard error = (standard deviation of population)/√n and solve for σ. This is

s.e. = 0.5 = √(σ²/n) = s.d./√n = s.d./√40  ⇒  s.d. = 0.5 × √40 = 3.16.

(5) A random sample of size 15 is drawn. The QQplot is given below. Suppose that the sample mean is X̄ = 0.606 and the population variance is σ² = 1.

(a) Construct a 95% CI for the mean.

[0.606 ± 1.96 × √(1/15)] = [0.10, 1.11].

(b) Based on the QQplot comment on whether the 95% CI for the mean is reliable. Give a reason for your answer.

The sample size n = 15 is small, hence for the CI to be reliable the distribution of the population should be close to normal. Looking at the QQplot of the observations, the points tend to lie on the 45° line, suggesting that the observations have come from a distribution which does not differ much from a normal distribution. Based on this observation, the interval appears to be reliable at the 95% level.

(6) Suppose that the population mean and variance are µ and 10 respectively, and the distribution is bimodal. A random sample of size 30 is drawn from this population and we evaluate the sample mean, X̄ = (1/30) Σ_{i=1}^{30} X_i.
(i) What is the approximate distribution of X̄ (give the mean and variance)? Give a reason for your answer.

Even though the original population is bimodal (does not look at all normal), as the sample size is relatively large (30 observations) it is reasonable to suppose that the sample mean is close to normal (just play with the applet to see this). Therefore, roughly speaking, we can say X̄ ∼ N(µ, 10/30).

(ii) Over your sketch make a sketch of the (density) distribution of X̄.

A bimodal density with a much narrower normal density superimposed over the top. They both share the same mean µ.

(iii) Suppose that the population mean is µ = 5. Find the probability that the sample mean X̄ is greater than 6.5.

This is a calculation using the normal distribution N(5, 10/30). Center the distribution about 5, place 6.5 to the right of 5, and calculate the area to the right of 6.5 using the normal tables.

P{X̄ > 6.5} = P{Z > (6.5 − 5)/√(10/30)} = P{Z > 1.5/0.577} = P{Z > 2.60} = 0.0047 ≅ 0.5%.
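For readers without the applet, the normal approximation in (i) and the tail probability in (iii) can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later. The homework does not say which bimodal population is meant, so the code assumes an equal mixture of N(2, 1) and N(8, 1), chosen only because it has mean 5 and variance 1 + 3² = 10. The names `upper_tail`, `draw_bimodal` and `simulate_tail`, the seed and the number of repetitions are all illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def upper_tail(cutoff, mu, sigma, n=1):
    """P(mean of n draws > cutoff), treating the sample mean as normal."""
    se = sigma / sqrt(n)          # standard error of the sample mean
    z = (cutoff - mu) / se        # z-transform of the cutoff
    return 1 - NormalDist().cdf(z)

def draw_bimodal(rng):
    """One draw from an ASSUMED bimodal population (not given in the homework):
    an equal mixture of N(2, 1) and N(8, 1), with mean 5 and variance 10."""
    centre = 2 if rng.random() < 0.5 else 8
    return rng.gauss(centre, 1)

def simulate_tail(n=30, cutoff=6.5, reps=20000, seed=0):
    """Fraction of simulated sample means (each from n draws) that exceed the cutoff."""
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        xbar = sum(draw_bimodal(rng) for _ in range(n)) / n
        if xbar > cutoff:
            hits += 1
    return hits / reps

# Problem 6(iii): mu = 5, variance 10, n = 30, cutoff 6.5
p_normal = upper_tail(6.5, mu=5, sigma=sqrt(10), n=30)
p_sim = simulate_tail()

print(f"normal approximation:        {p_normal:.4f}")
print(f"simulated (assumed mixture): {p_sim:.4f}")
```

The normal approximation prints about 0.0047, matching the table lookup. The simulated fraction depends on the assumed mixture and on the seed, so it should only be expected to land in the same neighbourhood. The same `upper_tail` helper also recomputes the earlier table lookups, for example `upper_tail(140, mu=125, sigma=10)` and `upper_tail(140, mu=125, sigma=10, n=3)` for Problem 2 (a) and (b).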