Download Slides - Dr Frost Maths

S3: Chapter 3 – Estimation and Confidence Intervals
Dr J Frost, www.drfrostmaths.com
Last modified: 30th August 2015

RECAP: Sampling Distributions

What is a statistic?
A value based only on the sample (i.e. not on any population parameters).

What is the sampling distribution of a statistic $T$?
The probability distribution of $T$, i.e. a distribution describing how the statistic varies as we consider all possible samples.

Population distribution: e.g. the distribution of heights of everyone in the population.
Generate sample: $X_1, X_2, X_3, X_4$. Each $X_i$ is a random variable representing the choice of one item in the sample. Since each item is chosen from the population, each $X_i$ has the same distribution as the population.
Statistic $T$: the sample values are combined in some way to form a statistic (e.g. mean, maximum, mode). As above, this has a sampling distribution as we consider possible samples.

RECAP: Sampling Distributions

A large bag contains counters. 60% of the counters have the number 0 on them and 40% have the number 1.
a. Find the mean $\mu$ and variance $\sigma^2$ for this population of counters.
A simple random sample of size 3 is taken from this population.
b. List all possible samples.
c. Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2 + X_3}{3}$, where $X_1$, $X_2$ and $X_3$ are the three variables representing counters within the sample.
d. Hence find $E(\bar{X})$ and $Var(\bar{X})$.
e. Find the sampling distribution for the mode $M$.
f. Hence find $E(M)$ and $Var(M)$.

Answers:
a.
x           0      1
P(X = x)    3/5    2/5
$\mu = \frac{2}{5}$, $\sigma^2 = \frac{6}{25}$
b. List systematically: (0,0,0), (1,0,0), (0,1,0), etc.
c.
x̄           0        1/3      2/3      1
P(X̄ = x̄)    27/125   54/125   36/125   8/125
d. $E(\bar{X}) = \frac{2}{5}$, $Var(\bar{X}) = \frac{2}{25}$
e.
m           0        1
P(M = m)    81/125   44/125
f. $E(M) = \frac{44}{125}$, $Var(M) = 0.228$

Estimating Population Parameters

A large bag contains counters. 60% of the counters have the number 0 on them and 40% have the number 1. A simple random sample of size 3 is taken from this population. In the example we saw:
Population: mean $\mu = \frac{2}{5}$, variance $\sigma^2 = \frac{6}{25}$, mode $= 0$.
Sample: average mean across samples $E(\bar{X}) = \frac{2}{5}$; variance of mean across samples $Var(\bar{X}) = \frac{2}{25}$; $E(M) = \frac{44}{125} = 0.352$; $Var(M) = 0.228$.

To estimate the mean number of the population, we sampled 3 counters and took their mean $\bar{X}$. 'On average', what do you notice about our estimate of the mean?
It's the same as the population mean!
What do you notice about the mode seen on average across samples?
It's different from the population mode.

Estimator: a statistic used to estimate a population parameter (e.g. $\bar{X}$ for $\mu$).
Estimate: a particular value of an estimator.

Estimating Population Parameters

For the same counters example (population: $\mu = \frac{2}{5}$, $\sigma^2 = \frac{6}{25}$, mode $= 0$; sample: $E(\bar{X}) = \frac{2}{5}$, $Var(\bar{X}) = \frac{2}{25}$, $E(M) = \frac{44}{125} = 0.352$, $Var(M) = 0.228$):

If a statistic $T$ is used as an estimator for a population parameter $\theta$ then the bias is $E(T) - \theta$.
If $E(T) - \theta = 0$, i.e. $E(T) = \theta$, then $T$ is an unbiased estimator for $\theta$.
e.g. $\bar{X}$ is an unbiased estimator for $\mu$ (proof in the Appendix), but $M$ was a biased estimator of the true mode.

The variance of the sample mean

We saw that the sample mean $\bar{X}$ varied as we considered different samples. Can you spot the relationship between the population variance $\sigma^2 = \frac{6}{25}$ and the variance of the sample mean, $Var(\bar{X}) = \frac{2}{25}$?
$Var(\bar{X}) = \frac{\sigma^2}{n}$, where $n$ is the sample size (proof in the Appendix).

What does this imply as we increase the sample size? Is this expected?
If the sample size is larger, the sample mean varies less.
This is expected, as a larger sample size makes our sample mean more accurate.

Excel Demo Time! File Ref: DemoVarianceOfSampleMean.xlsx

Unbiased estimator of $\sigma^2$

We've seen that we can use $\bar{X}$ as an unbiased estimator for the population mean $\mu$. Can the variance of the sample be used as an unbiased estimator for the population variance?
Consider the heights of students at Tiffin, say with $\sigma^2 = 0.4\,\text{m}^2$. If the sample size was 1, what (on average) would the variance of our sample be? What does this tell us about our use of the variance of the sample to represent the population variance?
The variance of the sample is 0. Therefore it must be a biased estimator, because on average the variance of the sample (i.e. 0) is not equal to the variance of the population.

We can modify the formula for the variance of a sample to give us an unbiased estimator for $\sigma^2$:
$S^2$ is an unbiased estimator of $\sigma^2$, i.e. $E(S^2) = \sigma^2$ (proof in the Appendix).

Population variance ($\sigma^2$)
  Definition: $\frac{1}{n}\Sigma(X - \mu)^2$, i.e. average squared distance from the mean
  Simplified ("msmsm"): $\frac{\Sigma X^2}{n} - \mu^2$
Sample variance ($S^2$)
  Definition: $\frac{1}{n-1}\Sigma(X - \bar{X})^2$
  Simplified: $\frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)$

Notation for estimators

$\hat{\theta}$ represents an estimator of $\theta$:
$\hat{\mu} = \bar{X}$
$\hat{\sigma}^2 = S^2$

Examples

Reminder: $S^2 = \frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)$

The table below summarises the number of breakdowns, $x$, on a town's bypass on 30 randomly chosen days. Calculate unbiased estimates of the mean and variance of the number of breakdowns.

Num breakdowns   2   3   4   5   6   7   8   9
Num days         3   5   4   3   5   4   4   2

$\Sigma x = 160$, $\Sigma x^2 = 990$ (note that we're taking the frequencies into account here)
$\hat{\mu} = \bar{x} = \frac{160}{30} = 5.33$
$\hat{\sigma}^2 = s_x^2 = \frac{990 - 30\bar{x}^2}{29} = 4.71$

The random variable $X$ has a continuous uniform distribution defined over the range $[0, \alpha]$. A random sample $X_1, X_2, \dots, X_n$ is taken.
a) Show that $\bar{X}$ is a biased estimator for $\alpha$ and state the bias.
b) Suggest a suitable unbiased estimator for $\alpha$.

Note first that this is a stupid question. Remember that to calculate the bias we use $E(T) - \theta$.
$\mu = \frac{\alpha}{2}$, so $E(\bar{X}) = \mu = \frac{\alpha}{2}$.
Bias $= \frac{\alpha}{2} - \alpha = -\frac{\alpha}{2}$.
An unbiased estimator would be one where $E(T) = \alpha$, as desired. This can be achieved with $T = 2\bar{X}$.

Exercise 3B

Standard Error

Reminder: $S^2 = \frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)$

We previously saw that the variance of the sample mean (i.e. how much the mean varies as we consider different samples) is:
$Var(\bar{X}) = \frac{\sigma^2}{n}$
Naturally we might want to refer to the standard deviation of the sample mean. This is known as the standard error.
Standard error $= \frac{\sigma}{\sqrt{n}}$, or $\frac{s}{\sqrt{n}}$ if $\sigma$ is not known.

Example: Recall the table from before:

Num breakdowns   2   3   4   5   6   7   8   9
Num days         3   5   4   3   5   4   4   2

Twenty more days were randomly sampled and this sample had a mean of 6.0 and $s^2 = 5.0$.
b. Treating the 50 results as a single sample, obtain further unbiased estimates of the population mean and variance.
c. Find the standard error of this estimate of the mean.
d. Estimate the size of sample required to achieve a standard error of less than 0.25.

b. New sample: $\bar{y} = 6.0 \Rightarrow \Sigma y = 20 \times 6 = 120$, and $s_y^2 = 5.0 \Rightarrow \Sigma y^2 = 19 \times 5 + 20 \times 6^2 = 815$.
So for the combined sample ($w$): $\Sigma w = 160 + 120 = 280$ and $\Sigma w^2 = 990 + 815 = 1805$.
The combined estimate of $\mu$ is $\bar{w} = \frac{280}{50} = 5.6$, and the estimate for $\sigma^2$ is $s_w^2 = \frac{1805 - 50 \times 5.6^2}{49} = 4.84$.
c. $\frac{s_w}{\sqrt{50}} = 0.311$
d. $\frac{\sqrt{4.84}}{\sqrt{n}} < 0.25 \Rightarrow n > 77.38$. Thus we need a sample of at least 78.

Exercise 3C

Central Limit Theorem

Suppose you rolled $n$ 10-sided unfair dice, where each throw is $X_1, X_2, \dots, X_n$ (all with the same distribution $X$), and you added up the values (to obtain a distribution across possible sums, $\Sigma X$).
See Excel Demo!

Questions based on demo:
As $n$ becomes large, what distribution does $\Sigma X$ seem to approximate?
A Normal distribution.
Does how biased the die is (i.e. the distribution of $X$) affect the type of distribution of $\Sigma X$?
No. It will affect the mean and variance of $\Sigma X$, but for large $n$ we will always obtain an approximately Normal distribution for $\Sigma X$, regardless of the distribution of each trial $X$.
What can we therefore say about the sampling distribution of the sample mean $\bar{X}$?
Since $\bar{X}$ is just $\Sigma X$ divided by a constant $n$, it will also have an approximately Normal distribution.
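The combined-sample standard error example from this section can be checked numerically. The following is a minimal Python sketch, assuming only the figures quoted in the example; the function name `unbiased_estimates` and the variable names are illustrative choices, not something taken from the slides.

```python
from math import floor, sqrt


def unbiased_estimates(n, sum_x, sum_x2):
    """Return (x-bar, s^2) using s^2 = (sum of x^2 - n * x-bar^2) / (n - 1)."""
    mean = sum_x / n
    s2 = (sum_x2 - n * mean ** 2) / (n - 1)
    return mean, s2


# First sample: frequency table of breakdowns over 30 days.
breakdowns = [2, 3, 4, 5, 6, 7, 8, 9]
days = [3, 5, 4, 3, 5, 4, 4, 2]
n1 = sum(days)
sum_x = sum(x * f for x, f in zip(breakdowns, days))
sum_x2 = sum(x * x * f for x, f in zip(breakdowns, days))

# Second sample: recover the sums from its mean and s^2.
n2, mean_y, s2_y = 20, 6.0, 5.0
sum_y = n2 * mean_y
sum_y2 = (n2 - 1) * s2_y + n2 * mean_y ** 2

# Combined sample of 50 days.
n = n1 + n2
mean_w, s2_w = unbiased_estimates(n, sum_x + sum_y, sum_x2 + sum_y2)
std_error = sqrt(s2_w / n)

# Smallest n for which s_w / sqrt(n) is strictly below 0.25.
n_needed = floor(s2_w / 0.25 ** 2) + 1

print(mean_w, round(s2_w, 2), round(std_error, 3), n_needed)
```

Working from the sums $\Sigma x$ and $\Sigma x^2$ rather than from raw data mirrors the hand calculation, which is why the second sample is first converted back into sums.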
The Central Limit Theorem says that if $X_1, X_2, \dots, X_n$ is a random sample of size $n$ from a population of mean $\mu$ and variance $\sigma^2$, then $\bar{X}$ is approximately $\sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Examples

June 2013 (R)
Bro Note: You might think you could have used $\bar{x} = 17.2$. However, in $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, we would only resort to an estimate of $\mu$, i.e. $\bar{x}$, if $\mu$ was not known. But $\mu$ is known!

S3 Textbook Example
A sample of size 9 is taken from a population with distribution $N(10, 2^2)$. Find the probability that the sample mean $\bar{X}$ is more than 11.
$\bar{X} \sim N\left(10, \frac{4}{9}\right)$
$P(\bar{X} > 11) = P\left(Z > \frac{11 - 10}{2/3}\right) = P(Z > 1.5) = 0.0668$

S3 Textbook Example
A cubical die is relabelled so that there are three faces marked 1, two faces marked 3 and one marked 6. The die is rolled 40 times and the mean of the 40 scores is recorded. Find an approximation for the probability that the mean is over 3.

x       1     3     6
p(x)    1/2   1/3   1/6

$\mu = E(X) = 2.5$, $\sigma^2 = Var(X) = \frac{13}{4}$
$\bar{X}$ is approximately $\sim N\left(2.5, \frac{13}{160}\right)$
$P(\bar{X} > 3) = P(Z > 1.75) = 0.040$
Bro Tip: Although $X$ here (and therefore $\bar{X}$) is discrete, do not use a continuity correction unless asked to find $P(\Sigma X > 120)$.

Exercise 3D

Review Questions

What is a statistic?
A value calculated only from the sample (and not population parameters), e.g. the mean.
What is meant by the sampling distribution of a statistic?
A distribution over possible values of the statistic as we consider different samples.
What does $\bar{X}$ represent? How conceptually does it differ from $\bar{x}$?
$\bar{x}$ is the sample mean from a specific sample. $\bar{X}$ is a distribution over possible sample means across all possible samples (i.e. the sampling distribution of the sample mean).
What things do we know about the original population distribution $X$ (i.e. the distribution that things in our sample are sampled from)?
It could be any distribution. But we refer to the mean as $\mu$ and the variance as $\sigma^2$, which are population parameters.
What do we know about how $\bar{X}$ is distributed in general? What extra information do we have if the sample size is large?
$\bar{X}$ has mean $\mu$ and variance $\frac{\sigma^2}{n}$.
If $n$ is large, then $\bar{X}$ approximately has the distribution $N\left(\mu, \frac{\sigma^2}{n}\right)$.

Review Questions

What is an estimator?
A statistic used to estimate a population parameter, e.g. $\bar{x}$ could be used to estimate $\mu$.
Why is it appropriate to use $\bar{x}$ calculated from a sample as an estimator for $\mu$?
Because on average $\bar{x}$ will be equal to $\mu$, i.e. $E(\bar{X}) = \mu$. It is an unbiased estimator.
Why is it not appropriate to use the variance of the sample as an estimator for $\sigma^2$?
The variance of the sample is on average lower than the population variance $\sigma^2$ (e.g. consider when the sample size is 1). We therefore use the 'sample variance' $s^2$, which is an unbiased estimator of $\sigma^2$.
What is meant by $\hat{\mu}$?
An estimator of $\mu$: $\hat{\mu} = \bar{x}$.

Confidence Intervals

We have seen we can use $\hat{\mu} = \bar{x}$ as an (unbiased) estimator for the mean by finding the mean from the sample, when the true population mean $\mu$ is unknown.
Suppose the population variance $\sigma^2$ is known, and that we want a confidence interval where we are 95% sure the population mean $\mu$ lies within this range.

Q Show that a 95% confidence interval for $\mu$, based on a sample of size $n$, is given by $\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$.

$\bar{X}$ is approximated by $N\left(\mu, \frac{\sigma^2}{n}\right)$. (Note: your textbook contains a typo in its expression for this variance.)
The standard error (i.e. the standard deviation of the sample mean) is $\frac{\sigma}{\sqrt{n}}$.
Looking in the reverse-Z table, we can see we're in the top 2.5% if $Z = 1.9600$ (and by symmetry, in the bottom 2.5% if $Z = -1.9600$).
By definition, these $z$ values tell us the number of standard deviations above (or below) the mean, i.e. $\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$.

Confidence Intervals

The 95% confidence interval for $\mu$ is given by $\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$.
The 99% confidence interval is given by $\bar{x} \pm 2.5758 \times \frac{\sigma}{\sqrt{n}}$.
I'd remember the first one, but the 1.96 value can easily be obtained from tables.
The width of the confidence interval is $2 \times z \times \frac{\sigma}{\sqrt{n}}$.

What is actually meant by the 95% confidence interval?
The interval within which there is a 95% chance that $\mu$ lies. (Strictly, $\mu$ is fixed and it is the interval that varies from sample to sample: about 95% of intervals constructed this way will contain $\mu$.)
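The interval and width formulas above can be sketched in a few lines of Python. This is an illustrative sketch rather than part of the original slides: the function names `confidence_interval` and `min_sample_size` are hypothetical, and the $z$ value is taken from the standard library's `statistics.NormalDist` instead of printed tables, so results may differ from table-based answers in the last decimal place.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_value(confidence):
    """Two-tailed z value, e.g. about 1.96 for confidence = 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n), with sigma assumed known."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def min_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin (rearranged half-width)."""
    return ceil((z_value(confidence) * sigma / margin) ** 2)


low, high = confidence_interval(x_bar=10, sigma=4, n=9)
print(round(low, 3), round(high, 3))

# Width of the interval is 2 * z * sigma / sqrt(n).
print(round(high - low, 3))

print(min_sample_size(sigma=3, margin=0.8))
```

Rearranging the half-width $z\sigma/\sqrt{n}$ for $n$ is what lets the same helper answer "how large a sample do I need?" questions.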
Think how this is useful in real life: you're doing a survey to establish the average BMI (Body Mass Index) of Americans. You take a sample of 30 people and the mean BMI in your sample is 23.2. Because it's a small sample you can't be certain that the true population mean BMI is exactly that, but you can have good certainty that it lies between, say, 23.0 and 23.4.

Quickfire Intervals

Calculate the 95% confidence intervals for $\mu$:

σ    n     x̄      Interval for μ
4    9     10     7.387 to 12.613
4    100   10     9.216 to 10.784
8    25    28.1   24.964 to 31.236

June 2013 (R)
We earlier found: $\bar{X} \sim N\left(a + 2, \frac{3}{50}\right)$
$17.2 - 1.96 \times \sqrt{\frac{3}{50}} = 16.7$
$17.2 + 1.96 \times \sqrt{\frac{3}{50}} = 17.7$
$16.7 < a + 2 < 17.7$
$14.7 < a < 15.7$

Exercise 3E
Also to do… May 2012 Q3

One other thing… setting sample size
… in the exam but not so much in your textbook.

In a class the standard deviation of weights is 3kg but the mean is unknown. Therefore Bob takes a sample of people in the class and records their mean weight. Calculate the minimum sample size needed so that there is a 95% chance that the estimate of the population mean from the sample lies within 0.8kg of the true mean.
$1.96 \times \frac{\sigma}{\sqrt{n}} = 0.8$
$1.96 \times \frac{3}{\sqrt{n}} = 0.8$
$n > 54.0225$, so $n = 55$

The ages of teachers at Tiffin School have a standard deviation of 8 years. I ask a few teachers for their age and find the mean from these. How many teachers do I need to ask such that there is a 99% chance that the estimate of the population mean age from the sample lies within 1 year of the true mean?
$2.5758 \times \frac{\sigma}{\sqrt{n}} = 1$
$2.5758 \times \frac{8}{\sqrt{n}} = 1$
$n > 424.6237\ldots$, so $n = 425$

Hypothesis Testing

There's nothing really new here! We can carry out a hypothesis test on the mean of a normal distribution.

Q A certain company sells fruit juice in cartons. The amount of juice in a carton has a normal distribution with a standard deviation of 3ml. The company claims that the mean amount of juice per carton, $\mu$, is 60ml.
A trading inspector has received complaints that the company is overstating the mean amount of juice per carton and he wishes to investigate this complaint. The trading inspector took a random sample of 16 cartons which gave a mean of 59.1ml. Using a 5% level of significance, and stating your hypotheses clearly, test whether or not there is evidence to uphold this complaint.

$H_0: \mu = 60$, $H_1: \mu < 60$
$P(\bar{X} \le 59.1 \mid \mu = 60) = P\left(Z \le \frac{59.1 - 60}{3/4}\right) = P(Z \le -1.2) = 0.1151$
$0.115 > 0.05$, so the result is not significant and there is insufficient evidence to reject $H_0$, that $\mu = 60$.
There is insufficient evidence to support the complaint.

The test statistic in a test for the population mean $\mu$ is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$.

Further Example Using Critical Values

As per S2, an alternative way of carrying out hypothesis tests is to find the critical value and see if our test statistic exceeds/goes below this.

At a certain college new students are weighed when they join the college. The distribution of weights of students at the college when they enrol has a standard deviation of 7.5kg and a mean of 70kg. A random sample of 90 students from the new entry were weighed and their mean weight was 71.6kg. Assume that the standard deviation has not changed.
a. Test, at the 5% level, whether there is evidence that the mean of the new entry is more than 70kg.
b. State the importance of the Central Limit Theorem to your test.

a. $H_0: \mu = 70$, $H_1: \mu > 70$, $\sigma = 7.5$
The critical region is the top 5%, which from the table is $Z \ge 1.6449$.
Test statistic: $z = \frac{71.6 - 70}{7.5 / \sqrt{90}} = 2.0239$
The value is in the critical region, so reject $H_0$ and conclude there is evidence that the new class have a higher mean weight.
b. The Central Limit Theorem is used to assume that $\bar{X}$ is normally distributed.

Test Your Understanding
June 2011 Q7

Test Your Understanding – Two Tailed Test
Example in textbook

A machine produces bolts of diameter $D$ where $D$ has a normal distribution with mean 0.580cm and standard deviation 0.015cm.
The machine is serviced and after the service a random sample of 50 bolts from the next production run is taken to see if the mean diameter of the bolts has changed from 0.580cm. The distribution of the diameters of bolts after the service is still normal with a standard deviation of 0.015cm. The mean diameter of the 50 bolts is 0.577cm.
a. Stating your hypotheses clearly test, at the 1% level, whether or not there is evidence that the mean diameter of the bolts has changed.
b. Find the critical region for $\bar{X}$ in the above test.

a. $H_0: \mu = 0.580$, $H_1: \mu \ne 0.580$, $\sigma = 0.015$
The critical region is the top 0.5% and bottom 0.5%, which from tables is $Z \le -2.5758$ or $Z \ge 2.5758$.
Test statistic: $z = \frac{0.577 - 0.580}{0.015 / \sqrt{50}} = -1.414$
The value is NOT in the critical region, so do not reject $H_0$ and conclude there is no significant evidence that the mean diameter has changed.
b. Critical region: $0.580 \pm 2.5758 \times \frac{0.015}{\sqrt{50}} \rightarrow \bar{X} \le 0.575$ or $\bar{X} \ge 0.585$

Exercise 3F

Reasoning about difference between means

BOYS: $\sigma_{boy} = 5$kg. Sample: $n_1 = 25$, $\bar{x}_1 = 48$kg.
GIRLS: $\sigma_{girl} = 8$kg. Sample: $n_2 = 30$, $\bar{x}_2 = 45$kg.

The weights of boys and girls in a certain school are known to be normally distributed with standard deviations of 5kg and 8kg respectively. A random sample of 25 boys had a mean weight of 48kg and a random sample of 30 girls had a mean weight of 45kg. Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the mean weight of boys in the school is greater than the mean weight of the girls.

Reasoning about difference between means

We know that $\bar{X}$ is roughly normally distributed as the sample size becomes large.
If $\bar{X} \sim N\left(\mu_x, \frac{\sigma_x^2}{n_x}\right)$ and $\bar{Y} \sim N\left(\mu_y, \frac{\sigma_y^2}{n_y}\right)$ are independent, then by Chapter 1:
$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$
And thus to produce a standardised variable $Z$:
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$
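The standardised variable for a difference between two means can be sketched in Python using the boys and girls figures from the question above. This is a minimal illustration under the stated assumptions (known population standard deviations, independent samples); the function name `two_sample_z` and its parameters are hypothetical, and the critical value comes from `statistics.NormalDist` rather than printed tables.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, mean_y, sigma_x, sigma_y, n_x, n_y, diff_under_h0=0.0):
    """z statistic for H0: mu_x - mu_y = diff_under_h0, with known sigmas."""
    std_error = sqrt(sigma_x ** 2 / n_x + sigma_y ** 2 / n_y)
    return (mean_x - mean_y - diff_under_h0) / std_error


# Boys: sigma = 5, n = 25, mean = 48.  Girls: sigma = 8, n = 30, mean = 45.
z = two_sample_z(mean_x=48, mean_y=45, sigma_x=5, sigma_y=8, n_x=25, n_y=30)

# One-tailed test at the 5% level: the critical region is the top 5%.
critical_value = NormalDist().inv_cdf(0.95)
reject_h0 = z > critical_value

print(round(z, 4), round(critical_value, 4), reject_h0)
```

For a two-tailed test at the same level, the comparison would instead use `abs(z)` against `NormalDist().inv_cdf(0.975)`.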
Test for difference between two means

If $X \sim N(\mu_x, \sigma_x^2)$ and, independently, $Y \sim N(\mu_y, \sigma_y^2)$, then a test of the null hypothesis $H_0: \mu_x = \mu_y$ can be carried out using the test statistic:
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$
If the sample sizes $n_x$ and $n_y$ are large then the result can be extended, by the Central Limit Theorem, to include cases where the distributions of $X$ and $Y$ are not normal.

Example

Q The weights of boys and girls in a certain school are known to be normally distributed with standard deviations of 5kg and 8kg respectively. A random sample of 25 boys had a mean weight of 48kg and a random sample of 30 girls had a mean weight of 45kg. Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the mean weight of boys in the school is greater than the mean weight of the girls.

$H_0: \mu_{boy} = \mu_{girl}$, $H_1: \mu_{boy} > \mu_{girl}$
$\sigma_1 = 5$, $n_1 = 25$, $\sigma_2 = 8$, $n_2 = 30$
Test statistic: $z = \frac{(48 - 45) - 0}{\sqrt{\frac{25}{25} + \frac{64}{30}}} = 1.6948$
The 5% (one-tailed) critical value for $Z$ is $z = 1.6449$, so the value is significant and you can reject $H_0$ and conclude there is evidence that the mean weight of boys is greater than the mean weight of the girls.

Test Your Understanding
June 2013 Q6

Test Your Understanding – Two Tailed

Q A manufacturer of personal stereos can use batteries made by two different manufacturers. The standard deviation of lifetimes for Never Die batteries is 3.1 hours and for Everlasting batteries it is 2.9 hours. A random sample of 80 Never Die batteries and a random sample of 90 Everlasting batteries were tested and their mean lifetimes were 7.9 hours and 8.2 hours respectively. Stating your hypotheses clearly test, at the 5% level of significance, whether there is evidence of a difference between the mean lifetimes of the two makes of batteries.

$H_0: \mu_x = \mu_y$, $H_1: \mu_x \ne \mu_y$
$\sigma_x = 3.1$, $n_x = 80$, $\sigma_y = 2.9$, $n_y = 90$
$z = \frac{7.9 - 8.2}{\sqrt{\frac{3.1^2}{80} + \frac{2.9^2}{90}}} = -0.649\ldots$
The 5% two-tailed critical values for $Z$ are $z = \pm 1.9600$, so the value is not significant and you do not reject $H_0$. There is no significant evidence of a difference in the mean lifetimes of the two makes of battery.

Exercise 3G
Q1, 3, 5, 7

Hypothesis Tests/Confidence Intervals when $n$ is large

In our hypothesis tests and confidence intervals up to now, we have presumed we knew the population variance $\sigma^2$ (which in turn allowed us to find the variance $\frac{\sigma^2}{n}$ of the sampling distribution of $\bar{X}$, or the test statistic $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$). What might seem an obvious thing to use for $\sigma^2$ if we didn't know it?
We saw $s^2$ was an unbiased estimator for $\sigma^2$ (i.e. a variance calculated from the sample which on average is the same as $\sigma^2$, i.e. $E(s^2) = \sigma^2$). This however is only appropriate for large samples, and the test statistic is only an approximation.

If the population is normal, or can be assumed to be so, then for large samples the statistic $\frac{\bar{X} - \mu}{s / \sqrt{n}}$ has an approximate $N(0, 1^2)$ distribution.
If the population is not normal, by assuming $s$ is a close approximation to $\sigma$, then $\frac{\bar{X} - \mu}{s / \sqrt{n}}$ can be treated as having an approximate $N(0, 1^2)$ distribution.

Example

Q As part of a study into the health of young schoolchildren a random sample of 220 children from area $A$ and a second, independent random sample of 180 children from area $B$ were weighed. The results are given in the table below:

          n     x̄      s
Area A    220   37.8   3.6
Area B    180   38.6   4.1

a. Test at the 5% level, whether or not there is evidence of a difference in the mean weight of children in the two areas.
b. State an assumption you have made in carrying out this test.
c. Explain the significance of the Central Limit Theorem to this test.

a. $H_0: \mu_A = \mu_B$, $H_1: \mu_A \ne \mu_B$
Test statistic: $z = \frac{38.6 - 37.8}{\sqrt{\frac{3.6^2}{220} + \frac{4.1^2}{180}}} = 2.05$ (3sf)
The two-tail critical values are $z = \pm 1.96$. Since $2.05 > 1.96$, the result is significant, so reject $H_0$.
b. The test statistic requires $\sigma$, so you have to assume that $s^2 = \sigma^2$ for both samples.
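c. You are not told that the populations are normally distributed, but the samples are both large and so the Central Limit Theorem enables us to assume that $\bar{X}_A$ and $\bar{X}_B$ are both normal.

Test Your Understanding
May 2012 Q5

Exercise 3H

Appendix
PROOFS
On following slides.

Proving $E(\bar{X}) = \mu$

Prove that $\bar{X}$ is an unbiased estimator for $\mu$ when the population is normally distributed.
A random sample $X_1, X_2, \dots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$
$E(\bar{X}) = E\left(\frac{1}{n}(X_1 + \dots + X_n)\right)$
$= \frac{1}{n} E(X_1 + \dots + X_n)$
$= \frac{1}{n}\left(E(X_1) + \dots + E(X_n)\right)$
$= \frac{1}{n}(\mu + \dots + \mu)$
$= \frac{1}{n}(n\mu)$
$= \frac{n\mu}{n} = \mu$

Proof that $Var(\bar{X}) = \sigma^2 / n$

A random sample $X_1, X_2, \dots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
$\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$
$Var(\bar{X}) = \frac{1}{n^2} Var(X_1 + \dots + X_n)$
$= \frac{1}{n^2}\left(Var(X_1) + \dots + Var(X_n)\right)$
$= \frac{1}{n^2}(\sigma^2 + \dots + \sigma^2)$
$= \frac{1}{n^2}(n\sigma^2)$
$= \frac{\sigma^2}{n}$

Proof of unbiased estimator for $\sigma^2$

Prove that $S^2 = \frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)$ is an unbiased estimator for $\sigma^2$.
Note first that $E(X^2) = \sigma^2 + \mu^2$.
And that $Var(\bar{X}) = \frac{\sigma^2}{n}$ (as proven earlier) and $E(\bar{X}) = \mu$ (as proven earlier). Thus $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$.
$S^2 = \frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)$
$E(S^2) = E\left(\frac{1}{n-1}\left(\Sigma X^2 - n\bar{X}^2\right)\right)$
$= \frac{1}{n-1}\left(E(\Sigma X^2) - nE(\bar{X}^2)\right)$
Since $E(\Sigma X^2) = \Sigma E(X^2) = nE(X^2)$ (as each thing in the sum is the same):
$E(S^2) = \frac{n}{n-1}\left(E(X^2) - E(\bar{X}^2)\right)$
$= \frac{n}{n-1}\left(\left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{n} + \mu^2\right)\right)$
$= \frac{n}{n-1} \cdot \frac{\sigma^2 (n-1)}{n} = \sigma^2$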
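The three results proved above can be checked exactly on the counters example from the start of the chapter (60% of counters marked 0, 40% marked 1, samples of size 3). The following Python sketch enumerates every possible sample with exact fractions. It is an illustrative check only, not a proof, and all variable names are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

# Population: P(X = 0) = 3/5, P(X = 1) = 2/5.
population = {0: Fraction(3, 5), 1: Fraction(2, 5)}
n = 3

mu = sum(x * p for x, p in population.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in population.items())

e_mean = Fraction(0)      # E(X-bar)
e_mean_sq = Fraction(0)   # E(X-bar^2)
e_s2 = Fraction(0)        # E(S^2), dividing by n - 1
e_biased = Fraction(0)    # expected variance of the sample, dividing by n

# Enumerate all 2^3 ordered samples, weighting each by its probability.
for sample in product(population, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= population[x]
    mean = Fraction(sum(sample), n)
    sum_sq_dev = sum((x - mean) ** 2 for x in sample)
    e_mean += prob * mean
    e_mean_sq += prob * mean ** 2
    e_s2 += prob * sum_sq_dev / (n - 1)
    e_biased += prob * sum_sq_dev / n

var_mean = e_mean_sq - e_mean ** 2

print(e_mean, var_mean, e_s2, e_biased)
```

Dividing by $n$ instead of $n - 1$ gives an expected value of $\frac{n-1}{n}\sigma^2$, which is the bias that the $n - 1$ divisor removes.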
