S3: Chapter 3 - Estimation and Confidence Intervals
Dr J Frost ([email protected]) www.drfrostmaths.com
Last modified: 30th August 2015

RECAP: Sampling Distributions

What is:
A statistic? A value based only on the sample (i.e. not on any population parameters).
The sampling distribution of a statistic $T$? The probability distribution of $T$ (i.e. a distribution describing how the statistic can vary as we consider possible samples).

Population distribution (e.g. the distribution of heights of everyone in the population) -> generate sample $X_1, X_2, X_3, X_4$ -> statistic $T$

Each $X_i$ is a random variable representing the choice of each thing in the sample. Since each sample is chosen from the population, each $X_i$ has the same distribution as the population.
The sample values are combined in some way to form a statistic (e.g. mean, maximum, mode). As above, this forms a sampling distribution as we consider possible samples.

RECAP: Sampling Distributions

A large bag contains counters. 60% of the counters have the number 0 on them and 40% have the number 1.
a. Find the mean $\mu$ and variance $\sigma^2$ for this population of counters.
A simple random sample of size 3 is taken from this population.
b. List all possible samples.
c. Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2 + X_3}{3}$, where $X_1$, $X_2$ and $X_3$ are the three variables representing counters within the sample.
d. Hence find $E(\bar{X})$ and $\mathrm{Var}(\bar{X})$.
e. Find the sampling distribution for the mode $M$.
f. Hence find $E(M)$ and $\mathrm{Var}(M)$.

a.
| $x$ | 0 | 1 |
|---|---|---|
| $P(X = x)$ | $\frac{3}{5}$ | $\frac{2}{5}$ |

$\mu = \frac{2}{5}$, $\sigma^2 = \frac{6}{25}$

b. List systematically: (0,0,0), (1,0,0), (0,1,0), etc.

c.
| $\bar{x}$ | 0 | $\frac{1}{3}$ | $\frac{2}{3}$ | 1 |
|---|---|---|---|---|
| $P(\bar{X} = \bar{x})$ | $\frac{27}{125}$ | $\frac{54}{125}$ | $\frac{36}{125}$ | $\frac{8}{125}$ |

d. $E(\bar{X}) = \frac{2}{5}$, $\mathrm{Var}(\bar{X}) = \frac{2}{25}$

e.
| $m$ | 0 | 1 |
|---|---|---|
| $P(M = m)$ | $\frac{81}{125}$ | $\frac{44}{125}$ |

f. $E(M) = \frac{44}{125}$, $\mathrm{Var}(M) = 0.228$

Estimating Population Parameters

For the same bag of counters (60% marked 0, 40% marked 1) and simple random samples of size 3, we saw:

Population:
Mean $\mu = \frac{2}{5}$
Variance $\sigma^2 = \frac{6}{25}$
Mode $= 0$
Sample:
Average of the mean across samples: $E(\bar{X}) = \frac{2}{5}$
Variance of the mean across samples: $\mathrm{Var}(\bar{X}) = \frac{2}{25}$
Average of the mode across samples: $E(M) = \frac{44}{125} = 0.352$, with $\mathrm{Var}(M) = 0.228$

To estimate the population mean, we sampled 3 counters and took their mean $\bar{X}$. "On average", what do you notice about our estimate of the mean? It's the same as the population mean!
What do you notice about the mode seen on average across samples? It's different from the population mode.

! Estimator: a statistic used to estimate a population parameter (e.g. $\bar{X}$ for $\mu$).
! Estimate: a particular value of an estimator.

Estimating Population Parameters

! If a statistic $T$ is used as an estimator for a population parameter $\theta$ then the bias is $E(T) - \theta$.
! If $E(T) - \theta = 0$, i.e. $E(T) = \theta$, then $T$ is an unbiased estimator for $\theta$.

e.g. in the counters example $\bar{X}$ is an unbiased estimator for $\mu$, since $E(\bar{X}) = \frac{2}{5} = \mu$ (see the proof in the Appendix). But $M$ was a biased estimator of the true mode, since $E(M) = 0.352$ while the population mode is 0.

The variance of the sample mean

We saw that the sample mean $\bar{X}$ varied as we considered different samples. Can you spot the relationship between the population variance $\sigma^2$ and this variance of the sample mean? In the counters example, $\sigma^2 = \frac{6}{25}$ and $\mathrm{Var}(\bar{X}) = \frac{2}{25}$ for samples of size 3.

$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$, where $n$ is the sample size (see the proof in the Appendix).

What does this imply as we increase the sample size? Is this expected?
If the sample size is larger, the sample mean varies less.
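The figures for the counters example can be checked by brute force. The sketch below is not part of the original slides: it enumerates every ordered sample of size 3 with exact fractions and recovers the sampling distributions of the mean and the mode. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Population from the counters example: P(0) = 3/5, P(1) = 2/5.
POPULATION = {0: Fraction(3, 5), 1: Fraction(2, 5)}


def sampling_distribution(statistic, n):
    """Exact distribution of statistic(sample) over all ordered samples of size n."""
    dist = {}
    for sample in product(POPULATION, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= POPULATION[value]
        key = statistic(sample)
        dist[key] = dist.get(key, Fraction(0)) + prob
    return dist


def expectation(dist):
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    mean = expectation(dist)
    return sum((value - mean) ** 2 * prob for value, prob in dist.items())


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def sample_mode(sample):
    # With two possible values and an odd sample size the mode is always unique.
    return max(set(sample), key=sample.count)


mean_dist = sampling_distribution(sample_mean, 3)
mode_dist = sampling_distribution(sample_mode, 3)

# Compare with mu = 2/5 and sigma^2 / n = (6/25) / 3.
print(expectation(mean_dist), variance(mean_dist))
# Compare with the population mode 0.
print(expectation(mode_dist), float(variance(mode_dist)))
```

Changing the sample size in the two `sampling_distribution` calls (keeping it odd so the mode stays unique) shows the variance of the mean shrinking as $n$ grows.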
The reduced variation is expected, since a larger sample makes our sample mean a more accurate estimate.

Excel Demo Time! File Ref: DemoVarianceOfSampleMean.xlsx

Unbiased estimator of $\sigma^2$

We've seen that we can use $\bar{X}$ as an unbiased estimator for the population mean $\mu$. Can the variance of the sample be used as an unbiased estimator for the population variance?

Consider the heights of students at Tiffin, say with $\sigma^2 = 0.4\,\mathrm{m}^2$. If the sample size was 1, what (on average) would the variance of our sample be? What does this tell us about our use of the variance of the sample to represent the population variance?
The variance of the sample is 0. Therefore it must be a biased estimator, because on average the variance of the sample (i.e. 0) is not equal to the variance of the population.

We can modify the formula for the variance of a sample to give us an unbiased estimator for $\sigma^2$:
! $S^2$ is an unbiased estimator of $\sigma^2$, i.e. $E(S^2) = \sigma^2$.

| Variance type | Definition | Simplified ("msmsm") |
|---|---|---|
| Population variance ($\sigma^2$) | $\frac{1}{n}\sum (x - \mu)^2$, i.e. average squared distance from the mean | $\frac{\sum x^2}{n} - \mu^2$ |
| Sample variance ($S^2$) | $\frac{1}{n-1}\sum (X - \bar{X})^2$ | $\frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)$ |

(See the proof in the Appendix.)

Notation for estimators

! $\hat{\theta}$ represents an estimator of $\theta$.
$\hat{\mu} = \bar{X}$
$\hat{\sigma}^2 = S^2$

Examples

Reminder: $S^2 = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)$

The table below summarises the number of breakdowns, $x$, on a town's bypass on 30 randomly chosen days. Calculate unbiased estimates of the mean and variance of the number of breakdowns.

| Num breakdowns | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| Num days | 3 | 5 | 4 | 3 | 5 | 4 | 4 | 2 |

$\sum x = 160$, $\sum x^2 = 990$ (note that we're taking the frequencies into account here)
$\hat{\mu} = \bar{x} = \frac{160}{30} = 5.33$
$\hat{\sigma}^2 = s_x^2 = \frac{990 - 30\bar{x}^2}{29} = 4.71$

The random variable $X$ has a continuous uniform distribution defined over the range $[0, \alpha]$. A random sample $X_1, X_2, \ldots, X_n$ is taken.
a) Show that $\bar{X}$ is a biased estimator for $\alpha$ and state the bias.
b) Suggest a suitable unbiased estimator for $\alpha$.

Note first that this is a stupid question. Remember that the bias is $E(T) - \theta$.
$\mu = \frac{\alpha}{2}$
$E(\bar{X}) = \mu = \frac{\alpha}{2}$
Bias $= \frac{\alpha}{2} - \alpha = -\frac{\alpha}{2}$
An unbiased estimator $T$ would be one where $E(T) = \alpha$ as desired. This can be achieved with $T = 2\bar{X}$.

Exercise 3B

Standard Error

Reminder: $S^2 = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)$

We previously saw that the variance of the sample mean (i.e. how much the mean varies as we consider different samples) is:
$\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
Naturally we might want to refer to the standard deviation of the sample mean. This is known as the standard error.
Standard error $= \frac{\sigma}{\sqrt{n}}$, or $\frac{s}{\sqrt{n}}$ if $\sigma$ is not known.

Example: Recall the table from before:

| Num breakdowns | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| Num days | 3 | 5 | 4 | 3 | 5 | 4 | 4 | 2 |

Twenty more days were randomly sampled and this sample had a mean of 6.0 breakdowns and $s^2 = 5.0$.
b. Treating the 50 results as a single sample, obtain further unbiased estimates of the population mean and variance.
c. Find the standard error of this estimate of the mean.
d. Estimate the size of sample required to achieve a standard error of less than 0.25.

b. New sample: $\bar{y} = 6.0 \Rightarrow \sum y = 20 \times 6 = 120$
$s_y^2 = 5.0 \Rightarrow \sum y^2 = 19 \times 5.0 + 20 \times 6.0^2 = 815$
So for the combined sample ($w$): $\sum w = 160 + 120 = 280$ and $\sum w^2 = 990 + 815 = 1805$
The combined estimate of $\mu$ is $\bar{w} = \frac{280}{50} = 5.6$
and the estimate for $\sigma^2$ is $s_w^2 = \frac{1805 - 50 \times 5.6^2}{49} = 4.84$
c. $\frac{s_w}{\sqrt{50}} = \sqrt{\frac{4.8367\ldots}{50}} = 0.311$
d. $\sqrt{\frac{4.8367\ldots}{n}} < 0.25 \Rightarrow n > 77.38\ldots$ Thus we need a sample of at least 78.

Exercise 3C

Central Limit Theorem

Suppose you rolled $n$ 10-sided unfair dice, where each throw is $X_1, X_2, \ldots, X_n$ (all with the same distribution $X$) and you added up the values (to obtain a distribution across possible sums, $\sum X$).
See Excel Demo!

Questions based on demo:
As $n$ becomes large, what distribution does $\sum X$ seem to approximate? A Normal Distribution.
Does how biased the die is (i.e. the distribution of $X$) affect the type of distribution of $\sum X$? No. It will affect the variance and mean of $\sum X$, but for large $n$ we will always obtain an approximately Normal Distribution for $\sum X$, regardless of the distribution of each trial $X$.
What can we therefore say about the sampling distribution of the sample mean $\bar{X}$? Since $\bar{X}$ is just $\sum X$ divided by a constant $n$, it will also have an approximately Normal Distribution.

! The Central Limit Theorem says that if $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population of mean $\mu$ and variance $\sigma^2$, then (for large $n$) $\bar{X}$ is approximately $\sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

Examples

June 2013 (R)
Bro Note: You might think you could have used $\bar{x} = 17.2$. However, in $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ we would only resort to an estimate of $\mu$, i.e. $\bar{x}$, if $\mu$ was not known. But $\mu$ is known!

S3 Textbook Example
A sample of size 9 is taken from a population with distribution $N(10, 2^2)$. Find the probability that the sample mean $\bar{X}$ is more than 11.
$\bar{X} \sim N\left(10, \frac{4}{9}\right)$
$P(\bar{X} > 11) = P\left(Z > \frac{11 - 10}{2/3}\right) = P(Z > 1.5) = 0.0668$

S3 Textbook Example
A cubical die is relabelled so that there are three faces marked 1, two faces marked 3 and one marked 6. The die is rolled 40 times and the mean of the 40 scores is recorded. Find an approximation for the probability that the mean is over 3.

| $x$ | 1 | 3 | 6 |
|---|---|---|---|
| $p(x)$ | 1/2 | 1/3 | 1/6 |

$\mu = E(X) = 2.5$
$\sigma^2 = \mathrm{Var}(X) = \frac{13}{4}$
$\bar{X} \approx \sim N\left(2.5, \frac{13}{160}\right)$
$P(\bar{X} > 3) = P(Z > 1.75) = 0.040$
Bro Tip: Although $X$ here (and therefore $\bar{X}$) is discrete, do not use a continuity correction unless asked to find $P\left(\sum X > 120\right)$.

Exercise 3D

Review Questions

What is a statistic? A value calculated only from the sample (and not from population parameters), e.g. the mean.
What is meant by the sampling distribution of a statistic? A distribution over possible values of the statistic as we consider different samples.
What does $\bar{X}$ represent? How conceptually does it differ from $\bar{x}$? $\bar{x}$ is the sample mean from a specific sample. $\bar{X}$ is a random variable whose distribution covers the possible sample means across all possible samples (i.e. the sampling distribution of the sample mean).
What things do we know about the original population distribution $X$ (i.e. the distribution that things in our sample are sampled from)? It could be any distribution. But we refer to the mean as $\mu$ and the variance as $\sigma^2$, which are population parameters.
What do we know about how $\bar{X}$ is distributed in general?
What extra information do we have if the sample size is large?
$\bar{X}$ has mean $\mu$ and variance $\frac{\sigma^2}{n}$. If $n$ is large, then $\bar{X}$ approximately has the distribution $N\left(\mu, \frac{\sigma^2}{n}\right)$.

Review Questions

What is an estimator? A statistic used to estimate a population parameter, e.g. $\bar{X}$ could be used to estimate $\mu$.
Why is it appropriate to use $\bar{x}$ calculated from a sample as an estimate for $\mu$? Because on average $\bar{X}$ will be equal to $\mu$, i.e. $E(\bar{X}) = \mu$. It is an unbiased estimator.
Why is it not appropriate to use the variance of the sample as an estimator for $\sigma^2$? The variance of the sample is on average lower than the population variance $\sigma^2$ (e.g. consider when the sample size is 1). We therefore use the "sample variance" $S^2$, which is an unbiased estimator of $\sigma^2$.
What is meant by $\hat{\mu}$? An estimator of $\mu$. $\hat{\mu} = \bar{X}$

Confidence Intervals

We have seen we can use $\hat{\mu} = \bar{x}$ as an (unbiased) estimate for the mean by finding the mean from the sample, when the true population mean $\mu$ is unknown.
Suppose the population variance $\sigma^2$ is known, and that we want a confidence interval where we are 95% sure the population mean $\mu$ lies within this range.

Q Show that a 95% confidence interval for $\mu$, based on a sample of size $n$, is given by:
$\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$

$\bar{X}$ is approximated by $N\left(\mu, \frac{\sigma^2}{n}\right)$. (Note: your textbook has a typo in the variance here.)
The standard error (i.e. the standard deviation of the sample mean) is $\frac{\sigma}{\sqrt{n}}$.
Looking in the reverse-Z table, we can see we're in the top 2.5% if $z = 1.9600$ (and by symmetry, in the bottom 2.5% if $z = -1.9600$).
By definition, these $z$ values tell us the number of standard deviations above (or below) the mean, i.e.
$\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$

Confidence Intervals

! The 95% confidence interval for $\mu$ is given by:
$\bar{x} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}$
The 99% confidence interval is given by:
$\bar{x} \pm 2.5758 \times \frac{\sigma}{\sqrt{n}}$
I'd remember the first one, but the 1.96 value can easily be obtained from tables.
The width of the confidence interval is:
$2 \times z \times \frac{\sigma}{\sqrt{n}}$

What is actually meant by the 95% confidence interval?
The interval within which there is a 95% chance that $\mu$ lies. More precisely, if we repeated the sampling many times, about 95% of the intervals constructed this way would contain $\mu$.
Think how this is useful in real life: you're doing a survey to establish the average BMI (Body Mass Index) of Americans. You take a sample of 30 people and the mean BMI in your sample is 23.2. Because it's a small sample you can't be certain that the true population mean BMI is exactly that, but you can have good certainty that it lies between, say, 23.0 and 23.4.

Quickfire Intervals

Calculate the 95% confidence intervals for $\mu$:

| $\sigma$ | $n$ | $\bar{x}$ | Interval for $\mu$ |
|---|---|---|---|
| 4 | 9 | 10 | 7.387 to 12.613 |
| 4 | 100 | 10 | 9.216 to 10.784 |
| 8 | 25 | 28.1 | 24.964 to 31.236 |

June 2013 (R)
We earlier found that the sample mean is distributed $N\left(\mu + 2, \frac{3}{50}\right)$.
$17.2 - 1.96 \times \sqrt{\frac{3}{50}} = 16.7$
$17.2 + 1.96 \times \sqrt{\frac{3}{50}} = 17.7$
$16.7 < \mu + 2 < 17.7$
$14.7 < \mu < 15.7$

Exercise 3E

Also to do: May 2012 Q3

One other thing: setting the sample size (in the exam but not so much in your textbook).

In a class the standard deviation of weights is 3kg but the mean is unknown. Therefore Bob takes a sample of people in the class and records their mean weight. Calculate the minimum sample size needed so that there is a 95% chance that the estimate of the population mean from the sample lies within 0.8kg of the true mean.
$1.96 \times \frac{3}{\sqrt{n}} = 0.8 \Rightarrow \sqrt{n} = \frac{1.96 \times 3}{0.8} = 7.35$
So we need $n > 54.0225$, i.e. $n = 55$.

The ages of teachers at Tiffin School have a standard deviation of 8 years. I ask a few teachers for their age and find the mean from these. How many teachers do I need to ask such that there is a 99% chance that the estimate of the population mean age from the sample lies within 1 year of the true mean?
$2.5758 \times \frac{8}{\sqrt{n}} = 1 \Rightarrow \sqrt{n} = 2.5758 \times 8 = 20.6064$
So we need $n > 424.6237\ldots$, i.e. $n = 425$.

Hypothesis Testing

There's nothing really new here! We can carry out a hypothesis test on the mean of a normal distribution.

Q A certain company sells fruit juice in cartons. The amount of juice in a carton has a normal distribution with a standard deviation of 3ml. The company claims that the mean amount of juice per carton, $\mu$, is 60ml.
A trading inspector has received complaints that the company is overstating the mean amount of juice per carton and he wishes to investigate this complaint. The trading inspector took a random sample of 16 cartons which gave a mean of 59.1ml. Using a 5% level of significance, and stating your hypotheses clearly, test whether or not there is evidence to uphold this complaint.

$H_0: \mu = 60$
$H_1: \mu < 60$
$P(\bar{X} \le 59.1 \mid \mu = 60) = P\left(Z \le \frac{59.1 - 60}{3/4}\right) = P(Z \le -1.2) = 0.1151$
$0.115 > 0.05$, so the result is not significant and there is insufficient evidence to reject $H_0$, that $\mu = 60$.
There is insufficient evidence to support the complaint.

! The test statistic in a test for the population mean $\mu$ is $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Further Example Using Critical Values

As per S2, an alternative way of carrying out hypothesis tests is to find the critical value and see if our test statistic exceeds/goes below this.

At a certain college new students are weighed when they join the college. The distribution of weights of students at the college when they enrol has a standard deviation of 7.5kg and a mean of 70kg. A random sample of 90 students from the new entry were weighed and their mean weight was 71.6kg. Assuming that the standard deviation has not changed:
a. Test, at the 5% level, whether there is evidence that the mean of the new entry is more than 70kg.
b. State the importance of the Central Limit Theorem to your test.

a. $H_0: \mu = 70$, $H_1: \mu > 70$, $\sigma = 7.5$
The critical region is the top 5%, which from the table is $Z \ge 1.6449$.
Test statistic: $z = \frac{71.6 - 70}{7.5 / \sqrt{90}} = 2.0239$
The value is in the critical region, so reject $H_0$ and conclude there is evidence that the new entry has a higher mean weight.
b. The Central Limit Theorem is used to assume that $\bar{X}$ is normally distributed.

Test Your Understanding: June 2011 Q7

Test Your Understanding - Two Tailed Test

Example in textbook
A machine produces bolts of diameter $D$ where $D$ has a normal distribution with mean 0.580cm and standard deviation 0.015cm.
The machine is serviced and after the service a random sample of 50 bolts from the next production run is taken to see if the mean diameter of the bolts has changed from 0.580cm. The distribution of the diameters of bolts after the service is still normal with a standard deviation of 0.015cm. The mean diameter of the 50 bolts is 0.577cm.
a. Stating your hypotheses clearly test, at the 1% level, whether or not there is evidence that the mean diameter of the bolts has changed.
b. Find the critical region for $\bar{X}$ in the above test.

a. $H_0: \mu = 0.580$, $H_1: \mu \ne 0.580$, $\sigma = 0.015$
The critical region is the top 0.5% and bottom 0.5%, which from tables is $Z \le -2.5758$ or $Z \ge 2.5758$.
Test statistic: $z = \frac{0.577 - 0.580}{0.015 / \sqrt{50}} = -1.414$
The value is NOT in the critical region, so do not reject $H_0$ and conclude there is no significant evidence that the mean diameter has changed.
b. Critical region: $0.580 \pm 2.5758 \times \frac{0.015}{\sqrt{50}} \Rightarrow \bar{X} \le 0.575$ or $\bar{X} \ge 0.585$

Exercise 3F

Reasoning about difference between means

Boys: $\sigma_{boy} = 5$kg. Sample: $n_1 = 25$, $\bar{x}_1 = 48$kg
Girls: $\sigma_{girl} = 8$kg. Sample: $n_2 = 30$, $\bar{x}_2 = 45$kg

The weights of boys and girls in a certain school are known to be normally distributed with standard deviations of 5kg and 8kg respectively. A random sample of 25 boys had a mean weight of 48kg and a random sample of 30 girls had a mean weight of 45kg. Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the mean weight of boys in the school is greater than the mean weight of the girls.

We know that $\bar{X}$ is roughly normally distributed as the sample size becomes large.
If $\bar{X} \sim N\left(\mu_x, \frac{\sigma_x^2}{n_x}\right)$ and $\bar{Y} \sim N\left(\mu_y, \frac{\sigma_y^2}{n_y}\right)$ then by Chapter 1:
$\bar{X} - \bar{Y} \sim N\left(\mu_x - \mu_y, \frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)$
And thus to produce a standardised variable $Z$:
$Z = \frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$
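As a rough illustration (not from the original slides), the standardised variable for a difference of means can be evaluated for the boys and girls data stated above. The function name `two_sample_z` is made up for this sketch, and the one-tailed 5% critical value is taken from the standard library's normal distribution rather than from printed tables.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean_x, mean_y, sigma_x, sigma_y, n_x, n_y, null_difference=0.0):
    """Z statistic for a difference of two means with known population standard deviations."""
    standard_error = sqrt(sigma_x**2 / n_x + sigma_y**2 / n_y)
    return (mean_x - mean_y - null_difference) / standard_error


# Boys: sigma = 5, n = 25, sample mean 48. Girls: sigma = 8, n = 30, sample mean 45.
z = two_sample_z(48, 45, 5, 8, 25, 30)

# One-tailed 5% critical value (the figure normally read from tables).
critical = NormalDist().inv_cdf(0.95)

print(round(z, 4), round(critical, 4), z > critical)
```

Under the null hypothesis $\mu_x = \mu_y$ the `null_difference` term is zero, which is why it defaults to 0 here.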
Test for difference between two means

! If $X \sim N(\mu_x, \sigma_x^2)$ and the independent variable $Y \sim N(\mu_y, \sigma_y^2)$ then a test of the null hypothesis $H_0: \mu_x = \mu_y$ can be carried out using the test statistic:
$Z = \frac{\bar{X} - \bar{Y} - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$
If the sample sizes $n_x$ and $n_y$ are large then the result can be extended, by the Central Limit Theorem, to include cases where the distributions of $X$ and $Y$ are not normal.

Example

Q The weights of boys and girls in a certain school are known to be normally distributed with standard deviations of 5kg and 8kg respectively. A random sample of 25 boys had a mean weight of 48kg and a random sample of 30 girls had a mean weight of 45kg. Stating your hypotheses clearly test, at the 5% level of significance, whether or not there is evidence that the mean weight of boys in the school is greater than the mean weight of the girls.

$H_0: \mu_{boy} = \mu_{girl}$
$H_1: \mu_{boy} > \mu_{girl}$
$\sigma_1 = 5$, $n_1 = 25$, $\sigma_2 = 8$, $n_2 = 30$
Test statistic: $z = \frac{48 - 45 - 0}{\sqrt{\frac{25}{25} + \frac{64}{30}}} = 1.6948$
The 5% (one-tailed) critical value for $Z$ is $z = 1.6449$, so the value is significant and you can reject $H_0$ and conclude there is evidence that the mean weight of boys is greater than the mean weight of the girls.

Test Your Understanding: June 2013 Q6

Test Your Understanding - Two Tailed

Q A manufacturer of personal stereos can use batteries made by two different manufacturers. The standard deviation of lifetimes for Never Die batteries is 3.1 hours and for Everlasting batteries it is 2.9 hours. A random sample of 80 Never Die batteries and a random sample of 90 Everlasting batteries were tested and their mean lifetimes were 7.9 hours and 8.2 hours respectively. Stating your hypotheses clearly test, at the 5% level of significance, whether there is evidence of a difference between the mean lifetimes of the two makes of batteries.

$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \ne \mu_2$
$\sigma_1 = 3.1$, $n_1 = 80$, $\sigma_2 = 2.9$, $n_2 = 90$
$z = \frac{7.9 - 8.2}{\sqrt{\frac{3.1^2}{80} + \frac{2.9^2}{90}}} = -0.649\ldots$
The 5% two-tailed critical values for $Z$ are $z = \pm 1.9600$.
So the value is not significant and you do not reject $H_0$. There is no significant evidence of a difference in the mean lifetimes of the two makes of battery.

Exercise 3G Q1, 3, 5, 7

Hypothesis Tests/Confidence Intervals when $n$ is large

In our hypothesis tests and confidence intervals up to now, we have presumed we knew the population variance $\sigma^2$ (which in turn allowed us to find the variance $\frac{\sigma^2}{n}$ of the sampling distribution of $\bar{X}$, or the test statistic $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$).

What might seem an obvious thing to use for $\sigma^2$ if we didn't know it?
We saw $S^2$ was an unbiased estimator for $\sigma^2$ (i.e. a variance calculated from the sample which on average is the same as $\sigma^2$, i.e. $E(S^2) = \sigma^2$). This however is only appropriate for large samples, and the test statistic is only an approximation.

! If the population is normal, or can be assumed to be so, then for large samples the statistic $\frac{\bar{X} - \mu}{S / \sqrt{n}}$ has an approximate $N(0, 1^2)$ distribution.
If the population is not normal, by assuming $s$ is a close approximation to $\sigma$, then $\frac{\bar{X} - \mu}{S / \sqrt{n}}$ can be treated as having an approximate $N(0, 1^2)$ distribution.

Example

Q As part of a study into the health of young schoolchildren a random sample of 220 children from area $A$ and a second, independent random sample of 180 children from area $B$ were weighed. The results are given in the table below:

| | $n$ | $\bar{x}$ | $s$ |
|---|---|---|---|
| Area $A$ | 220 | 37.8 | 3.6 |
| Area $B$ | 180 | 38.6 | 4.1 |

a. Test at the 5% level, whether or not there is evidence of a difference in the mean weight of children in the two areas.
b. State an assumption you have made in carrying out this test.
c. Explain the significance of the Central Limit Theorem to this test.

a. $H_0: \mu_A = \mu_B$, $H_1: \mu_A \ne \mu_B$
Test statistic: $z = \frac{38.6 - 37.8}{\sqrt{\frac{3.6^2}{220} + \frac{4.1^2}{180}}} = 2.05$ (3sf)
The two-tail critical values are $z = \pm 1.96$. Since $2.05 > 1.96$, the result is significant, so we reject $H_0$ and conclude there is evidence of a difference in mean weight between the two areas.
b. The test statistic requires $\sigma$, so you have to assume that $s^2 = \sigma^2$ for both samples.
c. You are not told that the populations are normally distributed, but the samples are both large and so the Central Limit Theorem enables us to assume that $\bar{X}_A$ and $\bar{X}_B$ are both normal.

Test Your Understanding: May 2012 Q5

Exercise 3H

Appendix: PROOFS

Proving $E(\bar{X}) = \mu$

Prove that $\bar{X}$ is an unbiased estimator for $\mu$ when the population is normally distributed.
A random sample $X_1, X_2, \ldots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$
$E(\bar{X}) = E\left(\frac{1}{n}\left(X_1 + \cdots + X_n\right)\right)$
$= \frac{1}{n}\left(E(X_1) + \cdots + E(X_n)\right)$
$= \frac{1}{n}\left(\mu + \cdots + \mu\right)$
$= \frac{1}{n} \cdot n\mu$
$= \mu$

Proof that $\mathrm{Var}(\bar{X}) = \sigma^2 / n$

A random sample $X_1, X_2, \ldots, X_n$ is taken from a population with $X \sim N(\mu, \sigma^2)$.
$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$
$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\mathrm{Var}\left(X_1 + \cdots + X_n\right)$
$= \frac{1}{n^2}\left(\mathrm{Var}(X_1) + \cdots + \mathrm{Var}(X_n)\right)$ (since the $X_i$ are independent)
$= \frac{1}{n^2}\left(\sigma^2 + \cdots + \sigma^2\right)$
$= \frac{1}{n^2} \cdot n\sigma^2$
$= \frac{\sigma^2}{n}$

Proof of unbiased estimator for $\sigma^2$

Prove that $S^2 = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)$ is an unbiased estimator for $\sigma^2$.
Note first that $E(X^2) = \sigma^2 + \mu^2$, since $\mathrm{Var}(X) = E(X^2) - \mu^2$.
And that $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$ (as proven earlier) and $E(\bar{X}) = \mu$ (as proven earlier).
Thus $E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2$.
$S^2 = \frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)$
$E(S^2) = E\left(\frac{1}{n-1}\left(\sum X^2 - n\bar{X}^2\right)\right)$
$= \frac{1}{n-1}\left(E\left(\sum X^2\right) - nE(\bar{X}^2)\right)$
Since $E\left(\sum X^2\right) = \sum E(X^2) = nE(X^2)$ (as each thing in the sum is the same):
$E(S^2) = \frac{n}{n-1}\left(E(X^2) - E(\bar{X}^2)\right)$
$= \frac{n}{n-1}\left(\left(\sigma^2 + \mu^2\right) - \left(\frac{\sigma^2}{n} + \mu^2\right)\right)$
$= \frac{n}{n-1} \cdot \frac{\sigma^2 (n-1)}{n}$
$= \sigma^2$
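
The unbiasedness of $S^2$ can also be sanity-checked numerically. The sketch below is an addition, not part of the original slides: it reuses the counters population from the recap (60% marked 0, 40% marked 1, so $\sigma^2 = \frac{6}{25}$) and computes, exactly, the expected value of the variance estimate with divisor $n - 1$ and with divisor $n$. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

# Counters population from the recap: P(0) = 3/5, P(1) = 2/5, so sigma^2 = 6/25.
POPULATION = {0: Fraction(3, 5), 1: Fraction(2, 5)}


def expected_variance_estimate(n, divisor):
    """Exact expected value of sum((x - xbar)^2) / divisor over all ordered samples of size n."""
    total = Fraction(0)
    for sample in product(POPULATION, repeat=n):
        prob = Fraction(1)
        for value in sample:
            prob *= POPULATION[value]
        xbar = Fraction(sum(sample), n)
        squared_deviations = sum((value - xbar) ** 2 for value in sample)
        total += prob * squared_deviations / divisor
    return total


n = 3
unbiased = expected_variance_estimate(n, n - 1)  # divisor n - 1, i.e. S^2
biased = expected_variance_estimate(n, n)        # divisor n, the plain variance of the sample

print(unbiased, biased)
```

With divisor $n - 1$ the expectation matches $\sigma^2$, while with divisor $n$ it comes out smaller by a factor of $\frac{n-1}{n}$, which is the bias the proof removes.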