
STP 420 SUMMER 2002
STP 420 INTRODUCTION TO APPLIED STATISTICS
NOTES PART 2 – PROBABILITY AND INFERENCE
CHAPTER 6 INTRODUCTION TO INFERENCE

Introduction
We want to be able to draw conclusions from the data collected through a sample.

Statistical inference
1. Confidence intervals – used to estimate a population parameter
2. Tests of significance – used to assess the evidence for a claim

Both are based on the sampling distributions of statistics. Both report probabilities that state what would happen if we used the inference method many times. Inference depends on the probability model for the data and is most reliable when the data are produced by a properly randomized design (random sample or randomized experiment).

6.1 Estimating with confidence
If $\mu = 500$ for a population, and a sample drawn from the population gives a mean of $\bar{x} = 465$, we say that the sample mean 465 is an estimate of the population mean 500. If you take a second sample, you may not get the same mean. It is important to present the variation along with the estimate of the population mean.

Statistical confidence
If repeated samples of size $n$ are taken from a population that has mean $\mu$ and standard deviation $\sigma$, the sample mean is distributed as $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$ (exactly if the population is normal, approximately for large $n$ otherwise). That is, the sampling distribution of $\bar{x}$ has mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of the sample size. This implies that the bigger the sample size, the smaller the standard deviation of the sample mean, since $\sqrt{n}$ is in the denominator.

Remember the 68-95-99.7 rule ($\mu$ is unknown but $\sigma$ is known)
Here one standard deviation of $\bar{x}$ means $\sigma/\sqrt{n}$.
The probability is 0.68 (68%) that the sample mean $\bar{x}$ will be within 1 standard deviation of the population mean $\mu$ (a band 2 standard deviations wide, one on each side).
The probability is 0.95 (95%) that the sample mean $\bar{x}$ will be within 2 standard deviations of the population mean $\mu$ (a band 4 standard deviations wide, two on each side).
The probability is 0.997 (99.7%) that the sample mean $\bar{x}$ will be within 3 standard deviations of the population mean $\mu$ (a band 6 standard deviations wide, three on each side).

Eg: if $\bar{x} = 40$ and $\sigma_{\bar{x}} = 4$ then we can say that

$P(36 \le \mu \le 44) = 0.68$
"$\bar{x}$ lies within 4 points of $\mu$" is equivalent to "$\mu$ is within 4 points of $\bar{x}$".
For 68% of all samples, the interval $\bar{x} \pm 4$ captures the true $\mu$; for this sample the interval is (36, 44).

$P(32 \le \mu \le 48) = 0.95$
"$\bar{x}$ lies within 8 points of $\mu$" is equivalent to "$\mu$ is within 8 points of $\bar{x}$".
For 95% of all samples, the interval $\bar{x} \pm 8$ captures the true $\mu$; for this sample the interval is (32, 48).

$P(28 \le \mu \le 52) = 0.997$
"$\bar{x}$ lies within 12 points of $\mu$" is equivalent to "$\mu$ is within 12 points of $\bar{x}$".
For 99.7% of all samples, the interval $\bar{x} \pm 12$ captures the true $\mu$; for this sample the interval is (28, 52).

Confidence Intervals – used to estimate a population parameter
A 68% confidence interval for $\mu$ is $\bar{x} \pm 4$ or (36, 44)
A 95% confidence interval for $\mu$ is $\bar{x} \pm 8$ or (32, 48)
A 99.7% confidence interval for $\mu$ is $\bar{x} \pm 12$ or (28, 52)

General form: estimate $\pm$ margin of error

Margin of error – shows how accurate we believe our guess is. It depends on the confidence level (68%, 95%, 99.7%, etc.) and on the variability of the estimate.

Confidence level (C) – how confident we are that the procedure will capture the true population mean $\mu$

A level C confidence interval for a parameter is an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.

Confidence Interval for a population mean
Choose an SRS of size $n$ from a population having unknown mean $\mu$ and known standard deviation $\sigma$. A level C confidence interval for $\mu$ is

$\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$

where $z^*$ is the value on the standard normal curve with area C between $-z^*$ and $z^*$.
The interval is exact if the population distribution is normal.
The interval is approximately correct for large $n$ when the population distribution is not normal.

General form: estimate $\pm z^* \sigma_{\text{estimate}}$

How Confidence Intervals behave
We can reduce the margin of error by
1. Using a smaller C (confidence level)
2. Increasing $n$
3. Reducing $\sigma$

Choosing the sample size
The confidence interval for a population mean will have a specified margin of error $m$ when the sample size is

$n = \left( \dfrac{z^* \sigma}{m} \right)^2$

rounded up to the next whole number.

Some cautions
1. Data must be an SRS from the population.
2. The formula is not correct for probability sampling designs more complex than an SRS.
3. There is no correct method for inference from data haphazardly collected with unknown bias. Fancy formulas cannot rescue badly produced data.
4. $\bar{x}$ is not resistant. Outliers can strongly affect confidence intervals and may need to be removed, but only with justification.
5. If $n$ is small and the population is not normal, the true confidence level will be different from C.
6. The standard deviation $\sigma$ of the population must be known.

Bootstrap – a newer procedure for approximating a sampling distribution when theory cannot tell us its shape. It treats the original sample as a population and draws many samples (each is a resample) from it.

6.2 Tests of Significance – used to assess the evidence for a claim

Null hypothesis ($H_0$) – statement being tested in a test of significance. The test is designed to assess the strength of the evidence against the null hypothesis. It is a statement of no effect or no difference.
Eg. $H_0: \mu = 187$

Alternative hypothesis ($H_a$) – statement we hope or suspect is true instead of $H_0$
Eg.
$H_a: \mu > 187$   one sided (right tailed test)
$H_a: \mu < 187$   one sided (left tailed test)
$H_a: \mu \ne 187$   two sided (two tailed test)

A hypothesis always refers to some population or model, not to a particular outcome.

Test statistic – statistic computed from the sample (here based on $\bar{x}$) that measures the compatibility between the null hypothesis and the data.

$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

where $\mu_0$ is the value of $\mu$ stated in $H_0$ and $z$ has the standard normal distribution N(0, 1) when $H_0$ is true.

P-value – the probability, computed assuming that $H_0$ is true, that the test statistic would take a value as extreme or more extreme than that actually observed. The smaller the P-value, the stronger the evidence against $H_0$ provided by the data.
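As an illustration of the test statistic and P-value definitions above, the following is a minimal Python sketch for the right-tailed case $H_a: \mu > 187$. The sample values (mean 190, $\sigma = 12$, $n = 64$) and the function names are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from the course notes.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(xbar, mu0, sigma, n):
    """Standardize the sample mean under H0: mu = mu0 (sigma assumed known)."""
    return (xbar - mu0) / (sigma / sqrt(n))


def right_tail_p_value(z):
    """P(Z >= z) for a standard normal Z: the P-value against Ha: mu > mu0."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical data for H0: mu = 187 versus Ha: mu > 187.
z = z_statistic(xbar=190.0, mu0=187.0, sigma=12.0, n=64)
p = right_tail_p_value(z)
print(f"z = {z:.2f}, P-value = {p:.4f}")
```

With these made-up numbers the sample mean sits 2 standard deviations of $\bar{x}$ above 187, so the P-value is the area to the right of 2 under the standard normal curve, roughly 0.023.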
Statistical significance – if the P-value is as small as or smaller than $\alpha$, we say that the data are statistically significant at level $\alpha$.

z Test for a population mean
To test the hypothesis $H_0: \mu = \mu_0$ based on an SRS of size $n$ from a population with unknown mean $\mu$ and known standard deviation $\sigma$, compute the test statistic

$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

In terms of a standard normal random variable $Z$, the P-value for a test of $H_0$ against
$H_a: \mu > \mu_0$ is $P(Z \ge z)$
$H_a: \mu < \mu_0$ is $P(Z \le z)$
$H_a: \mu \ne \mu_0$ is $2P(Z \ge |z|)$

These P-values are exact if the population distribution is normal and are approximately correct for large $n$ in other cases (non-normal populations).

Confidence Intervals and Two-Sided Tests
A level $\alpha$ two-sided significance test rejects a hypothesis $H_0: \mu = \mu_0$ exactly when the value $\mu_0$ falls outside a level $1 - \alpha$ confidence interval for $\mu$.

P-values (P) versus fixed $\alpha$
P-value – smallest level $\alpha$ at which the data are significant.
Critical value – $z^*$ value on the N(0, 1) curve with the specified area to its right/left.
A test is significant when $\alpha \ge P$ and not significant otherwise.
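The link between confidence intervals and two-sided tests can be checked numerically. The sketch below is one possible illustration in Python, not part of the original notes: the function names are hypothetical, and the numbers ($\bar{x} = 40$, $\sigma = 20$, $n = 25$, so that $\sigma/\sqrt{n} = 4$) are assumed values picked to echo the earlier example. It computes a level $1 - \alpha$ interval and, for several candidate values of $\mu_0$, compares "the two-sided test rejects at level $\alpha$" with "$\mu_0$ lies outside the interval".

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # the N(0, 1) distribution


def critical_value(conf_level):
    """z* with area conf_level between -z* and z* under N(0, 1)."""
    return STD_NORMAL.inv_cdf((1.0 + conf_level) / 2.0)


def confidence_interval(xbar, sigma, n, conf_level):
    """Level-C interval xbar +/- z* sigma/sqrt(n), sigma assumed known."""
    margin = critical_value(conf_level) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


def z_test_p_value(xbar, mu0, sigma, n, alternative):
    """P-value of the z test of H0: mu = mu0 for the chosen alternative."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":      # Ha: mu > mu0
        return 1.0 - STD_NORMAL.cdf(z)
    if alternative == "less":         # Ha: mu < mu0
        return STD_NORMAL.cdf(z)
    if alternative == "two-sided":    # Ha: mu != mu0
        return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Assumed illustrative values: sigma / sqrt(n) = 20 / 5 = 4.
xbar, sigma, n, alpha = 40.0, 20.0, 25, 0.05
low, high = confidence_interval(xbar, sigma, n, 1.0 - alpha)

results = []
for mu0 in (30.0, 35.0, 45.0, 50.0):
    p = z_test_p_value(xbar, mu0, sigma, n, "two-sided")
    rejected = p <= alpha                 # significant when alpha >= P
    outside = not (low <= mu0 <= high)    # mu0 outside the interval
    results.append((mu0, rejected, outside))
    print(f"mu0 = {mu0}: P = {p:.4f}, rejected = {rejected}, outside CI = {outside}")
```

For every candidate $\mu_0$ the two flags agree, which is the duality stated above. Note that the exact 95% critical value is about 1.96 rather than the 2 used by the 68-95-99.7 rule, so the interval here is slightly narrower than (32, 48).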
