PARAMETRIC STATISTICAL INFERENCE

INFERENCE
• Methodologies that allow us to draw conclusions about population parameters from sample statistics
• Methods based on statistical relationships between samples and populations

TYPES OF INFERENCE
1. Estimation
2. Hypothesis testing

• POINT ESTIMATION: estimation of a parameter from a sample statistic (for the mean, standard deviation, etc.)
• INTERVAL ESTIMATION: using a sample to identify an interval within which the population parameter is thought to lie, with a certain probability

ESTIMATION OF POPULATION MEAN
• The sample mean $\bar{x}$ is only an estimate of the population mean $\mu$; the parameter value is not known
• Due to sampling variability, no two samples will produce exactly the same sample mean

Can we estimate how the sample mean would vary if we took many large samples from the same population?

Remember:
• sample means from large samples have a normal distribution
• the mean of the sampling distribution is the same as the unknown parameter $\mu$
• the standard deviation of $\bar{x}$ for an SRS of size $n$ is $\sigma/\sqrt{n}$

PARAMETRIC STATISTICAL INFERENCE: ESTIMATION
• Example: A random sample of 350 male college students was asked for the number of units they were taking. The mean was 12.3 units, with a standard deviation of 2.50 units.
• What can we say about the mean number of units of all male students at the university? How will the estimated value of the parameter vary from one sample to another with a certain confidence, such as 95%?

Assume that $\sigma$ = ? and $s$ = ?

Statistical confidence
Remember: the 68-95-99.7 rule. In 95% of all samples, the sample mean $\bar{x}$ will lie within 2 standard deviations of the population mean $\mu$.
Since $s = 2.50$, we can say that in 95% of samples, $\bar{x}$ will lie within 5.0 points of $\mu$, and so $\mu$ will lie within 5.0 points of the observed sample mean:

In 95% of all samples, $\bar{x} - 5.0 \le \mu \le \bar{x} + 5.0$

• Thus, the parameter $\mu$ will lie between 7.3 and 17.3, in 95% of samples

(Note: this calculation treats 2.50 as the standard deviation of $\bar{x}$. Using the standard error $s/\sqrt{n} = 2.50/\sqrt{350} \approx 0.13$ instead, the margin would be about 0.27 units and the interval far narrower.)

Rephrasing:
1. We are 95% confident that the interval 7.3 to 17.3 contains $\mu$
• We have just assigned statistical confidence to our estimation of the parameter
• We call this estimated interval a CONFIDENCE INTERVAL for the mean value

But there is still some chance that the true parameter value will not lie in the identified interval
• e.g. the SRS chosen was one of the few samples for which $\bar{x}$ is not within 5.0 points of the true mean. 5% of samples will give these incorrect results

CONFIDENCE INTERVAL: formal definition
A level $C$ confidence interval for a parameter is defined as

estimate $\pm$ margin of error

and gives the interval that will capture the true parameter value in repeated samples with probability $C$.
Confidence levels usually vary between 90% and 99.9%.

BUILDING CONFIDENCE INTERVALS
If we know the parameters $\mu$ and $\sigma$, we can standardize the sample mean. The result is the ONE-SAMPLE Z STATISTIC

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

The z statistic tells us how far the observed $\bar{x}$ is from $\mu$, in units of standard deviations of $\bar{x}$. Because $\bar{x}$ has a normal distribution, $z$ has the standard normal distribution N(0,1).

Constructing confidence intervals
When we construct a 95% confidence interval, we are looking for two values for which there is a 95% chance that the population mean is between them.
So, $P(\text{Low} < \mu < \text{High}) = 0.95$

Thus,

$$
\begin{aligned}
0.95 &= P(-1.96 < z < 1.96) \\
     &= P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) \\
     &= P\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(-\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \\
     &= P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
$$

Draw an SRS of size $n$ from a population having unknown mean $\mu$ and known standard deviation $\sigma$. A level $C$ confidence interval for $\mu$ is

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \qquad \text{i.e.} \qquad \bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

This interval is exact when the population distribution is normal and is approximately correct for large $n$ in other cases.

$\alpha = 1 - C$, where $\alpha$ represents the probability that the interval will not capture the true parameter value in repeated samples (the significance level), and $C$ is the confidence level.

Confidence intervals and confidence levels on the standardized normal curve N(0,1)
[Figure 6.5 and Figure 6.6]
• $z^* = z_{\alpha/2}$
• $C$ = chosen confidence level: the probability that the interval will capture the parameter
• $(1 - C)/2 = \alpha/2$ = probability that the parameter lies above the upper confidence limit (and, equally, the probability that it lies below the lower confidence limit)

Example, using $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$:
• A manufacturer of pharmaceutical products analyzes a specimen from each batch of a product to verify the concentration of the active ingredient. The chemical analysis is not perfectly precise. Repeated measurements on the same specimen give slightly different results. The results of repeated measurements follow a normal distribution. The analysis procedure has no bias, so the mean $\mu$ of the population of all measurements is the true concentration in the specimen. The standard deviation of this distribution is known to be $\sigma$ = 0.0068 g/l. Three analyses of one specimen give the following concentrations:

0.8403  0.8363  0.8447

• Calculate the 99% confidence interval for the true concentration.
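The 99% interval for the concentration example can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ and only the standard library; the function name `z_confidence_interval` is a hypothetical helper introduced here for illustration, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(data, sigma, confidence):
    """Level-C interval for the mean when sigma is known (hypothetical helper)."""
    n = len(data)
    x_bar = mean(data)
    alpha = 1 - confidence
    # z_{alpha/2}: the point with area alpha/2 to its right under N(0,1)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Three repeated measurements of one specimen, in g/l (from the example)
measurements = [0.8403, 0.8363, 0.8447]
low, high = z_confidence_interval(measurements, sigma=0.0068, confidence=0.99)
print(f"99% CI: {low:.4f} to {high:.4f} g/l")
```

With these inputs the sketch reproduces the interval of 0.8303 to 0.8505 g/l quoted for n = 3.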
INTERVAL ESTIMATION OF $\mu$ WITH UNKNOWN $\sigma$

With $\sigma$ known, the interval is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.
• $\sigma$ is replaced with its estimate $s$, which introduces more uncertainty
• Use STUDENT'S T-DISTRIBUTION, not the standard normal curve

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad \text{i.e.} \qquad \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

• Intervals derived from the t-distribution are wider than those found with the z-distribution
• For large samples ($n \ge 30$), it makes little difference which distribution we use to estimate the confidence interval

HOW CONFIDENCE INTERVALS BEHAVE
Ideal situation: high confidence and small margin of error

Margin of error: $E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

The smaller the margin of error, the more precise our estimation of $\mu$.

Properties of error
1. Error increases with smaller sample size. For any confidence level, large samples reduce the margin of error.
2. Error increases with larger standard deviation. As variation among the individuals in the population increases, so does the error of our estimate.
3. Error increases with larger z values. There is a tradeoff between confidence level and margin of error: interval width (error) increases with increased confidence level, because higher confidence levels have higher z values.

[Figures 8-10 and 8-11: error is high in small samples]

Example: Calculate the 99% confidence interval for a sample size of 1, with $\bar{x}$ = 0.8404 and $\sigma$ = 0.0068.
The 99% confidence interval for n = 3 was 0.8303 to 0.8505 g/l.
How do these compare in relation to the mean? Which one has the larger margin of error?

CHOOSING SAMPLE SIZE
• Sometimes we wish to estimate our mean within a certain margin of error
• Sometimes we wish to determine the sample size needed to achieve a given margin of error
• Here is how…

Remember: Margin of error $E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

To obtain a desired value of $E$ for a given confidence level, you need to figure out $n$.
From the above,

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

• For a given $\sigma$ and confidence level, it is the sample size that determines the margin of error
• The required sample size depends on the desired level of confidence

Example: Management asks the pharmaceutical laboratory to produce results accurate to within 0.005 with 95% confidence. How many measurements must be averaged to comply with this request?

• Margin of error: m = 0.005 g/l
• For the 95% confidence level, z = 1.960
• $\sigma$ = 0.0068 g/l

$$n = \left(\frac{z\,\sigma}{m}\right)^2 = \left(\frac{1.96 \times 0.0068}{0.005}\right)^2 \approx 7.1$$

Is n = 7 or n = 8? Choose the one that will give the smaller margin of error. In which direction should we always round to meet the requirement?

SUMMARY
All formulas for inference are only correct under certain conditions.
o Most inference methods have several assumptions attached to them that must be met if the outcomes produced by them are to be reliable.

The confidence interval formula has the following assumptions:
1. The data must come from a simple random sample.
   • Different methods exist for stratified and multistage samples
   • Undercoverage and non-response can add error
2. $\bar{x}$ must be a normally distributed random variable.
3. There must be no outliers. Is the formula sensitive to outliers?
4. If the sample size is small (<15) and/or $\sigma$ is not known, but the distribution of $\bar{x}$ is still normal, the t-distribution must be used to compute the interval.
5. When $\sigma$ is known, use the z-distribution. For large sample sizes we can assume that $\sigma = s$ and use either the z or t distribution.
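The sample-size calculation from the laboratory example can be sketched in a few lines. This is an illustration only, assuming Python 3.8+ and the standard library; `required_sample_size` and `margin_of_error` are hypothetical helper names, and the sketch rounds $n$ up so that the margin of error does not exceed the target.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """z_{alpha/2} for a given confidence level C, with alpha = 1 - C."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(sigma, n, confidence):
    """E = z_{alpha/2} * sigma / sqrt(n) (hypothetical helper)."""
    return z_critical(confidence) * sigma / sqrt(n)


def required_sample_size(sigma, margin, confidence):
    """Smallest whole n with margin of error <= margin (hypothetical helper)."""
    return ceil((z_critical(confidence) * sigma / margin) ** 2)


sigma, target, conf = 0.0068, 0.005, 0.95
n = required_sample_size(sigma, target, conf)
print("required n:", n)
# Compare the two candidate sample sizes from the example
for candidate in (7, 8):
    print(candidate, round(margin_of_error(sigma, candidate, conf), 5))
```

Under these assumptions, n = 7 leaves a margin slightly above 0.005 g/l while n = 8 falls below it, which is why the sketch rounds up rather than to the nearest integer.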