Statistics 400 - Lecture 8

Completed so far (any material discussed in these sections is fair game): 2.1-2.5; 4.1-4.5; 5.1-5.8 (READ 5.7); 6.1-6.4; 6.6; 7.1-7.2
Today: finish 7.3, 8.1-8.3
READ 7.4!!!
Assignment #3: 6.2, 6.6, 6.34, 6.78 (interpret the plot in terms of Normality), 7.20, 7.28, 8.14, 8.22, 8.36
Due: Tuesday, Oct 16

Central Limit Theorem
In a random sample (iid sample) from any population with mean $\mu$ and standard deviation $\sigma$, when n is large, the distribution of the sample mean $\bar{x}$ is approximately normal.
That is, $\bar{x}$ is approximately $N(\mu, \sigma/\sqrt{n})$ (mean $\mu$, standard deviation $\sigma/\sqrt{n}$)
Thus, $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$

Implications
So, for random samples, if we have enough data, the sample mean is approximately normally distributed, even if the data are not normally distributed.
If we have enough data, we can use the normal distribution to make probability statements about $\bar{x}$.

Example
A busy intersection has an average of 2.2 accidents per week with a standard deviation of 1.4 accidents.
Suppose you monitor this intersection over a given year, recording the number of accidents per week.
The data take on integer values (0, 1, 2, ...), so the distribution of the number of accidents is not normal.
What is the distribution of the mean number of accidents per week based on a sample of 52 weeks of data?

Example
What is the approximate probability that $\bar{x}$ is less than 2?
What is the approximate probability that there are fewer than 100 accidents in a given year?

Statistical Inference (Chapter 8)
We would like to make inferences about a population based on samples.
The fatality rate for a disease is 50%. In a controlled study, 100 patients with the disease are given a new drug.
Would you conclude that the drug is successful if:
  100% of the patients survived
  75% of the patients survived
  55% of the patients survived
  52% of the patients survived

Statistical inference deals with drawing conclusions about population parameters from the analysis of sample data.
Estimation of parameters
  Estimate a single value for a parameter (point estimation)
  Estimate a plausible range of values for a parameter (interval estimation)
Testing of hypotheses
  Procedure for testing whether data support a hypothesis or theory

Point Estimation
Objective: to estimate a population parameter based on sample data
A point estimator is a statistic that estimates a population parameter
The standard deviation of the statistic is called the standard error (most of the time)

Example
Sample mean: How do you estimate the standard error?
If we have a random sample of size n from a normal population, what is the distribution of the sample mean?
If the sampling procedure is done repeatedly, what proportion of sample means lie in the interval $\left(\mu - 2\sigma/\sqrt{n},\ \mu + 2\sigma/\sqrt{n}\right)$?
If the sampling procedure is done repeatedly, what proportion of sample means lie in the interval $\left(\mu - 3\sigma/\sqrt{n},\ \mu + 3\sigma/\sqrt{n}\right)$?

When estimating $\mu$ with $\bar{x}$, the $100(1-\alpha)\%$ margin of error, d, is the value where $100(1-\alpha)\%$ of the sample means will fall in the interval $(\mu - d,\ \mu + d)$
For large samples, $d = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$

Sample Size Calculation
Before collecting data, we should have some desired margin of error, d, and an associated probability
Based on this, we can determine the appropriate sample size from $d = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
Solving for n gives $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{d}\right)^2$, rounded up to the next whole number
What does this sample size guarantee?

Example (8.12)
The standard deviation of heights of 5-year-old boys is 3.5 inches.
How many boys must be sampled if we want to be 90% certain that the sample mean is within 0.5 inches of the population mean height?
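The sample size calculation for Example (8.12) can be sketched in Python. This is an illustrative sketch only, not part of the original slides: the function names `margin_of_error` and `required_sample_size` are made up for this example, and it assumes the large-sample formula $d = z_{\alpha/2}\,\sigma/\sqrt{n}$ with $\sigma$ known.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """Margin of error d = z_{alpha/2} * sigma / sqrt(n), assuming sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    return z * sigma / math.sqrt(n)


def required_sample_size(sigma, d, confidence):
    """Smallest whole n whose margin of error is at most d (hypothetical helper)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil((z * sigma / d) ** 2)


# Example (8.12): sigma = 3.5 inches, d = 0.5 inches, 90% certainty
n = required_sample_size(sigma=3.5, d=0.5, confidence=0.90)
print(n)  # 133
```

Rounding up matters here: under these assumptions, 132 boys would leave the margin of error slightly above 0.5 inches, while 133 brings it just below.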
Confidence Intervals for the Mean
Last day, we introduced a point estimator: a statistic that estimates a population parameter
It is often more desirable to present a plausible range for the parameter, based on the data
We will call this a confidence interval
Ideally, the interval contains the true parameter value
In practice, this is not possible to guarantee because of sample-to-sample variation
Instead, we compute the interval so that, before sampling, the interval will contain the true value with high probability
This high probability is called the confidence level of the interval

Confidence Interval for $\mu$ for a Normal Population
Situation:
  Have a random sample of size n from $N(\mu, \sigma)$
  Suppose the value of the standard deviation $\sigma$ is known
  The value of the population mean $\mu$ is unknown
Last day we saw that $100(1-\alpha)\%$ of sample means will fall in the interval:
$\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$
Therefore, before sampling, the probability of getting a sample mean in this interval is $(1-\alpha)$
Equivalently, $P\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
Equivalently, $P\left(\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
The interval below is called a $100(1-\alpha)\%$ confidence interval for $\mu$:
$\left(\bar{X} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$

Example
To assess the accuracy of a laboratory scale, a standard weight known to be 10 grams is weighed 5 times.
The readings are normally distributed with unknown mean and a standard deviation of 0.0002 grams.
The mean result is 10.0023 grams.
Find a 90% confidence interval for the mean.
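The known-$\sigma$ confidence interval for the laboratory scale example can be sketched in Python. This is an illustrative sketch only, not part of the original slides: the function name `z_confidence_interval` is made up for this example, and it assumes normal readings with known standard deviation, as the example states.

```python
import math
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence):
    """Interval xbar -/+ z_{alpha/2} * sigma / sqrt(n), assuming sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Scale example: 5 readings, mean 10.0023 g, sigma = 0.0002 g, 90% confidence
low, high = z_confidence_interval(xbar=10.0023, sigma=0.0002, n=5, confidence=0.90)
print(f"({low:.5f}, {high:.5f})")  # (10.00215, 10.00245)
```

Under these assumptions the interval is roughly 10.0023 plus or minus 0.00015 grams, and it lies entirely above the nominal 10 grams.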