Download 16 - Rice University

Statistics: Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University

Sampling distribution of $\bar{X}$
[Figure: a population with mean $\mu$ and standard deviation $\sigma$ is sampled k times. Sample 1, Sample 2, ..., Sample k yield the sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, which together form the sampling distribution.]

Central Limit Theorem
(4) The mean of the sampling distribution of $\bar{X}$ is equal to the population mean, i.e. $\mu_{\bar{X}} = \mu$.
(5) The standard deviation of the sampling distribution of $\bar{X}$ is the population standard deviation divided by the square root of the sample size, i.e. $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.

Sampling distribution of $\bar{X}$ for a Normal population
[Figure: four histograms of the sample mean, one per sample size.]
- N=1: $\bar{X}$ = 1.41, SD = 0.145
- N=5: $\bar{X}$ = 1.40, SD = 0.065
- N=10: $\bar{X}$ = 1.40, SD = 0.047
- N=50: $\bar{X}$ = 1.40, SD = 0.020

Sampling distribution of $\bar{X}$ for a non-Normal population
[Figure: four histograms of the sample mean, one per sample size.]
- N=1: $\bar{X}$ = 1.40, SD = 0.147
- N=5: $\bar{X}$ = 1.40, SD = 0.066
- N=50: $\bar{X}$ = 1.41, SD = 0.021
- N=100: $\bar{X}$ = 1.41, SD = 0.015

Computer simulation of the sampling distribution of the sample mean
- Pick any probability distribution and specify a mean and standard deviation.
- Tell the computer to randomly generate 1000 observations from that probability distribution.
  - E.g., the computer is more likely to spit out values with high probabilities.
- Plot the "observed" values in a histogram.
- Next, tell the computer to randomly generate 1000 averages-of-2 (randomly pick 2 and take their average) from that probability distribution.
- Plot the "observed" averages in histograms.
- Repeat for averages-of-10 and averages-of-100.
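The simulation steps above can be sketched in a few lines of Python. This is a minimal sketch, not the code used to produce the slides: the function names (`simulate_sample_means`, `uniform01`, `exponential1`), the fixed seed, and the choice of rate 1 for the exponential distribution are assumptions made for illustration, and the histograms are replaced by printed summary statistics.

```python
import random
import statistics


def simulate_sample_means(draw, n, trials=1000, seed=0):
    """Return `trials` averages, each taken over `n` independent draws.

    `draw` is a hypothetical callback that takes a random.Random instance
    and returns one observation from the chosen distribution.
    """
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


def uniform01(rng):
    # One observation from the uniform distribution on [0, 1).
    return rng.random()


def exponential1(rng):
    # One observation from an exponential distribution with rate 1 (assumed).
    return rng.expovariate(1.0)


if __name__ == "__main__":
    for name, draw in [("uniform", uniform01), ("exponential", exponential1)]:
        for n in (1, 2, 10, 100):
            means = simulate_sample_means(draw, n)
            print(f"{name:12s} n={n:3d} "
                  f"mean={statistics.fmean(means):.3f} "
                  f"sd={statistics.stdev(means):.3f}")
```

If the Central Limit Theorem properties hold, the printed mean should stay near the population mean for every n, while the printed standard deviation should shrink roughly in proportion to $1/\sqrt{n}$.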
[Figures: histograms produced by the simulation, one per slide.]
- Uniform Distribution on [0,1]: average of 1 sample (original distribution)
- Uniform Distribution: 1000 averages of 2 samples
- Uniform Distribution: 1000 averages of 5 samples
- Uniform Distribution: 1000 averages of 100 samples
- Exponential Distribution: 1000 averages of 2 samples
- Exponential Distribution: average of 1 sample (original distribution)
- Exponential Distribution: 1000 averages of 5 samples
- Exponential Distribution: 1000 averages of 100 samples

Contents
- Summary of Statistics Learnt so Far
- Statistical Inference
  - Central Limit Theorem and its implications
  - Estimation theory
  - Interval Estimation
  - What is Confidence Interval?
- Tutorial

Estimation Theory
- In statistics, estimation refers to the process by which one makes inferences about a population, based on information obtained from a sample.
- Statisticians use sample statistics to estimate population parameters.
  - For example, sample means are used to estimate population means; sample proportions, to estimate population proportions.

Two Types of Estimates
- Point estimate. A point estimate of a population parameter is a single value of a statistic.
  - For example, the sample mean $\bar{x}$ is a point estimate of the population mean $\mu$.
- When we estimate the mean $\mu$ by $\bar{x}$, the probability that we are exactly correct is close to zero, i.e. $P(\bar{x} = \mu) \approx 0$.
  - This assumes that the population is heterogeneous and that the sample size n is much smaller than the population size N.
- Hence, we are not very "confident" about the estimates we make using point estimates.

Two Types of Estimates (contd.)
- How can we be more confident about our estimates?
  - We want the probability that our estimate is correct to be well above zero.
- We can increase our confidence by using a less precise estimate instead of a point estimate.
  - Estimate an interval instead of a point.
- Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.
  - For example, $a < \mu < b$ is an interval estimate of the population mean $\mu$. It indicates that the population mean is greater than a but less than b.

History of Interval Estimation
- Neyman (1937) identified interval estimation ("estimation by interval") as distinct from point estimation ("estimation by unique estimate").
  - He was the first to recognize and formulate interval estimation.
  - Work that quoted results in the form of an estimate plus-or-minus a standard deviation was already a form of interval estimation.
- His paper on this was titled "On the Two Different Aspects of the Representative Method: The Method of Stratified Sampling and the Method of Purposive Selection".
  - It was given at the Royal Statistical Society on 19 June 1934.
- You can download the paper from: http://stevereads.com/papers_to_read/on_the_two_different_aspects_of_the_representative_method.pdf

What is an Interval Estimate?
- In statistics, interval estimation is the use of sample data to calculate an interval of possible (or probable) values of an unknown population parameter.
  - This is in contrast to point estimation, which yields a single number.
- Interval estimate. An interval estimate is defined by two numbers, between which a population parameter is said to lie.
  - For example, $a < \mu < b$ is an interval estimate of the population mean $\mu$. It indicates that the population mean is greater than a but less than b.
  - We use $\bar{x}$ to estimate this interval.
- Interval estimates provide
  - a "best estimate" of a parameter
  - an indication of the precision with which the parameter is known.
Types of Interval Estimation
- The most prevalent forms of interval estimation are:
  - confidence intervals (a frequentist method)
  - credible intervals (a Bayesian method)
- Other common approaches to interval estimation, which are encompassed by statistical theory, are:
  - tolerance intervals
  - prediction intervals (used mainly in regression analysis)
- Of these, confidence intervals are the most common and widely used, and hence will be covered in more detail in this class.

What is a Confidence Interval?
- In statistics, a confidence interval (CI) is an interval estimate of a population parameter.
  - Instead of estimating the parameter by a single value, an interval likely to include the parameter is given.
  - Confidence intervals are used to indicate the reliability of an estimate.
- How likely the interval is to contain the parameter is determined by the confidence level.
  - Increasing the desired confidence level will widen the confidence interval.
- Confidence intervals, and interval estimates more generally, have applications across the whole range of quantitative studies.

Example of Confidence Interval
- For example, a confidence interval can be used to describe how reliable some opinion survey results are.
  - In a survey of election voting intentions, the result might be that 40% of respondents intend to vote for a certain party.
  - A 95% confidence interval for the proportion in the whole population having the same intention on the survey date might be 36% to 44%.
  - From the same survey data one may calculate a narrower interval at the lower 90% confidence level, for instance 38% to 42%.
- All other things being equal, a survey result with a narrow confidence interval at a high confidence level is more desirable.

[Video: Confidence Interval]

Example
- In the whole of Houston, what percentage of adults do you think will want to watch a movie sometime in the next 10 days?
  - Assume a variance of 0.0625 for the whole population.
- Choose a random sample of 10 adults and ask their opinion.
- Will this be anywhere close to the actual percentage?
  - Let X be the random variable denoting the percentage of adults in the sample who want to attend the movies.
  - Let $X_i$ be the value from the i-th sample.
- How can we be sure to be closer to the actual mean?
  - Take a very large number of samples.

Example (contd.)
- But taking a large number of samples is generally not feasible.
  - We want to arrive at an estimate based on fewer samples.
- For example, if you take only 1 sample of 10 people and find that 5 of the 10 would like to go to a movie, then you can say:
  - We are pretty sure that 50% of the adult population would want to go to a movie in the next 10 days.
- Isn't this ambiguous? How sure is "pretty sure"?
  - We need to be more definitive.

Example (contd.)
- We use a confidence interval to remove the ambiguity.
- What if we want to be 100% sure?
  - The only statement we can make with 100% certainty is that between 0% and 100% of the adult population would want to watch a movie in the next 10 days.
- What if we want to be 50% sure?
  - Such a statement does not carry much weight, as you are wrong half the time.
- Then, what kind of statements make sense?
  - 90% sure, 95% sure, 98% sure or 99% sure. These are called confidence levels.

Calculating Confidence Level
- The general norm is to vary the interval by multiples of $\sigma$ and compute the confidence level.
  - The interval is extended equally on either side of the mean.
- The probability that the interval $[\bar{x} - \sigma, \bar{x} + \sigma]$ contains $\mu$ can be calculated as
  $P(\mu \in [\bar{x} - \sigma, \bar{x} + \sigma]) = P(\bar{x} - \sigma \le \mu \le \bar{x} + \sigma) = P(\mu - \sigma \le \bar{x} \le \mu + \sigma)$
- Assuming a Normal distribution, we get
  $P(\mu \in [\bar{x} - \sigma, \bar{x} + \sigma]) \approx 0.6827$
- What if we increase the width of the interval from $2\sigma$ to $4\sigma$?
  $P(\mu \in [\bar{x} - 2\sigma, \bar{x} + 2\sigma]) \approx 0.9545$
- Source for calculations: http://www.analyzemath.com/statistics/normal_calculator.html

Confidence Level Table
- Some of the most commonly used confidence levels in statistics are given in the table below:

  Confidence level    Number of $\sigma$ away from the mean
  90%                 1.64
  95%                 1.96
  98%                 2.33
  99%                 2.575

- A level below 90% is generally not considered strong enough to make a statement.

Example (contd.)
- Let us continue with computing the confidence interval for our movie example.
- Assume that we took a random sample of 10 adults.
  - Among them, 5 adults said that they would like to go to a movie in the next 10 days.
- Hence, we get the mean $\bar{x} = 0.5$ (denoting 50%) and, since $\mathrm{Var}(\bar{x}) = \sigma^2 / n$, the standard deviation of the sample mean
  $\sigma_{\bar{x}} = \sqrt{0.0625 / 10} \approx 0.0791$
- Say we want to be 95% confident about our estimate.

Example (contd.)
- From the table we can see that we have to go 1.96 standard deviations away from the mean.
  - Hence, we need to go $1.96 \times 0.0791 \approx 0.155$ away from the mean.
- Summarizing, we can now say with 95% confidence that the mean of the actual population lies in $[0.5 - 0.155, 0.5 + 0.155] = [0.345, 0.655]$, i.e. between 34.5% and 65.5% of the total population.
- What if you want to be 98% confident?

Graphical Representation of Confidence Intervals
[Figure: a plot of a normal distribution (or bell curve). Each colored band has a width of one standard deviation.]
Confidence Interval for $\mu$ when $\sigma$ is known
- A 95% confidence interval for $\mu$ if $\sigma$ is known is given by:
  $\bar{x} \pm 1.96 \, \sigma / \sqrt{n}$
- 95% of the $\bar{x}$'s lie between $\mu \pm 1.96 \, \sigma / \sqrt{n}$.
[Figure: overlay plot of the Normal density of $\bar{X}$, with the central 95% region between $\mu - 1.96 \, \sigma / \sqrt{n}$ and $\mu + 1.96 \, \sigma / \sqrt{n}$ marked.]

Rationale for Confidence Interval
- From the sampling distribution of $\bar{X}$, conclude that $\mu$ and $\bar{x}$ are within 1.96 standard errors ($\sigma / \sqrt{n}$) of each other 95% of the time.
- Otherwise stated, 95% of the intervals contain $\mu$.
- So, the interval $\bar{x} \pm 1.96 \, \sigma / \sqrt{n}$ can be taken as an interval that typically would include $\mu$.

Example
- A random sample of 80 tablets had an average potency of 15 mg. Assume $\sigma$ is known to be 4 mg.
  - $\bar{x} = 15$, $\sigma = 4$, $n = 80$
- A 95% confidence interval for $\mu$ is
  $15 \pm 1.96 \cdot 4 / \sqrt{80} = (14.12, 15.88)$
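The known-$\sigma$ interval and the multipliers behind the commonly used confidence levels can be checked with a short Python sketch. It assumes a Normal sampling distribution, as the slides do. The function names (`z_multiplier`, `coverage`, `known_sigma_ci`) are hypothetical helpers introduced here for illustration; `statistics.NormalDist` and `math.erf` come from the Python standard library.

```python
import math
from statistics import NormalDist


def z_multiplier(confidence):
    """Standard errors on each side of the mean for a two-sided interval."""
    # A two-sided level c leaves (1 - c) / 2 in each tail of the Normal curve.
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def coverage(k):
    """Confidence level of the interval: mean +/- k standard errors."""
    return math.erf(k / math.sqrt(2.0))


def known_sigma_ci(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    margin = z_multiplier(confidence) * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


if __name__ == "__main__":
    # Multipliers for the commonly used confidence levels.
    for level in (0.90, 0.95, 0.98, 0.99):
        print(f"{level:.0%}: {z_multiplier(level):.3f} standard errors")

    # Tablet example: x-bar = 15 mg, sigma = 4 mg, n = 80.
    low, high = known_sigma_ci(15.0, 4.0, 80)
    print(f"95% CI for mean potency: ({low:.2f}, {high:.2f}) mg")
```

For the tablet example this reproduces the interval (14.12, 15.88). Passing a different `confidence` value, such as 0.98, gives the corresponding wider interval without looking up the multiplier in a table.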
