Chapter 7: Sampling Distribution Basics
BIT 5724

Sampling and Sampling Distributions
• Sample statistics (the mean and standard deviation are examples) vary from sample to sample.
• Sample statistics are computed from random variables drawn from a population and, as such, are random variables themselves.
• A sampling distribution is simply the probability distribution of a sample statistic.

Sampling Distributions
• Generally we do not know the mean or variance of a random variable; and
• Often the purpose of sampling is to estimate parameters (mean, variance, etc.) of a population. We use samples because:
  – The population is too large for a census;
  – It is too expensive to conduct a census; and/or
  – The units must be destroyed in order to test the variable(s) of interest, i.e. destructive testing.

Definitions
• A parameter is a numerical descriptive measure of a population. It is calculated from the observations in the population.
• A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.

Sample Statistics
• Sample mean (used to estimate the population mean, a parameter);
• Sample median;
• Sample variance (used to estimate the population variance, another parameter);
• Sample standard deviation (derived from the sample variance and used to estimate the population standard deviation, another parameter).

Example
• We want to estimate the population mean:
  – Two possible sample statistics:
    • Sample mean, $\bar{x}$
    • Sample median, $m$
  – Which one should be used?
• For example, toss a die three times and let $x$ be the number of dots showing on the up face. Suppose we have 2, 2, and 6 come up:
  – Expected value (of the population) is: $\mu = 3.5$
  – Mean of $x$ is: $\bar{x} = 10/3 = 3.33$
  – While median is: $m = 2$
  – Which is closer to the true mean (expected value)?

Example, cont.
• What if we had sample measurements of 3, 4, and 6?
  – Expected value (of the population) is still: $\mu = 3.5$
  – Mean of $x$ is: $\bar{x} = 13/3 = 4.33$
  – While median is: $m = 4$
  – Now which is closer to the true mean (expected value)?

Sampling Statistics
• Since sampling statistics are random variables, they must be compared on the basis of their probability distributions: the collection of values and associated probabilities of each statistic that would be obtained if the sampling experiment were repeated a very large number of times.

Definitions
• The sampling distribution for a sample statistic (calculated from a sample of n measurements) is the probability distribution for the statistic; or
• The sampling distribution is a function that gives the probability of every possible value of a sample statistic for a specified population and sample size.

More Definitions
• A point estimator of a population parameter is a rule or formula that tells us how to use the sample data to create a single number that can be used as an estimate of the population parameter.
• If a sample statistic has a sampling distribution with a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an unbiased estimator of the parameter.

And More Definitions
• If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a biased estimator of the parameter.

Sampling Distribution of the Sample Mean
• Often we are interested in making an inference about the mean of some population, $\mu$. The sample mean $\bar{x}$ is a good choice as the estimator for $\mu$.
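
The die example can be carried through exactly. With three tosses there are only 6^3 = 216 equally likely samples, so the sampling distributions of the sample mean and the sample median can be enumerated instead of judged from one or two samples. The following is a minimal sketch in Python, assuming a fair six-sided die; the function names (`sampling_distribution`, `expected_value`, `variance`, `sample_mean`) are illustrative and not part of the course materials.

```python
from fractions import Fraction
from itertools import product
from statistics import median


def sample_mean(sample):
    """Sample mean as an exact fraction."""
    return Fraction(sum(sample), len(sample))


def sampling_distribution(statistic, faces=range(1, 7), n=3):
    """Exact sampling distribution of `statistic` over all equally
    likely samples of n tosses of a fair die (assumption: fair die)."""
    samples = list(product(faces, repeat=n))
    weight = Fraction(1, len(samples))
    dist = {}
    for sample in samples:
        value = Fraction(statistic(sample))
        dist[value] = dist.get(value, 0) + weight
    return dist


def expected_value(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = expected_value(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


mean_dist = sampling_distribution(sample_mean)
median_dist = sampling_distribution(median)

print("E[mean]   =", expected_value(mean_dist))
print("E[median] =", expected_value(median_dist))
print("Var[mean]   =", variance(mean_dist))
print("Var[median] =", variance(median_dist))
```

Under these assumptions both statistics have a sampling distribution centered at 3.5, so both are unbiased for the population mean of a fair die (its distribution is symmetric). The sample mean has the smaller variance, 35/36 against 203/108 for the median, which is one sense in which it is the better estimator here.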

Variability among Samples
[Figure: two samples from the same population with different sample means, 23.5 mpg and 27.5 mpg.]

Point Estimates
[Figure: sample statistics as point estimates of population parameters; $\bar{x}$ estimates $\mu$ and $s$ estimates $\sigma$.]

Normal Distribution Revisited
[Figure: useful probabilities for normal distributions, with regions marked 68%, 95%, and 99%.]

Distribution for the Mean
• Confidence intervals assume that the sample means are normally distributed.

The Mean and Standard Deviation of the Sampling Distribution of $\bar{x}$
• Regardless of the shape of the population relative frequency distribution:
  – The mean of the sampling distribution of $\bar{x}$ will equal $\mu$, the mean of the sampled population: $\mu_{\bar{x}} = \mu$
  – The standard deviation of the sampling distribution of $\bar{x}$ will equal $\sigma$, the standard deviation of the sampled population, divided by the square root of the sample size $n$: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ (often referred to as the standard error of the mean)

Standard Error of the Mean
• A statistic that measures the variability of your estimate is the standard error of the mean.
• It differs from the sample standard deviation: the sample standard deviation measures the variability of the data, whereas the standard error of the mean measures the variability of sample means.
• Standard error of the mean: $s_{\bar{X}} = s / \sqrt{n}$

Example
• Let $x$ be a normally distributed random variable with a mean of 89 and a standard deviation of 12:
  – What is the probability that the mean of a sample of size n = 19 will be between 85 and 93?
  – What is the probability that the mean of a sample of size n = 40 will exceed 91?

Answer to First Part
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 12 / \sqrt{19} = 2.753$
$z = (\bar{x} - \mu_{\bar{x}}) / \sigma_{\bar{x}}$, so $z = (85 - 89) / 2.753 = -1.45$ and $z = (93 - 89) / 2.753 = 1.45$
$P(-1.45 \le z \le 1.45) = 0.4265 + 0.4265 = 0.8530$
With n = 29 instead: $P(-1.8 \le z \le 1.8) \approx 0.9266$ (table value for z = 1.79)

Answer to Second Part
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 12 / \sqrt{40} = 1.897$
$z = (91 - 89) / 1.897 = 1.05$
$P(z > 1.05) = 0.500 - 0.3531 = 0.1469$

Example
• The population of orders for printing jobs at a print shop is approximately normal with a mean of 200 pages and a standard deviation of 40 pages.
The shop is almost out of paper and it has five orders that must be finished before a shipment of paper can be expected. If the shop has 1,200 sheets of paper left, what is the probability that the five orders will not exhaust the stock of paper?
• Hint: Find $P(\bar{x} \le 240)$, since 1,200 sheets spread over five orders is an average of 240 pages per order.

Answer
$\sigma_{\bar{x}} = \sigma / \sqrt{n} = 40 / \sqrt{5} = 17.889$
$z = (240 - 200) / 17.889 = 2.236$
$P(z \le 2.236) = 0.500 + 0.4875 = 0.9875$

Example
• Let $x$ be a random variable with a mean of 1,200 and a standard deviation of 20:
  – What is the probability that the mean of a sample of size 80 will exceed 1,202?
  – What is the probability that the mean of a sample of size 50 will be less than 1,202?
  – If the probability that the mean of a sample of size n will exceed 1,201 is 0.25, what must n equal?

Answers
• Part 1: 0.1867
• Part 2: 0.7611
• Part 3: 180

Central Limit Theorem
• If a random sample of n observations is selected from a population, then when n is sufficiently large, the sampling distribution of $\bar{x}$ will be approximately a normal distribution. Typically, a sample size of $n \ge 30$ is considered large enough. The larger the sample size n, the better the normal approximation.

Central Limit Theorem, Illustrated
[Figure: illustration of the central limit theorem.]

Normality and the Central Limit Theorem
• To satisfy the assumption of normality, you can do one of the following:
  – verify that the population distribution is approximately normal
  – apply the central limit theorem
• The central limit theorem states that the distribution of sample means is approximately normal, regardless of the population distribution's shape, if the sample size is large enough.
• "Large enough" is usually approximately 30 observations. More are needed if the data are heavily skewed, and fewer if the data are symmetric.

Sampling Distribution of the Proportion
• We are often interested in making an inference about the proportion of some population, $p$.
• Examples:
  – Proportion of freshmen who graduate from Virginia Tech in four years.
  – Proportion of defective items in a lot.
  – Proportion of a set of loans that will become nonperforming.

The Sample Proportion and Standard Deviation of the Number of Successes
• The sample proportion $\hat{p}$ is the value of the random variable $X$ divided by the sample size: $\hat{p} = X / n$
• The standard deviation of the sampling distribution is: $\sigma_{\hat{p}} = \sqrt{p(1 - p) / n}$

Normal Approximation to the Sampling Distribution of the Proportion
• Rules: $np \ge 5$ and $n(1 - p) \ge 5$
• Z-value for the sampling distribution of $\hat{p}$: $Z = (\hat{p} - p) / \sigma_{\hat{p}}$

Example
• If a sample of size 100 is taken from a population of size 1000 and the population contains 300 successes (so $p = 0.30$):
  – What is the probability that the sample proportion of successes will be 0.35 or more?
  – What is the probability that the sample proportion of successes will be between 0.25 and 0.45?

Answers
• Part a:
  $\sigma_{\hat{p}} = \sqrt{p(1 - p) / n} = \sqrt{0.3(1 - 0.3) / 100} = 0.0458$
  $z = (0.35 - 0.30) / 0.0458 = 1.09$
  $P(\hat{p} \ge 0.35) = P(z \ge 1.09) = 0.5 - 0.3621 = 0.1379$
• Part b:
  $P(0.25 \le \hat{p} \le 0.45) = P(-1.09 \le z \le 3.28) = 0.3621 + 0.5 = 0.8621$

Example
• An advertising campaign for a new perfume has a goal of reaching 50% of the women in the target group. Suppose a national sample of 300 women from the target group is drawn to see how the campaign is working, and 129 women in the sample can recall seeing an ad or commercial for the new perfume. If the population proportion were 0.50, what is the probability of observing a sample proportion of 0.43 or less in a sample of 300?

Answer
$\sigma_{\hat{p}} = \sqrt{p(1 - p) / n} = \sqrt{0.5(1 - 0.5) / 300} = 0.0289$
$Z = (\hat{p} - p) / \sigma_{\hat{p}} = (0.43 - 0.5) / 0.0289 = -2.42$
$P(\hat{p} \le 0.43) = P(z \le -2.42) = 0.5 - 0.4922 = 0.0078$

From Here To Inference
• The primary function of getting a sampling distribution is to produce a statistical inference.
• Probability distributions allow us to make probability statements about values of a random variable.
Thus, knowledge of the population and its parameters allows us to use the probability distribution to make probability statements about individual members of the population.

From Here To Inference (cont.)
• With sampling distributions, knowledge of the parameters and some information about the distribution allow us to make probability statements about a sample statistic.
• In applying both probability distributions and sampling distributions, we must know the value of relevant parameters, a highly unlikely circumstance. In the real world, parameters are almost always unknown because they represent descriptive measurements about extremely large populations.
• Statistical inference addresses this problem: from now on we will assume that most population parameters are unknown.
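
Looking back, the worked examples in this chapter all apply the same two standard-error formulas, so they can be checked numerically. The following is a rough sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed z-table; the helper names are hypothetical. The slides round z to two decimals before reading the table, so the computed values should match the slide answers only to about two or three decimal places.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal; used here in place of a z-table


def se_mean(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def se_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


def prob_below(value, center, se):
    """P(statistic <= value) under the normal approximation."""
    return Z.cdf((value - center) / se)


# Sample-mean examples (slide answers in comments)
first = (prob_below(93, 89, se_mean(12, 19))
         - prob_below(85, 89, se_mean(12, 19)))          # slides: 0.8530
second = 1 - prob_below(91, 89, se_mean(12, 40))         # slides: 0.1469
print_shop = prob_below(240, 200, se_mean(40, 5))        # slides: 0.9875
part1 = 1 - prob_below(1202, 1200, se_mean(20, 80))      # slides: 0.1867
part2 = prob_below(1202, 1200, se_mean(20, 50))          # slides: 0.7611

# Part 3: solve (1201 - 1200) / (20 / sqrt(n)) = z for n, where
# P(Z > z) = 0.25. With the unrounded z this gives about 182; the
# slides' answer of 180 corresponds to the rounded table value z = 0.67.
z_upper_quarter = Z.inv_cdf(0.75)
part3_n = (z_upper_quarter * 20 / (1201 - 1200)) ** 2

# Sample-proportion examples
se_p = se_proportion(0.30, 100)
prop_a = 1 - prob_below(0.35, 0.30, se_p)                # slides: 0.1379
prop_b = (prob_below(0.45, 0.30, se_p)
          - prob_below(0.25, 0.30, se_p))                # slides: 0.8621
perfume = prob_below(0.43, 0.50, se_proportion(0.50, 300))  # slides: 0.0078

for name, value in [("first", first), ("second", second),
                    ("print_shop", print_shop), ("part1", part1),
                    ("part2", part2), ("part3_n", part3_n),
                    ("prop_a", prop_a), ("prop_b", prop_b),
                    ("perfume", perfume)]:
    print(f"{name}: {value:.4f}")
```

Each of these calculations takes the population parameters as given, which is exactly the situation the closing slides describe as unrealistic; inference replaces them with estimates from the sample.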