Sampling distribution of x̄

Objectives (BPS chapter 11)

Sampling distributions
Parameter versus statistic
The law of large numbers
What is a sampling distribution?
The sampling distribution of x̄
The central limit theorem
Statistical process control

Reminder: Parameter versus statistic

Population: the entire group of individuals in which we are interested but usually can’t assess directly. A parameter is a number describing a characteristic of the population. Parameters are usually unknown.

Sample: the part of the population we actually examine and for which we do have data. A statistic is a number describing a characteristic of a sample. We often use a statistic to estimate an unknown population parameter.

The law of large numbers

Law of large numbers: As the number of randomly drawn observations (n) in a sample increases,
the mean of the sample (x̄) gets closer and closer to the population mean µ (quantitative variable);
the sample proportion (p̂) gets closer and closer to the population proportion p (categorical variable).

What is a sampling distribution?

The sampling distribution of a statistic is the distribution of all possible values taken by the statistic when all possible samples of a fixed size n are taken from the population. It is a theoretical idea: we do not actually build it. The sampling distribution of a statistic is the probability distribution of that statistic.

Note: When sampling randomly from a given population,
the law of large numbers describes what happens when the sample size n is gradually increased;
the sampling distribution describes what happens when we take all possible random samples of a fixed size n.

Sampling distribution of x̄ (the sample mean)

We take many random samples of a given size n from a population with mean µ and standard deviation σ. Some sample means will be above the population mean µ and some will be below, making up the sampling distribution.
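The contrast drawn above, one sample that keeps growing versus many samples of a fixed size, can be imitated on a computer. The following is only a sketch under stated assumptions: the population is hypothetical (100,000 simulated values, not data from the slides), and 2,000 samples stand in for "all possible samples."

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption, not from the slides):
# 100,000 values with mean near 50 and standard deviation near 10.
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu = statistics.fmean(population)

# Law of large numbers: a single sample mean for increasing n.
running = {}
for n in (10, 100, 1_000, 10_000):
    running[n] = statistics.fmean(rng.sample(population, n))
    print(f"n = {n:>6}: sample mean = {running[n]:.3f} (population mean = {mu:.3f})")

# Sampling distribution: many sample means for one fixed n.
n = 25
xbars = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]
mean_of_xbars = statistics.fmean(xbars)
share_above = sum(x > mu for x in xbars) / len(xbars)
print(f"mean of the {len(xbars)} sample means: {mean_of_xbars:.3f}")
print(f"share of sample means above the population mean: {share_above:.2f}")
```

In a run of this sketch, the single sample mean should settle near the population mean as n grows, while the fixed-size sample means should scatter on both sides of it.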
Sampling distribution of “x bar”

[Figure: histogram of some sample averages]

For any population with mean µ and standard deviation σ:
The mean, or center, of the sampling distribution of x̄ is equal to the population mean µ.
The standard deviation of the sampling distribution is σ/√n, where n is the sample size.

[Figure: sampling distribution of x̄, centered at µ with standard deviation σ/√n]

Mean of a sampling distribution of x̄: There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the sample mean x̄ is an unbiased estimate of the population mean µ: it will be “correct on average” in many samples.

Standard deviation of a sampling distribution of x̄: The standard deviation of the sampling distribution measures how much the sample statistic x̄ varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations.

For normally distributed populations

When a variable in a population is normally distributed, the sampling distribution of x̄ for all possible samples of size n is also normally distributed. If the population is N(µ, σ), then the distribution of sample means is N(µ, σ/√n).

[Figure: population distribution and distribution of sample means]

IQ scores: population vs. sample

In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is
A) exactly normal, mean 112, standard deviation 20.
B) approximately normal, mean 112, standard deviation 20.
C) approximately normal, mean 112, standard deviation 1.414.
D) approximately normal, mean 112, standard deviation 0.1.

Answer: C) approximately normal, mean 112, standard deviation 1.414.
Population distribution: N(µ = 112; σ = 20)
Sampling distribution for n = 200: N(µ = 112; σ/√n = 20/√200 = 1.414)

Application

Hypokalemia is diagnosed when blood potassium levels are low, below 3.5 mEq/L.
Let’s assume that we know a patient whose measured potassium levels vary daily according to a normal distribution N(µ = 3.8, σ = 0.2).

If only one measurement is made, what is the probability that this patient will be misdiagnosed as hypokalemic?
z = (x − µ)/σ = (3.5 − 3.8)/0.2 = −1.5, P(z < −1.5) = 0.0668 ≈ 7%

If instead measurements are taken on four separate days and averaged, what is the probability of such a misdiagnosis?
z = (x̄ − µ)/(σ/√n) = (3.5 − 3.8)/(0.2/√4) = −3, P(z < −3) = 0.0013 ≈ 0.1%

Note: Make sure to standardize (z) using the standard deviation of the sampling distribution, σ/√n.

Practical note

Large samples are not always attainable.
Sometimes the cost, difficulty, or preciousness of what is studied drastically limits any possible sample size.
Blood samples/biopsies: no more than a handful of repetitions are acceptable. Often we even make do with just one.
Opinion polls have a limited sample size due to the time and cost of operation. During election times, though, sample sizes are increased for better accuracy.

Not all variables are normally distributed.
Income, for example, is typically strongly skewed.
Is x̄ still a good estimator of µ then?

The central limit theorem

Central limit theorem: When randomly sampling from any population with mean µ and standard deviation σ, when n is large enough, the sampling distribution of x̄ is approximately normal: N(µ, σ/√n).

[Figure panels: population with strongly skewed distribution; sampling distribution of x̄ for n = 2 observations; for n = 10 observations; for n = 25 observations]

Income distribution

Let’s consider the very large database of individual incomes from the Bureau of Labor Statistics as our population. It is strongly right-skewed.
We take 1000 SRSs of 100 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.
We also take 1000 SRSs of 25 incomes, calculate the sample mean for each, and make a histogram of these 1000 means.
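The sampling experiment just described can be imitated without the actual income database. The sketch below is only an illustration under stated assumptions: the population is a hypothetical lognormal stand-in (its parameters 10.5 and 0.8 are made up, not estimated from Bureau of Labor Statistics data), chosen only because it is strongly right-skewed. For each sample size it compares the observed spread of the 1000 sample means with σ/√n.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical right-skewed "income" population (NOT the BLS data):
# lognormal values; the parameters 10.5 and 0.8 are illustrative assumptions.
population = [rng.lognormvariate(10.5, 0.8) for _ in range(200_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

def simulate_sample_means(n, num_samples=1000):
    """Draw num_samples simple random samples of size n; return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

results = {}
for n in (25, 100):
    xbars = simulate_sample_means(n)
    observed_sd = statistics.stdev(xbars)      # spread of the 1000 sample means
    theoretical_sd = sigma / math.sqrt(n)      # sigma / sqrt(n)
    results[n] = (statistics.fmean(xbars), observed_sd, theoretical_sd)
    print(f"n = {n:>3}: mean of sample means = {results[n][0]:,.0f} "
          f"(population mean = {mu:,.0f}); "
          f"observed sd = {observed_sd:,.0f}, sigma/sqrt(n) = {theoretical_sd:,.0f}")
```

Plotting a histogram of each list of sample means (for example with a plotting library of your choice) would show the shapes; the numbers alone already indicate how the spread depends on n.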
Which histogram corresponds to the samples of size 100? Which to the samples of size 25?

How large a sample size?

It depends on the population distribution. More observations are required if the population distribution is far from normal.
A sample size of 25 is generally enough to obtain an approximately normal sampling distribution despite strong skewness or even mild outliers.
A sample size of 40 will typically be good enough to overcome extreme skewness and outliers.

In many cases, n = 25 isn’t a huge sample. Thus, even for strange population distributions we can often assume an approximately normal sampling distribution of the mean and work with it to solve problems.

Statistical process control

Industrial processes tend to have normally distributed variability, in part as a consequence of the central limit theorem applying to the sum of many small influential factors. Random samples taken over time can thus be used to verify easily that a given process is not getting out of “control.”

What is statistical control?
A variable that continues to be described by the same distribution when observed over time is said to be in statistical control, or simply in control.

Process monitoring

What are the required conditions? We measure a quantitative variable x that has a normal distribution. The process has been operating in control for a long period, so that we know the process mean µ and the process standard deviation σ that describe the distribution of x as long as the process remains in control.

An x̄ control chart displays the averages of samples of size n taken at regular intervals from such a process. It is a way to monitor the process and alert us when it has been disturbed so that it is now out of control. This is a signal to find and correct the cause of the disturbance.

x̄ control charts

For a process with known mean µ and standard deviation σ, we calculate the mean x̄ of samples of constant size n taken at regular intervals.
Plot x̄ (vertical axis) against time (horizontal axis).
Draw a horizontal center line at µ.
Draw two horizontal control limits at µ ± 3σ/√n (the upper control limit, UCL, and the lower control limit, LCL).

A machine tool cuts circular pieces. A sample of four pieces is taken hourly, giving these average measurements (in 0.0001 inches from the specified diameter). Because measurements are made from the specified diameter, we have a given target µ = 0 for the process mean. The process standard deviation is σ = 0.31. What is going on?

Sample    x̄        Sample    x̄        Sample    x̄
  1     −0.14        8      0.19       15      0.69
  2      0.09        9      0.48       16      0.47
  3      0.17       10      0.29       17      0.56
  4      0.08       11      0.48       18      0.78
  5     −0.17       12      0.55       19      0.75
  6      0.36       13      0.50       20      0.49
  7      0.30       14      0.37       21      0.79

For the x̄ chart, the center line is 0 and the control limits are ±3σ/√4 = ±0.465.

The sample means climb over time, and from sample 9 onward most of them lie above the upper control limit of 0.465. The process mean has drifted. Maybe the cutting blade is getting dull, or a screw got a bit loose.
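The control-limit arithmetic for this example can be checked with a few lines of code. This is a minimal sketch using the target mean, standard deviation, sample size, and the 21 sample means given above; the variable names are illustrative, and the only rule applied is the basic one of flagging a point outside µ ± 3σ/√n.

```python
import math

mu = 0.0      # target process mean (deviation from the specified diameter)
sigma = 0.31  # process standard deviation
n = 4         # pieces per hourly sample

# Hourly sample means, samples 1 through 21, as listed in the example.
xbars = [-0.14, 0.09, 0.17, 0.08, -0.17, 0.36, 0.30,
         0.19, 0.48, 0.29, 0.48, 0.55, 0.50, 0.37,
         0.69, 0.47, 0.56, 0.78, 0.75, 0.49, 0.79]

margin = 3 * sigma / math.sqrt(n)
ucl = mu + margin  # upper control limit
lcl = mu - margin  # lower control limit

# Sample numbers (1-based) whose mean falls outside the control limits.
out_of_control = [i for i, x in enumerate(xbars, start=1) if x > ucl or x < lcl]

print(f"center line = {mu}, LCL = {lcl:.3f}, UCL = {ucl:.3f}")
print("samples outside the limits:", out_of_control)
```

With these data the sketch flags samples 9, 11, 12, 13, and 15 through 21, all on the high side, which matches the upward drift described above.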
