IQL Chapter 8 – From Samples to Populations
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition

8.1 Sampling Distributions

Using information gathered from a small sample to convey information about the population is called inferential statistics.

LEARNING GOAL
Understand the fundamental ideas of sampling distributions and how the distribution of sample means and the distribution of sample proportions are formed. Also learn the notation used to represent sample means and proportions.

SAMPLE MEANS: THE BASIC IDEA
What is a distribution of sample means (sampling distribution of the mean)?
• A distribution of sample means (x̄) is a “distribution of a statistic [in this case a sample mean] over repeated sampling from a specified population.”
• It is based on all possible random samples of size n from a population.
• It can inform us of the degree of sample-to-sample variability we should expect due to chance.

Distribution of Statistics
As noted in earlier chapters, statistics are the measures of a sample. These measures are used to characterize the sample and to infer the corresponding measures of the population, which are termed parameters.

Parameter: A parameter is a numerical description of a population. Examples include the population mean μ and the population standard deviation σ.

Statistic: A statistic is a numerical description of a sample. Examples include the sample mean x̄ and the sample standard deviation s.

Good samples are random samples in which every member of the population is equally likely to be selected and every possible sample of a given size n is equally likely to be selected. Consider four samples selected from a population. The samples need not be mutually exclusive; they may share elements with other samples.
The sample means x̄1, x̄2, x̄3, x̄4 will include a smallest sample mean and a largest sample mean. Choosing a number of bins lets us build a histogram of the sample means. The question this chapter answers is whether the distribution of sample means from a population can take any shape or has a specific shape.

Sampling Distribution of the Mean
The distribution of the sample mean cannot take just any shape. For good random samples with a sample size of about 30 or more, the distribution of the sample mean is approximately normal. That is, if you repeatedly take random samples of 30 or more elements from a population, calculate each sample mean, and then create a relative frequency distribution for the means, the resulting distribution will be close to normal.

In the example diagram, the underlying data is bimodal and is depicted by light blue columns. Thirty data elements were sampled forty times and forty sample means were calculated. A relative frequency histogram of the sample means is plotted in a heavy black outline. Note that although the underlying distribution is bimodal, the distribution of the forty means is heaped and close to symmetrical. The distribution of the forty sample means is approximately normal.

The center of the distribution of the sample means is, theoretically, the population mean. To put this another, simpler way, the average of the sample averages is the population mean.
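A small simulation can make this concrete. The sketch below is an illustration and is not part of the source notes: it builds a hypothetical bimodal population (two clusters around the assumed values 20 and 60), draws forty random samples of thirty elements each, and compares the average of the sample means with the population mean. The function name simulate_sample_means and all numeric values are assumptions chosen for the example.

```python
import random
import statistics


def simulate_sample_means(population, sample_size=30, num_samples=40, seed=0):
    """Draw num_samples random samples (without replacement within each
    sample) and return the list of their means."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# Hypothetical bimodal population: two clusters of 500 values each.
_rng = random.Random(1)
population = (
    [_rng.gauss(20, 3) for _ in range(500)]
    + [_rng.gauss(60, 3) for _ in range(500)]
)

sample_means = simulate_sample_means(population)

population_mean = statistics.mean(population)
mean_of_means = statistics.mean(sample_means)

# Spread of the individual values versus spread of the sample means.
population_spread = statistics.pstdev(population)
sample_mean_spread = statistics.stdev(sample_means)

print("population mean:        ", round(population_mean, 2))
print("mean of sample means:   ", round(mean_of_means, 2))
print("population std. dev.:   ", round(population_spread, 2))
print("std. dev. of the means: ", round(sample_mean_spread, 2))
```

The mean of the forty sample means should land close to the population mean, and the sample means should be much less spread out than the individual values, even though the population itself has two humps.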
More precisely, the average of the sample averages approaches the population mean as the number of sample averages approaches infinity.
http://www.comfsm.fm/~dleeling/statistics/notes008.html#distributionofstatistics

SAMPLE MEANS WITH LARGER POPULATIONS
Population Mean: The true mean of a population, denoted by the Greek letter μ (mu, pronounced “mew”).

Most statistical studies deal with much larger populations, and larger samples give a more accurate representation of the population, which reduces the sampling error.

Sampling Error
The sampling error is the error introduced because a random sample is used to estimate a population parameter. It does not include other sources of error, such as those due to biased sampling, bad survey questions, or recording mistakes.

Notation for Population and Sample Means
n = sample size
μ = population mean
x̄ = sample mean

The Distribution of Sample Means
The distribution of sample means is the distribution that results when we find the means (x̄) of all possible samples of a given size.

SAMPLE PROPORTIONS
Much of what we have learned about distributions of sample means carries over to distributions of sample proportions. A proportion computed from a sample is another example of a sample statistic. In this case it is a sample proportion, and we use the symbol p̂ (read “p-hat”) to distinguish this sample proportion from the population proportion, p.

Notation for Population and Sample Proportions
n = sample size
p = population proportion
p̂ = sample proportion

The Distribution of Sample Proportions
The distribution of sample proportions is the distribution that results when we find the proportions (p̂) in all possible samples of a given size. The larger the sample size, the more closely this distribution approximates a normal distribution. In all cases, the mean of the distribution of sample proportions equals the population proportion. If only one sample is available, its sample proportion, p̂, is the best estimate for the population proportion, p.

8.2 Estimating the Population Mean

Estimating a Population Mean: The Basics
Confidence Interval: A range of values associated with a confidence level, such as 95%, that is likely to contain the true value of a population parameter.
Margin of Error: The maximum likely difference between an observed sample statistic and the true value of a population parameter. Its size depends on the desired level of confidence.

95% Confidence Interval for a Population Mean
The margin of error for the 95% confidence interval is
$E \approx \frac{2s}{\sqrt{n}}$
where s is the standard deviation of the sample and n is the sample size. We find the 95% confidence interval by adding and subtracting the margin of error from the sample mean. That is, the 95% confidence interval ranges from (x̄ − margin of error) to (x̄ + margin of error). We can write this confidence interval more formally as
$\bar{x} - E < \mu < \bar{x} + E$
or more briefly as
$\bar{x} \pm E$

EXAMPLE 1 Computing the Margin of Error
Compute the margin of error and find the 95% confidence interval for the protein intake sample of n = 267 men, which has a sample mean of x̄ = 77.0 grams and a sample standard deviation of s = 58.6 grams.
Solution: The sample size is n = 267 and the standard deviation for the sample is s = 58.6, so the margin of error is
$E \approx \frac{2s}{\sqrt{n}} = \frac{2 \times 58.6}{\sqrt{267}} \approx 7.2$ grams.
The 95% confidence interval therefore ranges from 77.0 − 7.2 = 69.8 grams to 77.0 + 7.2 = 84.2 grams, that is, 69.8 < μ < 84.2.

INTERPRETING THE CONFIDENCE INTERVAL
What are confidence intervals?
Confidence intervals provide different information from that arising from hypothesis tests. Hypothesis testing produces a decision about any observed difference: either that the difference is ‘statistically significant’ or that it is ‘statistically non-significant’. In contrast, confidence intervals provide a range about the observed effect size. This range is constructed in such a way that we know how likely it is to capture the true, but unknown, effect size.
Thus, the formal definition of a confidence interval is: ‘a range of values for a variable of interest [in our case, the measure of treatment effect] constructed so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits’. It is conventional to create confidence intervals at the 95% level, which means that 95% of the time properly constructed confidence intervals should contain the true value of the variable of interest. This corresponds to hypothesis testing with p-values, with a conventional cut-off for p of less than 0.05. More colloquially, the confidence interval provides a range for our best guess of the size of the true treatment effect that is plausible given the size of the difference actually observed.
http://www.medicine.ox.ac.uk/bandolier/painres/download/whatis/what_are_conf_inter.pdf

CHOOSING SAMPLE SIZE
In order to estimate the population mean with a specified margin of error of at most E, the size of the sample should be at least
$n = \left( \frac{2\sigma}{E} \right)^2$
where σ is the population standard deviation (often estimated by the sample standard deviation s).

8.3 Estimating Population Parameters

THE BASICS OF ESTIMATING A POPULATION PROPORTION
Population Proportion: The true proportion of some characteristic in a population, denoted by p.

Estimating a Population Proportion
Why Proportions?
There are many times when the easiest, most appropriate, or most illuminating way to frame an issue is in terms of a proportion, i.e. the ratio of a part to a whole. To pick one example from billions of possible examples, the authors of the US Constitution provided two methods for amending the Constitution. The first method is for a bill to pass both houses of the legislature, by a two-thirds majority in each.
Once the bill has passed both houses, it goes on to the states. The second method prescribed is for a Constitutional Convention to be called by two-thirds of the legislatures of the states, and for that Convention to propose one or more amendments. These amendments are then sent to the states to be approved by three-fourths of the legislatures or conventions. Stating the rules in terms of such proportions, as opposed to absolute numbers, means that the rules don’t have to be changed every time a new state comes into (or leaves) the Union.

Probabilities and Population Proportions: Two Views of the Same Thing
Consider the random experiment of picking a George Mason University undergraduate at random. One can ask for the probability that the randomly chosen student is female, or one can ask for the proportion of all George Mason University undergraduates who are female: these are two views of the same thing.
http://classweb.gmu.edu/tkeller/HANDOUTS/Handout10.pdf

95% Confidence Interval for a Population Proportion
For a population proportion, the margin of error for the 95% confidence interval is
$E \approx 2 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where p̂ is the sample proportion and n is the sample size. The 95% confidence interval ranges from (p̂ − margin of error) to (p̂ + margin of error). We can write this confidence interval more formally as
$\hat{p} - E < p < \hat{p} + E$
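
The 95% margin-of-error formulas from sections 8.2 and 8.3 can be sketched in a few lines of Python. This is a minimal illustration, not part of the source notes: the function names are made up for this example, the factor of 2 is the approximation used in the formulas above, and the proportion figures (p̂ = 0.5, n = 400) and the target margin of 5 grams are hypothetical. Only the protein intake numbers come from Example 1.

```python
import math


def margin_of_error_mean(s, n):
    """Approximate 95% margin of error for a mean: E = 2s / sqrt(n)."""
    return 2 * s / math.sqrt(n)


def margin_of_error_proportion(p_hat, n):
    """Approximate 95% margin of error for a proportion:
    E = 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    return 2 * math.sqrt(p_hat * (1 - p_hat) / n)


def min_sample_size(sigma, max_error):
    """Smallest whole n with margin of error at most max_error:
    n = (2 * sigma / max_error)^2, rounded up."""
    return math.ceil((2 * sigma / max_error) ** 2)


def confidence_interval(estimate, margin):
    """Return (lower, upper) = (estimate - margin, estimate + margin)."""
    return (estimate - margin, estimate + margin)


# Example 1 from section 8.2: n = 267, sample mean 77.0 g, s = 58.6 g.
e_mean = margin_of_error_mean(58.6, 267)
ci_mean = confidence_interval(77.0, e_mean)
print("mean margin of error:", round(e_mean, 1))
print("95% CI for the mean: ", tuple(round(v, 1) for v in ci_mean))

# Hypothetical target: estimate the same mean to within 5 grams,
# using s = 58.6 as the estimate of sigma.
print("sample size needed:  ", min_sample_size(58.6, 5))

# Hypothetical proportion example: p_hat = 0.5 from a sample of n = 400.
e_prop = margin_of_error_proportion(0.5, 400)
ci_prop = confidence_interval(0.5, e_prop)
print("proportion margin:   ", round(e_prop, 3))
print("95% CI for p:        ", tuple(round(v, 3) for v in ci_prop))
```

Rounding the sample size up rather than to the nearest integer keeps the margin of error at or below the requested value.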