IQL Chapter 8 – From Samples to Populations
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition
8.1 Sampling Distributions
Using information gathered from a sample to draw conclusions about the population it came from is called inferential statistics.
LEARNING GOAL
Understand the fundamental ideas of sampling distributions and how the distribution of sample
means and the distribution of sample proportions are formed. Also learn the notation used to
represent sample means and proportions.
SAMPLE MEANS: THE BASIC IDEA
What is a distribution of sample means (sampling distribution of the mean)?
A distribution of sample means ($\bar{x}$) is a “distribution of a statistic [in this case a sample mean] over repeated sampling from a specified population.”
It is based on all possible random samples of size $n$ from a population.
It tells us how much sample-to-sample variability we should expect due to chance.
Distribution of Statistics
As noted in earlier chapters, statistics are measures computed from a sample. They are used to characterize the sample and to infer the corresponding measures of the population, termed parameters.
Parameter
A parameter is a numerical description of a population. Examples include the population mean μ and the
population standard deviation σ.
Statistic
A statistic is a numerical description of a sample. Examples include the sample mean $\bar{x}$ and the sample standard deviation $s_x$.
Good samples are random samples, in which every member of the population is equally likely to be selected and every sample of a given size $n$ is equally likely to be selected. Consider four samples selected from a population. As shown, the samples need not be mutually exclusive; a sample may include elements of other samples.
The sample means $\bar{x}_1, \bar{x}_2, \bar{x}_3, \bar{x}_4$ range from a smallest sample mean to a largest sample mean. Dividing that range into a chosen number of bins produces a histogram of the sample means. The question this chapter answers is whether the distribution of sample means from a population can take any shape or has a specific shape.
Sampling Distribution of the Mean
The distribution of the sample mean cannot take just any shape. For good random samples of size 30 or more, the distribution of the sample mean is approximately normal. That is, if you take random samples of 30 or more elements from a population, calculate the mean of each sample, and then create a relative frequency distribution of those means, the resulting distribution will be approximately normal.
In the following diagram the underlying data is bimodal and is depicted by the light blue columns. Thirty data elements were sampled forty times and forty sample means were calculated. A relative frequency histogram of the sample means is plotted with a heavy black outline. Note that although the underlying distribution is bimodal, the distribution of the forty means is heaped and close to symmetric: it is approximately normal.
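The experiment described above can be imitated in a few lines of Python. This is a minimal sketch, not the procedure used to make the original diagram: the bimodal population (two hypothetical clusters of 500 values near 20 and 50), the random seed, and the `histogram` helper with eight bins are all assumptions chosen for illustration. A text bar chart stands in for the plotted histogram.

```python
import random
import statistics

random.seed(42)
# Hypothetical bimodal population: two clusters centred near 20 and near 50.
population = ([random.gauss(20, 3) for _ in range(500)] +
              [random.gauss(50, 3) for _ in range(500)])

# Thirty elements sampled forty times; one mean per sample.
means = [statistics.mean(random.sample(population, 30)) for _ in range(40)]

def histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1  # the maximum value goes in the last bin
    return lo, width, counts

lo, width, counts = histogram(means, bins=8)
for i, c in enumerate(counts):
    # One row per bin: bin range, then one '#' per sample mean in that bin.
    print(f"{lo + i * width:6.1f} to {lo + (i + 1) * width:6.1f} | {'#' * c}")
```

Even though no population value lies near the middle of the two clusters, the forty sample means pile up there, which is the heaped shape the diagram shows.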
The center of the distribution of the sample means is, theoretically, the population mean. Put more simply, the average of the sample averages is the population mean. More precisely, the average of the sample averages approaches the population mean as the number of sample averages approaches infinity.
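Both halves of that statement can be checked directly. In the sketch below, which assumes two made-up populations, listing every possible sample of size 2 from the tiny population [2, 4, 6, 8] gives an average of sample means exactly equal to μ, while repeated random samples of size 30 from the values 1 to 100 give an average that settles toward μ = 50.5 as the number of samples grows.

```python
import random
import statistics
from itertools import combinations

# A tiny hypothetical population, small enough to list every possible sample.
population = [2, 4, 6, 8]
mu = statistics.mean(population)  # population mean

# Exact: average the means of ALL samples of size 2 (drawn without replacement).
all_sample_means = [statistics.mean(s) for s in combinations(population, 2)]
exact_center = statistics.mean(all_sample_means)
print("population mean:", mu)
print("mean of all sample means:", exact_center)

# Approximate: a larger hypothetical population (1..100) sampled at random.
random.seed(1)
big_population = list(range(1, 101))
big_mu = statistics.mean(big_population)
running = {}
for num_samples in (10, 100, 10_000):
    means = [statistics.mean(random.sample(big_population, 30))
             for _ in range(num_samples)]
    running[num_samples] = statistics.mean(means)
    print(f"{num_samples:>6} samples: average of sample means = "
          f"{running[num_samples]:.3f} (mu = {big_mu})")
```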
http://www.comfsm.fm/~dleeling/statistics/notes008.html#distributionofstatistics
SAMPLE MEANS WITH LARGER POPULATIONS
Population Mean: The true mean of a population, denoted by the Greek letter µ (pronounced “mew”)
Most statistical studies involve populations that are too large for every member to be measured, so the population mean must be estimated from a sample. Because only a sample is used, the estimate is subject to sampling error.
Sampling Error
The sampling error is the error introduced because a random sample is used to estimate a population
parameter. It does not include other sources of error, such as those due to biased sampling, bad survey
questions, or recording mistakes.
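To make sampling error concrete, the hypothetical sketch below builds a population whose mean μ we can compute (something that is normally unknown), draws five random samples of 50, and reports x̄ − μ for each. Only chance differs between the samples, so the spread of these errors is pure sampling error; none of the other error sources listed above are modeled.

```python
import random
import statistics

random.seed(7)
# Hypothetical population of 10,000 whole numbers from 0 to 100.
population = [random.randint(0, 100) for _ in range(10_000)]
mu = statistics.mean(population)  # known here only because we built the population

errors = []
for _ in range(5):
    sample = random.sample(population, 50)   # simple random sample, no replacement
    x_bar = statistics.mean(sample)
    errors.append(x_bar - mu)                # sampling error of this sample
    print(f"x_bar = {x_bar:.2f}, sampling error = {x_bar - mu:+.2f}")
```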
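Notation for Population and Sample Means
$n$ = sample size
$\mu$ = population mean
$\bar{x}$ = sample mean
The Distribution of Sample Means
The distribution of sample means is the distribution that results when we find the means of all possible samples of a given size.
The larger the sample size, the more closely this distribution approximates a normal distribution.
In all cases, the mean of the distribution of sample means equals the population mean.
If only one sample is available, its sample mean, $\bar{x}$, is the best estimate for the population mean, $\mu$.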
SAMPLE PROPORTIONS
Much of what we have learned about distributions of sample means carries over to distributions of sample proportions. A sample proportion is another example of a sample statistic. We use the symbol $\hat{p}$ (read “p-hat”) to distinguish this sample proportion from the population proportion, $p$.
Notation for Population and Sample Proportions
$n$ = sample size
$p$ = population proportion
$\hat{p}$ = sample proportion
The Distribution of Sample Proportions
The distribution of sample proportions is the distribution that results when we find the proportions ($\hat{p}$) in all possible samples of a given size.
The larger the sample size, the more closely this distribution approximates a normal distribution.
In all cases, the mean of the distribution of sample proportions equals the population proportion.
If only one sample is available, its sample proportion, $\hat{p}$, is the best estimate for the population proportion, $p$.
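A simulation sketch of the distribution of sample proportions follows. It assumes a very large population in which each member independently has the characteristic with probability p = 0.3 (a hypothetical value), draws 10,000 samples of size 100, and reports the mean and standard deviation of the resulting p̂ values; the mean should land very close to p.

```python
import random
import statistics

random.seed(3)
p = 0.3             # hypothetical population proportion
n = 100             # sample size
num_samples = 10_000

def sample_proportion(p, n):
    """Proportion of members with the characteristic in one random sample of size n."""
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

p_hats = [sample_proportion(p, n) for _ in range(num_samples)]
center = statistics.mean(p_hats)   # center of the distribution of sample proportions
spread = statistics.stdev(p_hats)  # sample-to-sample variability of p-hat
print(f"mean of p-hats = {center:.4f} (p = {p})")
print(f"std dev of p-hats = {spread:.4f}")
```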
8.2 Estimating the Population Mean
Estimating a Population Mean: The Basics
Confidence Interval: A range of values associated with a confidence level, such as 95%, that is likely to contain the true value of a population parameter.
Margin of Error: The maximum likely difference between an observed sample statistic and the true
value of a population parameter. Its size depends on the desired level of confidence.
95% Confidence Interval for a Population Mean
The margin of error for the 95% confidence interval is
$$E = \frac{2s}{\sqrt{n}}$$
where $s$ is the standard deviation of the sample and $n$ is the sample size. We find the 95% confidence interval by adding and subtracting the margin of error from the sample mean. That is, the 95% confidence interval ranges from ($\bar{x}$ - margin of error) to ($\bar{x}$ + margin of error).
We can write this confidence interval more formally as
$$\bar{x} - E < \mu < \bar{x} + E$$
or more briefly as
$$\bar{x} \pm E$$
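The recipe above translates directly into a short function. This is a minimal sketch using Python's `statistics` module; the name `ci_95_mean` and the five data values are hypothetical, and it follows this chapter's rule of thumb E = 2s/√n rather than a t-based interval.

```python
import math
import statistics

def ci_95_mean(data):
    """95% confidence interval for a population mean using E = 2s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    E = 2 * s / math.sqrt(n)        # margin of error
    return x_bar - E, x_bar + E, E

# Hypothetical measurements, for illustration only.
low, high, E = ci_95_mean([4, 6, 8, 10, 12])
print(f"E = {E:.3f}; {low:.3f} < mu < {high:.3f}")
```

For these values the sample mean is 8 and s = √10, so E works out to 2√2, about 2.83.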
EXAMPLE 1 Computing the Margin of Error
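Compute the margin of error and find the 95% confidence interval for the protein intake sample of $n = 267$ men, which has a sample mean of $\bar{x} = 77.0$ grams and a sample standard deviation of $s = 58.6$ grams.
Solution: The sample size is $n = 267$ and the standard deviation for the sample is $s = 58.6$, so the margin of error is
$$E = \frac{2s}{\sqrt{n}} = \frac{2 \times 58.6}{\sqrt{267}} \approx 7.2 \text{ grams}$$
The 95% confidence interval is therefore $77.0 - 7.2 < \mu < 77.0 + 7.2$, or $69.8 \text{ g} < \mu < 84.2 \text{ g}$.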
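The arithmetic in Example 1 can be checked with a few lines of Python, using only the summary statistics given in the example.

```python
import math

# Summary statistics from Example 1 (protein intake of n = 267 men, in grams).
n, x_bar, s = 267, 77.0, 58.6

E = 2 * s / math.sqrt(n)        # margin of error, E = 2s / sqrt(n)
low, high = x_bar - E, x_bar + E
print(f"E = {E:.1f} g; 95% CI: {low:.1f} g < mu < {high:.1f} g")
```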
INTERPRETING THE CONFIDENCE INTERVAL
What are confidence intervals?
Confidence intervals provide different information from that arising from hypothesis tests. Hypothesis
testing produces a decision about any observed difference: either that the difference is ‘statistically
significant’ or that it is ‘statistically non-significant’. In contrast, confidence intervals provide a range
about the observed effect size. This range is constructed in such a way that we know how likely it is to
capture the true – but unknown – effect size. Thus, the formal definition of a confidence interval is: ‘a
range of values for a variable of interest [in our case, the measure of treatment effect] constructed so
that this range has a specified probability of including the true value of the variable.
The specified probability is called the confidence level, and the end points of the confidence interval are
called the confidence limits’.[9] It is conventional to create confidence intervals at the 95% level – so this
means that 95% of the time properly constructed confidence intervals should contain the true value of
the variable of interest. This corresponds to hypothesis testing with p-values, with a conventional cut-off
for p of less than 0.05. More colloquially, the confidence interval provides a range for our best guess of
the size of the true treatment effect that is plausible given the size of the difference actually observed.
http://www.medicine.ox.ac.uk/bandolier/painres/download/whatis/what_are_conf_inter.pdf
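The “95% of the time” interpretation can be illustrated by simulation. In this sketch the population is assumed to be normal with hypothetical μ = 50 and σ = 10; we draw 2,000 samples of size 50, build the interval x̄ ± 2s/√n for each, and count how often it contains μ. The proportion should come out close to 95% (the factor 2 is a rounded version of 1.96).

```python
import math
import random
import statistics

random.seed(11)
mu, sigma = 50.0, 10.0   # hypothetical population parameters (known only because we simulate)
n, trials = 50, 2000

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    E = 2 * statistics.stdev(sample) / math.sqrt(n)   # margin of error for this sample
    if x_bar - E < mu < x_bar + E:                    # did this interval capture mu?
        hits += 1

coverage = hits / trials
print(f"{coverage:.1%} of intervals contained mu")
```

Any single interval either contains μ or it does not; the 95% describes the long-run success rate of the procedure.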
CHOOSING SAMPLE SIZE
In order to estimate the population mean with a specified margin of error of at most $E$, the size of the sample should be at least
$$n = \left(\frac{2\sigma}{E}\right)^2$$
where $\sigma$ is the population standard deviation (often estimated by the sample standard deviation $s$). Because $n$ must be a whole number, round the result up to the next whole number.
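The sample-size rule can be written as a one-line function. This sketch assumes a hypothetical target margin of error E = 5 grams and borrows s = 58.6 grams from Example 1 as a stand-in for σ; `math.ceil` implements the rounding up, and the name `min_sample_size` is illustrative.

```python
import math

def min_sample_size(sigma, E):
    """Smallest whole n with 2*sigma/sqrt(n) <= E, i.e. n >= (2*sigma/E)**2 rounded up."""
    return math.ceil((2 * sigma / E) ** 2)

print(min_sample_size(58.6, 5))   # assumed sigma = 58.6 g, target E = 5 g
print(min_sample_size(10, 2))     # a case where the formula gives a whole number
```

With these assumed values the rule asks for 550 men; the second call shows a case where (2σ/E)² is exactly 100, so no rounding is needed.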
IQL Chapter 8: From Samples to Populations
Page 6
8.3 Estimating Population Parameters
THE BASICS OF ESTIMATING A POPULATION PROPORTION
Population Proportion: The true proportion of some characteristic in a population, denoted by p.
Estimating a Population Proportion
Why Proportions?
There are many times when the easiest, most appropriate, or most illuminating way to frame an issue is in terms of a proportion, i.e. the ratio of a part to a whole. To pick one random example from billions of possible examples, the authors of the US Constitution provided two methods for amending the Constitution. The first method is for a bill to pass both houses of Congress by a two-thirds majority in each. Once the bill has passed both houses, it goes on to the states. The second method prescribed is for a Constitutional Convention to be called by two-thirds of the legislatures of the States, and for that Convention to propose one or more amendments. These amendments are then sent to the states to be approved by three-fourths of the legislatures or conventions. Stating the rules in terms of such proportions, as opposed to absolute numbers, means that the rules don’t have to be changed every time a new state comes into (or leaves) the Union!
Probabilities and Population Proportions: Two Views of the Same Thing
Consider the random experiment of picking a George Mason University undergraduate at random. One
can ask for the probability that the randomly chosen student is female, or one can ask for the proportion
of all George Mason University undergraduates who are female: these are two views of the same thing.
http://classweb.gmu.edu/tkeller/HANDOUTS/Handout10.pdf
95% Confidence Interval for a Population Proportion
For a population proportion, the margin of error for the 95% confidence interval is
$$E = 2\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
where $\hat{p}$ is the sample proportion and $n$ is the sample size.
The 95% confidence interval ranges from ($\hat{p}$ - margin of error) to ($\hat{p}$ + margin of error).
We can write this confidence interval more formally as
$$\hat{p} - E < p < \hat{p} + E$$
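A matching sketch for proportions follows. The function name `ci_95_proportion` and the survey counts (520 “yes” answers out of 1,000) are hypothetical; the margin of error uses the formula E = 2√(p̂(1 − p̂)/n) given above.

```python
import math

def ci_95_proportion(successes, n):
    """95% CI for a population proportion using E = 2*sqrt(p_hat*(1 - p_hat)/n)."""
    p_hat = successes / n                         # sample proportion
    E = 2 * math.sqrt(p_hat * (1 - p_hat) / n)    # margin of error
    return p_hat, p_hat - E, p_hat + E, E

# Hypothetical survey: 520 "yes" answers out of 1,000 respondents.
p_hat, low, high, E = ci_95_proportion(520, 1000)
print(f"p_hat = {p_hat:.3f}, E = {E:.3f}; {low:.3f} < p < {high:.3f}")
```

With these counts the margin of error is about 0.032, so the interval runs from roughly 0.488 to 0.552.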