Download Estimating the population mean µ using the sample mean X

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Degrees of freedom (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Confidence interval wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Resampling (statistics) wikipedia , lookup

German tank problem wikipedia , lookup

Gibbs sampling wikipedia , lookup

Misuse of statistics wikipedia , lookup

Taylor's law wikipedia , lookup

Sampling (statistics) wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
STAT1010 – Sampling distributions x-bar
8.1 Sampling distributions
!  Distribution
of the sample mean X
(We will discuss now)
of the sample proportion p̂
(We will discuss later)
!  Distribution
1
Estimating the population mean
µ using the sample mean X
!  Recall,
we often want to make a statement
about the population based on a random
sample taken from a population of interest.
Population
Sample
2
!  We
say we want to infer a general conclusion
about the population based on the sample.
!  This
is called inferential statistics.
Population
Sample
3
1
STAT1010 – Sampling distributions x-bar
!  But
won’t my conclusion about the population
depend on the specific sample chosen?
(sample-to-sample variability leads to sampling variability).
!  Yes,
but if we’ve chosen a sample
appropriately (randomly, for example), we can
STILL make a statement about the population,
with a certain Margin Of Error (MOE).
Population
Sample
4
Population Parameter
Sample Statistic
Population mean
µ
Sample mean
The mean house value for
all houses in Iowa
The mean house value for
a sample of n=200 houses
in Iowa
Population proportion
Sample proportion
The proportion of all houses
in Iowa with lead paint.
The proportion of Iowa
houses in a sample of
n=200 with lead paint.
p
Unknown, but estimated from
X
p̂
Calculated from sample
5
Sample-to-sample variability
! 
The sampling error is the error introduced
because a random sample is used to estimate a
population parameter.
! 
We saw sample-to-sample variability when we
explored the on-line applet called ‘Sampling
distribution of X ’ in the CLT notes.
! 
Sampling error does not include other sources of
error, such as those due to biased sampling, bad
survey questions, or recording mistakes.
6
2
STAT1010 – Sampling distributions x-bar
Example: sample-to-sample
variability
! 
Let’s say we truly do know the information for all
individuals in a specific population (not usually
the case), just to show what we mean by the
phrase ‘sampling error’.
! 
Every student in a population of 400 students
was asked how many hours they spend per
week using a search engine on the Internet.
7
!  We
actually know µ in this case because
we have a census, and µ=3.88
All 400 values. This is the full population.
!  We’ll
8
take a sample of n=32 students.
Sample 1
1.1
3.8
1.7
7.8
5.7
2.1
6.8
6.5
1.2
4.9
2.7
0.3
3.0
2.6
0.9
6.5
1.4
2.4
5.2
7.1
2.5
2.2
5.5
7.8
5.1
3.1
3.4
5.0
4.7
6.8
7.0
6.5
The mean of this sample is xx̄ = 4.17; we use the
standard notation xx̄ to denote this mean.
We say that xx̄ is a sample statistic because it comes
from a sample of the entire population. Thus, xx̄ is
called a sample mean.
9
3
STAT1010 – Sampling distributions x-bar
!  We’ll
take another sample of n=32 students.
Sample 2
1.8
5.2
0.5
0.4
5.7
3.9
4.0
6.5
3.1
2.4
1.2
5.8
0.8
5.4
2.9
6.2
5.7
7.2
0.8
7.2
0.9
6.6
5.1
4.0
5.7
3.2
7.9
3.1
2.5
5.0
3.6
3.1
The mean of this sample is xx̄ = 3.98.
Now you have two sample means that don’t agree with
each other, and neither one agrees with the true
population mean.
x̄x1 = 4.17
x̄x2 = 3.98
µ = 3.88
10
!  As
we saw in the Central Limit Theorem
notes, the distribution of sample means X
is normally distributed.
This is the histogram that
results from 100 different
samples, each with 32
students.
This histograms essentially
shows a sampling
distribution of sample
means.
The mean is very
close to µ=3.88
11
The Distribution of Sample Means
!  The
distribution of sample means is the
distribution that results when we find the
means of all possible samples of a given
size n.
!  Technically, this distribution is approximately
normal, and the larger the sample size, the
closer to normal it is.
12
4
STAT1010 – Sampling distributions x-bar
The Distribution of Sample Means
13
The Distribution of Sample Means
!  As
we saw earlier…
" The
mean of the distribution of sample means
is equal to the population mean.
µx = µ
" The
standard deviation of the distribution of
sample means depends on the population
standard deviation and the sample size.
σx =
σ
n
14
The search-engine time example:
For a sample of size n=32,
X ~ N(µ x = 3.88, σ x =
2.4
)
32
We can use this distribution to compute
probabilities regarding values of X , which is
the average time spent on a search-engine
for a sample of size n=32.
15
5
STAT1010 – Sampling distributions x-bar
Exercise 1: Sampling farms
!  Texas
has roughly 225,000 farms. The
actual mean farm size is µ = 582 acres
and the standard deviation is σ = 150
acres.
" A)
For random samples of n = 100 farms, find
the mean and standard deviation of the
distribution of sample means.
16
Exercise 1: Sampling farms
" B)
What is the probability of selecting a
random sample of 100 farms with a mean
greater than 600 acres?
17
8.2 Estimating Population Means
!  We
use the sample mean X as our estimate
of the population mean µ.
!  We
should report some kind of ‘confidence’
about our estimate. Do we think it’s pretty
accurate? Or not so accurate.
!  What
sample size n do we need for a given
level of confidence about our estimate.
" Larger
n coincides with better estimate.
18
6
STAT1010 – Sampling distributions x-bar
Example: Mean heart rate in
young adults
!  We
wish to make a statement about the
mean heart rate in all young adults. We
randomly sample 25 young adults and record
each person’s heart rate.
" Population:
" Sample:
all young adults
the 25 young adults chosen for the
study
19
!  Parameter
of interest:
" Population
!  Sample
mean heart rate µ
Unknown, but
can be estimated
statistic:
" Sample
mean heart rate X
Can be computed
from sample data
Random sample of n = 25 young adults.
Heart rate (beats per minute)
70, 74, 75, 78, 74, 64, 70, 78, 81, 73
82, 75, 71, 79, 73, 79, 85, 79, 71, 65
70, 69, 76, 77, 66
X = 74.16
beats per minute
20
!  We
know that X won’t exactly equal µ, but
maybe we can provide an interval around
our observed X such that we’re 95%
confident that the interval contains µ.
!  Something
(
71
like [ X - cushion, X + cushion]
72
73
74 75 76
)
77
x
21
7
STAT1010 – Sampling distributions x-bar
!  We
could report an interval like
(72.0, 76.3)
and say we’re 95% sure the true
population mean µ lies in this interval.
!  How
do we choose an appropriate
‘cushion’? (or margin of error (MOE))
!  How
do we decide how ‘likely’ it is that the
population mean µ falls into this interval?
22
95% Confidence Interval (CI) for
a Population Mean µ
!  The
interval we have been describing is
called a confidence interval.
!  There
a specific formula for computing the
margin of error (MOE) in a CI and it is
based on the fact that X is normally
distributed.
23
!  When
we make a confidence interval, we’re
not 100% sure that it contains the unknown
value of the parameter of interest, i.e. µ,
X
M
but the methods we use to construct the interval will allow us to place a confidence level
of parameter containment with our interval.
24
8
STAT1010 – Sampling distributions x-bar
95% Confidence Interval (CI) for
a Population Mean µ
!  The
margin of error (MOE) for the 95% CI
for µis
2s
MOE = E ≈
n
where s is the standard deviation of the
sample (see slide 29), which is the estimate
for the population standard deviation σ.
25
95% Confidence Interval (CI) for
a Population Mean µ
!  We
find the 95% confidence interval by
adding and subtracting the MOE from the
sample mean X . That is, the 95%
confidence interval ranges
from ( X – margin of error) to ( X + margin of error).
26
95% Confidence Interval (CI) for
a Population Mean µ
!  We
can write this confidence interval more
formally as
X−E <µ < X+E
Or more briefly as
X±E
27
9
STAT1010 – Sampling distributions x-bar
95% Confidence Interval (CI) for
a Population Mean µ
The 95% CI extends a distance
equal to the margin of error on
either side of the sample mean.
28
Example: Mean heart rate in
young adults
!  Summary
of data:
!  n = 25
! 
X = 74.16 beats
!  s = 5.375 beats
(∑ ( x − x ) )
2
Recall:
s=
i
n −1
29
Example: Mean heart rate in
young adults
!  Calculating
the 95% CI for population mean
heart rate:
MOE = E ≈
2s 2(5.375)
=
= 2.15
n
25
and the 95% CI is:
74.16 − 2.15 < µ < 74.16 + 2.15
or
(72.01, 76.31)
30
10
STAT1010 – Sampling distributions x-bar
Interpretation of the 95%
Confidence Interval (CI) for a
Population Mean µ
!  We
are 95% confident that this interval
contains the true parameter value µ.
" Note
that a 95% CI always contains X . In fact,
it’s right at the center of every 95% CI.
" I
might’ve missed the µ with this interval, but at
least I’ve set it up so that’s not very likely.
31
Interpretation of the 95%
Confidence Interval (CI) for a
Population Mean µ
!  If
I was to repeat this process 100 times
(i.e. take a new sample, compute the CI, do
again, etc.), then on average, 95 of those
confidence intervals I created will contain µ.
" See
applet linked at our website:
http://statweb.calpoly.edu/chance/applets/
ConfSim/ConfSim.html
32
11