Introduction to Sampling Distributions

Biostatistics
Unit 5
Samples
1
Sampling distributions
• Sampling distributions are important
in the understanding of statistical
inference.
• Probability distributions permit us to
answer questions about sampling and
they provide the foundation for
statistical inference procedures.
2
Definition
• The sampling distribution of a
statistic is the distribution of all possible
values of the statistic, computed from
samples of the same size randomly
drawn from the same population.
• When sampling a discrete, finite
population, a sampling distribution can
be constructed.
• Note that this construction is difficult with
a large population and impossible with
an infinite population.
3
Construction of sampling distributions
1. From a population of size N,
randomly draw all possible samples of
size n.
2. Compute the statistic of interest for
each sample.
3. Create a frequency distribution of the
statistic.
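The three construction steps above can be sketched in Python. This is only an illustration, not part of the original slides: the population values, the sample size, and the choice to draw without replacement are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical population of size N = 4 (illustrative values only).
population = [2, 4, 6, 8]
n = 2  # sample size

# Step 1: draw all possible samples of size n (assumed: without replacement).
samples = list(combinations(population, n))

# Step 2: compute the statistic of interest (here, the mean) for each sample.
sample_means = [mean(sample) for sample in samples]

# Step 3: create a frequency distribution of the statistic.
frequency = Counter(sample_means)

for value in sorted(frequency):
    print(value, frequency[value])
```

Swapping `combinations` for `itertools.product(population, repeat=n)` would enumerate samples drawn with replacement instead.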
4
Properties of sampling distributions
We are interested in the
• mean,
• standard deviation, and
• appearance of the graph (functional
form) of a sampling distribution.
5
Types of sampling distributions
We will study the following types of
sampling distributions.
A) Distribution of the sample mean
B) Distribution of the difference
between two means
C) Distribution of the sample
proportion
D) Distribution of the difference
between two proportions
6
(A) Sampling distribution of $\bar{x}$
Consider a finite population with mean $\mu$
and variance $\sigma^2$. When sampling from
a normally distributed population, it can
be shown that the distribution of the
sample mean $\bar{x}$ will have the following
properties.
7
Properties of the sampling distribution
1. The distribution of $\bar{x}$ will be normal.
2. The mean, $\mu_{\bar{x}}$, of the distribution of the values of $\bar{x}$
will be the same as the mean of the population
from which the samples were drawn;
$\mu_{\bar{x}} = \mu$.
3. The variance, $\sigma^2_{\bar{x}}$, of the distribution of $\bar{x}$
will be equal to the variance of the population
divided by the sample size;
$\sigma^2_{\bar{x}} = \sigma^2 / n$.
8
Standard error
The square root of the variance of the
sampling distribution, $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, is called the
standard error of the mean, which is
also called simply the standard error.
9
Nonnormally distributed populations
When the sampling is done from a
nonnormally distributed population, the
central limit theorem is used.
10
The central limit theorem
Given a population of any nonnormal
functional form with mean $\mu$ and
variance $\sigma^2$, the sampling distribution
of $\bar{x}$, computed from samples of size n
from this population, will have mean $\mu$
and variance $\sigma^2 / n$, and will be
approximately normally distributed when
the sample size is large (30 or more).
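A small simulation can make the theorem concrete. The sketch below is not from the slides: it assumes an exponential population with rate 1 (so the mean and standard deviation are both 1), and it draws a large number of random samples rather than enumerating all possible ones. The seed and the number of samples are arbitrary choices.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # sample size
num_samples = 20_000     # number of simulated samples (arbitrary)

# Assumed nonnormal population: exponential with rate 1, so mu = sigma = 1.
mu, sigma = 1.0, 1.0

# Mean of each simulated sample of size n.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = sum(sample_means) / num_samples
sd_of_means = pstdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3), "vs mu =", mu)
print("SD of sample means:  ", round(sd_of_means, 3),
      "vs sigma / sqrt(n) =", round(sigma / sqrt(n), 3))
```

The simulated mean and standard deviation should land close to the theoretical values, and a histogram of `sample_means` would look roughly bell-shaped even though the population is strongly skewed.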
11
The central limit theorem
Note that the standard deviation of the
sampling distribution is used in
calculations of z scores and is equal to:
$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
12
Sampling distribution of the mean
and
Central Limit Theorem
We do this in class together
13
Data
• A small apartment
building has 3
apartments.
• How many people live
in each apartment?
Apartment    People
A
B
C
14
Find $\mu$ and $\sigma$
• Use the TI to obtain the values for the population.
The values are:
$\mu$ =
$\sigma$ =
15
Form samples of size 2
• We need to form all
samples of size 2,
sampling with replacement
since the population
is very small.
• Then we find the
sample mean for
each sample of 2
apartments.
Samples    Sample mean
A, A
A, B
A, C
B, A
B, B
B, C
C, A
C, B
C, C
16
Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$
• Use the TI to obtain the values for the means of
the samples.
The values are:
$\mu_{\bar{x}}$ =
$\sigma_{\bar{x}}$ =
17
Results
Mean of Sample means
• Mean of population
equals mean of the
sample means
$\mu_{\bar{x}} = \mu$
18
Results of Standard deviation of
the sample means
• The standard deviation of
the sample means equals the
population standard
deviation divided by
the square root of the
sample size
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
19
Distribution of the sample means
• If the population is normally distributed, then the
sample means will be normally distributed.
• If the population is not normally distributed, then
the sample means will be approximately normally distributed if
the sample size is at least 30.
20
Important Consequence
• If we take samples of size n from some
population, under the previous conditions, then
we can determine the probability of the sample
means fulfilling some condition. We use:
x m
z=
s/ n
21
Example #1
• The heights of kindergarten children are
approximately normally distributed with a mean of
39 inches and a standard deviation of 2 inches. If one child is
randomly selected, what is the probability that the
child is taller than 41 inches?
• This is 1 child – Not the Central Limit Theorem!
22
Example #2
• Suppose we have a class of 30 kindergarten
children. What is the probability that the mean
height of these children exceeds 41 inches?
• This is the Central Limit Theorem as it is asking
about the probability of a sample mean!
23
Conclusion
• It is not unusual for one child, selected at random
from a kindergarten class, to be taller than 41
inches.
• It is highly unlikely that the mean height for 30
kindergarten students exceeds 41 inches.
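The two conclusions can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist` in place of a printed z table; the variable names are illustrative, and the numbers come from Examples #1 and #2.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 39, 2               # population mean and SD of heights, in inches
standard_normal = NormalDist()  # mean 0, SD 1

# Example #1: one child, so the population SD is used directly.
z_one = (41 - mu) / sigma
p_one = 1 - standard_normal.cdf(z_one)

# Example #2: mean of n = 30 children, so the standard error sigma / sqrt(n) is used.
n = 30
z_mean = (41 - mu) / (sigma / sqrt(n))
p_mean = 1 - standard_normal.cdf(z_mean)

print(f"one child:  z = {z_one:.2f}, P = {p_one:.4f}")
print(f"mean of 30: z = {z_mean:.2f}, P = {p_mean:.2e}")
```

The only difference between the two calculations is the denominator of the z score. Dividing by the standard error rather than by the population standard deviation is what makes a sample mean above 41 inches so much less likely than a single height above 41 inches.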
24
An analogy
• It would not be unusual for a student to get an A
on a statistics test.
• It would be unusual if the class average for a
statistics class was an A!
25
Demonstration that Central Limit Theorem
Really Works (1)
We start with a dwelling that has 3
apartments. Here is the list of
occupancies.
Apt A = 3
Apt B = 4
Apt C = 2
This is the entire population. It is
entered into a list on the TI-83.
26
Demonstration that Central Limit Theorem
Really Works (2)
We calculate 1-Var Stats to obtain the
population parameters for this
population.
Mean:
$\mu = 3$
Standard Deviation:
$\sigma = 0.8164965809$
Note: we do not use the sample standard deviation, s = 1,
because this is the entire population, not a sample.
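For readers without a TI calculator, the same population parameters can be reproduced with Python's `statistics` module. This is a sketch using the occupancies listed in the demonstration; `pstdev` divides by N (population) and `stdev` divides by N - 1 (sample), which is the distinction the note above draws.

```python
from statistics import mean, pstdev, stdev

# Occupancy of each apartment (the entire population).
occupancy = {"A": 3, "B": 4, "C": 2}
values = list(occupancy.values())

mu = mean(values)        # population mean
sigma = pstdev(values)   # population SD: divides by N
s = stdev(values)        # sample SD: divides by N - 1 (not appropriate here)

print("mu =", mu)
print("sigma =", sigma)
print("s =", s)
```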
27
Demonstration that Central Limit Theorem
Really Works (3)
• Knowing the population parameters
$\mu$ and $\sigma$, we now determine them using
a sampling distribution.
• We can find the population parameters
because it is a very small population.
• Normally, populations are too large to
determine $\mu$ and $\sigma$ directly from the
population.
28
Demonstration that Central Limit
Theorem Really Works (4)
• We need to form all
samples of size 2,
sampling with replacement
since the population
is very small.
• Then we find the
sample mean for
each sample of 2
apartments.
Samples    Sample mean
A, A       3.0
A, B       3.5
A, C       2.5
B, A       3.5
B, B       4.0
B, C       3.0
C, A       2.5
C, B       3.0
C, C       2.0
29
Demonstration that Central Limit Theorem
Really Works (5)
We calculate 1-Var Stats to obtain the
population parameters for the sampling
distribution.
Mean:
$\mu_{\bar{x}} = 3$
Standard Deviation:
$\sigma_{\bar{x}} = 0.5773502692$
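The whole demonstration can be replayed in a few lines of Python. The sketch below enumerates the nine ordered samples of size 2 drawn with replacement and checks that the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size. Variable names are illustrative.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

occupancy = {"A": 3, "B": 4, "C": 2}
n = 2  # sample size

# All ordered samples of size 2, drawn with replacement (3**2 = 9 samples).
samples = list(product(occupancy, repeat=n))
sample_means = [mean(occupancy[apt] for apt in sample) for sample in samples]

mu_xbar = mean(sample_means)        # mean of the sample means
sigma_xbar = pstdev(sample_means)   # SD of the sample means
sigma = pstdev(list(occupancy.values()))

print("mean of sample means:", mu_xbar)
print("SD of sample means:  ", sigma_xbar)
print("sigma / sqrt(n):     ", sigma / sqrt(n))
```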
30
Demonstration that Central Limit Theorem
Really Works (6)
31
Example
Given the information below, what is the
probability that $\bar{x}$ is greater than 53?
(1) Write the given information.
$\mu = 50$
$\sigma = 16$
$n = 64$
$\bar{x} = 53$
32
Example
(2) Sketch a normal curve.
33
Example
(3) Convert $\bar{x}$ to a z score.
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{53 - 50}{16 / \sqrt{64}} = 1.5$
34
Example
(4) Find the appropriate value(s) in the table.
A value of z = 1.5 gives an area of .9332.
This is subtracted from 1 to give
the probability P (z > 1.5) = .0668
35
Example
(5) Complete the answer.
The probability that $\bar{x}$ is greater than 53 is
.0668.
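As a cross-check on the table lookup, the same five-step calculation can be sketched in Python, with `statistics.NormalDist` standing in for the z table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, x_bar = 50, 16, 64, 53

standard_error = sigma / sqrt(n)     # 16 / 8 = 2
z = (x_bar - mu) / standard_error    # z score of the sample mean
p = 1 - NormalDist().cdf(z)          # upper-tail area P(Z > z)

print(f"z = {z}, P = {p:.4f}")
```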
36
(B) Distribution of the difference between two means
• It often becomes important to compare
two population means.
• Knowledge of the sampling distribution
of the difference between two means is
useful in studies of this type.
• It is generally assumed that the two
populations are normally distributed.
37
Sampling distribution of $\bar{x}_1 - \bar{x}_2$
Plotting mean sample differences
against frequency gives a normal
distribution with mean equal to
$\mu_1 - \mu_2$,
which is the difference between the two
population means.
38
Variance
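The variance of the distribution of the sample
differences is equal to
$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
Therefore, the standard error of the differences
between two means would be equal to
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$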
39
Converting to a z score
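To convert to the standard normal distribution, we
use the formula
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
We find the z score by assuming that there is no
difference between the population means, so that $\mu_1 - \mu_2 = 0$.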
40
Sampling from normal populations
This procedure is valid even when the population variances are
different or when the sample sizes are
different. Suppose we are given two normally distributed populations
with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and
$\sigma_2^2$, respectively.
(continued)
41
Sampling from normal populations
The sampling distribution of the difference, $\bar{x}_1 - \bar{x}_2$,
between the means of independent samples of
size n1 and n2 drawn from these populations is
normally distributed with mean
$\mu_1 - \mu_2$ and variance
$\sigma_1^2 / n_1 + \sigma_2^2 / n_2$.
42
Example
In a study of annual family expenditures for general
health care, two populations were surveyed with the
following results:
Population 1: n1 = 40, $\bar{x}_1 = \$346$
Population 2: n2 = 35, $\bar{x}_2 = \$300$
43
Example
If the variances of the populations are
$\sigma_1^2 = 2800$ and
$\sigma_2^2 = 3250$, what is the probability of
obtaining sample results ($\bar{x}_1 - \bar{x}_2$)
as large as those
shown if there is no difference in the means of the
two populations?
44
Solution
(1) Write the given information
n1 = 40, $\bar{x}_1 = \$346$, $\sigma_1^2 = 2800$
n2 = 35, $\bar{x}_2 = \$300$, $\sigma_2^2 = 3250$
45
Solution
(2) Sketch a normal curve
46
Solution
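(3) Find the z score
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}} = \dfrac{(346 - 300) - 0}{\sqrt{2800/40 + 3250/35}} \approx 3.6$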
47
Solution
(4) Find the appropriate value(s) in the table
A value of z = 3.6 gives an area of .9998. This is
subtracted from 1 to give the probability
P (z > 3.6) = .0002
48
Solution
(5) Complete the answer
The probability that $\bar{x}_1 - \bar{x}_2$ is as large as given is
.0002.
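The health-care expenditure example can be verified with a short Python sketch. It takes the hypothesized difference between the population means to be zero, as the example does, and uses `statistics.NormalDist` in place of the table. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, mean1, var1 = 40, 346, 2800   # sample size, sample mean ($), population variance
n2, mean2, var2 = 35, 300, 3250

hypothesized_difference = 0       # no difference between population means

# Standard error of the difference between two sample means.
standard_error = sqrt(var1 / n1 + var2 / n2)

z = ((mean1 - mean2) - hypothesized_difference) / standard_error
p = 1 - NormalDist().cdf(z)       # upper-tail area P(Z > z)

print(f"z = {z:.2f}, P = {p:.4f}")
```

Rounded to four decimal places, the computed probability agrees with the table-based value of .0002.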
49
(C) Distribution of the sample proportion ($\hat{p}$)
While statistics such as the sample mean are
derived from measured variables, the sample
proportion is derived from counts or frequency data.
50
Properties of the sample proportion
Construction of the sampling distribution of the
sample proportion is done in a manner similar to that
of the mean and the difference between two
means. When the sample size is large, the
distribution of the sample proportion is approximately
normally distributed because of the central limit
theorem.
51
Mean and variance
The mean of the distribution, $\mu_{\hat{p}}$, will be equal to
the true population proportion, p, and the variance of
the distribution, $\sigma^2_{\hat{p}}$, will be equal to p(1-p)/n.
52
The z-score
The z-score for the sample proportion is
$z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$
53
Example
In the mid seventies, according to a report by the
National Center for Health Statistics, 19.4 percent of
the adult U.S. male population was obese. What is
the probability that in a simple random sample of
size 150 from this population fewer than 15 percent
will be obese?
54
Solution
(1) Write the given information
n = 150
p = .194
Find $P(\hat{p} < .15)$
55
Solution
(2) Sketch a normal curve
56
Solution
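(3) Find the z score
$z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} = \dfrac{.15 - .194}{\sqrt{(.194)(.806)/150}} \approx -1.36$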
57
Solution
(4) Find the appropriate value(s) in the table
A value of z = -1.36 gives an area of .0869 which is
the probability
P (z < -1.36) = .0869
58
Solution
(5) Complete the answer
The probability that $\hat{p}$ < .15 is .0869.
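The obesity example can be checked the same way. In the sketch below, the z score is rounded to two decimals before the lookup to mimic reading a printed table; using the unrounded z gives a slightly different probability. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 150        # sample size
p = 0.194      # population proportion
p_hat = 0.15   # sample proportion of interest

# Standard error of the sample proportion.
standard_error = sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_error

# Lower-tail area, using z rounded to two decimals as a table would.
z_table = round(z, 2)
prob_table = NormalDist().cdf(z_table)

# Lower-tail area with the unrounded z, for comparison.
prob_exact = NormalDist().cdf(z)

print(f"z = {z:.4f} (table value {z_table})")
print(f"P from rounded z = {prob_table:.4f}, from unrounded z = {prob_exact:.4f}")
```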
59
(D) Distribution of the difference
between two proportions
This is for situations with two population
proportions. We assess the probability associated
with a difference in proportions computed from
samples drawn from each of these populations. The
appropriate distribution is the distribution of the
difference between two sample proportions.
60
Sampling distribution of $\hat{p}_1 - \hat{p}_2$
The sampling distribution of the difference between
two sample proportions is constructed in a manner
similar to the difference between two means.
(continued)
61
Sampling distribution of $\hat{p}_1 - \hat{p}_2$
Independent random samples of size n1 and n2 are
drawn from two populations of dichotomous
variables where the proportions of observations with
the character of interest in the two populations are p1
and p2 , respectively.
62
Mean and variance
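The distribution of the difference between two
sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately
normal.
The mean is $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$.
The variance is $\sigma^2_{\hat{p}_1 - \hat{p}_2} = \dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}$.
These are true when n1 and n2 are large.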
63
The z score
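The z score for the difference between two
proportions is given by the formula
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$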
64
Example
In a certain area of a large city it is hypothesized that 40
percent of the houses are in a dilapidated condition. A
random sample of 75 houses from this section and 90
houses from another section yielded a difference,
$\hat{p}_1 - \hat{p}_2$,
of .09. If there is no difference between the two areas in
the proportion of dilapidated houses, what is the
probability of observing a difference this large or larger?
65
Solution
(1) Write the given information
n1 = 75, p1 = .40
n2 = 90, p2 = .40
$\hat{p}_1 - \hat{p}_2 = .09$
Find $P(\hat{p}_1 - \hat{p}_2 \geq .09)$
66
Solution
(2) Sketch a normal curve
67
Solution
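(3) Find the z score
$z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} = \dfrac{.09 - 0}{\sqrt{(.40)(.60)/75 + (.40)(.60)/90}} \approx 1.17$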
68
Solution
(4) Find the appropriate value(s) in the table
A value of z = 1.17 gives an area of .8790 which is
subtracted from 1 to give the probability
P (z > 1.17) = .121
69
Solution
(5) Complete the answer
The probability of observing $\hat{p}_1 - \hat{p}_2$
of .09 or greater is .121.
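A sketch of the same calculation in Python follows, with illustrative variable names. The unrounded z score is about 1.175, so the computed probability is about .120. That differs slightly from the .121 obtained above by looking up z = 1.17 in a table.

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 75, 0.40            # first section: sample size, hypothesized proportion
n2, p2 = 90, 0.40            # second section
observed_difference = 0.09   # difference between the two sample proportions

# Standard error of the difference between two sample proportions.
standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = (observed_difference - (p1 - p2)) / standard_error
p = 1 - NormalDist().cdf(z)  # upper-tail area P(Z > z)

print(f"z = {z:.3f}, P = {p:.4f}")
```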
70
fin
71