The Statistical Imagination
Chapter 7. Using Probability Theory to Produce Sampling Distributions
Estimating the Parameters of a Population
• Point estimate – a statistic provided without
indicating a range of error
• Point estimates are limited because a
calculation made for sample data is only an
estimate of a population parameter. This is
apparent when different results are found with
repeated sampling
Repeated Sampling
• Repeated sampling refers to the procedure of
drawing a sample and computing its statistic,
and then drawing a second sample, a third, a
fourth, and so on
• Repeated sampling reveals the nature of
sampling error
• An illustration of repeated sampling is
presented in Figure 7-1 in the text
Symbols
• Sample statistics are usually noted with English
letters
• Population parameters are usually noted with
Greek letters
What Repeated Sampling Reveals
1. A given sample’s statistic will be slightly off from the
true value of its population’s parameter due to
sampling error
2. Sampling error is patterned, systematic and
predictable
3. Sampling variability is mathematically predictable
from probability curves called sampling distributions
4. The larger the sample size, the smaller the range of
error
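The four points above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is hypothetical (10,000 simulated scores), and the names `population` and `repeated_sample_means` are illustrative, not from the text.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 scores with mean near 50 and SD near 10
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # the population parameter

def repeated_sample_means(pop, n, draws):
    """Draw `draws` random samples of size n and return each sample mean."""
    return [statistics.mean(random.sample(pop, n)) for _ in range(draws)]

small = repeated_sample_means(population, n=25, draws=500)
large = repeated_sample_means(population, n=400, draws=500)

# Range of error: how widely the sample means scatter around the parameter
range_small = max(small) - min(small)
range_large = max(large) - min(large)

print(f"population mean: {mu:.2f}")
print(f"n = 25:  sample means span {range_small:.2f}")
print(f"n = 400: sample means span {range_large:.2f}")
```

Each sample mean misses the population mean by a little, and the misses are much narrower for the larger sample size.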
A Sampling Distribution
• A mathematical description of all possible
sampling event outcomes and the probability of
each one
• Sampling distributions are obtained from
repeated sampling
• Sampling distributions are probability curves;
they tell us the probability of occurrence of any
sample outcome
A Sampling Distribution of Means
• A sampling distribution of means describes all
possible sampling event outcomes and the
probability of each outcome when means are
repeatedly calculated on an infinite number of
samples
• It answers the question: What would happen if
we repeatedly sampled a population using a
sample size of n, calculated each sample mean,
and plotted it on a histogram?
Features of a Sampling Distribution of Means
• A sampling distribution of means is illustrated
in the text in Figure 7-3. It reveals that for an
interval/ratio variable, means calculated from
repeated samples of a population take similar
values, which cluster around the value of the
population parameter
• A sampling distribution of means will be
approximately normal with a mean equal to the
actual population mean
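A rough way to see these features is to plot simulated sample means as a text histogram. The sketch below assumes a hypothetical population of ages and a sample size of 50; it is not a reproduction of Figure 7-3.

```python
import random
import statistics
from collections import Counter

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical interval/ratio variable: 5,000 ages between 18 and 90
population = [random.randint(18, 90) for _ in range(5_000)]
mu = statistics.mean(population)

# Repeatedly sample with n = 50 and record each sample mean
means = [statistics.mean(random.sample(population, 50)) for _ in range(2_000)]

# Crude text histogram: round each mean to the nearest whole number
counts = Counter(round(m) for m in means)
for value in sorted(counts):
    print(f"{value:3d} {'#' * (counts[value] // 10)}")

mean_of_means = statistics.mean(means)
print(f"population mean {mu:.2f}, mean of sample means {mean_of_means:.2f}")
```

Although the raw ages are spread evenly from 18 to 90, the sample means pile up in a bell shape centered near the population mean.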
The Standard Error
• The standard error is the standard deviation of a
sampling distribution
• It measures the spread of sampling error that
occurs when a population is sampled repeatedly
• Rather than repeatedly sample, we estimate
standard errors using the sample standard
deviation of a single sample
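As a sketch of the last point, the usual single-sample estimate (the sample standard deviation divided by the square root of n) can be compared with the spread of means from actual repeated sampling. The population and all variable names here are hypothetical.

```python
import math
import random
import statistics

random.seed(11)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20,000 scores with mean near 100 and SD near 15
population = [random.gauss(100, 15) for _ in range(20_000)]
n = 100

# One sample: estimate the standard error as s / sqrt(n)
sample = random.sample(population, n)
s = statistics.stdev(sample)  # sample standard deviation
estimated_se = s / math.sqrt(n)

# Repeated sampling: the standard deviation of many sample means
means = [statistics.mean(random.sample(population, n)) for _ in range(1_000)]
empirical_se = statistics.stdev(means)

print(f"estimated from one sample: {estimated_se:.2f}")
print(f"from 1,000 repeated samples: {empirical_se:.2f}")
```

The two figures should land close together, which is why a single sample is enough in practice.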
The Law of Large Numbers
• The law of large numbers states that the larger
the sample size, the smaller the standard error
of the sampling distribution
• The relationship between sample size and
sampling error is apparent in the formula for
the standard error of the mean (the standard
deviation divided by the square root of n); a
large n in the denominator produces a small
quotient
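A short worked example of that formula, holding the sample standard deviation fixed at a hypothetical value of 10 while n grows. The function name `standard_error_of_mean` is illustrative.

```python
import math

def standard_error_of_mean(s, n):
    """Estimated standard error of the mean: s divided by the square root of n."""
    return s / math.sqrt(n)

s = 10.0  # hypothetical sample standard deviation, held fixed
for n in (25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error_of_mean(s, n):.2f}")
# n =  25  standard error = 2.00
# n = 100  standard error = 1.00
# n = 400  standard error = 0.50
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.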
The Central Limit Theorem
• The central limit theorem states that regardless
of the shape of the raw score distribution of an
interval/ratio variable, the sampling distribution
of means will be approximately normal in
shape
• This is illustrated in the text in Figure 7-10
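The theorem can be sketched by starting from a clearly non-normal raw score distribution. The example assumes a hypothetical right-skewed (exponential) population and uses skewness as a rough gauge of shape, where a value near 0 indicates a symmetric curve. It is not a reproduction of Figure 7-10.

```python
import random
import statistics

random.seed(5)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Average cubed standardized score; near 0 for a symmetric distribution."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed raw scores (exponential, mean about 10)
raw = [random.expovariate(1 / 10) for _ in range(20_000)]

# Sampling distribution of means for samples of size 60
means = [statistics.mean(random.sample(raw, 60)) for _ in range(2_000)]

raw_skew = skewness(raw)
means_skew = skewness(means)
print(f"skewness of raw scores:   {raw_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The raw scores are strongly skewed, while the sample means are far closer to symmetric.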
The Student’s t Sampling Distribution
• The sampling distribution curve used with
especially small samples and/or when the
standard error is estimated is called Student’s t
• The t-distribution is an approximately normal
distribution
The t-distribution of Means
• For a sampling distribution of means, when the
sample size is below 120, the probability curve
begins to flatten into a t-distribution
• See Figure 7-7 in the text
Features of the t-distribution
• Standardized scores for the t-distribution are
called t-scores and are computed just as
Z-scores are
• The t-distribution, like the Z-distribution of the
normal curve, allows us to calculate
probabilities
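A minimal sketch of a t-score for a sample mean, computed the same way as a Z-score: the distance from the population mean divided by the standard error. The numbers and the function name `t_score` are hypothetical, and the standard error is assumed to be estimated as s divided by the square root of n.

```python
import math

def t_score(sample_mean, population_mean, s, n):
    """Distance of a sample mean from the population mean, in standard errors."""
    standard_error = s / math.sqrt(n)  # estimated from the sample
    return (sample_mean - population_mean) / standard_error

# Hypothetical values: sample mean 52, population mean 50, s = 6, n = 36
t = t_score(sample_mean=52.0, population_mean=50.0, s=6.0, n=36)
print(f"t = {t:.2f}")  # t = 2.00
```

Here the standard error is 6 / 6 = 1, so a sample mean 2 points above the population mean has a t-score of 2.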
The t-distribution Table
• The t-distribution table (Appendix B, Statistical
Table C) is organized differently from the
normal curve table
• See Table 7-1 in the text
• The t-distribution table provides t-scores for
only the critical probabilities of .05, .01, and
.001; that is, this table provides
“critical t-values”
Degrees of Freedom
• Degrees of freedom (df) are the number of
opportunities in sampling to compensate for
limitations, distortions, and potential
weaknesses in statistical procedures
• Use of the t-distribution table requires the
calculation of degrees of freedom
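A sketch of how a critical t-value might be looked up once degrees of freedom are known. Several assumptions apply: df is taken as n - 1, the usual rule for a single sample mean, which these slides do not state; the small table is an excerpt of standard two-tailed critical values at the .05 level, not a copy of the text's Statistical Table C; and when df falls between rows the lookup falls back to the next lower row.

```python
# Excerpt of standard two-tailed critical t-values at the .05 level (assumed layout)
CRITICAL_T_05_TWO_TAILED = {
    5: 2.571,
    10: 2.228,
    20: 2.086,
    30: 2.042,
    60: 2.000,
    120: 1.980,
}

def degrees_of_freedom(n):
    """df for a single sample mean (assumption: df = n - 1)."""
    return n - 1

def critical_t(df):
    """Return the critical value from the largest tabled row not exceeding df."""
    usable = [row for row in CRITICAL_T_05_TWO_TAILED if row <= df]
    if not usable:
        raise ValueError("df is below the smallest row in this excerpt")
    return CRITICAL_T_05_TWO_TAILED[max(usable)]

for n in (11, 31, 121):
    df = degrees_of_freedom(n)
    print(f"n = {n:3d}  df = {df:3d}  critical t = {critical_t(df):.3f}")
```

The critical value shrinks toward the normal curve's 1.96 as df grows, which reflects the smaller allowance needed for large samples.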
More on Degrees of Freedom
• From repeated sampling we know that any single
sample is only an estimate. An estimate can be
distorted by limitations of the statistical procedures
used to obtain it
• E.g., the mean is influenced by outliers
• The larger the sample, the greater the opportunity for an
outlier to be neutralized in such a way as to not distort a
sample mean
• A small sample is especially vulnerable to outliers
Sampling Distributions for Nominal Variables
• A sampling distribution of proportions is
approximately normal and the t-distribution is
used to obtain critical values
• The larger the sample size, the smaller the
range of error
• The standard error is estimated using the
probabilities of success and failure in a sample
Features of a Sampling Distribution for Nominal Variables
• The mean of a sampling distribution of
proportions is equal to the probability of
success in the population
• A sampling distribution of proportions will be
approximately normal when the smaller of the
two parameters (the probability of success or
the probability of failure in the population)
multiplied by the sample size is greater than
or equal to 5
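The rule above, together with the standard error of a proportion, can be sketched as follows. The function names are illustrative, the proportions are hypothetical, and the standard error is assumed to take its usual form: the square root of p times q divided by n, where q = 1 - p.

```python
import math

def standard_error_of_proportion(p, n):
    """Square root of (p * q / n), where q = 1 - p is the probability of failure."""
    q = 1 - p
    return math.sqrt(p * q / n)

def is_approximately_normal(p, n):
    """Rule from the slides: (smaller of p and q) * n must be at least 5."""
    return min(p, 1 - p) * n >= 5

# Hypothetical proportions of success, each with a sample of 100
for p in (0.40, 0.02):
    se = standard_error_of_proportion(p, 100)
    ok = is_approximately_normal(p, 100)
    print(f"p = {p:.2f}  standard error = {se:.4f}  approximately normal: {ok}")
```

With p = .40 the smaller parameter times n is 40, so the rule is met. With p = .02 it is only 2, so a normal approximation would not be trusted at this sample size.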