ST 311- Introduction to Statistics
Instructor: Judith Canner
Learning Objectives E
Spring 2010
SAMPLING DISTRIBUTIONS
Sampling Distribution of the (sample) mean (Central Limit Theorem)
Let μ = mean for the population of interest
Let σ = standard deviation for the population of interest
Let x̄ = mean for the sample (sample mean)
If numerous random samples of the same size n are taken and the n observations in each sample
are independent, the distribution of the possible values of x̄ is approximately normal, with
Mean = μ
Standard deviation = $\mathrm{sd}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
Sample Size (n)
1. Small (n < 30): use the normal curve approximation only if the population is bell-shaped.
2. Large (n ≥ 30): the population does not have to be bell-shaped (it can have any shape) to use the normal curve approximation for sample means.
3. Extreme outliers: may need a larger sample size (much larger than 30).
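A quick simulation can make the statement above concrete. The sketch below is a minimal illustration and is not part of the original course materials. It assumes a hypothetical right-skewed population (exponential with μ = 1 and σ = 1), draws many samples of size n = 40, and compares the mean and standard deviation of the resulting sample means with μ and σ/√n. The function name `simulate_sample_means` and the seed are illustrative choices.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=311):
    """Return `reps` sample means, each from a sample of size n drawn from a
    hypothetical right-skewed population: exponential with mean 1 and sd 1."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 40
means = simulate_sample_means(n, reps=5000)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theory_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n) with sigma = 1

print(f"mean of sample means: {mean_of_means:.3f} (population mean mu = 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {theory_sd:.3f})")
```

Rerunning with a small n (say 5) should leave the distribution of `means` noticeably right-skewed, which is why a bell-shaped population is needed when n is small.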
Examples:
1. Recall the activity on NFL player weights. As n increased, what happened to the range of the
distribution of sample means?
Law of Large Numbers:
2. What is the sampling distribution for the sample mean of 200 NFL players’ weights?
3. Consider the ST 311 data on the number of cigarettes smoked a day. The distribution is
skewed right with mean 0.586 cigarettes and standard deviation 2.26.
a. Suppose I select a sample of 15 people from ST 311. Can you describe the sampling
distribution of the mean number of cigarettes smoked a day by 15 ST 311 students?
Why or why not? If so, what is the sampling distribution?
b. Now suppose I select a sample of 200 ST 311 students and determine the mean
number of cigarettes smoked a day. Can you describe the sampling distribution of the
sample mean of 200 ST 311 students? Why or why not? If so, what is the sampling
distribution?
4. Now consider the cost of textbooks. The distribution of the cost of textbooks is bell-shaped
with mean $348 and standard deviation $143.7.
a. Suppose I select a sample of 15 people from ST 311. Can you describe the sampling
distribution of the mean cost of textbooks for 15 ST 311 students? Why or why not?
If so, what is the sampling distribution?
b. Now suppose I select a sample of 200 ST 311 students and determine the mean cost
of textbooks for them. Can you describe the sampling distribution of the sample
mean of 200 ST 311 students’ textbook costs? Why or why not? If so, what is the
sampling distribution?
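The arithmetic behind Examples 3 and 4 can be sketched as follows. This is a hedged illustration: it applies this handout's rule of thumb (n ≥ 30, or a bell-shaped population when n is small) and the formula sd(x̄) = σ/√n to the numbers given above. The helper name `sd_of_sample_mean` is hypothetical.

```python
import math

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# label: (population mean, population sd, sample size, population bell-shaped?)
cases = {
    "cigarettes, n=15":  (0.586, 2.26, 15, False),
    "cigarettes, n=200": (0.586, 2.26, 200, False),
    "textbooks, n=15":   (348.0, 143.7, 15, True),
    "textbooks, n=200":  (348.0, 143.7, 200, True),
}

results = {}
for label, (mu, sigma, n, bell_shaped) in cases.items():
    # Rule of thumb from this handout: the normal approximation is reasonable
    # if the population is bell-shaped, or if n is at least 30.
    normal_ok = bell_shaped or n >= 30
    results[label] = (normal_ok, mu, sd_of_sample_mean(sigma, n))
    if normal_ok:
        print(f"{label}: approx. normal, mean {mu}, sd {results[label][2]:.4f}")
    else:
        print(f"{label}: normal curve approximation not justified")
```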
Standardized Z-score for the Sample Mean
Recall the z-score for an individual randomly drawn from a normally distributed population.
z = ____________
Now instead of an individual, we consider a sample mean. Recall that the standard deviation of
the sample mean is the population standard deviation divided by the square root of the sample
size, i.e., $\mathrm{sd}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$.
The z-score is
z = ____________ = ____________________________________
Example 1: Coca-Cola uses a filling machine to fill 12 oz cans. Each can is supposed to contain 355
milliliters of soda. In fact, the amount varies according to a normal distribution with mean 355.2
ml and standard deviation 0.5 ml.
1. What is the probability that an individual can contains less than 355 ml?
2. What is the probability that the mean content of a 6-pack of cans is less than 355 ml?
3. I got a six-pack of Coke from the store the other day. When I measured the amount of
soda in the six cans, the mean amount was 353 ml. What is the probability that a six-pack has a
mean amount of soda of 353 ml or less? What does this lead you to believe about the
filling machines? What about the original distribution for our population?
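Below is a minimal sketch of the calculations for Example 1, assuming the normal model stated in the example. The helper `normal_cdf` is written here rather than taken from a library; it uses `math.erfc` so that very small lower-tail probabilities, such as the chance of a six-pack mean at or below 353 ml, do not round to zero.

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z); erfc keeps tiny lower-tail values accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

mu, sigma = 355.2, 0.5          # fill amount in ml, from Example 1
sd_six = sigma / math.sqrt(6)   # sd of the mean of a 6-pack

p_can = normal_cdf((355 - mu) / sigma)    # Q1: one can below 355 ml
p_pack = normal_cdf((355 - mu) / sd_six)  # Q2: 6-pack mean below 355 ml
p_353 = normal_cdf((353 - mu) / sd_six)   # Q3: 6-pack mean at or below 353 ml

print(f"P(one can < 355 ml)      = {p_can:.4f}")
print(f"P(6-pack mean < 355 ml)  = {p_pack:.4f}")
print(f"P(6-pack mean <= 353 ml) = {p_353:.2e}")
```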
Example 2: McDonald’s claims that the average time required to fill an order is 2.5 minutes and
that the standard deviation is 15 seconds. Wendy’s wants to prove McDonald’s wrong. Over a
1-week period, Wendy’s timed a sample of 40 orders at McDonald’s and obtained a sample
mean of 160 seconds.
1. We are not told the distribution of time required to fill an order at McDonald’s. Can we
still use the normal curve as an approximation for the sampling distribution of the mean
time? Why or why not?
2. Do you think the restaurant’s 2.5-minute claim is true?
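One way to check the claim numerically is sketched below. It assumes McDonald's stated mean and standard deviation are true and relies on n = 40 ≥ 30 to justify the normal approximation for x̄. The helper `upper_tail` is a hypothetical name for P(Z ≥ z).

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu = 2.5 * 60    # claimed mean, converted to seconds (150 s)
sigma = 15.0     # claimed standard deviation, seconds
n, xbar = 40, 160.0

sd_xbar = sigma / math.sqrt(n)
z = (xbar - mu) / sd_xbar
p = upper_tail(z)
print(f"sd(x-bar) = {sd_xbar:.3f} s, z = {z:.2f}, P(x-bar >= 160) = {p:.1e}")
```

A probability this small suggests that a sample mean of 160 seconds would be very unusual if the 2.5-minute claim were true.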
Example 3: In baseball, a “no-hitter” is a regulation 9-inning game in which the pitcher yields
no hits to the opposing batters. Chance (Summer 1994) reported on a study of no-hitters in Major
League Baseball. The initial analysis focused on the total number of hits yielded per game per
team for all 9-inning games played between 1989 and 1993. The distribution of hits per 9 innings is
approximately normal with mean 8.72 and standard deviation 1.10.
1. What percentage of 9-inning games results in fewer than 5 hits?
2. What is the probability that the average number of hits for ten 9-inning games is less than
5 hits?
3. Demonstrate statistically, why a no-hitter is considered an extremely rare occurrence.
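Below is a sketch of the three calculations, assuming the normal model given in Example 3. Because hits are counts, the normal curve is only an approximation. Far in the tail (0 hits is about 7.9 standard deviations below the mean), the result should be read as "extremely unlikely" rather than as an exact no-hitter rate.

```python
import math

def normal_cdf(z):
    """Standard normal CDF Phi(z), computed with erfc for accurate lower tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

mu, sigma = 8.72, 1.10   # hits per 9-inning game, from Example 3

p_game = normal_cdf((5 - mu) / sigma)                   # Q1: one game, fewer than 5 hits
p_ten = normal_cdf((5 - mu) / (sigma / math.sqrt(10)))  # Q2: mean of 10 games below 5
p_zero = normal_cdf((0 - mu) / sigma)                   # Q3: at most 0 hits in one game

print(f"P(X < 5)          = {p_game:.5f}")
print(f"P(mean of 10 < 5) = {p_ten:.1e}")
print(f"P(X <= 0)         = {p_zero:.1e}")
```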
Standard Error of the Mean
In practice the population standard deviation σ is rarely known, but the sample standard
deviation s is known. We can use the standard error to estimate the theoretical standard deviation of
the sampling distribution of the sample mean,
$\mathrm{sd}(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$
Standard Error:
Measures roughly how much, on average, the sample mean x̄ is in error as an estimate of the
population mean μ.
$\mathrm{se}(\bar{x}) = \dfrac{s}{\sqrt{n}}$
where s is the standard deviation of the sample of size n.
Example: Consider the NFL player weights. The standard deviation for the population is 45.68
lbs with mean 245.25 lbs.
1. I select a sample of 50 players and the standard deviation of the sample is 50.36 lbs with
mean 247.27 lbs. What is the standard deviation of the sampling distribution of the
sample mean? What is the standard error of the sample mean?
2. I select another sample of 50 players. The standard deviation of the sample is 46.82 lbs
with mean 238 lbs. What is the standard deviation of the sampling distribution of the
sample mean? What is the standard error of the sample mean?
3. Compare the standard deviations of the sample mean for samples 1 and 2. Are they the same?
Different?
4. Now compare the standard errors of the sample mean. Are they the same? Different?
Why?
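The distinction between the standard deviation and the standard error in questions 1 to 4 can be checked numerically. This minimal sketch uses only the numbers given in the NFL example. sd(x̄) uses the population σ, so it is the same for both samples. se(x̄) uses each sample's own s, so it changes from sample to sample.

```python
import math

sigma, n = 45.68, 50                                 # population sd of weights (lbs), sample size
sample_sds = {"sample 1": 50.36, "sample 2": 46.82}  # s from each sample, lbs

sd_xbar = sigma / math.sqrt(n)                                   # depends only on sigma and n
se = {name: s / math.sqrt(n) for name, s in sample_sds.items()}  # uses each sample's s

print(f"sd(x-bar) for both samples: {sd_xbar:.2f} lbs")
for name, value in se.items():
    print(f"se(x-bar), {name}: {value:.2f} lbs")
```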
Generalizations about Sampling Distributions
1. As long as certain conditions are met, the sampling distribution is approximately normal
a. _________________________________
b. _________________________________
2. The mean of the sampling distribution is the population parameter that corresponds to the
sample statistic
Example:__________________________________________
3. The standard deviation of the sampling distribution measures how the values of the
sample statistics might vary across different samples from the same population
a. Sample size: As the sample size gets __________________, the variability among
possible values of the statistic from different samples gets ___________________
4. Central Limit Theorem: if n is sufficiently large, the sample means of random samples
from a population with mean μ and finite standard deviation σ are approximately
normally distributed with mean μ and standard deviation $\sigma/\sqrt{n}$.
http://courses.ncsu.edu/st311/common/basic.html
Population Distribution (μ, σ)
Sampling Distribution (n, x̄, sd(x̄), se(x̄))
Sampling Distribution of Sample Means
Example: As reported by the U.S. National Center for Health Statistics, the mean high-density lipoprotein (HDL) cholesterol of females 20-29 years old is 53. If HDL cholesterol is normally
distributed with standard deviation 13.4…
1. What is the probability that a randomly selected female 20-29 years old will have an HDL
cholesterol level above 60?
2. What is the probability that a random sample of 20 females 20-29 years old will have a mean cholesterol level
above 60?
3. What might you conclude if a random sample of 20 females 20-29 years old had a mean
cholesterol level above 60?
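Below is a minimal sketch of questions 1 and 2, assuming the normal model stated in the example (mean 53, standard deviation 13.4). The helper name `upper_tail` is illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 53.0, 13.4   # HDL cholesterol, females 20-29 (from the example)
n = 20

p_one = upper_tail((60 - mu) / sigma)                    # Q1: one woman above 60
p_mean = upper_tail((60 - mu) / (sigma / math.sqrt(n)))  # Q2: mean of 20 above 60
print(f"P(X > 60)     = {p_one:.4f}")
print(f"P(x-bar > 60) = {p_mean:.4f}")
```

The contrast between the two results (roughly 30% versus about 1%) is the point of question 3. An individual above 60 is common, but a sample mean of 20 women above 60 would be surprising if the stated population values were correct.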
SAMPLING DISTRIBUTIONS
Sample Proportions
What other statistics can we make inferences about using the normal curve approximation for
the sampling distribution? So far, we have focused on statistics for quantitative variables (mean,
standard deviation, etc.) and have neglected categorical variables. We can use the foundations
we have built for quantitative variables to make inferences about categorical variables.
In Class Activity: Coin Toss
1. Get into groups of two; assign one person as the “flipper” and the other as the recorder.
2. The flipper will flip the coin 50 times while the recorder records the number of heads and tails
that occur during the 50 flips.
3. Once finished, write the number of heads on the post-it note and write your names on the back.
Go up to the chalkboard and place the post-it in the appropriate bin.
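The activity can also be imitated in code. The sketch below is a hypothetical stand-in for the chalkboard, not real class data. It assumes a fair coin, simulates 10,000 groups of 50 flips, tallies the head counts into bins of width 5 (an arbitrary choice), and compares the spread of the sample proportions with √(p(1 − p)/n).

```python
import math
import random
import statistics
from collections import Counter

def simulate_class(groups, flips=50, p=0.5, seed=311):
    """Hypothetical stand-in for the activity: each group flips a coin
    `flips` times and reports its number of heads."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(flips)) for _ in range(groups)]

heads = simulate_class(groups=10000)
p_hats = [h / 50 for h in heads]

# Tally head counts into bins of width 5, like post-its on the chalkboard
bins = Counter(5 * (h // 5) for h in heads)
for start in sorted(bins):
    print(f"{start:2d}-{start + 4:2d} heads: {bins[start]}")

print(f"mean p-hat  = {statistics.fmean(p_hats):.3f}")
print(f"sd of p-hat = {statistics.stdev(p_hats):.4f} "
      f"(sqrt(p(1-p)/n) = {math.sqrt(0.25 / 50):.4f})")
```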
Sampling Distribution for Sample Proportions
Let p = population proportion of interest or binomial probability of success
Let p̂ = corresponding sample proportion or proportion of successes
If numerous random samples or repetitions of the same size n are taken, the distribution of the
possible values of p̂ is approximately a normal curve distribution with
Mean = p
Standard deviation = $\mathrm{s.d.}(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}}$
Sample size:
Need a sample size large enough to expect at least ten of each response (success/failure), i.e.,
1. np ≥ 10
2. n(1-p) ≥ 10
Examples: In the coin toss example, we know that the population proportion of heads
should be p = _______. What is the sampling distribution for the sample proportion of heads in the 50
coin tosses?
What is the sampling distribution for the proportion of heads in 100 flips?
Compare the standard deviations for 50 and 100 flips. If I were to keep flipping the coin forever,
what would you expect the sample proportion of heads to be? What about the standard deviation?
What would the sampling distribution be for the proportion of tails in 100 flips?
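Below is a sketch of the sampling-distribution questions above, assuming a fair coin (p = 0.5). The function name `phat_sampling_distribution` is hypothetical. It returns the mean and standard deviation of p̂ together with a check of the conditions np ≥ 10 and n(1 − p) ≥ 10.

```python
import math

def phat_sampling_distribution(p, n):
    """Return (mean, sd) of the sampling distribution of p-hat and whether
    the conditions np >= 10 and n(1-p) >= 10 hold."""
    conditions_met = n * p >= 10 and n * (1 - p) >= 10
    return p, math.sqrt(p * (1 - p) / n), conditions_met

p_heads = 0.5   # assumes a fair coin
for n in (50, 100):
    mean, sd, ok = phat_sampling_distribution(p_heads, n)
    print(f"n={n}: mean {mean}, sd {sd:.4f}, conditions met: {ok}")

# Tails: p becomes 1 - p, and p(1-p) is unchanged, so the sd matches heads
print(phat_sampling_distribution(1 - p_heads, 100))
```

Doubling n from 50 to 100 shrinks the standard deviation by a factor of √2, not 2.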
Standardized Score for Sample Proportions
Similar to the sample mean, if the correct conditions are met, we can use the normal distribution
to describe the sampling distribution of the sample proportion. Therefore, we can also make
inferences about the sample proportion in the same way we make inferences about the sample
mean, by using the z-score to calculate probabilities associated with different sample
proportions.
$$z = \frac{\text{sample proportion} - \text{population proportion}}{\text{standard deviation of the sample proportion}}$$
$$z = \frac{\hat{p} - p}{\mathrm{s.d.}(\hat{p})} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
Examples: What is the probability that we flip 40 or more heads in 50 coin tosses?
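Below is a minimal sketch of the chance of 40 or more heads in 50 tosses, assuming a fair coin and using the z-score for p̂ without a continuity correction. For comparison only, it also computes the exact binomial tail with `math.comb`. That comparison is an addition for illustration and is not part of the handout's method.

```python
import math

n, p = 50, 0.5   # 50 tosses of a coin assumed fair
p_hat = 40 / n

sd = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / sd
normal_approx = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z)

# Exact binomial tail for comparison: P(X >= 40) with X ~ Binomial(50, 0.5)
exact = sum(math.comb(n, k) for k in range(40, n + 1)) / 2 ** n

print(f"z = {z:.2f}")
print(f"normal approximation: {normal_approx:.2e}")
print(f"exact binomial:       {exact:.2e}")
```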
Let’s consider one of your samples. What is the probability that someone flipped _____ heads in
50 coin tosses?
What would be the probability that someone flipped the same number, _____, of tails?
Standard Error for Sample Proportions
Just as with the sample means, we generally do not know the actual population proportion, p, so
we use the standard error to estimate the standard deviation of the sample proportion, p̂ .
$\mathrm{s.e.}(\hat{p}) = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
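Below is a minimal sketch of the standard error for a sample proportion. The result of 28 heads in 50 flips is a hypothetical group outcome chosen for illustration, not data from the class. The function name `standard_error` is also illustrative.

```python
import math

def standard_error(p_hat, n):
    """Estimated sd of p-hat when p is unknown: sqrt(p_hat (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical group result, not data from the class: 28 heads in 50 flips
heads, n = 28, 50
p_hat = heads / n
print(f"p-hat = {p_hat:.2f}, s.e.(p-hat) = {standard_error(p_hat, n):.4f}")
print(f"s.d.(p-hat) if p = 0.5: {math.sqrt(0.5 * 0.5 / n):.4f}")
```

When p̂ is close to the true p, the standard error is close to the true standard deviation, which is why it works as a stand-in when p is unknown.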