Confidence Intervals and Sampling Distributions
Review Sheet with Answers
1. Write a paragraph discussing the topic of sampling variability. Make sure to address the
following questions. Will random samples always give the same sample means and sample
percentages (proportions)? Will random sample means and random sample percentages
always be the same as the population values? How difficult is it to estimate a population value
from 1 random sample? What does this tell us about the accuracy of most population claims in
the news and online?
2. Define the following terms
a) Sampling Distribution
b) Standard Error
c) Margin of Error
d) 95% confident
e) Confidence Interval
3. We spent a lot of time thinking about and working with sampling distributions. What is a
sampling distribution and why is it better than just looking at a single random data set? What
can the shape, center and spread of a sampling distribution tell us? What is the difference
between standard error and standard deviation? How can we calculate standard error from the
sampling distribution?
4. State the Central Limit Theorem and discuss its many implications.
5. For each of the following, identify the sample value and the margin of error. Then calculate
the confidence interval and write a sentence explaining the interval to someone in context.
These all came from large, random samples.
a) Weights of a small breed of dog: 6.8 pounds ± 1.7 pounds error
b) Percentage of seniors worldwide with Alzheimer’s Disease: 11.2% ± 1.3% error
c) Difference between the heights of women and men worldwide (women – men)
-5.3 inches ± 1.7 inches
d) Difference between the percentage of people with high blood pressure in Bulgaria
and the percentage of people with high blood pressure in Finland
(Bulgaria percent – Finland percent)
48.9% ± 2.6% error
6. For each of the following confidence intervals, write a sentence explaining the interval in
context. Then use the formulas below to calculate the sample value and the margin of error. Is
the sample value a mean, a percentage, a difference between two means, or the difference
between two percentages?
Sample Value = (Upper Limit + Lower Limit) / 2
Margin of Error = (Upper Limit – Lower Limit) / 2
a) 95% confidence interval estimate of price of gas in dollars (U.S.A. November 2015).
( $2.16 , $2.24 )
b) The population percentage chance of getting Lyme Disease if bitten by a tick:
( 1.2% , 1.5% )
c) Difference between the amount of hemoglobin per deciliter (g/dL) of blood in men
and in women worldwide (men – women): ( 1.6 grams , 2.0 grams ).
(Note: men is population 1 and women is population 2.)
d) In track and field, the population difference between men’s world speed record
times and women’s world speed record times in percentage (men – women) is
between ( -10.35% , -9.65% ). (Note: men is population 1 and women is
population 2.)
7. As a confidence level increases, does the margin of error increase or decrease? Explain why.
As a confidence level increases, does the confidence interval get wider or narrower? Explain
why. As the sample size increases does the standard error decrease or increase? Explain why.
8. Check the assumptions in order to determine whether or not we can estimate a population
value from the sample data. If the problem does not meet the assumptions, explain why. If the
problem does meet the assumptions, explain how and then use Statcrunch to calculate a 95%
confidence interval estimate from the data. Then write a sentence to explain the confidence
interval.
a) We wish to know the average salary for all K-12 teachers in the U.S. We took a random
sample of teachers across the U.S. and found the following data. A histogram showed a skewed
right shape.
Sample Size = 75
Sample Mean = $44,666
Sample Standard Deviation = $7,362
b) We wish to know what percent of people in Los Angeles have AB negative blood type. We
advertised on a TV commercial for people to come and donate blood for free ice cream. 325
people came to give blood, but only 4 had AB negative blood.
c) A hospital wants to know what percent of all of their patients are satisfied with their care
while at the hospital. They had a computer randomly select Hospital ID numbers. Those
patients selected were given a survey to fill out. Of the 115 patients selected, 87 said they were
satisfied with their care.
d) Jimmy is a skate board designer. He wants to know if the percentage of elementary and
junior high school students that like his skate boards is greater than the percentage of high
school students that like his skate boards. To find out, Jimmy brings some of his boards over to
the elementary, junior high, and high schools near his house. He then asks kids after school if
they like his skate boards. Of the 135 elementary and junior high school students that Jimmy
asked, 102 said they liked the skate boards. Of the 87 high school students that Jimmy asked,
44 said they like the skate boards.
e) Use the health data on the website. This data describes the health statistics for 40 randomly
selected men and 40 randomly selected women. We want to use the data to estimate the
difference between the average BMI (body mass index) for women and the average BMI for
men. Is there a significant difference between women and men?
Review Sheet Answers - Confidence Interval/Sampling Distribution
1.
Sampling variability implies that when we take different random samples we get different means and
percentages. They do not come out the same. Sampling variability also implies that a random sample
mean or percent will usually not be the same as the population mean or population percent. We can,
however, calculate the margin of error, i.e. how far the sample value could be from the population value.
Adding the margin of error to the sample value and subtracting it from the sample value gives us a
confidence interval. It is almost impossible to pin down a population value exactly from a single sample.
The news is often very inaccurate when it reports a sample value and tells the public it is a population value.
2.
Sampling Distribution : Take a lot of random samples and calculate the mean or percent from each
sample. We then make a graph of all of the thousands of sample means or sample percentages. (We can
analyze the shape, center and spread of the distribution to better understand the population.)
Standard Error: Standard deviation of the sampling distribution. Not the standard deviation of a single
data set, but the standard deviation of the sample values for thousands of data sets.
Margin of Error: How far we think one sample value could be from the population value. Margin of error
is calculated by multiplying a z-score or t-score times the standard error.
95% confidence: 95% of confidence intervals created contain the population value and 5% of them don’t.
Confidence Interval: two numbers that we think the population value is in between. “We are 95%
confident that the population value is between ## and ##.”
3.
A sampling distribution is when we take a lot of random samples and calculate the mean or percent from
each sample. We then make a graph of all of the thousands of sample means or sample percentages. (We
can analyze the shape, center and spread of the distribution to better understand the population.)
Standard deviation is the variability in one data set. Standard error is a measure of variability for the
sampling distribution (thousands of data sets). Standard error measures the variability in sample means
or sample percentages. To estimate it from the graph, find the two values that the middle 95% of the dots
in the sampling distribution fall between. The middle 95% spans about two standard errors on each side
of the center, so dividing the difference between the two values by 4 gives an approximate standard error.
The center of the distribution is very close to the population value. Using technology, you can also have
the computer find the standard deviation (standard error) of the sampling distribution directly, which is
more accurate.
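The two ways of finding the standard error described above can be tried out with a small simulation. This is only a sketch: the skewed population, the sample size of 40, and the 2,000 repetitions are made-up choices for illustration and do not come from the review sheet.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical skewed-right population (exponential, mean about 50).
population = [random.expovariate(1 / 50) for _ in range(100_000)]

# Take many random samples of size 40 and record each sample mean.
sample_means = sorted(
    statistics.mean(random.sample(population, 40)) for _ in range(2_000)
)

# Method 1: the middle 95% of the sample means spans about 4 standard errors.
low = sample_means[int(0.025 * len(sample_means))]
high = sample_means[int(0.975 * len(sample_means))]
se_from_range = (high - low) / 4

# Method 2: standard deviation of the sampling distribution, found directly.
se_direct = statistics.stdev(sample_means)

# The center of the sampling distribution should sit near the population mean.
center = statistics.mean(sample_means)
population_mean = statistics.mean(population)

print(round(se_from_range, 2), round(se_direct, 2))
print(round(center, 2), round(population_mean, 2))
```

The two standard error estimates should come out close to each other, and the center of the sample means should land near the population mean.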
4.
CLT: Central Limit Theorem: If the samples are large enough, the distribution of sample means or sample
percentages will be nearly normal even if the population is skewed. If the original population is close to bell
shaped (nearly normal), then any size sample will give means that are also nearly normal. For a
distribution of means to be bell shaped, we like the sample size to be at least 30 or the population to be
nearly normal. For a distribution of percentages, we like the sample to have at least 10 successes and at
least 10 failures. The larger the data set, the smaller the standard error. The standard deviation of one data
set is quite a bit larger than the standard error (the standard deviation of the sampling distribution).
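A rough simulation can illustrate these implications. The sketch below assumes a hypothetical skewed-right population (exponential with mean 50) and made-up sample sizes of 5, 30, and 120. It also compares the simulated standard error with the standard formula, population standard deviation divided by the square root of the sample size, which the review sheet does not state explicitly.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

def skewness(values):
    """Sample skewness: about 0 for a symmetric, bell-shaped distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def draw_sample(n):
    """One random sample of size n from a hypothetical skewed-right population."""
    return [random.expovariate(1 / 50) for _ in range(n)]

population_sd = 50  # an exponential population has SD equal to its mean

results = {}
for n in (5, 30, 120):
    means = [statistics.mean(draw_sample(n)) for _ in range(3_000)]
    results[n] = (statistics.stdev(means), skewness(means))
    # Columns: n, simulated standard error, formula value, skewness of the means
    print(n, round(results[n][0], 2),
          round(population_sd / math.sqrt(n), 2),
          round(results[n][1], 2))
```

As the sample size grows, the standard error should shrink and the skewness of the sample means should move toward 0, that is, toward a bell shape.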
5.
a) Sample mean = 6.8 Lbs , Margin of Error = 1.7 Lbs
Confidence Interval ( 5.1 Lbs , 8.5 Lbs ) We are 95% confident that the population mean weight
of this breed of small dog is between 5.1 pounds and 8.5 pounds.
b) Sample percent = 11.2% (or 0.112) ,
Margin of Error = 1.3% (or 0.013)
Confidence Interval : (9.9% , 12.5%) or (0.099 , 0.125)
We are 95% confident that between 9.9% and 12.5% of seniors worldwide have Alzheimer’s disease.
c) Sample mean difference = -5.3 inches
Margin of Error = 1.7 inches
Confidence interval : ( -7.0 , -3.6)
We are 95% confident that the average height of women is between 3.6 and 7 inches less than the
average height of men.
d) Sample percent difference = 48.9% or 0.489
Margin of Error = 2.6% or 0.026
Confidence interval : ( 46.3% , 51.5% ) or ( 0.463 , 0.515 )
We are 95% confident that the percent of people with high blood pressure in Bulgaria is between 46.3
and 51.5 percentage points higher than the percent of people with high blood pressure in Finland.
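The arithmetic in these four answers is the same each time: subtract the margin of error from the sample value for the lower limit and add it for the upper limit. A minimal sketch, where the function name and the dictionary labels are only illustrative:

```python
def confidence_interval(sample_value, margin_of_error):
    """Return (lower, upper): sample value minus and plus the margin of error."""
    return (sample_value - margin_of_error, sample_value + margin_of_error)

# (sample value, margin of error) for parts a through d
examples = {
    "a": (6.8, 1.7),    # dog weight in pounds
    "b": (11.2, 1.3),   # percent of seniors with Alzheimer's
    "c": (-5.3, 1.7),   # height difference in inches (women - men)
    "d": (48.9, 2.6),   # percent difference (Bulgaria - Finland)
}

intervals = {}
for label, (value, moe) in examples.items():
    lower, upper = confidence_interval(value, moe)
    intervals[label] = (round(lower, 1), round(upper, 1))
    print(f"{label}) ({lower:.1f}, {upper:.1f})")
```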
6.
a) We are 95% confident that the mean average price of gas in the U.S. in November 2015 is between
$2.16 and $2.24 .
sample mean = (2.24 + 2.16) /2 = $2.20
margin of error = (2.24 – 2.16) / 2 = 0.08/2 = $0.04 (4 cents)
b) We are 95% confident that the population percent chance of getting Lyme disease if bitten by a tick is
between 1.2% and 1.5%.
sample percent = ( 0.015 + 0.012)/2 = 0.0135 or 1.35%
margin of error = ( 0.015 – 0.012)/2 = 0.0015 or 0.15%
c) We are 95% confident that the mean amount of hemoglobin per deciliter of blood for men is between 1.6
g/dL and 2.0 g/dL greater than for women.
sample mean difference = 1.8 g/dL
margin of Error = 0.2 g/dL
d) We are 95% confident that world speed record times for men (population 1) are between 9.65% and
10.35% lower than the world speed record times for women (population 2).
Sample percent difference = -10% (or -0.1)
margin of error = 0.35% (or 0.0035)
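Going the other way, from an interval back to the sample value and margin of error, uses the two formulas given in question 6. A short sketch with an illustrative function name:

```python
def from_interval(lower, upper):
    """Recover (sample value, margin of error) from a confidence interval."""
    sample_value = (upper + lower) / 2
    margin_of_error = (upper - lower) / 2
    return sample_value, margin_of_error

gas = from_interval(2.16, 2.24)        # dollars
lyme = from_interval(1.2, 1.5)         # percent
hemoglobin = from_interval(1.6, 2.0)   # g/dL (men - women)
records = from_interval(-10.35, -9.65) # percent (men - women)

for name, (value, moe) in [("a", gas), ("b", lyme),
                           ("c", hemoglobin), ("d", records)]:
    print(f"{name}) sample value = {value:.2f}, margin of error = {moe:.2f}")
```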
7.
As the confidence level increases, the margin of error also increases (larger z score) and the interval gets
wider. As the confidence level decreases, the margin of error also decreases (smaller z score) and the
interval gets narrower.
As the sample size increases, the standard error and margin of error decrease, which gives a narrower
interval. If the sample size decreases, there is more sampling variability, so the standard error and the
margin of error increase, which gives us a wider interval.
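Both effects can be checked numerically. The sketch below assumes a z-based margin of error for a mean, with the standard error taken as the standard deviation divided by the square root of the sample size. The standard deviation of 10 and the sample sizes are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """z-score that captures the middle `confidence` share of a normal curve."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(confidence, sd, n):
    """z-score times the standard error, where standard error = sd / sqrt(n)."""
    return z_critical(confidence) * sd / math.sqrt(n)

sd = 10  # hypothetical standard deviation

# Higher confidence level -> larger z-score -> larger margin of error.
by_level = {c: margin_of_error(c, sd, 100) for c in (0.90, 0.95, 0.99)}

# Larger sample size -> smaller standard error -> smaller margin of error.
by_size = {n: margin_of_error(0.95, sd, n) for n in (25, 100, 400)}

for c, moe in by_level.items():
    print(f"confidence {c:.0%}: z = {z_critical(c):.3f}, margin = {moe:.3f}")
for n, moe in by_size.items():
    print(f"n = {n}: margin = {moe:.3f}")
```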
8.
a) The data does meet the assumptions for estimating a population mean. The sample is random and, even
though it is skewed, the sample size is greater than or equal to 30. Using Statcrunch T-stat, 1 sample, with
summary, we obtained the 95% confidence interval
( $42972.16 , $46359.84 ) . So we are 95% confident that the average yearly salary for K-12 teachers in
the U.S. is between $42,972.16 and $46,359.84 .
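For anyone who wants to check the Statcrunch result by hand, the sketch below rebuilds a one-sample t interval from the summary values using only the Python standard library. It is not Statcrunch's own method: the t critical value is found here by numerically integrating the t density and searching by bisection, and all function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on the interval [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Bisection search for t* with P(-t* <= T <= t*) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, mean, sd = 75, 44666, 7362          # summary values from part a
standard_error = sd / math.sqrt(n)
t_star = t_critical(0.95, n - 1)       # t-score for 74 degrees of freedom
margin = t_star * standard_error
lower, upper = mean - margin, mean + margin
print(round(t_star, 4), round(lower, 2), round(upper, 2))
```

The limits should agree with the Statcrunch interval above to within rounding.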
b) This is voluntary response data, which is never representative of the population. In addition, although
325 people is a large sample, only 4 had AB negative blood, which is fewer than the 10 successes needed.
We would be wasting our time trying to use this data to estimate a population percent or calculate a
confidence interval. The data needs to be random.
c) The data does meet assumptions for calculating a population percent. The data was random and it had
at least 10 people that were satisfied and at least 10 people that were not satisfied. Using Statcrunch ,
proportion stat, 1 sample, with summary, we obtained the following 95% confidence interval (0.678 ,
0.835 ). So we are 95% confident that the population percent of patients that are satisfied with their care
is between 67.8% and 83.5%.
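The same interval can be reproduced by hand with a z-score and the usual standard error for a sample percentage, the square root of p(1 - p)/n. The sketch below assumes that formula, which the review sheet does not spell out, and uses illustrative function names. The check for at least 10 successes and 10 failures only covers sample size. Whether the sample is random still has to be judged from how the data was collected.

```python
import math
from statistics import NormalDist

def meets_success_failure(successes, n, minimum=10):
    """True if there are at least `minimum` successes and `minimum` failures."""
    return successes >= minimum and (n - successes) >= minimum

def proportion_interval(successes, n, confidence=0.95):
    """z-based confidence interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Part c: 87 satisfied patients out of 115 randomly selected.
hospital_ok = meets_success_failure(87, 115)
lower, upper = proportion_interval(87, 115)
print(hospital_ok, round(lower, 3), round(upper, 3))

# Part b counts, for comparison: only 4 of 325 donors had AB negative blood.
blood_ok = meets_success_failure(4, 325)
print(blood_ok)
```

For part c this gives roughly (0.678, 0.835), in line with the Statcrunch output.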
d) The data set was large enough, however this is convenience data and never representative of the
population. We would be wasting our time trying to use this data to estimate the difference between two
population percentages or calculating a confidence interval for the difference. It needs to be random.
e) The data does meet the assumptions for estimating the difference between two population means. Both
data sets are random. The groups are independent of each other and both sample sizes are greater than or
equal to 30. Using women’s BMI as population 1 and men’s BMI as population 2, we used Statcrunch
T-stat, 2 sample, with data, and obtained the 95% confidence interval ( -2.48 BMI points , +1.96 BMI
points ) . Note: If you had made men’s BMI population 1 then you would have gotten ( -1.96 , +2.48 ) .
Both answers tell us the same thing. Because the interval contains zero, the data does not show a
significant difference between women’s average BMI and men’s average BMI.
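The reasoning here rests on whether zero falls inside the interval, and on the fact that swapping the two populations only flips the signs and the order of the limits. A small sketch with illustrative function names, using the interval reported above:

```python
def contains_zero(lower, upper):
    """True if a difference of zero is a plausible value in the interval."""
    return lower <= 0 <= upper

def swap_populations(lower, upper):
    """Interval for (pop 2 - pop 1), given the interval for (pop 1 - pop 2)."""
    return (-upper, -lower)

women_minus_men = (-2.48, 1.96)   # BMI points, from the Statcrunch output
men_minus_women = swap_populations(*women_minus_men)

print(men_minus_women)
print(contains_zero(*women_minus_men), contains_zero(*men_minus_women))
```

Either ordering contains zero, so both lead to the same conclusion.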