PROJECT 4: Behavior of Confidence Intervals

Exploring Statistical Concepts
Name: _______________________
UF ID#:______________________
Purpose:
• To explore the sampling distribution of the sample proportion and the sample mean
• To create and interpret confidence intervals for the population proportion and the
population mean
Due Date: June 10th, 2010 at the beginning of class. Late projects will receive a 20%
penalty.
Statistical Software: This project requires the statistical software program Minitab.
Minitab can be accessed:
• on the CIRCA computers (http://labs.circa.ufl.edu/hours.php)
• by renting the program for free for 30 days through www.e-academy.com/minitab
• by using the computers in CBD 220
Tutoring Room: The tutoring room will meet in CBD 220 on Tuesday (June 8th, 11 to 5pm)
and Wednesday (June 9th, 11 to 7pm). The TA will be there to help anyone with the
project and/or other questions about the class.
Part A: Exploring the Sampling Distribution of the Sample Proportion
Applet: http://www.stat.tamu.edu/~west/ph/sampledist.html
1. Terms of the simulation. In class, we studied the sampling distribution of p̂, the
sample proportion of successes in a binomial experiment. We saw that this
distribution is approximately normal if np and n(1-p) are both greater than or equal to
fifteen. The parent population of the data is binary, meaning that it has only two
potential responses: success (1) or failure (0). Using the applet, we are going to
repeatedly draw samples from the binary distribution and compute the sample
proportion of successes. This will allow us to see when the sampling distribution of
p̂ can be approximated with the normal distribution. Familiarize yourself with the
applet and answer the following questions.
a) Each of the samples that will be drawn from the parent distribution will be of the
same size. What is the symbol that the website uses to represent the size of each
sample? ____________
b) What is the symbol used to represent how many times we collect these samples?
___________
2. Simulation. For each setting of n and p given in the table that follows, compute the
values of np and n(1-p). Determine if np and n(1-p) are greater than or equal to
fifteen. Then use the applet to see the sampling distribution and determine if the
normal approximation is good for each case (get at least one thousand samples for
each combination of n and p). Select the value of p by using the drop-down box next
to the population graph. For each combination, determine if the graph shows that the
sampling distribution of p̂ is close to normal. Look at symmetry, continuity (no big
gaps in the data), and tails.
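n | p | np | n(1-p) | Both ≥ 15? | Sketch | Continuity? (No big gaps in data) | Symmetry? | Normal Approximation Good?
10 | 0.90 | | | | | | |
50 | 0.90 | | | | | | |
1000 | 0.90 | | | | | | |
10 | 0.50 | | | | | | |
50 | 0.50 | | | | | | |
100 | 0.50 | | | | | | |
10 | 0.20 | | | | | | |
50 | 0.20 | | | | | | |
100 | 0.20 | | | | | | |
1000 | 0.20 | | | | | | |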
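The np and n(1-p) columns can be checked with a few lines of code. The following is a minimal Python sketch, not part of the applet or of Minitab: the function name `large_sample_check` and the `threshold` parameter are assumptions made for illustration, and the settings list simply copies the (n, p) pairs from the table.

```python
# Settings of (n, p) copied from the worksheet table.
SETTINGS = [
    (10, 0.90), (50, 0.90), (1000, 0.90),
    (10, 0.50), (50, 0.50), (100, 0.50),
    (10, 0.20), (50, 0.20), (100, 0.20), (1000, 0.20),
]


def large_sample_check(n, p, threshold=15):
    """Return (np, n(1-p), whether both are >= threshold) for one setting.

    The name and the threshold parameter are hypothetical; the value 15
    is the rule stated in item 1 of Part A.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    both_ok = expected_successes >= threshold and expected_failures >= threshold
    return expected_successes, expected_failures, both_ok


for n, p in SETTINGS:
    successes, failures, both_ok = large_sample_check(n, p)
    print(f"n={n:5d}  p={p:.2f}  np={successes:7.1f}  "
          f"n(1-p)={failures:7.1f}  both >= 15? {both_ok}")
```

This only fills in the arithmetic columns. Whether the normal approximation actually looks good still has to be judged from the applet's graph.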
3. Summary. Play with the applet a bit. In your own words, explain what combinations
of n and p are necessary for the sampling distribution of p̂ to be approximately
normal, and why.
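The applet's repeated sampling can also be imitated offline. The following is a minimal Python sketch, not the applet's actual code: the function names, the default of 1000 repetitions, and the fixed seed are assumptions chosen for illustration. It draws many samples of size n from a 0/1 population and compares the spread of the sample proportions with the theoretical standard deviation sqrt(p(1-p)/n).

```python
import random
import statistics


def simulate_p_hat(n, p, reps=1000, seed=0):
    """Draw `reps` samples of size n from a 0/1 population with success
    probability p and return the list of sample proportions.
    (Hypothetical helper; names and defaults are illustrative.)"""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _trial in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


def summarize(n, p, reps=1000, seed=0):
    """Return (observed mean, observed sd, theoretical sd) of p-hat."""
    p_hats = simulate_p_hat(n, p, reps, seed)
    observed_mean = statistics.mean(p_hats)
    observed_sd = statistics.pstdev(p_hats)
    theoretical_sd = (p * (1 - p) / n) ** 0.5
    return observed_mean, observed_sd, theoretical_sd


mean_hat, sd_hat, sd_theory = summarize(n=50, p=0.50)
print(f"observed mean={mean_hat:.3f}  observed sd={sd_hat:.3f}  "
      f"theoretical sd={sd_theory:.3f}")
```

Running `simulate_p_hat` for a small-sample setting such as n = 10, p = 0.90 and tabulating the distinct values returned is one way to see the gaps and skew that the table asks about.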
Part B: Exploring the Sampling Distribution of the Sample Mean
1. Simulation. (Use the same applet as in Part A.) For each parent distribution and
sample size given in the table that follows, write down the mean and the standard
deviation (given by the computer) in the first column. Then, compute the values of
the mean and standard deviation of the distribution of X̄ in the theoretical columns
using the values that the Central Limit Theorem specifies. Then use the applet to get
the distribution (get at least one thousand samples for each case). Record the mean
and standard deviation of your simulation in the observed columns. Comment on the
shape of the graph. NOTE: When you look at the shape, imagine it being smoother.
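Sampling Distribution of X̄ (theoretical and observed columns)
Parent Population | Sample Size | Theoretical Mean | Theoretical Stdev | Observed Mean | Observed Stdev | Shape
Normal (μ = ____, σ = ____) | 2 | | | | |
Normal (μ = ____, σ = ____) | 30 | | | | |
Skewed (μ = ____, σ = ____) | 2 | | | | |
Skewed (μ = ____, σ = ____) | 30 | | | | |
Uniform (μ = ____, σ = ____) | 2 | | | | |
Uniform (μ = ____, σ = ____) | 30 | | | | |
2. Summary. Based on the results of the simulation, what happens to:
a) the shape of the distribution of x̄ as n increases?
b) the mean of the distribution of x̄ as n increases?
c) the standard deviation of the distribution of x̄ as n increases?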
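The theoretical columns use the standard result that the sample mean has mean μ and standard deviation σ/√n. The following is a minimal Python sketch of the comparison between theoretical and observed values. The three parent populations and their parameters (a normal with μ = 10 and σ = 2, an exponential with mean 1 as the skewed case, and a uniform on [0, 1]) are assumptions made for illustration and are not the applet's populations.

```python
import math
import random
import statistics

# Hypothetical parent populations: (draw function, mu, sigma).
PARENTS = {
    "Normal": (lambda rng: rng.gauss(10.0, 2.0), 10.0, 2.0),
    "Skewed": (lambda rng: rng.expovariate(1.0), 1.0, 1.0),
    "Uniform": (lambda rng: rng.random(), 0.5, math.sqrt(1 / 12)),
}


def simulate_sample_means(draw, n, reps=1000, seed=0):
    """Return `reps` sample means, each computed from n draws."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [draw(rng) for _obs in range(n)]
        means.append(statistics.mean(sample))
    return means


def compare(name, n, reps=1000, seed=0):
    """Theoretical (mu, sigma/sqrt(n)) versus simulated mean and sd."""
    draw, mu, sigma = PARENTS[name]
    means = simulate_sample_means(draw, n, reps, seed)
    return {
        "theoretical_mean": mu,
        "theoretical_sd": sigma / math.sqrt(n),
        "observed_mean": statistics.mean(means),
        "observed_sd": statistics.stdev(means),
    }


for name in PARENTS:
    for n in (2, 30):
        r = compare(name, n)
        print(f"{name:8s} n={n:2d}  "
              f"theory: {r['theoretical_mean']:.3f}, {r['theoretical_sd']:.3f}  "
              f"observed: {r['observed_mean']:.3f}, {r['observed_sd']:.3f}")
```

The sketch covers only the numeric columns. The shape column still requires looking at a histogram of the simulated means.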
Part C: Identifying the Type of Problem and Entering the Data
1. On the first day of class, everyone was invited to take part in a class survey. One
of the questions is below. Identify the type of problem as either a situation where
you are trying to estimate the population mean or the population proportion.
a. Aside from class time, how many hours a week, on average, do you expect to
spend studying and completing assignments for this course?
_________________
2. Open Minitab.
3. Go to the website: www.stat.ufl.edu/~mmeece/2023/DataSummerA10.htm and
copy the data. Put in the number of hours spent studying for each student in the first
column.
Part D: Make a 95% Confidence interval for the population mean.
1. Summarize the data for the mean problem. To do this, go to Stat > Basic
Statistics > Display Descriptive Statistics. Double-click on the variable and select
OK.
x̄ = _________
s = ________
n = _________
2. Identify the following:
x̄ =
μ =
3. Assumptions. What are the assumptions necessary for making inferences in this
case? Have they been met? (To help explore the data, make a boxplot. To do this, go
to Graph > Boxplot > One Y > Simple > OK. On the next screen, double-click on the
variable name and select OK.) Put a copy of the boxplot here.
4. Regardless of your answer to number 3, construct a 95% confidence interval for
the population mean. Go to Stat > Basic Statistics > 1 Sample t. Click inside the
Samples in column box. A list of variables will appear. Double-click on “Study” and
select OK. The 95% confidence interval should appear. Paste your Minitab output
below.
95% CI:
5. Regardless of your answer to number 3, interpret the confidence interval.
6. Now, consider your answer to number 3. Do you trust your results in parts 4 and 5?
Explain.
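The interval that the 1 Sample t procedure reports has the form x̄ ± t* · s/√n, where t* is the critical value from the t distribution with n - 1 degrees of freedom. The following is a minimal Python sketch of that arithmetic, useful for checking software output by hand. The data values are hypothetical (they are not the class survey data), the function name is an assumption, and the critical value 2.262 is the standard table value for 95% confidence with 9 degrees of freedom.

```python
import math
import statistics


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) for x-bar +/- t_crit * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom; it is passed
    in because the Python standard library has no t distribution.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical study hours for 10 students (NOT the class data).
hypothetical_hours = [4.0, 6.0, 5.0, 8.0, 10.0, 3.0, 7.0, 6.0, 12.0, 9.0]
T_CRIT_DF9 = 2.262                      # 95% confidence, df = 9

low, high = t_confidence_interval(hypothetical_hours, T_CRIT_DF9)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

For the actual class data, n is much larger than 10, so the critical value must be looked up again for the correct degrees of freedom.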
7. More Interpretations: Suppose a random sample of 114 students was chosen, and
each student was asked how many hours he or she studies each week. The resulting
95% confidence interval for μ was (8.9, 11.8). Determine if each one of the
following statements is true with a capital “T” or false with a capital “F.”
_____ a) 95% of all students study between 8.9 and 11.8 hours per week.
_____ b) 95% of all sample means will be between 8.9 and 11.8.
_____ c) 95% of samples will have averages between 8.9 and 11.8.
_____ d) For 95% of all samples, μ will be between 8.9 and 11.8.
_____ e) For 95% of all samples, μ will be included in the resulting 95%
confidence interval.
_____ f) The formula produces intervals that capture the population mean for 95%
of all samples.
_____ g) The formula produces intervals that capture the sample mean for 95% of
all samples.
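The phrase "95% of all samples" refers to the long-run behavior of the interval procedure, which can be explored by simulation. The following is a minimal Python sketch under stated assumptions: the population is taken to be normal with a hypothetical μ = 10 and σ = 7, each sample has n = 10 observations, and 2.262 is the table value of t* for 9 degrees of freedom. The function name and defaults are illustrative only.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, t_crit, reps=2000, seed=0):
    """Fraction of simulated samples whose t interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _obs in range(n)]
        x_bar = statistics.mean(sample)
        margin = t_crit * statistics.stdev(sample) / math.sqrt(n)
        # Each sample produces its own interval; mu stays fixed.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps


rate = coverage_rate(mu=10.0, sigma=7.0, n=10, t_crit=2.262)
print(f"Proportion of intervals that captured mu: {rate:.3f}")
```

Note that the interval endpoints change from sample to sample while μ does not, which is the distinction several of the statements above turn on.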
Part E: Explore your own Data Set.
Website: http://www.norc.org/GSS+Website/Data+Analysis/
1. Select a research question from the General Social Survey. Go to the new GSS
website. Open the catalog by clicking on the plus sign next to NORC Public Use
Catalog, then click on the plus sign next to GSS, then click on the icon next to General
Social Survey 1972-2006, then Variable Description, and then Mnemonic Index.
Select a letter and then a variable name. Play around and look at the various variable
options. Find a question for which it is reasonable to make a confidence interval. Print
the page that contains the question and the data. Attach this to your project.
2. Write down the question selected.
3. What will be considered a success?
4. Summarize the data.
number of successes: X =
total number of observations: n =
5. Identify the following:
p =
p̂ =
6. Assumptions. What are the assumptions necessary for making inferences in this
case? Have they been met?
7. Construct a 95% confidence interval for the population proportion. Go to Stat >
Basic Statistics > 1 Proportion. Click on the bullet for “Summarized Data”. Enter the
number of events (X) and the number of trials (n). Select OK. The 95% confidence
interval should appear. Paste the Minitab output.
8. Interpret the results of the confidence interval.
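For a rough hand check of the proportion interval, the large-sample formula is p̂ ± z* · sqrt(p̂(1-p̂)/n) with z* = 1.96 for 95% confidence. The following is a minimal Python sketch of that formula. The counts X = 60 and n = 200 are hypothetical (not GSS data), the function names are assumptions, and the check for at least 15 successes and 15 failures is one common way to apply the rule of fifteen from Part A to observed counts. Minitab's output may differ slightly if it uses an exact method rather than the normal approximation.

```python
import math

Z_95 = 1.96  # standard normal critical value for 95% confidence


def large_sample_ok(x, n, threshold=15):
    """True if there are at least `threshold` successes and failures."""
    return x >= threshold and (n - x) >= threshold


def proportion_interval(x, n, z=Z_95):
    """Large-sample interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical counts (NOT taken from the General Social Survey).
x_successes, n_total = 60, 200

if large_sample_ok(x_successes, n_total):
    low, high = proportion_interval(x_successes, n_total)
    print(f"p-hat = {x_successes / n_total:.3f}, 95% CI: ({low:.3f}, {high:.3f})")
else:
    print("Too few successes or failures for the large-sample interval.")
```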