Maria Trujillo
MAT 120.1605
Prof. Prabha Betne
December 01, 2008
Sampling Distribution Project
a. Suppose a random variable X has a Binomial distribution with n = 15 and
p = 0.04. Find the mean and standard deviation of X.
$\mu = np = 15 \times 0.04 = 0.6$
$\sigma = \sqrt{np(1 - p)} = \sqrt{0.6\,(1 - 0.04)} = \sqrt{0.6 \times 0.96} = \sqrt{0.576} \approx 0.759$
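The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original project.

```python
import math

# Parameters of the binomial distribution from part (a)
n = 15
p = 0.04

mu = n * p                          # mean of a Binomial(n, p) variable
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of a Binomial(n, p) variable

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")  # mu = 0.600, sigma = 0.759
```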
c. Check if X1 (one column of 2000 numbers) follows a normal distribution. To
check normality, do the following:
Check-i Find the mean, median and mode. What relation would you expect between
the mean, median and mode if X1 is to have a normal distribution? Given the values of
mean, median, and mode that you obtained for X1, can you say that X1 has a
normal distribution? Why?
Statistics: X1
N (Valid)      2000
N (Missing)    0
Mean           .5895
Median         .0000
Mode           .00
If X1 had a normal distribution, the mean, median and mode would be equal or very close.
Here the Mean = .5895 while the Median = .0000 and the Mode = .00, so X1 does not appear
to have a normal distribution.
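The same check can be tried on simulated data. The sketch below does not use the project's actual X1 column; it assumes X1 can be imitated by 2000 draws from a Binomial(15, 0.04) distribution, and the helper `binomial_draw` and the seed are hypothetical choices made only for illustration.

```python
import random
import statistics

def binomial_draw(rng, n, p):
    """One Binomial(n, p) value, counted as successes in n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(2008)  # arbitrary seed so the run is repeatable
x1 = [binomial_draw(rng, 15, 0.04) for _ in range(2000)]

mean_x1 = statistics.fmean(x1)
median_x1 = statistics.median(x1)
mode_x1 = statistics.mode(x1)

# For normal data these three values would nearly coincide.
print(f"mean = {mean_x1:.4f}, median = {median_x1}, mode = {mode_x1}")
```

A mean well above both the median and the mode is what a distribution with a long right tail would be expected to produce.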
Check-ii Obtain a histogram of X1 values and discuss the shape of the histogram.
Does the shape of the histogram suggest that X1 has a normal distribution?
[Histogram of X1: Std. Dev = .76, Mean = .6, N = 2000]
No, it does not suggest that X1 has a normal distribution because the graph is right skewed
(positively skewed). Most of the data falls on the left side, with a long tail to the right.
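A rough text histogram shows the kind of shape being discussed. This sketch again uses simulated Binomial(15, 0.04) values rather than the project's data; the helper name, the seed, and the scale of one `#` per 25 observations are all assumptions made for illustration.

```python
import random
from collections import Counter

def binomial_draw(rng, n, p):
    """One Binomial(n, p) value, counted as successes in n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(1605)  # arbitrary seed so the run is repeatable
x1 = [binomial_draw(rng, 15, 0.04) for _ in range(2000)]

counts = Counter(x1)  # frequency of each observed value
for value in range(max(x1) + 1):
    bar = "#" * (counts[value] // 25)  # one '#' per 25 observations
    print(f"{value}: {counts[value]:5d} {bar}")
```

The bars should be tallest at 0 and shrink as the value grows, so the tail runs toward the larger values.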
Check-iii Obtain a Normal QQ plot (Lesson 10) of the X1 values. Does the plot
indicate that X1 has a normal distribution? Explain why.
[Normal Q-Q Plot of X1 and Detrended Normal Q-Q Plot of X1; horizontal axis: Observed Value]
From the above plots, the values do not appear to come from a normally distributed population
because most of the points lie away from the straight reference line.
d. Now assuming each row is a sample of size 30, find the mean of each row.
e. Check if the means that you computed in part (d) follow a normal distribution.
Follow the same three checks as you did for part (c).
Statistics: MEAN
N (Valid)      2000
N (Missing)    0
Mean           .5943
Median         .6000
Mode           .57
[Histogram of MEAN: Std. Dev = .14, Mean = .59, N = 2000]
[Normal Q-Q Plot of MEAN and Detrended Normal Q-Q Plot of MEAN; horizontal axis: Observed Value]
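One numeric companion to a Q-Q plot is the correlation between the sorted data and the matching standard normal quantiles; values near 1 mean the points lie close to a straight line. This technique is not part of the original project. The sketch below applies it to simulated Binomial(15, 0.04) data, and the helper names, the seed, and the plotting positions (i + 0.5)/n are assumptions.

```python
import math
import random
from statistics import NormalDist

def binomial_draw(rng, n, p):
    """One Binomial(n, p) value, counted as successes in n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def qq_correlation(values):
    """Pearson correlation between sorted values and standard normal quantiles."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_d = sum(data) / n
    mean_q = sum(quantiles) / n
    cov = sum((d - mean_d) * (q - mean_q) for d, q in zip(data, quantiles))
    var_d = sum((d - mean_d) ** 2 for d in data)
    var_q = sum((q - mean_q) ** 2 for q in quantiles)
    return cov / math.sqrt(var_d * var_q)

rng = random.Random(30)  # arbitrary seed so the run is repeatable
rows = [[binomial_draw(rng, 15, 0.04) for _ in range(30)] for _ in range(2000)]

x1 = [row[0] for row in rows]               # one column of raw values
row_means = [sum(row) / 30 for row in rows] # mean of each sample of size 30

r_x1 = qq_correlation(x1)
r_means = qq_correlation(row_means)
print(f"raw values: r = {r_x1:.3f}; row means: r = {r_means:.3f}")
```

The row means should give a correlation much closer to 1 than the raw values, which is the numeric counterpart of their Q-Q points hugging the line.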
f. In view of the central limit theorem, what did you expect the distribution of the
means to be? What mean and standard deviation values did you expect for the
means? (Compute this by hand.)
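$\bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$
$\mu_{\bar{x}} = np = 15 \times 0.04 = 0.6$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{30}} = \dfrac{0.759}{\sqrt{30}} = 0.1386$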
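The standard deviation of the sample mean can be reproduced numerically. In this minimal sketch the function name `standard_error` is illustrative, and the sample sizes 1 and 120 are hypothetical values added only to show how the spread shrinks as the sample grows.

```python
import math

def standard_error(sigma, sample_size):
    """Standard deviation of the mean of sample_size independent draws."""
    return sigma / math.sqrt(sample_size)

sigma = 0.759  # population standard deviation from part (a)

# 30 is the sample size used in the project; 1 and 120 are for comparison only.
for sample_size in (1, 30, 120):
    print(sample_size, round(standard_error(sigma, sample_size), 4))
```

For a sample size of 30 this gives 0.1386, matching the hand computation.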
g. Find the mean and standard deviation values of the means column and compare
them with the mean and standard deviation values that you discussed in part f.
Statistics: MEAN
N (Valid)        2000
N (Missing)      0
Mean             .5943
Median           .6000
Mode             .57
Std. Deviation   .13712
The expected values under the Central Limit Theorem are µ = 0.6 and σ = 0.1386. The
values computed in SPSS for the means column (Mean = .5943, Std. Deviation = .13712)
are very close to these.
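The comparison in part g can be imitated with a simulation. The sketch below does not reproduce the SPSS data; it assumes 2000 samples of size 30 drawn from Binomial(15, 0.04), and the helper name, constants, and seed are hypothetical.

```python
import math
import random
import statistics

def binomial_draw(rng, n, p):
    """One Binomial(n, p) value, counted as successes in n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

N_TRIALS, P = 15, 0.04            # binomial parameters from part (a)
SAMPLE_SIZE, N_SAMPLES = 30, 2000 # 2000 rows, each a sample of size 30

rng = random.Random(120)  # arbitrary seed so the run is repeatable
row_means = []
for _ in range(N_SAMPLES):
    row = [binomial_draw(rng, N_TRIALS, P) for _ in range(SAMPLE_SIZE)]
    row_means.append(sum(row) / SAMPLE_SIZE)

# Values expected from the Central Limit Theorem
expected_mean = N_TRIALS * P
expected_sd = math.sqrt(N_TRIALS * P * (1 - P) / SAMPLE_SIZE)

# Values observed in the simulated column of means
observed_mean = statistics.fmean(row_means)
observed_sd = statistics.stdev(row_means)

print(f"expected: mean = {expected_mean:.4f}, sd = {expected_sd:.4f}")
print(f"observed: mean = {observed_mean:.4f}, sd = {observed_sd:.4f}")
```

The observed values will differ slightly from run to run with a different seed, but they should stay close to the expected ones, as in the SPSS output.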
Write your understanding about the Central Limit Theorem (CLT). How did this
project help you understand the CLT? How can you use this result for solving
problems in statistics? You may use any example from the book to explain.
The Central Limit Theorem (CLT) says that, regardless of the shape of the population,
when the sample size is large (about 30 or more) the sampling distribution of the sample
mean is approximately normal. This project showed it in practice: the individual X1
values are clearly not normal, as we can see in the first table and plots above, yet the
means of the samples of size 30 have a mean and standard deviation very close to the
values that I computed by hand assuming a normal sampling distribution. The CLT is very
handy for solving statistics problems because, even when the data themselves are not
normally distributed, I can treat the sample mean as approximately normal without
having to work out its exact distribution. All I need to make sure of is that my sample
size is about 30 or more, or that the population itself is normal.