Maria Trujillo
MAT 120.1605
Prof. Prabha Betne
December 01, 2008

Sampling Distribution Project

a. Suppose a random variable X has a Binomial distribution with n = 15 and p = 0.04. Find the mean and standard deviation of X.

$$\mu = np = 15 \times 0.04 = 0.6$$
$$\sigma = \sqrt{np(1-p)} = \sqrt{0.6 \times (1 - 0.04)} = \sqrt{0.6 \times 0.96} = \sqrt{0.576} \approx 0.759$$

c. Check if X1 (one column of 2000 numbers) follows a normal distribution. To check normality, do the following:

Check-i. Find the mean, median and mode. What relation would you expect between the mean, median and mode if X1 is to have a normal distribution? Given the values of the mean, median, and mode that you obtained for X1, can you say that X1 has a normal distribution? Why?

Statistics (X1)
  N Valid     2000
  N Missing   0
  Mean        .5895
  Median      .0000
  Mode        .00

For a normal distribution, the mean, median, and mode should be equal (or very nearly so). Here the mean (.5895) is far from the median (.0000) and the mode (.00), so X1 does not appear to have a normal distribution.

Check-ii. Obtain a histogram of X1 values and discuss the shape of the histogram. Does the shape of the histogram suggest that X1 has a normal distribution?

[Figure: Histogram of X1 (Std. Dev = .76, Mean = .6, N = 2000)]

No, the histogram does not suggest that X1 has a normal distribution, because it is skewed to the right (positively skewed): most of the data are piled up on the left side, near 0, with a long tail extending to the right.

Check-iii. Obtain a Normal Q-Q plot (Lesson 10) of the X1 values. Does the plot indicate that X1 has a normal distribution? Explain why.

[Figure: Normal Q-Q Plot of X1 and Detrended Normal Q-Q Plot of X1 (horizontal axis: Observed Value)]

From these plots, the scores do not appear to come from a normally distributed population, because most of the points fall away from the straight line that normally distributed data would follow.

d. Now, assuming each row is a sample of size 30, find the mean of each row.

e. Check if the means that you computed in part (d) follow a normal distribution. Follow the same three checks as you did for part (c).
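For readers without SPSS, the part a formulas and the part c, check-i summary can be reproduced with a short Python sketch. It assumes that X1 holds 2000 independent Binomial(15, 0.04) draws (consistent with parts a and f) and simulates its own stand-in column instead of reading the SPSS data, so the numbers will be close to, but not identical to, the values reported above. The helper names `binomial_mean_sd` and `draw_binomial` are hypothetical.

```python
import math
import random
import statistics

# Part a: X ~ Binomial(n = 15, p = 0.04)
N_TRIALS = 15
P_SUCCESS = 0.04


def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a Binomial(n, p) random variable."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean, sd


def draw_binomial(n, p, rng):
    """Draw one Binomial(n, p) value: the number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


mu, sigma = binomial_mean_sd(N_TRIALS, P_SUCCESS)
print(f"part a: mu = {mu:.3f}, sigma = {sigma:.3f}")

# Part c, check-i: a simulated stand-in for the SPSS column X1
# (assumption: X1 holds 2000 independent Binomial(15, 0.04) draws).
rng = random.Random(2008)  # fixed seed so the run is reproducible
x1 = [draw_binomial(N_TRIALS, P_SUCCESS, rng) for _ in range(2000)]

x1_mean = statistics.fmean(x1)
x1_median = statistics.median(x1)
x1_mode = statistics.mode(x1)
print(f"X1: mean = {x1_mean:.4f}, median = {x1_median}, mode = {x1_mode}")

# For a normal distribution the mean, median and mode coincide; for this
# population the mean is expected to sit well above the median and mode,
# which is the signature of right skew.
```

With a different seed the simulated values change slightly, but the pattern (median and mode at 0, mean near 0.6) should persist, because roughly 54% of Binomial(15, 0.04) draws are exactly 0.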
Statistics (MEAN)
  N Valid     2000
  N Missing   0
  Mean        .5943
  Median      .6000
  Mode        .57

[Figure: Histogram of MEAN (Std. Dev = .14, Mean = .59, N = 2000)]

[Figure: Normal Q-Q Plot of MEAN and Detrended Normal Q-Q Plot of MEAN (horizontal axis: Observed Value)]

f. In view of the central limit theorem, what did you expect the distribution of the means to be? What mean and standard deviation values did you expect for the means? (Compute this by hand.)

By the CLT, the means of samples of size 30 should be approximately normally distributed:

$$\bar{x} \sim N\left(\mu_{\bar{x}},\ \sigma_{\bar{x}}\right)$$
$$\mu_{\bar{x}} = \mu = np = 15 \times 0.04 = 0.6$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{30}} = \frac{0.759}{\sqrt{30}} \approx 0.1386$$

g. Find the mean and standard deviation values of the means column and compare them with the values of the mean and standard deviation that you discussed in part f.

Statistics (MEAN)
  N Valid          2000
  N Missing        0
  Mean             .5943
  Median           .6000
  Mode             .57
  Std. Deviation   .13712

Using the Central Limit Theorem, the expected mean of the means is µ = 0.6 and the expected standard deviation is σ = 0.1386. The mean (.5943) and standard deviation (.13712) of the means computed in SPSS are very close to these values.

Write your understanding about the Central Limit Theorem (CLT). How did this project help you understand the CLT? How can you use this result for solving problems in statistics? You may use any example from the book to explain.

The Central Limit Theorem (CLT) says that, regardless of the shape of the population, when the sample size is large (n ≥ 30) the sampling distribution of the sample mean is approximately normal. The CLT is very useful in solving statistics problems: even though the data themselves may not be normal, as the results for X1 above show, with a large sample I can treat the sample mean as approximately normally distributed without having to know the shape of the population.
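As a concrete instance of using the CLT, the part f prediction and the part g comparison can be reproduced in a few lines of Python. This is a sketch: the function name `clt_expected` is hypothetical, and the "observed" values are simply copied from the SPSS output for the MEAN column.

```python
import math

# Binomial parameters from part a and the sample size per row from part d
N_TRIALS, P_SUCCESS = 15, 0.04
SAMPLE_SIZE = 30


def clt_expected(n, p, sample_size):
    """CLT prediction for the sample mean: (mean, standard deviation)."""
    mu = n * p                              # population mean, np
    sigma = math.sqrt(n * p * (1 - p))      # population sd, sqrt(np(1-p))
    return mu, sigma / math.sqrt(sample_size)


mu_xbar, se_xbar = clt_expected(N_TRIALS, P_SUCCESS, SAMPLE_SIZE)

# Mean and standard deviation of the MEAN column reported by SPSS in part g
observed_mean, observed_sd = 0.5943, 0.13712

print(f"expected: mean = {mu_xbar:.4f}, sd = {se_xbar:.4f}")
print(f"observed: mean = {observed_mean:.4f}, sd = {observed_sd:.4f}")
print(f"relative difference in sd: {abs(observed_sd - se_xbar) / se_xbar:.1%}")
```

The sd prediction is computed from the unrounded σ = √0.576, so it matches the hand value 0.1386 to four decimals.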
Even though the results for X1 in the first tables do not indicate a normal distribution, the mean and standard deviation of the means are very close to the values I computed by hand assuming a normal sampling distribution. The CLT is very handy for solving statistics problems because all I need to check is that my sample size is at least 30, or that the population itself is normal, to assume that the sampling distribution of the mean is normal.
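The CLT behaviour described above can be illustrated with a small simulation. This sketch assumes the same setup as the project (2000 rows, each a sample of 30 Binomial(15, 0.04) values) but generates its own random data instead of using the SPSS file. It summarizes shape with a hypothetical `skewness` helper, which is not part of the project's SPSS output: a single column like X1 should be clearly right-skewed, while the row means should be much closer to symmetric, with a mean and standard deviation near the hand-computed 0.6 and 0.1386.

```python
import random
import statistics

N_TRIALS, P_SUCCESS = 15, 0.04  # population: Binomial(15, 0.04)
ROWS, SAMPLE_SIZE = 2000, 30    # 2000 rows, each a sample of size 30


def draw_binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials, i.e. one Binomial(n, p) draw."""
    return sum(1 for _ in range(n) if rng.random() < p)


def skewness(values):
    """Third standardized moment: close to 0 for a symmetric shape such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


rng = random.Random(120)  # fixed seed for reproducibility
table = [
    [draw_binomial(N_TRIALS, P_SUCCESS, rng) for _ in range(SAMPLE_SIZE)]
    for _ in range(ROWS)
]

x1 = [row[0] for row in table]                        # one column, playing the role of X1
row_means = [statistics.fmean(row) for row in table]  # part d: mean of each row

print(f"skewness of X1        : {skewness(x1):.3f}")
print(f"skewness of row means : {skewness(row_means):.3f}")
print(f"mean of the row means : {statistics.fmean(row_means):.4f}")
print(f"sd of the row means   : {statistics.stdev(row_means):.4f}")
```

Skewness is used here only as a compact numeric summary of what the histograms and Q-Q plots show visually; for this population theory puts it near 1.2 for single values and near 1.2/√30 ≈ 0.22 for means of 30.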