C-N Math 201 Statistics – R 6

R U ready to rumble???

Things you are going to learn in this lesson: understanding the features of the Normal distribution, and using Q-Q plots to test for a Normal distribution.

1. Start R from either the Start Menu or the MS main screen.

2. Recall that if an rv X from a population has a distribution with a mean of μ and a standard deviation of σ, the sampling distribution of the sample mean (X-bar) from that population has a mean of μ (same thing) and a standard deviation of σ/√n. So, for example, if the random variable X represents the contents of cola in a 300ml bottle and X has a mean of 298ml and an sd of 2ml, then the sampling distribution of the sample mean of the contents of the bottles in a sample of 7 is expected to have a mean of 298ml and an sd of .7559ml. Why??? Because μ = 298, σ = 2, and n = 7, so σ/√n = 2/√7 ≈ .7559. The CLT informed us that when the sample size is large enough, the distribution of the sample mean becomes bell shaped. It is here that we introduce the Normal distribution as the result of the CLT.

3. The Normal distribution is the first continuous rv/distribution that we will study. We write X ~ N(μ, σ) to mean that the rv X has the Normal distribution with mean μ and standard deviation σ. Let's look at a sample of observations from a N(15, 2) rv. Put the following commands on the command line.

    normvariable <- rnorm(100, mean = 15, sd = 2)
    stem(normvariable)
    boxplot(normvariable)

Observe the graphs of this rv. You should be able to see the bell shaped nature of the distribution. But there is yet another way to determine the “normality” of an rv ... the famous Q-Q plot! This is simply a quantile plot comparing the data values from the data set with what would be expected for a normal rv. In R this is done easily with the following command at the command line.

    qqnorm(normvariable)

Notice that you have a plot going up diagonally from left to right.
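The Q-Q plot above can also be given an explicit reference line. Here is a minimal sketch, assuming base R's qqline(), which by default draws a line through the first and third quartiles of the sample. The seed, the name demo_normal, and the correlation check are illustrative additions, not part of the lab's own commands.

```r
# Illustrative sketch: the seed and the name demo_normal are assumptions,
# not part of the lab's own commands.
set.seed(201)
demo_normal <- rnorm(100, mean = 15, sd = 2)   # same N(15, 2) setup as above

qqnorm(demo_normal)                # sample quantiles vs. theoretical normal quantiles
qqline(demo_normal, col = "red")   # reference line through the 1st and 3rd quartiles

# Numeric companion to the picture: correlation between the plotted coordinates.
qq <- qqnorm(demo_normal, plot.it = FALSE)   # returns the x and y coordinates
qq_cor <- cor(qq$x, qq$y)
qq_cor   # values close to 1 mean the points hug a straight line
```

The correlation is only a rough summary; the plot itself is still the thing to look at, especially at the two ends where departures from normality tend to show up.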
Simply compare the points on the graph with an imaginary diagonal line going from the lower left corner to the upper right corner of the graph. In this case the points should line up very close to the diagonal line. From here on, this will be another tool we use to ascertain the normality of a random variable.

4. We will now simulate the sampling distribution of the sample mean of the contents of cola bottles taken from a N(298, 2) ml distribution. To do this, we are going to simulate 100 sample means, each from a sample of size 7, and then look at the sample distribution. Remember the difference between the terms sampling distribution and sample distribution. The first is the distribution of the sample mean. The second is something that you've observed many times to this point ... the distribution of a sample. We're going to use the second to assess the first. You may remember some of this stuff from the previous lab, but we'll go through it again here. Put the following commands on the command line.

    B <- c()
    for (i in 1:100) B <- cbind(B, rnorm(7, mean = 298, sd = 2))
    B

What you now have/see is a data set (the 7 x 100 matrix B) of 100 samples of size 7 of the aforementioned rv X, one sample per column. The command rnorm(7, mean = 298, sd = 2) just told the computer through R to generate one sample of size 7 of the rv X, and the loop repeated it 100 times. Now put the following on the command line.

    samplemean <- colSums(B)/7
    samplemean

We now have the 100 sample means from the samples of 7 bottles (not contents of individual bottles, but means of contents of 7 bottles) in the data set samplemean. Let's analyze the sample mean of X with the following commands at the command line.

    stem(samplemean)
    qqnorm(samplemean)
    mean(samplemean)
    sd(samplemean)

Now you should have a Q-Q plot, stem 'n' leaf plot, mean and standard deviation for samplemean. Open an MS Word file and name it Rlab6. Look at the results you've just obtained in R, copy them to your MS Word file, and answer the following questions.
Does the distribution of samplemean appear to be normal? Do the mean and standard deviation match what was expected in step 2? Write the answers in your MS Word file for turn in at the end of lab.

5. In the previous lab we saw how the sample mean of any distribution becomes normal asymptotically. So obviously normal distributions are important. What we are going to do now is to demonstrate more of the properties of this distribution via simulation. First notice that we can now write the Central Limit Theorem in a much easier format. If the sample size n is large (generally n > 40), then approximately samplemean ~ N(μ, σ/√n), where μ is the mean of X, σ is the standard deviation of X, and samplemean is the sample mean of X. Again, notice that nothing is mentioned about the shape of the distribution of X. This is why the Central Limit Theorem is so powerful. We will now take up some probability questions dealing with the CLT and normal distribution, and answer these using 100 simulations. In each case, the answers to the probability questions can be got by counting the observations in the stem 'n' leaf plot that fit the criteria and dividing by 100.

6. Now here’s the first dealio. The NAEP was a test given around the year 2000 to high school seniors to measure academic prowess in the USA. According to reference, the NAEP score ~ N(300, 36). What is the probability that a randomly selected test score exceeds 300? How about exceeding 336? To answer these bad boys, try using the following commands in R at the command line.

    naep <- rnorm(100, mean = 300, sd = 36)
    stem(naep)

Now that you have the stemplot, copy/paste that to your MS Word file and answer the questions for turn in at the end of lab.

7. Now we R ready to rumble! Answer the same probability questions as in the previous step, except this time let the rv be the mean from a sample of size four from the same distribution. You may want to incorporate some R code seen previously in step 4.
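For reference, the step 4 pattern can be written more compactly. This is a minimal sketch under stated assumptions: it reuses the cola numbers from step 4 (not the NAEP problem), swaps the cbind loop for base R's replicate(), and uses an arbitrary 299 ml cutoff to show the count-and-divide idea from step 5. The seed and the names sim_means, cutoff, and prop_above are illustrative.

```r
# Illustrative sketch; the seed and the names sim_means, cutoff and
# prop_above are assumptions added for this example.
set.seed(201)

# 100 simulated sample means, each from a sample of 7 bottles drawn from N(298, 2)
sim_means <- replicate(100, mean(rnorm(7, mean = 298, sd = 2)))

stem(sim_means)

# Proportion of the 100 sample means above an arbitrary cutoff of 299 ml:
# the same count-and-divide-by-100 you would read off the stemplot
cutoff <- 299
prop_above <- mean(sim_means > cutoff)
prop_above
```

Taking mean() of a TRUE/FALSE vector gives the fraction of TRUEs, so it should agree with a hand count from the stemplot, which makes it a handy cross-check.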
Remember that you need 100 simulations. Put the stem 'n' leaf plot and your answers in your MS Word file for turn in at the end of lab.

8. Next, we are going to do an internet access problem ... again by simulation. It was determined that in the USA in 2000, Internet access costs ~ EXP(28, 10)$ ... in this case it is a shifted exponential rv with a mean of $28 and sd of $10. Although we know the mean and standard deviation, we can anticipate that the original distribution is right skewed (not normal … think about this). So given a simple random sample of 500 households, what is the probability that the sample mean from that sample exceeds $29? Use the following R code to simulate a sample of 500 rv Internet Access Costs (IAC).

    iac <- rexp(500, rate = 1/10) + 18

Note that the rate is the inverse of the sd, and that the shift of 18 moves the mean from 10 up to 28. Again, use R code from step 4 to obtain and analyze the sample mean. Once you have the stemplot of the 100 sims, copy/paste that to your MS Word file and answer the question for turn in at the end of lab.

Okay smarty pants, now you've got the hang of this! Answer the last two questions just like in steps 7 and 8, using a stemplot of 100 sims of the sample mean of interest. You're on your own. Do it! Do it!

9. The number of accidents at a busy intersection each week has a mean of 2.2 accidents and a standard deviation of 1.4 accidents. This is a “count” distribution, so we will initially use the Poisson (PO) distribution and say X (# of accidents/week) ~ PO(2.2). Don’t worry about this … just take my word for it. Suppose we take a mean based on all the weeks in a year (that, incidentally, would be 52). What is the probability that the sample mean is less than 2? What is the probability that there are fewer than 100 accidents in that year? Use the following R code to simulate a sample of 52 rv Accidents per week (APW).

    apw <- rpois(52, lambda = 2.2)

Note that lambda is the mean of the Poisson distribution.

10.
Finally, the level of nitrogen oxides (NOX) in the exhaust of a particular car model has a mean of .9 g/mi (gram/mile) and a standard deviation of .15 g/mi. This is, again, a shifted Exponential rv, so we will initially say that the level of NOX ~ EXP(.9, .15) g/mi (note that the rate is 1/.15 and the shift is .75 g/mi). Now we want to know, given a simple random sample of 125 cars, a) what is the distribution of the sample mean of NOX, and b) what is the 99th %ile of the sample mean of NOX (written x̄ with subscripts 125 and .01)?

11. Turn in the documented printouts/answers from step 4 and steps 6 through 10.

12. Exit R and log out of your C-N account before you leave. Don’t forget to take your property!!!
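Simulated answers can be sanity-checked against the normal approximation from step 5. Here is a minimal sketch, assuming the cola example from step 2 (mean 298, sd 2, n = 7) rather than any of the turn-in problems. The names mu, sigma, n, se, p_above, and q99, and the 299 ml cutoff, are illustrative. pnorm() and qnorm() are base R's normal cdf and quantile functions.

```r
# Illustrative sketch using the cola numbers from step 2; all names here
# are assumptions added for the example.
mu    <- 298
sigma <- 2
n     <- 7

se <- sigma / sqrt(n)   # standard deviation of the sample mean
se                      # about 0.7559, as in step 2

# P(sample mean > 299) under N(mu, se)
p_above <- pnorm(299, mean = mu, sd = se, lower.tail = FALSE)
p_above

# 99th percentile of the sample mean under N(mu, se)
q99 <- qnorm(0.99, mean = mu, sd = se)
q99
```

With only 100 simulations, a stemplot answer will wobble around the theoretical value by a few percentage points, so expect rough agreement, not a perfect match.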