MAT1028 Laboratory Session 1: Introduction to R

R is a free implementation of the S language, which also underlies the commercial statistical package S-PLUS.

To access R:
    Log on to a workstation as usual
    Click on Start
    Go to All Programs
    Go to Other Department Software
    Go to Maths and Stats
    Click on Rgui.exe and the R window will open

R is a command-based language. Please work through the following exercises at your own pace. It is recommended that you copy and paste the R commands and output, including plots, into a Word document as you progress through the sheet.

Hint: if you wish to edit an earlier command line, use the “up arrow” key to recall it rather than typing out the entire line again.

At the end of the hour, type q() to quit R.

The Binomial Distribution

Start by plotting the probability distributions of the binomial distributions B(10,0.25), B(10,0.50) and B(10,0.75). First split the graphics window into three using the command par(mfrow=c(1,3)). Then use the command ‘dbinom’ to calculate the probabilities for each distribution:

    plot(0:10,dbinom(0:10,10,0.25),type='h',lwd=4)
    plot(0:10,dbinom(0:10,10,0.50),type='h',lwd=4)
    plot(0:10,dbinom(0:10,10,0.75),type='h',lwd=4)

(In the above commands, ‘h’ stands for ‘high-density’ vertical lines and lwd gives the line width; you can experiment with different values for lwd.)

How do the three plots differ?

The command ‘pbinom’ calculates the cumulative distribution function (i.e. P(X ≤ x)) of a binomial distribution. To plot the cumulative distribution functions of B(10,0.25), B(10,0.50) and B(10,0.75) use the following:

    plot(0:10,pbinom(0:10,10,0.25),type='h',lwd=4)
    plot(0:10,pbinom(0:10,10,0.50),type='h',lwd=4)
    plot(0:10,pbinom(0:10,10,0.75),type='h',lwd=4)

How do these plots differ?

Specific probabilities can be calculated. For example, to calculate P(X = 6) for a B(10,0.9) distribution use dbinom(6,10,0.9). To calculate P(X <= 6) for a B(10,0.9) distribution use pbinom(6,10,0.9). Note that to calculate P(X < 6) for a B(10,0.9) distribution you should use pbinom(5,10,0.9), because X takes whole-number values only and so P(X < 6) = P(X <= 5).
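As a quick check on the difference between P(X = x), P(X <= x) and P(X < x), the short sketch below adds up dbinom values for the B(10,0.9) example and compares the totals with pbinom. It is an illustration only and not part of the original exercises; the names probs, p_le_6 and p_lt_6 are chosen here for convenience.

```r
# Probabilities P(X = 0), ..., P(X = 10) for X ~ B(10, 0.9)
probs <- dbinom(0:10, 10, 0.9)

# P(X <= 6) is the sum of P(X = 0), ..., P(X = 6).
# R vectors start at index 1, so probs[1] holds P(X = 0).
p_le_6 <- sum(probs[1:7])
all.equal(p_le_6, pbinom(6, 10, 0.9))

# P(X < 6) equals P(X <= 5), because X only takes whole-number values
p_lt_6 <- sum(probs[1:6])
all.equal(p_lt_6, pbinom(5, 10, 0.9))

# The two cumulative probabilities differ by exactly P(X = 6)
all.equal(p_le_6 - p_lt_6, dbinom(6, 10, 0.9))
```

Each all.equal call should report TRUE, which confirms that pbinom is simply the running total of dbinom.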
Now, using commands such as

    plot(0:20,dbinom(0:20,20,0.25),type='h',lwd=4)
    plot(0:20,pbinom(0:20,20,0.25),type='h',lwd=4)

produce plots of the probability distributions and cumulative distribution functions for B(20,0.25), B(20,0.50) and B(20,0.75).

Examples using Binomial Distributions

1) Let X be a random variable with a B(12, 0.35) distribution. Calculate:
a) P(X=4);
b) P(X=7);
c) P(X<=5);
d) P(X>6);
e) P(2<X<=5);
f) P(X<4 | X<7).

2) A commuter travels to work by train. The train is late with probability 0.15. Identify the distribution that can be used to model the number of times that the train is late over the next 4 weeks (i.e. 20 work days). What assumptions are you making? Calculate the probability that the train is late:
a) exactly 3 times;
b) no more than 5 times;
c) exactly 5 times;
d) more than 8 times.

3) A multiple choice paper contains 20 questions and each question has 5 possible answers. A pass is obtained if 8 or more correct answers are given. Determine the probability of passing by guesswork alone.

Numerical Answers

1) a) 0.2366924, b) 0.05912461, c) 0.7872646, d) 0.08463207, e) 0.635977, f) 0.3787031
2) a) 0.2428289, b) 0.932692, c) 0.1028452, d) 0.001328908
3) 0.03214266

The Poisson Distribution

The Poisson probability distribution is another example of a discrete distribution. The Poisson distribution has one parameter, λ, and can be used to model events that happen at rate λ. For example, if calls are received by a telephone switchboard at a rate of 20 per hour, then the number of calls received in a particular one hour period can be modelled by a Poisson random variable with parameter 20. By extension, the number of calls received in a particular 15 minute period can be modelled by a Poisson random variable with parameter 20 × 15/60 = 5. Poisson random variables take the values 0, 1, 2, 3, ….

The Poisson probability distribution and cumulative distribution functions can be plotted using similar commands to those used for the Binomial distributions.
We will produce these plots for each of the Poisson distributions Po(5), Po(10) and Po(15). To view all six plots together use the command par(mfrow=c(3,2)). The command for the probability distribution function is ‘dpois’ and the command for the cumulative distribution function is ‘ppois’.

The plot of the probability distribution for Po(5) can be obtained using the command:

    plot(0:30,dpois(0:30,5),type='h', lwd=2)

Note that this command produces a plot giving the 31 probabilities P(X=0), P(X=1), …, P(X=30) where X is a random variable with a Po(5) distribution.

Similarly, the plot of the cumulative distribution function for Po(5) can be obtained using the command:

    plot(0:30,ppois(0:30,5),type='h', lwd=2)

This command produces a plot giving the 31 probabilities P(X<=0), P(X<=1), P(X<=2), …, P(X<=30) where X is a random variable with a Po(5) distribution.

Now obtain the corresponding plots for the Po(10) and Po(15) distributions. Note that although Poisson random variables can take values larger than 30, for the three distributions chosen the probabilities of values larger than 30 are negligible.

Specific probabilities can be calculated. In the following examples make sure that you understand why the command given is appropriate.

To calculate P(X=6) for a Po(5) distribution use dpois(6,5).
To calculate the probability of getting a 7 or an 8 from a random variable with a Po(5) distribution use dpois(7,5)+dpois(8,5).
To calculate P(X<=6) for a Po(5) distribution use ppois(6,5).
To calculate P(X<4) for a Po(5) distribution use ppois(3,5).
To calculate P(X>7) for a Po(5) distribution use 1-ppois(7,5).
To calculate P(X>8 | X>6) for a Po(5) distribution use (1-ppois(8,5))/(1-ppois(6,5)).

Examples using Poisson Distributions

4) The number of vehicles passing a checkpoint per hour can be modelled by a Poisson random variable with parameter 18. Calculate the following using R:
a) What is the probability that exactly 17 cars pass the checkpoint in an hour?
b) What is the probability that 17, 18 or 19 cars pass the checkpoint in an hour?
c) What is the probability that 15 cars or fewer pass the checkpoint in an hour?
d) What is the probability that fewer than 12 cars pass the checkpoint in an hour?
e) Given that at least 15 cars pass, what is the probability that more than 20 cars pass the checkpoint in an hour?

5) The number of incidents of volcanic activity recorded at a particular volcano over the period of a year can be modelled by a Poisson distribution with parameter 0.12.
a) Calculate the probability that no eruptions are recorded in a given year.
b) Using an appropriate Binomial distribution, calculate the probability that in a 10 year period there are (i) exactly 2 years in which eruptions occur, (ii) fewer than 4 years in which eruptions occur, (iii) no eruptions.
c) Using an appropriate Poisson distribution, calculate the probability that in a 10 year period there are no eruptions. Compare your answer to that obtained in b)(iii).

Numerical Answers

4) a) 0.09359732, b) 0.2758658, c) 0.2866529, d) 0.05488742, e) 0.340033
5) a) 0.8869204, b) (i) 0.2203221, (ii) 0.9804371, (iii) 0.3011942, c) 0.3011942

Random Samples From Binomial And Poisson Distributions

The function rpois generates random data from the Poisson distribution. For example, the command

    data10<-rpois(10,5)

generates a random sample of size 10 from a Po(5) distribution. Here the vector containing the data has been called ‘data10’, but any name will do.

Having generated a random sample of size 10 from a Poisson distribution with parameter 5, next produce a graphical representation of the data using hist(data10), and then calculate the mean and variance of your sample with the commands mean(data10) and var(data10).

Now use similar commands to produce a random sample of size 50 from Po(5). Call the vector containing the sample ‘data50’. Produce a histogram of the data and find the mean and variance. Next generate random samples of sizes 100 and 1000.
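If you would rather not retype the commands for each sample size, the steps above can be collected in a loop. The following is a minimal sketch, not part of the original sheet: the names sizes and results are made up for this illustration, and the call to set.seed is an assumption added only so that a run can be repeated exactly (your own samples will differ if you leave it out or change the value).

```r
set.seed(1)  # assumption: fixed seed for repeatability; any value will do

sizes <- c(10, 50, 100, 1000)

# One row per sample size; the mean and variance columns are filled in below
results <- data.frame(size = sizes, mean = NA_real_, variance = NA_real_)

for (i in seq_along(sizes)) {
  x <- rpois(sizes[i], 5)          # random sample of the given size from Po(5)
  results$mean[i] <- mean(x)       # sample mean
  results$variance[i] <- var(x)    # sample variance
}

results
```

Typing results at the end prints the table, so the sample means and variances for the four sample sizes can be compared side by side.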
For the Poisson distribution with parameter λ, the distribution mean and variance are both λ. The mean and variance of the samples tend to get closer to the distribution mean and variance as the sample size increases.

Now generate random samples of various sizes from a Poisson distribution with parameter 14. Calculate the mean and variance of each sample.

Random samples can also be generated from binomial distributions using the command rbinom. Use the following commands to generate a random sample of size 10 from a B(20,0.6) distribution, produce a histogram of the sample and calculate the mean and variance. For convenience the vector containing the sample has been named ‘bdata10’.

    bdata10<-rbinom(10,20,0.6)
    hist(bdata10)
    mean(bdata10)
    var(bdata10)

The mean of the B(n,p) distribution is np. The variance is np(1-p). Hence, calculate the mean and variance of the B(20,0.6) distribution and compare these values with the mean and variance of your sample.

Now try using sample sizes 50, 100 and 1000. What happens to the histogram as the sample size increases? Do the mean and variance calculated from the data get closer to the theoretical values?
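One possible way to organise the comparison for the B(20,0.6) distribution is sketched below. It is an illustration under stated assumptions rather than a required method: the helper function sample_summary and the names theory and estimates are invented for this example, and set.seed is used only so that the output can be reproduced.

```r
set.seed(2)  # assumption: fixed seed for repeatability; any value will do

n <- 20
p <- 0.6

# Theoretical mean np and variance np(1-p) of B(20, 0.6)
theory <- c(mean = n * p, variance = n * p * (1 - p))

# Hypothetical helper: draw one sample of the given size from B(n, p)
# and return its sample mean and sample variance
sample_summary <- function(size) {
  x <- rbinom(size, n, p)
  c(mean = mean(x), variance = var(x))
}

sizes <- c(10, 50, 100, 1000)
estimates <- sapply(sizes, sample_summary)  # one column per sample size
colnames(estimates) <- sizes

theory
estimates
```

The theoretical values are 12 for the mean and 4.8 for the variance, so each column of estimates can be read against those two numbers. Histograms are left out of the sketch; use hist on each sample as described above to see how the shape changes.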