MAT1028 Laboratory Session 1: Introduction to R
R is a free, open-source implementation of the S language, on which the commercial statistical package S-PLUS is also based.
To access R:
• Log on to a workstation as usual
• Click on Start
• Go to All Programs
• Go to Other Department Software
• Go to Maths and Stats
• Click on Rgui.exe and the R window will open
R is a command-based language. Please work through the following exercises at your own pace. It is
recommended that you copy and paste the R commands and output, including plots, into a Word
document as you progress through the sheet.
Hint: if you wish to edit an earlier command line, use the “up arrow” key to recall it and save typing out the entire
line again.
At the end of the hour, type q() to quit R.
The Binomial Distribution
Start by plotting the probability distributions of B(10,0.25),
B(10,0.50) and B(10,0.75). First split the graphics window into three using the command
par(mfrow=c(1,3)). Then use the command ‘dbinom’ to calculate the probabilities for each
distribution:
plot(0:10,dbinom(0:10,10,0.25),type='h',lwd=4)
plot(0:10,dbinom(0:10,10,0.50),type='h',lwd=4)
plot(0:10,dbinom(0:10,10,0.75),type='h',lwd=4)
(In the above commands, type='h' draws ‘high-density’ vertical lines and lwd gives the line width; you
can experiment with different values for lwd.)
How do the three plots differ?
The command ‘pbinom’ calculates the cumulative distribution function (i.e. P(X ≤ x)) of a binomial
distribution. To plot the cumulative distribution functions of B(10,0.25), B(10,0.50) and B(10,0.75) use
the following:
plot(0:10,pbinom(0:10,10,0.25),type='h',lwd=4)
plot(0:10,pbinom(0:10,10,0.50),type='h',lwd=4)
plot(0:10,pbinom(0:10,10,0.75),type='h',lwd=4)
How do these plots differ?
Specific probabilities can be calculated. For example, to calculate P(X = 6) for a B(10,0.9) distribution
use dbinom(6,10,0.9). To calculate P(X <= 6) for a B(10,0.9) distribution use pbinom(6,10,0.9). Note
that X takes only whole-number values, so P(X < 6) = P(X <= 5); to calculate P(X < 6) for a B(10,0.9) distribution use pbinom(5,10,0.9).
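The relationship between ‘dbinom’ and ‘pbinom’ described above can be checked numerically. The following is a minimal sketch; the variable names (p_eq_6, p_le_6, p_lt_6, sum_to_6) are chosen here purely for illustration.

```r
# Probabilities for X ~ B(10, 0.9)
n <- 10
p <- 0.9

p_eq_6 <- dbinom(6, n, p)   # P(X = 6)
p_le_6 <- pbinom(6, n, p)   # P(X <= 6)
p_lt_6 <- pbinom(5, n, p)   # P(X < 6), which equals P(X <= 5)

# A cumulative probability is the sum of the point probabilities up to that value
sum_to_6 <- sum(dbinom(0:6, n, p))

print(c(p_eq_6 = p_eq_6, p_le_6 = p_le_6, p_lt_6 = p_lt_6))
print(all.equal(p_le_6, sum_to_6))          # TRUE
print(all.equal(p_le_6 - p_lt_6, p_eq_6))   # TRUE
```

The last line shows why the argument drops from 6 to 5 for a strict inequality: the two cumulative probabilities differ by exactly P(X = 6).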
Now, using commands such as
plot(0:20,dbinom(0:20,20,0.25),type='h',lwd=4)
and
plot(0:20,pbinom(0:20,20,0.25),type='h',lwd=4)
produce plots of the probability distributions and
cumulative distribution functions of B(20,0.25), B(20,0.50) and B(20,0.75).
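One possible way to produce all six plots in a single graphics window is sketched below. The loop, the 2-by-3 layout and the plot titles are choices made here for illustration; typing the six plot commands one at a time works equally well.

```r
# Probability distributions (top row) and cumulative distribution
# functions (bottom row) of B(20, p) for p = 0.25, 0.50 and 0.75.
n <- 20
x <- 0:n
probs <- c(0.25, 0.50, 0.75)

par(mfrow = c(2, 3))   # 2 rows and 3 columns of plots, filled row by row
for (p in probs) {
  plot(x, dbinom(x, n, p), type = 'h', lwd = 4,
       main = paste0("B(", n, ",", p, ")"), ylab = "P(X = x)")
}
for (p in probs) {
  plot(x, pbinom(x, n, p), type = 'h', lwd = 4,
       main = paste0("B(", n, ",", p, ")"), ylab = "P(X <= x)")
}
par(mfrow = c(1, 1))   # restore a single plotting panel
```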
Examples using Binomial Distributions
1) Let X be a random variable with a B(12, 0.35) distribution. Calculate:
a) P(X=4);
b) P(X=7);
c) P(X<=5);
d) P(X>6);
e) P(2<X<=5);
f) P(X<4| X<7).
2) A commuter travels to work by train. The train is late with probability 0.15. Identify the distribution
that can be used to model the number of times that the train is late over the next 4 weeks (i.e. 20 working
days). What assumptions are you making? Calculate the probability that the train is late:
a) exactly 3 times;
b) no more than 5 times;
c) exactly 5 times;
d) more than 8 times.
3) A multiple choice paper contains 20 questions and each question has 5 possible answers. A pass is
obtained if 8 or more correct answers are given. Determine the probability of passing by guesswork
alone.
Numerical Answers
1) a) 0.2366924, b) 0.05912461, c) 0.7872646, d) 0.08463207, e) 0.635977, f) 0.3787031
2) a) 0.2428289, b) 0.932692, c) 0.1028452, d) 0.001328908
3) 0.03214266
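For checking your own working, the commands below are one way to arrive at the numerical answers above. They are a sketch rather than the only valid approach, and the vector names q1, q2 and q3 are illustrative. Question 2 assumes that the days are independent and that the probability of a late train is the same every day.

```r
# Question 1: X ~ B(12, 0.35)
q1 <- c(a = dbinom(4, 12, 0.35),                         # P(X = 4)
        b = dbinom(7, 12, 0.35),                         # P(X = 7)
        c = pbinom(5, 12, 0.35),                         # P(X <= 5)
        d = 1 - pbinom(6, 12, 0.35),                     # P(X > 6)
        e = pbinom(5, 12, 0.35) - pbinom(2, 12, 0.35),   # P(2 < X <= 5)
        f = pbinom(3, 12, 0.35) / pbinom(6, 12, 0.35))   # P(X < 4 | X < 7)

# Question 2: number of late days ~ B(20, 0.15)
q2 <- c(a = dbinom(3, 20, 0.15),       # exactly 3 times
        b = pbinom(5, 20, 0.15),       # no more than 5 times
        c = dbinom(5, 20, 0.15),       # exactly 5 times
        d = 1 - pbinom(8, 20, 0.15))   # more than 8 times

# Question 3: number of correct guesses ~ B(20, 1/5); a pass needs 8 or more
q3 <- 1 - pbinom(7, 20, 1/5)

print(q1)
print(q2)
print(q3)
```

In 1 f), the event X<4 is contained in the event X<7, so the conditional probability reduces to P(X<=3)/P(X<=6).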
The Poisson Distribution
The Poisson probability distribution is another example of a discrete distribution. The Poisson
distribution has one parameter, λ, and can be used to model the number of events in a fixed interval when events happen at rate λ. For example, if
calls are received by a telephone switchboard at a rate of 20 per hour, then the number of calls received in a
particular one-hour period can be modelled by a Poisson random variable with parameter 20. By
extension, the number of calls received in a particular 15-minute period can be modelled by a Poisson
random variable with parameter 20 × 15/60 = 5. Poisson random variables take the values 0, 1, 2, 3, …
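The scaling of the Poisson parameter with the length of the interval can be sketched in R as follows. The helper lambda_for is hypothetical (it is not a built-in R function) and simply multiplies the hourly rate by the fraction of an hour.

```r
# Calls arrive at a rate of 20 per hour
rate_per_hour <- 20

# Hypothetical helper: Poisson parameter for a period of the given length
lambda_for <- function(minutes) rate_per_hour * minutes / 60

lambda_60 <- lambda_for(60)   # 20
lambda_15 <- lambda_for(15)   # 5

# P(no calls in a 15 minute period) and P(exactly 20 calls in one hour)
print(dpois(0, lambda_15))
print(dpois(20, lambda_60))
```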
The Poisson probability distribution and cumulative distribution functions can be plotted using similar
commands to those used for the Binomial distributions. We will produce these plots for each of the
Poisson distributions Po(5), Po(10) and Po(15). To view all six plots together use the command
par(mfrow=c(3,2)). The command for the probability distribution function is ‘dpois’ and the command
for the cumulative distribution function is ‘ppois’.
The plot for the probability distribution for Po(5) can be obtained using the command:
plot(0:30,dpois(0:30,5),type='h', lwd=2)
Note that this command produces a plot giving the 31 probabilities P(X=0), P(X=1), ….,P(X=30) where
X is a random variable with a Po(5) distribution.
Similarly, the plot for the distribution function for Po(5) can be obtained using the command:
plot(0:30,ppois(0:30,5),type='h', lwd=2)
This command produces a plot giving the 31 cumulative probabilities P(X<=0), P(X<=1), P(X<=2), …, P(X<=30)
where X is a random variable with a Po(5) distribution.
Now obtain the corresponding plots for the Po(10) and Po(15) distributions. Note that although Poisson
random variables can take values larger than 30, for the three distributions chosen, the probabilities of
values larger than 30 are negligible.
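A sketch of how all six plots might be produced together is given below, followed by a check of how much probability lies above 30 for each distribution. The loop, the titles and the name tail_prob are illustrative choices, not part of the original sheet.

```r
lambdas <- c(5, 10, 15)
x <- 0:30

par(mfrow = c(3, 2))   # one row per distribution: probabilities, then cumulative
for (lambda in lambdas) {
  plot(x, dpois(x, lambda), type = 'h', lwd = 2,
       main = paste0("Po(", lambda, ")"), ylab = "P(X = x)")
  plot(x, ppois(x, lambda), type = 'h', lwd = 2,
       main = paste0("Po(", lambda, ")"), ylab = "P(X <= x)")
}
par(mfrow = c(1, 1))   # restore a single plotting panel

# P(X > 30) for Po(5), Po(10) and Po(15)
tail_prob <- 1 - ppois(30, lambdas)
print(tail_prob)
```

The tail probability is largest for Po(15), which is why the range 0 to 30 would need extending for larger parameters.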
Specific probabilities can be calculated.
In the following examples make sure that you understand why the command given is appropriate.
• For example, to calculate P(X=6) for a Po(5) distribution use dpois(6,5).
• To calculate the probability of getting a 7 or an 8 from a random variable with a Po(5)
distribution use dpois(7,5)+dpois(8,5).
• To calculate P(X<=6) for a Po(5) distribution use ppois(6,5).
• To calculate P(X<4) for a Po(5) distribution use ppois(3,5).
• To calculate P(X>7) for a Po(5) distribution use 1-ppois(7,5).
• To calculate P(X>8| X>6) for a Po(5) distribution use (1-ppois(8,5))/(1-ppois(6,5)).
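To help see why these commands are appropriate, the sketch below computes some of the same probabilities in two different ways and compares them. The variable names are illustrative; lower.tail is a standard argument of ppois that returns P(X > q) when set to FALSE.

```r
lambda <- 5

# P(X > 7): complement of the cumulative probability, or the upper tail directly
p_gt_7     <- 1 - ppois(7, lambda)
p_gt_7_alt <- ppois(7, lambda, lower.tail = FALSE)

# P(X = 7 or X = 8): sum of point probabilities, or a difference of cumulative ones
p_7_or_8     <- dpois(7, lambda) + dpois(8, lambda)
p_7_or_8_alt <- ppois(8, lambda) - ppois(6, lambda)

# P(X > 8 | X > 6) = P(X > 8 and X > 6) / P(X > 6) = P(X > 8) / P(X > 6),
# because every outcome with X > 8 also has X > 6
p_cond <- (1 - ppois(8, lambda)) / (1 - ppois(6, lambda))

print(all.equal(p_gt_7, p_gt_7_alt))       # TRUE
print(all.equal(p_7_or_8, p_7_or_8_alt))   # TRUE
print(p_cond)
```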
Examples using Poisson Distributions
4) The number of vehicles passing a checkpoint per hour can be modelled by a Poisson random variable
with parameter 18. Calculate the following using R:
a) What is the probability that exactly 17 cars pass the checkpoint in an hour?
b) What is the probability that 17, 18 or 19 cars pass the checkpoint in an hour?
c) What is the probability that 15 cars or fewer pass the checkpoint in an hour?
d) What is the probability that fewer than 12 cars pass the checkpoint in an hour?
e) Given that at least 15 cars pass, what is the probability that more than 20 cars pass the checkpoint in an
hour?
5) The number of incidents of volcanic activity (eruptions) recorded at a particular volcano over the period of a year
can be modelled by a Poisson distribution with parameter 0.12.
a) Calculate the probability that no eruptions are recorded in a given year.
b) Using an appropriate Binomial distribution, calculate the probability that in a 10-year period there are
(i) exactly 2 years in which eruptions occur
(ii) fewer than 4 years in which eruptions occur
(iii) no eruptions.
c) Using an appropriate Poisson distribution, calculate the probability that in a 10-year period there are
no eruptions. Compare your answer to that obtained in b)(iii).
Numerical Answers
4) a) 0.09359732, b) 0.2758658, c) 0.2866529, d) 0.05488742, e) 0.340033
5) a) 0.8869204, b) (i) 0.2203221, (ii) 0.9804371, (iii) 0.3011942, c) 0.3011942
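As with the binomial examples, the commands below are one possible route to the numerical answers above, offered as a sketch for checking your own working. The names q4, q5, p_none and p_year are illustrative. Question 5 b) assumes that the years are independent, so the number of years with at least one eruption is B(10, p_year).

```r
# Question 4: vehicles per hour ~ Po(18)
q4 <- c(a = dpois(17, 18),                                # exactly 17
        b = sum(dpois(17:19, 18)),                        # 17, 18 or 19
        c = ppois(15, 18),                                # 15 or fewer
        d = ppois(11, 18),                                # fewer than 12
        e = (1 - ppois(20, 18)) / (1 - ppois(14, 18)))    # P(X > 20 | X >= 15)

# Question 5: eruptions per year ~ Po(0.12)
p_none <- dpois(0, 0.12)   # P(no eruptions in a given year)
p_year <- 1 - p_none       # P(at least one eruption in a given year)

q5 <- c(a     = p_none,
        b_i   = dbinom(2, 10, p_year),   # exactly 2 years with eruptions
        b_ii  = pbinom(3, 10, p_year),   # fewer than 4 years with eruptions
        b_iii = dbinom(0, 10, p_year),   # no years with eruptions
        c     = dpois(0, 10 * 0.12))     # 10 year total ~ Po(1.2)

print(q4)
print(q5)
```

Parts b)(iii) and c) agree because (exp(-0.12))^10 = exp(-1.2).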
Random Samples From Binomial And Poisson Distributions
The function rpois generates random data from the Poisson distribution. For example the command
data10<-rpois(10,5) generates a random sample of size 10 from a Po(5) distribution. Here the vector
containing the data has been called ‘data10’, but of course, any name will do!
Having generated a random sample of size 10 from a Poisson distribution with parameter 5, next
produce a graphical representation of the data using hist(data10) and then calculate the mean and
variance of your sample with the commands mean(data10) and var(data10).
Now use similar commands to produce a random sample of size 50 from Po(5). Call the vector
containing the sample ‘data50’. Produce a histogram of the data and find the mean and variance.
Next generate random samples of sizes 100 and 1000. For the Poisson distribution with parameter λ, the
distribution mean and variance are both λ. The mean and variance of the samples tend to get closer to
the distribution mean and variance as the sample size increases.
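The effect of sample size can be seen in one table with a short loop, sketched below. The seed value is arbitrary and is set only so that a run can be repeated; your own samples will differ if you omit it or change it. The name results is illustrative.

```r
set.seed(1028)   # arbitrary seed, an assumption made here for repeatability

lambda <- 5
sizes <- c(10, 50, 100, 1000)

# For each sample size, draw a Po(5) sample and record its mean and variance
results <- t(sapply(sizes, function(n) {
  x <- rpois(n, lambda)
  c(size = n, mean = mean(x), variance = var(x))
}))

print(results)   # compare the mean and variance columns with lambda = 5
```

Individual small samples can still land close to 5 by chance; the tendency to get closer holds on average rather than for every run.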
Now generate random samples of various sizes from a Poisson distribution with parameter 14. Calculate
the mean and variance of each sample.
Random samples can also be generated from binomial distributions using the command rbinom. Use
the following commands to generate a random sample of size 10 from a B(20,0.6) distribution, produce
a histogram of the sample and calculate the mean and variance. For convenience the vector containing
the sample has been named ‘bdata10’.
bdata10<-rbinom(10,20,0.6)
hist(bdata10)
mean(bdata10)
var(bdata10)
The mean of the B(n,p) distribution is np.
The variance is np(1-p).
Hence, calculate the mean and variance of the B(20,0.6) distribution and compare these values with the
mean and variance of your sample.
Now try using sample sizes 50, 100 and 1000. What happens to the histogram as the sample size
increases? Do the mean and variance calculated from the data get closer to the theoretical values?
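A sketch of this comparison for all four sample sizes is given below. The seed is arbitrary and the names theory_mean, theory_var and summary_table are illustrative; the histograms are drawn in a 2-by-2 grid so that their shapes can be compared.

```r
set.seed(2024)   # arbitrary seed, an assumption made here for repeatability

n <- 20
p <- 0.6
theory_mean <- n * p             # np = 12
theory_var  <- n * p * (1 - p)   # np(1-p) = 4.8

sizes <- c(10, 50, 100, 1000)
summary_table <- matrix(NA, nrow = length(sizes), ncol = 3,
                        dimnames = list(NULL, c("size", "mean", "variance")))

par(mfrow = c(2, 2))   # one histogram per sample size
for (i in seq_along(sizes)) {
  bdata <- rbinom(sizes[i], n, p)
  hist(bdata, main = paste("Sample size", sizes[i]))
  summary_table[i, ] <- c(sizes[i], mean(bdata), var(bdata))
}
par(mfrow = c(1, 1))   # restore a single plotting panel

print(c(theory_mean = theory_mean, theory_var = theory_var))
print(summary_table)
```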