Lab100 Week 17: Binomial Probability Distribution
By the end of the session you should be able to:
− identify some pmfs (probability mass functions)
− simulate Bernoulli random variables
− use the Binomial pmf to calculate probabilities.
New R commands
rbinom dbinom choose barplot prod lines
Q 17.1 WS: Probability Mass Functions A probability mass function (pmf) is a function
that gives the probability that a discrete random variable is exactly equal to some value. If
p(r) is a probability mass function then:
• p(r) ≥ 0 for all r
• $\sum_r p(r) = 1$
Q 17.2 WS: Indicator Functions An indicator function such as $I_A(x)$ takes the value 1 when
x ∈ A and 0 otherwise. In R an indicator is expressed as TRUE or FALSE. For example to
express positivity of a vector x we type (x > 0). R will return a vector of the same length as
x with entries TRUE or FALSE depending on whether the entry in x satisfies the condition
or not. Try using indicators in the next question to test whether the functions are pmfs or
not.
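As a small illustration of the indicator idea above, the sketch below applies a positivity test to a made-up vector (the values in x are arbitrary and not taken from the worksheet). Because R treats TRUE as 1 and FALSE as 0 in arithmetic, summing an indicator counts how many entries satisfy the condition.

```r
# Indicator of positivity applied to a small made-up vector
x = c(-1.5, 2, 0, 3.2)
ind = (x > 0)        # logical vector, one entry per element of x
ind
sum(ind)             # TRUE counts as 1, so this counts the positive entries
any(x < 0)           # TRUE if at least one entry is negative
```

The any function is not in this week's list of new commands; it is used here only as a compact way to ask whether an indicator is TRUE anywhere.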
Q 17.3 WS: Valid probability distributions Probability distributions with a finite number
of mass points can be represented as vectors. Using the summation operations and a test of
negativity we can isolate valid probability distributions:
p = c( 0.2, 0.3, 0.4, 0.5 )
sum(p)
(p<0)
Which vectors could represent probability distributions on the integers 1, 2, 3, 4 ?
(i) 0.2 0.3 0.1 0.5
(ii) 0.2 0.3 0.4 0.1
(iii) 0.2 0.3 0.4 -0.1
Give reasons.
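The two checks above (entries sum to 1, no entry is negative) can be wrapped into one function. This is only a sketch: the name is_pmf and the tolerance tol are assumptions made for illustration, not part of the worksheet. A tolerance is used because decimal fractions are stored approximately, so a sum that is 1 on paper may differ from 1 by a tiny rounding error in R.

```r
# Hypothetical helper: TRUE if the vector p is a valid pmf
is_pmf = function(p, tol = 1e-8) {
  no_negatives = !any(p < 0)               # indicator test for negativity
  sums_to_one  = abs(sum(p) - 1) < tol     # allow for rounding error
  no_negatives && sums_to_one
}

p = c(0.2, 0.3, 0.4, 0.5)   # the example vector from the code above
sum(p)
is_pmf(p)                   # FALSE, because the entries sum to more than 1
```

You could apply the same function to the vectors (i) to (iii), but you should still be able to state the reason in each case.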
Q 17.4 WS: Bernoulli random variable. The Bernoulli random variable takes the values
0 and 1 with probability 1 − θ and θ respectively. Clearly 0 ≤ θ ≤ 1. The probability
distribution takes the values [1 − θ, θ] on the points 0, 1.
To simulate five Bernoulli(0.3) variables:
x = rbinom(5,size=1,prob=0.3)
x
x = rbinom(5,1,0.3)
x
The R function rbinom generates Binomial random variables, but when the size is 1 these are
the same as Bernoulli random variables. Recall a coin toss: this is an experiment that has
two outcomes, heads or tails. If it is a fair coin then the probability of a head or a tail is
0.5. This is an example of a Bernoulli random variable with probability θ = 1 − θ = 0.5. So
we can simulate five Bernoulli(0.5) variables by tossing a coin five times and assigning
heads = 0 and tails = 1.
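With only five simulated values it is hard to see the role of θ. A minimal sketch, assuming a larger sample of 10000 and an arbitrary seed (both chosen here for illustration), shows that the proportion of 1s settles near θ.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
theta = 0.3
x = rbinom(10000, size = 1, prob = theta)
table(x)                             # counts of 0s and 1s
mean(x)                              # proportion of 1s, should be near theta
```

Since each value is 0 or 1, mean(x) is exactly the fraction of simulations that came out as 1.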
Q 17.5 WS: Binomial random variable. The Binomial random variable, R ∼ Binomial(n, θ),
is characterised as the total number of successes in n trials when the trials are independent
and each has the same probability, θ, of success.
To simulate five Binomial(100, θ) random variables with θ = 0.3:
x = rbinom(5,100,0.3)
x
x = rbinom(5,100,0.3)
x
Q 17.6 WS: Simulating Binomial from Bernoulli random variables. The binomial distribution arises in the following way. Theory states that if $X_1, X_2, \dots, X_n$ are independent
Bernoulli(θ) random variables and
$$R = X_1 + X_2 + \cdots + X_n$$
then R is distributed as Binomial(n, θ). For instance, R may represent the number of heads
in n throws of a coin. Let us see what this means empirically. Set n, θ as follows.
theta = 0.3 ; n = 100;
x = rbinom(n,1,theta) ;
r = sum(x) ; r
Run the code several times. Are the values of r similar to those generated in the previous question?
Repeat with θ = 0.7
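One way to make the comparison less dependent on a single run is to repeat the sum many times and compare summaries. The sketch below assumes 2000 repetitions and an arbitrary seed; replicate is a standard R function that is not in this week's list of new commands. The theoretical mean nθ and variance nθ(1 − θ) of the Binomial are standard results, included for reference.

```r
set.seed(2)                                  # arbitrary seed
theta = 0.3 ; n = 100 ; nsim = 2000
# each repetition sums n Bernoulli(theta) variables
r_sum = replicate(nsim, sum(rbinom(n, 1, theta)))
# direct Binomial(n, theta) simulations for comparison
r_bin = rbinom(nsim, n, theta)
c(mean(r_sum), mean(r_bin), n*theta)                # sample means vs n*theta
c(var(r_sum), var(r_bin), n*theta*(1 - theta))      # sample variances vs theory
```

If the two constructions have the same distribution, the first two entries of each line should be close to each other and to the third.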
Q 17.7 WS: Products. The number 5! = 5 × 4 × 3 × 2 × 1 = 120. Evaluate
5*4*3*2*1
prod(1:5)
prod(1:25)
The binomial coefficient is
$$\binom{n}{r} = \frac{n!}{r!\,(n - r)!}.$$
Try evaluating $\binom{5}{i}$ for i = 1, . . . , 5. We can also use R to calculate this for us using the
choose function. It takes two arguments, n and r. Repeat the above using choose(5,i) for
i = 1, . . . , 5 and see if you get the same answers.
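The comparison between the factorial formula and choose can be sketched as follows. It uses the built-in factorial function, which is not among this week's listed commands but is part of base R and works on vectors.

```r
n = 5
i = 1:5
# n! / (i! (n - i)!) evaluated for every i at once
by_formula = factorial(n) / (factorial(i) * factorial(n - i))
by_formula
choose(n, i)
all.equal(by_formula, choose(n, i))   # TRUE if the two agree
```

Using prod here needs some care: at i = n the expression prod(1:(n - i)) becomes prod(1:0), and 1:0 is the vector c(1, 0), so the product is 0 rather than 0! = 1.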
Binomial probability distribution
The general analytic formula for the probability distribution is
$$p(r) = \binom{n}{r}\,\theta^{r}(1-\theta)^{n-r}, \qquad r = 0, 1, \dots, n.$$
When n = 5 and θ = 0.3,
$$p(r) = \binom{5}{r}\,0.3^{r}\,0.7^{5-r}, \qquad r = 0, 1, \dots, 5,$$
and is
0.1681 0.3601 0.3087 0.1323 0.0283 0.0024
numerically.
Q 17.8 WS: Coding the Binomial probability distribution in R This mathematical formula
can be coded directly in R
n = 5 ; theta = 0.3
p = rep(0, n+1)   # initialise
for (r in 0:n){
p[r+1] = choose(n,r)*theta^r*(1-theta)^(n-r)
}
p
p[1]
Here the rep function takes two arguments, the first is the value to repeat and the second
is how many times to repeat the value. So in the above, rep(0,n+1) will repeat the value
0, n + 1 times. Verify that the loop delivers the same values as the numerical probabilities listed above.
Usually maths notation and computer maths notation agree well. Here however there is a
problem. The mathematical p(0) of the Binomial probability distribution has to be stored
in the vector p starting at the first element which is p[1].
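Because choose and the arithmetic operators in R work element by element on vectors, the same formula can also be written without a loop. This is a sketch of an alternative, not a replacement for the loop above; the comparison uses dbinom, the built-in function introduced in the next question.

```r
n = 5 ; theta = 0.3
r = 0:n                                            # all possible values of R
p = choose(n, r) * theta^r * (1 - theta)^(n - r)   # pmf at every r at once
round(p, 4)
sum(p)                                             # a pmf must sum to 1
all.equal(p, dbinom(r, n, theta))                  # TRUE if it matches the built-in pmf
```

The indexing issue is the same as before: p[1] holds p(0), and in general p[r+1] holds p(r).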
Q 17.9 WS: dbinom It is much simpler to use a built-in R function.
dbinom( 0:5, 5, 0.3)
The d is short for density; for a discrete distribution such as the Binomial this is the pmf. To plot the probability distribution:
n = 5 ; theta = 0.4
p = dbinom( 0:n, n, theta)
barplot(p)
p
Q 17.10 WS: Exercise using dbinom Calculate P(R = 0 or R = 1) when R ∼ Binomial(5, 0.3).
Repeat when n = 200.
Q 17.11 WS: pbinom Another R-function produces cumulative probabilities. Cumulative
probabilities are defined as P (X ≤ x). So for a discrete random variable that takes values 0, . . . , 5 we have P (X ≤ 3) equals P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3). We
could add up the probabilities found using the dbinom function or we can use the cumulative
probability function pbinom as follows.
dbinom( 0:5, 5, 0.3)
pbinom( 0:5, 5, 0.3)
The p is short for probability.
To find the probability that in 4 throws of a fair coin there are 2 heads or fewer:
pbinom( 2, 4, 0.5)
Find the probability that in 12 throws of a die, the number of 4s is 3 or fewer.
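The link between dbinom and pbinom described above can be checked directly with cumsum, which returns running totals of a vector (it is a base R function, though not in this week's list). The sketch uses the fair-coin example with n = 4.

```r
n = 4 ; theta = 0.5
p   = dbinom(0:n, n, theta)    # P(R = r) for r = 0..4
cdf = cumsum(p)                # running totals give P(R <= r)
cdf
pbinom(0:n, n, theta)          # should match cdf
sum(dbinom(0:2, n, theta))     # 0.6875
pbinom(2, n, theta)            # 0.6875
```

The value 0.6875 is (1 + 4 + 6)/16, the number of outcomes with 0, 1 or 2 heads divided by the 16 equally likely outcomes.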
Q 17.12 WS: Binomial probability distribution: empirical results. How well does the formula
fit an observed histogram of random numbers from the binomial?
Here we form the histogram of 200 simulations from a Binomial(n, θ) distribution and then
plot a scaled version of the true probability distribution on top.
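n = 10; theta = 0.5
r = rbinom(200, n, theta)
hist(r, breaks = seq(-0.5, n + 0.5, by = 1))   # one unit-width bin centred on each value 0..n
lines(0:n, 200*dbinom(0:n, n, theta))          # expected counts out of 200 simulations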
The lines function draws lines on existing graphics. It can take many arguments but the
most important are the first two, the x co-ordinates and the y co-ordinates. The lines
function then plots the points (x, y) and draws a line through them. If x and y are vectors
then they must be of the same length. Try the code above with different values for n and θ.
Try increasing the number of simulations from 200 to 1000 and to 10000, remembering to change
the scaling factor in the lines call to match. Does this give a better approximation?
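A numerical version of this comparison, offered as a sketch, is to measure the largest gap between the observed relative frequencies and the Binomial pmf for each number of simulations. The function name max_error and the seed are assumptions made for illustration.

```r
set.seed(3)                          # arbitrary seed
n = 10 ; theta = 0.5
# hypothetical helper: largest absolute gap between observed and true pmf
max_error = function(nsim) {
  r = rbinom(nsim, n, theta)
  # relative frequency of each value 0..n (factor keeps values never observed)
  freq = table(factor(r, levels = 0:n)) / nsim
  max(abs(as.numeric(freq) - dbinom(0:n, n, theta)))
}
errors = sapply(c(200, 1000, 10000), max_error)
errors
```

The gaps are random, so they need not shrink at every step, but they should tend to get smaller as the number of simulations grows.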
Lab100 Quiz Week 17
Q 17.13 QZ: qz100wk17.i: Probability distribution Which of the following five sequences
can represent a probability distribution?
(1) 0.3, 0.2, 0.3, 0.1, 0.1
(2) 0.4, 0.5, 0.2, -0.1
(3) 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.3
(4) 0.3, 0.5, -0.1, 0.3
(5) 0.6, 0.1, 0.1, 0.2, 0.1
(A) YN YN Y, (B) YN YY N, (C) NN YN N, (D) YY YN N, (E) YN NN N
Q 17.14 QZ: qz100wk17.ii: Binomial probability distribution The value of the binomial
probability distribution at r = 3 when n = 6, θ = 0.4 is
(A) 0.0467, (B) 0.1382, (C) 0.2765, (D) 0.1866, (E) 0.3110
Q 17.15 QZ: qz100wk17.iii: Binomial Probability Find the probability that a binomial
random variable takes the value 0, 1 or 2 when n = 12, and θ = 0.1. It is nearest to
(A) 0.0, (B) 0.2, (C) 0.4, (D) 0.6, (E) 0.8
Q 17.16 QZ: qz100wk17.iv: Binomial Probability Repeat (iii) when n = 120, and θ = 0.01.
It is nearest to
(A) 0.0, (B) 0.2, (C) 0.4, (D) 0.6, (E) 0.8
Q 17.17 QZ: qz100wk17.v: Simulations Set up simulations to verify the following.
A random variable has the Bernoulli distribution Binomial(1, p) if it can take one of two
values 0 or 1 with probability 1 − p and p respectively.
The random variable X ∼ Binomial(1, p) with P(X = 1) = p, and the independent random
variable Y have the same Bernoulli distribution. Which of the following statements are true?
(1) XY has the Binomial(2, p) distribution,
(2) XY has the Binomial(1, p) distribution,
(3) XY has the Binomial(1, $p^2$) distribution,
(4) X + Y has the Binomial(2, p) distribution,
(5) X + Y has the Binomial(2, $p^2$) distribution.
(A) FT TF T, (B) FF TT F, (C) FF TF T, (D) TF FT F, (E) TF FF T
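One possible way to set up the simulations for Q 17.17 is sketched below. The value p = 0.3, the number of simulations and the seed are arbitrary choices made for illustration. The code tabulates the relative frequencies of XY and X + Y and prints the candidate distributions next to them; deciding which statements are true is left to you.

```r
set.seed(4)                       # arbitrary seed
p = 0.3 ; nsim = 10000            # p = 0.3 is an arbitrary choice
x = rbinom(nsim, 1, p)
y = rbinom(nsim, 1, p)            # generated independently of x
# relative frequencies of the product and the sum
prod_freq = table(factor(x * y, levels = 0:1)) / nsim
sum_freq  = table(factor(x + y, levels = 0:2)) / nsim
prod_freq
sum_freq
# candidate distributions to compare against
dbinom(0:1, 1, p) ; dbinom(0:1, 1, p^2)
dbinom(0:2, 2, p) ; dbinom(0:2, 2, p^2)
```

Repeating with a different value of p is a useful check that any agreement is not a coincidence of the chosen value.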