Download Slide_Chap1(2) - Portal UniMAP

Probability Distributions








A variable (A, B, x, y, etc.) can take any of a specified set of values.
When the value of a variable is the outcome of a statistical experiment, that
variable is a random variable.
Generally, statisticians use a capital letter to represent a random variable
and a lower-case letter to represent one of its values.
For example, let x represent one of the values of the random variable X. Then
P(x) is shorthand for P(X = x).
P(X = x) refers to the probability that the random variable X is equal to a
particular value, denoted by x. As an example, P(X = 1) refers to the
probability that the random variable X is equal to 1.
All the probabilities must be between 0 and 1;
0≤ P(X=x)≤ 1.
The sum of the probabilities of the outcomes must be 1.
∑ P(X=x)=1
Probability distributions covered here:
Discrete probability distributions: Binomial, Poisson
Continuous probability distributions: Normal
All possible outcomes of an experiment comprise a set that is called the
sample space. We are interested in some numerical description of the
outcome.
For example, when we toss a coin 3 times and we are interested in the
number of heads that fall, a numerical value of 0, 1, 2 or 3 will be
assigned to each sample point.
These values may be thought of as the values assumed by some random variable
X, which in this case represents the number of heads when a coin is tossed
3 times.
So we could write x1 = 0, x2 = 1, x3 = 2 and x4 = 3.
Example 1
Suppose you flip a coin two times. This simple statistical experiment can
have four possible outcomes:
S={HH, HT, TH, TT}
Now, let the variable X represent the number of Heads that result from this
experiment. The variable X can take on the values 0, 1, or 2. In this example,
X is a random variable; because its value is determined by the outcome of a
statistical experiment.
The probability distribution of this experiment is given by
Number of heads, x       0       1       2
Probability, P(X = x)    0.25    0.50    0.25
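The table above can be reproduced by listing the sample space and counting heads. The following is a minimal sketch, assuming a fair coin (which the probabilities 0.25, 0.50, 0.25 imply); the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the sample space S = {HH, HT, TH, TT} for two coin flips.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

# X = number of heads; with a fair coin each outcome has probability 1/4.
counts = Counter(outcome.count("H") for outcome in sample_space)
distribution = {
    x: Fraction(n, len(sample_space)) for x, n in sorted(counts.items())
}

for x, prob in distribution.items():
    print(x, float(prob))
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on floating-point rounding.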
A cumulative probability refers to the probability that the value of a random
variable falls within a specified range. What is the probability that the coin
flips would result in one or fewer heads?


It would be the probability that the coin flip experiment results in zero heads
plus the probability that the experiment results in one head.
P(X ≤ 1) = P(X = 0) + P(X = 1) = 0.25 + 0.50 = 0.75
A cumulative probability distribution is given by
Number of heads, x    Probability, P(X = x)    Cumulative probability, P(X ≤ x)
0                     0.25                     0.25
1                     0.50                     0.75
2                     0.25                     1.00
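A cumulative column is just a running total of the probability column. A short sketch of that idea, using the two-coin probabilities from Example 1 (the list names are illustrative):

```python
from itertools import accumulate

values = [0, 1, 2]            # number of heads, x
pmf = [0.25, 0.50, 0.25]      # P(X = x)

# Running total of the probabilities gives the cumulative probabilities.
cdf = list(accumulate(pmf))

for x, p, c in zip(values, pmf, cdf):
    print(f"x={x}  P(X=x)={p:.2f}  cumulative={c:.2f}")
```

The last cumulative value should always be 1, which is a quick sanity check on any discrete distribution table.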
Example 2
Suppose a die is tossed. What is the probability that the die will land on 5 ?
When a die is tossed, there are 6 possible outcomes represented by:
S = { 1, 2, 3, 4, 5, 6 }.
Each possible outcome is a value of the random variable X, and each outcome is equally
likely to occur. Therefore, P(X = 5) = 1/6.
Example 3
Suppose we repeat the die-tossing experiment described in Example 2. This
time, we ask: what is the probability that the die will land on a number that is
smaller than 5?
This problem involves a cumulative probability. The probability that the die will
land on a number smaller than 5 is equal to:

P( X < 5 ) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 1/6 + 1/6 + 1/6 + 1/6
= 2/3
A binomial experiment is a statistical experiment that has the following
properties:
• The random experiment consists of n identical trials.
• Each trial can result in one of two outcomes, which we denote by success, S,
or failure, F.
• The trials are independent.
• The probability of success is constant from trial to trial. We denote the
probability of success by p; the probability of failure is
(1 - p) = q.
Examples:
1. The number of heads obtained in tossing a coin 10 times.
2. The number of sixes obtained in tossing 7 dice.
3. A firm bidding for contracts will either get a contract or not.
A binomial experiment consists of n identical trials with probability
of success p in each trial. The probability of x successes in n trials
is given by
$$P(X = x) = {}^{n}C_{x}\, p^{x} q^{\,n-x}, \qquad x = 0, 1, 2, \ldots, n$$
The Mean and Variance of X
If X ~ B(n,p), then
Mean: $\mu = E(X) = np$
Variance: $\sigma^2 = V(X) = np(1 - p) = npq$
Std Deviation: $\sigma = \sqrt{npq}$
where n is the total number of trials, p is the probability of
success and q is the probability of failure.
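The formulas np and npq can be checked numerically against the definition of the mean and variance of a discrete distribution. This is a sketch only: `binom_pmf` is a hypothetical helper name, and the values n = 12, p = 0.4 are borrowed from Example 4.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 12, 0.4
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(round(mean, 4), n * p)                  # both should be 4.8
print(round(variance, 4), n * p * (1 - p))    # both should be about 2.88
```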
Example 4
Given that X ~ b(12, 0.4), find
a) $P(X = 2)$
b) $P(X = 3)$
c) $P(X = 4)$
d) $P(2 \le X < 5)$
e) $E(X)$
f) $\mathrm{Var}(X)$
Answer
a) $P(X = 2) = {}^{12}C_{2}\,(0.4)^{2}(0.6)^{10} = 0.0639$
b) $P(X = 3) = {}^{12}C_{3}\,(0.4)^{3}(0.6)^{9} = 0.1419$
c) $P(X = 4) = {}^{12}C_{4}\,(0.4)^{4}(0.6)^{8} = 0.2128$
d) $P(2 \le X < 5) = P(X = 2) + P(X = 3) + P(X = 4)$
$= 0.0639 + 0.1419 + 0.2128$
$= 0.4186$
e) $E(X) = \mu = np = 12(0.4) = 4.8$
f) $\mathrm{Var}(X) = \sigma^2 = npq = 12(0.4)(0.6) = 2.88$
provided in the tables are in the cumulative form,
the following guidelines can be used:
P X  x
Example 5
Example 6
Exercises
1. A machine produces parts of which 5% are defective. If a random sample of ten
parts produced by this machine contains more than one defective part, the machine
is shut down for repairs. Find the probability that the machine will be shut down for
repairs based on this sampling plan. (answer: 0.0861)
2. According to the USA Snapshot ® “Knowing drug addicts,” 45% of Americans know
somebody who became addicted to a drug other than alcohol. Assuming this to be
true, what is the probability that out of a group of 30 randomly selected Americans:
a. exactly 15 know somebody who became addicted to a drug? (answer: 0.124)
b. at most 15 know somebody who became addicted to a drug? (answer: 0.769)
c. more than 15 know somebody who became addicted to a drug? (answer: 0.231)
d. between 10 and 15, inclusive, know somebody who became addicted to a drug?
(answer: 0.70)
3. Suppose that you take a five-question multiple-choice quiz by guessing. Each
question has possible answers a, b, c, d and only one is correct.
a. What is the probability that you guess more than half of the answers correctly?
(answer: 0.104)
b. What is the probability that the first question is answered correctly by guessing?
(answer: 0.25)
1) P(x > 1) = P(x ≥ 2) = 1 - binomcdf(10,.05,1) ≈ 1-0.9139 ≈ 0.0861
2)
(a) P(x = 15) = binompdf(30,.45,15) ≈ 0.12425 ≈ 0.124
(b) P(x ≤ 15) = binomcdf(30,.45,15) ≈ 0.76909 ≈ 0.769
(c) P(x ≥ 16) = 1 - binomcdf(30,.45,15) ≈ 0.23091 ≈ 0.231
(d) P(10 ≤ x ≤ 15) = binomcdf(30,.45,15) - binomcdf(30,.45,9) ≈ 0.69968 ≈ 0.700
3)
(a) P(x ≥ 3) = 1 - binomcdf(5,.25,2) ≈ 0.10352 ≈ 0.104
(b) P(Answer 1st Question by guessing) = 1/4 = 0.25
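The answer key uses the calculator functions binompdf and binomcdf. For readers without such a calculator, the sketch below defines Python stand-ins with the same argument order (n, p, x). These two functions are written here for illustration and are not part of any library.

```python
from math import comb

def binompdf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(X <= x) for X ~ B(n, p), summing the pmf from 0 to x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))

# Exercise 1: more than one defective part in a sample of ten.
shut_down = 1 - binomcdf(10, 0.05, 1)

# Exercise 2: 30 people, p = 0.45.
exactly_15 = binompdf(30, 0.45, 15)
at_most_15 = binomcdf(30, 0.45, 15)
between_10_and_15 = binomcdf(30, 0.45, 15) - binomcdf(30, 0.45, 9)

# Exercise 3(a): at least 3 correct guesses out of 5, p = 0.25.
more_than_half = 1 - binomcdf(5, 0.25, 2)

print(round(shut_down, 4), round(exactly_15, 3), round(at_most_15, 3))
print(round(between_10_and_15, 3), round(more_than_half, 3))
```

Note that "between 10 and 15" subtracts the cumulative probability at 9, not at 10, so that x = 10 stays inside the interval.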
A random variable X has a Poisson distribution, and is referred to as a
Poisson random variable, if and only if its probability distribution is given by
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} \qquad \text{for } x = 0, 1, 2, 3, \ldots$$
where $\lambda$ is the long-run mean number of events for a specific time or space
dimension of interest. Space can be dimensions, place or time, or a
combination of them. A random variable X having a Poisson distribution
can also be written as
$X \sim Po(\lambda)$
with
$E(X) = \lambda$ and $\mathrm{Var}(X) = \lambda$
Examples:
1. The number of cars passing a toll booth in one hour.
2. The number of defects in a square meter of fabric.
3. The number of network errors experienced in a day.
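The Poisson formula translates directly into code. The sketch below uses a hypothetical helper `poisson_pmf` and the rate 4.8 from Example 6; it also checks numerically that the mean and variance both come out equal to the rate. The sum is truncated at 60 terms, which is an assumption that the remaining tail is negligible for this rate.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4.8
terms = [poisson_pmf(x, lam) for x in range(60)]  # tail beyond 59 is negligible here

total = sum(terms)
mean = sum(x * t for x, t in enumerate(terms))
variance = sum((x - mean) ** 2 * t for x, t in enumerate(terms))

print(round(total, 6), round(mean, 6), round(variance, 6))
print(round(poisson_pmf(0, lam), 4), round(poisson_pmf(9, lam), 4))
```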
Example 6
Given that $X \sim Po(4.8)$, find
a) $P(X = 0)$
b) $P(X = 9)$
c) $P(X \ge 1)$
Answer
a) $P(X = 0) = \dfrac{e^{-4.8}\,4.8^{0}}{0!} = 0.0082$
b) $P(X = 9) = \dfrac{e^{-4.8}\,4.8^{9}}{9!} = 0.0307$
c) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.0082 = 0.9918$
Example 7
Suppose that the number of errors in a piece of software has a
Poisson distribution with parameter $\lambda = 3$. Find
a) the probability that a piece of software has no errors.
b) the probability that there are three or more errors in a piece of software.
c) the mean and variance of the number of errors.
Answer
a) $P(X = 0) = \dfrac{e^{-3}\,3^{0}}{0!} = e^{-3} = 0.050$
b) $P(X \ge 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2)$
$= 1 - \left(\dfrac{e^{-3}\,3^{0}}{0!} + \dfrac{e^{-3}\,3^{1}}{1!} + \dfrac{e^{-3}\,3^{2}}{2!}\right)$
$= 1 - e^{-3}\left(\dfrac{1}{1} + \dfrac{3}{1} + \dfrac{9}{2}\right)$
$= 1 - 0.423 = 0.577$
c) $E(X) = \mathrm{Var}(X) = \lambda = 3$
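The "three or more" probability in Example 7 is easiest through the complement, since the direct sum has infinitely many terms. A small sketch of that step, with illustrative names:

```python
from math import exp, factorial

lam = 3  # mean number of errors per piece of software (Example 7)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

p_none = poisson_pmf(0, lam)

# P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
p_three_or_more = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_none, 3), round(p_three_or_more, 3))
```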
Example 8
(use $\lambda = 2.4 \times 4 = 9.6$)
Exercise
The number of industrial injuries per working week in a particular factory is
known to follow a Poisson distribution with mean 0.5.
Find the probability that
(a) in a particular week there will be:
(i) less than 2 accidents (Answer: 0.9098)
(ii) more than 2 accidents (Answer: 0.0144)
(b) in a three week period there will be no accidents. (Answer: 0.223)
Example 8
Exercise
On the average, 1 computer in 800 crashes during a severe thunderstorm. A
certain company had 4,000 working computers when the area was hit by a
severe thunderstorm.
a) Compute the expected value and variance of the number of crashed
computers. (Answer: 5, 4.994)
b) Compute the probability that fewer than 10 computers crashed. (Answer: 0.968)
c) Compute the probability that exactly 10 computers crashed. (Answer: 0.018)
Let X be the number of crashed computers. This is the number of "successes"
(crashed computers) out of 4,000 "trials" (computers), with probability of
success 1/800. Thus, X has a binomial distribution with parameters n = 4000
and p = 1/800.
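With n = 4000 the binomial sums are tedious by hand but quick to evaluate. The following sketch computes the three requested quantities from the binomial model stated above; the variable names are illustrative.

```python
from math import comb

n, p = 4000, 1 / 800
q = 1 - p

def binom_pmf(x):
    """P(X = x) for X ~ B(4000, 1/800)."""
    return comb(n, x) * p**x * q ** (n - x)

expected = n * p             # np
variance = n * p * q         # npq
p_fewer_than_10 = sum(binom_pmf(k) for k in range(10))   # P(X <= 9)
p_exactly_10 = binom_pmf(10)

print(expected, round(variance, 3))
print(round(p_fewer_than_10, 3), round(p_exactly_10, 3))
```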


A continuous variable involves a measurement of something, such as the
height of a person, the weight of a newborn baby, or the length of time a car
battery lasts. The probabilities are represented by the areas under
continuous curves (probability densities or continuous distributions).
Probability densities are characterized by the fact that the area under the
curve between any two values a and b gives the probability that a random
variable having this continuous distribution will take on a value in the
interval from a to b. The total area under the curve must equal one.
A continuous random variable X is said to have a normal distribution with
parameters $\mu$ and $\sigma^2$, where $-\infty < \mu < \infty$ and $\sigma^2 > 0$, if its probability
density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < \infty$$
If $X \sim N(\mu, \sigma^2)$ then $E(X) = \mu$ and $V(X) = \sigma^2$.
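To make the density formula concrete, the sketch below codes it directly and compares it with `statistics.NormalDist` from the Python standard library. The values of mu and sigma are illustrative assumptions, and the total area is approximated with a simple midpoint sum over mu ± 8 sigma.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written as in the formula above."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

mu, sigma = 25.0, 5.0          # illustrative values only
dist = NormalDist(mu, sigma)   # library version, takes the standard deviation

# The hand-written formula and the library density should agree.
for x in (15.0, 25.0, 32.5):
    print(x, normal_pdf(x, mu, sigma), dist.pdf(x))

# Midpoint sum over mu +/- 8 sigma: the total area should be close to 1.
step = 0.01
n_steps = round(16 * sigma / step)
start = mu - 8 * sigma
area = step * sum(
    normal_pdf(start + (i + 0.5) * step, mu, sigma) for i in range(n_steps)
)
print(area)
```

Note that `NormalDist` is parameterised by the standard deviation, while the notation N(μ, σ²) lists the variance.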



‘Bell Shaped’
Symmetrical
Mean, Median and Mode are equal
Location is determined by mean, μ.
Spread is determined by the standard deviation, σ .
Rules of Data Dispersion
The standard normal curve has mean 0 and standard
deviation 1. If a dataset follows a normal distribution, then by the empirical
rule:
1. About 68% of the observations will lie within one standard deviation of the
mean ($\mu \pm \sigma$).
2. About 95% of the observations will lie within two standard deviations of the
mean ($\mu \pm 2\sigma$).
3. About 99.7% of the observations will lie within three standard deviations of the
mean ($\mu \pm 3\sigma$).
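The three percentages of the empirical rule are areas under the standard normal curve and can be checked with the standard library. A minimal sketch, with illustrative names:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3.
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The exact areas are slightly different from the rounded 68%, 95% and 99.7%, which is why the rule is described as approximate.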
• By varying the parameters μ and σ, we obtain infinitely many different normal
distributions.
• Standardizing converts any normal distribution to the standard normal
distribution.
Standard normal distribution
The normal distribution with parameters $\mu = 0$ and $\sigma^2 = 1$ is called a standard
normal distribution. A random variable that has a standard normal
distribution is called a standard normal random variable and is denoted by
$Z \sim N(0, 1)$
If X is a normal random variable with $E(X) = \mu$ and $V(X) = \sigma^2$, the standard
normal random variable is defined as
$$Z = \frac{X - \mu}{\sigma}$$
with $E(Z) = 0$ and $V(Z) = 1$.
• The total area under the standard normal curve is 1.
• The standard normal curve is symmetric about 0.
• Almost all the area under the standard normal curve lies between -3 and 3.
Example 9
Example 10
Determine the probability or area for the portions of the
normal distribution described.
a) $P(0 \le Z \le 0.45)$
b) $P(-2.02 \le Z \le 0)$
c) $P(Z \le 0.87)$
d) $P(2.1 \le Z \le 3.11)$
e) $P(1.5 \le Z \le 2.55)$
Answer
a) $P(0 \le Z \le 0.45) = 0.1736$
b) $P(-2.02 \le Z \le 0) = 0.47831$
c) $P(Z \le 0.87) = 0.5 + 0.3078 = 0.8078$
d)
e)
Example 10
Example 11
Determine z such that
a) $P(Z > z) = 0.25$
b) $P(Z > z) = 0.36$
c) $P(Z < z) = 0.983$
d) $P(Z < z) = 0.89$
Answer
a) $P(Z > 0.6745) = 0.25$
b) $P(Z > 0.3585) = 0.36$
c) $P(Z < 2.1201) = 0.983$
d) $P(Z < 1.2265) = 0.89$
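Finding z from a given area is the inverse of a table lookup. The sketch below uses `NormalDist.inv_cdf` from the standard library. It assumes that 0.25 and 0.36 are upper-tail areas and that 0.983 and 0.89 are lower-tail areas, the reading under which the tabulated z values are positive; the inequality signs are not legible in the transcript, so treat that as an assumption. The helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_for_lower_tail(area):
    """z such that P(Z < z) = area."""
    return std_normal.inv_cdf(area)

def z_for_upper_tail(area):
    """z such that P(Z > z) = area, using P(Z > z) = 1 - P(Z < z)."""
    return std_normal.inv_cdf(1 - area)

print(round(z_for_upper_tail(0.25), 4))
print(round(z_for_upper_tail(0.36), 4))
print(round(z_for_lower_tail(0.983), 4))
print(round(z_for_lower_tail(0.89), 4))
```

By symmetry about 0, the lower-tail value for the same area is simply the negative of the upper-tail value.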
Example 12
Suppose $X \sim N(25, 25)$, that is, $\mu = 25$ and $\sigma^2 = 25$ (so $\sigma = 5$). Find
a) $P(24 < X \le 35)$
b) $P(X \ge 20)$
Answer
a) $P(24 < X \le 35) = P\left(\dfrac{24 - 25}{5} < Z \le \dfrac{35 - 25}{5}\right)$
$= P(-0.2 < Z \le 2)$
$= 0.0793 + 0.4772$
$= 0.5565$
b) $P(X \ge 20) = P\left(Z \ge \dfrac{20 - 25}{5}\right)$
$= P(Z \ge -1)$
$= 0.5 + 0.3413$
$= 0.8413$
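The standardization step in Example 12 can be mirrored in code: convert each bound to a z value, then read areas from the standard normal distribution. This is a sketch with illustrative names, using `statistics.NormalDist` in place of the printed table.

```python
from statistics import NormalDist

mu, sigma = 25, 5            # N(25, 25): variance 25, so sigma = 5
std_normal = NormalDist()    # Z ~ N(0, 1)

def standardize(x):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

# a) P(24 < X <= 35) = P(-0.2 < Z <= 2)
p_a = std_normal.cdf(standardize(35)) - std_normal.cdf(standardize(24))

# b) P(X >= 20) = P(Z >= -1)
p_b = 1 - std_normal.cdf(standardize(20))

print(standardize(24), standardize(35), round(p_a, 4))
print(standardize(20), round(p_b, 4))
```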

When the number of observations or trials n in a binomial
experiment is relatively large, the normal probability
distribution can be used to approximate binomial probabilities.
A convenient rule is that such an approximation is acceptable
when $n \ge 30$, and both $np > 5$ and $nq > 5$.

Definition
Given a random variable $X \sim b(n, p)$, if $n \ge 30$ and both $np > 5$
and $nq > 5$, then approximately $X \sim N(np, npq)$
with $Z = \dfrac{X - np}{\sqrt{npq}}$
The continuity correction factor is needed when a continuous
curve is used to approximate a discrete probability distribution.
0.5 is added or subtracted as a continuity correction factor according
to the form of the probability statement, as follows:
a) $P(X = x) \xrightarrow{c.c.} P(x - 0.5 < X < x + 0.5)$
b) $P(X \le x) \xrightarrow{c.c.} P(X < x + 0.5)$
c) $P(X < x) \xrightarrow{c.c.} P(X < x - 0.5)$
d) $P(X \ge x) \xrightarrow{c.c.} P(X > x - 0.5)$
e) $P(X > x) \xrightarrow{c.c.} P(X > x + 0.5)$
c.c. = continuity correction factor
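The five cases can be summarised as a lookup from the form of the discrete statement to an interval on the continuous scale. The sketch below is one way to write that down; the function name and the use of comparison strings as keys are illustrative choices, not anything from the slides.

```python
from math import inf

def continuity_corrected(relation, x):
    """Return (low, high) bounds on the continuous scale for a discrete event.

    relation is one of "==", "<=", "<", ">=", ">" and x is an integer value.
    """
    bounds = {
        "==": (x - 0.5, x + 0.5),   # P(X = x)
        "<=": (-inf, x + 0.5),      # P(X <= x)
        "<": (-inf, x - 0.5),       # P(X < x)
        ">=": (x - 0.5, inf),       # P(X >= x)
        ">": (x + 0.5, inf),        # P(X > x)
    }
    return bounds[relation]

print(continuity_corrected(">=", 155))   # (154.5, inf)
print(continuity_corrected(">", 30))     # (30.5, inf)
```

A useful way to remember the direction: the corrected interval always keeps every integer that satisfies the original statement and excludes every integer that does not.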
Example
In a certain country, 45% of registered voters are male. If
300 registered voters from that country are selected at
random, find the probability that at least 155 are males.
Solutions
X is the number of male voters.
$X \sim b(300, 0.45)$
$P(X \ge 155) \xrightarrow{c.c.} P(X > 155 - 0.5) = P(X > 154.5)$
$np = 300(0.45) = 135 > 5$
$nq = 300(0.55) = 165 > 5$
$P\left(Z > \dfrac{154.5 - 300(0.45)}{\sqrt{300(0.45)(0.55)}}\right) = P\left(Z > \dfrac{154.5 - 135}{\sqrt{74.25}}\right)$
$= P(Z > 2.26)$
$= 0.01191$
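As a check on the approximation, the sketch below computes the normal estimate with the continuity correction and compares it with the exact binomial tail sum. The exact sum is an addition for comparison only and is not part of the slide solution; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.45
mu = n * p                      # np = 135
sigma = sqrt(n * p * (1 - p))   # sqrt(npq) = sqrt(74.25)

# Normal approximation with continuity correction: P(X >= 155) ~ P(X > 154.5)
z = (154.5 - mu) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact binomial tail, summed term by term, for comparison.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(155, n + 1))

print(round(z, 2), round(approx, 4), round(exact, 4))
```

The table-based answer rounds z to 2.26 before the lookup, so the value computed here can differ slightly in the fourth decimal place.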
Suppose that 5% of the population over 70 years
old has disease A. Suppose a random sample of
9600 people over 70 is taken. What is the
probability that fewer than 500 of them have
disease A?

When the mean $\lambda$ of a Poisson distribution is
relatively large, the normal probability distribution can
be used to approximate Poisson probabilities. A
convenient rule is that such an approximation is
acceptable when $\lambda > 10$.
Definition
Given a random variable $X \sim Po(\lambda)$, if $\lambda > 10$, then approximately $X \sim N(\lambda, \lambda)$
with $Z = \dfrac{X - \lambda}{\sqrt{\lambda}}$
Example
A grocery store has an ATM machine inside. An average
of 5 customers per hour come to use the machine. What
is the probability that more than 30 customers come to
use the machine between 8.00 am and 5.00 pm?
Solutions
X is the number of customers who come to use the ATM machine in 9 hours.
$X \sim Po(45)$
$\lambda = 45 > 10$
$X \sim N(45, 45)$
$P(X > 30) \xrightarrow{c.c.} P(X > 30 + 0.5) = P(X > 30.5)$
$= P\left(Z > \dfrac{30.5 - 45}{\sqrt{45}}\right) = P(Z > -2.16)$
$= 0.98461$
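The same calculation can be sketched in code: scale the hourly rate to the 9-hour window, apply the continuity correction, and standardize. The exact Poisson tail is included only as a comparison and is not part of the slide solution; the names are illustrative.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

rate_per_hour = 5
hours = 9                       # 8.00 am to 5.00 pm
lam = rate_per_hour * hours     # 45 customers expected in the window

# Normal approximation with continuity correction: P(X > 30) ~ P(X > 30.5)
z = (30.5 - lam) / sqrt(lam)
approx = 1 - NormalDist().cdf(z)

# Exact Poisson tail via the complement, for comparison.
exact = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(31))

print(round(z, 2), round(approx, 4), round(exact, 4))
```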