INTRODUCTION TO STATISTICS ST.PAULS UNIVERSITY
PROBABILITY DISTRIBUTIONS
Definition of a probability distribution
It is a listing of all the outcomes of an experiment and the probability associated with each
outcome.
Characteristics of a probability distribution
• The probability of a particular outcome is between 0 and 1, inclusive.
• The sum of the probabilities of all mutually exclusive outcomes is 1.0.
Random variables
Definition: It is a quantity resulting from an experiment that by chance can assume different values.
• Discrete random variables can assume only certain clearly separated values of some item of interest.
• Continuous random variables can assume any value within a given range.
For a discrete random variable, the sum of the probabilities is 1, i.e.
$$\sum_{\text{all } x} P(X = x) = 1.$$
The function that is responsible for allocating probabilities, $P(X = x)$, is known as the probability function of X. For a discrete variable it is strictly a probability mass function, although some texts call it the probability density function of X, sometimes abbreviated to the p.d.f. of X.
The mean, variance and standard deviation of a probability distribution
• The mean $\mu$, also referred to as the expected value, is denoted by $E(x)$:
$$\mu = E(x) = \sum x \, P(x)$$
where $P(x)$ is the probability of the possible value $x$ of the random variable.
• The variance is
$$\sigma^2 = \sum (x - \mu)^2 P(x)$$
and the standard deviation $\sigma$ is the positive square root of the variance.


Example
John sells cars for General Motors. He usually sells the largest number of cars on Saturday. He has the following probability distribution for the number of cars he expects to sell on a particular Saturday.
No. of cars (x)    Probability P(x)
0                  0.1
1                  0.2
2                  0.3
3                  0.3
4                  0.1
Total              1.0
i. On a typical Saturday, how many cars does John expect to sell?
ii. What is the variance of the distribution?
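As a check on the two questions above, the following Python sketch applies the formulas for the mean and the variance to John's distribution. The variable names are illustrative only and are not part of the original notes.

```python
# Probability distribution for the number of cars John sells on a Saturday.
cars = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.3, 0.1]

# Mean: mu = sum of x * P(x)
mean = sum(x * p for x, p in zip(cars, probs))

# Variance: sigma^2 = sum of (x - mu)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in zip(cars, probs))

print(f"Expected number of cars: {mean:.2f}")  # 2.10
print(f"Variance: {variance:.2f}")             # 1.29
```

So on a typical Saturday John expects to sell about 2.1 cars, and the variance of the distribution is about 1.29.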
BINOMIAL PROBABILITY DISTRIBUTION
Conditions for a Binomial model
For a situation to be described using a binomial model, the following conditions must hold:
• An outcome on each trial of an experiment is classified into one of two mutually exclusive categories: a success or a failure.
• The random variable counts the number of successes in a fixed number of trials.
• The probability of a success stays the same for each trial and so does the probability of a failure.
• The trials are independent, i.e. the outcome of one trial does not affect the outcome of any other trial.
Let the discrete random variable X be the number of successful outcomes in n trials. If the above conditions are satisfied, X is said to follow a binomial distribution, i.e. $X \sim B(n, p)$. If $X \sim B(n, p)$, the probability of obtaining x successes in n trials is
$$P(X = x) = {}_{n}C_{x} \, p^{x} q^{n-x} \quad \text{for } x = 0, 1, 2, 3, \ldots, n,$$
where p is the probability of a success on each trial and $q = 1 - p$.
Examples
1. At Nakumatt Supermarket, 30% of the customers pay by credit card. Find the probability that
in a randomly selected sample of ten customers:
i. Exactly two pay by credit card
ii. Less than three pay by credit card
iii. More than three pay by credit card
iv. More than seven pay by credit card.
2. The random variable X is distributed $B(7, 0.2)$. Find:
i. P( X  3)
ii. P (1  X  4)
iii. P ( X  1)
3. A box contains a large number of pens. The probability that a pen is faulty is 0.1. How many
pens would you need to select to be more than 95% certain of picking at least one faulty one?
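The binomial formula can be checked numerically. The sketch below works through Examples 1 and 3 in Python; the helper name `binomial_pmf` is hypothetical and chosen only for this illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Return P(X = x) for X ~ B(n, p), using nCx * p^x * q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

# Example 1: ten customers, each paying by credit card with probability 0.3.
n, p = 10, 0.3
exactly_two = binomial_pmf(2, n, p)
fewer_than_three = sum(binomial_pmf(x, n, p) for x in range(3))        # x = 0, 1, 2
more_than_three = 1 - sum(binomial_pmf(x, n, p) for x in range(4))     # 1 - P(X <= 3)
more_than_seven = sum(binomial_pmf(x, n, p) for x in range(8, n + 1))  # x = 8, 9, 10

# Example 3: smallest number of pens with P(at least one faulty) > 0.95,
# i.e. the smallest n for which 1 - 0.9^n > 0.95.
pens = 1
while 1 - 0.9**pens <= 0.95:
    pens += 1

print(f"{exactly_two:.4f}")       # 0.2335
print(f"{fewer_than_three:.4f}")  # 0.3828
print(f"{more_than_three:.4f}")   # 0.3504
print(f"{more_than_seven:.4f}")   # 0.0016
print(pens)                       # 29
```

"More than three" is computed through the complement, 1 − P(X ≤ 3), which needs fewer terms than summing from 4 to 10.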
The mean and variance of a Binomial distribution
If $X \sim B(n, p)$, the mean $\mu = E(X) = np$ and the variance $\sigma^2 = npq$.
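The Mode of the Binomial distribution
The mode is the value of X that is most likely to occur. When $(n + 1)p$ is an integer there are two modes, $(n + 1)p - 1$ and $(n + 1)p$; this happens, for instance, when p = 0.5 and n is odd. Otherwise the distribution has one mode, the integer part of $(n + 1)p$.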
Examples
1. The probability that it will be a sunny day is 0.4. Find the expected number of fine days in a
week and also the standard deviation.
2. X is $B(n, p)$ with mean 5 and standard deviation 2. Find the values of n and p.
3. 10% of the articles from a certain production line are defective. A sample of 25 articles is
taken. Find the expected number of defective items and the standard deviation.
4. The random variable X is $B(n, 0.3)$ and $E(X) = 2.4$. Find n and the standard deviation of X.
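Examples 2 and 3 follow directly from $\mu = np$ and $\sigma^2 = npq$. The Python sketch below shows one way to carry out the arithmetic; the variable names are illustrative only.

```python
from math import sqrt

# Example 2: mean np = 5 and standard deviation sqrt(npq) = 2, so npq = 4.
mean, sd = 5, 2
q = sd**2 / mean      # npq / np = q = 0.8
p = 1 - q             # approximately 0.2
n = round(mean / p)   # n = mean / p, rounded to a whole number of trials

# Example 3: a sample of n = 25 articles, each defective with probability 0.1.
expected_defective = 25 * 0.1        # np
sd_defective = sqrt(25 * 0.1 * 0.9)  # sqrt(npq)

print(n, round(p, 4))                                        # 25 0.2
print(round(expected_defective, 4), round(sd_defective, 4))  # 2.5 1.5
```

Dividing the variance by the mean isolates q, because the factor np cancels; this is the key step in Example 2.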
HYPERGEOMETRIC PROBABILITY DISTRIBUTION
It is used when a sample is selected from a finite population without replacement and the size of the sample n is more than 5% of the size of the population N.
$$P(X = x) = \frac{({}_{S}C_{x})({}_{N-S}C_{n-x})}{{}_{N}C_{n}}$$
where
C denotes a combination
n is the number of trials or the size of the sample
x is the number of successes of interest
N is the size of the population
S is the number of successes in the population
Examples
1. Suppose 50 computers were manufactured during the week. 40 operated perfectly and 10
had at least one defect. A sample of 5 is selected at random. What is the probability that 4 of
the 5 will operate perfectly?
2. Keith’s Florist has 15 trucks used mainly to deliver flowers and flower arrangements within
the Nairobi area. Suppose 6 of the 15 trucks have brake problems. Five trucks were selected
at random to be tested. What is the probability that 2 of those tested have defective brakes?
3. A new flavour of toothpaste has been developed. It was tested by a group of ten people. Six
of the group said they liked the new flavour and the remaining four indicated they did not.
Four of the ten are to participate in an in-depth interview. What is the probability that of
those selected for the in-depth interview, two liked the flavour and two did not?
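The three examples above can be checked with a direct implementation of the hypergeometric formula. This is a minimal Python sketch; the function name `hypergeometric_pmf` is hypothetical, and its arguments follow the symbols N, S, n and x defined above.

```python
from math import comb

def hypergeometric_pmf(x, N, S, n):
    """Return P(X = x): x successes in a sample of n drawn without
    replacement from a population of N that contains S successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)

# Example 1: 50 computers, 40 perfect; sample of 5, exactly 4 perfect.
computers = hypergeometric_pmf(4, N=50, S=40, n=5)

# Example 2: 15 trucks, 6 with brake problems; sample of 5, exactly 2 defective.
trucks = hypergeometric_pmf(2, N=15, S=6, n=5)

# Example 3: 10 people, 6 liked the flavour; sample of 4, exactly 2 liked it.
toothpaste = hypergeometric_pmf(2, N=10, S=6, n=4)

print(f"{computers:.4f}")   # 0.4313
print(f"{trucks:.4f}")      # 0.4196
print(f"{toothpaste:.4f}")  # 0.4286
```

In each case the "success" is whichever category the question counts, so S is 40 perfect computers in Example 1 but 6 defective trucks in Example 2.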
THE POISSON PROBABILITY DISTRIBUTION
Conditions for a Poisson model
• Events occur singly and at random in a given interval of time or space.
• $\lambda$, the mean number of occurrences in the given interval, is known and finite.
• The probability of a success is usually small and the number of trials is usually large.
The variable X is the number of occurrences in the given interval.
If the above conditions are satisfied, X is said to follow a Poisson distribution, written as $X \sim Po(\lambda)$, where
$$P(X = x) = e^{-\lambda} \frac{\lambda^{x}}{x!} \quad \text{for } x = 0, 1, 2, 3, \ldots \text{ to infinity.}$$
The mean and variance of a Poisson distribution
The mean and variance of a Poisson distribution are equal, i.e. the mean = $\lambda$ and the variance = $\lambda$.
The mode of a Poisson distribution
In general, if $\lambda$ is an integer, there will be two modes, $\lambda - 1$ and $\lambda$.
In general, if $\lambda$ is not an integer, the mode is the integer below $\lambda$.
Examples
1. A student finds that the average number of amoebas in 10ml of pond water from a particular pond is four. Assuming that the number of amoebas follows a Poisson distribution, find the probability that in a 10ml sample:
i. There are exactly five amoebas
ii. There are no amoebas
iii. There are fewer than three amoebas
2. On average the school photocopier breaks down eight times during the school week (Monday to Friday). Assuming that the number of breakdowns can be modeled by a Poisson distribution, find the probability that it breaks down:
i. Five times in a given week
ii. Once on a Monday
iii. Eight times in a fortnight.
3. X follows a Poisson distribution with standard deviation 1.5. Find P ( X  3)
4. Mrs. Mwangi is a loan officer at Barclays Bank. Based on her years of experience, she estimates that the probability is 0.025 that an applicant will not be able to repay his or her installment loan. Last month she made 40 loans. What is the probability that:
i. Three loans would be defaulted?
ii. At least three loans would be defaulted?
5. It is estimated that 0.5% of the callers to the billing department of Telkom Kenya will receive a busy signal. What is the probability that of today’s 1,200 callers, at least five received a busy signal?
6. The marketing manager of a company has noted that she usually receives 10 complaint calls from customers during a week consisting of 5 working days and that calls occur at random. Find the probability of receiving five such calls in a single day.
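The first two examples can be verified with a short Python sketch of the Poisson formula. The helper name `poisson_pmf` is hypothetical. For the photocopier, the sketch assumes that the mean scales with the length of the interval (8/5 breakdowns per day, 16 per fortnight), which presumes breakdowns are equally likely on every working day.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Return P(X = x) for X ~ Po(lam), using e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

# Example 1: amoebas in a 10ml sample, mean 4.
exactly_five = poisson_pmf(5, 4)
no_amoebas = poisson_pmf(0, 4)
fewer_than_three = sum(poisson_pmf(x, 4) for x in range(3))  # x = 0, 1, 2

# Example 2: photocopier, mean 8 breakdowns per five-day week.
five_in_week = poisson_pmf(5, 8)
once_on_monday = poisson_pmf(1, 8 / 5)   # assumed mean for one day
eight_in_fortnight = poisson_pmf(8, 16)  # assumed mean for two weeks

print(f"{exactly_five:.4f}")      # 0.1563
print(f"{no_amoebas:.4f}")        # 0.0183
print(f"{fewer_than_three:.4f}")  # 0.2381
print(f"{five_in_week:.4f}")      # 0.0916
```

The same function handles Example 6 once the weekly mean of 10 calls is rescaled to a daily mean of 2.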
Using the Poisson distribution as an approximation to the binomial distribution
When n is large ($n > 50$) and p is small ($p < 0.1$), the binomial distribution $X \sim B(n, p)$ can be approximated using a Poisson distribution with the same mean, i.e. $X \sim Po(np)$. The approximation gets better as n gets larger and p gets smaller.
Examples
1. Eggs are packed into boxes of 500. On average 0.7% of the eggs are found to be broken
when the eggs are unpacked. Find the probability that in a box of 500 eggs,
(a) Exactly three are broken
(b) At least two are broken
2. A Christmas draw aims to sell 5000 tickets, 50 of which will win a prize.
(a) A syndicate buys 200 tickets. Let X represent the number of these tickets that win a prize.
i. Justify the use of the Poisson approximation for the distribution of X
ii. Calculate P( X  3)
(b) Calculate how many tickets should be bought in order for there to be a 90% probability
of winning at least one prize.
3. On average one in 200 cars breaks down on a certain stretch of road per day. Find the
probability that, on a randomly chosen day,
i. None of a sample of 250 cars break down,
ii. More than two of a sample of 300 cars break down.
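To see how close the approximation is, the sketch below compares the exact binomial probability with the Poisson approximation for the eggs in Example 1 (n = 500, p = 0.007, so np = 3.5). It is an illustrative Python check, not part of the original notes.

```python
from math import comb, exp, factorial

n, p = 500, 0.007
lam = n * p  # mean of the approximating Poisson distribution, 3.5

# (a) Exactly three broken eggs: exact binomial versus Poisson approximation.
binomial_three = comb(n, 3) * p**3 * (1 - p) ** (n - 3)
poisson_three = exp(-lam) * lam**3 / factorial(3)

# (b) At least two broken: 1 - P(X = 0) - P(X = 1) under the Poisson model.
poisson_at_least_two = 1 - exp(-lam) * (1 + lam)

print(f"{binomial_three:.4f}")
print(f"{poisson_three:.4f}")         # 0.2158
print(f"{poisson_at_least_two:.4f}")  # 0.8641
```

With n this large and p this small, the two values for part (a) should agree to about three decimal places.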
Applications of Poisson distribution
Used in determining:
• The number of customers arriving at a service facility in unit time, e.g. per hour.
• Number of telephone calls arriving at a telephone switchboard per unit time, e.g. per minute.
• Number of printing mistakes per page in a book or number of typographical errors per page in typed material.
• Number of radioactive particles decaying in a given interval of time.
• Dimensional errors in engineering drawing.
• Number of defects along a long tape.
• Number of accidents on a particular road per day.
• Hospital emergencies per day.
• Number of defective materials of a product.
• Number of goals in a football match.
The Poisson distribution differs from the binomial distribution in two important respects:
• Rather than consisting of discrete trials, the distribution operates continuously over some given amount of time, distance, area etc.
• Rather than producing a sequence of successes and failures, the distribution produces successes, which occur at random points in the specified time, distance or area.
THE NORMAL PROBABILITY DISTRIBUTION
The characteristics of the normal probability distribution
• It is bell shaped and has a single peak at the center of the distribution.
• The arithmetic mean, median and mode of the distribution are equal and located at the peak.
• Half of the area under the curve is above this center point and the other half is below it.
• It is symmetrical about its mean, i.e. if it is cut vertically at the central value, the two halves will be mirror images.
• It is asymptotic, i.e. the curve gets closer and closer to the x-axis but never actually touches it.
• The area under the total curve is equal to 1.
The standard normal probability distribution ($\mu = 0$, $\sigma = 1$)
Z-Value: It is the distance between a selected value, designated X, and the mean $\mu$, divided by the standard deviation $\sigma$. It is the distance from the mean, measured in units of the standard deviation.
$$Z = \frac{X - \mu}{\sigma}$$
Where
X is the value of any particular observation
$\mu$ is the mean of the distribution
$\sigma$ is the standard deviation of the distribution
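The Z-value formula is a one-line calculation. The Python sketch below uses a hypothetical helper `z_value`, with illustrative figures of a mean of 1,000 and a standard deviation of 100.

```python
def z_value(x, mean, sd):
    """Return the distance of x from the mean in units of the standard deviation."""
    return (x - mean) / sd

print(z_value(1100, 1000, 100))  # 1.0
print(z_value(900, 1000, 100))   # -1.0
```

A positive Z-value means the observation lies above the mean, and a negative one means it lies below.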
Importance of the Normal distribution
• Frequency distributions of many physical characteristics, such as heights and weights of people, dimensions of items from production processes etc., often have the shape of the normal curve.
• It is useful as an approximation to various other distributions under certain limiting conditions, e.g. the Binomial and Poisson distributions.
• It is useful in statistical quality control, where the control limits are set by using this distribution.
• Used in sampling theory.
• Used in testing statistical hypotheses and in tests of significance in which the assumption is that the population from which the sample is drawn is normally distributed.
• It is a fairly ‘robust’ distribution, i.e. reasonable results may be obtained by approximating the normal distribution.
Examples:
1. What is the area under the curve between the mean and X for the following Z Values?
2.84, 1.00, 0.49, 1.91, 1.25, 0.1, 0.35, 1.23
2. The daily incomes of middle managers are normally distributed with a mean of Sh. 1000 and a standard deviation of Sh. 100. Required:
(a) What are the Z values for incomes of Sh. 1,100 and Sh. 900?
(b) What is the area under the normal curve between 1,000 and 1,100?
(c) What is the probability that a particular daily income selected at random is between
790 and 1,000?
(d) What is the probability that the income is less than 790?
(e) What is the area under the normal curve between 840 and 1200?
(f) What percent of executives earn daily incomes of 1,245 and above?
(g) What is the area under the normal curve between 1,150 and 1,250?
3. In an intelligence test administered to 1,000 students, the average score was 42 and the
standard deviation was 24. Find:
i. The number of students exceeding a score of 50 marks
ii. The number of students lying between 30 and 54 marks
iii. The value of score exceeded by the top 100 students.
4. A tyre manufacturer wants to set a minimum mileage guarantee on its new MX100 tyre.
Tests reveal the mean mileage is 47,900 with a standard deviation of 2,050 miles and the
distribution is a normal distribution. The manufacturer wants to set the minimum guaranteed
mileage so that no more than 4% of the tyres will have to be replaced. What minimum
guaranteed mileage should the manufacturer announce?
5. A firm’s marketing manager believes that total sales for the firm next year can be modeled by
using a normal distribution with a mean of 2.5 million and a standard deviation of 300,000.
(a) What is the probability that the firm’s sales will exceed 3 million?
(b) What is the probability that the firm’s sales will fall within 150,000 of the expected
level of sales?
(c) In order to cover fixed costs, the firm’s sales must exceed the break-even level of 1.8
million. What is the probability that sales will exceed the break-even level?
(d) Determine the sales level that has only a 9% chance of being exceeded next year.
6. The speeds of cars passing a certain point on a motorway can be taken to be normally distributed. Observations show that of cars passing the point, 95% are traveling at less than 85 km/h and 10% are traveling at less than 55 km/h.
(a) Find the average speed of the cars passing the point.
(b) Find the proportion of cars that travel at more than 70 km/h.
7. The results of a particular examination are given below in a summary form
Result                        % of candidates
Passed with distinction       10
Passed without distinction    60
Failed                        30
It is known that a candidate fails in the examination if he obtains less than 40 marks (out of
100) while he must obtain at least 75 marks in order to pass with distinction. Determine the
mean and standard deviation of the distribution of marks, assuming this to be normal.
8. The masses of packets of sugar are normally distributed. In a large consignment of packets of
sugar, it is found that 5% of them have a mass greater than 510g and 2% have a mass greater
than 515g. Estimate the mean and standard deviation of this distribution.
9. The masses of boxes of oranges are normally distributed such that 30% of them are greater than 4.00 kg and 20% are greater than 4.53 kg. Estimate the mean and standard deviation of the masses.
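Areas under the normal curve are usually read from tables, but they can also be computed. The sketch below uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later) for parts of Examples 2 and 4. Computed values may differ slightly from answers obtained with rounded table entries.

```python
from statistics import NormalDist

# Example 2: daily incomes with mean Sh. 1000 and standard deviation Sh. 100.
income = NormalDist(mu=1000, sigma=100)
area_1000_to_1100 = income.cdf(1100) - income.cdf(1000)  # part (b)
prob_790_to_1000 = income.cdf(1000) - income.cdf(790)    # part (c)
prob_below_790 = income.cdf(790)                         # part (d)

# Example 4: tyre mileage with mean 47,900 and standard deviation 2,050 miles.
# The guarantee is the mileage below which only 4% of tyres fall.
tyres = NormalDist(mu=47900, sigma=2050)
guarantee = tyres.inv_cdf(0.04)

print(f"{area_1000_to_1100:.4f}")  # 0.3413
print(f"{prob_790_to_1000:.4f}")   # 0.4821
print(f"{prob_below_790:.4f}")     # 0.0179
print(round(guarantee))
```

`cdf` gives the area to the left of a value, so an area between two values is a difference of two `cdf` calls, and `inv_cdf` reverses the lookup for questions that start from a percentage.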
The normal approximation to the binomial distribution
If $X \sim B(n, p)$ and n and p are such that $np > 5$ and $nq > 5$, then $X \sim N(np, npq)$ approximately.
Examples
1. Find the probability of obtaining 4, 5, 6 or 7 heads when a fair coin is tossed 12 times
(a) Using the binomial distribution
(b) Using a normal approximation to the binomial distribution
2. In a sack of mixed grass seeds, the probability that a seed is ryegrass is 0.35. Find the probability that in a random sample of 400 seeds from the sack,
(a) Less than 120 are ryegrass seeds,
(b) Between 120 and 150 (inclusive) are ryegrass
(c) More than 160 are ryegrass seeds
3. It is given that 40% of the population support the birthday party. One hundred and fifty members of the population are selected at random. Use a suitable approximation to find the probability that more than 55 out of the 150 support the birthday party.
4. At a particular hospital, records show that each day, on average, only 80% of people keep their appointment at the outpatients’ clinic. Find the probability that on a day when 200 appointments have been booked:
(a) More than 170 patients keep their appointments
(b) At least 155 patients keep their appointments
5. A certain tribe is distinguished by the fact that 45% of the males have six toes on their right foot. Find the probability that, in a group of 200 males from the tribe, more than 97 have six toes on their right foot.
6. A lorry load of potatoes has, on average, one rotten potato in six. A green grocer decides to refuse the consignment if she finds more than 18 rotten potatoes in a random sample of 100. Find the probability that she accepts the consignment.
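Example 1 (12 tosses of a fair coin) can be used to compare the two methods. The Python sketch below computes the exact binomial probability and the normal approximation. It applies a continuity correction, treating "4 to 7 heads" as the interval 3.5 to 7.5; the notes above do not state this correction, so it is an assumption of the sketch, although it is the usual practice when a continuous curve approximates a discrete count.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 0.5  # np = 6 and nq = 6, both greater than 5
q = 1 - p

# (a) Exact binomial probability of 4, 5, 6 or 7 heads.
exact = sum(comb(n, x) * p**x * q ** (n - x) for x in range(4, 8))

# (b) Normal approximation N(np, npq) with a continuity correction (assumed).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(3.5)

print(f"{exact:.4f}")   # 0.7332
print(f"{approx:.4f}")  # 0.7323
```

Even with n as small as 12, the two answers differ by only about 0.001 in this case.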
The normal approximation to the Poisson distribution
If X follows a Poisson distribution with parameter $\lambda$, i.e. $X \sim Po(\lambda)$, then $E(X) = \lambda$ and $Var(X) = \lambda$. When $\lambda$ is large ($\lambda > 15$), the normal distribution can be used as an approximation, where $X \sim N(\lambda, \lambda)$.
Examples
1. A radioactive disintegration gives counts that follow a Poisson distribution with a mean count of 25 per second. Find the probability that in a one second interval, the count is between 23 and 27 inclusive.
2. In a certain factory the number of accidents occurring in a month follows a Poisson distribution with a mean of 4. Find the probability that there will be at least 40 accidents during one year.
3. In an experiment with a radioactive substance, the number of particles reaching a counter over a given period of time follows a Poisson distribution with mean 22. Find the probability that the number of particles reaching the counter over a given period of time is
(a) Less than 22
(b) Between 25 and 30
(c) 18 or more.
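For Example 1 (mean count of 25 per second), the exact Poisson probability and the normal approximation can be compared directly. The Python sketch below again assumes a continuity correction, treating "23 to 27 inclusive" as the interval 22.5 to 27.5; that correction is not stated in the notes above.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 25  # mean count per second; greater than 15, so the approximation applies

# Exact Poisson probability of a count between 23 and 27 inclusive.
exact = sum(exp(-lam) * lam**x / factorial(x) for x in range(23, 28))

# Normal approximation N(lam, lam) with a continuity correction (assumed).
approx_dist = NormalDist(mu=lam, sigma=sqrt(lam))
approx = approx_dist.cdf(27.5) - approx_dist.cdf(22.5)

print(f"{exact:.4f}")
print(f"{approx:.4f}")  # 0.3829
```

The variance of the approximating normal distribution is $\lambda$, so its standard deviation is $\sqrt{\lambda}$, which is 5 here.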