INTRODUCTION TO STATISTICS
ST.PAULS UNIVERSITY

PROBABILITY DISTRIBUTIONS

Definition of a probability distribution
A probability distribution is a listing of all the outcomes of an experiment and the probability associated with each outcome.

Characteristics of a probability distribution
- The probability of a particular outcome is between 0 and 1, inclusive.
- The sum of the probabilities of all mutually exclusive outcomes is 1.0.

Random variables
Definition: a random variable is a quantity resulting from an experiment that, by chance, can assume different values.
- Discrete random variables can assume only certain clearly separated values.
- Continuous random variables can assume any value within a given range.

For a discrete random variable, the probabilities sum to 1:

$$\sum_{\text{all } x} P(X = x) = 1$$

The function that allocates the probabilities, $P(X = x)$, is known as the probability function of X (some texts call it the probability density function, abbreviated to p.d.f., although that term is more usually reserved for continuous random variables).

The mean, variance and standard deviation of a probability distribution
The mean, also referred to as the expected value, is denoted by $E(X)$:

$$\mu = E(X) = \sum x \, P(x)$$

where $P(x)$ is the probability of each possible value $x$ of the random variable.

The variance is

$$\sigma^2 = \sum (x - \mu)^2 \, P(x)$$

and the standard deviation is $\sigma = \sqrt{\sigma^2}$.

Example
John sells cars for General Motors. He usually sells the largest number of cars on Saturday. He has the following probability distribution for the number of cars he expects to sell on a particular Saturday.

No. of cars (x) | Probability P(x)
0               | 0.1
1               | 0.2
2               | 0.3
3               | 0.3
4               | 0.1
Total           | 1.0

i. On a typical Saturday, how many cars does John expect to sell?
ii. What is the variance of the distribution?

BINOMIAL PROBABILITY DISTRIBUTION

Conditions for a binomial model
For a situation to be described using a binomial model, the following conditions must hold:
- An outcome on each trial of an experiment is classified into one of two mutually exclusive categories: a success or a failure.
- The random variable counts the number of successes in a fixed number of trials.
- The probability of a success stays the same for each trial, and so does the probability of a failure.
- The trials are independent, i.e. the outcome of one trial does not affect the outcome of any other trial.

Let X be the number of successful outcomes in n trials. If the above conditions are satisfied, X is said to follow a binomial distribution, written $X \sim B(n, p)$, where p is the probability of a success and $q = 1 - p$ is the probability of a failure. The probability of obtaining x successes in n trials is

$$P(X = x) = {}^{n}C_{x} \, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

Examples
1. At Nakumatt Supermarket, 30% of the customers pay by credit card. Find the probability that in a randomly selected sample of ten customers:
   i. Exactly two pay by credit card
   ii. Fewer than three pay by credit card
   iii. More than three pay by credit card
   iv. More than seven pay by credit card
2. The random variable X is distributed B(7, 0.2). Find
   i. P(X 3)
   ii. P(1 X 4)
   iii. P(X 1)
3. A box contains a large number of pens. The probability that a pen is faulty is 0.1. How many pens would you need to select to be more than 95% certain of picking at least one faulty one?

The mean and variance of a binomial distribution
If $X \sim B(n, p)$, the mean is $\mu = E(X) = np$ and the variance is $\sigma^2 = npq$.

The mode of the binomial distribution
The mode is the value of X that is most likely to occur. If $(n + 1)p$ is an integer, there are two modes, $(n + 1)p - 1$ and $(n + 1)p$; this happens, for example, when p = 0.5 and n is odd. Otherwise the distribution has a single mode, the integer part of $(n + 1)p$.

Examples
1. The probability that it will be a sunny day is 0.4. Find the expected number of sunny days in a week and also the standard deviation.
2. X is B(n, p) with mean 5 and standard deviation 2. Find the values of n and p.
3. 10% of the articles from a certain production line are defective. A sample of 25 articles is taken. Find the expected number of defective items and the standard deviation.
4. The random variable X is B(n, 0.3) and E(X) = 2.4. Find n and the standard deviation of X.

HYPERGEOMETRIC PROBABILITY DISTRIBUTION
The hypergeometric distribution is used when a sample is selected without replacement from a finite population and the sample size n is more than 5% of the population size N.

$$P(X = x) = \frac{\left({}^{S}C_{x}\right)\left({}^{N-S}C_{n-x}\right)}{{}^{N}C_{n}}$$

where
- C denotes a combination
- n is the number of trials, or the size of the sample
- x is the number of successes of interest
- N is the size of the population
- S is the number of successes in the population

Examples
1. Suppose 50 computers were manufactured during the week: 40 operated perfectly and 10 had at least one defect. A sample of 5 is selected at random. What is the probability that 4 of the 5 will operate perfectly?
2. Keith's Florist has 15 trucks used mainly to deliver flowers and flower arrangements within the Nairobi area. Suppose 6 of the 15 trucks have brake problems. Five trucks were selected at random to be tested. What is the probability that 2 of those tested have defective brakes?
3. A new flavour of toothpaste has been developed. It was tested by a group of ten people. Six of the group said they liked the new flavour and the remaining four indicated they did not. Four of the ten are to participate in an in-depth interview. What is the probability that, of those selected for the in-depth interview, two liked the flavour and two did not?

THE POISSON PROBABILITY DISTRIBUTION

Conditions for a Poisson model
- Events occur singly and at random in a given interval of time or space.
- λ, the mean number of occurrences in the given interval, is known and finite.
- The probability of a success is usually small and the number of trials is usually large.
- The variable X is the number of occurrences in the given interval.

If the above conditions are satisfied, X is said to follow a Poisson distribution, written $X \sim Po(\lambda)$, where

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, 3, \ldots$$

The mean and variance of a Poisson distribution
The mean and variance of a Poisson distribution are equal, i.e. the mean is $\mu = \lambda$ and the variance is $\sigma^2 = \lambda$.

The mode of a Poisson distribution
If λ is an integer, there are two modes, $\lambda - 1$ and $\lambda$. If λ is not an integer, the mode is the integer below λ.

Examples
1. A student finds that the average number of amoebas in 10 ml of pond water from a particular pond is four. Assuming that the number of amoebas follows a Poisson distribution, find the probability that in a 10 ml sample:
   i. There are exactly five amoebas
   ii. There are no amoebas
   iii. There are fewer than three amoebas
2. On average the school photocopier breaks down eight times during the school week (Monday to Friday). Assuming that the number of breakdowns can be modeled by a Poisson distribution, find the probability that it breaks down:
   i. Five times in a given week
   ii. Once on a Monday
   iii. Eight times in a fortnight
3. X follows a Poisson distribution with standard deviation 1.5. Find P(X 3).
4. Mrs. Mwangi is a loan officer at Barclays Bank. Based on her years of experience, she estimates that the probability that an applicant will not be able to repay his or her installment loan is 0.025. Last month she made 40 loans. What is the probability that:
   i. Three loans would be defaulted?
   ii. At least three loans would be defaulted?
5. It is estimated that 0.5% of the callers to the billing department of Telkom Kenya will receive a busy signal. What is the probability that, of today's 1,200 callers, at least five received a busy signal?
6. The marketing manager of a company has noted that she usually receives 10 complaint calls from customers during a week consisting of 5 working days, and that calls occur at random. Find the probability of receiving five such calls in a single day.
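The formulas for the discrete distributions above can be checked numerically. The following is a minimal Python sketch, assuming Python 3.8 or later (for `math.comb`); the helper names (`mean_variance`, `binomial_pmf`, `smallest_n_at_least_one`, `hypergeometric_pmf`, `poisson_pmf`) are illustrative and not part of the notes. It evaluates John's mean and variance, two parts of the Nakumatt example, the pen-sampling question, the first hypergeometric example and the first Poisson example.

```python
from math import comb, exp, factorial


def mean_variance(dist):
    """Mean and variance of a discrete distribution given as {x: P(x)}."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def smallest_n_at_least_one(p, certainty):
    """Smallest n with P(at least one success) = 1 - q^n > certainty."""
    n = 1
    while 1 - (1 - p) ** n <= certainty:
        n += 1
    return n


def hypergeometric_pmf(x, N, S, n):
    """P(X = x) = (SCx)((N-S)C(n-x)) / (NCn): a sample of n drawn without
    replacement from N items, S of which are successes."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


if __name__ == "__main__":
    # John's car sales: x -> P(x)
    mu, var = mean_variance({0: 0.1, 1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1})
    print(f"John: E(X) = {mu:.2f}, variance = {var:.2f}")  # 2.10, 1.29

    # Nakumatt: X ~ B(10, 0.3)
    print(f"P(X = 2) = {binomial_pmf(2, 10, 0.3):.4f}")  # 0.2335
    print(f"P(X < 3) = {sum(binomial_pmf(k, 10, 0.3) for k in range(3)):.4f}")  # 0.3828

    # Pens: p = 0.1 faulty; want > 95% certainty of at least one faulty pen
    print("pens needed:", smallest_n_at_least_one(0.1, 0.95))  # 29

    # Computers: N = 50, S = 40 working, sample n = 5, x = 4 working
    print(f"P(4 of 5 work) = {hypergeometric_pmf(4, 50, 40, 5):.4f}")  # 0.4313

    # Amoebas: X ~ Po(4)
    print(f"P(X = 5) = {poisson_pmf(5, 4):.4f}")  # 0.1563
    print(f"P(X < 3) = {sum(poisson_pmf(k, 4) for k in range(3)):.4f}")  # 0.2381
```

The pen question is solved by simple search rather than logarithms: it increases n until $1 - q^n$ first exceeds the required certainty, which avoids rounding issues at the boundary.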
Using the Poisson distribution as an approximation to the binomial distribution
When n is large ($n > 50$) and p is small ($p < 0.1$), the binomial distribution $X \sim B(n, p)$ can be approximated by a Poisson distribution with the same mean, i.e. $X \sim Po(np)$. The approximation improves as n gets larger and p gets smaller.

Examples
1. Eggs are packed into boxes of 500. On average 0.7% of the eggs are found to be broken when the eggs are unpacked. Find the probability that in a box of 500 eggs,
   (a) Exactly three are broken
   (b) At least two are broken
2. A Christmas draw aims to sell 5000 tickets, 50 of which will win a prize.
   (a) A syndicate buys 200 tickets. Let X represent the number of these tickets that win a prize.
       i. Justify the use of the Poisson approximation for the distribution of X.
       ii. Calculate P(X 3).
   (b) Calculate how many tickets should be bought in order for there to be a 90% probability of winning at least one prize.
3. On average one in 200 cars breaks down on a certain stretch of road per day. Find the probability that, on a randomly chosen day,
   i. None of a sample of 250 cars breaks down,
   ii. More than two of a sample of 300 cars break down.

Applications of the Poisson distribution
The Poisson distribution is used in determining:
- The number of customers arriving at a service facility in unit time, e.g. per hour
- The number of telephone calls arriving at a telephone switchboard per unit time, e.g. per minute
- The number of printing mistakes per page in a book, or the number of typographical errors per page in typed material
- The number of radioactive particles decaying in a given interval of time
- Dimensional errors in engineering drawings
- The number of defects along a long tape
- The number of accidents on a particular road per day
- Hospital emergencies per day
- The number of defective items of a product
- The number of goals in a football match

The Poisson distribution differs from the binomial distribution in two important respects:
- Rather than consisting of discrete trials, the distribution operates continuously over some given amount of time, distance, area, etc.
- Rather than producing a sequence of successes and failures, the distribution produces successes, which occur at random points in the specified time, distance or area.

THE NORMAL PROBABILITY DISTRIBUTION

The characteristics of the normal probability distribution
- It is bell-shaped and has a single peak at the center of the distribution.
- The arithmetic mean, median and mode of the distribution are equal and located at the peak. Half of the area under the curve is above this center point and the other half is below it.
- It is symmetrical about its mean, i.e. if it is cut vertically at the central value, the two halves are mirror images.
- It is asymptotic, i.e. the curve gets closer and closer to the x-axis but never actually touches it.
- The total area under the curve is equal to 1.

The standard normal probability distribution (μ = 0, σ = 1)
Z-value: the distance between a selected value, designated X, and the mean, divided by the standard deviation. It is the distance from the mean measured in units of the standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

where X is the value of any particular observation, μ is the mean of the distribution and σ is the standard deviation of the distribution.

Importance of the normal distribution
- Frequency distributions of many physical characteristics, such as the heights and weights of people and the dimensions of items from production processes, often have the shape of the normal curve.
- It is useful as an approximation to various other distributions, e.g. the binomial and Poisson distributions, under certain limiting conditions.
- It is useful in statistical quality control, where the control limits are set using this distribution.
- It is used in sampling theory.
- It is used in testing statistical hypotheses and in tests of significance, where the assumption is that the population from which the sample is drawn is normally distributed.
- It is a fairly 'robust' distribution, i.e. reasonable results may still be obtained when the normal distribution is used as an approximation to the true distribution.

Examples
1. What is the area under the curve between the mean and X for the following Z values?
   2.84, 1.00, 0.49, 1.91, 1.25, 0.1, 0.35, 1.23
2. The daily incomes of middle managers are normally distributed with a mean of Sh. 1,000 and a standard deviation of Sh. 100. Required:
   (a) What is the Z value for an income of Sh. 1,100 and for an income of Sh. 900?
   (b) What is the area under the normal curve between 1,000 and 1,100?
   (c) What is the probability that a particular daily income selected at random is between 790 and 1,000?
   (d) What is the probability that the income is less than 790?
   (e) What is the area under the normal curve between 840 and 1,200?
   (f) What percentage of executives earn daily incomes of 1,245 and above?
   (g) What is the area under the normal curve between 1,150 and 1,250?
3. In an intelligence test administered to 1,000 students, the average score was 42 and the standard deviation was 24. Find:
   i. The number of students exceeding a score of 50 marks
   ii. The number of students scoring between 30 and 54 marks
   iii. The score exceeded by the top 100 students
4. A tyre manufacturer wants to set a minimum mileage guarantee on its new MX100 tyre. Tests reveal that the mileage is normally distributed with a mean of 47,900 miles and a standard deviation of 2,050 miles. The manufacturer wants to set the minimum guaranteed mileage so that no more than 4% of the tyres will have to be replaced. What minimum guaranteed mileage should the manufacturer announce?
5. A firm's marketing manager believes that total sales for the firm next year can be modeled using a normal distribution with a mean of 2.5 million and a standard deviation of 300,000.
   (a) What is the probability that the firm's sales will exceed 3 million?
   (b) What is the probability that the firm's sales will fall within 150,000 of the expected level of sales?
   (c) In order to cover fixed costs, the firm's sales must exceed the break-even level of 1.8 million. What is the probability that sales will exceed the break-even level?
   (d) Determine the sales level that has only a 9% chance of being exceeded next year.
6. The speeds of cars passing a certain point on a motorway can be taken to be normally distributed. Observations show that, of the cars passing the point, 95% are travelling at less than 85 km/h and 10% are travelling at less than 55 km/h.
   (a) Find the average speed of the cars passing the point.
   (b) Find the proportion of cars that travel at more than 70 km/h.
7. The results of a particular examination are summarised below.

   Result                     | % of candidates
   Passed with distinction    | 10
   Passed without distinction | 60
   Failed                     | 30

   A candidate fails the examination if he obtains less than 40 marks (out of 100), and he must obtain at least 75 marks in order to pass with distinction. Assuming that the distribution of marks is normal, determine its mean and standard deviation.
8. The masses of packets of sugar are normally distributed. In a large consignment of packets of sugar, it is found that 5% of them have a mass greater than 510 g and 2% have a mass greater than 515 g. Estimate the mean and standard deviation of this distribution.
9. The masses of boxes of oranges are normally distributed such that 30% of them are greater than 4.00 kg and 20% are greater than 4.53 kg. Estimate the mean and standard deviation of the masses.
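The Z-value calculations above, and the problems where two known percentages fix the mean and standard deviation (examples 6 to 9), can be sketched with Python's standard-library `statistics.NormalDist` (Python 3.8 or later), whose `cdf` and `inv_cdf` methods stand in for the printed normal table; answers may therefore differ slightly in the last digit from table-based ones. The helper names (`z_value`, `prob_between`, `fit_from_two_points`) are illustrative. Each known percentage gives one equation $x = \mu + z\sigma$, and the two equations are solved for μ and σ.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal: mu = 0, sigma = 1


def z_value(x, mu, sigma):
    """Z = (X - mu) / sigma: distance from the mean in standard deviations."""
    return (x - mu) / sigma


def prob_between(dist, a, b):
    """Area under the normal curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)


def fit_from_two_points(x1, p1, x2, p2):
    """Find mu and sigma given P(X < x1) = p1 and P(X < x2) = p2.

    Each condition gives x_i = mu + z_i * sigma, with z_i = inverse CDF of p_i.
    """
    z1, z2 = STANDARD.inv_cdf(p1), STANDARD.inv_cdf(p2)
    sigma = (x1 - x2) / (z1 - z2)
    mu = x1 - z1 * sigma
    return mu, sigma


if __name__ == "__main__":
    # Example 2: daily incomes ~ N(1000, 100^2)
    income = NormalDist(mu=1000, sigma=100)
    print("Z for 1,100:", z_value(1100, 1000, 100))  # 1.0
    print(f"P(790 < X < 1000) = {prob_between(income, 790, 1000):.4f}")  # 0.4821

    # Example 4: guarantee such that only 4% of tyres fall below it
    tyres = NormalDist(mu=47900, sigma=2050)
    print(f"guarantee ~ {tyres.inv_cdf(0.04):.0f} miles")

    # Example 6: 95% below 85 km/h and 10% below 55 km/h
    mu, sigma = fit_from_two_points(85, 0.95, 55, 0.10)
    speeds = NormalDist(mu, sigma)
    print(f"mean = {mu:.1f} km/h, sd = {sigma:.2f} km/h")
    print(f"P(speed > 70) = {1 - speeds.cdf(70):.3f}")
```

For the "greater than" wording in examples 8 and 9, convert to "less than" before calling the helper; for instance, 5% above 510 g means P(X < 510) = 0.95.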
The normal approximation to the binomial distribution
If $X \sim B(n, p)$ and n and p are such that $np > 5$ and $nq > 5$, then $X \sim N(np, npq)$ approximately.

Examples
1. Find the probability of obtaining 4, 5, 6 or 7 heads when a fair coin is tossed 12 times
   (a) Using the binomial distribution
   (b) Using a normal approximation to the binomial distribution
2. In a sack of mixed grass seeds, the probability that a seed is ryegrass is 0.35. Find the probability that in a random sample of 400 seeds from the sack,
   (a) Fewer than 120 are ryegrass seeds
   (b) Between 120 and 150 (inclusive) are ryegrass seeds
   (c) More than 160 are ryegrass seeds
3. It is given that 40% of the population support the birthday party. One hundred and fifty members of the population are selected at random. Use a suitable approximation to find the probability that more than 55 of the 150 support the birthday party.
4. At a particular hospital, records show that each day, on average, only 80% of people keep their appointment at the outpatients' clinic. Find the probability that, on a day when 200 appointments have been booked:
   (a) More than 170 patients keep their appointments
   (b) At least 155 patients keep their appointments
5. A certain tribe is distinguished by the fact that 45% of the males have six toes on their right foot. Find the probability that, in a group of 200 males from the tribe, more than 97 have six toes on their right foot.
6. A lorry load of potatoes has, on average, one rotten potato in six. A greengrocer decides to refuse the consignment if she finds more than 18 rotten potatoes in a random sample of 100. Find the probability that she accepts the consignment.

The normal approximation to the Poisson distribution
If X follows a Poisson distribution with parameter λ, i.e. $X \sim Po(\lambda)$, then $E(X) = \lambda$ and $Var(X) = \lambda$. When λ is large (λ > 15), the normal distribution can be used as an approximation: $X \sim N(\lambda, \lambda)$ approximately.
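The quality of the normal approximation to the Poisson distribution can be checked by comparing it with the exact Poisson sum. This is a minimal Python sketch (Python 3.8 or later; the helper names are illustrative), taking λ = 25 and the range 23 to 27 inclusive, as in the first example that follows. It applies a continuity correction, widening the integer range by 0.5 at each end; this is a standard adjustment when a continuous distribution approximates a discrete one, though the notes do not state it explicitly.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_between_exact(a, b, lam):
    """Exact P(a <= X <= b) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(a, b + 1))


def poisson_between_normal(a, b, lam):
    """P(a <= X <= b) using X ~ N(lam, lam) with a continuity correction."""
    approx = NormalDist(mu=lam, sigma=sqrt(lam))
    return approx.cdf(b + 0.5) - approx.cdf(a - 0.5)


if __name__ == "__main__":
    lam = 25  # large enough (lam > 15) for the normal approximation
    exact = poisson_between_exact(23, 27, lam)
    approx = poisson_between_normal(23, 27, lam)
    print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

For this λ the two results are close; the agreement generally worsens as λ gets smaller, which is why a threshold such as λ > 15 is quoted.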
Examples
1. A radioactive disintegration gives counts that follow a Poisson distribution with a mean count of 25 per second. Find the probability that in a one-second interval the count is between 23 and 27 inclusive.
2. In a certain factory the number of accidents occurring in a month follows a Poisson distribution with a mean of 4. Find the probability that there will be at least 40 accidents during one year.
3. In an experiment with a radioactive substance, the number of particles reaching a counter over a given period of time follows a Poisson distribution with mean 22. Find the probability that the number of particles reaching the counter over a given period of time is
   (a) Fewer than 22
   (b) Between 25 and 30
   (c) 18 or more
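The two approximations to the binomial distribution described earlier (the Poisson approximation and the normal approximation) can likewise be compared with exact binomial probabilities. The sketch below assumes Python 3.8 or later and uses illustrative helper names. The normal version applies a continuity correction (±0.5), a standard adjustment that the notes do not spell out. It checks the coin example (a fair coin tossed 12 times) and the egg example (boxes of 500 eggs with 0.7% broken).

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binomial_between(a, b, n, p):
    """Exact P(a <= X <= b) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def normal_between(a, b, n, p):
    """P(a <= X <= b) using X ~ N(np, npq) with a continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(b + 0.5) - approx.cdf(a - 0.5)


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


if __name__ == "__main__":
    # Coin: X ~ B(12, 0.5); np = nq = 6 > 5, so the normal approximation applies
    exact = binomial_between(4, 7, 12, 0.5)
    approx = normal_between(4, 7, 12, 0.5)
    print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")

    # Eggs: X ~ B(500, 0.007), approximated by Po(500 * 0.007) = Po(3.5)
    exact3 = binomial_between(3, 3, 500, 0.007)
    poisson3 = poisson_pmf(3, 500 * 0.007)
    print(f"P(X = 3): binomial = {exact3:.4f}, Poisson = {poisson3:.4f}")
```

In both cases the approximation lands close to the exact binomial value, consistent with the conditions quoted for each approximation.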