STATISTICS: MODULE 12122

Chapter 1 - Random variables and their distributions

The aim of statistical analysis is to enable one to proceed from a knowledge of the current situation to an understanding of what will happen tomorrow.

1.1 In the business world, businessmen and economists are frequently called on to make decisions in the face of uncertainty, to predict and to make forecasts. In statistics, we study numerical data so that these decisions and predictions are made after some mathematical investigation. The raw materials for decision making and hypothesis testing consist of the data and the methods we use to analyse them. The end product is the conclusion.

Statistics
The word has two meanings.
1. It refers to data which describe known things such as sales, profits, output and unemployment. These are measurements of past and present activities.
2. It refers to a set of analytical methods for collecting and analysing data. These methods ultimately lead to decision-making, hypothesis testing and prediction in the face of uncertainty. The science of the analysis of data rests on mathematics, particularly on probability theory.

1.2 Terminology

Random experiment
The word experiment is used because the outcome is yet to be determined, whereas random signifies that any particular outcome is uncertain.

Random variable
A numerical quantity whose value is determined by the outcome of a random experiment. It is a variable (usually denoted by capital letters, e.g. X, Y, Z, ...) whose value (usually denoted by small letters, e.g. x, y, z, ...) is subject to chance.

Example 1.1
(1) A government department studies certain characteristics of the economy over time, such as monthly unemployment figures and annual percentage growth.
Let X = the monthly unemployment figures in the UK over the last 12 years.
Let Y = the annual percentage growth of the UK economy over the last 20 years.
(2) An investment company records (i) the prices of 5 different shares traded on the London Stock Exchange on particular days, (ii) the rate of return on a particular portfolio over a period of time.
Let X_i = price of share i, i = 1, 2, 3, 4, 5, on a particular day.
Let Y = rate of return of a particular portfolio at any particular time.

(3) A company records (i) the number of unpaid accounts at a particular time, (ii) its market share for a particular product over a period of time, (iii) the sales of a particular product over a period of time.
Let X = the number of unpaid accounts at a particular time.
Let Y = the market share of the product at a particular time.
Let Z = the sales of a particular product at a particular time.

(4) A large national motoring school records each year (i) the number of driving tests clients undertake before passing the test, (ii) the total number of lessons their clients receive, (iii)
Let X = the number of driving tests before passing the test.
Let Y = the number of lessons each client receives.
Let Z =

Sample space S
The set of all possible outcomes of a random experiment, or equivalently the set of all possible values of a random variable, is called the sample space (or possibility space) or population.

Example 1.2 Describe the sample spaces for random experiments (3) and (4) above.

Discrete random variable: one which can only assume certain specific values, usually integers. Discrete random variables usually arise from counting.

Continuous random variable: one which can assume any value between two given values. Continuous random variables usually arise from measurement.

Example 1.3 Which of the above random variables are discrete and which are continuous?

Event
An event is a subset of S, so if we call the event A, then A ⊆ S and, if n(A) is the number of elements in A, then n(A) ≤ n(S).

Example 1.4 Using random experiment (3).
(i) A is the event that there are at least 2000 unpaid accounts.
(ii) A is the event that the market share is between 65% and 72% inclusive.
(iii) A is the event that sales are more than £75000.

1.3 Discrete Probability Models or Distributions

One of the basic tasks of a statistical investigation is to find a theoretical probability model or distribution which fits the data under investigation. Having found such a model to fit the data, we may use it to make predictions about future observations.

For example, suppose an airline has 16 seats remaining seven days before a given flight departs. If more than 16 people arrive for the flight, one or more persons will be bumped from the flight, and the airline will have to pay a penalty. We can use a probability model called the Binomial model to determine the probability that at least one person will be bumped.

If the random variable is a discrete random variable, the model/distribution is called a discrete probability model/distribution. A discrete probability distribution is defined by specifying the set of values of the random variable and the probabilities assigned to each of these values.

Example 1.5 Two discrete probability models/distributions
An investment consultant wishes to buy stock to be held for one year in anticipation of capital gain. He has narrowed his choice down to High Volatility Enterprises and Stability Power. Both stocks currently sell for £10 per share and yield £0.20 dividends per share. He puts forward models for the next year's selling prices.

High Volatility Enterprises (HVE)
Value of X1, x1     Probability, p1(x1)
 2.50               0.05
 5.00               0.07
 7.50               0.10
10.00               0.05
12.50               0.10
15.00               0.15
17.50               0.12
20.00               0.10
22.50               0.12
25.00               0.14

Stability Power (SPO)
Value of X2, x2     Probability, p2(x2)
 9.50               0.10
10.00               0.25
10.50               0.50
11.00               0.15

The random variables are X1 = selling price of High Volatility Enterprises shares and X2 = selling price of Stability Power shares.
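As an illustration only, the two models in Example 1.5 can be written down as lookup tables and checked against the defining property of a discrete probability distribution, namely that the probabilities lie between 0 and 1 and total 1. The names `hve`, `spo` and `is_probability_function` are hypothetical and do not come from the notes; this is a minimal sketch rather than a prescribed method.

```python
import math

# Probability functions from Example 1.5, stored as {selling price: probability}.
hve = {2.50: 0.05, 5.00: 0.07, 7.50: 0.10, 10.00: 0.05, 12.50: 0.10,
       15.00: 0.15, 17.50: 0.12, 20.00: 0.10, 22.50: 0.12, 25.00: 0.14}
spo = {9.50: 0.10, 10.00: 0.25, 10.50: 0.50, 11.00: 0.15}


def is_probability_function(p):
    """Check each probability lies in [0, 1] and that they total 1 (up to rounding)."""
    in_range = all(0.0 <= prob <= 1.0 for prob in p.values())
    return in_range and math.isclose(sum(p.values()), 1.0, abs_tol=1e-9)


print(is_probability_function(hve), is_probability_function(spo))  # True True
print(sorted(hve))  # the possible values of X1, in increasing order
```

Storing each model as a mapping from value to probability keeps the set of values and the probabilities together, which is exactly the pair of ingredients the definition asks for.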
The sample space of X1 = {2.50, 5.00, 7.50, 10.00, ..., 25.00}.
The sample space of X2 = {9.50, 10.00, 10.50, 11.00}.

A probability function, denoted by p(x), assigns a probability to each value of the random variable such that the sum of the probabilities is 1.
p(x) = P(X = x), where X is the random variable and x is its value.
So in the above example the probability functions are p1(x1) and p2(x2), defined by p1(x1) = P(X1 = x1) and p2(x2) = P(X2 = x2).

NOTE The probabilities total 1 for each distribution.

In order to calculate the probability of certain events, e.g. the probability that the High Volatility share price is at least £7.50, i.e. P(X ≥ 7.50), we need to use probability laws.

1.4 Probability Summary

Probability is a measure of the degree of certainty which we associate with the occurrence of a particular event. If P(A) is the probability of event A occurring, then 0 ≤ P(A) ≤ 1.

1. Addition law
(a) If events A and B are mutually exclusive events, i.e. they cannot both occur together, then
P(A or B) = P(A ∪ B) = P(A) + P(B)

Example 1.6 Calculate the probability that the HVE selling price is less than £7.50.

(b) P(Not A) = P(Ā) = 1 - P(A)

Example 1.7 Calculate the probability that the SPO selling price is at least £10.00.

(c) If events A and B are not mutually exclusive then
P(A or B) = P(A) + P(B) - P(A and B)
or
P(A ∪ B) = P(A) + P(B) - P(A ∩ B), using set notation (see Chapter 1 QM1).

2. Conditional Probabilities
P(A / B) is the probability of event A occurring given that event B has occurred or is certain to occur.

$$P(A / B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{P(A \cap B)}{P(B)} \qquad (*)$$

$$P(B / A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(A \cap B)}{P(A)} \qquad (**)$$

e.g. Suppose A is the event 'Car driver has a road accident' and B is the event 'Driver is drunk'. Accident statistics would indicate that P(A / B) >> P(A), i.e. P(A / B) is much greater than P(A). We say in this case that the events A and B are dependent.
Generally P(A / B) ≠ P(A). If P(A / B) = P(A) or P(B / A) = P(B), then we say that the events A and B are independent.

3. Multiplication law
(a) If events A and B are dependent events then
P(A and B) = P(A ∩ B) = P(A / B) × P(B)   (rearranging equation (*))
or
P(A and B) = P(A ∩ B) = P(B / A) × P(A)   (rearranging equation (**))

(b) If events A and B are independent, i.e. the probability of A occurring is in no way affected by the occurrence or non-occurrence of event B, in which case P(A / B) = P(A), then
P(A and B) = P(A ∩ B) = P(A) × P(B)

Example 1.8 If it can be assumed that HVE and SPO share prices move independently of one another,
(i) calculate the probability that the HVE share price is at least £7.50 and the SPO share price is at least £10;
(ii) calculate the probability that the HVE share price is at least £7.50 or the SPO share price is at least £10.

1.5 Expectation or long-run average E(X)

It is useful to know the mean of a probability distribution: the mean tells us where the centre of the distribution is, and it tells us what value X can be expected to take on average in the long run. The mean of a probability distribution is defined as follows:

$$\mu = E(X) = \sum x\,p(x) = \sum x\,P(X = x).$$

E(X) is called the expectation of the random variable X or the expected value of the random variable X.

Example 1.9 Compare the expected selling prices for HVE and SPO shares.

Example 1.10 A contractor has to choose between two jobs. The first job promises a profit of £120,000 with a probability of 0.75 or a loss of £30,000 (due to strikes and other delays) with a probability of 0.25; the second job promises a profit of £180,000 with a probability of 0.5 or a loss of £45,000 with a probability of 0.5.
(a) Which job should the contractor choose if he wants to maximise his expected profit?
(b) Which job would the contractor choose if his business is in fairly bad shape and he will go broke unless he can make a profit of at least £150,000 on his next job?
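The definition of E(X) in section 1.5 translates directly into a weighted sum. The sketch below applies it to the two jobs in Example 1.10, treating a loss as a negative profit; the helper names `expectation` and `prob_at_least` are illustrative assumptions, not part of the notes.

```python
def expectation(distribution):
    """Return E(X) = sum of x * p(x) for a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())


def prob_at_least(distribution, threshold):
    """Return P(X >= threshold) by adding the probabilities of the
    mutually exclusive values that meet the threshold (addition law)."""
    return sum(prob for value, prob in distribution.items() if value >= threshold)


# Profit distributions from Example 1.10 (losses entered as negative profits).
job_1 = {120_000: 0.75, -30_000: 0.25}
job_2 = {180_000: 0.50, -45_000: 0.50}

print(expectation(job_1))  # 82500.0
print(expectation(job_2))  # 67500.0

# Part (b) asks about a different criterion: the chance of reaching 150,000.
print(prob_at_least(job_1, 150_000))  # 0
print(prob_at_least(job_2, 150_000))  # 0.5
```

The two criteria point in different directions here, which is the purpose of the example: the job with the larger expected profit has no chance of reaching the £150,000 target.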
1.6 Standard deviation (St.dev(X))

It is useful to have a measure of the spread or dispersion of a probability distribution about the mean position µ, such as the standard deviation (σ).

$$\sigma = \text{St.dev}(X) = \sqrt{E[(X - \mu)^2]} = \sqrt{E(X^2) - [E(X)]^2} = \sqrt{E(X^2) - \mu^2}$$

$$\sigma^2 = \text{Var}(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2$$

$$E(X^2) = \sum x^2\,p(x) = \sum x^2\,P(X = x).$$

Example 1.11 Compare the HVE and SPO share price standard deviations. Which share is riskier?

1.7 Expectation of a function of a random variable

If X is a random variable and g(X) is a function of X, then

$$E(g(X)) = \sum g(x)\,p(x) = \sum g(x)\,P(X = x).$$

Examples
Take g(X) = X and you get E(X) = Σ x p(x) = Σ x P(X = x), as in 1.5.
Take g(X) = X² and you get E(X²) = Σ x² p(x) = Σ x² P(X = x), as in 1.6.
Take g(X) = aX + b, where a and b are constants, and you get E(aX + b) = Σ (ax + b) p(x) = a Σ x p(x) + b Σ p(x) = aE(X) + b.

1.8 Continuous Probability Models or Distributions

In many respects continuous probability models are very different from discrete probability models. For continuous probability models, the random variable is continuous and the sample space S is continuous. The models are defined by specifying a probability density function (p.d.f.), typically denoted by f(x). A probability density function tells us about probability density, i.e. where most of the probability is concentrated, whereas with discrete models the probability function gives the actual probability, e.g. P(X = 5) = 0.62.

In the continuous case, probability is calculated by evaluating an area under the p.d.f. curve, i.e.

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Important results
(i) P(X = a) = 0 because the area under a point is zero. Hence
P(a < X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b).
Also P(X > b) = P(X ≥ b) and P(X < a) = P(X ≤ a).
(ii) The total area under the p.d.f. curve is 1 since P(S) = 1, i.e.

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Definition
A function f(x) is a proper probability density function if

$$f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

1.9 Expectation E(X)

If X is a continuous random variable then

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

1.10 Variance and Standard deviation (Var(X) and St.dev(X))

If X is a continuous random variable, then

$$\sigma^2 = \text{Var}(X) = E[(X - \mu)^2] = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2,$$

where

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx, \qquad \sigma = \text{St.dev}(X) = \sqrt{E(X^2) - \mu^2}.$$

Example 1.11 The owner of Smartville Fashions Ltd has determined that their accounts receivable can be modelled as a random variable X, which has the probability density function

$$f(x) = \begin{cases} 0.20 - cx & 0 \le x \le 10 \\ 0 & \text{otherwise} \end{cases}$$

(i) Find the value of c that makes f(x) a proper probability density function.
(ii) Find P(X > 5) and P(2 ≤ X ≤ 8).
(iii) Find E(X) and St.dev(X).

1.11 Expectation of a function of a random variable

If X is a continuous random variable and g(X) is a function of X, then

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$$

1.12 Cumulative distribution function or distribution function F(x)

The c.d.f. F(x) is defined as F(x) = P(X ≤ x), a function of x.

For discrete random variables
For a discrete random variable X with probability function p(x), i.e. p(x) = P(X = x),

$$F(x) = \sum_{u \le x} p(u) = \sum_{u \le x} P(X = u).$$

The graph of F will always be a step function.
Also note that P(a < X ≤ b) = F(b) − F(a), and
P(a ≤ X ≤ b) = P(a < X ≤ b) + P(X = a), so P(a ≤ X ≤ b) = F(b) − F(a) + P(X = a).

Example 1.12 Construct the c.d.f. for the SPO share price model in section 1.3 and sketch it. Hence calculate the probability that the SPO share price will be at most £10.20.

Example 1.13 Each batch of chemical used in drug manufacture is tested for impurities. The percentage of impurity is X, where X is a random variable with probability density function given by

$$f(x) = \begin{cases} \tfrac{1}{2}x & 0 < x \le 1 \\ \tfrac{1}{6}(4 - x) & 1 < x \le 4 \\ 0 & \text{otherwise} \end{cases}$$

(i) Determine, for all x, the (cumulative) distribution function F(x).
(ii) In order to purify the chemical it is subjected to one of four possible purification processes, the percentage impurity in the batch determining the actual process used. The process used and its cost, for each level of percentage impurity, are shown in the table.

Percentage impurity x     Process used     Batch cost (£)
0 < x ≤ 1                 A                200
1 < x ≤ 2                 B                250
2 < x ≤ 3                 C                350
3 < x ≤ 4                 D                500

Determine the expected cost per batch of removing the impurities.

For continuous random variables
For the continuous random variable X with p.d.f. f(x), the c.d.f. F(x) is given by the area under the p.d.f. curve from −∞ to x:

$$F(x) = \int_{-\infty}^{x} f(u)\,du$$

(The top limit is always x when defining F(x).) So F(∞) = 1 and F(−∞) = 0.
Also P(a < X < b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = F(b) − F(a).
Also, because differentiation is the opposite of integration, there is an important relationship between the p.d.f. f(x) and the c.d.f. F(x), namely

$$f(x) = \frac{dF}{dx} \quad \text{or} \quad F'(x).$$

Example 1.14 Suppose that the time in months between disputes (T) in a large company has a probability density function f(t) given by

$$f(t) = \begin{cases} \lambda e^{-\lambda t} & t > 0 \\ 0 & \text{otherwise} \end{cases}$$

(i) Suppose the expected time between disputes is 6 months. Find λ.
(ii) Construct and sketch the distribution function F(t), and hence calculate the probability of the time between disputes exceeding 2 months.
(iii) Find the median inter-dispute time.
(iv) Suppose there has been no dispute for 6 months. What is the probability that there will be no dispute for a further 6 months?

(A continuous probability distribution with a p.d.f. such as this is called an Exponential distribution. It is used to model a number of waiting line problems, including the time between cars arriving at expressway toll booths, cars arriving at a car wash, customers arriving at a bank teller, customers arriving at a supermarket till, machine breakdowns and telephone calls to a business.
For example, it might play a part in deciding how many tills to operate in a supermarket so that large queues do not form but till operators are not idle too much.)

Summary

1. Random variables are variables whose values are subject to chance, so that to each value of the random variable you can attach a probability. They are classified as discrete or continuous and they have distributions called probability distributions or probability models. It is most important that you classify your random variable correctly, as distributions of discrete random variables are defined differently from those of continuous random variables.

2. Probability distributions
(a) Discrete random variables
A probability distribution is defined by specifying the values of the random variable (or equivalently the sample space) and the probabilities associated with these values via the probability function p(x), where p(x) is defined by p(x) = P(X = x), X is the random variable and x is its value. Note that

$$\sum_x p(x) = 1,$$

i.e. the probabilities sum to 1.

(b) Continuous random variables
A probability distribution is defined by specifying the values of the random variable and the probability density function (p.d.f.) f(x), where

$$f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Probability is calculated by evaluating an area under the p.d.f. curve, i.e.

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

Note that P(X = a) = 0, which is very different from discrete random variables.

3. Expectation
E(X) or µ gives the expected value of X or the average value of X in the long run. It gives us the centre of the probability distribution.
For discrete r.v.

$$\mu = E(X) = \sum x\,p(x) = \sum x\,P(X = x).$$

For continuous r.v.

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$

4. Variance and standard deviation (Var(X) or σ², and St.dev(X) or σ)
They are measures of the dispersion or spread of X values about the mean µ.
For discrete r.v.

$$\sigma^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2, \quad \text{where } E(X^2) = \sum x^2\,p(x).$$

For continuous r.v.
$$\sigma^2 = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2, \quad \text{where } E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx.$$

5. Cumulative distribution or distribution function F(x)
The c.d.f. F(x) is defined as F(x) = P(X ≤ x).
For discrete r.v.

$$F(x) = \sum_{u \le x} p(u) = \sum_{u \le x} P(X = u)$$

and its graph will always be a step function.
Also P(a < X ≤ b) = F(b) − F(a) and P(a ≤ X ≤ b) = F(b) − F(a) + P(X = a).

For continuous r.v.

$$F(x) = \int_{-\infty}^{x} f(u)\,du$$

and it equals the area under the p.d.f. curve from −∞ to x, where f(x) is the p.d.f. of X. So F(∞) = 1 and F(−∞) = 0. The p.d.f. is

$$f(x) = \frac{dF}{dx} \quad \text{or} \quad F'(x).$$

Also P(a < X ≤ b) = F(b) − F(a) and P(a ≤ X ≤ b) = F(b) − F(a).

Particular skills required from 'A' level

You need to be able to integrate the following:

(a) $$\int x^n\,dx = \frac{1}{n+1}x^{n+1} + c \qquad (n \ne -1)$$

(b) $$\int e^{ax+b}\,dx = \frac{1}{a}e^{ax+b} + c \qquad (a \ne 0)$$
which generalises to
$$\int f'(x)\,e^{f(x)}\,dx = e^{f(x)} + c$$

(c) $$\int \frac{1}{ax+b}\,dx = \frac{1}{a}\ln(ax+b) + c \qquad (a \ne 0)$$
which generalises to
$$\int \frac{f'(x)}{f(x)}\,dx = \ln f(x) + c.$$

You also need to be able to integrate by parts:

$$\int_a^b u(x)\frac{dv}{dx}\,dx = \big[u(x)v(x)\big]_a^b - \int_a^b \frac{du}{dx}v(x)\,dx$$

For an example of this, see Example 1.14, part (i).
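When an integral is worked by hand, a rough numerical cross-check can catch slips. The sketch below approximates the continuous-case formulas from the summary for an exponential p.d.f. of the kind in Example 1.14. It assumes a rate of 1/6 (using the standard result that an exponential distribution has mean 1/λ), replaces the infinite upper limit by 300, and uses a simple midpoint rule; the helper `integrate` and all variable names are hypothetical choices for illustration.

```python
import math


def integrate(func, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


lam = 1 / 6    # assumed rate, so that the mean 1/lam is 6 months
UPPER = 300.0  # stands in for infinity; the tail beyond this is negligible here


def pdf(t):
    """Exponential p.d.f. f(t) = lam * exp(-lam * t) for t > 0."""
    return lam * math.exp(-lam * t)


total_area = integrate(pdf, 0.0, UPPER)                          # should be close to 1
mean = integrate(lambda t: t * pdf(t), 0.0, UPPER)               # E(T)
second_moment = integrate(lambda t: t * t * pdf(t), 0.0, UPPER)  # E(T^2)
variance = second_moment - mean ** 2                             # E(T^2) - mu^2

print(round(total_area, 6), round(mean, 4), round(variance, 3))
```

A numerical check of this kind does not replace the integration by parts, but agreement between the two gives some reassurance that the limits and signs were handled correctly.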