Continuous Distributions

Commonly used continuous distributions:
• Normal Distribution
• Log Normal Distribution
• Gamma Distribution
• Chi Square Distribution
• F Distribution
• t Distribution
• Weibull Distribution
• Extreme Value Distribution (Type I and II)
• Exponential Distribution

Application notes from the slide: foundations for much of statistical inference; environmental variables; time to failure, radioactivity; basis for statistical tests; lifetime distributions; reaction kinetics.

Continuous random variables take values in a continuum on the real line. Probabilities have to be computed for sets of numbers, such as intervals, rather than for individual values.

The distributions discussed so far have had only a discrete set of possible outcomes (e.g. 0, 1, 2, ...). Now we'll discuss continuous distributions, whose outcomes lie along the real line.

Strange observation: because an infinite number of points lie on the real line, the probability of observing any particular point is zero. This means that the height of the curve does not represent a probability.

Continuous Distributions (PDF)

Continuous distributions are described by probability density functions, f(x).

What is the meaning of the probability function f(x) when X is continuous? First observe that it is meaningless to define events in terms of single continuous values. The probability of an event occurring at exactly 2.35678935465457348204945023983598459830923… is zero.

A continuous random variable has an infinite number of possible values. Thus, for a continuous random variable, an event must be defined in terms of an interval of values.

One can therefore find the probability that a random variable X will fall between two values a and b by integrating f(x) over the interval:

$P(a \le X \le b) = \int_a^b f(x)\,dx$

The total integral over the real line must equal one:

$\int_{-\infty}^{\infty} f(x)\,dx = 1$

Any one point has zero probability of occurrence.
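The statements above (interval probability as an area, total area of one, zero probability at a point) can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes an exponential density with an illustrative rate of 0.5, and the names `exp_pdf` and `integrate` and the midpoint-rule integrator are hypothetical choices made for this example.

```python
import math

RATE = 0.5  # assumed rate parameter (lambda), chosen only for illustration


def exp_pdf(x, rate=RATE):
    """Exponential density: rate * exp(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    if a == b:
        return 0.0  # an interval of zero width has zero area
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# P(1 <= X <= 3): the area under the density between 1 and 3
p_interval = integrate(exp_pdf, 1.0, 3.0)
# Closed form for the exponential: exp(-rate * a) - exp(-rate * b)
p_exact = math.exp(-RATE * 1.0) - math.exp(-RATE * 3.0)

# Total area; the tail beyond x = 60 is about exp(-30), which is negligible
total_area = integrate(exp_pdf, 0.0, 60.0)

# A single point is an interval of zero width, so its probability is zero
p_point = integrate(exp_pdf, 2.0, 2.0)

# The height of a density is not a probability: with rate 3 it exceeds 1 at x = 0
height_at_zero = exp_pdf(0.0, rate=3.0)

print("P(1 <= X <= 3), numeric:", p_interval)
print("P(1 <= X <= 3), exact:  ", p_exact)
print("total area:", total_area)
print("P(X = 2):", p_point)
print("density height at 0 with rate 3:", height_at_zero)
```

The last line is the point of the "strange observation": a density can be taller than 1 while every probability, being an area, stays between 0 and 1.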
The big difference between discrete and continuous distributions:
• Discrete: the height is the probability (sum of heights = 1).
• Continuous: the area is the probability (total area = 1).

Probability Density Function (PDF)

A function that integrates to 1 over its range and from which event probabilities can be determined.

[Figure: a density curve f(x) plotted over the range of the random variable; the area under the curve is one.]

The pdf does not have to be symmetric, nor does it have to be defined for all real numbers. The shape of the curve, written f_X(x | b), is determined by one or more distribution parameters.

[Figure: chi-square density functions.]

Normal Distribution or Gaussian Distribution

The Gaussian distribution, or Normal distribution, is probably the most commonly encountered continuous distribution. Whenever you summarize a set of data by only its average and standard deviation, you are often implicitly assuming that the underlying distribution is Gaussian.

The normal distribution is the distribution that is expected when measurements are made up of a large number of independent 'noise' components that are all distributed in the same way as each other.

Many biological and physical measurements have lots of sources of inaccuracy and noise, so the distributions of those measurements will be approximately normal, as long as the distributions of those components are similar. (The components don't have to be normal!)

Normal Distribution - Properties
1. The mean, median, and mode are equal.
2. The normal curve is bell-shaped and symmetric about the mean.
3. The total area under the curve is equal to one.
4. The normal curve asymptotically approaches zero on either side of the mean.

Standard Normal Distribution: Z score

Rescales any normal distribution axis from its true units (time, weight, dollars, barrels, and so forth) to the standard measure referred to as a z-value. Thus, any value of the normally distributed continuous random variable can be represented by a unique z-value. The rescaling:
1. Moves the mean to zero.
2. Rescales the standard deviation to 1, so that the central 68% of the area lies between z = −1.0 and z = +1.0.

Standard Normal Distribution

All normal random variables can be related back to the standard normal random variable. A standard normal random variable has mean 0 and standard deviation 1.

[Figure: the x axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ, aligned with the z axis marked at −3, −2, −1, 0, +1, +2, +3.]

Illustration

[Figure: densities of X, of X − μ, and of (X − μ)/σ. Subtracting μ moves the centre from μ to 0; dividing by σ changes the spread from σ to 1.]

Normal Distribution - Properties

To the left of μ − σ and to the right of μ + σ the graph curves upwards. Between μ − σ and μ + σ the graph curves downwards. The points at which the curvature changes, x = μ − σ and x = μ + σ, are called the inflection points.

Inflection point: where the second derivative is zero and changes sign.

Normal Distribution

A symmetric distribution defined on the range −∞ to +∞ whose shape is defined by two parameters: the mean, denoted μ, which centers the distribution, and the standard deviation, σ, which determines the spread of the distribution.

About 68% of the total area lies between the inflection points μ − σ and μ + σ:

P(μ − σ ≤ X ≤ μ + σ) ≈ 68%

Notation

Suppose X has a normal distribution with mean μ and standard deviation σ. We often denote this by X ~ N(μ, σ).

A new random variable defined as Z = (X − μ)/σ has the standard normal distribution, denoted Z ~ N(0, 1). Conversely, σZ + μ = X.

To create a random variable with a specific mean and standard deviation, we start with a standard normal deviate, multiply it by the target standard deviation, and then add the target mean.

Why is this important? Because in this way, the probability of any event on a normal random variable with any given mean and standard deviation can be computed from tables of the standard normal distribution.

Statistics textbooks usually include pre-calculated tables that give the cumulative probability P(Z < z) for each z-score.

Example

Assume that a set of test scores has a mean of 150 and a standard deviation of 25. If a particular student had a score of 190, what is his/her z-score?
z = (x − μ)/σ

Therefore z = (190 − 150)/25 = 1.6. That is, the score is 1.6 standard deviations above the mean. What percentage of students have scores above this?

Look-up table

p-value = P(Z > +1.6) = 1 − P(Z < 1.6) = 1 − 0.9452 = 0.0548 ≈ 5.5%

Because of symmetry, we could also have looked up the area from −∞ to −1.6.

Exercises
1. If z = 2.15, what is the area beyond z?
2. Find the area below z.
3. What is the sum of the above two areas?
4. What is the area between the mean and 2.15 standard deviations?
5. What is the probability of obtaining a z-score between −2.20 and 0.25 on the standard normal curve?
6. What z-score is exceeded by 10% of all scores under the normal curve?

Example

After running repeated experiments, we find that the mean doubling time for a particular strain of E. coli is 58 minutes, with a standard deviation of 10 minutes. Using z-scores, determine the range of expected doubling times at the 95% and 99% confidence levels.

Rearrange the z-score formula to solve for x (both upper and lower limits):

x = μ ± zσ

Look up a standard table to find the z-score for 95%: z = 1.645.

x_upper = 58 + 1.645 × 10 = 74.45 mins
x_lower = 58 − 1.645 × 10 = 41.55 mins

Note that z = 1.645 is the one-sided value, P(Z < 1.645) = 0.95. It leaves 5% in each tail, so the range 41.55 to 74.45 mins contains the central 90% of doubling times. For a central 95% range, use z = 1.96, for which P(Z < 1.96) = 0.975: 58 ± 1.96 × 10, or 38.4 to 77.6 mins.

Class: you work out the 99% limits.

Week 4: Exercise

A pharmaceutical company manufactures stocks of Ebola vaccine. The vaccine has a shelf life that is approximately normally distributed, with a mean of 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 vials of vaccine will have an average shelf life of less than 775 hours.
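The two worked examples (test scores and doubling times) can be checked without a printed table. This is a minimal sketch rather than part of the original slides: it assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of the look-up table; the helper name `z_score` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


# Test-score example: mean 150, standard deviation 25, score 190
z = z_score(190, 150, 25)
p_above = 1.0 - std_normal.cdf(z)   # P(Z > z), the upper-tail area
p_mirror = std_normal.cdf(-z)       # the same area, found by symmetry

# Doubling-time example: mean 58 minutes, standard deviation 10 minutes
mu, sigma = 58.0, 10.0
z_one_sided = std_normal.inv_cdf(0.95)   # about 1.645: 5% in the upper tail
z_central = std_normal.inv_cdf(0.975)    # about 1.96: 2.5% in each tail

# Range built from z = 1.645 and the share of the area it actually covers
range_1645 = (mu - 1.645 * sigma, mu + 1.645 * sigma)
coverage_1645 = std_normal.cdf(1.645) - std_normal.cdf(-1.645)

# Central 95% range, using x = mu +/- z * sigma
range_95 = (mu - z_central * sigma, mu + z_central * sigma)

print("z-score:", z)
print("P(Z > z):", p_above, "by symmetry:", p_mirror)
print("z for 95% one-sided:", z_one_sided, "z for central 95%:", z_central)
print("range with z = 1.645:", range_1645, "coverage:", coverage_1645)
print("central 95% range:", range_95)
```

`cdf` plays the role of reading a probability from the table for a given z, and `inv_cdf` plays the role of reading the table backwards to find the z for a given probability.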