14. CONTINUOUS DISTRIBUTIONS

Some distributions are discrete; others are continuous. What’s the difference?

A random variable has a continuous distribution if it can take any real value in some interval.

Examples of intervals:
• The set of all real numbers
• The set of positive real numbers
• All real numbers between 0 and 2.

Height, weight, distance, time and volume are continuous. Prices, sales, income, stock returns, and an evening’s winnings at blackjack can be usefully treated as if they are continuous, under certain circumstances. The most important continuous distribution is the normal distribution, which we will discuss soon.

By contrast, binomial random variables are discrete, since their possible values are limited to the integers 0, 1, ···, n. More generally, counts are discrete. A count is the result of counting up things: sheep, pizzas, home runs, etc. Thus, the number of occurrences of some future event is a discrete random variable. Eg: Number of rainy days in NYC next year, number of hits at a website tomorrow, etc.

Continuous distributions are described by smooth curves called probability densities f(x) (for real numbers x).

Key Property of Probability Densities: Probability = Area Under Curve

The probability that the continuous random variable X will be between a and b is the area under the curve f(x) between x = a and x = b. In calculus notation, this area is

$$P(a < X < b) = \int_a^b f(x)\,dx$$

(Don't worry, we're not going to do any integrals. This is just a notation for area under the curve!)

Thus, the total area under f(x) is 1:

$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Eg 1: The lifetime X of a 75 watt light bulb (in hours) is a continuous random variable with density f(x). In terms of this density, what is the probability that the lifetime will exceed 100 hours?

• Amazingly, the probability that a continuous random variable will assume some particular pre-specified value is zero!
For example, the probability that the light bulb will last exactly 100 hours is zero. Reason: There’s no area under the curve between x = 100 and x = 100. Therefore, it is only meaningful to talk about the probability that a continuous random variable lies in some interval, (a, b).

• Since P(X = a) = 0 if X is a continuous RV, it is clear that f(a) does not represent the probability that X = a.

Eg 2: If the probability that a 75 watt light bulb will last ≤ 100 hours is 0.7, what is the probability that it will last ≥ 100 hours?

For a continuous random variable X, the mean and variance are defined by

$$\mu = E(X), \qquad \sigma^2 = E\left[(X - \mu)^2\right]$$

where E is the expected value. These expected values can be computed from the density function, f(x). We’ll never try to do this, since expected values of continuous random variables involve integrals. It’s still instructive to look at the formulas:

$$\mu = \int_{-\infty}^{\infty} x \, f(x)\,dx \quad \text{(mean)},$$

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \, f(x)\,dx \quad \text{(variance)}.$$

As usual, µ and σ describe the center and spread of the random variable.

The definitions above are similar to those given earlier for discrete random variables, that is,

$$\mu = \sum x \, p(x) \quad \text{(mean)},$$

$$\sigma^2 = \sum (x - \mu)^2 \, p(x) \quad \text{(variance)},$$

where p(x) is the probability distribution function of the discrete random variable. For continuous random variables, however, the probability distribution function is replaced by the probability density function, and the sums are replaced by integrals.

Relationship between histograms and probability densities:

A histogram is simply a way of estimating a probability density based on data. From the Old Faithful histogram, for example, we can estimate the probability that the time between eruptions will be between 70 minutes and 90 minutes by taking the area under the histogram over that range, with the bars scaled so that the total area of the histogram is 1. As we get more and more data, the histogram will get closer and closer to the true probability density, as long as we increase the number of bins in a suitable way.
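The idea that area under a histogram estimates a probability can be tried out on simulated data. The sketch below is only an illustration under stated assumptions: it does not use the Old Faithful data. Instead it draws 100,000 values from the density f(x) = e^(−x) for x ≥ 0, chosen here because its exact areas are known. The helper names histogram_density and area_between are made up for this example.

```python
import math
import random


def histogram_density(data, lo, hi, n_bins):
    """Return the bin width and bar heights for a histogram on [lo, hi).

    Heights are scaled so that each bar's area (height * width) equals
    the fraction of all data values that fall in that bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if lo <= x < hi:
            # Guard against floating-point round-up at the right edge.
            index = min(int((x - lo) / width), n_bins - 1)
            counts[index] += 1
    heights = [c / (len(data) * width) for c in counts]
    return width, heights


def area_between(width, heights, lo, a, b):
    """Sum the bar areas for bins in [a, b); a and b are assumed to lie on bin edges."""
    first = round((a - lo) / width)
    last = round((b - lo) / width)
    return sum(h * width for h in heights[first:last])


# Simulated data (an assumption for illustration, not real measurements).
random.seed(0)
data = [random.expovariate(1.0) for _ in range(100_000)]

width, heights = histogram_density(data, lo=0.0, hi=10.0, n_bins=100)

# Histogram estimate of P(0.5 < X < 1.5) versus the exact area under f(x).
estimate = area_between(width, heights, 0.0, 0.5, 1.5)
exact = math.exp(-0.5) - math.exp(-1.5)

print("histogram estimate:", estimate)
print("exact area:        ", exact)
```

With this much data the two printed numbers should be close. Rerunning with fewer data values, or with far too few or far too many bins, shows why the number of bins has to grow in a suitable way as the amount of data grows.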
[Figure: Time Between Eruptions, Separated by Eruption Duration. Histogram of Frequency against Time Between Eruptions, with separate series for eruption durations < 3 min and > 3 min.]
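Returning to the formulas for area, mean, and variance given earlier in this section: although we will not do integrals by hand, a computer can approximate them by adding up the areas of many thin rectangles. The sketch below assumes a hypothetical density f(x) = x/2 on the interval from 0 to 2 (not a density taken from these notes) and uses a simple midpoint rule; the function name integrate is also just for illustration.

```python
def integrate(g, a, b, n=10_000):
    """Approximate the area under g between a and b using n thin rectangles (midpoint rule)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x):
    """Hypothetical density for illustration: f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2 if 0.0 <= x <= 2.0 else 0.0


# Total area under the density.
total = integrate(f, 0.0, 2.0)

# P(0.5 < X < 1.5): area under the curve between 0.5 and 1.5.
prob = integrate(f, 0.5, 1.5)

# P(X = 1): area between x = 1 and x = 1.
point = integrate(f, 1.0, 1.0)

# Mean and variance from the integral formulas.
mu = integrate(lambda x: x * f(x), 0.0, 2.0)
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0.0, 2.0)

print("total area:", total)
print("P(0.5 < X < 1.5):", prob)
print("P(X = 1):", point)
print("mean:", mu)
print("variance:", var)
```

For this particular density, calculus gives exact values of 1, 0.5, 0, 4/3 and 2/9 respectively, and the printed approximations should agree with them to several decimal places. Note that f(1) = 0.5 even though P(X = 1) = 0, which illustrates that the height of the density at a point is not a probability.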