14. CONTINUOUS DISTRIBUTIONS
Some distributions are discrete; others are continuous. What’s the
difference?
A random variable has a continuous distribution if it can take any
real value in some interval.
Examples of intervals:
the set of all real numbers;
the set of positive real numbers;
all real numbers between 0 and 2.
Height, weight, distance, time and volume are continuous.
Prices, sales, income, stock returns, and an evening’s winnings at
blackjack can be usefully treated as if they are continuous, under
certain circumstances.
The most important continuous distribution is the normal
distribution, which we will discuss soon.
By contrast, binomial random variables are discrete, since their
possible values are limited to the integers 0, 1, ..., n.
More generally, counts are discrete.
A count is the result of counting up things: sheep, pizzas,
home runs, etc.
Thus, the number of occurrences of some future event is a
discrete random variable.
Eg: Number of rainy days in NYC next year, Number of hits at a
website tomorrow, etc.
Continuous distributions are described by smooth curves called
probability densities f (x) (for real numbers x).
Key Property of Probability Densities:
Probability = Area Under Curve
The probability that the continuous random variable X will be
between a and b is the area under the curve f (x) between x = a
and x = b.
In calculus notation, this area is
$$P(a < X < b) = \int_a^b f(x)\,dx$$
(Don't worry, we're not going to do any integrals. This is just a
notation for area under the curve!)
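To make "probability = area under the curve" concrete, here is a minimal Python sketch that approximates the area numerically instead of doing an integral by hand. The exponential density and its rate of 0.001 per hour are assumptions chosen only for illustration, and `area_under_curve` is a hypothetical helper, not something from these notes.

```python
import math

def density(x, rate=0.001):
    """Assumed example density: f(x) = rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def area_under_curve(f, a, b, steps=100_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# P(500 < X < 1500), computed as an area under the assumed density
prob = area_under_curve(density, 500, 1500)

# For this particular density the area also has a closed form, used here as a check
exact = math.exp(-0.5) - math.exp(-1.5)
print(prob, exact)
```

The two printed numbers should agree to several decimal places, which is all the integral notation is claiming: the probability is the size of the shaded region between a and b.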
Thus, the total area under f (x) is 1:
$$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
Eg 1: The lifetime X of a 75 watt light bulb (in hours) is a
continuous random variable with density f (x). In terms of this
density, what is the probability that the lifetime will exceed 100
hours?
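One way to write the answer to Eg 1, using only the area property and the fact that the total area is 1 (no particular form of f(x) is assumed):

$$P(X > 100) = \int_{100}^{\infty} f(x)\,dx = 1 - \int_{-\infty}^{100} f(x)\,dx$$

In words: it is the area under the density to the right of x = 100.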
• Amazingly, the probability that a continuous random variable
will assume some particular pre-specified value is zero!
For example, the probability that the light bulb will last exactly
100 hours is zero.
Reason: There’s no area under the curve between x = 100 and
x = 100.
Therefore, it is only meaningful to talk about the probability
that a continuous random variable lies in some interval, (a, b).
• Since P(X = a) = 0 if X is a continuous RV, it is clear that
f (a) does not represent the probability that X = a.
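A small sketch may help show why f(a) is not a probability. It assumes, purely for illustration, a density that is flat on the interval from 0 to 0.5; the function names are hypothetical. The height of this density is 2, which is larger than any probability can be, while the area of a thin strip around a point shrinks toward zero as the strip narrows.

```python
def uniform_density(x, low=0.0, high=0.5):
    """Assumed example: flat density on [low, high], zero elsewhere."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def strip_probability(a, half_width):
    """Area of the strip from a - half_width to a + half_width.

    Valid only while the strip stays inside [0, 0.5], where the density is flat,
    so the area is simply height times width.
    """
    return uniform_density(a) * (2 * half_width)

height = uniform_density(0.25)  # a density value, not a probability

# The probability of landing near a = 0.25 shrinks as the strip narrows
for half_width in (0.1, 0.01, 0.001):
    print(half_width, strip_probability(0.25, half_width))
```

The density value tells us how quickly area accumulates near a, not how likely the single value a is.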
Eg 2: If the probability that a 75 watt light bulb will last ≤ 100
hours is 0.7, what is the probability that it will last ≥ 100 hours?
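A worked version of Eg 2, relying on the fact that P(X = 100) = 0 for a continuous random variable, so that "≤" and "<" give the same probability:

$$P(X \ge 100) = 1 - P(X < 100) = 1 - P(X \le 100) = 1 - 0.7 = 0.3$$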
For a continuous random variable X, the mean and variance are
defined by
$$\mu = E(X), \qquad \sigma^2 = E\left[(X - \mu)^2\right]$$
where E is the expected value.
These expected values can be computed from the density
function, f (x).
We’ll never try to do this, since expected values of continuous
random variables involve integrals. It’s still instructive to look
at the formulas:
$$\mu = \int_{-\infty}^{\infty} x \cdot f(x)\,dx \quad \text{(mean)},$$
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx \quad \text{(variance)}.$$
As usual, µ and σ describe the center and spread of the random
variable.
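Although we will not evaluate these integrals by hand, a computer can approximate them. The sketch below assumes, for illustration only, a density that is flat on the interval from 0 to 2 (one of the example intervals mentioned earlier); `integrate` is a hypothetical helper using the midpoint rule.

```python
def density(x):
    """Assumed example: flat density on the interval from 0 to 2."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def integrate(g, a, b, steps=100_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(g(a + (i + 0.5) * width) for i in range(steps)) * width

# Mean: integral of x * f(x); the density is zero outside [0, 2]
mu = integrate(lambda x: x * density(x), 0.0, 2.0)

# Variance: integral of (x - mu)^2 * f(x)
sigma_sq = integrate(lambda x: (x - mu) ** 2 * density(x), 0.0, 2.0)

print(mu, sigma_sq)
```

For this assumed density the mean comes out very close to 1 (the midpoint of the interval) and the variance very close to 1/3.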
The definitions above are similar to those given earlier for
discrete random variables, that is,
$$\mu = \sum x \cdot p(x) \quad \text{(mean)},$$
$$\sigma^2 = \sum (x - \mu)^2 \cdot p(x) \quad \text{(variance)},$$
where p(x) is the probability distribution function of the discrete
random variable.
For continuous random variables, however, the probability
distribution function is replaced by the probability density
function, and the sums are replaced by integrals.
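For comparison, the discrete formulas can be evaluated directly as sums. This sketch uses a binomial random variable with n = 10 and p = 0.3; those two values are assumptions picked for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # assumed values for illustration
values = range(n + 1)  # possible values 0, 1, ..., n

# Mean: sum of x * p(x)
mu = sum(k * binomial_pmf(k, n, p) for k in values)

# Variance: sum of (x - mu)^2 * p(x)
sigma_sq = sum((k - mu) ** 2 * binomial_pmf(k, n, p) for k in values)

print(mu, sigma_sq)
```

Up to rounding, the results match the familiar binomial shortcuts np = 3 and np(1 − p) = 2.1. The continuous case has the same structure, with the sum over possible values replaced by an integral over x.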
Relationship between histograms and probability densities:
A histogram is simply a way of estimating a probability density
based on data.
From the Old Faithful histogram, for example, we can estimate
the probability that the time between eruptions will be between
70 minutes and 90 minutes by taking the area under the
histogram between 70 and 90, with the bar heights scaled so that
the total area is 1. As we get more and more data, the histogram
will get closer and closer to the true probability density, as long
as we increase the number of bins in a suitable way.
[Figure: histogram titled "Time Between Eruptions, Separated by Eruption Duration" (groups: < 3 min and > 3 min). Horizontal axis: Time Between Eruptions, 40 to 100; vertical axis: Frequency, 0 to 30.]
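The link between histograms and densities can be tried out on simulated data. The sketch below does NOT use the Old Faithful data; it draws values from an assumed normal density with mean 80 and standard deviation 10, builds a histogram scaled so its total area is 1, and compares the area between 70 and 90 with the true probability for that assumed density. The helper `histogram_area` is hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated data from an assumed normal density (mean 80, sd 10); not real eruption data
data = [random.gauss(80, 10) for _ in range(100_000)]

def histogram_area(data, low, high, bin_width):
    """Area under a density-scaled histogram between low and high.

    Bin edges are multiples of bin_width, so low and high should be too.
    """
    n = len(data)
    counts = {}
    for x in data:
        b = math.floor(x / bin_width)  # index of the bin containing x
        counts[b] = counts.get(b, 0) + 1
    area = 0.0
    for b, count in counts.items():
        left = b * bin_width
        if low <= left and left + bin_width <= high:
            height = count / (n * bin_width)  # bar height on the density scale
            area += height * bin_width
    return area

estimate = histogram_area(data, 70, 90, bin_width=2)

# True P(70 < X < 90) for the assumed normal density (within one sd of the mean)
true_prob = math.erf(1 / math.sqrt(2))

print(estimate, true_prob)
```

With this much simulated data the histogram area should land close to the true probability of about 0.68; with only a few dozen observations the agreement would typically be much rougher.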