STAT 315: LECTURE 4
CHAPTER 4: CONTINUOUS RANDOM VARIABLES
TROY BUTLER

1. The basic concepts, definitions, notation, and some results

Some basic definitions, notation, and results from basic calculus:

• A continuous random variable (rv) $X$ is a rv that can take on any value within an interval, or a union of disjoint intervals.
  – For a continuous rv $X$, the probability of any particular real number is zero, i.e., $P(X = x) = 0$ for all $x \in \mathbb{R}$.
  – For a continuous rv $X$, it only makes sense to ask about probabilities of events that contain an infinite number of outcomes (e.g. a small interval). In practice, we wish to use many i.i.d. samples of $X$ to approximate the probabilities of various events in the sample space $S$.

• Let $X$ be a continuous rv. The probability density function (pdf) of $X$ is a function $f(x)$ such that for any two real numbers $a$ and $b$ with $a \le b$:
$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$
  – $f(x) \ge 0$ for all $x$ (i.e., $f$ must be nonnegative).
  – $\int_{-\infty}^{\infty} f(x)\,dx = 1$ (i.e., $f$ must be integrable, with total area equal to 1).

• Using basic calculus we have the following:
  – Consider a fixed value $a$ in the range of a continuous rv $X$ with pdf $f(x)$ such that $f(a) > 0$. Then
$$P(X = a) = \int_a^a f(x)\,dx = \lim_{\epsilon \downarrow 0} \int_{a-\epsilon}^{a+\epsilon} f(x)\,dx = 0.$$
  – Moreover, $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$.

• The cumulative distribution function (cdf) $F(x)$ for a continuous rv $X$ is defined for every number $x$ by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy.$$
  – $F(x) \in [0, 1]$ for all $x \in \mathbb{R}$, $\lim_{x \to -\infty} F(x) = 0$, and $\lim_{x \to \infty} F(x) = 1$.
  – If $X$ is a continuous rv with cdf $F(x)$, then for any number $a$, $P(X > a) = 1 - F(a)$.
  – For any two numbers $a$ and $b$ with $a < b$, $P(a \le X \le b) = F(b) - F(a)$.
  – $F(x)$ is simply a specific antiderivative of $f(x)$.

• If $X$ is a continuous rv with pdf $f(x)$ and cdf $F(x)$, then at every $x$ for which the derivative $F'(x)$ exists, $F'(x) = f(x)$.

• If $X$ is a continuous rv with pdf $f(x)$, then the expectation of $X$ is
$$E(X) = \mu_X = \int_{-\infty}^{\infty} x f(x)\,dx.$$
  – If $h(X)$ is any function of $X$, then
$$E(h(X)) = \mu_{h(X)} = \int_{-\infty}^{\infty} h(x) f(x)\,dx.$$
  – When there is no room for confusion, we often just use $\mu$ to indicate the expectation of the rv in question.

• If $X$ is a continuous rv with pdf $f(x)$ and expectation $\mu$, then the variance of $X$ is
$$V(X) = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx.$$
  – Sometimes it is easier to compute $V(X) = E(X^2) - (E(X))^2 = E(X^2) - \mu^2$.
  – The standard deviation of $X$ is given by $SD(X) = \sigma = \sqrt{V(X)}$.

• Let $p$ be a number between 0 and 1. The $(100p)$th percentile of the distribution of a continuous rv $X$, denoted by $\eta(p)$, is defined by
$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(x)\,dx.$$

2. The Uniform Distribution

A uniform rv $X$ is a continuous rv with a pdf that can be written as
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$
We denote this as $X \sim U(a, b)$.

Theorem 1. If $X \sim U(a, b)$, then
$$E(X) = \frac{a+b}{2}, \quad \text{and} \quad V(X) = \frac{(b-a)^2}{12}.$$

3. The Exponential Distribution

The exponential distribution is related to the Poisson distribution we studied in Chapter 3. It is useful in modeling many engineering and physical phenomena, especially the times between successive events, and it can be used in certain instances to model component lifetime.

A continuous rv $X$ is said to have an exponential distribution with parameter $\lambda > 0$ if its pdf is
$$f(x) = \begin{cases} \lambda \exp(-\lambda x), & x \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$
We often write $X \sim \mathrm{Exp}(\lambda)$ to indicate that the rv $X$ has an exponential distribution with specified parameter $\lambda$.

Theorem 2. If $X \sim \mathrm{Exp}(\lambda)$, then its cdf is given by
$$F(x) = \begin{cases} 1 - \exp(-\lambda x), & x \ge 0, \\ 0, & x < 0. \end{cases}$$
The expectation is $E(X) = \frac{1}{\lambda}$, and the variance of $X$ is $V(X) = \frac{1}{\lambda^2}$.

Suppose we model a component lifetime as $X \sim \mathrm{Exp}(\lambda)$, and assume the component is still working $t_0$ hours after it was first put into use, for some fixed $t_0$. Since $P(X \ge x) = 1 - F(x) = \exp(-\lambda x)$ for $x \ge 0$, we can show that for any $t \ge 0$
$$P(X \ge t + t_0 \mid X \ge t_0) = \frac{\exp(-\lambda (t + t_0))}{\exp(-\lambda t_0)} = \exp(-\lambda t) = P(X \ge t).$$
This means that the exponential distribution has a memoryless property.

4. The Normal Distribution

Meet your new best friend.
The normal curve is a symmetric bell-shaped curve. A distribution represented by a normal curve is called a normal distribution. Many populations can be approximated by a normal distribution. Thanks to the Central Limit Theorem, point estimates of the mean of any distribution computed from i.i.d. samples often have an approximately normal distribution, which makes this distribution central to the rest of the course involving hypothesis testing.

A continuous rv $X$ has a normal distribution with parameters $\mu$ and $\sigma$ (or $\mu$ and $\sigma^2$), where $\mu \in (-\infty, \infty)$ and $\sigma > 0$, if its pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty.$$
The cdf of $X$ is
$$F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt.$$
The mean of $X$ is $E(X) = \mu$, and its variance is $V(X) = \sigma^2$.

Figure 1. Left: pdfs of normal rv's. Right: cdfs of normal rv's.

The normal distribution with parameters $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution. Its pdf is
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \quad -\infty < x < \infty.$$
We generally denote a standard normal rv by $Z$. The cdf of $Z$ is often denoted by $\Phi(z)$ and is defined by
$$\Phi(z) := F(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx.$$
The erf function in Matlab is quite handy for computing probabilities of normally distributed rv's, since $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}\left(z/\sqrt{2}\right)\right]$. The table in Appendix A.3 gives $\Phi(z) = P(Z \le z)$.

Remark 1. We define the notation $z_\alpha$ for $\alpha \in (0, 1)$ to represent the $z$ value corresponding to $P(Z > z_\alpha) = \alpha$. Note this is equivalent to $P(Z \le z_\alpha) = 1 - \alpha$.

Theorem 3. If $X \sim N(\mu_X, \sigma_X^2)$, then for any two constants $a \ne 0$ and $b$, the rv $Y = aX + b$ follows a normal distribution with mean $\mu_Y = a\mu_X + b$ and variance $\sigma_Y^2 = a^2 \sigma_X^2$.

Remark 2. The theorem above means that we can transform any normal rv into a standard normal rv. We will do this a lot, and it is relatively simple.

If $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then the standardized variable
$$Z = \frac{X - \mu}{\sigma}$$
has a standard normal distribution. Thus,
$$P(a \le X \le b) = P\left(\frac{a-\mu}{\sigma} \le Z \le \frac{b-\mu}{\sigma}\right),$$
$$P(X \le a) = P\left(Z \le \frac{a-\mu}{\sigma}\right), \qquad P(X \ge b) = 1 - P\left(Z \le \frac{b-\mu}{\sigma}\right).$$

We can always go back from the standardized variable to the non-standardized variable using $X = \mu + Z\sigma$. Thus, to find the $(100p)$th percentile of a normal distribution with mean $\mu$ and standard deviation $\sigma$, follow the two steps below:
(1) Find the $(100p)$th percentile of the standard normal distribution.
(2) The $(100p)$th percentile for $N(\mu, \sigma)$ $= \mu + [(100p)\text{th percentile for } Z] \times \sigma$.

5. Determining when to use normal distributions

There are several ways we may verify that data is modeled reasonably well by a normal distribution.

A population distribution is approximately normal if the empirical rule holds. This rule states that data that fits the normal distribution will have approximately:
• 68% of the observations within 1 SD of the mean.
• 95% of the observations within 2 SD of the mean.
• 99.7% of the observations within 3 SD of the mean.

We can always "eyeball" it by plotting a normal curve over a density histogram.

Probably the most systematic eyeball method is to look at a normal probability plot (a.k.a. QQ plot). This is actually a decent method for determining which named distribution (normal, uniform, exponential, or other) should be parametrically fit to the data (by using the data to estimate the parameters defining the distribution) before we even go through the trouble of fitting the parameters (e.g. why estimate the mean and standard deviation if the underlying distribution looks to be uniform instead of normal?).

A probability plot or QQ plot graphs the sample percentiles against the theoretical percentiles of a particular distribution. The essence of such a plot is that if the distribution on which the plot is based is a good fit, then the points in the plot will fall close to a straight line. If the distribution is a poor fit, then the points will depart from a linear pattern.

Sometimes our data appears non-normal, but a transformation of the data gives us a symmetric bell-shaped curve.

6. Exercises to do in class

Chapter 4 exercises: 6 (b)-(e) and 12 (b)-(e), 10 (b)-(d) and 24, 30, 32, 52, 60
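
As a companion to the normal-distribution material in Sections 4 and 5, the following is a minimal sketch of the standardization and percentile steps in code. The notes mention Matlab's erf; this sketch instead assumes Python, using the standard library's `math.erf` together with the identity Φ(z) = (1/2)[1 + erf(z/√2)]. The function names `phi`, `normal_prob`, and `normal_percentile` are hypothetical helpers introduced only for illustration, and the standard normal percentile is taken from `statistics.NormalDist().inv_cdf` rather than from the Appendix A.3 table.

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal cdf, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_prob(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed by standardizing:
    P((a - mu)/sigma <= Z <= (b - mu)/sigma) = Phi(upper) - Phi(lower)."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


def normal_percentile(p: float, mu: float, sigma: float) -> float:
    """(100p)th percentile of N(mu, sigma^2), following the two steps in the notes:
    (1) find the (100p)th percentile of Z, (2) map back with mu + z * sigma."""
    z_p = NormalDist().inv_cdf(p)  # step (1): percentile of the standard normal
    return mu + z_p * sigma        # step (2): undo the standardization


# Empirical rule: probability of falling within k SDs of the mean, k = 1, 2, 3.
# These come out to roughly 0.6827, 0.9545, and 0.9973.
within_k_sd = {k: normal_prob(-k, k, 0.0, 1.0) for k in (1, 2, 3)}

if __name__ == "__main__":
    for k, prob in within_k_sd.items():
        print(f"P(|Z| <= {k}) = {prob:.4f}")
    # Illustrative (made-up) parameters: X ~ N(10, 2^2).
    print("P(8 <= X <= 12) =", round(normal_prob(8.0, 12.0, 10.0, 2.0), 4))
    print("97.5th percentile of X =", round(normal_percentile(0.975, 10.0, 2.0), 4))
```

Because standardizing removes µ and σ, `normal_prob(8, 12, 10, 2)` returns the same value as the 1-SD entry of `within_k_sd`, which is one way to sanity-check the transformation Z = (X − µ)/σ.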