MATH20812: PRACTICAL STATISTICS I
SEMESTER 2
NOTES ON RANDOM VARIABLES

Things to Know

Random Variable

A random variable is a function that assigns a numerical value to each outcome of a particular experiment. A random variable is denoted by an uppercase letter, such as $X$, and the corresponding lowercase letter, such as $x$, is used to denote a possible value of $X$. The set of possible values of a random variable $X$ is referred to as the range of $X$. The probability of the event that $X = x$ is denoted by $\Pr(X = x)$.

Discrete Random Variable

A discrete random variable is a random variable with a finite (or countably infinite) range. Examples: number of accidents, number of applicants interviewed, number of power plants, etc.

Continuous Random Variable

If the range of a random variable contains an interval of real numbers, then it is a continuous random variable. Examples: temperature, breaking strength, failure time, etc.

Probability Mass Function

For a discrete random variable $X$, the function $f(x) = \Pr(X = x)$ is called a probability mass function if it satisfies $f(x) \ge 0$ for all possible values of $x$ and

\[ \sum_{\text{all } x} f(x) = 1. \tag{1} \]

Probability Density Function

For a continuous random variable $X$, the function $f(x)$ is called a probability density function if it satisfies $f(x) \ge 0$ for all possible values of $x$ and

\[ \int_a^b f(x)\,dx = \Pr(a < X < b) \tag{2} \]

for all $a$ and $b$ with $a \le b$ (see figure 1). Two consequences are:

\[ \int_{-\infty}^{\infty} f(x)\,dx = \Pr(-\infty < X < \infty) = 1 \tag{3} \]

and

\[ \int_a^a f(x)\,dx = \Pr(X = a) = 0. \tag{4} \]

[Figure 1: Probability Density Function of X. The region between a and b on the X axis is labelled Pr(a < X < b).]

Cumulative Distribution Function (CDF)

The cumulative distribution function of a random variable $X$ is:

\[ F(x) = \Pr(X \le x) = \sum_{\text{all } y \le x} f(y) \tag{5} \]

if $X$ is a discrete random variable;

\[ F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(y)\,dy \tag{6} \]

if $X$ is a continuous random variable.
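As a minimal sketch of the discrete case, the conditions in (1) and the sum in (5) can be checked numerically. The fair six-sided die used here is a hypothetical example, not one taken from these notes, and the function names `is_valid_pmf` and `cdf` are illustrative only.

```python
from fractions import Fraction

# Hypothetical example (not from the notes): X is the score on a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def is_valid_pmf(f):
    """Check the two conditions for a pmf: f(x) >= 0 and the values sum to 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1


def cdf(f, x):
    """F(x) = Pr(X <= x): the sum of f(y) over all y <= x."""
    return sum(p for y, p in f.items() if y <= x)


print(is_valid_pmf(pmf))          # both pmf conditions hold
print(cdf(pmf, 4))                # Pr(X <= 4)
print(cdf(pmf, 5) - cdf(pmf, 2))  # Pr(2 < X <= 5)
```

Exact fractions are used so that the check "sums to 1" is not affected by floating-point rounding.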
Properties of CDF

The CDF has the following properties:

(i) $0 \le F(x) \le 1$ (see figure 2);
(ii) If $a \le b$ then $F(a) \le F(b)$ (see figure 2);
(iii) $F(-\infty) = 0$ (see figure 2);
(iv) $F(\infty) = 1$ (see figure 2);
(v) If $X$ is a continuous random variable then $F(b) - F(a) = \Pr(a < X < b)$ (see figure 2);
(vi) If $X$ is a continuous random variable then

\[ f(x) = \frac{\partial F(x)}{\partial x}. \tag{7} \]

[Figure 2: Cumulative Distribution Function of X. The vertical axis runs from 0 to 1, and Pr(a < X < b) is marked between a and b on the X axis.]

Percentiles

The $100(1 - \alpha)$th percentile of a random variable $X$, denoted by $x_\alpha$, is the value of $X$ exceeded with probability $\alpha$, i.e.

\[ \Pr(X \le x_\alpha) = 1 - \alpha. \tag{8} \]

Expected Value

The expected value of a random variable $X$ is:

\[ E(X) = \sum_{\text{all } x} x f(x) \tag{9} \]

if $X$ is a discrete random variable;

\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \tag{10} \]

if $X$ is a continuous random variable.

Properties of Expectation

(i) $E(c) = c$ ($c$ is a constant);
(ii) $E(cX) = cE(X)$ ($c$ is a constant);
(iii) $E(cX + d) = cE(X) + d$ ($c$ and $d$ are constants).

Expectation of Function

For any real-valued function $g$, the expected value of $g(X)$ is:

\[ E(g(X)) = \sum_{\text{all } x} g(x) f(x) \tag{11} \]

if $X$ is a discrete random variable;

\[ E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx \tag{12} \]

if $X$ is a continuous random variable.

Variance

The variance of a random variable $X$ is:

\[ \operatorname{Var}(X) = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2. \tag{13} \]

Properties of Variance

(i) $\operatorname{Var}(c) = 0$ ($c$ is a constant);
(ii) $\operatorname{Var}(cX) = c^2 \operatorname{Var}(X)$ ($c$ is a constant);
(iii) $\operatorname{Var}(cX + d) = c^2 \operatorname{Var}(X)$ ($c$ and $d$ are constants).

Standard Deviation

The standard deviation of a random variable $X$ is:

\[ SD(X) = \sqrt{\operatorname{Var}(X)}, \tag{14} \]

a measure of spread.

Coefficient of Variation

The coefficient of variation of a random variable $X$ is:

\[ CV(X) = \frac{SD(X)}{E(X)}, \tag{15} \]

a dimensionless measure of spread relative to the expected value.

Measures of Shape

Two dimensionless measures of shape are skewness and kurtosis, defined by

\[ \gamma_1(X) = \frac{E\left[(X - E(X))^3\right]}{[\operatorname{Var}(X)]^{3/2}} \tag{16} \]

and

\[ \gamma_2(X) = \frac{E\left[(X - E(X))^4\right]}{[\operatorname{Var}(X)]^2}, \tag{17} \]

respectively.
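The definitions of expected value, variance, standard deviation, coefficient of variation, skewness and kurtosis can be illustrated with a short sketch for a discrete random variable. The fair six-sided die and the helper name `expect` are assumptions made for illustration and do not come from these notes.

```python
from fractions import Fraction
import math

# Hypothetical example (not from the notes): X is the score on a fair die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(g, f):
    """E(g(X)) for a discrete pmf f: the sum of g(x) f(x) over all x."""
    return sum(g(x) * p for x, p in f.items())


mean = expect(lambda x: x, pmf)                            # E(X)
var = expect(lambda x: (x - mean) ** 2, pmf)               # E[(X - E(X))^2]
var_shortcut = expect(lambda x: x ** 2, pmf) - mean ** 2   # E(X^2) - (E(X))^2
sd = math.sqrt(var)                                        # standard deviation
cv = sd / mean                                             # coefficient of variation
skewness = expect(lambda x: (x - mean) ** 3, pmf) / float(var) ** 1.5
kurtosis = expect(lambda x: (x - mean) ** 4, pmf) / var ** 2

# Property (iii) of variance with c = 2 and d = 3: Var(2X + 3) = 4 Var(X).
var_linear = expect(lambda x: (2 * x + 3 - (2 * mean + 3)) ** 2, pmf)

print(mean, var, var_shortcut, var_linear)
print(sd, cv, skewness, float(kurtosis))
```

Both forms of the variance give the same value here, and the skewness is zero because this particular distribution is symmetric about its mean.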
Note that

\[ E\left[(X - E(X))^3\right] = E(X^3) - 3E(X)E(X^2) + 2(E(X))^3 \tag{18} \]

and

\[ E\left[(X - E(X))^4\right] = E(X^4) - 4E(X)E(X^3) + 6(E(X))^2 E(X^2) - 3(E(X))^4. \tag{19} \]

Reliability Function

Let a random variable $X$ represent the time between failures of a system. Clearly this is a continuous random variable. The reliability function at time $t$, denoted by $\bar{F}(t)$, is the probability that the system survives longer than time $t$, i.e.

\[ \bar{F}(t) = \Pr(X > t) = 1 - \Pr(X \le t) = 1 - F(t). \tag{20} \]

Failure Rate Function

The failure rate of many systems (e.g. the human body) changes over time. In general, the failure rate is a function of the system's lifetime so far. The hazard rate or failure rate function at time $t$, denoted by $\lambda(t)$, is found by dividing the density function at time $t$ by the reliability function for that duration:

\[ \lambda(t) = \frac{f(t)}{\bar{F}(t)}. \tag{21} \]

The typical shape of a hazard rate function is shown in the figure below:

Region I, where the function decreases, is termed the region of infant mortality;
Region II, where the function does not change rapidly, is termed the random failure region;
Region III is the wear-out region, where the function increases due to deterioration.

[Figure: Failure Rate Function plotted against t (horizontal axis from 0.0 to 1.0, vertical axis from 0 to 20), with Region I, Region II and Region III marked.]

An Alternative to Kurtosis

If $X$ is a continuous random variable with pdf $f(x)$ then

\[ T_f = \operatorname{Var}\{\log(f(X))\} \tag{22} \]

measures the intrinsic shape of the distribution. This measure was introduced last year (2001) by Dr. K.-S. Song from the Florida State University [see the Journal of Statistical Planning and Inference, volume 93, pp. 51–69]. It is a better measure of the shape of a distribution than kurtosis.
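Returning to the reliability function (20) and the failure rate function (21), the following sketch evaluates both for a Weibull lifetime with scale 1. The Weibull distribution is an assumption chosen for illustration and is not discussed in these notes. Its shape parameter `k` gives a decreasing failure rate for k < 1, a constant one for k = 1 and an increasing one for k > 1, which loosely mirrors Regions I, II and III.

```python
import math


def weibull_pdf(t, k):
    """Density f(t) of a Weibull lifetime with shape k and scale 1 (assumed model)."""
    return k * t ** (k - 1) * math.exp(-(t ** k))


def weibull_reliability(t, k):
    """Reliability function: Pr(X > t) = 1 - F(t) = exp(-t^k)."""
    return math.exp(-(t ** k))


def hazard(t, k):
    """Failure rate: the density divided by the reliability function."""
    return weibull_pdf(t, k) / weibull_reliability(t, k)


for k in (0.5, 1.0, 2.0):
    rates = [round(hazard(t, k), 3) for t in (0.2, 0.5, 0.8)]
    print(f"shape k = {k}: failure rate at t = 0.2, 0.5, 0.8 is {rates}")
```

A single Weibull distribution covers only one of the three regions at a time; reproducing the full curve would need a different model.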