MATH20812: PRACTICAL STATISTICS I
SEMESTER 2
NOTES ON RANDOM VARIABLES
Things to Know
Random Variable A random variable is a function that assigns a numerical value to each outcome
of a particular experiment.
A random variable is denoted by an uppercase letter, such as X, and the corresponding lowercase
letter, such as x, is used to denote a possible value of X. The set of possible values of a random
variable X is referred to as the range of X. The probability of the event that X = x is denoted by
Pr(X = x).
Discrete Random Variable A discrete random variable is a random variable with a finite (or
countably infinite) range.
Examples: number of accidents, number of applicants interviewed, number of power plants, etc.
Continuous Random Variable If the range of a random variable contains an interval of real
numbers, then it is a continuous random variable.
Examples: temperature, breaking strength, failure time, etc.
Probability Mass Function For a discrete random variable X: the function f(x) = Pr(X = x)
is called a probability mass function if it satisfies f(x) ≥ 0 for all possible values of x and
\[ \sum_{\text{for all } x} f(x) = 1. \tag{1} \]
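The two conditions can be checked mechanically for any candidate function. The sketch below is only an illustration: the fair die and the helper name `is_pmf` are assumptions introduced here, and exact fractions are used so that the sum in (1) can be tested for equality.

```python
from fractions import Fraction

def is_pmf(f):
    """True if the dict f (value -> probability) is non-negative and sums to 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1

# Assumed example: X is the score on one roll of a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# A function that fails condition (1): the probabilities sum to 5/6.
not_a_pmf = {0: Fraction(1, 2), 1: Fraction(1, 3)}

print(is_pmf(die))        # True
print(is_pmf(not_a_pmf))  # False
```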
Probability Density Function For a continuous random variable X: the function f(x) is called
a probability density function if it satisfies f(x) ≥ 0 for all possible values of x and
\[ \int_a^b f(x)\,dx = \Pr(a < X < b) \tag{2} \]
for all a and b (see figure 1). Two consequences are:
\[ \int_{-\infty}^{\infty} f(x)\,dx = \Pr(-\infty < X < \infty) = 1 \tag{3} \]
and
\[ \int_a^a f(x)\,dx = \Pr(X = a) = 0. \tag{4} \]
[Figure 1: Probability Density Function of X. The area under the curve between a and b is Pr(a < X < b).]
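Equations (2) to (4) can be illustrated numerically. The sketch below assumes an exponential density with rate 2 (an example chosen here, not taken from the notes) and uses a simple midpoint rule, so the results are approximations rather than exact values.

```python
import math

RATE = 2.0  # assumed rate of the example exponential density

def f(x):
    """Assumed example density: exponential with rate RATE (zero for x < 0)."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

a, b = 0.5, 1.5
prob_ab = integrate(f, a, b)                         # equation (2)
exact_ab = math.exp(-RATE * a) - math.exp(-RATE * b)  # known closed form
total = integrate(f, 0.0, 50.0)   # equation (3); f is zero below 0, tiny above 50
point = integrate(f, a, a)        # equation (4); an interval of zero width

print(abs(prob_ab - exact_ab) < 1e-6)  # True
print(abs(total - 1.0) < 1e-6)         # True
print(point)                           # 0.0
```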
Cumulative Distribution Function (CDF) The cumulative distribution function of a random
variable X is:
\[ F(x) = \Pr(X \le x) = \sum_{\text{for all } y \le x} f(y) \tag{5} \]
if X is a discrete random variable;
\[ F(x) = \Pr(X \le x) = \int_{-\infty}^{x} f(y)\,dy \tag{6} \]
if X is a continuous random variable.
Properties of CDF The CDF has the following properties:
(i) 0 ≤ F (x) ≤ 1 (see figure 2);
(ii) If a ≤ b then F (a) ≤ F (b) (see figure 2);
(iii) F (−∞) = 0 (see figure 2);
(iv) F (∞) = 1 (see figure 2);
(v) If X is a continuous random variable then F (b) − F (a) = Pr(a < X < b) (see figure 2);
(vi) If X is a continuous random variable then
\[ f(x) = \frac{\partial F(x)}{\partial x}. \tag{7} \]
[Figure 2: Cumulative Distribution Function of X, rising from 0 to 1. The increase in F between a and b is Pr(a < X < b).]
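A short sketch may help connect the definitions with the properties. It assumes two examples that are not in the notes: a fair die for the discrete case (5), and an exponential distribution with rate 2 for properties (v) and (vi). The derivative in (vi) is approximated by a central difference.

```python
import math
from fractions import Fraction
from itertools import accumulate

# Discrete case, equation (5): running sum of the pmf of a fair die (assumed example).
values = list(range(1, 7))
pmf = [Fraction(1, 6)] * 6
cdf_at_values = dict(zip(values, accumulate(pmf)))

def F_die(x):
    """CDF of the die score at any real x."""
    below = [v for v in values if v <= x]
    return cdf_at_values[below[-1]] if below else Fraction(0)

# Continuous case: exponential distribution with rate 2 (assumed example).
RATE = 2.0

def F_exp(x):
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def f_exp(x):
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

# Property (vi): the slope of F at x0 approximates the density at x0.
x0, h = 0.7, 1e-6
slope = (F_exp(x0 + h) - F_exp(x0 - h)) / (2 * h)

# Property (v): Pr(a < X < b) = F(b) - F(a).
prob_ab = F_exp(1.5) - F_exp(0.5)

print(F_die(0), F_die(3.5), F_die(6))  # 0 1/2 1
print(abs(slope - f_exp(x0)) < 1e-6)   # True
```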
Percentiles The 100(1 − α)% percentile of a random variable X, denoted by $x_\alpha$, is the value of X
exceeded with probability α, i.e.
\[ \Pr(X \le x_\alpha) = 1 - \alpha. \tag{8} \]
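When the CDF can be inverted, (8) gives the percentile in closed form. As a sketch, assume X is exponential with rate r, so that F(x) = 1 − exp(−rx); solving 1 − exp(−r x_α) = 1 − α gives x_α = −log(α)/r. The distribution and the function names below are assumptions made for this illustration.

```python
import math

def exp_cdf(x, rate):
    """CDF of an exponential distribution (assumed example)."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def exp_percentile(alpha, rate):
    """Value x_alpha exceeded with probability alpha, obtained by solving (8)."""
    return -math.log(alpha) / rate

alpha, rate = 0.05, 2.0
x_alpha = exp_percentile(alpha, rate)   # the 95% percentile

# Check against the definition: Pr(X <= x_alpha) should equal 1 - alpha.
print(abs(exp_cdf(x_alpha, rate) - (1 - alpha)) < 1e-12)  # True
```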
Expected Value The expected value of a random variable X is:
\[ E(X) = \sum_{\text{for all } x} x f(x) \tag{9} \]
if X is a discrete random variable;
\[ E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \tag{10} \]
if X is a continuous random variable.
Properties of Expectation
(i) E(c) = c (c is a constant);
(ii) E(cX) = cE(X) (c is a constant);
(iii) E(cX + d) = cE(X) + d (c and d are constants).
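Property (iii) can be verified directly from (9) on a small example. The sketch below assumes a fair die for X and the constants c = 2, d = 3 (both chosen here for illustration). The pmf of Y = cX + d is built explicitly so that only the discrete definition of expectation is used.

```python
from fractions import Fraction

def expect(pmf):
    """E(X) for a discrete pmf given as a dict value -> probability, as in (9)."""
    return sum(x * p for x, p in pmf.items())

# Assumed example: score on a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean_x = expect(die)

# Y = cX + d takes the value c*x + d with the same probability as X = x.
c, d = 2, 3
pmf_y = {c * x + d: p for x, p in die.items()}
mean_y = expect(pmf_y)

print(mean_x)                  # 7/2
print(mean_y, c * mean_x + d)  # 10 10
```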
Expectation of Function For any real-valued function g, the expected value of g(X) is:
\[ E(g(X)) = \sum_{\text{for all } x} g(x) f(x) \tag{11} \]
if X is a discrete random variable;
\[ E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx \tag{12} \]
if X is a continuous random variable.
Variance The variance of a random variable X is:
\[ \operatorname{Var}(X) = E\left[(X - E(X))^2\right] = E(X^2) - (E(X))^2. \tag{13} \]
Properties of Variance
(i) Var(c) = 0 (c is a constant);
(ii) Var(cX) = c^2 Var(X) (c is a constant);
(iii) Var(cX + d) = c^2 Var(X) (c and d are constants).
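Both forms of the variance, and property (iii), can be checked on a small discrete example. This is a sketch only: the fair die and the constants c = 2, d = 3 are assumptions, and the helper `expect` applies the discrete formula for E(g(X)).

```python
from fractions import Fraction

def expect(pmf, g=lambda x: x):
    """E(g(X)) for a discrete pmf given as a dict value -> probability."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) from the definition E[(X - E(X))^2]."""
    mean = expect(pmf)
    return expect(pmf, lambda x: (x - mean) ** 2)

# Assumed example: score on a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

var_definition = variance(die)
var_shortcut = expect(die, lambda x: x ** 2) - expect(die) ** 2

# Property (iii): adding d changes nothing, multiplying by c scales by c^2.
c, d = 2, 3
shifted = {c * x + d: p for x, p in die.items()}

print(var_definition, var_shortcut)  # 35/12 35/12
print(variance(shifted))             # 35/3
```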
Standard Deviation The standard deviation of a random variable X is:
\[ SD(X) = \sqrt{\operatorname{Var}(X)}, \tag{14} \]
a measure of spread.
Coefficient of Variation The coefficient of variation of a random variable X is:
\[ CV(X) = \frac{SD(X)}{E(X)}, \tag{15} \]
a dimensionless measure of spread relative to the expected value.
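The word "dimensionless" can be made concrete: changing the unit of measurement rescales SD(X) but leaves CV(X) unchanged. The sketch below uses a made-up pmf for a length measured in metres and the same length in centimetres; the numbers are assumptions chosen only for illustration.

```python
import math

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def sd(pmf):
    """Standard deviation, the square root of the variance."""
    m = mean(pmf)
    return math.sqrt(sum((x - m) ** 2 * p for x, p in pmf.items()))

def cv(pmf):
    """Coefficient of variation, SD(X) / E(X)."""
    return sd(pmf) / mean(pmf)

# Hypothetical pmf of a length in metres, and the same length in centimetres.
length_m = {1.0: 0.2, 2.0: 0.5, 4.0: 0.3}
length_cm = {100.0 * x: p for x, p in length_m.items()}

print(math.isclose(sd(length_cm), 100.0 * sd(length_m)))  # True: SD carries the unit
print(math.isclose(cv(length_cm), cv(length_m)))          # True: CV does not
```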
Measures of Shape Two dimensionless measures of shape are skewness and kurtosis, defined by
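\[ \gamma_1(X) = \frac{E\left[(X - E(X))^3\right]}{[\operatorname{Var}(X)]^{3/2}} \tag{16} \]
and
\[ \gamma_2(X) = \frac{E\left[(X - E(X))^4\right]}{[\operatorname{Var}(X)]^2}, \tag{17} \]
respectively. Note that
\[ E\left[(X - E(X))^3\right] = E(X^3) - 3E(X)E(X^2) + 2(E(X))^3 \tag{18} \]
and
\[ E\left[(X - E(X))^4\right] = E(X^4) - 4E(X)E(X^3) + 6(E(X))^2 E(X^2) - 3(E(X))^4. \tag{19} \]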
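Identities (18) and (19) let the central moments be computed from the raw moments E(X), E(X^2), E(X^3), E(X^4). The sketch below checks them exactly on a small asymmetric pmf (a made-up example) and then evaluates (16) and (17) in floating point.

```python
from fractions import Fraction

# Hypothetical asymmetric pmf, chosen only for illustration.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def raw_moment(pmf, k):
    """E(X^k)."""
    return sum(x ** k * p for x, p in pmf.items())

def central_moment(pmf, k):
    """E[(X - E(X))^k], computed directly from the definition."""
    mu = raw_moment(pmf, 1)
    return sum((x - mu) ** k * p for x, p in pmf.items())

m1, m2, m3, m4 = (raw_moment(pmf, k) for k in (1, 2, 3, 4))

var = m2 - m1 ** 2                                       # equation (13)
third = m3 - 3 * m1 * m2 + 2 * m1 ** 3                   # equation (18)
fourth = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # equation (19)

skewness = float(third) / float(var) ** 1.5  # equation (16)
kurtosis = float(fourth) / float(var) ** 2   # equation (17)

print(third == central_moment(pmf, 3))   # True
print(fourth == central_moment(pmf, 4))  # True
```

For this pmf the skewness comes out positive, which matches the longer tail on the right.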
Reliability Function Let a random variable X represent the time between failures of a system.
Clearly this is a continuous random variable. The reliability function at time t denoted by F̄ (t) is
the probability that the system survives longer than time t, i.e.
\[ \bar{F}(t) = \Pr(X > t) = 1 - \Pr(X \le t) = 1 - F(t). \tag{20} \]
Failure Rate Function The failure rate of many systems (e.g. the human body) changes over time.
In general, the failure rate is a function of the system's lifetime so far. The hazard rate, or failure
rate function, at time t, denoted by λ(t), is found by dividing the density function at time t by the
reliability function at time t:
\[ \lambda(t) = \frac{f(t)}{\bar{F}(t)}. \tag{21} \]
The typical shape of a hazard rate function is shown in the figure below: Region I, where the function
decreases, is termed the region of infant mortality; Region II, where the function does not change
rapidly, is termed the random failure region; Region III is the wear-out region, where the function
increases due to deterioration.
[Figure: Failure Rate Function plotted against t, showing Region I, Region II and Region III.]
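The three kinds of behaviour can be reproduced by applying (21) to a parametric family. The sketch below assumes the Weibull distribution, which is not mentioned in these notes: its hazard decreases when the shape parameter is below 1, is constant when the shape equals 1 (the exponential case), and increases when the shape exceeds 1. A single Weibull does not give the full three-region curve; each shape value mimics one region.

```python
import math

def weibull_pdf(t, shape, scale=1.0):
    """Density of the (assumed) Weibull lifetime distribution, for t > 0."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_reliability(t, shape, scale=1.0):
    """Reliability function Pr(X > t), as in (20)."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale=1.0):
    """Failure rate from (21): density divided by reliability."""
    return weibull_pdf(t, shape, scale) / weibull_reliability(t, shape, scale)

times = [0.2, 0.4, 0.6, 0.8, 1.0]
decreasing = [hazard(t, 0.5) for t in times]  # like Region I
constant = [hazard(t, 1.0) for t in times]    # like Region II
increasing = [hazard(t, 2.0) for t in times]  # like Region III

print(all(a > b for a, b in zip(decreasing, decreasing[1:])))  # True
print(all(math.isclose(v, 1.0) for v in constant))             # True
print(all(a < b for a, b in zip(increasing, increasing[1:])))  # True
```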
An Alternative to Kurtosis If X is a continuous random variable with pdf f(x) then
\[ T_f = \operatorname{Var}\{\log(f(X))\} \tag{22} \]
measures the intrinsic shape of the distribution. This measure was introduced last year (2001)
by Dr. K.-S. Song of Florida State University [see the Journal of Statistical Planning and
Inference, volume 93, pp. 51–69]. It is a better measure than kurtosis for describing the shape of a
distribution.
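As an illustration of (22), assume X is exponential with rate r (an example chosen here, not taken from the notes). Then log f(X) = log r − rX, so T_f = r^2 Var(X) = 1 whatever the rate, which is consistent with T_f describing shape rather than scale. The sketch below confirms this numerically with a midpoint rule; the function names and the truncation point of the integral are assumptions.

```python
import math

def integrate(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def t_f_exponential(rate):
    """Approximate T_f = Var{log f(X)} for an exponential density with this rate."""
    def f(x):
        return rate * math.exp(-rate * x)

    def log_f(x):
        return math.log(rate) - rate * x

    upper = 40.0 / rate  # assumed truncation point; the tail beyond it is negligible
    first = integrate(lambda x: log_f(x) * f(x), 0.0, upper)        # E[log f(X)]
    second = integrate(lambda x: log_f(x) ** 2 * f(x), 0.0, upper)  # E[(log f(X))^2]
    return second - first ** 2

print(abs(t_f_exponential(0.5) - 1.0) < 1e-5)  # True
print(abs(t_f_exponential(2.0) - 1.0) < 1e-5)  # True
```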