Download RANDOM VARIABLES and PROBABILITY DISTRIBUTIONS

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Probability wikipedia , lookup

Randomness wikipedia , lookup

Transcript
Notes: Random Variables and Probability Distributions
RANDOM VARIABLES and
PROBABILITY DISTRIBUTIONS
Statistical experiment - a term that is used to describe
any process by which several chance observations are
generated.
Discrete random variable - if its set of possible
outcomes is countable - count data
Continuous random variable - when a random
variable can take on values on a continuous scale.
Examples of continuous random variables represent
measured data, such as all possible heights, weights,
temperature, distances, or life periods.
Definition 3.1 A random variable is a function that
associates a real number with each element in the
sample space.
Definition 3.2 If a sample space contains a finite
number of possibilities or an unending sequence with as
many elements as there are whole numbers, it is called
a discrete sample space.
Definition 3.3 If a sample space contains an infinite
number of possibilities equal to the number of points on
a line segment, it is called a continuous sample space.
ENGSTAT Notes of AM Fillone
Notes: Random Variables and Probability Distributions
Discrete Probability Distributions
Definition 3.4 The set of ordered pairs (x, f(x)) is a
probability function, probability mass function, or
probability distribution of the discrete random variable
X if for each possible outcome x,
1.
2.
f(x) ≥ 0.
∑ f(x) = 1.
3.
P(X = x) = f(x).
x
Definition 3.5 The cumulative distribution F(x) of a
discrete random variable X with probability distribution
f(x) is given by
F(x) = P(X ≤ x) = ∑ f(t)
t≤x
ENGSTAT Notes of AM Fillone
for - ∞ < x < ∝ .
Notes: Random Variables and Probability Distributions
Continuous Probability Distributions
Definition 3.6 The function f(x) is a probability
density function for the continuous random variable X,
defined over the set of real numbers R, if
1. f(x) ≥ 0 for all x ∈R.
∞
2. ∫ f(x) dx = 1.
-∞
b
3. P(a<X<b) = ∫ f(x) dx.
a
Definition 3.7 The cumulative distribution F(x) of a
continuous random variable X with density function
f(x) is given by
x
F(x) = P(X ≤ x) = ∫ f(t) dt for - ∞ < x < ∞
-∞
which would result to
P(a < X < b) = F(b) – F(a)
and
f(x) = dF(x)/dx.
ENGSTAT Notes of AM Fillone
Notes: Random Variables and Probability Distributions
Joint Probability Distributions
Definition 3.8 The function f(x,y) is a joint
probability distribution or probability mass function
of discrete random variables X and Y if
1. f(x, y) ≥ 0 for all (x, y).
2. ∑ ∑ f(x, y) = 1.
x
y
3. P(X = x, Y = y) = f(x, y).
For any region A in the xy plane,
P[X, Y ∈ A] = ∑ ∑ f(x, y).
A
Definition 3.9 The function f(x, y) is a joint density
function of the continuous random variable X and Y if
1. f(x, y) ≥ 0 for all (x, y).
∞
∞
2. ∫ ∫ f(x, y) dx dy = 1,
-∞ -∞
3. P[(X, Y) ∈ A] = ∫ ∫ f(x, y) dx dy.
A
for any region A in the xy plane.
ENGSTAT Notes of AM Fillone
Notes: Random Variables and Probability Distributions
Definition 3.10 The marginal distribution of X alone
and of Y alone are given by
g(x) = ∑ f(x, y)
y
and h(y) = ∑ f(x, y)
x
for the discrete case and by
∞
g(x) = ∫ f(x, y) dy
-∞
∞
and h(y) = ∫ f(x, y) dx
-∞
for the continuous case.
Definition 3.11 Let X and Y be two random variables,
discrete or continuous. The conditional distribution of
the random variable Y, given that X = x, is given by
f(x,y)
f(y|x) = ---------,
g(x)
g(x) > 0.
Similarly, the conditional distribution of the random
variable X, given that Y = y, is given by
f(x,y)
f(x|y) = ---------, h(y) > 0.
h(y)
ENGSTAT Notes of AM Fillone
Notes: Random Variables and Probability Distributions
Definition 3.12 Let X and Y be two random variables,
discrete or continuous, with joint probability
distribution f(x,y) and marginal distributions g(x) and
h(y), respectively. The random variables X and Y are
said to be statistically independent if and only if
f(x, y) = g(x)h(y)
for all (x, y) within their range.
Definition 3.13 Let X1, X2,…Xn be n random variables,
discrete or continuous, with joint probability
distribution f(x1, x2, …, xn) and marginal distributions
f1(x1), f2(x2), …, fn(xn), respectively. The random
variables X1, X2, …, Xn are said to be mutually
statistically independent if and only if
f(x1, x2, …, xn) = f1(x1) f2(x2) … fn(xn)
for all (x1, x2, …, xn) within the range.
ENGSTAT Notes of AM Fillone
Notes: Random Variables and Probability Distributions
F(x)
Cumulative distribution function
1.0
0.5
0
x
A continuous random variable x is one that has the following three
properties:
1. The cumulative distribution, F(x), is continuous.
2. x takes on an uncountably infinite number of values in the interval
(-∞, ∞).
3. The probability that x equals any one particular value is 0.
f(x)
continuous density function
F(x0)
x
Properties of a Density Funcion
1. f(x) ≥ 0
∞
2. ∫ f(x)dx = F(∞) = 1
-∞
ENGSTAT Notes of AM Fillone
x0
Notes: Random Variables and Probability Distributions
Empirical Distributions
- Sometimes statistical methods cannot generate sufficient
information or experimental data to characterize the distribution
totally
- However, sets of data often can be used to learn about certain
properties of the distribution
- A summary of a collection of data via graphical display can
provide insight regarding the system from which the data were
taken
Steam and leaf plot – it is a combined tabular and graphical display of statistical data which
can be very useful in studying the behavior of the distribution
Example: Using the data in Table 3.1, which represent the lives of 40 similar car batteries
recorded to the nearest tenth of a year. The batteries were guaranteed to last 3 years.
Steps in constructing a stem and leaf plot
1. Split each observation into two parts consisting of a stem and a leaf such that the steam
represents the digit preceding the decimal and the leaf corresponds to the decimal part of
the number.
For example: For the number 3.7 the digit 3 is designated the stem and the digit 7 is
the leaf.
2. Summarize the number of leaves recorded opposite each stem under the frequency
column. See Table 3.2
Table 3.2 Stem and Leaf Plot of Battery Lives
Stems
Leaves
Frequency
1
69
2
2
25696
5
3
4318514723628297130097145
25
4
71354172
8
3. When the stem and leaf plot does not provide an adequate picture of the distribution,
increase the number of stems in the plot.
4. This can be done by writing each stem value twice on the left side of the vertical line and
then record the leaves 0, 1, 2, 3, and 4 opposite the appropriate stem value where it
appears for the first time; and the leaves 5, 6, 7, 8, and 9 opposite this same stem value
where it appears for the second time. See Table 3.3.
Stems
1.
Leaves
69
ENGSTAT Notes of AM Fillone
Frequency
2
Notes: Random Variables and Probability Distributions
2*
2.
3*
3.
4*
4.
2
5696
431142322130014
8576897975
13412
757
1
4
15
10
5
3
5. A further increase in the number of stems may be achieved by writing each stem value
five times on the left side of a vertical line, where we might now code the stems a for
leaves 0 and 1, b for leaves 2 and 3, c for leaves 4 and 5, d for leaves 6 and 7, e for leaves
8 and 9.
*Note: The number of appropriate stem values is guided by the size the sample, usually
between 5 and 20 stems.
ENGSTAT Notes of AM Fillone