Notes: Random Variables and Probability Distributions

RANDOM VARIABLES and PROBABILITY DISTRIBUTIONS

Statistical experiment: a term used to describe any process by which several chance observations are generated.

Discrete random variable: a random variable whose set of possible outcomes is countable; such variables describe count data.

Continuous random variable: a random variable that can take on values on a continuous scale. Continuous random variables represent measured data, such as heights, weights, temperatures, distances, or life periods.

Definition 3.1 A random variable is a function that associates a real number with each element in the sample space.

Definition 3.2 If a sample space contains a finite number of possibilities or an unending sequence with as many elements as there are whole numbers, it is called a discrete sample space.

Definition 3.3 If a sample space contains an infinite number of possibilities equal to the number of points on a line segment, it is called a continuous sample space.

ENGSTAT Notes of AM Fillone

Discrete Probability Distributions

Definition 3.4 The set of ordered pairs (x, f(x)) is a probability function, probability mass function, or probability distribution of the discrete random variable X if, for each possible outcome x,
1. $f(x) \ge 0$,
2. $\sum_x f(x) = 1$,
3. $P(X = x) = f(x)$.

Definition 3.5 The cumulative distribution F(x) of a discrete random variable X with probability distribution f(x) is given by
$$F(x) = P(X \le x) = \sum_{t \le x} f(t), \quad -\infty < x < \infty.$$

Continuous Probability Distributions

Definition 3.6 The function f(x) is a probability density function for the continuous random variable X, defined over the set of real numbers $\mathbb{R}$, if
1. $f(x) \ge 0$ for all $x \in \mathbb{R}$,
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
3. $P(a < X < b) = \int_a^b f(x)\,dx$.

Definition 3.7 The cumulative distribution F(x) of a continuous random variable X with density function f(x) is given by
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty.$$
It follows that $P(a < X < b) = F(b) - F(a)$ and $f(x) = dF(x)/dx$ wherever the derivative exists.

Joint Probability Distributions

Definition 3.8 The function f(x, y) is a joint probability distribution or probability mass function of the discrete random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y),
2. $\sum_x \sum_y f(x, y) = 1$,
3. $P(X = x, Y = y) = f(x, y)$.
For any region A in the xy plane, $P[(X, Y) \in A] = \sum\sum_{A} f(x, y)$.

Definition 3.9 The function f(x, y) is a joint density function of the continuous random variables X and Y if
1. $f(x, y) \ge 0$ for all (x, y),
2. $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$,
3. $P[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy$ for any region A in the xy plane.

Definition 3.10 The marginal distributions of X alone and of Y alone are given by
$$g(x) = \sum_y f(x, y) \quad \text{and} \quad h(y) = \sum_x f(x, y)$$
for the discrete case, and by
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{and} \quad h(y) = \int_{-\infty}^{\infty} f(x, y)\,dx$$
for the continuous case.

Definition 3.11 Let X and Y be two random variables, discrete or continuous. The conditional distribution of the random variable Y, given that X = x, is
$$f(y \mid x) = \frac{f(x, y)}{g(x)}, \quad g(x) > 0.$$
Similarly, the conditional distribution of the random variable X, given that Y = y, is
$$f(x \mid y) = \frac{f(x, y)}{h(y)}, \quad h(y) > 0.$$

Definition 3.12 Let X and Y be two random variables, discrete or continuous, with joint probability distribution f(x, y) and marginal distributions g(x) and h(y), respectively. The random variables X and Y are said to be statistically independent if and only if
$$f(x, y) = g(x)\,h(y)$$
for all (x, y) within their range.
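The definitions above can be checked numerically on a small example. The sketch below is a minimal illustration under stated assumptions: the joint table, the density f(x) = 2x on [0, 1], and the helper names (is_valid_joint_pmf, marginals, conditional_y_given_x, is_independent, discrete_cdf, density, cdf, integrate) are all hypothetical and do not come from the notes. Exact fractions keep the discrete checks (Definitions 3.5, 3.8, and 3.10 to 3.12) free of rounding error, and a midpoint-rule sum stands in for the integrals of Definitions 3.6 and 3.7.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical joint PMF f(x, y) for discrete X in {0, 1} and Y in {0, 1, 2}.
# Illustrative numbers only; they do not come from the notes.
joint = {
    (0, 0): Fr(1, 10), (0, 1): Fr(2, 10), (0, 2): Fr(1, 10),
    (1, 0): Fr(2, 10), (1, 1): Fr(3, 10), (1, 2): Fr(1, 10),
}


def is_valid_joint_pmf(f):
    """Definition 3.8: every f(x, y) >= 0 and the double sum equals 1."""
    return all(p >= 0 for p in f.values()) and sum(f.values()) == 1


def marginals(f):
    """Definition 3.10 (discrete case): g(x) = sum_y f(x, y), h(y) = sum_x f(x, y)."""
    g, h = {}, {}
    for (x, y), p in f.items():
        g[x] = g.get(x, 0) + p
        h[y] = h.get(y, 0) + p
    return g, h


def conditional_y_given_x(f, x):
    """Definition 3.11: f(y | x) = f(x, y) / g(x), defined only when g(x) > 0."""
    g, _ = marginals(f)
    if g.get(x, 0) <= 0:
        raise ValueError("f(y | x) is undefined when g(x) = 0")
    return {y: p / g[x] for (xi, y), p in f.items() if xi == x}


def is_independent(f):
    """Definition 3.12: f(x, y) == g(x) * h(y) for every (x, y) in the range."""
    g, h = marginals(f)
    return all(f.get((x, y), 0) == g[x] * h[y] for x, y in product(g, h))


def discrete_cdf(pmf, x):
    """Definition 3.5: F(x) = sum of f(t) over all t <= x."""
    return sum(p for t, p in pmf.items() if t <= x)


def density(x):
    """Hypothetical density f(x) = 2x on [0, 1] and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def cdf(x):
    """Definition 3.7 in closed form for the density above: F(x) = x**2 on [0, 1]."""
    if x < 0:
        return 0.0
    return min(x, 1.0) ** 2


def integrate(fn, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of fn from a to b."""
    width = (b - a) / n
    return sum(fn(a + (i + 0.5) * width) for i in range(n)) * width


if __name__ == "__main__":
    g, h = marginals(joint)
    print(is_valid_joint_pmf(joint))        # True
    print(g)                                # {0: Fraction(2, 5), 1: Fraction(3, 5)}
    print(conditional_y_given_x(joint, 0))  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    print(is_independent(joint))            # False
    print(discrete_cdf(h, 1))               # 4/5
    # f is zero outside [0, 1], so integrating over [0, 1] covers (-inf, inf).
    print(round(integrate(density, 0, 1), 6))  # 1.0
    print(round(integrate(density, 0.2, 0.7), 6), round(cdf(0.7) - cdf(0.2), 6))  # 0.45 0.45
```

In this example X and Y are not independent: f(0, 0) = 1/10, while g(0)h(0) = (2/5)(3/10) = 3/25. The last line illustrates $P(a < X < b) = F(b) - F(a)$ for a = 0.2 and b = 0.7.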
Definition 3.13 Let $X_1, X_2, \ldots, X_n$ be n random variables, discrete or continuous, with joint probability distribution $f(x_1, x_2, \ldots, x_n)$ and marginal distributions $f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$, respectively. The random variables $X_1, X_2, \ldots, X_n$ are said to be mutually statistically independent if and only if
$$f(x_1, x_2, \ldots, x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n)$$
for all $(x_1, x_2, \ldots, x_n)$ within their range.

[Figure: cumulative distribution function F(x) plotted against x, with F(x) ranging from 0 to 1.0]

A continuous random variable X is one that has the following three properties:
1. Its cumulative distribution F(x) is continuous.
2. X takes on an uncountably infinite number of values in the interval $(-\infty, \infty)$.
3. The probability that X equals any one particular value is 0, since $P(X = x_0) = \int_{x_0}^{x_0} f(x)\,dx = 0$.

[Figure: continuous density function f(x), with F(x_0) marked at the point x_0]

Properties of a Density Function
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = F(\infty) = 1$

Empirical Distributions
- Sometimes statistical methods cannot generate sufficient information or experimental data to characterize the distribution completely.
- However, sets of data can often be used to learn about certain properties of the distribution.
- A graphical summary of a collection of data can provide insight into the system from which the data were taken.

Stem-and-leaf plot: a combined tabular and graphical display of statistical data that can be very useful in studying the behavior of the distribution.

Example: Consider the data in Table 3.1, which represent the lives of 40 similar car batteries recorded to the nearest tenth of a year. The batteries were guaranteed to last 3 years.

Steps in constructing a stem-and-leaf plot
1. Split each observation into two parts, a stem and a leaf, such that the stem represents the digit preceding the decimal point and the leaf corresponds to the decimal part of the number.
For example, for the number 3.7 the digit 3 is designated the stem and the digit 7 is the leaf.
2. Record the number of leaves opposite each stem in the frequency column. See Table 3.2.

Table 3.2 Stem and Leaf Plot of Battery Lives

Stem | Leaves                    | Frequency
1    | 69                        | 2
2    | 25696                     | 5
3    | 4318514723628297130097145 | 25
4    | 71354172                  | 8

3. When the stem-and-leaf plot does not provide an adequate picture of the distribution, increase the number of stems in the plot.
4. This can be done by writing each stem value twice on the left side of the vertical line, then recording the leaves 0, 1, 2, 3, and 4 opposite the stem value where it appears for the first time, and the leaves 5, 6, 7, 8, and 9 opposite the same stem value where it appears for the second time. See Table 3.3, where * marks the first appearance of a stem (leaves 0 to 4) and . marks the second (leaves 5 to 9); stem 1 has no leaves from 0 to 4, so only 1. appears.

Table 3.3

Stem | Leaves          | Frequency
1.   | 69              | 2
2*   | 2               | 1
2.   | 5696            | 4
3*   | 431142322130014 | 15
3.   | 8576897975      | 10
4*   | 13412           | 5
4.   | 757             | 3

5. A further increase in the number of stems may be achieved by writing each stem value five times on the left side of the vertical line, coding the stems a for leaves 0 and 1, b for leaves 2 and 3, c for leaves 4 and 5, d for leaves 6 and 7, and e for leaves 8 and 9.

*Note: The appropriate number of stems is guided by the size of the sample; usually between 5 and 20 stems are used.
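Steps 1 to 5 can be sketched in Python as a check on Tables 3.2 and 3.3. This is a minimal sketch under stated assumptions: Table 3.1 is not reproduced in these notes, so the 40 battery lives are read back from the leaves of Table 3.2, and the function names (split, stem_and_leaf, double_stem, five_stem) are hypothetical. Each stem row is kept as a string of leaf digits in the order they were recorded.

```python
from collections import defaultdict

# Battery lives (years) read back from the leaves of Table 3.2, stem by stem.
# Table 3.1 is not reproduced in the notes, so its original ordering is lost.
TABLE_3_2_LEAVES = {1: "69", 2: "25696", 3: "4318514723628297130097145", 4: "71354172"}
lives = [stem + int(d) / 10 for stem, digits in TABLE_3_2_LEAVES.items() for d in digits]


def split(value):
    """Step 1: stem = digit before the decimal point, leaf = the tenths digit."""
    return divmod(round(value * 10), 10)


def stem_and_leaf(values):
    """Step 2: one row per stem; the frequency is the number of leaves on the row."""
    rows = defaultdict(str)
    for v in values:
        stem, leaf = split(v)
        rows[stem] += str(leaf)
    return {stem: (leaves, len(leaves)) for stem, leaves in sorted(rows.items())}


def double_stem(values):
    """Steps 3-4: each stem appears twice; '*' holds leaves 0-4, '.' holds leaves 5-9."""
    rows = defaultdict(str)
    for v in values:
        stem, leaf = split(v)
        rows[(stem, "*" if leaf <= 4 else ".")] += str(leaf)
    # '*' sorts before '.', so the 0-4 row of each stem comes first.
    return dict(sorted(rows.items()))


def five_stem(values):
    """Step 5: each stem appears five times, coded a (0, 1), b (2, 3), c (4, 5), d (6, 7), e (8, 9)."""
    rows = defaultdict(str)
    for v in values:
        stem, leaf = split(v)
        rows[(stem, "abcde"[leaf // 2])] += str(leaf)
    return dict(sorted(rows.items()))


if __name__ == "__main__":
    for stem, (leaves, freq) in stem_and_leaf(lives).items():
        print(f"{stem} | {leaves:<25} | {freq}")
    print()
    for (stem, mark), leaves in double_stem(lives).items():
        print(f"{stem}{mark} | {leaves:<15} | {len(leaves)}")
```

Printed this way, the first table reproduces Table 3.2 and the second reproduces Table 3.3 row for row. Only stems that actually receive leaves are listed, which is why no 1* row appears. Sorting the leaves within each row is a common variant that makes the plot easier to read.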