University of California, Davis
Department of Statistics
Summer Session II, Statistics 13
August 8, 2012 (date of latest update: August 8)

Lecture 4: Random Variables and Probability Distributions

Definition 4.1 A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical value is assigned to each sample point.

Example 1 For the coin-tossing example, the sample space is {H, T}. We can represent the outcome by a random variable X such that

$X = \begin{cases} 1 & \text{if the outcome is } H \\ 0 & \text{if the outcome is } T \end{cases}$

Example 2 A die is rolled and the up face is recorded. The sample space is {1, 2, 3, 4, 5, 6}. Here we can represent the outcome by a random variable X, where X is the number shown on the up face. Notice that X is just equal to the realized sample point (i.e., 1, 2, 3, 4, 5, 6).

4.1 Two Types of Random Variables

Definition 4.2 Random variables that can assume a countable number of values are called discrete.

Example 3 The random variables in Examples 1 and 2.

Example 4 The number of earthquakes in the next year: it can be 0, 1, 2, ....

Definition 4.3 Random variables that can assume values corresponding to any of the points contained in an interval are called continuous.

Example 5 The length of time X (in minutes) a student takes to complete a one-hour exam: 0 ≤ X ≤ 60.

4.2 Probability Distributions for Discrete Random Variables

Definition 4.4 The probability distribution of a discrete random variable is a graph, table, or formula that specifies the probability associated with each possible value that the random variable can assume.

Requirements for the Probability Distribution of a Discrete Random Variable X
1. p(x) ≥ 0 for all values of x.
2. $\sum_x p(x) = 1$, where the summation is over all possible values of x.

Example 6 Suppose you pay $5 to play a game. If you win, you get $6 back; otherwise you get nothing. Let X be the winnings from the game. The probability distribution of X is given by

    x     |  6    0
    p(x)  | 1/3  2/3

Definition 4.5 The mean, or expected value, of a discrete random variable X is $\mu = E(X) = \sum_x x\,p(x)$.

Definition 4.6 The variance of a discrete random variable X is $\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 p(x) = \sum_x x^2 p(x) - \mu^2$.

Example 7 Refer to Example 6. The expected value of X is

$\mu = E(X) = 6 \times \tfrac{1}{3} + 0 \times \tfrac{2}{3} = 2,$

and the variance is given by the following calculation:

$E(X^2) = 6^2 \times \tfrac{1}{3} + 0^2 \times \tfrac{2}{3} = 12,$

so the variance is $\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2 = 12 - 2^2 = 8.$

Example 8 Refer to Examples 6 and 7. The expected profit from the game is E(X) − 5 = 2 − 5 = −3.

Definition 4.7 The standard deviation of a discrete random variable is equal to the square root of the variance: $\sigma = \sqrt{\sigma^2}$.

Chebyshev's Rule and Empirical Rule for a Discrete Random Variable
Let X be a discrete random variable with probability distribution p(x), mean µ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made. Chebyshev's Rule applies to any probability distribution; the Empirical Rule applies to probability distributions that are mound shaped and symmetric.

                              Chebyshev's Rule    Empirical Rule
    P(µ − σ < X < µ + σ)      ≥ 0                 ≈ 0.68
    P(µ − 2σ < X < µ + 2σ)    ≥ 3/4               ≈ 0.95
    P(µ − 3σ < X < µ + 3σ)    ≥ 8/9               ≈ 1.00
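As a sanity check on Examples 6 and 7, here is a minimal Python sketch (not part of the original notes; the dictionary dist and the variable names are ours) that verifies the two requirements and computes µ, σ², and σ for the game's payoff distribution.

```python
# Mean, variance, and standard deviation of the discrete random
# variable in Example 6; dist maps each value x to its probability p(x).
from math import sqrt

dist = {6: 1/3, 0: 2/3}

# Requirements for a discrete probability distribution.
assert all(p >= 0 for p in dist.values())      # 1. p(x) >= 0 for all x
assert abs(sum(dist.values()) - 1) < 1e-12     # 2. probabilities sum to 1

mu = sum(x * p for x, p in dist.items())               # E(X)
var = sum(x**2 * p for x, p in dist.items()) - mu**2   # E(X^2) - mu^2
sigma = sqrt(var)

print(mu, var, sigma)  # approximately 2, 8, 2.83, matching Example 7
```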
4.3a The Binomial Distribution

Many experiments result in dichotomous responses (i.e., responses for which there exist two possible alternatives, such as Yes-No, Pass-Fail, Defective-Nondefective, or Male-Female). A simple example of such an experiment is the coin-toss experiment. A coin is tossed a number of times, say 10. Each toss results in one of two outcomes, Head or Tail, and the probability of observing each of these two outcomes remains the same for each of the 10 tosses. Ultimately, we are interested in the probability distribution of X, the number of heads observed. Many other experiments are equivalent to tossing a coin (either balanced or unbalanced) a fixed number n of times and observing the number x of times that one of the two possible outcomes occurs. Random variables that possess these characteristics are called binomial random variables.

Characteristics of a Binomial Random Variable
1. The experiment consists of n identical trials.
2. There are only two possible outcomes on each trial. We will denote one outcome by S (for Success) and the other by F (for Failure).
3. The probability of S remains the same from trial to trial. This probability is denoted by p, and the probability of F is denoted by q = 1 − p.
4. The trials are independent.
5. The binomial random variable X is the number of S's in n trials.

The Binomial Probability Distribution

$p(x) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, 2, \dots, n$

where
    p = probability of a success on a single trial
    q = 1 − p
    n = number of trials
    x = number of successes in n trials
    $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$

We write X ∼ B(n, p).

Mean, Variance, and Standard Deviation for a Binomial Random Variable
    Mean: µ = np
    Variance: σ² = npq
    Standard deviation: σ = √(npq)
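The binomial formula translates directly into code. The sketch below assumes Python 3.8+ (for math.comb); the balanced-coin parameters n = 10, p = 0.5 are purely illustrative.

```python
# Binomial pmf p(x) = C(n, x) * p^x * q^(n - x), written out directly.
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p)."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.5                                   # e.g., 10 tosses of a balanced coin
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))                                  # 1.0, as required
print(sum(x * px for x, px in enumerate(pmf)))   # mean np = 5.0
print(binom_pmf(5, n, p))                        # P(exactly 5 heads) ~ 0.2461
```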
4.3b The Poisson Distribution

Consider a type of event occurring randomly through time, say earthquakes. Let X be the number occurring in a unit interval of time. Then, under the following conditions, X can be shown mathematically to have a Poisson(λ) distribution.

Characteristics of a Poisson Random Variable
1. The events occur at a constant average rate of λ per unit time.
2. Occurrences are independent of one another.
3. More than one occurrence cannot happen at the same time.

The Poisson Probability Distribution
A random variable X taking values 0, 1, 2, ... has a Poisson distribution if

$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \qquad x = 0, 1, 2, \dots$

We write X ∼ Poisson(λ). [Note that $P(X = 0) = e^{-\lambda}$, since λ⁰ = 1 and 0! = 1.]

For example, if λ = 2, we have $P(0) = e^{-2} = 0.135335$ and $P(3) = \frac{e^{-2} \times 2^3}{3!} = 0.180447$. As required for a probability function, it can be shown that the probabilities P(X = x) all sum to 1.

Learning to use the formula: using the Poisson probability formula, verify that if λ = 1, then P(0) = 0.36788 and P(3) = 0.061313.

Example 9 While checking the proofs of some theorems in the first four chapters of a mathematical statistics textbook, the authors found 1.6 printer's errors per page on average. We will assume the errors were occurring randomly according to a Poisson process. Let X be the number of errors on a single page. Then X ∼ Poisson(λ = 1.6). We will use this information to calculate a large number of probabilities.

a. The probability of finding no errors on a particular page is
$P(X = 0) = e^{-1.6} = 0.2019.$

b. The probability of finding 2 errors on a particular page is
$P(X = 2) = \frac{e^{-1.6}(1.6^2)}{2!} = 0.2584.$

c. The probability of no more than 2 errors on a page is
$P(X \le 2) = P(0) + P(1) + P(2) = \frac{e^{-1.6}(1.6^0)}{0!} + \frac{e^{-1.6}(1.6^1)}{1!} + \frac{e^{-1.6}(1.6^2)}{2!} = 0.2019 + 0.3230 + 0.2584 = 0.7833.$

d. The probability of more than 4 errors on a page is
$P(X > 4) = P(5) + P(6) + P(7) + P(8) + \dots,$
so if we tried to calculate it in a straightforward fashion, there would be an infinite number of terms to add. However, using $P(A) = 1 - P(A^c)$, we get
$P(X > 4) = 1 - P(X \le 4) = 1 - [P(0) + P(1) + P(2) + P(3) + P(4)] = 1 - (0.2019 + 0.3230 + 0.2584 + 0.1378 + 0.0551) = 1 - 0.9762 = 0.0238.$

e. Let us now calculate the probability of getting a total of 5 errors on 3 consecutive pages. Let Y be the number of errors in 3 pages. The only thing that has changed is that we are now looking for errors in a bigger unit of the manuscript, so the average number of events per unit changes from 1.6 errors per page to 3 × 1.6 = 4.8 errors per 3 pages. Thus Y ∼ Poisson(λ = 4.8) and
$P(Y = 5) = \frac{e^{-4.8}(4.8^5)}{5!} = 0.1747.$

f. What is the probability that in a block of 10 pages, exactly 3 pages have no errors? There is quite a big change now. We are no longer counting events (errors) in a single block of material, so we have left the territory of the Poisson distribution. What we have now is akin to making 10 tosses of a coin: a page lands "heads" if it contains no errors, and "tails" otherwise. The probability of landing "heads" (having no errors on the page) is given by (a), namely P(X = 0) = 0.2019. Let W be the number of pages with no errors. Then W ∼ B(n = 10, p = 0.2019) and
$P(W = 3) = \binom{10}{3}(0.2019)^3(0.7981)^7 = 0.2037.$
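All of the numbers in Example 9 can be reproduced in a few lines of Python. This is a sketch, not part of the original notes; the helper name pois_pmf is ours, and part (f) reuses math.comb for the binomial switch.

```python
# Poisson pmf P(X = x) = e^(-lam) * lam^x / x!, plus the complement
# trick from Example 9(d) and the binomial count in 9(f).
from math import exp, factorial, comb

def pois_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 1.6  # errors per page
print(pois_pmf(0, lam))                             # (a) ~ 0.2019
print(pois_pmf(2, lam))                             # (b) ~ 0.2584
print(sum(pois_pmf(x, lam) for x in range(3)))      # (c) ~ 0.7833
print(1 - sum(pois_pmf(x, lam) for x in range(5)))  # (d) ~ 0.0237
print(pois_pmf(5, 3 * lam))                         # (e) ~ 0.1747

# (f) counts error-free pages, so it is binomial, not Poisson.
p0 = pois_pmf(0, lam)
print(comb(10, 3) * p0**3 * (1 - p0)**7)            # (f) ~ 0.2037
```

(The unrounded value in (d) is 0.0237; the notes' 0.0238 comes from summing the four-decimal rounded terms.)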
4.4 Probability Distributions for Continuous Random Variables

Just as we describe the probability distribution of a discrete random variable by specifying the probability that the random variable takes on each possible value, we describe the probability distribution of a continuous random variable by giving its density function. If X is a continuous random variable, then a density function is a function f(x) with domain (a, b) which satisfies the following three properties:
1. f(x) ≥ 0 for all x in (a, b);
2. $\int_a^b f(x)\,dx = 1$; and
3. for any a ≤ c < d ≤ b, $P(c < X < d) = \int_c^d f(x)\,dx$.

Notes:
1. a and b could possibly be −∞ and ∞.
2. f(x) is NOT a probability (i.e., f(1) is not the probability that X = 1); it is the probability density.
3. P(c < X < d) is the area under the curve f(x) between c and d.
4. For a continuous random variable, the probability of any single value is zero, i.e., P(X = c) = 0. This also means that P(c ≤ X ≤ d) = P(c < X < d) = P(c ≤ X < d) = P(c < X ≤ d).
5. The domain of f(x) can include the endpoints, [a, b], or not, (a, b) (or a combination).

Characteristics of the density function
1. The density function is always nonnegative, i.e., its graph always lies on or above the x-axis.
2. The area under the density curve is 1.

We think of a continuous random variable with density function f as being a random variable that can be obtained by picking a point at random from under the density curve and then reading off the x-coordinate of that point. Because the total area under the density curve is 1, the probability that the random variable takes on a value between a and b is the area under the curve between a and b. More precisely, if X is a random variable with density function f and a < b, then

$P(a \le X \le b) = \int_a^b f(x)\,dx.$

4.5 The Normal Distribution

One of the most commonly observed continuous random variables has a bell-shaped probability distribution (or bell curve). It is known as a normal random variable, and its probability distribution is called a normal distribution.

Probability Distribution for a Normal Random Variable X
Probability density function:

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

where
    µ = mean of the normal random variable X
    σ = standard deviation
    π = 3.1416...
    e = 2.71828...

P(X < a) is obtained from a table of normal probabilities.

Definition 4.8 The standard normal distribution is a normal distribution with µ = 0 and σ = 1. A random variable with a standard normal distribution, denoted by the symbol Z, is called a standard normal random variable.

Example 10 The probability that a standard normal random variable exceeds 1.96 in absolute value is
P(|Z| > 1.96) = P(Z < −1.96 or Z > 1.96) = 0.05.

Property of Normal Distributions
If X is a normal random variable with mean µ and standard deviation σ, then the random variable Z defined by the formula

$Z = \dfrac{X - \mu}{\sigma}$

has a standard normal distribution. The value z describes the number of standard deviations between x and µ.

Steps for Finding a Probability Corresponding to a Normal Random Variable
1. Sketch the normal distribution and indicate the mean of the random variable X. Then shade the area corresponding to the probability you want to find.
2. Convert the boundaries of the shaded area from x values to standard normal z values using the formula $z = \frac{x - \mu}{\sigma}$. Show the z values under the corresponding x values on your sketch.
3. Use the standard normal table to find the areas corresponding to the z values. If necessary, use the symmetry of the normal distribution to find areas corresponding to negative z values, and use the fact that the total area on each side of the mean equals 0.5 to convert the areas from the table to the probabilities of the event you have selected.

4.6 Descriptive Methods for Assessing Normality

In the next session, we will learn how to make inferences about a population based on the information contained in a sample. Several of these techniques are based on the assumption that the population is approximately normally distributed. Consequently, it will be important to determine whether the sample data come from a normal population before we can apply these techniques properly. A number of descriptive methods can be used to check for normality.

Determining whether the Data Are from an Approximately Normal Distribution
1. Construct either a histogram or stem-and-leaf display for the data, and note the shape of the graph. If the data are approximately normal, the shape of the histogram or stem-and-leaf display will be similar to the normal curve (i.e., the display will be mound shaped and symmetric about the mean).
2. Compute the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s, and determine the percentage of measurements falling into each. If the data are approximately normal, the percentages will be approximately 68%, 95%, and 100%, respectively.
3. Find the interquartile range IQR and standard deviation s for the sample, and then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.

Definition 4.9 A normal probability plot for a data set is a scatterplot with the ranked data values on one axis and their corresponding expected z-scores from a standard normal distribution on the other axis.

Example 11 Below is a normal probability plot for the NBA heights from the 2008-09 season. Do these data appear to follow a normal distribution?

[Figure: normal probability plot of 2008-09 NBA player heights; not reproduced here.]
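In code, the standard normal table can be replaced by the identity $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$. A minimal sketch follows; the N(100, 15) figures at the end are hypothetical numbers chosen only to illustrate the standardization step, not values from the notes.

```python
# Standard normal CDF built from the error function, in place of a table.
from math import erf, sqrt

def phi(z: float) -> float:
    """P(Z <= z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 10: P(|Z| > 1.96) = P(Z < -1.96) + P(Z > 1.96).
print(phi(-1.96) + (1 - phi(1.96)))                   # ~ 0.05

# Standardizing a general normal: Z = (X - mu) / sigma.
mu, sigma = 100, 15                                   # hypothetical N(100, 15)
a, b = 85, 130                                        # z-scores -1 and 2
print(phi((b - mu) / sigma) - phi((a - mu) / sigma))  # P(85 < X < 130) ~ 0.8186
```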
4.7 Sampling Distribution

So far we have assumed that we knew the probability distribution of a random variable, and using this knowledge we were able to compute the mean, variance, and probabilities associated with the random variable. However, in most practical applications, the true mean and standard deviation are unknown quantities that have to be estimated. Numerical quantities that describe probability distributions are called parameters. For instance, p, the probability of a success in a binomial experiment, and µ and σ, the mean and standard deviation, respectively, of a normal distribution, are examples of parameters.

Definition 4.10 A parameter is a numerical descriptive measure of a population. Because it is based on the observations in the population, its value is almost always unknown.

We have also discussed the sample mean x̄, the sample variance s², the sample standard deviation s, and the like, which are numerical descriptive measures calculated from the sample. We will often use the information contained in these sample statistics to make inferences about the parameters of a population.

Definition 4.11 A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.

Note that the term statistic refers to a sample quantity and the term parameter refers to a population quantity.

Definition 4.12 The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

4.8 The Central Limit Theorem

We are always interested in making inferences about the parameters of some population. For example, the mean µ and variance σ² specify a normal distribution. From Lecture Note 2, we know the sample mean X̄ and the sample variance are, in general, good estimators of µ and σ², respectively. We now develop pertinent information about the sampling distribution of this useful statistic.

Properties of the Sampling Distribution of X̄
1. The mean of the sampling distribution equals the mean of the sampled population. That is, $\mu_{\bar X} = E(\bar X) = \mu$. When this property holds, we say X̄ is an unbiased estimator of µ.
2. The standard deviation of the sampling distribution equals the standard deviation of the sampled population divided by the square root of the sample size. That is, $\sigma_{\bar X} = \sigma/\sqrt{n}$. The standard deviation $\sigma_{\bar X}$ is often referred to as the standard error of the mean.

Theorem 4.1 If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of X̄ will be a normal distribution.

Theorem 4.2 (Central Limit Theorem) Consider a random sample of n observations selected from a population (any population) with mean µ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of X̄ will be approximately a normal distribution with mean $\mu_{\bar X} = \mu$ and standard deviation $\sigma_{\bar X} = \sigma/\sqrt{n}$. The larger the sample size, the better the normal approximation to the sampling distribution of X̄.
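Theorem 4.2 can be seen empirically with a short simulation. The sketch below is our own illustration, not part of the notes: it draws repeated samples from a skewed Exponential(1) population (which has µ = σ = 1) and checks that the sample means center on µ with spread σ/√n.

```python
# Simulating the sampling distribution of the sample mean (CLT).
import random
from statistics import mean, stdev

random.seed(13)
n, reps = 50, 2000    # sample size, number of repeated samples
mu, sigma = 1.0, 1.0  # Exponential(1) has mean 1 and standard deviation 1

xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(mean(xbars), "vs mu =", mu)                          # ~ 1.0
print(stdev(xbars), "vs sigma/sqrt(n) =", sigma / n**0.5)  # ~ 0.1414
# A histogram of xbars would be mound shaped and symmetric,
# even though the population itself is strongly skewed.
```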