Psych 524, 10/10/05 p. 1/8

Random Variables and Probability Distributions (based on Kirk, Ch. 7)

Random Sampling
  random sampling: method of drawing samples from a population so that every possible sample of a particular size has the same probability of being selected
    with replacement: sampled element is returned to the population and can therefore be drawn again
    without replacement: sampled element is not returned to the population (most common sampling method)
  examples of random sampling: flip a coin, draw out of a hat/jar, use a spinner, use a table of random numbers
  using a table of random numbers
    refer to Table D.1 in Kirk
    basic idea is to determine a random starting place (Kirk suggests tossing a pencil to see where it lands, but I would suggest labeling 50 objects and drawing out of a hat)
      1st draw: 1 through 25 land you in the corresponding column on the first page; 26 through 50 land you in the corresponding column (+25) on the second page
      2nd draw: what you draw here corresponds to the row
    if you have up to 99 cases in the population, proceed across (and down) the table, with each two-digit number in Table D.1 corresponding to a case number; if you have fewer than 99 cases, skip all case numbers that do not exist in your population (e.g., if you only have 57 cases, but the table gives you an 89, go on to the next number in the table)
    if you have up to 999 cases, proceed across (and down) the table so that you use a 3-digit number to select your case, then skip to the next 3-digit number, etc.; note that the spaces between numbers in the table are irrelevant, so if the table reads 09 76 63, you would read this as “097” (or 97) and “663”

Random Variables
  random variable: numerical quantity whose value is determined by the outcome of a random experiment; a symbol for a set of numerical events, each having a probability
    e.g., we could say that the random variable, X, was the symbol for the set of values that would result from the roll of a die or the measurement of
    randomly drawn women’s heights
  random variables are quantitative
  values of random variables are determined by chance
  random variables can be discrete or continuous (but remember, we can only measure continuous variables in discrete units)
  probability distribution: a distribution that associates a probability with each value of a random variable (like a frequency distribution but with probabilities instead of frequencies)
    e.g., regular 6-sided die
    e.g., 6-sided die with sides labeled 1, 2, 2, 3, 3, 4
    e.g., roll 6-sided die twice in a row and calculate mean

Expected value of discrete random variable
With frequency distributions, we can calculate several measures of central tendency. We can calculate a similar index for probability distributions. We refer to this as expected value, or E(X), which can be computed as:

  $E(X) = \sum_i p(X_i)\,X_i$

This may look more daunting than it is. Basically, you take each possible value of the random variable, multiply it by its associated probability, and sum the results. This is very similar to what you do to get the mean from an ungrouped frequency distribution. In that case, you multiplied the number of cases observed for each score by that score, summed up, and divided by the total number of cases. Essentially, all you were doing was weighting each score by its relative frequency or proportion. That is all we are doing now.

Examples: take a look at each of the example distributions above (the dice examples); first, make a guess as to what the expected value should be; then, calculate E(X) based on the above formula.

Expected value of continuous random variable
Calculating the expected value for a continuous random variable (e.g., height) is not as straightforward (see the hint box below). Because of this, we must determine the area under portions of the curve defined by certain intervals (we can say what portion of the population is between 5.9 and 6 feet tall).
Calculus is necessary to determine these areas, but most statistics textbooks have tables that summarize this information (at least to a reasonable approximation); we will cover this in chapter 9.

Standard deviation of a discrete random variable
As with frequency distributions, we can also find the standard deviation of a probability distribution. Recall that for a frequency distribution:

  $S = \sqrt{\sum_i f_i\,(X_i - \bar{X})^2 / n}$

We can think of the formula for a probability distribution very similarly. If we move n inside the sum so that f becomes f/n, it becomes easier to see the association between the above formula and the formula for the standard deviation of a probability distribution:

  $\sigma = \sqrt{\sum_i p(X_i)\,(X_i - E(X))^2}$

Hint (for the expected value of a continuous random variable): What is the probability of sampling someone who is exactly 5.9173567 feet tall?

Examples: Compute the standard deviations for the example distributions given earlier (the dice examples).

Binomial Distribution
Now we turn to the case of a variable that is not quantitative.
  Bernoulli trial
    based on an experiment with only two outcomes (e.g., coin flip; success vs. failure)
    three properties:
      trial can result in one of two outcomes
      probability of “success” remains constant across trials
      outcomes of trials are independent
    note that the last two criteria are rarely met in real-life situations because sampling is often done without replacement from a finite (not infinite) population; but as long as the population is large relative to the sample size, the basic properties described below are applicable

Distributions of Simple and Complex Events
  columns: Simple Events; Complex Events (Sampling Distributions)
  rows (single trial is…):
    Bernoulli trial (e.g., coin flip): “Bernoulli” (simple events); “binomial” (complex events)
    Discrete random variable (e.g., die toss)
    Continuous random variable (e.g., height measurement)

Binomial Distribution
  binomial distribution: probability distribution resulting from conducting two or more Bernoulli trials
  Example: construct a probability distribution for two tosses of a fair coin (consider heads as successes)

Although the probability of success (p) may equal the probability of a failure (q), this is not necessary. Consider, for example, the case where success is defined as correctly answering a 5-choice multiple-choice question. In this case, p is 1/5 = .2, and q is 4/5 = .8. Constructing the probability distribution by hand when p and q are unequal (or when the number of trials is large) becomes a bit more challenging. Instead, we can use the following formula, which generates the probability of obtaining a given number of successes:

  $p(X = r) = {}_{n}C_{r}\,p^{r}q^{n-r}$

In this formula, r is the number of successes, so when you compute p(X = r), you are computing the probability that you will observe r successes. Again, p on the right-hand side of the equation refers to the probability of a success, and q refers to the probability of a failure. As long as you keep track of which is which, you will obtain the same result (e.g., p can refer to the probability of heads or of tails; it can refer to the probability of a correct response or of an incorrect response). Finally, n refers to the number of trials. Note that the exponents (r and n - r) will always sum to n. Also, p and q will always sum to 1.

Example: What’s the probability that a student who takes a multiple-choice test and simply guesses will correctly answer 3 or more questions? Assume that each question has 5 choices.
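As a rough illustration of the binomial formula above, the following Python sketch computes p(X = r) directly and sums it to get the probability of "r or more" successes. The function names and the test length `N_QUESTIONS = 10` are assumptions made for this sketch only; the example does not state how many questions are on the test, so substitute the actual number.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of a failure
    return comb(n, r) * p**r * q**(n - r)


def prob_at_least(r_min, n, p):
    """Probability of r_min or more successes: sum p(X = r) for r = r_min..n."""
    return sum(binomial_pmf(r, n, p) for r in range(r_min, n + 1))


# Two tosses of a fair coin, counting heads as successes.
coin_distribution = {r: binomial_pmf(r, 2, 0.5) for r in range(3)}

# Guessing on 5-choice questions gives p = .2.
# N_QUESTIONS is an ASSUMED test length, not a value from the notes.
N_QUESTIONS = 10
p_three_or_more = prob_at_least(3, N_QUESTIONS, 0.2)

print(coin_distribution)
print(round(p_three_or_more, 4))
```

Under the assumed 10-question test, the probability of guessing 3 or more correctly comes out to about .322; the coin example gives .25, .50, and .25 for 0, 1, and 2 heads.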
Another way to obtain these probabilities is to use a binomial table (see separate handout, also linked on the website). Note that you obtain the same answer with the table.

Expected Value and Standard Deviation of the Binomial Distribution
Formulas for the expected value (~mean) and standard deviation of the binomial distribution can be derived from the more general formulas given above for random variables. However, this derivation is somewhat complex, so it is not shown here. The resulting formulas are very straightforward:

  $E(X) = np$, where p is the probability of success
  $\sigma = \sqrt{npq}$, where p and q are the probabilities of success and failure, respectively

Example: compute the expected value and standard deviation for the multiple-choice problem above; do your answers make sense?

Multinomial Distribution
This distribution is similar to the binomial distribution; however, instead of there being two possible outcomes, the multinomial distribution results when there are 3 or more possible outcomes (e.g., red, white, blue marbles). Now, we are not interested in the number of successes or failures but, instead, in the probability that some distinct outcome will occur (replacement is assumed):

  $p(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \ldots \text{ and } X_k = n_k) = \frac{n!}{n_1!\,n_2!\cdots n_k!}\,(p_1)^{n_1}(p_2)^{n_2}\cdots(p_k)^{n_k}$

  $n_k$ represents the number of each type that are drawn
  $p_k$ represents the probability associated with each type

Example: What’s the probability of drawing one white, two blue, and one red pair of socks from a drawer containing 5 white, 4 blue, and 1 red pair of socks, if the socks are put back into the drawer after each draw?

The chi-square distribution is often used to approximate the multinomial distribution. We will return to this topic later.

Hypergeometric Distribution
A similar problem can be imagined where replacement is not assumed.

  $p(X_1 = n_1 \text{ and } X_2 = n_2 \text{ and } \ldots \text{ and } X_k = n_k) = \frac{({}_{t_1}C_{n_1})({}_{t_2}C_{n_2})\cdots({}_{t_k}C_{n_k})}{{}_{t}C_{n}}$

  $n_k$ represents the number of each type that are drawn
  $t_k$ represents the number of each type that are in the population
  (n and t are the corresponding totals: the number drawn and the number in the population)

Example: What’s the probability of drawing one white, two blue, and one red pair of socks from a drawer containing 5 white, 4 blue, and 1 red pair of socks, if the socks are not put back into the drawer after each draw?

Application of the Binomial Distribution: The Sign Test
Assume that you are interested in determining whether brothers and sisters instigate the same proportion of physical aggression when they are interacting with each other. We will discuss null-hypothesis testing in much more detail later, but, for now, assume that we are drawing random samples of brother-sister pairs from the population of interest. The observer decides whether the brother was more aggressive (let’s arbitrarily call this a “success”) or the sister was more aggressive (let’s call this a “failure”). We will now add up all of our successes and failures (these will sum to the total number of brother-sister pairs). Assume that we sample 20 brother-sister pairs and find that, in 13 cases, brothers instigate physical aggression more often than sisters. Can we conclude that brothers are more physically aggressive? Well, if brothers and sisters were equally aggressive in the population (p = .5), We will establish a