PROBABILITY AND BINOMIAL DISTRIBUTION
LECTURE #5
PSYC218 ANALYSIS OF BEHAV. DATA
DR. OLLIE HULME, 2011, UBC

Where we are
Past exams on Vista today
Formula sheet attached

Roadmap
Correlation & regression slope
Multiple coefficient of determination
Probability
Introduction to binomial distribution

Correlation and regression slope
Last week we defined the least-squares equation for the regression line:
$Y' = b_Y X + a_Y$
Y' = predicted value of Y
bY = slope of the line that minimizes errors in predicting Y
aY = Y intercept that minimizes errors in predicting Y
The question arose: how does correlation relate to this?

Relationship Between bY and r
bY is related to r by a constant that depends on the scaling of the scores
r ≠ bY if the scores are still raw, since scaling will change the slope but not the correlation
r = bY (the slope of the least-squares regression line) only when scores are plotted as z-scores
When r = 1, the slope of the z-score plot is 1
When r = 0, the slope of the z-score plot is 0
When r = -1, the slope of the z-score plot is -1

Formula for bY (when r is known)
$b_Y = r \frac{s_Y}{s_X}$
sY = standard deviation of Y (raw scores)
sX = standard deviation of X (raw scores)
For the data on the relation between years of education and income:
$b_Y = 0.7469 \times \frac{27814.6462}{3.9335} \approx 5281.5$
This is what we found by the other method

Formula for bY (when r is known), z-score data
If this is applied to z-score data rather than the raw scores, both standard deviations equal 1, so
$b_Y = r$ (only for z-score data)

Multiple Regression
$Y' = b_1 X_1 + b_2 X_2 + a$
Y' = predicted GPA
X1 = 1st predictor variable (IQ)
X2 = 2nd predictor variable (study time)
The same kind of constants (b1, b2 and a) are calculated to minimise prediction errors, but the math is a lot more complex
We generally do better predicting GPA if we base our predictions on more than one variable
Adding more predictor variables generally (not always) decreases our prediction error and increases our prediction accuracy

Multiple regression
Due to the complexity we won't derive the equations, but we will assume that SPSS has performed the least-squares regression
SPSS calculates the values of b1, b2 and a that minimise $\sum (Y - Y')^2$
$Y' = 0.049 X_1 + 0.118 X_2 - 5.249$
Now let's see how accurate the prediction is with two predictor variables compared to one
This reduced $\sum (Y - Y')^2$ from 1.88 to 0.63, a 66% improvement in prediction accuracy for GPA
The variability in Y accounted for by the X's has increased

Variability accounted for
Remember, $r^2$ is the proportion of variability accounted for by X in single-variable regression
It might seem to make sense to work out the $r^2$ values from the correlations between each predictor variable and the Y data, and add them up
But this results in $(0.856)^2 + (0.829)^2 = 1.42$
This is impossible: you cannot account for more than 100% of the variability
(No David, impossible means impossible, deal with it.)
This is because there is overlap in the variability accounted for by IQ and by study time

R² Multiple Coefficient of Determination
If we are to calculate the real proportion of variability accounted for by the predictor variables, we must take this overlap into account
$R^2 = \frac{r_{YX_1}^2 + r_{YX_2}^2 - 2 r_{YX_1} r_{YX_2} r_{X_1X_2}}{1 - r_{X_1X_2}^2}$
$r_{YX_1}$ = correlation between Y and X1
$r_{YX_2}$ = correlation between Y and X2
$r_{X_1X_2}$ = correlation between X1 and X2
This subtracts out the overlap in variability accounted for by both predictor variables
$R^2 = \frac{(0.856)^2 + (0.829)^2 - 2(0.856)(0.829)(0.560)}{1 - (0.560)^2}$
$R^2 \approx 0.91$
91% of the variability of GPA is accounted for by IQ and study hours

Midterm cut-off
Everything up to here (Ch. 1-7) is relevant to MT1

Descriptive vs. Inferential Stats
Descriptive Statistics: concerns techniques that are used to describe or characterize the data (Ch. 1-7)
Inferential Statistics: involves techniques that allow us to use data from a sample to make inferences about a population (Ch. 8 onwards)
Parameter Estimation: the experimenter is interested in determining the magnitude of a population characteristic (e.g., how much marijuana does the average UBC student smoke?)
Hypothesis Testing: the experimenter collects data on a sample to test a hypothesis concerning the population (e.g., does being high on marijuana affect memory?)

Random Sample
Defined as a sample selected from the population by a process that ensures that:
All the members of the population have an equal chance of being selected into the sample
Each possible sample of a given size has an equal chance of being selected
(Of all possible combinations of elements sampled, each combination is equally likely)

Why random sample?
In order to generalize from the sample to the population, the sample needs to be representative of the population
It allows us to apply the laws of probability to the sample

Question
If I used a process to select 1 of my 84 students at random, what is the probability that you would be selected?
a) 1/84
b) 10/84
c) 1/1.4
d) B or C
e) cannot be determined
Answer: a) 1/84

Sampling & Replacement
Sampling without replacement: a method of sampling in which the members of the sample are NOT returned to the population before subsequent members are selected
Sampling with replacement: a method of sampling in which members of the sample are returned to the population before subsequent members are selected
Method used most in psychological research
The element sampled is placed back into the population before taking the next

Probability Basics
Typically expressed in values ranging from 0.00 to 1.00
0.0000 means an event is certain NOT to occur
1.0000 means an event is certain to occur
0.0500 means an event will happen 5 times in 100
Can also be expressed as a fraction
Probabilities are rounded to 4 decimal places
This is because when they are converted to percentages we still have 2 decimal places

A priori probability
Problems solved using only reason. No data collection
What is the probability of rolling a six-sided die and having it land on 3?
p(3) = 1/6 = 0.1667
But this assumes each event is equally likely, e.g. a fair die

A posteriori probability
Problems solved after some data have been collected
What is the probability of rolling a six-sided die and having it land on 3? (estimated from the proportion of 3s observed in actual rolls)

Rules of probability

Addition Rule
Deals with the probability of occurrence of any one of several possible events
p(A or B) = p(A) + p(B) - p(A and B)
The probability of occurrence of A or B equals the probability of occurrence of A plus the probability of occurrence of B minus the probability of occurrence of both A and B

Addition Rule Example
You are asked to draw one card from a normal deck of 52 playing cards. What is the probability you pick a diamond or a 10?
p(A or B) = p(A) + p(B) - p(A and B)
p(♦ or 10) = p(♦) + p(10) - p(♦ and 10)
p(♦) = 13/52
p(10) = 4/52
p(♦ and 10) = 1/52
p(♦ or 10) = 13/52 + 4/52 - 1/52 = 16/52 = 0.3077

Mutually Exclusive Events
...are events that cannot occur together
Picking a spade or a diamond in one draw from a deck
Rolling a 3 or a 4 on one roll of a die
Giving birth to a baby boy or girl
If A and B are mutually exclusive then p(A and B) = 0
p(A or B) = p(A) + p(B) - p(A and B)
Therefore for mutually exclusive events the equation simplifies to
p(A or B) = p(A) + p(B)
What is the probability you pick a queen or a jack?
p(Q or J) = p(Q) + p(J)
p(Q) = 4/52
p(J) = 4/52
p(Q or J) = 4/52 + 4/52 = 8/52 = 0.1538

Question
Are the events picking a diamond or picking a 10 in a single draw from a deck of cards mutually exclusive?
a) Yes
b) No
Answer: b) No, since you could pick a card which is both (the 10 of diamonds)

Question
You have 12 cans of pop in the fridge: 3 cans of Coke, 3 cans of Sprite, 2 cans of Dr. Pepper, 2 cans of Orange Crush, 1 can of Ginger Ale and 1 can of Cream Soda. You close your eyes and pick one can out of the fridge at random. What is the probability you pick a Coke or a Cream Soda?
a) 0.0833
b) 0.1667
c) 0.2500
d) 0.3333
e) I don't have a calculator
p(Coke or Cream Soda) = p(Coke) + p(Cream Soda)
p(Coke) = 3/12
p(Cream Soda) = 1/12
p(Coke or Cream Soda) = 3/12 + 1/12 = 4/12 = 0.3333

2+ Mutually Exclusive Events
p(A or B or C or ... or Z) = p(A) + p(B) + p(C) + ... + p(Z)
where A, B, C, ..., Z are mutually exclusive events
You have 12 cans in the fridge: 3 Cokes, 3 Sprites, 2 Dr. Peppers, 2 Orange Crush, 1 Ginger Ale and 1 Cream Soda. You close your eyes and pick one can out of the fridge at random. What is the probability you pick a Dr. Pepper, an Orange Crush or a Cream Soda?
p(Dr.P or OC or CS) = p(Dr.P) + p(OC) + p(CS)
p(Dr.P) = 2/12
p(OC) = 2/12
p(CS) = 1/12
p(Dr.P or OC or CS) = 2/12 + 2/12 + 1/12 = 5/12 = 0.4167

Exhaustive
A set of events is exhaustive if it includes all possible events
If a set of events is exhaustive and mutually exclusive then the probabilities of the events sum to 1
What is the probability of flipping a coin and having it turn up heads or tails?
p(heads or tails) = p(heads) + p(tails)
p(heads) = 1/2
p(tails) = 1/2
p(heads or tails) = 1/2 + 1/2 = 2/2 = 1
Heads I win, tails you lose!

Exhaustive Notation
Usually when there are only two mutually exclusive events, we denote the probability of occurrence of one as P and the other as Q
Flipping a coin:
P = probability of getting a head = 1/2
Q = probability of getting a tail = 1/2
Gender of a baby:
P = probability of having a boy = 5/12
Q = probability of having a girl = 7/12
P + Q = 1.00 when two events are exhaustive and mutually exclusive

Multiplication Rule
Deals with the joint or successive occurrence of several events, e.g.
the probability of heads then tails, or of being female and brunette
p(A and B) = p(A) p(B|A)
The probability of A multiplied by the probability of B given that A has occurred
This depends on whether A and B are independent or dependent in some way

Mult. Rule & Independent events
Events are independent if the occurrence of one has no effect on the probability of occurrence of the other
e.g. coin flips: flipping tails on coin 1 has no effect on the probability of flipping tails on coin 2
If events are independent then p(B|A) = p(B)
p(A and B) = p(A) p(B|A)
p(A and B) = p(A) p(B)
Special case of the multiplication rule when events are independent
If you flip two coins, what is the probability they will both turn up heads?
p(A and B) = p(A) p(B) = (1/2)(1/2) = 0.2500

Example
Draw two cards randomly from a regular deck. After drawing the first card you return it to the deck before drawing the second card. What is the probability that both cards will be diamonds?
Sampling with replacement, so events are independent
If the card is replaced then the second draw is independent of the first, since the deck consists of the same cards
Multiplication rule for independent events:
p(♦ 1st and ♦ 2nd) = p(♦ 1st) p(♦ 2nd)
p(♦ 1st) = 13/52
p(♦ 2nd) = 13/52
p(♦ 1st and ♦ 2nd) = (13/52)(13/52) = 169/2704 = 0.0625

Question
You just got a new iPod shuffle and put 250 songs onto it, 10 of which are from a Radiohead album. What is the probability that the first two songs you play are from the Radiohead (RH) album? Assume the shuffle samples with replacement.
a) 0.0800
b) 0.0400
c) 0.0016
d) None of the above
e) I don't have a calculator
Apple had to change the randomisation function because people complained it wasn't random, even though it was exactly random
p(RH 1st and RH 2nd) = p(RH 1st) p(RH 2nd)
p(RH 1st) = 10/250
p(RH 2nd) = 10/250
p(RH 1st and RH 2nd) = (10/250)(10/250) = 100/62500 = 0.0016

Mult. rule for multiple events
p(A and B and C and ... and Z) = p(A) p(B) p(C) ... p(Z)
Same, but just multiply by the extra elements
What is the probability that the first three songs you play are from the same Radiohead (RH) album?
p(RH 1st and RH 2nd and RH 3rd) = p(RH 1st) p(RH 2nd) p(RH 3rd)
p(RH 1st) = 10/250
p(RH 2nd) = 10/250
p(RH 3rd) = 10/250
p(RH 1st and RH 2nd and RH 3rd) = (10/250)(10/250)(10/250) = 1000/15625000 = 0.000064

Mult. Rule for dependent events
Events are dependent if the occurrence of one event affects the probability of occurrence of the other
Example: rain (A) and getting wet hair (B). What are the chances of both?
p(A and B) = p(A) p(B|A)
p(A) = probability of rain: for both to happen it must rain, so what are the chances of rain?
p(B|A) = probability of getting wet hair: given that it has rained, what are the chances of wet hair?
[Note that this was the equation we saw before, before we simplified it for independent events]

Example
You are asked to draw two cards randomly from a regular deck. You do not return the first card to the deck before drawing the second card. What is the probability that both cards will be diamonds?
Sampling without replacement, so events are dependent
Since you don't replace the card, the deck is different for the second draw, changing the odds
p(♦ 1st and ♦ 2nd) = p(♦ 1st) p(♦ 2nd given ♦ 1st)
p(♦ 1st) = 13/52
p(♦ 2nd given ♦ 1st) = 12/51
p(♦ 1st and ♦ 2nd) = (13/52)(12/51) = 156/2652 = 0.0588
Look! The odds change because 1 card has been removed

Question
You have 12 cans of pop in the fridge: 3 cans of Coke, 3 cans of Sprite, 2 cans of Dr. Pepper, 2 cans of Orange Crush, 1 can of Ginger Ale and 1 can of Cream Soda. You close your eyes and pick two cans out of the fridge at random. What is the probability that the 1st can you pick is a Coke and the 2nd can you pick is a Dr. Pepper?
a) 0.0455
b) 0.0625
c) 0.4157
d) 0.4318
Sampling without replacement, therefore dependent events
p(Coke 1st and Dr. P 2nd) = p(Coke 1st) p(Dr. P 2nd given Coke 1st)
p(Coke 1st) = 3/12
p(Dr. P 2nd given Coke 1st) = 2/11
p(Coke 1st and Dr. P 2nd) = (3/12)(2/11) = 6/132 = 0.0455

Mult. Rule 2+ Dependent Events
p(A and B and C) = p(A) p(B|A) p(C|AB)
For A, B and C to happen, A has to happen
Then B has to happen given that A has happened
Then C has to happen given that A and B have happened
where
p(A) = probability of A
p(B|A) = probability of B, given A has occurred
p(C|AB) = probability of C, given A and B have occurred
For 4 events, and so on and so forth:
p(A and B and C and D) = p(A) p(B|A) p(C|AB) p(D|ABC)

Example
You have 12 cans of pop in the fridge: 3 cans of Coke, 3 cans of Sprite, 2 cans of Dr. Pepper, 2 cans of Orange Crush, 1 can of Ginger Ale and 1 can of Cream Soda. You close your eyes and pick three cans out of the fridge at random. What is the probability that the 1st can you pick is a Coke, the 2nd can you pick is a Dr. Pepper and the 3rd can is a Sprite?
p(Coke 1st and Dr. P 2nd and Sprite 3rd) = p(Coke 1st) p(Dr. P 2nd given Coke 1st) p(Sprite 3rd given Coke 1st and Dr. P 2nd)
p(Coke 1st) = 3/12
p(Dr. P 2nd given Coke 1st) = 2/11
p(Sprite 3rd given Coke 1st and Dr. P 2nd) = 3/10
p(Coke 1st and Dr. P 2nd and Sprite 3rd) = (3/12)(2/11)(3/10) = 18/1320 = 0.0136
Note how the chances change as further cans are removed

Example
There are 61 students in a classroom. 12 are Biology majors, 20 are English majors and 29 are Psych (Ψ) majors. If you sample 3 without replacement, what is the probability of obtaining 3 Psych majors?
p(Ψ 1st and Ψ 2nd and Ψ 3rd) = p(Ψ 1st) p(Ψ 2nd, given Ψ 1st) p(Ψ 3rd, given Ψ 1st and Ψ 2nd)
p(Ψ 1st) = 29/61
p(Ψ 2nd, given Ψ 1st) = 28/60
p(Ψ 3rd, given Ψ 1st and Ψ 2nd) = 27/59
p(Ψ 1st and Ψ 2nd and Ψ 3rd) = (29/61)(28/60)(27/59) = 21924/215940 = 0.1015
Chances change as you sample without replacement

Multiplication and Addition Rules
There are 61 students in a classroom. 12 are Biology majors, 20 are English majors and 29 are Psych majors. If you sample 2 without replacement, what is the probability of obtaining 1 Psych major and 1 English major?
2 possible outcomes meet this requirement:
Outcome A: Psych 1st, English 2nd
Outcome B: English 1st, Psych 2nd
More complex problems will require both rules
Use the multiplication rule to calculate the probability of each outcome
Then use the addition rule to account for either outcome

Multiplication and Addition Rules
1. Determine the probability of each outcome using the multiplication rule:
Outcome A
p(Ψ 1st) = 29/61
p(Eng 2nd, given Ψ 1st) = 20/60
(29/61)(20/60) = 580/3660 = 0.1585
Outcome B
p(Eng 1st) = 20/61
p(Ψ 2nd, given Eng 1st) = 29/60
(20/61)(29/60) = 580/3660 = 0.1585
2. Use the addition rule to add the probabilities together:
580/3660 + 580/3660 = 1160/3660 = 0.3169

Normal Continuous Variables
Up to this point we have only been considering discrete variables, but most variables in research are continuous
How do we determine the probability that a score will be equal to or greater than a specific score?
Transform the score to a z-score and use Column C of Table A

Example
You pick an individual out of a crowd at random. What is the probability that they have an IQ equal to or greater than 120?
Remember: IQ is normally distributed, mean = 100, standard deviation = 16
Step 1: Calculate the z-score
$z = \frac{120 - 100}{16} = 1.25$
Step 2: Draw a normal curve and place the z-score on the curve
[Figure: normal curve with z-axis from -3 to 3 and z = 1.25 marked]
Step 3: Find the corresponding area under the curve (Table A, Column C) = 0.1056

Question
Which of the following is a dichotomous variable?
a. Age
b. Number of Friends
c. Gender
d. Result of a coin toss
e. C and D
A variable is dichotomous when there are only 2 possible states for the variable

Binomial Distribution
Binomial data: the data that result from measuring subjects on a dichotomous variable
Binomial: Latin for 'having two names'
Binomial Distribution: cousin to the normal distribution. Allows us to determine the probability of certain outcomes for binomial data

Question
What is the probability of guessing correctly on 2 true/false questions?
a. 1.0000
b. 0.7500
c. 0.5000
d. 0.2500
e. 0.1250
We can use the multiplication rule
We can assume events are independent, so we can use the simplified equation p(A and B) = p(A) p(B)
p(correct 1 and correct 2) = p(correct) p(correct) = (1/2)(1/2) = 0.2500

The Binomial Distribution
A probability distribution that results when:
1. There is a series of N trials
2. On each trial, there are only 2 possible outcomes (P and Q)
3. On each trial, the two outcomes are mutually exclusive
4. The trials are independent
5. The probability of each outcome stays the same from trial to trial

The Binomial Distribution
When these requirements are met, the binomial distribution tells us:
1. Each possible combination of outcomes from N trials, e.g. every possible combination of heads and tails for N tosses of a coin
2. The probability of getting each of these outcomes, e.g. probability of HH = 0.25, probability of TT = 0.25

True or false
The average person eats 8 spiders / year
a) True
b) False
Answer: False

Generating Binomial Distribution
This is if you were completely guessing (and had no capacity for reasoning)
For guessing on 1 true/false question
What are the possible outcomes?
Outcome 1: Q1: √
Outcome 2: Q1: X
What is the probability of each outcome?
p(√) = 1/2 = 0.5000
p(X) = 1/2 = 0.5000

Generating Binomial Distribution
Example questions: "The average person eats 8 spiders / year" and "The average person only uses 10% of their brain"
For guessing on 2 true/false questions
What are the possible outcomes?
Outcome 1: Q1: √ Q2: √
Outcome 2: Q1: √ Q2: X
Outcome 3: Q1: X Q2: √
Outcome 4: Q1: X Q2: X
What is the probability of each type of outcome?
p(2 √) = p(Q1 √ Q2 √) = 1/4 = 0.2500
p(1 √) = p(Q1 √ Q2 X or Q1 X Q2 √) = 2/4 = 0.5000
p(0 √) = p(Q1 X Q2 X) = 1/4 = 0.2500
There are 2 different ways of getting 1 right

Generating Binomial Distribution
For guessing on 3 true/false questions
What are the possible outcomes?
Outcome 1: Q1: √ Q2: √ Q3: √
Outcome 2: Q1: √ Q2: √ Q3: X
Outcome 3: Q1: √ Q2: X Q3: √
Outcome 4: Q1: X Q2: √ Q3: √
Outcome 5: Q1: √ Q2: X Q3: X
Outcome 6: Q1: X Q2: √ Q3: X
Outcome 7: Q1: X Q2: X Q3: √
Outcome 8: Q1: X Q2: X Q3: X
What is the probability of each type of outcome?
p(3 √) = 1/8 = 0.1250
p(2 √) = 3/8 = 0.3750
p(1 √) = 3/8 = 0.3750
p(0 √) = 1/8 = 0.1250

Binomial Distribution When P = 0.50
N    Possible Outcomes (# of Events of Interest)    Probability
1    1                                              .5000
1    0                                              .5000
2    2                                              .2500
2    1                                              .5000
2    0                                              .2500
3    3                                              .1250
3    2                                              .3750
3    1                                              .3750
3    0                                              .1250
The number of events of interest is, e.g., the number of correct guesses
This is in Appendix D, Table B
The real table includes columns for probabilities other than 0.5
And so on and so forth for increasing values of N

Question
Using the binomial distribution we just generated, determine the probability that a woman will give birth to 2 boys and 1 girl (over the course of 3 pregnancies, assuming boys and girls are equally probable).
The probability is...
a) 0.5000
b) 0.3750
c) 0.2500
d) 0.1250
Answer: b) 0.3750. With N = 3, read off either 2 events of interest (boys) or 1 event of interest (girls); both rows give .3750

Mind your Ps and Qs
When there are only 2 mutually exclusive events, we denote the probability of occurrence of one as P and the other as Q
Guessing on a True/False Question:
P = guessing correctly = 1/2
Q = guessing incorrectly = 1/2
Giving Birth:
P = having a boy = 1/2
Q = having a girl = 1/2
Flipping a Coin:
P = getting a head = 1/2
Q = getting a tail = 1/2
P + Q = 1.00 only when the two events are exhaustive
Which is assigned P and which Q is often arbitrary

Binomial Expansion
The binomial distribution can be generated from $(P + Q)^N$
To generate the possible outcomes and the probabilities of each outcome, simply expand the expression for the number of trials (N) and evaluate each term in the expression
Expanding the expression gives you the particular equation for any number of trials

Using the Binomial Expansion
Generate a binomial distribution for 2 True/False Questions (N = 2)
Process of expansion for N = 2:
$(P+Q)^N = (P+Q)^2 = (P+Q)(P+Q) = P^2 + 2PQ + Q^2$

Interpreting the Binomial Expansion
$(P+Q)^2 = P^2 + 2PQ + Q^2$ represents all possible outcomes
1. The letters (P, Q) tell us the kinds of events that comprise the outcome
2. The exponents tell us how many of that kind of event there are in the outcome
3. The coefficients tell us how many ways there are of obtaining the outcome (if there is no coefficient this means just 1)
Let P = Correct Guess and Q = Incorrect Guess
So $P^2$ represents 1 possible outcome with 2 P events (2 Correct Guesses)
$2PQ$ represents 2 possible outcomes with 1 P and 1 Q event (1 Correct Guess)
$Q^2$ represents 1 possible outcome with 2 Q events (0 Correct Guesses)

Using the Binomial Expansion
$P^2 + 2PQ + Q^2$
We can use the binomial expansion to determine the probability of getting each of these possible outcomes by substituting the probabilities of P and Q in for P and Q
The probability of P = Q = 0.50, so...
Prob. of 2 Correct Guesses = p(2 √) = $P^2 = (0.50)^2$ = 0.2500
Prob. of 1 Correct Guess = p(1 √) = $2PQ = 2(0.50)(0.50)$ = 0.5000
Prob. of 0 Correct Guesses = p(0 √) = $Q^2 = (0.50)^2$ = 0.2500

Next Lecture
More Binomial Distribution
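
Sketch: the binomial expansion in code
The expansion of $(P + Q)^N$ described above can be checked numerically. The following is a minimal sketch, not part of the lecture: the function name binomial_distribution is hypothetical, and it assumes independent trials with the same P on every trial. Each term of the expansion is (number of ways) × P^k × Q^(N-k), where the number of ways is the binomial coefficient.

```python
from math import comb


def binomial_distribution(n, p=0.5):
    """Map each count of P events (n down to 0) to its probability.

    Hypothetical helper for illustration; assumes n independent trials
    with the same probability p of a P event on every trial.
    """
    q = 1.0 - p
    # Term k of (P + Q)^n: the coefficient comb(n, k) counts the ways of
    # getting k P events; the exponents k and n - k count P and Q events.
    return {k: comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)}


# Print the distribution for N = 1, 2, 3 with P = 0.50.
for n in (1, 2, 3):
    for k, prob in binomial_distribution(n).items():
        print(f"N={n}  P events={k}  probability={prob:.4f}")
```

For N = 2 this gives the same three probabilities as $P^2$, $2PQ$ and $Q^2$ with P = Q = 0.50, and changing p gives the distribution for values of P other than 0.50.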