PROBABILITY AND BINOMIAL DISTRIBUTION!
LECTURE#5 !
PSYC218 ANALYSIS OF BEHAV. DATA !
DR. OLLIE HULME, 2011, UBC!
Where we are!
Past exams on vista today
Formula sheet attached
Roadmap!
Correlation & regression slope
Multiple coefficient of determination
Probability
Introduction to binomial distribution
Correlation and regression slope!
Last week we defined the least-squares equation for
regression slope
Y’ = predicted
value of Y
Y' = bYX + aY
bY = slope of line for minimizing
errors in predicting Y
aY = Y intercept for
minimizing errors in
predicting Y
The question arose how does correlation relate to this?
Relationship Between bY and r!
bY is related to r, but the two are generally not equal
r ≠ bY if scores are still raw, since scaling will change the slope but not the correlation
r = bY (the slope of the least-squares regression line) only when scores are plotted as z-scores
When r = 1 the slope of the z-score plot is 1
When r = 0 the slope of the z-score plot is 0
When r = -1 the slope of the z-score plot is -1
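The claim that the slope equals r only for z-scores can be checked numerically. The sketch below uses made-up education and income values (not the lecture data) and hypothetical helper names; it is an illustration, not the method used in the course.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw scores to z-scores (population SD used for simplicity)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def slope(xs, ys):
    """Least-squares slope for predicting Y from X."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def pearson_r(xs, ys):
    """Correlation as the mean product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Made-up data: years of education (X) and income (Y)
x = [8, 10, 12, 14, 16, 18]
y = [21000, 30000, 28000, 45000, 52000, 50000]

r = pearson_r(x, y)
raw_slope = slope(x, y)                      # depends on the units of X and Y
z_slope = slope(z_scores(x), z_scores(y))    # equals r

print(r, raw_slope, z_slope)
```

Rescaling Y (for example, income in thousands) changes raw_slope but leaves r and z_slope untouched.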
Formula for bY (when r is Known) !
bY = r (sY / sX)
sY = stand. dev. of Y (raw scores)
sX = stand. dev. of X (raw scores)
bY = 0.7469 × (27814.6462 / 3.9335)
This was the data for the
relation between years of
education and income
This is what we found by
the other method
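As a quick check of the arithmetic, the slope can be computed from r and the two standard deviations shown on the slide. The variable names are just labels chosen for this sketch.

```python
# Values from the education and income example on the slide
r = 0.7469          # correlation between X and Y
s_y = 27814.6462    # standard deviation of Y (raw scores)
s_x = 3.9335        # standard deviation of X (raw scores)

# Slope for predicting Y from X when r is known
b_y = r * (s_y / s_x)
print(round(b_y, 2))
```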
Formula for bY (when r is Known) !
If this is applied to z-score data rather than the raw scores, then sY = sX = 1, so bY = r
Only for z-score data
Multiple Regression!
GPA
1st predictor
variable (IQ)
2nd predictor
variable
(study time)
Same constants (b1 b2
& a) are calculated to
minimise prediction
errors, but math is a lot
more complex
We generally do better predicting GPA if we base our predictions on
more than one variable
Adding more predictor variables generally (not always) decreases our
prediction error and increases our prediction accuracy.
Multiple regression!
Due to complexity we won’t derive the equations but
we will assume that SPSS has performed the least-squares regression by minimising Σ(Y – Y')^2
SPSS calculates the values of b and a to minimise Σ(Y – Y')^2
Y’ = 0.049 X1 + 0.118 X2 - 5.249
Now let's see how accurate the prediction is with two predictor variables compared to one
This reduced Σ(Y – Y')^2 from 1.88 to 0.63, a 66% improvement in prediction accuracy for GPA
The variability accounted for
in Y by X’s has increased
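The 66% figure is the proportional drop in the sum of squared prediction errors. A minimal sketch of that arithmetic, using the two values quoted on the slide:

```python
sse_one_predictor = 1.88   # sum of squared errors with one predictor
sse_two_predictors = 0.63  # sum of squared errors with two predictors

# Proportional reduction in squared prediction error
reduction = (sse_one_predictor - sse_two_predictors) / sse_one_predictor
print(f"{reduction:.0%}")
```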
Variability accounted for!
Remember, r^2 is the proportion of variability accounted for by
X in single variable regression
It might make sense to try to figure out the r^2
values from the correlations between the predictor
variables and the Y data and add them up
But this results in (0.856)^2 + (0.829)^2 = 1.42
No david,
impossible means
impossible, deal
with it.
This is impossible, you
cannot account for
more than 100% of
variability
This is because there is overlap in
the variability accounted for by IQ
and study time
R^2: Multiple Coefficient of Determination !
If we are to calculate the real proportion of variability
accounted for by the predictor variables we must take this
overlap into account
R^2 = (r_YX1^2 + r_YX2^2 − 2 r_YX1 r_YX2 r_X1X2) / (1 − r_X1X2^2)
r_YX1 = corr between Y and X1
r_YX2 = corr between Y and X2
r_X1X2 = corr between X1 and X2
The term 2 r_YX1 r_YX2 r_X1X2 subtracts out the overlap in variability accounted for by both predictor variables
R^2 = ((0.856)^2 + (0.829)^2 − 2(0.856)(0.829)(0.560)) / (1 − (0.560)^2)
R^2 = 0.9100
91% of the variability of GPA is
accounted for by IQ and Study hours
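The two-predictor formula can be sketched as a small function. The function name is made up for this example; the correlations are the rounded values from the slide, so the result agrees with the slide only to about two decimal places.

```python
def multiple_r_squared(r_y1, r_y2, r_12):
    """R^2 for two predictors, from the three pairwise correlations.

    r_y1: correlation between Y and X1
    r_y2: correlation between Y and X2
    r_12: correlation between X1 and X2 (the overlap between predictors)
    """
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)

# Naive approach: just add the two r^2 values (exceeds 1, so impossible)
naive_sum = 0.856 ** 2 + 0.829 ** 2

# Corrected for the overlap between IQ and study time
r_squared = multiple_r_squared(0.856, 0.829, 0.560)

print(round(naive_sum, 2), round(r_squared, 2))
```

Note that when the predictors are uncorrelated (r_12 = 0) the formula reduces to the simple sum of the two r^2 values.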
Midterm cut-off!
Everything up to here, ch1-7 is relevant to MT1
Descriptive vs. Inferential Stats!
Descriptive Statistics: Concerns techniques that are
used to describe or characterize the data (Ch1-7)
Inferential Statistics: Involves techniques that allow
us to use data from a sample to make inferences about
a population (Ch8 onwards)
Parameter Estimation:
Experimenter is interested in
determining the magnitude of
a population characteristic
(e.g., how much marijuana
does the average UBC
student smoke?)
Hypothesis Testing:
Experimenter collects data on
a sample to test a hypothesis
concerning the population
(e.g., does being high on
marijuana affect memory?)
Random Sample!
Defined as a sample selected from the population by a
process that ensures that:
All the members of the population have an equal
chance of being selected into the sample
Each possible sample of a given size has an equal
chance of being selected
Of all possible combinations of
elements sampled each
combination is equally likely
Why random sample?!
In order to generalize from the sample to the
population the sample needs to be
representative of the population
It allows us to apply the laws of probability to the
sample
Question !
If I used a process to randomly select 1 of my 84
students at random, what is the probability that you
would be selected?
a) 1/84 (this one)
b) 10/84
c) 1/1.4
d) B or C
e) cannot be determined
Sampling & Replacement!
Sampling without replacement: a method of
sampling in which the members of the sample are NOT
returned to the population before subsequent members
are selected
Sampling with replacement: a method of sampling in
which members of the sample are returned to the
population before subsequent members are selected
Method used
most in
psychological
research
The element sampled
is re-placed back into
the population before
taking the next
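The two sampling methods map onto two functions in Python's standard random module. This is only an illustrative sketch: the class of 84 students is represented by the numbers 1 to 84, and the seed is arbitrary.

```python
import random

rng = random.Random(218)           # arbitrary seed so the run is repeatable
population = list(range(1, 85))    # 84 students, labelled 1..84

# Sampling without replacement: nobody can be picked twice
without_replacement = rng.sample(population, k=10)

# Sampling with replacement: each pick is returned before the next draw,
# so the same student may appear more than once
with_replacement = rng.choices(population, k=10)

print(without_replacement)
print(with_replacement)
```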
Probability Basics!
Typically expressed in values ranging from 0.00 to 1.00
0.0000 means an event is certain NOT to occur
1.0000 means an event is certain to occur
0.0500 means an event will happen, on average, 5 times in 100
Can also be expressed as
fraction
Probabilities are rounded to 4
decimal places!
This is because when they are
converted to percentages we still
have 2 decimal places
A priori probability!
Problems
solved using
only reason. No
data collection
What is the probability of rolling a six sided die
and having it land on 3?
But this assumes each event is equally likely, e.g. a fair die
A posteriori probability!
What is the probability of rolling a six sided die and
having it land on 3?
Problems solved
after some data
have been
collected
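The contrast between the two approaches can be sketched for the die example: the a priori value comes from reasoning alone, while the a posteriori value is estimated from observed rolls. Here the "data" are simulated rolls of a fair die, which is an assumption of this sketch rather than anything collected in the lecture.

```python
import random
from fractions import Fraction

# A priori: 1 face out of 6 equally likely faces is a 3
a_priori = Fraction(1, 6)

# A posteriori: roll many times and count how often a 3 comes up
rng = random.Random(5)      # arbitrary seed
n_rolls = 60_000
threes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 3)
a_posteriori = threes / n_rolls

print(float(a_priori), a_posteriori)
```

With a fair die the two values should be close; with a loaded die only the a posteriori estimate would reveal the difference.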
Rules of probability!
Addition Rule!
Deals with the probability of occurrence of any one of
several possible events
p(A or B) = p(A) + p(B) – p(A and B)
The probability of occurrence of A or B equals the probability of A, plus the probability of occurrence of B, minus the probability of occurrence of both A and B
Addition Rule Example!
You are asked to draw one card from a normal deck of
52 playing cards. What is the probability you pick a
diamond or a 10?
p(A or B) = p(A) + p(B) – p(A and B)
p(♦ or 10) = p(♦) + p(10) – p(♦ and 10)
p(♦) = 13/52
p(10) = 4/52
p(♦ and 10) = 1/52
p(♦ or 10) = 13/52 + 4/52 – 1/52
= 16/52
= 0.3077
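The card example can be verified by building the deck and counting. This sketch compares the addition rule with a direct count of cards that are a diamond or a 10; the helper p is a name invented for the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

def p(event):
    """Probability that one randomly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_diamond(card):
    return card[1] == "diamonds"

def is_ten(card):
    return card[0] == "10"

# Addition rule: p(A or B) = p(A) + p(B) - p(A and B)
by_rule = p(is_diamond) + p(is_ten) - p(lambda c: is_diamond(c) and is_ten(c))

# Direct count of cards that are a diamond or a 10
by_count = p(lambda c: is_diamond(c) or is_ten(c))

print(by_rule, round(float(by_rule), 4))
```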
Mutually Exclusive Events!
…are events that cannot occur together
Picking a spade or a diamond in one draw from a deck
Rolling a 3 or a 4 on one roll of a die
Giving birth to a baby boy or girl
p(A or B) = p(A) + p(B) – p(A and B)
If A and B are mutually exclusive then p(A and B) = 0
Therefore for mutually exclusive events the equation simplifies to …
p(A or B) = p(A) + p(B)
What is the probability you
pick a queen or a jack?
p(Q or J) = p(Q) + p(J)
p(Q) = 4/52
p(J) = 4/52
p(Q or J) = 4/52 + 4/52
= 8/52 = 0.1538
Question!
Are the events picking a diamond or picking a 10 in a
single draw from a deck of cards mutually exclusive?
a) Yes
b) No
No since you could pick a card
which is both
Question !
You have 12 cans of pop in the fridge: 3 cans of Coke,
3 cans of Sprite, 2 cans of Dr. Pepper, 2 cans of
Orange Crush, 1 can of Ginger Ale and 1 can of Cream
Soda. You close your eyes and pick one can out of the
fridge at random. What is the probability you pick a
Coke or a Cream Soda?
a) 0.0833
b) 0.1667
c) 0.2500
d) 0.3333
e) I don’t have a calculator
p(Coke or Cream Soda) = p(Coke) + p(Cream Soda)
p(Coke) = 3/12
p(Cream Soda) = 1/12
p(Coke or Cream Soda) = 3/12 + 1/12 = 4/12 = 0.3333
2+ Mutually Exclusive Events!
p(A or B or C or … or Z) = p(A) + p(B) + p(C) + … + p(Z)
where A, B, C,…,Z are mutually exclusive events
You have 12 cans in the fridge: 3 Cokes, 3 Sprites, 2 Dr. Peppers, 2
Orange Crush, 1 Ginger Ale and 1 Cream Soda. You close your eyes
and pick one can out of the fridge at random. What is the probability
you pick a Dr. Pepper, an Orange Crush or a Cream Soda?
p(Dr.P or OC or CS) = p(Dr.P) + p(OC) + p(CS)
p(Dr.P) = 2/12
p(OC) = 2/12
p(CS) = 1/12
p(Dr.P or OC or CS) = 2/12 + 2/12+ 1/12 = 5/12 = 0.4167
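For mutually exclusive events the addition rule is just a sum, which is easy to sketch with the fridge example. The dictionary and the function name p_any are made up for this illustration.

```python
from fractions import Fraction

fridge = {
    "Coke": 3, "Sprite": 3, "Dr. Pepper": 2,
    "Orange Crush": 2, "Ginger Ale": 1, "Cream Soda": 1,
}
total = sum(fridge.values())   # 12 cans

def p_any(*kinds):
    """p(A or B or ...) for one draw; the kinds are mutually exclusive."""
    return sum(Fraction(fridge[kind], total) for kind in kinds)

print(p_any("Coke", "Cream Soda"))                        # 4/12
print(p_any("Dr. Pepper", "Orange Crush", "Cream Soda"))  # 5/12
print(p_any(*fridge))                                     # exhaustive set
```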
Exhaustive!
A set of events is exhaustive if it includes all possible events
If a set of events is exhaustive and mutually exclusive then
the probabilities of the events sum to 1
What is the probability of flipping a coin and having it
turn up heads or tails?
p(heads or tails) = p(heads) + p(tails)
p(heads) = 1/2
p(tails) = 1/2
p(heads or tails) = 1/2 + 1/2 = 2/2 = 1
Heads I
win tails
you lose!
Exhaustive Notation!
Usually when there are only two mutually exclusive
events, we denote the probability of occurrence of one
as P and the other as Q
Flipping a coin:
P = probability of getting a head = 1/2
Q = probability of getting a tail = 1/2
Gender of a baby:
P = probability of having a boy = 5/12
Q = probability of having a girl = 7/12
P + Q = 1.00 when two events are exhaustive and
mutually exclusive
Multiplication Rule!
Deals with the probability of the joint or successive occurrence of
several events, e.g. the probability of heads
then tails, or of being female and brunette
p(A and B) = p(A) p(B|A)
The probability of A multiplied by the probability of B given that A has occurred
This depends on whether A and B are independent or dependent in
some way
Mult. Rule & Independent events!
Events are independent if the occurrence of one has
no effect on the probability of occurrence of the other
If events are
independent then
p(B|A) = p(B)
p(A and B) = p(A) p(B|A)
p(A and B) = p(A) p(B)
Special case of
multiplication rule when
events are independent
e.g. coin flips: flipping tails on coin 1 has no effect on the probability of flipping tails on coin 2
If you flip two coins what is the
probability they will both turn up
heads?
p(A and B) = p(A) p(B)
= (1/2)(1/2)
= 0.2500
Example!
Draw two cards randomly from a regular
deck. After drawing the first card you
return it to the deck before drawing the
second card. What is the probability that
both cards will be diamonds?
p(♦ 1st and ♦ 2nd) = p(♦ 1st) p(♦ 2nd)
p(♦ 1st) = 13/52
p(♦ 2nd) = 13/52
p(♦ 1st and ♦ 2nd) = (13/52)(13/52)
= 169/2704
= 0.0625
Sampling with replacement, so events
are independent
If card replaced then the second draw is
independent of the first since the deck of
cards consists of same number of cards
Multiplication rule
for independent
events
Question!
You just got a new ipod shuffle and put 250 songs onto
it; 10 of which are from a Radiohead album. What is
the probability that the first two songs you play are
from the Radiohead (RH) album? Assume the shuffle
samples with replacement.
a) 0.0800
b) 0.0400
c) 0.0016
d) None of the above
e) I don’t have a calculator
Apple had to change the
randomisation function
because people complained it
wasn’t random, even though it
was exactly random
p(RH 1st and RH 2nd) = p(RH 1st) p(RH 2nd)
p(RH 1st) = 10/250
p(RH 2nd) = 10/250
p(RH 1st and RH 2nd) = (10/250)(10/250)
= 100/62500 = 0.0016
Mult. rule for multiple events!
p(A and B and C and…and Z) = p(A)p(B)p(C)…p(Z)
Same but just multiply by
the extra elements
What is the probability that the first three songs you play are from the same
Radiohead (RH) album?
p(RH 1st and RH 2nd and RH 3rd) = p(RH 1st) p(RH 2nd) p(RH 3rd)
p(RH 1st) = 10/250
p(RH 2nd) = 10/250
p(RH 3rd) = 10/250
p(RH 1st and RH 2nd and RH 3rd) = (10/250)(10/250)(10/250)
= 1000/15625000 = 0.000064
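For independent events the multiplication rule is a plain product of the individual probabilities. A minimal sketch with the shuffle and card examples; the function name is hypothetical, and sampling with replacement is assumed throughout.

```python
from fractions import Fraction
from math import prod

def p_all_independent(*probabilities):
    """p(A and B and ...) when the events are independent."""
    return prod(probabilities, start=Fraction(1))

p_rh = Fraction(10, 250)        # one Radiohead song on a single play
p_diamond = Fraction(13, 52)    # one diamond on a single draw

two_songs = p_all_independent(p_rh, p_rh)
three_songs = p_all_independent(p_rh, p_rh, p_rh)
two_diamonds = p_all_independent(p_diamond, p_diamond)

print(float(two_songs), float(three_songs), float(two_diamonds))
```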
Mult. Rule for dependent events!
Events are dependent if the occurrence of one event
affects the probability of occurrence of the other
What are the chances of both rain and getting wet hair?
Probability of rain: for both to happen it must rain, so what are the chances of rain?
Probability of getting wet hair: given that it has rained, what are the chances of wet hair?
p(A and B) = p(A) p(B|A)
[Note that this was the equation we saw before, before
we simplified it for independent events]
Example!
You are asked to draw two cards randomly from a
regular deck. You do not return the first card to the
deck before drawing the second card. What is the
probability that both cards will be diamonds?
Sampling without replacement, so events are
dependent
p(♦ 1st and ♦ 2nd) = p(♦ 1st) p(♦ 2nd given ♦ 1st)
Since you don’t replace the card, the deck is different for the second draw, changing the odds
p(♦ 1st) = 13/52
p(♦ 2nd given ♦ 1st) = 12/51
Look! odds change because 1 card has been removed
p(♦ 1st and ♦ 2nd) = (13/52)(12/51)
= 156/2652
= 0.0588
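The dependent-events result can be cross-checked by listing every ordered pair of distinct cards and counting the pairs made of two diamonds. This brute-force count is only a sketch to confirm the rule, not how the problem would normally be solved.

```python
from fractions import Fraction
from itertools import permutations

# Only the suit matters here: 13 diamonds and 39 other cards
deck = ["diamond"] * 13 + ["other"] * 39

# Multiplication rule for dependent events: p(A and B) = p(A) p(B|A)
by_rule = Fraction(13, 52) * Fraction(12, 51)

# Brute force: every ordered pair of two different card positions
pairs = list(permutations(range(len(deck)), 2))
both_diamonds = sum(
    1 for i, j in pairs if deck[i] == "diamond" and deck[j] == "diamond"
)
by_count = Fraction(both_diamonds, len(pairs))

print(by_rule, round(float(by_rule), 4))
```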
Question !
You have 12 cans of pop in the fridge: 3 cans of Coke, 3
cans of Sprite, 2 cans of Dr. Pepper, 2 cans of Orange
Crush, 1 can of Ginger Ale and 1 can of Cream Soda.
You close your eyes and pick two cans out of the fridge
at random. What is the probability that the 1st can you
pick is a Coke and the 2nd can you pick is a Dr. Pepper?
a) 0.0455
b) 0.0625
c) 0.4157
d) 0.4318
Sampling
without
replacement,
therefore
dependent
events
p(Coke 1st and Dr. P 2nd) = p(Coke 1st) p(Dr. P 2nd given Coke 1st)
p(Coke 1st) = 3/12
p(Dr. P 2nd given Coke 1st) = 2/11
p(Coke 1st and Dr. P 2nd) = (3/12)(2/11)
= 6/132 = 0.0455
Mult. Rule 2+ Dependent Events!
p(A and B and C) = p(A) p(B|A) p(C|AB)
For A, B and C to happen, ‘A’ has to happen, then B has to happen given that A has happened, then C has to happen given that A and B have happened
where
p(A) = probability of A
p(B|A) = probability of B, given A has occurred
p(C|AB) = probability of C, given A and B have occurred
For 4 events…
p(A and B and C and D) = p(A) p(B|A) p(C|AB) p(D|ABC)
and so on and so forth…
Example!
You have 12 cans of pop in the fridge: 3 cans of Coke,
3 cans of Sprite, 2 cans of Dr. Pepper, 2 cans of
Orange Crush, 1 can of Ginger Ale and 1 can of Cream
Soda. You close your eyes and pick three cans out of
the fridge at random. What is the probability that the
1st can you pick is a Coke , the 2nd can you pick is a
Dr. Pepper and the 3rd can is a Sprite?
p(Coke 1st and Dr. P 2nd and Sprite 3rd) = p(Coke 1st) p(Dr. P 2nd given Coke 1st) p(Sprite 3rd given Coke 1st and Dr. P 2nd)
Note how the chances change as further cans are removed
p(Coke 1st) = 3/12
p(Dr. P 2nd given Coke 1st) = 2/11
p(Sprite 3rd given Coke 1st and Dr. P 2nd) = 3/10
p(Coke 1st and Dr. P 2nd and Sprite 3rd) = (3/12)(2/11)(3/10)
= 18/1320
= 0.0136
Example!
There are 61 students in a classroom. 12 are Biology
majors, 20 are English majors and 29 are Psych (Ψ)
majors. If you sample 3 without replacement what is
the probability of obtaining 3 Psych majors?
p(Ψ 1st and Ψ 2nd and Ψ 3rd) =
p(Ψ 1st) p(Ψ 2nd, given Ψ 1st) p(Ψ 3rd, given Ψ 1st
and Ψ 2nd)
p(Ψ 1st) = 29/61
p(Ψ 2nd, given Ψ 1st) = 28/60
p(Ψ 3rd, given Ψ 1st and Ψ 2nd) = 27/59
p(Ψ 1st and Ψ 2nd and Ψ 3rd) =
(29/61)(28/60)(27/59) = 21924/215940 = 0.1015
Chances change as
you sample without
replacement
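The step-by-step pattern in the last few examples (multiply the current chance, then remove the item) can be sketched as one small function. The name p_sequence and the dictionaries are invented for this illustration.

```python
from fractions import Fraction

def p_sequence(counts, sequence):
    """Probability of drawing `sequence` in order, without replacement.

    counts: how many items of each kind are in the population
    sequence: the kinds that must be drawn, in order
    """
    remaining = dict(counts)
    total = sum(remaining.values())
    probability = Fraction(1)
    for kind in sequence:
        probability *= Fraction(remaining[kind], total)  # p(kind | earlier draws)
        remaining[kind] -= 1    # the drawn item is not returned
        total -= 1
    return probability

fridge = {"Coke": 3, "Sprite": 3, "Dr. Pepper": 2,
          "Orange Crush": 2, "Ginger Ale": 1, "Cream Soda": 1}
majors = {"Biology": 12, "English": 20, "Psych": 29}

cans = p_sequence(fridge, ["Coke", "Dr. Pepper", "Sprite"])
psych = p_sequence(majors, ["Psych", "Psych", "Psych"])
print(round(float(cans), 4), round(float(psych), 4))
```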
Multiplication and Addition Rules!
There are 61 students in a classroom. 12
are Biology majors, 20 are English majors
and 29 are Psych majors. If you sample 2
without replacement what is the probability
of obtaining 1 Psych major and 1 English
Major?
2 Possible Outcomes meet this
requirement
Outcome A: Psych 1st, English 2nd
Outcome B: English 1st, Psych 2nd
More complex problems
will require both
Use multiplication rule to
calculate probability for
each outcome
Then addition rule to
account for either
Multiplication and Addition Rules!
1. Determine probability of each outcome using
the multiplication rule:
Outcome A
p(Ψ 1st) = 29/61
p(Eng 2nd, given Ψ 1st) = 20/60
(29/61)(20/60) = 580/3660 =0.1585
Outcome B
p(Eng 1st) = 20/61
p(Ψ 2nd, given Eng 1st) = 29/60
(20/61)(29/60) = 580/3660 = 0.1585
2. Use the addition rule to add the probabilities
together:
580/3660 + 580/3660 = 1160/3660 = 0.3169
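The two-step recipe can be sketched in code: the multiplication rule gives the probability of each ordering, and the addition rule sums over the orderings, which are mutually exclusive. The helper name p_sequence is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

majors = {"Biology": 12, "English": 20, "Psych": 29}

def p_sequence(counts, sequence):
    """Multiplication rule for an ordered draw without replacement."""
    remaining = dict(counts)
    total = sum(remaining.values())
    probability = Fraction(1)
    for kind in sequence:
        probability *= Fraction(remaining[kind], total)
        remaining[kind] -= 1
        total -= 1
    return probability

# Step 1: each ordering that gives 1 Psych and 1 English major
orderings = set(permutations(["Psych", "English"]))
per_ordering = {order: p_sequence(majors, order) for order in orderings}

# Step 2: addition rule over the mutually exclusive orderings
p_one_of_each = sum(per_ordering.values())

print(per_ordering, round(float(p_one_of_each), 4))
```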
Normal Continuous Variables!
Up to this point we have only been considering discrete
variables but most variables in research are continuous
How do we determine the probability that a score will be
equal to or greater than a specific score?
Transform score
to z-score and
use Column C of
Table A!
Example!
You pick an individual out of a crowd at random. What is
the probability that they have an IQ equal to or greater
than 120?
Remember! IQ is
normally distributed,
mean = 100,
standard deviation =
16.
Example!
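Step 1: Calculate the z-score
z = (X − mean) / standard deviation = (120 − 100) / 16 = 1.25
Step 2: Draw a normal curve and place the z-score on
the curve
[Figure: normal curve with the z-axis running from -3 to 3 and z = 1.25 marked]
Step 3: Find the corresponding area under the curve
(Table A; Column C) = 0.1056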
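The table lookup can be reproduced with Python's standard statistics.NormalDist, which gives the area below a score; subtracting from 1 gives the area at or above it. This is a sketch of the same calculation, not a replacement for Table A in the course.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)   # IQ: mean 100, standard deviation 16

# Step 1: z-score for an IQ of 120
z = (120 - iq.mean) / iq.stdev

# Step 3: area under the curve at or beyond that score
p_at_least_120 = 1 - iq.cdf(120)

print(z, round(p_at_least_120, 4))
```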
Question!
Which of the following is a dichotomous variable?
a. Age
b. Number of Friends
c. Gender
d. Result of a coin toss
e. C and D
Dichotomous when there are
only 2 possible states for the
variable
Binomial Distribution!
Binomial data – the data that result from measuring subjects on a
dichotomous variable
Binomial – Latin for ‘having two names’
Binomial Distribution – Cousin to the normal distribution. Allows
us to determine the probability of certain outcomes for binomial data
Question!
What is the probability of guessing correctly on 2 true/
false questions?
a. 1.0000
b. 0.7500
c. 0.5000
d. 0.2500
e. 0.1250
We can use the multiplication rule
We can assume events are
independent so we can use the
simplified equation
p(A and B) = p(A) p(B)
p(correct 1 and correct 2) = p(correct) p(correct)
= (1/2)(1/2)
= 0.2500
The Binomial Distribution!
A probability distribution that results when:
1. There is a series of N trials
2. On each trial, there are only 2 possible outcomes (P and Q)
3. On each trial, the two outcomes are mutually exclusive
4. The trials are independent
5. The probability of each outcome stays the same from trial to
trial
The Binomial Distribution!
When these requirements are met:
1. The binomial distribution tells us each possible combination of
outcomes from N trials
e.g. every possible combination of heads and tails for N tosses of a coin
2. The probability of getting each of these outcomes
e.g. probability of HH = 0.25
probability of TT = 0.25
True or false!
The average person eats 8 spiders / year
a) True
b) False
False
Generating Binomial Distribution !
This is if you were completely
guessing (and had no
capacity for reasoning)
For guessing on 1 true/false question
What are the possible outcomes?
Outcome 1: Q1: √
Outcome 2: Q1: X
What is the probability of each outcome?
p(√) = 1/2 = 0.5000
p(X) = 1/2 = 0.5000
Generating Binomial Distribution !
Example questions: Average person eats 8 spiders / year; Average person only uses 10% of their brain
For guessing on 2 true/false questions
What are the possible outcomes?
Outcome 1: Q1: √ Q2: √
Outcome 2: Q1: √ Q2: X
Outcome 3: Q1: X Q2: √
Outcome 4: Q1: X Q2: X
What is the probability of each type of outcome?
p(2 √) = p(Q1 √ Q2 √) = 1/4 = 0.2500
p(1 √) = p(Q1 √ Q2 X or Q1 X Q2 √) = 2/4 = 0.5000
p(0 √) = p(Q1 X Q2 X) = 1/4 = 0.2500
There are 2 different ways of getting 1 right
Generating Binomial Distribution !
For guessing on 3 true/false questions
What are the possible outcomes?
Outcome 1: Q1: √ Q2: √ Q3: √
Outcome 2: Q1: √ Q2: √ Q3: X
Outcome 3: Q1: √ Q2: X Q3: √
Outcome 4: Q1: X Q2: √ Q3: √
Outcome 5: Q1: √ Q2: X Q3: X
Outcome 6: Q1: X Q2: √ Q3: X
Outcome 7: Q1: X Q2: X Q3: √
Outcome 8: Q1: X Q2: X Q3: X
What is the probability of each outcome?
p(3 √) = 1/8 = 0.1250
p(2 √) = 3/8 = 0.3750
p(1 √) = 3/8 = 0.3750
p(0 √) = 1/8 = 0.1250
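Listing outcomes by hand gets tedious as N grows, so the same enumeration can be sketched in code. This assumes pure guessing (each answer right or wrong with probability 1/2, so every outcome is equally likely); the function name is made up for the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def binomial_by_enumeration(n):
    """List every right/wrong pattern for n guesses and count the corrects.

    Valid only when P = Q = 1/2, so all 2**n patterns are equally likely.
    """
    outcomes = list(product([True, False], repeat=n))   # True = correct guess
    ways = Counter(sum(outcome) for outcome in outcomes)
    return {k: Fraction(ways[k], len(outcomes)) for k in range(n, -1, -1)}

for n in (1, 2, 3):
    print(n, {k: float(p) for k, p in binomial_by_enumeration(n).items()})
```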
Binomial Distribution When P = 0.50 !
N   Possible Outcomes (# of Events of Interest)   Probability
1   1                                             .5000   (e.g. 1 correct guess)
1   0                                             .5000
2   2                                             .2500
2   1                                             .5000
2   0                                             .2500
3   3                                             .1250
3   2                                             .3750
3   1                                             .3750
3   0                                             .1250
And so on and so forth for increasing values of N
This is in appendix D, table B
The real table includes columns for probabilities other than 0.5
Q!
Using the binomial
distribution we just
generated determine the
probability that a woman will
give birth to 2 boys and 1
girl (over the course of 3
pregnancies, assuming boys
and girls are equally
probable).
N   Possible Outcomes (# of Events of Interest)   Probability
1   1                                             .5000
1   0                                             .5000
2   2                                             .2500
2   1                                             .5000
2   0                                             .2500
3   3                                             .1250
3   2                                             .3750
3   1                                             .3750
3   0                                             .1250
The probability is…
a) 0.5000
b) 0.3750
c) 0.2500
d) 0.1250
Either of these: N = 3 with 2 events of interest, or N = 3 with 1 event of interest
Mind your Ps and Qs!
When there are only 2 mutually exclusive events we
denote the probability of occurrence of one as P and
the other as Q
Guessing on a True/False Question
P = guessing correctly = 1/2
Q = guessing incorrectly = 1/2
Giving Birth:
P = having a boy = 1/2
Q = having a girl = 1/2
Flipping a Coin:
P = getting a head = 1/2
Q = getting a tail = 1/2
P + Q = 1.00 only when two events are exhaustive
Which is
assigned P
and which Q is
often arbitrary
Binomial Expansion!
Binomial distribution can be generated from this:
(P + Q)^N
To generate the possible outcomes and the probabilities of each
outcome simply expand the expression for the number of trials (N)
and evaluate each term in the expression
Expanding the equation gives you
the particular equation for any
number of trials
Using the Binomial Expansion!
Generate a binomial distribution for 2 True/False
Questions (N = 2)
(P+Q)^N
= (P+Q)^2
= (P+Q)(P+Q)
= P^2 + 2PQ + Q^2
Process of expansion for N = 2
Interpreting the Binomial Expansion!
(P+Q)^N
= (P+Q)^2
= (P+Q)(P+Q)
= P^2 + 2PQ + Q^2, which represents all possible outcomes
1. The letters (P, Q) tell us the kinds of events that comprise the outcome
2. The exponents tell us how many of that kind of event there are in the outcome
3. The coefficients tell us how many ways there are of obtaining the outcome (if there is no coefficient this means just 1)
Let P = Correct Guess and Q = Incorrect Guess
So P^2 represents 1 possible outcome with 2 P events (2 Correct Guesses)
2PQ represents 2 possible outcomes with 1 P and 1 Q event (1 Correct Guess)
Q^2 represents 1 possible outcome with 2 Q events (0 Correct Guesses)
Using the Binomial Expansion!
P^2 + 2PQ + Q^2
We can use the binomial expansion to determine the probability of
getting each of these possible outcomes by substituting the probability
of P and Q in for P and Q
The probability of P = Q = 0.50, so….
Prob. of 2 Correct Guesses = p(2 √) = P^2 = (0.50)^2 = 0.2500
Prob. of 1 Correct Guess = p(1 √) = 2PQ = 2(0.50)(0.50) = 0.5000
Prob. of 0 Correct Guesses = p(0 √) = Q^2 = (0.50)^2 = 0.2500
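The terms of the expansion can also be generated for any N. In this sketch the coefficient of each term comes from math.comb (the number of ways of getting k P events in N trials), and the function name binomial_terms is invented for the example.

```python
from math import comb

def binomial_terms(n, p):
    """Evaluate each term of (P + Q)^n, keyed by the number of P events.

    The term with k P events is: coefficient * P^k * Q^(n - k).
    """
    q = 1 - p
    return {k: comb(n, k) * p ** k * q ** (n - k) for k in range(n, -1, -1)}

# N = 2 true/false guesses: P^2, 2PQ, Q^2
print(binomial_terms(2, 0.5))

# N = 3 births with boys and girls equally likely: p(2 boys and 1 girl)
print(binomial_terms(3, 0.5)[2])
```

Unlike listing outcomes by hand, this also works when P is not 0.50, which is what the extra columns of the real table cover.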
Next Lecture!
More Binomial
Distribution!