Soci708 – Statistics for Sociologists
Module 6 – Probability & Sampling Distributions¹
François Nielsen
University of North Carolina, Chapel Hill
Fall 2008

¹ Adapted in part from slides for the course Quantitative Methods in Sociology (Sociology 6Z3) taught at McMaster University by Robert Andersen (now at University of Toronto)

Why Study Probability?

Probability is essential for several reasons:

- We calculate statistics (e.g., mean, median, variance, regression coefficients) from a sample of units drawn at random from a population
- Thus a statistic calculated from the sample is the result of a random process
  - In fact, it is a random variable
- Thus we need to study probability to figure out how sample statistics relate to population parameters
- Also, the notion of conditional probability is essential to understanding association between variables

Origins of Probability Theory (1)

Origin in gambling circles in 17th-century France

- Gambler Chevalier de Méré contacted his mathematical friends Blaise Pascal and Pierre de Fermat with gambling questions

[Portrait: Blaise Pascal (1623–1662)]

Origins of Probability Theory (2)

- The subsequent correspondence between Pascal and Fermat is the origin of probability theory

[Portrait: Pierre de Fermat (1601–1665)]

Origins of Probability Theory (3)

[Another portrait of Fermat (younger)]

Random Trials, Sample Spaces, and Events

Random Trials

- A random trial is an activity in which there are two or more different possible outcomes, and uncertainty exists in advance as to which outcome will occur.
- Examples:
  - Throw one standard die
  - Throw two standard dice at the same time
  - Draw one society at random from a set of 325 societies in the Ethnographic Atlas cross-classified by beliefs in high gods and by subsistence technology
  - Draw one childbirth at random from a set of childbirths to Pima mothers cross-classified according to diabetic status of the mother and presence of one or more birth defects in the newborn (next slide)

Random Trials (2)

- An example of a random trial is drawing a birth at random from the 1207 births represented in the following table:

Table 1. Child Birth Defects by Mother's Diabetic Status Among Pima Indian Mothers

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   31              754             785
A2: Prediabetic   13              362             375
A3: Diabetic      9               38              47
Total             53              1154            1207

- Various proportions calculated from the table can be interpreted as probabilities of drawing a case with certain characteristics

Random Trials (3)

- The cells of the contingency table contain the joint frequencies, or counts corresponding to each combination of categories
- The row marginal frequencies (in the last column, marked Total) and the column marginal frequencies (in the last row, marked Total) are calculated by adding up the cell frequencies within the corresponding row and column, respectively
- The row marginals and the column marginals both add up to N, the total number of cases (1207 in Table 1)
- We saw all this earlier

Probability Models (1)

- The different possible outcomes of a random trial are called the basic outcomes; the set of all basic outcomes is called the sample space, e.g.,
  - The sample space associated with throwing a standard die is {1 dot, 2 dots, 3 dots, 4 dots, 5 dots, 6 dots}
  - The sample space associated with throwing 2 standard dice is . . . Q – What is it?
  - The sample space associated with drawing a birth at random from the contingency table is depicted in Table 2, where each Oi symbolizes a basic outcome of the random trial
- For example, O3 represents a birth where the mother is diagnosed as prediabetic and the child is found to have one or more birth defects

Probability Models (2)

Table 2. Sample Space for the Random Trial "Select a Birth at Random"

Data              B1: 1+ Defect   B2: No Defect
A1: Nondiabetic   O1              O2
A2: Prediabetic   O3              O4
A3: Diabetic      O5              O6

- The sample space can be univariate (as in the throw of one standard die), bivariate (as in Table 2), or multivariate

Probability Models (3)

- An event is a subset of the basic outcomes that constitute the sample space
- An event is said to occur if any one of its basic outcomes is realized in the random trial
- Notation (refer to Table 2):
  - Oi ∈ E, where Oi is an outcome and E is an event, means "basic outcome Oi belongs to event E"
  - E = {O1, O3, O6}, where E is an event and the Oi are basic outcomes, means "event E consists of basic outcomes O1, O3, and O6"
  - E = {Oi | B = B2} means "event E consists of all basic outcomes Oi such that B is equal to B2"; thus E consists of O2, O4, and O6
  - Ø = {} denotes the null event, which consists of no outcome

Probability Models (4)

[Figure: Complementation, Addition, & Intersection of Events (NWW Figure 4.4, p. 113)]

Probability Models (5)

Complementation, Addition, & Intersection of Events

- The set of all basic outcomes not contained in an event E is called the complementary event to E and is denoted by Ec
- The joint occurrence of two events E1 and E2 is another event, denoted E1 ∩ E2, that consists of the basic outcomes common to E1 and E2
  - E1 ∩ E2 is also called the intersection of E1 and E2
  - E1 ∩ E2 is equivalent to "E1 and E2"
- The occurrence of event E1 or event E2 is another event, denoted E1 ∪ E2, that consists of all basic outcomes contained in either E1 or E2 or in both
  - E1 ∪ E2 is called the union of E1 and E2
  - E1 ∪ E2 is equivalent to "E1 or E2"
- If events E1 and E2 have no basic outcomes in common, they are said to be mutually exclusive or disjoint events

Probability

Andrey Nikolaevich Kolmogorov (Tambov 1903 – Moscow 1987) provided the axiomatic foundations of probability in 1933

[Portraits: younger; older, as academician]

Probability Rules (1)

Probability Rules – The first three rules are called Kolmogorov's Axioms

1. The probability P(A) of any event A is a number between 0 and 1, i.e.,
       0 ≤ P(A) ≤ 1
2. The probability that some basic outcome in the sample space will occur is 1, and the probability that none will occur is 0, i.e.,
       P(S) = 1 and P(Ø) = 0
3. Addition rule for disjoint events. If two events A and B are mutually exclusive or disjoint (i.e., they have no outcomes in common and so can never occur together), then
       P(A ∪ B) = P(A) + P(B)
   More generally, any countable sequence of pairwise disjoint events E1, E2, . . . satisfies
       P(E1 ∪ E2 ∪ · · · ) = Σi P(Ei)

Probability Rules (2)

4. Complement rule. If Ac denotes the complement of an event A (i.e., the event that A does not occur), then
       P(Ac) = 1 − P(A)

Meaning of Probability

Objective & Subjective Approaches to Probability

- Example: What is the probability of rain in Chapel Hill on 10 September?
- Interpretations:
  1. Objective interpretation: Probability is associated with the relative frequency of occurrence of the event in the long run under constant causal conditions; applies only to repeatable events. Also, probability by construction (e.g., in manufacturing dice)
  2. Subjective interpretation: Based on personal assessment. What is the probability that Napoleon was poisoned? Applies also to non-repeatable events.

Probability Expressed as Odds

1. From probability to odds:
  - A is the event "My paper is rejected by Social Forces"
  - Suppose P(A) = .8; then P(Ac) = 1 − P(A) = .2
  - Then Odds(A) = P(A)/P(Ac) = .8/.2 = "4 to 1" (the ratio is often expressed using integers)
2. From odds to probability:
  - Suppose the odds of A are "d1 to d2" (where d1 and d2 are integers)
  - Then P(A) = d1/(d1 + d2)
  - E.g., suppose the odds of dying next year (for a given age and sex category) are 1 to 499
  - Then P(dying next year) = 1/(1 + 499) = 1/500 = .002

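These two conversions are easy to script; here is a minimal R sketch (the function names are illustrative, not from the slides):

prob_to_odds <- function(p) p / (1 - p)          # Odds(A) = P(A)/P(Ac)
odds_to_prob <- function(d1, d2) d1 / (d1 + d2)  # odds of "d1 to d2"
prob_to_odds(0.8)     # 4, i.e., "4 to 1"
odds_to_prob(1, 499)  # 0.002
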
Probability Distributions

Concept of Probability Distribution

- A probability distribution is the assignment of probabilities to each of the basic outcomes in the sample space
- In the Pima example, the univariate or marginal probability distributions associated with drawing a birth at random from the contingency table of birth defects by mother's diabetic status are shown in Tables 3a & 3b
  - For example, the probability of drawing a prediabetic mother is .311; it is obtained by dividing the number of prediabetic mothers by the total number of cases
  - Other univariate probabilities are obtained in the same way, by dividing marginal frequencies by N

Table 3a. Univariate (Marginal) Distribution of Mother's Diabetic Status Among Pima Indian Mothers (N = 1207)

Outcome           Probability   Symbol
A1: Nondiabetic   0.650         P(A1)
A2: Prediabetic   0.311         P(A2)
A3: Diabetic      0.039         P(A3)
Total             1.000         P(S)
N                 1207

Table 3b. Univariate (Marginal) Distribution of Birth Defects Among Births to Pima Indian Mothers (N = 1207)

Outcome         Probability   Symbol
B1: 1+ Defect   0.044         P(B1)
B2: No Defect   0.956         P(B2)
Total           1.000         P(S)
N               1207

- The bivariate (or, generally, joint or multivariate) probability distribution associated with drawing a birth at random from the contingency table is shown in Table 4
  - For example, the probability of drawing a case for which the mother is nondiabetic and the child has no defect is .625
  - This and the other probabilities in Table 4 are obtained by dividing the cell frequencies in Table 1 by N (1207)
  - The probabilities within the table (not counting the marginals) add up to 1

Table 4. Bivariate (Joint) Probability Distribution for Diabetic Status of Mother (A) and Birth Defect (B) (N = 1207)

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   0.026           0.625           0.650
A2: Prediabetic   0.011           0.300           0.311
A3: Diabetic      0.007           0.031           0.039
Total             0.044           0.956           1.000

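As a check on these numbers, a minimal R sketch that builds the joint distribution from the Table 1 counts (row and column labels abbreviated):

counts <- matrix(c(31, 754,
                   13, 362,
                    9,  38),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A1", "A2", "A3"), c("B1", "B2")))
joint <- counts / sum(counts)  # joint probabilities, as in Table 4
round(joint, 3)
round(rowSums(joint), 3)       # marginal P(Ai): 0.650 0.311 0.039
round(colSums(joint), 3)       # marginal P(Bj): 0.044 0.956
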
Joint & Marginal Probability Distributions

- The joint or bivariate probability distribution corresponds to probabilities of the outcomes consisting in the intersections of the categories of the two variables
- Thus the joint probability distribution of Table 4 can be represented in symbolic form as in Table 5 – each cell corresponds to the probability of the joint occurrence, or intersection, of two univariate events
  - For example, the probability .300 associated with drawing a case with a prediabetic mother and a child with no defect corresponds to the probability P(A2 ∩ B2) of the joint occurrence of A2 (Prediabetic) and B2 (No Defect)
- The marginals of Tables 4 and 5 represent the probabilities of the univariate events Ai and Bj, the same as in Tables 3a and 3b

Table 5. Bivariate (Joint) Probability Distribution for Diabetic Status of Mother (A) and Birth Defect (B) in Symbolic Form

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   P(A1 ∩ B1)      P(A1 ∩ B2)      P(A1)
A2: Prediabetic   P(A2 ∩ B1)      P(A2 ∩ B2)      P(A2)
A3: Diabetic      P(A3 ∩ B1)      P(A3 ∩ B2)      P(A3)
Total             P(B1)           P(B2)           P(S)

Marginal Probabilities

- The marginal probabilities for a bivariate probability distribution are found as
       P(Ai) = Σj P(Ai ∩ Bj)
       P(Bj) = Σi P(Ai ∩ Bj)
  where the sums are over all events Bj and Ai, respectively
- For example, in Table 4 the marginal probability P(A1) is equal to .650, the sum of the probabilities in the entries of the first row (.026 + .625 = .650, up to rounding)

Conditional Probabilities

- If E1 and E2 are any two events and P(E2) is not equal to 0, the conditional probability of E1 given E2, denoted P(E1|E2), is defined as
       P(E1|E2) = P(E1 ∩ E2) / P(E2)
- For example, the probability that a child has birth defects (B1) given that the mother is diabetic (A3) is obtained from the figures in Table 4 as
       P(B1|A3) = P(B1 ∩ A3) / P(A3) = .007/.039 = .191
  (computed from the exact counts, 9/47 = .191)

Conditional Probability Distributions

- The conditional probability distribution of a variable B (e.g., presence of birth defects) given the value of another variable A (e.g., a certain diabetic status of the mother) assigns to each basic outcome of B its conditional probability, given the value of A
- The conditional probability distributions of birth defects given diabetic status of the mother are shown in Table 6
  - The rows of Table 6 correspond to the conditional distributions of B for a given value of A (A1, A2, or A3)
  - I.e., there are three conditional distributions of B, one for each value of A

Table 6. Conditional Probability Distributions of Birth Defect Given Diabetic Status of Mother Among Pima Indian Mothers (N = 1207)

Data              B1: 1+ Defect   B2: No Defect   Total   N
A1: Nondiabetic   0.039           0.961           1.000   785
A2: Prediabetic   0.035           0.965           1.000   375
A3: Diabetic      0.191           0.809           1.000   47

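In R, prop.table reproduces Table 6 directly from the Table 1 counts; a minimal sketch:

counts <- matrix(c(31, 754, 13, 362, 9, 38),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A1", "A2", "A3"), c("B1", "B2")))
round(prop.table(counts, margin = 1), 3)  # each row sums to 1; row A3 gives P(B1|A3) = 0.191
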
Conditional Distributions (3)

Conditional Distributions & Patterns of Association

- The conditional distributions of the response variable (here birth defects) given categories of the explanatory variable (here diabetic status of mother) reveal the association between response and explanatory variables
  - Causal patterns are revealed by comparing the conditional distributions of the response variable across different categories of the explanatory variable (rows in this case)
  - E.g., we note that children born to diabetic mothers have a much greater chance of birth defects (19.1%) than children born to nondiabetic mothers (3.9%) or prediabetic mothers (3.5%)
- In published tables it is customary to use percentages (rather than probabilities), as in Table 7

Table 7. Conditional Probability Distributions of Birth Defect Given Diabetic Status of Mother, in Percentage Form (N = 1207)

Data              B1: 1+ Defect   B2: No Defect   Total   N
A1: Nondiabetic   3.9             96.1            100.0   785
A2: Prediabetic   3.5             96.5            100.0   375
A3: Diabetic      19.1            80.9            100.0   47

Causal Ordering Assumption

- Comparing conditional distributions to reveal patterns of association requires the assumption of a causal ordering: one variable is assumed dependent (the response), the other independent (the cause)
- Conditional distributions in the other direction are not necessarily meaningful, except to describe the composition of categories of birth outcomes in terms of diabetic status of the mother, as in Tables 8 (probabilities) and 9 (percentages)
  - On what substantive basis would researchers decide which variable is the cause and which the effect in this case, i.e., that they should look at Tables 6 or 7 rather than Tables 8 or 9?
  - Is the causal ordering A → B the only one possible in this case?

Table 8. Conditional Probability Distribution of Diabetic Status of Mother Given the Presence or Absence of Birth Defect Among Pima Indian Mothers (N = 1207)

Data              B1: 1+ Defect   B2: No Defect
A1: Nondiabetic   0.585           0.653
A2: Prediabetic   0.245           0.314
A3: Diabetic      0.170           0.033
Total             1.000           1.000
N                 53              1154

Table 9. Conditional Probability Distribution of Diabetic Status of Mother Given the Presence or Absence of Birth Defect, in Percentage Form (N = 1207)

Data              B1: 1+ Defect   B2: No Defect
A1: Nondiabetic   58.5            65.3
A2: Prediabetic   24.5            31.4
A3: Diabetic      17.0            3.3
Total             100.0           100.0
N                 53              1154

Conventional Presentation

- Given a choice of response and explanatory variables, it is conventional in sociology to arrange published tables with
  - categories of the explanatory variable as the rows
  - categories of the response variable as the columns
  - percentages calculated within rows, as in Table 7
  - so that associations appear from comparison of rows
- But the convention may be different in other fields!
- In the end the important principle is to percentage within categories of the explanatory variable

More Probability Rules (1)

5. General addition rule. For any two events E1 and E2,
       P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2)
   The reason for removing the probability of the intersection is to correct for the "double counting" of basic outcomes that are included in both E1 and E2.
- E.g., in the Pima example (Table 4),
       P(A3 ∪ B2) = P(A3) + P(B2) − P(A3 ∩ B2)
                  = .039 + .956 − .031 = .964
- For any two mutually exclusive events E1 and E2,
       P(E1 ∪ E2) = P(E1) + P(E2)
  which is obvious because disjoint events have no outcomes in common, so their intersection is empty and has probability zero
- Generally, for n mutually exclusive events E1, E2, . . . , En,
       P(E1 ∪ E2 ∪ . . . ∪ En) = P(E1) + P(E2) + . . . + P(En)

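A minimal R check of the general addition rule on the Table 4 probabilities:

p_A3 <- 0.039; p_B2 <- 0.956; p_A3_and_B2 <- 0.031  # from Table 4
p_A3 + p_B2 - p_A3_and_B2                           # P(A3 ∪ B2) = 0.964
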
More Probability Rules (2)

6. Multiplication rule. For any two events E1 and E2, the probability of the joint occurrence of the events, E1 ∩ E2, is the probability of occurrence of one of the events (say, E1) times the conditional probability of occurrence of the other event, given the first event, i.e.,
       P(E1 ∩ E2) = P(E1)P(E2|E1) = P(E2)P(E1|E2)
   The multiplication theorem forms the basis of the representation of joint events as the limbs of a probability tree, as shown in the probability tree for the actual presence of AIDS given the test result for 1988 data (next slide)

Probability Tree

[Figure: probability tree of AIDS (disease present/absent) and test result (positive/negative)]

Sensitivity & Specificity

Important concepts in testing

- In a situation of testing for the presence of a disease:
  - B1 (B2) are the events "the test is positive (negative)", respectively
  - A1 (A2) are the events "disease is present (absent)", respectively
- The sensitivity of the test is the probability that the test is positive given that the person has the disease, i.e.,
       sensitivity = P(B1|A1) = .98
- The specificity of the test is the probability that the test is negative given that the person does not have the disease:
       specificity = P(B2|A2) = .99
- A substantively very important question in this context is "What is the probability P(A1|B1) that the disease is present given that the test is positive?"
- The answer is given by Bayes's Theorem (see later)

Relationships Between Variables

Independent Events

- Two events E1 and E2 are said to be independent if the probability that one event occurs is unaffected by whether or not the other event occurs, i.e., if one of the following (equivalent) conditions holds:
  1. P(E1|E2) = P(E1) or P(E2|E1) = P(E2)
  2. P(E1 ∩ E2) = P(E1)P(E2)
- Condition 2 follows from condition 1 because of the multiplication theorem
- In general, for n independent events E1, E2, . . . , En,
       P(E1 ∩ E2 ∩ . . . ∩ En) = P(E1)P(E2) . . . P(En)

Relationships Between Variables

Independent Variables

- Two categorical variables A (with categories Ai) and B (with categories Bj) are independent if
       P(Ai ∩ Bj) = P(Ai)P(Bj)
  for all Ai and Bj. I.e., A and B are independent if the joint probabilities are equal to the products of the marginal probabilities for all combinations of categories
- For the Pima example, the joint probabilities of birth defect by diabetic status of mother under the assumption of independence are shown symbolically in Table 10 and as numbers in Table 11
  - The probabilities in Tables 10 and 11 are counterfactual, since the two variables are not in fact independent!

Relationships Between Variables

Joint Probabilities Under Assumption of Independence

Table 10. Joint Probability Distribution of Diabetic Status of Mother (A) and Presence of Birth Defects (B) Under Assumption of Independence, in Symbolic Form

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   P(A1)P(B1)      P(A1)P(B2)      P(A1)
A2: Prediabetic   P(A2)P(B1)      P(A2)P(B2)      P(A2)
A3: Diabetic      P(A3)P(B1)      P(A3)P(B2)      P(A3)
Total             P(B1)           P(B2)           P(S)

- Joint probabilities under the assumption of independence are calculated as the products of the marginal probabilities in the corresponding row and column

Relationships Between Variables

Joint Probabilities Under Assumption of Independence

Table 11. Joint Probability Distribution of Diabetic Status of Mother (A) and Presence of Birth Defects (B) Under Assumption of Independence (N = 1207)

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   0.029           0.622           0.650
A2: Prediabetic   0.014           0.297           0.311
A3: Diabetic      0.002           0.037           0.039
Total             0.044           0.956           1.000

Relationships Between Variables

Predicted Frequencies

- One calculates the predicted or expected frequencies under the assumption of independence by multiplying the predicted probabilities of Table 11 by the total number N of observations (1207 in the example) to obtain Table 12
- Comparing Table 12 with Table 1 (the original data), one notes that:
  - Expected frequencies are not (necessarily) integers
  - Expected cell frequencies sum up to the same row and column marginal totals as in the original table (Table 1)
  - Discrepancies between Table 12 and Table 1 suggest that A and B are not independent
  - But how does one tell if the discrepancies indicate non-independence?

Table 12. Expected Cell Frequencies Under Assumption of Independence

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   34.5            750.5           785
A2: Prediabetic   16.5            358.5           375
A3: Diabetic      2.1             44.9            47
Total             53              1154            1207

Table 1 (Repeated for Comparison). Child Birth Defects by Mother's Diabetic Status Among Pima Indian Mothers

Data              B1: 1+ Defect   B2: No Defect   Total
A1: Nondiabetic   31              754             785
A2: Prediabetic   13              362             375
A3: Diabetic      9               38              47
Total             53              1154            1207

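A minimal R sketch of the expected-frequency calculation (row total × column total / N for each cell):

counts <- matrix(c(31, 754, 13, 362, 9, 38),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A1", "A2", "A3"), c("B1", "B2")))
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
round(expected, 1)  # matches Table 12: 34.5, 750.5; 16.5, 358.5; 2.1, 44.9
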
Dependence

Dependent Events & Dependent Variables

- Two events are dependent if they are not independent.²
- Two variables are dependent if they are not independent.³

² Warning! This will be on the test.
³ This too will be on the test.

Dependence

- The nature of the relationship between two dependent variables can be studied by comparing the conditional probability distributions of one variable (the dependent variable), conditional on each possible outcome of the other variable (the independent variable)
  - E.g., Tables 6 and 7 show the dependence between mother's diabetic status and birth defects
- Comparing the conditional distributions of the dependent variable, conditional on the values of the independent variable(s), is a general strategy to reveal the nature of the dependence (association) between two variables
  - E.g., regression analysis (simple and multiple) consists in modeling the location (mean) of the distribution of a dependent variable Y as conditional on the values of one or several independent variables X

Dependence

Chi-Squared Measure of Dependence

- We use the notation
  - Oij for the observed frequency corresponding to row i and column j (entries of Table 1)
  - Eij for the corresponding expected frequency under the assumption of independence (entries of Table 12)
- It stands to reason that the dependence between A and B is in some way proportional to the discrepancies between observed frequencies and those expected under the assumption of independence
- The most commonly used measure of dependence is the Pearson chi-squared statistic, denoted χ²*; it measures the discrepancy with the formula
       χ²* = Σ (Oij − Eij)² / Eij
  summing over all cells in the table

Dependence

Chi-Squared Measure of Dependence

- In words, it is the sum of the squared discrepancies between Oij and Eij, with each squared term divided by Eij
- The components (Oij − Eij)²/Eij of the chi-squared statistic are shown in Table 13
  - The cell A3 ∩ B1 (number of babies with birth defects born to diabetic mothers) corresponds to a large discrepancy
- The value of the chi-squared statistic in Table 13 (for the comparison of Table 1 and Table 12) is 25.511
  - As explained later, it is highly significant, in the sense that an overall discrepancy this large is unlikely to be due to chance

Dependence

Chi-Squared Measure of Dependence

Table 13. Components (Oij − Eij)²/Eij of Chi-squared Statistic for Child Birth Defects by Mother's Diabetic Status Among Pima Indian Mothers

Data              B1: 1+ Defect   B2: No Defect
A1: Nondiabetic   0.349           0.016
A2: Prediabetic   0.730           0.034
A3: Diabetic      23.312          1.071
Total             χ²* = 25.511

- We now drop momentarily the topic of chi-squared; we will come back to it later in the context of statistical inference

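A minimal R sketch of the chi-squared computation (chisq.test may warn about the small expected count in the diabetic row; the statistic itself matches):

counts <- matrix(c(31, 754, 13, 362, 9, 38), nrow = 3, byrow = TRUE)
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
components <- (counts - expected)^2 / expected  # Table 13 entries
round(components, 3)
sum(components)                # 25.511
chisq.test(counts)$statistic   # same value from R's built-in test
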
Bayes’s Theorem
É
É
Example: AIDS test in 1988 (Mendehall 1991: 94–96)
Events are denoted
É
É
É
É
É
disease is present
disease is absent
test is positive
test is negative
What we know:
É
É
É
É
É
A1 :
A2 :
B1 :
B2 :
P(A1 ) = .00001858 = 1.858 × 10−5 (prevalence of AIDS in
entire population in 1988)
P(A2 ) = 1 − P(A1 ) = .99998142
P(B1 |A1 ) = .98 (probability test is positive when disease is
present)
P(B1 |A2 ) = .01 (probability test is positive when disease is
absent)
P(A1 ) and P(A2 ) are called prior probabilities (i.e., prior to
knowing the result of the test)
51 / 114
Bayes’s Theorem
É
The questions that Bayes’s Theorem tries to answer are
É
É
What is P(A1 |B1 ) (probability that I have AIDS given that the
test is positive)?
What is P(A1 |B2 ) (probability that I have AIDS given that the
test is negative)?
É
These are good questions! (Especially if you are the one taking
the test.)
É
The answer is given by Bayes’s Theorem
P(Ai )P(Bj |Ai )
P(Ai |Bj ) = P
i P(Ai )P(Bj |Ai )
É
We do not use this formula right away
É
It helps to first represent the problem as a probability
distribution on a bivariate sample space, as in Table 14
52 / 114
Bayes’s Theorem
Table 14. Joint Probability Distribution of Presence of
Disease (A) and Test Result (B) for AIDS in 1988
Data
A1 : Disease present
A2 : Disease absent
B1 : Test positive
B2 : Test negative
Total
.00001821
.0099998142
.00000037
.9899816
.00001858
.99998142
.01002
.98998
1.000
Total
É
Steps (details next slide):
1.
2.
3.
4.
Entries in black are given
Calculate red entries as P(A1 ∩ B1 ) and P(A2 ∩ B1 )
Calculate green entries
From information in completed table calculate desired
probability as
P(A1 |B1 ) =
P(A1 ∩ B1 )
P(B1 )
=
.00001821
.01002
= .0018174
53 / 114
Bayes’s Theorem
Steps to Find P(A1 |B1 )
É
We use the following steps:
É
É
Step 1. At the outset we know the marginal probabilities
P(A1 ) and P(A2 ) = 1 − P(A1 ), shown in black in the table. The
goal is to fill the entire joint probability table with the
corresponding probabilities.
Step 2. Calculate
P(A1 ∩ B1 ) = P(A1 )P(B1 |A1 ) = (.00001858)(.98) = .00001821
P(A2 ∩ B1 ) = P(A2 )P(B1 |A2 ) = (.99998142)(.01) = .0099998142
É
These probabilities are shown in red in Table 14. They are
calculated using the multiplication formula, as in the
probability tree shown earlier.
Step 3. Calculate the remaining entries of Table 15, by
summing down the first column to find P(B1 ), then subtracting
to find the other entries. The resulting figures are shown in
green.
54 / 114
Bayes’s Theorem
Steps to Find P(A1 |B1 )
É
Step 4. Having the complete joint probability distribution,
calculate the desired conditional probability “the other way
around” as
P(A1 |B1 ) =
P(A1 ∩ B1 )
P(B1 )
=
.00001821
.01002
= .0018174
Thus, in 1988 the probability of having the disease given that
the test came up positive was only .0018174 (or about 2 in
1,000)!
É
É
The probability P(A1 |B1 ) = .0018174 is called the posterior
probability of A1 (i.e., after learning the result of the test)
Compare with the prior probability P(A1 ) = 1.858 × 10−5 (i.e.,
before learning the result)
55 / 114
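The whole calculation in a minimal R sketch:

p_A1 <- 0.00001858        # prior: disease present
p_A2 <- 1 - p_A1          # prior: disease absent
p_B1_given_A1 <- 0.98     # sensitivity
p_B1_given_A2 <- 0.01     # false-positive rate (1 − specificity)
p_B1 <- p_A1 * p_B1_given_A1 + p_A2 * p_B1_given_A2  # marginal P(B1)
p_A1 * p_B1_given_A1 / p_B1                          # posterior P(A1|B1): about 0.0018174
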
Bayes’s Theorem
É
Looking back at Bayes’s formula,
P(Ai )P(Bj |Ai )
P(Ai |Bj ) = P
i P(Ai )P(Bj |Ai )
it appears that is is really a shortcut for the strategy of first
reconstructing the complete joint probability distribution from
the known prior probabilities and conditional probabilities,
and then using information in the table to calculate the
desired conditional probability
É
In fact, the numerator is equal to P(Ai ∩ Bj ) (one of the joint
probabilities) and the denominator is equal to the marginal
probability P(Bj ), which is equal to the sum of the joint
probabilities in the cells of the table.
56 / 114
Random Variables

- A random variable is a variable whose value is a numerical outcome of a random phenomenon
  - E.g., tossing a coin with S = {H, T} is a random event. Now assign H = 1, T = 0. Tossing a coin has become a random variable with S = {0, 1}
  - E.g., throw a die. "Number of pips on the face" is a random variable with S = {1, 2, 3, 4, 5, 6}

Discrete Random Variables

- A discrete random variable X has a finite number of possible values
- The probability distribution of X lists all the values and their probabilities:

Value of X:    x1   x2   x3   ...   xk
Probability:   p1   p2   p3   ...   pk

- The probabilities pi must satisfy two requirements:
  1. Every probability pi is a number between 0 and 1
  2. p1 + p2 + · · · + pk = 1

Discrete Random Variable

Reproductive Success Among Xavante Indians⁴

[Figure]

⁴ Daly and Wilson 1983, Figure 5-6, p. 89

Discrete Random Variable

Reproductive Success of Female Xavante

Table 15. Probability Distribution of Reproductive Success (Total # of Children) of Xavante Females

x           0      1      2      3      4      5      6      7      8      Total
Frequency   1      7      7      7      7      7      4      3      1      44
P(X = x)    .023   .159   .159   .159   .159   .159   .091   .068   .023   1.000

- Table 15 shows the probability distribution of X for females, corresponding to the random trial "select a Xavante female randomly"
- E.g., the probability that a female has 6 or more children is
       P(X ≥ 6) = P(X = 6) + P(X = 7) + P(X = 8)
                = .091 + .068 + .023 = .182

Discrete Random Variable

Reproductive Success in Male Xavante

Table 16. Probability Distribution of Reproductive Success (Total # of Children) of Xavante Males

x       Frequency   P(X = x)
0       4           .065
1       12          .194
2       14          .226
3       7           .113
4       7           .113
5       6           .097
6       2           .032
7       7           .113
9       1           .016
11      1           .016
23      1           .016
Total   62          1.000

- Table 16 shows the probability distribution of X for males
- Q – What is the relative social status of the male who has 23 children?

Discrete Random Variable

Another Example: Toss of Four Coins

- What is the probability distribution of the variable X = "number of heads in four tosses of a coin"?
- X is the same as the sum of the four outcomes with H = 1, T = 0
- Assume:
  1. The coin is balanced – each toss is equally likely to give H or T
  2. The coin has no memory – tosses are independent
- Then each sequence of tosses – e.g., HTHH – has probability
       P(HTHH) = 1/2 × 1/2 × 1/2 × 1/2 = 1/16
- The number of heads X has possible values 0, 1, 2, 3, 4

Discrete Random Variable

Toss of Four Coins (cont'd)

- The event {X = 1} can occur in only four ways: HTTT, THTT, TTHT, TTTH, so that
       P(X = 1) = (count of ways X = 1 can occur) / 16 = 4/16 = 0.25
- We can find the probability for each value of X. The resulting probability distribution of X is

Value of X:    0        1      2       3      4
Probability:   0.0625   0.25   0.375   0.25   0.0625

Discrete Random Variable

Toss of Four Coins (cont'd)

- Again, the probability distribution of X is

Value of X:    0        1      2       3      4
Probability:   0.0625   0.25   0.375   0.25   0.0625

- From it one can calculate the probability of various events
  - E.g., the probability of "three or more heads" is
       P(X ≥ 3) = 0.250 + 0.0625 = 0.3125
  - For the probability of "at least one head" it is easier to use the complement rule:
       P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.0625 = 0.9375

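This distribution is binomial (see later), so R can generate it directly; a minimal sketch:

x <- 0:4
p <- dbinom(x, size = 4, prob = 0.5)  # 0.0625 0.25 0.375 0.25 0.0625
sum(p[x >= 3])                        # P(X >= 3) = 0.3125
1 - p[x == 0]                         # P(X >= 1) = 0.9375
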
Continuous Random Variables

- A continuous random variable X takes all values in an interval of numbers
- The probability distribution of X is described by a density curve such that
  1. The probability of any event is the area under the density curve and above the values of X that make up the event
  2. The total area under the density curve is 1
- In a continuous probability distribution only intervals have nonzero probability. Each individual outcome (a precise value of X) has probability 0
  - I.e., for any continuous distribution, P(X = x) = 0

Continuous Random Variables

Continuous Probability Model⁵

[Figure]

⁵ From Moore & McCabe 2006, Figure 4.10, p. 284

Continuous Random Variables

Uniform Probability Distribution⁶

- Examples of continuous uniform distributions:
  - Spinner ("continuous roulette")
  - Uniform number generator in computer programs

⁶ From Moore & McCabe 2006, Figure 4.9, p. 283

Continuous Random Variables

Normal Distributions

- N(µ, σ) denotes a normal distribution with mean µ and standard deviation σ
- If X is N(µ, σ), then the standardized variable
       Z = (X − µ) / σ
  is distributed as N(0, 1)
- We saw earlier how to use tables or software to calculate:
  - given a quantile z (a value of Z), calculate P(Z ≤ z)
  - given a probability α, find z such that P(Z ≤ z) = α

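Both calculations are one-liners in R; a minimal sketch:

pnorm(1.96)                   # given z, P(Z <= z): about 0.975
qnorm(0.975)                  # given alpha, the z with P(Z <= z) = alpha: about 1.96
pnorm(80, mean = 75, sd = 5)  # the same idea works for any N(mu, sigma)
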
Discovery of the Normal Distribution (1)

[Portrait: Pierre Simon Laplace (1749–1827)]

Discovery of the Normal Distribution (2)

[Portrait: Carl Friedrich Gauss (1777–1855), younger]

Discovery of the Normal Distribution (3)

[Portrait: Carl Friedrich Gauss (1777–1855), older]

Descendants of the Normal Distribution

χ² (Chi-squared), t, and F Distributions

- At the end of the 19th century and the beginning of the 20th century, the main development of statistics takes place in Great Britain
- Karl Pearson, William Gosset ("Student"), and Ronald Fisher derive three families of distributions from N(0, 1) that arise naturally in statistical problems:
  - χ²(df) is a sum of df squared zs
    - Arises in the distribution of a sum of squared discrepancies
  - t is a ratio of a z to the square root of a χ² divided by its df
    - Arises in the distribution of the sample mean divided by its standard error
  - F is a ratio of two χ²s, each divided by its df
    - Arises in the distribution of the ratio of two variances

Karl Pearson (1857–1936)
William Sealey Gosset, a.k.a. "Student" (1876–1937)

- William Gosset was working for the Guinness brewery in Dublin
- Company policy did not allow employees to publish under their own names
- He used the pen name "Student" to publish his famous paper on the t distribution in Biometrika in 1908

Sir Ronald Aylmer Fisher (1890-1962)
Characteristics of χ², t, and F

[Figure]

Means & Variances of Random Variables

- The mean µX of a random variable X is similar to the average x̄ but takes into account the fact that events are not equally likely
- Suppose X is a discrete random variable with distribution

Value of X:    x1   x2   x3   ...   xk
Probability:   p1   p2   p3   ...   pk

- To find the mean of X, add the products of each possible value of X multiplied by its probability:
       µX = x1p1 + x2p2 + . . . + xkpk = Σ xipi
- µX is also called the expected value or expectation of X, denoted E(X)

Means & Variances of Random Variables

- The variance σX² of a random variable X is similar to the sample variance s² but takes into account the fact that events are not equally likely, i.e.,
  - The variance is the average value of the squared deviations (X − µX)² of the variable X from its mean µX
- Suppose X is a discrete random variable with distribution

Value of X:    x1   x2   x3   ...   xk
Probability:   p1   p2   p3   ...   pk

  and that µX is the mean of X. The variance of X is
       σX² = (x1 − µX)²p1 + (x2 − µX)²p2 + . . . + (xk − µX)²pk = Σ (xi − µX)²pi
- The standard deviation σX of X is the square root of the variance σX²

Means & Variances of Random Variables

Example: Mean µX & Variance σX² of Number of Children of Xavante Women

Table 17. µX & σX² of Number of Children of Xavante Females

xi      pi      xi pi        (xi − 3.591)² pi
0       0.023   0.000        (0 − 3.591)²(0.023) = 0.293
1       0.159   0.159        (1 − 3.591)²(0.159) = 1.068
2       0.159   0.318        (2 − 3.591)²(0.159) = 0.403
3       0.159   0.477        (3 − 3.591)²(0.159) = 0.056
4       0.159   0.636        (4 − 3.591)²(0.159) = 0.027
5       0.159   0.795        (5 − 3.591)²(0.159) = 0.316
6       0.091   0.545        (6 − 3.591)²(0.091) = 0.528
7       0.068   0.477        (7 − 3.591)²(0.068) = 0.792
8       0.023   0.182        (8 − 3.591)²(0.023) = 0.442
Total   1.00    µX = 3.591   σX² = 3.924

- Thus µX = 3.591, σX² = 3.924 & σX = √3.924 = 1.981

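The same arithmetic in a minimal R sketch:

x <- 0:8
p <- c(.023, .159, .159, .159, .159, .159, .091, .068, .023)
mu <- sum(x * p)               # 3.591
sigma2 <- sum((x - mu)^2 * p)  # 3.924
c(mean = mu, var = sigma2, sd = sqrt(sigma2))
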
Correlation of Two Random Variables

- We saw earlier that the sample correlation rXY of two variables X and Y is the sum of the products of the standardized scores ZX and ZY divided by n − 1:
       rXY = (1/(n − 1)) Σ ZX ZY
- A similar formula exists for the correlation ρ between two random variables X and Y with joint probability distribution pij
- The formula is
       ρ = Σ zX zY pij
  where the sum is over all cells (i, j) of the joint distribution
- The formula for the correlation is illustrated by an example of the relationship between subsistence technology X and belief centralization Y, based on data from the Ethnographic Atlas data set; the theory is apocryphally attributed to Dr. Grossgrabenstein

Correlation of Two Random Variables

The Grossgrabenstein Theory: Subsistence Technology & Monotheism

[Figure]

Correlation of Two Random Variables

Subsistence Technology & Monotheism

Table 18. Joint Frequency Distribution of Technology Score (X) & Monotheism Score (Y)

Y\X     1    2    3     4    Total
1       51   26   27    15   119
2       25   15   67    4    111
3       7    1    16    3    27
4       2    1    21    44   68
Total   85   43   131   66   325

Technology Score (X):
1  Hunting and gathering
2  Simple horticultural: plant cultivation
3  Advanced horticultural: plant cultivation + metallurgy
4  Agrarian: plant cultivation + metallurgy + plow

Monotheism Score (Y):
1  No High Gods
2  High Gods, but not active in human affairs
3  High Gods, active in human affairs but do not support human morality
4  High Gods, active in human affairs and support human morality

Correlation of Two Random Variables

Subsistence Technology & Monotheism

Table 19. Joint Probability Distribution of Technology Score (X) & Monotheism Score (Y)

Y\X     1       2       3       4       Total
1       0.157   0.080   0.083   0.046   0.366
2       0.077   0.046   0.206   0.012   0.342
3       0.022   0.003   0.049   0.009   0.083
4       0.006   0.003   0.065   0.135   0.209
Total   0.262   0.132   0.403   0.203   1.000

- From the marginal probabilities first calculate
  - µX = 2.548
  - σX² = 1.177 and σX = 1.085
- And also
  - µY = 2.135
  - σY² = 1.268 and σY = 1.126

Correlation of Two Random Variables

Subsistence Technology & Monotheism

Table 20. Correlation of Technology Score (X) & Monotheism Score (Y)

X       Y   pij     zX       zY       zX zY pij
1       1   0.157   −1.427   −1.008   0.226
1       2   0.077   −1.427   −0.120   0.013
1       3   0.022   −1.427   0.768    −0.024
1       4   0.006   −1.427   1.656    −0.014
2       1   0.080   −0.505   −1.008   0.041
2       2   0.046   −0.505   −0.120   0.003
2       3   0.003   −0.505   0.768    −0.001
2       4   0.003   −0.505   1.656    −0.003
3       1   0.083   0.417    −1.008   −0.035
3       2   0.206   0.417    −0.120   −0.010
3       3   0.049   0.417    0.768    0.016
3       4   0.065   0.417    1.656    0.045
4       1   0.046   1.338    −1.008   −0.062
4       2   0.012   1.338    −0.120   −0.002
4       3   0.009   1.338    0.768    0.009
4       4   0.135   1.338    1.656    0.299
Total                                 ρXY = 0.500

- The correlation is calculated by "unrolling" the cells of the table row by row and summing up zX zY pij over all the cells
- The correlation of technology with monotheism is ρXY = 0.500, a moderate positive correlation
  - More technologically advanced (preindustrial) societies tend to have more monotheistic beliefs

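A minimal R sketch of this calculation, starting from the Table 18 counts (rows are Y = 1..4, columns X = 1..4):

counts_xy <- matrix(c(51, 26, 27, 15,
                      25, 15, 67,  4,
                       7,  1, 16,  3,
                       2,  1, 21, 44), nrow = 4, byrow = TRUE)
p <- counts_xy / sum(counts_xy)  # joint probabilities pij, as in Table 19
x <- 1:4; y <- 1:4
px <- colSums(p); py <- rowSums(p)
mux <- sum(x * px); muy <- sum(y * py)
zx <- (x - mux) / sqrt(sum((x - mux)^2 * px))  # standardized X scores
zy <- (y - muy) / sqrt(sum((y - muy)^2 * py))  # standardized Y scores
sum(outer(zy, zx) * p)                         # rho = sum of zX zY pij, about 0.50
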
Correlation of Two Random Variables

Dr. Grossgrabenstein's Misfortunes

[Figure]

Linear Functions of Random Variables

Rules for Means

1. If X is a random variable and a and b are fixed numbers, then
       µa+bX = a + bµX
2. If X and Y are random variables, then
       µX+Y = µX + µY
- Example 1. The mean temperature in Chapel Hill on 5 October is 75°F (X); what is it in degrees Celsius (Y)?
       µY = (5/9)(µX − 32) = (5/9)(75 − 32) = 23.9°C
- Example 2. Inspection of newly made refrigerators for surface defects finds an average of 0.7 dimples and 1.4 sags per refrigerator.⁷ What is the mean of the total number of defects (number of dimples + number of sags)? If µX = 0.7 and µY = 1.4, then
       µX+Y = µX + µY = 0.7 + 1.4 = 2.1 defects

⁷ Example from Moore & McCabe 2006, p. 298

Linear Functions of Random Variables

Rules for Variances

- Caution:
  - The mean of a sum of random variables is always the sum of their means
  - But this rule is true for variances only in special situations – when X and Y are independent
  - When random variables are not independent, the variance of their sum depends on the correlation between them as well as on their individual variances
  - The correlation ρ is a number between −1 and 1; it measures the direction and strength of the linear relationship between two variables
  - The correlation between two independent variables is zero

Linear Functions of Random Variables

Rules for Variances

1. If X is a random variable and a and b are fixed numbers, then
       σ²a+bX = b²σX²
2. If X and Y are independent random variables, then
       σ²X+Y = σX² + σY²
       σ²X−Y = σX² + σY²
   Caution! This is the addition rule for variances of independent random variables
3. If X and Y have correlation ρ, then
       σ²X+Y = σX² + σY² + 2ρσXσY
       σ²X−Y = σX² + σY² − 2ρσXσY
   This is the general addition rule for variances of random variables

Linear Functions of Random Variables

Example of Rules for Variances: Total SAT Scores⁸

- The total SAT score is the sum of the math score (X) and the verbal score (Y). In a recent year the means and standard deviations of the scores were
       SAT math score X:     µX = 519,  σX = 115
       SAT verbal score Y:   µY = 507,  σY = 111
- The total SAT score is X + Y. Its mean is
       µX+Y = µX + µY = 519 + 507 = 1026
- The variance and standard deviation cannot be computed with the information given, because X and Y are not independent – we need to know the correlation between them

⁸ Example from Moore & McCabe 2006, pp. 303–304

Linear Functions of Random Variables

Example of Rules for Variances: Total SAT Scores (cont'd)

- The correlation between math and verbal scores was ρ = 0.71. By variance Rule 3:
       σ²X+Y = σX² + σY² + 2ρσXσY
             = (115)² + (111)² + (2)(0.71)(115)(111)
             = 43,672
- The variance of the sum is greater than the sum of the variances, because X and Y move up and down together
- Finally, find the standard deviation:
       σX+Y = √43,672 = 209
- To sum up, the total SAT score has mean µX+Y = 1026 and standard deviation σX+Y = 209

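The same calculation in a minimal R sketch:

mu_x <- 519; sd_x <- 115  # SAT math
mu_y <- 507; sd_y <- 111  # SAT verbal
rho <- 0.71
mu_x + mu_y                                    # mean of total: 1026
sqrt(sd_x^2 + sd_y^2 + 2 * rho * sd_x * sd_y)  # SD of total: about 209
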
Jacob Bernoulli (Basel 1654–1705)

The family were refugees from Antwerp; Ars Conjectandi was published in 1713

[Portrait]

Jacob Bernoulli (older)

[Portrait]

Jacob Bernoulli (stamp)

[Stamp image – note the implausible sequence of proportions (see later)]

Law of Large Numbers (1)

- Jacob Bernoulli contributed to the discovery of a major phenomenon of probability, the Law of Large Numbers:
  - In the long run, the proportion of a certain outcome of a random trial (say, heads turning up when tossing a coin) tends to stabilize at a fixed value
  - Yet the outcome of any one trial is independent of previous outcomes
- This is counterintuitive:
  - People naturally tend to believe in a sort of "Law of Small Numbers"
  - People do not normally expect the long "runs" of the same outcome (say, heads in tossing a coin) that occur in true random processes

Law of Large Numbers (2)

[Figure: Law of Large Numbers – two simulations of tossing a fair coin 5,000 times; x-axis: Number of Tosses (1 to 5,000), y-axis: Proportion of Heads (0.0 to 1.0)]

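A minimal R sketch that reproduces one such simulation:

set.seed(1)
tosses <- rbinom(5000, size = 1, prob = 0.5)      # 5,000 fair-coin tosses
prop_heads <- cumsum(tosses) / seq_along(tosses)  # running proportion of heads
plot(prop_heads, type = "l", log = "x",
     xlab = "Number of Tosses", ylab = "Proportion of Heads")
abline(h = 0.5, lty = 2)  # the proportion settles near 0.5 in the long run
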
Sampling Distributions Revisited

Population & Sample

- The population distribution of a variable is:
  1. the distribution of its values for all members of the population
     - E.g., the distribution of IQ test scores in the Belgian population
  2. the probability distribution of the variable when choosing one individual at random from the population
     - E.g., choose one Belgian randomly and record the IQ
- A statistic (e.g., x̄, p̂, b1) calculated from a random sample or randomized experimental group is a random variable
- The probability distribution of a statistic is its sampling distribution
- In the remainder of Module 6 we look at the sampling distributions of:
  - counts & proportions
  - sample means

Binomial Distributions

Count X & Proportion p̂

- In general, X is a count of the occurrences of some outcome in a fixed number of observations n
  - E.g., in an agricultural experiment n plants are treated for a fungus; the number X of plants with the fungus is a random variable
- The sample proportion is p̂ = X/n
  - E.g., in the experiment X = 9 out of n = 32 plants have the fungus. The sample proportion is
       p̂ = 9/32 = 0.281
- The binomial setting is:
  1. There is a fixed number n of observations
  2. The n observations are all independent
  3. Each observation can be classified as a "success" (1) or a "failure" (0)
  4. The probability p of a success is the same for each observation

Binomial Distributions

Binomial Distribution

- The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n (number of observations) and p (probability that any one observation is a success)
  - The possible values of X are the whole numbers from 0 to n
  - In abbreviation, one says that X is B(n, p)
- E.g., a child of a specific couple has probability p = 0.25 of being blood type O. Suppose the couple has n = 5 children. Then the number X of their children with blood type O is distributed as B(5, 0.25)
  - Possible values of X are 0, 1, 2, 3, 4, 5
  - The probability distribution of X is (see why later)

X:          0        1        2        3        4        5
P(X = x):   0.2373   0.3955   0.2637   0.0879   0.0146   0.001

Binomial Distributions

Binomial Distribution

- Choosing an SRS (without replacement) from a population with proportion p of successes is not exactly a binomial setting
  - E.g., draw 10 cards from a deck, with "red card" a success. Then the probability of red on the second card is not independent of the color of the first card
- However, if the population is much larger than the sample – say, 20 times as large – the count X of successes in an SRS of size n has approximately the binomial distribution B(n, p)
  - E.g., draw a sample with n = 200 from about 8,000 graduate students at UNC. "Success" is: student is female. Suppose p = 0.57. Then the number of females X is distributed (almost exactly) as B(200, 0.57)

Binomial Distributions

Finding Binomial Probabilities (1)

1. Calculator on the Web
   - http://rockem.stat.sc.edu/prototype/calculators/index.php3
2. Table of binomial probabilities
   - E.g., Table C in Moore & McCabe (2006)
3. Software – R
   - Finding P(X = x):
     > # P(exactly 2 children out of 5 with type O blood)
     > dbinom(2,5,0.25)
     [1] 0.2636719
   - Finding P(X ≤ x):
     > # P(2 or fewer children out of 5 with type O blood)
     > pbinom(2,5,0.25)
     [1] 0.8964844

Binomial Distributions

Finding Binomial Probabilities (2)

4. Software – Stata
   - Finding P(X = x):
     . * P(exactly 2 children out of 5 with type O blood)
     . display Binomial(5,2,0.25) - Binomial(5,3,0.25)
     .26367188
   - Finding P(X ≤ x):
     . * P(2 or fewer children out of 5 with type O blood)
     . display 1 - Binomial(5,3,0.25)
     .89648438
   - Note: In Stata the function Binomial(n,k,p) returns P(X ≥ k). It has to be spelled with a capital B.

Binomial Distributions

Finding Binomial Probabilities (3)

5. Using the binomial formulas (optional; see Moore & McCabe 2006, pp. 348–350)
   - Binomial coefficient – The number of ways of arranging k successes among n observations is given by the binomial coefficient ("n choose k")
         C(n, k) = n! / (k!(n − k)!)
     for k = 0, 1, . . . , n. In the formula, the factorial n! for any positive integer n is defined as
         n! = n × (n − 1) × (n − 2) × . . . × 2 × 1
     and also 0! = 1.
   - Binomial probability – If X has distribution B(n, p), the binomial probability that X = k (for k = 0, 1, . . . , n) is
         P(X = k) = C(n, k) p^k (1 − p)^(n−k)

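A minimal R sketch checking the formula against dbinom:

n <- 5; k <- 2; p <- 0.25
choose(n, k) * p^k * (1 - p)^(n - k)  # by the binomial formula: 0.2636719
dbinom(k, n, p)                       # R's built-in: same value
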
Binomial Distributions

Origin of the Binomial Formula

[Figure: origin of the binomial formula]

Binomial Distributions

Binomial Mean & Standard Deviation

- If a count X is B(n, p), what are the mean µX and the standard deviation σX of X?
- To find out, view X as the sum of n independent random variables Si. Each Si has the same probability distribution:

Outcome:       1   0
Probability:   p   1 − p

- For a single Si (which, BTW, is called a Bernoulli trial),
       µS = (1)(p) + (0)(1 − p) = p
       σS² = p(1 − p)
- Then for X = S1 + S2 + · · · + Sn,
       µX = µS1 + µS2 + · · · + µSn = nµS = np
       σX² = nσS² = np(1 − p)

Binomial Distributions

Mean & Standard Deviation of Count & Proportion

- If a count X has the binomial distribution B(n, p), then
       µX = np
       σX = √(np(1 − p))
- Our estimator of the proportion p of "successes" in the population is the sample proportion
       p̂ = (count of successes in sample) / (size of sample) = X/n
- If p̂ is the sample proportion of successes in an SRS of size n from a large population with proportion p of successes,⁹
       µp̂ = p
       σp̂ = √(p(1 − p)/n)

⁹ Check that this follows from the rules for linear functions of random variables.

Binomial Distribution

Normal Approximation of Counts & Proportions

- Implications of the mean and standard deviation of p̂:
  1. µp̂ = p implies p̂ is unbiased
  2. σp̂ = √(p(1 − p)/n) implies that to cut the standard deviation of p̂ in half one must multiply n by four
- Normal approximation for counts & proportions:
  - In an SRS of size n from a large population, when n is large,
       X is approximately N(np, √(np(1 − p)))
       p̂ is approximately N(p, √(p(1 − p)/n))
    where p is the proportion of successes in the population, and X and p̂ = X/n are the count and proportion of successes in the sample, respectively

Binomial Distribution

Normal Approximation of Counts & Proportions

- Rule of thumb for the normal approximation: n & p satisfy
  - np ≥ 10, and
  - n(1 − p) ≥ 10

Binomial Distribution

Normal Approximation of Counts & Proportions

- E.g., SRS of n = 200 from a population of 8,000 UNC graduate students with proportion of females p = .57. What is P(p̂ ≤ 0.5) (i.e., the sample has fewer females than males)?
  - np = 200 × 0.57 = 114 > 10 and n(1 − p) = 200 × 0.43 = 86 > 10, so the rule of thumb is satisfied
  - Using binomial probabilities: X is distributed as B(200, 0.57). p̂ = 0.5 corresponds to X = 100. P(X ≤ 100) = 0.02734091, or .027
  - Using the normal approximation: µp̂ = p = 0.57; σp̂ = √(p(1 − p)/n) = 0.03500714;
       P(p̂ ≤ 0.5) = P((p̂ − 0.57)/0.03500714 ≤ (0.5 − 0.57)/0.03500714)
                  = P(Z ≤ −1.999592) = 0.02277217

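Both routes in a minimal R sketch:

n <- 200; p <- 0.57
pbinom(100, n, p)            # exact binomial: 0.02734091
se <- sqrt(p * (1 - p) / n)  # 0.03500714
pnorm((0.5 - p) / se)        # normal approximation: 0.02277217
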
Sampling Distribution of the Sample Mean

Experimental Study of the Sampling Distribution of x̄ with n = 3, n = 10, n = 100

- (a) Population distribution of X (income)
- Distribution of x̄ for 600 samples:
  - (b) n = 3
  - (c) n = 10
  - (d) n = 100

[Figure: panels (a)–(d)]

Sampling Distribution of the Sample Mean

Experimental Study of the Sampling Distribution of x̄ with n = 3, n = 10, n = 100

Income Sampling Experiment

Data         Mean     SD
Population   22.172   15.635
n = 3        22.584   9.376
n = 10       21.955   4.916
n = 100      22.176   1.193

- The experimental results suggest the following conjectures:
  1. The distribution of values of x̄ for an SRS is centered around the population mean µX, regardless of sample size
  2. The standard deviation σx̄ of the values of x̄ decreases with increasing sample size – i.e., as n increases the distribution of x̄ values becomes more concentrated around the population mean µX
  3. The distribution of x̄ values becomes more symmetrical as the sample size becomes larger and is approximately normal for large n

Sampling Distribution of the Sample Mean

Theoretical Development: Mean & Standard Deviation of x̄

- The mean x̄ of an SRS is a random variable
       x̄ = (1/n)(X1 + X2 + · · · + Xn)
- If the population has mean µ, then by the addition rule for a sum of random variables
       µx̄ = (1/n)(µX1 + µX2 + · · · + µXn)
           = (1/n)(µ + µ + · · · + µ) = µ
- Thus the mean of x̄ is µ, the same as the mean of the population
  - I.e., x̄ is an unbiased estimator of µ

Sampling Distribution of the Sample Mean

Theoretical Development: Mean & Standard Deviation of x̄

- Because the observations Xi are independent, by the addition rule for variances
       σx̄² = (1/n)²(σX1² + σX2² + · · · + σXn²)
           = (1/n)²(σ² + σ² + · · · + σ²)
           = σ²/n
- Thus for an SRS of size n from a population with mean µ and standard deviation σ,
       µx̄ = µ
       σx̄ = σ/√n

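These two results are easy to see by simulation; a minimal R sketch (using a right-skewed stand-in population, not the actual income data from the experiment):

set.seed(1)
population <- rexp(10000, rate = 1/22)  # skewed "income" population, mean about 22
xbar <- replicate(600, mean(sample(population, 100)))
c(mean = mean(xbar), sd = sd(xbar))     # about mu and sigma/sqrt(100)
hist(xbar)                              # roughly normal, anticipating the CLT
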
Sampling Distribution of the Sample Mean

Experimental Study of the Sampling Distribution of x̄ with n = 3, n = 10, n = 100

Table 21. Income Sampling Experiment Results and Theoretical Values Compared (600 Samples)

Data         Mean     SD       µx̄       σx̄      Fpcf¹⁰   σx̄*¹¹
Population   22.172   15.635   —        —       —        —
n = 3        22.584   9.376    22.172   9.027   0.994    8.974
n = 10       21.955   4.916    22.172   4.944   0.980    4.846
n = 100      22.176   1.193    22.172   1.564   0.781    1.221

¹⁰ Finite population correction factor √(1 − n/N) with N = 256 and n = 3, 10, 100
¹¹ Finite-population-corrected standard error σx̄* = (σX/√n) × √(1 − n/N)

Sampling Distribution of the Sample Mean

Why Does the Distribution of x̄ Become Normal When n Increases?

- In the income sampling experiment:
  - The distribution of income in the population is not normal (it is skewed to the right)
  - Even so, the distribution of the sample mean x̄ becomes symmetric & "normal-looking" as n increases
- This is due to a very important natural phenomenon, called the Central Limit Theorem:
  - Draw an SRS of size n from any population with mean µ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately normal, so that
       x̄ is approximately N(µ, σ/√n)
- The normal approximation for sample proportions & counts is also an instance of the CLT
- Special case: the mean of an SRS from a normal population is also normally distributed (for any n)