Soci708 – Statistics for Sociologists Module 6 – Probability & Sampling Distributions1 François Nielsen University of North Carolina Chapel Hill Fall 2008 1 Adapted in part from slides for the course Quantitative Methods in Sociology (Sociology 6Z3) taught at McMaster University by Robert Andersen (now at University of Toronto) 1 / 114 Why Study Probability? Probability is essential for several reasons É É We calculate statistics (e.g., mean, median, variance, regression coefficients, etc.) from a sample of units drawn at random from a population Thus the statistic calculated from the sample is the result of a random process É In fact, it is a random variable É Thus we need to study probability to figure out how sample statistics relate to population parameters É Also, the notion of conditional probability is essential to understanding association between variables 2 / 114 Origins of Probability Theory (1) Origin in gambling circles in 17th Century France É Gambler Chevalier de Méré contacted mathematical friends Blaise Pascal and Pierre de Fermat with gambling questions Blaise Pascal (1623–1662) 3 / 114 Origins of Probability Theory (2) É Subsequent correspondence between Pascal and Fermat is the origin of probability theory Pierre de Fermat (1601–1665) 4 / 114 Origins of Probability Theory (3) Other portrait of Fermat (younger) 5 / 114 Random Trials, Sample Spaces, and Events Random Trials É É A random trial is an activity in which there are two or more different possible outcomes, and uncertainty exists in advance as to which outcome will occur. Examples: É É É É Throw one standard die Throw two standard dice at the same time Draw one society at random from a set of 325 societies in the Ethnographic Atlas cross-classified by beliefs in high gods and by subsistence technology Draw one childbirth at random from a set of childbirths to Pima mothers cross-classified according to diabetic status of mother and presence of one or more birth defects in the newborn (next slide) 6 / 114 Random Trials (2) É An example of random trial is drawing a birth at random from the 1207 births represented in the following table: Table 1. Child Birth Defects by Mother’s Diabetic Status Among Pima Indian Mothers Data É B1 : 1+ Defect B2 : No Defect Total A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 31 13 9 754 362 38 785 375 47 Total 53 1154 1207 Various proportions calculated from the table can be interpreted as probabilities of drawing a case with certain characteristics 7 / 114 Random Trials (3) É The cells of the contingency table contain the joint frequencies, or counts corresponding to each combination of categories. É The row marginal frequencies (in the last column marked Total) and the column marginal frequencies (in the last row marked Total) are calculated by adding up the cell frequencies within the corresponding row and column, respectively É The row marginals and the column marginals both add up to N, the total number of cases (1207 in Table 1) É We saw all this earlier 8 / 114 Probability Models (1) É The different possible outcomes of a random trial are called the basic outcomes; the set of all basic outcomes is called the sample space, e.g., É The sample space associated with throwing a standard die is {1 dot, 2 dots, 3 dots, 4 dots, 5 dots, 6 dots} É The sample space associated with throwing 2 standard dice is . . . Q – What is it? 
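A quick way to answer the two-dice question is to enumerate the sample space by machine. The following minimal sketch in base R (not part of the original slides; the object names are ours) lists the ordered pairs and treats an event as a subset of them:

# Sample space for throwing two standard dice: one basic outcome per ordered pair
two_dice <- expand.grid(die1 = 1:6, die2 = 1:6)
nrow(two_dice)                                   # 36 basic outcomes
head(two_dice, 3)                                # (1,1), (2,1), (3,1), ...

# With fair dice each ordered pair has probability 1/36, so an event such as
# "the two dice total 7" has probability equal to its share of the outcomes
sum(two_dice$die1 + two_dice$die2 == 7) / nrow(two_dice)   # 6/36, i.e. 1/6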
The sample space associated with drawing a birth at random from the contingency table is depicted in Table 2, where each Oi symbolizes a basic outcome of the random trial É É For example O3 represents a birth where the mother is diagnosed as prediabetic and the child is found to have one or more birth defects 9 / 114 Probability Models (2) Table 2. Sample Space for the Random Trial, Selecting a Birth at Random Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic É B1 : 1+ Defect B2 : No Defect O1 O3 O5 O2 O4 O6 The sample space can be univariate (as in the throw of one standard die), bivariate (as in Table 2), or multivariate 10 / 114 Probability Models (3) É An event is a subset of the basic outcomes that constitute the sample space É É An event is said to occur if any one of its basic outcomes is realized in the random trial Notation (refer to Table 2): É É É É Oi ∈ E, where Oi is an outcome and E is an event means “basic outcome Oi belongs to event E” E = {O1 , O3 , O6 } where E is an event and the Oi are basic outcomes means “event E consists of basic outcomes O1 , O3 and O6 }” E = {Oi |B=B2 } means “event E consists of all basic outcomes Oi such that B is equal to B2 ”; thus E consists of O2 , O4 , and O6 Ø = {} means the null event that consists of no outcome 11 / 114 Probability Models (4) Complementation, Addition, & Intersection of Events (NWW Figure 4.4 p. 113) 12 / 114 Probability Models (5) Complementation, Addition, & Intersection of Events É É The set of all basic outcomes not contained in an event E is called the complementary event to E and is denoted by Ec The joint occurrence of two events E1 and E2 is another event, denoted E1 ∩ E2 , that consists of the basic outcomes common to E1 and E2 É É É The occurrence of event E1 or event E2 is another event, to be denoted by E1 ∪ E2 , that consists of all basic outcomes contained in either E1 or E2 or in both E1 and E2 É É É E1 ∩ E2 is also called the intersection of E1 and E2 E1 ∩ E2 is equivalent to E1 and E2 E1 ∪ E2 is called the union of E1 and E2 E1 ∪ E2 is equivalent to E1 or E2 If events E1 and E2 have no basic outcomes in common, they are said to be mutually exclusive or disjoint events 13 / 114 Probability Andrey Nikolaievitch Kolmogorov (Tambov 1908–Moskow 1987) provided axiomatic foundations of probability in 1933 Younger Older as academician 14 / 114 Probability Rules (1) Probability Rules – The first three rules are called Kolmogorov’s Axioms 1. The probability P(A) of any event A is a number between 0 and 1, i.e. 0 ≤ P(A) ≤ 1 2. The probability that some basic outcome in the sample space will occur is 1 and the probability that none will occur is 0, i.e. P(S) = 1 and P(;) = 0 3. Addition rule for disjoint events. If two events A and B are mutually exclusive or disjoint (i.e., they have no outcomes in common and so can never occur together), P(A ∪ B) = P(A) + P(B) Generally any countable sequence of pairwise disjoint events E1 , E2 , . . . satisfies X P(E1 ∪ E2 ∪ · · · ) = P(Ei ) i 15 / 114 Probability Rules (2) Probability Rules – The first three rules are called Kolmogorov’s Axioms 4. Complement rule. If Ac denotes the complement of an event A (i.e., the event that A does not occur), the rule states that P(Ac ) = 1 − P(A) 16 / 114 Meaning of Probability Objective & Subjective Approaches to Probability É É Example: What is the probability of rain in Chapel Hill on 10 September? Interpretations: 1. 
Objective Interpretation: Probability is associated with the relative frequency of occurrence of the event in the long run under constant causal conditions; applies only to repeatable events. Also, probability by construction (e.g., in manufacturing dice) 2. Subjective Interpretation: Based on personal assessment; What is the probability that Napoleon was poisoned? Applies also to non-repeatable events. 17 / 114 Probability Expressed as Odds 1. From Probability to Odd: É É É A is the event “My paper is rejected by Social Forces” Suppose P(A) = .8; then P(Ac ) = 1 − P(A) = .2 Then Odd(A) = P(A)/P(Ac ) = .8/.2 = “4 to 1” (the ratio is often expressed using integers) 2. From Odd to probability: É É É É Suppose the odd of A is “d1 to d2 ” (where d1 and d2 are integers). Then P(A) = d1 /(d1 + d2 ). E.g., suppose the odd of dying next year (for a given age and sex category) is 1 to 499 Then P(dying next year) = 1/(1 + 499) = 1/500 = .002 18 / 114 Probability Distributions Concept of Probability Distribution É É A probability distribution is the assignment of probabilities to each of the basic outcomes in the sample space In the Pima example, the univariate or marginal probability distributions associated with drawing a birth at random from the contingency table of birth defects by mother’s diabetic status are shown in Table 3a & Table 3b É É For example, the probability of drawing a prediabetic mother is .311; it is obtained by dividing the number of prediabetic mothers by the total number of cases Other univariate probabilities are obtained in the same way, by dividing marginal frequencies by N. 19 / 114 Table 3a. Univariate (Marginal) Distribution of Mother’s Diabetic Status Among Pima Indian Mothers (N=1207) Outcome Probability Symbol A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 0.650 0.311 0.039 P(A1 ) P(A2 ) P(A3 ) Total N 1.000 1207 P(S) 20 / 114 Table 3b. Univariate (Marginal) Distribution of Birth Defects Among Births to Pima Indian Mothers (N=1207) Outcome Probability Symbol B1 : 1+ Defect B2 : No Defect 0.044 0.956 P(B1 ) P(B2 ) Total N 1.000 1207 P(S) 21 / 114 É The bivariate (or, generally, joint or multivariate) probability distribution associated with drawing a birth at random from the contingency table is shown in Table 4. É For example the probability of drawing a case for which the mother is nondiabetic and the child has no defect is .625 É This and other probabilities in Table 4 are obtained by dividing cell frequencies in Table 1 by N (1207). É Probabilities within the table (not counting the marginal frequencies) add up to 1. 22 / 114 Table 4. 
Bivariate (Joint) Probability Distribution for Diabetic Status of Mother (A) and Birth Defect (B) (N=1207) Data B1 : 1+ Defect B2 : No Defect Total A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 0.026 0.011 0.007 0.625 0.300 0.031 0.650 0.311 0.039 Total 0.044 0.956 1.000 23 / 114 Joint & Marginal Probability Distributions É É The joint or bivariate probability distribution corresponds to probabilities of the outcomes consisting in the intersections of the categories of the two variables Thus the joint probability distribution of Table 4 can be represented in symbolic form as in Table 5 – each cell corresponds to the probability of joint occurrence, or intersection of two univariate events É É For example the probability .300 associated with drawing a case with a prediabetic mother and a child with no defect corresponds to the probability P(A2 ∩ B2 ) of the joint occurrence of A2 (Prediabetic) and B2 (No Defect) The marginals of Tables 4 and 5 represent the probabilities of the univariate events Ai and Bj , same as in Tables 3a and 3b. 24 / 114 Table 5. Bivariate (Joint) Probability Distribution for Diabetic Status of Mother (A) and Birth Defect (B) in Symbolic Form Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic Total B1 : 1+ Defect B2 : No Defect Total P(A1 ∩ B1 ) P(A2 ∩ B1 ) P(A3 ∩ B1 ) P(A1 ∩ B2 ) P(A2 ∩ B2 ) P(A3 ∩ B2 ) P(A1 ) P(A2 ) P(A3 ) P(B1 ) P(B2 ) P(S) 25 / 114 Marginal Probabilities É The marginal probabilities for a bivariate probability distribution are found as X P(Ai ) = P(Ai ∩ Bj ) j P(Bj ) = X P(Ai ∩ Bj ) i where the sums are over all events Bj and Ai , respectively É For example, in Table 4 the marginal probability P(A1 ) is equal to .650 and is equal to the sum of the probabilities in the entries of the first row (.026 + .625 = .650, rounded) 26 / 114 Conditional Probabilities É É If E1 and E2 are any two events and P(E2 ) is not equal to 0, the conditional probability of E1 given E2 , denoted P(E1 |E2 ), is defined as P(E1 ∩ E2 ) P(E1 |E2 ) = P(E2 ) For example, the probability that a child has birth defects (B1 ) given that the mother is diabetic (A3 ) is obtained from figures in Table 4 as P(B1 |A3 ) = P(B1 ∩ A3 ) P(A3 ) = .007 .039 = .191 27 / 114 Conditional Probability Distributions É É The conditional probability distribution of a variable B (e.g., presence of birth defects) given the value of another variable A (e.g., a certain diabetic status of the mother) assigns to each basic outcome of B its conditional probability, given the value of A The conditional probability distributions of birth defects given diabetic status of the mother are shown in Table 6 É É The rows of Table 6 correspond to the conditional distributions of B corresponding to a given value of A (A1 , A2 , or A3 ) I.e., there are three conditional distributions of B, one for each value of A 28 / 114 Table 6. 
Conditional Probability Distributions of Birth Defect Given Diabetic Status of Mother Among Pima Indian Mothers (N=1207) Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic B1 : 1+ Defect B2 : No Defect Total N 0.039 0.035 0.191 0.961 0.965 0.809 1.000 1.000 1.000 785 375 47 29 / 114 Conditional Distributions (3) Conditional Distribution & Patterns of Association É The conditional distributions of the response variable (here birth defects) given categories of the explanatory variable (here diabetic status of mother) reveals the association between response and explanatory variables É É É Causal patterns are revealed by comparing the conditional distributions of the response variable across different categories of the explanatory variable (rows in this case) E.g. we note that children born to diabetic mothers have a much greater chance of birth defects (19.1%) than children born to nondiabetic mothers (3.9%) or prediabetic mothers (3.5%) In published tables it is customary to use percentages (rather than probabilities) as in Table 7 30 / 114 Table 7. Conditional Probability Distributions of Birth Defect Given Diabetic Status of Mother, in Percentage Form (N=1207) Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic B1 : 1+ Defect B2 : No Defect Total N 3.9 3.5 19.1 96.1 96.5 80.9 100.0 100.0 100.0 785 375 47 31 / 114 Causal Ordering Assumption É É Comparing conditional distributions to reveal patterns of association requires the assumption of a causal ordering: one variable is assumed dependent (the response), the other independent (the cause) Conditional distributions in the other direction are not necessarily meaningful, except to describe the composition of categories of birth outcomes in terms diabetic status of mother, as in Tables 8 (probabilities) and 9 (percentages) É É On what substantive basis would researchers decide which variable is the cause and which the effect in this case, i.e. that they should look at Tables 6 or 7 rather than Tables 8 or 9? Is the causal ordering A → B the only one possible in this case? 32 / 114 Table 8. Conditional Probability Distribution of Diabetic Status of Mother Given the Presence or Absence of Birth Defect Among Pima Indian Mothers (N=1207) Data B1 : 1+ Defect B2 : No Defect A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 0.585 0.245 0.170 0.653 0.314 0.033 Total N 1.000 53 1.000 1154 33 / 114 Table 9. Conditional Probability Distribution of Diabetic Status of Mother Given the Presence or Absence of Birth Defect, in Percentage Form (N=1207) Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic Total N B1 : 1+ Defect B2 : No Defect 58.5 24.5 17.0 65.3 31.4 3.3 100.0 53 100.0 1154 34 / 114 Conventional Presentation É Given a choice of response and explanatory variables, it is conventional in sociology to arrange published tables with É É É É categories of the explanatory variable as the rows categories of the response variable as the columns percentages calculated within rows, as in Table 7 so that associations appear from comparison of rows É But convention may be different in other fields! É In the end the important principle is to percentage within categories of the explanatory variable 35 / 114 More Probability Rules (1) 5. General Addition Rule. For any two events E1 and E2 P(E1 ∪ E2 ) = P(E1 ) + P(E2 ) − P(E1 ∩ E2 ) The reason for removing the probability of the intersection is to correct for the “double counting” of basic outcomes that are included in both E1 and E2 . 
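All of the probabilities in Tables 3a through 7 can be reproduced from the raw counts of Table 1. A small base-R sketch (the object names are ours, not from the slides):

# Cell counts from Table 1: rows = mother's diabetic status (A), columns = defect status (B)
pima <- matrix(c(31, 13, 9,                      # B1: 1+ defect
                 754, 362, 38),                  # B2: no defect
               nrow = 3,
               dimnames = list(A = c("Nondiabetic", "Prediabetic", "Diabetic"),
                               B = c("1+ Defect", "No Defect")))
N <- sum(pima)                                   # 1207 births

joint <- prop.table(pima)                        # Table 4: joint probabilities P(Ai ∩ Bj)
p_A   <- rowSums(joint)                          # Table 3a: marginal P(Ai)
p_B   <- colSums(joint)                          # Table 3b: marginal P(Bj)

cond_B_given_A <- prop.table(pima, margin = 1)   # Tables 6/7: P(Bj | Ai), each row sums to 1
round(cond_B_given_A["Diabetic", "1+ Defect"], 3)   # about .191, as computed above

Note that prop.table(pima, margin = 1) divides each cell by its row total, which is exactly the "percentage within categories of the explanatory variable" convention described above.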
É É E.g., in the Pima example P(A3 ∪ B2 ) = P(A3 ) + P(B2 ) − P(A3 ∩ B2 ) (1) = .403 + .342 − .206 = .539 (2) For any two mutually exclusive events E1 and E2 P(E1 ∪ E2 ) = P(E1 ) + P(E2 ) É which is obvious because disjoint events have no outcomes in common so their intersection is empty and has probability zero Generally, for n mutually exclusive events E1 , E2 , . . . , En P(E1 ∪ E2 ∪ . . . ∪ En ) = P(E1 ) + P(E2 ) + . . . + P(En ) 36 / 114 More Probability Rules (2) 6. Multiplication Rule. For any two events E1 and E2 the probability of the joint occurrence of the events E1 ∩ E2 is the probability of occurrence of one of the events (say, E1 ) times the conditional probability of occurrence of the other event, given the first event, i.e. P(E1 ∩ E2 ) = P(E1 )P(E2 |E1 ) = P(E2 )P(E1 |E2 ) The multiplication theorem forms the basis of the representation of joint events as the limbs of a probability tree, as shown in the probability tree for the actual presence of AIDS given test result for 1988 data (next slide) 37 / 114 Probability Tree Probability tree of AIDS (disease present/absent) and test result (positive/negative) 38 / 114 Sensitivity & Specificity Important concepts in testing É In a situation of testing for presence of a disease É É É B1 (B2 ) are the events “the test is positive (negative)”, respectively A1 (A2 ) are the events “disease is present (absent)”, respectively The sensitivity of the test is the probability that the test is positive given that the person has the disease, i.e. sensitivity = P(B1 |A1 ) = .98 É The specificity of the test is the probability that the test is negative given that the person does not have the disease specificity = P(B2 |A2 ) = .99 É A substantively very important question in context is “What is the probability P(A1 |B1 ) that the disease is present given that the test in positive?” É The answer is given by Bayes’s Theorem (see later) 39 / 114 Relationships Between Variables Independent Events É Two events E1 and E2 are said to be independent if the probability that one event occurs is unaffected by whether or not the other event occur, i.e. if one of the following (equivalent) conditions holds 1. P(E1 |E2 ) = P(E1 ) or P(E2 |E1 ) = P(E2 ) 2. P(E1 ∩ E2 ) = P(E1 )P(E2 ) É É Condition 2 follows from condition 1 because of the multiplication theorem In general for n independent events E1 , E2 , . . . , En P(E1 ∩ E2 ∩ . . . ∩ En ) = P(E1 )P(E2 ) . . . P(En ) 40 / 114 Relationships Between Variables Independent Variables É Two categorical variables A (with categories Ai ) and B (with categories Bj ) are independent if P(Ai ∩ Bj ) = P(Ai )P(Bj ) É for all Ai and Bj I.e., A and B are independent if the joint probabilities are equal to the products of the marginal probabilitites for all combinations of categories For the Pima example the joint probabilities of birth defect by diabetic status of mother under the assumption of independence are shown symbolically in Table 10 and as numbers in Table 11 É The probabilities in Table 10 and 11 are counterfactual, since the two variables are not in fact independent! 41 / 114 Relationships Between Variables Joint Probabilitites Under Assumption of Independence Table 10. 
Joint Probability Distribution of Diabetic Status of Mother (A) and Presence of Birth Defects (B) Under Assumption of Independence, in Symbolic Form Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic Total É B1 : 1+ Defect B2 : No Defect Total P(A1 )P(B1 ) P(A2 )P(B1 ) P(A3 )P(B1 ) P(A1 )P(B2 ) P(A1 )P(B2 ) P(A1 )P(B2 ) P(A1 ) P(A2 ) P(A3 ) P(B1 ) P(B2 ) P(S) Joint probabilities under the assumption of independence are calculated as the products of the marginal probabilities in corresponding row and column 42 / 114 Relationships Between Variables Joint Probabilitites Under Assumption of Independence Table 11. Joint Probability Distribution of Diabetic Status of Mother (A) and Presence of Birth Defects (B) Under Assumption of Independence (N=1207) Data B1 : 1+ Defect B2 : No Defect Total A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 0.029 0.014 0.002 0.622 0.297 0.037 0.650 0.311 0.039 Total 0.044 0.956 1.000 43 / 114 Relationships Between Variables Predicted Frequencies É É One calculates the predicted or expected frequencies under the assumption of independence by multiplying the predicted probabilities of Table 11 by the total number N of observations (1207 in the example) to obtain Table 12 Comparing Table 12 with Table 1 (original data) one notes that: É É É É Expected frequencies are not (necessarily) integers Expected cell frequencies sum up to the same row and column marginal totals as in the original table (Table 1) Discrepancies between Table 12 and Table 1 suggest that A and B are not independent But how does one tell if the discrepancies indicate non-independence? 44 / 114 Table 12. Expected Cell Frequencies Under Assumption of Independence Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic Total B1 : 1+ Defect B2 : No Defect Total 34.5 16.5 2.1 750.5 358.5 44.9 785 375 47 53 1154 1207 Table 1 (Repeated for Comparison). Child Birth Defects by Mother’s Diabetic Status Among Pima Indian Mothers Data B1 : 1+ Defect B2 : No Defect Total A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic 31 13 9 754 362 38 785 375 47 Total 53 1154 1207 45 / 114 Dependence Dependent Events & Dependent Variables É Two events are dependent if they are not independent.2 É Two variables are dependent if they are not independent.3 2 3 Warning! This will be on the test. This too will be on the test. 
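Continuing the sketch above, the counterfactual probabilities of Table 11 and the expected frequencies of Table 12 are products of the marginals (this reuses the pima matrix and N defined earlier):

# Expected joint probabilities under independence, P(Ai)P(Bj)   (Table 11)
p_A <- rowSums(pima) / N
p_B <- colSums(pima) / N
indep_prob <- outer(p_A, p_B)                    # outer product of the two marginal distributions
round(indep_prob, 3)

# Expected cell frequencies under independence   (Table 12)
expected <- indep_prob * N
round(expected, 1)                               # e.g. about 2.1 expected births with a defect to diabetic mothers
rowSums(expected); colSums(expected)             # margins match the observed table exactly
round(pima - expected, 1)                        # discrepancies suggesting A and B are not independent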
46 / 114 Dependence É The nature of the relationship between two dependent variables can be studied by comparing the conditions probability distributions for one variable (the dependent variable), conditional on each possible outcome of the other variable (the independent variable) É É E.g., Table 6 or 7 show the dependence between mother diabetic status and birth defects Comparing the conditional distributions of the dependent variable, conditional on the values of the independent variable(s), is a general strategy to reveal the nature of the dependence (association) between two variables É E.g., regression analysis (simple and multiple) consists in modeling the location (mean) of the distribution of a dependent variable Y as conditional on the values of one or several independent variables X 47 / 114 Dependence Chi-Squared Measure of Dependence É We use the notation É É É É Oij for the observed frequency corresponding to row i and column j (entries of Table 1) Eij for the corresponding expected frequency under assumption of independence (entries of Table 12) It stands to reason that the dependence between A and B is in some way proportional to the discrepancies between observed frequencies and those expected under the assumption of independence The most commonly used measure of dependence is the Pearson chi-squared statistic, denoted χ 2∗ ; it measures the discrepancy with the formula Chi-squared = χ 2∗ = X (Oij − Eij )2 Eij summing over all cells in the table 48 / 114 Dependence Chi-Squared Measure of Dependence É In words, it is the sum of the squared discrepancies between Oij and Eij , with each squared term divided by Eij É The components (Oij −Eij )2 Eij of the chi-squared statisitc are shown in Table 13 É É The cell A3 ∩ B1 (number of babies with birth defects born to diabetic mothers) corresponds to a large discrepancy The value of Chi-squared statistic in Table 13 (for the comparison of Table 1 and Table 12) is 25.511 É As explained later it is highly significant, in the sense that an overall discrepancy this large in unlikely to be due to chance 49 / 114 Dependence Chi-Squared Measure of Dependence Table 13. Components (Oij −Eij )2 Eij of Chi-squared Statistic for Child Birth Defects by Mother’s Diabetic Status Among Pima Indian Mothers Data A1 : Nondiabetic A2 : Prediabetic A3 : Diabetic Total É B1 : 1+ Defect B2 : No Defect 0.349 0.730 23.312 0.016 0.034 1.071 χ 2∗ = 25.511 We now drop momentarily the topic of chi-squared; we will come back to it later in the context of statistical inference 50 / 114 Bayes’s Theorem É É Example: AIDS test in 1988 (Mendehall 1991: 94–96) Events are denoted É É É É É disease is present disease is absent test is positive test is negative What we know: É É É É É A1 : A2 : B1 : B2 : P(A1 ) = .00001858 = 1.858 × 10−5 (prevalence of AIDS in entire population in 1988) P(A2 ) = 1 − P(A1 ) = .99998142 P(B1 |A1 ) = .98 (probability test is positive when disease is present) P(B1 |A2 ) = .01 (probability test is positive when disease is absent) P(A1 ) and P(A2 ) are called prior probabilities (i.e., prior to knowing the result of the test) 51 / 114 Bayes’s Theorem É The questions that Bayes’s Theorem tries to answer are É É What is P(A1 |B1 ) (probability that I have AIDS given that the test is positive)? What is P(A1 |B2 ) (probability that I have AIDS given that the test is negative)? É These are good questions! (Especially if you are the one taking the test.) 
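Before the formal statement that follows, the posterior probability asked about here can be previewed numerically using only the prior, sensitivity, and false-positive rate listed above. A minimal R sketch (variable names are ours):

# Quantities given for the 1988 AIDS example
p_disease   <- 0.00001858          # prior P(A1): prevalence
sensitivity <- 0.98                # P(B1 | A1): test positive when disease is present
false_pos   <- 0.01                # P(B1 | A2): test positive when disease is absent

# The multiplication rule gives the joint probabilities of the two "positive test" limbs of the tree
p_pos_and_disease    <- p_disease * sensitivity
p_pos_and_no_disease <- (1 - p_disease) * false_pos
p_pos <- p_pos_and_disease + p_pos_and_no_disease    # marginal P(B1)

p_pos_and_disease / p_pos          # posterior P(A1 | B1), about .0018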
É The answer is given by Bayes’s Theorem P(Ai )P(Bj |Ai ) P(Ai |Bj ) = P i P(Ai )P(Bj |Ai ) É We do not use this formula right away É It helps to first represent the problem as a probability distribution on a bivariate sample space, as in Table 14 52 / 114 Bayes’s Theorem Table 14. Joint Probability Distribution of Presence of Disease (A) and Test Result (B) for AIDS in 1988 Data A1 : Disease present A2 : Disease absent B1 : Test positive B2 : Test negative Total .00001821 .0099998142 .00000037 .9899816 .00001858 .99998142 .01002 .98998 1.000 Total É Steps (details next slide): 1. 2. 3. 4. Entries in black are given Calculate red entries as P(A1 ∩ B1 ) and P(A2 ∩ B1 ) Calculate green entries From information in completed table calculate desired probability as P(A1 |B1 ) = P(A1 ∩ B1 ) P(B1 ) = .00001821 .01002 = .0018174 53 / 114 Bayes’s Theorem Steps to Find P(A1 |B1 ) É We use the following steps: É É Step 1. At the outset we know the marginal probabilities P(A1 ) and P(A2 ) = 1 − P(A1 ), shown in black in the table. The goal is to fill the entire joint probability table with the corresponding probabilities. Step 2. Calculate P(A1 ∩ B1 ) = P(A1 )P(B1 |A1 ) = (.00001858)(.98) = .00001821 P(A2 ∩ B1 ) = P(A2 )P(B1 |A2 ) = (.99998142)(.01) = .0099998142 É These probabilities are shown in red in Table 14. They are calculated using the multiplication formula, as in the probability tree shown earlier. Step 3. Calculate the remaining entries of Table 15, by summing down the first column to find P(B1 ), then subtracting to find the other entries. The resulting figures are shown in green. 54 / 114 Bayes’s Theorem Steps to Find P(A1 |B1 ) É Step 4. Having the complete joint probability distribution, calculate the desired conditional probability “the other way around” as P(A1 |B1 ) = P(A1 ∩ B1 ) P(B1 ) = .00001821 .01002 = .0018174 Thus, in 1988 the probability of having the disease given that the test came up positive was only .0018174 (or about 2 in 1,000)! É É The probability P(A1 |B1 ) = .0018174 is called the posterior probability of A1 (i.e., after learning the result of the test) Compare with the prior probability P(A1 ) = 1.858 × 10−5 (i.e., before learning the result) 55 / 114 Bayes’s Theorem É Looking back at Bayes’s formula, P(Ai )P(Bj |Ai ) P(Ai |Bj ) = P i P(Ai )P(Bj |Ai ) it appears that is is really a shortcut for the strategy of first reconstructing the complete joint probability distribution from the known prior probabilities and conditional probabilities, and then using information in the table to calculate the desired conditional probability É In fact, the numerator is equal to P(Ai ∩ Bj ) (one of the joint probabilities) and the denominator is equal to the marginal probability P(Bj ), which is equal to the sum of the joint probabilities in the cells of the table. 56 / 114 Random Variables É A random variable is a variable whose value is a numerical outcome of a random phenomenon É É E.g. Tossing a coin with S={H,T} is a random event. Now assign H=1, T=0. Tossing a coin has become a random variable with S={0,1} E.g. Throw a die. “Number of pips on the face” is a random variable with S={1, 2, 3, 4, 5, 6} 57 / 114 Discrete Random Variables É A discrete random variable X has a finite number of possible values É The probability distribution of X lists all the values and their probabilities Value of X: Probability: É x1 p1 x2 p2 x3 p3 ... ... xk pk The probabilities pi must satisfy two requirements: 1. Every probability pi is a number between 0 and 1 2. 
p1 + p2 + · · · + pk = 1 58 / 114 Discrete Random Variable Reproductive Success Among Xavante Indians4 4 Daly and Wilson 1983, Figure 5-6 p. 89 59 / 114 Discrete Random Variable Reproductive Success of Female Xavante Table 15. Probability Distribution of Reproductive Success (Total # of Children) of Xavante Females x Frequency P(X = x) 0 1 .023 1 7 .159 2 7 .159 3 7 .159 4 7 .159 5 7 .159 6 4 .091 7 3 .068 8 1 .023 Total 44 1.000 É Table 15 shows the probability distribution of X for females, corresponding to the random trial “select a Xavante female randomly” É E.g., probability that the female has 6 or more children is P(≥ 6) = P(X = 6) + P(X = 7) + PX = 8) = .091 + .068 + .023 = .182 60 / 114 Discrete Random Variable Reproductive Success in Male Xavante É Table 16. Probability Distribution of Reproductive Success (Total # of Children) of Xavante Males x Frequency P(X = x) 0 1 2 3 4 5 6 7 9 11 23 4 12 14 7 7 6 2 7 1 1 1 .065 .194 .226 .113 .113 .097 .032 .113 .016 .016 .016 Total 62 1.000 Table 16 shows the probability distribution of X for males É Q – What is the relative social status of the male who has 23 children? 61 / 114 Discrete Random Variable Another Example: Toss of Four Coins É É É What is the probability distribution of variable X = “number of heads in four tosses of a coin” Same as the sum of the four outcomes with H=1, T=0 Assume: 1. The coin is balanced – each toss equally is likely to give H or T 2. The coin has no memory – tosses are independent É Then each sequence of tosses – e.g. HTHH – has probability P(HTHH) = É 1 2 × 1 2 × 1 2 × 1 2 = 1 16 Number of heads X has possible values 0, 1, 2, 3, 4 62 / 114 Discrete Random Variable Toss of Four Coins (cont’d) É The event {X = 1} can occur in only four ways: HTTT, THTT, TTHT, TTTH, so that P(X = 1) = = É count of ways X = 1 can occur 16 4 16 = 0.25 We can find the probability for each value of X. The resulting probability distribution of X is Value of X Probability 0 0.0625 1 0.25 2 0.375 3 0.25 4 0.0625 63 / 114 Discrete Random Variable Toss of Four Coins (cont’d) É Again, the probability distribution of X is Value of X Probability É 0 0.0625 1 0.25 2 0.375 3 0.25 4 0.0625 From it one can calculate the probability of various events É E.g., the probability of “three or more heads” is: P(X ≥ 3) = 0.250 + 0.0625 = 0.3125 É For the probability of “at least one head” it is easier to use the complement rule: P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.0625 = 0.9375 64 / 114 Continuous Random Variables É É A continuous random variable X takes all values in an interval of numbers The probability distribution of X is described by a density curve such that 1. The probability of any event is the area under the density curve and above the values of X that make up the event 2. The total area under the density curve is 1 É In a continuous probability distribution only intervals have nonzero probability. 
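Returning for a moment to the discrete case: the distribution just derived for the number of heads in four tosses can be double-checked by brute-force enumeration. A minimal base-R sketch (not from the original slides):

# All 2^4 equally likely sequences of four tosses (1 = head, 0 = tail)
tosses <- expand.grid(t1 = 0:1, t2 = 0:1, t3 = 0:1, t4 = 0:1)
X <- rowSums(tosses)                       # number of heads in each sequence
table(X) / nrow(tosses)                    # 0.0625, 0.25, 0.375, 0.25, 0.0625, as above

# Event probabilities are sums over the relevant values of X
sum(X >= 3) / nrow(tosses)                 # P(X >= 3) = 0.3125
1 - sum(X == 0) / nrow(tosses)             # P(X >= 1) = 0.9375, via the complement rule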
Each individual outcome (precise value of X) has probability 0 É I.e., for any continuous distribution P(X = x) = 0 65 / 114 Continuous Random Variables Continuous Probability Model5 5 From Moore & McCabe 2006, Figure 4.10 p.284 66 / 114 Continuous Random Variables Uniform Probability Distribution6 É Continuous uniform distributions examples: É É 6 Spinner (“continuous roulette”) Uniform number generator in computer programs From Moore & McCabe 2006, Figure 4.9 p.283 67 / 114 Continuous Random Variables Normal Distributions É É N(µ, σ) denotes a normal distribution with mean µ and standard deviation σ If X is N(µ, σ) then the standardized variable Z= É (X − µ) σ is distributed as N(0, 1) We saw earlier how to use tables or software to calculate: É É given a quantile z (value of Z) calculate P(Z ≤ z) given a probability α, find z such that P(Z ≤ z) = α 68 / 114 Discovery of the Normal Distribution (1) Pierre Simon Laplace (1749-1827) 69 / 114 Discovery of the Normal Distribution (2) Carl Friedrich Gauss (1777-1855) – Younger 70 / 114 Discovery of the Normal Distribution (3) Carl Friedrich Gauss (1777-1855) – Older 71 / 114 Descendants of the Normal Distribution χ 2 (Chi-squared), t, and F Distributions É É At the end of 19th century and beginning of 20th century, the main development of statistics is taking place in Great Britain Karl Pearson, William Gosset (“Student”) and Ronald Fisher derive three families of distributions derived from N(0, 1) that arise naturally in statistical problems: É χ 2 (df ) is a sum of df squared zs É É t is a ratio of a z to the the square root of a χ 2 divided by its df É É Arises in the distribution of a sum of squared discrepancies Arises in the distribution of the sample mean divided by its standard error F is a ratio of two χ 2 s each divided by its df É Arises in the distribution of the ratio of two variances 72 / 114 Karl Pearson (1857–1936) 73 / 114 William Sealey Gosset a.k.a. “Student” (1876-1937) É William Gosset was working for the Guiness brewery in Dublin É Company policy did not allow employees to publish under their own name É He used the pen name “Student” to publish his famous paper on the t distribution in Biometrika in 1908 74 / 114 Sir Ronald Aylmer Fisher (1890-1962) 75 / 114 Characteristics of χ 2 , t, and F 76 / 114 Means & Variances of Random Variables É É The mean µX of a random variable X is similar to the average X but takes into account fact that events are not equally likely Suppose X is a discrete random variable with distribution Value of X: Probability: x1 p1 x2 p2 x3 p3 ... ... xk pk To find the mean of X, add the products of each possible value of X multiplied by its probability: µ X = x 1 p1 + x 2 p2 + . . . + x k pk X = x i pi É µX is also called the expected value or expectation of X, denoted E(X) 77 / 114 Means & Variances of Random Variables É The variance σX2 of a random variable X is similar to the sample variance s2 but takes into account fact that events are not equally likely, i.e. É É The variance is the average value of the squared deviations (X − µX )2 of the variable X from the mean µX Suppose X is a discrete random variable with distribution Value of X: Probability: x1 p1 x2 p2 x3 p3 ... ... xk pk and that µX is the mean of X. The variance of X is: σX2 = (x1 − µX )2 p1 + (x2 − µX )2 p2 + . . . 
+ (xk − µX )2 pk X = (xi − µX )2 pi É The standard deviation σX of X is the square root of the variance σX2 78 / 114 Means & Variances of Random Variables Example: Mean µX & Variance σX2 of Number of Children of Xavante Women Table 17. µX & σX of Number of Children of Xavante Females xi pi xpi (xi − 3.591)2 pi 0 1 2 3 4 5 6 7 8 0.023 0.159 0.159 0.159 0.159 0.159 0.091 0.068 0.023 0.000 0.159 0.318 0.477 0.636 0.795 0.545 0.477 0.182 (0 − 3.591)2 (0.023) = 0.293 (1 − 3.591)2 (0.159) = 1.068 (2 − 3.591)2 (0.159) = 0.403 (3 − 3.591)2 (0.159) = 0.056 (4 − 3.591)2 (0.159) = 0.027 (5 − 3.591)2 (0.159) = 0.316 (6 − 3.591)2 (0.091) = 0.528 (7 − 3.591)2 (0.068) = 0.792 (8 − 3.591)2 (0.023) = 0.442 1.00 µX = 3.591 σX2 = 3.924 Total É Thus µX = 3.591, σX2 = 3.924 & σX = p 3.924 = 1.981 79 / 114 Correlation of Two Random Variables É We saw earlier that the sample correlation rXY of two variable X and Y is the sum of the products of the standardized scores ZX and ZY divided by n − 1: rXY = É É 1 n−1 ZX ZY A similar formula exists for the correlation ρ between two random variables X and Y with joint probability distribution pij The formula is ρ= É X X zX zY pij The formula for the correlation is illustrated by an example of the relationship between subsistence technology X and Y belief centralization based on data from the data set Ethnographic Atlas, a theory apocryphally attributed to Dr. Grossgrabenstein 80 / 114 Correlation of Two Random Variables The Grossgrabenstein Theory: Subsistence Technology & Monotheism 81 / 114 Correlation of Two Random Variables Subsistence Technology & Monotheism Table 18. Joint Frequency Distribution of Technology Score (X) & Monotheism Score (Y) Y\X 1 2 3 4 Total 1 2 3 4 51 26 27 15 25 15 67 4 7 1 16 3 2 1 21 44 85 43 131 66 Total 119 111 27 68 325 Technology Score (X): 1 2 3 4 Hunting and gathering Simple horticultural: plant cultivation Advanced horticultural: plant cultivation + metallurgy Agrarian: plant cultivation + metallurgy + plow Monotheism Score (Y): 1 2 3 4 No High Gods High Gods, but not active in human affairs High Gods, active in human affairs but do not support human morality High Gods, active in human affairs and support human morality 82 / 114 Correlation of Two Random Variables Subsistence Technology & Monotheism É Table 19. Joint Probability Distribution of Technology Score (X) & Monotheism Score (Y) From the marginal probabilities first calculate É É Y\X 1 2 3 4 Total 1 2 3 4 0.157 0.080 0.083 0.046 0.077 0.046 0.206 0.012 0.022 0.003 0.049 0.009 0.006 0.003 0.065 0.135 0.262 0.132 0.403 0.203 Total 0.366 0.342 0.083 0.209 1.000 É É µX = 2.548 σX2 = 1.177 and σX = 1.085 And also É É É µY = 2.135 σY2 = 1.268 and σY = 1.126 83 / 114 Correlation of Two Random Variables Subsistence Technology & Monotheism Table 20. 
Correlation of Technology Score (X) & Monotheism Score (Y) É X Y pij zX zY zX zY pij 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 0.157 0.077 0.022 0.006 0.080 0.046 0.003 0.003 0.083 0.206 0.049 0.065 0.046 0.012 0.009 0.135 −1.427 −1.427 −1.427 −1.427 −0.505 −0.505 −0.505 −0.505 0.417 0.417 0.417 0.417 1.338 1.338 1.338 1.338 −1.008 −0.120 0.768 1.656 −1.008 −0.120 0.768 1.656 −1.008 −0.120 0.768 1.656 −1.008 −0.120 0.768 1.656 0.226 0.013 −0.024 −0.014 0.041 0.003 −0.001 −0.003 −0.035 −0.010 0.016 0.045 −0.062 −0.002 0.009 0.299 Total ρXY = 0.500 É Correlation is calculated by “unrolling” the cells of the table row by row & summing up zX zY pij over all the cells The correlation of technology with monotheism is ρXY = 0.500, a moderate positive correlation É More technologically advanced (preindustrial) societies tend to have more monotheistic beliefs 84 / 114 Correlation of Two Random Variables Dr. Grossgrabenstein’s Misfortunes 85 / 114 Linear Functions of Random Variables Rules for Means 1. If X is a random variable and a and b are fixed numbers, then µa+bX = a + bµX 2. If X and Y are random variables, then µX+Y = µX + µY É Example 1. The mean temperature in Chapel Hill on 5 October is 75◦ F (X), what is it in degrees Celsius (Y)? µY = É 7 5 9 (µX − 32) = 5 9 (75 − 32) = 23.9◦ C Example 2. Inspection of newly made refrigerators for surface defects finds an average of 0.7 dimples and 1.4 sags per refrigerator.7 What is the mean of the total number of defects (number of dimples + number of sags)? If µX = 0.7 and µY = 0.4 then µX+Y = µX + µY = 0.7 + 1.4 = 2.1 defects Example from Moore & McCabe 2006 p.298 86 / 114 Linear Functions of Random Variables Rules for Variances É Caution: É É É É É The mean of a sum of random variables is always the sum of their means But: this rule is true for variances only in special situations – when X and Y are independent When random variables are not independent, the variance of their sum depends on the correlation between them as well as their individual variances The correlation ρ is a number between 0 and 1; it measures the direction and strength of the linear relationship between two variables The correlation between two independent variables is zero 87 / 114 Linear Functions of Random Variables Rules for Variances 1. If X is a random variable and a and b are fixed numbers, then 2 σa+bX = b2 σX2 2. If X and Y are independent random variables, then 2 σX+Y = σX2 + σY2 2 σX−Y = σX2 + σY2 Caution! This is the addition rule for variances of independent random variables 3. If X and Y have correlation ρ, then 2 σX+Y = σX2 + σY2 + 2ρσX σY 2 σX−Y = σX2 + σY2 − 2ρσX σY This is the general addition rule for variances of random variables 88 / 114 Linear Functions of Random Variables Example of Rules for Variances: Total SAT Scores8 É É Total SAT score is the sum of the math score (X) and verbal score (Y). In a recent year the means and standard deviations of the scores were SAT math score X µX = 519 σX = 115 SAT verbal score Y µY = 507 σY = 111 The total SAT score is X + Y. The mean is µX+Y = µX + µY = 519 + 507 = 1026 É 8 The variance and standard deviation cannot be computed with the information given, because X and Y are not independent – we need to know the correlation between them Example from Moore & McCabe 2006, p.303–304 89 / 114 Linear Functions of Random Variables Example of Rules for Variances: Total SAT Scores (cont’d) É The correlation between math and verbal scores was ρ = 0.71. 
By variances Rule 3: 2 σX+Y = σX2 + σY2 + 2ρσX σY = (115)2 + (111)2 + (2)(0.71)(115)(111) = 43, 672 É The variance of the sum is greater than the sum of the variances, because X and Y move up and down together É Finally find the standard deviation p σX+Y = 43, 672 = 209 É To sum up the total SAT score has mean µX+Y = 1026 and standard deviation σX+Y = 209 90 / 114 Jacob Bernouilli 1st (Basel 1654–1705) Family Were Refugees from Antwerp; Ars Conjectandi published 1713) 91 / 114 Jacob Bernouilli 1st (older) 92 / 114 Jacob Bernouilli 1st (stamp) Note implausible sequence of proportions (see later) 93 / 114 Law of Large Numbers (1) É Jacob Bernouilli contributed to the discovery of a major phenomenon of probability, the Law of Large Numbers: É É É In the long run, the proportion of a certain outcome of a random trial (say, head turns up when tossing a coin) will tend to stabilize to a stable value But outcome of one trial is independent of previous outcomes This is counterintuitive: É É People naturally tend to believe in a sort of Law of Small Numbers People do not normally expect the long “runs” of the same outcome (say, heads in tossing a coin) that occur in true random processes 94 / 114 Law of Large Numbers (2) Two simulations of tossing a fair coin 5,000 times 0.6 0.4 0.2 0.0 Proportion of Heads 0.8 1.0 Law of Large Numbers 1 5 10 50 500 5000 Number of Tosses Simulation of Coin Tosses 95 / 114 Sampling Distributions Revisited Population & Sample É The population distribution of a variable is: 1. the distribution of its values for all members of the population É E.g., the distribution of IQ test scores in the Belgian population 2. the probability distribution of the variable when choosing one individual at random from the population. É E.g., choose one Belgian randomly and record the IQ É A statistic (e.g., x, p̂, b1 ) calculated from a random sample or randomized experimental group is a random variable É The probability distribution of a statistic is its sampling distribution In remainder of Module 6 we look at the sampling distributions of: É É É counts & proportions sample means 96 / 114 Binomial Distributions Count X & Proportion p̂ É In general X is a count of the occurrence of some outcome in a fixed number of observations n É É E.g., in an agricultural experiment n plants are treated for a fungus; the number X of plants with the fungus is a random variable The sample proportion is p̂ = X/n É E.g., in the experiment X = 9 out of n = 32 plants have the fungus. The sample proportion is p̂ = É 9 32 = 0.281 The binomial setting is: 1. There are a fixed number n of observations 2. The n observations are all independent 3. Each observation can be classified as “success” (1) or “failure” (0) 4. The probability p of a success is the same for each observation 97 / 114 Binomial Distributions Binomial Distribution É The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n (number of observations) and p (probability that any one observation is a success) É É É The possible values of X are the positive integers from 0 to n In abbreviation, one says that X is B(n, p) E.g., a child of a specific couple has probability p = 0.25 of being blood type O. Suppose the couple has n = 5 children. 
Then the number X of their children with blood type O is distributed as B(5, 0.25) É É Possible values of X are 0, 1, 2, 3, 4, 5 The probability distribution of X is (see why later) X: P(X = x) : 0 0.2373 1 0.3955 2 0.2637 3 0.0879 4 0.0146 5 0.001 98 / 114 Binomial Distributions Binomial Distribution É Choosing an SRS (without replacement) from a population with proportion p of successes is not exactly a binomial setting É É E.g., draw 10 cards from a deck, with “red card” a success. Then probability of red on second card is not independent of color of first card However, if the population is much larger than the sample – say, 20 times as large – the count X of successes in an SRS of size n has approximately the binomial distribution B(n, p) É E.g., draw sample with n = 200 from about 8,000 graduate students at UNC. “Success” is: student is female. Suppose p = 0.57. Then number of females X is distributed (almost exactly) as B(200, 0.57) 99 / 114 Binomial Distributions Finding Binomial Probabilities (1) 1. Calculator on the Web É http://rockem.stat.sc.edu/prototype/calculators/index.php3 2. Table of Binomial Probabilities É E.g., Table C in Moore & McCabe (2006) 3. Software – R É Finding P(X = x) > # P(exactly 2 children out of 5 with type O blood) > dbinom(2,5,0.25) [1] 0.2636719 É Finding P(X ≤ x) > # P(2 or fewer children out of 5 with type O blood) > pbinom(2,5,0.25) [1] 0.8964844 100 / 114 Binomial Distributions Finding Binomial Probabilities (2) 4. Software – Stata É Finding P(X = x) . * P(exactly 2 children out of 5 with type O blood) . display Binomial(5,2,0.25) - Binomial(5,3,0.25) .26367188 É Finding P(X ≤ x) . * P(2 or fewer children out of 5 with type O blood) . display 1 - Binomial(5,3,0.25) .89648438 É Note: In Stata the function Binomial(n,k,p) returns P(X ≥ x). It has to be spelled with capital B. 101 / 114 Binomial Distributions Finding Binomial Probabilities (3) 5. Using the Binomial Formulas (Optional; see Moore & McCabe 2006, pp.348–350) É Binomial Coefficient – The number of ways of arranging k successes among n observations is given by the binomial coefficient n! n = k k!(n − k)! for k = 0, 1, . . . , n. In the formula the factorial n! for any positive integer is defined as n! = n × (n − 1) × (n − 2) × . . . × 2 × 1 É and also 0! = 1. Binomial Probability – If X has distribution B(n, p), the binomial probability that X = k (for k = 0, 1, . . . , n) is n k P(X = k) = p (1 − p)n−k k 102 / 114 Binomial Distributions Origin of the Binomial Formula É Origin of the binomial formula 103 / 114 Binomial Distributions Binomial Mean & Standard Deviation É É If a count X is B(n, p), what are the mean µX and the standard deviation σX of X? To find out view X as the sum of n independent random variables Si . 
Each Si has the same probability distribution Outcome: Probability: É 1 p 0 1−p For a single Si (which, BTW, is called a Bernouilli trial) µS = (1)(p) + (0)(1 − p) = p σS2 = p(1 − p) É Then for X = S1 + S2 + · · · + Sn µX = µS1 + µS2 + · · · + µSn = nµS = np σX2 = nσX2 = np(1 − p) 104 / 114 Binomial Distributions Mean & Standard Deviation of Count & Proportion É If a count X has the binomial distribution B(n, p), then µX = np p σX = np(1 − p) É Our estimator of the proportion p of “successes” in the population is the sample proportion p̂ = É count of successes in sample size of sample X n If p̂ is the sample proportion of successes in an SRS of size n from a large population with proportion p of successes9 µp̂ = p r σp̂ = 9 = p(1 − p) n Check this follows from the rules for linear functions of random variables. 105 / 114 Binomial Distribution Normal Approximation of Counts & Proportions É Implications of mean and standard deviation of p̂ 1. µp̂ = p implies p̂ is unbiased q p(1−p) 2. σp̂ = implies that to divide the standard deviation of p̂ n by half one must multiply n by four É Normal approximation for counts & proportions: É In an SRS of size n from a large population, when n is large p X is approximately N np, np(1 − p) r ! p(1 − p) p̂ is approximately N p, n where p is the proportion of successes in the population, and X and p̂ = X/n are the count & proportion of successes in the sample, respectively 106 / 114 Binomial Distribution Normal Approximation of Counts & Proportions É Rule of thumb for normal approximation: n & p satisfy É É np ≥ 10, and n(1 − p) ≥ 10 107 / 114 Binomial Distribution Normal Approximation of Counts & Proportions É E.g., SRS of n = 200 from population of 8,000 UNC graduate students with proportion females p = .57. What is P(p̂ ≤ 0.5) (i.e., the sample has fewer females than males)? É É É np = 200 × 0.57 = 114 > 10 and n(1 − p) = 200 × 0.43 = 86 > 10 so rule of thumb is satisfied Using binomial probabilities: X is distributed as B(200, 0.57). p̂ = 0.5 correspond to X = 100. P(X ≤ 100) = 0.02734091 or .027. Using the normal approximation: µp̂ = p = 0.57; q p(1−p) = 0.03500714; σp̂ = n P(p̂ ≤ 0.5) = P p̂ − 0.57 ≤ 0.5 − 0.57 0.03500714 0.03500714 = P(Z ≤ −1.999592) = 0.02277217 108 / 114 Sampling Distribution of the Sample mean Experimental Study of the Sampling Distribution of x with n = 3, n = 10, n = 100 É É (a) Population distribution of X (income) Distribution of x for 600 samples: É É É (b) n = 3 (c) n = 10 (d) n = 100 109 / 114 Sampling Distribution of the Sample mean Experimental Study of the Sampling Distribution of x with n = 3, n = 10, n = 100 É Income Sampling Experiment Data Population n=3 n = 10 n = 100 Mean SD 22.172 22.584 21.955 22.176 15.635 9.376 4.916 1.193 The experimental results suggest the following conjectures: 1. The distribution of values of x for a SRS is centered around the population mean µX , regardless of sample size 2. The standard deviation σx of values of x decreases with increasing sample size – i.e., as n increases the distribution of x values becomes more concentrated around the population mean µX 3. 
The distribution of x̄ values becomes more symmetrical as the sample size becomes larger and is approximately normal for large n 110 / 114

Sampling Distribution of the Sample Mean
Theoretical Development: Mean & Standard Deviation of x̄
• The mean x̄ of an SRS is a random variable: x̄ = (1/n)(X1 + X2 + · · · + Xn)
• If the population has mean µ, then by the addition rule for means of sums of random variables
  µx̄ = (1/n)(µX1 + µX2 + · · · + µXn) = (1/n)(µ + µ + · · · + µ) = µ
• Thus the mean of x̄ is µ, the same as the mean of the population
• I.e., x̄ is an unbiased estimator of µ
111 / 114

Sampling Distribution of the Sample Mean
Theoretical Development: Mean & Standard Deviation of x̄ (cont'd)
• Because the observations Xi are independent, by the addition rule for variances
  σ²x̄ = (1/n²)(σ²X1 + σ²X2 + · · · + σ²Xn) = (1/n²)(σ² + σ² + · · · + σ²) = σ²/n
• Thus for an SRS of size n from a population with mean µ and standard deviation σ:
  µx̄ = µ and σx̄ = σ/√n
112 / 114

Sampling Distribution of the Sample Mean
Experimental Study of the Sampling Distribution of x̄ with n = 3, n = 10, n = 100
Table 21. Income Sampling Experiment Results and Theoretical Values Compared (600 Samples)
Data         Mean     SD       µx̄       σx̄      Fpcf(10)  σx̄*(11)
Population   22.172   15.635   —        —       —         —
n = 3        22.584   9.376    22.172   9.027   0.994     8.974
n = 10       21.955   4.916    22.172   4.944   0.980     4.846
n = 100      22.176   1.193    22.172   1.564   0.781     1.221
10 Finite population correction factor √(1 − n/N), with N = 256 and n = 3, 10, 100
11 Finite-population-corrected standard error σx̄* = (σX/√n) × √(1 − n/N)
113 / 114

Sampling Distribution of the Sample Mean
Why Does the Distribution of x̄ Become Normal When n Increases?
• In the income sampling experiment:
  • The distribution of income in the population is not normal (it is skewed to the right)
  • Even so, the distribution of the sample mean x̄ becomes symmetric & "normal-looking" as n increases
• This is due to a very important natural phenomenon, called the Central Limit Theorem:
  • Draw an SRS of size n from any population with mean µ and finite standard deviation σ. When n is large, the sampling distribution of the sample mean x̄ is approximately normal, so that x̄ is approximately N(µ, σ/√n)
  • The normal approximation for sample proportions & counts is also an instance of the CLT
  • Special case: the mean of an SRS from a normal population is exactly normally distributed (for any n)
114 / 114
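The pattern documented in the income sampling experiment and summarized by the Central Limit Theorem is easy to reproduce by simulation. A minimal R sketch (the right-skewed "population" below is simulated for illustration; it is not the 256-case income population actually used in the slides):

# An artificial right-skewed population of 256 incomes with mean near 22
set.seed(1)
income <- rexp(256, rate = 1/22)

# Draw many SRSs of size n (without replacement) and keep their sample means
sample_means <- function(n, reps = 600) replicate(reps, mean(sample(income, n)))
xbar3   <- sample_means(3)
xbar100 <- sample_means(100)

c(mean(income), mean(xbar3), mean(xbar100))   # means of x̄ stay near the population mean (unbiasedness)
c(sd(xbar3), sd(xbar100))                     # close to sd(income)/sqrt(n); for n = 100 the finite
                                              # population correction noted in Table 21 is noticeable
hist(xbar100)                                 # roughly normal even though income itself is skewed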