Download Notes - Voyager2.DVC.edu
Chapter 4: ORGANIZING POPULATIONS

There are two reasons for studying samples rather than whole populations when applying statistical methods: (1) convenience and (2) the Law of Large Numbers. (1) Samples are smaller than populations and thus easier to manage and analyze; moreover, we may not be able to observe an entire population when organizing and analyzing information. (2) Large enough samples, that is, "representative" samples, are sufficient to answer questions about the relative frequency (or probability) of unobserved elements in populations.

In order to accumulate a large enough (representative) sample, we must first be able to characterize the entire population of a data set. Chapter 4 is the first step in the process of describing a population. Probability Theory (chapter 4) and Probability Distributions (chapters 5 and 6) will enable us to organize information about a population in a way that we can easily compare to sample data. Following our coverage of Probability Distributions, we will be able to ascertain (chapter 7) when a sample "represents" a population in the range and frequency of the values it contains, before we reach any conclusions from that sample.

BASIC TERMINOLOGY

A random event is one which has not one but many different possible outcomes. The collection of all outcomes or elements, observed or unobserved, constitutes the sample space (or probability space) for the random event under study.
When the sample space is unknown, the number of outcomes and their relative frequencies are established by performing many repeated "statistical experiments", or by collecting observations in a systematic fashion and listing all the different outcomes together with their frequencies. According to the Law of Large Numbers, the relative frequency table built from many repetitions of these experiments yields an estimate of the true probabilities of all outcomes in the sample space.

Probabilities of outcomes may also be defined by the so-called classical approach: this is the approach we employed in the first two lectures on probability in class, and it is covered in your textbook in great detail.

Chapter 4. PROBABILITY THEORY

Probability theory is the study of uncertainty. Its purpose is to enable the understanding of events and/or outcomes which are not [definitively singular or known in advance]. OUR purpose in studying Probability Theory is to continue the process of learning about organizing and analyzing samples from populations. Probability theory will enable us to organize populations so that we can then proceed with our course objectives (Analysis and Inference from sample data).

RULES OF PROBABILITIES

A. Since the true probabilities of events in a sample space are the relative frequencies of occurrence of those events, they must at least obey the rules of relative frequencies:
   1. Each probability is at least 0 and at most 1 in value.
   2. The sum of the probabilities of all simple outcomes in the sample space is 1.
   3. An event with probability zero (0) cannot happen, just as an event with probability 1 will happen with certainty.

B. When simple outcomes combine to form other, more complex events, called compound events, the probability of the combined outcomes involves operations on the "unconditional" probabilities of each simpler event.
Two factors to assess before knowing how to calculate these more complex probabilities are the independence [1] of events and the mutual exclusiveness [2] of events.

C. Compound probabilities come in three different kinds: CONDITIONAL (GIVEN THAT), AND-type (the intersection of sets of outcomes, $\cap$) and OR-type (the union of sets of outcomes, $\cup$) combinations.
   1) The CONDITIONAL type of compound-event probability is evaluated by counting the frequency of occurrence of the event whose probability we are evaluating, relative to the event we are using as a "conditioning agent" in the probability statement.
   2) The latter two kinds of compound-event probabilities, namely the probability of the intersection and the probability of the union of events, are measured using the MULTIPLICATION RULE of probability and the ADDITION RULE of probability, respectively.

[1] Two events are Independent when the probability of one event is not changed by the occurrence of the other, and vice versa. Detailed examples will be presented in class.
[2] Two events are Mutually Exclusive when the probability of one event changes to zero (0) given the occurrence of the other, and vice versa. Detailed examples will be presented in class.

PROBABILITY THEORY

A horse race [EVENT] is going to take place. It is going to involve 8 horses that are identical in ability, jockey chemistry, track record, etc. Assume that one and only one horse can win the race. At the end of the race an announcement is made that sounds like "...and horse '#' wins the race" [OUTCOMES].

[Figure: the eight horses, numbered 1 through 8]

Simple outcomes of this event are descriptions of the individual horses that may win the race. For example, "horse 3 wins the race" is a simple outcome.

Compound outcomes of this event are descriptions of groups of individual horses that include the winner among their group. For example, if Allan bet on horses {2, 5, 7} to win, then "{2, 5, 7} wins the race" is a compound outcome.
The classical approach to calculating probabilities of simple and compound events/outcomes that are UNCONDITIONAL (as in the two examples above):

$$P(\text{outcome}) = \frac{\text{count of the number of ways the outcome occurs}}{\text{size of the probability space}}$$

Examples:
P(horse 3 wins the race) = 1/8
P(A) = P(Allan wins the bet) = P({2, 5, 7} wins the race) = 3/8

Given the (unconditional compound) outcomes:
A = {2, 5, 7}; B = {2, 4, 8}; C = {1, 3, 6}; D = {1, 3, 5, 7}; E = {2, 4, 6, 8}; F = {1, 2, 3, 4}

Other types of compound outcomes:
1. Conditional outcomes, written with $\mid$.
2. The intersection, $\cap$, of two unconditional outcomes.
3. The union, $\cup$, of unconditional outcomes.

Conditional (restricted) Probabilities:

$$P(\text{outcome} \mid \text{given occurrence}) = \frac{\text{restricted count of the number of ways the outcome occurs}}{\text{restricted size of the probability space}}$$

For example: P(A|E) = the probability of A winning the bet given that you know E was a winner = 1/4 = 25%. (Of the four horses in E, only horse 2 is also in A.)
Suggestion: always begin by calculating the denominator (the bottom of the fraction).
Try: P(B|A) "dependence"; P(D|E) "mutually exclusive"; P(F|E) "independence".

The intersection of events and the Multiplication Rule of Probability:

$$P(\text{outcome}_1 \cap \text{outcome}_2) = P(\text{outcome}_1) \cdot P(\text{outcome}_2 \mid \text{outcome}_1) = P(\text{outcome}_2) \cdot P(\text{outcome}_1 \mid \text{outcome}_2)$$

For example: $P(C \text{ and } D) = P(C \cap D) = P(C) \cdot P(D \mid C) = (3/8) \cdot (2/3) = 2/8 = 25\%$
Suggestion: calculate each part separately and then multiply them.
Try: P(A and B); P(E and F); P(C and E).

The union of events and the Addition Rule of Probability:

$$P(\text{outcome}_1 \cup \text{outcome}_2) = P(\text{outcome}_1) + P(\text{outcome}_2) - P(\text{outcome}_1 \cap \text{outcome}_2)$$

For example: $P(C \text{ or } D) = P(C \cup D) = P(C) + P(D) - P(C \text{ and } D) = (3/8) + (4/8) - (2/8) = 5/8 = 62.5\%$
Suggestion: calculate each part separately, then add the two probabilities and subtract the probability of the intersection.
Try: P(A or B); P(E or F); P(C or E).

COUNTING RULES

Counting rules make the calculation of odds or probabilities a lot easier. They help us organize events by telling us the number of possible ways in which we can arrange them.
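Before turning to the counting rules in detail, the horse-race probabilities worked above can be checked by direct counting. The following is only a minimal sketch: the helper name `prob` and the use of Python's `fractions.Fraction` are illustrative choices, not part of the notes. It assumes the eight winners are equally likely, as the example states.

```python
from fractions import Fraction

# Probability space: the 8 equally likely winners of the race.
SPACE = set(range(1, 9))

# Compound outcomes as given in the notes.
A = {2, 5, 7}
C = {1, 3, 6}
D = {1, 3, 5, 7}
E = {2, 4, 6, 8}
F = {1, 2, 3, 4}

def prob(event, given=SPACE):
    """Classical probability of `event`, counted inside the restricted space `given`.

    `prob` is a hypothetical helper: restricted count / restricted size.
    """
    return Fraction(len(event & given), len(given))

p_a_given_e = prob(A, given=E)              # conditional probability
p_c_and_d = prob(C) * prob(D, given=C)      # multiplication rule
p_c_or_d = prob(C) + prob(D) - p_c_and_d    # addition rule

print(p_a_given_e, p_c_and_d, p_c_or_d)     # 1/4 1/4 5/8

# Cross-checks by counting the intersection and union directly.
assert p_c_and_d == prob(C & D)
assert p_c_or_d == prob(C | D)
# Independence of E and F: knowing E won does not change P(F).
assert prob(F, given=E) == prob(F)
```

Using exact fractions rather than floating-point numbers keeps the results in the same form as the hand calculations (2/8 prints in lowest terms as 1/4).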
In class, we will cover the following:

Fundamental Counting Rule: counts the number of sequences that are possible when matching the outcomes of one event with the outcomes of any number of other events. For example, if you own 3 pairs of slacks, 4 shirts, 2 pairs of shoes, 2 pairs of socks, 1 undergarment and 2 coats, then you can combine your clothing into 3 x 4 x 2 x 2 x 1 x 2 = 96 distinct outfits.

The Factorial Rule (!): counts the number of ordered sequences that are possible when arranging distinct items in arrays whose length equals the number of items. For example, 5! = 5 x 4 x 3 x 2 x 1 = 120: five numbers can be arranged in 120 different 5-tuples.

The Permutations Rule (nPr): counts the number of distinct sequences that are possible when arranging items in arrays shorter than the number of items available, in which order matters. For example, 5P2 = 5 x 4 = 20: there are 20 ways of arranging numbers in pairs (sequences of length 2) when one can choose from five distinct numbers and order matters.

The Combinations Rule (nCr): counts the number of distinct selections that are possible when picking items in groups smaller than the number of items available, in which order does not matter. For example, 5C2 = (5 x 4) / 2 = 10: there are 10 ways of picking numbers in pairs (groups of size 2) when one can choose from five distinct numbers and order does not matter.

Counting Rules

I. Fundamental counting rule: the number of possible sequence-arrangements of joint compound events equals the product (multiplication) of the number of arrangements of each component/part. For example, if a car model can be offered to customers in 4 interior colors and 8 exterior colors, then the total number of car arrangements (by interior-exterior) is 4 * 8 = 32.

II. Factorial counting rule: the number of possible arrangements of n distinct objects into n-tuples is equal to n!
(which reads "n factorial"). For example, how many quartets of the objects below can be formed so that each object appears only once in the 4-tuple? Answer: 4! = 4 x 3 x 2 x 1 = 24 ways.

[Figure: the four distinct objects]

The first object can be placed in any of the four (4) spots in the sequence; once a spot is selected for it, the object cannot be used again in the sequence (because the sequences are distinct), and the spot reserved for it cannot be taken by another object.

The second object can be placed in any of the three (3) remaining spots in the sequence, under the same restrictions.

The third object can be placed in either of the two (2) remaining spots in the sequence, under the same restrictions.

The fourth object can only be placed in the one (1) remaining spot in the sequence.

Thus, applying the fundamental counting rule, we get that the total number of arrangements is 4 x 3 x 2 x 1 = 24. This special pattern of counting is called factorial because we are "factoring out" one possibility after every consecutive selection in the sequence, until we exhaust the list of items being counted. This accomplishes our objective of having every item of the sequence be different from every other item in the list (distinct), while accounting for all items on the list.

III. Permutations counting rule (shuffles): counts the number of distinct ordered sequences of items picked from a larger list of possibilities. A permutations count is essentially a factorial count cut short because not all possible items get selected.
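The counting rules introduced so far can be reproduced with Python's standard library. This is a sketch for checking the arithmetic, not a required method: it assumes Python 3.8 or later (for `math.prod`, `math.perm` and `math.comb`), and the variable names are illustrative.

```python
import math
from itertools import combinations, permutations

# Fundamental counting rule: multiply the number of choices at each step
# (slacks, shirts, shoes, socks, undergarment, coats).
outfits = math.prod([3, 4, 2, 2, 1, 2])

# Factorial rule: orderings of 5 distinct items.
orderings_of_five = math.factorial(5)

# Permutations rule: ordered pairs chosen from 5 distinct items.
ordered_pairs = math.perm(5, 2)

# Combinations rule: unordered pairs chosen from 5 distinct items.
unordered_pairs = math.comb(5, 2)

print(outfits, orderings_of_five, ordered_pairs, unordered_pairs)  # 96 120 20 10

# Brute-force enumeration agrees with the formulas.
numbers = range(5)
assert len(list(permutations(numbers))) == orderings_of_five
assert len(list(permutations(numbers, 2))) == ordered_pairs
assert len(list(combinations(numbers, 2))) == unordered_pairs
```

The enumeration at the end lists every sequence explicitly, which is only practical for small cases but makes the difference between "order matters" and "order does not matter" concrete.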
For example, "count the number of ways in which 2 items can be picked out of a list of four items in such a way that the order of the sequence is important." Answer: 4P2 = 4 x 3 = 12 ways.

The first object can be any of the four (4) available; once a spot is occupied by one of the items, the spot cannot be used again, and the object cannot repeat itself in the sequence (because the sequences are distinct).

The second object picked can be any of the three (3) remaining choices.

IV. Combinations counting rule: counts the number of distinct selections of items picked from a larger list of possibilities in such a way that the order of appearance of the elements does not "matter" (does not affect the total count). For example, "count the number of ways in which 2 items can be selected from a list of four items in such a way that the order of appearance of the items does not matter." Answer: 4C2 = (4P2) / 2! = (4 x 3) / 2 = 6 ways.

Listing the distinct pairs (pictured in the original): 3 ways + 2 ways + 1 way adds up to 6 ways.

Probability Distributions

Probability Distributions are complete descriptions of populations that tell us what occurs in the population (the probability space) and how often we observe each of the different occurrences (the probabilities of all outcomes in the probability space).

Example #1:
EVENT: number of tails (X) in two (2) tosses of a fair coin
OUTCOMES OF EACH TOSS: H, T
PROB. SPACE: {0, 1, 2}
SIZE OF PROB. SPACE = 4 (the equally likely sequences HH, HT, TH, TT)

Tree Diagram of all Outcomes of the above EVENT

First toss   Second toss   Sequence   X (number of tails)
H            H             HH         0
H            T             HT         1
T            H             TH         1
T            T             TT         2

X                          P(X)
0                          1/4
1                          2/4
2                          1/4
All ($0 \le X \le 2$)      4/4 = 1 = 100%

A Probability Distribution is the equivalent of a relative frequency table for a population of data.

Mean of a population, $\mu$:
$\mu$ = middle value of a population
      = (not statistical, but) weighted (by P(X)) average of all population values (X)
      = $\sum X \cdot P(X)$

Process steps:
1. Multiply outcomes by their respective probabilities: X*P(X)
2.
Add all the products up.

Example #1 (cont.):

X     P(X)    X*P(X) (Step 1)
0     1/4     0
1     2/4     2/4
2     1/4     2/4
All ($0 \le X \le 2$): sum of P(X) = 4/4 = 1 = 100%; sum of X*P(X) = 1 = $\mu$ (Step 2)

Variance of a population, $\sigma^2$:
$\sigma^2$ = average squared dispersion in the population
           = (not statistical, but) weighted (by P(X)) average of all population squared deviations from $\mu$
           = $\sum (X - \mu)^2 \cdot P(X)$

Process steps:
1. Subtract the population mean, $\mu$, from each outcome to get the DEVIATIONS.
2. Square the DEVIATIONS.
3. Multiply the squared DEVIATIONS by their respective probabilities: $(X - \mu)^2 \cdot P(X)$.
4. Add all the products up.

                        Step 1        Step 2          Step 3
X     P(X)    X*P(X)    $(X - \mu)$   $(X - \mu)^2$   $P(X) \cdot (X - \mu)^2$
0     1/4     0         -1            1               1/4
1     2/4     2/4        0            0               0
2     1/4     2/4       +1            1               1/4
All ($0 \le X \le 2$): sum of P(X) = 4/4 = 1 = 100%; sum of X*P(X) = 1 = $\mu$; sum of $P(X) \cdot (X - \mu)^2$ = 1/2 = $\sigma^2$ (Step 4)
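The process steps for the mean and the variance can be traced in a few lines of Python. This is a sketch under the assumptions of Example #1 (a fair coin tossed twice, so the four sequences are equally likely); the variable names are illustrative and not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 4 equally likely sequences of two tosses (HH, HT, TH, TT).
sequences = list(product("HT", repeat=2))

# X = number of tails in each sequence; count how often each value occurs.
tails_counts = Counter(seq.count("T") for seq in sequences)

# Probability distribution: relative frequency of each value of X.
distribution = {x: Fraction(n, len(sequences)) for x, n in sorted(tails_counts.items())}

# Mean: multiply each outcome by its probability, then add up.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared deviations from the mean, weighted by probability, added up.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

print(mean, variance)  # 1 1/2
```

Building the distribution from the enumerated sequences, rather than typing the probabilities in by hand, mirrors the tree diagram: each value of P(X) is just the share of branches that lead to that number of tails.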