Combinatorics

What is Combinatorics?
Definition: the methodology of arranging or selecting elements of finite sets according to given rules.
Three basic problems:
- Existence: decide whether some arrangement is possible.
- Counting: count the number of ways the arrangement can be carried out.
- Construction: generate all possible arrangements.
Many problems in probability theory require that we count the number of ways in which a particular event can occur (counting problems).

Multiplication Rule
Fact: If the set A has m elements and the set B has n elements, then the set A×B has m⋅n elements.
Multiplication Rule: Suppose that an arrangement or selection process consists of two sequential parts. The first part can occur in n ways, the second in m ways, and the outcome of one part does not affect that of the other. Then the process can occur in n⋅m ways.
More generally, suppose a task is carried out in a sequence of r stages. There are n1 ways to carry out the first stage; for each of these, there are n2 ways to carry out the second stage; for each of these, n3 ways to carry out the third stage, and so forth. Then the total number of ways in which the entire task can be accomplished is the product N = n1⋅n2⋅⋅⋅nr.

Counting Subsets
Fact: If the set A has n elements, then the number of its subsets is 2^n.
Explanation: The number of subsets equals the number of binary n-strings: position by position, each bit records whether the corresponding element is included in the subset (00…0 ≡ ∅, …, 11…1 ≡ A). The number of binary n-strings equals the number of elements of B×B×…×B (n times), where B = {0, 1}; by the multiplication rule this is 2⋅2⋅⋅⋅2 = 2^n.
Example: The subsets of {a,b,c} are ∅, {a}, {b}, {c}, {a,b}, {a,c}, {b,c}, {a,b,c}.

Types of Selections
A selection process can be with or without replacement.
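The 2^n subset count from the previous slide can be made concrete by enumerating the binary n-strings directly. A minimal Python sketch (the helper name all_subsets is illustrative, not a standard function):

```python
from itertools import product

def all_subsets(elements):
    """Enumerate all subsets of `elements` by walking the binary n-strings:
    bit i decides whether element i is included (multiplication rule: 2*2*...*2 choices)."""
    return [
        {e for e, bit in zip(elements, bits) if bit}
        for bits in product([0, 1], repeat=len(elements))
    ]

subs = all_subsets(["a", "b", "c"])
print(len(subs))      # 2**3 = 8 subsets
print(set() in subs)  # True: the string 000 gives the empty set
```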
In selection with replacement, the number of choices remains the same throughout the selection process; any selected object remains eligible to be selected again. In selection without replacement, the number of choices decreases by one at each step; a selected object is no longer eligible for reselection. The order of the objects being selected may or may not matter. If the order matters, selecting A before B is different from selecting B before A. If the order does not matter, i.e., all you care about is which objects you have selected, the collections {A, B} and {B, A} are exactly the same.
Consider selecting two objects from {A, B}:

                      Order does not matter   Order matters
With replacement      {AA, AB, BB}            {AA, AB, BA, BB}
Without replacement   {AB}                    {AB, BA}

Variations (order matters)
A variation of n distinguishable objects taken k at a time is an arrangement (selection) in which the order of (the selection of) the objects matters.
The number of variations without replacement:
V(n,k) = n⋅(n−1)⋅(n−2)⋅⋅⋅(n−k+1) = n!/(n−k)!
There are n ways to choose the first object; for each of these, n−1 ways to choose the second; for each of these, n−2 ways to choose the third, and so forth.
The number of variations with replacement:
V̄(n,k) = n⋅n⋅⋅⋅n = n^k
There are n ways to choose the first object; for each of these, n ways to choose the second; and so forth.

Variations (examples)
- How many 4-digit numbers with all digits different are there?
  N = 9⋅V(9,3) = 9⋅9⋅8⋅7 = 4536
  (The leading digit cannot be 0, so there are 9 choices for it, followed by an ordered choice of 3 of the remaining 9 digits.)
- How many 4-digit numbers are there?
  N = 9⋅V̄(10,3) = 9⋅10⋅10⋅10 = 9000
- In poker, each player (consider a single player) is randomly dealt 5 cards. Find the number of possible hands.
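The two variation formulas can be written as small Python helpers; as a sketch (the function names are mine), they reproduce the digit-number counts above:

```python
from math import factorial

def variations(n, k):
    """Ordered selections without replacement: n * (n-1) * ... * (n-k+1) = n! / (n-k)!"""
    return factorial(n) // factorial(n - k)

def variations_with_repl(n, k):
    """Ordered selections with replacement: n^k"""
    return n ** k

# 4-digit numbers with all digits different: 9 choices for the leading digit,
# then an ordered choice of 3 of the remaining 9 digits
print(9 * variations(9, 3))             # 4536
# all 4-digit numbers: 9 * 10^3
print(9 * variations_with_repl(10, 3))  # 9000
print(variations(52, 5))                # 311875200 ordered 5-card deals
```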
N = V(52,5) = 52⋅51⋅50⋅49⋅48 = 311,875,200 if the order in which the five cards arrive is counted. (The number of unordered hands is C(52,5) = 2,598,960; see Combinations below.)

Permutations (order matters)
A permutation of n distinguishable objects is an arrangement of all n objects in which the order matters (permutation = variation without replacement with k = n).
The number of permutations:
P(n) = n⋅(n−1)⋅(n−2)⋅⋅⋅1 = n!
There are n ways to place the first object; for each of these, n−1 ways to place the second; for each of these, n−2 ways to place the third, and so forth.
Permutations of non-distinct objects: Suppose that there are n objects of k types, where objects of the same type are indistinguishable from each other, with ni objects of the i-th type and Σi ni = n. The number of distinguishable permutations of the non-distinct objects is
P(n1, n2, …, nk) = n!/(n1! n2! ⋅⋅⋅ nk!)

Permutations (examples)
- In how many ways can 2 apples, 3 oranges, and 3 peaches be arranged in a row?
  P(2,3,3) = 8!/(2!3!3!) = 560
- How many distinguishable permutations can be made of the letters of the word "mississippi"?
  P(1,4,4,2) = 11!/(1!4!4!2!) = 34650

Combinations (order does not matter)
A combination of n distinguishable objects taken k at a time is a selection in which the order of selection does not matter.
The number of combinations:
C(n,k) = V(n,k)/k! = n!/(k!(n−k)!)
Because the order does not matter, the number is obtained by dividing the number of variations by the number of permutations of the k selected objects, which eliminates all the different orderings of the same selection.
The number of combinations with replacement:
C̄(n,k) = C(n+k−1, k)
Note that in combinations with replacement, n may be less than k.

Combinations (examples)
Objects {a, b, c}. Combinations of order 2: {ab, ac, bc} (3 = C(3,2)). Combinations with replacement of order 2: {aa, ab, ac, bb, bc, cc} (6 = C(3+2−1, 2)).
- From a group of 7 men and 4 women we choose 6 persons, of whom at least two must be women. Count the number of possibilities.
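A sketch of these formulas in Python (math.comb is the standard binomial coefficient; comb_with_repl is my own label), applied to the committee exercise just stated and to the {a, b, c} multiset example:

```python
from math import comb  # math.comb(n, k) = n! / (k! (n-k)!)

# choose 6 from 7 men + 4 women, with at least two women:
# split into the disjoint cases of exactly 2, 3, or 4 women
total = sum(comb(4, w) * comb(7, 6 - w) for w in range(2, 5))
print(total)  # 371

def comb_with_repl(n, k):
    """Combinations with replacement: C(n + k - 1, k)."""
    return comb(n + k - 1, k)

print(comb_with_repl(3, 2))  # 6: the multisets {aa, ab, ac, bb, bc, cc}
```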
N = C(7,4)⋅C(4,2) + C(7,3)⋅C(4,3) + C(7,2)⋅C(4,4) = 35⋅6 + 35⋅4 + 21⋅1 = 371
(the disjoint cases of exactly 2, 3, or 4 women)
- A flower shop has 5 kinds of flowers. How many bouquets of 11 flowers can be made?
  N = C̄(5,11) = C(5+11−1, 11) = C(15,11) = (15⋅14⋅13⋅12)/(1⋅2⋅3⋅4) = 1365

Examples
There are four suits (spades, hearts, clubs, and diamonds) in an ordinary deck of cards. Each suit has 13 denominations (ace, 2, 3, …, 10, jack, queen, king). By convention, the ace is regarded as either 1 or 14; jack as 11, queen as 12, king as 13. In poker, each player is randomly dealt 5 cards. The five cards form:
- a royal flush if the five cards are 10, J, Q, K, A of the same suit,
- a straight flush if the five cards are of the same suit and of consecutive denominations, but not a royal flush,
- a four-of-a-kind if four cards are of the same denomination,
- a full house if three cards are of one denomination and two cards of another,
- a flush if the five cards are of the same suit, but neither a royal flush nor a straight flush,
- a straight if the five cards are of consecutive denominations but not all of the same suit,
- a three-of-a-kind if three cards are of one denomination and one card each of two other denominations,
- a two-pairs if two cards are of one denomination, two cards of another denomination, and one card of a third denomination,
- a one-pair if two cards are of one denomination and one card each of three other denominations, and
- a bust if the hand is none of the above.
Find the number of hands in each of the above cases.

End of Chapter. Thank you for your attention!

II. Probability

What is Probability?
Since prehistoric times, mankind has been aware of deterministic phenomena:
- daily sunrises and sunsets
- tides at sea shores
- phases of the moon
- seasonal changes in weather
- annual flooding of the Nile
Mankind has also noticed random phenomena:
- results of coin tosses
- results of rolling dice
- results of horse races
Legends and folk tales from all over the world refer to dice games and gambling.

What is Probability?
Modern mankind also loves to gamble:
- state-run lotteries
- casinos
- betting parlors, etc.
Probabilistic notions are commonplace in everyday language. We use words such as:
- probable/improbable; possible/impossible
- certain/uncertain; likely/unlikely
Phrases such as "there is a 50-50 chance", "the odds are 7 to 4 against", and "the probability of precipitation is 20%" are understood by most people.

What is Probability?
What is the probability that a coin comes down heads when it is tossed? Almost everyone answers 1/2. Why 1/2?
- "Because there are two sides to the coin"
- "Because when tossed repeatedly, the coin will turn up heads half the time"
This is called the classical approach to probability. Justification: the symmetry principle; the principle of indifference (or insufficient reason).
If we toss two coins, are there three outcomes or four?
- {0 heads, 1 head, 2 heads}?
- {(T,T), (T,H), (H,T), (H,H)}?
Note that "2 heads" has probability 1/3 or 1/4 depending on the choice.

What is Probability?
Suppose multiple tosses have resulted in 50% heads. Setting P(Heads) = 1/2 on this basis is the relative-frequency approach to probability.
- If an outcome x occurs Nx times in N trials, its relative frequency is Nx/N, and we define its probability P(x) to be Nx/N.
- Does there exist a probability of heads for a mint-new, untossed coin? Or do probabilities come into existence only after multiple tosses?
- How large should N be?
- Are probabilities re-defined after each toss?
Many assertions about probability are essentially statements of belief. A fair coin is one for which P(Heads) = 1/2, but how do we know whether a given coin is fair? Symmetry of the physical object is a belief; that further tosses of a coin for which P(Heads) = 1/2 will result in 50% heads is a belief.

What is Probability?
Consider a dice game that played an important role in the historical development of probability. The famous letters between Pascal and Fermat, which many believe started the serious study of probability, were instigated by a request for help from a French nobleman and gambler, the Chevalier de Méré. It is said that de Méré had been betting that, in four rolls of a die, at least one six would turn up. He was winning consistently and, to get more people to play, he changed the game to a bet that, in 24 rolls of two dice, a pair of sixes would turn up. It is claimed that de Méré lost with 24 rolls and felt that 25 rolls were necessary to make the game favorable. Later we shall compute the following probabilities:
P("at least one 6 in 4 rolls") = 0.518
P("at least one double six in 24 rolls") = 0.491
P("at least one double six in 25 rolls") = 0.506

Experiments
An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a random experiment. The set of all possible outcomes of a random experiment is called the sample space, and each outcome is called a trial. The sample space is denoted by Ω and a trial by ω.
Example: Tossing a coin: Ω = {H, T}
Example: Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
Example: Phone calls to the police (in 24 hours): Ω = {0, 1, 2, 3, …} = ℕ
Example: Measuring a noise voltage: Ω = {x : −1 ≤ x ≤ 1}
Example: Brownian motion of particles: Ω = {(x(t), y(t), z(t)), t ∈ [0, T]}

Events
A subset of Ω is called an event.
Example: A = {2, 4, 6} and B = {2, 3, 5} are said to be events defined on the sample space Ω = {1, 2, 3, 4, 5, 6}. "Events defined on the sample space" is merely a probabilist's way of saying "subsets of the sample space". A^C = {1, 3, 5} and B^C = {1, 4, 6} are also events defined on Ω.
A sample space Ω of n elements has 2^n different events (subsets).
Two special events:
- Ω can be regarded as a subset of itself. On any trial, the event Ω always occurs; it is called the certain event or the sure event.
- ∅, the empty set, is also a subset of Ω. On any trial, the event ∅ never occurs; it is called the null event or the impossible event.

Experiments and Events
An event containing a single outcome (trial) is called an elementary event or singleton event. Elementary events cannot happen simultaneously.
Example: Rolling two dice (showing that an experiment can have more than one sample space):
Ω1 = {(x,y) : x,y = 1, 2, …, 6}
Ω2 = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} (the sum)
Ω3 = {"even sum", "odd sum"}
Ω4 = {"same numbers", "different numbers", "sum 7"}
Ω1, Ω2, Ω3 are elementary event spaces. Ω4 is not an elementary event space, because the events "different numbers" and "sum 7" can happen simultaneously. Ω1 is the most informative elementary event space.

Event Operations
A ∩ B (also written A⋅B or AB), A ∪ B, A^C, A \ B (also A − B).
Event A implies event B (A ⊆ B) iff whenever A occurs, B also occurs.
Example: Rolling two dice, sample space Ω1.
A = {(x,y) : x = y} (same numbers); B = {(x,y) : x + y = 2k} (even sum)
The following relations hold: A ∩ B = A, A ∪ B = B, A^C = {(x,y) : x ≠ y}, A ⊆ B,
B \ A = {(1,3), (1,5), (2,4), (2,6), (3,1), (3,5), (4,2), (4,6), (5,1), (5,3), (6,2), (6,4)}

Probability Axioms
Probabilities are numbers assigned to events that satisfy the following rules:
- Axiom I: P(A) ≥ 0 for all events A
- Axiom II: P(Ω) = 1
- Axiom III: If events A and B are disjoint, then P(A ∪ B) = P(A) + P(B)
Consequences of the axioms: P(∅) = 0; 0 ≤ P(A) ≤ 1; P(A^C) = 1 − P(A); if A ⊆ B then P(A) ≤ P(B); P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Discrete Probability
For an elementary sample space Ω with n outcomes ω1, ω2, …, ωn, the probabilities of events are determined by the probabilities of the outcomes.
Classical approach: each outcome has probability 1/n; P(ωi) = 1/n for all i, and P(A) = |A|/n, where |A| is the number of elements of A. Special cases: P(Ω) = n/n = 1, P(∅) = 0/n = 0.
Nonclassical approach: the n outcomes have probabilities p1, p2, …, pn, where pi ≥ 0 and Σ pi = 1, and P(A) = the sum of the pi over the outcomes in A. Special cases: P(∅) = 0 as before; P(Ω) = p1 + p2 + … + pn = 1.

Examples (1)
What is the probability that when rolling two dice we get: a) sum 7, b) different numbers, c) sum greater than 8?
P("sum 7") = 6/36 = 0.1667, P("different numbers") = 30/36 = 0.8333, P("sum > 8") = 10/36 = 0.2778
We deal 3 playing cards.
What is the probability of getting: a) exactly one ace, b) at least one ace, c) at least one club, d) a blackjack (when two cards are dealt: an ace together with a 10, J, Q, or K)?
P("one ace") = C(4,1)C(48,2)/C(52,3) = 0.204
P("at least one ace") = [C(4,1)C(48,2) + C(4,2)C(48,1) + C(4,3)C(48,0)]/C(52,3) = 0.217
Better approach:
P("at least one ace") = 1 − P("no ace") = 1 − C(4,0)C(48,3)/C(52,3) = 0.217
P("at least one club") = 1 − C(13,0)C(39,3)/C(52,3) = 0.587
P("blackjack", dealing 2 cards) = C(4,1)C(16,1)/C(52,2) = 0.048

Examples (2)
How many people do we need in a room to make it a favorable bet (probability of success greater than 1/2) that two people in the room have the same birthday? Since there are 365 possible birthdays, it is tempting to guess that we would need about half this number, or 183. You would surely win this bet. In fact, the number required for a favorable bet is only 23.
P("some birthday shared") = 1 − P("all birthdays distinct") = 1 − [365⋅364⋅363⋅⋅⋅(365−k+1)]/365^k

k (people)    10      20      23      30      40      50
Probability   0.1169  0.4115  0.5073  0.7064  0.8912  0.9704

Compute the probabilities that in four rolls of a die at least one six turns up, and that in 24 rolls of two dice a pair of sixes turns up (Chevalier de Méré):
P("at least one 6 in 4 rolls") = 1 − P("no 6 in 4 rolls") = 1 − 5^4/6^4 = 0.518
P("at least one double six in 24 rolls") = 1 − P("no double six in 24 rolls") = 1 − 35^24/36^24 = 0.491

Estimates of Probabilities (1)
- Based on past experimental results, we can use the observed relative frequency as an estimate of the probability of an event.
If an experiment is repeated N times and event A occurs NA times, the relative frequency of the event A is W(A) = NA/N.
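Both the birthday figures and the relative-frequency idea can be illustrated in a few lines of Python (the sample size and seed are arbitrary choices of mine):

```python
import random

def p_shared_birthday(k):
    """Exact: 1 - 365*364*...*(365-k+1) / 365^k."""
    p_distinct = 1.0
    for i in range(k):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

print(round(p_shared_birthday(23), 4))  # 0.5073, as in the table

# relative-frequency estimate W(A) = N_A / N from simulated rooms of 23 people
random.seed(1)
N = 20_000
hits = sum(
    len(set(random.randrange(365) for _ in range(23))) < 23  # some birthday repeated
    for _ in range(N)
)
print(hits / N)  # close to 0.5073, typically within about N**-0.5
```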
- Estimates are usually reasonably good if the number of trials is large. In N trials, the relative frequency that we observe might easily differ from the true probability by as much as N^(−1/2) or more.
- Suppose that a fair coin is tossed a million times. Is there a logical reason why the coin will not turn up heads each and every time?
- "No, there is no logical reason why it couldn't, but it is very unlikely."
- "Yes, if the coin is fair, there is no way that it can turn up heads a million times in a row." (This second answer is tempting but wrong: fairness does not forbid any particular finite sequence.)

Estimates of Probabilities (2)
If we toss two coins, are there three outcomes or four?
- {0 heads, 1 head, 2 heads}?
- {(T,T), (T,H), (H,T), (H,H)}?
Note that "2 heads" has probability 1/3 or 1/4 depending on the choice (if the outcomes are assumed equally likely). Only the observed relative frequency tells us which probability to choose. So initial estimates of outcome probabilities are based on observed relative frequencies; probability theory then reduces to methods for calculating the probabilities of events composed of such outcomes. In the case of two coins, the observed relative frequency is W((H,H)) ≈ 1/4.

Estimates of Probabilities (3)
- A lot of effort was expended in trying to define probability as the limit of the relative frequency: P(A) = lim(N→∞) NA/N
- Unfortunately, the limit does not exist in a mathematical sense.
- Physically, we will only ever observe a finite prologue of the sequence of trials.
- The nonclassical approach cannot be used at all for uncountably infinite sample spaces.
Example: Pick a random number between 0 and 1: Ω = {x : 0 < x < 1}. Consider a million random numbers obtained from rand(). A particular outcome, say 0.703546789, will either not have occurred in these million trials, or it will have occurred just once, giving the estimate P{0.703546789} = 10^(−6) or 0.

Estimates of Probabilities (4)
- The relative-frequency estimate seems to converge to 0 as the number of trials increases!
- The only model that works for uncountably infinite sample spaces is for each outcome to have probability 0.
- But on each trial some outcome occurs, doesn't it? So where are the probabilities?
- For rand(), P{a < outcome < b} = b − a.
- The nonzero probabilities are assigned to intervals of the line, not to outcomes!
- In the physical sciences and engineering, the real numbers are a model for many phenomena that are discrete at the microscopic level. This usually causes no problems, and the model usually gives the correct answers.
- We all understand that V = 1.235 volts really means 1.2345 ≤ V ≤ 1.2355 volts.

God made the integers
Kronecker: "God made the integers; all else is the work of man."
- Human beings usually choose rational numbers when asked for a number in (0,1).
- A physical measurement made with an instrument will yield a rational number.
- rand() returns "real numbers" that are actually rational numbers.
- All this is because of finite precision.
Do real numbers exist?
- The real number line is a mathematical construction that models the real world very well indeed.
- If the volume charge density is ρ, the charge in a volume Δv is just ρΔv. For Δv very small, ρΔv is smaller than the charge of an electron, so the model cannot be right for small volumes (or densities)! (Paradoxes.)
- But it is convenient!

What about P(arbitrary subset)?
- If every outcome is an event of probability zero, then isn't it true that any event A must also have probability zero?
- P(A) = sum of the probabilities of all the outcomes that comprise A = 0 + 0 + … = 0?
- No: the above is a misapplication of Axiom III, which applies to countable unions only.
- Since each outcome has probability zero, a countable event (an event that has a countable number of outcomes) also has probability zero, by Axiom III.
- Axiom III does not say that the probability of an uncountable event is the sum of the probabilities of its outcomes.
- For uncountably infinite sample spaces, a consistent probability assignment to all the subsets of Ω is not possible.

Asking the right question
- The nonzero probabilities are assigned to intervals of the line, not to outcomes!
- In most physical applications, the question "Does x = 0.213482774099070267623…?" is meaningless. If x were 0.213482774099070267624… instead, the airplane would still fly, the bridge would still stand, the modem would still connect.
- In most instances we are satisfied if x is in some specified range (design specs). "Does x ∈ (a,b)?" is the right question!
Example: Choose a random number between 0 and 1. P{a < outcome < b} = b − a; P{0.4 < outcome < 0.6} = 0.2. N calls to rand() give N numbers, roughly 20% of which lie in the interval (0.4, 0.6). At most one (and most likely none!) of these will be 0.57689231.

Geometric Probability
Let Ω be an infinite, uncountable set representing some geometrical object in R (the line), R² (the plane), R³ (space), R⁴ (4-dimensional space), etc., and let S ⊆ Ω be a random event. Then the probability of S is the number
P(S) = m(S)/m(Ω),
where m(A) is a measure of the set A (length in R, area in R², volume in R³, etc.).
Example: Two friends arrange a meeting between 10 and 11 o'clock. What is the probability that they meet, if each will wait for the other at most 20 minutes?
Ω = {(x,y) : x,y ∈ [0,60]}, S: |x − y| < 20
P = (60² − 40²)/60² = 5/9 ≈ 0.5556
(The complement of S within the 60×60 square consists of two right triangles with legs of length 40, of total area 40².)
Example: What is the probability that the equation x² + ax + b = 0 has real solutions for randomly chosen a,b ∈ [0,1]?
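Before the analytic solution that follows, the answer can be estimated by Monte Carlo in the geometric-probability spirit: sample (a, b) uniformly in the unit square and measure the fraction of the region where the discriminant a² − 4b is nonnegative (sample size and seed are arbitrary choices of mine):

```python
import random

random.seed(7)
N = 100_000
hits = sum(
    1
    for _ in range(N)
    if random.random() ** 2 >= 4 * random.random()  # discriminant a^2 - 4b >= 0
)
print(hits / N)  # should be near 1/12 = 0.0833
```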
Ω = {(a,b) : a,b ∈ [0,1]}, S: a² ≥ 4b, i.e. b ≤ a²/4
P = ∫₀¹ (a²/4) da = 1/12 ≈ 0.083
(The region S is the area under the parabola b = a²/4 inside the unit square.)
Example: A stick is randomly broken into three parts. What is the probability that the parts can form a triangle?
Let the parts be x, y, and l − x − y, where l is the length of the stick.
Ω = {(x,y) : x, y > 0, x + y < l}
S (the triangle inequalities):
x < y + (l − x − y) ⇒ x < l/2
y < x + (l − x − y) ⇒ y < l/2
l − x − y < x + y ⇒ x + y > l/2
P = [½⋅(l/2)⋅(l/2)] / [½⋅l⋅l] = 1/4 = 0.25

Conditional Probability
- The experiment has been performed and we know that the event A occurred, that is, the outcome is some member of A.
- Question: What are the chances that B occurred, in view of the new knowledge that A is known to have occurred?
- One (naive) answer is that the chances that B occurred are still what they always were, viz. P(B). But consider two Venn diagrams: on the left, A and B disjoint; on the right, A ⊆ B.
- Left diagram: AB = ∅. Obviously, if A occurred, B cannot have occurred.
- Right diagram: AB = A. Obviously, if A occurred, B must also have occurred.

Definition
- The conditional probability of B given A is denoted by P(B|A), read "the probability of B given A" or "the probability of B conditioned on A". A is called the conditioning event.
- Definition: If P(A) > 0, P(B|A) is defined as
P(B|A) = P(AB)/P(A)
- Left diagram: AB = ∅, so P(B|A) = P(AB)/P(A) = 0, as it should be.
- Right diagram: AB = A.
Obviously, if A occurred, B must also have occurred: P(B|A) = P(AB)/P(A) = P(A)/P(A) = 1.

Motivation
The definition of conditional probability is motivated by considerations arising from the relative-frequency viewpoint.
- Suppose that N independent trials of the experiment have been performed.
- Let NA denote the number of trials on which event A occurred, NB the number on which B occurred, and NAB the number on which AB occurred.
- Consider only those NA trials on which A occurred and ignore the rest. B occurred on NAB of those NA trials, so the relative frequency of B on them is
W(B|A) = NAB/NA = (NAB/N)/(NA/N) = W(AB)/W(A)
Conditional probability is relative frequency on a restricted set of trials.

Examples
Example: Two fair dice are rolled. What is the probability that the sum of the two faces is 6, given that the dice show different faces? That the sum is 7, given different faces? That the first die shows a 6, given different faces?
A = "different faces", P(A) = 30/36
B = "sum is 6", AB = {(1,5), (5,1), (2,4), (4,2)}
C = "sum is 7", AC = {(1,6), (6,1), (2,5), (5,2), (3,4), (4,3)}
D = "first die shows a 6", AD = {(6,1), (6,2), (6,3), (6,4), (6,5)}
P(B|A) = (4/36)/(30/36) = 2/15 < P(B) = 5/36
P(C|A) = (6/36)/(30/36) = 1/5 > P(C) = 1/6
P(D|A) = (5/36)/(30/36) = 1/6 = P(D)

Axioms
Conditional probabilities are a probability measure: they satisfy the axioms of probability theory.
- Axiom I: 0 ≤ P(B|A) ≤ 1 for all events B.
Since AB ⊆ A, we have 0 ≤ P(AB) ≤ P(A), so 0 ≤ P(B|A) = P(AB)/P(A) ≤ 1.
- Axiom II: P(Ω|A) = 1. Since AΩ = A, P(AΩ) = P(A), so P(Ω|A) = P(AΩ)/P(A) = 1.
- Axiom III holds similarly.
An expression such as P((B ∪ C)|A) is commonly written P(B ∪ C|A). Beginners' mistake: if B and C are disjoint, they write P(B ∪ C|A) = P(B) + P(C|A). NOT!

Some Rules
- P(B^C|A) = 1 − P(B|A)
- If B ⊆ C, then P(B|A) ≤ P(C|A)
- If BC = ∅, then P(B ∪ C|A) = P(B|A) + P(C|A)
- More generally, P(B ∪ C|A) = P(B|A) + P(C|A) − P(BC|A)
- Even if A, B, C, and D are disjoint, P(B ∪ C|A ∪ D) ≠ P(B) + P(C|A) + P(D)
OK, so you can update your probabilities to conditional probabilities if you know that event A occurred. Is that all there is to it? Is the notion of conditional probability just a one-trick pony? Surely life holds more than that? Actually, conditional probabilities are fundamental tools in probabilistic analyses.

Chain Rule
- P(B|A) = P(AB)/P(A) and symmetrically P(A|B) = P(AB)/P(B)
- Hence P(AB) = P(B|A)P(A) = P(A|B)P(B)
- More generally, P(ABCD…) = P(A)P(B|A)P(C|AB)P(D|ABC)…
  The product of the first two factors is P(AB); P(C|AB)P(AB) = P(ABC), so the product of the first three factors is P(ABC), and so on.
- Every probability result also applies to conditional probabilities.
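As an illustration of the chain rule (the card-drawing setting is my own example, not from the slides): the probability that the first three cards dealt from a shuffled deck are all aces is P(A1)P(A2|A1)P(A3|A1A2), since each ace drawn leaves one fewer ace in a smaller deck:

```python
from fractions import Fraction

# chain rule: P(A1 A2 A3) = P(A1) * P(A2 | A1) * P(A3 | A1 A2)
# drawing without replacement shrinks both the ace count and the deck
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p)         # 1/5525
print(float(p))  # about 0.000181
```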
The chain rule also applies to the computation of conditional probabilities, by conditioning everything on the given event H (say):
P(ABCD…|H) = P(A|H)P(B|AH)P(C|ABH)P(D|ABCH)…

Total Probability
From A = AB ∪ AB^C it follows that P(A) = P(AB) + P(AB^C). On the other hand, P(AB) = P(A|B)P(B) and P(AB^C) = P(A|B^C)P(B^C), so
P(A) = P(A|B)P(B) + P(A|B^C)P(B^C)
and symmetrically
P(B) = P(B|A)P(A) + P(B|A^C)P(A^C)
- This result allows us to find unconditional probabilities from conditional probabilities.
- It is fundamentally important, yet very simple (it uses horse sense).
- It is called the theorem of total probability.
- The probability of the event A is the weighted average of the probabilities of A conditioned on B and on B^C.

Example
Example: Box I has 3 green and 2 red balls, while Box II has 2 green and 2 red balls. A ball is drawn at random from Box I and transferred to Box II. Then a ball is drawn at random from Box II. What is the probability that the ball drawn from Box II is green? Note that the color of the ball transferred from Box I to Box II is not known.
After the transfer, Box II has 5 balls in it.
G = event that the ball drawn from Box II is green; A = event that the transferred ball is red.
P(G|A) = 2/5, P(G|A^C) = 3/5, P(A) = 2/5
P(G) = P(G|A)P(A) + P(G|A^C)P(A^C) = (2/5)(2/5) + (3/5)(3/5) = 13/25

Total Probability
Given a partition A1, A2, …, An of the sample space (Ai ∩ Aj = ∅ for i ≠ j and A1 ∪ A2 ∪ … ∪ An = Ω),
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)
The theorem as presented previously is the case n = 2 of this more general result.
If P(B|Aj) is the smallest of the P(B|Ai), then replacing every P(B|Ai) by P(B|Aj) gives P(B) ≥ P(B|Aj)⋅[P(A1) + P(A2) + … + P(An)] = P(B|Aj).
If P(B|Ak) is the largest of the P(B|Ai), then replacing every P(B|Ai) by P(B|Ak) gives P(B) ≤ P(B|Ak)⋅[P(A1) + P(A2) + … + P(An)] = P(B|Ak).
Conclusion: min over j of P(B|Aj) ≤ P(B) ≤ max over i of P(B|Ai).

Example
Example: Box I has 2 green and 4 red balls, while Box II has 4 green and 2 red balls.
One ball from Box I and two balls from Box II are transferred to Box III. Then a ball is drawn at random from Box III. What is the probability that the ball drawn from Box III is green?
B = the ball drawn from Box III is green
A0 = Box III has 0 green balls; P(A0) is unimportant because P(B|A0) = 0
A1 = Box III has 1 green ball (green from Box I and 2 red from Box II, or red from Box I and 1 green + 1 red from Box II):
P(A1) = (2/6)(C(2,2)/C(6,2)) + (4/6)(C(4,1)C(2,1)/C(6,2)) = (2/6)(1/15) + (4/6)(8/15) = 17/45
A2 = Box III has 2 green balls (green from Box I and 1 green + 1 red from Box II, or red from Box I and 2 green from Box II):
P(A2) = (2/6)(C(4,1)C(2,1)/C(6,2)) + (4/6)(C(4,2)/C(6,2)) = (2/6)(8/15) + (4/6)(6/15) = 20/45
A3 = Box III has 3 green balls (green from Box I and 2 green from Box II):
P(A3) = (2/6)(C(4,2)/C(6,2)) = (2/6)(6/15) = 6/45
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + P(B|A3)P(A3) = (1/3)(17/45) + (2/3)(20/45) + (3/3)(6/45) = 5/9

Bayes' formula
- Given that event A with P(A) > 0 occurred, the conditional probability of B given A is P(B|A) = P(AB)/P(A).
- What is P(A|B)?
P(A|B) = P(AB)/P(B) = P(B|A)P(A)/P(B)
- This is the simplest version of Bayes' formula.
- Bayes' formula P(A|B) = P(B|A)P(A)/P(B) is also called Bayes' theorem, or Bayes' lemma, or often (mistakenly) Bayes' rule.
- Bayes' rule refers to a methodology for decision-making that is an extremely controversial topic among statisticians.

Bayes' formula
- When P(B) is obtained from the P(B|Ak) via the more general version of the theorem of total probability, the full total-probability sum appears in the denominator. The numerator is still one of the terms of the denominator:
P(Ak|B) = P(B|Ak)P(Ak)/P(B) = P(B|Ak)P(Ak) / [P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)]

Example
Example: Box I has 2 green and 4 red balls, while Box II has 4 green and 2 red balls. One ball from Box I and two balls from Box II are transferred to Box III. Then a ball is drawn at random from Box III. What is the probability that the ball drawn from Box III is green?
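The question can be cross-checked by brute-force enumeration of the equally likely transfers, which also yields a Bayes posterior; a sketch (the variable names are mine):

```python
from fractions import Fraction
from itertools import combinations

box1 = ["G"] * 2 + ["R"] * 4
box2 = ["G"] * 4 + ["R"] * 2

p_green = Fraction(0)          # P(B): ball drawn from Box III is green
p_green_and_two = Fraction(0)  # P(B and "Box III holds exactly 2 green")
for b1 in box1:                                    # 1 ball from Box I, 6 equally likely
    for pair in combinations(range(len(box2)), 2):  # 2 balls from Box II, C(6,2)=15 pairs
        box3 = [b1] + [box2[i] for i in pair]
        w = Fraction(1, len(box1) * 15)             # each transfer equally likely
        g = Fraction(box3.count("G"), 3)            # P(draw green | this Box III)
        p_green += w * g
        if box3.count("G") == 2:
            p_green_and_two += w * g

print(p_green)                    # 5/9, as in the slides
print(p_green_and_two / p_green)  # Bayes posterior P(A2 | B) = 8/15
```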
As computed in the total-probability example above, P(A1) = 17/45, P(A2) = 20/45, P(A3) = 6/45, and P(B) = 5/9. Bayes' formula then gives, for instance, the posterior probability that Box III held exactly 2 green balls given that a green ball was drawn:
P(A2|B) = P(B|A2)P(A2)/P(B) = (2/3)(20/45)/(5/9) = 8/15

Independence
- Repeated independent trials: the outcome of any trial of the experiment does not influence or affect the outcome of any other trial.
- The trials are said to be physically independent.
- Physical independence is a belief; it cannot be proved that the trials are independent, we can only believe it.
- The belief in independence is reflected in the assignment of probabilities to the events of the compound experiment: if the trials are (believed to be) independent, then we set
P(A, B, C, A^C, …) = P(A)P(B)P(C)P(A^C)…
- Both A and A^C cannot occur on the same trial of the simple experiment: here they occur on different subexperiments.

Independence
- Definition: Events A and B defined on an experiment are said to be (stochastically) mutually independent if
P(A ∩ B) = P(A)P(B)
- Sometimes people say "A is independent of B" instead, but independence is mutual: A is independent of B if and only if B is independent of A.
- If we believe that events A and B are physically independent, then we insist that this equality hold.
- Physical independence is, in essence, a property of the events themselves.
We believe that events A and B are physically independent and express this independence via P(AB) = P(A)P(B). Stochastic independence, by contrast, is a property of the probability measure and does not necessarily mean that the events are physically independent.

Independence
- If A and B are mutually independent events, then P(B|A) = P(AB)/P(A) = P(A)P(B)/P(A) = P(B), and likewise P(A|B) = P(A).
- The conditional probability of B given A is the same as the unconditional probability! Knowing that A occurred does not cause any "updating" of the chances of B.
- If A and B are mutually independent, then P(AB) = P(A)P(B). If A and B are mutually exclusive, then P(AB) = 0.
- For mutually exclusive events, P(B|A) = 0: knowing that A occurred guarantees that B did not occur! Thus mutually exclusive events (of positive probability) cannot also be mutually independent.
- Many people (and textbook authors!) feel that P(B|A) = P(B) is a much more natural definition of independence: "B is independent of A if P(B|A) = P(B)". But then A and B seem to have different roles, the mutuality of independence is not obvious, and the definition assumes P(A) > 0.

Independence
- Physical independence (which is a belief, remember?) of A and B implies stochastic independence: we insist that we must have P(AB) = P(A)P(B).
- But if we have no reason to believe that A and B are physically independent, and our calculations reveal that P(AB) = P(A)P(B), we should not automatically assume that A and B are also physically independent.
- If A and B are mutually independent, then P(AB) = P(A)P(B). This is equivalent to each of the following:
  - P(AB^C) = P(A)P(B^C)
  - P(A^C B) = P(A^C)P(B)
  - P(A^C B^C) = P(A^C)P(B^C)

Examples (1)
Playing cards. Take a card from a standard deck of 52 cards. Are the events A = "an ace is taken" and B = "a heart is taken" independent?
P(A) = 4/52; P(B) = 13/52; AB = "the ace of hearts is taken"; P(AB) = 1/52
P(AB) = 1/52 = (4/52)(13/52) = P(A)P(B) ⇒ A and B are independent.
Now let us include one joker in the deck.
P(A) = 4/53; P(B) = 13/53; AB = "ace of hearts taken"; P(AB) = 1/53
P(AB) = 1/53 ≠ (4/53)(13/53) = P(A)P(B) ⇒ A and B are dependent.
A minor change in the probability space destroyed the independence of A and B! Are A and B physically independent?

Examples (2) Exclusive-OR gates.
Let A and B respectively denote the events that inputs #1 and #2 of an Exclusive-OR gate are logical 1. Assume that A and B are physically independent (hence stochastically independent) events, and that P(A) = P(B) = 0.5. Let C denote the event that the output of the Exclusive-OR gate is logical 1.
  C = A ⊕ B = A B^c ∪ A^c B
  P(C) = P(A B^c) + P(A^c B) = P(A)P(B^c) + P(A^c)P(B) = 0.5·0.5 + 0.5·0.5 = 0.5
Are A and C independent events?
  P(AC) = P(A(A B^c ∪ A^c B)) = P(A B^c) = P(A)P(B^c) = 0.5·0.5 = 0.25 = P(A)P(C)
Is the output of the XOR gate really independent of the input?
• The output is stochastically independent of the input.
• The output is physically dependent on the input.
• Physical independence (such as A and B being independent) is a belief.
• Stochastic independence is an artifact of the probability measure.

Examples (2) Exclusive-OR gates (continued).
Again let A and B be physically independent, but now with P(A) = P(B) = 0.500001.
  C = A ⊕ B = A B^c ∪ A^c B
  P(C) = P(A)P(B^c) + P(A^c)P(B) = 2·0.500001·0.499999 = 0.499999999998
  P(AC) = P(A B^c) = P(A)P(B^c) = 0.500001·0.499999 = 0.249999999999
        ≠ P(A)P(C) = 0.250000499998...
• A minor change in the probabilities of A and B from P(A) = P(B) = 0.5 to P(A) = P(B) = 0.500001 destroyed the independence of A and C!
• It would be hard to distinguish between the two cases via experimentation.
• The occurrence of stochastic independence of A and C does not imply that A and C are physically independent.
• The output of an XOR gate does depend on its input.

Series of Independent Trials
• The k trials on which A occurs in a series of n trials can be specified by stating the subset (of size k) of {1, 2, 3, …, n} on which A occurred. How many such subsets are there? C(n, k).
• The probability that A occurs on k specified trials and does not occur on the other n−k trials is p^k (1−p)^(n−k), where p = P(A). Hence
  P{A occurs on k trials out of n} = C(n, k) p^k (1−p)^(n−k)
• Generalization: the probability that A occurs on n_a trials, B occurs on n_b trials, C occurs on n_c trials, … in n trials is
  P{A on n_a, B on n_b, C on n_c, … trials out of n} = n!/(n_a! n_b! n_c! …) · P(A)^n_a P(B)^n_b P(C)^n_c …

Series of Independent Trials
• The most probable number m of occurrences of an event A in a series of n trials satisfies
  (n+1)P(A) − 1 ≤ m ≤ (n+1)P(A);
when (n+1)P(A) is not an integer, m = [(n+1)P(A)], where [x] denotes the whole part of x.
Example. Which is more probable: a chess match between equally strong players finishing 3:2 (after 5 games) or 5:5 (after 10 games)?
  P("3:2") = C(5,3) · 0.5^3 · (1 − 0.5)^2 = 0.3125
  P("5:5") = C(10,5) · 0.5^5 · (1 − 0.5)^5 = 0.2461

Examples (1)
Compute the probability that in four rolls of a die at least one six turns up, and the probability that in 24 rolls of two dice a pair of sixes turns up.
(Chevalier de Méré's problem)
Direct computation, summing over the possible numbers of sixes:
  P("at least one 6 in 4 rolls")
    = C(4,1)(1/6)^1(5/6)^3 + C(4,2)(1/6)^2(5/6)^2 + C(4,3)(1/6)^3(5/6)^1 + C(4,4)(1/6)^4(5/6)^0
    = (500 + 150 + 20 + 1)/1296 = 0.5177
Via the complement:
  P("at least one 6 in 4 rolls") = 1 − P("no 6 in 4 rolls") = 1 − C(4,0)(1/6)^0(5/6)^4 = 0.5177
  P("at least one double six in 24 rolls") = 1 − P("no double six in 24 rolls") = 1 − C(24,0)(1/36)^0(35/36)^24 = 0.4914

What is the probability that among 50 people in a bus there are more than 2 bald people, if we know that the rate of baldness in the population is 13%?
A = "more than 2 bald", A^c = "0, 1 or 2 bald"
  P(A) = 1 − P(A^c) = 1 − C(50,0)·0.13^0·0.87^50 − C(50,1)·0.13^1·0.87^49 − C(50,2)·0.13^2·0.87^48
       = 1 − 0.0009462 − 0.0070690 − 0.0258790 = 0.9661058

End of Chapter 2. Thank you for your attention!

III Random Variables

Random Variables
• A random variable is a numerical description of the outcome of an experiment. A random variable X maps ω ∈ Ω to the number X(ω).
• A random variable can be classified as either discrete or continuous depending on the numerical values it assumes.
• A discrete random variable may assume either a finite number of values or an infinite sequence of values.
• A continuous random variable may assume any numerical value in an interval or collection of intervals.

Examples
• The random variable is always denoted as X, never as X(ω): it is often convenient not to display the argument of the function when it is the functional relationship that is of interest.
• Discrete random variable with a finite number of values: let X = number of TV sets sold at the store in one day, where X can take on 5 values (0, 1, 2, 3, 4).
• Discrete random variable with an infinite sequence of values: let X = number of customers arriving in one day, where X can take on the values 0, 1, 2, …
We can count the customers arriving, but there is no finite upper limit on the number that might arrive.

Discrete Probability Distributions
• The probability distribution for a random variable describes how probabilities are distributed over the values of the random variable.
• The probability distribution is defined by a probability function, denoted f(x), which provides the probability for each value of the random variable:
  f(x): values x1, x2, x3, …, xn with probabilities p1, p2, p3, …, pn
  Σf(xi) = ΣP(X = xi) = Σpi = 1
• All the probabilistic information about the discrete random variable X is summarized in its probability function.
• The probability function can be used to answer questions such as "What is the probability that X has a value between a and b?" or "What is the probability that X is an even number?"

Example: JSL Appliances
Using past data on TV sales (left), a tabular representation of the probability distribution for TV sales (right) was developed.

  Units Sold   N. of Days        x      f(x)
  0            80                0      .40
  1            50                1      .25
  2            40                2      .20
  3            10                3      .05
  4            20                4      .10
  Total        200               Total  1.00

(A bar graph of f(x) against x gives the graphical representation of the probability distribution.)
What is the probability that in a day the number of sold units will be less than 2?
  P(X < 2) = P(X = 0) + P(X = 1) = 0.4 + 0.25 = 0.65

Expected Value
• The expected value, or mean, of a random variable is a measure of its central location. Expected value of a discrete random variable:
  E(X) = μ = Σ xi·f(xi) = Σ xi·P(X = xi)
• Features of expected value: E(c) = c, E(cX) = cE(X)

Variance (Dispersion)
• The variance summarizes the variability in the values of a random variable. Variance of a discrete random variable:
  Var(X) = D(X) = σ^2 = E(X − E(X))^2 = Σ(xi − μ)^2 f(xi)
• Features of variance: D(c) = 0, D(cX) = c^2 D(X), D(X) = E(X^2) − (E(X))^2
• The standard deviation, σ, is defined as the positive square root of the variance.
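As a quick check, the JSL probability function above can be pushed through these definitions; a minimal Python sketch (the numbers come directly from the table):

```python
# JSL Appliances: probability function from the table above
xs = [0, 1, 2, 3, 4]
fs = [0.40, 0.25, 0.20, 0.05, 0.10]

p_less_2 = fs[0] + fs[1]                              # P(X < 2) = 0.65
mu = sum(x * f for x, f in zip(xs, fs))               # E(X) = 1.20
var = sum((x - mu) ** 2 * f for x, f in zip(xs, fs))  # Var(X) = 1.66
sd = var ** 0.5                                       # sigma ~ 1.2884
```

The same two lines work for any finite probability function, which is all the "tabular" definition above really is.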
Example: JSL Appliances
• Expected value of a discrete random variable:

  x    f(x)   x·f(x)
  0    .40    .00
  1    .25    .25
  2    .20    .40
  3    .05    .15
  4    .10    .40
              E(X) = 1.20

The expected number of TV sets sold in a day is 1.2.

Example: JSL Appliances
• Variance and standard deviation of a discrete random variable:

  x    x−μ    (x−μ)^2   f(x)   (x−μ)^2·f(x)   x^2·f(x)
  0    −1.2   1.44      .40    .576           .00
  1    −0.2   0.04      .25    .010           .25
  2     0.8   0.64      .20    .128           .80
  3     1.8   3.24      .05    .162           .45
  4     2.8   7.84      .10    .784           1.60
                               σ^2 = 1.660    3.10

Check: σ^2 = 3.10 − 1.2^2 = 1.66. The variance of daily sales is 1.66 TV sets squared; the standard deviation of sales is 1.2884 TV sets.

Discrete Uniform Probability Distribution
• The discrete uniform probability distribution is the simplest example of a discrete probability distribution given by a formula:
  f(xi) = 1/n for each of the n values x1, x2, …, xn
Note that the values of the random variable are equally likely.
  E(X) = μ = (1/n)Σxi,  Var(X) = (1/n)Σ(xi − μ)^2

Binomial Probability Distribution
• Properties of a binomial experiment:
  The experiment consists of a sequence of n identical trials.
  Two outcomes, success and failure, are possible on each trial.
  The probability of a success, denoted by p, does not change from trial to trial.
  The trials are independent.
• Probability function:
  f(x) = n!/(x!(n−x)!) · p^x (1−p)^(n−x)
  E(X) = μ = np,  Var(X) = σ^2 = np(1−p)
where f(x) is the probability of x successes in n trials, n is the number of trials, and p is the probability of success on any one trial.

Example: Evans Electronics
• Using the binomial probability function.
Evans is concerned about a low retention rate for employees. On the basis of past experience, management has seen an annual turnover of 10% of the hourly employees. Thus, for any hourly employee chosen at random, management estimates a probability of 0.1 that the person will not be with the company next year. Choosing 3 hourly employees at random, what is the probability that exactly 1 of them will leave the company this year?
Let p = .10, n = 3, x = 1:
  f(1) = 3!/(1!(3−1)!) · (0.1)^1 (0.9)^2 = (3)(0.1)(0.81) = 0.243
  E(X) = μ = 3(.1) = .3 employees out of 3
  Var(X) = σ^2 = 3(.1)(.9) = .27

Example: Evans Electronics
• Using the tables of binomial probabilities (n = 3):

  x    p = .10   .15     .20     .25     .30     .35     .40     .45     .50
  0    .7290     .6141   .5120   .4219   .3430   .2746   .2160   .1664   .1250
  1    .2430     .3251   .3840   .4219   .4410   .4436   .4320   .4084   .3750
  2    .0270     .0574   .0960   .1406   .1890   .2389   .2880   .3341   .3750
  3    .0010     .0034   .0080   .0156   .0270   .0429   .0640   .0911   .1250

Example: Evans Electronics
• Using a tree diagram over the three workers (L = leaves, probability .1; S = stays, probability .9), each path gives a value of x and a probability:
  LLL: x = 3, .0010;  LLS: x = 2, .0090;  LSL: x = 2, .0090;  LSS: x = 1, .0810;
  SLL: x = 2, .0090;  SLS: x = 1, .0810;  SSL: x = 1, .0810;  SSS: x = 0, .7290

Poisson Probability Distribution
• Properties of a Poisson experiment:
  The probability of an occurrence is the same for any two intervals of equal length.
  The occurrence or nonoccurrence in any interval is independent of the occurrence or nonoccurrence in any other interval.
• Probability function:
  f(x) = μ^x e^(−μ) / x!
where f(x) is the probability of x occurrences in an interval, μ is the mean number of occurrences in an interval, and e = 2.71828…

Example: Mercy Hospital
• Using the Poisson probability function.
Patients arrive at the emergency room of Mercy Hospital at the average rate of 6 per hour on weekend evenings. What is the probability of 4 arrivals in 30 minutes on a weekend evening?
  μ = 6/hour = 3/half-hour, x = 4
  f(4) = 3^4 (2.71828)^(−3) / 4! = .1680
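Both pmf formulas above are short enough to evaluate directly; a small Python sketch reproducing the Evans and Mercy numbers:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Probability of x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mu):
    """Probability of x occurrences in an interval with mean count mu."""
    return mu**x * exp(-mu) / factorial(x)

evans = binom_pmf(1, 3, 0.1)   # = 0.243
mercy = poisson_pmf(4, 3.0)    # ~ 0.1680
```

These helpers also reproduce the binomial table rows above, e.g. binom_pmf(2, 3, 0.25) gives .1406.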
Example: Mercy Hospital
• Using the tables of Poisson probabilities. The table gives f(x) for μ from 2.1 to 3.0; the column μ = 3.0 contains the value used above:

  x    μ = 2.1   2.2     2.3     2.4     2.5     2.6     2.7     2.8     2.9     3.0
  0    .1225     .1108   .1003   .0907   .0821   .0743   .0672   .0608   .0550   .0498
  1    .2572     .2438   .2306   .2177   .2052   .1931   .1815   .1703   .1596   .1494
  2    .2700     .2681   .2652   .2613   .2565   .2510   .2450   .2384   .2314   .2240
  3    .1890     .1966   .2033   .2090   .2138   .2176   .2205   .2225   .2237   .2240
  4    .0992     .1082   .1169   .1254   .1336   .1414   .1488   .1557   .1622   .1680
  5    .0417     .0476   .0538   .0602   .0668   .0735   .0804   .0872   .0940   .1008
  6    .0146     .0174   .0206   .0241   .0278   .0319   .0362   .0407   .0455   .0504
  7    .0044     .0055   .0068   .0083   .0099   .0118   .0139   .0163   .0188   .0216
  8    .0011     .0015   .0019   .0025   .0031   .0038   .0047   .0057   .0068   .0081

Hypergeometric Probability Distribution
• The hypergeometric distribution is closely related to the binomial distribution.
• With the hypergeometric distribution the trials are not independent, and the probability of success changes from trial to trial.
• Probability function:
  f(x) = C(r, x) C(N−r, n−x) / C(N, n),  for 0 ≤ x ≤ r
where f(x) is the probability of x successes in n trials, n is the number of trials, N is the number of elements in the population, and r is the number of elements in the population labeled success.

Example: Neveready
• Hypergeometric probability distribution.
Bob Neveready has removed two dead batteries from a flashlight and inadvertently mingled them with the two good batteries he intended as replacements. The four batteries look identical. Bob now randomly selects two of the four batteries. What is the probability he selects the two good batteries?
  f(2) = C(2,2)C(2,0)/C(4,2) = 1/6 = .167
where x = 2 is the number of good batteries selected, n = 2 the number of batteries selected, N = 4 the number of batteries in total, and r = 2 the number of good batteries in total.

Example: Lottery
• Hypergeometric probability distribution.
In the Macedonian lottery, 7 numbers are drawn out of 39.
What is the probability that a player filling in one column (choosing 7 of the 39 numbers) will have exactly k correct guesses? Of the 39 numbers, 7 are winning and 32 are losing; the player's k correct guesses come from the 7 winning numbers and the remaining 7−k from the 32 losing numbers:
  f(k) = C(7,k) C(32,7−k) / C(39,7),  for k = 0, 1, …, 7

  k        0        1        2        3        4        5         6          7
  P(X=k)   0.2188   0.4124   0.2749   0.0818   0.0113   0.00068   0.0000146  0.000000065

Continuous Probability Distributions
• A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.
• It is not possible to talk about the probability of the random variable assuming a particular value. Instead, we talk about the probability of the random variable assuming a value within a given interval.
• The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.

Discrete vs. Continuous
• Probability function (discrete):
  f(x): values x1, …, xn with probabilities p1, …, pn,  Σpi = 1
  P(a ≤ X ≤ b) = Σ_{i: xi ∈ [a,b]} pi
  E(X) = μ = Σ xi pi
  Var(X) = Σ (xi − μ)^2 pi = Σ xi^2 pi − μ^2
• Probability density function (continuous): f: R → R+,
  ∫_{−∞}^{+∞} f(x) dx = 1
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx
  E(X) = μ = ∫_{−∞}^{+∞} x f(x) dx
  Var(X) = ∫_{−∞}^{+∞} (x − μ)^2 f(x) dx = ∫_{−∞}^{+∞} x^2 f(x) dx − μ^2

Example
The number of minutes a train is late is described by the following density function:
  f(x) = (3/500)(25 − x^2) for −2 ≤ x ≤ 5, and f(x) = 0 otherwise.
Find the probability that the train will be late more than 2 minutes, the mean, and the variance.
Solution:
  P(X > 2) = ∫_2^5 (3/500)(25 − x^2) dx = (75/500)x |_2^5 − (1/500)x^3 |_2^5 = 108/500 = 0.216
  E(X) = ∫_{−2}^5 x·(3/500)(25 − x^2) dx = (75/1000)x^2 |_{−2}^5 − (3/2000)x^4 |_{−2}^5
       = 1575/1000 − 1827/2000 = 0.6615
  Var(X) = ∫_{−2}^5 x^2·(3/500)(25 − x^2) dx − 0.6615^2
         = (25/500)x^3 |_{−2}^5 − (3/2500)x^5 |_{−2}^5 − 0.4376
         = 3325/500 − 9471/2500 − 0.4376 = 2.424

Uniform Probability Distribution
• A random variable is uniformly distributed whenever the probability is proportional to the interval's length.
• Uniform probability density function:
  f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
  E(X) = (a + b)/2,  Var(X) = (b − a)^2/12
where a is the smallest value the variable can assume and b is the largest.

Example: Slater's Buffet
• Uniform probability distribution.
Slater customers are charged for the amount of salad they take. Sampling suggests that the amount of salad taken is uniformly distributed between 5 ounces and 15 ounces. The probability density function is
  f(x) = 1/10 for 5 ≤ x ≤ 15, and 0 elsewhere,
where x is the salad plate filling weight (oz.).
  P(12 < X < 15) = (1/10)(3) = .3
  E(X) = (5 + 15)/2 = 10
  Var(X) = (15 − 5)^2/12 = 8.33

Normal Probability Distribution
• Graph of the normal probability density function:
  f(x) = 1/(√(2π) σ) · e^(−(x−μ)^2 / 2σ^2)
where μ is the mean, σ the standard deviation, π = 3.14159…, and e = 2.71828… (curves shown for σ = 2 and σ = 4).

Normal Probability Distribution
• Characteristics of the normal probability distribution:
  The shape of the normal curve is often illustrated as a bell-shaped curve.
  Two parameters, μ (mean) and σ (standard deviation), determine the location and shape of the distribution.
  The highest point on the normal curve is at the mean, which is also the median and mode.
  The mean can be any numerical value: negative, zero, or positive.
  The normal curve is symmetric.
  The standard deviation determines the width of the curve: larger values result in wider, flatter curves.
  The total area under the curve is 1 (.5 to the left of the mean and .5 to the right).

Standard Normal Probability Distribution
• A random variable that has a normal distribution with a mean of zero and a standard deviation of one is said to have a standard normal probability distribution.
• The letter z (or N(0,1)) is commonly used to designate this normal random variable.
• Converting to the standard normal distribution:
  z = (x − μ)/σ;  for μ = 0, σ = 1:  f(x) = 1/√(2π) · e^(−x^2/2)
• We can think of z as a measure of the number of standard deviations x is from μ.

Example: Pep Zone
• Standard normal probability distribution.
Pep Zone sells auto parts and supplies including a popular multi-grade motor oil. When the stock of this oil drops to 20 gallons, a replenishment order is placed. The store manager is concerned that sales are being lost due to stockouts while waiting for an order. It has been determined that leadtime demand is normally distributed with a mean of 15 gallons and a standard deviation of 6 gallons. The manager would like to know the probability of a stockout, P(X > 20).

Example: Pep Zone
• Standard normal probability distribution.
The standard normal table shows an area of .2967 for the region between the z = 0 and z = .83 lines. The shaded tail area is .5 − .2967 = .2033, so the probability of a stockout is .2033.
  z = (x − μ)/σ = (20 − 15)/6 = .83;  tail area beyond z = .83: .5 − .2967 = .2033

Example: Pep Zone
• Using the standard normal probability table (areas between 0 and z):

  z    .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
  .0   .0000   .0040   .0080   .0120   .0160   .0199   .0239   .0279   .0319   .0359
  .1   .0398   .0438   .0478   .0517   .0557   .0596   .0636   .0675   .0714   .0753
  .2   .0793   .0832   .0871   .0910   .0948   .0987   .1026   .1064   .1103   .1141
  .3   .1179   .1217   .1255   .1293   .1331   .1368   .1406   .1443   .1480   .1517
  .4   .1554   .1591   .1628   .1664   .1700   .1736   .1772   .1808   .1844   .1879
  .5   .1915   .1950   .1985   .2019   .2054   .2088   .2123   .2157   .2190   .2224
  .6   .2257   .2291   .2324   .2357   .2389   .2422   .2454   .2486   .2518   .2549
  .7   .2580   .2612   .2642   .2673   .2704   .2734   .2764   .2794   .2823   .2852
  .8   .2881   .2910   .2939   .2967   .2995   .3023   .3051   .3078   .3106   .3133
  .9   .3159   .3186   .3212   .3238   .3264   .3289   .3315   .3340   .3365   .3389

Example: Pep Zone
• Standard normal probability distribution.
If the manager of Pep Zone wants the probability of a stockout to be no more than .05, what should the reorder point be? Let z.05 represent the z value cutting off the upper .05 tail area; the area between 0 and z.05 is then .45.

Example: Pep Zone
We now look up the .4500 area in the standard normal probability table to find the corresponding z.05 value:

  z      .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
  1.5    .4332   .4345   .4357   .4370   .4382   .4394   .4406   .4418   .4429   .4441
  1.6    .4452   .4463   .4474   .4484   .4495   .4505   .4515   .4525   .4535   .4545
  1.7    .4554   .4564   .4573   .4582   .4591   .4599   .4608   .4616   .4625   .4633
  1.8    .4641   .4649   .4656   .4664   .4671   .4678   .4686   .4693   .4699   .4706
  1.9    .4713   .4719   .4726   .4732   .4738   .4744   .4750   .4756   .4761   .4767

z.05 = 1.645 is a reasonable estimate. The corresponding value of x is given by
  x = μ + z.05·σ = 15 + 1.645(6) = 24.87
A reorder point of 24.87 gallons will place the probability of a stockout during leadtime at .05.
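The table lookups in the Pep Zone example can be cross-checked with the standard normal CDF; a hedged Python sketch using math.erf (the 20-gallon threshold, mean 15, and σ = 6 come from the example; the slight difference from .2033 is only the table's rounding of z to .83):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) for the standard normal distribution, via the error function
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

z = (20 - 15) / 6                      # number of standard deviations above the mean
p_stockout = 1.0 - std_normal_cdf(z)   # ~ .20 (table with z rounded to .83 gives .2033)
reorder_point = 15 + 1.645 * 6         # = 24.87 gallons for a .05 stockout risk
```

The same two lines answer any "P(X > a) for a normal X" question without a printed table.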
Exponential Probability Distribution
• Exponential probability density function:
  f(x) = (1/μ) e^(−x/μ),  x ≥ 0, μ > 0
• Cumulative exponential distribution function:
  P(X ≤ x0) = 1 − e^(−x0/μ)
where x0 is some specific value of x.

Example: Al's Carwash
The time between arrivals of cars at Al's Carwash follows an exponential probability distribution with a mean time between arrivals of 3 minutes. Al would like to know the probability that the time between two successive arrivals will be 2 minutes or less.
  P(X ≤ 2) = 1 − e^(−2/3) = 1 − .5134 = .4866
• Graph of the probability density function: P(X ≤ 2) is the area .4866 under f(x) to the left of x = 2 (time between successive arrivals, in minutes).

Relationship between the Poisson and Exponential Distributions
If the Poisson distribution provides an appropriate description of the number of occurrences per interval, then the exponential distribution provides an appropriate description of the length of the interval between occurrences.

Chi-Square Distribution
• If X1, X2, …, Xn are independent random variables each with the N(0,1) distribution, then the random variable
  χ^2 = X1^2 + X2^2 + … + Xn^2
has a chi-square distribution with n degrees of freedom:
  f(x) = 1/(2^(n/2) Γ(n/2)) · x^(n/2−1) e^(−x/2),  where Γ(α) = ∫_0^∞ y^(α−1) e^(−y) dy
  E(X) = n,  Var(X) = 2n
For n = 2 this is the exponential PDF. General gamma form: f(x) = λ·e^(−λx)·(λx)^(t−1)/Γ(t) for x > 0 (shown for λ = 1, t = 3).

Student Distribution
If X has the N(0,1) distribution and Y has a χ^2 distribution with n degrees of freedom (X and Y independent), then
  t = X√n / √Y
has a Student distribution with n degrees of freedom:
  f(x) = Γ((n+1)/2) / (Γ(n/2)√(nπ)) · (1 + x^2/n)^(−(n+1)/2)
(Graph of the t density for n = 1, 3, 8 together with the normal density with μ = 0, σ = 1.)
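The stated chi-square moments (E = n, Var = 2n) can be checked by simulating sums of squared N(0,1) variables, exactly as in the definition above; a hedged Python sketch (the degrees of freedom, sample count, and seed are arbitrary choices):

```python
import random
import statistics

random.seed(3)  # fixed seed so the run is reproducible
n = 4           # degrees of freedom (arbitrary choice)

# chi-square(n) samples: sums of n squared standard normal draws
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(n))
           for _ in range(20000)]

m = statistics.mean(samples)      # should be close to n = 4
v = statistics.variance(samples)  # should be close to 2n = 8
```

Replacing the sum by X·√n/√Y with an extra N(0,1) draw X would sketch the Student construction the same way.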
Fisher Distribution
If X has a χ^2 distribution with n1 degrees of freedom and Y has a χ^2 distribution with n2 degrees of freedom (X and Y independent), then
  F = (X/n1) / (Y/n2)
has a Fisher distribution with (n1, n2) degrees of freedom:
  f(x) = n1^(n1/2) n2^(n2/2) / B(n1/2, n2/2) · x^((n1−2)/2) (n2 + n1·x)^(−(n1+n2)/2)
where B is the beta function. (Graph: the Fisher density and its upper tail value Fα.)

Joint Distributions
• The generalization from one random variable to two random variables is the most challenging intellectual step. Once the two-random-variable case is understood, the extension of the ideas to many random variables is easy.
• Let X and Y denote two random variables defined on the same sample space Ω.
• The outcome ω ∈ Ω is mapped to the real number X(ω) by the random variable X, and to the real number Y(ω) by the random variable Y.
• Jointly, the random variables X and Y are said to map the outcome ω ∈ Ω to the point (X(ω), Y(ω)) in the plane.

Joint Distributions
• The individual probabilistic descriptions of X and Y are insufficient to determine the probabilistic behavior of the random point (X, Y) in the plane.
• The random point (X, Y) is also called the bivariate random variable (X, Y), the joint random variable (X, Y), or the random vector (X, Y).
• The joint probability density function (joint PDF) for discrete random variables X and Y taking on values u1, u2, …, un, … and v1, v2, …, vm, … respectively is defined as
  P_X,Y(u, v) = P{X = u, Y = v} = P({X = u} ∩ {Y = v})

Example (joint probability table; the right column and bottom row are the marginals):

        X = 0   X = 1   X = 2   X = 3   | P(Y)
  Y = 0  .02     .05     .10     .03    | .20
  Y = 1  .04     .09     .13     .08    | .34
  Y = 2  .05     .15     .17     .09    | .46
  --------------------------------------
  P(X)   .11     .29     .40     .20    |

Covariance and Correlation
• The covariance is a measure of the linear association between two random variables X and Y:
  σ_XY = E((X − EX)(Y − EY)) = E(XY) − EX·EY
If X and Y are independent, σ_XY = 0.
  E(XY) = EX·EY + σ_XY
  Var(X + Y) = Var(X) + Var(Y) + 2σ_XY
• The correlation coefficient is a normalized measure of the linear association between two random variables:
  ρ_XY = σ_XY / √(Var(X)Var(Y)),  |ρ_XY| ≤ 1
If X and Y are independent, ρ_XY = 0.

Examples (scatter plots):
• ρ_XY > 0.8: a positive relationship
• ρ_XY < −0.8: a negative relationship
• −0.5 < ρ_XY < 0.5: no apparent relationship, or no linear relationship

Example (the joint table above):
  E(X) = 1(.05+.09+.15) + 2(.10+.13+.17) + 3(.03+.08+.09) = 1.69
  E(Y) = 1(.04+.09+.13+.08) + 2(.05+.15+.17+.09) = .34 + .92 = 1.26
  Var(X) = .29·1^2 + .40·2^2 + .20·3^2 − 1.69^2 = 0.83
  Var(Y) = .34·1^2 + .46·2^2 − 1.26^2 = 0.59
  E(XY) = 1·.09 + 2·.13 + 3·.08 + 2·.15 + 4·.17 + 6·.09 = 2.11
  ρ_XY = (E(XY) − E(X)E(Y)) / √(Var(X)Var(Y)) = (2.11 − 1.69·1.26)/√(0.83·0.59) ≈ −0.03
Conclusion: there is no linear relationship between X and Y.

Example (five equally likely points):
  X: 2 4 6 1 3,  Y: 1 0 7 4 6,  P(X = xi, Y = yj) = 0.2 for i = j and 0 for i ≠ j
  E(X) = (2 + 4 + 6 + 1 + 3)/5 = 3.2
  E(Y) = (1 + 0 + 7 + 4 + 6)/5 = 3.6
  Var(X) = (2^2 + 4^2 + 6^2 + 1^2 + 3^2)/5 − 3.2^2 = 2.96
  Var(Y) = (1^2 + 0^2 + 7^2 + 4^2 + 6^2)/5 − 3.6^2 = 7.44
  E(XY) = (2 + 0 + 42 + 4 + 18)/5 = 13.2
  ρ_XY = (13.2 − 3.2·3.6)/√(2.96·7.44) = 0.36
Conclusion: there is a weak linear relationship between X and Y.

End of Chapter 3. Thank you for your attention!

IV Central Limit Theorem

Chebyshev Inequality
• (Chebyshev inequality) Let X be a random variable with finite expected value μ = E(X) and finite variance σ^2. Then for any positive number ε > 0 we have
  P(|X − μ| ≥ ε) ≤ σ^2/ε^2
• If ε = kσ (k standard deviations) for some integer k, then
  P(|X − μ| ≥ kσ) ≤ σ^2/(k^2 σ^2) = 1/k^2
• Percentage of values in some commonly used intervals:
  at least 1 − 1/4 = 75% of the values of a random variable are within ±2 standard deviations of its mean;
  at least 1 − 1/9 ≈ 88.9% of the values are within ±3 standard deviations of the mean;
  at least 1 − 1/16 = 93.75% of the values are within ±4 standard deviations of the mean.

Law of Large Numbers
• (Law of large numbers) Let X1, X2, …, Xn be an independent trials process with a continuous density function f, finite expected value μ, and finite variance σ^2. Let Sn = X1 + X2 + … + Xn be the sum of the Xi. Then for any ε > 0 we have
  lim_{n→∞} P(|Sn/n − μ| ≥ ε) = 0,  equivalently  lim_{n→∞} P(|Sn/n − μ| < ε) = 1
• Note that Sn/n is an average of the individual outcomes, and one often calls the law of large numbers the "law of averages". It is a striking fact that we can start with a random experiment about which little can be predicted and, by taking averages, obtain an experiment in which the outcome can be predicted with a high degree of certainty.

Example
Consider the special case of tossing a coin n times, with Sn the number of heads that turn up. The random variable Sn/n represents the fraction of times heads turns up and takes values between 0 and 1. Plotting its distribution for increasing values of n (marking the outcomes between .45 and .55), we see that as n increases the distribution gets more and more concentrated around .5, and a larger and larger percentage of the total area is contained within the interval (.45, .55), as predicted by the law of large numbers.

Central Limit Theorem
• The second fundamental theorem of probability is the central limit theorem. This theorem says that if Sn is the sum of n mutually independent random variables (each of which contributes a small amount to the total), then the distribution function of Sn is well approximated by a normal density function
  f(x) = 1/(√(2π) σ) e^(−(x−μ)^2/2σ^2);  for μ = 0, σ = 1:  f(x) = 1/√(2π) e^(−x^2/2)
(Illustration: the distribution of heights of adult women.)
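The concentration of Sn/n around .5 in the coin-tossing example can be reproduced in a quick simulation; a hedged Python sketch (the trial counts and seed are arbitrary choices):

```python
import random

random.seed(0)  # fixed seed for reproducibility

def fraction_of_heads(n):
    """Toss a fair coin n times and return the observed fraction of heads."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# The spread of S_n/n around 0.5 shrinks as n grows
for n in (100, 10000, 1000000):
    print(n, fraction_of_heads(n))
```

Running this shows the deviation from .5 shrinking on the order of 1/√n, which is what the law of large numbers (and, more precisely, the central limit theorem) predicts.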
Central Limit Theorem
• (Central limit theorem) Let X1, X2, …, Xn be a sequence of independent random variables, and let Sn = X1 + X2 + … + Xn. For each i, denote the mean and variance of Xi by μi and σi^2, respectively. If there exists a constant A such that |Xi| ≤ A for all i, then
  lim_{n→∞} P( (Sn/n − μ)/(σ/√n) < b ) = 1/√(2π) ∫_{−∞}^b e^(−y^2/2) dy
• The theorem essentially says that anything that can be thought of as being made up as the sum of many small independent pieces is approximately normally distributed.

Example
Suppose we choose n random numbers from the interval [0, 1] with uniform density. Let X1, X2, …, Xn denote these choices and Sn = X1 + X2 + … + Xn their sum. Then the density function of Sn tends to a normal shape centered at n/2:
  P( (Sn/n − 1/2)/(σ/√n) < x ) → 1/√(2π) ∫_{−∞}^x e^(−y^2/2) dy
(Density functions shown for n = 2, 3, 4, 10.)

General Problem
To calculate the probabilities of events related to some experiment, it is best to know the probability density function: the PDF enables us to compute the probability of each event. There are two approaches to estimating a PDF:
• Parametric: we know the PDF's form and only need to estimate its parameters. Example: we know that the PDF is normal but we do not know μ and σ. Problem: a supposition about the PDF's form is required.
• Non-parametric: we know nothing about the PDF's form. Example: we build a histogram and approximate the PDF. Problem: we need a large number of samples.

Fortunately
For a large enough sample size n, according to the central limit theorem, we can treat the sample mean as normally distributed. In practice n ≥ 30 is usually enough. The central limit theorem justifies the supposition of a normal distribution for many quantities, because they are influenced by a number of unknown random factors, none of which has a decisive influence.
Everybody believes in the correctness of the central limit theorem and in using the normal distribution for large enough sample sizes:
  Mathematicians: because it is an experimental fact!
  Engineers and practitioners: because it is mathematically proven!

Variations
• Chi-square distribution: χ^2 = (n − 1)s^2/σ^2, where s^2 = (1/(n−1)) Σ (xi − x̄)^2, has a chi-square distribution with n − 1 degrees of freedom (X1, X2, …, Xn have a normal distribution).
• If X has a χ^2 distribution with n1 degrees of freedom and Y has a χ^2 distribution with n2 degrees of freedom, then F = (X/n1)/(Y/n2) has a Fisher distribution with (n1, n2) degrees of freedom.
• If X has the N(0,1) distribution and Y has a χ^2 distribution with n − 1 degrees of freedom, then taking X = (x̄ − μ)/(σ/√n) and Y = (n − 1)s^2/σ^2 gives
  t = X / √(Y/(n−1)) = (x̄ − μ)/(s/√n),
which has a Student distribution with n − 1 degrees of freedom.
• These variations of the normal distribution are extremely useful because each has only one unknown parameter to estimate: in the first two it is σ, while in the third it is μ.

End of Chapter 4

V Statistics

Statistics
• Statistics uses probability methods to deal with empirical data obtained by measurements, observations, etc.
• Statistics is in some sense the opposite of probability:
  In probability we examine concrete events and situations according to the model (Ω, ℑ, P).
  In statistics we try to build the model using concrete events and situations (statistical data).
• Statistical methods are based on a random sample: n samples are taken from the population, and analysis of the obtained results enables us to make inferences about the whole population.

Statistical Inference
• The purpose of statistical inference is to obtain information about a population from information contained in a sample.
• A population is the set of all the elements of interest.
• A sample is a subset of the population.
• The sample results provide only estimates of the values of the population characteristics.
• A parameter is a numerical characteristic of a population.
• With proper sampling methods, the sample results will provide "good" estimates of the population characteristics.
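How well a sample estimate tracks a population parameter can be illustrated with a small simulation; a hedged Python sketch (the true proportion 0.3, the sample size, and the seed are arbitrary choices made only for illustration):

```python
import random

random.seed(2)  # fixed seed for reproducibility
p_true = 0.3    # population proportion (known here only because we simulate)
n = 10000       # sample size

# Draw a random sample of n Bernoulli(p_true) observations
m = sum(random.random() < p_true for _ in range(n))
p_hat = m / n   # sample estimate of the population proportion
```

By the central limit theorem, p_hat is approximately normal with mean p_true and standard deviation √(p_true(1 − p_true)/n) ≈ 0.0046 here, so the estimate lands very close to 0.3.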
Simple Random Sampling
• Finite population:
  A simple random sample from a finite population of size N is a sample selected such that each possible sample of size n has the same probability of being selected.
  Replacing each sampled element before selecting subsequent elements is called sampling with replacement. Sampling without replacement is the procedure used most often.
  In large sampling projects, computer-generated random numbers are often used to automate the sample selection process.
• Infinite population:
  A simple random sample from an infinite population is a sample selected such that each element is selected independently from the same population.
  The population is usually considered infinite if it involves an ongoing process that makes listing every element impossible.
  The random number selection procedure cannot be used for infinite populations.

Some Statistical Tasks
Example: Let us have n independent trials of an experiment E in which an event A occurs m times. How do we estimate P = P(A)?
• Point estimation of unknown parameters:
  P̂ = m/n ≈ P is a good estimate of the unknown probability P.
• Interval estimation of unknown parameters:
  We should find bounds P_L and P_U such that P(P ∈ [P_L, P_U]) ≈ 1, for example P_U = (m + 2)/n and P_L = (m − 2)/n.
• Hypothesis testing:
  Hypothesis H0: P = P̂ = m/n, versus hypothesis H1: P ≠ P̂ = m/n. Should we accept H0?

End of Chapter 5

VI Descriptive Statistics

Descriptive Statistics: Tabular and Graphical Methods
• Summarizing qualitative data: frequency distribution, relative frequency, percent frequency distribution, bar graph, pie chart.
• Summarizing quantitative data: frequency distribution, relative frequency and percent frequency distributions, histogram, cumulative distributions, ogive.
• Exploratory data analysis: crosstabulations and scatter diagrams.

Frequency Distribution
• A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
• The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
• The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.

Below Average   Above Average   Above Average   Average
Above Average   Average         Above Average   Average
Above Average   Below Average   Poor            Excellent
Above Average   Average         Above Average   Above Average
Below Average   Poor            Above Average   Average

• Frequency Distribution
Rating          Frequency
Poor                2
Below Average       3
Average             5
Above Average       9
Excellent           1
Total              20

• Relative Frequency and Percent Frequency Distributions
Rating          Relative Frequency   Percent Frequency
Poor                  .10                  10
Below Average         .15                  15
Average               .25                  25
Above Average         .45                  45
Excellent             .05                   5
Total                1.00                 100

Bar Graph: Marada Inn
[Bar graph of the frequencies by rating: Poor 2, Below Average 3, Average 5, Above Average 9, Excellent 1.]

Pie Chart: Marada Inn
[Pie chart of the quality ratings: Above Average 45%, Average 25%, Below Average 15%, Poor 10%, Excellent 5%.]

Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken, and the costs of parts, rounded to the nearest dollar, are listed below.
91  71 104  85  62  78  69  74  97  82
93  72  62  88  98  57  89  68  68 101
75  66  97  83  79  52  75 105  68 105
99  79  77  71  79  80  75  65  69  69
97  72  80  67  62  62  76 109  74  73

Frequency Distribution
• Guidelines for Selecting the Number of Classes
  - Use between 5 and 20 classes.
  - Data sets with a larger number of elements usually require a larger number of classes; smaller data sets usually require fewer classes.
• Guidelines for Selecting the Width of Classes
  - Use classes of equal width.
  - Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes

Example: Hudson Auto Repair
• Frequency Distribution
If we choose six classes: Approximate Class Width = (109 − 52)/6 = 9.5 ≅ 10

Cost ($)    Frequency
50-59           2
60-69          13
70-79          16
80-89           7
90-99           7
100-109         5
Total          50

• Relative Frequency and Percent Frequency Distributions
Cost ($)    Relative Frequency   Percent Frequency
50-59            .04                   4
60-69            .26                  26
70-79            .32                  32
80-89            .14                  14
90-99            .14                  14
100-109          .10                  10
Total           1.00                 100

Histogram
• Another common graphical presentation of quantitative data is a histogram.
• The variable of interest is placed on the horizontal axis, and the frequency, relative frequency, or percent frequency is placed on the vertical axis.
• Algorithm:
  - Find the range (width) of the data: Range = max − min.
  - Divide the range into classes:
      fewer than 25 samples: 5 to 6 classes
      25 to 50 samples: 7 to 14 classes
      more than 50 samples: 15 to 20 classes
  - Find the relative frequency for each class.
• Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.

Example: Hudson Auto Repair
[Histogram of the parts costs: parts cost ($50 to $110) on the horizontal axis, frequency (up to 16) on the vertical axis; the histogram approximates the probability density function.]

Cumulative Distribution
• The cumulative frequency distribution shows the number of items with values less than or equal to the upper limit of each class.
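The Hudson frequency tables above, including the cumulative counts discussed next, can be reproduced directly from the 50 invoice values; this is a minimal sketch using only the standard library.

```python
from itertools import accumulate

# The 50 Hudson Auto invoice amounts from the slide.
costs = [91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
         93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
         75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
         99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
         97, 72, 80, 67, 62, 62, 76, 109, 74, 73]

width = (max(costs) - min(costs)) / 6          # (109 - 52) / 6 = 9.5, rounded to 10
freq = [sum(1 for c in costs if lo <= c <= lo + 9)
        for lo in range(50, 110, 10)]          # classes 50-59 ... 100-109
rel = [f / len(costs) for f in freq]           # relative frequencies
cum = list(accumulate(freq))                   # cumulative frequencies
```

`freq` comes out as [2, 13, 16, 7, 7, 5] and `cum` as [2, 15, 31, 38, 45, 50], matching the tables on these slides.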
• The cumulative relative frequency distribution shows the proportion of items with values less than or equal to the upper limit of each class.
• The cumulative percent frequency distribution shows the percentage of items with values less than or equal to the upper limit of each class.

Example: Hudson Auto Repair
• Cumulative Distributions
Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59               2                       .04                            4
≤ 69              15                       .30                           30
≤ 79              31                       .62                           62
≤ 89              38                       .76                           76
≤ 99              45                       .90                           90
≤ 109             50                      1.00                          100

Ogive
• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis. Shown on the vertical axis are the cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies.
• The frequency (one of the above) of each class is plotted as a point, and the plotted points are connected by straight lines.

Example: Hudson Auto Repair
• Ogive
Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on. These gaps are eliminated by plotting points halfway between the class limits. Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
[Ogive with cumulative percent frequencies: parts cost ($) on the horizontal axis, cumulative percent frequency (0 to 100) on the vertical axis.]

Descriptive Statistics: Numerical Methods
• Measures of Location
• Measures of Variability
• Measures of Relative Location and Detecting Outliers
• Exploratory Data Analysis
• Measures of Association Between Two Variables
• The Weighted Mean and Working with Grouped Data

Measures of Location
• Mean
• Median
• Mode
• Percentiles
• Quartiles

Example: Apartment Rents
Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 apartments in a particular city, presented in ascending order.
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615

Mean
• The mean of a data set is the average of all the data values.
• If the data are from a sample, the mean is denoted by x̄:  x̄ = Σxᵢ / n
• If the data are from a population, the mean is denoted by μ:  μ = Σxᵢ / N

Example: Apartment Rents
• Mean:  x̄ = Σxᵢ / n = 34356 / 70 = 490.80

Median
• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• For an odd number of observations, the median is the middle value.
• For an even number of observations, the median is the average of the two middle values.
• A few extremely large incomes or property values can inflate the mean, but not the median.

Example: Apartment Rents
• Median = 50th percentile:  i = (p/100)n = (50/100)·70 = 35.5
Averaging the 35th and 36th data values: Median = (475 + 475)/2 = 475

Mode
• The mode of a data set is the value that occurs with greatest frequency.
• The greatest frequency can occur at two or more different values.
• If the data have exactly two modes, the data are bimodal.
• If the data have more than two modes, the data are multimodal.
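The three location measures just defined are available in Python's standard library; here is a minimal sketch on a small illustrative subset of rent-like values (my own toy data, not the full 70-value sample).

```python
import statistics

# Small illustrative data set (not the full apartment-rent sample).
rents = [425, 430, 450, 450, 450, 475, 500, 600]

m = statistics.mean(rents)        # arithmetic average
med = statistics.median(rents)    # even count: average of the two middle values
mode = statistics.mode(rents)     # the most frequent value
```

Note how the single large value 600 pulls the mean (472.5) above the median (450), the effect described in the "extremely large incomes" bullet above.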
Example: Apartment Rents
• Mode = 450, the value that occurred most frequently (7 times).

Percentiles
• A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
• The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 − p) percent of the items take on this value or more.
  - Arrange the data in ascending order.
  - Compute the index i, the position of the pth percentile: i = (p/100)n.
  - If i is not an integer, round up; the pth percentile is the value in the ith position.
  - If i is an integer, the pth percentile is the average of the values in positions i and i + 1.

Example: Apartment Rents
• 90th Percentile: i = (p/100)n = (90/100)·70 = 63
Averaging the 63rd and 64th data values: 90th Percentile = (580 + 590)/2 = 585

Quartiles
• Quartiles are specific percentiles:
  - First Quartile = 25th Percentile
  - Second Quartile = 50th Percentile = Median
  - Third Quartile = 75th Percentile

Example: Apartment Rents
• Third Quartile: third quartile = 75th percentile; i = (p/100)n = (75/100)·70 = 52.5, rounded up to 53. Third quartile = 525.
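The percentile rule described above (i = (p/100)·n, round up when fractional, average positions i and i + 1 when integral) translates to a short function; the sketch below follows the slide's convention, which differs from, say, NumPy's default interpolation.

```python
import math

def percentile(data, p):
    # The slide's rule: i = (p/100) * n; if i is not an integer round up,
    # otherwise average the values in positions i and i+1 (1-based).
    xs = sorted(data)
    n = len(xs)
    i = p / 100 * n
    if i != int(i):
        return xs[math.ceil(i) - 1]
    i = int(i)
    return (xs[i - 1] + xs[i]) / 2
```

On the data 10, 20, …, 100 this gives a 90th percentile of 95 (average of the 9th and 10th values) and a 25th percentile of 30 (i = 2.5 rounds up to position 3).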
Measures of Variability
• It is often desirable to consider measures of variability (dispersion) as well as measures of location.
• For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation

Range
• The range of a data set is the difference between the largest and smallest data values.
• It is the simplest measure of variability.
• It is very sensitive to the smallest and largest data values.

Example: Apartment Rents
• Range = largest value − smallest value = 615 − 425 = 190

Interquartile Range
• The interquartile range of a data set is the difference between the third quartile and the first quartile.
• It is the range for the middle 50% of the data.
• It overcomes the sensitivity to extreme data values.

Example: Apartment Rents
• Interquartile Range: 3rd Quartile (Q₃) = 525, 1st Quartile (Q₁) = 445
Interquartile Range = Q₃ − Q₁ = 525 − 445 = 80

Variance
• The variance is a measure of variability that utilizes all the data.
• It is based on the difference between the value of each observation (xᵢ) and the mean (x̄ for a sample, μ for a population).
• The variance is the average of the squared differences between each data value and the mean.
• If the data set is a sample, the variance is denoted by s²:  s² = Σ(xᵢ − x̄)² / (n − 1)
• If the data set is a population, the variance is denoted by σ²:  σ² = Σ(xᵢ − μ)² / N

Standard Deviation
• The standard deviation of a data set is the positive square root of the variance.
• It is measured in the same units as the data, making it more easily comparable to the mean than the variance.
• If the data set is a sample, the standard deviation is denoted s:  s = √s²
• If the data set is a population, the standard deviation is denoted σ (sigma):  σ = √σ²

Coefficient of Variation
• The coefficient of variation indicates how large the standard deviation is in relation to the mean.
• If the data set is a sample, the coefficient of variation is computed as (s/x̄)·100.
• If the data set is a population, the coefficient of variation is computed as (σ/μ)·100.

Example: Apartment Rents
• Variance: s² = Σ(xᵢ − x̄)² / (n − 1) = 2996.16
• Standard Deviation: s = √2996.16 = 54.74
• Coefficient of Variation: (s/x̄)·100 = (54.74/490.80)·100 = 11.15

Measures of Relative Location and Detecting Outliers
• z-Scores
• Chebyshev's Theorem
• Empirical Rule
• Detecting Outliers

z-Scores
• The z-score is often called the standardized value. It denotes the number of standard deviations a data value xᵢ is from the mean:  zᵢ = (xᵢ − x̄) / s
• A data value less than the sample mean will have a z-score less than zero.
• A data value greater than the sample mean will have a z-score greater than zero.
• A data value equal to the sample mean will have a z-score of zero.
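The sample variance, standard deviation, and coefficient of variation defined above can be computed with the standard library's `statistics` module, which uses the same n − 1 divisor; the data here are a small illustrative set of my own, not the apartment rents.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]            # illustrative data, mean = 5

s2 = statistics.variance(sample)             # sum((x - mean)^2) / (n - 1)
s = math.sqrt(s2)                            # sample standard deviation
cv = s / statistics.mean(sample) * 100       # coefficient of variation, in %
```

For this data s² = 32/7 ≈ 4.57, so the standard deviation is about 2.14 and the coefficient of variation about 42.8%.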
Example: Apartment Rents
• z-Score of the smallest value (425):  z = (xᵢ − x̄)/s = (425 − 490.80)/54.74 = −1.20
[Table of the 70 standardized rent values, ranging from −1.20 for the smallest rent up to 2.27 for the largest.]

Chebyshev's Theorem
At least (1 − 1/k²) of the items in any data set will be within k standard deviations of the mean, where k is any value greater than 1.
• At least 75% of the items must be within k = 2 standard deviations of the mean.
• At least 89% of the items must be within k = 3 standard deviations of the mean.
• At least 94% of the items must be within k = 4 standard deviations of the mean.

Example: Apartment Rents
• Chebyshev's Theorem: let k = 1.5 with x̄ = 490.80 and s = 54.74.
At least (1 − 1/(1.5)²) = 1 − 0.44 = 0.56, or 56%, of the rent values must be between
x̄ − k·s = 490.80 − 1.5(54.74) = 409 and x̄ + k·s = 490.80 + 1.5(54.74) = 573.
• (continued) Actually, 86% of the rent values are between 409 and 573.

Empirical Rule
For data having a bell-shaped distribution:
• Approximately 68% of the data values will be within one standard deviation of the mean.
• Approximately 95% of the data values will be within two standard deviations of the mean.
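The z-score and Chebyshev calculations above reduce to two lines of arithmetic; this sketch uses the apartment-rent sample statistics quoted on the slides.

```python
xbar, s = 490.80, 54.74        # apartment-rent sample mean and std dev (from the slides)

z = (425 - xbar) / s           # z-score of the smallest rent, about -1.20

def chebyshev_bound(k):
    # Minimum fraction of ANY data set within k standard deviations of the mean.
    return 1 - 1 / k ** 2
```

`chebyshev_bound(1.5)` gives 0.556, the "at least 56%" figure used in the example; the actual fraction for this data (86%) is, as the theorem guarantees, larger.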
• Almost all (99.7%) of the items will be within three standard deviations of the mean.

Example: Apartment Rents
• Empirical Rule
                 Interval              % in Interval
Within ±1s   436.06 to 545.54        48/70 = 69%
Within ±2s   381.32 to 600.28        68/70 = 97%
Within ±3s   326.58 to 655.02        70/70 = 100%

Example: Apartment Rents
• Five-Number Summary
Lowest Value = 425, First Quartile = 450, Median = 475, Third Quartile = 525, Largest Value = 615

Measures of Association Between Two Variables
• Covariance
  - The covariance is a measure of the linear association between two variables.
  - Positive values indicate a positive relationship; negative values indicate a negative relationship.
• Correlation Coefficient
  - The coefficient can take on values between −1 and +1.
  - Values near −1 indicate a strong negative linear relationship; values near +1 indicate a strong positive linear relationship.

Covariance
• If the data sets are samples, the covariance is denoted by sxy:  sxy = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
• If the data sets are populations, the covariance is denoted by σxy:  σxy = Σ(xᵢ − μx)(yᵢ − μy) / N

Correlation Coefficient
• If the data sets are samples, the coefficient is rxy:  rxy = sxy / (sx·sy)
• If the data sets are populations, the coefficient is ρxy.
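The empirical-rule intervals quoted for the rent data earlier on this page are simple arithmetic on the sample mean and standard deviation, and can be checked in one line:

```python
xbar, s = 490.80, 54.74        # apartment-rent sample statistics (from the slides)

# Interval endpoints x-bar +/- k*s for k = 1, 2, 3.
intervals = {k: (round(xbar - k * s, 2), round(xbar + k * s, 2)) for k in (1, 2, 3)}
```

This reproduces the table: (436.06, 545.54), (381.32, 600.28), and (326.58, 655.02).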
ρxy = σxy / (σx·σy)

Grouped Data
• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

Mean for Grouped Data
• Sample data:  x̄ = ΣfᵢMᵢ / Σfᵢ
• Population data:  μ = ΣfᵢMᵢ / N
where fᵢ = frequency of class i and Mᵢ = midpoint of class i.

Example: Apartment Rents
Given below is the previous sample of monthly rents for one-bedroom apartments, presented here as grouped data in the form of a frequency distribution.

Rent ($)     fᵢ      Mᵢ       fᵢMᵢ
420-439       8     429.5    3436.0
440-459      17     449.5    7641.5
460-479      12     469.5    5634.0
480-499       8     489.5    3916.0
500-519       7     509.5    3566.5
520-539       4     529.5    2118.0
540-559       2     549.5    1099.0
560-579       4     569.5    2278.0
580-599       2     589.5    1179.0
600-619       6     609.5    3657.0
Total        70             34525.0

• Mean for grouped data:  x̄ = 34525/70 = 493.21
This approximation differs by $2.41 from the actual sample mean of $490.80.

Variance for Grouped Data
• Sample data:  s² = Σfᵢ(Mᵢ − x̄)² / (n − 1)
For the apartment rents the variance is s² = 3017.89, and the standard deviation is s = √3017.89 = 54.94.
This approximation differs by only $0.20 from the actual standard deviation of $54.74.
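The grouped-data approximations above can be computed directly from the class frequencies and midpoints; this sketch reproduces the slide's numbers.

```python
import math

freq = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]        # class frequencies 420-439 ... 600-619
mids = [429.5 + 20 * i for i in range(10)]     # class midpoints 429.5 ... 609.5
n = sum(freq)                                  # 70 apartments

# Weighted mean of the midpoints, frequencies as weights.
xbar = sum(f * m for f, m in zip(freq, mids)) / n

# Grouped sample variance and standard deviation.
s2 = sum(f * (m - xbar) ** 2 for f, m in zip(freq, mids)) / (n - 1)
s = math.sqrt(s2)
```

This gives x̄ ≈ 493.21 and s ≈ 54.94, within $2.41 and $0.20 of the exact ungrouped values, as the slide notes.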
• Population data:  σ² = Σfᵢ(Mᵢ − μ)² / N

End of Chapter 6

VII Point Estimation

Sampling and Sampling Distributions
• Simple Random Sampling
• Point Estimation
• Introduction to Sampling Distributions
• Sampling Distribution of x̄
• Sampling Distribution of s
• Sampling Distribution of p̄
• Properties of Point Estimators
• Other Sampling Methods
• Other Estimation Methods

Point Estimation
• In point estimation we use the data from the sample to compute the value of a sample statistic that serves as an estimate of a population parameter.
• Let X be a random variable with distribution f(x, θ) which depends on an unknown parameter θ (if it depends on more parameters, we consider them separately).
• To estimate θ we take a random sample x₁, x₂, …, xₙ and compute a function t = t(x₁, x₂, …, xₙ) that estimates θ. For each random sample we obtain a number for θ.
• We refer to x̄ as the point estimator of the population mean μ.
• s is the point estimator of the population standard deviation σ.
• p̄ is the point estimator of the population proportion p.

Sampling Error
• The absolute difference between an unbiased point estimate and the corresponding population parameter is called the sampling error.
• Sampling error is the result of using a subset of the population (the sample), and not the entire population, to develop estimates.
• The sampling errors are:
  |x̄ − μ| for the sample mean
  |s − σ| for the sample standard deviation
  |p̄ − p| for the sample proportion

Example: St. Andrew's
St. Andrew's University receives 900 applications annually from prospective students. The application forms contain a variety of information including the individual's scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing. The director of admissions would like to know the following information: the average SAT score for the applicants, and the proportion of applicants that want to live on campus.
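One way to draw a simple random sample of 30 of the 900 applicants, mimicking the computer-generated random-number approach used in this example, is to attach a random number to every applicant and keep the 30 smallest; a minimal sketch (seeded only so the illustration is reproducible):

```python
import random

random.seed(7)                          # fixed seed, purely for reproducibility
applicants = range(1, 901)              # applicant numbers 1..900

# One random number per applicant; the sample consists of the 30
# applicants holding the 30 smallest random numbers.
keyed = [(random.random(), a) for a in applicants]
sample = sorted(a for _, a in sorted(keyed)[:30])
```

Because every assignment of random numbers is equally likely, each possible sample of 30 applicants has the same probability of being chosen, which is exactly the definition of a simple random sample from a finite population.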
We will now look at three alternatives for obtaining the desired information:
• conducting a census of the entire 900 applicants,
• selecting a sample of 30 applicants using a random number table,
• selecting a sample of 30 applicants using computer-generated random numbers.

Example: St. Andrew's
• Taking a Census of the 900 Applicants
SAT scores:
  - Population mean: μ = Σxᵢ / 900 = 990
  - Population standard deviation: σ = √(Σ(xᵢ − μ)² / 900) = 80
Applicants wanting on-campus housing:
  - Population proportion: p = 648/900 = .72

Example: St. Andrew's
• Take a Sample of 30 Applicants Using Computer-Generated Random Numbers
900 random numbers are generated, one for each applicant in the population. Then we choose the 30 applicants corresponding to the 30 smallest random numbers as our sample. Each of the 900 applicants has the same probability of being included.

Using Excel to Select a Simple Random Sample
• Formula worksheet: each row holds an applicant number, SAT score, on-campus housing preference (Yes/No), and a random number generated with =RAND(). (Rows 10-901 are not shown.)
• Value worksheet: the =RAND() formulas have been evaluated into numbers such as 0.41327, 0.79514, and so on.
• Value worksheet (sorted): sorting by the random-number column brings the 30 applicants with the smallest random numbers to the top of the list.

Example: St.
Andrew's
• Point Estimates
  x̄ as point estimator of μ:  x̄ = Σxᵢ/30 = 29,910/30 = 997
  s as point estimator of σ:  s = √(Σ(xᵢ − x̄)²/29) = √(163,996/29) = 75.2
  p̄ as point estimator of p:  p̄ = 20/30 = .68
Note: different random numbers would have identified a different sample, which would have resulted in different point estimates.

Sampling Distribution of x̄
• Process of Statistical Inference: a simple random sample of n elements is selected from a population with mean μ = ?. The sample data provide a value for the sample mean x̄, and the value of x̄ is used to make inferences about the value of μ.

Properties of Point Estimators
• Before using a sample statistic as a point estimator, statisticians check to see whether the sample statistic has the following properties associated with good point estimators:
  - Unbiasedness
  - Efficiency
  - Consistency

• Unbiasedness
If the expected value of the sample statistic is equal to the population parameter being estimated, the sample statistic is said to be an unbiased estimator of the population parameter.
  E(x̄) = (E(x₁) + E(x₂) + … + E(xₙ))/n = nμ/n = μ
  E((1/n)Σᵢ(xᵢ − x̄)²) = ((n − 1)/n)σ² ⇒ biased; s² = Σᵢ(xᵢ − x̄)²/(n − 1) is unbiased
  E(p̄) = (E(x₁) + E(x₂) + … + E(xₙ))/n = np/n = p, where xᵢ = 1 if A occurs and 0 otherwise

• Efficiency
Given a choice of two unbiased estimators of the same population parameter, we would prefer the point estimator with the smaller standard deviation, since it tends to provide estimates closer to the population parameter.
Example: μ ≈ x̄ = (1/5)Σᵢ₌₁⁵xᵢ and μ ≈ x̃ = (1/5)Σᵢ₌₁³xᵢ + (3/10)x₄ + (1/10)x₅ ⇒ E(x̄) = E(x̃) = μ,
but Var(x̄) = (1/5)σ², while Var(x̃) = (1/25)(σ² + σ² + σ²) + (9/100)σ² + (1/100)σ² = (22/100)σ².

• Consistency
A point estimator is consistent if the values of the point estimator tend to become closer to the population parameter as the sample size becomes larger.
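The unbiasedness claim above, that dividing by n − 1 rather than n makes the sample variance unbiased, can be verified exactly by enumerating every possible sample from a toy population (my own example) drawn with replacement, so the observations are independent:

```python
from itertools import product

population = [0, 1, 2]                         # toy population
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def var(sample, ddof):
    # Sample variance with divisor (n - ddof): ddof=1 is s^2, ddof=0 the biased form.
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - ddof)

# All size-2 samples drawn with replacement, each equally likely.
samples = list(product(population, repeat=2))
e_unbiased = sum(var(s, 1) for s in samples) / len(samples)
e_biased = sum(var(s, 0) for s in samples) / len(samples)
```

`e_unbiased` equals σ² exactly, while `e_biased` equals ((n − 1)/n)·σ² = σ²/2 for n = 2, matching the expectation computed on the slide.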
lim_{n→∞} P(|t(x₁, x₂, …, xₙ) − θ| ≥ ε) = 0 for all ε > 0.
Example (using Chebyshev's inequality):
  lim P(|x̄ − μ| ≥ ε) = lim P(|x̄ − E(x̄)| ≥ ε) ≤ lim Var(x̄)/ε² = lim σ²/(nε²) = 0,
so x̄ is a consistent estimator of μ.

Standard Deviations
• Standard deviation of x̄:
  Finite population:  σx̄ = (σ/√n)·√((N − n)/(N − 1))
  Infinite population:  σx̄ = σ/√n
• Standard deviation of p̄:
  Finite population:  σp̄ = √(p(1 − p)/n)·√((N − n)/(N − 1))
  Infinite population:  σp̄ = √(p(1 − p)/n)
• A finite population is treated as being infinite if n/N < .05. √((N − n)/(N − 1)) is the finite correction factor.

Sampling Distribution of x̄
• If we use a large (n > 30) simple random sample, the central limit theorem enables us to conclude that the sampling distribution of x̄ can be approximated by a normal probability distribution.
• When the simple random sample is small (n < 30), the sampling distribution of x̄ can be considered normal only if we assume the population has a normal probability distribution.

Example: St. Andrew's
• Sampling Distribution of x̄ for the SAT Scores:  E(x̄) = μ = 990, σx̄ = σ/√n = 80/√30 = 14.6
• What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within plus or minus 10 of the actual population mean μ? In other words, what is the probability that x̄ will be between 980 and 1000?
Using the standard normal probability table with z = 10/14.6 = .68, the area between 990 and 1000 is .2518; by symmetry the area between 980 and 990 is also .2518, so the probability is (.2518)(2) = .5036.

Example: St. Andrew's
• Sampling Distribution of p̄ for On-Campus Housing:  E(p̄) = .72, σp̄ = √(.72(1 − .72)/30) = .082
The normal probability distribution is an acceptable approximation since np = 30(.72) = 21.6 > 5 and n(1 − p) = 30(.28) = 8.4 > 5.

Example: St.
Andrew's
• Sampling Distribution of p̄ for On-Campus Housing
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within plus or minus .05 of the actual population proportion? In other words, what is the probability that p̄ will be between .67 and .77?
For z = .05/.082 = .61, the area on each side is .2291, so the probability is (.2291)(2) = .4582 that the sample proportion will be within ±.05 of the actual population proportion.

Other Sampling Methods
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling

Stratified Random Sampling
• The population is first divided into groups of elements called strata.
• Each element in the population belongs to one and only one stratum.
• Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
• A simple random sample is taken from each stratum.
• Advantage: if the strata are homogeneous, this method is as "precise" as simple random sampling, but with a smaller total sample size.
• Example: the basis for forming the strata might be department, location, age, industry type, etc.

Cluster Sampling
• The population is first divided into separate groups of elements called clusters.
• Ideally, each cluster is a representative small-scale version of the population (i.e., a heterogeneous group).
• A simple random sample of the clusters is then taken.
• All elements within each sampled (chosen) cluster form the sample.
• Advantage: the close proximity of elements can be cost effective (i.e., many sample observations can be obtained in a short time).
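The St. Andrew's sampling-distribution calculations above (standard error of the mean, standard error of the proportion, and the probability that x̄ falls within ±10 of μ) can be reproduced with the standard library's `statistics.NormalDist` (Python 3.8+):

```python
import math
from statistics import NormalDist

sigma, n = 80, 30
se_mean = sigma / math.sqrt(n)           # standard error of x-bar, about 14.6

p = 0.72
se_prop = math.sqrt(p * (1 - p) / n)     # standard error of p-bar, about 0.082

# P(980 <= x-bar <= 1000) when mu = 990, via the normal approximation.
dist = NormalDist(990, se_mean)
prob = dist.cdf(1000) - dist.cdf(980)
```

`prob` comes out near 0.506; the slide's 0.5036 differs slightly only because the table lookup rounds z to 0.68 first.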
• Disadvantage: this method generally requires a larger total sample size than simple or stratified random sampling.
• Example: a primary application is area sampling, where clusters are city blocks or other well-defined areas.

Systematic Sampling
• If a sample of size n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
• We randomly select one of the first N/n elements from the population list.
• We then select every (N/n)th element that follows in the population list.
• Advantage: the sample usually will be easier to identify than it would be if simple random sampling were used.
• Example: selecting every 100th listing in a telephone book after the first randomly selected listing.

Convenience Sampling
• It is a nonprobability sampling technique: items are included in the sample without known probabilities of being selected.
• The sample is identified primarily by convenience.
• Advantage: sample selection and data collection are relatively easy.
• Disadvantage: it is impossible to determine how representative of the population the sample is.
• Example: a professor conducting research might use student volunteers to constitute a sample.

Judgment Sampling
• The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population.
• It is a nonprobability sampling technique.
• Advantage: it is a relatively easy way of selecting a sample.
• Disadvantage: the quality of the sample results depends on the judgment of the person selecting the sample.
• Example: a reporter might sample three or four senators, judging them as reflecting the general opinion of the senate.
Other Estimation Methods
• Maximum Likelihood
We look for the maximum of the likelihood function:
  max over θ of L(x₁, x₂, …, xₙ, θ) = max over θ of Πᵢ₌₁ⁿ f(xᵢ, θ);
  ∂L/∂θ = 0 ⇒ θ̂ = θ̂(x₁, x₂, …, xₙ)
• Least Squares
  min over θ of (t(x₁, x₂, …, xₙ) − θ)²

End of Chapter 7

VIII Interval Estimation

Interval Estimation
• Interval Estimation: Basic Method
• Interval Estimation of a Population Mean: Large-Sample Case
• Interval Estimation of a Population Mean: Small-Sample Case
• Determining the Sample Size
• Interval Estimation of a Population Proportion
• Interval Estimation of a Population Variance

Interval Estimation
Point estimators are not sufficient on their own because they carry no information about error and confidence. So, instead of one, we can use two estimators t₁ = t₁(x₁, x₂, …, xₙ) and t₂ = t₂(x₁, x₂, …, xₙ) of the unknown parameter θ, such that:
  P(t₁ ≤ θ ≤ t₂) = 1 − α (confidence probability)
This is the probability that the parameter θ is in the confidence interval (t₁, t₂).

Interval Estimation: Basic Method
Let θ̂ be an estimator of θ such that E(θ̂) = θ. We suppose that the sample is normally distributed or its size is n ≥ 30. Then z = (θ̂ − θ)/σθ̂ has a z(0,1) distribution, and
  P(−zα/2 ≤ z ≤ zα/2) = 1 − α
  ⇒ P(θ̂ − zα/2·σθ̂ ≤ θ ≤ θ̂ + zα/2·σθ̂) = 1 − α
Examples:
  z.025 = 1.96 ⇒ P(θ ∈ θ̂ ± 1.96·σθ̂) = 0.95
  z.05 = 1.65 ⇒ P(θ ∈ θ̂ ± 1.65·σθ̂) = 0.90
  z.33 = 0.44 ⇒ P(θ ∈ θ̂ ± 0.44·σθ̂) = 0.34
There is a 1 − α probability that the value of a sample mean will provide a sampling error of zα/2·σθ̂ or less.
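The z values quoted above come from the inverse standard normal CDF, available in the standard library as `statistics.NormalDist.inv_cdf` (Python 3.8+); a minimal sketch of the basic interval method:

```python
from statistics import NormalDist

def z_value(alpha):
    # z_{alpha/2}: the point leaving alpha/2 in the upper tail of N(0,1).
    return NormalDist().inv_cdf(1 - alpha / 2)

def confidence_interval(theta_hat, se, alpha=0.05):
    # theta_hat +/- z_{alpha/2} * sigma_theta_hat, the basic-method interval.
    z = z_value(alpha)
    return theta_hat - z * se, theta_hat + z * se
```

`z_value(0.05)` gives 1.96 and `z_value(0.66)` gives 0.44, matching the slide's examples (the slide's 1.65 for α = 0.10 is the table's rounding of 1.645).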
Interval Estimate of a Population Mean: Large-Sample Case (n > 30)
• With σ known:
  P(μ ∈ x̄ ± zα/2·σ/√n) = 1 − α
where x̄ is the sample mean, 1 − α is the confidence probability, zα/2 is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution, σ is the population standard deviation, and n is the sample size.
If n ≥ 30, x̄ has approximately a normal distribution z(μ, σ²/n).
• With σ unknown:
In most applications the value of the population standard deviation is unknown. We simply use the value of the sample standard deviation, s, as the point estimate of the population standard deviation. For n ≥ 30 this is quite acceptable:
  P(μ ∈ x̄ ± zα/2·s/√n) = 1 − α

Example: National Discount, Inc.
• Large-Sample Case (n ≥ 30) with σ Unknown
National Discount has 260 retail outlets throughout the United States. National evaluates each potential location for a new retail outlet in part on the mean annual income of the individuals in the marketing area of the new location. Sampling can be used to develop an interval estimate of the mean annual income for individuals in a potential marketing area for National Discount. A sample of size n = 36 was taken. The sample mean, x̄, is $21,100 and the sample standard deviation, s, is $4,500. We will use .95 as the confidence coefficient in our interval estimate.
• There is a .95 probability that the value of a sample mean for National Discount will provide a sampling error of $1,470 or less, determined as follows: 95% of the sample means that can be observed are within ±1.96·σx̄ of the population mean μ. Since σx̄ ≈ s/√n = 4500/√36 = 750, we get 1.96·σx̄ = 1,470.
• The interval estimate of μ is $21,100 ± $1,470, or $19,630 to $22,570. We are 95% confident that the interval contains the population mean.
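The National Discount arithmetic above is a direct application of x̄ ± z·s/√n:

```python
import math

xbar, s, n = 21_100, 4_500, 36
se = s / math.sqrt(n)                    # 4500 / 6 = 750
margin = 1.96 * se                       # sampling error bound at 95% confidence
lower, upper = xbar - margin, xbar + margin
```

This reproduces the slide's margin of $1,470 and the interval $19,630 to $22,570.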
Interval Estimation of a Population Mean: Small-Sample Case (n < 30)
- Population is not normally distributed: the only option is to increase the sample size to n ≥ 30 and use the large-sample interval-estimation procedures.
- Population is normally distributed and σ is known: the large-sample interval-estimation procedure can be used.
- Population is normally distributed and σ is unknown: the appropriate interval estimate is based on a probability distribution known as the t distribution.

Interval Estimate:
P(μ ∈ x̄ ± t_{α/2}·s/√n) = 1 − α
where 1 − α is the confidence coefficient, t_{α/2} is the t value providing an area of α/2 in the upper tail of a t distribution with n − 1 degrees of freedom, and s is the sample standard deviation.
The sample comes from a normal distribution, but it is not correct simply to substitute s for σ (because n < 30). In that case the random variable (x̄ − μ)/(s/√(n − 1)) has a t distribution (a normal variable divided by the square root of a χ² variable over its degrees of freedom).

Example: Apartment Rents
Small-Sample Case (n < 30) with σ Unknown
A reporter for a student newspaper is writing an article on the cost of off-campus housing. A sample of 10 one-bedroom units within a half-mile of campus resulted in a sample mean of $550 per month and a sample standard deviation of $60.
Let us provide a 95% confidence interval estimate of the mean rent per month for the population of one-bedroom units within a half-mile of campus. We'll assume this population to be normally distributed.

t Value: At 95% confidence, 1 − α = .95, α = .05, and α/2 = .025. t.025 is based on n − 1 = 10 − 1 = 9 degrees of freedom. In the t distribution table we see that t.025 = 2.262.

Area in Upper Tail
Degrees of Freedom |  .10  |  .05  |  .025 |  .01  |  .005
        7          | 1.415 | 1.895 | 2.365 | 2.998 | 3.499
        8          | 1.397 | 1.860 | 2.306 | 2.896 | 3.355
        9          | 1.383 | 1.833 | 2.262 | 2.821 | 3.250
       10          | 1.372 | 1.812 | 2.228 | 2.764 | 3.169
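The t-based interval can be sketched the same way; Python's standard library has no Student-t quantile function, so the critical value is passed in from the table above (the helper name `t_interval` is ours):

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Small-sample (n < 30) interval for the mean: xbar +/- t * s / sqrt(n).
    t_crit must be looked up for n - 1 degrees of freedom (the stdlib has
    no t quantile function, so we take the table value as an argument)."""
    e = t_crit * s / sqrt(n)
    return xbar - e, xbar + e

# Apartment Rents slide data: n = 10, xbar = 550, s = 60, t.025 (9 d.f.) = 2.262
lo, hi = t_interval(550, 60, 10, 2.262)
print(round(lo, 2), round(hi, 2))   # about 507.08 592.92
```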
Example: Apartment Rents
x̄ ± t.025·s/√n = 550 ± 2.262·60/√10 = 550 ± 42.92, or $507.08 to $592.92.
We are 95% confident that the mean rent per month for the population of one-bedroom units within a half-mile of campus is between $507.08 and $592.92.

Example: Evacuation Victims
According to one study, "The majority of people who die from fire and smoke in compartmented fire-resistive buildings, the type used for hotels, die in the attempt to evacuate." (Risk Management, Feb. 1986). The following data represent the number of victims who attempted to evacuate for a sample of 14 recent fires:

Fire | Died
Las Vegas Hilton (Las Vegas) | 5
Inn on the Park (Toronto) | 5
Westchase Hilton (Houston) | 8
Holiday Inn (Cambridge, OH) | 10
Conrad Hilton (Chicago) | 4
Providence (Providence) | 8
Baptist Towers (Atlanta) | 7
Howard J. (New Orleans) | 5
Cornell Univ. (Ithaca) | 9
Westport Central (Kansas C.) | 4
Orrington (Evanston, Illinois) | 0
Hartford Hospital (Hartford) | 16
Milford Plaza (New York) | 0
MGM Grand (Las Vegas) | 36

Construct a 98% confidence interval for the true mean number of victims per fire. What is the confidence of the interval ±3 victims around the mean?

x̄ = (5 + 5 + 8 + 10 + 4 + 8 + 7 + 5 + 9 + 4 + 0 + 16 + 0 + 36)/14 = 117/14 = 8.36
s² = (3.36² + 3.36² + 0.36² + 1.64² + 4.36² + 0.36² + 1.36² + 3.36² + 0.64² + 4.36² + 8.36² + 7.64² + 8.36² + 27.64²)/13 = 79.94 ⇒ s = 8.94
1 − α = 0.98 ⇒ α/2 = 0.01 ⇒ t.01 (13 d.f.) = 2.650
x̄ ± t.01·s/√n = 8.36 ± 2.650·8.94/√14 = 8.36 ± 6.33
If we want 99% confidence:
x̄ ± t.005·s/√n = 8.36 ± 3.012·8.94/√14 = 8.36 ± 7.2
If we want to find the confidence of the interval 8.36 ± 3:
3 = t_{α/2}·8.94/√14 ⇒ t_{α/2} = 3·√14/8.94 = 1.2555 ⇒ α/2 > 0.1 ⇒ 1 − α < 0.80

Sample Size for an Interval Estimate of a Population Mean
Let E = the maximum sampling error mentioned in the precision statement.
E is the amount added to and subtracted from the point estimate to obtain an interval estimate; E is often referred to as the margin of error. We have
E = z_{α/2}·σ/√n
Solving for n gives
n = (z_{α/2})²·σ² / E²

Example: National Discount, Inc.
Suppose that National's management team wants an estimate of the population mean such that there is a .95 probability that the sampling error is $500 or less. How large a sample size is needed to meet the required precision?
z_{α/2}·σ/√n = 500. At 95% confidence, z.025 = 1.96; recall that σ ≈ 4,500.
Solving for n: n = (1.96)²·(4500)²/500² = 311.17
We need to sample 312 to reach the desired precision of ±$500 at 95% confidence.

Interval Estimation of a Population Proportion
Interval Estimate:
P(p ∈ p̄ ± z_{α/2}·√(p̄(1 − p̄)/n)) = 1 − α
where 1 − α is the confidence coefficient, z_{α/2} is the z value providing an area of α/2 in the upper tail of the standard normal probability distribution, and p̄ is the sample proportion.

Example: Political Science, Inc.
Interval Estimation of a Population Proportion
Political Science, Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race. Using telephone surveys, interviewers ask registered voters who they would vote for if the election were held that day.
In a recent election campaign, PSI found that 220 registered voters out of 500 contacted favored a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favors the candidate.
p̄ ± z_{α/2}·√(p̄(1 − p̄)/n), with n = 500, p̄ = 220/500 = .44, z_{α/2} = 1.96:
.44 ± 1.96·√(.44(1 − .44)/500) = .44 ± .0435
PSI is 95% confident that the proportion of all voters that favors the candidate is between .3965 and .4835.
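The proportion interval differs from the mean interval only in its standard error, √(p̄(1 − p̄)/n). A minimal sketch with the PSI numbers from the slide (the function name `proportion_interval` is ours):

```python
from math import sqrt

def proportion_interval(phat, n, z):
    """Large-sample confidence interval for a population proportion:
    phat +/- z * sqrt(phat * (1 - phat) / n)."""
    e = z * sqrt(phat * (1 - phat) / n)   # margin of error
    return phat - e, phat + e

# PSI slide data: 220 of 500 voters, 95% confidence (z.025 = 1.96)
lo, hi = proportion_interval(220 / 500, 500, 1.96)
print(round(lo, 4), round(hi, 4))   # 0.3965 0.4835
```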
Sample Size for an Interval Estimate of a Population Proportion
Let E = the maximum sampling error mentioned in the precision statement. We have
E = z_{α/2}·√(p̄(1 − p̄)/n)
Solving for n gives
n = (z_{α/2})²·p̄(1 − p̄) / E²

Example: Political Science, Inc.
Suppose that PSI would like a .99 probability that the sample proportion is within ±.03 of the population proportion. How large a sample size is needed to meet the required precision?
At 99% confidence, z.005 = 2.576.
n = (z_{α/2})²·p̄(1 − p̄)/E² = (2.576)²·(.44)(.56)/(.03)² ≅ 1817
Note: We used .44 as the best estimate of p in the above expression. If no information is available about p, then .5 is often assumed because it provides the largest possible sample size. If we had used p = .5, the recommended n would have been 1843.

Interval Estimation of σ²
Interval Estimate of a Population Variance:
P((n − 1)s²/χ²_{α/2} ≤ σ² ≤ (n − 1)s²/χ²_{1−α/2}) = 1 − α
where the χ² values are based on a chi-square distribution with n − 1 degrees of freedom and 1 − α is the confidence coefficient.
If the xi have a normal distribution N(μ, σ²), then s² = Σ(xi − x̄)²/(n − 1), and χ² = (n − 1)s²/σ² has a χ² distribution (a sum of squares of N(0,1) random variables). The interval follows from
P(χ²_{1−α/2} ≤ χ² ≤ χ²_{α/2}) = 1 − α.
Taking the square root of the upper and lower limits of the variance interval provides the confidence interval for the population standard deviation.

Example: Buyer's Digest
Buyer's Digest rates thermostats manufactured for home temperature control. In a recent test, 10 thermostats manufactured by ThermoRite were selected and placed in a test room that was maintained at a temperature of 68°F. The temperature readings of the ten thermostats are listed below. We will use the 10 readings to develop a 95% confidence interval estimate of the population variance.

Therm.  1    2    3    4    5    6    7    8    9    10
Temp.
      67.4 67.8 68.2 69.3 69.5 67.0 68.1 68.6 67.9 67.2

Example: Buyer's Digest
With n − 1 = 10 − 1 = 9 degrees of freedom and α = .05:
χ²_.975 ≤ (n − 1)s²/σ² ≤ χ²_.025, that is, 2.70 ≤ (n − 1)s²/σ² ≤ 19.02
[Figure: χ² density with 9 d.f.; the area between χ²_.975 = 2.70 and χ²_.025 = 19.02 is 0.95.]
The sample variance s² provides a point estimate of σ²:
s² = Σ(xi − x̄)²/(n − 1) = 6.3/9 = .70
A 95% confidence interval for the population variance is given by:
(10 − 1)(.70)/19.02 ≤ σ² ≤ (10 − 1)(.70)/2.70
0.33 < σ² < 2.33

Example: Can Production
A quality control supervisor in a cannery knows that the exact amount each can contains will vary, since there are certain uncontrollable factors that affect the amount of fill. The mean fill per can is important, but equally important is the variation σ² of the amount of fill. If σ² is large, some cans will contain too little and others too much. In order to estimate the variation of fill at the cannery, the supervisor randomly selects 10 cans and weighs the contents of each. The following results are obtained: x̄ = 7.98 ounces, s = 0.04 ounces. Construct a 90% confidence interval for the true variation in fill of cans at the cannery.

1 − α = 0.90 ⇒ α/2 = 0.05 ⇒ χ²_.05 = 16.9190 and χ²_.95 = 3.3251 for 9 degrees of freedom
9·(0.04)²/16.9190 ≤ σ² ≤ 9·(0.04)²/3.3251 ⇒ 0.000851 ≤ σ² ≤ 0.004331
The quality control supervisor could use this interval to check whether the variation of fill at the cannery is too large and in violation of government regulations.

Interval Estimation of σ1²/σ2²
Interval Estimate of the Ratio between Population Variances:
P( (s1²/s2²)·1/F_{α/2}(ν1, ν2) ≤ σ1²/σ2² ≤ (s1²/s2²)·F_{α/2}(ν2, ν1) ) = 1 − α
where the F values are based on a Fisher (F) distribution with the indicated (ν1, ν2) degrees of freedom and 1 − α is the confidence coefficient.
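The variance interval is again a direct computation once the two χ² critical values are read from a table (stdlib Python has no χ² quantile function, so they are passed in; the helper name `variance_interval` is ours):

```python
def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Confidence interval for a population variance:
    (n-1)s^2 / chi2_{a/2}  <=  sigma^2  <=  (n-1)s^2 / chi2_{1-a/2}.
    chi2_upper = chi2_{a/2}, chi2_lower = chi2_{1-a/2}, both for n-1 d.f."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Buyer's Digest slide data: n = 10, s^2 = .70;
# chi2 with 9 d.f.: .025 upper tail 19.02, .975 point 2.70
lo, hi = variance_interval(0.70, 10, 19.02, 2.70)
print(round(lo, 2), round(hi, 2))   # 0.33 2.33
```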
A random variable has a Fisher (F) distribution if
F = (χ1²/ν1)/(χ2²/ν2) = [(n1 − 1)s1²/σ1²/(n1 − 1)] / [(n2 − 1)s2²/σ2²/(n2 − 1)] = (s1²/σ1²)/(s2²/σ2²).
The interval follows from P(F_{1−α/2} ≤ F ≤ F_{α/2}) = 1 − α, using the identity F_{1−α}(ν1, ν2) = 1/F_α(ν2, ν1).
Taking the square root of the upper and lower limits of the variance-ratio interval provides the confidence interval for the ratio between the population standard deviations.

End of Chapter 8

IX Hypothesis Testing

Hypothesis Testing
- The concept of a hypothesis
- Parametric Tests
- Developing Null and Alternative Hypotheses
- Type I and Type II Errors
- Tests About a Population Mean: Large-Sample Case
- Tests About a Population Mean: Small-Sample Case
- Tests About a Population Proportion
- Tests About a Population Variance
- Hypothesis Testing and Decision Making
- Calculating the Probability of Type II Errors
- Determining the Sample Size for a Hypothesis Test

Hypothesis Testing
- About the nature of some event, many hypotheses H0, H1, …, Hk can be established (developed).
- For various reasons one of them, H0, is singled out and called the Null Hypothesis; the other hypotheses can be considered as an Alternative Hypothesis HA.
- To decide which hypothesis to accept, we conduct an experiment (take a sample), obtaining a value T(x1, x2, …, xn).
- We divide the sample space Ω (in our case R^n) into two sets, A and B = Ω − A; if T ∈ A we accept H0, and if T ∈ B we accept HA.
- In the ideal case:
  P(T ∈ B | H0) = 0: never reject H0 when it is true;
  P(T ∈ A | HA) = 0: always reject H0 when it is false (never reject HA when it is true).

Unfortunately, such an ideal partition of the sample space into sets A and B is not possible. So we take a number α > 0 and choose the critical domain B such that P(T ∈ B | H0) ≤ α (α is called the significance level, and it gives the probability of rejecting H0 when it is true).
Usually we take α = 0.05, 0.01, or 0.001. α gives the Type I error (rejecting H0 when it is true).
P(T ∈ A | HA) = β ⇒ 1 − β = P(T ∈ B | HA). (1 − β is called the power of the test, and it gives the probability of correctly rejecting H0 when it is false.) β gives the Type II error (accepting H0 when it is false).

Parametric Tests
- Let X be a random variable with distribution f(x, θ) which depends on an unknown parameter θ. We test the parameter θ using a test sample in one of the following ways:
  H0: θ = θ0, HA: θ = θ1 < θ0, or
  H0: θ = θ0, HA: θ = θ1 > θ0, or
  H0: θ = θ0, HA: θ = θ1 ≠ θ0.
- The sample is a random vector T(x1, x2, …, xn) with likelihood
  L(T) = ∏_{i=1}^n f(xi, θ), and L0(T) = ∏_{i=1}^n f(xi, θ0), L1(T) = ∏_{i=1}^n f(xi, θ1).
- Lemma (Neyman–Pearson): if there exist a region B in R^n and a number c such that
  L0(T)/L1(T) < c when T ∈ B and L0(T)/L1(T) ≥ c when T ∉ B,
  then B is the best critical domain.

Parametric Tests
- Let X be a random variable with N(μ, σ²) distribution, where μ is unknown and σ can be replaced by s. We want to test H0: μ = μ0 against HA: μ = μ1 < μ0.
- We take the sample (x1, x2, …, xn) and calculate
  L0(T) = ∏_{i=1}^n (1/(σ√(2π))) e^{−(xi − μ0)²/2σ²} = (1/(σ^n (2π)^{n/2})) e^{−(1/2σ²) Σ(xi − μ0)²},
  and similarly L1(T) with μ1 in place of μ0. Then
  T ∈ B ⇔ L0(T)/L1(T) = e^{−(1/2σ²)[Σ(xi − μ0)² − Σ(xi − μ1)²]} < c
  ⇔ x̄ = (1/n)Σxi < [2σ² ln c + n(μ0² − μ1²)] / [2n(μ0 − μ1)] (a new constant c).
  The condition x̄ < c is equivalent to (x̄ − μ0)/(σ/√n) < (c − μ0)/(σ/√n), and
  P(x̄ < c | H0) = α ⇒ P((x̄ − μ0)/(σ/√n) < (c − μ0)/(σ/√n) | H0) = α,
  so (c − μ0)/(σ/√n) = −z_α, which gives the best critical domain.
  Thus, if (x̄ − μ0)/(σ/√n) < −z_α, we reject H0 at level of significance α.

Null or Alternative?
- Hypothesis testing is similar to a criminal trial:
  H0: the defendant is innocent; HA: the defendant is guilty.
- Usually the theory we want to support should be the alternative hypothesis.
- Testing Research Hypotheses: the research hypothesis should be expressed as the alternative.
The conclusion that the research hypothesis is true comes from sample data that contradict the null hypothesis.
- Testing the Validity of a Claim: manufacturers' claims are usually given the benefit of the doubt and stated as the null hypothesis. The conclusion that the claim is false comes from sample data that contradict the null hypothesis.
- Testing in Decision-Making Situations: a decision maker might have to choose between two courses of action, one associated with the null hypothesis and another associated with the alternative hypothesis.

A Summary of Forms for Null and Alternative Hypotheses about a Population Mean
- The equality part of the hypotheses always appears in the null hypothesis.
- In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean):
  H0: μ ≥ μ0, HA: μ < μ0
  H0: μ ≤ μ0, HA: μ > μ0
  H0: μ = μ0, HA: μ ≠ μ0

Example: Metro EMS
A major west coast city provides one of the most comprehensive emergency medical services in the world. Operating in a multiple-hospital system with approximately 20 mobile medical units, the service goal is to respond to medical emergencies with a mean time of 12 minutes or less.
The director of medical services wants to formulate a hypothesis test that could use a sample of emergency response times to determine whether or not the service goal of 12 minutes or less is being achieved.
Hypotheses and conclusions (μ = mean response time of medical emergency requests):
H0: μ ≤ 12: the emergency service is meeting the response goal; no follow-up action is necessary.
HA: μ > 12: the emergency service is not meeting the response goal; follow-up action is necessary.

Type I and Type II Errors
- Since hypothesis tests are based on sample data, we must allow for the possibility of errors.
- A Type I error is rejecting H0 when it is true.
- A Type II error is accepting H0 when it is false.
- The person conducting the hypothesis test specifies the maximum allowable probability of making a Type I error, denoted by α and called the level of significance.
- Generally, we cannot control the probability of making a Type II error, denoted by β.
- A statistician avoids the risk of making a Type II error by concluding "do not reject H0" rather than "accept H0".

Example: Metro EMS

Conclusion                   | H0 true (μ ≤ 12)   | HA true (μ > 12)
Accept H0 (conclude μ ≤ 12)  | Correct conclusion | Type II error
Reject H0 (conclude μ > 12)  | Type I error       | Correct conclusion

The Steps of Hypothesis Testing
- Determine the appropriate hypotheses.
- Select the test statistic for deciding whether or not to reject the null hypothesis.
- Specify the level of significance α for the test.
- Use α to develop the rule for rejecting H0.
- Collect the sample data and compute the value of the test statistic.
- Either (a) compare the test statistic to the critical value(s) in the rejection rule, or (b) compute the p-value based on the test statistic and compare it to α, to determine whether or not to reject H0.

One-Tailed Tests about a Population Mean: Large-Sample Case (n ≥ 30)
- Hypotheses: H0: μ ≤ μ0, HA: μ > μ0, or H0: μ ≥ μ0, HA: μ < μ0
- Test statistic:
  σ known: z = (x̄ − μ0)/(σ/√n); σ unknown: z = (x̄ − μ0)/(s/√n)
- Rejection rule: reject H0 if z > z_α (upper tail) or z < −z_α (lower tail), respectively.

Example: Metro EMS
One-Tailed Test about a Population Mean (large n). Let α = P(Type I error) = .05.
[Figure: sampling distribution of x̄ assuming H0 is true and μ = 12; the rejection region is the upper tail of area α = .05, beyond the critical value c = 12 + 1.645·σ_x̄.]
Example: Metro EMS
One-Tailed Test about a Population Mean (large n). Let n = 40, x̄ = 13.25 minutes, s = 3.2 minutes. (The sample standard deviation s can be used to estimate the population standard deviation σ.)
z = (x̄ − μ0)/(s/√n) = (13.25 − 12)/(3.2/√40) = 2.47
Since 2.47 > 1.645, we reject H0.
Conclusion: We are 95% confident that Metro EMS is not meeting the response goal of 12 minutes; appropriate action should be taken to improve service.

Using the p-value to Test the Hypothesis
Recall that z = 2.47 for x̄ = 13.25. Then p-value = P(z > 2.47) = .0068. Since p-value < α, that is, .0068 < .05, we reject H0.

Two-Tailed Tests about a Population Mean: Large-Sample Case (n ≥ 30)
- Hypotheses: H0: μ = μ0, HA: μ ≠ μ0
- Test statistic:
  σ known: z = (x̄ − μ0)/(σ/√n); σ unknown: z = (x̄ − μ0)/(s/√n)
- Rejection rule: reject H0 if |z| > z_{α/2}

Example: Glow Toothpaste
The production line for Glow toothpaste is designed to fill tubes of toothpaste with a mean weight of 6 ounces. Periodically, a sample of 30 tubes will be selected in order to check the filling process. Quality assurance procedures call for the continuation of the filling process if the sample results are consistent with the assumption that the mean filling weight for the population of toothpaste tubes is 6 ounces; otherwise the filling process will be stopped and adjusted.
Hypotheses: H0: μ = 6, HA: μ ≠ 6
Rejection rule: assuming a .05 level of significance, reject H0 if z < −1.96 or z > 1.96.
[Figure: sampling distribution of x̄ assuming H0 is true and μ = 6; rejection regions of area α/2 = .025 in each tail, beyond ±1.96.]

Two-Tailed Test about a Population Mean (large n): assume that a sample of 30 toothpaste tubes provides a sample mean of 6.1 ounces and a standard deviation of 0.2 ounces. With n = 30, x̄ = 6.1 ounces, s = .2 ounces:
z = (x̄ − μ0)/(s/√n) = (6.1 − 6)/(.2/√30) = 2.74
Since 2.74 > 1.96, we reject H0.
Conclusion: We are 95% confident that the mean filling weight of the toothpaste tubes is not 6 ounces. The filling process should be stopped and the filling mechanism adjusted.
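Both large-sample tests above reduce to the same statistic; a minimal sketch with the slide data, using `statistics.NormalDist` for the p-values (the helper name `z_statistic` is ours):

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, s, n):
    """Large-sample test statistic z = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Metro EMS, upper-tailed: n = 40, xbar = 13.25, s = 3.2, H0: mu <= 12
z1 = z_statistic(13.25, 12, 3.2, 40)
p1 = 1 - NormalDist().cdf(z1)             # upper-tail p-value
print(round(z1, 2), round(p1, 4))         # z ~ 2.47, p ~ .0067 (slide: .0068 from z rounded to 2.47)

# Glow Toothpaste, two-tailed: n = 30, xbar = 6.1, s = .2, H0: mu = 6
z2 = z_statistic(6.1, 6, 0.2, 30)
p2 = 2 * (1 - NormalDist().cdf(abs(z2)))  # double the tail area
print(round(z2, 2), round(p2, 4))         # z ~ 2.74, p ~ .0062
```

In both cases p < .05, matching the reject-H0 conclusions on the slides.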
Example: Glow Toothpaste
Using the p-Value for a Two-Tailed Hypothesis Test
Suppose we define the p-value for a two-tailed test as double the area found in the tail of the distribution. With z = 2.74, the standard normal probability table shows there is a .5000 − .4969 = .0031 probability of a difference larger than .1 in the upper tail of the distribution. Considering the same probability of a larger difference in the lower tail, we have
p-value = P(|z| > 2.74) = 2(.0031) = .0062.
The p-value .0062 is less than α = .05, so H0 is rejected.

Confidence Interval Approach to a Two-Tailed Test about a Population Mean
- Select a simple random sample from the population and use the value of the sample mean x̄ to develop the confidence interval for the population mean μ.
- If the confidence interval contains the hypothesized value μ0, do not reject H0; otherwise, reject H0.
The 95% confidence interval for μ is
x̄ ± z_{α/2}·s/√n = 6.1 ± 1.96·(.2/√30) = 6.1 ± .0716, or 6.0284 to 6.1716.
Since the hypothesized value for the population mean, μ0 = 6, is not in this interval, the hypothesis-testing conclusion is that the null hypothesis H0: μ = 6 can be rejected.

Tests about a Population Mean: Small-Sample Case (n < 30)
- Test statistic:
  σ known: t = (x̄ − μ0)/(σ/√n); σ unknown: t = (x̄ − μ0)/(s/√n)
  This test statistic has a t distribution with n − 1 degrees of freedom.
- Rejection rule:
  One-tailed: H0: μ ≤ μ0: reject H0 if t > t_α; H0: μ ≥ μ0: reject H0 if t < −t_α
  Two-tailed: H0: μ = μ0: reject H0 if |t| > t_{α/2}

Example: Highway Patrol
One-Tailed Test about a Population Mean (small n)
A State Highway Patrol periodically samples vehicle speeds at various locations on a particular roadway. The sample of vehicle speeds is used to test the hypothesis H0: μ ≤ 65. The locations where H0 is rejected are deemed the best locations for radar traps. At Location F, a sample of 16 vehicles shows a mean speed of 68.2 mph with a standard deviation of 3.8 mph.
Use α = .05 to test the hypothesis.
Let n = 16, x̄ = 68.2 mph, s = 3.8 mph; α = .05, d.f. = 16 − 1 = 15, t_α = 1.753.
t = (x̄ − μ0)/(s/√n) = (68.2 − 65)/(3.8/√16) = 3.37
Since 3.37 > 1.753, we reject H0.
Conclusion: We are 95% confident that the mean speed of vehicles at Location F is greater than 65 mph. Location F is a good candidate for a radar trap.

Summary of Test Statistics for a Population Mean
- n ≥ 30, σ known: z = (x̄ − μ0)/(σ/√n)
- n ≥ 30, σ unknown (use s to estimate σ): z = (x̄ − μ0)/(s/√n)
- n < 30, population approximately normal, σ known: z = (x̄ − μ0)/(σ/√n)
- n < 30, population approximately normal, σ unknown (use s to estimate σ): t = (x̄ − μ0)/(s/√n)
- n < 30, population not approximately normal: increase n to ≥ 30.

Tests about a Population Proportion: Large-Sample Case (np0 > 5 and n(1 − p0) > 5)
- Test statistic: z = (p̄ − p0)/σ_p̄, where σ_p̄ = √(p0(1 − p0)/n)
- Rejection rule:
  One-tailed: H0: p ≤ p0: reject H0 if z > z_α; H0: p ≥ p0: reject H0 if z < −z_α
  Two-tailed: H0: p = p0: reject H0 if |z| > z_{α/2}

Example: NSC
Two-Tailed Test about a Population Proportion (large n)
For a Christmas and New Year's week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation's roads. The NSC claimed that 50% of the accidents would be caused by drunk driving. A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC's claim with α = 0.05.
Hypotheses: H0: p = .5; HA: p ≠ .5
Test statistic: σ_p̄ = √(p0(1 − p0)/n) = √(.5(1 − .5)/120) = .045644
z = (p̄ − p0)/σ_p̄ = ((67/120) − .5)/.045644 = 1.278
Rejection rule: reject H0 if z < −1.96 or z > 1.96.
Conclusion: Do not reject H0. For z = 1.278, the p-value is .201. If we rejected H0, we would exceed the maximum allowed risk of committing a Type I error (p-value > .050).
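The proportion test above can be sketched the same way, with the standard error computed under H0 (the helper name `proportion_z_test` is ours):

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0):
    """Two-tailed large-sample test of H0: p = p0 (valid when
    n*p0 > 5 and n*(1 - p0) > 5). Returns (z, p-value)."""
    phat = x / n
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# NSC slide data: 67 drunk-driving accidents out of 120, H0: p = .5
z, p = proportion_z_test(67, 120, 0.5)
print(round(z, 3), round(p, 3))   # 1.278 0.201
```

Since |z| = 1.278 < 1.96 (equivalently p = .201 > .05), H0 is not rejected, matching the slide.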
Hypothesis Testing About a Population Variance
- Upper-tailed test
  Hypotheses: H0: σ² ≤ σ0², HA: σ² > σ0²
  Test statistic: χ² = (n − 1)s²/σ0²
  Rejection rule: reject H0 if χ² > χ²_α (chi-square distribution with n − 1 d.f.), or reject H0 if p-value < α.
- Lower-tailed test
  Hypotheses: H0: σ² ≥ σ0², HA: σ² < σ0²
  Test statistic: χ² = (n − 1)s²/σ0²
  Rejection rule: reject H0 if χ² < χ²_{1−α} (chi-square distribution with n − 1 d.f.), or reject H0 if p-value < α.
- Two-tailed test
  Hypotheses: H0: σ² = σ0², HA: σ² ≠ σ0²
  Test statistic: χ² = (n − 1)s²/σ0²
  Rejection rule: reject H0 if χ² < χ²_{1−α/2} or χ² > χ²_{α/2} (where χ²_{1−α/2} and χ²_{α/2} are based on a chi-square distribution with n − 1 d.f.), or reject H0 if p-value < α.

Hypothesis Testing About Two Population Variances
- One-tailed test
  Hypotheses: H0: σ1² ≤ σ2², HA: σ1² > σ2²
  Test statistic: F = s1²/s2²
  Rejection rule: reject H0 if F > F_α, where the value of F_α is based on an F distribution with n1 − 1 (numerator) and n2 − 1 (denominator) d.f.
- Two-tailed test
  Hypotheses: H0: σ1² = σ2², HA: σ1² ≠ σ2²
  Test statistic: F = s1²/s2² (with the larger sample variance in the numerator)
  Rejection rule: reject H0 if F > F_{α/2}, where the value of F_{α/2} is based on an F distribution with n1 − 1 (numerator) and n2 − 1 (denominator) d.f.

Example: Buyer's Digest
Buyer's Digest has conducted the same test as described earlier on another 10 thermostats, this time manufactured by TempKing. The temperature readings of the ten thermostats are listed below. We will conduct a hypothesis test with α = .10 to see if the variances are equal for ThermoRite's thermostats and TempKing's thermostats.

Therm.  1    2    3    4    5    6    7    8    9    10
Temp.
      66.4 67.8 68.2 70.3 69.5 68.0 68.1 68.6 67.9 66.2

Example: Buyer's Digest
Hypothesis Testing About the Variances of Two Populations
Hypotheses: H0: σ1² = σ2² (ThermoRite and TempKing thermostats have the same temperature variance); HA: σ1² ≠ σ2² (their variances are not equal)
Rejection rule: the F distribution table shows that with α = .10, 9 d.f. (numerator), and 9 d.f. (denominator), F.05 = 3.18. Reject H0 if F > 3.18.
Test statistic: ThermoRite's sample variance is .70; TempKing's sample variance is 1.52. F = 1.52/.70 = 2.17.
Conclusion: We cannot reject H0. There is insufficient evidence to conclude that the population variances differ for the two thermostat brands.

Hypothesis Testing and Decision Making
In many decision-making situations the decision maker may want, and in some cases may be forced, to take action with both the conclusion "do not reject H0" and the conclusion "reject H0". In such situations it is recommended that the hypothesis-testing procedure be extended to include consideration of making a Type II error.
For a test with critical domain (x̄ − μ0)/(σ/√n) > z_α:
P(T ∈ B | HA) = P(x̄ > c | HA) = P((x̄ − μ0)/(σ/√n) > z_α | HA).
When HA is true (μ = μ1), it is (x̄ − μ1)/(σ/√n), not (x̄ − μ0)/(σ/√n), that has the N(0,1) distribution. Writing λ = (μ0 − μ1)/(σ/√n), we get
P((x̄ − μ1)/(σ/√n) + (μ1 − μ0)/(σ/√n) > z_α | HA) = P((x̄ − μ1)/(σ/√n) > z_α + λ | HA) = 1 − β.
For a test with critical domain (x̄ − μ0)/(σ/√n) < −z_α: P((x̄ − μ1)/(σ/√n) < −z_α + λ | HA) = 1 − β.
For a two-tailed test: P(−z_{α/2} + λ < (x̄ − μ1)/(σ/√n) < z_{α/2} + λ | HA) = β.

Probability of a Type II Error
Calculating the Probability of a Type II Error: Metro EMS (n = 40, x̄ = 13.25 minutes, s = 3.2 minutes)
1. The hypotheses are: H0: μ = 12 min and HA: μ ≠ 12 min.
2. The rejection rule is: reject H0 if |z| > z.025 = 1.96.
3. The value of the sample mean that identifies the rejection region:
   (x̄ − 12)/(3.2/√40) > 1.96 = z.025 ⇒ x̄ > 12 + 1.96·(3.2/√40) = 12.99
4.
We will accept H0 when x̄ < 12.99.

Example: Metro EMS (revisited)
Calculating the Probability of a Type II Error. For each value μ1, the acceptance limit in z-units is
1.96 + λ = 1.96 + (12 − μ1)/(3.2/√40) = (12.99 − μ1)/(3.2/√40), and β = P(Z < 1.96 + λ):

μ1      |   z   |   β   | 1 − β
14.0    | −2.00 | .0228 | .9772
13.6    | −1.21 | .1131 | .8869
13.2    | −0.42 | .3372 | .6628
12.99   |  0.00 | .5000 | .5000
12.90   |  0.18 | .5714 | .4286
12.4    |  1.17 | .8790 | .1210
12.0001 |  1.96 | .9750 | .0250

Recall: the probability of correctly rejecting H0 when it is false is called the power of the test. For any particular value of μ1, the power is 1 − β.

Example: Metro EMS (revisited)
Observations about the preceding table:
- When the true population mean μ is close to the null hypothesis value of 12, there is a high probability that we will make a Type II error.
- When the true population mean μ is far above the null hypothesis value of 12, there is a low probability that we will make a Type II error.

Relationship among α, β, and n
- Once two of the three values are known, the third can be computed.
- For a given level of significance α, increasing the sample size n will reduce β.
- For a given sample size n, decreasing α will increase β, whereas increasing α will decrease β.

Example: Highway Improvements
The department of highway improvements, responsible for repairing a 25-mile stretch of interstate highway, wants to design a surface that will be structurally efficient. One important consideration is the volume of heavy freight traffic. State weigh stations report that the average number of heavy-duty trailers on a 25-mile segment is 72 per hour. However, engineers believe that the volume of heavy freight traffic is greater than the reported average. In order to validate this theory, the department monitors the highway for 50 one-hour periods randomly selected throughout the month. The sample mean and standard deviation of the heavy freight traffic for the 50 sampled hours are: x̄ = 74.1, s = 13.3.
Do the data support the department's theory? Use α = .10.
H0: μ = 72; HA: μ > 72
z = (x̄ − 72)/(s/√n) = (74.1 − 72)/(13.3/√50) = 1.12 < 1.28 = z.10 ⇒ H0 (the state report) is accepted.

If the number of heavy freight trucks is in fact 78 per hour, what is the probability that the test procedure would fail to detect it?
1 − β = P((x̄ − 78)/(13.3/√50) > z_α + λ | HA) = P(z > 1.28 + (72 − 78)/(13.3/√50)) = P(z > −1.91) = 0.9719.
Therefore, the probability of accepting H0 when μ = 78 is only β = 0.0281.
If the number of heavy freight trucks is instead 74 per hour, for β we have:
z_α + λ = 1.28 + (72 − 74)/(13.3/√50) = 0.22
P(z > 0.22) = 0.4129 = 1 − β ⇒ β = 0.588, a high Type II error.

Determining the Sample Size for a Hypothesis Test About a Population Mean
n = (z_α + z_β)²·σ² / (μ0 − μa)²
where
z_α = z value providing an area of α in the tail
z_β = z value providing an area of β in the tail
σ = population standard deviation
μ0 = value of the population mean in H0
μa = value of the population mean used for the Type II error
Note: in a two-tailed hypothesis test, use z_{α/2}, not z_α.

End of Chapter 9