Download Lecture Notes - Department of Statistics, Purdue University

STAT 225: Introduction to Probability Models
Course Lecture Notes

1 Introduction to Probability

1.1 Set Theory

The material in this handout is intended to cover general set theory topics. Information includes (but is not limited to) introductory probabilities, outcome spaces, sample spaces, laws of probability, and Venn diagrams. This covers section 1.2 and all of chapter 2 from A Course in Probability by Neil Weiss.

An element is a single item (outcome), typically denoted by ω. A set is a collection of elements. A subset is itself a set, every element of which is contained in a larger set. Suppose the set A is contained in the set B. This is denoted by A ⊂ B or A ⊆ B depending on whether or not B has elements which are not in A. If B contains elements that are not in A, then A is called a proper subset of B.

A population is the collection of all individuals or items under consideration. An individual could refer to a person, a playing card, or whatever object we are interested in. A population is used in reference to sampling. However, when we talk about experiments, we use the phrase sample space. The sample space is the set of all possible outcomes for a random experiment and is denoted by Ω. A random experiment is an action whose outcome cannot be predicted with certainty beforehand.

Example 1.1 Suppose we are interested in whether the price of the S & P 500 decreases, stays the same, or increases. If we were to examine the S & P 500 over one day, then Ω = {decreases, stays the same, increases}. What would Ω be if we looked at 2 days?

The opposite of Ω is the empty (null) set. It is the set with 0 elements in it and is written as ∅. (Please note how this looks. Do not write your 0s like this or you will lose points, as they have 2 very different meanings.) Ω and ∅ are complements. A complement is a set that contains all of the elements in the sample space that are not in the original set. We denote a complement with a superscript c (or C).
For example, the complement of A would be denoted as A^c or A^C. Sometimes the symbol \ is useful when writing complements. The symbol \ means “except” or “everything but”. Suppose we look at the outcome of 2 rolls of a die. Let A be the event that both rolls are a 5. Then A^C = Ω \ {(5, 5)}.

We use the symbol ∈ to denote “belongs to”. The symbol for “does not belong to” is ∉.

Here are some important sets that pertain to numbers: the real numbers ℝ, the integers ℤ, the rational numbers ℚ, the natural (whole) numbers ℕ, and the positive integers ℤ⁺. What sets are contained in (or are subsets of) the other sets?

Example 1.2 Let us examine what happens in the flip of 3 fair coins. Fair means that the coin is as likely to land heads as it is to land tails. First, define Ω. Let A be the event of exactly 2 tails. Let B be the event that the first 2 tosses are tails. Let C be the event that all 3 tosses are tails. Write out the possible outcomes for each of these 3 events. We will revisit these events later on.

Example 1.3 Let Ω, the universal set, be all 26 lower-case letters. Define the sets V, N, E, and G (all of which are subsets of Ω) as follows:
• V = vowels (here, assume “y” is a vowel) =
• N = letters next to a vowel (in the natural sequence “a” - “z”) =
• E = every other letter, starting with “b” =
• G = letters “a” - “g” =
List the letters in each of the following sets:
• V, N, E, and G individually (see answers above)
• N^C =
• G^C =

Example 1.4 Start with a standard deck of 52 cards and remove all the hearts and all the spades, leaving 13 red and 13 black cards. List the cards in each of the following sets:
• N = not a face card
• R = neither red nor an ace
• E = either black, even, or a Jack

Example 1.5 Suppose a fair six-sided die is rolled twice. Determine the number of possible outcomes
• for this experiment.
• in which the sum of the two rolls is 5.
• in which the two rolls are the same.
• in which the sum of the two rolls is an even number.

Recall that a random experiment is an action whose outcome cannot be predicted with certainty beforehand. This does not mean that we know nothing about what can happen. An example of a random experiment could be one roll of a die (or multiple rolls), a hand in Texas Hold ’em, or a grade in a course. Ω represents all possible outcomes from the random experiment or the model under consideration.

An event is defined to be any subset of the sample space. It can be one or more outcomes. Typically, an event that is a single outcome is called a simple event, and its probability is called a simple probability. For example, you could think of an event as not losing money on the S & P 500 on a given day. This event has 2 outcomes based on Example 1.1, where Ω = {decreases, stays the same, increases}.

Example 1.6 Refer to Example 1.1. Suppose you looked at 2 consecutive days for this index. Let A be the event that you made money on the first day. Let B be the event that you had at least one day where you made money. How many outcomes does each event represent?

1.2 Probability

The Frequentist Interpretation of Probability states that the probability of an event is the long-run proportion of times that the event occurs in independent repetitions of the random experiment. This is referred to as an empirical probability and can be written as

$$P(E) = \frac{N(E)}{n}$$

where n represents the sample size. (For definitions of P(E) and N(E) see the symbols reference.) Long-run means that n is large. There are differing viewpoints on what counts as large (typical examples are > 100, > 1,000, > 1,000,000, etc.). We will not use this exact formula for now, but it is essential to the Central Limit Theorem (CLT), which will be covered in MGMT 305. However, the concept is applicable for our purposes.

Regardless of the sample size, if we are in an EQUALLY LIKELY FRAMEWORK, then

$$P(E) = \frac{N(E)}{N(\Omega)}.$$

What is meant by an equally likely framework?
Well, let us create a scenario that has such a property. Suppose we roll a fair, 6-sided die. Because the die is fair, each side of the die has the same probability of occurring as any other side of the die. Therefore, any individual outcome of the sample space is as likely as any other outcome in the sample space. Often, the equal-likelihood model is referred to as classical probability. So, in an equally likely framework, the probability of any event is the number of outcomes in the event divided by the total number of possible outcomes.

Find the probabilities associated with parts 2-4 of Example 1.5.

1.3 Probability Rules

Regardless of whether sample outcomes have the same probabilities, there are rules that probabilities must satisfy.
• Any probability must be between 0 and 1 inclusive.
• Additionally, the sum of the probabilities for all the experimental outcomes must equal 1.
• Suppose the event E is composed of several outcomes. Then the probability of E is just the sum of the probabilities of those outcomes.

If a probability model satisfies the first two rules, it is said to be legitimate. Refer to event B in Example 1.2 for an example of the third rule above. What are the probabilities of Ω and ∅? If A ⊂ B, what (if anything) can you say about their probabilities?

Example 1.7 (ASW Chapter 4.1, Problem 6) An experiment with three outcomes has been repeated 50 times, and it was learned that E1 occurred 20 times, E2 occurred 13 times, and E3 occurred 17 times. Assign probabilities to the outcomes. What method did you use?

Example 1.8 Start with a standard deck of 52 cards and remove all the hearts and all the spades, leaving 13 red and 13 black cards. Suppose a card is randomly drawn from the remaining cards. What are the probabilities of the following events?
• N = not a face card
• R = neither red nor an ace
• E = either black, even, or a Jack

Example 1.9 (ASW Chapter 4.1, Problem 7) A decision maker subjectively assigned the following probabilities to the four outcomes of an experiment: P(E1) = .10, P(E2) = .15, P(E3) = .40, and P(E4) = .20. Are these probability assignments legitimate? Explain.

1.4 Probability with Several Events

The intersection of the events A and B is written as A ∩ B. For an outcome to belong to the intersection, that outcome has to be in both A and B. If we were talking about the intersection of 3 or more events, the outcome would need to be in all of them. The intersection is what is in common.

The union of the events A and B is written as A ∪ B and it means whatever is in at least one of A or B. Please note that we do not double count. If an outcome is in both A and B, then it is in their union, but it is not in there twice.

Example 1.10 Refer to Example 1.2, where we flipped 3 fair coins. What are A ∩ B, A ∪ C, and A ∩ B ∪ C?

Two other useful terms are mutually exclusive and exhaustive. Mutually exclusive refers to two (or more) events that cannot both occur when the random experiment is performed. Can you think of an event that is mutually exclusive with event C from Example 1.2? Note that the term disjoint is the same as mutually exclusive except that it refers to sets and not events. One can symbolically denote mutually exclusive events by the following equation: A ∩ B = ∅.

Exhaustive refers to event(s) that comprise the sample space. In other words, events that are exhaustive have a union that equals the sample space; if A and B are exhaustive, then A ∪ B = Ω.

What would you call events that are both mutually exclusive and exhaustive? The answer is a partition. What is the simplest partition?

Venn diagrams are useful tools for examining the relationships between events.
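The definitions above (intersection, union, mutually exclusive, exhaustive, partition) can also be checked mechanically by treating events as sets. The following is a minimal Python sketch, assuming a single roll of a six-sided die; the events A and B and the helper function names are illustrative choices made here, not part of the course material.

```python
# Illustrative sketch: events as Python sets for one roll of a six-sided die.
# The events A and B and the helper functions are assumptions made for this
# example; they do not come from the course notes.

omega = {1, 2, 3, 4, 5, 6}      # sample space
A = {2, 4, 6}                   # event: the roll is even
B = {4, 5, 6}                   # event: the roll is at least 4

intersection = A & B            # outcomes in both A and B
union = A | B                   # outcomes in at least one of A or B (no double counting)
A_complement = omega - A        # outcomes in the sample space that are not in A


def mutually_exclusive(*events):
    """True if no outcome belongs to more than one of the events."""
    seen = set()
    for event in events:
        if seen & event:
            return False
        seen |= event
    return True


def exhaustive(sample_space, *events):
    """True if the union of the events equals the sample space."""
    return set().union(*events) == sample_space


def is_partition(sample_space, *events):
    """True if the events are both mutually exclusive and exhaustive."""
    return mutually_exclusive(*events) and exhaustive(sample_space, *events)


print(sorted(intersection))                  # [4, 6]
print(sorted(union))                         # [2, 4, 5, 6]
print(is_partition(omega, A, A_complement))  # True
print(is_partition(omega, A, B))             # False
```

A and B fail the partition check on both counts: they share the outcomes 4 and 6, and neither contains 1 or 3.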
Tree diagrams are also helpful (more on this when we come to conditional probability, the general multiplication rule, etc.). Draw generic diagrams for events that are: mutually exclusive, exhaustive, complements, subsets, and have an intersection but are not subsets.

The complement rule is a way to calculate the probability of an event based on the probability of its complement. It is P(A) = 1 − P(A^C). This law is extremely useful. It is often handy in situations where the desired event has many outcomes, but its complement has only a few.

Example 1.11 Suppose we rolled a fair, six-sided die 10 times. Let T be the event that we roll at least 1 three. If one were to calculate P(T) directly, one would need to find the probabilities of 1 three, 2 threes, ..., and 10 threes and add them all up. However, you can use the complement rule. What is P(T)?

The general addition rule is a way of finding the probability of a union of 2 events.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

What does this become if A and B are mutually exclusive? Can you provide a mathematical proof of this?

The inclusion-exclusion principle is a way to extend the general addition rule to 3 or more events. Here we will limit it to 3 events.

P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

The law of partitions is a way to calculate the probability of an event. Let A1, A2, ..., Ak form a partition of Ω. Then, for all events B,

$$P(B) = \sum_{i=1}^{k} P(A_i \cap B).$$

Then, there are DeMorgan’s Laws. Let A and B be subsets of Ω. Then
• (A ∪ B)^C = A^C ∩ B^C.
• (A ∩ B)^C = A^C ∪ B^C.

Example 1.12 Refer to Example 1.3. Solve for the following quantities:
• P(consonant) =
• P(G^C) =
• P(E) and P(E^C)

Example 1.13 Three of the major commercial computer operating systems are Windows, Mac OS, and Red Hat Linux Enterprise. A Computer Science professor selects 50 of her students and asks which of these three operating systems they use. The results for the 50 students are summarized below.
• 30 students use Windows
• 16 students use at least two of the operating systems
• 9 students use all three operating systems
• 18 students use Mac OS
• 46 students use at least one of the operating systems
• 11 students use both Windows and Linux
• 11 students use both Windows and Mac OS

Use the above information to complete a three-way Venn diagram.

[Figure: blank three-circle Venn diagram with circles labeled Windows, Red Hat Linux Enterprise, and Mac OS]

Using the Venn diagram summarizing the distribution of operating system use previously described, calculate the following. Let Windows = W, Mac OS = M, and Red Hat Linux Enterprise = L.
• N(W^C ∩ M^C) =
• P(W^C ∪ M^C) =
• N(W ∪ M ∪ L) =

Example 1.14 In a certain population, 10% of the population are rich, 5% are famous, and 3% are both.
• Draw a Venn diagram for the situation described above and label all probabilities.
• What is the probability a randomly chosen person is not rich?
• What is the probability a randomly chosen person is rich but not famous?
• What is the probability a randomly chosen person is either rich or famous?
• What is the probability a randomly chosen person is either rich or famous but not both?
• What is the probability a randomly chosen person has neither wealth nor fame?

Example 1.15 Drew is a risk taker. On any given weekend, Drew takes risks with or without monetary compensation. He gets paid 20% of the time he takes risks. The risks involved are to either drink something weird (like garlic butter) or do something silly (like shave his head into a mohawk). Drew gets paid and drinks something weird 16% of the time. Drew does not get paid and drinks something weird 72% of the time. What is the probability Drew drinks something weird? What is the probability he does something silly?

Here are a few of the other laws. The pairs of equations below are the distributive, commutative, and associative laws, respectively. For all of these, let A, B, and C be subsets of Ω.

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ B = B ∩ A
A ∪ B = B ∪ A

A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∪ C) = (A ∪ B) ∪ C

Please be aware that the formulas just written can be extended to more than 3 events (even an infinite number of events).

1.5 Counting Rules

The Basic Counting Rule, or BCR, is used for scenarios that have multiple choices or actions to be determined. Suppose that r actions (choices) need to be performed (in a definite order). Further suppose that there are m1 possibilities for the 1st action, m2 possibilities for the 2nd action, etc. Then there are m1 ∗ m2 ∗ ... ∗ mr possibilities altogether for the r actions.

A factorial is the product of the first k positive integers. Suppose we were looking at a generic (positive) integer k. Then k factorial, denoted k!, is equivalent to k ∗ (k−1) ∗ (k−2) ∗ ... ∗ 1. For a specific example, 4! is 4 ∗ 3 ∗ 2 ∗ 1 = 24.

A permutation of r objects from a collection of n objects is any ORDERED arrangement of r distinct objects from the n objects. The number of such permutations is written as either $(n)_r$ or $_nP_r$. Mathematically it is defined to be

$$(n)_r = \frac{n!}{(n-r)!}.$$

The special permutation rule states that the number of permutations of n objects taken n at a time is n factorial. As an example, $(n)_n = n!$ or $(6)_6 = 6!$.

A combination of r objects from a collection of n objects is any UNORDERED arrangement of r distinct objects from the n total objects. The difference between a combination and a permutation is that the order of the objects is not important for a combination. The number of combinations, say n choose r (as described above), is written as either $_nC_r$ or $\binom{n}{r}$. Mathematically,

$$\binom{n}{r} = \frac{(n)_r}{r!} = \frac{n!}{(n-r)!\,r!}.$$

An ordered partition of m objects into k distinct groups of sizes m1, m2, ..., mk is any division of the m objects into a combination of m1 objects constituting the first group, m2 objects comprising the second group, etc. The number of such partitions that can be made is denoted by $\binom{m}{m_1, m_2, \ldots, m_k}$. Mathematically, this is equal to

$$\binom{m}{m_1, m_2, \ldots, m_k} = \frac{m!}{m_1!\,m_2!\cdots m_k!}.$$
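The counting formulas above can be evaluated directly with Python's standard library. The following is a minimal sketch, assuming Python 3.8 or later (for `math.perm`, `math.comb`, and `math.prod`); the helper names `basic_counting_rule` and `multinomial` and the example numbers are illustrative assumptions, not part of the course material.

```python
# Illustrative sketch of the counting formulas, using only the standard
# library (math.perm, math.comb, and math.prod require Python 3.8 or later).
# The helper names and the example numbers are assumptions for this sketch.
from math import comb, factorial, perm, prod


def basic_counting_rule(*possibilities):
    """m1 * m2 * ... * mr: total possibilities for r ordered actions."""
    return prod(possibilities)


def multinomial(*group_sizes):
    """m! / (m1! * m2! * ... * mk!) for groups of sizes m1, ..., mk."""
    m = sum(group_sizes)
    denominator = prod(factorial(size) for size in group_sizes)
    return factorial(m) // denominator


print(factorial(4))                  # 4! = 24, as in the text
print(basic_counting_rule(2, 3, 4))  # 2 * 3 * 4 = 24
print(perm(7, 3))                    # (7)_3 = 7!/4! = 210
print(comb(7, 3))                    # (7)_3 / 3! = 35
print(perm(6, 6) == factorial(6))    # special permutation rule: True
print(multinomial(2, 3, 4))          # 9!/(2! 3! 4!) = 1260
```

Note that `comb(7, 3)` equals `perm(7, 3) // factorial(3)`, which mirrors the relationship between combinations and permutations stated above.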
The symbol used in evaluating an ordered partition is called a multinomial coefficient. You may hear your instructor use both ordered partition and multinomial coefficient.

Example 1.16 3 people get into an elevator and choose to get off at one of the 10 remaining floors. Find the following probabilities:
• P(they all get off on different floors)
• P(they all get off on the 5th floor)
• P(they all get off on the same floor)
• P(exactly one of them gets off on the 5th floor)
• P(at LEAST one of them gets off on the 5th floor)

Example 1.17 Suppose we have the fictional word DALDERFARG.
• How many ways are there to arrange all of the letters?
• What is the probability that the 1st letter is the same as the 2nd letter?
• What is the probability that an arrangement of all of the letters has the 2 Ds next to each other?
• What is the probability that an arrangement of all of the letters has the 2 Ds next to each other and it has the 2 Rs grouped together (not necessarily the Ds and Rs next to each other)?
• What is the probability that an arrangement of all the letters has the 2 Ds before the F?

Example 1.18 Illinois license plates consist of 4 digits followed by 2 letters, whereas in Ohio, license plates start with 3 letters and end with 4 digits. Assume all letters are capitals (without loss of generality, or wlog).
• For each state, how many possible license plates are there?
• How many possible license plates are there for each state if no digit or letter is allowed to repeat?
• How many possible license plates are there if they must have at least 1 vowel?
• How many possible license plates are there if they must have at least one vowel or at least one 3?

Example 1.19 Using a standard 52 card deck:
• How many possible ways are there to get a 5 card poker hand?
• What is the probability of getting a pair (with the other 3 cards different denominations)?
• What is the probability of getting 2 pairs?
• What is the probability of getting a full house?
• What is the probability of getting a 3 of a kind (but not a full house)?
• What is the probability of getting a straight?
• What is the probability of getting a flush?

Example 1.20 In a simplified version of the lottery, you have 20 numbers and 5 different numbers are drawn. You pick 5 numbers ahead of time and wait to see how many of your numbers match those that were randomly drawn.
• What is the probability you get 4 correct?
• What is the probability you don’t get any correct?
• What is the probability you get exactly 2 correct given you got at least 1 correct?

Example 1.21 Suppose Krannert only allows 5 spaces for a password to Portals. Suppose further you are only allowed to use a number or a letter, but the system is not case sensitive.
• How many possible combinations are there?
• If you cannot have 9 in the first space, how many possible combinations are there?
• If you cannot have 9 in the first space, what is the probability that all 5 spaces are odd numbers?
• If you cannot repeat the same character, how many possible combinations are there?

Example 1.22 We are looking at the finals of the 100m dash in the Olympics. There are 8 contestants, all with different last names, who represent 6 countries total, 2 of which have 2 contestants each.
• How many ways are there for the contestants to finish if we look at their last names?
• How many ways are there for the contestants to finish if we look at their countries?
• If we are only interested in the medals, how many ways are there for them to be awarded if only the countries of the medalists matter?

Example 1.23 A snack pack of Skittles contains 20 candies, 5 of which are red, and 15 are either orange, green, yellow or purple.
Find the following probabilities:
• P(selecting 3 Skittles with replacement and getting all 3 red)
• P(selecting 3 Skittles with replacement and getting exactly ONE red)
• P(selecting 3 Skittles with replacement and getting at LEAST one red)
• P(selecting 3 Skittles without replacement and getting all 3 red)
• P(selecting 3 Skittles without replacement and getting exactly ONE red)
• P(selecting 3 Skittles without replacement and getting at LEAST one red)

Example 1.24 There are 4 different kinds of meat on a sandwich: Ham, Turkey, Roast Beef, Veggie. You can have either Swiss, American or Provolone cheese and have it on Rye, White or Wheat bread. Then you have the option of 12 additional condiments such as dressing, mayo, pickles, peppers, lettuce, tomatoes, etc. How many different sandwiches can be made?

Example 1.25 You have the 7 Harry Potter books, 4 Twilight books and 3 Hunger Games books.
• How many ways can the books be arranged on a shelf?
• What is the probability the first book is a Harry Potter book?
• What is the probability the first and last books are not Harry Potter books?
• What is the probability the books are grouped by series?
• What is the probability the Hunger Games books are grouped by series and in the correct sequence order?
• What is the probability the first and last books are from the same series?

Example 1.26 There are 5 women and 15 men, 4 of whom will be chosen to be in a group.
• What is the probability all 4 are women?
• What is the probability half are women?
• What is the probability there are more women than men?
• What is the probability there is at least one woman?

Example 1.27 Suppose you have a fridge full of Powerades: 6 green, 4 blue, 3 red, and 4 yellow (otherwise identical except for color).
• Suppose you grab 4 Powerades from the fridge. What is the probability that they are the same color?
• How many distinct ways can you arrange all of the Powerades in the fridge?
• How many distinct ways can you arrange all of the Powerades so that all bottles of the same color are next to each other?

Example 1.28 A system composed of n separate components is said to be a parallel system if it functions when at least one of the components functions. Suppose the following systems function if current flows from A to B. If each switch (break in the line) is activated independently with probability p = 0.3, what is the probability the system functions?

[Figure: two circuit diagrams running from A to B; the first contains switches 1, 2, 3, and 4, and the second contains switches 1, 2, and 3]

Example 1.29 The U.S. Senate consists of 100 senators, 2 from each of the 50 states. They want to form a committee, where each member has an equal role, consisting of 5 senators.
• How many different committees are possible (without any restrictions)?
• How many different committees are possible if no state can have more than 1 senator on the committee?

1.6 Conditional Probability, Independence, and Bayes’ Rule

Let A and B be events. The probability that event B occurs given (knowing) that event A occurs is called a conditional probability. It is denoted as P(B | A). Whichever event is considered “given” or “known” goes after the | in the notation.

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)}.$$

The above formula works so long as P(A) > 0. There is an equivalent, within the equally likely framework, to the above formula. It is

$$P(B \mid A) = \frac{N(A \cap B)}{N(A)}.$$

The idea behind conditional probability is that you have an idea of what occurred, but do not know exactly what happened. This means you can limit the original sample space (Ω) to something smaller. In our above example, we know that the event A occurred, so what we are doing is making A our “new” Ω.

The general multiplication rule is defined as P(A ∩ B) = P(A) ∗ P(B | A). This formula is equivalent to the 2 above; only our goal is different now. Before, we wanted to figure out a conditional probability. Now we want to know a joint probability, or a probability of an intersection of 2 events.
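As a concrete check of the conditional probability formulas and the general multiplication rule, the following is a minimal Python sketch in the equally likely framework. The experiment (two rolls of a fair six-sided die), the events A and B, and the helper names `prob` and `conditional_prob` are assumptions chosen for illustration, not part of the course material.

```python
# Illustrative sketch: conditional probability in the equally likely framework.
# The experiment (two rolls of a fair die) and the events A and B are
# assumptions chosen for this example.
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))            # 36 equally likely outcomes
A = {(r1, r2) for (r1, r2) in omega if r1 % 2 == 0}    # first roll is even
B = {(r1, r2) for (r1, r2) in omega if r1 + r2 == 8}   # the two rolls sum to 8


def prob(event, sample_space=omega):
    """P(E) = N(E) / N(Omega) for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


def conditional_prob(b, a):
    """P(B | A) = N(A and B) / N(A); requires N(A) > 0."""
    if not a:
        raise ValueError("conditioning event must have positive probability")
    return Fraction(len(a & b), len(a))


p_b_given_a = conditional_prob(B, A)
print(p_b_given_a)                           # 1/6, from the counting form
print(prob(A & B) / prob(A))                 # 1/6, from the probability form
print(prob(A) * p_b_given_a == prob(A & B))  # general multiplication rule: True
```

Conditioning on A shrinks the sample space from 36 outcomes to the 18 outcomes in A, of which 3 have a sum of 8.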
This rule can easily be extended to more than 2 events.

$$P\left(\bigcap_{i=1}^{n} A_i\right) = P(A_1) \ast P(A_2 \mid A_1) \ast P(A_3 \mid A_2 \cap A_1) \ast \cdots \ast P(A_n \mid A_{n-1} \cap \cdots \cap A_1).$$

Important note: A lot of the formulas in this section are rearrangements of previous formulas. You use one over another depending on what you are given in the problem and what the goal is.

It is important to define 2 types of sampling. Suppose for the sake of argument we are looking at the integers 1, 2, ..., 10. We want to choose 3 of these numbers, so we have 3 selections. If you were asked how many ways this could happen, it would depend on whether sampling were done with or without replacement.

Sampling with replacement means any element of the sample space can be chosen for any selection regardless of whether or not it was previously picked. The idea is that no matter how many selections (or trials) there are, after each selection (or trial), you record the outcome, then put that element back in the population, so that it can be sampled again. In this example, you could pick the number 1 three straight times if sampling were done with replacement. This would be unlikely, but possible.

Sampling without replacement means any element of the sample space can be chosen at most once. That is, once you pick an element on a certain selection (or trial), you can never pick that element again. After you make your selection and record the element, you do not put that element back in the population to be chosen again. Once it has been selected, it is no longer a choice for any subsequent selections.

Let us go back to our integer example. How many different samples are possible? If sampling is done with replacement, we have 10 choices for the first selection. Since we replace our selection before picking again, we still have 10 possibilities for the second selection. Similarly, we have 10 options for the last selection.
Therefore, we have 10 ∗ 10 ∗ 10 = 1,000 different possible samples. Suppose instead we sampled without replacement. We would still have 10 choices for the first selection. However, we do not put that element back in the sample space. So, we only have 9 available options for our second pick. Additionally, we would only have 8 choices for our last selection, since we could not use either of our first 2 choices again. In total, we would have 10 ∗ 9 ∗ 8 = 720 different possible samples.

Example 1.30 Refer to Example 1.15 with Drew. Find the following probabilities:
• What is the probability that Drew drinks something weird, if we know he was paid?
• What is the probability that Drew does something silly, if we know he was paid?
• What is the probability that Drew drinks something weird, if we know he was not paid?

Example 1.31 (ASW Chapter 4.4, Problem 38) A Morgan Stanley Consumer Research Survey sampled men and women and asked each whether they preferred to drink plain bottled water or a sports drink such as Gatorade or Propel Fitness water (The Atlanta Journal-Constitution, December 28, 2005). Suppose 200 men and 200 women participated in the study, and 280 reported they preferred plain bottled water. Of the group preferring a sports drink, 80 were men and 40 were women. Let
• M = the event the consumer is a man
• W = the event the consumer is a woman
• B = the event the consumer preferred plain bottled water
• S = the event the consumer preferred a sports drink
Answer the following:
• What is the probability a person in the study preferred plain bottled water, or P(B)?
• What is the probability a person in the study preferred a sports drink, or P(S)?
• What is the probability that a person who prefers a sports drink is a man, or P(M | S)? What is the probability that a person who prefers a sports drink is a woman, or P(W | S)?
• What is the probability a person is male and prefers a sports drink, or P(M ∩ S)?
What is the probability a person is female and prefers a sports drink, or P(W ∩ S)?
• Given a consumer is a man, what is the probability he will prefer a sports drink, or P(S | M)?

Example 1.32 Using the Venn diagram summarizing the distribution of operating systems (Example 1.13), calculate the following:
• The probability that a randomly chosen student uses all three operating systems, given the student uses Windows.
• The probability that a randomly chosen student uses all three operating systems, given the student does not use Windows.
• The probability that a randomly chosen student uses Windows, given the student uses Mac OS.
• The probability that a randomly chosen student does not use any of the operating systems, given the student does not use Windows.

Example 1.33 Case Problem (Adapted from ASW Chapter 9, Case Problem 2, page 397) Cheating has been a concern of the dean of the College of Business at Bayview University for several years. Some faculty members in the college believe that cheating is more widespread at Bayview than at other universities, while other faculty members think that cheating is not a major problem in the college. To resolve some of these issues, the dean commissioned a study to assess the current ethical behavior of the business students at Bayview. As a part of this study, an anonymous exit survey was administered to this year’s graduating class. Responses to the following questions were used to obtain data regarding three types of cheating. Any student who answered “Yes” to one or more of these questions was considered to have been involved in some type of cheating.
• During your time at Bayview, did you ever present work copied off the Internet as your own?
• During your time at Bayview, did you ever copy answers off another student’s exam?
• During your time at Bayview, did you ever collaborate with other students on projects that were supposed to be completed individually?
The data are represented in the following Venn diagrams:

[Figure: three Venn diagrams titled MALES, FEMALES, and OVERALL, each with three circles labeled “Copied off the Internet”, “Copied off an exam”, and “Collaborated on individual projects”]

• Using the law of partitions, fill in the “Overall” Venn diagram.
• What is the probability that a randomly chosen student was involved in some type of cheating? Use the inclusion-exclusion principle, then the idea of complements. Which is simpler?
• Given that a randomly chosen student cheated, what is the probability that student was male?
• Given that a randomly chosen student is female, what is the probability that student cheated?
• What is the probability that a randomly chosen student neither presented work from the Internet nor copied answers off another student’s exam?
• What is the probability that a randomly chosen student cheated in all three ways, given that the student copied answers off another student’s exam?

Example 1.34 Suppose the Queen of Statlandia does not have hemophilia, but may be a carrier of the hemophilia gene. If she is a carrier, any children she has will have a 50% chance of having hemophilia (independently). If she is not a carrier, her children will not have hemophilia. Since genetic testing is forbidden in Statlandia, the castle physician’s best estimate of the probability the Queen is a carrier was initially P(carrier) = 0.5. Suppose the Queen has a son, and the son does not have hemophilia. Should the castle physician’s estimate of P(carrier) change? Why? If yes, to what? Now suppose the Queen has had three sons (none of whom has hemophilia) and would like another child. What should the castle physician’s best estimate be for the probability the 4th child has hemophilia?

In general, a conditional probability will change the original probability.
This change may be an increase or a decrease. However, it could stay the same. When the conditional probability is the same as the unconditional probability, the events are said to be independent. Formally, let A and B be events. Let P(A) > 0. B is independent of A if the occurrence of A does not affect the probability that event B occurs, i.e. P(B | A) = P(B).

The special multiplication rule restates the general multiplication rule, but for independent events. If A is independent of B, then P(A ∩ B) = P(A) ∗ P(B). Use the general multiplication rule to provide a proof of this statement. Also, the independence of the events A and B implies that the following are independent:
1. A^C and B
2. A and B^C
3. A^C and B^C
It would be a good exercise to prove these on your own.

For pairwise independence let us look at the events A1, A2, ..., AN. These events are pairwise independent if for every pair of events from the collection, those 2 events are independent of each other. Please note that this does not mean that if you take 3 or more of these events, they are independent. That deals with mutual independence.

Again, consider the events A1, A2, ..., AN. They are said to be mutually independent if for each subcollection of events, the subcollection satisfies the special multiplication rule. That is, for each integer n, where 2 ≤ n ≤ N,

$$P(A_{k_1} \cap A_{k_2} \cap \cdots \cap A_{k_n}) = P(A_{k_1}) \ast P(A_{k_2}) \ast \cdots \ast P(A_{k_n}),$$

where k1, k2, ..., kn are distinct integers between 1 and N. Mutual independence implies pairwise independence, but not the other way around.

Example 1.35 A man and a woman each have a standard deck of 52 cards. Each draws a card at random from his/her deck.
• Find the probability the man draws the ace of clubs, the woman draws the ace of clubs, and that they both draw the ace of clubs. Are the 2 events independent? Please explain why or why not.
• Suppose that 2 people share 1 deck. They each draw from the deck and keep their card.
Find the probability the first person gets the king of hearts, the second person gets the king of hearts, and they both get the king of hearts. Are these events independent? If not, what other statistical term represents these two events?
• A person randomly draws from a deck of cards. Let A be the event of a heart, B be the event of a face card, C be the event of a 7 or Jack. Are the events A and B independent? What about A and C? B and C? A, B, and C? Prove your answers mathematically.

Example 1.36 Insurance companies assume that there is a relationship between gender and your likelihood of getting into an accident, which is why women generally have lower insurance rates than men. We did a study to see the number of accidents that occurred according to gender. We found that 60% of the population was male, 86% of the population was either male or got into an accident, and 35% of the population are accident free. Does this study indicate that the likelihood of getting into an accident depends on gender? Prove your answer.

Example 1.37 Chris and his roommates each have a car. Julia’s Mercedes SLK works with probability .98, Alex’s Mercielago Diablo works with probability .91, and Chris’ 1987 GMC Jimmy works with probability .24. Assume all cars work independently of one another. What is the probability that at least 1 car works?

Law of Partitions: Suppose $A_1, A_2, \ldots, A_N$ form a partition of the sample space. Then for every event B in the sample space,

$P(B) = P(B \cap A_1) + \cdots + P(B \cap A_N)$.

Furthermore, the law of total probability restates this as

$P(B) = \sum_{i=1}^{N} P(A_i) * P(B \mid A_i)$.

A very useful example of this is when you have the simple partition of an event (here we will use E) and its complement. Then,

$P(B) = P(E) * P(B \mid E) + P(E^C) * P(B \mid E^C)$.

Refer to Example 1.37. What is the probability that exactly 1 car works?

Example 1.38 Acme Consumer Goods sells three brands of computers: Mac, Dell, and HP.
30% of the machines they sell are Mac, 50% are Dell, and 20% are HP. Based on past experience Acme executives know that the purchasers of Mac machines will need service repairs with probability .2, Dell machines with probability .15, and HP machines with probability .25. Find the probability a customer will need service repairs on the computer they purchased from Acme.

Example 1.39 Let us assume that a specific disease is only present in 5 out of every 1,000 people. Suppose that the test for the disease is accurate 99% of the time a person has the disease and 95% of the time that a person lacks the disease. Find the probability that a random person will test positive for this disease.

Example 1.40 Polya’s Urn Scheme: An urn contains b black balls and r red balls. One ball is selected at random, its color is recorded, and then it as well as c balls of the same color are put back in the urn. This process is repeated. Find the probability that the first 2 balls selected are black and the third ball chosen is red.

Example 1.41 Suppose at a given university the following statements are true. 15% of females are in sororities and 18-20% of males are in fraternities. The campus paper uses this information to claim that 33-35% of campus is “greek”. Is this correct? If your answer is no, what is wrong with it and how would you fix it?

Example 1.42 A grade school boy has 5 blue and 4 white marbles in his left pocket and 4 blue and 5 white marbles in his right pocket. If he transfers one marble at random from his left pocket to his right pocket, what is the probability of his then drawing a blue marble from his right pocket?

Bayes’ Rule is used in order to revise probabilities in accordance with newly acquired information.

Bayes’ Rule: Let $A_1, A_2, \ldots, A_N$ form a partition of the sample space. Then for every event B in the sample space,

$P(A_j \mid B) = \dfrac{P(A_j) * P(B \mid A_j)}{\sum_{i=1}^{N} P(A_i) * P(B \mid A_i)}$.

This is useful when you do not [directly] know the probability of event B, but you know the probability of B given the events $A_1, A_2, \ldots, A_N$. Let us revisit our disease example above (Example 1.39). Suppose we are interested in what the probability of having the disease was given that the test was positive. Writing D for having the disease and O for a positive test, we now have the following:

$P(D \mid O) = \dfrac{P(D) * P(O \mid D)}{P(D) * P(O \mid D) + P(D^C) * P(O \mid D^C)}$.

This is more often what we are concerned with in this problem. We are concerned with the idea of having the disease (or sometimes of being pregnant) given that the test was positive. This formula takes into account the probabilities of testing positive because the disease is present and the probability of a false positive.

Let us revisit Example 1.39. What is the probability that the person has the disease given that they tested positive?

Refer back to Chris’ car example, Example 1.37. What is the probability that Julia’s car works, given only 1 car works?

Example 1.43 There was an old television show called Let’s Make a Deal, whose original host was named Monty Hall. The set-up is as follows. You are on a game show and you are given the choice of three doors. Behind one door is a car, behind the others are goats. You pick a door, and the host, who knows what is behind the doors, opens another door (not your pick) which has a goat behind it. Then he asks you if you want to change your original pick. The question we ask you is, “Is it to your advantage to switch your choice?”

Example 1.44 Let us roll 2 dice, a hunter green die and a cardinal red die. Let A be the event that the hunter green die is odd. Let B be the event that the cardinal red die is odd. Let C be the event that the sum of the dice is odd. Prove that these events are pairwise independent but not mutually independent.

Example 1.45 After the first exam, a student will go to the beach (event B) depending on if they pass the exam (event A). The probability a student will pass is .9.
If a student passes, they go to the beach with a probability of .8. However, a student who fails the exam will only go to the beach with a probability of .4. A student passes the exam with probability .7. What is the probability that a student at the beach passed their test? What is the probability that a student not at the beach failed the test?

Example 1.46 Suppose you are in MGMT 614. The class is divided into 2 groups and asked to manage a portfolio through Yahoo! Finance. On any given day, group 1 has an 85% chance of increasing their net worth while group 2 has a 75% chance of increasing their net worth. Assume that they had a decrease if they did not have an increase. Suppose 40% of the class is in group 1. If the teacher picks a student at random to report their portfolio change (from the previous day), what is the probability they report an increase? What is the probability that they are from group 2 knowing that they reported a decrease?

Example 1.47 During a tennis match, a player served 75 times. He either aimed at the corner or middle of the court. 60% of the serves were aimed at the corner. Of the serves aimed at the middle of the court, 46.6% were faults (i.e. good^c, the complement of a good serve). Of the serves aimed at the corner of the court, 28.8% were faults.
• What percent of serves were good?
• What percent of serves were faults?
• Of the good serves, what is the probability that it was aimed at the corner?
• Of the faults, what is the probability it was aimed at the middle of the court?

Example 1.48 You are playing a game. You get to pick 1 bill out of one of 2 bags. You roll a fair 6-sided die twice. If the sum is an 8, 9, or 10, you pick from bag B. Otherwise, you pick from bag A. 80% of the bills in bag A are $5. 72% of the bills in bag B are $5. All the bills are either $5 or $10.
• What is the probability that you get a $5 bill?
• What is the probability you picked from bag A knowing that you picked a $5 bill?
• What is the probability you picked from bag B knowing that you picked a $10 bill?
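
The law of total probability and Bayes’ Rule stated above lend themselves to a quick numerical check. The following is a minimal sketch, not part of the original notes, using the numbers from Example 1.39: the disease is present in 5 of every 1,000 people, the test is positive 99% of the time when the disease is present, and (since it is accurate 95% of the time otherwise) positive 5% of the time when the disease is absent. The function name `bayes_posterior` and its argument names are illustrative assumptions.

```python
def bayes_posterior(priors, likelihoods):
    """Return (P(B), [P(A_j | B) for each j]) for a partition A_1..A_N.

    priors[j]      = P(A_j); these must sum to 1 because the A_j form a partition.
    likelihoods[j] = P(B | A_j).
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Law of total probability: P(B) = sum_i P(A_i) * P(B | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    # Bayes' Rule: P(A_j | B) = P(A_j) * P(B | A_j) / P(B)
    posteriors = [j / total for j in joint]
    return total, posteriors


# Example 1.39: A_1 = has the disease, A_2 = does not; B = test is positive.
p_positive, posterior = bayes_posterior([0.005, 0.995], [0.99, 0.05])
print(f"P(positive) = {p_positive:.4f}")
print(f"P(disease | positive) = {posterior[0]:.4f}")
```

Under these assumptions a positive test still leaves the probability of disease below 10%, because false positives among the many people without the disease outnumber the true positives.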
Example 1.49 An urn originally contains 8 red balls and 2 blue balls. You flip a fair coin 3 times. For each head you get, the prizemaster adds 2 more blue balls to the urn. When you are done flipping the coin, you pull 1 ball from the urn. If you get a blue ball you win a vacation.
• What is the probability that you do not go on vacation?
• Given that you went on vacation, find the probabilities of 0, 1, 2, and 3 heads (separately).
• Given that you did not go on vacation, find the probabilities of 0, 1, 2, and 3 heads (separately).

Example 1.50 Glen and Jiabai are going to Indianapolis this weekend. They are twice as likely to go on Sunday as they are on Friday. They are three times as likely to go on Saturday as they are on Friday. There is a 45% chance of snow on Friday, 25% chance of snow on Saturday, and 30% chance of snow on Sunday.
• What is the probability that it snows while Glen and Jiabai are in Indianapolis?
• Given that it did not snow, what is the probability that they went on Friday? Saturday? Sunday?

2 Discrete Random Variables

2.1 General Discrete Random Variables

A variable denotes a characteristic that varies from one person or thing to another. Examples include height, weight, marital status, gender, etc. Variables can be either quantitative (numerical) or qualitative (categorical). We use many terms when describing variables, including frequency and relative frequency. These terms mean “count” and “percent of count written as a decimal” respectively.

Example 2.1 The following is a chart describing the number of siblings each student in a particular class has. Note there are 40 students total.

Siblings (x)              0      1      2      3      4
Frequency of Students     8      17     11     3      1
Relative Frequency        .200   .425   .275   .075   .025
Percentage of Students    20.0   42.5   27.5   7.5    2.5

A random variable is a real-valued function whose domain is the sample space of a random experiment.
In other words, a random variable is a function X : Ω → R where Ω is the sample space of the random experiment under consideration and R represents the set of all real numbers. (You can think of a random variable as a way of assigning a number to each outcome of an experiment.) From the above example, the event that the student randomly drawn from the class has 2 siblings can be expressed in several ways. Way 1 is {ω ∈ Ω : X(ω) = 2}. Or the shorthand way is to say {X = 2}. The probability of this event is 11/40 or .275. We could define the event A as the event that a student has 2 or more siblings. P(X ∈ A) is what?

There are two main types of quantitative random variables: discrete and continuous. A discrete random variable often involves a count of something. Examples may include number of cars per household, number of hours spent studying for a test, number of hours spent watching t.v. per day, etc.

A random variable X is called a discrete random variable if the outcome of the random variable is limited to a countable set of real numbers (meaning the r.v. can only take on so many real values). Mathematically, we have a countable set K (of real numbers) s.t. P(X ∈ K) = 1.

Another key word for r.v.s is support. The word support means the possible values a r.v. can take. Any r.v. with a countable support (that is, one whose possible values form a finite or countably infinite set) is a discrete r.v. Another way of stating this is to say that all of the probability for a discrete r.v. occurs at particular points. These points (or numbers) could be 1, 100, .5, -22, -1/11. There is no stipulation that a r.v.’s support must be positive or an integer. Random variables (depending on the context) can take on really any value from R.

Let X be a discrete r.v. Then the probability mass function (pmf) of X is the real-valued function defined on R by $p_X(x) = P(X = x)$. An important note is that capital letters, like X, are used to denote r.v.s.
Lowercase letters, like x, are used to denote possible values of the random variable. This distinction will be used throughout this course as well as in most Statistics courses. The subscript in the $p_X(x)$ notation is used to denote that this is the pmf of the r.v. X. We could use Y, Z, etc. If it is obvious what variable we are referencing, the subscript is often dropped. The x in parentheses refers to the value of the r.v. that we are interested in.

Example 2.2 Flip a coin 3 times. Let X denote the number of heads tossed in the 3 flips. Create a pmf for X, assuming the following:
• The coin is fair.
• P(heads on 1 flip) = 0.7.
• Suppose we used 10 flips, with P(heads on 1 flip) = 0.7.
  – How many outcomes are there?
  – What is the probability of 7 heads?
  – What is the probability you get at least one head?

Example 2.3 This is problem 7 from the Fall 2010 Stat 225 Exam 2. There are 3 guys and 2 girls sitting in a row of 5 seats at the Wabash Landing 9. Let G be the number of girls sitting at the ends [of the row]. First, find the pmf of G. Secondly, suppose the following information is true. A person will only order popcorn during the movie if they are sitting at the end of the row. A guy will order 2 boxes, but a girl will order 1 box. Let C denote the number of boxes of popcorn the 5 friends will order. Find the pmf of C.

Example 2.4 Refer to Example 2.1. Let M be the amount of money that parents spend on college. Let M = 30,000(X+1) + 2,000. Find the pmf for M.

Basic Properties of a PMF:
• $0 \le p_X(x) \le 1$ for all $x \in R$. That is to say a pmf is a nonnegative function and it cannot be bigger than 1 at any point.
• $\{x \in R : p_X(x) \ne 0\}$ is countable. That is, the set of real numbers for which a pmf is nonzero is countable.
• $\sum_x p_X(x) = 1$. The sum of the values of a pmf equals 1. This is just another way to say P(Ω) = 1.

Suppose that X is a discrete r.v. Then, for any subset A of real numbers, $P(X \in A) = \sum_{x \in A} p_X(x)$.
This states that the probability a discrete r.v. takes a value from a specified subset of real numbers is just the sum of the pmf of the r.v. over that subset of real numbers.

Interpretation of a pmf
In a large number of independent observations of a discrete r.v. X, the proportion of times that each possible value occurs will approximate the pmf at that value. This is the frequentist viewpoint.

Example 2.5 Let X be a random variable with pmf defined as follows: $p_X(x) = k * (5 - x)$ for x = 0, 1, 2, 3, and 4. However, $p_X(x) = 0$ for all other possible values of X.
• Find the value of k that makes $p_X(x)$ a legitimate pmf.
• What is the probability that X is between 1 and 3 inclusive?
• If X is not 0, what is the probability that X is less than 3?

Interpretation of an expected value
Classic probability asserts that the expected value of a r.v. is the long-run average value of the r.v. in independent observations.

The expected value of a discrete r.v. X, denoted by E[X], is defined by

$E[X] = \sum_x x * p_X(x)$.

In other words, the expected value of a discrete r.v. is a weighted average of its possible values, and the weight used is its probability. Sometimes we refer to the expected value as the expectation, the mean, or the first moment. Sometimes it is denoted by $\mu_X$. For any function, say g(x), we can also find an expectation of that function. It is

$E[g(X)] = \sum_x g(x) * p_X(x)$.

An expectation we are often interested in is $E[X^2]$. So, using the above formula, how could we write this?

Expectation of r.v.s has some nice properties that can be quite useful computationally. Let X and Y be independent, discrete r.v.s defined on the same sample space and having finite expectation (meaning < ∞). Let a and b be real numbers. Then the following hold:
• The r.v. X + Y has finite expectation and E[X + Y] = E[X] + E[Y].
• The r.v. aX + b has finite expectation and E[aX + b] = a ∗ E[X] + b.

The variance of a r.v. is a measure of the spread, or variability, in the r.v.
The conceptual definition of variance is $\mathrm{Var}(X) = E[(X - \mu_X)^2]$. Basically, this states that variance is the expected squared deviation of a r.v. from its mean. You could combine this with the E[g(X)] formula to calculate variance. However, there is another way. We can also define variance by $\mathrm{Var}(X) = E[X^2] - (E[X])^2$. This is typically more useful, mainly because we are often interested in E[X] so there is just one more calculation in order to find Var(X).

There are 2 useful properties of variance of a r.v. Let X be a r.v. and let c be a constant.
• $\mathrm{Var}(cX) = c^2 * \mathrm{Var}(X)$
• $\mathrm{Var}(X + c) = \mathrm{Var}(X)$

Examples: Refer to Example 2.1, Example 2.2, Example 2.3. Calculate the expectation and variance of those variables. Please note the properties of expectation and variance. These could save you some time. As a check, E[X] and Var(X) for Example 2.1 are 1.3 and .91 respectively.

Example 2.6 How many licks does it take to get to the center of a tootsie roll pop? You have the following distribution representing your population. Calculate E[X] and Var(X).

animal         owl     thing 1   thing 2    silly person
licks          3       100       200        427
probability    .001    .55       .448999    .000001

Example 2.7 How much wood could a woodchuck chuck if a woodchuck could chuck wood? We have the following distribution measured in butt cords. Calculate E[X] and Var(X).

family member     younger brother   older sister   mom    dad
amount of wood    153               272            573    1245
probability       .15               .2             .23    .42

Example 2.8 Peter Piper picked a peck of pickled peppers. If Peter Piper could pick the following number of pecks of peppers in a day, what is the expected value and variance of the number of pecks of pickled peppers that Peter Piper could pick in a day? Every week Peter goes to the market on Saturday and sells all of his pecks of peppers. He does not pick peppers on Saturday. If he gets $ .35 for a peck of peppers, what is the expected value and variance of the amount of money he will earn?
# of Pecks     20     50     120    175    200
probability    .01    .25    .35    .2     .19

Example 2.9 Sally sells seashells by the seashore. Suppose on a given day she sells 1-5 shells with respective probabilities .25, .15, .3, .2, and .1. If each shell sells for $2, how much money can Sally expect to earn in a day?

Example 2.10 The pmf of a discrete r.v. X is described below.

x          -2     -1     0      1      2      3
p_X(x)     .22    .29    .04    .19    .11    .15

• What is the probability that X is between -.8 and 2.2?
• Given X is at least 0, what is the probability that it is at least 1?
• Find E[X] and Var(X).
• Let Y = 2X - 1. Find the pmf of Y.
• Let $Z = X^2$. Find the pmf of Z.
• What is special about Y compared to Z that makes finding the pmf of Y easier than finding the pmf of Z? Does the linearity property of expectation hold for both Y and Z?

For a general expectation of a random variable, you can refer to the formula:

$E[g(X)] = \sum_x g(x) * p(x)$.

As an example, this would mean that

$E[|X - 3|] = \sum_x |x - 3| * p(x)$.

Instead of using this general formula, you could also create a new random variable and its pmf. You could let Y be the function of X that you desire.

Example 2.11 Refer to Example 2.9 (Sally and her seashells). Let Sally’s cost function be Y = .4|X − 1.5|. Use this information and the formula previously presented to calculate E[Y] and Var(Y). Next construct the pmf for Y and redo your calculations using the regular formulas for Expectation and Variance of an r.v.

Example 2.12 Let X be a r.v. Let $p_X(x) = .1|x - 2|$ for x = -2, -1, 0, and 1 and be 0 otherwise. Let $Y = X^2$. Find E[Y] and Var(Y).

Example 2.13 Let X be a r.v. that takes the two values {-1, 1}. However you do not know the pmf. Let E[X] = Θ.
• Find a formula for Var(X) written in terms of Θ.
• Verify that your above formula makes sense for when Θ = -1 or for when Θ = 1.
• What value of Θ maximizes Var(X)? Let p be P(X = 1). What value of p maximizes Var(X)?

Example 2.14 Suppose X and Y are random variables with E(X) = 3, E(Y) = 4 and Var(X) = 2.
Find:
• $E(2X + 1)$
• $E(X - Y)$
• $E(X^2)$
• $E(X^2 - 4)$
• $E((X - 4)^2)$
• $\mathrm{Var}(2X - 4)$

2.2 Bernoulli and Binomial Random Variables

Many problems in probability involve independently repeating a random experiment and observing at each repetition whether a specified event occurs. We label the occurrence of the specified event a success and the nonoccurrence of the specified event a failure. A success could be a female child, a head from a coin flip, a 5 on a die, a defective part in a manufacturing warehouse, a green spin in roulette, etc. A success can take on a positive or negative connotation in the context of an example; it is merely the event that we are interested in.

Each repetition of the random experiment is called a trial. We use p to denote the probability of a success on 1 trial. In Bernoulli Trials, p remains constant from trial to trial.

Conditions for Bernoulli:
• The trials are independent of one another.
• The result of each trial is classified as a success or failure, depending on whether or not a specified event occurs.
• The success probability, and therefore the failure probability, remains the same from trial to trial.

An important note: Say that we want to extract a sample of size n one-by-one from a larger population, and see how many successes we get. If we sample with replacement, each individual draw is Bernoulli and all n draws are independent of each other; hence, the number of successes is Binomial. However, if we sample without replacement, the n draws are no longer independent; the distribution of number of successes is no longer Binomial.

Sometimes the Bernoulli Distribution is called an indicator function, i.e. it lets one know whether or not a specific event has occurred.

Characteristics of the Bernoulli Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

We can define the Binomial R.V.
as the number of successes in n independent trials, where the probability of success in one trial is p.

Characteristics of the Binomial Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

There are several approximations in this course. All 3 of them involve the Binomial in some way. These will be written in later on where appropriate. However, I give a quick summary here. We can use the Binomial to approximate the Hypergeometric if N > 20n. We can use the Poisson to approximate the Binomial if n > 100 and p < .01. We can use the Normal to approximate the Binomial if np > 5 and n(1-p) > 5.

Example 2.15 In Chris’ Stat 225 class, 75% of the students passed (got a C or better) on Exam 1. Suppose we pick a student at random and ask them whether or not they passed. Let X represent the number of student(s) who passed.
• What type of random variable is this? How do you know? Additionally, write down the pmf, the expected value, and the variance for X.
• Repeat under the following assumption: we pick 10 students with replacement and let X be the number of student(s) who passed.

Example 2.16 Suppose that 95% of consumers can recognize Coke in a blind taste test. Assume consumers are independent of one another. The company randomly selects 4 consumers for a taste test. Let X be the number of consumers who recognize Coke.
• Write out the pmf table for X.
• What is the probability that X is at least 1?
• What is the probability that X is at most 3?

Example 2.17 To test for ESP, we have 4 cards. They will be shuffled and one randomly selected each time, and you are to guess which card is selected. This is repeated 10 times. You do not have ESP. Let R be the number of times you guess a card correctly. What are the distribution and parameter(s) of R? What is the expected value of R?
Furthermore, suppose that you get certified as having ESP if you score at least an 8 on the test. What is the probability that you get certified as having ESP?

2.3 Hypergeometric Random Variables

Important applications are quality control and statistical estimation of population proportions. The hypergeometric r.v. is the equivalent of a Binomial r.v. except that sampling is done without replacement, or put another way, the trials are dependent (no longer Bernoulli trials).

As an illustration, let us revisit a poker example. Assume we have a standard 52 card deck and we are drawing five cards without replacement. Let us use our counting rules to determine the probability of 3 kings. For the sake of this problem, we are going to assume we do not care what the remaining two cards are, just that they are not kings. The answer to this problem involves combinations since we are sampling without replacement, and the sampling order does not matter (because we only care about which cards we received, not in what order we received them). So, you have to answer 3 questions. How many ways are there to get 3 kings? How many ways are there to get the remaining 2 cards? How many ways are there total to get a 5 card hand? Put these all together for the answer of

$\dfrac{\binom{4}{3} * \binom{48}{2}}{\binom{52}{5}}$.

Little did you know, you just used the hypergeometric distribution.

Characteristics of the Hypergeometric Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

What is the difference between a Binomial r.v. and a Hypergeometric r.v.? Hint: Do NOT say N.

Approximation. If X ∼ Hyp(N, n, p) and N > 20n, then we can approximate the probability of X by using X* ∼ Bin(n, p) (the same n and p).

Example 2.18 There are 100 identical looking 52” TVs at Best Buy in Costa Mesa, California. Let 10 of them be defective. Suppose we want to buy 8 of the aforementioned TVs (at random).
What is the probability that we don’t get any defective TVs?

Example 2.19 An experiment consists of shuffling a standard deck of 52 cards and then dealing a 10 card hand. Let Y denote the number of hearts in the hand.
• Identify the distribution of Y and give its parameter(s). Find the probability that Y is 3.
• Suppose instead of using 1 deck, we mix together 1,000 decks. The cards are shuffled and 10 are dealt into a hand. Again, let Y denote the number of hearts in the hand. Is an approximate distribution appropriate for Y? Why or why not? Find the probability that Y is 3 (if an approximation is appropriate, use that instead of the exact distribution). If you used an approximation, what is the distribution and the value of its parameter(s)?

Example 2.20 Jacob is shooting a basketball at a carnival in order to win a stuffed animal for his girlfriend. On a single shot, Jacob can make a basket with probability .65. Jacob will win a small prize if he makes at least 2 out of 3 shots. Jacob pays $4 for three shots.
• What is the probability that Jacob will win a small prize with his first $4? What distribution and what parameter(s) are you using?
• What is the probability it takes Jacob $20 to win his first small prize?

2.4 Poisson Random Variables

An important fact from Calculus is:

$e^t = \sum_{n=0}^{\infty} \dfrac{t^n}{n!}$.

This fact will allow one to show that the pmf for a Poisson indeed sums to one for any value of λ. The Poisson r.v. also measures number of successes (like the 3 preceding named discrete r.v.s). However, it is different from the others in the fact that it does not have a sample size (or depending on perspective, you can take the sample size to be infinite). While our 3 previous r.v.s measure number of successes in a certain number of trials, the Poisson r.v. measures number of successes per [blank]. This [blank] can be something like hours, cookies, area, volume, etc.
Examples in the past have included: number of chocolate chips in a cookie (or batch of cookies), number of buses per hour, number of silver loop buses per hour, number of defects per square foot, etc.

Characteristics of the Poisson Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

Approximation: If X ∼ Bin(n, p) where n > 100 and p < .01, then X can be approximated by X* ∼ Poisson(λ = np).

Example 2.21 Let us say a certain disease has a probability of occurring in 7 out of 5,000 people. Let us sample 1,000 people. Find the exact and approximate probabilities that 0 people have the disease and at most 5 people have the disease.

Example 2.22 Suppose earthquakes occur in the western US with a rate of 2 per week. Let X be the number of earthquakes in the western US this week. Let Y be the number of earthquakes in the western US this month (assume a 4 week period of time). Find the probability that X is 3 and Y is 12. Let Z be the number of weeks in a 4 week period that have exactly 3 earthquakes in the western US. Find the probability that Z is 4. Is this the same as the probability that Y is 12? Does this make sense?

Example 2.23 A store has 50 light bulbs for sale. Of these, 5 are black lights. A customer buys eight light bulbs randomly chosen from the store. Let B denote the number of black light bulbs the customer selected. Define the distribution of B. What is the probability that B is 1? What is the probability the customer gets at least one black light bulb?

Example 2.24 PRP has on average 4 telephone calls per minute. Let X be the number of phone calls in the next minute. Find the probability that X is at least 3.

Example 2.25 Customers arrive at the VP on 9th Street at a rate of 10 per hour. Let Y be the number of customers that arrive in the first 3 hours. What is the distribution of Y?
What is the probability that exactly 12 customers arrive in each of the first 3 hours? What is the probability that Y is 36?

Example 2.26 You are interested in the Indianapolis Indians. They play 20 games in the month of August. Of their games, they win 10% of them by 2 runs or fewer. Assume each game is independent of any other game. Let G be the number of August games won by the Indians by 2 or fewer runs.
• What is the distribution and parameter(s) of G?
• What is the probability that G is either 2 or 3?
• If the Indians win 4 or more games by 2 or fewer runs in August, they will receive $20,000 bonuses. What is the probability the players receive bonuses?
• Given the players do not receive bonuses, what is the probability that they win exactly 3 games by 2 runs or fewer?
• What is the expectation of G?
• What is the variance of G?

Example 2.27 A girl scout troop has 100 boxes of cookies to sell. Of these 100 boxes, 60 are thin mints and 40 are Samoas. 10 boxes are randomly selected to be sold at the White County Fair. Let S be the number of boxes of Samoas selected to go to the fair. What is the distribution of S as well as the value(s) of its parameter(s)? Find the probability that S is 0. Suppose that thin mints can sell for $4 and Samoas can sell for $3.50. What is the expected value and standard deviation of the amount of money the girl scouts will receive at the fair (assume that all 10 selected boxes will be sold)?

Example 2.28 Tom Maloney decided to hang out with friends the night before his quiz and did not study. He has no knowledge of any of the material on the quiz. The quiz consists of 5 multiple choice questions with 3 possible answers each. Let T be the number of answers that Tom correctly guesses. What is the distribution and parameter(s) of T? What is the probability that Tom gets at least a B (on our grading scale)?

Example 2.29 Flaws on a used computer tape occur on the average of one flaw per 1,200 feet.
Let X denote the number of flaws in a 4,800 foot roll. Name the distribution of X. What is the probability that X is at least 1?

2.5 Geometric and Negative Binomial Random Variables

The Geometric and Negative Binomial Distributions also deal with successes and failures. However, they are not looking to count the number of successes in a given sample size. They count the sample size necessary to get a given number of successes. More specifically, if X is Geometric, it measures the number of trials up to and including the 1st success. If X is NB(r, p), then it measures the number of trials up to and including the rth success. For both the Geometric and Negative Binomial, we consider the set-up as independent Bernoulli trials.

Characteristics of the Geometric Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

The Geometric distribution has 2 wonderful properties. They are called the tail probability formula and the lack-of-memory (or memoryless) property. Their respective formulas are given below:

Tail probability: $P(X > k) = (1 - p)^k$
Memoryless Property: $P(X > s + t \mid X > s) = P(X > t)$

Characteristics of the Negative Binomial Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pmf is:
• The expected value is:
• The variance is:

Example 2.30 Suppose Dunphy is really bad at tossing a Frisbee. His girlfriend attempts to teach him how to aim. However, it inevitably ends in hitting a passerby. Suppose Dunphy hits pedestrians at a rate of 1 out of 5 people that walk past the campus mall. Every time that Dunphy thinks he is going to hit a person with the Frisbee, he yells, “Geronimo!” Eventually, he gets the hang of it. He exclaims, “Eureka!” Eureka is Greek for “I have found it.”
However, before he gets acclimated to throwing a Frisbee, what is the probability that his first accidental hitting is between the 5th and 10th person, inclusive, that walks by? What distribution (with parameter(s)) did you use?

Example 2.31 Pat is required to sell candy bars to raise money for the 6th grade field trip. There is a 40% chance of him selling a candy bar at each house. He has to sell 5 candy bars in all.
• What is the probability he sells his last candy bar at the 11th house?
• What is the probability of Pat finishing on or before the 8th house?

Example 2.32 From past experience it is known that 3% of accounts in a large accounting population are in error. (Assume the firm is so big that sampling is done with replacement since sampling the same account has such a small probability.)
• What is the probability that 5 accounts are audited before an account in error is found?
• What is the probability that the first account in error occurs in the first five accounts audited?
• What is the probability it takes a double digit number of accounts audited to find one that is in error?

Example 2.33 Bob is a high school basketball player who has a 70% free throw percentage. Assume all free throw attempts are independent of one another (i.e. there is no such thing as a hot hand).
• What is the probability it takes more than 3 shots to get his first made free throw?
• What is the probability his first made free throw is on the third shot?
• What is the probability that his third made free throw is on his fifth shot?
• What is the probability that his 100th made free throw is on his 123rd shot?

Example 2.34 The Minnesota Twins are having a bad year. Suppose their ability to win any one game is 42% and games are independent of one another.
• What is the probability it takes 14 games for them to win their fourth game?
• What is the expected value and variance of the number of games it will take them to win their fortieth game?
• What is the expected value and variance of the number of games it will take them to win their first game?
• Knowing they got their 49th win with 5 games remaining in the season, what is the probability that they do not get 50 or more wins?

To begin with, there are essentially 2 groups of named, discrete random variables that we have discussed in Stat 225. There are the r.v.s that count the number of successes (Bernoulli, Binomial, Hypergeometric, and Poisson). There are also the r.v.s that count the number of trials up to and including a certain number of successes (Geometric and Negative Binomial).

Secondly, Bernoulli and Binomial are related in the sense that Binomial can be thought of as the sum of n independent Bernoulli r.v.s with the same value of p. Or, you could [potentially] think of Bernoulli as being a Binomial with n=1.

Thirdly, Geometric and Negative Binomial are related in much the same way that Bernoulli and Binomial are related. Negative Binomial is really the sum of r independent Geometric r.v.s with the same value of p. Or, you could [potentially] think of Geometric as being a Negative Binomial with r=1.

Lastly, there are 2 approximations that can be made. The first one occurs if the actual (exact) distribution is Hypergeometric and N > 20n. Then we can approximate it with a Binomial r.v. with the same n and same p as that of the original Hypergeometric r.v. The second approximation occurs if the actual (exact) distribution is Binomial and both n > 100 and p < .01. Then, we can approximate it with a Poisson r.v. where we set λ = np. Why do we do this? Well, we are setting the expected values equal for the two distributions.

Example 2.35 In a jar there are 200,000,000 coins, 5,000,000 of which are quarters. You select 50 coins from the jar randomly and without replacement. Let X be the number of quarters in your sample. What is the distribution of X? Find the probability that X is 2.
Is there an approximate distribution for X, why or why not? If there is, call the approximation X* and find P(X* = 2) as well.

Example 2.36 We look at sampling a 5 card hand from a standard deck of playing cards. First, compute the probability of a full house. Nick plays a game with his friend Errrr. Errrr bets $1 every hand (5 cards). If he gets a full house, he wins $500 (on top of keeping his bet of $1); otherwise, he loses the $1 to Nick. Suppose in an afternoon of gambling, Nick and Eric play this game 500 times. Let E denote the number of hands that Errrr wins in this particular afternoon. Name the distribution and parameters of E. Find the probability that E is at least 3. Next, is an approximate distribution appropriate for E, why or why not? If an approximation is appropriate, label it E* and find the above probability with E* instead of E.

Example 2.37 Mike is playing fetch with Maxine. At nighttime, Maxine does not always see the ball. On any one throw, she has a probability of .30 of not seeing/finding the ball. One late autumn evening, Mike throws the ball to Maxine 50 times. Let SM be the number of times that Maxine cannot find the ball. What is the distribution of SM? Find the probability that SM is between 13 and 17 inclusive. An approximation is not appropriate for SM, why not? Let’s ignore this and use the approximation anyway. Let SM* be the approximation. Find the probability that SM* is between 13 and 17 inclusive. Did SM* do a good job?

Example 2.38 Suppose there are 2,000 stocks on the NYSE. We are looking at making a portfolio consisting of 500 different stocks. We just finished reading the Wall Street Journal and discovered that there are 200 stocks that have risen in price over the last week. Let RS denote the number of stocks in your sample that have risen over the previous week. What is the distribution of RS? Find the probability that RS is between 50 and 55 inclusive. An approximation is not appropriate for RS, why not?
Let’s ignore this and use the approximation anyway. Let RS* be the approximation. Find the probability that RS* is between 50 and 55 inclusive. Did RS* do a good job?

Example 2.39 Adaptation of Spring 2012 Exam 1 Problem 5. Chris is collecting the quarters featuring the different U.S. states on the back. Suppose now he has a jar with 50 quarters, 7 of which are Minnesota quarters, 8 are Indiana quarters. One day he randomly picks 9 quarters from the jar without replacement. Let MN be the number of Minnesota quarters he selects. Name the distribution and the parameters for MN. Find the probability that MN is at least 8. Find the probability that MN is at most 2. What is E[MN]?

Example 2.40 Assume the set-up in Example 2.39. However, suppose he picks (with replacement) a quarter until he gets his first one from Minnesota. Let F denote the number of trials it takes until he picks his first one from Minnesota. Define the distribution of F. Find the following probabilities related to F: at most 4, at least 6, and exactly 5.

Example 2.41 Assume the set-up in Example 2.40. However, now we are looking for the 5th time he picks a Minnesota quarter. Let T denote the number of trials it takes until he picks his fifth one from Minnesota. Define the distribution of T. Find the following probabilities related to T: at most 4, at least 6, and exactly 5.

Example 2.42 Adaptation of Spring 2012 Exam 1 Problem 6. Assume a page in a book has to be edited if there are at least 2 typos on it. On average, there are 3 typos every 4 pages in this 300 page book. Consider pages independent of one another as far as typos are concerned. Let ED represent the number of pages that need to be edited in this book. Define the distribution and parameters of ED. Find the following items for ED: expected value, variance, and the probability it is between 52 and 56 inclusive.

Example 2.43 Assume the set-up in Example 2.42.
Additionally, assume that we have 10 books total that have the same properties as the original book. Let B represent the number of books in this stack that we have looked at in order to find the first one that has between 52 and 56 pages that need to be edited. Create a pmf for B. Let P(B ≥ 10) be P(B=10) in your pmf or pmf table.

Nested problems really just mean that we switch distributions throughout the problem. You must pay careful attention to the variable under consideration at all times.

Example 2.44 The wonderful candy shop, Albanese Candy Outlet, makes chocolate chip cookies as part of their production line. Chocolate chips in the cookies are randomly and independently distributed with an average of 12 chocolate chips per cookie. You and 9 of your friends decide to make a trip to Albanese Candy Outlet. Each of you buys one chocolate chip cookie.
• What is the probability that your cookie contains between 10 and 15 chocolate chips inclusive?
• What is the probability that 5 or 6 people in your group have cookies with between 10 and 15 chocolate chips inclusive?
• While examining your cookies (one-by-one), what is the probability that it takes at least 4 cookies to find the first one with between 10 and 15 chocolate chips inclusive?
• While examining your cookies (one-by-one), what is the probability that it takes at least 4 cookies to find the first one with 12 or 13 chocolate chips?
• Suppose you and your 9 friends were to go repeatedly to Albanese Candy Outlet. What is the probability that it takes until your sixth trip so that 5 or 6 people in your group have 12 or 13 chocolate chips in their cookie?

Example 2.45 An urn contains 6 red balls, 6 green balls, and 3 purple balls. You randomly reach in and pull out 4 balls.
• Assume sampling is done with replacement. What is the probability that you draw at least 2 purple balls?
• Assume sampling is done without replacement. What is the probability that you draw at least 2 purple balls?
• Which of the 2 previous parts was easier computationally and why?
• Assume sampling is done with replacement. What is the probability that it takes you until your tenth sample to get a sample with at least 2 purple balls?

Example 2.46 Let us play name the distribution as well as the parameter(s). This problem is adapted from Stat 225 Fall 2008 HW 6 problem 1.
• X is the number of 5’s in ten rolls of a fair die.
• A baseball starting lineup consists of nine players, three of whom are outfielders. A random sample of three players is taken from a baseball starting lineup. Let X be the number of outfielders in the sample.
• X is the number of Hearts in a five-card poker hand dealt from a standard 52 card deck.
• Let us repeatedly deal out five-card poker hands (replacing the cards after each hand is dealt). Let X be the deal number of the first time in which we get a flush.
• Let us repeatedly deal out five-card poker hands (replacing the cards after each hand is dealt). Let X be the deal number of the eighth time in which we get a straight (allow the A-5 straight).
• A player wins a game if he/she rolls at least one 6 in four rolls of a fair die. Let X be the outcome (win or lose) of this game.
• Customers arrive at Alice’s with a rate of 5 per hour. Let X be the number of customers that enter Alice’s between 2 A.M. and 4 A.M.

Example 2.47 It rains 3 days per month on average in California. For simplicity assume all months are of equal length.
• What is the probability that there are no rainy days next month?
• What is the probability that there will be 4 rainless months during the next year?
• What is the probability that April is the first month this year with at least some rain?
• What is the probability that October is the second month with 2 or more days of rain this year?

3 Continuous Random Variables

3.1 General Continuous Random Variables

A continuous random variable typically involves measurement.
One way to define a continuous random variable is that it has no point mass, or no point probabilities. This is in direct contrast to discrete random variables. Mathematically, a random variable X is called a continuous r.v. if P(X=x) = 0 for all x in R. Some useful set notation is that x ∈ (0,1) is {x: 0 < x < 1} while x ∈ [0,1] means {x: 0 ≤ x ≤ 1}.

Cumulative Distribution Function, cdf, is a key topic for r.v.s (discrete and continuous alike). Let X be a r.v., then the cdf of X, denoted by F_X(x), is the real-valued function defined on R by F_X(x) = P(X ≤ x) for x in R. While a cdf applies to any type of r.v., we typically only use it with respect to continuous r.v.s. The reason for this is that most discrete random variables do not have a nice functional form for their cdf.

Example 3.1 Let us find the cdf of a coin tossing example.
• Let n=4, p=.7, and X be the number of heads in the sample. Find the cdf for X.
• Keep the above set-up, but use p=.5 instead. What is the cdf for this r.v.?

Example 3.2 Let us find the cdf of a random experiment over an interval.
• Let X denote a number selected at random from the interval (0,1), what is the cdf of X?
• Let X denote a number selected at random from the interval (0,10), what is the cdf of X?

Properties of a cdf
1. It is nondecreasing.
2. It is everywhere right-continuous.
3. It has a value of 0 for x = -∞
4. It has a value of 1 for x = ∞

Useful Identities
• P(c < X < d) = F_X(d−) − F_X(c)
• P(c ≤ X < d) = F_X(d−) − F_X(c−)
• P(c < X ≤ d) = F_X(d) − F_X(c)
• P(c ≤ X ≤ d) = F_X(d) − F_X(c−)

Most of the above are really important when we have a cdf that has a jump (whether it is a cdf for a discrete r.v. or a “mixed” r.v.). However, the idea of the probability of being in a region for a CONTINUOUS r.v. is the cdf at the higher x value minus the cdf at the lower x value. Putting this another way, F_X(b−) = F_X(b) and F_X(a−) = F_X(a) for all values of a and b if X is a continuous r.v.
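
To see why the left-hand limits in the Useful Identities matter when a cdf has jumps, here is a minimal sketch in Python. It assumes a hypothetical r.v. that is not one of the examples in these notes: X is one roll of a fair six-sided die. The function names (cdf, cdf_left, prob) and the endpoints c = 2, d = 5 are illustrative choices only.

```python
from fractions import Fraction

# Hypothetical example (not from the notes): X is one roll of a fair die.
PMF = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x)."""
    return sum((p for k, p in PMF.items() if k <= x), Fraction(0))

def cdf_left(x):
    """F_X(x-) = P(X < x), the left-hand limit of the cdf at x."""
    return sum((p for k, p in PMF.items() if k < x), Fraction(0))

def prob(event):
    """Direct computation: add the pmf over all outcomes in the event."""
    return sum((p for k, p in PMF.items() if event(k)), Fraction(0))

c, d = 2, 5  # illustrative endpoints; both are jump points of the cdf

open_open = cdf_left(d) - cdf(c)           # P(c <  X <  d)
closed_open = cdf_left(d) - cdf_left(c)    # P(c <= X <  d)
open_closed = cdf(d) - cdf(c)              # P(c <  X <= d)
closed_closed = cdf(d) - cdf_left(c)       # P(c <= X <= d)

# Each identity agrees with summing the pmf directly.
print(open_open, prob(lambda k: c < k < d))
print(closed_open, prob(lambda k: c <= k < d))
print(open_closed, prob(lambda k: c < k <= d))
print(closed_closed, prob(lambda k: c <= k <= d))
```

Because the endpoints sit on jumps of the cdf, the four answers differ. For a continuous r.v. the functions cdf and cdf_left would coincide, and all four identities would collapse to the cdf at d minus the cdf at c.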
Probability Density Function, pdf, is another key topic for continuous r.v.s. Let X be a continuous r.v. A nonnegative function f_X is said to be a pdf for X if, for all real numbers a < b,

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$

The pdf is the derivative of the cdf (only where the cdf is nonzero; anywhere the cdf is 0, the pdf is also 0). Revisit Example 3.2. What are the pdfs for these 2 problems?

Properties of the pdf:
1. $f_X(x) \ge 0$ for all real numbers x.
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
3. $P(a \le X \le b) = \int_a^b f_X(x)\,dx$ for all real numbers a and b such that a ≤ b.

Recall, item 3 above can also be written as F_X(b) − F_X(a). This brings us back to the definition or formulation of the cdf. We can define the cdf in 2 ways. The first is more of the interpretation of the cdf and the second is how to calculate or find it, if it is not given in a problem.

$$F_X(x) = P(X \le x)$$
$$F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$$

Expected Value is still a big topic for continuous r.v.s. The formula is similar to that for discrete r.v.s. How do you think the sum would change for a continuous r.v.? How do you think p_X(x) would change?

E[X] =

Again, you can do general expectations for functions of a random variable. For any function of x, say g(x), you can find the expectation of g(X).

E[g(X)] =

An interesting note is that not all continuous distributions have a finite expected value (sometimes they are infinite). If they do not have a finite expected value, we say they do not have an expected value. A famous example is the Cauchy distribution, which has a pdf of $\frac{1}{\pi(1+x^2)}$ and takes values anywhere in R.

Linearity Property of Expected Value
Let X and Y be continuous r.v.s with a joint pdf and finite expectations. Also, let a, b, and c be real numbers. Then the following hold:
1. The random variable X + Y has finite expectation and E[X + Y] = E[X] + E[Y].
2. E[cX] = c*E[X]
3. E[aX + bY] = a*E[X] + b*E[Y]
4. E[a + bX] = a + b*E[X]
5. If X ≤ Y, then E[X] ≤ E[Y]

The distribution of a continuous r.v.
X is said to be symmetric about a number θ if f_X(θ + x) = f_X(θ − x) for all values of x. If X is a continuous random variable such that E[X] exists and X is symmetric about θ, then E[X] = θ.

Recall there are 2 different definitions of variance.

$$\mathrm{Var}(X) = E[(X - E[X])^2] \quad \text{and} \quad \mathrm{Var}(X) = E[X^2] - (E[X])^2$$

Remember, the first definition is more about the interpretation of variance, and the second definition is usually a bit easier computationally.

Percentiles and Special Percentiles
A quartile represents a quarter of a data set or a quarter of a distribution. There are 3 quartiles of importance to a statistician (1st, 2nd, and 3rd). Sometimes the first and third quartiles are referred to as the lower and upper quartiles respectively.
• The first quartile, Q1, represents the bottom (lower) 25% of the data.
• The second quartile, Q2, aka the median, represents the bottom (lower) 50% of the data.
• The third quartile, Q3, represents the bottom (lower) 75% of the data.

Q1 is the x value for which F_X(x) = .25. You can define Q2 and Q3 similarly. A percentile represents the lower such-and-such percent of the distribution. For example, the 10th percentile means that 10% of the distribution is ≤ that value, or it is the x-value such that F_X(x) = .10. You can similarly define any other percentiles.

Note: The quartiles are really just special cases of percentiles, especially the median.

Example 3.3 Let X represent the diameter in inches of a circular disk cut by a machine. Let f_X(x) = c(4x − x^2) for 1 ≤ x ≤ 4 and be 0 otherwise. Answer the following questions:
(a) Find the value of c that makes this a valid pdf.
(b) Find the expected value and variance of X.
(c) What is the probability that X is within .5 inches of the expected diameter?
(d) Find F_X(x).
(e) What is the 33rd percentile of X?

Example 3.4 Let f_X(x) = .25x for 1 ≤ x ≤ 3 and 0 otherwise.
(a) Is X more likely to be within [1,2] or within [2,3]? First answer this question using logic.
Next, check your answer by calculating the probabilities.
(b) What is the probability that X is more than 2.2?
(c) Find the mean and standard deviation of X.
(d) Find F_X(x).
(e) What value of X represents the top 15% of the distribution?

Example 3.5 For each of the following random variables, find their pdfs or cdfs (whichever is missing).

(a)
$$F_X(x) = \begin{cases} 0 & x < 10 \\ .01(x-10)^2 & 10 \le x < 20 \\ 1 & x \ge 20 \end{cases}$$

(b)
$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & 0 \le x \end{cases}$$

(c)
$$f_X(x) = \begin{cases} .4 & 1 \le x \le 2 \\ .2 & 3 \le x \le 6 \\ 0 & \text{otherwise} \end{cases}$$

Example 3.6 Let X be a continuous random variable with f(x) = c|x − 2| for 1 < x < 4 and 0 otherwise, where c is a positive constant. Find the value of c that makes f(x) valid. Find the cdf of X. What are the probabilities that X is at most 3, at least 2, between 1.25 and 1.75, and less than 2 given it is less than 3? What is the median of X? What is E[X]?

Example 3.7 Let f(x) be c(x+2) from 0 to 1 and c(-x+4) from 1 to 2 and 0 otherwise. Find c. Sketch the pdf. Find the cdf, median, and variance.

Example 3.8 For this problem state whether the given cdf or pdf is valid. If it is not valid, state the reason(s) it is not valid and fix them (adding a constant, multiplying by a constant, changing the support, ...).
• Let f(x) be (x-2) for x ∈ (1, 2+√3) and 0 otherwise.
• Let F(x) be 0 for x ≤ 1, 2x^2 − 3x + 1 for x ∈ (1,1.75), and 1 for x ≥ 1.75.
• Let F(x) be 0 for x ≤ -3, (−3x^2 + 2x + 33)/28 for x ∈ (-3,-1), and 1 for x ≥ -1.

3.2 Uniform Random Variables

Refer back to Example 3.2. It has a uniform characteristic, which applies to its pdf. The Uniform Distribution is sometimes said to be evenly or uniformly distributed over an interval. This is a good way to characterize the distribution.

Characteristics of the Uniform Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pdf is:
• The cdf is:
• The expected value is:
• The variance is:

Example 3.9 Revisit Example 3.2. These examples are actually uniform distributions.
Calculate the expected values and variances for these 2 distributions. Also, calculate the 41st percentiles. Example 3.10 Shaggy feeds Scooby a Scooby-snack after every hi-jinks that Scooby foils. Suppose Scooby foils a hi-jinks anywhere from 0 minutes into the show up until 15 minutes into the show. Find the pdf, cdf, expected value, and variance for the amount of time until Scooby receives a Scooby-snack (denoted by X). Additionally, calculate the following probabilities: P(X < 5), P(X > 10), P(3 < X < 11), and P(X < 12 | X > 4). Example 3.11 A very famous, always crowded restaurant named Shenanigans has a porterhouse meal as its advertised special on Sweetest Day. It takes between 7 and 16 minutes to cook the porterhouse. Find the pdf, cdf, expected value, and variance for the amount of time until your porterhouse is cooked (denoted by X). Additionally, calculate the following probabilities: P(X < 10), P(X > 12), P(9 < X < 11), and P(X < 14 | X > 11). Example 3.12 Suppose it takes Landfill between 4 seconds and 15 seconds to finish any given drink. Keep in mind that he has to deal with the noise coming from the glockenspiel. Let X be the amount of time it takes Landfill to finish his next drink. Name the distribution and parameter(s) of X. Find the probabilities that X is more than 8, less than 12, and between 8 and 12. Example 3.13 Anywhere from 0 to 20 years a really ridiculous political term gets added to the English dictionary. Examples include antidisestablishmentarianism, gerrymandering, and filibuster. What is the probability that the next quirky political term gets added to the dictionary sometime in the next 8 years? What about at least 13 years from now? 3.3 Exponential Random Variables The Exponential Distribution can be thought of as the continuous analog of the geometric random variable. The exponential r.v. 
is often used as the distribution for the time required to complete a certain task or for the elapsed time between successive occurrences of a specified event. Additionally, the exponential distribution may be used to model the behavior of units that have a constant failure rate (or units that do not degrade with time or wear out). Some examples include: the time until an appliance breaks, the time until a light bulb burns out, or the time until the next customer arrives at a grocery store.

Characteristics of the Exponential Distribution:
• The definition of X.
• The support is:
• Its parameter(s) and definition(s):
• The pdf is:
• The cdf is:
• The expected value is:
• The variance is:

Since the Exponential distribution is the continuous analog of the Geometric distribution, one might wonder if the 2 great properties from the Geometric also apply to the Exponential. The answer is yes. The Exponential also has the memoryless property and a nice tail probability formula.

Example 3.14 The sirens, while perched on their aesthetically pleasing fjord, were beckoning for Odysseus to come hither. If it takes about 1 minute on average for a captain to navigate his boats toward the sirens, what is the probability that Odysseus will steer his ship towards them after 5 minutes? What is the probability that he takes at most 3 minutes? What is the probability that it takes between 30 and 90 seconds? What is the probability it takes less than 300 seconds knowing it took more than 100 seconds?

Example 3.15 Suppose the time it takes a puppy to run and get a ball, say T, follows an exponential distribution with a mean of 30 seconds. State the distribution and parameters of T. What is the probability that it takes the puppy more than 50 seconds to get the ball? Assuming independence, what is the probability that it takes the puppy less than 40 seconds to fetch each of the next 5 balls?
What is the probability that it will take the puppy more than 45 seconds to get the ball knowing that it took the puppy longer than 20 seconds?

Example 3.16 You and 3 friends decide to drive from West Lafayette to Boston to watch the Patriots lose. The duration of a round trip, say D, has an exponential distribution with a rate of 1 trip per 20 hours. Find the following probabilities: D is at most 15 hours, D is between 15 and 25 hours, D exceeds 25 hours, and D is at most 40 given that it is more than 15. Lastly, calculate the mean and variance of D.

3.4 Poisson Processes

For a specified event that occurs randomly in continuous time, an important application of probability theory is in modeling the number of times such an event occurs. The following are several examples of such random phenomena.
• The number of patients that arrive at a hospital emergency room.
• The number of customers that enter a particular bank.
• The number of accidents at an intersection.
• The number of alpha particles emitted by a radioactive substance.

Consider an event that occurs randomly and homogeneously in continuous time at an average rate of λ per unit of time. We will refer to the occurrence of the event as a success. Suppose we begin counting successes at time 0 and, for each time t ≥ 0, we let N(t) = the number of successes by time t (that is, at or before t). Automatically, this implies that N(0) is 0. We say such a counting process is a Poisson Process with rate λ if 2 more properties hold. Namely, if:
• {N(t): t ≥ 0} has independent increments (as long as the two time intervals have no overlap, the numbers of successes in them are independent).
• N(t) - N(s), which is the number of successes in the time interval (s,t], is distributed as Poisson(λ(t-s)) for 0 ≤ s < t < ∞.

As indicated by previous examples, the Poisson Process can be used to model arrivals. It is also used for waiting times and interarrival times. For each n ∈ N, we let W_n denote the time of the occurrence of the nth event.
That is the time at which the nth success occurs. If W_3 is 10.34, that means the 3rd success occurred at a time of 10.34. The random variable W_n is called the nth waiting time. The elapsed time between the occurrence of the (n − 1)st and nth events is denoted by I_n and is called the nth interarrival time. So, we have the following 2 relationships:

$$W_n = \sum_{j=1}^{n} I_j$$
$$I_n = W_n - W_{n-1}$$

One nice property of a Poisson Process with rate λ is that the interarrival times, or I_n's, are iid Exponential random variables with rate parameter λ.

There is one more property of a Poisson Process that is quite useful. Suppose we have N(t) = n. This means that we had n successes on the interval [0,t]. Given this, the times of these successes are distributed as independent Uniform(0,t) random variables. Keep in mind that the counts over time increments of a Poisson Process are independent if there is no overlap. Knowing N(t) = n, if we looked at the number of successes on the interval [0, t/4], how would it be distributed?

Example 3.17 Suppose that phone calls arrive at a switchboard according to a Poisson Process at a rate of 2 per minute. Let X be the number of calls between 9:30 and 9:45. Find the distribution of X. Let T be the time between the 8th and 9th calls. What is the distribution of T? What is the probability that exactly 10 calls (total) come in the next 4 minutes? What is the probability that the next call comes in 30 seconds and the second call comes at least 45 seconds after that? Given there are exactly 7 calls in 3 minutes, what is the probability that they all came in the last minute?

Example 3.18 Each time a student logs on to their ITaP account, the computer sends a request for the student’s profile to the main ITaP database. Suppose that these profile requests come to the main database according to a Poisson Process at a rate of 9 per minute. What is the probability that between 8 and 11 (inclusive) profile requests go to ITaP in a given minute?
On average, how many profile requests arrive in a one-hour period? What is the probability of 7 profile requests in a 1-minute interval followed by 19 profile requests in the subsequent 2-minute interval? How long, on average, does it take between successive profile requests? What is the probability that the next profile request takes more than 15 seconds? What is the probability that the next profile request takes at most 22 seconds? If we know that 13 profile requests occurred between 12:00:00 AM and 12:01:30 AM, what is the probability that 5 profile requests occurred between 12:00:50 and 12:01:20?

Example 3.19 Customers arrive at Scotty’s at a rate of .5 per minute. (Assume all customers arrive independently of all other customers.) What is the probability that 10 customers arrive in the next 15 minutes? What is the probability that 10 customers arrive in each of the next four 15-minute intervals? How long on average does it take for the next customer to arrive? What is the probability that I_1 is more than 20 seconds, I_2 is more than 30 seconds, and I_3 is less than 15 seconds?

Example 3.20 At any point during a Stat 225 exam, the next person to drop a calculator will take 5 minutes on average to do so. Let C represent the time until the next person drops their calculator. Name the distribution and parameter(s) of C. Find the following probabilities: C is more than 5 knowing that it is less than 10, C is at least 8 given it is less than 15, C is more than 2, C is less than 4, and C is at least 7 given that it is more than 5.

Example 3.21 Purdue undergraduate students’ IQs are evenly distributed over the interval 80 to 170. Pick a random undergraduate from Purdue. Let I denote their IQ. Find the following probabilities: I is less than an “average” intelligence (100), I is more than 130, I is between 110 and 140, and I is more than 90 given it is less than 120. Also, in order to be in Mensa, a person must be in the top 2% of all IQs.
What is the top 2% IQ score for a Purdue undergraduate?

Example 3.22 Suppose that the amount of time one spends in a bank has a mean of 10 minutes. Let T be the amount of time that Glen spends in his bank. What are the following probabilities: T is more than .25 hours, T is less than .2 hours, T is less than .25 hours given it is at least .16̄ hours? Find the 40th percentile of T.

Example 3.23 Shoe sizes of NBA players are equally likely over the interval 14 to 22. Let S represent the shoe size of a random NBA player. Find the following: the 10th percentile of S, the value of S such that only 12% of NBA players have bigger feet, the probability that S is between 10 and 16, the probability that S is more than 17, the expected value of S, and the variance of S.

Example 3.24 Let X ∼ Expo(λ = 2). Find P(X < 4), P(X > 1.2), and Var(X).

Example 3.25 Thomas is examining a length of television wire for defects. He knows that there are an average of 3 defects in every 10 feet of wire, that the occurrence of defects in any segment of wire is independent of the occurrence of defects in any other segment, that all segments of wire are equal with regards to the occurrence of defects, and that for sufficiently small segments of wire the likelihood of finding more than one defect is practically zero. Let D1 be the number of defects in the first 10 feet of wire, D2 be the number of defects in 50 feet of wire, and W be the amount of wire between the fifth and sixth defects. Find the following probabilities: D1 is between 2 and 4 inclusive, there are multiple defects in the first 10 feet of wire, D2 is 15 or 17, W is at most 3, W is at least 2, W is at most 10 given it is at least 7.

Example 3.26 Find the expected value and variance of the 3 variables defined in Example 3.25. Suppose Mike is supervising Thomas. He inspects Thomas’ work right before lunch. This coincides with feet 30 through 45 of the wire. It is known that Thomas finds 6 defects while Mike is watching.
Let Y be the number of these defects that occur anywhere from the 38th foot to the 42nd foot. Find the following probabilities for Y: it is at least 1, it is at most 2. Suppose further that we know no defects occurred in the last 3 feet of wire (from the 42nd foot to the 45th foot). Recalculate the previous 2 probabilities.

Example 3.27 Suppose Lynda Thoman arrives at her office on Mondays anywhere from 6:45 AM to 7:45 AM and that she is equally likely to arrive anywhere in that interval. Let T be the time of her arrival. Find the following probabilities for T: it is between 7 and 7:30, it is at most 7:25, it is at least 7:30, it is less than 7:40 knowing it is more than 7:20. Also, find E[T] and Var(T).

Example 3.28 Refer to Example 3.27. It is known that she teaches at 7:30 AM on Mondays. It is also known that it takes her 12 minutes to walk from her office to where she teaches, and it takes her 8 minutes to make a pot of coffee. Find the following probabilities: she is late to class knowing she did not make coffee, she is late to class knowing she made coffee, she is on time to class and had at least 11 minutes in her office, she is on time to class and had at least 11 minutes in her office to enjoy the coffee that she made. Lastly, knowing that she was on time to class, what is the 23rd percentile of the time that she arrived at her office (write this as a time)?

Example 3.29 The time that it takes until a student uses a cell phone in class is exponential with a mean of 1.1 minutes. María just used her cell phone at 12:55 PM. Let X be the time until the next person uses a cell phone. Class ends at 1:00 PM. Find the following probabilities: X is at most 2.3, X is more than 3.9 knowing it is more than 2.3, that no one uses a cell phone until after class is over. What is the 81st percentile of X (write this as a time)?

3.5 Normal Random Variables

One of the most important distributions in Probability and Statistics is the Normal Distribution.
Any Normal distribution problem will be labeled as a Normal Distribution. Let us start with the pdf of the Normal. It is: $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for any real number x, any positive number σ, and any real number µ. Now, you know the pdf, support, and parameters for a Normal Distribution. Take a minute to calculate the cdf of the Normal. A potential next question is what do µ and σ mean (or represent)? The answer shall be provided by your teacher. This also eliminates the typical 6th and 7th bullet points for your distributions, thus completing all 7 bullet points.

A Normal Distribution is sometimes referred to as a bell curve because of how the pdf looks. One important property of this bell curve is that it is symmetric. What is a Normal Distribution symmetric about? While talking about the shape of the pdf, what would happen to the graph of the pdf if we changed σ? What about if we changed µ?

One drawback of the Normal Distribution is that its cdf is not a simple algebraic formula. There is no closed form solution to the cdf of a Normal. Therefore, in order to find any probability associated with a Normal(µ, σ²) random variable we need to do an algebraic trick that is called standardizing a Normal r.v. To understand this concept, first we need to introduce the variable Z. In Statistics, Z is reserved for a Normal(µ = 0, σ = 1) random variable. Z is referred to as the Standard Normal. Our "trick" is to turn a Normal(µ, σ²) into a Normal(µ = 0, σ = 1) random variable. This is done by the following formula: $Z = \frac{X - \mu}{\sigma}$.

Unlike other continuous random variables, the pdf and cdf for Z are not labeled with f and F. Instead, they are labeled with φ and Φ respectively. Because of the importance of Z in Statistics, it gets its own letters to represent its pdf and cdf. However, since Z is a Normal r.v., its cdf does not exist in closed form either. Instead, we have a table of probabilities.
The one we will use in this course is on the course web site as "Normal Table". Please print this pdf off and bring it with you to every class.

If X is a Normal(µ, σ²) r.v., then $P(c < X < d) = \Phi\left(\frac{d-\mu}{\sigma}\right) - \Phi\left(\frac{c-\mu}{\sigma}\right)$. In other words, we can relate the cdf of X to the cdf of Z: $F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)$. Recall that a Normal r.v. is symmetric. This actually implies the following: Φ(−z) = 1 − Φ(z). This is useful for P(Z ≥ z) = 1 − Φ(z) = Φ(−z).

Now that we can calculate probabilities for a Normal r.v., there are 2 other main topics to discuss. The first is about sums of independent Normal random variables. Let Xi denote mutually independent Normal random variables with parameters µi and σi respectively. Their sum has mean equal to the sum of the µi and variance equal to the sum of the variances. If we let $Y = \sum_{i=1}^{n} X_i$, then $Y \sim \text{Normal}\left(\mu_Y = \sum_{i=1}^{n} \mu_i,\ \sigma_Y^2 = \sum_{i=1}^{n} \sigma_i^2\right)$. This can be applied to any number of Normal random variables (provided that they are mutually independent). (Quick aside: This provides motivation for the CLT, which a lot of you will see in MGMT 305.)

Example 3.30 Let us examine Z. Find the following probabilities with respect to Z: at most -1.75, at most 1.75, between -2 and 2 inclusive, less than .5. Find the following with respect to Z: the value such that 20.3% are higher than it, the 4.65th percentile, and the values representing the middle 96.6% of the distribution.

Example 3.31 Let X be Normal with a mean of 20 and a variance of 49. Find the following probabilities: X is between 15 and 23; X is more than 12 knowing it is less than 20; given X is less than 28, the probability that it is more than 16; and that it is more than 31. What is the value that is smaller than 20% of the distribution?

Example 3.32 Let X1, X2, and X3 be mutually independent, Normal random variables. Let their means and standard deviations be 3k and k for k = 1, 2, and 3 respectively.
Find the following distributions: $\sum_{i=1}^{3} X_i$, X1 + X2 - X3, 2X1 - 3X3 + 4X3. Call the previous distributions S, T, and V respectively. Find the following percentiles for S, T, and V respectively: 83rd, 63rd, and 42nd. Find the following probabilities: S is bigger than V’s mean, T is smaller than half of S’s variance, and V is bigger than T’s 99th percentile.

Example 3.33 SAT Math scores follow a Normal distribution with a mean of 533 and a standard deviation of 116. Assuming that scores above 800 get truncated to 800, what percent of scores were reported as 800? The middle 50% of SAT Math scores at Purdue in 2011 were reported as 550 to 690. What percent of all SAT Math scores were in this range? Notre Dame’s middle 50% are between 680 and 770. What percent of all scores are below Notre Dame’s 75th percentile? What percent of all scores are above Notre Dame’s 25th percentile?

Example 3.34 Colin and Mike are wasting their childhood playing ping pong in Colin’s basement. Since they have spent so much time in the basement playing ping pong, pool, and darts, they are famished. They decide to order Chinese food with extra teriyaki sauce for delivery. If the food will arrive according to a normal distribution with mean of 20 minutes and standard deviation of 5 minutes, what is the probability that the two kids have to wait more than 32 minutes for their food? What is the probability that they wait less than 15 minutes? What is the probability that they wait less than 26 minutes, knowing that they wait at least 12 minutes?

Example 3.35 Suppose you and 4 of your best friends are migrating west. You are the local physician. Suppose you decide to hunt buffalo. On average buffalo have 800 lbs. of edible meat with a standard deviation of 75 lbs. If your party comes back to the trail with one buffalo, what is the probability that you come back with less than 700 lbs. of edible meat?
If you need 925 pounds of edible meat to make it all the way to Independence, Missouri, what is the probability that your 1 buffalo will last you until Independence, Missouri? What amount of edible meat is less than 29% of the distribution?

Example 3.36 “Wish” by NIN is a 3 minute and 36 second long song. Suppose the length of time the pyrotechnics last is normally distributed with an average of 2 minutes, and they have a standard deviation of 53 seconds. Suppose NIN use pyrotechnics at the beginning of “Wish”. What is the probability that the fog will still mask Trent Reznor at the end of “Wish”?

Example 3.37 A male yeti’s height is normally distributed with a mean of 84 inches and a standard deviation of 7 inches. Since yetis seem to elude people, we will not make a question about the probability of a specific yeti, but of yetis in general. What are the 25th, 48th, and 67th percentiles for height of a yeti?

We can use a Normal Distribution to approximate a Binomial Distribution if n is large and p is moderate (close to .5). Our rule of thumb for this approximation to be valid is that both np > 5 and n(1-p) > 5. If X ∼ Binomial(n,p) and the approximation holds, then the approximating random variable is X* ∼ N(µ = np, σ² = np(1-p)). One caveat to this approximation is that we are approximating a discrete distribution (the Binomial) with a continuous distribution (the Normal). One thing that we know about these types of distributions is that discrete r.v.s have point probabilities, but continuous r.v.s do not. In order to account for this, we use the continuity correction. This involves either adding or subtracting a half from the x value accordingly.

The Empirical Rules, or as they are sometimes known the Rules of Thumb, are a way to approximate certain probabilities for the Normal Distribution. There are 3 rules of thumb and each contains two parts: an interval and a percent (or probability).

Interval      Percent Contained
µ ± 1σ        68%
µ ± 2σ        95%
µ ± 3σ        99.7%

The above intervals are all centered around µ. Additionally, since the Normal Distribution is centered around µ, these intervals represent the middle 68, 95, and 99.7% of the Normal Distribution. Recall that the Normal Distribution is symmetric. This means that the % not included in each interval is equally distributed on the low and high ends of the interval. For example, that means 16% of the distribution is < µ − 1σ.

Example 3.38 Mr. DeFries’ golf scores per 9 holes are Normally distributed with a mean of 50 strokes and a variance of 25 strokes. For this entire problem, use the Empirical Rules. Find the probability that Mr. DeFries scores between 45 and 60 on his next round. Find the probability that Mr. DeFries scores between 55 and 65 on his next round. Find the probability that Mr. DeFries scores less than 55 on his next round. What is the 97.5th percentile of his score distribution?

Example 3.39 NFL players’ heights are Normally distributed with a mean of 74 inches and a standard deviation of 2 inches. For this entire problem, use the Empirical Rules. The middle 95% of all NFL players have heights between what 2 values? Find the .15th percentile.

Example 3.40 For this entire problem, please use the Rules of Thumb. The number of pairs of shoes in an adult female’s closet is Normal with a mean of 58 and a standard deviation of 5. What interval contains the middle 68% of the distribution? Find the value such that 2.5% are lower than that value. What is the probability an adult female’s closet has between 48 and 63 pairs of shoes? What percent of adult women have between 68 and 73 pairs of shoes in their closet?
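As a numerical check on the Empirical Rules and on the standardizing formula from this section, here is a minimal Python sketch. It assumes Python 3.8 or later (for `statistics.NormalDist`, used here in place of the printed Normal Table). The function names and the parameters µ = 100, σ = 15 are hypothetical choices made only for illustration; they do not come from the course materials.

```python
from statistics import NormalDist

# Z in the notes: the Standard Normal, whose cdf is written Phi.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def normal_cdf_via_z(x, mu, sigma):
    """F_X(x) for X ~ Normal(mu, sigma^2), found by standardizing:
    F_X(x) = Phi((x - mu) / sigma)."""
    return STANDARD_NORMAL.cdf((x - mu) / sigma)


def central_probability(k, mu, sigma):
    """P(mu - k*sigma < X < mu + k*sigma) = Phi(k) - Phi(-k)."""
    upper = normal_cdf_via_z(mu + k * sigma, mu, sigma)
    lower = normal_cdf_via_z(mu - k * sigma, mu, sigma)
    return upper - lower


if __name__ == "__main__":
    mu, sigma = 100.0, 15.0  # hypothetical parameters, illustration only
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): "
              f"{central_probability(k, mu, sigma):.4f}")
```

The three computed probabilities are roughly 0.6827, 0.9545, and 0.9973 for any µ and σ, which the Empirical Rules round to 68%, 95%, and 99.7%.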
Example 3.41 Suppose a class has 400 students (to begin with) and that each student drops independently of any other student with a probability of .07. Let X be the number of students that finish this course. Find the probability that X is between 370 and 373 inclusive. Is an approximation appropriate for the number of students that finish the course? If so, what is this distribution and what are the value(s) of its parameter(s)? For the following probabilities, if an approximation is appropriate, use the approximation; otherwise, use the exact distribution. Find the probability that X is between 370 and 373 inclusive, that X is at least 375, that X is at most 370, that X is between 360 and 380, and that X is between 360 and 380 inclusive.

Example 3.42 Brian is a movie buff. He has an enormous DVD collection that he lets his friends borrow from. Let N represent the number of DVDs that Brian has in his house at any given time. N is Normal with a mean of 600 and a variance of 144. Find the following probabilities: N is within 20 of its mean, N is greater than 630, N is less than 560, N is greater than 588 or less than 624 but not both. (Please answer the next 2 questions with an unrounded answer and a rounded answer.) What is the 34th percentile of N? What number of movies represents the top 15 percent for N?

Example 3.43 Clayton, Jeremy, and Eric are at Balmoral Race Track betting on horses. The 7th race has 8 horses. A Trifecta requires you to pick the first 3 horses (win, place, and show) in order. A Box around a Trifecta (or a superfecta for that matter) means that you do not have to pick the order, only the horses that are in the first 3 spots. Suppose they pool their money and buy 400 $1 Boxed Trifectas for the 7th race and they pick the horses at random for each bet. Suppose each bet costs $6 (why would that make sense?) and that a winning ticket pays $500. Let X represent the number of winning tickets.
Find the following probabilities: X is 7 or 8, they have at least 1 winning ticket, they make money on this bet. Lastly, what is the expected value and variance of their profit from this bet?

Example 3.44 Refer to Example 3.43. Is an approximation appropriate for X? Justify your answer. If it is, recalculate the probabilities using the approximation.

Example 3.45 Karl is making some pasta and will let it boil between 8 and 10 minutes before removing from the stove and draining. Let X be the length of time the pasta will boil on the stove. What is the distribution of X? Find the following probabilities: X < 8.8, X > 9.4, X is between 8.75 and 9.1, and X is greater than 8.4 given that it is smaller than 8.95.

Example 3.46 Kathy has decided to Go Green and is replacing all existing lights in her apartment with energy saving bulbs. These new energy saving bulbs have a mean lifetime of 7 years. Let X be the amount of time until she needs to replace one of these new bulbs. What is the distribution of X? Find the probability that X is: more than 5 years, at most 10 years, between 2 and 6 years, greater than 12 given it is greater than 8, greater than 3 given it is less than 7.

Example 3.47 At a STAT Christmas party, Ritabrata claims that he can accurately identify the contents of a wrapped present 45% of the time, with each package independent of any other. Let X be the number of presents Ritabrata correctly identifies in the 16 packages at the party. What is the distribution of X? Is there an appropriate approximation for X (why or why not)? Find the probability that X is: 8, at least 14, and at most 4. If an approximation was appropriate, state the approximate distribution and repeat the probability calculations.
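Several of the examples above ask whether a Normal approximation to a Binomial is appropriate. The following Python sketch (Python 3.8 or later) shows one way to compare an exact Binomial probability with the approximation X* ∼ N(np, np(1-p)), with and without the continuity correction. The function names and the values n = 100, p = 0.4 are hypothetical and chosen only for illustration; they are not taken from any of the examples.

```python
from math import comb, sqrt
from statistics import NormalDist


def rule_of_thumb_ok(n, p):
    """The notes' rule of thumb: both np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


def binomial_between(n, p, a, b):
    """Exact P(a <= X <= b) for X ~ Binomial(n, p), with a, b integers."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, b + 1))


def normal_approx_between(n, p, a, b, continuity_correction=True):
    """Approximate P(a <= X <= b) using X* ~ N(np, np(1 - p)).

    With the continuity correction, the inclusive integer range [a, b]
    is widened by a half on each side: P(a - 0.5 < X* < b + 0.5).
    """
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    shift = 0.5 if continuity_correction else 0.0
    return approx.cdf(b + shift) - approx.cdf(a - shift)


if __name__ == "__main__":
    n, p, a, b = 100, 0.4, 35, 45  # hypothetical values, illustration only
    print("rule of thumb satisfied:", rule_of_thumb_ok(n, p))
    print("exact binomial:         ", binomial_between(n, p, a, b))
    print("normal, corrected:      ", normal_approx_between(n, p, a, b))
    print("normal, uncorrected:    ",
          normal_approx_between(n, p, a, b, continuity_correction=False))
```

For an inclusive range like this one, the corrected approximation should land noticeably closer to the exact value than the uncorrected one, which is the point of adding or subtracting a half.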
Example 3.48 The length of Doug’s 225 lectures follows a Normal distribution with an average of 47.5 minutes and a standard deviation of 1.25 minutes, while the length of Grant’s 225 lectures follows a Normal distribution with an average of 49.25 minutes and a standard deviation of 0.75 minutes. Assume the length of a 225 lecture is independent from day to day and between TAs. What is the probability that Doug lectures longer than the median time that Grant lectures? Grant wants to reassure his students by telling them that he will only lecture longer than "M" minutes 8% of the time. Find "M". Classes are 50 minutes long. What is the probability that at least 1 TA will let their students out late?

Example 3.49 Chester is on vacation with his wife and children. They go to a restaurant where the special is a 96 ounce steak. The restaurant will give you a gift card worth 4 free meals if you can finish this steak. It is known that only 10% of all people that attempt this challenge will actually be able to finish this giant steak. The week before Thanksgiving, several people attempt this challenge to try and prepare for their Thanksgiving feasts. Suppose that 200 people attempted to eat the 96 oz. steak during this week. The proportion of people that will successfully finish the steak is approximately N(µ = .1, σ² = .00045). Find the following probabilities: more than 26 people finished the steak and at most 40 people finished the steak. How many people do you expect to finish the steak? Suppose this set-up applies during every Thanksgiving week. The top 18% of all Thanksgiving weeks have at least how many people finish this steak? The bottom 31% of all Thanksgiving weeks have at most how many people finish this steak?

Example 3.50 Trick or treaters (labeled tots hereafter) arrive at Harvey’s house at a rate that is constant over time, with a mean of 7 per hour. Assume all tots arrive independently of one another.
Find the following: 8 tots in the first hour, 12 tots in the first 2 hours, 8 tots in the first hour and 12 tots total in the first 2 hours, it takes more than 5 minutes for the next tot to show up, and the probability that 10 tots show up in the first 1.5 hours if 20 tots showed up total (a 4 hour period).

Example 3.51 Let X be a continuous random variable. Let the pdf of X be c(3x² − 2x) for x between 2 and 4 inclusive. First, find the value of c that makes this a legitimate pdf. Second, find the cdf. Additionally, find E[X], Var(X), the median, the probability X is at most 3, and the probability X is between 2.3 and 3.1.

4 Numerical Summaries

4.1 Quantitative Random Variables

Sample statistics are numerical measures of location, dispersion, shape, association, etc. that are computed for data FROM A SAMPLE.

Population parameters are numerical measures of location, dispersion, shape, association, etc. that are computed for data FROM A POPULATION.

Note: most of the time, we will just say statistic or parameter. Keep in mind that statistics are always from the sample and parameters are always from the population. In most cases, parameters are denoted by Greek letters, and statistics are denoted by their English alphabet counterparts. Additionally, sometimes statistics are referred to as point estimates of the parameter that they represent. This concept is especially prevalent during hypothesis testing and confidence interval construction.

Mean is the average value or expected value. The population mean is represented by mu, µ. If necessary, you can add a subscript to avoid confusion, like µx vs µy. The sample mean is represented by x-bar, x̄. Computation of x̄:

The population variance is denoted as σ², while the sample variance is denoted by s². They are computed as such:

Mode is the value that occurs the most (has the highest frequency).

Range = largest value (maximum) - smallest value (minimum).

Percentile is best represented with an example.
The pth percentile is a value of the data set (or distribution) such that at least p% of the data set (or distribution) is ≤ this value. There are 3 special percentiles, called the quartiles. The quartiles split the data into 4 parts. The lower quartile, median (aka the 2nd quartile), and the upper quartile are the 25th, 50th, and 75th percentiles respectively. The lower and upper quartiles are sometimes known as the first and third quartiles. We typically abbreviate these 3 values as Q1, M, and Q3.

Calculation of Percentiles (and Quartiles) using the indexing method (see page 86 of Statistics for Business and Economics by Anderson, Sweeney, and Williams, 11th ed.):

Interquartile Range, or IQR, is Q3 - Q1.

A boxplot is a visual representation of the 5 number summary. The 5 number summary is the minimum, Q1, the median, Q3, and the maximum. Boxplots have different types. Namely, there is a "regular" boxplot and a modified boxplot. The modified boxplot will highlight if there are outliers, but a regular one will not. Your teacher will demonstrate both of these versions. Please keep in mind that there are different variations of a modified boxplot.

An outlier is a data point that does not fit with the rest of the data. In a univariate case, this number can be either too small or too large. In a bivariate case, it would be a data point that does not fit the overall trend of the variables taken together. Here is our outlier test:

Example 4.1 Hank Aaron hit an astounding 755 home runs in his career. His career spanned from 1954 through 1976. In those 23 seasons he hit 13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32, 44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10. What is the mode of the data set? What is the range of the data set? Create both a regular and a modified boxplot for the number of home runs that Hank Aaron hit in a season. Find the 61st percentile.

Example 4.2 A Stat 113K class was asked how many times they wanted to eat ice cream last summer.
The answers given were: 0, 15, 18, 7, 15, 28, 10, 20, 3, 10, 6, 10, 8, and 9. What is the mode of the data set? What is the range of the data set? Create both a regular and a modified boxplot for the number of times the students wanted to eat ice cream. Find the 18th percentile.

Example 4.3 Suppose we have the data set 1, 2, 3, 4, and 5. Find the mean of the data. Also compute variance in 2 ways (one assuming that this is a sample, the other assuming that this represents the entirety of the population). For these 2 different variance calculations, how would you denote the mean?

Example 4.4 Suppose we have the data set -4, -2, 0, 2, and 4. Find the mean of the data. Also compute variance in 2 ways (one assuming that this is a sample, the other assuming that this represents the entirety of the population). How does the variance relate to that in Example 4.3? Is this surprising or can you show why this is true?

Statistics is the science of collecting, analyzing, presenting, and interpreting data.

Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.

Data set is all the data collected in a particular study.

Elements are the individual entities of a data set.

A variable is a characteristic of interest for the elements.

An observation is the set of measurements obtained for a particular element.

4.2 Qualitative Random Variables

There are two main types of variables, qualitative (aka categorical) and quantitative (aka numerical). Qualitative data has labels or names used to identify an attribute of an element. Qualitative data use either the nominal or ordinal scale of measurement. Nominal scale is such that order does not matter. Ordinal scale is such that order does matter. The order or rank of the data is meaningful. Quantitative data has numeric values that indicate how much or how many of something. Quantitative data uses either the interval or ratio scale.
Interval scale is such that ratios of quantities are not meaningful. Ratio scale is such that ratios of quantities are meaningful.

Note: We can use numeric values to represent categorical data. This is often done when working with a data set. For example, suppose we are interested in grade level of a student. Instead of using the values of Freshman, Sophomore, Junior, and Senior, we could use the values 1, 2, 3, and 4. Since the numbers represent categories, grade level is a qualitative variable. When referring to a variable, we can describe it as qualitative or quantitative, and one of nominal, ordinal, interval, or ratio.

Cross-sectional data is data collected at the same or approximately the same point in time.

Time series data is data collected over several time periods.

Example 4.5 Wabash College student data set

Gender   Grade       Hometown       Major        Pieces of Candy Consumed
Male     Sophomore   Indianapolis   Psychology   15
Male     Senior      Crown Point    Spanish      12
Male     Senior      Lombard        Religion     8
Male     Freshman    Indianapolis   Philosophy   10

• What is the entire spreadsheet of data called?
• Each student is what?
• How many elements are in the data set?
• How many variables are in the data set?
• List the 3rd observation.
• What type of variable is each variable in the data set (be sure to answer both qualitative or quantitative as well as nominal, ordinal, interval, or ratio)?

Example 4.6 For this example, answer what type of variable each of the following are (be sure to answer both qualitative or quantitative as well as nominal, ordinal, interval, or ratio). Smoking status, SAT score, income, level of satisfaction, GPA, clothing size (s, m, l, xl), and time taken to run a mile.

Example 4.7 For this problem, state whether the variables included are cross-sectional or time series.
• Current GPAs of Purdue Statistics Graduate Students vs. GPA of Sanvesh during his time at Purdue.
• Value of Gordon Gekko’s portfolio over the previous 3 years vs.
Value of all portfolios at Charles Schwab in January 2008.
• Total salary of the LA Lakers throughout the 1990s vs. Salaries of all NBA teams in 1994.

4.3 Sampling

Where does data come from? Sources of data can be existing sources (employee records, student records, medical history, etc.), surveys (teacher evaluations, Amazon buyer reports), experiments, or observational studies.

Population is the set of all elements of interest in a particular study.

Sample is a subset of the population.

Census is a survey designed to collect data from the entire population.

Statistical inference is the process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population.

Some of the reasons that people use samples as opposed to looking at the whole population are time, money, etc.

Types of Sampling

Simple random sampling, abbreviated SRS, is a sample selected such that each possible sample of size n has the same probability of being selected. In particular, each element in the population has an equal chance of being picked to be in the sample.

Sampling with replacement is sampling where the elements are put back in the population after being selected for the sample. This allows an element a chance of being selected more than once for a single sample.

Sampling without replacement is sampling where the elements are not put back in the population after being selected for the sample. This allows an element a chance of being selected at most once for a single sample.

Stratified random sample is a probability sampling method in which the population is first divided into strata (groups) and a simple random sample is then taken from each stratum.

Probability sampling is sampling where elements are selected from a population with a known probability of being included in the sample.
It could give equal probability to each element (this is the SRS) or to elements in a group (stratified sampling) or have any legitimate probability model for inclusion for each element.

Cluster sampling is sampling where the elements in the population are first divided into separate groups called clusters and then a simple random sample of the clusters is taken. This means that all elements in a selected cluster are part of the sample.

Systematic sampling is a probability sampling method in which we randomly select one of the first k elements and then every kth element thereafter is picked.

Convenience sampling is a nonprobability method of sampling whereby elements are selected for the sample on the basis of convenience.

Judgment sampling is a nonprobability method of sampling whereby elements are selected for the sample based on the judgment of the person doing the study.

Example 4.8 I am going to write this in terms of lines. Elegant, extravagant elephants entertain every evening at seven. They serve escargot and eggs benedict and endive. Eight elderly elegant elephants elevate themselves to the expensive entrance with elevators exceeding expectations. Eating everything edible, elephants expand exponentially. "Excellent!" the entertained elephants express after the entertaining entrees were served. Everything was expedited by the energetic efforts of the executive elephant empress. Everyone was entertained to excess and enjoyed the edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea. This example will be led by your instructor.
• Count the number of "e"s in this paragraph.
• Randomly pick 1 of the 7 lines and count the "e"s in that line. Then, multiply that number by 7 to get an estimate of the total. How accurate is your estimate?

4.4 Summarizing Data Information

Bias is an important concept in statistics. It can refer to the design of a study, the way a question is asked, or the value of a statistic.
A design is said to be biased if it systematically favors certain outcomes. This can apply to how a question is asked too. Bias can also be defined as consistent, repeated deviation of the sample statistic from the population parameter in the SAME direction when we take many samples. This means that the statistic is either consistently below the parameter or consistently above it. When creating a survey, you want to pay particular attention to trying to avoid bias. Some things to avoid are confusing wording, asking a question no one would remember, leading the question to a certain answer, and asking embarrassing (or very personal) questions.

How to summarize qualitative data: You can use a frequency distribution, percent relative frequency, bar or column graphs, and pie charts.

Frequency Distribution is a summary of data showing the number (frequency) of data values in each of several nonoverlapping classes.

Relative Frequency Distribution is a summary of data showing the fraction or proportion of data values in each of several nonoverlapping classes.

Percent Frequency Distribution is a summary of data showing the percentage of data values in each of several nonoverlapping classes.

Typically the above 3 distributions are summarized in table form. The relative frequency distribution is akin to a pmf. The above 3 distributions can also be represented by a bar graph or pie chart.

Bar graph is a graphical device used for depicting qualitative data that have been summarized by any of the above 3 distributions.

Pie chart is a graphical device used for presenting data summaries based on a subdivision of a circle into sectors that correspond to the relative frequency for each class.

How to summarize quantitative data: You can use dot plots, relative or % frequency, histograms, cumulative distributions, or stem and leaf plots.

Dot Plot is a graphical device that summarizes data by the number of dots above each data value on the horizontal axis.
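The frequency, relative frequency, and percent frequency distributions defined above can be tabulated in a few lines of code. The following is a minimal Python sketch; the function name and the list of class labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return one row per class, sorted by class label:
    (class, frequency, relative frequency, percent frequency)."""
    counts = Counter(values)
    n = len(values)
    return [(label, freq, freq / n, 100 * freq / n)
            for label, freq in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical qualitative data (for example, survey responses).
    responses = ["A", "B", "A", "C", "A", "B", "A", "C"]
    print(f"{'class':<6}{'freq':>6}{'rel. freq':>11}{'% freq':>9}")
    for label, freq, rel, pct in frequency_table(responses):
        print(f"{label:<6}{freq:>6}{rel:>11.3f}{pct:>9.1f}")
```

Because every data value falls in exactly one class, the relative frequencies add up to 1, which is what makes the relative frequency distribution akin to a pmf.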
Histogram is a graphical presentation of a frequency distribution, relative frequency distribution, or percent frequency distribution of a quantitative variable. It is constructed by placing the class intervals on the horizontal axis and the frequencies, relative frequencies, or percent frequencies on the vertical axis. When making a histogram, you need to pick an adequate number of classes (or, equivalently, an appropriate width of the interval for each class). You do not want so few classes that you lose most of the information, nor so many classes that most of the frequencies are low.

It should be noted that while bar graphs look similar to histograms they are quite different. Their similarities are that they are constructed using bars and the y-axis is one of frequency, percent frequency, or relative frequency. Their main difference is that a bar graph summarizes a qualitative variable and a histogram summarizes a quantitative variable. Additionally, the bars in a histogram touch, but the bars in a bar graph do not touch. The reason for this last difference is about the use of histograms. You want to get an idea of the distribution of your variable. We can look at a histogram in much the same way as a pdf. Often a use of a histogram is to try and see if you can fit a named distribution (like a Normal or Exponential) to the variable of interest.

Cumulative Frequency Distribution is a summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class. If you had a data set of n values, we could think of the cumulative frequency distribution as being n*F(x), where F(x) is the cdf as defined previously.

Cumulative Relative Frequency Distribution is a summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class. This is equivalent to the cdf.
However, the definition might be a little strange as it has been adapted to fit the concept of a histogram (using class limits as opposed to the data value). This definition is used in the case where you do not know the data, just a summary of the data.

Cumulative Percent Frequency Distribution is a summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class.

Ogive is a graph of a cumulative distribution.

Line graphs are used to summarize time series data. A typical line graph has time on the x-axis and the variable on the y-axis.

Stem-and-leaf plot is a technique that orders quantitative data points and provides insight about the shape of the distribution. To make a stem-and-leaf plot, the last digit of the number is the leaf and the rest of the number is the stem. Additionally, any stem that is not used, but is within the range of the data, is kept in the plot. You can create split-stem plots or trimmed data stem-and-leaf plots also.

Example 4.9 Suppose our data set is the numbers 1, 3, 5, 7, 12, 15, 17, 19, 21, 21, 21, 30, 33, 39, and 56. Create a stem-and-leaf plot of the data.

Scatter Diagram or scatterplot is a graphical representation of the relationship between 2 quantitative variables. This topic will be addressed on November 30th.

4.5 Relationships between Two Variables

A crosstabulation (sometimes known as a contingency table) is a summary of data for 2 qualitative variables. The classes for one variable are the rows and the classes for the other variable are the columns. The entries of the table are frequencies. When we look at crosstabulations, we examine 3 types of probabilities: joint, marginal, and conditional.

Joint distribution is how the 2 variables are distributed together.

Marginal distribution is how 1 variable is distributed without accounting for the other variable.
Conditional distribution is how 1 variable is distributed given a particular value of the other variable.

Calculations of these probabilities involve cell totals, row or column totals, and the overall total.

Example 4.10 Suppose we polled 100 students, 50 of whom went to class yesterday and 50 of whom did not. We asked them whether or not they were happy. Suppose that 2 of the students who went to class were happy, while 40 of the students who did not go to class were happy.
• Create a crosstabulation for this situation.
• For each of the following, state whether it is a joint, marginal, or conditional probability, and calculate the probability.
  – A student is happy
  – A student was in class yesterday
  – A student was not in class and not happy
  – A student was happy knowing they were in class
  – A student was in class knowing that they were happy

             class   no class   total
happy            2         40      42
not happy       48         10      58
total           50         50     100

Example 4.11 Let us examine the following crosstabulation:

         Married   Divorced/Widowed   Never Married   Total
Men           78                 24              11
Women         64                 32              25
Total

• What percent of men are married?
• What percent of people in the sample are divorced/widowed?
• If we pick a random person who was never married, what is the probability that they are male?
• What is the probability that a person is married and male?
• Knowing the person is female, what is the probability they are divorced/widowed?
• Are these joint, marginal, or conditional probabilities?

As previously discussed, crosstabulations are a way to summarize the relationship between 2 categorical (qualitative) random variables. The χ² test is a way to test whether or not these variables have a relationship. Below are the 8 steps necessary for a χ² test.
1. Define the Null (H0) and Alternative (HA) hypotheses.
2. (If necessary) Calculate the row, column, and overall totals.
3. Calculate the expected counts.
4. Calculate the partial χ² values (a χ² value for each cell of the table).
5. Calculate the χ² statistic.
6. Calculate the degrees of freedom (df).
7. Find the χ² critical value (from the chart).
8. Draw your conclusion.

Example 4.12 A 2011 study was conducted in Kalamazoo, Michigan. The objective was to determine if parents’ marital status affects children’s marital status later in their life. In total, 2,000 children were interviewed. The columns refer to the parents’ marital status. Use the two-way table below to conduct a χ² test from beginning to end. Use α = .10.

(Observed Counts)   Parents Married   Parents Divorced   Total
Child Married                   581                487
Child Divorced                  455                477
Total

Example 4.13 The following two-way table contains enrollment data for a random sample of students from several colleges at Purdue University during the 2006-2007 academic year. The table lists the number of male and female students enrolled in each college. Use the two-way table to conduct a χ² test from beginning to end. Use α = .01.

(Observed Counts)   Liberal Arts   Science   Engineering   Total
Female                       378        99           104
Male                         262       175           510
Total

Example 4.14 Here is a two-way table from a survey of male students in six secondary schools in Malaysia. Use the two-way table to conduct a χ² test from beginning to end. Use α = .05.

(Observed Counts)                                      Student Smokes   Student does not Smoke   Total
At least 1 close family member died from lung cancer               18                      110
At least 1 close family member smokes                             115                      207
No close family member smokes                                      25                       75
Total

Variance is a measure of the variability for 1 quantitative variable.

Covariance and correlation are both measures of how 2 quantitative variables change together. So the question becomes which to use and why. The answer lies in the values these 2 concepts can take. Covariance is unbounded, meaning it can be anything from -∞ to +∞. However, correlation is always between -1 and 1. A large (+ or -) covariance does not necessarily mean there is a strong relationship (or association) between the 2 variables.
The reason is that a large covariance could be caused by a large variance in 1 or both of the variables. However, a large (+ or -) correlation does mean there is a strong relationship between the 2 variables.

To classify the strength of a relationship we use the value of the correlation coefficient. This is denoted ρ for a population and r for a sample. I will state the rules with respect to ρ, but they can be used with r too.
For |ρ| = 1, we say they have a perfect, linear relationship.
For .8 ≤ |ρ| < 1, we say they have a strong, linear relationship.
For .5 ≤ |ρ| < .8, we say they have a moderate, linear relationship.
For 0 < |ρ| < .5, we say they have a weak, linear relationship.
For ρ = 0, we say they have no linear relationship.

Calculations:
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population covariance: $\sigma_{x,y} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$
Sample covariance: $s_{x,y} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
Population correlation: $\rho_{x,y} = \frac{\sigma_{x,y}}{\sigma_x \sigma_y}$
Sample correlation: $r_{x,y} = \frac{s_{x,y}}{s_x s_y}$

Example 4.15 What is the average airspeed velocity of an unladen swallow? Suppose you collect sample data on African and European swallows.

African    18   22   26   30
European   21   22   25   28

Calculate the means, variances, and standard deviations of each variable. Additionally, calculate the covariance and correlation between the 2 variables.

Example 4.16 You wonder how sleep affects productivity. You take a sample of 4 of your friends and measure last night’s sleep and today’s productivity in hours. Here are the results:

Sleep           2    4    6   10
Productivity    4   14   12    7

Calculate the means, variances, and standard deviations of each variable. Additionally, you were told that the covariance is .83. Calculate the correlation coefficient.

Example 4.17 Jeremy wonders how much his students pay attention and if distractions (phone, a classmate, etc.) have any influence on them. He collects sample data, and reports the following:

# of Distractions              0    2    4    6
% of Time Paying Attention    85   60   30   15

Jeremy has calculated the correlation as -.992277877. He has 2.581988897 and 31.22498999 as the standard deviations of # of Distractions and % of Time Paying Attention respectively.
Use this information to calculate the covariance and the variances.

Example 4.18 Adapted from Spring 2012 Final Exam Problem 1. Use the sample data below to answer the following questions:

X    -8    6   10   -12   -1
Y     5    8   10     4    3
Z     4   -6    5    -3    9

• Compute s_x².
• Suppose you are given that r_{x,z} is .0795 and s_z is 6.14. Compute s_{x,z}.
• In addition to all of the previous information, suppose you are given s_{x,y} is 21.75, s_{y,z} is -4.25, and s_y is 2.9155. Rank the pairs of variables from weakest relationship to strongest relationship.

If you are looking for extra practice problems for this material, see Spring 2010 Exam 1 Problem 8, and/or Fall 2009 Exam 1 Problem 6.

Properties of Correlation:
1. It is always between -1 and 1 inclusive.
2. It has the same sign as the slope of the line of best fit.
3. It is severely affected by outliers. Removing an outlier that falls away from the overall pattern will typically increase the | correlation |.
4. It has no units of measurement and is therefore unaffected by changes of units of measurement.
5. It is the same if you have the same 2 variables, no matter which one you call x and which one you call y.

A scatterplot is a graph representing the relationship between 2 quantitative variables. Each dot on the graph represents one observation from the data. There are 3 main questions we ask about how a scatterplot looks. They concern form, strength, and direction. The form refers to linear, quadratic, sinusoidal, etc. The strength is given as an ordinal, qualitative variable with levels like weak, moderate, and strong. Sometimes people use very weak or very strong as well. The direction is positive or negative (upward sloping or downward sloping). Remember, r and ρ have the same sign as the slope, so both of them can be used to tell the direction of the relationship.

A trendline is sometimes called a regression line or a line of best fit. It fits a line to the data by minimizing the sum of squares of the vertical distances from the points to the line.
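The least-squares idea can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the standard closed-form solution (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x), which these notes do not state explicitly, and the function names and data are hypothetical.

```python
from statistics import mean


def fit_trendline(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data (an assumption for this sketch)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = fit_trendline(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
# Any other line, for example one with a slightly steeper slope, does worse
nudged = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
print(intercept, slope)  # roughly 2.2 and 0.6
print(best < nudged)     # True
```

Comparing the fitted line against a nudged one is a quick way to see what "minimizing the sum of squares" means in practice.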
A trendline is written in slope-intercept form, y = β0 + β1 x. This represents the true value of y, and β0 and β1 are the population intercept and slope. However, we typically do not know β0 and β1, so they must be estimated instead. Therefore, you will see this as ŷ = b0 + b1 x, where b0 and b1 represent the sample values or estimates of their population counterparts. Any variable in statistics that is written with a hat (^) denotes a prediction, or predicted value.

Another concept in Statistics is that of a residual. A residual is defined to be your observed value minus your predicted value. So using our symbols, the ith residual (or residual from the ith observation) would be e_i = y_i - ŷ_i.

Some typical questions involving trendlines are to interpret the slope, to interpret the intercept, and to make predictions. Additionally, we can ask how much you expect y to change if x changes by a certain amount.

r² is just the square of the sample correlation coefficient. It is known as the coefficient of determination. It represents the proportion of the variability in y explained by the linear relationship with x.

Example 4.19 For these examples, we will revisit Example 4.16 through Example 4.18. Answer the following questions:
• Interpret the y-intercept.
• Interpret the slope.
• Interpret the r² value.
• Calculate the value of r.
• Is a prediction at the value of 4 appropriate? If so, what is the predicted value?
• Is a prediction at the value of 22 appropriate? If so, what is the predicted value?
• If applicable, calculate a residual from your predicted value(s) above. What does this tell you about the position of the observation compared to the regression line?
• If one were to increase x by 2 units, how would you expect y to change?
• If one were to decrease x by 3 units, how would you expect y to change?

Example 4.20 Use the graphs labelled Graphs 1-4. You have the following possibilities for r values: -1, -.9696, -.4611, -.0490, 0, .0490, .5737, .9696, and 1.
Pick the appropriate values for the 4 graphs.

If you are looking for extra practice for values of r, you can go to Fall 2011 Final Exam Problem 2 or Spring 2012 Final Exam Problem 4. If you are looking for extra practice with scatterplot and regression questions, you can go to Fall 2011 Final Exam Problem 5 or Spring 2012 Final Exam Problem 3.

[Graphs 1-4: four scatterplots for Example 4.20; figures not reproduced here.]
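When practicing with values of r, it can help to check hand calculations numerically. The sketch below is a minimal illustration, not part of the original notes: it assumes the usual sample formulas with an n - 1 divisor, and the function names are hypothetical. It uses the data from Example 4.17 and compares the results with the standard deviations and correlation that the example reports.

```python
from math import sqrt
from statistics import mean


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 divisor (assumed convention)."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / (len(xs) - 1)


def sample_sd(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return sqrt(sample_covariance(xs, xs))


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


# Data from Example 4.17
distractions = [0, 2, 4, 6]
attention = [85, 60, 30, 15]

sd_distractions = sample_sd(distractions)
sd_attention = sample_sd(attention)
r = sample_correlation(distractions, attention)

print(sd_distractions)  # Example 4.17 reports 2.581988897
print(sd_attention)     # Example 4.17 reports 31.22498999
print(r)                # Example 4.17 reports -.992277877
```

Agreement with the reported values suggests that the n - 1 divisor is the convention used in these examples.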
