Download 1 - Department of Statistics and Probability
What have we learned?
Shifting data by adding or subtracting the same amount from each value affects measures of center and position but not measures of spread.
Rescaling data by multiplying or dividing every value by a constant changes all the summary statistics: center, position, and spread.

What have we learned? (cont.)
We’ve learned the power of standardizing data.
Standardizing uses the SD as a ruler to measure distance from the mean (z-scores).
With z-scores, we can compare values from different distributions or values based on different units.
z-scores can identify unusual or surprising values among data.

What have we learned? (cont.)
We’ve learned that the 68-95-99.7 Rule can be a useful rule of thumb for understanding distributions:
For data that are unimodal and symmetric, about 68% fall within 1 SD of the mean, about 95% fall within 2 SDs of the mean, and about 99.7% fall within 3 SDs of the mean.

What have we learned? (cont.)
We see the importance of Thinking about whether a method will work:
Normality Assumption: We sometimes work with Normal tables (Table Z). These tables are based on the Normal model.
Data can’t be exactly Normal, so we check the Nearly Normal Condition by making a histogram (is it unimodal, symmetric, and free of outliers?) or a Normal probability plot (is it straight enough?).

Ex. 6.10
Cars currently sold in the US have an average of 135 horsepower, with a standard deviation of 40 horsepower. What is the z-score for a car with 195 horsepower?
z = (195 - 135)/40 = 1.5

Ex. 6.12
People with z-scores greater than 2.5 on an IQ test are sometimes classified as geniuses. If IQ test scores have a mean of 100 and a standard deviation of 16 points, what IQ score do you need to be considered a genius?
2.5 = (x - 100)/16, so x = 140

[Table: Frequency table for quiz 1 grades]
[Table: Descriptive statistics for grades by section]
[Figure: Box plots for grades by section]

Assume that I picked a student with a score of 10 points from each section.
Will this mean that these students are equivalent in terms of their success?
Section 10: Mean = 13.33, Std = 3.241, z-score = (10 - 13.33)/3.241 = -1.027
Section 11: Mean = 13.300, Std = 3.064, z-score = (10 - 13.3)/3.064 = -1.077
Section 12: Mean = 12.567, Std = 3.07, z-score = (10 - 12.567)/3.07 = -0.836

Ex. 6.42
In a standard Normal model, what value(s) of z cut(s) off the region described?
A) The lowest 12%: z = -1.175
B) The highest 30%: z ≈ 0.52
C) The highest 7%: z ≈ 1.48
D) The middle 50%: (-0.67, 0.67)

Ex. 6.43
Based on the Normal model N(100, 16) describing IQ scores, what percent of people’s IQs would you expect to be
A) Over 80? z = (80 - 100)/16 = -1.25; 1 - 0.1056 = 0.8944 ⇒ 89.4%
B) Under 90? z = (90 - 100)/16 = -0.625; averaging the table values for -0.62 and -0.63 gives (0.2676 + 0.2643)/2 = 0.2659 ⇒ 26.6%
C) Between 112 and 132? z1 = (112 - 100)/16 = 0.75, z2 = (132 - 100)/16 = 2.00; the table value for 2.00 minus the table value for 0.75 is 0.9772 - 0.7734 = 0.2038 ⇒ 20.4%

Ex. 6.27
Environmental Protection Agency (EPA) fuel economy estimates for automobile models tested recently predicted a mean of 24.8 mpg and a standard deviation of 6.2 mpg for highway driving. Assume that the distribution is mound-shaped (i.e., a Normal model applies).
A) Draw the model for auto fuel economy. Clearly label it, showing what the 68-95-99.7 Rule predicts about miles per gallon.
B) In what interval would you expect the central 68% of autos to be found?
C) About what percent of autos should get more than 31 mpg?
D) About what percent of autos should get between 31 and 37 mpg?
E) Describe the gas mileage of the worst 2.5% of all cars.

Chapter 14
From Randomness to Probability

Thinking Challenge
What’s the probability of getting a head on the toss of a single fair coin? Use a scale from 0 (no way) to 1 (sure thing).
So toss a coin twice. Do it!
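For readers without a coin at hand, the two-toss experiment can be tried in software. The following is only a sketch, assuming Python's standard `random` module; the function names `toss_two_coins` and `share_one_head_one_tail` and the fixed seed are illustrative choices, not part of the slides.

```python
import random

def toss_two_coins(rng):
    """Toss two fair coins and return the faces as a string such as 'HT'."""
    return "".join(rng.choice("HT") for _ in range(2))

def share_one_head_one_tail(n_trials, seed=0):
    """Repeat the two-toss experiment and return the share of repetitions
    that produced exactly one head and one tail (in either order)."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    hits = sum(toss_two_coins(rng) in ("HT", "TH") for _ in range(n_trials))
    return hits / n_trials

# A single run of the experiment: any of HH, HT, TH, TT can come up.
print(toss_two_coins(random.Random()))

# Many repetitions: the share of "one head, one tail" results.
print(share_one_head_one_tail(10_000))
```

A single run frequently does not give one head and one tail; it is only over many repetitions that the share settles down.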
Did you get one head & one tail? What’s it all mean?

Many Repetitions!
[Figure: proportion of heads (Total Heads / Number of Tosses, scale 0.00 to 1.00) plotted against Number of Tosses (0 to 125)]

Dealing with Random Phenomena
A random phenomenon is a situation in which we know what outcomes could happen, but we don’t know which particular outcome did or will happen.
In general, each occasion upon which we observe a random phenomenon is called a trial.
At each trial, we note the value of the random phenomenon, and call it an outcome.
The most basic outcome of a trial is a sample point.
The collection of all possible outcomes is called the sample space.

Visualizing Sample Space
1. Listing: for tossing a coin once and noting the up face, S = {Head, Tail}; each element is a sample point
2. A pictorial method for presenting the sample space
3. Venn diagram
[Figure: Venn diagram of S containing the sample points H and T]

Example
Tossing two coins and recording up faces: Is the sample space as below?
S = {HH, HT, TT}

Tree Diagram
[Figure: tree diagram; the 1st coin branches to H or T, and each branch splits again into H or T for the 2nd coin]

Sample Space Examples
Experiment                      Sample Space
Toss a Coin, Note Face          {Head, Tail}
Toss 2 Coins, Note Faces        {HH, HT, TH, TT}
Select 1 Card, Note Kind        {2♥, 2♠, ..., A♦} (52)
Select 1 Card, Note Color       {Red, Black}
Play a Football Game            {Win, Lose, Tie}
Inspect a Part, Note Quality    {Defective, Good}
Observe Gender                  {Male, Female}

Events
1. Event: a specific collection of sample points
2. Simple Event: contains only one sample point
3. Compound Event: contains two or more sample points

Venn Diagram
Trial: Toss 2 coins, note faces.
Sample space: S = {HH, HT, TH, TT}; each outcome is a sample point.
Compound event: at least one tail, {HT, TH, TT}

Venn Diagram
Trial: Toss 2 coins, note faces.
Sample space: S = {HH, HT, TH, TT}
Simple event: tail for both tosses, {TT}

Thinking Challenge
A fair coin is tossed until the first head appears or four tails occur in a row. Which of the following is the sample space for this experiment?
a. S = {T, TH, TTH, TTTH, TTTT}
b. S = {T, HT, TTH, TTTH, TTTT}
c. S = {H, TH, TTH, TTTH, TTTT}
d. S = {H, HT, HHT, HHHT, HHHH}

The Law of Large Numbers
First a definition.
When thinking about what happens with combinations of outcomes, things are simplified if the individual trials are independent.
Roughly speaking, this means that the outcome of one trial doesn’t influence or change the outcome of another.
For example, coin flips are independent.

The Law of Large Numbers (cont.)
The Law of Large Numbers (LLN) says that the long-run relative frequency of repeated independent events gets closer and closer to a single value.
We call the single value the probability of the event.
Because this definition is based on repeatedly observing the event’s outcome, this definition of probability is often called empirical probability.

The Nonexistent Law of Averages
Many people believe, for example, that an outcome of a random event that hasn’t occurred in many trials is “due” to occur. A common term for this is the Law of Averages, which doesn’t exist at all.
The LLN says nothing about short-run behavior.
Relative frequencies even out only in the long run, and this long run is really long (infinitely long, in fact).
If the probability of an outcome doesn’t change and the events are independent, the probability of any outcome in another trial is always what it was, no matter what has happened in other trials.

Modeling Probability
When probability was first studied, a group of French mathematicians looked at games of chance in which all the possible outcomes were equally likely.
It’s equally likely to get any one of six outcomes from the roll of a fair die.
It’s equally likely to get heads or tails from the toss of a fair coin.
However, keep in mind that events are not always equally likely.
A skilled basketball player has a better than 50-50 chance of making a free throw.

Modeling Probability (cont.)
When outcomes are equally likely, the probability of each one is just 1/(# of possible outcomes).
So the probability of rolling a 3 with a fair die is 1/6.
The probability of picking the ace of spades from the top of a well-shuffled deck is 1/52.
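The equally-likely rule just stated can be checked with a few lines of code. This is a minimal sketch, assuming Python and exact fractions; the function name `prob_single_outcome` and the way the deck is encoded are illustrative choices, not taken from the slides.

```python
from fractions import Fraction

def prob_single_outcome(sample_space):
    """Probability of any one outcome when all outcomes are equally likely:
    1 / (number of possible outcomes)."""
    return Fraction(1, len(sample_space))

# Sample space for one roll of a fair die.
die = [1, 2, 3, 4, 5, 6]

# Sample space for drawing the top card of a well-shuffled 52-card deck.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "spades", "clubs", "diamonds"]
deck = [(rank, suit) for rank in ranks for suit in suits]

print(prob_single_outcome(die))   # probability of rolling a 3: 1/6
print(prob_single_outcome(deck))  # probability of the ace of spades: 1/52
```

Note that the rule only applies when the outcomes really are equally likely; it would give the wrong answer for the free-throw example.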
It’s almost as simple to find probabilities for events that are made of several equally likely outcomes.

Modeling Probability (cont.)
The probability of an event is the number of outcomes in the event divided by the total number of possible outcomes.
P(A) = (# of outcomes in A) / (# of possible outcomes)

Modeling Probability (cont.)
The probability of drawing a face card (J, Q, K) from a deck is
P(face card) = # face cards / # cards = 12/52

Formal Probability
1. Two requirements for a probability:
A probability is a number between 0 and 1.
For any event A, 0 ≤ P(A) ≤ 1.

Formal Probability (cont.)
2. Probability Assignment Rule:
The probability of the set of all possible outcomes of a trial must be 1.
P(S) = 1 (S represents the set of all possible outcomes.)

Formal Probability (cont.)
3. Complement Rule:
The set of outcomes that are not in the event A is called the complement of A, denoted A^C.
The probability of an event occurring is 1 minus the probability that it doesn’t occur: P(A) = 1 - P(A^C)

Formal Probability (cont.)
Events that have no outcomes in common (and, thus, cannot occur together) are called disjoint (or mutually exclusive).

Formal Probability (cont.)
4. Addition Rule:
For two disjoint events A and B, the probability that one or the other occurs is the sum of the probabilities of the two events.
P(A or B) = P(A) + P(B), provided that A and B are disjoint.

Example
d) For the probabilities given in part c, P(A)=0.3, P(B)=0.2 P(A)=0.25, P(B)=0.3
i) Under the assumption that events A and B are disjoint, find P(A or B).
ii) Find P(A^C).

Ex. 14.19 from the textbook
A consumer organization estimates that over a 1-year period 17% of cars will need to be repaired once, 7% will need repairs twice, and 4% will require three or more repairs. What is the probability that a car chosen at random will need
a) no repairs? P(NR) = 1 - (0.17 + 0.07 + 0.04) = 0.72
b) no more than one repair? P(NR) + P(1R) = 0.72 + 0.17 = 0.89
c) some repairs? P(1R) + P(2R) + P(3mR) = 0.17 + 0.07 + 0.04 = 0.28

Formal Probability
5.
Multiplication Rule:
For two independent events A and B, the probability that both A and B occur is the product of the probabilities of the two events.
P(A and B) = P(A) x P(B), provided that A and B are independent.

Formal Probability (cont.)
5. Multiplication Rule (cont.):
Two independent events A and B are not disjoint, provided the two events have probabilities greater than zero.

Formal Probability (cont.)
5. Multiplication Rule (cont.):
Many Statistics methods require an Independence Assumption, but assuming independence doesn’t make it true.
Always Think about whether that assumption is reasonable before using the Multiplication Rule.

Q2
A certain bowler can bowl a strike 70% of the time. What is the probability that she
A) goes three consecutive frames without a strike? 0.3 * 0.3 * 0.3 = 0.027
B) makes her first strike in the third frame? 0.3 * 0.3 * 0.7 = 0.063
C) has at least one strike in the first three frames? 3 * (0.3 * 0.3 * 0.7) + 3 * (0.3 * 0.7 * 0.7) + (0.7 * 0.7 * 0.7) = 0.973, or 1 - P(no strike) = 1 - (1 - 0.7)^3 = 0.973
D) bowls a perfect game (12 consecutive strikes)? (0.7)^12 = 0.0138

Just checking
Opinion polling organizations contact their respondents by telephone.
Random telephone numbers are generated, and interviewers try to contact those households. According to the Pew Research Center for the People and the Press, by 2003 this contact rate had risen to 76%. We can reasonably assume each household’s response to be independent of the others.
a) What is the probability that the interviewer successfully contacts the next household on the list? 0.76
b) What is the probability that the interviewer successfully contacts both of the next two households on her list? 0.76 * 0.76
c) What is the probability that the interviewer’s first successful contact is the third household on the list? (1 - 0.76)^2 * 0.76
d) What is the probability that the interviewer makes at least one successful contact among the next five households on the list? 1 - (1 - 0.76)^5

Example Solution
a. S = {Brown, Yellow, Red, Blue, Orange, Green}
b. P = {0.13, 0.14, 0.13, 0.24, 0.20, 0.16}
c. Let event A = selecting a brown candy. P(A) = P(Brown) = 0.13
d. Let event B = selecting a red, green, or yellow candy. P(B) = 0.13 + 0.16 + 0.14 = 0.43
e. Let event C = selecting a candy other than blue. P(C) = 0.13 + 0.14 + 0.13 + 0.20 + 0.16 = 0.76, or P(C) = 1 - 0.24 = 0.76

Formal Probability - Notation
Notation alert: In the textbook the notations P(A or B) and P(A and B) are used. In other situations, you might see the following:
P(A ∪ B) instead of P(A or B)
P(A ∩ B) instead of P(A and B)

What have we learned?
Probability is based on long-run relative frequencies.
The Law of Large Numbers speaks only of long-run behavior.
Watch out for misinterpreting the LLN.

What have we learned? (cont.)
There are some basic rules for combining probabilities of outcomes to find probabilities of more complex events. We have the:
Probability Assignment Rule
Complement Rule
Addition Rule for disjoint events
Multiplication Rule for independent events
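The rules listed above can be applied mechanically to the polling and candy examples. The sketch below assumes Python; the helper names `complement`, `either_disjoint`, and `all_independent` are hypothetical labels for the Complement, Addition, and Multiplication Rules, and the independence and disjointness conditions are assumed exactly as in the examples rather than checked by the code.

```python
def complement(p):
    """Complement Rule: P(A^C) = 1 - P(A)."""
    return 1 - p

def either_disjoint(*probs):
    """Addition Rule: P(A or B or ...) for events assumed to be disjoint."""
    return sum(probs)

def all_independent(*probs):
    """Multiplication Rule: P(A and B and ...) for events assumed independent."""
    result = 1.0
    for p in probs:
        result *= p
    return result

# Polling example: each household is contacted with probability 0.76.
contact = 0.76
no_contact = complement(contact)

both_next_two = all_independent(contact, contact)
first_contact_is_third = all_independent(no_contact, no_contact, contact)
at_least_one_in_five = complement(all_independent(*[no_contact] * 5))

# Candy example: the color probabilities must add up to 1
# (Probability Assignment Rule).
candy = {"brown": 0.13, "yellow": 0.14, "red": 0.13,
         "blue": 0.24, "orange": 0.20, "green": 0.16}
assert abs(sum(candy.values()) - 1) < 1e-9

red_green_or_yellow = either_disjoint(candy["red"], candy["green"], candy["yellow"])
not_blue = complement(candy["blue"])

print(round(both_next_two, 4))           # 0.5776
print(round(first_contact_is_third, 4))  # 0.0438
print(round(at_least_one_in_five, 4))    # 0.9992
print(round(red_green_or_yellow, 2))     # 0.43
print(round(not_blue, 2))                # 0.76
```

Using the Complement Rule for "at least one" avoids adding up every way of getting one, two, three, four, or five contacts.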