Lecture 3, Summer Semester 2009
BEA140
By Leon Jiang, University of Tasmania

Some more points on univariate data

Central tendency
Mean, median, mode

Variance
Population: $\sigma^2 = \left( \sum X_i^2 - (\sum X_i)^2 / N \right) / N$
Sample: $s^2 = \left( \sum X_i^2 - (\sum X_i)^2 / n \right) / (n - 1)$

Standard deviation
$s = \sqrt{s^2} = \sqrt{\dfrac{\sum X_i^2 - (\sum X_i)^2 / n}{n - 1}}$

The meaning of the standard deviation
"For most data batches, around two thirds (or 68%) of the data will fall within one standard deviation of the mean, and around 95% within two standard deviations of the mean." This is the empirical rule, a rule of thumb.

MEASURING FROM GROUPED DATA

Measuring for grouped data
When no raw data are available and we only have a secondary source, we have to analyse data that have already been grouped for reporting purposes. Grouped data differ from raw data in that the observations have been placed into classes chosen by whoever reported them. The individual values inside each class are no longer known, so measures computed from grouped data are approximations and small errors exist.

Generally we use a frequency distribution table to show the grouping of data:

Time (X)      Calls fj   Class mark xj   fj*xj   fj*xj^2   Cum. freq.
1<=X<3           11            2           22       44        11
3<=X<5           19            4           76      304        30
5<=X<7           10            6           60      360        40
7<=X<9            9            8           72      576        49
9<=X<11           2           10           20      200        51
11<=X<13          1           12           12      144        52
13<=X<15          1           14           14      196        53
15<=X<17          0           16            0        0        53
17<=X<19          1           18           18      324        54
19<=X<21          0           20            0        0        54
Total            54                       294     2148

Class mark for frequency distribution of grouped data
The class mark, Xj, is a representative value for all observations located in the class. A class mark is determined by the largest value and the smallest value in the class.
$X_j = (\text{RUCL} + \text{RLCL}) / 2 = (\text{largest value} + \text{smallest value}) / 2$
where RUCL is the largest value and RLCL is the smallest value in the class.

Central tendency for grouped data
The mean of grouped data (g.d.) is defined as the weighted sum of class marks, with class frequencies as weights, divided by the number of observations, i.e.
$\bar{X} = \left( \sum f_j x_j \right) / n$
For the call data: $\bar{X} = 294 / 54 = 5.44$

Median for g.d.
1. Locate the median class, i.e. the class containing the median.
- The total number of calls in the frequency distribution is 54 (an even number).
- According to the formula for the position of the median, $(n + 1) / 2$, the median ought to be the 27.5th value.
- The class containing the 27.5th value is the median class. Cumulative frequency reaches 11 at the end of the first class and 30 at the end of the second, so the median class is 3<=X<5.

Formula for the median (MD):
MD = LCL + class width * (how far into class) / (how many in class)
MD = 3.0 + 2 * (27.5 - 11) / 19 = 4.74

Small errors likely exist most of the time
Median from raw data = 4.4
Median from grouped data = 4.74

An example:
MD = LCL + class width * (how far into class) / (how many in class)

Class                Freq.   Cum. freq.
80 and under 90        1         1
90 and under 100       2         3
100 and under 110      6         9
110 and under 120      3        12
120 and under 130      2        14
130 and under 140      2        16
Total                 16

The median is the (16 + 1) / 2 = 8.5th value, which lies in the class 100 and under 110:
MD = 100 + 10 * (8.5 - 3) / (9 - 3)
Median = 109.17

Mode for g.d.
With grouped data, we tend to talk of a modal class, the class (or classes) with the highest frequency, rather than of the mode. If asked for a mode with grouped data, the best we can do is to give the class mark of the modal class. For the call data:
Modal class: 3 and under 5 (19 observations)
Mode: 4 (class mark of the modal class)

Dispersion (variance) for grouped data
The sample variance formula is:
$s^2 = \left\{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / n \right\} / (n - 1)$
The population variance formula is:
$\sigma^2 = \left\{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / N \right\} / N$
Standard deviation = $\sqrt{s^2}$ or $\sqrt{\sigma^2}$

Preparing a table to help work out the standard deviation

Class                Freq. fj   Class mark xj   fj*xj   fj*xj^2   Cum. freq.
80 and under 90         1           85            85      7225        1
90 and under 100        2           95           190     18050        3
100 and under 110       6          105           630     66150        9
110 and under 120       3          115           345     39675       12
120 and under 130       2          125           250     31250       14
130 and under 140       2          135           270     36450       16
Total                  16          660          1770    198800

Working out the standard deviation for the example
$s^2 = \left\{ \sum f_j X_j^2 - (\sum f_j X_j)^2 / n \right\} / (n - 1) = (198800 - 1770^2 / 16) / 15 \approx 199.58$
Standard deviation: $s = \sqrt{s^2} \approx 14.13$
Mean = 1770 / 16 = 110.625

Shape
Skewness relates to the symmetry of a distribution.
Positively skewed (right skewed): the tail extends to the right; mean > median > mode.
Negatively skewed (left skewed): the tail extends to the left; mean < median < mode.

Standard scores
The standard score expresses any observation in terms of the number of standard deviations it is from the mean.
t score (for a sample): $t = (X - \bar{X}) / s$
z score (for a population): $z = (X - \mu) / \sigma$

Interpretation of a standard score
For a sample with mean 5 and standard deviation 2, the t score for the observation 8 is (8 - 5) / 2 = 1.5.
Interpretation: the observation is 1.5 standard deviations above the sample mean.
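As a rough check on the grouped-data formulas above, the following Python sketch recomputes the mean, median and sample standard deviation for the 80-to-140 example. The function name `grouped_summary` and its arguments are illustrative assumptions, not part of the lecture, and the sketch assumes that all classes have the same width.

```python
import math

def grouped_summary(lower_limits, width, freqs):
    """Mean, median and sample standard deviation from a frequency table.

    Hypothetical helper: assumes equal-width classes given by their lower limits.
    """
    n = sum(freqs)
    marks = [lo + width / 2 for lo in lower_limits]          # class marks x_j
    sum_fx = sum(f * x for f, x in zip(freqs, marks))        # sum of f_j * x_j
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))   # sum of f_j * x_j^2
    mean = sum_fx / n
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)         # sample variance s^2

    position = (n + 1) / 2   # position of the median value
    below = 0                # cumulative frequency below the current class
    median = None
    for lo, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= position:
            # LCL + class width * (how far into class) / (how many in class)
            median = lo + width * (position - below) / f
            break
        below += f
    return mean, median, math.sqrt(variance)

mean, median, sd = grouped_summary([80, 90, 100, 110, 120, 130], 10, [1, 2, 6, 3, 2, 2])
print(round(mean, 3), round(median, 2), round(sd, 2))
```

The same function can be applied to the call-duration table by passing the lower limits 1, 3, ..., 19, a width of 2 and the ten class frequencies.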
Bivariate Variables
Summary measures

Bivariate variables
In the previous parts, we were always talking about a single numerical variable, such as the rate of return of mutual funds. From this lecture on, we study two variables and the correlation between them.

Two numerical variables
A case: In a call center, operators were trained to receive phone calls. However, the duration of calls differs significantly from one call to another. The shorter the duration of a call, the more efficient an operator proves to be. Suppose the call center manager wants to know whether the training hours the operators received have any correlation with the duration of the phone calls the operators handled. The data collected are as follows:
X: training hours
Y: duration in minutes

The data look like this
X (training hours): 6.5  7.5  6    8.5  5.5  3.5   8.5  8    8    7    8.5  9.5
Y (duration mins):  6.2  2.9  9.2  3.2  8.9  13.6  2.5  4.2  4.3  3.1  3.4  2.7
X (training hours): ...
Y (duration mins):  ...
In total there are 54 phone calls in the data set being studied.
What we want to find out is whether these two variables (X, training hours of operators; Y, duration of calls in minutes) show any real correlation. Put simply, the call center manager wants to know whether the more training hours the operators receive, the shorter the calls they handle will be.

Setting up a scatter diagram for the data
A scatter diagram (scattergram) of two variables will indicate the form, type and strength of the relation.
Form: whether linear or non-linear.
Type: direct (positive) or inverse (negative).
Strength: how closely the data are co-ordinated, e.g.
if linear, how close the ordered pairs are to a line describing their relationship. This is indicated by a correlation measure.

(Pearson's) Coefficient of Correlation
This is a summary measure that describes the form, type and strength of a scattergram. The value of r lies between -1 and 1.
-1: perfect negative relationship, all points lie exactly on a negatively sloping line
0: no linear correlation
1: perfect positive relationship, all points lie exactly on a positively sloping line
$r = \dfrac{\sum XY - (\sum X)(\sum Y) / n}{\sqrt{\left[ \sum X^2 - (\sum X)^2 / n \right] \left[ \sum Y^2 - (\sum Y)^2 / n \right]}}$

Back to the case study
r (Pearson's coefficient of correlation) = -0.9209
This means X and Y have a very strong negative linear relationship. In other words, the training hours the operators received show a strong negative relationship with the duration of the calls they handled.

In-depth analysis of this linear relationship: linear regression
Determining the coefficient of correlation is concerned with summarizing the form, type and strength of the relationship between two variables. The motivation for regression is the desire to quantify the relationship, often for the purpose of using knowledge of one variable (X) to predict the other (Y).

The regression line is expressed mathematically by the equation
$Y_c = a + bX$
Yc is the computed value of Y.
a is the sample regression constant, or Y-intercept.
b is the sample regression coefficient, or slope of the line.

Least squares method
This is a mathematical technique that determines the values of a and b that minimize the sum of squared differences between the actual values of Y and the predicted values of Y. Any values for a and b other than those determined by the least-squares method result in a greater sum of squared differences.
Simply put, the least-squares method is used to find a line of best fit for two correlated variables.

Working out the linear regression
A residual is defined as the vertical distance between the actual value and the predicted value (the point on the line of best fit). In least-squares regression, we find the values of a and b such that the sum of squares of the residuals is a minimum.
Actual pairs: (X1, Y1), (X2, Y2), ...
Predicted (calculated) pairs: (X1, Yc1), (X2, Yc2), ...

Back to the case study
We know that training hours are correlated with the duration of calls. So if we know the training hours an operator received, we can in some sense predict how many minutes, on average, he or she should take to handle a phone call. In linear regression, we know X and, using the line fitted by the least squares method, we can calculate Y.

Solutions for a and b
The two formulae for b and a are, respectively:
$b = \dfrac{n \sum XY - \sum X \sum Y}{n \sum X^2 - (\sum X)^2}$
$a = \dfrac{\sum Y}{n} - b \dfrac{\sum X}{n}$

Establishing a table to work out the linear regression

Xi       Yi       Xi^2       XiYi       Yi^2
6.5      6.2      42.25      40.3       38.44
7.5      2.9      ...        ...        ...
6        9.2      ...        ...        ...
...      ...      ...        ...        ...
8.5      2.8      ...        ...        ...
7.5      5.9      ...        ...        ...
6        6.5      36         39         42.25
Total:
391.5    290.7    2974.25    1863.55    2081.69

Outcomes
b = -1.79595
a = 18.40399
Then Yc = 18.404 - 1.796X
This is the linear regression equation.
Interpretation: for each extra hour of training, there is an associated decrease of 1.796 minutes in call duration.

One consideration
Note: regression says nothing about causation, only about association. This means a change in X does not necessarily cause a change in Y. The training hours do not necessarily change the duration of calls; the two are merely correlated. Think about it: does smoking cigarettes cause a shorter life expectancy? A regression on its own cannot answer that.
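The formulas for a and b above can be checked directly from the column totals in the table. The sketch below is illustrative only; the function names `least_squares` and `predict` are assumptions introduced here, not part of the lecture.

```python
def least_squares(n, sum_x, sum_y, sum_xy, sum_x2):
    """Intercept a and slope b of Yc = a + bX, computed from column totals."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Totals taken from the lecture's table (n = 54 calls).
a, b = least_squares(54, 391.5, 290.7, 1863.55, 2974.25)

def predict(hours):
    """Computed call duration Yc (minutes) for a given number of training hours."""
    return a + b * hours

print(round(a, 5), round(b, 5))   # intercept and slope
print(round(predict(7.0), 2))     # fitted duration for 7 training hours
```

Working from the totals rather than the raw pairs mirrors the hand calculation on the slide; with the full list of 54 pairs the same totals would first be accumulated in a loop.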
The standard error of the estimate
The standard error measures how well the actual Y and the computed Y are matched: the smaller Se, the better the match and the predictive accuracy.
$S_e^2 = \dfrac{\sum (Y - Y_c)^2}{n - 2}$
$S_e = \sqrt{S_e^2}$

Note!
The standard error of the estimate is very similar to the standard deviation. The standard error measures spread around the regression line in bivariate data, whilst the standard deviation measures spread around the mean in univariate data.

Computational form for Se
You can use this computational form to find Se:
$S_e^2 = \dfrac{\sum Y^2 - a \sum Y - b \sum XY}{n - 2}$

Coefficient of determination
Total variation: $SST = \sum Y^2 - (\sum Y)^2 / n$
Explained variation: $SSR = SST - SSE$
Unexplained variation: $SSE = \sum Y^2 - a \sum Y - b \sum XY$
Coefficient of determination: $r^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}$

Coefficient of determination, r^2
For the case study, the coefficient of determination turned out to be 0.848.
This means about 85% of the total variation in call duration (around the average duration) is explained by the linear relation between duration and training hours.

We have just seen summary measures for dealing with two numerical variables. What about ordinal data?

Two ordinal variables
A scattergram can also be used to illustrate a possible relationship between two ordinal variables. We often have ordinal variables in fields such as Marketing and Management, where people have been asked to rank some attribute. An example could be a series of taste trials carried out during product development, such as the example below, where a panel was asked to rank soft drinks by "Refreshingness" and "Sweetness".
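Before turning to the ranked data, the standard error of the estimate and the coefficient of determination defined above can be checked against the call-center totals. This is a minimal sketch; the variable names are assumptions chosen for illustration, and a and b are recomputed at full precision so that rounding does not distort SSE.

```python
import math

# Column totals for the 54 calls (X = training hours, Y = duration in minutes).
n = 54
sum_x, sum_y = 391.5, 290.7
sum_x2, sum_xy, sum_y2 = 2974.25, 1863.55, 2081.69

# Least-squares slope and intercept.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

sst = sum_y2 - sum_y ** 2 / n            # total variation
sse = sum_y2 - a * sum_y - b * sum_xy    # unexplained variation
ssr = sst - sse                          # explained variation

r_squared = 1 - sse / sst                # coefficient of determination
se = math.sqrt(sse / (n - 2))            # standard error of the estimate

print(round(r_squared, 3), round(se, 3))
```

Because the slope is negative here, the correlation coefficient is the negative square root of `r_squared`.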
Understanding this example
This example shows which drink is the most refreshing, which is the second most refreshing, and so on. Likewise, it shows which is the sweetest, which is the second sweetest, and so on.

Drink        Refresh rank   Sweetness rank
Slurp              1               8
Fizz               2               7
Fizz Plus          5              10
Binge              6               9
Slam               3               5
Dunk               4               6
Whizz             10               2
Pling              9               3
Tweak              7               1
Blitz              8               4

[Figure: scattergram "Sweetness vs Refreshingness"; horizontal axis: Refreshing Rank, vertical axis: Sweetness Rank]

Spearman's Rank Correlation Coefficient
Spearman's Rank Correlation Coefficient (C.C.) can be used as a summary measure to gauge the degree of relationship between two ordinal variables.
Spearman's Rank C.C. is given the symbol $r_s$ for sample data (and $\rho_s$ for population data). $\rho$ is the Greek letter "rho" (the Greek equivalent of "r").
It is usually calculated using the following short-cut formula:
$r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}$
where $d_i$ is the difference between the ranks of the ith pair of observations, and n is the number of pairs of observations.

Notes on this short-cut formula
Strictly speaking, this formula only works when the number of ties is relatively small. If more than about 1/4 to 1/3 of the observations of a variable are in ties, the short-cut formula starts to become unreliable. When there are too many ties we need to use the "long" formula. We will deal with ties next.

What are ties?
A tie occurs when two or more observations share the same value and so cannot be ranked apart.
Dealing with ties: we allocate the average rank of all observations involved in the tie to each observation involved in the tie.
Standard & Poor's bond ratings for a random sample of 12 bonds:
C  BB  A  AA  A  BBB  CC  D  B  A  AA  AAA

Sorted from highest to lowest rating, with tied bonds given the average of the positions they occupy:

Rating      AAA   AA    AA    A    A    A    BBB   BB   B    CC   C    D
Position     1     2     3    4    5    6     7     8   9    10   11   12
Rank         1    2.5   2.5   5    5    5     7     8   9    10   11   12

Another example: two people came equal third (that is, the next person came fifth). They share the 3rd and 4th positions and thus each is given a rank of 3.5.

Placing    1     2     3*     3*     5     6     7
Ranking   1.0   2.0   3.5    3.5    5.0   6.0   7.0

Rankings with ties
When rankings involve ties they present us with two extra problems:
- how to deal with the ties;
- the short-cut formula may be unreliable if there are too many ties, in which case we need to use a longer formula.

The full Spearman formula (use when there are ties)
$r_s = \dfrac{n \sum_{i=1}^{n} X_i Y_i - \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right)}{\sqrt{\left[ n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right] \left[ n \sum_{i=1}^{n} Y_i^2 - \left( \sum_{i=1}^{n} Y_i \right)^2 \right]}}$
where $X_i$ and $Y_i$ are the ranks of the ith pair of observations.

Example using the "short-cut" formula

Drink        Refresh rank   Sweetness rank    di    di^2
Slurp              1               8          -7     49
Fizz               2               7          -5     25
Fizz Plus          5              10          -5     25
Binge              6               9          -3      9
Slam               3               5          -2      4
Dunk               4               6          -2      4
Whizz             10               2           8     64
Pling              9               3           6     36
Tweak              7               1           6     36
Blitz              8               4           4     16
Total                                                268

Result
$r_s = 1 - 6 \times 268 / (10 \times 99) = -0.624$
This indicates quite a strong negative relationship between refreshingness and sweetness (as we saw in the scattergram).

Example using the "long" formula
A students association's satisfaction ratings for 8 courses, and the seniority of the person taking each course, are listed below. Use Spearman's Rank C.C. to investigate the relationship between the two.

End of Module 2
We now move on to Module 3!
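The short-cut Spearman formula and the average-rank rule for ties can both be sketched in a few lines of Python. The function names `spearman_short` and `average_ranks` are hypothetical helpers introduced for illustration; `average_ranks` assumes its input has already been sorted into rank order.

```python
def spearman_short(rank_x, rank_y):
    """Short-cut Spearman coefficient: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def average_ranks(sorted_items):
    """Ranks for an already ordered list; tied items share the mean of their positions."""
    ranks = []
    i = 0
    while i < len(sorted_items):
        j = i
        while j < len(sorted_items) and sorted_items[j] == sorted_items[i]:
            j += 1
        shared = (i + 1 + j) / 2          # mean of positions i+1 .. j
        ranks.extend([shared] * (j - i))
        i = j
    return ranks

# Soft drink example: refreshingness and sweetness ranks of the ten drinks.
refresh = [1, 2, 5, 6, 3, 4, 10, 9, 7, 8]
sweet = [8, 7, 10, 9, 5, 6, 2, 3, 1, 4]
print(round(spearman_short(refresh, sweet), 3))

# Bond example: ratings sorted from highest to lowest.
bonds = ["AAA", "AA", "AA", "A", "A", "A", "BBB", "BB", "B", "CC", "C", "D"]
print(average_ranks(bonds))
```

When many ranks are tied, the ranks returned by `average_ranks` would be fed into the full formula rather than the short-cut one.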
Module 3
Probability & Probability Distributions

Probability
What is meant by the word "probability"? Probability is the likelihood or chance that a particular event will occur.
Three approaches to probability:
1. A priori classical probability
2. Empirical classical probability
3. Subjective probability

A priori classical probability
The probability of success is based on prior knowledge of the process involved.
Probability of occurrence = $X / T$
X = number of ways in which the event occurs
T = total number of elementary outcomes

Example of a priori classical probability
A box contains 20 chocolate beans, of which 10 are red and the other 10 are green. The probability of selecting a red bean on any one draw is 10 / 20 = 0.5. Because we know in advance the total number of beans and the proportion of each colour, we call it an "a priori probability".

Empirical classical probability
Empirical classical probability adopts the same formula to calculate the probability of occurrence:
Probability of occurrence = $X / T$
However, in empirical classical probability, the probability of success is based on observed data instead of on prior knowledge.

Example of empirical classical probability
Your mid-term exam is coming and this exam is said to be optional, which means you can choose whether or not to take it. If we take a poll asking how many students will attend the exam and 99% of students say they will, we say there is a 0.99 probability that an individual student will attend the exam. Remember, in this example, we did not know beforehand how many students wished to take the exam.
This is different from the a priori classical example, in which we already knew that 50% of the beans were red and 50% were green. So empirical probability relies on observed outcomes rather than on prior knowledge of the process.

Subjective probability
As the name suggests, this approach to probability is based on a person's own judgment. For instance, you think you have a 90% probability of passing the CPA exam, while your supervisor thinks your probability of passing is 60%. Both probabilities are based on personal judgment and experience, not on objective evidence.

Sample spaces and events
Event: each possible type of occurrence is referred to as an event.
Simple event: a simple event can be described by a single characteristic.
Sample space: the collection of all the possible events is called the sample space.

Axioms about probability
Given a sample space S = {E1, E2, ..., En}, the probabilities assigned to the Ei must satisfy:
$0 \le P(E_i) \le 1$ for each i. If an event has no chance of occurring, its probability is 0, and if an event is certain to occur, its probability is 1.
$P(E_1) + P(E_2) + \dots + P(E_n) = \sum P(E_i) = 1$
Probability of event A = sum of the probabilities of the simple events comprising A.

Contingency tables
By example: an intent-to-purchase investigation. This kind of investigation often takes place in sales and marketing research. In this example, the sample consists of 1,000 households, studied in terms of their purchase behavior for a laptop computer.

In the investigation, there are basically two purchase intents.
Sub-samples:
1. Planned to purchase: 300 households
2. Not planned to purchase: 700 households
After the purchases have happened, we can further subdivide the sample of 1,000 households into:
1. Actually purchased
2. Not purchased

So in this example, the big sample of 1,000 can be divided into four different sub-samples:
1. Planned to purchase
2. Not planned to purchase
3. Purchased
4. Not purchased

Later, however, the actual purchases turned out not to be fully consistent with the intents originally recorded. In the first category (planned to purchase, 300 households), 200 out of 300 actually purchased and the remaining 100 did not. In the second category (not planned to purchase, 700 households), 50 out of 700 actually purchased, and the remaining 650 were consistent with their initial intent.

Complement and joint event
The complement of event A includes all events that are not part of event A. The complement of A is given by the symbol A' or $\bar{A}$. In the above example, "planned to purchase" (300 households) is the complement of "not planned to purchase" (700 households).
Joint event: a joint event is an event that has two or more characteristics. In the above example, the event "planned to purchase and actually purchased" is a joint event.

There are usually two ways to depict events in a sample space: a contingency table (also called a "table of cross-classifications") and a Venn diagram. Based on the above example, we now construct the contingency table.

Contingency table

                        Planned to purchase
Actually purchased      Yes      No       Total
Yes                     200      50        250
No                      100      650       750
Total                   300      700      1000

Terms
Intersection A∩B: both A and B occur together, the joint event (sometimes simply written as AB).
Union A∪B: either A or B or both.
Other common forms of notation for the union include A∨B, A+B and A OR B.

Example of using the above two notations
Number (n) of cards that are a heart or an ace in a deck of 52 playing cards:
n(H∪A) = n(H) + n(A) - n(H∩A) = 13 + 4 - 1 = 16

Complement
Complement A': event A does not occur; also written NOT A.
Example: non-hearts = H', n(H') = 39
Complement rule: P(A) = 1 - P(A')

Mutually exclusive and collectively exhaustive
Mutually exclusive: the occurrence of one event precludes the occurrence of another. If A and B are mutually exclusive, then n(A∩B) = 0.
Collectively exhaustive: the events together comprise the sample space; at least one of them is certain to occur.
Example: number of female students ∪ number of male students = 26 (QM course).

More on mutually exclusive and collectively exhaustive events
Everyone is either female or male (collectively exhaustive), but no one is both (mutually exclusive). Being female and being male are therefore mutually exclusive and collectively exhaustive events.
In the TV set purchase example: every household either planned to purchase or did not plan to purchase (collectively exhaustive), but no household is both "planned to purchase" and "not planned to purchase" (mutually exclusive).

Probability contingency table

Numbers      O       O'      Total
M            7       24       31
M'          14       35       49
Total       21       59       80

Probabilities   O         O'        Total
M               0.0875    0.3       0.3875
M'              0.175     0.4375    0.6125
Total           0.2625    0.7375    1

General form of a 2×2 contingency table

Probabilities   A           A'           Total
B               P(A∩B)      P(A'∩B)      P(B)
B'              P(A∩B')     P(A'∩B')     P(B')
Total           P(A)        P(A')        1

Simple (marginal) probability: P(A)
The most fundamental rule for probabilities is that they range from 0 to 1.
Simple (marginal) probability refers to the probability of occurrence of a simple event, P(A).
Example: what is the probability that a heart is selected from a deck of 52 playing cards?
P(heart) = 13 / 52 = 0.25

Joint probability: P(A∩B)
Joint probability refers to situations involving two or more events, such as the probability of "planned to purchase and actually purchased" in the big-screen TV set purchase example. Joint probability means that events A and B must both occur.
So, P(planned ∩ purchased) = 200 / 1000 = 0.2

Computing marginal probability
The marginal probability of an event is the sum of a set of joint probabilities. The formula:
P(A) = P(A and B1) + P(A and B2) + ... + P(A and Bk)
where B1, B2, ..., Bk are mutually exclusive and collectively exhaustive events.
In the previous example:
P(planned to purchase) = P(planned to purchase and purchased) + P(planned to purchase and did not purchase) = 200/1000 + 100/1000 = 0.30

Addition rule
P(A∪B) = P(A) + P(B) - P(A∩B)
N.B. If A and B are mutually exclusive, then P(A∩B) = 0, and P(A∪B) = P(A) + P(B).

Multiplication rule
P(A∩B) = P(B|A)P(A) = P(A|B)P(B)
and it follows that
P(B|A) = P(A∩B) / P(A) or P(A|B) = P(A∩B) / P(B)
The bar symbol "|" means "given". P(B|A) is the probability of B happening given that A happens. This is known as a conditional probability.

Conditional probability
To spot conditional probabilities, look for words like "of", "if" and "given".
Suppose D = "part is defective" and B = "part was produced by B". Each of the following tells you P(D|B):
- If a part was produced by B, there is a 5% chance it is defective.
- 5% of the parts produced by B are defective.
- There is a 5% chance that a part is defective, given that it was produced by B.
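The marginal, joint, addition and conditional rules above can be tried out on the purchase contingency table. This is a small sketch under stated assumptions: the dictionary layout and the helper names `joint`, `marginal_plan` and `marginal_buy` are invented for illustration and do not come from the lecture.

```python
# Household counts from the purchase contingency table.
counts = {
    ("planned", "purchased"): 200,
    ("planned", "not purchased"): 100,
    ("not planned", "purchased"): 50,
    ("not planned", "not purchased"): 650,
}
total = sum(counts.values())  # 1,000 households

def joint(plan, buy):
    """Joint probability P(plan and buy)."""
    return counts[(plan, buy)] / total

def marginal_plan(plan):
    """Marginal probability of a purchase intent, summed over purchase outcomes."""
    return sum(v for (p, _), v in counts.items() if p == plan) / total

def marginal_buy(buy):
    """Marginal probability of a purchase outcome, summed over intents."""
    return sum(v for (_, b), v in counts.items() if b == buy) / total

p_planned = marginal_plan("planned")
p_purchased = marginal_buy("purchased")
p_both = joint("planned", "purchased")

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_planned + p_purchased - p_both

# Conditional probability: P(B|A) = P(A and B) / P(A)
p_purchased_given_planned = p_both / p_planned

print(p_planned, p_purchased, p_both)
print(round(p_either, 4), round(p_purchased_given_planned, 4))
```

Note that the conditional probability divides by the marginal probability of the conditioning event (here "planned"), not by that of the event being predicted.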
Back to the TV purchase example
P(B|A) = P(A and B) / P(A)
Here: A = planned to purchase, B = actually purchased.
P(actually purchased | planned to purchase) = P(planned to purchase and actually purchased) / P(planned to purchase) = 200 / 300 = 0.67
The denominator is the 300 households that planned to purchase, not the 250 that actually purchased.

Independence
Two events, A and B, are independent if the probability of A occurring is not affected by whether B occurs, and vice versa.
A and B are independent if: P(A) = P(A|B) and P(B) = P(B|A).
P(AB) = P(A)P(B) if and only if A and B are independent.

Bayes' Theorem
Bayes' rule is useful in decision analysis. Let's learn it through the following example.
A machine is known to be in good condition 90% of the time. If it is in good condition, only 1% of output is defective. If it is in poor condition, 10% of output is defective. An item of output is observed to be defective. Given this information, what is the probability that the machine is in good condition?

Solution
G: the machine is in good condition.
D: an item of output is defective.
Probabilities:
Prior: P(G) = 0.9, P(G') = 0.1
Conditional: P(D|G) = 0.01, P(D|G') = 0.10
We want the conditional probability P(G|D) = P(D|G)P(G) / P(D), but first we need to find P(D).

P(defect) = P(defect and good condition) + P(defect and poor condition)
P(D) = P(G∩D) + P(G'∩D) = P(D|G)P(G) + P(D|G')P(G') = 0.01*0.9 + 0.10*0.1 = 0.019
Then: P(G|D) = 0.009 / 0.019 = 0.47

Expression of Bayes' Rule
P(A|B) = P(B|A)P(A) / P(B)
This follows directly from the multiplication rule for joint probability, P(A∩B) = P(B|A)P(A) = P(A|B)P(B).
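The machine example can be reproduced with a short function. This is a minimal sketch for the two-state case only (good or poor condition); the function name `posterior` and its parameter names are assumptions made for illustration.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule for a two-state problem.

    prior               -- P(G), probability the hypothesis holds before observing
    likelihood_if_true  -- P(D|G)
    likelihood_if_false -- P(D|G')
    Returns P(G|D).
    """
    # Total probability of the observation: P(D) = P(D|G)P(G) + P(D|G')P(G')
    evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
    return likelihood_if_true * prior / evidence

# Machine in good condition 90% of the time; 1% defective if good, 10% if poor.
p_good_given_defect = posterior(0.9, 0.01, 0.10)
print(round(p_good_given_defect, 2))
```

Although the machine is good 90% of the time, one observed defect is enough to pull the probability of good condition below one half, because defects are ten times more likely when the machine is in poor condition.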
Counting Rule 1
If any one of k different mutually exclusive and collectively exhaustive events can occur on each of n trials, the number of possible outcomes is equal to
$k^n$

Example for counting rule 1
A coin (two sides) is tossed 10 times. The number of outcomes is
$2^{10} = 1{,}024$

Counting Rule 2
If there are k1 events on the first trial, k2 events on the second trial, ... and kn events on the nth trial, then the number of possible outcomes is
$(k_1)(k_2) \cdots (k_n)$

Example for counting rule 2
A license plate consists of 3 letters (26 possible letters, a to z) followed by 3 digits (10 possible digits, 0 to 9). The number of possible outcomes is:
26 × 26 × 26 × 10 × 10 × 10 = 17,576,000

Counting Rule 3
The number of ways that n objects can be arranged in order is:
$n! = (n)(n-1) \cdots (1)$, with $0! = 1$
"!" reads "factorial"; "n!" is read "n factorial".

Example for counting rule 3
The number of ways that 6 books can be arranged is:
n! = 6! = 6*5*4*3*2*1 = 720

Counting Rule 4
Permutations: the number of ways of arranging X objects selected from n objects in order is:
$\dfrac{n!}{(n - X)!}$
Each possible arrangement is called a permutation.

Example for counting rule 4
The number of ordered arrangements of 4 books selected from 6 books is:
$\dfrac{n!}{(n - X)!} = \dfrac{6!}{(6 - 4)!} = 360$

Counting Rule 5
Combinations: the number of ways of selecting X objects out of n objects, irrespective of order, is:
$\dbinom{n}{X} = \dfrac{n!}{X! \, (n - X)!}$

Example for counting rule 5 (also called the rule of combinations)
Selecting 4 books out of 6 books, the number of selections (note: order is irrelevant) is:
$\dbinom{n}{X} = \dfrac{n!}{X! \, (n - X)!} = \dfrac{6!}{4! \, 2!} = 15$
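The five counting rules map directly onto Python's standard `math` module. The sketch below simply re-evaluates the lecture's examples; it assumes Python 3.8 or later, where `math.perm` and `math.comb` are available, and the variable names are illustrative.

```python
import math

# Rule 1: k events on each of n trials -> k ** n outcomes (coin tossed 10 times).
coin_outcomes = 2 ** 10

# Rule 2: k1 * k2 * ... * kn (3 letters followed by 3 digits).
plates = math.prod([26, 26, 26, 10, 10, 10])

# Rule 3: n! orderings of n objects (6 books on a shelf).
book_orders = math.factorial(6)

# Rule 4: permutations, n! / (n - X)! (ordered choice of 4 books from 6).
ordered_choices = math.perm(6, 4)

# Rule 5: combinations, n! / (X! * (n - X)!) (unordered choice of 4 books from 6).
unordered_choices = math.comb(6, 4)

print(coin_outcomes, plates, book_orders, ordered_choices, unordered_choices)
```

Rules 4 and 5 differ only by the factor X!: each unordered selection of 4 books corresponds to 4! = 24 ordered arrangements, and 360 / 24 = 15.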