Random Variables & Probability Distributions
Chapter 6 of the textbook, pages 167-208

Lecture Overview
– Schedule
– Clarification from Friday
– Discrete random variables

Schedule
Today:
– Discrete random variables
– Homework #4 will be posted this afternoon
Wednesday:
– Continuous random variables & bivariate random variables
– Homework #3 due
Friday:
– Homework #4 help & Excel show and tell
Next Monday:
– Any remaining chapter 6 slides
– Exam #1 review
– Homework #4 due
Next Wednesday:
– Exam #1 (you're allowed 1 sheet of paper, front & back, for notes & equations)
– Test questions will be very reminiscent of homework problems
Next Friday:
– Go over exam #1 questions
– Intro to S-Plus

Clarification From Friday
On HW3, question #15:
P(A) = 0.3, P(B) = 0.5, P(B|A) = 0.4
What is P(A|B)? What is P(A∩B)?
Using the multiplication theorem:
P(B∩A) = P(B|A)·P(A) = 0.4 · 0.3 = 0.12
Using the definition of conditional probability:
P(A|B) = P(A∩B) / P(B) = 0.12 / 0.5 = 0.24

Clarification From Friday
Why doesn't P(A∩B) = P(A)·P(B)?
Answer: because A and B aren't statistically independent.
Recall that statistical independence is defined as:
– P(A|B) = P(A)
– OR –
– P(B|A) = P(B)
This is not true for this problem.
If A and B are statistically independent, the multiplication theorem becomes P(A∩B) = P(A)·P(B), since we can just replace P(A|B) with P(A).

Definitions
Random Sample (from Ch. 1)
Variable (from Ch.
1)
Random Variable – "any numerically valued function that is defined over a sample space"
– For the household example in the book: "the variable is random not because the household makes a random decision to include a certain number of people, but because our sample experiment selects a household randomly"

Example
Imagine randomly sampling students in the union and asking them how many books they are carrying.

Example Data
Elementary Outcomes:
– Student 1: 3 books
– Student 2: 2 books
– Student 3: 0 books
– Student 4: 1 book
– Student 5: 2 books
– Student 6: 1 book
– Student 7: 0 books
– Student 8: 1 book
– Student 9: 4 books
– Student 10: 1 book
Sample Space: {0, 1, 2, 3, 4}
Random Variable (X) (Function: X = # of books)
– X(Student 1) = 3
– X(Student 2) = 2
– X(Student 3) = 0
– Etc.
Probability of Random Variables
– So X can take any value from the full set of possible numbers of books
– x can equal any number in the sample space (0, 1, 2, 3, 4)
– P(x) is the probability of getting an x in a random sample
– Example: P(0 books) = 2/10 = 0.2
– Example: P(3) = 1/10 = 0.1

Clarification: X and x
X is the random variable
– It can take any of the possible values, each with an associated probability
– In other words, it can equal any element in a sample space, each with a probability of occurring
x is one possible outcome of the random variable (i.e., an event)
– For example, x can be 0 books, 1 book, 2 books, etc.
Why does this matter?
– When we figure out probabilities we are usually concerned with P(xᵢ), since P(X) = 1
– When we figure out expected values (E) or variances (V) we are concerned with X, because we want to know the expected value with respect to all possibilities

Definition
Probability Distribution or Function – "a table, graph, or mathematical function that describes the potential values of a random variable X and their corresponding probabilities"

Example Continued
Probability Distribution or Function, Table Form:

xᵢ    P(xᵢ)
0     2/10 = 0.2
1     4/10 = 0.4
2     2/10 = 0.2
3     1/10 = 0.1
4     1/10 = 0.1

[Graph form: a bar chart of P(x) against the number of books, 0 through 4]
Question: What do these remind you of from past chapters?

Key Concept
Discrete Random Variables – "The set of possible values (i.e., the sample space) is finite or countably infinite."
Continuous Random Variables – The set of possible values can be any real number in the range of possible values (i.e., infinite possible values)
Questions:
– What type of random variable is the student/book example?
– Can you come up with examples of each?

Probability Mass Function
Specifies the probability distribution for discrete variables.
The tables and graphs are examples of the probability mass function (i.e., the probability is "massed" at the discrete possible values).
The probability mass function preserves the provision:

P(X) = Σᵢ₌₁ᵏ P(xᵢ) = 1

where k = the number of different values of the discrete variable and i = 1, 2, …, k.
This is identical to the rule from last chapter.

Example Continued
Σᵢ₌₁ᵏ P(xᵢ) = P(0) + P(1) + P(2) + P(3) + P(4) = 1
Σᵢ₌₁ᵏ P(xᵢ) = 0.2 + 0.4 + 0.2 + 0.1 + 0.1 = 1

Expected Values for Discrete Random Variables
"E" is the term for expected values.
E(X) = the expected value of a discrete random variable.
To calculate E we need the probability distribution (e.g., the probability distribution table).

E(X) = Σᵢ₌₁ᵏ P(xᵢ)·xᵢ

Example Continued
For our students & books example:
E(X) = P(x₀)·x₀ + P(x₁)·x₁ + …
E(X) = 0.2·0 + 0.4·1 + 0.2·2 + 0.1·3 + 0.1·4
E(X) = 0 + 0.4 + 0.4 + 0.3 + 0.4 = 1.5
So if we randomly selected a student we would expect them to have 1.5 books with them.
Question: does this remind you of any other statistic?

Variance Values for Discrete Random Variables
"V" is the term for the variance of a probability distribution.
V(X) = the variance of a discrete random variable.
To calculate V we need E and the probability distribution (e.g., the probability distribution table).

V(X) = [Σᵢ₌₁ᵏ P(xᵢ)·xᵢ²] − [E(X)²]

Note: I wrote the equation a little differently than the book to make it clear that you do the sum first and then subtract E(X)².

Example Continued
For our students & books example:
V(X) = [P(x₀)·x₀² + P(x₁)·x₁² + …] − [E(X)²]
V(X) = [0.2·0² + 0.4·1² + 0.2·2² + 0.1·3² + 0.1·4²] − [1.5²]
V(X) = [3.7] − [2.25] = 1.45
As with univariate statistics, the standard deviation is the square root of the variance.

Discrete Probability Models
If we have a census, the probability distribution (table, graph, etc.)
is complete/finished/appropriate/accurate.
If we have a sample, we try to match our sample probability distribution to a known probability distribution.

Discrete Probability Models
Common probability models for discrete random variables include:
– Uniform distribution
– Binomial distribution
– Poisson distribution
The benefit of using these models is that they have known properties and ready-made equations.

Discrete Uniform Distribution
The probability of every possible value of the random variable is equal.
This equates to a rectangular graph (i.e., flat on top).

Discrete Uniform Distribution Equations
Probability: P(x) = 1/k
Expected: E(X) = (1/k) Σₓ₌₁ᵏ x
Variance: V(X) = (k² − 1) / 12

Discrete Binomial Distribution
There are 2 and only 2 possible outcomes of a statistical experiment (e.g., flipping a coin).

Discrete Binomial Distribution Equations
Probability: P(x) = C(n,x) · πˣ · (1 − π)ⁿ⁻ˣ
Expected: E(X) = nπ
Variance: V(X) = nπ(1 − π)
π = the probability of one of the two outcomes, n = the number of trials, x = the number of "successes"

Example (from book)
Quiz with 10 multiple choice questions.
Each question has 5 possible answers.
P(guessing correctly) = 1/5 = 0.2
P(guessing incorrectly) = 4/5 = 0.8
What is the probability of guessing 5 correct answers?

Answering this question with what we learned last chapter
Imagine we only have 2 questions. Since the questions are independent, P(A∩B) = P(A)·P(B). This result is in the upper left box.
Now imagine we add a third question. Since the questions are independent, P(A∩B∩C) = P(A)·P(B)·P(C) is the upper left box.
So 3 out of 3 right answers = P(RRR) = 0.2·0.2·0.2 = 0.008.
Follow this out to 5 correct & 5 incorrect:
– = 0.2·0.2·0.2·0.2·0.2·0.8·0.8·0.8·0.8·0.8 (i.e., 0.2⁵ · 0.8⁵)
– = 0.000104858 for one ordering of 5 R, 5 W
– Another ordering would be 4 R, 5 W, 1 R (i.e., questions 1-4 & 10 correct)
How many combinations of 5 Right & 5 Wrong are there?
Use the combinations rule: C(10,5) = 252
Answer = 252 · 0.000104858 = 0.026

Answering the Question Using the New Equation
P(x) = C(n,x) · πˣ · (1 − π)ⁿ⁻ˣ
P(5) = C(10,5) · 0.2⁵ · (1 − 0.2)¹⁰⁻⁵ = 0.026
This answer can also be found in a table in the back of the book on p. 605.

Poisson Discrete Distribution
Poisson distributions are often used to determine the probability of a number of events (x) occurring in a fixed space or over a fixed period of time.
But they can be used to determine probabilities for other variables as well (the book used the example of lengths of rope).
Poisson distributions are also called the "distribution of rare events."

Poisson Discrete Distribution
Requirements for using a Poisson distribution:
– Mutually exclusive events are independent
– The probability of an event occurring is small and proportional to the size of the area (or to the length of the interval)
– The probability of 2 or more events occurring in a small area or interval is near zero
Rules 2 and 3 are where the phrase "distribution of rare events" comes from.

Poisson Discrete Distribution
The parts of a Poisson distribution equation:
– λ – the average occurrence of an event in time or space
  • 8 houses per block
  • 1 hiccup per minute
  • 1 lightning strike per square mile per decade
  • The "answers" to questions will be in the same units as λ (e.g., "per minute")
– e – base of the natural logarithm (e = 2.71828…)
– X – the Poisson random variable
  • Just like the random variable for the other distributions
  • This is the value for which we determine E and V
– x – the values from X for which we find probabilities, etc.
  • E.g., what is the probability of x if x = 2 hiccups per minute?
  • x can be any non-negative integer

Poisson Discrete Distribution
Probability: P(x) = λˣ·e⁻ᵡ / x!  (with e⁻ᵡ read as e to the −λ, i.e., P(x) = λˣ e^(−λ) / x!)
Expected: E(X) = Σₓ₌₀^∞ x · λˣ e^(−λ) / x! = λ
Variance: V(X) = Σₓ₌₀^∞ [x − E(X)]² · λˣ e^(−λ) / x! = λ

Poisson Discrete Distribution
The Poisson discrete distribution is actually a family of distributions.
– Each member of the family corresponds to one value of λ.
For example:
– 1 hiccup per minute uses one member of the family
– 2 hiccups per minute uses another

Poisson Discrete Distribution
1 Hiccup Per Minute (i.e., λ = 1)
What is the probability of hiccupping 4 times per minute (i.e., x = 4)?
P(x) = λˣ e^(−λ) / x!
P(4) = 1⁴ · 2.71828⁻¹ / 4!
P(4) = 0.36788 / 24
P(4) = 0.01533
This answer can also be found in a table in the back of the book on p. 606.

Continuous Random Variables
Review:
– Continuous Random Variables: the set of possible values can be any real number in the range of possible values (i.e., infinite possible values)
For continuous random variables we use probability density functions rather than probability mass functions.

Probability Density Functions
Specify the probability distribution for continuous variables.
Unlike the probability mass functions we used with discrete variables, where all the P(xᵢ) added up to 1, with a probability density function the area under the curve = 1.
Also unlike probability mass functions, we aren't concerned with P(xᵢ), because each xᵢ is a vertical line with an area equal to zero.
Instead we are concerned with probabilities such as P(x > some amount A) or P(x between values B and C).

Probability Density Functions
The probability density function of a continuous random variable X is denoted f(X).
The "function" part (i.e., the "f") relates to the equation that produces a curve (i.e., it is used to graph the line).
Conditions satisfied by probability density functions (assume the min and max values are a and b respectively):
– f(x) ≥ 0 for a ≤ x ≤ b
– The area under f(x) from x = a to x = b equals 1
Because we are ultimately concerned with areas under portions of the curve, what type of math do we need?
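Both the Poisson hiccup calculation above and the "area under the curve equals 1" condition for density functions can be checked numerically. The sketch below is mine, not from the book (the function names `poisson_pmf` and `normal_pdf` are my own); the crude Riemann sum stands in for the calculus the last question alludes to:

```python
import math

# Poisson pmf: P(x) = lam^x * e^(-lam) / x!
def poisson_pmf(x, lam):
    return lam**x * math.exp(-lam) / math.factorial(x)

# Hiccup example: lam = 1, x = 4  ->  e^-1 / 24, about 0.0153
print(poisson_pmf(4, 1.0))

# Standard normal density (the normal distribution appears on the next slides).
def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Approximate the area under the curve from -6 to 6 with thin rectangles;
# it should come out very close to 1, illustrating the density condition.
step = 0.001
area = sum(normal_pdf(-6 + i * step) * step for i in range(int(12 / step)))
print(round(area, 4))
```

The rectangle sum is exactly the idea behind the integral we would otherwise need, which is why the tables in the book let us skip the calculus.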
Continuous Probability Distribution Models
Probability Distribution Models
– As with discrete random variables, we usually have a sample rather than a census
– To calculate probabilities from a sample we assume the data conform to some known distribution for which we have handy tables
– This is how we avoid having to do calculus

Continuous Probability Models
Common probability models for continuous random variables include:
– Uniform distribution
  • Rectangular distribution
– Normal distribution (a.k.a. Gaussian)
  • The "bell-shaped curve"

Uniform Continuous Distribution
Probability: P(c to d) = (d − c) / (b − a), given a ≤ c ≤ d ≤ b
Distribution Function: f(x) = 1 / (b − a) for a ≤ x ≤ b
Expected: E(X) = (a + b) / 2
Variance: V(X) = (b − a)² / 12

Normal Probability Distribution
Probability: convert to z-scores first (explained in a few slides)
Distribution Function: f(x) = (1 / (σ√(2π))) · e^(−½·((x−μ)/σ)²)
Expected: E(X) = μ
Variance: V(X) = σ²
Note: π = 3.14…

Features of the Normal Probability Distribution
The mean, median, and mode values are all equal to the peak of the distribution.
The distribution is symmetrical: ½ of the curve is above the mean and ½ is below it.

Z Scores
To avoid having to calculate probabilities for curves with varying μ, σ, and shapes, we can convert any normally distributed random variable to a standard form for which we have tables.
To do this we use the z transformation:
zₓ = (x − μ) / σ
This conversion changes our variable measured in units of x (e.g., meters, miles, pounds) to units of z (i.e., standard deviation units).

Example
If we have a normally distributed dataset of bowling scores with μ = 150, σ = 10, what is the z-score of 175?
z₁₇₅ = (x − μ) / σ = (175 − 150) / 10 = 25 / 10 = 2.5
– What does a z-score of 2.5 mean?
– Answer: the value of 175 is 2.5 standard deviations above the mean for this particular normal probability distribution

Probability z-scores
Remember that one of the reasons we calculate z-scores is to ask questions about probability.
For example, what is the probability of bowling over 175 given our previous example?
To answer this question it is easiest to use a z-table like the one on page 207 of your book.

Using Standard Normal Probabilities (i.e., a z-table)
The table in our book is atypical of what you usually see, but more user-friendly thanks to the pictures.
For our bowling example, find the z-score (2.5) in the column on the left.
Now choose the column of interest, in this case column #3: P(Z > z).
The probability value we get from the table is 0.006.
– This means that the probability of bowling over 175, for our fictional dataset, is 0.006 (i.e., 0.6%).

More Conventional Z Tables
Normally we see z-tables with the following characteristics:
– 2 digits of precision in the far left column
– 1 additional digit of precision in each of the 10 other columns to the right
– A value indicating a one-directional probability (i.e., the total probability of values less than a z-score)
  • This is equivalent to the 5th column in the z-table in our book

Bivariate Random Variables
Now we turn our attention to the relationship between two variables (hence the name "bivariate").
The random variables can be discrete or continuous.
Most of the following slides & equations should look very familiar to those from chapter 5.

Bivariate Probability Functions
Conditions:
– 0 ≤ P(x,y) ≤ 1
– Σₓ Σᵧ P(x,y) = 1
For 2 discrete random variables (x & y) it is useful to set up a contingency table.
These contingency tables are just like those for 2 events, and they may contain actual counts or probabilities.

Example (from book)
100 households were sampled and asked how many people are in the household (x) and how many cars are owned by members of the household (y).
Our book did you the disservice of switching the
x and y axes.
The data can be summarized in the following table:

                        Cars (y)
                    0     1     2     3
Household      2   10     8     3     2
Size (x)       3    7    10     6     3
               4    4     5    12     6
               5    1     2     6    15

Marginal Totals
Marginal totals are the sums of the rows and columns.
The marginal totals for household size (x) are the row sums; the marginal totals for cars (y) are the column sums; the total number of households sampled is in the lower-right corner:

                        Cars (y)
                    0     1     2     3   Total
Household      2   10     8     3     2     23
Size (x)       3    7    10     6     3     26
               4    4     5    12     6     27
               5    1     2     6    15     24
Total              22    25    27    26    100

Marginal Probabilities
All probabilities are just the totals from each box (see last slide) divided by the total number of households (100).
The marginal probabilities sit in the same row and column margins, and each set of marginal probabilities sums to 1.0:

                        Cars (y)
                    0     1     2     3   Total
Household      2  .10   .08   .03   .02    .23
Size (x)       3  .07   .10   .06   .03    .26
               4  .04   .05   .12   .06    .27
               5  .01   .02   .06   .15    .24
Total             .22   .25   .27   .26    1.0

Conditional Probabilities
Equation: P(x|y) = P(x,y) / P(y)
For example: what is the probability of having a household size of 4 (x = 4) given the household has 3 cars (y = 3)?
P(4|3) = 0.06 / 0.26 = 0.231

Covariance
"Covariance is a direct statistical measure of the degree to which two random variables X and Y tend to vary together"
Covariance is positive when X and Y increase together (and likewise decrease together)
– Ex. the amount of ice cream you eat and the temperature outside
Covariance is negative when X and Y are inversely related
– Ex.
the number of layers of clothes you tend to wear and the temperature outside
When there is no pattern the covariance is close to zero.

Covariance
Covariance Equation:
C(X,Y) = Σₓ Σᵧ P(x,y)·[x − E(X)]·[y − E(Y)]
Note 1: There is another option for calculating covariance in your book.
Note 2: There are also nice tables showing how you would go about calculating these values on pages 202 & 203.

Covariance
Parts of this equation:
– The P(x,y) values come from the joint probability table
– The E(X) and E(Y) values are calculated using this formula (from about 40 slides ago):
E(X) = Σᵢ₌₁ᵏ P(xᵢ)·xᵢ

Covariance Example (from book)
First we need to calculate the expected (E) values. We can use the marginal probabilities from the table on the previous slides for this.
For E(X), multiply each xᵢ by its row total:
E(X) = 2·0.23 + 3·0.26 + 4·0.27 + 5·0.24
E(X) = 0.46 + 0.78 + 1.08 + 1.20 = 3.52
For E(Y), multiply each yᵢ by its column total:
E(Y) = 0·0.22 + 1·0.25 + 2·0.27 + 3·0.26
E(Y) = 0 + 0.25 + 0.54 + 0.78 = 1.57

Covariance Example (book is wrong again!)
(x,y)  P(x,y)  x−E(X)  y−E(Y)  [x−E(X)][y−E(Y)]  P(x,y)·[x−E(X)][y−E(Y)]
(2,0)   .10    −1.52   −1.57        2.3864             0.238640
(2,1)   .08    −1.52   −0.57        0.8664             0.069312
(2,2)   .03    −1.52    0.43       −0.6536            −0.019608
(2,3)   .02    −1.52    1.43       −2.1736            −0.043472
(3,0)   .07    −0.52   −1.57        0.8164             0.057148
(3,1)   .10    −0.52   −0.57        0.2964             0.029640
(3,2)   .06    −0.52    0.43       −0.2236            −0.013416
(3,3)   .03    −0.52    1.43       −0.7436            −0.022308
(4,0)   .04     0.48   −1.57       −0.7536            −0.030144
(4,1)   .05     0.48   −0.57       −0.2736            −0.013680
(4,2)   .12     0.48    0.43        0.2064             0.024768
(4,3)   .06     0.48    1.43        0.6864             0.041184
(5,0)   .01     1.48   −1.57       −2.3236            −0.023236
(5,1)   .02     1.48   −0.57       −0.8436            −0.016872
(5,2)   .06     1.48    0.43        0.6364             0.038184
(5,3)   .15     1.48    1.43        2.1164             0.317460
Sum                                                    0.6336

Independence
As with events (e.g., A and B) from last chapter, x and y are independent if P(x,y) = P(x)·P(y) for all values of x and y.
Independence and covariance are closely related, but not the same:
– Independent variables will have a covariance of 0
– But random variables with a covariance of 0 may not be independent

Problems With Covariance
The sign (+ or −) of the calculated covariance is meaningful, but not the magnitude.
This is because the covariance depends on the scale of the input data.
Therefore, if we multiplied x or y by 10 and recalculated the covariance, it would change even though the relationship between x and y, strictly speaking, is the same.

Correlation Coefficient
The correlation coefficient is a standardized statistic that measures the relationship between random variables.
Correlation coefficients range from −1 to 1:
– 1 is a perfect positive relationship (both ↑ or ↓ together)
– −1 is a perfect inverse relationship (one ↑ while the other ↓)
– 0 suggests, but doesn't guarantee, independence
Unlike covariance, the scale of the data does not matter.

Correlation Coefficient
In chapter 6 the book introduces this statistic with the assumption that the population covariance (C) and standard deviations (σ) are known or can be calculated.
The correlation coefficient for a sample is discussed in chapter 12.

Correlation Coefficient
Equation: ρₓᵧ = C(X,Y) / (σₓ·σᵧ)
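The covariance and correlation for the household/cars example can be verified with a short Python sketch (this is my own code, not from the book; the variable names are mine). It reproduces the covariance of 0.6336 from the worked table and then standardizes it into the correlation coefficient:

```python
import math

# Joint probabilities P(x, y) from the book's example:
# x = household size (2..5), y = number of cars (0..3).
joint = {
    (2, 0): .10, (2, 1): .08, (2, 2): .03, (2, 3): .02,
    (3, 0): .07, (3, 1): .10, (3, 2): .06, (3, 3): .03,
    (4, 0): .04, (4, 1): .05, (4, 2): .12, (4, 3): .06,
    (5, 0): .01, (5, 1): .02, (5, 2): .06, (5, 3): .15,
}

# Expected values from the marginals: E(X) = 3.52, E(Y) = 1.57.
ex = sum(p * x for (x, y), p in joint.items())
ey = sum(p * y for (x, y), p in joint.items())

# Covariance: sum of P(x,y) * [x - E(X)] * [y - E(Y)], as in the worked table.
cov = sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

# Population variances and standard deviations for the standardization.
vx = sum(p * (x - ex) ** 2 for (x, y), p in joint.items())
vy = sum(p * (y - ey) ** 2 for (x, y), p in joint.items())
rho = cov / (math.sqrt(vx) * math.sqrt(vy))

print(round(cov, 4))   # about 0.6336
print(round(rho, 3))   # about 0.53
```

Rescaling x or y (say, multiplying every x by 10) would change `cov` but leave `rho` untouched, which is exactly the scale problem with covariance that the correlation coefficient fixes.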