Discrete Distributions

Random Variable
A random variable X is a function that maps the possible outcomes of an experiment to real numbers. That is, X: C --> R, where C is the set of all outcomes of an experiment and R is the set of real numbers. The space of X is the set of real numbers $S = \{x : X(c) = x,\ c \in C\}$.

An Example of a Random Variable
If we toss a coin one time, then there are two possible outcomes, namely "head up" and "tail up". We can define a random variable X that maps "head up" to 1 and "tail up" to 0. We can also define a random variable Y that maps "head up" to 0 and "tail up" to 1. The spaces of both random variables X and Y are {0, 1}.

Further Illustration of Random Variables
A random variable corresponds to a quantitative interpretation of the outcomes of an experiment. For example, a company offers its employees a drawing in its year-end party. A computer will randomly select an employee for the first prize of $100,000 based on the employees' ID numbers, which range from 1 to 100. In addition, the computer will randomly select two more employees for the second and third prizes of $50,000 and $10,000, respectively. Assume that each employee can receive only one award and the drawing starts with the third prize and ends with the first prize. Then, there are in total 100 × 99 × 98 = 970,200 possible outcomes.
To Edward, whose employee ID number is 10, the random variable of interest is
X(<10, *, *>) = 10,000
X(<*, 10, *>) = 50,000
X(<*, *, 10>) = 100,000
X(all other outcomes) = 0.
To Grace, whose employee ID number is 30, the random variable of interest is
Y(<30, *, *>) = 10,000
Y(<*, 30, *>) = 50,000
Y(<*, *, 30>) = 100,000
Y(all other outcomes) = 0.
The outcome spaces of random variables X and Y are identical. However, X and Y map some outcomes to different real numbers. The spaces of X and Y are also identical; both are {0, 10000, 50000, 100000}. The probability functions of X and Y are also equal:
Prob(X = 10,000) = Prob(Y = 10,000) = 0.01
Prob(X = 50,000) = Prob(Y = 50,000) = 0.01
Prob(X = 100,000) = Prob(Y = 100,000) = 0.01
Prob(X = 0) = Prob(Y = 0) = 0.97.
The expected values of X and Y are equal: E[X] = E[Y] = 10,000 × 0.01 + 50,000 × 0.01 + 100,000 × 0.01 = 1,600.

Discrete Random Variables
Given a random variable X, let S denote the space of X. If S is a finite or countably infinite set, then X is said to be a discrete random variable.

Countably Infinite
A set is said to be countably infinite if it contains an infinite number of elements and there exists a one-to-one mapping between the elements of the set and the positive integers.

Examples of Countable / Uncountable Infinite Sets
The set of integers is countable. The set of rational numbers is countable. The set of real numbers is uncountable.

Probability Mass Function
The probability mass function (p.m.f.) of a discrete random variable X is defined to be
$P_X(k) = \mathrm{Prob}(X = k) = \sum_{q \in Q_k} \mathrm{Prob}(q)$,
where $Q_k$ contains all outcomes that are mapped to k by random variable X. In the previous drawing example,
$P_X(10{,}000) = \mathrm{Prob}(X = 10{,}000) = \sum_{\langle 10, i, j\rangle,\ i \neq 10,\ j \neq 10,\ i \neq j} \mathrm{Prob}(\langle 10, i, j\rangle) = \frac{99 \times 98}{100 \times 99 \times 98} = \frac{1}{100} = 0.01$.
In fact, the p.m.f. of a random variable is defined on a set of events of the experiment conducted. In the previous drawing example, the set of outcomes that are mapped to 10,000 by X is an event. Furthermore, in the previous drawing example, random variables X and Y map some outcomes to different real numbers. However, X and Y have the same distribution, i.e., the p.m.f. of X and the p.m.f. of Y are equal. More precisely, $P_X(k) = P_Y(k)$ for every $k \in \{0, 10000, 50000, 100000\}$.
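To make the drawing example concrete, here is a small Python simulation; it is not part of the original slides, and the function name draw_once and the trial count are illustrative choices. It estimates the p.m.f. and the expected value of X by repeating the three-prize drawing many times, and the estimates should land near the exact values 0.97, 0.01, 0.01, 0.01 and 1,600.

    import random

    # Hypothetical sketch of the prize drawing described above: 100 employee IDs,
    # three distinct winners drawn for the $10,000, $50,000, and $100,000 prizes.
    def draw_once(my_id=10, n_employees=100):
        third, second, first = random.sample(range(1, n_employees + 1), 3)
        if my_id == third:
            return 10_000
        if my_id == second:
            return 50_000
        if my_id == first:
            return 100_000
        return 0

    # Estimate the p.m.f. and expected value of X by simulation.
    trials = 200_000
    outcomes = [draw_once() for _ in range(trials)]
    for k in (0, 10_000, 50_000, 100_000):
        print(f"P(X={k}) is approximately {outcomes.count(k) / trials:.4f}")  # ~0.97, 0.01, 0.01, 0.01
    print("E[X] is approximately", sum(outcomes) / trials)                    # ~1,600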
Properties of the Probability Mass Function
The p.m.f. of a random variable X satisfies the following three properties:
(1) $P_X(x) \ge 0$ for every $x \in S$, the space of X, and $P_X(x) = 0$ for $x \notin S$.
(2) $\sum_{x_i \in S} P_X(x_i) = 1$.
(3) $\mathrm{Prob}(X \in A) = \sum_{x_j \in A} P_X(x_j)$, where $A \subseteq S$.

Probability Distribution Function
For a random variable X, we define its probability distribution function F as
$F_X(t) = \mathrm{Prob}(X \le t)$.

Properties of a Probability Distribution Function
1. $\lim_{t \to \infty} F_X(t) = 1$.
2. $\lim_{t \to -\infty} F_X(t) = 0$.
3. $F_X(w) \ge F_X(t)$ if $w \ge t$.
Any function that satisfies the conditions above can be a distribution function.

An Example of the Probability Distribution Function of a Discrete Random Variable
Assume that we toss a 4-sided die twice. Then, we have 16 possible outcomes:
(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,1), (3,2), (3,3), (3,4), (4,1), (4,2), (4,3), (4,4).
Let random variable X be the sum of the two tosses. Then,
Prob(X = 2) = 1/16, Prob(X = 3) = 2/16,
Prob(X = 4) = 3/16, Prob(X = 5) = 4/16,
Prob(X = 6) = 3/16, Prob(X = 7) = 2/16,
Prob(X = 8) = 1/16.
$F_X(5) = \mathrm{Prob}(X \le 5) = \frac{1}{16} + \frac{2}{16} + \frac{3}{16} + \frac{4}{16} = \frac{5}{8}$.

Operations on Random Variables
Let X and Y be two random variables defined on the same outcome space of an experiment. Then, we can define a new random variable Z = f(X, Y). For example, in the drawing example, if Edward and Grace are husband and wife, then we can define a new random variable Z = X + Y. We have
X(<30, 10, *>) = 50,000
Y(<30, 10, *>) = 10,000
Z(<30, 10, *>) = 60,000.

Functions of Random Variables
Let X be a random variable and G be a function. Then, random variable Y = G(X) maps an outcome ν in the outcome space of X to the value G(X(ν)). With respect to the probability distribution functions, if G is a monotonically increasing, one-to-one mapping, then
$F_Y(t) = \mathrm{Prob}(Y \le t) = \mathrm{Prob}(G(X) \le t) = \mathrm{Prob}(X \le G^{-1}(t)) = F_X(G^{-1}(t))$.

An Example of Functions of Random Variables
Let random variable X be the sum of two tosses of a 4-sided die and Y = X². Then,
$F_Y(16) = \mathrm{Prob}(Y \le 16) = \mathrm{Prob}(X^2 \le 16) = \mathrm{Prob}(X \le 4) = F_X(4) = P_X(4) + P_X(3) + P_X(2) = \frac{6}{16} = \frac{3}{8}$.

Expected Value of a Discrete Random Variable
Let X be a discrete random variable and S be its space. Then, the expected value of X is
$E[X] = \sum_{z \in C} \mathrm{Prob}(z)\, X(z) = \sum_{x_i \in S} x_i P_X(x_i)$.
μ is a widely used symbol for the expected value.

Expected Value of a Function of a Random Variable
Let X be a random variable and G be a function. Then, the expected value of random variable Y = G(X) is equal to
$E[Y] = \sum_{x_i \in S} G(x_i) P_X(x_i)$.
Proof:
$E[Y] = \sum_{y_i \in S'} y_i P_Y(y_i)$, where S' is the space of Y,
$= \sum_{y_i \in S'} y_i\, \mathrm{Prob}(Y = y_i)$
$= \sum_{y_i \in S'} \;\sum_{\text{all } x_j \text{ such that } G(x_j) = y_i} G(x_j)\, \mathrm{Prob}(X = x_j)$
$= \sum_{x_j \in S} G(x_j) P_X(x_j)$.
For example, let X correspond to the outcome of tossing a die once. Then,
$P_X(1) = P_X(2) = P_X(3) = P_X(4) = P_X(5) = P_X(6) = 1/6$ and E[X] = 3.5.
Suppose we are concerned about the difference between the observed outcome and the mean and define Y = |X − E[X]|. Then $P_Y(1/2) = 1/3$, $P_Y(3/2) = 1/3$, $P_Y(5/2) = 1/3$. Therefore,
$E[Y] = \frac{1}{2}\cdot\frac{1}{3} + \frac{3}{2}\cdot\frac{1}{3} + \frac{5}{2}\cdot\frac{1}{3} = \frac{9}{2}\cdot\frac{1}{3} = \frac{3}{2}$.
On the other hand, computing directly over the space of X,
$\sum_{x_i} |x_i - E[X]|\, P_X(x_i) = \sum_{x_i} |x_i - 3.5|\cdot\frac{1}{6} = \frac{1}{6}(2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5) = \frac{9}{6} = \frac{3}{2}$.
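The die examples above can be checked with a short Python sketch; it is my own, not from the slides, and it uses exact fractions so the results are not obscured by rounding. It rebuilds the p.m.f. of the sum of two 4-sided dice, evaluates F_X(5) = 5/8, and computes E[|X − E[X]|] = 3/2 for a single 6-sided die directly over the space of X.

    from fractions import Fraction
    from itertools import product

    # p.m.f. of X, the sum of two tosses of a 4-sided die (each outcome has probability 1/16).
    pmf = {}
    for a, b in product(range(1, 5), repeat=2):
        s = a + b
        pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 16)

    for s in sorted(pmf):
        print(s, pmf[s])                              # 2 1/16, 3 1/8, 4 3/16, 5 1/4, 6 3/16, 7 1/8, 8 1/16
    print(sum(p for k, p in pmf.items() if k <= 5))   # F_X(5) = 5/8

    # Expected value of a function of a random variable, using a single 6-sided die:
    # E[|X - E[X]|] computed directly over the space of X.
    die = {k: Fraction(1, 6) for k in range(1, 7)}
    mean = sum(k * p for k, p in die.items())         # E[X] = 7/2
    print(sum(abs(k - mean) * p for k, p in die.items()))  # 3/2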
Theorems about the Expected Value
(a) If c is a constant, E[c] = c.
(b) If c is a constant and g is a function, E[c g(X)] = c E[g(X)].
(c) If c1 and c2 are constants and g1 and g2 are functions, then E[c1 g1(X) + c2 g2(X)] = c1 E[g1(X)] + c2 E[g2(X)].
Proof of (a): Trivial.
Proof of (b): $E[c\, g(X)] = \sum_{x_i \in S} c\, g(x_i) P_X(x_i)$, where S is the space of X and $P_X(x)$ is the p.m.f. of X, $= c \sum_{x_i \in S} g(x_i) P_X(x_i) = c\, E[g(X)]$.
Proof of (c):
$E[c_1 g_1(X) + c_2 g_2(X)] = \sum_{x_i \in S} \big(c_1 g_1(x_i) + c_2 g_2(x_i)\big) P_X(x_i)$
$= c_1 \sum_{x_i \in S} g_1(x_i) P_X(x_i) + c_2 \sum_{x_i \in S} g_2(x_i) P_X(x_i)$
$= c_1 E[g_1(X)] + c_2 E[g_2(X)]$.
An extension of (c):
$E\left[\sum_{i=1}^{k} c_i g_i(X)\right] = \sum_{i=1}^{k} c_i E[g_i(X)]$.

Variance of a Discrete Random Variable
The variance of a random variable is defined to be $E[(X - \mu)^2]$ and is typically denoted by σ². For a discrete random variable X,
$\sigma^2 = \mathrm{Var}[X] = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$.
σ is normally called the standard deviation.

Variance of a Discrete Random Variable
Let X be a random variable with mean μ_X and variance σ_X². Let Y = aX + b, where a and b are constants. Then,
$E[Y] = E[aX + b] = a E[X] + b = a\mu_X + b$,
$\mathrm{Var}[Y] = E[(Y - \mu_Y)^2] = E[(aX + b - a\mu_X - b)^2] = E[a^2 (X - \mu_X)^2] = a^2 E[(X - \mu_X)^2] = a^2 \sigma_X^2$.

Variance of a Random Variable
The variance of a random variable measures the deviation of its distribution from the mean. For example, in one drawing, Robert has a 0.1% chance to win $100,000, while in another drawing, he has a 0.01% chance to win $1,000,000. The expected amounts of the award in these two drawings are equal:
0.001 × 100,000 = 100
0.0001 × 1,000,000 = 100.
However, their variances are different:
0.001 × (100,000 − 100)² + 0.999 × (0 − 100)² = 9,990,000
0.0001 × (1,000,000 − 100)² + 0.9999 × (0 − 100)² = 99,990,000.
In many distributions, the mean and variance together uniquely determine the parameters of the random variable.

The Bernoulli Experiment and Distribution
A Bernoulli experiment is a random experiment, the outcome of which can be classified in one of two mutually exclusive and exhaustive ways, say, success and failure. A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times, so that the probability of success, say p, remains the same from trial to trial.

The Bernoulli Distribution
Let X be a Bernoulli random variable. The p.m.f. of X can be written as
$P_X(k) = p^k (1 - p)^{1 - k}$,
where k = 0 or 1 and p is the probability of success. The expected value of X is
$\sum_{k=0}^{1} k\, p^k (1 - p)^{1 - k} = p$.
The variance of X is
$\sum_{k=0}^{1} (k - p)^2\, p^k (1 - p)^{1 - k} = p(1 - p)$.

The Binomial Distribution
Let X be the random variable corresponding to the number of successes in a sequence of n Bernoulli trials. Then,
$P_X(k) = \mathrm{Prob}(X = k) = C_k^n\, p^k (1 - p)^{n - k}$,
where n is the number of Bernoulli trials and p is the probability of success in one trial. X is said to have a binomial distribution, normally denoted by b(n, p).

Example of the Binomial Distribution
Assume that Tiger and Whale are the two teams that enter the championship series of the professional basketball league. Based on prior records, Tiger has a 60% chance of beating Whale in a single game. Larry, who is a fan of Tiger, makes a bet with Peter, who is a fan of Whale. According to their agreement, Larry will pay Peter $1000 should Whale win the 5-game series. In order to make a fair bet, how much should Peter pay Larry if Tiger wins the series?
The probability that Tiger wins the series is
$C_3^5 (0.6)^3 (0.4)^2 + C_4^5 (0.6)^4 (0.4) + C_5^5 (0.6)^5 \approx 0.6826$.
$Z \times 0.6826 = 1000 \times (1 - 0.6826) \;\Rightarrow\; Z \approx 465$.
If the championship series consists of 3 games, then what is the probability that Tiger wins the series?
$C_2^3 (0.6)^2 (0.4) + C_3^3 (0.6)^3 = 0.648 < 0.6826$.
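As a check on the championship example, the following Python sketch, with my own helper name series_win_prob, sums the binomial tail to reproduce the series-win probability of about 0.6826, the fair payout Z of roughly 465, and the 0.648 figure for a 3-game series.

    from math import comb

    # Treat the series as n Bernoulli trials with per-game win probability p; winning
    # the series is equivalent to winning a majority of the n games.
    def series_win_prob(p, n_games):
        need = n_games // 2 + 1
        return sum(comb(n_games, k) * p**k * (1 - p)**(n_games - k)
                   for k in range(need, n_games + 1))

    p5 = series_win_prob(0.6, 5)
    print(round(p5, 4))                        # 0.6826
    # Fair bet: Z * P(Tiger wins) = 1000 * P(Whale wins)  =>  Z ~ 465
    print(round(1000 * (1 - p5) / p5))         # 465
    print(round(series_win_prob(0.6, 3), 4))   # 0.648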
The Moment-Generating Function
Let X be a discrete random variable with p.m.f. $P_X(x)$ and space S. If there is a positive number h such that
$E[e^{tX}] = \sum_{x_i \in S} e^{t x_i} P_X(x_i)$
exists and is finite for −h < t < h, then the function of t defined by $M(t) = E[e^{tX}]$ is called the moment-generating function of X, often abbreviated as m.g.f.

The Moment-Generating Function
Let X and Y be two discrete random variables with the same space S. If $E[e^{tX}] = E[e^{tY}]$ for all t in an open interval around 0, then the probability mass functions of X and Y are equal.
Insight of the argument above: assume that S = {s1, s2, ..., sk} contains only positive integers. Then, we have
$P_X(s_1) e^{t s_1} + P_X(s_2) e^{t s_2} + \cdots + P_X(s_k) e^{t s_k} = P_Y(s_1) e^{t s_1} + P_Y(s_2) e^{t s_2} + \cdots + P_Y(s_k) e^{t s_k}$.
Viewing both sides as polynomials in $e^t$ with exponents $s_1, \ldots, s_k$ and matching coefficients, we get $P_X(s_i) = P_Y(s_i)$ for every i, i.e., X and Y have the same p.m.f.

The Moment-Generating Function
Let $M_X(t)$ be the m.g.f. of a discrete random variable X. Then
$\frac{d^K}{dt^K} M_X(t) = \sum_{x_i \in S} x_i^K e^{t x_i} P_X(x_i)$.
Furthermore,
$\frac{d^K}{dt^K} M_X(0) = \sum_{x_i \in S} x_i^K P_X(x_i) = E[X^K]$.
In particular, $\mu_X = M_X'(0)$ and $\sigma_X^2 = M_X''(0) - \big(M_X'(0)\big)^2$.

The Moment-Generating Function of the Binomial Distribution
Let X be b(n, p). The direct computations
$E[X] = \sum_{k=0}^{n} k\, C_k^n\, p^k (1 - p)^{n - k}$ and $E[X^2] = \sum_{k=0}^{n} k^2\, C_k^n\, p^k (1 - p)^{n - k}$
are both difficult to compute. On the other hand, we can easily derive the m.g.f. of a binomial distribution:
$M_X(t) = E[e^{tX}] = \sum_{k=0}^{n} e^{tk}\, C_k^n\, p^k (1 - p)^{n - k} = \sum_{k=0}^{n} C_k^n (p e^t)^k (1 - p)^{n - k} = (p e^t + 1 - p)^n$.

The Moment-Generating Function of the Binomial Distribution
$M_X'(t) = n (p e^t + 1 - p)^{n-1} p e^t$
$M_X''(t) = n(n-1)(p e^t + 1 - p)^{n-2} (p e^t)^2 + n (p e^t + 1 - p)^{n-1} p e^t$
$M_X'(0) = np$
$M_X''(0) = n(n-1) p^2 + np$.
Therefore,
$\mu_X = M_X'(0) = np$,
$\sigma_X^2 = M_X''(0) - \big(M_X'(0)\big)^2 = n^2 p^2 - np^2 + np - n^2 p^2 = np(1 - p)$.

The Poisson Process
A Poisson process models the number of times that a particular type of event occurs during a time interval. The Poisson process is based on the following three assumptions:
(1) The numbers of event occurrences in non-overlapping intervals are independent.
(2) For small Δt, Prob(one occurrence between times t and t + Δt) is approximately λΔt.
(3) As Δt → 0, the probability of two or more occurrences between times t and t + Δt becomes negligible relative to Δt.
λ is the only parameter of the Poisson process. One example of the Poisson process is to model the number of Web accesses that a Web server receives between 8 AM and 9 AM.

The Basis of the Assumptions of the Poisson Process
Assume that an ideal random number generator generates λ numbers in [0, 1]. If we divide [0, 1] evenly into n subintervals, then the probability that exactly one of the generated numbers falls in [0, 1/n] is
$C_1^{\lambda} \left(\frac{1}{n}\right) \left(1 - \frac{1}{n}\right)^{\lambda - 1} = \frac{\lambda}{n} \left(1 - \frac{1}{n}\right)^{\lambda - 1}$.
The probability that exactly two of the generated numbers fall in [0, 1/n] is
$C_2^{\lambda} \left(\frac{1}{n}\right)^2 \left(1 - \frac{1}{n}\right)^{\lambda - 2} = \frac{\lambda(\lambda - 1)}{2 n^2} \left(1 - \frac{1}{n}\right)^{\lambda - 2}$.
Let Δt = 1/n. Then,
$\lim_{\Delta t \to 0} \mathrm{Prob}(\text{one occurrence in } [0, \Delta t]) = \lim_{n \to \infty} \frac{\lambda}{n}\left(1 - \frac{1}{n}\right)^{\lambda - 1} = \lambda\, \Delta t$,
$\lim_{\Delta t \to 0} \mathrm{Prob}(\text{two occurrences in } [0, \Delta t]) = \lim_{n \to \infty} \frac{\lambda(\lambda - 1)}{2 n^2}\left(1 - \frac{1}{n}\right)^{\lambda - 2} = \frac{\lambda(\lambda - 1)}{2}\, \Delta t^2$,
which vanishes faster than Δt.

The Poisson Distribution
Assume that we are concerned about a Poisson process with parameter λ and want to count the number of event occurrences during one time interval. We can divide the time interval (from time 0 to time 1) evenly into n subintervals, each of length 1/n. The probability that the event occurs k times during the time interval is
$\lim_{n \to \infty} C_k^n \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k} = \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \cdot \frac{\lambda^k}{n^k} \left(1 - \frac{\lambda}{n}\right)^{n - k}$.
Since
$\lim_{n \to \infty} \frac{n!}{(n-k)!\, n^k} = 1$ and $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n - k} = e^{-\lambda}$,
the final result is
$\frac{\lambda^k}{k!} e^{-\lambda}$.

The Poisson Distribution
We say that a random variable X has a Poisson distribution if
$P_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}$.
By the Maclaurin series, we have
$\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}$.
Therefore,
$\sum_{k=0}^{\infty} P_X(k) = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} e^{-\lambda} = e^{\lambda} e^{-\lambda} = 1$.

The Poisson Distribution
The moment-generating function of a random variable with the Poisson distribution is
$M_X(t) = E[e^{Xt}] = \sum_{k=0}^{\infty} e^{kt} \frac{\lambda^k}{k!} e^{-\lambda} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$.
$M_X'(t) = \lambda e^t\, e^{\lambda(e^t - 1)}$
$M_X''(t) = \lambda e^t\, e^{\lambda(e^t - 1)} + (\lambda e^t)^2\, e^{\lambda(e^t - 1)}$.
Therefore,
$\mu_X = M_X'(0) = \lambda$ and $\sigma_X^2 = M_X''(0) - \big(M_X'(0)\big)^2 = \lambda + \lambda^2 - \lambda^2 = \lambda$.
Therefore, λ is the average rate of event occurrences per unit of time.
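The limit argument and the moment computations above can be illustrated numerically with the following sketch; the parameter value λ = 2 and the helper poisson_pmf are my own choices. The binomial(n, λ/n) probability at k = 3 approaches the Poisson value as n grows, and the mean and variance of the Poisson p.m.f. both come out as λ.

    from math import comb, exp, factorial

    lam = 2.0

    def poisson_pmf(k, lam):
        return lam**k * exp(-lam) / factorial(k)

    # Binomial(n, lam/n) at k = 3 approaches the Poisson(lam) value (about 0.18045) as n grows,
    # mirroring the limit derivation above.
    for n in (10, 100, 10_000):
        p = lam / n
        binom_at_3 = comb(n, 3) * p**3 * (1 - p)**(n - 3)
        print(n, round(binom_at_3, 5), round(poisson_pmf(3, lam), 5))

    # Mean and variance, summing far enough into the tail for the remainder to be negligible.
    mean = sum(k * poisson_pmf(k, lam) for k in range(50))
    var = sum(k**2 * poisson_pmf(k, lam) for k in range(50)) - mean**2
    print(round(mean, 6), round(var, 6))   # 2.0 2.0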
Let Y be the random variable corresponding to the number of event occurrences during a time interval of length t. Then,
$P_Y(k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}$.

The Poisson Distribution
The probability that the event occurs k times during a time interval of length t is
$\lim_{n \to \infty} C_k^n \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n - k} = \frac{(\lambda t)^k}{k!} e^{-\lambda t}$.

Joint Distributions

Joint Probability Mass Function
Let X and Y be two discrete random variables defined on the same outcome set. The probability that X = x and Y = y is denoted by $P_{X,Y}(x, y) = \mathrm{Prob}(X = x, Y = y)$ and is called the joint probability mass function (joint p.m.f.) of X and Y. $P_{X,Y}(x, y)$ satisfies the following three properties:
(1) $0 \le P_{X,Y}(x, y) \le 1$.
(2) $\sum_{(x, y) \in S} P_{X,Y}(x, y) = 1$.
(3) $\mathrm{Prob}((X, Y) \in A) = \sum_{(x, y) \in A} P_{X,Y}(x, y)$, where A is a subset of S, the space of (X, Y).

Example of Joint Distributions
Assume that a supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing Wine    Not Purchasing Wine
  Male            45                  255
  Female          70                  630

            Purchasing Juice   Not Purchasing Juice
  Male            60                  240
  Female         210                  490

Let random variable M correspond to whether a customer is male, random variable W correspond to whether a customer purchases wine, and random variable J correspond to whether a customer purchases juice. The joint p.m.f. of M and W is
P_{M,W}(0,1) = 0.07    P_{M,W}(1,1) = 0.045
P_{M,W}(0,0) = 0.63    P_{M,W}(1,0) = 0.255.
The joint p.m.f. of M and J is
P_{M,J}(0,1) = 0.21    P_{M,J}(1,1) = 0.06
P_{M,J}(0,0) = 0.49    P_{M,J}(1,0) = 0.24.

Marginal Probability Mass Function
Let $P_{X,Y}(x, y)$ be the joint p.m.f. of discrete random variables X and Y. Then
$P_X(x) = \mathrm{Prob}(X = x) = \sum_{y_j} \mathrm{Prob}(X = x, Y = y_j) = \sum_{y_j} P_{X,Y}(x, y_j)$
is called the marginal p.m.f. of X. Similarly,
$P_Y(y) = \sum_{x_i} P_{X,Y}(x_i, y)$
is called the marginal p.m.f. of Y.

More on the Joint Probability Mass Function
Note that we can always create a common outcome set for any two or more random variables. For example, let X and Y correspond to the outcomes of the first and second tosses of a coin, respectively. Then, the outcome set of X is {head up, tail up} and the outcome set of Y is also {head up, tail up}. The common outcome set of X and Y is {(head up, head up), (head up, tail up), (tail up, head up), (tail up, tail up)}.

Independent Random Variables
Two discrete random variables X and Y are said to be independent if and only if, for all possible combinations of x and y,
$P_{X,Y}(x, y) = P_X(x) P_Y(y)$.
Otherwise, X and Y are said to be dependent.

Example of Independent Random Variables
Assume that a supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing soft drinks   Not purchasing soft drinks
  Male             90                       210
  Female          210                       490

Let random variable M correspond to whether a customer is male and random variable S correspond to whether a customer purchases soft drinks. Then, M and S are independent, since for all possible combinations of the values of M and S we have Prob(M = i, S = j) = Prob(M = i) Prob(S = j).
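The supermarket tables above can be turned into joint and marginal p.m.f.s mechanically. The sketch below, whose helper and variable names are my own, does so from the raw counts and tests the independence condition, confirming that M and W are dependent while M and S are independent.

    from itertools import product

    # Build the joint p.m.f. from a table of counts keyed by (M, other variable),
    # derive the marginal p.m.f.s, and test P(x, y) = P_X(x) * P_Y(y) for all cells.
    def analyze(counts):
        total = sum(counts.values())
        joint = {k: v / total for k, v in counts.items()}
        p_m = {m: sum(p for (mm, _), p in joint.items() if mm == m) for m in (0, 1)}
        p_other = {w: sum(p for (_, ww), p in joint.items() if ww == w) for w in (0, 1)}
        independent = all(abs(joint[m, w] - p_m[m] * p_other[w]) < 1e-12
                          for m, w in product((0, 1), repeat=2))
        return joint, p_m, p_other, independent

    # Keys are (M, W) or (M, S): 1 = male / purchases, 0 = female / does not purchase.
    wine = {(1, 1): 45, (1, 0): 255, (0, 1): 70, (0, 0): 630}
    soft_drinks = {(1, 1): 90, (1, 0): 210, (0, 1): 210, (0, 0): 490}
    print(analyze(wine)[1:])          # marginals of M and W, then False (dependent)
    print(analyze(soft_drinks)[1:])   # marginals of M and S, then True (independent)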
Another Example of Joint Distributions

  Object   X     Y     Class      Object   X     Y     Class
  1        7.1   9.1   1          11       10.9  8.8   2
  2        6.7   10.2  1          12       10.8  10.3  2
  3        7.5   10.6  1          13       11.1  11    2
  4        7.6   8.8   1          14       12.3  9.1   2
  5        8.1   10.3  1          15       12.1  9.7   2
  6        8.0   11.0  1          16       12    10.9  2
  7        8.6   8.9   1          17       13.1  8.9   2
  8        8.7   9.8   1          18       12.8  10.1  2
  9        9.2   11.2  1          19       13.2  11.3  2
  10       6.5   10.1  1          20       13.7  9.9   2
  Average  7.8   10.0  -          Average  12.2  10.0  -

[Scatter plots omitted: the joint p.m.f. of X, Y, and C; the joint p.m.f. of X and C; the joint p.m.f. of Y and C.]

Joint Distribution Function
Let X and Y be two random variables. The joint distribution function is defined as
$F_{X,Y}(x, y) = \mathrm{Prob}(X \le x, Y \le y)$.
Note that this definition applies to both discrete and continuous random variables.

Joint Probability Density Function
Assume that X and Y are two continuous random variables defined on the same space S. The joint probability density function of X and Y is defined as
$f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$.
X and Y are said to be independent if and only if
$f_{X,Y}(x, y) = f_X(x)\, f_Y(y)$.
In some textbooks, it is defined that two random variables are independent if and only if
$F_{X,Y}(x, y) = F_X(x)\, F_Y(y)$.
We have
$F_{X,Y}(x, y) = F_X(x) F_Y(y) \;\Rightarrow\; f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y} = \frac{d F_X(x)}{dx} \cdot \frac{d F_Y(y)}{dy} = f_X(x)\, f_Y(y)$.
The marginal p.d.f. of X is
$f_X(x) = \int f_{X,Y}(x, y)\, dy$
and the marginal p.d.f. of Y is
$f_Y(y) = \int f_{X,Y}(x, y)\, dx$.

Jointly Independent and Pairwise Independent
Note that even if we have
$P_{X,Y}(x, y) = P_X(x) P_Y(y)$,
$P_{Y,Z}(y, z) = P_Y(y) P_Z(z)$,
$P_{X,Z}(x, z) = P_X(x) P_Z(z)$,
it is not necessarily true that
$P_{X,Y,Z}(x, y, z) = P_X(x) P_Y(y) P_Z(z)$.

An Example of Pairwise Independence
Let X and Y be two random variables that correspond to tossing an unbiased coin two times, and let Z = X ⊕ Y (the exclusive-or of X and Y). Then
Prob(Z = 0) = Prob(X = 0, Y = 0) + Prob(X = 1, Y = 1) = 1/2,
Prob(X = 0, Z = 0) = Prob(X = 0, Y = 0) = 1/4 = Prob(X = 0) Prob(Z = 0).
Therefore, X, Y, and Z are pairwise independent. However,
Prob(X = 0, Y = 0, Z = 1) = 0, while Prob(X = 0) Prob(Y = 0) Prob(Z = 1) = 1/8.
Hence, X, Y, and Z are not jointly independent. On the other hand, joint independence implies pairwise independence. For example,
$P_{X,Y}(x, y) = \sum_z P_{X,Y,Z}(x, y, z) = \sum_z P_X(x) P_Y(y) P_Z(z) = P_X(x) P_Y(y) \sum_z P_Z(z) = P_X(x) P_Y(y)$.

Addition of Two Random Variables
Let X and Y be two random variables. Then, E[X + Y] = E[X] + E[Y]. Note that the above equation holds even if X and Y are dependent.
Proof of the discrete case:
$E[X + Y] = \sum_x \sum_y P_{X,Y}(x, y)(x + y)$
$= \sum_x \sum_y x\, P_{X,Y}(x, y) + \sum_x \sum_y y\, P_{X,Y}(x, y)$
$= \sum_x x \sum_y P_{X,Y}(x, y) + \sum_y y \sum_x P_{X,Y}(x, y)$
$= \sum_x x\, P_X(x) + \sum_y y\, P_Y(y) = E[X] + E[Y]$.
On the other hand,
$\mathrm{Var}[X + Y] = E[((X + Y) - (\mu_X + \mu_Y))^2]$
$= E[(X + Y)^2 - 2(X + Y)(\mu_X + \mu_Y) + (\mu_X + \mu_Y)^2]$
$= E[(X + Y)^2] - (\mu_X + \mu_Y)^2$
$= E[X^2] + E[Y^2] + 2E[XY] - \mu_X^2 - \mu_Y^2 - 2\mu_X \mu_Y$
$= (E[X^2] - \mu_X^2) + (E[Y^2] - \mu_Y^2) + 2(E[XY] - \mu_X \mu_Y)$
$= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2(E[XY] - E[X]E[Y])$.
Note that if X and Y are independent, then
$E[XY] = \sum_x \sum_y xy\, P_{X,Y}(x, y) = \sum_x \sum_y xy\, P_X(x) P_Y(y) = \sum_x x\, P_X(x) \sum_y y\, P_Y(y) = E[X]\, E[Y]$.
Therefore, if X and Y are independent, then Var[X + Y] = Var[X] + Var[Y].

Covariance
Let X and Y be two random variables. Then, $E[(X - \mu_X)(Y - \mu_Y)]$ is called the covariance of X and Y and is denoted by $\sigma_{XY}$, where $\mu_X$ and $\mu_Y$ are the means of X and Y, respectively.
$E[(X - \mu_X)(Y - \mu_Y)] = E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y = E[XY] - \mu_X \mu_Y$.
Therefore, if X and Y are independent, then Cov[X, Y] = 0.

Examples of Correlated Random Variables
Assume that a supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing Wine    Not Purchasing Wine
  Male            45                  255
  Female          70                  630

            Purchasing Juice   Not Purchasing Juice
  Male            60                  240
  Female         210                  490

Let random variable M correspond to whether a customer is male, random variable W correspond to whether a customer purchases wine, and random variable J correspond to whether a customer purchases juice. The joint p.m.f. of M and W is
P_{M,W}(0,1) = 0.07    P_{M,W}(1,1) = 0.045
P_{M,W}(0,0) = 0.63    P_{M,W}(1,0) = 0.255.
Cov(M, W) = E[MW] − E[M]E[W] = 0.045 − 0.3 × 0.115 = 0.0105 > 0, so M and W are positively correlated.
The joint p.m.f. of M and J is
P_{M,J}(0,1) = 0.21    P_{M,J}(1,1) = 0.06
P_{M,J}(0,0) = 0.49    P_{M,J}(1,0) = 0.24.
Cov(M, J) = E[MJ] − E[M]E[J] = 0.06 − 0.3 × 0.27 = −0.021 < 0, so M and J are negatively correlated.
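A quick numerical check of the two covariances above, computed from the joint p.m.f.s directly; the helper cov_from_joint is my own name, not from the slides.

    # Cov(X, Y) = E[XY] - E[X]E[Y] for two 0/1 random variables given their joint p.m.f.
    def cov_from_joint(joint):
        e_x = sum(x * p for (x, _), p in joint.items())
        e_y = sum(y * p for (_, y), p in joint.items())
        e_xy = sum(x * y * p for (x, y), p in joint.items())
        return e_xy - e_x * e_y

    pmw = {(1, 1): 0.045, (1, 0): 0.255, (0, 1): 0.07, (0, 0): 0.63}
    pmj = {(1, 1): 0.06,  (1, 0): 0.24,  (0, 1): 0.21, (0, 0): 0.49}
    print(round(cov_from_joint(pmw), 4))   # 0.0105 > 0: M and W positively correlated
    print(round(cov_from_joint(pmj), 4))   # -0.021 < 0: M and J negatively correlated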
Covariance of Independent Random Variables
Assume that the supermarket also collected the following statistics of customers' purchasing behavior:

            Purchasing soft drinks   Not purchasing soft drinks
  Male             90                       210
  Female          210                       490

The joint p.m.f. of M and S is
P_{M,S}(0,1) = 0.21    P_{M,S}(1,1) = 0.09
P_{M,S}(0,0) = 0.49    P_{M,S}(1,0) = 0.21.
Cov(M, S) = E[MS] − E[M]E[S] = 0.09 − 0.3 × 0.3 = 0, due to the fact that M and S are independent.

Correlation Coefficient
The correlation coefficient of two random variables X and Y is defined as
$\rho = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$.

Bounds of a Correlation Coefficient
Let
$K(b) = E\big[((Y - \mu_Y) - b(X - \mu_X))^2\big] = \sigma_Y^2 - 2b\rho\sigma_X\sigma_Y + b^2\sigma_X^2$.
We have
$K\!\left(\frac{\rho\sigma_Y}{\sigma_X}\right) = \sigma_Y^2 (1 - \rho^2)$.
Since K(b) is the expected value of a square, K(b) ≥ 0 for all b ∈ R. Therefore,
$-1 \le \rho \le 1$.

Implication of the Value of the Correlation Coefficient
Assume that the supermarket collected the following statistics of customers' purchasing behavior:

            Purchasing cosmetics   Not purchasing cosmetics
  Male            10                      290
  Female         260                      440

Let random variable M correspond to whether a customer is male and random variable C correspond to whether a customer purchases cosmetics. Then, the correlation coefficient of M and C is −0.349.
On the other hand, we also have the following dataset:

            Purchasing juice   Not purchasing juice
  Male            60                  240
  Female         210                  490

The correlation coefficient of M and J is −0.103.

Another Example of Correlation Coefficients

  Object   X     Y     Class      Object   X     Y     Class
  1        7.1   9.1   1          11       10.9  8.8   2
  2        6.7   10.2  1          12       10.8  10.3  2
  3        7.5   10.6  1          13       11.1  11    2
  4        7.6   8.8   1          14       12.3  9.1   2
  5        8.1   10.3  1          15       12.1  9.7   2
  6        8.0   11.0  1          16       12    10.9  2
  7        8.6   8.9   1          17       13.1  8.9   2
  8        8.7   9.8   1          18       12.8  10.1  2
  9        9.2   11.2  1          19       13.2  11.3  2
  10       6.5   10.1  1          20       13.7  9.9   2
  Average  7.8   10.0  -          Average  12.2  10.0  -

[Scatter plots omitted: the joint p.m.f. of X, Y, and C; the joint p.m.f. of X and C; the joint p.m.f. of Y and C.]

The correlation coefficient of X and C is
$\frac{E[XC] - E[X]E[C]}{\sigma_X \sigma_C} = \frac{16.1 - 10 \times 1.5}{2.379 \times 0.5} \approx 0.925$.
On the other hand, the covariance of Y and C is E[YC] − E[Y]E[C] = 15 − 10 × 1.5 = 0, and therefore the correlation coefficient of Y and C is 0. With respect to data analysis, random variable X provides valuable information about the class of an object. On the other hand, random variable Y essentially provides no information about the class of an object.

Example of Uncorrelated Random Variables
Assume X and Y have the following joint p.m.f.:
$P_{X,Y}(0, 1) = P_{X,Y}(1, 0) = P_{X,Y}(2, 1) = 1/3$.
We have the following marginal p.m.f.s:
$P_X(0) = \sum_y P_{X,Y}(0, y) = 1/3$; $P_X(1) = \sum_y P_{X,Y}(1, y) = 1/3$; $P_X(2) = \sum_y P_{X,Y}(2, y) = 1/3$;
$P_Y(0) = \sum_x P_{X,Y}(x, 0) = 1/3$; $P_Y(1) = \sum_x P_{X,Y}(x, 1) = 2/3$.
Since $P_{X,Y}(0, 1) = 1/3 \neq P_X(0) P_Y(1) = \frac{1}{3} \times \frac{2}{3} = \frac{2}{9}$, X and Y are not independent. However,
$\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y] = \left(0 \cdot 1 \cdot \tfrac{1}{3} + 1 \cdot 0 \cdot \tfrac{1}{3} + 2 \cdot 1 \cdot \tfrac{1}{3}\right) - 1 \times \tfrac{2}{3} = \tfrac{2}{3} - \tfrac{2}{3} = 0$.
Therefore, independence implies uncorrelatedness, but the converse is not true.
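Finally, the quoted correlation coefficients can be reproduced from the raw 2×2 tables with the sketch below; the helper corr_2x2 is my own name, and both variables are treated as 0/1 indicators, so each standard deviation is that of a Bernoulli variable, sqrt(p(1 − p)).

    from math import sqrt

    def corr_2x2(n11, n10, n01, n00):
        """n11: male & purchasing, n10: male & not, n01: female & purchasing, n00: female & not."""
        n = n11 + n10 + n01 + n00
        e_m, e_c = (n11 + n10) / n, (n11 + n01) / n          # P(M=1), P(C=1)
        cov = n11 / n - e_m * e_c                            # E[MC] - E[M]E[C]
        return cov / (sqrt(e_m * (1 - e_m)) * sqrt(e_c * (1 - e_c)))

    print(round(corr_2x2(10, 290, 260, 440), 3))   # -0.349  (cosmetics)
    print(round(corr_2x2(60, 240, 210, 490), 3))   # -0.103  (juice)
    print(round(corr_2x2(90, 210, 210, 490), 3))   # ~0      (soft drinks: M and S independent)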