Probability and Statistical Review
Lecture 1
Manoranjan Majji
Lecture slides and notes available online: visit http://dnc.tamu.edu/Class Notes/AERO626/index.php

Probability and Statistical Review
Probability
– Motivating Example
– Definition of Probability
– Axioms of Probability
– Conditional Probability
– Bayes's Theorem
– Random Variables: Discrete and Continuous
– Expectation of Random Variables
– Multivariate Density Functions

Basic Probability Concepts
Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed.
– The statement "E has probability P(E)" means that if we perform the experiment very often, it is practically certain that the relative frequency of E is approximately equal to P(E).
What do we mean by relative frequency?
– The relative frequency is at least equal to 0 and at most equal to 1: $0 \le P(E) \le 1$.
– Frequency function: it shows how the values of the samples are distributed,
  $f(x) = f_j$ when $x = x_j$, and $f(x) = 0$ for any value $x$ not appearing in the sample.
– Sample distribution function: $F(x) = \sum_{t \le x} f(t)$.

Basic Probability Concepts
The frequency function characterizes a given sample in detail.
– From it we can compute numbers that characterize certain properties of the sample.
– Sample mean: $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{1}{n}\sum_{j=1}^{m} x_j\, n f(x_j) = \sum_{j=1}^{m} x_j f(x_j)$
– Sample variance: $s^2 = \frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2 = \sum_{j=1}^{m} (x_j - \bar{x})^2 f(x_j)$

Useful Definitions
Random experiment or random observation:
– It is performed according to a set of rules that determines the performance completely.
– It can be repeated arbitrarily often.
– The result of each performance depends on "chance" (that is, on influences we cannot control) and therefore cannot be uniquely predicted.
The result of a single performance of the experiment is called the outcome of that experiment.
The set of all possible outcomes of an experiment is called the sample space of the experiment.
In most practical problems, we are not interested in the individual outcomes of the experiment but in whether an outcome belongs to a certain set of outcomes. Such a set is called an event.

Useful Definitions
Impossible event: an event containing no element, denoted by $\emptyset$.
Mutually exclusive or disjoint events: $A \cap B = \emptyset$.
Example: consider the roll of a die.
Sample space: $S = \{1, 2, 3, 4, 5, 6\}$
E: the event that the die turns up an even number, $E = \{2, 4, 6\}$
O: the event that the die turns up an odd number, $O = \{1, 3, 5\}$
$E \cap O = \emptyset$, so E and O are mutually exclusive events.

Axioms of Probability
Property 1: $0 \le P(E) \le 1$.
Property 2: $P(S) = 1$.
Property 3: $P(E^c) = 1 - P(E)$.
Property 4: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Property 5: if $E_1, E_2, \ldots, E_n$ are mutually exclusive events, then $P(E_1 \cup E_2 \cup \cdots \cup E_n) = P(E_1) + P(E_2) + \cdots + P(E_n)$.

Conditional Probability
The probability of an event B under the condition that an event A occurs is given by
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
– $P(B \mid A)$ is called the conditional probability of B given A.
– In this case, event A serves as a new sample space, and event B becomes $A \cap B$.
– A and B are called independent events if $P(B \mid A) = P(B)$ and $P(A \mid B) = P(A)$, i.e., $P(A \cap B) = P(A)\,P(B)$.

Theorem of Total Probability
Let $B_1, B_2, \ldots, B_n$ be mutually exclusive events such that $\bigcup_{i=1}^{n} B_i = S$.
The probability of an event A can be represented as
$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n)$
and therefore
$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$

Bayes's Theorem
Assume there are m mutually exclusive states of nature (classes) labeled $\omega_j$ ($j = 1, 2, \ldots, m$), and let $P(x)$ be the probability that an observation assumes the specific value x.
Definitions:
– Prior probability: $P(\omega_j)$.
– Posterior probability: $P(\omega_j \mid x)$ (of class $\omega_j$ given observation x).
– Likelihood: $P(x \mid \omega_j)$ (conditional probability of observation x given class $\omega_j$).
Bayes's theorem gives the relationship between the m prior probabilities $P(\omega_j)$, the m likelihoods $P(x \mid \omega_j)$, and the one posterior probability of interest:
$P(\omega_j \mid x) = \frac{P(\omega_j)\, P(x \mid \omega_j)}{\sum_{k=1}^{m} P(\omega_k)\, P(x \mid \omega_k)}$

Exercise
Consider a clinical problem where we have to decide whether a patient has a particular rare disease on the basis of an imperfect medical test.
– 1 in 1000 people have the rare disease.
– The test is positive 99% of the time when a person has the disease.
– The test is positive 2% of the time when a person does not have the disease.
What is the probability that the person actually has the disease when the test is positive?
$P(A_1 \mid B) = \frac{P(B \mid A_1)\, P(A_1)}{P(B \mid A_1)\, P(A_1) + P(B \mid A_2)\, P(A_2)} = \frac{0.99 \times 0.001}{(0.99 \times 0.001) + (0.02 \times 0.999)} \approx 0.047$

Exercise (continued)
$P(A_1 \mid B) \approx 0.047 = 4.7\%$, which seems counter-intuitive. Why?
Most positive tests arise from error rather than from people actually having the disease.
– The prior 0.001 is updated to the posterior 0.047: the disease is rare and the test is only marginally reliable.
NOTE: if the disease were not so rare (say, 25% incidence), we would get a good diagnosis: $P(A_1 \mid B) \approx 0.94$.
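To make the update concrete, here is a minimal Python sketch of the rare-disease calculation above (my addition, not from the slides); the numbers are those of the exercise, and `posterior` is a hypothetical helper name.

```python
# Bayes's theorem for the rare-disease test (numbers from the exercise above).

def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem.

    The denominator is the total probability of a positive test,
    summed over the two states of nature (disease / no disease).
    """
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Rare disease: prior 0.001 -> posterior ~ 0.047
print(posterior(0.001, 0.99, 0.02))   # ~0.0472

# If the disease were common (25% incidence), the same test is informative:
print(posterior(0.25, 0.99, 0.02))    # ~0.94
```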
Random Variables
A random variable X (also called a stochastic variable) is a function whose values are real numbers and depend on "chance". More precisely, it is a function X with the following properties:
– X is defined on the sample space S of the experiment, and its values are real numbers.
– The function that assigns a value to each outcome is fixed and deterministic; the randomness is due to the underlying randomness of the argument of the function X.
Random variables can be discrete or continuous.

Discrete Random Variables
A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite.
Probability mass function of X: $f(x) = p_j$ when $x = x_j$, and $f(x) = 0$ otherwise.
Probability distribution function of X: $F(x) = P(X \le x)$
Properties of the distribution function:
$0 \le F(x) \le 1$
$P(a < x \le b) = F(b) - F(a)$

Continuous Random Variables and Distributions
A random variable X and the corresponding distribution are said to be continuous if the distribution function $F(x) = P(X \le x)$ of X can be represented in integral form,
$F(x) = \int_{-\infty}^{x} f(y)\, dy$
The integrand f(y) is called a probability density function, and $F'(x) = f(x)$.
Properties:
$\int_{-\infty}^{\infty} f(x)\, dx = 1$
$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\, dx$

Statistical Characterization of Random Variables
Expected value:
– The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x:
  $E[x] = \sum_x x\, P(x) = \sum_x x\, f(x)$
– The expected value of x is the "balancing point" of the probability mass function of x; that is, it is the arithmetic mean.
– We can take the expectation of any function of a random variable:
  $E[g(x)] = \sum_x g(x)\, f(x)$
– This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
– The expected value of a continuous random variable with density f(x) is given by $E[x] = \int_{-\infty}^{\infty} x f(x)\, dx$.
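As a small numerical illustration of these definitions (my sketch, not from the slides), the following evaluates $E[x]$ and $E[g(x)]$ for a fair six-sided die, and approximates the continuous formula for a uniform density on [0, 1] with a Riemann sum.

```python
import numpy as np

# Discrete case: a fair six-sided die, f(x) = 1/6 for x = 1..6.
values = np.arange(1, 7)
pmf = np.full(6, 1.0 / 6.0)

E_x = np.sum(values * pmf)        # E[x]    = sum_x x f(x)    -> 3.5
E_g = np.sum(values**2 * pmf)     # E[g(x)] with g(x) = x^2   -> ~15.17

# Continuous case: uniform density f(x) = 1 on [0, 1];
# E[x] = integral of x f(x) dx, approximated by a Riemann sum.
dx = 1e-5
x = np.arange(0.0, 1.0, dx)
E_cont = np.sum(x * 1.0 * dx)     # -> ~0.5

print(E_x, E_g, E_cont)
```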
Illustration of Expectation
A lottery has two schemes; the first scheme has two outcomes (denoted by 1 and 2) and the second has three (denoted by 1, 2, and 3). It is agreed that the participant in the first scheme gets $1 if the outcome is 1 and $2 if the outcome is 2. The participant in the second scheme gets $3 if the outcome is 1, -$2 if the outcome is 2, and $3 if the outcome is 3. The probabilities of each outcome are listed as follows:
p(1,1) = 0.1;  p(1,2) = 0.2;  p(1,3) = 0.3
p(2,1) = 0.2;  p(2,2) = 0.1;  p(2,3) = 0.1
Help the investor decide which scheme to prefer. [Bryson]

Example
Assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.

Value of x   Frequency   Probability function   Distribution function
 2           1           P(x=2)  = 1/36         P(x<=2)  = 1/36
 3           2           P(x=3)  = 2/36         P(x<=3)  = 3/36
 4           3           P(x=4)  = 3/36         P(x<=4)  = 6/36
 5           4           P(x=5)  = 4/36         P(x<=5)  = 10/36
 6           5           P(x=6)  = 5/36         P(x<=6)  = 15/36
 7           6           P(x=7)  = 6/36         P(x<=7)  = 21/36
 8           5           P(x=8)  = 5/36         P(x<=8)  = 26/36
 9           4           P(x=9)  = 4/36         P(x<=9)  = 30/36
10           3           P(x=10) = 3/36         P(x<=10) = 33/36
11           2           P(x=11) = 2/36         P(x<=11) = 35/36
12           1           P(x=12) = 1/36         P(x<=12) = 1
Sum          36          1.00

Average amount we pay = (($2)(1) + ($3)(2) + ... + ($12)(1))/36 = $7
E[x] = $2(1/36) + $3(2/36) + ... + $12(1/36) = $7

Example (continued)
Assume instead that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
– What would be the average loss this time? Will it be ($7)^2 = $49.00?
We are now interested in calculating $E[x^2]$:
– $E[x^2] = (2)^2(1/36) + \cdots + (12)^2(1/36) = \$54.83 \ne \$49$
– This result also emphasizes that $(E[x])^2 \ne E[x^2]$ in general.

Variance of Random Variable
The variance of a random variable x is defined as
$V(x) = \sigma^2 = E[(x - \mu)^2]$
$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$
This result is also known as the "parallel axis theorem".

Expectation Rules
Rule 1: E[k] = k, where k is a constant.
Rule 2: E[kx] = kE[x].
Rule 3: E[x ± y] = E[x] ± E[y].
Rule 4: If x and y are independent, E[xy] = E[x]E[y].
Rule 5: V[k] = 0, where k is a constant.
Rule 6: V[kx] = k^2 V[x].

Propagation of moments and density function through linear models
y = ax + b
– Given: $\mu = E[x]$ and $\sigma^2 = V[x]$. To find: E[y] and V[y].
$E[y] = E[ax] + E[b] = aE[x] + b = a\mu + b$
$V[y] = V[ax] + V[b] = a^2 V[x] + 0 = a^2 \sigma^2$
Let us define $z = (x - \mu)/\sigma$. Here $a = 1/\sigma$ and $b = -\mu/\sigma$, and therefore E[z] = 0 and V[z] = 1.
z is generally known as the "standardized variable".

Propagation of moments and density function through non-linear models
If x is a random variable with probability density function p(x) and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
– $p(y) = p(x)\,|J|^{-1}$, for all x given by $x = f^{-1}(y)$,
– where |J| is the determinant of the Jacobian matrix J.
Example: let $y = ax^2$ and $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp(-x^2 / 2\sigma_x^2)$.
NOTE: for each value of y there are two values of x, and both branches contribute:
$p(y) = \frac{1}{\sqrt{2\pi a y}\,\sigma_x} \exp(-y / 2a\sigma_x^2)$ for $y \ge 0$, and p(y) = 0 otherwise.
We can also show that $E(y) = a\sigma_x^2$ and $V(y) = 2a^2\sigma_x^4$.
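A quick Monte Carlo check of the $y = ax^2$ example is easy to write. The sketch below (my addition, with arbitrarily chosen a and σx) samples x from N(0, σx²) and compares the sample moments of y against the closed-form results $E(y) = a\sigma_x^2$ and $V(y) = 2a^2\sigma_x^4$.

```python
import numpy as np

rng = np.random.default_rng(0)

a, sigma_x = 2.0, 1.5
x = rng.normal(0.0, sigma_x, size=1_000_000)  # x ~ N(0, sigma_x^2)
y = a * x**2                                  # non-linear transformation

# Closed-form moments derived on the slide.
E_y_exact = a * sigma_x**2                    # = 4.5
V_y_exact = 2 * a**2 * sigma_x**4             # = 40.5

print(y.mean(), E_y_exact)   # sample mean     vs  a*sigma_x^2
print(y.var(), V_y_exact)    # sample variance vs  2*a^2*sigma_x^4
```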
Random Vectors
Just an extension of the random variable:
– A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
Joint probability functions:
– Joint probability distribution function:
  $F(\mathbf{X}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$
– Joint probability density function:
  $f(\mathbf{x}) = \frac{\partial^n F(\mathbf{X})}{\partial X_1\, \partial X_2 \cdots \partial X_n}$
Marginal probability functions: a marginal probability function is obtained by summing or integrating out the variables that are of no interest:
$F(x) = \sum_y P(x, y)$ or $f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$

Multivariate Expectations
Mean vector: $E[\mathbf{x}] = \boldsymbol{\mu} = [\,E[x_1]\ \ E[x_2]\ \ \cdots\ \ E[x_n]\,]^T$
The expected value of $g(x_1, x_2, \ldots, x_n)$ is given by
$E[g(\mathbf{x})] = \sum_{x_n} \cdots \sum_{x_1} g(\mathbf{x})\, f(\mathbf{x})$ or $\int_{x_n} \cdots \int_{x_1} g(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$
Covariance matrix:
$\operatorname{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE: $P = \Sigma R \Sigma$, where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ and
$R = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & & \vdots \\ \vdots & & \ddots & \\ \rho_{n1} & \cdots & & 1 \end{bmatrix}$
is the correlation matrix.

Covariance Matrix
The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e., to "co-vary".
Properties of the covariance matrix:
– The covariance matrix is square.
– The covariance matrix is positive semi-definite, $\mathbf{x}^T P \mathbf{x} \ge 0$, and positive definite ($\mathbf{x}^T P \mathbf{x} > 0$) in the non-degenerate case.
– The covariance matrix is symmetric: $P = P^T$.
– If $x_i$ and $x_j$ tend to increase together, then $P_{ij} > 0$.
– If $x_i$ and $x_j$ are uncorrelated, then $P_{ij} = 0$.

Probability Distribution Functions
There are many situations in statistics that involve the same type of probability functions.
– It is not necessary to derive these results over and over again in each special case with different numbers.
We can avoid this tedious process by recognizing the similarities between certain types of apparently unique experiments, and then matching a given case to a general formula.
Examples:
– Toss a coin: head or tail
– Take an exam: pass or fail
– Analyze the stock market: up or down
All of the above processes can be characterized by only two events, "success" and "failure".

Binomial Distribution
The binomial distribution plays an important role in experiments involving repeated independent trials, each with just two possible outcomes.
– Independent trials means the result of one trial cannot influence the result of other trials.
– Repeated trials means the probability of "success" or "failure" does not change from trial to trial.
In the binomial distribution, we are interested in the probability of receiving a certain number of successes.
Assume that we have n independent trials, each having the same probability of success p,
– probability of failure: q = 1 - p.
Say we are interested in determining the probability of x successes in n trials:
– Find the probability of any one occurrence of this type and then multiply this value by the number of possible occurrences.

Binomial Distribution
One of the possible occurrences is
$\underbrace{SS \cdots S}_{x \text{ times}}\, \underbrace{FF \cdots F}_{n-x \text{ times}}$
The joint probability of this particular sequence is given by $p^x q^{n-x}$.
NOTE: $p^x q^{n-x}$ represents the probability not only of this one arrangement but of any possible arrangement of x successes and n-x failures.
How many arrangements of x successes and n-x failures are possible? $^nC_x = \frac{n!}{x!\,(n-x)!}$
$P(x \text{ successes in } n \text{ trials}) = {}^nC_x\, p^x q^{n-x}$
The binomial distribution is discrete in nature, as x and n can take only discrete values.

Mean and Variance of Binomial Distribution
Mean: $E[x_{\text{binomial}}] = np$
Variance: $V[x_{\text{binomial}}] = \sigma^2 = E[x^2] - (E[x])^2 = npq$
Example: A football executive claims that 90% of viewers watch football over baseball on a concurrent telecast. An advertising agency claims that the viewers for each are 50%. Who is right? We did a survey of 25 households and found that in 10 of them the games were being viewed, with the following breakdown: viewing football, 7; viewing baseball, 3. Which of the two reports is correct? (One way to compare the claims in code is sketched below.)
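As a rough way to weigh the two claims (my sketch; the slides leave the exercise open), one can compare how likely "7 of 10 viewing households watching football" is under each hypothesized p using the binomial formula above.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

n, x = 10, 7   # 10 viewing households, 7 watching football

# Likelihood of the observed split under each claim:
print(binomial_pmf(x, n, 0.9))   # executive's claim, p = 0.9 -> ~0.057
print(binomial_pmf(x, n, 0.5))   # agency's claim,    p = 0.5 -> ~0.117
# The observed data are roughly twice as likely under the 50% claim.
```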
Hypergeometric Distribution
The binomial distribution is important in sampling with replacement, but many practical problems involve sampling without replacement.
– In that case the hypergeometric distribution can be used to obtain the precise probability:
$f(x) = \frac{{}^MC_x \cdot {}^{N-M}C_{n-x}}{{}^NC_n}$, with mean $\mu = \frac{nM}{N}$ and variance $\sigma^2 = \frac{nM(N-M)(N-n)}{N^2(N-1)}$
– Example: We want to pick two apples from a box containing 15 apples, 5 of which are rotten. Find the probability function for the number of rotten apples in our sample.
Without replacement: $f(x) = \frac{{}^5C_x \cdot {}^{10}C_{2-x}}{{}^{15}C_2}$
With replacement: $f(x) = {}^2C_x \left(\frac{5}{15}\right)^x \left(\frac{10}{15}\right)^{2-x}$

Poisson Distribution
The Poisson distribution is one of the most important discrete distributions.
– It was first used by the French mathematician S. D. Poisson in 1837 to describe the probability of deaths in the Prussian army from the kick of a horse, as well as the number of suicides among women and children.
– These days it is successfully used in problems involving the number of arrivals/requests for service per unit time at a service facility.
Assumptions:
– It must be possible to divide the time interval being used into a large number of small sub-intervals such that the probability of an occurrence in each sub-interval is very small.
– The probability of an occurrence in each of these sub-intervals must remain constant throughout the time period being considered.
– The probability of two or more occurrences in each sub-interval must be small enough to be ignored.
– The occurrences in one time interval are independent of occurrences in any other time interval.

Poisson Distribution
The probability mass function for the Poisson distribution is given by
$f(x) = \frac{\mu^x e^{-\mu}}{x!}$
The Poisson distribution has mean $\mu$ and variance $\sigma^2 = \mu$.
It can be shown that the Poisson distribution is obtained as a limiting case of the binomial distribution as $p \to 0$ and $n \to \infty$ (with $np = \mu$ held fixed).
Example: It is given that on average 60 customers visit the bank between 10 am and 11 am daily. We may then be interested in the probability of exactly 2, or of at most 2, customers visiting the bank in a given one-minute interval. With $\mu = 1$ arrival per minute:
$P(2 \text{ arrivals}) = \frac{1^2 e^{-1}}{2!} = \frac{1}{2e}$
$P(\le 2 \text{ arrivals}) = e^{-1} + e^{-1} + \frac{1}{2e} = \frac{5}{2e}$

Gaussian or Normal Distribution
The normal distribution is the most widely known and used distribution in the field of statistics.
– Many natural phenomena can be approximated by the normal distribution.
Central limit theorem:
– The central limit theorem states that given a distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as N, the sample size, increases.
Normal density function:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
(Figure: the bell-shaped density, with peak value 0.399/σ at x = μ and tick marks at μ ± σ and μ ± 2σ.)

Normal Distribution
Multivariate Gaussian density function:
$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\, |P|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu})}$
What is the probability that X lies inside the ellipsoid $(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu}) \le R^2$?
Transform $\mathbf{Y} = A(\mathbf{X}-\boldsymbol{\mu})$, where $A P A^T = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, and let $z_i = Y_i/\sigma_i$, so that the condition becomes $z_1^2 + z_2^2 + \cdots + z_n^2 \le R^2$ and
$P\!\left(\sum_i z_i^2 \le R^2\right) = \int_V f(z)\, dV$
Curse of dimensionality: the probability contained within an ellipsoid of fixed R drops as n grows.
n\R    1      2      3
1      0.683  0.955  0.997
2      0.394  0.865  0.989
3      0.200  0.739  0.971

Summary of Some Probability Mass/Density Functions
– Binomial (discrete): parameters $0 \le p \le 1$, $n = 0, 1, 2, \ldots$; skewed unless p = 0.5; probability function ${}^nC_x\, p^x q^{n-x}$; mean $np$; variance $npq$.
– Hypergeometric (discrete): parameters M = 0, …, N; N = 0, 1, 2, …; n = 0, …, N; skewed; probability function $\frac{{}^MC_x\, {}^{N-M}C_{n-x}}{{}^NC_n}$; mean $\frac{nM}{N}$; variance $\frac{nM(N-M)(N-n)}{N^2(N-1)}$.
– Poisson (discrete): parameter $\mu > 0$; skewed positively; probability function $\frac{\mu^x e^{-\mu}}{x!}$; mean $\mu$; variance $\mu$.
– Normal (continuous): parameters $-\infty < \mu < \infty$, $\sigma > 0$; symmetric about $\mu$; density $\frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/2\sigma^2}$; mean $\mu$; variance $\sigma^2$.
– Standardized normal (continuous): symmetric about zero; density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$; mean 0; variance 1.
– Exponential (continuous): parameter $\lambda > 0$; skewed positively; density $\lambda e^{-\lambda x}$; mean $1/\lambda$; variance $1/\lambda^2$.
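The entries of the n\R table above can be reproduced with the chi-square distribution, since the sum of squares of n independent standard normal variables is $\chi^2_n$. A small sketch using scipy (my addition):

```python
from scipy.stats import chi2

# P(z_1^2 + ... + z_n^2 <= R^2) for n independent standard normal z_i
# is the chi-square CDF with n degrees of freedom evaluated at R^2.
for n in (1, 2, 3):
    row = [chi2.cdf(R**2, df=n) for R in (1, 2, 3)]
    print(n, [f"{p:.3f}" for p in row])

# Matches the table: 0.683/0.955/0.997, 0.394/0.865/0.989, 0.200/0.739/0.971
```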
A distribution is skewed if it has most of its values either to the right or to the left of its mean. A measure of this asymmetry is given by the third central moment of the distribution, $E[(x-\mu)^3]$, called the skewness.
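To see the sign convention, a tiny numerical check (my sketch, not from the slides): the exponential distribution is skewed positively and the normal is symmetric, so their third central moments should come out positive and near zero, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)

def third_central_moment(sample):
    """Estimate the skewness measure E[(x - mu)^3] from a sample."""
    return np.mean((sample - sample.mean())**3)

normal = rng.normal(0.0, 1.0, 1_000_000)        # symmetric about the mean
exponential = rng.exponential(1.0, 1_000_000)   # skewed positively

print(third_central_moment(normal))       # ~0
print(third_central_moment(exponential))  # ~2 (exact value for rate 1)
```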