Probability and Statistical Review
Lecture 1
Manoranjan Majji
Lecture slides and notes available online: visit http://dnc.tamu.edu/Class Notes/AERO626/index.php

Probability and Statistical Review
Probability
– Motivating Example
– Definition of Probability
– Axioms of Probability
– Conditional Probability
– Bayes's Theorem
– Random Variables: Discrete and Continuous
– Expectation of Random Variables
– Multivariate Density Functions

Basic Probability Concepts
Probabilities are numbers assigned to events that indicate "how likely" it is that the event will occur when a random experiment is performed.
– The statement "E has probability P(E)" means that if we perform the experiment very often, it is practically certain that the relative frequency of E is approximately equal to P(E).
What do we mean by relative frequency?
– The relative frequency is at least equal to 0 and at most equal to 1: $0 \le P(E) \le 1$.
– Frequency function: it shows how the values of the samples are distributed,
  $f(x) = f_j$ when $x = x_j$, and $f(x) = 0$ for any value $x$ not appearing in the sample.
– Sample distribution function: $F(x) = \sum_{t \le x} f(t)$.

Basic Probability Concepts
The frequency function characterizes a given sample in detail.
– From it we can compute numbers that characterize certain properties of the sample.
– Sample mean: $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j = \frac{1}{n}\sum_{j=1}^{m} x_j\, n f(x_j) = \sum_{j=1}^{m} x_j f(x_j)$
– Sample variance: $s^2 = \frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2 = \sum_{j=1}^{m} (x_j - \bar{x})^2 f(x_j)$

Useful Definitions
Random experiment or random observation:
– It is performed according to a set of rules that determines the performance completely.
– It can be repeated arbitrarily often.
– The result of each performance depends on "chance" (that is, on influences we cannot control) and therefore cannot be uniquely predicted.
The result of a single performance of the experiment is called the outcome of that experiment.
The set of all possible outcomes of an experiment is called the sample space of the experiment.
In most practical problems, we are not interested in the individual outcomes of the experiment but in whether an outcome belongs to a certain set of outcomes. Such a set is called an event.

Useful Definitions
Impossible event: an event containing no element, denoted by $\emptyset$.
Mutually exclusive or disjoint events: $A \cap B = \emptyset$.
Example: consider the roll of a die.
Sample space: $S = \{1, 2, 3, 4, 5, 6\}$
E: the event that the die turns up an even number, $E = \{2, 4, 6\}$
O: the event that the die turns up an odd number, $O = \{1, 3, 5\}$
$E \cap O = \emptyset$, so E and O are mutually exclusive events.

Axioms of Probability
Property 1: $0 \le P(E) \le 1$.
Property 2: $P(S) = 1$.
Property 3: $P(E^c) = 1 - P(E)$.
Property 4: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Property 5: if $E_1, E_2, \ldots, E_n$ are mutually exclusive events, then $P(E_1 \cup E_2 \cup \cdots \cup E_n) = P(E_1) + P(E_2) + \cdots + P(E_n)$.

Conditional Probability
The probability of an event B under the condition that an event A occurs is given by
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
– $P(B \mid A)$ is called the conditional probability of B given A.
– In this case, event A serves as a new sample space, and event B becomes $A \cap B$.
– A and B are called independent events if $P(B \mid A) = P(B)$ and $P(A \mid B) = P(A)$, i.e., $P(A \cap B) = P(A)\,P(B)$.

Theorem of Total Probability
Let $B_1, B_2, \ldots, B_n$ be mutually exclusive events such that $\bigcup_{i=1}^{n} B_i = S$.
The probability of an event A can be represented as
$P(A) = P(A \cap B_1) + P(A \cap B_2) + \cdots + P(A \cap B_n)$
and therefore
$P(A) = \sum_{i=1}^{n} P(A \mid B_i)\, P(B_i)$

Bayes's Theorem
Assume there are m mutually exclusive states of nature (classes) labeled $\omega_j$ ($j = 1, 2, \ldots, m$), and let $P(x)$ be the probability that an observation assumes the specific value x.
Definitions:
– Prior probability: $P(\omega_j)$.
– Posterior probability: $P(\omega_j \mid x)$ (of class $\omega_j$ given observation x).
– Likelihood: $P(x \mid \omega_j)$ (conditional probability of observation x given class $\omega_j$).
Bayes's theorem gives the relationship between the m prior probabilities $P(\omega_j)$, the m likelihoods $P(x \mid \omega_j)$, and the one posterior probability of interest:
$P(\omega_j \mid x) = \frac{P(\omega_j)\, P(x \mid \omega_j)}{\sum_{k=1}^{m} P(\omega_k)\, P(x \mid \omega_k)}$

Exercise
Consider a clinical problem where we have to decide whether a patient has a particular rare disease on the basis of an imperfect medical test.
– 1 in 1000 people have the rare disease.
– The test is positive 99% of the time when a person has the disease.
– The test is positive 2% of the time when a person does not have the disease.
What is the probability that the person actually has the disease when the test is positive?
$P(A_1 \mid B) = \frac{P(B \mid A_1)\, P(A_1)}{P(B \mid A_1)\, P(A_1) + P(B \mid A_2)\, P(A_2)} = \frac{0.99 \times 0.001}{(0.99 \times 0.001) + (0.02 \times 0.999)} \approx 0.047$

Exercise (continued)
$P(A_1 \mid B) \approx 0.047 = 4.7\%$, which seems counter-intuitive. Why?
Most positive tests arise from error rather than from people actually having the disease.
– The prior 0.001 is updated to the posterior 0.047: the disease is rare and the test is only marginally reliable.
NOTE: if the disease were not so rare (say, 25% incidence), we would get a good diagnosis: $P(A_1 \mid B) \approx 0.94$.
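To make the update concrete, here is a minimal Python sketch of the rare-disease calculation above (my addition, not from the slides); the numbers are those of the exercise, and `posterior` is a hypothetical helper name.

```python
# Bayes's theorem for the rare-disease test (numbers from the exercise above).

def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes's theorem.

    The denominator is the total probability of a positive test,
    summed over the two states of nature (disease / no disease).
    """
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive

# Rare disease: prior 0.001 -> posterior ~ 0.047
print(posterior(0.001, 0.99, 0.02))   # ~0.0472

# If the disease were common (25% incidence), the same test is informative:
print(posterior(0.25, 0.99, 0.02))    # ~0.94
```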
Random Variables
A random variable X (also called a stochastic variable) is a function whose values are real numbers and depend on "chance". More precisely, it is a function X with the following properties:
– X is defined on the sample space S of the experiment, and its values are real numbers.
– The function that assigns a value to each outcome is fixed and deterministic; the randomness is due to the underlying randomness of the argument of the function X.
Random variables can be discrete or continuous.

Discrete Random Variables
A random variable X and the corresponding distribution are said to be discrete if the number of values for which X has non-zero probability is finite.
Probability mass function of X: $f(x) = p_j$ when $x = x_j$, and $f(x) = 0$ otherwise.
Probability distribution function of X: $F(x) = P(X \le x)$
Properties of the distribution function:
$0 \le F(x) \le 1$
$P(a < x \le b) = F(b) - F(a)$

Continuous Random Variables and Distributions
A random variable X and the corresponding distribution are said to be continuous if the distribution function $F(x) = P(X \le x)$ of X can be represented in integral form,
$F(x) = \int_{-\infty}^{x} f(y)\, dy$
The integrand f(y) is called a probability density function, and $F'(x) = f(x)$.
Properties:
$\int_{-\infty}^{\infty} f(x)\, dx = 1$
$P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\, dx$

Statistical Characterization of Random Variables
Expected value:
– The expected value of a discrete random variable x is found by multiplying each value of the random variable by its probability and then summing over all values of x:
  $E[x] = \sum_x x\, P(x) = \sum_x x\, f(x)$
– The expected value of x is the "balancing point" of the probability mass function of x; that is, it is the arithmetic mean.
– We can take the expectation of any function of a random variable:
  $E[g(x)] = \sum_x g(x)\, f(x)$
– This balance point is the value expected for g(x) over all possible repetitions of the experiment involving the random variable x.
– The expected value of a continuous random variable with density f(x) is given by $E[x] = \int_{-\infty}^{\infty} x f(x)\, dx$.
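As a small numerical illustration of these definitions (my sketch, not from the slides), the following evaluates $E[x]$ and $E[g(x)]$ for a fair six-sided die, and approximates the continuous formula for a uniform density on [0, 1] with a Riemann sum.

```python
import numpy as np

# Discrete case: a fair six-sided die, f(x) = 1/6 for x = 1..6.
values = np.arange(1, 7)
pmf = np.full(6, 1.0 / 6.0)

E_x = np.sum(values * pmf)        # E[x]    = sum_x x f(x)    -> 3.5
E_g = np.sum(values**2 * pmf)     # E[g(x)] with g(x) = x^2   -> ~15.17

# Continuous case: uniform density f(x) = 1 on [0, 1];
# E[x] = integral of x f(x) dx, approximated by a Riemann sum.
dx = 1e-5
x = np.arange(0.0, 1.0, dx)
E_cont = np.sum(x * 1.0 * dx)     # -> ~0.5

print(E_x, E_g, E_cont)
```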
Illustration of Expectation
A lottery has two schemes; the first scheme has two outcomes (denoted by 1 and 2) and the second has three (denoted by 1, 2, and 3). It is agreed that the participant in the first scheme gets $1 if the outcome is 1 and $2 if the outcome is 2. The participant in the second scheme gets $3 if the outcome is 1, -$2 if the outcome is 2, and $3 if the outcome is 3. The probabilities of each outcome are listed as follows:
p(1,1) = 0.1;  p(1,2) = 0.2;  p(1,3) = 0.3
p(2,1) = 0.2;  p(2,2) = 0.1;  p(2,3) = 0.1
Help the investor decide which scheme to prefer. [Bryson]

Example
Assume that we have agreed to pay $1 for each dot showing when a pair of dice is thrown. We are interested in knowing how much we would lose on average.

Value of x   Frequency   Probability function   Distribution function
 2           1           P(x=2)  = 1/36         P(x<=2)  = 1/36
 3           2           P(x=3)  = 2/36         P(x<=3)  = 3/36
 4           3           P(x=4)  = 3/36         P(x<=4)  = 6/36
 5           4           P(x=5)  = 4/36         P(x<=5)  = 10/36
 6           5           P(x=6)  = 5/36         P(x<=6)  = 15/36
 7           6           P(x=7)  = 6/36         P(x<=7)  = 21/36
 8           5           P(x=8)  = 5/36         P(x<=8)  = 26/36
 9           4           P(x=9)  = 4/36         P(x<=9)  = 30/36
10           3           P(x=10) = 3/36         P(x<=10) = 33/36
11           2           P(x=11) = 2/36         P(x<=11) = 35/36
12           1           P(x=12) = 1/36         P(x<=12) = 1
Sum          36          1.00

Average amount we pay = (($2)(1) + ($3)(2) + ... + ($12)(1))/36 = $7
E[x] = $2(1/36) + $3(2/36) + ... + $12(1/36) = $7

Example (continued)
Assume instead that we had agreed to pay an amount equal to the square of the sum of the dots showing on a throw of the dice.
– What would be the average loss this time? Will it be ($7)^2 = $49.00?
We are now interested in calculating $E[x^2]$:
– $E[x^2] = (2)^2(1/36) + \cdots + (12)^2(1/36) = \$54.83 \ne \$49$
– This result also emphasizes that $(E[x])^2 \ne E[x^2]$ in general.

Variance of Random Variable
The variance of a random variable x is defined as
$V(x) = \sigma^2 = E[(x - \mu)^2]$
$V(x) = E[x^2 - 2\mu x + \mu^2] = E[x^2] - 2(E[x])^2 + (E[x])^2 = E[x^2] - (E[x])^2$
This result is also known as the "parallel axis theorem".

Expectation Rules
Rule 1: E[k] = k, where k is a constant.
Rule 2: E[kx] = kE[x].
Rule 3: E[x ± y] = E[x] ± E[y].
Rule 4: If x and y are independent, E[xy] = E[x]E[y].
Rule 5: V[k] = 0, where k is a constant.
Rule 6: V[kx] = k^2 V[x].

Propagation of moments and density function through linear models
y = ax + b
– Given: $\mu = E[x]$ and $\sigma^2 = V[x]$. To find: E[y] and V[y].
$E[y] = E[ax] + E[b] = aE[x] + b = a\mu + b$
$V[y] = V[ax] + V[b] = a^2 V[x] + 0 = a^2 \sigma^2$
Let us define $z = (x - \mu)/\sigma$. Here $a = 1/\sigma$ and $b = -\mu/\sigma$, and therefore E[z] = 0 and V[z] = 1.
z is generally known as the "standardized variable".

Propagation of moments and density function through non-linear models
If x is a random variable with probability density function p(x) and y = f(x) is a one-to-one transformation that is differentiable for all x, then the probability density function of y is given by
– $p(y) = p(x)\,|J|^{-1}$, for all x given by $x = f^{-1}(y)$,
– where |J| is the determinant of the Jacobian matrix J.
Example: let $y = ax^2$ and $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x} \exp(-x^2 / 2\sigma_x^2)$.
NOTE: for each value of y there are two values of x, and both branches contribute:
$p(y) = \frac{1}{\sqrt{2\pi a y}\,\sigma_x} \exp(-y / 2a\sigma_x^2)$ for $y \ge 0$, and p(y) = 0 otherwise.
We can also show that $E(y) = a\sigma_x^2$ and $V(y) = 2a^2\sigma_x^4$.
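A quick Monte Carlo check of the $y = ax^2$ example is easy to write. The sketch below (my addition, with arbitrarily chosen a and σx) samples x from N(0, σx²) and compares the sample moments of y against the closed-form results $E(y) = a\sigma_x^2$ and $V(y) = 2a^2\sigma_x^4$.

```python
import numpy as np

rng = np.random.default_rng(0)

a, sigma_x = 2.0, 1.5
x = rng.normal(0.0, sigma_x, size=1_000_000)  # x ~ N(0, sigma_x^2)
y = a * x**2                                  # non-linear transformation

# Closed-form moments derived on the slide.
E_y_exact = a * sigma_x**2                    # = 4.5
V_y_exact = 2 * a**2 * sigma_x**4             # = 40.5

print(y.mean(), E_y_exact)   # sample mean     vs  a*sigma_x^2
print(y.var(), V_y_exact)    # sample variance vs  2*a^2*sigma_x^4
```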
Random Vectors
Just an extension of the random variable:
– A vector random variable X is a function that assigns a vector of real numbers to each outcome in the sample space.
Joint probability functions:
– Joint probability distribution function:
  $F(\mathbf{X}) = P[\{X_1 \le x_1\} \cap \{X_2 \le x_2\} \cap \cdots \cap \{X_n \le x_n\}]$
– Joint probability density function:
  $f(\mathbf{x}) = \frac{\partial^n F(\mathbf{X})}{\partial X_1\, \partial X_2 \cdots \partial X_n}$
Marginal probability functions: a marginal probability function is obtained by summing or integrating out the variables that are of no interest:
$F(x) = \sum_y P(x, y)$ or $f(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$

Multivariate Expectations
Mean vector: $E[\mathbf{x}] = \boldsymbol{\mu} = [\,E[x_1]\ \ E[x_2]\ \ \cdots\ \ E[x_n]\,]^T$
The expected value of $g(x_1, x_2, \ldots, x_n)$ is given by
$E[g(\mathbf{x})] = \sum_{x_n} \cdots \sum_{x_1} g(\mathbf{x})\, f(\mathbf{x})$ or $\int_{x_n} \cdots \int_{x_1} g(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}$
Covariance matrix:
$\operatorname{cov}[\mathbf{x}] = P = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T] = E[\mathbf{x}\mathbf{x}^T] - \boldsymbol{\mu}\boldsymbol{\mu}^T$
where $S = E[\mathbf{x}\mathbf{x}^T]$ is known as the autocorrelation matrix.
NOTE: $P = \Sigma R \Sigma$, where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$ and
$R = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & & \vdots \\ \vdots & & \ddots & \\ \rho_{n1} & \cdots & & 1 \end{bmatrix}$
is the correlation matrix.

Covariance Matrix
The covariance matrix indicates the tendency of each pair of dimensions in a random vector to vary together, i.e., to "co-vary".
Properties of the covariance matrix:
– The covariance matrix is square.
– The covariance matrix is positive semi-definite, $\mathbf{x}^T P \mathbf{x} \ge 0$, and positive definite ($\mathbf{x}^T P \mathbf{x} > 0$) in the non-degenerate case.
– The covariance matrix is symmetric: $P = P^T$.
– If $x_i$ and $x_j$ tend to increase together, then $P_{ij} > 0$.
– If $x_i$ and $x_j$ are uncorrelated, then $P_{ij} = 0$.

Probability Distribution Functions
There are many situations in statistics that involve the same type of probability functions.
– It is not necessary to derive these results over and over again in each special case with different numbers.
We can avoid this tedious process by recognizing the similarities between certain types of apparently unique experiments, and then matching a given case to a general formula.
Examples:
– Toss a coin: head or tail
– Take an exam: pass or fail
– Analyze the stock market: up or down
All of the above processes can be characterized by only two events, "success" and "failure".

Binomial Distribution
The binomial distribution plays an important role in experiments involving repeated independent trials, each with just two possible outcomes.
– Independent trials means the result of one trial cannot influence the result of other trials.
– Repeated trials means the probability of "success" or "failure" does not change from trial to trial.
In the binomial distribution, we are interested in the probability of receiving a certain number of successes.
Assume that we have n independent trials, each having the same probability of success p,
– probability of failure: q = 1 - p.
Say we are interested in determining the probability of x successes in n trials:
– Find the probability of any one occurrence of this type and then multiply this value by the number of possible occurrences.

Binomial Distribution
One of the possible occurrences is
$\underbrace{SS \cdots S}_{x \text{ times}}\, \underbrace{FF \cdots F}_{n-x \text{ times}}$
The joint probability of this particular sequence is given by $p^x q^{n-x}$.
NOTE: $p^x q^{n-x}$ represents the probability not only of this one arrangement but of any possible arrangement of x successes and n-x failures.
How many arrangements of x successes and n-x failures are possible? $^nC_x = \frac{n!}{x!\,(n-x)!}$
$P(x \text{ successes in } n \text{ trials}) = {}^nC_x\, p^x q^{n-x}$
The binomial distribution is discrete in nature, as x and n can take only discrete values.

Mean and Variance of Binomial Distribution
Mean: $E[x_{\text{binomial}}] = np$
Variance: $V[x_{\text{binomial}}] = \sigma^2 = E[x^2] - (E[x])^2 = npq$
Example: A football executive claims that 90% of viewers watch football over baseball on a concurrent telecast. An advertising agency claims that the viewers for each are 50%. Who is right? We did a survey of 25 households and found that in 10 of them the games were being viewed, with the following breakdown: viewing football, 7; viewing baseball, 3. Which of the two reports is correct? (One way to compare the claims in code is sketched below.)
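As a rough way to weigh the two claims (my sketch; the slides leave the exercise open), one can compare how likely "7 of 10 viewing households watching football" is under each hypothesized p using the binomial formula above.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n trials) = nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1.0 - p)**(n - x)

n, x = 10, 7   # 10 viewing households, 7 watching football

# Likelihood of the observed split under each claim:
print(binomial_pmf(x, n, 0.9))   # executive's claim, p = 0.9 -> ~0.057
print(binomial_pmf(x, n, 0.5))   # agency's claim,    p = 0.5 -> ~0.117
# The observed data are roughly twice as likely under the 50% claim.
```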
Hypergeometric Distribution
The binomial distribution is important in sampling with replacement, but many practical problems involve sampling without replacement.
– In that case the hypergeometric distribution can be used to obtain the precise probability:
$f(x) = \frac{{}^MC_x \cdot {}^{N-M}C_{n-x}}{{}^NC_n}$, with mean $\mu = \frac{nM}{N}$ and variance $\sigma^2 = \frac{nM(N-M)(N-n)}{N^2(N-1)}$
– Example: We want to pick two apples from a box containing 15 apples, 5 of which are rotten. Find the probability function for the number of rotten apples in our sample.
Without replacement: $f(x) = \frac{{}^5C_x \cdot {}^{10}C_{2-x}}{{}^{15}C_2}$
With replacement: $f(x) = {}^2C_x \left(\frac{5}{15}\right)^x \left(\frac{10}{15}\right)^{2-x}$

Poisson Distribution
The Poisson distribution is one of the most important discrete distributions.
– It was first used by the French mathematician S. D. Poisson in 1837 to describe the probability of deaths in the Prussian army from the kick of a horse, as well as the number of suicides among women and children.
– These days it is successfully used in problems involving the number of arrivals/requests for service per unit time at a service facility.
Assumptions:
– It must be possible to divide the time interval being used into a large number of small sub-intervals such that the probability of an occurrence in each sub-interval is very small.
– The probability of an occurrence in each of these sub-intervals must remain constant throughout the time period being considered.
– The probability of two or more occurrences in each sub-interval must be small enough to be ignored.
– The occurrences in one time interval are independent of occurrences in any other time interval.

Poisson Distribution
The probability mass function for the Poisson distribution is given by
$f(x) = \frac{\mu^x e^{-\mu}}{x!}$
The Poisson distribution has mean $\mu$ and variance $\sigma^2 = \mu$.
It can be shown that the Poisson distribution is obtained as a limiting case of the binomial distribution as $p \to 0$ and $n \to \infty$ (with $np = \mu$ held fixed).
Example: It is given that on average 60 customers visit the bank between 10 am and 11 am daily. We may then be interested in the probability of exactly 2, or of at most 2, customers visiting the bank in a given one-minute interval. With $\mu = 1$ arrival per minute:
$P(2 \text{ arrivals}) = \frac{1^2 e^{-1}}{2!} = \frac{1}{2e}$
$P(\le 2 \text{ arrivals}) = e^{-1} + e^{-1} + \frac{1}{2e} = \frac{5}{2e}$

Gaussian or Normal Distribution
The normal distribution is the most widely known and used distribution in the field of statistics.
– Many natural phenomena can be approximated by the normal distribution.
Central limit theorem:
– The central limit theorem states that given a distribution with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the mean approaches a normal distribution with mean $\mu$ and variance $\sigma^2/N$ as N, the sample size, increases.
Normal density function:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
(Figure: the bell-shaped density, with peak value 0.399/σ at x = μ and tick marks at μ ± σ and μ ± 2σ.)

Normal Distribution
Multivariate Gaussian density function:
$f(\mathbf{X}) = \frac{1}{(2\pi)^{n/2}\, |P|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu})}$
What is the probability that X lies inside the ellipsoid $(\mathbf{X}-\boldsymbol{\mu})^T P^{-1} (\mathbf{X}-\boldsymbol{\mu}) \le R^2$?
Transform $\mathbf{Y} = A(\mathbf{X}-\boldsymbol{\mu})$, where $A P A^T = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, and let $z_i = Y_i/\sigma_i$, so that the condition becomes $z_1^2 + z_2^2 + \cdots + z_n^2 \le R^2$ and
$P\!\left(\sum_i z_i^2 \le R^2\right) = \int_V f(z)\, dV$
Curse of dimensionality: the probability contained within an ellipsoid of fixed R drops as n grows.
n\R    1      2      3
1      0.683  0.955  0.997
2      0.394  0.865  0.989
3      0.200  0.739  0.971

Summary of Some Probability Mass/Density Functions
– Binomial (discrete): parameters $0 \le p \le 1$, $n = 0, 1, 2, \ldots$; skewed unless p = 0.5; probability function ${}^nC_x\, p^x q^{n-x}$; mean $np$; variance $npq$.
– Hypergeometric (discrete): parameters M = 0, …, N; N = 0, 1, 2, …; n = 0, …, N; skewed; probability function $\frac{{}^MC_x\, {}^{N-M}C_{n-x}}{{}^NC_n}$; mean $\frac{nM}{N}$; variance $\frac{nM(N-M)(N-n)}{N^2(N-1)}$.
– Poisson (discrete): parameter $\mu > 0$; skewed positively; probability function $\frac{\mu^x e^{-\mu}}{x!}$; mean $\mu$; variance $\mu$.
– Normal (continuous): parameters $-\infty < \mu < \infty$, $\sigma > 0$; symmetric about $\mu$; density $\frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/2\sigma^2}$; mean $\mu$; variance $\sigma^2$.
– Standardized normal (continuous): symmetric about zero; density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$; mean 0; variance 1.
– Exponential (continuous): parameter $\lambda > 0$; skewed positively; density $\lambda e^{-\lambda x}$; mean $1/\lambda$; variance $1/\lambda^2$.
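The entries of the n\R table above can be reproduced with the chi-square distribution, since the sum of squares of n independent standard normal variables is $\chi^2_n$. A small sketch using scipy (my addition):

```python
from scipy.stats import chi2

# P(z_1^2 + ... + z_n^2 <= R^2) for n independent standard normal z_i
# is the chi-square CDF with n degrees of freedom evaluated at R^2.
for n in (1, 2, 3):
    row = [chi2.cdf(R**2, df=n) for R in (1, 2, 3)]
    print(n, [f"{p:.3f}" for p in row])

# Matches the table: 0.683/0.955/0.997, 0.394/0.865/0.989, 0.200/0.739/0.971
```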
A distribution is skewed if it has most of its values either to the right or to the left of its mean. A measure of this asymmetry is given by the third central moment of the distribution, $E[(x-\mu)^3]$, called the skewness.
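To see the sign convention, a tiny numerical check (my sketch, not from the slides): the exponential distribution is skewed positively and the normal is symmetric, so their third central moments should come out positive and near zero, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)

def third_central_moment(sample):
    """Estimate the skewness measure E[(x - mu)^3] from a sample."""
    return np.mean((sample - sample.mean())**3)

normal = rng.normal(0.0, 1.0, 1_000_000)        # symmetric about the mean
exponential = rng.exponential(1.0, 1_000_000)   # skewed positively

print(third_central_moment(normal))       # ~0
print(third_central_moment(exponential))  # ~2 (exact value for rate 1)
```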