Basic Probability and Statistics
• Random variables
• Distribution functions
• Various probability distributions

Definitions
• An experiment is a process whose output is not known with certainty.
• The set of all possible outcomes of an experiment is called the sample space (S).
• The outcomes are called sample points in S.
• A random variable is a function that assigns a real number to each point in S.
• A distribution function F(x) of the random variable X is defined for each real number x as $F(x) = \Pr(X \le x)$.

Properties of distribution function
• $0 \le F(x) \le 1$ for all x.
• F(x) is non-decreasing: for $x_1 < x_2$, $F(x_1) \le F(x_2)$.
• $\lim_{x \to \infty} F(x) = 1$; $\lim_{x \to -\infty} F(x) = 0$.

Random Variables
• A random variable (r.v.) X is discrete if it can take on at most a countable number of values $x_1, x_2, x_3, \ldots$
• The probability that the discrete r.v. X takes on the value $x_i$ is given by $p(x_i) = \Pr(X = x_i)$.
• p(x) is called the probability mass function, with $\sum_{i=1}^{\infty} p(x_i) = 1$ and $F(x) = \sum_{x_i \le x} p(x_i)$.

Random Variables
• A r.v. is said to be continuous if there exists a nonnegative function f(x) such that for any set of real numbers B, $\Pr(X \in B) = \int_B f(x)\,dx$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
• f(x) is called the probability density function, and $F(x) = \Pr(X \le x) = \Pr(X \in (-\infty, x]) = \int_{-\infty}^{x} f(y)\,dy$.

Random Variables
• The mean or expected value of a r.v. X is denoted by E[X] or µ, and given by $\mu = \sum_{j=1}^{\infty} x_j\, p(x_j)$ if X is discrete, and $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ if X is continuous.
• The variance of a r.v. X is denoted by Var(X) or σ², and given by $\sigma^2 = E\left[(X - \mu)^2\right] = E[X^2] - (E[X])^2$.

Properties of mean
• If X is a discrete random variable having pmf p(x), then $E[g(X)] = \sum_x g(x)\, p(x)$.
• If X is continuous with pdf f(x), then $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$.
• Hence, for constants a and b, $E[aX + b] = aE[X] + b$.

Property of variance
• For constants a and b, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.

Joint Distribution
• If X and Y are discrete r.v.'s, then $p(x, y) = \Pr(X = x, Y = y)$, for all x, y, is called the joint probability mass function of X and Y.
• Marginal probability mass functions of X and Y: $p_X(x) = \sum_y p(x, y)$ and $p_Y(y) = \sum_x p(x, y)$.
• X and Y are independent if $p(x, y) = p_X(x)\, p_Y(y)$.

Conditional probability
• Let A and B be two events.
• Pr(A|B) is the conditional probability of event A happening given that B has already occurred.
• Bayes' theorem: $\Pr(A \mid B) = \dfrac{\Pr(A \cap B)}{\Pr(B)}$.
• If events A and B are independent, then Pr(A|B) = Pr(A).
• Hence, from Bayes' theorem, $\Pr(A \cap B) = \Pr(A)\,\Pr(B)$.

Dependency
• Covariance is a measure of linear dependence and is denoted by $C_{ij}$ or $\mathrm{Cov}(X_i, X_j)$: $C_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = E[X_i X_j] - \mu_i \mu_j$, for $i = 1, \ldots, n$ and $j = 1, \ldots, n$.
• Another measure of linear dependency is the correlation factor: $\rho_{ij} = \dfrac{C_{ij}}{\sqrt{\sigma_i^2 \sigma_j^2}}$, for $i = 1, \ldots, n$ and $j = 1, \ldots, n$.
• The correlation factor is dimensionless, but the covariance is not.

Two random numbers in simulation experiment
• Let X and Y be two random variates in a given simulation experiment that are not independent.
• Our performance parameter is X + Y, for which $E[X + Y] = E[X] + E[Y]$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ (see the numerical sketch below).
• However, if the two r.v.'s are independent, then $\mathrm{Cov}(X, Y) = 0$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.

Bernoulli trial
• An experiment with only two outcomes, "Success" and "Failure", where the chance of each outcome is known a priori.
• Denoted by the chance of success "p" (this is a parameter for the distribution).
• Example: tossing a "fair" coin.
• Define a variable $X_i$ such that $X_i = 1$ if trial i is a success and $X_i = 0$ otherwise.
• Then $E[X_i] = p$ and $\mathrm{Var}(X_i) = p(1 - p)$.
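The following is a minimal numerical sketch (not from the slides; the distributions, parameters, and seed are illustrative assumptions) of two facts stated above: $E[X + Y] = E[X] + E[Y]$ and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y)$ for dependent X and Y, and $E[X_i] = p$, $\mathrm{Var}(X_i) = p(1 - p)$ for Bernoulli trials.

```python
import numpy as np

rng = np.random.default_rng(42)   # illustrative seed
n = 1_000_000

# Two dependent random variates: Y is built from X, so Cov(X, Y) > 0.
x = rng.normal(loc=2.0, scale=1.5, size=n)
y = 0.5 * x + rng.normal(loc=1.0, scale=1.0, size=n)

cov_xy = np.cov(x, y, ddof=1)[0, 1]            # sample Cov(X, Y)
lhs = np.var(x + y, ddof=1)                    # Var(X + Y) estimated directly
rhs = np.var(x, ddof=1) + np.var(y, ddof=1) + 2.0 * cov_xy

print(f"E[X+Y]   = {np.mean(x + y):.4f}   E[X]+E[Y]               = {np.mean(x) + np.mean(y):.4f}")
print(f"Var(X+Y) = {lhs:.4f}   Var(X)+Var(Y)+2Cov(X,Y) = {rhs:.4f}")

# Bernoulli trials with success probability p: E[Xi] = p, Var(Xi) = p(1 - p).
p = 0.3
trials = (rng.random(n) < p).astype(float)
print(f"Bernoulli: mean = {trials.mean():.4f} (p = {p}), "
      f"var = {trials.var(ddof=1):.4f} (p(1-p) = {p * (1 - p):.4f})")
```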
Binomial r.v.
• A series of n independent Bernoulli trials.
• If X is the number of successes that occur in the n trials, then X is said to be a Binomial r.v. with parameters (n, p).
• Its probability mass function is $P_x = \Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$, where $x = 0, 1, 2, \ldots, n$ and $\binom{n}{x} = \dfrac{n!}{x!\,(n - x)!}$.

Binomial r.v.
• $X = \sum_{i=1}^{n} X_i$, where $X_i = 1$ if trial i is a success and $X_i = 0$ otherwise.
• $E[X] = \sum_{i=1}^{n} E[X_i] = np$ and $\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = np(1 - p)$.

Poisson r.v.
• A r.v. X which can take values 0, 1, 2, … is said to have a Poisson distribution with parameter λ (λ > 0) if the pmf is given by $p_i = \Pr(X = i) = \dfrac{e^{-\lambda} \lambda^i}{i!}$, $i = 0, 1, 2, \ldots$
• For a Poisson r.v., $E[X] = \mathrm{Var}(X) = \lambda$.
• The probabilities can be found recursively: $p_{i+1} = \dfrac{\lambda}{i + 1}\, p_i$, $i \ge 0$.

Uniform r.v.
• A r.v. X is said to be uniformly distributed over the interval (a, b) when its pdf is $f(x) = \dfrac{1}{b - a}$ if $a < x < b$, and 0 otherwise.
• Expected value: $E[X] = \dfrac{1}{b - a}\int_a^b x\,dx = \dfrac{b^2 - a^2}{2(b - a)} = \dfrac{a + b}{2}$.
• $E[X^2] = \dfrac{1}{b - a}\int_a^b x^2\,dx = \dfrac{b^3 - a^3}{3(b - a)} = \dfrac{a^2 + ab + b^2}{3}$.

Uniform r.v.
• Variance: $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \dfrac{(b - a)^2}{12}$.
• Distribution function F(x) for a given x with a < x < b: $F(x) = \Pr(X \le x) = \int_a^x \dfrac{1}{b - a}\,dy = \dfrac{x - a}{b - a}$.

Normal r.v.
• pdf: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / 2\sigma^2}$, $-\infty < x < \infty$.
• The normal density is a bell-shaped curve that is symmetric about µ.
• It can be shown that for a normal r.v. X with parameters (µ, σ²), $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

Normal r.v.
• If X ~ N(µ, σ²), then $Z = \dfrac{X - \mu}{\sigma}$ is N(0, 1).
• The probability distribution function of the "standard normal" is $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$, $-\infty < x < \infty$.
• If X ~ N(µ, σ²), then $F(x) = \Phi\!\left(\dfrac{x - \mu}{\sigma}\right)$.

Central Limit Theorem
• Let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of IID random variables having a finite mean µ and finite variance σ². Then: $\lim_{n \to \infty} \Pr\!\left(\dfrac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}} \le x\right) = \Phi(x)$.

Exponential r.v.
• pdf: $f(x) = \lambda e^{-\lambda x}$, $0 \le x < \infty$.
• cdf: $F(x) = \int_0^x f(y)\,dy = \int_0^x \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}$.
• $E[X] = \dfrac{1}{\lambda}$; $\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$.

Exponential r.v.
• When multiplied by a constant, it still remains an exponential r.v.: $\Pr(cX \le x) = \Pr\!\left(X \le \dfrac{x}{c}\right) = 1 - e^{-\lambda x / c}$, so $cX \sim \mathrm{Expo}(\lambda / c)$.
• Most useful property: memoryless! $\Pr(X > s + t \mid X > t) = \Pr(X > s)$ for all $t, s \ge 0$.
• Analytical simplicity: if $X_1 \sim \mathrm{Exp}(\lambda_1)$ and $X_2 \sim \mathrm{Exp}(\lambda_2)$, then $P(X_1 < X_2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$.

Poisson process
• A counting process $\{N(t), t \ge 0\}$ is said to be a Poisson process if:
• $N(0) = 0$.
• The process has independent increments.
• The number of events in any interval of length t is Poisson distributed with mean λt. That is, for all $s, t \ge 0$: $\Pr(N(t + s) - N(s) = n) = \dfrac{e^{-\lambda t} (\lambda t)^n}{n!}$, $n = 0, 1, 2, \ldots$
• If $T_n$, $n \ge 1$, is the time between the (n − 1)st and nth events, then this interarrival time has an exponential distribution.

Useful property of Poisson process
• Let $S_1^1$ denote the time of the first event of the first Poisson process (with rate λ₁), and $S_1^2$ denote the time of the first event of the second Poisson process (with rate λ₂). Then: $P(S_1^1 < S_1^2) = \dfrac{\lambda_1}{\lambda_1 + \lambda_2}$.

Covariance stationary processes
• The covariance between two observations $X_i$ and $X_{i+j}$ depends only on j and not on i.
• Let $C_j$ be the covariance for this process.
• The correlation factor is then $\rho_j = \dfrac{C_{i,i+j}}{\sqrt{\sigma_i^2 \sigma_{i+j}^2}} = \dfrac{C_j}{\sigma^2}$, $j = 1, 2, \ldots$

Point Estimation
• Let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of IID random variables (observations) having a finite population mean µ and finite population variance σ².
• We are interested in finding these population parameters through the sample values: $\bar{X}(n) = \dfrac{\sum_{i=1}^{n} X_i}{n}$.
• This sample mean is an unbiased point estimator of µ; that is, $E[\bar{X}(n)] = \mu$.

Point Estimation
• The sample variance $S^2(n) = \dfrac{\sum_{i=1}^{n} \left(X_i - \bar{X}(n)\right)^2}{n - 1}$ is an unbiased point estimator of σ².
• Variance of the mean: $\mathrm{Var}(\bar{X}(n)) = \dfrac{\sigma^2}{n}$.
• We can estimate this variance of the mean by $\widehat{\mathrm{Var}}(\bar{X}(n)) = \dfrac{S^2(n)}{n}$.
• This is true only if $X_1, X_2, X_3, \ldots, X_n$ are IID.

Point Estimation
• However, most often in a simulation experiment the data are correlated.
• In that case, estimation using the sample variance is dangerous, because it underestimates the actual population variance: $E[S^2(n)] \ne \sigma^2$, and $E\!\left[\dfrac{S^2(n)}{n}\right] \ne \mathrm{Var}(\bar{X}(n))$.
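The warning above can be seen numerically. The sketch below assumes the correlated output follows an AR(1) process with positive autocorrelation (an assumption chosen purely for illustration, not something from the slides) and compares the naive estimator $S^2(n)/n$ with the true $\mathrm{Var}(\bar{X}(n))$ obtained from many independent replications.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

# Assumption: correlated simulation output modelled as an AR(1) process.
# With positive autocorrelation, the naive estimator S^2(n)/n tends to
# understate the true Var(Xbar(n)).
n, reps, phi = 100, 5_000, 0.8
means = np.empty(reps)
naive = np.empty(reps)

for r in range(reps):
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()   # X_i = phi * X_{i-1} + eps_i
    means[r] = x.mean()                        # Xbar(n) for this replication
    naive[r] = x.var(ddof=1) / n               # S^2(n)/n for this replication

print(f"true Var(Xbar(n)) across replications: {means.var(ddof=1):.4f}")
print(f"average of the naive S^2(n)/n:         {naive.mean():.4f}  (much smaller)")
```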
Interval Estimation
• Let $X_1, X_2, X_3, \ldots, X_n$ be a sequence of IID random variables (observations) having a finite population mean µ and finite population variance σ² (> 0).
• We want to construct a confidence interval for the mean µ.
• Let $Z_n$ be a random variable with probability distribution $F_n(z)$: $Z_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{\sigma^2 / n}}$, $F_n(z) = \Pr(Z_n \le z)$.

Interval Estimation
• The Central Limit Theorem states that $F_n(z) \to \Phi(z)$ as $n \to \infty$, where Φ is the standard normal distribution with mean 0 and variance 1.
• Often, we don't know the population variance σ².
• It can be shown that the CLT still applies if we replace σ² by the sample variance $S^2(n)$: $t_n = \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n) / n}}$.
• The variable $t_n$ is approximately normal as n increases.

Standard Normal distribution
• The standard normal distribution is N(0, 1).
• Its cumulative distribution function (CDF) at any given value z can be found using standard statistical tables.
• Conversely, if we know the probability, we can compute the corresponding value $z_{1-\alpha/2}$ such that $F(z_{1-\alpha/2}) = \Pr(Z \le z_{1-\alpha/2}) = 1 - \dfrac{\alpha}{2}$.
• This value $z_{1-\alpha/2}$ is called the critical point for N(0, 1).
• Similarly, the other critical point $z_2 = -z_{1-\alpha/2}$ is such that $F(z_2) = \Pr(Z \le z_2) = \dfrac{\alpha}{2}$.

Interval Estimation
• It follows for large n that $\Pr(-z_{1-\alpha/2} \le Z_n \le z_{1-\alpha/2}) = \Pr\!\left(-z_{1-\alpha/2} \le \dfrac{\bar{X}(n) - \mu}{\sqrt{S^2(n)/n}} \le z_{1-\alpha/2}\right) = \Pr\!\left(\bar{X}(n) - z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}} \le \mu \le \bar{X}(n) + z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}\right) \approx 1 - \alpha$.

Interval Estimation
• Therefore, if n is sufficiently large, an approximate 100(1 − α) percent confidence interval for µ is given by $\bar{X}(n) \pm z_{1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$.
• If we construct a large number of independent 100(1 − α) percent confidence intervals, each based on n different observations (n sufficiently large), the proportion of these confidence intervals that contain µ should be 1 − α.

Interval Estimation
• What if n is not "sufficiently large"?
• If the $X_i$'s are normal random variables, the random variable $t_n$ has a t-distribution with n − 1 degrees of freedom.
• In this case, the 100(1 − α) percent confidence interval for µ is given by $\bar{X}(n) \pm t_{n-1,1-\alpha/2}\sqrt{\dfrac{S^2(n)}{n}}$.

Interval Estimation
• In practice, the distribution of the $X_i$'s is rarely normal, so the confidence interval (with the t-distribution) will be approximate.
• Also, since $t_{n-1,1-\alpha/2} \ge z_{1-\alpha/2}$, the CI given with "t" is larger than the one with "z".
• Hence, it is recommended that we use the CI with "t": the wider interval is conservative, so its actual coverage stays closer to the stated 100(1 − α) percent when the normal approximation is doubtful.
• However, $t_{n-1,1-\alpha/2} \to z_{1-\alpha/2}$ as $n \to \infty$.

Interval Estimation
• The confidence level has a long-run relative-frequency interpretation.
• The unknown population mean µ is a fixed number.
• A confidence interval constructed from any particular sample either does or does not contain µ.
• However, if we repeatedly select random samples of that size and each time construct a confidence interval, with say 95% confidence, then in the long run 95% of the CIs would contain µ.
• This happens because 95% of the time the sample mean $\bar{Y}$ would fall within $1.96\,\sigma_{\bar{Y}}$ of µ.
• So 95% of the time, the inference about µ is correct.

Interval Estimation
• Every time we take a new sample of the same size, the confidence interval will be a little different from the previous one.
• This is because the sample mean $\bar{Y}$ varies from sample to sample.
• In practice, however, we select just one sample of fixed size n and construct one confidence interval using the observations in that sample.
• We do not know whether any particular CI truly contains µ.
• Our 95% confidence in that interval is based on the long-run properties of the procedure (illustrated in the sketch below).
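The long-run interpretation above can be illustrated by repeated sampling. The sketch below builds the t-based interval $\bar{X}(n) \pm t_{n-1,1-\alpha/2}\sqrt{S^2(n)/n}$ many times and counts how often it covers the known mean. The exponential population, sample size, and seed are assumptions chosen for illustration; a non-normal population also shows that the interval is only approximate, as noted above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)   # illustrative seed

# Assumptions for illustration: exponential population with known mean mu,
# sample size n, nominal confidence level 1 - alpha.
mu, n, alpha, reps = 1.0, 30, 0.05, 10_000
t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)   # t_{n-1, 1-alpha/2}

covered = 0
for _ in range(reps):
    x = rng.exponential(scale=mu, size=n)           # E[X] = mu
    half = t_crit * np.sqrt(x.var(ddof=1) / n)      # half-width of the CI
    if x.mean() - half <= mu <= x.mean() + half:
        covered += 1

print(f"empirical coverage: {covered / reps:.3f}   (nominal: {1.0 - alpha})")
```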
Hypotheses testing
• Assume that $X_1, X_2, X_3, \ldots, X_n$ are normally distributed (or approximately normal) and that we would like to test whether µ = µ₀, where µ₀ is a fixed, hypothesized value of µ.
• If $\left|\bar{X}(n) - \mu_0\right|$ is large, then our hypothesis is unlikely to be true.
• To conduct such a test (whether the hypothesis is true or not), we need a statistic whose distribution is known when the hypothesis is true.
• It turns out that if our hypothesis is true (µ = µ₀), then the statistic $t_n$ has a t-distribution with n − 1 df.

Hypotheses testing
• We form our two-tailed test of the hypothesis H₀: µ = µ₀ as: if $|t_n| > t_{n-1,1-\alpha/2}$, reject H₀; if $|t_n| \le t_{n-1,1-\alpha/2}$, "accept" H₀.
• The portion of the real line that corresponds to rejection of H₀ is called the critical region for the test.
• The probability that the statistic $t_n$ falls in the critical region given that H₀ is true, which is clearly equal to α, is called the level of the test.
• Typically, if $t_n$ does not fall in the rejection region, we "do not reject" H₀.

Hypotheses testing
• Type I error: rejecting H₀ when it is true. Its probability is again equal to α, and this error is under the experimenter's control.
• Type II error: accepting H₀ when it is false. Its probability is denoted by β.
• We call δ = 1 − β the power of the test, which is the probability of rejecting H₀ when it is false.
• For a fixed α, the power of the test can only be increased by increasing n.
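As a companion to the test above, the sketch below computes the two-tailed t statistic and uses Monte Carlo replications to estimate the Type I error (rejections when H₀ is true) and the power (rejections when the true mean is shifted). The normal population, σ, sample size, and shift are illustrative assumptions, not values from the slides.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)   # illustrative seed

# Assumptions for illustration: normal data with sigma = 2, n = 25 observations,
# hypothesized mean mu0 = 5, significance level alpha = 0.05.
mu0, sigma, n, alpha, reps = 5.0, 2.0, 25, 0.05, 10_000
t_crit = stats.t.ppf(1.0 - alpha / 2.0, df=n - 1)        # t_{n-1, 1-alpha/2}

def rejects_h0(true_mean):
    """Draw one sample, compute t_n, and report whether H0: mu = mu0 is rejected."""
    x = rng.normal(loc=true_mean, scale=sigma, size=n)
    t_n = (x.mean() - mu0) / np.sqrt(x.var(ddof=1) / n)  # test statistic
    return abs(t_n) > t_crit                             # falls in the critical region?

type_i = np.mean([rejects_h0(mu0) for _ in range(reps)])        # H0 true
power = np.mean([rejects_h0(mu0 + 1.0) for _ in range(reps)])   # true mean shifted by +1
print(f"estimated Type I error: {type_i:.3f}  (alpha = {alpha})")
print(f"estimated power at a shift of +1.0: {power:.3f}")
```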