Introduction to Probability (II)

This lecture extends the ideas of mean and standard deviation to sums and averages of randomly produced numbers. Random variables, introduced below, provide a good notation for this purpose.

Random variables

The number of heads in four tosses of a coin could be any one of the possible values 0, 1, 2, 3, 4. The term "random variable" is introduced for something like the number of heads, which might take any one of several possible values, with a distribution of probabilities over this set of values. Typically, capital letters X, Y, Z, etc., are used to denote random variables. For example, X might stand for "the number obtained by rolling a die", Y for "the number of heads in four coin tosses", and Z for "the suit of a card dealt from a well-shuffled deck". This is not really a new idea, but rather a compact notation for the familiar idea of something picked at random according to a probability distribution.

The "range" of a random variable X is the set of all possible values that X might produce. If X has finitely many possible values, or countably infinitely many (i.e., its possible values can be labeled by the integers), the distribution of X is determined by the probabilities of individual values, P(X = x) for x in the range of X, with

    \sum_x P(X = x) = 1

as x runs through the set of all possible values of X. Here it is assumed that X has a uniquely specified value, no matter what happens. So the events (X = x), as x varies over the range of X, are mutually exclusive, and their probabilities must add up to 1. Such an X is called a "discrete random variable", and has a discrete probability distribution. For example, X could be the number of calls you expect to receive between 2:00 pm and 4:00 pm today, or the number of heads in four tosses of a coin.

The "expectation" of a discrete random variable X is defined by

    E(X) = \sum_x x \, P(X = x).

The range of a continuous random variable X is an interval of real numbers. We once used the normal curve to approximate the histogram of 19 students' midterm scores. The idea there was to approximate a discrete distribution of many small individual probabilities by scaling the histogram to make it follow a continuous curve. The function defining such a curve is called a "probability density", denoted f(x). This function determines probabilities over an infinite continuous range of possible values. A random variable X has density f(x) if, for all a \le b,

    P(a \le X \le b) = \int_a^b f(x) \, dx.

The "expectation" of a continuous random variable X is defined by

    E(X) = \int_{-\infty}^{\infty} x f(x) \, dx.

If you try to predict the value of a random variable X by its mean E(X), you will be off by the random amount X - E(X). It is often important to have an idea of how large this deviation is likely to be. Because

    E(X - E(X)) = E(X) - E(X) = 0,

it is necessary to consider either the absolute value or the square of X - E(X) to get an idea of the size of the deviation without regard to sign. Because the algebra is easier with squares than with absolute values, it is natural to first consider E[(X - E(X))^2], then take a square root to get back to the same scale of units as X.

The "variance" of X, denoted Var(X), is the mean squared deviation of X from its expected value:

    Var(X) = E[(X - E(X))^2].

The "standard deviation" of X, denoted SD(X), is the square root of the variance:

    SD(X) = \sqrt{Var(X)}.
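To make these definitions concrete, here is a minimal Python sketch (the language is my choice for illustration, not part of the lecture) that computes E(X), Var(X), and SD(X) exactly for the running example, the number of heads X in four tosses of a fair coin, using P(X = x) = C(4, x) / 2^4:

    from math import comb, sqrt

    # Exact distribution of X, the number of heads in four fair tosses:
    # P(X = x) = C(4, x) / 2**4 for x = 0, 1, 2, 3, 4.
    dist = {x: comb(4, x) / 2**4 for x in range(5)}

    # E(X) = sum over x of x * P(X = x)
    mean = sum(x * p for x, p in dist.items())

    # Var(X) = E[(X - E(X))**2], and SD(X) is its square root.
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    sd = sqrt(var)

    print(mean, var, sd)  # 2.0 1.0 1.0

So the number of heads in four tosses is about 2, give or take 1.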
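The continuous definitions can be checked the same way by approximating the integrals numerically. The density below, f(x) = 2x on [0, 1] and zero elsewhere, is an assumed example chosen only for illustration; the exact answers are E(X) = 2/3, Var(X) = 1/18, and P(0.25 \le X \le 0.5) = 0.1875.

    # Assumed density for illustration: f(x) = 2x on [0, 1], zero elsewhere.
    def f(x):
        return 2 * x if 0 <= x <= 1 else 0.0

    def integrate(g, a, b, steps=100_000):
        # Midpoint Riemann sum approximating the integral of g over [a, b].
        h = (b - a) / steps
        return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

    total = integrate(f, 0, 1)                               # ~1.0 (total probability)
    mean = integrate(lambda x: x * f(x), 0, 1)               # E(X) = 2/3
    var = integrate(lambda x: (x - mean) ** 2 * f(x), 0, 1)  # Var(X) = 1/18
    prob = integrate(f, 0.25, 0.5)                           # P(0.25 <= X <= 0.5)
    print(total, mean, var, prob)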
If SD(X) > 0, the random variable X^* = (X - E(X)) / SD(X), called X in "standard units", has E(X^*) = 0 and SD(X^*) = 1.

Addition rule for expectation and variance

Let X_1, X_2, ..., X_n be random variables. Then

    E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n).

If X_1, X_2, ..., X_n are independent, then

    Var(X_1 + X_2 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n).

Central limit theorem

Let S_n = X_1 + \cdots + X_n be the sum of n independent random variables, each with the same distribution (i.e., iid). For large n, the distribution of S_n is approximately normal, with mean E(S_n) = n\mu and standard deviation SD(S_n) = \sqrt{n} \, \sigma, where \mu = E(X_i) and \sigma = SD(X_i).

The law of averages

John Kerrich, a South African mathematician, found that when you toss a coin many times, the number of heads will be around half the number of tosses, but off by some amount, the chance error. As the number of tosses goes up, the chance error gets bigger in absolute terms; compared to the number of tosses, it gets smaller.

The above process can also be described as "the sum of draws". The sum of draws from a box is the key ingredient for many statistical procedures. For example, take a box containing two tickets, marked 0 and 1. Imagine drawing n times at random with replacement from this box (the phrase "with replacement" reminds you to put the ticket back in the box before drawing again), and note the number at each draw. If you add up all the numbers, the sum should be around n/2, but off by some chance error, say e. As n gets bigger, the likely size of e gets bigger, but e/n gets smaller. In fact, as n goes to infinity, the typical size of e grows without bound (on the order of \sqrt{n}/2, by the central limit theorem, since here \sigma = 1/2), while e/n decreases to 0, as the simulation sketch below illustrates.
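The behavior of the chance error can be seen directly by simulating the sum of draws. Here is a minimal Python sketch (my illustration, not part of the lecture); with tickets 0 and 1, \mu = 1/2 and \sigma = 1/2, so the central limit theorem gives SD(S_n) = \sqrt{n}/2 as the typical size of e:

    import random
    from math import sqrt

    random.seed(0)  # fixed seed so the run is reproducible

    # Draw n times with replacement from a box holding tickets 0 and 1.
    # The sum should be near n/2; the chance error e = sum - n/2 typically
    # has size about sqrt(n)/2, so |e| grows with n while e/n shrinks.
    for n in (100, 10_000, 1_000_000):
        s = sum(random.randint(0, 1) for _ in range(n))
        e = s - n / 2
        print(f"n={n:>9}  e={e:>8.1f}  e/n={e / n:>+.5f}  SD={sqrt(n) / 2:>6.1f}")

The rows should show |e| growing roughly like \sqrt{n}/2 while e/n shrinks toward 0, which is exactly the law of averages.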