Exercise III. The central limit theorem.

In probability theory, the central limit theorem (CLT) states conditions under which the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed. In its classical form the theorem also requires the random variables to be identically distributed; this requirement can be relaxed if certain additional conditions are met. The CLT also justifies approximating the distribution of large-sample statistics by the normal distribution in controlled experiments.

Consider the following example. A coin is tossed n = 100 times. Denote the outcome by 1 when heads is thrown and by 0 when tails is thrown, i.e. introduce a random variable $X_i$ which takes the values 0 and 1 with equal probability:

$$P(X_i = 0) = P(X_i = 1) = \frac{1}{2},$$

which describes a single coin toss. This distribution is called the Bernoulli distribution (the experiment is a Bernoulli trial). Define the random variable Y as the sum of n independent random variables $X_i$:

$$Y = \sum_{i=1}^{n} X_i.$$

We know that the random variable Y has a binomial distribution with parameters p = 0.5 and n = 100. We show that for sufficiently large n (usually n > 30), we can use tables of the normal distribution $N(m, \sigma)$ instead of tables of the binomial distribution. To do this, first generate the table for this binomial distribution and the table for the corresponding normal distribution, and then compare them with each other.

On the first screenshot we show how to calculate the value of the distribution function for this binomial distribution for 0 successes (p = 0.5 and n = 100). Then we proceed to successive values for the number of successes: 10, 20, ..., 100. On the second screenshot, we show the corresponding calculations for the normal distribution. In this case, you must first calculate the expected value and standard deviation:

$$m = np \quad \text{and} \quad \sigma = \sqrt{np(1-p)}.$$
In our example, these figures are 50 and 5, respectively.

 k   | Binomial cdf | Normal cdf
-----|--------------|------------
 0   | 7.88861E-31  | 7.61985E-24
 10  | 1.53165E-17  | 6.22096E-16
 20  | 5.57954E-10  | 9.86588E-10
 30  | 3.92507E-05  | 3.16712E-05
 40  | 0.028443967  | 0.022750132
 50  | 0.539794619  | 0.5
 60  | 0.9823999    | 0.977249868
 70  | 0.99998392   | 0.999968329
 80  | 1            | 0.999999999
 90  | 1            | 1
 100 | 1            | 1

Here k is the number of heads. Comparing the results, we find that the absolute differences are small, so the binomial distribution can indeed be approximated by the corresponding normal distribution.

Now we calculate the probability that there will be more than 65 heads. This corresponds to

$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{\text{binomial}}(65).$$

Thus, the required probability is 0.000895. Note that such an outcome, i.e. more than 65 heads in 100 tosses, would suggest that the coin is not "fair": we expect 50 heads here, not 65! The result is possible but unlikely (probability 0.000895).

Similarly, we calculate the approximate result using the CLT (the normal approximation to the binomial distribution):

$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{N(50,5)}(65).$$

Thus, the required probability is approximately 0.00135. The difference between the two results is 0.000455, which is small in absolute terms.

Perhaps in the case of the binomial distribution, which arises directly as the sum of independent two-point distributions, the approximate solution is not necessary: the calculation can be done exactly. But in other cases (some examples are given in the additional problems), the approximate approach is the only possible one. If calculations can be done only with the aid of normal distribution tables, only approximate calculations are possible.
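The comparison above can also be reproduced without tables or a spreadsheet. The following is a minimal sketch in Python using only the standard library; the helper names `binomial_cdf` and `normal_cdf` are illustrative and are not part of the original exercise.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(Y <= k) for Y ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Z <= x) for Z ~ N(mean, sd); erfc keeps small lower tails accurate."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))


n, p = 100, 0.5
m = n * p                            # expected value, m = np
sigma = math.sqrt(n * p * (1 - p))   # standard deviation, sqrt(np(1 - p))

# Tabulate both distribution functions at 0, 10, ..., 100 heads.
print(f"{'k':>4} {'binomial cdf':>14} {'normal cdf':>14}")
for k in range(0, n + 1, 10):
    print(f"{k:>4} {binomial_cdf(k, n, p):>14.6g} {normal_cdf(k, m, sigma):>14.6g}")

# 1 - F(65): for the integer-valued Y this is the probability of more than 65 heads.
tail_exact = 1.0 - binomial_cdf(65, n, p)
tail_normal = 1.0 - normal_cdf(65, m, sigma)
print(f"exact tail:  {tail_exact:.6f}")
print(f"normal tail: {tail_normal:.6f}")
```

Because Y takes only integer values, `1 - binomial_cdf(65, n, p)` counts the outcomes 66, 67, ..., 100; including the outcome 65 itself would require `1 - binomial_cdf(64, n, p)`.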
Consider our example:

$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{N(50,5)}(65) = 1 - \Phi\left(\frac{65 - 50}{5}\right) = 1 - \Phi(3).$$

 x   | 0      0.01   0.02   0.03   0.04
-----|----------------------------------
 1.9 | 0.971  0.972  0.973  0.973  0.974
 2.0 | 0.977  0.978  0.978  0.979  0.979
 2.1 | 0.982  0.983  0.983  0.983  0.984
 2.2 | 0.986  0.986  0.987  0.987  0.987
 2.3 | 0.989  0.990  0.990  0.990  0.990
 2.4 | 0.992  0.992  0.992  0.992  0.993
 2.5 | 0.994  0.994  0.994  0.994  0.994
 2.6 | 0.995  0.995  0.996  0.996  0.996
 2.7 | 0.997  0.997  0.997  0.997  0.997
 2.8 | 0.997  0.998  0.998  0.998  0.998
 2.9 | 0.998  0.998  0.998  0.998  0.998
 3.0 | 0.999  0.999  0.999  0.999  0.999

So the required result is 1 - 0.999 = 0.001.

Additional problems.

1) The probability of winning in a certain game equals 0.1. What is the probability that among 500 participants more than 60 win?
2) The probability that a newborn baby is male equals 0.515. What is the probability that the number of female births among 1000 births will be no more than 480?
3) On average, in 1000 trials the number of successes equals 700. If you obtained 800 successes in 1000 trials, would you explain it by luck or by your own skill?
4) What is the probability that in 800 independent trials the number of successes will be more than 150 and less than 250 when the probability of success in a single trial equals 0.25?
5) In a large batch of products only 20% are first class. We take a sample of 300 products. a) What is the probability that among them there are more than 50 first-class products? b) What is the probability that the frequency of first-class products in the sample will be no greater than 0.3?
6) The probability of success in an individual trial equals p. How many trials must be completed to obtain 40 successes with probability no less than ½?
7) The random variables $X_1, X_2, \ldots, X_n$ are Poisson distributed with parameter λ = 2. Estimate the following probability: $P\left(190 < \sum_{i=1}^{100} X_i < 210\right)$?
8) The number of marks obtained in a “Statistics” exam varies between 0 and 100.
The average number of marks obtained by a student equals 40 with standard deviation 20. What is the probability that: a) the total number of marks of 150 students will be more than 6500? b) the average score for these students is between 40 and 70?
9) The random variables $X_1, X_2, \ldots, X_n$ are exponentially distributed with parameter λ = 4. Estimate the following probability: $P\left(\sum_{i=1}^{100} X_i \;[\ldots]\; 30\right)$?
10) 1500 real numbers are summed by a computer. Each number is rounded to the nearest integer. The numbers are given with a precision of one decimal place. Assuming that the rounding errors are independent and uniformly distributed on (-0.5, 0.5), estimate the probability that the sum of these errors exceeds 15.
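Problems of this kind all follow the same pattern: find the mean and variance of a single term, scale them to the sum, and read the tail from the standard normal distribution. As a hedged illustration, the sketch below applies that pattern to problem 10. The helper names `normal_upper_tail` and `clt_sum_upper_tail` are hypothetical, and the terms are assumed to be independent and identically distributed.

```python
import math


def normal_upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def clt_sum_upper_tail(n: int, mean: float, variance: float, threshold: float) -> float:
    """CLT estimate of P(X_1 + ... + X_n > threshold) for i.i.d. terms."""
    total_mean = n * mean                 # mean of the sum
    total_sd = math.sqrt(n * variance)    # standard deviation of the sum
    return normal_upper_tail((threshold - total_mean) / total_sd)


# Problem 10: each rounding error is uniform on (-0.5, 0.5),
# so its mean is 0 and its variance is (b - a)^2 / 12 = 1/12.
estimate = clt_sum_upper_tail(n=1500, mean=0.0, variance=1.0 / 12.0, threshold=15.0)
print(f"P(sum of errors > 15) is approximately {estimate:.4f}")
```

The same helper can be reused for the other sums by substituting the mean and variance of the relevant distribution, for example λ and λ for a Poisson term.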