Exercise III. The central limit theorem.
In probability theory, the central limit theorem (CLT) states conditions under which the mean of a
sufficiently large number of independent random variables, each with finite mean and variance, will
be approximately normally distributed. The central limit theorem also requires the random variables
to be identically distributed, unless certain conditions are met. The CLT also justifies the
approximation of the distribution of large-sample statistics using the normal distribution in
controlled experiments.
Consider the following example. A coin is tossed n = 100 times. Denote the outcome when
heads is thrown as 1, and when tails is thrown as 0, i.e. introduce a random variable $X_i$ which
takes values 0 and 1 with equal probability:
$$P(X_i = 0) = P(X_i = 1) = \frac{1}{2},$$
which describes a single coin toss. This distribution is called the Bernoulli distribution (the
experiment is a Bernoulli trial).
Define the random variable Y as the sum of n independent random variables $X_i$:
$$Y = \sum_{i=1}^{n} X_i .$$
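As a rough illustration of this definition, the sum Y can be simulated directly. The sketch below uses only Python's standard library; the function name `simulate_heads`, the seed and the number of repetitions are arbitrary choices made for this example and are not part of the exercise. For 100 fair tosses, the sample mean and sample standard deviation of Y should come out close to np = 50 and sqrt(np(1 - p)) = 5.

```python
import random
import statistics

def simulate_heads(n_tosses, rng):
    """Return Y = X_1 + ... + X_n for n fair coin tosses (1 = heads, 0 = tails)."""
    return sum(rng.getrandbits(1) for _ in range(n_tosses))

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
samples = [simulate_heads(100, rng) for _ in range(10_000)]

sample_mean = statistics.mean(samples)
sample_sd = statistics.stdev(samples)
print(f"mean of Y: {sample_mean:.2f}, standard deviation of Y: {sample_sd:.2f}")
```

A histogram of `samples` would show the bell shape that the central limit theorem predicts.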
We know that the random variable Y has a binomial distribution with parameters p = 0.5 and
n = 100. We show that for sufficiently large n (usually n > 30), we can use tables of the normal
distribution $N(m, \sigma)$ instead of tables of the binomial distribution. To do this, first
generate the table for this binomial distribution and the table for the corresponding normal
distribution and then compare them with each other. On the first screenshot we show how to
calculate the value of the distribution function for this binomial distribution for 0 successes (p
= 0.5 and n = 100). Then we proceed to successive values for the number of successes: 10, 20,
..., 100. On the second screenshot, we show the corresponding calculations for the normal
distribution. In this case, you must first calculate the expected value and standard deviation:
$m = np$ and $\sigma = \sqrt{np(1-p)}$. In our example, these figures are respectively 50 and 5.
Successes | Binomial cdf | Normal cdf
----------+--------------+------------
        0 | 7.88861E-31  | 7.61985E-24
       10 | 1.53165E-17  | 6.22096E-16
       20 | 5.57954E-10  | 9.86588E-10
       30 | 3.92507E-05  | 3.16712E-05
       40 | 0.028443967  | 0.022750132
       50 | 0.539794619  | 0.5
       60 | 0.9823999    | 0.977249868
       70 | 0.99998392   | 0.999968329
       80 | 1            | 0.999999999
       90 | 1            | 1
      100 | 1            | 1
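The comparison table can also be produced without a spreadsheet. The following is a minimal sketch using only Python's `math` module; the helper names `binomial_cdf` and `normal_cdf` are introduced here for illustration. The normal distribution function is written with `math.erfc` so that the very small values in the left tail are not lost to rounding.

```python
import math

def binomial_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_cdf(x, mean, sd):
    """P(Z <= x) for Z ~ N(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc(-(x - mean) / (sd * math.sqrt(2.0)))

n, p = 100, 0.5
m = n * p                            # expected value, 50
sigma = math.sqrt(n * p * (1 - p))   # standard deviation, 5

# One row per number of successes 0, 10, ..., 100.
table = [(k, binomial_cdf(k, n, p), normal_cdf(k, m, sigma)) for k in range(0, 101, 10)]
for k, b, g in table:
    print(f"{k:3d}  {b:.6g}  {g:.6g}")
```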
Comparing the results obtained, we find that the differences are small in absolute terms (the
largest, at 50 successes, is about 0.04), so the binomial distribution can be approximated by
the corresponding normal distribution.
Now we calculate the probability that there will be more than 65 heads. Since F denotes the
cumulative distribution function, F(65) = P(Y ≤ 65), this corresponds to the following:
$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{\mathrm{Binomial}}(65).$$
Thus, the required probability is 0.000895. Note that if there were such an outcome, i.e. more
than 65 heads in the 100 tosses, it would suggest that the coin is not “fair”: we expect about 50
heads here, not 65! The result is possible but unlikely (probability 0.000895).
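The exact value can be checked by summing the binomial probabilities directly. This is a small sketch using only the standard library; `binomial_tail` is a helper name introduced here. It also shows why the inequality matters for a discrete variable: 1 - F(65) is the probability of strictly more than 65 heads, while "at least 65" would also include the outcome Y = 65.

```python
import math

def binomial_tail(k, n, p):
    """P(Y >= k) for Y ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

n, p = 100, 0.5
p_more_than_65 = binomial_tail(66, n, p)  # P(Y > 65) = P(Y >= 66) = 1 - F(65)
p_at_least_65 = binomial_tail(65, n, p)   # P(Y >= 65) = 1 - F(64), also counts Y = 65

print(f"P(Y > 65)  = {p_more_than_65:.6f}")
print(f"P(Y >= 65) = {p_at_least_65:.6f}")
```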
Similarly, we calculate the following approximate result using the CLT (the normal
approximation to the binomial distribution):
$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{N(50,5)}(65).$$
Thus, the required probability is approximately 0.00135. The difference between these
calculations is 0.000455, which is small in absolute terms. For the binomial distribution,
which arises directly as the sum of independent two-point distributions, the approximate
solution is arguably unnecessary, since the calculation can be done exactly. But in other
cases (some examples are given in the additional problems), the approximate approach is the
only practical one.
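The normal approximation can be reproduced in a few lines. This is a sketch under the same setup (n = 100, p = 0.5); the helper name `normal_upper_tail` is an assumption of this example, and the exact value 0.000895 is taken from the text rather than recomputed.

```python
import math

def normal_upper_tail(x, mean, sd):
    """P(Z > x) for Z ~ N(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

n, p = 100, 0.5
m = n * p
sigma = math.sqrt(n * p * (1 - p))

approx_tail = normal_upper_tail(65, m, sigma)  # 1 - F_N(50,5)(65) = 1 - Phi(3)
exact_tail = 0.000895                          # exact binomial value quoted in the text
difference = approx_tail - exact_tail

print(f"normal approximation: {approx_tail:.5f}")
print(f"difference from the exact value: {difference:.6f}")
```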
If calculations can be done only with the aid of normal distribution tables, only
approximate calculations are possible. Consider our example:
$$P(Y > 65) = 1 - P(Y \le 65) = 1 - F_{N(50,5)}(65) = 1 - \Phi\left(\frac{65 - 50}{5}\right) = 1 - \Phi(3)$$
  x  |   0     0.01   0.02   0.03   0.04
-----+----------------------------------
 1.9 | 0.971  0.972  0.973  0.973  0.974
 2.0 | 0.977  0.978  0.978  0.979  0.979
 2.1 | 0.982  0.983  0.983  0.983  0.984
 2.2 | 0.986  0.986  0.987  0.987  0.987
 2.3 | 0.989  0.990  0.990  0.990  0.990
 2.4 | 0.992  0.992  0.992  0.992  0.993
 2.5 | 0.994  0.994  0.994  0.994  0.994
 2.6 | 0.995  0.995  0.996  0.996  0.996
 2.7 | 0.997  0.997  0.997  0.997  0.997
 2.8 | 0.997  0.998  0.998  0.998  0.998
 2.9 | 0.998  0.998  0.998  0.998  0.998
 3.0 | 0.999  0.999  0.999  0.999  0.999
So the required result is 1 - 0.999 = 0.001.
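An excerpt of the standard normal table like the one above can be regenerated with the error function, which is a convenient way to check individual entries. This is a minimal sketch; the function name `phi` and the printed layout are choices made for this example. With three-decimal rounding, Φ(3) reads as 0.999, so the table-based answer is 1 - 0.999 = 0.001, a coarser figure than the 0.00135 obtained with more digits.

```python
import math

def phi(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

offsets = [0.00, 0.01, 0.02, 0.03, 0.04]
print("  x  | " + "  ".join(f"{d:5.2f}" for d in offsets))
for tenths in range(19, 31):  # rows 1.9, 2.0, ..., 3.0
    x = tenths / 10
    row = [phi(x + d) for d in offsets]
    print(f"{x:4.1f} | " + "  ".join(f"{v:.3f}" for v in row))

table_tail = 1.0 - round(phi(3.0), 3)  # answer read off the three-decimal table
precise_tail = 1.0 - phi(3.0)          # same tail without rounding Phi(3) first
print(f"table-based: {table_tail:.3f}, unrounded: {precise_tail:.5f}")
```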
Additional problems.
1) The probability of winning in a certain game equals 0.1. What is the probability that among
500 participants more than 60 win?
2) The probability that a newborn baby is male equals 0.515. What is the probability that the
number of female births among 1000 births will be no more than 480?
3) On average, in 1000 trials the number of successes equals 700. If you obtained 800
successes in 1000 trials would you explain it by luck or your own skill?
4) What is the probability that in 800 independent trials the number of successes will be more
than 150 and less than 250 when the probability of success in a single trial equals 0.25?
5) In a large batch of products only 20% are of first class. We take a sample of 300
products.
a) what is the probability that among them there are more than 50 of first class?
b) what is the probability that the frequency of first class products in the sample will be no
greater than 0.3?
6) The probability of success in an individual trial equals p. How many trials must be
completed to obtain 40 successes with probability no less than ½?
7) The random variables $X_1, X_2, \ldots, X_n$ are of Poisson type with parameter λ=2. Estimate the
following probability:
$$P\left(190 < \sum_{i=1}^{100} X_i < 210\right) = ?$$
8) The number of marks obtained in a “Statistics” exam varies between 0 and 100. The
average number of marks obtained by a student equals 40 with standard deviation 20. What is
the probability that:
a) the total number of marks of 150 students will be more than 6500?
b) the average score for these students is between 40 and 70?
9) The random variables $X_1, X_2, \ldots, X_n$ are of exponential type with parameter λ=4. Estimate
the following probability:
$$P\left(\sum_{i=1}^{100} X_i \;[?]\; 30\right) = ?$$
10) 1500 real numbers are summed by a computer. The numbers are given with a precision of
one decimal place, and each is rounded to the nearest integer. Assuming that the rounding
errors are independent and uniformly distributed on (-0.5, 0.5), estimate the probability
that the sum of these errors exceeds 15.
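Most of the problems above follow one pattern: standardise the sum using its mean and variance, then read the probability from the normal distribution. The sketch below shows that pattern on problem 10 only. The helper name `clt_upper_tail` is hypothetical, and the mean 0 and variance 1/12 are the standard values for a uniform distribution on (-0.5, 0.5). It is one possible approach, not the intended solution.

```python
import math

def clt_upper_tail(threshold, n, mean, variance):
    """Approximate P(S > threshold) for S = sum of n i.i.d. terms, using the CLT."""
    z = (threshold - n * mean) / math.sqrt(n * variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # 1 - Phi(z)

# Problem 10: 1500 rounding errors, each uniform on (-0.5, 0.5),
# so each has mean 0 and variance (0.5 - (-0.5))**2 / 12 = 1/12.
p_errors = clt_upper_tail(15, n=1500, mean=0.0, variance=1.0 / 12.0)
print(f"P(sum of errors > 15) is roughly {p_errors:.3f}")
```

The same helper reproduces the coin example from the exercise: `clt_upper_tail(65, n=100, mean=0.5, variance=0.25)` gives about 0.00135.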