Law of large numbers,
Sample distribution
Contents
1 The Law of Large Numbers, Limit Theorems
2 Exercises
3 Sample distribution
   3.1 Survey Sampling
   3.2 Independent Random Variables
4 Exercises
1 The Law of Large Numbers, Limit Theorems
If we repeat an experiment independently, we can create a distribution of relative frequencies from the observed values and calculate some measures (mean, median, variance, . . . ). This distribution (and these measures) is called the sample distribution (sample measures). Under particular conditions we can expect that the sample distribution (measures) will converge towards a theoretical distribution (measures): the more repetitions of the experiment, the better the convergence.
Notice that the convergence of the sample values towards the theoretical ones is not convergence in the usual mathematical sense, but convergence in probability: as the number of experiments increases, the probability of a large deviation between the sample values and the theoretical values decreases.
Definition 1.1. If the sequence of random variables X1, X2, . . . , Xn, . . . fulfils
\[
\lim_{n\to\infty} P(|X_n - c| < \varepsilon) = 1, \quad \varepsilon > 0,
\]
it is said that the sequence {Xn} converges in probability to the constant c; we write $X_n \xrightarrow{P} c$.
Theorem 1.1 (Chebyshev’s Inequality). For any random variable X with the mean E(X), the finite variance D(X) and for every ε > 0 we have
\[
P(|X - E(X)| < \varepsilon) \ge 1 - \frac{D(X)}{\varepsilon^2}.
\]
Chebyshev’s inequality is useful above all in the theoretical field. It allows us to estimate some probabilities of random variables with unknown distribution.
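As a quick numerical illustration, the following sketch compares the empirical probability with Chebyshev’s lower bound. It assumes NumPy; the exponential distribution and the sample size are arbitrary choices, not taken from the text:

```python
import numpy as np

# Empirical check of Chebyshev's inequality for an exponential variable
# with E(X) = 2 and D(X) = 4 (arbitrary illustrative choices).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)

for eps in (2.0, 3.0, 4.0):
    empirical = np.mean(np.abs(x - 2.0) < eps)   # P(|X - E(X)| < eps)
    bound = 1 - 4.0 / eps**2                     # 1 - D(X)/eps^2
    print(f"eps={eps}: P = {empirical:.3f} >= bound = {bound:.3f}")
```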
Theorem 1.2 (Bernoulli’s Theorem). If the random variable X denotes the number of occurrences of the event in a sequence of n independent experiments, where π is the probability of occurrence of the event in one experiment, then for every ε > 0
\[
\lim_{n\to\infty} P\left(\left|\frac{X}{n} - \pi\right| < \varepsilon\right) = 1.
\]
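Bernoulli’s theorem can be watched in action with a short simulation; a sketch assuming NumPy, with π = 0.3 as an arbitrary choice:

```python
import numpy as np

# The relative frequency X/n of successes approaches pi as n grows.
rng = np.random.default_rng(1)
pi = 0.3
for n in (100, 10_000, 1_000_000):
    x = rng.binomial(n, pi)        # number of occurrences in n trials
    print(f"n={n}: X/n = {x / n:.4f}")
```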
Theorem 1.3 (de Moivre-Laplace Theorem). Let X be a random variable with a binomial distribution X ∼ B(n, π).¹ For the standardized random variable
\[
U = \frac{X - n\pi}{\sqrt{n\pi(1-\pi)}}
\]
we have
\[
\lim_{n\to\infty} P(U \le u) = \Phi(u),
\]
where Φ(u) is the distribution function of the standard normal distribution N(0, 1).
The de Moivre-Laplace theorem says that for n → ∞ the binomial distribution converges
to the normal distribution.
The approximation is acceptable if
\[
n\pi(1-\pi) > 9 \quad\text{and}\quad \frac{1}{n+1} < \pi < \frac{n}{n+1}.
\]
¹X = X1 + X2 + · · · + Xn, where Xi, i = 1, . . . , n, are independent Bernoulli random variables with E(Xi) = π, D(Xi) = π(1 − π), which means E(X) = nπ and D(X) = nπ(1 − π).
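The quality of the approximation can be inspected numerically. A minimal sketch, assuming NumPy and SciPy; the parameters n = 50, π = 0.3 are arbitrary but satisfy both conditions above (nπ(1 − π) = 10.5 > 9):

```python
import numpy as np
from scipy import stats

# How well the normal distribution approximates B(n, pi).
n, pi = 50, 0.3
mu, sigma = n * pi, np.sqrt(n * pi * (1 - pi))

for x in (10, 15, 20):
    exact = stats.binom.cdf(x, n, pi)           # exact P(X <= x)
    approx = stats.norm.cdf((x - mu) / sigma)   # de Moivre-Laplace approximation
    print(f"x={x}: exact={exact:.4f}, approx={approx:.4f}")
```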
Theorem 1.4 (de Moivre-Laplace Theorem for proportion). Let X be a random variable with a binomial distribution X ∼ B(n, π). The random variable X/n has the mean E(X/n) = π and the variance D(X/n) = π(1 − π)/n. For the standardized random variable
\[
U = \frac{\frac{X}{n} - \pi}{\sqrt{\pi(1-\pi)}}\,\sqrt{n}
\]
we have
\[
\lim_{n\to\infty} P(U \le u) = \Phi(u),
\]
where Φ(u) is the distribution function of the standard normal distribution N(0, 1).
Theorem 1.5 (Lévy-Lindeberg’s Theorem). Let the random variable be X = X1 + X2 + · · · + Xn, where Xi, i = 1, . . . , n, are independent random variables with the same distribution with the mean E(Xi) = µ and the finite variance D(Xi) = σ².² For the standardized random variable
\[
U = \frac{X - n\mu}{\sqrt{n\sigma^2}}
\]
we have
\[
\lim_{n\to\infty} P(U \le u) = \Phi(u),
\]
where Φ(u) is the distribution function of the standard normal distribution N(0, 1).
Theorem 1.6 (Lévy-Lindeberg’s Theorem for the Mean). Let the random variable X̄ be the mean of n independent random variables X1, X2, . . . , Xn with the same distribution, the mean E(Xi) = µ and the finite variance D(Xi) = σ², i = 1, . . . , n. Then
\[
E(\bar X) = \mu \quad\text{and}\quad D(\bar X) = \frac{\sigma^2}{n},
\]
and for the standardized random variable
\[
U = \frac{\bar X - \mu}{\sigma}\,\sqrt{n}
\]
we have
\[
\lim_{n\to\infty} P(U \le u) = \Phi(u),
\]
where Φ(u) is the distribution function of the standard normal distribution N(0, 1).
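The following sketch illustrates Theorem 1.6 by simulation. It assumes NumPy and SciPy; the exponential distribution is an arbitrary non-normal choice with µ = σ = 1:

```python
import numpy as np
from scipy import stats

# Standardized means of n i.i.d. exponential variables are approximately N(0, 1).
rng = np.random.default_rng(2)
n, reps = 50, 100_000
samples = rng.exponential(scale=1.0, size=(reps, n))
u = (samples.mean(axis=1) - 1.0) / 1.0 * np.sqrt(n)   # U = (Xbar - mu)/sigma * sqrt(n)

for q in (-1.0, 0.0, 1.0):
    print(f"P(U <= {q:+.0f}): empirical={np.mean(u <= q):.3f}, "
          f"Phi={stats.norm.cdf(q):.3f}")
```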
For M = X1 + · · · + Xn:
• $M = \sum_{i=1}^{n} X_i \sim \text{as.}\,N(n\mu,\, n\sigma^2)$, $E(M) = n\mu$, $D(M) = n\sigma^2$
• $U = \frac{M - E(M)}{\sqrt{D(M)}} = \frac{M - n\mu}{\sqrt{n\sigma^2}} \sim \text{as.}\,N(0, 1)$
• $P(M \le m) = F(m) \approx \Phi\left(\frac{m - n\mu}{\sqrt{n\sigma^2}}\right)$
• $P\left(-u_{1-\alpha/2} < \frac{m - n\mu}{\sqrt{n\sigma^2}} < u_{1-\alpha/2}\right) = 1 - \alpha$

²E(X) = nµ and D(X) = nσ²
For the sample mean X̄:
• $\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \text{as.}\,N\!\left(\mu,\, \frac{\sigma^2}{n}\right)$, $E(\bar X) = \mu$, $D(\bar X) = \frac{\sigma^2}{n}$
• $U = \frac{\bar X - E(\bar X)}{\sqrt{D(\bar X)}} = \frac{\bar X - \mu}{\sigma}\sqrt{n} \sim \text{as.}\,N(0, 1)$
• $P(\bar X \le x) = F(x) \approx \Phi\left(\frac{x - \mu}{\sigma}\sqrt{n}\right)$
• $P\left(-u_{1-\alpha/2} < \frac{x - \mu}{\sigma}\sqrt{n} < u_{1-\alpha/2}\right) = 1 - \alpha$
When the normal distribution is used as an approximation of the distribution of a discrete random variable, it is recommended to apply the so-called continuity correction, which improves the approximation.
If we calculate P(X ≤ x) or P(X ≥ x) by the normal approximation, we get underestimated results. On the contrary, if we calculate P(X < x) or P(X > x) by the normal approximation, we get overestimated results.
Some examples of the continuity correction:

before correction    after correction
x < 3                x < 2.5
x ≤ 3                x < 3.5
x = 5                4.5 < x < 5.5
x ≥ 7                x > 6.5
x > 7                x > 7.5
Example 1.1. The probability that you hit the target is 0.8. What is the probability that the difference between the number of hits in a sequence of 200 shots and the mean of this number will not be larger than 10?
Solution. Using the binomial distribution: E(X) = nπ = 200 · 0.8 = 160, D(X) = nπ(1 − π) = 200 · 0.8 · (1 − 0.8) = 32,
\[
P(150 \le X \le 170) = p(150) + p(151) + \cdots + p(170)
= \binom{200}{150} 0.8^{150} \cdot 0.2^{50} + \binom{200}{151} 0.8^{151} \cdot 0.2^{49} + \cdots + \binom{200}{170} 0.8^{170} \cdot 0.2^{30} = 0.937;
\]
using the de Moivre-Laplace theorem:
\[
F(x) \approx \Phi\left(\frac{x - n\pi}{\sqrt{n\pi(1-\pi)}}\right)
\]
\[
P(150 \le X \le 170) = F(170) - F(149) \approx \Phi\left(\frac{170 - 160}{\sqrt{32}}\right) - \Phi\left(\frac{149 - 160}{\sqrt{32}}\right) = 0.936;
\]
using the de Moivre-Laplace theorem (with continuity correction):
\[
F(x) \approx \Phi\left(\frac{x - n\pi}{\sqrt{n\pi(1-\pi)}}\right)
\]
\[
P(150 \le X \le 170) \approx P(149.5 < X < 170.5) = F(170.5) - F(149.5)
= \Phi\left(\frac{170.5 - 160}{\sqrt{32}}\right) - \Phi\left(\frac{149.5 - 160}{\sqrt{32}}\right) = 0.937;
\]
using Chebyshev’s inequality:
\[
P(|X - E(X)| < \varepsilon) \ge 1 - \frac{D(X)}{\varepsilon^2},
\]
E(X) = nπ = 200 · 0.8 = 160, D(X) = nπ(1 − π) = 200 · 0.8 · (1 − 0.8) = 32,
\[
P(|X - 160| < 10) \ge 1 - \frac{32}{10^2} = 0.68, \qquad P(|X - 160| < 11) \ge 1 - \frac{32}{11^2} = 0.736.
\]
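The computations above can be verified numerically; a minimal sketch assuming SciPy, whose output should approximately reproduce 0.937, 0.936 and 0.937:

```python
from scipy import stats

n, pi = 200, 0.8
mu, sd = n * pi, (n * pi * (1 - pi)) ** 0.5   # 160 and sqrt(32)

# exact binomial probability P(150 <= X <= 170)
exact = stats.binom.cdf(170, n, pi) - stats.binom.cdf(149, n, pi)
# normal approximation without and with the continuity correction
plain = stats.norm.cdf((170 - mu) / sd) - stats.norm.cdf((149 - mu) / sd)
corrected = stats.norm.cdf((170.5 - mu) / sd) - stats.norm.cdf((149.5 - mu) / sd)
print(f"exact={exact:.3f}, approx={plain:.3f}, corrected={corrected:.3f}")
```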
Example 1.2. In some elections the coalition obtained 52% of the votes. What is the probability that in a public opinion survey of 2600 respondents the opposition wins?
Solution. X . . . the number of respondents who voted for the opposition, X ∼ B(2600; 0.48),
E(X) = nπ = 2600 · 0.48 = 1248, D(X) = nπ(1 − π) = 2600 · 0.48 · (1 − 0.48) = 648.96,
\[
P(X > 1300) = 1 - P(X \le 1300) = 1 - [p(0) + \cdots + p(1300)]
= 1 - \left[\binom{2600}{0} 0.48^{0} \cdot 0.52^{2600} + \cdots + \binom{2600}{1300} 0.48^{1300} \cdot 0.52^{1300}\right] = 1 - 0.98031 = 0.01969;
\]
using the de Moivre-Laplace theorem:
\[
F(x) \approx \Phi\left(\frac{x - n\pi}{\sqrt{n\pi(1-\pi)}}\right)
\]
\[
P(X > 1300) = 1 - P(X \le 1300) = 1 - F(1300) \approx 1 - \Phi\left(\frac{1300 - 1248}{\sqrt{648.96}}\right) = 1 - 0.97939 = 0.02061;
\]
using the de Moivre-Laplace theorem (with continuity correction):
\[
P(X > 1300) = 1 - P(X \le 1300) \approx 1 - P(X < 1300.5) = 1 - \Phi\left(\frac{1300.5 - 1248}{\sqrt{648.96}}\right) = 1 - 0.98034 = 0.01966.
\]
2 Exercises
1. Long-term observation has shown that the time needed to locate and eliminate a disturbance of a machine has a mean value of 30 minutes and a standard deviation of 12 minutes. Determine
(a) the time sufficient for 40 machines, i.e. the time that will not be exceeded with probability 0.95,
(b) the probability that the average time for 40 machines does not exceed 32 minutes.
2. Fifteen carp were caught in a breeding pond and, after their weights had been measured, were released back. Based on the measured weights, a mean value of 2.2 kg and a standard deviation of 0.6 kg were estimated. Suppose that the weight of the carp follows a normal distribution. The pond was stocked with 1500 carp and the mortality is 10%. What is the probability that
(a) a randomly caught carp will weigh less than 2 kg,
(b) we get at least 3000 kg of carp from the whole pond?
3 Sample distribution

3.1 Survey Sampling
Survey sampling can be
• entire, total, complete → census
• incomplete → sample survey
We would like to get a sample which represents the characteristics of the population as closely as possible → a representative sample.
A sample can be
random A sample is drawn in such a way that each element of the population has a chance of being selected. If all samples of the same size selected from a population have the same chance of being selected, we call it simple random sampling. Such a sample is called a simple random sample.
non-random The elements of the sample are not selected randomly but with a view to obtaining a representative sample.
3.2 Independent Random Variables
We measure some characteristic (variable) xi (i = 1, 2, . . . , n) in a given random sample – we obtain data. We can consider each value of the characteristic as a possible value of a random variable Xi. Every random variable Xi (i = 1, . . . , n) has the same distribution.
Definition 3.1. The random sample of size n is a sequence of independent random variables X1, X2, . . . , Xn with the same distribution.
The random sample can be considered as a vector X = (X1, X2, . . . , Xn). We denote the measured data x1, x2, . . . , xn; they are called measurements or (empirical) data.
If X1, X2, . . . , Xn is the random sample (i.i.d. – independent identically distributed random variables) then the distribution function F(x) of the random sample is
\[
F(\mathbf{x}) = F(x_1)F(x_2)\cdots F(x_n), \quad x_i \in \mathbb{R}.
\]
Example 3.1. Let X = (X1, X2, . . . , Xn) be a random sample from a uniform distribution on the interval (0, 1). Find the distribution function F(x) of the random sample.
Solution. Xi ∼ R(0, 1), thus F(xi) = xi for 0 < xi < 1, and
\[
F(\mathbf{x}) = F(x_1)F(x_2)\cdots F(x_n) = x_1 \cdot x_2 \cdots x_n.
\]
If X1, X2, . . . , Xn is the random sample (i.i.d. random variables) then the probability function p(x) of the random sample is
\[
p(\mathbf{x}) = p(x_1)p(x_2)\cdots p(x_n), \quad x_i \in \mathbb{R}.
\]
Example 3.2. Let X = (X1, X2, . . . , Xn) be a random sample from a Poisson distribution with a parameter λ. Find the probability function p(x) of the random sample.
Solution. Xi ∼ Po(λ), thus $p(x_i) = \frac{\lambda^{x_i}}{x_i!} e^{-\lambda}$ for xi = 0, 1, 2, . . . , i = 1, 2, . . . , n, and
\[
p(\mathbf{x}) = \frac{\lambda^{x_1}}{x_1!} e^{-\lambda} \cdots \frac{\lambda^{x_n}}{x_n!} e^{-\lambda} = \frac{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}}{x_1! \cdot x_2! \cdots x_n!}.
\]
If X1, X2, . . . , Xn is the random sample (i.i.d. random variables) then the probability density function f(x) of the random sample from a distribution with the probability density function f(x) is
\[
f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n) = f(x_1)f(x_2)\cdots f(x_n).
\]
Example 3.3. Let X = (X1, X2, . . . , Xn) be a random sample from a normal distribution N(µ, σ²). Find the probability density function f(x) of the random sample.
Solution. Xi ∼ N(µ, σ²), thus $f(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ for xi ∈ ℝ, i = 1, 2, . . . , n, and
\[
f(\mathbf{x}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \frac{1}{(2\pi)^{n/2}\,\sigma^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}.
\]
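The product form of the joint density is exactly what one evaluates when computing a likelihood. A small sketch, assuming NumPy and SciPy; µ = 5, σ = 2 and the sample size are arbitrary illustrative choices:

```python
import numpy as np
from scipy import stats

# The joint density of an i.i.d. normal sample is the product of the
# marginal densities (computed on the log scale for numerical safety).
rng = np.random.default_rng(3)
mu, sigma = 5.0, 2.0
x = rng.normal(mu, sigma, size=10)        # one observed random sample

log_f = np.sum(stats.norm.logpdf(x, mu, sigma))   # log f(x) = sum of log f(x_i)
print(f"joint density f(x) = {np.exp(log_f):.3e}")
```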
Definition 3.2. A function of the random variables X1, X2, . . . , Xn,
\[
T = T(X_1, X_2, \ldots, X_n) = T(\mathbf{X}),
\]
is called a statistic.
Examples of statistics:

Sample sum
\[
M = \sum_{i=1}^{n} X_i
\]

Sample mean
\[
\bar X = \frac{1}{n}\sum_{i=1}^{n} X_i
\]

Sample variance
\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2
\]

Sample standard deviation
\[
S = \sqrt{S^2}
\]

Sample (moment) variance
\[
S_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 = \frac{n-1}{n}\, S^2
\]

Sample rth moment
\[
M_r' = \frac{1}{n}\sum_{i=1}^{n} X_i^r
\]

Sample rth central moment
\[
M_r = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^r
\]

Sample skewness
\[
A_3 = \frac{M_3}{M_2^{3/2}}
\]

Sample kurtosis
\[
A_4 = \frac{M_4}{M_2^2} - 3
\]
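These statistics are straightforward to compute from data; a sketch assuming NumPy, with arbitrary data values:

```python
import numpy as np

# Computing the sample statistics defined above for some data.
x = np.array([2.1, 1.8, 2.5, 2.0, 2.3, 1.9])

m = x.sum()                              # sample sum M
xbar = x.mean()                          # sample mean
s2 = x.var(ddof=1)                       # sample variance S^2 (divisor n - 1)
sn2 = x.var(ddof=0)                      # sample (moment) variance S_n^2 (divisor n)
m2 = np.mean((x - xbar) ** 2)            # 2nd central moment
m3 = np.mean((x - xbar) ** 3)            # 3rd central moment
m4 = np.mean((x - xbar) ** 4)            # 4th central moment
a3 = m3 / m2 ** 1.5                      # sample skewness A_3
a4 = m4 / m2 ** 2 - 3                    # sample kurtosis A_4
print(m, xbar, s2, sn2, a3, a4)
```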
Let X1, X2, . . . , Xn be a random sample from a distribution with the expected value (the mean) µ and the variance σ² (E(Xi) = µ, D(Xi) = σ² for i = 1, 2, . . . , n). The expected value and the variance of the sample sum are
\[
E(M) = E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E(X_i) = n\mu,
\qquad
D(M) = D\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} D(X_i) = n\sigma^2.
\]
Theorem 3.1. If X1, X2, . . . , Xn is a random sample from a normal distribution N(µ, σ²), then the sample sum also has a normal distribution,
\[
M \sim N(n\mu, n\sigma^2).
\]
Let X1, X2, . . . , Xn be a random sample from a distribution with the expected value (the mean) µ and the variance σ². The expected value and the variance of the sample mean are
\[
E(\bar X) = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu,
\]
\[
D(\bar X) = D\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n^2}\sum_{i=1}^{n} D(X_i) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}.
\]
Theorem 3.2. If X1, X2, . . . , Xn is a random sample from a normal distribution N(µ, σ²), then the sample mean also has a normal distribution,
\[
\bar X \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).
\]
The standardized random variable
\[
Z = \frac{\bar X - \mu}{\sigma}\sqrt{n}
\]
has the standard normal distribution N(0, 1).
If X1, X2, . . . , Xn is a random sample from a distribution with the mean µ and variance σ², then the random variable
\[
Z = \frac{\bar X - \mu}{\sigma}\sqrt{n}
\]
has for n ≥ 30 approximately the standard normal distribution N(0, 1) – see the central limit theorem.
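A simulation sketch of Theorem 3.2, assuming NumPy; µ = 10, σ = 3 and n = 25 are arbitrary choices:

```python
import numpy as np

# Means of samples of size n from N(mu, sigma^2) should have
# mean mu and variance sigma^2 / n.
rng = np.random.default_rng(4)
mu, sigma, n, reps = 10.0, 3.0, 25, 50_000
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(f"mean of Xbar: {means.mean():.3f} (theory: {mu})")
print(f"variance of Xbar: {means.var():.3f} (theory: {sigma**2 / n})")
```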
To derive the expected value of the sample variance we need the following formulas:
\[
S_n^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar X^2,
\]
\[
D(X_i) = E(X_i^2) - E(X_i)^2 \;\Rightarrow\; E(X_i^2) = D(X_i) + E(X_i)^2 = \sigma^2 + \mu^2,
\]
\[
D(\bar X) = E(\bar X^2) - E(\bar X)^2 \;\Rightarrow\; E(\bar X^2) = D(\bar X) + E(\bar X)^2 = \frac{\sigma^2}{n} + \mu^2.
\]
Then
\[
E(S_n^2) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar X^2\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i^2) - E(\bar X^2)
= \frac{1}{n}\, n(\sigma^2 + \mu^2) - \left(\frac{\sigma^2}{n} + \mu^2\right) = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2
\]
and
\[
E(S^2) = E\left(\frac{n}{n-1}\, S_n^2\right) = \frac{n}{n-1} \cdot \frac{n-1}{n}\,\sigma^2 = \sigma^2.
\]
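The bias of Sₙ² and the unbiasedness of S² show up clearly in simulation; a sketch assuming NumPy, with σ² = 4 and n = 5 as arbitrary choices:

```python
import numpy as np

# E(S_n^2) should be (n-1)/n * sigma^2 (biased), E(S^2) should be sigma^2.
rng = np.random.default_rng(5)
sigma2, n, reps = 4.0, 5, 200_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

sn2 = samples.var(axis=1, ddof=0)    # moment variance, divisor n
s2 = samples.var(axis=1, ddof=1)     # sample variance, divisor n - 1
print(f"E(S_n^2) ~ {sn2.mean():.3f} (theory: {(n - 1) / n * sigma2})")
print(f"E(S^2)   ~ {s2.mean():.3f} (theory: {sigma2})")
```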
Theorem 3.3. Let X1, X2, . . . , Xn be a random sample from a normal distribution with the mean µ and the variance σ². The random variable
\[
\chi^2 = \frac{n-1}{\sigma^2}\, S^2
\]
has the χ²-distribution with n − 1 degrees of freedom.
Let us assume a random sample from a normal distribution with the mean µ and the variance σ². We know that $Z = \frac{\bar X - \mu}{\sigma}\sqrt{n} \sim N(0, 1)$ and $\chi^2 = \frac{n-1}{\sigma^2} S^2 \sim \chi^2(n-1)$. The random variable
\[
T = \frac{Z}{\sqrt{\dfrac{\chi^2}{n-1}}}
= \frac{\bar X - \mu}{\sigma}\sqrt{n} \cdot \frac{1}{\sqrt{\dfrac{n-1}{\sigma^2}\, S^2 \cdot \dfrac{1}{n-1}}}
= \frac{\bar X - \mu}{\sigma}\sqrt{n} \cdot \frac{\sigma}{S}
= \frac{\bar X - \mu}{S}\sqrt{n}
\]
has a Student t-distribution with n − 1 degrees of freedom.
Theorem 3.4. Let us have a random sample from a normal distribution with the mean µ and the variance σ². The random variable
\[
T = \frac{\bar X - \mu}{S}\sqrt{n}
\]
has a Student t-distribution with n − 1 degrees of freedom.
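A simulation sketch of Theorem 3.4, assuming NumPy and SciPy; the sample size n = 8 is an arbitrary choice:

```python
import numpy as np
from scipy import stats

# T = (Xbar - mu)/S * sqrt(n) should follow t(n - 1) for normal samples.
rng = np.random.default_rng(6)
mu, sigma, n, reps = 0.0, 1.0, 8, 100_000
samples = rng.normal(mu, sigma, size=(reps, n))
t = (samples.mean(axis=1) - mu) / samples.std(axis=1, ddof=1) * np.sqrt(n)

q = 1.5
print(f"P(T <= {q}): empirical={np.mean(t <= q):.3f}, "
      f"t({n - 1}) cdf={stats.t.cdf(q, df=n - 1):.3f}")
```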
Let us assume that the distribution in a population can be described as the distribution of a Bernoulli random variable. A random sample then contains only ones and zeros. The random variable X = X1 + X2 + · · · + Xn denotes the number of ones (the so-called sample frequency). The ratio
\[
P = \frac{X}{n}
\]
is called the sample relative frequency or the sample proportion.
Let us assume that n is big enough. The random variable P = X/n has approximately a normal distribution with the mean π and the standard deviation $\sqrt{\pi(1-\pi)/n}$ – see Theorem 1.4 (the central limit theorem for the proportion). The standardized random variable
\[
Z = \frac{P - \pi}{\sqrt{\pi(1-\pi)/n}}
\]
has for large n approximately the standard normal distribution N(0, 1). The approximation can be used if nπ ≥ 5 and n(1 − π) ≥ 5.
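A closing sketch ties this back to Example 1.2 (a simulation assuming NumPy and SciPy; the parameters n = 2600 and π = 0.48 come from that example, and P > 0.5 is the event "the opposition wins in the survey"):

```python
import numpy as np
from scipy import stats

# The sample proportion P = X/n is approximately normal with mean pi
# and standard deviation sqrt(pi*(1-pi)/n); here n*pi and n*(1-pi)
# are both far above 5, so the approximation applies.
rng = np.random.default_rng(7)
n, pi = 2600, 0.48
p = rng.binomial(n, pi, size=200_000) / n

se = np.sqrt(pi * (1 - pi) / n)
print(f"mean of P: {p.mean():.4f} (theory: {pi})")
print(f"std of P:  {p.std():.5f} (theory: {se:.5f})")
print(f"P(P > 0.5): empirical={np.mean(p > 0.5):.4f}, "
      f"normal approx={1 - stats.norm.cdf((0.5 - pi) / se):.4f}")
```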
4 Exercises
1. The university office is preparing a student opinion survey on the quality of teaching. The total number of students is 1850 and the sample size is assumed to be approximately 50. Design the structure of the sample.
2. In January, the traffic police conducted an extensive operation in which it was examined whether vehicles had winter tires. Of all passing cars, every tenth vehicle was subjected to the inspection. A total of 1463 vehicles were inspected and only 97 vehicles did not have winter tires.
(a) Specify the population and the random sample.
(b) What is the proportion of vehicles which did not have the right tires?
(c) What kind of random sampling was performed?
(d) What is the probability that a randomly selected vehicle will be inspected?