Law of Large Numbers, Sample Distribution

Contents

1 The Law of Large Numbers, Limit Theorems
2 Exercises
3 Sample Distribution
  3.1 Survey Sampling
  3.2 Independent Random Variables
4 Exercises

1 The Law of Large Numbers, Limit Theorems

If we repeat an experiment independently, we can build a distribution of relative frequencies from the observed values and calculate some of its measures (mean, median, variance, ...). This distribution and its measures are called the sample distribution and the sample measures. Under particular conditions we can expect the sample distribution (measures) to converge towards a theoretical distribution (measures); the more repetitions of the experiment, the better the convergence. Notice that the convergence of the sample values towards the theoretical ones is not convergence in the ordinary mathematical sense, but convergence in probability: as the number of experiments increases, the probability of a deviation between the sample values and the theoretical values decreases.

Definition 1.1. If the sequence of random variables X1, X2, ..., Xn, ... fulfils

    lim_{n→∞} P(|Xn − c| < ε) = 1  for every ε > 0,

it is said that the sequence {Xn} converges in probability to the constant c; we write Xn →(P) c.

Theorem 1.1 (Chebyshev's Inequality). For any random variable X with mean E(X) and finite variance D(X), and for every ε > 0, we have

    P(|X − E(X)| < ε) ≥ 1 − D(X)/ε².

Chebyshev's inequality is useful first of all in the theoretical field: it allows us to estimate probabilities for random variables with an unknown distribution.

Theorem 1.2 (Bernoulli's Theorem). If the random variable X denotes the number of occurrences of an event in a sequence of n independent experiments, where π is the probability of occurrence of the event in one experiment, then for every ε > 0

    lim_{n→∞} P(|X/n − π| < ε) = 1.
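Chebyshev's inequality can be checked numerically against an exact binomial computation. The following sketch compares the bound with the true probability; the choice X ∼ B(100, 0.5) and ε = 10 is an illustrative assumption, not taken from the text.

```python
# Chebyshev's bound versus the exact probability, a minimal sketch.
# X ~ B(100, 0.5) is an assumed illustrative choice:
# E(X) = n*p = 50, D(X) = n*p*(1-p) = 25.
from math import comb

n, p = 100, 0.5
mean, var = n * p, n * p * (1 - p)
eps = 10

# Chebyshev: P(|X - E(X)| < eps) >= 1 - D(X)/eps^2
chebyshev_bound = 1 - var / eps**2

# Exact probability from the binomial pmf: |X - 50| < 10  <=>  41 <= X <= 59
pmf = lambda k: comb(n, k) * p**k * (1 - p) ** (n - k)
exact = sum(pmf(k) for k in range(41, 60))

print(f"Chebyshev bound: {chebyshev_bound:.3f}")  # 0.750
print(f"Exact probability: {exact:.3f}")
```

As the text notes, the bound is conservative: the exact probability is well above 1 − D(X)/ε².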
Theorem 1.3 (de Moivre–Laplace Theorem). Let X be a random variable with binomial distribution X ∼ B(n, π).¹ For the standardized random variable

    U = (X − nπ) / √(nπ(1 − π))

we have

    lim_{n→∞} P(U ≤ u) = Φ(u),

where Φ(u) is the distribution function of the standard normal distribution N(0, 1).

The de Moivre–Laplace theorem says that for n → ∞ the binomial distribution converges to the normal distribution. The approximation is acceptable if

    nπ(1 − π) > 9  and  1/(n + 1) < π < n/(n + 1).

¹ X = X1 + X2 + ... + Xn, where Xi, i = 1, ..., n, are independent Bernoulli random variables with E(Xi) = π and D(Xi) = π(1 − π), which means E(X) = nπ and D(X) = nπ(1 − π).

Theorem 1.4 (de Moivre–Laplace Theorem for a Proportion). Let X be a random variable with binomial distribution X ∼ B(n, π). The random variable X/n has the mean E(X/n) = π and the variance D(X/n) = π(1 − π)/n. For the standardized random variable

    U = (X/n − π) √n / √(π(1 − π))

we have

    lim_{n→∞} P(U ≤ u) = Φ(u),

where Φ(u) is the distribution function of the standard normal distribution N(0, 1).

Theorem 1.5 (Lévy–Lindeberg Theorem). Let the random variable be X = X1 + X2 + ... + Xn, where Xi, i = 1, ..., n, are independent random variables with the same distribution, with mean E(Xi) = µ and finite variance D(Xi) = σ², so that E(X) = nµ and D(X) = nσ². For the standardized random variable

    U = (X − nµ) / √(nσ²)

we have

    lim_{n→∞} P(U ≤ u) = Φ(u),

where Φ(u) is the distribution function of the standard normal distribution N(0, 1).

Theorem 1.6 (Lévy–Lindeberg Theorem for the Mean). Let the random variable X̄ be the mean of n independent random variables X1, X2, ..., Xn with the same distribution, with mean E(Xi) = µ and finite variance D(Xi) = σ², i = 1, ..., n. Then E(X̄) = µ and D(X̄) = σ²/n, and for the standardized random variable

    U = (X̄ − µ) √n / σ

we have

    lim_{n→∞} P(U ≤ u) = Φ(u),

where Φ(u) is the distribution function of the standard normal distribution N(0, 1).
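The de Moivre–Laplace approximation can be illustrated numerically. The sketch below compares the exact binomial distribution function with Φ((x − nπ)/√(nπ(1 − π))), with and without the continuity correction discussed below; the parameters n = 50, π = 0.4 are assumed for illustration (note nπ(1 − π) = 12 > 9, so the approximation condition holds).

```python
# A minimal sketch of the de Moivre-Laplace approximation: the binomial
# distribution function F(x) compared with Phi((x - n*pi)/sqrt(n*pi*(1-pi))).
from math import comb, erf, sqrt

def phi(u):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1 + erf(u / sqrt(2)))

n, pi_ = 50, 0.4
mean, sd = n * pi_, sqrt(n * pi_ * (1 - pi_))

x = 25
# Exact F(25) = P(X <= 25) from the binomial pmf
exact = sum(comb(n, k) * pi_**k * (1 - pi_) ** (n - k) for k in range(x + 1))
approx = phi((x - mean) / sd)           # without continuity correction
approx_cc = phi((x + 0.5 - mean) / sd)  # with continuity correction

print(f"exact F(25)     = {exact:.4f}")
print(f"normal approx   = {approx:.4f}")
print(f"with correction = {approx_cc:.4f}")
```

Running this shows the corrected value lying closer to the exact probability, in line with the remark on continuity correction below.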
For M = X1 + ... + Xn:

• M = Σ_{i=1}^{n} Xi ∼ as. N(nµ, nσ²), E(M) = nµ, D(M) = nσ²
• U = (M − E(M)) / √D(M) = (M − nµ) / √(nσ²) ∼ as. N(0, 1)
• P(M ≤ m) = F(m) ≈ Φ((m − nµ)/√(nσ²))
• P(−u_{1−α/2} < (m − nµ)/√(nσ²) < u_{1−α/2}) = 1 − α

For the sample mean X̄:

• X̄ = (1/n) Σ_{i=1}^{n} Xi ∼ as. N(µ, σ²/n), E(X̄) = µ, D(X̄) = σ²/n
• U = (X̄ − E(X̄)) / √D(X̄) = (X̄ − µ)√n/σ ∼ as. N(0, 1)
• P(X̄ ≤ x) = F(x) ≈ Φ((x − µ)√n/σ)
• P(−u_{1−α/2} < (x − µ)√n/σ < u_{1−α/2}) = 1 − α

In the case of using the normal distribution as an approximation of the distribution of a discrete random variable, it is recommended to apply the so-called continuity correction, which improves the approximation. If we calculate P(X ≤ x) or P(X ≥ x) by the normal approximation, we get underestimated results; on the contrary, if we calculate P(X < x) or P(X > x) by the normal approximation, we get overestimated results. Some examples of the continuity correction:

    before correction    after correction
    x < 3                x < 2.5
    x ≤ 3                x < 3.5
    x = 5                4.5 < x < 5.5
    x ≥ 7                x > 6.5
    x > 7                x > 7.5

Example 1.1. The probability that you hit a target is 0.8. What is the probability that the difference between the number of hits in a sequence of 200 shots and the mean of this number will not be larger than 10?

Solution. Using the binomial distribution:

    E(X) = nπ = 200 · 0.8 = 160,  D(X) = nπ(1 − π) = 200 · 0.8 · (1 − 0.8) = 32,

    P(150 ≤ X ≤ 170) = p(150) + p(151) + ... + p(170)
        = C(200,150) · 0.8^150 · 0.2^50 + C(200,151) · 0.8^151 · 0.2^49 + ... + C(200,170) · 0.8^170 · 0.2^30
        = 0.937;

using the de Moivre–Laplace theorem:

    F(x) ≈ Φ((x − nπ) / √(nπ(1 − π)))

    P(150 ≤ X ≤ 170) = F(170) − F(149) ≈ Φ((170 − 160)/√32) − Φ((149 − 160)/√32) = 0.936;

using the de Moivre–Laplace theorem (with continuity correction):
    P(150 ≤ X ≤ 170) ≈ P(149.5 < X < 170.5) = F(170.5) − F(149.5)
        = Φ((170.5 − 160)/√32) − Φ((149.5 − 160)/√32) = 0.937;

using Chebyshev's inequality:

    P(|X − E(X)| < ε) ≥ 1 − D(X)/ε²

    E(X) = nπ = 200 · 0.8 = 160,  D(X) = nπ(1 − π) = 200 · 0.8 · (1 − 0.8) = 32,

    P(|X − 160| < 10) ≥ 1 − 32/10² = 0.68,
    P(|X − 160| < 11) ≥ 1 − 32/11² = 0.736.

Example 1.2. In some elections the coalition obtained 52% of the votes. What is the probability that in a public opinion survey of 2600 respondents the opposition would win?

Solution. Let X be the number of respondents who voted for the opposition, X ∼ B(2600, 0.48),

    E(X) = nπ = 2600 · 0.48 = 1248,  D(X) = nπ(1 − π) = 2600 · 0.48 · (1 − 0.48) = 648.96;

using the binomial distribution:

    P(X > 1300) = 1 − P(X ≤ 1300) = 1 − [p(0) + ... + p(1300)]
        = 1 − [C(2600,0) · 0.48^0 · 0.52^2600 + ... + C(2600,1300) · 0.48^1300 · 0.52^1300]
        = 1 − 0.98031 = 0.01969;

using the de Moivre–Laplace theorem:

    F(x) ≈ Φ((x − nπ) / √(nπ(1 − π)))

    P(X > 1300) = 1 − P(X ≤ 1300) = 1 − F(1300) ≈ 1 − Φ((1300 − 1248)/√648.96)
        = 1 − 0.97939 = 0.02061;

using the de Moivre–Laplace theorem (with continuity correction):

    P(X > 1300) = 1 − P(X ≤ 1300) ≈ 1 − P(X < 1300.5) = 1 − Φ((1300.5 − 1248)/√648.96)
        = 1 − 0.98034 = 0.01966.

2 Exercises

1. By long-term observation it was found that the time needed to locate and eliminate a machine fault has a mean value of 30 minutes and a standard deviation of 12 minutes. Determine

   (a) the time which is sufficient for servicing 40 machines with a probability of 0.95,
   (b) the probability that the average time for 40 machines does not exceed 32 minutes.

2. Fifteen carp were fished from a breeding pond and, after their weights were recorded, released back. Based on the measured weights, a mean value of 2.2 kg and a standard deviation of 0.6 kg were estimated. Suppose that the weight of the carp follows a normal distribution. The pond was stocked with 1500 carp and the mortality is 10%.
   What is the probability that

   (a) a randomly caught carp will weigh less than 2 kg,
   (b) we get at least 3000 kg of carp from the whole pond?

3 Sample Distribution

3.1 Survey Sampling

A survey can be

• entire, total, complete → a census
• incomplete → a sample survey

We would like to get a sample which represents the characteristics of the population as closely as possible → a representative sample. A sample can be

random – The sample is drawn in such a way that each element of the population has a chance of being selected. If all samples of the same size selected from the population have the same chance of being selected, we call the procedure simple random sampling, and such a sample is called a simple random sample.

non-random – The elements of the sample are not selected randomly but with a view to obtaining a representative sample.

3.2 Independent Random Variables

We measure some characteristic (variable) xi (i = 1, 2, ..., n) on a given random sample and obtain data. We can consider each value of the characteristic as a possible value of a random variable Xi, and every random variable Xi (i = 1, ..., n) has the same distribution.

Definition 3.1. A random sample of size n is a sequence of independent random variables X1, X2, ..., Xn with the same distribution. The random sample can be considered as a vector X = (X1, X2, ..., Xn). We denote the measured data x1, x2, ..., xn; they are called measurements or (empirical) data.

If X1, X2, ..., Xn is a random sample (i.i.d. – independent identically distributed random variables), then the distribution function F(x) of the random sample is

    F(x) = F(x1) F(x2) ··· F(xn),  xi ∈ R.

Example 3.1. Let X = (X1, X2, ..., Xn) be a random sample from the uniform distribution on the interval (0, 1). Find the distribution function F(x) of the random sample.

Solution. Xi ∼ R(0, 1), thus F(xi) = xi for 0 < xi < 1, and

    F(x) = F(x1) F(x2) ··· F(xn) = x1 · x2 ··· xn.
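The product formula of Example 3.1 can be verified by simulation: the probability that every coordinate of an independent R(0, 1) sample falls below its threshold should match the product of the thresholds. The evaluation point, sample count, and seed below are arbitrary illustrative choices.

```python
# A minimal sketch of Example 3.1: for an i.i.d. sample from R(0, 1), the joint
# distribution function is the product F(x) = x1 * x2 * ... * xn. We check the
# formula by Monte Carlo simulation.
import random
from math import prod

random.seed(1)

point = (0.3, 0.8, 0.5)   # an assumed evaluation point (x1, x2, x3)
exact = prod(point)        # F(x) = x1 * x2 * x3 = 0.12

trials = 200_000
hits = 0
for _ in range(trials):
    sample = [random.random() for _ in point]   # X1, X2, X3 ~ R(0, 1), independent
    if all(s <= x for s, x in zip(sample, point)):
        hits += 1
estimate = hits / trials

print(f"product formula: {exact:.4f}, simulation: {estimate:.4f}")
```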
If X1, X2, ..., Xn is a random sample (i.i.d. random variables), then the probability function p(x) of the random sample is

    p(x) = p(x1) p(x2) ··· p(xn),  xi ∈ R.

Example 3.2. Let X = (X1, X2, ..., Xn) be a random sample from a Poisson distribution with parameter λ. Find the probability function p(x) of the random sample.

Solution. Xi ∼ Po(λ), thus p(xi) = (λ^{xi} / xi!) e^{−λ} for xi = 0, 1, 2, ..., i = 1, 2, ..., n, and

    p(x) = (λ^{x1} / x1!) e^{−λ} ··· (λ^{xn} / xn!) e^{−λ} = (1 / (x1! · x2! ··· xn!)) λ^{Σ_{i=1}^{n} xi} e^{−nλ}.

If X1, X2, ..., Xn is a random sample (i.i.d. random variables) from a distribution with the probability density function f(x), then the probability density function of the random sample is

    f(x) = f(x1, x2, ..., xn) = f(x1) f(x2) ··· f(xn).

Example 3.3. Let X = (X1, X2, ..., Xn) be a random sample from the normal distribution N(µ, σ²). Find the probability density function f(x) of the random sample.

Solution. Xi ∼ N(µ, σ²), thus f(xi) = (1 / (√(2π) σ)) e^{−(xi − µ)² / (2σ²)} for xi ∈ R, i = 1, 2, ..., n, and

    f(x) = Π_{i=1}^{n} (1 / (√(2π) σ)) e^{−(xi − µ)² / (2σ²)} = (1 / ((2π)^{n/2} σ^n)) e^{−(1/(2σ²)) Σ_{i=1}^{n} (xi − µ)²}.

Definition 3.2. A function of the random variables X1, X2, ..., Xn is called a statistic,

    T = T(X1, X2, ..., Xn) = T(X).

Examples of statistics:

Sample sum
    M = Σ_{i=1}^{n} Xi

Sample mean
    X̄ = (1/n) Σ_{i=1}^{n} Xi

Sample variance
    S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)²

Sample standard deviation
    S = √(S²)

Sample (moment) variance
    Sn² = (1/n) Σ_{i=1}^{n} (Xi − X̄)² = ((n − 1)/n) S²

Sample rth moment
    M'_r = (1/n) Σ_{i=1}^{n} Xi^r

Sample rth central moment
    M_r = (1/n) Σ_{i=1}^{n} (Xi − X̄)^r

Sample skewness
    A3 = M3 / M2^{3/2}

Sample kurtosis
    A4 = M4 / M2² − 3

Let X1, X2, ..., Xn be a random sample from a distribution with expected value (mean) µ and variance σ² (E(Xi) = µ, D(Xi) = σ² for i = 1, 2, ..., n). The expected value and the variance of the sample sum are

    E(M) = E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E(Xi) = nµ,
    D(M) = D[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} D(Xi) = nσ².
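The statistics defined above can be computed directly from data. The following sketch evaluates the sample sum, mean, variance, moment variance, skewness, and kurtosis on a small illustrative data set (the numbers are arbitrary, not from the text), and confirms the relation Sn² = ((n − 1)/n) S².

```python
# The sample statistics from Definition 3.2, computed on assumed example data.
from math import sqrt

x = [2.1, 1.8, 2.5, 2.2, 1.9, 2.4, 2.0, 2.3]
n = len(x)

m = sum(x)                                          # sample sum M
xbar = m / n                                        # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance S^2
s = sqrt(s2)                                        # sample standard deviation
sn2 = sum((xi - xbar) ** 2 for xi in x) / n         # sample (moment) variance Sn^2

def central_moment(r):
    """Sample r-th central moment M_r."""
    return sum((xi - xbar) ** r for xi in x) / n

a3 = central_moment(3) / central_moment(2) ** 1.5    # sample skewness A3
a4 = central_moment(4) / central_moment(2) ** 2 - 3  # sample kurtosis A4

print(f"mean {xbar:.3f}, S^2 {s2:.4f}, Sn^2 {sn2:.4f}, A3 {a3:.3f}, A4 {a4:.3f}")
```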
Theorem 3.1. If X1, X2, ..., Xn is a random sample from the normal distribution N(µ, σ²), then the sample sum also has a normal distribution, M ∼ N(nµ, nσ²).

Let X1, X2, ..., Xn be a random sample from a distribution with expected value (mean) µ and variance σ². The expected value and the variance of the sample mean are

    E(X̄) = E[(1/n) Σ_{i=1}^{n} Xi] = (1/n) Σ_{i=1}^{n} E(Xi) = (1/n) nµ = µ,
    D(X̄) = D[(1/n) Σ_{i=1}^{n} Xi] = (1/n²) Σ_{i=1}^{n} D(Xi) = (1/n²) nσ² = σ²/n.

Theorem 3.2. If X1, X2, ..., Xn is a random sample from the normal distribution N(µ, σ²), then the sample mean also has a normal distribution,

    X̄ ∼ N(µ, σ²/n).

The standardized random variable

    Z = (X̄ − µ) √n / σ

has the standard normal distribution N(0, 1).

If X1, X2, ..., Xn is a random sample from a distribution with mean µ and variance σ², then the random variable

    Z = (X̄ − µ) √n / σ

has for n ≥ 30 approximately the standard normal distribution N(0, 1) – see the central limit theorem.

To derive the expected value of the sample variance we need the following formulas:

    Sn² = (1/n) Σ_{i=1}^{n} (Xi − X̄)² = (1/n) Σ_{i=1}^{n} Xi² − X̄²,
    D(Xi) = E(Xi²) − E(Xi)²  →  E(Xi²) = D(Xi) + E(Xi)² = σ² + µ²,
    D(X̄) = E(X̄²) − E(X̄)²  →  E(X̄²) = D(X̄) + E(X̄)² = σ²/n + µ².

Then

    E(Sn²) = E[(1/n) Σ_{i=1}^{n} Xi² − X̄²] = (1/n) Σ_{i=1}^{n} E(Xi²) − E(X̄²)
           = (1/n) n(σ² + µ²) − (σ²/n + µ²) = σ² − σ²/n = ((n − 1)/n) σ²,

    E(S²) = E[(n/(n − 1)) Sn²] = (n/(n − 1)) · ((n − 1)/n) σ² = σ².

Theorem 3.3. Let X1, X2, ..., Xn be a random sample from a normal distribution with mean µ and variance σ². The random variable

    χ² = ((n − 1)/σ²) S²

has the χ²-distribution with n − 1 degrees of freedom.

Let us assume a random sample from a normal distribution with mean µ and variance σ². We know that Z = (X̄ − µ)√n/σ ∼ N(0, 1) and χ² = ((n − 1)/σ²) S² ∼ χ²(n − 1). The random variable

    T = Z / √(χ²/(n − 1)) = ((X̄ − µ)√n/σ) · √(n − 1) / √(((n − 1)/σ²) S²) = ((X̄ − µ)√n/σ) · (σ/S) = (X̄ − µ)√n / S

has the Student t-distribution with n − 1 degrees of freedom.
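The derivation above says that Sn² underestimates σ² by the factor (n − 1)/n while S² is unbiased. This can be checked by a Monte Carlo sketch; the parameters, repetition count, and seed are arbitrary illustrative choices.

```python
# Monte Carlo check of E(Sn^2) = (n-1)/n * sigma^2 and E(S^2) = sigma^2
# for normal samples of an assumed size n = 5.
import random

random.seed(2)
mu, sigma, n = 10.0, 2.0, 5
reps = 100_000

total_sn2 = 0.0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    total_sn2 += sum((x - xbar) ** 2 for x in xs) / n   # one realization of Sn^2

mean_sn2 = total_sn2 / reps       # estimates E(Sn^2) = (n-1)/n * sigma^2 = 3.2
mean_s2 = mean_sn2 * n / (n - 1)  # estimates E(S^2)  = sigma^2 = 4.0

print(f"E(Sn^2) ~ {mean_sn2:.3f} (theory {(n - 1) / n * sigma**2:.3f})")
print(f"E(S^2)  ~ {mean_s2:.3f} (theory {sigma**2:.3f})")
```

The simulated averages land close to the theoretical values 3.2 and 4.0, matching the algebraic derivation.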
Theorem 3.4. Let us have a random sample from a normal distribution with mean µ and variance σ². The random variable

    T = (X̄ − µ) √n / S

has the Student t-distribution with n − 1 degrees of freedom.

Let us assume that the distribution in a population can be described as the distribution of a Bernoulli random variable, so a random sample contains only ones and zeros. The random variable X = X1 + X2 + ... + Xn denotes the number of ones (the so-called sample frequency). The ratio

    P = X/n

is called the sample relative frequency or the sample proportion.

Let us assume that n is big enough. The random variable P = X/n has approximately a normal distribution with mean π and standard deviation √(π(1 − π)/n) – see the de Moivre–Laplace theorem for a proportion (Theorem 1.4). The standardized random variable

    Z = (P − π) / √(π(1 − π)/n)

has for large n approximately the normal distribution N(0, 1). The approximation can be used if nπ ≥ 5 and n(1 − π) ≥ 5.

4 Exercises

1. The university office is preparing a student opinion survey on the quality of teaching. The total number of students is 1850 and the sample size is assumed to be approximately 50. Design the structure of the sample.

2. In January, the traffic police conducted an extensive operation in which it was examined whether vehicles had winter tires. Of all passing cars, every tenth vehicle was subjected to the control. A total of 1463 vehicles were inspected, and only 97 vehicles did not have winter tires.

   (a) Specify the population and the random sample.
   (b) What is the proportion of vehicles which did not have the right tires?
   (c) What kind of random sampling was performed?
   (d) What is the probability that a randomly selected vehicle will be controlled?
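The sample-proportion approximation of Section 3 can be sketched numerically. The values π = 0.1 and n = 400 below are assumed for illustration (they are not taken from the exercises); the sketch checks the conditions nπ ≥ 5 and n(1 − π) ≥ 5 and evaluates an approximate probability for P with the Φ function.

```python
# A minimal sketch of the normal approximation of the sample proportion:
# P = X/n is approximately N(pi, pi*(1-pi)/n) for large n.
from math import erf, sqrt

def phi(u):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1 + erf(u / sqrt(2)))

pi_, n = 0.1, 400                           # assumed illustrative values
assert n * pi_ >= 5 and n * (1 - pi_) >= 5  # conditions for the approximation

sd = sqrt(pi_ * (1 - pi_) / n)              # standard deviation of P
z = (0.12 - pi_) / sd                       # standardized value for p = 0.12
prob = phi(z)                               # approximate P(P <= 0.12)

print(f"sd of P = {sd:.4f}, P(P <= 0.12) ~ {prob:.4f}")
```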