COM2023 Mathematics Methods for Computing II
Lecture 13 & 14
Gianne Derks
Department of Mathematics (36AA04)
http://www.maths.surrey.ac.uk/Modules/COM2023
Autumn 2010

Use channel 04 on your EVS handset

Overview
Uniform distribution
  The Uniform Distribution
  The Uniform Distribution and Matlab
Normal distribution
  Normal distribution
  Normal distribution: changing µ and σ
  The normal distribution and Matlab
  Spread of the normal distribution
  Samples and normal distribution
  Properties of normal variables
  Examples for combined normal variables
Central limit theorem
  Averages of a normal distribution
  Central limit theorem
  Example Central limit theorem: n = 200
  Example Central limit theorem: n = 400
  Example Central limit theorem: n = 600

Overview

● Distributions for continuous random variables:
  ◆ The uniform distribution
  ◆ The normal distribution
● The central limit theorem

The deadline for the assignment has been extended to Monday 22 November, 2pm.

Uniform distribution (§4.7)

The Uniform Distribution

The random variable X is uniformly distributed over the interval [a, b], in short X ∼ U[a, b], if it has a constant probability density function over the interval [a, b]:

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

[Figure: graph of f(x), constant at height 1/(b−a) between x = a and x = b.]

The expectation is E(X) = (b + a)/2 and the variance is Var(X) = (b − a)²/12.

Example
The random variable X has a uniform distribution over the interval [1, 6], i.e., X ∼ U[1, 6]. What is the probability that X takes a value between 2 and 4? The probability is the area under the graph of f for x between 2 and 4, hence P(2 ≤ X ≤ 4) = (4 − 2) · 1/(6 − 1) = 2/5.

The Uniform Distribution and Matlab

The package Matlab has the uniform distribution built in. For X ∼ U[a, b]:
● unifpdf(x,a,b) gives the pdf f(x).
● unifcdf(x,a,b) gives P(X ≤ x).
● unifrnd(a,b,1,y) gives a random sample of y values in a 1×y matrix.

Examples
● f(4) for X ∼ U[2, 7] is obtained by unifpdf(4,2,7) and is 0.2;
● P(X ≤ 3) for X ∼ U[−1, 9] follows from unifcdf(3,-1,9) and is 0.4;
● P(−1 ≤ X ≤ 3) for X ∼ U[−6, 8] is obtained by unifcdf(3,-6,8)-unifcdf(-1,-6,8) and is 0.2857143;
● P(X > 5) for X ∼ U[4, 9] follows from 1-unifcdf(5,4,9) and is 0.8;
● P(X ≤ 8 | X > 6) for X ∼ U[5, 10] is obtained by using that P(X ≤ 8 | X > 6) = P(6 < X ≤ 8)/P(X > 6), hence in Matlab we find it is 0.5 by (unifcdf(8,5,10)-unifcdf(6,5,10))/(1-unifcdf(6,5,10)).

Normal distribution (§4.8)

Normal distribution

The normal distribution or Gaussian distribution is the most important distribution in statistics. It has two parameters: the location, controlled by µ, and the dispersion, controlled by σ.
The probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right)$$

(you don't have to remember this). A random variable X satisfying a normal distribution is denoted by X ∼ N(µ, σ²). For a random variable X ∼ N(µ, σ²) the expectation and variance are E(X) = µ and Var(X) = σ².

Normal distribution: changing µ and σ

Note the effect of changing µ and σ and note that the graph is symmetric about x = µ.

[Figure: pdf f(x) of the normal distributions N(2, 0.25), N(2, 1) and N(0, 1), plotted against x for −2 ≤ x ≤ 4.]

The normal distribution and Matlab

Matlab has the normal distribution built in. For X ∼ N(µ, σ²):
● normpdf(x,µ,σ) gives the pdf f(x);
● normcdf(x,µ,σ) gives P(X ≤ x);
● normrnd(µ,σ,1,y) gives a random sample of y values in a 1×y matrix.

Note: Matlab uses the standard deviation σ, while the notation N(µ, σ²) uses the variance σ².

Examples for X ∼ N(15, 11):
● P(X ≤ 18) follows from normcdf(18,15,sqrt(11)) and is 0.8171;
● P(X > 15) follows from 1-normcdf(15,15,sqrt(11)) and is 0.5;
● P(18 < X ≤ 25) is 0.1816, which follows from normcdf(25,15,sqrt(11))-normcdf(18,15,sqrt(11));
● P(15 < X < 25) is 0.4987, which follows from normcdf(25,15,sqrt(11))-normcdf(15,15,sqrt(11));
● P(X ≤ 22 | X ≤ 26) follows from P(X ≤ 22 | X ≤ 26) = P(X ≤ 22)/P(X ≤ 26), i.e., normcdf(22,15,sqrt(11))/normcdf(26,15,sqrt(11)) is 0.9830.

Spread of the normal distribution

The area under the pdf of the normal distribution is such that:
● 95% is in the range [µ − 1.96σ, µ + 1.96σ];
● 99% is in the range [µ − 2.58σ, µ + 2.58σ].

To check for N(3, 4), use Matlab:
● normcdf(3+1.96*2,3,2)-normcdf(3-1.96*2,3,2) gives 0.9500042;
● normcdf(3+2.58*2,3,2)-normcdf(3-2.58*2,3,2) gives 0.99012.

Samples and normal distribution

With Matlab, we can get a sample of size n from a normal distribution N(µ, σ²) by using normrnd(µ,σ,1,n) to get a 1×n matrix.
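
As a minimal sketch of how normrnd can be used in practice, the following script draws a sample of size 200 from N(12, 8), prints the sample mean and variance, and overlays a scaled plot of the pdf on a histogram of the sample. It assumes that normrnd and normpdf are available (Statistics Toolbox, or the statistics package in Octave). The variable names and the choice of 15 bins are illustrative assumptions, not part of the lecture material.

```matlab
% Sketch: sample from N(12, 8) and compare with the theoretical density.
% Assumes normrnd and normpdf are available (Statistics Toolbox, or the
% Octave statistics package).
mu = 12;            % mean
sigma = sqrt(8);    % standard deviation: Matlab expects sigma, not sigma^2
n = 200;            % sample size

sample = normrnd(mu, sigma, 1, n);   % 1-by-n row vector

% The sample statistics should be near mu = 12 and sigma^2 = 8.
fprintf('sample mean     = %.4f\n', mean(sample));
fprintf('sample variance = %.4f\n', var(sample));

% Histogram with 15 equal-width bins (the bin count is an arbitrary choice).
[counts, centers] = hist(sample, 15);
binWidth = centers(2) - centers(1);
bar(centers, counts, 1);
hold on;

% Scale the pdf by n * binWidth so its area matches the histogram's area.
x = linspace(min(sample), max(sample), 200);
plot(x, n * binWidth * normpdf(x, mu, sigma), 'r-');
hold off;
xlabel('Sample value');
ylabel('Frequency');
```

The factor n * binWidth is what turns the density into a curve comparable with frequencies: the histogram bars have total area n * binWidth, while the pdf has area 1.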
The figure shows the histogram of a sample of size 200 from the normal distribution N(12, 8), together with a scaled plot of the normal distribution. The sample follows from normrnd(12,sqrt(8),1,200).

[Figure: N(12,8): scaled plot and histogram of a sample of size 200; horizontal axis: sample value, vertical axis: frequency.]

Properties of normal variables

Recall that for any two independent random variables X and Y:
E(aX + bY + c) = aE(X) + bE(Y) + c;
Var(aX + bY + c) = a² Var(X) + b² Var(Y).

For random variables with normal distributions, we have
1. If X ∼ N(µ, σ²), then aX + b ∼ N(aµ + b, a²σ²), as E(aX + b) = aµ + b and Var(aX + b) = a²σ².
2. If X ∼ N(µ_x, σ_x²) and Y ∼ N(µ_y, σ_y²) and X and Y are independent, then aX + bY ∼ N(aµ_x + bµ_y, a²σ_x² + b²σ_y²), as E(aX + bY) = aµ_x + bµ_y and Var(aX + bY) = a²σ_x² + b²σ_y².

Examples for combined normal variables

For independent X ∼ N(µ_x, σ_x²) and Y ∼ N(µ_y, σ_y²), we have aX + bY + c ∼ N(aµ_x + bµ_y + c, a²σ_x² + b²σ_y²).

Examples
Let X ∼ N(3, 2) and Y ∼ N(−3, 1) (X, Y indep.). Find the distributions of
● −X + 2: Answer: −X + 2 ∼ N(−1, 2);
● X + Y: Answer: X + Y ∼ N(0, 3);
● 2X − Y − 8: Answer: 2X − Y − 8 ∼ N(1, 9);
● (1/4)(X_1 + X_2 + X_3 + X_4), with X_1, …, X_4 independent copies of X: Answer: (1/4)(X_1 + X_2 + X_3 + X_4) ∼ N(3, 8/16), which is not the distribution of X!

[Figure: 2X−Y−8: histogram of sample and scaled N(1,9) plot; horizontal axis: sample value, vertical axis: frequency.]

Central limit theorem (§4.8.3)

Averages of a normal distribution

For independent normal variables X_1, X_2, …, X_n, each with distribution N(µ, σ²), the average X̄ = (1/n)(X_1 + … + X_n) has a N(µ, σ²/n) distribution.
This follows immediately from property 2 and is referred to as the sampling distribution of the sample mean.

[Figure: probability density functions of N(25, 20), N(25, 20/3) and N(25, 20/10), plotted against x for 10 ≤ x ≤ 40.]

Central limit theorem

For any population distribution with mean µ and variance σ²: if n is sufficiently large, the sampling distribution of the sample mean X̄ = (1/n)(X_1 + … + X_n) is approximately a N(µ, σ²/n) distribution.

Example
Consider the Poisson distribution with average 8, hence µ = 8 = σ².
● n = 200: Using poissrnd, we take a sample of size 200, register its average and repeat this 500 times. The average of these 500 numbers is 8.014 (compare µ = 8) and their variance is 0.03802665 (compare σ²/n = 8/200 = 0.04).
● n = 400: Using poissrnd, we take a sample of size 400, register its average and repeat this 500 times. The average of these 500 numbers is 8.0068 (compare µ = 8) and their variance is 0.0206 (compare σ²/n = 8/400 = 0.02).
● n = 600: Using poissrnd, we take a sample of size 600, register its average and repeat this 500 times. The average of these 500 numbers is 7.9963 (compare µ = 8) and their variance is 0.01377 (compare σ²/n = 8/600 = 0.01333).

Example Central limit theorem: n = 200

Histogram of the 500 observations of means of samples of size 200, together with the scaled graph of N(8, 8/200). QQ plot of quantiles of N(8, 8/200) against the quantiles of the 500 observations of means of samples of size 200.

[Figure, two panels: "Means of samples of size 200 of Po(8)" (histogram; axes: mean of sample, frequency) and "QQ plot for means of samples of size 200" (axes: quantiles of N(8,8/200), quantiles of means).]

Example Central limit theorem: n = 400

Histogram of the 500 observations of means of samples of size 400, together with the scaled graph of N(8, 8/400). QQ plot of quantiles of N(8, 8/400) against the quantiles of the 500 observations of means of samples of size 400.

[Figure, two panels: "Means of samples of size 400 of Po(8)" (histogram; axes: mean of sample, frequency) and "QQ plot for means of samples of size 400" (axes: quantiles of N(8,8/400), quantiles of means).]

Example Central limit theorem: n = 600

Histogram of the 500 observations of means of samples of size 600, together with the scaled graph of N(8, 8/600). QQ plot of quantiles of N(8, 8/600) against the quantiles of the 500 observations of means of samples of size 600.

[Figure, two panels: "Means of samples of size 600 of Po(8)" (histogram; axes: mean of sample, frequency) and "QQ plot for means of samples of size 600" (axes: quantiles of N(8,8/600), quantiles of means).]
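
The Poisson experiment described above can be sketched in Matlab as follows. This is one possible way to set it up, not necessarily the script used for the slides: it assumes poissrnd is available (Statistics Toolbox, or the statistics package in Octave), and the variable names are illustrative. Because the samples are random, the printed values differ from run to run, but the variance of the means should be close to 8/n.

```matlab
% Sketch of the central limit theorem experiment for Po(8).
% Assumes poissrnd is available (Statistics Toolbox, or the Octave
% statistics package).
lambda = 8;              % Poisson mean, so mu = sigma^2 = 8
reps = 500;              % number of repetitions
sizes = [200 400 600];   % sample sizes n

for n = sizes
    % Each row is one sample of size n; average along each row.
    samples = poissrnd(lambda, reps, n);
    means = mean(samples, 2);        % reps-by-1 vector of sample means

    fprintf('n = %d: mean of means = %.4f, variance of means = %.5f, sigma^2/n = %.5f\n', ...
            n, mean(means), var(means), lambda / n);
end
```

Drawing all 500 samples at once as a reps-by-n matrix avoids an inner loop; after the loop finishes, means holds the 500 sample means for n = 600, which could be passed on to a histogram or a QQ plot.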