
Inference

Sample Proportions

Given a column of zeros and ones, $\hat{p}$ is the estimated proportion or fraction:

$\hat{p} = (\text{\# of successes})/(\text{\# of observations}) = k/n$

where $k \sim \text{Binomial}[\text{mean} = np,\ \text{variance} = np(1-p)]$.

$E[\hat{p}] = E[k/n] = E[(1/n)k] = (1/n)E[k] = (1/n)np = p$

$\text{Var}[\hat{p}] = \text{Var}[k/n] = (1/n)^2\,\text{Var}[k] = (1/n)^2\,np(1-p)$

$\text{Var}[\hat{p}] = p(1-p)/n$

$\hat{p}$ is a point estimate of $p$. The estimate of the variance of $\hat{p}$ is $\hat{p}(1-\hat{p})/n$.

Sample Proportions: example from Lab 3

Of the ten columns of 50 observations of ones and zeros, the one with the smallest proportion had $\hat{p} = 0.32$.

$\widehat{\text{Var}}[\hat{p}] = 0.32 \times 0.68/50 = 0.004352$, with square root, i.e. standard deviation, of 0.066.

A 95% confidence interval for an estimate of p from this sample is:

Prob[-1.96 ≤ (0.32 - p)/0.066 ≤ 1.96] = 0.95
Prob[-1.96*0.066 ≤ (0.32 - p) ≤ 1.96*0.066] = 0.95
Prob[-0.13 ≤ (0.32 - p) ≤ 0.13] = 0.95
Prob[0.13 ≥ (p - 0.32) ≥ -0.13] = 0.95
Prob[0.45 ≥ p ≥ 0.19] = 0.95

Note: This 95% confidence interval does not include p = 0.5, the population parameter chosen for the simulation, illustrating that 5% of the time the 95% confidence interval will not include the true value!
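The interval arithmetic in the Lab 3 example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `proportion_ci` is a hypothetical helper introduced here for illustration, and it assumes the normal approximation with the 1.96 multiplier used in the example.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a sample proportion.

    Hypothetical helper for illustration; z=1.96 corresponds to 95%.
    """
    var_hat = p_hat * (1 - p_hat) / n   # estimated variance of p-hat
    se = math.sqrt(var_hat)             # estimated standard deviation of p-hat
    return var_hat, se, p_hat - z * se, p_hat + z * se

# Values from the Lab 3 example: p-hat = 0.32, n = 50.
var_hat, se, lower, upper = proportion_ci(0.32, 50)
print(f"variance = {var_hat:.6f}, std dev = {se:.3f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print("interval contains p = 0.5:", lower <= 0.5 <= upper)
```

Rounded to two decimals, the endpoints agree with the 0.19 and 0.45 obtained by hand.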
The statement Prob[-1.96 ≤ (0.32 - p)/0.066 ≤ 1.96] = 0.95 is the same as Prob[-1.96 ≤ z ≤ 1.96] = 0.95, where $z = (\hat{p} - p)/\hat{\sigma}(\hat{p}) = (0.32 - p)/0.066$ in this example.

We can use the normal distribution approximation to the binomial in this example since $np = 50 \times 0.32 \ge 5$ and $n(1-p) \ge 5$.

A z value of 1.96 leads to an area of 0.475, leaving 0.025 in the upper tail.

Interval Estimation

The conventional approach is to choose a probability for the interval, such as 95% or 99%.

So z values of -1.96 and 1.96 leave 2.5% in each tail.

$f(z) = [1/\sqrt{2\pi}]\; e^{-\frac{1}{2}[(z-0)/1]^2}$

[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations; vertical axis: Density. The tails below z = -1.96 and above z = 1.96 each contain 2.5%.]

Application of Sample Proportions

Field Poll Margin of Error

Variance of $\hat{p}$: $\hat{p}(1-\hat{p})/n = 0.47 \times 0.53/599 = 0.000416$

$\hat{\sigma}(\hat{p}) = \sqrt{0.000416} = 0.0204$

So two standard deviations is about 0.041 or 4.1%, i.e. the margin of error is plus or minus 4.1 percentage points.

Inferring the unknown population mean from a sample mean

Example from Lab 3: simulate the population as uniform, with random variable x, 0 ≤ x ≤ 1, and density f(x) = 1.

Note: $\int_0^1 f(x)\,dx = x\,\big|_0^1 = 1 = F(1)$, the CDF.

Note the expected value of x:

$E[x] = \int_0^1 x\,f(x)\,dx = \int_0^1 x \cdot 1\,dx = x^2/2\,\big|_0^1 = 1/2$

$\text{Var}[x] = E\{[x - E(x)]^2\} = E\{x^2 - 2xE[x] + E[x]^2\}$

$\text{Var}[x] = E[x^2] - 2E[x]E[x] + E[x]^2 = E[x^2] - E[x]^2$

$E[x^2] = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2\,dx = x^3/3\,\big|_0^1 = 1/3$

$\text{Var}[x] = E[x^2] - E[x]^2 = 1/3 - [1/2]^2 = 1/3 - 1/4 = (4-3)/12 = 1/12$

x ~ Uniform(mean = 1/2, variance = 1/12)

In Lab 3 we drew a random sample of size 50 from this uniform distribution and calculated the sample mean:

$\bar{x} = \left(\sum_{i=1}^{50} x_i\right)/n$

From the central limit theorem, we know, and we saw in Lab 3, that the sample mean is distributed normally.

Central tendency and dispersion of sample mean

$E[\bar{x}] = E\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)\sum_{i=1}^{n} E[x_i] = (1/n)\sum_{i=1}^{n} \mu = (1/n)\,n\mu = \mu$

where μ is the population mean.
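The moments derived above for the uniform population, and the result that the expected value of the sample mean equals μ, can be checked with a small simulation. This is a sketch only, not the Lab 3 procedure itself: the seed and the number of repeated samples (`N_SAMPLES`) are arbitrary choices made here for illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed (an assumption), only for reproducibility

N_OBS = 50          # sample size, as in Lab 3
N_SAMPLES = 2_000   # number of repeated samples (illustrative choice)

# Each row is one random sample of 50 draws from Uniform(0, 1).
samples = [[random.random() for _ in range(N_OBS)] for _ in range(N_SAMPLES)]

# Population moments estimated from all draws pooled together.
all_draws = [x for row in samples for x in row]
pop_mean = statistics.fmean(all_draws)                                 # theory: 1/2
pop_var = sum((x - pop_mean) ** 2 for x in all_draws) / len(all_draws)  # theory: 1/12

# One sample mean per row, then the average of those sample means.
sample_means = [statistics.fmean(row) for row in samples]
mean_of_means = statistics.fmean(sample_means)                         # theory: mu = 1/2

print(f"pooled mean     = {pop_mean:.4f} (theory 0.5)")
print(f"pooled variance = {pop_var:.4f} (theory {1/12:.4f})")
print(f"mean of sample means = {mean_of_means:.4f} (theory 0.5)")
```

Because every sample has the same size, the mean of the sample means coincides with the pooled mean up to floating-point rounding, which is the simulation counterpart of the derivation of the expected value of the sample mean.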
In the simulation from Lab 3 using the uniform distribution, we knew that μ = 0.5.

$\text{Var}[\bar{x}] = \text{Var}\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)^2\,\text{Var}\left[\sum_{i=1}^{n} x_i\right] = (1/n)^2 \sum_{i=1}^{n} \text{Var}[x_i] = (1/n)^2 \sum_{i=1}^{n} \sigma^2 = (1/n)^2\,n\sigma^2 = \sigma^2/n$

where σ² is the variance of x. In the simulation from Lab 3 using the uniform distribution, we knew that σ² = 1/12.

Hypothesis testing

Example from Lab 3 for sample proportions

Step one: formulate the hypotheses
Null hypothesis, H0: p = 0.5
Alternative hypothesis, HA: p < 0.5

Step two: identify a test statistic

$z = (\hat{p} - p)/\hat{\sigma}(\hat{p})$

where the value for p is from the null hypothesis, so z = (0.32 - 0.5)/0.066 = -0.18/0.066 = -2.73.

If the null hypothesis were true, what is the probability of getting a test statistic of this size?

Hypothesis Testing: 4 Steps

1. Formulate all the hypotheses.
2. Identify a test statistic.
3. If the null hypothesis were true, what is the probability of getting a test statistic this large?
4. Compare this probability to a chosen critical level of significance, e.g. 5%.

A z value of 2.73 leads to an area of 0.4968, leaving 0.0032 in the upper tail, and hence 0.0032 in the lower tail below z = -2.73. If you choose a risk level of 0.05, i.e. α = 0.05 for the probability of a type I error, then reject H0, since 0.0032 < 0.05.

$f(z) = [1/\sqrt{2\pi}]\; e^{-\frac{1}{2}[(z-0)/1]^2}$

[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations; vertical axis: Density. The lower tail below z = -2.73 contains 0.0032.]

[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations; vertical axis: Density. The lower tail below z = -1.645 contains 0.050.]
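The test for the Lab 3 proportion can be sketched in Python using the standard library's `statistics.NormalDist`. The variable names are illustrative, and the sketch assumes, as the example does, that the estimated standard deviation of about 0.066 computed from the sample proportion is used in the denominator.

```python
import math
from statistics import NormalDist

p_hat, p_null, n = 0.32, 0.5, 50
alpha = 0.05  # chosen probability of a type I error

# Estimated standard deviation of p-hat, as in the example (about 0.066).
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Test statistic, with p taken from the null hypothesis.
z = (p_hat - p_null) / se

# One-sided (lower-tail) test because the alternative is p < 0.5.
std_normal = NormalDist()            # mean 0, standard deviation 1
p_value = std_normal.cdf(z)          # area to the left of z
critical = std_normal.inv_cdf(alpha) # about -1.645

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical z = {critical:.3f}")
print("reject H0:", reject_h0)
```

Comparing the p-value with α and comparing z with the critical value of about -1.645 are two equivalent ways of carrying out the fourth step.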
