Inference
Sample Proportions
Column of zeros and ones
$\hat{p}$: estimated proportion or fraction = # of successes / # of observations = $k/n$
Where $k \sim \text{Binomial}[\text{mean} = np,\ \text{variance} = np(1-p)]$
$E[\hat{p}] = E[k/n] = E[(1/n)k] = (1/n)E[k] = (1/n)np = p$
$\mathrm{Var}[\hat{p}] = \mathrm{Var}[k/n] = (1/n)^2\,\mathrm{Var}[k] = (1/n)^2\,np(1-p)$
$\mathrm{Var}[\hat{p}] = p(1-p)/n$
$\hat{p}$ is a point estimate of $p$
The estimate of the Var of $\hat{p}$ is $\hat{p}(1-\hat{p})/n$
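The two results above, E[p-hat] = p and Var[p-hat] = p(1-p)/n, can be checked with a small simulation. This is only a sketch: the function name `simulate_p_hat`, the seed, and the number of trials are illustrative assumptions, not part of the lab.

```python
import random

def simulate_p_hat(p, n, trials, seed=0):
    """Draw `trials` columns of n zero/one observations; return each column's p-hat."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    p_hats = []
    for _ in range(trials):
        k = sum(1 for _ in range(n) if rng.random() < p)  # number of successes
        p_hats.append(k / n)
    return p_hats

p, n = 0.5, 50
p_hats = simulate_p_hat(p, n, trials=20000)

mean_p_hat = sum(p_hats) / len(p_hats)
var_p_hat = sum((x - mean_p_hat) ** 2 for x in p_hats) / len(p_hats)
theoretical_var = p * (1 - p) / n  # p(1-p)/n = 0.005

print(mean_p_hat, var_p_hat, theoretical_var)
```

With many trials the simulated mean should sit near p and the simulated variance near p(1-p)/n.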
Sample Proportions example from Lab 3
One of the ten columns of 50 observations of ones and zeros, the one with the smallest proportion, 0.32:
$\widehat{\mathrm{Var}}[\hat{p}] = 0.32 \times 0.68/50 = 0.004352$, with square root, i.e. standard deviation, of 0.066
A 95 % confidence interval for an estimate of p from
this sample is:
Prob [-1.96≤(0.32-p)/0.066≤1.96]=0.95
Prob[-1.96*0.066≤(0.32-p)≤1.96*0.066]=0.95
Prob[-0.13≤(0.32-p)≤0.13]=0.95
Prob[0.13≥(p-0.32)≥-0.13]=0.95
Prob[0.45≥p≥ 0.19]=0.95
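The interval arithmetic above can be reproduced in a few lines. A minimal sketch, assuming the normal approximation with z = 1.96; the helper name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation of p-hat
    return p_hat - z * se, p_hat + z * se

lower, upper = proportion_ci(0.32, 50)
print(round(lower, 2), round(upper, 2))  # 0.19 0.45
```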
Note:
This 95 % confidence interval does not include p=0.5,
the population parameter chosen for the simulation,
illustrating that 5 % of the time the 95% confidence
interval will not include the true value!
Prob[-1.96 ≤ (0.32 - p)/0.066 ≤ 1.96] = 0.95
is the same as Prob[-1.96 ≤ z ≤ 1.96] = 0.95, where, in this example,
$z = (\hat{p} - p)/\hat{\sigma}(\hat{p}) = (0.32 - p)/0.066$
We can use the normal distribution approximation to the
binomial in this example since n*p = 50*0.32 ≥ 5 and
n*(1-p)≥5
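The note above, that some samples produce an interval that misses the true p, can be quantified by summing binomial probabilities over every possible number of successes. The sketch below assumes p = 0.5 and n = 50 as in the simulation; the function name `interval_coverage` is an illustrative choice. Because the binomial is discrete and the normal is only an approximation, the result is close to, but not exactly, 95%.

```python
import math

def interval_coverage(p, n, z=1.96):
    """Exact probability that p_hat +/- z*se contains p when k ~ Binomial(n, p)."""
    coverage = 0.0
    for k in range(n + 1):
        p_hat = k / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if abs(p_hat - p) <= z * se:  # this outcome's interval contains the true p
            coverage += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return coverage

coverage = interval_coverage(0.5, 50)
print(round(coverage, 3))

# The Lab 3 column with p_hat = 0.32 (k = 16) is one of the misses:
misses_true_p = abs(0.32 - 0.5) > 1.96 * math.sqrt(0.32 * 0.68 / 50)
print(misses_true_p)  # True
```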
A z value of 1.96 leads to an area of 0.475, leaving 0.025 in the upper tail
Interval Estimation
The conventional approach is to choose a probability for
the interval such as 95% or 99%
So z values of -1.96 and 1.96 leave 2.5% in each tail
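The z values for a chosen interval probability can be looked up in a table or computed. A short sketch using Python's standard library `statistics.NormalDist` (Python 3.8 or later); the helper name `two_sided_z` is an assumption for illustration.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """z value that leaves (1 - confidence)/2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

z95 = two_sided_z(0.95)
z99 = two_sided_z(0.99)
print(round(z95, 2), round(z99, 2))  # 1.96 2.58
```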
Density Function for the Standardized Normal Variate
$f(z) = [1/\sqrt{2\pi}]\, e^{-\frac{1}{2}[(z-0)/1]^2}$
[Figure: density (vertical axis, 0 to 0.45) against standard deviations (horizontal axis, -5 to 5), with 2.5% shaded in each tail beyond z = -1.96 and z = 1.96]
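The density function for the standardized normal variate can be evaluated directly, and the area between -1.96 and 1.96 approximated numerically. This is a sketch only; the trapezoid rule and the step count are illustrative choices, not something the slides prescribe.

```python
import math

def std_normal_density(z):
    """f(z) = (1 / sqrt(2*pi)) * exp(-z^2 / 2), i.e. mean 0 and standard deviation 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def area_between(a, b, steps=10000):
    """Trapezoid-rule approximation of the area under f(z) from a to b."""
    h = (b - a) / steps
    total = 0.5 * (std_normal_density(a) + std_normal_density(b))
    for i in range(1, steps):
        total += std_normal_density(a + i * h)
    return total * h

peak = std_normal_density(0)
central = area_between(-1.96, 1.96)
print(round(peak, 4), round(central, 3))  # 0.3989 0.95
```

The remaining area, about 0.05, is split evenly between the two tails.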
Application of Sample Proportions
Field Poll Margin of Error
Variance of $\hat{p}$ = $\hat{p}(1-\hat{p})/n = 0.47 \times 0.53/599 = 0.000416$
$\hat{\sigma}(\hat{p}) = \sqrt{0.000416} = 0.0204$
So two standard deviations is about 0.041 or 4.1%, i.e. the margin of error is plus or minus 4.1 percentage points
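The margin-of-error arithmetic can be written as a small function. A sketch assuming, as in the calculation above, that the margin is two estimated standard deviations of p-hat; `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(p_hat, n, num_sd=2.0):
    """num_sd estimated standard deviations of p-hat, as a proportion."""
    return num_sd * math.sqrt(p_hat * (1 - p_hat) / n)

moe = margin_of_error(0.47, 599)
print(round(100 * moe, 1))  # 4.1 (percentage points)
```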
Inferring the unknown population
mean from a sample mean
Example from Lab 3: simulate the population as uniform,
with random variable x, 0≤x≤1, and density f(x) =1
Note: $\int_0^1 f(x)\,dx = x\,\big|_0^1 = 1 = F(1)$, the CDF
Note the expected value of x: $E[x] = \int_0^1 x\,f(x)\,dx = \int_0^1 x \cdot 1\,dx = x^2/2\,\big|_0^1 = 1/2$
$\mathrm{Var}[x] = E[(x - E[x])^2] = E\{x^2 - 2xE[x] + E[x]^2\}$
$\mathrm{Var}[x] = E[x^2] - 2E[x]E[x] + E[x]^2 = E[x^2] - (E[x])^2$
$E[x^2] = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2\,dx = x^3/3\,\big|_0^1 = 1/3$
$\mathrm{Var}[x] = E[x^2] - (E[x])^2 = 1/3 - (1/2)^2 = 1/3 - 1/4 = (4-3)/12 = 1/12$
$x \sim \text{Uniform}(\text{mean} = 1/2,\ \text{variance} = 1/12)$
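The integrals for the uniform density can also be checked numerically. The sketch below uses a midpoint rule on [0, 1]; the helper name `integrate_01` and the step count are assumptions made for illustration.

```python
def integrate_01(g, steps=100000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h

def density(x):
    """Uniform density on [0, 1]: f(x) = 1."""
    return 1.0

mean_x = integrate_01(lambda x: x * density(x))             # E[x]
second_moment = integrate_01(lambda x: x * x * density(x))  # E[x^2]
var_x = second_moment - mean_x ** 2                         # E[x^2] - (E[x])^2

print(round(mean_x, 4), round(second_moment, 4), round(var_x, 4))
```

The three values should agree with 1/2, 1/3 and 1/12 to the displayed precision.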
In Lab 3 we drew a random sample of size 50 from this uniform distribution and calculated the sample mean: $\bar{x} = \left(\sum_{i=1}^{50} x_i\right)/n$
From the central limit theorem, we know, and we saw in Lab 3, that the sample mean is approximately normally distributed
Central tendency and
dispersion of sample mean
$E[\bar{x}] = E\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)\sum_{i=1}^{n} E[x_i] = (1/n)\sum_{i=1}^{n} \mu = (1/n)\,n\mu = \mu$
Where μ is the population mean. In the simulation from Lab 3 using
the uniform distribution, we knew that μ = 0.5
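$\mathrm{Var}[\bar{x}] = \mathrm{Var}\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)^2\,\mathrm{Var}\left[\sum_{i=1}^{n} x_i\right] = (1/n)^2 \sum_{i=1}^{n} \mathrm{Var}[x_i]$
$\mathrm{Var}[\bar{x}] = (1/n)^2 \sum_{i=1}^{n} \sigma^2 = (1/n)^2\,n\sigma^2 = \sigma^2/n$
Where $\sigma^2$ is the variance of x. In the simulation from Lab 3 using the
uniform distribution, we knew that $\sigma^2 = 1/12$.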
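The mean and variance of the sample mean, and its approximate normality, can be illustrated by repeating the Lab 3 draw many times. This is a sketch under stated assumptions: the seed, the number of repetitions, and the name `sample_means` are illustrative, not taken from the lab.

```python
import math
import random

def sample_means(n, trials, seed=0):
    """Return `trials` sample means, each from n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

n = 50
means = sample_means(n, trials=20000)

avg = sum(means) / len(means)                         # should be near mu = 0.5
var = sum((m - avg) ** 2 for m in means) / len(means)
theoretical_var = (1 / 12) / n                        # sigma^2 / n

# Share of sample means within 1.96 standard deviations of mu
sd = math.sqrt(theoretical_var)
within = sum(1 for m in means if abs(m - 0.5) <= 1.96 * sd) / len(means)

print(avg, var, theoretical_var, within)
```

If the normal approximation is reasonable, the last number should be close to 0.95.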
Hypothesis testing
Example from Lab 3 for sample proportions
Step one: formulate the hypotheses
Null hypothesis, H0: p = 0.5
Alternative hypothesis, HA : p<0.5
Step two: Identify a test statistic
$z = (\hat{p} - p)/\hat{\sigma}_{\hat{p}}$
Where the value for p is from the null hypothesis, so
z = (0.32 - 0.5)/0.066 = -0.18/0.066 = -2.73
If the null hypothesis were true, what is the probability
of getting a test statistic of this size?
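The test statistic and its lower-tail probability can be computed as follows. A minimal sketch that reuses the slide's standard deviation of 0.066, which is estimated from the sample proportion; another common convention, not used here, takes the standard deviation implied by the null value, sqrt(0.5 * 0.5 / 50), about 0.071.

```python
from statistics import NormalDist

p_hat = 0.32   # sample proportion from Lab 3
p_null = 0.5   # value of p under H0
se = 0.066     # estimated standard deviation of p-hat, as on the slide

z = (p_hat - p_null) / se
# HA is p < 0.5, so the relevant probability is the lower tail below z
p_value = NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))
```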
Hypothesis Testing: 4 Steps
Formulate all the hypotheses
Identify a test statistic
If the null hypothesis were true, what is the probability of
getting a test statistic this large?
Compare this probability to a chosen critical level of
significance, e.g. 5%
A z value of 2.73 leads to an area of 0.4968, leaving 0.0032 in the upper tail, and hence 0.0032 in the lower tail.
If you choose a risk level of 0.05, i.e. α = 0.05 for the probability of a type I error, then reject H0, since 0.0032 < 0.05
Density Function for the Standardized Normal Variate
$f(z) = [1/\sqrt{2\pi}]\, e^{-\frac{1}{2}[(z-0)/1]^2}$
[Figure: density (vertical axis, 0 to 0.45) against standard deviations (horizontal axis, -5 to 5), with an area of 0.0032 in the lower tail below z = -2.73]
Density Function for the Standardized Normal Variate
$f(z) = [1/\sqrt{2\pi}]\, e^{-\frac{1}{2}[(z-0)/1]^2}$
[Figure: density (vertical axis, 0 to 0.45) against standard deviations (horizontal axis, -5 to 5), with an area of 0.050 in the lower tail below z = -1.645]
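The one-tailed test can also be framed as a comparison with a critical value: the z that leaves 5% in the lower tail. The sketch below assumes α = 0.05 and the observed statistic of -2.73 from the Lab 3 example; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)  # z with 5% of the area below it, about -1.645
z_obs = -2.73                         # test statistic from the Lab 3 example

# Lower-tailed test (HA: p < 0.5): reject H0 when z_obs falls below the critical value
reject_null = z_obs < z_crit
print(round(z_crit, 3), reject_null)  # -1.645 True
```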