Consolidation & Review

Inference
Sample Proportions
 Column of zeros and ones
 $\hat{p}$: estimated proportion or fraction equals # of successes / # of observations = k/n
 Where k ~ Binomial[mean = np, variance = np(1-p)]
 $E[\hat{p}] = E[k/n] = E[(1/n)k] = (1/n)E[k] = (1/n)np = p$
 $Var[\hat{p}] = Var[k/n] = (1/n)^2 Var[k] = (1/n)^2 np(1-p)$
 $Var[\hat{p}] = p(1-p)/n$
 $\hat{p}$ is a point estimate of p
 The estimate of the Var of $\hat{p}$ is $\hat{p}(1-\hat{p})/n$
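The two results above (the sample proportion is unbiased, and its variance is p(1-p)/n) can be checked by simulation. What follows is a minimal sketch using only Python's standard library. The function name, seed, and number of replications are illustrative assumptions; p = 0.5 and n = 50 mirror the lab simulation referred to in these slides.

```python
import random
import statistics


def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` columns of n zeros and ones (Bernoulli(p)) and
    return the proportion of ones in each column."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p, n = 0.5, 50
p_hats = simulate_sample_proportions(p, n, reps=20000)

mean_p_hat = statistics.fmean(p_hats)      # should be close to p
var_p_hat = statistics.pvariance(p_hats)   # should be close to p(1-p)/n
theoretical_var = p * (1 - p) / n          # 0.005 for p = 0.5, n = 50

print(f"mean of p-hat: {mean_p_hat:.4f}  (theory: {p})")
print(f"var of p-hat:  {var_p_hat:.5f}  (theory: {theoretical_var})")
```

With many replications the simulated mean and variance should settle near p and p(1-p)/n, although any single run will differ slightly from the theoretical values.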




Sample Proportions example from Lab 3
 One of the ten columns of 50 observations of ones and zeros, the one with the smallest proportion, 0.32
 $Var[\hat{p}]$ = 0.32*0.68/50 = 0.004352, with square root, i.e. standard deviation, of 0.066
 A 95% confidence interval for an estimate of p from this sample is:
 Prob [-1.96≤(0.32-p)/0.066≤1.96]=0.95
 Prob[-1.96*0.066≤(0.32-p)≤1.96*0.066]=0.95
 Prob[-0.13≤(0.32-p)≤0.13]=0.95
 Prob[0.13≥(p-0.32)≥-0.13]=0.95
 Prob[0.45≥p≥ 0.19]=0.95
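The interval arithmetic above can be reproduced in a few lines. This is a sketch under the slide's own assumptions (standard error estimated from the sample proportion, z = 1.96); the function name is hypothetical.

```python
import math


def proportion_confidence_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se, se


lower, upper, se = proportion_confidence_interval(0.32, 50)
print(f"standard deviation of p-hat: {se:.3f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimals, this gives the same limits as the slide, 0.19 and 0.45.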
Note:
 This 95% confidence interval does not include p = 0.5, the population parameter chosen for the simulation, illustrating that 5% of the time a 95% confidence interval will not include the true value!
 Prob[-1.96 ≤ (0.32 - p)/0.066 ≤ 1.96] = 0.95 is the same as Prob[-1.96 ≤ z ≤ 1.96] = 0.95, where $z = (\hat{p} - p)/\hat{\sigma}(\hat{p})$ = (0.32 - p)/0.066 in this example
 We can use the normal distribution approximation to the binomial in this example since $n\hat{p}$ = 50*0.32 = 16 ≥ 5 and $n(1-\hat{p})$ = 50*0.68 = 34 ≥ 5
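The claim that a 95% interval misses the true value some of the time can be illustrated by repeating the experiment many times and counting how often the interval covers p. This is a sketch only: the function name, seed, and replication count are assumptions, and because the interval rests on the normal approximation, the simulated coverage may come out somewhat different from exactly 95%.

```python
import math
import random


def coverage_rate(p, n, reps, z=1.96, seed=1):
    """Fraction of simulated samples whose interval
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n) contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        k = sum(rng.random() < p for _ in range(n))   # number of ones
        p_hat = k / n
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p <= p_hat + half_width:
            hits += 1
    return hits / reps


rate = coverage_rate(p=0.5, n=50, reps=20000)
print(f"share of intervals containing p = 0.5: {rate:.3f}")
print(f"share missing it: {1 - rate:.3f}")
```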

A z value of 1.96 leads to an area of 0.475, leaving 0.025 in the upper tail
Interval Estimation
 The conventional approach is to choose a probability for
the interval such as 95% or 99%
So z values of -1.96 and 1.96 leave 2.5% in each tail
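These table look-ups can be checked with the standard normal distribution in Python's `statistics` module. A minimal sketch; the variable names are illustrative.

```python
from statistics import NormalDist

z_dist = NormalDist()  # standard normal: mean 0, standard deviation 1

area_0_to_z = z_dist.cdf(1.96) - 0.5   # area between 0 and 1.96
upper_tail = 1 - z_dist.cdf(1.96)      # area above 1.96
lower_tail = z_dist.cdf(-1.96)         # area below -1.96
z_for_95 = z_dist.inv_cdf(0.975)       # z leaving 2.5% in the upper tail

print(f"area from 0 to 1.96: {area_0_to_z:.3f}")
print(f"upper tail: {upper_tail:.3f}, lower tail: {lower_tail:.3f}")
print(f"z with 97.5% below it: {z_for_95:.2f}")
```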
$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{1}{2}[(z-0)/1]^2}$
[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations, from -5 to 5; vertical axis: Density. The tails below -1.96 and above 1.96 each contain 2.5% of the area.]
Application of Sample Proportions
Field Poll Margin of Error
 Variance of $\hat{p}$ = $\hat{p}(1-\hat{p})/n$ = 0.47*0.53/599 = 0.000416
 $\hat{\sigma}(\hat{p})$ = √0.000416 = 0.0204
 So two standard deviations is about 0.041 or 4.1%, i.e. the margin of error is plus or minus 4.1 percentage points
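The margin-of-error arithmetic can be verified directly. A short sketch using the poll figures quoted above (proportion 0.47, sample size 599) and the slide's two-standard-deviation rule; the variable names are illustrative.

```python
import math

p_hat, n = 0.47, 599

variance = p_hat * (1 - p_hat) / n    # estimated variance of p-hat
std_dev = math.sqrt(variance)         # estimated standard deviation of p-hat
margin_of_error = 2 * std_dev         # two standard deviations, as on the slide

print(f"variance: {variance:.6f}")
print(f"standard deviation: {std_dev:.4f}")
print(f"margin of error: +/- {100 * margin_of_error:.1f} percentage points")
```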
Inferring the unknown population mean from a sample mean
 Example from Lab 3: simulate the population as uniform, with random variable x, 0 ≤ x ≤ 1, and density f(x) = 1
 Note: $\int_0^1 f(x)\,dx = x\,\big|_0^1 = 1 = F(1)$, the CDF
 Note the expected value of x: $E[x] = \int_0^1 x f(x)\,dx = \int_0^1 x \cdot 1\,dx = x^2/2\,\big|_0^1 = 1/2$
 $Var[x] = E[(x - E[x])^2] = E\{x^2 - 2xE[x] + (E[x])^2\}$
 $Var[x] = E[x^2] - 2E[x]E[x] + (E[x])^2 = E[x^2] - (E[x])^2$
 $E[x^2] = \int_0^1 x^2 f(x)\,dx = \int_0^1 x^2\,dx = x^3/3\,\big|_0^1 = 1/3$
 $Var[x] = E[x^2] - (E[x])^2 = 1/3 - (1/2)^2 = 1/3 - 1/4 = (4-3)/12 = 1/12$
 x ~ Uniform(mean = 1/2, variance = 1/12)
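The moments of the uniform density on [0, 1] can be checked with exact fractions, since the integral of x^k from 0 to 1 is 1/(k+1). A minimal sketch; the helper name is hypothetical.

```python
from fractions import Fraction


def uniform01_raw_moment(k):
    """E[x^k] for x uniform on [0, 1]: the integral of x^k from 0 to 1,
    which equals 1 / (k + 1)."""
    return Fraction(1, k + 1)


mean = uniform01_raw_moment(1)             # E[x]   = 1/2
second_moment = uniform01_raw_moment(2)    # E[x^2] = 1/3
variance = second_moment - mean ** 2       # E[x^2] - (E[x])^2

print(f"E[x] = {mean}, E[x^2] = {second_moment}, Var[x] = {variance}")
```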
In Lab 3 we drew a random sample
of size 50 from this uniform
distribution and calculated the
sample mean: $\bar{x} = \left(\sum_{i=1}^{50} x_i\right)/n$, with n = 50
 From the central limit theorem, we know, and we saw in Lab 3, that the sample mean is approximately normally distributed
Central tendency and
dispersion of sample mean
$E[\bar{x}] = E\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)\sum_{i=1}^{n} E[x_i] = (1/n)\sum_{i=1}^{n} \mu = (1/n)\,n\mu = \mu$
Where μ is the population mean. In the simulation from Lab 3 using
the uniform distribution, we knew that μ = 0.5
$Var[\bar{x}] = Var\left[\sum_{i=1}^{n} x_i/n\right] = (1/n)^2\,Var\left[\sum_{i=1}^{n} x_i\right] = (1/n)^2\sum_{i=1}^{n} Var[x_i] = (1/n)^2\sum_{i=1}^{n} \sigma^2 = (1/n)^2\,n\sigma^2 = \sigma^2/n$
Where σ² is the variance of x. In the simulation from Lab 3 using the
uniform distribution, we knew that σ² = 1/12.
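A simulation in the spirit of Lab 3 can illustrate both results: the sample mean is centered on μ = 0.5 and its variance is σ²/n = (1/12)/50. The sketch below also counts how many sample means fall within 1.96 standard deviations of μ, which should be close to 95% if the normal approximation is good. It assumes independent draws, which the variance derivation relies on; the function name, seed, and replication count are illustrative.

```python
import random
import statistics


def simulate_sample_means(n, reps, seed=3):
    """Draw `reps` independent samples of size n from Uniform(0, 1)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(reps)]


n = 50
means = simulate_sample_means(n, reps=20000)

mean_of_means = statistics.fmean(means)     # should be near mu = 0.5
var_of_means = statistics.pvariance(means)  # should be near sigma^2 / n
theoretical_var = (1 / 12) / n              # = 1/600
std_dev = theoretical_var ** 0.5

# Share of sample means within 1.96 standard deviations of mu
share_within = sum(abs(m - 0.5) <= 1.96 * std_dev for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.4f}")
print(f"variance of sample means: {var_of_means:.6f} (theory: {theoretical_var:.6f})")
print(f"share within 1.96 standard deviations: {share_within:.3f}")
```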
Hypothesis testing
 Example from Lab 3 for sample proportions
 Step one: formulate the hypotheses
 Null hypothesis, H0: p = 0.5
 Alternative hypothesis, HA: p < 0.5
 Step two: identify a test statistic
 $z = (\hat{p} - p)/\hat{\sigma}_{\hat{p}}$
 Where the value for p is from the null hypothesis, so z = (0.32 - 0.5)/0.066 = -0.18/0.066 = -2.73
 If the null hypothesis were true, what is the probability of getting a test statistic of this size?
Hypothesis Testing: 4 Steps
 Formulate all the hypotheses
 Identify a test statistic
 If the null hypothesis were true, what is the probability of
getting a test statistic this large?
 Compare this probability to a chosen critical level of
significance, e.g. 5%
A z value of 2.73 leads to an area of 0.4968, leaving 0.0032 in the upper tail, and hence 0.0032 in the lower tail below z = -2.73. If you choose a risk level of 0.05, i.e. α = 0.05 for the probability of a type I error, then reject H0, since 0.0032 < 0.05
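The four steps can be put together in a short sketch for the lower-tailed test of H0: p = 0.5 against HA: p < 0.5. Following the slides, the standard deviation is estimated from the sample proportion (about 0.066); many textbooks instead compute it from the null value of p, which would give a somewhat different z. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lower_tail_z_test(p_hat, p_null, n, alpha=0.05):
    """One-sided (lower-tail) z test for a proportion, with the
    standard deviation estimated from p_hat as in the slides."""
    std_dev = math.sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - p_null) / std_dev
    p_value = NormalDist().cdf(z)   # probability of a z this small or smaller
    return z, p_value, p_value < alpha


z, p_value, reject = lower_tail_z_test(p_hat=0.32, p_null=0.5, n=50)
print(f"z = {z:.2f}, lower-tail probability = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```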
[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations, from -5 to 5; vertical axis: Density. The lower tail below z = -2.73 contains 0.0032 of the area.]
[Figure: Density Function for the Standardized Normal Variate. Horizontal axis: Standard Deviations, from -5 to 5; vertical axis: Density. The lower tail below z = -1.645 contains 0.050 of the area.]
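The figure's critical value for a 5% lower tail can be recovered from the inverse of the standard normal CDF, and the observed statistic of -2.73 compared with it. A brief sketch; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
z_critical = NormalDist().inv_cdf(alpha)   # z with 5% of the area below it
z_observed = -2.73                         # test statistic from the Lab 3 example

# For a lower-tailed test, reject H0 when the statistic falls below the critical value
reject = z_observed < z_critical

print(f"critical z: {z_critical:.3f}")
print("reject H0" if reject else "do not reject H0")
```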