Estimation

Topics
Semester I
• Descriptive statistics
• Time series
Semester II
• Sampling
• Statistical inference: estimation, hypothesis testing
• Relationships, causal models
Sampling
Statistical observations
Expectations:
• Speed
• Accuracy
• Reliability
Solutions:
• Observe every individual (full census)
• Sampling
Statistical inference
Descriptive statistics: describe the observed elements.
Statistical inference: draw inferences about the population based on the sample.
• Estimation
• Hypothesis testing
Estimation: estimate the population parameter from a sample.
Types:
• Point estimate
• Interval estimate
Types of errors
• Sampling error: arises from observing a sample instead of the entire population.
• Nonsampling error: arises from mistakes, for example in measurement, recording, or processing.
Issues
• Probability vs. nonprobability samples
• Sample size
• Representativeness
Probability versus nonprobability samples
Probability samples: each member of the population has a known, non-zero probability of being selected.
• Methods include random sampling, systematic sampling, and stratified sampling.
Nonprobability samples: members are selected from the population in some nonrandom manner.
• Methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling.
Random Sampling
Random sampling is the purest form of probability sampling.
• Simple random sample with replacement: each member of the population has an equal and known chance of being selected on every draw.
• Simple random sample without replacement: a selected member cannot be drawn again, so the draws are not independent.
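The two simple random sampling schemes can be sketched with Python's standard `random` module. The population of 20 labelled members and the fixed seed are made up for illustration only.

```python
import random

population = list(range(1, 21))   # hypothetical population of 20 labelled members
rng = random.Random(42)           # fixed seed only so that the run is repeatable

# With replacement: every draw is made from the full population,
# so the same member may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected member cannot be drawn again,
# so all sampled members are distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

`random.choices` draws with replacement and `random.sample` without, which is the only difference between the two calls.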
Stratified Sampling
Stratified sampling is a commonly used probability method that can reduce sampling error compared with simple random sampling, provided the strata are internally homogeneous.
A stratum is a subset of the population whose members share at least one common characteristic, such as males and females.
• Identify the relevant strata and their actual representation in the population.
• Then use random sampling to select a sufficient number of subjects from each stratum.
• Stratified sampling is often used when one or more strata in the population have a low incidence relative to the other strata.
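As a rough sketch of the procedure, the following Python function allocates the sample to strata in proportion to their size and then draws a simple random sample within each stratum. The function name `stratified_sample`, the example strata, and the "at least one per stratum" rule are assumptions made for this illustration, not part of the slides.

```python
import random

def stratified_sample(strata, n, rng):
    """Draw about n members in total, allocated in proportion to stratum size."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation, rounded; at least one member per stratum
        # so that low-incidence strata are still represented (an assumption).
        n_j = max(1, round(n * len(members) / total))
        # Simple random sample without replacement inside the stratum.
        sample[name] = rng.sample(members, n_j)
    return sample

# Hypothetical population: 60 males and 40 females, identified by labels.
strata = {
    "male": [f"M{i}" for i in range(60)],
    "female": [f"F{i}" for i in range(40)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(1))
print({name: len(members) for name, members in sample.items()})
```

Because of rounding, the total drawn can differ slightly from `n` when the stratum shares are not whole multiples of `1/n`.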
Cluster Sampling
Cluster sample: a probability sample in which each sampling unit is a collection of elements.
Effective under the following conditions:
• A good sampling frame of individual elements is unavailable or costly to obtain, while a frame listing clusters is easily obtained.
• The cost of obtaining observations increases as the distance separating the elements increases.
Examples of clusters:
• City blocks – political or geographical
• Housing units – college students
• Hospitals – illnesses
• Automobile – set of four tires
We examine
                Sample          Population
Size            $n$             $N$
Mean            $\bar{x}$       $\mu$
Std. dev.       $s^*$, $s$      $\sigma$
Proportion      $p$             $P$
• Distribution of variables and parameters
• Relationship between variables
Point estimation
A statistic is computed from the sample to estimate the population parameter.
Unbiasedness: $E(\hat{\theta}) = \theta$
Estimation of population mean
Can the sample mean serve as an estimator of the population mean?
Yes, if $E(\bar{x}) = \mu$ holds.
Example
Population:              10, 11, 12, 13, 14
Mean ($\mu$):            12
Variance ($\sigma^2$):   2
Std. dev. ($\sigma$):    1.4142136
Size ($N$):              5
Sample size ($n$):       2
• Consider every possible sample of size 2.
• Describe the distribution of the sample means.
• Calculate the expected value of the sample means.
Sampling distribution: the distribution of the examined statistic over all possible samples.
What is the result?
The expected value of the sample means for a given sample size equals the population mean:
$E(\bar{x}) = \mu$
where the sample mean is
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{n}$
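The result for the example population can be checked by brute force. The sketch below enumerates every possible sample of size 2, both with and without replacement, and averages the sample means; only the Python standard library is used.

```python
from itertools import combinations, product
from statistics import mean

population = [10, 11, 12, 13, 14]   # population from the example
n = 2

# With replacement: ordered samples, 5 * 5 = 25 equally likely outcomes.
means_with = [mean(s) for s in product(population, repeat=n)]

# Without replacement: unordered samples, C(5, 2) = 10 equally likely outcomes.
means_without = [mean(s) for s in combinations(population, n)]

print(mean(population))      # population mean
print(mean(means_with))      # expected value of the sample means, with replacement
print(mean(means_without))   # expected value of the sample means, without replacement
```

All three printed values agree, which is what the unbiasedness of the sample mean states for this population.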
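Point estimation of the population std. dev.
Uncorrected empirical std. dev.:
$s^* = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n}}$, with $E(s^{*2}) \neq \sigma^2$
Corrected empirical std. dev.:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}}$
or, with frequencies,
$s = \sqrt{\frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} f_i d_i^2}{n-1}}$, with $E(s^2) = \sigma^2$
where $d_i = x_i - \bar{x}$.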
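To see why the correction matters, one can average both variance estimates over all samples of size 2 drawn with replacement from the example population 10, 11, 12, 13, 14. This is a small check using the standard `statistics` module, where `pvariance` divides by n and `variance` divides by n - 1.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [10, 11, 12, 13, 14]               # example population, sigma^2 = 2
samples = list(product(population, repeat=2))   # all 25 samples with replacement

# pvariance divides by n (uncorrected); variance divides by n - 1 (corrected).
expected_uncorrected = mean([pvariance(s) for s in samples])
expected_corrected = mean([variance(s) for s in samples])

print(pvariance(population))   # population variance sigma^2
print(expected_uncorrected)    # equals (n - 1) / n * sigma^2, so it is biased downward
print(expected_corrected)      # equals sigma^2, so it is unbiased
```

Note that unbiasedness holds for the corrected variance; its square root, the corrected standard deviation, is still slightly biased.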
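Point estimation of proportion
$p = \frac{k}{n}$, with $E(p) = P$
where $k$ is the number of sample elements having the examined attribute and $q = 1 - p$.
With replacement:    $s_p = \sqrt{\frac{pq}{n}}$
Without replacement: $s_p = \sqrt{\frac{pq}{n} \cdot \frac{N-n}{N-1}}$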
Standard error of the estimate
The average (root mean square) difference between the sample statistic and the population parameter for a given sample size.
In the case of sample means, the standard error is the average difference between the sample means and the population mean.
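Standard error of the mean: $\sigma_{\bar{x}}$
Calculation
With replacement:    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Without replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$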
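Both formulas can be verified on the earlier example population (10, 11, 12, 13, 14 with n = 2). In the sketch below, the helper name `se_mean` is an assumption of this illustration; the cross-check computes the standard deviation of the sample means over all possible samples.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean, pstdev

def se_mean(sigma, n, N=None):
    """Standard error of the mean; pass N for sampling without replacement."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))   # finite population correction
    return se

population = [10, 11, 12, 13, 14]
sigma, N, n = pstdev(population), len(population), 2

# Values from the formulas.
se_with = se_mean(sigma, n)
se_without = se_mean(sigma, n, N)

# Cross-check: std. dev. of the sample means over all possible samples.
check_with = pstdev([mean(s) for s in product(population, repeat=n)])
check_without = pstdev([mean(s) for s in combinations(population, n)])

print(se_with, check_with)
print(se_without, check_without)
```

Sampling without replacement gives the smaller standard error, because the correction factor is below 1 whenever n > 1.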
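What happens if $\sigma$ is unknown?
Estimation: since $E(s^2) = \sigma^2$, use $\sigma \approx s$ and therefore $\sigma_{\bar{x}} \approx s_{\bar{x}}$.
With replacement:    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ is estimated by $s_{\bar{x}} = \frac{s}{\sqrt{n}}$
Without replacement: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ is estimated by $s_{\bar{x}} = \frac{s}{\sqrt{n}} \sqrt{1 - \frac{n}{N}}$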
INTERVAL ESTIMATE OF THE POPULATION MEAN
Structure of the confidence interval
• 95% interval: out of 100 interval estimates, on average 95 contain the population mean.
• First step? The point estimate $\bar{x}$.
$\bar{x} \pm \Delta$, where $\Delta$ is the maximum allowable error (error bound).
Maximum error: the largest error of the estimate at a given probability.
Standard error of the estimate: the average error of the estimate.
How can we calculate the maximum error?
1. Start from the general form:
   expected value ± k · std. dev., where k depends on the probability
2. In the case of sample means:
   $\bar{x} \pm k \cdot \sigma_{\bar{x}}$ if $\sigma$ is known
   $\bar{x} \pm k \cdot s_{\bar{x}}$ if $\sigma$ is unknown
How can we calculate the value of k (1)
It depends on the probability.
What do we know about the distribution of the sample means?
Distribution of sample means
Size of sample      Distribution
Small               Depends on the population's distribution (normal if the population is normal)
Large (n > 100)     Normal distribution (central limit theorem)
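About the normal distribution
$X \sim N(E(X), \sigma^2)$
Special case $E(X) = 0$, $\sigma^2 = 1$: transform into the standard normal distribution, $z \sim N(0, 1)$
$F(x) = \Phi(z)$, where $\Phi$ is the standard normal distribution function
$\Phi(-z) = 1 - \Phi(z)$
If $x$ is a variable, $z$ is the standardized variable:
• Mean of $z$: 0
• Std. dev. of $z$: 1
$z = \frac{X - E(X)}{\sigma} = \frac{X - \mu}{\sigma}$
$\mu - \sigma < x < \mu + \sigma \iff -1 < z < 1$
$\mu - 2\sigma < x < \mu + 2\sigma \iff -2 < z < 2$
$\mu - 3\sigma < x < \mu + 3\sigma \iff -3 < z < 3$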
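Apply for sample means
$z = \frac{\bar{x} - E(\bar{x})}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
$\mu - \sigma_{\bar{x}} < \bar{x} < \mu + \sigma_{\bar{x}} \iff -1 < z < 1$
$\mu - 2\sigma_{\bar{x}} < \bar{x} < \mu + 2\sigma_{\bar{x}} \iff -2 < z < 2$
$\mu - 3\sigma_{\bar{x}} < \bar{x} < \mu + 3\sigma_{\bar{x}} \iff -3 < z < 3$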
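The probability attached to each of these bands follows from the standard normal distribution function. As a short illustration, assuming the sample means are normally distributed, the probabilities can be computed with `statistics.NormalDist` from the Python standard library:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

probabilities = {}
for k in (1, 2, 3):
    # P(-k < z < k) = F(k) - F(-k) = 2 * F(k) - 1, using F(-z) = 1 - F(z)
    probabilities[k] = 2 * std_normal.cdf(k) - 1
    print(k, round(probabilities[k], 4))
```

So a sample mean falls within one, two, or three standard errors of the population mean with probability of roughly 68%, 95%, and 99.7% respectively.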
Calculation of value of k (2)
$1 - \alpha$ given:
$k = z_{1-\frac{\alpha}{2}}$
$\bar{x} \pm z_{1-\frac{\alpha}{2}} \, \sigma_{\bar{x}}$
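The multiplier k for a given confidence level is a quantile of the standard normal distribution. A minimal sketch, where the helper name `z_value` is an assumption of this illustration:

```python
from statistics import NormalDist

def z_value(confidence):
    """Return k = z_{1 - alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_value(confidence), 3))
```

For the usual confidence levels of 90%, 95%, and 99% this gives multipliers of about 1.645, 1.960, and 2.576.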
Calculation of value of k (3)
We would need to know the std. dev. of the population ($\sigma$).
In real life it is usually unknown.
1. Instead of $\sigma$ we use $s$.
2. Instead of the normal distribution we use the t-distribution!
t-distribution
If $n \to \infty$, then $t_{1-\frac{\alpha}{2}}(\nu) \to z_{1-\frac{\alpha}{2}}$
$\bar{x} \pm t_{1-\frac{\alpha}{2}}(\nu) \, s_{\bar{x}}$
$\nu = n - 1$
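Summary
To build the interval around $\bar{x}$, examine:
• Std. dev.: $\sigma$ or $s$?
• Small or large sample? In the case of small samples, can we assume a normal distribution?
• Type of sample (EV/FAE/R: simple random without replacement, independent and identically distributed, or stratified)?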
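Population std. dev. known
$\bar{x} \pm k \cdot \sigma_{\bar{x}} = \bar{x} \pm \Delta$
$\bar{x} \pm z_{1-\frac{\alpha}{2}} \, \sigma_{\bar{x}}$
Population std. dev. unknown
$\bar{x} \pm k \cdot s_{\bar{x}} = \bar{x} \pm \Delta$
$\bar{x} \pm t_{1-\frac{\alpha}{2}}(\nu) \cdot s_{\bar{x}}$
$\nu = n - 1$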
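Both cases can be put side by side in a short Python sketch. It assumes sampling with replacement (no finite population correction) and a made-up data set. To stay within the standard library, the t quantile is obtained by numerically integrating the t density and bisecting. The helpers `t_pdf`, `t_cdf`, `t_quantile`, and `mean_interval` are written for this illustration only, and in practice a statistics library such as SciPy (`scipy.stats.t.ppf`) would normally supply the quantile.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist, mean, stdev

def t_pdf(t, v):
    """Density of the t-distribution with v degrees of freedom."""
    log_const = lgamma((v + 1) / 2) - lgamma(v / 2)
    return exp(log_const) / sqrt(v * pi) * (1 + t * t / v) ** (-(v + 1) / 2)

def t_cdf(x, v, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, v) + t_pdf(x, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    return 0.5 + total * h / 3

def t_quantile(p, v):
    """Quantile t_p(v) for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_interval(sample, confidence=0.95, sigma=None):
    """Confidence interval for the mean; uses z if sigma is given, otherwise t."""
    n = len(sample)
    x_bar = mean(sample)
    p = 1 - (1 - confidence) / 2
    if sigma is not None:
        # Population std. dev. known: normal quantile and sigma / sqrt(n).
        delta = NormalDist().inv_cdf(p) * sigma / sqrt(n)
    else:
        # Unknown: corrected sample std. dev. and t quantile with n - 1 d.f.
        delta = t_quantile(p, n - 1) * stdev(sample) / sqrt(n)
    return x_bar - delta, x_bar + delta

sample = [12.1, 11.4, 12.9, 10.8, 12.3, 11.7, 12.6, 11.2, 12.0]   # made-up data
print(mean_interval(sample, sigma=0.7))   # sigma assumed known
print(mean_interval(sample))              # sigma unknown
```

For small samples the t quantile is noticeably larger than the z quantile, so the interval widens when the population standard deviation has to be estimated.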
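Plan the sample size
In real life the maximum error is given in advance. What sample size does this require?
$\Delta = z_{1-\frac{\alpha}{2}} \, \sigma_{\bar{x}} = z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; n = \left( \frac{z_{1-\frac{\alpha}{2}} \, \sigma}{\Delta} \right)^2$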
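The sample size formula translates directly into a few lines of Python. The planning values below (σ = 15, Δ = 3, 95% confidence) are hypothetical, and the result is rounded up because n must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, delta, confidence=0.95):
    """Smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= delta (sampling with replacement)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / delta) ** 2)

# Hypothetical planning values: sigma = 15, maximum error 3, 95% confidence.
n_required = required_sample_size(sigma=15, delta=3)
print(n_required)
```

Halving the allowed error quadruples the required sample size, since Δ enters the formula squared.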
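Proportional stratified sample
$\sigma$ known:   $\bar{x} \pm z_{1-\frac{\alpha}{2}} \, \sigma_{\bar{x}}$, with $\sigma_{\bar{x}} = \frac{\sigma_B}{\sqrt{n}}$
$\sigma$ unknown: $\bar{x} \pm t_{1-\frac{\alpha}{2}} \, s_{\bar{x}}$, with $s_{\bar{x}} = \frac{1}{n} \sqrt{\sum_{j=1}^{M} n_j s_j^2}$
where $M$ is the number of strata, $n_j$ and $s_j$ are the sample size and sample std. dev. in stratum $j$, and $\sigma_B$ is the within-stratum std. dev.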
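As an illustration of the stratified standard error, the sketch below uses the standard result for proportional allocation without a finite population correction, in which the variance of the stratified mean is the sum of n_j · s_j² divided by n². The helper name `stratified_se` and the stratum figures are assumptions made for this example.

```python
from math import sqrt

def stratified_se(strata):
    """Standard error of the mean for a proportional stratified sample.

    strata: list of (n_j, s_j) pairs, one per stratum.
    """
    n = sum(n_j for n_j, _ in strata)
    return sqrt(sum(n_j * s_j ** 2 for n_j, s_j in strata)) / n

# Hypothetical strata: (sample size, sample std. dev.)
strata_stats = [(60, 2.0), (40, 3.0)]
se_stratified = stratified_se(strata_stats)
print(round(se_stratified, 4))
```

Only the variation inside the strata enters the formula, which is why stratification pays off when the strata are internally homogeneous.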
Estimate of proportion
$p \pm z_{1-\frac{\alpha}{2}} \, s_p$, if $\min(np;\, nq) \geq 10$
With replacement:    $s_p = \sqrt{\frac{pq}{n}}$
Without replacement: $s_p = \sqrt{\frac{pq}{n} \cdot \frac{N-n}{N-1}}$
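The interval for a proportion can be sketched as follows. The function name `proportion_interval` and the counts (120 successes in a sample of 400, population of 2000) are hypothetical. The sample-size condition is checked as min(k, n - k) ≥ 10, which is the same as min(np, nq) ≥ 10, under the assumption that the condition is meant as "at least 10".

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(k, n, confidence=0.95, N=None):
    """Normal-approximation interval for a proportion; pass N for sampling without replacement."""
    if min(k, n - k) < 10:
        raise ValueError("normal approximation not justified: min(np, nq) < 10")
    p = k / n
    q = 1 - p
    s_p = sqrt(p * q / n)
    if N is not None:
        s_p *= sqrt((N - n) / (N - 1))   # finite population correction
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p - z * s_p, p + z * s_p

# Hypothetical counts: 120 of 400 sampled units have the attribute.
interval_with = proportion_interval(k=120, n=400)
interval_without = proportion_interval(k=120, n=400, N=2000)
print(interval_with)
print(interval_without)
```

The interval without replacement is narrower, because the correction factor shrinks the standard error when the sample is a sizeable share of the population.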
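Estimate of standard deviation
$\frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}(\nu)} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}(\nu)}$
$\nu = n - 1$
Taking square roots of both bounds gives the interval for $\sigma$.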
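A worked sketch of this interval, assuming a normally distributed population and hypothetical inputs (s = 2, n = 10). To keep it within the standard library, the chi-square quantile is computed from a series expansion of the regularized incomplete gamma function plus bisection. The helpers `chi2_cdf`, `chi2_quantile`, and `std_dev_interval` are written for this illustration, and a library routine such as SciPy's `scipy.stats.chi2.ppf` would normally be used instead.

```python
from math import exp, lgamma, log, sqrt

def chi2_cdf(x, v):
    """P(X <= x) for chi-square with v d.f., via the series for the
    regularized lower incomplete gamma function P(v/2, x/2)."""
    if x <= 0:
        return 0.0
    a, y = v / 2, x / 2
    term = total = 1.0 / a
    k = 1
    while term > 1e-16 * total:
        term *= y / (a + k)
        total += term
        k += 1
    return total * exp(-y + a * log(y) - lgamma(a))

def chi2_quantile(p, v):
    """Quantile of the chi-square distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, v) < p:   # expand the bracket until it contains the quantile
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def std_dev_interval(s, n, confidence=0.95):
    """Confidence interval for sigma from the corrected sample std. dev. s."""
    alpha = 1 - confidence
    v = n - 1
    lower = sqrt(v * s ** 2 / chi2_quantile(1 - alpha / 2, v))
    upper = sqrt(v * s ** 2 / chi2_quantile(alpha / 2, v))
    return lower, upper

# Hypothetical inputs: corrected sample std. dev. 2.0 from a sample of 10.
sd_lower, sd_upper = std_dev_interval(s=2.0, n=10)
print(round(sd_lower, 3), round(sd_upper, 3))
```

Unlike the interval for the mean, this interval is not symmetric around s, because the chi-square distribution is skewed.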