International Journal of Mathematical Analysis
Vol. 8, 2014, no. 48, 2375 - 2383
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ijma.2014.49287
Estimating Confidence Interval of Mean Using
Classical, Bayesian, and Bootstrap Approaches
Solimun
Department of Mathematics
Faculty of Mathematics and Natural Sciences
University of Brawijaya
Jalan Veteran Malang-Indonesia
Copyright © 2014 Solimun. This is an open access article distributed under the Creative
Commons Attribution License, which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Abstract
A study often aims to describe a characteristic of a population (e.g. the
mean, variance, median, or proportion). Because of practical limitations and
constraints, it is usually not possible to observe every element of the
population. Instead, the characteristic is estimated from a sample drawn at
random from the population. This study concerns interval estimation of the
population mean (µ). Three methods are studied: the classical method, the
Bayesian approach, and the bootstrap approach. The study focuses on estimating
the mean with the three approaches and comparing the results. Applied to one
data set, the three methods give relatively similar estimates of the population
mean, and the bootstrap method produces the narrowest confidence interval.
Keywords: Confidence interval, classic, Bayes, and Bootstrap
1 Introduction
The totality of observations of interest, whether finite or infinite,
constitutes what is known as a population. A study often examines
characteristics of a population. Several statistical measures are used to
describe these characteristics, such as the mean, variance, median, or proportion.
In statistical inference we want to draw conclusions about a population,
although it is impossible or impractical to observe every individual in it. The
alternative is to estimate population parameters from a sample collected at
random from the population. One way of estimating a population parameter from
sample statistics is the confidence interval, which gives a range of values
that is likely to contain the parameter.
Statistical inference theory includes all methods used to draw conclusions
about, or to generalize to, a population. Current developments in parameter
estimation build on the classical method, which bases its conclusions only on
the information in a random sample from the population. Two newer methods
discussed in this study are the Bayesian and bootstrap methods. The Bayesian
method combines subjective knowledge about the probability distribution of the
unknown parameter with the information in the sample data. The bootstrap method
applies the classical estimator to repeated resamples of the data.
Based on this background, the problem addressed is: how are interval
estimates of a population mean obtained with the classical method, the Bayesian
approach, and the bootstrap approach, and how do the three methods compare? The
purpose of this study is to estimate the interval for a population mean with
the classical method, the Bayesian approach, and the bootstrap approach, and to
compare the three methods. The benefit of this study is that researchers can
use the Bayesian and bootstrap approaches as alternatives in parameter
estimation, aside from the popular classical method.
2 Materials and Methods
The data are assumed to come from a normally distributed population,
$X \sim N(\mu, \sigma^2)$, with mean $\mu$ and variance $\sigma^2$. The
population parameters $\mu$ and $\sigma^2$ are unknown. The sample mean
$\bar{X}$ and the sample variance $s^2$ are estimators of the mean and variance
of the population:
$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
where $X_i$ is a random variable drawn at random from the population. The
expected value of the sample mean is $E(\bar{X}) = \mu$ and its standard error
is $Se(\bar{X}) = \sigma/\sqrt{n}$. For a small sample ($n < 30$) from a normal
population, $X \sim N(\mu, \sigma^2)$, with $\sigma^2$ unknown and estimated by
$s^2$, it follows that:
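$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
where $t_{n-1}$ denotes the t distribution with $n-1$ degrees of freedom, so
the confidence interval for the mean, with $\alpha = 0.05$, is:
$$P\left(\bar{X} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 95\%$$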
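The interval formula can be checked numerically with a short Python sketch (Python is used here for illustration; it is not the S-PLUS code of the appendix). It uses the 22 bowling scores of Table 1. The function name `classical_ci` and the constant `T_CRIT_975_DF21` are hypothetical, and the critical value is an assumption copied from a standard t table for 21 degrees of freedom rather than computed.

```python
import math
import statistics

# Bowling scores from Table 1 (n = 22).
scores = [93, 119, 110, 72, 99, 85, 53, 70, 66, 142, 63,
          72, 118, 73, 102, 122, 70, 81, 130, 97, 89, 27]

# Assumption: 0.975 quantile of the t distribution with 21 degrees of
# freedom, taken from a standard t table (not computed here).
T_CRIT_975_DF21 = 2.0796


def classical_ci(sample, t_crit):
    """Return (mean, standard error, lower limit, upper limit)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s = statistics.stdev(sample)      # sample SD with the n - 1 divisor
    se = s / math.sqrt(n)             # standard error of the mean
    half_width = t_crit * se          # critical value times standard error
    return mean, se, mean - half_width, mean + half_width


mean, se, lower, upper = classical_ci(scores, T_CRIT_975_DF21)
print(f"mean={mean:.2f} se={se:.2f} CI=[{lower:.2f}, {upper:.2f}]")
```

Because the same half-width is subtracted from and added to the sample mean, the interval is symmetrical about the estimate by construction.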
In the classical approach, the confidence interval is derived from asymptotic
sampling theory, while in the Bayesian approach it is derived from the
posterior distribution, which is generated from the sample data together with a
prior density on the parameters.
At the first level of the model, the sample is assumed to be normally
distributed:
Level 1 (DATA): $X_i \sim N(\mu, \sigma^2)$.
At the second level, a prior distribution is specified for $\mu$:
Level 2 (PRIOR): $\mu \sim N(\mu_\mu, \sigma^2_\mu)$.
At the third and last level, hyperprior distributions are specified for
$\sigma^2$, $\mu_\mu$, and $\sigma^2_\mu$:
Level 3 (HYPERPRIOR): $P(\sigma^2)$, $P(\mu_\mu)$, and $P(\sigma^2_\mu)$.
The Bayesian approach generates a sample $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(k)}$
of the unobserved parameter from the distribution of $\mu$. The generated
sample estimates the posterior distribution of $\mu$, from which the posterior
mean is calculated. The 95% interval estimate for the mean is obtained from the
2.5% and 97.5% percentiles of the simulated values.
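As a compact restatement of the three levels, the joint posterior targeted by the simulation can be written as below. This is a sketch in the paper's notation. The symbol $x = (x_1, \ldots, x_n)$ for the observed data, the notation $N(\cdot \mid \cdot, \cdot)$ for a normal density, and $q_p$ for an empirical quantile are introduced here for illustration only.

```latex
p(\mu, \sigma^2, \mu_\mu, \sigma^2_\mu \mid x)
  \;\propto\;
  \Bigl[\prod_{i=1}^{n} N(x_i \mid \mu, \sigma^2)\Bigr]\,
  N(\mu \mid \mu_\mu, \sigma^2_\mu)\,
  P(\sigma^2)\, P(\mu_\mu)\, P(\sigma^2_\mu)

\text{95\% interval for } \mu:\quad
  \bigl[\, q_{0.025}(\mu^{(1)}, \ldots, \mu^{(k)}),\;
          q_{0.975}(\mu^{(1)}, \ldots, \mu^{(k)}) \,\bigr]
```

The first bracket is the Level 1 likelihood, the second factor is the Level 2 prior, and the last three factors are the Level 3 hyperpriors.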
The bootstrap method is based on resampling. The distribution of the data is
assumed to be unknown. Let $x_1, x_2, \ldots, x_n$ be a random sample from an
unknown distribution $F$, let $\theta = \theta(F)$ be the parameter, and let
$\hat{\theta} = T(x_1, \ldots, x_n)$ be the estimate of $\theta$. The estimate
$\hat{\theta}^* = T(x_1^*, \ldots, x_n^*)$ obtained from a bootstrap sample
$(x_1^*, \ldots, x_n^*)$, drawn with replacement from the observed data, is
called a bootstrap replication of $\hat{\theta}$.
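The resampling idea can be sketched in a few lines of Python (an illustration, not the S-PLUS code of the appendix). The statistic $T$ is taken to be the sample mean, and the interval uses the 2.5% and 97.5% empirical quantiles of the replications. The function name `bootstrap_percentile_ci`, its parameters, the seed, and the simple rounded-rank quantile rule are all assumptions made for this example.

```python
import random
import statistics

# Bowling scores from Table 1 (n = 22).
scores = [93, 119, 110, 72, 99, 85, 53, 70, 66, 142, 63,
          72, 118, 73, 102, 122, 70, 81, 130, 97, 89, 27]


def bootstrap_percentile_ci(sample, n_boot=1000, seed=0):
    """Return (sorted replications, lower limit, upper limit) for the mean."""
    rng = random.Random(seed)         # fixed seed so the run is repeatable
    n = len(sample)
    # Each replication is the mean of n values drawn with replacement.
    reps = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)
    )
    # Simple empirical quantiles: nearest rank on the sorted replications.
    lower = reps[round(0.025 * (n_boot - 1))]
    upper = reps[round(0.975 * (n_boot - 1))]
    return reps, lower, upper


reps, lower, upper = bootstrap_percentile_ci(scores, n_boot=2000, seed=1)
print(f"bootstrap mean={statistics.fmean(reps):.2f} "
      f"sd={statistics.stdev(reps):.2f} CI=[{lower:.2f}, {upper:.2f}]")
```

The exact limits change with the seed and the number of replications, so the printed values should be read as one Monte Carlo realization rather than as fixed results.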
This study uses data of bowling scores presented in Table 1.
Table 1: Bowling Game Scores
No   Score    No   Score
 1     93     12     72
 2    119     13    118
 3    110     14     73
 4     72     15    102
 5     99     16    122
 6     85     17     70
 7     53     18     81
 8     70     19    130
 9     66     20     97
10    142     21     89
11     63     22     27
Using the three methods (classical, Bayesian, and bootstrap), the confidence
interval for the population mean was estimated. The software used was S-PLUS
and WinBUGS.
3 Results and Discussion
Figure 1 shows the histogram and the density function of the data. We can see
that the data have an asymmetrical distribution. The estimate of the population
mean is 88.77 with standard deviation (standard error of the mean) 5.90. The
95% confidence interval for the population mean is [76.51; 101.05].
Figure 1: Histogram and density function of the data
The Bayesian approach was carried out in WinBUGS. First, the model, the data,
and the initial values were defined. Next, a simulation of 10000 iterations was
run. Figure 2 shows the value of the mean obtained at each iteration. The final
analysis was based on iterations 1001 to 10000.
Figure 2: Trace plot of mu against iteration for the population mean
(after discarding the first 1000 iterations)
The estimated density function of the posterior distribution of the
population mean is presented in Figure 3. The interval estimate for the
population mean is obtained from the 2.5% and 97.5% quantiles of the simulated
values.
Figure 3: Posterior density function of the population mean (mu, sample: 9001)
The estimate of the mean of the posterior distribution is 88.58 and the
standard deviation is 6.18. The 95% confidence interval for the population mean
is [76.46; 100.30].
The maximum likelihood estimate under the bootstrap approach is 88.78 with
standard deviation 5.75. The 95% confidence interval for the population mean is
[77.50; 99.95]. Figure 4 shows the histogram and density function of the sample
mean based on 1000 bootstrap replications.
Figure 4: Histogram and density function of the central value (theta) based on
1000 bootstrap replications
The results of the three methods are presented in the following table:
Table 2: Central Value, Standard Deviation, and Confidence Interval for the
Classical, Bayes, and Bootstrap Methods
                                                Confidence interval
Method      Central value  Standard deviation  Lower limit  Upper limit  Width
Classical       88.77             5.90            76.51       101.05     24.54
Bayes           88.58             6.18            76.46       100.30     23.84
Bootstrap       88.78             5.74            77.50        99.95     22.45
Table 2 shows that the means obtained from the three approaches were
nearly the same, especially for the classical and bootstrap methods. For the
standard deviation, the Bayesian method had the largest value and the bootstrap
method the smallest. Similarly, the bootstrap method had the narrowest
confidence interval. The main differences between the three methods were: 1)
the classical and Bayesian methods required a distributional assumption for the
data, while the bootstrap method did not assume any particular distribution;
2) the classical interval was obtained by multiplying the standard error by a
critical value, which made the interval symmetrical about the mean estimate,
while the Bayesian and bootstrap intervals used the 2.5% and 97.5% quantiles,
which produced asymmetrical confidence intervals.
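The width and symmetry comparison can be reproduced directly from the values reported in Table 2. The Python sketch below is illustrative only; the names `table2` and `interval_shape` are hypothetical, and the numbers are copied from the table rather than recomputed from the data.

```python
# Values copied from Table 2: (central value, lower limit, upper limit).
table2 = {
    "Classical": (88.77, 76.51, 101.05),
    "Bayes":     (88.58, 76.46, 100.30),
    "Bootstrap": (88.78, 77.50, 99.95),
}


def interval_shape(center, lower, upper):
    """Return (width, left arm, right arm) of an interval around center."""
    return upper - lower, center - lower, upper - center


summary = {method: interval_shape(*vals) for method, vals in table2.items()}
for method, (width, left, right) in summary.items():
    print(f"{method:<10} width={width:.2f} left={left:.2f} right={right:.2f}")
```

With these inputs the classical arms are 12.26 and 12.28, equal up to rounding of the reported limits, while the Bayesian arms (12.12 and 11.72) and the bootstrap arms (11.28 and 11.17) differ, which matches the symmetry argument in point 2.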
4 Conclusion
Confidence intervals can be estimated with the classical, Bayesian, and
bootstrap methods. In the application to one data set, the three methods gave
relatively similar results. The bootstrap method had the narrowest confidence
interval, indicating that this method was the most precise, and it is
recommended.
Acknowledgements. Many thanks to University of Brawijaya for financial
support.
Received: August 12, 2014
Appendix 1. Code
Classical Method
data<-c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,
89,27)
par(mfrow=c(1,2))
hist(data,col=0,nclass=7)                 # histogram of the data
dx<-density(data)                         # density estimate
x.grid<-dx$x                              # grid of the density (data is not overwritten)
plot(x.grid,dx$y,type="l",xlab="data")
s.data<-sum(data)                         # sum of the data
ssq.data<-sum(data^2)                     # sum of squares of the data
n<-length(data)                           # sample size
ml.x<-s.data/n                            # estimator of the mean
ml.sd<-sqrt((ssq.data-s.data^2/n)/(n-1))  # estimator of the standard deviation (n-1 divisor)
ml.se<-ml.sd/sqrt(n)                      # standard error
df<-n-1                                   # degrees of freedom
CIL<-ml.x-qt(0.975,df)*ml.se              # lower limit CI
CIU<-ml.x+qt(0.975,df)*ml.se              # upper limit CI
ml.x
ml.se
CIL
CIU
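Bayesian Approach
model
{
   for( i in 1 : N )
   {
      data[i] ~ dnorm(mu,tau.c)       # Level 1: data; tau.c is a precision (1/variance)
   }
   tau.c ~ dgamma(0.001,0.001)        # hyperprior on the data precision
   mu ~ dnorm(alpha,tau.alpha)        # Level 2: prior on the mean
   alpha ~ dnorm(0.0,1.0E-6)          # hyperprior on the prior mean
   tau.alpha ~ dgamma(0.001,0.001)    # hyperprior on the prior precision
}
# data
list(N=22,
data=c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,89,
27))
# initial values
list(mu=10, alpha = 0, tau.c = 1, tau.alpha = 1)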
Bootstrapping Method
data<-c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,
89,27)
n<-length(data)                               # sample size
B<-10000                                      # number of bootstrap replications
theta.x<-numeric(B)                           # vector to keep the theta
for (i in 1:B)
{
   data.boot<-sample(data,size=n,replace=TRUE)   # draw non-parametric bootstrap sample
   theta.x[i]<-mean(data.boot)                   # calculate theta
}
mu<-mean(theta.x)                             # mean of theta
sd.theta<-sqrt(var(theta.x))                  # standard deviation of theta
CIL<-quantile(theta.x,probs=0.025)            # lower limit confidence interval
CIU<-quantile(theta.x,probs=0.975)            # upper limit confidence interval
mu
sd.theta
CIL
CIU
par(mfrow=c(1,2))
hist(theta.x,col=0,nclass=n)                  # histogram of replicates
dx<-density(theta.x)                          # density estimate
theta<-dx$x
plot(theta,dx$y,type="l")