International Journal of Mathematical Analysis, Vol. 8, 2014, no. 48, 2375 - 2383
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ijma.2014.49287

Estimating Confidence Interval of Mean Using Classical, Bayesian, and Bootstrap Approaches

Solimun
Department of Mathematics, Faculty of Mathematics and Natural Sciences
University of Brawijaya, Jalan Veteran, Malang, Indonesia

Copyright © 2014 Solimun. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A study often aims to describe characteristics of a population (e.g. the mean, variance, median, or proportion). Because of practical limitations and constraints, it is usually not possible to observe every element of the population, so the characteristics are estimated from a sample drawn at random from the population. This study concerns interval estimation of the population mean ($\mu$). Three methods are studied: the classical method, the Bayesian approach, and the bootstrap approach. The study estimates the mean with each of the three approaches and compares the results. Applied to one data set, the three methods gave nearly the same estimate of the population mean, and the bootstrap method produced the narrowest confidence interval.

Keywords: Confidence interval, classical, Bayes, bootstrap

1 Introduction

All observations, whether finite or infinite, make up what is known as a population. In a study, characteristics of a population are sometimes observed. Several statistical measures are used to describe the characteristics of a population, such as the mean, variance, median, or proportion.
In statistical inference we want to draw conclusions about populations, although it is impossible or impractical to observe every individual in a population. An alternative is to estimate population characteristics from a sample collected at random from the population. One system for estimating a population parameter from sample statistics is the confidence interval, which produces representative estimates of the parameter.

Statistical inference theory includes all methods used in drawing conclusions or generalizing about a population. The prevailing practice in estimating a population parameter is the classical method, which bases its conclusions only on information from a random sample drawn from the population. Two newer methods discussed in this study are the Bayesian and bootstrap methods. The Bayesian method combines subjective knowledge about the probability distribution of the unknown parameter with information from the sample data. The bootstrap method applies the classical method together with resampling.

Based on this background, the problem discussed is: how are interval estimates of a population mean obtained with the classical method, the Bayesian approach, and the bootstrap approach, and how do the three methods compare? The purpose of this study is to apply interval estimation of a population mean using the three methods and to compare them. The benefit of this study is that researchers can use the Bayesian and bootstrap approaches as alternatives for parameter estimation, aside from the popular classical method.

2 Materials and Methods

The data are assumed to come from a normally distributed population, $X \sim N(\mu, \sigma^2)$, where $X$ has mean $\mu$ and variance $\sigma^2$. The population parameters $\mu$ and $\sigma^2$ are unknown.
The sample mean $\bar{X}$ and sample variance $s^2$ are estimators of the mean and variance of the population:

$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\sigma}^2 = s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

where $X_i$ is a random variable drawn at random from the population. The expected value of the sample mean is $E(\bar{X}) = \mu$ and its standard deviation is $Se(\bar{X}) = \sigma/\sqrt{n}$. For a small sample ($n < 30$) from a normally distributed population ($X \sim N(\mu, \sigma^2)$) with $\sigma^2$ unknown and estimated by $s^2$, it follows that:

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$

where $t_{n-1}$ denotes the t distribution with $n-1$ degrees of freedom, so the confidence interval for the mean (with $\alpha = 0.05$) is:

$$P\left(\bar{X} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 95\%$$

In the classical approach, the confidence interval is derived from asymptotic sampling theory, whereas in the Bayesian approach it is derived from the posterior distribution, which combines the sample data with prior distributions for the parameters. At the first level of the model, the distribution of the sample is assumed to be normal:

Level 1 (DATA): $X_i \sim N(\mu, \sigma^2)$.

The second level specifies a prior distribution for $\mu$:

Level 2 (PRIOR): $\mu \sim N(\mu_\mu, \sigma^2_\mu)$.

The third and last level specifies hyperprior distributions for $\sigma^2$, $\mu_\mu$, and $\sigma^2_\mu$:

Level 3 (HYPERPRIOR): $P(\sigma^2)$, $P(\mu_\mu)$, and $P(\sigma^2_\mu)$.

The Bayesian approach generates a sample $\mu^{(1)}, \mu^{(2)}, \ldots, \mu^{(k)}$ of the unobserved parameter from the distribution of $\mu$. The generated values are used to estimate the posterior distribution of $\mu$ and to calculate the posterior mean. The 95% interval estimate for the mean is obtained from the 2.5% and 97.5% percentiles of the simulated values.

The bootstrap method uses resampling, and the distribution of the data is assumed to be unknown. Let $x_1, x_2, \ldots, x_n$ be a random sample from an unknown distribution $F$, let $\theta = \theta(F)$ be the parameter, and let $\hat{\theta} = T(x_1, \ldots, x_n)$ be the estimate of $\theta$. The estimator $\hat{\theta}^* = T(x_1^*, \ldots, x_n^*)$ obtained from a bootstrap sample $(x_1^*, \ldots, x_n^*)$ is called a bootstrap replication of $\hat{\theta}$.
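The classical and bootstrap procedures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the S-PLUS code used in the study: the function names, the fixed random seed, and the hard-coded critical value $t_{0.975,\,21} \approx 2.0796$ (valid only for $n = 22$) are choices made for this sketch, and the scores are the bowling data listed in Table 1.

```python
import math
import random
import statistics

# Bowling scores as listed in Table 1 of the paper.
SCORES = [93, 119, 110, 72, 99, 85, 53, 70, 66, 142, 63,
          72, 118, 73, 102, 122, 70, 81, 130, 97, 89, 27]

# Tabulated t quantile t_{0.975, 21}; an assumption hard-coded for n = 22 only.
T_975_DF21 = 2.0796


def classical_interval(x, t_crit):
    """Symmetric t interval: mean -/+ t_crit * s / sqrt(n)."""
    n = len(x)
    mean = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mean, se, mean - t_crit * se, mean + t_crit * se


def bootstrap_interval(x, n_boot=10000, seed=2014):
    """Percentile interval from the means of resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed (hypothetical choice) for repeatability
    n = len(x)
    reps = [statistics.fmean(rng.choices(x, k=n)) for _ in range(n_boot)]
    cuts = statistics.quantiles(reps, n=40)  # 39 cut points, one every 2.5%
    # cuts[0] is the 2.5% quantile and cuts[-1] the 97.5% quantile.
    return statistics.fmean(reps), statistics.stdev(reps), cuts[0], cuts[-1]


if __name__ == "__main__":
    print("classical (mean, se, lower, upper):",
          classical_interval(SCORES, T_975_DF21))
    print("bootstrap (mean, sd, lower, upper):",
          bootstrap_interval(SCORES))
```

The classical interval is symmetric about the sample mean by construction, while the bootstrap limits are read directly from the empirical quantiles of the replications and need not be symmetric. The bootstrap figures depend on the seed and on the number of replications, so they should be expected to differ slightly from run to run.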
This study uses the bowling scores presented in Table 1.

Table 1: Bowling Game Scores

No      1    2    3    4    5    6    7    8    9   10   11
Score  93  119  110   72   99   85   53   70   66  142   63

No     12   13   14   15   16   17   18   19   20   21   22
Score  72  118   73  102  122   70   81  130   97   89   27

The confidence interval for the population mean was estimated using the three methods: classical, Bayesian, and bootstrap. The software used was S-PLUS and WinBUGS.

3 Result and Discussion

Figure 1 shows the histogram and density function of the data. The data have an asymmetrical distribution. The estimate of the population mean is 88.77, with a standard deviation of the estimate (standard error of the mean) of 5.90. The 95% confidence interval for the population mean is [76.51; 101.05].

[Figure 1: Histogram and Density Function of the Data]

The Bayesian approach used the software WinBUGS. First, the model, the data, and the initial values were defined. Next, a simulation with 10000 iterations was run. Figure 2 shows the value of the mean obtained at every iteration. The final analysis was based on the 1001st to 10000th iterations.

[Figure 2: Trace plot for the population mean, mu against iteration (after discarding the first 1000 iterations)]

The estimated density function of the posterior distribution of the population mean is presented in Figure 3. The interval estimate for the population mean is obtained from the 2.5% and 97.5% quantiles of the simulated values.

[Figure 3: Posterior density function of the population mean (mu sample: 9001)]

The estimate of the mean of the posterior distribution is 88.58, and the standard deviation is 6.18. The 95% interval for the population mean is [76.46; 100.30].
The maximum likelihood estimate in the bootstrap approach is 88.78, with standard deviation 5.75. The 95% confidence interval for the population mean is [77.50; 99.95]. Figure 4 shows the histogram and density function of the sample mean based on 1000 bootstrap iterations.

[Figure 4: Histogram and Density Function of the Mean Based on 1000 Bootstrap Samples]

The results of the three methods are presented in the following table:

Table 2: Estimates of the Mean, Standard Deviation, and Confidence Interval for the Classical, Bayes, and Bootstrap Methods

                                          Confidence interval
Method      Mean    Standard deviation    Lower limit   Upper limit   Width
Classical   88.77   5.90                  76.51         101.05        24.54
Bayes       88.58   6.18                  76.46         100.30        23.84
Bootstrap   88.78   5.74                  77.50          99.95        22.45

Table 2 shows that the means obtained from the three approaches were nearly the same, especially for the classical and bootstrap methods. For the standard deviation, the Bayesian method gave the largest value and the bootstrap method the smallest. The bootstrap method also gave the narrowest confidence interval. The main differences among the three methods were:

1) The classical and Bayesian methods require a distributional assumption for the data, while the bootstrap method does not assume that the data follow a particular distribution.

2) The classical interval is obtained by multiplying the standard error by a critical value, which makes the interval symmetric about the estimate of the mean. The Bayesian and bootstrap intervals use the 2.5% and 97.5% quantiles, which can produce an interval that is asymmetric about the estimate.

4 Conclusion

Confidence intervals can be estimated using the classical, Bayesian, and bootstrap methods.
In the application to one data set, the three methods gave relatively similar results. The bootstrap method had the narrowest confidence interval, indicating that this method was more precise, and it is recommended.

Acknowledgements. Many thanks to University of Brawijaya for financial support.

References

[1] Dukic, V., and Hogan, J.W. A hierarchical bayesian approach to modeling embryo implantation following in vitro fertilization. http://biostatistics.oxfordjournals.org/cgi/reprint/3/3/361.pdf.

[2] Friedman, N., Goldszmidt, M., and Wyner, A. Data analysis with Bayesian networks: a bootstrap approach. Http://www.cs.huji.ac.il/~nir/Abstracts/FGW2.html.

[3] Matthew, J. B., Falciani, F., Ghahramani, Z., Rangel, C., and Wild, D. L. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Http://bioinformatics.oxfordjournals.org/cgi/content/full/21/3/349.

[4] Walpole, R.E. 1995. Pengantar Statistika. (In Indonesian) PT. Gramedia Pustaka Utama, Indonesia.

Received: August 12, 2014

Appendix 1.
Code

Classical Method

    data <- c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,89,27)

    par(mfrow=c(1,2))
    hist(data, col=0, nclass=7)                     # histogram of the data
    dx <- density(data)
    x.grid <- dx$x                                  # density grid, kept separate so data is not overwritten
    plot(x.grid, dx$y, type="l", xlab="data")       # density estimate

    s.data <- sum(data)                             # sum of the data
    ssq.data <- sum(data^2)                         # sum of squares of the data
    n <- length(data)                               # sample size

    ml.x <- s.data/n                                # maximum likelihood estimator for mean
    ml.sd <- sqrt((ssq.data - s.data^2/n)/(n-1))    # sample standard deviation (n-1 divisor)
    ml.se <- ml.sd/sqrt(n)                          # standard error
    df <- n-1                                       # degrees of freedom
    CIL <- ml.x - qt(0.975,df)*ml.se                # lower limit CI
    CIU <- ml.x + qt(0.975,df)*ml.se                # upper limit CI

    ml.x
    ml.se
    CIL
    CIU

Bayesian Approach

    model
    {
        for( i in 1 : N ) {
            data[i] ~ dnorm(mu,tau.c)
        }
        tau.c ~ dgamma(0.001,0.001)
        mu ~ dnorm(alpha,tau.alpha)
        alpha ~ dnorm(0.0,1.0E-6)
        tau.alpha ~ dgamma(0.001,0.001)
    }

    # data
    list(N=22, data=c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,89,27))

    # initial values
    list(mu=10, alpha = 0, tau.c = 1, tau.alpha = 1)

Bootstrapping Method

    data <- c(93,119,110,72,99,85,53,70,66,142,63,72,118,73,102,122,70,81,130,97,89,27)
    n <- length(data)                               # sample size (defined here so this block runs on its own)

    B <- 10000                                      # number of bootstrap samples
    theta.x <- c(1:B)                               # vector to keep the theta
    for (i in 1:B) {
        data.boot <- sample(data,size=n,replace=T)  # draw non-parametric bootstrap sample
        theta.x[i] <- mean(data.boot)               # calculate theta
    }

    mu <- mean(theta.x)                             # mean of theta
    sd <- stdev(theta.x)                            # standard deviation of theta (stdev is S-PLUS; R uses sd)
    CIL <- quantile(theta.x,probs=0.025)            # lower limit confidence interval
    CIU <- quantile(theta.x,probs=0.975)            # upper limit confidence interval

    mu
    sd
    CIL
    CIU

    par(mfrow=c(1,2))
    hist(theta.x,col=0,nclass=n)                    # histogram of replicates
    dx <- density(theta.x)
    theta <- dx$x
    plot(theta,dx$y,type="l")                       # density estimate