Estimation

Sampling Distributions
• Because estimators are based on random samples, they are random variates just like data!
• Estimators have distributions called sampling distributions.
Say we are interested in the mean mass of manganese (Mn) contained in bullets manufactured at a particular factory. Let's use the average mass of Mn in a sample (of size n) to estimate the population mean mass:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
What might the distribution of $\bar{X}$ look like if we take 1000 samples of size 10 over the course of a week?

Sampling Distributions
Important features of an estimator's sampling distribution: its mean and its standard deviation.
[Figure: (Approximate) sampling distribution of $\bar{X}$; sample size n = 10 bullets, number of samples = 1000; annotated with the sampling distribution mean and the sampling distribution s.d.]

Handy Unbiased Estimators
• An unbiased estimator of the mean that we always use is the sample average (same as the MLE estimate):
$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
• An unbiased estimator of the variance (which we will typically use as a variance estimator) is the sample variance (different from the MLE estimate, which divides by n instead of n − 1):
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

Handy Unbiased Estimators
• An unbiased estimator for a proportion is:
$\hat{p} = \frac{X}{n}$, where X is the number of "successes" (heads, successes, etc.) in n trials.
• An estimator of the standard error of $\hat{p}$ (plugging $\hat{p}$ in for p) is:
$\hat{\sigma}_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Sampling Distributions
• Uncertainty in the estimate can be represented as the standard deviation of its sampling distribution, $\sigma_{\hat{\theta}}$, which is called the standard error of the estimator $\hat{\theta}$.
• For the sample average, $\sigma_{\bar{X}} = \sigma/\sqrt{n}$; plugging in s for σ gives the estimated standard error:
$\hat{\sigma}_{\bar{X}} = \frac{s}{\sqrt{n}}$

Interval Estimation
• We are interested in methods that produce an interval $[\hat{\theta}_L, \hat{\theta}_U]$.
• Given that the assumptions of the method are satisfied, the interval covers the true value of the parameter with (approximate) probability at least 1 − α.
• Common interval methods:
  • Confidence intervals
  • Prediction intervals
  • Tolerance intervals
  • Credibility/probability intervals (Bayesian)

Confidence Intervals
• θ is a parameter we are interested in, and we assume we don't know its true value (e.g. a mean, a sd, a proportion, etc.).
• Consider an experiment that will collect a sample of data.
• Then BEFORE we collect the data, we can devise a procedure such that
$P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$
where $\hat{\theta}_L$ and $\hat{\theta}_U$ are the estimates we will get from the sample we have yet to collect.

Confidence Intervals
• In order to get actual numerical values for $\hat{\theta}_L$ and $\hat{\theta}_U$, we perform the experiment and plug in the data.
• The outcomes for this experiment are: the interval covers θ, or it does not.
• Under the frequentist definition, probabilities (other than 0 or 1) only exist for outcomes of experiments that haven't happened yet.
• After we collect the data, $[\hat{\theta}_L, \hat{\theta}_U]$ is a set of plausible values for θ.

Confidence Intervals
• Given a sample of data, the (1 − α)×100% confidence interval for a parameter, estimated from the sample, is $[\hat{\theta}_L, \hat{\theta}_U]$.
• We are (1 − α)×100% confident that the true value of θ is covered by $[\hat{\theta}_L, \hat{\theta}_U]$.
• The CI's level of confidence, (1 − α)×100%, is the same "number" as the CI method's probability of producing an interval that covers θ, but... confidence is not probability.

Confidence Intervals
• So how do we compute a (1 − α)×100% confidence interval given a set of data?
• General case: (1 − α)×100% CIs for the mean μ, with sample size n and sd $\sigma_X$ unknown and estimated by s:
  Two sided: $\left[\bar{x} - t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{1-\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right]$
  One sided, lower bound: $\left[\bar{x} - t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}},\ \infty\right)$
  One sided, upper bound: $\left(-\infty,\ \bar{x} + t_{1-\alpha,\,n-1}\frac{s}{\sqrt{n}}\right]$
• Student-t(n − 1) quantiles in R: qt(1 - alpha/2, df = n - 1) for two-sided intervals, or qt(1 - alpha, df = n - 1) for one-sided bounds.

Compute the Confidence Intervals
The mass of an unknown powder was determined 30 times. The results are shown below (units: mg):
4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75, 5.09, 3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81, 2.98, 3.82, 4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37
Compute:
a. The sample mean
b. The sample sd
c. The estimated standard error of the mean
d. The number of estimated standard errors that cover 95% of the sampling distribution symmetrically about the sample mean: ±t

Compute the Confidence Intervals
a. Sample mean = 3.83 mg
b. Sample sd = 0.58 mg
c. Est. se of mean = 0.11 mg
d. For 95%, α = 0.05.
   To spread 95% symmetrically about the mean we want $t_{0.025,\,29}$ and $t_{0.975,\,29}$, i.e. ±2.04523.

    # Data from the question:
    x <- c(4.11, 3.70, 3.36, 3.68, 4.42, 3.23, 4.03, 4.03, 3.52, 4.75,
           5.09, 3.47, 3.02, 4.24, 4.74, 4.51, 2.90, 4.15, 3.54, 3.81,
           2.98, 3.82, 4.32, 3.06, 4.00, 4.05, 3.19, 3.17, 3.67, 4.37)

    n  <- length(x)      # Sample size
    mn <- mean(x)        # Sample average (estimated mean)
    s  <- sd(x)          # Sample standard deviation
    se <- s/sqrt(n)      # Estimated standard error of the mean

    alpha <- 0.05                    # Level of significance (confidence level = 1 - alpha)
    conf  <- 1 - alpha/2             # Upper-tail probability for the t-quantile (0.975)
    tt    <- qt(p = conf, df = n-1)  # t-quantile: +/- tt estimated standard errors cover
                                     # (1 - alpha)*100% of the sampling distribution for the mean.

Compute the Confidence Intervals
e. Compute the two-sided 95% CI for the mean given this data:
   [3.83 - 2.04×0.11, 3.83 + 2.04×0.11]

    lo <- mn - tt*se
    hi <- mn + tt*se
    c(lo, hi)   # Two-sided confidence interval: a set of plausible
                # values for the mean given this sample.

   [3.61, 4.05]

Confidence Intervals
• For us, we can approximate the CI for any parameter we have encountered as its estimate plus or minus a t-quantile times its estimated standard error.
• (1 − α)×100% CIs for a general parameter θ:
  Two sided: $\left[\hat{\theta} - t_{1-\alpha/2,\,n-1}\,\hat{\sigma}_{\hat{\theta}},\ \hat{\theta} + t_{1-\alpha/2,\,n-1}\,\hat{\sigma}_{\hat{\theta}}\right]$
  One sided, lower bound: $\left[\hat{\theta} - t_{1-\alpha,\,n-1}\,\hat{\sigma}_{\hat{\theta}},\ \infty\right)$
  One sided, upper bound: $\left(-\infty,\ \hat{\theta} + t_{1-\alpha,\,n-1}\,\hat{\sigma}_{\hat{\theta}}\right]$
• Student-t(n − 1) quantiles in R: qt(1 - alpha/2, df = n - 1) for two-sided intervals, or qt(1 - alpha, df = n - 1) for one-sided bounds.

Example
Over a several-month period, the rate of attacks per day on a certain computer network was measured:
11.1, 12.3, 12.0, 11.3, 12.6, 12.9, 12.0, 13.2, 11.8, 13.2, 12.4, 10.3, 12.0, 12.1, 13.1
Compute the 90% lower confidence limit of the hack rate parameter.
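One way to work the attack-rate example is sketched below in R, applying the one-sided lower-bound formula for a general parameter. It assumes that the "hack rate parameter" is the mean daily attack rate, estimated by the sample average with estimated standard error s/sqrt(n). If the rate is meant to follow a different model (for example, a Poisson rate), the standard error term would change. The variable names `rate`, `theta` and `lower` are illustrative choices, not from the original.

```r
# Daily attack-rate measurements from the example
rate <- c(11.1, 12.3, 12.0, 11.3, 12.6, 12.9, 12.0, 13.2, 11.8, 13.2,
          12.4, 10.3, 12.0, 12.1, 13.1)

# Assumption: the "hack rate parameter" is the mean daily attack rate,
# estimated by the sample average with standard error s/sqrt(n).
n     <- length(rate)          # Sample size
theta <- mean(rate)            # Point estimate of the rate parameter
se    <- sd(rate)/sqrt(n)      # Estimated standard error of the estimate

alpha <- 0.10                          # 90% confidence -> alpha = 0.10
tt    <- qt(p = 1 - alpha, df = n - 1) # One-sided: all of alpha in one tail

lower <- theta - tt*se         # 90% lower confidence limit
lower
```

Under this assumption the lower limit comes out to about 11.87 attacks per day, so we would be 90% confident that the true mean rate is at least that large. Note that the one-sided bound uses qt(1 - alpha, ...) rather than qt(1 - alpha/2, ...), because the whole of α sits in a single tail.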