CHAPTER 7: Estimation and Statistical Intervals

1 Point Estimation

Question: When shooting at a target, what does it mean to be accurate? What does it mean to be precise? Can you have one without the other?

1.1 Definitions:
1. Unbiased is when the ______ of the sample parameter is equal to the ______ it is estimating.
2. A Consistent Estimator is one for which, as the number of data points used to calculate the parameter of interest increases indefinitely, the resulting sequence of estimates converges in probability to the ______.

1.2 Examples:
1. Sketch the sampling distributions for biased and unbiased estimators θ̂ of θ:
2. Sketch the concept of a consistent estimator:

Remark: x̄ is an unbiased and consistent estimator of the population parameter µ by the ______.

2 Large-Sample Confidence Intervals for a Population Mean

Question: We have the point estimator x̄, but how do we know how well the estimate determines the population mean?

2.1 Definitions:
A Confidence Interval is a type of statistical inference that is an interval estimate of the population parameter of interest, i.e., for the confidence interval of µ, ______ where 100(1 − α)% is the level of confidence.

Draw Concept:

Remark:
1. The formula is valid for ______ sample sizes, or when n > 30. The level of confidence determines zα/2, a z score which is also referred to as a "______".
2. A CI does not predict that the true value of the parameter has a particular probability of being in the CI given the data actually obtained.
3. Two interpretations: ______ and ______. There is a 95% probability that the calculated CI from some future experiment encompasses the true value of the population parameter.

2.2 Example:
1. Conceptual Question: Which CI on the average height of a person in the world is most useful from the list below and why?
• 100% CI of (2ft, 8ft 6in)
• 95% CI of (5ft 2in, 6ft 1in)
SOLUTION:
2. In Middle Earth, dwarves are famous for drinking. Suppose we collected data on how much dwarves drank in a typical week.
We observe 128 dwarves and find the average dwarf drinks 16.79 liters with a standard deviation of 1.36 liters. (Values are based on the University of Penn. study on male college freshmen drinking where the sample mean is 16.79 oz and the standard deviation is 13.57 oz.)
(a) Find a 95% confidence interval for µ, the true mean consumption of alcohol among the dwarven population.
SOLUTION:
(b) Provide an interpretation for the confidence interval.
SOLUTION:
(c) Find a 99% confidence interval for the true mean consumption of alcohol among the dwarven population.
SOLUTION:
(d) What happens to the width of the confidence interval as the confidence level increases?
SOLUTION:
(e) Do we need to make any assumptions concerning the shape of the distribution we are sampling from?
SOLUTION:

2.3 Two Sided and One Sided Confidence Intervals
The confidence intervals calculated in the previous example are called ______. ______ are intervals looking at the upper or lower bound, i.e.,
Upper Bound:
Lower Bound:

2.4 Examples:
1. The average adult male American is 70 inches or 5'10" tall with a standard deviation of 4 inches. Find the 95% CI and convert it to a lower bound CI for male heights where the number of observations is 100.
SOLUTION:
2. In Middle Earth, hobbits are also famous for drinking (not as much as dwarves). Suppose we collected data on how much hobbits drank in a typical week. We observe 184 hobbits and find the average hobbit drinks 10.79 liters with a standard deviation of 1.15 liters. Find a 90% upper confidence bound for µ.
SOLUTION:

2.5 Choosing n
Question: How does one choose n?
Answer: In practice, when researchers choose a sample size, they usually have a desired level of precision in mind or want their estimate to be within a certain margin of error.
Bound on the Error of Estimation, denoted by B, is the half-width of the confidence interval, i.e., ______
For a given B, the following expression provides a way to determine how large a sample should be taken.
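As a minimal sketch, assuming the standard large-sample expression n = (zα/2 · s / B)², rounded up to the next whole observation, the calculation might look like the following. The function name and the example inputs are illustrative and do not come from the notes.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s, B, confidence=0.95):
    """Smallest n with z_{alpha/2} * s / sqrt(n) <= B (large-sample assumption)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    return math.ceil((z * s / B) ** 2)       # round up: n must be a whole number


# Hypothetical inputs: guessed standard deviation 2.0, desired half-width 0.5.
n = sample_size_for_mean(s=2.0, B=0.5)
print(n)
```

Note that B is the half-width, so a problem stated in terms of the full interval width needs that width halved before calling the function.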
Remark: When we have no idea of the value of s, we can use the rule of thumb of estimating σ to be ______

2.6 Examples:
1. For the dwarf problem, suppose the observers had a rough guess of 1.6 liters for the value of s. What sample size would be necessary to obtain an interval width of 0.5 liters? Use a 95% confidence level.
SOLUTION:
2. For the male adult American problem, what sample size would be necessary to obtain an interval width of 1 inch? Use a 95% confidence level.
SOLUTION: The shortest person is 2ft and the tallest is 8'6". So the range is 78 inches and the estimated standard deviation is 19.5 inches.

3 More Large-Sample Confidence Intervals

Question: What about other point estimators such as p?

3.1 Definitions:
1. Let π denote the proportion of elements in a population that have a certain characteristic. Then, π can be estimated by ______
2. The confidence interval of π is given by ______ where 100(1 − α)% is the level of confidence.

3.2 Examples:
1. 1012 Americans were polled by the Gallup organization on June 1st, 2012 about their beliefs concerning human origins. 466 said they believe in the creationist view that God created humans in their present form at one time within the last 10,000 years. Let π represent the true population proportion of Americans that hold this creationist viewpoint.
(a) What is the value of the point estimate for π, that is, what is the value of p?
SOLUTION:
(b) Find a 95% confidence interval for π.
SOLUTION:
2. Class Activity: Students with quarters, flip your coin ten times. We will assume that we all flipped the same coin in exactly the same way. We will record the number of times tails is shown.
(a) How many times did tails show out of the total number of flips?
SOLUTION:
(b) Find a 95% confidence interval for π, the true proportion of tails.
SOLUTION:

3.3 Choosing n
For proportions, the required sample size n can be determined for a pre-chosen margin of error: ______
Again, π is unknown. Either use previous knowledge about p or use p = 0.50.
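A minimal sketch of the calculation, assuming the usual large-sample expression n = p(1 − p)(zα/2 / B)² with B the desired margin of error. The function name and the 3-point margin are illustrative assumptions, not values from the notes.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(B, p=0.50, confidence=0.95):
    """Smallest n whose large-sample margin of error for a proportion is <= B."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_{alpha/2}
    return math.ceil(p * (1 - p) * (z / B) ** 2)


# Hypothetical target: margin of error of 0.03 at 95% confidence.
print(sample_size_for_proportion(B=0.03))           # planning value p = 0.50
print(sample_size_for_proportion(B=0.03, p=0.30))   # prior knowledge suggests p near 0.30
```

Running both calls side by side shows how the planning value of p changes the required n.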
Question: Why pick p = 0.50?
Answer:

3.4 Examples:
1. Refer to the prior example concerning the Gallup poll of n = 1012 Americans asking about their views on the origin of life. What sample size would be required if we wished to cut in half the margin of error we obtained for the 95% CI? Use p∗ = 0.50.
SOLUTION:
2. Do the same as the previous example, but with the coin flipping.
SOLUTION:

3.5 A Large-Sample Confidence Interval for µ1 − µ2
Question: How does one compare parameters from two different populations based on the information in two samples, one from each population?
Sketch Notation/Concept:
Mathematically, ______

3.6 Examples:
1. Example 7.9 on page 313: An experiment carried out to study various characteristics of anchor bolts resulted in n1 = 78 observations on shear strength (kip) of 3/8 inch diameter bolts and n2 = 88 observations on strength of 1/2 inch diameter bolts.
(a) What is the confidence interval for the difference between the true average shear strength for 3/8 inch bolts (µ1) and the true average shear strength for 1/2 inch bolts (µ2), using 95% confidence, given the sample statistics x̄1 = 4.25, x̄2 = 7.14, s1 = 1.30, and s2 = 1.68?
SOLUTION:
(b) What happens to the width of the confidence interval as we increase either or both of the sample sizes?
SOLUTION:
2. Compute a 99% CI for the true population difference between the mean alcohol consumption for dwarves and hobbits in the Middle Earth study: x̄1 = 16.79, x̄2 = 10.79, s1 = 1.36, s2 = 1.15, n1 = 128, and n2 = 184.
SOLUTION:

4 Small-Sample Intervals Based on a Normal Population

Question: What happens when the sample size is small, i.e., n ≤ 30?
Answer: The previous formulas have allowed us to use a z score from the normal curve as the critical value in the computation of our confidence interval. For small samples (n ≤ 30), we will use a different critical value, ______. We again make the assumption that we are sampling from an approximately normal distribution (i.e., the population is normal).
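To see how much a small-sample critical value can differ from a z score, here is a rough sketch that assumes the critical value in question comes from Student's t distribution with n − 1 degrees of freedom. It uses only the standard library: the t density is integrated with Simpson's rule and the quantile is found by bisection. All function names are illustrative, and in practice a t table or a statistics package would be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Upper critical value t_{alpha/2} by bisection (assumes it lies in [0, 100])."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
t20 = t_critical(0.05, 20)        # t_{alpha/2} on 20 degrees of freedom
print(f"z = {z:.3f}, t (df = 20) = {t20:.3f}")
```

The two printed values should land close to the tabled 1.960 and 2.086, with the t value the larger of the two.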
4.1 Comparison of t Distributions
The t-statistic has a sampling distribution very similar to the z-statistic, i.e., ______. The primary difference is that the t-statistic is more variable than the z, or the distribution of the t-statistic has ______. The t-distribution depends on a quantity called ______, n − 1. As n becomes larger, the degrees of freedom become larger. This means the t distribution will look more like a ______.

4.2 Example:
1. Compute tα/2 for α = 0.05 on 20 degrees of freedom. Compare to zα/2 for α = 0.05.
SOLUTION:
2. Compute tα/2 for α = 0.05 on 18 degrees of freedom. Compare to zα/2 for α = 0.05.
SOLUTION:
3. Scientists have discovered levels of the hormone adrenocorticotropin in people just before they awake from sleeping (Nature, Jan. 1999). In the study described, 15 subjects were monitored during sleep after being told they would be woken at a particular time. One hour prior to the designated wake-up time, the hormone level (pg/mL) was measured in each, with the following results: x̄ = 37.3, s = 13.9
(a) Find a 95% CI to estimate the true mean hormone level of the sleepers 1 hour before waking.
SOLUTION:
(b) Find a 90% CI to estimate the true mean hormone level of the sleepers 1 hour before waking.
SOLUTION:

4.3 Inference on a Single Value of x
The confidence intervals so far have made inferences about µ, the population mean. To obtain a two-sided prediction interval for a single value that has not yet been observed, ______

4.4 Examples:
1. Refer to the sleep study in the previous example.
(a) Compute the 95% prediction interval for the hormone level of a single subject one hour before waking.
SOLUTION:
(b) Provide an interpretation for this interval.
SOLUTION:
(c) How does the prediction interval differ from the confidence interval? Why?
SOLUTION:
2. Earlier in the semester, we collected the height and weight of 10 students in the class. The mean height is 70.1 inches and the standard deviation is 4.7 inches.
(a) Compute the 95% prediction interval for the height of a student in the class.
SOLUTION:

5 Intervals for µ1 − µ2 Based on Normal Population Distributions

Question: What if the two sample groups are small or are paired in some way?

5.1 The Two-Sample t Interval
Before, when computing the interval for µ1 − µ2, the large-sample assumption was used. For the following formula, the sample sizes can be small, but the assumption that both samples come from approximately ______ populations must hold. ______
Similar to the original formula, but using t as the critical value instead of z.
Remark: How to find the degrees of freedom? ______ where se = s/√n and df is rounded down to the nearest integer.

5.2 Examples:
1. Permeability of a fabric refers to the accessibility of void space to the flow of a gas or liquid. A study gave the summary information on air permeability (cm3/cm2/sec) for a number of different fabrics. Here is the data:

Fabric Type    Sample Size    x̄         s       se
Cotton         10             51.71     0.79    0.250
Triacetate     10             136.14    3.59    1.135

Calculate a 95% CI for the difference between true average porosity for cotton and triacetate fabrics.
SOLUTION:
2. Class Activity: Let's ask 10 men and 10 women to provide their height and then calculate a 95% CI for the difference between true average height of men and women.

Gender    Sample Size    x̄    s    se
Men
Women

SOLUTION:

5.3 A Confidence Interval from Paired Data
Samples are not always obtained from two independent or unrelated populations. There are times when the two groups of data are related. For example, we administer a pretest and posttest to a group of math students and desire to compute a confidence interval for the difference in mean posttest and mean pretest scores. The two datasets (pre- and post-test scores) are related since the same students took both. This means these two datasets are ______.
To find the CI for the mean difference of paired data, denoted by d̄, ______ where the t statistic is based on n − 1 degrees of freedom, sd is the standard deviation of the differences, and if n is small we assume we are sampling from an approximately normal distribution.

5.4 Example:
Japanese researchers have developed a compression-depression method of testing electronic circuits based on Huffman coding. The new method is designed to reduce the time required for input decompression and output compression, called the compression ratio. Experimental results were obtained by testing a sample of 11 benchmark circuits (all of different sizes) from a SUN Blade 1000 workstation. Each circuit was tested with the standard compression-depression method and the new Huffman-based coding method, and the compression ratio was recorded:

Circuit    Standard Method    Huffman Coding Method
1          0.80               0.78
2          0.80               0.80
3          0.83               0.86
4          0.53               0.53
5          0.50               0.51
6          0.96               0.68
7          0.99               0.82
8          0.98               0.72
9          0.81               0.45
10         0.95               0.79
11         0.99               0.77

1. Why is this a paired data experiment?
SOLUTION:
2. Compute a 95% CI.
SOLUTION:
3. What assumptions did we make?
SOLUTION:
4. What would have been the result if we had computed a 95% CI without recognizing this is paired data, i.e., treated the two samples as independent?
SOLUTION:

6 Other Topics in Estimation

Question: We use estimators throughout the course such as x̄ and s, but how do we derive these estimators in the first place?
Answer: A popular method is the ______.

6.1 Maximum Likelihood Estimation
Let f(x) denote either a mass or density function that is defined by parameters θ1, θ2, ..., θk. Given data x1, x2, ..., xn sampled from a population described by f(x), the ______ is given by ______
The Maximum Likelihood Estimator (MLE) maximizes the likelihood function.

6.2 Examples:
1. Suppose a box contains the three elven rings of power, which are either gold or silver (we don't know because the rings are invisible when worn by the wearer).
We sample two without replacement, and say they are both gold.
(a) What is our estimate of the number of gold rings in the box and why?
SOLUTION:
2. Suppose we flip a coin which may be biased (i.e., the probability of flipping tails may not be 50%). We want to estimate π, the probability of flipping tails with this coin, by performing n = 10 flips. Suppose we observe the sequence: T, T, T, H, H, T, T, H, T, T
(a) What is the likelihood function for π, denoted as L(π)?
SOLUTION: ______ because there are seven tails and three heads.
(b) What value of π maximizes L(π)?
SOLUTION:

6.3 General Steps to Find the MLE
1. Write down the likelihood function, ______.
2. Most of the time, it is easier to solve the problem by taking the natural log of the likelihood function, ______.
3. Take the derivative of ______ with respect to ______.
4. Solve for θ. Once an expression is found for θ, we state we found ______.

6.4 Examples:
1. From Chapter 1, the Austrian Hospital had a mean time of 33.3 minutes out of 43 observations. In section 3 of chapter 1, we used an exponential density function with parameter λ = 0.03, where the value of λ was calculated from λ̂ = 1/x̄.

f(x) = λe^(−λx)   if x ≥ 0
f(x) = 0          otherwise

How was this derived? Find the MLE of λ in an exponential density function, given data x1, x2, ..., x43.
SOLUTION:
2. Let x1, x2, ..., xn be a random sample from the Poisson distribution. Find the MLE of the parameter λ.
SOLUTION:
3. Let x1, x2, ..., xn be a random sample from the population described by a uniform distribution. Find the MLE of θ.

f(x) = 1/θ   if 0 ≤ x ≤ θ
f(x) = 0     otherwise

SOLUTION:
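The coin-flip example in Section 6.2 can also be checked numerically. The sketch below assumes independent flips, so that the log-likelihood is (number of tails)·ln π + (number of heads)·ln(1 − π), and searches a grid of candidate values instead of using calculus. The grid resolution and variable names are illustrative choices.

```python
import math

# Observed sequence from the coin example: T, T, T, H, H, T, T, H, T, T
flips = "TTTHHTTHTT"
tails = flips.count("T")
heads = flips.count("H")


def log_likelihood(pi):
    """Log-likelihood of pi = P(tails), assuming independent flips."""
    return tails * math.log(pi) + heads * math.log(1 - pi)


# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded so log is defined).
grid = [k / 1000 for k in range(1, 1000)]
pi_hat = max(grid, key=log_likelihood)
print(tails, heads, pi_hat)
```

A grid search like this only approximates the maximizer to the grid's resolution, so it is a sanity check on the derivative-based steps of Section 6.3 and not a replacement for them.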