(1) Why is the population shape a concern when estimating a mean? What effect does sample size, n, have on the estimate of the mean? Is it possible to normalize the data when the population shape has a known skew? How would you demonstrate the central limit theorem to your classmates?

The shape of the population matters because the usual methods for estimating a mean assume that the sample mean is approximately normally distributed. If the population itself is normal, this holds for any sample size; if it is strongly skewed, a small sample can give a misleading estimate. Shape is also important if we have to eliminate the assignable causes for the observed variation in the data: when a process is approximately normally distributed (bell-shaped curve), the variation in the system/process is usually attributed to unassignable (random) causes rather than to assignable causes that are within our control.

By taking a larger sample, we are able to search for assignable causes and "hunt out" outliers that may not be apparent in a small sample. More importantly, as n increases, the sampling distribution of the sample mean approaches the normal distribution more closely and its standard error, σ/√n, shrinks, so the estimate gets closer to the true mean of the population.

Yes, standard transformation techniques (for example, a logarithmic or square-root transformation for right-skewed data) are available to normalize skewed data.

A simple demonstration of the CLT can be a numerical example: if samples of size 25 are repeatedly drawn from a population with mean μ and standard deviation σ, the mean of the sampling distribution of the sample mean will be close to the population mean (μ), whereas its standard deviation (the standard error) will be σ/√25 = σ/5.

(2) How do you calculate sample size? What factors do you need to know to calculate it?

Sample size calculation (for estimating a mean) requires knowledge of the level of confidence, the population standard deviation and the margin of error (tolerance). The formula relating these factors is n = (z * σ/E)^2, or n = (t * s/E)^2 when σ is unknown and is estimated by a sample standard deviation s.

Sample size calculation (for estimating a proportion) requires knowledge of the level of confidence, the population proportion and the margin of error (tolerance).
The formula relating these factors is n = (z/E)^2 * [p * (1 - p)]

(3) Why do so many of life's events share the same characteristics as the central limit theorem? Why are estimations and confidence intervals important? When might systematic sampling be biased? Explain. What roles do confidence intervals and estimation play in the selection of sample size?

In non-mathematical terms, the Central Limit Theorem says that when we put together a lot of random events, the aggregate will tend to follow a bell curve. That is how we get from something distributed uniformly (say, the roll of a die, where each number is equally likely) to a curve where most outcomes are near the average, and the farther an outcome is from the average, the less likely it is. Many occurrences in nature may appear to be random (mostly because of the sheer number and diversity of the factors in play), but when statistically analyzed, they are seen to fit the "bell-shaped" normal distribution reasonably well. For example, how tall a person will be is the combined result of a number of random variables (what genes the person has, what kind of food she eats, general state of health, etc.), and so people's heights are distributed roughly like a bell curve. The same applies to many physical properties of living things. Political polling tells us that if we sum up a group of randomly polled people, we will get a pretty good approximation of what would happen if we polled everybody. Thus, many events of life share the same characteristics as the central limit theorem.

Estimations are inferential tools that are used when we know there is an effect (or we have found an effect) in the sample and we want to quantify the size of the effect in the population. They are important because they are the only way to get an idea of what to expect in a population based on the information extracted from the sample. Confidence intervals are required to qualify our estimation. They act as “covers” around an estimate.
Without a confidence interval, a point estimate gives no indication of its precision.

Systematic sampling involves a random start and then proceeds with the selection of every kth element from there. This sampling method may be biased if periodicity is present in the population and the period is a multiple or factor of the interval used in the sampling. In this case, the sample is not representative of the population.

The sample size is determined by specifying the preferred width of the confidence interval. For this, we state a margin of error and a level of confidence. Thus, estimation and confidence intervals together are critical factors in sample size determination.

(4) As a sample size approaches infinity, how does the t distribution compare to the normal z distribution? When you draw a sample from a normal distribution, what can you conclude about the sample distribution? Explain.

A careful look at the t-distribution probability tables shows that once the number of degrees of freedom becomes greater than about 30, the values in the t table are very close to those in the standard normal distribution table. (This is the basis for the rule of thumb of using a sample size of 30 or more: the t-statistic approximates the z-statistic once n exceeds about 30 and converges to it as n approaches infinity.) The t-distribution takes into account the fact that we do not know the population variance and must estimate it from the sample. As the number of degrees of freedom increases, we have a better estimate of the population variance, and thus the Student t-distribution approaches the standard normal.

The two curves appear to be identical, but there are differences. For small values of n, the t-distribution has a lower peak and fatter tails than the normal distribution curve (strictly speaking, it is leptokurtic, since its excess kurtosis is positive). This means that at lower degrees of freedom, the critical t-value is higher than the critical z-value, so the t-test is tougher: the sample evidence has to be more extreme for the null hypothesis to be rejected.
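The convergence of the t-distribution to the standard normal can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name t_pdf and the chosen degrees of freedom are illustrative assumptions, not part of the original answer. It compares the height of the peak (at 0) and the density in the tail (at 3) for both distributions.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom.

    Hypothetical helper for illustration. lgamma is used instead of gamma
    so that large df values do not overflow.
    """
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Degrees of freedom chosen arbitrarily to show small, moderate and large n.
for df in (2, 5, 30, 1000):
    print(
        f"df={df:>4}  peak: t={t_pdf(0, df):.4f} z={z.pdf(0):.4f}  "
        f"tail at 3: t={t_pdf(3, df):.5f} z={z.pdf(3):.5f}"
    )
```

For small df the t density should be lower than the normal at 0 and higher at 3; as df grows, both gaps shrink toward zero.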
When a sample is drawn from a normally distributed population, the sample observations are also normally distributed, and the sampling distribution of the sample mean is normal for any sample size.

(5) A mayoral election race is tightly contested. In a random sample of 1,100 likely voters, 572 said they were planning to vote for the current mayor. Based on this sample, what is your initial hunch? Would you claim with 95% confidence that the mayor will win a majority of the votes? Explain.

572/1100 = 0.52. The mayor appears to be slightly ahead, but the election is tightly contested.
p = 0.52, q = 1 - p = 0.48
Standard error, SE = √(pq/n) = √(0.52 * 0.48/1100) = 0.0151
H0: p = 0.5 and Ha: p > 0.5
z = (p - p')/SE, where p' = 0.5 is the hypothesized proportion
z = (0.52 - 0.5)/0.0151 = 1.3245
P(z > 1.3245) = 0.093
Since 0.093 > 0.05, we cannot reject H0 at the 5% significance level, so we cannot say with 95% confidence that the mayor will win a majority of the votes.
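The calculation above can be reproduced with a short script. This is a sketch using only the Python standard library; the function name one_sided_proportion_z_test is an illustrative assumption. It follows the answer in computing the standard error from the sample proportion. Because it does not round the standard error to 0.0151, z and the p-value differ slightly in the third decimal place from the hand calculation, but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_proportion_z_test(successes, n, p0=0.5):
    """One-sided z-test of H0: p = p0 against Ha: p > p0.

    Hypothetical helper for illustration. As in the worked answer, the
    standard error is computed from the sample proportion.
    """
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail probability P(Z > z)
    return p_hat, se, z, p_value


p_hat, se, z, p_value = one_sided_proportion_z_test(572, 1100)
print(f"p_hat={p_hat:.2f}  SE={se:.4f}  z={z:.4f}  p-value={p_value:.4f}")

alpha = 0.05  # significance level matching 95% confidence
print("reject H0" if p_value < alpha else "cannot reject H0")
```

A common variant computes the standard error under H0 instead, as √(p0 * (1 - p0)/n); with these numbers the difference is negligible and the decision does not change.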