(1) Why is the population shape a concern when estimating a mean? What effect does sample size, n, have on the
estimate of the mean? Is it possible to normalize the data when the population shape has a known skew? How would
you demonstrate the central limit theorem to your classmates?
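The shape of the population matters because the usual procedures for estimating a mean (z and t intervals) assume that the
sample mean is approximately normally distributed. If the population is itself normal (bell shaped curve), that holds for
any sample size; if it is strongly skewed or heavy-tailed, a small sample can give a misleading estimate. In process terms,
a normally distributed population is consistent with variation that is due to unassignable (random) causes alone and not
to assignable causes that are within our control.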
Sample size affects the precision of the estimate. As n increases, the sampling distribution of the sample mean approaches
the normal distribution more closely and its standard deviation (the standard error, σ/√n) shrinks, so the sample mean
tends to fall closer to the true mean of the population. A larger sample also makes it easier to spot outliers and
assignable causes that may not be apparent in a small sample; these should be investigated before any are removed.
Yes. Standard transformations, such as the logarithm or square root for right-skewed data, are available to bring skewed data closer to normal.
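As a rough illustration of such a transformation, the sketch below measures skewness before and after taking logarithms. The log-normal test data, the seed, and the helper name `sample_skewness` are assumptions made for this example, not part of the original answer.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical right-skewed data: log-normal draws (assumed for illustration).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(f"skewness before log transform: {skew_raw:.2f}")
print(f"skewness after log transform:  {skew_log:.2f}")
```

The log works well here because the data were generated as log-normal; for real data the right transformation has to be chosen by inspecting the data.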
A simple demonstration of the CLT can be a numerical example: if samples of size 25 are drawn from a population with standard deviation σ, the mean of the sampling distribution will be
close to the population mean (μ), whereas its standard deviation (the standard error) will be σ/√25 = σ/5.
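The numerical example above can also be shown to classmates as a small simulation. This is a minimal sketch: the exponential population (mean 1, standard deviation 1), the seed, and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 25               # sample size from the example above
num_samples = 20000  # number of repeated samples (assumed)

# Assumed population: exponential with rate 1, which is strongly
# right-skewed and has mean 1 and standard deviation 1.
pop_mean, pop_sd = 1.0, 1.0

sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/5 = {pop_sd / 5})")
```

Plotting a histogram of `sample_means` next to a histogram of the raw exponential draws makes the point visually: the population is skewed, but the sample means pile up in a roughly bell-shaped curve.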
(2) How do you calculate sample size? What factors do you need to know to calculate it?
Sample size calculation (for estimating a mean) requires the knowledge of the level of confidence, the population
standard deviation and the margin of error (tolerance).
The formula relating these factors is n = (z * σ/E)^2, or n = (t * s/E)^2 when σ is unknown and is estimated by the sample standard deviation s.
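The z-based formula can be sketched in a few lines. The function name `sample_size_for_mean` and the input values (σ = 15, E = 3) are hypothetical, chosen only to show the arithmetic.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to the next whole unit."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: sigma = 15, margin of error E = 3, 95% confidence.
n_mean = sample_size_for_mean(sigma=15, margin=3)
print(n_mean)  # 97

# Halving the margin of error roughly quadruples the required sample.
print(sample_size_for_mean(sigma=15, margin=1.5))  # 385
```

The result is rounded up because a fractional observation cannot be collected and rounding down would miss the stated margin.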
Sample size calculation (for estimating a proportion) requires the knowledge of the level of confidence, the population
proportion and the margin of error (tolerance).
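The formula relating these factors is n = (z/E)^2 * [p * (1 - p)]. When no prior estimate of p is available, p = 0.5 gives the most conservative (largest) sample size.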
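The proportion formula works the same way. In this sketch the function name `sample_size_for_proportion` and the inputs (p = 0.5, E = 0.03) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence=0.95):
    """n = (z / E)^2 * p * (1 - p), rounded up to the next whole unit."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / margin) ** 2 * p * (1 - p))


# Hypothetical inputs: planning value p = 0.5, margin of error 3 points.
n_prop = sample_size_for_proportion(p=0.5, margin=0.03)
print(n_prop)  # 1068

# A planning value away from 0.5 needs a smaller sample for the same margin.
print(sample_size_for_proportion(p=0.2, margin=0.03))
```

This is why national polls with a margin of error of about three percentage points typically interview roughly a thousand people.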
(3) Why do so many of life’s events share the same characteristics as the central limit theorem? Why are estimations
and confidence intervals important? When might systematic sampling be biased? Explain. What roles do confidence
intervals and estimation play in the selection of sample size?
In non-mathematical terms, the Central Limit Theorem says that when we add together a lot of independent random events, the
aggregate will tend to follow a bell curve. That's how we get from something distributed uniformly (say, the roll of a die,
where each number is equally likely) to a curve where most outcomes are near the average, and the farther an outcome is
from the average, the less likely it is. Many occurrences in nature may appear to be random (mostly because of the sheer
number and diversity of the factors in play) but when statistically analyzed, they are seen to fit the “bell-shaped” normal
distribution quite well. For example, how tall a person will be is the sum of a number of random variables (what genes the person
has, what kind of food she eats, general state of health etc.), and so people's heights are distributed roughly like a bell curve. The
same reasoning applies to many physical properties of living things. Political polling tells us that if we average the responses of a
randomly polled group of people, we will get a pretty good approximation of what would happen if we polled everybody. Thus,
many events of life share the same characteristics as the central limit theorem.
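The die example can be made concrete by computing the exact distribution of the total of several dice. This is a sketch only; the function name `pmf_of_dice_sum` is hypothetical, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def pmf_of_dice_sum(num_dice, sides=6):
    """Exact probability of each total when rolling num_dice fair dice."""
    pmf = {0: Fraction(1)}  # before any roll the total is 0 with certainty
    for _ in range(num_dice):
        updated = {}
        for total, prob in pmf.items():
            for face in range(1, sides + 1):
                key = total + face
                updated[key] = updated.get(key, Fraction(0)) + prob / sides
        pmf = updated
    return pmf


for dice in (1, 2, 4):
    pmf = pmf_of_dice_sum(dice)
    print(f"{dice} dice:")
    for total in sorted(pmf):
        # One '#' per percentage point gives a rough text histogram.
        print(f"  {total:2d} {'#' * round(float(pmf[total]) * 100)}")
```

One die gives a flat histogram, two dice give a triangle peaking at 7, and with four dice the outline already looks close to a bell curve.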
Estimations are inferential tools that are used when we know there is an effect (or we have found an effect) in the
sample and we want to quantify the size of the effect in a population. They are important because that is the only way
we can get an idea of what to expect in a population based on the information extracted from the sample.
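Confidence intervals are required to qualify our estimation. They act as “covers” around an estimate: a range that, at the
stated level of confidence, is expected to contain the population value. Without a confidence interval, a point estimate
says nothing about its own precision.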
Systematic sampling involves a random start and then proceeds with the selection of every kth element from there. This
sampling method may be biased if periodicity is present in the population and the period is a multiple or factor of the
interval used in the sampling. In that case, the sample keeps hitting the same point of the cycle and is not representative of the population.
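The periodicity problem is easy to reproduce. The sketch below uses a made-up population of 700 daily values with a spike every 7th day; the values and the helper name `systematic_sample` are assumptions, and the start points are fixed rather than random so the effect is visible.

```python
import statistics


def systematic_sample(population, k, start):
    """Take every k-th element, beginning at index `start`."""
    return population[start::k]


# Hypothetical population: 700 daily values, 150 on every 7th day, else 100.
population = [150 if day % 7 == 0 else 100 for day in range(700)]
true_mean = statistics.fmean(population)

# Interval equal to the period: the sample sees only one point of the cycle.
aligned = systematic_sample(population, k=7, start=0)   # only spike days
offset = systematic_sample(population, k=7, start=3)    # never a spike day
# Interval with no common factor with the period: the cycle is covered evenly.
coprime = systematic_sample(population, k=5, start=0)

print(f"population mean:      {true_mean:.2f}")
print(f"k=7, start=0 mean:    {statistics.fmean(aligned):.2f}")
print(f"k=7, start=3 mean:    {statistics.fmean(offset):.2f}")
print(f"k=5, start=0 mean:    {statistics.fmean(coprime):.2f}")
```

With k = 7 the estimate is either far too high or too low depending on the start, while k = 5 reproduces the population mean.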
The sample size is determined by specifying the preferred width of the confidence interval. For this, we state a margin of
error, and a level of confidence. Thus, estimation and confidence intervals together are critical factors in sample size
determination.
(4) As a sample size approaches infinity, how does the t distribution compare to the normal z distribution? When you
draw a sample from a normal distribution, what can you conclude about the sample distribution? Explain.
A careful look at the t distribution probability tables shows that once the number of degrees of freedom
is greater than about 30, the values in the t table are very close to those in the standard normal distribution table.
(This is the basis for the rule of thumb of having 30 or more observations before using the normal approximation: the t-statistic
approximates the z-statistic for n well above 30 and converges to it as n approaches infinity.) The t-distribution takes into account the fact that we do not know the
population variance. As the number of degrees of freedom increases, we have a better estimate of the
population variance, and thus the Student's t approaches the standard normal.
The two curves appear to be identical but there are differences. For small values of n, the curve of the t-distribution has a
lower peak and fatter tails than the normal distribution curve. This means that at
lower degrees of freedom, the critical t-value is higher than the critical z-value, so the t-test is tougher:
the sample evidence has to be more extreme for the null hypothesis to be rejected.
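The convergence of critical values can be checked numerically. The Python standard library has no t-distribution, so this sketch integrates the t density with Simpson's rule and inverts it by bisection; the function names, step count, and search range are assumptions, and a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x) for x >= 0, via Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(prob, df):
    """Value t with P(T <= t) = prob, for prob >= 0.5, found by bisection."""
    lo, hi = 0.0, 50.0  # assumed wide enough for the cases shown here
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
print(f"z critical value (two-sided 95%): {z_crit:.3f}")
for df in (2, 5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: t critical value = {t_critical(0.975, df):.3f}")
```

The printed critical values start well above the z value at low degrees of freedom and shrink toward it as the degrees of freedom grow.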
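When a sample is drawn from a normally distributed population, the individual sample units are also normally distributed, and the sample mean is exactly normally distributed for any sample size n, with mean μ and standard deviation σ/√n.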
(5) A mayoral election race is tightly contested. In a random sample of 1,100 likely voters, 572 said they were
planning to vote for the current mayor. Based on this sample, what is your initial hunch? Would you claim with 95%
confidence that the mayor will win a majority of the votes? Explain.
572/1100 = 0.52. The initial hunch is that the mayor is slightly ahead, but the election is tightly contested.
p = 0.52, q = 1 - p = 0.48
Standard error, SE = sqrt(pq/n) = sqrt(0.52 * 0.48/1100) = 0.0151
H0: p = 0.5 and Ha: p > 0.5
z = (p - p0)/SE, where p0 = 0.5 is the proportion under H0
z = (0.52 - 0.5)/0.0151 = 1.3245
P(z > 1.3245) = 0.093
Since 0.093 > 0.05, we cannot say with 95% confidence that the mayor will win a majority of votes.
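The calculation above can be reproduced with the standard library. This is a sketch that follows the same steps; the variable names are illustrative, and because it does not round the standard error to 0.0151 before dividing, its z and p-value differ slightly in the third decimal place.

```python
import math
from statistics import NormalDist

n = 1100          # likely voters sampled
votes_for = 572   # said they plan to vote for the current mayor
p0 = 0.5          # proportion under H0 (no majority)

p_hat = votes_for / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

# One-sided z-test of H0: p = 0.5 against Ha: p > 0.5.
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)

# Two-sided 95% confidence interval for the mayor's share.
z_crit = NormalDist().inv_cdf(0.975)
ci_low, ci_high = p_hat - z_crit * se, p_hat + z_crit * se

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}")
print(f"z = {z:.4f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI: ({ci_low:.4f}, {ci_high:.4f})")
print("reject H0 at the 5% level" if p_value < 0.05 else "cannot reject H0 at the 5% level")
```

The confidence interval tells the same story as the test: it extends below 0.5, so a result short of a majority cannot be ruled out.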