YMS Chapter 11 Inference for Distributions (T-Tests of Means)
Q1. What two conditions does our text list for inference about means when the
population standard deviation is not known?
Q2. The sample standard deviation, divided by the square root of the number of observations in the
sample, is called the s_____ e_____ of the sample mean.
Q3. When the standard deviation of any statistic is estimated from the data, the result is
called the ____ ____ of that statistic. (Thus this quantity exists for other statistics, not just the
sample mean.)
Q4. Does it make sense to speak of the standard deviation of the population mean? If not,
why not?
Q5. The z-statistic is (x-bar - mu)/(sigma/sqrt(n)). What is the t-statistic?
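For comparison with the z-statistic, here is a minimal Python sketch of the one-sample t statistic. It assumes the standard textbook form t = (x-bar - mu0)/(s/sqrt(n)) with n - 1 degrees of freedom. The function name one_sample_t and the sample values are hypothetical, for illustration only.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t, df) for testing H0: mu = mu0 when sigma is unknown."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)        # standard error of the sample mean
    return (xbar - mu0) / se, n - 1


# Hypothetical sample, for illustration only
sample = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]
t, df = one_sample_t(sample, mu0=5.0)
print(f"t = {t:.3f}, df = {df}")
```

The only change from the z-statistic is that sigma is replaced by s. That substitution is why the statistic follows a t distribution rather than the standard normal.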
Q6. There is just one standard normal distribution. Is there just one t-distribution?
Q7. What is the general shape of the t-distribution?
Q8. As the degrees of freedom increase, the shape of the t-distribution more and more
closely approximates what?
Q9. Can you explain why the shape of the t-distribution differs from that of the normal
when the degrees of freedom are low?
Q10. What is the expression for the level C confidence interval for the population mean
(mu), using the t distribution, when the population standard deviation is unknown?
Q11. How do you construct the t statistic and use it to test a hypothesis about a
hypothesized population mean mu-zero?
Q12. Suppose that, from the subjects who have signed up for your study, you form pairs of
subjects who are as similar as you can make them. Then you randomly assign one member of
each pair to get the treatment and the other to get a placebo or no treatment. What sort of
design is this?
Q13. If in a matched pairs design, you simply measure the post-treatment outcome for
each subject and want to know if the treated subjects do better than the untreated subjects,
how does our text suggest conducting that test?
Q14. When you use a matched pairs design as suggested, what population parameter
are you estimating?
Q15. What is a "robust" procedure?
Q16. The t procedures are very robust against (nonnormality of the population, outliers)
but not very robust against (nonnormality of the population, outliers).
Q17. The main reason why the t procedures are robust against the nonnormality of the
population is what theorem?
Q18. The rules of thumb for sample size and t procedures are that with sample size less
than ____, you can use t if the data are close to normal and there are no outliers; with
sample size ___ to ____, you can use t unless there are outliers or strong skewness, and
with sample size over ____, you can use t procedures even for clearly skewed
distributions.
Q19. When calculating the power of a t-test (as when calculating the power of a z-test),
what two steps do you go through?
Q20. In power calculations, and in explaining them, there's a tricky distinction between
the "alternative hypothesis" for the test, and the "alternative" that is assumed for the sake
of power calculations. Please clear up any confusion on this by giving an example. (The
example on page 640 will do fine.)
Q21. Suppose you are hoping that the listening score of 20 subjects will improve by 2
points, and that your test can detect such an improvement. Your null hypothesis would be
a mean improvement score of zero, tested against an Ha, "alternative hypothesis," that the
mean is greater than zero (or, if a two-sided test is desired, that the mean is not equal to
zero). But the "alternative" that you use in your power calculations is that the mean
change is really 2 points. So the "alternative hypothesis" for the test is that the mean is
greater than 0, and the "alternative" for the power calculation is that the mean is equal to
2.
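The following minimal Python sketch applies the usual two-step power calculation to the Q21 scenario (n = 20, alternative mean change of 2 points, one-sided test at alpha = 0.05). Step 1 writes the rejection rule in terms of x-bar. Step 2 finds the probability of that event when the alternative is true. The question does not give a standard deviation, so sigma = 3 is a hypothetical guess. The critical value 1.729 is the upper 0.05 point of t with 19 degrees of freedom, read from a t table. Step 2 uses a normal approximation, which is an assumption of this sketch.

```python
import math
from statistics import NormalDist

# Assumptions (hypothetical, not given in the question):
n = 20            # number of subjects
sigma = 3.0       # guessed standard deviation of the improvement scores
mu_alt = 2.0      # the "alternative" used for the power calculation
t_star = 1.729    # upper 0.05 critical value of t with df = 19 (t table)

se = sigma / math.sqrt(n)

# Step 1: write the rule for rejecting H0: mu = 0 in terms of x-bar.
# Reject H0 when x-bar >= t* * sigma / sqrt(n).
cutoff = t_star * se

# Step 2: find the probability of rejecting H0 when mu really equals mu_alt
# (normal approximation to the sampling distribution of x-bar).
power = 1 - NormalDist(mu=mu_alt, sigma=se).cdf(cutoff)
print(f"reject H0 if x-bar >= {cutoff:.2f}; power = {power:.3f}")
```

Note that mu_alt = 2 appears only in step 2. The test itself (step 1) uses only H0: mu = 0 and Ha: mu > 0.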
Q22. One researcher randomly samples 2 groups from a population and gives training to
one group but not the other. The researcher uses a t procedure to compare the test scores of
the two groups. Another researcher samples one group from the population and tests the
group twice, once before training and once after. This researcher uses a t procedure
to compare the scores after training with those before training. How are these two situations
different, and what different statistical procedures should they result in?
Q23. Has it always been an assumption or condition, for every inference procedure we
have studied (as contrasted with the exploratory analysis procedures we studied), that the
sample can be considered an SRS of the population of interest?
Q24. Suppose someone were to draw many pairs of samples from two populations, and
compute the difference between the sample means for each pair. What would the mean of
this difference approach as the number of samples drawn approached infinity?
Q25. The fact that the mean of the difference in sample means approaches the difference
in population means as the number of samples gets larger is a long way of saying that the
difference in sample means is an ____ estimator of the difference in population means.
Q26. True or false: just as the difference in sample means estimates the difference in
population means, the difference in sample standard deviations estimates the population
standard deviation of the difference between two means.
Q27. True or false: the variance of the difference between two population means is
estimated by s1^2/n1 + s2^2/n2, where s1 and s2 are the sample standard deviations (and
thus s1^2 and s2^2 are the sample variances) and where n1 and n2 are the sample sizes.
Q28. When the standard deviations of the two populations you are sampling from are
different, why does the difference of the means of two independent samples not exactly
follow the t-distribution?
Q29. When the sample sizes for the two samples are different, how many degrees of
freedom do you use for t procedures? Please mention two options.
Q30. Can you explain in words how you form a confidence interval for the difference
between two means?
Q31. Can you explain in words how you do a hypothesis test of the null hypothesis that
two group means are equal?
Q32. The guidelines for sample sizes with one-sample t procedures are also applicable to
two-sample t tests, if you substitute for the phrase "sample size" the phrase ____.
Q33. For using t procedures with means of independent samples, an excellent
approximation is achieved by using the t distribution with a not-necessarily-integer
degrees of freedom computed by the formula on p. 659. This approximation is quite
accurate when the sample size of both samples is what?
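The Python sketch below shows the pieces referred to in Q27, Q29, and Q33. It computes the standard error sqrt(s1^2/n1 + s2^2/n2), the two-sample t statistic, and two choices of degrees of freedom. The first choice is the conservative "smaller of n1 - 1 and n2 - 1". The second is the non-integer approximation that software uses, commonly called the Welch-Satterthwaite formula and presumably the one the text gives on p. 659. The function name and the test scores are hypothetical.

```python
import math
import statistics


def two_sample_t(x1, x2):
    """Two-sample t statistic without assuming equal population variances."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1   # s1^2 / n1
    v2 = statistics.variance(x2) / n2   # s2^2 / n2
    se = math.sqrt(v1 + v2)             # standard error of x-bar1 - x-bar2
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    # Option 1 (conservative): the smaller of n1 - 1 and n2 - 1
    df_conservative = min(n1, n2) - 1
    # Option 2: software approximation; need not be an integer
    df_approx = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df_conservative, df_approx


# Hypothetical test scores for a trained and an untrained group
trained = [78, 85, 82, 90, 88, 76, 84]
untrained = [72, 80, 75, 79, 70, 77]
t, se, df_c, df_a = two_sample_t(trained, untrained)
print(f"t = {t:.3f}, conservative df = {df_c}, approximate df = {df_a:.2f}")
```

The approximate df always falls between the conservative value and n1 + n2 - 2. For that reason the conservative choice gives a slightly wider interval and a slightly larger P-value.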
Q34. When software gives you a choice between assuming, or not assuming, equal
variances for the two populations whose means you are comparing with a t procedure,
which choice should you generally make?
YMS Chapter 12 Inference for Proportions
Q1. The statistic that estimates (in an unbiased way) the population proportion is ____.
Q2. What is the standard deviation of the sample proportion (provided the population is
at least 10 times as big as the sample)?
Q3. If np and nq are at least 10, then we can treat the distribution of p-hat as
approximately what?
Q4. Just for review: if asked for the standard deviation of the population proportion, what
would you say?
Q5. Just for review: what does it mean to "standardize" any statistic?
Q6. When you standardize p-hat, what do you subtract and divide by?
Q7. What familiar ploy do we use to cope with not knowing the population proportion, in
creating a standardized p-hat?
Q8. What is the expression for a confidence interval around the sample proportion?
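This minimal Python sketch builds the large-sample interval p-hat +/- z* sqrt(p-hat(1 - p-hat)/n). It assumes the counts of successes and failures are both at least 10 (the condition in Q3). Because p is unknown, p-hat stands in for it in the standard error, which is the ploy of Q7. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # p-hat replaces the unknown p
    return p_hat - z_star * se, p_hat + z_star * se


# Hypothetical survey: 540 "yes" answers out of 1000
low, high = proportion_ci(successes=540, n=1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```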
Q9. Just for review: what would you say if asked to do a hypothesis test about whether a
sample proportion was equal to a certain number?
Q10. How do you do a hypothesis test that a population proportion is equal to a certain
number?
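The one-proportion z test can be sketched in Python as follows. This assumes the usual large-sample test with z = (p-hat - p0)/sqrt(p0(1 - p0)/n). Unlike the confidence interval, the standard error uses p0, because H0 specifies the value of p. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def one_proportion_z(successes, n, p0):
    """z statistic and two-sided P-value for H0: p = p0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # uses p0 from H0, not p-hat
    z = (p_hat - p0) / se0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Hypothetical data: is 540 out of 1000 consistent with p = 0.5?
z, p = one_proportion_z(successes=540, n=1000, p0=0.5)
print(f"z = {z:.3f}, two-sided P = {p:.4f}")
```

For a one-sided alternative, use the single tail area in the direction of Ha instead of doubling it.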
Q11. Given a certain margin of error, for example +- .03, how do you determine how big
a sample size you need to get a margin of error that small or smaller?
Q12. How do you solve for sample size needed in a study of proportion, without knowing
the proportion you are looking for in the first place? Please mention two strategies.
Q13. If the guessed proportion is anywhere between .3 and .7, you should as a rule of
thumb use what as the p* or estimated proportion of successes, for your calculations on
determining the necessary sample size?
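The sample-size calculation can be sketched by solving m = z* sqrt(p*(1 - p*)/n) for n, which gives n = (z*/m)^2 p*(1 - p*). The result is rounded up, since rounding down would make the margin slightly too large. Using p* = 0.5 is the conservative choice because p(1 - p) is largest there. Between 0.3 and 0.7, p(1 - p) only ranges from 0.21 to 0.25, which is why 0.5 works well in that range. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p_star=0.5):
    """Smallest n whose margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))


# Margin of error of +- .03 at 95% confidence, with the conservative p* = 0.5
n_needed = sample_size_for_proportion(margin=0.03)
print(n_needed)
```

With these inputs the sketch asks for 1068 subjects.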
Q14. Suppose we give a cancer drug to one group and not to another group, and look at
the difference in proportions of people who survive for five years in these two conditions.
What parameter are we trying to estimate, and what statistic do we use to estimate it?
Q15. How do you find the standard deviation of the difference in sample proportions?
Q16. What is the expression for the confidence interval for the difference of two
proportions?
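A Python sketch of the large-sample interval for p1 - p2, assuming the usual unpooled standard error sqrt(p1-hat(1 - p1-hat)/n1 + p2-hat(1 - p2-hat)/n2), follows. The five-year survival counts are hypothetical, loosely modeled on the cancer-drug example in Q14.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical: 120 of 200 survive on the drug, 90 of 200 without it
low, high = two_proportion_ci(x1=120, n1=200, x2=90, n2=200)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```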
Q17. What are the conditions for the confidence interval for a difference in proportions?
Q18. In doing a hypothesis test for the difference of two proportions, we compute a z
statistic. What, in general terms (that is without going into the specific formula) is in the
numerator and the denominator of this statistic?
Q19. In the special case where the null hypothesis is that the two proportions are equal, in
other words the difference between them is 0, what do we do differently because of this?
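When H0 says p1 = p2, the standard error is usually computed from the pooled proportion (x1 + x2)/(n1 + n2) instead of from two separate sample proportions. The minimal Python sketch below shows that version of the z test, using the same hypothetical counts as the interval example.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2; returns (z, two-sided P-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)   # best estimate of the common p under H0
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two


z, p_two = two_proportion_z(x1=120, n1=200, x2=90, n2=200)
print(f"z = {z:.3f}, two-sided P = {p_two:.4f}")
```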
YMS Chapter 13 Inference for Tables: Chi-Square Procedures
Q1. Suppose that you are dealing with a situation where there are several possible
outcomes, not just 2 (success and failure). You are interested in seeing whether the
proportion of outcomes falling into each of a certain set of categories is consistent with a
certain hypothesized population distribution. What is the name of the test you use?
Q2. Suppose that your hypothesized population distribution for the percent of objects that
are certain colors is 20% black, 50% white, and 30% green. Suppose you draw a sample
of 200, to test this hypothesis. What are the "expected" values that you use when you do
the chi-square goodness of fit test?
Q3. Suppose you are doing a goodness of fit test for the distribution hypothesized in Q2.
What is the formula that you use for chi-square?
Q4. In testing the hypothesis mentioned in Q2, suppose your observed counts are 45, 90,
and 65. What does chi-square equal for this goodness of fit test? Please write a numerical
expression without bothering to calculate the result.
Q5. Is there just one chi-square distribution, or a family of distributions, with one
distribution for each number of degrees of freedom?
Q6. How do you find the number of degrees of freedom for a chi-square goodness of fit
test? For example, how many degrees of freedom would there be if you were looking at
the proportions of black, white, and green objects as in Q2?
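Using the numbers from Q2 and Q4, this Python sketch computes the expected counts (sample size times hypothesized proportion), the statistic chi-square = sum of (observed - expected)^2 / expected, and the degrees of freedom (number of categories minus 1). The P-value line relies on a special fact: for exactly 2 degrees of freedom, the upper-tail area of the chi-square distribution is exp(-x/2). For other df you would use a table, a calculator, or a statistics library.

```python
import math

hypothesized = {"black": 0.20, "white": 0.50, "green": 0.30}
observed = {"black": 45, "white": 90, "green": 65}
n = sum(observed.values())                                # 200

# Expected count = sample size * hypothesized proportion (40, 100, 60)
expected = {c: n * p for c, p in hypothesized.items()}
chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
df = len(observed) - 1                                    # categories minus 1

# Only valid for df = 2: P(chi-square > x) = exp(-x / 2)
p_value = math.exp(-chi_sq / 2)
print(f"chi-square = {chi_sq:.3f}, df = {df}, P = {p_value:.3f}")
```

These counts give chi-square of about 2.04 with 2 df and a P-value of about 0.36, so they are consistent with the hypothesized distribution.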
Q7. When you look up in a table or on a calculator the P-value associated with a certain chi-square value, what is that the probability of?
Q8. Is the chi-square distribution symmetrical? If not, in which direction is it skewed?
Q9. When you are doing a chi-square test for goodness of fit, what are the null hypothesis H0
and the alternative hypothesis Ha?
Q10. The H0 is that the population percents are equal to the set of hypothesized percents.
The Ha is that the population percents do not equal that set of hypothesized percents.
Q11. What are the rule of thumb conditions for the use of the chi-square goodness of fit
test?
Q12. What's an efficient way of calculating a chi-square on a TI-83 or TI-84 graphing
calculator? (If you have a TI-89, please tell me how to do the same thing on it.)
Q13. If a chi-square goodness of fit test yields a significant result, what should you
inspect before you interpret the results?
Q14. Two-way tables describe relationships between two (categorical or continuous)
variables.
Q15. When there are multiple comparisons that can be made, what two steps are often
carried out?
Q16. When doing a chi-square test to compare several proportions, the first step is to set
up the table with the numbers in it being (proportions of success and number of trials, or
counts of the number of cases falling into each category).
Q17. When there are two categorical variables being displayed in an r by c table (with r
rows and c columns), each of the r x c possible categories into which the observations
may fall is called a _____ of the table.
Q18. When we are comparing the proportion of successes for three treatment conditions,
what null hypothesis would we use?
Q19. When comparing the proportion of successes for three treatment conditions, what
would be the alternative hypothesis?
Q20. In testing H0 via chi-square with a two-way table, we compare the observed counts
with the expected counts. Evidence against H0 consists of observed and expected counts
that are (far from each other, close to each other).
Q21. How do you compute the expected count in a certain cell of a two-way table?
Q22. The calculation of the expected value for a cell of a two-way table assumes that the
row and column variables are (disjoint, independent).
Q23. Please explain why the formula for expected cell counts is what it is.
Q24. When you want to test the statistical significance of the deviation of observed from
expected counts, in a two-way table, using chi-square, how do you compute the chi-square statistic?
Q25. Are large values of chi-square evidence for or against H0? Why?
Q26. How many degrees of freedom do you have in a chi-square test with an r * c two-way table?
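This Python sketch puts Q21, Q24, and Q26 together for an r by c table. Each expected count is row total * column total / table total. Chi-square sums (observed - expected)^2 / expected over every cell. The degrees of freedom are (r - 1)(c - 1). The function name and the 3 by 2 table (three treatments, success/failure) are hypothetical.

```python
def chi_square_two_way(table):
    """Expected counts, chi-square statistic, and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell: row total * column total / table total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, chi_sq, df


# Hypothetical counts: rows are treatments, columns are (success, failure)
observed = [[30, 20], [25, 25], [15, 35]]
expected, chi_sq, df = chi_square_two_way(observed)
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```

Before interpreting the result, check that the expected counts (not the observed ones) meet the cell-count conditions.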
Q27. True or false: when doing chi-square tests, the p-value is always the area under the
distribution curve that is to the right of the observed chi-square, and never the area to the
left.
Q28. What cell counts are required for doing a chi-square test for homogeneity of
populations?
Q29. In the special case of a two-by-two table (r=2 and c=2), how many cell counts need
to be 5 or greater in order to do a chi-square?
Q30. How many degrees of freedom would be used for a 3 by 2 table?
Q31. Just for review: what are the four steps for the "inference toolbox" that you use for
doing chi-square tests as well as any other hypothesis tests?
Q32. After having done an overall test rejecting the hypothesis that all the proportions are
equal, what should be done?
Q33. True or false: the chi-square tests the hypothesis that "the row and column variables
are not related to each other," even when it is difficult to conceive of the groups defined
by the rows and columns as different populations, i.e. when you are dealing with the
relation of some variables in one population.
Q34. True or false: for a chi-square test of association/independence of variables, you
compute the expected counts just as in the other situations: the row total * column total/
table total.
Q35. True or false: converting table entries to percents is not necessary for the
computation of chi-square, but it does help to shed light on the association among the
variables.
Q36. For a chi-square test of association/independence of variables, what is the null
hypothesis?
Q37. True or false: the distinction between tests of homogeneity of populations and tests
of association/independence is that in the first, there is a sample from each of two or
more populations, and in the second, there is a single sample from a single population.
Q38. When there is a two-by-two table, and you wish to compare two proportions, how
will a two-sided z test for equality of proportions and a chi-square test compare with
respect to the p values that result?
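For a 2 by 2 table, the chi-square statistic (computed without a continuity correction) equals the square of the pooled two-proportion z statistic. As a result, the two-sided z P-value and the chi-square P-value with 1 degree of freedom are the same. The Python sketch below checks this numerically on hypothetical counts. The function names are illustrative only.

```python
import math


def pooled_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se0


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square for the 2 x 2 table of successes and failures (no correction)."""
    table = [[x1, n1 - x1], [x2, n2 - x2]]
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    return sum(
        (table[i][j] - row[i] * col[j] / total) ** 2 / (row[i] * col[j] / total)
        for i in range(2)
        for j in range(2)
    )


z = pooled_z(120, 200, 90, 200)
chi_sq = chi_square_2x2(120, 200, 90, 200)
print(f"z^2 = {z ** 2:.4f}, chi-square = {chi_sq:.4f}")
```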
Q39. If there is a two-by-two table and you wish to compare two proportions, which test
is usually recommended, a z test or a chi-square test, and why?
YMS Chapter 14: Inference for Regression
Q1. To review: we use least-squares regression to study the relation between a couple of
variables, both of which are (quantitative, categorical).
Q2. Before doing regressions to study the relationship between two quantitative
variables, we should explore the data by examining a _______.
Q3. The statistic that describes the strength of a linear relationship, that is the same
whichever variable is thought of as the explanatory variable, and which has a familiar
relationship to the percent of variance in one variable explained by the other, is the
______ ______.
Q4. The formulas for generating a slope and intercept to form a regression line are set up
so as to minimize what quantity?
Q5. What is a residual?
Q5. The r-squared, which is part of the regression output, tells us how much of what is
what?
Q6. Suppose we draw lots of samples and compute a regression line for each sample.
The slope and intercept of each sample line estimate a true value. Thus the slope and
intercept we obtain from our sample are _____ that estimate population ______.
Q7. One of the conditions for regression inference is that for any fixed value of x, the
response variable y varies according to a _____ distribution.
Q8. Another assumption for regression inference is that for any fixed value of x, the
repeated responses y are ____ of each other.
Q9. Another assumption for regression inference is that the means of the sets of y-values
for each x value have what relationship to the x values?
Q10. Another assumption for regression inference is that what measure of dispersion is
equal for each value of x?
Q11. True or false: the slope and intercept we obtain from the least-squares regression for
our sample are unbiased estimators, respectively, of the slope and intercept of the line
connecting the population means of y for each of the x's.
Q12. What is the unbiased estimator for the standard deviation of the y values around the
regression line (in other words, the standard deviation of the y values around the mean
of y for each x)?
Q13. What's the formula for the standard error s obtained from a regression?
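Here is a minimal Python sketch of the least-squares quantities referred to in Q4, Q5, and Q13. The slope is b = sum((x - x-bar)(y - y-bar)) / sum((x - x-bar)^2) and the intercept is a = y-bar - b * x-bar. A residual is observed y minus predicted y. The regression standard error is s = sqrt(sum of squared residuals / (n - 2)), where n - 2 reflects the two estimated parameters. The function name and the data are hypothetical.

```python
import math
import statistics


def least_squares_fit(x, y):
    """Return intercept a, slope b, residuals, and regression standard error s."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope that minimizes the sum of squared residuals
    a = ybar - b * xbar           # the line passes through (x-bar, y-bar)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # observed - predicted
    s = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))
    return a, b, residuals, s


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
a, b, residuals, s = least_squares_fit(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x, s = {s:.3f}")
```

The least-squares residuals always sum to zero (up to rounding), which is a handy check on hand calculations.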
Q14. The parameter we are usually most interested in estimating from regression output
is the (slope, y-intercept) of the line.
Q15. How do we put a confidence interval around the slope of a regression line?
Q16. When you find the confidence interval for the slope of a regression line, one of the
variables in the equation is t*. Please explain how you would find t* when your sample
size is 40 and your desired level of confidence is 95%.
Q17. The standard error of the slope is a function of s. But it's not really necessary to
memorize the formula for this, because _____.
Q18. The most commonly tested hypothesis about regressions is that Beta, the
"Population slope," is 0. Can you put this hypothesis in some other phrasings?
Q19. If you form the ratio of the slope obtained in your sample to the standard error of
that slope, what is the sampling distribution of that statistic?
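This Python sketch computes the standard error of the slope, SE_b = s / sqrt(sum((x - x-bar)^2)), and the ratio t = b / SE_b. Under H0: beta = 0, that ratio has a t distribution with n - 2 degrees of freedom. The same SE_b gives the interval b +/- t* SE_b, with t* read from a t table for n - 2 df (for example, df = 38 when n = 40). The function name and data are hypothetical.

```python
import math
import statistics


def slope_t_statistic(x, y):
    """Return slope b, its standard error, t = b / SE_b, and df = n - 2."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    se_b = s / math.sqrt(sxx)      # standard error of the slope
    return b, se_b, b / se_b, n - 2


x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b, se_b, t, df = slope_t_statistic(x, y)
print(f"b = {b:.3f}, SE_b = {se_b:.4f}, t = {t:.2f}, df = {df}")
```

If output reports only a two-sided P-value, the one-sided P-value is half of it, provided the sample slope points in the direction stated by Ha.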
Q20. Regression output usually gives a two-sided p value for the hypothesis test that the
population slope is 0. How do you obtain a one-sided p-value for the same hypothesis?
Q21. Someone asks, "The regression equation we obtained from analyzing our data is
y=2 + 10x. What's the 95% confidence interval for the y when x is 8?" The other person
says, "That depends upon what question you want the confidence interval to answer."
What does the second person mean by this?
Q22. When we are getting a confidence interval for the predicted mean of the y's for a given
x value, is our confidence interval (narrower, wider) than when we get a "prediction
interval" for a single individual?
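The distinction in Q21 and Q22 shows up directly in the two standard errors. The mean-response interval uses s * sqrt(1/n + (x* - x-bar)^2 / sum((x - x-bar)^2)). The prediction interval for one individual adds 1 under the square root for that individual's own scatter around the line. Either interval is y-hat +/- t* SE with n - 2 degrees of freedom. This is a minimal sketch assuming the standard textbook formulas, and the function name and data are hypothetical.

```python
import math
import statistics


def mean_and_prediction_se(x, y, x_star):
    """Predicted y at x_star, SE for the mean response, SE for one new observation."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x_star
    # Mean response: uncertainty only about where the population line is
    se_mu = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)
    # Single individual: adds the scatter of one y around the line (the "1 +")
    se_pred = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)
    return y_hat, se_mu, se_pred


x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
y_hat, se_mu, se_pred = mean_and_prediction_se(x, y, x_star=8)
print(f"y-hat = {y_hat:.3f}, SE(mean) = {se_mu:.3f}, SE(prediction) = {se_pred:.3f}")
```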
Q23. Suppose that some researchers are testing how long a thermometer takes to come to
equilibrium, given the temperature difference between starting and ending temperature.
They regress the time to equilibrium on the temperature difference. Then the researchers
want to produce a table for a clinician. The table will contain numbers so as to help the
clinician predict the time to wait for a certain individual who is using the thermometer,
such that there is a 95% chance that this individual will not start trying to read the
temperature before the temperature has equilibrated. Please explain whether the
researchers should use a confidence interval or a prediction interval in constructing their
table.
Q24. Please list 4 conditions for inference in regression.
Q25. Is there an additional condition placed upon the construction of a prediction interval
via regression?
Q26. Suppose that in a residual plot, the values are close to 0 when x is low, but the
residuals get bigger and bigger in absolute value as the x values get greater. What
condition of regression is violated in this circumstance?
Q27. Someone examines a residual plot and a scatter plot and observes a curvilinear
pattern. What condition of regression is being violated, and what should the researcher
consider doing in order to correct this?