YMS Chapter 11 Inference for Distributions (T-Tests of Means)
Q1. What two conditions does our text list for inference about means when the population standard deviation is not known?
Q2. The sample standard deviation, divided by the square root of the number of observations n in the sample, is called the s_____ e_____ of the sample mean.
Q3. When the standard deviation of any statistic is estimated from the data, the result is called the ____ ____ of that statistic. (Thus you can have these for statistics other than the sample mean.)
Q4. Does it make sense to speak of the standard deviation of the population mean? If not, why not?
Q5. The z-statistic is (x-bar - mu)/(sigma/sqrt(n)). What is the t-statistic?
Q6. There is just one standard normal distribution. Is there just one t-distribution?
Q7. What is the general shape of the t-distribution?
Q8. As the degrees of freedom increase, the shape of the t-distribution more and more closely approximates what?
Q9. Please explain why the shape of the t-distribution differs from that of the normal distribution when the degrees of freedom are low.
Q10. What is the expression for the level C confidence interval for the population mean (mu), using the t-distribution when the population standard deviation is unknown?
Q11. How do you construct the t-statistic and use it to test a hypothesis about a hypothesized population mean mu-zero?
Q12. Suppose you form, from the subjects who have signed up for your study, a set of pairs whose members are as similar as you can make them. Then you randomly assign one member of each pair to get a treatment, and the other to get a placebo or no treatment. What sort of design is this?
Q13. If in a matched pairs design you simply measure the post-treatment outcome for each subject and want to know whether the treated subjects do better than the untreated subjects, how does our text suggest conducting that test?
Q14. When you use a matched pairs design as suggested, what population parameter are you estimating?
Q15. What is a "robust" procedure?
Q16. The t procedures are very robust against (nonnormality of the population, outliers) but not very robust against (nonnormality of the population, outliers).
Q17. The main reason why the t procedures are robust against nonnormality of the population is what theorem?
Q18. The rules of thumb for sample size and t procedures are that with sample size less than ____, you can use t if the data are close to normal and there are no outliers; with sample size ___ to ____, you can use t unless there are outliers or strong skewness; and with sample size over ____, you can use t procedures even for clearly skewed distributions.
Q19. When calculating the power of a t-test (as when calculating the power of a z-test), what two steps do you go through?
Q20. In power calculations, and in explaining them, there's a tricky distinction between the "alternative hypothesis" for the test and the "alternative" that is assumed for the sake of power calculations. Please clear up any confusion on this by giving an example. (The example on page 640 will do fine.)
Q21. Suppose you are hoping that the listening scores of 20 subjects will improve by 2 points, and that your test can detect such an improvement. Your null hypothesis would be a mean improvement score of zero, tested against an Ha, "alternative hypothesis," that the mean is greater than zero (or, if a two-sided test is desired, that the mean is not equal to zero). But the "alternative" that you use in your power calculations is that the mean change is really 2 points. So the "alternative hypothesis" for the test is that the mean is greater than 0, and the "alternative" for the power calculation is that the mean is equal to 2.
Q22. One researcher randomly samples 2 groups from a population, and gives training to one and not the other. The researcher uses a t procedure to compare the test scores of the two groups. Another researcher samples a group from the population and gives a test to the group 2 times, once before training and once after. The researcher uses a t procedure to compare the results after training with those before training. How are these two situations different, and what different statistical procedures should they result in?
Q23. Has it always been an assumption or condition for every inference procedure we have studied (as contrasted with the exploratory analysis procedures we studied) that the sample can be considered an SRS of the population of interest?
Q24. Suppose someone were to draw many pairs of samples from two populations, and compute the difference between the sample means for each pair. What would the mean of this difference approach as the number of samples drawn approached infinity?
Q25. The fact that the mean of the difference in sample means approaches the difference in population means as the number of samples gets larger is a long way of saying that the difference in sample means is an ____ estimator of the difference in population means.
Q26. True or false: just as the difference in sample means estimates the difference in population means, the difference in sample standard deviations estimates the population standard deviation of the difference between two means.
Q27. True or false: the variance of the difference between two population means is estimated by s1^2/n1 + s2^2/n2, where s1 and s2 are the sample standard deviations (and thus s1^2 and s2^2 are the sample variances) and where n1 and n2 are the sample sizes.
Q28. When the standard deviations of the two populations you are sampling from are different, why does the standardized difference of the means of two independent samples not exactly follow the t-distribution?
Q29. When the sample sizes for the two samples are different, how many degrees of freedom do you use for t procedures? Please mention two options.
Q30. Can you explain in words how you form a confidence interval for the difference between two means?
Q31. Can you explain in words how you do a hypothesis test of the null hypothesis that two group means are equal?
Q32. The guidelines for sample sizes with one-sample t procedures are also applicable to two-sample t tests, if you substitute for the phrase "sample size" the phrase ____.
Q33. For using t procedures with means of independent samples, an excellent approximation is achieved by using the t-distribution with a not-necessarily-integer number of degrees of freedom computed by the formula on p. 659. This approximation is quite accurate when both sample sizes are what?
Q34. When software gives you a choice between assuming or not assuming equal variances for the two populations whose means you are comparing with a t procedure, which choice should you generally make?
YMS Chapter 12 Inference for Proportions
Q1. The statistic that estimates (in an unbiased way) the population proportion is ____.
Q2. What is the standard deviation of the sample proportion (provided the population is at least 10 times as big as the sample)?
Q3. If np and nq are at least 10, then we can treat the distribution of p-hat as approximately what?
Q4. Just for review: if asked for the standard deviation of the population proportion, what would you say?
Q5. Just for review: what does it mean to "standardize" any statistic?
Q6. When you standardize p-hat, what do you subtract and divide by?
Q7. What familiar ploy do we use to cope with not knowing the population proportion, in creating a standardized p-hat?
Q8. What is the expression for a confidence interval for the population proportion, based on the sample proportion?
Q9. Just for review: what would you say if asked to do a hypothesis test about whether a sample proportion was equal to a certain number?
Q10. How do you do a hypothesis test that a population proportion is equal to a certain number?
Q11. Given a certain margin of error, for example ±.03, how do you determine how big a sample you need to get a margin of error that small or smaller?
Q12. How do you solve for the sample size needed in a study of a proportion without knowing the proportion you are looking for in the first place? Please mention two strategies.
Q13. If the guessed proportion is anywhere between .3 and .7, what should you, as a rule of thumb, use as p* (the estimated proportion of successes) in your calculations of the necessary sample size?
Q14. Suppose we give a cancer drug to one group and not to another group, and look at the difference in proportions of people who survive for five years in these two conditions. What parameter are we trying to estimate, and what statistic do we use to estimate it?
Q15. How do you find the standard deviation of the difference in sample proportions?
Q16. What is the expression for the confidence interval for the difference of two proportions?
Q17. What are the conditions for the confidence interval for a difference in proportions?
Q18. In doing a hypothesis test for the difference of two proportions, we compute a z statistic. What, in general terms (that is, without going into the specific formula), is in the numerator and the denominator of this statistic?
Q19. In the special case where the null hypothesis is that the two proportions are equal, in other words that the difference between them is 0, what do we do differently because of this?
YMS Chapter 13 Inference for Tables: Chi-Square Procedures
Q1. Suppose that you are dealing with a situation where there are several possible outcomes, not just 2 (success and failure). You are interested in seeing whether the proportion of outcomes falling into each of a certain set of categories is consistent with a certain hypothesized population distribution. What is the name of the test you use?
Q2. Suppose that your hypothesized population distribution for the percent of objects that are certain colors is 20% black, 50% white, and 30% green. Suppose you draw a sample of 200 to test this hypothesis. What are the "expected" values that you use when you do the chi-square goodness of fit test?
Q3. Suppose you are doing a goodness of fit test for the distribution hypothesized in Q2. What is the formula that you use for chi-square?
Q4. In testing the hypothesis mentioned in Q2, suppose your observed counts are 45, 90, and 65. What does chi-square equal for this goodness of fit test? Please write a numerical expression without bothering to calculate the result.
Q5. Is there just one chi-square distribution, or a family of distributions, with one distribution for each number of degrees of freedom?
Q6. How do you find the number of degrees of freedom for a chi-square goodness of fit test? For example, how many degrees of freedom would there be if you were looking at the proportions of blacks, whites, and greens as in Q2?
Q7. When you look up in a table or on a calculator the P-value associated with a certain chi-square value, what is that the probability of?
Q8. Is the chi-square distribution symmetrical? If not, in which direction is it skewed?
Q9. When you are doing a chi-square test for goodness of fit, what are the null hypothesis H0 and the alternative hypothesis Ha?
Q10. The H0 is that the population percents are equal to the set of hypothesized percents. The Ha is that the population percents do not equal that set of hypothesized percents.
Q11. What are the rule-of-thumb conditions for the use of the chi-square goodness of fit test?
Q12. What's an efficient way of calculating a chi-square on a TI-83 or TI-84 graphing calculator? (If you have a TI-89, please tell me how to do the same thing on it.)
Q13. If a chi-square goodness of fit test yields a significant result, what should you inspect before you interpret the results?
Q14. Two-way tables describe relationships between two (categorical or continuous) variables.
Q15. When there are multiple comparisons that can be made, what two steps are often carried out?
Q16. When doing a chi-square test to compare several proportions, the first step is to set up the table with the numbers in it being (proportions of success and number of trials, or counts of the number of cases falling into each category).
Q17. When there are two categorical variables being displayed in an r by c table (with r rows and c columns), each of the r x c possible categories into which the observations may fall is called a _____ of the table.
Q18. When we are comparing the proportion of successes for three treatment conditions, what null hypothesis would we use?
Q19. When comparing the proportion of successes for three treatment conditions, what would be the alternative hypothesis?
Q20. In testing H0 via chi-square with a two-way table, we compare the observed counts with the expected counts. Evidence against H0 consists of observed and expected counts that are (far from each other, close to each other).
Q21. How do you compute the expected count in a certain cell of a two-way table?
Q22. The calculation of the expected value for a cell of a two-way table assumes that the row and column variables are (disjoint, independent).
Q23. Please explain why the formula for expected cell counts is what it is.
Q24. When you want to test the statistical significance of the deviation of observed from expected counts in a two-way table using chi-square, how do you compute the chi-square statistic?
Q25. Are large values of chi-square evidence for or against H0? Why?
Q26. How many degrees of freedom do you have in a chi-square test with an r by c two-way table?
Q27. True or false: when doing chi-square tests, the P-value is always the area under the distribution curve to the right of the observed chi-square, and never the area to the left.
Q28. What cell counts are required for doing a chi-square test for homogeneity of populations?
Q29. In the special case of a two-by-two table (r=2 and c=2), how many cell counts need to be 5 or greater in order to do a chi-square test?
Q30. How many degrees of freedom would be used for a 3 by 2 table?
Q31. Just for review: what are the four steps of the "inference toolbox" that you use for doing chi-square tests as well as any other hypothesis tests?
Q32. After having done an overall test rejecting the hypothesis that all the proportions are equal, what should be done?
Q33. True or false: the chi-square tests the hypothesis that "the row and column variables are not related to each other," even when it is difficult to conceive of the groups defined by the rows and columns as different populations, i.e., when you are dealing with the relation of some variables in one population.
Q34. True or false: for a chi-square test of association/independence of variables, you compute the expected counts just as in the other situations: (row total × column total) / table total.
Q35. True or false: converting table entries to percents is not necessary for the computation of chi-square, but it does help to shed light on the association among the variables.
Q36. For a chi-square test of association/independence of variables, what is the null hypothesis?
Q37. True or false: the distinction between tests of homogeneity of populations and tests of association/independence is that in the first, there is a sample from each of two or more populations, and in the second, there is a single sample from a single population.
Q38. When there is a two-by-two table and you wish to compare two proportions, how will a two-sided z test for equality of proportions and a chi-square test compare with respect to the P-values that result?
Q39. If there is a two-by-two table and you wish to compare two proportions, which test is usually recommended, a z test or a chi-square test, and why?
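The chi-square arithmetic behind Chapter 13 (Q2 to Q4, Q21, Q24, Q26) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names are hypothetical helpers, not a calculator or library API, and the 2 x 2 table is made-up data. It stops at the statistic and the degrees of freedom: the P-value would still come from a chi-square table or a calculator, since the Python standard library has no chi-square distribution.

```python
# Minimal chi-square sketch (standard library only; helper names are hypothetical).

def chi_square_gof(observed, hypothesized_props):
    """Goodness of fit: expected count = n * p for each category; df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, stat, len(observed) - 1


def expected_counts(table):
    """Two-way table: expected count = (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Chi-square statistic for an r by c table of counts; df = (r - 1)(c - 1)."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Distribution from Q2 (20% black, 50% white, 30% green) and counts from Q4.
expected, stat, df = chi_square_gof([45, 90, 65], [0.20, 0.50, 0.30])
print(expected, round(stat, 4), df)

# Hypothetical 2 x 2 table of counts, for illustration only.
table = [[10, 20], [30, 40]]
print(expected_counts(table))
print(chi_square_two_way(table))
```

Each term of the sum is (observed - expected)^2 / expected, so cells where the observed count is far from the expected count contribute most to the statistic.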
YMS Chapter 14: Inference for Regression
Q1. To review: we use least-squares regression to study the relation between a couple of variables, both of which are (quantitative, categorical).
Q2. Before doing regressions to study the relationship between two quantitative variables, we should explore the data by examining a _______.
Q3. The statistic that describes the strength of a linear relationship, that is the same whichever variable is thought of as the explanatory variable, and that has a familiar relationship to the percent of variance in one variable explained by the other, is the ______ ______.
Q4. The formulas for generating a slope and intercept to form a regression line are set up so as to minimize what quantity?
Q5. What is a residual?
Q5. The r-squared, which is part of the regression output, tells us how much of what is what?
Q6. Suppose we draw lots of samples and compute a regression line for each sample. The slope and intercept of each sample line estimate a true value. Thus the slope and intercept we obtain from our sample are _____ that estimate population ______.
Q7. One of the conditions for regression inference is that for any fixed value of x, the response variable y varies according to a _____ distribution.
Q8. Another assumption for regression inference is that for any fixed value of x, the repeated responses y are ____ of each other.
Q9. Another assumption for regression inference is that the means of the sets of y values for each x value have what relationship to the x values?
Q10. Another assumption for regression inference is that what measure of dispersion is equal for each value of x?
Q11. True or false: the slope and intercept we obtain from the least-squares regression for our sample are unbiased estimators, respectively, of the slope and intercept of the line connecting the population means of y for each of the x's.
Q12. What is the unbiased estimator for the standard deviation of the y values around the regression line (in other words, the standard deviation of the y values around the mean of y for each x)?
Q13. What's the formula for the standard error s obtained from a regression?
Q14. The parameter we are usually most interested in estimating from regression output is the (slope, y-intercept) of the line.
Q15. How do we put a confidence interval around the slope of a regression line?
Q16. When you find the confidence interval for the slope of a regression line, one of the quantities in the formula is t*. Please explain how you would find t* when your sample size is 40 and your desired level of confidence is 95%.
Q17. The standard error of the slope is a function of s. But it's not really necessary to memorize the formula for this, because _____.
Q18. The most commonly tested hypothesis about regressions is that beta, the "population slope," is 0. Can you put this hypothesis in some other phrasings?
Q19. If you form the ratio of the slope obtained in your sample to the standard error of that slope, what is the sampling distribution of that statistic?
Q20. Regression output usually gives a two-sided P-value for the hypothesis test that the population slope is 0. How do you obtain a one-sided P-value for the same hypothesis?
Q21. Someone asks, "The regression equation we obtained from analyzing our data is y = 2 + 10x. What's the 95% confidence interval for y when x is 8?" The other person says, "That depends upon what question you want the confidence interval to answer." What does the second person mean by this?
Q22. When we are getting a confidence interval for the predicted mean of y for a given x value, is our confidence interval (narrower, wider) than when we get a "prediction interval" for a single individual?
Q23. Suppose that some researchers are testing how long a thermometer takes to come to equilibrium, given the difference between the starting and ending temperatures. They regress the time to equilibrium on the temperature difference. Then the researchers want to produce a table for a clinician. The table will contain numbers to help the clinician predict how long to wait for a certain individual who is using the thermometer, such that there is a 95% chance that the clinician will not start trying to read the temperature before the thermometer has equilibrated for this individual. Please explain whether the researchers should use a confidence interval or a prediction interval in constructing their table.
Q24. Please list 4 conditions for inference in regression.
Q25. Is there an additional condition placed upon the construction of a prediction interval via regression?
Q26. Suppose that in a residual plot, the residuals are close to 0 when x is low, but get bigger and bigger in absolute value as x gets greater. What condition of regression is violated in this circumstance?
Q27. Someone examines a residual plot and a scatterplot and observes a curvilinear pattern. What condition of regression is being violated, and what should the researcher consider doing in order to correct this?
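The regression inference quantities in Chapter 14 (the standard error s, the standard error of the slope, the t statistic for H0: beta = 0, the confidence interval for the slope, and the two kinds of interval at a given x) can be computed by hand. Below is a minimal Python sketch, assuming simple linear regression with the usual conditions met; the function name and the data are hypothetical. The critical value t* is passed in by the caller because the Python standard library has no t distribution; 3.182 is the table value for 95% confidence with n - 2 = 3 degrees of freedom.

```python
import math

# Hand-computed regression inference (standard library only).
# regression_inference is a hypothetical helper; t_star must come from a t table with n - 2 df.

def regression_inference(x, y, t_star, x_new):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                                   # least-squares slope
    a = y_bar - b * x_bar                           # least-squares intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # standard error about the line
    se_b = s / math.sqrt(sxx)                       # standard error of the slope
    t = b / se_b                                    # t statistic for H0: beta = 0, df = n - 2
    y_hat = a + b * x_new
    se_mean = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)       # mean response at x_new
    se_indiv = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)  # one individual at x_new
    return {
        "slope": b,
        "intercept": a,
        "s": s,
        "se_slope": se_b,
        "t": t,
        "slope_ci": (b - t_star * se_b, b + t_star * se_b),
        "mean_ci": (y_hat - t_star * se_mean, y_hat + t_star * se_mean),
        "prediction_interval": (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv),
    }


# Hypothetical data: n = 5, so df = 3 and t* = 3.182 for 95% confidence.
result = regression_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182, x_new=3)
for name, value in result.items():
    print(name, value)
```

The only difference between the two interval standard errors is the extra 1 under the square root in se_indiv, which accounts for the scatter of a single response around the mean response; that is why the prediction interval is always the wider of the two.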