Short Answer Questions - Colorado Mesa University

Short Answer Questions that could appear on exams:
1. What is the population?
2. What is a sample?
3. List three problems with using the number 868/1523 (obtained from a Gallup
poll) for the probability that all adults bought a lottery ticket last year.
4. If the sample size in the Gallup poll went from 1523 to 6523 will the percentage
that said they bought a lottery ticket most likely go up, most likely go down, or
can you not tell?
5. If you take two samples of the same size from the same population will the
percentage that bought a lottery ticket be the same?
6. Which is likely to be closer? The percentages in two samples of size 5 from the
same population, or the percentages in two samples of size 500 from the same
population?
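Questions 4 through 6 turn on how much a sample percentage varies with sample size. The following is a minimal simulation sketch, assuming a hypothetical population in which 57% bought a lottery ticket (roughly 868/1523; the true population value is unknown). The helper names `sample_percentage` and `average_gap` are illustrative, not from the course.

```python
import random

def sample_percentage(n, p, rng):
    """Percentage of 'yes' answers in one random sample of size n."""
    yes = sum(1 for _ in range(n) if rng.random() < p)
    return 100 * yes / n

def average_gap(n, p=0.57, trials=2000, seed=1):
    """Average absolute gap between the percentages of two samples of size n.

    p = 0.57 is an assumed population proportion used only for illustration.
    """
    rng = random.Random(seed)
    gaps = [
        abs(sample_percentage(n, p, rng) - sample_percentage(n, p, rng))
        for _ in range(trials)
    ]
    return sum(gaps) / trials

gap_small = average_gap(5)     # pairs of samples of size 5
gap_large = average_gap(500)   # pairs of samples of size 500
print(f"size 5: {gap_small:.1f} points apart, size 500: {gap_large:.1f} points apart")
```

Under these assumptions the pairs of large samples should land much closer together than the pairs of small samples.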
7. In a discrete probability model all the probabilities add up to what number?
8. In a continuous probability model what adds up to 1?
9. Give 3 ways of determining probability.
10. If I give you a coin, can you find exactly the probability it will land heads?
11. Suppose I give you a bent coin, how can you estimate the probability it will land
heads?
12. Chance behavior has what property in the short run?
13. Chance behavior has what property in the long run?
14. When observing, do people tend to see the long run?
15. When observing, do people tend to give equal importance to all outcomes?
16. When observing, which outcomes do people tend to give more importance to?
17. Suppose airline A has three times as many flights out of a city as airline B.
Which will have a higher percent of delayed flights: most likely A, most likely B,
or you have no idea?
18. What is the notation for the population mean?
19. In a continuous distribution the mean is the area under what curve?
20. The area under xp(x) gives what value in the continuous case?
21. The area under p(x) gives what value in the continuous case?
22. The area under $|x - \mu|^2 p(x)$ gives what value in the continuous case?
23. What is the meaning of $|x - \mu|$?
24. Variance is similar to $\frac{\sum |x - \mu|}{N}$, which tells what in everyday terms?
25. Is variance an average?
26. Variance is the average of what?
27. The variance and standard deviation measure what?
28. The mean measures what?
29. What is the notation for population variance?
30. What is the notation for population standard deviation?
31. Describe what the standard deviation is in words.
32. What is the area under the z curve?
33. What is the mean of the z curve?
34. What is the standard deviation of the z curve?
35. What is the formula for the z curve?
36. Describe how far a standard deviation is on the z curve.
37. On the z curve how much of the data is within 1 standard deviation of the mean?
38. On the z curve how much of the data is within 2 standard deviations of the mean?
39. On the z curve how much of the data is within 3 standard deviations of the mean?
40. For any probability distribution how much of the data is within 1 standard
deviation of the mean?
41. For any probability distribution how much of the data is within 2 standard
deviations of the mean?
42. For any probability distribution how much of the data is within 3 standard
deviations of the mean?
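Questions 37 through 42 contrast the z curve with an arbitrary distribution. The sketch below computes the normal-curve shares directly and, on the assumption that questions 40 to 42 have Chebyshev's inequality in mind (the transcript does not name it), the guaranteed minimum share for any distribution with a finite standard deviation. The function names are illustrative.

```python
from statistics import NormalDist

def normal_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

def chebyshev_at_least(k):
    """Chebyshev lower bound on the share within k standard deviations (k > 0)."""
    return max(0.0, 1 - 1 / k**2)

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4), round(chebyshev_at_least(k), 4))
```

The normal shares are exact areas under the z curve, while the Chebyshev values are only guarantees: a particular distribution can have far more of its data within k standard deviations.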
43. If a set of data is normal with a mean of 40 and a standard deviation of 8, what
shape will the data have if each piece of data has 40 subtracted and then that result
is divided by 8?
44. If a set of data is normal with a mean of 40 and a standard deviation of 8, what
will be the mean of the data if each piece of data has 40 subtracted and then
that result is divided by 8?
45. If a set of data is normal with a mean of 40 and a standard deviation of 8, what
will be the standard deviation of the data if each piece of data has 40
subtracted and then that result is divided by 8?
46. What is a parameter?
47. What is a statistic?
48. Most often what is calculated, a parameter or a statistic?
49. What is the notation for the sample mean?
50. What is the notation for the sample standard deviation?
51. What is the notation for the sample variance?
52. If the sample is random, what is the best guess for $\mu$?
53. If the sample is random, what is the best guess for $\sigma$?
54. The Law of Large Numbers says that for what kind of samples $\bar{x}$ is more likely to
be closer to $\mu$?
55. If you flip a fair coin and record the percentage of heads, you will get close to
50% for what two reasons?
56. If you flip a fair coin 10 times and get close to 50% it will be mostly due to what?
57. If you flip a fair coin 1000 times and get close to 50% it will be mostly due to
what?
58. If you were to get all samples of the same size from a population with mean $\mu$,
the mean of all these sample means would be what?
59. If you were to get all samples with replacement of size n from a population with
standard deviation $\sigma$, the standard deviation of all these sample means would be
what?
60. For large samples describe the difference between sampling with and without
replacement.
61. If the original data is normal, what about the shape of all sample means from
samples of the same size?
62. If the original data is not normal, what happens to the shape of all sample means
from samples of size n as n goes up? What is the name of this theorem?
63. Consider data sets A:{25,26,26,25,24} and B:{15,25,38,22,40}. If you know one
set of data is 5 individuals and the other is 5 averages, which is more likely to be
the 5 averages? This is because the ___________ __________ of averages is
___________.
64. Explain why it makes sense that averages tend to have a smaller standard
deviation than individuals?
65. Explain the difference between $\sigma$, $\sigma_{\bar{x}}$, and $s$.
66. Which two of the three should be close and the other one is what compared to
those two? $\sigma$, $\sigma_{\bar{x}}$, and $s$
67. What does the z score tell us in terms of standard deviation?
68. We came up with the formula for P(A | B) by taking a sports team and making a
fraction for P(W | H) and the top of the fraction represented what?
69. We came up with the formula for P(A | B) by taking a sports team and making a
fraction for P(W | H) and the bottom of the fraction represented what?
70. How can you turn “how many” into “probability”?
71. True or False: P(A and B) and P(A | B) are just two ways to write the same thing?
72. True or False: P(A | B) is the same as P(B | A)?
73. Explain what P(A | B) = P(A) means in everyday terms.
74. If P(A | B) = P(A) as well as P(B | A) = P(B) we say that A and B have what
property?
75. If X and Y are independent then P(X | Y) = what?
76. To figure out how many ways a multi-step process can be done you do what?
77. How many ways can n things be arranged in a row?
78. When finding out how many ways to pick 6 numbers from 42 numbers in a lottery
in which order does not matter we first (incorrectly) came up with what? We
realized that each outcome was being counted how many times? So we divided
by this number and came up with the correct answer of what?
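The counting argument in question 78 can be checked numerically. This is a small sketch using Python's standard `math` module; the variable names are illustrative.

```python
import math

# First (incorrect) count: pick 6 of 42 numbers as if order mattered.
ordered = math.perm(42, 6)       # 42 * 41 * 40 * 39 * 38 * 37

# Each set of 6 numbers shows up once for every ordering of those 6 numbers.
repeats = math.factorial(6)

# Dividing out the repeats gives the count when order does not matter.
unordered = ordered // repeats

print(ordered, repeats, unordered)
```

The result agrees with `math.comb(42, 6)`, which computes the same "42 choose 6" count directly.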
79. The formula for the probability of getting exactly k successes in n trials,
$\binom{n}{k} p^k q^{n-k}$, is a combination of what two main ideas?
80. When using the normal to approximate the binomial the probability of exactly 6
successes is approximately the area from _____ to ____, and this is why we add
or subtract .5 when using the normal approximation for the binomial.
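To make the continuity correction in question 80 concrete, here is a sketch comparing the exact binomial probability of exactly 6 successes with the normal area over the interval that the correction uses. The values n = 20 and p = 0.3 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

n, p, k = 20, 0.3, 6   # hypothetical values, not from the course
q = 1 - p

# Exact binomial probability of exactly k successes.
exact = math.comb(n, k) * p**k * q**(n - k)

# Normal curve with the binomial's mean np and standard deviation sqrt(npq).
approx_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))

# Area from k - 0.5 to k + 0.5 stands in for the single bar at k.
approx = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

print(f"exact binomial: {exact:.4f}, normal approximation: {approx:.4f}")
```

The binomial bar at 6 has width 1 and is centered on 6, which is why the matching normal area is taken half a unit to either side.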
81. In the binomial setting what does n – k represent?
82. In the binomial setting what does $\binom{n}{k}$ represent?
83. In the binomial setting what does nq represent?
84. Is it human nature to tend to pay more attention to anecdotes or all the data?
85. Which is more important to pay attention to, anecdotes or all the data?
86. Give an example of how data beat anecdotes.
87. What is a lurking variable?
88. Give an example of lurking variable.
89. Is the mean sensitive to outliers?
90. Is the standard deviation sensitive to outliers?
91. Is the median sensitive to outliers?
92. Are the quartiles sensitive to outliers?
93. Suppose you have data only summarized in different numerical ranges. How can
you estimate the mean and standard deviation?
94. Why does the following graph make it look like drivers under 25 are the worst?
[Bar graph: "Accidents for drivers up to 50 years of age"; vertical axis: accidents, from 0 to 1000; horizontal axis: age group (under 25, 25-29, 30-34, 35-39, 40-44, 45-50)]
95. Give two problems with the following graph.
[Bar graph: "Price for a set list of groceries"; vertical axis: total cost, from 103 to 109; horizontal axis: store (Albertson's, City Market, Walmart, Safeway); bar labels: 108.22, 107.66, 106.51, 103.22]
96. Why do we do statistical graphs?
97. Let’s compare percent of children abused in Idaho and Virginia. In Idaho it's
22.6% and in Virginia it's only 5.9%. Does this mean it is safer for children in
Virginia? Explain.
98. How is it that in 1998 North Dakota, which was 45th in spending per pupil, had a
much higher SAT average (by almost 200 points) than New Jersey, which was 2nd in
spending per pupil?
99. Suppose in a big city it is found that in 25% of all fatal car accidents the driver
was under the influence of alcohol and in 75% the driver was not. It seems that it is
better to be drunk; explain why this is not the case.
100. Are statistical conclusions about populations based on samples ever 100% sure?
101. A good graph will show that many people in Florida most likely voted for whom
by mistake in 2000?
102. Explain why Colorado is probably doing better than Alabama in education
despite the fact Alabama has a higher SAT average than Colorado?
103. Our scatter plot of states with percent taking SAT and SAT average would
probably show what if we colored the southern states’ dots a different color 70 years
ago?
104. Our scatter plot of states with percent taking SAT and SAT average would
probably show what if we colored the southern states’ dots a different color today?
105. What is the notation for the sample linear correlation coefficient?
106. What is the notation for the population correlation coefficient?
107. What does the least squares line minimize?
108. What happens if you switch x and y when finding correlation?
109. What happens if you switch x and y when finding the regression line?
110. What happens to r if the units of measurement on x and/or y are changed?
111. Why does r not change if the units of measurement on x and/or y are changed?
112. If you are 1.43 standard deviations taller than the mean when measured in
inches, how many standard deviations above the mean will you be when measured in
centimeters?
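Questions 110 through 112 rest on the fact that standardizing removes the units. The sketch below uses made-up heights and weights (purely hypothetical data) to compare z-scores and r before and after a change of units. It computes r as the sum of products of z-scores divided by n - 1, which is one common form of the formula and is assumed here rather than taken from the course.

```python
import math

def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(xs, ys):
    """r as the sum of products of paired z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data, for illustration only.
heights_in = [62, 65, 68, 70, 74]
weights_lb = [120, 140, 155, 170, 200]

# Same people, different units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = correlation(heights_in, weights_lb)
r_metric = correlation(heights_cm, weights_kg)
print(f"r in inches/pounds: {r_imperial:.6f}, r in cm/kg: {r_metric:.6f}")
```

Multiplying every value by a positive constant multiplies both the deviations from the mean and the standard deviation by that constant, so the ratio (the z-score) is unchanged, and r is built entirely from z-scores.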
113. The linear correlation coefficient is always between what two numbers?
114. If there is a negative relationship, then r will be negative in part because bigger
than average x’s will correspond to __________ than average y’s making
$\left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$ the product of a _______ and a _______ which is ________.
115. Does r measure the strength of the relationship between x and y?
116. The only kind of relationship r measures is what?
117. Name three other relationships besides linear.
118. Is r sensitive to outliers?
119. Is the regression line sensitive to outliers?
120. A change in one standard deviation of x results in a change of ____ standard
deviations in y.
121. What is the meaning of $r^2$?
122. If a scatter plot does not show a linear pattern can you still find the line of best
fit?
123. If a scatter plot does not show a linear pattern should you still find the line of
best fit?
124. If r is close to 1 or -1 is that enough of a reason to find the line of best fit?
125. Are predictions for y based on an x far beyond the range of x’s you have data for
reliable?
126. Predicting a y based on an x far beyond the range of x’s you have data for is
called what?
127. Which scatter plot shows a stronger relationship?
128. Which scatter plot will have a higher value of r?
129. If there is a strong correlation between x and y does that mean that changing x
will most likely bring about a change in y?
130. Give an example in which there is a strong association between x and y, but there
is no cause and effect.
131. There is a strong relationship between elementary kids’ grades and involvement
in soccer, explain how this could be true even if there is no cause and effect.
132. Give an example in which there is a fairly strong linear correlation between x
and y but there is another variable contributing to the differences in y besides x.
Name the two variables.
133. People are often interested in how one variable affects another, give an example
in which there are many variables involved and it is basically impossible to do so.
134. Do you think that people with an agenda will still try to show x affects y even if
the setting is too complex with many variables interacting?
135. What was a possible lurking variable that could explain the high correlation
between smoking and lung cancer even if smoking did not actually cause lung
cancer?
136. What is some really good evidence that there is not some gene that both causes
lung cancer and nicotine addiction?
137. There is a strong correlation between education and wealth. Give a possible
lurking variable that could explain this without having education have a cause and
effect on wealth.
138. If a person is motivated they are likely to become wealthy and also become
educated. Do you think that motivation explains all the association between
education and wealth, so in fact there is no cause and effect?
139. If a person is motivated they are likely to become wealthy and also educated.
Do you think that motivation explains part of the association between education and
wealth, so in fact the cause and effect still exists, but it is not as strong as many might
think?
140. Give an example in which a lurking variable makes a cause and effect look
weaker than it actually is.
141. Which scatter plot will have more scatter, or will they be about the same? A)
SAT math vs SAT verbal for individual students, B) SAT math vs SAT verbal for
state averages.
142. If you try to predict an individual student’s SAT verbal from their SAT math
using the regression line for state averages instead of individuals will the prediction
be too high, too low, or about right?
143. If you try to predict an individual student’s SAT verbal from their SAT math
using the regression line for state averages instead of individuals will the prediction
be more reliable, less reliable, or have about the right amount of reliability?
144. With categorical data, what takes the place of a scatter plot?
145. What is Simpson’s Paradox?
146. Give an example of Simpson’s Paradox.
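One way to see Simpson's Paradox (questions 145 and 146) is with a small two-group table. The counts below are hypothetical, chosen only so that the reversal appears; the treatment and severity labels are likewise illustrative.

```python
# Hypothetical (successes, patients) counts, for illustration only.
results = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

def overall_rate(groups):
    """Success rate after lumping all severity groups together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total

for treatment, groups in results.items():
    by_group = {name: round(rate(*counts), 3) for name, counts in groups.items()}
    print(treatment, by_group, "overall:", round(overall_rate(groups), 3))
```

In this made-up table treatment A has the higher success rate within each severity group, yet B looks better once the groups are combined, because A was given mostly to the severe cases.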
147. Which is always possible, an experiment or an observational study?
148. If done correctly, which controls lurking variables, an experiment or an
observational study?
149. The investigators control which subjects get what treatments in which one, an
experiment or an observational study?
150. Give an example in which an experiment changed the conclusions of an
observational study.
151. Were the observational studies wrong that said women at menopause that had
hormone replacement therapy had fewer heart attacks?
152. What was the lurking variable that in observational studies made it appear that
hormone replacement therapy at menopause made women have fewer heart attacks?
153. Give three lurking variables that may explain why drinking wine
appears to be better than beer or hard liquor in observational studies.
154. How could it be proven that wine is better than beer or hard liquor when it
comes to health?
155. Suppose a large florist is deciding whether or not to accept a shipment of roses.
The florist asks a recently hired employee to go into the truck where the shipment is and
get a sample of 10 roses. What do you think this employee will do? Would you be
surprised if after accepting the shipment the florist is not happy with the overall quality of
the roses?
156. Give three biases with Mall Sampling.
157. True or false: Getting a good sample is usually pretty easy to do.
158. The reasons people get bad samples can be classified into what two categories?
159. Are volunteer response samples good?
160. Give an example of a volunteer response sample.
161. The AFA (American Family Association) has online polls. Usually these polls
will have what kind of bias?
162. The AFA (American Family Association) got upset when an online poll about
same sex marriage showed 2-1 in support of it. What happened?
163. The ultimate way to sample is to get what kind of sample?
164. How often are SRS’s possible?
165. Is it hard to get a bad sample?
166. Is it hard to get a good sample?
167. Give an example of undercoverage.
168. Give an example of nonresponse.
169. What is the problem with undercoverage and nonresponse?
170. Suppose a large city is deciding whether or not to use tax money to build a new
stadium for its NFL football team. A newspaper is curious what the residents think
and so they send out a mail questionnaire to 10,000 addresses picked at random. Do
you think all 10,000 questionnaires will be returned? Do you think that even half
will be returned? Do you think people that would like a new stadium and those that
do not will have the same rate of mailing the questionnaires back? What sort of bias
do you think will result if the newspaper relies only on the returned questionnaires?
Should they put a story in their paper telling the residents what they think about the
potential new stadium?
171. Does the wording of a question have much effect on the answers?
172. Give an example in which the wording of a question could make quite a
difference.
173. Give an example of a sensitive question that might not give accurate results.
174. Give an example of a question in which people are forgetful and the results may
not be accurate.
175. Give an example of a question asked by the wrong person that would make the
results worthless.
176. Give an example of a question that begs a certain answer and hence the results
can’t be trusted.
177. We wish to perform an experiment to see whether an online version of a Stat
course is better than an in class version. We have data from two teachers. Teacher A
teaches an online class and the average grade point for the students in this class is
2.94. Teacher B teaches a regular class and the average grade point in this class was
2.33. So we conclude the online version is better. What are three distinct problems
with this experiment?
178. What is a control group?
179. What are the three principles of experimental design?
180. What does statistically significant mean?
181. What is a placebo?
182. What is the purpose of a placebo?
183. What is a double-blind experiment?
184. What is the purpose of a double-blind experiment?
185. Statistically significant depends on what two things?
186. Give an example of how lack of realism can cause problems in an experiment.
187. In a matched-pairs experiment if each person gets both treatments, why is it still
important to divide the people up at random?
188. What is the advantage of a block design?
189. In a CI as the confidence level goes up, what happens to the margin of error?
190. In a CI as the sample size goes up, what happens to the margin of error?
191. In a CI if the standard deviation gets higher, what happens to the margin of
error?
192. All things being equal, do we prefer the margin of error to be big or small?
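Questions 189 through 191 can be explored with the margin-of-error formula for a z-based confidence interval for a mean, $z^* \sigma / \sqrt{n}$. That formula is an assumption here (the transcript does not state which interval the course uses), and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(confidence, sigma, n):
    """Margin of error z* times sigma / sqrt(n) for a z-based interval for a mean."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

base = margin_of_error(0.95, sigma=8, n=16)
higher_confidence = margin_of_error(0.99, sigma=8, n=16)
larger_sample = margin_of_error(0.95, sigma=8, n=64)
larger_sigma = margin_of_error(0.95, sigma=12, n=16)

print(f"base: {base:.2f}")
print(f"99% confidence: {higher_confidence:.2f}")
print(f"n = 64: {larger_sample:.2f}")
print(f"sigma = 12: {larger_sigma:.2f}")
```

Changing one input at a time makes the direction of each effect visible without changing anything else.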
193. Which hypothesis {Ho or Ha} are we trying to prove in a HT?
194. If we have better evidence for Ha than for Ho, does that mean that Ho will
probably be rejected?
195. If Ho is true what is the probability that you will reject it by mistake?
196. If Ho is not true what is the probability that you mistakenly do not reject it?
197. What is the total area of the rejection region in a HT?
198. If you mistakenly reject Ho, what type of error is it?
199. If you mistakenly don’t reject Ho, what type of error is it?
200. What is the chance of making a type I error?
201. What is the chance of making a type II error?
202. What is the notation for the significance level?
203. What is the notation for the total area of the rejection region?
204. Generally speaking which type of error is more important to keep small?
205. At the beginning we always assume what about Ho?
206. The pictures we draw in a HT show how ___________ would be distributed
assuming __________.
207. The edge(s) of the rejection region(s) are called what?
208. Are the critical value(s) found by a table or calculation in this class?
209. The standardized number of the statistic(s) related to the parameter(s) in Ho are
called what?
210. Is the test statistic found by a table or calculation?
211. What is the p-value in everyday terms?
212. In a right-hand tail the p-value is the area to the ________ of the test statistic.
This is because this area represents the chance of getting _________________
evidence against Ho, assuming Ho is _______.
213. In order to reject Ho, the p-value must be what compared to the significance
level?
214. Which casts more doubt on Ho, a small p-value or a large p-value?
215. Are the conditions usually met exactly when doing CIs or HTs?
216. Is it rare to see any problems when doing CIs or HTs?
217. To be a good statistician what should you do about not meeting conditions once
the data is collected?
218. To be a good statistician what should you do about any problems when doing
HTs or CIs?
219. Do CIs and HTs remedy basic flaws in the data?
220. Give an example where a SRS is called for and not met, but probably does not
cause any bad problems.
221. Give an example where a SRS is called for and not met, and this causes the
results to be useless.
222. Give an example where there was a high statistical significance of something
occurring, but it was not what people thought at first.
223. In the gastric freezing example, we were pretty sure patients were getting better,
at first doctors thought it was ______________ , but later experiments showed it was
probably just because of _________ effect? The problem was at first the gastric
freezing experiment was not _________.
224. Do outliers have much effect on the HTs and CIs we do in this class?
225. If you have an outlier that is found to be an incorrect piece of data and can’t be
corrected, the best thing to do is what?
226. If you have an outlier that is found to be a real piece of data, should you remove
it?
227. Does the margin of error in a CI fix nonresponse?
228. Does the margin of error in a CI fix undercoverage?
229. Does the margin of error in a CI fix biased data?
230. If your sample is not a random sample, can you be 95% sure that the CI has the
correct answer for the parameter?
231. There is only one thing the margin of error in a CI covers, what is that?
232. What does the p-value mean in cases where the sample you use for the HT has
problems with it?
233. What are three things that affect how small we want the p-value or the
significance level to be?
234. Should we always use the 5% significance level?
235. If you have a small sample size, what will happen to the p-value if the same
behavior is seen with a larger sample?
236. If you have a small sample size and the p-value is too high, should you just give
up on rejecting Ho?
237. Does practically significant mean the same as statistically significant?
238. A small difference that nobody would care about in the real world, but we are
really sure about is _____________ significant, but not _____________ significant?
239. When doing HTs is it best to first look at the data you collect before deciding on
Ho and Ha?
240. Is it a good idea to do many different HTs to search for things that are true?
241. Why is it not a good idea to do many different HTs to search for things that are
true?
242. Is it a good idea to repeat the same HTs with different sets of data?
243. Why is it a good idea to repeat the same HTs with different sets of data?
244. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the mean of sample
means of size 16?
245. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the standard
deviation of sample means of size 16?
246. If $\mu = 40$ and $\sigma = 8$ and the data is normal, what will be the shape of sample
means of size 16?
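The setting of questions 244 through 246 (population mean 40, standard deviation 8, samples of size 16) can be checked against a simulation. This is a sketch under the assumption that samples are drawn independently from a normal population; the seed and the number of simulated samples are arbitrary choices.

```python
import math
import random
import statistics

mu, sigma, n = 40, 8, 16

# Theory: sample means center on mu and have standard deviation sigma / sqrt(n).
theory_mean = mu
theory_sd = sigma / math.sqrt(n)

# Simulation check: many random samples of size n from a normal population.
rng = random.Random(0)
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(5000)
]
sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.stdev(sample_means)

print(f"theory:    mean {theory_mean}, sd {theory_sd}")
print(f"simulated: mean {sim_mean:.2f}, sd {sim_sd:.2f}")
```

A histogram of `sample_means` would also show the shape, which stays normal when the original data is normal.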
247. Gosset came up with the t distributions by trying to make what product have a
high quality?
248. Suppose you have a large sample and use z in place of t will the difference be
that noticeable?
249. Suppose you have a small sample and use z in place of t will the difference be
that noticeable?
250. What is the area under a t curve?
251. What distribution is a t with $\infty$ degrees of freedom?
252. How often will we be able to exactly meet the conditions for CIs and HTs to be
mathematically precise?
253. Generally speaking is there more concern with doing HTs and CIs with
small sample sizes or large sample sizes?
254. Name one problem with doing HTs and CIs with large sample sizes.
255. When using the z or t why do we not really care about the normality of the data
for large sample sizes?
256. If your degrees of freedom are not in the table what should you do?
257. If you reject an Ho assuming fewer degrees of freedom than you actually have,
will you be able to reject Ho with the correct degrees of freedom?
258. If you reject an Ho assuming more degrees of freedom than you actually have,
will you be able to reject Ho with the correct degrees of freedom?
259. If you give a 95% CI assuming fewer degrees of freedom than you actually have,
you should be ____________ than 95% sure you have the correct answer in the CI?
260. If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is
normal, what will be the mean of sample means of size 16?
261. If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is
normal, what will be the best estimate for the standard deviation of sample means of
size 16?
262. If $\mu = 40$ and $s = 8$ based on a random sample of size 16 and the data is
normal, what will be the shape of sample means of size 16?
263. Suppose you assume $\mu = 40$ and $s = 8$ and $n = 16$ and $\bar{x} = 42$ and you are trying
to prove $\mu > 40$.
a) Would it be better or worse if s = 9?
b) Would it be better or worse if n = 17?
c) Would it be better or worse if $\bar{x} = 43$?
264. In each case do you think that conditions are OK to do a HT or CI with the given
data:
a) You are comparing two means and your sample sizes are 5 and 8. The samples
are random. There are no outliers but the shapes of the sample data are quite
different.
b) You are comparing two means and your sample sizes are 50 and 80. The
samples are random. There are no outliers but the shapes of the sample data
are quite different.
c) You are comparing two means and your sample sizes are 5 and 8. The samples
are random. There are no outliers and the shapes of the sample data are very
close.
d) You are comparing two means and your sample sizes are 50 and 80. The
samples are random. There are no outliers and the shapes of the sample data
are very close.
e) You are comparing two means and your sample sizes are 5 and 8. The samples
are random. There are two minor outliers and the shapes of the sample data are
very close.
f) You are comparing two means and your sample sizes are 50 and 80. The
samples are random. There are two minor outliers and the shapes of the sample
data are very close.
g) You are comparing two means and your sample sizes are 5 and 8. The samples
are random. There are two minor outliers and the shapes of the sample data are
quite different.
h) You are comparing two means and your sample sizes are 50 and 80. The
samples are random. There are two minor outliers and the shapes of the sample
data are quite different.
i) You are studying a mean and have a sample of size 10. The sample data is
symmetric with no outliers and the data was collected at random.
j) You are studying a mean and have a sample of size 10. The sample data is not
symmetric and there are no outliers and the data was collected at random.
k) You are studying a mean and have a sample of size 10. The sample data is
symmetric with a minor outlier and the data was collected at random.
l) You are studying a mean and have a sample of size 100. The sample data is
symmetric with no outliers and the data was collected at random.
m) You are studying a mean and have a sample of size 100. The sample data is not
symmetric with no outliers and the data was collected at random.
n) You are studying a mean and have a sample of size 100. The sample data is
symmetric with a minor outlier and the data was collected at random.
o) You are studying the mean heights of all adult men and have a sample of size
1200. The sample data is all major league baseball players and it is symmetric
with no outliers.
265. In each case the sample is not a SRS; do you think it will be OK to do a HT or CI
with the given data?
a) You are studying the mean number of gallons of milk sold per day by a store
and your sample is 60 days all in a row.
b) You are studying the mean number of gallons of milk sold per day by a store
and your sample is 30 days starting with one day and picking every 7th day
after that.
c) You are studying the mean number of gallons of milk sold per day by a store
and your sample is 30 days starting with one day and picking every 12th day
after that.
d) You are studying the mean drying time of paint on 2x4’s sold by a home
improvement store and your sample is 40 boards all from the same shipment
and the wood is pretty much the same from shipment to shipment.
e) You are studying the mean drying time of paint on 2x4’s sold by a home
improvement store and your sample is 40 boards all from the same shipment
and the wood tends to vary quite a bit from shipment to shipment.
f) You are studying the mean drying time of paint on 2x4’s sold by a home
improvement store and your sample is 40 boards in which you chose 10
shipments spaced out over several months and then chose 4 boards from each
shipment (one off the top, two from the middle, and one off the bottom).
g) You are studying the mean drying time of paint on 2x4’s sold by a home
improvement store and your sample is 240 boards all from the same shipment
and the wood tends to vary quite a bit from shipment to shipment.
h) You are studying the percents of cats that prefer two different types of cat
food and your sample is 42 cats that were basically all the cats of all the
people you knew real well that would participate.
i) You are studying the percents of people that prefer two different types of beer
and your sample is 42 prisoners in county jail.
j) You are studying the difference in average weights of boy 4th graders and girl
4th graders and your samples are all 52 4th grade boys from a school in
Mississippi and all 32 4th grade girls from a school in Colorado.
k) You are studying the difference in average weights of boy 4th graders and girl
4th graders and your samples are all 52 4th grade boys from a school in
Mississippi and all 32 4th grade girls from the same school.
266. For HTs and CIs for comparing means from two independent samples with small
sample sizes, you want the samples to have similar __________ with no __________.
267. How can you get a good idea about the shape of a distribution?
268. How can you get a good idea if there are outliers?
269. For HTs and CIs for comparing means from two independent samples, if you
knew the population standard deviations what distribution would you use?
270. When comparing two means, we use what arithmetic operation to compare
them?
271. When subtracting means from two independent samples {X and Y} of size $n_X$
and $n_Y$ with variances $\sigma_X^2$ and $\sigma_Y^2$ the variance is ______________. The standard
deviation is _____________. The best estimate for $\sigma_X^2$ is ______ and the best
estimate for $\sigma_Y^2$ is ________, so the best estimate for the standard deviation of
$\bar{x} - \bar{y}$ is _______________ .
272. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the shape of the distribution of $\bar{x} - \bar{y}$?
273. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the mean of the distribution of $\bar{x} - \bar{y}$?
274. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $\sigma_X = 8$ and $\sigma_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the standard deviation of the distribution of
$\bar{x} - \bar{y}$?
275. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the approximate shape of the distribution of
$\bar{x} - \bar{y}$?
276. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the mean of the distribution of $\bar{x} - \bar{y}$?
277. Suppose $\mu_X = 40$ and $\mu_Y = 40$ and $s_X = 8$ and $s_Y = 9$ and $n_X = 14$ and $n_Y = 12$
and X and Y are normal, what will be the best guess for the standard deviation of the
distribution of $\bar{x} - \bar{y}$?
278. What is the notation for the sample proportion?
279. What is the notation for the population proportion?
280. If the data is random what is the best guess for p?
281. How do you go from how many successes (Binomial) to proportion of
successes?
282. If you divide the number of successes by n, the mean gets divided by ____ and
the variance gets divided by ______.
283. The mean of the binomial is np which divided by n is ______, so the mean of p’
is _______.
284. The variance of the binomial is npq which divided by $n^2$ is _______, so the
standard deviation of p’ is __________.
285. If a population is normal, then dividing by n will give it what shape?
286. The binomial is approximately normal when np and nq exceed _______ so p’ is
also approximately normal under the same conditions.
287. To figure the sample size, n, needed for a CI for a proportion, it is safe to take
p and q to be ______; this makes n the largest, and if n is too large then the margin of
error will be even _________ than what was asked for.
288. To figure the sample size, n, needed for a CI for a proportion, if you have a
reasonable value for p’ and use it then your CI may have a margin of error a little too
big, but your sample size will be ________, making collecting the data easier.
289. Suppose p = .40. What is the approximate shape of the distribution of p’ for
samples of size 200?
290. Suppose p = .40. What is the mean of the distribution of p’ for samples of size
200?
291. Suppose p = .40. What is the standard deviation of the distribution of p’ for
samples of size 200?
292. For a HT for p we use $\sqrt{\frac{pq}{n}}$, because we assume Ho is _____ and so have a
value for p. For a CI for p we use $\sqrt{\frac{p'q'}{n}}$ because we estimate p by ___ and q by
_____.
293. When comparing two proportions, we use what arithmetic operation to compare
them?
294. When subtracting proportions from two independent samples {X with sample
size $n_X$ and proportion $p_X$ and Y with sample size $n_Y$ and proportion $p_Y$} the
standard deviations from X and Y are _______ and ______, the variances are ______
and ______, when subtracting the variance is _______________ and the standard
deviation is _________________.
295. The formula for the standard deviation of the difference of two proportions is
$\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. If we must estimate the p’s with sample numbers without the
assumption that the p’s are equal then the formula becomes what?
296. The formula for the standard deviation of the difference of two proportions is
$\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$. If we estimate the p’s with sample numbers with the assumption that
the p’s are equal then the formula becomes what?
297. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the
mean of $p'_X - p'_Y$?
298. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the
standard deviation of $p'_X - p'_Y$?
299. Suppose $p_X = .40$ and $p_Y = .40$ and $n_X = 140$ and $n_Y = 120$, what will be the
approximate shape of $p'_X - p'_Y$?
300. Are CIs and HTs about variances (or standard deviations) considered risky
compared to CIs and HTs about means?
301. Is the z distribution symmetric?
302. Are the t distributions symmetric?
303. Are the $\chi^2$ distributions symmetric?
304. Are the F distributions symmetric?
305. What is the area under a $\chi^2$ curve?
306. What is the area under an F curve?
307. Suppose $\sigma^2 = 12$ and the population is normal, what is the shape of all the
$\frac{df(s^2)}{\sigma^2}$ for samples of size 25?
308. When comparing variances, what arithmetic operation is used?
309. Suppose X and Y are independent normal populations. What is the shape of
$\frac{s_X^2}{s_Y^2}$ where the sample sizes are of size 10 for X and size 8 for Y?
310. With the O and E stuff, why is the rejection region always to the right? It is
because if Ho is wrong then O and E will _______, making $\sum \frac{(O - E)^2}{E}$ _________,
which is to the right.
311. The E’s in the O and E stuff are found assuming what?
312. With a Test for Independence why are the Es = (row total)(column total)/(grand
total)? For E that is for Row 2 and Column 3, E should be (grand
total)(P(_________________)) = n(P( )P( )) because we assume that Ho is true
which is that the rows and columns are _____________. The best estimate for P(R2)
= ______ and for P(C3) = __________ making the estimate (R2 total)(C3 total)/n.
313. With the O and E stuff we want all the E’s to be at least what to get good
results?
314. What are the 4 assumptions for ANOVA?
315. If you do a good job of collecting data from different sources, the data will vary
for only what two reasons?
316. In ANOVA to reject Ho: “all means equal” you hope the variance due to
_________ is high and the variance due to ____________ is low.
317. Variance due to factor is a weighted ____________ of the sample ___________.
318. Variance due to error is a weighted ____________ of the sample ____________.
319. Suppose you have three normal populations with equal variances and you find
$\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better
evidence for a difference in population means if $\bar{x}_3 = 10$ instead?
320. Suppose you have three normal populations with equal variances and you find
$\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better
evidence for a difference in population means if $\bar{x}_3 = 8$ instead?
321. Suppose you have three normal populations with equal variances and you find
$\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better
evidence for a difference in population means if $s_3 = 10$ instead?
322. Suppose you have three normal populations with equal variances and you find
$\bar{x}_1 = 5$, $s_1 = 11$, $\bar{x}_2 = 7$, $s_2 = 12$, $\bar{x}_3 = 9$, and $s_3 = 13$. Would you have better
evidence for a difference in population means if $s_3 = 15$ instead?
323. What four things should you check in addition to graphing a scatter plot before
calculating the least-squares line?
324. Can you still do all the calculations for CIs and HTs if the data is bad?
325. Should you do all the calculations for CIs and HTs if the data is bad?
326. Give two advantages of Non-parametric statistics.
327. Give a disadvantage of Non-parametric statistics.
328. If you want to do a HT about the mean but the sample size is small and there is
an outlier, you might instead do a HT about the __________ and use what
Non-parametric test?
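
For checking your arithmetic on the standard deviation questions above (for example 274, 291, and 298), here is a minimal Python sketch. The function names are made up for illustration and are not from the course. It assumes independent samples and uses the usual rule that variances add when you subtract, so treat it as a study aid rather than an official answer key.

```python
import math


def sd_diff_means(sd_x, sd_y, n_x, n_y):
    """Standard deviation of (x-bar minus y-bar) for two independent samples.

    Variances add when subtracting: sd_x^2/n_x + sd_y^2/n_y.
    Pass population sigmas if known, or sample s values for a best guess.
    """
    return math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion p': sqrt(pq/n)."""
    q = 1 - p
    return math.sqrt(p * q / n)


def sd_diff_proportions(p_x, p_y, n_x, n_y):
    """Standard deviation of (p'_X minus p'_Y) for two independent samples."""
    var_x = p_x * (1 - p_x) / n_x
    var_y = p_y * (1 - p_y) / n_y
    return math.sqrt(var_x + var_y)


if __name__ == "__main__":
    # Numbers taken from questions 274, 291, and 298.
    print("sd of x-bar minus y-bar:", sd_diff_means(8, 9, 14, 12))
    print("sd of p':", sd_sample_proportion(0.40, 200))
    print("sd of p'_X minus p'_Y:", sd_diff_proportions(0.40, 0.40, 140, 120))
```

The same `sd_diff_means` call works for question 277 if you pass the sample standard deviations instead of the population ones, since the best guess just swaps s for sigma.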