Answers Practice Test, Section 3 Fall 2016
1. Regression
Year   Per capita consumption   Divorce rate   Predicted       Residual
       of margarine             in Maine       divorce rate
2000   8.2                      5              5.02            -.017
2001   7                        4.7            4.73            -.035
2002   6.5                      4.6            4.62            -.017
2003   5.3                      4.4            4.34             .065
2004   5.2                      4.3            4.31            -.012
2005   4                        4.1            4.03             .070
2006   4.6                      4.2            4.17             .029
2007   4.5                      4.2            4.15             .053
2008   4.2                      4.2            4.08             .123
2009   3.7                      3.7            3.96            -.260
Regress the Divorce rate in Maine (Y) on Per capita consumption of margarine (X) (data is real)
a) Calculate the predicted Y for all the observations and enter in the chart above.
b) Calculate the residual for all the observations and enter in the chart above.
c) Write out the regression line:
Divorce = 3.09 + .235 MargC
d) Find and interpret the r & r²:
r² = .917. 91.7% of the variance of Divorces in Maine can be explained by the variance in
consumption of margarine.
r = .957. There is a strong positive correlation between Divorces in Maine and consumption of
margarine.
e) Find and interpret b:
.235. For every additional unit (pound?) consumed of margarine, the divorce rate in Maine will
rise by .235.
f) Find and interpret the Y intercept:
3.09. If people consume no margarine, then the Divorce rate in Maine will be 3.09.
g) Find and interpret the X intercept:
-13.16. People would have to consume -13.16 units (pounds?) of margarine so that there would
be no divorces in Maine.
h) If in 2015, people ate 2.6 pounds of margarine, what is the predicted divorce rate?
3.70 divorce rate.
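The answers to parts a) through h) can be reproduced with a short least-squares calculation. This is only a sketch using the standard formulas b = Sxy/Sxx and a = y-bar - b·x-bar on the data from the table; the variable names are illustrative and are not from the original answer key.

```python
from math import sqrt

# Data from the table in question 1.
margarine = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]   # X
divorce = [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 3.7]   # Y

n = len(margarine)
x_bar = sum(margarine) / n
y_bar = sum(divorce) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x in margarine)
syy = sum((y - y_bar) ** 2 for y in divorce)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(margarine, divorce))

slope = sxy / sxx                       # b (part e)
intercept = y_bar - slope * x_bar       # Y intercept (part f)
r = sxy / sqrt(sxx * syy)               # correlation coefficient (part d)
r_squared = r ** 2
x_intercept = -intercept / slope        # part g

# Parts a) and b): predicted values and residuals (observed minus predicted).
predicted = [intercept + slope * x for x in margarine]
residuals = [y - y_hat for y, y_hat in zip(divorce, predicted)]

# Part h): prediction at 2.6 pounds of margarine.
prediction_2015 = intercept + slope * 2.6

print(f"Divorce = {intercept:.2f} + {slope:.3f} MargC")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"X intercept = {x_intercept:.2f}")
print(f"Predicted rate at 2.6 pounds = {prediction_2015:.2f}")
```

The residuals from a least-squares line with an intercept always sum to zero (up to rounding), which is a quick check on the Residual column.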
Conduct a hypothesis test to see if the consumption of margarine has any impact on the Divorce rate in
Maine. Use α=.02 significance level. Note: degrees of freedom = n - 2
i) Using the critical value method, what is/are the critical value(s), and which distribution is being
used?
First, note: Ho: β = 0, H1: β ≠ 0
Two-tailed test. We don’t know standard deviation, so it’s a t-test.
Critical t-values: -2.90 & 2.90
j) Using the critical value method, what is the result (statistically) of the hypothesis test and why?
The test statistic (from running the regression) is: 9.38
Since the test statistic > critical value (that is, 9.38 > 2.90), we reject the null hypothesis and
accept the alternative hypothesis.
k) Draw a diagram to represent the previous test
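[Diagram: bell curve centered at 0 with rejection regions in both tails, bounded by the critical values -2.90 and 2.90; the test statistic, 9.38, lies in the right rejection region.]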
l) Use the p-value method to conduct a hypothesis test
p = .0000137. Since p < α (that is, .00001 < .02), we reject the null and accept the alternative.
m) State in English your results.
There is sufficient, statistically significant evidence to reject the hypothesis that the
consumption of margarine has no effect on the divorce rate in Maine, and we accept the
alternative, that the consumption of margarine does affect the divorce rate in Maine.
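The test statistic of 9.38 is quoted above from the regression output. As a check, the sketch below recovers it from the correlation coefficient using the identity t = r·√(n - 2)/√(1 - r²), which holds for a regression with one X variable. This is a different route from the one the answer key took, and the critical value 2.90 is simply copied from part i) rather than computed.

```python
from math import sqrt

# Data from the table in question 1.
margarine = [8.2, 7, 6.5, 5.3, 5.2, 4, 4.6, 4.5, 4.2, 3.7]   # X
divorce = [5, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 3.7]   # Y

n = len(margarine)
x_bar = sum(margarine) / n
y_bar = sum(divorce) / n

sxx = sum((x - x_bar) ** 2 for x in margarine)
syy = sum((y - y_bar) ** 2 for y in divorce)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(margarine, divorce))
r = sxy / sqrt(sxx * syy)

# t statistic for H0: beta = 0, with n - 2 degrees of freedom.
t_stat = r * sqrt(n - 2) / sqrt(1 - r ** 2)

critical_t = 2.90                     # copied from part i): alpha = .02, two tails, 8 df
reject_null = abs(t_stat) > critical_t

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```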
2. A random sample of 300 college students was conducted. The students were asked if they had
watched the TV show “Angry Housewives of Aptos”. 113 of them said yes, they had.
a. What is the point estimate for the population proportion of college students who have
seen the show?
37.7%
b. Construct a 95% confidence interval for the population proportion
(32.2%, 43.2%)
c. What is the margin of error?
5.5%
d. How many students would need to be sampled to have a 1% margin of error while
maintaining the 95% confidence interval?
9023
(note, I used the p-hat estimate, .377, from above to estimate this)
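The arithmetic for question 2 can be sketched as follows. It assumes the usual normal-approximation interval p-hat ± z·√(p-hat(1 - p-hat)/n) and, for part d, uses the rounded p-hat of .377 as the note above does; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

n = 300
yes = 113

p_hat = yes / n                              # part a: point estimate
z = NormalDist().inv_cdf(0.975)              # about 1.96 for 95% confidence
margin = z * sqrt(p_hat * (1 - p_hat) / n)   # part c: margin of error
ci = (p_hat - margin, p_hat + margin)        # part b: confidence interval

# Part d: sample size for a 1% margin of error, always rounded up.
target_margin = 0.01
p_plan = 0.377                               # rounded p-hat, as in the note above
n_needed = ceil(z ** 2 * p_plan * (1 - p_plan) / target_margin ** 2)

print(f"p-hat = {p_hat:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), margin = {margin:.3f}")
print(f"n needed = {n_needed}")
```

Using the unrounded p-hat (113/300) in part d gives a slightly smaller answer, so the result depends a little on which estimate is plugged in.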
3. (10 points) Central Limit Theorem
a. What does the Central Limit Theorem say, and why is it so important?
The CLT says that no matter how a population is distributed, the sample mean (which
is a random variable because it is the result of a random sample) will be approximately
normally distributed, with a mean equal to the population mean and a standard deviation
equal to the population standard deviation divided by the square root of the sample size,
provided the sample size is large enough. The common assumption is that a sample size
greater than 30 is large enough (or 10 & 10 if it's a proportion).
It is important because even if we know nothing about how the population is distributed,
we can draw valid probabilistic conclusions from statistical samples if the sample size
is large enough.
b. The height of a maple tree is distributed normally with a mean of 31 meters and a
standard deviation of 4 meters. What is the probability of a tree being taller than 36
meters? Represent this graphically
=normalcdf(36, 1E99, 31, 4) = 10.6%
[Graph: normal curve with 31 and 36 marked on the horizontal axis.]
c. A group of 20 trees is selected at random. What is the probability that the average
height of these trees is more than 33 meters? Represent this graphically.
=normalcdf(33, 1E99, 31, 4/√20) = 1.3%
[Graph: normal curve with 31 and 33 marked on the horizontal axis.]
d. A group of 20 trees is selected at random. What is the probability that the average
height of these trees is between 30 and 32 meters? Represent this graphically.
=normalcdf(30, 32, 31, 4/√20) = 73.6%
[Graph: normal curve with 30, 31 and 32 marked on the horizontal axis.]
e. Calculate the Standard Error of the Mean
Assuming the sample size is still 20 (from previous two questions). SE of Mean = σ/√n
= 4/√20 = .8944
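The normalcdf calls in parts b through d can be mirrored with Python's standard-library NormalDist. This is just a sketch of the same calculations, not the calculator method the answer key used; note that a single tree uses σ = 4, while the average of 20 trees uses the standard error σ/√n.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 31, 4, 20

# Part b: one tree, so use the population standard deviation.
single_tree = NormalDist(mu, sigma)
p_taller_36 = 1 - single_tree.cdf(36)

# Part e: standard error of the mean for samples of 20 trees.
se_mean = sigma / sqrt(n)

# Parts c and d: the sample mean is distributed with standard deviation se_mean.
sample_mean = NormalDist(mu, se_mean)
p_mean_above_33 = 1 - sample_mean.cdf(33)
p_mean_30_to_32 = sample_mean.cdf(32) - sample_mean.cdf(30)

print(f"P(X > 36) = {p_taller_36:.3f}")
print(f"SE of mean = {se_mean:.4f}")
print(f"P(x-bar > 33) = {p_mean_above_33:.3f}")
print(f"P(30 < x-bar < 32) = {p_mean_30_to_32:.3f}")
```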
4. A random sample of 65 households is conducted, and they are asked about how much they
spend on vacations and travel. The sample mean is $1,780. The population has a standard
deviation of σ = $450.
a. Construct a 95% confidence interval for the population mean.
(1671, 1889)
b. What assumption is made to create this confidence interval?
Since the sample size is large enough (greater than 30), the Central Limit Theorem can
be utilized, so no matter how the population is distributed, x-bar will be distributed
close to normal.
c. The population data is known to be heavily skewed to the right (the very rich spend a
lot). Does this invalidate your results?
No. Since the sample size is large enough, we can still assume that x-bar is distributed
approximately normally no matter how the population is distributed, even if it is heavily skewed.
d. What is the margin of error?
109
e. How many households would need to be surveyed for the margin of error to be $50 (at
the same confidence level)?
312
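Because σ is known, question 4 uses a z interval. The sketch below assumes the usual formulas, margin = z·σ/√n and n = (z·σ/E)² rounded up; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

n, x_bar, sigma = 65, 1780, 450

z = NormalDist().inv_cdf(0.975)          # about 1.96 for 95% confidence
margin = z * sigma / sqrt(n)             # part d: margin of error
ci = (x_bar - margin, x_bar + margin)    # part a: confidence interval

# Part e: households needed for a $50 margin of error, rounded up.
target_margin = 50
n_needed = ceil((z * sigma / target_margin) ** 2)

print(f"95% CI = ({ci[0]:.0f}, {ci[1]:.0f}), margin = {margin:.0f}")
print(f"n needed = {n_needed}")
```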
5. A sample of beers that were bought at a particular bar were measured for their volume. The
beers should be 16 ounces. Use Data Set A to represent the sample.
Data Set A: 15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9, 16.0, 15.9,
16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6
a. Construct a 98% confidence interval for the population mean.
(15.60, 15.93)
b. What assumption is made to create this confidence interval?
Since the sample size is 30 or less (small), the population must be distributed
approximately normal to be able to construct a confidence interval with a given level of
confidence.
c. What is the margin of error?
.167
d. What does it mean that you are 98% confident in that interval?
If we were to repeat the experiment, collecting 20 observations of volume of beer pours,
we would expect that 98% of those experiments would result in a confidence interval
that includes (or captures) the true population parameter – what the actual true mean of
the pours is.
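The interval in question 5 can be checked from Data Set A with the t-interval formula x-bar ± t·s/√n. In this sketch the critical value 2.539 is an assumption taken from a t table (19 degrees of freedom, .01 in each tail, the same value that appears rounded as 2.54 in question 6), since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

data_set_a = [15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9,
              16.0, 15.9, 16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6]

n = len(data_set_a)
x_bar = mean(data_set_a)
s = stdev(data_set_a)                    # sample standard deviation (n - 1 divisor)

t_crit = 2.539                           # assumption: t-table value, 19 df, 98% confidence
margin = t_crit * s / sqrt(n)            # part c: margin of error
ci = (x_bar - margin, x_bar + margin)    # part a: confidence interval

print(f"x-bar = {x_bar:.3f}, s = {s:.3f}")
print(f"98% CI = ({ci[0]:.2f}, {ci[1]:.2f}), margin = {margin:.3f}")
```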
6. Use Data Set A above. A sample of beers that were bought at a particular bar were measured
for their volume. Test whether the average beer was less than 16 ounces. Use α=.05
significance level.
a. State the null and alternative hypothesis
H0: µ = 16 ounces
H1: µ < 16 ounces
b. Using the critical value method, what is/are the critical value(s), and which distribution
is being used?
Note: α = .01 is used here (as stated in class) rather than the α = .05 printed in the
question. Also, we MUST assume that the data is normally distributed since the sample
size is 30 or less.
It is a t-test, one-tail (left), 19 degrees of freedom.
Critical value = -2.54
c. Using the critical value method, what is the result (statistically) of the hypothesis test
and why?
Test stat = -3.57
Since the test stat < critical value (that is -3.57 < -2.54), we reject the null and accept
the alternative hypothesis.
d. Draw a diagram to represent the previous test.
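[Diagram: bell curve centered at 0 with the rejection region in the left tail below the critical value -2.54; the test statistic, -3.57, lies in the rejection region.]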
e. Using the P-value method, what is the result (statistically) of the hypothesis test and
why?
p = .001. Since p < α (that is .001 < .01), we reject the null and accept the alternative hypothesis.
f. State, in English, the result of the hypothesis test
There is sufficient information to conclude that the average beer poured is not 16
ounces, and we accept that the average beer poured is less than 16 ounces.
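The test statistic in question 6 comes from t = (x-bar - µ0)/(s/√n). A minimal sketch of that calculation on Data Set A follows; the critical value -2.54 is copied from part b rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

data_set_a = [15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9,
              16.0, 15.9, 16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6]

mu_0 = 16                                # value of the mean under H0
n = len(data_set_a)
x_bar = mean(data_set_a)
s = stdev(data_set_a)

t_stat = (x_bar - mu_0) / (s / sqrt(n))  # 19 degrees of freedom

t_crit = -2.54                           # copied from part b: left tail, alpha = .01
reject_null = t_stat < t_crit            # left-tailed test

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```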
7. 150 randomly selected voters were surveyed. 81 of the voters said they would vote “yes” on
Proposition O and/or P. Use α=.01 significance level. Conduct a hypothesis test to see if a
majority of voters will pass the propositions.
a. What conditions must hold to make valid conclusions, and are these conditions met?
We must have at least 10 people of each category (10 who would vote yes and 10 who
would vote no). This is met, since 81 said yes and 69 did not. We also must have 20
times as many people in the population as in the sample. This means the population
must be at least 3000 (150 * 20).
b. State the null and alternative hypothesis
H0: p = .5 (50%)
H1: p > .5 (50%)
c. Using the critical value method, what is/are the critical value(s), and which distribution
is being used?
Proportions use the normal distribution (z values). One tail (right) test.
z = 2.33
d. Using the critical value method, what is the result (statistically) of the hypothesis test
and why?
The test statistic = .980
We fail to reject the Null Hypothesis since the test stat is not more extreme than the
critical value (that is .98 < 2.33)
e. Draw a diagram to represent the previous test.
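[Diagram: bell curve centered at 0 with the rejection region in the right tail above the critical value 2.33; the test statistic, .980, lies outside the rejection region.]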
f. Using the P-value method, what is the result (statistically) of the hypothesis test and
why?
We fail to reject the null because the p-value is not less than the level of significance,
that is p is not < α. p value = .164, which is not < .01.
g. State, in English, the result of the hypothesis test
There is insufficient evidence to conclude that a majority of voters are in favor of
Proposition O & P.
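Question 7 is a one-proportion z-test, with the standard error computed from the hypothesized p = .5. The sketch below reproduces the test statistic, critical value and p-value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, yes = 150, 81
p_0 = 0.5                                # proportion under H0
alpha = 0.01

# Part a: at least 10 observations in each category.
counts_ok = yes >= 10 and (n - yes) >= 10

p_hat = yes / n
z_stat = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - std_normal.cdf(z_stat)     # area to the right of the test statistic

reject_null = z_stat > z_crit

print(f"z = {z_stat:.3f}, critical z = {z_crit:.2f}, p-value = {p_value:.3f}")
print(f"reject H0: {reject_null}")
```

Both methods agree, as they always must: the test statistic is below the critical value exactly when the p-value is above α.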
8. Use Data Set A above. A sample of beers that were bought at a particular bar were measured
for their volume. Test whether the standard deviation of the pours is greater than .2 ounces.
a. State the null and alternative hypothesis
H0: σ = .2
H1: σ > .2
b. Using the critical value method, what is/are the critical value(s), and which distribution
is being used?
Note: a mistake, the level of significance was not specified – and it needs to be. So I
will assume an α=.01.
The distribution being used is χ². This is a one tail (right) test. Since n=20, the degrees
of freedom is 19. The critical value is 36.19
c. Using the critical value method, what is the result (statistically) of the hypothesis test
and why?
You need to calculate s (sample standard deviation). Use 1 Var Calc on the calculator.
It will find that s = .294. Now use the formula for the χ² statistic, which is
(n - 1)s²/σ².
So now calculate 19 × .294²/.2² = 41.14
Note, you should not use a rounded s estimate (the .294), use the actual value stored in
the calculator.
Since the test statistic (41.14) is more extreme ( > ) than the critical value 36.19, we
reject the null and accept the alternative
d. Draw a diagram to represent the previous test.
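[Diagram: χ² curve starting at 0 with the rejection region in the right tail above the critical value 36.19; the test statistic, 41.14, lies in the rejection region.]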
e. State, in English, the result of the hypothesis test
There is sufficient evidence to reject the hypothesis that the standard deviation of the
pour of beers is .2 ounces. And we accept the alternative that the standard deviation is
greater than .2 ounces.
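The test statistic in question 8 can be computed directly from Data Set A using the unrounded sample variance, as the note in part c recommends. In this sketch the critical value 36.19 is copied from part b rather than computed, since the standard library has no χ² distribution.

```python
from statistics import variance

data_set_a = [15.7, 15.8, 16.0, 15.5, 15.7, 15.9, 16.3, 15.1, 15.4, 15.9,
              16.0, 15.9, 16.1, 15.8, 15.5, 15.4, 15.8, 15.7, 16.2, 15.6]

sigma_0 = 0.2                            # standard deviation under H0
n = len(data_set_a)
s_squared = variance(data_set_a)         # sample variance (n - 1 divisor), unrounded

# Test statistic with n - 1 = 19 degrees of freedom.
chi2_stat = (n - 1) * s_squared / sigma_0 ** 2

chi2_crit = 36.19                        # copied from part b: right tail, alpha = .01
reject_null = chi2_stat > chi2_crit

print(f"chi-square = {chi2_stat:.2f}, reject H0: {reject_null}")
```

Plugging in the rounded s = .294 instead gives a value near 41.06, which is why the rounded estimate should be avoided.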
9. What kinds of errors can be made when doing hypothesis testing, and how do we control those
errors?
There are two types of errors
Type I error is when you reject the null hypothesis when it IS true.
Type II error is when you fail to reject the null hypothesis when it is not true.
In statistics, you choose explicitly the probability of Type I error. This is called the
“level of significance” and is represented by an “α”. This means that in doing a
hypothesis test, the probability of rejecting a null hypothesis that is true is α. For a
given sample size, if you reduce the probability of one type of error, you'll increase the
probability of the other. For example, if you make α smaller, you make a Type II error
more likely.
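The trade-off between the two errors can be illustrated with a right-tailed z-test. Every number in this sketch is hypothetical and chosen only for illustration (a true mean 0.5 above the null value, σ = 1, n = 25); the function name type_ii_error is also made up. For such a test, the probability of a Type II error is the chance that the test statistic falls below the critical value when the alternative is actually true.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def type_ii_error(alpha, true_shift, sigma, n):
    """P(fail to reject H0) for a right-tailed z-test when the true mean
    is true_shift above the value stated in H0."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centered at true_shift * sqrt(n) / sigma.
    return std_normal.cdf(z_crit - true_shift * sqrt(n) / sigma)

# Hypothetical setting: shrinking alpha raises the Type II error probability.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, true_shift=0.5, sigma=1.0, n=25)
    print(f"alpha = {alpha:.2f} -> P(Type II error) = {beta:.3f}")
```

Under the same assumptions, a larger sample lowers the Type II error probability without changing α, which is why the trade-off holds only for a fixed sample size.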
10. What are the different probability distributions used in hypothesis testing and under what
conditions are each used?
There are three probability distributions.
There’s the normal distribution where you use z-scores. When z-scores are used, this is
the “standardized” normal distribution which has a mean µ = 0 and standard deviation σ
= 1. The normal distribution is a bell-shaped curve. You use this when testing a mean
and the population standard deviation is known. To use this distribution, the data must
be distributed normally or the sample size must be greater than 30 so you can use the
CLT. Normal distribution is also used in proportions because when you make an
assumption of mean, p, you also form an assumption about the standard deviation. The
assumption for proportions, since the data is a binomial, is that the sample size must be
large enough to be distributed normally. This happens when there are at least 10
observations in each category. Also, the size of the population must be at least 20 times
the size of the sample.
There’s the t-distribution. This is used when you are testing a mean, but the population
standard deviation is not known. The t-distribution is also a bell-shaped curve with a
mean µ = 0, though it’s a bit heavier in the tails than the normal distribution, so its
standard deviation is a bit larger than 1. When using the t-distribution, you need to
specify the degrees of freedom. The degrees of freedom will equal n-1 when testing a
mean. When testing the β of a regression, degrees of freedom will equal n-2. To use the
t-distribution, the data must be distributed normally or the sample size must be large
enough (>30) to use CLT.
There’s the χ² distribution. This distribution is used to test whether a standard deviation
is equal to some quantity. The distribution starts at 0 and is skewed right. You must
know the degrees of freedom, which are n-1 for tests of a standard deviation. To use this
distribution the data must be distributed normally. CLT doesn’t apply for this
distribution. CLT is about the distribution of x-bar if the sample size is large enough, not
about σ. So CLT can’t be used when testing σ.