Honors Math 3
Unit 3: Inferential Statistics
Name_________________________
Unit 3 Review
Your unit 3 test will take place on Wednesday, 1/11 during class. This is a list of the important topics we have
covered in this unit:
1. Statistics for a data set / Measures of spread
• Find the mean (equally-likely outcomes or weighted), variance (mean squared deviation), and standard deviation
• Prove and use this alternate formula for variance: $\overline{x^2} - (\bar{x})^2$
• Prove and use the idea that means and variances are additive
• Prove effects on mean and variance when a constant is added to or multiplied by each value in a data set.
2. Repeated experiments
• Given the statistics (mean, variance, and/or standard deviation) for a single experiment, find the statistics for summing the results of the experiment repeated n times
• For repeated Bernoulli trials, find the exact probability for k successes (using the formula involving a combination number)
• For repeated Bernoulli trials, find the mean, variance, and standard deviation
3. Probability distributions
• Make probability histograms
• For normal distributions, apply the 68/95/99+% rule and use the normalcdf function to approximate probabilities
• Apply the Central Limit Theorem to identify normal distributions for answering probability questions (CLT for sums vs. CLT for averages)
• Compute z-scores to compare values from multiple data sets
4. Sample surveys
• Identify appropriate methods for random sampling
• Given a sample proportion, use confidence intervals to estimate what values are plausible for the population parameter (assume 95% confidence if not specified)
• Find margins of error (expressed as percent)
• Understand that correlation does not imply causation
• Understand the differences between sample surveys, observational studies, and experiments, and biases associated with each
5. Assessing the effectiveness of a treatment (experiments)
• Identify appropriate randomization methods for selecting the treatment and control groups
• Recognize outcomes that show evidence that a treatment is effective (using 5% tolerance)
1. The following table shows the distribution of days in the hospital after birth for 57 new mothers.
Days in hospital     1    2    3    4    5
Number of mothers   14   34    5    3    1
a. Calculate the mean, variance, and standard deviation for the above data.
b. Does this data have a normal distribution? Explain why or why not.
2. A manufacturer produces a large number of toasters. From past experience, the manufacturer knows that
approximately 2% are defective. In a quality control procedure, we randomly select 20 toasters for testing.
a. Determine the probability that exactly one of the toasters is defective.
b. Find the exact probability that at most two of the toasters are defective. Include enough details so that
it can be understood how you arrived at your answer.
c. Find the mean and standard deviation for the random variable X in the toaster problem. Make sure to
define the random variable X.
3. A student scores 60 on a math test that has a mean of 54 and a standard deviation of 3, and she scores 80
on a history test with a mean of 75 and a standard deviation of 2. On which test did she do better
compared to the rest of the class?
4. The average number of pounds of meat a person consumes a year is 218.4 pounds. Assume that the
standard deviation is 25 pounds and the distribution is approximately normal.
a. Find the probability that a person selected at random consumes less than 224 pounds per year.
b. If a sample of 40 individuals is selected, find the probability that the average of the sample will be less
than 224 pounds per year.
5. In this problem you’ll need to combine two ideas from the chapter: the statistics of repeated Bernoulli
trials and calculating the standard deviation as a proportion or percentage of the number of trials.
Suppose that a Bernoulli trial with probability p is repeated n times.
a. In terms of p and n, calculate the standard deviation as a proportion of the number of trials. Hint:
calculate $\sigma / n$, then simplify.
b. In part a you found the standard deviation as a proportion of the number of trials. What is the limit of
this result when n gets larger and larger?
c. Apply the result of part a to 1,000,000 flips of a 60%/40% unfair coin.
6. The travel times for the MBTA 76 bus from Hanscom to Alewife have a normal distribution with a mean of
39 minutes with a standard deviation of 3.5 minutes.
Answer the following questions, using appropriate calculator functions where needed.
a. Suppose that the MBTA wishes to publish a range of possible travel times for this route, such that
99.7% of the trips will fall in the range. Find this range.
b. What percent of this bus route’s trips have a travel time between 35 and 45 minutes?
c. For each of the following travel times, calculate the z score, then state whether the travel time would
be fairly typical, highly unusual, or in-between:
40 minutes, 30 minutes, 35 minutes, 45 minutes, 60 minutes
d. Tony, Sasha, and Derman separately traveled this bus route and calculated their z-scores: z = 0.3 for
Tony, z = –1.5 for Sasha, and z = 2.1 for Derman. Find their travel times.
e. Using an appropriate calculator function, approximate the probability that the travel time on this bus
route, rounded to the nearest minute, will be 38 minutes.
f. An MBTA executive is investigating a complaint that trips on this bus sometimes take more than
45 minutes. What percent of the trips take more than 45 minutes?
g. Suppose that the MBTA wishes to revise its range of possible travel times for this route, such that 97%
of the trips will fall in the range. Find this range.
7. The average time it takes a group of adults to complete a certain achievement test is 46.2 minutes. The
standard deviation is 8 minutes. Assume the variable is normally distributed.
a. Find the probability that, if 50 randomly selected adults take the test, the average time it takes the
group to complete the test will be less than 43 minutes.
b. Find the probability that, if 50 randomly selected adults take the test, the total time it takes the group
to complete the test will be more than 2440 minutes.
8. Some of the desks at a university are designed for left-handed students. The university’s Left-Handed
Student Association is investigating whether there are enough left-handed desks on campus. The group
checks 300 desks randomly chosen from the 10,000 desks at the university, and finds that 45 of them are
left-handed desks. Based on this research, the group estimates that 15% of the university’s desks are left-handed. What is the percent margin of error (using a 95% confidence level) for this estimate?
9. A company offers a course for students to help them prepare for the state standardized test. The company
claims the course will improve students’ ability to pass the standardized test. A researcher would like to
test this claim. Forty students volunteer to participate in the study and are divided equally into two
groups: the Treatment Group and the Control Group.
a. Describe an effective and valid way to randomize the participants into the Treatment Group and
Control Group.
b. Describe what should occur if a participant is in the Treatment Group or Control Group.
c. The following table shows the experiment’s results. Do you believe the claim that the course improved students’ ability to pass the test? Show calculations to support your answer.

          Treatment Group   Control Group   Total
Passed          14               11           25
Failed           6                9           15
Total           20               20           40
Answers
1. a. mean = 2, variance = $\frac{40}{57}$, standard deviation = $\sqrt{\frac{40}{57}} \approx 0.838$
b. Not normally distributed – pretty heavily skewed – not symmetrical about the mean.
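The values in answer 1a can be checked with a short Python sketch. The dictionary `counts` and the variable names are illustrative choices, not part of the worksheet; the sketch computes the variance both as the mean squared deviation and with the alternate formula from topic 1.

```python
from math import sqrt

# Frequency table from problem 1: days in hospital -> number of mothers
counts = {1: 14, 2: 34, 3: 5, 4: 3, 5: 1}

n = sum(counts.values())                              # 57 mothers in total
mean = sum(x * f for x, f in counts.items()) / n      # weighted mean

# Variance as the mean squared deviation from the mean
variance = sum(f * (x - mean) ** 2 for x, f in counts.items()) / n

# Alternate formula: mean of the squares minus the square of the mean
mean_of_squares = sum(f * x ** 2 for x, f in counts.items()) / n
variance_alt = mean_of_squares - mean ** 2

std_dev = sqrt(variance)

print(mean, variance, variance_alt, std_dev)
```

Both variance calculations should agree up to floating-point rounding.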
2. a. $\binom{20}{1}(0.02)^{1}(0.98)^{19} \approx 0.272$
b. P(at most 2 defective) = P(exactly 0 defective) + P(exactly 1 defective) + P(exactly 2 defective) =
$\binom{20}{0}(0.02)^{0}(0.98)^{20} + \binom{20}{1}(0.02)^{1}(0.98)^{19} + \binom{20}{2}(0.02)^{2}(0.98)^{18} \approx 0.993$
c. X is the number of defective toasters out of 20 tested.
mean of X: $20(0.02) = 0.4$, standard deviation of X: $\sqrt{20(0.02)(0.98)} \approx 0.626$
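As a check on answer 2, here is a minimal Python sketch of the Bernoulli-trial formula. The helper name `binom_pmf` is made up for this illustration; `math.comb` supplies the combination number.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.02   # 20 toasters tested, 2% defective

p_exactly_one = binom_pmf(1, n, p)
p_at_most_two = sum(binom_pmf(k, n, p) for k in range(3))  # k = 0, 1, 2

mean = n * p                       # mean of X
std_dev = sqrt(n * p * (1 - p))    # standard deviation of X

print(p_exactly_one, p_at_most_two, mean, std_dev)
```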
3. Find z-scores for each test score. Math: $\frac{60 - 54}{3} = 2$, History: $\frac{80 - 75}{2} = 2.5$. This means that in math her
score is 2 std. dev. above the mean, while in history her score is 2.5 std. dev. above the mean. Therefore,
she did better compared to the rest of the class in history.
4. a. normalcdf(0, 224, 218.4, 25) ≈ 0.589
b. normalcdf(0, 224, 218.4, 25/√40) ≈ 0.922
5. a. $\frac{\sigma}{n} = \frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
b. $\frac{\sigma}{n}$ gets smaller as n gets larger. The limit is 0.
c. $\sqrt{\frac{0.6(0.4)}{1000000}} \approx 4.899 \times 10^{-4}$, or 0.0004899, or 0.04899%
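For readers without a graphing calculator, answers 4 and 5c can be approximated in Python. The function `normalcdf` below is a hypothetical stand-in for the calculator function of the same name, built on the standard library's `statistics.NormalDist`; it is a sketch, not the calculator's own implementation.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Probability that a normal variable lies between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Problem 4a: one person, standard deviation 25
p_single = normalcdf(0, 224, 218.4, 25)

# Problem 4b: average of 40 people, so the standard deviation is 25 / sqrt(40)
p_average = normalcdf(0, 224, 218.4, 25 / sqrt(40))

# Problem 5c: standard deviation as a proportion of n for a 60%/40% coin
sd_proportion = sqrt(0.6 * 0.4 / 1_000_000)

print(p_single, p_average, sd_proportion)
```

The only change between 4a and 4b is the standard deviation, which is the Central Limit Theorem for averages in action.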
6. a. 28.5 to 49.5 minutes
b. normalcdf(35, 45, 39, 3.5) ≈ 0.830, so 83%.
c. 0.286 (typical), –2.571 (unusual), –1.143 (typical), 1.714 (in-between), 6 (very unusual).
d. 40.05, 33.75, and 46.35 minutes.
e. normalcdf(37.5,38.5,39,3.5) ≈ 0.109
f. You need to pick some large number for the upper end of the interval. You’ll get roughly the same
answer regardless of the choice: normalcdf(45,1000,39,3.5) ≈ 0.0432 = 4.32%.
g. Need normalcdf(39 – x, 39 + x, 39, 3.5) = 0.97. Answer: 31.4 to 46.6 minutes.
7. a. normalcdf(0, 43, 46.2, 8/√50) ≈ 0.00234
b. normalcdf(2440, 1000000, 46.2 × 50, 8 × √50) ≈ 0.0108
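Answers 6g and 7 can be reproduced with the sketch below, which assumes Python's `statistics.NormalDist`. For 6g it uses `inv_cdf` in place of trial and error with normalcdf: a central 97% range leaves 1.5% in each tail. For 7 it applies the CLT for averages (standard deviation divided by √n) and for sums (mean times n, standard deviation times √n). All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Problem 6g: central 97% of travel times, 1.5% cut off in each tail
bus = NormalDist(39, 3.5)
low, high = bus.inv_cdf(0.015), bus.inv_cdf(0.985)

# Problem 7: 50 adults, single-test mean 46.2 and standard deviation 8
mu, sigma, n = 46.2, 8, 50
avg_dist = NormalDist(mu, sigma / sqrt(n))      # CLT for averages
sum_dist = NormalDist(mu * n, sigma * sqrt(n))  # CLT for sums

p_avg_below_43 = avg_dist.cdf(43)
p_total_above_2440 = 1 - sum_dist.cdf(2440)

print(round(low, 1), round(high, 1), p_avg_below_43, p_total_above_2440)
```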
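8. $\sigma = \sqrt{300(0.15)(0.85)} \approx 6.185$ desks.
$2\sigma \approx 12.37$ desks, which is $\frac{12.37}{300} \approx 4.12\%$ of the sample, so the margin of error is about 4.12%.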
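The margin-of-error arithmetic in answer 8 can be sketched in Python as follows. It follows the answer key's approach of taking two standard deviations for 95% confidence; the variable names are chosen for this illustration.

```python
from math import sqrt

n = 300                # desks checked
p_hat = 45 / n         # sample proportion of left-handed desks (0.15)

# Standard deviation of the count of left-handed desks in the sample
sd_count = sqrt(n * p_hat * (1 - p_hat))

# 95% confidence: about two standard deviations, expressed as a percent of n
margin_count = 2 * sd_count
margin_percent = 100 * margin_count / n

print(sd_count, margin_count, margin_percent)
```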
9. a. Sample: Put all 40 names in a hat, pull 20 names and assign to Treatment Group, rest to Control Group.
b. Treatment group should attend the course and the Control Group should not
c. There is not enough evidence to say the course improved the students’ ability to pass the test (based
on a 5% cutoff): normalcdf(14, 20, 11, √(20(.55)(.45))) ≈ 0.089, which is greater than 0.05.
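The calculation in answer 9c can be sketched in Python under the same assumption the answer key makes: if the course had no effect, each of the 20 treatment students would pass with the control group's rate of 11/20 = 0.55, and the number who pass is approximated by a normal distribution. The names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 20                   # students in each group
control_passed = 11
treatment_passed = 14

# Assumed "no effect" model: pass rate equals the control group's rate
p_control = control_passed / n
no_effect = NormalDist(n * p_control, sqrt(n * p_control * (1 - p_control)))

# Chance of 14 to 20 passes arising from the no-effect model
p_value = no_effect.cdf(20) - no_effect.cdf(treatment_passed)

shows_evidence = p_value < 0.05   # 5% cutoff from the review topics
print(p_value, shows_evidence)
```

Because the probability is above the 5% cutoff, `shows_evidence` comes out `False`, matching the conclusion in the answer key.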