Download Homework Solutions – Statistics

Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Thursday 9/17
NAME: _______________________________________________________
1. Suppose that a murder occurs in a small town of 10,000 people, and DNA evidence is
collected. Suppose also that DNA on everyone in the town is on record. Lastly, suppose
that DNA identification is 99.9 percent specific. If we found someone in town whose DNA
matches that of the perpetrator of the murder, how likely are you as a juror to believe that
he was guilty of murder?
To answer this question, convert your percentages into numbers. Justify your answer
completely.
99.9% = .999
10,000 * .999 = 9,990
10,000 people in the town:
    10 share a DNA match
    9,990 don't share a DNA match
At this point, whether you find him guilty is a matter of opinion. On the DNA evidence alone, he is just one of
about 10 people in town who would match. If 10 out of every 10,000 people will get a positive DNA match, do you
believe beyond a reasonable doubt that he is guilty?
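The count above can be reproduced with a short sketch. It assumes, as a simplification not stated in the problem, that the perpetrator is among the matching townspeople and that each of them is equally likely to be the perpetrator; the variable names are illustrative only.

```python
# Natural-frequency view of the small-town DNA problem.
town_size = 10_000
specificity = 0.999  # "99.9 percent specific", as stated in the problem

# Number of townspeople expected to show a DNA match (rounded to whole people).
expected_matches = round(town_size * (1 - specificity))
expected_non_matches = town_size - expected_matches

# Assumption: the perpetrator is one of the matching people, and each of
# them is equally likely to be the perpetrator.
chance_match_is_perpetrator = 1 / expected_matches

print(expected_matches, expected_non_matches, chance_match_is_perpetrator)
```

Under these assumptions, a match by itself points to the right person only about one time in ten.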
2. Look at every number that you see on the first page of the New York Times. Create a
table from 1 to 9 in which you tally all the first digits of each number.
Dr. Jaffe / Ms. Wen
Advanced Topics in Mathematics
Homework due Monday 9/21
NAME: ________________________________________________
The expert witness testifies that there are about 10 million men who could have been the
perpetrator. The probability of a randomly selected man having a DNA profile that is
identical with the trace recovered from the crime scene is approximately .0001 percent. If
a man has this DNA profile, it is practically certain that a DNA analysis shows a match. If
a man does not have this DNA profile, current DNA technology leads to a reported match
with a probability of only .001 percent.
A match between the DNA of the defendant and the traces on the victim has been reported.
1) What is the probability that the reported match is a true match, that is, that the person
actually has this DNA profile?
P(match) = .0001%
.0001% = .000001
.000001 * 10,000,000 = 10
P(match | not DNA) --> P(false positive) = .001%
.001% = .00001
.00001 * 9,999,990 = 100
10,000,000 men:
    10 matches
        10 positive test
    9,999,990 non matches
        100 positive test
        9,999,890 negative test
10/110 people with a positive test have the DNA = 9% chance that it is a true match
2) What is the probability that the person is the source of the trace?
There is only a 1/10 chance that if the match is a true match, the person is the source of the
trace.
3) Please render your verdict for this case: Guilty or not guilty? (and justify your
reasoning)
This is your opinion. If the person is a true match, there is only a 1/10 chance they are the
source. If the person simply is a reported match, there is only a 1/110 chance.
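The three fractions in this homework (10/110, 1/10, and 1/110) can be checked with a small sketch. It assumes, as the solution does, that a man with the profile always produces a reported match and that exactly one of the men with the profile is the source; the variable names are illustrative.

```python
population = 10_000_000        # men who could have been the perpetrator
p_profile = 0.0001 / 100       # 0.0001 percent have the DNA profile
p_false_match = 0.001 / 100    # 0.001 percent reported match without the profile

true_matches = population * p_profile                         # about 10 men
false_matches = (population - true_matches) * p_false_match   # about 100 men

# P(true match | reported match)
p_true_given_reported = true_matches / (true_matches + false_matches)

# Assumption: exactly one of the men with the profile is the source.
p_source_given_true = 1 / true_matches

# P(source | reported match) combines the two steps.
p_source_given_reported = p_true_given_reported * p_source_given_true

print(round(p_true_given_reported, 3), round(p_source_given_reported, 4))
```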
Dr. Jaffe / Ms. Wen
Advanced Topics in Mathematics
Homework due Tuesday 9/22
NAME: ___________________________________________________________________
1. Why is it important to consider the way we represent a statistic (a percentage vs. a
number)?
A statistic can be vague or difficult to follow if we present it as a percentage. When we are
dealing with statistics that have conditions (ex. p(match|+ test) ), numbers are far less
confusing. When we are dealing with a single statistic and very large numbers, a percentage can
be clearer. We want to make sure we are presenting information in a clear way.
2. Do you think statistics play a role in jury verdicts?
In an ideal world, absolutely. Unfortunately, the jury is not always given the statistic that is
relevant or they do not have enough knowledge to break it down to get an accurate picture. As a
result, they often do not play a role in verdicts.
3. Given the following statistics about a test for a rare immunodeficiency as follows:
P(child having the immunodeficiency) = .1%
P(+ test | child with immunodeficiency) = 92%
P(+ test | child without immunodeficiency) = 5%
What is the probability that if a child is tested and receives a positive test, they actually
have the immunodeficiency?
100,000 children:
    100 with ID
        92 + test
        8 - test
    99,900 without ID
        4,995 + test
        94,905 - test
There are a total of 5,087 positive tests and only 92 of them are true immunodeficiencies:
92 / 5087 = 1.81%
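The same tree can be written as a few lines of arithmetic. This is only a sketch of the natural-frequency method used above, with a hypothetical group of 100,000 children as in the solution.

```python
children = 100_000
p_id = 0.001               # P(child has the immunodeficiency) = .1%
p_pos_given_id = 0.92      # P(+ test | child with immunodeficiency)
p_pos_given_no_id = 0.05   # P(+ test | child without immunodeficiency)

with_id = children * p_id
without_id = children - with_id

true_positives = with_id * p_pos_given_id
false_positives = without_id * p_pos_given_no_id

# Share of positive tests that are true immunodeficiencies.
p_id_given_positive = true_positives / (true_positives + false_positives)
print(f"{p_id_given_positive:.2%}")  # prints 1.81%
```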
4. What does the mean tell us about a set of data?
It tells us the sum of the numbers divided by the number of numbers (the sum of the distances
from each data point below the mean to the mean = the sum of the distances from each data
point above the mean to the mean)
5. Give two methods of visually representing data?
Bar Graph, Pie Chart, Scatter Plot, Line Graph, Histogram, Stem-and-Leaf Plot, Box-and-Whisker Plot, Table, Venn Diagram
Dr. Jaffe / Ms. Wen
Advanced Topics in Mathematics
Homework due Monday 9/28
NAME: ___________________________________________________________
1. What is the purpose of statistics?
Statistics can tell us trends in data. They are a numeric analysis of what happened in the past.
2. Add one piece of data to the list below that will significantly affect the mean. Explain
why it works without actually calculating the mean.
52, 63, 65, 71, 79, 84, 88
If you add any outlier, that will significantly affect the mean. If you add 1, for example, that will
decrease the mean, and if you add 2000, you will significantly increase the mean.
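Although the question asks for an explanation without calculating, a quick check can confirm the reasoning afterwards. The sketch below uses the two outliers suggested in the answer (1 and 2000).

```python
from statistics import mean

data = [52, 63, 65, 71, 79, 84, 88]

original_mean = mean(data)
mean_with_low_outlier = mean(data + [1])      # a value far below the rest
mean_with_high_outlier = mean(data + [2000])  # a value far above the rest

print(round(original_mean, 2), mean_with_low_outlier, mean_with_high_outlier)
```

The low outlier pulls the mean down by several points, while the high outlier pulls it far above every original value.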
3. What is the difference between parameters and statistics?
Parameters are what we use to analyze data collected from a population while statistics are
what we use to analyze data collected from a sample.
4. If we want to determine the average salary in the United States, describe who might be
in our sample so we can get an accurate answer. Explain.
A random sample would be effective. We can select 1000 from each state, for example.
5. When is it possible to have more than one mode?
Example:
1,1,2,2,3,3,5,6,6,7
Mode: 1,2,3, and 6
Each of these numbers appears the same number of times and more than any other number in the
list
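As a quick check of the example, Python's standard library (version 3.8 or later) has a function that returns every value tied for most frequent:

```python
from statistics import multimode

data = [1, 1, 2, 2, 3, 3, 5, 6, 6, 7]

# multimode returns all values that share the highest count.
modes = multimode(data)
print(modes)  # [1, 2, 3, 6]
```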
6. What does a scatter plot tell us about our data?
Scatter Plot = exact data points and any trends or patterns
7. What is the difference between a bar graph and a histogram?
A bar graph can have non-numerical data, and because of this, the bars do not touch. A histogram shows numerical
data grouped into intervals, so its bars touch.
8. Give one example of a set of data for which we would want to find the mode (the
number that occurs most frequently).
There are many examples, but two are as follows:
We would want to determine the subway station that has the most people who jump the turnstile.
We would put more security at that subway station.
We would want to determine which Republican candidate had the most votes because they will
become the Republican nominee in the Presidential election.
Dr. Jaffe / Ms. Wen
Advanced Topics in Mathematics
Homework due Wednesday 9/30
NAME: ___________________________________________________________
1. Why do we divide by n – 1 when finding the standard deviation for a sample?
We divide by n-1 because it makes our sample standard deviation a little bit higher so it is more representative of
the population. A sample tends to have a little less variation than a population, which is why we increase it a little
bit.
2. When we use 1-Var stats to find standard deviation, Sx is always greater than σx. Why?
The calculator does not know if our data is from a sample or a population so one calculation Sx represents the value
if it were a sample. The calculator divided by n-1 in this case, resulting in a slightly higher number.
3. Given the following set of data: 12, 23, 42, 51, 66, 68, 69, 73, 85, 89, 93
a) Find the population standard deviation.
25.18
b) Find the sample standard deviation.
26.40
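Both answers can be checked without a graphing calculator. In the sketch below, `pstdev` divides by n (like σx) and `stdev` divides by n - 1 (like Sx).

```python
from statistics import pstdev, stdev

data = [12, 23, 42, 51, 66, 68, 69, 73, 85, 89, 93]

population_sd = pstdev(data)  # divides by n
sample_sd = stdev(data)       # divides by n - 1

print(round(population_sd, 2), round(sample_sd, 2))
```

The sample value is slightly larger, which matches the explanation in Problems 1 and 2.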
4. Following are the number of calories in a basic hamburger (one meat patty with no cheese) at various fast
food restaurants around the country. 380, 790, 680, 460, 725, 1130, 240, 260, 930, 331, 710, 680,
1080, 612, 1180
a) Find the mean
679.2
b) Find the sample standard deviation
307.62
c) Create a relative frequency histogram and describe its shape (symmetric, skewed right, skewed
left). Explain.
Interval     Frequency   Relative Frequency
200-399      4           4/15 = .27
400-599      1           1/15 = .07
600-799      6           6/15 = .4
800-999      1           1/15 = .07
1000-1199    3           3/15 = .2
The relative frequency histogram will be symmetrical. There will be a mound at 600-799. The relative
frequencies of 400-599 and 800-999 are the same, and the relative frequencies of the first and last intervals are
almost the same.
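The mean, sample standard deviation, and frequency table for the fifteen calorie values (380, 790, 680, 460, 725, 1130, 240, 260, 930, 331, 710, 680, 1080, 612, 1180) can be rebuilt with a short sketch. The 200-calorie interval width is taken from the table above; everything else is illustrative.

```python
from statistics import mean, stdev

calories = [380, 790, 680, 460, 725, 1130, 240, 260, 930, 331,
            710, 680, 1080, 612, 1180]

print(round(mean(calories), 1), round(stdev(calories), 2))

# Count how many values fall in each 200-calorie interval.
bin_width = 200
counts = {}
for low in range(200, 1200, bin_width):
    counts[low] = sum(low <= c < low + bin_width for c in calories)

# Relative frequency = count in the interval / total number of values.
relative = {low: n / len(calories) for low, n in counts.items()}

for low in counts:
    print(f"{low}-{low + bin_width - 1}: {counts[low]:2d}  {relative[low]:.2f}")
```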
5. For the following histogram, describe the shape, and give estimates of the mean and
standard deviation of the distributions.
Distribution of head circumferences (mm)
There will be a variation of answers. The shape is symmetric. We can estimate the mean at about 560 since this is
where the mound is. Any answer between 555 and 575 would be acceptable. The standard deviation is probably
around 20, since about 70% of the data falls between 540 and 580. Any answer between 15 and 25 would be
acceptable.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Thursday 10/1
NAME: ___________________________________________________________
1. How do we know when a normal distribution is an appropriate way to represent our data?
The relative frequency histogram should look symmetrical with a mound.
2. The histogram below shows the distribution of heights (to the nearest inch) of 1,000 young women.
a) Mark the mean on the graph, and mark one standard deviation above and below the mean. Approximately
what proportion of the values in this data set are within one standard deviation of the mean?
b) Draw a smooth curve that comes reasonably close to passing through the midpoints of the tops of the
bars in the histogram. Describe the shape of the distribution.
The curve is not perfect, but you can see that it is symmetrical with a mound in the center.
c) What is the width of each bar? What does the height of the bar represent?
The width of each bar is 1 inch.
d) Shade the area under the curve that represents the proportion of the data within one standard
deviation of the mean.
3. Below is a histogram of the top speed of different types of animals. Estimate the mean and sample
standard deviation, and justify your reasoning.
There will be a variety of answers. The mean is approximately 40 (any answer between 35 and 45 would be
acceptable). The histogram has a mound at 38, but it is slightly skewed to the right, so the mean will be pulled a
little bit towards the tail. The standard deviation would be about 15; any answer between 10 and 20 would be
acceptable. About 70% of our data is between 25 and 55.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Tuesday 10/6
NAME: ___________________________________________________________
1. What is a z-score?
A z-score will tell us how many standard deviations away from the mean that piece of data is.
2. In a normal distribution, how do we calculate the percentage of data that falls one standard deviation
below the mean to one standard deviation above the mean?
We add all the relative frequencies that fall between the lines marking one standard deviation below the mean and
one standard deviation above the mean.
3. What is a relative frequency histogram?
A relative frequency histogram tells us the percentage of data within each interval. On the x-axis are the intervals,
and on the y-axis are the relative frequencies.
4. The prices of the printers in a store have a mean of $240 and a standard deviation of $50. The printer
that you eventually choose costs $340.
a. What is the z-score for the price of your printer?
z-score = (value - mean) / (standard deviation) = (340 - 240) / 50 = 2
b. How many standard deviations above the mean was the price of your printer?
It was two standard deviations above the mean
5. Ashish's height is 63 inches. The mean height for boys at his school is 68.1 inches, and the standard
deviation of the boys' heights is 2.8 inches.
a. What is the z-score for Ashish's height? (Round your answer to the nearest hundredth.)
z-score = (value - mean) / (standard deviation) = (63 - 68.1) / 2.8 = -1.82
b. What is the meaning of this value?
This means Ashish's height is 1.82 standard deviations below the mean.
c. Explain how a z-score is useful in describing data.
A z-score can help us determine the percentage of data within a certain range of standard deviations. If we
don't have the relative frequency histogram, we can't just add the relative frequencies, but by knowing the
z-score, if the distribution is normal we can use the areas under the normal curve.
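The z-score formula from Problems 4 and 5 fits in a one-line function. The sketch below applies it to the printer (price $340, mean $240, standard deviation $50) and to Ashish (height 63, mean 68.1, standard deviation 2.8); the function name is illustrative.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation

printer_z = z_score(340, 240, 50)
ashish_z = z_score(63, 68.1, 2.8)

print(printer_z, round(ashish_z, 2))
```

A positive result means the value is above the mean, and a negative result means it is below.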
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Wednesday 10/7
NAME: ___________________________________________________________
1. The distribution of lifetimes of a particular brand of car tires has a mean of 51,200 miles and a standard
deviation of 8,200 miles. You can use your calculator, but write what you plugged in.
a. Assuming that the distribution of lifetimes is approximately normally distributed and rounding
your answers to the nearest thousandth, find the probability that a randomly selected tire lasts
i. between 55,000 and 65,000 miles.
z-score for 55,000 = (55000 - 51200) / 8200 = .46
z-score for 65,000 = (65000 - 51200) / 8200 = 1.68
Normalcdf(.46, 1.68) = .275
ii. less than 48,000 miles.
z-score for 48,000 = (48000 - 51200) / 8200 = -.39
Normalcdf(-999, -.39) = .348
iii. at least 41,000 miles.
z-score for 41,000 = (41000 - 51200) / 8200 = -1.24
Normalcdf(-1.24, 999) = .893
b. Explain the meaning of the probability that you found in part (a-iii).
If a large number of tires of this brand were to be randomly selected, then you would
expect about 89.3% of them to last at least 41,000 miles.
2. Suppose that a particular medical procedure has a cost that is approximately normally distributed with a
mean of $19,800 and a standard deviation of $2,900. For a randomly selected patient, find the
probabilities of the following events. (Round your answers to the nearest thousandth.)
a. The procedure costs between $18,000 and $22,000.
z-score for 18,000 = (18000 - 19800) / 2900 = -.621
z-score for 22,000 = (22000 - 19800) / 2900 = .759
Normalcdf(-.621, .759) = .509
b. The procedure costs less than $15,000.
z-score for 15,000 = (15000 - 19800) / 2900 = -1.655
Normalcdf(-999, -1.655) = .049
c. The procedure costs more than $17,250.
z-score for 17,250 = (17250 - 19800) / 2900 = -.879
Normalcdf(-.879, 999) = .810
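The calculator steps in Problems 1 and 2 (convert each bound to a z-score, then take the area under the standard normal curve) can be mirrored in Python. This is a sketch, not the calculator's own implementation: `normal_probability` is a hypothetical helper, and infinity stands in for the -999 and 999 bounds.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1
INF = float("inf")

def normal_probability(low, high, mean, sd):
    """Probability that a normal variable falls between low and high."""
    z_low = (low - mean) / sd
    z_high = (high - mean) / sd
    return STANDARD_NORMAL.cdf(z_high) - STANDARD_NORMAL.cdf(z_low)

# Problem 1: tire lifetimes, mean 51,200 and standard deviation 8,200.
tire_between = normal_probability(55_000, 65_000, 51_200, 8_200)
tire_below = normal_probability(-INF, 48_000, 51_200, 8_200)
tire_at_least = normal_probability(41_000, INF, 51_200, 8_200)

# Problem 2: procedure cost, mean 19,800 and standard deviation 2,900.
cost_between = normal_probability(18_000, 22_000, 19_800, 2_900)
cost_below = normal_probability(-INF, 15_000, 19_800, 2_900)
cost_above = normal_probability(17_250, INF, 19_800, 2_900)

for p in (tire_between, tire_below, tire_at_least,
          cost_between, cost_below, cost_above):
    print(round(p, 3))
```

Because the sketch keeps the z-scores unrounded, a result can differ in the third decimal place from one computed with z-scores rounded to two places.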
3. A farmer has 625 female adult sheep. The sheep have recently been weighed, and the
results are shown in the table below.
Weight (pounds)   Frequency
140 to < 150      8
150 to < 160      36
160 to < 170      173
170 to < 180      221
180 to < 190      149
190 to < 200      33
200 to < 210      5
a. Construct a histogram that displays these results.
b. Looking at the histogram, do you think a normal distribution would be an appropriate model for this
distribution?
Yes. The histogram is approximately symmetric and mound shaped.
c. The weights of the 625 sheep have mean 174.21 pounds and standard deviation 10.11 pounds. For
a normal distribution with this mean and standard deviation, what is the probability that a randomly
selected sheep has a weight of at least 190 pounds? (Round your answer to the nearest thousandth.)
Using Normalcdf(190, 999, 174.21, 10.11), you get P(≥ 190) = 0.059.
4. State if the following is an observational study, survey, or experiment, and give a reason for your answer.
Ken wants to compare how many hours a week that sixth graders spend doing mathematics homework to
how many hours a week that eleventh graders spend doing mathematics homework. He randomly selects
ten sixth graders and ten eleventh graders and records how many hours each student spent on
mathematics homework in a certain week.
This is a survey. He is selecting a group and asking questions.
5. Data from a random sample of 50 students in a school district showed a positive relationship between
reading score on a standardized reading exam and shoe size. Can it be concluded that having bigger feet
causes one to have a higher reading score? Explain your answer.
No. The bigger feet and higher reading score are only correlated. They both increase as someone gets older.
Use the following scenarios for Problems 6–8.
A. Researchers want to determine if there is a relationship between whether or not a woman smoked
during pregnancy and the birth weight of her baby. Researchers examined records for the past five
years at a large hospital.
B. A large high school wants to know the proportion of students who currently use illegal drugs.
Uniformed police officers asked a random sample of 200 students about their drug use.
C. A company develops a new dog food. The company wants to know if dogs would prefer its new food
over the competition's dog food. One hundred dogs, who were food-deprived overnight, were given
equal amounts of the two dog foods: the new food vs. the competitor's food. The proportion of dogs
preferring the new food was recorded.
6. Which scenario above describes an experiment? Explain why.
Scenario C is an experiment because we have groups that are receiving treatments (dogs with new
food & dogs with old food), and there were responses (preference for which food).
7. Which scenario describes a survey? Will the results of the survey be accurate? Why or why not?
Scenario B is a survey. A group of students is being asked questions. The survey is not going to be
accurate because students are unlikely to reveal drug use to a police officer.
8. The remaining scenario is an observational study. Is it possible to perform an experiment to determine if
a relationship exists? Why or why not?
We can perform an experiment by creating groups with different treatments (women who smoke during pregnancy &
women who do not smoke during pregnancy). The response would be whether or not the size of the baby changes.
However, this experiment would be unethical. We cannot have a treatment group in which we may be hurting a
human being.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Thursday 10/8
NAME: ___________________________________________________________
1. A salesman kept track of the gas mileage for his car over a 25-week span.
The mileages (miles per gallon rounded to the nearest whole number) were
23, 27, 27, 28, 25, 26, 25, 29, 26, 27, 24, 26, 26, 24, 27, 25, 28, 25, 26, 25, 29, 26, 27, 24,
26.
a) Construct a relative frequency table
Interval   Frequency   Relative Frequency
23         1           .04
24         3           .12
25         5           .2
26         7           .28
27         5           .2
28         2           .08
29         2           .08
b) Estimate the mean. Justify your answer.
The data is almost symmetrical, so the mean would be at the mound, which is 26.
c) Estimate the standard deviation. Justify your answer.
I would estimate the standard deviation to be 1. Sixty-eight percent of the data falls between 25 and 27.
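The relative frequency table and the 68 percent figure can be reproduced by tallying the 25 weekly mileages (23, 27, 27, 28, 25, 26, 25, 29, 26, 27, 24, 26, 26, 24, 27, 25, 28, 25, 26, 25, 29, 26, 27, 24, 26). The sketch below is one way to do the tally.

```python
from collections import Counter

mileages = [23, 27, 27, 28, 25, 26, 25, 29, 26, 27, 24, 26, 26,
            24, 27, 25, 28, 25, 26, 25, 29, 26, 27, 24, 26]

counts = Counter(mileages)
n = len(mileages)

# One row per mileage value: value, frequency, relative frequency.
for mpg in sorted(counts):
    print(mpg, counts[mpg], counts[mpg] / n)

# Share of weeks with a mileage from 25 to 27 inclusive.
share_25_to_27 = sum(counts[m] for m in (25, 26, 27)) / n
print(share_25_to_27)
```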
2. The USDA document described in Example 1 also states that the average cost of food for a 14–18 year old
female (again, on the “Moderate-cost” plan) is $215.20 per month. Assume that the monthly food cost for a
14–18 year old female is approximately normally distributed with mean $215.20 and standard deviation
$14.85.
Find the probability that the monthly food cost for a randomly selected 14–18 year old female is
i. less than $225. Use z-scores
z-score for 225 = (225 - 215.20) / 14.85 = .66
Normalcdf(-999, .66) = .7454
ii. between $190 and $220. Use z-scores
z-score for 190 = (190 - 215.2) / 14.85 = -1.697
z-score for 220 = (220 - 215.2) / 14.85 = .323
Normalcdf(-1.697, .323) = .582
iii. more than $250. Do not use z-scores
Normalcdf(250, 999, 215.20, 14.85) = .0096
iv. between $180 and $205. Do not use z-scores
Normalcdf(180, 205, 215.20, 14.85) = .237
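All four probabilities can be cross-checked by giving the mean and standard deviation to the distribution directly, which is the "do not use z-scores" route. The sketch uses the dollar bounds stated in each part; the variable names are illustrative.

```python
from statistics import NormalDist

# Monthly food cost: mean $215.20, standard deviation $14.85.
food_cost = NormalDist(mu=215.20, sigma=14.85)

p_less_than_225 = food_cost.cdf(225)
p_190_to_220 = food_cost.cdf(220) - food_cost.cdf(190)
p_more_than_250 = 1 - food_cost.cdf(250)
p_180_to_205 = food_cost.cdf(205) - food_cost.cdf(180)

print(round(p_less_than_225, 4), round(p_190_to_220, 3),
      round(p_more_than_250, 4), round(p_180_to_205, 3))
```

Using z-scores or using the mean and standard deviation directly describes the same area under the curve, so both routes should agree up to rounding.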
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Tuesday 10/13
NAME: ___________________________________________________________
1. A student conducted a simulation of 30 coin flips. Below is a dot plot of the sampling distribution of the
proportion of heads. This sampling distribution has a mean of 0.51 and a standard deviation of 0.09.
a. Describe the shape of the distribution.
The shape is irregular, perhaps a little skewed to the left.
b. Describe what would have happened to the mean and the standard deviation of the sampling
distribution of the sample proportions if the student had flipped a coin 50 times, calculated the
proportion of heads, and then repeated this process for a total of 30 times.
The more times we flip the coin for each sample, the more accurate our proportion; therefore, the sampling
distribution would become more symmetrical. The mound would be at .5, which would be the mean. The
standard deviation would get smaller.
2. What effect does increasing the sample size have on the standard deviation of the sampling distribution?
The standard deviation decreases as we increase the size of each sample. Our sample proportion for each
sample becomes more accurate so the standard deviation will get smaller. Our proportions get closer and
closer to the mean.
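The shrinking spread can be illustrated with the usual formula for the standard deviation of a sample proportion, √(p(1 - p) / n). The sketch assumes a fair coin (p = 0.5), which is an assumption rather than something the dot plots establish, and `proportion_sd` is an illustrative name.

```python
from math import sqrt

def proportion_sd(p, n):
    """Theoretical standard deviation of a sample proportion for samples of size n."""
    return sqrt(p * (1 - p) / n)

# Assumption: a fair coin, so p = 0.5.
for flips in (30, 50, 100):
    print(flips, round(proportion_sd(0.5, flips), 3))
```

Each increase in the number of flips per sample gives a smaller standard deviation, in line with the answers to Problems 1b and 2.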
3. The same student flipped the coin 100 times, calculated the proportion of heads, and repeated this process
a total of 40 times. Below is the sampling distribution of sample proportions of heads. The mean and
standard deviation of the sampling distribution are 0.405 and 0.046. Do you think this was a fair coin? Why
or why not?
This is not a fair coin. If the coin were fair, then there would be an equal chance of getting heads or
tails. This would mean that the mound of the sample proportions would be at .5, which is not the
case here.
4. The durations of high school baseball games are approximately normally distributed with mean
105 minutes and standard deviation 11 minutes. Suppose also that the durations of high school softball
games are approximately normally distributed with a mean of 95 minutes and the same standard deviation,
11 minutes. Is it more likely that a high school baseball game will last between 100 and 110 minutes or
that a high school softball game will last between 100 and 110 minutes? Answer this question without
doing any calculations and justify your reasoning!
About 70% of the games will last within one standard deviation of the mean. That would mean about 70% of the
baseball games last between 94 and 116 minutes, and about 70% of the softball games last between 84 and 106
minutes. The interval from 100 to 110 minutes is centered on the baseball mean of 105, so it is more likely that a
baseball game would last between 100 and 110 minutes because the bulk of the data surrounds the mean.
5. What does a sample proportion tell us?
A sample proportion tells us the percentage of successful outcomes we get in a sample.
6. How is it possible to get different sample proportions? Explain.
Each sample results in different data. Not all samples are uniform. You may be testing different subjects or the
probability of getting a particular outcome is only 50%, for example, so you will not get the same result each time.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Wednesday 10/14
NAME: ___________________________________________________________
1. What is a margin of error?
A margin of error is our wiggle room. It creates an interval so we can generalize to a population.
2. The following intervals were plausible population proportions for a given sample. Find the margin of
error in each case.
a. from 0.35 to 0.65
We can find the margin of error in three different ways:
center – lowest proportion
highest proportion – center
(highest proportion – lowest proportion) / 2
We will find the margin of error using the center so we can review how to find the center too.
Center = (.35 + .65) / 2 = .5
Margin of error = .65 - .5 = .15
b. from 0.72 to 0.78
Center = (.72 + .78) / 2 = .75
Margin of error = .78 - .75 = .03
c. from 0.84 to 0.95
Center = (.84 + .95) / 2 = .895
Margin of error = .95 - .895 = .055
d. from 0.47 to 0.57
Center = (.47 + .57) / 2 = .52
Margin of error = .57 - .52 = .05
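The center and the margin of error for any such interval can be computed in one step. The sketch below takes the margin as half the width of the interval, which gives the same value as highest proportion minus center; `center_and_margin` is an illustrative name.

```python
def center_and_margin(low, high):
    """Return the center of an interval and its margin of error (half the width)."""
    center = (low + high) / 2
    margin = (high - low) / 2
    return center, margin

intervals = [(0.35, 0.65), (0.72, 0.78), (0.84, 0.95), (0.47, 0.57)]
results = [center_and_margin(low, high) for low, high in intervals]

for (low, high), (center, margin) in zip(intervals, results):
    print(f"{low} to {high}: center {center:.3f}, margin of error {margin:.3f}")
```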
3. Decide if each of the following statements is true or false. Explain your reasoning in each case.
a. The smaller the sample size, the smaller the margin of error.
This is false. When your sample size is small, your results are less accurate. This means your standard
deviation is likely to be higher and your range is greater. Since margin of error is based on the range, the
margin of error will also increase.
b. If the margin of error is 0.05 and the observed proportion of red chips is 0.35, then the true
population proportion is likely to be between 0.40 and 0.50.
This is false. If the margin of error is .05 and the proportion is .35, we would describe the true population
proportion to be .35 ± .05, which is between .30 and .40, not .40 and .50.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Thursday 10/15
NAME: ___________________________________________________________
1. Using the formula for finding margin of error with a 95% confidence interval, answer the following
questions:
a. If the proportion of females at Union High School is 0.4, what is the standard deviation of the
distribution of the sample proportions of females for random samples of size 50? Round your
answer to three decimal places.
The standard deviation is √(p(1 - p) / n). The sample proportion is .4, and the sample size is 50.
√(.4(.6) / 50) = .069
b. The proportion of males at Union High School is 0.6. What is the standard deviation of the
distribution of the sample proportions of males for samples of size 50? Round your answer to three
decimal places.
The standard deviation is √(p(1 - p) / n). The sample proportion is .6, and the sample size is 50.
√(.6(.4) / 50) = .069
c. Think about the graphs of the two distributions in parts (a) and (b). You do not need to graph the
distributions, but how do you think they would be different? Discuss the shape and spread.
The standard deviation is the same, and the data is the opposite. If in part a, we get a proportion of .4, in
part b we get an answer of 1-.4 because the people who aren't girls are boys. This means the shape will
be the same. The only change is that the graph is simply shifted. In part a, our mound will be at .4, and in
part b, our mound will be at .6.
2. A newspaper in a large city asked 500 women the following: “Do you use organic food products (such as
milk, meats, vegetables, etc.)?” 280 women answered “yes.” Compute the margin of error. What does this
mean?
p = 280/500 = .56
n = 500
Margin of error = 2 * √(p(1 - p) / n) = 2 * √(.56(.44) / 500) = .044
This means the true proportion of women who use organic food products is likely (with about 95% confidence) to
fall within the interval .56 ± .044
3. A newspaper in New York took a random sample of 500 registered voters from New York City and found
that 300 favored a certain candidate for governor of the state. A second newspaper polled 1000 registered
voters in upstate New York and found that 550 people favored this candidate. Why do you think the results
of the two polls were different?
You are asking a different number of people, and a different group of people so the answers will not be the same:
different people favor different candidates.
4. The school newspaper at a large high school reported that 120 out of 200 randomly selected students
favor assigned parking spaces. Compute the margin of error. What does your answer mean?
p = 120/200 = .6
n = 200
Margin of error = 2 * √(p(1 - p) / n) = 2 * √(.6(.4) / 200) = .069
This means the true proportion of students who favor assigned parking spaces is likely (with about 95% confidence)
to fall within the interval .6 ± .069
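The margin of error formula used in Problems 2 and 4, 2 * √(p(1 - p) / n), can be wrapped in a small function. This is a sketch with an illustrative function name; the factor of 2 is the rule of thumb used in this homework for a 95% interval.

```python
from math import sqrt

def proportion_margin_of_error(successes, n):
    """Return the sample proportion and its margin of error, 2 * sqrt(p(1 - p) / n)."""
    p = successes / n
    return p, 2 * sqrt(p * (1 - p) / n)

organic_p, organic_margin = proportion_margin_of_error(280, 500)
parking_p, parking_margin = proportion_margin_of_error(120, 200)

print(organic_p, round(organic_margin, 3))
print(parking_p, round(parking_margin, 3))
```

The smaller sample of 200 students gives the wider margin, even though the two proportions are similar.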
5. Why do we say that the proportion ± margin of error is a 95% confidence interval?
We say this because the margin of error is equal to two standard deviations of the sampling distribution, and about
95% of sample proportions fall within two standard deviations of its mean.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Monday 10/19
NAME: ___________________________________________________________
1. Describe the difference between a sample distribution and a simulated sample distribution.
In a regular sample distribution, we are working with sample proportions, and in a simulated sample distribution,
we are working with sample means.
2. Explain the difference between the sample mean and the mean of a sampling distribution.
The mean of the sampling distributions takes the mean of each sample, and then finds the mean of all that data.
3. At the beginning of the school year, school districts implemented a new physical fitness program. A
student project involves monitoring how long it takes eleventh graders to run a mile. The following data were
taken mid-year.
a. What is the estimate of the population mean time it currently takes eleventh graders to run a mile
based on the following data (minutes) from a random sample of ten students?
6.5, 8.4, 8.1, 6.8, 8.4, 7.7, 9.1, 7.1, 9.4, 7.5
The mean is 7.9
The standard deviation is .9568
The margin of error = 2 * .9568 / √10 = .605
The population mean time is 7.9 ± .605
b. The students doing the project collected 50 random samples of 10 students each and calculated the
sample means. The standard deviation of their distribution of 50 sample means was 0.6 min. Based on
this standard deviation, what is the margin of error for their sample mean estimate? Explain your
answer.
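The margin of error is 2 * 0.6 = 1.2
This means that the margin of error is 1.2 minutes. The 0.6 is already the standard deviation of the sample means,
so we do not divide by √n again. The margin of error is double that standard deviation, so the interval includes
about 95% of the sample means.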
4. It is well known that astronauts increase their height in space missions because of the lack of gravity. A
question is whether or not we increase height here on Earth when we are put into a situation where the effect
of gravity is minimized. In particular, do people grow taller when confined to a bed? A study was done in
which the heights of six men were taken before and after they were confined to bed for three full days.
a. The before-after differences in height measurements (mm) for the six men were:
12.6, 14.4, 14.7, 14.5, 15.2, 13.5.
Assuming that the men in this study are representative of the population of all men, what is an
estimate of the population mean increase in height after three full days in bed?
An estimate for the population mean is the mean of all these values = 14.15
b. Calculate the margin of error associated with your estimate of the population mean from part (a).
Round your answer to three decimal places.
The margin of error is 2 * s / √n = 2 * .9397 / √6 = .767
c. Based on your sample mean and the margin of error from parts (a) and (b), what are plausible values
for the population mean height increase for all men who stay in bed for three full days?
We can describe the population mean as 14.15 ± .767
5. A new brand of hot dog claims to have a lower sodium content than the leading brand.
a. A random sample of ten of these new hot dogs results in the following sodium measurements (mg):
370, 326, 322, 297, 326, 289, 293, 264, 327, 331.
Estimate the population mean sodium content of this new brand of hot dog based on the ten
sampled measurements.
An estimate for the population mean is the mean of these ten values: 3145 / 10 = 314.5 mg
b. Calculate the margin of error associated with your estimate of the population mean from part (a).
Round your answer to three decimal places.
The margin of error is 2 × s/√n = 2 × 29.44/√10 ≈ 18.619
c. The mean sodium content of the leading brand of hot dogs is known to be 350 mg. Based on the
sample mean and the value of the margin of error for the new brand, is a mean sodium content of 350
mg a plausible value for the mean sodium content of the new brand? Comment on whether you think
the new brand of hot dog has a lower sodium content on average than the leading brand.
We can describe the population mean as 314.5 ± 18.619, which gives plausible values from about 295.881 mg to 333.119 mg.
350 mg lies above this interval, so it is not a plausible mean sodium content for the new brand. The new brand of
hot dog appears to have a lower sodium content on average than the leading brand.
d. Another random sample of 40 new brand hot dogs is taken. Should this larger sample of hot dogs
produce a more accurate estimate of the population mean sodium content than the sample of size 10?
Explain your answer by appealing to the formula for margin of error.
It would yield a more accurate estimate. In the formula 2 × s/√n, a larger n means you are dividing by a
larger number (√40 instead of √10), so the margin of error would be smaller as long as s stays about the same.
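The effect of sample size in parts (b) through (d) can be illustrated with a short Python sketch. It takes the sample mean 314.5 and s = 29.44 from the answers above. It also assumes, purely for illustration, that a sample of 40 would have about the same standard deviation, because the problem gives no data for that sample.

```python
import math

def margin_of_error(s: float, n: int) -> float:
    """Margin of error as used in the worked answers: 2 * s / sqrt(n)."""
    return 2 * s / math.sqrt(n)

sample_mean = 314.5   # mg, from part (a)
s = 29.44             # mg, sample standard deviation used in part (b)
leading_brand = 350   # mg, from part (c)

# Part (b): margin of error for the sample of 10.
moe_10 = margin_of_error(s, 10)
low, high = sample_mean - moe_10, sample_mean + moe_10

# Part (c): 350 mg is plausible only if it falls inside the interval.
is_plausible = low <= leading_brand <= high

# Part (d): same s (an assumption), but n = 40.
moe_40 = margin_of_error(s, 40)

print(f"n = 10: margin = {moe_10:.3f} mg, interval = {low:.3f} to {high:.3f} mg")
print(f"350 mg plausible? {is_plausible}")
print(f"n = 40: margin = {moe_40:.3f} mg")
```

Because √40 = 2√10, quadrupling the sample size cuts the margin of error in half under this assumption.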
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Tuesday 10/20
NAME: ___________________________________________________________
For Problems 1 and 2, identify (i) the subjects, (ii) the treatments, and (iii) the response variable for each
experiment.
1. A botanist was interested in determining the effects of watering (three days a week or daily) on the heat
rating of jalapeño peppers. The botanist wanted to know which watering schedule would produce the highest
heat rating in the peppers. He conducted an experiment, randomly assigning each watering schedule to half
of twelve plots that had similar soil and full sun. The average final heat rating for the peppers grown in each
plot was recorded at the end of the growing season.
Subjects: jalapeño peppers (these are the things that are receiving treatments)
Treatments: the amount of watering, either three days a week or daily (this is what is changing)
Responses: the heat rating (this is what we are measuring)
2. A manufacturer advertises that its new plastic cake pan bakes cakes more evenly. A consumer group
wants to carry out an experiment to see if the plastic cake pans do bake more evenly than standard metal
cake pans. Twenty cake mixes (same brand and type) are randomly assigned to either the plastic pan or the
metal pan. All of the cakes are baked in the same oven. The rating scale was then used to rate the evenness
of each cake.
Subjects: cake mixes (these are the things that are receiving treatments)
Treatments: the kind of pan, either plastic or metal (this is what is changing)
Responses: the evenness of the cake (this is what we are measuring)
3. In one high school, there are eight math classes during 2nd period. The number of students in each 2nd
period math class is recorded below.
32  27  26  23  25  22  30  19
This data set is randomly divided into two equal size groups, and the group means are computed.
a) Will the two group means be the same? Why or why not?
They will not necessarily be the same because we are randomly dividing the numbers into two groups. For example,
one group could get the four highest numbers, while the other could get the four lowest numbers.
The random division into two groups process is repeated many times to create a distribution of group mean
class size.
b) What is the center of the distribution of group mean class size?
The center of the distribution would be the mean of the single list, which is 25.5
c) What is the largest possible range of the distribution of group mean class size?
This would occur if one group included the four largest numbers and the other included the four smallest
numbers. In this case, the mean of the group with the four largest numbers is 28.75, and the mean of the
group with the four smallest numbers is 22.25. The range is 28.75 - 22.25 = 6.5
d) What possible values for the mean class size are more likely to happen than others? Explain why
you chose these values.
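The answers may vary here, but they should definitely fall within the range 22.25 to 28.75. Values from about 24 to 27
sit right around the mean of 25.5, because most random splits mix large and small classes in each group, so that
would be a reasonable range where the most values occur.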
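With only eight classes, every possible split can be listed instead of sampled. The Python sketch below enumerates all 70 ways to choose four classes for one group and summarizes the resulting group means. It reproduces the center of 25.5 and the extremes of 22.25 and 28.75. The 24 to 27 band is just the range suggested in part (d), and the variable names are illustrative.

```python
from itertools import combinations

# Number of students in each 2nd period math class.
class_sizes = [32, 27, 26, 23, 25, 22, 30, 19]

# Mean of every possible group of four classes (70 groups in total).
group_means = [sum(group) / len(group) for group in combinations(class_sizes, 4)]

center = sum(group_means) / len(group_means)
smallest, largest = min(group_means), max(group_means)
spread = largest - smallest

# Share of possible group means that land in the 24 to 27 band from part (d).
share_24_to_27 = sum(24 <= m <= 27 for m in group_means) / len(group_means)

print(f"number of possible groups: {len(group_means)}")
print(f"center of the distribution: {center}")
print(f"smallest and largest group mean: {smallest}, {largest}")
print(f"largest possible range: {spread}")
print(f"share of group means between 24 and 27: {share_24_to_27:.2f}")
```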
4. Why do we want to use random assignment in an experiment?
Random assignment allows us to make cause-and-effect conclusions. If the assignment is not random, it is possible
that the people in one group are similar to each other and yet different from the people in the second group, so we
don't know if the treatment caused a difference or if it was something else about the people in group A.
5. Why can't we use random assignment in an observational study?
Random assignment is when we randomly assign our subjects to different treatments, but in an observational study
there are no treatments.
Dr. Jaffe / Ms. Wen
Advanced Mathematics
Homework due Wednesday 10/21
NAME: ___________________________________________________________
1.
Group A: 8 dieters lost an average of 8 pounds.
Group B: 8 non-dieters lost an average of 2 pounds over the same time period.
Calculate and interpret "Diff" = the mean of Group A minus the mean of Group B
x̄_A - x̄_B = 8 - 2 = 6
The diff value is positive, so on average the dieters in Group A lost 6 more pounds than the non-dieters in Group B.
2.
Group A: 11 students were on average 0.4 seconds faster in their 100 meter run times after following a
new training regimen.
Group B: 11 students were on average 0.2 seconds slower in their 100 meter run times after not
following any new training regimens.
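Calculate and interpret "Diff" = the mean of Group A minus the mean of Group B.
x̄_A - x̄_B = 0.4 - (-0.2) = 0.6
Measuring each group's change as seconds faster, Group A is 0.4 and Group B is -0.2. The diff value is positive,
so on average Group A improved by 0.6 seconds more than Group B, which suggests the new training regimen was effective.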
3.
a) Using the randomization distribution shown in the Exit Ticket, what is the probability of obtaining a
"Diff" value of -0.6 or less?
There are 29 values that are -0.6 or less. There are a total of 100 values.
29/100 = .29.
b) Would a "Diff" value of -0.6 or less be considered a "statistically significant difference"? Why or
why not?
It would not be statistically significant because it occurs 29% of the time, which is not less than 5%.
c) Using the randomization distribution shown in the Exit Ticket, what is the probability of obtaining a
"Diff" value of -1.2 or less?
There are 6 values that are -1.2 or less. So the probability of getting -1.2 or less is 6/100 = .06.
d) Would a "Diff" value of -1.2 or less be considered a "statistically significant difference"? Why or
why not?
It would not be considered statistically significant because it occurs 6% of the time. In order to be statistically
significant, it must occur less than 5% of the time.
3. Describe the purpose of randomization testing.
Randomization testing allows us to determine if the treatment has some effect. We want to make sure that any
differences we see between the two groups are happening because of the treatment and not as a result of chance.
4. How do we determine if a diff value between two groups in an experiment is statistically significant?
A diff value is statistically significant if it would occur less than 5% of the time if we were to divide the data into two
groups at random instead of by who received the treatment and who did not.
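The randomization procedure described in these answers can be sketched in Python. The individual weight-loss values below are made up for illustration and were chosen only so that the group means are 8 and 2 pounds, as in Problem 1. The function name, the number of shuffles, and the seed are also assumptions rather than part of the homework.

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_p_value(group_a, group_b, n_shuffles=1000, seed=0):
    """Estimate how often a random regrouping gives a Diff at least as
    extreme as the observed Diff, in the same direction."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        # Divide the pooled data into two groups at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:size_a]) - mean(pooled[size_a:])
        if (observed >= 0 and diff >= observed) or (observed < 0 and diff <= observed):
            extreme += 1
    return observed, extreme / n_shuffles

# Hypothetical pounds lost; only the group means (8 and 2) come from Problem 1.
group_a = [10, 6, 9, 7, 8, 11, 5, 8]   # dieters
group_b = [3, 1, 2, 4, 0, 2, 3, 1]     # non-dieters

observed_diff, p_value = randomization_p_value(group_a, group_b)
significant = p_value < 0.05  # the 5% cutoff used in the answers above

print(f"observed Diff = {observed_diff}")
print(f"estimated probability of a Diff this extreme by chance = {p_value}")
print(f"statistically significant at the 5% level? {significant}")
```

The fixed seed only makes the run repeatable. A different seed or more shuffles would give a slightly different estimate of the same probability.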