Download Assignment - Walden University ePortfolio for Mike Dillon

Triola Assignment B
Section 1-2 – Basic Skills and Concepts
1) What is the difference between a parameter and a statistic?
A parameter is a value (number) that refers to the entire population being studied. A
statistic is a value (number) that refers to a sample of a larger population. Check: OK
3) Determine whether the given value is a statistic or a parameter: A sample of households is
selected and the average (mean) number of people per household is 2.58.
2.58 is a statistic because it is the mean for a sample of the larger population. Check: OK
5) Determine whether the given values are from a discrete or continuous data set: In the Chapter
Problem, it was noted that when 50 letters were sent as part of an experiment, three of them
arrived at the target address.
The values are from a discrete set because it does not make sense in the context of the
problem to have a decimal part of a letter. Check: OK
7) Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most
appropriate: The years of cicada emergence: 1936, 1953, 1970, 1987, and 2004.
The years would be considered an interval measurement since it is possible to
determine differences between the various years, but there is no zero level that
represents zero time. Check: OK
Section 1-3 – Basic Skills and Concepts
1) What is a voluntary response sample, and why is it generally unsuitable for methods of
statistics?
A voluntary response sample is a collection of participants who determine for
themselves whether or not to participate in a study. Examples of voluntary response
samples might include: individuals who respond to an Internet survey, individuals who
respond to a mail-in survey, etc.
Voluntary response samples are not generally suitable because they lack the key
characteristic of randomness. In addition, there is no guarantee that the participants
who make up the sample accurately reflect the composition of the larger population. In
general, the results obtained from a voluntary response sample cannot be accurately
generalized to the larger population. The sample is likely to be biased. Check: OK
after adding in phrase about bias.
5) Use critical thinking to develop an alternative conclusion: Based on a study of heights of men
and women who play basketball, a researcher concludes that the exercise from playing basketball
causes people to grow taller.
Based on the characteristics and requirements of the game of basketball, athletes who
are taller generally tend to play basketball. Check: OK
17) An economist randomly selects 10 wage earners from each of the 50 states. For each state,
he finds the average of the annual incomes, and he then adds those 50 values and divides by 50.
Is the result likely to be a good estimate of the average (mean) of all wage earners in the United
States? Why or why not?
The economist found the average of the 10 people in a state in order to determine the
mean annual income for that state. After determining the averages for each of the 50 states,
he found the average of the averages in order to determine a mean for the entire
country. Although his procedure was mathematically correct, the process has several
flaws with regard to his sample.
First, the sample size is much too small to give an accurate reflection of the entire
population. Even if he chose to find a single average of all 500 people, 500 is too small
a number to accurately represent a population of 300,000,000.
Second, each state has different characteristics with respect to population,
socioeconomic status, job market, geography, and so forth. For example, California has
a significantly higher population than Rhode Island. Thus, choosing 10 people from
Rhode Island and 10 people from California does not accurately reflect the
characteristics of the population. Check: OK
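The second flaw can be illustrated with a short sketch. The two "states" and all of their figures below are hypothetical, invented only to show how an unweighted average of state means can drift away from the mean of all wage earners when the states differ in size.

```python
# Hypothetical figures, invented for illustration only (not real income data).
states = {
    "Large state": {"workers": 18_000_000, "mean_income": 60_000},
    "Small state": {"workers": 500_000, "mean_income": 40_000},
}

# The economist's method: add the state means and divide by the number of states.
unweighted = sum(s["mean_income"] for s in states.values()) / len(states)

# Weighting each state mean by its number of wage earners gives the overall mean.
total_workers = sum(s["workers"] for s in states.values())
weighted = sum(s["workers"] * s["mean_income"] for s in states.values()) / total_workers

print(f"Unweighted mean of state means: {unweighted:,.2f}")
print(f"Population-weighted mean:       {weighted:,.2f}")
```

In this made-up case the small state pulls the unweighted figure well below the weighted one, because it counts as much as the large state despite having far fewer wage earners.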
Section 1-4 – Basic Skills and Concepts
1) What is the difference between a random sample and a simple random sample?
A random sample is one in which every individual in a population has an equal chance of
being selected to be in the sample. For example, suppose that the population consists of
every student in the school. If a sample of 50 students is taken, every student in the
school has an equal chance of being chosen. A simple random sample is one in which every
possible sample of a given size has an equal chance of being selected. From the previous
example, every possible combination of 50 students has an equal chance of being
selected. It would not be a simple random sample if the students were placed into
permanent groups of 10 and 5 groups were randomly selected. Check: OK
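A minimal sketch of the distinction, assuming a hypothetical school of 100 students identified by the numbers 0 to 99 (the school size and the fixed seed are assumptions made for the example):

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

students = list(range(100))  # hypothetical population of 100 student IDs

# Simple random sample: every possible set of 50 students is equally likely.
srs = random.sample(students, 50)

# Group-based sample: permanent groups of 10, then 5 groups chosen at random.
groups = [students[i:i + 10] for i in range(0, 100, 10)]
chosen_groups = random.sample(groups, 5)
group_sample = [s for g in chosen_groups for s in g]

# Each student still has a 5-in-10 chance of selection, so the sample is random,
# but students 0 and 1 share a group and are always selected or omitted together.
print(len(srs), len(group_sample))
print((0 in group_sample) == (1 in group_sample))
```

Because some combinations of 50 students (for example, one that splits a group) can never occur, the group-based procedure is random but not simple random.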
5) Determine whether the given description corresponds to an observational study or an
experiment: Nine-year-old Emily Rosa became an author of an article in the Journal of the
American Medical Association after she tested professional touch therapists. Using a cardboard
partition, she held her hand above the therapist’s hand, and the therapist was asked to identify the
hand that Emily chose.
This would be considered an observational study since the participants are being
observed but not changed by the procedure. Check: My original thought was that it was an
experiment. I overlooked the condition that the participant or subject is treated (or
modified) in an experiment. OK
9) Identify the type of observational study (cross-sectional, retrospective, prospective): A
researcher from Mt. Sinai Hospital in New York City plans to obtain data by following (to the
year 2015) siblings of victims who perished in the World Trade Center terrorist attack of
September 11, 2001.
This is an example of a prospective (or longitudinal) study since the researcher plans to
follow the participants for an extended period of time and collect data at some point or
points in the future. Check: OK
21) Identify which type of sampling is used: In a Gallup poll of 1059 adults, the interview
subjects were selected by using a computer to randomly generate telephone numbers that were
called.
This is an example of random sampling since the phone numbers are randomly
determined. Check: OK
Unit 1 – Review Exercises
1) Shortly after the World Trade Center towers were destroyed by terrorists, America Online
ran a poll of its Internet subscribers and asked this question: “Should the WTC towers be
rebuilt?” Among the 1,304,240 responses, 768,731 answered yes, 286,756 answered no, and
248,753 said that it was too soon to decide. Given that this sample is extremely large, can the
responses be considered to be representative of the population of the U.S.? Explain.
Due to the fact that sampling process involved voluntary response, it cannot be assumed
that the sample is representative of the entire population. Most likely, people who had
strong feelings about the response were the ones that responded. In addition, the poll
was only available to those with Internet access. Check: OK
3) Identify the level of measurement used in each of the following:
a) The weight of people being hurled through the air…
Continuous Ratio Check: OK
b) A movie critic’s ratings of “must see, recommended, not recommended…”
Discrete Ordinal Check: OK
c) A movie critic’s classification of “drama, comedy, adventure”
Discrete Nominal Check: OK
d) Bob, who is different in many ways, measures time in days, with 0 corresponding to his
birth date…
Discrete Interval Check: OK
5) Identify the type of sampling used when a sample of the 366000 Coke shareholders is
obtained as described. Then determine if the sample is representative of the population:
a) A complete list is compiled and every 500th name is selected
Systematic – This will be representative. Check: OK
b) At the annual stockholders’ meeting, a survey is conducted of all who attend
Convenience – The sample will be representative depending on the number of
stockholders who attend. However, if only those who care the most about the meeting
attend, then the sample may not be representative. Check: OK
c) Fifty different stockbrokers are randomly selected, and a survey is made of their clients…
Stratified – This will not be representative because different stockbrokers may have
different numbers of clients. Check: The answer is cluster sampling since stockholders are
grouped by stockbroker and then sampled. The sample is not representative.
d) A computer file of all stockholders is compiled and numbered and a computer generates
random numbers to select the sample…
Random – This will be representative. Check: OK
e) All of the stockholder zip codes are collected and 5 stockholders are randomly selected from
each zip code
Clustered – This will not be representative since there may be a larger concentration of
stockholders in one zip code versus another (i.e. urban areas versus rural areas). Check:
The sampling is stratified since the stockholders are grouped by zip code and then randomly
sampled. The sample is not representative.
Unit 1 – Cumulative Review Exercises
1) Sum = 3.0630 + 3.0487 + 2.9149 + 3.1358 + 2.9753 = 15.1377 Check: OK
Mean = 15.1377 ÷ 5 = 3.02754 Check: OK
3) $\dfrac{98.20 - 98.60}{0.62 / \sqrt{106}} = \dfrac{-0.40}{0.62 / 10.2956} = \dfrac{-0.40}{0.0602} = -6.64234 \approx -6.64$ Check: OK
5) $\dfrac{(50 - 45)^2}{45} = \dfrac{5^2}{45} = \dfrac{25}{45} = 0.\overline{5} \approx 0.56$ Check: OK
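The three calculations above can be re-checked with a few lines of Python. This is only a verification sketch; the variable names are chosen for the example and are not from the textbook.

```python
import math

# Exercise 1: sum and mean of the five listed values.
values = [3.0630, 3.0487, 2.9149, 3.1358, 2.9753]
total = sum(values)
mean = total / len(values)

# Exercise 3: (98.20 - 98.60) divided by 0.62 over the square root of 106.
z = (98.20 - 98.60) / (0.62 / math.sqrt(106))

# Exercise 5: (50 - 45) squared, divided by 45.
ratio = (50 - 45) ** 2 / 45

print(round(total, 4), round(mean, 5))
print(round(z, 2))
print(round(ratio, 2))
```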
Section 2-2 – Basic Skills and Concepts
1) What is a frequency distribution and why is it useful?
A frequency distribution uses some type of method for listing data values and their
corresponding counts (or frequencies). A frequency distribution is useful for
organizing data, looking for patterns, and visualizing the data. Check: OK
5) Identify the class width, class midpoints, and class boundaries for the frequency distribution:
Daily Low Temp (F)
35-39
40-44
45-49
50-54
55-59
60-64
65-69
Frequency
1
3
5
11
7
7
1
Class Width: 5
Class Midpoints: 37, 42, 47, 52, 57, 62, and 67
Class Boundaries: 34.5, 39.5, 44.5, 49.5, 54.5, 59.5, 64.5, and 69.5 Check: OK
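The same three quantities can be derived mechanically from the class limits. A minimal sketch, assuming the classes are stored as (lower limit, upper limit) pairs:

```python
# Class limits from the frequency distribution in Exercise 5.
classes = [(35, 39), (40, 44), (45, 49), (50, 54), (55, 59), (60, 64), (65, 69)]

# Class width: difference between two consecutive lower class limits.
width = classes[1][0] - classes[0][0]

# Class midpoint: average of the lower and upper limits of each class.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

# Class boundaries: split the gap between one upper limit and the next lower limit.
gap = classes[1][0] - classes[0][1]
boundaries = [classes[0][0] - gap / 2] + [hi + gap / 2 for _, hi in classes]

print(width)
print(midpoints)
print(boundaries)
```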
9) Does the frequency distribution given in Exercise 5 appear to have a normal distribution?
The two general criteria for a normal distribution are: 1) frequencies start low, reach a
maximum, and then finish low; and 2) symmetry. In this case, the distribution does
appear to be normal. Check: OK
Section 2-3 – Basic Skills and Concepts
1) What important characteristic of data can be better understood through examination of
a histogram?
A histogram gives a visual representation of the shape (i.e. normal, skewed, etc.) of the
distribution. Check: OK
5) How many crew members are included in the histogram (on page 54 of the text)?
2 + 10 + 5 + 1 = 18 crew members Check: OK
11) Refer to Exercise 19 in Section 2-2 and use the frequency distribution to construct a
histogram. Do the data appear to be normal?
Rainfall (Inches)    Frequency
0.00-0.24            46
0.25-0.49            5
0.50-0.74            0
0.75-0.99            0
1.00-1.24            0
1.25-1.49            1
[Histogram: Rainfall in Boston on Sunday for One Year. Vertical axis: Frequency (0 to 50). Horizontal axis: Rainfall (Inches -- Class Midpoint Values) at 0.12, 0.37, 0.62, 0.87, 1.12, 1.37. Bar heights: 46, 5, 0, 0, 0, 1.]
The two general criteria for a normal distribution are: 1) frequencies start low, reach a
maximum, and then finish low; and 2) symmetry. In this case, the distribution does not
appear to be normal since it is skewed in one direction. Check: OK
15) Refer to Exercise 23 in Section 2-2 and use the frequency distribution for the weights of the
pre-1983 pennies to construct a histogram. Do the weights appear to be normal?
Coin Weights (Grams)    Frequency
2.9500-2.9999           2
3.0000-3.0499           3
3.0500-3.0999           22
3.1000-3.1499           7
3.1500-3.1999           1
[Histogram: Coin Weights of Pre-1983 Pennies. Vertical axis: Frequency (0 to 25). Horizontal axis: Weights (Grams -- Class Midpoint Values) at 2.97495, 3.02495, 3.07495, 3.12495, 3.17495. Bar heights: 2, 3, 22, 7, 1.]
The two general criteria for a normal distribution are: 1) frequencies start low, reach a
maximum, and then finish low; and 2) symmetry. In this case, the distribution does
appear to be normal. Check: OK
17) Refer to Table 2-8 and use the relative frequency distribution for the best actors to construct
a relative frequency histogram. Do the two genders appear to win Oscars at different ages?
[Relative frequency histogram: Ages of Oscar Winning Actors (Percentage). Vertical axis: Frequency (0 to 45). Horizontal axis: Age (Years -- Class Midpoint Values) at 25.5, 35.5, 45.5, 55.5, 65.5, 75.5. Bar heights (percent): 4, 33, 39, 18, 4, 1.]
Although there are similarities between the graphs, it appears that men tend to win
Oscars at slightly older ages than women. Check: OK
Section 2-4 – Basic Skills and Concepts
1) What is the main objective in graphing data?
The main objective of a graph is to visually depict data in a manner that emphasizes the
key characteristics or features of the data. Check: OK The graph can also show the
distribution, outliers, and so forth.
9) Use the heights (Data Set 11) to construct a stemplot. What does the stemplot suggest
about the distribution of heights?
Height of Eruptions of Old Faithful
Stems (Tens)    Leaves (Ones)
 9              55
10
11              00055
12              000000005555
13              0000000066668
14              000088
15              00
The distribution of the eruption heights of Old Faithful appears to be approximately
normal. Check: OK
17) Use the data to create a scatter diagram. In Data Set 3, use tar for the horizontal scale and
use carbon monoxide (CO) for the vertical scale. Determine whether there appears to be a
relationship between cigarette tar and CO. If so, describe the relationship.
[Scatter diagram: Relationship Between Tar and CO for Different Brands of Cigarettes. Horizontal axis: Tar (mg), 0 to 20. Vertical axis: CO (mg), 0 to 20.]
In general, it appears that as the amount of tar increases, the amount of carbon
monoxide (CO) also increases. Check: OK
Unit 2 – Review Exercises
1) Construct a frequency distribution of the ages of the Oscar-winning actors listed in Table 2-1.
Use the same class intervals that were used for the actresses. How does the result compare to the
frequency distribution for actresses?
Frequency Distribution: Ages of Best Actors
Age of Actor    Frequency
21-30           3
31-40           25
41-50           30
51-60           14
61-70           3
71-80           1
It appears that the distribution for the actors is centered at a value that is slightly
higher than for actresses, which means that males tend to win Oscars at older ages as
compared to females. Check: OK
3) Construct a dotplot of the ages of the Oscar-winning actors listed in Table 2-1. How does the
result compare to the dotplot for actresses?
[Dotplot: Ages of Best Actors Oscar Award. Age axis marked from 30 to 75 in steps of 5.]
Although the shape of the distribution is similar to the dotplot for the actresses, the
values for the males tend to be concentrated at a higher age. Check: OK
5) Refer to Table 2-1 and use only the first 10 ages of actresses and the first 10 ages of the actors.
Construct a scatterplot. Based on the result, does there appear to be an association between the
ages of actresses and the ages of actors?
[Scatterplot: Relationship Between Oscar-Winning Actress Age and Actor Age. Horizontal axis: Actress Age (in years), 0 to 70. Vertical axis: Actor Age (in years), 0 to 70.]
The points do not form any type of consistent pattern (i.e. a line, parabolic curve, etc.).
Therefore, there does not appear to be an association between the ages of the two
groups. Check: OK I needed to reference the answer in order to clarify how the variables
were related.
Unit 2 – Cumulative Review Exercises
1) Consider the numbers that result from spins. Do those numbers measure or count anything?
No. These numbers are the values that are obtained. The number of times (the
frequency) each value is spun is what is counted. Check: OK
3) Examine the distribution table. Given that the last class summarizes results from three slots,
is its frequency approximately consistent with the results that would be expected from an
unbiased roulette wheel? In general, do the frequencies suggest that the wheel is unbiased?
The other classes each represent five spaces on the wheel. Since the last class only
involves three slots, this is 60% of the size of the other classes. If you divide 25 (the
frequency) by 0.6, you get a value of 41.666666 or 42. This value seems to be relatively
consistent with the other values.
Given the fact that only 380 spins were used, I would say that the roulette wheel is
unbiased. In the long run, the values should even out and be consistent for all of the
classes. Check: OK
Section 3-2 – Basic Skills and Concepts
1) In what sense are the mean, median, mode, and midrange measures of “center”?
Each of these measurements attempts to give an indication of the value that a
distribution is centered around. The mean locates the center by dividing the sum of all
values by the number of values in the distribution. The median is simply the middle
number in an ordered list of values. The mode is the value that appears the most. The
midrange is the average of the two extreme (maximum and minimum) values. Check:
OK
9) Find the mean, median, mode, and midrange. Fourteen different second-year medical students
at Bellevue Hospital measured the blood pressure of the same person. The systolic readings are
listed. What is notable about this data set?
Mean: $\bar{x} = \dfrac{138 + 130 + \cdots + 130 + 150}{14} = \dfrac{1875}{14} = 133.92857 \approx 133.9$ Check: OK
Median: 120 120 125 130 130 130 130 135 138 140 140 143 144 150
$\dfrac{130 + 135}{2} = \dfrac{265}{2} = 132.5$ Check: OK
Mode: 130 Check: OK
Midrange: $\dfrac{120 + 150}{2} = \dfrac{270}{2} = 135$ Check: OK
In general, all four measures of central tendency (mean, median, mode, and midrange)
are fairly close. This would tend to indicate that the distribution is relatively
symmetric. It is approximately normal. Check: When looking at values themselves, it is
interesting that the values vary as much as they do since the blood pressure was taken on the
same person. OK
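The four measures can be re-computed with Python's standard `statistics` module. The sketch below uses the fourteen readings in the sorted order shown on the Median line; the midrange has no library function, so it is computed directly.

```python
import statistics

# Systolic readings, in the sorted order shown on the Median line above.
readings = [120, 120, 125, 130, 130, 130, 130, 135, 138, 140, 140, 143, 144, 150]

mean = statistics.mean(readings)
median = statistics.median(readings)
mode = statistics.mode(readings)
midrange = (min(readings) + max(readings)) / 2  # average of the two extremes

print(round(mean, 1), median, mode, midrange)
```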
17) Waiting times of customers at Jefferson Valley Bank (in one line) and the Bank of
Providence (in three lines) are listed. Determine whether there is a difference between the two
data sets that is not apparent from a comparison of the measures of center. If so, what is it?
In both sets of the data, the mean is 7.15 minutes, the median is 7.2, the mode is 7.7, and
the midrange is 7.1. The measures of central tendency would indicate that the data sets
are essentially the same. However, the wait times for Jefferson Valley do not vary as
much as the wait times for the Bank of Providence. The range (difference between the
maximum and minimum values) for Jefferson Valley is only 1.2 minutes whereas the
range for Bank of Providence is 5.8 minutes. This means that persons waiting in line at
Jefferson Valley can anticipate a consistent wait time, but persons at the Bank of
Providence could have wait times that are short or long. Check: OK
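A short sketch makes the comparison concrete. It uses the waiting times as they are listed in the Section 3-3 calculations for exercise 17; the helper name `summarize` is chosen for this example.

```python
import statistics

# Waiting times (minutes) as listed in the Section 3-3 calculations for exercise 17.
jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

def summarize(times):
    """Return the four measures of center plus the range, rounded to two decimals."""
    return {
        "mean": round(statistics.mean(times), 2),
        "median": round(statistics.median(times), 2),
        "mode": statistics.mode(times),
        "midrange": round((min(times) + max(times)) / 2, 2),
        "range": round(max(times) - min(times), 2),
    }

jv = summarize(jefferson)
bp = summarize(providence)
print(jv)
print(bp)
```

The first four entries agree for the two banks; only the range separates them.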
Section 3-3 – Basic Skills and Concepts
1) Why is standard deviation considered a measure of variation? In your own words, describe
the characteristic of a data set that is measured by the standard deviation.
Standard deviation measures variation because it is essentially an average of the
differences between each value and the mean for the data set. The standard deviation
gives an indication of the number (or percentage) of values in the data set that are
within a certain range of the mean. Check: OK
5) Find the range, variance, and standard deviation for the given sample data. Answer the
question: Statistics students participated in an experiment to test their ability to determine when 1
minute has passed. The results are given. Identify at least one good reason why the standard
deviation from the sample might not be a good estimate of the standard deviation for the
population of adults.
Range: $75 - 49 = 26$ Check: OK
SD:
$\sum x^2 = 53^2 + 52^2 + 75^2 + 62^2 + 68^2 + 58^2 + 49^2 + 49^2 = 27772$
$\left(\sum x\right)^2 = (53 + 52 + 75 + 62 + 68 + 58 + 49 + 49)^2 = 466^2 = 217156$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{8(27772) - 217156}{8(8-1)}} = \sqrt{\dfrac{222176 - 217156}{8(7)}} = \sqrt{\dfrac{5020}{56}} = \sqrt{89.6428} = 9.467 \approx 9.5$ Check: OK
Variance: $s^2 = 9.467^2 = 89.6428 \approx 89.6$ Check: OK
The sample standard deviation is not a good estimate of the population standard
deviation because it is based on a very small sample (n = 8). Check: OK
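The shortcut formula used above translates directly into code. The sketch below follows the same steps and, as a cross-check, compares the result with `statistics.variance`, which computes the sample variance from deviations about the mean.

```python
import math
import statistics

# The eight estimated times from the exercise.
times = [53, 52, 75, 62, 68, 58, 49, 49]

n = len(times)
sum_x = sum(times)                    # sum of the values
sum_x2 = sum(x * x for x in times)    # sum of the squared values

# Shortcut formula: s^2 = [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]
variance = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
sd = math.sqrt(variance)
data_range = max(times) - min(times)

print(data_range, round(variance, 1), round(sd, 1))
```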
13) Find the range, variance, and standard deviation for the two data samples. Answer the
question: Statistics are sometimes used to compare or identify authors of different works. The
lengths of the first 20 words in the foreword written by Tennessee Williams in Cat on a Hot Tin
Roof and the first 20 words in The Cat in the Hat by Dr. Seuss are listed. Does there appear to
be a difference in variation?
Cat on a Hot Tin Roof Check: OK
Range: $11 - 1 = 10$
SD:
$\sum x^2 = 2^2 + 6^2 + 2^2 + 2^2 + 1^2 + 4^2 + 4^2 + 2^2 + 4^2 + 2^2 + 3^2 + 8^2 + 4^2 + 2^2 + 2^2 + 7^2 + 7^2 + 2^2 + 3^2 + 11^2 = 434$
$\left(\sum x\right)^2 = (2 + 6 + 2 + 2 + 1 + 4 + 4 + 2 + 4 + 2 + 3 + 8 + 4 + 2 + 2 + 7 + 7 + 2 + 3 + 11)^2 = 78^2 = 6084$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{20(434) - 6084}{20(20-1)}} = \sqrt{\dfrac{8680 - 6084}{20(19)}} = \sqrt{\dfrac{2596}{380}} = \sqrt{6.831578} = 2.6137 \approx 2.6$
Variance: $s^2 = 2.6137^2 = 6.831578 \approx 6.8$
The Cat in the Hat Check: OK
Range: $5 - 2 = 3$
SD:
$\sum x^2 = 3^2 + 3^2 + 3^2 + 3^2 + 5^2 + 2^2 + 3^2 + 3^2 + 3^2 + 2^2 + 4^2 + 2^2 + 2^2 + 3^2 + 2^2 + 3^2 + 5^2 + 3^2 + 4^2 + 4^2 = 208$
$\left(\sum x\right)^2 = (3 + 3 + 3 + 3 + 5 + 2 + 3 + 3 + 3 + 2 + 4 + 2 + 2 + 3 + 2 + 3 + 5 + 3 + 4 + 4)^2 = 62^2 = 3844$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{20(208) - 3844}{20(20-1)}} = \sqrt{\dfrac{4160 - 3844}{20(19)}} = \sqrt{\dfrac{316}{380}} = \sqrt{0.831578} = 0.9119 \approx 0.9$
Variance: $s^2 = 0.9119^2 = 0.831578 \approx 0.8$
There does appear to be a difference in variation. The sample from The Cat in the Hat
suggests that variation in word length is much smaller than in Cat on a
Hot Tin Roof. Check: OK
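Both samples can be run through one small helper to confirm the figures. This sketch relies on `statistics.variance` and `statistics.stdev`, which use the sample (n - 1) formulas; the helper name `variation` is chosen for this example.

```python
import statistics

# Word lengths of the first 20 words in each work, as listed in the sums above.
roof = [2, 6, 2, 2, 1, 4, 4, 2, 4, 2, 3, 8, 4, 2, 2, 7, 7, 2, 3, 11]
hat = [3, 3, 3, 3, 5, 2, 3, 3, 3, 2, 4, 2, 2, 3, 2, 3, 5, 3, 4, 4]

def variation(sample):
    """Return (range, sample variance, sample standard deviation)."""
    return (
        max(sample) - min(sample),
        statistics.variance(sample),
        statistics.stdev(sample),
    )

for name, sample in [("Cat on a Hot Tin Roof", roof), ("The Cat in the Hat", hat)]:
    r, var, sd = variation(sample)
    print(f"{name}: range={r}, variance={var:.1f}, sd={sd:.1f}")
```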
17) Find the range, variance, and standard deviation for the two data samples. Answer the
question: Waiting times of customers at the Jefferson Valley Bank and the Bank of Providence
are listed. Compare the variation in the two data sets.
Jefferson Valley Bank Check: OK I had to fix a minor calculation error.
Range: $7.7 - 6.5 = 1.2$
SD:
$\sum x^2 = 6.5^2 + 6.6^2 + 6.7^2 + 6.8^2 + 7.1^2 + 7.3^2 + 7.4^2 + 7.7^2 + 7.7^2 + 7.7^2 = 513.27$
$\left(\sum x\right)^2 = (6.5 + 6.6 + 6.7 + 6.8 + 7.1 + 7.3 + 7.4 + 7.7 + 7.7 + 7.7)^2 = 71.5^2 = 5112.25$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{10(513.27) - 5112.25}{10(10-1)}} = \sqrt{\dfrac{5132.7 - 5112.25}{10(9)}} = \sqrt{\dfrac{20.45}{90}} = \sqrt{0.22722222} = 0.476678 \approx 0.48$
Variance: $s^2 = 0.476678^2 = 0.2272222 \approx 0.23$
Bank of Providence Check: OK
Range: $10.0 - 4.2 = 5.8$
SD:
$\sum x^2 = 4.2^2 + 5.4^2 + 5.8^2 + 6.2^2 + 6.7^2 + 7.7^2 + 7.7^2 + 8.5^2 + 9.3^2 + 10.0^2 = 541.09$
$\left(\sum x\right)^2 = (4.2 + 5.4 + 5.8 + 6.2 + 6.7 + 7.7 + 7.7 + 8.5 + 9.3 + 10.0)^2 = 71.5^2 = 5112.25$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{10(541.09) - 5112.25}{10(10-1)}} = \sqrt{\dfrac{5410.9 - 5112.25}{10(9)}} = \sqrt{\dfrac{298.65}{90}} = \sqrt{3.318333} = 1.821629 \approx 1.82$
Variance: $s^2 = 1.821629^2 = 3.318333 \approx 3.32$
Due to the fact that the values are generally closer together, the variation in wait times
for Jefferson Valley Bank is much lower than for the Bank of Providence. Check: OK
Section 3-4 – Basic Skills and Concepts
1) A value from a large data set is found to have a z-score of –2. Is the value above the mean or
below the mean? How many standard deviations away from the mean is this value?
A value with a z-score of –2 is two standard deviations below the mean. Check: OK
3) For a large data set, the first quartile, Q1, is found to be 15. What does it mean when we say that
15 is the first quartile?
This means that about 25% of the values in the data set are below 15. About 75% of
the scores are above 15. Check: OK
9) Human body temperatures have a mean of 98.20°F and a standard deviation of 0.62°F.
Convert the given temperatures to z-scores.
a) 97.50°F
$z = \dfrac{x - \bar{x}}{s} = \dfrac{97.50 - 98.20}{0.62} = \dfrac{-0.70}{0.62} = -1.1290 \approx -1.13$ Check: OK
b) 98.60°F
$z = \dfrac{x - \bar{x}}{s} = \dfrac{98.60 - 98.20}{0.62} = \dfrac{0.40}{0.62} = 0.6451 \approx 0.65$ Check: OK
c) 98.20°F
$z = \dfrac{x - \bar{x}}{s} = \dfrac{98.20 - 98.20}{0.62} = \dfrac{0.00}{0.62} = 0.00$ Check: OK
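The conversion is the same for every value, so it can be wrapped in a small function. A minimal sketch; the function name `z_score` is chosen for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

MEAN_TEMP = 98.20  # mean body temperature from the exercise
SD_TEMP = 0.62     # standard deviation from the exercise

for temp in (97.50, 98.60, 98.20):
    print(f"{temp:.2f}: z = {z_score(temp, MEAN_TEMP, SD_TEMP):.2f}")
```

A negative result means the temperature is below the mean, and a result of zero means it equals the mean.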
13) Which is relatively better: a score of 85 on a psychology test or a score of 45 on an
economics test? Scores on the psychology test have a mean of 90 and a standard deviation of 10.
Scores on the economics test have a mean of 55 and standard deviation of 5.
In order to compare the scores, they need to be converted to standard scores (z-scores).
Psychology test: $z = \dfrac{x - \bar{x}}{s} = \dfrac{85 - 90}{10} = \dfrac{-5}{10} = -0.50$
Economics test: $z = \dfrac{x - \bar{x}}{s} = \dfrac{45 - 55}{5} = \dfrac{-10}{5} = -2.00$
Since the score on the psychology test is only half of a standard deviation below the
mean, it is a better score than the score for the economics test, which is two standard
deviations below the mean. Check: OK I had to correct a minor calculation error on the
z-score for the economics test.
Section 3-5 – Basic Skills and Concepts
1) Refer to the STATDISK-generated boxplot. What do the values of 2, 5, 10, 12, and 20 tell us
about the data set from which the boxplot was constructed?
2 is the minimum value in the data set.
5 is the first quartile (Q1) of the data set.
10 is the median or second quartile (Q2) of the data set.
12 is the third quartile (Q3) of the data set.
20 is the maximum value in the data set. Check: OK
3) The two boxplots shown below correspond to the service times from two different companies
that repair air conditioning units. They are shown on the same scale. The top boxplot
corresponds to the Sigma Air Conditioning Company, and the bottom boxplot corresponds to the
Newport Repair Company. Which company has less variation in repair times? Which company
should have more predictable costs? Why?
The Sigma Company has less variation in repair time. The Sigma company should
have more predictable costs because they can budget for and charge for labor in a more
consistent manner. Due to the fact that there is less variation, the company can more
reliably predict how long repairs will take. Check: OK
5) In 1908, Gosset published an article. He included the data listed below for two different types
of corn seed that were used on adjacent plots of land. The listed values are the yields of head
corn in pounds per acre. Using the yields from regular seed, find the 5-number summary and
construct a boxplot.
The data can be organized as follows:
1316 1444 1511 1612 1903 1910 1935 1961 2060 2108 2496
Minimum: 1316
Maximum: 2496
First Quartile: 1511
Median: 1910
Third Quartile: 2060 Check: OK
The boxplot for this data would look like this:
[Boxplot: Boxplot for Regular Corn Seed Yields. Marked values: 1316 (minimum), 1511 (Q1), 1910 (median), 2060 (Q3), 2496 (maximum). Axis from 1300 to 2500.]
Unit 3 – Review Exercises
1) In a study of the relationship between heights and trunk diameters of trees, botany students
collected sample data. Listed below are the tree circumferences. Using the circumferences, find
the mean, median, mode, midrange, range, standard deviation, variance, first quartile, third
quartile, and tenth percentile.
Mean: $\bar{x} = \dfrac{1.8 + 1.9 + 1.8 + \cdots + 4.1 + 3.7 + 3.9}{20} = \dfrac{90.7}{20} = 4.535 \approx 4.54$ Check: OK
Median: 1.8 1.8 1.9 2.4 3.1 3.4 3.7 3.7 3.8 3.9 4.0 4.1 4.9 5.1 5.1 5.2 5.3 5.5 8.3 13.7
$\dfrac{3.9 + 4.0}{2} = \dfrac{7.9}{2} = 3.95$ Check: OK
Mode: Multimodal: 1.8, 3.7, and 5.1 Check: OK
Midrange: $\dfrac{1.8 + 13.7}{2} = \dfrac{15.5}{2} = 7.75$ Check: OK
Range: $13.7 - 1.8 = 11.90$ Check: OK
SD:
$\sum x^2 = 1.8^2 + 1.8^2 + 1.9^2 + \cdots + 8.3^2 + 13.7^2 = 544.85$
$\left(\sum x\right)^2 = (1.8 + 1.8 + 1.9 + \cdots + 8.3 + 13.7)^2 = 90.7^2 = 8226.49$
$s = \sqrt{\dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{\dfrac{20(544.85) - 8226.49}{20(20-1)}} = \sqrt{\dfrac{10897 - 8226.49}{20(19)}} = \sqrt{\dfrac{2670.51}{380}} = \sqrt{7.02765} = 2.650973 \approx 2.65$ Check: OK
Variance: $s^2 = 2.650973^2 = 7.02765 \approx 7.03$ Check: OK
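Q1: $L = \dfrac{k}{100} \cdot n = \dfrac{25}{100} \cdot 20 = 0.25 \cdot 20 = 5 \Rightarrow Q_1 = P_{25} = \dfrac{3.1 + 3.4}{2} = \dfrac{6.5}{2} = 3.25$ Check: OK
Q3: $L = \dfrac{k}{100} \cdot n = \dfrac{75}{100} \cdot 20 = 0.75 \cdot 20 = 15 \Rightarrow Q_3 = P_{75} = \dfrac{5.1 + 5.2}{2} = \dfrac{10.3}{2} = 5.15$ Check: OK
P10: $L = \dfrac{k}{100} \cdot n = \dfrac{10}{100} \cdot 20 = 0.10 \cdot 20 = 2 \Rightarrow P_{10} = \dfrac{1.8 + 1.9}{2} = \dfrac{3.7}{2} = 1.85$ Check: OK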
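The locator calculation above can be sketched as a function. It assumes the usual textbook rule that goes with L = (k/100)·n: when L is a whole number, average the Lth and (L+1)th sorted values; otherwise round L up and take that value. The function name `percentile` and the restriction to whole-number k between 1 and 99 are choices made for this example.

```python
def percentile(sorted_values, k):
    """kth percentile of already-sorted data using the locator L = (k / 100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be a whole number from 1 to 99")
    n = len(sorted_values)
    # Integer arithmetic avoids floating-point trouble when testing for a whole L.
    whole, remainder = divmod(k * n, 100)
    if remainder == 0:
        # L is whole: average the Lth and (L+1)th values (1-based positions).
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # L is not whole: round up to the next position.
    return sorted_values[whole]

trees = [1.8, 1.8, 1.9, 2.4, 3.1, 3.4, 3.7, 3.7, 3.8, 3.9,
         4.0, 4.1, 4.9, 5.1, 5.1, 5.2, 5.3, 5.5, 8.3, 13.7]
corn = [1316, 1444, 1511, 1612, 1903, 1910, 1935, 1961, 2060, 2108, 2496]

print([round(percentile(trees, k), 2) for k in (10, 25, 75)])
print([percentile(corn, k) for k in (25, 50, 75)])
```

The second line applies the same rule to the regular corn seed yields from Section 3-5, where n = 11 and L is never a whole number, so each quartile is a single data value.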
3) Using the same data set as question 1, construct a frequency distribution. Use seven
classes with 1.0 as the lower limit, and use a class width of 2.0.
Frequency Distribution: Tree Circumferences
Circumference    Frequency
1.0-2.9          4
3.0-4.9          9
5.0-6.9          5
7.0-8.9          1
9.0-10.9         0
11.0-12.9        0
13.0-14.9        1
Check: OK
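The tally can be reproduced by assigning each circumference to a class index. A minimal sketch, assuming the data are recorded to one decimal place so that each class runs from its lower limit up to 0.1 below the next lower limit:

```python
circumferences = [1.8, 1.8, 1.9, 2.4, 3.1, 3.4, 3.7, 3.7, 3.8, 3.9,
                  4.0, 4.1, 4.9, 5.1, 5.1, 5.2, 5.3, 5.5, 8.3, 13.7]

LOWER = 1.0     # lower limit of the first class
WIDTH = 2.0     # class width
N_CLASSES = 7   # number of classes requested in the exercise

counts = [0] * N_CLASSES
for x in circumferences:
    index = int((x - LOWER) // WIDTH)  # which class the value falls into
    counts[index] += 1

for i, count in enumerate(counts):
    low = LOWER + i * WIDTH
    print(f"{low:.1f}-{low + WIDTH - 0.1:.1f}  {count}")
```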
7) Using the same data set as question 1, construct a boxplot and identify the 5-number summary
values.
Minimum: 1.8
Maximum: 13.7
First Quartile: 3.25
Median: 3.95
Third Quartile: 5.15 Check: OK
The boxplot for this data would look like this: Check: OK
[Boxplot: Boxplot for Tree Circumferences. Marked values: 1.8 (minimum), 3.25 (Q1), 3.95 (median), 5.15 (Q3), 13.7 (maximum). Axis from 1.0 to 15.0.]