Math 140
In-Class Work
College of the Canyons
Chapters 4 and 5: Quantitative Data II
Data Set:
- COC Math 140 Survey Results
- http://www.canyons.edu/faculty/morrowa/140/datasets/
- Focus on the quantitative variables only
- Note: Questions are online at http://classes.thegradekeeper.com/math140.php (if you need a reminder).
1. Changing Bin Size. Adjust Minitab’s number of bins for the variable sleep to get a good histogram. How many
bins did you use? Include your new histogram. Describe why this new histogram is better.
BAD HISTOGRAM:
[Figure: Histogram of sleep. x-axis: sleep; y-axis: Frequency]
BETTER HISTOGRAM (12 bins):
[Figure: Histogram of sleep, 12 bins. x-axis: sleep (0.0 to 12.0); y-axis: Frequency]
The first histogram is bad because it hides the true pattern of the data. Some people entered half hours,
which causes the low bars (half hours) between the high bars (whole hours). The new histogram smooths out
the gaps, showing us the trend in the data.
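The effect described above can be reproduced with a small sketch. The sleep values here are made up for illustration (they are not the survey data), and `bin_counts` is a hypothetical helper, not a Minitab function.

```python
from collections import Counter

def bin_counts(values, width, start=0.0):
    """Count how many values fall in each bin [start + k*width, start + (k+1)*width)."""
    counts = Counter(int((v - start) // width) for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

# Made-up sleep times (hours): mostly whole hours, a few half hours.
sleep = ([5] * 4 + [5.5] * 1 + [6] * 9 + [6.5] * 2 + [7] * 12
         + [7.5] * 2 + [8] * 8 + [8.5] * 1 + [9] * 3)

# Half-hour bins: tall bars (whole hours) alternate with short bars (half hours).
narrow = bin_counts(sleep, width=0.5)
# One-hour bins: each half hour is merged with the whole hour below it.
wide = bin_counts(sleep, width=1.0)

print(narrow[10:])  # bins starting at 5.0 hours: [4, 1, 9, 2, 12, 2, 8, 1, 3]
print(wide[5:])     # bins starting at 5 hours:   [5, 11, 14, 9, 3]
```

With the wider bins the counts rise to a single peak and fall again, which is the pattern the alternating bars were hiding.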
2. Side-By-Side Histograms/Boxplots. We wish to compare gpa of males vs. females.
a) Create side-by-side histograms for GPA of males and females.
[Figure: Histogram of gpa. Panel variable: gender (female, male); x-axis: gpa (0.0 to 3.6); y-axis: Frequency]
b) Create side-by-side boxplots for GPA of males and females.
[Figure: Boxplot of gpa by gender (female, male). y-axis: gpa]
c) Does there appear to be a difference in male/female gpa?
First, they are similar in shape. They are both skewed left. They are both unimodal (if we ignore
GPAs of 0.00). They both have a gap separating the 0 GPAs. They both show outliers at 0.
However, they have different centers and spreads. Females have a slightly higher GPA in that they
have a higher mean and median. Female GPAs are more spread out, measured both by the IQR and the SD.
Because the data are skewed, the median and IQR are best to measure center and spread.
Variable  gender  Mean    StDev   Median  IQR
gpa       female  3.1015  0.8679  3.2000  0.7750
          male    3.0397  0.7183  3.0000  0.7000
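A minimal sketch of how a by-group summary like the table above could be computed outside Minitab. The (gender, gpa) pairs are made up for illustration, `summarize` is a hypothetical helper, and the quartiles use Python's default "exclusive" method, so other software may report slightly different IQR values.

```python
import statistics

def summarize(values):
    """Return mean, sample standard deviation, median, and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Made-up (gender, gpa) pairs for illustration only; not the survey data.
rows = [("female", 3.2), ("male", 3.0), ("female", 3.6), ("male", 2.8),
        ("female", 2.9), ("male", 3.4), ("female", 0.0), ("male", 3.1),
        ("female", 3.5), ("male", 2.5)]

# Split the gpa values by gender, then summarize each group.
by_gender = {}
for gender, gpa in rows:
    by_gender.setdefault(gender, []).append(gpa)

summary = {g: summarize(v) for g, v in by_gender.items()}
for g, stats in summary.items():
    print(g, {k: round(x, 3) for k, x in stats.items()})
```

In the made-up female group, the single GPA of 0.0 drags the mean well below the median, which is the same reason the answer prefers the median and IQR for skewed data.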
3. Compare sleep habits for males vs. females. (Include appropriate graphs, statistics, and descriptions.)
[Figure: Boxplot of sleep by gender (female, male). y-axis: sleep (0 to 12)]
[Figure: Histogram of sleep. Panel variable: gender (female, male); x-axis: sleep (0.0 to 12.0); y-axis: Frequency]
Note: I adjusted bin size on sleep.
Both distributions appear unimodal, roughly symmetric (very slight skew left for females).
Females have outliers on the low end, but males have outliers on both the high end and the low
end.
The centers (means and medians) are roughly the same.
Spread (based on IQR) is approximately the same for both.
Variable  gender  Mean   StDev  Median  IQR
sleep     female  6.748  1.615  7.000   2.000
          male    6.838  1.607  7.000   2.000
It appears that males and females get about the same amount of sleep.
4. The basics. For weight of math 140 students, find the following:
a) Examine the weights of math 140 students. Create a boxplot.
[Figure: Boxplot of weight. y-axis: weight (0 to 300)]
b) Are there any weight values that warrant removing? Why? (Remove rows that contain
weight values that you think are mistakes).
The weights on the high end are high, but reasonable. On the low end, though, I will
remove any weights less than 50 pounds. I removed the following rows: Rows 1, 43, and 252.
Note: The rest of my answers have those rows removed. Also note: As you remove rows, the data shifts. If
I delete row 1, then what used to be row 43 is now 42.
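The row-shifting issue in the note can be sketched as follows. The weights are made up (0 and 12 stand in for data-entry mistakes), and the 50-pound cutoff is the one chosen in the answer above. Filtering by a condition is shown as one way to avoid tracking shifted row numbers; it is not necessarily how the rows were removed in Minitab.

```python
# Made-up weights (pounds); the values 0 and 12 stand in for data-entry mistakes.
weights = [0, 135, 150, 12, 180, 210]

# Deleting by position: after removing index 0, every later row moves up by one,
# so the value that was at index 3 is now at index 2.
by_position = list(weights)
del by_position[0]
assert by_position[2] == 12

# Filtering by condition avoids tracking the shifted positions.
MIN_PLAUSIBLE = 50  # cutoff from the answer above: drop weights less than 50 pounds
cleaned = [w for w in weights if w >= MIN_PLAUSIBLE]
print(cleaned)  # [135, 150, 180, 210]
```

When deleting by row number is unavoidable, removing the highest-numbered row first keeps the remaining row numbers unchanged.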
c) Create a histogram. Describe the histogram.
[Figure: Histogram of weight. x-axis: weight (90 to 300); y-axis: Frequency]
Skewed right. Almost looks bimodal (one mode at 120 and one at 140, probably females/males; could
find out more by graphing them separately next time).
Gap on the high end. Potential outliers above 270.
d) For the remaining data, find the:
i. Mean = 154.97
ii. Median = 150
iii. Standard deviation = 36.04
e) How do the mean and median compare? Why?
They are pretty close. The mean is slightly higher because the data are skewed right. Skewed data pull
the mean in the direction of the tail.
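A small sketch of the pull toward the tail, using made-up weights rather than the survey data: adding two large values to a symmetric list moves the mean much more than the median.

```python
import statistics

# Made-up, symmetric weights (pounds).
symmetric = [120, 130, 140, 150, 160, 170, 180]
# The same data with two large values added to create a right tail.
right_skewed = symmetric + [260, 290]

mean_sym = statistics.mean(symmetric)          # 150
median_sym = statistics.median(symmetric)      # 150
mean_skew = statistics.mean(right_skewed)      # about 177.8
median_skew = statistics.median(right_skewed)  # 160

print(mean_sym, median_sym)
print(round(mean_skew, 1), median_skew)
```

The two added values raise the mean by about 28 pounds but the median by only 10.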
5. Geometrically speaking, the mean acts like what for a histogram?
The center of balance/gravity.
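One way to see the balance-point idea, sketched with made-up numbers: the distances below the mean exactly cancel the distances above it, which is not generally true for the median.

```python
import statistics

values = [2, 3, 3, 4, 8]  # made-up data

mean = statistics.mean(values)  # 4
sum_about_mean = sum(v - mean for v in values)
print(sum_about_mean)  # 0: deviations left and right of the mean cancel

median = statistics.median(values)  # 3
sum_about_median = sum(v - median for v in values)
print(sum_about_median)  # 5: the data do not balance at the median
```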
6. When is the mean the best measure of center?
Symmetric data.
7. Complete the definition: The standard deviation is approximately the
average distance a typical value is from the mean.
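The word "approximately" matters here. The sample standard deviation is the square root of the sum of squared deviations divided by n - 1, which is close to, but not the same as, the plain average distance from the mean. A sketch with made-up numbers:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(values)
mean = sum(values) / n  # 5.0

# Sample standard deviation computed by hand.
squared_devs = [(v - mean) ** 2 for v in values]
sample_sd = math.sqrt(sum(squared_devs) / (n - 1))

# Plain average distance from the mean, for comparison.
mean_abs_dev = sum(abs(v - mean) for v in values) / n

print(round(sample_sd, 3))                  # hand calculation
print(round(statistics.stdev(values), 3))   # library result, should agree
print(mean_abs_dev)                         # 1.5
```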
8. When is the IQR best to measure spread? When is the range best? When is the standard deviation best?
IQR: best for skewed data
Range: Never
SD: best for symmetric data
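A sketch of why these choices are reasonable, using made-up sleep hours: replacing one value with an extreme one changes the range and the SD a great deal but leaves the IQR alone. `spreads` is a hypothetical helper, and the quartiles use Python's default "exclusive" method.

```python
import statistics

def spreads(values):
    """Return the range, sample standard deviation, and IQR of a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

base = [4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10]  # made-up sleep hours
with_outlier = base[:-1] + [20]            # largest value replaced by an extreme one

before = spreads(base)
after = spreads(with_outlier)
print(before)
print(after)
```

The range depends only on the two most extreme values, so a single unusual entry determines it completely.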