In this chapter, we will look at some charts
and graphs used to summarize quantitative
data. We will also look at numerical
analysis of such data.
Stem and leaf display: a way of listing all data values in a condensed format:
• while not required, it helps to have the data sorted
• choose the digit to be the stem (10’s place, 100’s place, …)
• put the stems in increasing (or decreasing) order in a column
• next to each stem, put the leaves in increasing order, left to right
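The construction steps can be sketched in Python. This is only an illustration, not part of the original slides: the function name `stem_and_leaf` is hypothetical, it assumes non-negative whole-number data with the 10's digit as the stem, and because the “ACSC” wingspan values are not reproduced in this transcript it borrows the female heights listed later in the chapter.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows, using the 10's digit as the stem."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in increasing order
        stem, leaf = divmod(v, 10)    # 63 -> stem 6, leaf 3
        rows[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest, including stems with no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Heights of females in "ACSC", taken from later in this chapter.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

for line in stem_and_leaf(heights):
    print(line)
```

Stems with no leaves are still printed so that gaps in the data remain visible in the display.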
Construct a stem and leaf display for wingspans in
“ACSC” using the 10’s digit as the stem.
Sometimes, if the data are clumped together in a small
range of values, we use repeated stems – that is, each
stem is listed twice
• next to the first copy of the stem, all leaves from the lower
half of the possible leaf values (0 to 4) are listed
• next to the second copy of the stem, all leaves from the upper
half of the possible leaf values (5 to 9) are listed
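A minimal sketch of repeated stems, under the same assumptions as before: whole-number data, the 10's digit as the stem, leaves 0 to 4 on the first copy of a stem and 5 to 9 on the second. The function name `split_stem_and_leaf` is hypothetical, and the data are the female heights listed later in the chapter, since the wingspan values are not reproduced in this transcript.

```python
def split_stem_and_leaf(values):
    """Stem-and-leaf rows with each 10's-digit stem listed twice."""
    ordered = sorted(values)
    first_stem, last_stem = ordered[0] // 10, ordered[-1] // 10
    lines = []
    for stem in range(first_stem, last_stem + 1):
        leaves = [v % 10 for v in ordered if v // 10 == stem]
        low = " ".join(str(d) for d in leaves if d <= 4)   # lower half of leaf values
        high = " ".join(str(d) for d in leaves if d >= 5)  # upper half of leaf values
        lines.append(f"{stem} | {low}".rstrip())
        lines.append(f"{stem} | {high}".rstrip())
    return lines


# Heights of females in "ACSC", taken from later in this chapter.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

for line in split_stem_and_leaf(heights):
    print(line)
```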
Construct a stem and leaf display for wingspans in
“ACSC” using the 10’s digit as the stem and using
repeated stems.
Histogram: the quantitative data equivalent of a bar chart:
• the horizontal axis has the possible values of the variable
• the width of each rectangle is called the bin width
or class width
• the vertical axis should be appropriately scaled for
representing either frequencies or relative frequencies
• the height of each rectangle corresponds to the frequency or
relative frequency of each interval
• the lower value of each of the class intervals is included
in the count but the upper value is not included
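The counting rule for class intervals (lower value included, upper value excluded) can be sketched as follows. The function `histogram_counts` and its parameters are hypothetical names chosen for this illustration, and the data are the female heights listed later in the chapter rather than the wingspans, which are not reproduced in this transcript.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width intervals [lower, upper)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)   # which interval v falls in
        if 0 <= index < n_bins:             # ignore values outside the axis range
            counts[index] += 1
    return counts


# Heights of females in "ACSC", taken from later in this chapter.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

# Bins 5 wide starting at 60: [60, 65), [65, 70), [70, 75)
counts = histogram_counts(heights, start=60, width=5, n_bins=3)
for i, count in enumerate(counts):
    lower = 60 + 5 * i
    relative = count / len(heights)
    print(f"[{lower}, {lower + 5}): {'#' * count} ({count}, {relative:.2f})")
```

A height of exactly 65 is counted in the interval starting at 65, not the one ending at 65, which is the convention described in the last bullet.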
Construct a histogram for wingspans in “ACSC” with
bins 10 wide.
Construct a histogram for wingspans in “ACSC” with
bins 5 wide.
The humps in a histogram are called modes.
If the histogram has one distinct hump, it is called unimodal.
If the histogram has two distinct humps, it is called bimodal.
If the histogram has three or more humps, it is called multimodal.
If the histogram has no clear modes (all rectangles are about
the same height), then it is called uniform.
If there exists a vertical line that could be drawn
through the “middle” of the histogram such that the
right and left sides are close to mirror images of each other,
the distribution is called symmetric.
If one side of the histogram is stretched out farther than
the other, then the histogram is said to be skewed in the
direction of the longer tail.
This histogram is skewed to the left.
Any observation that stands away from the body of the
distribution could be an outlier.
Center:
If the data is non-symmetric, its center is measured as
the median of the set.
• The median of a data set is the middle value of the ordered
set
• If n is odd, the median is the value that cuts the list in
half
• If n is even, the median is the average of the two middle
values
Find the median of the given data sets. The first is the
heights of females in “ACSC” while the second is the
heights of females with brown hair in “ACSC”.
(a) 61 62 62 63 63 64 65 65 66 66 69 70 70 70 72
(b) 62 63 64 66 69 70
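The odd and even cases of the median rule can be checked against these two data sets with a short sketch. The function below is a hand-written illustration of the rule described above, not a calculator or library routine.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # n odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # n even: average the two middle values


heights_a = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]  # n = 15
heights_b = [62, 63, 64, 66, 69, 70]                                      # n = 6

print(median(heights_a))  # 8th of 15 ordered values
print(median(heights_b))  # average of the 3rd and 4th ordered values
```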
Spread:
The range of a data set is the difference between the
maximum and minimum values.
The interquartile range (IQR) is the range of the middle
50% of the data set
• The lower quartile (LQ or Q1) is the median of the lower
half of the data
• The upper quartile (UQ or Q3) is the median of the upper
half of the data
• IQR = UQ – LQ
Find the range and interquartile range of the heights of
females in “ACSC”.
61 62 62 63 63 64 65 65 66 66 69 70 70 70 72
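A sketch of the range and IQR calculation for this data set. One assumption is worth flagging: when n is odd, the code leaves the median itself out of both halves before finding the quartiles. The slides do not say which convention to use, and other conventions can give slightly different quartiles. The function names are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (LQ, UQ) as the medians of the lower and upper halves (needs n >= 2)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]                   # assumption: middle value excluded when n is odd
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)


heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

data_range = max(heights) - min(heights)
lq, uq = quartiles(heights)
iqr = uq - lq
print("range =", data_range)
print("LQ =", lq, "UQ =", uq, "IQR =", iqr)
```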
5-Number Summary
The five values: min, Q1, median, Q3, and max are
called the 5-number summary of a data set.
These can be found by hand as described in the previous
slides, or using technology.
5-Number Summary via TI 83/84
• press STAT and select Edit, and then enter the data in L1
• press STAT and arrow over to CALC to select 1-Var Stats
• press ENTER to perform the command, then scroll down
to see results
Once we have the 5-number summary of a quantitative
data set, we can represent the data set in a boxplot.
[Figure: boxplot drawn above a numerically scaled axis, with its parts labeled A through F]
A = lower quartile
B = median
C = upper quartile
D = Lower Fence = smallest data value that is ≥ LQ – 1.5(IQR)
E = Upper Fence = largest data value that is ≤ UQ + 1.5(IQR)
F = Outliers = values greater than the upper fence or less than the lower fence
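The labeled parts of the boxplot can be computed with a short sketch. It reads the fences as the most extreme data values that still lie within 1.5(IQR) of the quartiles, which is an interpretation of the legend, and it excludes the median from both halves when n is odd, which is an assumption. The function name `boxplot_parts` is hypothetical. The data are the female heights from earlier in the chapter, and the extra value 85 in the second call is made up purely to show an outlier being flagged.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_parts(values):
    """Return the boxplot parts A to F for a data set with at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lq = median(ordered[:half])                    # A
    uq = median(ordered[len(ordered) - half:])     # C
    iqr = uq - lq
    low_limit = lq - 1.5 * iqr
    high_limit = uq + 1.5 * iqr
    inside = [v for v in ordered if low_limit <= v <= high_limit]
    return {
        "A": lq,
        "B": median(ordered),
        "C": uq,
        "D": inside[0],     # lower fence: smallest value not below LQ - 1.5(IQR)
        "E": inside[-1],    # upper fence: largest value not above UQ + 1.5(IQR)
        "F": [v for v in ordered if v < low_limit or v > high_limit],  # outliers
    }


heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

print(boxplot_parts(heights))          # no value falls outside the fences
print(boxplot_parts(heights + [85]))   # 85 is a hypothetical value added for illustration
```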
Construct a boxplot for shoe sizes in “ACSC”.
Center:
If the data is symmetric, its center is measured as the
mean of the set.
• The sample mean of a data set is the average of the values:
$\bar{x} = \frac{\sum x}{n}$
• the population mean is denoted $\mu$
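As a quick illustration of the sample mean formula (sum of the values divided by n), here is a sketch using the female heights from earlier in the chapter. The function name `sample_mean` is hypothetical; Python's built-in `statistics.mean` is used only as a cross-check.

```python
import math
import statistics


def sample_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

x_bar = sample_mean(heights)
print("sum =", sum(heights), "n =", len(heights), "mean =", round(x_bar, 2))

# Cross-check against the standard library.
print(math.isclose(x_bar, statistics.mean(heights)))
```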
Spread:
• the sample variance is calculated as $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• the population variance is calculated as $\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
For our purposes, this measurement will be rarely used.
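The two variance formulas differ only in the divisor: n − 1 for a sample and n for a population. A minimal sketch follows, with hypothetical function names and the female heights from earlier in the chapter as data. Treating the same values as a whole population in the second call is done only to show the effect of the divisor.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean, divided by n."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

s2 = sample_variance(heights)
sigma2 = population_variance(heights)
print("sample variance:", round(s2, 3))
print("population variance:", round(sigma2, 3))

# Cross-check against the standard library's implementations.
print(math.isclose(s2, statistics.variance(heights)))
print(math.isclose(sigma2, statistics.pvariance(heights)))
```

Dividing by n − 1 always gives a slightly larger result than dividing by n for the same data.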
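Spread:
Standard deviation is the positive square root of
variance, and it is a measurement of how, on average,
observations vary from the mean
• the sample standard deviation is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• the population standard deviation is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$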
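The square-root relationship between variance and standard deviation can be sketched with Python's standard `statistics` module, which provides `stdev` and `variance` for samples and `pstdev` and `pvariance` for populations. The data are the female heights from earlier in the chapter; treating them as a population in the second pair of calls is for illustration only.

```python
import math
import statistics

heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]

s = statistics.stdev(heights)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(heights)   # population standard deviation (divides by n)

print("s =", round(s, 3))
print("sigma =", round(sigma, 3))

# Each standard deviation is the positive square root of the matching variance.
print(math.isclose(s, math.sqrt(statistics.variance(heights))))
print(math.isclose(sigma, math.sqrt(statistics.pvariance(heights))))
```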
Both mean and standard deviation can be found by hand
using these formulas.
It is much more common to use technology (the
calculator for our purposes).
The 1-Var Stats command introduced earlier for the 5-number
summary of a data set also reports the mean and
standard deviation.
Find the mean, variance, standard deviation, and the
5-number summary of wingspans from the data in
“ACSC”.