Statistics and Probability
Measuring data, analyzing data, the
shape of data, visualizing data,
center, variability, & distribution
Vocabulary: statistics
•Statistics is the science
of collecting, organizing,
representing, and
interpreting data.
Vocabulary: graph
[Chart: three data series (Series 1, Series 2, Series 3); vertical axis from 0 to 6]
A pictorial
device used to
show a
numerical
relationship
Vocabulary: spread
• A measure of how much a
collection of data is spread out.
• Commonly used types include
range and quartiles.
• Also known as measures of
variation or dispersion.
Vocabulary: center
• Knowing where the
middle (center) of a set
of data lies tells you what
a typical value looks like.
Vocabulary: center
• An average; a single value that is
used to represent a collection of
data. Three commonly used types
of averages are mode, median, and
mean.
• Also called measures of central
tendency or measures of average.
Vocabulary: outlier or
striking deviation
A number in a
data set that is
much larger or
smaller than
most of the
other numbers
in the set.
Vocabulary: variability
• A measure of how much a
collection of data is spread out.
• Commonly used types include
range and quartiles.
• Also known as spread or
dispersion.
Vocabulary: distribution
The pattern of the data
when a large sample is
used will be more likely
to look like chart A. This
is considered a “normal
distribution.” It is
sometimes called a Bell
Curve.
Vocabulary: distribution
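Skew is named for the side
with the longer tail. A peak
above the mean, such as in
chart C, leaves a longer tail
on the left and is “skewed to
the left.” A peak below the
mean, such as in chart B,
leaves a longer tail on the
right and is “skewed to the
right.”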
Vocabulary: Measure of Center
• An average; a single value that
is used to represent a
collection of data. Three
commonly used types of
averages are mode, median,
and mean. (Also called measures of
central tendency or measures of average.)
Vocabulary: data
• Information, especially numerical information.
Usually organized for analysis.
Vocabulary: dot or line plot
A dot plot is also called a line plot.
It is a diagram showing frequency
of data on a number line. It is NOT
a line graph.
Vocabulary: tape diagram
• A drawing that looks like a segment of tape,
used to illustrate number relationships. Also
known as a strip diagram, bar model, fraction
strip, or length model
Vocabulary: histogram
• A bar graph in which the labels
for the bars are numerical
intervals.
• The data is reported in
clusters, or ranges.
Vocabulary: box plots
A box plot is a diagram that shows
the five number summary of a
distribution. The five number summary
includes the lowest value, lower quartile,
median, upper quartile, and highest
value.
Vocabulary: box plot or
box and whisker plot
Vocabulary: interquartile range
A box and whisker plot breaks the data into
four parts. Each part is a quartile. The
interquartile range is the difference
between the upper quartile and the lower
quartile. It is the length of the box on the plot.
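As a rough illustration, the five number summary and the interquartile range can be computed in Python. The slides give no data for the box plot, so this sketch borrows the nine-value list from the median slide. The helper name `five_number_summary` is made up for this example, and the quartiles follow one common classroom convention (the median of each half, leaving out the middle value when the count is odd). Other conventions can give slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: quartiles are the medians of the lower and upper halves,
    skipping the middle value when the number of values is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

# Nine-value data set taken from the median slide.
data_set = [13, 16, 17, 20, 22, 24, 24, 28, 32]

low, q1, med, q3, high = five_number_summary(data_set)
iqr = q3 - q1  # interquartile range: upper quartile minus lower quartile

print("five number summary:", low, q1, med, q3, high)
print("interquartile range:", iqr)
```

The box of the plot would run from `q1` to `q3`, and the whiskers would reach out to `low` and `high`.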
Vocabulary: lower extreme
The smallest or least
number in a data
set, usually farther
away from the
interquartile range
than the other data in the set.
Also known as the
minimum.
Vocabulary: Minimum
Same as the lower extreme in the previous slide!
Vocabulary: Maximum
Opposite of the lower extreme; it is the upper
extreme, the largest number in the data set, on
the far right of the box & whisker plot.
[Figure: box & whisker plot with the minimum and maximum labeled]
Vocabulary: attribute
A characteristic, such as size, shape, or color.
Examples: large, blue hexagon; small, red triangle.
Vocabulary: Measure of Center
• We use several different ways to
measure the center. Some are:
–Mode (the piece of data most often
repeated)
–Median (the middle number when
data are in numerical order)
–Mean (the average of all the
numerical data)
Vocabulary: mean
Definition: the sum of a set of numbers divided
by the number of elements in the set. A type of
average.
• Step 1: add
• 3 + 5 + 5 + 4 + 5 + 6 + 2 + 5 = 35
• Eight data points
• Step 2: divide
• 35 ÷ 8 = 4.375
• 4.375 is the mean.
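The two steps on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the variable names `data`, `total`, and `mean` are chosen here for illustration.

```python
# The eight data points listed on the slide.
data = [3, 5, 5, 4, 5, 6, 2, 5]

# Step 1: add all the values.
total = sum(data)

# Step 2: divide the sum by the number of values.
mean = total / len(data)

print("sum:", total)
print("number of data points:", len(data))
print("mean:", mean)
```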
Vocabulary: median
13, 16, 17, 20, 22, 24, 24, 28, 32
When the numbers are arranged
from least to greatest, it is the
middle number.
Vocabulary: median
13, 16, 17, 20, 22, 24, 28, 32
If there is an even number of
data points, it is the average of the
two middle numbers.
(20 + 22 = 42; 42 ÷ 2 = 21)
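Both cases, an odd and an even number of data points, can be tried out with Python's standard `statistics.median` function. The sketch below uses the two data sets from the slides; the variable names are chosen here for illustration.

```python
from statistics import median

odd_set = [13, 16, 17, 20, 22, 24, 24, 28, 32]  # nine values
even_set = [13, 16, 17, 20, 22, 24, 28, 32]     # eight values

# Odd count: the median is the single middle number.
median_odd = median(odd_set)

# Even count: the median is the average of the two middle numbers.
median_even = median(even_set)

print(median_odd)   # 22
print(median_even)  # 21.0
```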
Vocabulary: mode
• Mode (the piece of data most
often repeated)
• 5, 7, 8, 9, 9, 11
• The mode in the data set above is 9.
• It is possible to have more than one
mode.
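A short Python sketch of finding the mode, using the standard `statistics.multimode` function (Python 3.8 or later), which returns every value tied for most frequent. The first list is the data set from the slide. The second list is made up for illustration (the slide's data with one extra 5) to show a set with more than one mode.

```python
from statistics import multimode

data = [5, 7, 8, 9, 9, 11]           # data set from the slide
modes = multimode(data)

# Hypothetical variation, not from the slides: one extra 5 creates a tie.
two_mode_data = [5, 5, 7, 8, 9, 9, 11]
two_modes = multimode(two_mode_data)

print(modes)      # [9]
print(two_modes)  # [5, 9]
```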
Vocabulary: Measure of variation
• Range, spread, and mean
absolute deviation are measures
that indicate how much the data
in one data set differ among
themselves.
• Each is a measure of variation.
Vocabulary: range
• The difference between the greatest
number and the least number in a
set of numbers.
• Data set: 3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3
• Largest number is 7 and smallest is 1
• So, the range is 6 because 7-1=6
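The same calculation can be sketched in Python with the built-in `max` and `min` functions. The variable names are chosen here for illustration.

```python
# Data set from the slide.
data = [3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]

largest = max(data)
smallest = min(data)

# Range: greatest number minus least number.
data_range = largest - smallest

print(largest, smallest, data_range)  # 7 1 6
```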
Vocabulary: mean absolute deviation
• Large cube = 45 kg
• Cylinder = 30 kg
• Small cube = 24 kg
• Mean = (45 + 30 + 24) ÷ 3 = 33 kg
• Deviations from the mean: 45 - 33 = 12, 30 - 33 = -3, 24 - 33 = -9
• Absolute deviations: 12 + 3 + 9 = 24; 24 ÷ 3 = 8
• 8 kg is the mean absolute
deviation
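The worked example on this slide can be followed step by step in Python. This is a minimal sketch using the three weights from the slide; the names `weights_kg`, `abs_deviations`, and `mad` are chosen here for illustration.

```python
# Weights from the slide, in kilograms.
weights_kg = {"large cube": 45, "cylinder": 30, "small cube": 24}
values = list(weights_kg.values())

# Step 1: find the mean.
mean = sum(values) / len(values)

# Step 2: find how far each value is from the mean, ignoring the sign.
abs_deviations = [abs(v - mean) for v in values]

# Step 3: average those absolute deviations.
mad = sum(abs_deviations) / len(abs_deviations)

print(mean)            # 33.0
print(abs_deviations)  # [12.0, 3.0, 9.0]
print(mad)             # 8.0
```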
Vocabulary: mean absolute deviation
• In statistics, the absolute
deviation of an element of
a data set is the absolute
difference between that
element and a given point,
usually the mean.
• The mean absolute deviation
is the average of these
absolute deviations.
Vocabulary: statistical variability
• The variability or spread of a
variable or a probability
distribution. Common measures
of statistical dispersion are
the variance, standard deviation,
and interquartile range.
Vocabulary: data distribution
People in My Neighborhood
Age      Tally               Frequency
0-19     11111 11111 1111    14
20-39    11111 111           8
40-59    1111                4
60-89    11111 11            7
90+      1                   1
A table that shows how many there are of each type of data.
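The link between the tally column and the frequency column can be sketched in Python: each frequency is just the number of tally marks in its row. The tally strings below are copied from the "People in My Neighborhood" table; the names `tallies`, `frequency`, and `total_people` are chosen here for illustration.

```python
# Tally marks for each age group, copied from the table.
tallies = {
    "0-19": "11111 11111 1111",
    "20-39": "11111 111",
    "40-59": "1111",
    "60-89": "11111 11",
    "90+": "1",
}

# Frequency = number of tally marks in the row (spaces are ignored).
frequency = {age: marks.count("1") for age, marks in tallies.items()}
total_people = sum(frequency.values())

for age, count in frequency.items():
    print(age, count)
print("total:", total_people)
```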
Vocabulary: cluster
• A group of the same or
similar elements gathered
or occurring closely
together on a graph.
Vocabulary: gap
Ages of orchestra members
10-15    xxxx
16-20    xxxxx xxxxx xxx
21-25    (gap in the data)
26-30    xxxxx xxxx
31-35    xxx
• A place on a
graph where
no data
values are
present.
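Spotting a gap can be sketched in Python by counting the marks in each interval of the orchestra plot and listing the intervals with no marks. The mark strings are copied from the plot; the names `marks`, `counts`, and `gaps` are chosen here for illustration.

```python
# Marks for each age interval, copied from the orchestra plot.
marks = {
    "10-15": "xxxx",
    "16-20": "xxxxx xxxxx xxx",
    "21-25": "",
    "26-30": "xxxxx xxxx",
    "31-35": "xxx",
}

# Count the marks in each interval (spaces are ignored).
counts = {interval: row.count("x") for interval, row in marks.items()}

# A gap is an interval where no data values are present.
gaps = [interval for interval, count in counts.items() if count == 0]

print(counts)
print("gaps:", gaps)  # gaps: ['21-25']
```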