Statistics and Probability
Measuring data, analyzing data, the
shape of data, visualizing data,
center, variability, & distribution
Vocabulary: statistics
•Statistics is the science
of collecting, organizing,
representing, and
interpreting data.
Vocabulary: graph
A pictorial device used to show a numerical relationship.
[Figure: sample chart with three data series (Series 1, Series 2, Series 3) on a vertical axis from 0 to 6]
Vocabulary: spread
• A measure of how much a
collection of data is spread out.
• Commonly used types include
range and quartiles.
• Also known as measures of
variation or dispersion.
Vocabulary: center
• Knowing where the
middle (center) of a set
of data lies gives useful
information about the data.
Vocabulary: center
• An average; a single value that is
used to represent a collection of
data. Three commonly used types
of averages are mode, median, and
mean.
• Also called measures of central
tendency or measures of average.
Vocabulary: outlier or striking deviation
A number in a
data set that is
much larger or
smaller than
most of the
other numbers
in the set.
Vocabulary: variability
• A measure of how much a
collection of data is spread out.
• Commonly used types include
range and quartiles.
• Also known as spread or
dispersion.
Vocabulary: distribution
When a large sample is
used, the pattern of the
data is more likely to look
like chart A. This is
considered a “normal
distribution.” It is
sometimes called a bell
curve.
Vocabulary: distribution
A peak below the mean,
with a long tail to the right,
is “skewed to the right.” A
peak above the mean, with a
long tail to the left, is
“skewed to the left.” Charts B
and C show skewed distributions.
Vocabulary: Measure of Center
• An average; a single value that
is used to represent a
collection of data. Three
commonly used types of
averages are mode, median,
and mean. (Also called measures of
central tendency or measures of average.)
Vocabulary: data
• Information, especially numerical information.
Usually organized for analysis.
Vocabulary: dot or line plot
A dot plot is also called a line plot.
It is a diagram showing frequency
of data on a number line. It is NOT
a line graph.
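As a rough illustration of the idea, a dot plot can be sketched in text by stacking one mark per data value above a number line. The helper name `dot_plot` is an assumption made for this sketch, and the layout only lines up for single-digit whole numbers.

```python
from collections import Counter

# Sample data: the twelve values from this document's range example.
data = [3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]

def dot_plot(values):
    """Return text rows: a stack of marks above each value on a number line."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    tallest = max(counts.values())
    rows = []
    # Build from the top row down so the marks stack upward from the axis.
    for level in range(tallest, 0, -1):
        marks = ["x" if counts[v] >= level else " " for v in range(lo, hi + 1)]
        rows.append(" ".join(marks).rstrip())
    # The last row is the number line itself.
    rows.append(" ".join(str(v) for v in range(lo, hi + 1)))
    return rows

print("\n".join(dot_plot(data)))
```

Each column shows how often a value occurs, which is what separates a dot plot from a line graph.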
Vocabulary: tape diagram
• A drawing that looks like a segment of tape,
used to illustrate number relationships. Also
known as a strip diagram, bar model, fraction
strip, or length model.
Vocabulary: histogram
• A bar graph in which the labels
for the bars are numerical
intervals.
• The data is reported in
clusters, or ranges.
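A minimal sketch of how data might be grouped into numerical intervals for a histogram. The function name `histogram_counts`, the interval width of 2, and the assumption that the data are whole numbers are all choices made for this illustration.

```python
def histogram_counts(values, start, width, bins):
    """Count how many whole-number values fall in each interval."""
    counts = {}
    for i in range(bins):
        low = start + i * width
        high = low + width - 1          # inclusive upper end of the interval
        label = f"{low}-{high}"
        counts[label] = sum(1 for v in values if low <= v <= high)
    return counts

# Sample data: the twelve values from this document's range example.
data = [3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]
table = histogram_counts(data, start=1, width=2, bins=4)

# Print one bar per interval; the bar labels are the intervals.
for label, count in table.items():
    print(f"{label:>5} | {'#' * count}")
```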
Vocabulary: box plots
A box plot is a
diagram that shows
the five-number
summary of a
distribution. The
five-number summary
includes the lowest
value, lower quartile,
median, upper
quartile, and highest
value.
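The five values can be computed with a short sketch like the one below. It assumes the common classroom convention of taking each quartile as the median of the lower or upper half of the ordered data, leaving out the middle value when the count is odd. Other conventions exist and can give slightly different quartiles. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the middle position
    upper = ordered[half + (n % 2):]    # values above the middle position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Sample data: the nine-value set from this document's median example.
data = [13, 16, 17, 20, 22, 24, 24, 28, 32]
print(five_number_summary(data))
```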
Vocabulary: box plot or box and whisker plot
Vocabulary: interquartile range
A box and whisker plot breaks the data into
four parts. Each part holds one quarter of the data. The
interquartile range is the difference
between the upper quartile and the lower
quartile. It is the length of the box on the plot.
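As a hedged sketch, the interquartile range can be computed with Python's standard `statistics.quantiles` function (Python 3.8 or later). Its default "exclusive" method is one of several quartile conventions, so results may differ a little from hand methods on some data sets.

```python
from statistics import quantiles

# Sample data: the nine-value set from this document's median example.
data = [13, 16, 17, 20, 22, 24, 24, 28, 32]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: upper quartile minus lower quartile.
iqr = q3 - q1
print(q1, q3, iqr)
```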
Vocabulary: lower extreme
The smallest or least
number in a data
set, usually farther
away from the
interquartile range
than other data in the set.
Also known as the
minimum.
Vocabulary: Minimum
Same as the lower extreme in the previous slide!
Vocabulary: Maximum
Opposite of the lower extreme; it is the upper
extreme, on the far right of the box and whisker
plot.
[Figure: box and whisker plot with the minimum and maximum labeled]
Vocabulary: attribute
A characteristic, such as size, shape, or color.
Examples: large, blue hexagon; small, red triangle.
Vocabulary: Measure of Center
• We use several different ways to
measure the center. Some are:
–Mode (the piece of data most often
repeated)
–Median (the middle number when
data are in numerical order)
–Mean (the average of all the
numerical data)
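The three measures listed above can be compared on one data set using Python's standard `statistics` module. This is only an illustrative sketch; the data are the nine values from this document's median example.

```python
from statistics import mean, median, mode

data = [13, 16, 17, 20, 22, 24, 24, 28, 32]

center = {
    "mode": mode(data),      # the value repeated most often
    "median": median(data),  # the middle value of the ordered data
    "mean": mean(data),      # the sum divided by the number of values
}

for name, value in center.items():
    print(f"{name}: {value}")
```

The three measures need not agree, which is why it helps to report which one is being used.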
Vocabulary: mean
• Step 1: add
• 3 + 5 + 5 + 4 + 5 + 6 + 2 + 5 = 35
• Eight data points
• Step 2: divide
• 35 ÷ 8 = 4.375
• 4.375 is the mean.
Definition: the
sum of a set of
numbers divided
by the number of
elements in the
set. A type of
average.
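The two steps, add and then divide, can be checked with a few lines of Python. This is a minimal sketch that uses the eight data points listed in Step 1.

```python
data = [3, 5, 5, 4, 5, 6, 2, 5]

total = sum(data)       # Step 1: add all the data points
count = len(data)       # how many data points there are
mean = total / count    # Step 2: divide the sum by the count

print(total, count, mean)
```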
Vocabulary: median
13, 16, 17, 20, 22, 24, 24, 28, 32
When the numbers are arranged
from least to greatest, it is the
middle number.
Vocabulary: median
13, 16, 17, 20, 22, 24, 28, 32
If there are an even number of
data points, it is the average of the
two middle numbers.
(20+22=42; 42÷2=21 )
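Both cases, an odd and an even number of data points, can be handled by one small function. The sketch below is written by hand to mirror the rule described above; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: one middle number
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle numbers

odd_set = [13, 16, 17, 20, 22, 24, 24, 28, 32]
even_set = [13, 16, 17, 20, 22, 24, 28, 32]

print(median(odd_set))   # 22
print(median(even_set))  # 21.0
```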
Vocabulary: mode
• Mode (the piece of data most
often repeated)
• 5, 7, 8, 9, 9, 11
• The mode in the data set above is 9.
• It is possible to have more than one
mode.
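A short sketch using `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value that ties for most frequent. The second data set is hypothetical, made up here to show a case with two modes.

```python
from statistics import multimode

one_mode = [5, 7, 8, 9, 9, 11]         # data set from the slide
two_modes = [2, 3, 3, 5, 7, 7, 8]      # hypothetical data with a tie

print(multimode(one_mode))    # [9]
print(multimode(two_modes))   # [3, 7]
```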
Vocabulary: Measure of variation
• Range, interquartile range, and mean
absolute deviation are measures
that indicate how much the data
in one data set differ among
themselves.
• Each is a measure of variation (spread).
Vocabulary: range
• The difference between the greatest
number and the least number in a
set of numbers.
• Data set: 3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3
• Largest number is 7 and smallest is 1
• So, the range is 6 because 7-1=6
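The same calculation as a minimal Python sketch, using the data set from the slide:

```python
data = [3, 2, 5, 4, 1, 6, 4, 4, 2, 5, 7, 3]

largest = max(data)
smallest = min(data)
data_range = largest - smallest   # greatest number minus least number

print(largest, smallest, data_range)   # 7 1 6
```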
Vocabulary: mean absolute deviation
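• Large cube = 45 kg
• Cylinder = 30 kg
• Small cube = 24 kg
• Mean = (45 + 30 + 24) ÷ 3 = 33 kg
• Deviations from the mean: 45 − 33 = 12, 30 − 33 = −3, 24 − 33 = −9
• Absolute deviations: 12 + 3 + 9 = 24; 24 ÷ 3 = 8
• 8 kg is the mean absolute deviation.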
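The worked example with the three weights can be reproduced with a small function. The name `mean_absolute_deviation` is chosen for this sketch and is not a standard library function.

```python
def mean_absolute_deviation(values):
    """Average distance of each value from the mean of the data set."""
    m = sum(values) / len(values)
    # abs() drops the sign, so -3 counts as 3 and -9 counts as 9.
    return sum(abs(v - m) for v in values) / len(values)

weights_kg = [45, 30, 24]   # large cube, cylinder, small cube
print(mean_absolute_deviation(weights_kg))   # 8.0
```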
Vocabulary: mean absolute deviation
• In statistics, the absolute
deviation of an element of
a data set is the absolute
difference between that
element and a given point.
• The mean absolute deviation
is the average of these absolute
deviations, measured from the mean.
Vocabulary: statistical variability
• Variability, or spread, in a
variable or a probability
distribution. Common examples
of measures of statistical
dispersion are the variance,
standard deviation, and
interquartile range.
Vocabulary: data distribution
People in My Neighborhood
Age      Tally              Frequency
0-19     11111 11111 1111   14
20-39    11111 111          8
40-59    1111               4
60-89    11111 11           7
90+      1                  1
A table that shows how many there are of each type of data.
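A table like this can be built by sorting each value into its group and counting. In the sketch below the age groups match the table, but the list of ages is hypothetical, invented only to show the counting step; `age_group` is likewise a name made up for this example.

```python
from collections import Counter

# Age groups from the table: (lowest age, highest age, label).
GROUPS = [(0, 19, "0-19"), (20, 39, "20-39"), (40, 59, "40-59"), (60, 89, "60-89")]

def age_group(age):
    """Return the label of the group that an age falls into."""
    for low, high, label in GROUPS:
        if low <= age <= high:
            return label
    return "90+"

# Hypothetical ages, not the data behind the table.
ages = [4, 11, 17, 25, 33, 38, 47, 62, 71, 93]
frequency = Counter(age_group(a) for a in ages)

for label in ["0-19", "20-39", "40-59", "60-89", "90+"]:
    print(f"{label:<6} {'1' * frequency[label]:<6} {frequency[label]}")
```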
Vocabulary: cluster
• A group of the same or
similar elements gathered
or occurring closely
together on a graph.
Vocabulary: gap
Ages of orchestra members
10-15   xxxx
16-20   xxxxx xxxxx xxx
21-25                     (gap in the data)
26-30   xxxxx xxxx
31-35   xxx
• A place on a
graph where
no data
values are
present.
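One way to spot a gap is to look for intervals with no data that sit between intervals that do have data. The sketch below uses the counts read from the orchestra chart; the function name `find_gaps` is an assumption for this illustration.

```python
# Number of marks in each age interval of the orchestra chart.
counts = {"10-15": 4, "16-20": 13, "21-25": 0, "26-30": 9, "31-35": 3}

def find_gaps(counts):
    """Return the empty intervals lying between the first and last non-empty ones."""
    labels = list(counts)   # dictionaries keep insertion order
    filled = [i for i, label in enumerate(labels) if counts[label] > 0]
    if not filled:
        return []
    first, last = filled[0], filled[-1]
    return [label for label in labels[first:last + 1] if counts[label] == 0]

print(find_gaps(counts))   # ['21-25']
```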