Analyze Data
USE MEAN & MEDIAN TO COMPARE THE CENTER OF DATA SETS.
IDENTIFY OUTLIERS AND THEIR EFFECT ON DATA SETS.
Focus 6 Learning Goal – (HS.S-ID.A.1, HS.S-ID.A.2, HS.S-ID.A.3, HS.S-ID.B.5) =
Students will summarize, represent and interpret data on a single
count or measurement variable.
Level 4
In addition to level 3.0 and above and beyond what was taught in class, the student may:
· Make connections with other concepts in math.
· Make connections with other content areas.

Level 3
The student will summarize, represent, and interpret data on a single count or measurement variable.
- Comparing data includes analyzing center of data (mean/median), interquartile range, shape of the distribution of a graph, standard deviation and the effect of outliers on the data set.
- Read, interpret and write summaries of two-way frequency tables, which includes calculating joint, marginal and relative frequencies.

Level 2
The student will be able to:
- Make dot plots, histograms, box plots and two-way frequency tables.
- Calculate standard deviation.
- Identify normal distribution of data (bell curve) and convey what it means.

Level 1
With help from the teacher, the student has partial success with summarizing and interpreting data displayed in a dot plot, histogram, box plot or frequency table.

Level 0
Even with help, the student has no success understanding statistical data.
Reminder:

To find the median, you put all the numbers in
order from least to greatest. The middle number
is the median. If there are two middle numbers,
the median is the mean of those two.

To find the mean, you add up all of the numbers
then divide by how many numbers are in the
data set.
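
The two reminders above can be sketched in Python. This is only an illustration and not part of the original lesson: the function names find_median and find_mean and the sample scores are made up for the example. When the count is even there is no single middle number, so the sketch averages the two middle ones.

```python
def find_median(values):
    """Order the values, then take the middle one (or average the two middle ones)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def find_mean(values):
    """Add up all the values, then divide by how many there are."""
    return sum(values) / len(values)


# Made-up sample data, used only for illustration.
scores = [7, 3, 9, 5, 6]
print(find_median(scores))  # 6
print(find_mean(scores))    # 6.0
```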
Measure of Central Tendency

Mean & median are both measures of central tendency. This
means they identify the “middle” of the data.

A measure of central tendency attempts to describe the whole set of data with a
single value that represents the middle or center of its distribution.

Median:

Advantage: Is less affected by outliers and skewed data. It is the preferred
measure of center when the distribution is not symmetrical.

Mean:

Advantage: Can be used for both continuous and discrete numeric data.

Limitation: Is influenced by outliers and skewed distributions.
The shape of data distributions.

A normal distribution is mound shaped and symmetric.

Comparing the mean and median gives a quick rule of thumb for the shape:

If the mean and median are equal (or very close), then the data is symmetric.

If the mean is greater than the median, the data is skewed right.

If the mean is less than the median, the data is skewed left.
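
The comparison above can be written as a small Python function. This is a rough sketch of the rule of thumb only: the name describe_shape and the numbers passed to it are made up for illustration, and real data whose mean and median are merely close would need a judgment call that the exact comparison below does not make.

```python
def describe_shape(mean, median):
    """Apply the mean-versus-median rule of thumb for the shape of a data set."""
    if mean > median:
        return "skewed right"
    if mean < median:
        return "skewed left"
    return "symmetric"


# Made-up values, used only for illustration.
print(describe_shape(50, 50))  # symmetric
print(describe_shape(58, 50))  # skewed right
print(describe_shape(43, 50))  # skewed left
```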
Test your memory…

The mean of a data set is 12 and the median is 12. What are the
possible shapes for this data set?


A. Mound
B. Symmetric
C. Skewed Right
D. Skewed Left
E. Both A & B
The mean of a data set is 12 and the median is 10. What is the data
shape?

A. Octagonal
B. Symmetric
C. Skewed Right
D. Skewed Left
Outliers

The shape of the data helps us find and identify
outliers.

An outlier is something that sticks out from the rest of
the data.

It is a data point that has an “extreme value” when
compared with the rest of the data set.

Mathematically speaking, an outlier is defined as any
point that falls more than 1.5 times the IQR below the lower
quartile (Q1) or more than 1.5 times the IQR above the upper
quartile (Q3). The IQR (interquartile range) is Q3 – Q1.
Data:
37, 37, 38, 38, 40, 40, 42, 42, 42, 62
The median is: 40
Q1 (the median of the lower half: 37, 37, 38, 38, 40): 38
Q3 (the median of the upper half: 40, 42, 42, 42, 62): 42

IQR = Q3 – Q1 = 42 – 38 = 4
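
The quartile values above can be reproduced with a short Python sketch. It assumes the quartiles are found by taking the median of the lower half and of the upper half of the ordered data, which matches the numbers on this slide; other quartile methods exist and can give slightly different answers. The helper names middle_value and quartiles are made up for this example.

```python
def middle_value(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median of each half of the ordered data.

    Assumptions: there are at least two values, and when the count is odd
    the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return middle_value(lower_half), middle_value(ordered), middle_value(upper_half)


data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
q1, med, q3 = quartiles(data)
print(q1, med, q3)  # 38 40.0 42
print(q3 - q1)      # 4
```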

The box plot looks like this:






The lower limit on outliers is
Q1 – (1.5)(IQR).
38 – (1.5)(4) = 32
This means an outlier would be
any number less than 32.
The upper limit on outliers is
Q3 + (1.5)(IQR).
42 + (1.5)(4) = 48
This means an outlier would be
any number greater than 48.
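
The two limits can be computed directly from Q1 and Q3. The sketch below is illustrative only; the names outlier_limits and is_outlier are made up for this example.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, lower, upper):
    """A value is an outlier if it lies below the lower limit or above the upper limit."""
    return value < lower or value > upper


lower, upper = outlier_limits(38, 42)
print(lower, upper)                  # 32.0 48.0
print(is_outlier(40, lower, upper))  # False
print(is_outlier(62, lower, upper))  # True
```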
Data:
37, 37, 38, 38, 40, 40, 42, 42, 42, 62

The outlier for this data set is 62.

It surpasses the cut off of 48.

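When there is an outlier on one side of
the data set, we can chop off the
“whisker” on that side so it goes no farther
than the limit (commonly it ends at the last
data value inside the limit) and then plot
each outlier as a separate point.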


Calculate the mean of
the data set.
418 ÷ 10 = 41.8

Calculate the mean of
the data set without the
outlier.
356 ÷ 9 ≈ 39.6

Removing the outlier
changes the mean
noticeably (from 41.8 to about 39.6).

Removing the outlier
does not change the
median; it stays at 40.
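
The effect of the outlier on each measure can be checked with Python's built-in statistics module. This is a quick sketch, with the outlier 62 removed by hand rather than detected.

```python
from statistics import mean, median

data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
without_outlier = [x for x in data if x != 62]

# With the outlier: mean and median.
print(mean(data), median(data))  # 41.8 40.0

# Without the outlier: the mean drops, the median stays at 40.
print(round(mean(without_outlier), 1), median(without_outlier))  # 39.6 40
```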
The final box plot would look like this.
Going Fishing

A fisherman records the lengths, in centimeters, of
10 bass caught in a stream:
15 22 19 18 15 45 27 18 18 51

He wants to know the average length of a fish
he can catch.

Determine the mean and median of the data.
Mean:
248 ÷ 10 = 24.8 cm

Median:
15 15 18 18 18 19 22 27 45 51
The two middle values are 18 and 19, so the median is (18 + 19) ÷ 2 = 18.5 cm
Going Fishing

Are there any outliers?
Divide the data into quarters to find the IQR.
15 15 18 18 18 19 22 27 45 51
Q1 = 18
Q3 = 27

IQR = 27 – 18 = 9

The lower limit on outliers is Q1 – (1.5)(IQR).
 18

– (1.5)(9) = 4.5
The upper limit on outliers is Q3 + (1.5)(IQR).

27 + (1.5)(9) = 40.5
Any number less than 4.5 or greater than
40.5 is an outlier.
45 and 51 are outliers.
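
The whole check for the fishing data can be combined into one Python function. This is a sketch under the same assumption as the slides' numbers, namely that Q1 and Q3 are the medians of the lower and upper halves of the ordered data; the names middle_value and split_outliers are made up for this example.

```python
def middle_value(ordered):
    """Median of a list that is already in order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def split_outliers(values):
    """Return (kept, outliers) using the 1.5 x IQR rule.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = middle_value(ordered[:half])
    q3 = middle_value(ordered[-half:])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    kept = [x for x in ordered if lower <= x <= upper]
    outliers = [x for x in ordered if x < lower or x > upper]
    return kept, outliers


lengths = [15, 22, 19, 18, 15, 45, 27, 18, 18, 51]
kept, outliers = split_outliers(lengths)
print(outliers)  # [45, 51]
print(kept)      # [15, 15, 18, 18, 18, 19, 22, 27]
```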
Going Fishing

Remove the outliers and recalculate the mean and
median.

15 15 18 18 18 19 22 27

Mean: 152 ÷ 8 = 19 cm

Median: 18 cm

With the outliers removed, the mean is now closer to the
center of the data.

The average length of a fish caught in this stream is
________.