1.4 Defining Data Spread
• An average alone doesn’t always describe a set
of data effectively or completely. An average
doesn’t indicate whether the data clusters,
whether the set contains outliers, what the range
is, or how the data is spread. In general, it does
not describe the set’s distribution. The various
data distribution plots we have studied help to
do that.
• Investigate the following to discover a way to
determine a single number that can indicate the
spread and variation in a data set.
• For the following data try to determine the average
(mean) distance the values are from the mean of
the set.
Test Scores
35 44 56 58 62 67 70 72 76 88 90 94
• Step 1 – Calculate the mean of the set.
mean = 812 / 12 ≈ 67.7, rounded to 68
• Step 2 – Calculate the distance each value is from
the mean (mean – value).
This is called the deviation from the mean.
Data Value    Deviation (mean – value)
35            68 – 35 = 33
44            24
56            12
58            10
62            6
67            1
70            -2
72            -4
76            -8
88            -20
90            -22
94            -26
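Steps 1 and 2 can be checked with a short script. This is a minimal sketch in Python; the variable names are chosen here for illustration, and it follows the slide in rounding the mean to 68 before subtracting.

```python
# Test scores from the slide
scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]

# Step 1: mean of the set (812 / 12), then rounded to 68 as in the slide
exact_mean = sum(scores) / len(scores)
rounded_mean = round(exact_mean)

# Step 2: deviation of each value, using the slide's convention (mean - value)
deviations = [rounded_mean - x for x in scores]

for x, d in zip(scores, deviations):
    print(f"{x:>5} {d:>5}")
```

If the exact mean is used instead of the rounded one, the positive and negative deviations cancel and sum to zero, which is why a plain average of the deviations cannot measure spread.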
• Step 3 – Square each deviation (to remove the
negatives)
Data Value    Deviation    Squared Deviation
35            33           1089
44            24           576
56            12           144
58            10           100
62            6            36
67            1            1
70            -2           4
72            -4           16
76            -8           64
88            -20          400
90            -22          484
94            -26          676
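The squared deviations and their total can be reproduced as follows. This is a sketch that assumes the rounded mean of 68 used in the slide; `squared` and `total` are names chosen for this example.

```python
scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]
mean = 68  # rounded mean, as used in the slide

# Step 3: squaring makes every deviation non-negative
squared = [(mean - x) ** 2 for x in scores]
total = sum(squared)

print(squared)
print(total)  # 3590
```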
• Step 4 – Find the mean of the squared
deviations.
3590 / 12 ≈ 299.2, rounded to 299
• Step 5 – Find the square root of the result of Step 4 (the mean
of the squared deviations).
√299 ≈ 17.3
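The five steps can be combined into one function. The sketch below is one possible way to write it: `population_std_dev` is a name invented for this example, and it uses the unrounded mean, so intermediate values differ slightly from the slide while the final answer still rounds to 17.3.

```python
import math
import statistics


def population_std_dev(values):
    """Square root of the mean of the squared deviations (divides by n)."""
    n = len(values)
    mean = sum(values) / n                                 # Step 1
    squared = [(mean - x) ** 2 for x in values]            # Steps 2 and 3
    mean_squared = sum(squared) / n                        # Step 4
    return math.sqrt(mean_squared)                         # Step 5


scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]
sd = population_std_dev(scores)
print(round(sd, 1))  # 17.3

# Python's standard library computes the same quantity
print(round(statistics.pstdev(scores), 1))  # 17.3
```

Dividing by the number of values, as the slide does, gives what is usually called the population standard deviation. Many calculators and `statistics.stdev` divide by n - 1 instead (the sample standard deviation), which gives a slightly larger value.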
• You just found what is called the
standard deviation – a number that describes the
spread/variation within a set of data. It is the
square root of the mean squared deviation, and it
indicates roughly how far the data values typically
are from the mean of the set.
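The opening question asked for the average distance of the values from the mean. Taken literally, that is the mean of the absolute deviations, which is a related but different number from the standard deviation. The sketch below compares the two for the test scores; `mean_abs_dev` is a name chosen for this example.

```python
import math

scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]
n = len(scores)
mean = sum(scores) / n

# Literal "average distance from the mean": mean of absolute deviations
mean_abs_dev = sum(abs(x - mean) for x in scores) / n

# Standard deviation: square root of the mean squared deviation
std_dev = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

print(round(mean_abs_dev, 1))  # 14.0
print(round(std_dev, 1))       # 17.3
```

Because squaring gives large deviations more weight, the standard deviation is never smaller than the mean absolute deviation.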
• The greater the standard deviation…
- the more spread/variation
- the farther a randomly chosen data value tends to be from
the mean
• The lower the standard deviation…
- the closer a randomly chosen data value tends to be to the
mean
- the more clustering around the mean
- the less variation/spread
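To see the effect of clustering, the full set of test scores can be compared with only its six middle scores. This is an illustrative sketch: the subset `middle_six` is chosen here for the example and does not appear in the slides.

```python
import statistics

scores = [35, 44, 56, 58, 62, 67, 70, 72, 76, 88, 90, 94]

# The six middle scores sit much closer to the mean than the full set does
middle_six = scores[3:9]  # [58, 62, 67, 70, 72, 76]

sd_all = statistics.pstdev(scores)
sd_middle = statistics.pstdev(middle_six)

print(round(sd_all, 1))     # 17.3
print(round(sd_middle, 1))  # 6.0
```

The two sets have nearly the same mean (about 67.7 and 67.5), but the tightly clustered subset has a much lower standard deviation.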