DESCRIBING DISTRIBUTION NUMERICALLY
MEASURES OF CENTER:
• MIDRANGE = (MAX + MIN) / 2
• MEDIAN IS THE MIDDLE VALUE WITH HALF OF THE DATA ABOVE AND HALF BELOW IT.
• MEAN = (SUM OF DATA) / (NUMBER OF DATA VALUES n)
EXAMPLE:
DATA: 45, 46, 49, 35, 76, 80, 89, 94, 37, 61, 62, 64, 68, 56, 57, 57, 59, 71, 72.
SORTED DATA: 35, 37, 45, 46, 49, 56, 57, 57, 59, 61, 62, 64, 68, 71, 72, 76, 80, 89, 94.
MIDRANGE = (94 + 35) / 2 = 64.5
MEDIAN = 61
MEAN = (35 + 37 + … + 94) / 19 = 62
NOTE: FOR SKEWED DISTRIBUTIONS THE MEDIAN IS A BETTER MEASURE OF THE CENTER THAN
THE MEAN.
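The three measures of center can be checked with a minimal Python sketch. It uses the 19 data values from the example; the `stretched` list is a hypothetical variant (the maximum 94 replaced by 940) added only to illustrate the note about skewed data, and is not part of the original example.

```python
from statistics import mean, median

# Data from the example above (19 observations).
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

midrange = (max(data) + min(data)) / 2   # (94 + 35) / 2
center_median = median(data)             # middle value of the sorted data
center_mean = mean(data)                 # sum of data / n

print(midrange, center_median, center_mean)

# Hypothetical variant: replace the maximum (94) with 940 to mimic a long right tail.
stretched = [940 if x == 94 else x for x in data]
print(median(stretched), mean(stretched))  # the median is unchanged, the mean is pulled upward
```

In this sketch the single extreme value leaves the median at 61 while the mean rises above 100, which is the behavior the note describes.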
MEASURES OF THE SPREAD
• RANGE = MAX – MIN
• INTERQUARTILE RANGE (IQR) = Q3 – Q1
  Q3 = UPPER QUARTILE = MEDIAN OF UPPER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
  Q1 = LOWER QUARTILE = MEDIAN OF LOWER HALF OF DATA (INCLUDE MEDIAN IF n IS ODD)
• VARIANCE (later)
• STANDARD DEVIATION (later)
Quartiles
EXAMPLE: (odd number of observations, 19)
Median = 61
UPPER HALF
35 37 45 46 49 56 57 57 59 [61 62 64 68 71 72 76 80 89 94]
Q3 = (71 + 72) / 2 = 71.5
LOWER HALF
[35 37 45 46 49 56 57 57 59 61] 62 64 68 71 72 76 80 89 94
Q1 = (49 + 56) / 2 = 52.5
IQR = 71.5 – 52.5 = 19
Note: Because n is odd, include the median in the calculation of both quartiles.
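The quartile rule used here (median of each half, with the overall median counted in both halves when n is odd) can be sketched in Python as follows. The function name `quartiles` is a hypothetical helper introduced for illustration. Statistical software often uses other quartile conventions, so its results may differ slightly from these hand calculations.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the rule from these slides:
    each quartile is the median of one half of the sorted data,
    and the overall median belongs to both halves when n is odd."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2                 # size of each half; includes the median if n is odd
    lower, upper = s[:half], s[n - half:]
    return median(lower), median(s), median(upper)

# The 19 observations from the example.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

q1, med, q3 = quartiles(data)
iqr = q3 - q1
print(q1, med, q3, iqr)
```

The same function handles an even number of observations, where the two halves do not overlap.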
Quartiles
EXAMPLE: (even number of observations, 18)
35 37 45 46 49 56 57 57 59 [60] 61 62 64 68 71 72 76 80 89
60 = Median = (59 + 61) / 2 (Average of the middle two numbers; 60 is not itself a data value)
UPPER HALF
35 37 45 46 49 56 57 57 59 [61 62 64 68 71 72 76 80 89]
Q3 = 71
LOWER HALF
[35 37 45 46 49 56 57 57 59] 61 62 64 68 71 72 76 80 89
Q1 = 49
IQR = 71 – 49 = 22
5-NUMBER SUMMARY:
• THE 5-NUMBER SUMMARY OF A DISTRIBUTION REPORTS ITS MEDIAN, QUARTILES, AND EXTREMES (MINIMUM AND MAXIMUM)
• MAX = 94
• Q3 = 71.5
• MEDIAN = 61
• Q1 = 52.5
• MIN = 35
OUTLIERS: DATA VALUES THAT LIE BEYOND THE FENCES
IQR = Q3 – Q1 = 19
UPPER FENCE = Q3 + 1.5 × IQR = 71.5 + 1.5 × 19 = 100
LOWER FENCE = Q1 – 1.5 × IQR = 52.5 – 1.5 × 19 = 24
IN THE EXAMPLE CONSIDERED ABOVE, ALL VALUES (35 TO 94) LIE BETWEEN THE FENCES, SO THERE ARE NO OUTLIERS.
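The fence calculation can be sketched in Python as below. The quartiles are taken directly from the 5-number summary rather than recomputed, and the helper name `fences` and its parameter `k` are hypothetical, introduced only for this illustration.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) as Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# The 19 observations and the quartiles from the 5-number summary.
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]
q1, q3 = 52.5, 71.5

five_number_summary = (min(data), q1, 61, q3, max(data))
lower_fence, upper_fence = fences(q1, q3)

# Outliers are the data values outside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(five_number_summary, lower_fence, upper_fence, outliers)
```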
BOXPLOTS
WHENEVER WE HAVE A 5-NUMBER SUMMARY OF A (QUANTITATIVE) VARIABLE, WE CAN DISPLAY THE INFORMATION IN A BOXPLOT.
• THE CENTER OF A BOXPLOT IS A BOX THAT SHOWS THE MIDDLE HALF OF THE DATA, BETWEEN THE QUARTILES.
• THE HEIGHT OF THE BOX IS EQUAL TO THE IQR.
• IF THE MEDIAN IS ROUGHLY CENTERED BETWEEN THE QUARTILES, THEN THE MIDDLE HALF OF THE DATA IS ROUGHLY SYMMETRIC. IF IT IS NOT CENTERED, THE DISTRIBUTION IS SKEWED.
• THE MAIN USE FOR BOXPLOTS IS TO COMPARE GROUPS.
BOXPLOTS
[Figure: Boxplot of C1; vertical axis C1, scale from 30 to 100]
Examples:
• 1. Here are the costs of 10 electric smoothtop ranges rated very good or excellent by Consumer Reports in August 2002.
  850  1000  900  750  1400  1250  1200  1050  1050  565
  Find the following statistics by hand:
  a) mean
  b) median and quartiles
  c) range and IQR
VARIANCE = “AVERAGE” SQUARED DEVIATION FROM THE MEAN
• DEVIATION = (each data value) – mean
• VARIANCE = 4658 / (19 – 1) = 258.8, WHERE 4658 IS THE SUM OF THE SQUARED DEVIATIONS OF THE 19 DATA VALUES FROM THEIR MEAN OF 62
• STANDARD DEVIATION = SQUARE ROOT (VARIANCE) = 16.1
Example 1 (electric ranges), worked:
• Step 1: Sort Data: 565, 750, 850, 900, 1000, 1050, 1050, 1200, 1250, 1400
• Mean = 10015 / 10 = 1001.5
• Median = (1000 + 1050) / 2 = 1025
• Q1 = 850
• Q3 = 1200
• Range = 1400 – 565 = 835
• IQR = 1200 – 850 = 350
Computing the Variance
• DEVIATION = (each data value) – mean
• Squared Deviation= ((each data value) – mean)^2
• Sum all squared deviations
• Variance = (sum of all squared deviations) / (n – 1), where n is the number of observations
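The steps above can be sketched in Python as follows. The function name `sample_variance` is a hypothetical helper for illustration; the cross-check uses `statistics.variance` from the standard library, which also divides by n – 1.

```python
from math import sqrt
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n                              # mean
    squared_devs = [(x - m) ** 2 for x in values]    # squared deviation of each value
    return sum(squared_devs) / (n - 1)

# The 19 observations used in the earlier examples (mean 62).
data = [45, 46, 49, 35, 76, 80, 89, 94, 37, 61,
        62, 64, 68, 56, 57, 57, 59, 71, 72]

var = sample_variance(data)
std_dev = sqrt(var)
print(round(var, 1), round(std_dev, 1))

# Cross-check against the standard library implementation.
print(abs(var - variance(data)) < 1e-9)
```

Dividing by n – 1 rather than n is why "average" appears in quotation marks in the slide title.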
Variance
Example:
Data    Squared Deviations
35      54.76
37      29.16
45      6.76
46      12.96
49      43.56
• Mean = 42.4
• Variance = 147.2 / 4 = 36.8
• Std Deviation = square root of variance = 6.07
Some Remarks
• If the shape is skewed, report the median and IQR.
• In a skewed distribution the mean and median can be very different.
• You may want to include the mean and std deviation, but you should point out why the mean and the median differ.
• If the histogram is symmetric, report the mean and the std deviation and possibly the median and IQR.