Describing Distributions
3 Topics:
1. Shape
2. Center
3. Spread
Shape
• Always describe the basic shape of the distribution.
• Symmetric – left and right sides are mirror images, or approximately so, since we’re dealing with real data
• Skewed Left – majority of the data is on the right side but tails out to the left
• Unimodal – one major mode around which the data cluster
• Skewed Right – majority of the data is on the left side but tails out to the right
• Bimodal – two major modes
• Multimodal – multiple modes
• Uniform – data is flat and featureless
• Also mention any unusual features
• Outliers – observations away from the main distribution. Can be some of the most informative and interesting
• Gaps – spaces between clumps of data
Measure of Center
• Median or
• Mean
Median
• The middle observation
• Arrange the data in order from least to greatest
• If there is an odd number of observations, find the middle number
• If there is an even number of observations, find the average of the middle two numbers
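The steps above can be sketched in a few lines of Python. This is only an illustration of the procedure on the slide; the function name `median_by_hand` is made up for this example.

```python
def median_by_hand(values):
    """Return the median using the sort-and-pick-the-middle procedure."""
    ordered = sorted(values)          # arrange from least to greatest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:                    # odd count: the single middle number
        return ordered[mid]
    # even count: average of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_hand([3, 1, 2]))
print(median_by_hand([4, 1, 3, 2]))
```

Python's built-in `statistics.median` follows the same rule, so it can be used to check a hand calculation.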
Mean
• Arithmetic average: add all observations together and divide by the number of observations.
• If there are n observations labeled $x_1, x_2, \ldots, x_n$, the mean is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Read as “x bar”
• Must be written with the bar!!!!
• Finding Summary Stats on your calculator!!!
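For anyone checking a calculator result on a computer, the mean formula translates directly into code. This is a minimal sketch; the name `mean_by_hand` and the sample values are invented for illustration.

```python
def mean_by_hand(values):
    """Return x-bar: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one observation")
    return sum(values) / n


# Hypothetical observations, not taken from the slides.
print(mean_by_hand([2, 4, 6]))
```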
Mean vs Median
• If the distribution is symmetric, the mean and the median will be approximately the same
• If the distribution is skewed to the left, the mean will be less than the median
• If the distribution is skewed to the right, the mean will be higher than the median
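A quick check of the skewed-right rule, using the 25-value data set that these notes list under the 5 Number Summary. That data has a long tail out to 35, so the mean should come out higher than the median. The sketch relies only on Python's standard `statistics` module.

```python
from statistics import mean, median

# Data set from the 5 Number Summary section of these notes (skewed right).
data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7,
        8, 9, 9, 9, 10, 11, 12, 14, 16, 19, 27, 35]

x_bar = mean(data)    # 247 / 25 = 9.88
med = median(data)    # 13th of 25 ordered values = 7

print("mean:", x_bar, "median:", med)
print("mean higher than median:", x_bar > med)
```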
[Figure: three example distributions, each labeled with its x̄ and Med: (1) Symmetric, Unimodal, No Outliers; (2) Skewed Right, Unimodal, No Outliers; (3) Skewed Left, Unimodal, Lower Outlier at 10]
The mean is affected by outliers and the median is resistant to outliers
Ex. AGE
• When the data is symmetric with no major outliers, use the mean
• When the data is skewed or has major outliers, use the median
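To see what "resistant" means in practice, the sketch below adds one extreme value to a small set of ages and compares how far the mean and the median move. The ages are hypothetical, invented for this illustration; they are not the data behind the AGE example.

```python
from statistics import mean, median

# Hypothetical ages, invented for this illustration (not from the slides).
ages = [21, 22, 23, 24, 25]
ages_with_outlier = ages + [85]   # add one upper outlier

# Without the outlier the mean and median agree (both 23).
print("without outlier:", mean(ages), median(ages))

# With the outlier the mean jumps to 200 / 6 (about 33.3),
# while the median only moves to 23.5.
print("with outlier:   ", mean(ages_with_outlier), median(ages_with_outlier))
```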
5 Number Summary
• Numeric description of the distribution
1. Minimum
2. 1st Quartile (Q1) – Middle number of the bottom half of the data
3. Median
4. 3rd Quartile (Q3) – Middle number of the top half of the data
5. Maximum
Ex. 2 2 3 4 5 5 6 6 7 7 7 7 7 8 9 9 9 10 11 12 14 16 19 27 35
Q1 = 5.5
Med = 7
Q3 = 11.5
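The quartile rule described here (the median of the bottom half and the median of the top half) can be sketched as follows. One assumption is made that the slide does not state outright: when the number of observations is odd, the overall median is left out of both halves. That choice reproduces the Q1 and Q3 values given for the example data. The function name `five_number_summary` is made up for this sketch, and other textbooks and software may use slightly different quartile conventions.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]                       # bottom half
    # Assumption: with an odd count, skip the middle value itself.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


# Example data from the slide.
data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7,
        8, 9, 9, 9, 10, 11, 12, 14, 16, 19, 27, 35]

print(five_number_summary(data))  # (2, 5.5, 7, 11.5, 35)
```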
Measures of Spread
• Put data in order
• From the 5 Number Summary of the data above: Min = 2, Max = 35
• Range – difference between the highest and lowest values
• Range = Max – Min
• Interquartile Range – difference between the 25th percentile and the 75th percentile
• Range of the middle 50% of the data
• IQR = Q3 – Q1
Measure of Spread
Standard Deviation facts
• Standard Deviation is a number that describes how much data vary or spread out.
• Uses the difference of each data value from the mean.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• The higher the number the more the data is spread out.
• Can never be negative
• If s = 0 all the values in the data set are equal.
• The square of the standard deviation is called the variance. ($s^2$ = var)
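The three measures of spread covered in these notes (range, IQR, and standard deviation) can be computed as in the sketch below. It uses the 25-value data set from the 5 Number Summary section and takes Q1 = 5.5 and Q3 = 11.5 directly from the notes. The function name `sample_std_dev` is made up for this illustration.

```python
import math


def sample_std_dev(values):
    """Return s, the sample standard deviation (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    squared_diffs = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_diffs / (n - 1))


# Data set from the 5 Number Summary section of these notes.
data = [2, 2, 3, 4, 5, 5, 6, 6, 7, 7, 7, 7, 7,
        8, 9, 9, 9, 10, 11, 12, 14, 16, 19, 27, 35]

data_range = max(data) - min(data)   # Range = Max - Min = 35 - 2
iqr = 11.5 - 5.5                     # IQR = Q3 - Q1, quartiles from the notes
s = sample_std_dev(data)

print("range:", data_range)
print("IQR:", iqr)
print("s:", round(s, 2), "variance:", round(s ** 2, 2))
```

Python's standard library offers `statistics.stdev`, which also divides by n − 1 and can serve as a cross-check.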
Two machines are set to cut a piece of metal that should weigh 15 grams. A random sample of 50 pieces from each machine is taken and the weights are measured.
• Both are centered at 15 grams.
• s1 = 0.80 g
• s2 = 2.08 g
Describing Distributions:
With Upper Outlier
• x̄ = 20.43
• s = 4.18
• med = 20.03
• IQR = 1.8
Without Upper Outlier
• x̄ = 19.84
• s = 2.38
• med = 20.03
• IQR = 1.8
1. If you use the mean, use st. dev.
2. If you use the median, use the IQR