Shape of the Data
[Figure: histogram, box plot, and dot plot of the Ch04_Hurricanes data. Number of hurricanes that occurred each year from 1944 through 2000, as reported by Science magazine; horizontal axis: Hurricanes (0 to 8).]
3 Characteristics of Data
• Shape
• Center
• Spread
Shape of the Data – Symmetric
The IQ scores of 60 randomly selected 5th graders
Shape of the Data – Symmetric
The age of all US Presidents at the time they took office
Notice that this distribution has only one mode (it is unimodal).
Shape of the Data – Bimodal
The winning times in the Kentucky Derby from 1875 to the
present. Why two modes?
The length of the track was reduced from 1.5 miles to
1.25 miles in 1896. The race officials thought that 1.5 miles
was too far.
Shape of the Data – Skewed
[Figure: two histograms, labeled LEFT and RIGHT.]
Data for two different variables for all female heart attack
patients in New York state in one year. One is skewed
left; the other is skewed right. Which is which?
Center and Spread of Data
• Maximum (100th percentile)
• Q3 (75th percentile)
• Median (50th percentile)
• Q1 (25th percentile)
• Minimum (0th percentile)
These five numbers are called the five-number summary.
The median measures the center of the data.
The interquartile range, IQR = Q3 − Q1, measures the spread.
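A minimal Python sketch of the five-number summary and the IQR follows. It assumes the "median of each half" rule for quartiles; the slides do not say which rule they use, and calculators and libraries (for example Python's statistics.quantiles) offer several conventions that can give slightly different Q1 and Q3. The helper names five_number_summary and iqr are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: quartiles use the "median of each half" rule; other
    conventions can give slightly different Q1 and Q3.
    """
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = xs[:half]              # values below the median position
    upper = xs[half + n % 2:]      # skip the middle value when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def iqr(data):
    """Interquartile range: Q3 - Q1."""
    _, q1, _, q3, _ = five_number_summary(data)
    return q3 - q1


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(five_number_summary(values))   # (1, 2.5, 5, 7.5, 9)
print(iqr(values))                   # 5.0
```

With an odd count, the middle value belongs to neither half, which is why the upper half starts one position later.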
Measures of Central Tendency and Dispersion
Central Tendency: Mean, Median, and Mode
Dispersion or Spread: Range, IQR, Standard Deviation, and Variance
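For a quick check of the measures of central tendency and of the simplest measure of spread, the range, Python's standard statistics module can be used as sketched below. The data values and the helper names central_tendency and data_range are made up for illustration. statistics.multimode (Python 3.8+) returns every value tied for most frequent, so a bimodal data set reports both modes.

```python
from statistics import mean, median, multimode


def central_tendency(data):
    """Mean, median, and mode(s) of a list of numbers."""
    # multimode returns all values tied for the highest count,
    # so a bimodal data set reports both modes instead of one.
    return mean(data), median(data), multimode(data)


def data_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


values = [2, 4, 4, 4, 5, 5, 7, 9]
print(central_tendency(values))   # mean 5, median 4.5, mode(s) [4]
print(data_range(values))         # 7
```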
Examples of Uses for Standard Deviation and Variance:
• Factory Processes
• Stocks
• Weather
• Sports Teams
• Grades
• Attendance at events
• ?????
Symbols:
• s² = Sample Variance
• s = Sample Standard Deviation
• σ² = Population Variance
• σ = Population Standard Deviation
• x̄ = Mean
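The sample symbols (s², s) and the population symbols (σ², σ) differ in one detail: the population versions divide the sum of squared deviations by N, while the usual sample versions divide by n − 1. Python's statistics module provides both, as sketched below; the data values are made up for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: divide the sum of squared deviations by N.
sigma_sq = pvariance(data)   # σ²
sigma = pstdev(data)         # σ

# Sample versions: divide by n - 1 instead of n.
s_sq = variance(data)        # s²
s = stdev(data)              # s

print("population:", sigma_sq, sigma)
print("sample:", s_sq, s)    # larger, because n - 1 < n
```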
Center and Spread of Data
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{\text{sum of all numbers}}{\text{number of numbers}}$$
The mean, or average, is a measure of the center of a distribution.
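The mean formula translates directly into code: add up the numbers and divide by how many there are. A minimal sketch, with the hypothetical helper name mean_of and made-up data:

```python
def mean_of(numbers):
    """Mean = (sum of all numbers) / (number of numbers)."""
    if not numbers:
        raise ValueError("the mean of an empty list is undefined")
    return sum(numbers) / len(numbers)


print(mean_of([2, 4, 4, 4, 5, 5, 7, 9]))  # 5.0
```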
Center and Spread of Data
$$\frac{\sum_{i=1}^{N} \lvert x_i - \bar{x} \rvert}{N} = \text{mean absolute deviation}$$
The mean of the absolute deviations of the numbers from the mean.
Mean absolute deviation (MAD) measures the spread of the data (you learned this last year).
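The mean absolute deviation can be computed straight from its formula: find the mean, measure how far each value is from it, and average those distances. A minimal sketch, with the hypothetical helper name mean_absolute_deviation and made-up data:

```python
def mean_absolute_deviation(numbers):
    """Average distance of each value from the mean."""
    n = len(numbers)
    x_bar = sum(numbers) / n
    # |x_i - x̄| for every value, then the mean of those distances
    return sum(abs(x - x_bar) for x in numbers) / n


print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```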
Center and Spread of Data
$$\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N} = \text{variance}$$
The mean of the squared deviations of the numbers from the mean.
The formula given above is for the population variance, σ².
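The population variance formula can be written out in a few lines and checked against statistics.pvariance from Python's standard library. The helper name population_variance and the data values are made up for illustration.

```python
from statistics import pvariance


def population_variance(numbers):
    """Mean of the squared deviations from the mean (divide by N)."""
    n = len(numbers)
    x_bar = sum(numbers) / n
    return sum((x - x_bar) ** 2 for x in numbers) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))   # 4.0
print(pvariance(data))             # standard-library check: same value
```

Squaring keeps every deviation positive, like the absolute value in the MAD, but it weights large deviations more heavily.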
Center and Spread of Data
$$\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}} = \text{standard deviation}$$
The square root of the variance. This quantity has the same units as the data and is one of the most common measures of the spread of a distribution.
The formula given above is for the population standard deviation, σ.
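Taking the square root of the population variance gives the population standard deviation, back in the units of the data (if the data are in minutes, so is σ). A minimal sketch, checked against statistics.pstdev; the helper name population_std_dev and the data are made up for illustration.

```python
import math
from statistics import pstdev


def population_std_dev(numbers):
    """Square root of the population variance; same units as the data."""
    n = len(numbers)
    x_bar = sum(numbers) / n
    variance = sum((x - x_bar) ** 2 for x in numbers) / n
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(data))    # 2.0
print(pstdev(data))                # standard-library check
```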
Center and Spread of Data
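Mean: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
Mean Absolute Deviation: $\frac{\sum_{i=1}^{N} \lvert x_i - \bar{x} \rvert}{N}$
Variance: $\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$
Standard Deviation: $\sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}}$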
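To see the four formulas side by side, here is a hand calculation on a small made-up data set, 2, 4, 4, 4, 5, 5, 7, 9 (N = 8). The values are illustrative only and were chosen so the arithmetic stays whole. The deviations from the mean are −3, −1, −1, −1, 0, 0, 2, 4.

```latex
\begin{aligned}
\bar{x}     &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \\
\text{MAD}  &= \frac{3+1+1+1+0+0+2+4}{8} = \frac{12}{8} = 1.5 \\
\sigma^2    &= \frac{9+1+1+1+0+0+4+16}{8} = \frac{32}{8} = 4 \\
\sigma      &= \sqrt{4} = 2
\end{aligned}
```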