• Questions??
• Example: Exercise set B #1 p. 38
– Set up table with intervals, frequencies &
percent per year
– Graph appears on p. 39 Figure 5
– For discrete quantitative variables, make a
histogram with the value in the middle of the
base of the bar
Ch. 4: Average &
Standard Deviation
• Descriptive statistics: ways of describing
data
– Center: average (mean) or median
– Spread: variance & standard deviation
– Shape
• These descriptive statistics can only be
calculated on quantitative variables. For
qualitative/categorical variables we
compute frequencies and percentages.
Measures of center
• Average:
– Sum of the values divided by the number of
values.
– Balances a histogram.
• Median
– The median is the 50th percentile: the middle
value when the data are put in order from least
to greatest. Half of the data are greater in
value than the median and half are smaller.
– To find the median:
• First order the data.
• If there is an even number of values, take the average
of the 2 middle numbers. If there is an odd number of
values, pick the middle value in the ordered data.
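The two measures of center above can be sketched in a few lines of Python. This is only an illustration: the function names `average` and `find_median` and the sample values are assumptions chosen for the example, not part of the course materials.

```python
def average(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


def find_median(values):
    """Middle value of the ordered data.

    With an even number of values, return the average of the
    2 middle numbers; with an odd number, return the middle value.
    """
    ordered = sorted(values)          # first order the data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count


# Illustrative values only (hypothetical data).
odd_data = [7, 1, 3]
even_data = [8, 4, 5, 4]

print(average(even_data))      # 21 / 4 = 5.25
print(find_median(odd_data))   # ordered: 1, 3, 7 -> 3
print(find_median(even_data))  # ordered: 4, 4, 5, 8 -> (4 + 5) / 2 = 4.5
```

Python's standard `statistics` module provides `mean` and `median` functions that follow the same definitions, so in practice those can be used instead.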
Comparing Averages and Medians
• The average is to the right of the median
whenever the histogram has a long right
tail.
• Example: US Census 2004
• Median income: $45,996
• Average income: $62,083
• Using the Westvaco data:
– What is the average age of the hourly
employees?
– What is the average age of those who were
fired?
– What is the median age of the hourly
employees?
Measure of spread
• Standard deviation (SD)
– Describes the spread of the data around the
average.
– Roughly 68% of data are within 1 SD of the
average.
– Roughly 95% of data are within 2 SDs of the
average.
– These 2 statements hold for many data sets,
but not all.
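One way to see that the 68% and 95% figures are only rough guides is to count, for a given list, the fraction of values within 1 and 2 SDs of the average. The sketch below uses a small made-up data set (hypothetical, chosen so the arithmetic is easy to check) and the standard library's `statistics.pstdev`, which divides by the number of values; the helper name `fraction_within` is an assumption for this example.

```python
from statistics import mean, pstdev


def fraction_within(values, k):
    """Fraction of values lying within k SDs of the average."""
    avg = mean(values)
    sd = pstdev(values)   # SD computed by dividing by the number of values
    inside = [v for v in values if abs(v - avg) <= k * sd]
    return len(inside) / len(values)


# Hypothetical data: average is 5 and SD is 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(fraction_within(data, 1))  # 6 of 8 values lie in [3, 7] -> 0.75
print(fraction_within(data, 2))  # all 8 values lie in [1, 9] -> 1.0
```

For this list the fractions are 75% and 100% rather than 68% and 95%, which is consistent with treating the rule as an approximation.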
Calculating the SD
1. Find the average of the data.
2. Find the deviation from the average for
each data point.
3. Square the deviations.
4. Take the mean of the squared
deviations (MS).
5. Take the square root of the MS (RMS).
• Example: Find the SD of the values
4,4,5,8.
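The five steps can be followed literally in code. Below is a minimal sketch applied to the example values 4, 4, 5, 8; the function name `sd_by_steps` is an assumption for illustration. Note that step 4 divides by the number of values, as the steps describe, not by one less than that number.

```python
import math


def sd_by_steps(values):
    """Compute the SD as the root-mean-square of the deviations."""
    n = len(values)
    avg = sum(values) / n                      # 1. average of the data
    deviations = [v - avg for v in values]     # 2. deviation of each data point
    squares = [d ** 2 for d in deviations]     # 3. square the deviations
    ms = sum(squares) / n                      # 4. mean of the squares (MS)
    return math.sqrt(ms)                       # 5. square root of the MS (RMS)


values = [4, 4, 5, 8]
# average    = 21 / 4 = 5.25
# deviations = -1.25, -1.25, -0.25, 2.75
# squares    = 1.5625, 1.5625, 0.0625, 7.5625 (sum 10.75)
# MS         = 10.75 / 4 = 2.6875
print(round(sd_by_steps(values), 2))  # sqrt(2.6875) is about 1.64
```

This agrees with `statistics.pstdev` in the Python standard library; `statistics.stdev` divides by one less than the number of values and would give a slightly larger result.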
Activity
• Calculating the SD for a sample of the
Westvaco data.
Cross-sectional versus
Longitudinal Studies
• Cross-sectional studies allow one to
compare subjects to each other at one
point in time.
• Longitudinal studies allow one to follow a
subject over time and compare them to
themselves over time.
• Examples: NHANES (cross-sectional) vs. the
Framingham Heart Study (longitudinal)