• Questions?
• Example: Exercise set B #1, p. 38
  – Set up a table with intervals, frequencies & percent per year
  – Graph appears on p. 39, Figure 5
  – For discrete quantitative variables, make a histogram with the value in the middle of the base of the bar

Ch. 4: Average & Standard Deviation
• Descriptive statistics: ways of describing data
  – Center: average (mean) or median
  – Spread: variance & standard deviation
  – Shape
• These descriptive statistics can only be calculated for quantitative variables. For qualitative/categorical variables we compute frequencies and percentages.

Measures of center
• Average
  – Sum of the values divided by the number of values.
  – The point at which the histogram balances.
• Median
  – The 50th percentile: the middle value when the data are put in order from least to greatest. Half of the data are greater in value than the median and half are smaller.
  – To find the median:
    • First order the data.
    • If there is an odd number of values, pick the middle value in the ordered data.
    • If there is an even number of values, take the average of the 2 middle values.

Comparing Averages and Medians
• The average is to the right of the median whenever the histogram has a long right tail.
• Example: US Census 2004
  – Median income: $45,996
  – Average income: $62,083
• Using the Westvaco data:
  – What is the average age of the hourly employees?
  – What is the average age of those who were fired?
  – What is the median age of the hourly employees?

Measure of spread
• Standard deviation (SD)
  – Describes the spread of the data around the average.
  – Roughly 68% of data are within 1 SD of the average.
  – Roughly 95% of data are within 2 SDs of the average.
  – These 2 statements are true most of the time, but not always.

Calculating the SD
1. Find the average of the data.
2. Find the deviation from the average for each data point.
3. Square the deviations.
4. Take the mean of the squared deviations (the mean square, MS).
5. Take the square root of the MS (the root mean square, RMS). This is the SD.
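
The definitions of the average and median, and the five SD steps above, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and the data are the values 4, 4, 5, 8 from the example that follows. It assumes the SD divides by the number of values, as in step 4, not by one less than the number of values.

```python
import math


def average(values):
    # Sum of the values divided by the number of values.
    return sum(values) / len(values)


def median(values):
    # First order the data, then pick the middle value (odd count)
    # or average the 2 middle values (even count).
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def standard_deviation(values):
    avg = average(values)                            # Step 1: average
    deviations = [v - avg for v in values]           # Step 2: deviations from the average
    squared = [d ** 2 for d in deviations]           # Step 3: square the deviations
    mean_square = sum(squared) / len(squared)        # Step 4: mean square (divide by n)
    return math.sqrt(mean_square)                    # Step 5: root mean square = SD


data = [4, 4, 5, 8]
print(average(data))                       # 5.25
print(median(data))                        # 4.5
print(round(standard_deviation(data), 2))  # 1.64
```

Because step 4 divides by the number of values, this matches Python's built-in `statistics.pstdev`. `statistics.stdev` divides by one less than the number of values and gives a slightly larger answer.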
• Example: Find the SD of the values 4, 4, 5, 8.

Activity
• Calculating the SD for a sample of the Westvaco data.

Cross-sectional versus Longitudinal Studies
• Cross-sectional studies allow one to compare subjects to each other at one point in time.
• Longitudinal studies allow one to follow each subject over time and compare that subject to itself at different times.
• Examples: NHANES (cross-sectional) vs. the Framingham Heart Study (longitudinal)