Download Chapter 2 Summary

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Central limit theorem wikipedia , lookup

Transcript
Chapter 2: Exploratory Data Analysis (EDA)
Objectives: This chapter introduces students to the graphical and numerical methods of exploratory data analysis (EDA).
The purpose of EDA is to describe the distribution of data, including shape, center, spread, and outliers. Many inferential
statistical procedures make assumptions about a variable’s distribution, and in general it is important to understand the
characteristics of a variable in order to summarize it and use it for estimation, inference, prediction, or other modeling.
It is also important to understand properties of common summary measures; for example, which measures are resistant
to the influence of extreme observations and why we might prefer one measure over another when describing a given
distribution. Finally, this chapter introduces students to the normal distribution, including graphical and other methods
for assessing whether or not a given set of data can be considered normally (or approximately normally) distributed.
Data Sets:
Beerwings
FlightDelays
GSS2002
Spruce
Key Terms:




























bar chart
bias
boxplot
categorical variable
center
contingency table
decile
distribution
dot plot
empirical cumulative distribution function (ecdf)
exploratory data analysis (EDA)
factor variable
fence
five-number summary
frequency
frequency table
histogram
interquartile range (IQR)
kurtosis
level
maximum
mean
mean absolute deviation
median
midmean
minimum
mode
moment




























natural variability
normal distribution
normal quantile plot
normal probability plot
numerical variable
outlier
percentile
pie chart
qualitative variable
quantile
quantile-quantile plot (qq plot)
quantitative variable
quartile
quintile
range
relative frequency
scatter plot
skewed (left, right)
skewness
spread
standard deviation (sample, population)
standard normal distribution
statistically significant
symmetric
trimmed mean
variability
variance (sample, population)
whisker
STT3850 Data Analysis 1 – L. Chihara and T. Hesterberg (2011). Mathematical Statistics and Resampling with R. Hoboken, NJ: John Wiley & Sons.
Last Updated 02/03/2015
Chapter Skills:














Create and interpret frequency/relative frequency tables for one categorical (qualitative) variable.
Create and interpret different types of contingency tables for two categorical (qualitative) variables.
Create and interpret frequency/relative frequency bar charts for one or two categorical (qualitative) variables.
Create and interpret dot plots and histograms numerical (quantitative) data, especially shape and outliers.
Create and interpret boxplots, including computing fences, for numerical (quantitative) data, especially outliers.
Be able to compute and interpret all of the summary measures listed in the Key Terms.
Understand and discuss the concept of resistant as it applies to summary measures.
Understand and discuss the concept of bias as it applies to summary measures.
Know which summary measures describe shape (e.g., skew).
Know which summary measures describe center (location).
Know which summary measures describe spread (variability).
Create and interpret a normal probability plot (i.e., a normal quantile-quantile plot).
Create and interpret an empirical cumulative distribution (ecdf) plot.
Create and interpret a scatterplot (e.g., linear versus nonlinear).
Overall…
 Describe distributions in terms of frequencies (categorical) or shape, center, spread, and outliers (numerical).
 Compare distributions in terms of frequencies (categorical) or shape, center, spread, and outliers (numerical).
 Understand and explain the general concept and importance of exploratory data analysis (EDA).
STT3850 Data Analysis 1 – L. Chihara and T. Hesterberg (2011). Mathematical Statistics and Resampling with R. Hoboken, NJ: John Wiley & Sons.
Last Updated 02/03/2015