Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 2: Exploratory Data Analysis (EDA) Objectives: This chapter introduces students to the graphical and numerical methods of exploratory data analysis (EDA). The purpose of EDA is to describe the distribution of data, including shape, center, spread, and outliers. Many inferential statistical procedures make assumptions about a variable’s distribution, and in general it is important to understand the characteristics of a variable in order to summarize it and use it for estimation, inference, prediction, or other modeling. It is also important to understand properties of common summary measures; for example, which measures are resistant to the influence of extreme observations and why we might prefer one measure over another when describing a given distribution. Finally, this chapter introduces students to the normal distribution, including graphical and other methods for assessing whether or not a given set of data can be considered normally (or approximately normally) distributed. Data Sets: Beerwings FlightDelays GSS2002 Spruce Key Terms: bar chart bias boxplot categorical variable center contingency table decile distribution dot plot empirical cumulative distribution function (ecdf) exploratory data analysis (EDA) factor variable fence five-number summary frequency frequency table histogram interquartile range (IQR) kurtosis level maximum mean mean absolute deviation median midmean minimum mode moment natural variability normal distribution normal quantile plot normal probability plot numerical variable outlier percentile pie chart qualitative variable quantile quantile-quantile plot (qq plot) quantitative variable quartile quintile range relative frequency scatter plot skewed (left, right) skewness spread standard deviation (sample, population) standard normal distribution statistically significant symmetric trimmed mean variability variance (sample, population) whisker STT3850 Data Analysis 1 – L. Chihara and T. Hesterberg (2011). Mathematical Statistics and Resampling with R. Hoboken, NJ: John Wiley & Sons. Last Updated 02/03/2015 Chapter Skills: Create and interpret frequency/relative frequency tables for one categorical (qualitative) variable. Create and interpret different types of contingency tables for two categorical (qualitative) variables. Create and interpret frequency/relative frequency bar charts for one or two categorical (qualitative) variables. Create and interpret dot plots and histograms numerical (quantitative) data, especially shape and outliers. Create and interpret boxplots, including computing fences, for numerical (quantitative) data, especially outliers. Be able to compute and interpret all of the summary measures listed in the Key Terms. Understand and discuss the concept of resistant as it applies to summary measures. Understand and discuss the concept of bias as it applies to summary measures. Know which summary measures describe shape (e.g., skew). Know which summary measures describe center (location). Know which summary measures describe spread (variability). Create and interpret a normal probability plot (i.e., a normal quantile-quantile plot). Create and interpret an empirical cumulative distribution (ecdf) plot. Create and interpret a scatterplot (e.g., linear versus nonlinear). Overall… Describe distributions in terms of frequencies (categorical) or shape, center, spread, and outliers (numerical). Compare distributions in terms of frequencies (categorical) or shape, center, spread, and outliers (numerical). Understand and explain the general concept and importance of exploratory data analysis (EDA). STT3850 Data Analysis 1 – L. Chihara and T. Hesterberg (2011). Mathematical Statistics and Resampling with R. Hoboken, NJ: John Wiley & Sons. Last Updated 02/03/2015