Chapter 4

Displaying Quantitative Data

Numerical data can be visualized with a histogram. The data are separated into equal-width intervals along a horizontal axis; the frequency of data in each interval is tallied, and a rectangle is built over each interval whose height measures the frequency (or, rather, the relative frequency = proportion) of data in that interval. [TI83: STAT Edit, STATPLOT, ZoomStat, and Window settings.]

A dotplot displays each value as a dot over a scale axis. Dots are stacked over the axis to indicate clusters of data.

A quicker way to display numerical data by hand is with a stem-and-leaf display. All but the rightmost digit (or digits) of each measurement become stems; stems head rows in which the remaining digit(s), the leaves, are listed, carefully lined up in columns. (List all intermediate stems, even if they contain no leaves.)

A time plot displays data that track the change over time of a particular characteristic. Time is measured along the horizontal axis and the characteristic of interest is displayed along the vertical axis. Connecting consecutive data points highlights variation over time.

Describing Numerical Data: Features of Interest

• The shape of a histogram or stem-and-leaf display describes the distribution of the data: where the data are concentrated and how they spread out across the entire range of values.
• Where is the center of the distribution located?
• How much spread is there in the distribution? How tightly are the data clustered about the center?
• Is there more than one cluster, or mode? (The location of modes can change as the scale of a display is altered.) Is the distribution unimodal, bimodal, or multimodal?
• Is the distribution uniform (flat), indicating that every value is (roughly) equally represented? Is it roughly symmetric, with equally frequent values on either side of the center? Or is it skewed (to the left or right, in the direction of the tail)?
• Are there any outliers (values located far from the center)?
Can we explain them?

Summarizing Numerical Data: Center and Spread

• midrange: the number halfway between the smallest and largest data values; an estimate of the center of the distribution, but often a poor one, since it is highly sensitive to the presence of outlier values
• median: the middle observation in a sorted list of the data values (for an even number of values, average the two middle observations); a better estimate of center since it is resistant to the effects of outliers, hence a more commonly used measure of center
• range: the difference between the largest and smallest data values; an estimate of the spread in the data, but again often a poor one, since it is highly sensitive to the size of outlier values
• lower/upper quartiles ($Q_1$ and $Q_3$): the observations which are one quarter ($Q_1$) and three quarters ($Q_3$) of the way up the list, the median values of the half of the data located below/above the median; also, the 25th and 75th percentiles of the data
• interquartile range (IQR): the difference $\mathrm{IQR} = Q_3 - Q_1$ between the two quartiles; a better measure of the spread in the data since it is resistant to the presence of outliers

The five-number summary of a data set:
• minimum value,
• lower quartile,
• median,
• upper quartile, and
• maximum value.
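The measures of center and spread listed above can be checked by hand on a small data set. The following is a minimal Python sketch, not part of the original slides. The function name `five_number_summary` and the sample values are made up for illustration, and it assumes the "median of each half" rule for quartiles, leaving the median itself out of both halves when the number of values is odd. Calculators and software may use slightly different quartile rules.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: with an odd number of values, the middle observation is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def spread_and_center(values):
    """Return the midrange, range, and IQR of the data."""
    low, q1, _, q3, high = five_number_summary(values)
    return {"midrange": (low + high) / 2, "range": high - low, "iqr": q3 - q1}

# Made-up illustration data, then the same data with one extreme outlier.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
with_outlier = [3, 7, 8, 5, 12, 14, 210, 13, 18]

print(five_number_summary(sample))
print(spread_and_center(sample))
print(spread_and_center(with_outlier))
```

Comparing the last two printed results shows the point made in the definitions: replacing the largest value with an extreme one changes the midrange and the range a great deal, while the median and the IQR stay where they were.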
[TI83: STAT CALC 1-VarStats]

• mean ($\bar{y}$): the arithmetical average (where the distribution “balances”), $\bar{y} = \frac{\sum y}{n}$; in skewed distributions, the mean is pulled in the direction of the skewness (the longer tail), indicating sensitivity to the presence of outliers; because each data value contributes its own “weight” to the determination of the mean, this is the most commonly used measure of center in practice
• deviation from the mean ($y - \bar{y}$): the difference between a data value and the mean of all the data
• variance ($s^2$): estimates the average squared deviation from the mean, $s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
• standard deviation ($s$): measure of spread that estimates the size of a typical deviation from the mean, $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$; like the mean, sensitive to outliers
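As a worked illustration of the formulas for the mean, deviations, variance, and standard deviation, here is a short Python sketch. It is not from the original slides: the function names and the sample values are hypothetical, chosen so the arithmetic is easy to follow by hand. It divides by n − 1, matching the definitions above.

```python
import math

def mean(values):
    """Arithmetic average: sum of the values divided by their count."""
    return sum(values) / len(values)

def deviations(values):
    """Deviation of each value from the mean, y - ybar."""
    ybar = mean(values)
    return [y - ybar for y in values]

def sample_variance(values):
    """Sum of squared deviations divided by n - 1."""
    if len(values) < 2:
        raise ValueError("need at least two data values")
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Made-up illustration data: mean 5, squared deviations sum to 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(sample))             # 5.0
print(deviations(sample))       # the deviations always sum to zero
print(sample_variance(sample))  # 32 / 7
print(sample_std(sample))       # square root of 32 / 7
```

Because the deviations always sum to zero, they are squared before averaging; taking the square root at the end returns the measure of spread to the original units of the data.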