In this chapter, we will look at some charts and graphs used to summarize quantitative data. We will also look at numerical analysis of such data.

Stem and Leaf Display
A way of listing all data values in a condensed format:
• while not required, it helps to have the data sorted
• choose the digit to be the stem (10’s place, 100’s place…)
• put the stems in increasing (or decreasing) order in a column
• next to each stem, put the leaves in increasing order, left to right

Construct a stem and leaf display for wingspans in “ACSC” using the 10’s digit as the stem.

Sometimes, if the data are clumped together in a small range of values, we use repeated stems; that is, each stem is listed twice:
• next to the first copy of the stem, all leaves from the lower half of the possible leaf values are listed
• next to the second copy of the stem, all leaves from the upper half of the possible leaf values are listed

Construct a stem and leaf display for wingspans in “ACSC” using the 10’s digit as the stem and using repeated stems.

Histogram
The quantitative data equivalent of a bar chart:
• the horizontal axis has the possible values of the variable
• the width of each rectangle is called the bin width or the class width
• the vertical axis should be appropriately scaled for representing either frequencies or relative frequencies
• the height of each rectangle corresponds to the frequency or relative frequency of each interval
• the lower value of each class interval is included in the count, but the upper value is not

Construct a histogram for wingspans in “ACSC” with bins 10 wide.
Construct a histogram for wingspans in “ACSC” with bins 5 wide.

The humps in a histogram are called modes.
• If the histogram has one distinct hump, it is called unimodal.
• If the histogram has two distinct humps, it is called bimodal.
• If the histogram has three or more humps, it is called multimodal.
• If the histogram has no clear modes (all rectangles are about the same height), then it is called uniform.
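The stem and leaf and histogram rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the “ACSC” wingspans are not reproduced in these notes, so the female heights listed later in the chapter serve as stand-in data, and the function names `stem_and_leaf` and `histogram_counts` are made up for this example.

```python
from collections import defaultdict

# Stand-in data (assumption): the female heights given later in the chapter,
# used because the "ACSC" wingspan values are not available here.
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]


def stem_and_leaf(values, repeated=False):
    """Return display rows using the 10's digit as the stem.

    Assumes non-negative whole numbers. With repeated=True each stem is
    listed twice: leaves 0-4 next to the first copy, 5-9 next to the second.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 1 if (repeated and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    stems = range(min(values) // 10, max(values) // 10 + 1)
    halves = (0, 1) if repeated else (0,)
    return [
        (f"{s} | " + " ".join(str(leaf) for leaf in rows[(s, h)])).rstrip()
        for s in stems
        for h in halves
    ]


def histogram_counts(values, start, width):
    """Count values per bin [lower, lower + width).

    The lower edge of each bin is included and the upper edge is not.
    Assumes every value is at least `start`.
    """
    counts = {}
    lower = start
    while lower <= max(values):
        counts[(lower, lower + width)] = 0
        lower += width
    for v in values:
        k = start + ((v - start) // width) * width
        counts[(k, k + width)] += 1
    return counts


for row in stem_and_leaf(heights, repeated=True):
    print(row)
print(histogram_counts(heights, start=60, width=5))
```

Because the heights are clumped between 61 and 72, the plain display has only two rows, while the repeated-stem version spreads the same leaves over four rows (the last one empty). A value of exactly 70 is counted in the bin that starts at 70, not in the one that ends there.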
If there exists a vertical line that could be drawn through the “middle” of the histogram such that the right and left sides are pretty close to the same, the distribution is called symmetric. If one side of the histogram is stretched out farther than the other, then the histogram is said to be skewed in the direction of the longer tail.

[Figure: a histogram that is skewed to the left]

Any observation that stands away from the body of the distribution could be an outlier.

Center: If the data is non-symmetric, its center is measured as the median of the set. The median of a data set is the middle value of the ordered set.
• If n is odd, the median is the value that cuts the list in half
• If n is even, the median is the average of the two middle values

Find the median of the given data sets. The first is the heights of females in “ACSC” while the second is the heights of females with brown hair in “ACSC”.
(a) 61 62 62 63 63 64 65 65 66 66 69 70 70 70 72
(b) 62 63 64 66 69 70

Spread: The range of a data set is the difference between the maximum and minimum values.
• The interquartile range (IQR) is the range of the middle 50% of the data set
• The lower quartile (LQ or Q1) is the median of the lower half of the data
• The upper quartile (UQ or Q3) is the median of the upper half of the data
• IQR = UQ - LQ

Find the range and interquartile range of the heights of females in “ACSC”.
61 62 62 63 63 64 65 65 66 66 69 70 70 70 72

5-Number Summary
The five values min, Q1, median, Q3, and max are called the five-number summary of a data set. These can be found by hand as described in the previous slides, or using technology.

5-Number Summary via TI 83/84
• press STAT and choose Edit, then enter the data in L1
• press STAT and move to the CALC menu to select 1-Var Stats
• press ENTER to perform the command, then scroll down to see results

Once we have the 5-number summary of a quantitative data set, we can represent the data set in a boxplot.
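As a cross-check on the by-hand method, the median, quartiles, range, and IQR can be sketched in Python. The slides do not say whether the middle value belongs to either half when n is odd; this sketch leaves it out of both halves, which is an assumption, and `five_number_summary` is a name invented for the example.

```python
from statistics import median

# Heights of females in "ACSC", data sets (a) and (b) from the notes.
heights_a = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]
heights_b = [62, 63, 64, 66, 69, 70]


def five_number_summary(values):
    """Return min, Q1, median, Q3, max using medians of the two halves.

    Assumption: when n is odd, the middle value is excluded from both
    the lower half and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # lower half of the ordered data
    upper = data[half + n % 2:]    # upper half, skipping the middle value if n is odd
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }


summary = five_number_summary(heights_a)
data_range = summary["max"] - summary["min"]   # range = max - min
iqr = summary["Q3"] - summary["Q1"]            # IQR = UQ - LQ
print(summary, data_range, iqr)
print(five_number_summary(heights_b)["median"])
```

For data set (a) this gives a median of 65, quartiles of 63 and 70, a range of 11, and an IQR of 7; for data set (b), with n even, the median is the average of 64 and 66, which is 65.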
[Figure: boxplot drawn above a numerically scaled axis, with its parts labeled A through F]
• A = lower quartile
• B = median
• C = upper quartile
• D = Lower Fence = smallest data value that is ≥ LQ - 1.5(IQR)
• E = Upper Fence = largest data value that is ≤ UQ + 1.5(IQR)
• F = Outliers = values greater than the upper fence or less than the lower fence

Construct a boxplot for shoe sizes in “ACSC”.

Center: If the data is symmetric, its center is measured as the mean of the set. The sample mean of a data set is the average of the values:
$\bar{x} = \frac{\sum x}{n}$
The population mean is denoted $\mu$.

Spread:
• the sample variance is calculated as $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• the population variance is calculated as $\sigma^2 = \frac{\sum (x - \mu)^2}{n}$
For our purposes, this measurement will rarely be used.

Spread: Standard deviation is the positive square root of variance, and it is a measurement of how, on average, observations vary from the mean.
• the sample standard deviation is $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
• the population standard deviation is $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{n}}$

Both mean and standard deviation can be found by hand using these formulas. It is much more common to use technology (the calculator for our purposes). The 1-Var Stats command introduced earlier for the 5-number summary of a data set also reports the mean and standard deviation.

Find the mean, variance, standard deviation, and the 5-number summary of wingspans from the data in “ACSC”.
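The fence rule for boxplots and the sample formulas for mean, variance, and standard deviation can be checked with a short Python sketch. It is an illustration only: the wingspan and shoe-size data are not reproduced in these notes, so the female heights from earlier in the chapter are reused, the extra value 85 is a hypothetical observation added to show an outlier, and the function names are invented for the example.

```python
import math
import statistics

# Heights of females in "ACSC" from earlier in the chapter (Q1 = 63, Q3 = 70).
heights = [61, 62, 62, 63, 63, 64, 65, 65, 66, 66, 69, 70, 70, 70, 72]


def fences_and_outliers(values, q1, q3):
    """Return (lower fence, upper fence, outliers) as defined in the notes.

    Lower fence: smallest data value >= Q1 - 1.5 * IQR.
    Upper fence: largest data value <= Q3 + 1.5 * IQR.
    Outliers: data values beyond those limits.
    """
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    inside = [v for v in values if low_limit <= v <= high_limit]
    outliers = sorted(v for v in values if v < low_limit or v > high_limit)
    return min(inside), max(inside), outliers


def sample_variance(values):
    """s^2 = sum((x - xbar)^2) / (n - 1), written out as in the formula."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


# No height lies beyond the limits, so the fences are the min and max.
print(fences_and_outliers(heights, q1=63, q3=70))

# Hypothetical extra observation (85) to show how an outlier is flagged.
print(fences_and_outliers(heights + [85], q1=63, q3=70))

# By-hand formulas compared with the standard library's sample statistics.
s2 = sample_variance(heights)
print(statistics.mean(heights), s2, math.sqrt(s2))
print(statistics.variance(heights), statistics.stdev(heights))
```

With Q1 = 63 and Q3 = 70, the limits are 52.5 and 80.5, so the hypothetical value 85 is flagged as an outlier while the upper fence stays at 72. In the standard library, `statistics.variance` and `statistics.stdev` divide by n - 1 (sample versions), while `statistics.pvariance` and `statistics.pstdev` divide by n (population versions).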