Stat 1793 Chapter 3 – Statistics for Describing, Exploring and Comparing Data

In this chapter we look at descriptive statistics: numbers calculated from sample data to summarize or describe the data. The set of methods used to draw conclusions about populations is called inferential statistics, and we will look at these towards the end of the course.

What types of descriptive measures will we look at? Basically three:

Measures of central tendency (mean, median, mode, midrange)
Measures of variation (range, variance, standard deviation, etc.)
Measures of relative standing (z-scores, percentiles)

Section 3-1 Measures of Centre

The word average is used in phrases common to everyday conversation, for example batting average, or the average life expectancy of a battery or a human being. The word is derived from the French word avarie, which refers to the money that shippers contributed to help compensate for losses suffered by other shippers whose cargo did not arrive safely, i.e. the losses were shared, with everyone contributing an average amount.

Here, we plan on measuring the centre of the distribution of data in four ways:

1. Mean

This is found by adding the items in a set and dividing by the number of items. It is also known as the arithmetic mean or average, and it is the most common measure of central tendency. The population mean is denoted $\mu$ (mu), and the sample mean is denoted $\bar{X}$ (x-bar).

The mathematical formula for the population mean can be written as

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $N$ is the number of elements in the population,

and for the sample mean as

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size.

E.g. Find the mean of the following measurements: 2, 5, 7, 10, 11, 13

$n = 6$, $\sum X_i = 48$, $\bar{X} = \frac{48}{6} = 8$

2. Median

The middle value of a data set arranged in numerical order, when there is an odd number of observations. With an even number of observations, the median equals the average of the two middle values.
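The two definitions above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `median_by_hand` is made up for illustration, and the five-value data set is simply the mean example with its last measurement dropped so that the odd case can be shown as well.

```python
from statistics import mean, median

# Measurements from the worked mean example
data = [2, 5, 7, 10, 11, 13]

n = len(data)        # sample size
total = sum(data)    # sum of the X_i
x_bar = total / n    # sample mean: sum divided by n


def median_by_hand(values):
    """Middle value of the ordered data (hypothetical helper).

    With an even number of values, average the two middle ones.
    """
    ordered = sorted(values)
    count = len(ordered)
    mid = count // 2
    if count % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


med_even = median_by_hand(data)       # six values: average of 7 and 10
med_odd = median_by_hand(data[:-1])   # five values (13 dropped): middle value

# Cross-check against the standard library
assert x_bar == mean(data)
assert med_even == median(data)

print(n, total, x_bar, med_even, med_odd)
```

Sorting first matters: the "middle" value is only meaningful once the data are in numerical order.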
For example, to find the median of 5, 3, 2, 7, 4, we would first order the data: 2, 3, 4, 5, 7. The median is the middle value, 4. With a data set containing an even number of measurements, 10, 8, 13, 14, 9, 8, we would again order the data: 8, 8, 9, 10, 13, 14. Here the median is the average of the two middle values, (9 + 10)/2 = 9.5.

The word is derived from the Latin word medius, which means middle. Note that about 50% of the observations in the data set are smaller than the median and about 50% are greater.

In certain situations the median provides the quickest and most economical way to locate the centre of a distribution. For example, suppose 10 000 lightbulbs are installed in a factory. The easiest way to find a central number describing the life expectancy of the bulbs is to note how much time elapses before exactly 50% of them must be replaced.

3. Mode

The most frequently occurring number in the data set. With the data set 10, 8, 13, 14, 9, 8, the mode equals 8. With the data set 5, 3, 2, 7, 4, there is no mode. When two values in a data set occur with the same greatest frequency, we call the data set bimodal.

4. Midrange

The value located halfway between the maximum and minimum values. It is found by taking the average of those two values.

Find the mean, median, mode and midrange for the cholesterol example: A doctor testing cholesterol levels for 20 young patients found the following readings (mg/ml):

210 209 212 208 217 207 210 203 208 210
210 199 215 221 213 218 202 218 200 214

Ordered:

199 200 202 203 207
208 208 209 210 210
210 210 212 213 214
215 217 218 218 221

$\sum X_i = 4204$, $\bar{X} = \frac{4204}{20} = 210.2$
Median = 210 (the 10th and 11th ordered values are both 210)
Mode = 210 (it occurs four times)
Midrange = (199 + 221)/2 = 210

Both the median and the mean are good measures of central tendency, but the median is less sensitive to extreme values. If the distribution is symmetric with a single peak, all four measures are equivalent. In a distribution that is positively skewed (tail to the right), the median will be less than the mean (Example: income in an Irving household).
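As a check on the cholesterol example and on the remark about extreme values, the four measures can be computed with Python's standard `statistics` module. This is a minimal sketch: the 20 readings are taken from the example, while the extra reading of 400 is a hypothetical value added only to show how an extreme observation in the right tail pulls the mean above the median.

```python
from statistics import mean, median, multimode

# The 20 cholesterol readings from the example
readings = [210, 209, 212, 208, 217, 207, 210, 203, 208, 210,
            210, 199, 215, 221, 213, 218, 202, 218, 200, 214]

x_bar = mean(readings)                            # sum of readings / 20
med = median(readings)                            # average of 10th and 11th ordered values
modes = multimode(readings)                       # list of the most frequent value(s)
midrange = (min(readings) + max(readings)) / 2    # halfway between min and max

print(x_bar, med, modes, midrange)

# Hypothetical extreme reading (not part of the example data)
with_extreme = readings + [400]
mean_shift = mean(with_extreme) - x_bar
median_shift = median(with_extreme) - med

# The mean moves by several units; the median does not move at all here
print(round(mean_shift, 2), median_shift)
```

`multimode` returns a list, so a bimodal data set would give two values rather than raising an error.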
In a negatively skewed distribution (tail to the left), the median will be greater than the mean.

The mean is of greater importance in statistical inference, because $\bar{X}$ has certain properties that make for more powerful and robust results.

Finding the Mean from a Frequency Distribution

$\bar{X} = \frac{\sum_i f_i x_i}{\sum_i f_i}$, where $f_i$ is the frequency of the $i$th category and $x_i$ is the midpoint of the $i$th category.

Page 90, #28.

Temperature    Frequency ($f_i$)    Midpoint ($x_i$)    $f_i x_i$
96.5-96.8              1                96.65               96.65
96.9-97.2              8                97.05              776.40
97.3-97.6             14                97.45             1364.30
97.7-98.0             22                97.85             2152.70
98.1-98.4             19                98.25             1866.75
98.5-98.8             32                98.65             3156.80
98.9-99.2              6                99.05              594.30
99.3-99.6              4                99.45              397.80
Total                106                                 10405.70

$\bar{X} = \frac{10405.7}{106} = 98.17$

This is close to the 98.2°F found using the original data, and it appears to be significantly less than the commonly assumed population mean of 98.6°F.

Also do 3-1 #2, 4, 12, 18
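The frequency-table calculation above can be reproduced in a short Python sketch. The class limits and frequencies are those of the temperature table; the variable names are chosen here for illustration only. Each midpoint is computed as the average of the class limits and weighted by its frequency.

```python
# (lower limit, upper limit, frequency) for each temperature class in the table
classes = [
    (96.5, 96.8, 1),
    (96.9, 97.2, 8),
    (97.3, 97.6, 14),
    (97.7, 98.0, 22),
    (98.1, 98.4, 19),
    (98.5, 98.8, 32),
    (98.9, 99.2, 6),
    (99.3, 99.6, 4),
]

total_f = 0       # sum of the frequencies f_i
total_fx = 0.0    # sum of the products f_i * x_i

for lower, upper, f in classes:
    midpoint = (lower + upper) / 2    # x_i, the class midpoint
    total_f += f
    total_fx += f * midpoint

grouped_mean = total_fx / total_f

print(total_f, round(total_fx, 1), round(grouped_mean, 2))
```

Because every observation in a class is treated as if it sat at the class midpoint, the result is an approximation to the mean of the raw data rather than an exact value.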