ISP 121
Week 4
Intro to Statistics

Descriptive Statistics
• Average, or mean
• Median – the middle score when the data are sorted
• Percent Rank – the position of a data point within a data set. More precisely, it tells you approximately what percentage of the data is less than the data point.
• Range – the difference between the maximum and minimum values in the data set

Descriptive Statistics (continued)
• Lower quartile – or first quartile, the median of the data values in the lower half of a data set
• Middle quartile – or second quartile, the overall median
• Upper quartile – or third quartile, the median of the data values in the upper half of a data set
• Quartiles help show the variation in a data set

Quartiles
• For example (bank waiting times):

  Big Bank:   4.1  5.2  5.6  6.2  6.7  7.2  7.3  7.7  8.5  9.3  11.0
  Best Bank:  6.6  6.7  6.7  6.9  7.1  7.2  7.3  7.4  7.7  7.8  7.8

              lower quartile   median   upper quartile
  Big Bank:        5.6           7.2         8.5
  Best Bank:       6.7           7.2         7.7

  Big Bank range: 11.0 - 4.1 = 6.9
  Best Bank range: 7.8 - 6.6 = 1.2

Descriptive Statistics (continued)
• The five number summary consists of:
  – The minimum value
  – The lower quartile (first quartile)
  – The median (second quartile)
  – The upper quartile (third quartile)
  – The maximum value

Standard Deviation
• Quartiles are OK for characterizing data, but standard deviation is preferred by statisticians
• It is a measure of how far data values are spread around the mean of a data set
• Std dev = sqrt( sum of (deviations from the mean)^2 / (total number of data values - 1) )
• In practice, just use Excel's =stdev function

Standard Deviation (continued)
• A simple way to estimate standard deviation is the range rule of thumb
• Divide the range by 4
• Watch for outliers. These can ruin your range rule of thumb.

Standard Deviation (continued)
• Go back to the Big Bank / Best Bank example
• Big Bank: range = 6.9
  – 6.9 / 4 = 1.7
  – Actual standard deviation is 1.96
• Best Bank: range = 1.2
  – 1.2 / 4 = 0.3
  – Actual standard deviation is 0.44

Standard Deviation (continued)
• Another basic rule of thumb:
  – If a data point lies more than two standard deviations from the mean, it can be called an outlier

Histograms
• A nice way to view a data set
• A histogram is a chart, similar to a dotplot, created by defining a set of bins and counting how many data points lie in each bin. Bars are drawn with height proportional to the number of data points in each bin.

[Figure: Example histogram. Vertical axis: Frequency (0 to 50); horizontal axis: Bin (40 to 110)]

Excel Functions
• Most, if not all, of these functions are built into Excel
  =average(range)    where range can be like C3:C40
  =median(range)
  =percentrank(range,value)
  =quartile(range,which quartile)
  =min(range)
  =max(range)
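
For readers who want to check the numbers outside Excel, here is a minimal Python sketch of the measures above, run on the Big Bank / Best Bank waiting times. The function names (five_number_summary, sample_std_dev, range_rule_estimate, outliers, histogram_counts) are made up for illustration and are not part of the course material. The bin edges in the histogram call are an arbitrary choice, and counting a point in a bin when edge <= x < next edge is an assumption of this sketch.

```python
import math


def median(sorted_vals):
    """Middle value of an already sorted list (mean of the two middle values if even)."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum.

    Quartiles are the medians of the lower and upper halves; when the
    number of values is odd, the overall median is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


def sample_std_dev(values):
    """sqrt( sum of squared deviations from the mean / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def range_rule_estimate(values):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (max(values) - min(values)) / 4


def outliers(values):
    """Points lying more than two standard deviations from the mean."""
    mean = sum(values) / len(values)
    sd = sample_std_dev(values)
    return [x for x in values if abs(x - mean) > 2 * sd]


def histogram_counts(values, edges):
    """Count points per bin, where bin i covers edges[i] <= x < edges[i + 1]."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts


big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
best_bank = [6.6, 6.7, 6.7, 6.9, 7.1, 7.2, 7.3, 7.4, 7.7, 7.8, 7.8]

print(five_number_summary(big_bank))          # (4.1, 5.6, 7.2, 8.5, 11.0)
print(round(sample_std_dev(big_bank), 2))     # 1.96
print(round(sample_std_dev(best_bank), 2))    # 0.44
print(round(range_rule_estimate(big_bank), 1))
print(outliers(big_bank))
print(histogram_counts(big_bank, [4, 6, 8, 10, 12]))  # [3, 5, 2, 1]
```

Note that Excel's =quartile function interpolates between data points, so its results can differ slightly from the median-of-halves quartiles computed here.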