Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Intro to Statistics and SPSS • • • • Mean (average) Median – the middle score (even number of scores or odd number of scores) Percent Rank (percentile) – calculates the position of a datapoint in a data set. More precisely, tells you approximately how many percent of the data is less than the datapoint. Range – difference between the maximum and minimum values in the data set 2 Lower quartile – or first quartile, it is the median of the data values in the lower half of a data set Middle quartile – or second quartile, this is the overall median Upper quartile – or third quartile, it is the median of the data values in the upper half of a data set Quartiles may help in seeing the variation in a data set 3 For example (bank waiting times): lower quartile Big Bank: median upper quartile 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0 Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8 Big Bank range: 11.0 – 4.1 = 6.9 Best Bank range: 7.8 – 6.6 = 1.2 4 The five number summary consists of: ◦ ◦ ◦ ◦ ◦ The The The The The minimum value lower quartile (first quartile) median (second quartile) upper quartile (third quartile) maximum value In SPSS (was called PASW), when viewing output, first quartile is 25th percentile, second quartile is 50th percentile, and third quartile is 75th percentile 5 Quartiles are OK for characterizing data, but standard deviation is preferred by statisticians It is a measure of how far data values are spread around the mean of a data set Std dev = sqrt(sum of (deviations from the mean)2 / total number of data values – 1) Don’t calculate by hand, use SPSS (which we’ll do in a few minutes) 6 • • • • A simple way to estimate standard deviation is the standard deviation estimate Divide the range by 4 Watch for outliers. They can ruin your range estimate What is an outlier? Two or more standard deviations from the mean (plus OR minus) 7 • • • • • • • • Go back to Big Bank / Best Bank example Big Bank: range = 6.9 6.9 / 4 = 1.7 Actual standard deviation is 1.96 Best Bank: range = 1.2 1.2 / 4 = 0.3 Actual standard deviation is 0.44 Any outliers? Means are 7.2 and 6.7 Big Bank: 4.1 5.2 5.6 6.2 6.7 7.2 7.7 7.7 8.5 9.3 11.0 Best Bank: 6.6 6.7 6.7 6.9 7.1 7.2 7.3 7.4 7.7 7.8 7.8 8 Nice way to view a data set A histogram is a chart similar to a dotplot created by defining a set of bins and counting how many data points lie in each bin. Bars are drawn with height proportional to the number of data points in each bin. 9 10 While Excel can do some basic statistics, it is not considered a serious statistics tool You really should use something like SPSS or SAS We’ll use SPSS since DePaul has a site license 11 • Copy the dataset Grades.xls from the QRC website (OlderData) to My Documents and start SPSS (or try the file IncomeGaps.xls) • Using SPSS, open the Grades.xls spreadsheet • Change the variable names and make sure the data is numeric, not text • Click on Analyze -> Descriptive Statistics -> Frequencies 12 Be careful! If the numeric fields in the dataset have any $, % or #, SPSS will have difficulty converting these to numeric In particular, if the data has dollar signs, have SPSS first convert the field to Dollar, then convert it to Numeric (IncomeGaps.xls) 13 Using the grades for Exam 2, find the ◦ 5 number summary (minimum, 1st quartile, median, 3rd quartile, maximum) ◦ Mean ◦ Range ◦ What is the standard deviation? 14 Let’s say you have just performed a survey. One of the questions you ask is, what type of home computer Internet connection do you have? Answers can be: none, dial-up, dsl, cable, other, not sure. 15 Here are some of your results Respondent ID 11111 11112 11113 11114 11115 11116 Cable Type no ds cm dk du du Where no = none; ds = dsl; cm = cable modem; du = dial up; dk = don’t know; ot = other 16 • • • • • • You can use SPSS to count the occurrences of data items, just like a pivot table Enter your data into SPSS Click on Analyze / Descriptive Statistics / Frequencies Move the variable that you want to count from the left box to the right box Make sure Display Frequencies Table is checked Run it 17 Crosstabs are an extension of pivot tables Let’s say you have asked a number of students: How many schools did you apply to? You get results something like the following (in a spreadsheet): 18 Respondent ID 1 2 3 4 5 6 7 8 9 10 Sex F M F F M M F F F M Number Schools 2 6 1 4 9 10 3 2 7 5 19 Now open the data in SPSS Then pull down the menu Analyze and click on Descriptive Statistics, then Crosstabs What variable do you want in the row? The column? When ready, click OK to perform the crosstab. Let’s do the activity. 20