Statistics 203
Solutions to Assignment #2

1. In a psychological experiment, the time on task was recorded for ten subjects under a 5-minute time constraint. The measurements are in seconds:

175 190 250 230 240 200 185 190 225 265

(a) Find the mean and range.
(b) Find the mean deviation without using a computer.
(c) Find the standard deviation and variance without using a computer.
(d) Check all your results using a computer. Comment on any differences you may see and why they occur. To check your value of MD in SPSS use Transform > Compute. Create a new variable (target variable) called MD whose values are abs(time-215). Then use Analyze > Descriptive Statistics > Descriptives to find the mean of this new variable, which is the mean deviation.

1. a) Mean = \(\bar{x}\) = 2150/10 = 215
Range = H - L = Max - Min = 265 - 175 = 90

b) MD = (|175-215| + |190-215| + ... + |265-215|)/10 = 270/10 = 27

c) Standard deviation:
\[ s = \sqrt{\frac{\sum_{i=1}^{10} (x_i - \bar{x})^2}{10}} = \sqrt{865} \approx 29.41 \]
Variance = \(s^2\) = 8650/10 = 865

d) Follow the SPSS instructions given in part (d) of the question. The output is:

N Valid           10
N Missing         0
Mean              215.0000
Median            212.5000
Std. Deviation    31.00179
Variance          961.111
Range             90.00

MD
N Valid           10
N Missing         0
Mean              27.0000

NOTE: If you multiply the variance you got by hand by 10/9 you will get the one from SPSS (865 x 10/9 = 961.111). This is because SPSS divides by N-1 instead of N. The same holds for the standard deviation: the square root of 961.111 is 31.00, not 29.41.

2. Consider the EmployeeData.
(a) Determine the value of the most appropriate measure of central tendency and spread for variables Salary and Jobcat.
(b) What is the mean and median of the data?
(c) Hand construct a frequency and relative frequency histogram for the Salary data using classes with width 10,000 starting with class [15,000-25,000).
(d) Using the grouped histograms constructed above, estimate the mean and median of the salary data. How close are these to the true values? What is the variance and standard deviation of the grouped data?

a) Salary: Since the level of Salary is interval and the histogram of Salary is skewed to the right, the median would be the appropriate measure of centre to use.
Median = $28,875.00 (note: (N+1)/2 is the position of the median, not the value of the median)

Jobcat: Since the level of Jobcat is nominal, the best measure to use would be the mode.

Note: Salary is interval data and the histogram is greatly skewed to the right, so the mean will be greatly influenced by the extreme data and will not represent most of the data very well. The median, however, is not as sensitive to the extreme data, so it will represent most of the data well.

b) Mean = $34,419.57
Mdn = $28,875.00

c)
Class                  f    Cum f    rf      %     Cum rf   Cum %
( 15,000- 25,000]     143   143     .302    30.2    .302     30.2
( 25,000- 35,000]     195   338     .411    41.1    .713     71.3
( 35,000- 45,000]      53   391     .112    11.2    .825     82.5
( 45,000- 55,000]      26   417     .055     5.5    .880     88.0
( 55,000- 65,000]      21   438     .044     4.4    .924     92.4
( 65,000- 75,000]      19   457     .040     4.0    .964     96.4
( 75,000- 85,000]       7   464     .015     1.5    .979     97.9
( 85,000- 95,000]       4   468     .008     0.8    .987     98.7
( 95,000-105,000]       4   472     .008     0.8    .996     99.6
(105,000-115,000]       1   473     .002     0.2    .998     99.8
(115,000-125,000]       0   473     .000     0.0    .998     99.8
(125,000-135,000]       1   474     .002     0.2   1.000    100.0

d) Mean:
\[ \bar{X} = \frac{\sum f m}{N} = \frac{143 \cdot 20{,}000 + 195 \cdot 30{,}000 + \dots + 1 \cdot 130{,}000}{474} = \$34{,}345.99 \]
m = midpoint of the class interval
f = frequency of a class interval
N = total number of scores

Median:
\[ \text{median} = L + \left( \frac{N/2 - cf_b}{f} \right) i = 25{,}000 + \left( \frac{474/2 - 143}{195} \right) \cdot 10{,}000 = \$29{,}820.51 \]
N = number of cases in the distribution (474)
cf_b = cumulative frequency below the lower limit of the critical interval
L = lower limit of the critical interval
f = frequency within the critical interval
i = class-interval size

Variance:
\[ S^2 = \frac{\sum f (m - \bar{X})^2}{N} = \frac{\sum f m^2}{N} - \bar{X}^2 = \frac{143 \cdot 20{,}000^2 + 195 \cdot 30{,}000^2 + \dots + 1 \cdot 130{,}000^2}{474} - (34{,}345.99)^2 = \frac{6.974 \times 10^{11}}{474} - 34{,}345.99^2 = 2.9166 \times 10^{8} \]

Standard deviation:
\[ S = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}} = \sqrt{\frac{\sum f m^2}{N} - \bar{X}^2} = 17{,}078.09 \]

Note: the formulas we used to estimate the mean and median from the grouped histograms (refer to page 93 of the textbook) are different from the formulas we used to calculate the true mean and median from the raw data (refer to page 111 of the textbook), even though the grouped values are fairly close to their ungrouped (true) values. Make sure you understand the differences between the formulas.

[Figure: frequency histogram of Current Salary. x-axis: Current Salary ($25,000 to $125,000); y-axis: Count.]

[Figure: % relative frequency histogram of Current Salary. x-axis: Current Salary ($25,000 to $125,000); y-axis: Percent (10% to 40%).]

Note: a relative frequency histogram is different from a frequency histogram and from a cumulative frequency histogram or cumulative frequency polygon. Take care that its y-axis shows the percentage of the frequency, not the count.

3. Open the 1991 U.S. General Social Survey. (It is found in the same place that EmployeeData is found in SPSS. A grad student is getting it for me to place on my webpage.)
(a) Construct a percentage bar chart for the variable Life. How do most people in this survey view life?
(b) Determine the range, mean deviation and standard deviation for the age variable. Note: You will have to use Transform > Compute again and the formula for MD to calculate the MD value.
(c) What are the most appropriate measures of central tendency and spread for the variables Happy and Age?

a) [Figure: percentage bar chart of "Is Life Exciting or Dull". Categories: Missing, Exciting, Routine, Dull; y-axis: Percent (0 to 40).]

b) Age of Respondent
N Valid           1514
N Missing         3
Mean              45.63
Median            41.00
Std. Deviation    17.808
Variance          317.140
Range             71

From the Transform > Compute menu we can calculate the mean deviation to be: MD = 15.0554

c) Since Happy is ordinal level, the most appropriate measures of centre and spread would be the median and range.
Since Age is interval level and skewed to the right, the mean is seriously affected by the extreme values. Consequently, the mean does not represent most of the data properly; the median is the appropriate measure of central tendency, and the standard deviation or variance is the appropriate measure of spread.

Note: a measure of central tendency is different from a measure of spread. The mode, mean and median can be used to measure central tendency, whereas the MD, standard deviation and variance are used to measure spread. Refer to p. 117 of the textbook, “Compare Measures of Variability”: the standard deviation (like the mean deviation) is an interval-level measure and therefore cannot be used with nominal or ordinal data. The range is generally regarded as a preliminary or rough index of the variability of a distribution. It is quick and simple to obtain but not very reliable, and it can be applied to interval or ordinal data.
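For readers without SPSS, the hand calculations in Question 1 and the grouped estimates in Question 2(d) can be cross-checked with a short script. The following is only a sketch in Python, which the assignment itself does not use; the function names `describe` and `grouped_stats` are hypothetical helpers introduced here for illustration. The data are the ten times from Question 1 and the class frequencies from the table in 2(c). Small differences in the last decimal place from the hand-worked values are possible because the script does not round the mean before squaring it.

```python
import math

# Question 1 data: time on task in seconds (taken from the assignment).
times = [175, 190, 250, 230, 240, 200, 185, 190, 225, 265]


def describe(xs):
    """Mean, range, mean deviation, and variance/SD with both divisors."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    return {
        "mean": mean,
        "range": max(xs) - min(xs),
        "md": sum(abs(x - mean) for x in xs) / n,
        "var_n": ss / n,            # divide by N (the hand calculation)
        "sd_n": math.sqrt(ss / n),
        "var_n1": ss / (n - 1),     # divide by N-1 (what SPSS reports)
        "sd_n1": math.sqrt(ss / (n - 1)),
    }


# Question 2(c) table: lower class limits and frequencies, class width 10,000.
width = 10000
lower_limits = [15000 + width * k for k in range(12)]
freqs = [143, 195, 53, 26, 21, 19, 7, 4, 4, 1, 0, 1]


def grouped_stats(lowers, fs, i):
    """Grouped-data mean, median, variance and SD using class midpoints."""
    n = sum(fs)
    mids = [low + i / 2 for low in lowers]
    mean = sum(f * m for f, m in zip(fs, mids)) / n
    var = sum(f * m * m for f, m in zip(fs, mids)) / n - mean ** 2
    # Median: find the critical interval, then interpolate within it.
    median = None
    cum_below = 0  # cumulative frequency below the current class
    for low, f in zip(lowers, fs):
        if cum_below + f >= n / 2:
            median = low + (n / 2 - cum_below) / f * i
            break
        cum_below += f
    return {"mean": mean, "median": median, "var": var, "sd": math.sqrt(var)}


q1 = describe(times)
q2 = grouped_stats(lower_limits, freqs, width)

for name, value in q1.items():
    print(f"Q1 {name}: {value:.4f}")
for name, value in q2.items():
    print(f"Q2(d) {name}: {value:.2f}")
```

Reporting both divisors side by side makes the point of the NOTE in Question 1 concrete: `var_n1` equals `var_n` multiplied by N/(N-1).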