Module: SBSS509
Assignment 1
Date: 24/04/13

Instructions: Attempt questions 1-11. Type the answers after every question as indicated. Insert charts and tables where required by copying from Excel. Save the answer file under your student number and email to [email protected]. Submit no later than 06/05/13.
_________________________________________________________________________________
Refer to the Excel file ‘Assign1 Data’ available on the site http://196.21.81.131 (follow the links).

1. Refer to the Excel sheet ‘Population’. Using Excel, draw a random sample of size n=100 from the population and paste the sample on the sheet ‘Sample’. Explain the procedure.

Answer: Two methods for selecting a random sample:
1) Copy the following formula to 100 cells:
   =INDIRECT(ADDRESS(RANDBETWEEN(3,27),RANDBETWEEN(1,40),1,1,"Population"))
   Each copy returns the value of a randomly chosen cell in rows 3 to 27, columns 1 to 40 of the ‘Population’ sheet.
or
2) Use the ‘Sampling’ procedure in the ‘Data Analysis’ add-in.

2. Refer to the Excel sheet ‘Descriptive’. Determine the listed descriptive statistics for the ‘Insulin’ data.

Answer:

Statistic              Value      Excel formula
Mean                   9.7911     =AVERAGE(A2:A57)
Standard Deviation     7.4918     =STDEV(A2:A57)
Sample Variance        56.1270    =VAR(A2:A57)
Standard Error         1.0011     =C3/SQRT(56)

Mode                   5.4000     =MODE(A2:A57)
Kurtosis               12.4264    =KURT(A2:A57)
Skewness               3.0386     =SKEW(A2:A57)

Range                  44.9000    =MAX(A2:A57)-MIN(A2:A57)
Inter quartile range   6.2250     =QUARTILE(A2:A57,3)-QUARTILE(A2:A57,1)

Minimum                3.1000     =MIN(A2:A57)
Lower outer fence      -13.2750   =QUARTILE(A2:A57,1)-3*C12
Lower inner fence      -3.9375    =QUARTILE(A2:A57,1)-1.5*C12
First quartile         5.4000     =QUARTILE(A2:A57,1)
Median                 7.4000     =MEDIAN(A2:A57)
Third quartile         11.6250    =QUARTILE(A2:A57,3)
Upper inner fence      20.9625    =QUARTILE(A2:A57,3)+1.5*C12
Upper outer fence      30.3000    =QUARTILE(A2:A57,3)+3*C12
Maximum                48.0000    =MAX(A2:A57)

3. Comment on the shape of the ‘Insulin’ data by using the descriptive statistics calculated in (2.) above. Refer specifically to the skewness and outliers/potential outliers.
Answer:
- Kurtosis > 0 ⇒ Peaked (leptokurtic)
- Mode < Median < Mean ⇒ Positively skewed
- 2 values beyond the upper outer fence ⇒ 2 outliers
- 0 values between the inner and outer fences ⇒ No potential outliers

4. Explain how the ‘Insulin’ data ideally should be distributed. Refer specifically to the mean and standard deviation.

Answer: Data should ideally be normally distributed, i.e.,
- Kurtosis = 0
- Unimodal
- Symmetric (bell shaped)
- Mode = Median = Mean
- Approximately 68% of values within 1 standard deviation of the mean
- Approximately 95% of values within 1.96 standard deviations of the mean
- Approximately 99% of values within 2.57 standard deviations of the mean

5. Construct a histogram for the ‘Insulin’ data and paste it on the ‘Histogram’ sheet.

Answer: [Histogram of the ‘Insulin’ data: Frequency (axis from 0 to 40) against Classes]

6. Using the histogram for the ‘Insulin’ data, determine the approximate probability of someone having an insulin level of 9.5 or lower.

Answer: Approximate probability of an insulin level of 9.5 or lower = 37/56

7. Explain the difference between the standard deviation and standard error.

Answer: Pages 105 & 107 in the prescribed text
- Standard deviation: An estimate of the variability of the scores of a group around the mean (can roughly be thought of as the average distance of the scores from the mean).
- Standard error: The variability of the sampling distribution of a statistic.

8. Refer to the Excel sheet ‘Random’. Sort the student numbers randomly into the two groups. Explain the procedure.

Answer: Insert a column containing ‘=RAND()’ next to the student numbers, then sort both columns together by the random numbers in ascending order. The student numbers are then randomly assigned to the 2 groups.
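The sort-by-random-key procedure in the answer to question 8 can also be sketched outside Excel. The Python below is an illustration only, not part of the assignment solution: the function name `assign_two_groups` and the `seed` parameter are hypothetical (Excel's RAND() cannot be seeded; the seed is used here only to make the run repeatable), and the student numbers are the twenty listed for this question.

```python
import random

# Student numbers as listed on the 'Random' sheet for question 8.
STUDENT_NUMBERS = [
    20122088, 20122218, 20122602, 20122741, 20123179,
    20123999, 20124379, 20124988, 20125554, 20125618,
    20126203, 20126402, 20127053, 20127402, 20128428,
    20128452, 20128707, 20130738, 20131034, 20131164,
]


def assign_two_groups(student_numbers, seed=None):
    """Attach a random key to each number, sort by key, split in half."""
    rng = random.Random(seed)
    # Equivalent of the '=RAND()' column next to the student numbers.
    keyed = [(rng.random(), number) for number in student_numbers]
    # Equivalent of sorting both columns by the random numbers, ascending.
    keyed.sort()
    ordered = [number for _, number in keyed]
    half = len(ordered) // 2
    # First half becomes group 1, the remainder becomes group 2.
    return ordered[:half], ordered[half:]


group_1, group_2 = assign_two_groups(STUDENT_NUMBERS, seed=2013)
print("Group 1:", group_1)
print("Group 2:", group_2)
```

Sorting on an independent random key gives every ordering of the student numbers the same chance, so each student is equally likely to land in either group.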
Group   Student number   =RAND()
1       20122088         0.47093
1       20122218         0.559667
1       20122602         0.281182
1       20122741         0.653977
1       20123179         0.204334
1       20123999         0.447392
1       20124379         0.160771
1       20124988         0.360727
1       20125554         0.46981
1       20125618         0.178782
2       20126203         0.652603
2       20126402         0.968829
2       20127053         0.935877
2       20127402         0.946038
2       20128428         0.673428
2       20128452         0.747869
2       20128707         0.615629
2       20130738         0.334061
2       20131034         0.684618
2       20131164         0.146975

9. List and explain the uses of Pearson’s correlation coefficient.

Answer:
- Determine the strength of the linear relationship between variables. The correlation will be between -1 and 0 for a negative linear relationship and between 0 and 1 for a positive linear relationship.
- Correlation is used in the construction of a regression equation. The regression equation is used to predict the values of a dependent variable from a predictor (independent) variable.
- Correlation is used to determine the repeatability/reliability of tests. A high correlation between test results implies a reliable test.

10. Refer to the Excel sheet ‘Regression’. Calculate the correlation, r, between X and Y and comment on whether a linear model will be appropriate to model the relationship between X and Y. Also construct the scatterplot illustrating the relationship between X and Y.

Answer: r = 0.914134
The correlation is high (close to +1), hence a strong positive linear relationship exists between X and Y. A linear model is appropriate for the data.

[Scatterplot of Y against X]

11. Refer to Exercises 3, 4 and 5 on page 145 of the text ‘Research Methods in Physical Activity’. Complete these three exercises.

Answer:

3.
$b = r \frac{s_Y}{s_X} = (0.79)(6.3)(305)^{-1} = 0.01631803279$
$a = M_Y - b M_X = 52.6 - 46.55534754 = 6.044652459$
$\hat{y} = a + bx = 6.044652459 + 0.01631803279x$

4.
$\hat{y} = 6.044652459 + 0.01631803279(2954) = 54.24812132$
$\hat{y} = 6.044652459 + 0.01631803279(2688) = 49.9075246$

5.
$s_{Y \cdot X} = s_Y \sqrt{1 - r^2} = 3.862573106$
The true value lies within the interval of the predicted value plus or minus one standard error of the estimate approximately 68% of the time.
_________________________________________________________________________________
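The regression arithmetic in the answer to question 11 can be cross-checked with a few lines of Python. This is a sketch under stated assumptions rather than part of the original solution: r, s_Y, s_X and M_Y are the values that appear in the answer, while M_X = 2853 is not stated there and is inferred from the product b·M_X = 46.55534754. The variable and function names are illustrative.

```python
import math

# Values taken from the answer to question 11.
r = 0.79      # correlation between X and Y
s_y = 6.3     # standard deviation of Y
s_x = 305     # standard deviation of X
m_y = 52.6    # mean of Y
# Assumption: M_X is not stated in the answer; 2853 is inferred from
# b * M_X = 46.55534754.
m_x = 2853

# Slope and intercept of the regression line y_hat = a + b * x.
b = r * s_y / s_x
a = m_y - b * m_x


def predict(x):
    """Predicted Y for a given X using the fitted line."""
    return a + b * x


# Standard error of the estimate, s_Y.X = s_Y * sqrt(1 - r^2).
see = s_y * math.sqrt(1 - r ** 2)

print(f"b = {b:.11f}")
print(f"a = {a:.9f}")
print(f"y_hat(2954) = {predict(2954):.8f}")
print(f"y_hat(2688) = {predict(2688):.8f}")
print(f"s_Y.X = {see:.9f}")
```

The printed values should agree with the hand calculations to within rounding in the last displayed digit.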