Download 20130424 SBSS509 Assign1 Sol

Module: SBSS509
Assignment 1
Date: 24/04/13
Instructions:
Attempt questions 1-11. Type the answers after every question as indicated. Insert charts and tables
where required by copying from Excel. Save the answer file under your student number and email to
[email protected]. Submit no later than 06/05/13.
_________________________________________________________________________________
Refer to the Excel file ‘Assign1 Data’ available on the site http://196.21.81.131 (follow the links).
1.
Refer to the Excel sheet ‘Population’. Using Excel, draw a random sample of size n=100 from
the population and paste the sample on the sheet ‘Sample’. Explain the procedure.
Answer:
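Two methods for selecting a random sample:
1) Copy the following formula to 100 cells (each copy picks a random cell from rows 3 to 27 and columns 1 to 40 of the ‘Population’ sheet, so sampling is with replacement):
=INDIRECT(ADDRESS(RANDBETWEEN(3,27),RANDBETWEEN(1,40),1,1,"Population")) or
2) Use the ‘Sampling’ procedure in the ‘Data Analysis’ add-in.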
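For readers working outside Excel, the first method can be sketched in Python. This is a minimal sketch, assuming the population occupies a grid of 25 rows by 40 columns (as the RANDBETWEEN ranges imply); the `population` grid is placeholder data rather than the assignment's values, and `draw_sample` is a hypothetical helper name. Like the Excel formula, it samples with replacement.

```python
import random

def draw_sample(grid, n, rng=None):
    """Draw n values with replacement from a 2-D grid (a list of rows)."""
    rng = rng or random.Random()
    sample = []
    for _ in range(n):
        row = rng.randrange(len(grid))        # random row, like RANDBETWEEN(3,27)
        col = rng.randrange(len(grid[row]))   # random column, like RANDBETWEEN(1,40)
        sample.append(grid[row][col])
    return sample

# Placeholder population: 25 rows x 40 columns = 1000 values (not the real data).
population = [[r * 40 + c for c in range(40)] for r in range(25)]

# A fixed seed is used only so the sketch is repeatable.
sample = draw_sample(population, 100, random.Random(2013))
print(len(sample))
```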
2.
Refer to the Excel sheet ‘Descriptive’. Determine the listed descriptive statistics for the
‘Insulin’ data.
Answer:
Value       Formula                                     Statistic
9.7911      =AVERAGE(A2:A57)                            = Mean
7.4918      =STDEV(A2:A57)                              = Standard Deviation
56.1270     =VAR(A2:A57)                                = Sample Variance
1.0011      =C3/SQRT(56)                                = Standard Error
5.4000      =MODE(A2:A57)                               = Mode
12.4264     =KURT(A2:A57)                               = Kurtosis
3.0386      =SKEW(A2:A57)                               = Skewness
44.9000     =MAX(A2:A57)-MIN(A2:A57)                    = Range
6.2250      =QUARTILE(A2:A57,3)-QUARTILE(A2:A57,1)      = Inter quartile range
3.1000      =MIN(A2:A57)                                = Minimum
-13.2750    =QUARTILE(A2:A57,1)-3*C12                   = Lower outer fence
-3.9375     =QUARTILE(A2:A57,1)-1.5*C12                 = Lower inner fence
5.4000      =QUARTILE(A2:A57,1)                         = First quartile
7.4000      =MEDIAN(A2:A57)                             = Median
11.6250     =QUARTILE(A2:A57,3)                         = Third quartile
20.9625     =QUARTILE(A2:A57,3)+1.5*C12                 = Upper inner fence
30.3000     =QUARTILE(A2:A57,3)+3*C12                   = Upper outer fence
48.0000     =MAX(A2:A57)                                = Maximum
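The fence values reported above follow directly from the two quartiles. The following is a minimal Python sketch of that arithmetic; `tukey_fences` is a hypothetical helper name, and only the quartiles 5.4 and 11.625 are taken from the answer.

```python
def tukey_fences(q1, q3):
    """Return the inner and outer fences computed from the first and third quartiles."""
    iqr = q3 - q1  # inter quartile range
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

# Quartiles reported for the 'Insulin' data.
fences = tukey_fences(5.4, 11.625)
for name, value in fences.items():
    print(f"{name}: {value:.4f}")
```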
3.
Comment on the shape of the ‘Insulin’ data by using the descriptive statistics calculated in
(2.) above. Refer specifically to the skewness and outliers/potential outliers.
Answer:
Kurtosis > 0 ⇒ Peaked (leptokurtic)
Mode < Median < Mean ⇒ Positively skewed
2 values above the upper outer fence ⇒ 2 outliers
0 values between the inner and outer fences ⇒ No potential outliers
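The fence-based check can be sketched in Python as follows. This is an illustrative sketch only, assuming the common convention that values beyond an outer fence are outliers and values between an inner and an outer fence are potential outliers; `classify` is a hypothetical helper name, and the value 25.0 is made up to show the middle case.

```python
def classify(value, q1, q3):
    """Label a value using fences at 1.5 and 3 inter quartile ranges from the quartiles."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "outlier"                 # beyond an outer fence
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "potential outlier"       # between an inner and an outer fence
    return "not an outlier"

# Reported quartiles, with the reported maximum and minimum as test values.
print(classify(48.0, 5.4, 11.625))
print(classify(3.1, 5.4, 11.625))
print(classify(25.0, 5.4, 11.625))   # hypothetical value between the upper fences
```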
4.
Explain how the ‘Insulin’ data ideally should be distributed. Refer specifically to the mean
and standard deviation.
Answer:
Data should ideally be normally distributed, i.e.,
- Kurtosis = 0
- Unimodal
- Symmetric (Bell shaped)
- Mode = Median = Mean
- About 68% of values within 1 standard deviation of the mean
- About 95% of values within 1.96 standard deviations of the mean
- About 99% of values within 2.58 standard deviations of the mean
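The coverage percentages for a normal distribution can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; `coverage` is a hypothetical helper name, and 2.576 is used as the more precise multiplier for the 99% figure.

```python
from statistics import NormalDist

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1.0, 1.96, 2.576):
    print(f"within {k} standard deviations: {coverage(k):.4f}")
```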
5.
Construct a histogram for the ‘Insulin’ data and paste it on the ‘Histogram’ sheet.
Answer:
[Histogram of the ‘Insulin’ data: Frequency (vertical axis, 0 to 40) against Classes (horizontal axis)]
6.
Using the histogram for the ‘Insulin’ data, determine the approximate probability of
someone having an insulin level of 9.5 or lower.
Answer:
Approximate probability of an insulin level of 9.5 or lower = 37/56 ≈ 0.66
7.
Explain the difference between the standard deviation and standard error.
Answer:
Pages 105 & 107 in the prescribed text
- Standard deviation: An estimate of the variability of the scores of a group around the mean
(can roughly be thought of as the average distance of the scores from the mean).
- Standard error: The variability of the sampling distribution of a statistic.
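For the sample mean, the standard error is the standard deviation divided by the square root of the sample size, which is how the value in question 2 was obtained. The sketch below illustrates the distinction in Python; `standard_error` is a hypothetical helper name and the `scores` list is made-up data.

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Standard deviation and sample size reported for the 'Insulin' data.
print(round(standard_error(7.4918, 56), 4))

# Hypothetical scores, only to show both quantities computed from raw data.
scores = [4.2, 5.4, 5.4, 7.4, 9.1, 11.6, 20.3]
sd = statistics.stdev(scores)           # spread of the individual scores
se = standard_error(sd, len(scores))    # spread of the sample mean across samples
print(sd > se)
```

The standard error shrinks as the sample grows, whereas the standard deviation does not systematically do so.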
8.
Refer to the Excel sheet ‘Random’. Sort the student numbers randomly into the two groups.
Explain the procedure.
Answer:
Insert a column of ‘=RAND()’ values next to the student numbers and sort both columns by the random
numbers in ascending order. The first 10 student numbers in the sorted list form group 1 and the
remaining 10 form group 2, so the assignment to groups is random.
Group    Student number    =RAND()
1        20122088          0.47093
1        20122218          0.559667
1        20122602          0.281182
1        20122741          0.653977
1        20123179          0.204334
1        20123999          0.447392
1        20124379          0.160771
1        20124988          0.360727
1        20125554          0.46981
1        20125618          0.178782
2        20126203          0.652603
2        20126402          0.968829
2        20127053          0.935877
2        20127402          0.946038
2        20128428          0.673428
2        20128452          0.747869
2        20128707          0.615629
2        20130738          0.334061
2        20131034          0.684618
2        20131164          0.146975
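The same sort-by-random-key procedure can be sketched in Python. This is a minimal sketch, not the method used to produce the answer; `assign_groups` is a hypothetical helper name and the seed is arbitrary, chosen only to make the run repeatable.

```python
import random

def assign_groups(ids, rng=None):
    """Attach a random key to each ID, sort by the key, and split the list in half."""
    rng = rng or random.Random()
    keyed = sorted((rng.random(), student) for student in ids)  # like sorting by the =RAND() column
    shuffled = [student for _, student in keyed]
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

student_numbers = [
    20122088, 20122218, 20122602, 20122741, 20123179,
    20123999, 20124379, 20124988, 20125554, 20125618,
    20126203, 20126402, 20127053, 20127402, 20128428,
    20128452, 20128707, 20130738, 20131034, 20131164,
]

group1, group2 = assign_groups(student_numbers, random.Random(509))
print("Group 1:", group1)
print("Group 2:", group2)
```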
9.
List and explain the uses of Pearson’s correlation coefficient.
Answer:
- Determine the strength and direction of the linear relationship between two variables. The
correlation lies between -1 and 0 for a negative linear relationship and between 0 and 1 for a
positive linear relationship.
- Correlation is used in the construction of a regression equation. The regression equation is
used to predict the values of a dependent variable from a predictor (independent) variable.
- Correlation is used to determine the repeatability/reliability of tests. A high correlation
between test results implies a reliable test.
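To make the first use concrete, Pearson's r can be computed from paired data as the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. The Python sketch below uses made-up data; `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: one perfectly increasing and one perfectly decreasing relationship.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]
print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
```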
10.
Refer to the Excel sheet ‘Regression’. Calculate the correlation, r, between X and Y and
comment on whether a linear model will be appropriate to model the relationship between
X and Y. Also construct the scatterplot illustrating the relationship between X and Y.
Answer:
r = 0.914134
The correlation is high (close to +1), hence a strong positive linear relationship exists between X
and Y. A linear model is appropriate for the data.
[Scatterplot of Y (vertical axis, -2 to 14) against X (horizontal axis, 0 to 20)]
11.
Refer to Exercises 3, 4 and 5 on page 145 of the text ‘Research Methods in Physical Activity’.
Complete these three exercises.
Answer:
3.
$b = r\,\dfrac{s_Y}{s_X} = (0.79)(6.3)(305)^{-1} = 0.01631803279$
$a = M_Y - bM_X = 52.6 - 46.55534754 = 6.044652459$
$\hat{y} = a + bx = 6.044652459 + 0.01631803279x$
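The slope and intercept above can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the answer shows only the product of b and the mean of X (46.55534754), so the value 2853 used for the mean of X is inferred from that product rather than quoted from the text.

```python
r = 0.79      # correlation between X and Y
s_y = 6.3     # standard deviation of Y
s_x = 305.0   # standard deviation of X
m_y = 52.6    # mean of Y
m_x = 2853.0  # assumption: mean of X inferred from b * M_X = 46.55534754

b = r * s_y / s_x    # slope
a = m_y - b * m_x    # intercept

print(f"b = {b:.11f}")
print(f"a = {a:.9f}")
```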
4.
$\hat{y} = 6.044652459 + 0.01631803279(2954) = 54.24812132$
$\hat{y} = 6.044652459 + 0.01631803279(2688) = 49.9075246$
5.
$s_{Y \cdot X} = s_Y\sqrt{1 - r^2} = 3.862573106$
The true value lies within the interval of the predicted value plus or minus one standard
error of the estimate 68% of the time.
_________________________________________________________________________________
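The predictions and the standard error of the estimate can be checked with the short Python sketch below. It takes the intercept, slope, r and the standard deviation of Y from the working above; `predict` is a hypothetical helper name, and the printed band is the predicted value plus or minus one standard error of the estimate.

```python
import math

a = 6.044652459       # intercept from exercise 3
b = 0.01631803279     # slope from exercise 3
r = 0.79              # correlation between X and Y
s_y = 6.3             # standard deviation of Y

def predict(x):
    """Predicted Y for a given X using the regression equation y_hat = a + b*x."""
    return a + b * x

# Standard error of the estimate: s_Y * sqrt(1 - r^2).
see = s_y * math.sqrt(1 - r ** 2)

for x in (2954, 2688):
    y_hat = predict(x)
    print(f"x = {x}: y_hat = {y_hat:.4f}, band = ({y_hat - see:.4f}, {y_hat + see:.4f})")
```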