Statistical Methods in Scientific Research
Solution Sheet 1: Analysis of data using SPSS
1. Background
Q0. Before analysing these data, what questions would you want to ask the people who actually did
the experiment?
How were the measurements made? For example, were they made by the same person, with the
same equipment under the same conditions?
Were the three replicates identical?
What other factors might influence the measurements?
2. Introduction to SPSS
Q1. Follow the handout instruction to familiarise yourself with SPSS.
Dataset 1
Sample 1: total root length when no glyphosate is added. (n=12)
114.2, 104.7, 102.0, 114.2, 109.5, 109.0, 73.0, 133.0, 127.0, 178.0, 182.0, 145.0
Sample 2: total root length when glyphosate concentration level is 0.053ppm. (n=6)
143.1, 88.6, 87.0, 61.0, 150.0, 106.0
Q2. Go to your working directory, find the dataset. Open the data file to check the data format and
the sample size.
The form of the dataset is:
rootsize   group
114.2      0
104.7      0
102        0
etc.
The total sample size = 18, n(Group 0) = 12, n(Group 1) = 6
Q3. Read in the data and switch the window between Data Editor window and Variable View
window to find out names and types of the variables. Save it as a SPSS dataset.
The variables are: rootsize, continuous (scale); group, discrete (nominal).
Q4. Obtain the mean and standard deviation of the variable rootsize.
For rootsize, we obtain mean = 118.183, sd = 32.7583.
Q5. Obtain the mean and standard deviation of each sample.
            Mean      Standard Deviation   Sample size
Sample 1    124.300   31.4711              12
Sample 2    105.950   34.6444              6
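The Q4 and Q5 summaries can be cross-checked outside SPSS. The following is a minimal sketch in Python using only the standard library and the Dataset 1 values listed above; the function name `summarise` and the variable names are illustrative and not part of the handout. `statistics.stdev` uses the n - 1 denominator, which is the sample standard deviation reported here.

```python
from statistics import mean, stdev

# Dataset 1 root lengths, copied from the listing above
sample1 = [114.2, 104.7, 102.0, 114.2, 109.5, 109.0,
           73.0, 133.0, 127.0, 178.0, 182.0, 145.0]   # no glyphosate
sample2 = [143.1, 88.6, 87.0, 61.0, 150.0, 106.0]      # 0.053ppm

def summarise(values):
    """Return (mean, sample standard deviation, n) for a list of values."""
    return mean(values), stdev(values), len(values)

overall = summarise(sample1 + sample2)
by_group = {"Sample 1": summarise(sample1), "Sample 2": summarise(sample2)}

for label, (m, s, n) in [("All", overall), *by_group.items()]:
    print(f"{label:9s} mean = {m:.3f}  sd = {s:.4f}  n = {n}")
```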
Q6. Construct the 95% confidence intervals for each population mean.
We can do this with some additional calculations as follows:
Using quantiles of the t distribution: t(0.975, df=11) = 2.201, t(0.975, df=5) = 2.571,
Sample 1: 124.300 ± 2.201 × 31.4711/√12 = (104.304, 144.296)
Sample 2: 105.950 ± 2.571 × 34.6444/√6 = (69.587, 142.313)
These intervals are very wide, especially that for Population 2, which is based on only 6
observations.
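The interval arithmetic for Q6 can be sketched as below. This is an illustration only: it takes the sample means and standard deviations from the Q5 table (124.300 and 105.950) and the two t quantiles quoted above as given, and the helper name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(sample_mean, sd, n, t_quantile):
    """Two-sided interval for a mean: mean +/- t * sd / sqrt(n)."""
    half_width = t_quantile * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Summary statistics from Q5; t quantiles as quoted in the solution
ci1 = t_interval(124.300, 31.4711, 12, 2.201)   # df = 11
ci2 = t_interval(105.950, 34.6444, 6, 2.571)    # df = 5

print(f"Sample 1: ({ci1[0]:.3f}, {ci1[1]:.3f})")
print(f"Sample 2: ({ci2[0]:.3f}, {ci2[1]:.3f})")
```

The half-width for Sample 2 is nearly twice that for Sample 1, mainly because of the smaller sample size and the larger t quantile that goes with it.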
3. Comparing two samples
Comparing two independent samples: T test
Q7. Is there any evidence that the population means of root length are different at two concentration
levels?

State the Null Hypothesis.
H0: μ1 = μ2, where μ1 is the mean of Population 1 and μ2 is the mean of Population 2.
What is the difference between the sample means? μ̂1 − μ̂2 = 18.35.
What is the value of the t-test statistic, the number of degrees of freedom and the
corresponding p-value?
t = 1.129, df = 16, p = 0.276.
Construct the 95% confidence interval for the population mean difference.
(−16.0943, 52.7943)
What is your conclusion? Although there was a slight decrease in root length in these samples
when glyphosate was added to the water at a concentration of 0.053ppm, the difference is not
statistically significant, and thus might not reflect a true and reproducible effect.
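The equal-variance t test in Q7 can be reproduced by hand from the Dataset 1 values. The sketch below is one way to do it with the Python standard library; the function name `pooled_t` is illustrative, and the quantile t(0.975, df=16) = 2.1199 is a tabulated value supplied here as an assumption, since it is not quoted in the sheet. The p-value is not computed because it needs the t distribution function from a statistics package.

```python
from math import sqrt
from statistics import mean, variance

sample1 = [114.2, 104.7, 102.0, 114.2, 109.5, 109.0,
           73.0, 133.0, 127.0, 178.0, 182.0, 145.0]   # no glyphosate
sample2 = [143.1, 88.6, 87.0, 61.0, 150.0, 106.0]      # 0.053ppm

def pooled_t(x, y):
    """Equal-variance two-sample t test: (difference, std. error, t, df)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    return diff, se, diff / se, df

diff, se, t_stat, df = pooled_t(sample1, sample2)

T_975_DF16 = 2.1199  # assumed tabulated quantile, not taken from the sheet
ci = (diff - T_975_DF16 * se, diff + T_975_DF16 * se)

print(f"difference = {diff:.2f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI for the difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval runs from a negative to a positive value, so it contains zero, which agrees with the non-significant test. If SciPy is available, `scipy.stats.ttest_ind(sample1, sample2)` should report the same statistic together with the p-value.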
4. Exploring relationship between variables: Dataset 2
Q8. Read in the data file and make sure types of the variables rootsize and glyph are recorded as
Scale in the Variable View window.
Obtain numerical summaries for the variable rootsize by watertype. Is there any difference in root
growth by water type?
               Mean     Standard Deviation   Sample size
Water type 1   81.156   31.5661              27
Water type 2   90.000   48.6012              27
There is no significant difference between water types: t = 0.972, df = 52, p = 0.335.
(The unequal-variance version gives t = 0.972, df = 44.6, p = 0.336, which is hardly any
different!)
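The two degrees-of-freedom values quoted for Q8 come from different formulas. As a sketch, assuming the unequal-variance version is the usual Welch-Satterthwaite approximation, the non-integer df can be recovered from the standard deviations and sample sizes in the table; the function name `welch_df` is illustrative.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1   # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Standard deviations and sample sizes from the water type table
df_equal = 27 + 27 - 2
df_welch = welch_df(31.5661, 27, 48.6012, 27)

print(f"equal variances: df = {df_equal}")
print(f"unequal variances: df = {df_welch:.1f}")
```

With equal sample sizes the two versions share the same standard error, so the t statistic is unchanged and only the degrees of freedom differ.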
Q9.
1. Create the scatter plot between rootsize and glyph and describe the graph.
2. Do any of the linear regression fits appear to describe the underlying relationship
properly?


[Figure: scatterplot of rootsize against glyph, with points marked by watertype (1 or 2). The glyph axis shows the levels 0.000, 0.053, 0.106, 0.211, 0.423, 0.845, 1.690 and 3.380.]
[Figure: scatterplot of rootsize against glyph with the overall mean line, Mean = 90.6.]
[Figure: scatterplot of rootsize against glyph with the fitted linear regression line, rootsize = 111.43 + -27.98 * glyph, R-Square = 0.54.]
As the concentration level increases, the total root length decreases. There is no clear
difference in pattern between the two water types. Although the linear regression fits better
than the mean alone, the relationship may not be linear.
Q10. Create the scatterplot between logroot and glyph.


[Figure: scatterplot of logroot against glyph with the fitted linear regression line, logroot = 4.69 + -0.41 * glyph, R-Square = 0.68.]

Q11. Create log transformed variable of glyphosate concentration called logglyph:
logglyph = Ln(glyph)
Check in the Data Editor window. Why are not all values defined, with warnings in the Output
window?
Some values of the glyphosate concentration are 0, and log(0) is undefined (the log tends to
minus infinity as its argument approaches 0).
The log function is only defined for positive arguments, so the zero glyphosate levels cannot
be transformed directly. To avoid the problem, we define the new variable by
logglyphonew = Ln(1 + glyph).
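The same behaviour can be seen in a short Python sketch. The eight concentration levels are those shown on the glyph axis of the Q9 scatterplot; `safe_ln` is a hypothetical helper that returns `None` where the log is undefined, standing in for the missing values SPSS produces.

```python
import math

# Concentration levels read from the glyph axis of the Q9 scatterplot
glyph_levels = [0.000, 0.053, 0.106, 0.211, 0.423, 0.845, 1.690, 3.380]

def safe_ln(x):
    """Natural log, or None where it is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None

logglyph = [safe_ln(g) for g in glyph_levels]            # Ln(glyph)
logglyphonew = [math.log1p(g) for g in glyph_levels]     # Ln(1 + glyph)

for g, a, b in zip(glyph_levels, logglyph, logglyphonew):
    shown = "undefined" if a is None else f"{a:.3f}"
    print(f"glyph = {g:.3f}  Ln(glyph) = {shown:>9s}  Ln(1 + glyph) = {b:.3f}")
```

Adding 1 before taking logs maps a zero concentration to exactly 0 on the new scale, so the control observations stay in the analysis.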


[Figure: scatterplot of rootsize against logglyphonew with the fitted linear regression line, rootsize = 118.04 + -65.45 * logglyphonew, R-Square = 0.61.]
Q12. The above pictures illustrate the effect of applying various transformations to the plant growth
data. Suggest which, if any, would be appropriate for analysis using the linear regression model.
Express the implied relationship between root size and glyphosate concentration levels on the scale
of the original data.


[Figure: scatterplot of logroot against logglyphonew with the fitted linear regression line, logroot = 4.78 + -0.94 * logglyphonew, R-Square = 0.73.]
Taking the log transformation of both variables gives the most reasonable-looking linear fit.
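Q12 also asks for the relationship on the scale of the original data. Exponentiating the fitted line logroot = 4.78 + -0.94 * logglyphonew gives rootsize = exp(4.78) × (1 + glyph)^(-0.94). The sketch below evaluates this with the error term set to zero; the function name `predicted_rootsize` and the example concentrations are illustrative choices.

```python
import math

# Rounded coefficients of the fitted line on the log-log scale
INTERCEPT = 4.78
SLOPE = -0.94

def predicted_rootsize(glyph):
    """Back-transformed fit: exp(intercept) * (1 + glyph) ** slope."""
    return math.exp(INTERCEPT) * (1.0 + glyph) ** SLOPE

scale = math.exp(INTERCEPT)   # predicted root size at glyph = 0
print(f"exp({INTERCEPT}) = {scale:.1f}")

for g in (0.0, 0.053, 0.845, 3.380):
    print(f"glyph = {g:.3f}  predicted rootsize = {predicted_rootsize(g):.1f}")
```

Because the exponent is close to -1, the predicted root size is roughly inversely proportional to (1 + glyph).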
5. Linear regression
Obtain parameter estimates of the Linear Regression fit.
The fourth table produced by SPSS, Coefficients, contains the model parameters. By default,
Intercept is included in the model.
Coefficients(a)
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta                        t         Sig.
1 (Constant)     4.778    .051                                             94.379    .000
  logglyphonew   -.944    .079                 -.857                       -11.971   .000
a Dependent Variable: logroot
From this table, we can see that the fitted model is:
log(rootsize) = 4.78 − 0.94 log(1+glyph) + ε
The third table provided by SPSS, ANOVA, provides the residual sum of squares (RSS) and the
degrees of freedom (df). These are useful for model comparison.
ANOVA(b)
Model            Sum of Squares   df   Mean Square   F         Sig.
1 Regression     11.365           1    11.365        143.307   .000(a)
  Residual       4.124            52   .079
  Total          15.489           53
a Predictors: (Constant), logglyphonew
b Dependent Variable: logroot
The standard deviation of the residual terms ε is estimated by the square root of the residual
mean square from the ANOVA table:
σ̂ = estimated standard deviation of ε = √0.079 = 0.281
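The entries of the ANOVA table are linked by simple arithmetic, which makes a useful consistency check. The sketch below starts from the sums of squares and degrees of freedom in the table and recomputes the residual mean square, the F statistic, R-Square and the residual standard deviation; the variable names are illustrative.

```python
from math import sqrt

# Sums of squares and degrees of freedom from the ANOVA table
ss_regression, df_regression = 11.365, 1
ss_residual, df_residual = 4.124, 52
ss_total = ss_regression + ss_residual

ms_residual = ss_residual / df_residual                    # residual mean square
f_stat = (ss_regression / df_regression) / ms_residual     # F statistic
r_square = ss_regression / ss_total                        # proportion explained
sigma_hat = sqrt(ms_residual)                              # residual std. deviation

print(f"total SS = {ss_total:.3f}")
print(f"residual mean square = {ms_residual:.3f}")
print(f"F = {f_stat:.1f}, R-Square = {r_square:.2f}")
print(f"estimated sd of the residuals = {sigma_hat:.3f}")
```

Small differences from the SPSS output in the last decimal place are expected, because the table entries used here are already rounded.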
The model can also be expressed as:
rootsize = 119.1 × e^ε / (1 + glyph)^0.94.
I wonder whether the power to which (1 + glyph) is raised is significantly different from 1.
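One way to look into this, offered as a sketch rather than as part of the original solution, is a t test of the slope against -1 using the estimate and standard error from the Coefficients table. The quantile t(0.975, df=52) = 2.007 is a tabulated value supplied here as an assumption.

```python
# Slope estimate and standard error from the Coefficients table
slope = -0.944
std_error = 0.079
df_residual = 52

T_975_DF52 = 2.007   # assumed tabulated quantile, not taken from the sheet

# A power of 1 in the denominator corresponds to a slope of -1
hypothesised_slope = -1.0
t_stat = (slope - hypothesised_slope) / std_error

# 95% confidence interval for the slope
ci = (slope - T_975_DF52 * std_error, slope + T_975_DF52 * std_error)
differs = abs(t_stat) > T_975_DF52

print(f"t = {t_stat:.3f} on {df_residual} df")
print(f"95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
print("significantly different from -1" if differs else "not significantly different from -1")
```

Under these assumptions the interval for the slope includes -1, so the data are compatible with root size being inversely proportional to (1 + glyph).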