BIMM18 Lab 2
Using SPSS & Descriptive Statistics (Summary measures, histograms
and Box Plots)
In this lab you should learn to:
Using the ‘birth’ data set:
- Select some of the syntax that was saved after Lab 1 and paste it into the syntax window.
  Thereafter run some of the commands from the syntax window.
- Perform a t-test
- Perform an ANOVA
Learn how to solve the tasks for the Cell culture course at BIMM18:
- Import an Excel sheet
Suspension data
1. For each day: Provide good descriptive statistics: calculate means (with 95%
confidence intervals for the means), standard deviations, and medians.
2. Draw graphs showing the growth curves on logarithmic scales.
Cell count data
1. Draw a box plot describing the cell counts for day 1-4 by treatment.
2. Draw a line graph showing the mean with 95% CI for day 1-4 by treatment.
3. Examine the distribution of the cell counts for each treatment (controls, SAL,
Analog) and culture day. Decide whether t-tests would be suitable for testing the
difference between treatments on days 2, 3 and 4.
4. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform
t-tests comparing the cell counts between controls and SAL, controls and Analog, and SAL
and Analog. Discuss the multiple testing problem.
5. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform
one-way ANOVA analyses. Discuss the multiple testing problem.
Specially prepared paired data set:
6. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform
paired t-tests comparing the cell counts between controls and SAL, controls and Analog,
and SAL and Analog.
7. Perform the non-parametric paired tests corresponding to the tests under 6. Compare the
results.
Using the ‘birth’ data set:
Use the syntax window:
1. Open your ‘birth’ data set by double-clicking it.
2. Choose File > Open > Output to open the output window that you saved in the last lab.
3. Choose File > New > Syntax to open a new syntax file. Save it with a sensible name.
4. Copy and paste the commands that you would like to save into the syntax file.
5. Run the commands from the syntax window.
Perform T-test to compare means:
1. Investigate whether the suspected difference in birth weight between smoking and non-smoking mothers is statistically significant.
A. Choose Analyze > Compare means > Independent samples T-test
B. Choose Birth_weight as test variable
C. Choose maternal smoke as grouping variable
D. Press “Define groups”
Divide the smoking variable into smoking and non-smoking categories by setting the cut-off to 2.
E. Either press Paste to save the syntax to the syntax window and execute the command
from there, or press OK directly.
F. Study the output. What does it say?
2. Perform two more t-tests. Before performing the tests, check that the dependent
variable is reasonably normally distributed. View the data and do basic
descriptive analyses.
E.g., check the distribution of maternal height:
A. Transform the maternal height variable into a variable in which impossible values
(e.g., height = 0) are defined as missing.
B. Choose Graphs > legacy dialogs > histogram
C. Choose ‘maternal height’ as ‘Variable’
D. Tick the box “Display normal curve”
Or, in order to get both summary statistics and a histogram:
A. Choose Analyze > Descriptive statistics > Frequencies
B. Press “Statistics” to choose which statistics to output
C. Press “Charts” > Histograms
D. Tick the box “Display normal curve”
E. Continue > OK / Paste
3. Compare the means between the three smoking groups by performing an ANOVA
A. Choose Analyze > Compare means > One way ANOVA
B. Choose birth weight as dependent variable
C. Choose ‘maternal smoking’ as factor
D. Perform the analysis. Interpret the results. Discuss.
Remember:
It requires more to interpret the output than it does to perform the analyses.
Applied statistics: Learn how to solve the tasks for the
Cell culture course at BIMM18.
(Note that the output examples below were calculated on another data set, not the current one.)
Basic SPSS management:
Import the cell-suspension data set:
After starting SPSS, import an Excel sheet by choosing File > Open > Data.
Browse to find the Excel sheet you would like to import. Choose files of type .xls, .xlsx, .xlsm.
Press OK in the dialog box that appears (or specify which worksheet in the Excel workbook you
would like to import).
Save the data set as an SPSS data file (.sav): File > Save as
Browse and name the data file (do not use the default).
Open a syntax-window in which you can save your commands:
File>new>Syntax
Suspension data
I: For each day: Provide good descriptive statistics: calculate means (with 95% Confidence
intervals for the means), standard deviation, and medians.
Boxplot graph:
Graphs>Legacy dialogs>Boxplot
Choose “Simple” and “Summaries for groups of cases” >Define
Choose the variable and category axis as above.
Press “Paste” if you would just like to save the code.
Otherwise: press OK.
Case Processing Summary
CellCount by Day
        Cases
        Valid             Missing           Total
Day     N     Percent     N     Percent     N     Percent
1       35    100,0%      0     0,0%        35    100,0%
2       35    100,0%      0     0,0%        35    100,0%
3       35    100,0%      0     0,0%        35    100,0%
4       35    100,0%      0     0,0%        35    100,0%
In order to have a logarithmic scale on the y-axis:
Click on the labels on the y-axis. A dialog box appears:
Press ”Scale”
Choose ”Logarithmic” > Apply
Close the chart editor
Save the output window. File>Save as
Descriptive statistics
This can be done in many ways. For this example use:
Analyze > Descriptive statistics > Explore
Choose the variables as in the example above.
Press “Plots” and “Options”. Depending on your choices, you can get a very large
number of results and plots.
Interpret your results. Determine the means with 95% CI, the SDs, and the medians.
Descriptives
CellCount by Day
Statistic                            Day 1      Day 2      Day 3      Day 4
Mean                                 ,40260     ,56280     ,90123     1,09509
  Std. Error of Mean                 ,010459    ,019588    ,029183    ,035553
95% Confidence Interval for Mean
  Lower Bound                        ,38134     ,52299     ,84192     1,02283
  Upper Bound                        ,42386     ,60261     ,96054     1,16734
5% Trimmed Mean                      ,40308     ,56629     ,89771     1,08193
Median                               ,40000     ,56000     ,88000     1,11000
Variance                             ,004       ,013       ,030       ,044
Std. Deviation                       ,061877    ,115884    ,172650    ,210332
Minimum                              ,274       ,280       ,560       ,740
Maximum                              ,520       ,770       1,310      1,960
Range                                ,246       ,490       ,750       1,220
Interquartile Range                  ,092       ,164       ,250       ,170
Skewness                             -,043      -,539      ,509       1,674
  Std. Error of Skewness             ,398       ,398       ,398       ,398
Kurtosis                             -,829      ,109       ,179       7,816
  Std. Error of Kurtosis             ,778       ,778       ,778       ,778
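To see how the Mean, Std. Error and 95% Confidence Interval rows of the Descriptives output relate to each other, the day 1 values can be recomputed by hand. The following is a minimal Python sketch, not part of the SPSS workflow. It assumes the critical value 2.0322 of Student's t distribution for 34 degrees of freedom, looked up in a t-table rather than computed.

```python
import math

# Summary statistics for day 1, copied from the Descriptives output.
n = 35            # valid cases for the day
mean = 0.40260
sd = 0.061877

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 34 degrees of freedom, taken from a t-table.
T_CRIT_95_DF34 = 2.0322

std_error = sd / math.sqrt(n)             # standard error of the mean
half_width = T_CRIT_95_DF34 * std_error   # half-width of the 95% CI
ci_lower = mean - half_width
ci_upper = mean + half_width

print(f"Std. Error = {std_error:.6f}")
print(f"95% CI for the mean = ({ci_lower:.5f}, {ci_upper:.5f})")
```

The results should agree with the Std. Error, Lower Bound and Upper Bound reported for day 1, up to rounding.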
Create a line diagram showing the means for day 1-3 with 95% CI.
Graphs>legacy dialogs>line
Choose simple and summaries for groups of cases. Press “Define”
Choose the variables as in the example above. Press “Options”
Check “Display error bars”. Press “Continue”
Change the scale on the y-axis to logarithmic as before.
Close the chart-editor.
Save the output window.
Create a line diagram showing the means for day 1-3 for all groups.
Follow the steps above, but choose “Multiple” instead.
Choose the variables as in the example (right).
Re-scale the y-axis and close the chart editor.
Save the output window.
Cell count data
Import the excel-sheet “CellCount._Total2014.xlsx” to SPSS as before
Draw a box plot describing the cell counts for day 1-4 by treatment.
Follow the steps as before with the cell suspension data set, but this time use the
“Clustered” option.
Transform the y-axis scale as before, close the chart editor, and save the output window.
Draw a line graph showing the mean with 95% CI for day 1-4 by treatment.
Follow the steps as before with the cell suspension data set. Use the “Multiple” option.
Examine the distribution of the cell counts for each treatment (controls, SAL, Analog), and
culture day. Decide if T-tests would be suitable for testing the difference between
treatments day 2,3 and 4.
This can be done in many ways. Use the Analyze > Descriptive statistics > Explore command
again, or make a graphical presentation:
Graphs>legacy dialogs>histogram>
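Besides looking at histograms, a rough numerical screen is to divide skewness and kurtosis by their standard errors. The sketch below applies this to the skewness and kurtosis values in the Explore output for the suspension example earlier in this handout. The cut-off of 1.96 is an assumed rule of thumb, not a formal normality test, and the function name is made up for illustration.

```python
# Skewness and kurtosis per day, copied from the Explore output
# for the suspension example in this handout.
shape_by_day = {
    1: {"skewness": -0.043, "kurtosis": -0.829},
    2: {"skewness": -0.539, "kurtosis": 0.109},
    3: {"skewness": 0.509, "kurtosis": 0.179},
    4: {"skewness": 1.674, "kurtosis": 7.816},
}
SE_SKEWNESS = 0.398   # standard error of skewness (same for all days)
SE_KURTOSIS = 0.778   # standard error of kurtosis (same for all days)

# Assumption: rule-of-thumb cut-off for the ratio statistic / standard error.
CUTOFF = 1.96


def looks_roughly_normal(skewness, kurtosis):
    """Return True if both ratios stay within the assumed cut-off."""
    z_skew = skewness / SE_SKEWNESS
    z_kurt = kurtosis / SE_KURTOSIS
    return abs(z_skew) <= CUTOFF and abs(z_kurt) <= CUTOFF


flags = {day: looks_roughly_normal(**values) for day, values in shape_by_day.items()}
for day, ok in flags.items():
    print(f"Day {day}: roughly normal by this screen? {ok}")
```

Under these assumptions only day 4 is flagged, which fits its large skewness and kurtosis. A histogram should still be inspected before deciding on a t-test.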
Irrespective of whether the assumption of normal distribution holds (for practice), for each
of days 2, 3 and 4, perform t-tests comparing the cell counts between controls and SAL, controls and Analog, and SAL and Analog. Discuss the multiple testing problem.
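As a starting point for the discussion of multiple testing, the sketch below computes how quickly the chance of at least one false positive grows with the number of tests. It assumes that the tests are independent and each use a significance level of 0.05, which is a simplification, since the pairwise comparisons share groups. The Bonferroni-adjusted level is shown as one simple correction.

```python
ALPHA = 0.05            # significance level used for each single test
TESTS_PER_DAY = 3       # controls vs SAL, controls vs Analog, SAL vs Analog
N_DAYS = 3              # days 2, 3 and 4
n_tests = TESTS_PER_DAY * N_DAYS


def familywise_error_rate(alpha, m):
    """Probability of at least one false positive in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


fwer_one_day = familywise_error_rate(ALPHA, TESTS_PER_DAY)
fwer_all_days = familywise_error_rate(ALPHA, n_tests)
bonferroni_alpha = ALPHA / n_tests   # per-test level that keeps the overall level at ALPHA

print(f"Risk of at least one false positive, one day:  {fwer_one_day:.3f}")
print(f"Risk of at least one false positive, all days: {fwer_all_days:.3f}")
print(f"Bonferroni-adjusted level per test:            {bonferroni_alpha:.4f}")
```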
In order to analyze one day at a time, you must use a filter:
Data>Select cases>Check “If condition is satisfied”
Press “Continue”
To perform the T-test: Analyze>Compare means > Independent samples T-test
Press Define Groups
Use this setting to test Controls(=1) versus SAL (=2)
Group Statistics
nCell
TreatmentType   N    Mean          Std. Deviation   Std. Error Mean
1,0             25   761367,040    226360,3365      45272,0673
2,0             14   570000,000    111808,1048      29881,9730
Independent Samples Test
nCell
                                             Equal variances   Equal variances
                                             assumed           not assumed
Levene's Test for Equality of Variances
  F                                          11,302
  Sig.                                       ,002
t-test for Equality of Means
  t                                          2,955             3,528
  df                                         37                36,631
  Sig. (2-tailed)                            ,005              ,001
  Mean Difference                            191367,0400       191367,0400
  Std. Error Difference                      64752,4179        54244,7452
  95% Confidence Interval of the Difference
    Lower                                    60166,1789        81419,3990
    Upper                                    322567,9011       301314,6810
Interpret the results.
Repeat for Controls vs Analog, and SAL vs Analog.
Repeat for day 3 and day 4 (change the filter settings).
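To help with interpreting the two rows of the Independent Samples Test output, the t statistics can be reproduced from the Group Statistics alone. The following Python sketch uses the textbook formulas for the pooled-variance t-test ("Equal variances assumed") and for Welch's t-test with the Welch-Satterthwaite degrees of freedom ("Equal variances not assumed"). The variable names are chosen for this illustration only.

```python
import math

# Group statistics copied from the SPSS output (nCell by TreatmentType).
n1, mean1, sd1 = 25, 761367.040, 226360.3365   # TreatmentType 1,0 (controls)
n2, mean2, sd2 = 14, 570000.000, 111808.1048   # TreatmentType 2,0 (SAL)

diff = mean1 - mean2

# Equal variances assumed: pool the two variances.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df_pooled
se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# Equal variances not assumed: Welch's t-test.
v1 = sd1**2 / n1
v2 = sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"Pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.4f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.4f}")
```

Because Levene's test indicates unequal variances in this example (Sig. = ,002), the "Equal variances not assumed" row is the more appropriate one to read.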
For each day 2,3 and 4, perform one-way ANOVA analyses. Discuss the multiple testing
problem.
Set the filter to test a specific day.
Analyze>Compare means>One way ANOVA
Press “Post Hoc”:
Choose type of adjustment for multiple testing.
ANOVA
nCell
                  Sum of Squares        df    Mean Square         F         Sig.
Between Groups    1020667702643,668     2     510333851321,834    14,266    ,000
Within Groups     1717146441388,960     48    35773884195,603
Total             2737814144032,628     50
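The columns of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A short Python sketch using the values from the output:

```python
# Sums of squares and degrees of freedom copied from the ANOVA output.
ss_between = 1020667702643.668
ss_within = 1717146441388.960
df_between = 2     # number of groups - 1
df_within = 48     # number of cases - number of groups

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_statistic = ms_between / ms_within

ss_total = ss_between + ss_within
df_total = df_between + df_within

print(f"Mean Square between = {ms_between:.3f}")
print(f"Mean Square within  = {ms_within:.3f}")
print(f"F = {f_statistic:.3f} on {df_between} and {df_within} df")
```

A large F means that the variation between the treatment means is large compared with the variation within the groups. It does not say which groups differ, which is why the post hoc tests are needed.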
Post Hoc Tests
Multiple Comparisons
Dependent Variable: nCell
             (I) Treatment  (J) Treatment  Mean Difference                          95% Confidence Interval
             Type           Type           (I-J)            Std. Error    Sig.     Lower Bound    Upper Bound
Tukey HSD    1,0            2,0            191367,0400*     63136,6202    ,011     38671,906      344062,174
                            3,0            343308,0400*     66423,7336    ,000     182663,063     503953,017
             2,0            1,0            -191367,0400*    63136,6202    ,011     -344062,174    -38671,906
                            3,0            151941,0000      74407,2205    ,113     -28011,941     331893,941
             3,0            1,0            -343308,0400*    66423,7336    ,000     -503953,017    -182663,063
                            2,0            -151941,0000     74407,2205    ,113     -331893,941    28011,941
Bonferroni   1,0            2,0            191367,0400*     63136,6202    ,012     34738,756      347995,324
                            3,0            343308,0400*     66423,7336    ,000     178525,139     508090,941
             2,0            1,0            -191367,0400*    63136,6202    ,012     -347995,324    -34738,756
                            3,0            151941,0000      74407,2205    ,140     -32647,203     336529,203
             3,0            1,0            -343308,0400*    66423,7336    ,000     -508090,941    -178525,139
                            2,0            -151941,0000     74407,2205    ,140     -336529,203    32647,203
*. The mean difference is significant at the 0.05 level.
Homogeneous Subsets
nCell
Tukey HSD(a,b)
                        Subset for alpha = 0.05
TreatmentType   N       1             2
3,0             12      418059,000
2,0             14      570000,000
1,0             25                    761367,040
Sig.                    ,076          1,000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 15,403.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not
guaranteed.
Save the output window
Discuss the results. Compare with the results for the t-test. Repeat for day 3 and 4.
For each day 2,3 and 4, perform paired t-tests comparing the cellcounts between controls
– SAL, controls- Analog, and SAL-analog.
Here we need a dataset structured in a different way.
Import the excel-sheet “CellCount._Total2014_paired.xlsx”
Analyze>compare means>Paired samples t-test.
Paired Samples Statistics
                         Mean            N     Std. Deviation   Std. Error Mean
Pair 1   ControlsDay2    824992,86       14    181454,010       48495,624
         SalDay2         570000,00       14    111808,105       29881,973
Pair 2   ControlsDay2    688023,00       12    248740,327       71805,147
         AnalogDay2      418059,00       12    171860,598       49611,881
Pair 3   ControlsDay3    1590714,286     14    403283,9100      107782,1586
         SalDay3         588571,43       14    181240,475       48438,555
Pair 4   ControlsDay3    1204369,500     12    778366,4114      224695,0286
         AnalogDay3      392621,83       12    171914,332       49627,393
Pair 5   ControlsDay4    2229214,286     14    418215,2884      111772,7374
         SalDay4         733000,00       14    358080,996       95701,172
Pair 6   ControlsDay4    1825750,000     12    934931,5118      269891,4800
         AnalogDay4      457096,08       12    198504,128       57303,206
Paired Samples Correlations
                                      N     Correlation   Sig.
Pair 1   ControlsDay2 & SalDay2       14    -,037         ,900
Pair 2   ControlsDay2 & AnalogDay2    12    ,723          ,008
Pair 3   ControlsDay3 & SalDay3       14    -,221         ,448
Pair 4   ControlsDay3 & AnalogDay3    12    ,392          ,208
Pair 5   ControlsDay4 & SalDay4       14    ,075          ,798
Pair 6   ControlsDay4 & AnalogDay4    12    -,228         ,477
Paired Samples Test
                                      Paired Differences
                                                                                        95% Confidence Interval of the Difference
                                      Mean            Std. Deviation   Std. Error Mean  Lower           Upper            t        df   Sig. (2-tailed)
Pair 1   ControlsDay2 - SalDay2       254992,857      216636,890       57898,644        129910,441      380075,274       4,404    13   ,001
Pair 2   ControlsDay2 - AnalogDay2    269964,000      172105,840       49682,676        160613,167      379314,833       5,434    11   ,000
Pair 3   ControlsDay3 - SalDay3       1002142,8571    477241,2480      127548,0886      726591,9643     1277693,7500     7,857    13   ,000
Pair 4   ControlsDay3 - AnalogDay3    811747,6667     728358,3914      210258,9567      348970,8232     1274524,5101     3,861    11   ,003
Pair 5   ControlsDay4 - SalDay4       1496214,2857    529653,9042      141555,9602      1190401,2261    1802027,3453     10,570   13   ,000
Pair 6   ControlsDay4 - AnalogDay4    1368653,9167    998985,2888      288382,2127      733928,9460     2003378,8873     4,746    11   ,001
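The paired t-test works on the within-pair differences only: t is the mean difference divided by its standard error, with one degree of freedom fewer than the number of pairs. The Python sketch below recomputes t for the two day 2 pairs from the Paired Differences columns of the output. The function name is made up for this illustration.

```python
import math

# (mean difference, SD of the differences, number of pairs),
# copied from the Paired Samples Test output for day 2.
pairs = {
    "ControlsDay2 - SalDay2": (254992.857, 216636.890, 14),
    "ControlsDay2 - AnalogDay2": (269964.000, 172105.840, 12),
}


def paired_t(mean_diff, sd_diff, n_pairs):
    """Return (t, df) for a paired t-test computed from summary statistics."""
    std_error = sd_diff / math.sqrt(n_pairs)   # standard error of the mean difference
    return mean_diff / std_error, n_pairs - 1


results = {name: paired_t(*values) for name, values in pairs.items()}
for name, (t_value, df) in results.items():
    print(f"{name}: t = {t_value:.3f}, df = {df}")
```

Note that the standard error here depends on the spread of the differences, not on the spread within each group. This is the main reason the paired results can differ from the independent t-tests.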
Compare the results with the previous results (independent t-tests).
BIMM18 – Lab 2
16 September 2015
Perform the non-parametric paired tests corresponding to the tests under 6. Compare the
results.
Use the paired data set.
Analyze>Non parametric tests > Legacy dialogs >2 related samples
Wilcoxon Signed Ranks Test
Ranks
                                               N      Mean Rank   Sum of Ranks
SalDay2 - ControlsDay2       Negative Ranks    12a    8,17        98,00
                             Positive Ranks    2b     3,50        7,00
                             Ties              0c
                             Total             14
AnalogDay3 - ControlsDay2    Negative Ranks    10d    7,50        75,00
                             Positive Ranks    2e     1,50        3,00
                             Ties              0f
                             Total             12
SalDay3 - ControlsDay3       Negative Ranks    14g    7,50        105,00
                             Positive Ranks    0h     ,00         ,00
                             Ties              0i
                             Total             14
AnalogDay3 - ControlsDay3    Negative Ranks    10j    7,30        73,00
                             Positive Ranks    2k     2,50        5,00
                             Ties              0l
                             Total             12
SalDay4 - ControlsDay4       Negative Ranks    14m    7,50        105,00
                             Positive Ranks    0n     ,00         ,00
                             Ties              0o
                             Total             14
AnalogDay4 - ControlsDay4    Negative Ranks    11p    7,00        77,00
                             Positive Ranks    1q     1,00        1,00
                             Ties              0r
                             Total             12
a. SalDay2 < ControlsDay2
b. SalDay2 > ControlsDay2
c. SalDay2 = ControlsDay2
d. AnalogDay3 < ControlsDay2
e. AnalogDay3 > ControlsDay2
f. AnalogDay3 = ControlsDay2
g. SalDay3 < ControlsDay3
h. SalDay3 > ControlsDay3
i. SalDay3 = ControlsDay3
j. AnalogDay3 < ControlsDay3
k. AnalogDay3 > ControlsDay3
l. AnalogDay3 = ControlsDay3
m. SalDay4 < ControlsDay4
n. SalDay4 > ControlsDay4
o. SalDay4 = ControlsDay4
p. AnalogDay4 < ControlsDay4
q. AnalogDay4 > ControlsDay4
r. AnalogDay4 = ControlsDay4
Test Statistics(a)
                             Z          Asymp. Sig. (2-tailed)
SalDay2 - ControlsDay2       -2,857b    ,004
AnalogDay3 - ControlsDay2    -2,824b    ,005
SalDay3 - ControlsDay3       -3,297b    ,001
AnalogDay3 - ControlsDay3    -2,667b    ,008
SalDay4 - ControlsDay4       -3,296b    ,001
AnalogDay4 - ControlsDay4    -2,981b    ,003
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
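The Z values in the Test Statistics output can be approximated from the Ranks table: under the null hypothesis the sum of the positive ranks has a known mean and variance, and Z is the standardized sum. The Python sketch below uses the plain normal approximation without tie or continuity correction, which is an assumption of this illustration. SPSS presumably also adjusts for tied ranks, so agreement to about two or three decimals is what to expect.

```python
import math


def wilcoxon_z(sum_positive_ranks, n_nonzero):
    """Normal approximation for the Wilcoxon signed-rank test.

    Assumption: no tie correction and no continuity correction.
    Returns (z, two-sided p-value).
    """
    n = n_nonzero
    expected = n * (n + 1) / 4                 # mean of the rank sum under H0
    variance = n * (n + 1) * (2 * n + 1) / 24  # variance of the rank sum under H0
    z = (sum_positive_ranks - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Sum of positive ranks and number of non-tied pairs from the Ranks table.
z_sal2, p_sal2 = wilcoxon_z(7.0, 14)          # SalDay2 - ControlsDay2
z_analog, p_analog = wilcoxon_z(3.0, 12)      # AnalogDay3 - ControlsDay2

print(f"SalDay2 - ControlsDay2:    Z = {z_sal2:.3f}, p = {p_sal2:.3f}")
print(f"AnalogDay3 - ControlsDay2: Z = {z_analog:.3f}, p = {p_analog:.3f}")
```

Because the test uses only the ranks of the differences, a few extreme cell counts influence it much less than they influence the paired t-test.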
Linda Hartman
Karin Källen