BIMM18 Lab 2: Using SPSS & Descriptive Statistics (Summary measures, histograms and Box Plots)

In this lab you should learn to:
- Use the ‘birth’ data set: select and paste some of the syntax that was saved after Lab 1 into the syntax window, then run some of the commands from the syntax window.
- Perform a t-test.
- Perform an ANOVA.
- Solve the tasks for the Cell culture course at BIMM18, starting by importing an Excel sheet.

Suspension data
1. For each day, provide good descriptive statistics: calculate means (with 95% confidence intervals for the means), standard deviations, and medians.
2. Draw graphs showing the growth curves on logarithmic scales.

Cell count data
1. Draw a box plot describing the cell counts for days 1-4 by treatment.
2. Draw a line graph showing the mean with 95% CI for days 1-4 by treatment.
3. Examine the distribution of the cell counts for each treatment (Controls, SAL, Analog) and culture day. Decide whether t-tests would be suitable for testing the difference between treatments on days 2, 3 and 4.
4. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform t-tests comparing the cell counts between Controls and SAL, Controls and Analog, and SAL and Analog. Discuss the multiple testing problem.
5. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform one-way ANOVA analyses. Discuss the multiple testing problem.

Specially prepared paired data set
6. Irrespective of the answer to task 3 (for practice), for each of days 2, 3 and 4, perform paired t-tests comparing the cell counts between Controls and SAL, Controls and Analog, and SAL and Analog.
7. Perform non-parametric paired tests corresponding to the tests under 6. Compare the results.

Using the ‘birth’ data set
Use the syntax window:
1. Open your ‘birth’ data set by double-clicking it.
2. Choose File > Open > Output to open the output window that you saved in the last lab.
3. Choose File > New > Syntax to open a new syntax file. Save it with a sensible name.
4. Copy and paste the commands that you would like to save.
5. Run the commands from the syntax window.

Perform a t-test to compare means
1. Investigate whether the suspected difference in birth weight between babies of smoking and non-smoking mothers is statistically significant.
A. Choose Analyze > Compare means > Independent samples T-test.
B. Choose Birth_weight as the test variable.
C. Choose maternal smoking as the grouping variable.
D. Press “Define groups”. Divide the smoking variable into smoking and non-smoking categories by setting the cut-off to 2.
E. Either press Paste to save the syntax to the syntax window and execute the command from there, or press OK directly.
F. Study the output. What does it say?
2. Perform two more t-tests. Before performing the tests, check that the dependent variable is reasonably normally distributed: view the data and do basic descriptive analyses. For example, check the distribution of maternal height:
A. Transform the maternal height variable into a variable in which impossible values (e.g., height = 0) are defined as missing.
B. Choose Graphs > Legacy dialogs > Histogram.
C. Choose ‘maternal height’ as ‘Variable’.
D. Tick the box “Display normal curve”.
Or, to get both summary statistics and a histogram:
A. Choose Analyze > Descriptive statistics > Frequencies.
B. Press “Statistics” to choose the statistics output.
C. Press “Charts” > Histograms.
D. Tick the box “Display normal curve”.
E. Continue > OK / Paste.
3. Compare the means between the three smoking groups by performing an ANOVA.
A. Choose Analyze > Compare means > One way ANOVA.
B. Choose birth weight as the dependent variable.
C. Choose ‘maternal smoking’ as the factor.
D. Perform the analysis. Interpret the results. Discuss.

Remember: interpreting the output takes more than performing the analyses.

Applied statistics: learn how to solve the tasks for the Cell culture course at BIMM18. (Note that the output examples below are calculated on another data set, not the current one.)
Basic SPSS management: import the cell-suspension data set
After starting SPSS, import an Excel sheet by choosing File > Open > Data.
- Browse to find the Excel sheet you would like to import. Choose files of type .xls, .xlsx, .xlsm.
- Press OK in the dialog box that appears (or specify which worksheet in the Excel workbook you would like to import).
Save the data as an SPSS data file (.sav): File > Save as. Browse and give the file a name (do not use the default).
Open a syntax window in which you can save your commands: File > New > Syntax.

Suspension data I
For each day, provide good descriptive statistics: calculate means (with 95% confidence intervals for the means), standard deviations, and medians.

Boxplot graph: Graphs > Legacy dialogs > Boxplot.
Choose “Simple” and “Summaries for groups of cases” > Define.
Choose the variable (CellCount) and the category axis (Day).
Press “Paste” if you would just like to save the code. Otherwise, press OK.

Case Processing Summary: CellCount by Day
Day   Valid N   Valid %   Missing N   Missing %   Total N   Total %
1     35        100.0%    0           0.0%        35        100.0%
2     35        100.0%    0           0.0%        35        100.0%
3     35        100.0%    0           0.0%        35        100.0%
4     35        100.0%    0           0.0%        35        100.0%

To get a logarithmic scale on the y-axis: click on the labels on the y-axis. A dialog box appears. Press “Scale”, choose “Logarithmic” > Apply. Close the chart editor.
Save the output window: File > Save as.

Descriptive statistics
You could do this in many ways. For this example use Analyze > Descriptive statistics > Explore. Choose the variables as in the example above. Press Plots and Options. Depending on your choices, you will get a large amount of results and plots. Interpret your results. Determine the means with 95% CI, the SDs, and the medians.

Descriptives: CellCount by Day (Statistic, with the Std. Error that SPSS reports beside the mean, skewness and kurtosis)
Statistic                  Day 1      Day 2      Day 3      Day 4
Mean                       .40260     .56280     .90123     1.09509
  Std. Error               .010459    .019588    .029183    .035553
95% CI for Mean, Lower     .38134     .52299     .84192     1.02283
95% CI for Mean, Upper     .42386     .60261     .96054     1.16734
5% Trimmed Mean            .40308     .56629     .89771     1.08193
Median                     .40000     .56000     .88000     1.11000
Variance                   .004       .013       .030       .044
Std. Deviation             .061877    .115884    .172650    .210332
Minimum                    .274       .280       .560       .740
Maximum                    .520       .770       1.310      1.960
Range                      .246       .490       .750       1.220
Interquartile Range        .092       .164       .250       .170
Skewness                   -.043      -.539      .509       1.674
  Std. Error               .398       .398       .398       .398
Kurtosis                   -.829      .109       .179       7.816
  Std. Error               .778       .778       .778       .778

Create a line diagram showing the means for days 1-3 with 95% CI: Graphs > Legacy dialogs > Line. Choose Simple and Summaries for groups of cases. Press “Define”. Choose the variables as in the example above. Press “Options”, check “Display error bars” and press “Continue”. Change the scale on the y-axis to logarithmic as before. Close the chart editor. Save the output window.

Create a line diagram showing the means for days 1-3 for all groups. Follow the steps above, but choose “Multiple” instead. Choose the variables as in the example. Re-scale the y-axis and close the chart editor. Save the output window.

Cell count data
Import the Excel sheet “CellCount._Total2014.xlsx” into SPSS as before.
Draw a box plot describing the cell counts for days 1-4 by treatment. Follow the same steps as with the cell suspension data set, but this time use the Clustered option. Transform the y-axis scale, close the chart editor and save the output window.
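For readers who want to check the Explore output by hand, the summary measures for one day (mean with 95% CI, SD, median, and skewness/kurtosis with their standard errors) can be sketched in plain Python. This is a minimal sketch, not SPSS's implementation, and it rests on stated assumptions: the day's cell counts are already in a list of numbers; `describe`, `t_quantile`, `t_cdf` and `t_pdf` are hypothetical helper names; the t quantile is found by numerically integrating the t density (a stand-in so that only the standard library is needed); and skewness and kurtosis use the bias-adjusted G1/G2 formulas, whose standard errors reproduce the .398 and .778 reported for n = 35 in the Descriptives table.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    a = abs(x)
    n = max(200, 2 * math.ceil(a * 100))  # even number of panels, width <= 0.005
    h = a / n
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # area between 0 and |x|
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(p, df):
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def describe(values, conf=0.95):
    """Explore-style summary of one group, e.g. the cell counts of one day.

    Skewness and kurtosis use the bias-adjusted G1/G2 formulas (needs n >= 4).
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    se = sd / math.sqrt(n)
    t_crit = t_quantile(1 - (1 - conf) / 2, n - 1)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1, g2 = m3 / m2 ** 1.5, m4 / m2 ** 2 - 3
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return {
        "n": n, "mean": mean, "sd": sd, "median": statistics.median(values),
        "se": se, "ci": (mean - t_crit * se, mean + t_crit * se),
        "skewness": g1 * math.sqrt(n * (n - 1)) / (n - 2),
        "se_skewness": se_skew,
        "kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "se_kurtosis": 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5))),
    }


# Check against the Day 1 row of the Descriptives table (n = 35):
t34 = t_quantile(0.975, 34)
print(round(0.40260 - t34 * 0.010459, 5), round(0.40260 + t34 * 0.010459, 5))
# expected: 0.38134 0.42386
```

A commonly used rough rule (not an SPSS rule) is to be wary of normality when skewness or kurtosis exceeds about twice its standard error; for day 4 in the table, 1.674/.398 ≈ 4.2 and 7.816/.778 ≈ 10.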
Draw a line graph showing the mean with 95% CI for days 1-4 by treatment. Follow the same steps as with the cell suspension data set, using the Multiple option.

Examine the distribution of the cell counts for each treatment (Controls, SAL, Analog) and culture day. Decide whether t-tests would be suitable for testing the difference between treatments on days 2, 3 and 4. This can be done in many ways: use the Analyze > Descriptive statistics > Explore command again, or make a graphical presentation with Graphs > Legacy dialogs > Histogram.

Irrespective of whether the assumption of normal distribution holds (for practice), for each of days 2, 3 and 4, perform t-tests comparing the cell counts between Controls and SAL, Controls and Analog, and SAL and Analog. Discuss the multiple testing problem.
To analyze one day at a time, you must use a filter: Data > Select cases > check “If condition is satisfied”. Press “Continue”.
To perform the t-test: Analyze > Compare means > Independent samples T-test. Press Define Groups and use the setting Controls (= 1) versus SAL (= 2).

Group Statistics (nCell, day 2)
TreatmentType   N    Mean          Std. Deviation   Std. Error Mean
1.0             25   761367.040    226360.3365      45272.0673
2.0             14   570000.000    111808.1048      29881.9730

Independent Samples Test (nCell)
Levene's Test for Equality of Variances: F = 11.302, Sig. = .002
t-test for Equality of Means:
                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       2.955   37       .005              191367.0400       64752.4179              60166.1789     322567.9011
Equal variances not assumed   3.528   36.631   .001              191367.0400       54244.7452              81419.3990     301314.6810

Interpret the results. Repeat for Controls vs Analog, and SAL vs Analog. Repeat for days 3 and 4 (change the filter settings).

For each of days 2, 3 and 4, perform one-way ANOVA analyses. Discuss the multiple testing problem. Set the filter to select a specific day, then choose Analyze > Compare means > One way ANOVA. Press “Post Hoc” and choose the type of adjustment for multiple testing.
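The t statistics in the Independent Samples Test above and the F statistic in the ANOVA table that follows can be reproduced from the group summaries alone, which makes the difference between the two t-test rows concrete: the pooled version ("Equal variances assumed") uses a common variance with n1 + n2 - 2 degrees of freedom, while the Welch version ("Equal variances not assumed") keeps the variances separate and uses the Welch-Satterthwaite degrees of freedom. This is a minimal Python sketch (standard library only); `two_sample_t` and `one_way_anova` are hypothetical helper names. Group 3 is taken to be Analog, and its standard deviation is an assumption borrowed from the AnalogDay2 row of the Paired Samples Statistics further down (same n = 12 and mean 418059.00), since the Group Statistics table lists only groups 1 and 2. Because the inputs are the rounded SPSS values, the results agree to about the printed precision. p-values, Levene's test and Tukey's HSD need the raw data or further distribution functions and are left to SPSS.

```python
import math
import statistics


def two_sample_t(n1, mean1, sd1, n2, mean2, sd2):
    """Student (pooled) and Welch t statistics from group summaries."""
    diff = mean1 - mean2
    # Pooled-variance version ("Equal variances assumed")
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_pooled
    se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    # Welch version ("Equal variances not assumed")
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "pooled": (diff / se_pooled, df_pooled, se_pooled),
        "welch": (diff / se_welch, df_welch, se_welch),
    }


def one_way_anova(groups):
    """Sums of squares and F from (n, mean, sd) tuples, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand = sum(n * m for n, m, _ in groups) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df": (df_between, df_within), "F": f}


# Day-2 summaries copied from the SPSS output (1 = Controls, 2 = SAL, 3 = Analog)
controls = (25, 761367.040, 226360.3365)
sal = (14, 570000.000, 111808.1048)
analog = (12, 418059.000, 171860.598)  # SD assumed from the AnalogDay2 row

tt = two_sample_t(*controls, *sal)
print("pooled t = %.3f, df = %d" % tt["pooled"][:2])
print("Welch  t = %.3f, df = %.3f" % tt["welch"][:2])
anova = one_way_anova([controls, sal, analog])
print("F = %.3f" % anova["F"])
print("harmonic mean n = %.3f" % statistics.harmonic_mean([25, 14, 12]))
```

The last line reproduces the harmonic mean sample size (15.403) that the Tukey HSD homogeneous-subsets table uses because the group sizes are unequal.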
ANOVA (nCell)
                 Sum of Squares      df   Mean Square         F        Sig.
Between Groups   1020667702643.668   2    510333851321.834    14.266   .000
Within Groups    1717146441388.960   48   35773884195.603
Total            2737814144032.628   50

Post Hoc Tests
Multiple Comparisons (Dependent Variable: nCell)
Method       (I) TreatmentType   (J) TreatmentType   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
Tukey HSD    1.0                 2.0                 191367.0400*            63136.6202   .011   38671.906      344062.174
Tukey HSD    1.0                 3.0                 343308.0400*            66423.7336   .000   182663.063     503953.017
Tukey HSD    2.0                 1.0                 -191367.0400*           63136.6202   .011   -344062.174    -38671.906
Tukey HSD    2.0                 3.0                 151941.0000             74407.2205   .113   -28011.941     331893.941
Tukey HSD    3.0                 1.0                 -343308.0400*           66423.7336   .000   -503953.017    -182663.063
Tukey HSD    3.0                 2.0                 -151941.0000            74407.2205   .113   -331893.941    28011.941
Bonferroni   1.0                 2.0                 191367.0400*            63136.6202   .012   34738.756      347995.324
Bonferroni   1.0                 3.0                 343308.0400*            66423.7336   .000   178525.139     508090.941
Bonferroni   2.0                 1.0                 -191367.0400*           63136.6202   .012   -347995.324    -34738.756
Bonferroni   2.0                 3.0                 151941.0000             74407.2205   .140   -32647.203     336529.203
Bonferroni   3.0                 1.0                 -343308.0400*           66423.7336   .000   -508090.941    -178525.139
Bonferroni   3.0                 2.0                 -151941.0000            74407.2205   .140   -336529.203    32647.203
*. The mean difference is significant at the 0.05 level.

Homogeneous Subsets
nCell, Tukey HSD (a, b)
TreatmentType   N    Subset 1 (alpha = 0.05)   Subset 2 (alpha = 0.05)
3.0             12   418059.000
2.0             14   570000.000
1.0             25                             761367.040
Sig.                 .076                      1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 15.403.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.

Save the output window. Discuss the results. Compare with the results of the t-test. Repeat for days 3 and 4.

For each of days 2, 3 and 4, perform paired t-tests comparing the cell counts between Controls and SAL, Controls and Analog, and SAL and Analog. Here we need a data set structured in a different way: import the Excel sheet “CellCount._Total2014_paired.xlsx”. Then choose Analyze > Compare means > Paired samples T-test.
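A paired t-test is a one-sample t-test on the within-pair differences (SPSS subtracts the second variable from the first, e.g. ControlsDay2 - SalDay2), so it can be sketched in a few lines of Python (standard library only; `paired_t`, `paired_t_from_summary` and `sd_of_difference` are hypothetical names). The last helper shows why pairing can matter: the SD of the differences shrinks when the paired values are positively correlated. The example numbers are copied from the Paired Samples tables that follow; which control value belongs with which treated value is assumed to be given by the row structure of the paired Excel sheet.

```python
import math
import statistics


def paired_t(first, second):
    """Paired t-test on raw pairs: a one-sample t-test on first - second."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    return paired_t_from_summary(statistics.mean(diffs),
                                 statistics.stdev(diffs), len(diffs))


def paired_t_from_summary(mean_diff, sd_diff, n):
    """t statistic, df and SE from the mean and SD of the differences."""
    se = sd_diff / math.sqrt(n)
    return mean_diff / se, n - 1, se


def sd_of_difference(sd1, sd2, r):
    """SD of (X - Y) implied by the two SDs and their correlation r."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)


# Pair 1 (ControlsDay2 - SalDay2) from the Paired Samples Test table
print(paired_t_from_summary(254992.857, 216636.890, 14))

# Pair 2: SD of the differences with the observed r = 0.723 versus r = 0
print(sd_of_difference(248740.327, 171860.598, 0.723))
print(sd_of_difference(248740.327, 171860.598, 0.0))
```

For Pair 1 this gives t ≈ 4.404 with 13 df, as in the SPSS table. For Pair 2, the correlation of .723 implies an SD of the differences of about 172 000 instead of roughly 302 000 for uncorrelated columns, which is why the paired test can be much more sensitive than the independent one.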
Paired Samples Statistics
Pair     Variable       Mean          N    Std. Deviation   Std. Error Mean
Pair 1   ControlsDay2   824992.86     14   181454.010       48495.624
         SalDay2        570000.00     14   111808.105       29881.973
Pair 2   ControlsDay2   688023.00     12   248740.327       71805.147
         AnalogDay2     418059.00     12   171860.598       49611.881
Pair 3   ControlsDay3   1590714.286   14   403283.9100      107782.1586
         SalDay3        588571.43     14   181240.475       48438.555
Pair 4   ControlsDay3   1204369.500   12   778366.4114      224695.0286
         AnalogDay3     392621.83     12   171914.332       49627.393
Pair 5   ControlsDay4   2229214.286   14   418215.2884      111772.7374
         SalDay4        733000.00     14   358080.996       95701.172
Pair 6   ControlsDay4   1825750.000   12   934931.5118      269891.4800
         AnalogDay4     457096.08     12   198504.128       57303.206

Paired Samples Correlations
Pair     Variables                    N    Correlation   Sig.
Pair 1   ControlsDay2 & SalDay2       14   -.037         .900
Pair 2   ControlsDay2 & AnalogDay2    12   .723          .008
Pair 3   ControlsDay3 & SalDay3       14   -.221         .448
Pair 4   ControlsDay3 & AnalogDay3    12   .392          .208
Pair 5   ControlsDay4 & SalDay4       14   .075          .798
Pair 6   ControlsDay4 & AnalogDay4    12   -.228         .477

Paired Samples Test (paired differences)
Pair     Difference                  Mean           Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper   t        df   Sig. (2-tailed)
Pair 1   ControlsDay2 - SalDay2      254992.857     216636.890       57898.644         129910.441     380075.274     4.404    13   .001
Pair 2   ControlsDay2 - AnalogDay2   269964.000     172105.840       49682.676         160613.167     379314.833     5.434    11   .000
Pair 3   ControlsDay3 - SalDay3      1002142.8571   477241.2480      127548.0886       726591.9643    1277693.7500   7.857    13   .000
Pair 4   ControlsDay3 - AnalogDay3   811747.6667    728358.3914      210258.9567       348970.8232    1274524.5101   3.861    11   .003
Pair 5   ControlsDay4 - SalDay4      1496214.2857   529653.9042      141555.9602       1190401.2261   1802027.3453   10.570   13   .000
Pair 6   ControlsDay4 - AnalogDay4   1368653.9167   998985.2888      288382.2127       733928.9460    2003378.8873   4.746    11   .001

Compare the results with the previous results (independent t-tests).

BIMM18 – Lab 2, 16 September 2015

Perform non-parametric paired tests corresponding to the tests under 6. Compare the results. Use the paired data set.
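For each difference such as SalDay2 - ControlsDay2, the Wilcoxon output that follows lists the negative and positive rank sums and a Z value "based on positive ranks". The normal approximation behind that Z can be sketched in Python (standard library only; `statistics.NormalDist` needs Python 3.8 or later; `signed_rank_sums` and `wilcoxon_z` are hypothetical names). In this sketch, zero differences are dropped, tied absolute differences share their average rank, and, unlike SPSS, no tie correction is applied to the variance, so the last digit may differ when ties occur.

```python
import math
from statistics import NormalDist


def signed_rank_sums(first, second):
    """Rank sums for the differences second - first (SPSS's 'SalDay2 - ControlsDay2'
    when first = ControlsDay2 and second = SalDay2). Zero differences are dropped;
    tied absolute differences get their average rank."""
    diffs = [b - a for a, b in zip(first, second) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


def wilcoxon_z(w_plus, n):
    """Z and two-sided p from the positive-rank sum (no tie correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# SalDay2 - ControlsDay2 in the Ranks table: positive-rank sum 7 with n = 14
z, p = wilcoxon_z(7, 14)
print(round(z, 3), round(p, 3))
```

This gives Z ≈ -2.856 and p ≈ 0.004, against -2.857 and .004 in SPSS; the small gap is presumably the tie correction that this sketch leaves out.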
Analyze > Non parametric tests > Legacy dialogs > 2 related samples

Wilcoxon Signed Ranks Test
Ranks
Difference                  Ranks            N    Mean Rank   Sum of Ranks
SalDay2 - ControlsDay2      Negative Ranks   12   8.17        98.00
                            Positive Ranks   2    3.50        7.00
                            Ties             0
                            Total            14
AnalogDay3 - ControlsDay2   Negative Ranks   10   7.50        75.00
                            Positive Ranks   2    1.50        3.00
                            Ties             0
                            Total            12
SalDay3 - ControlsDay3      Negative Ranks   14   7.50        105.00
                            Positive Ranks   0    .00         .00
                            Ties             0
                            Total            14
AnalogDay3 - ControlsDay3   Negative Ranks   10   7.30        73.00
                            Positive Ranks   2    2.50        5.00
                            Ties             0
                            Total            12
SalDay4 - ControlsDay4      Negative Ranks   14   7.50        105.00
                            Positive Ranks   0    .00         .00
                            Ties             0
                            Total            14
AnalogDay4 - ControlsDay4   Negative Ranks   11   7.00        77.00
                            Positive Ranks   1    1.00        1.00
                            Ties             0
                            Total            12
For each difference X - Y: Negative Ranks are cases with X < Y, Positive Ranks are cases with X > Y, and Ties are cases with X = Y.

Test Statistics (a)
Difference                  Z           Asymp. Sig. (2-tailed)
SalDay2 - ControlsDay2      -2.857 (b)  .004
AnalogDay3 - ControlsDay2   -2.824 (b)  .005
SalDay3 - ControlsDay3      -3.297 (b)  .001
AnalogDay3 - ControlsDay3   -2.667 (b)  .008
SalDay4 - ControlsDay4      -3.296 (b)  .001
AnalogDay4 - ControlsDay4   -2.981 (b)  .003
a. Wilcoxon Signed Ranks Test.
b. Based on positive ranks.

Linda Hartman, Karin Källen
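The lab repeatedly asks you to discuss the multiple testing problem. With three pairwise comparisons on each of three days, nine tests are run at α = 0.05; if all null hypotheses were true and the tests were independent, the chance of at least one false positive would be 1 - 0.95^9 ≈ 0.37. The sketch below (standard library only; the function names are hypothetical) computes that probability and two common corrections, Bonferroni and Holm's step-down method, applied to the six Wilcoxon p-values above. Because SPSS rounds the p-values to three decimals, the adjusted values are only approximate. The Bonferroni rows in the SPSS post hoc table follow the same idea for the pairwise comparisons within one ANOVA.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm's step-down adjustment; never larger than the Bonferroni values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values in the same order
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def familywise_error(alpha, k):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k


# Two-sided Wilcoxon p-values from the Test Statistics table (rounded by SPSS)
p = [0.004, 0.005, 0.001, 0.008, 0.001, 0.003]
print([round(x, 3) for x in bonferroni(p)])
print([round(x, 3) for x in holm(p)])
print(round(familywise_error(0.05, 9), 3))
```

Here all six comparisons stay below 0.05 after either adjustment (the largest Bonferroni value is 6 × 0.008 = 0.048); with more tests or larger p-values, the conclusions could change.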