Bio 286: Worksheet 3-Replication, transformations and power – answer key
Worksheet 3: Replication, transformations and power
Replication
1ai: The number of replicates per county is 12, because sites are the independent replicates for
each county. Degrees of freedom are 11 (12 - 1) for each county.
1aii: There are 25 replicates for the comparison of sites within Monterey County. Here each plot
is a replicate for its site.
1bi: H0: there is no difference in diving depth for otters before and after eating.
1bii: Paired t-test.
1biii: The replicates are the 50 otters. The degrees of freedom are 49 (50 - 1).
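The paired design in 1b can be sketched in a few lines of Python. This is only an illustration: the dive depths are hypothetical (five otters rather than the worksheet's 50), and `paired_t` is a helper name invented here. It shows why the degrees of freedom are the number of pairs minus one: the test is run on one difference per otter.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("each subject needs one 'before' and one 'after' value")
    # The test works on per-subject differences, so n is the number of pairs.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean_diff / se_diff, n - 1

# Hypothetical dive depths (m) for 5 otters; the worksheet has 50 otters (df = 49).
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [11.0, 14.0, 9.5, 12.5, 13.5]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```

With 50 otters the same function would return df = 49, matching the answer to 1biii.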
Transformations
2ai: A two-sample t-test.
2aii: Use DISTRIBUTION and put ‘mil’ in the Y box. Then look at the box plot and frequency
distribution. Click on the ‘mil’ red triangle, then on CONTINUOUS FIT, then on NORMAL.
Now click on NORMAL and then on DIAGNOSTIC FIT. Look at the plot: does it look
normal? If not, what type of transformation might help? Try the following: click on ‘mil’, then on
CONTINUOUS FIT, then on LOGNORMAL. Now click on LOGNORMAL and then on
DIAGNOSTIC FIT. Look at the plot: do the data now appear to fit the distribution?
Now let's look at homogeneity of variance. Use TABLES, SUMMARY and
click on ‘mil’, then on STATISTICS, VARIANCE. Now put ‘Urban 2’ in the GROUP box and
click OK. Check the VARIANCE term, then the MEAN term. Remember that one of the
assumptions of the two-sample t-test is homogeneity of variance. Are the two variance terms
similar? If not, try a log transformation.
2aiiia: Log transform.
2aiiib: The box plot and probability plots suggest a lognormal distribution; also, the variance
terms differ in a way suggestive of lognormal data (the variance scales with the mean). Try making
a new variable called ‘Logmil’. First, go to the data window. Click on COLS, then on NEW
COLUMN, and enter ‘Logmil’. Now go to that column, right-click the variable name,
and click on FORMULA. A window will open. On the right side is the FUNCTIONS window.
Click on TRANSCENDENTAL, then on LOG10. That function will show up in the open
window at the bottom. Now click on ‘mil’ in the TABLE COLUMNS window. This will
insert ‘mil’ into the LOG10 function. Now click OK and you will have transformed ‘mil’ to log
base 10 of ‘mil’.
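The effect of the LOG10 formula on the group variances can be sketched outside JMP. The values below are hypothetical right-skewed numbers, not the worksheet data, and the group labels are only borrowed from the ‘Urban 2’ variable. The point is that the raw variances differ by orders of magnitude while the log-scale variances are similar.

```python
import math
import statistics

# Hypothetical right-skewed spending values for two groups (not the worksheet data).
mil = {
    "city": [120.0, 300.0, 950.0, 2400.0, 8000.0],
    "rural": [15.0, 40.0, 90.0, 260.0, 700.0],
}

raw_var = {}
log_var = {}
for group, values in mil.items():
    logmil = [math.log10(v) for v in values]  # same operation as JMP's LOG10 formula
    raw_var[group] = statistics.variance(values)
    log_var[group] = statistics.variance(logmil)
    print(f"{group}: var(mil) = {raw_var[group]:.1f}, var(Logmil) = {log_var[group]:.3f}")

# Ratio of the larger to the smaller variance, before and after the transformation.
raw_ratio = max(raw_var.values()) / min(raw_var.values())
log_ratio = max(log_var.values()) / min(log_var.values())
print(f"variance ratio: raw = {raw_ratio:.1f}, log10 = {log_ratio:.2f}")
```

A variance ratio close to 1 after transformation is what the homogeneity-of-variance assumption asks for.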
2aiiic: Look at the tables and graphs below. Look at the probability graphs and compare the
variance terms for the untransformed and transformed variables. ‘Logmil’ meets the assumptions of
normality and homogeneity of variance (with ‘Urban 2’ as the grouping variable) better than
‘mil’ does.
2aiiid: To conduct a t-test: ANALYZE > FIT Y BY X > (add the variables) > OK > [red
triangle] > MEANS ANOVA POOLED T ~or~ T TEST. Based on the results below we can
reject the null hypothesis. Urban countries spend more on military than do rural ones.
Difference     -0.8421    t Ratio      -5.19043
Std Err Dif     0.1622    DF            54
Upper CL Dif   -0.5168    Prob > |t|    <.0001*
Lower CL Dif   -1.1674    Prob > t      1.0000
Confidence      0.95      Prob < t      <.0001*

[Figure: Mean(Logmil) with 95% CI (pooled) by Urban 2 (city, rural)]
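As a quick consistency check on the reported output, the t ratio and confidence limits can be recomputed from the difference and its standard error. The sketch below uses only the numbers reported for Logmil; the critical value 2.0049 is an assumption taken from a t table for 54 degrees of freedom (two-tailed, alpha = 0.05), since the Python standard library has no t distribution.

```python
difference = -0.8421  # reported difference in mean Logmil
std_err = 0.1622      # reported Std Err Dif
t_crit = 2.0049       # assumed: approximate two-tailed 5% critical t for 54 df

t_ratio = difference / std_err          # test statistic
lower = difference - t_crit * std_err   # lower 95% confidence limit
upper = difference + t_crit * std_err   # upper 95% confidence limit

print(f"t ratio = {t_ratio:.3f}")
print(f"95% CI  = ({lower:.4f}, {upper:.4f})")
```

The small discrepancies from the reported values come from rounding in the inputs. Because the interval excludes zero, the null hypothesis of no difference is rejected at the 5% level, which agrees with the reported Prob > |t|.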
2bi: These data are more complicated than the earlier ones. The box plot looks OK, but the
probability plot is very strange. It has the characteristic look of data that need an ARCSIN
transformation (ask why this is the case if you are uncertain).
[Figure: distribution of Prop]
2bii:
[Figure: distribution of aprop]
2biii: The transformed data are much more appropriate for a t-test. They are now much closer to
normally distributed.
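The worksheet does not give the formula behind ‘aprop’. A minimal sketch, assuming the usual arcsine square-root form, asin(sqrt(p)) in radians, is shown below; the function name and the example proportions are hypothetical.

```python
import math

def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(p))

# Hypothetical proportions of free space (not the worksheet data).
props = [0.0, 0.05, 0.25, 0.5, 0.7, 1.0]
aprop = [arcsine_sqrt(p) for p in props]
for p, a in zip(props, aprop):
    print(f"prop = {p:.2f} -> aprop = {a:.3f}")
```

Proportions are bounded at 0 and 1, so values pile up near the limits. The transform stretches the scale near those limits (0 maps to 0 and 1 maps to pi/2), which tends to make the distribution more symmetric.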
2biv: There is more free space low in the tide zone. The graph shows both the raw data (prop)
and the transformed data (aprop). The confidence interval is shown for aprop, as this is
the variable used in the analysis, and the within-group standard error is shown for prop to show
the variability in the data.
t Test
High-Low
Assuming equal variances

Difference     -0.37650    t Ratio      -4.48208
Std Err Dif     0.08400    DF            48
Upper CL Dif   -0.20761    Prob > |t|    <.0001*
Lower CL Dif   -0.54540    Prob > t      1.0000
Confidence      0.95       Prob < t      <.0001*

[Figure: Mean(PROP) & Mean(aprop) vs. TIDEHEIGHT (Low, High)]
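The "Assuming equal variances" output corresponds to a pooled-variance two-sample t-test. A minimal sketch of that calculation follows; the function name `pooled_t` and the transformed values are hypothetical, and the data are not the worksheet's.

```python
import math
import statistics

def pooled_t(x, y):
    """Return (difference, std err of difference, t ratio, df) assuming equal variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    diff = statistics.mean(x) - statistics.mean(y)
    return diff, std_err, diff / std_err, df

# Hypothetical transformed proportions for two tide heights.
high = [0.2, 0.3, 0.4, 0.3]
low = [0.7, 0.8, 0.6, 0.7]
diff, std_err, t_ratio, df = pooled_t(high, low)
print(f"difference = {diff:.3f}, SE = {std_err:.4f}, t = {t_ratio:.3f}, df = {df}")
```

Because df = n1 + n2 - 2, the reported DF of 48 is consistent with 50 observations in total.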
3b: The power is very low (0.1236). Hence we had a very low likelihood of getting a significant
result. Now you need to ask yourself the single most important question: “What size effect did I
want to be able to detect?”
3c: Now ask yourself: was the power of the test high or low?
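The link between effect size, sample size and power can be sketched with a normal approximation. This is not the calculation JMP performs (JMP uses the t distribution), and the effect size, standard deviation and sample sizes below are hypothetical, so it will not reproduce the 0.1236 above. It only illustrates how the question "what size effect did I want to detect?" translates into a required number of replicates.

```python
import math
from statistics import NormalDist

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()
    std_err = sd * math.sqrt(2.0 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect) / std_err
    # Probability of landing in either rejection region when the true effect is `effect`.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(effect, sd, target=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches `target`."""
    if effect == 0:
        raise ValueError("a zero effect can never reach the target power")
    n = 2
    while approx_power(effect, sd, n, alpha) < target:
        n += 1
    return n

# Hypothetical inputs: detect a difference of 0.5 when the within-group SD is 1.0.
for n in (5, 20, 80):
    print(f"n per group = {n:3d}: power = {approx_power(0.5, 1.0, n):.3f}")
print("n per group for 80% power:", n_for_power(0.5, 1.0))
```

Small samples or small effects relative to the within-group variability give low power, which is the situation described in 3b.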
4. The graphs below show a cumulative distribution of means and the frequency distribution of
means. The cumulative distribution is usually easier to understand. Here it shows that 95% of
means (from resampling) occur between 66 and 68.136. This is the two-tailed confidence
interval, and it contains (just barely) the null value of 68.
[Figure: cumulative distribution of resampled means, with the central 95% marked between 66 and 68.136]
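A resampling confidence interval of this kind can be sketched as a percentile bootstrap. The sample below is hypothetical, the worksheet does not say how its resampling was done, and `bootstrap_mean_ci` is a name invented for this illustration.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Resample with replacement, record each resample's mean, then sort the means.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical measurements (not the worksheet data); the null value of 68 is from the worksheet.
sample = [64.0, 65.5, 66.0, 66.5, 67.0, 67.0, 67.5, 68.0, 68.5, 70.0]
null_value = 68.0
lower, upper = bootstrap_mean_ci(sample)
contains_null = lower <= null_value <= upper
print(f"95% interval: ({lower:.3f}, {upper:.3f}); contains null value: {contains_null}")
```

If the interval contains the null value, the two-tailed test at the matching alpha level does not reject the null hypothesis.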