Download Biol2050 2014 stats primer – final

Ecology reporting and statistical analysis
Chris Luszczek
Biol2050
Introduction
• Please treat this slide show as a statistics manual for
Biol2050
• This tutorial will provide you with the basics of various
common statistical methods and examples of how to
perform these tests using SPSS statistical software
available in York computer labs and accessible from
home using York’s remote File Access System (FAS)
• *WARNING* The FAS may involve a lengthy installation
procedure and I have found it to be finicky, sometimes
requiring multiple attempts at installation. Be aware of
this if you are downloading the software at home… at
midnight the evening before your report is due.
Outline
1) Hypothesis Building
– Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common Statistical tests and how to run them
A) t-test
B) ANOVA
C) Correlation
4) Graphing
– how to present your findings
– Types of graphs and usage
– formatting
1) Hypothesis Building
• Creating testable hypotheses is central to the scientific method
– Null (Ho) hypothesis – ‘no effect’ or ‘no difference’ between samples
or treatments
– Alternative (Ha) hypothesis – the experimental treatment has a
statistically significant effect
– A claim for which we are trying to find evidence
Example
Ho: “Different habitats on the York University campus display no
differences in diversity”
(Ho: x2 = x1 or x2 - x1 = 0)
Ha: “Grassland habitats at York University contain higher diversity than
managed or landscaped areas”
(Ha: x2 > x1 or x2 - x1 > 0)
2) Hypothesis Testing
• Either reject or fail to reject the H0 based on statistical
testing
• Statistical testing compares the p-value of the observed
data to an assigned significance level (α)
– p-value – the probability of obtaining a result at least as
extreme as the one observed if the null hypothesis were true
– α – the cut-off probability chosen before testing; it is the risk
you accept of rejecting a null hypothesis that is actually true
• Popular levels of significance are 5% (0.05), 1% (0.01), and 0.1%
(0.001)
IF the p-value is SMALLER than α, reject the null hypothesis (H0)
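The decision rule above can be written out as a minimal Python sketch. The function name `decide` and the example p-values are illustrative assumptions, not part of the course material or of SPSS.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is smaller than alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Hypothetical p-values, for illustration only.
print(decide(0.01))               # tested at the default alpha of 0.05
print(decide(0.20))               # tested at the default alpha of 0.05
print(decide(0.01, alpha=0.001))  # same p-value, stricter alpha
```

Note that the same p-value can lead to different decisions depending on the significance level chosen, which is why α should be fixed before the test is run.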
Hypothesis Testing Visual Summary
[Figure: Sample distribution 1 and Sample distribution 2, with Mean 1 and Mean 2 marked. Are the means significantly different?]
3) Common Statistical Tests in Ecology
• T-tests
• ANOVA - Analysis Of VAriance
• Correlation
Common Statistical Tests in Ecology
• T-tests: used to determine if the means of two sets
of data are significantly different from each
other. The test assumes that the data are normally
distributed and that the samples have equal variances.
– 2 decisions must be made when selecting a t-test:
• Paired vs. independent
• 1-tailed vs 2-tailed
• ANOVA - Analysis Of VAriance
• Correlation
3A) T-test
• One-sample (paired) t-test: compares two samples in
cases where each value in one sample has a natural
partner in the other (data are not independent). Used
on pre/post data. Also compares a sample mean to a
specified value
– Comparing patient performance before and after the
application of a drug (repeated measures sampling – the
same subjects are measured before and after treatment)
• Two-sample (independent) t-test: compares means
for two groups of cases.
– Comparing patient performance in a group receiving a
drug versus a separate group receiving a trial drug
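To make the paired case concrete, here is a minimal Python sketch of the paired t statistic, computed from the before/after differences of the same subjects. The function name `paired_t` and the scores are hypothetical; in the course you would obtain the same statistic from SPSS.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t-test on two matched samples."""
    if len(before) != len(after):
        raise ValueError("paired samples need one 'after' value per 'before' value")
    # A paired test works on the difference within each subject.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical scores for five subjects measured before and after treatment.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
t, df = paired_t(before, after)
print(f"paired t({df}) = {t:.3f}")
```

Because each subject acts as its own control, only the spread of the differences matters, not the spread between subjects.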
3A) T-test
• One-tailed/sided t-test: expect the effect to be in
a certain direction
– “is the sample mean greater than µ?”
– “is the sample mean less than µ?”
H0 : µ = 𝜇0 , where 𝜇0 is known
HA : µ > 𝜇0 or µ < 𝜇0
• Two-tailed/sided t-test: testing for different
means regardless of direction
– “is there a significant difference?”
H0 : 𝜇1 = 𝜇2
HA : 𝜇1 ≠ 𝜇2
Match Your Hypothesis and Test!
A carefully stated experimental hypothesis will indicate the type of
effect you are looking for
For example, the hypothesis that
“Coffee improves memory”
– suggests a paired, one-tailed test because you will repeatedly
measure the same participants and expect an improvement.
“Men weigh a different amount from women”
– suggests an independent, two-tailed test as no direction is
implied.
So remember, don't be vague with your hypothesis if you are looking
for a specific effect! Be careful with the null hypothesis too - avoid "A
does not affect B" if you really mean "A does not improve B".
Running a T-test in SPSS
• Question: Do the fish in lake 1 and lake 2
weigh the same?
• Null hypothesis: 𝜇1 = 𝜇2 (the fish in lake 1
weigh the same as the fish in lake 2)
– An independent, 2-tailed test!
• Alternative hypothesis : 𝜇1 ≠ 𝜇2 (the fish in
lake 1 and lake 2 DO NOT weigh the same)
1) Data from an Excel sheet can be opened in SPSS. Sometimes you will
automatically see a summary of your data rather than the data itself. To
correct this:
2) Click the Data View tab rather than the Variable View tab
Data View / entry
Select Analyze → compare means → independent samples t test
Weight is the test variable
Lake is the grouping variable (click on define groups and type the
two names used in the data view)
Output
Levene’s test – assesses whether the variances of the two groups
are equal. If p > 0.05 the variances are not significantly different and you can
interpret the t results
* Given the quality of data collected
in these labs, assume that the data
fulfil the Levene’s test and go on to
interpret the t-test *
Our example: Levene’s test (p = 0.669) is not significant, so we can interpret the t-test. The t-test (p = 0.01) is
significant at α = 0.05, so we reject the null hypothesis: the fish from lake 1 and lake 2 DO NOT weigh the same.
How to report: two-sample t(df) = t-value, p = p-value
(two-sample t(12) = -3.065, p = 0.01)
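For readers who want to see what the independent-samples test computes, the following Python sketch calculates the equal-variance (pooled) two-sample t statistic and its degrees of freedom. The fish weights are invented for illustration (seven fish per lake, so that df = 12 as in the example), and they are not the data behind the SPSS output above. The critical value 2.179 is the standard two-tailed t-table value for α = 0.05 and df = 12.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their df.
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / df
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, df


# Hypothetical fish weights (kg); NOT the data used in the slides.
lake1 = [2.1, 2.4, 1.9, 2.2, 2.0, 2.3, 2.1]
lake2 = [2.6, 2.9, 2.5, 2.8, 2.4, 2.7, 3.0]

t, df = independent_t(lake1, lake2)
T_CRITICAL = 2.179  # two-tailed, alpha = 0.05, df = 12 (from a t table)
print(f"two-sample t({df}) = {t:.3f}")
print("reject H0" if abs(t) > T_CRITICAL else "fail to reject H0")
```

The sign of t only reflects which group was entered first; for a two-tailed test it is the absolute value that is compared with the critical value.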
Common Statistical Tests in Ecology
• T-tests:
• ANOVA - Analysis Of VAriance
– Compares the means of more than two groups
– Compares variance within groups and between
groups
– Parametric; an extension of the two-tailed t-test to more than two groups
• Correlation:
3B) ANOVA
• Analysis Of VAriance (ANOVA)
Examples:
• Is tree density at all York habitats the same?
• Does insect diversity in York grasslands differ
from insect diversity in York woodlots and
human-impacted areas?
• 3 means being compared
H0 : µ1 = µ2 = µ3 = … = µk
where k = number of groups being compared
HA: one or more means are different
Running an ANOVA
You sample four fish from each of three
lakes to determine if the fish from the
three lakes all weigh the same.
H0 : There is no difference in fish weight
between lakes
H0 : µLake 1 = µLake 2 = µLake 3
HA : at least one lake mean differs from the others
Select Analyze → compare means → one way ANOVA
*IMPORTANT*
Select post hoc → Tukey → continue → OK
Running the ANOVA will identify IF differences
between groups exist.
Running a post hoc test will test all
combinations to determine WHICH groups are
different from each other
Sig. difference between groups
Lake 1 and 2 are not
significantly different but both
are sig. different from lake 3
(based on α = 0.05)
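The idea of comparing variance between groups with variance within groups can be sketched in Python as follows. The weights are hypothetical (four fish from each of three lakes, as in the example, but not the slide data), and the sketch only computes the overall F statistic; it does not perform the Tukey post hoc comparisons, which SPSS handles. The critical value 4.26 is the standard F-table value for α = 0.05 with (2, 9) degrees of freedom.

```python
import statistics


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical fish weights (kg), four fish per lake.
lake1 = [2.0, 2.2, 2.1, 2.3]
lake2 = [2.1, 2.4, 2.2, 2.3]
lake3 = [3.0, 3.2, 2.9, 3.1]

f_stat, df_b, df_w = one_way_anova(lake1, lake2, lake3)
F_CRITICAL = 4.26  # alpha = 0.05, df = (2, 9), from an F table
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
print("reject H0" if f_stat > F_CRITICAL else "fail to reject H0")
```

A large F means the group means differ by more than the scatter within groups would suggest, but it does not say which groups differ; that is what the post hoc test is for.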
Common Statistical Tests in Ecology
• T-tests
• ANOVA - Analysis Of VAriance
• Correlation: Indicates the strength and
direction of a linear relationship between two
random variables
H0 : no relationship between variables
HA : there is a relationship between variables
3C) Correlation
• Pearson’s Correlation Coefficient (r) – measures
the strength and direction of the linear relationship between two variables
• r always lies between -1 and +1
– Positive r-values mean that the two variables increase together.
Negative r-values mean that one variable decreases as the other increases
– r-values close to zero mean the variables have no linear relationship. r-values
close to either -1 or 1 mean the relationship is strong.
– Generally, for ecological data, an r of magnitude greater than 0.5 is considered very
strong and a correlation of magnitude less than 0.2 is considered weak.
– R2 (coefficient of determination) is the proportion of the variation in one
variable that is explained by the line of best fit, i.e. a measure of how well the regression
line represents the data.
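As a small illustration of how r and R2 are obtained, the sketch below computes Pearson's r from its definition and squares it. The function name `pearson_r` and the diversity values are hypothetical and chosen only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plant and bird diversity values for five habitats.
plant_diversity = [3, 5, 6, 8, 10]
bird_diversity = [2, 4, 4, 7, 8]

r = pearson_r(plant_diversity, bird_diversity)
print(f"r = {r:.3f}, R2 = {r ** 2:.3f}")
```

The sign of r gives the direction of the relationship, while R2 is always between 0 and 1 and loses the direction.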
Correlation Example 1
• Is there a relationship between the bird
diversity and plant diversity in a given habitat?
H0 : no relationship between variables
HA : there is a relationship between variables
r=0.3
Correlation Example 2
• Is there a relationship between plant density
and a) bare ground b) soil pH c) species
richness?
H0 : no relationship between variables
HA : there is a relationship between variables
Running a Correlation
Hypothesize that there is a
relationship between mean fish
length and lake size (larger lakes
might have larger fish).
Collected data from 21 lakes.
Select Graphs → legacy dialogs → scatter → define
(lake size x variable and fish length y variable)
r = 0.824
p < 0.001
Therefore, there is a HIGHLY
SIGNIFICANT, STRONG, POSITIVE
relationship between fish length
and lake size.
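One way to see why r = 0.824 from 21 lakes is highly significant is to convert r into a t statistic with n - 2 degrees of freedom, which is the standard significance test for Pearson's r. The sketch below uses the r and n reported above; the function name is an assumption, and the critical value 3.883 is the standard two-tailed t-table value for α = 0.001 and df = 19.

```python
import math


def correlation_t(r, n):
    """Return (t, df) for testing H0: no linear relationship, given r and n."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r ** 2))
    return t, df


t, df = correlation_t(r=0.824, n=21)  # values reported in the example above
T_CRITICAL_0001 = 3.883  # two-tailed, alpha = 0.001, df = 19 (from a t table)
print(f"t({df}) = {t:.2f}")
print("p < 0.001" if abs(t) > T_CRITICAL_0001 else "p >= 0.001")
```

The same r from a much smaller sample would give a smaller t, so the strength of a correlation and its significance should be reported separately.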
Outline
1) Hypothesis Building
– Null hypothesis/alternate hypothesis
2) Hypothesis Testing
3) Common Statistical tests and how to run them
A) t-test
B) ANOVA
C) Correlation
4) Graphing
– how to present your findings
– Types of graphs and usage
– formatting
Choosing Graphs
Your hypothesis and statistical test should guide your
choice of figures!
• As we have seen, some
tests are related to
specific figures
– Correlations and
scatter plots
The following slides outline
the basic use of several
common graphs
– Scatter plots
– Line graphs
– Bar graphs
– Histograms
Scatter plot
• Displays 2 variables for a set of data
• Dependent vs. independent – the dependent variable (y-axis) is under the control of
the independent variable (x-axis) (Regression Analysis)
OR
• If we have no dependent variable, a scatter plot will show the
degree of correlation (NOT CAUSATION!)
Line graph
• Shows the relationship between values plotted on
each axis (dependent vs. independent)
– Used on continuous variables
Bar graph
• Used for discrete quantitative variables which
are similar but not necessarily related
• ANOVA is often used to test for differences between the groups shown
Making Proper Error bars in Excel
Excel will apply the same error to all bars if you use the automatic error bar feature.
To produce proper, interpretable error bars you must:
1) Calculate standard error for your data:
- First calculate standard deviation using the “STDEV.S” function
- Then divide standard deviation by the square root of n (observations
per group) to give you Standard Error.
2) Different versions of Excel hide the ‘custom error bar’ option in different places –
- try selecting the data bars → right click → select ‘format data series’ →
‘error bars’
OR
- try clicking the graph → move to ‘layout’ under ‘chart tools’ tab → ‘error
bars’
3) Select ‘custom’ and ‘specify value’
4) Be sure to select the ‘range’ of SE values to match the range of selected data for
both the positive and negative error value
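The standard error calculation in step 1 can be checked outside Excel with a short Python sketch. `statistics.stdev` is the sample standard deviation, the same quantity as Excel's STDEV.S; the habitat names and values are hypothetical.

```python
import math
import statistics


def standard_error(values):
    """Return the standard error of the mean: sample SD divided by sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations per group")
    return statistics.stdev(values) / math.sqrt(n)


# Hypothetical diversity counts for three habitats.
groups = {
    "Grassland": [12, 15, 11, 14, 13],
    "Woodlot": [9, 10, 8, 12, 11],
    "Landscaped": [4, 6, 5, 7, 3],
}

# One standard error per group, i.e. one custom error bar value per bar.
for habitat, counts in groups.items():
    print(f"{habitat}: mean = {statistics.mean(counts):.2f}, "
          f"SE = {standard_error(counts):.3f}")
```

Each group gets its own SE, which is why the automatic error bar feature (one value applied to every bar) is not appropriate.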
Proper Error Bars
[Figure: Excel screenshots with steps 1) to 4) marked. See previous slide for explanation of steps.]
Histogram
• Used exclusively for showing the
distribution of data that are continuous.
Conclusion
• This tutorial has provided you with the basic
theory, mechanics and applications of
common statistical tests.
• You should now be able to carry out scientific
reporting from hypothesis formation to
statistical testing and figure formatting.