Statistical Analysis - 3
PEARSON R CORRELATION COEFFICIENT
Introduction:
Sometimes in scientific data, it appears that two variables are
connected in such a way that when one variable changes, the other
variable changes also. This connection is called a correlation.
Examples of this type of correlation include: (1) in deer populations,
large males seem to have more successful matings; and (2) larger
numbers of birds seem to nest in areas with dense vegetation.
Scientists measure the strength of a relationship between two
variables by calculating a correlation coefficient. The value of the
correlation coefficient indicates to what extent the change found in
one variable relates to change in another. There are several types
of correlation coefficients, but the one that is most widely used is
called the Pearson Product-Moment Correlation Coefficient, or
simply, the Pearson r.
Student Procedure
Example: Your students have done some classroom research on
amphibian species found in your area and have discovered that the
red-backed salamander uses fallen logs and debris on the forest
floor for its home. During their earlier census of their Biodiversity
Plot, they have noticed that some quadrats have many fallen logs
whereas other quadrats have few or none. They expect that they
would find more red-backed salamanders in those quadrats with
many fallen logs and design an experiment to test this hypothesis.
This experiment measures: (1) the number of fallen logs in each
quadrat; and (2) the number of red-backed salamanders in each
quadrat. This is a table of the data your class has collected:
Quadrat Number    # Fallen Logs    # Salamanders
       1                4                3
       2                6                2
       3                2                0
       4                3                1
       5                7                3
       6                1                0
       7                0                0
       8                2                0
       9                0                0
      10                3                1
      11                2                1
      12                4                2
      13                0                0
      14                2                1
      15                2                0
      16                1                0
      17                3                1
      18                5                3
      19                3                1
      20                2                2
      21                2                1
      22                0                0
      23                0                0
      24                2                0
      25                1                0
Step 1:
Graphing the data
Graph your data by computer, or by hand, by assigning the number
of fallen logs (the second column) as the X-axis and the number of
salamanders (the last column) as the Y-axis. For example, in
Quadrat 1, the X-value would be 4 and the Y-value would be 3. The
results, when we plot all 25 points on the graph, look like this:
[Figure: scatter plot titled "RELATIONSHIP OF FALLEN LOGS AND SALAMANDERS". X-axis: Number of Fallen Logs; Y-axis: Number of Salamanders.]
Looking at this graph, there seems to be a positive relationship
between the number of fallen logs and the number of salamanders.
In other words, it appears that when the number of fallen logs
increases, the number of red-backed salamanders also increases.
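As a minimal sketch of how the points for this graph are assembled, the class data can be paired up in Python. The variable names `fallen_logs` and `salamanders` are illustrative and not part of the original exercise. Because several quadrats share the same counts, some points land on top of one another, so the graph shows fewer than 25 distinct markers.

```python
from collections import Counter

# Counts per quadrat (quadrats 1 through 25), copied from the data table.
fallen_logs = [4, 6, 2, 3, 7, 1, 0, 2, 0, 3, 2, 4, 0,
               2, 2, 1, 3, 5, 3, 2, 2, 0, 0, 2, 1]
salamanders = [3, 2, 0, 1, 3, 0, 0, 0, 0, 1, 1, 2, 0,
               1, 0, 0, 1, 3, 1, 2, 1, 0, 0, 0, 0]

# Each quadrat becomes one (X, Y) point: X = fallen logs, Y = salamanders.
points = list(zip(fallen_logs, salamanders))
print("Quadrat 1:", points[0])

# Count how many quadrats share each (X, Y) position on the graph.
overlaps = Counter(points)
for (x, y), count in sorted(overlaps.items()):
    print(f"X={x} Y={y}: {count} quadrat(s)")
```

The tally makes it easy to see which positions on the graph represent more than one quadrat.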
Some things to remember about the Pearson r correlation:
• A Pearson r of r = 0.00 means there is ZERO correlation, and would
indicate that X and Y are not linearly related to one another.
• The Pearson r ranges from r = -1.00 to r = +1.00. A value of +1.00
or -1.00 indicates a PERFECT correlation and would indicate that X
and Y are completely related to one another in the sample.
• Pearson r values can be either positive or negative. A positive
value indicates that increases in X correspond to increases in Y. A
negative value indicates that increases in one variable are
associated with decreases in the other variable.
The following graphs illustrate some of the various types of correlations
possible:
a. This is an example of a perfect, positive
correlation, in that the data shows no
deviation from a straight line.
b. This is an example of a perfect, negative
correlation, in that the data shows no
deviation from a straight line.
c. This is an example of a high, positive
correlation. Since the data shows some
variability, a perfect prediction cannot be
made.
d. This is an example of a high, negative
correlation. Since the data shows some
variability, a perfect prediction cannot be
made.
e. This shows a low correlation. Although
predictions could be made, and those
predictions would be slightly better than
chance, estimates would still be imprecise.
f. This figure shows a zero correlation.
Prediction would be no better than chance.
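To make these patterns concrete, the sketch below computes r for three small, made-up data sets (they are illustrations only, not data from the salamander study). The helper name `pearson_r` is hypothetical, and it uses the deviation-from-the-mean form of the calculation, which is one standard way to compute the Pearson r.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from deviations about the means (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # pattern like graph a
perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # pattern like graph b
no_correlation = pearson_r(xs, [3, 1, 2, 1, 3])     # pattern like graph f

print(round(perfect_positive, 2),
      round(perfect_negative, 2),
      round(no_correlation, 2))
```

Data sets that fall between these extremes, like graphs c, d, and e, give values of r between -1 and +1.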
Step 3:
Calculating the Pearson r Correlation Coefficient
The graph below was produced with the charting function of Microsoft
Excel, which calculated a correlation value from the data in our
example. The graph shows a trend indicating an increase in
salamanders where more fallen logs are present. Note, however, that
the value calculated by this program is the Pearson r value squared
(R²). You must take the square root of this figure to give the size of
the Pearson r; its sign is positive if the trend line rises and negative
if it falls. From the graph: R² = 0.72; Pearson r = 0.85.
Because 0.85 is close to 1.0 (the maximum value for the Pearson r),
this demonstrates a strong, positive correlation.
[Figure: scatter plot with trend line titled "RELATIONSHIP OF FALLEN LOGS AND SALAMANDERS". X-axis: Number of Fallen Logs; Y-axis: Number of Salamanders. Chart annotation: R² = 0.7175; Pearson r = 0.85.]
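The conversion from the charted R² value to the Pearson r can be sketched in a few lines of Python. This assumes the R² value has been read off the chart by hand; the variable names are illustrative. Squaring discards the sign, so the direction of the trend line has to be supplied separately.

```python
import math

r_squared = 0.7175          # R² value shown on the chart
trend_is_increasing = True  # the trend line rises from left to right

# The square root gives the size of r; the trend direction gives its sign.
r = math.sqrt(r_squared)
if not trend_is_increasing:
    r = -r

print(round(r, 2))
```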
If you are not using Excel or another graphing program, you can
calculate the Pearson r by using the following formula:
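FORMULA FOR CALCULATING THE PEARSON R CORRELATION COEFFICIENT
\[
\text{Pearson } r = \frac{N\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[N\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}
\]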
This formula looks complicated, but can be simplified by breaking it
into its separate components. Using your original data, create the
following table:
Quadrat Number (N)   # Fallen Logs (X)   # Salamanders (Y)    X²     Y²     XY
         1                   4                   3            16      9     12
         2                   6                   2            36      4     12
         3                   2                   0             4      0      0
         4                   3                   1             9      1      3
         5                   7                   3            49      9     21
         6                   1                   0             1      0      0
         7                   0                   0             0      0      0
         8                   2                   0             4      0      0
         9                   0                   0             0      0      0
        10                   3                   1             9      1      3
        11                   2                   1             4      1      2
        12                   4                   2            16      4      8
        13                   0                   0             0      0      0
        14                   2                   1             4      1      2
        15                   2                   0             4      0      0
        16                   1                   0             1      0      0
        17                   3                   1             9      1      3
        18                   5                   3            25      9     15
        19                   3                   1             9      1      3
        20                   2                   2             4      4      4
        21                   2                   1             4      1      2
        22                   0                   0             0      0      0
        23                   0                   0             0      0      0
        24                   2                   0             4      0      0
        25                   1                   0             1      0      0
Totals: N = 25, ΣX = 57, ΣY = 22, ΣX² = 213, ΣY² = 46, ΣXY = 90
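The extra columns and their totals can also be built programmatically. The following is a minimal sketch in Python; the list and dictionary names are illustrative assumptions, and the data are the 25 quadrat counts from the original table. The totals it prints are the values that go into the Pearson r formula.

```python
# Counts per quadrat (quadrats 1 through 25), copied from the data table.
fallen_logs = [4, 6, 2, 3, 7, 1, 0, 2, 0, 3, 2, 4, 0,
               2, 2, 1, 3, 5, 3, 2, 2, 0, 0, 2, 1]   # X
salamanders = [3, 2, 0, 1, 3, 0, 0, 0, 0, 1, 1, 2, 0,
               1, 0, 0, 1, 3, 1, 2, 1, 0, 0, 0, 0]   # Y

# Derived columns: X squared, Y squared, and the product XY for each quadrat.
x_squared = [x * x for x in fallen_logs]
y_squared = [y * y for y in salamanders]
xy = [x * y for x, y in zip(fallen_logs, salamanders)]

# Column totals needed by the formula.
sums = {
    "N": len(fallen_logs),
    "sum_x": sum(fallen_logs),
    "sum_y": sum(salamanders),
    "sum_x2": sum(x_squared),
    "sum_y2": sum(y_squared),
    "sum_xy": sum(xy),
}
for name, value in sums.items():
    print(name, "=", value)
```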
Using the values from the new table, complete the Pearson r
formula:
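\[
\text{Pearson } r = \frac{N\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[N\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}
\]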
The formula looks like this once we plug in all the numbers:
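\[
\begin{aligned}
\text{Pearson } r &= \frac{(25)(90) - (57)(22)}{\sqrt{\left[25(213) - (57)^2\right]\left[25(46) - (22)^2\right]}} \\
&= \frac{2250 - 1254}{\sqrt{[5325 - 3249][1150 - 484]}} \\
&= \frac{996}{\sqrt{[2076][666]}} \\
&= \frac{996}{\sqrt{1382616}} \\
&= \frac{996}{1175.85} \\
&\approx 0.847
\end{aligned}
\]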
Again, this Pearson r correlation coefficient (being close to the
maximum value of 1.0) demonstrates a strong positive correlation
between the number of fallen logs and the number of salamanders.
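As a cross-check on the hand calculation, the same steps can be sketched in Python. The function name `pearson_r_from_sums` and its parameter names are hypothetical; the inputs are the column totals for the salamander data (N = 25, ΣX = 57, ΣY = 22, ΣX² = 213, ΣY² = 46, ΣXY = 90).

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson r from column totals, following the formula step by step."""
    numerator = n * sum_xy - sum_x * sum_y   # N(ΣXY) - (ΣX)(ΣY)
    x_term = n * sum_x2 - sum_x ** 2         # N(ΣX²) - (ΣX)²
    y_term = n * sum_y2 - sum_y ** 2         # N(ΣY²) - (ΣY)²
    return numerator / math.sqrt(x_term * y_term)

r = pearson_r_from_sums(n=25, sum_x=57, sum_y=22,
                        sum_x2=213, sum_y2=46, sum_xy=90)
print(round(r, 3))
```

Squaring the result gives about 0.7175, which agrees with the R² value reported on the Excel chart.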
Step 4:
Determine if your calculations have statistical significance
You must determine whether or not your calculated correlation is
statistically significant. To do this, you must determine the 'critical
value' for your Pearson r correlation coefficient by using the table of
critical values that follows this step.
So for our example:
1. Calculate the degrees of freedom (DF) by subtracting 2 from
the number of paired observations you are comparing (DF = N - 2).
In our case, we are sampling fallen logs and salamanders in
25 quadrats (N = 25).
DF = 25 - 2 = 23
2. Find your DF on the table below and find the critical value allowed.
In our case, DF = 23 is not listed; the nearest DF listed is 25, with
a critical value of 0.3233. (The next lower DF listed, 20, has a
critical value of 0.3598.) Our calculated Pearson r correlation
coefficient is 0.847.
3. The calculated figure is greater than the critical value from the
table, so our findings have statistical significance. Therefore, the
data support our hypothesis: there is a strong positive correlation
between the number of fallen logs and the number of salamanders,
and this correlation is unlikely to be due to chance alone.
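The three steps above can be sketched as a small lookup in Python. The names `CRITICAL_VALUES` and `check_significance` are illustrative assumptions; the dictionary holds only a few rows copied from the critical-value table in this section, and the nearest-DF rule mirrors the example.

```python
# A few rows of the critical-value table (DF -> critical value at the 5% level).
CRITICAL_VALUES = {15: 0.4124, 20: 0.3598, 25: 0.3233, 30: 0.2960}

def check_significance(r, n, table=CRITICAL_VALUES):
    """Return (df, nearest listed df, critical value, significant?)."""
    df = n - 2  # degrees of freedom for N paired observations
    # Pick the listed DF closest to ours, as in the worked example.
    nearest_df = min(table, key=lambda listed: abs(listed - df))
    critical = table[nearest_df]
    return df, nearest_df, critical, r > critical

df, nearest_df, critical, significant = check_significance(r=0.847, n=25)
print(df, nearest_df, critical, significant)
```

A more cautious variation would use the next lower listed DF (20 here), which has a larger critical value; for this data set the conclusion is the same either way.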
CRITICAL VALUES FOR THE PEARSON R CORRELATION COEFFICIENT
DF (N - 2)    Critical Value (5% significance level)
     1        .98769
     2        .90000
     3        .8054
     4        .7293
     5        .6694
     6        .6215
     7        .5822
     8        .5494
     9        .5214
    10        .4973
    11        .4762
    12        .4575
    13        .4409
    14        .4259
    15        .4124
    16        .4000
    17        .3887
    18        .3783
    19        .3687
    20        .3598
    25        .3233
    30        .2960
    35        .2746
    40        .2573
    45        .2428
    50        .2306
    60        .2108
    70        .1954
    80        .1829
    90        .1726
   100        .1638