* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download EquationsStudy
Survey
Document related concepts
Transcript
Math Symbols, Equations, and Examples
Name
Symbol
Size
Proportion
Mean
Standard Deviation
Median
Correlation Coefficient
Spearman rho
Quartiles
z-score
Other Terms:
FNS
IQR
Max
Min
Sample
N
Μ
π
Μ
π
s
M
r
Ο
Q1, Q2, Q3
zi
Five Number Summary
InterQuartileRange
Maximum value
Minimum value
Names
Population
N
π
µ
Ο
M
Ignore
Ignore
Q1, Q2, Q3
zi
p-hat, pi
x-bar, myu
s, sigma
M
are
rho
zee
Min, Q1, Median, Q3, Max
Q3 β Q1
Max(xi)
Min(xi)
Note that all equations for Sample and Population are the same (symbols sometimes differ) except for
standard deviation, and that only differs in the denominator.
Size
N is the number of Observational Units (OUs) in the population.
n is the number of OUs in the sample.
Proportion
Sample Proportion
For n = n1+ n2, where Niβs represent the number in a subgroup of the sample. (e.g. n1 might be
the number of males, and n2 the number of females in the sample). A proportion is ni/n. The proportion
Μ = n1/n.
of n1 is π
Sample Proportion
For N = N1+ N2, where Niβs represent the number in a subgroup. The proportion of N1 is
Ο=N1/N.
Also see handout on proportions if this is not sufficient.
Mean (Measure of Center)
For a data set {x1, x2,β¦, xn}
Sample Mean
Population Mean
µ=
π₯1 + π₯2 + β― + π₯π
π
Standard Deviation (Measure of Spread)
Of a Sample
π = β
2
βπ
π=1(π₯π βπ₯Μ
)
πβ1
of a Population
π= β
2
βπ
π=1(π₯π βµ)
π
Without the sigma notation, equivalently:
π = β
(π₯1 βπ₯Μ
)2 +(π₯2 βπ₯Μ
)2 +β―+(π₯π βπ₯Μ
)2
π= β
πβ1
(π₯1 βµ)2 +(π₯2 βµ)2 +β―+(π₯π βµ)2
π
Note that in this notation, it looks similar to the equations for the mean. Note that the n-1 exists to
make sure that s is an unbiased estimator of Ο. You donβt have to understand why, just the fact that s is
unbiased is important.
Z-Score
Z-score is the standardized value of a data point.
Sample
π§π =
Population
π§π =
π₯π βπ₯Μ
π
π₯π βµ
π
Correlation Coefficient (Measure of Association)
π
π₯ β π₯Μ
π¦π β π¦Μ
1
π=
β( π
πβ1
π π₯ ) ( π π¦ )
π=1
In terms of z-scores
π
1
π=
β π§ππ₯ π§ππ¦
πβ1
π=1
So if you calculate the z-scores for a data set (or, equivalently, standardize the data) then the z-scores
can be used to calculate the correlation coefficient, r. Itβs fine if you just recall the latter equation.
Spearman rho, Ο, is a non-parametric correlation coefficient. Use it for ordered, ranked, or non-linear
data.1
Median, Q1, Q3, Min, Max
How to do these are in the notes. Min and Max are easy, as they are the maximum and minimum values
in the dataset.
1
There are better non-parametric measures of association for non-linear data, but for now just Spearmanβs rho
will do.
EXAMPLES
Two values are measured in an experiment with ten subjects. A dosage value, x, and a response value,
y.
The data appears in the following table.
OU#
X
Y
1
0
1
2
3
2
3
1
0
4
2
2
5
4
4
6
5
10
7
3
4
8
6
10
1) Find the sample means for x and y:
π₯Μ
=
π₯1 + π₯2 + β― + π₯π
0 + 3 + 1 + 2 + β―+ 6
=
= 3
π
8
π¦Μ
=
π¦1 + π¦2 + β― + π¦π
1 + 2 + 0 + 2 + β― + 10
=
= 4
π
8
2) Find the sample standard deviations for x and y:
(using the values for π₯Μ
and π¦Μ
previously calculated)
2
βπ
π=1(π₯π βπ₯Μ
)
π π₯ = β
π π₯ = β
πβ1
28
7
=β
(π₯1 βπ₯Μ
)2 +(π₯2 βπ₯Μ
)2 +β―+(π₯π βπ₯Μ
)2
πβ1
=β
(0β3)2 +(3β3)2 +β―+(6β3)2
7
= β4 = 2
Likewise, sy=4
3) Calculate z-scores for the x and y values:
OU#
X
Zix
Y
Ziy
1
0
-1.5
1
-1
2
3
0
2
-0.5
3
1
-1
0
-1
4
2
-0.5
2
-0.5
5
4
.5
4
0
6
5
0
10
1.5
7
3
1
4
0
8
6
1.5
10
1.5
π₯ βπ₯Μ
Weβll do the first few for x, and using π₯Μ
= 3 and π π₯ = 2, with π§π₯π = π
π
Where π₯Μ
= 3, π§ππ₯ = 0 (because π₯π β π₯Μ
= 3 - 3 = 0), so π§2π₯ and π§7π₯ are done. For
π§1π₯ = (0-3)/2 = -3/2 = -1.5
π§3π₯ = (1-3)/2 = 2/2 = 1
π§4π₯ = (2-3)/2 = -1/2 = -0.5
And so on.
Do the same for y and the ziyβs, with π¦Μ
= 4 and π π¦ = 4, and get the values for the
last row.
4) Calculate the regression coefficient, r, for x and y.
Using the values for zix and ziy, and the equation for r:
π
1
π=
β π§ππ₯ π§ππ¦
πβ1
π=1
1
π = (π§1π₯ π§1π¦ + π§2π₯ π§2π¦ + π§3π₯ π§3π¦ + +π§4π₯ π§4π¦ + β― + π§ππ₯ π§ππ¦ )
7
1
π = ((β1.5)(β1) + (0)(β0.5) + (β1)(β1) + (β0.5)(β0.5) + β― + (1.5)(1.5))
7
π=
1
β 6.5 = 0.928571
7
Thatβs it.
Recall that the use of r is quite restrictive. This is discussed in the lecture notes.
I the data is non-linear or ranked or ordered, then a non-parametric measure of association such as
Spearmanβs rho must be used.
Sigma Operator
Symbols in math, like +, -, ÷, etc. are called operators. β is the βsigma operatorβ
The sigma operator sums whatever is in the argument based on the indices, i, the lower bound
(in this case 1) and n, which marks the upper bound. So, for data set {z 1, z2, z3,β¦zn},
βππ=1 π§π means βsum all zβs starting from 1 and continue to n,β or
π
β π§π
= π§1 + π§2 + π§3 + β― + π§π
π=1
Notice that the lower and upper bounds can be anything you choose, and the indice can also
be used in the argument in a mathematic sense (not just and index):
4
β π§π
= π§2 + π§3 + π§4
π=2
6
β π π§π
= π§1 + 2π§2 + 3π§3 + 4π§4 β 5π§5 + 6π§6
π=1
I replaced i with k, just to show you can index with any letter. i is a popular choice.