Download EquationsStudy

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Time series wikipedia , lookup

Misuse of statistics wikipedia , lookup

Student's t-test wikipedia , lookup

Transcript
Math Symbols, Equations, and Examples
Name
Symbol
Size
Proportion
Mean
Standard Deviation
Median
Correlation Coefficient
Spearman rho
Quartiles
z-score
Other Terms:
FNS
IQR
Max
Min
Sample
N
Μ‚
𝒑
Μ…
𝒙
s
M
r
ρ
Q1, Q2, Q3
zi
Five Number Summary
InterQuartileRange
Maximum value
Minimum value
Names
Population
N
𝝅
µ
Οƒ
M
Ignore
Ignore
Q1, Q2, Q3
zi
p-hat, pi
x-bar, myu
s, sigma
M
are
rho
zee
Min, Q1, Median, Q3, Max
Q3 – Q1
Max(xi)
Min(xi)
Note that all equations for Sample and Population are the same (symbols sometimes differ) except for
standard deviation, and that only differs in the denominator.
Size
N is the number of Observational Units (OUs) in the population.
n is the number of OUs in the sample.
Proportion
Sample Proportion
For n = n1+ n2, where Ni’s represent the number in a subgroup of the sample. (e.g. n1 might be
the number of males, and n2 the number of females in the sample). A proportion is ni/n. The proportion
Μ‚ = n1/n.
of n1 is 𝒑
Sample Proportion
For N = N1+ N2, where Ni’s represent the number in a subgroup. The proportion of N1 is
Ο€=N1/N.
Also see handout on proportions if this is not sufficient.
Mean (Measure of Center)
For a data set {x1, x2,…, xn}
Sample Mean
Population Mean
µ=
π‘₯1 + π‘₯2 + β‹― + π‘₯𝑁
𝑁
Standard Deviation (Measure of Spread)
Of a Sample
𝑠= √
2
βˆ‘π‘›
𝑖=1(π‘₯𝑖 βˆ’π‘₯Μ… )
π‘›βˆ’1
of a Population
𝜎= √
2
βˆ‘π‘›
𝑖=1(π‘₯𝑖 βˆ’µ)
𝑁
Without the sigma notation, equivalently:
𝑠= √
(π‘₯1 βˆ’π‘₯Μ… )2 +(π‘₯2 βˆ’π‘₯Μ… )2 +β‹―+(π‘₯𝑛 βˆ’π‘₯Μ… )2
𝜎= √
π‘›βˆ’1
(π‘₯1 βˆ’µ)2 +(π‘₯2 βˆ’µ)2 +β‹―+(π‘₯𝑁 βˆ’µ)2
𝑁
Note that in this notation, it looks similar to the equations for the mean. Note that the n-1 exists to
make sure that s is an unbiased estimator of Οƒ. You don’t have to understand why, just the fact that s is
unbiased is important.
Z-Score
Z-score is the standardized value of a data point.
Sample
𝑧𝑖 =
Population
𝑧𝑖 =
π‘₯𝑖 βˆ’π‘₯Μ…
𝑠
π‘₯𝑖 βˆ’µ
𝜎
Correlation Coefficient (Measure of Association)
𝑛
π‘₯ βˆ’ π‘₯Μ… 𝑦𝑖 βˆ’ 𝑦̅
1
π‘Ÿ=
βˆ‘( 𝑖
π‘›βˆ’1
𝑠π‘₯ ) ( 𝑠𝑦 )
𝑖=1
In terms of z-scores
𝑛
1
π‘Ÿ=
βˆ‘ 𝑧𝑖π‘₯ 𝑧𝑖𝑦
π‘›βˆ’1
𝑖=1
So if you calculate the z-scores for a data set (or, equivalently, standardize the data) then the z-scores
can be used to calculate the correlation coefficient, r. It’s fine if you just recall the latter equation.
Spearman rho, ρ, is a non-parametric correlation coefficient. Use it for ordered, ranked, or non-linear
data.1
Median, Q1, Q3, Min, Max
How to do these are in the notes. Min and Max are easy, as they are the maximum and minimum values
in the dataset.
1
There are better non-parametric measures of association for non-linear data, but for now just Spearman’s rho
will do.
EXAMPLES
Two values are measured in an experiment with ten subjects. A dosage value, x, and a response value,
y.
The data appears in the following table.
OU#
X
Y
1
0
1
2
3
2
3
1
0
4
2
2
5
4
4
6
5
10
7
3
4
8
6
10
1) Find the sample means for x and y:
π‘₯Μ… =
π‘₯1 + π‘₯2 + β‹― + π‘₯𝑛
0 + 3 + 1 + 2 + β‹―+ 6
=
= 3
𝑛
8
𝑦̅ =
𝑦1 + 𝑦2 + β‹― + 𝑦𝑛
1 + 2 + 0 + 2 + β‹― + 10
=
= 4
𝑛
8
2) Find the sample standard deviations for x and y:
(using the values for π‘₯Μ… and 𝑦̅ previously calculated)
2
βˆ‘π‘›
𝑖=1(π‘₯𝑖 βˆ’π‘₯Μ… )
𝑠π‘₯ = √
𝑠π‘₯ = √
π‘›βˆ’1
28
7
=√
(π‘₯1 βˆ’π‘₯Μ… )2 +(π‘₯2 βˆ’π‘₯Μ… )2 +β‹―+(π‘₯𝑛 βˆ’π‘₯Μ… )2
π‘›βˆ’1
=√
(0βˆ’3)2 +(3βˆ’3)2 +β‹―+(6βˆ’3)2
7
= √4 = 2
Likewise, sy=4
3) Calculate z-scores for the x and y values:
OU#
X
Zix
Y
Ziy
1
0
-1.5
1
-1
2
3
0
2
-0.5
3
1
-1
0
-1
4
2
-0.5
2
-0.5
5
4
.5
4
0
6
5
0
10
1.5
7
3
1
4
0
8
6
1.5
10
1.5
π‘₯ βˆ’π‘₯Μ…
We’ll do the first few for x, and using π‘₯Μ… = 3 and 𝑠π‘₯ = 2, with 𝑧π‘₯𝑖 = 𝑖
𝑠
Where π‘₯Μ… = 3, 𝑧𝑖π‘₯ = 0 (because π‘₯𝑖 βˆ’ π‘₯Μ… = 3 - 3 = 0), so 𝑧2π‘₯ and 𝑧7π‘₯ are done. For
𝑧1π‘₯ = (0-3)/2 = -3/2 = -1.5
𝑧3π‘₯ = (1-3)/2 = 2/2 = 1
𝑧4π‘₯ = (2-3)/2 = -1/2 = -0.5
And so on.
Do the same for y and the ziy’s, with 𝑦̅ = 4 and 𝑠𝑦 = 4, and get the values for the
last row.
4) Calculate the regression coefficient, r, for x and y.
Using the values for zix and ziy, and the equation for r:
𝑛
1
π‘Ÿ=
βˆ‘ 𝑧𝑖π‘₯ 𝑧𝑖𝑦
π‘›βˆ’1
𝑖=1
1
π‘Ÿ = (𝑧1π‘₯ 𝑧1𝑦 + 𝑧2π‘₯ 𝑧2𝑦 + 𝑧3π‘₯ 𝑧3𝑦 + +𝑧4π‘₯ 𝑧4𝑦 + β‹― + 𝑧𝑛π‘₯ 𝑧𝑛𝑦 )
7
1
π‘Ÿ = ((βˆ’1.5)(βˆ’1) + (0)(βˆ’0.5) + (βˆ’1)(βˆ’1) + (βˆ’0.5)(βˆ’0.5) + β‹― + (1.5)(1.5))
7
π‘Ÿ=
1
βˆ— 6.5 = 0.928571
7
That’s it.
Recall that the use of r is quite restrictive. This is discussed in the lecture notes.
I the data is non-linear or ranked or ordered, then a non-parametric measure of association such as
Spearman’s rho must be used.
Sigma Operator
Symbols in math, like +, -, ÷, etc. are called operators. βˆ‘ is the β€˜sigma operator’
The sigma operator sums whatever is in the argument based on the indices, i, the lower bound
(in this case 1) and n, which marks the upper bound. So, for data set {z 1, z2, z3,…zn},
βˆ‘π‘›π‘–=1 𝑧𝑖 means β€œsum all z’s starting from 1 and continue to n,” or
𝑛
βˆ‘ 𝑧𝑖
= 𝑧1 + 𝑧2 + 𝑧3 + β‹― + 𝑧𝑛
𝑖=1
Notice that the lower and upper bounds can be anything you choose, and the indice can also
be used in the argument in a mathematic sense (not just and index):
4
βˆ‘ 𝑧𝑖
= 𝑧2 + 𝑧3 + 𝑧4
𝑖=2
6
βˆ‘ π‘˜ π‘§π‘˜
= 𝑧1 + 2𝑧2 + 3𝑧3 + 4𝑧4 βˆ’ 5𝑧5 + 6𝑧6
π‘˜=1
I replaced i with k, just to show you can index with any letter. i is a popular choice.