Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Math Symbols, Equations, and Examples Name Symbol Size Proportion Mean Standard Deviation Median Correlation Coefficient Spearman rho Quartiles z-score Other Terms: FNS IQR Max Min Sample N Μ π Μ π s M r Ο Q1, Q2, Q3 zi Five Number Summary InterQuartileRange Maximum value Minimum value Names Population N π µ Ο M Ignore Ignore Q1, Q2, Q3 zi p-hat, pi x-bar, myu s, sigma M are rho zee Min, Q1, Median, Q3, Max Q3 β Q1 Max(xi) Min(xi) Note that all equations for Sample and Population are the same (symbols sometimes differ) except for standard deviation, and that only differs in the denominator. Size N is the number of Observational Units (OUs) in the population. n is the number of OUs in the sample. Proportion Sample Proportion For n = n1+ n2, where Niβs represent the number in a subgroup of the sample. (e.g. n1 might be the number of males, and n2 the number of females in the sample). A proportion is ni/n. The proportion Μ = n1/n. of n1 is π Sample Proportion For N = N1+ N2, where Niβs represent the number in a subgroup. The proportion of N1 is Ο=N1/N. Also see handout on proportions if this is not sufficient. Mean (Measure of Center) For a data set {x1, x2,β¦, xn} Sample Mean Population Mean µ= π₯1 + π₯2 + β― + π₯π π Standard Deviation (Measure of Spread) Of a Sample π = β 2 βπ π=1(π₯π βπ₯Μ ) πβ1 of a Population π= β 2 βπ π=1(π₯π βµ) π Without the sigma notation, equivalently: π = β (π₯1 βπ₯Μ )2 +(π₯2 βπ₯Μ )2 +β―+(π₯π βπ₯Μ )2 π= β πβ1 (π₯1 βµ)2 +(π₯2 βµ)2 +β―+(π₯π βµ)2 π Note that in this notation, it looks similar to the equations for the mean. Note that the n-1 exists to make sure that s is an unbiased estimator of Ο. You donβt have to understand why, just the fact that s is unbiased is important. Z-Score Z-score is the standardized value of a data point. Sample π§π = Population π§π = π₯π βπ₯Μ π π₯π βµ π Correlation Coefficient (Measure of Association) π π₯ β π₯Μ π¦π β π¦Μ 1 π= β( π πβ1 π π₯ ) ( π π¦ ) π=1 In terms of z-scores π 1 π= β π§ππ₯ π§ππ¦ πβ1 π=1 So if you calculate the z-scores for a data set (or, equivalently, standardize the data) then the z-scores can be used to calculate the correlation coefficient, r. Itβs fine if you just recall the latter equation. Spearman rho, Ο, is a non-parametric correlation coefficient. Use it for ordered, ranked, or non-linear data.1 Median, Q1, Q3, Min, Max How to do these are in the notes. Min and Max are easy, as they are the maximum and minimum values in the dataset. 1 There are better non-parametric measures of association for non-linear data, but for now just Spearmanβs rho will do. EXAMPLES Two values are measured in an experiment with ten subjects. A dosage value, x, and a response value, y. The data appears in the following table. OU# X Y 1 0 1 2 3 2 3 1 0 4 2 2 5 4 4 6 5 10 7 3 4 8 6 10 1) Find the sample means for x and y: π₯Μ = π₯1 + π₯2 + β― + π₯π 0 + 3 + 1 + 2 + β―+ 6 = = 3 π 8 π¦Μ = π¦1 + π¦2 + β― + π¦π 1 + 2 + 0 + 2 + β― + 10 = = 4 π 8 2) Find the sample standard deviations for x and y: (using the values for π₯Μ and π¦Μ previously calculated) 2 βπ π=1(π₯π βπ₯Μ ) π π₯ = β π π₯ = β πβ1 28 7 =β (π₯1 βπ₯Μ )2 +(π₯2 βπ₯Μ )2 +β―+(π₯π βπ₯Μ )2 πβ1 =β (0β3)2 +(3β3)2 +β―+(6β3)2 7 = β4 = 2 Likewise, sy=4 3) Calculate z-scores for the x and y values: OU# X Zix Y Ziy 1 0 -1.5 1 -1 2 3 0 2 -0.5 3 1 -1 0 -1 4 2 -0.5 2 -0.5 5 4 .5 4 0 6 5 0 10 1.5 7 3 1 4 0 8 6 1.5 10 1.5 π₯ βπ₯Μ Weβll do the first few for x, and using π₯Μ = 3 and π π₯ = 2, with π§π₯π = π π Where π₯Μ = 3, π§ππ₯ = 0 (because π₯π β π₯Μ = 3 - 3 = 0), so π§2π₯ and π§7π₯ are done. For π§1π₯ = (0-3)/2 = -3/2 = -1.5 π§3π₯ = (1-3)/2 = 2/2 = 1 π§4π₯ = (2-3)/2 = -1/2 = -0.5 And so on. Do the same for y and the ziyβs, with π¦Μ = 4 and π π¦ = 4, and get the values for the last row. 4) Calculate the regression coefficient, r, for x and y. Using the values for zix and ziy, and the equation for r: π 1 π= β π§ππ₯ π§ππ¦ πβ1 π=1 1 π = (π§1π₯ π§1π¦ + π§2π₯ π§2π¦ + π§3π₯ π§3π¦ + +π§4π₯ π§4π¦ + β― + π§ππ₯ π§ππ¦ ) 7 1 π = ((β1.5)(β1) + (0)(β0.5) + (β1)(β1) + (β0.5)(β0.5) + β― + (1.5)(1.5)) 7 π= 1 β 6.5 = 0.928571 7 Thatβs it. Recall that the use of r is quite restrictive. This is discussed in the lecture notes. I the data is non-linear or ranked or ordered, then a non-parametric measure of association such as Spearmanβs rho must be used. Sigma Operator Symbols in math, like +, -, ÷, etc. are called operators. β is the βsigma operatorβ The sigma operator sums whatever is in the argument based on the indices, i, the lower bound (in this case 1) and n, which marks the upper bound. So, for data set {z 1, z2, z3,β¦zn}, βππ=1 π§π means βsum all zβs starting from 1 and continue to n,β or π β π§π = π§1 + π§2 + π§3 + β― + π§π π=1 Notice that the lower and upper bounds can be anything you choose, and the indice can also be used in the argument in a mathematic sense (not just and index): 4 β π§π = π§2 + π§3 + π§4 π=2 6 β π π§π = π§1 + 2π§2 + 3π§3 + 4π§4 β 5π§5 + 6π§6 π=1 I replaced i with k, just to show you can index with any letter. i is a popular choice.