RMTD 404
Lecture 2

Summation Notation
• We need a succinct way to talk about the processes that occur in a statistical analysis
• We use summation notation: $\sum_{i=1}^{N} X_i$
  Σ - stands for “sum”
  X - stands for the variable we sum
  i - referred to as a subscripting index; stands for the individual values of X
  N - stands for the highest value we sum across (usually the number of cases). N could be replaced by a number, but we usually use a letter like N to indicate that we’re summing across all values of X (i.e., there are N values of the X variable).

Summation Notation Examples
• $\sum_{i=1}^{N} X_i$ is read as the sum of the values of X ranging from 1 (the first unit/person) to the Nth person (the last unit/person)
• Say X is a vector of {1,2,3,4,5}
• Using the above summation notation we get 1+2+3+4+5 = 15

Summation Notation
• We can be more specific: $\sum_{i=1}^{4} X_i$
• In this case we are only interested in summing the first 4 integers: 1+2+3+4 = 10
• What do you think about these ones?

Summation Notation
• X = {11,9,8,15,3}
• If i = 2, $X_i$ = 9
• If i = N, $X_i$ = 3 (the Nth case value; N = 5)
• What do we think about this?

Summation Notation
• X = {11,9,8,15,3}
• If i = 2, $X_i$ = 9
• If i = N, $X_i$ = 3 (the Nth case value; N = 5)
• What do we think about these?
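The summation examples above can be checked directly in R. This is a minimal sketch in base R: the two vectors come from the slides, while the variable names and the two squared expressions at the end are assumptions added for illustration (the expressions the slides asked about are not shown in the text).

```r
# First example vector from the slides
X <- c(1, 2, 3, 4, 5)
N <- length(X)               # number of cases
total      <- sum(X)         # sum over i = 1..N  -> 15
first_four <- sum(X[1:4])    # sum over i = 1..4  -> 10

# Second example vector from the slides
X2 <- c(11, 9, 8, 15, 3)
N2 <- length(X2)
second_case <- X2[2]         # X_i for i = 2 -> 9
last_case   <- X2[N2]        # X_i for i = N -> 3

# Assumed illustrations (not taken from the slides) of why parentheses matter
sum_of_squares <- sum(X2^2)  # square each value, then add -> 500
square_of_sum  <- sum(X2)^2  # add first, then square      -> 2116

print(c(total, first_four, sum_of_squares, square_of_sum))
```

The last two results differ because the order of operations differs: one squares inside the sum, the other squares the finished sum.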
Pay attention to the parentheses – solve those first, then exponentiate.

Summation Notation
• Some rules
• Adding a constant: $\sum_{i=1}^{N} (X_i + c) = \sum_{i=1}^{N} X_i + Nc$
• Multiplying by a constant: $\sum_{i=1}^{N} cX_i = c \sum_{i=1}^{N} X_i$
• Multiplying matched pairs (two vectors): $\sum_{i=1}^{N} X_i Y_i$
• Difference between two vectors: $\sum_{i=1}^{N} (X_i - Y_i) = \sum_{i=1}^{N} X_i - \sum_{i=1}^{N} Y_i$

Summation Notation
• Don’t let summation notation scare you
• All we’re doing here is summing across a vector of rows (I) and a vector of columns (J): $\sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}$

         Column 1    Column 2    Column 3    Total
Row 1    5 (X11)     7 (X12)     1 (X13)     13
Row 2    3 (X21)     9 (X22)     0 (X23)     12
Row 3    1 (X31)     2 (X32)     18 (X33)    21
Total    9           18          19          46

Measures of Central Tendency
• To get at the “location” of the distributions we use measures of central tendency
• We look at location shifts

Measures of Central Tendency
• Mean
• Median
• Mode
X = {5,3,2,9,3,4,9,8,2}
Using R…

Distributions: Modality
• Compare the following two graphics
• The left graph shows evidence of a bimodal distribution (two distinct peaks)
[Figure: two distributions, with mean, median, and mode marked]

Distributions: Shape
• When talking about shape, we are talking about kurtosis – the concentration of the data in the center, shoulders, and tails
[Figure: leptokurtic, mesokurtic, and platykurtic distributions, with shoulders and tails labeled]

Distribution: Skewness
• The left is negatively skewed while the right is positively skewed
• When skewness is present, our measures of central tendency aren’t as obvious
[Figure: two skewed distributions, with mode, median, and mean marked]

Measures of Variability
• Range – the difference between the two most extreme points
• Interquartile Range – the difference between the 25th and 75th percentiles
• Variance – the average squared deviation from the mean
• Standard deviation – the square root of the variance (roughly, the typical distance of a score from the mean)

Measures of Variability
• Coefficient of Variation – an index that rescales the standard deviations from two groups that are measured on the same scale but have very different means (useful for comparing group variability). It is the standard deviation divided by the mean.

SPSS & R
• Using the NELS student data we can get the following output for the base-year math scores

• Using SPSS

Descriptive Statistics
                                     N     Minimum   Maximum   Mean       Std. Deviation
Base-year Math standardized score    270   30.282    71.222    51.71431   10.083413
Valid N (listwise)                   270

• Using R

summary(bytxmstd)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  30.28   43.17   51.45   51.71   59.66   71.22   30.00

Transformation
• There are some solutions to skewed distributions
  – Linear transformations
• Adding a constant to each case in the dataset shifts the mean of the distribution by that value
• We can similarly multiply or divide each case's value by some constant

Transformation
• Standardization is a very common method
• Z-scores help us turn raw scores into standard deviation units (with a mean of 0 and sd of 1): $z = (X - \bar{X}) / s_X$
• For example, if someone has a GRE score of 620, and the mean is 500, and sd is 100 then…

Transformation
• You can use the following formula to transform scores to have a mean and standard deviation of your choosing: $X' = \frac{s_{X'}}{s_X}(X - \bar{X}) + \bar{X}'$
• $X'$ is the transformed score, $s_{X'}$ is the desired sd, and $\bar{X}'$ is the desired mean

Parameters and Statistics (Quick Notation)
            Sample       Population
Mean        $\bar{X}$    $\mu$
Variance    $s_X^2$      $\sigma^2$
SD          $s_X$        $\sigma$

Some Important Properties
• Sufficiency – the statistic uses all of the information in the sample – think of the mean, median, and mode…
• Unbiasedness – the average of the statistic across all possible samples equals the parameter of interest; that is, the expected value is equal to the parameter
• Efficiency – the variability across a large number of samples is smaller for some statistic than for another (related) statistic
• Resistant – not heavily influenced by outliers

Introduction to R
• Basic commands
• Creating variables
• Graphics
• Importing data

Introduction to SPSS
• Descriptives
• Transformations
• Graphics
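On the R side, the central tendency, variability, and transformation ideas from this lecture can be tried out in a few lines of base R. This is a sketch under stated assumptions: the data vector and the GRE numbers come from the slides, but the variable names, the `rescale` helper, and the target mean of 50 and sd of 10 are hypothetical choices made for illustration. Note that R's `var()` and `sd()` divide by N - 1, and `IQR()` uses R's default quantile method, so other software may give slightly different values.

```r
# Data from the "Measures of Central Tendency" slide
X <- c(5, 3, 2, 9, 3, 4, 9, 8, 2)

x_mean   <- mean(X)      # arithmetic mean
x_median <- median(X)    # middle value of the sorted data

# Base R has no built-in statistical mode, so count each value and keep
# every value tied for the highest count (this X has more than one mode).
counts  <- table(X)
x_modes <- as.numeric(names(counts)[counts == max(counts)])

x_range <- diff(range(X))   # largest value minus smallest value
x_iqr   <- IQR(X)           # 75th percentile minus 25th percentile
x_var   <- var(X)           # sample variance (divides by N - 1)
x_sd    <- sd(X)            # square root of the sample variance
x_cv    <- x_sd / x_mean    # coefficient of variation

# z-score for the GRE example: score 620, mean 500, sd 100
z_gre <- (620 - 500) / 100

# Hypothetical helper: rescale scores to a chosen mean and sd
rescale <- function(x, new_mean, new_sd) {
  (new_sd / sd(x)) * (x - mean(x)) + new_mean
}
X_new <- rescale(X, new_mean = 50, new_sd = 10)

print(c(mean = x_mean, median = x_median, var = x_var, z = z_gre))
print(x_modes)
```

Because the rescaling is linear, it changes the mean and sd of the scores but leaves the shape of the distribution (including any skew) as it was.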