LIS 570
Summarising and presenting data
Univariate analysis

Summary
  Basic definitions
  Descriptive statistics
  Describing frequency distributions
    shape
    central tendency
    dispersion

Selecting analysis and statistical techniques
  Specific research question or hypothesis
  Determine number of variables
    Univariate analysis
    Bivariate analysis
    Multivariate analysis
  Determine level of measurement of variables
  Choose univariate method of analysis
  Choose relevant descriptive statistics
  Choose relevant inferential statistics
  (De Vaus p133)

Basic Definitions
  Values: the categories developed for a variable
    Nominal
    Ordinal
    Interval
  Data: observations (measurements) taken on the units of analysis

Basic definitions
  Statistics - methods for dealing with data
  Descriptive statistics
    summarise sample or census data
  Inferential statistics
    draw conclusions about the population from the results of a random sample drawn from that population

Methods of analysis (De Vaus, 134)
  Univariate methods
    Frequency distributions
  Bivariate methods
    Cross tabulations
    Scattergrams
    Regression
    Rank order correlation
    Comparison of means
  Multivariate methods
    Conditional tables
    Partial rank order correlation
    Multiple and partial correlation
    Multiple and partial regression
    Path analysis

Frequency Distributions
  Ungrouped frequency distribution
    A list of each of the values of the variable
    The number of times and/or the percent of times each value occurs
  Grouped frequency distribution
    A table or graph which shows the frequencies or percent for ranges of values

Frequency distributions
  Value Label   Value   Frequency   Percent   Valid Percent   Cum Percent
  18-24          1.00        5        25.0         25.0           25.0
  25-31          2.00        5        25.0         25.0           50.0
  32-38          3.00        6        30.0         30.0           80.0
  39-45          4.00        4        20.0         20.0          100.0
                          ------     ------       ------
  Total                     20       100.0        100.0
  Valid cases 20    Missing cases 0

Frequency distributions
  Required information for frequency tables
    table number and title
    labels for the categories of the variables
    column headings
    the number of missing cases
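The frequency table above can be rebuilt from raw codes with a few lines of Python. This is only a minimal sketch: the list `ages_coded` is hypothetical raw data invented to match the frequencies in the table (5, 5, 6, 4), the function name `frequency_table` is illustrative, and the sketch assumes there are no missing cases, so Percent and Valid Percent coincide.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0  # running count used for the cumulative percent
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n,
                     100 * running / n))
    return rows


# Hypothetical raw data: 20 respondents coded into the four age groups
ages_coded = [1] * 5 + [2] * 5 + [3] * 6 + [4] * 4
labels = {1: "18-24", 2: "25-31", 3: "32-38", 4: "39-45"}

print(f"{'Label':<8}{'Value':>6}{'Frequency':>11}{'Percent':>9}{'Cum %':>8}")
for value, freq, pct, cum in frequency_table(ages_coded):
    print(f"{labels[value]:<8}{value:>6}{freq:>11}{pct:>9.1f}{cum:>8.1f}")
print(f"Valid cases {len(ages_coded)}")
```

Sorting the values before accumulating is what makes the cumulative percent meaningful, which is why that column suits ordinal or interval variables rather than nominal ones.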
Histograms
  [Figure: histogram of "Age of employee" (x-axis) against Frequency (y-axis). Std. Dev = 11.79, Mean = 37.2, N = 474.00]

Describing Frequency Distributions
  Shape
    Symmetrical (mirror image)
    Skewed
      Negative skew: tail toward lower scores
      Positive skew: tail toward higher scores
  Dispersion
  Central tendency

Shape - for ordinal or interval variables
  Positively skewed distribution
    Cluster towards the low end of the variable

Shape - for ordinal or interval variables
  Negatively skewed distribution
    Cluster towards the high end of the variable

Shape - Symmetry
  [Figure: histogram of "Job seniority" (x-axis) against Frequency (y-axis). Std. Dev = 10.06, Mean = 81.1, N = 474.00]

Central Tendency
  Typical or representative value or score
  Mean (arithmetic mean) (x̄)
    Sum all the observations / n
    Use for interval variables when appropriate
  Median
    Value that divides the distribution so that an equal number of values are above the median and an equal number below
  Mode
    Value with the greatest frequency
    Uni-modal, bi-modal etc.

Mode
  Best for nominal variables
  Problems
    most common may not measure typicality
    may be more than one mode
    unstable - can be manipulated
  Dispersion
    variation ratio (v): % of people not in the modal category

Median
  Preferred for ordinal variables
    people are ranked from low to high
    median is the middle case
    the median category is the one that the middle person belongs to

  Value Label   Value   Frequency   Percent   Valid Percent   Cum Percent
  18-24          1.00        5        25.0         25.0           25.0
  25-31          2.00        5        25.0         25.0           50.0
  32-38          3.00        6        30.0         30.0           80.0
  39-45          4.00        4        20.0         20.0          100.0
                          ------     ------       ------
  Total                     20       100.0        100.0
  Valid cases 20    Missing cases 0

Dispersion
  The cth percentile of a set of numbers is a value such that c percent of the numbers fall below it and the rest fall above.
  The median is the 50th percentile
  The lower quartile is the 25th percentile
  The upper quartile is the 75th percentile
  Five number summary
    Median, quartiles and extremes

Dispersion
  [Figure: boxplots of Variable 1, Variable 2 and Variable 3 on a scale from 4 to 16, with the lower quartile, median, upper quartile and interquartile range (IQR) marked]

Mean
  uses the actual numerical values of the observations
  most common measure of centre
  makes sense only for interval or ratio data; frequently computed for ordinal variables as well

Dispersion
  The standard deviation and variance measure spread about the mean as centre.
  Variance: mean of the squares of the deviations of the observations from the mean
  Standard deviation: the positive square root of the variance

Example
  Data (6, 7, 5, 3, 4)
  Mean x̄ = (6 + 7 + 5 + 3 + 4) / 5 = 25 / 5 = 5

Variance (s²)
  Calculate the mean for the variable
  Take each observation and subtract the mean from it
  Square the result from the above
  Add (sum) all the individual results
  Divide by n

Variance (s²)
  Observation x   Deviation x - x̄   Sq. deviation (x - x̄)²
  6               6 - 5 =  1         1
  7               7 - 5 =  2         4
  5               5 - 5 =  0         0
  3               3 - 5 = -2         4
  4               4 - 5 = -1         1
                                     Sum = 10
  Variance = sum of the sq. deviations / number of observations = 10 / 5 = 2

Standard deviation (s)
  Square root of the variance: √2 ≈ 1.4
  an average deviation of the observations from their mean
  influenced by outliers
  best used with symmetrical distributions

Summary
  Determine if variable is nominal, ordinal or interval
  Nominal
    Frequency tables
    Mode
  Ordinal
    Frequency tables (grouped frequency tables), histogram
    Median and five number summary plus IQR
    Mode

Summary
  Interval
    Determine whether the distribution is skewed or symmetrical
      Compare median and mean
    Use the mean and the standard deviation if the distribution is not markedly skewed
    Otherwise use median and five number summary plus IQR
    Use the mode in addition if it adds anything.
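The worked variance example (data 6, 7, 5, 3, 4) can be checked with a short Python sketch that follows the five steps listed on the variance slide. The function name `variance_by_steps` is illustrative and not from the slides. The sketch divides by n, as the slides do.

```python
import math
import statistics


def variance_by_steps(values):
    """Follow the slide's steps; return (mean, variance, standard deviation)."""
    n = len(values)
    mean = sum(values) / n                      # 1. calculate the mean
    deviations = [x - mean for x in values]     # 2. subtract the mean from each observation
    squared = [d ** 2 for d in deviations]      # 3. square each deviation
    total = sum(squared)                        # 4. add the squared deviations
    variance = total / n                        # 5. divide by n, as on the slide
    return mean, variance, math.sqrt(variance)


data = [6, 7, 5, 3, 4]
mean, variance, sd = variance_by_steps(data)
print(mean, variance, round(sd, 1))   # 5.0 2.0 1.4

# Comparing median and mean, as the summary suggests for interval variables:
# here they agree, so the distribution is not skewed.
print(statistics.median(data))        # 5
```

Python's standard library draws the same distinction that matters here: `statistics.pvariance` divides by n and matches the slide's result, while `statistics.variance` divides by n - 1 and would give 2.5 for these data.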