Descriptive statistics
Marike van der Leeden, PhD
Amsterdam Rehabilitation Research Center | Reade

Literature

Content
Descriptive statistics:
• Measurement levels of data
• Data description
• Distribution of data

Basic principles of statistics

Reasons for using statistics
• aids in summarizing the results
• helps us recognize underlying trends and tendencies in the data
• aids in communicating the results to others

Statistics ...!!
• ... do not compensate for bad study design
• ... are not a way to determine clinical relevance

Measurement levels of data

Variables
Two kinds of variables:
• Dependent variable(s)
• Independent variable(s)
Example: Is drinking coffee a predictor in developing cardiovascular diseases? Here coffee consumption is the independent variable and cardiovascular disease is the dependent variable.

Categorical data (1)
Dichotomous data (2 categories), e.g. gender
Nominal data (>2 categories), e.g. blood group
• limited number of mutually exclusive categories
• categories are not ordered

Categorical data (2)
Ordinal data, e.g. education
• limited number of mutually exclusive categories
• categories are ordered; indicates ranking of categories
• distances between adjacent categories are not necessarily equal

Numerical data (1): discrete/continuous data
Discrete data: whole numbers (counting), e.g. number of children
Continuous data: can take any value within a certain range (measuring), e.g. body weight

Numerical data (2): interval/ratio data
Interval data
• scores are quantitative
• indicates the amount of difference between scores
• distances between scores are equal
• e.g. NRS, temperature in °C

Numerical data (3): interval/ratio data
Ratio data
• scores are quantitative
• indicates the amount of difference between scores
• distances between scores are equal
• ratio comparisons are meaningful because the scale has a true zero, e.g. length, weight

In summary
Variable
• Categorical (qualitative)
  • Nominal: categories
  • Ordinal: ordered categories
• Numerical (quantitative)
  • Discrete: whole numbers (counting)
  • Continuous: any value (measuring)

Data description

Descriptive Statistics
Types of descriptive statistics:
Organize data
• Tables
• Frequency distributions
• Graphs
Summarize data
• Central tendency
• Variation

Graphs for numerical data: Histograms

Graphs for categorical data: Bar graphs

Graphs for categorical data: Pie charts

Summarizing categorical data

Summarizing numerical data
Central tendency (or groups' "middle values")
• Mean
• Median
• Mode
Variation (or summary of differences within groups)
• Range
• Interquartile range
• Variance
• Standard deviation

Mean
Most commonly called the "average." Add up the values for each case and divide by the total number of cases.
$\bar{Y} = \frac{Y_1 + Y_2 + \dots + Y_n}{n} = \frac{\sum_{i=1}^{n} Y_i}{n}$

Median
The middle value when a variable's values are ranked in order; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above it and 50% are below it. It is the 50th percentile.
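As a minimal sketch of the two definitions above, the following Python computes the mean and the median by hand. The function names and the `scores` values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def mean_by_hand(values):
    """Add up the values for each case and divide by the number of cases."""
    return sum(values) / len(values)


def median_by_hand(values):
    """Return the middle value of the ranked data (the 50th percentile)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of cases: a single middle value exists.
        return ordered[mid]
    # Even number of cases: take the midpoint of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical scores, used only for illustration.
scores = [4, 8, 6, 5, 7]
scores_even = [4, 8, 6, 5, 7, 12]

print(mean_by_hand(scores), median_by_hand(scores))
print(mean_by_hand(scores_even), median_by_hand(scores_even))
```

With an even number of cases there is no single middle value, so the sketch uses the common convention of averaging the two middle values.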
Median (example)
89 93 97 98 102 106 109 110 115 119 128 131 140
Median = 109 (six cases above, six below)

Median
1. The median is hardly affected by outliers, which makes it a better measure of central tendency than the mean when data are skewed: it better describes the "typical person."
2. If the recorded values for a variable form a symmetric distribution, the median and mean are identical.
3. In skewed data, the mean lies further toward the skew than the median.
[Figure: a symmetric distribution in which mean and median coincide, and a skewed distribution in which the mean lies further into the tail than the median]

Mode
1. The most common data point.
2. In symmetric distributions with a single peak, the mean, median, and mode are the same.
3. In skewed data, the mean and median lie further toward the skew than the mode.
[Figure: a symmetric distribution in which mode, median, and mean coincide, and a skewed distribution in which the order from the peak toward the tail is mode, median, mean]

Statistics (SPSS output for the variable Systolisch, i.e. systolic)
N valid: 904
N missing: 68
Mean: 161.02
Median: 160.00
Mode: 160
Std. deviation: 28.507
Variance: 812.672
Range: 195
Percentiles: 25th = 140.00, 50th = 160.00, 75th = 180.00

Range
The spread, or the distance, between the lowest and highest values of a variable.

Interquartile range
The median is a quartile and divides the cases in half. The interquartile range (IQR) is the distance between the 25th percentile and the 75th percentile, so it covers the middle 50% of the cases.
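The range and interquartile range can be sketched in Python for the thirteen values from the median example. This is an illustration, not the method used for the slides: quartile conventions differ between software packages, and the `"inclusive"` method of `statistics.quantiles` is an assumption made here, so SPSS may report slightly different percentiles on other data.

```python
from statistics import quantiles

# The thirteen ranked values from the median example.
values = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# Range: distance between the lowest and highest value.
value_range = max(values) - min(values)

# Quartiles: the 25th, 50th (median) and 75th percentiles.
# method="inclusive" treats the data as containing the minimum and
# maximum of the distribution; other conventions exist.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

# Interquartile range: distance between the 25th and 75th percentile.
iqr = q3 - q1

print(value_range)  # 51
print(q1, q2, q3)   # 98.0 109.0 119.0
print(iqr)          # 21.0
```

The range uses only the two most extreme cases, so a single outlier changes it directly, whereas the IQR depends only on the middle half of the data.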
[Figure: a scale from 0 to 1000 with cut points at 250, 500, and 750; each of the four segments contains 25% of the cases]

Boxplot
[Figure: boxplot of TESTVAR (N = 56) marking p0, p25, the median (p50), p75, and p100; the IQR is the box from p25 to p75]

Variance
A measure of the spread of the recorded values on a variable. A measure of dispersion.
• The larger the variance, the further the individual cases are from the mean.
• The smaller the variance, the closer the individual scores are to the mean.

Variance
Calculating variance starts with a "deviation." A deviation is the distance of a case's score from the mean:
$Y_i - \bar{Y}$
Squaring the deviations eliminates the negative signs. A squared deviation:
$(Y_i - \bar{Y})^2$
Variance ($s^2$)
• Average of the squared distances of the individual points from the mean

Variance
If you were to add all the squared deviations together, you would get what we call the "sum of squares."
$SS = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = (Y_1 - \bar{Y})^2 + (Y_2 - \bar{Y})^2 + \dots + (Y_n - \bar{Y})^2$

Variance
The last step: the (approximate) average of the sum of squares is the variance.
• $SS / N$ = variance for a population
• $SS / (n - 1)$ = variance for a sample
$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$
But: the variance is expressed in squared units, so its value is large and difficult to interpret.

Standard Deviation
To convert the variance into something meaningful, take its square root: the standard deviation. It indicates roughly how far the observations typically lie from the mean.
• expressed in the original units of measurement
• represents the average amount of dispersion in a sample
$sd = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$

Standard Deviation
1. Larger s.d. = greater amounts of variation around the mean.
2. s.d. = 0 only when all values are the same (only when you have a constant and not a "variable").
3. If you were to "rescale" a variable (multiply it by a constant), the s.d. would change by the same factor.
4. Like the mean, the s.d. will be inflated by an outlier case value.

Distribution of data

Normal distribution
"Bell curve" where many cases fall near the middle of the distribution and few fall very high or very low.

Normal distribution
• Many characteristics are distributed through the population in a "normal" manner
• Parametric statistics are based on the assumption that the variables are distributed normally
• Parametric statistics are the most commonly used statistics

Skew (skewness)
[Figure: two density curves, one with right (positive) skew and one with left (negative) skew]

Kurtosis ("peakedness")
[Figure: a "leptokurtic" (peaked) distribution and a "platykurtic" (flat) distribution]

Normal distribution
In normally distributed data: mean ≈ median ≈ mode
Judging normality: signs that the data are not normally distributed
• Mean ≠ median
• Mean − 2*sd < minimal score
• Asymmetric boxplot
• Test for skewness and kurtosis (SPSS)

Which information do you present?
Depends on the type of variable:
• Dichotomous/categorical: frequencies
• Continuous: summarize data
  Normally distributed data?
  • Yes: mean and sd
  • No: median and IQR

So...
Descriptive statistics are used to summarize data from individual respondents, etc.
• They help to make sense of large numbers of individual responses and to communicate the essence of those responses to others
• They focus on typical or average scores, the dispersion of scores over the available responses, and the shape of the response curve
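The variance and standard deviation steps described in the Variance and Standard Deviation slides (deviations, sum of squares, division by N or n − 1, square root) can be sketched in Python as follows. The function name `sum_of_squares` and the `scores` values are hypothetical and serve only as an illustration.

```python
import math
from statistics import pvariance, stdev, variance


def sum_of_squares(values):
    """Sum of the squared deviations of each value from the mean."""
    y_bar = sum(values) / len(values)
    return sum((y - y_bar) ** 2 for y in values)


# Hypothetical scores, used only for illustration (mean = 6).
scores = [4, 8, 6, 5, 7]

ss = sum_of_squares(scores)                  # 4 + 4 + 0 + 1 + 1 = 10
population_variance = ss / len(scores)       # SS / N
sample_variance = ss / (len(scores) - 1)     # SS / (n - 1)
sample_sd = math.sqrt(sample_variance)       # back in the original units

# The standard library gives the same results.
assert math.isclose(population_variance, pvariance(scores))
assert math.isclose(sample_variance, variance(scores))
assert math.isclose(sample_sd, stdev(scores))

# Rescaling the variable by a factor of 10 rescales the s.d. by 10.
rescaled_sd = stdev([y * 10 for y in scores])

print(ss, population_variance, sample_variance)
print(sample_sd, rescaled_sd)
```

Dividing by n − 1 rather than n is the usual choice when the data are a sample from a larger population, which matches the sample formula on the slides.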