DESCRIPTIVE STATISTICS

Frequency distribution
Stem-and-leaf graph
Mean, median, mode
Range, variance, standard deviation
Skewness, kurtosis

Contents of the course

Descriptive statistics
• Graph, table, mean and standard deviation
Inferential statistics
• Probability and distribution
• Hypothesis test
• Analysis of variance
• Correlation and regression analysis
Other special topics

Types of Graphs

Graphical summaries
• Frequency distribution
• Histogram
• Stem-and-leaf plot
• Boxplot

Numerical summaries
• Location: mean, median, mode
• Spread: range, variance, standard deviation
• Shape: skewness, kurtosis

Frequency Distribution

A frequency distribution shows the number of observations falling into each of several ranges of values. Frequency distributions are portrayed as frequency tables, histograms, or polygons. They can show either the actual number of observations falling in each range or the percentage of observations. In the latter case, the distribution is called a relative frequency distribution.

When drawn as a curve, a frequency distribution uses a smooth line to connect the points and, like any graph, is plotted on two axes:
• The horizontal axis (x axis, from left to right) indicates the different possible values of some variable (a phenomenon where observations vary from trial to trial).
• The vertical axis (y axis, from bottom to top) measures frequency, that is, how many times a particular value occurs.

Normal distributions take the form of a symmetric bell-shaped curve. The standard normal distribution is one with a mean of 0 and a standard deviation of 1.

[Figure: shapes of frequency distributions: normal, J-shaped, positively skewed, negatively skewed, bimodal]

Stem-and-leaf Graph

A stem-and-leaf graph, or stemplot, comes from the field of exploratory data analysis.
This type of graph is a good choice if the data set is small. You create the graph by dividing each observation into a stem and a leaf. The leaf consists of the final digit and the stem consists of the remaining digits. For example, 35 has stem 3 and leaf 5. The number 354 has stem 35 and leaf 4.

To construct the graph, write the stems in a column and the leaves in a second column, in increasing order.

Example: Scores for a pre-calculus exam that counted 100 points were (from smallest to largest) as follows:

33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100

The stemplot would look like this:

Stem   Leaf
   3   3
   4   299
   5   355
   6   1378899
   7   2348
   8   03888
   9   0244446
  10   0

To understand the stemplot, look at the second row. You see 4 299. This represents the 42 and the two 49s. The plot itself shows the shape and distribution of the data. Most scores fell in the 60s, 70s, 80s, and 90s. More than half of the students received a score of 70 or better. A little less than half received a score of 80 or better. About one-fourth of the students received a score of 90 or better.

Stem-and-Leaf Display: GPAs

Stem-and-leaf of GPAs   N = 50
Leaf Unit = 0.10

   3   2   222
   8   2   55555
  15   2   6667777
 (14)  2   88888999999999
  21   3   000001
  15   3   22233
  10   3   444555
   4   3   67
   2   3   8
   1   4   0

• An increment of 0.2 was set between the lines.
• N = 50 is the number of values in the display.
• Leaf unit = 0.10, i.e. the stem unit = 1.
• Smallest value: a stem of 2 and a leaf of 2 = 2.2 GPA, with 3 students.
• Largest value: a stem of 4 and a leaf of 0 = 4.0 GPA, with 1 student.
• Median ≈ 2.9 GPA.

Measures of Central Tendency

(N = number of scores)

Mode
How it's determined: The most frequently occurring score is identified.
Data for which it's appropriate:
• Data on nominal, ordinal, interval, and ratio scales
• Multimodal distributions (two or more modes may be identified when a distribution has multiple peaks)

Median
How it's determined: The scores are arranged in order from smallest to largest, and the middle score (when N is an odd number) or the midpoint between the two middle scores (when N is an even number) is identified.
Data for which it's appropriate:
• Data on ordinal, interval, and ratio scales
• Data that are highly skewed

Arithmetic mean
How it's determined: All the scores are added together, and their sum is divided by the total number (N) of scores.
Data for which it's appropriate:
• Data on interval and ratio scales
• Data that fall in a normal distribution

Geometric mean
How it's determined: All the scores are multiplied together, and the Nth root of their product is computed.
Data for which it's appropriate:
• Data on ratio scales
• Data that fall in an ogive curve (e.g., growth data)

Measures of Variability

(N = number of scores, M = mean)

Range
How it's determined: The difference between the highest and lowest scores in the distribution.
Data for which it's appropriate:
• Data on ordinal, interval, and ratio scales

Interquartile range
How it's determined: The difference between the 25th and 75th percentiles.
Data for which it's appropriate:
• Data on ordinal, interval, and ratio scales
• Especially useful for highly skewed data

Standard deviation
How it's determined: $s = \sqrt{\frac{\sum (X - M)^2}{N}}$
Data for which it's appropriate:
• Data on interval and ratio scales
• Most appropriate for normally distributed data

Variance
How it's determined: $s^2 = \frac{\sum (X - M)^2}{N}$. A measure of the average squared distance between each of a set of data points and their mean value; equal to the sum of the squared deviations from the mean, divided by N.
Data for which it's appropriate:
• Data on interval and ratio scales
• Most appropriate for normally distributed data
• Especially useful in inferential statistical procedures (e.g., analysis of variance)

Standard Deviation

The standard deviation (SD) is a measure of spread, or dispersion, in your data. The larger the SD, the more dispersion there is in your data. The smaller the SD, the less dispersion exists in your data.

The standard deviation is a measure of variability around a mean score. In statistical terms, the standard deviation is the square root of a measure called the variance, which is the average of the squares of the deviation scores for the sample for a particular item.

[Figure: two sets of results with the same mean (the green line). A larger standard deviation (shown in light pink) indicates more scatter, and so less precision, in the results. A smaller standard deviation (shown in light blue) indicates less scatter.]

To illustrate the standard deviation and the type of insight it provides, the slides compare the scores of two students, Bill and Tom, over their last ten college exams.

[Table: exam scores for Bill and Tom; not reproduced in the transcript]

Both students ended up with an average exam score of 80, as indicated by a mean of 80.0 for each student. The standard deviation around Bill's mean of 80.0 is 10.1, while the standard deviation around Tom's mean of 80.0 is only 2.6. Looking at the scores for the two students, we can see that Tom is much more consistent than Bill. Tom's scores range from a low of 76 to a high of 84 (a range of only 8 points), whereas Bill's scores range from a low of 66 to a high of 96 (a range of 30 points).

If there is no spread or dispersion in your data, then the SD is zero. While a zero SD would be unlikely in a large sample, it could happen in a small sample of physicians when rating administrator communication, for example.
If each physician gave the same rating on an item, there would be no spread or dispersion in the data at all, and the SD would be zero. On the other hand, if the physician responses were dispersed evenly across the rating scale, the SD would be larger.

Here is something else to remember. If the data are normally distributed, or shaped in a “bell curve,” approximately 68% of the scores will fall between one SD above the mean and one SD below the mean. Approximately 95% of all scores will fall between 2 SDs above and below the mean, and approximately 99.7% will fall between 3 SDs above and below the mean.

Skewness

The degree of departure from symmetry of a distribution. A positively skewed distribution has a "tail" which is pulled in the positive direction. A negatively skewed distribution has a "tail" which is pulled in the negative direction.

Skewness is defined as asymmetry in the distribution of the sample data values. Values on one side of the distribution tend to be further from the 'middle' than values on the other side. For skewed data, the usual measures of location will give different values. For example, mode < median < mean would indicate positive (or right) skewness.

Positive (or right) skewness is more common than negative (or left) skewness. If there is evidence of skewness in the data, we can apply transformations, for example, taking logarithms of positively skewed data.
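The ordering mode < median < mean and the effect of a log transformation can be checked on a small sample. The following is a minimal sketch, not part of the original slides: the data set is hypothetical, and the `skewness` helper is an assumed implementation of the moment coefficient of skewness (the mean of the cubed standardized deviations, using the population SD with N in the denominator).

```python
import math
import statistics

# Hypothetical right-skewed sample (not from the slides): most values are
# small, and a few large values stretch the tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]


def skewness(values):
    """Moment coefficient of skewness: mean of cubed standardized deviations.

    Uses the population standard deviation (divides by N).
    Positive result = right skew, negative = left skew, zero = symmetric.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sd) ** 3 for x in values) / n


mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# For positive skew the three measures of location separate in this order.
print("mode < median < mean:", mode < median < mean)
print("skewness of raw data:", skewness(data))

# Taking logarithms (valid only for strictly positive values) pulls in the
# long right tail, so the transformed data are less skewed.
logged = [math.log(x) for x in data]
print("skewness after log transform:", skewness(logged))
```

On this sample the log-transformed values remain somewhat right skewed, only less so than the raw values. A transformation reduces skew but does not guarantee symmetry.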
[Figure: Measures of Shape: Skewness; panels: left skewed, right skewed]

Skew is the tilt (or lack of it) in a distribution. The more common type is right skew, where the tail points to the right. Less common is left skew, where the tail points to the left.

Kurtosis

The degree of peakedness of a distribution. A normal distribution is a mesokurtic distribution. A pure leptokurtic distribution has a higher peak than the normal distribution and has heavier tails. A pure platykurtic distribution has a lower peak than a normal distribution and lighter tails.

[Figure: Measures of Shape: Kurtosis; panels: platykurtic, leptokurtic]
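The kurtosis categories above can be illustrated numerically. This is a minimal sketch under stated assumptions: the slides give no formula, so the `excess_kurtosis` helper uses the common fourth-moment definition (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores about 0). Both samples are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis: m4 / variance**2 - 3, using population moments.

    About 0 for a normal (mesokurtic) distribution, positive for a
    leptokurtic one, negative for a platykurtic one.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / variance ** 2 - 3.0


# Hypothetical flat sample: values spread evenly, no peak, light tails.
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Hypothetical peaked sample: most values sit at the center, with a few
# extreme values far out in the tails.
peaked = [-10, -1, 0, 0, 0, 0, 0, 0, 1, 10]

print("flat sample (platykurtic):", excess_kurtosis(flat))
print("peaked sample (leptokurtic):", excess_kurtosis(peaked))
```

The sign of the result is what matters here: the flat sample comes out negative and the peaked, heavy-tailed sample comes out positive.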