Types of data and how to present them
47:269: Research Methods I
Dr. Leonard
March 31, 2010

Scientific Theory
1. Formulate theories
2. Develop testable hypotheses (operational definitions)
3. Conduct research, gather data
4. Evaluate hypotheses based on data
5. Cautiously draw conclusions

Scales of Measurement
• Nominal: Categories
• Ordinal: Categories that can be ranked
• Interval: Scores with equidistant intervals between them
• Ratio: Scores with equidistant intervals and absolute zero

Scale      Responses are distinct   Responses can be ranked   Equal intervals   Absolute zero
Nominal    YES                      NO                        NO                NO
Ordinal    YES                      YES                       NO                NO
Interval   YES                      YES                       YES               NO
Ratio      YES                      YES                       YES               YES

Two major approaches to using data
• Descriptive statistics
  - Describe or summarize data to characterize sample
  - Organize responses to show trends in data
• Inferential statistics
  - Draw inferences about population from sample (is population distinct from sample?)
  - Capture impact of random error on responses
    - Significance tests
    - Margin of error
Note: Statistics describe responses from a sample; parameters describe responses from a population (e.g., a census)

Descriptive Statistics
• N, total number of cases (responses) in a sample
  - Our class would be N = 33
• f, or frequency, is the number of participants who gave a particular response, x
  - Can also be given as percentages or proportions
• Can be univariate or bivariate
  - How participants vary on one variable (uni-)
  - How participants vary on two variables (bi-)
• Descriptive statistics are a good first step for analyzing any data!
• They are the only statistics appropriate for nominal data

Frequency distribution (nominal data)
x (response)   f (frequency)   %
Democrat       479             47.9
Republican     411             41.1
Independent    101             10.1
Green party    9               0.9
Total          n = 1,000       100%

Frequency distribution (interval or ratio data)
• When you need to present a wide range of scores, show responses grouped in intervals to make it easier to grasp the “big picture” of the data

Scores:
2.7 1.9 3.1 1.0 3.3 1.3 2.2 3.0
3.4 3.1 1.8 2.6 3.7 2.2 1.9 3.1
3.4 3.0 3.5 3.0 2.4 3.0 3.4 2.4
2.4 3.2 3.3 2.7 3.5 3.2 3.1 2.1
1.5 1.4 2.6 2.9 2.1 2.3 3.1 3.3
2.7 2.4 3.4 3.3 3.0 3.8 1.6 2.8
3.8 1.4 2.6 1.5 2.8 2.3 2.8 2.3
2.8 3.2 2.8 1.9 3.3 2.9 2.0 3.2

Interval     f
0.9 - 1.1    1
1.2 - 1.4    3
1.5 - 1.7    3
1.8 - 2.0    5
2.1 - 2.3    6
2.4 - 2.6    7
2.7 - 2.9    10
3.0 - 3.2    14
3.3 - 3.5    12
3.6 - 3.8    3

Frequency distributions can be depicted graphically in…
• Bar graphs
  - Bars not touching because of discrete data
  - Nominal and ordinal data
• Histograms
  - Bars touching because of continuous data
  - Interval and ratio data
• Frequency polygons (single line)
  - Interval and ratio data

Shapes of Distributions
[Figure: three distribution curves labeled normal, positive skew, and negative skew]

Shapes of Distributions
[Figure: three distribution curves labeled normal, platykurtic, and leptokurtic]

What else can we do besides frequencies?
• Measures of central tendency show the central or “typical” scores in a distribution
  - Mean - the average score
  - Median - the middle score
  - Mode - the most frequent score
• The mean, median, and mode are related to the horizontal shape (skew) of the distribution.
  - In a normal distribution: Mean = Median = Mode
  - In a positively skewed distribution: Mode < Median < Mean
  - In a negatively skewed distribution: Mean < Median < Mode

Which measure of central tendency???
• Different measures of central tendency are appropriate depending upon the level of measurement used:
  - Nominal: Mode
  - Ordinal: Mode, Median
  - Interval/Ratio: Mode, Median, Mean

The Mean
• The most informative and elegant measure of central tendency.
• The average
• The fulcrum point of the distribution
[Figure: number lines showing the scores 2, 4, 6, 8, 10 and 2, 4, 6, 8, 15]

The Median
• The middlemost score in a distribution.
• The scale value below which and above which 50% of the distribution falls
• Not the fulcrum: the halfway point
[Figure: number lines showing the scores 2, 4, 6, 8, 10 and 2, 4, 6, 8, 15]

The Median
• If N is odd, then the median is the center score
[Figure: number lines showing the scores 2, 4, 6, 8, 10 and 2, 4, 6, 8, 15]
• If N is even, then the median is the average of the two centermost scores
[Figure: number lines showing the scores 2, 4, 6, 8, 10, 12 and 2, 4, 6, 8, 10, 15]

The Median
• If the median occurs at a value where there are tied scores, use the tied score as the median
[Figure: dot plot of scores from 2 to 15 with tied scores at the median]

The Mode
• The most frequent score in the distribution
[Figure: dot plots of two distributions with repeated scores stacked above each value]

One more thing…
• These measures of central tendency vary in their sampling stability = match between a sample statistic (e.g., the sample mean, x̄) and the population value (e.g., the population mean, μ).
  Mode     Least sampling stability
  Median
  Mean     Most sampling stability
Note: Roman (r, s, x̄) characters are used for sample statistics while Greek (ρ, σ, μ) characters are used for population statistics.

Review of central tendency
• Which one is the only appropriate measure for nominal data?
  - The mode
• How do you find the median when there is an odd number of scores?
  - Simply locate the score in the middle
• …when there is an even number of scores?
  - Average the two middle scores
• Which measure is most sensitive to extreme scores and why?
  - The mean, because it takes all scores into account and can be swayed by positive or negative skew
• Which measure has the most sampling stability and why?
  - The mean, because it is the most accurate representation of the overall sample

Application of central tendency
• In 2006, the median home price in Boston was $386,300. (San Francisco was $518,400; Washington, D.C. was $258,700.)
• How do you interpret these numbers?
• Why are housing prices framed in terms of the median rather than the mean or the mode?
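
The central tendency measures reviewed above can be tried out in a few lines of Python. This is only a sketch, assuming Python's standard `statistics` module. The first three score sets come from the number-line examples in the slides; the `with_ties` scores are hypothetical and are there only to show the mode.

```python
import statistics

# Score sets taken from the number-line examples in the slides.
symmetric = [2, 4, 6, 8, 10]
with_extreme = [2, 4, 6, 8, 15]   # same scores, but the top score is extreme
even_n = [2, 4, 6, 8, 10, 12]     # even N: the median averages the two middle scores

# Hypothetical scores with repeats (NOT from the slides), used to show the mode.
with_ties = [2, 4, 4, 6, 8, 8, 8, 10]


def describe(scores):
    """Return (mean, median) for a list of scores."""
    return statistics.mean(scores), statistics.median(scores)


mean_a, median_a = describe(symmetric)     # mean and median agree
mean_b, median_b = describe(with_extreme)  # mean is pulled up, median is not
median_even = statistics.median(even_n)    # average of 6 and 8
mode_ties = statistics.mode(with_ties)     # most frequent score

print("symmetric:    mean =", mean_a, " median =", median_a)
print("with extreme: mean =", mean_b, " median =", median_b)
print("even N:       median =", median_even)
print("with ties:    mode =", mode_ties)
```

Replacing 10 with 15 moves the mean from 6 to 7 while the median stays at 6. That is one way to approach the housing-price question: a few very expensive homes pull the mean upward but leave the median where it is.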
Measures of variability
• Measures of central tendency
  - …indicate the typical scores in a distribution
  - …are related to skew (horizontal)
• Measures of variability
  - …show the dispersion of scores in a distribution
  - …are related to kurtosis (vertical)

Measures of variability
• Range - the difference between the highest and lowest score
• Variance - the average squared distance of the scores from the mean
• Standard deviation - the square root of the variance; roughly the average distance of the scores from the mean

Measures of variability
• Range = Highest Score - Lowest Score
[Figure: number lines showing the scores 2, 4, 6, 8, 10 and 2, 4, 6, 8, 15]
• Most sensitive to extreme scores!

Measures of variability
• Again, variance summarizes the overall distance of all scores from the mean (requires squaring the distance of each score from the mean)
• Not as useful as the standard deviation, which is roughly the average distance scores fall from the mean

Measures of variability
• Standard deviation, like the mean, is the most informative and elegant measure of variability.
• The average distance of scores from the mean score: deviation is distance!
[Figure: number line showing the scores 2, 4, 6, 8, 10]
• Also like the mean, standard deviation has the most sampling stability

How would these standard deviations differ?
[Figure: two dot plots, one labeled Mean = 6, Range = 8 and the other labeled Mean = 7.9, Range = 10]

Standard deviation and shape of distribution
[Figure: histograms of distributions with Mean = 15 and different spreads; panel labels include Std. Dev. = 10 and Std. Dev. = 0.9]

Properties of Normal Distributions
• All normal distributions are single peaked, symmetric, and bell-shaped
• Normal distributions can have different values for mean and standard deviation but…
• All normal distributions follow the 68-95-99.7 rule
  - 68.3% of data within 1 standard deviation of the mean
  - 95.4% of data within 2 standard deviations of the mean
  - 99.7% of data within 3 standard deviations of the mean
[Figure: normal curve centered on the mean, with nested bands covering 68.3%, 95.4%, and 99.7% of the data]
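
The variability measures and the 68-95-99.7 rule can be checked with a short Python sketch, again assuming the standard `statistics` module. The slides do not say whether to divide by N or by N - 1, so the population formulas (`pvariance`, `pstdev`, dividing by N) are an assumption here. The helper names `score_range`, `summarize`, and `share_within` are made up for this illustration.

```python
import statistics
from statistics import NormalDist

# Score sets taken from the number-line examples in the slides.
symmetric = [2, 4, 6, 8, 10]
with_extreme = [2, 4, 6, 8, 15]


def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


def summarize(scores):
    """Range, variance, and standard deviation of a list of scores.

    Assumption: population formulas (divide by N), since the slides
    do not say which denominator to use.
    """
    return {
        "range": score_range(scores),
        "variance": statistics.pvariance(scores),  # average squared distance from the mean
        "std_dev": statistics.pstdev(scores),      # square root of the variance
    }


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for name, scores in [("symmetric", symmetric), ("with extreme", with_extreme)]:
    print(name, summarize(scores))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

With these scores, the single extreme value raises the range from 8 to 13 and the variance from 8 to 20. The proportions for 1, 2, and 3 standard deviations round to the 68.3%, 95.4%, and 99.7% quoted in the slides.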