Some basic statistical tests & more on basic statistical analysis
Communication Research, Week 11

With help from:
- Carey, J. & Dimmitt, C. (2003) Statistical Analysis: Is Change Real? www.umass.edu/schoolcounseling/ WelcometoAmherstMassachusetts/StatisticalAnalysis.ppt [accessed 10 Oct 2006]
- http://www.statsoft.com/textbook/stathome.html

Why Statistical Analysis?
- After we gather and compute our data, we want to be sure that the scores of two groups really are different.
- We want to be sure that the differences we see are not just due to chance.
- If we are basing decisions on real differences, our behavior is directed and purposeful.
- If we are basing decisions on differences that are only due to chance, our behavior is random and chaotic.

Statistical Tests
- Allow us to estimate the likelihood that the apparent differences between groups are real and not due to chance.
- These tests have the built-in capacity to take the number of people per group and the variability of the data into account when making these estimates.

Measuring Variables
- Variables: the things we measure, control or manipulate
- Independent variables (IV) are usually those that are manipulated
- Dependent variables (DV) are only measured or registered
- They differ in how well they can be measured and the type of measurement scale used
- Two or more variables are related if, in a sample of observations, their values systematically correspond to each other
- E.g. height is considered related to weight because tall people are typically heavier than short ones; IQ is related to the number of errors in a test if people with higher IQs make fewer errors

Why are relations between variables considered important?
- In the philosophy of science, it is commonly held that there is no other way of representing “meaning” except in terms of relations between some quantities or qualities.
- The statistical significance (p-value) of a result is the probability of observing, by pure chance (“luck of the draw”), a relationship (e.g. between variables) or a difference (e.g. between means) at least as large as the one in the sample if, in the population from which the sample was drawn, no such relationship or difference existed.
- In other words, the statistical significance of a result tells us something about the degree to which the result is “true” (i.e. representative of the population).

Example: "Baby boys to baby girls ratio"
- Consider the following example from research on statistical reasoning (Nisbett, et al., 1987). There are two hospitals: in the first, 120 babies are born every day; in the other, only 12. On average, the ratio of baby boys to baby girls born every day in each hospital is 50/50. However, one day, in one of those hospitals, twice as many baby girls were born as baby boys. In which hospital was it more likely to happen?
- The answer is obvious to a statistician but, as research shows, not so obvious to a lay person: it is much more likely to happen in the small hospital.
- The reason is that, technically speaking, the probability of a random deviation of a particular size (from the population mean) decreases as the sample size increases.
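
The hospital example can be checked directly with the binomial distribution. The sketch below is an illustration only. It assumes each birth is independently a girl with probability 0.5, and it reads "twice as many girls as boys" as "at least twice as many". The function name is made up for this note.

```python
from math import comb


def prob_girls_at_least_double(n_births: int, p_girl: float = 0.5) -> float:
    """Exact binomial probability that girls >= 2 * boys among n_births."""
    total = 0.0
    for girls in range(n_births + 1):
        boys = n_births - girls
        if girls >= 2 * boys:
            # Probability of exactly this many girls out of n_births
            total += comb(n_births, girls) * p_girl**girls * (1 - p_girl) ** boys
    return total


small = prob_girls_at_least_double(12)    # small hospital: 12 births a day
large = prob_girls_at_least_double(120)   # large hospital: 120 births a day

print(f"12 births a day:  {small:.4f}")
print(f"120 births a day: {large:.6f}")
```

Under these assumptions, the small hospital sees such a day roughly one day in five, while in the large hospital the probability is well below 0.001. The same proportional deviation from 50/50 is far less likely in the larger sample.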
Data characteristics that help determine the statistical test used
- Type of data used: nominal, ordinal, interval, ratio
- Two groups vs more than two groups
- Whether groups are matched (“paired”) or unmatched
- Whether groups are small or large
- Whether the data are normally distributed (continuous data)

Different data/variable types

Nominal
Description: Allow for only qualitative classification. They can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories.
Example: All we can say is that two individuals are different in terms of variable A (e.g. of a different race), but we cannot say which one "has more" of the quality represented by the variable. Typical examples of nominal variables are gender, race, color, city, etc.

Ordinal
Description: Allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more."
Example: Socioeconomic status of families. We know that upper-middle is higher than middle, but we cannot say that it is, for example, 18% higher. The distinction between nominal, ordinal, and interval scales is itself a good example of an ordinal variable: we can say that nominal measurement provides less information than ordinal measurement, but we cannot say "how much less" or how this difference compares to the difference between ordinal and interval scales.

Interval
Description: Allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale.
Example: We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees.

Ratio
Description: Are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, so they allow for statements such as "x is two times more than y".
Example: Typical examples of ratio scales are measures of time or space. As the Kelvin temperature scale is a ratio scale, not only can we say that a temperature of 200 degrees is higher than one of 100 degrees, we can correctly state that it is twice as high. Interval scales do not have the ratio property. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.

Example 2: After implementation of a family math education intervention, Latino/a students’ average 4th Grade MCAS scaled score increased from 206 to 215.
- Hypothesis (Ha): The two groups are really different.
- Null hypothesis (H0): The two groups are not different; the apparent difference is due to chance.
- The variability of the outcome data is a major factor in determining whether the differences are real or due to chance.
- At the TAB, in a straight bet, how much would you be willing to wager that Ha is true if you knew that, when students retake the MCAS within a month:
  - 90% of the time their two scores differ by less than 2 points?
  - 90% of the time their two scores differ by less than 10 points?
  - 90% of the time their two scores differ by less than 50 points?

Example 1: 70% of White and 40% of African American 3rd graders score Advanced or Proficient on the MCAS Reading Test.
- The number of people in the two groups is a major factor in determining whether differences are real or due to chance.
- At the TAB, in a straight bet, how much would you be willing to wager that the hypothesis is true if you knew that:
  - The percentages are based on 10 students from each group?
  - The percentages are based on 50 students from each group?
  - The percentages are based on 100 students from each group?

Parametric vs non-parametric tests
- Since we have two types of data, we need two types of statistical tests.
- Parametric: the DV is a continuous variable (e.g. age in years), so it makes sense to calculate the mean and SD.
- Non-parametric: the DV is a count (nominal data) or a ranking (ordinal data), so it makes no sense to calculate a mean, e.g. “the average gender of Australians is 1.5”.
- Parametric tests are generally more powerful, meaning that if there is a real difference between the groups, it is easier to find it with a parametric test.

Examples of parametric tests
- Independent t-test, or a comparison of two means: looks for a difference between two groups (e.g. men and women) on a particular variable (e.g. whether they kiss on the first date).
- Paired t-test: compares the same people on two occasions, e.g. how someone feels about drink driving before they get caught, and how they feel about it afterwards.
- In an SPSS output table, the Sig. (2-tailed) value is the significance value: the likelihood that a difference at least this large could happen by pure chance if there were no real difference. If the value is less than 0.05, that likelihood is less than 5%, so the difference is statistically significant at the 0.05 level.

Choosing a significance level
- Statistical tests do not give us information that allows us to say definitively whether an observed difference between groups is real or just due to chance.
- Statistical tests do give us an estimate of the likelihood that an observed difference between groups results from chance.
- We must decide what criteria we will use for deciding whether a difference is real.

Choosing a Significance Level
- We do this by choosing a significance level:
  - .25: 25% chance the difference is due to chance
  - .10: 10% chance the difference is due to chance
  - .05: 5% chance the difference is due to chance
  - .01: 1% chance the difference is due to chance

T-Test in SPSS
- SPSS will allow you to do all of these tests quite easily.
- If the Sig. value is less than 0.05, the likelihood of a difference this large arising by chance is less than 5%, so the difference is statistically significant at the 0.05 level.
- Levene’s test checks whether the variances of the two groups are relatively similar. If the significance for Levene's test is 0.05 or below, then the ‘Equal Variances Not Assumed’ t-test result (the one on the bottom) is used. Otherwise, use the ‘Equal Variances Assumed’ result (the one on the top).

T-Test for Independent Samples
- Remember, we need to know two other things in order to ascertain the likelihood of chance creating a difference of this size:
  - The number of people in each group
  - The variability of the scores
- The number of people is easy, and is counted in the frequency table (n).

Variability
- In order to know whether a difference between two means is important, we need to know how much the scores vary around the means.
- Holding the difference between the means constant:
  - With high variability, the two groups nearly overlap
  - With low variability, the two groups show very little overlap

Measuring Variability
[Figure: distributions with medium, high and low variance]
- Usually it’s easier to work with the square root of the variance. This statistic is called the standard deviation (SD).
- SPSS statistical tests will calculate the SD for you.

ANOVA
- ANOVA is an acronym: ANalysis Of VAriance.
- It is an extension of the t-test, and is generally used to test for significant differences between means.
- The name is derived from the fact that, in order to test for statistical significance between means, we actually compare (or analyse) variances.
- For two-group comparisons, ANOVA will give results identical to a t-test, but when the design is more complex, ANOVA offers numerous advantages that t-tests cannot provide (even if you run a series of t-tests comparing various cells of the design).
- For example, it often happens in research practice that you need to compare more than two groups (e.g. drug 1, drug 2, and placebo), or compare groups created by more than one independent variable while controlling for the separate influence of each of them (such as Gender, Type of Drug, and Size of Dose).

[Diagrams: participants in each cell of the design; a repeated name means the same person is measured again]
Independent t-test: John, Robert
Dependent t-test: John, John
One-way ANOVA: John, Robert, Kevin, Tom
Repeated measures ANOVA: John, John, John, John
Factorial ANOVA: John, Robert, Kevin, Tom, Janine, Roberta, Katie, Teresa
Mixed ANOVA: John, John, John, John, Janine, Janine, Janine, Janine

Chi2 Test
- Chi-square (χ²) is a non-parametric test.
- It uses nominal data and checks whether the observed frequencies for the groups differ significantly from what would be expected.
- The crosstabulation table tells you whether selected variables are related to other selected variables; the chi-square table tells you what the degree of certainty is.
- Chi-square is based on the fact that, for a two-way table, we can compute the frequencies that we would expect if there were no relationship between the variables.

Chi2 Example
- Suppose we ask 20 men and 20 women to choose between two brands of soft drink, brands A and B.
- If there is no relationship between preference and gender, then we would expect the split between brand A and brand B to be about the same for each gender.
- The chi-square test becomes increasingly significant as the numbers deviate further from this expected pattern; that is, the more the pattern of choices for men differs from the pattern for women.

If our Chi2 test statistic has exceeded the critical value for:
- The .10 significance level, it would mean that there was less than a 10% chance of seeing a difference that large as a result of chance alone.
- The .05 significance level, it would mean that there was less than a 5% chance of seeing a difference that large as a result of chance alone.
- The .025 significance level, it would mean that there was less than a 2.5% chance of seeing a difference that large as a result of chance alone.
- The .01 significance level, it would mean that there was less than a 1% chance of seeing a difference that large as a result of chance alone.
- The .005 significance level, it would mean that there was less than a 0.5% chance of seeing a difference that large as a result of chance alone.

Pearson correlation
- The Pearson correlation looks for a linear relationship between two variables and generates a mathematical index of the relationship between them.
- The value lies between -1.00 and +1.00; the further the value is from 0 (in either direction), the stronger the relationship.
- Pearson correlation indicates trends: one thing increases (or decreases) as another thing increases (or decreases).
- A negative value indicates that low scores on one variable go with high scores on the other variable, while a positive value indicates that high scores on one variable go with high scores on the other variable.
- Note that SPSS will allow you to calculate different tests of significance as well as note significant correlations.
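
To make the Pearson index concrete, here is a minimal sketch of the calculation done by hand rather than in SPSS. The height, weight, IQ and error figures are invented purely for illustration, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Invented data: taller people tend to be heavier (positive relationship)
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 77, 85]

# Invented data: higher IQ goes with fewer test errors (negative relationship)
iq_scores = [90, 100, 110, 120, 130]
test_errors = [14, 11, 9, 6, 2]

print(f"height vs weight: r = {pearson_r(heights_cm, weights_kg):+.2f}")
print(f"IQ vs errors:     r = {pearson_r(iq_scores, test_errors):+.2f}")
```

The first coefficient comes out close to +1 and the second close to -1. Both relationships are strong; only the direction differs.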

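Returning to the soft-drink chi-square example, the sketch below shows how the expected frequencies and the test statistic could be computed for a two-way table. The observed counts are hypothetical, no continuity correction is applied, and the function name is made up for this note. The critical value 3.841 is the standard .05 cut-off for 1 degree of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are men and women, columns are brand A and brand B
observed = [
    [15, 5],   # 20 men
    [7, 13],   # 20 women
]

CRITICAL_05_DF1 = 3.841  # .05 critical value for 1 degree of freedom

chi2 = chi_square(observed)
print(f"chi-square = {chi2:.2f}")
print("significant at .05" if chi2 > CRITICAL_05_DF1 else "not significant at .05")
```

With these made-up counts, 22 of the 40 people chose brand A, so 11 choices of A would be expected from each gender if preference and gender were unrelated. The observed 15 and 7 deviate enough from 11 for the statistic to exceed the .05 critical value.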