Download Chapter 3

Comparison of groups

The purpose of an analysis is often to compare different groups of data. Suppose, for example, that a meat scientist wants to examine the effect of three different storage conditions on the tenderness of meat. For that purpose 24 pieces of meat have been collected and allocated into three storage (or treatment) groups, each of size eight. In each group all eight pieces of meat are stored under the same conditions, and after some time the tenderness of each piece of meat is measured. The main question is whether the different storage conditions affect the tenderness: are the observed differences between the groups due to a real effect, or due to random variation?

Example: Parasite counts for salmon

An experiment with two different salmon stocks, from River Conon in Scotland and from River Ätran in Sweden, was carried out as follows. Thirteen fish from each stock were infected, and after four weeks the number of a certain type of parasite was counted for each of the 26 fish, with the following results:

[Table: parasite counts for the 26 fish, by stock]

The purpose of the study was to investigate whether the number of parasites during an infection is the same for the two salmon stocks.

The sample means and sample standard deviations are computed to

[Table: mean and sample standard deviation for each stock]

The summary statistics and the boxplots tell the same story: the observed parasite counts are generally higher for the Ätran group than for the Conon group, indicating that Ätran salmon are more susceptible to parasites. The purpose of the statistical analysis is to clarify whether the observed difference is caused by an actual difference between the stocks or by random variation.

Example: Dung decomposition

An experiment with dung from heifers was carried out in order to explore the influence of antibiotics on the decomposition of dung organic material. As part of the experiment, 36 heifers were divided into six groups.
All heifers were fed a standard feed, and antibiotics of different types (alpha-cypermethrin, enrofloxacin, fenbendazole, ivermectin, spiramycin) were added to the feed for heifers in five of the groups. No antibiotics were added for heifers in the remaining group (the control group). For each heifer, a bag of dung was dug into the soil, and after eight weeks the amount of organic material was measured for each bag.

The observations together with group means (solid lines) and the total mean (dashed line) are shown in the left panel, and parallel boxplots are shown in the right panel. The amount of organic material appears to be lower for the control group than for any of the five types of antibiotics, suggesting that decomposition is generally inhibited by antibiotics. However, there is variation from group to group (between-group variation) as well as a relatively large variation within each group (within-group variation). The within-group variation seems to be roughly the same for all types, except perhaps for spiramycin, but that is hard to evaluate because there are fewer observations in that group.

The sample means and the sample standard deviations are computed for each group separately. They give the same indications as the boxplots: on average the amount of organic material is lower for the control group than for the antibiotics groups, and except for the spiramycin group the standard deviations are roughly the same in all groups.

Group means and SDs

Consider the situation with $n$ observations split into $k$ groups. Label the groups 1 through $k$. Let $g(i)$ denote the group for observation $i$. Then $g(i)$ has one of the values $1,\ldots,k$.
The sample mean and sample standard deviation in group $j$ are given by

$$\bar{y}_j = \frac{1}{n_j} \sum_{i:\, g(i)=j} y_i, \qquad s_j = \sqrt{\frac{1}{n_j - 1} \sum_{i:\, g(i)=j} (y_i - \bar{y}_j)^2},$$

where $y_i$ is observation $i$ and $n_j$ is the number of observations in group $j$.

Residual variance

The residual variance $s^2$ can also be computed as a weighted average of the group variance estimates, $s_j^2$, as follows:

$$s^2 = \frac{1}{n-k} \sum_{j=1}^{k} (n_j - 1)\, s_j^2.$$

Note that the group variance $s_j^2$ is assigned the weight $n_j - 1$, the denominator in (3.1). The summation of (3.5) is called the residual sum of squares.

Within-group variation

Within-group variation refers to the variation in each of the groups. It is illustrated by the vertical deviations between the observations and their corresponding group means. The residual sum of squares is given by

$$SS_e = \sum_{i=1}^{n} \left(y_i - \bar{y}_{g(i)}\right)^2.$$

$SS_e$ describes the within-group variation since it measures squared deviations between the observations and the group means. The residual degrees of freedom is $df_e = n-k$, so the residual mean squares is

$$MS_e = \frac{SS_e}{df_e} = \frac{SS_e}{n-k} = s^2.$$

Between-group variation

Between-group variation refers to differences between the groups; for example, differences between the treatments in the antibiotics example. It is illustrated by the vertical differences between the group means (horizontal line segments) and the overall mean $\bar{y}$ (dashed line):

$$SS_{grp} = \sum_{j=1}^{k} n_j \left(\bar{y}_j - \bar{y}\right)^2.$$

When we examine the between-group variation, the $k$ group means essentially act as our "observations"; hence, $df_{grp} = k-1$, and the "average" squared difference per group,

$$MS_{grp} = \frac{SS_{grp}}{df_{grp}} = \frac{SS_{grp}}{k-1},$$

becomes the between-group variation.

Analysis of variance

If there is no difference between any of the groups, then the group averages will be of similar size and be similar to the overall mean. Hence, $MS_{grp}$ will be "small". On the other hand, if groups 1 and 2, say, are different, then the group averages will be somewhat different; hence, $MS_{grp}$ will be "large". "Small" and "large" should be measured relative to the within-group variation, and $MS_{grp}$ is thus standardized with $MS_e$. We use

$$F_{obs} = \frac{MS_{grp}}{MS_e}$$

as the test statistic. Large values of $F_{obs}$ are critical; that is, not in agreement with the assumption (hypothesis) that there is no difference between any of the groups.
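The within-group and between-group quantities described above can be traced step by step in R. The following is a minimal sketch, not the computation from the slides: the observations `y`, the grouping factor `g`, and all variable names are made up for illustration. It computes the group summaries, checks that the pooled variance equals the residual mean squares, and compares the hand-computed F statistic with the one reported by `aov()`.

```r
# Hypothetical data: 10 observations in 3 groups of unequal size
# (values are made up for illustration only)
y <- c(2.1, 2.5, 2.3, 2.8, 3.0, 3.4, 3.1, 3.6, 3.9, 4.2)
g <- factor(c(rep("A", 4), rep("B", 3), rep("C", 3)))

n <- length(y)    # total number of observations
k <- nlevels(g)   # number of groups

# Group sizes, group means and group standard deviations
n_j    <- tapply(y, g, length)
ybar_j <- tapply(y, g, mean)
s_j    <- tapply(y, g, sd)

# Residual variance as a weighted average of the group variances
pooled <- sum((n_j - 1) * s_j^2) / (n - k)

# Within-group variation: squared deviations from each observation's group mean
SS_e <- sum((y - ave(y, g))^2)   # ave() returns the group mean for every observation
MS_e <- SS_e / (n - k)

# Between-group variation: squared deviations of group means from the overall mean
SS_grp <- sum(n_j * (ybar_j - mean(y))^2)
MS_grp <- SS_grp / (k - 1)

# Test statistic and p-value from the F distribution with (k - 1, n - k) df
F_obs   <- MS_grp / MS_e
p_value <- pf(F_obs, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# The same numbers taken from R's analysis of variance table
tab <- summary(aov(y ~ g))[[1]]
print(c(by_hand = F_obs, from_aov = tab[["F value"]][1]))
print(c(by_hand = p_value, from_aov = tab[["Pr(>F)"]][1]))
```

Computing the pieces separately makes it visible that `MS_e` and the pooled variance are the same number, and that the table produced by `aov()` contains nothing beyond these sums of squares, degrees of freedom and their ratio.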
Analysis of variance

Disagreement with the hypothesis of no group differences shows up as a large value of $F_{obs}$, or equivalently as a small p-value. $F_{obs}$ and the corresponding p-value are usually reported in an analysis of variance table. A p-value smaller than 0.05 is conventionally regarded as significant evidence of differences between the groups.

Example: Dung decomposition

We conclude that there is strong evidence of group differences. Subsequently, we need to quantify the conclusion further: Which groups are different, and how large are the differences?

Paired samples and dependence

Paired samples occur, for example, if two measurements are collected for each subject in the sample under different circumstances (treatments), or if measurements are taken on pairs of related observational units such as twins. In dietary studies with two diets under investigation, for example, it is common that the subjects try one diet in one period and the other diet in another period; the two measurements on a subject are therefore "dependent." As a consequence, the between-group variation is confounded with the within-group variation, making the analysis of variance inappropriate for paired data.

Independent samples

• It is important to distinguish paired samples from unpaired (or independent) samples, because different methods of analysis are appropriate. For unpaired samples like the dung decomposition data, we impose an assumption of independence between all observations. This means that the observations do not share information.
• This setup with independent samples corresponds to a one-way analysis of variance, or one-way ANOVA. It is called "analysis of variance" because different sources of variation are compared, and "one-way" because only one factor (the treatment or grouping) is varied in the experiment.

Summary of grouped data

• Two independent samples, where the samples correspond to two different groups or treatments and can be assumed to be independent.
• Independent samples, where the samples correspond to $k$ different groups or treatments and can be assumed to be independent.
• Paired samples, where the observations consist of pairs of measurements, with the observations in a pair corresponding to two different groups or treatments.

The first case is a special case of the second, but we emphasize it anyway, for two reasons. First, it is very important to distinguish two independent samples from paired samples because different analysis methods are appropriate.

Example: Word count

The attach() command makes it possible to use the variables GENDER, GROUP, COUNT without reference to the data frame.

> attach(Data)

The following command produces boxplots grouped by GENDER.

> boxplot(COUNT ~ GENDER, col="green", ylab="Word counts per day")

COUNT on the left-hand side of ~ in the call to aov() is modeled as grouped data, with the groups indicated by GENDER on the right-hand side. The commands below then list the analysis of variance table and the group means.

> outcome <- aov(COUNT ~ GENDER)
> summary(outcome)
> Means <- model.tables(outcome, "means")
> Means
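The word count commands can be tried without the original data set. The sketch below uses a small made-up data frame with the same column names GENDER and COUNT (the GROUP variable is left out, and the values are illustrative only, not the study data). It passes `data = Data` to each function instead of calling attach(), which is an alternative that avoids leaving variables on the search path. The last lines illustrate that two independent samples are a special case of the one-way ANOVA: with two groups, the F statistic equals the square of the two-sample t statistic computed with a pooled variance.

```r
# Made-up stand-in for the Data data frame (values are illustrative only)
Data <- data.frame(
  GENDER = factor(rep(c("F", "M"), each = 5)),
  COUNT  = c(16200, 15800, 17100, 14900, 16500,
             15700, 16900, 14300, 15200, 16000)
)

# Boxplots grouped by GENDER; data = Data replaces attach(Data)
boxplot(COUNT ~ GENDER, data = Data, col = "green",
        ylab = "Word counts per day")

# One-way analysis of variance: COUNT grouped by GENDER
outcome <- aov(COUNT ~ GENDER, data = Data)
summary(outcome)                          # analysis of variance table
Means <- model.tables(outcome, "means")   # overall mean and group means
Means

# Two groups: compare with the two-sample t-test that assumes equal variances
t_res <- t.test(COUNT ~ GENDER, data = Data, var.equal = TRUE)
F_obs <- summary(outcome)[[1]][["F value"]][1]
t_sq  <- unname(t_res$statistic)^2
print(c(F_obs = F_obs, t_squared = t_sq))
```

The p-values of the two analyses coincide as well, so for two independent groups the choice between `aov()` and the equal-variance t-test is a matter of convenience.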
