CLASS NOTES: Introduction to Analysis of Variance

CONCEPT

Analysis of Variance (ANOVA): A hypothesis-testing procedure that is used to evaluate mean differences b/t 2 or more treatments (or populations).
* ANOVA uses sample data to draw general conclusions about a population (sound familiar?).
* The goal of ANOVA is to determine whether the mean differences observed among the samples provide enough evidence to conclude that there are mean differences among the populations, that is, whether or not there is a difference b/t treatment methods.
* ANOVA can be used w/ either an independent-measures or a repeated-measures design.
* Analysis means dividing into smaller parts. Since we are analyzing the variance b/t means, this process is called Analysis of Variance, or ANOVA.

Factor: The variable (independent or quasi-independent) that designates the groups being compared. An independent variable is one that is manipulated by the researcher, whereas a quasi-independent variable is one that is not manipulated by the researcher.

APPLICATION

* ANOVA is similar to the t-test in that both test for mean differences, but t-tests are limited to 2 treatments whereas ANOVA can compare 2 or more treatments.
* Remember that the independent variable is the variable manipulated by the researcher.
* The ability to combine different factors & to mix different designs w/in one study provides researchers w/ the flexibility to develop studies that address scientific questions that could not be answered by a single design using a single factor.

EXAMPLES

Independent factor: Temperature, where the researcher will manipulate the temperature in the room.
Quasi-independent factor: Age, where the researcher will compare a treatment for different age groups.
Remember that in a quasi-experiment the grouping variable is not manipulated or created by the experimenter; it already exists in nature, such as "age."

Ex (independent measures): evaluating differences b/t doses of a trial medication at 10mg, 50mg & 100mg.
sample set 1: 10mg
sample set 2: 50mg
sample set 3: 100mg

Variance: A measure of variability that indicates how far the scores in a distribution vary from the mean.

Between-Treatments Variance: The variance b/t treatment conditions, or the differences b/t the overall means of the treatment conditions.

There are two possible explanations for differences b/t treatment conditions (differences b/t treatment condition means):
1) Treatment effect: The difference is caused by the treatments.
* This would correspond w/ your alternative hypothesis.
2) Chance: The difference is simply due to chance.
* This is error.

There are 2 primary sources of chance differences:
1) Individual differences: There are variances b/t scores for each sample group b/c there are different participants w/ different scores in each sample.
2) Experimental error: There is always potential for some degree of error in general.

When computing the between-treatment variance, we are measuring differences that could be caused by a treatment effect or by chance (individual differences or experimental error). By analyzing the differences w/in each treatment, we can then establish how big the differences are when there is no treatment effect involved, that is, how much difference (or variance) occurs by chance alone (null hypothesis).

Within-Treatment Variance: The variance within each sample. These are the differences b/t the scores of individuals even though they received exactly the same treatment.
Ex (independent measures): evaluating the differences b/t the measures w/in sample set 1.
Not only are we looking for variation b/t treatments, but also for variation w/in the individual sample sets.
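As a rough illustration of the two kinds of variability described above, the sketch below borrows the hypothetical scores from the example at the end of these notes (3 treatment conditions, 5 scores each). The dictionary keys and variable names are illustrative only, and Python's standard `statistics` module stands in for hand calculation.

```python
from statistics import mean, variance

# Hypothetical scores from the example at the end of these notes
# (one list per treatment condition; the labels are illustrative).
samples = {
    "treatment 1": [0, 1, 3, 1, 0],
    "treatment 2": [4, 3, 6, 3, 4],
    "treatment 3": [1, 2, 2, 0, 0],
}

# Within-treatment variability: how much the scores differ inside each
# sample, even though everyone in that sample got the same treatment.
within = {name: variance(scores) for name, scores in samples.items()}

# Between-treatment variability: how much the treatment means differ
# from one another.
means = {name: mean(scores) for name, scores in samples.items()}
between = variance(list(means.values()))

for name in samples:
    print(name, "mean =", means[name], "variance within =", within[name])
print("variance of the treatment means =", between)
```

The variance of the treatment means is only a first look at between-treatment differences; the sums of squares and mean squares defined later in these notes are what actually go into the F-ratio.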
ANOVA Notation & Formulas

This is not a complete list of all notations & formulas, only those that are new or that differ from earlier notations. There are other notations & formulas used in ANOVA that are not listed here, but that you are already familiar w/.

k = Number of treatment conditions
Ex: 3 different treatments for Alzheimer's Disease: k = 3 (treatment conditions)

n = Number of scores in each treatment
Ex: If there are 5 scores in each treatment condition, n = 5. If the number of scores (or participants) is not the same in each treatment condition (e.g., 5 in one treatment, 4 in another, 6 in another), then you can identify a specific sample by using a subscript (ex: n2 = number of scores for treatment condition #2).

N = Total # of scores in the entire study
Ex: If there are 9 scores altogether (3 treatment conditions, 3 scores in each treatment condition), then N = 9.

T = Total value of each treatment condition
$T = \sum X$

G = The sum of all the scores in the research study (the grand total)
$G = \sum X_1 + \sum X_2 + \sum X_3 + \cdots$
Add up all the N scores or add up the treatment totals.

C = Correction Factor
$C = \frac{(\sum X_1 + \sum X_2 + \sum X_3 + \cdots)^2}{N}$

SS = Sum of Squares
$SS = \sum X^2 - \frac{(\sum X)^2}{N}$
This formula looks familiar, yes? The Sum of Squares formula is the numerator of our variance formula (the quantity under the square root in the standard deviation formula)!

MS = Mean Square
$MS = \frac{SS}{df}$
You will be plugging different values into the SS & df places depending upon which MS you are calculating: for between treatments, within treatments, between subjects or error.

dfT = Degrees of Freedom TOTAL
$df_T = N - 1$
This is the total df formula for ANOVA. There are variations to this formula based upon the following factors: Repeated vs. Independent Measures ANOVA, between-treatment variance, between-subject variance & error. See additional materials on Repeated & Independent measures for these specific df formulas.
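The notation above can be tried out on a small data set. The following is a minimal sketch, assuming the hypothetical scores from the example at the end of these notes; the helper name `sum_of_squares` and the variable names are illustrative, not part of the class notation.

```python
# Hypothetical scores from the example at the end of these notes.
treatments = [
    [0, 1, 3, 1, 0],   # treatment 1
    [4, 3, 6, 3, 4],   # treatment 2
    [1, 2, 2, 0, 0],   # treatment 3
]

def sum_of_squares(scores):
    """SS = sum(X^2) - (sum(X))^2 / (number of scores) for one set of scores."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)

k = len(treatments)                    # number of treatment conditions
n = [len(t) for t in treatments]       # scores per treatment (n1, n2, n3)
N = sum(n)                             # total number of scores
T = [sum(t) for t in treatments]       # treatment totals, T = sum(X)
G = sum(T)                             # grand total
C = G ** 2 / N                         # correction factor
df_total = N - 1                       # total degrees of freedom

ss_1 = sum_of_squares(treatments[0])   # SS for treatment 1 alone
ms_1 = ss_1 / (n[0] - 1)               # MS = SS / df, with df = n - 1 here

print(k, n, N, T, G, C, df_total, ss_1, ms_1)
```

Here `ms_1` is simply the sample variance of treatment 1, which is why the SS formula looks familiar.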
(Also, see the full example for calculations of each degrees of freedom.)

DISCLAIMER: There are no universally accepted notations for ANOVA, & you may find that other sources use other symbols. But for the purposes of this class, these are the symbols that we will be using & recognizing w/ ANOVA.

Hypothesis Testing with ANOVA: One-Way F-Ratio, Between Subjects

Step 1: State Your Hypothesis

Null hypothesis (H0) states that there is no difference b/t treatments.
Alternative hypothesis (H1) states that at least one population mean is different from another.

Here is where the alternative hypothesis looks different from the other alternative hypotheses. Since we are comparing more than 2 treatment methods, there are a number of different alternatives that can exist (A > B, but B = C; A = B, but B < C; etc.). Note that A = B = C is not one of them: that is the null hypothesis. So all we need to "reject the null" is for at least one difference b/t treatments to exist, since the null hypothesis requires that there be no difference b/t the population means.
* Researchers usually have some idea of what difference they are looking for in their study.

Step 2: Set the Criteria for a Decision

a. Set your alpha level: α = .05
Remember that most researchers choose from alpha levels of .05, .01 or .001.

b. Calculate the degrees of freedom (this is for independent measures; see Class Notes for repeated measures ANOVA for specific df formulas related to repeated measures).

1. df Total
$df_t = df_w + df_b$ or $df_t = N - 1$
dft = total degrees of freedom
dfw = degrees of freedom within treatment groups
dfb = degrees of freedom between treatments

2. df Within Treatments
$df_w = \sum (n - 1)$ or $df_w = N - k$
dfw = the total number of all values or scores in the entire study minus the number of treatment conditions. If you were to have an N of 9 (9 total scores) & 3 treatment methods, then your w/in samples df would be 9 – 3 or 6.

3. df Between Treatments (dfb)

c. Determine your critical region.
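The degrees-of-freedom bookkeeping in Step 2b can be sketched as a small function. This is only an illustration for the independent-measures case; the function name is hypothetical, and it uses df between = k – 1, the number of treatment conditions minus 1.

```python
def degrees_of_freedom(sample_sizes):
    """Return (df_total, df_within, df_between) for an independent-measures
    one-way ANOVA, given the number of scores in each treatment condition."""
    k = len(sample_sizes)                          # treatment conditions
    N = sum(sample_sizes)                          # total number of scores
    df_total = N - 1
    df_within = sum(n - 1 for n in sample_sizes)   # same as N - k
    df_between = k - 1
    return df_total, df_within, df_between

# The case from the notes: N = 9 scores in 3 treatment conditions.
df_t, df_w, df_b = degrees_of_freedom([3, 3, 3])
print(df_t, df_w, df_b)

# The two routes to df total should always agree.
assert df_t == df_w + df_b
```

Passing unequal sample sizes, such as `[5, 4, 6]`, works the same way, since df within is built up from each sample's n – 1.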
Look in the appendices at the back of your textbook for the Critical Values of F for the analysis of variance. You will need both df values to use the table. The between-treatments df is $df_b = k - 1$: the number of treatment conditions (k) minus 1. So if there are 3 treatment conditions, then 3 – 1 = 2.

For the F table, the row across the top indicates the between-treatments degrees of freedom & the column on the far left indicates the within-treatments degrees of freedom. Put your finger on these two values & bring them together to find the F-value that separates the critical region from the null region. (The table gives two values: one in regular type for α = .05 & one in bold for α = .01.)

As in correlation & t-statistics, we look to the sampling distribution of the statistic (here the F distribution rather than the normal curve) to determine the cut-off point between:
* the critical region: the tail of the distribution where, should the F-ratio fall there, the outcome would be to reject the null hypothesis, meaning that at least one treatment mean is different from the others, &
* the null region: the body portion of the distribution where, should the F-ratio fall there, the outcome would be to fail to reject the null hypothesis, indicating that there is not enough evidence of a difference b/t the means.

Step 3: Collect Your Data & Compute Your Sample Statistics

ANOVA Formula for Between-Treatment Variance

For each treatment condition, calculate:
a) ΣX, ΣX² & (ΣX)² (review the example in the text)
b) n: the number of participants/scores in each sample
c) M: the mean for each treatment condition

For the full or TOTAL set of values, calculate:
d) N: the total number of all scores in the experiment
e) M: the mean of the treatment means, found by adding the mean values together & dividing by the number of treatment conditions (w/ equal n's, this is the same as the mean of all N scores)
f) ΣX, ΣX² & (ΣX)² for the total set of values (see example for calculations)

Sum of Squares

g) SSt or total (using your total values):
$SS_t = \sum X^2 - \frac{(\sum X)^2}{N}$

h) SSb or between treatments (the sum of squares between treatment groups):
$SS_b = \sum_g \frac{(\sum X_g)^2}{n_g} - \frac{(\sum X)^2}{N}$

i) SSw or within treatments:
$SS_w = \sum_g \left[ \sum X_g^2 - \frac{(\sum X_g)^2}{n_g} \right]$

The Sigma (Σ) represents the sum.
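The three sum-of-squares formulas can be checked against one another, since SS total should equal SS between plus SS within. Below is a minimal sketch using the hypothetical scores from the example at the end of these notes; the function and variable names are illustrative.

```python
def sums_of_squares(groups):
    """Return (SS_total, SS_between, SS_within) from the computational
    formulas, where `groups` is a list of score lists, one per treatment."""
    all_scores = [x for g in groups for x in g]
    N = len(all_scores)
    correction = sum(all_scores) ** 2 / N              # (sum X)^2 / N

    ss_total = sum(x * x for x in all_scores) - correction
    # Repeat (sum X_g)^2 / n_g for each treatment g, then subtract the
    # correction term.
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    # Repeat the ordinary SS formula inside each treatment g and add up.
    ss_within = sum(
        sum(x * x for x in g) - sum(g) ** 2 / len(g) for g in groups
    )
    return ss_total, ss_between, ss_within

# Hypothetical scores from the example at the end of these notes.
groups = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]
ss_t, ss_b, ss_w = sums_of_squares(groups)
print(ss_t, ss_b, ss_w)

# The total variability splits into the two parts.
assert abs(ss_t - (ss_b + ss_w)) < 1e-9
```

This partition is the "analysis" in Analysis of Variance: the total variability is divided into a between-treatments part and a within-treatments part.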
In the SSb & SSw formulas, the 'g' beneath the Sigma indicates that you should repeat the formula for each treatment. The subscript 'g' represents the value associated w/ each treatment group.

Mean Square

j) MSb
$MS_b = \frac{SS_b}{df_b}$
The mean square for between groups is the sum of squares for between groups divided by the degrees of freedom for between groups.

k) MSw
$MS_w = \frac{SS_w}{df_w}$
The mean square for within groups is the sum of squares for within groups divided by the degrees of freedom for within groups.

The F-Ratio

Requirements for Using the F-Ratio:
* The sample groups have been randomly & independently selected.
* There is a normal distribution in the population from which the samples are selected.
* The data are in interval (or ratio) form.
* The within-group variances of the samples should be fairly similar. This property, called homogeneity of variance, simply means that ANOVA demands sample groups that do not differ too much w/ regard to their internal variabilities.

$F = \frac{\text{variance between treatments}}{\text{variance within treatments}} = \frac{MS_b}{MS_w}$

* Remember your t-statistic:
$t = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance}}$
* Notice that the F-ratio is based on variance instead of sample mean difference. Again, b/c there are a number of different possibilities that can exist b/t the different means in the case of ANOVA, we calculate the overall variance b/t the sample means.
* The numerator of the ratio measures the actual difference obtained from the sample data, & the denominator measures the difference that would be expected if there is no treatment effect.
* When the treatment has no effect & any difference is simply due to chance (the effect size is "0"), then the F-ratio should be around "1.00".
* When the treatment does have an effect, then the numerator (b/t treatment differences) should be larger than the denominator (differences due to chance), so your F-ratio should be noticeably larger than 1.00.
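The two mean squares & the F-ratio can be sketched in a few lines. The inputs below are illustrative: SS between = 30 w/ df = 2 & SS within = 16 w/ df = 12, which are the values that the hypothetical temperature data at the end of these notes produce. The function name is hypothetical.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """Return (MS_between, MS_within, F) for a one-way ANOVA."""
    ms_between = ss_between / df_between   # variance between treatments
    ms_within = ss_within / df_within      # variance expected by chance
    return ms_between, ms_within, ms_between / ms_within

# Illustrative inputs (see the lead-in for where they come from).
ms_b, ms_w, F = f_ratio(ss_between=30, df_between=2, ss_within=16, df_within=12)
print(round(ms_b, 2), round(ms_w, 2), round(F, 2))

# With no treatment effect, numerator and denominator measure the same
# chance variability, so F comes out around 1.
_, _, F_null = f_ratio(ss_between=4, df_between=2, ss_within=24, df_within=12)
print(F_null)
```

The second call uses made-up sums of squares chosen so that MS between equals MS within, which is the "F around 1.00" situation described above.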
The Distribution of F-Ratios

Once you have computed your F-ratio score:
1) Go to the F distribution table w/ your df between treatments (used in the numerator portion of your F-ratio formula) & your df within treatments (used in the denominator portion of your F-ratio formula).
2) Locate these 2 df's on the table (the numerator df is listed in the row across the top, whereas the denominator df is listed in the column on the far left).
3) Connect these 2 scores together in the middle. Regular type scores give you the critical value for an alpha level of .05. Bold will give you the critical value for an alpha level of .01.

Since there are 2 df's, they are expressed together, for example: df = 2, 12. The first df listed is your between-treatments df & the second is your within-treatments df.

MSwithin & Pooled Variance

$MS_{within} = \frac{\sum SS}{\sum df} = \frac{SS_1 + SS_2 + SS_3 + \cdots}{df_1 + df_2 + df_3 + \cdots}$

This works just as in the t-statistic, where each SS was added for the numerator & each df was added for the denominator. Here you just keep adding however many SS's & df's you have, according to how many sample groups you have in your research study. Remember that pooled variance is used when you have unequal 'n' values.

Step 4: Make a Decision

Based upon your results, you will either:
* Reject the null, meaning at least one mean was different from the others, or
* Fail to reject the null, meaning that the data did not provide enough evidence of differences b/t the different treatment conditions or means.

Example of ANOVA

Therapy Technique (rows) by Time (columns):

Therapy 1 (Group #1)
  Before Therapy: scores for group #1 measured before Therapy #1
  After Therapy: scores for group #1 measured after Therapy #1
  6 Months After Therapy: scores for group #1 measured 6 months after Therapy #1

Therapy 2 (Group #2)
  Before Therapy: scores for group #2 measured before Therapy #2
  After Therapy: scores for group #2 measured after Therapy #2
  6 Months After Therapy: scores for group #2 measured 6 months after Therapy #2

Hypothetical Data

Treatment 1, 50º    Treatment 2, 70º    Treatment 3, 90º
(sample 1)          (sample 2)          (sample 3)
0                   4                   1
1                   3                   2
3                   6                   2
1                   3                   0
0                   4                   0
M = 1               M = 4               M = 1
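To tie the steps together, here is a hedged sketch that runs the hypothetical temperature data above through the pooled MSwithin formula & the Step 4 decision. The variable names are illustrative, and the critical value is an assumption typed in from a standard F table for df = 2, 12 at α = .05 (about 3.89; some tables print 3.88), so check it against the table in your own textbook.

```python
# Hypothetical data from the table above (50, 70 and 90 degree conditions).
samples = [[0, 1, 3, 1, 0], [4, 3, 6, 3, 4], [1, 2, 2, 0, 0]]

def ss(scores):
    """SS = sum(X^2) - (sum(X))^2 / n for one sample."""
    return sum(x * x for x in scores) - sum(scores) ** 2 / len(scores)

# MS within as a pooled variance: add up every SS, add up every df.
ss_each = [ss(s) for s in samples]
df_each = [len(s) - 1 for s in samples]
ms_within = sum(ss_each) / sum(df_each)

# MS between from the treatment totals and the grand total.
N = sum(len(s) for s in samples)
G = sum(sum(s) for s in samples)
ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - G ** 2 / N
df_between = len(samples) - 1
ms_between = ss_between / df_between

F = ms_between / ms_within

# Assumption: approximate critical value from a standard F table,
# df = 2, 12, alpha = .05.
F_CRITICAL = 3.89
decision = "reject H0" if F > F_CRITICAL else "fail to reject H0"

print(f"F({df_between}, {sum(df_each)}) = {F:.2f} -> {decision}")
```

For these scores the F-ratio falls well inside the critical region, so the decision is to reject the null: at least one temperature condition has a different mean.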