WORKSHOP 2 MBA 9: RESEARCH AND QUANTITATIVE METHODS

Understand your data
• You need to understand your research question/objective/aim to determine what type, and level, of data is required
• Quantitative (QN) data is numerical
• Deductive
• Properties have implications for the type of analysis:
  – Internal/External
  – Discrete/Continuous
  – Primary/Secondary

Levels of Measurement (in order of increasing mathematical and statistical power)
• Ratio
• Interval
• Ordinal
• Nominal

Chart types
• Bar Chart – Nominal/Ordinal
• Histogram – Interval/Ratio
• Line Chart – Temporal

Descriptive statistics organise and summarise data; inferential statistics infer from a sample to the population of interest.

Summarising Data
There are three basic ways in which we summarise data:
1. The centre of the distribution, or measure of central tendency – the single value that best describes the sample (Mean, Median, Mode)
2. The spread of the distribution (Variance and Standard Deviation)
3. The shape of the distribution (Skewness and Kurtosis)

1. Measures of central tendency
• Mean, median, mode
• Examine outliers

Measure | Usage                                        | Advantages                                                                                   | Disadvantages
Mean    | Most familiar average; use mainly for ratio/interval | Exists for each dataset; takes every score into account; works well with many statistical methods | Is affected by extreme scores
Median  | Commonly used; use mainly for ordinal        | Always exists; not affected by extreme scores; often a good choice if there are some extreme scores in the dataset | Does not take every score into account
Mode    | Sometimes used; use mainly for nominal       | Not affected by extreme scores; appropriate for data at the nominal level                    | Might not exist, or there may be more than one mode; does not take every score into account
Worked example: 12 22 12 42 29 10 33 40 12

Mode – the most frequently occurring number in a dataset.
In the numbers above, Mo = 12 (it occurs 3 times).

Median – sort the data in ascending order, then take the middle number:
10 12 12 12 22 29 33 40 42, so Me = 22.

Mean – add all of the numbers together, then divide by the count:
12 + 22 + 12 + 42 + 29 + 10 + 33 + 40 + 12 = 212, and 212 / 9 = 23.56, so μ = 23.56.

2. Spread of the data

Level of measurement | Representation
Nominal              | Table or frequency distribution showing frequencies
Ordinal              | Tables/frequency distribution, but choosing a single measure is problematic; use the interquartile range if a single measure is chosen
Interval/Ratio       | Graphic dispersion; standard deviation, provided cases have an approximately normal distribution

Standard Deviation and Variance
• The most important and commonly used measure of variability in statistics is the variance.
• Variance is a measure of the dispersion in a set of scores, and is calculated by determining the 'average distance' of a set of scores from its 'centre' or mean, by the formula:

  σ² = SS / df = Σ(x − μ)² / N

  where the numerator Σ(x − μ)² is the Sum of Squares (SS), and the denominator is the degrees of freedom (df; N for a population, N − 1 when estimating from a sample).

Standard Deviation
• The standard deviation σ, in the case of a population, is the square root of the variance (σ²), so the formula is the same as for the variance, except that a square root is calculated.

3. Shape of the distribution
• Look at:
  – Symmetry (Skewness)
  – Peakedness (Kurtosis)

Data Analysis
• Enhance knowledge
  – Break knowledge down into elements
  – Elements contain concepts
• Can look at relationships between concepts
• Can look for differences or associations
• Test hypotheses

Term            | Definition
Data            | Groups of observations
Attribute/Value | Characteristic of the studied phenomenon, e.g. female
Variable        | Logical collection of attributes, or characteristics, e.g.
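The three measures of central tendency above can be sketched in a few lines with Python's standard `statistics` module, reproducing the worked example's numbers:

```python
# A minimal sketch reproducing the worked example: 12 22 12 42 29 10 33 40 12.
import statistics

data = [12, 22, 12, 42, 29, 10, 33, 40, 12]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # middle value of the sorted data
mean = statistics.mean(data)      # sum / count = 212 / 9

print(mode, median, round(mean, 2))  # → 12 22 23.56
```

Note that `statistics.mode` raises an error on older Python versions if the dataset has more than one mode, which mirrors the "might not exist or there may be more than one" caveat in the table above.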
gender, with attributes male and female
Response Variables | The variables we are interested in
Cases              | The individuals, people, things or events we get our information from

Variables
• To test hypotheses we need variables
• A variable or construct is theoretical or operational (the operational definition explains how we are going to measure the variable)
• Operationalisation is the process of defining the measurement of a phenomenon that is not directly measurable
• Two types of variables: Independent Variables (IVs) and Dependent Variables (DVs)
  – IV: the thing we think causes the DV
  – DV: the effect

Hypothesis Testing
• Hypothesis testing uses sample evidence to statistically test whether a claim made about a population is valid. The results of the sample are used to make an inference about the population as a whole.
• Requires the identification of independent and dependent variables.
  – Null hypothesis (H₀): assumes no difference in the state of affairs
  – Alternate hypothesis (H₁): our theory is, in fact, true beyond reasonable doubt

Example: suppose I hypothesised that the more cups of coffee a person drinks, the more alert they become.
• IV: Number of cups of coffee
• DV: Level of alertness
• H₀: There is no significant difference in the level of alertness between those people drinking coffee and those not drinking coffee
• H₁: There is a significant difference in the level of alertness between those people drinking coffee and those not drinking coffee

What's normal?
Normal Distribution
• A smooth continuous curve representing the form a binomial distribution would take for an infinite number of events with equiprobable outcomes
• Bell-shaped curve
• Symmetrical
• Unimodal (Mean, Median and Mode all coincide)
• Tails extend indefinitely to the left and right

Normal Distribution cont…
• The area under the curve of a normal distribution represents probability
• Allows us to determine where an individual score lies in relation to other scores
• A model of the shape of the frequency distribution of many naturally occurring phenomena
• Helps us understand the "relative position" of a case relative to other cases

How do you calculate where an individual stands relative to the normal distribution?

The Standard Normal Distribution
• Distributions allow us to predict probability or proportion from an individual score, but to do so we need three pieces of information:
  – Mean
  – Variance
  – Shape
• Every phenomenon has a different distribution (different means and variances), but all have the same shape (normal shape)

So what's the problem? Because there are so many different distributions, each distribution has a different proportion of cases falling below any particular score.

The Standard Normal Distribution cont…
• The measuring standard for all distributions
• Mean = 0
• Variance = 1
• Defined in standard deviation units (z-scores):

  z = (x − μ) / σ

Difference between Normal and Standard Normal
• Normal distributions have x-values along the x-axis (individual scores); the standard normal distribution has z-values
• z-scores:
  – Standardised scores
  – Do not depict real values of individuals
  – Hypothetical values that show where an individual case lies relative to other cases
  – Indicate the number of SD units a score lies above or below the mean

Parametric vs non-Parametric Tests
• Parametric tests assume:
  – Normality
  – Independence
  – Homogeneity of variance

But how do I decide how to test this?
• Frequency?
• Relationships?
• Difference?
• Association?
• Correlation?
• Am I trying to forecast?
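The z-score definition above, z = (x − μ) / σ, can be sketched directly. The exam scores here are hypothetical, chosen only to illustrate the standardisation:

```python
# A minimal sketch of converting raw scores to z-scores, z = (x - mu) / sigma,
# using illustrative (hypothetical) exam scores.
scores = [55, 60, 65, 70, 75]

n = len(scores)
mu = sum(scores) / n                                      # mean
sigma = (sum((x - mu) ** 2 for x in scores) / n) ** 0.5   # population SD

z_scores = [(x - mu) / sigma for x in scores]
# mu = 65, sigma ~ 7.07; the middle score sits exactly at the mean (z = 0)
print([round(z, 2) for z in z_scores])
```

A score equal to the mean always maps to z = 0, and each unit of z is one standard deviation above or below the mean, regardless of the original scale.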
Determined by? … YOUR AIM!

Decision tree number 1… CHI-SQUARE

Nominal/Categorical: Chi-Square
• A significance test used where the data consist of counts rather than scores
• Most MBA dissertations will involve the use of categorical data
• Best analysed with basic descriptives:
  – Frequency tables
  – Crosstabs

Classifications
• Dichotomous classifications: married and single, children and adults, politically active and politically indifferent, etc.
• Multiple classifications: Sheldon's classification of body types as ectomorphic (thin), mesomorphic (muscular), and endomorphic (fat)
• Classifications are of interest to a researcher mainly when they are exhaustive and mutually exclusive

Chi-Square cont… Contingency tables, or Crosstabulations
• Frequency tables summarise a single categorical variable
• Cross-tabulations summarise the relationship between two categorical variables, i.e. when data are classified with respect to two or more variables
• Notice something? Mutually exclusive! Exhaustive!
• Each cell holds a frequency/count (a count rather than a continuous measurement)

The χ² significance test
• The test appropriate for the analysis of counts is the χ² test
• Also used as a goodness-of-fit test (i.e. does the existing data fit a theoretical distribution, such as a normal distribution?)
• The null hypothesis is that no association exists between the sets of categories

The χ² significance test cont…
• The key concept in this test is the notion of an expected frequency:
  – What we would expect if only chance variation were operating across the categories of interest, and the category frequencies were in fact equal in the population

It is not difficult to calculate expected frequencies for the tea-drinker data. The population as a whole contains 84 moderate tea-drinkers. If the sample of long sleepers and the sample of short sleepers were from the same population, we would expect the moderate tea-drinkers to be distributed in proportion to the number of people in each sample.
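Frequency tables and crosstabs can be built from raw categorical records with a plain counter. The gender/satisfaction records here are hypothetical, for illustration only:

```python
# A minimal sketch of building a frequency table (one variable) and a
# crosstab (two variables) from hypothetical categorical records.
from collections import Counter

records = [("Male", "Satisfied"), ("Male", "Dissatisfied"),
           ("Female", "Satisfied"), ("Female", "Satisfied"),
           ("Male", "Satisfied"), ("Female", "Dissatisfied")]

# Frequency table: counts of a single categorical variable
gender_freq = Counter(g for g, _ in records)

# Crosstab: joint cell counts of two categorical variables
crosstab = Counter(records)

print(gender_freq["Male"], gender_freq["Female"])  # marginal counts
print(crosstab[("Male", "Satisfied")])             # one cell count
```

Because each record falls in exactly one cell, the cells are mutually exclusive and exhaustive, which is exactly what the χ² test requires.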
Since the two samples actually contain the same number of people, we would expect the same number of moderate tea-drinkers in both the long and short sleepers, i.e. 42 people. We can work out the other columns in a similar fashion. The general principle for working out the expected frequency in each cell of a contingency table is:

  E = (row total × column total) / grand total

• Once we have expected and observed values for each cell, we are in a position to calculate the χ² statistic.
• To calculate χ², we apply the formula below. The resulting total is the χ² value, and we can look up its significance in the χ² tables:

  χ² = Σ (O − E)² / E

• It will always have a positive value because of the squaring of the differences.

Alternative measures for contingency tables
• Why? The size of χ² depends on the size of the sample, confounding sample size and effect size.
• The simplest measure of effect size, the mean square contingency coefficient (usually denoted by φ²), simply divides χ² by the size of the sample:

  φ² = χ² / n

• In the case of our tea-drinking study, φ² = χ²/n = 10.68/200 = 0.0534, which indicates a very small effect.

Cramer's V
• φ² is, however, not considered a good measure of association, largely because it does not generate scores that fall between 0 and 1 in the same way a correlation does
• Nevertheless, φ² is used, with some modifications, in meta-analytic studies
• A measure of association in contingency tables with somewhat better properties is Cramer's V, usually denoted by φc:

  φc = √( χ² / (n (k − 1)) ),  where k is the smaller of the number of rows and columns

Odds Ratio
• Unaffected by sample size or by unequal row or column totals
• Calculated on 2 × 2 tables; a larger table can be collapsed over one of the categories to generate a 2 × 2 table
• Collapsing over categories is in general not a good idea, because it can (sometimes) alter the meaning of the data, either obscuring or exaggerating the association between the categories

Assumptions of the χ² test
There are two assumptions that must be satisfied if a χ² test is to be used appropriately:

1.
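The expected-frequency rule and the φ² effect size can be checked with the tea-drinker figures quoted above (two samples, 84 moderate tea-drinkers in a total of 200, and the slide's χ² of 10.68):

```python
# A minimal sketch of E = (row total x column total) / grand total and
# phi^2 = chi^2 / n, using the tea-drinker figures from the slides.
row_total = 100      # long sleepers (the short-sleeper sample is the same size)
col_total = 84       # moderate tea-drinkers in the whole sample
grand_total = 200

expected = row_total * col_total / grand_total
print(expected)      # → 42.0, as stated on the slide

chi_square = 10.68   # chi-square value given for the tea-drinking study
phi_squared = chi_square / grand_total
print(round(phi_squared, 4))  # → 0.0534, a very small effect
```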
Expected frequency minimum:
• The number of subjects expected in each cell must reach a certain minimum
• A rule of thumb that is frequently used is that the expected frequency should be no less than 5 in at least 80% of the cells

2. All the items or people involved in the test are independent of each other:
• Each observation comes from a different subject
• No subject should be omitted from the table

Interpreting Chi-Square (Crosstabs, 2 × 2)

Case Processing Summary
                                Valid         Missing      Total
                                N    Percent  N   Percent  N    Percent
Gender * Level_of_satisfaction  100  100.0%   0   0.0%     100  100.0%

Gender * Level_of_satisfaction Crosstabulation (Count)
         Satisfied  Dissatisfied  Total
Male     26         26            52
Female   27         21            48
Total    53         47            100

Chi-Square Tests
                              Value  df  Asymp. Sig.  Exact Sig.  Exact Sig.
                                         (2-sided)    (2-sided)   (1-sided)
Pearson Chi-Square            .391a  1   .532
Continuity Correction b       .181   1   .671
Likelihood Ratio              .392   1   .531
Fisher's Exact Test                                   .554        .336
Linear-by-Linear Association  .387c  1   .534
N of Valid Cases              100

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 22.56.
b. Computed only for a 2x2 table
c. The standardized statistic is -.622. (Linear-by-Linear Association applies when the IV is ordinal.)
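The Pearson chi-square in the SPSS output above can be reproduced by hand from the observed crosstab counts, using the expected-frequency rule and the χ² formula from the previous slides:

```python
# A minimal sketch reproducing the Pearson chi-square for the Gender x
# Level_of_satisfaction crosstab (observed counts from the SPSS output).
observed = [[26, 26],   # Male:   Satisfied, Dissatisfied
            [27, 21]]   # Female: Satisfied, Dissatisfied

row_totals = [sum(row) for row in observed]        # [52, 48]
col_totals = [sum(col) for col in zip(*observed)]  # [53, 47]
n = sum(row_totals)                                # 100

chi_square = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n  # E = row x col / N
        chi_square += (observed[i][j] - expected) ** 2 / expected

print(round(chi_square, 3))  # → 0.391, matching the SPSS Pearson Chi-Square
```

The smallest expected count here is 48 × 47 / 100 = 22.56, matching footnote (a), so the expected-frequency assumption is comfortably met.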
If the assumptions are met, report the Pearson Chi-Square; if they are violated, use Fisher's Exact Test. Check your assumptions first!

Decision Making Tree Number 2… REGRESSION

Regression
• Paired data
  – Allows us to measure the relationship between two measures
  – Collected from two independent measurements
• A refined way of analysing scatterplots
  – X-axis: Predictor or IV
  – Y-axis: Criterion or DV
  – Trend: the overall shape of the plotted points

Best fitting line, or Regression line
• The best fitting line that can be drawn through the points on a scatterplot
• Linear regression: straight line
• Non-linear regression: curved line

Finding Regression Coefficients
• To define a straight line, 2 pieces of information are required:
  – Slope
  – Intercept: the point on the graph where the line crosses the y-axis
• The line is written y = a + bx, where:
  – y represents the value on the criterion variable
  – x represents the predictor variable
  – a and b represent the two pieces of information required to fit the line (b is the slope and a is the intercept) – the regression coefficients

Calculating Regression Coefficients
• Intermediate values:
  – n: the number of pairs of values
  – Σx: the sum of the x values
  – Σy: the sum of the y values
  – Σx²: the sum of the squares of the x values
  – Σxy: the sum of the products of each x value with its paired y value
• These intermediate values are substituted into the following equations to find the covariance, s_xy, and following this, the slope, b:

  s_xy = (Σxy − (Σx Σy)/n) / (n − 1)
  b = s_xy / s_x²

Calculating a
• Having calculated b, we can find the intercept a.
• The midpoint (x̄, ȳ) of all the points on the scattergraph is the middlemost point in the scatter.
• Substitute these mean values into the general equation for a line (y = a + bx) and then rearrange to solve for a:

  a = ȳ − b x̄

Making predictions
• The regression equation is essentially a mathematical summary of what we think the relationship between the two variables might be
• We can use this mathematical relationship to make predictions, though not without some danger of making a mistake

Linear Regression: Regression applied
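The slope and intercept formulas above, b = s_xy / s_x² and a = ȳ − b·x̄, can be sketched on a small hypothetical dataset (the (x, y) pairs here are invented to give a clean result):

```python
# A minimal sketch of the regression-coefficient formulas: b = s_xy / s_x^2
# and a = y_bar - b * x_bar, on small hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # perfectly linear: y = 2x

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)  # covariance
s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)                       # variance of x

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept

print(a, b)  # → 0.0 2.0, i.e. the fitted line is y = 0 + 2x
```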
Descriptive Statistics
                     Mean   Std. Deviation  N
Level_of_alertness   21.35  12.175          55
Cups_of_coffee       2.69   1.942           55

The "descriptives" command also gives you a correlation matrix, showing the Pearson r between the variables:

Correlations
                                         Level_of_alertness  Cups_of_coffee
Pearson Correlation  Level_of_alertness  1.000               .989
                     Cups_of_coffee      .989                1.000
Sig. (1-tailed)      Level_of_alertness  .                   .000
                     Cups_of_coffee      .000                .
N                                        55                  55

The Model Summary tells you what % of variability in the DV is accounted for by all of the IVs together (it is a multiple R-square). The footnote tells you which variables were included in the equation.

Model Summary
Model  R      R Square  Adjusted R Square  Std. Error of the Estimate
1      .989a  .978      .977               1.826
a. Predictors: (Constant), Cups_of_coffee

ANOVA a
Model 1     Sum of Squares  df  Mean Square  F         Sig.
Regression  7827.647        1   7827.647     2346.666  .000b
Residual    176.789         53  3.336
Total       8004.436        54
a. Dependent Variable: Level_of_alertness
b. Predictors: (Constant), Cups_of_coffee

This table gives you an F-test to determine whether the model is a good fit for the data. According to this p-value, it is.

Regression applied cont…

Coefficients a
Model 1         Unstandardized B  Std. Error  Standardized Beta  t       Sig.
(Constant)      4.666             .423                           11.024  .000
Cups_of_coffee  6.198             .128        .989               48.442  .000
a. Dependent Variable: Level_of_alertness

Finally, here are the beta coefficients, one for each predictor. (Use the "unstandardized coefficients", because the constant [beta zero] is included.) Based on this table, the equation for the regression line is:

  y = 4.666 + 6.198 × (cups of coffee)

Using this equation, given a value for "cups of coffee", you can come up with a prediction for the "level of alertness" variable.

So what's the problem?
• The regression line is a useful statement of the underlying trend, but it tells us nothing about the strength of the relationship.
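The fitted equation from the SPSS coefficients table can be used directly for prediction, as a sketch:

```python
# A minimal sketch using the fitted equation from the SPSS output,
# y = 4.666 + 6.198 x, to predict alertness from cups of coffee.
def predict_alertness(cups):
    """Predicted Level_of_alertness for a given number of cups of coffee."""
    return 4.666 + 6.198 * cups

print(round(predict_alertness(3), 2))  # → 23.26
print(predict_alertness(0))            # → 4.666 (the intercept)
```

Predictions are only trustworthy within the range of x values the model was fitted on; extrapolating far beyond it is one of the "dangers of making a mistake" noted above.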
• Correlation is a measure of the strength of the linear association between two variables

Decision Making Tree Number 3… CORRELATION

Correlations
• It is useful to gauge the strength of a relationship by looking at a scatterplot
• More formally… correlation!

Parametric correlations: the product-moment coefficient of correlation, or Pearson's correlation coefficient
• Calculated on the basis of how far the points lie from the 'best-fit' regression line
• Symbolised by the small letter r
• r will fall within the range −1 to +1
  – −1 means a perfect negative correlation (a perfect inverse relationship where, as the value of x rises, the value of y falls)
  – +1 means a perfect positive correlation (the values of x and y rise or fall together)
  – An r of 0 means zero correlation: there is no linear relationship between x and y

Calculating r

  r = s_xy / (s_x s_y)

where:
• x is the variable on the horizontal axis
• y is the variable on the vertical axis
• s_x and s_y are the standard deviations of x and y, respectively
• s_xy is the covariance between x and y

Strength of correlation
• 0.0 to 0.2 – Very weak to negligible correlation
• 0.2 to 0.4 – Weak, low correlation (not very significant)
• 0.4 to 0.7 – Moderate correlation
• 0.7 to 0.9 – Strong, high correlation
• 0.9 to 1.0 – Very strong correlation

Non-Parametric Correlations: Spearman's Rho
• Spearman's ρ is a statistic for measuring the relationship between two variables
• It is a non-parametric measure that avoids the assumption that the variables have a straight-line relationship
• Used when one or both measures are measured on an ordinal scale
• A value of 0 indicates no relationship, and values of +1 or −1 indicate a one-to-one relationship between the variables, or 'perfect correlation'
• The difference is that Spearman's rho is computed on the ranked values rather than the original measurements

Correlations Applied
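The formula r = s_xy / (s_x s_y) can be sketched directly; note that the (n − 1) denominators in the covariance and standard deviations cancel, so the raw sums suffice. The data pairs here are hypothetical:

```python
# A minimal sketch of Pearson's r as covariance over the product of the
# standard deviations, r = s_xy / (s_x * s_y), on hypothetical data.
def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (s_xx * s_yy) ** 0.5   # the (n-1) factors cancel

print(pearson_r([1, 2, 3], [2, 4, 6]))   # → 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))   # → -1.0 (perfect negative)
```

Spearman's rho is the same computation applied to the ranks of the data instead of the raw values, which is why it tolerates ordinal measurement and monotone but non-linear relationships.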
Correlations
                                         Level_of_alertness  Cups_of_coffee
Cups_of_coffee      Pearson Correlation  .989**              1
                    Sig. (2-tailed)      .000
                    N                    55                  55
Level_of_alertness  Pearson Correlation  1                   .989**
                    Sig. (2-tailed)                          .000
                    N                    55                  55
**. Correlation is significant at the 0.01 level (2-tailed).

Interpretation: a strong, positive correlation, significant (two-tailed).

Decision Making Tree. Up next… DIFFERENCES. But first… SOME KEY CONCEPTS

Key Concepts – Effect Size
• Effect size measures how strong the association is between the two sets of categories that define a table
• Effect size is an estimate of the proportion of total variance explained by differences among the treatment means, and is thus an indication of the strength of the effect
• The meaning of effect size is evident in the formula for eta-squared (η²), a widely used index of effect size:

  η² = SS_between / SS_total

• Although η² is a biased estimate of effect size, it is simple to calculate by hand and quite easy to understand

Degrees of freedom
• Degrees of freedom are commonly discussed in relation to chi-square and other hypothesis-testing statistics
• Each of a number of independently variable factors affecting the range of states in which a system may exist
• Degrees of freedom are used to determine whether a particular null hypothesis can be rejected, based on the number of variables and samples in the experiment. For example, while a sample size of 50 students might not be large enough to obtain significant information, obtaining the same results from a study of 500 samples may be judged valid.

df
• The concept of degrees of freedom is central to the principle of estimating statistics of populations from samples of them
• "Degrees of freedom" is commonly abbreviated to df
• Think of df as a mathematical restriction that needs to be put in place when estimating one statistic from an estimate of another

Parametric vs.
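As an illustration of η² as a proportion of explained variance, the sums of squares from the coffee/alertness one-way ANOVA that appears later in these slides can be plugged into the formula (the labelling of 7868.622 as the between-groups SS is taken from that ANOVA table):

```python
# A minimal sketch of eta-squared, the proportion of total variance explained:
# eta^2 = SS_between / SS_total, using the sums of squares from the
# coffee/alertness one-way ANOVA shown later in these slides.
ss_between = 7868.622
ss_total = 8004.436

eta_squared = ss_between / ss_total
print(round(eta_squared, 3))  # → 0.983, a very large effect
```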
non-Parametric Tests
• All statistical tests need to estimate probabilities. If distribution-free tests do not use the well-understood characteristics of the normal curve, how do they estimate probabilities?
• Most distribution-free tests use either the characteristics of ranked data, or randomisation procedures, to calculate probabilities.

Parametric Tests
1. The assumption of normality
• It is assumed that all the samples you are analysing have been drawn from populations that are normally distributed.
• You can get a rough idea of whether data are normally distributed by drawing a histogram of the data and examining the shape of the distribution.
• If the histogram has a bell shape, then the data are probably normally distributed.

2. The assumption of homogeneity of variance
• If your samples have variances that are highly different, then it is difficult to get accurate results from a t-test.
• This can be formally checked for, but it is quite complex. We can 'cheat' and say that if the two variances differ by a factor of less than 4, the variance is probably homogeneous. This is a rule of thumb, so it is not perfect, but it seems to work a lot of the time.

3. The assumption of independence
• The majority of t-tests (with the exception of the repeated measures t-test) assume that the samples the means were calculated from did not influence each other's scores in any way. For example, if you collect two datasets from the same group of people (as in a pre-test/post-test design), then these two datasets are not independent.

Assumption of Normality applied
Tests of Normality (Kolmogorov-Smirnov and Shapiro-Wilk) were run on every item of the Mhlathuze Water employee questionnaire: the demographics (department, gender, highest level of education, age, length of service, marital status), overall and departmental satisfaction, and the Likert items on the work environment, training and development, leadership, and remuneration.
[Tests of Normality table: Kolmogorov-Smirnov and Shapiro-Wilk statistics for each questionnaire item, all with df = 88. Every item has Sig. = .000 on both tests, so normality cannot be assumed for any item.]

Assumption of Homogeneity of Variance
(Checked with Levene's test; see the ANOVA example below.)

Non-Parametric Tests
• Non-parametric tests are used when the assumptions of parametric tests are not met:
  – level of measurement (e.g. interval or ratio data);
  – normal distribution; and
  – homogeneity of variances across groups
• They make fewer assumptions about the type of data on which they can be used
• Many of these tests use "ranked" data

Decision Making Tree. Next up… Z-TESTS AND T-TESTS

z- and t-tests
• The z-test is used to determine whether a sample mean differs from a population mean.
• t-tests are used to determine the difference between means in situations where we have to estimate the population standard deviation from sample data.
• The difference between the two tests is that with z-tests the population parameters (σ and µ) are known, whereas with t-tests they are unknown.
• The one-sample t-test uses a similar formula to the z-test, but the standard error is estimated from the sample standard deviation.
• The aim of the t-test is to compare distributions that are normally distributed. We can represent such distributions with a bell curve.
• The t-test formula always has the same general form: a difference between means divided by the standard error of that difference. For the one-sample test,

  t = (x̄ − µ) / (s / √n)

Assumptions of the z- and t-tests
1. The assumption of normality
2. The assumption of homogeneity of variance
3. The assumption of independence

Different types of t-tests
1. One-sample t-test
• The standard error is estimated from the sample standard deviation.

2. Independent samples t-test
• Used to compare two distributions that are independent of each other
• Suitable in most situations where you have created two separate groups by random assignment
• It is not necessary to have equal sample sizes for your samples
• It is quite important to ensure that the assumption of homogeneity of variance is not violated for this test

3. Repeated measures t-test
• Used to compare means when the samples are not independent; also known as the related samples t-test

T-tests Applied

One-Sample Statistics
                    N   Mean   Std. Deviation  Std. Error Mean
Cups_of_coffee      55  2.69   1.942           .262
Level_of_alertness  55  21.35  12.175          1.642

One-Sample Test (Test Value = 0)
                                                          95% CI of the Difference
                    t       df  Sig. (2-tailed)  Mean Diff.  Lower  Upper
Cups_of_coffee      10.274  54  .000             2.691       2.17   3.22
Level_of_alertness  13.002  54  .000             21.345      18.05  24.64

Decision Making Tree. Next up… ANOVA

ANOVA – Analysis of Variance
• ANOVA is used to test for differences between the means of more than two groups
• Allows us to test the difference between more than two groups of subjects, and the influence of more than one independent variable
• Because we are examining a set of possible differences, instead of testing for a difference between two means, we test for an effect
• A significant effect is present in the data when at least one of the possible comparisons between group means is significant

How does it work?
• As the name suggests, the procedure involves analysing variance.
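The one-sample t statistic above can be reproduced from the Cups_of_coffee summary figures in the SPSS output (mean 2.691, SD 1.942, n = 55, test value µ = 0):

```python
# A minimal sketch of the one-sample t statistic,
# t = (x_bar - mu) / (s / sqrt(n)), using the Cups_of_coffee summary
# figures from the SPSS output above.
import math

def one_sample_t(x_bar, s, n, mu=0.0):
    standard_error = s / math.sqrt(n)   # SE estimated from the sample SD
    return (x_bar - mu) / standard_error

t = one_sample_t(2.691, 1.942, 55)
print(round(t, 1))  # ≈ 10.3, matching SPSS's t = 10.274 up to rounding
```

The small discrepancy against SPSS's 10.274 comes from SPSS working with the unrounded mean and standard deviation.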
• Variance is a measure of the dispersion in a set of scores, and is calculated by determining the 'average distance' of a set of scores from its 'centre' or mean, by the formula:

  σ² = Σ(x − µ)² / N

ANOVA cont…
• In ANOVA terminology, the independent variables are called factors.
• Instead of talking about variance, in ANOVA terminology we talk about Mean Squares (abbreviated to MS).
• This is essentially what variance is: the mean or average of the sum of squared differences between each score in a set of scores and the mean of those scores.
• In ANOVA we need to distinguish between, and estimate, two different types of variance: random/error variance, and systematic variance.

Error variance and systematic variance
• Error variance is random or unexplained variance between the means of samples drawn from the same population; it is also commonly known simply as error variance.
• Systematic variance is the variance in a set of scores that we can explain in terms of the independent variable.
• The whole aim of computing an ANOVA is to determine whether there is systematic variance present. If there is systematic variance present in a dataset, we have a significant effect.

Detecting systematic variance
• To determine whether or not there is systematic variance present in a dataset, we have to follow a rather indirect path by comparing the variance within the groups to the variance between the groups.
• Compare the distribution of scores within the cells of Dataset 2 with the differences between the means, and you will note a very different pattern:
  – Little variance within the cells (look at the range of scores within each cell), but large differences between the cell means.
  – Here it appears as though there may be systematic differences between the groups since, although there is error variance present, it appears to be relatively small in comparison with the differences between the group means.
It is quite likely that a significant effect would be found for this pattern of data, since the difference in group means is large in comparison with the error variance between individual scores within the groups. This is a situation where the null hypothesis, H₀: µ1 = µ2 = µ3, is very likely to be false.
• If the variance between the group means (error variance + systematic variance) is much greater than the variance within the cells (error variance), then this must be due to the presence of systematic variance.
• In technical language, the variance within the cells is known as MS_Error: an estimate of error variance.
• The variance between the groups is known as MS_Group: an estimate of error variance plus systematic variance.
• To determine whether an effect is present in an ANOVA, we estimate mathematically the size of MS_Group and MS_Error, and then compare them. To the extent that MS_Group (error variance + systematic variance) is larger than MS_Error (error variance), it is likely that there is a significant effect.

Post-hoc tests
• Useful for determining precisely where the differences between the means lie
• Tukey's Honestly Significant Difference test (HSD):
  – The HSD statistic is a critical range applied to pairwise comparisons between groups. If any of the differences between the group means is greater than this critical range, we can conclude that there are significant differences between these groups.

Multiple Comparisons (Dependent Variable: Stress_Levels; Tukey HSD)
                                                                95% Confidence Interval
(I) Subject  (J) Subject  Mean Diff. (I−J)  Std. Error  Sig.    Lower Bound  Upper Bound
Statistics   Management   -10.50000*        1.48499     .000    -14.1819     -6.8181
             HR           -22.20000*        1.48499     .000    -25.8819     -18.5181
Management   Statistics   10.50000*         1.48499     .000    6.8181       14.1819
             HR           -11.70000*        1.48499     .000    -15.3819     -8.0181
HR           Statistics   22.20000*         1.48499     .000    18.5181      25.8819
             Management   11.70000*         1.48499     .000    8.0181       15.3819
*.
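The comparison of MS_Group against MS_Error described above is exactly the F ratio. It can be sketched with the mean squares from the coffee/alertness one-way ANOVA in these slides:

```python
# A minimal sketch of the F ratio as MS_Group / MS_Error, using the mean
# squares from the coffee/alertness one-way ANOVA in these slides
# (MS between groups 1124.089, MS within groups 2.890).
ms_group = 1124.089   # error variance + systematic variance
ms_error = 2.890      # error variance alone

f_ratio = ms_group / ms_error
print(round(f_ratio))  # ≈ 389, matching the SPSS F up to rounding
```

An F near 1 suggests no systematic variance; an F this large indicates that between-group differences dwarf the within-group error variance.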
The mean difference is significant at the 0.05 level. There is a significant difference in the means of student stress levels between the subjects Statistics and Management subjects (p < 0.0001), Statistics and HR (p < 0.0001). There were also significant differences in stress levels between Management and HR (p < 0.0001). Factorial analysis of variance • Used for research designs that have more than one independent variable. Why use factorial designs? • Factorial designs are preferable to one-way designs for three related reasons: 1. They are realistic, capturing the complexity of social and psychological phenomena. 2. They allow us to analyse interactions between variables. 3. They are economical, allowing many hypotheses to be tested simultaneously. Assumptions Normality • The populations represented by the data should be normally distributed, making the mean an appropriate measure of central tendency. • Estimate the distribution of the parent populations from the data at hand. When we have small cell numbers, therefore, we should tolerate deviations from normality, appreciating that our estimates are unreliable. • In addition, ANOVA is a robust statistical procedure: the assumption of normality can be violated with relatively minor effects. Nevertheless, ANOVA is inappropriate in situations where you have unequal cell sizes and distributions skewed in different directions. 2. Homogeneity of variance • The populations from which the data are sampled should have the same variance. With balanced designs (i.e. equal numbers of subjects per cell) this assumption can be violated without major effects on the final results. ANOVA Applied… Descriptives Level_of_alertness N 0 1 2 3 4 5 6 7 Total 5 14 12 6 6 5 6 1 55 Mean 4.80 11.36 17.17 21.67 28.00 36.20 43.00 48.00 21.35 Descriptives 95% Confidence Interval for Mean Std. Deviation Std. 
Error Lower Bound Upper Bound Minimum Maximum .837 .374 3.76 5.84 4 6 1.216 .325 10.66 12.06 10 14 1.267 .366 16.36 17.97 15 19 1.366 .558 20.23 23.10 20 24 2.098 .856 25.80 30.20 24 30 3.899 1.744 31.36 41.04 32 40 .894 .365 42.06 43.94 42 44 . . . . 48 48 12.175 1.642 18.05 24.64 4 48 Test of Homogeneity of Variances Level_of_alertness Levene Statistic df1 df2 Sig. 6.875a 6 47 .000 a. Groups with only one case are ignored in computing the test of homogeneity of variance for Level_of_alertness. Levene’s is significant – homogeneity cannot be assumed ANOVA Applied 2… ANOVA Level_of_alertness Sum of Squares Between Groups Within Groups Total df Mean Square 7868.622 7 1124.089 135.814 47 2.890 8004.436 54 F 389.003 Sig. .000 Significant result (p < 0.0001) Decision Making Tree Repeated Measures ANOVA • Equivalent to the one-way ANOVA, except: – For related, not independent groups • • • • • • Extension of the repeated t-test Test to detect any overall differences between related means Requires one IV and one DV DV = continuous (interval or ratio) IV = categorical (either nominal or ordinal) Because data is tested more than once – the assumption of independence is not relevant • Makes the assumption of Sphericity – relationship between pairs of experimental conditions is similar i.e. the level of dependence between pairs of groups is equal – SPSS tests for this through Mauchly’s test for Sphericity – If the assumption of Sphericity is not met i.e. violated: use a correction factor Epsilon(𝜀) • 𝜀 > 0.75, then use Huynh-Feldt • 𝜀 < 0.75, then use the Greenhouse-Geisser Repeated measures ANOVA a Mauchly's Test of Sphericity Measure: MEASURE_1 b Epsilon Within Subjects Mauchly's Approx. ChiGreenhouseHuynhLowerEffect W Square df Sig. Geisser Feldt bound TIme .434 3.343 2 .188 .638 .760 .500 Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. a. 
Design: Intercept Within Subjects Design: TIme b. May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table. Mauchly’s test of Sphericity has been met (χ² = 3.343, df = 2, p = 0.188) Tests of Within-Subjects Effects Measure: MEASURE_1 Type III Sum of Squares Source TIme Sphericity Assumed GreenhouseGeisser Huynh-Feldt Lower-bound Error(TIme) Sphericity Assumed GreenhouseGeisser Huynh-Feldt Lower-bound Mean Square df F Sig. Partial Eta Squared Noncent. Observed a Parameter Power 143.444 2 71.722 12.534 .002 .715 25.068 .975 143.444 1.277 112.350 12.534 .009 .715 16.003 .886 143.444 143.444 1.520 1.000 94.351 12.534 143.444 12.534 .005 .017 .715 .715 19.056 12.534 .930 .806 57.222 10 5.722 57.222 6.384 8.964 57.222 7.602 7.528 57.222 5.000 11.444 a. Computed using alpha = .05 There is a significant difference in safety behaviours as a result of having undergone safety training (F(2, 10) = 12.534, p = 0.002). Pairwise Comparisons Measure: MEASURE_1 Mean Difference (I) TIme (J) TIme (I-J) Std. Error 1 2 -2.500 1.522 * 3 -6.833 1.701 2 1 2.500 1.522 * 3 -4.333 .715 * 3 1 6.833 1.701 * 2 4.333 .715 Based on estimated marginal means *. The mean difference is significant at the .05 level. b. Adjustment for multiple comparisons: Bonferroni. Sig. b .484 .030 .484 .005 .030 .005 95% Confidence Interval for b Difference Lower Bound Upper Bound -7.879 2.879 -12.846 -.821 -2.879 7.879 -6.860 -1.807 .821 12.846 1.807 6.860 The differences lie between the pre-training and 6 months (p = 0.030), and between 3 and 6 months (p = 0.005). 
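The comparison of between-groups and within-groups variance described earlier can be sketched in a few lines of Python. The three groups of scores below are hypothetical, not the stress or alertness data from the slides:

```python
# One-way ANOVA "by hand": compare the variance between group means
# (MSGroup, error + systematic variance) with the variance within
# groups (MSError, error variance only).
groups = [
    [10, 12, 11, 13, 14],   # hypothetical group 1
    [22, 24, 23, 21, 25],   # hypothetical group 2
    [33, 35, 34, 32, 36],   # hypothetical group 3
]

def mean(xs):
    return sum(xs) / len(xs)

all_scores = [x for g in groups for x in g]
grand_mean = mean(all_scores)
k = len(groups)        # number of groups
N = len(all_scores)    # total number of scores

# Between-groups sum of squares, df = k - 1
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ms_group = ss_between / (k - 1)

# Within-groups sum of squares, df = N - k
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ms_error = ss_within / (N - k)

F = ms_group / ms_error
print(F)   # 242.0 for these data: MSGroup dwarfs MSError
```

A large F, as here, means MSGroup is far bigger than MSError, which is exactly the "little variance within the cells, large differences between the cell means" pattern that signals systematic variance.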
Decision Making Tree

Next Up… NON-PARAMETRIC TESTS

Mann-Whitney U-test
• Perhaps the most common distribution-free test for differences between unrelated samples
• Used for research designs similar to those for which the independent-samples t-test is used
• This means that it can be used whenever you have two groups of scores that are independent of each other

Test Statistics(a): Time_taken
Mann-Whitney U                    6.500
Wilcoxon W                        21.500
Z                                 -1.261
Asymp. Sig. (2-tailed)            .207
Exact Sig. [2*(1-tailed Sig.)]    .222(b)
a. Grouping Variable: Gender
b. Not corrected for ties.

You can see that Sig. is > 0.05 for both the Asymp. and Exact Sig., which means that there is no significant difference in time taken based on respondents’ gender. You report your findings as: U = 6.500, p = 0.207.

Kruskal-Wallis test
• Tests the difference between three or more groups in much the same way as ANOVA does in parametric statistical procedures
• Extension of the Mann-Whitney U-test to three or more independent samples
• Omnibus test for the equality of independent population medians

Kruskal-Wallis Applied…
Ranks: Level_of_alertness

Cups_of_coffee   N    Mean Rank
1                19   10.00
2                12   25.50
3                6    34.58
4                6    40.42
5                5    46.00
6                6    51.50
7                1    55.00
Total            55

Test Statistics(a,b): Level_of_alertness
Chi-Square    51.055
df            6
Asymp. Sig.   .000
a. Kruskal Wallis Test
b. Grouping Variable: Cups_of_coffee

Significant result (p < 0.001)

Related samples: The sign test
• Related samples occur when the same group of people is measured more than once, such as in ‘before and after’ research designs.
• Non-parametric equivalent of the related-samples t-test
• Considers the direction of the difference between two related samples

Frequencies
                                                        N
Time_taken2 - Time_taken1   Negative Differences(a)     8
                            Positive Differences(b)     0
                            Ties(c)                     2
                            Total                       10
a. Time_taken2 < Time_taken1
b. Time_taken2 > Time_taken1
c. Time_taken2 = Time_taken1

You can see how many participants' times decreased (the "Negative Differences" row), increased (the "Positive Differences" row) or showed no change (the "Ties" row) between the two trials.

Test Statistics(a): Time_taken2 - Time_taken1
Exact Sig. (2-tailed)   .008(b)
a. Sign Test
b. Binomial distribution used.

The statistical significance of the sign test is found in the "Exact Sig. (2-tailed)" row of the table above. However, if you had more than 25 positive and negative differences in total, an "Asymp. Sig. (2-sided test)" row would be displayed instead. You report your findings as: An exact sign test was used to compare the differences in the speed with which the two trials were completed. The respondents showed a statistically significant median decrease in time between the two tests (p = 0.008).

Related samples: The Wilcoxon matched-pairs test
• Similar to the sign test, except that once we have obtained the difference scores between the two samples, we rank-order the differences, ignoring the sign of the difference
• Tests whether two related samples have the same median
• More powerful than the sign test

Ranks
                                             N       Mean Rank   Sum of Ranks
Time_taken2 - Time_taken1   Negative Ranks   8(a)    4.50        36.00
                            Positive Ranks   0(b)    .00         .00
                            Ties             2(c)
                            Total            10
a. Time_taken2 < Time_taken1
b. Time_taken2 > Time_taken1
c. Time_taken2 = Time_taken1

The Ranks table compares participants' pre- and post-intervention completion times. We can see from the table's legend that 8 participants had a higher pre-intervention time than post-intervention time, 0 participants had a higher post-intervention time, and 2 participants saw no change in their time taken.

Test Statistics(a): Time_taken2 - Time_taken1
Z                        -2.536(b)
Asymp. Sig. (2-tailed)   .011
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
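The exact sign test above is simple enough to compute by hand. The sketch below uses made-up before/after times constructed to match the pattern in the Frequencies table (8 negative differences, 0 positive, 2 ties); the function name and data are illustrative only:

```python
from math import comb

# Exact sign test: drop ties, count the positive and negative
# differences, and look up the smaller count in a Binomial(n, 0.5)
# distribution (two-tailed).
def sign_test(before, after):
    diffs = [b - a for b, a in zip(before, after) if b != a]  # drop ties
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(p, 1.0)

# Hypothetical times: 8 participants got faster, 0 slower, 2 unchanged.
before = [30, 28, 25, 31, 27, 29, 33, 26, 24, 22]
after  = [25, 24, 22, 27, 23, 26, 28, 21, 24, 22]
print(round(sign_test(before, after), 3))  # 0.008
```

With 8 differences all in one direction, the exact two-tailed p-value is 2 × 0.5⁸ ≈ 0.008, matching the SPSS output above.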
We report it as follows: The Wilcoxon signed-rank test revealed that the intervention yielded a statistically significant difference in the time taken to complete the test (Z = -2.536, p = 0.011).

Three or more groups of scores: the Friedman rank test for related samples
• The Friedman test (developed by the economist Milton Friedman) is an extension of the Wilcoxon test to three or more related samples
• Analogue of the one-way repeated-measures analysis of variance
• It is used for the analysis of within-subjects designs where more than two conditions are being compared
• In general, the degrees of freedom for an estimate equal the number of values minus the number of parameters estimated en route to the estimate in question

Descriptive Statistics

              N    Mean      Std. Deviation   Minimum   Maximum   25th      50th (Median)   75th
Time_taken1   10   22.5000   5.46199          15.00     30.00     18.2500   21.0000         27.7500
Time_taken2   10   20.2000   4.10420          15.00     28.00     16.5000   20.5000         22.5000
Time_taken3   10   18.9000   4.74810          12.00     29.00     15.0000   18.0000         22.0000

Ranks
              Mean Rank
Time_taken1   2.90
Time_taken2   1.85
Time_taken3   1.25

The Friedman test compares the mean ranks between the related groups and indicates how the groups differed; the Ranks table is included for this reason. However, you are not very likely to report these values in your results section; you will most likely report the median value for each related group instead.

Test Statistics(a)
N             10
Chi-Square    15.943
df            2
Asymp. Sig.   .000
a. Friedman Test

If you look at the contents of this table, you can see that Sig. is significant (< 0.05); therefore, there is a significant difference in the amount of time taken to complete the three tests (χ²(2) = 15.943, p < 0.001).

Some considerations
• Extraneous variables – These are external and uncontrolled variables that impact on a relationship.
These interfere in multiple ways:
– Where the change in the DV is attributable to a variable other than the IV (the third-variable problem)
– Results are "confounded" by interactions between the DVs
– Additional variables outside of the "experiment" enter the experimental condition and impact on the results
• Confounding or "third" variables
– This is where a third variable could account for the observed relationship between two variables

Moderator variables
Moderator variables affect the strength (and sometimes the direction) of the relationship between two variables.

Mediator variables
A mediator variable transmits the effect of the independent variable to the dependent variable: the IV influences the mediator, which in turn influences the DV.
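To make the moderator idea concrete, here is a minimal sketch with made-up data: the same predictor relates strongly to the outcome in one group of a (hypothetical) moderator but only weakly in the other. The hand-rolled `pearson_r` helper and all of the numbers are illustrative assumptions, not data from the slides:

```python
# A moderator changes the strength of the X-Y relationship.
# Hypothetical example: hours of training (X) -> performance (Y),
# moderated by prior experience (high vs low).
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5, 6]

# High-experience group: training strongly predicts performance.
y_high = [2, 4, 6, 8, 10, 12]
# Low-experience group: training barely predicts performance.
y_low = [5, 4, 6, 5, 4, 6]

print(pearson_r(x, y_high))  # close to 1.0: strong relationship
print(pearson_r(x, y_low))   # much smaller: weak relationship
```

The different correlations across moderator levels are what an interaction term in a factorial design or moderated regression would pick up; a mediator, by contrast, would sit on the causal path between X and Y rather than changing the strength of their relationship.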