Analysis
Topic 9 / Analysis
Grand Alexandra

Analysis
1. Data Preparation: organizing the data
2. Descriptive Statistics: describing the data
3. Inferential Statistics: testing hypotheses and models

Conclusion Validity
Conclusion validity: Is there a relationship between two variables (between cause and effect)?
• conclusion: there is a relationship / there is no relationship
• Is the conclusion about the relationship reasonable?
Internal validity: Assuming that there is a relationship in this study, is the relationship a causal one?
• Is a "third variable" involved?

Threats to Conclusion Validity
Incorrect conclusion about a relationship in the observation:
1. Concluding that there is no relationship when in fact there is one ("missing the needle in the haystack")
   • signal-to-noise ratio problem
   • "signal": the relationship you are trying to see
   • "noise": factors that make it hard to see the relationship
2. Concluding that there is a relationship when in fact there is none ("seeing things that aren't there")

Threats to Conclusion Validity
"Finding no relationship when there is one" (conclusion: no relationship; reality: relationship)
Threats:
• low reliability of measures
• low reliability of treatment implementation
• random irrelevancies in the setting
• random heterogeneity of respondents
  These "noise"-producing factors add variability -> low statistical power
• violation of assumptions of statistical tests
"Finding a relationship when there is not one" (conclusion: relationship; reality: no relationship)
Threats:
• fishing and the error rate problem
• violation of assumptions of statistical tests

Improving Conclusion Validity
• good statistical power (should be > 0.8)
  power = "the odds of saying that there is a relationship when in fact there is one"
  Factors that affect power:
  - sample size: use a larger sample
  - effect size: increase the effect size (e.g. increase the dosage of the program), i.e. increase the signal and decrease the noise
  - α-level: raise the α-level
• good reliability -> reduces "noise"
• good implementation

Statistical Inference Decision Matrix
• two mutually exclusive hypotheses (H0, HA)
• decision: which hypothesis to accept and which to reject

CONCLUSION    REALITY: H0 is true                      REALITY: HA is true
accept H0     right decision                           wrong decision
              1-α (e.g. 0.95), confidence level        β (e.g. 0.20), β-error (Type II Error)
accept HA     wrong decision                           right decision
              α (e.g. 0.05), α-error (Type I Error),   1-β (e.g. 0.80), power
              significance level

Statistical Inference Decision
[Figure: two normal density curves, one for "H0 right" and one for "HA right", with the areas 1-α, α, β and 1-β (power) marked; y-axis label: dnorm(x, mean = 100, sd = 10)]
What we want: high power and a low Type I Error
Problem: the higher the power, the higher the Type I Error

Practical
A child who is "in reality" highly gifted is diagnosed as not highly gifted. Which error is this?
• α-error (Type I Error)
• β-error (Type II Error)
The result of a study: WU students with a HAK degree achieve a higher score on the multiple-choice exam in accounting. In reality, however, there is no difference between HAK and non-HAK graduates in the score achieved. Which error is this?
• α-error (Type I Error)
• β-error (Type II Error)

Practical
Tick the correct answer and correct the wrong answers.
Raising the α-error from 0.01 to 0.05 …
• the power (Teststärke): rises / falls
• the chances of making a Type I Error: rise / fall
• the chances of making a β-error: fall
• the test is: more restrictive / less restrictive

Analysis: Example Data Set "Arbeitszufriedenheit" (Job Satisfaction) - AZ
Data set: AZ.sav
Note: The data were picked arbitrarily from a data set* for illustration purposes. Any results should therefore not be taken too seriously.
Sample size: n = 15
Variables:
• dichotomous: SEX, items for the constructs job satisfaction** (AZ_...), working climate** (BK_...), workload** (AB_...)
• ordinal: POSITION (position in the company)
• metric: MITARB (number of employees), NETTO (monthly net income in €)
• new variable: AZ "Arbeitszufriedenheit" (assumption: interval-scaled!), the sum score of the individual variables AZ_...
* Böhnisch, B., Grand, A., Rechberger, R., Wimmer, W. (2006). Berufliche Zufriedenheit. Seminararbeit aus Empirische Forschungsmethoden.
** Items taken from: Giegler, H. (1985). Rasch-Skalen zur Messung von „Arbeits- und Berufszufriedenheit“, „Betriebsklima“ und „Arbeits- und Berufsbelastung“ auf Seiten der Betroffenen.

1. Data Preparation
1. Logging the data
2. (Checking the data for accuracy)
3. Developing a database structure: codebook (Kodierungsschema)
4. Entering the data into the computer (single entry or double entry); checking the data for accuracy
5. Data transformation
   • missing values
   • item reversals (example: reverse-coded items such as BK_2: old values 1 "agree", 2 "disagree" -> new values 2 "agree", 1 "disagree")
   • recoding variables (example: items AZ_..., AB_..., BK_...: old values 1 "agree", 2 "disagree" -> new values 1 "agree", 0 "disagree")
   • scale totals (example: generate the new variable AZ (Arbeitszufriedenheit); to get a total score for AZ, add across the individual items AZ_...)
   • categories

1. Data Preparation - Codebook
[Figure: questionnaire annotated with variable names (ID, SEX, MITARB, NETTO, POSITION, AZ_1, BK_2, AB_3, AZ_4, BK_5) and value codes]
The codebook should include:
• variable name
• variable description
• variable format
• instrument/method of collection
• date collected
• respondent or group
• variable location in database

1. Data Preparation - Checking Data for Accuracy
Summarize (e.g. with a frequency table) and check the data:
• Are the listed values reasonable? ("wild codes", outliers/Ausreißer)
• Are there missing values?
Outlier (Ausreißer) or "wild code":
• it actually is an outlier, or
• it is an error in data entry
Missing values:
• no data exist, or
• the data weren't entered

2. Descriptive Statistics
Descriptive statistics
• "quantitative description in a manageable form"
• describe basic features of the data, provide simple summaries
• simple graphical analysis
Univariate analysis: analysis of one variable at a time
Description of a single variable:
• distribution
• central tendency (Lagemaß)
• dispersion (Streuungsmaß)
Bivariate analysis: analysis of two variables at a time
Multivariate analysis: analysis of multiple variables at a time

2. Descriptive Statistics - Distribution
Frequency distribution, shown as a table or as a graph (each with absolute or relative frequencies)
Graph:
• pie chart
• bar chart
• boxplot
• histogram
• (stem-and-leaf diagram)
• …
Table:
• crosstab
• frequency table, e.g. for sex (Geschlecht):
  Sex      Absolute frequency   Relative frequency
  male     8                    53%
  female   7                    47%
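The trade-off noted on the Statistical Inference Decision slide (the higher the power, the higher the Type I Error) can be sketched numerically. The following is a minimal illustration, not the deck's own calculation. It assumes a one-sided test on a single normally distributed value. The H0 distribution (mean 100, sd 10) is taken from the figure label, while the HA mean of 120 and the function name `power_one_sided` are assumptions made for the example.

```python
from statistics import NormalDist


def power_one_sided(alpha, mu0=100.0, mu1=120.0, sd=10.0):
    """Power of a one-sided test that rejects H0 for large observed values.

    mu0 and sd follow the figure label; mu1 = 120 is an assumed HA mean.
    """
    h0 = NormalDist(mu0, sd)
    ha = NormalDist(mu1, sd)
    critical = h0.inv_cdf(1.0 - alpha)  # reject H0 above this value
    return 1.0 - ha.cdf(critical)       # P(reject H0 | HA true) = 1 - beta


for alpha in (0.01, 0.05, 0.10):
    power = power_one_sided(alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

Under these assumptions, raising α moves the critical value towards the H0 mean, so β shrinks and the power grows. That is the pattern the Practical exercise on raising α from 0.01 to 0.05 asks about.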
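The data transformation step in 1. Data Preparation (item reversal, recoding, scale totals) can be sketched as follows. The deck performs these steps in SPSS, so this is only an illustration of the logic. The item names come from the codebook slide. The respondent's answers, the order of the two recoding steps and all function names are assumptions for the example.

```python
# Hypothetical answers of one respondent (1 = "agree", 2 = "disagree").
respondent = {"AZ_1": 1, "BK_2": 2, "AB_3": 1, "AZ_4": 2, "BK_5": 1}

# BK_2 is the reverse-coded item named on the slide.
REVERSED_ITEMS = {"BK_2"}


def reverse_item(value):
    """Item reversal: 1 -> 2 and 2 -> 1."""
    return 3 - value


def recode_to_binary(value):
    """Recode: 1 ("agree") -> 1 and 2 ("disagree") -> 0."""
    return 2 - value


def transform(answers):
    """Apply the reversal first (assumed order), then the 1/0 recoding."""
    coded = {}
    for item, value in answers.items():
        if item in REVERSED_ITEMS:
            value = reverse_item(value)
        coded[item] = recode_to_binary(value)
    return coded


def scale_total(coded, prefix):
    """Sum score across all items whose name starts with the prefix."""
    return sum(v for item, v in coded.items() if item.startswith(prefix))


coded = transform(respondent)
az = scale_total(coded, "AZ_")  # total score for Arbeitszufriedenheit
print(coded, az)
```

With the 1/0 coding, the sum score AZ is simply the number of AZ_ items the respondent agreed with.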
2. Descriptive Statistics - Distribution (graphs)
[Figures: pie chart of sex (male 53%, female 47%); histogram of monthly net income; bar chart of position (lower, middle, upper position); boxplot of number of employees]

2. Descriptive Statistics - Central Tendency
Measures of central tendency (Lagemaße)
Mean (Mittelwert) x̄
• computation: "sum of the values x_i / number n of values"
• data: metric
• adequacy: if the distribution is approximately normal; not robust against single extreme values ("outliers")
Median x̃
• computation: "center of the sample"
• data: ordinal, metric
• adequacy: robust against outliers
Mode (Modus)
• computation: "most frequently occurring value"
• data: nominal, ordinal, metric
• adequacy: robust against outliers

2. Descriptive Statistics - Central Tendency / Practical
Compute the mean, median and mode of the variables SEX, MITARB (number of employees) and POSITION. Make sure each measure is applied only where it is meaningful!
Hint: sort the variables MITARB and POSITION in ascending order.

2. Descriptive Statistics - Central Tendency / Practical_Solution
Variable               Mean   Median   Mode
Mitarbeiter (MITARB)   48.5   18       7
Position               -      2        1
Geschlecht (SEX)       -      -        1

2. Descriptive Statistics - Dispersion
Measures of dispersion (Streuungsmaße)
Variance s²
• computation: "average of the sum of the squared deviations"
• data: metric
Standard deviation s
• computation: "square root of the variance"
• data: metric
Range (Spannweite)
• computation: "highest value minus lowest value"
• data: ordinal, metric

2. Descriptive Statistics - Dispersion
Measures of dispersion (Streuungsmaße)
Interquartile range (IQR)
• computation: "difference between third and first quartile", IQR = Q3 - Q1
• data: metric
• adequacy: robust against outliers
Quartiles:
• 3rd quartile (Q3): 75% of the cases fall below this value
• 1st quartile (Q1): 25% of the cases fall below this value
• median (Q2): 50% of the cases fall above and 50% below this value
[Figure: boxplot marking min, Q1, Q2 = median, Q3 and max; each of the four segments holds 25% of the cases, and the IQR spans Q1 to Q3]

2. Descriptive Statistics - Dispersion / Practical_Solution
Computing the variance, standard deviation and range of the variable NETTO (net income): n = 15, mean = 1553.3, min = 200, max = 2800
Steps (variance):
1. compute the distance between each value and the mean
2. square each deviation
3. sum the squares to get the Sum of Squares (SS)
4. divide the SS by n - 1
Variable               Variance     Standard deviation   Range (Spannweite)   Min   Max
Nettoverdienst (NETTO) 471595.238   686.728              2600                 200   2800

Correlation
"A correlation is a single number that describes the degree of relationship between two variables"
• the correlation coefficient r lies in the range -1 ≤ r ≤ 1
• the higher the absolute value of r, the stronger the relationship between the variables
• uncorrelated: r = 0
• positive correlation (r > 0): the higher the x-values, the higher the y-values on average
• negative correlation (r < 0): the higher the x-values, the lower the y-values on average, and vice versa
• exact linear correlation: r = 1 (positive), r = -1 (negative)

Correlation - Example
Example: Is there a relationship between the variable "Nettoverdienst" (net income) and the variable "Arbeitszufriedenheit" (job satisfaction)? If yes, …
1. Which type of relationship?
2. How strong is the relationship?
3. Is the correlation significant?
Descriptive statistics for "Nettoverdienst" and "Arbeitszufriedenheit":
Variable               Mean      StDev     Variance     Sum     Min   Max    Range
Nettoverdienst         1553.33   686.728   471595.238   23300   200   2800   2600
Arbeitszufriedenheit   5.20      2.178     4.743        78      1     9      8

Example - Descriptive Statistics
[Figures: boxplot of Arbeitszufriedenheit (AZ); boxplot of monthly net income in €]

Example - 1. Which type of relationship?

Example - 2. How strong is the relationship?
Product-moment correlation (Pearson)
• the variables (x, y) are metric and normally distributed
Calculating the correlation
[SPSS output: correlation AZ/NETTO]

Example - Q-Q Plot
[Figures: Q-Q plot of AZ (Arbeitszufriedenheit); Q-Q plot of monthly net income in €]

Example - 3. Is the correlation significant?
Testing the significance of a correlation
Null hypothesis: r = 0
Alternative hypothesis: r ≠ 0
Steps:
1. determine the significance level: α = 0.05
2. compute the degrees of freedom: df = N - 2 = 15 - 2 = 13
3. one-tailed or two-tailed test? -> two-tailed test
4. look up the critical value

Example - 3. Is the correlation significant?
[Table excerpt: t-distributions for product-moment correlations]
[SPSS output: correlation AZ/NETTO]
The correlation is significant: r (0.692) > r_crit (0.514)

Correlation Matrix
• symmetric matrix
• relationships between all possible pairs of variables
• e.g. between C1, …, C10: N*(N-1)/2 = 10*9/2 = 45 unique correlations

Other correlations
• Pearson product-moment correlation (bivariate normal distribution, variables on an interval scale)
• Spearman rank-order correlation (rho) (two ordinal variables)
• Kendall rank-order correlation (tau) (two ordinal variables)
• point-biserial correlation (one variable is on a continuous interval level and the other is dichotomous)

Literature
Basic literature:
Trochim, W. & Donnelly, J.: The Research Methods Knowledge Base (3rd edition). Atomic Dog. Internet WWW page, URL: http://www.socialresearchmethods.net/kb/ (version current as of October 20, 2006).
Bortz, J., Döring, N. (2006). Forschungsmethoden und Evaluation. Heidelberg: Springer Verlag.
Hatzinger, R. (2006). Angewandte Statistik mit SPSS. Wien: Facultas.
Hatzinger, R., Nagel, H. (2009). PASW Statistics. Statistische Methoden und Fallbeispiele. München: Pearson Studium.
Nagel, H. (2003). Empirische Sozialforschung.
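The four variance steps from the Dispersion / Practical_Solution slide can be traced in a short sketch. The deck does not list the 15 raw NETTO values, so the five incomes below are hypothetical placeholders. Only the minimum (200) and maximum (2800) are taken from the slide, and the result will therefore not reproduce the slide's variance of 471595.238.

```python
import math

# Hypothetical net incomes in €; only min = 200 and max = 2800 match the slide.
netto = [200, 1200, 1500, 2000, 2800]

n = len(netto)
mean = sum(netto) / n

# Step 1: distance between each value and the mean
deviations = [x - mean for x in netto]
# Step 2: square each deviation
squared = [d * d for d in deviations]
# Step 3: sum the squares to get the Sum of Squares (SS)
ss = sum(squared)
# Step 4: divide the SS by n - 1
variance = ss / (n - 1)

std_dev = math.sqrt(variance)          # square root of the variance
value_range = max(netto) - min(netto)  # highest value minus lowest value

print(f"mean = {mean:.1f}, SS = {ss:.0f}, variance = {variance:.1f}")
print(f"standard deviation = {std_dev:.3f}, range = {value_range}")
```

Dividing by n - 1 rather than n gives the sample variance, which is also what `statistics.variance` in the Python standard library computes.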
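The significance check for the correlation AZ/NETTO (r = 0.692, N = 15, two-tailed, α = 0.05) can be retraced with the standard relation between r and the t statistic, t = r * sqrt(df / (1 - r²)). This is a sketch of that relation, not the SPSS procedure used in the deck. The critical t value 2.160 for df = 13 is an assumption taken from a standard t table, and the function names are made up for the example.

```python
import math

r = 0.692       # correlation AZ/NETTO reported on the slide
n = 15          # sample size
df = n - 2      # degrees of freedom, 13
T_CRIT = 2.160  # assumed table value: two-tailed, alpha = 0.05, df = 13


def t_from_r(r, df):
    """t statistic that corresponds to a correlation r with df degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))


def r_from_t(t, df):
    """Correlation that corresponds to a given t value (inverse of t_from_r)."""
    return t / math.sqrt(t * t + df)


t_stat = t_from_r(r, df)
r_crit = r_from_t(T_CRIT, df)
significant = abs(r) > r_crit  # equivalent to abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, r_crit = {r_crit:.3f}, significant = {significant}")
```

Converting the critical t into a critical r recovers the threshold of about 0.514 quoted on the slide, so comparing r with r_crit and comparing t with the critical t lead to the same decision.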