Topic 9 / Analysis
Grand Alexandra
Analysis
1. Data Preparation: organizing the data
2. Descriptive Statistics: describing the data
3. Inferential Statistics: testing hypotheses and models
Conclusion Validity
Conclusion Validity: Is there a relationship between two variables (between cause and effect)?
• conclusion: there is a relationship / there is no relationship
• Is the conclusion about the relationship reasonable?
Internal Validity: Assuming that there is a relationship in this study, is the relationship a causal one?
• "third variable"?
Threats to conclusion validity
Incorrect conclusion about a relationship in the observation
1. conclude that there is no relationship when in fact there is
"missing the needle in the haystack"
signal-to-noise ratio problem:
"signal": the relationship you are trying to see
"noise": factors that make it hard to see the relationship
2. conclude that there is a relationship when in fact there is not
"seeing things that aren't there"
Threats to conclusion validity
"Finding no relationship when there is one" (conclusion: no relationship; reality: relationship)
threats:
• low reliability of measures
• low reliability of treatment implementation
• random irrelevancies in the setting
• random heterogeneity of respondents
-> these "noise"-producing factors add variability and lead to low statistical power
• violation of assumptions of statistical tests
"Finding a relationship when there is not one" (conclusion: relationship; reality: no relationship)
threats:
• fishing and the error rate problem
• violation of assumptions of statistical tests
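The "fishing and the error rate problem" can be illustrated with a small calculation. The sketch below assumes that the tests are independent and that each one uses the same α; the function name is made up for this illustration and does not come from the slides.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across independent tests.

    Assumes all tests are independent and use the same alpha level.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# The more relationships one "fishes" for, the more likely a false positive.
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With α = 0.05, running many tests on the same data quickly makes at least one spurious "significant" result more likely than not.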
Improving Conclusion Validity
• good statistical power (should be > 0.8)
power = "the odds of saying that there is a relationship, when in fact there is one"
Factors that affect power:
sample size: use a larger sample size
effect size: increase the effect size (e.g. increase the dosage of the program), i.e. increase the signal and decrease the noise
α-level: raise the α-level (at the cost of a higher Type I error rate)
• good reliability -> reduce "noise"
• good implementation
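How these factors move the power can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a one-sided, one-sample z-test with known standard deviation, which the slides do not prescribe, and the function and parameter names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided one-sample z-test with known standard deviation.

    effect: true shift of the mean (the "signal")
    sd:     standard deviation of the measurements (the "noise")
    n:      sample size
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold under H0
    shift = effect * sqrt(n) / sd             # standardized true effect
    return 1.0 - std_normal.cdf(z_crit - shift)


# Larger samples, larger effects, less noise and a higher alpha all raise power.
print(round(z_test_power(effect=5, sd=10, n=10), 2))
print(round(z_test_power(effect=5, sd=10, n=25), 2))
print(round(z_test_power(effect=5, sd=10, n=25, alpha=0.10), 2))
```

In this setup, a sample of 25 with an effect of half a standard deviation lands close to the recommended power of 0.8.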
Statistical Inference Decision Matrix
• two mutually exclusive hypotheses (H0, HA)
• decision: which hypothesis to accept and which to reject
CONCLUSION / REALITY | H0 is true | HA is true
accept H0 | decision right: 1-α (e.g. 0.95), confidence level | decision wrong: β (e.g. 0.20), β-error (Type II Error)
accept HA | decision wrong: α (e.g. 0.05), α-error (Type I Error), significance level | decision right: 1-β (e.g. 0.80), Power
Statistical Inference Decision
[Figure: two normal density curves over x, labelled "H0 right" and "HA right", with the areas 1-α, α, β and 1-β (POWER) marked; y-axis label: dnorm(x, mean = 100, sd = 10)]
what we want: high power and low Type I Error
problem: for a given sample size and effect size, the higher the power, the higher the Type I Error
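The trade-off between power and Type I error can be reproduced with two normal curves. In this sketch the H0 distribution follows the plot label (mean 100, sd 10); the HA distribution (mean 120, sd 10) and the one-sided decision rule are assumptions made only for illustration.

```python
from statistics import NormalDist

h0 = NormalDist(mu=100, sigma=10)  # taken from the plot label dnorm(x, mean = 100, sd = 10)
ha = NormalDist(mu=120, sigma=10)  # assumed alternative; not stated in the slide


def power_at(alpha: float) -> float:
    """Power when H0 is rejected for x above the (1 - alpha) quantile of H0."""
    critical_x = h0.inv_cdf(1.0 - alpha)
    return 1.0 - ha.cdf(critical_x)


for alpha in (0.01, 0.05, 0.10):
    power = power_at(alpha)
    print(f"alpha={alpha:.2f}  beta={1 - power:.2f}  power={power:.2f}")
```

Moving the critical value to the left raises the power (1-β) but also enlarges α, which is the problem stated in the slide.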
Practical
A child who is "in reality" highly gifted is diagnosed as not highly gifted.
Which error is this?
[ ] α-error (Type I Error)
[ ] β-error (Type II Error)
The result of a study: WU students with a HAK degree achieve a higher score on the multiple-choice exam in accounting. In reality, however, there is no difference between HAK graduates and non-HAK graduates in the score achieved.
Which error is this?
[ ] α-error (Type I Error)
[ ] β-error (Type II Error)
Practical
Tick the correct answer and correct the wrong answers.
Raising the α-level from 0.01 to 0.05 …
[ ] decreases the power (corrected: increases the power)
[ ] decreases the chances of making a Type I error (corrected: increases the chances)
[ ] decreases the chances of making a β-error (correct as stated)
[ ] makes the test more restrictive (corrected: less restrictive)
Analysis
Sample dataset "Arbeitszufriedenheit" (job satisfaction), AZ
Dataset: AZ.sav
Note: For illustration purposes, the data were selected arbitrarily from a dataset*! Any results should therefore not be taken too seriously.
Sample size: n = 15
Variables:
dichotomous: SEX, items on the constructs job satisfaction** (AZ_...), working atmosphere** (BK_...), workload** (AB_...)
ordinal: POSITION (position in the company)
metric: MITARB (number of employees), NETTO (monthly net income in €)
new variable: AZ "job satisfaction" (assumption: interval-scaled!)
-> sum score of the individual variables AZ_...
* Böhnisch, B., Grand, A., Rechberger, R., Wimmer, W. (2006). Berufliche Zufriedenheit. Seminararbeit aus Empirische Forschungsmethoden.
** Items were taken from: Giegler, H. (1985). Rasch-Skalen zur Messung von „Arbeits- und Berufszufriedenheit“, „Betriebsklima“ und „Arbeits- und Berufsbelastung“ auf Seiten der Betroffenen.
1. Data Preparation
1. Logging the data
2. (Checking the data for accuracy)
3. Developing a database structure – Codebook (Kodierungsschema)
4. Entering the data into the computer (once only entry or double entry);
Checking the data for accuracy
5. Data Transformation
• missing values
• item reversals (example: transform reversed items, e.g. BK_2:
old value: 1 "agree", 2 "disagree" -> new value: 2 "agree", 1 "disagree")
• recode variables (example: transform items "AZ_...", "AB_...", "BK_...":
old value: 1 "agree", 2 "disagree" -> new value: 1 "agree", 0 "disagree")
• scale totals (example: generate new variable "AZ" (Arbeitszufriedenheit):
to get a total score for AZ, add across the individual items AZ_...)
• categories
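The transformation steps can be sketched in a few lines. The record below is hypothetical (the real AZ.sav values are not reproduced here), and treating a missing item as making the whole total missing is an assumption, not a rule from the slides.

```python
# Hypothetical raw answers of one respondent: 1 = "agree", 2 = "disagree", None = missing.
raw = {"AZ_1": 1, "BK_2": 1, "AB_3": 2, "AZ_4": 2, "BK_5": 1}

REVERSED_ITEMS = {"BK_2"}  # BK_2 is the reversal example named in the slide


def reverse_item(value):
    """Item reversal for a 1/2 coded item: 1 -> 2, 2 -> 1."""
    return None if value is None else 3 - value


def recode(value):
    """Recode 1 "agree" -> 1 and 2 "disagree" -> 0."""
    return None if value is None else (1 if value == 1 else 0)


def transform(record):
    """Apply item reversal (where needed) and then the 1/0 recoding."""
    coded = {}
    for name, value in record.items():
        if name in REVERSED_ITEMS:
            value = reverse_item(value)
        coded[name] = recode(value)
    return coded


def scale_total(record, prefix):
    """Sum score across all items whose name starts with prefix.

    Assumption: if any item is missing, the total is reported as missing.
    """
    values = [v for k, v in record.items() if k.startswith(prefix)]
    if any(v is None for v in values):
        return None
    return sum(values)


coded = transform(raw)
print(coded)
print("AZ =", scale_total(coded, "AZ_"))
```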
1. Data Preparation - Codebook
[Figure: excerpt of the data table with the columns ID, SEX, MITARB, NETTO, POSITION, AZ_1, BK_2 (marked "-"), AB_3, AZ_4, BK_5]
The codebook should include:
• variable name
• variable description
• variable format
• instrument/method of collection
• date collected
• respondent or group
• variable location in database
1. Data Preparation - Checking data for accuracy
summarize (e.g. frequency table) and check the data
• are the listed values reasonable? ("wild codes", outliers)
• are there missing values?
outlier:
• it actually is an outlier or
• error in data entry
"missing values":
• no data exist or
• data weren't entered
[Figure: frequency tables with a "wild code", an outlier and "missing values" marked]
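A frequency table plus two simple checks is enough to spot wild codes and missing values. The rows and the valid code sets below are hypothetical and only illustrate the idea; they are not taken from AZ.sav.

```python
from collections import Counter

# Hypothetical codebook excerpt: the set of valid codes per variable.
VALID_CODES = {"SEX": {1, 2}, "POSITION": {1, 2, 3}}

# Hypothetical data rows; None marks a missing value.
rows = [
    {"ID": 1, "SEX": 1, "POSITION": 2},
    {"ID": 2, "SEX": 2, "POSITION": 1},
    {"ID": 3, "SEX": 7, "POSITION": 3},     # 7 is not a valid code for SEX
    {"ID": 4, "SEX": 1, "POSITION": None},  # POSITION was not entered
]


def frequency_table(rows, variable):
    """Absolute frequency of every value that occurs for the variable."""
    return Counter(row[variable] for row in rows)


def wild_codes(rows, variable):
    """IDs of rows whose value is present but not a valid code."""
    valid = VALID_CODES[variable]
    return [r["ID"] for r in rows if r[variable] is not None and r[variable] not in valid]


def missing_values(rows, variable):
    """IDs of rows where the variable has no value."""
    return [r["ID"] for r in rows if r[variable] is None]


print(frequency_table(rows, "SEX"))
print("wild codes in SEX:", wild_codes(rows, "SEX"))
print("missing in POSITION:", missing_values(rows, "POSITION"))
```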
2. Descriptive Statistics
Descriptive statistics
• „quantitative description in a manageable form“
• describe basic features of the data, provide simple summaries
• simple graphics analysis
Univariate Analysis - Analysis of one variable at a time
Description of a single variable:
• distribution
• central tendency (Lagemaß)
• dispersion (Streuungsmaß)
Bivariate Analysis – Analysis of two variables at a time
Multivariate Analysis – Analysis of multiple variables at a time
2. Descriptive Statistics - Distribution
Frequency distribution
table:
• absolute frequencies
• relative frequencies
• frequency table, e.g. sex (Geschlecht):
Sex | absolute frequency | relative frequency
male | 8 | 53%
female | 7 | 47%
• crosstab
graph:
• absolute frequencies
• relative frequencies
• pie chart, bar chart, boxplot, histogram, (stem and leaf diagram), …
2. Descriptive Statistics - Distribution
graphs:
[Figure: pie chart of sex (male 53%, female 47%)]
[Figure: histogram of monthly net income]
[Figure: bar chart of position (lower, middle, upper position)]
[Figure: boxplot of number of employees]
2. Descriptive Statistics – Central Tendency
Central Tendencies / LAGEMASSE
| Mean (Mittelwert) x̄ | Median x̃ | Mode (Modus)
computation | "sum of values xi / number n of values" | "center of the sample" | "most frequently occurring value"
data | metric data | ordinal data, metric data | nominal data, ordinal data, metric data
adequacy | appropriate if the distribution is approx. normal; not robust against single extreme values ("outliers") | robust against outliers | robust against outliers
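The three measures can be tried out with Python's standard library. The values below are hypothetical and are not the AZ.sav data; they only illustrate which measure suits which scale level and how a single outlier pulls the mean but not the median.

```python
from statistics import mean, median, mode

# Hypothetical values, chosen only for illustration.
employees = [3, 7, 7, 12, 18, 25, 40, 250]  # metric, with one extreme value
position = [1, 1, 1, 2, 2, 3, 3]            # ordinal codes
sex = [1, 2, 1, 1, 2]                       # nominal codes

# Metric data: all three measures are defined; the outlier pulls the mean upwards.
print("employees:", mean(employees), median(employees), mode(employees))

# Ordinal data: median and mode are meaningful, the mean is not.
print("position:", median(position), mode(position))

# Nominal data: only the mode is meaningful.
print("sex:", mode(sex))
```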
2. Descriptive Statistics – Central Tendency / Practical
Compute the mean, median and mode of the variables SEX, MITARB (number of employees) and POSITION. Make sure that each measure is applied sensibly!
Hint: sort the variables MITARB and POSITION in ascending order.
2. Descriptive Statistics – Central Tendency / Practical_Solution
Variable | Mean | Median | Mode
Employees (MITARB) | 48.5 | 18 | 7
Position (POSITION) | - | 2 | 1
Sex (SEX) | - | - | 1
2. Descriptive Statistics - Dispersion
Dispersions / STREUUNGSMASSE
| Variance s² | Standard Deviation s | Range (Spannweite)
computation | sum of the squared deviations from the mean, divided by n - 1: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$ | "square root of the variance" | "highest value minus lowest value"
data | metric data | metric data | ordinal data, metric data
2. Descriptive Statistics - Dispersion
Dispersions / STREUUNGSMASSE
Interquartile range IQR
computation: "difference between third and first quartile" (IQR = Q3 - Q1)
data: metric data
adequacy: robust against outliers
1. quartile (Q1): 25% of the cases fall below this value
median (Q2): 50% of the cases fall above and below this value
3. quartile (Q3): 75% of the cases fall below this value
[Figure: boxplot marking min, Q1, Q2 = median, Q3 and max; each of the four segments contains 25% of the cases, and the IQR spans Q1 to Q3]
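A short sketch shows why the IQR is robust against outliers while the range is not. The values are hypothetical, and the quartiles follow the default convention of Python's `statistics.quantiles`; other software (including SPSS) may use a slightly different quartile definition.

```python
from statistics import quantiles

values = list(range(1, 12))        # hypothetical data: 1, 2, ..., 11
with_outlier = values[:-1] + [1000]  # same data, largest value replaced by an outlier


def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _q2, q3 = quantiles(data, n=4)
    return q3 - q1


def value_range(data):
    """Range: highest value minus lowest value."""
    return max(data) - min(data)


print("IQR:  ", iqr(values), "->", iqr(with_outlier))
print("range:", value_range(values), "->", value_range(with_outlier))
```

The single extreme value changes the range drastically but leaves the quartiles, and therefore the IQR, untouched.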
2. Descriptive Statistics – Dispersion / Practical_Solution
Computation of the variance, standard deviation and range of the variable NETTO (net income): n = 15, mean = 1553.3; min = 200, max = 2800
Steps (Variance):
1. compute the distance between each value and the mean
2. square each deviation
3. sum the squares to get the Sum of Squares (SS) value
4. divide the SS by n - 1
Variable | Variance | Standard Deviation | Range (Spannweite) | Min | Max
Net income (NETTO) | 471595.238 | 686.728 | 2600 | 200 | 2800
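The four steps translate directly into code. The income values below are hypothetical (the 15 original NETTO values are not listed in the slides), so the numbers differ from the table above; the function name is made up for this sketch.

```python
from math import sqrt
from statistics import variance


def sample_variance(values):
    """Sample variance following the four steps listed above."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]  # step 1: distance to the mean
    squares = [d * d for d in deviations]    # step 2: square each deviation
    sum_of_squares = sum(squares)            # step 3: Sum of Squares (SS)
    return sum_of_squares / (n - 1)          # step 4: divide SS by n - 1


net_income = [200, 1200, 1500, 1800, 2800]  # hypothetical values in €

var = sample_variance(net_income)
print("variance:", var)
print("standard deviation:", round(sqrt(var), 3))
print("range:", max(net_income) - min(net_income))
```

The result agrees with `statistics.variance`, which also divides by n - 1.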
Correlation
"A correlation is a single number that describes the degree of relationship between two variables"
-> the correlation coefficient r lies between -1 and 1: -1 ≤ r ≤ 1
-> the higher the absolute r-value, the stronger the relationship between the variables
• uncorrelated
r = 0
• positive correlation
r > 0 positive relationship
-> the higher the x-values, the higher the y-values on average
• negative correlation
r < 0 negative relationship
-> the higher the x-values, the lower the y-values on average, and vice versa
• exact linear correlation
r = 1 (positive), r = -1 (negative)
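The cases listed above can be reproduced with a small implementation of the Pearson product-moment coefficient. The data points are hypothetical and chosen so that the results are easy to check by hand; the function name is an assumption of this sketch.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))    # exact positive linear relationship
print(pearson_r(x, [8, 6, 4, 2]))    # exact negative linear relationship
print(pearson_r(x, [1, 3, 2, 4]))    # positive, but not exact
print(pearson_r(x, [1, -1, -1, 1]))  # no linear relationship
```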
Correlation - Example
Example:
Is there a relationship between the variable "Nettoverdienst" (net income) and the variable "Arbeitszufriedenheit" (job satisfaction)?
If yes, …
1. Which type of relationship?
2. How strong is the relationship?
3. Is the correlation significant?
Descriptive statistics for "Nettoverdienst" and "Arbeitszufriedenheit"
Variable | Mean | StDev | Variance | Sum | Min | Max | Range
Nettoverdienst | 1553.33 | 686.728 | 471595.238 | 23300 | 200 | 2800 | 2600
Arbeitszufriedenheit | 5.20 | 2.178 | 4.743 | 78 | 1 | 9 | 8
Example - Descriptive Statistics
[Figure: boxplot of Arbeitszufriedenheit (AZ)]
[Figure: boxplot of monthly net income in €]
Example – 1. Which type of relationship?
Example – 2. How strong is the relationship?
Product-Moment-Correlation (Pearson)
• variables (x,y) are metric and normally distributed
Calculating the correlation
SPSS output: correlation AZ/NETTO
Example – Q-Q Plot
[Figure: Q-Q plot of AZ (Arbeitszufriedenheit)]
[Figure: Q-Q plot of monthly net income in €]
Example – 3. Is the correlation significant?
Testing the Significance of a Correlation
Null Hypothesis: r = 0
Alternative Hypothesis: r <> 0
Steps:
1. determine the significance level (alpha-level): α = 0.05
2. compute the degrees of freedom df: df = N - 2 -> 15 - 2 = 13
3. one-tailed or two-tailed test? two-tailed test
4. look up the critical value
Example – 3. Is the correlation significant?
Excerpt: t distributions for product-moment correlations (table of critical values)
SPSS output: correlation AZ/NETTO
correlation is significant: r (0.692) > r_crit (0.514)
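The critical value from the table can be cross-checked with the t distribution. The sketch below uses r = 0.692 and N = 15 from the slides; the two-tailed critical t value of 2.160 for df = 13 and α = 0.05 is taken from a standard t table and is hard-coded here rather than computed.

```python
from math import sqrt

r = 0.692       # correlation AZ/NETTO as reported in the slides
n = 15          # sample size
df = n - 2      # degrees of freedom
t_crit = 2.160  # two-tailed critical t for df = 13, alpha = 0.05 (from a t table)

# Test statistic for H0: r = 0.
t_value = r * sqrt(df) / sqrt(1 - r ** 2)

# Critical correlation implied by the critical t value.
r_crit = t_crit / sqrt(t_crit ** 2 + df)

print("t =", round(t_value, 3), " t_crit =", t_crit)
print("r =", r, " r_crit =", round(r_crit, 3))
print("significant:", abs(r) > r_crit)
```

Both comparisons (t against t_crit, r against r_crit) lead to the same decision, because r_crit is just the critical t value converted to the correlation scale.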
Correlation Matrix
• symmetric matrix
• relationships between all possible pairs of variables
e.g. between C1,…,C10 -> 45 unique correlations
number of unique correlations for N variables: N*(N-1) / 2
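The count of 45 can be verified by enumerating the pairs. The variable names C1 to C10 follow the slide; the helper function is hypothetical.

```python
from itertools import combinations


def unique_correlations(num_variables: int) -> int:
    """Number of distinct variable pairs: N * (N - 1) / 2."""
    return num_variables * (num_variables - 1) // 2


names = [f"C{i}" for i in range(1, 11)]  # C1, ..., C10
pairs = list(combinations(names, 2))     # every unordered pair exactly once

print(unique_correlations(len(names)), len(pairs))
print(pairs[:3])
```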
Other correlations
• Pearson Product Moment Correlation
(bivariate normal distribution, variables on interval scale)
• Spearman Rank Order Correlation (rho)
(two ordinal variables)
• Kendall Rank Order Correlation (tau)
(two ordinal variables)
• Point-Biserial Correlation
(one variable is continuous on an interval level and the other is dichotomous)
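As a rough illustration of a rank-based coefficient, Spearman's rho can be computed as the Pearson correlation of the ranks. The sketch below assigns tied values the mean of their ranks, which is a common convention but an assumption here; the data and function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print("Pearson: ", round(pearson_r(x, y), 3))
print("Spearman:", round(spearman_rho(x, y), 3))
```

Because rho only uses the order of the values, a monotonic but non-linear relationship still yields a perfect rank correlation, while Pearson's r stays below 1.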
Literature
Basic literature:
Trochim, W. & Donnelly, J.: The Research Methods Knowledge Base (3rd edition). Atomic Dog. Internet WWW page, URL: http://www.socialresearchmethods.net/kb/ (version current as of October 20, 2006).
Bortz, J., Döring, N. (2006). Forschungsmethoden und Evaluation. Heidelberg: Springer Verlag.
Hatzinger, R. (2006). Angewandte Statistik mit SPSS. Wien: Facultas.
Hatzinger, R., Nagel, H. (2009). PASW Statistics. Statistische Methoden und Fallbeispiele. München: Pearson Studium.
Nagel, H. (2003). Empirische Sozialforschung.