Quantitative Research methods
for the Social Science, 7.5 hp
Week 2 (4+ hours of material with exercises)
[Diagram: Population → sample survey (random sample) → data, observations]
Inferential statistics = drawing conclusions about the population from the sample survey.
Statistics
Statistics is about
• collecting,
• organizing,
• analyzing and
• presenting data.
That is what we do when we research, for instance, how many people have been unemployed for more than a year, or what the car sales figures were for the first three months of this year.
Concepts
• Population: A group of individuals that we want to investigate.
• Total survey: All the units in a population are investigated.
• Sample survey: A subsample of the population is chosen and investigated.
• Random sampling: The sample units are chosen by some random mechanism.
• Variable: Property connected with the units in a population.
Doing science from a statistical perspective
• Formulate a research problem.
• Define the population and plan a random sample survey.
• Find relevant variables to measure.
• Compute descriptive statistics for the sample.
• Use inferential statistics to generalize the findings to the entire population.
• Write a report.
Where it can go wrong!
• A study is carried out to understand the training habits of students at Umeå University.
• The researcher hands out questionnaires at
the entrance of Iksu.
• Result: Students at Umeå University train a
lot more than expected.
• Where did it go wrong?
Where it can go wrong!
• A study is carried out to find out if students at Umeå University prefer the campus pubs to the inner-city pubs.
• The researcher hands out questionnaires in
the queue at a campus pub.
• Result: A majority of the students prefer
campus pubs.
• Where did it go wrong?
Why make a random sample?
• If the sample is random, we can use probability theory to control the error that arises because we study only a sample and not the entire population. This is impossible if the sample is not random. Make a random sample.
• Give objective measures of the precision of the survey results.
• Make objective comparisons between different sampling plans before the survey.
• Calculate how large a sample you need to achieve a certain margin of error.
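The last point can be made concrete for the simplest case. Here is a minimal sketch that assumes we estimate a proportion from a simple random sample of a large population. It uses the normal approximation and no finite-population correction: n = z²·p(1 − p)/E², where E is the margin of error. The value p = 0.5 is the most conservative choice when nothing is known in advance. The function name `sample_size_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, p=0.5, confidence=0.95):
    """Smallest n with z * sqrt(p(1 - p) / n) <= margin (normal approximation, SRS assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion(0.05))  # 385 for a 5 percentage point margin
print(sample_size_for_proportion(0.03))  # 1068 for a 3 percentage point margin
```

Halving the margin of error roughly quadruples the required sample size, because n grows with 1/E².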
Find relevant variables to measure.
Variable: Property connected with the units in a population.
Measurement: An assignment of numbers to the subjects in a survey such that specific relationships between the subjects, with respect to some specific property, are reflected in the numbers.
Why do we measure?
• To describe, to compare, to evaluate.
Examples of things we want to measure:
• Length
• Stress
• Welfare
• Consumer satisfaction.
Data levels
Data levels are important because different levels of data call for different methods of analysis.
• Nominal data
– Classification
• Ordinal data
– Classification and order
• Scale data (interval or ratio)
– Classification, order, and equal distances
Exercise 1: Variable type?
• Age
• Age group 25-34, 35-44, 45-54, ...
• Sex (male/female)
• Education (primary, secondary, university)
• Smoker (yes/no)
• BMI (23.45, 28.12, ...)
• Car model (Volvo, Saab, Fiat)
• Temperature (12 °C, -4 °C, 14 °C, ...)
Descriptive statistics
• Measures of location
– mean
– median
• Measures of spread
– range (min-max)
– variance
– standard deviation, SD
– standard error of the mean, SEM
– percentiles/quantiles (p25, p75, q1, q3, ...)
• Frequency tables
• Graphs
– bar chart/histogram
– boxplot
– scatterplot
Center and spread
Answering a research question often amounts to comparing such measures.
Exercise 2: What is a boxplot
• Blackboard example: Number of earrings of males and females in an African tribe
• Sample: 11 males, 11 females
• Females: 3 4 7 5 3 6 4 4 2 3 4
• Males: 10 5 15 17 18 23 10 8 7 22 16
• Construct a box plot for each gender.
• Is there a difference between the genders?
Mean and variation the same?
Descriptive Statistics
Descriptive Statistics: Number of earrings

Gender  Minimum  Q1     Median  Q3     Maximum  IQR
f       2.000    3.000  4.000   5.000  7.000    2.000
m       5.00     8.00   15.00   18.00  23.00    10.00

Gender  N   Mean   SE Mean  StDev  Variance
f       11  4.091  0.436    1.446  2.091
m       11  13.73  1.84     6.10   37.22
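Python's standard `statistics` module can reproduce these summary measures; here is a sketch. With its default "exclusive" method, `statistics.quantiles` places the quartiles at positions (n + 1)p of the sorted data. For these two samples that gives the same Q1 and Q3 as the output above. Other software may use a different quartile rule and get slightly different values. `describe` is a hypothetical helper name.

```python
import math
import statistics

# Earring counts from the blackboard example (11 females, 11 males)
earrings = {
    "f": [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4],
    "m": [10, 5, 15, 17, 18, 23, 10, 8, 7, 22, 16],
}

def describe(values):
    """Summary measures of location and spread for one sample."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    sd = statistics.stdev(values)  # sample SD, divides by n - 1
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "SE mean": sd / math.sqrt(len(values)),  # SEM = SD / sqrt(n)
        "SD": sd,
        "variance": statistics.variance(values),
        "min": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "max": max(values),
        "IQR": q3 - q1,
    }

for gender, values in earrings.items():
    print(gender, {name: round(value, 3) for name, value in describe(values).items()})
```

The five numbers min, Q1, median, Q3 and max are exactly what a box plot draws.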
Workload and exam result
investigation.
Is there a difference in the study results
between males and females?
If so, what does the difference depend on?
A sample of graphs and plots:
• Exam results (scale)
• Workload (scale)
• Histogram of exam score (scale)
• Bar chart of grades (ordinal/nominal)
• Pie chart of grades (ordinal/nominal)
• Boxplot of exam score by gender (scale vs nominal)
• Bar chart of grade by gender (nominal vs nominal)
• Boxplot of total study time by gender (scale vs nominal)
• Scatter plot (scale vs scale): Is there a relation?
Inferential statistics
Is there a difference in how well females and males perform on the exam? (this week)
Is there a difference in how much females and males study for the exam? (this week)
Is there a difference in how well females and males perform on the exam if we take the students' study time into account? (next week)
Inferential statistics is a collection of methods used to draw conclusions (inferences) about the characteristics of populations based on sample data.
Exercise 3: Design an experiment
• We want to examine if there is a difference
between mosquito cream A and B.
• Material for the experiment.
– 30 students with bare arms
– 1 bottle of mosquito cream A
– 1 bottle of mosquito cream B
– A forest full of mosquitoes
How do you perform the experiment and what data
do you gather?
Inferential statistics (The idea)
Hypothesis testing
In research we want to get answers to posed questions (hypotheses).
• Are all coffee flavors equally popular?
• Is the use of bike helmets effective in protecting people in bicycle accidents from head injuries?
• Is there a connection between gender and alcohol consumption among the students at Umeå University?
HYPOTHETICO-DEDUCTIVE METHOD
[Diagram: a cycle Hypothesis → Statement → Observation → Hypothesis]
1) Hypothesis → Statement: deduction, a logically valid argument (predictive inference). It tries to predict what will happen if the hypothesis holds.
2) Statement → Observation: "dialogue with reality".
3) Observation → Hypothesis: induction (inductive inference).
Logically valid reasoning (example)
Valid
Hypothesis: The animal is a horse.
Statement: If the animal is a horse, it has four legs.
Observation: The animal does not have four legs.
Conclusion: The animal is not a horse.
Invalid
Hypothesis: The animal is a horse.
Statement: If the animal is a horse, it has four legs.
Observation: The animal has four legs.
Conclusion: It is a horse.
Not a valid conclusion: the animal could be a pig or some other four-legged animal.
Contradiction proofs
Within statistical hypothesis testing (inference theory), we are not looking for "impossible" events in order to reject a posed hypothesis.
(For example, it is impossible for a horse to have six legs. If the animal has six legs, the hypothesis "it is a horse" is rejected.)
Instead, we are looking for contradictions in the form of "improbable" events.
Improbable event
Assume that we suspect that bicycle helmets are an effective way to protect people in bicycle accidents from skull damage.
Null hypothesis: The percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets.
Statement: If the percentage of persons with skull damage after a bicycle accident is the same whether or not they use bicycle helmets, then a sample survey should show only a small difference between the two groups in the percentage of people with skull damage.
If the hypothesis holds, observing a large percentage difference between these groups in a sample survey is an improbable event.
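One way to see why a large difference is improbable under the null hypothesis is to simulate surveys in which the null hypothesis is true. This sketch uses purely hypothetical numbers, not data from any study: 100 cyclists per group and a 20% rate of skull damage in both groups. With 100 per group, a difference in counts equals a difference in percentage points. `prob_difference_at_least` is a hypothetical helper name.

```python
import random

def prob_difference_at_least(min_diff, n_per_group=100, p=0.20, n_sims=5_000, seed=3):
    """Estimate P(|count1 - count2| >= min_diff) when both groups share the same rate p (H0 true)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        damaged_1 = sum(rng.random() < p for _ in range(n_per_group))  # helmet group
        damaged_2 = sum(rng.random() < p for _ in range(n_per_group))  # no-helmet group
        if abs(damaged_1 - damaged_2) >= min_diff:
            hits += 1
    return hits / n_sims

for diff in (5, 10, 15):
    print(f"difference of at least {diff} percentage points: {prob_difference_at_least(diff):.3f}")
```

Small differences turn up all the time, while differences of 15 percentage points or more are rare when H0 holds. A survey that shows such a difference therefore speaks against H0.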
Improbable event
Assume that we suspect that male and female students at Umeå University differ in their opinion about the EMU.
Null hypothesis: The percentage of students who are against the EMU is the same for male and female students.
Statement: If the percentage of students who are against the EMU is the same for male and female students, then a sample survey should show only a small difference between the two groups in the percentage against the EMU.
If the hypothesis holds, observing a large percentage difference between these groups in a sample survey is an improbable event.
Test statistic: what is an improbable event?
Within statistical inference theory, the statement is summarized in a test statistic.
From our hypothesis and from probability theory, we can derive the distribution of the test statistic if the null hypothesis is true.
Next, we draw a sample, calculate the observed value of the test statistic, and compare it with the derived distribution to see whether we have an improbable event.
If we get an improbable event, the null hypothesis is rejected.
P-value
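The p-value describes how improbable the event is. It is the probability, computed assuming that the null hypothesis is true, of getting a value of the test statistic at least as extreme as the one observed.
If the p-value is small, either something improbable has happened or the null hypothesis does not hold.
If the p-value is small (< 0.05 or < 0.01), the null hypothesis should be rejected.
(The level 0.05 = 5% or 0.01 = 1% is called the significance level. More about this later.)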
Coffee Example
100 people took part in a survey about different brands of coffee. Each person tasted four different brands (in a blind test) and noted which one they preferred. The result of the test was as follows:
Brand:            Ellips  Gexus  Luber  Eco
Number of people  26      28     16     30
Does the result of the survey show that any of
the brands are more popular than the others,
or are they all equal?
In statistical terms we can formulate the
problem as:
Null hypothesis: All the coffee brands are
equally popular.
Alternative hypothesis: Not all the coffee brands are equally popular.
If the null hypothesis is true, we could expect
the following result of the survey:
Brand:            Ellips  Gexus  Luber  Eco
Number of people  25      25     25     25
Can we reject the null hypothesis at the 5% significance level?
One way of measuring how much the observed
table differs from the expected table is to look
at the differences:
Brand:      Ellips  Gexus  Luber  Eco
Observed    26      28     16     30
Expected    25      25     25     25
Difference  1       3      -9     5
If we square each difference, divide by the expected count, and sum over all brands, we get a test statistic called "chi-square". It is possible to derive the distribution of this test statistic under the assumption that the null hypothesis is true.
$$\chi^2_{\text{obs}} = \sum_{\text{brands}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(26-25)^2}{25} + \frac{(28-25)^2}{25} + \frac{(16-25)^2}{25} + \frac{(30-25)^2}{25} = \frac{116}{25} = 4.64$$
Chi-square distribution
Is 4.64 an improbable event?
If the null hypothesis is true, $\chi^2_{\text{obs}}$ ought to be fairly small (its expected value under the null hypothesis equals the degrees of freedom, here 4 − 1 = 3). Is 4.64 large enough that we can reject the null hypothesis?
We compare the obtained p-value with our chosen level of significance.
[Figure: distribution of the test statistic under the null hypothesis]
Observed p-value: 0.20
Conclusion? Getting 4.64 or more is not unusual when the null hypothesis is true (0.20 > 0.05), so we cannot reject the null hypothesis.
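The coffee calculation, and the null distribution it is compared with, can be sketched in Python. `chi2_sf_df3` uses the closed-form tail probability of a chi-square variable with 3 degrees of freedom, P(χ² ≥ x) = erfc(√(x/2)) + √(2x/π)·e^(−x/2). This formula only holds for df = 3. `simulated_p_value` instead simulates surveys of 100 people who each pick one of the four brands at random, which is exactly what the null hypothesis says. Both function names are hypothetical. With SciPy installed, `scipy.stats.chisquare([26, 28, 16, 30])` should return the same statistic and p-value.

```python
import math
import random

observed = [26, 28, 16, 30]  # Ellips, Gexus, Luber, Eco
n_people, n_brands = sum(observed), len(observed)
expected = n_people / n_brands  # 25 per brand if all brands are equally popular

def chi2_stat(counts):
    """Sum over brands of (observed - expected)^2 / expected."""
    return sum((c - expected) ** 2 / expected for c in counts)

def chi2_sf_df3(x):
    """P(X >= x) for a chi-square variable with 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def simulated_p_value(stat_obs, n_sims=5_000, seed=1):
    """Share of surveys simulated under H0 whose statistic is at least stat_obs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        counts = [0] * n_brands
        for _ in range(n_people):
            counts[rng.randrange(n_brands)] += 1  # each person picks a brand at random
        if chi2_stat(counts) >= stat_obs - 1e-9:  # small tolerance for float rounding
            hits += 1
    return hits / n_sims

chi2_obs = chi2_stat(observed)
print(f"chi2_obs = {chi2_obs:.2f}")                             # 4.64
print(f"p (chi-square, df = 3) = {chi2_sf_df3(chi2_obs):.2f}")  # 0.20
print(f"p (simulation)         = {simulated_p_value(chi2_obs):.2f}")
```

The simulated p-value varies a little with the seed but should land close to the chi-square value. This shows that the chi-square distribution is a good approximation here.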
Choose the right test
• Hand out the summary chart. (Reminder)
1) Decide what the problem objective is.
2) What is the data type? (scale, ordinal, nominal)
3) Make assumptions/approximations to reach a test. (Make sure you check the assumptions, or choose a test that is robust against misspecification.)
Significance level
• To decide whether an outcome is improbable, compare the p-value of the outcome with the significance level.
• The significance level is a "risk level" that you decide yourself before you conduct the test.
• Common significance levels are 5% and 1%.
Type I and II errors
• A Type I error is made when we reject the null hypothesis although it is actually true (incorrectly rejecting a true H0).
The probability of making a Type I error is the significance level, α.
• A Type II error is made when we fail to reject
the null hypothesis and the null hypothesis is
false (incorrectly keep a false H0).
Power of a Test
• The power of a test is the probability of rejecting a false null hypothesis: power = 1 − P(Type II error).
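A small simulation sketch can make Type I errors, Type II errors and power concrete. For illustration only, it assumes normally distributed data with known σ = 1 and a two-sided z-test of H0: mean = 0, a simpler relative of the t-test. The function name `rejection_rate` is hypothetical. When H0 is true, the share of simulated samples that reject estimates the Type I error rate, which should be close to α. When H0 is false, the same share estimates the power.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, n, alpha=0.05, sigma=1.0, n_sims=10_000, seed=2):
    """Share of simulated samples in which a two-sided z-test rejects H0: mean = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = fmean(sample) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

type_1 = rejection_rate(true_mean=0.0, n=25)  # H0 true: estimates the Type I error rate
power = rejection_rate(true_mean=0.5, n=25)   # H0 false: estimates the power
print(f"Type I error rate ≈ {type_1:.3f}, power ≈ {power:.3f}, Type II error ≈ {1 - power:.3f}")
```

Rerunning with other values of `alpha` or `n` is one way to explore the class questions.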
Questions in class
1) What happens to the power if the significance level increases/decreases? Why?
2) What happens to the power if the sample size increases/decreases? Why?
The steps when analyzing data statistically
1) Compute descriptive statistics relevant to the research question/hypotheses.
2) Construct statistical hypotheses H0 and HA connected to your research question. In H0 you put the statement you want to reject.
3) Pick a significance level.
4) Choose an appropriate test and check the assumptions of the test.
5) Evaluate the p-value and draw conclusions.
Example: Earring data
• 1) Descriptive statistics
2) Hypotheses
• A) Two-sided test
– H0: There is no difference in the mean number of earrings between males and females.
– HA: There is a difference.
• B) One-sided test (one side)
– H0: Males have, on average, as many or more earrings (than females).
– HA: Class: help me
• C) One-sided test (other side)
– H0: Males have, on average, as many or fewer earrings (than females).
– HA: Class: help me
• 3) Pick a significance level.
– Significance level = 5%
(That is, we set the probability of a Type I error to 5%.)
4) Choose an appropriate test
Look at the chart:
Compare two populations
Data type = interval
Descriptive measurement = central location
Experimental design = independent samples
Population distributions = normal (assumption)
Population variances = unequal
Result: t-test (with unequal variances)
Enter the data in the computer and calculate:
Difference = mean (f) - mean (m)
Estimate for difference: -9.64
A) T-Test of difference = 0 (vs not =): P-Value = 0.000
B) T-Test of difference = 0 (vs <): P-Value = 0.000
C) T-Test of difference = 0 (vs >): P-Value = 1.000
Learn how to interpret the printout from the statistical software.
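To see where such printed numbers come from, here is a sketch of Welch's t-test (the t-test with unequal variances) on the earring data, using only Python's standard library. The degrees of freedom come from the Welch-Satterthwaite approximation. As an assumption of this sketch, the p-value is obtained by integrating the t density numerically with Simpson's rule rather than by calling a statistics library. In practice, `scipy.stats.ttest_ind(females, males, equal_var=False)` should give essentially the same two-sided result. The helper names are hypothetical.

```python
import math
import statistics

females = [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4]
males = [10, 5, 15, 17, 18, 23, 10, 8, 7, 22, 16]

def t_pdf(x, df):
    """Density of Student's t distribution with df (possibly non-integer) degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value: 1 - 2 * P(0 < T < |t|), integrated with Simpson's rule."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1 - 2 * total * h / 3)

def welch_t_test(x, y):
    """t-test of mean(x) - mean(y) = 0 with unequal variances (Welch)."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    t = diff / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return diff, t, df, t_two_sided_p(t, df)

diff, t, df, p = welch_t_test(females, males)
print(f"difference = {diff:.2f}, t = {t:.2f}, df = {df:.1f}, two-sided p = {p:.3f}")
# difference = -9.64, t = -5.10, df = 11.1, two-sided p = 0.000
```

Since t is negative here, the one-sided p-value for "difference < 0" is p/2 and the one for "difference > 0" is 1 − p/2. This matches the 0.000 / 0.000 / 1.000 pattern in the printout.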
5) Conclusions: Class help me
Reasons for non-significant results
• There is no difference
• There is a difference, but we have too few
observations to detect it
• Important: the fact that we cannot reject the null hypothesis does not mean that the null hypothesis is true.
Normal probability distribution
• Common assumption in several statistical tests
• "Normal" does NOT mean that the observations are distributed as they "normally" (usually) would be.
• Notation: N(mean, variance)
[Figure: histogram (frequency) of data from N(0,1), mean = 0, SD = 1]
~68% of observations within mean ± 1 SD
~95% within mean ± 2 SD
~99.7% within mean ± 3 SD
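These rules of thumb can be checked with the normal distribution in Python's standard library. Here is a short sketch using `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # N(0, 1)
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within mean ± {k} SD: {share:.4f}")
# within mean ± 1 SD: 0.6827
# within mean ± 2 SD: 0.9545
# within mean ± 3 SD: 0.9973
```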
Estimates and confidence intervals
Ex. Estimate the mean height of the population in Umeå by measuring a sample of 10 individuals.
Estimate = sample mean
95% confidence interval = mean ± 1.96·SEM
(SEM = standard error of the mean = standard deviation of the sample mean. For a sample as small as 10, the t-distribution value 2.26 gives a more accurate interval than 1.96.)
A 95% confidence interval is an interval constructed so that, in 95% of repeated samples, it covers the population value (what we want to estimate).
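Here is a sketch of the interval formula using only the standard library. The height measurements in the example are not given, so the same formula is applied, as a stand-in, to the females' earring counts from the blackboard example. `normal_ci` is a hypothetical helper name.

```python
import math
import statistics

def normal_ci(values, z=1.96):
    """Interval mean ± z * SEM, where SEM = SD / sqrt(n)."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - z * sem, mean + z * sem

females = [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4]
low, high = normal_ci(females)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")  # (3.24, 4.95)
```

For a sample this small (n = 11), passing `z=2.228` (the t-value for 10 degrees of freedom) gives a somewhat wider and more accurate interval.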
Normal probability distribution
• How do I know if my variable is normally distributed?
– continuous variable, no cut-off point
– draw a histogram or a normal probability plot
– symmetric, bell-shaped, mean = median
– Unsure? Use non-parametric tests if available.
How do you know if data are normally distributed?
Parametric/non-parametric test
• Parametric tests:
– used if data are normally distributed
– all information in the sample can be summarized by the mean and standard deviation
• Non-parametric tests:
– primarily used if data are not normally distributed
– can also be used if data are normally distributed, but are then less powerful
– less sensitive to outliers
Exercise 4: back to mosquito example.
• Summary of the students' ideas on the blackboard (reminder)
• There is no single right way, just ways that use different assumptions about reality. Three reasonable ways can be found in the chart.
• State the weaknesses of each analysis.
Experimental design
• For each of the 30 people, put mosquito cream A on a randomly selected arm and cream B on the other arm.
• Let the students walk in the forest (with lots of mosquitoes).
• Count the number of mosquito bites on each arm (after 1 hour).
• H0: The mosquito creams are equally effective.
• HA: The mosquito creams are not equally effective.
1: Sign test (or equivalent)
For each person, calculate the number of mosquito bites on arm A minus the number of mosquito bites on arm B. If the result is positive, associate the person with "+"; if the result is negative, associate the person with "–". Count the number of "+".
• If H0 is true, we expect about 15 "+" and 15 "–".
• If A is better (fewer bites on arm A), we expect few "+".
• If B is better, we expect many "+".
• We reject H0 if we get many "+" or few "+", since such outcomes are improbable when H0 is true. The rejection region "at most 9 or at least 21 '+'" gives a test with significance level 4.28%.
• Weakness and strength of the test
– Only uses the sign (+ or –), not how big the difference is.
+ No distributional assumption (works even with fewer than 30 people).
+ Eliminates variation between persons.
– No good way to handle ties (the same number of bites on each arm).
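Under H0 the number of "+" among 30 people is Binomial(30, 0.5), so the significance level of a rejection region can be computed exactly. This sketch uses only `math.comb` (Python 3.8+). The function names are hypothetical. It also assumes that people with ties are dropped before counting, which is a common convention that the slide does not specify. The region "at most 9 or at least 21 '+'" gives a significance level of 4.28%. The narrower region "at most 8 or at least 22" would give only about 1.61%.

```python
from math import comb

def two_tail_probability(n, lower, upper):
    """P(X <= lower or X >= upper) for X ~ Binomial(n, 0.5), the number of '+' under H0."""
    hits = sum(comb(n, k) for k in range(n + 1) if k <= lower or k >= upper)
    return hits / 2 ** n

def sign_test_p_value(n_plus, n):
    """Exact two-sided sign-test p-value; n counts the people without ties."""
    k = min(n_plus, n - n_plus)
    return min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

print(f"{two_tail_probability(30, 9, 21):.4f}")  # 0.0428
print(f"{two_tail_probability(30, 8, 22):.4f}")  # 0.0161
```

For an observed count, `sign_test_p_value(9, 30)` returns the same 0.0428. Nine "+" out of 30 thus lies exactly on the edge of the 4.28% rejection region.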
2: T-test (not paired)
• Calculate the mean number of mosquito bites on the A arms minus the mean number of mosquito bites on the B arms. We call this number T.
(You can divide T by its estimated standard deviation to get a standardized test statistic that is t-distributed. That is what the computer does.)
• If H0 is true, we expect T to be close to zero.
• We reject H0 if T lies far out in the tails.
• Weakness and strength of the test
– Normal distribution assumption. (Works well if the sample has 30 or more observations; cf. the CLT.)
– A continuous approximation to discrete data may be a bad approximation.
– Does not eliminate variation between persons.
3: paired T-test
• For each person, calculate the number of mosquito bites on arm A minus the number of mosquito bites on arm B. Calculate the mean of these differences and call this number T.
(You can divide T by its estimated standard deviation to get a standardized test statistic that is t-distributed. That is what the computer does.)
• If H0 is true, we expect T to be close to zero. Typically the variation is smaller than in the unpaired test, because the between-person variation is eliminated; this gives a more powerful test.
• We reject H0 if T lies far out in the tails.
• Weakness and strength of the test
– Normal distribution assumption. (Works well if the sample has 30 or more observations; cf. the CLT.)
– A continuous approximation to discrete data may be a bad approximation.
+ Eliminates variation between persons. If the between-subject variation is large, the test is typically much more powerful than the unpaired t-test.
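A few made-up numbers illustrate the gain from pairing. The bite counts below are HYPOTHETICAL and do not come from any experiment. They were chosen so that people differ a lot from each other while cream B is consistently slightly better. The unpaired statistic uses the unequal-variance standard error. Both statistics are computed by hand with the `statistics` module, and the function names are hypothetical.

```python
import math
import statistics

# HYPOTHETICAL toy data (not from any real experiment): bites on the
# arm with cream A and on the arm with cream B for six people.
bites_a = [12, 30, 5, 22, 41, 18]
bites_b = [10, 27, 4, 19, 38, 15]

def unpaired_t(x, y):
    """Two-sample t statistic that treats the A arms and B arms as independent groups."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

def paired_t(x, y):
    """Paired t statistic: mean of per-person differences divided by its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

print(f"unpaired t = {unpaired_t(bites_a, bites_b):.2f}")  # 0.35
print(f"paired t   = {paired_t(bites_a, bites_b):.2f}")    # 7.32
```

The estimated difference is 2.5 bites in both cases. Only the standard error changes, because pairing removes the large between-person variation.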
4: Poisson model (beyond this course)
• Model: the number of mosquito bites on each arm is Poisson distributed, with a mean equal to a person effect × a cream effect (A or B).
• Estimate the cream effects and evaluate them.
• Only weakness: the assumed interaction pattern between persons and creams.
If time 1: Why the tests work
• “The law of large numbers (LLN). Given a
sample of independent and identically
distributed random variables (SRS) with a
finite population mean, the average of these
observations will eventually approach and stay
close to the population mean.”
• This result tells us that the larger the sample, the better the precision of the estimates.
• “The central limit theorem (CLT) states that if
the sum of independent identically distributed
random variables (SRS) has a finite variance,
then it will be approximately normal
distributed, (following a Gaussian distribution,
or bell-shaped curve). “
• This result (and similar ones) is important because it lets us approximate the distribution of test statistics, which is necessary for testing hypotheses.
Central Limit Theorem (In words)
• Whatever the form of the population distribution, if you take a large SRS (20-30+ observations), the sample mean will be approximately normally distributed. The approximation gets better the larger the sample.
• Consequence: if your sample is large, you don't need to worry so much if your observations violate the normality assumption. (This holds as long as you don't make predictions for individuals. See regression next week.)
Example
Discrete uniform distribution on 1, 2, ..., 8 (each value has probability 1/8): UniformDistribution(1,8)
$$E(X) = \mu = 4.5, \quad V(X) = \sigma^2 = 5.25, \quad SD(X) = \sigma = 2.2913$$
[Figure: bar chart of P(X) against X = 1, ..., 8]
Sample of size n = 2: mean of the two observations.
• 8 × 8 = 64 different, equally likely outcomes (x1, x2), with x1, x2 = 1, ..., 8. The table gives the mean (x1 + x2)/2 for each outcome.

x1 \ x2   1    2    3    4    5    6    7    8
1        1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5
2        1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0
3        2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5
4        2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0
5        3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5
6        3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0
7        4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5
8        4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
Sampling distribution of the mean
[Figure: bar chart of P(X̄) for X̄ = 1.0, 1.5, ..., 8.0]
$$E(\bar{X}) = \mu_{\bar{X}} = 4.5, \quad V(\bar{X}) = \sigma^2_{\bar{X}} = \frac{\sigma^2}{n} = \frac{5.25}{2} = 2.625, \quad SD(\bar{X}) = \sigma_{\bar{X}} = 1.6202$$
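These values can be checked exactly by enumerating all 64 equally likely samples of size 2. Here is a sketch using exact fractions; `moments` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

values = range(1, 9)  # discrete uniform distribution on 1..8

def moments(dist):
    """Mean and variance of a distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, var

population = {x: Fraction(1, 8) for x in values}

# All 8 * 8 = 64 equally likely samples (x1, x2) and the mean of each
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in product(values, repeat=2))
sampling_dist = {m: Fraction(c, 64) for m, c in counts.items()}

for name, dist in [("population", population), ("mean of n = 2", sampling_dist)]:
    mean, var = moments(dist)
    print(f"{name}: E = {float(mean)}, V = {float(var)}, SD = {float(var) ** 0.5:.4f}")
# population: E = 4.5, V = 5.25, SD = 2.2913
# mean of n = 2: E = 4.5, V = 2.625, SD = 1.6202
```

The mean stays at 4.5 while the variance is halved, since V(X̄) = σ²/n with n = 2.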
Sampling distribution of the mean vs. the population distribution UniformDistribution(1,8)
[Figure: bar charts of P(X̄) and P(X) side by side]
Compare!
• Same mean
• Less variance
• More bell-shaped
Illustration of the CLT
[Figure: the population distribution and the sampling distribution of X̄ for n = 2 and n = 30]
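The same idea can be checked by simulation. This sketch draws many samples from the discrete uniform distribution on 1, ..., 8 and looks at the distribution of the sample mean for n = 2 and n = 30. The theoretical SD of the mean is σ/√n, with σ = √(63/12) ≈ 2.2913. For n = 30, roughly 95% of the simulated means should fall within ±1.96 standard errors of 4.5, as the normal approximation predicts. The seed and the number of repetitions are arbitrary choices, and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_means(n, n_reps=10_000, seed=4):
    """Means of n_reps samples of size n from the discrete uniform distribution on 1..8."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 8) for _ in range(n)) for _ in range(n_reps)]

sigma = (63 / 12) ** 0.5  # population SD of the uniform distribution on 1..8 (about 2.2913)
for n in (2, 30):
    means = simulate_sample_means(n)
    sd = statistics.pstdev(means)
    se = sigma / n ** 0.5  # theoretical SD of the sample mean
    within = sum(abs(m - 4.5) <= 1.96 * se for m in means) / len(means)
    print(f"n = {n:2d}: mean of means = {statistics.fmean(means):.3f}, "
          f"SD = {sd:.4f} (theory {se:.4f}), share within ±1.96 SE: {within:.3f}")
```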
If time 2: Comparing means example
A. Comparing means from 2 samples
(using T-test)
B. Comparing means from several samples
(using ANOVA).
C. Comparing means from several samples
(using Blocked ANOVA)
A: Does gender affect the mean score on a statistics exam?
A: SPSS gives (T-test)
What does the SPSS output imply?
What if we do a one-sided test? (See hand-in 1.)
B: Do students with different grades put different amounts of time into their studies?
B: SPSS gives (one-way ANOVA)
• What is the simple idea behind the analysis?
• What does the SPSS output imply?
• Where is the difference?
Tukey intervals
(Where does the mean differ?)
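One-way ANOVA compares the variation between the group means with the variation within the groups. Here is a sketch of the F statistic. The course's exam dataset is not included here, so the sketch is applied to the two earring groups from the blackboard example as a stand-in. `one_way_anova_f` is a hypothetical helper name. With only two groups, F equals the square of the pooled-variance t statistic, so one-way ANOVA generalizes the two-sample t-test to several groups.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, (df_between, df_within)) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between groups: how far the group means lie from the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: how far the observations lie from their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)

females = [3, 4, 7, 5, 3, 6, 4, 4, 2, 3, 4]
males = [10, 5, 15, 17, 18, 23, 10, 8, 7, 22, 16]
f_stat, df = one_way_anova_f(females, males)
print(f"F = {f_stat:.2f} with df = {df}")  # F = 25.99 with df = (1, 20)
```

A large F means the group means differ more than the within-group variation can explain. The p-value then comes from an F distribution with the printed degrees of freedom.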
C: Does math background, gender, or both influence the time put into the course?
SPSS gives (two-way ANOVA)
If extra time 3: Talk about the weaknesses and strengths of quantitative and qualitative methods.