Hypothesis Testing - Social and Political Sciences

HYPOTHESIS TESTING
SESSION 15
QUANTITATIVE POLITICAL SCIENCE RESEARCH METHODS (MPIP)
BRAINSTORMING
- What have you gained from this course? Tell us…
- What are your problems so far? Literature? Laziness? Lecturer? Knowledge?
- How many assignments have you submitted, and how many are still outstanding?
  - Questionnaire
  - Research proposal (due next week)
- Are you ready for the FINAL exam?
- How many times have you met with your supervisor ("bimbingan")?
CHAPTER III: RESEARCH METHODS
- Research approach and design
  - Approach
  - Kind, strategy, and type of research
- Research location
- Population and sample
  - Population
  - Sampling technique
  - Sample size
- Unit of analysis
- Data collection techniques
  - Types of data
  - Primary and secondary data collection techniques
- Data validity and reliability
- Data analysis
- Hypothesis testing
- Writing structure
True or False?
- Complex analysis impresses other people.
- In general, I am able to generalize from and interpret the data I collect.
Data analysis and interpretation
- Think about analysis EARLY
- Start with a plan
- Code, enter, clean
- Analyze
- Interpret
- Reflect
  - What did we learn?
  - What conclusions can we draw?
  - What are our recommendations?
  - What are the limitations of our analysis?
Why do I need an analysis plan?
- To make sure the questions and your data collection instrument will get the information you want.
- To align your desired "report" with the results of analysis and interpretation.
- To improve reliability: consistent measures over time.

Key components of a data analysis plan
- Purpose of the evaluation
- Questions
- What you hope to learn from the question
- Analysis technique
- How data will be presented

Analyzing and Interpreting Quantitative Data

Quantitative data is:
- Presented in a numerical format
- Collected in a standardized manner, e.g. surveys, closed-ended interviews, tests
- Analyzed using statistical techniques
True or False?
- Quantitative data we gather in Extension are more generalizable than qualitative data.
- Stating limitations weakens the evaluation.
Analyzing Survey Data
What will you report…
- how many people answered a, b, c, d?
- the average number or score?
- a change in score between two points in time?
- how people compared?
- how many people reached a certain level?
Common descriptive statistics
- Count (frequencies)
- Percentage
- Mean
- Mode
- Median
- Range
- Standard deviation
- Variance
- Ranking
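
As a rough illustration of the statistics listed above, the following sketch computes each of them with Python's standard library. The `scores` list is hypothetical (it does not come from the slides), and "ranking" is interpreted here as ordering answer values by how often they occur, which is only one possible reading.

```python
import statistics
from collections import Counter

# Hypothetical questionnaire scores on a 1-5 scale (not from the slides).
scores = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1]

n = len(scores)
counts = Counter(scores)                                   # count (frequency) of each value
percentages = {v: 100 * c / n for v, c in counts.items()}  # percentage of each value

summary = {
    "mean": statistics.mean(scores),
    "mode": statistics.mode(scores),
    "median": statistics.median(scores),
    "range": max(scores) - min(scores),
    "variance": statistics.variance(scores),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(scores),      # sample standard deviation
}

# Assumed meaning of "ranking": answer values ordered from most to least frequent.
ranking = [value for value, _ in counts.most_common()]

print(counts)
print(percentages)
print(summary)
print(ranking)
```

The sample (n - 1) forms of variance and standard deviation are used because survey respondents are normally treated as a sample rather than the whole population.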
Other Statistics
- Statistical significance
- Factor analysis
- Etc.
Not often used in Extension program evaluation; these generally require randomization, large samples, and/or control groups.
Getting your data ready
- Assign a unique identifier
- Organize and keep all forms (questionnaires, interviews, testimonials)
- Check for completeness and accuracy
- Remove those that are incomplete or do not make sense

Data entry
You can enter your data:
- By hand
- By computer

http://learningstore.uwex.edu/Using-Excel-forAnalyzing-Survey-QuestionnairesP1030C0.aspx
Data entry by computer
- Excel (spreadsheet)
- Microsoft Access (database management)
- Quantitative analysis: SPSS (statistical software)
Data entry computer screen
Smoking: 1 (YES) 2 (NO)

Survey ID | Q1 Do you smoke? | Q2 Age | Q3 Support ordinance?
001 | 1 | 24 | 2
002 | 1 | 18 | 2
003 | 2 | 36 | 1
004 | 2 | 48 | 1
005 | 1 | 26 | 1
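
Once the answers are coded like this, simple tallies and a cross-tabulation follow directly. The sketch below uses the five rows from the data-entry screen; the slide only gives the coding for the smoking question, so applying the same 1 = yes, 2 = no coding to Q3 is an assumption.

```python
from collections import Counter

# Rows from the data-entry screen: (survey ID, Q1 smoke, Q2 age, Q3 support ordinance).
rows = [
    ("001", 1, 24, 2),
    ("002", 1, 18, 2),
    ("003", 2, 36, 1),
    ("004", 2, 48, 1),
    ("005", 1, 26, 1),
]

# Q1 coding is given on the slide; the same coding is ASSUMED for Q3.
labels = {1: "yes", 2: "no"}

# Frequency of each answer to Q1.
q1_counts = Counter(labels[q1] for _, q1, _, _ in rows)

# Cross-tabulation: (smokes?, supports ordinance?) -> number of respondents.
crosstab = Counter((labels[q1], labels[q3]) for _, q1, _, q3 in rows)

# Mean of the numeric variable Q2.
mean_age = sum(age for _, _, age, _ in rows) / len(rows)

print(q1_counts)
print(crosstab)
print(mean_age)
```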
Dig deeper
- Did different groups show different results?
- Were there findings that surprised you?
- Are there things you don't understand very well, so that further study is needed?

Group | Supports restaurant ordinance | Opposes restaurant ordinance | Undecided/declined to comment
Current smokers (n=55) | 8 (15% of smokers) | 33 (60% of smokers) | 14 (25% of smokers)
Non-smokers (n=200) | 170 (86% of non-smokers) | 16 (8% of non-smokers) | 12 (6% of non-smokers)
Total (N=255) | 178 (70% of all respondents) | 49 (19% of all respondents) | 26 (11% of all respondents)
Discussing limitations
Written reports:
- Be explicit about your limitations
Oral reports:
- Be prepared to discuss limitations
Be honest about limitations
Know the claims you cannot make:
- Do not claim causation without a true experimental design
- Do not generalize to the population without a random sample and quality administration (e.g., a survey response rate below 60% is not good enough)
HYPOTHESIS TESTING
- Choosing the formula for hypothesis testing
- Types of hypothesis tests
Summarizing Data
- Data are a bunch of values of one or more variables.
- A variable is something that has different values.
  - Values can be numbers or names, depending on the variable:
    - Numeric, e.g. weight
    - Counting, e.g. number of injuries
    - Ordinal, e.g. competitive level (values are numbers/names)
    - Nominal, e.g. sex (values are names)
  - When values are numbers, visualize the distribution of all values in stem and leaf plots or in a frequency histogram.
    - Can also use normal probability plots to visualize how well the values fit a normal distribution.
  - When values are names, visualize the frequency of each value with a pie chart or just a list of values and frequencies.
- A statistic is a number summarizing a bunch of values.
  - Simple or univariate statistics summarize values of one variable.
  - Effect or outcome statistics summarize the relationship between values of two or more variables.
Simple statistics for numeric variables…
- Mean: the average
- Standard deviation: the typical variation
- Standard error of the mean: the typical variation in the mean with repeated sampling
  - Multiply by √(sample size) to convert to standard deviation.
- Use these also for counting and ordinal variables.
- Use median (middle value or 50th percentile) and quartiles (25th and 75th percentiles) for grossly non-normally distributed data.
- Summarize these and other simple statistics visually with box and whisker plots.
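
The link between the standard deviation and the standard error of the mean is a factor of the square root of the sample size. The sketch below checks that on a hypothetical list of body weights (the values are illustrative only) and also computes the median and quartiles.

```python
import math
import statistics

# Hypothetical body weights in kg (not from the slides).
weights = [61.0, 72.5, 68.0, 80.5, 75.0, 66.5]

n = len(weights)
sd = statistics.stdev(weights)        # typical variation between subjects
sem = sd / math.sqrt(n)               # typical variation in the mean with repeated sampling
sd_recovered = sem * math.sqrt(n)     # multiplying by sqrt(n) converts SEM back to SD

# Quartiles (25th, 50th, 75th percentiles) for non-normally distributed data.
q1, median, q3 = statistics.quantiles(weights, n=4)

print(sd, sem, sd_recovered)
print(q1, median, q3)
```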

Simple statistics for nominal variables
- Frequencies, proportions, or odds.
- Can also use these for ordinal variables.

Effect statistics…
- Derived from a statistical model (equation) of the form Y (dependent) vs X (predictor or independent).
- Depend on type of Y and X. Main ones:

Y | X | Model/Test | Effect statistics
numeric | numeric | regression | slope, intercept, correlation
numeric | nominal | t test, ANOVA | mean difference
nominal | nominal | chi-square | frequency difference or ratio
nominal | numeric | categorical | frequency ratio per…

Model: numeric vs numeric
e.g. body fat vs sum of skinfolds
[Figure: body fat (%BM) vs sum of skinfolds (mm)]
- Model or test: linear regression
- Effect statistics:
  - slope and intercept = parameters
  - correlation coefficient or variance explained (= 100·correlation²) = measures of goodness of fit
- Other statistics:
  - typical or standard error of the estimate = residual error = best measure of validity (with criterion variable on the Y axis)
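
A minimal least-squares sketch of these quantities follows. The skinfold and body-fat numbers are made up for illustration, and the standard error of the estimate is computed with n - 2 degrees of freedom, the usual choice for simple linear regression.

```python
import math
import statistics

# Hypothetical data (not from the slides): X = sum of skinfolds (mm), Y = body fat (%BM).
skinfolds = [40.0, 55.0, 62.0, 75.0, 90.0, 110.0]
body_fat = [9.5, 12.0, 14.0, 15.5, 19.0, 22.5]

n = len(skinfolds)
mean_x = statistics.mean(skinfolds)
mean_y = statistics.mean(body_fat)

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in skinfolds)
syy = sum((y - mean_y) ** 2 for y in body_fat)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(skinfolds, body_fat))

slope = sxy / sxx                        # parameter: change in Y per unit of X
intercept = mean_y - slope * mean_x      # parameter: predicted Y when X = 0
r = sxy / math.sqrt(sxx * syy)           # correlation coefficient
variance_explained = 100 * r ** 2        # percent of variance explained

# Residuals and the standard error of the estimate (residual error).
residuals = [y - (intercept + slope * x) for x, y in zip(skinfolds, body_fat)]
see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(slope, intercept, r, variance_explained, see)
```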

Model: numeric vs nominal
e.g. strength vs sex
[Figure: strength for female and male groups (x axis: sex)]
- Model or test:
  - t test (2 groups)
  - 1-way ANOVA (>2 groups)
- Effect statistics:
  - difference between means, expressed as raw difference, percent difference, or fraction of the root mean square error (Cohen's effect-size statistic)
  - variance explained or better √(variance explained/100) = measures of goodness of fit
- Other statistics:
  - root mean square error = average standard deviation of the two groups
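
The sketch below works through these statistics for two small hypothetical groups. It takes the root mean square error to be the pooled standard deviation of the two groups, which is how the equal-variance t test treats it; the data are invented for illustration.

```python
import math
import statistics

# Hypothetical strength scores (not from the slides).
male = [100, 110, 120, 130, 140]
female = [80, 90, 100, 110, 120]

n1, n2 = len(male), len(female)
mean_m, mean_f = statistics.mean(male), statistics.mean(female)

raw_diff = mean_m - mean_f                  # difference between means
percent_diff = 100 * raw_diff / mean_f      # difference as a percent of the female mean

# Root mean square error, taken here as the pooled standard deviation.
pooled_var = ((n1 - 1) * statistics.variance(male)
              + (n2 - 1) * statistics.variance(female)) / (n1 + n2 - 2)
rmse = math.sqrt(pooled_var)

effect_size = raw_diff / rmse               # Cohen's effect-size statistic (d)

# Equal-variance unpaired t statistic for the two groups.
t_stat = raw_diff / (rmse * math.sqrt(1 / n1 + 1 / n2))

print(raw_diff, percent_diff, rmse, effect_size, t_stat)
```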

More on expressing the magnitude of the effect
- What often matters is the difference between means relative to the standard deviation:
[Figure: two panels showing distributions of strength for females and males, one labelled "Trivial effect" and one labelled "Very large effect"]
- Fraction or multiple of a standard deviation is known as the effect-size statistic (or Cohen's "d").
- Cohen suggested thresholds for correlations and effect sizes.
- Hopkins agrees with the thresholds for correlations but suggests others for the effect size:

Lower bound of each band | trivial | small | moderate | large | very large | !!!
Correlations, Cohen | 0 | 0.1 | 0.3 | 0.5 | |
Correlations, Hopkins | 0 | 0.1 | 0.3 | 0.5 | 0.7 | 0.9 (maximum 1)
Effect sizes, Cohen | 0 | 0.2 | 0.5 | 0.8 | |
Effect sizes, Hopkins | 0 | 0.2 | 0.6 | 1.2 | 2.0 | 4.0

- For studies of athletic performance, percent differences or changes in the mean are better than Cohen effect sizes.
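
A small helper can turn a correlation or effect size into one of these labels. The sketch below uses the Hopkins thresholds listed above as the lower bound of each band; the function name `magnitude` and the choice to put a value that sits exactly on a threshold into the higher band are assumptions made for illustration. The top band is labelled only "!!!" on the slide, so that label is kept as is.

```python
import bisect

# Lower bounds of the bands above "trivial", as listed on the slide (Hopkins scale).
HOPKINS_EFFECT_SIZE = [0.2, 0.6, 1.2, 2.0, 4.0]
HOPKINS_CORRELATION = [0.1, 0.3, 0.5, 0.7, 0.9]
LABELS = ["trivial", "small", "moderate", "large", "very large", "!!!"]


def magnitude(value, thresholds):
    """Return the qualitative label for an effect, ignoring its sign.

    Assumption: a value exactly on a threshold falls into the higher band.
    """
    return LABELS[bisect.bisect_right(thresholds, abs(value))]


print(magnitude(1.26, HOPKINS_EFFECT_SIZE))    # large
print(magnitude(0.1, HOPKINS_EFFECT_SIZE))     # trivial
print(magnitude(-0.45, HOPKINS_CORRELATION))   # moderate
```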


Model: numeric vs nominal (repeated measures)
e.g. strength vs trial
[Figure: strength at pre and post trials]
- Model or test:
  - paired t test (2 trials)
  - repeated-measures ANOVA with one within-subject factor (>2 trials)
- Effect statistics:
  - change in mean expressed as raw change, percent change, or fraction of the pre standard deviation
- Other statistics:
  - within-subject standard deviation (not visible on above plot) = typical error: conveys error of measurement
    - useful to gauge reliability, individual responses, and magnitude of effects (for measures of athletic performance).
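
For two trials these statistics can be computed from the change scores. The sketch below uses hypothetical pre and post values. The slide does not give a formula for the typical error; the one used here (standard deviation of the change scores divided by √2) is a common way to estimate it and should be read as an assumption.

```python
import math
import statistics

# Hypothetical strength scores for the same five subjects (not from the slides).
pre = [100, 105, 98, 110, 102]
post = [104, 108, 103, 113, 107]

changes = [after - before for before, after in zip(pre, post)]
n = len(changes)

mean_change = statistics.mean(changes)                    # raw change in the mean
percent_change = 100 * mean_change / statistics.mean(pre)
standardized_change = mean_change / statistics.stdev(pre)  # fraction of the pre SD

sd_change = statistics.stdev(changes)
t_paired = mean_change / (sd_change / math.sqrt(n))       # paired t statistic

# ASSUMPTION: typical error estimated as SD of change scores / sqrt(2).
typical_error = sd_change / math.sqrt(2)

print(mean_change, percent_change, standardized_change, t_paired, typical_error)
```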

Model: nominal vs nominal
e.g. sport vs sex
[Figure: proportion playing rugby (yes/no): females 30%, males 75%]
- Model or test: chi-squared test or contingency table
- Effect statistics:
  - Relative frequencies, expressed as a difference in frequencies, ratio of frequencies (relative risk), or ratio of odds (odds ratio)
  - Relative risk is appropriate for cross-sectional or prospective designs.
    - risk of having rugby disease for males relative to females is (75/100)/(30/100) = 2.5
  - Odds ratio is appropriate for case-control designs.
    - calculated as (75/25)/(30/70) = 7.0
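
The two calculations above can be reproduced in a few lines. The function names are illustrative; the counts (75 of 100 males, 30 of 100 females) are the ones used on the slide.

```python
def relative_risk(cases_a, total_a, cases_b, total_b):
    """Ratio of the frequency in group A to the frequency in group B."""
    return (cases_a / total_a) / (cases_b / total_b)


def odds_ratio(cases_a, total_a, cases_b, total_b):
    """Ratio of the odds (cases : non-cases) in group A to the odds in group B."""
    odds_a = cases_a / (total_a - cases_a)
    odds_b = cases_b / (total_b - cases_b)
    return odds_a / odds_b


# Slide example: 75 of 100 males and 30 of 100 females play rugby.
rr = relative_risk(75, 100, 30, 100)
oratio = odds_ratio(75, 100, 30, 100)

print(round(rr, 1))       # 2.5
print(round(oratio, 1))   # 7.0
```

The gap between 2.5 and 7.0 shows why the two statistics should not be used interchangeably when the outcome is common.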

Model: nominal vs numeric
e.g. heart disease vs age
[Figure: heart disease (%), from 0 to 100, against age (y), from 30 to 70]
- Model or test: categorical modeling
- Effect statistics:
  - relative risk or odds ratio per unit of the numeric variable (e.g., 2.3 per decade)

Model: ordinal or counts vs whatever
- Can sometimes be analyzed as numeric variables using regression or t tests
- Otherwise logistic regression or generalized linear modeling
Complex models
- Most reducible to t tests, regression, or relative frequencies.
- Example…

Model: controlled trial (numeric vs 2 nominals)
e.g. strength vs trial vs group
[Figure: line diagram of strength at pre and post trials for drug and placebo groups]
- Model or test:
  - unpaired t test of change scores (2 trials, 2 groups)
  - repeated-measures ANOVA with within- and between-subject factors (>2 trials or groups)
  - Note: use line diagram, not bar graph, for repeated measures.
- Effect statistics:
  - difference in change in mean expressed as raw difference, percent difference, or fraction of the pre standard deviation
- Other statistics:
  - standard deviation representing individual responses (derived from within-subject standard deviations in the two groups)
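
For the simplest case (2 trials, 2 groups), the analysis reduces to comparing change scores between the groups. The sketch below uses invented data. The slide does not state how the standard deviation for individual responses is derived; the formula used here (square root of the difference between the variances of the change scores in the two groups) is an assumption, and it is clamped at zero when the placebo group happens to be more variable.

```python
import math
import statistics

# Hypothetical strength scores (not from the slides).
drug_pre = [100, 102, 98, 105, 95]
drug_post = [106, 109, 101, 113, 99]
placebo_pre = [101, 99, 103, 97, 100]
placebo_post = [102, 100, 105, 97, 101]

drug_change = [b - a for a, b in zip(drug_pre, drug_post)]
placebo_change = [b - a for a, b in zip(placebo_pre, placebo_post)]

# Effect statistic: difference in the change in the mean (drug minus placebo).
effect = statistics.mean(drug_change) - statistics.mean(placebo_change)

# Unpaired (equal-variance) t test of the change scores.
n1, n2 = len(drug_change), len(placebo_change)
var_drug = statistics.variance(drug_change)
var_placebo = statistics.variance(placebo_change)
pooled_var = ((n1 - 1) * var_drug + (n2 - 1) * var_placebo) / (n1 + n2 - 2)
t_unpaired = effect / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# ASSUMPTION: SD of individual responses = sqrt(var of drug changes - var of placebo changes).
sd_individual = math.sqrt(max(var_drug - var_placebo, 0.0))

print(effect, t_unpaired, sd_individual)
```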

Model: extra predictor variable to "control for something"
e.g. heart disease vs physical activity vs age
- Can't reduce to anything simpler.
- Model or test:
  - multiple linear regression or analysis of covariance (ANCOVA)
  - Equivalent to the effect of physical activity with everyone at the same age.
  - Reduction in the effect of physical activity on disease when age is included implies age is at least partly the reason or mechanism for the effect.
  - Same analysis gives the effect of age with everyone at same level of physical activity.
- Can use special analysis (mixed modeling) to include a mechanism variable in a repeated-measures model. See separate presentation at newstats.org.

Problem: some models don't fit uniformly for different subjects
- That is, between- or within-subject standard deviations differ between some subjects.
- Equivalently, the residuals are non-uniform (have different standard deviations for different subjects).
- Determine by examining standard deviations or plots of residuals vs predicteds.
- Non-uniformity makes p values and confidence limits wrong.
- How to fix…
  - Use unpaired t test for groups with unequal variances, or…
  - Try taking log of dependent variable before analyzing, or…
  - Find some other transformation. As a last resort…
  - Use rank transformation: convert dependent variable to ranks before analyzing (= non-parametric analysis; same as Wilcoxon, Kruskal-Wallis and other tests).
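
Two of the fixes above, the log transformation and the rank transformation, are easy to sketch. The helper name `rank_transform` and the sample values are illustrative; ties receive the average of the ranks they span, which is the convention rank-based tests normally use.

```python
import math


def rank_transform(values):
    """Convert values to ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with the value at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical skewed dependent variable (not from the slides).
values = [10, 200, 10, 35]

log_values = [math.log(v) for v in values]   # log transformation (values must be > 0)
ranks = rank_transform(values)               # rank transformation

print(log_values)
print(ranks)   # [1.5, 4.0, 1.5, 3.0]
```

The transformed values would then be analyzed with the same t test, ANOVA, or regression as before.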
Generalizing from a Sample to a Population
- You study a sample to find out about the population.
  - The value of a statistic for a sample is only an estimate of the true (population) value.
  - Express precision or uncertainty in true value using 95% confidence limits.
    - Confidence limits represent likely range of the true value.
    - They do NOT represent a range of values in different subjects.
    - There's a 5% chance the true value is outside the 95% confidence interval: the Type 0 error rate.
    - Interpret the observed value and the confidence limits as clinically or practically beneficial, trivial, or harmful.
    - Even better, work out the probability that the effect is clinically or practically beneficial/trivial/harmful. See sportsci.org.
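
As a rough sketch of how 95% confidence limits for a mean can be obtained, the function below uses the normal approximation (mean ± z × standard error). This is an assumption suited to larger samples; for small samples the t distribution gives wider, more accurate limits. The function name and the data are illustrative.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_limits(values, level=0.95):
    """Approximate confidence limits for the mean using the normal distribution.

    Assumption: the sample is large enough for the normal approximation;
    small samples call for the t distribution instead.
    """
    n = len(values)
    sem = stdev(values) / math.sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% level
    m = mean(values)
    return m - z * sem, m + z * sem


# Hypothetical sample (not from the slides).
times = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
lower, upper = confidence_limits(times)

print(lower, upper)
```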

- Statistical significance is an old-fashioned way of generalizing, based on testing whether the true value could be zero or null.
  - Assume the null hypothesis: that the true value is zero (null).
  - If your observed value falls in a region of extreme values that would occur only 5% of the time, you reject the null hypothesis.
  - That is, you decide that the true value is unlikely to be zero; you can state that the result is statistically significant at the 5% level.
  - If the observed value does not fall in the 5% unlikely region, most people mistakenly accept the null hypothesis: they conclude that the true value is zero or null!
  - The p value helps you decide whether your result falls in the unlikely region.

    - If p<0.05, your result is in the unlikely region.
- One meaning of the p value: the probability of a more extreme observed value (positive or negative) when true value is zero.
- Better meaning of the p value: if you observe a positive effect, 1 - p/2 is the chance the true value is positive, and p/2 is the chance the true value is negative. Ditto for a negative effect.
  - Example: you observe a 1.5% enhancement of performance (p=0.08). Therefore there is a 96% chance that the true effect is any "enhancement" and a 4% chance that the true effect is any "impairment".
  - This interpretation does not take into account trivial enhancements and impairments.
- Therefore, if you must use p values, show exact values, not p<0.05 or p>0.05.
  - Meta-analysts also need the exact p value (or confidence limits).
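
The "better meaning" above is simple arithmetic on the p value. The sketch below follows the slide's reading (chance of the same sign = 1 - p/2, chance of the opposite sign = p/2) and reproduces the worked example; the function name is illustrative.

```python
def chances_from_p(p):
    """Return (chance the true effect has the observed sign, chance it has the opposite sign).

    Follows the interpretation given on the slide: 1 - p/2 and p/2.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return 1 - p / 2, p / 2


# Slide example: a 1.5% enhancement observed with p = 0.08.
same_sign, opposite_sign = chances_from_p(0.08)

print(f"{same_sign:.0%} chance of any enhancement")     # 96% chance of any enhancement
print(f"{opposite_sign:.0%} chance of any impairment")  # 4% chance of any impairment
```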
- If the true value is zero, there's a 5% chance of getting statistical significance: the Type I error rate, or rate of false positives or false alarms.
- There's also a chance that the smallest worthwhile true value will produce an observed value that is not statistically significant: the Type II error rate, or rate of false negatives or failed alarms.
  - In the old-fashioned approach to research design, you are supposed to have enough subjects to make a Type II error rate of 20%: that is, your study is supposed to have a power of 80% to detect the smallest worthwhile effect.
- If you look at lots of effects in a study, there's an increased chance of being wrong about at least one of them.
  - Old-fashioned statisticians like to control this inflation of the Type I error rate within an ANOVA to make sure the increased chance is kept to 5%. This approach is misguided.
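
To see how quickly the chance of at least one false positive grows with the number of effects examined, the sketch below uses the standard formula 1 - (1 - α)^k. It assumes the k tests are independent and that every true value is zero, which real studies rarely satisfy exactly; the function name is illustrative.

```python
def chance_of_any_false_positive(num_effects, alpha=0.05):
    """Chance of at least one Type I error across independent tests of null effects.

    Assumption: the tests are independent and every true value is zero.
    """
    return 1 - (1 - alpha) ** num_effects


for k in (1, 5, 10, 20):
    print(k, round(chance_of_any_false_positive(k), 3))
```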

- The standard error of the mean (typical variation in the mean from sample to sample) can convey statistical significance.
  - Non-overlap of the error bars of two groups implies a statistically significant difference, but only for groups of equal size (e.g. males vs females).
  - In particular, non-overlap does NOT convey statistical significance in experiments:
[Figure: pre and post values plotted as Mean ± SEM, identical in both cases; with high reliability p = 0.003, with low reliability p = 0.2]

In summary
- If you must use statistical significance, show exact p values.
- Better still, show confidence limits instead.
- NEVER show the standard error of the mean!
- Show the usual between-subject standard deviation to convey the spread between subjects.
  - In population studies, this standard deviation helps convey magnitude of differences or changes in the mean.
- In interventions, show also the within-subject standard deviation (the typical error) to convey precision of measurement.
  - In athlete studies, this standard deviation helps convey magnitude of differences or changes in mean performance.
CHOOSING THE FORMULA
- What must you consider when choosing the formula you will use to test a hypothesis?
- Does ALL research HAVE to test hypotheses? What about research with a single variable?

WHAT TO CONSIDER
- The type of hypothesis being built:
  - Descriptive
  - Associative
  - Comparative
- The measurement scale used:
  - Nominal, ordinal, interval, ratio.
- The variation in the population:
  - homogeneous or heterogeneous.
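Statistical tests by data type and form of hypothesis

Data type: Nominal
- Descriptive (one variable): Binomial; χ² One Sample
- Comparative (two samples), related: McNemar
- Comparative (two samples), independent: Fisher Exact Probability; χ² Two Sample
- Comparative (more than 2 samples), related: χ² for k sample; Cochran Q
- Comparative (more than 2 samples), independent: χ² for k sample
- Associative (relationship): Contingency Coefficient C

Data type: Ordinal
- Descriptive (one variable): Run Test
- Comparative (two samples), related: Sign test; Wilcoxon matched pairs
- Comparative (two samples), independent: Mann-Whitney U test; Median test; Kolmogorov-Smirnov; Wald-Wolfowitz
- Comparative (more than 2 samples), related: Friedman Two-Way Anova
- Comparative (more than 2 samples), independent: Median Extension; Kruskal-Wallis One Way Anova
- Associative (relationship): Spearman Rank Correlation; Kendall Tau

Data type: Interval, Ratio
- Descriptive (one variable): T Test*
- Comparative (two samples), related: T-test of Related*
- Comparative (two samples), independent: T-test of Independent*
- Comparative (more than 2 samples), related: One-Way Anova*; Two Way Anova*
- Comparative (more than 2 samples), independent: One-Way Anova*; Two Way Anova*
- Associative (relationship): Pearson Product Moment*; Partial Correlation*; Multiple Correlation*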
(6) Statistical testing.
- This stage tests the hypothesis built by the researcher.
- "Significance test" is another name for this statistical test.
- A significance test aims to obtain empirical evidence for generalizing to the population from which the data were drawn.

FINAL SEMESTER EXAM (UAS)
- COMPLETE PROPOSAL + QUESTIONNAIRE FOR THE FINAL EXAM: DEADLINE 10 DECEMBER 2012, 12.00 WIB
- YOU ARE ADVISED TO DISCUSS YOUR PROPOSAL AND QUESTIONNAIRE ONE LAST TIME WITH YOUR SUPERVISOR.
- GRADE WEIGHTING
  - ORAL EXAM 65%
  - WRITTEN EXAM 35%
- THE ORAL EXAM WILL BE HELD AROUND 10 TO 22 DECEMBER 2012 (OR SEE FURTHER ANNOUNCEMENTS)