Analysing and Understanding Learning Assessment for Evidence-based Policy Making
Introduction to statistics
Dr Alvin Vista, ACER
Bangkok, 14-18 Sept. 2015
Australian Council for Educational Research
Structure of workshop
Lecture/presentation – focus on concepts, brief review
Practical exercises – using the most common and accessible
software – Excel; Hands-on, to maximise transfer of knowledge
and develop skill
Discussion and interpretation of sample studies
Collaborative setting
• If you don’t know how to do something, seek help.
• If you know how to do something, provide help.
• If you’re not sure, interact.
Drawing knowledge from reality
[Diagram: Reality → Measurement theory → Data → Statistical theory → Knowledge]
Statistics allows us to draw knowledge or conclusions from the data.
Measurement theory allows us to draw meaningful data from reality.
What is Measurement?
A formal definition:
‘Measurement may be regarded as the
construction of homomorphisms (scales) from
empirical relational structures of interest into
numerical relational structures that are useful.’
(Krantz et al., 1971, p.9)
In other words:
Measurement is a process where a variable (or
construct) can be converted into a number in a
consistent manner.
Data and Models
DATA: Observations, Measurements, Sensory perceptions
MODELS: Theories, Interpretations, Generalisations
Data and Models
DATA: What you see (observed)
MODELS: What Google Maps says
Data and Models
If there is a mismatch between data and model, which is more likely to be wrong?
Data cannot be changed, but the methods used to collect data can be improved to increase the quality of the data.
Drawing knowledge from reality
We can do statistical analysis ONLY AFTER we’re confident that our data is
reliable
[Diagram: Reality → Measurement theory → Data → Statistical theory → Knowledge]
Better data = better fitting models
Better fitting models = better understanding of reality
Fundamentals: Statistics
• Statistics is the study of data.
• It is concerned with the:
– Collection
– Analysis
– Presentation
– Interpretation
of data.
Data in education research
• In an educational context
– records are usually students, schools or parents
– variables are usually
• responses of students to the test items, or
• responses of students, school principals or parents to the questionnaire items
Data: values for variables & records
• In educational data
– responses to a particular item from all respondents form the values for the corresponding variable
• a column of values in our imagined table
– responses from a particular respondent to all items form the values for the corresponding record
• a row of values in our imagined table
Levels of measurement
• Nominal: Denote a category; statistics
include counts such as mode and frequency
distributions
• Ordinal: Rank order is described but
successive categories do not denote equal
differences of the measured attribute;
statistics include median
• Interval: Where the measurement is
presumed to denote equal intervals between
scores. Both the base point and unit of
measurement are arbitrary
• Ratio: Note that ratio scales have a natural
base value that cannot be changed (i.e., a
zero in one unit means the same in all other
units). Only the unit of measurement is
arbitrary.
Non-metric: categorical measures which describe differences in type or kind; arithmetical operations are not applicable
Metric: continuous measures which reflect differences in amount or degree
In a nutshell
Level of measurement has direct
implications for how relationships
within and between variables can
be contained and identified
Levels of measurement and measures of distribution characteristics
Level            Central tendency      Spread
Nominal          Mode                  Percent distribution
Ordinal          Median, Mode          Minimum/Maximum, Range, Percentiles, Percent distribution
Interval/Ratio   Mean, Median, Mode    Variance, Standard deviation, Minimum/Maximum, Range, Percentiles, Percent distribution
Measures of central tendency
Measure: Mode
Definition: The attribute of a variable that occurs most often in the data set
Example: Variable = Nationality; Mode = Indonesian
Measure: Median
Definition: The value of the middle case when the cases have been placed in order from low to high
Example: Variable = Rank (1st, 2nd, 3rd, … 7th); Median = 5th
Measure: Mean
Definition: The arithmetic mean or average. Computed by summing all the valid cases and dividing by the number of valid cases.
Example: Variable = Age; Mean = 24.35
Levels of measurement determine the possible statistical analyses
Nominal
– Cross-tabulations
– Chi-square
– Frequencies
INCOME by MARITAL STATUS cross-tabulation
INCOME   Statistic         SINGLE   MARRIED   Total
1.00     Count                  7        64      71
         Expected Count        16        55      71
2.00     Count                  3        36      39
         Expected Count         9        30      39
3.00     Count                  5        74      79
         Expected Count        18        61      79
4.00     Count                 13        71      84
         Expected Count        19        65      84
5.00     Count                  7        39      46
         Expected Count        10        36      46
6.00     Count                 11        25      36
         Expected Count         8        28      36
7.00     Count                 17        19      36
         Expected Count         8        28      36
8.00     Count                 12         7      19
         Expected Count         4        15      19
9.00     Count                 12         2      14
         Expected Count         3        11      14
10.00    Count                 12         3      15
         Expected Count         3        12      15
Total    Count                 99       340     439
         Expected Count        99       340     439
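As a rough illustration of where the expected counts and the chi-square statistic for a cross-tabulation come from, here is a minimal Python sketch. The observed counts are copied from the INCOME by MARITAL STATUS table; the function and variable names are illustrative assumptions and are not part of the workshop materials.

```python
# Observed counts (SINGLE, MARRIED) for income categories 1.00 to 10.00,
# copied from the INCOME by MARITAL STATUS cross-tabulation.
observed = [
    (7, 64), (3, 36), (5, 74), (13, 71), (7, 39),
    (11, 25), (17, 19), (12, 7), (12, 2), (12, 3),
]

def chi_square(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

expected, statistic, df = chi_square(observed)
print([round(row[0]) for row in expected])  # rounded expected counts, SINGLE
print(round(statistic, 2), df)
```

The larger the gap between observed and expected counts, the larger the chi-square statistic; it is then compared against a chi-square distribution with the given degrees of freedom.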
Levels of measurement determine the
possible statistical analyses
Ordinal
– Spearman correlations
– Non-parametric analyses
Interval and ratio
– Pearson correlations
– Parametric analyses
Measures of central tendency
• Mode [=MODE(target range)]
• Median [=MEDIAN(target range)]
• Mean [=AVERAGE(target range)]
• Let’s try that using the sample data!
– TIMSS Country_X Grade 8 data.xlsx
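For readers working outside Excel, the three functions above have close counterparts in Python's standard `statistics` module. This is only a sketch; the scores are made up for illustration and are not taken from the TIMSS file.

```python
import statistics

# Hypothetical scores, not taken from the TIMSS data file.
scores = [12, 15, 15, 18, 20, 22, 15]

mode = statistics.mode(scores)      # counterpart of =MODE(target range)
median = statistics.median(scores)  # counterpart of =MEDIAN(target range)
mean = statistics.mean(scores)      # counterpart of =AVERAGE(target range)

print(mode, median, round(mean, 2))
```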
Measures of spread
• Standard deviation [=STDEV(target range)]
• Min [=MIN(target range)]
• Max [=MAX(target range)]
• Percentiles [=PERCENTILE(target range, kth percentile)] where k ranges from 0.00 to 1.00
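The same measures of spread can be sketched in Python. `statistics.stdev` is the sample standard deviation, as STDEV is. The `percentile_inclusive` helper is an illustrative assumption (not a library function) that follows the interpolation convention used by Excel's PERCENTILE; the scores are hypothetical.

```python
import statistics

def percentile_inclusive(values, k):
    """k-th percentile (0 <= k <= 1) with linear interpolation between ranks,
    the convention used by Excel's PERCENTILE."""
    ordered = sorted(values)
    position = k * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

sd = statistics.stdev(scores)        # counterpart of =STDEV(target range)
lowest, highest = min(scores), max(scores)
p50 = percentile_inclusive(scores, 0.50)

print(round(sd, 3), lowest, highest, p50)
```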
Characteristics of a distribution
Skewness: a measure of the asymmetry of a
distribution.
[=SKEW(target range)]
The normal distribution is symmetric and has a skewness
value of zero.
– Positive skewness: a long right tail.
– Negative skewness: a long left tail.
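To make the sign of the skewness statistic concrete, here is a small Python sketch of the sample skewness formula that SKEW is documented to use. The function name and the example values are assumptions for illustration only.

```python
import statistics

def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]   # a few large values: long right tail
left_tail = [1, 9, 10, 10, 10]  # a few small values: long left tail

print(sample_skewness(symmetric))
print(sample_skewness(right_tail) > 0, sample_skewness(left_tail) < 0)
```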
Characteristics of a distribution
Kurtosis: A measure of the extent to which
observations cluster around a central point.
• For a normal distribution, the value of the kurtosis statistic (excess kurtosis, as reported by KURT) is zero.
• [=KURT(target range)]
– Leptokurtic data values are
more peaked (positive kurtosis)
– Platykurtic data values are
flatter and more dispersed along
the X axis (negative kurtosis)
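A minimal Python sketch of the sample excess kurtosis formula that KURT is documented to use is given below. The function name and data are illustrative assumptions; the formula needs at least four values and a non-zero standard deviation.

```python
import statistics

def sample_excess_kurtosis(values):
    """Sample excess kurtosis (zero for a normal distribution)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    fourth_powers = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_powers - correction

flat = [1, 2, 3, 4, 5]            # evenly spread values: platykurtic
peaked = [1, 5, 5, 5, 5, 5, 9]    # clustered at the centre: leptokurtic

print(sample_excess_kurtosis(flat), sample_excess_kurtosis(peaked))
```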
Measures of spread
• Frequencies [=FREQUENCY(target range, bins)] where each bin value is the upper limit of a group; a group includes values above the previous bin value, up to and including its own bin value
– Bins = 10, 20, 30 will result in 4 groups (number of bins + 1)
– Group 1 = less than or equal to 10
– Group 2 = 11 to 20
– Group 3 = 21 to 30
– Group 4 = more than 30
– Enter as an array formula
– (Write the formula in the first cell of the output range, select an output range equal to the number of groups, press F2, then CTRL+SHIFT+ENTER)
• Percent distribution can be computed by dividing frequencies by total cases [=COUNTIF(target range, value)]
• Let’s try that using the sample data!
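The binning rule described for FREQUENCY (each group runs up to and including its bin value, with one extra group above the last bin) can be sketched in Python as follows. The function names and the sample values are assumptions for illustration; `bins` must be sorted in ascending order.

```python
from bisect import bisect_left

def frequency(values, bins):
    """Count values per group: <= bins[0], (bins[0], bins[1]], ..., > bins[-1]."""
    counts = [0] * (len(bins) + 1)
    for value in values:
        # bisect_left gives the number of bin limits strictly below the value,
        # which is the index of the group the value belongs to.
        counts[bisect_left(bins, value)] += 1
    return counts

def percent_distribution(counts):
    """Each group's share of the total number of cases, in percent."""
    total = sum(counts)
    return [100 * count / total for count in counts]

values = [5, 10, 11, 20, 21, 30, 31, 100]  # hypothetical values
counts = frequency(values, bins=[10, 20, 30])
print(counts, percent_distribution(counts))
```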
Practical exercise!
• TIMSS Country_X grade 8.xlsx
• Complete the Descriptive statistics for
Boys, Girls, and the whole sample
• Save your results as we will use them in
later sessions.
Inferential statistics and
Hypothesis testing
Confidence intervals
Standard error of the mean
• The standard error is an indicator of how precise the statistic is, and how close it is ‘probabilistically’ to the parameter (e.g., the true mean).
• Confidence intervals are based on the SE
$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$
$CI_{\text{lower}} = \bar{X} - Z(SE_{\bar{X}})$
$CI_{\text{upper}} = \bar{X} + Z(SE_{\bar{X}})$
Z = 1.96 corresponds to a 95% CI
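A minimal Python sketch of these formulas follows: the standard error is the sample standard deviation divided by the square root of the sample size, and the interval is the mean plus or minus Z standard errors. The function name and the scores are illustrative assumptions.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """Return (mean, standard error, CI lower, CI upper); z=1.96 gives a 95% CI."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se, mean - z * se, mean + z * se

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
mean, se, lower, upper = mean_confidence_interval(scores)
print(mean, round(se, 3), round(lower, 3), round(upper, 3))
```

Because the standard error shrinks with the square root of n, quadrupling the sample size roughly halves the width of the interval.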
Confidence intervals
• Let’s try computing SEs and CIs with data!
• Means and SDs for boys and girls on
Math achievement
• Standard error of the means
• Confidence intervals
Inferential Statistics
Estimating population parameters
• Inferential statistics can show how closely the sample statistics approximate parameters of the overall population.
– The sample is randomly chosen and representative of the total population.
– The means we might obtain from an infinite number of samples form a normal distribution.
Source: Johnson, B. & Christensen, L. (2012). Educational Research:
Quantitative, Qualitative, and Mixed Approaches. Thousand Oaks, CA: Sage.
Inferential Statistics
What can we say if we have a sample, and its confidence interval does not overlap with the confidence interval of another sample?
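One way to explore this question is to check the overlap directly. In the sketch below, the function name and the interval values are hypothetical. As a general rule of thumb, two 95% confidence intervals that do not overlap suggest that the difference between the means is statistically significant at about the .05 level, whereas overlapping intervals do not by themselves show that the means are equal.

```python
def intervals_overlap(ci_a, ci_b):
    """True if two (lower, upper) confidence intervals share any values."""
    (low_a, high_a), (low_b, high_b) = ci_a, ci_b
    return low_a <= high_b and low_b <= high_a

# Hypothetical 95% confidence intervals for two group means.
boys = (480.0, 492.0)
girls = (495.0, 507.0)

print(intervals_overlap(boys, girls))  # no shared values for these intervals
```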
Inferential Statistics
• Testing Hypotheses (1)
• Research hypothesis vs. statistical hypothesis
• Statistical hypothesis testing: comparing the distribution of data collected by a researcher with an ideal, or hypothetical, distribution
– significance level/alpha (α): e.g., .05, .01; must be set before testing!
– “statistically significant” means there is sufficient evidence to reject the null hypothesis.
– it does NOT mean that the alternative hypothesis is true.
Statistical hypothesis testing
• Testing Hypotheses (2)
• Making errors in hypothesis testing
– Type I error: alpha error, or false positive
– Type II error: beta error, or false negative
Relationship between α and β
Statistical hypothesis testing
                              Reality: Not guilty              Reality: Guilty
Our conclusion: Not guilty    Correct conclusion               Type II error (false negative)
Our conclusion: Guilty        Type I error (false positive)    Correct conclusion
Sometimes, false positive is worse
“It’s better for ten guilty men to go free than for one innocent man to be executed”
-- anonymous wise lawyer
                              Reality: Not guilty    Reality: Guilty
Our conclusion: Not guilty    Correct conclusion     Freed guilty
Our conclusion: Guilty        Condemned innocent     Correct conclusion
In fire alarms, false positive is better
A false negative would REALLY suck.
-- anonymous wise homeowner
                           Reality: No fire      Reality: Fire
Our conclusion: No fire    Correct conclusion    Burned to death
Our conclusion: Fire       Unnecessary panic     Correct conclusion
Statistical hypothesis testing
Testing for significance
• Remember: statistical significance is NOT the same
as practical significance!
• In inferential statistics, we are only concerned with
statistical significance. Practical significance is a
judgment call to be made by the researcher and
audience.
Statistical hypothesis testing
Making errors in hypothesis testing
Increase the power of a statistical test
• Use as large a sample size as is reasonably possible.
• Maximize the validity and reliability of your measures.
• Use parametric rather than nonparametric statistics whenever possible.
• Whenever we test more than one statistical hypothesis, we increase the probability of making at least one Type I error.
• For multiple hypotheses, a correction (e.g., Bonferroni) needs to be applied to our statistical tests.
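The effect of multiple testing can be illustrated with a short Python sketch. It assumes the tests are independent (an assumption made here for the arithmetic, not stated on the slide), in which case the probability of at least one Type I error across m tests is 1 − (1 − α)^m. The Bonferroni correction simply divides α by the number of tests. Function names are illustrative.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests

# Ten tests at alpha = .05: the chance of at least one false positive is
# far above .05 unless each test uses the stricter corrected level.
print(round(familywise_error_rate(0.05, 10), 3))
print(bonferroni_alpha(0.05, 10))
```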
Statistical hypothesis testing
Steps in Hypothesis Testing
1. State the null and alternative hypotheses.
2. Set the significance level before the research study.
(Most educational researchers use .05 as the significance level. Note that the significance level is also called the alpha level or, more simply, alpha.)
3. Obtain the probability value using a computer program such as SPSS.
4. Compare the probability value to the significance level and make the statistical decision.
Rule 1: If the probability value is less than alpha, reject the null hypothesis. Conclude that the finding is significant.
Rule 2: If the probability value is greater than alpha, fail to reject the null hypothesis. Conclude that the finding is not significant.
5. Interpret the results. That is, make a substantive real-world decision and determine practical significance.
Source: Johnson, B. & Christensen, L. (2012). Educational Research: Quantitative, Qualitative, and Mixed Approaches.
Thousand Oaks, CA: Sage.
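Step 4 of the procedure above can be written as a small decision function. This is only a sketch: the function name is hypothetical, and because the two rules do not say what to do when the probability value exactly equals alpha, the code treats that case as "fail to reject", which is an assumption.

```python
def statistical_decision(p_value, alpha=0.05):
    """Apply Rule 1 and Rule 2; alpha must be chosen before looking at the data."""
    if p_value < alpha:
        return "reject the null hypothesis (significant)"
    # Assumption: p_value == alpha is treated the same as p_value > alpha.
    return "fail to reject the null hypothesis (not significant)"

print(statistical_decision(0.03))
print(statistical_decision(0.20))
```

The function covers only the statistical decision; judging practical significance (step 5) remains a matter for the researcher and audience.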
Statistical hypothesis testing
Exercise 1
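Let’s compare two sets of data and test the hypothesis that their means are equal or different
Simple t-test for independent means
[=T.TEST(data range 1, data range 2, tails, type)] where tails = 2 for a two-tailed test and type = 2 (equal variances) or 3 (unequal variances) for independent samples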
Data: data check.xlsx
Sets to compare:
Science scores for sets A vs B, and B vs D
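To show what sits behind the t-test, here is a minimal Python sketch of the t statistic for two independent samples, assuming equal variances (the pooled-variance form). The function name and the two small samples are illustrative assumptions, not the workshop data. Turning t into a probability value requires the t distribution with the returned degrees of freedom, which is the part Excel's T.TEST does for you.

```python
import math
import statistics

def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees of freedom) for independent samples, equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    se_difference = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se_difference
    return t, df

set_a = [1, 2, 3, 4, 5]  # hypothetical scores
set_b = [2, 3, 4, 5, 6]  # hypothetical scores
t, df = pooled_t_statistic(set_a, set_b)
print(round(t, 3), df)
```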
Statistical hypothesis testing
Exercise 2
Let’s compare the science achievement of students whose parents completed only lower secondary (PHEL=4) vs those whose parents have a university degree (PHEL=1), and test the hypothesis that their means are equal or different
Simple t-test for independent means
[=T.TEST(data range 1, data range 2, tails, type)] where tails = 2 for a two-tailed test and type = 2 (equal variances) or 3 (unequal variances) for independent samples
Data: data check.xlsx
Sets to compare:
Science scores for sets PHEL=4, and PHEL=1