Topic 7
Making inferences about Means
General process of making inference about a statistic
(1) Establish the sampling distribution of the statistic to assess the variability of the
statistic.
For example, if we are interested in the mean reading score of students in Victoria, we
take a sample and compute the sample mean. Because this sample mean is not the
population mean, there is likely variation in the value of the sample mean if different
samples are drawn. We need to find out how large the variation is. If the variation is
large, then our sample mean is probably not a very accurate estimate of the population
mean. If the variation is small, then our estimate is probably quite close to the
population mean.
(2) We can repeatedly sample to establish the sampling distribution of the statistic of
interest. But this is impractical as it will be too costly.
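Although repeated sampling is too costly in a real survey, it is cheap to simulate. The following is a minimal sketch in Python, assuming a made-up, deliberately non-normal population of reading scores (the population values, the sample size of 50 and the 2,000 repeats are all illustrative choices, not data from this topic). It shows how much the sample mean varies from one sample to the next.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical, skewed population of reading scores (values are invented).
population = [400 + random.expovariate(1 / 100) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50            # size of each sample
repeats = 2_000   # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(repeats)
]

# The spread of these means describes the sampling distribution of the mean.
empirical_se = statistics.stdev(sample_means)

print(f"population mean          : {pop_mean:.1f}")
print(f"mean of the sample means : {statistics.fmean(sample_means):.1f}")
print(f"SD of the sample means   : {empirical_se:.2f}")
print(f"SD of individual scores  : {pop_sd:.2f}")
```

The sample means cluster around the population mean, and their spread is much smaller than the spread of individual scores.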
Making inferences about Mean
We can use the central limit theorem to establish the sampling distribution of the
sample mean, without doing repeated sampling. The central limit theorem says that the
mean of independently drawn observations will be approximately normally
distributed when the sample is reasonably large, even if the distribution from which
the sample is drawn is not normal.
Further, it can be shown that mean values computed from samples of size n have a
normal distribution with mean $\mu$ and standard deviation of $\sigma/\sqrt{n}$ (known as the
standard error), where $\mu$ is the mean of the distribution we draw our samples from,
and $\sigma$ is the standard deviation of the distribution we draw our samples from.
That is, if $\bar{X}$ denotes the sample mean, then
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution with mean zero and standard deviation 1 (z-score).
For a standard normal distribution, 95% of the observations lie within $\pm 1.96$.
With a little re-arrangement of the equation, it can be shown that 95% of the time, or,
we are 95% confident that,
$$\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}$$
(There is a 95% chance that the population mean lies within the range shown above.)
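The re-arrangement mentioned above can be written out step by step. This is only a sketch of the algebra, using the same symbols as the text ($\bar{X}$ for the sample mean, $\mu$ and $\sigma$ for the population mean and standard deviation, n for the sample size):

```latex
\begin{align*}
P\!\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) &= 0.95 \\
P\!\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) &= 0.95 \\
P\!\left(\bar{X}-1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X}+1.96\,\frac{\sigma}{\sqrt{n}}\right) &= 0.95
\end{align*}
```

The second line multiplies through by the standard error. The third line isolates $\mu$ in the middle.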
t-distribution
Of course, in practice, we do not know the population standard deviation, $\sigma$. So we
use the sample standard deviation, s, instead.
However, the statistic
$$\frac{\bar{X} - \mu}{s/\sqrt{n}}$$
(where s is an estimate of $\sigma$) is no longer normally
distributed. Instead, it has a t-distribution, with n - 1 degrees of freedom. This means
that we cannot use 1.96 for the 95% confidence interval. We must find the 95% range
from a t-distribution.
The t-distribution approaches the normal distribution as the sample size increases.
To compute the probability values from a t-distribution, use the EXCEL function
TDIST. For example, in a cell in EXCEL, type
=TDIST(1.96,50,2)
The value 0.056 is returned. This means that 5.6% of the observations from a
t-distribution (with 50 degrees of freedom) lie outside the values ±1.96. In comparison,
for the normal distribution, 5% of the observations lie outside the values ±1.96.
Conversely, the function
=TINV(0.05, 50)
returns the value 2.01. This means that 95% (=1-0.05) of the observations from a
t-distribution with 50 degrees of freedom lie within ±2.01 (instead of ±1.96 as in the
normal distribution).
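For readers without EXCEL, the same two quantities can be approximated in Python. The sketch below uses only the standard library: it integrates the t density numerically (Simpson's rule) and inverts it by bisection. The function names t_pdf, two_tailed_p and t_critical are hypothetical helpers written for this illustration. In practice a statistics library would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Proportion lying outside +/-t, the role played by TDIST(t, df, 2)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    inside = 2 * total * h / 3  # area between -t and +t (density is symmetric)
    return 1 - inside

def t_critical(p, df):
    """Value t with proportion p outside +/-t, the role played by TINV(p, df)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: two_tailed_p decreases as t grows
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"two-tailed probability outside 1.96, 50 df: {two_tailed_p(1.96, 50):.3f}")
print(f"95% critical value, 50 df: {t_critical(0.05, 50):.2f}")
```

The two printed values should agree with the EXCEL results quoted above to the precision shown there.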
(See page 268, Coladarci, for a comparison between t-distribution and normal
distribution)
Example 1
In a sample of 50 randomly selected teachers, the average age is 36 with a standard
deviation of 6. What is the 95% confidence interval for the population mean age of
teachers?
(Hint: What is the standard error of the mean? The 95% confidence interval for the
population mean is
Sample mean ± t(0.05, 49) × standard error,
where t(0.05, 49) is the t-value for 95% probability, with 49 degrees of freedom.)
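Following the hint, the calculation can be sketched as below. The t-value of about 2.01 for 49 degrees of freedom is not given in the text. It is taken here from a standard t table, and =TINV(0.05, 49) in EXCEL should give the same value.

```latex
\begin{align*}
\text{standard error} &= \frac{s}{\sqrt{n}} = \frac{6}{\sqrt{50}} \approx 0.85 \\
\text{95\% confidence interval} &= 36 \pm 2.01 \times 0.85 \approx 36 \pm 1.71 \\
&\approx (34.3,\ 37.7)
\end{align*}
```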
Example 2
The costs of 15 randomly selected university textbooks at one university are (in
dollars):
120
89
65
135
145
110
160
50
115
128
99
95
105
120
75
Estimate the mean cost of textbooks at this university based on this sample. Give a 95%
confidence interval.
(1) Paste the prices in EXCEL. Compute average and standard deviation of the prices.
Compute standard error.
What is the 95% confidence interval?
(2) Paste the numbers in SPSS. Select Analyse -> Descriptive statistics -> Explore.
Compare the answers from SPSS with the results from EXCEL.
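As a cross-check on the EXCEL and SPSS results, the same steps can be sketched in Python using only the standard library. The critical value 2.145 (two-sided 95%, 14 degrees of freedom) is an assumption taken from a standard t table rather than from the text. =TINV(0.05, 14) in EXCEL should return it.

```python
import math
import statistics

# Prices from Example 2 (in dollars).
prices = [120, 89, 65, 135, 145, 110, 160, 50, 115, 128, 99, 95, 105, 120, 75]

n = len(prices)
mean = statistics.fmean(prices)
sd = statistics.stdev(prices)   # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(n)          # standard error of the mean

# Two-sided 95% critical value for n - 1 = 14 degrees of freedom (t table).
T_CRIT_95_DF14 = 2.145

lower = mean - T_CRIT_95_DF14 * se
upper = mean + T_CRIT_95_DF14 * se

print(f"mean = {mean:.1f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"95% confidence interval: {lower:.1f} to {upper:.1f}")
```

Small differences from SPSS in the last decimal place would come from rounding the critical value.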
Example 2B
According to the university administration, the average cost of textbooks is $90. Using
the data from Example 2, at the 95% confidence level, will you reject the hypothesis that
the average cost is $90? What if you test this hypothesis at the 90% confidence level?
What if you test it at the 99% confidence level? (Use SPSS Explore to work out the
confidence interval at different levels.)
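Checking whether $90 falls inside a confidence interval is equivalent to comparing a t statistic with the critical value for that level. A minimal Python sketch of this one-sample test is shown below. The three critical values (14 degrees of freedom, two-sided) are assumptions taken from a standard t table, not from the text.

```python
import math
import statistics

# Prices from Example 2 (in dollars) and the value claimed by administration.
prices = [120, 89, 65, 135, 145, 110, 160, 50, 115, 128, 99, 95, 105, 120, 75]
claimed_mean = 90

n = len(prices)
se = statistics.stdev(prices) / math.sqrt(n)
t_stat = (statistics.fmean(prices) - claimed_mean) / se

# Two-sided critical values for 14 degrees of freedom (from a t table).
critical = {"90%": 1.761, "95%": 2.145, "99%": 2.977}

# Reject the hypothesis when |t| exceeds the critical value for that level.
decisions = {level: abs(t_stat) > t_crit for level, t_crit in critical.items()}

print(f"t = {t_stat:.2f}")
for level, reject in decisions.items():
    print(f"{level} confidence level: {'reject' if reject else 'do not reject'}")
```

The decision at each level should match what the corresponding SPSS confidence interval shows.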
Two-sample t-test
Example 3
At another university, a similar survey is carried out. The costs of 15 randomly
selected university textbooks at this university are (in dollars):
100
120
145
120
115
118
66
89
93
89
109
120
60
70
55
We want to compare whether the cost of textbooks at university 2 is, on average,
cheaper than the cost of textbooks at university 1.
Paste the prices in EXCEL. Compute average and standard deviation of the prices.
Based on the average and standard deviation, do you think the cost of textbooks at
university 2 is generally cheaper?
Did you make this judgement by comparing the average cost at university 1 and
university 2? Did you take into account the confidence intervals based on the
estimates?
Since there are only 15 textbooks in the sample, we expect variability in average cost
from sample to sample. A formal statistical method to compare the means from two
independently drawn samples is the two-sample t-test. The test takes into account the
size of the difference between the two group averages, as well as the likely variability
in the averages due to sampling.
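The calculation behind a two-sample t-test can be sketched in Python with the standard library only. This sketch assumes the equal-variance (pooled) form of the test. The helper functions t_pdf and two_tailed_p are hypothetical names written for this illustration, and they obtain the two-tailed probability by numerical integration of the t density.

```python
import math
import statistics

uni1 = [120, 89, 65, 135, 145, 110, 160, 50, 115, 128, 99, 95, 105, 120, 75]
uni2 = [100, 120, 145, 120, 115, 118, 66, 89, 93, 89, 109, 120, 60, 70, 55]

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Proportion of the t-distribution outside +/-t (Simpson's rule)."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

n1, n2 = len(uni1), len(uni2)
mean_diff = statistics.fmean(uni1) - statistics.fmean(uni2)

# Pooled variance: assumes both populations share the same variance.
pooled_var = ((n1 - 1) * statistics.variance(uni1)
              + (n2 - 1) * statistics.variance(uni2)) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

df = n1 + n2 - 2
t_stat = mean_diff / se_diff
p_value = two_tailed_p(t_stat, df)

print(f"difference in means = {mean_diff:.2f}")
print(f"t = {t_stat:.2f} with {df} degrees of freedom, two-tailed p = {p_value:.2f}")
```

The t statistic is the difference in averages divided by its standard error, so it grows with the size of the difference and shrinks as sampling variability increases.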
SPSS exercise
Import the data for textbooks into SPSS
(animated demo: TwoSample-t-test_demo.swf)
(animated demo: t-test output_demo.swf)
That is, there is about a 1 in 3 chance (0.37) of observing a difference as large as
$9.50 (= $107.4 - $97.9) when there is no difference in the ‘true’ (population) means.
So we can say the following:
At 63% confidence level (or lower), we will reject the hypothesis that the population
means are the same. (Generally, 63% confidence level is not regarded as very
confident!)
At any confidence level higher than 63% (e.g., 90%, 95%, 99%), we cannot reject this
hypothesis.
You need to choose a confidence level before making your conclusions.
Discussion point:
First, note that different conclusions can be reached depending on the confidence
level.
Second, how does one decide on the confidence level?
Third, once a conclusion is made, beware that the conclusion is NOT the ‘truth’ (of
whether there IS a difference in the population means). In fact, the population means
are extremely unlikely to be identical to 100 decimal places, so the ‘truth’ is almost
certainly that the mean cost at university 1 is NOT the same as the mean cost at
university 2. Given large enough sample sizes, we will almost certainly find statistical
significance, at high confidence levels, even if the population mean difference is
minute.
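The effect of sample size can be illustrated with a small calculation. The sketch below assumes a hypothetical observed difference of $0.50 between the two universities, a common standard deviation of $28 (roughly the spread seen in the textbook samples), and equal sample sizes. All three are illustrative assumptions. It shows how the t statistic for the same tiny difference grows with the number of books sampled.

```python
import math

sd = 28.0          # assumed common standard deviation of prices (dollars)
difference = 0.50  # hypothetical, practically negligible mean difference

def t_for_sample_size(n):
    """t statistic when the observed difference is `difference`, n books per group."""
    se_diff = sd * math.sqrt(2 / n)  # standard error of a difference in means
    return difference / se_diff

for n in (15, 1_000, 100_000):
    t = t_for_sample_size(n)
    verdict = "significant" if t > 1.96 else "not significant"
    print(f"n = {n:>7} per group: t = {t:.2f} ({verdict} at about the 95% level)")
```

With very large samples the large-sample cut-off of 1.96 is adequate, which is why it is used here instead of a t table.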
Effect size
In addition to statistical significance, we may want to assess whether the observed
difference matters. We might decide, for example, that if the population mean difference
is less than $2, then there is no “important” difference, and,
regardless of statistical significance, we will conclude there is no difference.
A commonly used measure of effect size is to look at the size of the difference in
means in relation to the standard deviation:
X1  X 2
standard deviation (pooled)
(see page 295-298, Coladarci, et al)
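This standardised difference (often called Cohen's d) can be computed for the textbook prices from Examples 2 and 3. The sketch below is a minimal Python illustration using the standard library. The pooled standard deviation is computed under the usual equal-variance assumption.

```python
import math
import statistics

uni1 = [120, 89, 65, 135, 145, 110, 160, 50, 115, 128, 99, 95, 105, 120, 75]
uni2 = [100, 120, 145, 120, 115, 118, 66, 89, 93, 89, 109, 120, 60, 70, 55]

n1, n2 = len(uni1), len(uni2)

# Pooled SD: variances weighted by their degrees of freedom.
pooled_sd = math.sqrt(
    ((n1 - 1) * statistics.variance(uni1) + (n2 - 1) * statistics.variance(uni2))
    / (n1 + n2 - 2)
)

effect_size = (statistics.fmean(uni1) - statistics.fmean(uni2)) / pooled_sd

print(f"pooled SD = {pooled_sd:.2f}")
print(f"effect size = {effect_size:.2f}")
```

The result expresses the difference in average price as a fraction of one standard deviation, which does not depend on the sample size in the way a t statistic does.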
Example 4
Use the data set StudentLiteracyScoresCutDown.sav to assess whether there is a
difference between the average reading scores for males and females.
Use the whole data set first. This is already a sample from the population. What
conclusions have you drawn? How would you write your conclusion in a report?
Sub-sample from this data set. At about what sample size are you likely to change
your conclusion?
If you are reporting this to the minister of education, or reporting it as a newspaper
article, how would you write it?
Paired sample t-test
In the examples about the cost of university textbooks, our survey method was not
very efficient, because different textbooks may, by chance, be selected at
the two universities. The average cost fluctuates more widely from sample to
sample when different textbooks are selected.
A better design is to choose the same textbooks at both universities and compare the
prices, pairwise.
The following shows prices of 15 textbooks as sold at two universities.
text book       price at uni 1   price at uni 2
text book 1     120              120
text book 2     89               85
text book 3     60               60
text book 4     125              120
text book 5     150              145
text book 6     115              118
text book 7     160              155
text book 8     65               60
text book 9     115              108
text book 10    128              125
text book 11    100              100
text book 12    98               100
text book 13    105              105
text book 14    120              120
text book 15    80               78
To compare the difference in average prices at the two universities,
(1) Analyse as two independent sample t-test
(You have to organise the data in SPSS with one column stacked under the other, and
add a group variable with values of 1 or 2 to indicate which university it is.)
(2) Analyse as paired-samples t-test
(You have to organise the data in SPSS with two matched columns. See animated
demo: Paired Sample t-test_demo.swf).
What are the differences in results between the above two analyses?
Why are they different?
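The two analyses can also be sketched side by side in Python, using only the standard library. The independent-samples version assumes pooled variances. The two critical values are assumptions taken from a standard t table (two-sided, 95%), not from the text.

```python
import math
import statistics

uni1 = [120, 89, 60, 125, 150, 115, 160, 65, 115, 128, 100, 98, 105, 120, 80]
uni2 = [120, 85, 60, 120, 145, 118, 155, 60, 108, 125, 100, 100, 105, 120, 78]
n = len(uni1)

# (1) Independent-samples t: ignores the pairing of books.
mean_diff = statistics.fmean(uni1) - statistics.fmean(uni2)
pooled_var = (statistics.variance(uni1) + statistics.variance(uni2)) / 2
t_independent = mean_diff / math.sqrt(pooled_var * 2 / n)

# (2) Paired t: works with the 15 within-book price differences.
diffs = [a - b for a, b in zip(uni1, uni2)]
t_paired = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Two-sided 95% critical values from a standard t table.
T_CRIT_DF28 = 2.048  # independent samples: 15 + 15 - 2 degrees of freedom
T_CRIT_DF14 = 2.145  # paired samples: 15 - 1 degrees of freedom

print(f"mean difference = {mean_diff:.2f}")
print(f"independent: t = {t_independent:.2f} (critical value {T_CRIT_DF28})")
print(f"paired:      t = {t_paired:.2f} (critical value {T_CRIT_DF14})")
```

Both analyses use the same mean difference. They differ in the standard error: the paired version removes the large book-to-book variation in price and keeps only the variation in the within-book differences.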
Exercise
Use the data set, TIMSS2003AUS_Cutdown.sav,
(1) Investigate whether there is a difference between students’ mathematics score and
science score (variables 42 and 43, bsmstdr and bssstdr).
Carry out analyses using (a) two sample t-test and (b) paired-sample t-test.
Can you give results regarding statistical significance, and also say something about
the importance (or otherwise) of the magnitude of the difference (the concept of effect
size)?
(2) Investigate whether there is any difference in mathematics achievement between
girls and boys (use variable 42).