Using statistics in the analysis of quantitative data
A good way to use this material for detailed study is to print the whole file and then run the slide show while reading the text from the printed version. This lets you use the links and animations that are included in some of the slides.
Suggested print settings for the print dialogue box:
Print range: All
Print what: Notes Pages (from the drop-down box)
Then tick: Black & White, Scale to fit paper
Types of data
• Nominal or categorical, e.g. eye colour
• Ordinal, e.g. job seniority
• Interval (parametric or non-parametric), e.g. language comprehension test score; IQ
• Ratio (parametric or non-parametric), e.g. age
Uses of statistics
Use of statistics: inferential or non-inferential?
• Describing a sample: non-inferential
• Looking for relationships between variables in a sample: non-inferential
• Estimating parameters in a population: inferential
• Testing hypotheses: usually used inferentially, but can be used non-inferentially
SPSS task
Entering data
Describing a sample
SPSS calculation of mean
Descriptive Statistics
                     N    Mean
attitude to school   40   9.95
Valid N (listwise)   40
Finding the spread of scores in a sample
Standard deviation
$s = \sqrt{\sum (x - \bar{x})^2 / n}$
Standard Deviation
Descriptive Statistics
                     N    Range   Minimum   Maximum   Mean   Std. Deviation
attitude to school   40   18      1         19        9.95   4.15
Valid N (listwise)   40
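A minimal Python sketch of the statistics in this table, using the standard-library `statistics` module. The formula above divides by n, which corresponds to `statistics.pstdev`; SPSS's "Std. Deviation" column, like `statistics.stdev`, divides by n − 1. The scores are hypothetical, since the 40 attitude scores are not listed here.

```python
import statistics

def describe(scores):
    """Return N, range, minimum, maximum, mean and two versions of the SD."""
    lo, hi = min(scores), max(scores)
    return {
        "N": len(scores),
        "Range": hi - lo,
        "Minimum": lo,
        "Maximum": hi,
        "Mean": statistics.mean(scores),
        # Divide by n, as in s = sqrt(sum((x - mean)^2) / n)
        "SD (divide by n)": statistics.pstdev(scores),
        # Divide by n - 1, the version statistics packages usually report
        "SD (divide by n - 1)": statistics.stdev(scores),
    }

# Hypothetical scores, for illustration only
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

For large samples the two standard deviations differ very little; with N = 40 the ratio between them is √(40/39), about 1.013.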
Finding how scores are distributed
[Figure: distribution of attitude scores]
Properties of the Normal Distribution
• Mean ± 1 standard deviation: for a mean of 10 and a standard deviation of 2, the range from 10 - 2 to 10 + 2, i.e. from 8 to 12, contains about 68% of cases.
• Mean ± 2 standard deviations: the range from 10 - 4 to 10 + 4, i.e. from 6 to 14, contains about 95% of cases.
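The 68% and 95% figures are rounded values of the proportion of a normal distribution within k standard deviations of the mean, which equals erf(k/√2). A short sketch using only Python's `math` module, applied to the example with mean 10 and standard deviation 2:

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    # P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable Z
    return math.erf(k / math.sqrt(2))

mean, sd = 10, 2
for k in (1, 2):
    low, high = mean - k * sd, mean + k * sd
    print(f"{low} to {high}: {proportion_within(k):.1%}")
```

This prints 68.3% for 8 to 12 and 95.4% for 6 to 14. Exactly 95% of cases lie within about 1.96 standard deviations, which is where the 1.96 in the confidence limits later in this material comes from.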
Checking normality
Descriptive Statistics: attitude to school (Valid N (listwise) = 40)
                 Statistic   Std. Error
N                40
Mean             9.95
Std. Deviation   4.15
Skewness         -.074       .374
Kurtosis         .197        .733
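The standard errors in this table depend only on N. The sketch below uses the usual large-sample formulas for the standard errors of skewness and kurtosis, which reproduce .374 and .733 for N = 40. Dividing each statistic by its standard error gives a rough z-score; a common rule of thumb (an assumption here, not stated in the slides) treats |z| below about 1.96 as no evidence of non-normality.

```python
import math

def se_skewness(n):
    # Standard error of the sample skewness for n cases
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    # Standard error of the sample (excess) kurtosis for n cases
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 40
skew, kurt = -0.074, 0.197  # statistics from the table above
print(round(se_skewness(n), 3), round(se_kurtosis(n), 3))  # 0.374 0.733
print(round(skew / se_skewness(n), 2), round(kurt / se_kurtosis(n), 2))
```

Both ratios are well under 1 in size, consistent with the attitude scores being close to normal.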
An overall test for normality
Tests of Normality: attitude to school
Kolmogorov-Smirnov (a): Statistic = .109, df = 40, Sig. = .200*
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
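The Kolmogorov-Smirnov statistic is the largest vertical gap between the cumulative distribution of the sample and a normal curve. The sketch below computes that statistic only, assuming the normal curve uses the sample mean and the n − 1 standard deviation. Estimating these from the data is why the Lilliefors correction is needed for the significance level; this sketch does not compute Sig., which needs the Lilliefors tables.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(data, mu, sigma):
    """Largest gap between the empirical CDF of data and a normal CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i - 1)/n to i/n at x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def lilliefors_statistic(data):
    # Mean and SD are estimated from the sample itself
    return ks_statistic(data, statistics.mean(data), statistics.stdev(data))

# Hypothetical scores, for illustration only
print(round(lilliefors_statistic([3, 7, 8, 9, 10, 10, 11, 12, 13, 17]), 3))
```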
Median and Mode for ordinal data
Statistics: confidence in speaking time 1
N: Valid 40, Missing 0
Median   4.0000
Mode     4.00
Range    6.00

confidence in speaking time 1
Value   Frequency   Percent   Valid Percent   Cumulative Percent
1.00    1           2.5       2.5             2.5
2.00    2           5.0       5.0             7.5
3.00    8           20.0      20.0            27.5
4.00    10          25.0      25.0            52.5
5.00    8           20.0      20.0            72.5
6.00    6           15.0      15.0            87.5
7.00    5           12.5      12.5            100.0
Total   40          100.0     100.0
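The median and the cumulative percentages only make sense because the codes 1 to 7 are ordered, which is why they suit ordinal data. A sketch that recovers the SPSS summary from the frequency column above (the 40 individual responses are rebuilt from the counts):

```python
def summarise_frequencies(freq):
    """Median, mode, range and cumulative percents from a {value: count} table."""
    values = sorted(freq)
    n = sum(freq.values())
    # Expand the table into a sorted list of scores
    scores = [v for v in values for _ in range(freq[v])]
    mid = n // 2
    median = scores[mid] if n % 2 else (scores[mid - 1] + scores[mid]) / 2
    mode = max(values, key=lambda v: freq[v])  # first value with the top count
    cumulative, running = {}, 0
    for v in values:
        running += freq[v]
        cumulative[v] = 100 * running / n
    return median, mode, values[-1] - values[0], cumulative

# Frequencies from the table above
freq = {1: 1, 2: 2, 3: 8, 4: 10, 5: 8, 6: 6, 7: 5}
median, mode, value_range, cumulative = summarise_frequencies(freq)
print(median, mode, value_range)  # 4.0 4 6
```

The median is 4 because the cumulative percentage first passes 50% at the value 4 (52.5%).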
Describing ordinal data
Bar charts (no gaps)
Nominal data - Mode
Statistics: school
N: Valid 40, Missing 0
Mode   1

school
Value      Frequency   Percent   Valid Percent
school 1   14          35.0      35.0
school 2   13          32.5      32.5
school 3   13          32.5      32.5
Total      40          100.0     100.0
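For nominal data only the mode is meaningful, because the category codes have no order. A sketch with `collections.Counter`, rebuilding the 40 cases from the frequencies above (the individual records are not given):

```python
from collections import Counter

# 40 cases rebuilt from the frequency table above
schools = ["school 1"] * 14 + ["school 2"] * 13 + ["school 3"] * 13
counts = Counter(schools)
mode, mode_count = counts.most_common(1)[0]
for school, count in sorted(counts.items()):
    print(school, count, f"{100 * count / len(schools):.1f}%")
print("Mode:", mode)  # Mode: school 1
```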
Describing nominal data –
Bar Chart
Describing nominal data –
Pie Chart
[Pie chart: school 1, school 2, school 3]
Exploring relationships between data
Correlation
[Scatterplot: IQ and attitude to school, with IQ score plotted against attitude to school]
Correlation
Correlations
                                           IQ score   attitude to school
IQ score             Pearson Correlation   1.000      .564**
                     Sig. (2-tailed)       .          .000
                     N                     40         40
attitude to school   Pearson Correlation   .564**     1.000
                     Sig. (2-tailed)       .000       .
                     N                     40         40
**. Correlation is significant at the 0.01 level (2-tailed).
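A sketch of how the Pearson coefficient and its two-tailed significance are related. The t formula is the standard test of r = 0 with n − 2 degrees of freedom. For r = .564 and N = 40 it gives t ≈ 4.21, well beyond the value of about 2.7 needed for p < .01 with 38 df, which is consistent with SPSS printing Sig. = .000.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    # t statistic for testing r = 0, with n - 2 degrees of freedom
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

print(round(t_for_r(0.564, 40), 2))  # 4.21
```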
Review of meaning and importance of linearity
• http://www.aiaccess.net/English/Glossaries/GlosMod/Flash/e_gm_fla_covariance.htm
• http://www.fon.hum.uva.nl/Service/Statistics.html
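Pearson's r measures only straight-line relationships. In this small sketch y is completely determined by x (y = x²), yet r is exactly 0 because the relationship is curved; the data are made up for illustration.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # perfect dependence on x, but not a straight line
print(pearson_r(x, y))  # 0.0
```

This is why it is worth looking at the scatterplot before trusting a correlation coefficient.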
Extreme groups – a warning
[Scatterplot: IQ and attitude to school, with IQ score plotted against attitude to school]
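Assuming the warning is the usual one (studying only people with very high and very low scores on one variable exaggerates the correlation that would be found across the whole range), a simulation sketch makes the effect visible. The data are simulated with a hypothetical population correlation of 0.5; nothing here comes from the IQ study.

```python
import math
import random

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
n, rho = 10_000, 0.5  # hypothetical population correlation
x = [random.gauss(0, 1) for _ in range(n)]
# Mix x with independent noise so that the population correlation is rho
y = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]

# Keep only the lowest and highest 10% of cases on x
k = n // 10
order = sorted(range(n), key=lambda i: x[i])
extreme = order[:k] + order[-k:]
r_all = pearson_r(x, y)
r_extreme = pearson_r([x[i] for i in extreme], [y[i] for i in extreme])
print(f"all cases: r = {r_all:.2f}; extreme groups only: r = {r_extreme:.2f}")
```

With these settings r for the extreme groups comes out around 0.7, although the correlation across all cases is about 0.5.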
Correlation - effect of measurement error
[Figure, shown in three builds: test result plotted against motivation, showing the actual points and then the measured points]
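The slides show measured points scattered around the actual points, which weakens the observed correlation. Classical test theory quantifies this with Spearman's attenuation formula, $r_{obs} = r_{true}\sqrt{r_{xx} r_{yy}}$, where $r_{xx}$ and $r_{yy}$ are the reliabilities of the two measures. The formula is not given in the slides, and the reliabilities in the example are hypothetical.

```python
import math

def attenuated_r(r_true, rel_x, rel_y):
    # Correlation expected between two measures with the given reliabilities
    return r_true * math.sqrt(rel_x * rel_y)

def disattenuated_r(r_observed, rel_x, rel_y):
    # Spearman's correction: estimated correlation between the true scores
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical values: true correlation 0.8, reliabilities 0.7 and 0.9
r_obs = attenuated_r(0.8, 0.7, 0.9)
print(round(r_obs, 3))  # 0.635
print(round(disattenuated_r(r_obs, 0.7, 0.9), 3))  # 0.8
```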
Correlation & Regression
Correlations
                                           scaled iq   attitude to school
scaled iq            Pearson Correlation   1.000       .564**
                     Sig. (2-tailed)       .           .000
                     N                     40          40
attitude to school   Pearson Correlation   .564**      1.000
                     Sig. (2-tailed)       .000        .
                     N                     40          40
**. Correlation is significant at the 0.01 level (2-tailed).
[Scatterplots: scaled iq against attitude to school]
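A least-squares regression sketch. It also illustrates a likely reason the correlation is still .564: assuming "scaled iq" is a linear rescaling of IQ score, a linear change of scale alters the slope and intercept of the regression line but leaves r unchanged. The scores and the rescaling below are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def regression_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

attitude = [3, 7, 9, 12, 15, 18]      # hypothetical attitude scores
iq = [85, 95, 102, 99, 118, 121]      # hypothetical IQ scores
scaled = [0.5 * v + 10 for v in iq]   # hypothetical linear rescaling

print(regression_line(attitude, iq))
print(regression_line(attitude, scaled))
print(round(pearson_r(attitude, iq), 6) == round(pearson_r(attitude, scaled), 6))  # True
```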
Correlations: IQ score and attitude to school
Pearson Correlation = .564**, Sig. (2-tailed) = .000, N = 40
**. Correlation is significant at the 0.01 level (2-tailed).
[Scatterplot: IQ and attitude to school, with IQ score plotted against attitude to school]
Spearman Correlation
Ordinal data
Correlations (Spearman's rho)
                                          time 1    time 2
listening comprehension score time 1
   Correlation Coefficient                1.000     .878**
   Sig. (2-tailed)                        .         .000
   N                                      40        40
listening comprehension score time 2
   Correlation Coefficient                .878**    1.000
   Sig. (2-tailed)                        .000      .
   N                                      40        40
**. Correlation is significant at the .01 level (2-tailed).
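Spearman's rho is, by its usual definition, Pearson's r computed on the ranks of the scores, with tied scores given the average of the ranks they share. Because only the order matters, it suits ordinal data. A sketch of that definition:

```python
import math

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    # Pearson's r applied to the ranks
    return pearson_r(average_ranks(x), average_ranks(y))

print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Any relationship in which one score always rises with the other, curved or not, gives rho = 1.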
Chi squared test of association
Nominal data
gender * school Crosstabulation
                                  school 1   school 2   school 3   Total
gender   male     Count           7          7          6          20
                  Expected Count  7.0        6.5        6.5        20.0
         female   Count           7          6          7          20
                  Expected Count  7.0        6.5        6.5        20.0
Total             Count           14         13         13         40
                  Expected Count  14.0       13.0       13.0       40.0

Chi-Square Tests
                     Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square   .154a   2    .926
N of Valid Cases     40
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.50.
Chi squared showing an association
HAIRCOL * EYECOL Crosstabulation
                                  EYECOL
                                  blue    brown   other   Total
HAIRCOL  black    Count           18      3       6       27
                  Expected Count  9.6     7.7     9.6     27.0
         brown    Count           6       15      3       24
                  Expected Count  8.6     6.9     8.6     24.0
         blond    Count           6       6       21      33
                  Expected Count  11.8    9.4     11.8    33.0
Total             Count           30      24      30      84
                  Expected Count  30.0    24.0    30.0    84.0

Chi-Square Tests
                     Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square   36.853a   4    .000
N of Valid Cases     84
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 6.86.
Calculating chi-squared from cell values
• http://www.physics.csbsju.edu/stats/contingency.html
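A sketch of the calculation from cell counts: each expected count is row total × column total / N, and the Pearson statistic sums (observed − expected)² / expected over all cells, with (rows − 1) × (columns − 1) degrees of freedom. To stay within the standard library, the p-value uses the closed form of the chi-squared upper tail that holds only for even degrees of freedom; that covers both tables above (df = 2 and df = 4), but odd df would need a library such as SciPy.

```python
import math

def chi_square_test(table):
    """Pearson chi-squared test of association for a table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    expected = [[r * c / n for c in cols] for r in rows]
    chi2 = sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow))
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, df, expected

def upper_tail_even_df(chi2, df):
    """P(X >= chi2) for a chi-squared variable with an even df."""
    if df % 2:
        raise ValueError("closed form only holds for even df")
    half = chi2 / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

gender_school = [[7, 7, 6], [7, 6, 7]]
hair_eye = [[18, 3, 6], [6, 15, 3], [6, 6, 21]]
for table in (gender_school, hair_eye):
    chi2, df, expected = chi_square_test(table)
    small = sum(e < 5 for row in expected for e in row)
    print(f"chi2 = {chi2:.3f}, df = {df}, p = {upper_tail_even_df(chi2, df):.3f}, "
          f"cells with expected count < 5: {small}, "
          f"minimum expected count: {min(min(r) for r in expected):.2f}")
```

Run on the two crosstabulations above, this reproduces the SPSS values: 0.154 with p = 0.926 and minimum expected count 6.50, and 36.853 with p printed as 0.000 and minimum expected count 6.86.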
Item analysis, reliability and validity
Cronbach’s Alpha
RELIABILITY ANALYSIS - SCALE (ALPHA)
N of Cases = 28.0     N of Variables = 10
Statistics for Scale: Mean = 3.2500, Variance = 10.4167, Std Dev = 3.2275

Item-total Statistics
Columns: (1) Scale Mean if Item Deleted; (2) Scale Variance if Item Deleted; (3) Corrected Item-Total Correlation; (4) Squared Multiple Correlation; (5) Alpha if Item Deleted
           (1)      (2)      (3)     (4)   (5)
ITE0001    3.1071   8.7659   .7222   .     .8610
ITE0003    2.9643   7.8876   .8968   .     .8437
ITE0005    2.8929   8.0992   .7487   .     .8547
ITE0007    2.9643   8.3320   .7052   .     .8587
ITE0009    2.9643   8.4061   .6744   .     .8611
ITE0002    2.9643   9.2950   .3244   .     .8863
ITE0004    2.8214   8.6706   .5027   .     .8746
ITE0006    2.8571   9.2381   .3080   .     .8892
ITE0008    2.7500   8.9352   .4015   .     .8827
ITE0010    2.9643   7.8876   .8968   .     .8437

Reliability Coefficients (10 items)
Alpha = .8782     Standardized item alpha = .8839
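Cronbach's alpha compares the sum of the item variances with the variance of the total score: for k items, alpha = k / (k − 1) × (1 − sum of item variances / variance of total). In the sketch below, "alpha if item deleted" is obtained by recomputing alpha with each item left out in turn. The small data set is hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: list of item columns, each a list of scores for the same cases."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

def alpha_if_deleted(items):
    # Alpha recomputed with each item left out in turn
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical 0/1 answers from six people to three items
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1],
]
print(round(cronbach_alpha(items), 4))
print([round(a, 4) for a in alpha_if_deleted(items)])
```

An item whose removal raises alpha (in the SPSS table above, ITE0006 with .8892 against an overall .8782) fits the rest of the scale less well.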
Estimating population values
Terminology
• Population (described by parameters)
• Sample (described by statistics)
Sampling
Samples that allow statistical generalisation
• random
• systematic
• stratified random
• cluster
• multi-stage
Samples that don’t allow statistical generalisation
• quota
• convenience
• snowball
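Two of the generalisable designs are easy to sketch. Systematic sampling takes every k-th member of a list after a random start; proportionate stratified random sampling draws a simple random sample from each stratum in proportion to its size. The population lists and strata below are hypothetical, and the rounding of stratum sizes is deliberately naive.

```python
import random

def systematic_sample(population, n, start=None):
    """Every k-th member after a random start, where k = len(population) // n."""
    k = len(population) // n
    if start is None:
        start = random.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n):
    """Proportionate stratified random sample from a {name: members} dict."""
    total = sum(len(m) for m in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)  # naive rounding
        sample[name] = random.sample(members, size)
    return sample

random.seed(0)
pupils = list(range(1, 101))
print(systematic_sample(pupils, 10))
schools = {"school 1": list(range(140)), "school 2": list(range(130)),
           "school 3": list(range(130))}
print({s: len(v) for s, v in stratified_sample(schools, 40).items()})
```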
Making it practicable whilst retaining validity
Calculating required sample sizes
• http://StatPages.org
• http://www.jalt.org/test/bro_25.htm and
related web pages
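The linked pages may use other methods. As one common example, under simple random sampling the number of cases needed to estimate a population proportion p to within a margin of error e at 95% confidence is n = 1.96² × p(1 − p) / e², and taking p = 0.5 gives the largest (safest) n.

```python
import math

def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Cases needed to estimate a proportion to within +/- margin (95% by default)."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

print(sample_size_for_proportion(0.03))  # 1068
print(sample_size_for_proportion(0.05))  # 385
```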
Statistics and parameters
• Statistics of sample: mean = m, standard deviation = s, correlation = r
• Parameters of population: mean = μ, standard deviation = σ, correlation = ρ
Best estimates of population parameters from sample statistics
• $\mu = m$
• $\sigma = s\sqrt{n/(n-1)}$
• $\rho = r$ (for large samples, n > 30)
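A quick numerical check of the σ estimate: if s is computed by dividing by n, as in the standard deviation formula earlier, multiplying it by √(n/(n − 1)) gives exactly the standard deviation computed by dividing by n − 1. The data are hypothetical.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample
n = len(data)
s = statistics.pstdev(data)         # divisor n; s is 2.0 for these data
sigma_hat = s * math.sqrt(n / (n - 1))
print(sigma_hat, statistics.stdev(data))  # the two values agree up to rounding
```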
95% confidence limits for the population mean - large samples
$m - 1.96\,s/\sqrt{n-1}$ to $m + 1.96\,s/\sqrt{n-1}$
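A sketch applying these limits, with s computed by dividing by n as defined earlier. Because s/√(n − 1) equals the divisor-(n − 1) standard deviation divided by √n, this is the same as the familiar m ± 1.96 × SD/√n. The scores are hypothetical.

```python
import math
import statistics

def mean_confidence_limits(data, z=1.96):
    """Large-sample 95% limits m +/- z * s / sqrt(n - 1), with s using divisor n."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    half_width = z * s / math.sqrt(n - 1)
    return m - half_width, m + half_width

# Hypothetical scores; the formula is intended for large samples (n > 30)
scores = [7, 12, 9, 14, 10, 6, 11, 13, 8, 10] * 4
print(mean_confidence_limits(scores))
```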
Calculation of confidence intervals
Mean
• http://glass.ed.asu.edu/stats/analysis/mci.html
Correlation
• http://glass.ed.asu.edu/stats/analysis/rci.html
Standard deviation
• Walpole, R. (1982) Introduction to Statistics, 3rd edn, pp. 277-8 and 482.
Confidence interval for σ²: Walpole, R. (1982) Introduction to Statistics, 3rd edn. New York: Macmillan, pp. 277-8.
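For correlations, a common method is Fisher's z transformation (the linked page may or may not use it): convert r to z = artanh(r), whose standard error is about 1/√(n − 3), build the interval on the z scale and convert back with tanh. Applied to the IQ and attitude correlation above (r = .564, N = 40), this sketch gives roughly .31 to .74.

```python
import math

def correlation_confidence_limits(r, n, z_crit=1.96):
    """Approximate 95% limits for rho using Fisher's z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = correlation_confidence_limits(0.564, 40)
print(f"{low:.3f} to {high:.3f}")
```

The interval is not symmetric around .564, because correlations cannot exceed 1.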
The Surprising Effect of Population Size
As long as the population is at least ten times as large as the sample, the size of the population has almost no influence on the accuracy of sample estimates.
The margin of error for a percentage estimated from a sample of 1000 is about 3% (at 95% confidence), whether the number of people in the population is 30,000 or 200 million.
You can make a good check on how salty a well-stirred bowl of soup is by tasting one spoonful, whatever the size of the bowl.
What’s the surprise? There is no effect!
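The finite population correction, a standard formula not stated in the slides, shows why population size barely matters: it multiplies the margin of error by √((N − n)/(N − 1)), which is close to 1 whenever the population N is much larger than the sample n. A sketch for a percentage near 50%, comparing the two populations mentioned above:

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """95% margin of error for a proportion, with optional finite population correction."""
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        # The correction matters only when the sample is a sizeable share of the population
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

for population in (30_000, 200_000_000):
    print(f"{population:>11,}: {margin_of_error(1000, population):.2%}")
```

The two margins come out at about 3.05% and 3.10%, a difference too small to matter in practice.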