LIS 570
Summarising and presenting data: Univariate analysis
Summary
 Basic definitions
 Descriptive statistics
 Describing frequency distributions
 shape
 central tendency
 dispersion
Selecting analysis and statistical techniques
Specific research question or hypothesis
Determine number of variables
Univariate analysis
Bivariate analysis
Multivariate analysis
Determine level of measurement of variables
Choose univariate method of analysis
Choose relevant descriptive statistics
Choose relevant inferential statistics
De Vaus p133
Basic Definitions
 Values : the categories developed for a
variable
Nominal
 Ordinal
 Interval

 Data : Observations (Measurements) taken
on the units of analysis
Basic definitions
 Statistics - Methods for dealing with data

Descriptive statistics: summarise sample or census data
Inferential statistics: draw conclusions about the population from the results of a random sample drawn from that population
Methods of analysis (De Vaus, 134)
Univariate methods        Bivariate methods         Multivariate methods
Frequency distributions   Cross tabulations         Conditional tables
                          Scattergrams              Partial rank order correlation
                          Regression                Multiple and partial correlation
                          Rank order correlation    Multiple and partial regression
                          Comparison of means       Path analysis
Frequency Distributions
 Ungrouped frequency distribution
A list of each of the values of the variable
 The number of times and/or the percent of times each
value occurs

 Grouped frequency distribution

A table or graph which shows the frequencies or
percent for ranges of values
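The two kinds of frequency distribution described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the ages are invented, and the function names `ungrouped_frequencies` and `grouped_frequencies` are hypothetical helpers introduced only for this example.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only.
ages = [19, 22, 24, 27, 30, 30, 33, 35, 38, 41]

def ungrouped_frequencies(values):
    """Return (value, count, percent) rows, one per distinct value."""
    counts = Counter(values)
    n = len(values)
    return [(v, c, 100 * c / n) for v, c in sorted(counts.items())]

def grouped_frequencies(values, bins):
    """Return (label, count, percent, cumulative percent) rows.

    Each bin is an inclusive (low, high) range of values.
    """
    n = len(values)
    rows, cumulative = [], 0.0
    for low, high in bins:
        count = sum(low <= v <= high for v in values)
        percent = 100 * count / n
        cumulative += percent
        rows.append((f"{low}-{high}", count, percent, cumulative))
    return rows

bins = [(18, 24), (25, 31), (32, 38), (39, 45)]
for row in grouped_frequencies(ages, bins):
    print(row)
```

The grouped version trades detail for readability: it reports one row per range of values rather than one row per value.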
Frequency distributions
Value Label   Value   Frequency   Percent   Valid Percent   Cum Percent
18-24         1.00        5         25.0        25.0           25.0
25-31         2.00        5         25.0        25.0           50.0
32-38         3.00        6         30.0        30.0           80.0
39-45         4.00        4         20.0        20.0          100.0
                      -------     -------     -------
Total                    20        100.0       100.0

Valid cases 20     Missing cases 0
Frequency distributions
 Required information for frequency tables
table number and title
labels for the categories of the variables
column headings
the number of missing cases
Histograms
[Figure: histogram of age of employee. x-axis: Age of employee (24.0 to 64.0); y-axis: Frequency (0 to 100). Mean = 37.2, Std. Dev = 11.79, N = 474]
Describing Frequency Distributions
 Shape
 Symmetrical (Mirror image)

Skewed
Negative skew: tail toward lower scores
Positive skew: tail toward higher scores
 Dispersion
 Central tendency
Shape - for ordinal or interval variables
Positively skewed distribution
Cluster towards the low end of the variable
Shape - for ordinal or interval variables
Negatively skewed distribution
Cluster towards the high end of the variable
Shape - Symmetry
[Figure: histogram of job seniority. x-axis: Job seniority (62.5 to 97.5); y-axis: Frequency (0 to 60). Mean = 81.1, Std. Dev = 10.06, N = 474]
Central Tendency
 Typical or representative value or score

Mean (arithmetic mean) ($\bar{x}$)
Sum of all the observations / n
Use for interval variables when appropriate
Median
Value that divides the distribution so that an equal number of values lie above and below it
Mode
Value with the greatest frequency
A distribution may be uni-modal, bi-modal, etc.
Mode
 Best for nominal variables
 Problems



most common may not measure typicality
may be more than one mode
unstable - can be manipulated
 Dispersion

variation ratio (v): the % of people not in the modal category
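As a rough sketch of the points above, the mode and the variation ratio can be computed from a simple tally. The data and the function names `modes` and `variation_ratio` are hypothetical and chosen only for illustration; the ratio is returned as a proportion, so multiply by 100 for a percentage.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def variation_ratio(values):
    """Proportion of cases that are NOT in the modal category."""
    counts = Counter(values)
    return 1 - max(counts.values()) / len(values)

# Hypothetical nominal data, invented for illustration only.
transport = ["bus", "car", "car", "bike", "car", "bus", "walk", "car"]

print(modes(transport))            # ['car']
print(variation_ratio(transport))  # 0.5
print(modes(["a", "a", "b", "b", "c"]))  # ['a', 'b'], a bi-modal case
```

Returning a list of modes rather than a single value makes the "more than one mode" problem visible instead of hiding it.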
Median
 Preferred for ordinal variables



people are ranked from low to high
median is the middle case
the median category is the one that the middle
person belongs to
Value Label   Value   Frequency   Percent   Valid Percent   Cum Percent
18-24         1.00        5         25.0        25.0           25.0
25-31         2.00        5         25.0        25.0           50.0
32-38         3.00        6         30.0        30.0           80.0
39-45         4.00        4         20.0        20.0          100.0
                      -------     -------     -------
Total                    20        100.0       100.0

Valid cases 20     Missing cases 0
Dispersion
 The cth percentile of a set of numbers is a
value such that c percent of the numbers fall
below it and the rest fall above.



The median is the 50th percentile
The lower quartile is the 25th percentile
The upper quartile is the 75th percentile
 five number summary

Median, quartiles and extremes
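A five number summary can be sketched with Python's standard library. Quartile conventions differ between textbooks and software, so the choice of `method="inclusive"` here is an assumption rather than the rule the slides intend, and `five_number_summary` is a hypothetical helper name.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Small illustrative data set.
scores = [6, 7, 5, 3, 4]

low, q1, median, q3, high = five_number_summary(scores)
print(low, q1, median, q3, high)  # 3 4.0 5.0 6.0 7
print(q3 - q1)                    # interquartile range (IQR): 2.0
```

The interquartile range is the distance between the upper and lower quartiles, so it describes the spread of the middle half of the cases.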
Dispersion
[Figure: boxplots of Variable 1, Variable 2 and Variable 3 on a scale from 4 to 16, marking the lower quartile, median and upper quartile; the interquartile range (IQR) is the distance between the quartiles]
Mean
 uses the actual numerical values of the
observations
 most common measure of centre
 makes sense only for interval or ratio data
 frequently computed for ordinal variables as well
Dispersion
 The standard deviation and variance measure
spread about the mean as centre.
 Variance

mean of the squares of the deviations of the
observations from the mean.
 Standard deviation

the positive square root of the variance
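Written as formulas, the definitions above take the following form. The symbols $x_i$ (an observation), $n$ (the number of observations) and $\bar{x}$ (the mean) are notation assumed here for illustration. The division is by $n$, as in these slides; many texts divide by $n - 1$ when estimating a population variance from a sample.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s = \sqrt{s^2}
```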
Example Data (6,7,5,3,4)
 Mean: $\bar{x} = \frac{6+7+5+3+4}{5} = \frac{25}{5} = 5$
 Variance ($s^2$)
 Calculate the mean for the variable
 Take each observation and subtract the mean from it
 Square the result from the above
 Add (sum) all the individual results
 Divide by n

Variance ($s^2$)

Observation $x$   Deviation $x - \bar{x}$   Sq. deviation $(x - \bar{x})^2$
6                 6-5 = 1                   1
7                 7-5 = 2                   4
5                 5-5 = 0                   0
3                 3-5 = -2                  4
4                 4-5 = -1                  1
                                            Sum = 10

Variance = sum of the sq. deviations / number of observations: $s^2 = \frac{10}{5} = 2$
Standard deviation (s)
 Square root of the variance: $s = \sqrt{2} \approx 1.4$
 an average deviation of the observations
from their mean
 influenced by outliers
 best used with symmetrical distributions
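The worked example can be checked with a short Python sketch that follows the same steps: find the mean, take each deviation, square it, sum the squares and divide by n. The function name `variance_by_steps` is a hypothetical helper; dividing by n (rather than n - 1) follows the slides.

```python
import math

data = [6, 7, 5, 3, 4]

def variance_by_steps(values):
    """Population-style variance: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n                     # step 1: calculate the mean
    deviations = [x - mean for x in values]    # step 2: subtract the mean
    squared = [d ** 2 for d in deviations]     # step 3: square each deviation
    return sum(squared) / n                    # steps 4 and 5: sum, divide by n

variance = variance_by_steps(data)
std_dev = math.sqrt(variance)

print(variance)           # 2.0
print(round(std_dev, 1))  # 1.4
```

The standard library's `statistics.pvariance` also divides by n and should agree with this result, whereas `statistics.variance` divides by n - 1 and will not.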
Summary
 Determine if variable is nominal, ordinal or
interval

Nominal
  Frequency tables
  Mode
Ordinal
  Frequency tables (grouped frequency tables)
  Histogram
  Median and five number summary plus IQR
  Mode
Summary

Interval
Determine whether the distribution is skewed or
symmetrical
 Compare median and mean
 Use the mean and the standard deviation if the
distribution is not markedly skewed
 Otherwise use median and five number summary plus
IQR
 Use the mode in addition if it adds anything.
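The comparison of median and mean can be sketched as follows. This is only a rough indication of the direction of skew: the slides give no numeric cut-off for "markedly skewed", so none is assumed here, and both the data and the function name `skew_direction` are hypothetical.

```python
import statistics

def skew_direction(values):
    """Compare mean and median as a rough indicator of the direction of skew."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"      # tail toward higher scores pulls the mean up
    if mean < median:
        return "negative"      # tail toward lower scores pulls the mean down
    return "symmetrical"

# Hypothetical interval data with one very high score.
scores = [1, 2, 2, 3, 3, 3, 4, 20]

print(statistics.mean(scores))    # 4.75
print(statistics.median(scores))  # 3.0
print(skew_direction(scores))     # positive
```

A histogram remains the better check; the comparison only says which way the tail is likely to point, not how marked the skew is.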
