Introduction to Quantitative Data Analysis (continued)
Reading on Quantitative Data Analysis: Baxter and Babbie, 2004, Chapter 11.
Course website: http://www.sfu.ca/cmns/faculty/marontate_j/260/07-spring/
Audio recordings of Thursday lectures available on-line (for students registered in the course) at www.sfu.ca/lectures

Last Day: Beginning of Quantitative Data Analysis

Introduction to Common Ways of Presenting Statistics & Importance for Analysis (descriptive statistics)
• Tables
• Charts
• Graphs

Univariate Statistics
• Measures of Central Tendency
• Measures of Dispersion

Discrete & Continuous Variables

Continuous
• Variable can take infinite (or large) number of values within range
• Ex. Age measured by exact date of birth

Discrete
• Attributes of variable that are distinct but not necessarily continuous
• Ex. Age measured by age groups (Note: techniques exist for making assumptions about discrete variables in order to use techniques developed for continuous variables)
The Lexis Diagram
[Figure: Lexis diagram. Horizontal axis: Period (1890 to 2010). Vertical axis: Age (0 to 80). Life line: cohort born in 1948. Isochron: observation in 1968. Age at year of observation: 20.]
Core Notions in Basic Univariate Statistics
• Ways of describing data about one variable (“uni”=one)
• Measures of central tendency: summarize information about one variable (“averages”)
• Measures of dispersion: variations or “spread”
Mode
• most common or frequently occurring category or value (for all types of data)
Babbie (1995: 378)

Bimodal
• When there are two “most common” values that are almost the same (or the same)

Median
• middle point of rank-ordered list of all values (only for ordinal, interval or ratio data)
Babbie (1995: 378)

Mean (arithmetic mean)
• Arithmetic “average” = sum of values divided by number of cases (only for ratio and interval data)
Babbie (1995: 378)
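As a rough illustration of the three measures above, the sketch below uses Python's standard `statistics` module on a small set of ages. The ages are made up for this example and do not come from the lecture or from Babbie.

```python
import statistics

# Hypothetical ages, invented for illustration only (not from the lecture).
ages = [19, 20, 20, 21, 22, 22, 30]

modes = statistics.multimode(ages)    # every value tied for most frequent
median_age = statistics.median(ages)  # middle value of the rank-ordered list
mean_age = statistics.mean(ages)      # sum of values divided by number of cases

print("modes:", modes)        # two values tie here, so this set is bimodal
print("median:", median_age)
print("mean:", mean_age)
```

Note that the single older person (30) pulls the mean above the median, while the mode ignores that case entirely.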
Two Data Sets with the Same Mean

Another Diagram of Normal Curve (Showing Ideal Random Sampling Distribution, Standard Deviation & Z-scores)

Normal Distribution & Measures of Central Tendency
• Symmetric
• Also called the “Bell Curve”
Neuman (2000: 319)

Skewed Distributions & Measures of Central Tendency
• Skewed to the left
• Skewed to the right
Neuman (2000: 319)
Why Measures of Central Tendency are not enough to describe distributions
• 7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
  • median = 30, mean = 30
• 7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
  • median = 30, mean = 30
• BUT issue of “spread” socially significant
Another Illustration of Normal & Skewed Distributions

Measures of Variation or Dispersion
• range: distance between largest and smallest scores
• standard deviation: for comparing distributions
• percentiles: % up to and including the number (from below)
• z-scores: for comparing individual scores taking into account the context of different distributions
Range & Interquartile range
Range
• distance between largest and smallest scores
• what does a short distance between the scores tell us about the sample?
• But problems of “outliers” or extreme values may occur
Interquartile range (IQR)
• distance between the 75th percentile and the 25th percentile
• range of the middle 50% (approximately) of the data
• Eliminates problem of outliers or extreme values
Example from StatCan website (11 in sample)
• Data set: 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36
• Ordered data set: 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
• Median: 41
• Upper quartile: 43
• Lower quartile: 15
• IQR = 43 - 15 = 28
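The quartiles for this data set can be checked with a short sketch. It assumes one common convention (not necessarily the only one in use): each quartile is the median of the half of the ordered data below or above the overall median, with the median itself left out of both halves when the number of cases is odd. Under that assumption the lower quartile is 15, the upper quartile is 43, and the IQR is 28. The function name `quartiles` is just a label chosen for this example.

```python
import statistics

data = [6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36]

def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumed convention: quartiles are medians of the lower and upper
    halves of the ordered data, excluding the overall median when the
    number of cases is odd.
    """
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return (
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
    )

q1, median, q3 = quartiles(data)
iqr = q3 - q1
print("lower quartile:", q1, "median:", median, "upper quartile:", q3, "IQR:", iqr)
```

Other quartile conventions (for example, ones that interpolate between neighbouring values) can give slightly different results on small samples.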
Standard Deviation and Variance
• Interquartile range eliminates problem of outliers BUT eliminates half the data
• Solution? Measure variability from the center of the distribution.
• Standard deviation & variance measure how far on average scores deviate or differ from the mean.
Calculation of Standard Deviation
[Figure: worked calculation of standard deviation. Neuman (2000: 321)]

Standard Deviation Formula
Neuman (2000: 321)

Details on the Calculation of Standard Deviation
Neuman (2000: 321)
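The formula images from these slides did not survive in the transcript. A common textbook form is sketched below, assumed here to be the population version (dividing by N); it may not match the notation in Neuman exactly. Here X_i is each score, \bar{X} is the mean and N is the number of cases. A sample version divides by N - 1 instead.

```latex
\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i
\qquad
\text{variance} = \frac{1}{N}\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2
\qquad
\text{standard deviation} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^2}
```

Read as steps: find the mean, subtract it from each score, square each difference, add the squares, divide by the number of cases, and take the square root.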
Discussion: The Bell Curve & standard deviation

Discussion of Preceding Diagram
• “Many biological, psychological and social phenomena occur in the population in the distribution we call the bell curve (Portney & Watkins, 2000).” (link to source)
• Preceding picture
  • a symmetrical bell curve,
  • “average score [i.e., the mean] in the middle, where the ‘bell’ shape tallest.
  • Most of the people [i.e., 68% of them, or 34% + 34%] have performance within 1 segment [i.e., a standard deviation] of the average score.”
Interpreting Standard Deviation
• amount of variation from mean
• Illustration: high & low standard deviation
• meaning depends on exact case
Recall: Central Tendency & Dispersion (description of distributions)
• 7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
  • median = 30, mean = 30
  • Range = 10, standard deviation = 3.8
• 7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
  • median = 30, mean = 30
  • Range = 50, standard deviation = 17.9
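The range and standard deviation of the two groups can be worked through step by step. The sketch below assumes the population form of the standard deviation (dividing by the number of cases); with that assumption it gives roughly 3.8 for the bar group and 17.9 for the ice-cream parlour group. The helper name `describe_spread` is made up for this example.

```python
import math

def describe_spread(values):
    """Return (range, population standard deviation) for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / n  # population variance: divide by N
    return max(values) - min(values), math.sqrt(variance)

bar = [25, 26, 27, 30, 33, 34, 35]
ice_cream = [5, 10, 20, 30, 40, 50, 55]

bar_range, bar_sd = describe_spread(bar)
ice_range, ice_sd = describe_spread(ice_cream)

print("bar:       range", bar_range, "sd", round(bar_sd, 1))
print("ice cream: range", ice_range, "sd", round(ice_sd, 1))
```

Both groups share the same mean and median, so only the range and standard deviation reveal how differently the ages are spread.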
Other ways of characterizing dispersion or spread
Techniques for understanding position of a case (or group of cases) in the context of all cases
• Percentiles
• Standard Scores
  • z-scores
Percentile
• 1st calculate rank, then choose a rank (score) and figure out percentage equal to or less than the rank (score)
• Link to more complex definition of percentile
• % up to and including the number (from below)
• “A percentile rank is typically defined as the proportion of scores in a distribution that a specific score is greater than or equal to. For instance, if you received a score of 95 on a math test and this score was greater than or equal to the scores of 88% of the students taking the test, then your percentile rank would be 88. You would be in the 88th percentile”
• Also used in other ways (for example to eliminate cases)
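The "% up to and including the number" definition can be sketched in a few lines. The test scores below are hypothetical, invented only to show the calculation, and `percentile_rank` is a name chosen for this example.

```python
def percentile_rank(scores, score):
    """Percentage of scores that are less than or equal to `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

# Hypothetical test scores for a class of ten (not from the lecture).
test_scores = [52, 60, 67, 71, 75, 78, 82, 88, 95, 99]

rank = percentile_rank(test_scores, 95)
print("percentile rank of a score of 95:", rank)
```

In this made-up class, nine of the ten scores are at or below 95, so that score sits at the 90th percentile.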
z-scores
• For understanding how a score is positioned in the data set
• to enable comparisons with other scores from other data sets (comparing individual scores in different distributions)
  • example of two students from different schools with different GPAs
• comparing sample distributions to population. How representative is the sample of the population under study? (Link to more complete discussion of use of z-scores to understand sampling distribution)
Calculating Z-Scores
• z-score = (score - sample mean) / standard deviation of set
• Link to formula
• Link to z-score calculator

Calculating Z-Scores (p. 265 textbook)
Using Z-scores to compare two students from different schools: A
Susan with GPA of 3.62 and Jorge with GPA of 3.64
• Susan from College A
  • Susan’s Grade Point Average = 3.62
  • Mean GPA = 2.62
  • SD = .50
  • Susan’s z-score = (3.62 - 2.62)/.50 = 1.00/.50 = 2
  • Susan’s grade is two standard deviations above the mean at her school

Using Z-scores to compare two students from different schools: B
• Jorge from College B
  • Jorge’s GPA = 3.64
  • Mean GPA = 3.24
  • SD = .40
  • Jorge’s z-score = (3.64 - 3.24)/.40 = .40/.40 = 1
  • Jorge’s grade is one standard deviation above the mean at his school
• Susan’s absolute grade is lower but her position relative to other students at her school is much higher than Jorge’s position at his school
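The comparison of the two students can be reproduced with a small function. The GPAs, means and standard deviations are the ones given on the slides; the function name `z_score` is just a label for this sketch.

```python
def z_score(score, mean, sd):
    """Standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

susan_z = z_score(3.62, mean=2.62, sd=0.50)  # College A
jorge_z = z_score(3.64, mean=3.24, sd=0.40)  # College B

# Rounded for display, since floating-point subtraction is not exact.
print("Susan:", round(susan_z, 2))
print("Jorge:", round(jorge_z, 2))
print("Susan ranks higher relative to her school:", susan_z > jorge_z)
```

Because each z-score is expressed in units of its own school's standard deviation, the two values can be compared directly even though the schools grade differently.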
Another Diagram of Normal Curve with Standard Deviation & Z-scores

Discussion of Previous Case
• Relationship of sampling distribution to population (use mean of sample to estimate mean of population)
Recall: Results with two Variables-Bivariate Statistics
Statistical relationships between two variables
• Covariation (vary together)
  • a type of association
  • Not necessarily causal
• Independence (Null hypothesis): no relationship between the two variables
  • Cases with values in one variable do not have any particular value on the other variable
Sample Mean Notation
Population Mean Notation

Standard Error (recall tutorial task about average ages in family)
• Calculate mean for all possible samples
• Divide by number of samples
• Measures variability of the sample means
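One reading of the steps on this slide is sketched below: list every possible sample, compute each sample's mean, average those means, and measure how much they vary. The family ages and the sample size of 2 are hypothetical (the tutorial data are not in this transcript), and treating the standard error as the standard deviation of all possible sample means is the assumption made here.

```python
import itertools
import statistics

# Hypothetical family ages (not the tutorial data).
family_ages = [8, 12, 40, 44]
sample_size = 2  # assumed: samples of two people, drawn without replacement

samples = list(itertools.combinations(family_ages, sample_size))
sample_means = [statistics.mean(sample) for sample in samples]

population_mean = statistics.mean(family_ages)
mean_of_sample_means = statistics.mean(sample_means)  # sum of means / number of samples
standard_error = statistics.pstdev(sample_means)      # spread of the sample means

print("number of possible samples:", len(samples))
print("population mean:", population_mean)
print("mean of sample means:", mean_of_sample_means)
print("standard error:", round(standard_error, 2))
```

The mean of all the sample means equals the population mean, while the standard error describes how far a single sample's mean is likely to fall from it.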
Recall: Results with two Variables-Bivariate Tables (Cross Tabulations)
Singleton, R., Straits, B. & Straits, M. (1993) Approaches to social research. Toronto: Oxford

Interpretation issues (Bivariate Tables)
• Calculate percentages within categories of attributes of independent variable
• In example:
  • Independent variable: gender
  • Dependent variable: fear of walking alone at night
  • Women more afraid than men

Other Ways of Presenting Same Data
• Link to other tables
Calculating Expected Outcomes
• If variables (gender & fear) are not related, then the distribution of the dependent variable (fear) should be the same in each subgroup of the independent variable (male & female) as in the group overall (therefore men and women should express fear in the same proportions)
• Used in techniques for studying relationships (Chi-square)
  • Descriptive dimension (strength of relationship)
  • Inferential (probability that the association is due to chance)

Expected outcomes (Null Hypothesis)
Singleton, R., Straits, B. & Straits, M. (1993) Approaches to social research. Toronto: Oxford
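The expected counts under the null hypothesis can be sketched as (row total × column total) / grand total for each cell. The counts below are hypothetical and are NOT the figures from Singleton et al. (1993), whose table is not reproduced in this transcript. The chi-square statistic shown is the standard sum of (observed - expected)² / expected.

```python
# Hypothetical counts, invented for illustration (not from Singleton et al.).
observed = {
    "men":   {"afraid": 20, "not afraid": 80},
    "women": {"afraid": 60, "not afraid": 40},
}

def expected_counts(table):
    """Expected cell counts if the row and column variables were unrelated."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total
              for col in col_totals}
        for row in table
    }

expected = expected_counts(observed)

# Chi-square: how far the observed counts are from the expected ones.
chi_square = sum(
    (observed[row][col] - expected[row][col]) ** 2 / expected[row][col]
    for row in observed
    for col in observed[row]
)

print("expected:", expected)
print("chi-square:", round(chi_square, 2))
```

With these made-up counts, 40% of all respondents are afraid, so under independence 40 of the 100 men and 40 of the 100 women would be expected to express fear.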
Next Day
Control variables: Trivariate Tables

Men/Women Drivers
In Say it with Figures, Hans Zeisel presents the following data:

Automobile Accidents by Sex
------------------------------------------
          Per Cent Accident Free
Women     68% (6,950)
Men       56% (7,080)
------------------------------------------

Automobile Accidents by Sex and Distance Driven
----------------------------------------------------------------------------
                              Distance
          Under 10,000 km              Over 10,000 km
          Per Cent Accident Free       Per Cent Accident Free
Women     75% (5,035)                  48% (1,915)
Men       75% (2,070)                  48% (5,010)
----------------------------------------------------------------------------

Women have fewer accidents than men because women tend to drive less frequently than do men, and people who drive less frequently tend to have fewer accidents.
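The way the control variable accounts for the original difference can be checked by recombining the subgroup figures. The sketch below uses the percentages and case counts from the second table; the function name `overall_percent` is made up for this example. Weighting each distance group by its number of drivers recovers roughly 68% for women and 56% for men.

```python
# (per cent accident free, number of drivers), taken from the second table.
drivers = {
    "women": {"under 10,000 km": (75, 5035), "over 10,000 km": (48, 1915)},
    "men":   {"under 10,000 km": (75, 2070), "over 10,000 km": (48, 5010)},
}

def overall_percent(groups):
    """Weighted per cent accident free across the distance groups."""
    total = sum(n for _, n in groups.values())
    accident_free = sum(pct / 100 * n for pct, n in groups.values())
    return 100 * accident_free / total

def share_under(groups):
    """Per cent of a sex's drivers who are in the under 10,000 km group."""
    total = sum(n for _, n in groups.values())
    return 100 * groups["under 10,000 km"][1] / total

women_overall = overall_percent(drivers["women"])
men_overall = overall_percent(drivers["men"])

print("women accident free:", round(women_overall), "%")
print("men accident free:", round(men_overall), "%")
print("women under 10,000 km:", round(share_under(drivers["women"])), "%")
print("men under 10,000 km:", round(share_under(drivers["men"])), "%")
```

Within each distance group the two sexes have identical accident-free rates; the overall gap appears only because a much larger share of the women fall in the low-distance group.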