Lecture 7:
Bivariate Statistics
Properties of Standard Deviation
Variance is just the square of
the S.D.
If a constant is added to all
scores, it has no impact on S.D.
If all scores are multiplied by a constant,
the dispersion changes: the S.D. is multiplied
by the absolute value of the constant, and the
variance by its square.
S = standard deviation
X = individual score
M = mean of all scores
n = sample size (number
of scores)
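The properties above can be checked numerically. The sketch below is only an illustration: the scores are made up, and the n - 1 divisor is an assumption, since the slide's formula itself did not survive in this transcript.

```python
import math

def sample_sd(scores):
    """Standard deviation S of a list of scores X (n - 1 divisor is an assumption)."""
    n = len(scores)                          # n: sample size
    m = sum(scores) / n                      # M: mean of all scores
    ss = sum((x - m) ** 2 for x in scores)   # sum of squared deviations from M
    return math.sqrt(ss / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical scores

s = sample_sd(scores)
s_shifted = sample_sd([x + 10 for x in scores])   # constant added to all scores
s_scaled = sample_sd([x * 3 for x in scores])     # all scores multiplied by 3

print("S.D.:", s, "variance:", s ** 2)
print("after adding 10:", s_shifted)        # same as the original S.D.
print("after multiplying by 3:", s_scaled)  # three times the original S.D.
```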
Distributions and Standard
Deviations
Example: A distribution has a mean of 40
and a standard deviation of 5. 68% of the
distribution can be found between what
two values?
95% of the distribution can be found
between what two values?
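One way to work this example, assuming the distribution is normal (the slide does not say so explicitly): roughly 68% of scores lie within one S.D. of the mean and roughly 95% within two. The sketch uses Python's standard-library NormalDist to confirm the shares.

```python
from statistics import NormalDist

mean, sd = 40, 5
dist = NormalDist(mu=mean, sigma=sd)   # assumes a normal distribution

one_sd = (mean - sd, mean + sd)            # mean plus or minus 1 S.D.
two_sd = (mean - 2 * sd, mean + 2 * sd)    # mean plus or minus 2 S.D.

# Share of the distribution falling inside each interval
share_one = dist.cdf(one_sd[1]) - dist.cdf(one_sd[0])
share_two = dist.cdf(two_sd[1]) - dist.cdf(two_sd[0])

print(one_sd, round(share_one, 4))
print(two_sd, round(share_two, 4))
```

Under that assumption the answers are 35 to 45 for 68% and 30 to 50 for 95%.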
Standard Error of the Mean
Standard Error is an estimate of how much
the mean would vary over many samples
drawn from the same population.


It is calculated from a single sample; it is an
estimate of the standard deviation of the
sampling distribution of the mean.
A smaller S.E. suggests that our sample mean is likely a
good estimate of the population mean.
$SEM = \frac{s}{\sqrt{N}}$
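As a small illustration, the standard error of the mean (the sample S.D. divided by the square root of N) can be computed from one sample. The data are hypothetical.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical single sample

s = statistics.stdev(sample)          # sample standard deviation (n - 1 divisor)
n = len(sample)                       # N: sample size
sem = s / math.sqrt(n)                # standard error of the mean

print("mean:", statistics.fmean(sample), "SEM:", round(sem, 4))
```

Because N is in the denominator, a larger sample with the same S.D. gives a smaller standard error.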
Common Data Representations
Histograms

Simple graphs of the frequency of groups of scores.
Stem-and-Leaf Displays

Another way of displaying dispersion, particularly
useful when you do not have large amounts of data.
Box Plots

Yet another way of displaying dispersion. Boxes show
75th and 25th percentile range, line within box shows
median, and “whiskers” show the range of values
(min and max)
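The values a box plot displays can be sketched as a five-number summary. The data are made up, and the quartile convention is an assumption: statistics.quantiles uses its default "exclusive" method here, and other software may give slightly different quartiles.

```python
import statistics

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]   # hypothetical scores

# Three cut points that split the data into four equal-sized groups
q1, median, q3 = statistics.quantiles(data, n=4)

summary = {
    "min": min(data),    # end of lower whisker
    "q1": q1,            # 25th percentile, bottom of box
    "median": median,    # line within the box
    "q3": q3,            # 75th percentile, top of box
    "max": max(data),    # end of upper whisker
}
print(summary)
```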
Estimation and Hypothesis Tests:
The Normal Distribution
A key assumption for many variables (or
specifically, their scores/values) is that
they are normally distributed.
In large part, this is because the most
common statistics (chi-square, t, F test)
rest on this assumption.
Why do we make this assumption?
Central Limit Theorem
Errors can be viewed as a sum of many
independent random effects, thus individual scores
will tend to be normally distributed.
Even if Y is not normally distributed, the
distribution of the sample mean will tend to be
normal as the sample size increases.
Y=µ+ε

A given score (Y) is the sum of the mean of
the population (µ) and some error (ε)
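A small simulation can illustrate the second point. The sketch below assumes an exponential population (mean 1, S.D. 1), chosen only because it is clearly not normal, and compares the distribution of sample means for small and larger samples. The helper names are hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers

def sample_means(n, draws=5000):
    """Means of `draws` samples of size n from an exponential population."""
    return [
        statistics.fmean(random.expovariate(1.0) for _ in range(n))
        for _ in range(draws)
    ]

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

means_small = sample_means(2)
means_large = sample_means(30)

for n, means in ((2, means_small), (30, means_large)):
    print("n =", n,
          "| mean of sample means:", round(statistics.fmean(means), 3),
          "| S.D. of sample means:", round(statistics.stdev(means), 3),
          "| sigma/sqrt(n):", round(1 / math.sqrt(n), 3),
          "| skewness:", round(skewness(means), 3))
```

With the larger sample size, the sample means should cluster more tightly around the population mean and show less skew, which is the pattern the Central Limit Theorem describes.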
The z-score
Infinitely many normal
distributions are possible, one
for each combination of mean
and variance– but all related to
a single distribution.
Standardizing a group of
scores changes the scale to
one of standard deviation
units.
$z = \frac{Y - \mu}{\sigma}$
Allows for comparisons with
scores that were originally on a
different scale.
z-scores (continued)
Tells us where a score is located within a
distribution– specifically, how many
standard deviation units the score is above
or below the mean.
Properties


The mean of a set of z-scores is zero (why?)
The variance (and therefore standard
deviation) of a set of z-scores is 1.
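Both properties can be verified on a small set of scores. This sketch standardizes hypothetical scores using the sample mean and sample S.D. (an assumption; the slide's formula is written in terms of the population values µ and σ).

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical scores

m = statistics.fmean(scores)          # mean of the scores
s = statistics.stdev(scores)          # S.D. of the scores

# Each z-score: how many S.D. units the score is above or below the mean
z_scores = [(y - m) / s for y in scores]

z_mean = statistics.fmean(z_scores)   # zero, up to rounding error
z_sd = statistics.stdev(z_scores)     # one, up to rounding error

print([round(z, 2) for z in z_scores])
print("mean of z:", round(z_mean, 6), "S.D. of z:", round(z_sd, 6))
```

The mean is zero because subtracting the mean from every score makes the deviations sum to zero; dividing by the S.D. then rescales the spread to 1.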
Area under the normal curve
Example, you have a variable x with mean
of 500 and S.D. of 15. How common is a
score of 525?




Z = (525 - 500)/15 = 1.67
If we look up the z-statistic of 1.67 in a z-score
table, we find that the proportion of scores
less than our value is .9525.
In other words, a score of 525 exceeds about 95.25% of the
population; only about 4.75% of scores are higher (p < .05).
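The table lookup can be reproduced with the standard normal cumulative distribution function. This is a sketch using Python's standard library in place of a printed z-table.

```python
from statistics import NormalDist

mean, sd, score = 500, 15, 525

z = round((score - mean) / sd, 2)   # rounded to two decimals, as in a z-table
below = NormalDist().cdf(z)         # proportion of scores below 525
above = 1 - below                   # proportion of scores above 525

print("z =", z)
print("proportion below:", round(below, 4))
print("proportion above:", round(above, 4))
```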
Issues with Normal Distributions
Skewness
Kurtosis
Correlation
Hypothesis testing an association
between two metric variables
Checking for simple linear
relationships
Pearson’s correlation coefficient

Measures the extent to which two variables
are linearly related
Basically, the correlation coefficient is the
average of the cross products of the
corresponding z-scores.
Correlations
Ranges from -1 to +1, where +1 or -1 = perfect
linear relationship (positive or negative) between
the two variables, and 0 = no linear relationship.
Remember: correlation ONLY measures
linear relationships, not all relationships!
$r_{xy} = \frac{1}{N - 1} \sum_{i=1}^{N} z_{x_i} z_{y_i}$
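The description of the coefficient as the average cross product of z-scores can be sketched directly. The function name and data are hypothetical; the z-scores use the sample S.D. so that dividing by N - 1 gives Pearson's r.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson's r as the sum of z-score cross products divided by N - 1."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    zx = [(x - mx) / sx for x in xs]   # z-scores for x
    zy = [(y - my) / sy for y in ys]   # z-scores for y
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

x = [-2, -1, 0, 1, 2]

r_linear = pearson_r(x, [2 * v + 1 for v in x])     # perfect positive line
r_negative = pearson_r(x, [-3 * v for v in x])      # perfect negative line
r_curved = pearson_r(x, [v ** 2 for v in x])        # perfect, but not linear

print(r_linear, r_negative, r_curved)
```

The last case shows why the reminder matters: y is completely determined by x, yet r is zero because the relationship is not linear.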
Correlation Example
General Social Survey 1993
Education and Age
The t-test
Hypothesis testing for the equality
of means between two
independent groups
Alternative Hypotheses Revisited
Alternative Hypotheses:



H1: μ1 < μc
H1: μ1 > μc
H1: μ1 ≠ μc
How do we test whether the means of the
two populations we sampled are, in
fact, different?
The t-test
Where:
M = mean
SDM = Standard error of the difference between means
N = number of subjects in group
s = Standard Deviation of group
df = degrees of freedom
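The slide's formula is not recoverable from this transcript, so the following is only a sketch of one common form that matches the symbols listed: the equal-variance (pooled) independent-samples t-test, with t = (M1 - M2) / SDM and df = N1 + N2 - 2. The function name and data are hypothetical.

```python
import math
import statistics

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group1), len(group2)                                # N per group
    m1, m2 = statistics.fmean(group1), statistics.fmean(group2)      # M per group
    s1, s2 = statistics.stdev(group1), statistics.stdev(group2)      # s per group

    df = n1 + n2 - 2                                                 # degrees of freedom
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    sdm = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference between means
    return (m1 - m2) / sdm, df

t, df = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print("t =", round(t, 4), "df =", df)
```

The resulting t is compared against the t distribution with df degrees of freedom to decide whether the difference between the means is statistically significant.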
Degrees of freedom
d.f. = the number of independent pieces of
information from the data collected in a study.

Example: Choosing 10 numbers that add up to 100.
We have 10 choices, but once 9 numbers are chosen
the 10th is fixed by the restriction, so our
independent selections are reduced to N-1.
In statistics, further restrictions reduce the degrees
of freedom.
In the t-test, since we deal with two means, our
degrees of freedom are reduced by two.
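The "10 numbers that add up to 100" example can be made concrete. In this sketch the nine free choices are arbitrary; the tenth number is forced by the restriction.

```python
free_choices = [12, 7, 3, 20, 9, 15, 4, 11, 8]   # nine numbers chosen freely

# The restriction (total must be 100) determines the last number
last = 100 - sum(free_choices)
numbers = free_choices + [last]

degrees_of_freedom = len(numbers) - 1            # N - 1 independent choices

print(numbers, "sum:", sum(numbers), "d.f.:", degrees_of_freedom)
```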
Z-distribution versus t-distribution
t distribution
As the degrees of freedom increase (towards
infinity), the t distribution approaches the z
distribution (i.e., a normal distribution)
Because N plays such a prominent role in the
calculation of the t-statistic, note that for very
large N’s, the sample standard deviation (s)
begins to closely approximate the population
standard deviation (σ)
Assumptions Underlying the
Independent Sample t-test
Assumption of Normality
Assumption of Homogeneity of Variance

The outputs for the t-test in SPSS correspond
to the standard t-test (equal variance
assumed) and a separate variance t-test
(equal variance not assumed)
Practical Example:
Do men and women watch different
amounts of TV per week?
General Social Survey 1993