Download Theories - the Department of Psychology at Illinois State University

Basic Statistical Concepts
Psych 231: Research
Methods in Psychology

Mistrust of statistics?


It is all in how you use them
They are a critical tool in research
Statistics

Why do we use them?

Descriptive statistics
• Used to describe, simplify, & organize data sets
• Describing distributions of scores

Inferential statistics
• Used to test claims about the population, based on data
gathered from samples
• Takes sampling error into account: are the results above
and beyond what you’d expect by random chance?
Statistics


Recall that a variable is a characteristic that can take
different values.
The distribution of a variable is a summary of all the
different values of a variable
 Both type (each value) and token (each instance)
Example: How much do you like psy231? (rated 1-2-3-4-5, where 1 = Hate it and 5 = Love it)
• 5 values (types): 1, 2, 3, 4, 5
• 7 tokens (instances): 1, 1, 2, 3, 4, 5, 5
Distribution
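The type/token distinction can be sketched in a few lines of Python. This is only an illustration using the seven ratings from the psy231 example; the variable names are made up for this sketch.

```python
from collections import Counter

# The 7 tokens (instances) from the psy231 rating example
tokens = [1, 1, 2, 3, 4, 5, 5]

# Counter maps each type (distinct value) to the number of tokens it has
distribution = Counter(tokens)

# The types are the distinct values that actually occur
types = sorted(distribution)

print("types:", types)
print("number of tokens:", len(tokens))
for value in types:
    print(value, distribution[value])
```

The distribution is the full mapping from each value to its count, which is what a frequency table or histogram displays.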

Many important distributions

Population
• All the scores of interest

Sample
• All of the scores observed
(your data)
• Used to estimate population
characteristics

Distribution of sample distributions
• Used to estimate sampling error

[Figure: a population of scores with samples drawn from it]

How do we describe these distributions?
• Use descriptive statistics, focus on 3 properties
Distribution
Properties of a distribution

Shape
• Symmetric v. asymmetric (skew)
• Unimodal v. multimodal

Center
• Where most of the data in the distribution are
• Mean, Median, Mode

Spread (variability)
• How similar/dissimilar are the scores in the distribution?
• Standard deviation (variance), Range
Distribution

Visual descriptions - A picture of the distribution
is usually helpful
• Gives a good sense of the properties of the distribution

Many different ways to display distribution
• Graphs
• Continuous variable:
• histogram, line graph (frequency polygons)
• Categorical variable:
• pie chart, bar chart
• Table
• Frequency distribution table

Numerical descriptions of distributions
Distribution

A frequency histogram
Example: Distribution of scores on an exam
[Figure: frequency histogram. Y-axis: Frequency (0 to 20). X-axis: Exam scores in bins 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90-94, 95-100]
Graph for continuous variables
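A minimal sketch of how scores could be grouped into the 5-point bins shown on the histogram's x-axis. The scores below are hypothetical (the slide does not give the raw data), and `bin_start` is an illustrative helper, not part of any library.

```python
from collections import Counter

# Hypothetical exam scores, invented for illustration only
scores = [52, 58, 61, 67, 67, 73, 74, 78, 81, 85, 88, 92, 100]

def bin_start(score, width=5, top=95):
    """Return the lower edge of the bin that holds `score`.

    Bins are 50-54, 55-59, ..., 90-94, plus a final 95-100 bin,
    so a score of 100 is placed in the bin starting at `top`.
    """
    return min((score // width) * width, top)

# Count how many scores fall in each bin
frequencies = Counter(bin_start(s) for s in scores)

# A crude text histogram: one '#' per score in the bin
for start in sorted(frequencies):
    end = 100 if start == 95 else start + 4
    print(f"{start}-{end}: {'#' * frequencies[start]}")
```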

A line graph
Example: Distribution of scores on an exam
[Figure: line graph (frequency polygon). Y-axis: Frequency (0 to 20). X-axis: Exam scores from 50 to 95 in steps of 5]
Graph for continuous variables

Bar chart
Pie chart
[Figures: a bar chart and a pie chart of one categorical variable with the categories Cutting, Doe, Missing, Smith]
Graphs for categorical variables
Be careful using a line graph for categorical variables


The line implies that there are responses between Smith and Doe,
but there are not: the categories are separate, with nothing in between.
Caution
VAR00003

Value    Frequency   Percent   Valid Percent   Cumulative Percent
1.00          2         7.7         7.7                7.7
2.00          3        11.5        11.5               19.2
3.00          3        11.5        11.5               30.8
4.00          5        19.2        19.2               50.0
5.00          4        15.4        15.4               65.4
6.00          2         7.7         7.7               73.1
7.00          4        15.4        15.4               88.5
8.00          2         7.7         7.7               96.2
9.00          1         3.8         3.8              100.0
Total        26       100.0       100.0

• Value column: the values (types)
• Frequency column: the counts
• Remaining columns: percentages
Frequency distribution table
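The percent and cumulative percent columns follow directly from the counts. A minimal sketch, using the frequencies from the table; the variable names are illustrative.

```python
# Counts per value, taken from the frequency column of the table
counts = {1: 2, 2: 3, 3: 3, 4: 5, 5: 4, 6: 2, 7: 4, 8: 2, 9: 1}

total = sum(counts.values())

rows = []
running = 0  # running count used for the cumulative percent
for value in sorted(counts):
    running += counts[value]
    percent = 100 * counts[value] / total
    cumulative = 100 * running / total
    rows.append((value, counts[value], round(percent, 1), round(cumulative, 1)))

print("Value  Freq  Percent  Cumulative")
for value, freq, percent, cumulative in rows:
    print(f"{value:>5}  {freq:>4}  {percent:>7.1f}  {cumulative:>10.1f}")
print(f"Total  {total:>4}")
```

Each percent is the count divided by the total of 26, and each cumulative percent adds up every count at or below that value.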

Symmetric
• The two sides line up
Asymmetric (skewed)
• The two sides do not
line up
Properties of distributions: Shape

Symmetric
Asymmetric (skewed)
• Negative skew: the tail stretches toward the low scores
• Positive skew: the tail stretches toward the high scores
Properties of distributions: Shape

Unimodal (one mode)
Multimodal (more than one mode)
• Bimodal examples: a major mode and a minor mode
Properties of distributions: Shape

There are three main measures of center

Mean (M): the arithmetic average
• Add up all of the scores and divide by the total number
• Most used measure of center

Median (Mdn): the middle score in terms of location
• The score that cuts off the top 50% of the scores from the bottom
50%
• Good for skewed distributions (e.g. net worth)

Mode: the most frequent score
• Good for nominal scales (e.g. eye color)
• A must for multi-modal distributions
Properties of distributions: Center
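The three measures of center can be compared with Python's standard `statistics` module. This is only a sketch: the scores are hypothetical and chosen so that one extreme value pulls the mean away from the median.

```python
import statistics

# Hypothetical, positively skewed scores (one extreme value), for illustration only
scores = [1, 2, 2, 3, 4, 5, 60]

mean = statistics.mean(scores)      # add up all scores, divide by how many there are
median = statistics.median(scores)  # the middle score once the scores are sorted
mode = statistics.mode(scores)      # the most frequent score

print("mean:", mean)
print("median:", median)
print("mode:", mode)

# For a multimodal distribution, multimode() lists every most-frequent value
bimodal_scores = [1, 1, 2, 3, 3]
modes = statistics.multimode(bimodal_scores)
print("modes:", modes)
```

Here the single score of 60 raises the mean well above the median, which is why the median is preferred for skewed distributions.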


The most commonly used measure of center
The arithmetic average
Computing the mean
– The formula for the population mean is (a parameter):
  $\mu = \frac{\sum X}{N}$
  (add up all of the X’s, then divide by the total number in the population)
– The formula for the sample mean is (a statistic):
  $\bar{X} = \frac{\sum X}{n}$
  (add up all of the X’s, then divide by the total number in the sample)
The Mean

How similar are the scores?

Range: the maximum value - minimum value
• Only takes two scores from the distribution into account
• Influenced by extreme values (outliers)

Standard deviation (SD): (essentially) the average amount that
the scores in the distribution deviate from the mean
• Takes all of the scores into account
• Also influenced by extreme values (but not as much as the range)

Variance: standard deviation squared
Spread (Variability)
Low variability: the scores are fairly similar (close to the mean)
High variability: the scores are fairly dissimilar (spread out around the mean)
Variability

The standard deviation is the most popular and most
important measure of variability.

The standard deviation measures how far off all of the
individuals in the distribution are from a standard, where that
standard is the mean of the distribution.
• Essentially, the average of the deviations.

Standard deviation
Our population
2, 4, 6, 8
$\mu = \frac{\sum X}{N} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5.0$
An Example: Computing the Mean




Our population
2, 4, 6, 8
$\mu = \frac{\sum X}{N} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5.0$
Step 1: To get a measure of the
deviation we need to subtract the
population mean from every
individual in our distribution.
$X - \mu$ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3

Notice that if you add up
all of the deviations they
must equal 0.
An Example: Computing Standard
Deviation (population)

Step 2: The deviations always sum to 0, so what we have to do is get
rid of the negative signs. We do this by squaring the deviations and
then adding them up to get the sum of the squared deviations (SS).
$X - \mu$ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3
$SS = \sum (X - \mu)^2$
$= (-3)^2 + (-1)^2 + (+1)^2 + (+3)^2$
$= 9 + 1 + 1 + 9 = 20$
An Example: Computing Standard
Deviation (population)

Step 3: Compute Variance (which is simply the average
of the squared deviations)

So to get this average, we need to divide the SS by the number of
individuals in the population.
variance = $\sigma^2 = SS/N = 20/4 = 5.0$
An Example: Computing Standard
Deviation (population)

Step 4: Compute Standard Deviation

To get this we need to take the square root of the population
variance.
standard deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

An Example: Computing Standard
Deviation (population)

To review:



Step 1: Compute deviation scores
Step 2: Compute the SS
Step 3: Determine the variance
• Take the average of the squared deviations
• Divide the SS by the N

Step 4: Determine the standard deviation
• Take the square root of the variance
An Example: Computing Standard
Deviation (population)
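The four steps can be traced in code for the example population 2, 4, 6, 8. This is a minimal sketch that follows the steps literally; the variable names are illustrative.

```python
import math

population = [2, 4, 6, 8]
N = len(population)
mu = sum(population) / N  # population mean

# Step 1: deviation scores (each score minus the mean)
deviations = [x - mu for x in population]

# Step 2: sum of the squared deviations (SS)
SS = sum(d ** 2 for d in deviations)

# Step 3: variance is the average squared deviation (SS divided by N)
variance = SS / N

# Step 4: standard deviation is the square root of the variance
sd = math.sqrt(variance)

print("deviations:", deviations)
print("SS:", SS)
print("variance:", variance)
print("standard deviation:", round(sd, 2))
```

For this population the deviations are -3, -1, +1, +3, the SS is 20, the variance is 5.0, and the standard deviation is about 2.24.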

For a sample, the steps are the same except for Step 3:

Step 1: Compute deviation scores
Step 2: Compute the SS
Step 3: Determine the variance
• Divide the SS by n - 1 (instead of N)

Step 4: Determine the standard deviation
• Take the square root of the variance

Dividing by n - 1 is done because samples tend to be less variable
than the population. This “correction factor” increases the
sample’s SD, making it a better estimate of the population’s
SD.
An Example: Computing Standard
Deviation (sample)
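The effect of dividing by n - 1 can be checked against Python's standard `statistics` module, where `stdev` uses n - 1 and `pstdev` uses N. This is a sketch that reuses the scores 2, 4, 6, 8, treated here as a sample rather than a population (an assumption made only for the comparison).

```python
import math
import statistics

sample = [2, 4, 6, 8]  # same scores as before, now treated as a sample
n = len(sample)
mean = sum(sample) / n

# Steps 1 and 2: deviation scores, then the sum of squared deviations
SS = sum((x - mean) ** 2 for x in sample)

# Step 3 (sample version): divide the SS by n - 1
sample_variance = SS / (n - 1)

# Step 4: square root of the variance
sample_sd = math.sqrt(sample_variance)

# Dividing by n instead gives the smaller, population-style value
population_style_sd = math.sqrt(SS / n)

print("sample SD (n - 1):", round(sample_sd, 2))
print("population-style SD (n):", round(population_style_sd, 2))
print("matches statistics.stdev:", math.isclose(sample_sd, statistics.stdev(sample)))
```

The n - 1 version is always the larger of the two, which is the correction the slide describes.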

Example: Suppose that you notice that the
more you study for an exam, the better your
score typically is.


This suggests that there is a relationship
between study time and test performance.
We call this relationship a correlation.
Relationships between variables

Properties of a correlation




Form (linear or non-linear)
Direction (positive or negative)
Strength (none, weak, strong, perfect)
To examine this relationship you should:


Make a scatterplot
Compute the Correlation Coefficient
Relationships between variables


Plots one variable against the other
Useful for “seeing” the relationship



Form, Direction, and Strength
Each point corresponds to a different individual
Imagine a line through the data points
Scatterplot
Hours study (X)   Exam perf. (Y)
       6                 6
       1                 2
       5                 6
       3                 4
       3                 2
[Figure: scatterplot of Y (exam perf.) against X (hours study); both axes run from 1 to 6]
Scatterplot



A numerical description of the relationship between two
variables
For the relationship between two continuous variables we
use Pearson’s r
It basically tells us how much our two variables vary
together

As X goes up, what does Y typically do?
• X goes up, Y goes up
• X goes up, Y goes down
• X goes up, Y shows no consistent change
Correlation Coefficient
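Pearson's r can be computed from the same deviation scores used for the standard deviation. The sketch below uses the standard formula (sum of the products of the deviations, divided by the square root of the product of the two sums of squares). The slides do not spell out the formula, so treat `pearson_r` and the data as illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for paired scores.

    Assumes both lists have the same length and that neither
    variable is constant (otherwise the denominator is zero).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of the paired deviations: how much X and Y vary together
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable on its own
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Illustrative data: hours of study (X) and exam performance (Y)
hours = [6, 1, 5, 3, 3]
exam = [6, 2, 6, 4, 2]

r = pearson_r(hours, exam)
print("r =", round(r, 2))
```

With these values r comes out at roughly 0.90, a strong positive relationship: people who studied more tended to score higher.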
Linear
Non-linear
Form

Positive
• As X goes up, Y goes up
• X & Y vary in the same direction
• Positive Pearson’s r
Negative
• As X goes up, Y goes down
• X & Y vary in opposite directions
• Negative Pearson’s r
Direction

Zero means “no relationship”.


The farther the r is from zero, the stronger the
relationship
The strength of the relationship

Spread around the line (note the axis scales)
Strength
r = -1.0: “perfect negative corr.”
r = 0.0: “no relationship”
r = +1.0: “perfect positive corr.”
The farther from zero, the stronger the relationship
Strength
Rel A: r = -0.8
Rel B: r = 0.5
Which relationship is stronger?
Rel A: -0.8 is stronger than +0.5
Strength
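Because strength depends on distance from zero and not on sign, relationships can be compared by the absolute value of r. A small sketch using the two values from the slide:

```python
# r values for the two relationships on the slide
relationships = {"Rel A": -0.8, "Rel B": 0.5}

# Strength is the distance from zero, so compare absolute values
strongest = max(relationships, key=lambda name: abs(relationships[name]))

print("stronger relationship:", strongest)
```

The sign only tells you the direction of the relationship.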

Compute the equation for the line that best
fits the data points
Y = (X)(slope) + (intercept)
slope = (Change in Y) / (Change in X) = 0.5
intercept = 2.0
[Figure: scatterplot with the best-fitting line drawn through the data points; both axes run from 1 to 6]
Regression
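The slides do not say how the best-fitting line is found. A common choice is ordinary least squares, sketched below as an assumption. The data are hypothetical, picked so that the fitted line matches the slide's slope of 0.5 and intercept of 2.0, and `fit_line` is an illustrative helper.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for paired scores.

    Assumes equally long lists and that X is not constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared X deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x
    # The least-squares line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on Y = (X)(0.5) + 2.0
xs = [2, 4, 6]
ys = [3, 4, 5]

slope, intercept = fit_line(xs, ys)
print("slope:", slope)
print("intercept:", intercept)
```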

Can make specific predictions about Y
based on X
X = 5, Y = ?
Y = (X)(.5) + (2.0)
Y = (5)(.5) + (2.0)
Y = 2.5 + 2 = 4.5
[Figure: regression line with the predicted point X = 5, Y = 4.5 marked; both axes run from 1 to 6]
Regression
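The prediction on this slide amounts to plugging X into the line. A minimal sketch, with the slope and intercept from the slide as defaults; `predict` is an illustrative name.

```python
def predict(x, slope=0.5, intercept=2.0):
    """Predicted Y for a given X on the line Y = (X)(slope) + (intercept)."""
    return x * slope + intercept

# The slide's example: X = 5
y_hat = predict(5)
print("predicted Y for X = 5:", y_hat)
```

The line is only trustworthy for X values within the range of the data it was fitted on.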

Also need a measure of error
Y = X(.5) + (2.0) + error
• Same line, but different relationships (strength difference)
[Figure: two scatterplots with the same regression line; the points lie close to the line in one and are widely scattered around it in the other; both axes run from 1 to 6]
Regression
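The "same line, different strength" point can be illustrated with two hypothetical data sets. Both have Y = X(.5) + 2.0 as their least-squares line, but the errors (residuals) differ. The data and helper names are invented for this sketch.

```python
import math

SLOPE, INTERCEPT = 0.5, 2.0  # the line from the slide

def residuals(xs, ys):
    """Error for each point: observed Y minus the Y predicted by the line."""
    return [y - (SLOPE * x + INTERCEPT) for x, y in zip(xs, ys)]

def pearson_r(xs, ys):
    """Pearson's r, assuming neither variable is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: same X values, same best-fitting line, different spread
xs = [2, 2, 6, 6]
tight_ys = [3, 3, 5, 5]  # every point sits on the line
loose_ys = [2, 4, 4, 6]  # points scatter one unit above and below the line

print("tight residuals:", residuals(xs, tight_ys))
print("loose residuals:", residuals(xs, loose_ys))
print("tight r:", round(pearson_r(xs, tight_ys), 2))
print("loose r:", round(pearson_r(xs, loose_ys), 2))
```

Larger residuals mean more spread around the line and a weaker correlation, even though the line itself is unchanged.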



Don’t make causal claims
Don’t extrapolate
Extreme scores (outliers) can strongly
influence the calculated relationship
Cautions with correlation & regression