Things you will need in
class.
Lecture notes from my website.
Go to www.rci.rutgers.edu/~rakarlin and look for the
latest set of lecture notes. Print them as handouts,
3 per page, and bring them to class.
Also, we will be doing problems in class, so
always bring your book, a calculator, a pen or pencil
(or two), and notebook paper to class.
Chapter 1
The mean, the number of
observations, the variance
and the standard deviation
Some definitions
Data -
observations, measurements, scores
Statistics -
a series of rules and methods that can be
used to organize and interpret data.
Descriptive Statistics -
methods to summarize
large amounts of data with just a few numbers or a
figure.
Inferential Statistics -
mathematical procedures
to make statements about a population based on a sample.
More Definitions
Parameter -
a number that summarizes or
describes some aspect of a population.
Parametric Statistics – statistical methods
based on our ability to estimate population
parameters such as the population mean and
variance.
Non-parametric Statistics -
statistics for
observations that do not allow the estimation of
the population mean and variance.
More Definitions
Sample statistic - An estimate of a population
parameter based on a random sample taken from the
population.
Sampling Error -
the difference between a
sample statistic that estimates a population parameter
and the actual parameter.
Sampling fluctuation –
The differences
among estimates based on different random samples
that arise because, most of the time, the samples
contain different individuals and there are always
random measurement problems, no matter how careful
you are.
Where we are going
Descriptive Statistics
Number of Observations
Measures of Central Tendency
Measures of Variability
Observations
Each score is represented by the
letter X.
The total number of observations is
represented by N.
Measures of Central Tendency
Finding the most typical score
median - the middle score
mode - the most frequent score
mean - the average score
In this course, the mean will be our
most important measure of central
tendency
Calculating the Mean
Greek letters are used to represent
population parameters.
$\mu$ (mu) is the mathematical symbol for the
mean.
$\Sigma$ (capital sigma) is the mathematical symbol for summation.
Formula: $\mu = (\Sigma X) / N$
English: To calculate the mean, first add
up all the scores, then divide by the
number of scores you added up.
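A minimal Python sketch of the formula $\mu = (\Sigma X) / N$, assuming the scores are held in an ordinary list. The helper name `population_mean` is hypothetical (not from the course materials), and the example reuses the five quiz scores that appear later in this chapter.

```python
def population_mean(scores):
    """Return mu = (sum of X) / N for a list of scores."""
    if not scores:
        raise ValueError("need at least one score")
    total = sum(scores)   # Sigma X: add up all the scores
    n = len(scores)       # N: the number of scores added up
    return total / n


quiz_scores = [7, 8, 3, 5, 7]
print(population_mean(quiz_scores))  # 6.0
```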
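The mode, the median and
the mean
Ages of people retiring from Rutgers this year:
60, 63, 45, 64, 65, 70, 55, 60, 66
Sorted: 45, 55, 60, 60, 63, 64, 65, 66, 70
Mode is 60 (the only age that occurs twice).
Median is 63 (the 5th of the 9 sorted scores).
$\Sigma X = 548$, $N = 9$
Mean: $\mu = 548 / 9 = 60.89$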
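One way to check the hand calculation for the retirement ages above is Python's standard `statistics` module, which provides `mode`, `median`, and `mean`. This is a sketch for verifying the arithmetic, not part of the course's method.

```python
import statistics

ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]

mode = statistics.mode(ages)      # most frequent score
median = statistics.median(ages)  # middle score after sorting
mean = statistics.mean(ages)      # sum of X divided by N

print(sorted(ages))                  # [45, 55, 60, 60, 63, 64, 65, 66, 70]
print(mode, median, round(mean, 2))  # 60 63 60.89
```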
Measures of Variability:
less important
Range - the distance from the highest to the
lowest score.
Inter-quartile Range - the distance between the score
that marks off the bottom 25% and the score that
marks off the top 25% (the spread of the middle 50%).
Sum of Squares (SS) – the total squared
distance of all scores from the mean. You
calculate it by finding the distance of each score
from the mean, squaring that distance, and then
summing the squared distances over all the scores.
(Note: In general, the more scores, the bigger
the sum of squares.)
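A minimal Python sketch of the range and the sum of squares, applied to the retirement ages above; the function names are hypothetical. The inter-quartile range is left out because textbooks and software use slightly different rules for locating quartiles, so its exact value depends on the convention chosen. The last line illustrates the note about SS: repeating every score leaves the mean unchanged but doubles the sum of squares.

```python
def score_range(scores):
    """Range: distance from the highest score to the lowest score."""
    return max(scores) - min(scores)


def sum_of_squares(scores):
    """SS: sum over all scores of the squared distance from the mean."""
    mu = sum(scores) / len(scores)             # population mean
    return sum((x - mu) ** 2 for x in scores)  # square each distance, then add


ages = [60, 63, 45, 64, 65, 70, 55, 60, 66]
print(score_range(ages))                # 25
print(round(sum_of_squares(ages), 2))   # 428.89

# Repeating every score keeps the mean the same but doubles SS
print(round(sum_of_squares(ages + ages), 2))  # 857.78
```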
Measures of Variability – more
important
Variance ($\sigma^2$) - also called sigma squared. The variance
is the average squared distance of scores from
mu. It is found by computing the total squared distance
of all the scores from the mean (SS) and then dividing
by the number of scores ($\sigma^2 = SS/N$).
Standard Deviation ($\sigma$) - also called sigma. The
standard deviation is the square root of the
variance. It is the average unsquared distance of
scores in the population from their mean. (That is
almost, but not exactly, like saying that the standard
deviation is the average distance of scores from the
population mean.)
Computing the variance
and the standard deviation
Scores on a 10-question Psychology quiz
Student     $X$    $X - \mu$    $(X - \mu)^2$
John         7      +1.00         1.00
Jennifer     8      +2.00         4.00
Arthur       3      -3.00         9.00
Patrick      5      -1.00         1.00
Marie        7      +1.00         1.00
$\Sigma X = 30$, $N = 5$, $\mu = 6.00$
$\Sigma(X - \mu) = 0.00$
$\Sigma(X - \mu)^2 = SS = 16.00$
$\sigma^2 = SS/N = 3.20$
$\sigma = \sqrt{3.20} = 1.79$
The variance is our most basic and
important measure of variability
The variance ($\sigma^2$, sigma squared) is the
average squared distance of individual scores
from the population mean.
Other indices of variation are derived from the
variance.
For example, as noted above, sigma ($\sigma$), the
standard deviation, is the average unsquared distance
of scores from mu. To find it, you
compute the square root of the variance.
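The variance and standard deviation steps can be written as a short Python sketch, using the quiz scores from the table above; the helper names are hypothetical. It divides SS by N, as the slides do for a population. The standard library's `statistics.pvariance` and `statistics.pstdev` use the same divide-by-N definition, so they serve as a cross-check (`statistics.variance` divides by N - 1 instead, which is a different formula).

```python
import math
import statistics


def population_variance(scores):
    """sigma^2 = SS / N: average squared distance of scores from mu."""
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)  # sum of squares
    return ss / n


def population_sd(scores):
    """sigma: square root of the population variance."""
    return math.sqrt(population_variance(scores))


quiz = [7, 8, 3, 5, 7]  # John, Jennifer, Arthur, Patrick, Marie
print(population_variance(quiz))      # 3.2
print(round(population_sd(quiz), 2))  # 1.79

# Cross-check against the standard library's population (divide-by-N) versions
print(statistics.pvariance(quiz), round(statistics.pstdev(quiz), 2))
```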
Other measures of variability
derived from the variance
• We can randomly choose scores from a population to
form a random sample and then find the mean of such
samples.
• Each score you add to a sample tends to correct the
sample mean back toward the population mean, mu.
• The average squared distance of sample means from
the population mean is the variance divided by n, the
size of the sample ($\sigma^2 / n$).
• To find the average unsquared distance of sample
means from mu, divide the variance by n, then take the
square root ($\sqrt{\sigma^2 / n}$). The result is called the standard error of
the sample mean or, more briefly, the standard error of
the mean. We’ll see more of this in Ch. 4.
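The claim that sample means vary around mu with an average squared distance of $\sigma^2 / n$ can be checked exactly on a tiny example. The sketch below assumes the five quiz scores are the entire population and that samples are drawn with replacement (so each draw is independent). It lists every possible ordered sample of size n = 2, computes each sample's mean, and compares the average squared distance of those means from mu with $\sigma^2 / n$.

```python
import itertools
import math
import statistics

# Assumption: the five quiz scores are the whole population
population = [7, 8, 3, 5, 7]
mu = statistics.mean(population)           # 6
sigma2 = statistics.pvariance(population)  # population variance, 3.2

n = 2  # sample size
# Every ordered sample of size n drawn WITH replacement; each of the
# 5 ** n = 25 samples is equally likely when draws are independent.
samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.mean(s) for s in samples]

# Average squared distance of the sample means from mu
var_of_means = sum((m - mu) ** 2 for m in sample_means) / len(sample_means)
standard_error = math.sqrt(sigma2 / n)

print(round(var_of_means, 4), round(sigma2 / n, 4))  # 1.6 1.6
print(round(standard_error, 2))                      # 1.26
```

Drawing without replacement from a population this small would give a smaller spread of sample means, so the with-replacement assumption matters for the exact match here.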
Making predictions (1)
Without any other information, the population
mean (mu) is the best prediction of each and
every person’s score.
So you should predict that everyone will score
precisely at the population mean.
Why? Because the mean is an unbiased
predictor or estimate. The mean is as close to
the high scores as to the low scores in the population.
This is shown mathematically by the fact that
deviations around the mean sum to zero.
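A one-line derivation of why the deviations sum to zero, using only the formula $\mu = (\Sigma X)/N$; the subscript $i$, which indexes the $N$ scores, is added notation. For the quiz scores in this chapter the deviations are +1, +2, -3, -1, +1, which indeed total 0.

```latex
\sum_{i=1}^{N} (X_i - \mu)
  = \sum_{i=1}^{N} X_i - N\mu
  = \sum_{i=1}^{N} X_i - N \cdot \frac{\sum_{i=1}^{N} X_i}{N}
  = 0
```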
You should also predict
that everyone will score
right at the mean because:
The mean is the number with the
smallest average squared distance from all
the scores in the distribution.
Thus, the mean is your best prediction,
because it is a least squares, unbiased
predictor.
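The least squares property can also be shown algebraically. Let $c$ be any prediction (a symbol introduced here for illustration). Adding and subtracting $\mu$ inside each squared deviation, and using the fact that deviations around the mean sum to zero, gives:

```latex
\sum_{i=1}^{N} (X_i - c)^2
  = \sum_{i=1}^{N} \bigl[(X_i - \mu) + (\mu - c)\bigr]^2
  = \sum_{i=1}^{N} (X_i - \mu)^2 + 2(\mu - c)\sum_{i=1}^{N} (X_i - \mu) + N(\mu - c)^2
  = SS + N(\mu - c)^2
```

Because $N(\mu - c)^2$ can never be negative and equals zero only when $c = \mu$, no prediction has a smaller sum of squares than the mean. For the quiz scores in this chapter ($SS = 16.00$, $N = 5$), predicting $c = 5.50$ gives $16.00 + 5(0.50)^2 = 17.25$.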
What happens if we make
a prediction other than mu?
Scores on a Psychology quiz ($\mu = 6.00$). What if we predict
everyone will score 5.50? Deviations don’t sum to zero, and the
average squared distance of scores from the prediction
increases.
Student     $X$    $X - 5.50$    $(X - 5.50)^2$
John         7      +1.50          2.25
Jennifer     8      +2.50          6.25
Arthur       3      -2.50          6.25
Patrick      5      -0.50          0.25
Marie        7      +1.50          2.25
$\Sigma X = 30$, $N = 5$, $\mu = 6.00$
$\Sigma(X - 5.50) = 2.50$
$\Sigma(X - 5.50)^2 = SS = 17.25$
Average squared distance $= SS/N = 3.45$
Its square root $= \sqrt{3.45} = 1.86$
Compare that to predicting that everyone
will score right at the mean (mu).
Scores on a 10-question Psychology quiz, predicting $\mu = 6.00$
$X$: 7, 8, 3, 5, 7 (John, Jennifer, Arthur, Patrick, Marie)
$X - \mu$: +1.00, +2.00, -3.00, -1.00, +1.00
$\Sigma(X - \mu) = 0.00$
$\Sigma(X - \mu)^2 = SS = 16.00$
$\sigma^2 = SS/N = 3.20$
$\sigma = \sqrt{3.20} = 1.79$
Summary: Mu vs. another prediction
Other Prediction = 5.50
Deviations don’t sum to zero. It’s a biased
prediction.
Sum of squares = 17.25
Prediction = mu = 6.00
Deviations sum to zero. It’s unbiased.
Sum of squares = 16.00
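A Python sketch of this comparison: the hypothetical helper `prediction_errors` returns the sum of the deviations and the sum of squares around any prediction. Scanning a grid of candidate predictions in steps of 0.25 (an arbitrary grid chosen for illustration) shows that the sum of squares is smallest at 6.00, the mean.

```python
def prediction_errors(scores, prediction):
    """Return (sum of deviations, sum of squared deviations) around a prediction."""
    deviations = [x - prediction for x in scores]
    return sum(deviations), sum(d ** 2 for d in deviations)


quiz = [7, 8, 3, 5, 7]
for guess in (5.50, 6.00):
    total_dev, ss = prediction_errors(quiz, guess)
    print(guess, total_dev, ss)
# 5.5 2.5 17.25
# 6.0 0.0 16.0

# Scan candidate predictions from 3.00 to 8.00 in steps of 0.25
candidates = [3 + 0.25 * k for k in range(21)]
best = min(candidates, key=lambda c: prediction_errors(quiz, c)[1])
print(best)  # 6.0
```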
So you should predict everyone will score
precisely at the mean.
But when you predict that everyone will
score at the mean, you will be wrong. In fact,
it is often the case that no one will score
precisely at the mean.
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are wrong in
a particular way.
We want our predictions to be as close to the
high scores as to the low scores in the
population.
The mean is the only number that is an
unbiased predictor; it is the only number around
which deviations sum to zero.
We want to be wrong by
the least amount possible
In statistics, we consider error to be the
squared distance between a prediction
and the actual score.
Sum of squares is the total amount the prediction
is wrong. Variance is the average amount the
prediction is wrong.
Predicting that everyone will score right at the mean
(mu)
The mean has the smallest average squared distance
from all the scores in the population.
The number with the smallest average squared
distance from the scores in the population is the
prediction that is least wrong, the least in error.
Thus, saying that everyone will score at the
mean (even if no one does!) is the prediction
that gives you the smallest amount of error.
Why doesn’t everyone
score right at the mean?
Sources of Error
Individual differences – people have stable
differences from one another. They differ in
an infinite number of ways and combinations
of ways.
PROOF OF THAT: AREN’T YOU MORE
LIKE WHO YOU WILL BE IN 5 MINUTES
THAN YOU ARE LIKE THE PERSON NEXT TO
YOU??!
AND – THERE ARE ALWAYS
MEASUREMENT
PROBLEMS!
Instruments are imperfect,
scores get mistranscribed,
participants may be
uninterested or have a
stomach ache, etc. etc.
etc. …
Remember: THERE ARE
ALWAYS MEASUREMENT
PROBLEMS
NO MEASUREMENT DEVICE IS EVER
PERFECTLY ACCURATE, WHETHER IT IS
A HIGHLY ACCURATE SCALE OR A 12-
QUESTION QUESTIONNAIRE.
Additionally, transient situational
factors make measurement inaccurate
This is especially true when we measure
people. Let’s say we are measuring
something relatively easy to measure, such as
verbal ability. When we are measuring
people, lots of transient factors (such as
mood, events, time, motivation etc.) all
change an individual’s responses and combine
to make our measurement of verbal ability
imperfect.
The mean square for error
We call the average squared error of
prediction when we use the mean as our
prediction the “mean square for error”. It
tells us how much (squared) error we
make, on the average, when we predict
that everyone will score precisely at the
mean.
Mean square for error =
the variance (sigma squared, $\sigma^2$)
If we predict that everyone will score
right at the mean, how much error
do we make on the average? To find
out, find the distance of each score
from the mean, square that distance,
add up the squared distances, and divide
by the number of scores to find the average error.
WHOOPS: THAT’S SIGMA SQUARED ($\sigma^2$).
Let’s summarize
Questions and answers – the
mean.
• WHAT QUALITIES OF THE MEAN (MU) MAKE IT THE
BEST PREDICTION YOU CAN MAKE OF WHERE
EVERYONE WILL SCORE?
• The mean is an unbiased predictor or estimate, because
the deviations around the mean always sum to zero.
• The mean is a least squares predictor because its
average squared distance from all the scores in the
population is smaller than that of any other number.
Q & A: the mean
WHY WOULD YOU PREDICT THAT EVERYONE
WILL SCORE AT THE MEAN WHEN, IN FACT,
OFTEN NO ONE CAN POSSIBLY SCORE
PRECISELY AT THE MEAN?
In statistics, we don’t expect our predictions to
be precisely right.
We want to make predictions that are close and
wrong in a particular way.
We want least squares, unbiased predictors.
Q & A: The variance
WHAT ARE THE OTHER NAMES FOR THE
VARIANCE?
Sigma squared ($\sigma^2$) and the mean square for error.
WHAT OTHER MEASURES OF
VARIABILITY CAN BE EASILY COMPUTED
ONCE YOU KNOW THE VARIANCE?
The standard deviation and the standard
error of the sample mean.
How do you compute
THE VARIANCE? Find the distance of each score
from the mean, square it, sum them up and
divide by the number of scores in the
population.
THE STANDARD DEVIATION? Compute the
square root of the variance.
THE STANDARD ERROR OF THE SAMPLE MEAN?
Divide the variance by n, the size of the sample,
and then take a square root.
END CHAPTER 1 SLIDES