Statistical Analysis
How do we make sense of the data we collect during a study or
an experiment?
Two Kinds of Statistical Analysis
Descriptive Statistics
• Organize and summarize data
Inferential Statistics
• Interpret data and draw conclusions
• Used to test the validity of a hypothesis
Descriptive Statistics
• Numbers that
summarize a set of
research data obtained
from a sample
• Organized into a
frequency distribution
(orderly arrangement
of scores)
• Can be pictured as a
histogram (bar graph)
• Can be pictured as a
frequency polygon
(line graph that
replaces the bars with
single points and
connects the points
with a line)
Measures of Central
Tendency
• Describe the average or most typical scores
for a set of research data
• Mode – the most frequently occurring score
• Bimodal – if two scores appear most frequently
• Multimodal – if three or more scores appear most frequently
• Median – the middle score of the set of scores
• Mean – the average of the set of scores (most
commonly used)
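As a rough illustration of these three measures, the sketch below uses Python's standard `statistics` module on a made-up list of scores. The scores are hypothetical and chosen so that two values tie for most frequent (a bimodal set).

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only
scores = [2, 3, 3, 5, 7, 7, 8]

print("mean:", mean(scores))         # sum of scores divided by how many there are
print("median:", median(scores))     # middle score once the scores are sorted
print("modes:", multimode(scores))   # every score tied for most frequent
```

Here the mean and median are both 5, and the modes are 3 and 7.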
Activity: Dice and the Bell Curve
• Find a partner to work with
• One person will roll the dice, the other will record
the number rolled
• Roll the dice and add the numbers together and
record the results.
• Roll the dice for as many times as you can for
approximately 5-10 minutes.
Organize your data
• How can you organize your data?
• First, create a frequency distribution of your
numbers
• Next, graph the distribution using the x axis for the
score and the y axis for the frequency.
• Finally, calculate the mean, median and mode for
your data.
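If you want to check your hand-recorded results against a computer, the activity can be sketched in Python. This assumes two six-sided dice and 200 rolls; both numbers are assumptions, since the activity only says to roll for 5-10 minutes. The fixed seed just makes the simulated rolls repeatable.

```python
import random
from collections import Counter
from statistics import mean, median, multimode

rng = random.Random(0)  # fixed seed so the same rolls come out each run

# Assumption: two six-sided dice, rolled 200 times, totals recorded
rolls = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(200)]

# Frequency distribution: how many times each total came up
freq = Counter(rolls)

# Text histogram: score down the side, frequency shown as a row of marks
for total in range(2, 13):
    print(f"{total:2d} | {'#' * freq[total]}")

print("mean:", mean(rolls))
print("median:", median(rolls))
print("mode(s):", multimode(rolls))
```

With enough rolls, the marks should pile up around 7 and thin out toward 2 and 12.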
Normal Distribution (also called
normal curve or bell-curve)
• A “normal distribution” of data means that most of the examples
in a set of data are close to the mean (average), while relatively
few examples tend to one extreme or the other.
• Scores are often normally distributed. When this happens, the
mode, median, and mean are all the same (in this case, 100).
Measures of Central Tendency in
Dunder Mifflin Salaries
• Let’s look at the salaries of the employees of the Dunder
Mifflin Paper Company in Scranton:
$25,000 - Pam
$25,000 - Kevin
$25,000 - Angela
$100,000 - Andy
$100,000 - Dwight
$200,000 - Jim
$300,000 - Michael
The median salary looks good at $100,000.
The mean salary also looks good at about $110,700.
But the mode salary is only $25,000.
Maybe not the best place to work.
Then again, living in Scranton is kind of cheap.
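The three salary figures can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module and the seven salaries listed above.

```python
from statistics import mean, median, mode

# Salaries from the list above
salaries = {
    "Pam": 25_000, "Kevin": 25_000, "Angela": 25_000,
    "Andy": 100_000, "Dwight": 100_000,
    "Jim": 200_000, "Michael": 300_000,
}
values = list(salaries.values())

print("median:", median(values))      # middle of the seven sorted salaries
print("mean:", round(mean(values)))   # 775,000 divided by 7, rounded
print("mode:", mode(values))          # most frequent salary
```

The median is 100,000, the mean rounds to 110,714, and the mode is 25,000.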
Skewed Distributions
• A few extreme scores (called outliers) can
significantly affect the mean.
• Distributions where most of the scores are squeezed
into one end are skewed. In very skewed
distributions, the median is a better measure of
central tendency than the mean.
Skews
A few of the scores stretch out away from the group like a
tail. The skew is named for the direction of the tail.
• Tail going to the left – negatively skewed
• Tail going to the right – positively skewed
• Look at the above figure and note that when a variable is normally distributed, the mean,
median, and mode are the same number.
• You can use the following two rules to provide some information about skewness even when
you cannot see a line graph of the data (i.e., all you need is the mean and the median):
1. Rule One. If the mean is less than the median, the data are skewed to the left.
2. Rule Two. If the mean is greater than the median, the data are skewed to the right.
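The two rules can be written as a small Python function. This is a sketch of the rule of thumb only: the function name `skew_direction` is made up here, and the rule is a quick guide rather than a guarantee for every data set.

```python
from statistics import mean, median

def skew_direction(data):
    """Apply the mean-versus-median rule of thumb for skewness."""
    m, med = mean(data), median(data)
    if m < med:
        return "left (negative)"    # Rule One: mean below median
    if m > med:
        return "right (positive)"   # Rule Two: mean above median
    return "no skew indicated"      # mean equals median

# The Dunder Mifflin salaries: the mean is pulled up by the high earners
print(skew_direction([25_000, 25_000, 25_000, 100_000, 100_000, 200_000, 300_000]))

# Hypothetical scores with one very low outlier
print(skew_direction([1, 7, 8, 9, 10]))
```

The salaries come out skewed to the right, and the hypothetical scores come out skewed to the left.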
Measures of Variability
• Variability describes the spread or diversity
of scores for a set of data.
• Range – The largest score minus the smallest
score
• Variance and standard deviation – indicate the
degree to which scores differ from each other.
The higher the variance or SD, the more spread
out the distribution is.
More on Standard Deviation
• Standard deviation is kind of the “typical distance from the mean”
and can often help you get the real story behind the data.
It is roughly how far, on average, scores deviate from the mean.
• The standard deviation is a statistic that tells you how
tightly all the various examples are clustered around the
mean in a set of data. When the examples are pretty
tightly bunched together and the bell-shaped curve is
steep, the standard deviation is small. When the
examples are spread apart and the bell curve is relatively
flat, that tells you that you have a relatively large standard
deviation.
The area within one standard deviation of the mean in either
direction on the horizontal axis (the red area on the graph)
accounts for around 68% of the people in this group. The area within
two standard deviations of the mean (the red and
green areas) accounts for roughly 95% of the people. And
three standard deviations (the red, green and blue areas)
account for about 99.7% of the people.
If this curve were flatter and more spread out, the standard
deviation would have to be larger in order to account for
those 68% or so of the people. So that's why the standard
deviation can tell you how spread out the examples in a set
are from the mean.
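These percentages can be checked for an ideal normal curve with Python's `statistics.NormalDist`. The sketch assumes a perfectly normal distribution; real data will only approximate these shares. The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 68%, 95%, and 99.7% for one, two, and three standard deviations
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```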
To Calculate Variance
• To calculate the variance for the set of
numbers 4, 5, 5, 6, 6, 6, 6, 7, 7, 8:
• Calculate the mean (average) – 60÷10 =
6
• Subtract the mean from each score in the
distribution above
• This shows you how far each score
deviates from the mean, and when you
add all of these numbers together, they
should always equal zero.
4 - 6 = -2
5 - 6 = -1
5 - 6 = -1
6 - 6 = 0
6 - 6 = 0
6 - 6 = 0
6 - 6 = 0
7 - 6 = 1
7 - 6 = 1
8 - 6 = 2
To Calculate Variance (cont.)
• However, we want to convert the scores to
a form that allows us to add them up and
not get zero. Therefore, we square all of
the deviations scores, which removes all
of the negative values.
• Now when we add them up, we get 12.
The larger this number is, the greater the
dispersion of the scores is.
• Now divide the sum above by the number
of scores in the group. This gives you the
variance: the average squared distance of
a score from the mean.
12 ÷ 10 = 1.2
Deviation   (square this)
-2          -2 x -2 = 4
-1          -1 x -1 = 1
-1          -1 x -1 = 1
0           0 x 0 = 0
0           0 x 0 = 0
0           0 x 0 = 0
0           0 x 0 = 0
1           1 x 1 = 1
1           1 x 1 = 1
2           2 x 2 = 4
To Calculate Standard
Deviation
• To calculate standard deviation, all you do is
calculate the square root of the variance you just
calculated.
√1.2 ≈ 1.1
• The smaller this number is, the more confident you
can be in using the mean to represent the group.
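The variance and standard deviation steps from the last few slides can be followed line by line in Python. This sketch divides by the number of scores, as the slides do (the "population" formula). Many textbooks and calculators divide by one less than the number of scores when working from a sample, which gives a slightly larger answer.

```python
from math import sqrt

scores = [4, 5, 5, 6, 6, 6, 6, 7, 7, 8]

# Step 1: the mean
mean = sum(scores) / len(scores)           # 60 / 10 = 6.0

# Step 2: how far each score is from the mean (these always sum to zero)
deviations = [x - mean for x in scores]

# Step 3: square each deviation so the negatives do not cancel the positives
squared = [d ** 2 for d in deviations]

# Step 4: average the squared deviations to get the variance
variance = sum(squared) / len(scores)      # 12 / 10 = 1.2

# Step 5: the square root of the variance is the standard deviation
sd = sqrt(variance)

print("sum of deviations:", sum(deviations))
print("sum of squares:", sum(squared))
print("variance:", variance)
print("standard deviation:", round(sd, 1))
```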
Descriptive Statistics
• PsychSim Homework:
• Descriptive Statistics worksheet (under
documents on my website)
• http://bcs.worthpublishers.com/psychsim5/Descriptive%20Statistics/PsychSim_Shell.html
Try it yourself
1, 3, 5, 5, 6, 7, 7, 8, 9, 9
• For this set of data, calculate:
• Median
• Mode
• Mean
• Range
• Variance
• Standard Deviation
Answers:
1, 3, 5, 5, 6, 7, 7, 8, 9, 9
• Median = 6.5
• Mode = 5, 7, and 9 (multimodal)
• Mean = 6
• Range = 8
• Variance = 6
• Standard Deviation = 2.4
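To check answers like these without doing every step by hand, Python's `statistics` module has ready-made functions. In this sketch, `pvariance` and `pstdev` are the versions that divide by the number of scores, matching the method used on the earlier slides.

```python
from statistics import mean, median, multimode, pvariance, pstdev

data = [1, 3, 5, 5, 6, 7, 7, 8, 9, 9]

print("median:", median(data))
print("mode(s):", multimode(data))           # all scores tied for most frequent
print("mean:", mean(data))
print("range:", max(data) - min(data))       # largest minus smallest
print("variance:", pvariance(data))          # divides by the number of scores
print("standard deviation:", round(pstdev(data), 1))
```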
Inferential Statistics
• Whereas descriptive statistics simply summarize
data, inferential statistics attempt to make
inferences about a larger population based on
the data set.
• They help determine whether or not the results
of the study apply to the larger population from
which the sample was taken.
Inferential Statistics (cont.)
• Any time you collect data, it will contain variability due
to chance. For example, by chance alone, you might
collect data from more freshmen than from
sophomores. If you repeated your data collection
several times, you would get somewhat different
results each time due to this chance variability.
• If this chance variability always exists in data
collection, how can a researcher be confident that the
inferences he or she makes about the larger
population (the entire Shorecrest student body) are
accurate? We use inferential statistics! Instead of
making absolute conclusions about the population,
researchers make statements about the population
using the laws of probability and statistical
significance.
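Chance variability is easy to see in a small simulation. The sketch below invents a population of 1,000 students, half freshmen and half sophomores, and draws five random samples of 50. The population size, the 50/50 split, and the sample size are all assumptions made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical population: 1 = freshman, 0 = sophomore, 500 of each
population = [1] * 500 + [0] * 500

# Draw five separate random samples of 50 students and record
# the share of freshmen in each one
sample_shares = [mean(rng.sample(population, 50)) for _ in range(5)]

for i, share in enumerate(sample_shares, start=1):
    print(f"sample {i}: {share:.0%} freshmen")
```

Even though exactly half of this population are freshmen, the samples will usually not all come out at exactly 50%.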
Statistical Significance (sometimes
known as a p score or p value)
• When inferential statistics demonstrate a high
probability that research results are not due to
chance, the results are said to be statistically
significant.
• Psychologists say that something is statistically
significant when the probability that it might be due to
chance is less than 5 in 100 (indicated by the notation
p < .05)
• In other words, if only chance were at work, results
like yours would turn up less than 5% of the
time.
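A p value can be worked out exactly for a simple case. The sketch below asks: if a coin is fair, how likely is it to get at least a certain number of heads in 10 flips? The coin scenario and the function name `p_value_at_least` are made up for illustration, and the calculation is one-sided (it only counts results with that many heads or more).

```python
from math import comb

def p_value_at_least(heads, flips):
    """Chance of getting `heads` or more heads in `flips` tosses of a fair coin."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

for heads in (8, 9):
    p = p_value_at_least(heads, 10)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{heads} of 10 heads: p = {p:.3f} ({verdict} at p < .05)")
```

Nine heads out of ten gives p of about .011, which passes the p < .05 cutoff. Eight out of ten gives about .055, which just misses it.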
More about the p value….
• The smaller the p value, the greater the
significance
• Why can a p value never equal 0?
• A p value can also be computed for any
correlation coefficient. The coefficient indicates the
strength of a relationship; the p value indicates how
likely a correlation that strong would be by chance alone.
To Summarize
Descriptive Statistics
Organize and summarize data
Central Tendency: mean, median, mode
Standard deviation: variation in data
Range: distance from smallest to largest
Inferential Statistics
Interpret data and draw conclusions
Used to test validity of hypothesis
Critical Thinking with
Statistics
• The old saying goes, “There
are three kinds of lies – lies,
damned lies, and statistics.”
• The presentation of research
findings in the form of numbers,
graphs, etc., may look
impressive, but remember that
they can be distorted to make
you believe something that is
not necessarily true.
• The next slide shows some
common ways in which this is
done.
Biased or insufficient
samples
• “Four out of five dentists
surveyed recommended Brand X
gum.”
• The # of dentists surveyed is not
clear
• How were the dentists chosen? Was
it a random sample, or were 5
dentists chosen because they hold
stock in Brand X gum?
• Many mail-in surveys suffer from
a selection bias – the people
who send them in may differ in
important ways from those who
do not.
The Misleading Average
• Example: The principal of a small private
school met the criticism that his faculty
had no teaching experience by issuing the
statement that the average experience of
each member of the faculty was 5 years.
• This statement was technically true: there
were five teachers in the school including
the principal, but the principal neglected to
mention that he had twenty-five years of
experience while the remaining four
members had none.
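The principal's claim can be checked in a couple of lines of Python, using the five experience figures from the example. Comparing the mean with the median shows how one outlier props up the average.

```python
from statistics import mean, median

# Years of experience: the principal, then the four other teachers
experience = [25, 0, 0, 0, 0]

print("mean:", mean(experience))      # 25 / 5 = 5 years, as the principal said
print("median:", median(experience))  # 0 years, the typical faculty member
```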