Statistics
Interpreting Data
After psychologists develop a theory, form a hypothesis, make observations, and collect data,
they end up with a lot of information, usually in the form of numerical data. The term statistics
refers to the analysis and interpretation of this numerical data. Psychologists use statistics to
organize, summarize, and interpret the information they collect.
Descriptive Statistics
To organize and summarize their data, researchers need numbers to describe what happened.
These numbers are called descriptive statistics. Researchers may use histograms or bar graphs
to show the way data are distributed. Presenting data this way makes it easy to compare results,
see trends in data, and evaluate results quickly.
Example: Suppose a researcher wants to find out how many hours students study for three
different courses. Each course has 100 students. The researcher does a survey of ten students in
each of the courses. On the survey, he asks the students to write down the number of hours per
week they spend studying for that course. The data look like this:
Hours of Study per Week

Course A                  Course B                  Course C
Student  Hours per week   Student  Hours per week   Student  Hours per week
Joe      9                Hannah   5                Meena    6
Peter    7                Ben      6                Sonia    6
Zoey     8                Iggy     6                Kim      7
Ana      8                Louis    6                Mike     5
Jose     7                Keesha   7                Jamie    6
Lee      9                Lisa     6                Ilana    6
Joshua   8                Mark     5                Lars     5
Ravi     9                Ahmed    5                Nick     20
Kristen  8                Jenny    6                Liz      5
Loren    1                Erin     6                Kevin    6
To get a better sense of what these data mean, the researcher can plot them on a bar graph.
Histograms or bar graphs for the three courses might look like this:
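As a rough stand-in for such graphs, the survey data can be tallied and printed as text histograms. This is only a sketch: the function name text_histogram and the '#' bar style are illustrative choices, not part of the original handout.

```python
from collections import Counter

# Hours of study per week, transcribed from the survey data above.
hours = {
    "Course A": [9, 7, 8, 8, 7, 9, 8, 9, 8, 1],
    "Course B": [5, 6, 6, 6, 7, 6, 5, 5, 6, 6],
    "Course C": [6, 6, 7, 5, 6, 6, 5, 20, 5, 6],
}

def text_histogram(scores):
    """Return one line per distinct score, with one '#' per student."""
    counts = Counter(scores)
    return [f"{score:>2} h | {'#' * counts[score]}" for score in sorted(counts)]

for course, scores in hours.items():
    print(course)
    print("\n".join(text_histogram(scores)))
```

Printed this way, Course A shows one isolated low score (1 hour) and Course C one isolated high score (20 hours), while Course B is tightly clustered around 6 hours.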
Measuring Central Tendency
Researchers summarize their data by calculating measures of central tendency, such as the
mean, the median, and the mode. The most commonly used measure of central tendency is the
mean, which is the arithmetic average of the scores. The mean is calculated by adding up all the
scores and dividing the sum by the number of scores.
However, the mean is not a good summary method to use when the data include a few extremely
high or extremely low scores. A distribution with a few very high scores is called a positively
skewed distribution. A distribution with a few very low scores is called a negatively skewed
distribution. The mean of a positively skewed distribution will be deceptively high, and the
mean of a negatively skewed distribution will be deceptively low. When working with a skewed
distribution, the median is a better measure of central tendency. The median is the middle score
when all the scores are arranged in order from lowest to highest.
Another measure of central tendency is the mode. The mode is the most frequently occurring
score in a distribution.
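The three measures can be compared on the survey data from the example above. The sketch below uses Python's standard statistics module; the helper name summarize is an illustrative assumption.

```python
from statistics import mean, median, mode

# Hours of study per week, transcribed from the survey data above.
hours = {
    "Course A": [9, 7, 8, 8, 7, 9, 8, 9, 8, 1],   # one very low score
    "Course B": [5, 6, 6, 6, 7, 6, 5, 5, 6, 6],
    "Course C": [6, 6, 7, 5, 6, 6, 5, 20, 5, 6],  # one very high score
}

def summarize(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)

for course, scores in hours.items():
    m, med, mo = summarize(scores)
    print(f"{course}: mean={m:.1f}, median={med}, mode={mo}")
# Course A: mean=7.4, median=8.0, mode=8
# Course B: mean=5.8, median=6.0, mode=6
# Course C: mean=7.2, median=6.0, mode=6
```

In Course C the single score of 20 pulls the mean up to 7.2 even though most students report 5 or 6 hours; in Course A the single score of 1 pulls the mean down to 7.4. The median is unaffected in both cases, which illustrates why it is preferred for skewed distributions.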
Statistics
Statistics is a branch of mathematics. Psychologists need a solid foundation in math to describe,
analyze, and summarize the results of their research.
Measuring Variation
Measures of variation tell researchers how much the scores in a distribution differ. Examples of
measures of variation include the range and the standard deviation. The range is the difference
between the highest and the lowest scores in the distribution. Researchers calculate the range by
subtracting the lowest score from the highest score. The standard deviation provides more
information about the amount of variation in scores. It tells a researcher the degree to which
scores vary around the mean of the data.
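Both measures can be sketched for the three courses in the earlier example. One assumption to note: statistics.stdev divides by N-1 (the sample standard deviation), which seems appropriate here because the ten students surveyed are a sample of each 100-student course.

```python
from statistics import stdev

# Hours of study per week, transcribed from the survey data above.
hours = {
    "Course A": [9, 7, 8, 8, 7, 9, 8, 9, 8, 1],
    "Course B": [5, 6, 6, 6, 7, 6, 5, 5, 6, 6],
    "Course C": [6, 6, 7, 5, 6, 6, 5, 20, 5, 6],
}

def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

for course, scores in hours.items():
    # stdev() is the sample standard deviation (divides by N-1).
    print(f"{course}: range={score_range(scores)}, SD={stdev(scores):.2f}")
```

Course B has the smallest range (2) and the smallest standard deviation, while the single 20-hour response gives Course C the largest values of both.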
Inferential Statistics
After analyzing statistics, researchers make inferences about how reliable and significant their
data are.
Inferential statistics are used to interpret data and draw conclusions. They tell psychologists
whether they can generalize from the chosen sample to the whole population, which is justified
only if the sample actually represents the population. Inferential statistics use rules to evaluate
the probability that a correlation or a difference between groups reflects a real relationship and
not just the operation of chance factors on the particular sample that was chosen for study.
Statistical significance is expressed as a p value: the probability of obtaining a difference at
least as large as the one observed if chance alone were operating. Results are likely to be
statistically significant when there is a large difference between the means of the two frequency
distributions, when their standard deviations (SD) are small, and when the samples are large.
Some psychologists consider results significantly different only if they have less than a 1 in 20
probability of arising by chance (p < .05). Others require less than a 1 in 100 probability
(p < .01). The lower the p value, the less likely it is that chance alone would produce such
results. Results of research that are statistically significant may be practically important or
trivial. Statistical significance does not imply that findings are really important. Meta-analysis
provides a way of statistically combining the results of individual research studies to reach an
overall conclusion. Scientific conclusions are always tentative and open to change should better
data come along. Good psychological research gives us an opportunity to learn the truth.
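One way to make "the operation of chance factors" concrete is a permutation test, sketched below. This is an illustration chosen for simplicity, not a method named in the handout: it pools the two groups, tries every possible way of splitting the pooled scores into groups of the original sizes, and reports the fraction of splits whose difference in means is at least as large as the observed one. The function name permutation_p_value is hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    # Enumerate every way of choosing which pooled scores form "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        splits += 1
    return extreme / splits

# Tiny made-up groups: only 2 of the 20 possible splits are this extreme.
print(permutation_p_value([1, 2, 3], [7, 8, 9]))  # 0.1

# Courses A and B from the study-hours example (184,756 possible splits).
course_a = [9, 7, 8, 8, 7, 9, 8, 9, 8, 1]
course_b = [5, 6, 6, 6, 7, 6, 5, 5, 6, 6]
print(permutation_p_value(course_a, course_b))
```

Enumerating every split is only practical for small samples; larger studies would normally rely on random resampling or a standard test such as the t test.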
Percentile Rank – A percentage that describes your rank among those also being evaluated. For
example, if your percentile rank on a test is 90, then your score is higher than 90% of the class.
It is impossible to get a percentile rank of 100 because you cannot score higher than everyone in
the class, including yourself.
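A minimal sketch of this definition follows. It assumes the "strictly higher than" reading used above (other textbooks count ties differently), and the class scores are made up for illustration.

```python
def percentile_rank(score, all_scores):
    """Percentage of scores in all_scores that are strictly below score."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)

# Hypothetical class of ten test scores (not from the handout).
class_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(95, class_scores))   # 80.0
print(percentile_rank(100, class_scores))  # 90.0: even the top score is below 100
```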
Mean – The average score. Add all the numbers up and divide by the number of terms. The mean
of {2,2,3,10,98} is 23.
Median – The middle point of all the terms such that half are above the number and half are
below it (50th percentile). Arrange the numbers from highest to lowest or vice versa and find
the number in the middle. The median of {2,2,3,10,98} is 3.
Mode – The number that occurs the most. Count to see which number appears the most. The
mode of {2,2,3,10,98} is 2.
Range – The difference between the highest number and the lowest number. The range of
{2,2,3,10,98} is 96. GPA scores that run from 0.0 to 4.0 have a range of 4.0.
Standard Deviation – A measurement of how far scores differ/deviate from the mean. The
standard deviation of {5,6,5,6,6,7,5,4} is very low because the terms hardly deviate from the
mean of 5.5, whereas the standard deviation of {5,10,8,18,-6,5,-7,22} is high.
Variance = s², the square of the standard deviation (s).
Standard Deviation Method Example: To find the standard deviation of 1, 2, 3, 4, 5.
Step 1: Calculate the mean (M) and the deviation of each value (X) from it.
X    M    (X-M)    (X-M)²
1    3    -2       4
2    3    -1       1
3    3     0       0
4    3     1       1
5    3     2       4
Step 2: Find the sum of (X-M)².
4+1+0+1+4 = 10
Step 3: N = 5, the total number of values. Find N-1.
5-1 = 4
Step 4: Now find the standard deviation using the formula √(sum of (X-M)² / (N-1)).
√(10/4) = √2.5 ≈ 1.5811
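The four steps above can be sketched directly in code. The function name sample_standard_deviation is an illustrative choice; the N-1 divisor matches Step 3.

```python
from math import sqrt

def sample_standard_deviation(values):
    """Standard deviation using the N-1 divisor, following Steps 1 to 4."""
    n = len(values)
    m = sum(values) / n                                  # Step 1: the mean M
    squared_deviations = [(x - m) ** 2 for x in values]  # Step 1: each (X-M)²
    total = sum(squared_deviations)                      # Step 2: sum of (X-M)²
    return sqrt(total / (n - 1))                         # Steps 3 and 4

print(round(sample_standard_deviation([1, 2, 3, 4, 5]), 4))  # 1.5811
```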
Another example:
1. Find the standard deviation of {2,3,3,4}.
2. Find the mean. (2+3+3+4)/4 = 3
3. Subtract the mean from each term and square it. (2-3)²=1, (3-3)²=0, (3-3)²=0, (4-3)²=1
4. Find the average of the squared deviations from the mean. (1+0+0+1)/4 = 0.5
5. Take the square root of the average, and that's the standard deviation. (0.5)^(1/2) = 0.7071
6. Normally this number should be rounded to the same decimal place as the data, but 0.7071 is
shown for better understanding. 0.7071 → 1
Note that this example divides by N (4), treating the four numbers as the whole population,
whereas the previous example divides by N-1.
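The two worked examples use different divisors: the first divides by N-1 and this one divides by N. As a hedged illustration, Python's statistics module offers both conventions, so the effect on {2,3,3,4} can be compared side by side.

```python
from statistics import pstdev, stdev

data = [2, 3, 3, 4]

# pstdev() divides by N (population formula, as in this example).
print(round(pstdev(data), 4))  # 0.7071

# stdev() divides by N-1 (sample formula, as in the 1,2,3,4,5 example).
print(round(stdev(data), 4))   # 0.8165
```

The N-1 version is always slightly larger, and the gap shrinks as the number of scores grows.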

Normal curve (the 68-95-99.7 rule) – More commonly known as the bell curve, a distribution
in which most scores cluster around the mean. More specifically, about 68% of the scores fall
within 1 standard deviation of the mean, about 95% fall within 2 standard deviations, and about
99.7% fall within 3 standard deviations.
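These percentages can be checked numerically. The sketch below relies on a standard result not stated in the handout: for a normal distribution, the proportion of scores within k standard deviations of the mean equals erf(k/√2). The function name share_within is illustrative.

```python
from math import erf, sqrt

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```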

Scatterplot – A graph that represents each pair of values as a dot. How tightly the dots cluster
and the direction in which they slope indicate the correlation between the two variables.
Correlation – The relationship between 2 events. For example, traffic accidents increase with
increasing temperatures; business drops after Christmas ends.

Correlation Coefficient – A number from –1.00 to 1.00 that measures correlation: how strongly
two events vary together.

Positive Correlation – The two events increase and/or decrease together. For example,
increasing study time positively correlates with increasing grades or decreased food
consumption positively correlates with decreased excitability. Positive correlation coefficients
are positive numbers ranging from 0.00 (no correlation) to 1.00 (perfect correlation). In a
scatterplot graph, a positive correlation exists if a positive slope is seen.
Negative Correlation – One event increases as the other decreases, or vice versa. For example,
a decreasing number of hours of sleep negatively correlates with increasing traffic accidents, and
increasing alcohol consumption negatively correlates with alertness. Negative correlation
coefficients are negative numbers ranging from –1.00 (perfect correlation) to 0.00 (no
correlation). In a scatterplot, a negative correlation exists if a negative slope is seen. * Be sure to
remember that CORRELATION DOES NOT NECESSARILY MEAN CAUSATION. If car accidents
increase with increasing temperatures, it does not necessarily mean that hot temperatures cause
more traffic accidents!
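A correlation coefficient can be computed by hand. The sketch below uses the Pearson formula, which is one common choice and is assumed here because the handout does not say which coefficient it means. The function name pearson_r and both data sets are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from each mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(ss_x * ss_y)

# Hypothetical data: the two variables mostly rise together.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 2))  # 0.8

# Hypothetical data: one variable falls exactly as the other rises.
print(round(pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]), 2))  # -1.0
```

A coefficient near 0.8 or -1.0 describes how closely the two variables move together; it says nothing about whether one causes the other.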
Be aware of ILLUSORY CORRELATION – perceiving a relationship between things when none
exists. If you believe that black-colored dogs are more aggressive than white-colored dogs, then
you will be more likely to notice and recall events where black-colored dogs show
aggressiveness, which seems to confirm your belief (a form of “confirmation bias”).
Regression toward the mean – The tendency for extreme values to be followed by values closer
to the average (mean). For example, if you normally get 80% on your tests and suddenly get an
extreme (unusual) score of 50%, then on your next test you are likely to get around 80% again.
Statistical Significance – A measure of how likely it is that a result is due to chance alone. For
example, if the difference between the average marks of two classes is statistically significant,
then the difference is unlikely to be due to random chance or sampling error alone. Statistical
significance is usually determined by mathematical analysis of the samples.