Tripken
Research Methods/Stats
An Exercise in STATISTICS
Steps in Conducting Research
1. Formulating a "testable" hypothesis
2. Designing the study
3. Collecting the data
4. Analyzing the data and drawing conclusions
5. But we don't stop there . . . Reporting the findings
Formulating a "testable" hypothesis
What is a TESTABLE hypothesis? A prediction that can be precisely measured.
To do this we must identify the variables we wish to study and clearly define how we are going to measure or
control them. This is what is called an OPERATIONAL DEFINITION.
Exercise 1: Can science assess this question?
Read each question below and decide whether you think science could answer this question. You need to
consider whether science can measure the important variables in each question.
1. Is the content of dreams the result of unconscious motives and desires?
2. Is there any relationship between birth order and personality?
3. Do college students consume more pizza than any other group?
4. Can humans be innately evil?
5. Do sales of pain relievers increase during periods of economic crisis?
6. Do animals dream?
Designing the study
There are several research methods to choose from. The choice is greatly influenced by the nature of the
research question. For example, if you are simply interested in whether certain groups of people endorse a
particular attitude, a survey would be the most efficient method.
Once a method of study has been chosen, practical concerns such as how the study will be conducted, who
the subjects will be, and how they will be obtained need to be worked out.
Collecting the data
There are several techniques used to collect data. With our survey we could decide whether we want to
"interview" our subjects or have them complete a pencil-and-paper "questionnaire." If we want to assess
whether caffeine intake makes people more jittery and anxious, we might use a physiological measure such as
heart rate to assess anxiety. As you can see, the nature of the topic and the design being used influence the
choice of data collection.
Analyzing the data and drawing conclusions
Researchers use statistics to help them to "organize, summarize, and interpret numerical data" (Weiten,
1998, p. 53). With the use of statistics researchers can assess whether their predictions (hypotheses) were
supported or were not supported by the data gathered.
Reporting the findings
Researchers share their findings with other scientists and with the general public. This serves two purposes.
First, it informs others about what was found. Second, it allows others to comment on the research. Were its
methods sound? Did its conclusions go beyond the established facts? What new questions does it raise for
science? How might we use this information to better our lives?
Typically, researchers prepare a written report that is submitted to a journal in the appropriate area of
research. For example, if our survey examined the attitudes of mothers and fathers on child care issues, then
a journal in Developmental Psychology would be more appropriate than one in Clinical Psychology. Many
journals that publish scientific research in psychology (as in most other disciplines) are "refereed journals."
This means that before the report is published, experts in the area review the study. They consider the
appropriateness of the article for the particular journal, the importance of the issue, and whether there are
flaws in the study's design or analysis.
Experimental Designs: Independent & Dependent Variables
Example 1 Dr. Imanut wants to examine whether a new drug increases the maze running performance of
older rats. Just like aging humans, older rats show signs of poorer memory for new things. Dr. Imanut
teaches two groups of older rats to find a piece of tasty rat chow in the maze. One group of rats is given the
new drug while they are learning the maze. The second group is not given the drug. One week after having
learned the maze he retests the rats and records how long it takes them to find the rat chow.
What is the independent variable? Hint: What did the researcher manipulate (allow to vary) in this study?
a) age of the rats.
b) type of maze.
c) length of time it took the rats to run the maze.
d) presence or absence of the new drug.
What is the dependent variable? Hint: What was the measure of the research subjects' responses?
a) age of the rats.
b) type of maze.
c) length of time it took the rats to run the maze.
d) presence or absence of the new drug.
Example 2 A researcher wanted to study the effects of sleep deprivation on physical coordination. The
researcher selected 25-year-old male college students and deprived them of sleep for either 24, 36, or
45 hours, then measured their physical coordination.
In the present study the independent variable was:
a) the length of time the subjects were deprived of sleep.
b) the age of the subjects.
c) the gender of the subjects.
d) the physical coordination skills of the subjects.
In the present study the dependent variable was:
a) the length of time the subjects were deprived of sleep.
b) the age of the subjects.
c) the gender of the subjects.
d) the physical coordination skills of the subjects.
Example 3 A researcher wanted to know whether the number of people present would influence subjects'
judgments on a simple perceptual task. Subjects made their judgments in groups of varying size, and in each
case the other members of the group gave an incorrect answer. The researcher then noted whether the
subject conformed to the group decision.
In the present study the independent variable was:
a) the number of people in the group.
b) whether the group members gave the correct or incorrect answer.
c) whether the subjects conformed with the group.
d) the type of perceptual task.
In the present study the dependent variable was:
a) the number of people in the group.
b) whether the group members gave the correct or incorrect answer.
c) whether the subjects conformed with the group.
d) the type of perceptual task.
Example 4 An investigator had 60 subjects watch a videotaped re-enactment of a bank robbery. Half of the
subjects were asked by a police investigator to recall the event, while the remaining subjects were
interviewed by a police investigator while they were hypnotized.
In the present study the independent variable was:
a) whether a police investigator was used.
b) whether subjects were hypnotized.
c) how much subjects recalled.
d) what subjects watched.
In the present study the dependent variable was:
a) whether a police investigator was used.
b) whether subjects were hypnotized.
c) how much subjects recalled.
d) what subjects watched.
Control & Experimental Groups
In an experiment, researchers are typically concerned about the performance of subjects in the experimental
group. If a researcher wants to know if a new drug helps improve memory, the researcher is most interested
in how people who are given the drug perform on the memory test. However, in order to conclude that
the drug "improves" memory, people who take it must perform better than those who do not take the drug.
The CONTROL GROUP serves as the BASELINE performance. The group given the drug serves as the
EXPERIMENTAL GROUP.
Confounding/Extraneous Variables
In order to isolate the effect of the independent variable on the dependent variable, researchers must rule out
alternative explanations. In other words, only the independent variable can be allowed to vary.
The term CONFOUNDING/EXTRANEOUS VARIABLE is used to refer to any other factor that might affect the
dependent variable.
Try the following exercise to see if you can spot potential problems in these hypothetical research studies.
EXAMPLE 1 A researcher wanted to assess whether mood influenced people's memory. The researcher
hypothesized that positive moods would lead to greater memory performance than would a negative mood
state. On Monday the researcher had 50 subjects learn a list of nonsense syllables and then watch a very
humorous comedy film. Their recall of the list of syllables was then assessed. On Tuesday the researcher had
a second group of 50 subjects learn the same list of nonsense syllables and then watch an upsetting
documentary on World War II. Their recall of the list was then assessed after having watched the film.
EXAMPLE 2 A researcher wanted to see whether a new way of teaching English was superior to a more
traditional approach. The researcher selected two Thursday night classes at a local community college. In
one class the instructor used a traditional method; in the second class the instructor used the newer
approach. The researcher then assessed students' language ability after they had completed the program.
How Researchers Control Sources of Error
To control for potential extraneous variables and other sources of error researchers use:
 A standardized set of procedures
 Equivalent Control and Experimental Groups
Standardized procedures means that subjects are treated the same way in all regards except for the
independent variable(s). Researchers also need to ensure that the control and experimental groups are
similar on important variables at the outset. To do this researchers can use one of three methods.
 Use the same subjects in both the control and experimental groups. (This is called a repeated
measures design).
 Match subjects on important variables (e.g., for every 20-year-old female in the control group there
is a 20-year-old female in the experimental group).
 Random assignment. (Let chance decide who gets placed into which group. Thus, each subject
has an equal chance of being placed in either group).
Why three methods?
Sometimes we cannot use the same subjects in both the control and experimental groups. Sometimes being
in one of the conditions alters the subjects' behavior. This change may carry over to the next condition and
thus serve as an extraneous variable.
For example, a researcher wants to study whether a new drug is better than an old drug to reduce anxiety
symptoms. If we gave the old drug to the subjects and assessed them and then gave the new drug, there
might be carry-over effects from the old drug still. Thus, we might want to use two different groups of people
who suffer from anxiety.
We could match our subjects on important variables such as age, gender, and severity of symptoms. Thus, for
every 40-year-old male with mild symptoms in the old drug (control) group there is a similar subject in the new
drug (experimental) group. However, in finding perfect matches for our subjects we might have to go
through many people. This is not very resource efficient. In order to find 50 people who are perfect matches
for another group of 50 we might have to go through a few hundred potential subjects.
A simpler method is random assignment. We let chance determine who is in the control and
experimental groups. With a large enough sample of subjects it is highly unlikely that the majority of people
with severe symptoms would be in one group.
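To make the idea concrete, here is a minimal sketch of random assignment in Python; the subject names and group sizes are invented for illustration:

    import random

    # 100 hypothetical participants
    subjects = ["subject_" + str(i) for i in range(1, 101)]

    random.shuffle(subjects)                   # let chance determine the ordering
    midpoint = len(subjects) // 2
    control_group = subjects[:midpoint]        # baseline condition
    experimental_group = subjects[midpoint:]   # treatment condition
    # With a large enough sample, the two groups are very likely
    # to be similar on variables such as age or symptom severity.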
Advantages & Disadvantages of the Experiment
The advantage of the experimental approach is that it allows investigators enough control to examine cause
and effect relationships. Experiments allow us to answer "what causes something to occur?" This is the
second goal of science, understanding and prediction.
However, this degree of control can also be a potential weakness. By controlling features of the subjects'
environments, the researcher may create too artificial a setting. The researcher may have accurately
identified the "cause" of the subjects' behavior, but if the findings apply only under such rigid, non-real-world
conditions, they are of limited use in explaining real-world behavior. (Psychology seeks to understand this
real-world behavior too.)
A second weakness of experiments is that some questions cannot, for ethical or technical reasons, be studied
using an experiment. An important question is whether people who have had less optimal rearing
experiences, such as poverty or abuse, continue to have difficulties in their adult years because of this poor
rearing. Yet we cannot place children in abusive environments just to see if it "causes" damage that persists
into adulthood. Thus, we use other research methods, such as correlational studies. We might see whether
there is a relationship between childhood poverty or abuse and the psychological and behavioral problems of
adults by asking adults about their childhood experiences and their life as an adult.
Descriptive & Correlational Designs
These designs allow us to fulfill the first goal of science, description, and to isolate possible causes for experiments to then
assess. Remember only experiments can assess cause and effect. No matter how convincing data from
descriptive and correlational studies may sound, because they have less control over the variables and the
environments that they study, non-experimental designs cannot rule out extraneous variables as the cause of
what is being observed.
There are many types of non-experimental methods. We will focus on three approaches:
Case Study
Naturalistic Observation
Survey
CASE STUDIES involve in-depth examination of a single person or a few people. This approach is frequently
employed in clinical psychology. Typically the individual or small group of individuals being examined
possesses some skill, or has some problem that is unusual.
STRENGTH: Such cases can expand our knowledge about the variations in human behavior. While most
researchers are interested in what is the "general" trend in behavior, those using a case study approach
highlight individuality. Considerable information is gathered. Thus, the conclusions drawn are based on a
more complete set of information about the subjects.
WEAKNESS: Despite their strengths, case studies have some very big drawbacks. First, like all non-experimental
approaches, they merely describe what is occurring but cannot tell us "why" it is
occurring. Second, there is considerable room in case studies for "researcher bias" to creep in. While no
approach, including the experiment, is immune from researcher bias when in the hands of an incompetent or
poorly trained researcher, some approaches are at greater risk for this problem even when conducted by
capable people.
Why is the case study more at risk?
The case study method involves considerably more interaction between the researcher and the subjects than
most other research methods. In addition, the data come largely from the researcher's journals of his or her
subjects. While this might be supplemented by test scores and more objective measures, it is
the researcher who brings all this together in the form of a descriptive "case study" of the individual(s) in
question.
A final problem with case studies is that the small number of cases examined makes it unlikely that they
represent others who have similar problems or abilities to those studied. This problem means we might
not be able to generalize (apply) the study's findings to other people with similar problems. Thus, a case
study of a single person with schizophrenia is unlikely to be representative of all people who suffer from this
disorder.
NATURALISTIC OBSERVATION studies, as their name implies, observe organisms in their natural settings. A
researcher who wants to examine aggressive behavior in male and female youngsters may watch children in
the school playground, and record the number of aggressive acts boys and girls display.
STRENGTH: The behavior of the subjects is likely to reflect their true behavior as it takes place in a natural
setting, where they do not realize that they are being observed.
WEAKNESS: The researcher has no control over the setting. For example, in our playground study, more
than a child's gender may be affecting the child's aggressive behavior. In addition, subjects may not have an
opportunity to display the behavior the researcher is trying to observe because of factors beyond the
researcher's control. For example, some of the children who are usually the most aggressive may not be at
school that day or may be in detention because of previous misconduct; thus they are not in the sample of children on
the playground. Finally, the topics of study are limited to people's overt behavior. A researcher cannot
study topics like attitudes or thoughts using a naturalistic observation study.
SURVEY studies ask large numbers of people questions about their behaviors, attitudes, and opinions. Some
surveys merely describe what people say they think and do. Other survey studies attempt to find
relationships between the characteristics of the respondents and their reported behaviors and opinions. For
example, is there a relationship between gender and people's attitudes about some social issue? When
surveys have this second purpose we refer to them as CORRELATIONAL STUDIES.
STRENGTH: Surveys allow us to gather information from large groups of people. Surveys also allow us to
assess a wider variety of behaviors than can be studied in a naturalistic observation study.
WEAKNESS: Surveys require that the subjects understand the language. Thus, some members of the
population may be excluded from survey research. Surveys also rely heavily on subjects' memory and
honesty.
CORRELATIONAL STUDIES
Correlational studies look for relationships between variables. Do people who experience divorce have more
psychological problems? Do children who come from economically advantaged families perform better
academically? In each case we are asking: is there a relationship between variable X and variable Y?
Correlational studies only tell us that there is a relationship between the two variables. They do not tell us
which variable "caused" the other.
For example, a researcher measures people's marital status and their psychological adjustment and finds that
there is a correlation between the two variables. More people who are no longer married report experiencing
psychological problems. It might be tempting to conclude that the stress of experiencing a divorce causes
depression and anxiety. However, it is also likely that people who suffer from psychological problems are
harder for partners to live with, and thus more likely to have their marriage end in divorce. The researcher
would need to determine which variable came first, the marital breakup or the psychological problems.
Establishing Causality
In order to establish causality we need three things:
1. Correlation. There is a relationship between the two variables.
2. Time order. The presumed cause came before the presumed effect.
3. Ruling out alternative explanations.
Correlational studies give us the first. Certain studies, if they follow subjects over a period of time, may
provide us with the second. But correlational studies have less control over the subjects' environment and
thus have difficulty ruling out alternative explanations.
Try the following exercise to see if you understand the concept of correlation.
INSTRUCTIONS
Read each of the descriptions below. Then determine whether the relationship described suggests a positive
or negative correlation (the section on "Statistics: Correlation" below reviews what is meant by positive and
negative correlation). Then consider why we might find this relationship. The more you think about the
correlation suggested, the more possible explanations for it you are likely to find. This highlights why
causality cannot be established through correlational research. (The section on correlational studies reviews
this idea.)
Example 1: A researcher finds that students who have more absences get poorer grades.
 Is this a positive or negative correlation?
 Why might we find a relationship between attendance and grades?
Example 2: Cities with more stores selling pornography have higher rates of violence.
 Is this a positive or negative correlation?
 Why might we find a relationship between pornography outlets and violence?
Example 3: The longer couples have been together the more similar they are in their attitudes and opinions.
 Is this a positive or negative correlation?
 Why might we find a relationship between time together and similarity?
Moral of the Lesson:
In each case above there was more than one explanation for why we might find the relationship between the
variables. Since we cannot rule out these alternative explanations, we cannot conclude that changes in one
variable "caused" changes in the other variable.
The snappy phrase to express this idea is: CORRELATION does not equal CAUSATION
Statistics
Statistics are used to organize, summarize, and interpret empirical data.
 Descriptive Statistics help us to organize and summarize the data.
 Inferential Statistics help us to interpret the data gathered.
Organizing the Data
Data can be organized using frequency counts and graphs to visually structure the data set. For example, a
researcher tallies the following scores on a memory test gathered from 23 subjects.
5 13 11 12 12 11 11 12 8 12 8 11 10 13 7 12 7 5 7 12 9 11 14
Arranged in this way the data set is very confusing. However, we could group the numbers in a frequency
count. The data above range from 5 to 14.
Score   Tally     Frequency
5       xx        2
6                 0
7       xxx       3
8       xx        2
9       x         1
10      x         1
11      xxxxx     5
12      xxxxxx    6
13      xx        2
14      x         1
We can see that the data is grouped more toward the higher end than the lower end, with almost half of the
sample scoring 11 or 12.
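For readers who want to check the tally, here is a short sketch in Python that reproduces the frequency count; the scores are transcribed from the example above:

    from collections import Counter

    scores = [5, 13, 11, 12, 12, 11, 11, 12,
              8, 12, 8, 11, 10, 13, 7, 12, 7, 5, 7, 12, 9, 11, 14]

    counts = Counter(scores)
    for value in range(min(scores), max(scores) + 1):
        # print the score, its tally marks, and its frequency
        print(value, "x" * counts[value], counts[value])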
Summarizing the Data
While a frequency table like the one above helps us to make some sense out of the numbers, it would be nice
if we could somehow summarize the scores of 23 subjects with a single score.
 Measures of Central Tendency
 Measures of Variability

Measures of Central Tendency
There are three measures of central tendency: MEAN, MEDIAN, MODE
Each is a single score to represent a group of scores. As their collective name suggests they are looking for
the most "central" or typical response.
The MEAN is the arithmetic average. Sum the data points in the above example and divide by the number of
data points. 233 / 23 = 10.13
The MEDIAN is the exact midpoint of the data set. To calculate the median you place the numbers in order:
5 5 7 7 7 8 8 9 10 11 11 11 11 11 12 12 12 12 12 12 13 13 14
The midpoint is the observation in the middle of the set. As there were 23 people, this is the 12th data point,
which is 11. What if there had been 24 people? The median would then be the mean of the middle two points;
in our example it would still be 11.
The MODE is the most frequent score in the data set. In the example above this is 12. Six people scored 12
on the memory test.
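All three measures can be checked with Python's standard statistics module, using the same 23 memory scores as above:

    import statistics

    scores = [5, 5, 7, 7, 7, 8, 8, 9, 10, 11, 11, 11, 11, 11,
              12, 12, 12, 12, 12, 12, 13, 13, 14]

    print(statistics.mean(scores))    # 233 / 23, about 10.13
    print(statistics.median(scores))  # 11, the 12th of the 23 ordered scores
    print(statistics.mode(scores))    # 12, the most frequent score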
Why Three Different Types of Measures?
MODE: Survey researchers are often interested in what is the most common score and thus the MODE is the
measure of choice. If I asked students whether they prefer COKE or PEPSI, the mean and median are
meaningless. What I want to know is which soda is preferred. Thus, I want the modal response.
MEDIAN: Notice in our memory score example the MEAN, MEDIAN, and MODE were not identical. The
mean was a little over 10, the median 11, and the mode 12.
Why? In our example, the data were SKEWED. What is "skewed"? It means that many scores were bunched
at one end of the scale with a few scores trailing off at the other end. In what is called a NORMAL
DISTRIBUTION the data points are symmetric and the mean, median, and mode are the same value. In our
example the data was not strongly skewed, because the three values were at least close together. But
consider the following data set.
Score   Tally
0       xxxxxxx
1       xxxxx
2       xxxx
3       xx
4       x
5
6
7       x
8
9
10      xxx

Many scores bunch between 0 and 4, with a few trailing off at 7-10. The MEAN is 60 / 23 = 2.6. The
MEDIAN is 1. As you can see from this example, the MEAN is strongly influenced by the more extreme
(atypical) scores. When the data are very skewed, the MEAN can be a poor representative of the data set,
while the MEDIAN is unaffected by extreme values.
MEAN: The mean is the preferred choice, especially when the data are not highly skewed. The mean is used
in the calculation of most Inferential Statistics and is used to calculate variability.
Try the following exercise to see if you understand the concepts of mean, median, and mode.
Exercise - Measures of Central Tendency
1. The measure that is most commonly used by researchers because it is used to calculate inferential
statistics is: _____________________________
2. The measure that is least affected by extreme scores is: _________________________
3. The Mode of the following set of data (5 6 6 7 8 8 8 9) is: _______________________
4. The Mean of the data set is: __________________________
5. The Median is: __________________________________
The measures of central tendency summarize the data in terms of a single number, but not all scores in the
data set reflect that value. Measures of variability allow us to assess how much the scores in the data differ
from each other. The simplest measure of variability is the RANGE (the distance between the highest and
lowest scores). However, the range is also strongly influenced by extreme scores.
0 1 1 1 2 2 2 3 5 12
The range is 0-12. But most scores actually fall between 0 and 5; thus, the range can be misleading.
Another method is to subtract the mean of the data set from each observation; this is the basis of the
STANDARD DEVIATION. Below is an example of how to calculate the standard deviation of the data set
above. The mean of the data set is 29 / 10 = 2.9. We subtract this value from each and every score:

Score   Score - Mean   Deviation   Deviation squared
0       0 - 2.9        -2.9        8.41
1       1 - 2.9        -1.9        3.61
1       1 - 2.9        -1.9        3.61
1       1 - 2.9        -1.9        3.61
2       2 - 2.9        -0.9        0.81
2       2 - 2.9        -0.9        0.81
2       2 - 2.9        -0.9        0.81
3       3 - 2.9        0.1         0.01
5       5 - 2.9        2.1         4.41
12      12 - 2.9       9.1         82.81

If we were to sum the deviations we would get zero. In addition, working with negative numbers is always
annoying, so we square each deviation; all the numbers then end up as positive values (a negative value
squared is positive).

The sum of the squared deviations is 108.9. We then calculate the mean of these squared deviations:
108.9 / 10 = 10.89. This is the VARIANCE of the data, but as it is based on squared values, it is itself a
squared quantity. If the values above were people's reaction times, 10.89 would be in squared reaction times,
which is hard to comprehend. Thus, we take the square root of this value. The standard deviation is 3.3.
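The same calculation can be done step by step in Python. This is the population standard deviation, dividing by N as in the worked example above:

    import math

    data = [0, 1, 1, 1, 2, 2, 2, 3, 5, 12]

    mean = sum(data) / len(data)             # 29 / 10 = 2.9
    deviations = [x - mean for x in data]    # these sum to zero
    squared = [d ** 2 for d in deviations]   # squaring removes the signs
    variance = sum(squared) / len(data)      # 108.9 / 10 = 10.89
    sd = math.sqrt(variance)                 # 3.3
    print(sd)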
The larger the standard deviation, the greater the degree of variability in the data set. Thus, let us compare
the following two groups of responses:

Group 1: 2 2 2 4 4 5 5 6 (Mean = 3.75, Standard deviation (SD) = 1.48)
Group 2: 1 1 1 2 3 6 7 9 (Mean = 3.75, Standard deviation (SD) = 2.95)

Groups 1 and 2 have the same mean score. However, the scores in Group 2 are more variable. The SD value
reflects this greater variation of the individual scores from the mean.
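A quick check of this comparison in Python, assuming the two groups read 2 2 2 4 4 5 5 6 and 1 1 1 2 3 6 7 9 (pstdev computes the population standard deviation):

    import statistics

    group1 = [2, 2, 2, 4, 4, 5, 5, 6]
    group2 = [1, 1, 1, 2, 3, 6, 7, 9]

    print(statistics.mean(group1), statistics.pstdev(group1))  # 3.75, about 1.48
    print(statistics.mean(group2), statistics.pstdev(group2))  # 3.75, about 2.95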
Correlation
Some studies are interested in whether two variables are related to each other.
 Is there a relationship between birth order and IQ scores?
 Is there a relationship between socioeconomic status (SES) and health?
The CORRELATION COEFFICIENT is a statistic that shows the strength of the relationship between the two
variables. The correlation coefficient falls between -1.00 and +1.00. The statistic shows both the STRENGTH
of the relationship between the variables, and the DIRECTION of the relationship. The numerical value
indicates the strength of the relationship. The sign in front of the numerical value indicates the direction of
the relationship. Let us consider each of these in more detail.
THE NUMERICAL VALUE:
Correlation coefficient values that are close to zero (e.g., -.13, +.08) suggest that there is no relationship
between the two variables. The closer the correlation is to one (e.g., -.97, +.83) the stronger the relationship
between the two variables. Thus, we might expect that there would be no relationship between the height
of college students and their SAT scores, and we would be correct. The correlation coefficient is very close to
zero. However, we might expect a correlation between adult height and weight to be stronger, and again we
would be correct.
THE SIGN:
The sign of the correlation coefficient tells us whether these two variables are directly related or inversely
related.
Do the two variables increase and decrease in the same direction?
The more time a student spends studying, the better the grade; the less time spent studying, the lower the
grade. Notice how both study time and grade vary in the same direction. As studying increases grades
increase, and when studying decreases grades decline. Grade and study time would be POSITIVELY
correlated. The term POSITIVE does not necessarily mean it's a good thing (when is getting a poor grade a
"good" thing!). It simply means that there is a direct relationship; the variables are varying (changing) in the
same direction.
Do the two variables vary in opposing directions?
As the number of children in a family increases, the IQ scores of the children tend to be lower. Thus, family
size and children's IQ scores vary in opposite directions. As family size increases the IQ scores decline, and as
family size decreases IQ scores increase. IQ and family size are NEGATIVELY correlated (inversely related).
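As a sketch of how a correlation coefficient can be computed in Python, the study-time and grade numbers below are invented purely to illustrate a positive correlation:

    import math

    def pearson_r(xs, ys):
        # Pearson correlation: the sum of co-deviations divided by the
        # square root of the product of the sums of squared deviations.
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        den = math.sqrt(sum((x - mean_x) ** 2 for x in xs) *
                        sum((y - mean_y) ** 2 for y in ys))
        return num / den

    study_hours = [1, 2, 4, 5, 7, 8]       # hypothetical values
    grades = [55, 60, 70, 72, 85, 90]      # hypothetical values
    print(pearson_r(study_hours, grades))  # close to +1.00: strong positive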
Inferential Statistics
Inferential Statistics allow researchers to draw conclusions (inferences) from the data. There are several
types of inferential statistics. The choice of statistic depends on the nature of the study. Covering the
different procedures used is beyond the scope of this course. However, understanding why they are used is
important.
A researcher asks two groups of children to complete a personality test. The researcher then wants to know
whether the males scored differently than the females on certain measures of personality. We will create a
fictitious personality trait "QZ." Here are the scores for the girls and the boys:
Girls          Boys
23             37
40             56
37             18
41             41
41             42
33             38
28             50
25             22
24             33
13             47
28             25
44             46
Mean = 31.42   Mean = 37.92
SD = 9.03      SD = 11.14

The mean score for the "QZ" trait in boys was higher than the mean score for "QZ" in the girls. But notice
how within the two groups there was considerable fluctuation. By "chance" alone we might have obtained
these different values. Thus, in order to conclude that "QZ" shows a gender difference, we need to rule out
that these differences were just a fluke. This is where inferential statistics come into play.
An important concept in inferential statistics is STATISTICAL SIGNIFICANCE. When an inferential statistic
reveals a statistically significant result, the differences between the groups were unlikely to be due to chance.
Thus, we can rule out chance with a certain degree of confidence. When the results of the inferential statistic
are not statistically significant, chance could still be a reason why we obtained the observations that we did.
In the example above we would use an inferential statistic called a T-TEST. The t-test is used when we are
comparing TWO groups. In this instance the t-test does not yield a statistically significant difference. In
other words, the differences between the scores for the boys and the scores for the girls are not large enough
for us to rule out chance as a possible explanation. We would have to conclude then that there is no gender
difference for our hypothetical "QZ" trait.
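A sketch of the same comparison using SciPy's independent-samples t-test, assuming SciPy is available; the scores are the ones in the table above:

    from scipy import stats

    girls = [23, 40, 37, 41, 41, 33, 28, 25, 24, 13, 28, 44]
    boys = [37, 56, 18, 41, 42, 38, 50, 22, 33, 47, 25, 46]

    t_stat, p_value = stats.ttest_ind(girls, boys)
    # The p value comes out above .05 here, consistent with the
    # conclusion that we cannot rule out chance.
    print(t_stat, p_value)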
Inferential statistics do not tell you whether your study is accurate or whether your findings are important.
Statistics cannot make up for an ill-conceived study or theory. They simply assess whether we can rule out
the first "extraneous" variable of all research, CHANCE.
How do we know if our research has concluded anything of value?
Tests of statistical significance determine if the difference is too big to be due to chance alone.
The tests look at two factors:
1. They look at the size of the difference.
The bigger the difference between the groups, the more likely the results are to be statistically significant.
For example, if the Experimental group averages 95% and the control group averages 45% on our test, that
difference would probably be statistically significant. (Intuitively, you do the same thing. If your team gets
beat by one point, you point out that the other team was lucky. You don't have to concede that the other
team is better. However, if they beat your team by 30 points, you may have to admit that the other team is
better).
2. They look at the number of participants.
The more participants that are used, the more likely the results are to be statistically significant. (Why?
Because if you only have a few participants, the groups might be very different at the beginning of the study.
However, if you have 100 participants in each group, the groups should be pretty similar before the start of
the study. If they are very similar at the start, then, if they are even slightly different at the end, that
difference could be due to the treatment. Similarly, in sports, if one team beats another in a seven-game
series, that's more convincing evidence of the team's superiority than winning a single game.)
Two possible verdicts from statistical tests
1. Statistically significant: you are sure beyond a reasonable doubt (your doubt is less than 5%, i.e., p < .05)
that the difference between your groups is too big to be due to chance alone.
So, if the difference between the treatment group and the no-treatment group is too big to be due to chance
alone, then some of that difference is probably due to treatment. In other words, the treatment probably had
an effect.
2. Not statistically significant: you are not sure, beyond a reasonable doubt, that the difference between the
groups is due to anything more than just chance.
So, you can't conclude anything. The results are inconclusive.
T-test - What is it?
The t-test is used to determine whether there's a significant difference between two group means. It helps to
answer the underlying question: do the two groups come from the same population and only appear different
because of chance errors, or is there some significant difference between them, such that we can say they
really come from two different populations? For example, what is the PROBABILITY that group 1 acts calmer
after taking a new anxiety medication because of the medication, rather than the "calmness" being due to
chance?
Three basic factors help determine whether an apparent difference between two groups is a true difference
or just an error due to chance:
1. the larger the sample, the less likely that the difference is due to sampling errors or chance
2. the larger the difference between the two means, the less likely the difference is due to sampling errors
3. the smaller the variance among the participants, the less likely that the difference was created by sampling
errors
Reporting Data - When t is significant: basically, are your results due to the medication, or are they due
to chance?
** The difference between the means must be statistically significant for you to be able to claim that
your experiment created change.
The Z-test, similar to the t-test, is a statistical test used in inference which determines if the difference
between a sample mean and the population mean is large enough to be statistically significant, that is, if it is
unlikely to have occurred by chance.
 The Z-test is used primarily with standardized testing to determine if the test scores of a particular sample
of test takers are within or outside of the standard performance of test takers.
Definition of a P value
Consider an experiment where you've measured values in two samples, and the means are different. How
sure are you that the population means are different as well? There are two possibilities:
 The populations have different means.
 The populations have the same mean, and the difference you observed is a coincidence of random
sampling.
The P value is a probability, with a value ranging from zero to one. It is the answer to this question: If the
populations really have the same mean overall, what is the probability that random sampling would lead to a
difference between sample means as large (or larger) than you observed?
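One way to make this definition concrete is a small simulation: repeatedly draw two samples from the same population and count how often random sampling alone produces a difference between sample means at least as large as the one observed. The population parameters and observed difference below are invented:

    import random

    def simulated_p(observed_diff, n=12, trials=10000):
        # Both samples come from the SAME population, so any
        # difference between their means is pure sampling error.
        hits = 0
        for _ in range(trials):
            a = [random.gauss(35, 10) for _ in range(n)]
            b = [random.gauss(35, 10) for _ in range(n)]
            if abs(sum(a) / n - sum(b) / n) >= observed_diff:
                hits += 1
        return hits / trials

    print(simulated_p(6.5))  # an approximate two-tailed P value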
How are P values calculated? There are many methods, and you'll need to read a statistics text, and take
some Tylenol, to learn about them. The choice of statistical tests depends on how you express the results of
an experiment (measurement, survival time, proportion, etc.), on whether the treatment groups are paired,
and on whether you are willing to assume that measured values follow a Gaussian bell-shaped distribution.
We use the t-test and the z-test to determine PROBABILITY levels in your experiment.
Common misinterpretation of a P value
Many people misunderstand what question a P value answers.
If the P value is 0.03, that means that there is a 3% chance of observing a difference as large as you observed
even if the two population means are identical. It is tempting to conclude, therefore, that there is a 97%
chance that the difference you observed reflects a real difference between populations and a 3% chance that
the difference is due to chance. Wrong. What you can say is that random sampling from identical populations
would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in
3% of experiments.
You have to choose. Would you rather believe in a 3% coincidence? Or that the population means are really
different?
"Extremely significant" results
Intuitively, you probably think that P=0.0001 is more statistically significant than P=0.04. Using strict
definitions, this is not correct. Once you have set a threshold P/alpha value for statistical significance, every
result is either statistically significant or is not statistically significant. Some statisticians feel very strongly
about this.
Many scientists are not so rigid, and refer to results as being "very significant" or "extremely significant" when
the P value is tiny. Often, results are flagged with a single asterisk when the P value is less than 0.05, with two
asterisks when the P value is less than 0.01, and three asterisks when the P value is less than 0.001.
P<.05 – results are significant
P<.01 – results are very significant
P<.001 – results are extremely significant
Statistical hypothesis testing
The P value is a fraction. In many situations, the best thing to do is report that number to summarize the
results of a comparison.
1. Set a threshold P value (also called the alpha level for significance) before you do the experiment.
Traditionally, 0.05 is the conventional threshold for significance.
2. Define the null hypothesis. If you are comparing two means, the null hypothesis is that the two
populations have the same mean.
3. Do the appropriate statistical test to compute the P value.
4. Compare the P value to the preset threshold value. If the P value is less than the threshold, state that
you "reject the null hypothesis" and that the difference is "statistically significant". If the P value is
greater than the threshold, state that you "do not reject the null hypothesis" and that the difference is
"not statistically significant".
Answers
1. Is the content of dreams the result of unconscious motives and desires?
While this was the assumption that Freud made, one of the reasons his theory was challenged was because it
was unscientific. We have no objective measure of the "unconscious" to even establish that it exists. So
science cannot answer this question at present.
2. Is there any relationship between birth order and personality?
Birth order is fairly straightforward to measure; personality would have to be more precisely defined.
However, science can assess whether birth order is related to certain personality characteristics. Asking
whether it "causes" such characteristics would be more difficult to assess as there are far too many
uncontrolled factors.
3. Do college students consume more pizza than any other group?
While we would have to be more precise in defining who the college students are and who the other
groups are, science could assess this more precisely defined question.
4. Can humans be innately evil?
No, science cannot answer if humans are innately evil. How do you measure "evil"? This is too much of a
value judgment, perhaps best left to philosophy and theology.
5. Do sales of pain relievers increase during periods of economic crisis?
While we would have to be more precise about which types of pain relievers and what would be defined as an
economic crisis, science could assess this.
6. Do animals dream?
While most mammals do experience REM sleep, we cannot ask Fido and Fluffy what they were
experiencing. We need more objective measures. Science has the same problem with answering whether
human infants and fetuses dream. Both groups experience REM sleep (in fact, 50% or more of their sleep
time is spent in REM), but we cannot ask either what they were experiencing. Thus, at present science cannot
answer this question.
IV and DV Answers
Example 1: D) The independent variable was the presence or absence of the drug. This was the variable
being manipulated by the researcher. C) The dependent variable was the length of time it took the rats to
remember where the rat chow was after one week. This was the measure of the subjects' response.
Example 2: The independent variable was the length of time the subjects were sleep deprived. The
dependent variable was the physical coordination skills of the subjects.
Example 3: The independent variable was the number of people in the group. The dependent variable was
whether the subjects conformed with the group.
Example 4: The independent variable was whether the subjects were hypnotized. The dependent variable
was how much subjects recalled.
Confounding Variables / Answers
EXAMPLE 1 ANSWER - In any study where different subjects are being used in the treatment groups it is
important that you establish that the groups are the same at the outset. Thus, any differences found at the
end were due to your manipulation and not to preexisting differences. There is no mention of a pre-test of
subjects' mood or that subjects' moods had even been altered by watching the films (post-test). The
researcher is assuming that watching a funny film would make someone happy, or witnessing an upsetting
film will produce a negative mood. This is not always a safe assumption, and should always be verified.
Second, the day of the week might be a possible confound. In this case, it might lead to fewer "happy"
subjects in the "positive mood group" as this group was assessed on a MONDAY!
EXAMPLE 2 ANSWER - There are a couple of problems with this study. Although the researcher did attempt
to control potential extraneous variables (the subjects were all night students at the same community
college), the experimental and control groups were not only taught using two different methods of
instruction, but by two different instructors. One teacher may have been a much better instructor than the
other. If this was the teacher who used the newer method, the students' superior performance may have had
as much to do with the teacher as it did with the new approach. A second problem is the students. The researcher
did not establish the level of ability of the two classes at the outset. One class may have had students with a
higher level of language skills than the other. The control and experimental groups should be very similar to
one another at the outset, so that any differences at the end cannot be attributed to preexisting differences.
CORRELATION SECTION
 EXAMPLE 1 - ANSWER: NEGATIVE correlation. As the number of absences increases, the grade
declines. The variables are changing in opposite directions.
We might find this relationship for many reasons. (1) Students who are absent more often miss important
pieces of information that would increase their chances of performing better in the class. (2) Students who
have difficulty in a class may stop attending because they see no reason for going. Thus, does the greater class
absence "cause" the poor grades, or do poor grades "cause" the greater absence? (3) A student who is not
highly motivated may be absent more often and may do poorly. Thus, these two variables are related to
other variables (such as motivation) which may be the real reason for the relationship between class absences
and grades.
 EXAMPLE 2 - ANSWER: POSITIVE correlation. As the number of stores selling pornography
increases, violence increases. Both variables are varying in the same direction.
One possibility is that pornography (which often depicts violence as well as sexuality) "causes"
aggression. It is also possible that people who are aggressive are drawn to images that depict
aggression. But there is likely a third factor in this case: the POPULATION of a CITY. As cities
increase in population there is an increase in violence; there is also an increase in the number of stores
selling anything (pornography, cars, groceries).
 EXAMPLE 3 - ANSWER: POSITIVE correlation. As the amount of time spent together
increases, similarity increases. Both variables are varying in the same direction.
This might happen because over time people influence each other. It might also occur because people
who are most similar to each other to start with stay together longer. Thus, does the amount of time
together "cause" the similarity, or does initial similarity make it more likely that people will stay
together?
Measures of Central Tendency Answers: 1. Mean, 2. Median, 3. 8, 4. 7.12, 5. 7.5