Research Methods in Language Issues ……………………………………
Part I
Preliminaries of Research
Chapter 1: What is research? ………………………………………..
Chapter 1
What is Research?
What is research?
Curiosity is part and parcel of human nature.
Human beings are born curious. Right from the time
little children learn to speak, they begin to ask
questions and seek answers to their questions. It is this
curiosity to which we owe most, if not all, of our
present body of knowledge.
In their attempts to find answers to their
questions, human beings experience and learn new
things. These attempts have been one of the basic
sources of human knowledge throughout history.
However, the kind and shape of these efforts have
changed drastically over time. Early mankind obtained
information in some very simple traditional ways.
Today, technology and modern equipment have made it
possible for mankind to carry out rather complicated
and systematic investigation of various phenomena. In
simple terms, research refers to this systematic
approach we use to answer our questions.
Sources of information
In the course of history, our ancestors
accumulated knowledge in ways that were not always
as scientific and systematic as today’s research
conducted in well-equipped laboratories under strictly
controlled conditions. The traditional sources of
obtaining information include sensory experience,
authorities, and logic.
A. Sensory experience
One of the earliest and most immediate sources
of information is the personal experience we get
through our senses: seeing, hearing, smelling, taste, and
touch. Each of these senses is a valuable source of
information. How do we know someone is at the door
when the door is shut and we are inside? We hear the
knocking. You are chatting with a friend on the phone.
All of a sudden, you say: “I have to go now. The food
is burning.” How do you know? You can smell. Even
little Johnny, who is only six years old, knows that a
certain food is more delicious than another. How? He
can taste.
Despite its obvious value as a source of
information, sensory experience has a number of
shortcomings. The first problem is that information
obtained through the senses is not always accurate; our
senses sometimes mislead us. As an example, look at
the following lines. Are they parallel? See? Your eyes
misled you.
Moreover, the information we get through
sensory experience is relative. Two eyewitnesses,
reporting a car accident to the police, in all their
honesty, may give two different or even conflicting
accounts of what happened. This means that their eyes
have seen the same thing differently. Or, put the
forefinger of your right hand in a glass of hot water and
the forefinger of your left hand in a glass of cold water.
Obviously, your right-hand finger will feel hot and the
other cold. Now take both fingers out and put both of
them in a glass of lukewarm water. Although they are
in the same glass, the finger that was in the hot water feels
cold and vice versa. In short, although sensory
experience is a valuable source of information, the
accuracy and reliability of the information obtained this
way is not always guaranteed.
In addition, experiencing things to get
information is not always a safe venture. Obviously
enough, no normal parent can afford to let their four-year-old
child learn by experience what happens if they tip
a kettle of hot water over themselves.
This kind of knowledge does not do the child much
good since by the time s/he understands this, the poor
kid also comes to realise that there isn’t much left of
his/her life.
Finally, it goes without saying that not every
piece of information can be obtained simply by the use
of the senses. In fact, much of our present body of
knowledge has been accumulated via some carefully
controlled and systematic studies under rather special
circumstances.
B. Authorities
Another way of getting access to information is
to refer to authorities. Obtaining information through
experience takes time. People can share experiences
with others to save time. Instead of waiting for a chance
to experience something, one can easily consult
someone who has already experienced it. Experts and
authorities in various fields are the people who have
spent a considerable portion of their time studying
certain phenomena. So they can be quite a reliable
source of information. That’s why, for example, when
we catch a cold, we consult a doctor. Everyone will
agree that the doctor has specialized knowledge and
experience, and that the doctor's advice is more accurate and
reliable than that of someone without such knowledge and
expertise.
Even expert opinion, however, cannot be
regarded as mere fact. After all, experts are human, and
humans are apt to make mistakes. In addition, the
information provided by the experts might, for one
reason or another, be biased. And sometimes, the
information might undergo a complete transformation
before it reaches its final destination.
For these and some other reasons, the
information provided by the experts and authorities
should be treated cautiously.
C. Logic
Humans are endowed with a God-given blessing
called ‘logic’. By this, they can reason, think, and learn
new things. Logic is of two types: deductive and
inductive. Deductive logic moves from general
statements of facts to more specific conclusions. For
example, if you are to meet a person, named Mark,
whom you have never met before, you already know
that ‘Mark breathes’. This is because you reason that:
A : Human beings breathe.
B : Mark is a human being. So,
C : Mark breathes.
Or, you naturally avoid drinking from a bottle on
which the word ‘poison’ is written because your logic
tells you:
A : poison is dangerous.
B : Liquid X is a poison. So,
C : Liquid X is dangerous.
In both of these examples, sentence ‘A’ is called
a major premise, ‘B’ a minor premise, and ‘C’ a
conclusion.
In the case of the latter example cited above, your
deductive logic saves your life, but it isn’t always as
helpful. Sometimes, using false major and minor
premises may lead you to wrong conclusions.
Only men have logic.
Women are not men.
Women do not have logic.
This is due to the use of a wrong premise. ‘Men’
in the first sentence means ‘human beings’ not the
‘male humans’. Even premises that seem correct
may lead to false conclusions due to the fact that
there are exceptions in nature. You know that ‘birds can
fly’. Then you learn that ‘a penguin is a bird’. So, you
conclude ‘penguins can fly’.
Inductive logic moves from individual and
specific statements of fact to general conclusions. In the
afore-mentioned examples, we started with some
general statements such as ‘poison is dangerous’, and
‘birds can fly’. But where do these statements come
from? How do we know that, say, birds fly? We know
that birds fly because we have observed many
individual members of the bird family fly. Then we
have made a general conclusion.
Sparrows are birds, and they fly.
Pigeons are birds, and they fly.
Crows are birds, and they fly. So
Birds fly.
Again it must be noted that because of our
limited knowledge of nature, such generalizations can
hold true only as long as there is no evidence to the
contrary. Once we learn that penguins and some other
birds cannot fly, the general statement loses its validity
as a fact, or at least, needs modification.
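The way a single counterexample undermines an inductive generalization can be sketched in a few lines of Python. This is only an illustration: the observation list and the helper name `generalize` are assumptions introduced here, built from the bird examples in the text.

```python
# Each observation pairs a bird with whether it was seen to fly.
# The data are illustrative and follow the examples in the text.
observations = [("sparrow", True), ("pigeon", True), ("crow", True)]


def generalize(observed):
    """Return True if every observed bird flies (the inductive conclusion)."""
    return all(flies for _, flies in observed)


# With only flying birds observed, the statement 'birds fly' holds.
holds_before = generalize(observations)
print(holds_before)

# One piece of contrary evidence is enough to invalidate the generalization.
observations.append(("penguin", False))
holds_after = generalize(observations)
print(holds_after)
```

The generalization is never proved by the flying birds; it merely survives until contrary evidence arrives, at which point it has to be dropped or modified.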
Owing to the limitations of each of the above-mentioned sources of information, scientists nowadays
(while still exploiting those traditional sources) employ
scientific and systematic methods of obtaining
information. Using the scientific method, scientists
have explored aspects of the universe that were, for so
long, unknown to us. On the one hand, they know many
things about stars and planets that are millions of light
years away from us. On the other hand, they have learnt
how to bombard atoms, split them, and learn things
about them which only a few decades ago were simply
inconceivable. It is the scientific method to which we
owe most of our present technology, and it is this
method which we will focus on from this chapter on.
Goals of research
Scientific research is usually done for one or
more of the following four purposes:
1. Description
2. Prediction
3. Improvement
4. Explanation
Each of these goals will be briefly looked at below.
1. Description
A good description of the nature of any
phenomenon is both necessary and useful for a better
understanding of the way it interacts with other
phenomena, the reasons for these interactions, and the
way(s) these interactions can be handled. In other
words, before we can do anything about a phenomenon
in a systematic way, we first need to know what it is.
Therefore, the purpose of many researchers is to
describe an event or a phenomenon. Many questions
such as the following can be answered by description.
1. What is the sequence of morpheme development
in Iranian children?
2. In which order do Iranian children develop their
knowledge of tenses?
3. What percentage of Iranian teachers use the
Communicative Approach in their classes?
2. Prediction
Usually, description is not the end point of
research. Description may be necessary to achieve one
or more of the other three goals. Prediction is one such
goal. Descriptions may be needed for making
predictions about the future. For instance, the
description of the Iranian university student population
will tell us things like the percentage of male and
female students, the ratio of employed students to
unemployed ones, etc. Such information may not
always be valuable per se, but it proves quite useful in
helping the officials predict what problems they might
face in the future and think of solutions before it is too
late. Suppose the description of student population in
Iran shows that the ratio of female to male university
students is two to one. This means that before long, the
country will have twice as many female graduates as male
ones. So, the government needs to create more jobs that
are suitable for females. Or consider weather forecasts
on TV and radio channels. Meteorologists study and
describe the changes in the earth’s atmosphere in order
to forecast the weather.
3. Improvement
One of the major reasons why research is done is
to improve the present conditions. Consider car-manufacturing companies. They spend huge sums of
money on research projects annually to improve the
quality of their products. Similarly, in the area of
language teaching and learning, extensive research is
directed towards greater efficiency in this profession.
4. Explanation
Earlier, it was pointed out that research
originates in human curiosity, the desire to know the
reason(s) for everything that happens around them.
Men experience many new things, and intrinsically
come up with many ‘why’s. In an attempt to find
explanations for these new mysteries, they conduct
research.
Characteristics of research
Regardless of the purpose for which it is
conducted, research should have certain characteristics.
It should be :
1. Systematic
2. Generative
3. Reductive
4. Replicable
5. Logical
1. Research is systematic
This means that at each stage of research,
researchers follow a number of pre-established steps
and procedures. When researchers publicize the
outcome of their research, they intend to share their
findings with others. Systematicity means that they
should follow certain already-established regulations,
known to other researchers, so that comprehension and
interpretation become easier.
2. Research is generative
The word generative means ‘productive’ or
‘creative’. Research generates something. What?
Questions. This may sound strange. If research is
carried out to answer questions, how can it generate
questions? Well, in fact it answers the question under
investigation, but generates many other questions.
Suppose you are conducting research on the
relationship between age and language learning. You
have selected a group of young and a group of old
subjects. You intend to compare their learning of
English. While doing your research, you notice that the
ratio of male subjects to female subjects in the two
groups differs. This way a new question comes to your
mind, “Does gender influence language learning?” You
may also notice that the linguistic background of
subjects in the two groups varies. Another question
presents itself: “Does linguistic background affect L2
learning?” The more you try to answer these questions,
the more you realize how many more questions there
are yet to answer.
3. Research is reductive
This characteristic can be viewed from two
perspectives: conceptual and practical. From the
conceptual perspective, research reduces many
individual statements of fact into fewer but more
general statements. As an example, consider medical
research. One study may arrive at the conclusion that
a new drug (A) is good for curing the disease (D).
Another study may end up with the conclusion that
another drug (B) is also conducive to curing the same
disease. The outcome of still another study may
suggest that a third drug (C) has similar effects on the
same disease. Now, comparing and contrasting the
three drugs, researchers come up with the conclusion
that the drugs A, B, and C have the element X in
common. Thus, the three individual statements are
reduced to a single statement, that the element X cures
the disease D.
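The conceptual reduction described above can be pictured as a set intersection. The sketch below is hypothetical: only the drugs A, B, C, the element X, and the disease D come from the text, while the other ingredient names are invented for illustration.

```python
# Hypothetical ingredient lists for three drugs that each help cure disease D.
# Only the shared element "X" comes from the text; "P", "Q", "R", "S" are made up.
drug_ingredients = {
    "A": {"X", "P", "Q"},
    "B": {"X", "R"},
    "C": {"X", "Q", "S"},
}

# Three separate findings, one per study ...
findings = [f"Drug {name} cures D" for name in drug_ingredients]

# ... are reduced to fewer, more general statements by finding
# what all three drugs have in common.
common = set.intersection(*drug_ingredients.values())
reduced = [f"Element {element} cures D" for element in sorted(common)]

print(findings)
print(reduced)
```

A shared element is, of course, only a candidate explanation; as the next characteristic makes clear, such a statement still has to be confirmed by further studies.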
From the practical point of view, research
reduces the responsibility of other researchers. Due to
the huge size of the human reservoir of knowledge and
the multitude of questions waiting for answers, no one
can answer all the questions and explore all the
mysteries surrounding their field of specialization. For
this reason, once a researcher explores one aspect of an
issue and the finding is confirmed by some
confirmatory research, the responsibility to carry out
the same research will be off the shoulders of other
researchers. They will take the result and use it as a
base on which to make their own contribution to human
knowledge.
4. Research is replicable
It was said that research reduces the
responsibility of other researchers. But this is so only
after the findings are confirmed by a number of other
studies. In other words, the result produced by research
should be repeatable. If the same research is repeated
under the same conditions, the same results should be
obtained; otherwise, the outcome cannot be used as a
valid piece of information.
5. Research is logical
Finally, logic should be applied from the very
beginning up to the very end of research. In all the
steps, logic has to be applied to make sure that what is
done makes sense. Imagine how strange it can be to
give a test of listening comprehension to a group of
students to investigate the effect of the reading practice
on the writing ability!
Kinds of research
Before explaining kinds of research, a distinction
should be made between kinds and methods of
research. Method refers to the practical steps and
procedures employed in the research process. Three
methods of research, namely historical, descriptive, and
experimental methods will be elaborated in later
chapters. Kind, on the other hand, refers to the nature of
research and has nothing to do with the procedures
employed.
So far as kind is concerned, research is either
exploratory or confirmatory. Exploratory research is
the kind of research that is done for the first time to
explore a mystery. We owe many of the discoveries
mankind has made to this kind of research.
Confirmatory research, on the other hand, is the kind of
research that is conducted after the exploratory research
to see whether or not the results obtained via
exploratory research can be confirmed. Simply put, it is
the exact or partial replication of previous research in
order to either consolidate or refute its findings.
Each of these two kinds can be either pure or
applied. Pure research is done to satisfy human
curiosity. It may have no real application in the real
world. It is the kind of research done for the sake of
research itself. In applied research, however, the
finding of research is used for some practical purpose.
Applied researchers apply the findings of pure
researchers to the real world. Pure researchers are
simply interested in discovering the laws of, say,
physics while applied researchers try to utilize the
discovered laws in car-manufacturing companies to
produce cars with certain characteristics including
weight, speed, aerodynamic structure, etc.
In a nutshell, since the two classifications of
research kinds are not mutually exclusive, one can
imagine the following four kinds of research:
                Pure    Applied
Exploratory      1        3
Confirmatory     2        4
Chapter 2
The Concept of Variable
Chapter 2: The concept of variable ………………………………………..
What is a variable?
Research is the study of the relationship between
two or more variables. But what is a variable? As the
name suggests, a variable is anything that can
vary. It is an attribute or a characteristic that changes
from person to person, object to object, time to time,
situation to situation, etc. Weight is a variable because
it differs from one person or object to another.
Temperature is also a variable since it varies from place
to place and from time to time.
Kinds of variables
Some variables such as weight, age, length, and
temperature can be directly observed or measured. Such
variables are called concrete variables. Using a
thermometer, for instance, one can easily measure
temperature. Some other variables including
intelligence, anxiety, language proficiency, etc. are not
directly observable. Of course, they do have some
manifestations and can be indirectly estimated or
measured through their manifestations. Nevertheless,
they cannot be directly observed and measured. They
are called abstract variables.
Variables like gender, left-handedness, and marital
status are discrete variables. Their nature is of the all-or-nothing type. That is to say, they either exist or do not.
For example, every normal person is either male or
female, either left-handed or right-handed; there is no
third possibility. On the other hand, variables like
intelligence, temperature, age, etc., which can represent
a range of possibilities and form a continuum where
every person or object may have a different degree of
the attribute, are continuous variables.
Thus, concrete variables are those that can be
directly observed and measured while abstract variables
cannot be directly observed and measured. In addition,
when a variable is of all-or-nothing (either – or) nature,
it is discrete, but when there are different degrees of the
attribute, the variable is continuous.
Again it must be remembered that the
distinctions made between concrete versus abstract and
discrete versus continuous are not mutually exclusive.
This means that both concrete and abstract variables
can be either discrete or continuous and vice versa.
Hence, there can be four different kinds of variable.
            discrete    continuous
concrete       1            3
abstract       2            4
Functions of variables
Earlier it was mentioned that research is the
study of the relationship between two or more
variables. Depending on the kind of research question
and the kind of relationship between variables,
variables will have one of the following functions:
independent variable
dependent variable
moderator variable
control variable
intervening variable
An independent variable is a variable that is
selected and manipulated by the researcher to study its
effect on the dependent variable. It stands by itself and
does not depend on any other variable, and is selected
on the basis of the sheer interest of researchers. A
dependent variable is one that is carefully observed
and measured to determine the extent of the
effectiveness of the treatment (the independent
variable).
An example may help clarify the point.
Suppose a researcher wants to study the effect of age on
language learning. To conduct such a study, first of all,
the researcher should operationally define the
variables, i.e., s/he must define variables in a way that
can be objectively measured and quantified. In this
case, the researcher should clarify what s/he exactly
means by age; what range of age is considered young,
middle-aged, and old. Language learning needs to be
defined in a similar way. Suppose that language
learning is defined as the subjects’ performance (score)
on a language proficiency test.
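An operational definition of this kind can be written down as explicit rules. The following Python sketch is only illustrative: the cut-off ages (25 and 50) and the function names are assumptions made here, since the text does not fix any particular values.

```python
# Hypothetical cut-off ages; the text leaves these to the researcher.
YOUNG_MAX = 25
MIDDLE_MAX = 50


def age_group(age_in_years):
    """Operational definition of 'age' as one of three groups."""
    if age_in_years <= YOUNG_MAX:
        return "young"
    if age_in_years <= MIDDLE_MAX:
        return "middle-aged"
    return "old"


def language_learning(test_score):
    """Operational definition of 'language learning':
    the subject's score on a language proficiency test."""
    return test_score


# Two made-up subjects: (age in years, proficiency test score).
subjects = [(19, 16), (63, 14)]
for age, score in subjects:
    print(age_group(age), language_learning(score))
```

Once the definitions are stated this explicitly, any other researcher can classify and measure the same subjects in exactly the same way.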
In the above-mentioned example, the choice of
‘age’ as a factor influencing language learning is totally
up to the researcher. There are many other factors that
influence language learning. This researcher has just
chosen ‘age’. So, the variable ‘age’, which has been
selected by the researcher and does not depend on any
other variable, is the independent variable. On the other
hand, the researcher has no choice over the degree of
language learning by the different age groups. S/he
cannot choose which age group should learn how much
language. The degree of language learning will be
influenced and determined by the independent variable
(age). Thus, language learning, defined as the subjects’
score on a language proficiency test, is the dependent
variable.
To conduct such a study, after randomly
selecting the subjects and assigning them to the young
and old groups, a treatment (certain amount of
instruction) will be given to both groups under similar
conditions. Then a language proficiency test will be
administered to both groups. Comparing the scores of
the two groups on the test, the researcher draws
conclusions about the effect of age on language
learning. Assuming that the mean (average score) of the
young group is 16 and that of the old group is 14, and
assuming further that there are ten subjects (male and
female) in each group, the researcher may conclude that
young age positively influences language learning.
Now suppose that while analyzing the data, the
researcher notices that the ‘age’ of the subjects has
affected the performance of male and female subjects
differently as shown below:
Age            Young             Old
Sex            Male    Female    Male    Female
Mean           14      18        14.4    13.5
Grand mean         16                14
Although the grand average of the young group
is better than that of the old group, a closer look
suggests that apparently the young age of the male
subjects does not positively influence their language
learning. A comparison of the mean scores of the young
and old male subjects shows that the young subjects
have even a slightly lower score than their old
counterparts. The reason for the original conclusion,
that young age influences language learning positively,
seems to be the sharp difference in the mean scores of
the young and old female subjects. The researcher now
moderates the original strong claim and states the more
moderate conclusion that young age positively affects
the language learning of females. In this example, the
variable ‘gender’, which influences the outcome of the
research and moderates its finding, is called a
moderator variable.
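The reasoning behind the moderated conclusion can be reproduced from the four cell means in the table above. The sketch below is a simple illustration, not a statistical test; the function name `age_effect` is an assumption introduced here.

```python
# Cell means from the table above: (age group, sex) -> mean score.
cell_means = {
    ("young", "male"): 14.0,
    ("young", "female"): 18.0,
    ("old", "male"): 14.4,
    ("old", "female"): 13.5,
}


def age_effect(sex):
    """Young mean minus old mean for one sex."""
    return cell_means[("young", sex)] - cell_means[("old", sex)]


for sex in ("male", "female"):
    print(sex, round(age_effect(sex), 1))

# If the effect of age differs noticeably between the sexes,
# sex is acting as a moderator variable.
moderation = round(age_effect("female") - age_effect("male"), 1)
print("difference between the two age effects:", moderation)
```

For the male subjects the difference is slightly negative, while for the female subjects it is large and positive, which is why the original claim has to be restricted to females.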
Assume further that someone who is critical of
the above research questions the outcome of the study
claiming that the difference in the scores of the young
and old females may not be necessarily caused by their
age. There may well be other factors at work. S/he then
argues that since the linguistic backgrounds (first
language) of the subjects were different, the difference
in their scores might actually have been caused by their
linguistic background rather than their age.
The criticism sounds quite relevant and justified. To avoid
facing such criticisms, the researcher needs to control
other variables that are not under investigation but may
influence the outcome of the study. Such factors should
be made identical in both groups so that they do not
influence the outcome of the research. Hence, variables
that are held constant to prevent their possible effect on
the outcome of research are called control variables.
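Holding a variable constant can be as simple as selecting subjects who share the same value on it. The following Python sketch assumes a made-up list of subject records; the field names, first languages, and scores are hypothetical and serve only to illustrate the idea.

```python
# Made-up subject records; only the variable names come from the text.
subjects = [
    {"age_group": "young", "first_language": "Persian", "score": 17},
    {"age_group": "young", "first_language": "Turkish", "score": 15},
    {"age_group": "old", "first_language": "Persian", "score": 14},
    {"age_group": "old", "first_language": "Arabic", "score": 13},
]

# Hold linguistic background constant by keeping a single first language.
CONTROLLED_L1 = "Persian"
controlled = [s for s in subjects if s["first_language"] == CONTROLLED_L1]

# After filtering, first language no longer differs between the age groups,
# so it cannot explain any difference in their scores.
for s in controlled:
    print(s["age_group"], s["first_language"], s["score"])
```

The price of this kind of control is a smaller and narrower sample, so the findings apply only to subjects with that linguistic background.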
It should be pointed out that not all of the
variables influencing the outcome of research can be
controlled. In the above example, for instance, the
researcher can control variables like linguistic
background by selecting both young and old subjects
from a single linguistic background. Nonetheless, there
are variables such as the mood of the subjects, stress,
fatigue, etc. that influence the outcome but cannot be
controlled. Such variables are intervening variables.
There is a second understanding of intervening
variables. In our example, it was mentioned that on the
basis of the scores of the subjects in the two groups
(young and old), the researcher drew the conclusion
that young age influences language learning. But you
remember that language learning is an abstract variable
and cannot be directly observed and measured. What
the researcher observed and measured was the scores of
the subjects, not their language learning. Yet, in drawing
such a conclusion, the assumption is that language
learning stands between age and the scores of the
subjects (the independent and the dependent variables).
This means that the age of the subjects influences their
language learning, which in turn, influences their
scores. Such a variable is an intervening variable.
Variable scales
Different variables require various scales of
measurement. There are four kinds of variable scales:
1. nominal scale
2. ordinal (rank order) scale
3. interval scale
4. ratio scale
1. Nominal scale
The nominal scale is a scale of measurement in
which numbers are used only to name or to code data,
and do not have any mathematical value. Earlier it was
explained that some variables are discrete (of all-or-nothing type), like nationality, gender, etc. If the
research question includes a discrete variable, for
instance if it is about the effect of sex on language
learning, obviously two groups of subjects will be
needed, male and female. ‘Gender’ is a discrete
variable. Everyone is either male or female. So the
male subjects will have the feature [+male] or just +
and the females will have the feature [-male] or
simply -. Instead of using + and -, one can also use
numbers. The male subjects can be assigned to group 1
and the female subjects to group 2 or vice versa. The
numbers used here have no mathematical value, that is,
no group is mathematically superior to the other; the
numbers are just names for the two groups.
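That nominal codes are mere labels can be checked by swapping them and observing that nothing of substance changes. The sample below is made up, and the variable names are assumptions introduced for illustration.

```python
from collections import Counter

# Arbitrary numeric codes for a nominal variable, as described above.
codes = {"male": 1, "female": 2}

# Made-up sample of subjects.
sample = ["female", "male", "female", "female", "male"]
coded = [codes[s] for s in sample]

# Counting group membership is meaningful on a nominal scale.
counts = Counter(coded)
print(counts[1], counts[2])

# Swapping the codes changes nothing of substance: the number of subjects
# per group stays the same, which is why arithmetic on the codes
# (e.g. an average of 1s and 2s) carries no information.
swapped = {"male": 2, "female": 1}
swapped_counts = Counter(swapped[s] for s in sample)
print(swapped_counts[2], swapped_counts[1])
```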
2. Ordinal scale
Unlike the nominal scale, in the ordinal scale numbers
have mathematical value. We know that abstract
variables like intelligence and anxiety cannot be
directly observed and measured. We can never see or
observe someone’s intelligence or anxiety. What we
can do in measuring such variables is to compare the
subjects and rank them according to the relative
degree of the attribute they have. We may decide, for
example, that subject A is the most intelligent, B the
second most intelligent, C the third, and so forth. Here,
too, numbers can be used to name the subjects:
Subject    degree of attribute            rank
A          the most intelligent           1
B          the second most intelligent    2
C          the third most intelligent     3
This time, however, numbers are mathematically
significant and cannot be used interchangeably.
Moreover, although the intervals (differences between
the ranks) look equal numerically, they are not
actually equal. That is, the difference between the first
and the second subjects in the rank is not necessarily the same as
the difference between the second and the third.
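The point about unequal intervals can be illustrated with a short Python sketch. The numeric estimates below are invented purely for illustration (the text itself stresses that intelligence cannot be observed directly); only the order of the subjects A, B, and C comes from the table above.

```python
# Made-up underlying estimates; on an ordinal scale only their order is used.
estimates = {"A": 130, "B": 128, "C": 95}

# Rank the subjects from highest to lowest estimate (rank 1 = highest).
ordered = sorted(estimates, key=estimates.get, reverse=True)
ranks = {subject: position for position, subject in enumerate(ordered, start=1)}
print(ranks)

# The ranks are exactly one step apart, but the underlying gaps are not equal.
gap_1_2 = estimates["A"] - estimates["B"]
gap_2_3 = estimates["B"] - estimates["C"]
print(gap_1_2, gap_2_3)
```

Ranks 1, 2, and 3 would look the same whether the subjects were nearly tied or far apart, which is exactly the information an ordinal scale discards.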
3. Interval scale
The interval scale is similar to the ordinal scale. The
only difference is that in the ordinal scale intervals are not
equal while in the interval scale they are. This is the scale
that is most commonly used in educational settings.
The scale used to measure students’ achievement in
most classes has 21 levels ranging from 0 to 20.
4. Ratio scale
It was pointed out that to measure language
learning, we use an interval scale in which scores can
range from 0 to 20. On an interval scale, zero is an
arbitrary point: a score of zero on a test does not
necessarily mean that the attribute is completely absent.
Temperature in degrees Celsius behaves the same way;
zero does not mean that there is no temperature. If it
was -5° C yesterday and 0° C today, this means that
today it is 5° C warmer than yesterday. Zero here is
simply a level which stands between –1 and +1. The
ratio scale is much like the interval scale, the only
difference being that the ratio scale has a true zero, i.e.,
a point at which the attribute does not exist. Weight and
length are measured on ratio scales: zero kilograms
means no weight at all, and so 20 kilograms is twice as
heavy as 10 kilograms.
It goes without saying that the ratio scale is at
the same time an interval scale because it has all the
characteristics of the interval scale plus an additional
feature (a true zero). By the same token, an interval
scale is also ordinal and an ordinal scale is nominal, but
not vice versa. Thus, each of the four measurement
scales described before is convertible to the previous
scale but not to the following one.
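The one-way convertibility of the scales can be sketched in Python. The names, scores, and pass mark below are all hypothetical; only the 0 to 20 scoring range comes from the text.

```python
# Made-up interval-scale scores on the 0-20 scale mentioned in the text.
scores = {"Ali": 18, "Sara": 12, "Reza": 9}

# Interval -> ordinal: keep only the order of the scores.
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: rank for rank, name in enumerate(ordered, start=1)}

# Interval -> nominal: keep only category membership.
PASS_MARK = 10  # hypothetical cut-off, not given in the text
labels = {name: ("pass" if score >= PASS_MARK else "fail")
          for name, score in scores.items()}

print(ranks)
print(labels)

# The reverse is impossible: the label "pass" alone cannot tell us
# whether the original score was 12 or 18.
```

Each conversion throws information away, which is why a scale can be converted to a previous one but never rebuilt from it.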
Part II
How to Conduct Research
Introduction
In Part I, research was defined, and sources of
information as well as the goals, characteristics, and kinds of
research were briefly explained. Also, the concept of
variable as well as kinds, functions, and measurement
scales of variables were clarified. From this point on,
the focus of attention will be on how to conduct
research. Remember that one of the characteristics of
research is that it is systematic, i.e., there are certain
steps that should be followed in conducting research. In
Part II, these steps will be dealt with. The steps that
should be followed in conducting research include:
1. forming a research question
2. formulating a research hypothesis
3. reviewing the relevant literature
4. selecting a research method
5. collecting, summarizing, and analysing data
6. reporting research findings
Each of these steps will be discussed in detail in the
following chapters.
Chapter 3
Research Question, Research Hypothesis, and Literature Review
Chapter 3: Research question, research hypothesis and literature review …
Characteristics of research questions
Research is a systematic way of finding answers
to questions. This means that any research begins with
a question. However, not all questions are good
research questions. A good research question should
have a number of characteristics including the
following:
1. interest
2. relevance
3. manageability
1. Interest
The first, and probably the most important,
characteristic of a research question is that it should be
of interest to the researcher. Otherwise, the researcher
will not commit him/herself to the task, and will
conduct the research only to fulfil an assignment.
2. Relevance
A good research question is also relevant to the
needs of both the researcher and the community or
society of which s/he is a member. If an Iranian
researcher intends to carry out research, obviously the
question “In what order do Iranian high school students
learn the tenses of English?” will be much more
relevant than “In what order do South Korean
high school students learn the tenses of Chinese?” Due
to the limitations in financial resources and manpower,
priority should be given to a kind of research the
findings of which benefit the immediate community
where the researcher lives.
3. Manageability
The research question should be practically
feasible to investigate. No matter how interesting or
relevant a research question is, it will be no good unless
it is manageable, that is, not too broad or general to
be answered in a single study.
To be manageable, a research question should be
narrowed down; its scope should be limited so
that it becomes practically possible to investigate.
Types of research questions
There are three kinds of research questions:
1. descriptive questions
2. correlational questions
3. cause-effect questions
1. Descriptive questions
The purpose of a descriptive question is to
describe something. Such questions ask about the
duration, frequency, sequence of occurrence, etc. of an
event. As an example, the question “In what order do
Iranian children acquire the tenses of Persian?” is a
descriptive question, requiring an observation and
description of the sequence of tenses that appear in the
language production of Iranian children.
2. Correlational questions
Correlational questions seek to find out the
relationship between two factors (variables). A typical
way of asking a correlational question is “What is the
relationship between A and B?” For example: “What is
the relationship between students’ knowledge of
History and their knowledge of English?”
Remember, however, that when there is a
relationship between two variables, it doesn’t mean that
one variable causes or influences the other. In the above
example, students’ knowledge of history neither
facilitates nor hinders their ability to learn English. In
other words, knowledge of History does not create
knowledge of English or vice versa. Perhaps both
factors are influenced by another factor, say,
intelligence.
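As a minimal sketch of this point, the following Python code computes the Pearson correlation coefficient for three sets of invented scores. The figures are hypothetical and were chosen so that History and English scores both rise with the intelligence scores. The helper name `pearson_r` is an assumption of this sketch, not a term used elsewhere in the book.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

# Hypothetical scores for the same five students.
intelligence = [90, 100, 110, 120, 130]
history = [10, 12, 13, 15, 17]
english = [11, 12, 14, 15, 18]

r_history_english = pearson_r(history, english)
r_intelligence_history = pearson_r(intelligence, history)
r_intelligence_english = pearson_r(intelligence, english)
print(round(r_history_english, 2))
```

A coefficient close to 1 between History and English shows only that the two sets of scores rise together. In these invented data, both also correlate strongly with intelligence, which is one way such a relationship can arise without either subject influencing the other.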
3. Cause-effect questions
Cause-effect questions ask about the causal
relationship between two or more factors. They ask
about the effect of one factor on another. A typical
cause-effect question will ask “What is the effect of X
on Y?” For instance, “What is the effect of reading
practice on speaking ability?” is a cause-effect
question. Or in the example given above, intelligence
has a positive effect on the students’ learning of both
History and English. So, the question “What is the
effect of intelligence on learning English?” is a cause-effect question.
Research hypothesis
Having formed an appropriate research question,
a research hypothesis should be formulated. A
hypothesis is a tentative statement about the possible
outcome of research. It is a tentative answer to the
research question.
Hypotheses are of two kinds: non-directional
(null) hypotheses and directional (alternative)
hypotheses. When no relationship is predicted between
the independent and the dependent variables, the
hypothesis will be non-directional. A directional
hypothesis, on the other hand, predicts a relationship
(either positive or negative) between variables. Here is
the schematic representation of the classification of
research hypotheses:
Research hypothesis
    non-directional (null), H0
    directional (alternative), H1
        positive
        negative
For several reasons, it is suggested that
researchers formulate non-directional hypotheses
whenever possible. First of all, if the hypothesis is
directional, there may be ‘researcher bias’, i.e., the
researcher may unwittingly tend to support his/her own
claim. Second, there is the principle of verifiability. To
claim something and then present evidence to support
the claim is not always a scientific and logical way of
proving things. If something is to be taken as true, there
should be no evidence to the contrary. Suppose
someone claims that human beings have only one hand.
Suppose further that to support his claim, he names one
thousand people each of whom has only one hand.
Does this mean that his claim is proven? Not a bit. To
prove that human beings have only one hand, there
should be no man with two hands. In such cases, a
single piece of evidence to the contrary is sufficient to refute the
whole claim. Hence, in formulating hypotheses, it is
better to state a null hypothesis. If there is not enough
evidence to support the null hypothesis, this will
automatically be taken as evidence supporting the
directional hypothesis. Third, non-directional hypotheses
require a greater degree of accuracy than directional
hypotheses. This will be further discussed in later
chapters.
Based on what was stated, one might jump to the
conclusion that directional hypotheses should not be
used and then ask, “What is the use of directional
hypotheses?” Non-directional hypotheses should be
used whenever possible but not always. Directional
hypotheses have their own merits. Consider the
relationship between intelligence and language
learning. Nobody has ever claimed that there is no
relationship between intelligence and language
learning. Nor has anybody contended that there is a
negative relationship between the two variables. Every
normal person agrees that there is a positive
relationship between intelligence and language
learning. If somebody decides to conduct a study on
this question, the purpose will be just to determine the
extent of the positive relationship, not to prove the
existence of the relationship. In cases like this, it makes
no sense to formulate a null hypothesis.
Review of literature
Having formed an interesting, relevant, and
manageable research question, and having formulated an
appropriate research hypothesis, the researcher should
now comprehensively review all the previous
documents and research findings pertinent to the topic
under investigation. This process of documenting
related materials is called ‘review of the related
literature’. The word ‘literature’ refers to all the
previous information about a certain topic. Review of
literature is done for several reasons including the
following:
1. It helps the researcher to put the topic within a
scientific framework.
2. It helps the researcher to avoid mere duplication
of previous research.
3. It helps the researcher to avoid the inadequacies of
previous research.
Chapter 4
Methods of Educational
Research
Introduction
After a comprehensive review of literature and
documenting the materials, the next step for the
researcher is to choose an appropriate research method.
It was mentioned earlier that the term ‘method’ refers
to the practical steps and procedures used in research.
Depending on the nature of the topic, the researcher
may select one of the following research methods:
1. Historical method
2. Descriptive method
3. Experimental method
1. Historical Method
Historical method of research is concerned with
a systematic way of obtaining, processing, and
evaluating data about past events. A number of points
need to be clarified about historical method. First of all,
historical method of research should not be confused
with literature review. In literature review, the
researcher collects already existing information about a
topic in order to put the topic within a scientific
perspective and to consolidate and support a theoretical
position. Historical research, on the other hand, is a
process in which the researcher forms questions,
formulates hypotheses, tests those hypotheses, and
supports or rejects them, much like other research
methods.
Second, since historical research deals with
past events, there is no control over the variables. This
may lead some to believe that historical method is not
scientific since too many factors may have influenced
an event, all of which were out of the researcher’s
control. It needs to be noted, however, that the
historical method, like other methods of research, is a
scientific and systematic process in which the
researcher follows certain steps including these:
1. forming research questions
2. formulating research hypotheses
3. collecting data
4. criticizing the data
5. interpreting the findings
The third point to be mentioned about historical
research is that since events are not directly observed
by researchers, the relevant pieces of information are
obtained from some sources. These sources include:
1. Official records including laws, reports,
information compiled by universities and other
government agencies
2. Non-official records including personal records
(like letters, wills, diaries), oral traditions (like
folk tales), artistic remains (like paintings,
movies), published materials (like books), and
mechanical records (such as photographs).
3. Physical remains such as buildings, manuscripts,
etc.
The sources of information in historical research
are generally divided into two types: primary and
secondary. Primary sources of information include all
the information and documents left behind by actual
participants or witnesses of an event. Secondary
sources of information, on the other hand, include
documents and information provided by those who
were not present at the scene but somehow obtained the
information from other sources, especially primary
sources.
Needless to say, researchers should make sure
that the above-mentioned sources are genuine and
truthful before using them as sources of information.
They need to scrutinize and criticize the sources of
information, especially the secondary sources. This
criticism can be of two kinds: external and internal.
External criticism deals with determining the
authenticity of the document. It asks: ‘Is the document
genuine and real or not?’ On the other hand, internal
criticism seeks to establish the truthfulness and
accuracy of the content of the document. It attempts to
answer the question, ‘Is the content of the document
true and accurate?’
2. Descriptive Method
Unlike historical method, in which past events
are studied, in descriptive method the goal is to
describe and interpret the present status of a
phenomenon. The descriptive method of research
encompasses three major kinds: survey,
interrelational, and developmental.
I. Survey
Through surveys, researchers gather data by
directly giving questions to the participants. That is,
using questionnaires, interviews, observations, etc. they
obtain the data directly from the respondents.
II. Interrelational methods
The purpose of interrelational methods, as the
name suggests, is to investigate the relationship between
and among various factors. There are four kinds of
interrelational studies:
a. case studies
b. field studies
c. correlational studies
d. causal-comparative studies
a. Case studies
In case studies, researchers make a thorough and
intensive investigation of a case. The case can be an
individual or a small social unit. Much like survey
methods, in case studies, data are collected on a social
unit. However, case studies are different from surveys
since in surveys, data are collected on a few factors
from a large number of social units, whereas in case
studies, many aspects (factors) of a single case
(individual or a small group) are investigated.
b. Field studies
A field study is a kind of investigation in which
the investigator directly observes a naturally occurring
event. That is to say, the researcher does not manipulate
or interfere in the event, but simply observes the event
as it occurs naturally. When the event to observe is a
short one, or when measuring the duration is important,
the researcher may decide to observe the event for its
entire duration. This kind of sampling is called
‘continuous time sampling’. At other times, when it is
neither possible nor important to observe the entire
duration of an event or behaviour, researchers may opt
for observing the event or behaviour only at the end of
specific time intervals. This kind of sampling is called
‘time point sampling’.
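The difference between the two kinds of sampling can be sketched in Python. The record below is hypothetical: each entry stands for one second of observation, with 1 meaning that the behaviour is occurring and 0 that it is not. The function names and the three-second interval are likewise assumptions made for illustration.

```python
# Hypothetical record: 1 = behaviour occurring during that second, 0 = not occurring.
timeline = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]

def continuous_time_sampling(timeline):
    """Observe the entire period and return the total duration of the behaviour in seconds."""
    return sum(timeline)

def time_point_sampling(timeline, interval):
    """Observe only at the end of each interval.

    Returns the proportion of observation points at which the behaviour was seen.
    """
    checks = timeline[interval - 1::interval]
    return sum(checks) / len(checks)

duration = continuous_time_sampling(timeline)
proportion = time_point_sampling(timeline, interval=3)
print(duration, proportion)
```

Continuous time sampling yields the exact duration, while time point sampling yields only an estimate of how often the behaviour occurs, at a fraction of the observation effort.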
c. Correlational studies
Correlation means ‘relationship between two or
more factors’. In correlational studies, therefore,
researchers seek to investigate the degree of
relationship between two or more variables. It was
pointed out earlier that in correlational studies, a typical
way of asking questions is, “What is the relationship
between A and B?” Also remember that correlation
refers only to the existence of a relationship, not to the
existence of a causal relationship.
d. Causal-comparative studies
Similar to correlational studies, causal-comparative studies are used to determine the extent of
the relationship between and among variables. One
difference between the two is that the latter is used to
investigate the cause-effect relationship between
factors, whereas the former is used just to establish the
existence of a relationship. Another difference is that
while correlational studies typically involve two or
more variables and one group, causal-comparative
studies usually involve two or more groups with only
one independent variable. Moreover, causal-comparative
studies are similar to experimental
research in that they both involve comparisons, and
both establish causal relationships between variables.
The difference is that in experimental research,
researchers deliberately manipulate variables to create a
research situation, while in causal-comparative studies,
researchers have no control over the variables. They
study events after they have happened to discover the
causal relationships between factors. All research
methods in which researchers have no control over the
variables and cannot manipulate them, and study them
after the events have occurred, are known as ex post facto.
III. Developmental studies
Developmental methods of research are those
that are used by researchers to study the trend of
development of a behaviour or an event over time.
Researchers use developmental research to investigate
the changes that take place in phenomena in the course
of their development over time. For instance,
investigating language acquisition in children requires
that researchers observe the linguistic behaviour of
children for a relatively long period of time. This way,
they can find out the characteristics of child language at
different stages of development.
Developmental studies can be conducted in two
ways: longitudinal and cross-sectional. In the
longitudinal method, the behaviour of one or a few
subjects is observed and studied for a long time;
whereas in cross-sectional method, a greater number of
subjects (at different stages of development with regard
to the variable under investigation) are studied for a
shorter period of time. In the case of the above-mentioned
example (child language development), the longitudinal
method requires a comprehensive study of a few
children for a long time. The cross-sectional method, on
the other hand, allows the researcher to select more
children in different age groups, and then study them
for a much shorter time to learn about the
characteristics of their language at different ages.
3. Experimental method
Historical research is concerned with the study
of past events and descriptive methods mainly deal with
the description and interpretation of the current status
of phenomena. In none of these methods do researchers
have any control over the variables. They cannot
manipulate factors. In contrast, experimental method
requires experimentation. Researchers deliberately
manipulate and control variables and create situations
solely for the sake of research. Due to the significance
of this method of research and its common use in
educational settings, a separate chapter is dedicated to
the discussion of the experimental method.
In a nutshell, methods of research can be
summarized and schematically represented as follows:
Methods of research
    Historical
    Descriptive
        Survey
        Interrelational
            Case studies
            Field studies
            Correlational studies
            Causal-comparative studies
        Developmental
            Longitudinal
            Cross-sectional
    Experimental
Chapter 5
Experimental Methods of
Research
Introduction
It was pointed out in Chapter 4 that historical
research studies past events, and descriptive
methods tend to describe the present status of
phenomena and investigate the relationship existing
between variables. It was also reiterated that none of
these methods allow researchers to manipulate
variables so as to create conditions for studying the
cause-effect relationships between variables. The
experimental method of research allows researchers to
manipulate factors and exercise their control over
variables and the conditions under which they interact,
thus enabling researchers to make sound conclusions.
Meanwhile, due to certain rigid principles that
researchers follow, the experimental method of
research enjoys high validity, a concept to be discussed
later in this chapter.
It needs to be clarified that there are different
varieties of the experimental method. The advantages
cited above pertain to the true experimental method.
Other varieties of experimental method include pre-experimental and quasi-experimental methods.
It should also be added that the reason why pre-experimental and quasi-experimental methods are
considered experimental is that they involve
experimentation. Nevertheless, it is obvious that they
do not enjoy the same merits as the true experimental
method. A look at each of these varieties will tell us the
reason.
Pre-experimental method
To facilitate the understanding of the pre-experimental method, let us begin with an example.
Suppose someone has devised a new method of
language teaching. The founder of this new method
contends that his method, named method A, works
quite well, and is superior to the current methods of
language teaching. Obviously enough, to accept such a
big claim we need evidence, and the person says he has
just what we need. He says he has practiced his method
in his class and all his students have got such good
grades that the average of the class is 19.
To summarize, to conduct research, this
researcher has selected a group of subjects, taught them
in his special method (this is called treatment and
represented as X), and has given them a test. Such a
design is called one-shot case study and can be
schematically represented as:
G    X    T    (one-shot case study)
Do you accept the researcher’s claim that his
method works better than the current methods and
should replace them? Of course not. Many factors other
than the method of teaching may be responsible for the
high average score of the subjects. For one thing, we
know nothing about the level of the students’ knowledge
before the treatment was given. No test was
administered to measure their entry behaviour. So, one
may logically argue that the subjects’ level of
proficiency could have been high even without
instruction.
Convinced of the cogency of the argument, the
researcher repeats his research. This time, before
introducing the treatment, he administers a test (called
pre-test). Then the treatment is given followed by
another test (post-test). This design is called the one-group pretest-posttest design and is represented as:
G    T1    X    T2
Imagine that the average score on the pre-test and the
post-test turned out to be 12 and 19, respectively. The
researcher now comes to claim more strongly that his
method caused the high score on the post-test since the
average score on the pre-test was not that high. Are you
convinced now that method A is more effective than
the current methods of language teaching? Not yet.
There are still many problems. The researcher
did not compare his method with any other. How can
he conclude that his method is more effective than
other methods without any comparison? Who knows
what results the subjects could achieve if they were
taught in any other method?
To obviate this problem and to be able to make
comparisons, the researcher conducts the same study
using two groups. One group of subjects receive the
experimental treatment (are instructed via method A),
the second group are taught in another method (receive
an irrelevant treatment). The second group, receiving
the irrelevant treatment, is called the control group, and the
irrelevant treatment given to the control group is
referred to as placebo and represented as O. After the
instructional period, a test is administered to both
groups to compare their performance. What happened
can be summarized schematically as:
G1    X    T
G2    O    T
This design is known as intact group design. If the
average score of the subjects in the experimental group
is 19 and that of the control group 12, the researcher
will joyfully shout, “Didn’t I tell you? No other method
can produce such good results as mine.” Well, indeed
he compared his method with another method, and
those instructed in his method achieved a level of
success far better than those in the other group. Now,
do you take this as evidence of the superiority of
method A over the other method? Only naïve people
do. Although there was comparison between the two
groups of subjects, was the comparison fair and just?
Were the members of the two groups at the same
proficiency level before experimentation began? We
know nothing about this. Problem again.
Adamant to win his case, the researcher
duplicates his study with one little difference. This
time, he administers a pre-test to both groups before
introducing the treatment and placebo. So, the design
will be something like this:
G1    T1    X    T2
G2    T1    O    T2
This design is referred to as the pretest-posttest control group design. Using this design, the researcher
can make sure that the experimental and control group
subjects were equal before the experiment. If the entry
behaviour (performance on the pre-test) of the two
groups was more or less similar but their terminal
behaviour (performance on the post-test) radically
different, the researcher would assert that the difference
is due to the effect of the treatment. Would you agree?
A careful person wouldn’t.
Despite the fact that the level of the subjects’
knowledge was gauged before and after the treatment
and a control group was also used, there is still reason
to doubt that the difference between the average scores
of the two groups is because of the effect of the
treatment. A critic may cogently argue that although the
proficiency level of the subjects was equal on the pre-test, there may have been other differences between the
members of the two groups. One such difference could
be in their intelligence. When assigning the subjects to
the experimental and control groups, the researcher
may have (intentionally or unwittingly) assigned the
more intelligent subjects to the experimental group and
the less intelligent ones to the control group. In this
case, it can be contended that the better performance of
the subjects in the experimental group is not necessarily
because of the effect of the treatment, and can be
attributed to their higher level of intelligence. This
problem of selection bias can be solved only if the
subjects are randomly selected and assigned to the
experimental and control groups.
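Random assignment of this kind can be sketched in a few lines of Python. The sketch covers only the assignment step, not the selection of subjects from a larger population. The pool of twenty numbered subjects, the function name, and the `seed` parameter are assumptions made for illustration; the seed is there only to make the example repeatable.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # work on a copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 subjects, identified by number.
pool = list(range(1, 21))
experimental_group, control_group = randomly_assign(pool, seed=42)
print(sorted(experimental_group))
print(sorted(control_group))
```

Because chance alone decides who goes where, the researcher has no opportunity, intentional or otherwise, to place the more intelligent subjects in one particular group.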
To put everything in a nutshell, the pre-experimental method of research includes the following
four designs:
1. one-shot case study
   G     X     T
2. one-group pretest-posttest design
   G     T1    X     T2
3. intact group design
   G1    X     T
   G2    O     T
4. pretest-posttest control group design
   G1    T1    X     T2
   G2    T1    O     T2
Each of these designs is considered a kind of
pre-experimental method because it lacks one or more
of the characteristics of the true experimental method.
In other words, when one (or more) of the requirements
of the true experimental method is either deliberately
ignored or cannot be met, the method will be pre-experimental. What are the characteristics of the true
experimental method? The following section will
answer this question.
True experimental method
True experimental method is the most strictly
controlled, the most systematic, hence the strongest
method of investigating phenomena. From the previous
discussions, it can be concluded that the true
experimental method has a number of requirements.
These requirements, which may also be considered the
principles of true experimental method, include the
following:
1. There should be both pre-test and post-test.
2. There should be an experimental group to
receive the treatment and a control group to
receive placebo.
3. Subjects should be randomly selected and
assigned to the experimental and control groups.
If all these requirements are met, research will be
truly experimental. If some of these requirements are
deliberately ignored, or if they cannot be met, the
research method will be pre-experimental. And if some
of the requirements of the true experimental research
cannot be met, but the researcher attempts to make up
for their lack, the method will be quasi-experimental.
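The comparison that a pre-test, a post-test, and a control group make possible can be sketched as follows. The scores are hypothetical, chosen so that the group averages echo the figures of 12 and 19 used in the earlier examples, and the function name `mean_gain` is an assumption of this sketch. It is a simplified illustration, not a substitute for the statistical tests discussed in later chapters.

```python
from statistics import mean

# Hypothetical pre-test and post-test scores for four subjects in each group.
experimental = {"pre": [11, 12, 13, 12], "post": [18, 19, 20, 19]}
control = {"pre": [12, 11, 13, 12], "post": [13, 12, 14, 13]}

def mean_gain(group):
    """Average post-test score minus average pre-test score."""
    return mean(group["post"]) - mean(group["pre"])

experimental_gain = mean_gain(experimental)
control_gain = mean_gain(control)
print(experimental_gain, control_gain)
```

Here both groups start from the same average, so the difference between the two gains is what the researcher would attribute to the treatment, provided that the subjects were also randomly assigned.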
Quasi-experimental method
Sometimes, one of the requirements of the true
experimental method is missing, but the researcher tries
to compensate for the lack of the missing requirement.
In such cases, the method is called quasi-experimental.
Suppose a researcher is interested in investigating the
effect of a specific method (X) on language learning.
For this purpose, he needs two groups of subjects, one
to receive the experimental treatment and the other a
placebo. Suppose further that the researcher has access
to only one group of subjects. There is no group
available to act as the control group. But the researcher
does not wish to carry out pre-experimental research,
so he thinks of a solution. He uses the same group of
subjects as both the experimental and control groups.
After a pre-test, he begins to introduce his treatment for
a week followed by a post-test. The following week, he
gives the pre-test again, but this time instead of giving
the treatment, he gives a placebo, and then the post-test.
During the third week, the treatment is given again
preceded by a pre-test and followed by a post-test. This
goes on in such a way that the treatment and placebo
are presented every other week. Finally, the researcher
compares the subjects’ attainment during the treatment
and placebo intervals. If the outcome is somehow
similar to the following graph, he can conclude (even in
the absence of a control group) that the treatment is
more effective than the placebo.
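The comparison described above can be sketched in Python. The scores are hypothetical; each entry records the condition given in one week (X for the treatment, O for the placebo) together with the group average on that week's pre-test and post-test. The names used are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical weekly results: (condition, pre-test average, post-test average).
weeks = [
    ("X", 5, 10),   # T1  X  T2
    ("O", 10, 11),  # T3  O  T4
    ("X", 11, 16),  # T5  X  T6
    ("O", 16, 17),  # T7  O  T8
]

def mean_gain(weeks, condition):
    """Average gain (post-test minus pre-test) over the weeks with the given condition."""
    gains = [post - pre for label, pre, post in weeks if label == condition]
    return mean(gains)

treatment_gain = mean_gain(weeks, "X")
placebo_gain = mean_gain(weeks, "O")
print(treatment_gain, placebo_gain)
```

If the gains in the treatment weeks are consistently larger than those in the placebo weeks, as in these invented figures, the same group has in effect served as its own control.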
Figure 5.1 An equivalent time series experiment showing the effect of the treatment
[Graph: vertical axis from 0 to 20; horizontal axis showing the sequence T1 X T2, T3 O T4, T5 X T6, T7 O T8]
This is called the equivalent time series method.
Another common variety of the quasi-experimental
method is known as the time series method. This method is
also used to make up for the lack of a control group. The
researcher administers several pre-tests to capture the
trend of natural growth when there is no treatment
(instruction). Then he introduces the treatment and,
finally, gives a number of post-tests to get an idea of
the pace of progress after the treatment. The
comparison of the trend of changes in scores before and
after the treatment with that of the treatment period will
indicate the extent of the effect of the treatment.
Schematically, time series is represented as:
T1  T2  T3  T4    X    T5  T6  T7  T8
Figure 5.2 A time series experiment showing the effect of the treatment
[Graph: vertical axis from 0 to 18; horizontal axis showing the sequence T1 T2 T3 T4, X, T5 T6 T7 T8]
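A minimal Python sketch of the time series comparison follows. The eight average scores are hypothetical, and the function name `mean_step` is an assumption of this sketch. The idea is to compare the change across the treatment period (from T4 to T5) with the natural rate of change between consecutive tests before and after it.

```python
from statistics import mean

# Hypothetical group averages on the eight tests T1..T8 (treatment given between T4 and T5).
scores = [4, 5, 6, 7, 13, 14, 15, 16]
pre_tests, post_tests = scores[:4], scores[4:]

def mean_step(tests):
    """Average change in score from one test to the next."""
    steps = [later - earlier for earlier, later in zip(tests, tests[1:])]
    return mean(steps)

trend_before = mean_step(pre_tests)             # natural growth without treatment
trend_after = mean_step(post_tests)             # pace of progress after the treatment
treatment_jump = post_tests[0] - pre_tests[-1]  # change across the treatment period
print(trend_before, treatment_jump, trend_after)
```

In these invented figures the scores rise by one point between tests both before and after the treatment but by six points across it, which is the kind of pattern that would point to an effect of the treatment.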
Validity in research
Thanks to its characteristics, the true
experimental method produces valid results. Validity in
research can be classified into two kinds: internal and
external. Internal validity refers to the extent to which
the findings of research are the result of the effect of the
treatment, not of any other factor. In other words, internal
validity is the degree to which the outcome of research
is due to the manipulations imposed by the researcher,
not to any other factor. Thus, if the effect of, say, ‘age’ on
language learning is being investigated, the
experimental and control group subjects should be
equal with regard to other factors that might influence
the outcome, such as ‘sex’, ‘linguistic background’,
‘intelligence’, etc. In simple terms, research will be
internally valid if all irrelevant factors are controlled
and the subjects of both groups are in equal conditions
except for the variable under investigation. External
validity refers to the generalizability of research
findings. It refers to the extent to which the outcome of
research can be applied to other similar situations.
Chapter 6
Collecting and Summarising
Data
Introduction
In the previous chapters, it was explained that
research begins with research questions, based on
which hypotheses are formulated. Then comes the
literature review, which is a comprehensive review of
all the previous research done in the area under
investigation. Next, depending on the kind of research
question and the nature of the relationship between
variables, an appropriate method should be selected.
Having done all these, researchers should now begin to
collect data.
Data collection
To test their hypotheses, researchers need to
collect data, and to collect data they need to use some
techniques and instruments. The following instruments
may be used for data collection.
1. Questionnaires
Questionnaires include a set of questions to be
answered by the subjects. There are two kinds of
questionnaires: open-ended form and closed form.
Open-ended questionnaires contain a number of
questions which are answered by respondents based on
their own feeling. In closed form questionnaires, on
the other hand, respondents choose the answer from
among a certain number of given choices.
There are advantages and disadvantages in both
kinds. The major advantage of the closed form
questionnaires is that the choices are uniform; hence the
responses are comparable. The problem is that none of
the given choices may actually reflect the true feeling
of the respondents. That is to say, when choices are
provided, the respondents are deprived of their freedom
to respond as they wish. In addition, the researcher’s
likes and dislikes may influence the kind of choices
provided. It is obvious that the advantages of the closed
form questionnaires constitute the disadvantages of the
open-ended questionnaires and vice versa.
Questionnaires can be distributed either directly or
indirectly, i.e., by post.
2. Observation
Another way of collecting data is for the
researcher to observe a phenomenon, behaviour, or an
event as it happens. Observation may be direct or
indirect. In direct observation, the researcher uses a
carefully prepared checklist to record data. The
advantage of this kind of observation is that it helps
researchers obtain objective data. The disadvantage
(especially when human beings are observed) is that the
obtained data may not be natural. In indirect
observation, the researcher observes an event or
behaviour without letting those involved know that they
are being observed. Therefore, the data obtained this
way are quite natural. However, indirect observation is
less systematic and objective than direct observation.
The use of indirect observation also needs to be
ethically and morally justified.
3. Interview
Still another way of collecting data is to conduct
interviews. Interviews can be conducted in two ways:
structured and unstructured. In structured interviews,
the interviewer prepares a list of questions in advance.
These questions are then posed one by one, and the
responses to each question do not influence the choice
of the following questions. This kind of interview
provides quantifiable, uniform, and comparable data.
The problem is that the answers given to earlier
questions sometimes make some questions redundant.
For instance, if the question “Are you married?” is
followed by the question “How many children do you
have?”, the second question will be totally irrelevant if
the answer to the first question is “No”. This may make
the interview somewhat unnatural.
Unstructured interview is more flexible, hence
more natural than the structured interview. Here,
questions are not prepared in advance. Rather, as the
interview goes on, relevant questions are posed in a
way that the response given to one question leads to
another question, and the answer to the second question
paves the way for a third, and so forth. The advantage
of unstructured interview is that it is more flexible and
more natural. The major disadvantage is that the
obtained data are not uniform and comparable since
every interviewee might be asked a different set of
questions.
4. Inventories
Inventories contain a number of statements to
which the respondents respond by expressing the
degree of their agreement or disagreement. Inventories
are usually used to describe the attributes and feelings
of the respondents. At the end of each semester,
students may be asked to express their feelings towards
certain aspects of their courses including the timing,
quality of instruction, content, etc. This is an example
of an inventory; statements are presented and students
express the degree of their satisfaction with each aspect
of the course by choosing one of the given alternatives
including, for example, excellent, very good, good, average,
and poor. Then, each of these alternatives is weighted,
that is, assigned a point value.

excellent    very good    good    average    poor
5            4            3       2          1

Adding the point values and computing their average
determines the attitude of the students to each item.
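The scoring of an inventory item can be sketched in a few lines of Python. This is only an illustration: the name `item_average` and the six responses are invented, and the point values simply follow the weighting shown above.

```python
# Point values for the five alternatives, following the weighting above.
WEIGHTS = {"excellent": 5, "very good": 4, "good": 3, "average": 2, "poor": 1}

def item_average(responses):
    """Return the average point value of the responses given to one item."""
    points = [WEIGHTS[r] for r in responses]
    return sum(points) / len(points)

# Invented responses of six students to one item (e.g. quality of instruction).
responses = ["excellent", "good", "very good", "good", "average", "excellent"]
print(round(item_average(responses), 2))
```

A higher average suggests a more favourable attitude towards that aspect of the course.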
5. Tests
Tests are among the most commonly used
instruments of data collection. Obviously enough, tests
should have certain characteristics if they are to be used
as data collection instruments. Among those
characteristics are validity and reliability. Validity
refers to the extent to which a test measures what it is
supposed to measure. So a test of mathematics is not
valid for gauging the knowledge of English. Reliability
has to do with the consistency of scores produced by
the test. It refers to the extent to which scores are
consistent over repeated administrations of the test.
6. Projective measures
There are times when, for one reason or another,
subjects consciously avoid providing the researcher
with honest responses. In such cases, projective
measures might be used. Projective measures are
measures taken by researchers to get information
indirectly out of the subjects without letting them know
what they are actually doing. For instance, ambiguous
questions are posed so that subjects do not know the
purpose of the question. This helps reduce conscious
dishonesty. Provided that the use of projective
measures is ethically justified, they are very useful for
obtaining the true feeling of subjects towards
something.
Summarising the data
The first thing to do after data are collected is to
summarise them. A pile of sheets with a score on each
is not easily interpretable, hence not very useful. To
summarise data, researchers tabulate the data, that is,
put the data in a table. Of course, tabulation of data
requires that data be coded. In previous chapters, we
discussed the measurement scales such as nominal,
ordinal, interval, and ratio. Depending on the kind of
scale employed, there will be various kinds of data
including nominal data, ordinal data, etc.
If, for example, tests are used as the data
collection instrument, the researcher can summarize the
data by drawing a table with some columns, using
numbers to represent subjects (nominal data) in the first
column and listing their scores in the second column.
This is much more easily understandable and
comparable than having many sheets with only one
score on each. To make comparison easier, we can rank
the scores from the highest to the lowest or vice versa.
Sometimes, more than one subject may obtain a certain
score. In such cases, especially when the number of
subjects is large, we can summarise the data further by
writing each score only once and indicating – in a
separate column – its frequency, that is, the number of
times that score has appeared in the distribution.
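As a minimal sketch of this tabulation, assuming the raw scores are available as a simple list, the following Python code writes each score once and counts its frequency. The scores and the function name `frequency_table` are invented for illustration.

```python
from collections import Counter

def frequency_table(scores):
    """Return (score, frequency) pairs ranked from the highest score down."""
    counts = Counter(scores)
    return sorted(counts.items(), reverse=True)

scores = [15, 14, 15, 12, 14, 14, 20, 12, 15, 14]  # invented raw scores
for score, f in frequency_table(scores):
    print(score, f)
```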
Basic computations
There are some basic computations that can
make data more readily understandable. These
computations will also help us learn concepts that will
prove quite helpful in the later steps of conducting
research. To begin these computations, let us suppose
that the following interval data are obtained from a
group of 100 subjects and summarised as follows:
Table 6.1
Score    Frequency
20       3
19       5
17       6
16       10
15       25
14       30
12       11
11       5
10       3
8        2
Total    100
In the second column of table 6.1, the frequency of
each score is given. This kind of frequency is referred
to as simple frequency or absolute frequency. It
simply shows how many times a score has occurred in
the distribution.
Simple frequency may sometimes cause some
misunderstanding or misinterpretation. For instance,
suppose you are comparing two groups of subjects. In
group A, five students have obtained 19 while in group
B, fifteen students have got the same score. How would
you interpret this? A simple-minded person may
quickly conclude that group B is better. But if it turns
out later that group A has 15 members and group B
consists of 60 members, what then? The decision will
be reversed because in group A, five people out of
fifteen (that is, one third) have got 19; whereas, in
group B, 15 out of 60 (i.e., one fourth of the group)
have obtained the same score. So, simple frequency is
insufficient for interpretation. We need to compute
relative frequency. Relative frequency shows the
frequency of each score in relation to the total number
of scores. It indicates how many people have got a
score out of how many. To compute relative frequency
(rf), absolute frequency is divided by the total number
of scores.
$$rf = \frac{f}{N}$$
In table 6.1, we can see that three people out of 100
have got 20. So the relative frequency of score 20 is
.03. In order to avoid coming up with decimal fractions,
we multiply relative frequency by 100. The result is
called percentage (P) and shows how many people have
got a score in a scale of (out of) 100.
$$P = rf \times 100$$
$$P = \frac{f}{N} \times 100$$
The newly obtained pieces of information are
clearly more meaningful than the original raw data.
Table 6.2 contains a summary of the new data.
Table 6.2
Score    f     rf     P
20       3     .03    3
19       5     .05    5
17       6     .06    6
16       10    .10    10
15       25    .25    25
14       30    .30    30
12       11    .11    11
11       5     .05    5
10       3     .03    3
8        2     .02    2
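The rf and P columns can be reproduced with a short Python sketch. The dictionary holds the simple frequencies of Table 6.1; the names `TABLE_6_1` and `relative_frequencies` are chosen here only for illustration.

```python
# Simple frequencies from Table 6.1 (score -> f).
TABLE_6_1 = {20: 3, 19: 5, 17: 6, 16: 10, 15: 25, 14: 30, 12: 11, 11: 5, 10: 3, 8: 2}

def relative_frequencies(freqs):
    """Map each score to (f, rf, P), where rf = f / N and P = rf * 100."""
    n = sum(freqs.values())  # N, the total number of scores
    return {score: (f, f / n, f / n * 100) for score, f in freqs.items()}

for score, (f, rf, p) in relative_frequencies(TABLE_6_1).items():
    print(f"{score:>5} {f:>3} {rf:.2f} {p:.0f}")
```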
Despite their value, none of these pieces of
information shows the position of a score in the
distribution. To know about the position of a score
among other scores, we need to compute cumulative
frequency. Cumulative frequency (represented as F) is
an index indicating how many scores are at the same
level and below a given score. To compute cumulative
frequency, we start at the bottom of the table and add
simple frequencies successively from bottom to top of
the table. So, in table 6.2, the cumulative frequency of
each score will be computed this way:
We start at the bottom row of the table and ask,
“How many scores are equal to or below 8?” The
answer, according to the table, is 2. Now we move up
to the next row and ask the same question, “How many
scores are equal to or below 10?” The answer is 5 since
three scores are equal to 10 and two scores are below.
In fact, we obtained 5 by adding the simple frequencies
from the bottom upwards. The frequency of the lowest
score (8) is 2 and that of the second lowest score (10)
is 3; the sum of the two equals 5 (3 + 2 = 5).
Cumulative frequency has the advantage of
showing the rank of each score in the distribution. Yet,
like the simple frequency, it is subject to
misinterpretation. If you are comparing two scores in
two different distributions, and if the cumulative
frequency of one score is 10 (that is, the score is better
than or equal to 10 scores) and the cumulative
frequency of the other score is 20, which score has a
better standing? You might say, “Obviously, the score
with a cumulative frequency of 20.” But again, it all
depends on the total number of scores in the
distribution. Supposing that the score with the
cumulative frequency of 10 belongs to a distribution
containing 20 scores and the score with the cumulative
frequency of 20 is in a distribution with a total number
of 80 scores, it will not be hard to see that the former
score has a much better position than the latter. The
former is equal to or better than 10 and worse than 10
others; it is somewhere in the middle of the distribution.
The latter is better than or equal to 20 but worse than 60
scores; it is somewhere in the bottom quarter of the
distribution.
To overcome such misunderstandings, we need
to make cumulative frequency relative, just as we did
with simple frequency. We divide cumulative
frequency by the total number of scores and call the
result relative cumulative frequency.
$$rF = \frac{F}{N}$$
Since the result will be a decimal fraction, we
multiply relative cumulative frequency by 100, and the
outcome is called percentile.
Percentile = relative cumulative frequency (100)
Table 6.3
Score    f     rf     P     F      rF      Percentile
20       3     .03    3     100    1.00    100
19       5     .05    5     97     .97     97
17       6     .06    6     92     .92     92
16       10    .10    10    86     .86     86
15       25    .25    25    76     .76     76
14       30    .30    30    51     .51     51
12       11    .11    11    21     .21     21
11       5     .05    5     10     .10     10
10       3     .03    3     5      .05     5
8        2     .02    2     2      .02     2
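The bottom-to-top addition that produces F, rF, and percentile can be sketched as follows. The frequencies are those of Table 6.1; the function name `cumulative_table` is an assumption made for this illustration.

```python
from itertools import accumulate

# Simple frequencies from Table 6.1 (score -> f).
TABLE_6_1 = {20: 3, 19: 5, 17: 6, 16: 10, 15: 25, 14: 30, 12: 11, 11: 5, 10: 3, 8: 2}

def cumulative_table(freqs):
    """Return rows (score, f, F, rF, percentile), highest score first."""
    n = sum(freqs.values())
    ascending = sorted(freqs.items())               # lowest score first
    running = accumulate(f for _, f in ascending)   # add frequencies bottom-up
    rows = [(score, f, big_f, big_f / n, big_f / n * 100)
            for (score, f), big_f in zip(ascending, running)]
    return rows[::-1]                               # present highest score first

for score, f, big_f, rel_f, pct in cumulative_table(TABLE_6_1):
    print(f"{score:>5} {f:>3} {big_f:>4} {rel_f:.2f} {pct:.0f}")
```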
Chapter 7: Displaying and describing the data ….…………………………
Chapter 7
Displaying
and Describing the Data
Displaying the data
Another step to make data more conspicuous is
to display the data graphically. Different kinds of
graphs can be used to display data. When the number of
scores is limited, bar graphs can be used. In bar
graphs, bars are used to represent data. As an example,
the simple frequency of the scores in table 6.3 can be
graphically represented like this:
Figure 7.1
A bar graph
[Figure not reproduced. Horizontal axis: scores (8 to 20); vertical axis: frequencies (0 to 30).]
In the above figure, the horizontal axis shows the
scores, and the vertical axis represents the frequency of
the scores. A vertical bar represents the frequency of
each score. Instead of bars, it is also possible to mark
the top of each score (the intersection of each score and
its corresponding frequency), and to connect these
points together. The resultant graph is called a
frequency polygon (because it usually has several
angles).
Figure 7.2
A frequency polygon
[Figure not reproduced. Horizontal axis: scores (8 to 20); vertical axis: frequency (0 to 35).]
When the number of scores is large, the use of
too many bars may make the graph messy and a little
confusing. In such cases, instead of using bars to show
the frequency of every individual score, the scores are
divided into certain intervals, and the frequency of each
interval is shown by a column. For instance, if the
frequency of scores on a TOEFL test is to be
graphically presented, due to the wide range of possible
scores, a large number of bars are needed. Under such
circumstances, scores are divided into intervals, and
columns are used to show the frequency of each
interval. This is called a histogram.
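Grouping scores into intervals can be sketched like this. The scores and the interval width of 50 points are invented for illustration, and `interval_frequencies` is a hypothetical helper rather than a standard routine.

```python
from collections import Counter

def interval_frequencies(scores, width):
    """Count scores per interval [start, start + width); keys are interval starts."""
    counts = Counter((s // width) * width for s in scores)
    return dict(sorted(counts.items()))

# Invented test scores, grouped into 50-point intervals.
scores = [412, 447, 455, 463, 478, 501, 503, 519, 527, 560, 583, 610]
for start, f in interval_frequencies(scores, 50).items():
    print(f"{start}-{start + 49}: {f}")
```

Each pair of interval and frequency corresponds to one column of the histogram.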
Figure 7.3
A histogram
[Figure not reproduced. Horizontal axis: scores (in intervals); vertical axis: frequency.]
Since the frequency polygon is one of the most
widely used means of the graphic representation of data
and since frequent reference will be made to it in the
following chapters, a few points need to be explained
about it. First of all, as mentioned earlier, it is called a
polygon because it usually has several angles. This is
quite possible when data are obtained from a small
group of subjects. When data are obtained from a large
population, the frequency distribution will be
something like this:
Figure 7.4
A frequency distribution curve
[Figure not reproduced. Horizontal axis: scores; vertical axis: frequency.]
Second, the above figure is a typical distribution curve,
also referred to as a normal distribution curve, and will
occur when data are obtained from a large and normal
population. The peak (the highest point) of the curve
indicates the most frequent score. The score that has the
highest frequency in the distribution is called mode.
In a normal distribution, the mode is at the centre of the
distribution, and the distribution is symmetric, that is,
the two sides of the graph are identical in shape. On the
other hand, when data are obtained from a specific and
small group of subjects, the mode may not be at the
centre. For example, when most of the scores in a
distribution are high and only a few are low or vice
versa, the peak of the distribution may shift toward the
right or the left side of the distribution. At such times,
we say the distribution is skewed. If a majority of
scores are high and the mode is on the right side of the
graph, and a few low scores are the cause of the skewness,
the distribution curve is said to be negatively skewed.
Figure 7.5
A negatively skewed distribution
[Figure not reproduced. Horizontal axis: scores; vertical axis: frequency.]
Conversely, if most of the scores are low and a few
high scores cause the skewness, the distribution
will be positively skewed.
Figure 7.6
A positively skewed distribution
[Figure not reproduced. Horizontal axis: scores; vertical axis: frequency.]
Finally, it was said that a normal distribution curve has
a peak or most frequent score, which is called the
mode. In some less typical situations, there might be no
peak in the graph. This happens when all scores in the
distribution have the same frequency, and no score is
more frequent than others. Such a distribution is called
a flat distribution.
Figure 7.7
A flat distribution
[Figure not reproduced. Horizontal axis: scores; vertical axis: frequency.]
Describing the data
Apart from summarising and graphically
displaying data, a number of other steps should be
taken to make data more conducive to the conduct of
research. Altogether these steps are referred to as
describing the data. The purpose of these steps is to
describe some of the characteristics of distributions of
scores and to clarify certain concepts that are both
necessary and quite helpful to the conduct of research.
In any set of scores, there are minimum and
maximum scores, and the rest of the scores are between
the two extremes. Obviously enough, neither the
maximum nor the minimum score can be representative
of that distribution. If we are to use one score to
represent the distribution, a more likely candidate is a
score that is somewhere at the centre of the distribution.
At the same time, in any normal distribution,
there is a tendency towards the centre. In other words, a
majority of scores tend to be round the centre of the
distribution. This is also supported by logic and
intuition. Under normal circumstances how many
people, do you think, can get the perfect score in an
examination? Few. How many people get an extremely
low score? Again, just a few. Most people get scores
between the two extremes. This tendency of scores
towards the centre of the distribution is referred to as
central tendency.
Measures of central tendency
There are three measures of central tendency.
1. mode
2. median
3. mean
1. Mode
You are already familiar with mode. Mode
means fashion. So, mode is the score that is
fashionable, that is, the score that appears most
frequently in the distribution. In the following set of
scores, therefore, the mode is 5.
1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10
To find mode, all you have to do is to look at the simple
frequency of scores. The score with the highest
frequency is the mode. Sometimes, two adjacent scores
may have the same frequency as in the following set of
scores:
1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9
In the above distribution, both 5 and 6 are the most
frequent scores. In such cases, the average of the two
most frequent scores will be the mode. Thus, in the
above distribution, the mode is 5.5. There are also times
when there are two most frequent scores in the
distribution that are not adjacent, as in:
1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8
In this distribution, 3 and 6 have the highest frequency,
but they are not adjacent. When this happens, the
distribution is said to be bimodal (having two modes).
Figure 7.8
A bimodal distribution
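The three cases described above can be checked with a small Python sketch. `multimode` comes from Python's standard `statistics` module; `modes` and `chapter_mode` are hypothetical helpers written here only to mirror the convention used in this chapter (averaging two adjacent modes), which other texts may not follow.

```python
from statistics import multimode

def modes(scores):
    """Return every score that shares the highest frequency, in ascending order."""
    return sorted(multimode(scores))

def chapter_mode(scores):
    """Apply this chapter's convention: average two adjacent modes, else list them."""
    top = modes(scores)
    distinct = sorted(set(scores))
    if len(top) == 2 and distinct.index(top[1]) - distinct.index(top[0]) == 1:
        return [sum(top) / 2]
    return top

print(chapter_mode([1, 2, 3, 3, 4, 5, 5, 5, 6, 6, 7, 8, 9, 9, 10]))  # one mode
print(chapter_mode([1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9]))              # adjacent modes
print(chapter_mode([1, 2, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 8]))     # bimodal
```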
2. Median
Median is the score at the 50th percentile. It is
the score right in the middle of the distribution,
dividing the rank of scores into two equal halves so that
50 percent of the scores are above and 50 percent are
below the median. When the number of scores in the
distribution is odd, the middle score is the median as in
the following distribution in which the median is 5:
2, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9
When the number of scores in the distribution is
even, median is the average of the two scores at the
centre of the distribution. In the following set of scores,
the median is 5.5 (the average of 5 and 6).
1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9
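The odd and even cases can be sketched as follows, using the two sets of scores given above. The function name `median_score` is chosen for illustration; Python's standard `statistics.median` gives the same results.

```python
def median_score(scores):
    """Return the middle score, or the average of the two middle scores."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                              # odd number of scores
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2  # even number of scores

print(median_score([2, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9]))
print(median_score([1, 2, 2, 3, 4, 5, 6, 7, 7, 8, 8, 9]))
```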
3. Mean
Mean is the mathematical average of the scores.
To obtain mean, all the scores in the distribution are
added up and the result is divided by the total number
of scores.
$$\bar{X} = \frac{\sum X}{N}$$
In this formula, the symbol Σ (sigma) means ‘sum of’,
X is used to represent ‘scores’, and N is the number of
the scores in the distribution. When there are a large
number of scores, or when there are no extreme scores
(scores that are radically distant from the rest of the
scores) in the distribution, mean is the most reliable
measure of central tendency because both mode and
median are subject to rapid fluctuations.
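A brief sketch may show why extreme scores matter for the mean. The two sets of scores are invented; they differ only in one extreme score, and `mean_score` is a name chosen for this illustration.

```python
from statistics import median

def mean_score(scores):
    """Sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

typical = [12, 13, 14, 15, 16]        # invented scores, no extreme score
with_extreme = [12, 13, 14, 15, 96]   # the same set with one extreme score

print(mean_score(typical), median(typical))
print(mean_score(with_extreme), median(with_extreme))
```

In this invented case the single extreme score pulls the mean far upward while the median stays where it was.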
Measures of variability
It was pointed out that there is a tendency
towards the centre in any normal distribution. At the
same time, in every distribution, there is variability or
differences among scores. Variability is part and
parcel of mankind and of the nature around us. So, it is
quite natural to have differences between and among
scores in a distribution. There are three measures of
variability that show how scattered scores are. These
measures include:
1. range
2. standard deviation
3. variance
1. Range
Range refers to the difference between the
highest and the lowest score in a distribution. To obtain
range, we simply subtract the minimum score from the
maximum score.
R = Xmax – Xmin
In the following set of scores, range is 15.
2, 3, 4, 6, 9, 11, 12, 13, 13, 15, 17
R = Xmax – Xmin
R = 17 – 2 = 15
Although a very simple and easily obtainable
measure of variability, range has a major disadvantage:
it is sensitive to extreme scores. In other words, it is
subject to radical change because of just one extreme
score.
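The sensitivity of range to a single extreme score can be sketched with the set of scores used above. The added score of 40 is invented for illustration, as is the name `score_range`.

```python
def score_range(scores):
    """R = Xmax - Xmin."""
    return max(scores) - min(scores)

scores = [2, 3, 4, 6, 9, 11, 12, 13, 13, 15, 17]
print(score_range(scores))          # range of the original set
print(score_range(scores + [40]))   # one invented extreme score added
```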
2. Standard deviation
In any distribution, there is a mean. Some scores
are above the mean and some are below. Deviation
refers to the difference between each score and the
mean.
$$D = X - \bar{X}$$
[Figures 7.9 and 7.10 not reproduced. They show scores X1 to X6 on either side of the mean, with frequency on the vertical axis.]
Standard deviation is, roughly speaking, the average of the
deviations, that is, of the differences between each score
and the mean. To
compute standard deviation, first of all, the mean of the
scores is calculated. Then the mean is subtracted from
each score. Next the difference between each score and
the mean is squared. The squared deviations are then
added up and divided by their total number minus one.
The square root of the outcome is standard deviation.
These steps are summarised in the following formula:
$$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
This formula requires that the difference
between each score and the mean be computed
separately, every deviation score be squared, the
squared deviations be added up, and then divided by N
– 1. According to statisticians, standard deviation can
also be computed from the raw scores using the
following formula:
$$S = \sqrt{\frac{\sum X^2 - \left[(\sum X)^2 / N\right]}{N - 1}}$$
In this formula, ΣX² refers to the sum of the squared
scores, and means that all scores should be squared and
then added up; whereas (ΣX)² refers to the sum of
scores squared, and means that all scores should be
added up and the outcome should be squared.
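That the two formulas lead to the same result can be checked with a short sketch. The function names are invented for illustration, and the scores are reused from the range example above.

```python
from math import isclose, sqrt

def sd_from_deviations(scores):
    """Deviation formula: square root of sum((X - mean)^2) / (N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    return sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

def sd_from_raw_scores(scores):
    """Raw-score formula: square root of (sum(X^2) - (sum X)^2 / N) / (N - 1)."""
    n = len(scores)
    sum_x = sum(scores)                   # sum of scores
    sum_x2 = sum(x * x for x in scores)   # sum of squared scores
    return sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

scores = [2, 3, 4, 6, 9, 11, 12, 13, 13, 15, 17]
print(sd_from_deviations(scores), sd_from_raw_scores(scores))
print(isclose(sd_from_deviations(scores), sd_from_raw_scores(scores)))
```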
3. Variance
Once we have standard deviation, we can easily
obtain variance. All we need to do is to square standard
deviation. To compute variance, therefore, the
following formula is used:
$$V = \frac{\sum (X - \bar{X})^2}{N - 1}$$
Both standard deviation and variance are among the
most versatile statistical concepts that are used in a
variety of statistical operations.
An example may help to further clarify the way
standard deviation and variance are computed. Table
7.1 contains a set of scores ranked from the highest to
the lowest.
Table 7.1
X            X - X̄            (X - X̄)²
19           19 – 15 = 4       16
18           18 – 15 = 3       9
17           17 – 15 = 2       4
16           16 – 15 = 1       1
15           15 – 15 = 0       0
15           15 – 15 = 0       0
14           14 – 15 = -1      1
13           13 – 15 = -2      4
12           12 – 15 = -3      9
11           11 – 15 = -4      16
ΣX = 150     Σ(X - X̄) = 0     Σ(X - X̄)² = 60
To compute variance, as it was said earlier, we
need to calculate the mean. So, the scores are added up
(the sum is 150) and the result is divided by the total
number of the scores (10). The outcome is 15. Now the
mean should be subtracted from each score separately.
The second column of table 7.1 contains the deviations.
As can be seen, the total sum of the deviation scores,
Σ(X - X̄), always equals zero since the positive and
negative values cancel each other out. To avoid this
dead end, we square the difference between each score
and the mean. The third column in table 7.1 includes
the squared deviations. The squared deviations are
added up and then divided by N – 1. Here are the
computations:
$$V = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{60}{9} \approx 6.67$$
Standard deviation is the positive square root of
variance. So:
$$S = \sqrt{V} = \sqrt{6.67} \approx 2.58$$
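The worked example of Table 7.1 can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions divide by N – 1 as the formulas above do. The variable names are chosen for illustration.

```python
from statistics import mean, stdev, variance

scores = [19, 18, 17, 16, 15, 15, 14, 13, 12, 11]  # the scores of Table 7.1

m = mean(scores)
deviations = [x - m for x in scores]     # second column of Table 7.1
squared = [d ** 2 for d in deviations]   # third column of Table 7.1

print(m, sum(deviations), sum(squared))
print(round(variance(scores), 2), round(stdev(scores), 2))
```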
Chapter 8
Standardized Scores
Chapter 8: Standardized scores ………………….…………………………
Introduction
In the previous chapters, it was pointed out that
to conduct research, researchers begin with research
questions, formulate hypotheses, review the related
literature, and then gather data to test their hypotheses.
After analysing the data, researchers arrive at
conclusions and make interpretations. It was also held
that raw data are not very interpretable per se. That is
why a number of basic statistical operations were
presented to make data more easily and consistently
interpretable. Due to their nature, raw data are subject
to misinterpretation. Consider the following example to
see how raw data can be misleading.
Suppose Rose and Mary are twin sisters. They
are in the same grade but study in different classes.
They have taken a test and obtained the following
scores:
Rose 14
Mary 19
Which one do you think has got a better score?
You might be tempted to say, “Well, it is obvious.
Mary’s score is much better”. But what if Rose’s score
is out of 20 and Mary’s score out of 40? It becomes
clear that Rose has got a better score than Mary. Now
imagine that in Rose’s class, the exam was very easy
and most of the students got better scores than Rose,
and the mean score of the class was 17; whereas, in
Mary’s class, because of the difficulty of the exam, the
mean of the class was 16 and only a few students got
better scores than that of Mary. In other words, Rose is
quite below the average while Mary is well above the
average of their classes. Who has got a better score
now? You see that our interpretation of Mary’s and
Rose’s scores as to which one is better shifts from
Mary to Rose, then back to Mary, and so forth. These
fluctuations make our interpretations unreliable. We
need scores that are understandable and interpretable in
a reliable manner without recourse to other pieces of
information. Such scores are called standardized scores
or simply standard scores. Z-score and T-score are two
examples of such scores.
Before talking about standard scores, however, a
few points need to be explained about the normal
distribution curve.
The normal distribution curve
In chapter 7, reference was made to the normal
distribution curve. It was mentioned that with large
data, the frequency distribution curve will look
something like the following figure:
Figure 8.1
A normal distribution curve
[Figure not reproduced. Horizontal axis: scores; vertical axis: frequency.]
The normal distribution curve, also referred to as
the bell-shaped curve, has four characteristics. First of
all, it is unimodal, that is, it has only one mode or most
frequent score. So, a bimodal distribution is not a
normal distribution. Second, it is symmetric. This
means that the mode is at the centre of the distribution,
and the two sides of the distribution above and below
the mean are identical. In other words, the shape of the
distribution curve above the mean is exactly like that
below the mean. If you fold the distribution curve, the
two sides of the distribution cover each other up
perfectly. The third characteristic is that since the
distribution is symmetric, mode, median, and mean
are all the same and equal in value. The fourth
characteristic is that the normal distribution is
asymptotic, that is, the tails of the curve never meet the
horizontal line. It implies that the probability of no
score is zero.
Apart from the above-mentioned characteristics,
the normal distribution has a general characteristic,
which was discussed in the previous chapters: central
tendency. Namely, a majority of the scores tend to be
round the mean. The more we move away from the
mean, the fewer the scores become. But to what extent
are the scores centred round the mean? On the basis of
the assumptions underlying the normal distribution
curve, and making use of the concept of standard
deviation, statisticians have been able to prepare tables
representing the proportion of area under the normal
curve. Using standard deviation as the yardstick, these
tables show what percentage of scores fall between the
mean and a given score depending on how many
standard deviations that score is away from the mean.
For instance, according to statisticians, almost
34% of the scores in a normal distribution are between
the mean and one standard deviation away from the
mean (either above or below). So, if a score is one
standard deviation above the mean, it is better than
about 84% of the scores in the distribution: around 34%
of the scores are between the mean and that score, and
50% of the scores fall below the mean (because the
distribution is symmetric).
Figure 8.2
[Figure not reproduced. Horizontal axis: scores, marked from -3 to +3 standard deviations around the mean; vertical axis: frequency; proportions marked under the curve, moving outward from the mean: .3413, .1359, .0228.]
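The proportions quoted above can be approximated without a printed table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_between_mean_and` is an assumption made for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def area_between_mean_and(z):
    """Proportion of scores between the mean and a point z standard deviations away."""
    return abs(standard_normal.cdf(z) - 0.5)

print(round(area_between_mean_and(1), 4))                             # mean to 1 SD
print(round(area_between_mean_and(2) - area_between_mean_and(1), 4))  # 1 SD to 2 SD
print(round(standard_normal.cdf(1), 4))                               # below +1 SD
```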
Standardized scores
At the beginning of this chapter, it was made
clear that raw scores are not fully interpretable per se
because to know the real value of a score and its
position among other scores, we need other pieces of
information. Meanwhile, we learnt above that if we
knew the distance between each score and the mean in
terms of standard deviation, we could know the
position of that score in the distribution.
To put everything in a nutshell, to know about
the position of scores in a distribution, instead of raw
scores, we need to represent scores in a way that
indicates how many standard deviations that score is
either above or below the mean. To this end, Z-score is
just what we need.
Z-score is a standard score indicating how many
standard deviations a given score is from the mean. It
shows the difference between a given score and the
mean in terms of standard deviation. The Z-formula is
as follows:
$$Z = \frac{X - \bar{X}}{S}$$
Supposing that the mean score of a class is 15
and standard deviation is 2, if one of the students of the
class has got 17, we can turn the raw score into Z-score
this way:
$$\bar{X} = 15, \quad S = 2, \quad X = 17$$
$$Z = \frac{X - \bar{X}}{S} = \frac{17 - 15}{2} = 1$$
This means that the student with a raw score of
17 has a percentile rank of 84, that is, s/he is better than
about 84 percent of the students in the class.
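The conversion of a raw score into a Z-score can be sketched as a one-line function. The name `z_score` is chosen for illustration; the values are those of the example above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that score x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(17, mean=15, sd=2))
```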
Once we have the Z-score, the raw score can be
easily computed provided that the mean and standard
deviation are known. To obtain the raw score from the
Z-score, the Z-formula can be turned into the following
formula:
$$Z = \frac{X - \bar{X}}{S}$$
$$X = Z(S) + \bar{X}$$
For example, in a class with a mean of 14 and standard
deviation of 1.5, the raw score of a student with a
Z-score of –1 is calculated as follows:
$$X = Z(S) + \bar{X}$$
$$X = -1(1.5) + 14 = 14 - 1.5 = 12.5$$
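The reverse conversion can be sketched in the same way. The name `raw_score` is invented for illustration; the values are those of the example above.

```python
def raw_score(z, mean, sd):
    """Recover the raw score from a Z-score: X = Z(S) + mean."""
    return z * sd + mean

print(raw_score(-1, mean=14, sd=1.5))
```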
A couple of important points need clarification
here. The first point is that if the Z-score is a whole
number, the case is relatively straightforward. Figure
8.2 tells us what proportion of scores fall between mean
and the Z-scores of 1, 2, and 3 (as well as –1, –2, and
–3). What if the distance between a score and the mean
is not a whole number but a fraction of standard
deviation? Consider the following example:
$$\bar{X} = 12, \quad S = 1.5, \quad X = 11$$
$$Z = \frac{X - \bar{X}}{S} = \frac{11 - 12}{1.5} = \frac{-1}{1.5} \approx -0.66$$
In this example, the score 11 is 0.66 of a
standard deviation below the mean. Earlier, it was
mentioned that statisticians have prepared tables
showing the proportion of area under the normal curve.
These tables indicate the proportion of scores between
any Z (whole number or fraction) and the mean. A copy
of one such table showing these proportions is given in
appendix 1. Using the table in appendix 1, we can see
that the proportion of scores between mean and the
Z-score of -0.66 is .2454. This shows that 24.54 percent
of the scores are between our Z-score and the mean.
You remember that 50 percent are above the mean. So,
altogether, around 74.54 percent (24.54 %+ 50%) of the
scores are better than the score in question, and the
remaining 25.46 % are worse than it. So, the percentile
rank of the student with such a score is 25.46.
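Where a printed table is not at hand, the same lookup can be approximated in Python. The sketch below assumes the scores are normally distributed and uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of appendix 1; the function name `percentile_rank` is an assumption made for this example.

```python
from statistics import NormalDist


def percentile_rank(z):
    """Percentage of scores in a normal distribution that fall below `z`."""
    return 100 * NormalDist().cdf(z)


print(round(percentile_rank(-0.66), 2))  # 25.46
print(round(percentile_rank(1.0), 2))    # 84.13
```

The function returns the whole area below Z directly, so the 50 percent for the other half of the curve does not have to be added or subtracted by hand.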
Another point to clarify is that when the Z-score is
negative, the proportions of area under the normal curve
are the same as those for the corresponding positive values of Z.
Only the interpretation differs. For instance, when the
Z-score is –1, it means that the score is one standard
deviation below the mean. The proportion of scores
between that score and the mean is .3413, but since the
score is below the mean, it is worse than 84.13%
(34.13% + 50%) of the scores.
One little problem with Z-score is its low
magnitude. A student with a Z-score of +1 is better than
about 84% of the class members, yet his/her Z-score is
only one. To obviate this, in the following formula,
some fixed values can be conventionally used for mean
and standard deviation to increase the magnitude.
$$X = Z(S) + \bar{X}$$
In Z-score, the mean is considered to be zero and the
standard deviation is supposed to be one. In T-score,
which is another standard score, mean and standard
deviation are conventionally considered to be 50 and
10, respectively. Thus the T-formula is:
T = Z (10) + 50
Therefore, a Z-score of +2 is equal to a T-score of 70.
T = 2(10) + 50
T = 70
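As a small sketch, the T-formula can be written as a function whose default mean and standard deviation are the conventional values 50 and 10 given above. The name `t_score` is hypothetical; it refers to the standard T-score, not to the t-test of chapter 10.

```python
def t_score(z, mean=50, sd=10):
    """Convert a Z-score to a T-score: T = Z(10) + 50 by convention."""
    return z * sd + mean


print(t_score(2))   # 70
print(t_score(-1))  # 40
```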
Chapter 9
Probability Estimation and
Hypothesis Testing
Introduction
In chapter 8, the concept of Z-score was
explained, and its advantages over raw scores were
discussed. Apart from this, Z-score has a significant
application in inferential statistics (in hypothesis testing
and probability estimation).
There are two kinds of statistics: descriptive and
inferential. In descriptive statistics, the goal is to
summarize and describe data obtained from a group of
subjects in order to learn about the characteristics of
that group of subjects. An example would be to
describe and summarize data pertaining to
characteristics (such as age, linguistic background,
marital status, etc.) of the students of a university.
On the other hand, most of the time, researchers
are not really interested in discovering characteristics of
a limited group of subjects. Even when studying a small
group of subjects (called a sample) selected from a
large group (referred to as population), their aim is to
learn about the characteristics of the population, not
just the sample itself. For example, if a researcher
intends to investigate the effect of age on language
learning, although for manageability reasons s/he
usually selects a limited group of subjects, his/her
ultimate aim is to generalize the findings to the whole
population of young and old learners. In other words, in
inferential statistics, researchers describe and
summarize data obtained from a sample to learn about
one or more characteristics of the sample. Any
characteristic of a sample is referred to as a statistic.
Then, they generalize their findings and make inferences
about the corresponding characteristic of the population, which is
called a parameter. In short, descriptive statistics
studies the characteristics of a sample (statistics),
whereas inferential statistics aims at discovering the
characteristics of the population (parameters). One can
conclude, therefore, that any characteristic described
through descriptive statistics is a statistic, and any
characteristic inferred through inferential
statistics is a parameter.
Inferential statistics, by its very nature, is based
on probability. Because we do not study the whole
population, we can never be 100 percent sure about the
characteristic of the population. Rather, we only guess a
parameter based on our knowledge of a statistic.
The Z-formula has applications in inferential
statistics. Using the Z-formula, we can estimate the
probability of an individual score belonging to a sample
as well as the probability of a sample belonging to a
population. Thus, the concept of Z can be very helpful
in testing hypotheses.
Probability
Probability is defined as the ratio of the number of
desired events to the total number of possible outcomes,
multiplied by one hundred.
$$\text{Probability} = \frac{\text{Number of desired events}}{\text{Number of possible outcomes}} \times 100$$
As an example, consider a coin. A coin has two sides:
heads and tails. If you toss a coin, the probability of the
coin coming down heads up is 50% because the number
of desired events is one and the total number of possible
outcomes is two.
Probability = ½ (100) = 50 %
Or in a multiple-choice item with four alternatives, the
probability of getting the answer right without any
knowledge is 25%.
Probability = ¼ (100) = 25%
But getting at the probability of an event is not
always so straightforward. Think of a class with a mean
of 15 and standard deviation of 2. What is the
probability of score 13 existing in such a class? In cases
like this, certain computations are needed. It is here that
the Z-formula can be made use of.
The probability of a score belonging to a
distribution
Using the Z-formula and the table of the
proportion of area under the normal curve (appendix 1),
we can estimate the probability of a score belonging to
a distribution if we have only the mean and standard
deviation of the distribution. In the above-mentioned
example, the mean of the distribution was 15 and
standard deviation was 2.
$$\bar{X} = 15, \quad S = 2, \quad X = 13, \quad \text{Probability of } X = ?$$
To compute the probability of the score 13
belonging to the distribution with the above-mentioned
characteristics, the first step to take is to convert this
raw score into the standard Z-score. The corresponding
Z-value of 13 is –1 because:
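$$Z = \frac{X - \bar{X}}{S} = \frac{13 - 15}{2} = \frac{-2}{2} = -1$$
Figure 9.1: Normal curve (horizontal axis: scores, marked in Z units from –3 to 3; vertical axis: frequency), with 34.13% of the area between the mean and Z = –1 and 15.87% beyond Z = –1.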
The moment we notice the Z-score is negative,
we know that such a score (13) is below the mean. So,
it cannot be in the positive side of the distribution. If
such a score is to be found in the distribution, it has to
be in the negative side of it. Since the normal
distribution is symmetric, we are sure that 13 cannot be
among the 50% of the scores that are above the mean.
On the other hand, appendix 1 tells us that 34.13% of
the scores are between the mean and the Z of –1. So 13
cannot be there. There remains the area beyond the Z-score
of –1. According to appendix 1, the area beyond the Z
of –1 is .1587. Therefore, if 13 is to be found in the
distribution, it has to be in the area beyond the Z-score
of –1. This means that the probability of score 13 in a
distribution with the given characteristics is 15.87%.
Let us have another example. What is the
probability of score 16 in a distribution with a mean of
13 and standard deviation of 2?
$$\bar{X} = 13, \quad S = 2, \quad X = 16, \quad \text{Probability of } X = ?$$
$$Z = \frac{X - \bar{X}}{S} = \frac{16 - 13}{2} = 1.5$$
Figure 9.2: Normal curve with 43.32% of the area between the mean and Z = 1.5 and 6.68% beyond Z = 1.5.
Appendix 1 tells us that the proportion of area between
the mean and the Z-score of 1.5 is .4332 and the area
beyond Z is .0668. Based on what was said earlier, we
know that score 16 should be in the positive half of the
distribution because the Z is positive. Out of the 50% of
scores in the positive side of the distribution, 43.32 %
are between the mean and the Z-score of 1.5. So, the
probability of 16 belonging to such a distribution is the
remaining 6.68% (the area beyond).
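The two examples above follow the same steps: convert the raw score to Z, then read off the area beyond that Z. A minimal sketch of this procedure is given below. It assumes a normal distribution and replaces the printed table with `statistics.NormalDist`; the function name `area_beyond` is an assumption made for this example.

```python
from statistics import NormalDist


def area_beyond(score, mean, sd):
    """Percentage of the normal curve lying beyond `score`, on the side away from the mean."""
    z = (score - mean) / sd
    # The curve is symmetric, so the tail beyond -z equals the tail beyond +z.
    return 100 * (1 - NormalDist().cdf(abs(z)))


print(round(area_beyond(13, 15, 2), 2))  # 15.87
print(round(area_beyond(16, 13, 2), 2))  # 6.68
```

Taking the absolute value of Z mirrors the point made in chapter 8: the areas for negative Z-values are the same as those for positive ones, and only the interpretation differs.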
The probability of a mean score belonging to a
population
The Z-formula can also be used to estimate the
probability of a mean score belonging to a population.
An example of a situation that requires the estimation
of such probability would be as follows:
Suppose you have a group of 49 university
students. You intend to teach them general English in a
newly devised method (called A), which you guess is
more effective than other current methods. After doing
all the prerequisites, you introduce the treatment
(instruction in method A) and then give your subjects a
test. Suppose further that the mean score of your class
on a proficiency test such as TOEFL turns out to be
510. Now, you want to compare your class with the
whole population of Iranian university students who are
currently taking their general English course. Given
that you know the grand average of the population
(represented as μ) on the proficiency test is 500, you
might be tempted to conclude that your group of
subjects has outperformed the population, and that your
method has been effective.
However, great care should be exercised in such
cases. The sheer fact that the mean of your group is
above the population mean does not in any way
guarantee that the treatment has been effective.
Remember that variability is one of the characteristics
of any normal distribution. In other words, in any
normal distribution, there are scores above and below
the mean. This means that although your sample mean
is higher than the population mean, the sample may still
belong to the normal distribution. In that case, you
cannot claim that your treatment is effective, because
there will be no meaningful difference between the
sample mean and the population mean.
To see if the sample mean belongs to the
distribution of means in the population, the following
formula is used:
$$Z_{\bar{X}} = \frac{\bar{X} - \mu}{S_{\bar{X}}}$$
Notice that this formula is the same as the Z-formula
we had earlier.
$$Z = \frac{X - \bar{X}}{S}$$
In the new formula, instead of having individuals, we
have groups (samples). So, instead of single scores, we
have a distribution of means. These mean scores have a
grand average (the mean of means) that is represented
as μ.
Figure 9.3: Distribution of sample means ($\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots$) around the population mean μ.
In other words, in the new formula, $\bar{X}$ stands for the
sample mean, μ represents the population mean, and
$S_{\bar{X}}$ represents the standard deviation of means.
Standard deviation of means poses a problem.
To compute it, you need to have all the sample means
in the population, and then apply the standard deviation
formula. The problem is that you have only the mean of
your sample and the mean of the population. You have
no clue as to what other mean scores are. So, you
cannot compute $S_{\bar{X}}$ directly.
Still, we know that the standard deviation of means
becomes smaller as the number of scores in each sample
grows. Statisticians contend that the standard
deviation of means can be estimated from the
standard deviation of the sample ($S_X$) and the sample
size ($n$) using the following formula:
$$S_{\bar{X}} = \frac{S_X}{\sqrt{n}}$$
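Now you have everything you need to apply the new
Z-formula to test the effect of your new method (A).
Here is a summary of the data and the calculations:
$$S = 35, \quad \bar{X} = 510, \quad N = 49, \quad \mu = 500$$
$$S_{\bar{X}} = \frac{S_X}{\sqrt{n}} = \frac{35}{\sqrt{49}} = 5$$
$$Z_{\bar{X}} = \frac{\bar{X} - \mu}{S_{\bar{X}}} = \frac{510 - 500}{5} = 2$$
Figure 9.4: Normal curve with 47.72% of the area between μ and Z = 2 and 2.28% beyond Z = 2.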
The Z-table in appendix 1 indicates that the percentage
of scores between mean and the Z-score of 2 is 47.72%.
So you learn that your sample mean is neither in the
negative half of the distribution nor in the 47.72% of
scores between the mean and the Z of 2. In other words,
you are 97.72 percent sure that such a mean (510) does
not belong to this population. The probability of your
mean belonging to the population is 2.28%.
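The computation for a sample mean can be sketched as follows. The function name `z_for_sample_mean` is hypothetical, and `statistics.NormalDist` stands in for the table in appendix 1; the figures are those of the method A example.

```python
import math
from statistics import NormalDist


def z_for_sample_mean(sample_mean, pop_mean, sample_sd, n):
    """Z of a sample mean, with the standard deviation of means estimated as S / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


z = z_for_sample_mean(510, 500, 35, 49)
beyond = 100 * (1 - NormalDist().cdf(z))  # area beyond Z, as a percentage

print(z)                 # 2.0
print(round(beyond, 2))  # 2.28
```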
What does this mean? Does it mean that your
group does not belong to the population and is better
than it? Not necessarily. Remember that there is a 2.28%
chance that your group is not different from the
population and belongs to the normal population. On
the other hand, as was said earlier, in inferential
statistics, nothing can be proven with 100 percent
certainty; there is always room for error. How
should this result be interpreted then?
It depends on the extent of the error that can be
tolerated. The extent of the mistake to be tolerated
depends, in turn, on the importance and significance of
research and its effect on mankind. In medical research,
for example, one has to be much more sensitive to
mistakes than in, say, language research. In medical
research, the researcher cannot conclude that a certain
drug is effective for a certain illness if s/he is only 97
percent sure. There is a 3 percent chance that it might
endanger the health or even the life of people. So, in
areas like medical research, researchers need to
exercise a lot of care. In language issues, researchers
have agreed on two levels of mistake to be tolerated:
1% and 5%. The extent of the possible mistake to be
tolerated is referred to as the level of significance and
is represented as α. Level of significance is expressed in
terms of the proportion of area under the normal curve.
Hence, the level of significance corresponding to one
percent mistake is .01 and that of 5% mistake is .05.
When formulating research hypotheses,
researchers should decide on the level of significance of
their study based on the importance it has. It is
important to remember that they cannot change the
level of significance later. In the above–mentioned
example, if you chose the .05 level of significance (5%
mistake), you could now safely conclude that your
group is significantly different from the normal
population and does not belong to it. Whereas, if you
chose the .01 level of significance (1% mistake), you
couldn’t make such a claim because you allowed for
only one percent of possible mistake, but now you have
2.28 % of possible mistake.
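The decision described above amounts to a single comparison between the probability found for the sample and the level of significance chosen in advance. A minimal sketch, with the hypothetical function name `is_significant`:

```python
def is_significant(probability, alpha):
    """True when the probability found for the sample does not exceed the chosen level."""
    return probability <= alpha


p = 0.0228  # area beyond Z = 2, from the method A example

print(is_significant(p, 0.05))  # True
print(is_significant(p, 0.01))  # False
```

The level of significance is passed in as a fixed argument because, as stated above, it is chosen before the data are analysed and is not changed afterwards.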
Testing Hypotheses
In chapter 3, it was explained that research
hypotheses are either directional or non-directional
(null). It was also explained that when a directional
hypothesis is formulated, the researcher is certain about
the direction of the relationship between two variables;
only the extent of the relationship is to be determined.
For instance, when a researcher hypothesizes that there
is a positive relationship between intelligence and
language learning, s/he has nothing to do with the
negative side of the distribution. Rather, s/he only
wants to locate the position of his/her experimental
group in the positive side of the normal distribution.
Figure 9.5: A one-tailed distribution (5% in one tail)
Such a distribution is one-tailed. Supposing that
the level of significance is .05, all of it falls on one side
of the distribution. On the other hand, when the
hypothesis is non-directional, the distribution is two-tailed.
Figure 9.6: A two-tailed distribution (2.5% in each tail)
In a two-tailed distribution, because the direction
of the relationship cannot be predicted (it may be both
positive and negative), the extent of the tolerated error
(level of significance) should be equally divided
between the two tails. Thus, if the level of significance
is .05, only 2.5% of possible mistake can be tolerated
on either tail of the distribution instead of 5%.
In chapter three, it was held that for a number of
reasons, the null hypothesis should be used whenever
possible. Now you can easily understand the last
reason. In non-directional hypotheses, the extent of the
possible mistake is actually half that of directional
hypotheses.
To summarize, in the early stages of conducting
research, when formulating hypotheses, you should
decide on the level of significance and stick to it
throughout your research. You should also take into
account the kind of the research hypothesis. If the
hypothesis is directional, the entire possible mistake is
considered in one tail of the distribution; if the
hypothesis is non-directional, the extent of the tolerable
mistake is reduced to half. Next, you should find – in
the Z-table – the value you need to claim that the
sample (the experimental group) is meaningfully and
significantly different from the population and does not
belong to it. This Z-value is called the critical Z-value.
For instance, if you have a directional hypothesis and
the level of significance is .05, you must be 95% certain
in order to prove your claim. This means that the area
beyond Z in the Z-table should not be more than .05.
The Z-value corresponding to .05 level of significance
in case of the directional hypothesis is 1.65. Your
observed Z value must be equal to the critical Z value
or exceed it so that you can prove your claim. An
example may help clarify the point.
Example
To study the effect of a treatment (A) on the
language learning of a group of 36 subjects, having
observed all other requirements, you have administered
a post-test and obtained the following data.
Supposing that the population mean is 150, test your
null hypothesis at the .01 level of significance.
$$S = 18, \quad \bar{X} = 159, \quad N = 36, \quad \mu = 150, \quad \alpha = .01$$
hypothesis = non-directional
Prior to doing any computations, let us
determine the critical Z value. Since the hypothesis is
non-directional, we should halve the extent of mistake.
Namely, we can have only half a percent of mistake.
Hence, the area beyond Z should be .005. Using the Z-table
in appendix 1, we find the Z-value corresponding
to the above index. According to the table, the critical Z
value is 2.58. Now we compute the observed Z. If it is
equal to or greater than the critical Z-value, the null
hypothesis is rejected and the treatment is effective;
otherwise, the null hypothesis is supported and the
treatment has no significant effect.
$$S_{\bar{X}} = \frac{18}{\sqrt{36}} = 3$$
$$Z_{\bar{X}} = \frac{159 - 150}{3} = 3$$
Since the observed Z-value exceeds the critical Z
value, the null hypothesis is rejected, and it is
statistically proven that the treatment is effective.
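The whole procedure of this example (finding the critical Z, computing the observed Z, and comparing the two) can be sketched in Python. The function names are hypothetical, and `NormalDist().inv_cdf` is used here as a substitute for searching the Z-table in appendix 1.

```python
import math
from statistics import NormalDist


def critical_z(alpha, directional):
    """Critical Z: all of alpha in one tail if directional, half of it in each tail otherwise."""
    tail = alpha if directional else alpha / 2
    return NormalDist().inv_cdf(1 - tail)


def observed_z(sample_mean, pop_mean, sample_sd, n):
    """Observed Z of a sample mean against the population mean."""
    return (sample_mean - pop_mean) / (sample_sd / math.sqrt(n))


z_crit = critical_z(0.01, directional=False)
z_obs = observed_z(159, 150, 18, 36)

print(round(z_crit, 2))      # 2.58
print(z_obs)                 # 3.0
print(abs(z_obs) >= z_crit)  # True: the null hypothesis is rejected
```

For a directional hypothesis at the .05 level, the same function gives a value of about 1.645, which printed tables usually round to 1.65.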
Chapter 10
Comparing Two Means
(T-test)
Introduction
In chapter 9, we discussed the use of Z in
comparing a sample mean with a population mean. The
Z procedure is used when there is only one sample
group. More commonly, however, researchers use more
than one group. Remember that one of the
characteristics or principles of the true experimental
research is the existence of experimental and control
groups. So, there are times when researchers select two
sample groups belonging to the same population, use
one as the experimental group and the other as the
control group, and then compare the mean scores of the
two samples. In such situations, T-test is used.
Comparing two sample means
Suppose we have decided to conduct research
on the effect of sex on Iranian university students’
language learning. For this purpose, we obviously need
two groups randomly selected from the population: a
group of male subjects and a group of female subjects.
We give both groups instruction for a certain period of
time and under identical circumstances. Finally, we
give a post-test to both groups and obtain the raw data.
The procedure to compare the two sample means
is much like the Z procedure. Suppose that the obtained
raw data are as follows:
G1 = female          G2 = male
n = 25               n = 25
s = 3                s = 4
$\bar{X}$ = 17.5     $\bar{X}$ = 16
α = .05              hypothesis = null
To see if there was a statistically significant
difference between a sample mean and a population
mean, we used the following formula:
$$Z_{\bar{X}} = \frac{\bar{X} - \mu}{S_{\bar{X}}}$$
You remember that $S_{\bar{X}}$ refers to the standard deviation
of the means and is estimated this way:
$$S_{\bar{X}} = \frac{S_X}{\sqrt{n}}$$
The gist of the matter is that, to check the
difference between the two means in the previous
chapter, we subtracted one of the means from the other
and divided the result by the standard deviation of
means. More or less the same thing is done in the T-test.
The formula to be used here is:
$$t_{obs} = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}}$$
In this formula, $S_{(\bar{X}_1 - \bar{X}_2)}$ refers to the standard deviation
of the difference between the means and is computed
using the following formula:
$$S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\left(\frac{S_1}{\sqrt{n_1}}\right)^2 + \left(\frac{S_2}{\sqrt{n_2}}\right)^2}$$
In the case of our example data, the computations are as
follows:
$$S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\left(\frac{3}{\sqrt{25}}\right)^2 + \left(\frac{4}{\sqrt{25}}\right)^2} = \sqrt{\frac{9}{25} + \frac{16}{25}} = \sqrt{1} = 1$$
Once we have the denominator, we can put it in the t
formula and obtain the observed t value:
$$t_{obs} = \frac{\bar{X}_1 - \bar{X}_2}{S_{(\bar{X}_1 - \bar{X}_2)}} = \frac{17.5 - 16}{1} = 1.5$$
Now that we have the observed t value, we
should check it against the critical t value. Just like the
Z procedure if the observed t value turns out to be equal
to or greater than the critical t value, the difference
between the two means is said to be statistically
significant. Then the treatment is effective and the null
hypothesis is rejected. On the other hand, if the
observed t value is smaller than the critical t value, the
null hypothesis is supported and the treatment is said to
have no significant effect. But how is the critical t value
obtained?
The critical t value
Obtaining the critical t value is a little different
from obtaining the critical Z value. In the Z-procedure,
we just decided on the level of significance, and
depending on whether the hypothesis was directional or
non-directional, we found the Z value corresponding to
the proportion of area beyond Z. In the T-test
procedure, another table is used (see appendix 2). The
table of critical t value has two dimensions. The first
dimension indicates the different levels of significance
for one-tailed tests (directional hypotheses) and two-tailed
tests (non-directional hypotheses). The second
dimension shows the degrees of freedom, which will be
explained below. The intersection of the level of
significance and the degree of freedom shows the
corresponding critical value of t.
Degree of freedom refers to the number of
quantities that can be freely determined. In the
following equation, for instance, there are three
parameters:
A+B=C
Out of the three parameters, however, only two can be
freely chosen. Once we have assigned values to A and
B (e.g., A = 3, B = 5), the value of C is already
determined. It has to be 8, and we are not free to choose
a value for it. Similarly, in the following equation, we
are free to choose values for only three parameters out
of four:
A+B–C=D
Basically, out of three parameters, our degree of
freedom is two; out of four parameters, it is three. Out of N
parameters, the degree of freedom is N – 1.
To determine the degree of freedom, we apply
N – 1 to each group of subjects. In our example case,
each group included 25 members. So the degree of
freedom in each group would be 24 (25 – 1). Since we
had two groups, our total degree of freedom would be
48. If there were no such number in the t-table
(appendix 2), then the closest number to it would be the
degree of freedom.
With these pieces of information, the critical t-value
can be easily found. As for our example,
considering the column showing the .05 level of
significance for two-tailed test and taking into account
the degree of freedom (48), we can see that the critical t
value is 2.021.
Now we compare the observed t value with the
critical t value. The observed t value (1.5) is smaller
than the critical t value, suggesting that the difference
between the two groups is not statistically significant,
and that the treatment is not effective and the null
hypothesis is supported. Let us summarise these in
another example.
Example:
To test the effect of a treatment, you have given
a post-test to two groups. The following data are
obtained. Supposing that everything is all right, test
your null hypothesis at .05 level of significance.
G1                   G2
n = 36               n = 25
s = 24               s = 15
$\bar{X}$ = 150      $\bar{X}$ = 140
α = .05              hypothesis = null
Before doing the computations, let us determine
the critical t value. The degree of freedom is 35 + 24 =
59. The closest number to this in the t-table is 60. The
critical t-value corresponding to 60 at .05 level of
significance in the two-tailed test is 2.00.
$$t_{crit} = 2.00$$
$$S_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\left(\frac{24}{\sqrt{36}}\right)^2 + \left(\frac{15}{\sqrt{25}}\right)^2} = \sqrt{(4)^2 + (3)^2} = \sqrt{25} = 5$$
$$t_{obs} = \frac{150 - 140}{5} = 2$$
Since the observed value is equal to the critical
value, we can statistically reject the null hypothesis and
conclude that the treatment is effective.
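The computations of both independent t-test examples in this chapter can be sketched as one short function. The name `independent_t` is an assumption made for this illustration. Python's standard library has no t-table, so the critical value still has to be read from appendix 2 using the degree of freedom that the function returns.

```python
import math


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (observed t, degree of freedom) for two independent samples."""
    # Standard deviation of the difference between the two means.
    se_diff = math.sqrt((sd1 / math.sqrt(n1)) ** 2 + (sd2 / math.sqrt(n2)) ** 2)
    t_obs = (mean1 - mean2) / se_diff
    df = (n1 - 1) + (n2 - 1)
    return t_obs, df


t1, df1 = independent_t(17.5, 3, 25, 16, 4, 25)    # female vs male example
t2, df2 = independent_t(150, 24, 36, 140, 15, 25)  # G1 vs G2 example

print(round(t1, 2), df1)  # 1.5 48
print(round(t2, 2), df2)  # 2.0 59
```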
Matched t-test
In the preceding section, the means of two
independent samples were compared. For this reason,
the t-test procedure described above is also known as
the independent t-test.
There are times when researchers need to
compare two means obtained from a single sample.
When scores on two different variables are obtained
from a single group, matched t-test is used. For
example, the researcher may give a pre-test and a post-test
and hope to compare the two means. Or one
may give the subjects two different tasks and want to
compare their performance on the tasks.
To conduct such a study, only one group of
subjects will be sufficient. The procedure is more or
less the same as the independent samples t-test, but the
formula is a bit different. In matched t-test, the
following formula is used:
$$t_{obs} = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{D}}}$$
In this formula, the numerator is the difference between
the two means. The denominator $S_{\bar{D}}$ stands for the
standard deviation of the differences between scores.
To compute $S_{\bar{D}}$, first of all, the difference between
every pair of scores should be calculated. The
difference between each pair of scores ($X_1 - X_2$) is called
the deviation score and is represented as D. Then, the
standard deviation of the deviation scores should be
computed ($S_D$). Finally, the outcome should be adjusted
for the sample size (that is, made sensitive to N) to have
an estimate of $S_{\bar{D}}$.
To clarify the point further, let us do the
computations with an example. Suppose that the scores
of the above-mentioned subjects on the Pre-test and the
post-test are those presented in the following table:
Table 10.1
S = subjects; X1 = scores on the pre-test; X2 = scores on the post-test;
D = deviation score (X1 – X2); D² = squared deviation (X1 – X2)²

S      X1     X2     D      D²
1      17     16     1      1
2      14     17     -3     9
3      18     15     3      9
4      11     14     -3     9
5      15     10     5      25
6      17     15     2      4
7      12     9      3      9
8      16     13     3      9
9      18     14     4      16
10     12     12     0      0
Σ      150    135    15     91
With these pieces of information, let us test our null
hypothesis at .05 level of significance. As usual, we are
to determine the critical value of t as a first step. The
degree of freedom is 9 (10 – 1). The critical value of t
with this degree of freedom at .05 level of significance
is 2.26 (see appendix 2).
To compute the observed t value, the first thing
to do is to compute the standard deviation of the
deviation scores. We have already learned (in chapter
7) that standard deviation is computed from raw scores
using the following formula:
$$S = \sqrt{\frac{\sum X^2 - [(\sum X)^2 / N]}{N - 1}}$$
If we substitute D for X in the formula, it will be:
$$S_D = \sqrt{\frac{\sum D^2 - [(\sum D)^2 / N]}{N - 1}}$$
So $S_D$ is computed as:
$$S_D = \sqrt{\frac{91 - (225 / 10)}{9}} = \sqrt{\frac{68.5}{9}} = 2.76$$
This $S_D$ should be adjusted for the sample size. Just as
$S_{\bar{X}}$ was estimated from $S_X$, so is $S_{\bar{D}}$ estimated
from $S_D$.
$$S_{\bar{X}} = \frac{S_X}{\sqrt{n}} \qquad S_{\bar{D}} = \frac{S_D}{\sqrt{n}}$$
$$S_{\bar{D}} = \frac{2.76}{\sqrt{10}} = \frac{2.76}{3.16} = .87$$
The last step is to compute the observed value of t:
$$t_{obs} = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{D}}} = \frac{15 - 13.5}{.87} = 1.72$$
Since the observed value of t is smaller than the
critical value, we cannot reject the null hypothesis,
which means that the difference between the two means
is not statistically significant.
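The steps of the matched t-test can be sketched directly from the scores in Table 10.1. The function name `matched_t` is hypothetical. The code carries no intermediate rounding, so its intermediate values may differ slightly in the last digit from hand calculations, but the observed t agrees to two decimal places.

```python
import math

pre = [17, 14, 18, 11, 15, 17, 12, 16, 18, 12]   # X1, pre-test scores (Table 10.1)
post = [16, 17, 15, 14, 10, 15, 9, 13, 14, 12]   # X2, post-test scores (Table 10.1)


def matched_t(x1, x2):
    """Return (observed t, degree of freedom) for two sets of scores from one group."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]  # deviation scores D
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    sd_d = math.sqrt((sum_d2 - sum_d ** 2 / n) / (n - 1))  # standard deviation of D
    se_d = sd_d / math.sqrt(n)                             # adjusted for sample size
    mean_diff = sum(x1) / n - sum(x2) / n
    return mean_diff / se_d, n - 1


t_obs, df = matched_t(pre, post)
print(round(t_obs, 2), df)  # 1.72 9
```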
Assumptions underlying t-test
Although t-test is a very useful and versatile
statistical procedure, care must be taken in the use of it.
There are certain assumptions underlying t-test that
should be met before using it; otherwise, there may be
confusion. These assumptions include the following:
1. The scores are measured on an interval scale.
Namely, t-test is not used to compare data
obtained on ordinal or nominal scales.
2. In the independent group t-test, every subject
should be assigned to only one group. No subject
should be a member of both experimental and
control groups.
3. Every subject’s score must be independent of
any other subject’s score.
4. Scores should be approximately normally
distributed, and the variances of the groups
should not be significantly different from each
other.
5. Finally, the most important assumption
underlying t-test is that it is ideally used to
compare the means of only two groups.
The last assumption says that t-test is not an
appropriate statistical procedure for comparing the
means of more than two groups. The reason is that
when the number of comparisons increases, so does the
probability of making mistakes (that is, the level of
significance). Statisticians say that level of significance
(α) changes according to the following formula:
$$\alpha = 1 - (1 - \alpha)^c$$
In this formula, c stands for the number of
comparisons. If two means are compared, only one
comparison can be made, and the level of significance
remains intact. Supposing that the original level of
significance was .05, with one comparison, it would not
change.
$$\alpha = 1 - (1 - .05)^1$$
$$\alpha = 1 - .95 = .05$$
But if three means are compared, three comparisons are
possible. Then, the original .05 level of significance
will change this way:
$$\alpha = 1 - (1 - .05)^3$$
$$\alpha = 1 - (.95)^3$$
$$\alpha = 1 - .857 = .143$$
This means that although only 5% of possible mistake
is allowed, the probability of mistakes is now about 14%.
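The growth of the level of significance with the number of comparisons can be checked with a short sketch. The function name `inflated_alpha` is an assumption made for this example. Computed without intermediate rounding, the three-comparison case comes to about .143.

```python
def inflated_alpha(alpha, comparisons):
    """Level of significance after `comparisons` tests: 1 - (1 - alpha)^c."""
    return 1 - (1 - alpha) ** comparisons


print(round(inflated_alpha(0.05, 1), 3))  # 0.05
print(round(inflated_alpha(0.05, 3), 3))  # 0.143
```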
Chapter 11
Reporting Research Findings
Introduction
In the previous chapters, we discussed how to
form research questions; formulate hypotheses; review
the literature; collect data; summarise, describe, and
analyse the data; do certain statistical operations and
come up with certain results. The purpose of this
concluding chapter is to discuss the way the research
findings should be reported. In order to make research
findings understandable and useful for others, all
researchers need to follow a common format in
reporting their findings. In this chapter, one such format
is to be described.
Research report format
There are different ways of reporting research
findings. Researchers in different fields may use
different formats for reporting their findings. So, the
easiest way to find out how a research paper should be
prepared is to check the major journals to which
members of that field subscribe. Nevertheless, since
most papers in applied linguistics and language
teaching use the APA (American Psychological
Association) format, this format will be briefly
described below.
Within the APA format, there are three major
sections in any research report: introduction, method,
and results. Before these main sections, however,
certain preliminaries should be considered. Prior to
anything else, the title of research and the researcher’s
name and affiliation should be given as follows:
Title
(concise and exact, capitalize major words)
Researcher’s Name
(double space below title, capitalize first letters)
Researcher’s Affiliation (University, organization, etc.)
(double space below name, first letters in caps.)
This is followed by an abstract. An abstract is a
summary of the research in approximately 150 words,
which states the research question, method, and results.
The abstract is written in the form of a single paragraph, the
purpose of which is to let those who read it decide
whether or not they really want to read the whole
report.
The abstract is followed by the introduction, which,
although not labelled, is the first major section. The first part of
the introduction contains the background to the research
topic, the research questions, the purpose of the research, and
the significance of the study. It should be noted that
here the research question is introduced in general terms,
not necessarily as a formal question.
The second part of the introduction section is
labelled ‘review of related literature’, which is usually a
side heading. But if there is not much related literature,
the heading may be omitted. Review of related
literature deals with what other people have already
done. After the review of literature, the researcher
should clearly state the research question and formulate
hypotheses.
In short, the introduction section aims at
answering four basic questions:
1. What do you intend to do?
2. What are your predictions?
3. Why is the work important?
4. What has already been done?
Method
The second major part of a research report is the
method section, whose heading is centred with the first letter
capitalised. The method section includes the following
parts:
Subjects
In the ‘subjects’ part, which is labelled as a
single heading, the number of subjects is given and
their characteristics are described. Depending on the
topic of research, these characteristics may include age,
sex, first language background, level of education, the
number of groups they were assigned to, the criteria for
their grouping, and so on.
Materials and procedures
Sometimes ‘materials’ and ‘procedures’ are two
separate parts, each one being labelled as an
independent side heading, and sometimes they form a
single heading. In either case, the description of the
materials used in the research comes first. For example,
if the treatment includes some teaching materials, then
the name of the source(s), number of chapters, units,
lessons, even pages, and the type of classroom activities
should be described. Or if tests are used for data
collection, the characteristics of tests including the
number of subtests, the number of items in each
subtest, the proportion of items testing the various
aspects of a characteristic, etc. should be clarified.
‘Procedure’ follows the description of
‘materials’. Here, the researcher gives a concise but
detailed description of the way data were collected. In
this part of the method section, the researcher explains
how s/he administered the test (orally, in written form,
etc.); whether or not s/he answered questions if there
were any; in what language instructions were given –
native or target; how long data collection took; what the
scoring procedure was, and so on. In short, in the
‘procedure’ section, the researcher gives a step by step
description of whatever was done so that if somebody
wishes to replicate the research, they know what to do.
Data Analysis
The last part of the method section, which
begins with a side heading with the initial letters
capitalized, is ‘data analysis’. Data analysis explains
what was done with the data after they were collected.
For example, the researcher explains what statistical
operations and tests, what formulae, etc. were used to
analyse the data.
Results
The results section is also centred and
capitalized. It is made up of two parts. Sometimes the
first part is referred to as ‘findings’ and the second part
as ‘discussion’. At other times, the first part may be
labelled ‘results’ and the second part ‘conclusion and
discussion’. Names do not really matter. What matters
is that the first part only gives the outcome of the
statistical operations done. It shows, for example, what
the observed value of t and the critical t value turned
out to be (supposing that a t-test was used). The second
part, the ‘discussion’ part, gives an interpretation of the
findings. It explains what the obtained results mean,
and what implications (theoretical or practical) they
may have.
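To make the distinction between the two parts concrete, the following is a minimal sketch of the figures a ‘findings’ part might report. It assumes an independent-samples t-test with pooled variance, which may differ from the formula used in earlier chapters. The scores, the group names, and the function name independent_t are hypothetical illustrations; the critical value is the one listed in Appendix 2 for 18 degrees of freedom at the .05 level (two-tailed).

```python
import math
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical test scores, invented for illustration only.
experimental = [14, 16, 15, 18, 17, 15, 16, 19, 17, 16]
control = [13, 14, 12, 15, 14, 13, 16, 14, 12, 15]

t_observed, df = independent_t(experimental, control)
t_critical = 2.101  # Appendix 2: df = 18, two-tailed test, .05 level

significant = abs(t_observed) > t_critical
print(f"t({df}) = {t_observed:.2f}, critical t = {t_critical}")
print("Reject the null hypothesis" if significant
      else "Fail to reject the null hypothesis")
```

The printed values belong to the first part of the results section. What the difference between the groups means, and what implications it has, belongs to the discussion.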
References
Finally, the last part of a research paper is the
reference section, in which all the books and other
sources of information used are listed as follows:
Author’s last name, author’s first name, author’s
middle name (usually initialised). Date of publication.
The title of the book (usually italicised or underlined).
Place of publication: name of the publisher. Here is an
example:
Hadley, Alice O. 2003. Teaching language in
context. Boston, Massachusetts: Heinle
& Heinle Publishers.
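The order of elements in a book entry can also be sketched as a small function. This is only an illustration of the format described above: the function name format_book_reference and its parameters are hypothetical, and plain text cannot show the italics or underlining of the title.

```python
def format_book_reference(last, first, middle_initial, year, title, place, publisher):
    """Return a book entry in the order: name. date. title. place: publisher."""
    name = f"{last}, {first}"
    if middle_initial:
        # The middle name is usually reduced to an initial.
        name += f" {middle_initial}"
    return f"{name}. {year}. {title}. {place}: {publisher}."


entry = format_book_reference(
    "Hadley", "Alice", "O", 2003,
    "Teaching language in context",
    "Boston, Massachusetts", "Heinle & Heinle Publishers",
)
print(entry)
```

Keeping every entry in the same order of elements is what allows readers to locate a source quickly.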
References for further reading
Farhady, H. 1995. Research methods in applied
linguistics. Tehran: Payame Noor University
Press.
Gravetter, F. J. & L. B. Wallnau. 1996. Statistics for
behavioural sciences: A first course for students
of psychology and education. 4th ed. St. Paul:
West.
Hatch, E. & H. Farhady. 1982. Research design and
statistics for applied linguistics. Rowley, Mass.:
Newbury House.
Kinnear, P. R. & C. D. Gray. 2000. SPSS for Windows
made simple: Release 10. Hove (UK):
Psychology Press.
Seliger, H. W. & E. Shohamy. 1989. Second language
research methods. Oxford: Oxford University
Press.
Winer, B. J., D. R. Brown, & K. M. Michels. 1991.
Statistical principles in experimental design.
New York: McGraw-Hill.
Appendices
Appendix 1
Proportions of area under the normal curve

Z       Area between    Area beyond
        mean and Z      Z
0.00    .0000           .5000
0.01    .0040           .4960
0.02    .0080           .4920
0.03    .0120           .4880
0.04    .0160           .4840
0.05    .0199           .4801
0.06    .0239           .4761
0.07    .0279           .4721
0.08    .0319           .4681
0.09    .0359           .4641
0.10    .0398           .4602
0.11    .0438           .4562
0.12    .0478           .4522
0.13    .0517           .4483
0.14    .0557           .4443
0.15    .0596           .4404
0.16    .0636           .4364
0.17    .0675           .4325
0.18    .0714           .4286
0.19    .0753           .4247
0.20    .0793           .4207
0.21    .0832           .4168
0.22    .0871           .4129
0.23    .0910           .4090
0.24    .0948           .4052
0.25    .0987           .4013
0.26    .1026           .3974
0.27    .1064           .3936
0.28    .1103           .3897
0.29    .1141           .3859
0.30    .1179           .3821
0.31    .1217           .3783
0.32    .1255           .3745
0.33    .1293           .3707
0.34    .1331           .3669
0.35    .1368           .3632
0.36    .1406           .3594
0.37    .1443           .3557
0.38    .1480           .3520
0.39    .1517           .3483
0.40    .1554           .3446
0.41    .1591           .3409
0.42    .1628           .3372
0.43    .1664           .3336
0.44    .1700           .3300
0.45    .1736           .3264
0.46    .1772           .3228
0.47    .1808           .3192
0.48    .1844           .3156
0.49    .1879           .3121
0.50    .1915           .3085
0.51    .1950           .3050
0.52    .1985           .3015
0.53    .2019           .2981
0.54    .2054           .2946
0.55    .2088           .2912
0.56    .2123           .2877
0.57    .2157           .2843
0.58    .2190           .2810
0.59    .2224           .2776
0.60    .2257           .2743
0.61    .2291           .2709
0.62    .2324           .2676
0.63    .2357           .2643
0.64    .2389           .2611
0.65    .2422           .2578
0.66    .2454           .2546
0.67    .2486           .2514
0.68    .2517           .2483
0.69    .2549           .2451
0.70    .2580           .2420
0.71    .2611           .2389
0.72    .2642           .2358
0.73    .2673           .2327
0.74    .2704           .2296
0.75    .2734           .2266
0.76    .2764           .2236
0.77    .2794           .2206
0.78    .2823           .2177
0.79    .2852           .2148
0.80    .2881           .2119
0.81    .2910           .2090
0.82    .2939           .2061
0.83    .2967           .2033
0.84    .2995           .2005
0.85    .3023           .1977
0.86    .3051           .1949
0.87    .3078           .1922
0.88    .3106           .1894
0.89    .3133           .1867
0.90    .3159           .1841
0.91    .3186           .1814
0.92    .3212           .1788
0.93    .3238           .1762
0.94    .3264           .1736
0.95    .3289           .1711
0.96    .3315           .1685
0.97    .3340           .1660
0.98    .3365           .1635
0.99    .3389           .1611
1.00    .3413           .1587
1.01    .3438           .1562
1.02    .3461           .1539
1.03    .3485           .1515
1.04    .3508           .1492
1.05    .3531           .1469
1.06    .3554           .1446
1.07    .3577           .1423
1.08    .3599           .1401
1.09    .3621           .1379
1.10    .3643           .1357
1.11    .3665           .1335
1.12    .3686           .1314
1.13    .3708           .1292
1.14    .3729           .1271
1.15    .3749           .1251
1.16    .3770           .1230
1.17    .3790           .1210
1.18    .3810           .1190
1.19    .3830           .1170
1.20    .3849           .1151
1.21    .3869           .1131
1.22    .3888           .1112
1.23    .3907           .1093
1.24    .3925           .1075
1.25    .3944           .1056
1.26    .3962           .1038
1.27    .3980           .1020
1.28    .3997           .1003
1.29    .4015           .0985
1.30    .4032           .0968
1.31    .4049           .0951
1.32    .4066           .0934
1.33    .4082           .0918
1.34    .4099           .0901
1.35    .4115           .0885
1.36    .4131           .0869
1.37    .4147           .0853
1.38    .4162           .0838
1.39    .4177           .0823
1.40    .4192           .0808
1.41    .4207           .0793
1.42    .4222           .0778
1.43    .4236           .0764
1.44    .4251           .0749
1.45    .4265           .0735
1.46    .4279           .0721
1.47    .4292           .0708
1.48    .4306           .0694
1.49    .4319           .0681
1.50    .4332           .0668
1.51    .4345           .0655
1.52    .4357           .0643
1.53    .4370           .0630
1.54    .4382           .0618
1.55    .4394           .0606
1.56    .4406           .0594
1.57    .4418           .0582
1.58    .4429           .0571
1.59    .4441           .0559
1.60    .4452           .0548
1.61    .4463           .0537
1.62    .4474           .0526
1.63    .4484           .0516
1.64    .4495           .0505
1.65    .4505           .0495
1.66    .4515           .0485
1.67    .4525           .0475
1.68    .4535           .0465
1.69    .4545           .0455
1.70    .4554           .0446
1.71    .4564           .0436
1.72    .4573           .0427
1.73    .4582           .0418
1.74    .4591           .0409
1.75    .4599           .0401
1.76    .4608           .0392
1.77    .4616           .0384
1.78    .4625           .0375
1.79    .4633           .0367
1.80    .4641           .0359
1.81    .4649           .0351
1.82    .4656           .0344
1.83    .4664           .0336
1.84    .4671           .0329
1.85    .4678           .0322
1.86    .4686           .0314
1.87    .4693           .0307
1.88    .4699           .0301
1.89    .4706           .0294
1.90    .4713           .0287
1.91    .4719           .0281
1.92    .4726           .0274
1.93    .4732           .0268
1.94    .4738           .0262
1.95    .4744           .0256
1.96    .4750           .0250
1.97    .4756           .0244
1.98    .4761           .0239
1.99    .4767           .0233
2.00    .4772           .0228
2.01    .4778           .0222
2.02    .4783           .0217
2.03    .4788           .0212
2.04    .4793           .0207
2.05    .4798           .0202
2.06    .4803           .0197
2.07    .4808           .0192
2.08    .4812           .0188
2.09    .4817           .0183
2.10    .4821           .0179
2.11    .4826           .0174
2.12    .4830           .0170
2.13    .4834           .0166
2.14    .4838           .0162
2.15    .4842           .0158
2.16    .4846           .0154
2.17    .4850           .0150
2.18    .4854           .0146
2.19    .4857           .0143
2.20    .4861           .0139
2.21    .4864           .0136
2.22    .4868           .0132
2.23    .4871           .0129
2.24    .4875           .0125
2.25    .4878           .0122
2.26    .4881           .0119
2.27    .4884           .0116
2.28    .4887           .0113
2.29    .4890           .0110
2.30    .4893           .0107
2.31    .4896           .0104
2.32    .4898           .0102
2.33    .4901           .0099
2.34    .4904           .0096
2.35    .4906           .0094
2.36    .4909           .0091
2.37    .4911           .0089
2.38    .4913           .0087
2.39    .4916           .0084
2.40    .4918           .0082
2.41    .4920           .0080
2.42    .4922           .0078
2.43    .4925           .0075
2.44    .4927           .0073
2.45    .4929           .0071
2.46    .4931           .0069
2.47    .4932           .0068
2.48    .4934           .0066
2.49    .4936           .0064
2.50    .4938           .0062
2.51    .4940           .0060
2.52    .4941           .0059
2.53    .4943           .0057
2.54    .4945           .0055
2.55    .4946           .0054
2.56    .4948           .0052
2.57    .4949           .0051
2.58    .4951           .0049
2.59    .4952           .0048
2.60    .4953           .0047
2.61    .4955           .0045
2.62    .4956           .0044
2.63    .4957           .0043
2.64    .4959           .0041
2.65    .4960           .0040
2.66    .4961           .0039
2.67    .4962           .0038
2.68    .4963           .0037
2.69    .4964           .0036
2.70    .4965           .0035
2.71    .4966           .0034
2.72    .4967           .0033
2.73    .4968           .0032
2.74    .4969           .0031
2.75    .4970           .0030
2.76    .4971           .0029
2.77    .4972           .0028
2.78    .4973           .0027
2.79    .4974           .0026
2.80    .4974           .0026
2.81    .4975           .0025
2.82    .4976           .0024
2.83    .4977           .0023
2.84    .4977           .0023
2.85    .4978           .0022
2.86    .4979           .0021
2.87    .4979           .0021
2.88    .4980           .0020
2.89    .4981           .0019
2.90    .4981           .0019
2.91    .4982           .0018
2.92    .4982           .0018
2.93    .4983           .0017
2.94    .4984           .0016
2.95    .4984           .0016
2.96    .4985           .0015
2.97    .4985           .0015
2.98    .4986           .0014
2.99    .4986           .0014
3.00    .4987           .0013
3.01    .4987           .0013
3.02    .4987           .0013
3.03    .4988           .0012
3.04    .4988           .0012
3.05    .4989           .0011
3.06    .4989           .0011
3.07    .4989           .0011
3.08    .4990           .0010
3.09    .4990           .0010
3.10    .4990           .0010
3.11    .4991           .0009
3.12    .4991           .0009
3.13    .4991           .0009
3.14    .4992           .0008
3.15    .4992           .0008
3.16    .4992           .0008
3.17    .4992           .0008
3.18    .4993           .0007
3.19    .4993           .0007
3.20    .4993           .0007
3.21    .4993           .0007
3.22    .4994           .0006
3.23    .4994           .0006
3.24    .4994           .0006
3.25    .4994           .0006
3.30    .4995           .0005
3.35    .4996           .0004
3.40    .4997           .0003
3.45    .4997           .0003
3.50    .4998           .0002
3.60    .4998           .0002
3.70    .4999           .0001
3.80    .4999           .0001
3.90    .49995          .00005
4.00    .49997          .00003
Taken from Farhady, H. 1995. Research methods in applied
linguistics. Tehran: Payame Noor University Press.
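The areas in this appendix can also be computed directly. The following is a minimal sketch, assuming the standard normal curve and using the error function from Python's standard library; the function names are hypothetical and are not taken from the sources cited above.

```python
import math


def area_between_mean_and_z(z):
    """Proportion of the normal curve lying between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def area_beyond_z(z):
    """Proportion of the normal curve lying beyond z, in one tail."""
    return 0.5 - area_between_mean_and_z(z)


# Print a few rows in the same layout as the table: Z, area between, area beyond.
for z in (1.00, 1.96, 2.58):
    print(f"{z:.2f}  {area_between_mean_and_z(z):.4f}  {area_beyond_z(z):.4f}")
```

Since the two areas in any row cover exactly one half of the curve, they always add up to .5000. This provides a quick check on any value read from the table.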
Appendix 2
Critical values of t

            Level of significance for one-tailed test
            .10      .05      .025     .01      .005     .0005
            Level of significance for two-tailed test
Degrees of  .20      .10      .05      .02      .01      .001
freedom
1           3.078    6.314    12.706   31.821   63.657   636.619
2           1.886    2.920    4.303    6.965    9.925    31.598
3           1.638    2.353    3.182    4.541    5.841    12.941
4           1.533    2.132    2.776    3.747    4.604    8.610
5           1.476    2.015    2.571    3.365    4.032    6.859
6           1.440    1.943    2.447    3.143    3.707    5.959
7           1.415    1.895    2.365    2.998    3.499    5.405
8           1.397    1.860    2.306    2.896    3.355    5.041
9           1.383    1.833    2.262    2.821    3.250    4.781
10          1.372    1.812    2.228    2.764    3.169    4.587
11          1.363    1.796    2.201    2.718    3.106    4.437
12          1.356    1.782    2.179    2.681    3.055    4.318
13          1.350    1.771    2.160    2.650    3.012    4.221
14          1.345    1.761    2.145    2.624    2.977    4.140
15          1.341    1.753    2.131    2.602    2.947    4.073
16          1.337    1.746    2.120    2.583    2.921    4.015
17          1.333    1.740    2.110    2.567    2.898    3.965
18          1.330    1.734    2.101    2.552    2.878    3.922
19          1.328    1.729    2.093    2.539    2.861    3.883
20          1.325    1.725    2.086    2.528    2.845    3.850
21          1.323    1.721    2.080    2.518    2.831    3.819
22          1.321    1.717    2.074    2.508    2.819    3.792
23          1.319    1.714    2.069    2.500    2.807    3.767
24          1.318    1.711    2.064    2.492    2.797    3.745
25          1.316    1.708    2.060    2.485    2.787    3.725
26          1.315    1.706    2.056    2.479    2.779    3.707
27          1.314    1.703    2.052    2.473    2.771    3.690
28          1.313    1.701    2.048    2.467    2.763    3.674
29          1.311    1.699    2.045    2.462    2.756    3.659
30          1.310    1.697    2.042    2.457    2.750    3.646
40          1.303    1.684    2.021    2.423    2.704    3.551
60          1.296    1.671    2.000    2.390    2.660    3.460
120         1.289    1.658    1.980    2.358    2.617    3.373
+120        1.282    1.645    1.960    2.326    2.576    3.291
Taken from Hatch, E. & H. Farhady. 1982. Research design and
statistics for applied linguistics. Rowley, Mass.: Newbury House.
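Looking up a critical value can be sketched as follows. The sketch covers only the column for a two-tailed test at the .05 level. It also assumes a common convention that is not stated in the table itself: when the exact degrees of freedom are not listed (for example, 35), the nearest smaller tabled value is used, which gives a slightly stricter criterion. The names TABLED_DF, CRITICAL_T_05, and critical_t are hypothetical.

```python
import bisect

# Degrees of freedom listed in Appendix 2 (the "+120" row is handled separately).
TABLED_DF = list(range(1, 31)) + [40, 60, 120]

# Critical values of t for a two-tailed test at the .05 level, in the same order.
CRITICAL_T_05 = [
    12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
    2.201, 2.179, 2.160, 2.145, 2.131, 2.120, 2.110, 2.101, 2.093, 2.086,
    2.080, 2.074, 2.069, 2.064, 2.060, 2.056, 2.052, 2.048, 2.045, 2.042,
    2.021, 2.000, 1.980,
]
CRITICAL_T_05_ABOVE_120 = 1.960  # the "+120" row


def critical_t(df):
    """Return the tabled critical t (two-tailed, .05) for the given df."""
    if df < 1:
        raise ValueError("degrees of freedom must be at least 1")
    if df > 120:
        return CRITICAL_T_05_ABOVE_120
    # Assumption: fall back to the nearest smaller tabled df when df is not listed.
    index = bisect.bisect_right(TABLED_DF, df) - 1
    return CRITICAL_T_05[index]


print(critical_t(18), critical_t(35), critical_t(500))
```

An observed t whose absolute value exceeds the value returned here would be significant at the .05 level for a two-tailed test.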