Download CNM Handbook for Educational Research (doc)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Psychometrics wikipedia, lookup

Operations research wikipedia, lookup

History of statistics wikipedia, lookup

Time series wikipedia, lookup

Human subject research wikipedia, lookup

Misuse of statistics wikipedia, lookup

Transcript
CNM HANDBOOK FOR
EDUCATIONAL RESEARCH
Central New Mexico Community College
Ursula Waln, Director of Student Learning Assessment, and Fred Ream, Mathematics Instructor
2014
Table of Contents
Purpose of this Handbook ...................................................................................................................................... 1
IRB and Educational Research .............................................................................................................................. 1
How the IRB Came to Be ................................................................................................................................... 1
The Role of the IRB ........................................................................................................................................... 2
Research Involving Human Subjects ................................................................................................................. 2
IRB Exemptions ................................................................................................................................................. 3
IRB Criteria ........................................................................................................................................................ 4
Informed Consent ............................................................................................................................................... 5
Research Design..................................................................................................................................................... 6
Quantitative and Qualitative Options ................................................................................................................. 6
Sampling Methods.............................................................................................................................................. 9
Statistics ............................................................................................................................................................... 10
Data Types........................................................................................................................................................ 10
Analytic Approaches ........................................................................................................................................ 10
Figure 1: Statistical Test Decision-Making Flowchart ................................................................................. 11
Refresher of Basic Concepts ............................................................................................................................ 12
Figure 2: Normal Distribution ...................................................................................................................... 14
Figure 3: Illustration of Outliers ................................................................................................................... 15
Analytic Software Programs ............................................................................................................................ 16
Preliminary Data Analysis Tips ....................................................................................................................... 16
Glossary ............................................................................................................................................................... 17
References ............................................................................................................................................................ 19
Appendix A: Recommendations from CNM’s IRB............................................................................................. 20
Appendix B: Preview of CNM IRB Application Form ....................................................................................... 21
PURPOSE OF THIS HANDBOOK
This handbook was developed to foster depth of intentionality and critical analysis in CNM student outcomes
assessment. It builds on the information presented in the CNM Handbook for Outcomes Assessment and is based
on the premises that 1) student outcomes assessment is a form of educational research, and 2) considering one’s
assessment efforts as research may suggest more creative, purposeful, and meaningful approaches. The
following pages offer information and tips for those who wish to develop a greater degree of sophistication in
their assessment approach and/or have their assessment efforts render more useful information.
IRB AND EDUCATIONAL RESEARCH
If you are interested in conducting research that is not purely operational in nature, is experimental, is for a
dissertation or master’s thesis, or is going to be used outside of CNM in any way, your project requires prior
approval by the CNM Institutional Review Board (IRB). If you think there is any possibility at all that you will
ever want to publish your findings, use the data for other publishable research, or do anything else with your
findings outside of CNM, please get IRB approval before beginning any data collection. The question of
whether your particular study must have IRB approval hinges on whether or not it meets the definition of
‘research involving human subjects.’ However, for any study that is not operational in nature and going to be
used only within CNM, the IRB, not the researcher, must make this determination, based on an application. If
you have any doubt at all, submit an application. The process itself can help you organize your research plans,
so the time won’t be wasted, and you’ll know you covered your bases.
A requirement for IRB approval need not deter faculty from pursuing educational research projects. Indeed,
structured research into student learning not only has great potential for informing internal improvement efforts,
but also can produce insights with the potential to impact the broader profession and/or instructional
community. The application form for submitting a project for IRB approval is shown in Appendix B and can be
accessed at http://www.cnm.edu/depts/planning/instres/irb.
If you are absolutely certain your project does not require IRB approval and you already know all you need to
know about the IRB, you may want to skip ahead to the research design section, page 11.
How the IRB Came to Be
The existence of IRBs for certification of research conducted at colleges and university is required under The
National Research Act of 1974. Although certainly not the first example of a societal effort to regulate
experimentation involving humans, this federal statute came about in response to medical research that violated
ethical principles deemed fundamental to American values and basic human rights. Salient examples include the
medical experiments conducted in Nazi concentration camps between 1939 and 1945, (which led to the
establishment of the first international code of research ethics, the Nuremburg Code) and the U.S. Public Health
Service’s Tuskegee Syphilis Study of 1932-1972, in which 400 African American men known to have syphilis
had information and treatment withheld so that researchers could observe the progression of the disease
(Schneider).
In 1974, Congress established the National Commission for the Protection of Human Subjects of Biomedical
and Behavioral Research and tasked it with identifying ethical principles and developing guidelines for the
conduct of research involving human subjects. In 1979, the Commission published the Belmont Report, which
summarized three basic principles:
1. Respect for persons
o
Individuals should be treated as autonomous agents
1
o
Persons with diminished autonomy are entitled to additional protections
2. Beneficence
o
Do no harm
o
Maximize possible benefits and minimize possible harms
3. Justice
o
Requires that individuals and groups be treated fairly and equitably in terms of bearing the
burdens and receiving the benefits of research
-- NIH, p. 21
The Belmont Report principles and guidelines were codified in the Title 45 Code of Federal Regulations, Part
46, and oversight was given to the U.S. Department of Health, Education and Welfare (which later split into the
Department of Health and Human Services and the Department of Education).
In 1991, The U.S. Department of Education became one of fifteen federal departments and agencies to codify in
separate regulations the “Common Rule,” including in its 34 CFR Part 97 language identical to 45 CFR 46,
Subpart A (NIH). Today, seventeen federal agencies fall under the Common Rule, and oversight for all is
provided by the Office for Human Research Protections, a division of the U.S. Department of Health and
Human Services.
The Role of the IRB
At the institutional level, “The IRB is an administrative body established to protect the rights and welfare of
human research subjects recruited to participate in research activities conducted under the auspices of the
institution with which it is affiliated” (IRB Guidebook, p. 1). IRBs determine “the acceptability of proposed
research in terms of institutional commitments and regulations, applicable law, and standards of professional
conduct and practice” (45 CFR 46.107).
The major roles of IRBs in the oversight of research are:
1. Evaluating the respect for persons, beneficence, and justice of all research activities proposed under the
auspices of the institution and providing approval or disapproval accordingly.
2. Ensuring that the process proposed to collect informed consent meets regulatory requirements.
3. Overseeing the progress and protocols of ongoing research studies.
-- NIH, p. 84
Research Involving Human Subjects
The CFR provides the following definitions:
…(d) Research means a systematic investigation, including research development, testing and evaluation,
designed to develop or contribute to generalizable knowledge. Activities which meet this definition
constitute research for purposes of this policy, whether or not they are conducted or supported under a
program which is considered research for other purposes. For example, some demonstration and service
programs may include research activities.
(e) Research subject to regulation, and similar terms are intended to encompass those research activities
for which a federal department or agency has specific responsibility for regulating as a research
activity, (for example, Investigational New Drug requirements administered by the Food and Drug
2
Administration). It does not include research activities which are incidentally regulated by a federal
department or agency solely as part of the department's or agency's broader responsibility to regulate
certain types of activities whether research or non-research in nature (for example, Wage and Hour
requirements administered by the Department of Labor).
(f) Human subject means a living individual about whom an investigator (whether professional or student)
conducting research obtains
(1) Data through intervention or interaction with the individual, or
(2) Identifiable private information.
Intervention includes both physical procedures by which data are gathered (for example,
venipuncture) and manipulations of the subject or the subject's environment that are performed
for research purposes.
Interaction includes communication or interpersonal contact between investigator and subject.
Private information includes information about behavior that occurs in a context in which an
individual can reasonably expect that no observation or recording is taking place, and
information which has been provided for specific purposes by an individual and which the
individual can reasonably expect will not be made public (for example, a medical record).
Private information must be individually identifiable (i.e., the identity of the subject is or may
readily be ascertained by the investigator or associated with the information) in order for
obtaining the information to constitute research involving human subjects…
-- 45 CFR 46.102
IRB Exemptions
The CNM IRB, with endorsement from the VPAA, Deans’ Council, Chairs’ Council, and Faculty Senate,
documented its Recommendations for Research in 2012, stating that “Some types of research are exempt from
IRB review: student projects for class, data collection (e.g., surveys) by faculty or staff, assessment data
collection, student course evaluations, and research by the college that is not designed for publication”
(Appendix A).
Whether or not an investigation is subject to IRB approval, the CNM IRB (on its homepage) recommends
following the basic rules listed in its document Recommendations for Research. In the same vein, it is advisable
for anyone planning educational research at CNM to be familiar with and follow the IRB criteria and informed
consent guidelines in the following sections of this handbook, whether or not the research requires IRB
approval.
Forming the basis for the CNM statements of exemption are the following excerpts from 45 CFR Part 46, and
the corresponding educational regulations in 34 CFR Part 97 (formatting enhanced for readability).
…(b) Unless otherwise required by department or agency heads, research activities in which the only
involvement of human subjects will be in one or more of the following categories are exempt from this
policy:
(1) Research conducted in established or commonly accepted educational settings, involving normal
educational practices, such as
(i) research on regular and special education instructional strategies, or
(ii) research on the effectiveness of or the comparison among instructional techniques, curricula, or
classroom management methods.
3
(2) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement),
survey procedures, interview procedures or observation of public behavior, unless:
(i) information obtained is recorded in such a manner that human subjects can be identified,
directly or through identifiers linked to the subjects; and
(ii) any disclosure of the human subjects' responses outside the research could reasonably place the
subjects at risk of criminal or civil liability or be damaging to the subjects' financial standing,
employability, or reputation.
(3) Research involving the use of educational tests (cognitive, diagnostic, aptitude, achievement),
survey procedures, interview procedures, or observation of public behavior that is not exempt under
paragraph (b)(2) of this section, if:
(i) the human subjects are elected or appointed public officials or candidates for public office; or
(ii) federal statute(s) require(s) without exception that the confidentiality of the personally
identifiable information will be maintained throughout the research and thereafter.
(4) Research involving the collection or study of existing data, documents, records, pathological
specimens, or diagnostic specimens, if these sources are publicly available or if the information is
recorded by the investigator in such a manner that subjects cannot be identified, directly or through
identifiers linked to the subjects…
For the CNM IRB Recommendations for Research, see Appendix A or go to
https://www.cnm.edu/depts/planning/instres/irb/documents/RECOMMENDATIONS-FOR-RESEARCH.pdf.
IRB Criteria
The following CFR excerpts hit the highlights of the federal criteria for IRB approval:
1.
Risks to human subjects are minimized…
2.
Risks to human subjects are reasonable in relation to anticipated benefits, if any, to human subjects and
the importance of the knowledge that may reasonably be expected to result…
3.
Selection of human subjects is equitable…
4.
Informed consent will be sought from each prospective research participant or the prospective research
participant’s legally authorized representative in accordance with and to the extent required by §46.116.
5.
Informed consent will be appropriately documented in accordance with and to the extent required by
§46.117.
6.
When appropriate, the research plan makes adequate provision for monitoring the data collected to
ensure the safety of subjects
7.
When appropriate, there are adequate provisions to protect the privacy of subjects and to maintain the
confidentiality of data.
-- 45 CFR 46.111
The federal regulations can be accessed in their entirety at
http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html or at
http://www2.ed.gov/policy/fund/reg/humansub/part97.html.
In addition, an Institutional Review Board Guidebook, published by the HHS, is available at
http://www.hhs.gov/ohrp/archive/irb/irb_guidebook.htm.
4
And, the CNM IRB web site and links to documents can be found at
http://www.cnm.edu/depts/planning/instres/irb.
Informed Consent
The principle of respect for persons requires that investigators conducting research on human subjects obtain
informed consent in the form of “[a] legally-effective, voluntary agreement that is given by a prospective
research participant following comprehension and consideration of all relevant information pertinent to the
decision to participate in a study” (NIH, p. 117).
Autonomous adults can provide their own informed consent. (The age of majority in New Mexico is eighteen.)
However, research involving children or other vulnerable populations requires informed consent from not only
the subject (whenever the individual is reasonably capable of providing or denying consent), but also the
subject’s legally authorized representative. In some cases (e.g., specific cultural settings), community or family
consent may also be required.
Informed consent consists of voluntariness, comprehension, and disclosure:
Voluntariness: Consent must be given free of coercion, and the perceived value of any inducements
must be small enough to avoid undue influence on the decision to participate. Care must be taken to
ensure that subjects are not manipulated to ignore the risks of participation either through fear of
repercussions for refusal or through motivation by excessive or inappropriate rewards. Offering bonus
points toward a final course grade would be an example of an inappropriate incentive. However, in some
cases, minimal compensation for participants’ time and effort may be appropriate.
Comprehension: To provide informed consent, subjects must first understand the information presented
to them sufficiently to make a reasoned, informed decision. Subjects must therefore have the cognitive
capacity to understand (or else be represented by legally authorized individuals), and the information
must be presented using language and methods of delivery appropriate to the people making the
decision.
Disclosure: Information that must be disclosed for participants to be able to provide informed consent
include the purpose, risks, and benefits of the research; alternatives to the research protocol; the extent to
which confidentiality can be protected; compensation in case of injury; contact information for questions
or concerns; and conditions of participation, including the right to refuse or withdraw participation
without penalty. (Compensation may be terminated proportionately upon withdrawal, but any offer of
compensation for participation prior to the time of withdrawal must be honored. Telling prospective
subjects they must complete the study or forfeit all compensation would be an example of a penalty.)
Waivers regarding the disclosure of specific information may be considered by the IRB when the research poses
no more than minimal risk to the participants, withholding the information will not adversely affect the rights or
welfare of the participants, and the research could not practicably be carried out otherwise. When information is
withheld, it should be disclosed following the conclusion of research unless doing so might cause harm.
“No informed consent, whether oral or written, may include any exculpatory language through which the
subject or the representative is made to waive or appear to waive any of the subject's legal rights, or releases or
appears to release the investigator, the sponsor, the institution or its agents from liability for negligence” (45
CFR 46.116).
Finally, informed consent should be viewed as an ongoing process, not just a one-time event. Participants
should be kept informed and given opportunities to ask questions as the research is carried out.
5
RESEARCH DESIGN
For the purposes of CNM student learning assessment, the primary consideration in designing your research
should be what you want to know related to the development of your program competencies (student learning
outcome statements). Using that as your starting point, you can employ ‘reverse design’ principles to map out
the steps you will take toward gaining the insights you seek. Depending on the type of information you want to
collect and what you want to do with it, your research design may or may not include the following:

A written statement of the issue to be investigated

Identification of the population to be studied

A description of relevant background information

A literature review

A written rationale

A hypothesis and/or null hypothesis (or just a statement of what you expect to find)

Identification of methodology
o Sampling methods
o Instruments
o Procedures for administration
o Data analysis methods

A timeline
Quantitative and Qualitative Options
Research approaches are often described as quantitative or qualitative, though the two descriptors more
accurately represent ends of a continuum than distinct categories. Both extremes seek to answer questions using
systematic procedures, collecting evidence, and examining findings. However, where a research project falls on
the continuum is determined by its objective(s), the types of research questions posed, the methodologies
employed, and the types of data produced. Expectations for researchers differ as well.
Quantitative research tends to have as an objective obtaining generalizable findings. Questions posed tend to
focus on associations between variables, methodologies employed tend to involve studying specific variables,
and data produced tends to be numerical. Quantitative investigators are typically expected to adhere to
scientifically validated research models and techniques and to remain objective.
Qualitative research, on the other hand, tends to have as an objective exploring the dynamics of a specific
context. Questions posed tend to focus on understanding factors that influence outcomes, methodologies
employed tend to be open-ended and exploratory, and data produced tends to be textual (recorded observations,
participant comments, etc.). Qualitative investigators are typically considered free to adapt their research
models to fit the changing circumstances of their investigations and to render subjective interpretations.
Randomized controlled trials represent the quantitative end of the spectrum. In a nutshell, such investigations
provide an intervention with a randomly selected experimental group and compare the results to those of a
control group (a group that does not receive the intervention). Here are some grouping models commonly used
in randomized controlled trials:
6

Parallel grouping: Each student is randomly assigned to
an experimental or control group; the experimental
group receives the intervention and the control group
does not – but may receive a dummy/placebo
intervention, to minimize the potential for participant
expectations to influence the outcomes. (In educational
research, a placebo may be, for example, an
instructional method that has already been shown to not
produce the effect being researched.)

Cluster grouping: Pre-existing groups of students (such
as course sections) are randomly selected to receive or
not receive the intervention

Crossover grouping: Each student receives or does not
receive the intervention in a random sequence

Factorial grouping: Used when more than one
intervention is tested simultaneously to determine
whether there is a compound effect; each student is
randomly assigned a group that receives a particular
combination of interventions (AA, AB, AC, & BC,
where A = no intervention, B = intervention 1, and C =
Intervention 2)
A common crossover approach is to switch the treatment and
non-treatment groups at the midpoint of the study. Crossover
grouping minimizes the potential for group differences to
influence the findings. In educational research, the
effectiveness of a cross-over approach depends in part upon
whether the intervention being studied is influenced by timing.
A Source of Some Confusion
The word quantitative may sometimes seem a
little slippery because people use it differently
depending on whether they are describing data,
measurement methods, or research design.
Quantitative data is information derived from
direct numerical measurement. Height, weight,
area, pounds per square inch, counts of students,
proportions of responses, and GPA are examples.
Representing something as a number when it is
not directly, numerically measurable (such as
pain rated on a scale of 1 to 10 or level of effort
on a scale of 0 to 4) does not make it quantitative
data. It is a common error to refer to qualitative
data that has been numerically represented as
quantitative.
Readers will find disagreement in the literature
about the classification of measurement methods
as quantitative versus qualitative. Applying the
same distinction used for classifying data,
quantitative measurement methods are those that
involve direct, numerical measures. However,
common usage lumps together all measurement
methods that produce numerical data (especially
when conducted in systematic ways) as
quantitative measurement methods. As a result,
Likert-scale questionnaire items and rubric-based
evaluations are commonly referred to as
quantitative methods even though the data they
comprise is qualitative data.
At the research design level, quantitative
essentially means the researchers intend to use
inferential statistics to make generalizations
beyond the study sample.
For example, let’s say a foreign language instructor randomly selects half of her students to participate in an
emersion program during the first half of the semester only and then has the other half of the students
participate in the emersion program during the second half of the semester. The timing of the emersion program
could turn out to be a significant factor. Students who develop basic vocabulary and syntactic awareness
through classroom instruction during the first eight weeks may have a decoding advantage when they enter the
emersion program. On the other hand, students who experience the emersion program during the first eight
weeks may develop greater awareness of cadence, pronunciation, and speech patterns, and applying these,
advance more effectively through the conventional instruction during the second eight weeks. Anticipating
these possibilities, the instructor may decide to include in the design a third group, comprised of students who
do not participate in the emersion program at all.
A subset of randomized controlled trials, called randomized double-blind placebo-controlled crossover studies,
is generally considered the ‘gold standard’ in quantitative scientific research (particularly pharmaceutical
research). And, while it need not be the goal of every educational researcher to conduct such a study, an
understanding of the rationale supporting the design will help inform design decisions.
Double-blind means neither the researchers in contact with the subjects nor the subjects know who is receiving
the ‘treatment’ and who is not. This minimizes the potential for either the researchers or the subjects to
influence outcomes based on their expectations. In educational research, conducing a double-blind study would
require that the educational intervention be provided by someone other than the investigator.
7
For example, let’s say a researcher wants to find out whether students approached outside of class by a peer and
asked to explain core course concepts perform significantly better on subsequent exams than students not
questioned in this way. To make this a double-blind study, the students must have no inkling that being
questioned by a peer is an intervention, and the researcher must remain ignorant regarding which students are
questioned until after the data collection is complete.
At the other end of the spectrum, qualitative research usually seeks to answer a question related to the
perspectives of an involved population. Qualitative research can be particularly useful in studying attitudes,
values, opinions, perceptions, beliefs, behaviors, emotions, relationships, social situations, experiences, and
other aspects of individual, community, or cultural contexts. Although insights gained may have the potential
for extension to similar populations or situations, qualitative research usually focuses more on delving into the
factors that influence outcomes within the specific study population than on being able to generalize the
findings to other populations. Qualitative research explores and interprets how people experience the complex
reality of given issues and/or situations.
For example, imagine CNM researchers want to better understand why area employers within a specific sector
do not hire as many qualified CNM alumni as might be expected on the basis of job openings. The objective of
their investigation will be to identify factors influencing the decisions of the local employers. Their findings
might ultimately reveal something about employer perceptions that could be useful, by extension, to other CNM
programs or other colleges with similar programs. However, their primary goal is not finding something that
can help other programs, but rather, better understanding what their students’ prospective employers want so
that the program faculty might better address the preparation of candidates.
Research within the field of education is often driven by context-specific questions posed by faculty and
therefore tends toward qualitative design. Indeed, large-scale quantitative educational studies are relatively rare,
and generalizations made on their basis tend to be subject to dismissal because of the difficulties inherent in
applying scientific models to studies of humans that take place in natural settings, especially when the
conclusions contradict strongly-held convictions about teaching and learning. (For an illustrative example,
consider Project Follow Through, “the largest, most expensive educational experiment ever conducted”
[Adams, 1996].)
Because investigators conducting qualitative research typically care more about the usefulness of the
information gathered in supporting interpretation and insight than about the reproducibility of the study,
qualitative research tends to be characterized by adjustability and/or fluidity. The design needs to facilitate
collection of the desired information, not stand up to scientific scrutiny. Stated another way, validity of the
findings is important in qualitative research, but reliability is not.
Observation, interviews, and focus groups are methods commonly used in qualitative research. The data
produced may be a mix of direct and indirect measurements and/or textual information with no associated
numeric values (such as recorded dialogs, comments, field notes, responses to open-ended questions, etc.). Input
using categorization, holistic ratings, Likert scales, rubrics, and/or forced-choice responses may be numerically
represented to facilitate statistical analysis.
Note that the use of statistics is not the exclusive domain of quantitative research; however, while quantitative
research tends to employ inferential statistics, qualitative research tends to incorporate descriptive statistics.
These categories are explored further in the Statistics section of this handbook.
8
Sampling Methods
A population is all of a group of interest (in New Mexico, the whole enchilada). A sample is a subset of the
population being studied (in New Mexico, some of the enchilada). Sampling can make it feasible to implement
research methods that would otherwise be impossible or too time consuming to consider.
Sampling methods fall into two broad categories: those that offer the potential to support inferences about the
entire population (probability methods) and those that do not (non-probability methods). Probability methods
are typically used with quantitative research designs and inferential statistics; whereas, non-probability methods
are more often (but not always) used with qualitative designs and descriptive statistics.
If you want to be relatively sure that the results you obtain from your sampled students are generalizable to the
population of students in your course and/or program, consider using a probability method, such as simple,
systematic, stratified, or clustered random sampling. Below are some examples:

Simple random sample: Drawing names from a hat (For the sample to be truly random, all of the
possible names must be in the hat to begin with. If you systematically leave out any group of people
[e.g., students who are failing the course], then your results will not be generalizable to that group.)

Systematic random sample: Counting off every 5th student who comes through the door

Stratified sample: Randomly selecting 10 first-year students, 10 second-year students, and 10 recent
graduates

Cluster sample: Randomly selecting 6 students from each of five sections of the same course
To be confident in your inferences, you will want to select a sample that is big enough to be representative of
the population you are studying. Consider using an online sample-size calculator, such as the one provided by
Creative Research Systems at http://www.surveysystem.com/sscalc.htm, to find out how many students you
need to include in your sample to achieve the level of statistical reliability you want. Then, you can use an
online random integer generator, such as the one at http://www.random.org/integers/, to get numbers that you
can apply to a numbered list of all students in the research ‘population’ to create your random sample.
On the other hand, if you don’t care about generalizing your sample’s results to a larger population, you might
prefer to use non-probability methods such as convenience, purposive, snowball, quota, or theoretical sampling.
Just keep in mind that if you choose any of these, the statistics you obtain will not support claims that your
findings can be generalized to the entire population. Following are some examples:

Convenience sample: Using your own class as a sample

Purposive sample: Asking all of the African American students in your program to participate in a focus
group to provide feedback regarding their perceptions of cultural inclusiveness within the program

Snowball sample: Interviewing a few people and asking them to refer their friends to you for interviews

Quota sampling: Selecting 200 students in such a way that your sample has essentially the same
proportionate representation of ethnic groups seen in your program

Theoretical sample: Questioning a few students about something you’re considering studying further
9
STATISTICS
Depending on the type of assessment/research you conduct, you may or may not need to include statistical
analyses. This final section is included for those who do plan to use statistics but are not expert statisticians.
If you are planning to conduct a formal study, even an in-house one, and are not a statistician, consider
consulting with someone who is a statistician during the planning phase, before you begin collecting any data.
This will help ensure that the data collected will be able to provide the information you seek and shed useful
light on the research topic. Also, consider including a statistician in your plans for conducting the statistical
analysis once you have collected the data.
Data Types
Although as a branch of mathematics statistics involves numbers, many types of information can used for
purposes of statistical analysis. Before selecting a method of analysis, it is important to understand what type of
data you have. And, considering the method of analysis prior to conducting a study can help to ensure that the
data collected will provide the desired functionality. There are four main types of data: nominal, ordinal,
interval, and ratio:

Nominal: Numbers or words are used to label, classify, or categorize. For example, each student may be
represented by a student ID number. Or, maybe students are grouped according to a characteristic, such
as gender, with 1 representing male and 2 representing female. The numbers or words are
representations, not measures, and they have no inherent order (1 could just as easily represent female
and 2 male), so they cannot be used in mathematical calculations. However, nominal data can be used to
indicate similarities and differences.

Ordinal: Numbers or words are used to categorize information on the basis of order. Observations are
rank-ordered on the basis of some criterion, but the intervals between the rankings are not necessarily
equal. For example, a postgraduate degree is higher than a bachelor’s degree, which is higher than an
associate’s degree, which is higher than a high school diploma. Likert-item responses, letter grades, and
rubric scores are other examples of ordinal data. Ordinal data is used to indicate more than or less than
relationships.

Interval: Numbers represent equal intervals between points on a scale, but the scale lacks an absolute
zero. IQ scores, calendar years (e.g. 1983 and 2012), and temperatures are examples.

Ratio: Numbers represent equal intervals between points on a scale, and the scale has an absolute zero,
which allows measures of ratio to be taken. We can say that one item is two-thirds the size of another or
that one student took twice as long as another to complete a task. Height, weight, time intervals, and
counts are additional examples.
Analytic Approaches
Statistical analyses are often classified as parametric or nonparametric:
Parametric analyses, associated primarily with inferential statistics, can be used with interval or ratio data
that are normally distributed when the variance within the two groups being compared is the same or
similar. Parametric tests include t-tests and analysis of variance (ANOVA). Longitudinal and cross-sectional
studies typically employ parametric analyses.
10
Nonparametric analyses, associated more with descriptive statistics, can be used with nominal and ordinal
data. They are also useful for interval and ratio data that are not normally distributed and/or do not have
variances similar to those of the comparison populations. Chi square tests are non-parametric.
The decision chart below can be used to identify the statistical tests most applicable to your research project.
Figure 1: Statistical Test Decision-Making Flowchart
Adapted from Ruth, D., Practical Statistics for Educators, p. 227
You may have noticed that the flowchart above does not include any guidance for the use of ordinal data. This
is because statistical applications using ordinal data must take into account the degree to which the intervals
between can be assumed to be equal. In some cases, such as the example of educational attainment given in the
previous section, ordinal data is essentially nominal data with some order to it. While one might assign numbers
to progressive educational awards (e.g., 1 for High School, 2 for Associate’s, 3 for Bachelor’s, and 4 for Postgrad), it would make no sense to calculate a mean on the data (or to conduct mean comparisons using an
independent samples t-test). However, it might make sense to calculate the mode or to calculate median income
associated with each educational level.
11
In other cases, when the ranking strongly suggests a degree of continuity across intervals, ordinal data may be
treated more like interval data. For example, it is common practice, though not all agree it should be, to
calculate mean responses from Likert items. One sees this more in the social sciences and in qualitative research
than in the fields of math and science or in quantitative research.
Correlation studies are commonly used in educational research because they are very helpful in determining
whether a relationship exists and is worth exploring further. Fortunately, or unfortunately depending on how
much you love statistics, a variety of correlation coefficients are available for computation. The Pearson
correlation coefficient (a.k.a. the Pearson product-moment correlation, or just Pearson for short) is most widely
used, and indeed some analytic software programs offer only the Pearson. Generally speaking, however, it is the
type of data that determines which correlation study is most appropriate. Because the differences could be
important if you intend to publish your research, the following differentiations are provided:
Pearson is a nonparametric test and can be used when the data is not normally distributed and/or when you
want to see if there is a relationship between two interval or ratio variables. As indicated above, however,
Pearson is commonly used with normally distributed data as well.
Spearman’s rho (Spearman for short) is a parametric test, intended for use with normally distributed data.
You can use it to compute the correlation between two ordinal variables. However, the Spearman is most
appropriately used with ordinal data for which the ranking is highly detailed and/or strongly suggests a
degree of continuity across intervals. In other words, Spearman is good for analyzing ordinal data that
closely resembles interval data.
For ordinal data that is more like nominal data, having five or fewer rankings with some order to them
and/or having intervals that are not really approximately equal (such as Likert-item responses), a Gamma
correlation may be more appropriate.
Point-biserial correlation studies can be used to analyze dichotomous data. Note, however, that only one of
the variables can be dichotomous. Some data is naturally dichotomous, such as gender or Yes/No responses.
And, sometimes people artificially dichotomize data, particularly ordinal data (beware however that some
people frown on this practice). A common example of artificial dichotomization is taking Likert-items and
putting the “Agree” and “Strongly Agree” responses into one category while putting the “Disagree” and
“Strongly Disagree” responses in another. Another example would be putting all of the students who passed
with grades of C or better into one category and all the students who got D’s and F’s into another.
Refresher of Basic Concepts
The following list highlights some key statistical concepts. Additional definitions are provided in the glossary.
Far more extensive glossaries of statistical terminology can be accessed online at sites such as statistics.com and
stat.berkeley.edu.
Alternative
Hypothesis: Typically a statement made prior to a study asserting (hypothesizing) that there is or will be an
observable effect or difference. (Contrast with Null Hypothesis.)
Averages:
3 main measures of central tendency (though there are others)
Mean:
The arithmetic calculation of the sum of the numbers divided by the count of the
numbers. This is what most people mean (no pun intended) when they say “average.”
It is the balance point of the data. (A deviation is the difference between a data value
and the mean. With the mean as the balance point, the sum of the deviations above
the mean will cancel the sum of the deviations below the mean.)
12
Median: The middle value between the highest and the lowest of the numbers when the set is
arranged in order. For example, if the set is {2, 3, 3, 5, 6, 8, 9}, the median is 5. If you
have an even count of numbers, the median is the sum of the two in the middle
divided by 2. For example, if the set is {1, 1, 3, 5, 5, 6}, the median is calculated
(3+5) / 2 = 4.
Mode:
The most frequently occurring of the numbers. In the set {2, 3, 3, 5, 6, 8, 9}, the
mode is 3 because that is the only number that occurs twice. A set can have more than
one mode, as in {1, 1, 3, 5, 5, 6}, and if no number occurs more than once, there is no
mode at all. The mode is considered a measure of central tendency because in a moreor-less normally distributed population, the mode would tend to be at or near the
middle, as would the mean and median.
Correlation: A procedure used to quantify the relationship between two quantitative variables. In a positive
correlation, an increase in one variable corresponds to an increase in the other. In a negative
correlation, an increase in one corresponds to a decrease in the other. Correlations are
represented using correlation coefficients, ranging between -1 and 1, with zero indicating no
correlation at all and the strength of the correlation increasing toward each end of the range. A
correlation of 1 would suggest that an increase in one variable corresponds to an equivalent
increase in the other, a phenomenon that rarely seen in educational research.
In interpreting correlations, it is important to keep in mind that they do not necessarily
demonstrate causal relationships. Both variables could potentially be influenced by other
variables. For example, ACT scores correlate positively with college completion rates, but
having higher ACT scores does not make students more likely to finish college. Both are
influenced by prior educational experiences, socio-economic status, parental education,
cultural/family expectations, etc.
Data Plots:
Graphic representations of data sets, often taking the form of scatterplots, histograms (similar to
bar graphs), or box plots (a.k.a. box and whisker charts).
Effect Size:
A calculation that quantifies the magnitude of a difference between two groups. The effect size is
a measure of how far apart the true value of the parameter being tested and the one assumed in
the null hypothesis are. Failing to detect the difference is a type II error (interpreting the data as
not showing a difference when there really is one).
Sometimes effect size is presented instead of (or in addition to) calculations of statistical
significance, in which very large sample sizes tend to cause relatively minor differences to be
flagged as significant. Effect size is an indicator of whether the statistically significant
information is of practical significance.
Different formulas can be used to calculate effect size, but one way is to divide the difference
between the two group means by the standard deviation (of either group or of the two groups
averaged). Basically, this tells you how many standard deviations’ difference there is between
the two means. For example, an effect size of .70 tells you that the difference between the means
represents approximately 7/10 of a standard deviation. Some would consider this a large effect
size. However, different scales are used for interpretation, depending on the field of study. Some
interpret .25 as small, .50 as medium, and .80 as large. Others consider .10 small, .30 medium,
and .50 large.
13
Hypothesis
Testing:
A formal process of deciding whether to reject or not to reject a null hypothesis based on the
results of a study.
Level of
Significance
& Confidence
Level:
The level of significance is referred to as the alpha level. It is used in hypothesis testing and
should be chosen before the study is conducted. The level of significance quantifies the
investigator’s willingness to make a type I error (to interpret the data as showing a difference
when one truly does not exist) by identifying what p-value will be considered the ‘bar’ (or cutoff) for statistical significance.
After the data is collected, a p-value is calculated from the sample, and if it is smaller than the
level of significance, then the null hypothesis is rejected (and the relationship deemed to be
statistically significant).
Confidence Level is the likelihood that the corresponding confidence interval is one of the
intervals that contains the parameter being estimated. The confidence level is one minus the level
of significance. A confidence interval can be interpreted as a two-sided or two-tailed hypothesis
test at the alpha level of significance.
Normal
Curve:
A frequency plot showing the distribution of a characteristic over a population. Also known as a
bell curve, the graphic representation is bell-shaped, with the majority of people clustering near
the mean. A perfectly normal distribution is one in which mean, median, and mode all have the
same value and the tails (lowest and highest ends) remain slightly above the horizontal axis, as
shown in Figure 2.
Figure 2: Normal Distribution
Image retrieved from http://www.lesn.appstate.edu/olson/normal_curve.htm
Null
Hypothesis:
Specifies a value for a parameter. Usually, the choice is based on there being no relationship or
effect, no difference between groups, or no difference from a previous value. Like the
14
assumption of innocence until proven guilty, the null hypothesis places the burden upon the
investigator to show with significant evidence, beyond reasonable doubt, that a difference exists.
Outliers:
Data points that are significantly above or below the trend or deviate from the pattern. The
following scatterplot illustrates by pointing out two outliers:
Figure 3: Illustration of Outliers
Power:
An estimate of the sensitivity of the test in detecting a difference. The power of a test is the
chance that the test correctly rejects a null hypothesis, i.e., the chance of not making a type II
error.
p-Value:
The “p” in p-value stands for probability. A p-value represents the probability that an
experimental result is due to a random occurrence under a true null hypothesis. (It is essentially
the opposite of the probability of the phenomenon occurring in any given case, which is known
as the confidence interval.) P-values are presented with tests of statistical significance to
communicate the level of reliability of the statistical conclusions. The smaller the p-value, the
greater the reliability of the statistical significance.
P-values are usually derived from the “Sig” (significance) data in analytical output tables and are
written as “p ≤ 0.01,” which equates to a 99% confidence interval (i.e., a 99% or greater
probability of the relationship holding true), or as
“p ≤ 0.05,” which corresponds to a 95%
confidence interval. Data with a p-value > 0.05 is usually interpreted as not rising to the level of
statistical significance.
Standard
Deviation
(SD):
A measure of how spread out (dispersed) the numbers are from the mean, symbolized as σ (lower
case sigma) for the population standard deviation and “s” for the sample standard deviation. The
more closely grouped the numbers are around the mean, the smaller the SD will be. SD is
expressed in the same units used to express the mean, so it makes it relatively easy for people to
understand just how big the spread is above and below the mean. SD is the square root of the
mean of the squared differences from the mean value, i.e., the square root of the variance.
Other measures of the spread: Standard Error is used when the standard deviation for an
experiment is estimated from its data. Although the sample SD will appear in it, the correct form
will depend on the experiment. Margin of Error is the product of a probability factor, to insert
the confidence, and the standard error as a scale factor based on the data. It is the number behind
the plus or minus in a confidence interval and represents half the width of the interval. For
example, with 95% confidence, the true mean μ is thought to lie on the interval 5.25 ± .37.
15
Type I and
Type II
Errors:
Variance:
A type I error is a false positive where a true null hypothesis is rejected. A type II error is a false
negative where a false null hypothesis is not rejected.
Variance is way of measuring how spread out (or dispersed) the numbers are from the mean. The
obvious measure of the spread would be the mean of the deviations (the differences between the
data values and the mean); however, the mean is the balance point of the data, so the sum of the
deviations above the mean cancel those below, and the mean deviation therefore is always zero.
To avoid this problem, the deviations are squared. Variance, then, is the mean of the squared
deviations. This result is the variance for the population. There is an inherent error caused by
sampling. This is corrected for by dividing by one less than the number of data values to obtain
the variance for the experimental sample.
Analytic Software Programs
Open-source and public-domain platforms and online calculators are readily available to support most analytic
needs. Here are just a few examples:
Dataplot: National Institute of Standards and Technology
PSPP: The Free Technology Academy
Statistical Calculators: MathPortal.org
Sample Size Calculator: National Statistical Service
Sample Size Calculator: Creative Research Systems
Preliminary Data Analysis Tips
Clean up your data: Before beginning an in-depth analysis, look through the data for typos and errors in
formatting. If you are using Excel, be sure to format the cells consistently throughout each field. Look for
number fields formatted as text, and check decimal settings, time and date formatting, etc. If you are using
PSPP, SPSS, or some other statistics software, be sure the “Variables” view has the correct data type selected.
Scan the data for things like ID numbers with too many or too few digits or income figures with inconsistent
formatting (ranges, dollar signs, text, etc.). Consider sorting the records in ascending or descending order by
each data field to make errors more apparent.
Develop some first impressions: Look for general trends and/or groupings and anything else that stands out.
Often it is helpful to look at the data of interest in a scatter plot before running tests on it to see if any linear or
grouping trends are apparent, to check for outliers, etc. Having familiarized yourself with the data in this way
will stimulate ideas about what to explore and will help you interpret the output when you run your analyses.
Check to see if the requirements for the model you chose are met: For example, for a nearly normal model,
look for a single peak and symmetry in the data using a histogram or a normal probability plot. For a linear
regression, check for linearity using a scatterplot. After fitting the regression line to the data, look for structure
in the residuals that indicate problems in the original linear model.
16
GLOSSARY
ANOVA
Case Studies
Chi Square (χ2)
Content Validity
Construct
Construct Validity
Correlation
Cross-Sectional Analyses
Descriptive Statistics
Empirical Evidence
Hawthorne Effect
Inferential Statistics
Internal Validity
Inter-Rater Reliability
Interval Data
Longitudinal Analysis
Mean
Median
Analysis of Variance, a parametric test used to compare the means of two or
more independent samples and check for statistically significant differences.
Anecdotal reports prepared by trained observers. When numerous case studies
are analyzed together, the findings are often considered empirical evidence.
A non-parametric test used to compare the frequencies of counts, using
nominal or ordinal data, to expected frequencies. The Goodness of Fit Chi
Square is used for one independent variable, the Chi Square Test of
Independence for two independent variables.
The degree to which an instrument actually measures the content it is intended
to measure.
A theoretical entity or concept such as aptitude, creativity, engagement,
intelligence, interest, or motivation. Constructs cannot be directly measured, so
we assess indicators of their existence. For example, IQ tests assess prior
learning as an indicator of intelligence.
The ability of a research approach to actually represent the construct it
purports to address. Since constructs cannot be directly measured, construct
validity depends upon how much people can agree upon a definition of the
construct and how truly the indicators represent that definition.
A process of quantifying a relationship between two quantitative variables in
which an increase in one is associated with an increase in the other (a positive
correlation) or a decrease in the other (a negative correlation).
Studies comparing, at a single point in time, people representing different
stages of development, for the purpose of drawing inferences about how
people change/develop over time.
Tests used to quantitatively describe phenomena. (Compare nonparametric
analyses. Contrast inferential statistics.)
Usually refers to what has been systematically observed. May or may not be
obtained through scientific experimentation.
A phenomenon in which the behavior of participants in a study is affected by
their knowing they are being studied.
Tests used to quantify the degree to which generalizations can be made to a
population based on outcomes observed in a sample. (Compare parametric
analyses. Contrast descriptive statistics.)
The extent to which variables in a study can be controlled sufficiently to ensure
that any observed change is actually attributable to the intervention.
The degree of consistency in ratings among different evaluators. For example, if
five judges score a performance with nearly identical ratings, the inter-rater
reliability is said to be high.
Numbers on a scale comprised of equal intervals but having no absolute zero.
A comparison of outcomes over time within a given group, based on changes at
an individual level (versus differences in group averages). E.g., comparing pretest and post-test scores to assess learning gains.
The average, arithmetic mean, derived by dividing the sum of all scores by the
total number (count) of scores. It is the balance point of the data.
The midpoint, were 50% of scores are above it and 50% are below it (in position,
not value) when all of the scores are ranked in ascending order.
17
Mode
Nominal Data
Nonparametric Analyses
Null Hypothesis
Ordinal Data
Outliers
Parametric Analyses
Qualitative Evidence
Quantitative Evidence
Randomized, Controlled
Trials (RCTs)
Ratio Data
Reliability
Standard Deviation (SD)
t-Test
Validity
The most frequently occurring score.
Numbers (usually) represent individuals or categories with no associated order.
E.g., numbers assigned to ethnic groups or student ID numbers.
Studies using nominal and/or ordinal data or any data type that does not follow
a normal distribution and/or does not have variances similar to those of the
comparison population. Nonparametric tests do not support population
inferences. They are used primarily in qualitative research.
A statement to be tested through systematic investigation, asserting that no
relationship or effect exists between the variables being studied, that there is no
difference between groups, or that there is no difference from a previous value.
In scientific research, the assumption is that no claims can be made regarding
relationships between variables unless significant evidence exists to reject the
null hypothesis. Even so, a cause-and-effect relationship cannot be established.
Numbers (usually) represent observations that have an underlying order or
magnitude but the intervals cannot be assumed to be equal.
Data points that are noticeably outside the pattern of the point distribution.
Studies using interval or ratio data that are more-or-less normally distributed
when the variance within the two groups being compared is the same or
similar. Parametric analyses can be used to support generalizations/inferences.
(Compare inferential statistics.)
Evidence that cannot be directly measured, such as people’s perceptions,
valuations, opinions, effort levels, behaviors, etc. Many people mistakenly refer
to qualitative evidence that has been assigned numerical representations (e.g.,
Likert-scale ratings) as quantitative data. The difference is that the qualitative
evidence is inherently non-numerical.
Evidence that occurs in traditionally established, measurable units, such as
counts of students, grade point averages, percentages of questions correctly
answered, speed of task completion, etc.
A model for scientific experimentation. Participants are randomly selected and
divided into groups, experimental and control, for outcome comparison.
Numbers on a scale comprised of equal intervals and having an absolute zero.
Consistency in outcomes over time or under differing circumstances. For
example, if a student challenges a placement exam repeatedly without doing
any study in between and keeps getting essentially the same result, the test has
a high level of reliability.
A measure of the spread of scores about the mean. SD is the square root of the
variance. There are two, one for a population and one for a sample; these are
calculated slightly differently.
Originally derived for small samples, current practice is to use this distribution
when the population standard deviation is not known.
The ability of a research approach to measure what it is intended to measure.
An aptitude test in which cultural expectations or socio-economic level
significantly influence outcomes may have low validity.
18
REFERENCES
Adams, G., & Engelmann, S. (1996). Research on direct instruction. Seattle, WA: Educational Achiebement
Systems.
Grossen, B. (1996). The story behind Project Follow Through. Effective school practices, Vol. 15, 1. Retrieved
from http://darkwing.uoregon.edu/~adiep/ft/grossen.htm.
Institutional Review Board (2004). CNM institutional review board [PDF document]. Central New Mexico
Community College. Retrieved from
http://www.cnm.edu/depts/planning/instres/irb/documents/CNM_Institutional_Review_Board.pdf
National Institute of Health Office of Extramural Research. Protecting human research participants. [IRB
training PDF]. Retrieved from http://phrp.nihtraining.com/
U.S. Department of Health and Human Services (1993). Institutional review board guidebook. Retrieved from
http://www.hhs.gov/ohrp/archive/irb/irb_chapter1.htm.
Schneider, W. H. The establishment of institutional review boards in the U.S. background history. Indiana
University-Purdue University Indianapolis History Department. Retrieved from
http://www.iupui.edu/~histwhs/G504.dir/irbhist.html.
19
APPENDIX A: RECOMMENDATIONS FROM CNM’S IRB
https://www.cnm.edu/depts/planning/instres/irb/documents/RECOMMENDATIONS-FOR-RESEARCH.pdf
20
APPENDIX B: PREVIEW OF CNM IRB APPLICATION FORM
21
22
23
24
25
26
http://www.cnm.edu/depts/planning/instres/irb
27