Guide on Writing Methods and Results Sections

Method Sections
The Method section is a detailed breakdown of the experiment, including your
subjects, research design, stimuli, equipment used, and what the subjects actually
did (the procedure). The idea is to give the reader enough information to be able
to replicate the experiment.
Requirements
The Method section is often divided into subsections, such as Subjects, Design,
Stimuli, Equipment, and Procedure. Each subsection should provide only the
essential information needed to understand and reasonably replicate the
experiment. Very short subsections can be combined (e.g., Stimuli and
Equipment).
Subjects/Participants. State the number of participants (if human) or subjects (if
animals), who they were, and how they were selected.
Examples:
Participants
We randomly selected 16 UAB students from a Master’s level biostatistics course
to participate in exchange for extra credit.
Subjects
Subjects were 30 male pigtailed macaques (Macaca nemestrina) bred at the
Wisconsin National Primate Research Center (WNPRC) Breeding Colony,
Madison, Wisconsin. All animals were bred specifically for this project and were
shipped to the laboratory at 3-5 days of age. We randomly assigned subjects to
each condition.
Materials. This subsection may also be called Stimuli, Equipment, or Apparatus. It
briefly describes the equipment/materials used in the experiment.
Examples:
Eye movements were recorded using an NEC model 120 Eyetracker.
Procedure. Describe in sequence the procedures used.
Example:
Subjects were seated at a computer work station. After completing a demographic
questionnaire, they received written instructions that differed by condition. All
subjects were instructed to read a business letter and write a reply. Subjects in the
multiple draft condition were told to write an outline of a reply letter before writing
a final draft.
Design and Analysis. Identify and explain variables and their levels, and state
whether the variables are between-groups factors, within-subjects factors,
continuous predictors, or covariates.
Examples:
A 2 x 3 (Sex by Treatment) factorial design with Age as a covariate was used.
Both Sex and Treatment were between-subjects factors.
We used a 2 x 4 repeated measures design with Sex as a between-subjects factor
and time of measurement as the within-subjects factor.
Multiple linear regression analysis was conducted to assess the influence of each
individual VKORC1 SNP on log-transformed maintenance dose after adjustment
for age, gender, BMI, clinic, income, education, health insurance, smoking status,
level of physical activity, alcohol intake, vitamin K intake, comorbid conditions
(e.g. CHF, renal failure and cancer) and drug interactions (e.g. amiodarone, statins,
NSAIDs, antiplatelet agents).
We performed logistic regression for data at the 10-week, 6-month, and 12-month
time points separately. We controlled for gender, ethnicity, prior smoking (three
ordinal categories), and baseline levels of motivation and expectancy to quit
smoking.
The cognitive readiness factor was regressed on all sociodemographic risk factors,
the parental involvement factor, the cumulative number of culture-related
activities, center-based programs attendance, and several higher order interactions.
All variables were centered to reduce multicollinearity among the predictors and
their interactions and to obtain more interpretable standardized regression
coefficients (Aiken & West, 1991). Because of the large sample size, very small
partial relations could be statistically significant. Therefore, only extremely small p
values were considered statistically significant (i.e., α = 0.001). Based on an
ordinary least squares (OLS) regression analysis of this model, nonsignificant
interactions and predictors were removed. However, nonsignificant predictors
remained in the model if their interactions with other predictors were statistically
significant (Aiken & West, 1991).
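The effect of centering on multicollinearity can be illustrated with a small sketch. The scores below are hypothetical and are not taken from the study quoted above, and the helper names (pearson_r, center) are illustrative only. The sketch compares the correlation between a predictor and its interaction term before and after centering.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def center(values):
    """Subtract the mean so the variable has mean zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical predictor scores (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [2, 1, 4, 3, 6, 5, 8, 7]

# Correlation between a predictor and its interaction term, raw scores.
raw_r = pearson_r(x, [xi * zi for xi, zi in zip(x, z)])

# Same correlation after centering both predictors first.
xc, zc = center(x), center(z)
centered_r = pearson_r(xc, [xi * zi for xi, zi in zip(xc, zc)])

print(f"raw: r = {raw_r:.2f}; centered: r = {centered_r:.2f}")
```

With these symmetric made-up scores the correlation drops from about 0.96 to zero. With real data, centering typically reduces this correlation but need not remove it entirely.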
Results
This section presents the statistical analysis of the data collected. It is often less
than a page long.
Requirements
Condensed format. The Results section is the most condensed and standardized of
all the sections in the text of a lab report.
No data interpretation. Statistical results are presented but are usually not
discussed in this section. Discuss results in the Discussion section.
• Keep your hypotheses in mind while you write. Each result must refer to a stated
hypothesis.
• Describe all results that are directly related to your research questions or
hypotheses. Start with hypotheses you were able to support with significant
statistics before reporting nonsignificant trends. Then describe any additional
results that are more indirectly relevant to your questions.
• If you present many results (i.e., many variables or variables with many levels),
write a brief summary, then discuss each variable in separate subsections.
• Report main effects before reporting contrasts or interactions. Briefly mention
problems such as reasons for missing data, but save discussion of the problems for
the discussion section.
Use tables and figures to summarize data. Include descriptive statistics (such as
means and standard deviations or standard errors), and give significance levels of
any inferential statistics. The goal is to make your results section succinct and
quantitatively informative, with no extra words.
• For each test used, provide degrees of freedom, obtained value of the test, and the
probability of the result occurring by chance (p-value).
Here are examples of the results of a t-test and an F-test, respectively: t(23) =
101.2, p < .001; F(1, 3489) = 7.943, p < .01
Reporting Results of Common Statistical Tests
The goal of the results section in an empirical paper is to report the results of the
data analysis used to test a hypothesis. The results section should be in condensed
format and lacking interpretation. Avoid discussing why or how the experiment
was performed or alluding to whether your results are good or bad, expected or
unexpected, interesting or uninteresting. This document is specifically about how
to report statistical results.
Every statistical test that you report should relate directly to a hypothesis. Begin
the results section by restating each hypothesis, then state whether your results
supported it, then give the data and statistics that allowed you to draw this
conclusion.
If you have multiple numerical results to report, it’s often a good idea to present
them in a figure (graph) or a table.
In reporting the results of statistical tests, report the descriptive statistics, such as
means and standard deviations, as well as the obtained value of the test statistic, its
degrees of freedom, and the probability of the result occurring by chance (p-value).
Test statistics and p-values should be rounded to two decimal places. All
statistical symbols that are not Greek letters should be italicized (M, SD, t, p, etc.).
When reporting a significant difference between two conditions, indicate the
direction of this difference (i.e., which condition was more/less/higher/lower than
the other condition(s)). Assume that your audience has a professional knowledge of
statistics. Don’t explain how or why you used a certain test unless it is unusual.
p-values
There are two ways to report p-values. One way is to use the alpha level (the a
priori criterion for the probability of falsely rejecting your null hypothesis), which
is typically 0.05 or 0.01. Example: F(1, 24) = 44.4, p < 0.01. You may also report
the exact p-value (the probability of obtaining the result that you obtained, or
one more extreme, if the null hypothesis were true). Example: t(33) = 2.10, p = 0.03. If your
exact p-value is less than .001, it is conventional to state merely p < .001. If you
report exact p-values, state early in the results section the alpha level used as a
significance criterion for your tests. Example: “We used an alpha level of 0.05 for
all statistical tests.”
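The two reporting styles described above can be sketched as a small formatting helper. This is only an illustration: the function names format_p and format_t are hypothetical, and keeping a third decimal for exact values below .01 is an assumption added here so that a small p-value never prints as 0.00.

```python
def format_p(p, alpha=None):
    """Format a p-value in one of the two styles described in the text.

    With `alpha` given, report against the criterion (e.g. "p < 0.05").
    Otherwise report the exact value, or "p < .001" when it is below .001.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if alpha is not None:
        sign = "<" if p < alpha else (">" if p > alpha else "=")
        return f"p {sign} {alpha}"
    if p < 0.001:
        return "p < .001"
    # Assumption: keep three decimals below .01 so the value never prints as 0.00.
    decimals = 3 if p < 0.01 else 2
    return f"p = {p:.{decimals}f}"


def format_t(df, t, p, alpha=None):
    """Build a report string for a t-test: t(df) = value, p-value."""
    return f"t({df}) = {t:.2f}, {format_p(p, alpha)}"


print(format_t(33, 2.10, 0.03))
print(format_p(0.0004))
print(format_p(0.003, alpha=0.01))
```

Whichever style is chosen, it should be used consistently throughout the results section.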
EXAMPLES
Reporting a significant single sample t-test (μ ≠ μ0):
Students taking statistics courses in Public Health at the University of Alabama at
Birmingham reported studying significantly more hours for tests (M = 121, SD =
14.2) than did UAB graduate students in general, t(33) = 2.10, p = 0.034.
Reporting a significant t-test for dependent groups (μ1 ≠ μ2):
Results indicate a significant preference for pecan pie (M = 3.45, SD = 1.11) over
cherry pie (M = 3.00, SD = 0.80), t(15) = 4.00, p = 0.001.
Reporting a non-significant t-test for independent groups (μ1 ≠ μ2):
UAB students taking statistics courses in the School of Public Health had higher
Anxiety scores (M = 121, SD = 14.2) than did those taking statistics courses in
other graduate majors (M = 117, SD = 10.3); however, this difference was not
statistically significant, t(44) = 1.23, p = 0.09.
Reporting a significant t-test for independent groups (μ1 ≠ μ2):
Over a two-day period, participants drank significantly fewer drinks in the
experimental group (M = 0.667, SD = 1.15) than did those in the wait-list control
group (M = 8.00, SD = 2.00), t(4) = -5.51, p = 0.005.
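As a check on how the reported pieces fit together, the t statistic in the last example can be recomputed from the means and standard deviations alone. The sketch below assumes a pooled-variance (equal variances) t-test and three participants per group; the group sizes are not stated in the example and are inferred here from df = 4. The function name pooled_t is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-groups t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Means and SDs from the example; n = 3 per group is an assumption inferred from df = 4.
t, df = pooled_t(0.667, 1.15, 3, 8.00, 2.00, 3)
print(f"t({df}) = {t:.2f}")
```

Under these assumptions the sketch gives t(4) = -5.51, matching the reported value. The negative sign reflects that the first group (experimental) had the lower mean.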
Reporting a significant omnibus F-test for a one-way ANOVA:
An analysis of variance showed that the effect of noise was significant, F(3,27) =
5.94, p = 0.007. Post hoc analyses using Scheffé’s criterion for significance
indicated that the average number of errors was significantly lower in the white
noise condition (M = 12.4, SD = 2.26) than in the other two noise conditions
(traffic and industrial) combined (M = 13.62, SD = 5.56), F(3, 27) = 7.77, p =
0.042.
A one-way analysis of variance (ANOVA) showed that the gene/status groups
significantly differed in their responses to the total scores on the 5-item “Attitudes
toward genetic research scale” [F(3,178) = 3.57, p = 0.0153]. Table 3 shows the
means and standard deviations (SD) for the responses to the questions and the total
score for each gene/status group. Tukey’s HSD was used to make post hoc
pairwise comparisons. These follow-up analyses showed that the NEG and
SYMPT groups responded in a very similar fashion (p > 0.05) while the ASYMPT
(p = 0.014) and AR (p = 0.008) groups had lower total score means. The AR
group scored lower than the ASYMPT group (p = 0.024), indicating that they
would be less likely than all other groups to allow their own children to participate
in observational HD research that included a yearly neurological examination.
Reporting tests of a priori hypotheses in a multi-group study:
Tests of the four a priori hypotheses were conducted using Bonferroni-adjusted
alpha levels of 0.0125 per test (0.05/4). Results indicated that the average number
of errors was significantly lower in the silence condition (M = 8.11, SD = 4.32)
than in both the white noise condition (M = 12.4, SD = 2.26), F(1, 27) =
8.90, p = 0.011 and in the industrial noise condition (M = 15.28, SD = 3.30), F(1,
27) = 10.22, p = 0.007. The pairwise comparison of the traffic noise condition with
the silence condition was non-significant. The average number of errors in all noise
conditions combined (M = 15.2, SD = 6.32) was significantly higher than that in
the silence condition (M = 8.11, SD = 4.32), F(1, 27) = 8.66, p = 0.009.
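The Bonferroni adjustment used in this example is simple arithmetic: the familywise alpha is divided by the number of planned tests, and each p-value is compared with the result. A minimal sketch follows, using the three p-values reported above. The function name and dictionary labels are hypothetical, and the non-significant traffic noise comparison is omitted because its p-value is not given.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test alpha level that keeps the familywise error rate at family_alpha."""
    return family_alpha / n_tests


alpha_per_test = bonferroni_alpha(0.05, 4)

# p-values as reported in the example (labels are illustrative).
reported_p = {
    "silence vs. white noise": 0.011,
    "silence vs. industrial noise": 0.007,
    "silence vs. all noise combined": 0.009,
}

# A comparison counts as significant only if p falls below the adjusted alpha.
significant = {name: p < alpha_per_test for name, p in reported_p.items()}

print(f"adjusted alpha = {alpha_per_test:.4f}")
for name, is_sig in significant.items():
    print(f"{name}: {'significant' if is_sig else 'not significant'}")
```

Note that p = 0.011 would pass an unadjusted 0.05 criterion easily but clears the adjusted criterion of 0.0125 only narrowly, which is why the adjusted alpha should be stated before the results.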
Reporting results of major tests in factorial ANOVA; non-significant
interaction:
Attitude change scores were subjected to a two-way analysis of variance having
two levels of message discrepancy (small, large) and two levels of source expertise
(high, low). All effects were tested at the 0.05 significance level.
The main effect of message discrepancy was statistically significant, F(1, 24) =
44.4, p < 0.001, indicating that the mean change score was significantly greater for
large-discrepancy messages (M = 4.78, SD = 1.99) than for small discrepancy
messages (M = 2.17, SD = 1.25). The main effect of source expertise was also
significant, F(1, 24) = 25.4, p < 0.01, indicating that the mean change score was
significantly higher in the high-expertise message source (M = 5.49, SD = 2.25)
than in the low-expertise message source (M = 0.88, SD = 1.21). The interaction
effect was non-significant, F(1, 24) = 1.22, p > 0.05.
Reporting results of major tests in factorial ANOVA; significant interaction:
A two-way analysis of variance yielded a main effect for the diner’s gender,
F(1,108) = 3.93, p < 0.05, such that the average tip was significantly higher for
men (M = 15.3%, SD = 4.44) than for women (M = 12.6%, SD = 6.18). The main
effect of touch was non-significant, F(1, 108) = 2.24, p > 0.05. However, the
interaction effect was significant, F(1, 108) = 5.55, p < 0.05, indicating that the
gender effect was greater in the touch condition than in the non-touch condition.
Reporting the results of a chi-square test of independence:
A chi-square test of independence was performed to examine the relation between
religion and college interest. The relation between these variables was significant,
χ² (df = 2, N = 170) = 14.14, p < 0.01. Catholic teens were less likely to show an
interest in attending college than were Protestant teens.
Reporting the results of a chi-square test of goodness of fit:
A chi-square test of goodness-of-fit was performed to determine whether the three
sodas were equally preferred. Preference for the three sodas was not equally
distributed in the population, χ² (df = 2, N = 55) = 4.53, p < 0.05.
Reporting the results of a Multiple Regression Analysis:
A multiple regression analysis yielded a statistically significant [F(11,2148) =
30.48, p < 0.0001; R² = 0.52] model. Of the sociodemographic risk factors, the
parents’ language and education–income factors were statistically significant,
accounting for 14.7% and 7.2% unique variance, respectively. Also, center-based
preschool program attendance and the parental involvement factor were
significantly related to cognitive readiness and accounted for 11.8% and 5.2% of
unique variance, respectively. A statistically significant two-way interaction
between the education–income factor and number of culture-related activities
accounted for 0.3% of unique variance (see Table 1).
Logistic Regression
The results of the logistic regression analysis showed that, after controlling for
other confounding factors, males were twice as likely as females to be unavailable
for a 10-week follow-up interview (OR = 2.01; 95% CI, 1.37-2.94; p = 0.002).
The Completers Only results (Table 4) show that the Treatment led to significantly
more reports of smoking cessation at 10 weeks. At 10 weeks, students who
completed the N-O-T program were approximately 5.7 times more likely to report
that they had quit smoking than counterparts in the comparison condition, not
including those who dropped out. These results translated into predicted quit rates
of 11.0% and 5.0% for the N-O-T and comparison groups, respectively.
Power Analysis
Based on preliminary data, we assume the standard deviation (SD) of the DAS is
1.1 at both baseline and follow-up and the base rate change in DAS is from 5.0 to
4.5. If there were zero correlation between the baseline and the follow-up DAS,
then the SD of the change score would be 1.56. Assuming a correlation of 0.5, the
change score SD would be 1.10. For a stronger correlation of 0.7, the change score SD would
be 0.85. For a weaker correlation of 0.3, the change score SD would be 1.30. Assuming that
the genetic variant will lead to a change in DAS from 5.0 to 4.0 and the genetic
effect is additive, then the wild type genotype would be expected to have the base
rate change from 5.0 to 4.5, the homozygote variant genotype would be expected
to change from 5.0 to 4.0, and the heterozygote genotype would be expected to
change from 5.0 to 4.25. The 1 degree-of-freedom (df) test for the additive effect
would have statistical power of 79.6% assuming a change score correlation of 0.5.
For the most conservative situation (zero correlation) the power could be as low as
50.2%. For a stronger change score correlation of 0.7 the power would be 95.0%.
For a weaker change score correlation of 0.3 the power would be 65.5%.
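The change score SDs quoted in this example follow from the variance of a difference between two correlated measurements. When baseline and follow-up share the same SD, the SD of the change score is SD × sqrt(2 × (1 − r)), where r is the baseline/follow-up correlation. The sketch below recomputes the four values from the stated SD of 1.1; the function name change_score_sd is hypothetical, and the power figures are not reproduced because the sample size is not given in the example.

```python
import math


def change_score_sd(sd, r):
    """SD of (follow-up - baseline) when both have the same SD and correlation r."""
    return sd * math.sqrt(2 * (1 - r))


# SD of 1.1 and the four correlations are taken from the example above.
for r in (0.0, 0.3, 0.5, 0.7):
    print(f"r = {r:.1f}: change score SD = {change_score_sd(1.1, r):.2f}")
```

The stronger the correlation between the two measurements, the smaller the change score SD, which is why the stated power rises with the assumed correlation.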
The loading dose, maintenance dose, and stability (INR > 4) will be compared across
the 3 established CYP2C9 genotype groups (i.e., Extensive, Intermediate, and Poor
metabolizers) using traditional analysis of variance (ANOVA) procedures. Based
on the maintenance dosage and allele frequency data reported by Higashi et al.
(2002), a power analysis was conducted. The results of the power analysis indicate
that in order to detect differences among these three groups at a significance level
of α = 0.05 with 0.80 statistical power, sample sizes of 171 Caucasian and 200
African-American patients will be necessary. It should be noted that these power
analyses were based on parametric (normal theory) statistics. If there are
substantial departures from normality, non-parametric procedures, which are often
more powerful with non-normal data, may be used.