Skills in Biology
Hypotheses and Predictions
Scientific knowledge grows through a process called the scientific method. This process involves observation and
measurement, hypothesizing and predicting, and planning and executing investigations designed to test formulated
hypotheses. A scientific hypothesis is a tentative explanation for an observation, which is capable of being tested
by experimentation. Hypotheses lead to predictions about the system involved and they are accepted or rejected
on the basis of findings arising from the investigation. Rejection of the hypothesis might lead to new alternative
explanations for the observations. Acceptance of the hypothesis as a valid explanation is not necessarily
permanent: explanations may be rejected at a later date in light of new findings.
Making Observations
These may involve the observation of behaviors in wild populations, physiological measurements made during previous
experiments or accidental results obtained when seeking answers to completely unrelated questions.
Asking Questions
The observations lead to the formation of questions about the system being studied.
Formulating a Hypothesis
Features of a sound hypothesis:
• It is based on observations and prior knowledge of the system.
• It offers an explanation for an observation.
• It refers to only one independent variable.
• It is written as a definite statement and not as a question.
• It is testable by experimentation.
• It leads to predictions about the system.
Generating a Null Hypothesis
A hypothesis based on observations is used to generate the null hypothesis (Ho); the hypothesis of no difference or
no effect. Hypotheses are expressed in the null form for the purposes of statistical testing. Ho may be rejected in
favor of accepting the alternative hypothesis, HA.
Making Predictions
Based on a hypothesis, predictions (expected, repeatable outcomes) can be generated about the behavior of the
system. Predictions may be made on any aspect of the material of interest, e.g. how different variables (factors)
relate to each other.
Designing an investigation
Investigations are planned so that the predictions about the system made in the hypothesis can be tested.
Investigations may be laboratory or field based.
Testing the Prediction
The predictions are tested out in the practical part of an investigation.
Page 1 of 36
Exercise 1.1
Questions and Hypotheses
This exercise explores the nature of scientific questions and hypotheses. A hypothesis offers a tentative
explanation to questions generated by observations. Hypotheses are often constructed in a form that allows them
to be tested statistically. For every hypothesis, there is a corresponding null hypothesis; a hypothesis against the
prediction. Predictions are tested with laboratory and field experiments and carefully focused observations. For a
hypothesis to be accepted it should be possible for anyone to test the predictions with the same methods and get a
similar result each time.
Part A—Asking Questions
Scientists are characteristically curious and creative individuals whose curiosity is directed toward
understanding the natural world. They use their study of previous research or personal observations of natural
phenomena as a basis for asking questions about the underlying causes or reasons for these phenomena. For a
question to be pursued by scientists, the phenomenon must be well defined and testable. The elements must be
measurable and controllable.
There are limits to the ability of science to answer questions. Science is only one of many ways of knowing
about the world in which we live. Consider, for example, this question: Do excessively high temperatures cause
people to behave immorally? Can a scientist investigate this question? Temperature is certainly a well-defined,
measurable and controllable factor, but morality of behavior is not scientifically measurable. We probably could
not even reach a consensus on the definition. Thus, there is no experiment that can be performed to test the
question. Which of the following do you think can be answered scientifically? Place a check beside the questions
you think can be answered scientifically.
_______1. Does binge drinking cause more brain damage in teenagers than in adults?
_______2. Is genetically modified corn safe to eat?
_______3. Do children who wash their hands often and bathe daily have a greater risk of asthma than those who
wash their hands less often and bathe every other day?
_______4. Should endangered species be cloned to prevent extinction?
_______5. What is the function of spines on cacti?
_______6. Did the 19-year-old college student develop ulcers because of his stress and fast food diet?
How did you decide which questions can be answered scientifically?__________________________
________________________________________________________________________________
________________________________________________________________________________
Part B—Developing Hypotheses
As questions are asked, scientists attempt to answer them by proposing possible explanations. Those proposed
explanations are called hypotheses. A hypothesis tentatively explains something observed. It proposes an answer
to a question. Consider question 5, preceding. One hypothesis based on this question might be “If spines on cacti
prevent animals from eating the cacti, then cacti will have spines.” The hypothesis has suggested a possible
explanation for the observed spines.
A scientifically useful hypothesis must be testable and falsifiable. To satisfy the requirement that a hypothesis is
falsifiable, it must be possible that the test results do not support the explanation. In our example, if spines are
removed from test cacti and the plants are not eaten by animals then the hypothesis has been falsified. Even
though the hypothesis can be falsified, it can never be proved to be true. The evidence from an investigation can
only provide support for the hypothesis. In our example, if cacti without spines were eaten, the hypothesis has not
been proved, but has been supported by evidence. Other explanations still must be excluded, and new evidence
from additional experiments and observations might falsify this hypothesis at a later date. In science seldom does a
single test provide results that clearly support or falsify a hypothesis. In most cases the evidence serves to modify
the hypothesis or the conditions of the experiment.
Science is a way of knowing about the natural world that involves testing hypotheses or explanations. The scientific
method can be applied to the unusual and the commonplace. You use the scientific method when you investigate
why your once white socks are now blue. Your hypothesis might be that your blue jeans and socks were washed
together, an assertion that can be tested through observations and experimentation.
Students often think that controlled experiments are the only way to test a hypothesis. The test of a hypothesis
may include experimentation, additional observations or the synthesis of information from a variety of sources.
Many scientific advances have relied on other procedures and information to test hypotheses. For example, James
Watson and Francis Crick developed a model that was their hypothesis for the structure of DNA. Their model could
only be supported if the accumulated data from a number of other scientists were consistent with the model.
Actually, their first model (hypothesis) was falsified by the work of Rosalind Franklin. Their final model was tested
and supported not only by the ongoing work of Franklin and Maurice Wilkins but also by research previously
published by Erwin Chargaff and others. Watson and Crick won the Nobel Prize for their scientific work. They did
not perform a controlled experiment in the laboratory but tested their powerful hypothesis through the use of
existing evidence from other research. Methods other than experimentation are acceptable in testing hypotheses.
Think about other areas of science that require comparative observations and the accumulation of data from a
variety of sources, all of which must be consistent with and support hypotheses or else be inconsistent and falsify
hypotheses.
The information in your biology textbook is often thought of as a collection of facts, well understood and correct. It
is true that much of the knowledge of biology has been derived through scientific investigations, has been
thoroughly tested, and is supported by strong evidence. However, scientific knowledge is always subject to novel
experiments and new technology, any aspect of which may result in modification of our ideas and a better
understanding of biological phenomena. The structure of the cell membrane is an example of the self-correcting
nature of science. Each model of the membrane has been modified as new results have negated one explanation
and provided support for an alternative explanation.
Application
Before scientific questions can be answered, they must first be converted to hypotheses, which can be tested. For
each of the following questions, write an explanatory hypothesis. Recall that the hypothesis is a statement that
explains the phenomenon you are interested in investigating. For the purposes of our class, you will need to try to
state the hypothesis as an “if…then” statement.
1. Does regular interaction with pets improve the health of the elderly?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
2. What effect do high concentrations of the industrial pollutant PCB (polychlorinated biphenyl) have on killer
whale reproduction?
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Scientists often propose and reject a variety of hypotheses before they design a single test. Which of the following
statements would be useful as scientific hypotheses and could be investigated using scientific procedures? Give the
reason for each answer by stating whether it could possibly be falsified and what factors are measurable and
controllable.
1. The number of fungiform papillae (bumps on the tongue) affects taste sensitivity.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
2. Inflated self-esteem in young males increases the odds of aggression.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
3. Exposure to environmental pollutants produces feminization in newly hatched male
alligators.
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
___________________________________________________________________________
Exercise 1.2
Designing Experiments to Test Hypotheses
The most creative aspect of science is designing a test of your hypothesis that will provide unambiguous evidence
to falsify or support a particular explanation. Scientists often design, critique and modify a variety of experiments
and other tests before they commit the time and resources to perform a single experiment. In this exercise, you
will follow the procedure for experimentally testing hypotheses, but it is important to remember that other
methods, including observation and the synthesis of other sources of data, are acceptable in scientific
investigations. An experiment involves defining variables, outlining a procedure and determining controls to be
used as the experiment is performed. Once the experiment is defined, the investigator predicts the outcome of the
experiment based on the hypothesis.
Read the following description of a scientific investigation of the effects of sulfur dioxide on soybean reproduction.
Then in Lab Study A you will determine the types of variables involved, and in Lab Study B, the experimental
procedure for this experiment and others.
Investigation of the Effect of Sulfur Dioxide on Soybean Reproduction
Agricultural scientists were concerned about the effect of air pollution, sulfur dioxide in particular, on soybean
production in fields adjacent to coal-powered plants. Based on initial investigations, they proposed that sulfur
dioxide in high concentrations would reduce reproduction in soybeans. They designed an experiment to test this
hypothesis (figure 1.1). In this experiment, 48 soybean plants, just beginning to produce flowers, were divided into
two groups, treatment and no treatment. The 24 treated plants were divided into four groups of six. One group of
6 treated plants was placed in a fumigation chamber and exposed to 0.6 ppm of sulfur dioxide for 4 hours to
simulate sulfur dioxide emissions from a power plant. The experiment was repeated on the remaining three
treated groups. The no treatment plants were placed similarly in groups of 6 in a second fumigation chamber and
simultaneously exposed to filtered air for 4 hours. Following the experiment, all plants were returned to the
greenhouse. When the beans matured, the number of bean pods, the number of seeds per pod, and the weight of
the pods were determined for each plant.
Part A—Determining the Variables
Read the description of each category of variable; then identify the variable described in the preceding
investigation. The variables in an experiment must be clearly defined and measurable. The investigator will
identify and define dependent, independent and controlled variables for a particular experiment.
The Dependent Variable
Within the experiment, one variable will be measured or counted or observed in response to the experimental
conditions. This variable is the dependent variable. For soybeans, several dependent variables are measured, all of
which provide information about reproduction. What are they?
______________________________________________________________________________________________
______________________________________________________________________________________________
______________________________________________________________________________________________
The Independent Variable
The scientist will choose one variable, or experimental condition, to manipulate. This variable is considered the
most important variable by which to test the investigator’s hypothesis and is called the independent variable.
What was the independent variable in the investigation of the effect of sulfur dioxide on soybean
reproduction?__________________________________________________________________________________
______________________________________________________________________________________________
Can you suggest other variables that the investigator might have changed that would have had an effect on the
dependent variables?____________________________________________________________________________
______________________________________________________________________________________________
Although other factors, such as light, temperature, time and fertilizer, might affect the dependent variables, only
one independent variable is usually chosen. Why is it important to have only one independent variable?
_____________________________________________________________________________________________
_____________________________________________________________________________________________
Why is it acceptable to have more than one dependent variable?________________________________________
______________________________________________________________________________________________
______________________________________________________________________________________________
The Controlled Variable or Constants
Consider the variables that you identified as alternative independent variables. Although they are not part of the
hypothesis being tested in this investigation, they would have significant effects on the outcome of this experiment.
These variables must, therefore, be kept constant during the course of the experiment. They are known as the
controlled variables. The underlying assumption in experimental design is that the selected independent variable
is the one affecting the dependent variable. This is only true if all other variables are controlled.
What are the controlled variables in this experiment?
______________________________________________________________________________________________
______________________________________________________________________________________________
______________________________________________________________________________________________
What variables other than those you may have already listed can you now suggest?
______________________________________________________________________________________________
______________________________________________________________________________________________
Part B—Choosing or Designing the Procedure
The procedure is the stepwise method, or sequence of steps, to be performed for the experiment. It should be
recorded in a laboratory notebook before initiating the experiment and any exceptions or modifications should be
noted during the experiment. The procedures may be designed from research published in scientific journals,
through collaboration with colleagues in the lab or other institutions, or by means of one’s own novel and creative
ideas. The process of outlining the procedure includes determining control treatment(s), levels of treatments and
numbers of replications.
Level of Treatment
The value set for the independent variable is called the level of treatment. For this experiment, the value was
determined based on previous research and preliminary measurements of sulfur dioxide emissions. The scientists
may select a range of concentrations from no sulfur dioxide to an extremely high concentration. The levels should
be based on knowledge of the system and the biological significance of the treatment level. What was the level of
treatment in the soybean experiment?
____________________________________________________________________________________________
____________________________________________________________________________________________
Replication
Scientific investigations are not valid if the conclusions drawn from them are based on one experiment with one or
two individuals. Generally, the same procedure will be repeated several times (replication), providing consistent
results. Notice that scientists do not expect exactly the same results inasmuch as individuals and their responses
will vary. Results from replicated experiments are usually averaged and may be further analyzed using statistical
tests. Describe replication in the soybean experiment.
_____________________________________________________________________________________________
______________________________________________________________________________________________
______________________________________________________________________________________________
Control or Control Group
The experimental design includes a control in which the independent variable is held at an established level or is
omitted. The control or control treatment serves as a benchmark that allows the scientist to decide whether the
predicted effect is really due to the independent variable. In the case of the soybean experiment, what was the
control treatment?
______________________________________________________________________________________________
______________________________________________________________________________________________
What is the difference between the control and the controlled variables discussed previously?
______________________________________________________________________________________________
______________________________________________________________________________________________
______________________________________________________________________________________________
Part C—Writing Hypotheses
When rigorous science is conducted, experimental design generally includes a Null Hypothesis as well as
Alternative Hypotheses. A Null Hypothesis is one that can be tested statistically and generally states that there is
no effect on the experimental organisms when exposed to the treatment. The Alternative Hypothesis predicts
what the investigator thinks will happen in the experiment. The prediction is always based on the particular
experiment designed to test a specific hypothesis. Hypotheses/Predictions are written in the form of if/then
statements: “If [the independent variable] does this, then the [dependent variable] does this.” For example, “If
cactus spines prevent herbivory, then removal of the spines will result in cacti being eaten by animals.” Making a
prediction provides a critical analysis of the experimental design. If the predictions are not clear, the procedure can
be modified before beginning the experiment. For the soybean experiment, the idea being tested was: “Exposure
to sulfur dioxide reduces reproduction.” State the Null Hypothesis and an Alternative Hypothesis.
Null (HO):
______________________________________________________________________________________________
______________________________________________________________________________________________
Alternative (HA):
______________________________________________________________________________________________
______________________________________________________________________________________________
To evaluate the results of the experiment, the investigator always returns to the prediction. If the results match the
prediction, then the hypothesis is supported. If the results do not match the prediction, then the hypothesis is
rejected. Either way, the scientist has increased knowledge of the process being studied. Many times the rejection
of a hypothesis can provide more information than confirmation, since the ideas and data must be critically
evaluated in light of new information. In the soybean experiment, the scientist may learn that the prediction is true
(sulfur dioxide does reduce reproduction at the concentration tested). As the next step, the scientist may now wish
to identify the particular level at which the effect is first demonstrated.
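The statistical side of this evaluation can be sketched in code. The example below is only an illustration: the pod counts are hypothetical numbers invented for the sketch (they are not results from the soybean study), the function name is made up, and the exact permutation test with a 0.05 cutoff is one of several approaches an investigator might choose rather than the method this exercise prescribes. Under HO the treatment labels are interchangeable, so the function asks how often a relabelling of the plants produces a difference at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """One-sided exact permutation test.

    Returns the fraction of all ways of relabelling the plants in which the
    treated mean falls below the control mean by at least the observed amount.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    n_control = len(control)
    observed = mean(treated) - mean(control)
    total = sum(pooled)

    as_extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), n_treated):
        treated_sum = sum(pooled[i] for i in idx)
        diff = treated_sum / n_treated - (total - treated_sum) / n_control
        # Small tolerance guards against floating-point rounding on ties.
        if diff <= observed + 1e-9:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# HYPOTHETICAL pods-per-plant counts for one group of six plants per treatment.
so2_pods = [10, 12, 9, 11, 8, 10]
filtered_air_pods = [14, 15, 13, 16, 12, 15]

p = permutation_p_value(so2_pods, filtered_air_pods)
print(f"p = {p:.4f}")
if p < 0.05:  # conventional cutoff, assumed here
    print("Reject HO: the data support HA (fewer pods under sulfur dioxide).")
else:
    print("Fail to reject HO: no effect detected in these data.")
```

A small p-value means that a difference this large would rarely arise if the treatment had no effect. It supports the alternative hypothesis but, as noted earlier, does not prove it.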
Case Study: Catalase Activity
Catalase is an enzyme that converts hydrogen peroxide (H₂O₂) to oxygen and water. An experiment investigated the
effect of temperature on the rate of the catalase reaction. Small (10 cm³) test tubes were used for the reactions,
each containing 0.5 cm³ of enzyme and 4 cm³ of hydrogen peroxide. Reaction rates were assessed at four
temperatures (10 °C, 20 °C, 30 °C and 60 °C). For each temperature there were two reaction tubes (i.e. tubes 1 and
2 were both kept at 10 °C). The height of oxygen bubbles present after one minute of reaction was used as a measure
of the reaction rate; a faster reaction rate produced more bubbles. The entire experiment, involving eight tubes, was
repeated on two separate days.
1. What is the purpose of this experiment?
2. Construct good hypotheses for this experiment.
a. Null Hypothesis
b. Alternative Hypothesis
3. What is the independent variable?________________________________________
4. What is the range of values for the independent variable?_____________________
5. What is the dependent variable?_________________________________________
6. Each temperature represents a: (circle one) treatment/sample/trial
7. Why would it have been desirable to have included an extra tube containing no enzyme?
8. Identify three variables that might have been controlled in this experiment and how they could have been
monitored.
a. _______________________________________
b. _______________________________________
c. _______________________________________
9. Explain why controlled variables should be monitored carefully.
Recording Results
Designing a table to record your results is part of planning your investigation. Once you have collected all your
data, you will need to analyze and present it. To do this, it may be necessary to transform your data first, by
calculating a mean or a rate. An example of a table for recording results is presented below. This example
represents a fairly standard layout. The labels on the columns and rows are chosen to represent the design
features of the investigation. The first column contains the entire range chosen for the independent variable.
There are spaces for multiple sampling units, repeats (trials) and averages.
Trial 1 (plant mass in grams)

                     Day no.
pH      Plant #      0      2      4      6      8      10
pH 3    1            0.5    1.1
        2            0.6    1.2
        3            0.7    1.3
        Av.          0.6    1.2
pH 5    1            0.6    1.4
        2            0.8    1.7
        3            0.5    1.9
        Av.          0.6    1.7
pH 7    1            0.7    1.3
        2            0.8    1.3
        3            0.4    1.7
        Av.          0.6    1.4
pH 9    1            0.5    0.6
        2            0.9    1.1
        3            0.8    1.0
        Av.          0.7    0.9

Trial 2 (plant mass in grams) and Trial 3 (plant mass in grams) use the same layout (Day no. 0, 2, 4, 6, 8, 10),
with the cells left blank for recording.

Notes on the layout:
• The heading of each trial gives the dependent variable and its units.
• The pH column gives the range of values for the independent variable.
• The Plant # column provides space for 3 plants at each pH, plus their average (Av.).
• The three trials provide space for repeats of the experimental design (in this case, three trials).

Table 1.1 Plant mass in grams for three trials of plants exposed to four different pH levels.
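The transformations mentioned above (calculating a mean or a rate) can be sketched in a few lines. The example below uses the Trial 1 values recorded in Table 1.1 for days 0 and 2; the dictionary layout and the function names are assumptions made for this sketch, not part of the worksheet.

```python
from statistics import mean

# Trial 1 masses (g) from Table 1.1: pH -> {day: [plant 1, plant 2, plant 3]}
trial_1 = {
    3: {0: [0.5, 0.6, 0.7], 2: [1.1, 1.2, 1.3]},
    5: {0: [0.6, 0.8, 0.5], 2: [1.4, 1.7, 1.9]},
    7: {0: [0.7, 0.8, 0.4], 2: [1.3, 1.3, 1.7]},
    9: {0: [0.5, 0.9, 0.8], 2: [0.6, 1.1, 1.0]},
}


def average_mass(masses_by_day):
    """Mean plant mass for each sampling day, rounded to 0.1 g like the table."""
    return {day: round(mean(masses), 1) for day, masses in masses_by_day.items()}


def growth_rate(masses_by_day, first_day, last_day):
    """Mean gain in mass per day (g/day) between two sampling days."""
    gain = mean(masses_by_day[last_day]) - mean(masses_by_day[first_day])
    return gain / (last_day - first_day)


averages = {ph: average_mass(days) for ph, days in trial_1.items()}
rates = {ph: growth_rate(days, 0, 2) for ph, days in trial_1.items()}

for ph in trial_1:
    print(f"pH {ph}: Av. day 0 = {averages[ph][0]} g, "
          f"Av. day 2 = {averages[ph][2]} g, "
          f"rate = {rates[ph]:.2f} g/day")
```

The computed averages reproduce the Av. rows of the table, which is a quick way to check hand calculations before graphing.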
Case Study
Carbon Dioxide levels in a respiration chamber
A datalogger was used to monitor the concentrations of carbon dioxide in respiration chambers containing five green
leaves from one plant species. The entire study was performed in conditions of full light and involved three
identical set ups. The CO₂ concentrations were measured every minute, over a period of ten minutes, using a CO₂
sensor. A mean CO₂ concentration (for the three set ups) was calculated. The study was carried out two more times,
two days apart.
Design a table to collect data from the case study described. Include space for individual results and
averages from the three set ups.
Graphing
1. Every graph presented should have a good title that tells what the graph is about. A good title
tells exactly what information the author is trying to present with the graph and usually includes
both the dependent and independent variable.
2. Axes should be carefully labeled with numbers, labels and units. Intervals marked on line graphs
should be equal in size and the scale should appropriately fit data.
3. Only plot points that have been measured. Plot points accurately!
4. Guidelines for Bar Graphs
5. Bar graphs are appropriate for data that are nonnumerical and discrete for at least one variable,
i.e. they are grouped into separate categories. There are no dependent or independent variables.
a. Data are collected for discontinuous, nonnumerical categories (e.g. place, color and
species) so the bars do not touch.
b. Data values may be entered on or above the bars if you wish.
c. Multiple sets of data can be displayed side by side for direct comparison (e.g. males and
females in the same age group).
d. Axes may be reversed so that the categories are on the x axis, i.e. the bars can be vertical or horizontal.
6. Guidelines for Histograms
a. Histograms are plots of continuous data
and are often used to represent frequency
distributions where the y axis shows the
number of times a particular measurement
or value was obtained. For this reason,
they are often called frequency histograms.
b. The data are numerical and continuous (e.g.
height or weight) so the bars touch.
c. The x axis usually records the class interval.
The y axis usually records the number of
individuals in each class interval
(frequency).
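Grouping continuous measurements into class intervals before drawing a histogram can be sketched as follows. The heights are hypothetical values invented for this example, and the function name, starting value and class width are likewise assumptions; each class includes its lower limit and excludes its upper limit.

```python
def frequency_table(values, start, width):
    """Count how many values fall in each class interval [lower, upper)."""
    if min(values) < start:
        raise ValueError("start must not exceed the smallest value")
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for value in values:
        counts[int((value - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# HYPOTHETICAL heights in cm, invented for this sketch.
heights_cm = [152, 158, 161, 163, 165, 167, 168, 170, 171, 174, 176, 183]

table = frequency_table(heights_cm, start=150, width=10)
for lower, upper, count in table:
    # One '#' per individual gives a rough text histogram.
    print(f"{lower}-{upper} cm | {'#' * count} ({count})")
```

The class intervals go on the x axis and the counts (frequencies) on the y axis. Because the intervals are contiguous, the bars of the finished histogram touch.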
7. Guidelines for Circle Graphs
a. Circle graphs can be used instead of bar graphs,
generally in cases where there are six or fewer
categories involved. A circle graph provides
strong visual impact of the relative proportions
in each category, particularly where one of the
categories is very dominant.
b. The data for one variable are discontinuous
(non-numerical or categories)
c. The data for the dependent variable are usually
in the form of counts, proportions or
percentages.
8. Guidelines for Line Graphs
a. Line graphs are used when one variable (the independent variable) affects another, the
dependent variable.
b. Data must be continuous for both variables.
c. The dependent variable is usually the
biological response.
d. The independent variable is often time or
the experimental treatment.
e. In cases where there is an implied trend
(e.g. one variable increases with the other),
a line of best fit is usually plotted through
the data points to show the relationship.
f. If fluctuations in the data are likely to be
important (e.g. with climate and other
environmental data) the data points are
usually connected directly (point to point).
g. Take care NOT to extend the line past the last data point; if the line does extend past the last
data point, it should be dashed or end in an arrow to indicate that the data are being extrapolated.
h. Line graphs can be drawn with a measure of error. The data are presented as points (the calculated
means), with bars above and below, indicating a measure of variability or spread
in the data (e.g. standard error, standard deviation or 95% confidence intervals).
i. Where no error value has been calculated, the scatter can be shown by plotting the individual
data points vertically above and below the mean. By convention, bars are not used to indicate
the range of raw values in a data set.
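The measures of error named in guideline h can be computed as in the sketch below. It uses the day 2 masses of the three pH 5 plants from Trial 1 of Table 1.1. The function name and the small lookup table of Student's t critical values are assumptions made for this example; the 95% confidence interval shown is the usual t-based interval for a small sample.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed 95% critical values of Student's t, indexed by df = n - 1.
T_CRITICAL_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def error_bar_summary(values):
    """Mean with three common error-bar measures for a small sample."""
    n = len(values)
    centre = mean(values)
    sd = stdev(values)        # sample standard deviation (n - 1 denominator)
    se = sd / sqrt(n)         # standard error of the mean
    half_width = T_CRITICAL_95[n - 1] * se
    return {
        "mean": centre,
        "sd": sd,
        "se": se,
        "ci95": (centre - half_width, centre + half_width),
    }


# Day 2 masses (g) of the three pH 5 plants, Trial 1, Table 1.1.
summary = error_bar_summary([1.4, 1.7, 1.9])
for name, value in summary.items():
    print(name, value)
```

Whichever measure is plotted, the figure legend should say which one the bars represent, since the three give bars of very different lengths for the same data.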
9. Plotting Multiple Data Sets
a. A single figure can be used to show two or more data sets, i.e. more than one curve can be
plotted per set of axes. This type of
presentation is useful when you want to
visually compare the trends for two or more
treatments, or the response of one species
against the response of another.
b. If the two data sets use the same measurement units and a similar range of
values for the dependent variable, one scale on the y axis is used.
c. If the two data sets use different units and/or have a very different range of values for the
dependent variable, two scales for the y axis are used. The scales can be adjusted if
necessary to avoid overlapping plots.
d. The two curves must be distinguished with a key.
10. Guidelines for Scatter Graphs
a. The data for this graph must be continuous for
both variables.
b. There is no independent variable, but the
variables are often correlated, i.e. they vary
together in some predictable way.
c. Scatter graphs are useful for determining the
relationship between two variables.
d. The points on the graph need not be
connected, but a line of best fit is often drawn
through the points to show the relationship
between the variables.
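A line of best fit is usually found by least squares. The sketch below is one way to compute it, together with Pearson's correlation coefficient r as a measure of how closely the two variables vary together. The paired measurements and all names are hypothetical, invented for illustration. Note that fitting y on x treats x as the predictor, which is a choice of convenience when neither variable is truly independent.

```python
from math import sqrt
from statistics import mean


def line_of_best_fit(xs, ys):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# HYPOTHETICAL paired measurements: shell length (cm) and shell mass (g).
shell_length_cm = [1, 2, 3, 4]
shell_mass_g = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r = line_of_best_fit(shell_length_cm, shell_mass_g)
print(f"mass = {slope:.2f} * length + {intercept:.2f}  (r = {r:.3f})")
```

Values of r close to +1 or -1 indicate a strong linear relationship; values near 0 indicate that a straight line describes the points poorly.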
11. Graph each of the sets of data below. Be sure to select the correct type of graph and follow
the guidelines listed above. Each graph should be hand drawn on a separate sheet of graph paper
and stapled together in the order listed below.
Graph A
Counts of eight mollusk species were made from a series of quadrat (1 square meter) samples at two
sites on a rocky shore. The summary data are presented below.
Table 1. Average abundance of 8 molluscan species from two
sites along a rocky shore.
                   Mean (no. m⁻²)
Species            Site 1    Site 2
Ornate limpet      21        30
Radiate limpet     6         34
Limpet sp. A       38        0
Cats-eye           6         2
Top shell          2         4
Limpet sp. B       57        39
Limpet sp. C       0         2
Chiton             1         3
Graph B
A census of a deer population on an island indicated a population of 2000 animals in 1960. In 1961,
ten wolves (natural predators of deer) were brought to the island in an attempt to control deer
numbers. Over the next nine years, the number of deer and wolves were monitored. The results of
these population surveys are presented below.
Table 2. Results of a population survey on an island
Time (yr)    Wolf Numbers    Deer Numbers
1961         10              2000
1962         12              2300
1963         16              2500
1964         22              2360
1965         28              2244
1966         24              2094
1967         21              1968
1968         18              1916
1969         19              1952
Graph C
The results shown below were collected in a study investigating the effect of temperature on the
activity of an enzyme.
Table 3. An enzyme’s activity at different temperatures.
Temperature (°C)    Rate of reaction (mg of product formed per minute)
10                  1.0
20                  2.1
30                  3.2
35                  3.7
40                  4.1
45                  3.7
50                  2.7
60                  0
Graph D
In a sampling program, the number of perch and trout in a hydro-electric reservoir were monitored
over a period of time. A colony of black shag was also present. Shags take large numbers of perch
and (to a lesser extent) trout. In 1960-61, 424 shags were removed from the lake during the nesting
season and nest counts were made every spring in subsequent years. In 1971, 60 shags were
removed from the lake, and all existing nests dismantled.
Use a vertical arrow on the graph to indicate the point at which the shags & nests were removed.
Table 4. Results of a population survey of trout and perch at a reservoir between 1960 and 1978.
        Fish number (average per haul)
Time    Trout    Perch    Shag nest numbers
1960    --       --       16
1961    --       --       4
1962    1.5      11       5
1963    0.8      9        10
1964    0        5        22
1965    1        1        25
1966    1        2.9      35
1967    2        5        40
1968    1.5      4.6      26
1969    1.5      6        32
1970    1.5      6        35
1971    0.5      0.7      42
1972    1        0.8      0
1973    0.2      4        0
1974    0.5      6.5      0
1975    0.6      7.6      2
1976    1        1.2      10
1977    1.2      1.5      32
1978    0.7      2        28
Graph E
Metabolic measurements were taken from seven Antarctic fish (Pagothenia borchgrevinki). The fish
are affected by a gill disease, which increases the thickness of the gas exchange surfaces and affects
oxygen uptake. The results of oxygen consumption of fish with varying amounts of affected gill (at rest
and swimming) are tabulated below.
Table 5. Oxygen consumption of fish with affected gills.
                                              Oxygen consumption (cm³ g⁻¹ h⁻¹)
Fish Number    Percentage of gill affected    At rest    Swimming
1              0                              0.05       0.29
2              95                             0.04       0.11
3              60                             0.04       0.14
4              30                             0.05       0.22
5              90                             0.05       0.08
6              65                             0.04       0.18
7              45                             0.04       0.20
Graph F
Impact failure threshold of 1018 cold rolled steel
                       Impact Energy (joules)
Temperature (deg C)    Mean    Standard Deviation
-195                   1.4     0.55
0                      62.2    20.79
20                     70.4    14.45
100                    77.3    4.32
Analysis Questions to answer about graphs:
Graph A
1. What can you conclude about this set of data?
Graph B
1. Explain the relationship between wolf numbers and deer numbers as depicted by this graph.
Graph C
1. Estimate the rate of reaction at 15°C.
Graph D
1. Describe the evidence suggesting that the shag population is exercising some control over
perch numbers.
2. Describe evidence that the fluctuations in shag numbers are related to fluctuations in trout
numbers.
Graph E
1. Describe the relationship between the amount of gill affected and oxygen consumption in the
fish for the at rest data.
2. Describe the relationship between the amount of gill affected and oxygen consumption in the
fish for the swimming data set.
3. Describe how the gill disease affects oxygen uptake in resting fish.
Interpreting Line Graphs
For each of the graphs above, give a description of the slope and an interpretation of how one variable
changes with respect to the other. For the purposes of your description, call the independent variable in
each example “variable X” and the dependent variable “variable Y”. Be aware that the existence of a
relationship between two variables does not necessarily mean that the relationship is causative (although
it may be).
(a) Slope: Positive linear relationship, with constantly rising slope
Interpretation: Variable Y (transpiration) increases regularly with increase in variable X (wind
speed)
(b) Slope: __________________________________________________________________________
Interpretation: ___________________________________________________________________
(c) Slope: __________________________________________________________________________
Interpretation: ___________________________________________________________________
(d) Slope: __________________________________________________________________________
Interpretation: ___________________________________________________________________
(e) Slope: __________________________________________________________________________
Interpretation: ___________________________________________________________________
(f) Slope: __________________________________________________________________________
Interpretation: ___________________________________________________________________
Descriptive Statistics
For most investigations, measures of the biological response are made from more than one sampling unit.
The sample size (the number of sampling units) will
vary depending on the resources available. In lab
based investigations, the sample size may be as small
as two or three (e.g. two test tubes in each
treatment). In field studies, each individual may be a
sampling unit, and the sample size can be very large
(e.g. 100 individuals). It is useful to summarize the
data collected using descriptive statistics. Descriptive
statistics, such as mean, median, mode, and range
can help to highlight trends or patterns in the data.
Each of these statistics is appropriate to certain
types of data or distributions, e.g. a mean is not
appropriate for data with a skewed distribution.
Frequency graphs are useful for indicating the
distribution of data. Standard deviation and standard
error are statistics used to quantify the amount of
spread in the data and evaluate the reliability of
estimates of the true (population) mean.
Variation in Data
Whether they are obtained from observation or experiments, most biological data show variability. In a
set of data values, it is useful to know the value about which most of the data are grouped; the center
value. This value can be the mean, median or mode depending on the type of variable involved. The
main purpose of these statistics is to summarize important trends in your data and to provide the basis
for statistical analyses.
Statistic: Mean
Definition and use: The average of all data entries. A measure of central tendency for normally distributed data.
Method of calculation: Add up all the data entries. Divide by the total number of data entries.

Statistic: Median
Definition and use: The middle value when data entries are placed in rank order. A good measure of central tendency for skewed distributions.
Method of calculation: Arrange the data in increasing rank order. Identify the middle value. For an even number of entries, find the midpoint of the two middle values.

Statistic: Mode
Definition and use: The most common data value. Suitable for bimodal distributions and qualitative data.
Method of calculation: Identify the category with the highest number of data entries using a tally chart or a bar graph.

Statistic: Range
Definition and use: The difference between the smallest and largest data values. Provides a crude indication of data spread.
Method of calculation: Identify the smallest and largest values and find the difference between them.
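The four statistics above can be checked quickly in code. This is a minimal sketch using Python's standard `statistics` module; the data values are hypothetical and were chosen only so the results are easy to verify by hand.

```python
import statistics

# Hypothetical data set (not from the worksheet), small enough to check by hand.
values = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(values)          # add up all entries, divide by the number of entries
median = statistics.median(values)      # middle value of the rank-ordered data
modes = statistics.multimode(values)    # most common value(s); a list, so bimodal data are handled
data_range = max(values) - min(values)  # difference between the largest and smallest values

print(f"mean={mean}, median={median}, modes={modes}, range={data_range}")
```

`multimode` is used instead of `mode` so that a bimodal data set returns both peaks rather than only one.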
Distribution of Data
Variability in continuous data is often displayed as a
frequency distribution. A frequency plot will indicate
whether the data have a normal distribution (A), with a
symmetrical spread of data about the mean, or whether the
distribution is skewed (B), or bimodal (C). The shape of the
distribution will determine which statistic (mean, median or
mode) best describes the central tendency of the sample
data.
When to NOT calculate a mean:
a. Do NOT calculate a mean from values that are
already means (averages) themselves.
b. Do NOT calculate a mean of ratios (e.g. percentages)
for several groups of different sizes; go back to the
raw values and recalculate
c. Do NOT calculate a mean when the measurement
scale is not linear (e.g. pH units are not measured on
a linear scale).
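Rule (b) is easiest to see with numbers. The sketch below uses two hypothetical groups of very different sizes (the counts are invented for illustration) and compares the mean of the two percentages with the percentage recalculated from the raw values.

```python
# Hypothetical germination counts (not from the worksheet): (seeds germinated, seeds sown).
groups = [(9, 10), (20, 200)]

# Misleading approach: average the two percentages, ignoring group size.
percentages = [100 * germinated / sown for germinated, sown in groups]
mean_of_percentages = sum(percentages) / len(percentages)

# Recommended approach: go back to the raw values and recalculate.
total_germinated = sum(germinated for germinated, _ in groups)
total_sown = sum(sown for _, sown in groups)
pooled_percentage = 100 * total_germinated / total_sown

print(f"mean of percentages: {mean_of_percentages:.1f}%")
print(f"percentage from raw values: {pooled_percentage:.1f}%")
```

The small group dominates the mean of percentages even though it contributes only 10 of the 210 seeds, which is why the raw values give the more representative figure.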
Measuring Spread
The standard deviation is a frequently used
measure of the variability (spread) in a set of data.
It is usually presented in the form 𝑥̅ ± 𝑠. In a
normally distributed set of data, 68% of all data
values will lie within one standard deviation (s) of
the mean (𝑥̅ ) and 95% of all data values will lie
within two standard deviations of the mean. A
large standard deviation indicates that the data
have a lot of variability. A small sample standard
deviation indicates that the data are clustered
close to the sample mean.
Two different sets of data can have the same mean and range, yet the distribution of data within the
range can be quite different. In both the data sets pictured in the histograms below, 68% of the values lie
within the range 𝑥̅ ± 1𝑠 and 95% of the values lie within 𝑥̅ ± 2𝑠. However, in B, the data values are
more tightly clustered around the mean.
Calculating Standard Deviation:
Set up a table like the one below to easily calculate standard deviation.
Data: 2, 5, 9, 12, 15, 17
Calculate mean: 2 + 5 + 9 + 12 + 15 + 17 = 60; 60/6 = 10
x      x - \bar{x}    (x - \bar{x})^2
2      2 - 10         (2 - 10)^2 = 64
5      5 - 10         (5 - 10)^2 = 25
9      9 - 10         (9 - 10)^2 = 1
12     12 - 10        (12 - 10)^2 = 4
15     15 - 10        (15 - 10)^2 = 25
17     17 - 10        (17 - 10)^2 = 49
                      Sum = 168
Use value from table to calculate s:
s = \sqrt{\frac{168}{6 - 1}} = \sqrt{33.6} = 5.8
\bar{x} \pm s = 10 \pm 5.8
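The worked example above can be reproduced step by step in code. This is a minimal sketch that follows the same table layout (deviations, squared deviations, sum, then the square root); the variable names are illustrative only.

```python
import math
import statistics

data = [2, 5, 9, 12, 15, 17]

n = len(data)
mean = sum(data) / n                                  # 60 / 6
squared_deviations = [(x - mean) ** 2 for x in data]  # the (x - mean)^2 column
sum_of_squares = sum(squared_deviations)              # total of that column
s = math.sqrt(sum_of_squares / (n - 1))               # sample standard deviation

print(f"mean = {mean}, sum of squares = {sum_of_squares}, s = {s:.1f}")

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(s, statistics.stdev(data))
```

Dividing by n - 1 rather than n gives the sample standard deviation, which matches the 6 - 1 in the worked example.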
Calculation of Descriptive Statistics
1. A survey of the number of spores found on the fronds of a fern plant was conducted. The data are
listed below:
Raw data: Number of spores per frond
64 60 64 62 68 66 63
69 70 63 70 70 63 62
71 69 59 70 66 61 70
67 64 63 64
Calculate each of the following—show work for all!
a. Mean
b. Median
c. Mode
d. Range
e. Standard deviation
Reliability of the Mean
You have already seen how to use the standard deviation (s) to quantify the spread or dispersion in your
data. The variance (𝑠 2 ) is another such measure of dispersion, but the standard deviation is usually the
preferred of these two measures because it is expressed in the original units. Usually you will also want
to know how good an estimate your sample mean (𝑥̅ ) is of the true population mean (µ). This can be
indicated by the standard error of the mean (or just standard error—SE). SE is often used as an error
measurement simply because it is small, rather than for any good statistical reason. However, it does
allow you to calculate the 95% confidence interval (95% CI).
When we measure a particular attribute from a sample of a larger population and calculate a mean for
that attribute, we can calculate how close our sample mean (the statistic) is to the true population mean
for that attribute (the parameter). For example: if we calculated the mean number of carapace spots
from a sample of six ladybird beetles, how reliable is this statistic as an indicator of the mean number of
carapace spots in the whole population? We can find out by calculating the 95% confidence interval.
Reliability of the Sample Mean
When we take measurements from samples of a large population, we are using those samples as
indicators of the trends in the whole population. Therefore, when we calculate a sample mean, it is useful
to know how close that value is to the true population mean. This is not merely an academic exercise; it
will enable you to make inferences about the aspect of the population in which you are interested. For
this reason, statistics based on samples and used to estimate population parameters are called inferential
statistics.
The Standard Error
The standard error is simple to calculate and is usually a small value. SE is given by:
SE = \frac{s}{\sqrt{n}}
where s = standard deviation and n = sample size.
Standard errors are sometimes plotted as error bars on graphs, but it is
usually more meaningful to plot the 95% CI.
95% Confidence Interval
This value is usually written as mean ± 95%CI. A 95% confidence
limit tells you that, on average, 95 times out of 100, the limits will
contain the true population mean.
Once researchers have developed a hypothesis, designed an
experiment, collected data and applied a number of descriptive
statistics that summarize the data visually, they can apply the
standard error statistic as an inference to describe the confidence
they have that the means of the sample represent the true means.
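The standard error and an approximate 95% confidence interval can be sketched in a few lines. The example below assumes the rough rule of mean ± 2 × SE for the interval, and the function names are hypothetical; for very small samples a larger multiplier from a t table would normally be used instead of 2.

```python
import math
import statistics


def standard_error(sample):
    """SE = s / sqrt(n), where s is the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def approx_ci95(sample):
    """Approximate 95% confidence interval as mean +/- 2 * SE (a rough rule, an assumption here)."""
    mean = statistics.mean(sample)
    se = standard_error(sample)
    return mean - 2 * se, mean + 2 * se


sample = [2, 5, 9, 12, 15, 17]  # small illustrative data set
se = standard_error(sample)
lower, upper = approx_ci95(sample)
print(f"SE = {se:.2f}, approximate 95% CI = {lower:.2f} to {upper:.2f}")
```

Because SE divides by the square root of n, taking a larger sample narrows the interval even when the spread of the data stays the same.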
SUMMARY INFO
Standard Deviation
This quantifies the spread of the data around the mean.
The larger the standard deviation, the greater the spread of data around the mean.
Standard Error and 95% confidence limits
Use this test when:
You wish to find out if there is a significant difference between two means;
The data are normally distributed; when plotted as a graph it forms a bell shaped curve.
The sizes of the samples are at least 30.
Example: Measuring a continuous variable such as height of males and females in a
population.
The measured values always show a range from a minimum to a maximum value.
The size of the range is determined by such factors as:
Precision of the measuring instrument
Individual variability among the objects being measured.
Biologists like to be confident that the data they collect is within acceptable limits of variance from
the mean.
Standard Error of the Mean
Used to provide confidence limits around the mean.
Provided no factors other than chance influence the results, the means of future sets of
data should fall within these limits.
95% Confidence Limits
Biologists like to be 95% confident that the mean of any data achieved only varies from the mean
of previously recorded data by chance alone.
This range is roughly between -2 and +2 times the standard error.
The probability (p) that the mean value lies outside those limits is less than 1 in 20 (p < 0.05).
Multiply the standard error by 2 then add this to and subtract it from the mean.
Do this for both sets of data.
If the 95% confidence limits do not overlap there is a 95% chance that the two means are
different.
You can reject the null hypothesis and say:
There is a significant difference between the means of the two samples at the 5% level of
probability
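The comparison described above can be sketched as a small overlap check. Everything in this example is an assumption made for illustration: the function names, the two data sets, and the small sample sizes (kept short for readability, although the summary recommends samples of at least 30).

```python
import math
import statistics


def ci95(sample):
    """Approximate 95% confidence limits: mean - 2 * SE and mean + 2 * SE."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se


def intervals_overlap(a, b):
    """True if the (lower, upper) ranges a and b share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical measurements from two sites (not the practice data in this worksheet).
site_x = [12, 14, 15, 13, 16, 14, 15, 13]
site_y = [20, 22, 21, 23, 19, 22, 21, 20]

limits_x = ci95(site_x)
limits_y = ci95(site_y)
overlap = intervals_overlap(limits_x, limits_y)

print(f"site X: {limits_x[0]:.2f} to {limits_x[1]:.2f}")
print(f"site Y: {limits_y[0]:.2f} to {limits_y[1]:.2f}")
print("limits overlap" if overlap else "limits do not overlap: reject the null hypothesis")
```

If the limits do overlap, this quick check alone does not settle the question, and a formal test such as a t-test would be the usual next step.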
Practice
A student investigated the variation in the length of bivalve shells at two
locations on a rocky shore.
Show ALL WORK!!!
State the Null Hypothesis:
State an Alternative Hypothesis:
Data Collected
Shell Length in mm
Group A    Group B
46         23
50         28
45         41
45         31
63         26
57         33
65         35
73         21
55         38
79         30
62         36
59         38
71         45
68         28
77         42
Complete the table below:
            Group A    Group B
Mean
Median
Mode
Range
SE
**95% CI
**CI = mean ± (SE x 2)
Based on the statistics on the previous page, should you accept or reject the Null Hypothesis? Explain.
Based on the statistics calculated on the previous page, what can you conclude?
What does this data and the statistics tell us about the two sets of bivalves and their environment?
The Chi Square Test
This is a statistical test used to compare observed results with expected results. The calculation generates a χ²
value. The higher the value of χ², the greater the difference between the observed and expected results.
Use this test when:
The measurements relate to the number of individuals in a particular category
The observed number can be compared with an expected number which is calculated from theory
How to use a Chi Squared Test:
1. State the Null Hypothesis
a. This statement says that there is no statistical difference between the observed and expected
results
2. Calculate the expected value
a. This may be the mean of the expected values
b. When you study inheritance, you would add up the expected values and apply a ratio
3. Calculate the Chi Squared value
\chi^2 = \sum \frac{(o - e)^2}{e}
where o = observed value
e = expected value
4. Determine the degrees of freedom
a. This is calculated from the formula n-1 where n = the number of sets of results
b. This basically measures how many classes of results can freely vary in their numbers. For example,
if you were tossing two coins at a time and had an accurate count of how many 2 heads and 2 tails
tosses were observed, then you already know how many of the 100 tosses ended up as mixed
head-tails, so the third measurement provides no additional information.
5. Compare the Chi Squared value against a table of critical values.
a. Refer to the degrees of freedom
b. Look up the chi-square value you calculated on the table below.
c. Use the table to determine the p value.
6. Decide whether to accept or reject the Null Hypothesis.
a. If the p value is greater than .05 then you would accept (fail to reject) your null hypothesis.
b. If the p value is .05 or less then you would reject your null hypothesis.
7. Draw a conclusion.
a. The Null hypothesis states that there is no difference between your observed and expected results.
b. If p ≤ .05 then Null Hypothesis is Rejected.
i. This means that if the difference between your observed data and your expected data
would occur due to chance alone fewer than 1 time in 20 (p=.05) then there is a 95%
chance that the differences between your observed and your expected data are due to
some other factor beyond chance.
ii. This means there is a statistically significant difference between what you expected and
what you observed and you would reject your null hypothesis.
c. If p>.05 then the Null Hypothesis is Accepted.
i. This means that there is no statistical difference between the observed and expected
results.
ii. Any differences between the observed and expected results are due to random events and
do not represent any sort of biological phenomenon.
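The steps above can be sketched in code. The coin-toss counts below are hypothetical, the function and variable names are illustrative, and the small lookup of critical values at p = 0.05 stands in for a full printed table.

```python
def chi_squared(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical values of chi-squared at p = 0.05 for 1 to 4 degrees of freedom.
CRITICAL_VALUES_P05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical data: 100 tosses of one coin; null hypothesis is a fair coin (50:50).
observed = [58, 42]  # heads, tails
total = sum(observed)
expected = [total * 0.5, total * 0.5]

chi2 = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1  # n - 1 categories are free to vary
reject_null = chi2 > CRITICAL_VALUES_P05[degrees_of_freedom]

print(f"chi-squared = {chi2:.2f}, df = {degrees_of_freedom}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

A calculated value larger than the critical value corresponds to p < .05; a smaller value corresponds to p > .05, so in this sketch the difference from 50:50 is attributed to chance.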
Chi Square Practice
Naked mole rats are burrowing rodents native to parts of East Africa. They have
a complex social structure in which only one female (the queen) and one to three
males reproduce, while the rest of the members of the colony function as
workers. Mammal ecologists suspected that they had an unusual male to female
ratio. They counted the numbers of each sex in one colony.
Sex       Number of Animals
Female    52
Male      34
State the Null Hypothesis:
What is the sex ratio you would EXPECT in the population?
Calculate the Chi Squared Value below:
Sex       Observed    Expected    o-e    (o-e)²    (o-e)²/e
Female    52
Male      34
Total
χ² = ___________
What are the degrees of freedom? _________
Based on the degrees of freedom and calculated Chi Squared value what is the p value? (LOOK IT UP ON THE TABLE)
___________
Properly state your conclusion below based on the p value.
Chi Squared Practice
You have been wandering about on a seashore and you have noticed that
a small snail (the flat periwinkle) seems to live only on seaweeds of
various kinds. You decide to investigate whether the animals prefer
certain kinds of seaweed by counting numbers of animals on different
species. You end up with the following data:
Type of Seaweed    Number of animals on each kind of seaweed
Serrated wrack     45
Bladder wrack      38
Egg wrack          10
Spiral wrack       5
Other algae        2
TOTAL              100
State the Null Hypothesis:
What proportions would you EXPECT to see of each seaweed?
Calculate the Chi Squared Value below:
Seaweed           Observed    Expected    o-e    (o-e)²    (o-e)²/e
Serrated wrack    45
Bladder wrack     38
Egg wrack         10
Spiral wrack      5
Other algae       2
Total             100
χ² = ___________
What are the degrees of freedom? _________
Based on the degrees of freedom and calculated Chi Squared value what is the p value? (LOOK IT UP ON THE TABLE)
___________
Properly state your conclusion below based on the p value.
Practice!
Analyze each experiment’s strengths and weaknesses by answering the questions that follow.
Experiment 1:
Five tomato plants of the same height were placed in the same size pots, in the same type of soil and each was given the same
amount of water. Each plant was under a light bulb of the same intensity as the others but each light was of a different color.
Each day, the plants were given light (each its own color) for 12 hours and left in darkness for 12 hours. The height of each
plant was measured in centimeters at the end of each week for 10 weeks.
Height of plants (cm)
Week # →         1     2     3     4     5     6     7     8     9     10
Light Color ↓
Yellow           4     5     6     7     8     9     10    11    12    13
Green            4     4     4     3     3     2     2     1     0     0
Blue             4     4     4     5     5     5     5     6     6     6
Purple           4     4     5     5     6     6     7     7     8     8
Red              4     5     6     7     8     9     10    11    12    13
a. What question is being tested in this experiment?
b. Write one null hypothesis and two alternative hypotheses for this experiment. A null hypothesis is one that says that the
treatment will have no effect while an alternative hypothesis is one that makes a prediction of a specific effect.
H0:
Ha:
Ha:
c. What is the independent variable?________________________________________________________________
d. What is (are) the dependent variable(s)?___________________________________________________________
e. What factors are being held constant in this experiment?
f. Is there a control or control group? If not, what control would you suggest?
g. What was the sample size in this experiment?_______
h. Describe any replicates included within this experiment.
i. What are two conclusions that could be drawn from this experiment?
(1)
(2)
J. Describe the strengths of this experiment.
K. Describe the weaknesses of this experiment.
Experiment 2
Five soup cans were painted black and five cans were painted white. A quarter liter of 24°C water was added to each can each
morning at 8am and then the temperature of the water in each can was recorded in degrees Celsius at noon each day for seven
days.
Temperature in °C
                 Mon    Tues    Wed    Thurs    Fri    Sat    Sun
Black cans  1    45     37      40     41       32     35     40
            2    45     37      40     41       32     35     40
            3    45     37      40     41       32     35     40
            4    45     37      40     41       32     35     40
            5    45     37      40     41       32     35     40
White cans  1    41     33      36     37       28     31     36
            2    41     33      36     37       28     31     36
            3    41     33      36     37       28     31     36
            4    41     33      36     37       28     31     36
            5    41     33      36     37       28     31     36
a. What question is being tested in this experiment?
b. Write one null hypothesis and two alternative hypotheses for this experiment.
H0:
Ha:
Ha:
c. What is the independent variable?___________________________________________________________________
d. What is (are) the dependent variable(s)?______________________________________________________________
e. What factors are being held constant in this experiment?_________________________________________________
f. Is there a control or control group? If not, what control would you suggest?
g. What was the sample size in this experiment?_________________________
h. Describe any replicates included within this experiment.
i. What are two conclusions that could be drawn from this experiment?
(1)
(2)
J. Describe the strengths of this experiment.
K. Describe the weaknesses of this experiment.
A group of students decided to investigate whether there is really an average body temperature for all
humans. Their research question was: Is 98.6°F actually the average body temperature for humans?
Imagine that you have designed a study to answer this question and have randomly selected 130 healthy
18- to 40-year-old adults—65 males and 65 females. You have measured their body temperature, orally,
at the same time of day to a 10th of a degree precision. Table 4 shows the data for body temperature
collected, in degrees Fahrenheit.
You will be trying to answer the following question: How well does this sample represent the entire
population of humans?
1. What should you construct first to determine how data is distributed? ___________________
a. Is the data parametric or nonparametric?__________________________
2. Based on the distribution of the data, calculate the appropriate descriptive statistic(s) for this set of
data.
a. IF data is normally distributed what should you calculate? _______________
b. IF data is nonparametric what should you calculate? ___________________
3. The next question to answer is: Does the mean value of the data set above represent an abnormal
mean temperature (representing a true difference between the accepted mean and the one we
calculated), or is the difference an acceptable average that just happened by chance, since we
tested only 130 individuals and not the entire human race?
a. How would you test the variability in the data?_________________________
b. What value did you calculate for this test? ____________________________
c. Explain what this value means________________________________________________
________________________________________________________________________
________________________________________________________________________
4. For this study, the students randomly chose 130 people to have their body temperature measured.
Discuss what would happen if the students took another random sample of 130 from the true
population. Would that sample mean equal the first sample mean? Explain!!_________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
5. What would you expect if students repeated samplings like the one explained in #4 10 more
times or even 100 more times? _____________________________________________________
______________________________________________________________________________
______________________________________________________________________________
6. What would you calculate to determine how confident you are that the sample mean is close to the
true mean? ___________________________ What formula would you use? ________________
Calculate this value for this data set_______________________
7. Explain in words what your calculation in #6 means. _____________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
8. If you did everything correctly prior to this you would know that we can be 95% confident that the
true population mean lies between 98.13°F and 98.37°F. Because 98.6°F lies outside those
boundaries, is there sufficient evidence to claim that 98.6°F is wrong and 98.25°F is a more
accurate representation of the true population mean? You could use a t-test to determine this and
would get a t-value = (98.25-98.6)/.06 = -5.45. If df = 129, then p is much less than .0003. Explain
how you would interpret these results.
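The t-value in question 8 has the form (sample mean - accepted mean) / SE. The sketch below shows that calculation for a one-sample t-test; the function name and the eight temperatures are hypothetical and are not the 130 measurements from the study, so the printed value will differ from the one quoted above.

```python
import math
import statistics


def one_sample_t(sample, accepted_mean):
    """t = (sample mean - accepted mean) / SE, with SE = s / sqrt(n)."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - accepted_mean) / se


# Hypothetical oral temperatures in degrees Fahrenheit (invented for illustration).
temperatures = [98.2, 98.4, 97.9, 98.6, 98.1, 98.3, 98.0, 98.5]

t_value = one_sample_t(temperatures, 98.6)
degrees_of_freedom = len(temperatures) - 1

print(f"t = {t_value:.2f}, df = {degrees_of_freedom}")
```

A t-value far from zero means the sample mean is many standard errors away from the accepted mean; the p value is then read from a t table using the degrees of freedom.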