Statistical Analysis - HIS IB Biology 2011-2013

Biology 1.0
All DP biology students should be able to:
• Perform the basic arithmetic functions (addition, subtraction, multiplication and division)
• Recognise basic geometric shapes
• Carry out simple calculations within a biological context involving decimals, fractions, percentages, ratios, approximations, reciprocals and scaling
• Use standard notation, e.g. 3.6 × 10⁶
• Use direct and inverse proportion
• Represent and interpret frequency data in the form of bar charts, column graphs and histograms, and interpret pie charts and nomograms
• Determine the mode and median of a set of data
• Plot and interpret graphs (with suitable scales and axes) involving two variables that show linear or non-linear relationships
• Plot and interpret scatter graphs to identify a correlation between two variables, and appreciate that the existence of a correlation does not establish a causal relationship
• Demonstrate sufficient knowledge of probability to understand how Mendelian ratios arise and to calculate such ratios using a Punnett grid
• Make approximations of numerical expressions
• Recognise and use the relationships between length, surface area and volume
1.1.1 State that error bars are a graphical representation of the variability of data
• If we plot the mean with the range, this shows the spread of the data around the mean.
• The graph shows how variable the data (measurements) are in comparison to the mean:
  • With a wide spread, the mean is less reliable.
  • With a narrow spread, the mean is more reliable.
Example of the mean with the full data range: comparison of the shell length of two samples of gastropod from different locations.
Marine population: mean = 30.7, range = 23-43
Brackish population: mean = 41.3, range = 32-51
The rules for using error bars:
Rule 1: Always state on the graph which type of error bar is being used.
• Mean with range
• Mean ± SD (standard deviation)
• Mean ± SE (standard error)
Rule 2: Always state the sample size (n) in the legend of the graph.
• If there were 20 repeats/measurements, then we add n = 20.
Rule 3: Error bars and statistics should only be shown for independently repeated experiments, and never for replicates.
• If we wanted to find the mean height of sycamore trees, we would measure the heights of different trees (independently repeated experiments), not the same tree many times (replicates).
Calculating the mean & standard deviation
• Data collected from an experiment falls into three categories:

Data type   Example                                                      Central tendency
Nominal     Frequency counts (number of cats / number of ball bounces)   Mode
Ordinal     Ranked (1st, 2nd / relative data)                            Median
Interval    On a scale (mm, m s⁻¹)                                       Mean

• The arithmetic mean or average is a measure of the central tendency (middle value) of the data.
• It is the sum of all values in the data divided by the number of values:
  mean (x̄) = ΣX / N
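As a rough illustration of matching the measure of central tendency to the type of data, the sketch below uses Python's standard `statistics` module. The example values and variable names are hypothetical and are not taken from any experiment in these notes.

```python
from statistics import mean, median, mode

# Hypothetical example data (assumptions, for illustration only)
pets_seen = ["cat", "dog", "cat", "fish", "cat"]   # categories that are counted
race_ranks = [1, 2, 3, 4, 5, 6, 7]                 # ranked data
leaf_lengths_mm = [12.0, 15.0, 11.0, 14.0, 13.0]   # measured on a scale (mm)

most_common_pet = mode(pets_seen)        # mode: the most frequent category
middle_rank = median(race_ranks)         # median: the middle ranked value
average_length = mean(leaf_lengths_mm)   # mean: sum of values / number of values

print(most_common_pet, middle_rank, average_length)
```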
Standard deviation:
• A measurement of the spread of data above and below the mean.
• For normally distributed data, about 68% of values fall within ±1 standard deviation of the mean.
Calculating the standard deviation
YOU MUST MEMORISE THIS EQUATION!
• Yes, you can use a calculator!
• Calculators are allowed for Papers 2 and 3 but not Paper 1
σ = standard deviation of the sample
Σ = summation of
X − x̄ = difference between each X value and the mean
N = number of values
To calculate the standard deviation:
1. Find the mean (x̄) and the number of values (N)
2. Calculate X − x̄ for every value
3. Find (X − x̄)² for every value
4. Find the sum, Σ(X − x̄)²
5. Divide this sum by N − 1 (the sample form, which Excel's STDEV uses; the population form divides by N)
6. Take the square root of the result
See the Excel spreadsheet for the calculations.

Find the STDEV for both of these data sets:
Shell length (mm) for two populations of a mollusc species

Pop 1   Pop 2
32      38
31      43
27      34
34      40
37      44
38      45
36      39
22      46
34      48
23      39
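The calculation can also be sketched in Python as a cross-check on a spreadsheet. This is a minimal sketch that assumes the sample form of the standard deviation (dividing by N − 1), which is what Excel's STDEV function computes; the function name `manual_sample_sd` is made up for this example.

```python
from statistics import mean, stdev

# Shell lengths (mm) for the two mollusc populations listed above
pop1 = [32, 31, 27, 34, 37, 38, 36, 22, 34, 23]
pop2 = [38, 43, 34, 40, 44, 45, 39, 46, 48, 39]

def manual_sample_sd(values):
    """Sample standard deviation, step by step (assumes division by N - 1)."""
    n = len(values)
    m = sum(values) / n                              # the mean
    squared_diffs = [(x - m) ** 2 for x in values]   # (X - mean)^2 for each value
    return (sum(squared_diffs) / (n - 1)) ** 0.5     # divide the sum, then square root

for name, data in (("Pop 1", pop1), ("Pop 2", pop2)):
    print(name, "mean =", round(mean(data), 1), "SD =", round(manual_sample_sd(data), 2))
```

The standard library's `stdev` uses the same N − 1 form, so it should agree with the manual version.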
What does the standard deviation mean?
Used to summarise the spread of values around the mean. For normally distributed data:
• About 68% of the values fall within 1 standard deviation (±1 SD)
• About 95% of the values fall within 2 standard deviations (±2 SD)
• A sample with a small standard deviation suggests that the set of data has a narrow variation (less error / less uncertainty)
• A sample with a high standard deviation suggests that the set of data has a wide variation (more error / more uncertainty)
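To see how the ±1 SD and ±2 SD ranges behave on real measurements, the sketch below counts how many Pop 1 shell lengths fall inside each range. It assumes the sample standard deviation (N − 1), and the helper `count_within` is a name invented for this example. With only ten values, the proportions can only roughly match the 68% and 95% figures.

```python
from statistics import mean, stdev

pop1 = [32, 31, 27, 34, 37, 38, 36, 22, 34, 23]  # shell lengths (mm), Pop 1

m = mean(pop1)
sd = stdev(pop1)  # sample SD (divides by N - 1), an assumption

def count_within(values, centre, spread, k):
    """Count the values lying within k spreads of the centre."""
    return sum(1 for x in values if abs(x - centre) <= k * spread)

within_1sd = count_within(pop1, m, sd, 1)
within_2sd = count_within(pop1, m, sd, 2)
print(f"{within_1sd}/{len(pop1)} within 1 SD, {within_2sd}/{len(pop1)} within 2 SD")
```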
• When presenting data as a graph, you can show:
  • Mean with range
  • Mean ± standard deviation
• These graphs will allow you to evaluate the reliability of your data.
Graph A: mean ± SD
• The graph shows an overlap of the SD bars.
• If two SD error bars overlap, the difference is likely not statistically significant.
• Sample A and Sample B are not significantly different.
Graph B: mean ± SD
• BUT if the two SD bars do not overlap, we still CANNOT conclude that the samples are statistically different; a statistical test is needed to decide.
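Checking whether two mean ± SD bars overlap is simple interval arithmetic. The sketch below applies it to the shell length data for the two mollusc populations given earlier in these notes. It assumes the sample standard deviation, and the helper names `sd_bar` and `bars_overlap` are invented for this example.

```python
from statistics import mean, stdev

# Shell lengths (mm) for the two mollusc populations
pop1 = [32, 31, 27, 34, 37, 38, 36, 22, 34, 23]
pop2 = [38, 43, 34, 40, 44, 45, 39, 46, 48, 39]

def sd_bar(values):
    """Return the (lower, upper) ends of a mean ± 1 SD error bar."""
    m = mean(values)
    sd = stdev(values)  # sample SD (divides by N - 1), an assumption
    return (m - sd, m + sd)

def bars_overlap(bar_a, bar_b):
    """True if two (lower, upper) intervals share any values."""
    return bar_a[0] <= bar_b[1] and bar_b[0] <= bar_a[1]

bar1 = sd_bar(pop1)
bar2 = sd_bar(pop2)
overlap = bars_overlap(bar1, bar2)
print("Pop 1 bar:", bar1, "Pop 2 bar:", bar2, "overlap:", overlap)
```

For these two populations the bars do not overlap, which by itself does not establish a significant difference.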
The t-test
This is how we can test whether two sets of data are significantly different.
• It takes into account:
  • The means
  • The amount of overlap
• This lets us judge, at a stated level of confidence, whether the two sets of data are significantly different or not.
• With the t-test, we always start by stating the null hypothesis:
H0 = “there is no significant difference”
• If the t-test tells us to accept H0, then there is no significant difference between the means of the two data sets.
• If the t-test tells us to reject H0, then there is a significant difference between the means of the two data sets.
• The t-test gives a probability (P): how likely it is that a difference this large would arise by chance if the two data sets really were the same.
  • If P is close to 1, the two sets of data are very similar.
  • If P is close to 0, the two sets of data are very different.
  • If P ≤ 0.05, we are 95% confident that the difference is significant.
The t-test in Excel
• We can calculate the t-test in Excel (easy for lab reports).
See the Excel spreadsheet for the t-test.
For the examples you'll use in biology:
• Tails is always 2
• Type can be:
  1. Paired
  2. Two samples, equal variance
  3. Two samples, unequal variance
What do the results mean?
• Using Excel, we found that P = 0.003 (2-tailed test)
• P < 0.05, therefore we reject the null hypothesis:
H0 = “there is no significant difference”
SO: “There is a significant difference between the height of shells in sample A and sample B”
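Excel reports P directly, but the underlying t statistic can be sketched by hand. The example below assumes the two-sample, equal-variance form of the test (a pooled variance) and uses the shell length data for the two mollusc populations from earlier in these notes, which may not be the data behind the P = 0.003 quoted above. The critical value 2.101 is the standard two-tailed table value for P = 0.05 with 18 degrees of freedom. The function name `pooled_t` is invented for this example.

```python
from math import sqrt
from statistics import mean, variance

# Shell lengths (mm) for the two mollusc populations
pop1 = [32, 31, 27, 34, 37, 38, 36, 22, 34, 23]
pop2 = [38, 43, 34, 40, 44, 45, 39, 46, 48, 39]

def pooled_t(a, b):
    """Two-sample t statistic assuming equal variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df

t, df = pooled_t(pop1, pop2)
CRITICAL_T = 2.101  # two-tailed, P = 0.05, df = 18 (standard t-table value)
reject_h0 = abs(t) > CRITICAL_T
print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```

The sign of t only reflects which sample has the larger mean, so the comparison with the critical value uses the absolute value.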
In an exam, you will be provided with a t-value, which you need to compare to the critical value in the t-test table:
• Significance is the confidence level (P)
• df is the degrees of freedom (total sample size minus 2)
• Use these values to get a critical value
Using the t-test in the exam
A researcher measured the wingspans of 12 red-throat and 13 broad-billed hummingbirds.
• H0 = there is no significant difference in the wingspans of these two hummingbird species.
• Calculate the degrees of freedom:
  df = (n1 + n2) − 2 = (12 + 13) − 2 = 23
• P = 0.05 (you will be given this value)
• Critical value (CV) = 1.714
What do the results mean?
• t = 2.15 (this comes from a complicated equation that you don't need to do; you will be provided with the value)
• The critical value (CV) = 1.714
If t < CV, then accept H0
If t > CV, then reject H0
2.15 > 1.714, so t > CV
So we REJECT the null hypothesis.
“There is a significant difference between red-throat and broad-billed hummingbirds in terms of wingspan”
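The exam procedure comes down to two small steps, sketched below: work out the degrees of freedom, then compare the given t-value with the critical value. The numbers are the ones quoted in the hummingbird example; the function names are invented for this sketch.

```python
def degrees_of_freedom(n1, n2):
    """df for a two-sample t-test: total sample size minus 2."""
    return n1 + n2 - 2

def decide(t_value, critical_value):
    """Apply the decision rule: reject H0 when t exceeds the critical value."""
    return "reject H0" if abs(t_value) > critical_value else "accept H0"

df = degrees_of_freedom(12, 13)   # 12 red-throat and 13 broad-billed birds
decision = decide(2.15, 1.714)    # t and critical value as quoted in the example
print(f"df = {df}, decision: {decision}")
```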
Correlations
• Correlations can suggest relationships between sets of data.
• It is important to remember that correlations do not prove causality.
• If a correlation exists, further research is needed to determine if the relationship is causal.
Some causal relationships:
• Temperature vs enzyme activity
• Concentration vs rate of diffusion
• CO2 concentration vs rate of photosynthesis
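One common way to put a number on the strength of a linear correlation is Pearson's correlation coefficient, r, which these notes do not cover in detail. The sketch below is an illustration only: the temperature and rate values are hypothetical, and a value of r close to 1 still says nothing about whether the relationship is causal.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)

# Hypothetical measurements (assumptions, for illustration only)
temperature_c = [10, 15, 20, 25, 30]
reaction_rate = [2.1, 3.0, 4.2, 4.9, 6.1]

r = pearson_r(temperature_c, reaction_rate)
print(f"r = {r:.3f}")
```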