Statistics!!! - mrsreedsibbiowiki

STATISTICS!!!
The science of data
What is data?
Information, in the form of facts or
figures obtained from experiments
or surveys, used as a basis for
making calculations or drawing
conclusions
Encarta dictionary
Statistics in Science
• Data can be collected about a
population (surveys)
• Data can be collected about a
process (experimentation)
2 types of Data
• Qualitative
• Quantitative
Qualitative Data
• Information that relates to characteristics or
description (observable qualities)
• Information is often grouped by descriptive
category
• Examples
– Species of plant
– Type of insect
– Shades of color
– Rank of flavor in taste testing
Remember: qualitative data can be “scored” and
evaluated numerically
Qualitative data, manipulated numerically
• Survey results, teens and need for environmental action
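As a rough illustration of how qualitative answers can be "scored", here is a minimal Python sketch. The response labels, the 1 to 5 scores, and the sample answers are all made up for this example; they are not the survey data from the slide.

```python
from collections import Counter

# Hypothetical scoring scheme: labels and scores are illustrative only.
SCORES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

def score_responses(responses):
    """Return (count per category, mean score) for a list of text responses."""
    counts = Counter(responses)
    total = sum(SCORES[r] for r in responses)
    return counts, total / len(responses)

# Made-up answers to a statement such as "Environmental action is needed".
responses = ["agree", "strongly agree", "neutral", "agree", "disagree"]
counts, mean_score = score_responses(responses)
print(counts["agree"])  # 2
print(mean_score)       # 3.6
```

The counts keep the data grouped by descriptive category, while the mean score is the "numerical manipulation" of the same answers.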
Quantitative data
• Quantitative – measured using a
naturally occurring numerical scale
• Examples
–Chemical concentration
–Temperature
–Length
–Weight…etc.
Quantitation
• Measurements are often displayed
graphically
Quantitation = Measurement
• In data collection for Biology, data must be
measured carefully, using laboratory equipment
(ex. timers, metersticks, pH meters, balances, pipettes, etc.)
• The limits of the equipment used add some
uncertainty to the data collected. All equipment
has a certain magnitude of uncertainty. For
example, is a ruler that is mass-produced a
good measure of 1 cm? 1mm? 0.1mm?
• For quantitative testing, you must indicate the
level of uncertainty of the tool that you are
using for measurement!!
How to determine uncertainty?
• Usually the instrument manufacturer will indicate
this – read what is provided by the manufacturer.
• Be sure that the number of significant digits in
the data table/graph reflects the precision of the
instrument used. For example, if the manufacturer
states that a balance is accurate to 0.1 g and
your average mass is 2.06 g, be sure to
round the average to 2.1 g. Your data must be
consistent with your measurement tool
regarding significant figures.
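The rounding step in the balance example can be sketched in Python. The function name `round_to_instrument` is an assumption made for this illustration. Values are passed as strings so that the decimal digits are handled exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to_instrument(value, resolution):
    """Round a value (given as a string) to the instrument's resolution."""
    return Decimal(value).quantize(Decimal(resolution), rounding=ROUND_HALF_UP)

# Average mass of 2.06 g on a balance that reads to 0.1 g.
print(round_to_instrument("2.06", "0.1"))  # 2.1
```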
Finding the limits
• As a “rule-of-thumb”, if not specified, use +/- 1/2
of the smallest measurement unit (ex. a metric
ruler is lined to 1 mm, so the limit of uncertainty of
the ruler is +/- 0.5 mm.)
• If the room temperature is read as 25 degrees
C, with a thermometer that is scored at 1 degree
intervals – what is the range of possible
temperatures for the room?
(+/- 0.5 degrees Celsius: the room may be anywhere
from 24.5 to 25.5 degrees. Likewise, if you read 15 °C,
it may in fact be anywhere from 14.5 to 15.5 degrees.)
-Stephen Taylor
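The half-division rule of thumb is simple arithmetic. A minimal Python sketch, with `reading_range` as a hypothetical helper name:

```python
def reading_range(reading, smallest_division):
    """Return (lowest, highest) plausible value using +/- half a division."""
    half = smallest_division / 2
    return reading - half, reading + half

# Thermometer scored at 1 degree intervals.
print(reading_range(25, 1))  # (24.5, 25.5)
print(reading_range(15, 1))  # (14.5, 15.5)
```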
Looking at Data
• How accurate is the data? (How close are
the data to the “real” results?)
• How precise is the data? (How close are
repeated measurements to one another? All test
systems have some uncertainty, due to limits of
measurement.) Estimation of the limits of
the experimental uncertainty is essential.
Quick Review – 3 measures of “Central Tendency”
• mode: value that appears most frequently
• median: When all data are listed from least to
greatest, the value at which half of the
observations are greater, and half are lesser.
• The most commonly used measure of central
tendency is the mean, or arithmetic average
(sum of data points divided by the number of
points)
• Do not calculate a mean from values that are
already averages.
• Do not calculate a mean when the
measurement scale is not linear (ex. pH, which is
logarithmic).
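The three measures can be checked with Python's standard `statistics` module. The leaf lengths below are made up for illustration. The second part is one possible way to handle the pH warning: it assumes the usual definition pH = -log10 of the hydrogen ion concentration and averages the concentrations rather than the pH values. The helper name `mean_ph` is hypothetical.

```python
import math
import statistics

lengths_mm = [42, 45, 45, 47, 51]  # made-up measurements

print(statistics.mode(lengths_mm))    # 45
print(statistics.median(lengths_mm))  # 45
print(statistics.mean(lengths_mm))    # 46

def mean_ph(ph_values):
    """Average pH by averaging hydrogen ion concentrations, not pH values."""
    concentrations = [10 ** -ph for ph in ph_values]
    return -math.log10(statistics.mean(concentrations))

# A naive mean of pH 3 and pH 5 would give 4.0.
print(round(mean_ph([3.0, 5.0]), 2))  # 3.3
```

The gap between 4.0 and 3.3 shows why a plain mean is misleading on a non-linear scale.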
Comparing Averages
• Once the average is
calculated for each of the 2 sets of data,
the average values can be plotted
together on a graph, to visualize
the relationship between the 2 data sets
Drawing error bars
The simplest way to draw an error bar is to
use the mean as the central point, and to
use the distance of the measurement that
is farthest from the average (the maximum deviation)
to set the endpoints of the bar (use with
fewer than 5 data points)
The RANGE is the difference between the smallest
and largest measurements of a sample. It provides
a sense of the variation of the sample.
[Figure: error bar diagram labeling the average value, the value farthest from the average, and the calculated distance between them]
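A minimal Python sketch of the maximum-deviation error bar described above. The measurements are made up, and `max_deviation_error_bar` is a hypothetical helper name.

```python
import statistics

def max_deviation_error_bar(values):
    """Return (mean, distance from the mean to the farthest value)."""
    mean = statistics.mean(values)
    max_dev = max(abs(v - mean) for v in values)
    return mean, max_dev

values = [10.0, 12.0, 11.0, 15.0]  # made-up measurements
mean, max_dev = max_deviation_error_bar(values)
print(mean, max_dev)                  # 12.0 3.0
print(mean - max_dev, mean + max_dev) # 9.0 15.0
```

The last line gives the lower and upper endpoints of the bar drawn around the mean.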
What do error bars suggest?
• If the bars show extensive overlap, it is
likely that there is not a significant
difference between those values
Sample #1 25, 35, 32, 28
Sample #2 15, 75, 10, 20
Find the mean of each sample.
• These samples have the same mean, but
are still very different. How different? Use
range.
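The mean and range of the two samples can be worked out with a short Python sketch (`mean_and_range` is a hypothetical helper name):

```python
import statistics

sample_1 = [25, 35, 32, 28]
sample_2 = [15, 75, 10, 20]

def mean_and_range(values):
    """Return (mean, largest value minus smallest value)."""
    return statistics.mean(values), max(values) - min(values)

print(mean_and_range(sample_1))  # (30, 10)
print(mean_and_range(sample_2))  # (30, 65)
```

Both means are 30, but the range of Sample #2 is more than six times that of Sample #1.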
How can leaf lengths be displayed graphically?
Simply measure the length of each leaf and plot how
many are of each length
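Counting how many leaves fall at each length is the core of a histogram. A small Python sketch with made-up leaf lengths, printed as a text histogram:

```python
from collections import Counter

leaf_lengths_mm = [52, 54, 53, 55, 53, 54, 54, 56, 53, 54]  # made-up data

counts = Counter(leaf_lengths_mm)
for length in sorted(counts):
    # One '#' per leaf of this length.
    print(f"{length} mm | {'#' * counts[length]}")
```

With this data the longest row is 54 mm, with fewer leaves at the shorter and longer lengths.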
If smoothed, the histogram data
assumes this shape
This Shape?
• Is a classic bell-shaped curve, AKA
Gaussian Distribution Curve, AKA a
Normal Distribution curve.
• Essentially it means that in many studies with
an adequate number of data points, most
results tend to be near the mean. Fewer
results are found farther from the mean
Standard Deviation
• The standard deviation is a statistic that
tells you how tightly all the various
data points are clustered around the mean
in a set of data
Standard deviation
• The STANDARD DEVIATION is a more
sophisticated indicator of the precision of a
set of a given number of measurements
– The standard deviation is like an average
deviation of measurement values from the
mean. In large studies (5 or more data
points), the standard deviation is used to draw
error bars, instead of the maximum deviation.
A typical normal distribution curve
According to this curve:
• One standard deviation away from the
mean in either direction on the horizontal
axis (the red area on the preceding graph)
accounts for somewhere around 68
percent of the data in this group.
• Two standard deviations away from the
mean (the red and green areas) account
for roughly 95 percent of the data.
Three Standard Deviations?
• Three standard deviations (the red, green
and blue areas) account for about 99.7
percent of the data
[Figure: normal distribution curve, horizontal axis marked from -3 sd to +3 sd around the mean]
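These percentages can be checked for an ideal normal distribution with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). Real data sets will only approximate these values.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```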
How is Standard Deviation calculated?
With this formula!
• You DO NOT need to memorize the formula
• It can be calculated on a scientific calculator
• OR…. In Microsoft Excel
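The formula shown on the slide is not reproduced in this transcript. As a sketch of what a calculator does, the Python below assumes the sample standard deviation (dividing by n - 1), which may or may not be the version the slide showed. The data are made up.

```python
import math

def sample_standard_deviation(values):
    """Sample standard deviation, assuming the n - 1 form of the formula."""
    n = len(values)
    mean = sum(values) / n
    # Square each value's distance from the mean.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Average the squares over n - 1, then take the square root.
    return math.sqrt(sum(squared_deviations) / (n - 1))

print(sample_standard_deviation([1.0, 3.0, 5.0]))  # 2.0
```

Here the mean is 3, the squared deviations are 4, 0 and 4, their sum divided by 2 is 4, and the square root of 4 is 2.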
You DO need to know the concept!
• Standard deviation is a statistic that tells how
tightly all the various data points are clustered
around the mean in a set of data.
• When the data points are tightly bunched
together and the bell-shaped curve is steep, the
standard deviation is small. (precise results,
smaller sd)
• When the data points are spread apart and the
bell curve is relatively flat, the standard
deviation is large. (less precise results, larger sd)
Given the set of numbers
{20.0 mL, 23.0 mL, 25.0 mL, 26.0 mL, 25.0 mL},
calculate the mean and the standard deviation
using your calculator.
http://click4biology.info/c4b/1/gcStat.htm#enter
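For checking a calculator result, the same exercise can be run with Python's `statistics` module. Calculators often offer both a sample (n - 1) and a population (n) standard deviation, so both are shown. Which one is expected here is not stated in the transcript.

```python
import statistics

volumes_ml = [20.0, 23.0, 25.0, 26.0, 25.0]

mean = statistics.mean(volumes_ml)
sample_sd = statistics.stdev(volumes_ml)       # divides by n - 1
population_sd = statistics.pstdev(volumes_ml)  # divides by n

print(round(mean, 1))           # 23.8
print(round(sample_sd, 2))      # 2.39
print(round(population_sd, 2))  # 2.14
```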
Now let's look at how standard deviation can be used to help
us decide whether the difference between two means is likely
to be significant.
Thirty teenage boys measured the length of their left and
right hands to find out whether they are different.
Hand     Mean length    SD
left     188.6 mm       11.0 mm
right    188.4 mm       10.9 mm
Because the SDs are much greater than the difference
in mean length, it is very unlikely that the difference in
mean length between left and right hands is significant.
The same thirty boys also measured the length of
their right foot to find out whether it was different from
their hand lengths.
Appendage     Mean length    SD
right hand    188.4 mm       10.9 mm
right foot    262.5 mm       14.3 mm
Because the SDs are much less than the difference in
mean length, it is very likely that the difference in mean
length between right hands and right feet is significant.
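The comparison of SDs with the difference in means can be sketched in Python as a check on whether +/- 1 SD error bars overlap. This is only a rough rule of thumb under that assumption, not a formal significance test, and `sd_bars_overlap` is a hypothetical helper name.

```python
def sd_bars_overlap(mean_a, sd_a, mean_b, sd_b):
    """True if the +/- 1 SD bars around the two means overlap."""
    return abs(mean_a - mean_b) <= sd_a + sd_b

# Left hand vs right hand (values in mm, from the tables above).
print(sd_bars_overlap(188.6, 11.0, 188.4, 10.9))  # True
# Right hand vs right foot.
print(sd_bars_overlap(188.4, 10.9, 262.5, 14.3))  # False
```

Overlapping bars (True) suggest the difference is unlikely to be significant. Bars that are far apart (False) suggest that it probably is.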
Results significantly different?
NO