Assessment Statements
• 1.1.1 – State that error bars are a graphical representation of the
variability of data
• 1.1.2 – Calculate the mean and standard deviation of a set of
values
• 1.1.3 – State that the term standard deviation is used to
summarize the spread of values around the mean, and that 68%
of the values fall within one standard deviation of the mean
• 1.1.4 – Explain how the standard deviation is useful for comparing
the means and the spread of data between two or more samples
• 1.1.5 – Deduce the significance of the difference between two
sets of data using calculated values for t and the appropriate
tables
• 1.1.6 – Explain that the existence of a correlation does not
establish that there is a causal relationship between two variables
http://www.worldometers.info/
STATISTICS!!!
The science of data
Scientific Method - Review of Experimental
Terminology and Concepts
• Independent Variable: The variable that is manipulated in an
experiment. For example, in an experiment to test the effect
of light intensity on plant growth, the light intensity would be
the independent variable because you are manipulating
(changing) the intensity to see what effect it has on plant
growth.
• Dependent Variable: This is the variable that is measured in
an experiment. It is the variable that you are measuring to
see what effect the independent variable has on it. In the
example above, the Dependent Variable is Plant Growth.
• Experimental Group(s): All experiments have Experimental Groups.
Each group consists of the organisms (in this case, individual plants) that are exposed
to the same light intensity. For example, 10 plants exposed to 20 foot-candles
of light intensity would constitute one experimental group. Ten
different plants exposed to 40 foot-candles of light would be another
experimental group, and so on.
• Control Group: Some experiments have a Control Group. If there is a
Control Group, it is the group of organisms (in this case plants) that are
not exposed to the factor being tested. In our example, the Control Group
could perhaps be a group of plants in total darkness (i.e. the absence of
the factor being tested, light intensity). This group always serves as a
standard of COMPARISON for the Experimental Groups. Think of another
experiment that would have a control group and identify the control group
below.
The Effect of _______________________
on plant growth
The control group in this experiment would be
_____________________________________
_____________________________________
• Controlled Variables: Even though some experiments don’t have a Control Group, all
good experiments have Controlled Variables. These are the variables, other than the
Independent Variable, that could affect the Dependent Variable (in this example plant
growth). In our plant example, controlled variables would include
– Amount of water the plants receive
– Temperature that the plants are exposed to
– Nutrients in the soil
These variables can be controlled in two ways.
• Physical Control: By keeping the controlled variables exactly the same, we control
them. In the example above, we would water all the experimental and control groups
with precisely the same amount of water. We would also use the same soils with the
same nutrient content.
• Making sure any variations are experienced equally: Some variables are difficult to
control physically. In such cases, an acceptable substitute for physical control is to
make sure that all plants experience the same fluctuations, thereby cancelling out the
effects of the variation. In our example above, putting the plants in the same area of a
room would ensure that any unavoidable temperature fluctuations in the room would
be experienced by all plants in the experiment.
What is data?
Information, in the form of facts or
figures obtained from experiments
or surveys, used as a basis for
making calculations or drawing
conclusions
Encarta dictionary
Statistics in Science
• Data can be collected about a
population (surveys)
• Data can be collected about a
process (experimentation)
Qualitative Data
• Information that relates to characteristics or
description (observable qualities)
• Information is often grouped by descriptive category
• Examples
– Species of plant
– Type of insect
– Shades of color
– Rank of flavor in taste testing
Remember: qualitative data can be “scored” and evaluated
numerically
Qualitative data, manipulated numerically
• Survey results, teens and need for environmental action
Quantitative data
• Quantitative – measured using a
naturally occurring numerical scale
• Examples
–Chemical concentration
–Temperature
–Length
–Weight…etc.
Quantitation
• Measurements are often displayed graphically
Quantitation = Measurement
• In data collection for Biology, data must be
measured carefully, using laboratory equipment
(ex. timers, metersticks, pH meters, balances, pipettes, etc.)
• The limits of the equipment used add some
uncertainty to the data collected. All equipment has
a certain magnitude of uncertainty. For example, is a
ruler that is mass-produced a good measure of 1 cm?
1mm? 0.1mm?
• For quantitative testing, you must indicate the level
of uncertainty of the tool that you are using for
measurement!!
How to determine uncertainty?
• Usually the instrument manufacturer will indicate
this – read what is provided by the manufacturer.
• Be sure that the number of significant digits in the
data table/graph reflects the precision of the
instrument used. For example, if the manufacturer states
that the accuracy of a balance is to 0.1 g and your
average mass is 2.06 g, be sure to round the average
to 2.1 g. Your data must be consistent with your
measurement tool regarding significant figures.
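The rounding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `rounded_average` and the three masses are made up for the example, and the balance is assumed to read to 0.1 g (1 decimal place).

```python
from statistics import mean

def rounded_average(values, decimals):
    """Average the values, then round to the instrument's precision."""
    return round(mean(values), decimals)

# Hypothetical masses from a balance that reads to 0.1 g
masses_g = [2.0, 2.1, 2.1]
print(rounded_average(masses_g, 1))  # 2.1 (the unrounded average is about 2.07)
```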
Finding the limits
• As a “rule-of-thumb”, if not specified, use +/- 1/2 of
the smallest measurement unit (ex. a metric ruler is
lined to 1 mm, so the limit of uncertainty of the ruler
is +/- 0.5 mm).
• If the room temperature is read as 25 degrees C,
with a thermometer that is scored at 1 degree
intervals, what is the range of possible
temperatures for the room?
• (ans. +/- 0.5 degrees Celsius: if you read 25 °C, it
may in fact be anywhere from 24.5 to 25.5 degrees)
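The rule of thumb can be written as a tiny Python helper. This is a sketch only: `reading_range` is a made-up name, and it assumes the uncertainty is exactly half of the smallest division on the instrument.

```python
def reading_range(reading, smallest_division):
    """Return (lowest, highest) possible value for a reading,
    assuming +/- half of the smallest division on the instrument."""
    half = smallest_division / 2
    return reading - half, reading + half

# Thermometer scored at 1 degree intervals, reading 25 degrees C
low, high = reading_range(25, 1)
print(low, high)  # 24.5 25.5
```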
Looking at Data
• How accurate is the data? (How close are the
data to the “real” results?) A consistent shift away
from the real value is called BIAS
• How precise is the data? (All test systems have
some uncertainty, due to limits of
measurement) Estimation of the limits of the
experimental uncertainty is essential.
Comparing Averages
• Once the average is calculated
for each of the 2 sets of data, the average
values can be plotted together on a
graph, to visualize the relationship
between the 2
Drawing error bars
• The simplest way to draw an error bar is to
use the mean as the central point, and to use
the distance of the measurement that is
furthest from the average as the endpoints of
the data bar
(Figure: an error bar, with labels "Average value", "Value farthest from average", and "Calculated distance")
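The simple error bar described above (mean in the middle, endpoints set by the measurement farthest from the mean) might be computed like this in Python. The function name and the sample values are assumptions made for illustration.

```python
from statistics import mean

def max_deviation_error_bar(values):
    """Return (lower end, average, upper end) of a simple error bar.

    The bar is centered on the mean and reaches out by the distance
    to the measurement that is farthest from the mean.
    """
    average = mean(values)
    distance = max(abs(v - average) for v in values)
    return average - distance, average, average + distance

# Hypothetical measurements: average is 6.0, farthest value is 9.0
lower, average, upper = max_deviation_error_bar([4.0, 5.0, 6.0, 9.0])
print(lower, average, upper)
```

Here the farthest value (9.0) is 3.0 away from the average, so the bar runs from 3.0 to 9.0.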
What do error bars suggest?
• If the bars show extensive overlap, it is likely
that there is not a significant difference
between those values
Quick Review – 3 measures of “Central
Tendency”
• mode: value that appears most frequently
• median: When all data are listed from least to
greatest, the value at which half of the
observations are greater, and half are lesser.
• The most commonly used measure of central
tendency is the mean, or arithmetic average
(sum of data points divided by the number of
points)
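All three measures are available in Python's built-in `statistics` module. The data set below is made up purely to show the calls.

```python
from statistics import mean, median, mode

# Hypothetical data set, already listed from least to greatest
data = [3, 5, 5, 6, 8, 9]

print("mode:", mode(data))      # the value that appears most often
print("median:", median(data))  # the middle value (average of the two middle values here)
print("mean:", mean(data))      # sum of data points divided by the number of points
```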
How can leaf lengths be displayed
graphically?
Simply measure the lengths of each and plot how many are of
each length
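As a rough sketch, the counting step can be done with `collections.Counter` and drawn as a text histogram. The leaf lengths below are invented for the example, not real measurements.

```python
from collections import Counter

def tally_lengths(lengths):
    """Count how many leaves have each length."""
    return Counter(lengths)

# Hypothetical leaf lengths in cm
leaf_lengths_cm = [5, 6, 6, 7, 7, 7, 8, 8, 9]
counts = tally_lengths(leaf_lengths_cm)

# One row per length, one '#' per leaf
for length in sorted(counts):
    print(length, "cm:", "#" * counts[length])
```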
If smoothed, the histogram data assumes this
shape
This Shape?
• Is a classic bell-shaped curve, AKA Gaussian
Distribution Curve, AKA a Normal Distribution
curve.
• Essentially it means that in many studies with an
adequate number of datapoints (>30), most
results tend to be near
the mean. Fewer results are found farther
from the mean
Standard deviation
• The standard deviation is a statistic that tells
you how tightly all the various examples are
clustered around the mean in a set of data
• The STANDARD DEVIATION is a more
sophisticated indicator of the precision of a
set of a given number of measurements
– The standard deviation is like an average deviation
of measurement values from the mean. In large
studies, the standard deviation is used to draw
error bars, instead of the maximum deviation.
A typical normal distribution curve
According to this curve:
• One standard deviation away from the mean
in either direction on the horizontal axis (the
red area on the preceding graph) accounts for
somewhere around 68 percent of the data in
this group.
• Two standard deviations away from the mean
(the red and green areas) account for roughly
95 percent of the data.
Three Standard Deviations?
• Three standard deviations (the red, green and
blue areas) account for about 99.7 percent of
the data
(Figure: normal distribution curve, horizontal axis marked at -3sd, -2sd, +/-1sd, +2sd, +3sd)
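These percentages can be checked with `statistics.NormalDist` from the Python standard library. This is a minimal sketch that assumes a perfectly normal distribution; real data will only come close to these values.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, "sd:", round(fraction_within(k) * 100, 1), "percent")
# 1 sd: 68.3 percent
# 2 sd: 95.4 percent
# 3 sd: 99.7 percent
```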
How is Standard Deviation calculated?
With this formula!
Not the formula!
• This can be calculated on a scientific calculator
• OR, in Microsoft Excel, type the following formula into the cell
where you want the Standard Deviation result, using the
"unbiased," or "n-1" method: =STDEV(A1:A30) (substitute the
cell name of the first value in your dataset for A1, and the cell
name of the last value for A30.)
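For anyone curious about what the "n-1" method does, here is a short Python sketch. It assumes the usual sample standard deviation (square root of the sum of squared deviations from the mean, divided by n-1); `sample_sd` is a made-up name and the data are invented. Python's own `statistics.stdev` uses the same n-1 method, so the two results should agree.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Standard deviation by the 'n-1' method, step by step."""
    n = len(values)
    average = sum(values) / n
    # Square each value's distance from the mean and add them up
    squared_deviations = sum((v - average) ** 2 for v in values)
    return sqrt(squared_deviations / (n - 1))

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))  # step-by-step result
print(stdev(data))      # built-in result for comparison
```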
You DO need to know the concept!
• standard deviation is a statistic that tells how
tightly all the various datapoints are clustered
around the mean in a set of data.
• When the datapoints are tightly bunched together
and the bell-shaped curve is steep, the standard
deviation is small (precise results, smaller sd).
• When the datapoints are spread apart and the bell
curve is relatively flat, a large standard deviation
value suggests less precise results.
T - Test
The Student’s t-test compares the averages and
standard deviations of two samples to see if there is a
significant difference between them.
We start by calculating a number, t
t can be calculated using the equation:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
x1 is the mean of sample 1
s1 is the standard deviation of sample 1
n1 is the number of individuals in sample 1
x2 is the mean of sample 2
s2 is the standard deviation of sample 2
n2 is the number of individuals in sample 2
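The calculation of t can be sketched in Python using the quantities defined above. This assumes the usual form of the equation: the difference of the two means divided by the square root of (s1 squared over n1) plus (s2 squared over n2). The function name `t_value` and the two samples are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev

def t_value(sample1, sample2):
    """Calculate t from two samples (lists of measurements)."""
    x1, x2 = mean(sample1), mean(sample2)    # means
    s1, s2 = stdev(sample1), stdev(sample2)  # standard deviations (n-1 method)
    n1, n2 = len(sample1), len(sample2)      # number of individuals
    return (x1 - x2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical samples
print(t_value([2, 4, 6], [1, 2, 3]))
```

A larger value of t (ignoring its sign) means the two means are farther apart relative to the spread of the data.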
T-Test
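• used to estimate the probability (P) that the difference
between two data sets arose by chance alone
• Ex: P = 0.05 means a difference this large would arise by
chance alone only 5% of the time, so we can be 95% confident
that the difference is real (good results)
• Ex: P = 0.50 means a difference this large would arise by
chance alone 50% of the time (not good)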
• Degrees of Freedom → used to look up the critical value of t in the table;
total number of individuals in both samples minus 2
– Ex: A has 15 samples and B has 15 samples, so 15 + 15 - 2 = 28
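A short Python sketch of how the degrees of freedom are used. The function names are made up, and the critical value 2.048 is the entry in a standard two-tailed t table for 28 degrees of freedom at P = 0.05; for any other case, look up the value in your own table.

```python
def degrees_of_freedom(n1, n2):
    """Total number of individuals in both samples, minus 2."""
    return n1 + n2 - 2

def is_significant(t, critical_value):
    """The difference is significant if t (ignoring sign) exceeds the table value."""
    return abs(t) > critical_value

df = degrees_of_freedom(15, 15)
print(df)  # 28

# Critical value read from a t table for df = 28, P = 0.05 (two-tailed)
critical_value = 2.048
print(is_significant(2.5, critical_value))  # True: significant difference
print(is_significant(1.2, critical_value))  # False: could easily be chance
```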
Correlation Does Not Mean
Causation
• Observations without experimental data can
only show correlation and not causation
• Ex: smoking and lung cancer → there is a high
correlation between smoking and lung cancer.
Does this prove smoking causes lung cancer?
→ the correlation alone cannot prove that smoking is the cause
unless there is experimental data to show
evidence of cause
Crazy Correlations
• Shark attacks on swimmers and ice cream
sales are correlated
• The number of cavities in elementary school
children and vocabulary size have a strong
positive correlation.
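A small Python sketch shows how such a "crazy" correlation can arise. All numbers below are invented for illustration only and are not real measurements. The assumption is that a third factor (here, air temperature) pushes both values up together. The strength of a correlation is measured with the Pearson correlation coefficient r, which ranges from -1 to +1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of equal length."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# MADE-UP monthly values, for illustration only
temperature_c = [18, 21, 24, 27, 30, 33]
ice_cream_sales = [110, 135, 160, 190, 220, 240]
shark_attacks = [1, 1, 2, 3, 3, 4]

# Strong correlation, but neither one causes the other:
print(pearson_r(ice_cream_sales, shark_attacks))
# Both simply rise with temperature:
print(pearson_r(temperature_c, ice_cream_sales))
print(pearson_r(temperature_c, shark_attacks))
```

In this made-up data all three values of r are above 0.9, yet only an experiment could show whether any one of these factors causes another.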
Do not worry if you do not understand
how or why the test works
Follow the instructions CAREFULLY
You will NOT need to remember how to do this for your exam