Statistics!
• Used to make sense of data.
Monty Python Clip:
http://www.youtube.com/watch?v=rzcLQRXW6B0&NR=1
• What is the velocity of a European Swallow?
Pictures from: http://www.style.org/unladenswallow/
European Swallow
(Hirundo rustica)
South African Swallow
(Hirundo spilodera)
Air speed of European Swallow
Data set 1:
• 15. m/s
• 7.0 m/s
• 6.0 m/s
• 15. m/s
• 12. m/s
Data derived from Jonathan Coram, http://www.style.org/unladenswallow/
• The mean is the average value for the data set
• Calculate the mean value for data set 1…
Data set 2
• 10. m/s
• 11. m/s
• 12. m/s
• 11. m/s
• 11. m/s
Calculate the mean value for data set 2.
Mean
• The mean is the average value for the data set
• Both data set 1 and 2 have a mean value of:
11 m/s
• Are the data sets the same?
• NO! Data set 1 is much more variable than 2.
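The two means can be checked with a few lines of Python. This is only a sketch: the list names are made up for illustration, and the values are the air speeds from data sets 1 and 2 above.

```python
from statistics import mean

# Air speeds in m/s, copied from data sets 1 and 2 above.
data_set_1 = [15, 7, 6, 15, 12]
data_set_2 = [10, 11, 12, 11, 11]

# The mean is the sum of the values divided by how many there are.
mean_1 = mean(data_set_1)
mean_2 = mean(data_set_2)

print(mean_1, mean_2)  # 11 11
```

Both sets give 11 m/s, even though the individual values look very different.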
1.1.1 State that error bars are a graphical
representation of the variability of data.
• Error bars can be used to show either the range of the data or the
standard deviation.
• The range indicates the spread from the lowest value to the highest
for a set of data.
Graph:
http://www.csupomona.edu/~jcclark/classes/bio542l/graphics/g-error2.gif
Range
• Data set 1 includes values from 6 – 15 m/s
(range = 9 m/s)
• Data set 2 includes values from 10 – 12 m/s
(range = 2 m/s)
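A minimal sketch of the range calculation, assuming the same two lists of air speeds; the helper name data_range is hypothetical.

```python
# Air speeds in m/s, copied from data sets 1 and 2 above.
data_set_1 = [15, 7, 6, 15, 12]
data_set_2 = [10, 11, 12, 11, 11]

def data_range(values):
    """Return (lowest, highest, spread) for a list of numbers."""
    lowest, highest = min(values), max(values)
    return lowest, highest, highest - lowest

range_1 = data_range(data_set_1)
range_2 = data_range(data_set_2)

print(range_1)  # (6, 15, 9)
print(range_2)  # (10, 12, 2)
```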
Standard Deviation
• 1.1.3 State that the term standard deviation is used to summarize
the spread of values around the mean, and that 68% of the values fall
within one standard deviation of the mean.
• For normally distributed data, about 68% of all values lie within
±1 standard deviation of the mean. This rises to about 95% for ±2
standard deviations.
Picture: http://blog.home-account.com/wpcontent/uploads/2009/06/deviation.jpg
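The 68% and 95% figures can be checked against the normal distribution itself. The sketch below uses Python's statistics.NormalDist; the helper name fraction_within is made up for illustration.

```python
from statistics import NormalDist

# A standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1 = fraction_within(1)
within_2 = fraction_within(2)

print(round(within_1, 3), round(within_2, 3))  # 0.683 0.954
```

These fractions only apply when the data are close to normally distributed.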
1.1.2 Calculate the mean and standard
deviation of a set of values.
• Students should specify the standard deviation
(s), not the population standard deviation.
Students will not be expected to know the
formulas for calculating these statistics. They will
be expected to use the standard deviation
function of a graphic display or scientific
calculator.
• TI 83:
http://www.csis.ysu.edu/~chang/class/TIcalculatorStat.pdf
You don't need to know this:
• Standard deviation: $s = \sqrt{\dfrac{\sum (X - M)^2}{N - 1}}$
Σ = Sum of
X = Individual score
M = Mean of all scores
N = Sample size (Number of scores)
• Variance: $s^2 = \dfrac{\sum (X - M)^2}{N - 1}$
• Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (X - M)^2}{N}}$
Pictures and text from:
http://easycalculation.com/statistics/learn-standard-deviation.php
• Here is an online SD calculator
• The standard deviation for data set 1 is 4.3 m/s
• The standard deviation for data set 2 is 0.71 m/s
• Even though the means are identical, the data is
very different in regard to its variability.
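If no calculator is at hand, the same values can be reproduced in Python. This sketch assumes the sample standard deviation (s) is wanted, which statistics.stdev computes; statistics.pstdev gives the population version for comparison.

```python
from statistics import pstdev, stdev

# Air speeds in m/s, copied from data sets 1 and 2 above.
data_set_1 = [15, 7, 6, 15, 12]
data_set_2 = [10, 11, 12, 11, 11]

# Sample standard deviation (divides by N - 1).
s_1 = stdev(data_set_1)
s_2 = stdev(data_set_2)
print(round(s_1, 1), round(s_2, 2))  # 4.3 0.71

# Population standard deviation (divides by N), shown only for contrast.
sigma_1 = pstdev(data_set_1)
print(round(sigma_1, 1))  # 3.8
```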
Why Calculate Standard Deviation?
• 1.1.4 Explain how the standard deviation is
useful for comparing the means and the
spread of data between two or more samples.
• A small standard deviation indicates that the
data is clustered closely around the mean
value. Conversely, a large standard deviation
indicates a wider spread around the mean.
Differences between data sets
• http://www.youtube.com/watch?v=y2R3FvS4xr4
• Let’s say that you measure velocity of African Swallows,
and get the following data:
• 14 m/s, 18 m/s, 12 m/s, 14 m/s, 17 m/s
• What are the mean and standard deviation?
• Mean = 15 m/s
• Standard deviation = 2.4 m/s
• Is this statistically different from that of the European Swallow?
What do error bars suggest?
• If the bars show extensive overlap, it is likely
that there is not a significant difference
between those values
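One way to check for overlap without drawing the graph is to compare the ends of the error bars directly. The sketch below assumes the bars show the mean ± 1 standard deviation (they could equally show the range); the function names are hypothetical.

```python
from statistics import mean, stdev

# Air speeds in m/s from the slides above.
european_1 = [15, 7, 6, 15, 12]
european_2 = [10, 11, 12, 11, 11]
african = [14, 18, 12, 14, 17]

def error_bar(values):
    """Return the (low, high) ends of a mean +/- 1 SD error bar."""
    m, s = mean(values), stdev(values)
    return m - s, m + s

def bars_overlap(a, b):
    """True if the +/- 1 SD error bars of the two data sets overlap."""
    low_a, high_a = error_bar(a)
    low_b, high_b = error_bar(b)
    return low_a <= high_b and low_b <= high_a

overlap_1 = bars_overlap(european_1, african)
overlap_2 = bars_overlap(european_2, african)

print(overlap_1, overlap_2)  # True False
```

Overlapping bars are only a hint; a t-test is still needed to judge significance.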
T-Tests
• Used to tell whether there is a statistically
significant difference between two sets of data.
• If your error bars overlap it is a good idea to
perform a t test, but you need to have enough
data points to do so (min 10 each set of data).
• Generally, a difference is considered statistically significant if
p is 0.05 or less, that is, at the 95% confidence level.
• Run T-Tests comparing whether the data for the
African Swallow is statistically different from data
set 1 and then set 2 for the European Swallow.
Special Disclaimer
• In reality, both the African and the European
Swallows have a velocity of approximately 11
m/s.
• T test on TI 83:
• http://www.csc.villanova.edu/~ysp/Teacher/Webpages/mstp_payne/Web_project/TI83/2t-test.html
T-test of data sets 1, 2, and 3
• European 1 vs. African: p = 0.1084, T = 1.807, df = 8
• European 2 vs. African: p = 0.0080, T = 3.51, df = 8
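The t and df values can be reproduced by hand. The sketch below assumes a pooled (equal-variance), unpaired t-test, which is what gives df = 8 for two sets of five values; the function name pooled_t is made up. Under that assumption the second comparison works out to t of about 3.51. The p-values need a t table or a statistics package and are not computed here.

```python
from math import sqrt
from statistics import mean, variance

# Air speeds in m/s from the slides above.
european_1 = [15, 7, 6, 15, 12]
european_2 = [10, 11, 12, 11, 11]
african = [14, 18, 12, 14, 17]

def pooled_t(a, b):
    """Return (t, df) for an unpaired t-test assuming equal variances."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return abs(mean(a) - mean(b)) / standard_error, df

t_1, df_1 = pooled_t(european_1, african)
t_2, df_2 = pooled_t(european_2, african)

print(round(t_1, 3), df_1)  # 1.807 8
print(round(t_2, 3), df_2)  # 3.508 8
```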
1.1.5 Deduce the significance of the difference
between two sets of data using calculated
values for t and the appropriate tables.
• For the t-test to be applied, the data must
have a normal distribution and a sample size
of at least 10. The t-test can be used to
compare two sets of data and measure the
amount of overlap.
• Students will not be expected to calculate
values of t. Only a two-tailed, unpaired t-test
is expected.
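Deducing significance from a table amounts to comparing the calculated t with the critical value for the right degrees of freedom. The sketch below hard-codes a few two-tailed critical values at the 0.05 level as found in standard t tables; the names CRITICAL_T_05 and is_significant are hypothetical, and the t values are those from the swallow comparisons (1.807, and roughly 3.5).

```python
# Two-tailed critical t values at the 0.05 level, keyed by degrees of freedom
# (values as printed in standard t tables).
CRITICAL_T_05 = {8: 2.306, 10: 2.228, 18: 2.101, 20: 2.086}

def is_significant(t, df):
    """True if |t| exceeds the two-tailed 0.05 critical value for df."""
    return abs(t) > CRITICAL_T_05[df]

# European set 1 vs. African: t = 1.807, df = 8.
result_1 = is_significant(1.807, 8)
# European set 2 vs. African: t is roughly 3.5, df = 8.
result_2 = is_significant(3.5, 8)

print(result_1, result_2)  # False True
```

A calculated t above the critical value means the difference is significant at the 95% level.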
Correlation Vs. Causation
• Correlation: there is a statistically significant
similarity between two sets of data.
• Causation: changes in one variable cause the
other variable to change.
1.1.6 Explain that the existence of a correlation
does not establish that there is a causal
relationship between two variables.
• Aim 7: While calculations of such values are
not expected, students who want to use r and
r² values in their practical work could be
shown how to determine such values using a
spreadsheet program.
r = linear correlation coefficient
• Measures the strength and direction of the linear
correlation between two variables.
• r = +1 or -1 is a perfect linear relationship; r = 0 is no
linear correlation.
• A negative r is a negative correlation; a positive r is a
positive correlation.
• A correlation with |r| > 0.8 is considered strong; |r| < 0.5
is considered weak.
Formula: http://mathbits.com/mathbits/tisection/statistics2/correlation.htm
Pictures:
http://www.math.upenn.edu/~estorm/115s08/bestfitline/bestfitline.html
[Scatter plots A to F]
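For anyone curious how r is obtained without a spreadsheet, here is a minimal sketch of the Pearson correlation coefficient. The x and y values are invented purely for illustration (they are not swallow data), and the function name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of products and squares of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up example data.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]    # perfectly linear, increasing
y_falling = [10, 8, 6, 4, 2]   # perfectly linear, decreasing
y_noisy = [2, 1, 4, 3, 5]      # increasing with scatter

r_rising = pearson_r(x, y_rising)
r_falling = pearson_r(x, y_falling)
r_noisy = pearson_r(x, y_noisy)

print(round(r_rising, 2), round(r_falling, 2), round(r_noisy, 2))  # 1.0 -1.0 0.8
```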
r² = coefficient of determination
• Tells how much of the fluctuation in one variable can be predicted by the
other variable.
• If r = 0.922, then r² = 0.850, which means that 85% of the total variation in
y can be explained by the linear relationship between x and y (as
described by the regression equation). The other 15% of the total
variation in y remains unexplained.
• The coefficient of determination is a measure of how well the regression
line represents the data. If the regression line passes exactly through
every point on the scatter plot, it would be able to explain all of the
variation. The further the line is away from the points, the less it is able to
explain.
• Points above quoted directly from:
http://mathbits.com/mathbits/tisection/statistics2/correlation.htm
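The "explained variation" reading of r² can be checked numerically. The sketch below squares the r value quoted above, then fits a least-squares line to a small invented data set (not from the slides) and shows that the fraction of variation explained by the line equals r² for that data. All variable names are hypothetical.

```python
# Squaring the quoted correlation coefficient.
r = 0.922
r_squared = r ** 2
print(round(r_squared, 3))  # 0.85

# Made-up paired data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)  # total variation in y

# Least-squares regression line: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left unexplained by the line (sum of squared residuals).
unexplained = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
explained_fraction = 1 - unexplained / syy

# r squared computed directly from the correlation coefficient.
r_squared_data = sxy ** 2 / (sxx * syy)

print(round(explained_fraction, 2), round(r_squared_data, 2))  # 0.64 0.64
```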
According to this graph, is there an apparent correlation
between number of pirates and global temperature? Do pirates
cause the Earth to be cooler?
From: http://statfail.com/wp-content/uploads/2010/03/graph_pirates_gw.png