IB 2 Statistical Analysis

Statistical analysis
Why?? (besides making your life difficult …)

Scientists must collect data AND analyze it

Does your data support your hypothesis? Is it valid?

Statistics helps us find relationships between sets of data.

You are the scientist now, so you must be comfortable with analysis of your data
Let’s look at two sets of data
 Sample 1: -10, 0, 10, 20, 30
 Sample 2: 8, 9, 10, 11, 12
What can you tell me about this data???
Mean: the “average” of the data, or the central tendency

Sample 1: -10, 0, 10, 20, 30
 (-10 + 0 + 10 + 20 + 30) / 5
 Mean = 10

Sample 2: 8, 9, 10, 11, 12
 (8 + 9 + 10 + 11 + 12) / 5
 Mean = 10

Is this analysis complete??? NO!
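
The mean calculation above can be checked with a few lines of Python. This is only a sketch: the function name `mean_of` is made up for illustration, and the two lists are the samples from the slide.

```python
def mean_of(values):
    """Add up every data point, then divide by how many there are."""
    return sum(values) / len(values)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

mean_1 = mean_of(sample_1)
mean_2 = mean_of(sample_2)
print(mean_1, mean_2)  # 10.0 10.0
```

Both samples give the same mean, which is why the mean alone does not finish the analysis.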
Range: how far is the spread?
Largest # - smallest #

Sample 1: -10, 0, 10, 20, 30
 30 - (-10)
 Range = 40

Sample 2: 8, 9, 10, 11, 12
 12 - 8
 Range = 4

Does this data help? Yes, Sample 1 is more dispersed
Obvious? Perhaps, but now shown mathematically
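
As a quick check of the range calculation, here is a minimal Python sketch. The helper name `value_range` is hypothetical; the data are the two samples from the slide.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

range_1 = value_range([-10, 0, 10, 20, 30])
range_2 = value_range([8, 9, 10, 11, 12])
print(range_1, range_2)  # 40 4
```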
Something more … standard deviation
 SD is a measure to show how individual data points are dispersed around the mean
Assuming normal data distribution (bell curve)
 About 68% of all collected values lie within +/- 1 SD
 About 95% of all collected values lie within +/- 2 SD
 So what???
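
The 68% and 95% figures come from the shape of the bell curve itself. A small sketch, assuming Python 3.8 or newer (for `statistics.NormalDist`), shows where they come from. The helper name `fraction_within` is made up for this example.

```python
from statistics import NormalDist

# A "standard" bell curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = fraction_within(1)
within_2_sd = fraction_within(2)
print(f"{within_1_sd:.1%} within 1 SD, {within_2_sd:.1%} within 2 SD")
```

The printed values are slightly above 68% and 95%, so the slide's numbers are rounded rules of thumb.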
Standard deviation

A small SD indicates the data values are clustered around the mean
 May also indicate few extreme data points

A large SD indicates the data values are spread out
 May also indicate extreme data points
 Outliers??
Standard deviation

$$ SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$

x = each data point
x̄ = the mean
n = the total number of data points
Σ = the sum of all the values
Let’s practice …

Sample 1: -10, 0, 10, 20, 30
 Remember x̄ = 10

(-10 – 10)^2 + (0 – 10)^2 + (10 – 10)^2 + (20 – 10)^2 + (30 – 10)^2

(-20)^2 + (-10)^2 + (0)^2 + (10)^2 + (20)^2

400 + 100 + 0 + 100 + 400

1000, divide by n – 1 (5 – 1 = 4)

1000/4 = 250, now √250

15.8
Let’s practice …

Sample 2: 8, 9, 10, 11, 12
 Remember x̄ = 10

(8 – 10)^2 + (9 – 10)^2 + (10 – 10)^2 + (11 – 10)^2 + (12 – 10)^2

(-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2

4 + 1 + 0 + 1 + 4

10, divide by n – 1 (5 – 1 = 4)

10/4 = 2.5, now √2.5

1.58
Let’s compare …
 Sample 1: SD = 15.8
 Sample 2: SD = 1.58
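
The two hand calculations can be reproduced step by step in Python. This sketch follows the same order as the practice slides: subtract the mean, square, add up, divide by n - 1, take the square root. The function name `sample_sd` is an assumption for illustration.

```python
from math import sqrt

def sample_sd(values):
    """Standard deviation using n - 1 in the denominator, as in the practice slides."""
    n = len(values)
    x_bar = sum(values) / n                            # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (n - 1))

sd_1 = sample_sd([-10, 0, 10, 20, 30])
sd_2 = sample_sd([8, 9, 10, 11, 12])
print(round(sd_1, 1), round(sd_2, 2))  # 15.8 1.58
```

Python's built-in `statistics.stdev` also divides by n - 1, so it should give the same answers.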
How can I use this in my lab?
Error bars
 Error bars represent the variability of your data
   STANDARD DEVIATION
   range
   measurement uncertainties
Error bars
 On a bar graph, the bar represents the mean of your data and the error bars represent +/- 1 sd
 On a line graph, the point represents the mean of your data and the error bars represent +/- 1 sd
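
To see what a +/- 1 sd error bar means in numbers, the sketch below works out where the bar (or point) sits and where the error bar starts and ends. The function name `error_bar` is hypothetical, and the data are Sample 2 from the earlier slides.

```python
from statistics import mean, stdev

def error_bar(values):
    """Return (mean, lower end, upper end) of a +/- 1 sd error bar."""
    centre = mean(values)
    sd = stdev(values)  # sample standard deviation (divides by n - 1)
    return centre, centre - sd, centre + sd

centre, lower, upper = error_bar([8, 9, 10, 11, 12])
print(f"mean {centre}, error bar from {lower:.2f} to {upper:.2f}")
```

When plotting, most graphing tools only need the mean and the sd. For example, matplotlib's `bar` and `errorbar` functions take the sd through their `yerr` argument.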
t-test

t-test determines statistical significance between 2 sample means
Key word!!!!!
 Is the difference significant?
 Is the difference due to your variable?? Or is it random chance??
 How valid is your data?

t-test determines the probability that the difference is due to random chance
 A p value (probability) of 0.05 (5%) means a difference this large would show up by random chance only 5% of the time if your variable had no effect. That is the 95% confidence level … your difference is most likely DUE TO YOUR VARIABLE
 You want 95% or higher (p of 0.05 or lower)!
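
The slides do not show how a t-value is calculated, so the sketch below is an assumption. It uses the standard two-sample t-test with pooled (equal) variance, which is the version that goes with df = (n1 + n2) - 2. The function name `pooled_t_statistic` is made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(sample_a, sample_b):
    """t-value for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

t_value = pooled_t_statistic([-10, 0, 10, 20, 30], [8, 9, 10, 11, 12])
print(t_value)  # 0.0
```

Sample 1 and Sample 2 have the same mean, so t comes out as 0: there is no difference between the means to test. The further t is from 0, the less likely the difference is down to chance.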
t-test
 For tests, you do NOT need to calculate t-values, but you must be able to read a t-chart!!
 For internal assessments, you may use calculators or Excel to calculate t-values
Callouts on the t-table:
 Need to be able to calculate degrees of freedom
 This is the range you are hoping for
 The difference between your samples has a HIGH probability of being due to your variable (and not chance)
Calculating degrees of freedom
 df = (n1 + n2) - 2
   n1 = size of sample 1
   n2 = size of sample 2
   2 = # of samples
Calculating degrees of freedom

df = (n1 + n2) – 2
 Population 1: -10, 0, 10, 20, 30
   n1 = 5
 Population 2: 8, 9, 10, 11, 12
   n2 = 5

df = (5 + 5) – 2

df = 8
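
The same degrees-of-freedom calculation as a short Python sketch (the function name `degrees_of_freedom` is hypothetical):

```python
def degrees_of_freedom(sample_1, sample_2):
    """df = (n1 + n2) - 2 for two samples."""
    return (len(sample_1) + len(sample_2)) - 2

df = degrees_of_freedom([-10, 0, 10, 20, 30], [8, 9, 10, 11, 12])
print(df)  # 8
```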
Using the t-table

If df = 8 and t = 3.5, is this a significant difference?

Yes: in the df = 8 row, 3.5 is larger than 3.355, the value in the p = 0.01 column

Less than 1% probability difference in data is due to chance

Therefore, greater than 99% probability difference in data is due to our variable
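
Reading the t-table can be sketched as a lookup. Only the df = 8 row is copied in here, using the usual two-tailed critical values from a standard t-table. The names `CRITICAL_T` and `smallest_p_column` are made up for illustration.

```python
# Two-tailed critical t-values for df = 8, copied from a standard t-table.
CRITICAL_T = {
    8: {0.10: 1.860, 0.05: 2.306, 0.01: 3.355},
}

def smallest_p_column(t_value, df):
    """Smallest tabulated p whose critical value is exceeded by |t|, or None."""
    passed = [p for p, critical in CRITICAL_T[df].items() if abs(t_value) > critical]
    return min(passed) if passed else None

p_column = smallest_p_column(3.5, 8)
print(p_column)  # 0.01
```

A result of 0.01 means t is beyond the p = 0.01 column, so p is less than 1%. A result of `None` would mean t does not even reach the p = 0.10 column.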
Other options, less commonly used in our class

Median
 The middle #, when arranged in numeric order
 Sample 1: -10, 0, 10, 20, 30
   Median = 10
 Sample 2: 8, 9, 10, 11, 12
   Median = 10

Mode
 The # that occurs most often
 Sample 1: -10, 0, 10, 20, 30
   No mode
 Sample 2: 8, 9, 10, 11, 12
   No mode
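
Median and mode can be checked the same way. In this sketch the helper `modes` is hypothetical. It returns an empty list when no value repeats, to match the "no mode" answers for the two samples.

```python
from collections import Counter
from statistics import median

def modes(values):
    """Values that occur most often; empty list if nothing repeats ("no mode")."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]
print(median(sample_1), modes(sample_1))  # 10 []
print(median(sample_2), modes(sample_2))  # 10 []
```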
Some practice: looking at plant height

Height in sun (cm)    Height in shade (cm)
124                   131
120                   60
153                   131
98                    160
124                   212
141                   117
156                   131
128                   95
139                   145
117                   118

 Calculate the mean for both samples
   Sun = 130 cm
   Shade = 130 cm
 Calculate the range for both samples
   Sun = 58 cm
   Shade = 152 cm
 Calculate the median for both samples
   If even # of samples, find the average of the two middle numbers
   Sun = 126 cm
   Shade = 131 cm
 Calculate the mode for both samples
   Sun = 124 cm
   Shade = 131 cm
 Calculate the sd for both samples
   Sun = 17.56 cm
   Shade = 39.85 cm
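
All of the practice answers can be checked at once with Python's `statistics` module. This sketch assumes the height table is read row by row, with sun in the first column and shade in the second. That reading reproduces every answer given in the slides. The function name `summarize` is made up for illustration.

```python
from statistics import mean, median, mode, stdev

# Plant heights in cm, read from the table row by row (sun first, shade second).
sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

def summarize(heights):
    """Mean, range, median, mode and sample sd for one column of heights."""
    return {
        "mean": mean(heights),
        "range": max(heights) - min(heights),
        "median": median(heights),
        "mode": mode(heights),
        "sd": round(stdev(heights), 2),
    }

sun_summary = summarize(sun)
shade_summary = summarize(shade)
print(sun_summary)
print(shade_summary)
```

Both columns have the same mean, but the range and sd show that the shade heights are much more spread out.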
 Sun: sd = 17.56 cm
   Low sd indicates even (close) distribution of data points
   More reliable
 Shade: sd = 39.85 cm
   High sd indicates wide spread of data points
   MAY indicate a problem with your experimental design
 If t = 1.5, is this a significant difference? No
   df = (10 + 10) - 2 = 18, and t = 1.5 is below the t-table value of 2.101 for p = 0.05
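
The "No" can be checked against the t-table in a few lines. This sketch assumes the height table is read row by row (sun first, shade second) and copies in one two-tailed critical value from a standard t-table, 2.101 for df = 18 at p = 0.05. The name `CRITICAL_T_P05` is hypothetical, and t = 1.5 is the value given in the slide, not one calculated here.

```python
sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

# Two-tailed critical value at p = 0.05, copied from a standard t-table.
CRITICAL_T_P05 = {18: 2.101}

df = len(sun) + len(shade) - 2      # (10 + 10) - 2
t_value = 1.5                       # the t-value given in the slide
is_significant = abs(t_value) > CRITICAL_T_P05[df]
print(df, is_significant)  # 18 False
```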
Be careful: correlation vs. cause

Observations (and carefully chosen data) may imply a CORRELATION, but do NOT necessarily demonstrate a cause

The average global temperature has increased over the past 100 years.

The number of pirates in the world has decreased over the past 100 years.

Therefore, decreased number of pirates causes increased global temperatures

no, no, no!
 To discern a CAUSE, a valid EXPERIMENT must be done
 Other scientists must also be able to repeat your experiment
Last word …
 Remember, it is ALWAYS better to PROVE your experiment failed to support your hypothesis than to lie about it being a success!!!
Any questions?