Understanding Numerical Data
Statistics
• Statistics is a tool used to answer general
questions on the basis of a limited amount of
specific data.
• Statistics allows us to make decisions about a
population based on a sample of that
population rather than on the entire
population.
Why do we need Statistics?
• Let’s say that you want to know the lipid content of a
typical corn grain.
• You could analyze one grain, but how would you know that
you’d picked a “typical” grain?
• You’d get a better estimate of “typical” if you increased your
sample size to a few hundred grains, or even to 10,000. Or
to 1,000,000.
• Better yet, the only way to be certain of your conclusions
would be to measure all of the corn grains in the world.
• Since this is clearly impossible, you must choose grains that
represent all of the grains in the world – that is, you must
be working with a representative sample.
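The effect of sample size can be tried out with a small simulation. This is only a sketch: the "population" of lipid contents is invented (the 4.5 and 0.8 are made-up illustration values, not measurements), and the helper name spread_of_sample_means is hypothetical. It draws many random samples of each size and reports how much the sample averages vary from one sample to the next.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: lipid content (%) of 100,000 corn grains.
# The 4.5 and 0.8 are made-up illustration values, not real measurements.
population = [rng.gauss(4.5, 0.8) for _ in range(100_000)]


def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the means of many random samples of one size."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.pstdev(means)


# Compare how much the sample average wobbles at different sample sizes.
spreads = {n: spread_of_sample_means(n) for n in (1, 10, 100, 1000)}
for n, spread in spreads.items():
    print(f"sample size {n:>4}: sample means vary by about {spread:.3f}")
```

Under these assumptions the averages from larger samples should cluster more tightly, which is the sense in which a bigger representative sample gives a better estimate of "typical".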
Statistics Terms
• Mean: The mean is the arithmetic average of
a group of measurements.
• Median: The median is the middle value of a
group of measurements once they are sorted in order.
Look at the data set below. Which
number is the median?
• 12 21 15 17 18 20 24
Now describe how you would calculate
the average of this data set.
12 21 15 17 18 20 24
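One way to check your answers for this data set is a few lines of Python. This is a minimal sketch that uses only the seven numbers above; the variable names are illustrative.

```python
import statistics

data = [12, 21, 15, 17, 18, 20, 24]

# Median: sort the values, then take the one in the middle.
ordered = sorted(data)                # [12, 15, 17, 18, 20, 21, 24]
median = ordered[len(ordered) // 2]   # works because the list has an odd length

# Mean: add every value, then divide by how many values there are.
mean = sum(data) / len(data)

print("median:", median)              # 18
print("mean:", round(mean, 1))        # 18.1

# The standard library's statistics module gives the same median.
print("statistics.median:", statistics.median(data))
```

Note that the median is found from the sorted list, not from the order in which the numbers were written down.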
Scientists often base answers to
investigative questions on averages
• Thus, in the earlier investigative question
about the lipid content of a typical corn grain,
if you took a sample of 10,000 corn grains, measured
their lipid content, and then calculated their
average (mean) lipid content, would that average
(mean) be an adequate description of the lipid
content of all corn in the world?
• Why or why not?
Other considerations - - -
Looking at these data sets, what observations can you
make?
Boys Scores: 60, 62, 68, 70, 63, 65, 65, 58, 64, 63
Girls Scores: 98, 42, 88, 92, 38, 56, 95, 92, 50, 89
Based on the data, what do you think the average boys’ score
and the average girls’ score are?
Boys Scores: 60, 62, 68, 70, 63, 65, 65, 58, 64, 63
Girls Scores: 98, 42, 88, 92, 38, 56, 95, 92, 50, 89
Average Score
• Boys – 63.8% (about 64%)
• Girls – 74%
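These averages can be reproduced from the score lists. The following is a small sketch using Python's standard statistics module; the variable names are illustrative.

```python
import statistics

boys = [60, 62, 68, 70, 63, 65, 65, 58, 64, 63]
girls = [98, 42, 88, 92, 38, 56, 95, 92, 50, 89]

# The mean is the sum of the scores divided by the number of scores.
boys_mean = statistics.mean(boys)
girls_mean = statistics.mean(girls)

print(f"boys average:  {boys_mean:.1f}")   # 63.8
print(f"girls average: {girls_mean:.1f}")  # 74.0
```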
Does this mean that girls did significantly
better on the test?
Does the girls’ average of 74% accurately describe how the
typical girl did on this test? Why or why not?
         Scores                                    Average
Boys     60, 62, 68, 70, 63, 65, 65, 58, 64, 63    63.8
Girls    98, 42, 88, 92, 38, 56, 95, 92, 50, 89    74
Does the boys’ average of 63.8% accurately describe how the
typical boy did on this test? Why or why not?
         Scores                                    Average
Boys     60, 62, 68, 70, 63, 65, 65, 58, 64, 63    63.8
Girls    98, 42, 88, 92, 38, 56, 95, 92, 50, 89    74
Looking at the data, what is the range (lowest score to highest score)
of the scores for boys and for girls?
Boys Scores: 60, 62, 68, 70, 63, 65, 65, 58, 64, 63
Girls Scores: 98, 42, 88, 92, 38, 56, 95, 92, 50, 89
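To check your answer, the range can be read off with min and max. This is a short sketch; score_range is a hypothetical helper name, not something from the lesson.

```python
boys = [60, 62, 68, 70, 63, 65, 65, 58, 64, 63]
girls = [98, 42, 88, 92, 38, 56, 95, 92, 50, 89]


def score_range(scores):
    """Return (lowest, highest, highest - lowest) for a list of scores."""
    lowest, highest = min(scores), max(scores)
    return lowest, highest, highest - lowest


print("boys: ", score_range(boys))   # (58, 70, 12)
print("girls:", score_range(girls))  # (38, 98, 60)
```

The girls' scores span 60 points while the boys' scores span only 12.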
Standard Deviation
• Shows roughly how far, on average, each data point
lies from the mean.
• Shows how spread out a data set is. A data set with a
big range usually has a big standard deviation.
Based on the range of the data sets, which gender do you think
would have a bigger Standard Deviation, boys or girls?
Boys Scores: 60, 62, 68, 70, 63, 65, 65, 58, 64, 63
Girls Scores: 98, 42, 88, 92, 38, 56, 95, 92, 50, 89
Standard Deviation
• Boys – 3.5
• Girls – 24.3
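The calculation behind a standard deviation can be sketched step by step. This example assumes the sample standard deviation (dividing by n - 1), which gives about 3.5 for the boys and 24.3 for the girls; sample_sd is a hypothetical helper name, and the result is compared with Python's built-in statistics.stdev.

```python
import math
import statistics

boys = [60, 62, 68, 70, 63, 65, 65, 58, 64, 63]
girls = [98, 42, 88, 92, 38, 56, 95, 92, 50, 89]


def sample_sd(values):
    """Sample standard deviation, assuming the n - 1 formula."""
    mean = sum(values) / len(values)
    # Step 1: how far is each score from the mean, squared?
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Step 2: average those squared distances (dividing by n - 1).
    variance = sum(squared_diffs) / (len(values) - 1)
    # Step 3: take the square root to get back to the original units.
    return math.sqrt(variance)


print(f"boys SD:  {sample_sd(boys):.1f}")   # 3.5
print(f"girls SD: {sample_sd(girls):.1f}")  # 24.3

# The library function uses the same n - 1 formula.
print(f"library check: {statistics.stdev(boys):.1f}, {statistics.stdev(girls):.1f}")
```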
What does this difference in standard
deviation mean?
What would be the best way to graph this data in a lab report?
What things should your graph include?
         Scores                                    Average   Standard Deviation
Boys     60, 62, 68, 70, 63, 65, 65, 58, 64, 63    63.8      3.5
Girls    98, 42, 88, 92, 38, 56, 95, 92, 50, 89    74        24.3
Based on this graph, what can you conclude about the
difference between how boys and girls did on this test?
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
Data Analysis Conclusions: things to think
about
BIG vs. SMALL ERROR BARS
• Big error bars mean lots of variation in the data,
so the data are less reliable to draw conclusions
from.
• Small error bars mean less variation in the data,
so the data are more reliable to draw conclusions
from.
Big error bars = large standard
deviation = BIG range in data
Small error bars = small standard
deviation = small range in data
BIG vs. SMALL Error Bars
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
Data Analysis Conclusions: things to think
about
OVERLAPPING ERROR BARS
– When the values of error bars overlap on a
graph, it means that there is NOT a significant
difference between the averages of the
data sets.
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
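Whether two error bars overlap can also be checked with numbers instead of by eye. The sketch below assumes each error bar runs from the mean minus one sample standard deviation to the mean plus one; error_bar and bars_overlap are hypothetical helper names.

```python
import statistics

boys = [60, 62, 68, 70, 63, 65, 65, 58, 64, 63]
girls = [98, 42, 88, 92, 38, 56, 95, 92, 50, 89]


def error_bar(scores):
    """Return (bottom, top) of an error bar drawn as mean +/- 1 sample SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return mean - sd, mean + sd


def bars_overlap(a, b):
    """True if the two error bars share any part of the score axis."""
    a_low, a_high = error_bar(a)
    b_low, b_high = error_bar(b)
    return a_low <= b_high and b_low <= a_high


for name, scores in (("boys", boys), ("girls", girls)):
    low, high = error_bar(scores)
    print(f"{name}: error bar from {low:.1f} to {high:.1f}")

print("error bars overlap:", bars_overlap(boys, girls))  # True
```

Here the boys' bar (about 60.3 to 67.3) sits entirely inside the girls' bar (about 49.7 to 98.3), so the bars overlap.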
What overlapping error bars mean
with respect to average data between
Overlapping Error Bars
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
Data Analysis Conclusions: things to think about
NON-OVERLAPPING ERROR BARS
– When the values of error bars DO NOT overlap on a
graph, it means that there MAY BE a significant
difference between the averages of the
data sets.
– To show that there is a significant difference
between the data sets, you must do a t-test.
– t-tests test the differences between means.
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
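As a rough illustration of what a t-test computes, the sketch below applies one common version, Welch's t-test for two groups with unequal spread, to the boys' and girls' test scores. The lesson does not say which kind of t-test to use, so this choice is an assumption, and welch_t is a hypothetical helper name. The code computes only the t statistic and the degrees of freedom; looking up the p-value would need a t-table or a statistics package.

```python
import math
import statistics

boys = [60, 62, 68, 70, 63, 65, 65, 58, 64, 63]
girls = [98, 42, 88, 92, 38, 56, 95, 92, 50, 89]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    # Variance of each group's mean: sample variance divided by group size.
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    # t = difference between the means / standard error of that difference.
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


t, df = welch_t(girls, boys)
print(f"t = {t:.2f}, degrees of freedom = {df:.1f}")
```

For these scores t comes out near 1.3 with about 9 degrees of freedom. The usual two-sided 5% cutoff at 9 degrees of freedom is about 2.26, so under these assumptions the 10-point gap between the averages would not count as significant.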
NON OVERLAPPING ERROR BARS
[Figure: bar graph, "Average Score of Boys and Girls on a test". Y-axis: Score on Test (%), 0 to 120. Bars: boys, girls. Error bars represent the standard deviation of the data sets.]
What non-overlapping error bars
mean
YOUR Turn To PRACTICE.
For condition A, is there a significant difference between the
control group & experimental group? Why or why not?
For condition B, is there a significant difference between the
control group & experimental group? Why or why not?
For condition C, is there a significant difference between the
control group & experimental group? Why or why not?
Which data set (type of food) seems to be the most
reliable, and why?
Between which types of food does there seem to be a significant
difference in the growth of fish? Explain why you made that
conclusion.