Download Chapter 1: Scientific Methods

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Bootstrapping (statistics) wikipedia , lookup

History of statistics wikipedia , lookup

Time series wikipedia , lookup

Misuse of statistics wikipedia , lookup

Transcript
Nature of Science
Science 1101
Nature of Science
Scientific methods
Formulation of a hypothesis
 Survey literature/Archives
 Design experiments considering all
Variables (Independant, dependant and
controlled)
 Collection of data

Nature of Science
Scientific methods
Analysis of data (statistical methods)
 Interpretation of results
 Conclusion
 Publication

Descriptive
Statistics
Measures of
Central Tendency
When looking at a data set,
we often are interested in
knowing the "center" of the
group of values we're
examining.
 If you are evaluating your
exam score, you want to
know the "average" score for
the class.

Student
Score (pts.)
a
93
b
88
c
77
d
85
e
74
f
69
g
90
h
66
i
82
j
88
k
59
l
83
m
75
n
97
o
71
Nature of
Science
Descriptive Statistics
Measures of Central
Tendency
There are three such measures
of central tendency: the mode,
the median, and the mean.
 To examine measures of central
tendency, it is helpful to arrange
the values in descending order.
97, 93, 90, 88, 88, 85, 83, 82, 77,
75, 74, 71, 69, 66, 59

Student
Score (pts.)
a
93
b
88
c
77
d
85
e
74
f
69
g
90
h
66
i
82
j
88
k
59
l
83
m
75
n
97
o
71
Nature of
Science
Descriptive Statistics
Measures of Central Tendency
Mode
The mode of a data set does not need to be
near the center of the data set, it simply has
to be the most common.
 Modes are useful in that they tell you the
most common value in the data set, and can
shed some light on the data set's tendencies.

Nature of
Science
Descriptive Statistics
Measures of Central Tendency
Median
The median is the value that occurs in the middle
of the data set.
 If the data set contains an odd number of values,
the median will be the middle value.
 For example, in a ordered data set of 19 values,
value #10 would be the median as there would
be nine values below it and nine values above it.
 In a data set of 20 values, the median would be
the value halfway between the tenth and
Nature of
eleventh values.
Science

Descriptive Statistics
Measures of Central Tendency
Mean

To calculate the mean you sum all of the
values and divide by the number of values
Nature of
Science
Descriptive Statistics
Measures of Central Tendency
Range
The range describes the highest and lowest
values in a data set.
 The range is commonly used in weather
reports, where the daily high and low
temperatures are reported

Nature of
Science
Descriptive Statistics
Measures of Central Tendency
Variance and Standard Deviation
The variance describes how far each value is from
the mean.
 While the variance gives you an indication of the
deviation of each value from the mean, the
variance is in "squared" units
 Variance= Substract each value from mean
 Some values are (+), some (-) and some = 0.

Nature of
Science
Descriptive Statistics
Measures of Central Tendency
Variance and Standard Deviation




An absolute value of how much one is deviated from
the mean is to calculate the standard deviation (SD)
The square root of the variance, which will provide us
with a measure of data dispersion is the same units as
the mean.
This value is known as the standard deviation (SD).
The standard deviation is kind of the "mean of the
mean," and often can help you find the story behind
the data. To understand this concept, it can help to
learn about what statisticians call normal distribution of
data.
Nature of Science
Descriptive Statistics
Measures of Central Tendency
Variance and Standard Deviation
If you looked at normally distributed data on a
graph, it would look something like this:
 The x-axis (the horizontal one) is the value in
question... calories consumed, for example. And
the y-axis (the vertical one) is the number of data
points for each value on the x-axis... in other
words, the number of people who eat x calories

Nature of Science
Descriptive Statistics
Measures of Central Tendency
Variance and Standard Deviation

Now, not all sets of data will have graphs that look
this perfect. Some will have relatively flat curves,
others will be pretty steep. Sometimes the mean
will lean a little bit to one side or the other. But all
normally distributed data will have something like
this same "bell curve" shape.
Nature of Science
Descriptive Statistics
Measures of Central Tendency
Variance and Standard Deviation

Now, not all sets of data will have graphs that look
this perfect. Some will have relatively flat curves,
others will be pretty steep. Sometimes the mean
will lean a little bit to one side or the other. But all
normally distributed data will have something like
this same "bell curve" shape.
Figure1. Normal distribution with standard deviations
Nature of Science
Standard Deviation

Computing the value of a standard deviation
is complicated. But let us see graphically
what a standard deviation represents...
One standard deviation away from the mean in
either direction on the horizontal axis (the red area
on the above graph) accounts for somewhere
around 68 percent of the people in this group.
 Two standard deviations away from the mean (the
red and green areas) account for roughly 95
percent of the people.
 And three standard deviations (the red, green and
blue areas) account for about 99 percent of the
people
Nature of Science

Homework
Example 1: Brain teaser
 Here is one formula for computing the standard deviation.
 Terms you'll need to know
 x = one value in your set of data
 avg (x) = the mean (average) of all values x in your set of
data
 n = the number of values x in your set of data
 For each value x, subtract the overall avg (x) from x, then
multiply that result by itself (otherwise known as
determining the square of that value). Sum up all those
squared values. Then divide that result by (n-1). Got it?
Then, there's one more step... find the square root of that
last number. That's the standard deviation of your set of
data.
Nature of Science
Standard Deviation
Formula
 The formula for the standard deviation is very
simple: it is the square root of the variance. It is
the most commonly used measure of spread.
 The variance describes how far each value is from
the mean.

Nature of Science
Standard Deviation

Formula
Nature of Science
Standard Deviation

Formula
Nature of Science
Example 2:






Consider the observations 8,25,7,5,8,3,10,12,9.
First, calculate the mean and determine N.
Remember, the mean is the sum of scores divided
by N where N is the number of scores.
Therefore, the mean =
(8+25+7+5+8+3+10+12+9) / 9 or 9.67
Then, calculate the standard deviation as illustrated
below.
Standard Deviation = Square root (sum of
squared deviations / (N-1)
=
Square root(320.01/(9-1))
 = Square root(40)
Nature of Science
 = 6.32
Graphs
Datas may be presented in the forms of tables, charts
and graphs.
 The type of figure you will use most frequently is a
graph. Graphs can be an effective way to present
information, but it is important that they are properly
constructed. Let's look at a sample graph to demonstrate
the basic characteristics of a graph.
 Note that the independent variable is graphed on the xaxis and the dependent variable on the y-axis, the x-axis
(horizontal) and y-axis (vertical) are labeled
 If "time" is a factor in your graph, it should be graphed
on the x-axis.
 Graphs should also be of high quality and appropriate
size.

Graphs
Sample bar graph
Sample line graph
When Bar Graphs are Useful
Bar graphs are useful for graphing non-continuous data,
such as data from different experimental groups.
 Note that in each case, the dependent variable is on the yaxis and the independent variable on the x-axis, and


all of the formatting issues listed above are addressed.
Sample bar graph
Sample bar graph
When Line Graphs are Useful
If your data are continuous (each point is directly related to
the next and can be connected by an infinite number of
intermediate points), then a line graph is the way to go.
 Line graphs are commonly used in scientific studies to
present data when time (a continuous variable) is one of the
variables involved.
 Note the dependent variable is on the y-axis, "time" is on
the x-axis.

Sample line graph
Sample line graph
Scatter Plots
While bar and line graphs are used commonly,
there is another type of graph worth mentioning the scatter plot.
 In some cases you may need to graph two
variables against one another to determine their
relationship.
 You graph one variable on the X-axis and the other
on the Y-axis, and then graph your points
accordingly.

Scatter Plots
By doing this, you can see if one variable increases
as the other increases (positive relationship).
 Or one increases as the other decreases (negative
relationship).
 Or if the two variables show no relationship to one
another.

Sample scatter plot