Experiments:
Method and Methodology
Mícheál Ó Foghlú
Executive Director Research
TSSG, WIT [email protected]
March 2009
Revised Schedule
Mon 12th Jan   Thomas Magedanz - Guest Lecture on IMS [DONE]
Wed 14th Jan   Presentations [DONE]
Wed 21st Jan   Presentations [DONE]
Wed 28th Jan   IPv6 Summit (Dublin Castle) [DONE]
Wed 4th Feb    EMPTY
Wed 11th Feb   Session 01 [DONE]
Wed 25th Feb   EMPTY
Wed 4th Mar    Session 02 [DONE]
Wed 11th Mar   Session 03 [DONE]
Wed 18th Mar   EMPTY
Wed 25th Mar   EMPTY
Wed 1st Apr    Session 04 [Today]
Wed 22nd Apr   Session 05

Sessions 01-05 to be delivered by Mícheál Ó Foghlú
Copyright © Mícheál Ó Foghlú 2009
Schedule Detail


01 What is research?
– Philosophy, Epistemology, Methodology and Method
02 How to write academically?
– Some simple language rules
– Some simple structure rules

03 What’s the big deal with plagiarism?
– Bibliographies, references and citations, …
– Doing it in Word
– Doing it with other tools like LaTeX/BibTeX

04 Results - how to do experiments
– Support tools: simulation, data analysis, …

05 Discussion
Structure
• Experimental Design (basics)
• Statistical Analysis (basics)
Experimental Design
How to conduct a valid experiment.
http://www.slideshare.net/mrmularella/experimental-design
A Good Experiment
• Tests one variable at a time. If more than one thing is tested at a time, it won’t be clear which variable caused the end result.
• Must be fair and unbiased. This means that the experimenter must not allow his or her opinions to influence the experiment.
• Does not allow any outside factors to affect the outcome of the experiment.
A Good Experiment
• Is valid. The experimental procedure must test your hypothesis to see if it is correct.
• If the procedure does not test your hypothesis, the experiment is not valid and the data will make no sense!
• Has repeated trials. Repeating the trials in the experiment will reduce the effect of experimental errors and give a more accurate conclusion.
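The point about repeated trials can be illustrated with a small simulation. This is only a sketch: the true value, the noise level, and the helper names `run_trial` and `spread_of_estimates` are assumptions made for illustration and do not come from the slides.

```python
import random
import statistics

TRUE_VALUE = 50.0   # hypothetical "real" quantity being measured
NOISE_SD = 5.0      # hypothetical size of the random experimental error


def run_trial(rng):
    """One noisy measurement of TRUE_VALUE (illustrative only)."""
    return rng.gauss(TRUE_VALUE, NOISE_SD)


def spread_of_estimates(trials_per_experiment, experiments=2000, seed=1):
    """Standard deviation of the per-experiment averages."""
    rng = random.Random(seed)
    estimates = [
        statistics.mean(run_trial(rng) for _ in range(trials_per_experiment))
        for _ in range(experiments)
    ]
    return statistics.stdev(estimates)


single = spread_of_estimates(1)     # one trial per experiment
repeated = spread_of_estimates(10)  # ten trials averaged per experiment
print(f"spread with 1 trial:   {single:.2f}")
print(f"spread with 10 trials: {repeated:.2f}")
```

Under these assumptions the averaged result scatters much less around the true value than a single trial does, which is the sense in which repeated trials give a more accurate conclusion.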
Variables
• A variable is anything in an experiment that can change or vary.
• It is any factor that can have an effect on the outcome of the experiment.
• There are three main types of variables.
3 Kinds of Variables
Independent Variable (IV)
– Something that is intentionally changed by the scientist
– What is tested
– What is manipulated
– Also called a “Manipulated Variable”
– You can only change ONE variable in an experiment!
3 Kinds of Variables
Independent Variable (IV)
To determine the independent variable, ask yourself:
“What is being changed?”
Finish this sentence…
“I will change the _____________”
Independent Variable
Levels of the IV
• These are the different ways you will change the independent variable.
Example: Assume you are testing five brands of popcorn to see which has the most unpopped kernels.
• The IV would be the brand of popcorn.
• The five different brands would be the different levels of the IV.
3 Kinds of Variables
Dependent Variable (DV)
– something that might be affected by the
change in the independent variable
– What is observed and measured
– The data collected during the investigation
– Also called a “Responding Variable”
3 Kinds of Variables
Dependent Variable (DV)
To determine the dependent variable, ask yourself:
“What will I measure and observe?”
Finish this sentence…
“I will measure and observe ________________”
Dependent Variable
Operational Definition:
• Define exactly how the dependent variable will be measured.
Example: Assume your DV in an experiment is “plant growth.” How will you measure this? It could be…
• Height (cm), mass (g), # of leaves, etc.
• Be specific and include all necessary units!
3 Kinds of Variables
Controlled Variable (CV)
– a variable that is not changed and
kept the same
– Also called constants
– Allows for a “fair test”
– NOT the same as a “control”!!
– Any given experiment will have many
controlled variables
3 Kinds of Variables
Controlled Variable (CV)
To determine the controlled variables, ask yourself:
“What should not be allowed to change?”
Finish this sentence…
“I will not allow the ______________ to change.”
Control
• A group or individual in the experiment that does not receive the treatment being tested, but is used for comparison as a reference for what “normal” would be like.
• Not all experiments have a control (though all experiments have controlled variables).
Example: If you tested different pollutants to see their effect on plant growth, the control would receive only water.
Example

Students of different ages were given the
same jigsaw puzzle to put together.

They were timed to see how long it took to
finish the puzzle.
Identify the variables in this
investigation!
What was the independent variable?
Ages of the students
– Different ages were tested by the
scientist
What was the dependent variable?
The time it took to put the puzzle together
– The time was observed and measured by the scientist
What was a controlled variable?
Same puzzle
– All of the participants were tested with the
same puzzle.
– It would not have been a fair test if some
had an easy 30 piece puzzle and some
had a harder 500 piece puzzle.
Another Example:

An investigation was done with an
electromagnetic system made from a battery and
wire wrapped around a nail.

Different sizes of nails were used.

The number of paper clips the electromagnet
could pick up was measured.

What are the variables in this investigation?
Independent variable:
Sizes of nails
– These were changed by the scientist.
– They used different sizes of nails in
their experiment to see what effect
that would have.
Dependent variable:
Number of paper clips picked up
– The number of paper clips was observed and counted (measured)
Controlled variables:
Battery, wire, type of nail
– None of these items were changed
– They used the same battery, same wire, and same type of nail.
– Changing any of these things would have made it an unfair test.
Here’s another:

The temperature of water was
measured at different depths of a
pond.

Independent variable – depth of the
water

Dependent variable – temperature

Controlled variables – same pond;
same thermometer
Last one:

Students modified paper airplanes by
cutting pieces off, adding tape, or adding
paper clips to increase the distance
thrown.

Independent variable – weight of plane, center of gravity, or air resistance (depending on each student's choice, but only one was tested)

Dependent variable – distance thrown

Controlled variables – same plane design;
same paper; same throwing technique
Now let’s take what we know about
these variables and use them in an
experiment!
We are going to test how many drops
of water will fit on different sized
coins.
Let’s think about how we could test this.
– Identify the variables
– What exactly will be changed? How
will it be changed?
– What exactly will be measured?
How will it be measured?
What are my variables?
• Independent variable – size of the coin (penny, nickel, dime, quarter)
• Dependent variable – amount of water held on coin (# of drops)
• Controlled variables
– Same eye dropper
– Same water
– Same side of coin (pick heads or tails)
– Same technique (height/angle of dropper)
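One way to keep these three kinds of variables straight is to write the design down as a small record before running the experiment. The sketch below is purely illustrative: the class `ExperimentDesign` and its field names are hypothetical and are not part of the talk; the values come from the coin experiment above.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentDesign:
    """Hypothetical record of an experiment's variables (not from the slides)."""
    independent_variable: str          # the ONE thing that is changed
    levels: list                       # the different values the IV takes
    dependent_variable: str            # what is observed and measured
    unit: str                          # operational definition of the DV
    controlled_variables: list = field(default_factory=list)

    def summary(self):
        # Fill in the three "finish this sentence" prompts from the slides.
        return (
            f"I will change the {self.independent_variable} "
            f"({', '.join(self.levels)}). "
            f"I will measure and observe the {self.dependent_variable} "
            f"({self.unit}). "
            f"I will not allow the {', '.join(self.controlled_variables)} to change."
        )


coin_drops = ExperimentDesign(
    independent_variable="size of the coin",
    levels=["penny", "nickel", "dime", "quarter"],
    dependent_variable="amount of water held on the coin",
    unit="number of drops",
    controlled_variables=["eye dropper", "water", "side of coin", "dropper technique"],
)
print(coin_drops.summary())
```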
Statistical Analysis

http://www.slideshare.net/sababutt/statistical-analysis-of-datafinal-presentation
SIGNIFICANCE OF STATISTICS FOR ANALYSIS AND RESEARCH
STATISTICS IS NECESSARY FOR ALL FIELDS OF LIFE REQUIRING RESEARCH AND DATA ANALYSIS
In all fields of life we have to analyze facts and interpret them to draw conclusions. This analysis needs statistics to compare qualities and quantities and so help reach a conclusion, which leads to decision making in business, government, industry, etc., and to the development of theories in science.
BIOSTATISTICS IS A DISCIPLINE THAT IS CONCERNED WITH:
• designing experiments and other data collection,
• summarizing information to aid understanding,
• drawing conclusions from data, and
• estimating the present or predicting the future.
In making predictions, Statistics uses the companion subject of Probability, which models chance mathematically and enables calculations of chance in complicated cases.
SOME IMPORTANT DEFINITIONS
POPULATION AND SAMPLE
POPULATION: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18.
SAMPLE: A sample is a subset of a population. Since it is usually impractical to test every member of a population, a sample from the population is typically the best approach available.
PARAMETER AND STATISTIC
PARAMETER: A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the population mean is a parameter that measures central tendency.
STATISTIC: A statistic is a numerical quantity (such as the mean) calculated from a sample.
MEASURES OF CENTRAL TENDENCY
Mean (Arithmetic Mean)
Average value of a sample or population
Median
Middle value of sample or population
Mode
The value repeated most
The Arithmetic Mean, or Mean, is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed to refer to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores.
The formula for the population mean is:
μ = ΣX/N,
where μ = population mean, ΣX = sum of all the scores, and N = number of scores.
If the scores are from a sample, then the symbol X̄ refers to the mean and n refers to the sample size, and the formula is written as:
X̄ = ΣX/n
Median: The median is the middle of a distribution:
half the scores are above the median and half are
below the median. The median is less sensitive to
extreme scores than the mean and this makes it a
better measure than the mean for highly skewed
distributions.
Example scores: 5, 3, 4, 2.5, 6
Mode: The mode is the most frequently occurring
score in a distribution and is used as a measure of
central tendency. The advantage of the mode as a
measure of central tendency is that its meaning is
obvious.
Example scores: 5, 3, 4, 5, 6
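The two sets of example scores on this slide can be checked with Python's standard `statistics` module. This is a sketch only; the extra score of 60 is a made-up extreme value, added to illustrate why the median is less sensitive to extreme scores than the mean.

```python
import statistics

median_scores = [5, 3, 4, 2.5, 6]   # scores shown with the median definition
mode_scores = [5, 3, 4, 5, 6]       # scores shown with the mode definition

median_value = statistics.median(median_scores)  # middle value after sorting
mode_value = statistics.mode(mode_scores)        # most frequently occurring score
print(median_value, mode_value)

# Add one made-up extreme score and compare how far each measure moves.
with_outlier = median_scores + [60]
mean_before = statistics.mean(median_scores)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
print(mean_before, mean_after)      # the mean shifts a lot
print(median_value, median_after)   # the median barely moves
```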
MEASURES OF DISPERSION
After measuring the central value (i.e., the mean), the next step is to find out to what extent this central value represents all the values, that is, to measure the scattering or dispersion of the data. Several measures quantify dispersion. The most important and widely used of these in research are:
• Variance
• Standard Deviation
• Standard Error of Mean
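For reference, the usual sample versions of these three measures can be written as follows. The notation (x_i for an individual value, x̄ for the sample mean, n for the sample size) is an assumption chosen here for illustration; the slide itself only names the measures.

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad
s = \sqrt{s^2}, \qquad
SE_{\bar{x}} = \frac{s}{\sqrt{n}}
```

Here s² is the variance, s the standard deviation, and SE the standard error of the mean.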
HYPOTHESIS TESTING
T test
F test
ANOVA
Correlation
Regression
EXAMPLE OF DATA ANALYSIS
• Comparison of weight-to-height ratio, expressed as the Body Mass Index (BMI), in a population. BMI is calculated as weight in kg / (height in m)².
• General surveys in the USA and Europe showed that the young population is overweight, which increases the risk of disease. We surveyed the young female population of Punjab University for BMI. We measured the BMI of 400 randomly selected students.
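The BMI formula can be sketched as a one-line function. The function name `bmi` and the example subject (70 kg, 1.75 m) are made up for illustration and are not taken from the survey data.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Made-up subject for illustration: 70 kg, 1.75 m
example_bmi = bmi(70, 1.75)
print(round(example_bmi, 2))  # 22.86
```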
Subject No.   BMI      Subject No.   BMI
M-1           36.66    F-1           30.11
M-2           20.21    F-2           28.00
M-3           30.29    F-3           16.87
M-4           29.33    F-4           38.94
M-5           31.97    F-5           35.63
M-6           27.58    F-6           32.69
M-7           25.33    F-7           23.92
M-8           26.90    F-8           25.55
M-9           27.74    F-9           30.87
M-10          27.01    F-10          43.43
M-11          26.82    F-11          35.34
M-12          22.65    F-12          19.65
M-13          31.90    F-13          36.45
M-14          30.81    F-14          34.35
M-15          20.84    F-15          34.15
M-16          25.19    F-16          38.86
M-17          22.98    F-17          26.28
M-18          28.68    F-18          29.52
M-19          22.73    F-19          24.99
M-20          22.86    F-20          29.75
M-21          27.73    F-21          34.58
ARITHMETIC MEAN
We have two tables of data: one giving the BMI of girls, the other the BMI of boys. These are long data tables.
Now we have to analyze the data to conclude something from it. What do we need now?
We need a measure of central tendency to indicate the average BMI, so that we can compare it with other populations, between boys and girls, and with the normal range.
The most common and useful measure for this purpose is the Arithmetic Mean. It is calculated by taking the sum of all values and dividing it by the number of observations.
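As a worked instance of this calculation, the sketch below applies it to the 21 male BMI values listed in the data table (M-1 to M-21). The variable names are illustrative. Only the male values are used because the table lists all 21 of them; the resulting mean and standard deviation agree with the figures reported for BMI-M in the T Test output later in the talk.

```python
import statistics

# BMI values for subjects M-1 to M-21, copied from the data table
bmi_m = [
    36.66, 20.21, 30.29, 29.33, 31.97, 27.58, 25.33,
    26.90, 27.74, 27.01, 26.82, 22.65, 31.90, 30.81,
    20.84, 25.19, 22.98, 28.68, 22.73, 22.86, 27.73,
]

mean_m = sum(bmi_m) / len(bmi_m)   # sum of all values / number of observations
sd_m = statistics.stdev(bmi_m)     # sample standard deviation (n - 1 divisor)
print(len(bmi_m), round(mean_m, 2), round(sd_m, 2))
```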
SAMPLING ERROR
Next: we have an average value, but is this average really representative of all the values? Is it possible that some values are very large and some very small? If so, the mean is not representative of the whole data set. This is called sampling error: for example, some students may have a strong genetic tendency to be overweight, so their values differ somewhat from the rest of the population. This makes our result erroneous, i.e., our mean does not represent all the data.
EXAMPLE
We have four values: 2, 3, 4, 10
Mean = Sum of values / No. of observations
= (2 + 3 + 4 + 10) / 4
= 19 / 4
= 4.75
This is far from three of the four values in the data (2, 3, and 4). This is because of one large value in the data, i.e., 10.
STANDARD DEVIATION
• Now we need a statistical measure that tells us how to rule out sampling error.
• This is the standard deviation: a measure of how much the individual values vary from the average value, i.e., the mean.
Standard Deviation of that Data
$$ SD = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
Descriptive Statistics from MINITAB
Variable   N   Mean   Median   StDev   SE Mean
C1         4   4.75   3.50     3.59    1.80
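The MINITAB figures for the four example values (2, 3, 4, 10) can be reproduced by hand. The sketch below uses only the Python standard library; the variable names are illustrative, and the standard error is assumed to be the standard deviation divided by the square root of n.

```python
import math
import statistics

values = [2, 3, 4, 10]
n = len(values)

mean = statistics.mean(values)
median = statistics.median(values)

# Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

# Standard error of the mean: sd / sqrt(n)
se_mean = sd / math.sqrt(n)

print(f"N={n} Mean={mean:.2f} Median={median:.2f} "
      f"StDev={sd:.2f} SE Mean={se_mean:.2f}")
# N=4 Mean=4.75 Median=3.50 StDev=3.59 SE Mean=1.80
```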
T Test
Two Sample T-Test and Confidence Interval
Two sample T for BMI-F vs BMI-M
         N    Mean    StDev   SE Mean
BMI-F    30   31.35   6.26    1.1
BMI-M    21   26.96   4.11    0.90
95% CI for mu BMI-F - mu BMI-M: (1.5, 7.31)
T-Test mu BMI-F = mu BMI-M (vs not =): T = 3.02  P = 0.0040  DF = 48
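The output does not say which variant of the two-sample t test was run. Assuming it is the unequal-variance (Welch) form, the reported T and DF can be approximately reproduced from the summary statistics alone. The function name `welch_t` is hypothetical, and the P value is not computed here because that needs the t distribution, which the standard library does not provide.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1   # squared standard error of the first mean
    v2 = sd2 ** 2 / n2   # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics for BMI-F and BMI-M as shown in the output above
t, df = welch_t(31.35, 6.26, 30, 26.96, 4.11, 21)
print(f"T = {t:.2f}, DF = {df:.1f}")
```

Under this assumption T comes out at about 3.02 and the degrees of freedom at a little under 49, which matches DF = 48 if the software truncates to a whole number.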
Other Issues

Covered
– Basics of experimental design
– Basics of statistical analysis

Not covered - experimental design
– Block structured design (e.g. Latin Squares)
– Understanding experimental errors

Not covered - statistical analysis
– Understanding the T Test and the large battery of
other tests (e.g. ANOVA)
– Assumptions of tests (e.g. that observations are
normally distributed) and when it is invalid to use a
test
– Discussion of significance

So this talk just scratched the surface!