Variability and statistical tests
Where does the variability come from?
• Instrumental measurements
• Biology
  – Genotype
  – Environment
  – Ootype
• Experimental factors
  – Randomly fluctuating
  – Gradually changing in time: drift (see the sketch below)
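A minimal simulation sketch (not from the slides; all values are arbitrary) contrasting a randomly fluctuating factor with one that drifts gradually in time:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)                                 # measurement index (time order)
true_value = 10.0

random_error = rng.normal(0.0, 0.5, size=t.size)   # randomly fluctuating, not reproducible
drift = 0.02 * t                                   # gradual, systematic change in time

measured = true_value + random_error + drift       # what the instrument actually reports
print(measured[:5])
```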
Errors
- Random: not reproducible
- Systematic: is reproduced in a particular setting
- Major: something crucial has been overlooked in the experiment
Depending on context, these can be classified into factors of:
1. Core research interest
2. Satellite interest
3. Nuisance
Variables
• Nominal: yellow, blue, green…
• Ordinal: small, big
• Interval: 0…10, 11…20, 21…30 etc.
• Ratio: p/N
• Continuous: 3.1415926…
• Discrete: 4, 7, 11
• Binary: 0 or 1, “Yes” or “No”
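As a rough illustration (toy data; the column names are made up for this example), the variable types above can be represented with pandas dtypes:

```python
import pandas as pd

df = pd.DataFrame({
    "colour":  pd.Categorical(["yellow", "blue", "green"]),                    # nominal
    "size":    pd.Categorical(["small", "big", "small"],
                              categories=["small", "big"], ordered=True),      # ordinal
    "age_bin": pd.cut([7, 15, 24], bins=[0, 10, 20, 30]),                       # interval
    "ratio":   [3 / 10, 5 / 10, 7 / 10],                                        # ratio (p/N)
    "value":   [3.1415926, 2.7182818, 1.6180339],                               # continuous
    "count":   [4, 7, 11],                                                      # discrete
    "flag":    [True, False, True],                                             # binary
})
print(df.dtypes)
```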
Signal vs. noise
One can introduce a continuous variable, discrete classes (~bins, levels, etc.) for one variable, or discrete classes for both variables.
[Figure: three X-Y scatter panels: (1) both variables continuous; (2) discrete classes (Class 1, Class 2) along X; (3) discrete classes for both X (Class Xa, Class Xb) and Y (Class Y1, Class Y2).]
Quantify it!
We should figure out which factors are most relevant to the phenomenon being studied. An example:
1. Age: σ²(Age)
2. Sex: σ²(Sex)
3. Genotype: σ²(Genotype)
4. Measurement differences: σ²(Measurement)
5. Experimental conditions: σ²(Condition)
Thus, the general linear model:
Y = μ + σ²(Age) + σ²(Sex) + σ²(Genotype) + σ²(Measurement) + σ²(Condition) + ε
• Y: response of the system
• μ: grand mean
• σ²: variance from the factor
• ε: error (correctly speaking, residual or unexplained variance!)
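A minimal sketch (simulated data; the column names age, sex, genotype, measurement, condition and the response y are placeholders, not taken from the slides) of fitting such an additive linear model with statsmodels and partitioning the variance among the factors:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "age":         rng.integers(20, 80, n),
    "sex":         rng.choice(["M", "F"], n),
    "genotype":    rng.choice(["AA", "Ab", "bb"], n),
    "measurement": rng.choice(["tool_Y", "tool_Z"], n),
    "condition":   rng.choice(["ctrl", "treated"], n),
})
df["y"] = 0.05 * df["age"] + rng.normal(0, 1, n)     # toy response driven mainly by age

model = smf.ols("y ~ age + sex + genotype + measurement + condition", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))               # variance attributed to each factor + residual
```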
In other words, to capture a signal y = f(x) (an example signal: “the higher x, the better y”), a formalization is needed.
Different methods:
• Work with different data (both factors and responses)
• Have different power in different conditions (sample size, data type, design topology)
• Answer different questions (defined via null hypotheses)
• Provide different amounts of supplementary output (graphs, tables, etc.)
What are variables?
Variables are things that we measure, control, or manipulate in research. They differ in the role they are given in our research and in the way they are measured.
Correlational vs. experimental research
• In correlational research we do not influence any variables but only measure them.
• In experimental research, we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level.
However, “correlation-like” techniques may still be applied to experimental data. Thanks to the better quality of the experimental setting, they then potentially provide qualitatively better information.
Dependent vs. independent variables
• Independent variables ARE MANIPULATED in the experiment
• Dependent ones ARE NOT MANIPULATED
• Independent variables shape the experiment
• Dependent variables measure its result
Relations between variables
• Distributed in a consistent manner
• Systematically correspond to each other
Do not forget the noise!
Features of relations
• Two basic features of every relation between variables: “magnitude” (~strength) and “reliability” (~confidence, or significance). They are not totally independent.
Statistical significance
p-level: the probability of observing a relation at least this strong when the relation does NOT EXIST
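One illustrative way to read this (a simulation sketch, not part of the slides): estimate how often a relation at least this strong would appear by chance alone, i.e. when no relation exists:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 1, 40)
y = 0.5 * x + rng.normal(0, 1, 40)                   # a real, moderate relation

observed = abs(np.corrcoef(x, y)[0, 1])
# Shuffling y destroys any relation to x, so these correlations arise by chance only.
null_corrs = np.array([abs(np.corrcoef(x, rng.permutation(y))[0, 1]) for _ in range(5000)])
p_level = np.mean(null_corrs >= observed)
print(f"observed |r| = {observed:.2f}, permutation p-level = {p_level:.4f}")
```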
Null hypothesis
• Null hypothesis H0 which we test:
  – Is the reference point in the analysis
  – States that “The factor does not work” (or “The relation does not exist”)
  – Its rejection proves (at some probability) that the factor does work (“is likely to work”)!
In the tests we are going to consider, the null hypothesis H0: σ²(The factor) = 0 almost always has an equality condition!
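A hedged sketch (simulated groups; the genotype labels and numbers are arbitrary) of testing such a null hypothesis, here H0: σ²(Genotype) = 0, with a one-way ANOVA F-test:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(3)
group_AA = rng.normal(10.0, 1.0, 30)   # responses for genotype AA
group_Ab = rng.normal(10.5, 1.0, 30)   # genotype Ab, slightly shifted mean
group_bb = rng.normal(11.0, 1.0, 30)   # genotype bb, shifted further

f_stat, p_value = f_oneway(group_AA, group_Ab, group_bb)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # small p: reject "the factor does not work"
```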
• How to determine that a result is “really” significant?
• How is the “level of statistical significance” calculated?
• Can “no relation” be a significant result? Only after a test on the general population!
• How to measure the magnitude (strength) of relations between variables? (regression)
• Common “general format” of most statistical tests.
Why are stronger relations between variables more significant?
The stronger the relation, the higher the chance that it exceeds the noise. Thus, the relation is easier to prove.
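An illustrative sketch (simulated data; the slopes are arbitrary): at the same sample size and noise level, a stronger relation stands out above the noise and yields a smaller p-value:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
n = 50
x = rng.normal(0, 1, n)
noise = rng.normal(0, 1, n)

for slope in (0.2, 0.5, 1.0):                # weak, moderate, strong relation
    y = slope * x + noise
    r, p = pearsonr(x, y)
    print(f"slope = {slope}: r = {r:.2f}, p = {p:.4f}")
```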
• Description of the established relations:
  – Strong?
    1. Absolutely
    2. Relative to other relations
  – Confident?
    • By different tests
  – Robust? What happens if:
    • we change the method?
    • the distribution changes its shape?
In the general linear model
Y = μ + σ²(Age) + σ²(Sex) + σ²(Genotype) + σ²(Measurement) + σ²(Condition) + ε
each of the σ² terms can be questioned. Moreover, their particular combinations (interactions) can be studied:
Y = μ + … + σ²(Age × Sex) + … + σ²(Sex × Genotype) + σ²(Age × Genotype × Condition) + … + ε
Examples:
“Does the disease prognosis deteriorate with age equally for men and women?”
H0: σ²(Age × Sex) = 0
“Is the reaction of genotype AbC particularly difficult to detect when measuring with tool Z?”
H0: σ²(Genotype × Measurement) = 0
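A hedged sketch (simulated data; the column names age, sex and prognosis are placeholders) of questioning an interaction term such as H0: σ²(Age × Sex) = 0 with a statsmodels formula:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({"age": rng.integers(20, 80, n),
                   "sex": rng.choice(["M", "F"], n)})
# Toy response: prognosis deteriorates with age faster for one sex (a built-in interaction).
df["prognosis"] = (0.03 * df["age"]
                   + 0.04 * df["age"] * (df["sex"] == "M")
                   + rng.normal(0, 1, n))

model = smf.ols("prognosis ~ age * sex", data=df).fit()   # expands to age + sex + age:sex
print(sm.stats.anova_lm(model, typ=2))                    # the age:sex row addresses the H0 above
```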
Pearson correlation
Kind of problems that it solves: describes the strength of the relation between two variables
Suitability: for normally distributed data
Basic theory: sums of squares
Addressable questions: are X and Y related?
Not addressable questions:
Format of input data:
How to run: see the sketch below
Interpretation of the results:
Presentation of the results:
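For the “How to run” field, a minimal sketch in Python (x and y below are placeholder, roughly normally distributed samples):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 100)
y = 0.6 * x + rng.normal(0, 1, 100)      # two related, roughly normally distributed variables

r, p = pearsonr(x, y)
print(f"Pearson r = {r:.2f} (strength of the relation), p = {p:.4f} (its significance)")
```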