Variability and statistical tests

Where does the variability come from?
• Instrumental measurements
• Biology:
  – Genotype
  – Environment
  – Ootype
  – Experimental factors
• Randomly fluctuating
• Gradually changing in time: drift

Errors:
– Random: not reproducible
– Systematic: reproduced in a particular setting
– Major: something crucial has been overlooked in the experiment

Depending on the context, these can be classified into factors of:
1. Core research interest
2. Satellite factors
3. Nuisance factors

Variables
• Nominal: yellow, blue, green…
• Ordinal: small, big
• Interval: 0…10, 11…20, 21…30, etc.
• Ratio: p/N
• Continuous: 3.1415926…
• Discrete: 4, 7, 11
• Binary: 0 or 1, “Yes” or “No”

Signal vs. noise
One can introduce a continuous variable, discrete classes (~bins, levels, etc.) for one variable, or discrete classes for both variables.
[Figure: three scatter plots of Y against X: both variables continuous; discrete classes for one variable (Class 1, Class 2 along X); discrete classes for both variables (Class Xa, Class Xb vs. Class Y1, Class Y2).]

Quantify it!
We should figure out which factors are most relevant to the phenomenon being studied. An example:
1. Age: σ²(Age)
2. Sex: σ²(Sex)
3. Genotype: σ²(Genotype)
4. Measurement differences: σ²(Measurement)
5. Experimental conditions: σ²(Condition)

Thus, the general linear model:

Y = μ + σ²(Age) + σ²(Sex) + σ²(Genotype) + σ²(Measurement) + σ²(Condition) + ε

• Y: response of the system
• μ: grand mean
• σ²: variance contributed by the factor
• ε: error (correctly speaking, residual, or unexplained variance!)

In other words, to capture a signal y = f(x) (an example signal: “the higher x, the better y”), a formalization is needed.

Different methods:
• Work with different data (both factors and responses)
• Have different power in different conditions (sample size, data type, design topology)
• Answer different questions (defined via null hypotheses)
• Provide different amounts of supplementary output (graphs, tables, etc.)

What are variables?
Variables are things that we measure, control, or manipulate in research.
They differ in the role they are given in our research and in the way they are measured.

Correlational vs. experimental research
• In correlational research we do not influence any variables; we only measure them.
• In experimental research we manipulate some variables and then measure the effects of this manipulation on other variables; for example, a researcher might artificially increase blood pressure and then record cholesterol level.
However, “correlation-like” techniques may still be applied to experimental data, and because the experimental setting is of better quality, they potentially provide qualitatively better information.

Dependent vs. independent variables
• Independent variables ARE MANIPULATED in the experiment
• Dependent variables ARE NOT MANIPULATED
• Independent variables shape the experiment
• Dependent variables measure its result

Relations between variables
Two variables are related if their values:
• Are distributed in a consistent manner
• Systematically correspond to each other
Do not forget the noise!

Features of relations
Two basic features of every relation between variables: “magnitude” (~strength) and “reliability” (~confidence, or significance). They are not totally independent.

Statistical significance
p-level: the probability of observing a relation at least as strong as the one found when in fact the relation does NOT exist.

Null hypothesis
• The null hypothesis H0 which we test:
  – Is the reference point in the analysis
  – States that “the factor does not work” (or “the relation does not exist”)
  – Its rejection indicates (at some probability) that the factor does work (“is likely to work”)!
In the tests we are going to consider, the null hypothesis H0: σ²(The factor) = 0 almost always contains an equality condition!

Questions to address:
• How to determine that a result is “really” significant?
• How is the “level of statistical significance” calculated?
• Can “no relation” be a significant result? Only after a test on the general population!
• How to measure the magnitude (strength) of relations between variables? (regression)
• What is the common “general format” of most statistical tests?
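As an illustration of the p-level idea above (this sketch is not from the original slides, and the function name and data are hypothetical): under H0 the group labels do not matter, so we can estimate the p-level by counting how often random relabelings of the data produce a difference at least as extreme as the one observed.

```python
import random

def perm_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Estimate the p-level for H0: 'the group labels do not matter'.

    The p-level is the fraction of random relabelings whose mean
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel: first len(a) values play the role of group a
        diff = sum(pooled[:len(a)]) / len(a) - sum(pooled[len(a):]) / len(b)
        if abs(diff) >= abs(observed):
            hits += 1
    return hits / n_perm

# Well-separated groups give a small p-level; identical groups give a large one.
p_separated = perm_test_mean_diff([5.1, 5.3, 5.2, 5.4], [1.0, 1.1, 0.9, 1.2])
p_identical = perm_test_mean_diff([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A small p-level here does not "prove" the relation; it only says such a strong difference would rarely arise by chance if H0 were true.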
Why are stronger relations between variables more significant?
The stronger the relation, the higher the chance that it exceeds the noise. Thus, the relation is easier to prove.

Description of the established relations:
• Strong?
  1. Absolutely
  2. Relative to other relations
• Confident? By different tests
• Robust? What happens if:
  – we change the method?
  – the distribution changes its shape?

In the general linear model

Y = μ + σ²(Age) + σ²(Sex) + σ²(Genotype) + σ²(Measurement) + σ²(Condition) + ε

each of the σ² terms can be questioned. Moreover, particular combinations of them can be studied:

Y = μ + … + σ²(Age × Sex) + … + σ²(Sex × Genotype) + σ²(Age × Genotype × Condition) + … + ε

Examples:
“Does the disease prognosis deteriorate with age equally for men and women?”
H0: σ²(Age × Sex) = 0
“Is the reaction of genotype AbC particularly difficult to detect when measuring with tool Z?”
H0: σ²(Genotype × Measurement) = 0

Pearson correlation
• Kind of problems that it solves: describes the strength of the relation between two variables
• Suitability: normally distributed data
• Basic theory: sums of squares
• Addressable questions: are X and Y related?
• Not addressable questions
• Format of input data
• How to run
• Interpretation of the results
• Presentation of the results
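To make the “sums of squares” row concrete, here is a minimal Python sketch (not part of the original slides; the function name is mine): Pearson’s r is the cross sum of products divided by the square root of the product of the two sums of squares, r = Sxy / sqrt(Sxx · Syy).

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of squares: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # cross sum of products
    sxx = sum((xi - mx) ** 2 for xi in x)                     # sum of squares of X
    syy = sum((yi - my) ** 2 for yi in y)                     # sum of squares of Y
    return sxy / math.sqrt(sxx * syy)

# r ranges from -1 to +1; perfectly linear data hit the extremes:
pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # → 1.0
pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # → -1.0
```

r measures only the magnitude of a linear relation; its significance must still be tested separately, consistent with the magnitude-vs-reliability distinction above.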