Statistics and Informatics
www.vfu.cz/statistics

Statistics is the science that allows us to formulate and describe complex data (measurements, observations) in a short form that is easily understood by all professionals.

Statistics is especially needed in the more probabilistic and less predictive sciences, such as biology and applied biology (medicine). In a predictive science (such as mathematics or physics), one only has to put the data into an appropriate formula to obtain an accurate answer (e.g. 1 + 1 = 2).

In biology and applied biology (medicine) we deal with living organisms, which are very complex in their reactions and description. There is a high level of uncertainty. Every individual is unique, so the data obtained may be very different and variable (genetic variability), and they need specific (statistical) methods for their evaluation. Only statistical methods can take this great variability of biological data into account, evaluate the data and give correct inferences about the biological objects studied.

When we use statistics for biological issues, we can use the term biostatistics.

Biostatistics = statistics applied to biological problems (in particular in research: how to design experiments and evaluate their results)
- it analyzes biological characters (their values differ from one entity to another); these are termed variables
- different kinds of variables give different types (categories) of data, which may be treated differently depending on their exactness

Types of Biological Data
• Data on Nominal Scale (Categorical Data): classified by some quality (e.g. 2 possibilities, present or not present: disease, anomaly, death, vaccination ...)
• Data on Ordinal Scale (Rank Data): consist of an arrangement of measurements based on a subjective scale (classification by grades, points in competitions)
• Data on Numerical Scale: exact numeric values obtained by objective measurement with a device (body temperature, weight, length, volume, etc.)
Categories of Data
Different categories of data require different methods of statistical examination (of different exactness).

Statistical methods suited to numerical or ordinal data are more exact and are generally not applicable to nominal data (which carry too little information for exact methods). The reverse is possible: less exact methods for nominal (or ordinal) data can also be used for numerical data (e.g. for preliminary analyses).

Formal viewpoint:
• Continuous Data - variables that can take any conceivable value within an observed range (height, length, weight, temperature)
• Discrete (discontinuous) Data - variables that can take only certain values, i.e. integers (number of animals, patients, eggs, cells, etc.)

Numerical- and ordinal-scale data may be continuous or discrete. Nominal-scale data are discrete by their nature.

Statistical Sets
(groups of individuals: animals, plants, cells, items, etc.)

• Population (Universe), N (number of members)
- "all items" that could show the studied variable
- is often very large (cattle in Europe, dogs in the Czech Republic, the world), an "endless" number of entities
In practice we are not able to obtain all possible measurements from the population, so we analyze a small subset and make inferences about the population (the aim of statistics).

• Sample (Subset), n (number of members)
- a definite number of individuals from the population (this implies inaccuracy in evaluation in comparison with the whole population)
- to reach the most valid conclusions about a population, the sample must be a representative subset of the population. This means:
  • random sample (no subjective choice): drawing lots for registration numbers of animals, a table of random numbers, etc.
  • appropriate sample size (the more the better, but there are practical limits: time, money, etc.)
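The random selection described above (drawing lots for registration numbers) can be sketched in Python. This is only an illustration under stated assumptions: the function name draw_random_sample, the population of 1000 registration numbers and the sample size of 20 are hypothetical and do not come from the lecture.

```python
import random


def draw_random_sample(registration_numbers, n, seed=None):
    """Draw n distinct members from the population without subjective choice.

    registration_numbers: iterable of identifiers (hypothetical example data).
    n: sample size; must not exceed the population size N.
    seed: optional, only to make the draw repeatable for demonstration.
    """
    rng = random.Random(seed)
    # random.sample selects without replacement, like drawing lots.
    return rng.sample(list(registration_numbers), n)


# Hypothetical population: animals with registration numbers 1..1000 (N = 1000).
population = range(1, 1001)
sample = draw_random_sample(population, n=20, seed=1)
print(sorted(sample))
```

Fixing the seed is a convenience for teaching; in a real study the draw would normally not be repeated until a "nice" sample appears, since that would reintroduce subjective choice.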
Characteristics of Variables (Discrete, Continuous)
Statistical variables can be described by means of some specific terms:
Variant Sequence - a listing of all observed values (variants), arranged in ascending or descending order, e.g. 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8 (discrete data: number of young in a litter)
Frequency of Variant - how many times each value is observed
Frequency Distribution - the graphically presented distribution of all observed frequencies in the sample

Frequency Distribution - Discrete Data (Bar Graph)
[Bar graph: x = number of pups (1 to 8), y = frequency (0 to 3)]
Discrete data, number of pups in a litter: 2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8

Frequency Distribution - Continuous Data (Histogram)
Continuous data: we create classes = intervals of equal width.
Number of classes: according to the sample size (up to 100 items: 6-9 classes; up to 500 items: 10-15 classes)
[Histogram with frequency polygon (empirical curve, specific to one sample): x = weight (class midpoints), y = frequency]
All data in an interval get the same value = the midpoint of the class.
Number of items (individuals) in the interval = frequency of the class.

Frequency (Probability) Distribution
[Figure: empirical curves (samples) and theoretical curve (population): x = weight, y = P(x), probability (proportion of cases)]
Empirical curves for different samples (obtained from one population) lie along a single continuous theoretical curve, which describes the probability distribution of the variable in the population.
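A minimal Python sketch of the two frequency distributions described above. The litter sizes are the values given in the lecture; the weights, the choice of 4 classes and the helper name class_frequencies are hypothetical and serve only as an illustration. Values on a class boundary are assigned to the upper class here, which is one possible convention.

```python
from collections import Counter

# Discrete data from the lecture: number of pups in a litter.
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# Frequency of each variant: how many times each value is observed.
variant_frequencies = Counter(litter_sizes)
for variant in sorted(variant_frequencies):
    print(variant, variant_frequencies[variant])


def class_frequencies(values, n_classes):
    """Group continuous data into n_classes intervals of equal width.

    Returns a list of (class midpoint, class frequency) pairs.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; classes cannot be formed")
    width = (hi - lo) / n_classes
    counts = [0] * n_classes
    for v in values:
        # The maximum value falls into the last class.
        idx = min(int((v - lo) / width), n_classes - 1)
        counts[idx] += 1
    midpoints = [lo + (i + 0.5) * width for i in range(n_classes)]
    return list(zip(midpoints, counts))


# Hypothetical continuous data (weights); not taken from the lecture.
weights = [10, 12, 13, 15, 18, 20, 21, 22, 25, 30]
histogram = class_frequencies(weights, n_classes=4)
for midpoint, frequency in histogram:
    print(midpoint, frequency)
```

Plotting the class frequencies against the class midpoints and joining the points gives the frequency polygon (empirical curve) for this one sample.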
Shapes of Probability Distributions
a) Normal (Gaussian): symmetric bell curve (most common in biological data)
b) Asymmetric (right-skewed, left-skewed)
ad b) Extreme (decreasing, increasing)
c) Nonnormal (unknown, irregular, 2 or more peaks)

Proportions of Distribution
For each distribution we can define measures (quantiles) that divide a group of ordered data into 2 parts (portions):
- values that are smaller than the quantile
- values that are bigger than the quantile

50% quantile, x0.5 (median): divides a group into 2 halves (50% | 50%)
Quartiles, x0.25, x0.5, x0.75: divide a group of data into four equal parts (25% | 25% | 25% | 25%)
Deciles: divide a group of data into 10 equal parts
Percentiles: divide a group of data into 100 equal parts

Quantiles - Use in Statistics:
• Important quantiles and their corresponding proportions for the most common distribution curves are tabulated in statistical tables
• They are used:
- as critical values in statistical hypothesis testing
- as coefficients in calculations (assessment of confidence intervals of statistical parameters)
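As an illustration of the quantiles above, the following Python sketch computes the median and quartiles of the litter-size data used earlier in the lecture, and looks up one tabulated quantile of the standard normal distribution. It relies on the standard-library statistics module (Python 3.8 or later). Several conventions exist for interpolating sample quantiles, so the quartiles from other software or from a textbook formula may differ slightly; the method used here is the module's default and is an assumption, not the lecture's prescription.

```python
import statistics

# Litter sizes from the lecture (number of pups in a litter).
litter_sizes = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# 50% quantile (median): half of the ordered values lie on each side.
median = statistics.median(litter_sizes)

# Quartiles x0.25, x0.5, x0.75: three cut points giving four equal parts.
q1, q2, q3 = statistics.quantiles(litter_sizes, n=4)

print("median:", median)
print("quartiles:", q1, q2, q3)

# A tabulated quantile of a common theoretical distribution: the 97.5%
# quantile of the standard normal distribution, about 1.96, which is used
# as a critical value and in 95% confidence intervals.
z_975 = statistics.NormalDist().inv_cdf(0.975)
print("normal 97.5% quantile:", round(z_975, 2))
```

Deciles and percentiles can be obtained the same way by passing n=10 or n=100 to statistics.quantiles, although with only 12 observations such fine divisions carry little information.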