How do we quantify uncertainty: through Probability!
Kinds of statistics
• Descriptive - the goal is to describe the variables and the relationships among them. Often used when you are working with an entire population (e.g. all SSU students).
• Inferential - used when a sample is taken from a larger population, and inferences about the population are made from the sample.

Requirements for inferential statistics
• Individuals selected for sampling should be chosen at random.
• If sampling is not random, it is biased and inferences about the population are suspect.
• Biased samples include those where individuals that are easy to capture for measurement differ in some way from those who escape capture.

Class measurement example

Requirement of independence
• The choice of any individual from the population does not affect the likelihood that another individual will be chosen.
• Each measurement should represent a separate replicate from the population.
• Measurements taken from some common arena, or from the same individuals, are not independent, and statistical analyses should take account of this.
• We obtained 210 measurements last week in lab.
• These measurements were based on 15 replicates (people), where each replicate was measured twice by two groups.
• If we wanted to statistically compare gender with respect to our characters, our sample size is 15, NOT 60.
• One way to deal with the lack of independence would be to average the 4 measures per person.

Reasons for uncertainty in data
– Process uncertainty - the true population changed between samples due to some process of interest.
– Observation uncertainty - the difference is generated by sampling error.

How do we quantify uncertainty: through Probability!
• Summary statistics (e.g. the mean) reflect imperfect estimates of the mean value of a population.
• Measurement error - where the measuring device is imperfect (often not nearly as great as the variation among individuals in biology).

Probabilities
• Important to understanding the significance of differences between samples.
• Range from zero to one.
• Expressed as P(A)
– The relative frequency of an event over the long term, or
– The degree of belief that the event will occur.
• Often probabilities must be combined, or conditioned on other events.

Probability distributions (1.1)
• Random variables have probability distributions associated with them.
• Possible values for the variable are indicated on the horizontal (X) axis.
• The relative probability of each value is shown on the Y axis.
– For a continuous variable, probability is established by a density function f(Y).
– The area under the curve equals 1, the sum of all probabilities.

Using probability distributions (2.1)
• Often we are interested in the probability associated with a range of values.
• Thus, we examine the area beneath the curve associated with that range (illustrated in the sketch after the next list).

Distributions for inferential statistics (fig 1.2)
• Z or normal distribution.
• Student's t distribution - used to compare sample means to population values when the standard deviation must be estimated from the sample. Its width depends on the degrees of freedom (sample size).
• Chi-square distribution - used to compare observed versus expected values, especially frequencies.
• F distribution - describes the distribution of a ratio of two independent sum-of-squares variables, each divided by its own degrees of freedom.
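To make the "area under the curve" idea and the role of degrees of freedom concrete, here is a minimal sketch in Python. It assumes the SciPy library is available (an assumption, not part of the lecture); a printed statistical table would give the same numbers.

```python
# Minimal sketch (assumes SciPy) of probabilities as areas under a curve,
# and of how the inferential distributions depend on degrees of freedom.
from scipy import stats

# Area under the standard normal (Z) curve between -1.96 and 1.96
p_range = stats.norm.cdf(1.96) - stats.norm.cdf(-1.96)
print(f"P(-1.96 < Z < 1.96) = {p_range:.3f}")           # about 0.95

# Student's t: wider than Z when the degrees of freedom (sample size) are small
for df in (5, 30):
    t_crit = stats.t.ppf(0.975, df)                     # two-tailed 5% cutoff
    print(f"t cutoff at df={df}: {t_crit:.3f}")         # shrinks toward 1.96 as df grows

# Chi-square: probability of an observed-vs-expected statistic at least this large
print(f"P(chi-square >= 7.81, df=3) = {stats.chi2.sf(7.81, df=3):.3f}")

# F: ratio of two independent sum-of-squares variables, each divided by its own df
print(f"P(F >= 3.00, df=(2, 12)) = {stats.f.sf(3.00, dfn=2, dfd=12):.3f}")
```

The tail areas printed for t, chi-square, and F are the same quantities that statistical tables report; the code simply looks them up.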
Normal distribution
• A symmetrical probability distribution with a bell shape (see formula in text).
• Can be defined by two parameters: mu and sigma.
• These parameters are independent.
• Mu indicates the location of the peak.
• Sigma indicates the breadth of the curve.

Estimation of the center of a distribution for a continuous variable (Table 2.1)
• Median - the middle measurement of a set of data, which estimates the center of a distribution.
– It equals the mean if the data are normally distributed.
– It is less sensitive than the mean to outliers if the data are skewed (see the sketch below).
• Mean
– The constant that comes closest to the values of a random variable (it minimizes the squared deviations).
– With equal weighting, calculated by multiplying the sum of the values by 1/N, where N is the sample size.
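As a small illustration of mean versus median and of the two normal parameters, the following sketch (again assuming NumPy and SciPy, with a made-up sample) shows how a single outlier pulls the mean while barely moving the median, and evaluates a normal density for a chosen mu and sigma.

```python
# Minimal sketch (assumes NumPy/SciPy; the sample values are hypothetical)
# comparing the mean and median, and evaluating a normal density.
import numpy as np
from scipy import stats

# Hypothetical measurements: mostly typical values plus one extreme outlier
sample = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 25.0])

print(f"mean   = {sample.mean():.2f}")      # pulled upward by the outlier
print(f"median = {np.median(sample):.2f}")  # middle measurement, barely affected

# Normal density: mu sets the location of the peak, sigma the breadth
mu, sigma = 5.0, 0.5
print(f"f(5.0) = {stats.norm.pdf(5.0, loc=mu, scale=sigma):.3f}")  # height at the peak
print(f"f(6.0) = {stats.norm.pdf(6.0, loc=mu, scale=sigma):.3f}")  # lower, 2 sigma from mu
```

With these made-up numbers the mean comes out near 7.9 while the median stays at 5.0, which is the skew-resistance described above.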