Data Distributions
31/12/10
FANZCA Part I Notes
SUMMARY
(1) Normal Distribution
(2) Other (Standard normal distribution, Data transformation, Binomial, Poisson)
NORMAL DISTRIBUTION
- a variable that is normally distributed within a population has a norm (central value), with random,
independent factors causing variation from that norm.
- most values cluster around the norm with fewer & fewer values towards the tails.
- extreme values do exist.
- variation is random -> equal spread of values above & below the norm.
- mean = median = mode
- normal distribution can be plotted to illustrate the frequency of observations or the
probability of an observation arising.
- the curve is bell shaped, symmetrical and theoretically infinite in extent, with tails that never
reach the x axis.
- the mean and sd (standard deviation) of a sample are likely to be close to the mean and sd of
the population from which it was sampled.
- the smaller the sample, the less likely it is to have 'normal' geometry -> the less likely its mean
and standard deviation will match those of the population.
- to determine whether a sample is normally distributed you can plot the data and 'eye-ball'
the pattern.
- alternatively, calculate the mean & sd of the observed data, then calculate the frequencies of
values that would be expected from a normal distribution with these parameters.
Jeremy Fernando (2010)
- comparing the observed & expected frequencies will tell you whether the sample is 'normal'
- population mean = the average value in a real population
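The observed-versus-expected comparison described above can be sketched in Python. This is a minimal illustration, assuming a hypothetical sample of 10 measurements and hypothetical bin edges; the function names `normal_cdf` and `observed_vs_expected` are illustrative and not from the notes. Expected counts come from the normal cumulative distribution built on the sample's own mean and sd.

```python
import math
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and sd sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def observed_vs_expected(data, edges):
    """Count observations in each bin [lo, hi) and the number expected if the
    data were normal with the sample's own mean and sd."""
    mu, sd = mean(data), stdev(data)  # stdev uses the n - 1 (sample) formula
    n = len(data)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for x in data if lo <= x < hi)
        expected = n * (normal_cdf(hi, mu, sd) - normal_cdf(lo, mu, sd))
        rows.append((lo, hi, observed, expected))
    return mu, sd, rows


# Hypothetical sample of 10 measurements
data = [4.1, 4.8, 5.0, 5.2, 5.5, 5.6, 5.9, 6.0, 6.3, 6.9]
# Open-ended outer bins so every observation (and all probability) is counted
edges = [float("-inf"), 5.0, 5.5, 6.0, float("inf")]
mu, sd, rows = observed_vs_expected(data, edges)
for lo, hi, observed, expected in rows:
    print(f"{lo} to {hi}: observed {observed}, expected {expected:.1f}")
```

This is still an informal 'eye-ball' comparison: large gaps between the observed and expected columns suggest the sample is not normally distributed, but with only 10 values some mismatch is expected by chance.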
STANDARD NORMAL DISTRIBUTION
- a transformation of the points on a normal distribution into multiples of the standard
deviation or standard error from the population mean.
- these multiples are termed z values & the conversion is sometimes referred to as the z
transformation. The resulting standard normal distribution has a mean of 0 and an sd of 1.
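A z value is z = (x - mean) / sd, ie. how many standard deviations an observation lies from the mean. The sketch below assumes a hypothetical population with mean 100 and sd 15; `z_value` and `prob_below` are illustrative names. When x is a sample mean rather than a single observation, the standard error would be passed in place of the sd.

```python
import math


def z_value(x, mu, sigma):
    """Number of standard deviations (or standard errors) x lies from mu."""
    return (x - mu) / sigma


def prob_below(z):
    """P(Z <= z) for the standard normal distribution (mean 0, sd 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical population: mean 100, sd 15
z = z_value(130, 100, 15)
print(z, round(prob_below(z), 4))  # 2.0 0.9772
```

So an observation of 130 sits 2 sd above the mean, and about 97.7% of the population would be expected to lie below it.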
DATA TRANSFORMATION
- allows you to use parametric tests on skewed data by first converting them to a near-normal
distribution.
- can use the square root or the logarithm (log) of the data
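The effect of these transformations can be checked by measuring skewness before and after. This minimal sketch assumes hypothetical right-skewed data that grow by doubling; the `skewness` helper uses one common formula (mean cubed deviation divided by sd cubed) and is illustrative only. Logs require all values to be positive.

```python
import math


def skewness(values):
    """Mean cubed deviation / sd**3. Positive = long right tail;
    near 0 = roughly symmetric."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return sum((v - m) ** 3 for v in values) / (n * sd ** 3)


# Hypothetical right-skewed data (each value double the previous one)
raw = [1, 2, 4, 8, 16, 32, 64, 128]
rooted = [math.sqrt(v) for v in raw]  # square root transformation
logged = [math.log(v) for v in raw]   # log transformation (values must be > 0)

print("raw:", round(skewness(raw), 2))
print("square root:", round(skewness(rooted), 2))
print("log:", round(skewness(logged), 2))
```

Here the square root reduces the skew and the log removes it entirely, because logs of a doubling series are equally spaced; real data will rarely become exactly symmetric.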
BINOMIAL DISTRIBUTION
- describes the probability of different proportions of a binary outcome arising in a fixed
number of observations.
- eg. the probabilities of different proportions of heads arising in sets of coin tosses
- the most likely proportion in a sample (the norm) = the population proportion (pi, π)
- eg. for heads in tossed coins, π = 0.5
- as sample size increases, it becomes more likely that the proportion of a particular
observation within the sample will be the same as or similar to the population
proportion (π)
- eg. the more times you toss the coin, the more likely the proportion of heads is close to 0.5
- the larger the sample -> the closer the binomial distribution is to the normal distribution.
- the probabilities of all possible outcomes must sum to 1.0
- the probability of a specific proportion arising in a sample is calculated using the binomial
formula.
- the inputs are (1) the proportion you are seeking, (2) the population proportion, (3) the
sample size
- as the binomial distribution can be approximated to a normal distribution, hypothesis tests
such as the normal approximation test can be carried out to determine the probability of
a particular proportion arising in a binomial distribution.
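The binomial formula gives P(r) = C(n, r) × π^r × (1 - π)^(n - r) for exactly r 'successes' in n observations. The sketch below applies it to fair coin tosses (π = 0.5) and adds the normal approximation, where a sample proportion p has standard error sqrt(π(1 - π)/n). The function names and the 70-heads-in-100-tosses example are hypothetical illustrations.

```python
import math


def binomial_prob(r, n, pi):
    """P(exactly r successes in n observations) with population proportion pi."""
    return math.comb(n, r) * pi ** r * (1 - pi) ** (n - r)


def prob_proportion_between(lo, hi, n, pi):
    """P(sample proportion r/n lies between lo and hi inclusive)."""
    return sum(binomial_prob(r, n, pi) for r in range(n + 1) if lo <= r / n <= hi)


def normal_approx_z(p, n, pi):
    """z value for an observed proportion p under the normal approximation."""
    se = math.sqrt(pi * (1 - pi) / n)  # standard error of a proportion
    return (p - pi) / se


n = 10
probs = [binomial_prob(r, n, 0.5) for r in range(n + 1)]
print(round(probs[5], 4))     # 0.2461 (exactly 5 heads in 10 tosses)
print(round(sum(probs), 10))  # 1.0 (all outcomes sum to 1)

# A larger sample makes a proportion of heads between 0.4 and 0.6 more likely
print(prob_proportion_between(0.4, 0.6, 10, 0.5))
print(prob_proportion_between(0.4, 0.6, 100, 0.5))

# Hypothetical result: 70 heads in 100 tosses
print(normal_approx_z(0.7, 100, 0.5))
```

A z value of about 4 from the last line would make 70 heads in 100 tosses very unlikely if the coin were fair.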
POISSON DISTRIBUTION
- describes the probability of a given number of events occurring in a fixed time period or in a
region of space.
- events must be random and independent of each other.
- the probability is calculated from an exponential formula and depends on prior knowledge of
one parameter only -> the mean number of occurrences per unit time period (or unit region
of space)
- eg. if the number of adverse incidents occurring in OT over a 2 year period is known, what is the
probability of 5 incidents happening in one day?
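The Poisson formula is P(k) = e^(-λ) × λ^k / k!, where λ is the mean number of events per unit time. A minimal sketch of the OT example, assuming hypothetical figures of 730 incidents over 2 years (730 days, ignoring leap years), so λ = 1 incident per day; `poisson_prob` is an illustrative name.

```python
import math


def poisson_prob(k, lam):
    """P(exactly k events in one unit of time or space) for random,
    independent events with mean lam per unit: e**-lam * lam**k / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical figures: 730 incidents recorded over 2 years
incidents, days = 730, 2 * 365
lam = incidents / days  # mean incidents per day = 1.0
print(round(poisson_prob(5, lam), 4))  # 0.0031
```

The only input needed besides k is λ, matching the point above that the Poisson distribution depends on a single parameter.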