The Scientific Method
Probability and Inferential Statistics
Scientific investigations sample values of a variable in order to make inferences (or
predictions) about all of its possible values in the population.
BUT!
There is always some doubt as to whether observed values = population values, e.g.
Jersey cow serum iron concentrations.
Inferential statistics quantify that doubt:
► What are the chances of conclusions based on a sample of the population holding
true for the population as a whole?
► Are the conclusions safe? Will the prediction hold in most observed situations?
Probability is defined as a relative frequency or proportion – the chance of something
happening out of a defined number of opportunities.
Lecture 6: Probability and Inferential Statistics
For example, probability can be expressed:
subjectively – as a % expectation of an event, e.g. a cow has a 60% chance of calving
tonight (based on experience, but subject to individual opinion)
as an a priori probability – based on the theoretical model defining the set of all
probabilities of an outcome, e.g. when a fair coin is tossed, the probability of obtaining
a head is ½ or 0.5
as a defined probability – the proportion of times an event will occur in a very large
number of trials (or experiments) performed under similar conditions, e.g. the
proportion of times a guinea pig will have a litter of greater than three, based upon
the observed frequency of this event
All of these approaches are related mathematically
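The link between the a priori and the defined (frequency) views can be illustrated with a short simulation – a minimal sketch in Python using only the standard library; the toss count and seed are arbitrary choices, not figures from the lecture. The observed proportion of heads in a long run of fair coin tosses converges on the theoretical value of 0.5:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Simulate 100,000 tosses of a fair coin; heads = 1, tails = 0
tosses = 100_000
heads = sum(random.randint(0, 1) for _ in range(tosses))

observed = heads / tosses   # defined (frequency) probability
theoretical = 0.5           # a priori probability
print(observed)             # close to 0.5 for a long run
```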
Probabilities can be expressed as a percentage (23%), a fraction/proportion (23/100)
or a decimal (0.23) – as parts of a whole (a unitised number of opportunities).
Two rules govern probabilities.
Addition rule – when two events are mutually exclusive (they can't occur at the same
time), the probability of either of them occurring is the sum of the probabilities of each
event, e.g. 1/5 + 1/5 = 2/5 or 0.4 for either of two particular biscuits out of 5 types.
Multiplication rule – when two events are independent, the probability of both events
occurring is the product of their individual probabilities.
E.g. a Friesian cow inseminated on a particular day has a probability of 0.5 of calving
278 days later (the mean gestation period) – she either calves that day or she doesn't!
If two Friesian cows are inseminated on the same day, then the probability of both of
them calving on the same day 278 days later is 0.5 × 0.5 = 0.25.
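Both rules can be checked numerically – a minimal sketch in Python using the standard-library Fraction type so the arithmetic stays exact; the biscuit and cow figures are the ones from the examples above:

```python
from fractions import Fraction

# Addition rule: mutually exclusive events (either of two biscuits out of 5 types)
p_biscuit_a = Fraction(1, 5)
p_biscuit_b = Fraction(1, 5)
p_either = p_biscuit_a + p_biscuit_b
print(p_either, float(p_either))   # 2/5 0.4

# Multiplication rule: independent events (each cow calves on day 278, or not)
p_calve = Fraction(1, 2)
p_both = p_calve * p_calve
print(p_both, float(p_both))       # 1/4 0.25
```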
Probability distributions can derive from discrete or continuous data.
A discrete random variable with only two possible values (e.g. male/female) is called
a binary variable.
The binomial distribution portrays the frequency distribution of data relating to an
"all or none" event – whether an animal displays or doesn't display a characteristic,
e.g. pregnant / not pregnant – or to counts of such outcomes, e.g. the number of
spots on ladybirds in a sample (3, 5, 7, 9, 11, 15, 18, 21 etc.!).
For a continuous variable, the probability that its value lies within a particular
interval is given by the relevant area under the curve of the probability density
function.
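As a sketch of how binomial probabilities arise: the chance of exactly k "successes" in n independent trials, each with probability p, is C(n, k)·p^k·(1−p)^(n−k). The 10-cow example and the 0.5 pregnancy rate below are assumed figures for illustration, not from the lecture; only Python's standard library is needed:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed example: 10 cows, each with a 0.5 chance of being pregnant
n, p = 10, 0.5
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(round(binomial_pmf(5, n, p), 4))   # 0.2461 - exactly 5 pregnant
print(round(sum(probs), 4))              # 1.0 - the distribution sums to 1
```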
The NORMAL (or Gaussian) DISTRIBUTION is a theoretical distribution of a continuous
random variable (x) whose properties can be described mathematically by the mean
(μ) and standard deviation (σ)
The proportions of the values of x lying within ±1, ±1.96 and ±2.58 standard
deviations on either side of the mean are about 68%, 95% and 99% respectively;
virtually all (about 99.7%) of the data values are included within 3 sd units either
side of the mean.
In a perfectly symmetrical normal distribution,
MEAN, MEDIAN and MODE have the same value
[Figure: normal distributions with the same value of the standard deviation (σ) but
different values of the mean (μ)]
It is possible to make predictions about the likelihood of the mean value of a variable
differing from another mean value – whether the difference is likely or unlikely to be
due to chance alone.
This is the basis of significance testing: if the distribution of observed values
approximates to the normal distribution, it becomes possible to compare means of
variables with the theoretical distribution and estimate whether their observed
differences are greater than would be expected by chance if the variables are truly
normally distributed.
e.g. Student's t test
We carry out an experiment on guinea pigs to test the hypothesis that dietary lipid
sources rich in ω3 polyunsaturated fatty acids improve coat condition.
We compare the breaking strength of hairs from two groups of 10 guinea pigs, one
fed a normal mix and one fed the same diet supplemented with cod liver oil,
recording the maximum weight their hair will support as tensile strength in g.
We want to decide whether the mean strengths of hairs from the control and
experimental groups differ significantly at the end of the trial.
Calculating the t statistic
First we must calculate the sample mean, variance and standard deviation for each
data set (control and test)
x̄ = Σx / n

s² = variance = Σ(x − x̄)² / (n − 1)

s = √[ Σ(x − x̄)² / (n − 1) ]

If the data for the control mean are referred to as a and the test mean as b, then the t
statistic is calculated as:

t = (x̄a − x̄b) / √( s²a/na + s²b/nb )
The steps for doing this manually are best set out in a table
Calculating the t statistic
Hair tensile strength (g) for guinea pig control group a and test group b, with each
value's deviation from its group mean:

Control Xa   Test Xb   Xa − x̄a   (Xa − x̄a)²   Xb − x̄b   (Xb − x̄b)²
   6.6         7.9       0.26       0.067        0.42       0.176
   5.9         8.4      -0.44       0.193        0.92       0.846
   7.0         8.0       0.66       0.436        0.52       0.270
   6.1         6.7      -0.24       0.058       -0.78       0.608
   6.3         8.8      -0.04       0.002        1.32       1.742
   6.0         6.5      -0.34       0.116       -0.98       0.960
   6.8         7.2       0.46       0.212       -0.28       0.078
   5.6         6.8      -0.74       0.548       -0.68       0.462
   6.7         6.4       0.36       0.130       -1.08       1.166
   6.4         8.1       0.06       0.004        0.62       0.384
∑ = 63.4     ∑ = 74.8              ∑ = 1.77                ∑ = 6.69
na = 10      nb = 10
x̄a = 6.34    x̄b = 7.48
Calculating the variance:

s²a = 1.77 / 9 = 0.197
s²b = 6.69 / 9 = 0.743

and the standard deviation:

sa = √0.197 = 0.44
sb = √0.743 = 0.86

and finally the t statistic!

t = (6.34 − 7.48) / √( 0.197/10 + 0.743/10 ) = −1.14 / 0.31 = −3.72

We can ignore the −ve sign!

Degrees of freedom of the data set:
df = na + nb − 2 = 10 + 10 − 2 = 18

We then compare our calculated value of t with those in the table of critical values
for t.
Significance and confidence
[Table: critical values of the t statistic at the 10%, 5%, 1% and 0.1% significance
levels, from left to right]
Our value of t (3.72) exceeds the tabulated values of t for 18 df at p = 0.05 (2.101)
and at p = 0.01 (2.878), but not at p = 0.001 (3.922), so we can say "the means are
different at the 1% level of significance and we can reject H0" (the null hypothesis of
no difference between the two treatments).
The confidence level is simply 100 − (significance level).
So, alternatively, we could say: "we can be 99% confident that there is a significant
difference between the two means".
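The whole calculation can be reproduced from the raw data – a minimal sketch using only Python's standard library (statistics.variance uses the n − 1 denominator, matching the formula above):

```python
import math
import statistics

# Hair tensile strengths (g) from the worked example
control = [6.6, 5.9, 7.0, 6.1, 6.3, 6.0, 6.8, 5.6, 6.7, 6.4]  # group a
test    = [7.9, 8.4, 8.0, 6.7, 8.8, 6.5, 7.2, 6.8, 6.4, 8.1]  # group b

xa, xb = statistics.mean(control), statistics.mean(test)
s2a, s2b = statistics.variance(control), statistics.variance(test)  # n - 1

t = (xa - xb) / math.sqrt(s2a / len(control) + s2b / len(test))
df = len(control) + len(test) - 2

print(xa, xb)                 # 6.34 7.48
print(round(abs(t), 2), df)   # 3.72 18
```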