Download Probability - HKMU Student Portal

Probability
Normal Distribution
Sampling and Sample size
Benjamin Kamala
[email protected]
Probability
• Proportion of times the event occurs in a long series of random trials
• E.g. tossing a coin once: P(H) = 1/2, P(T) = 1/2
• Tossing a die (1, 2, 3, 4, 5, 6)
• P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, ...
• Even numbers, E: 2, 4, 6
• Prime numbers, O: 2, 3, 5
• P(E) = 3/6 = 1/2
• P(O) = 3/6 = 1/2
Note
• Probabilities take values from 0 to 1
• P = 0: the event never occurs; P = 1: the event certainly occurs
• The sum of the probabilities of all possible outcomes = 1
Calculation
• Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
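The addition rule can be checked on the die events defined earlier (E = even, O = prime). The following is a minimal sketch that assumes a fair die; the helper name prob is introduced here only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (assumption: all faces equally likely).
outcomes = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}    # event E
prime = {2, 3, 5}   # event O

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

# Addition rule: P(E or O) = P(E) + P(O) - P(E and O)
p_union = prob(even) + prob(prime) - prob(even & prime)
print(p_union)  # 5/6

# Cross-check by counting the union directly.
assert p_union == prob(even | prime)
```

The subtraction removes the outcome 2, which is both even and prime and would otherwise be counted twice.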
Independent events
• The occurrence of one event does not affect in any way the occurrence of the other
• When a coin and a die are tossed together, the number shown on the die does not affect whether the coin shows H or T
• Rule: P(A and B) = P(A) × P(B)
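One way to see the multiplication rule is to list every (coin, die) outcome and count. This sketch assumes a fair coin and a fair die; the names space and prob are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes.
space = list(product("HT", range(1, 7)))

def prob(condition):
    """Fraction of outcomes in the sample space that satisfy the condition."""
    hits = [o for o in space if condition(o)]
    return Fraction(len(hits), len(space))

p_h = prob(lambda o: o[0] == "H")                          # head on the coin
p_e = prob(lambda o: o[1] % 2 == 0)                        # even number on the die
p_h_and_e = prob(lambda o: o[0] == "H" and o[1] % 2 == 0)  # both together

print(p_h, p_e, p_h_and_e)  # 1/2 1/2 1/4
assert p_h_and_e == p_h * p_e  # the independence rule holds
```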
If the coin and die are tossed together:
• Find the probability of getting a H and an even number (E)
• P(H) = 1/2
• P(E) = 3/6 = 1/2
• P(H and E) = 1/2 × 1/2 = 1/4
• A pack of 52 playing cards: 13 spades, 13 hearts, 13 clubs, 13 diamonds
• If two cards are picked (one at a time) at random, what is the probability that both are diamonds?
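The card question does not say whether the first card is put back before the second is drawn, so the sketch below works out both readings. Treating the draw as "without replacement" is an assumption, although it is the usual reading of "one at a time".

```python
from fractions import Fraction

diamonds, deck = 13, 52

# Without replacement: the second draw comes from the remaining 51 cards,
# of which 12 are diamonds, so the two draws are NOT independent.
p_without = Fraction(diamonds, deck) * Fraction(diamonds - 1, deck - 1)

# With replacement: the two draws are independent, so P(A and B) = P(A) * P(B).
p_with = Fraction(diamonds, deck) ** 2

print(p_without)  # 1/17
print(p_with)     # 1/16
```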
Example
• If Joshua and Hassan each attend Biostatistics lectures only once a month and there is one Biostatistics lecture per week, what is the probability of finding both of them in a given lecture?
• What is the probability of finding either of them in a given lecture?
• Mutually exclusive events: two events that never occur together
• P(A and B) = 0
• E.g. when a coin is tossed once, P(H and T) = 0
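A possible working of the lecture-attendance example is sketched below. It rests on assumptions the slide does not state: a month has four lectures, each student picks one at random, and the two students choose independently.

```python
from fractions import Fraction

# Assumption: 4 lectures per month, each student attends exactly one,
# chosen at random and independently of the other student.
p_joshua = Fraction(1, 4)
p_hassan = Fraction(1, 4)

# Independent events: P(A and B) = P(A) * P(B)
p_both = p_joshua * p_hassan

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_joshua + p_hassan - p_both

print(p_both, p_either)  # 1/16 7/16
```

If the two attendances were mutually exclusive instead, P(A and B) would be 0 and the addition rule would reduce to P(A) + P(B).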
Distributions
• Continuous probability distribution
• Normal distribution, also known as the Gaussian distribution
• Commonest distribution
• Describes a continuous random variable
• Bell shaped
• Symmetrical about µ
• Determined by µ and σ
Standard normal distribution
• µ = 0
• σ = 1
• Allows different populations to be compared with each other
Characteristics
• The mean, median, and mode are all in the middle of the curve.
• The total area under the curve above the x-axis is 1 square unit (50% of the area to the right, 50% to the left of the mean)
Probability between the limits:
• µ - 1σ to µ + 1σ is 0.68
• µ - 1.96σ to µ + 1.96σ is 0.95
• µ - 2.58σ to µ + 2.58σ is 0.99
• The values 1, 1.96, 2.58 are also known as z-scores
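The three areas quoted above can be reproduced with Python's standard library. This is a small sketch using statistics.NormalDist (Python 3.8 or later); the helper central_area is a name chosen here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def central_area(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.96, 2.58):
    print(z, round(central_area(z), 2))
```

Rounded to two decimals, the three areas match the 0.68, 0.95 and 0.99 listed on the slide.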
To calculate a z-score
• Z = (X - µ) / σ
• In other literature the z-score is also known as the Standard Normal Deviate (SND)
• Thus, for any normally distributed variable with mean µ and standard deviation σ, the probability between x1 and x2 is the same as the probability between SND1 and SND2
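The equivalence between the original scale and the SND scale can be illustrated numerically. The values of mu, sigma, x1 and x2 below are hypothetical and chosen only for the demonstration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standard normal deviate: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

# Hypothetical values, for illustration only.
mu, sigma = 50.0, 10.0
x1, x2 = 45.0, 65.0

original = NormalDist(mu, sigma)
standard = NormalDist(0, 1)

# Probability between x1 and x2 on the original scale ...
p_x = original.cdf(x2) - original.cdf(x1)
# ... and between the corresponding z-scores on the standard scale.
p_z = standard.cdf(z_score(x2, mu, sigma)) - standard.cdf(z_score(x1, mu, sigma))

print(round(p_x, 4), round(p_z, 4))  # the two values agree
```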
• A study of blood pressure of African American school boys gave a distribution of systolic blood pressure (SBP) close to the normal with µ = 105.8 mm Hg and σ = 13.4 mm Hg.
• What percentage of boys would be expected to have SBP greater than 120 mm Hg?
• Calculate SND = (120 - 105.8) / 13.4 = 1.06
• Use the table of the standard normal distribution to calculate the probabilities
• What percentage will have SBP less than 120 mm Hg?
• What proportion will have SBP between 85 and 120 mm Hg?
• Within what limits will the central 99% of SBP values be expected?
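The four SBP questions above can be checked without a printed table. The sketch below uses statistics.NormalDist in place of the table lookup, so its answers may differ slightly from table values based on the rounded z-score 1.06.

```python
from statistics import NormalDist

# SBP distribution from the example: mean 105.8 mm Hg, SD 13.4 mm Hg.
sbp = NormalDist(mu=105.8, sigma=13.4)

p_above_120 = 1 - sbp.cdf(120)             # SBP greater than 120
p_below_120 = sbp.cdf(120)                 # SBP less than 120
p_85_to_120 = sbp.cdf(120) - sbp.cdf(85)   # SBP between 85 and 120

# Central 99%: leave 0.5% in each tail (z is about 2.58).
z99 = NormalDist().inv_cdf(0.995)
lower = sbp.mean - z99 * sbp.stdev
upper = sbp.mean + z99 * sbp.stdev

print(f"P(SBP > 120)      = {p_above_120:.3f}")
print(f"P(SBP < 120)      = {p_below_120:.3f}")
print(f"P(85 < SBP < 120) = {p_85_to_120:.3f}")
print(f"Central 99% limits: {lower:.1f} to {upper:.1f} mm Hg")
```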
• Suppose the average length of stay at MMH for type 2 DM patients is 90 days with a standard deviation of 10 days. If it is reasonable to assume an approximately normal distribution of lengths of stay, find the probability that a randomly selected patient from this group will have a length of stay:
• Greater than 80 days
• Less than 70 days
• Between 60 and 120 days
• Greater than 130 days
• Calculate the 95% confidence interval (CI)
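For the length-of-stay exercise, the four probabilities can be sketched as follows, assuming the stated normal distribution with mean 90 days and standard deviation 10 days. The variable names are illustrative.

```python
from statistics import NormalDist

stay = NormalDist(mu=90, sigma=10)  # length of stay in days

p_gt_80 = 1 - stay.cdf(80)               # z = -1
p_lt_70 = stay.cdf(70)                   # z = -2
p_60_120 = stay.cdf(120) - stay.cdf(60)  # z from -3 to +3
p_gt_130 = 1 - stay.cdf(130)             # z = +4

print(f"Greater than 80 days:    {p_gt_80:.4f}")
print(f"Less than 70 days:       {p_lt_70:.4f}")
print(f"Between 60 and 120 days: {p_60_120:.4f}")
print(f"Greater than 130 days:   {p_gt_130:.6f}")
```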
• The systolic blood pressure (SBP) values for HKMU first-year students are approximately normally distributed with a mean of 110 mm Hg and a standard deviation of 15 mm Hg. Find the probability that an individual picked at random from this population will have an SBP value:
– A. Between 100 and 110
– B. Greater than 120
– C. Less than 90
– D. Between 95 and 115
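Parts A to D can be approached the same way. The sketch below wraps the calculation in a small helper, prob_between, which is a hypothetical name used here so that open-ended ranges are handled with infinite limits.

```python
import math
from statistics import NormalDist

sbp = NormalDist(mu=110, sigma=15)  # mm Hg

def prob_between(dist, low=-math.inf, high=math.inf):
    """P(low < X < high); an infinite limit means that side is open-ended."""
    upper = 1.0 if high == math.inf else dist.cdf(high)
    lower = 0.0 if low == -math.inf else dist.cdf(low)
    return upper - lower

answers = {
    "A": prob_between(sbp, 100, 110),
    "B": prob_between(sbp, low=120),
    "C": prob_between(sbp, high=90),
    "D": prob_between(sbp, 95, 115),
}
for part, p in answers.items():
    print(part, round(p, 3))
```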
Sampling
• Selection of a subset of individuals from within a population to estimate the characteristics of the whole population
• Advantages:
i. Faster
ii. Cost effective
iii. Homogeneity of data (Accuracy and quality)
• Study of the whole population: Census
Terms
• Sampling Unit: An element or set of elements
considered for selection in some stage of
sampling.
• Sampling Frame: A list of all the sampling
units in the population.
• Sampling Scheme/technique: A method of
selecting sampling units from a sampling
frame
Random vs Non random
• Non-random: purposive, no valid assessment, leads to some bias
• Random: representative, a sample that has all the important characteristics of the population
Classification of Sampling Methods
• Non-probability sampling (grab or opportunity)
– Convenience sampling
– Quota sampling
• Probability: random sampling
– Simple random sampling
– Systematic sampling
– Stratified sampling
– Cluster sampling
– Multistage sampling
Simple Random Sampling
• Units have an equal chance of being selected.
• The steps :
– Obtain a numbered list of all units in the study
population (i.e. availability of complete sampling
frame).
– Decide on the size of the sample.
– Select the required number of units using either
the ‘lottery’ system or tables of random numbers.
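The steps above can be sketched in a few lines. Here a pseudo-random generator stands in for the lottery or the table of random numbers; the frame of 100 numbered units and the sample size of 10 are hypothetical.

```python
import random

# Step 1: numbered list of all units (hypothetical frame of 100 units).
frame = list(range(1, 101))

# Step 2: decide on the sample size.
n = 10

# Step 3: draw n distinct units, each with an equal chance of selection.
rng = random.Random(42)  # seeded only so the sketch is reproducible
sample = rng.sample(frame, n)
print(sorted(sample))
```

random.sample draws without replacement, so no unit can appear twice in the sample.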
Systematic Sampling
• Elements in the sample are obtained systematically.
• Steps:
– Obtain the sampling frame and the size of the study
population N.
– Decide on the sample size, n.
– Calculate the sampling interval, k = N/n.
– Select the first element at random from the first k units.
– Include every kth unit from the frame into the sample
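A minimal sketch of these steps follows. It assumes N is (roughly) a multiple of n, so that the interval k = N // n is a whole number; the function name systematic_sample and the frame of 100 units are illustrative.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Pick a random start within the first k units, then every kth unit."""
    N = len(frame)
    k = N // n                   # sampling interval (assumes N >= n)
    start = rng.randrange(k)     # random position among the first k units
    return frame[start::k][:n]   # every kth unit, capped at n units

frame = list(range(1, 101))      # hypothetical frame, N = 100
sample = systematic_sample(frame, n=10, rng=random.Random(0))
print(sample)
```

Only the starting unit is random; once it is chosen, the rest of the sample is fully determined by the interval k.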
Stratified Sampling
• The population is divided into subgroups (or strata)
• Each stratum is sampled randomly with a known sample size.
• Strata may be defined according to some characteristics of importance in the survey (e.g. occupation, religion, age groups or even locality)
• Units within a stratum should be as alike as possible and units in different strata should be as different as possible
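Stratified sampling as described above might be sketched like this. The population, the "urban"/"rural" strata and the per-stratum sample sizes are all hypothetical, and stratified_sample is a name chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, sizes, rng=random):
    """Group units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}

# Hypothetical population of 60 (id, locality) pairs: 40 urban, 20 rural.
population = [(i, "urban" if i % 3 else "rural") for i in range(1, 61)]

sample = stratified_sample(
    population,
    stratum_of=lambda unit: unit[1],
    sizes={"urban": 8, "rural": 4},   # known sample size per stratum
    rng=random.Random(1),
)
print({name: len(chosen) for name, chosen in sample.items()})
```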
Cluster Sampling
• Used when obtaining a complete list of individuals in the study population is not feasible or practical, or a complete sampling frame is not available before the investigation starts.
• Sampling units are collections (clusters) of study units
• Units within a cluster should be as heterogeneous as possible, while the between-cluster variability should be as low as possible
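The sketch below shows one simple form of cluster sampling, in which whole clusters are drawn at random and every study unit in a chosen cluster is included. That one-stage design is an assumption; the villages and households are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly select whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 8 villages (clusters) with 5 households each.
villages = {
    f"village_{i}": [f"v{i}_household_{j}" for j in range(1, 6)]
    for i in range(1, 9)
}

sample = cluster_sample(villages, n_clusters=3, rng=random.Random(7))
print(sorted(sample))
```

Only a list of clusters is needed up front; a list of individual households is required only for the clusters that are actually selected.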
Multistage Sampling
• Multi-stage sampling is carried out in more
than 1 stage, and different sampling
techniques can be employed at every stage
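As an illustration, a two-stage design could use simple random sampling at both stages: first of clusters, then of units within each chosen cluster. The wards, households and sample sizes below are hypothetical, and other techniques could be substituted at either stage.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng=random):
    """Stage 1: sample clusters. Stage 2: sample units within each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical frame: 6 wards with 20 households each.
wards = {
    f"ward_{i}": [f"w{i}_household_{j}" for j in range(1, 21)]
    for i in range(1, 7)
}

sample = two_stage_sample(wards, n_clusters=2, n_per_cluster=5,
                          rng=random.Random(3))
for ward, households in sample.items():
    print(ward, households)
```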
Sample size
• Depends on three parameters
• Sample size is:
• Directly proportional to the level of CONFIDENCE
• Directly proportional to VARIABILITY
• Inversely proportional to the acceptable ERROR
Sample size for the mean
• n = z²σ² / ε²
• n = sample size
• z = z-score for the chosen level of confidence (e.g. 1.96 for 95%)
• σ = standard deviation
• ε = maximum acceptable error
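The formula for the mean translates directly into a small function. The input values in the example call are hypothetical, and rounding up to the next whole unit is a common convention rather than something stated on the slide.

```python
import math

def sample_size_mean(z, sigma, error):
    """n = z^2 * sigma^2 / error^2, rounded up to a whole number of units."""
    return math.ceil((z * sigma / error) ** 2)

# Hypothetical inputs: 95% confidence, SD of 15, acceptable error of 3.
n = sample_size_mean(z=1.96, sigma=15, error=3)
print(n)  # 97
```

Halving the acceptable error quadruples the required sample size, because the error enters the formula squared.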
Sample size for the proportion
• n = z² Π(1 - Π) / ε²
• Or, with Π and ε expressed as percentages:
• n = z² Π(100 - Π) / ε²
• Where n = sample size
– Π = proportion
– ε = error
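Both forms of the proportion formula give the same answer as long as the proportion and the error are on the same scale. The sketch below uses hypothetical inputs (an expected proportion of 50% and an acceptable error of 5 percentage points).

```python
import math

def sample_size_proportion(z, p, error, scale=1):
    """n = z^2 * p * (scale - p) / error^2; scale is 1 for fractions, 100 for percentages."""
    return math.ceil(z ** 2 * p * (scale - p) / error ** 2)

# Hypothetical inputs at 95% confidence.
n_fraction = sample_size_proportion(z=1.96, p=0.5, error=0.05)
n_percent = sample_size_proportion(z=1.96, p=50, error=5, scale=100)
print(n_fraction, n_percent)  # 385 385
```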
• A study is planned to estimate the mean birth weight (BW) of babies born at MMH. BW is approximately normally distributed and 95% of babies born probably weigh between 2000 and 4000 g. Determine the sample size required so that there is a 95% chance that the estimated mean BW does not differ from the true value by more than 50 g.
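One possible working of this exercise is sketched below. The standard deviation is not given, so it is inferred from the 95% range, which is an assumption: either as range / (2 × 1.96) or with the rule of thumb range / 4. The two readings give slightly different answers.

```python
import math

def sample_size_mean(z, sigma, error):
    """n = z^2 * sigma^2 / error^2, rounded up to a whole number."""
    n = (z * sigma / error) ** 2
    return math.ceil(round(n, 6))  # rounding guards against floating-point noise

z = 1.96      # 95% confidence
error = 50    # maximum acceptable error in grams

# Assumption: 2000 to 4000 g spans the central 95% of birth weights.
sigma_from_z = (4000 - 2000) / (2 * z)   # about 510 g
sigma_rule_of_thumb = (4000 - 2000) / 4  # 500 g, using +/- 2 SD

n_from_z = sample_size_mean(z, sigma_from_z, error)
n_rule = sample_size_mean(z, sigma_rule_of_thumb, error)
print(n_from_z, n_rule)  # 400 385
```

Under these assumptions, roughly 385 to 400 babies would be needed.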