Probability, Normal Distribution, Sampling and Sample Size
Benjamin Kamala
[email protected]

Probability
• The proportion of times an event occurs in a long series of random trials
• E.g. tossing a coin once: P(H) = 1/2, P(T) = 1/2
• Tossing a die (1, 2, 3, 4, 5, 6): P(1) = 1/6, P(2) = 1/6, P(3) = 1/6, ...
• Even numbers, E: 2, 4, 6, so P(E) = 3/6 = 1/2
• Prime numbers, O: 2, 3, 5, so P(O) = 3/6 = 1/2

Note
• Probabilities take values from 0 to 1
• P = 0: the event never occurs; P = 1: the event certainly occurs
• The sum of the probabilities of all possible outcomes = 1

Calculation
• Addition rule: P(A or B) = P(A) + P(B) - P(A and B)

Independent events
• The occurrence of one event does not affect in any way the occurrence of the other
• When a coin and a die are tossed together, the number shown on the die does not affect whether the coin shows H or T
• Rule: P(A and B) = P(A) * P(B)

If the coin and die are tossed together:
• Find the probability of getting a head (H) and an even number (E)
• P(H) = 1/2
• P(E) = 3/6 = 1/2
• P(H and E) = 1/2 * 1/2 = 1/4
• A pack of 52 playing cards has 13 spades, 13 hearts, 13 clubs and 13 diamonds
• If two cards are picked at random (one at a time), what is the probability that both are diamonds?

Example
• Joshua and Hassan each attend Biostatistics lectures only once a month, and there is one Biostatistics lecture per week. What is the probability of finding both of them in a given lecture?
• What is the probability of finding either of them in a given lecture?
• Mutually exclusive events: two events that never occur together
• P(A and B) = 0
• E.g. when a coin is tossed once, P(H and T) = 0

Distributions
• Continuous probability distribution

Normal distribution
• Also known as the Gaussian distribution
• The most common distribution for continuous random variables
• Bell-shaped
• Symmetrical about the mean µ
• Determined by µ and σ

Standard normal distribution
• µ = 0
• σ = 1
• Allows different populations to be compared with each other

Characteristics
• The mean, median, and mode are all in the middle of the curve.
• The total area under the curve above the x-axis is 1 square unit (50% of the area to the right and 50% to the left of the mean)

Probability between the limits:
• µ - 1σ to µ + 1σ is 0.68
• µ - 1.96σ to µ + 1.96σ is 0.95
• µ - 2.58σ to µ + 2.58σ is 0.99
• The values 1, 1.96 and 2.58 are also known as z-scores

To calculate a z-score
• Z = (X - µ) / σ
• In other literature the z-score is also known as the standard normal deviate (SND)
• Thus, for any normally distributed variable with mean µ and standard deviation σ, the probability between x1 and x2 is the same as the probability between SND1 and SND2

Example
• A study of blood pressure of African American schoolboys gave a distribution of systolic blood pressure (SBP) close to the normal, with µ = 105.8 mm Hg and σ = 13.4 mm Hg.
• What percentage of boys would be expected to have SBP greater than 120 mm Hg?
• Calculate SND = (120 - 105.8) / 13.4 = 1.06
• Use the table of the standard normal distribution to find the probability: P(Z > 1.06) ≈ 0.145, so about 14.5% of boys
• What percentage will have SBP less than 120 mm Hg?
• What proportion will have SBP between 85 and 120 mm Hg?
• Within what limits will the central 99% of SBP values be expected?

Example
• Suppose the average length of stay at MMH for type 2 DM patients is 90 days with a standard deviation of 10 days. If it is reasonable to assume an approximately normal distribution of lengths of stay, find the probability that a randomly selected patient from this group will have a length of stay:
– Greater than 80 days
– Less than 70 days
– Between 60 and 120 days
– Greater than 130 days
• Calculate the 95% confidence interval (CI)

Example
• Systolic blood pressure (SBP) values for HKMU first-year students are approximately normally distributed with a mean of 110 mmHg and a standard deviation of 15 mmHg. Find the probability that an individual picked at random from this population will have an SBP value:
– A. Between 100 and 110
– B. Greater than 120
– C. Less than 90
– D. Between 95 and 115

Sampling
• Selection of a subset of individuals from within a population to estimate the characteristics of the whole population
• Advantages:
– i. Faster
– ii. Cost effective
– iii. Homogeneity of data (accuracy and quality)
• Study of the whole population: census

Terms
• Sampling unit: an element or set of elements considered for selection at some stage of sampling
• Sampling frame: a list of all the sampling units in the population
• Sampling scheme/technique: a method of selecting sampling units from a sampling frame

Random vs non-random
• Non-random: purposive; permits no valid statistical assessment; can lead to bias
• Random: representative; a sample that has all the important characteristics of the population

Classification of Sampling Methods
• Non-probability sampling (grab or opportunity)
– Convenience sampling
– Quota sampling
• Probability (random) sampling
– Simple random sampling
– Systematic sampling
– Stratified sampling
– Cluster sampling
– Multistage sampling

Simple Random Sampling
• All units have an equal chance of being selected.
• Steps:
– Obtain a numbered list of all units in the study population (i.e. a complete sampling frame must be available).
– Decide on the size of the sample.
– Select the required number of units using either the 'lottery' system or tables of random numbers.

Systematic Sampling
• Elements in the sample are obtained systematically.
• Steps:
– Obtain the sampling frame and the size of the study population, N.
– Decide on the sample size, n.
– Calculate the sampling interval, k = N/n.
– Select the first element at random from the first k units.
– Include every kth unit after it from the frame in the sample.

Stratified Sampling
• The population is divided into subgroups (strata).
• Each stratum is sampled randomly with a known sample size.
• Strata may be defined according to characteristics of importance in the survey (e.g. occupation, religion, age group or locality).
• Units within a stratum should be as alike as possible, and units in different strata should be as different as possible.

Cluster Sampling
• Used when obtaining a complete list of individuals in the study population is not feasible or practical, or when a complete sampling frame is not available before the investigation starts.
• Sampling units are collections (clusters) of study units.
• Units within a cluster should be as heterogeneous as possible, while the between-cluster variability should be as low as possible.

Multistage Sampling
• Carried out in more than one stage; a different sampling technique can be used at each stage.

Sample size
• Depends on three parameters:
– Increases with the level of CONFIDENCE
– Increases with VARIABILITY
– Decreases as the acceptable ERROR increases

Sample size for the mean
• n = z²σ² / ε²
• n = sample size
• z = z-score for the chosen level of confidence (e.g. 1.96 for 95%)
• σ = standard deviation
• ε = maximum acceptable error

Sample size for the proportion
• n = z² π (1 - π) / ε²
• Or, with π and ε expressed as percentages:
• n = z² π (100 - π) / ε²
• Where
– n = sample size
– π = proportion
– ε = error

Example
• A study is planned to estimate the mean birth weight (BW) of babies born at MMH. BW is approximately normally distributed, and 95% of babies born are probably between 2000 and 4000 g. Determine the sample size required so that there is a 95% chance that the estimated mean BW does not differ from the true value by more than 50 g.
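
As a rough check on the birth-weight exercise above, the following Python sketch applies the two sample-size formulas. It assumes z = 1.96 for 95% confidence and reads "95% of babies between 2000 and 4000 g" as µ ± 1.96σ, so that σ = 2000 / (2 × 1.96). Both are assumptions about how the exercise is meant to be solved, and the function names are illustrative only. The proportion example uses made-up values (π = 0.5, ε = 0.05) that do not come from these notes.

```python
import math


def sample_size_mean(sigma, error, z=1.96):
    """Sample size for estimating a mean: n = z^2 * sigma^2 / error^2, rounded up."""
    n = (z * sigma / error) ** 2
    # Subtract a tiny tolerance so floating-point noise (e.g. 400.0000000002)
    # does not push the rounded-up result one unit too high.
    return math.ceil(n - 1e-9)


def sample_size_proportion(p, error, z=1.96):
    """Sample size for estimating a proportion: n = z^2 * p * (1 - p) / error^2, rounded up.

    p and error are given as fractions between 0 and 1, not percentages.
    """
    n = z ** 2 * p * (1 - p) / error ** 2
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Assumption: the 2000-4000 g range covers mu +/- 1.96 * sigma.
    sigma_bw = (4000 - 2000) / (2 * 1.96)
    print(sample_size_mean(sigma_bw, error=50))        # 400

    # Hypothetical values for illustration only (not from the notes).
    print(sample_size_proportion(p=0.5, error=0.05))   # 385
```

If σ is instead rounded to 500 g (range divided by 4), the same formula gives 385 rather than 400, so the answer depends on how σ is read from the stated range.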