BY Mrs. J. O Odengle Aliyu Garba [email protected]
Institute of Computing & ICT
Ahmadu Bello University, Zaria
October, 2014

Important statistical terms

Population: the set of all measurements of interest to the researcher (the collection of all responses, measurements, or counts that are of interest).

Sample: a subset of the population.

Hypothesis: a hypothesis is a kind of truth claim about some aspect of the world, for instance the attitudes of patients or the prevalence of a disease in a population. Research sets out to try to prove this truth claim (or, more properly, to reject the null hypothesis, a truth claim phrased as a negative). For example, let us think about the following hypothesis:
    Levels of ill-health are affected by deprivation.
and the related null hypothesis:
    Levels of ill-health are not affected by deprivation.

Why sampling?
- Get information about large populations
- Lower cost
- Less field time
- More accuracy, i.e. can do a better job of data collection
- When it is impossible to study the whole population

Target population: the population to be studied, to which the investigator wants to generalize the results.
Sampling unit: the smallest unit from which a sample can be selected.
Sampling frame: the list of all the sampling units from which the sample is drawn.
Sampling scheme: the method of selecting sampling units from the sampling frame.

Types of sampling
- Non-probability samples
- Probability samples

Non-probability samples

As they are not truly representative, non-probability samples are less desirable than probability samples. However, a researcher may not be able to obtain a random or stratified sample, or it may be too expensive. A researcher may also not care about generalizing to a larger population.
The validity of non-probability samples can be increased by trying to approximate random selection, and by eliminating as many sources of bias as possible.

Non-probability samples: Quota sample

The defining characteristic of a quota sample is that the researcher deliberately sets the proportions of levels or strata within the sample. This is generally done to ensure the inclusion of a particular segment of the population. The proportions may or may not differ dramatically from the actual proportions in the population. The researcher sets a quota, independent of population characteristics.

Example: A researcher is interested in the attitudes of members of different religions towards the death penalty in Anambra. In Anambra a random sample might miss Muslims (because there are not many in that state). To be sure of their inclusion, a researcher could set a quota of 3% Muslim for the sample. However, the sample will no longer be representative of the actual proportions in the population. This may limit generalizing to the state population. But the quota will guarantee that the views of Muslims are represented in the survey.

Non-probability samples: Purposive sample

A purposive sample is a non-representative subset of some larger population, and is constructed to serve a very specific need or purpose. A researcher may have a specific group in mind, such as high-level business executives. It may not be possible to specify the population: its members would not all be known, and access will be difficult. The researcher will attempt to zero in on the target group, interviewing whoever is available.

A subset of a purposive sample is a snowball sample, so named because one picks up the sample along the way, analogous to a snowball accumulating snow. A snowball sample is achieved by asking a participant to suggest someone else who might be willing or appropriate for the study.
Snowball samples are particularly useful in hard-to-track populations, such as truants, drug users, etc.

Non-probability samples: Convenience sample

A convenience sample is a matter of taking what you can get. It is an accidental sample. Although selection may be unguided, it is probably not random in the strict sense of everyone in the population having an equal chance of being selected. Volunteers would constitute a convenience sample.

Non-probability samples

Non-probability samples are limited with regard to generalization. Because they do not truly represent a population, we cannot make valid inferences about the larger group from which they are drawn. Validity can be increased by approximating random selection as much as possible, and by making every attempt to avoid introducing bias into sample selection.

Probability samples

Random sampling
- Each subject has a known probability of being selected
- Allows application of statistical sampling theory to results in order to:
    - Generalise
    - Test hypotheses

Conclusions
- Probability samples are the best
- They ensure:
    - Representativeness
    - Precision

Methods used in probability samples
- Simple random sampling
- Systematic sampling
- Stratified sampling
- Multi-stage sampling
- Cluster sampling

Simple random sampling

Table of random numbers
684257954125632140
582032154785962024
362333254789120325
985263017424503686

Systematic sampling

Sampling fraction: the ratio between sample size and population size.

Cluster sampling

Cluster: a group of sampling units close to each other, i.e. crowding together in the same area or neighborhood.

[Figure: an area divided into Section 1, Section 2, Section 3, Section 4 and Section 5]

Stratified sampling

Stratified sampling is a probability sampling technique wherein the researcher divides the entire population into different subgroups or strata, then randomly selects the final subjects proportionally from the different strata.
For example, the population may be divided by gender, social class, education level, religion, etc. The population is then randomly sampled within each category or stratum. If 38% of the population is college-educated, then 38% of the sample is randomly selected from the college-educated population.

It is important to note that the strata must be non-overlapping. Overlapping subgroups give some individuals a higher chance of being selected as subjects, which negates the concept of stratified sampling as a type of probability sampling.

Stratified samples are as good as or better than random samples, but they require fairly detailed advance knowledge of the population characteristics, and are therefore more difficult to construct.

Uses of Stratified Random Sampling

Stratified random sampling is used when the researcher wants to highlight a specific subgroup within the population. This technique is useful in such research because it ensures the presence of the key subgroup within the sample.

Researchers also employ stratified random sampling when they want to observe existing relationships between two or more subgroups. With a simple random sampling technique, the researcher cannot be sure that the subgroups to be observed are represented equally or proportionately within the sample.

With stratified sampling, the researcher can representatively sample even the smallest and most inaccessible subgroups in the population. This allows the researcher to sample the rare extremes of the given population.

This technique gives higher statistical precision than simple random sampling, because the variability within the subgroups is lower than the variability across the entire population. Because of this higher precision, a smaller sample size is enough to reach a given precision, which can save the researchers a lot of time, money and effort.
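The proportional allocation described above (38% of the population is college-educated, so 38% of the sample comes from that stratum) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the slides: the function name `stratified_sample`, the `stratum_of` callback, and the toy population of 100 people are all hypothetical, and rounding the per-stratum counts may make the total differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw a proportionally allocated stratified random sample.

    population  : list of sampling units (hypothetical toy data here)
    stratum_of  : function mapping a unit to its stratum label
    sample_size : desired total sample size (must not exceed the population)
    """
    rng = random.Random(seed)

    # Partition the population into non-overlapping strata.
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Each stratum contributes in proportion to its share of the population.
        k = round(sample_size * len(members) / len(population))
        # Simple random sampling without replacement inside the stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: 100 people, 38 of them college-educated (assumed data).
population = [{"id": i, "college": i < 38} for i in range(100)]
sample = stratified_sample(population, lambda u: u["college"], 50, seed=1)
college_in_sample = sum(u["college"] for u in sample)
print(len(sample), college_in_sample)
```

With a sample of 50 drawn from this toy population, the college-educated stratum contributes 19 units, which is 38% of the sample, matching its share of the population.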
Multi-Stage Sampling

The four methods we've covered so far (simple, stratified, systematic and cluster) are the simplest random sampling strategies. In most real applied social research, we would use sampling methods that are considerably more complex than these simple variations. The most important principle here is that we can combine the simple methods described earlier in a variety of useful ways that help us address our sampling needs in the most efficient and effective manner possible. When we combine sampling methods, we call this multi-stage sampling.

Consider a national sample of school districts stratified by economics and educational level. Within selected districts, we might do a simple random sample of schools. Within schools, we might do a simple random sample of classes or grades. And, within classes, we might even do a simple random sample of students. In this case, we have three or four stages in the sampling process and we use both stratified and simple random sampling. By combining different sampling methods we are able to achieve a rich variety of probabilistic sampling methods that can be used in a wide range of social research contexts.

Probability

Suppose that an event E can happen in h ways out of a total of n possible equally likely ways. Then the probability of occurrence of the event (called its success) is denoted by

$p = \Pr\{E\} = \frac{h}{n}$

The probability of nonoccurrence of the event (called its failure) is denoted by

$q = \Pr\{\text{not } E\} = \frac{n - h}{n} = 1 - \frac{h}{n} = 1 - p = 1 - \Pr\{E\}$

Thus $p + q = 1$, or $\Pr\{E\} + \Pr\{\text{not } E\} = 1$. The event "not E" is sometimes denoted by $\bar{E}$, $\tilde{E}$, or $\sim E$.

Example.
When a die is tossed, there are 6 equally possible ways in which the die can fall: 1, 2, 3, 4, 5, or 6. The event E that a 3 or 4 turns up is $E = \{3, 4\}$, and the probability of E is $\Pr\{E\} = 2/6 = 1/3$. The probability of not getting a 3 or 4 (i.e. getting a 1, 2, 5, or 6) is $\Pr\{\text{not } E\} = 1 - \Pr\{E\} = 2/3$.

Note that the probability of an event is a number between 0 and 1. If the event cannot occur, its probability is 0. If it must occur (i.e. its occurrence is certain), its probability is 1.

Properties of Probability

Sample Space

The sample space is a set S comprised of all the possible outcomes of the experiment. The elements of a sample space are called elementary outcomes, or simply outcomes. The sample space may be finite or infinite, e.g. $S = \{1, 2, 3, 4, 5, 6\}$ or $S = \mathbb{N}$ (the set of natural numbers). The elements of S must be mutually exclusive and exhaustive, in the sense that once the experiment is carried out, exactly one element of S occurs.

Example: If the experiment consists of a single roll of an ordinary die, the natural sample space is the set $S = \{1, 2, 3, 4, 5, 6\}$, consisting of six elements. The outcome 2 indicates that the result of the roll was 2.

Event Space

An event is any subset of the sample space.

Conditional Probability

If $E_1$ and $E_2$ are two events, the probability that $E_2$ occurs given that $E_1$ has occurred is denoted by $\Pr\{E_2 \mid E_1\}$, and is called the conditional probability of $E_2$ given that $E_1$ has occurred.

If the occurrence or non-occurrence of $E_1$ does not affect the probability of occurrence of $E_2$, then $\Pr\{E_2 \mid E_1\} = \Pr\{E_2\}$ and we say that $E_1$ and $E_2$ are independent events; otherwise, they are dependent events.

If we denote by $E_1E_2$ the event that "both $E_1$ and $E_2$ occur," sometimes called a compound event, then

$\Pr\{E_1E_2\} = \Pr\{E_1\}\,\Pr\{E_2 \mid E_1\}$

In particular, for independent events,

$\Pr\{E_1E_2\} = \Pr\{E_1\}\,\Pr\{E_2\}$

For three events $E_1$, $E_2$, and $E_3$, we have

$\Pr\{E_1E_2E_3\} = \Pr\{E_1\}\,\Pr\{E_2 \mid E_1\}\,\Pr\{E_3 \mid E_1E_2\}$

That is, the probability of occurrence of $E_1$, $E_2$, and $E_3$ is equal to (the probability of $E_1$) × (the probability of $E_2$ given that $E_1$ has occurred) × (the probability of $E_3$ given that both $E_1$ and $E_2$ have occurred).
In particular, $\Pr\{E_1E_2E_3\} = \Pr\{E_1\}\,\Pr\{E_2\}\,\Pr\{E_3\}$ for independent events.

Example: Let $E_1$ and $E_2$ be the events "heads on fifth toss" and "heads on sixth toss" of a coin, respectively. Then $E_1$ and $E_2$ are independent events, and thus the probability of heads on both the fifth and sixth tosses is (assuming the coin to be fair)

$\Pr\{E_1E_2\} = \Pr\{E_1\}\,\Pr\{E_2\} = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

Example 1: If the probability that A will be alive in 20 years is 0.7 and the probability that B will be alive in 20 years is 0.5, then, assuming the two events are independent, the probability that they will both be alive in 20 years is (0.7)(0.5) = 0.35.

Example 2: Suppose that a box contains 3 white balls and 2 black balls. Let $E_1$ be the event "first ball drawn is black" and $E_2$ the event "second ball drawn is black," where the balls are not replaced after being drawn. Here $E_1$ and $E_2$ are dependent events. The probability that the first ball drawn is black is $\Pr\{E_1\} = 2/(3 + 2) = 2/5$. The probability that the second ball drawn is black, given that the first ball drawn was black, is $\Pr\{E_2 \mid E_1\} = 1/(3 + 1) = 1/4$. Thus the probability that both balls drawn are black is

$\Pr\{E_1E_2\} = \Pr\{E_1\}\,\Pr\{E_2 \mid E_1\} = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}$

Mutually Exclusive Events

Two or more events are called mutually exclusive if the occurrence of any one of them excludes the occurrence of the others. Thus if $E_1$ and $E_2$ are mutually exclusive events, then $\Pr\{E_1E_2\} = 0$.

If $E_1 + E_2$ denotes the event that "either $E_1$ or $E_2$ or both occur," then

$\Pr\{E_1 + E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1E_2\}$

In particular, for mutually exclusive events,

$\Pr\{E_1 + E_2\} = \Pr\{E_1\} + \Pr\{E_2\}$

Example: If $E_1$ is the event "drawing an ace from a deck of cards" and $E_2$ is the event "drawing a king," then $\Pr\{E_1\} = 4/52 = 1/13$ and $\Pr\{E_2\} = 4/52 = 1/13$. The probability of drawing either an ace or a king in a single draw is

$\Pr\{E_1 + E_2\} = \Pr\{E_1\} + \Pr\{E_2\} = \frac{1}{13} + \frac{1}{13} = \frac{2}{13}$

since both an ace and a king cannot be drawn in a single draw and are thus mutually exclusive events (shown below).

Example: If $E_1$ is the event "drawing an ace from a deck of cards" and $E_2$ is the event "drawing a spade," then $E_1$ and $E_2$ are not mutually exclusive, since the ace of spades can be drawn (shown below).
Thus the probability of drawing either an ace or a spade or both is

$\Pr\{E_1 + E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1E_2\} = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

Note that the event "$E_1$ and $E_2$", consisting of those outcomes in both events, is the ace of spades.

Probability Distribution

Random Variable: A random variable is a function defined at each point of the sample space. A random variable is also known as a chance variable or stochastic variable.

Discrete

If a variable X can assume a discrete set of values $X_1, X_2, \ldots, X_K$ with respective probabilities $p_1, p_2, \ldots, p_K$, where $p_1 + p_2 + \cdots + p_K = 1$, we say that a discrete probability distribution for X has been defined. The function p(X), which has the respective values $p_1, p_2, \ldots, p_K$ for $X = X_1, X_2, \ldots, X_K$, is called the probability function, or frequency function, of X. Because X can assume certain values with given probabilities, it is often called a discrete random variable.

Example: Let a pair of fair dice be tossed and let X denote the sum of the points obtained. Then the probability distribution is as shown in the table below. For example, the probability of getting sum 5 is 4/36 = 1/9; thus in 900 tosses of the dice we would expect 100 tosses to give the sum 5.

X      2     3     4     5     6     7     8     9     10    11    12
p(X)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Continuous

The above ideas can be extended to the case where the variable X may assume a continuous set of values. The relative-frequency polygon of a sample becomes, in the theoretical or limiting case of a population, a continuous curve (such as shown in the figure below) whose equation is Y = p(X). The total area under this curve bounded by the X axis is equal to 1, and the area under the curve between the lines X = a and X = b (shaded in the figure below) gives the probability that X lies between a and b, which can be denoted by $\Pr\{a < X < b\}$.

We call p(X) a probability density function, or briefly a density function, and when such a function is given we say that a continuous probability distribution for X has been defined.
The variable X is then often called a continuous random variable. As in the discrete case, we can define cumulative probability distributions and the associated distribution functions.

MATHEMATICAL EXPECTATION

If p is the probability that a person will receive a sum of money S, the mathematical expectation (or simply the expectation) is defined as pS. More generally, if X takes the values $X_1, X_2, \ldots, X_K$ with probabilities $p_1, p_2, \ldots, p_K$, the expectation of X is

$E(X) = p_1X_1 + p_2X_2 + \cdots + p_KX_K$

Example: Find E(X) for the distribution of the sum of two fair dice from the earlier example. The distribution is given in the following EXCEL printout, in cells A2:B12, where the p(X) values have been converted to their decimal equivalents. In C2, the expression =A2*B2 is entered and a click-and-drag is performed from C2 to C12. In C13, the expression =SUM(C2:C12) gives the mathematical expectation, which equals 7.

Properties of Mathematical Expectation

Suppose X and Y are two random variables. Then
- E(X + Y) = E(X) + E(Y)
- E(X - Y) = E(X) - E(Y)
- E(XY) = E(X)E(Y), provided X and Y are independent
- E(cX) = cE(X), where c is any constant
- E(c) = c

MOMENTS

The expected values $E(X), E(X^2), E(X^3), \ldots, E(X^r)$ are called moments. The first moment is the mean and measures the center of the distribution.

The mean and variance are functions of moments, and moments are sometimes difficult to find directly. Therefore, special functions, called moment-generating functions, can sometimes make finding the mean and variance of a random variable simpler.
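The expectation example and the definition of moments above can be checked with a short Python script, as an alternative to the spreadsheet steps. This is only an illustrative sketch: the helper names `dice_sum_distribution` and `moment` are hypothetical, and exact fractions are used instead of the decimal equivalents mentioned for the spreadsheet. The variance is obtained from the first two moments as $E(X^2) - [E(X)]^2$.

```python
from fractions import Fraction
from itertools import product


def dice_sum_distribution():
    """Return {sum: probability} for the sum of two fair six-sided dice."""
    dist = {}
    # Each of the 36 ordered outcomes (a, b) is equally likely.
    for a, b in product(range(1, 7), repeat=2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dist


def moment(dist, r):
    """r-th moment E(X^r): the sum of x^r * p(x) over all values x."""
    return sum(x**r * p for x, p in dist.items())


dist = dice_sum_distribution()
mean = moment(dist, 1)            # first moment, E(X)
second = moment(dist, 2)          # second moment, E(X^2)
variance = second - mean**2       # variance as a function of the moments

print("p(5)   =", dist[5])        # probability of getting sum 5
print("E(X)   =", mean)
print("E(X^2) =", second)
print("Var(X) =", variance)
```

The script reproduces the values quoted in the text: the probability of sum 5 is 1/9 and the expectation of the sum is 7.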