DATA TYPES AND QUANTITATIVE DATA ANALYSIS
PRESENTED TO THIRD-TRIMESTER YEAR 1

DATA
- Information expressed qualitatively or quantitatively
- Data are measurements of characteristics. Measurements are functions that assign values in qualitative or quantitative form
- Characteristics are referred to as variables, e.g. height, weight, sex, tribe

VARIABLES AND DATA TYPES
- A variable is a characterization of an event
- Classification of variables:
  - Qualitative: usually categorical; values (members) fall into one of a set of mutually exclusive and collectively exhaustive classes, e.g. sex, crop variety, animal breed, source of water, type of house
  - Quantitative: numeric values possessing an inherent order
    - Discrete, e.g. number of children, farmers or animals
    - Continuous, e.g. height, weight, distance
- Random and fixed data types
- Scales of measurement: nominal, ordinal, interval, ratio

Levels of measurement are distinguished on the basis of the following criteria: magnitude or size; direction; distance or interval; origin; equality of points; ratios of intervals; ratio of points.

NOMINAL DATA
- Example: sex (gender) coded M, F or 0, 1
- The 'numbers' simply identify, classify, categorize or distinguish; the score has no size or magnitude
- Scores have equality: two subjects are similar (equal) if they have the same number
- Weakest (poorest) level of measurement
- Arithmetic operations CANNOT be performed on nominal data

ORDINAL DATA
- Associated with qualitative random variables
- Generated from ranked responses (or from a counting process)
- Have the properties of nominal data, in addition to DIRECTION
- Numeric or non-numeric
- Next to nominal in terms of weakness
- Arithmetic operations must be avoided
- Examples: knowledge (low, average, high), socioeconomic status, attitude, opinion (like, dislike, strongly dislike)
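
The practical difference between the nominal and ordinal scales shows up in which summaries make sense. The Python sketch below summarises nominal codes only by counting them (frequencies and the mode) and ordinal responses by ranking them and taking the median category. The response lists and the `KNOWLEDGE_ORDER` ranking are hypothetical, and pairing the mode with nominal data and the median with ordinal data is the usual textbook convention rather than something stated on these slides.

```python
from collections import Counter
from statistics import median_low

# Hypothetical survey responses (illustrative only, not from the slides).
sex = ["M", "F", "F", "M", "F"]                            # nominal
knowledge = ["low", "high", "average", "average", "low"]   # ordinal

# Nominal: codes only identify categories, so counting (and the mode) is the summary.
sex_counts = Counter(sex)
sex_mode = sex_counts.most_common(1)[0][0]

# Ordinal: categories have a direction, so they can be ranked.
KNOWLEDGE_ORDER = {"low": 1, "average": 2, "high": 3}      # hypothetical ranking
ranks = sorted(KNOWLEDGE_ORDER[k] for k in knowledge)

# median_low always returns an actual rank (never an average of two ranks),
# so no arithmetic is done on the ordinal codes themselves.
median_rank = median_low(ranks)
rank_to_label = {v: k for k, v in KNOWLEDGE_ORDER.items()}
knowledge_median = rank_to_label[median_rank]

print(sex_counts, sex_mode, knowledge_median)
```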

INTERVAL and RATIO
INTERVAL
- Numeric; have magnitude or size, direction, distance or interval, and origin
- The interval scale has no absolute zero: its zero point depends on the system of measurement (0°C is not the same temperature as 0°F)
- E.g. temperature in degrees Fahrenheit or Celsius

RATIO
- E.g. weight of cassava in kilograms or pounds
- Numeric; have magnitude or size, direction, distance or interval, and origin
- An absolute origin exists and does not depend on the system of measurement
- All arithmetic operations can be performed on such data

DATA COLLECTION PROCESSES
Processes include (not mutually exclusive):
- Routine records
- Survey data
- Experimental data

ROUTINE (MONITORING) DATA
- Data periodically recorded, essentially for the administrative use of the establishment and for studying trends or patterns
- Examples: medical records, meteorological data
- Some statistical analysis of the data is possible, for description and prescription
- The data are cheap, but planning of their collection can be haphazard

EXPERIMENTAL DATA
- Treatments are the investigated factors of variation
- Treatments are controlled by the designer
- Treatment levels may be fixed, random, qualitative or quantitative
- Comparative experimental data require inductive analysis
- Emphasis is on inference, including estimation of effects and tests of hypotheses

SURVEY DATA COLLECTION
- Information on characteristics, opinions, attitudes, tendencies, activities or operations of the individual units of the population
- Based on a small subset of the population
- Can be planned; random surveys are preferred
- The researcher or investigator has no control (or must not exercise control) over the respondent or the data

Which procedure to use?
- Depends on the study objectives
- All three procedures are possible while in the community
- Monitoring and survey procedures will be used most during the first year

We discuss SURVEY further.

SAMPLING (SURVEY) METHODS
- Ensure that units of the population have the same chance of being in the sample
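
The equal-chance requirement can be checked on a small population by listing every possible sample and counting how often each unit appears. This Python sketch assumes a population of N = 5 units and samples of size n = 3 (the same sizes as in the simple random sampling illustration) and treats every sample as equally likely; under that assumption each unit appears in 6 of the 10 possible samples, so its chance of inclusion is n/N = 0.6.

```python
from collections import Counter
from itertools import combinations
from math import comb

N, n = 5, 3                     # small illustrative population and sample size
units = range(1, N + 1)         # units labelled 1..N

# Every possible sample of size n (order does not matter, no repeats).
samples = list(combinations(units, n))
assert len(samples) == comb(N, n)   # 10 possible samples

# If each sample is equally likely, a unit's inclusion probability is the
# share of samples that contain it.
appearances = Counter(u for s in samples for u in s)
inclusion_prob = {u: appearances[u] / len(samples) for u in units}

print(len(samples), inclusion_prob)   # every unit: 6/10 = 0.6 = n/N
```

Listing all samples is only practical for tiny populations; in the field one sample is simply drawn at random.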

Sampling Types
- Probability sampling: the selection of sampling units follows a probability (random and non-random) scheme
- Non-probability sampling: the selection of samples is not made objectively but is influenced a great deal by the sampler, e.g. haphazard selection and the use of volunteers
- Probability sampling is preferred, but the situation may determine otherwise

SYSTEMATIC SAMPLING
Procedure
- Sampling units are selected according to a predetermined pattern. For instance, a sampling intensity of 10% in a population of 100 numbered trees or units (strips, etc.) might require observing 1 in every 10 trees (units, strips) in an ordered sequence

Selection in Systematic Procedure
- E.g. if, by some process (random or non-random), the 3rd tree (unit or strip) is selected first, then the 13th, 23rd, 33rd, 43rd, ..., 93rd trees (units, strips) will accordingly be selected
- Strictly, this type of selection, as illustrated with the population of 100 trees (units), yields only one sample
- Improve it by selecting the 1st unit randomly from 1 to 10 (or 1 to 100), and by using MULTIPLE random starts

Applications of Systematic Sampling
- Population is unknown
- Baseline studies on spatial distribution patterns of the population
- Baseline studies on the extent/distribution of pests, pathogens, etc.
- Mapping purposes
- Regeneration studies

Advantages of Systematic Sampling
- Easy to set up
- Relative speed in data collection
- Total coverage of the population is assured
- Good base for future designs, as the position of characters can easily be mapped (with known coordinates)
- Demarcation of units is not necessary, as the sampling units are defined by the first unit
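
A minimal Python sketch of the systematic selection rule, assuming units are numbered 1 to N and the sampling interval k is a whole number (k = 10 for 10% of 100 trees). The function names are hypothetical. For multiple random starts it assumes the common practice of widening the interval to m*k, so that m separate samples together keep roughly the original sampling intensity; the slides do not specify this detail.

```python
import random


def systematic_sample(N, k, start=None, rng=random):
    """Return unit numbers start, start + k, start + 2k, ... not exceeding N.

    Units are numbered 1..N. If no start is given, the first unit is
    drawn at random from 1..k.
    """
    if start is None:
        start = rng.randint(1, k)      # random start within the first interval
    return list(range(start, N + 1, k))


def multiple_random_starts(N, k, m, rng=random):
    """Draw m systematic samples with distinct random starts.

    Assumption: the interval is widened to m * k so that the m samples
    together keep roughly the original sampling intensity of 1/k.
    """
    interval = m * k
    starts = rng.sample(range(1, interval + 1), m)   # m different starts
    return [systematic_sample(N, interval, start=s) for s in starts]


# Slide example: 100 numbered trees, 10% intensity (k = 10), first tree = 3rd.
print(systematic_sample(100, 10, start=3))   # trees 3, 13, 23, ..., 93

# Two random starts with interval 20: two separate samples of 5 trees each.
rng = random.Random(2007)                    # seeded so the run is repeatable
for sample in multiple_random_starts(100, 10, m=2, rng=rng):
    print(sample)
```

Each replicate sample yields its own mean, so the variation between replicate means gives an estimate of sampling error that a single start cannot provide.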

Disadvantages of Systematic Sampling
- With only one random observation (a single random start), the estimate of sampling error is not valid
- Unknown trend(s) in the population can influence results adversely (examples: topography, season, the sampling interval)

Avoiding the disadvantages
- The first major disadvantage, on sampling error, can be rectified by introducing several (multiple) random starts through stratification of the population
- The second problem, trend, is more difficult, but it relates simply to the choice of the sampling interval

Simple/Unrestricted Random Sampling
- Unlike systematic sampling, sampling units need not be equally spaced
- We define it as the sampling procedure that ensures equal probability for all samples of the same size (without any restriction imposed on the selection process)

Illustration of SRS
- Given a population of size N from which a sample of size n will be drawn, the number of possible samples is NCn = N!/{(N-n)! n!}
- Suppose a population is known to have 5 units and a sample of size 3 is required. From these 5 units there are 10 possible ways of obtaining a sample of size 3 [5C3 = 5!/{(5-3)! 3!} = 10]. Each of these combinations is unique and has the same chance (1/10) of being selected. Thus SRS is a random sampling procedure in which each sample of size n has the same probability of selection

SRS selection process
(i) Randomly select one 'sample combination' from the numbers 1 to 10 (as there are 10 possible combinations).
(ii) Use a table of random numbers to select 3 numbers from 1 to 5, or draw three numbers from a 'hat' containing all five numbers. This option seems easier and more practicable than (i).

Summary - SRS
Application: applied when the population is known to be homogeneous. The procedure is suitable for units defined by plot sizes.
Advantage: easy to apply, though not as easy as the systematic procedure.
Disadvantage: requires knowledge of all the units in the population (construction of the sampling frame is necessary).

STRATIFIED RANDOM SAMPLING
- Requires dividing the population into non-overlapping homogeneous units, which we call STRATA
- SRS is then applied to each stratum, hence stratified random sampling (STRS)
- Examples of strata types or criteria: age of plantation, species type, aspect, topography/altitude, farm type, habitat
- Dividing the population into such homogeneous units usually leads to better estimates of the desired population parameters

Where/when to apply Stratified RS
- Very suitable for heterogeneous areas (or units) that can be identified and classified into homogeneous entities
- Supplementary information, e.g. remote sensing and aerial photographs, is useful for stratification
- The choice of strata should ensure that the variation between units within strata is less than the variation between strata

Advantages/Disadvantages of STRS
Advantages
- Estimates are more precise
- Separate estimates and inferences for strata are possible
Disadvantages
- Sample size depends on the type of allocation to be used
- Sampling is likely to be more efficient in some strata than in others
- Errors in strata classification affect the overall estimate
- A frame must be constructed for each stratum

Allocation of units (n) to strata
- Equal allocation: the same number of units is collected from each stratum
- Proportional allocation: the number of units per stratum is proportional to the size of the stratum

ANALYSING QUALITATIVE DATA
- Qualitative data are essentially labels of a categorical variable
- Statistical analyses involve totals, percentages and conversion to pie charts and bar charts (bar graphs)
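
The totals, percentages and pie-chart angles for a categorical variable can be tabulated with a few lines of Python. This is a sketch: the function name `frequency_table` is hypothetical, and the input reproduces the house-type counts used in these slides (A = 36, B = 10, C = 4). Each angle is the category's share of the total multiplied by 360 degrees.

```python
from collections import Counter


def frequency_table(labels):
    """Totals, percentages and pie-chart angles for a categorical variable."""
    counts = Counter(labels)
    total = sum(counts.values())
    table = []
    for category in sorted(counts):
        freq = counts[category]
        share = freq / total
        # percentage of all observations, and the slice angle out of 360 degrees
        table.append((category, freq, 100 * share, 360 * share))
    return table


# House types A, B, C with the frequencies 36, 10 and 4.
houses = ["A"] * 36 + ["B"] * 10 + ["C"] * 4
for category, freq, pct, degrees in frequency_table(houses):
    print(f"{category}: {freq:3d}  {pct:5.1f}%  {degrees:6.1f} deg")
```

Computing the angles rather than rounding them by hand keeps them consistent with the percentages (259.2, 72.0 and 28.8 degrees, adding up to 360).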

More sophisticated analyses include categorical modelling.

EXAMPLE
[Bar chart: "Chart of A, B, C", frequencies of house types 1, 2, 3]

House type   Frequency   Percent   Degrees (of 360)
A = 1        36          72%       259.2
B = 2        10          20%        72.0
C = 3         4           8%        28.8
Total        50         100%       360.0

You can have multiple bar graphs, i.e. more than one variable illustrated on one bar chart. An example is given below:
[Multiple bar chart: frequencies for Male and Female in groups 1, 2, 3]

Contingency Table
This involves count summaries for 2 or more categories placed in row-column format. Example of a 2 by 3 contingency table (gender by group):

Group   Male   Female
A        36     34
B        10     28
C         4      2

Assess the association between gender and group.

ANALYSING QUANTITATIVE DATA
- Basic analyses involve determining the CENTRE and SPREAD of the data
- Inferential analyses may be probability or non-probability based

Measuring Centre
Statistics include:
- MODE (the most frequently occurring observation)
- MEDIAN (the observation lying at the centre of the ordered data); best for INCOME data
- MEAN (a sufficient, consistent, unbiased statistic, utilising ALL observations)

EXAMPLE
Consider that we RANDOMLY selected 10 houses out of 50 and observed the number of school-aged children who do not go to school as follows:
1 2 4 4 1 1 6 0 5 2
Find the MEDIAN, MODE and MEAN.

- MODE: 1, as it appears most often (three times); the most common number of school-aged children not in school per household is 1
- MEDIAN: after ordering the data (0 1 1 1 2 2 4 4 5 6), the centremost observation lies between the 5th and 6th values, i.e. between 2 and 2, so the median = (2 + 2)/2 = 2. Interpretation: at least 50% of the sampled households have up to 2 children of school-going age not in school
- MEAN: we use the arithmetic mean = sum of data divided by the number of observations = (0+1+1+1+2+2+4+4+5+6)/10 = 26/10 = 2.6

Measuring Spread
Statistics include:
- MINIMUM, MAXIMUM (i.e. the EXTREME data)
- RANGE (a single statistic calculated as the MAXIMUM minus the MINIMUM value)
- MEAN ABSOLUTE DEVIATION (the mean of the absolute deviations from the mean)
- STANDARD DEVIATION (SD; use the divisor n-1, not n as on most calculators)
- STANDARD ERROR (SE)

EXAMPLE
Consider again the 10 houses selected RANDOMLY out of 50, with the number of school-aged children who do not go to school: 1 2 4 4 1 1 6 0 5 2. Find the STANDARD DEVIATION, STANDARD ERROR and CONFIDENCE LIMITS.

CALCULATING SPREAD: STANDARD DEVIATION
(mean X̄ = 2.6, n = 10)

X       Deviation (X - X̄)   Squared deviation
1       -1.6                  2.56
1       -1.6                  2.56
1       -1.6                  2.56
0       -2.6                  6.76
2       -0.6                  0.36
2       -0.6                  0.36
4        1.4                  1.96
4        1.4                  1.96
5        2.4                  5.76
6        3.4                 11.56
Total    0.0                 36.4

SD = sqrt[ Σ(Xi - X̄)² / (n - 1) ] = sqrt(36.4/9) = 2.01
or, equivalently, SD = sqrt[ (ΣXi² - (ΣXi)²/n) / (n - 1) ] = sqrt[ (104 - 26²/10) / 9 ] = sqrt(36.4/9) = 2.01

Approximate SD: ASD = Range/4 = (6 - 0)/4 = 1.5 (valid if the sample is large and the distribution is normal)

Sampling fraction (f) and Finite Population Correction factor (fpc)
- Sampling fraction f = n/N = 10/50 = 0.20 (the proportion of the population that is sampled, i.e. observed)
- If f < 0.05, the fpc is ignored. In our case f > 0.05 (indeed f = 0.20), so the fpc must be calculated and used for the sampling error computation: fpc = (N - n)/N = 1 - n/N = 1 - 0.20 = 0.80

CALCULATING SPREAD: STANDARD ERROR
SE = [SD/sqrt(n)] × sqrt(fpc) = [SD/sqrt(n)] × sqrt(1 - n/N) = (2.01/sqrt(10)) × sqrt(0.80) = 0.57

Confidence (Fiducial) Limits
- Given a 5% level of significance, a 95% confidence interval for the mean number of non-school-going children per household is obtained by multiplying the SE by 1.96:
  P(2.6 - 1.96 × 0.57 < true mean < 2.6 + 1.96 × 0.57) = 1 - 0.05 = 0.95
  P(1.5 < true mean number per household < 3.7) = 0.95
- Interpretation: we are 95% confident that the true mean number of school-age children per household who are at home (not in school) lies between 1.5 and 3.7 (roughly 1 to 4 children)
- OR, after multiplying by the total of 50 households: 75 to 185 school-aged children in the community are not in school

Combining Spread and Centre
[Figures: box plot; histogram]

Further Analysis of Quantitative Data
- Histograms give an idea of the distribution of the data; they are very useful for quantitative data
- An excellent alternative to the histogram is the stem-and-leaf diagram
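
The hand calculations for the ten sampled households can be checked with Python's standard `statistics` module. The sketch reproduces the mode, median, mean, SD (divisor n - 1), the range/4 approximation, the fpc-adjusted standard error and the 1.96 × SE limits; the variable names are illustrative, and the 0.05 cut-off for applying the fpc follows the rule stated on the slides.

```python
import math
from statistics import mean, median, multimode, stdev

# Children of school age not in school, for 10 houses sampled at random from N = 50.
x = [1, 2, 4, 4, 1, 1, 6, 0, 5, 2]
N = 50
n = len(x)

# Measures of centre
mode_values = multimode(x)        # list of most frequent value(s)
med = median(x)                   # mean of the 5th and 6th ordered values (n is even)
xbar = mean(x)

# Measures of spread
sd = stdev(x)                     # sample SD with divisor n - 1
asd = (max(x) - min(x)) / 4       # rough approximation: range / 4

# Standard error with the finite population correction (fpc)
f = n / N                         # sampling fraction
fpc = 1 - f                       # (N - n) / N
se = sd / math.sqrt(n)
if f >= 0.05:                     # slides: ignore the fpc when f < 0.05
    se *= math.sqrt(fpc)

# 95% confidence limits using the normal value 1.96
z = 1.96
lower, upper = xbar - z * se, xbar + z * se

print(f"mode = {mode_values}, median = {med}, mean = {xbar}")
print(f"SD = {sd:.2f}, ASD = {asd:.2f}, f = {f:.2f}, fpc = {fpc:.2f}, SE = {se:.2f}")
print(f"95% limits per household: {lower:.1f} to {upper:.1f}")
print(f"scaled to {N} households: {N * lower:.0f} to {N * upper:.0f}")
```

Without intermediate rounding this gives SD = 2.01, SE = 0.57 and limits of 1.5 to 3.7 per household, matching the hand calculation. Scaling the unrounded limits by 50 gives about 74 to 186 children, slightly wider than the 75 to 185 obtained from the rounded limits 1.5 and 3.7.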

Further analyses include measures of association: correlation analysis, and dependence (cause-effect) relations (regression procedures).

2006/2007

DATA ANALYSIS IS ENDLESS!!!
ENJOY YOUR TIME DURING TTFPP
END

KS Nokoe, PT Birteeb, IK Addai, M Agbolosu, L Kyei,
