Sampling Techniques
• Types of samples
• Different sampling techniques

Sampling Terms
• Population
– The entire group of people of interest from whom the researcher needs to obtain information.
– It depends on the objective of your research.
– It should be identified properly.
• Individual units/elements
• Appropriate inclusion-exclusion criteria should be identified.

Defined target population
• A population can be defined as the set or collection of all people or items with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.
• Note also that the population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is large but not complete overlap between these two groups, due to frame issues etc.
• Sometimes they may be entirely separate. For instance, we might study rats in order to get a better understanding of human health, or we might study records from people born in 2008 in order to make predictions about people born in 2009.
• Define target population – the target population could be much larger than the study population.
• Sampling frame – the complete list of the population units (in the finite population case)
• Sampling units – the elements or units considered for inclusion in the sample

Sampling process
• The sampling process comprises several stages:
– Defining the population of concern
– Specifying a sampling frame, a set of items or events possible to measure
– Specifying a sampling method for selecting items or events from the frame
– Determining the sample size
– Implementing the sampling plan
– Sampling and data collecting
– Reviewing the sampling process

Sampling
• The process of obtaining information from a subset (sample) of a larger group (population)
• The results for the sample are then used to make estimates for the larger group
• Faster and cheaper than asking the entire population
• Two keys:
– Selecting the right people: they have to be selected scientifically so that they are representative of the population
– Selecting the right number of the right people: to minimize sampling errors, i.e. choosing the wrong people by chance

Characteristics of a good sample
• Representative of the population
• Accessible
• Cost effective
• Of the right size
• Obtained with minimum sampling error
• Suitable for analysis as per the study design

Types of samples/Sampling procedures
• Probability sampling:
– Scientific approach to selecting a representative part of the population.
– Every possible sample has a probability of selection, which could be equal or unequal but is predetermined.
– Inclusion probabilities of sampling units are defined.
– Prejudiced/biased selection of units is avoided.
• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.
• When every element in the population has the same probability of selection, this is known as an 'equal probability of selection' (EPS) design.
• Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.
• Probability sampling techniques:
– Simple random sampling
– Systematic sampling
– Stratified sampling
– Cluster sampling
– Multistage and multi-phase sampling
Note: Prepare an assignment covering the details of each technique, when it is used, examples, and a relative comparison of the different techniques (to submit on 15/09).

Simple random sampling: illustration
• Select a random sample of 15 students from a class of 100 students, using a random number table.
• A university is testing the effectiveness of two different medications. They have 20 volunteers. To conduct the study, researchers randomly assign the number 1 or 2 to each volunteer. Volunteers who are assigned number 1 get Treatment 1 and volunteers who are assigned number 2 get Treatment 2. (random number)

Self study exercise
• Non-probability sampling
• When do you use it?
• What are the drawbacks?
• Can we use such sample data for detailed statistical analysis?
• Explain the different non-probability sampling techniques.

For future discussion
• Sample size determination

Data and types of data
• Data are facts/information collected together in raw or unorganized form, usually as numbers, that refer to or represent the observations on characteristics of interest/study.
• Data is plural; datum is singular.
• Examples:
– BP of patients: 90, 110, 110, 140, 120, 110, 90, 130, …
– Marks: 87, 45, 35, 65, 68, 58, 30, …
– Height (in cm): 167, 158, 148, 152, 160, 145, …
– Eye color: black, blue, black, grey, blue, black, …
– Income status: high, low, low, middle, low, middle, high
– Pain: mild, mild, severe, mild, moderate, moderate, severe

Types of data
• We get data by making observations/measurements on 'characteristics' of interest.
• Can you identify the various characteristics of interest in your research study?
• In Statistics, all such characteristics are called 'variables'.
• Ex: height, weight, BMI, eye colour, level of pain, duration of recovery, concentration of a chemical, …
[Diagram: a variable is either quantitative (discrete or continuous) or qualitative.]

Scales of measurement
• Measurement scales are used to categorize and/or quantify variables.
• Four scales of measurement are commonly used in statistical analysis: nominal, ordinal, interval, and ratio scales.
• Each scale of measurement satisfies one or more of the following properties of measurement:
– Identity. Each value on the measurement scale has a unique meaning.
– Magnitude. Values on the measurement scale have an ordered relationship to one another. That is, some values are larger and some are smaller.
– Equal intervals. Scale units along the scale are equal to one another. This means, for example, that the difference between 1 and 2 would be equal to the difference between 19 and 20.
– A minimum value of zero. The scale has a true zero point, below which no values exist.

Nominal Scale of Measurement
• The nominal scale of measurement only satisfies the identity property of measurement. Values assigned to variables represent a descriptive category, but have no inherent numerical value with respect to magnitude.
• Gender is an example of a variable that is measured on a nominal scale. Individuals may be classified as "male" or "female", but neither value represents more or less "gender" than the other.
• Religion, political affiliation, marital status, and eye color are other examples of variables that are normally measured on a nominal scale.

Ordinal Scale of Measurement
• The ordinal scale has the properties of both identity and magnitude. Each value on the ordinal scale has a unique meaning, and it has an ordered relationship to every other value on the scale.
• Examples of ordinal scales: stage of a disease, severity of pain, level of satisfaction.
• We call such variables 'categorical variables'.

Interval Scale of Measurement
• The interval scale of measurement has the properties of identity, magnitude, and equal intervals.
• A perfect example of an interval scale is the Fahrenheit scale to measure temperature. The scale is made up of equal temperature units, so that the difference between 40 and 50 degrees Fahrenheit is equal to the difference between 50 and 60 degrees Fahrenheit.
• With an interval scale, you know not only whether different values are bigger or smaller, you also know how much bigger or smaller they are. For example, suppose it is 60 degrees Fahrenheit on Monday and 70 degrees on Tuesday. You know not only that it was hotter on Tuesday, you also know that it was 10 degrees hotter.

Ratio Scale of Measurement
• The ratio scale of measurement satisfies all four of the properties of measurement: identity, magnitude, equal intervals, and a minimum value of zero.
• The weight of an object would be an example of a ratio scale. Each value on the weight scale has a unique meaning, weights can be rank ordered, units along the weight scale are equal to one another, and the scale has a minimum value of zero.
• Weight scales have a minimum value of zero because objects at rest can be weightless, but they cannot have negative weight.

Organisation of data
• Data may be available in raw/unorganized form and need to be arranged in a systematic way. This task is called organisation of data.
• It may involve preprocessing/cleaning.
• Coding the variables, if required
• Preparing metadata (data on data)
• Preparation of tables/cross tabulation

How to prepare tables?
• Simple tables:
– Just one variable (qualitative/quantitative)
– Listing the values with variable description
– Preparing frequency distributions
– Cross tabulation (bivariate data)
– Stem and leaf plots (read and work out!)

Preparing frequency distributions
• Frequency distribution of a categorical/discrete variable
• It is a table of frequencies of the different values of the categorical/discrete variable.
• Ex: The data below present the level of pain (coded) experienced by patients. The codes are: mild = 0, moderate = 1, severe = 2.
• 1,2,0,0,2,0,1,2,2,0,2,1,1,0,0,2,1,0,2,0,0,1,2,0,1,2,0,2,1,0,2,1,0,2,1,1,2,1,0,2,2,1,1,2,0,0,0,0,1,0

Distribution of pain levels experienced by patients
Pain level    No. of patients
0             19 (38%)
1             15 (30%)
2             16 (32%)
Total         50
0 = mild, 1 = moderate, 2 = severe

Remember this…
• The table should be neatly drawn.
• It should have a title, table no., and row and/or column headings. Totals of columns/rows should be shown (depending on the problem).
• A footnote can be added to give details of codes and any other special features noted.
• Usually when we prepare a frequency distribution for categorical data, we show the % values along with the frequencies.
• Suppose we have to consider the gender of the patient along with pain level; then we cross-tabulate gender v/s pain level. How will you do this?

Frequency distribution of continuous data
• The following formulae can be used to decide the no. of class intervals (bins).
• Determine the range of the sample data: $R = \text{max} - \text{min}$.
• Square root formula: $k = \sqrt{n}$, where $n$ is the number of observations and $k$ is the no. of class intervals.
• $k = R/h$, where $R$ is the range and $h$ is the suggested bin width. $k$ is approximated to the nearest integer (e.g., for $R = 43$, $h = 5$: $k = 8.6$, approximated to 9).
• Sturges' formula: $k = 1 + \log_2 n$ or, equivalently, $k = 1 + 3.322 \log_{10} n$. This formula works well for $n > 30$. For $n \le 30$, it fails to reflect any trend.
• It is also poor if the data are non-normal.
• Ex: If $n = 100$, then $k = 1 + 3.322 \times 2 = 7.644 \approx 8$.
• After finding the number of bins, we determine the class width (bin width) using the formula $w = R/k$, where $w$ is the bin width and $R$ is the range. Usually $w$ is adjusted to a convenient round figure.

Example
• In a data set the min. value is 18.7 and the max. value is 68.8. If there are 180 observations, determine the number of classes and the class intervals using the different formulae.
• Using the square root formula, $k = \sqrt{180} = 13.42 \approx 13$. Class width $w = R/k = (68.8 - 18.7)/13 = 50.1/13 = 3.85 \approx 4$. Hence the class intervals are 18-22, 22-26, 26-30, 30-34, 34-38, 38-42, 42-46, 46-50, 50-54, 54-58, 58-62, 62-66, 66-70.
• Sturges' formula: $k = 1 + 3.322 \times \log_{10} 180 = 1 + 7.49 = 8.49 \approx 8$, and $w = 50.1/8 = 6.26$, taken as 7 so that eight classes cover the full range. The classes are: 18-25, 25-32, 32-39, 39-46, 46-53, 53-60, 60-67, 67-74.
• You can also consider inclusive classes: 18-25, 26-33, 34-41, 42-49, 50-57, 58-65, 66-73.
• Take bin width $w = 6$: then $k = R/w = 50.1/6 = 8.35$, rounded up to 9 so that the classes cover the full range. Classes are: 18-24, 24-30, 30-36, 36-42, 42-48, 48-54, 54-60, 60-66, 66-72.
• Know these concepts: inclusive and exclusive classes, class limits, class boundaries, frequency, cumulative frequency, relative frequency.

Points to remember
• A thumb rule for deciding the no. of class intervals is to consider not less than six classes and not more than 15 classes. With fewer than six classes there will be too much summarisation, and more than 15 classes would mean not enough summarisation.
• The number of class intervals (k) given by the different formulae need not be taken as final but only as a guidance value. The actual no. of class intervals may be taken around that guidance value.
• When it is appropriate, we can select classes with class width 5 or 10 and use class limits beginning and ending with multiples of 5 (or multiples of 10), e.g. 0-5, 5-10, 10-15, etc.

Home work (for presentation and discussion)
• Select a data set, preferably related to a biological or medical example, and construct frequency tables using the different formulae.
• Obtain relative and cumulative frequencies.
• Prepare a brief report on the major observations you can highlight.
• NOTE: Do not select a common data set.

Diagrams/charts
• Bar charts/diagrams
– Simple, multiple, component, percentage
• Pie chart
• Home work: prepare detailed notes on the above topics based on the discussions held in the class. Coverage: need for different types of charts, context of use, interpretation, do's and don'ts, etc. Learn how to use EXCEL to draw the charts/diagrams.

Histogram
• Used to summarize continuous data
• Visualisation of a frequency distribution
• Vertical bars representing frequencies are drawn over class intervals.
• Usually equal-width classes are considered.
• Histograms are useful to understand the symmetry or asymmetry of data.
• Prepare detailed notes and work out examples.

Histogram: examples
• Interactive histogram

Examples
• People were asked to state the number of hours they exercise in a seven-day period. The results of the survey are listed below. Make a frequency table and histogram to display the data.
• 8, 2, 4, 7.5, 10, 11, 5, 6, 8, 12, 11, 9, 6.5, 10.5, 13, 8.5, 3, 4.5, 6, 5.5, 7.5, 8, 10, 11, 10.5, 4.5, 8, 7, 5, 6.5, 4.5, 6.5, 7.5, 8, 10, 13, 9.5, 3.5, 4.5, 5, 5.5, 6.5, 7.5, 8, 10

Stem and leaf plot
• Bears a strong resemblance to a histogram and serves the same purpose
• Used to show the distributional structure of quantitative data
• It preserves the information contained in individual observations.
• It can be constructed along with the tallying process when a frequency distribution is constructed.

Method of construction
• Each measurement (observation) is partitioned into two parts: a stem and a leaf.
• Arrange the observations horizontally in increasing order against a stem value. All such values are the leaves. Do this for all the stem values. Such an arrangement looks like a histogram with horizontal bars of numbers.
• Learn the construction using an example.
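
Worked sketch (not part of the original slides)
The exercise-hours example and the stem and leaf construction above can be tried out with a short Python sketch. It is one possible illustration, not a prescribed method. The function names (sqrt_classes, sturges_classes, frequency_table, stem_and_leaf) are hypothetical. The choices to start the first class at the minimum value, to round the class width up to a whole number, and to use the whole-number part as the stem and the tenths digit as the leaf are all assumptions made for this illustration.

```python
import math

# Hours of exercise in a seven-day period, copied from the survey example (n = 45).
hours = [8, 2, 4, 7.5, 10, 11, 5, 6, 8, 12, 11, 9, 6.5, 10.5, 13, 8.5, 3, 4.5,
         6, 5.5, 7.5, 8, 10, 11, 10.5, 4.5, 8, 7, 5, 6.5, 4.5, 6.5, 7.5, 8, 10,
         13, 9.5, 3.5, 4.5, 5, 5.5, 6.5, 7.5, 8, 10]


def sqrt_classes(n):
    """Square root formula: k = sqrt(n), rounded to the nearest integer."""
    return round(math.sqrt(n))


def sturges_classes(n):
    """Sturges' formula: k = 1 + 3.322 * log10(n), rounded to the nearest integer."""
    return round(1 + 3.322 * math.log10(n))


def frequency_table(data, start, width, k):
    """Tally data into k exclusive classes [lower, upper) of equal width.

    Returns rows of (lower, upper, frequency, relative frequency, cumulative frequency).
    """
    counts = [0] * k
    for x in data:
        # Assumption: a value landing exactly on the last upper limit goes in the last class.
        index = min(int((x - start) // width), k - 1)
        counts[index] += 1
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq
        lower = start + i * width
        rows.append((lower, lower + width, freq, freq / len(data), cumulative))
    return rows


def stem_and_leaf(data):
    """Assumed split: stem = whole-number part, leaf = tenths digit."""
    plot = {}
    for x in sorted(data):
        stem = int(x)
        leaf = round((x - stem) * 10)
        plot.setdefault(stem, []).append(leaf)
    return plot


n = len(hours)
k = sturges_classes(n)                     # guidance value for the number of classes
data_range = max(hours) - min(hours)       # R = max - min
width = math.ceil(data_range / k)          # w = R / k, rounded up to a whole number
table = frequency_table(hours, start=min(hours), width=width, k=k)

print(f"n = {n}, square root k = {sqrt_classes(n)}, Sturges k = {k}, width = {width}")
print("Class     Freq  Rel.freq  Cum.freq")
for lower, upper, freq, rel, cum in table:
    print(f"{lower:g}-{upper:g}".ljust(10) + f"{freq:4d}  {rel:8.2f}  {cum:8d}")

print("Stem | Leaves")
for stem, leaves in stem_and_leaf(hours).items():
    print(f"{stem:4d} | " + " ".join(str(leaf) for leaf in leaves))
```

The two formulae give different guidance values for k here, which matches the point made earlier that k is only a guide. Changing the width or the starting limit and re-running the sketch shows how the shape of the table changes.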