Survey
Another Information-Gathering Technique & Introduction to Quantitative Data Analysis
Neuman and Robson Chapter 11.
Research Data Library at SFU: http://www.sfu.ca/rdl/

Quiz 2 Coverage
• New material from the lectures and from the following chapters
  – 7 (Sampling), 8 (Surveys), 10 (Nonreactive Measures & Existing Statistics) and the beginning of Chapter 11 (univariate statistics)
• The quiz may also include material covered in the first quiz, especially:
  – Standardization & rates
  – Scales & indices
  – Validity & reliability
  – Levels of measurement
  – The notions of exhaustive & mutually exclusive categories

Types of Equivalence for Comparative Research Using Existing Statistics
• lexicon equivalence (technique of back translation)
• contextual equivalence (ex. role of religious leaders in different societies)
• conceptual equivalence (ex. income)
• measurement equivalence (ex. different measure for same context)

Ethical Issues in Comparative Research
• ethical issues sometimes very important
  – ex. impact of demographic research on funding of developing countries, controversy surrounding studies of the origins of AIDS
• sensitivity, privacy, etc. sometimes still issues even if “subjects” are dead

Quantitative Data
• Types of Statistics
  – Descriptive
  – Inferential
• Common Ways of Presenting Statistics
  – Tables
  – Charts
  – Graphs

Data Preparation
• Recall: coding issues with the War & Peace Journalism codes last day
• Entering data into a spreadsheet or data processing software
• Cleaning data

Recall: Coding Principles
• categories
  – exhaustive
  – mutually exclusive
• consistent for all cases
• comparable with other studies

Ways of Developing Coding Categories
• pre-defined coding schemes
  – e.g. close-ended questions
  – Ex.
Coding Missing Values (conventions not always used)
    • not applicable = 77
    • don’t know = 88
    • no response = 99
• post-collection analysis

More Examples of Coding Process
• Sheet for one television commercial
• Excel spreadsheet showing entered codes
• SPSS example

Data entry conventions

Discrete & Continuous Variables
• Continuous
  – Variable can take an infinite (or large) number of values within a range
    • Ex. age measured by exact date of birth
• Discrete
  – Attributes of the variable are distinct but not necessarily continuous
    • Ex. age measured by age groups
(Note: techniques exist for making assumptions about discrete variables in order to use techniques developed for continuous variables.)

Cleaning Data
• checking accuracy & removing errors
  – Possible code cleaning
    • check for impossible codes (errors)
  – Some software checks at data entry
  – Examine distributions to look for impossible codes
  – Contingency cleaning
    • inconsistencies between answers (impossible logical combinations, illogical responses to skip or contingency questions)

Descriptive Statistics (some topics for next few weeks)
• Univariate (one variable)
  – Frequency distributions
  – Graphs & charts
  – Measures of central tendency
  – Measures of dispersion
• Bivariate (two variables)
  – Crosstabulations
  – Scattergrams & other types of graphs
  – Measures of association
• Multivariate (more than two variables)
  – Statistical control
  – Partials
  – Elaboration paradigm

Frequency Distribution (Univariate)
Table 5-1 Alienation of Workers
--------------------------------------------------------
Level of Alienation        Frequency
--------------------------------------------------------
High                          20
Medium                        67
Low                           13
(Sub Total)                  100    (N=150)
No Response                   60
(Total)                             (N=210)

Simple Univariate Frequency Distributions and Percentages
• univariate = one variable
• “raw count” (frequencies, percentages)

Conventions in table design
• total number of cases (N=)
• grouping cases
  – pro: see patterns
  – con: lose
information

Graph of Frequency Distribution (Univariate)
[Figure: graph of a frequency distribution]

Another visual representation of a distribution: pie charts

Critically Analyzing Data on Frequency Distributions: Collapsing Categories and Treatment of Missing Data
• Consider raw data (numbers), not just percentages
• Examine data preparation
  – Treatment of missing cases?
  – Collapsing categories?
Johnson, A. G. (1977). Social Statistics Without Tears. Toronto: McGraw Hill.

Treatment of Missing Data: Raw Data
Table 5-1 Alienation of Workers
--------------------------------------------------------
Level of Alienation        Frequency
--------------------------------------------------------
High                          20
Medium                        67
Low                           13
(Sub Total)                  100    (N=150)
No Response                   60
(Total)                             (N=210)

Treatment of Missing Data (%)
• Comparison of % distributions with and without non-respondents
Table 5-1 Alienation of Workers
                        Non-respondents included    Non-respondents eliminated
Level of Alienation         F        %                  F        %
High                        30       14                 30       20
Medium                     100       48                100       67
Low                         20       10                 20       13
No Response                 60       29
(Total)                    210      100                150      100

Treatment of Missing Data (%)
• Comparison with high & medium collapsed
Table 5-1 Alienation of Workers
                        Non-respondents included    Non-respondents eliminated
Level of Alienation         F        %                  F        %
High & Medium              130       62                130       87
Low                         20       10                 20       13
No Response                 60       29
(Total)                    210      100                150      100

Treatment of Missing Data (%)
• Comparison with medium & low collapsed
Table 5-1 Alienation of Workers
                        Non-respondents included    Non-respondents eliminated
Level of Alienation         F        %                  F        %
High                        30       14                 30       20
Medium & Low               120       57                120       80
No Response                 60       29
(Total)                    210      100                150      100

Grouping Response Categories (%)
• Comparison with high & medium response categories collapsed
Table 5-1 Alienation of Workers
                        Non-respondents included    Non-respondents eliminated
Level of Alienation         %                           %
High & Medium               62                          87
Low                         10                          13
No Response                 29
(Total)                    100  (N=210)                100  (N=150)

Core Notions in Basic Univariate Statistics
Ways of describing data about one variable (“uni” = one)
  – Measures of central tendency
    • Summarize information about one variable
    • Three types of “averages”: arithmetic mean, median, mode
  – Measures of dispersion
    • Analyze variations or “spread”
    • Range, standard deviation, percentiles, z-scores

Mode
• most common or frequently occurring category or value (for all types of data)
Babbie (1995: 378)

Graph (Normal Distribution) with single mode
[Figure: normal distribution with a single mode]

Bimodal Distribution
• When there are two “most common” values that are almost the same (or the same)

Median
• middle point of rank-ordered list of all values (only for ordinal, interval or ratio data)
Babbie (1995: 378)

Mean (arithmetic mean)
  – Arithmetic “average” = sum of values divided by number of cases (only for ratio and interval data)
Babbie (1995: 378)

Two Data Sets with the Same Mean
[Figure: two data sets with the same mean]

Normal Distribution & Measures of Central Tendency
• Symmetric
• Also called the “Bell Curve”
Neuman (2000: 319)

Skewed Distributions & Measures of Central Tendency
[Figure: distributions skewed to the left and skewed to the right; Neuman (2000: 319)]

Normal & Skewed Distributions
[Figure: normal and skewed distributions]

Why Measures of Central Tendency are not enough to describe distributions: Crowd Example
• 7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
  – median = 30, mean = 30
• 7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
  – median = 30, mean = 30
• BUT the issue of “spread” is socially significant

Measures of Variation or Dispersion
• range: distance between largest and smallest scores
• standard deviation: for comparing distributions
• percentiles: for understanding position in a distribution (% up to and including the number, from below)
• z-scores: for comparing individual scores taking into account the context of different distributions

Range & Interquartile range
• distance between largest and smallest scores
  – what does a short distance between the
scores tell us about the sample?
  – problems of “outliers” or extreme values may occur

Interquartile range (IQR)
• distance between the 75th percentile and the 25th percentile
• range of the middle 50% (approximately) of the data
• Eliminates problem of outliers or extreme values
• Example from StatCan website (11 in sample)
  – Data set: 6, 47, 49, 15, 43, 41, 7, 39, 43, 41, 36
  – Ordered data set: 6, 7, 15, 36, 39, 41, 41, 43, 43, 47, 49
  – Median: 41
  – Upper quartile: 43
  – Lower quartile: 15
  – IQR = 43 - 15 = 28

Standard Deviation and Variance
• Interquartile range eliminates the problem of outliers BUT eliminates half the data
• Solution? Measure variability from the center of the distribution.
• Standard deviation & variance measure how far, on average, scores deviate or differ from the mean.

Calculation of Standard Deviation
[Figure: worked calculation of the standard deviation; Neuman (2000: 321)]

Standard Deviation Formula
[Figure: standard deviation formula; Neuman (2000: 321)]

Interpreting Standard Deviation
• amount of variation from mean
• social meaning depends on exact case

Details on the Calculation of Standard Deviation
[Figure: Neuman (2000: 321)]

The Bell Curve & standard deviation
[Figure: bell curve marked in standard deviations]

Discussion of Preceding Diagram
• “Many biological, psychological and social phenomena occur in the population in the distribution we call the bell curve (Portney & Watkins, 2000).”
• Preceding picture
  – a symmetrical bell curve,
  – average score [i.e., the mean] in the middle, where the ‘bell’ shape [is] tallest.
  – Most of the people [i.e., 68% of them, or 34% + 34%] have performance within 1 segment [i.e., a standard deviation] of the average score.”

Interpreting Standard Deviation
• amount of variation from mean
• Illustration: high & low standard deviation
• meaning depends on exact case

Another Diagram of Normal Curve (Showing Ideal Random Sampling Distribution, Standard Deviation & Z-scores)
[Figure: normal curve with standard deviations and z-scores]

Example: Central Tendency & Dispersion (description of distributions)
Recall:
• 7 people at bus stop in front of bar aged 25, 26, 27, 30, 33, 34, 35
  – median = 30, mean = 30
  – range = 10, standard deviation = 3.8
• 7 people in front of ice-cream parlour aged 5, 10, 20, 30, 40, 50, 55
  – median = 30, mean = 30
  – range = 50, standard deviation = 17.9

Other ways of characterizing dispersion or spread
Techniques for understanding the position of a case (or group of cases) in the context of all cases
• Percentiles
• Standard scores
  – z-scores

Percentile
• First calculate ranks, then choose a rank (score) and figure out the percentage of cases equal to or less than that rank (score)
• % up to and including the number (from below)
  – “A percentile rank is typically defined as the proportion of scores in a distribution that a specific score is greater than or equal to. For instance, if you received a score of 95 on a math test and this score was greater than or equal to the scores of 88% of the students taking the test, then your percentile rank would be 88. You would be in the 88th percentile”
• Also used in other ways (for example to eliminate cases)

Normal Distribution with Percentiles
[Figure: normal distribution with percentiles]

z-scores
• For understanding how a score is positioned in the data set
• To enable comparisons with other scores from other data sets
  – (comparing individual scores in different distributions)
    • example of two students from different schools with different GPAs
  – comparing sample distributions to population. How representative is the sample of the population under study?
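The crowd example and the percentile-rank definition above can be checked with a short sketch. It assumes the population formula for the standard deviation (dividing by N), which is the version that reproduces the 17.9 figure for the ice-cream parlour crowd; the function names `describe` and `percentile_rank` are illustrative and not from the lecture.

```python
import statistics

# Ages from the crowd example in the slides.
bar = [25, 26, 27, 30, 33, 34, 35]
ice_cream = [5, 10, 20, 30, 40, 50, 55]


def describe(ages):
    """Return central tendency and dispersion measures for one variable."""
    return {
        "mean": statistics.mean(ages),
        "median": statistics.median(ages),
        "range": max(ages) - min(ages),
        # Assumption: population standard deviation (divide by N).
        "sd": round(statistics.pstdev(ages), 1),
    }


def percentile_rank(scores, score):
    """Percentage of scores that the given score is greater than or equal to."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


bar_summary = describe(bar)
ice_summary = describe(ice_cream)
print(bar_summary)
print(ice_summary)
print(round(percentile_rank(ice_cream, 30), 1))
```

Both crowds share the same mean and median, so only the range and standard deviation reveal how different they are. Using the sample formula (`statistics.stdev`, dividing by N - 1) would give slightly larger values.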
Calculating Z-Scores
• z-score = (score - sample mean) / standard deviation of set

Using Z-scores to compare two students from different schools
• Susan has a GPA of 3.62 & Jorge has a GPA of 3.64
• Susan from College A
  – Susan’s Grade Point Average = 3.62
  – Mean GPA = 2.62
  – SD = .50
  – Susan’s z-score = (3.62 - 2.62) / .50 = 1.00 / .50 = 2
  – Susan’s grade is two standard deviations above the mean at her school

Using Z-scores to compare two students from different schools (continued)
• Jorge from College B
  – Jorge’s GPA = 3.64
  – Mean GPA = 3.24
  – SD = .40
  – Jorge’s z-score = (3.64 - 3.24) / .40 = .40 / .40 = 1
  – Jorge’s grade is one standard deviation above the mean at his school
• Susan’s absolute grade is lower, but her position relative to other students at her school is much higher than Jorge’s position at his school

Another Diagram of Normal Curve with Standard Deviation & Z-scores
[Figure: normal curve with standard deviations and z-scores]

Discussion of Previous Case
• Relationship of sampling distribution to population (use mean of sample to estimate mean of population)

If Time: Begin Bivariate Statistics (Results with two variables)
• Types of relationships between two variables:
  – Correlation (or covariation)
    • when two variables ‘vary together’
      – a type of association
      – not necessarily causal
    • Can be in the same direction (positive correlation or direct relationship)
    • Can be in different directions (negative correlation or indirect relationship)
  – Independence
    • No correlation, no relationship
    • Cases with values in one variable do not have any particular value on the other variable

Techniques for examining relationships between two variables
• Graphs, scattergrams or plots
• Cross-tabulations or percentaged tables
• Measures of association (e.g. correlation coefficient, etc.)
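Before going further into bivariate statistics, the z-score comparison of Susan and Jorge above can be reproduced with a minimal sketch. The function name `z_score` is an illustrative choice; the GPAs, means and standard deviations are the ones given in the slides.

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


# College A: mean GPA 2.62, SD 0.50. College B: mean GPA 3.24, SD 0.40.
susan = z_score(3.62, mean=2.62, sd=0.50)
jorge = z_score(3.64, mean=3.24, sd=0.40)

print(round(susan, 2))  # Susan's position within College A
print(round(jorge, 2))  # Jorge's position within College B
print(susan > jorge)    # relative standing, despite Susan's lower absolute GPA
```

The subtraction has to happen before the division, which is why the formula is written with parentheses.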
Scattergram (Bivariate)
[Figure: scattergram]

Tables: Basic Terminology
• Parts of a Table
  – title (conventions)
    • Order of naming of variables
    • Dependent, independent, control
  – body, cell, column, row
  – “marginals”
• sources, date

Bivariate Statistics: Parts of the Table
[Figure: parts of a bivariate table]

Example of Raw Data Table (computer printout, bivariate)
Regan, T. (1985). In search of sobriety: Identifying factors contributing to the recovery from alcoholism. Kentville, NS.

Another Style of Presentation of Percentaged Tables
Table 1. Percentage in support of strike by type of school

Type of School        Percent supporting strike
Secondary             60%  (800)
Elementary            30%  (1000)
__________________________________________________________
N = 1800

Labels on the slide:
• “Table 1.”: serial number; remainder of the title: descriptive caption
• Percent supporting strike: dependent variable (one category of a dichotomous dependent variable)
• Type of School: independent variable; Secondary and Elementary: its categories
• (800), (1000): marginals for the independent variable
• N = 1800: total sample

Presentation of Percentaged Tables (cont’d)
Table 2. Percentage who support strike by type of school and sex

                      Per cent supporting strike
Type of School        Female            Male
Secondary             60%  (400)        60%  (400)
Elementary            30%  (900)        30%  (100)
__________________________________________________________
Female = .30 : Male = .30
N = 1800

Labels on the slide:
• Per cent supporting strike: dependent variable
• Type of School: independent variable
• Sex: control variable; Female and Male: categories of the control variable

Some Important Factors in Interpretation of Tables
• percentages vs. “raw” frequencies; need to know the absolute number of cases (N=)
• grouping categories, missing cases
• direction of calculation of percentages (for bivariate and multivariate statistics)

Collapsing categories (U.N. example)
Babbie, E. (1995). The practice of social research. Belmont, CA: Wadsworth.

Collapsing Categories & omitting missing data
Babbie, E. (1995). The practice of social research. Belmont, CA: Wadsworth.

Grouping Response Categories
• To make new categories
• Facilitate analysis of trends
• But decisions have effects on the interpretation of patterns
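How much these decisions matter can be seen by recomputing the Alienation of Workers percentages under each choice. The sketch below uses the frequencies from the alienation tables (30, 100, 20, and 60 non-respondents); the helper names `percentages`, `drop_missing` and `collapse` are hypothetical and chosen only for illustration.

```python
# Frequencies from the Alienation of Workers tables.
counts = {"High": 30, "Medium": 100, "Low": 20, "No Response": 60}


def percentages(freqs):
    """Convert raw frequencies to whole-number percentages of the total."""
    total = sum(freqs.values())
    return {label: round(100 * n / total) for label, n in freqs.items()}


def drop_missing(freqs, missing="No Response"):
    """Eliminate non-respondents before percentaging."""
    return {label: n for label, n in freqs.items() if label != missing}


def collapse(freqs, categories, new_label):
    """Merge the listed categories into one new category, keeping table order."""
    merged = {}
    for label, n in freqs.items():
        key = new_label if label in categories else label
        merged[key] = merged.get(key, 0) + n
    return merged


with_missing = percentages(counts)
without_missing = percentages(drop_missing(counts))
high_medium = percentages(collapse(drop_missing(counts), ["High", "Medium"], "High & Medium"))
medium_low = percentages(collapse(drop_missing(counts), ["Medium", "Low"], "Medium & Low"))

print(with_missing)
print(without_missing)
print(high_medium)
print(medium_low)
```

The same 150 respondents can be presented as 87% "High & Medium" or as 80% "Medium & Low", which is why the raw frequencies and N should be reported alongside the percentages.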