Action Research Introduction
INFO 515
Glenn Booker
INFO 515 Lecture #1

Course Scope
This class focuses on understanding common types of analysis techniques which may be used to support research projects
We will use the statistics program SPSS to manipulate data and generate graphs
There will be weekly homework assignments for much of the term

Who cares…
…about statistics and research methods?
Commonly accepted techniques need to be used to ensure that valid comparisons and analyses are being made
Statistics is a common language to express results
Helps ensure that objective conclusions are reached

Why use SPSS?
Microsoft Excel is adequate for simple math (arithmetic, averages, etc.)
But Excel fails some standard tests for performing more advanced calculations (regression analysis, etc.)
SPSS was chosen for its widespread usage and low-cost student version

My Background
Eighteen years of industry experience
DOD (Department of Defense) and FAA (Federal Aviation Administration) work, primarily involved in software development, systems engineering, and project management
Also teach statistical process control for high process maturity organizations
Have been teaching for Drexel since 1998

For the REAL serious student
Get the ISO Standards Handbook “ISO Statistical methods for quality control”, 5th ed., 2000
It runs $418 for both 700+ page volumes
No, I don’t expect you to buy this!
If you do find someone to buy it for you, search for its title at http://global.ihs.com/
IHS is a great, if terribly expensive, source for military (MIL, DOD), industry (IEEE, ASTM), national (ANSI, DIN*), and international (ISO) standards
* DIN is the German equivalent of ANSI

Other References
More realistically, see my handout “Statistics for Software Process Improvement”
It summarizes statistical terms, hypothesis testing, SPSS tips, and other stuff we’ll be using
We’ll use it a lot

Definitions
Data - observations collected in order to measure or describe a situation or problem of interest
Data describes a variable
Variables - objects or concepts that must have a value or a definition assigned to them in order that they can be measured and analyzed
They take on different values for individuals and groups

Discrete vs. Continuous Data
Discrete data can take on only a finite number of values. It is often characterized by counting units (integers), or only specific values, like grades
Continuous data can take on an infinite number of possible values and is characterized by some type of measurement, instrument, or scale
You measure height, weight (Does anyone ever know exactly how much they weigh?), speed, etc.

Definitions
Theory is a possible explanation of the relationships among variables
Research Hypothesis - as a consequence of our theory, the hypothesis is the statement we submit to testing
Often states there is a pattern, or difference, or trend among the variables
Null hypothesis is the opposite of the research hypothesis
States there is no trend or difference

Research
Research describes what or explains why
It is a method for finding answers to questions or a strategy for explanation
Research is:
1. Empirical, because it is based on evidence or data
2. Systematic, because it uses a method
3. Objective, because it is presumably conducted and interpreted by the researcher without bias

Basic vs. Applied Research
Basic research usually refers to laboratory research, such as experimental psychology
In basic research, the researcher is testing theory and ideas without necessarily applying the results to practical problems

Basic vs. Applied Research
Applied research is also called field research, evaluation research, or action research
This type of research is often used to influence policy and decision-making, and is conducted to solve problems (often immediate problems), sometimes only within one organization (hence its results are only applicable to that organization)

Quantitative vs. Qualitative
Quantitative Research tends to deal with variables that have numeric values
How far do you commute to work? How tall are you?
Qualitative Research looks at variables which are binary (Yes/No), have non-numeric values, or are free-form text
What is your favorite football team? How could I improve this slide?

The Nature of Qualitative and Quantitative Research Strategies
The difference is the type of data you collect and the tools you employ
Specifically:
The same data collection strategies can be qualitative or quantitative
Qualitative data can become quantitative
Pure quantitative data cannot become qualitative
Often in research, it is good to use qualitative and quantitative in the same study

Research Methods
There are many different ways to conduct research
Exactly how many ways depends on your field of study and how you wish to define them
Here we break them into nine different methods (see narrative lecture notes too)

1. Historical Research
Reconstruct the past to support a hypothesis or theme, while remaining objective and true to the actual events which occurred
Example: study past software projects to see if it’s true that: “if a project was at least 10% behind schedule halfway through, it will finish at least 10% late”

2. Descriptive Research
This is a non-judgmental type of research
Examine a situation or area systematically and describe it
Example: study how library patrons navigate when looking for a particular book

3. Developmental Research
Examine how something grows or changes over time; is also non-judgmental
Often looking for processes, patterns, or sequences
Example: study the number of software requirements which have been described during a project, and look for that number stabilizing (not changing much)

4. Case and Field Research
Study a given organization to understand how it faces its environment
Often used for understanding business management decisions - in a given business environment, how did they choose among product development options?

5. Correlational Research
Study how one variable is affected by one or more other variables
Example: how is customer satisfaction affected by product reliability?
Another example: how is productivity affected by the level of experience of the workers?

6. Causal Comparative
A.k.a. ex post facto (after the fact) research
Study some outcome by looking for possible causes
Example: determine if listening to classical music leads to criminal activity
Or: determine if being short increases your chance of having a heart attack

7. True Experimental Research
Examine the effect of some treatment on an experimental group by comparing it to a control group which receives no treatment (e.g. a placebo)
Example: drug studies are done this way to prove whether the drug really had a noticeable effect on the patients

Experimental Study “Blindness”
A single blind study means the testers know which subjects receive the real treatment, but the subjects don’t know
A double blind study means neither side knows who received the real treatment - the information is coded so that only the analysts can figure out who received what
Side note: If the subjects know what they are receiving, the study isn’t blind at all

8. Quasi-Experimental Research
This is like True Experimental Research, but is done where you can’t control all of the variables (such as the real world)
Much software development research is in this category
Much qualitative research is in this category too

9. Action Research
Develop new ways to solve problems with direct application to the real world
This tends to focus on your own organization: study what’s happening, and see how to improve it

Action Research
A strategy in Educational Research
Enables problem solving in the natural setting
Participatory action research
Connect theory with practice

Action Research Questions in Library and Information Science
How much does the library spend?
How much do potential users actually use the library?
How productive is the library staff?
Is the staff the right size?
How are users served by the library?

Statistics
Statistics describes a likely range for predicting something, not a fixed point
For example, instead of saying it will take “a week” to perform a task, describe a time period in which you are likely to finish the task, such as 7 days +/- 2 days
Most people don’t like to think this way; uncertainty makes people uncomfortable

General Function of Statistics
Descriptive Statistics describes the characteristics of one or more variables
We describe the traits of that variable
Inferential Statistics is used when we develop a hypothesis, and analyze data to make decisions or draw conclusions about that hypothesis
We infer some larger perspective or understanding, based on our limited data

General Function of Statistics
Descriptive
Numbers that describe the situation of interest
Value: efficient summary of data
Interpretive (Inferential)
More power, but a certain amount of risk
Hypothesize, then collect data and analyze it
Accept or reject the hypothesis

Definitions
Independent Variable - A variable which is thought to influence another variable
Often plotted as the ‘X’ axis on a graph
Might have many independent variables
Dependent Variable - A variable which is influenced by or is the consequence of the independent variable
Often plotted as the ‘Y’ axis on a graph

Independent vs. Dependent
Generally speaking, we want to be able to understand and/or predict the dependent variable in a problem
Often a hypothesis will try to use one or more independent variable(s) to explain the behavior of the dependent variable
We want to understand IQ (dep variable); try to see if income predicts it (indep variable)
To improve customer satisfaction (dep), see if a new card catalog (indep event) changes it

Cases and Variables
Cases = units of analysis: people, things, records, etc.
A.k.a.: entities, respondents, subjects, items
Become the rows in your data matrix
Variables = things that vary! (not constant)
Example: Achievement, Intelligence, Attendance, Income, Aggression
A.k.a.: measures, attributes, features
Become the columns in your data matrix

Variables
Discrete = Counting Units
Continuous = Measurement
Example: Intelligence Tests
Independent Variables
Example: Attendance influences other variables
Dependent Variables
Influenced by (or consequence of) the independent variable

Definitions
Population (N) is the total group of things under study, such as all voters in an election
Sample (n) is a subset of the population
Basic descriptive statistics include
Maximum is the largest value in a data set
Minimum is the smallest value in a data set
Range is the difference between the Maximum and the Minimum
Range = Maximum - Minimum

Sample & Population Variables
Notice that very often, the same variable will have a different symbol for its value for a sample than for its value for the entire population (more examples to follow)
This helps distinguish between what we have measured directly (usually the sample variable) and what we want to understand or predict (that variable for the whole population)

Measures of Central Tendency
There are three measures of “central tendency”
Mean
Median
Mode
They convey the average, middle, and most common values in a data set

Definitions
Mean - The average of a set of data; equal to the sum of their values (X_i), divided by the number of data points (N). Mean is X̄ (X bar) for a sample, or μ (Greek mu) for the entire population
$$\text{Mean} = \frac{\sum_{i=1}^{N} X_i}{N}$$
For some set of data with N values, add them up and divide by N.
To be precise, this is the arithmetic mean; there are other kinds, e.g. geometric mean.
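
The course does these calculations in SPSS, but the definitions above (Maximum, Minimum, Range, and the arithmetic mean) are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the data values and the variable names (task_days and so on) are made up for the example and do not come from the lecture.

```python
import statistics

# Hypothetical sample: days taken to finish five tasks (made-up numbers).
task_days = [5, 7, 7, 8, 9]

n = len(task_days)                  # number of data points in the sample
maximum = max(task_days)            # largest value in the data set
minimum = min(task_days)            # smallest value in the data set
data_range = maximum - minimum      # Range = Maximum - Minimum

# Arithmetic mean: add the values up and divide by how many there are.
mean_by_hand = sum(task_days) / n
mean_stdlib = statistics.mean(task_days)   # same calculation, from the standard library

# Geometric mean: the n-th root of the product of the values (positive data only).
geo_mean = statistics.geometric_mean(task_days)

print(f"Range = {maximum} - {minimum} = {data_range}")
print(f"Arithmetic mean = {mean_by_hand}")
print(f"Geometric mean = {geo_mean:.3f}")
```

For this sample the range is 4 and the arithmetic mean is 7.2. The geometric mean comes out a little smaller than the arithmetic mean, which is always the case when the values are positive and not all identical.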

Definitions
Median is the middle value of a set of data which has been sorted in numeric order (e.g. the median home selling price)
If the set has an even number of data points, average the middle two values
Mode is the value of data which occurs the most often (generally for integer data sets)
There can be one mode or many, resulting in different mode types

Mode Types
Unimodal - there is one mode in a data set
Bimodal - there are two modes in the data set
Multimodal - there are many (>2) modes in the data set
If there are no duplicates in the data set (all values are unique), then all its values are modes, hence it would be extremely multimodal!

Definitions
Standard deviation (s for a sample, or σ (sigma) for the population) represents the average amount data differs from the mean
Standard deviation affects the width or flatness of the bell shaped curve
Variance (s² or σ²) is the standard deviation squared

The Normal Distribution
We’ll look at this more later on…
[Figure: Normal distribution PDF for mean = 0 and std dev = 1/2, 1, and 2; PDF plotted against X from -8 to 8]

SPSS
SPSS is high end statistical analysis software
You can use your Drexel login to download it free from https://software.drexel.edu/
Log in with drexel\ in front of your login name, e.g. "drexel\abc28", and the same password you use for DrexelOne.
Navigate to find SPSS version 16, something like https://software.drexel.edu/Students/PCSoftware/SPSS/SPSS16/.
Make sure to save the readme.txt file too - it has the serial number and Authorization Code information.
Download and run the executable file.
Version 16 for Mac (~730 MB file)
Version 16 for PC (~670 MB file)
Anything version 10 or later is acceptable

SPSS Introduction
SPSS is like a spreadsheet or flat file database
(The limits below apply to the Student Edition only)
Each variable has its own column (max. of 50)
Each record has its own row (max. of 1500)
Key navigational feature:
Use the Data View tab to see the experimental data
Use the Variable View tab to see the characteristics of each variable and how they’re displayed in the Data View

SPSS Data View
[Screenshot: SPSS Data View]

SPSS Variable View
[Screenshot: SPSS Variable View]

SPSS Introduction
Use the Variable View tab to change the characteristics of each variable, such as
Type of variable (integer, date, text, etc.)
Name of each variable, which was limited to 8 characters, is lower case, and has no spaces
Recent versions finally removed the 8 character limit
Labels for each variable are optional, but they allow a more useful identifier than the Name
When you select or plot a variable, its Label is shown (if there is one), not its Name
Width is how many digits or characters the variable may have

SPSS Introduction
Variables can have a limited set of allowable Values, such as {0 = Male}, {1 = Female}
Sort data by selecting Data / Sort Cases…
Then select one or more variables to be the “Sort by:” criteria
If more than one variable is selected, data will be sorted in that order of precedence

SPSS Introduction
Can adjust column widths like Excel
In Data View, move cursor between column titles (which are the variable Names), and drag the column width left or right, or
In Variable View, edit the Columns field
SPSS data files have an extension of “sav”
Output is saved separately in files with an extension of “spo”
Tabular output of ***** means the column is too narrow; double click to edit, and drag the right edge of the column to the right

Additional References
From Prof. Val Yonker
Carpenter, R.L., and Vasu, E.S. (1979). Statistical Methods for Librarians. Chicago: American Library Association.
Cohen, J., and Cohen, P. (1975). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Assoc.
Hernon, P. (1989). A Handbook of Statistics for Library Decision Making. Norwood, NJ: Ablex Publishing.
Isaac, S., and Michael, W.B. (1977). Handbook in Research and Evaluation. San Diego: Edits Publishers.
Keppel, G. (1973). Design and Analysis: A Researcher's Handbook. Englewood Cliffs, NJ: Prentice-Hall.
Kerlinger, F.N. (1979). Behavioral Research: A Conceptual Approach. New York: Holt, Rinehart, and Winston.

Additional References
Loether, H.J., and McTavish, D.G. (1980). Descriptive and Inferential Statistics: An Introduction. Boston: Allyn and Bacon.
Runyon, R.P., and Haber, A. (1984). Fundamentals of Behavioral Statistics (2nd ed.). Reading, MA: Addison-Wesley.
Selltiz, C., Wrightsman, L.S., and Cook, S.W. (1976). Research Methods in Social Relations (3rd ed.). New York: Holt, Rinehart and Winston.
Here’s my favorite:
Salkind, N.J. (2007). Statistics for People Who (Think They) Hate Statistics (3rd ed.). Thousand Oaks, CA: Sage Publications. ISBN: 9781412951500