Action Research
Introduction
INFO 515
Glenn Booker
INFO 515
Lecture #1
1
Course Scope
• This class focuses on understanding common types of analysis techniques which may be used to support research projects
• We will use the statistics program SPSS to manipulate data and generate graphs
• There will be weekly homework assignments for much of the term

Who cares…

• …about statistics and research methods?
  - Commonly accepted techniques need to be used to ensure that valid comparisons and analyses are being made
  - Statistics is a common language to express results
  - Helps ensure that objective conclusions are reached

Why use SPSS?
• Microsoft Excel is adequate for simple math (arithmetic, averages, etc.)
• But Excel fails some standard tests for performing more advanced calculations (regression analysis, etc.)
• SPSS was chosen for its widespread usage and low-cost student version

My Background

• Eighteen years of industry experience
  - DOD (Department of Defense) and FAA (Federal Aviation Administration) work, primarily involved in software development, systems engineering, and project management
  - Also teach statistical process control for high process maturity organizations
• Have been teaching for Drexel since 1998

For the REAL serious student

• Get the ISO Standards Handbook “ISO Statistical methods for quality control”, 5th ed., 2000
  - It runs $418 for both 700+ page volumes
  - No, I don’t expect you to buy this!
• If you do find someone to buy it for you, search for its title at http://global.ihs.com/
  - IHS is a great, if terribly expensive, source for military (MIL, DOD), industry (IEEE, ASTM), national (ANSI, DIN*), and international (ISO) standards
* DIN is the German equivalent of ANSI

Other References

• More realistically, see my handout “Statistics for Software Process Improvement”
  - It summarizes statistical terms, hypothesis testing, SPSS tips, and other stuff we’ll be using
  - We’ll use it a lot

Definitions

• Data - observations collected in order to measure or describe a situation or problem of interest
  - Data describes a variable
• Variables - objects or concepts that must have a value or a definition assigned to them so that they can be measured and analyzed
  - They take on different values for individuals and groups

Discrete vs. Continuous Data
• Discrete data can take on only certain separate values, often a finite number of them. It is often characterized by counting units (integers), or only specific values, like grades
• Continuous data can take on an infinite number of possible values within a range and is characterized by some type of measurement, instrument, or scale
  - You measure height, weight (Does anyone ever know exactly how much they weigh?), speed, etc.

Definitions
• Theory is a possible explanation of the relationships among variables
• Research Hypothesis - as a consequence of our theory, the hypothesis is the statement we submit to testing
  - Often states there is a pattern, or difference, or trend among the variables
• Null hypothesis is the opposite of the research hypothesis
  - States there is no trend or difference

Research

• Research describes what or explains why
  - It is a method for finding answers to questions or a strategy for explanation
• Research is:
  1. Empirical, because it is based on evidence or data
  2. Systematic, because it uses a method
  3. Objective, because it is presumably conducted and interpreted by the researcher without bias

Basic vs. Applied Research

• Basic research usually refers to laboratory research, such as experimental psychology
  - In basic research, the researcher is testing theory and ideas without necessarily applying the results to practical problems

Basic vs. Applied Research

• Applied research is also called field research, evaluation research, or action research
  - This type of research is often used to influence policy and decision-making, and is conducted to solve problems (often immediate problems), sometimes only within one organization (hence its results are only applicable to that organization)

Quantitative vs. Qualitative

• Quantitative Research tends to deal with variables that have numeric values
  - How far do you commute to work?
  - How tall are you?
• Qualitative Research looks at variables which are binary (Yes/No), have non-numeric values, or are free-form text
  - What is your favorite football team?
  - How could I improve this slide?

The Nature of Qualitative and Quantitative Research Strategies
• The difference is the type of data you collect and the tools you employ
• Specifically:
  - The same data collection strategies can be qualitative or quantitative
  - Qualitative data can become quantitative
  - Pure quantitative data cannot become qualitative
  - Often in research, it is good to use qualitative and quantitative in the same study

Research Methods
• There are many different ways to conduct research
• Exactly how many ways depends on your field of study and how you wish to define them
• Here we break them into nine different methods (see narrative lecture notes too)

1. Historical Research
• Reconstruct the past to support a hypothesis or theme, while remaining objective and true to the actual events which occurred
• Example: study past software projects to see if it’s true that: “if a project was at least 10% behind schedule halfway through, it will finish at least 10% late”

2. Descriptive Research
• This is a non-judgmental type of research
• Examine a situation or area systematically and describe it
• Example: study how library patrons navigate when looking for a particular book

3. Developmental Research
• Examine how something grows or changes over time; is also non-judgmental
• Often looking for processes, patterns, or sequences
• Example: study the number of software requirements which have been described during a project, and look for that number stabilizing (not changing much)

4. Case and Field Research
• Study a given organization to understand how it faces its environment
• Often used for understanding business management decisions: in a given business environment, how did they choose among product development options?

5. Correlational Research
• Study how one variable is affected by one or more other variables
• Example: how is customer satisfaction affected by product reliability?
• Another example: how is productivity affected by the level of experience of the workers?

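One common way to put a number on how strongly two variables move together is the Pearson correlation coefficient, which runs from -1 to +1. The sketch below is only an illustration: the experience and productivity figures are made up, and the helper name pearson_r is ours, not something from SPSS or this course. Keep in mind that a strong correlation shows association, not proof that one variable causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of experience vs. tasks completed per week
experience = [1, 2, 3, 4, 5]
productivity = [2, 4, 5, 4, 5]

r = pearson_r(experience, productivity)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship; a value near 0 suggests little or none.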
6. Causal Comparative
• A.k.a. ex post facto (after the fact) research
• Study some outcome by looking for possible causes
• Example: determine if listening to classical music leads to criminal activity
• Or: determine if being short increases your chance of having a heart attack

7. True Experimental Research
• Examine the effect of some treatment on an experimental group by comparing it to a control group which receives no treatment (e.g. a placebo)
• Example: drug studies are done this way to prove whether the drug really had a noticeable effect on the patients

Experimental Study “Blindness”
• A single blind study means the testers know which subjects receive the real treatment, but the subjects don’t know
• A double blind study means neither side knows who received the real treatment; the information is coded so that only the analysts can figure out who received what
  - Side note: If the subjects know what they are receiving, the study isn’t blind at all

8. Quasi-Experimental Research
• This is like True Experimental Research, but is done where you can’t control all of the variables (such as the real world)
• Much software development research is in this category
• Much qualitative research is in this category too

9. Action Research
• Develop new ways to solve problems with direct application to the real world
• This tends to focus on your own organization: study what’s happening, and see how to improve it

Action Research
• A strategy in Educational Research
• Enables problem solving in the natural setting
• Participatory action research
• Connect theory with practice

Action Research Questions in Library and Information Science
• How much does the library spend?
• How much do potential users actually use the library?
• How productive is the library staff?
• Is the staff the right size?
• How are users served by the library?

Statistics
• Statistics describes a likely range for predicting something, not a fixed point
• For example, instead of saying it will take “a week” to perform a task, describe a time period in which you are likely to finish the task, such as 7 days +/- 2 days
• Most people don’t like to think this way; uncertainty makes people uncomfortable

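As a rough sketch of where a statement like “7 days +/- 2 days” could come from, the example below summarizes some made-up past task durations by their mean and sample standard deviation. The data and variable names are hypothetical, and using one standard deviation as the “+/-” amount is just one simple convention, not the only way to describe a likely range.

```python
import statistics

# Hypothetical record of how many days similar tasks took in the past
past_durations = [5, 9, 7, 5, 9]

center = statistics.mean(past_durations)   # typical duration
spread = statistics.stdev(past_durations)  # sample standard deviation

summary = f"{center:.1f} days +/- {spread:.1f} days"
print(summary)
```

Reporting the center together with the spread conveys both the best guess and how much uncertainty surrounds it.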
General Function of Statistics

• Descriptive Statistics describes the characteristics of one or more variables
  - We describe the traits of that variable
• Inferential Statistics is used when we develop a hypothesis, and analyze data to make decisions or draw conclusions about that hypothesis
  - We infer some larger perspective or understanding, based on our limited data

General Function of Statistics

• Descriptive
  - Numbers that describe situation of interest
  - Value: efficient summary of data
• Interpretive (Inferential)
  - More power, but certain amount of risk
  - Hypothesize, then collect data and analyze it
  - Accept or reject the hypothesis

Definitions

• Independent Variable - A variable which is thought to influence another variable
  - Often plotted as the ‘X’ axis on a graph
  - Might have many independent variables
• Dependent Variable - A variable which is influenced by or is the consequence of the independent variable
  - Often plotted as the ‘Y’ axis on a graph
[Figure: graph axes labeled X and Y]

Independent vs. Dependent
• Generally speaking, we want to be able to understand and/or predict the dependent variable in a problem
• Often a hypothesis will try to use one or more independent variable(s) to explain the behavior of the dependent variable
  - We want to understand IQ (dep variable); try to see if income predicts it (indep variable)
  - To improve customer satisfaction (dep), see if a new card catalog (indep event) changes it

Cases and Variables

• Cases = units of analysis
  - people, things, records, etc.
  - A.k.a.: entities, respondents, subjects, items
  - Become the rows in your data matrix
• Variables = things that vary! (not constant)
  - Example: Achievement, Intelligence, Attendance, Income, Aggression
  - A.k.a.: measures, attributes, features
  - Become the columns in your data matrix

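To make the “rows are cases, columns are variables” idea concrete, here is a small sketch of a data matrix in Python. The case IDs, variable names, and values are all invented for illustration; in SPSS the same layout appears as the grid in the Data View.

```python
# Each dictionary is one case (a row); each key is one variable (a column).
# All names and values are hypothetical.
data_matrix = [
    {"case_id": 1, "attendance": 28, "income": 41000.0},
    {"case_id": 2, "attendance": 30, "income": 38500.0},
    {"case_id": 3, "attendance": 25, "income": 52250.0},
]

# Reading across a row gives every variable for one case.
first_case = data_matrix[0]

# Reading down a column gives one variable for every case.
attendance_column = [row["attendance"] for row in data_matrix]

n_cases = len(data_matrix)         # number of rows
n_variables = len(data_matrix[0])  # number of columns

print(n_cases, "cases x", n_variables, "variables")
print("attendance:", attendance_column)
```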
Variables

• Discrete = Counting Units
  - Example: Attendance
• Continuous = Measurement
  - Example: Intelligence Tests
• Independent Variables
  - influences other variables
• Dependent Variables
  - influenced by (or consequence of) the independent variable.

Definitions
• Population (N) is the total group of things under study, such as all voters in an election
• Sample (n) is a subset of the population
• Basic descriptive statistics include
  - Maximum is the largest value in a data set
  - Minimum is the smallest value in a data set
  - Range is the difference between the Maximum and the Minimum
    Range = Maximum - Minimum

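These three statistics are simple enough to compute by hand, but a short sketch shows the arithmetic. The ages below are a made-up sample.

```python
# Hypothetical sample of ages (n = 5)
ages = [23, 35, 19, 42, 35]

maximum = max(ages)              # largest value in the data set
minimum = min(ages)              # smallest value in the data set
value_range = maximum - minimum  # Range = Maximum - Minimum

print(maximum, minimum, value_range)
```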
Sample & Population Variables
• Notice that very often, the same variable will have a different symbol for its value for a sample than for its value for the entire population (more examples to follow)
• This helps distinguish between what we have measured directly (usually the sample variable) and what we want to understand or predict (that variable for the whole population)

Measures of Central Tendency

• There are three measures of “central tendency”
  - Mean
  - Median
  - Mode
• They convey the average, middle, and most common values in a data set

Definitions

• Mean - The average of a set of data; equal to the sum of their values (Xi), divided by the number of data points (N). Mean is X̄ (X bar) for a sample, or μ (Greek mu) for the entire population

  $$\text{Mean} = \frac{\sum_{i=1}^{N} X_i}{N}$$

  - For some set of data with N values; add them up and divide by N.
  - To be precise, this is the arithmetic mean; there are other kinds, e.g. geometric mean.

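The sketch below computes the arithmetic mean exactly as defined above (add the values, divide by N) and, for contrast, a geometric mean of the same made-up values. The geometric mean is included only to show that “mean” can refer to more than one calculation; here it is the N-th root of the product, computed through logarithms, which assumes all values are positive.

```python
import math

# Hypothetical data set with N = 3 values
values = [2, 4, 8]
n = len(values)

# Arithmetic mean: sum of the values divided by N
arithmetic_mean = sum(values) / n

# Geometric mean: N-th root of the product (requires positive values)
geometric_mean = math.exp(sum(math.log(v) for v in values) / n)

print(f"arithmetic mean = {arithmetic_mean:.3f}")
print(f"geometric mean  = {geometric_mean:.3f}")
```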
Definitions

• Median is the middle value of a set of data which has been sorted in numeric order (e.g. the median home selling price)
  - If the set has an even number of data points, average the middle two values
• Mode is the value of data which occurs the most often (generally for integer data sets)
  - There can be one mode or many, resulting in different mode types

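Python's standard statistics module follows the same rules, so it makes a convenient way to check a hand calculation. The prices and grades below are invented for illustration.

```python
import statistics

# Hypothetical home selling prices, in thousands of dollars
odd_prices = [250, 310, 190]         # odd count: the middle sorted value
even_prices = [190, 250, 310, 400]   # even count: average the middle two

median_odd = statistics.median(odd_prices)
median_even = statistics.median(even_prices)

# Hypothetical integer grades; the mode is the most frequent value
grades = [3, 4, 4, 5]
most_common_grade = statistics.mode(grades)

print(median_odd, median_even, most_common_grade)
```

Note that the data does not need to be sorted beforehand; statistics.median sorts it internally.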
Mode Types
• Unimodal - there is one mode in a data set
• Bimodal - there are two modes in the data set
• Multimodal - there are many (>2) modes in the data set
  - If there are no duplicates in the data set (all values are unique), then all its values are modes, hence it would be extremely multimodal!

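A minimal sketch of these categories, using statistics.multimode (available in Python 3.8 and later) to list every value tied for most frequent. The function name classify_modes and the sample data are our own, chosen only for illustration.

```python
import statistics

def classify_modes(data):
    """Return (label, modes) using the unimodal / bimodal / multimodal terms."""
    modes = statistics.multimode(data)  # all values tied for most frequent
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return label, modes

print(classify_modes([1, 2, 2, 3]))     # one value occurs most often
print(classify_modes([1, 1, 2, 2, 3]))  # two values tie
print(classify_modes([1, 2, 3, 4]))     # all unique, so every value is a mode
```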
Definitions

• Standard deviation (s for sample, or σ (sigma) for population) represents, roughly, the average amount data differs from the mean
  - Standard deviation affects the width or flatness of the bell shaped curve
• Variance (s² or σ²) is the standard deviation squared

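The sample and population versions are calculated slightly differently, which is one reason they get different symbols. In Python's statistics module, stdev and variance treat the data as a sample (dividing by n - 1), while pstdev and pvariance treat it as the whole population (dividing by N). The data below is made up so that the population results come out to round numbers.

```python
import statistics

# Hypothetical data set of 8 values with a mean of 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the data as the entire population (divide by N)
sigma = statistics.pstdev(data)
sigma_squared = statistics.pvariance(data)

# Treating the data as a sample from a larger population (divide by n - 1)
s = statistics.stdev(data)
s_squared = statistics.variance(data)

print(f"population: sigma = {sigma:.3f}, variance = {sigma_squared:.3f}")
print(f"sample:     s     = {s:.3f}, variance = {s_squared:.3f}")
```

In both cases the variance is the standard deviation squared; the sample values come out slightly larger than the population values for the same data.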
The Normal Distribution

• We’ll look at this more later on…
[Figure: Normal Distribution for mean = 0, and std dev = 1/2, 1 and 2. PDF (vertical axis, 0 to 0.9) plotted against X (horizontal axis, -8 to 8), with one curve for each std dev.]

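To see how the standard deviation changes the shape of the bell curve, the sketch below evaluates the standard formula for the normal probability density function (PDF) at the mean for the three standard deviations in the figure. The function name normal_pdf is our own; the formula itself is the usual normal density.

```python
import math

def normal_pdf(x, mean=0.0, std_dev=1.0):
    """Height of the normal curve at x for the given mean and std dev."""
    z = (x - mean) / std_dev
    return math.exp(-0.5 * z * z) / (std_dev * math.sqrt(2 * math.pi))

# Peak height (at x = mean = 0) for each curve in the figure
peaks = {sd: normal_pdf(0.0, mean=0.0, std_dev=sd) for sd in (0.5, 1.0, 2.0)}

for sd, height in peaks.items():
    print(f"std dev = {sd}: peak PDF = {height:.3f}")
```

A smaller standard deviation gives a taller, narrower curve; a larger one gives a shorter, flatter curve, because the total area under each curve is always 1.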
SPSS


• SPSS is high-end statistical analysis software
• You can use your Drexel login to download it free from https://software.drexel.edu/
  - Log in with drexel\ in front of your login name, e.g. "drexel\abc28" and the same password you use for DrexelOne. Navigate to find SPSS version 16, something like https://software.drexel.edu/Students/PCSoftware/SPSS/SPSS16/.
  - Make sure to save the readme.txt file too - it has the serial number and Authorization Code information.
  - Download and run the executable file.
    Version 16 for Mac (~730 MB file)
    Version 16 for PC (~670 MB file)
  - Anything version 10 or later is acceptable

SPSS Introduction

• SPSS is like a spreadsheet or flat file database
  - Each variable has its own column (max. of 50)
  - Each record has its own row (max. of 1500)
    (Limits for Student Edition only)
• Key navigational feature:
  - Use the Data View tab to see the experimental data
  - Use the Variable View tab to see the characteristics of each variable and how they’re displayed in the Data View

SPSS Data View
SPSS Variable View
SPSS Introduction

• Use the Variable View tab to change the characteristics of each variable, such as
  - Type of variable (integer, date, text, etc.)
  - Name of each variable, which was limited to 8 characters, is lower case, and has no spaces
    Recent versions finally removed the 8 character limit
  - Labels for each variable are optional, but they allow a more useful identifier than the Name
    When you select or plot a variable, its Label is shown (if there is one), not its Name
  - Width is how many digits or characters the variable may have

SPSS Introduction
• Variables can have a limited set of allowable Values, such as {0 = Male}, {1 = Female}
• Sort data by selecting Data / Sort Cases…
  - Then select one or more variables to be the “Sort by:” criteria
  - If more than one variable is selected, data will be sorted in that order of precedence

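The same two ideas, coded values with labels and sorting by more than one variable in order of precedence, can be sketched outside SPSS. This is only an analogy in Python, not how SPSS works internally; the cases and variable names are made up.

```python
# Value labels: the stored code on the left, its meaning on the right
value_labels = {0: "Male", 1: "Female"}

# Hypothetical cases, each with a coded "sex" variable and an "age" variable
cases = [
    {"name": "c1", "sex": 1, "age": 30},
    {"name": "c2", "sex": 0, "age": 45},
    {"name": "c3", "sex": 1, "age": 22},
    {"name": "c4", "sex": 0, "age": 22},
]

# Sort by sex first, then by age within each sex (order of precedence)
sorted_cases = sorted(cases, key=lambda c: (c["sex"], c["age"]))

order = [c["name"] for c in sorted_cases]
labels = [value_labels[c["sex"]] for c in sorted_cases]

for c in sorted_cases:
    print(c["name"], value_labels[c["sex"]], c["age"])
```

The first sort variable decides the overall order; the second only breaks ties among cases that share the same value of the first.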
SPSS Introduction

• Can adjust column widths like Excel
  - In Data View, move cursor between column titles (which are the variable Names), and drag the column width left or right, or
  - In Variable View, edit the Columns field
• SPSS data files have an extension of “sav”
• Output is saved separately in files with an extension of “spo”
  - Tabular output of ***** means the column is too narrow; double click to edit, and drag the right edge of the column to the right

Additional References






From Prof. Val Yonker
• Carpenter, R.L., and Vasu, E.S. (1979). Statistical Methods for Librarians. Chicago: American Library Association.
• Cohen, J. and Cohen, P. (1975). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Assoc.
• Hernon, P. (1989). A Handbook of Statistics for Library Decision Making. Norwood, NJ: Ablex Publishing.
• Isaac, S. and Michael, W.B. (1977). Handbook in Research and Evaluation. San Diego: Edits Publishers.
• Keppel, G. (1973). Design and Analysis: A Researcher's Handbook. Englewood Cliffs, NJ: Prentice-Hall.
• Kerlinger, F.N. (1979). Behavioral Research: A Conceptual Approach. New York: Holt, Rinehart, and Winston.

Additional References




• Loether, H.J. and McTavish, D.G. (1980). Descriptive and Inferential Statistics: An Introduction. Boston: Allyn and Bacon.
• Runyon, R.P., and Haber, A. (1984). Fundamentals of Behavioral Statistics (2nd ed.). Reading, MA: Addison-Wesley.
• Selltiz, C.; Wrightsman, L.S.; and Cook, S.W. (1976). Research Methods in Social Relations (3rd ed.). New York: Holt, Rinehart and Winston.
Here’s my favorite:
• Salkind, Neil J., (2007) Statistics For People Who (Think They) Hate Statistics (3rd ed.). Thousand Oaks, CA: Sage Publications. ISBN: 9781412951500