Navigator
IBM SPSS Statistics
(A Vade Mecum for SPSS 19 on Windows 7)
Preface
This is an elementary guide to navigating IBM SPSS v19 on the Windows
platform. Little or no previous knowledge of SPSS is assumed. The guide introduces
the interface and the data-manipulation tools available in IBM SPSS before
embarking on statistical applications and their interpretations. Brief comments on data
properties and the requirements for applying specific statistical techniques are
outlined in the appendix.
September 2011
Royal Holloway
University of London
Further Readings:
Frequently Applied Statistical Techniques - P Pal
Multivariate Statistical Analysis - P Pal
Contents
Working with Input Editor
  Data View
  Variable View
Data Manipulations
Pre-processing of Raw Data
  Compute
  Re-Code
Case Selection
  Simple Condition; Conjugate Condition
Split File
Working with Output Editor
  Controlling Output
  Export Output to another Application
Appendix
  Basic Statistics
  Uncertainties
  Frequently Applied Statistical Techniques
  Mathematical Models
NAVIGATOR IBM SPSS 19
§ Working with the Input Editor
When SPSS is opened, the first window that appears on the screen is the Input Editor
(Data View). At the top of this Editor two bars appear: the standard menu bar and the
standard tool bar.
The main body consists of a spreadsheet with rows and columns. Like most
application packages under the Windows OS, there are arrows on the right-hand and
bottom margins of the SPSS spreadsheet for scrolling it up, down or sideways as
required. At the bottom left-hand corner two tabbed buttons are located, e.g. Data
View and Variable View.
The standard menu bar shows the following menu items:

Item       Used for                                                  Usage
File       opening, saving, printing files                           (extensive)
Edit       copy, paste, editing files                                (extensive)
View       controlling the appearance of the software                (less extensive)
Data       organizing the data file, code labelling of data values   (extensive)
Transform  recode, compute, defining new variables                   (extensive)
Analyze    statistical applications                                  (extensive)
Graphs     drawing different types of graphs                         (extensive)
Add-ons    plug-in additional applications                           (less frequent)
Utilities  accessing command languages                               (less frequent)
Window     activating a particular window                            (seldom)
Help       accessing SPSS help facilities on applications            (extensive)

Exhibit 1 The Input Editor (Data View)
The tool bar below the standard menu bar contains tools which act as shortcuts to
some of the standard menu functions. The specific function of an individual icon may
be displayed by hovering the mouse pointer over it.
Exhibit 1 shows 5 sets of data entered in 3 columns. By default, each value is given
2 decimal places as it is entered. The three active columns containing values are
automatically labelled VAR00001, VAR00002, VAR00003 as values are typed in. The
remaining columns remain passive with blank cells; these columns are labelled var
(faintly displayed).
By clicking on the Variable View tab, the Input Editor is opened showing the
variables.
Exhibit 2. Input Editor Variable View
This window shows the list of variables entered in the Input Editor (Data View) and
information about the types and format including names and other properties of
individual variables, together with their character width, decimal places etc. In the
above Exhibit, notable elements are: 3 variables with their names, their type (i.e.
numeric or string type; here it is numeric type), width (8 character width), decimals (2
decimal places), alignment of data values (right alignment).
§ Manipulation of Input data
Different techniques are available for the manipulation of input data. In the following
sections, a selection of frequently used manipulation methods is outlined. These
include: changing variable names, defining data properties, labelling code values, and
increasing or decreasing decimal places.
Change Variable names:

• Steps to change variable names (header labels)
Anchor the mouse pointer on the required cell in the Name column
(Col 1 of the Variable View Editor);
Type the desired name;
Repeat the process for all the variable cells as required.
Variable names must begin with a letter and have up to 8 characters (without spaces).
Data manipulation:
Usually the width of data values should not require any change. However, the default
of 2 decimal places may need changing.

• Steps to change decimal places in the data values
Anchor the mouse pointer on a cell in the Decimals column (an up/down scroll
button appears);
Click the Up arrow to increase or the Down arrow to decrease the decimal places
in the data values.
• Steps to change value labels
Values in some variables may consist of discrete numeric codes representing class
names. These codes may be given appropriate value labels.
Click on Data in the Menu bar and choose Define Data Properties
(the variable scanning window appears).
Exhibit 3 Variable scanning window
Steps (contd)
Click on the desired variable in the list and transfer it to the right box;
Click on Continue; the next window opens (see Exhibit below);
Click on the Variable appearing on the left;
Anchor the mouse pointer in the Label box (top right-hand corner) and type an
elaborate label as desired;
Type the labels in the boxes next to each Code value;
Finally OK.
Exhibit 4 Value edit window
§ Pre-processing of raw data
Pre-processing of raw data may consist of (i) direct arithmetic calculation, (ii)
application of a built-in function, e.g. the Log10, Ln, exponential or trigonometric
functions, or (iii) application of a statistical probability function, e.g. a CDF
(cumulative distribution function).
Direct calculation may be based on a formula or an equation. The formula expression
may contain variables (entered as input data) and constants.
A variety of built-in functions are offered for transformation of raw data values. Any
specific function may be selected from a list of functions. Data values contained in
selected variable(s) from the input list are used as the argument(s) of the selected
function.
(a) Pre-processing by direct calculation using an expression
Here constant numeric values, data variables, and operator symbols from the numeric
key pad of the Compute dialog box are used.
Example 1: Calculate AGE from a raw data set containing the Month and Year of
birth, with reference to the year 2011.
The suggested formula is
AGE = (2011 – Year) + (12 – Month)/12
Here AGE is the target variable into which the computed result values go, and Year
and Month are provided as raw data input variables (see Exhibit 5).
Exhibit 5 Compute Dialog box
The operators (e.g. +, -, *, /), parentheses, and numeric constants are available from
the key pad displayed in the Compute box; the input variables list appears in the
left-hand pane.

Steps to compute a new variable
Transform
Compute (Compute dialog box appears)
Enter desired name in the Target box
Enter the desired formula to compute in
the Numeric Expression box
Finally OK
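The Compute step in Example 1 can also be sketched outside SPSS. The following Python snippet (an illustrative equivalent, not SPSS syntax; the sample cases are hypothetical) applies the same formula:

```python
def compute_age(year, month, ref_year=2011):
    """Apply the guide's formula: AGE = (ref_year - Year) + (12 - Month)/12."""
    return (ref_year - year) + (12 - month) / 12.0

# Hypothetical raw data: (Year, Month) of birth for three cases.
cases = [(1980, 6), (1975, 1), (1990, 12)]
ages = [compute_age(y, m) for y, m in cases]
```

As in SPSS, the result is a new computed variable (here the list `ages`) derived from the two input variables.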
Note:
In the Transform → Compute process (SPSS 19), entries made from the keyboard are
recorded as a syntax document. This syntax document appears after the final OK is
clicked. The sequence is illustrated with the following simple computational example.
Example: Conversion of Centigrade values to the Fahrenheit scale.
Formula
Fahrenheit = (9/5)*Centigrade + 32
In the Compute process, we choose Fahrenheit as the Target variable, type the
formula in the Numeric Expression box by picking the appropriate symbols and
operators, and finally click OK.
The syntax representation appears on the screen. This is shown in the Exhibit.
Exhibit 6 Syntax file on the foreground and Data file on the background.
The Syntax representation may be cancelled and the results (in this case the
Fahrenheit scale values) appear as a new variable data as follows.
Exhibit 7 The results after pre-processing are shown as the target variable.
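The standard conversion is Fahrenheit = (9/5)*Centigrade + 32, and it can be checked with a short Python sketch (illustrative only; the variable names are hypothetical):

```python
def to_fahrenheit(celsius):
    # Standard conversion: F = (9/5) * C + 32
    return (9.0 / 5.0) * celsius + 32.0

# A few hypothetical Centigrade values and their converted results.
temps_c = [0.0, 100.0, 37.0]
temps_f = [to_fahrenheit(c) for c in temps_c]
```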
(b) Computation by using a function
For computation of any built-in function on the data values, select the required
function from the function group list (top half of the box) and, if needed, a procedure
function from the special variables group (bottom half).
Example 2: Compute (1) Log10 of AGE and (2) the natural log of AGE.
The formulas for these computations are as follows:
LOG_AGE = LG10(AGE)
LNAGE = LN(AGE)
Here LOG_AGE and LNAGE are the target variables for the Log10 and natural log
transformations respectively, and LG10 and LN are the corresponding functions to be
applied to the variable AGE.

Steps
Transform
Compute (Compute dialog box appears)
Enter desired name in the Target box
Enter the desired function in Numeric Expression box
Enter required argument from the variable list
Finally OK.
(For entering a function, click on the up arrow button at the bottom right-hand corner
of the key pad. The function is displayed and requires appropriate arguments. From
the input variable list, highlight the appropriate variable and enter it as the required
argument.)
Exhibit 8 (Using Functions on Variables)
These pre-processed results (after applying chosen functions) appear in the Input
Editor as new Variables with the name that was given as the Target Variable in the
Compute process.
Exhibit 9.(Input Editor displaying Computed Values)
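The LG10 and LN transformations have direct counterparts in most languages. A minimal Python sketch (illustrative, not SPSS syntax; the input values are hypothetical):

```python
import math

def lg10(x):
    """Counterpart of SPSS LG10: base-10 logarithm."""
    return math.log10(x)

def ln(x):
    """Counterpart of SPSS LN: natural logarithm."""
    return math.log(x)

# Hypothetical AGE values; the transformed lists play the role of the
# target variables LOG_AGE and LNAGE.
ages = [31.5, 21.0, 36.9]
log_age = [lg10(a) for a in ages]
ln_age = [ln(a) for a in ages]
```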
§ Re-Code
The Re-code tool is used to hold newly specified (re-coded) values derived from an
existing variable. The re-code mechanism allows you to form a categorical variable
from an existing (old) variable containing continuous data.
Steps to re-code
Transform
Re-code (select the Into Different Variables option)
(the Re-code dialog box appears)
Exhibit 8 (Re-code Dialog box)

Steps (Contd).
Select the variable to be re-coded from the input variable list
Type an output variable in the Output variable box and click Change
Click Old and New values button
(Old and New values box appears)
Exhibit 10. (Old and New values Dialog box)
In this box three range selection options are available:
(i) Range: ____ through ____
(ii) Range: Lowest through ____ (value)
(iii) Range: ____ (value) through Highest
Steps (contd)
Enter the limits of each range in the appropriate blank boxes;
Type a desired value in the New Value box, and click Add
(this new value is displayed in the Old → New box);
Continue (return to the previous dialog box);
Finally OK.
Exhibit 11
The re-coded values appear in a new column in the Input Editor.
Exhibit 11
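The re-code operation amounts to mapping ranges of a continuous variable onto category codes. A minimal Python sketch (illustrative, not SPSS syntax; the cut-points are hypothetical choices for the example):

```python
def recode_age(age):
    """Map a continuous age onto category codes.
    Hypothetical ranges for illustration:
      lowest through 25  -> 1
      26 through 35      -> 2
      36 through highest -> 3
    """
    if age <= 25:
        return 1
    elif age <= 35:
        return 2
    else:
        return 3

# The re-coded values form the new (output) variable.
ages = [23.0, 19.0, 33.0, 42.0, 26.0]
age_groups = [recode_age(a) for a in ages]
```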
§ Selection of a sub-set of data
It is often necessary to analyse the values contained in a variable by sub-groups. The
sub-sets are specified in terms of some categorical variable, so two variables are
involved in this process. The Select Cases tool selects only the desired sub-set from
the full data file; the selection may be based on simple conditions or compound
conditions. (Analysis of the full file by sub-groups is achieved with the Split File tool,
which organises the output by the specified sub-groupings; see the Split Files section
below.)
(a) Case selection with simple condition
Example 3. Given a sample of candidates with their gender and marital status, select
(a) the female candidates
(b) the female candidates who are married.
Calculate (a) the average age of the candidates (b) the average age of male and (c)
average age of female candidates.
Gender  Maritals  Age
1       2         23.00
2       2         19.00
2       2         33.00
2       2         42.00
1       1         26.00
1       1         28.00
1       2         39.00
2       2         27.00
2       2         18.00
2       1         31.00
1       2         23.00
2       2         19.00
2       2         33.00
2       2         42.00
1       1         26.00
1       1         28.00
1       2         39.00
2       2         27.00
2       2         18.00
2       1         31.00

Codes:
Gender: male = 1, female = 2
Maritals: Unmarried = 1, Married = 2
Select female cases only (i.e. Gender = 2)
Steps
Data – choose Select Cases (the Select Cases window appears);
This window shows the list of all the variables entered (left-hand pane),
with All Cases selected as the default;
Choose the If condition is satisfied option by clicking in the adjacent circle
to open the next window (see Exhibit).
Exhibit 12 (Select Cases Conditions Dialog box). The exhibit shows all cases are
selected.
Exhibit 13 (Condition applied)
Steps (contd)
Choose the appropriate variable (for the given example, this is Gender);
Transfer it to the right-hand side box;
Paste the = symbol, followed by 2 (i.e. Gender = 2);
Continue (return to the previous window);
This window now shows the condition next to the If condition is satisfied option;
Finally OK.
The condition is applied in the Input Editor: the selected cases are retained and the
remaining cases are crossed out (see the left-hand margin and the filter_$ column).
Exhibit 14 (Input Editor showing the selection with a simple condition)
(b) Case selection with compound condition
Using the same example, select the cases of female and married candidates.
This is a compound condition. In order to apply it, two variables are needed:
(i) Gender, from which the condition Gender = 2 is employed, and
(ii) Maritals, from which Maritals = 2 is employed.
The two separate conditions are combined with the & conjunction. See the Exhibit
below.
Exhibit 15 (Selection with conjugate conditions)
As a result, the Input Editor displays all cases which satisfy the compound condition
and crosses out the remaining cases. See the margin and the filter_$ column of the
Editor.
Exhibit 16 (Input Editor showing the sub-set selected with conjugate condition)
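Both selections amount to filtering rows on one or two conditions. A Python sketch using the first ten cases of the example data (codes as in the guide; illustrative, not SPSS syntax):

```python
# Cases as (gender, maritals, age); codes: male = 1, female = 2;
# unmarried = 1, married = 2.
cases = [
    (1, 2, 23.0), (2, 2, 19.0), (2, 2, 33.0), (2, 2, 42.0), (1, 1, 26.0),
    (1, 1, 28.0), (1, 2, 39.0), (2, 2, 27.0), (2, 2, 18.0), (2, 1, 31.0),
]

# Simple condition: Gender = 2
females = [c for c in cases if c[0] == 2]

# Compound (conjugate) condition: Gender = 2 & Maritals = 2
married_females = [c for c in cases if c[0] == 2 and c[1] == 2]
```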
§ Split Files
It is often necessary to analyse data from a data file (full set) by different categories
(sub-sets). This is performed by specifying the sub-sets with an appropriate defining
variable which contains the category codes.
Example: Given a sample of 10 male and female candidates with their ages, calculate
(a) the average age of all the candidates,
(b) the average age of the male candidates, and
(c) the average age of the female candidates.
Note: (b) and (c) are split-file cases.
Gender  Age
1       42.00
2       34.00
2       32.00
2       23.00
1       28.00
1       31.00
1       43.00
2       36.00
1       27.00
1       42.00
Steps to Split file
From the Data menu select Split File (Split File box appears)
Choose Organize Output by groups and then OK
Note: Data file is sorted by the grouping variable
Exhibit 17
20
Results (a) Average of all candidates
Exhibit 18 Average age of all the candidates calculated
Average age is 33.80
(b) Average age of Male and Female candidates
Exhibit 20 (The results on the top window)
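The split-file calculation amounts to a grouped mean. A Python sketch over the example data (illustrative only; SPSS produces the same means in its output tables):

```python
genders = [1, 2, 2, 2, 1, 1, 1, 2, 1, 1]
ages = [42.0, 34.0, 32.0, 23.0, 28.0, 31.0, 43.0, 36.0, 27.0, 42.0]

# (a) average of all candidates
overall_mean = sum(ages) / len(ages)

# (b) and (c): organise output by groups, i.e. mean age per gender code
# (1 = male, 2 = female).
groups = {}
for g, a in zip(genders, ages):
    groups.setdefault(g, []).append(a)
group_means = {g: sum(v) / len(v) for g, v in groups.items()}
```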
Appendix
§ Measurement scales
The majority of statistical analyses involve simple arithmetic operations on the data
values. However, the types of admissible operations depend on the measurement
scale of the data. Measurement scales of data values are of the following types:

Scale                 Data type    Category
Nominal               Discrete     Non-parametric
Ordinal/Rank/Likert   Discrete     Non-parametric
Interval/Ratio        Continuous   Parametric
Relative strength of the 3 measurement scales
Weak <-----------------------------------------> Strong
Nominal      Ordinal/Rank/Likert      Interval/Ratio
Transformation from Interval/Ratio to Ordinal scale is possible by grouping Interval
scale values into discrete groups within ranges defined by the researcher (cf. the
Re-code application).
Transformation from Ordinal scale to Interval scale is done by applying various
transformations to the Ordinal scale data, e.g. the log, square root or inverse
transform.
Admissible Statistical Analyses
In general, the non-parametric variables (Nominal or Ordinal/Rank scale) define
categories. Hence these variables are called categorical or classification or grouping
or factor variables, containing the so-called non-metric or discrete data. Parametric
variables contain actual measurements or observations which represent metric or
continuous data.
Various statistical techniques that are applied on the data are dependent on the
measurement scale of the data values. A list of admissible statistical techniques is
given below.
Non-parametric
  Nominal: Mode; Frequency; Non-parametric tests; Contingency coefficient
  Ordinal: Median; Frequency; Non-parametric tests; Spearman's correlation
Parametric
  Ratio/Interval: Summary parameters (Mean, Variance, Skewness, Kurtosis);
  Inferential statistics (ANOVA, MANOVA, Pearson's correlations, linear and
  non-linear regressions)
§ Summary statistics
Statistical parameters are computed with a view to describing (or summarizing)
the properties of sampled data. Each parameter is a single-valued quantity. In
most cases, these parameters are computed for continuous scale data, with a few
for discrete data. A selection of these parameters is given below.
Definitions and Formulas of Summary Parameters for a sample

Mode:
Mode is the most frequently occurring value in a collection of
observations. There may be more than one mode in a set of observations.
(Nominal)

Median: The median is the middle value in an ordered range of observations;
there are exactly the same number of observations greater than and less than
the median. With N observed values,
if N is odd, the median is the value in position (N+1)/2, and
if N is even, the median is the average of the values in positions N/2 and N/2 + 1.
(Ordinal/Rank)
•
Arithmetic Mean X̄: The (sample) mean of a sample of N observations
X1, X2, ..., XN is given by
X̄ = (1/N) * Σ Xi   (summing over i = 1 to N)
(Interval/Ratio)

Range: Range is the difference between the maximum and minimum
values of N observations, i.e.
Range = Xmax - Xmin
•
Variance v: The variance of N observations represents the spread of the
observed data set about the mean and is expressed as
v = Σ (Xi − X̄)² / (N − 1)   (summing over i = 1 to N)
(Interval/Ratio)

Standard Deviation s: The standard deviation is given by the square root of the
variance:
s = √v
(Interval/Ratio)
•
Skewness Sk: Skewness represents the presence or absence of symmetry of the
distribution of the data set about the mean and is given by
Sk = Σ (Xi − X̄)³ / (N * s³)
(Interval/Ratio)

Sk zero   Symmetrical distribution
Sk -ve    Left skewed
Sk +ve    Right skewed
Kurtosis Kr: Kurtosis represents the peakedness of a distribution.
Kr = Σ (Xi − X̄)⁴ / (N * s⁴)
(Interval/Ratio)
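The summary formulas above can be collected into one small Python function (a sketch following the guide's definitions; note that statistical packages often report bias-corrected variants, and some subtract 3 from the kurtosis):

```python
import math

def summary(xs):
    """Sample mean, variance (N-1 denominator), standard deviation, and the
    guide's skewness and kurtosis (moments divided by N * s**power)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)
    return mean, var, sd, skew, kurt

# Hypothetical observations for illustration.
mean, var, sd, skew, kurt = summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```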
§ Summary statistics contd.

Quartiles: Quartiles divide a data set into quarters (4 equal parts)
The first quartile, Q1, is the median of the portion of data set that lies at or below
the median of the entire data set
The second quartile, Q2 is the median of the entire data set
The third quartile Q3, is the median of the portion of data set that lies at or above
the median of the entire data set.
Note: The entire data set is arranged in ascending order for quartile evaluation.

Five number summary: Five number summary is the list of five summary
measures including the minimum, Q1, Q2 (median), Q3 and maximum of
the data set.
Note: The entire data set is arranged in ascending order for five number
evaluation.
A Box and Whisker plot gives a graphical representation of a data set based on the
five number summary.
(Diagram: a box spanning Q1 to Q3 with the median marked inside, and whiskers
extending to the minimum and maximum.)
The B and W plot also gives a visual indication of the skewness of the distribution
of the data set: equal areas of the box on either side of the median indicate a
symmetrical (skew = 0) distribution of the data.

Percentiles: A percentile of a specific value K (Kth Percentile) is the value
of the data at or below which lies the K percent of observations.
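The quartile and five number summary definitions can be sketched in Python (quartile conventions vary between packages; this sketch takes Q1 and Q3 as medians of the lower and upper halves of the sorted data):

```python
def median(xs):
    """Median per the guide: middle value, or average of the two middle values."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2.0

def five_number_summary(xs):
    """(min, Q1, median, Q3, max) of the data set, sorted ascending first."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]        # portion below the median
    upper = s[(n + 1) // 2:]   # portion above the median
    return min(s), median(lower), median(s), median(upper), max(s)

# Using the ten ages from the Split Files example.
summary5 = five_number_summary(
    [42.0, 34.0, 32.0, 23.0, 28.0, 31.0, 43.0, 36.0, 27.0, 42.0]
)
```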
§ Uncertainties associated with sample data
Confidence Interval (CI)
A confidence interval is a range of values, wa to wb, which is expected to include a
statistical parameter θ (i.e. wa ≤ θ ≤ wb).
A sampling distribution is associated with a statistical parameter (e.g. mean,
correlation coefficient, regression coefficient and so on) calculated from a sample data
set, and the Central Limit Theorem states that this distribution is normal, with the
well-known bell shape and two tails (areas of error) on either side. By specifying a
confidence level α (traditionally α is set at 95%) for the sample parameter, it is
necessary to exclude the regions of error. This region is 1 – α taking both sides (two
tails) of the distribution; in other words, the area of error on each side is (1 – α)/2.
The area of exclusion is usually looked up from a table showing values at the desired
error level (a t-distribution table).
a) Calculation of the CI for a mean (m)
•
Find the value for the error region (look up t0.025 from the t-distribution table).
This value is 1.96 at the 95% confidence level, i.e. at the 5% error level (for large
samples).
•
Multiply the standard error (SE) of the mean by this value, which gives the
required interval.
Note: the SE of a mean m is s/√N, where s is the standard deviation of the sample
and N the sample size.
Thus the CI of a sample mean m is
CI of m = m ± t0.025 * SE of m
        = m ± 1.96 * (s/√N)
b) Calculation of the CI for a proportion from a dichotomous set of values
Consider a sample of size N with dichotomous values. The proportion p of values in
one dichotomy sub-set is f1/N, and in the other sub-set 1 – p.
•
Calculate the standard error SE of the proportion p. This is given by
SE of a proportion = √( p(1 – p) / N )
The CI of a sample proportion p at the 95% level is
CI of p = p ± t0.025 * SE of p
        = p ± t0.025 * √( p(1 – p) / N )
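Both confidence-interval calculations can be sketched in Python (using the large-sample value t0.025 ≈ 1.96, as in the text):

```python
import math

def ci_mean(mean, sd, n, t=1.96):
    """95% CI for a sample mean: mean ± t * (s / sqrt(N))."""
    se = sd / math.sqrt(n)
    return mean - t * se, mean + t * se

def ci_proportion(p, n, t=1.96):
    """95% CI for a proportion: p ± t * sqrt(p * (1 - p) / N)."""
    se = math.sqrt(p * (1.0 - p) / n)
    return p - t * se, p + t * se

# Hypothetical examples: mean 100 with s = 10 over N = 25 cases,
# and a 50% proportion over N = 100 cases.
lo_m, hi_m = ci_mean(100.0, 10.0, 25)
lo_p, hi_p = ci_proportion(0.5, 100)
```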
§ Inferential Statistics
The key objective of statistical analysis with a set of data is to draw some valid
inference from it. Two computational steps are involved in deriving such inference.
(a) Calculation of an appropriate test measure of the relevant hypothesis and
(b) Estimation of the confidence interval.
Inferential statistics comprises a wide range of tests which enable researchers to draw
inferences from their sample data sets.
If an inference about a full population is to be drawn from a sample, certain
conditions are to be satisfied. Associated with every statistical test there are two basic
components, which specify the conditions for an appropriate inference:
i) a measurement requirement, and
ii) a model.
Parametric and Non-parametric Tests:
A parametric test is applied to a set of data measured on an Interval or Ratio scale
(continuous) or on counts (discrete). It is generally assumed that the data conform to
some distribution, e.g. Normal or Poisson.
A non-parametric test is applied to a data set on an Ordinal scale (e.g. ranked data) or
on a Nominal scale. In non-parametric tests, conformity of the data to any distribution
is not assumed.
Power Efficiency of Parametric and Non-Parametric Tests
In general, inferences of a more general type are drawn from statistical tests which
demand weaker assumptions about the data. However, a test of the null hypothesis
through such an application is less powerful for any particular sample of size N under
consideration. This is true of any non-parametric test compared with its parametric
counterpart.
Thus, when two different tests (Test A and Test B) are applied to two samples of sizes
Na and Nb, where Nb > Na, the less powerful test B may be made as powerful as
test A by the larger sample.
Efficiency and Sample size
The extent of increase in the sample size needed to make one test (test B, say) as
powerful as another (test A, say) is expressed by the power efficiency, defined in
terms of the sample sizes used in the respective tests:
Power efficiency of test B = Na/Nb
Using the above formula, it is easy to calculate the sample size needed for test B in
order to raise its power to the level of test A.
Considering the parametric and non-parametric tests for comparing the means of two
groups (i.e. Student's t-test and the Mann-Whitney test), the power efficiency of the
Mann-Whitney test is 95% relative to the t-test.
Similarly, among the parametric and non-parametric versions of the analysis of
variance (ANOVA), the power efficiency of the Kruskal-Wallis (non-parametric) test
is again 95% relative to the F-test (parametric).
The same level of efficiency may be achieved for the two types of test (parametric
and non-parametric) by drawing an appropriate sample size for each. For example, a
researcher needs a sample of 20 cases for a Mann-Whitney test to be able to reject Ho
with the same level of confidence as 19 cases for a t-test. Similarly, a sample of size
20 is needed in a Kruskal-Wallis test to be able to reject Ho for every 19 cases in an
application of the F-test.
§ Hypothesis testing steps
•
A no-difference statement is set up as the starter (Ho, the null hypothesis). The
implicit aim of a null hypothesis is to negate what you wish to demonstrate with
your sample data set.
•
Apply the appropriate statistical technique, which produces the test statistic value,
degrees of freedom (df), and probability p (sig-value).
•
Compare the test statistic's p (sig) value with the chosen significance level.
If the p value is low (conventionally < 0.05), reject Ho, with residual uncertainty
proportional to p; otherwise accept Ho.
Significance level (p): The dichotomy
p < 0.05?   yes → Reject Ho
            no  → Accept Ho
§ Common Errors
The common errors in arriving at a decision about Ho are of two types:
Type I Error: to reject Ho when in fact it is true.
The probability associated with a Type I error is alpha.
Type II Error: to accept Ho when in fact it is false.
The probability associated with a Type II error is beta.
The two probabilities alpha and beta are inversely related: if alpha is increased, beta
is decreased.
These errors originate from the sample size N of the data set, and are reduced by
increasing the sample size N.
§ Specific Parametric Tests.
Simple ANOVA applications

ANOVA (One Way or Univariate)
Data specification:
2 variables - one test variable which includes all the test scores (Interval/Ratio scale)
and one categorical variable (Ordinal or Nominal scale) which defines the groups
(sub-sets) of data on the test variable.
The key quantity computed in ANOVA is the F-statistic, together with the degrees of
freedom and the sig-value.
The F-statistic (a single-valued result) is the ratio of the between-group (b-g) variation
to the within-group (w-g) variation present in the test variable, each sum of squares
being divided by its degrees of freedom:
F = [SS(b-g) / df(b-g)] / [SS(w-g) / df(w-g)]
Both the numerator and the denominator are single-valued scalar quantities for
ANOVA applications, and hence the F-statistic is a single-valued quantity.
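The F computation can be sketched in Python for a small hypothetical data set (each sum of squares is divided by its degrees of freedom before taking the ratio):

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group vs within-group
    variation, each sum of squares divided by its degrees of freedom."""
    all_vals = [x for g in groups for x in g]
    n = len(all_vals)
    k = len(groups)
    grand_mean = sum(all_vals) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        (x - sum(g) / len(g)) ** 2 for g in groups for x in g
    )
    ms_between = ss_between / (k - 1)   # df = k - 1
    ms_within = ss_within / (n - k)     # df = n - k
    return ms_between / ms_within

# Three hypothetical groups of test scores.
f = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```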
Simple MANOVA applications
GLM Multiple Analysis of Variance (MANOVA)
The MANOVA technique is applied to analyse variance with 2 or more test variables
(in contrast to ANOVA). These test variables, containing continuous data, are treated
as the dependent variables, and the categorical variable(s) as independent variable(s).
MANOVA computes the effects of the categorical (factor) variables on the dependent
variables through the F-statistic. Normality assumptions on the test variables are not
stringent.
In MANOVA with multiple test variables, the between-group (b-g) and within-group
(w-g) variations are not single-valued scalar quantities but matrices. These matrices
are known as the SSCP(b-g) and SSCP(w-g) matrices respectively.
The latent roots of the matrix product represent the so-called eigenvalues. The
F-statistic calculation from these eigenvalues proceeds along the following route:
SSCP(b-g) matrix / SSCP(w-g) matrix → computes eigenvalues λ → computes Wilks Λ
The following equations relate the eigenvalues (λi) with Wilks' Λ and the F-statistic:
Wilks Λ = Π 1/(1 + λi)   (product over i)
F = [(1 – Λ)/Λ] * [(n1 + n2 – p – 1)/p]
where n1 and n2 represent the numbers of cases in the two groups and p is the number
of test variables.
Other commonly used MANOVA test indices are also computed from the
eigenvalues. These are given below.
Pillai's Trace = Σ λi/(1 + λi)
Hotelling's Trace = Σ λi
Roy's max root = λmax/(1 + λmax)
Specific MANOVA Applications

Variability of 2 or more test variables with 1 or more factor variables
Data Specification:
(i) 2 or more test variables (Continuous) and
(ii) 1 or more categorical (factor) variables (discrete). The discrete variable may
include 2 or more levels

Repeated measures MANOVA (Between groups)
Data specification: Sub-sets of test variables (Interval/Ratio scale) are to be entered
in a 2-dimensional matrix format. The first element of the matrix represents the main
factor and the second element denotes the factor levels (the levels indicate repetition
in time).
•
Repeated measures MANOVA (mixed Within-Between groups)
Sub-sets of the test variable (Interval/Ratio scale) are to be entered in a 1-dimensional
matrix form with factor levels (the levels indicate repetition in time).
§ BI-VARIATE RELATIONSHIPS
Pearson's Product-Moment Correlation (Parametric)
Data specification: 2 or more variables, Ratio/Interval measurement scale.
Example: correlation matrix for 3 variables A, B, C. The correlation coefficients
appear as a matrix:
     A    B    C
A   AA   AB   AC
B   BA   BB   BC
C   CA   CB   CC
The matrix is symmetric: AB = BA; AC = CA; BC = CB.
Range of the correlation coefficient r: -1 ≤ r ≤ +1
Qualitative description of the strength of r
Range            Relationship
±0.1 to ±0.3     weak
±0.31 to ±0.5    medium
±0.51 to ±0.7    moderate
±0.71 to ±1.0    strong
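The Pearson coefficient itself is straightforward to compute; a Python sketch with hypothetical data:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between two variables."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Perfectly increasing and perfectly decreasing hypothetical pairs.
r_pos = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_neg = pearson_r([1.0, 2.0, 3.0], [6.0, 4.0, 2.0])
```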
§ MATHEMATICAL MODELS
An aspect of rationality is the possibility of going beyond purely intuitive
judgements to the use of structural models that provide methods of
aggregating intuitive judgements.
P Suppes
Lucien Stern Professor of Philosophy
Stanford University California.
§ General Linear Model Family
GLM: Regression Analysis, Discriminant Analysis, Factor Analysis,
Principal Component Analysis
§ GLM Prediction Model
Simple Linear Regression
A simple linear regression is formed with one dependent variable y and one
independent variable x1, and the equation reduces to the form:
y = constant + b1 * x1
The coefficient b1 represents the slope of the regression line, and the constant is the
intercept of the regression line with the vertical axis. The coefficient may have a
positive (+ve) or a negative (-ve) value. A +ve coefficient means that as the
independent variable x increases, the dependent variable y also increases; a -ve
coefficient means that as x increases, y decreases. The magnitude of the coefficient
gives a measure of the steepness of the regression line.
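The least-squares estimates of the constant and slope can be sketched in Python (hypothetical data chosen to lie exactly on y = 1 + 2x):

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = constant + b1 * x; returns (constant, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    constant = my - b1 * mx  # the line passes through (mean x, mean y)
    return constant, b1

constant, b1 = simple_linear_regression(
    [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
)
```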
Multiple Linear Regression
Regression is an example of a GLM structure with the objective of prediction.
In multiple regression, a linear equation is constructed by least-squares fitting using
multiple independent (explanatory) variables X1, X2, X3, ... and one dependent
variable Y. The equation is expressed as follows:
Y = constant + b1*X1 + b2*X2 + b3*X3 + ...
In the above equation, the coefficients (b1, b2, etc.) associated with the independent
variables are partial regression coefficients. Each coefficient should be regarded as
an estimator of the measure of influence of a particular independent variable on the
dependent variable. These coefficients may be +ve or -ve.
Data Specification
 Usually all the variables (independent and dependent) contain continuous
(Interval/Ratio scale) data.
 In many regression applications, some of the independent variables may
contain Ordinal /Rank scale data.
 In special applications (e.g. regression used as a classification model), the
dependent variable may contain categorical data (Nominal or Ordinal/Rank scale)
Terms and Definitions
Partial Regression Coefficient - The coefficient associated with each independent
variable (e.g. b1, b2 etc) is called the partial regression coefficient. These are the
measures of influence of the independent variables on the dependent variable. These
are unstandardised coefficients
Standardised Coefficients - Owing to the presence of different scales of values in the
independent variables, it is necessary to transform the unstandardised coefficients
to produce standardised coefficients, which give the measure of the relative influence
of each regression coefficient in comparable units. This is given by
β = b * (Sx / Sy)
where β is the standardised coefficient, b is the unstandardised coefficient for a
particular independent variable X and the dependent variable Y, and Sx and Sy are
the respective standard deviations.
Coefficient of Multiple Determination R² - This is the measure of the proportion of
the variation in the dependent variable explained by the independent variables.
§ The Ballantine for Multiple Regression
The Ballantine model provides a diagrammatic representation of the influence of the
explanatory variables on the dependent variable. The variables involved (dependent
and independent) in the multivariate regression analysis are depicted as mutually
overlapping circles. The overlap sections of these circles represent the extent of the
partial contributions of the independent variables to the dependent variable and
among themselves.
(Diagram: three overlapping circles representing Y, X1 and X2, with four overlap
regions labelled A, B, C and D.)
Three circles represent the 3 variables Y, X1 and X2. There are 4 overlap regions,
shown as A, B, C, and D.
'A' represents the overlap between X1 and Y (independent of X2), equivalent to b1.
'B' represents the overlap between X2 and Y (independent of X1), equivalent to b2.
'C' represents the overlap between X1, X2 and Y.
The ratio of the sum 'A' + 'B' + 'C' to the total area of the 'Y' circle is equivalent to R²,
the coefficient of multiple determination.
'D' represents the overlap between X1 and X2 (independent of Y) and is equivalent to
the partial correlation of X1 and X2.