Introduction to
Intensive Longitudinal Methods
Larry R. Price, Ph.D.
Director
Interdisciplinary Initiative for Research Design & Analysis
Professor – Psychometrics & Statistics
What are intensive
longitudinal methods?
Methods used in natural settings that involve
many repeated data captures
(measurements) over time within a person.
For example, daily diaries, interaction
records, ecological momentary
assessments, real-time data capture.
The term intensive longitudinal methods is
an umbrella term that includes the above
types of data structures.
Areas of notable growth in
the use of the methods
Interpersonal processes in dyads and
families.
For example, an examination of the link
between daily pleasant and unpleasant
behaviors (over 14 days) and global ratings
of marital satisfaction.
The study of dyads and family processes
poses unique data-analytic challenges.
Why use intensive
longitudinal methods?
Can be used to quantitatively study thoughts,
feelings, physiology, and behavior in their natural,
spontaneous contexts or settings.
The data that result show an unfolding temporal
process (a) descriptively and (b) as a causal
process.
For example, it is possible to show how an outcome
Y changes over time and how this change is
contingent on changes in a causal variable X.
Difference from traditional
repeated measures analysis
Usually limited to a few repeated measurements
taken over long time intervals.
We often use dynamic factor analysis models to
study complex change over time in this type of
scenario.
Intensive longitudinal methods offer a framework
for analyzing intrapersonal change (i.e., within-person
processes) with extensive or rich outcome data.
Advantages of use
Measurement may be continuous or
discrete based on a response to an
experimental or observational condition
captured over many time points in a short
total duration.
Important for studying intraindividual and
interindividual change within a unified
model.
Measuring a process over time
For example, Time 4 is regressed on Time 3, Time 3 is regressed on
Time 2, and Time 2 is regressed on Time 1. This yields an autoregressive
structure or time series vector. In intensive longitudinal methods,
these time series vectors are nested within individual persons,
yielding a hierarchical structure. Thus the need for a random
coefficients mixed model approach (e.g., using SPSS MIXED, SAS
PROC MIXED, or the HLM program).
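As a minimal sketch of this nested structure (illustrative values only, in Python rather than the mixed-model software named above), the snippet below simulates an AR(1) series for each of several persons, each with its own random intercept: the kind of hierarchical, within-person data a random coefficients model would fit.

```python
import numpy as np

rng = np.random.default_rng(42)
n_persons, T, phi = 10, 25, 0.6  # illustrative sample size, series length, AR(1) weight

series = np.empty((n_persons, T))
for i in range(n_persons):
    intercept = rng.normal(0.0, 1.0)   # person-specific (random) intercept
    y = np.empty(T)
    y[0] = intercept + rng.normal()
    for t in range(1, T):
        # time t regressed on time t-1: the autoregressive structure
        y[t] = intercept + phi * y[t - 1] + rng.normal()
    series[i] = y

print(series.shape)  # (10, 25): time points nested within persons
```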
Advantages of use
Because a variable X measured at one point in
time may have its maximal causal effect on Y at a
later time point,
the precise temporal design of a longitudinal study
can greatly influence the observed effects.
We want to avoid using a between-subjects
analysis to analyze these types of data.
Example – fMRI data structure
Small Sample Properties of Bayesian
Multivariate Autoregressive
Time Series Models
(Structural Equation Modeling Journal, 2012)
Relationship of measurement
model to autoregressive model
Multivariate vector
autoregressive (MAR) time series
model
A MAR model predicts the next value in a
d-dimensional time series as a linear function of the p
previous vector values of the time series.
The MAR model is based on the Wishart
distribution, where V is a p × p symmetric, positive-definite
matrix of random variables.
Multivariate Vector
Autoregressive Model
[Path diagram legend: blue = contemporaneous; brown = cross-covariances; red = autoregression of time (t) on t-1.]
Goals of the present study
Bayesian MAR model formulation capturing
contemporaneous and temporal
components.
Evaluation of the effect of variations of the
autoregressive and cross-lagged
components of a multivariate time series
across sample size conditions where N is
smaller than T (i.e., N ≤ 15 and T = 25–125).
Goals of the present study
To examine the impact of sample size and
time vector length within the sampling
theory framework for statistical power and
parameter estimation bias.
To illustrate an analytic framework that
combines Bayesian statistical inference with
sampling theory (frequentist) inference.
Goals of the present study
Compare and relate Bayesian credible
interval estimation based on the Highest
Posterior Density (HPD) to frequentist
power estimates and Type I error.
Research Challenges
Sample size and vector length
determination for statistically reliable and
valid results – Bayesian and Frequentist
considerations.
Modeling the structure of the multivariate
time series contemporaneously and
temporally.
Research Challenges
Examining the impact of autocorrelation,
error variance, and cross-correlation/covariance
on multivariate models in light of variations
in sample size and time series length.
Introduction to Bayesian
probability & inference
Bayesian approach prescribes how learning
from data and decision making should be
carried out; the classical school does not.
Prior information via distributional
assumptions of variables and their
measurement is considered through careful
conceptualization of the research design,
model, and analysis.
Introduction to Bayesian
probability & inference
Stages of model development:
1. Data Model: [data|process, parameters] - specifies
the distribution of the data given the process.
2. Process Model: [process|parameters] - describes
the process, conditional on other parameters.
3. Parameter Model: [parameters] - accounts for
uncertainty in the parameters.
Introduction to Bayesian
probability & inference
Through the likelihood function, the actual
observations modify prior probabilities for the
parameters.
So, given the prior distribution of the parameters
and the likelihood function of the parameters given
the observations, the posterior distribution of the
parameters given the observations is determined.
The Bayesian posterior distribution is the full
inferential solution to the research problem.
Introduction to Bayesian
probability & inference
[Figure: prior (solid), likelihood (dashed), and posterior (dot-dashed) densities plotted over 𝜃.]
The dashed line represents the likelihood, with 𝜃 at its maximum at approximately .22, given the
observed frequency distribution of the data. Applying Bayes theorem involves multiplying the prior density
(solid curve) by the likelihood (dashed curve).
If either of these two values is near zero, the resulting posterior density will also be negligible
(i.e., near zero; for example, for 𝜃 < .2 or 𝜃 > .6). Finally, the posterior density (the dotted-dashed line)
is more informative than either the prior or the likelihood alone.
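A minimal numerical sketch of this multiplication, using a hypothetical beta prior and binomial data chosen so the likelihood peaks near .22 as in the figure:

```python
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)          # grid over the parameter
prior = stats.beta.pdf(theta, a=4, b=8)          # "solid curve": prior belief
likelihood = stats.binom.pmf(5, 22, theta)       # "dashed curve": peaks near 5/22 ≈ .23
posterior = prior * likelihood                   # Bayes theorem, up to a constant
posterior /= posterior.sum() * (theta[1] - theta[0])  # normalize to integrate to 1

print(round(theta[np.argmax(likelihood)], 3))    # likelihood maximum, ≈ .22
print(round(theta[np.argmax(posterior)], 3))     # posterior mode: between prior and likelihood peaks
```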
Bayesian vs. Frequentist
probability
Frequentist (a.k.a. long-run frequency)
– Estimates the probability of the data (D) under a
point null hypothesis (H0), or p(D|H0)
– A large-sample theory with adjustments for small and
non-normal samples (e.g., t-distribution and χ²
tests/rank tests)
Bayesian (a.k.a. conditional or subjective)
– Evaluates the probability of the hypothesis (not
always only the point) given the observed data, p(H|D),
and that a parameter falls within a specific
credible interval.
– Used since the 1950s in computer science, artificial
intelligence, economics, and medicine.
Application of Bayesian
modeling and inference
Data obtained from the real world may
– Be sparse
– Exhibit multidimensionality
– Include unobservable variables
Examples: IRT & CAT; bioinformatics.
The 2-Level hierarchical
voxel-based fMRI model
1 voxel = 3 × 3 × 3 mm
Hierarchical Bayesian SEM in
functional neuroimaging (fMRI)
Bayesian probability &
inference
Advantages
– Data that are less precise (i.e., less reliable) will
have less influence on the subsequent
plausibility of a hypothesis.
– The impact of initial differences in the perceived
plausibility of a hypothesis tend to become less
important as results accumulate (e.g.,
refinement of posterior estimates via MCMC
algorithms and Gibbs sampling).
Bayesian probability &
inference
Advantages
– Sample size is not an issue due to MCMC algorithms.
– Level of measurement is not problematic (see the sketch after this list):
  – Interval estimates use the cumulative normal distribution function (CDF).
  – Nominal/dichotomous and ordinal measures use the probit, logistic, or log-log link function to map to a CDF.
  – Uses the prior probability of each category given little/no information about an item.
  – Categorization produces a posterior probability distribution over the possible categories given a description of an item.
  – Bayes theorem plays a critical role in probabilistic learning and classification.
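A small sketch of the link-function idea named above, mapping hypothetical linear-predictor values onto probabilities via a CDF:

```python
import numpy as np
from scipy import stats

eta = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # hypothetical linear predictor values
probit = stats.norm.cdf(eta)                   # probit link: Phi(eta), the normal CDF
logistic = 1.0 / (1.0 + np.exp(-eta))          # logistic link: 1 / (1 + e^{-eta})

print(np.round(probit, 3))
print(np.round(logistic, 3))
```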
Example: parameters &
observations
Recall that the goal of parametric statistical
inference is to make statements about unknown
parameters that are not directly observable, from
observable random variables the behavior of which
is influenced by these unknown parameters.
The model of the data generating process specifies
the relationship between the parameters and the
observations.
If x represents the vector of n observations, and 𝜣
represents a vector of k parameters, 𝜃1, 𝜃2, 𝜃3, …, 𝜃k, on
which the distribution of the observations depends…
Example: parameters &
observations
Then, inserting 𝜣 in place of y yields

p(\Theta \mid x) = \frac{p(x \mid \Theta)\, p(\Theta)}{p(x)}

So, we are interested in making statements about
parameters (𝜣), given a particular set of
observations x.
p(x) serves as a constant that allows p(𝜣|x) to
sum or integrate to 1.
Bayes theorem is also given as p(\Theta \mid x) \propto p(x \mid \Theta)\, p(\Theta),
where \propto means "proportional to".
Example: parameters &
observations
In the previous slide, p(𝜣) serves as "a distribution of
belief" in Bayesian analysis and is the joint
distribution of the parameters prior to the
observations being applied.
p(x|𝜣) is the joint distribution of the observations
conditional on values of the parameters.
p(𝜣|x) is the joint distribution of the parameters
posterior to the observations becoming available.
So, once the data are obtained, p(x|𝜣) is viewed as
a function of 𝜣 and results in the likelihood function for
𝜣 given x, i.e., L(𝜣|x).
Example: parameters &
observations
Finally, it is through the likelihood function that the
actual observations modify prior probabilities for
the parameters.
So, given the prior distribution of the parameters
and the likelihood function of the parameters given
the observations, the posterior distribution of the
parameters given the observations is determined.
The posterior distribution is the full inferential
solution to the research problem.
Univariate unrestricted VAR
time series model
A time series process for each variable contained in
the vector y is

y(t) = A(L)\, y(t) + X(t)\, \beta + u(t), \qquad E\!\left[ u(t)\, u(t)' \right] = \Sigma, \quad t = 1, \ldots, T.
Univariate unrestricted VAR
time series model
where:
– y(t) = n × 1 stationary vector of variables observed at time t;
– A(L) = n × n matrix of polynomials in the lag or backward shift operator L;
– X(t) = n × nk block diagonal matrix of observations on k observed variables;
– β = nk × 1 vector of coefficients on the observed variables;
– u(t) = n × 1 vector of stochastic disturbances;
– Σ = n × n contemporaneous covariance matrix.
Univariate unrestricted VAR
time series model
Also:
– The coefficient on L^0 is zero for all elements of A(L)
(i.e., only the lagged elements of y appear on the
right side of the equation).
– The X(t) matrix is equal to y'(t) \otimes I_n, where y(t) is the
k × 1 vector of observations on the k variables related to each
equation and \otimes is the matrix Kronecker product.
Multivariate vector
autoregressive time series model
A MAR model predicts the next value in a
d-dimensional time series as a linear function of the p
previous vector values of the time series.
The MAR model is based on the Wishart
distribution, where V is a p × p symmetric, positive-definite
matrix of random variables.
Multivariate vector
autoregressive time series model
The MAR model divides each time series into two
additive components: (a) the predictable portion of the
time series and (b) the prediction error (i.e., a white noise
error sequence).
The errors are Gaussian with zero
mean and precision (inverse covariance) matrix Λ.
Multivariate vector
autoregressive time series model
The model with N variables can be expressed in
matrix format as

y_n = \sum_{i=1}^{M} \begin{bmatrix} a'_{11}(i) & \cdots & a'_{1N}(i) \\ \vdots & \ddots & \vdots \\ a'_{N1}(i) & \cdots & a'_{NN}(i) \end{bmatrix} \begin{bmatrix} y_1(n-i) \\ \vdots \\ y_N(n-i) \end{bmatrix} + \begin{bmatrix} e_1(n) \\ \vdots \\ e_N(n) \end{bmatrix}

or, compactly,

y_n = \sum_{i=1}^{M} a'(i)\, x(n-i) + e(n)
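A minimal sketch of this equation with hypothetical coefficients: each new vector observation is a linear function of the previous vector value (lag order M = 1) plus white noise.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1],        # autoregressive weights on the diagonal,
              [0.1, 0.7]])       # cross-lagged weights off the diagonal
T, d = 200, 2
y = np.zeros((T, d))
for n in range(1, T):
    # predictable portion A @ y(n-1) plus white-noise prediction error
    y[n] = A @ y[n - 1] + rng.normal(scale=0.3, size=d)
```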
Multivariate Vector
Autoregressive Time Series
Model
The multivariate prediction error filter is expressed
as

e(n) = \sum_{i=0}^{M} a(i)\, x(n-i)

where a(0) = I and a(i) = -a'(i).
Multivariate vector
autoregressive time series model
The model can also be written as a standard
multivariate linear regression model as

y_n = x_n W + e_n

where x_n = [y_{n-1}, y_{n-2}, \ldots, y_{n-p}] are the p previous
multivariate time series samples and W is a (p × d)-by-d matrix of MAR coefficients (weights).
There are therefore k = p × d × d MAR coefficients.
Multivariate vector
autoregressive time series model
If the nth rows of Y, X, and E are y_n, x_n, and e_n
respectively, then for the n = 1, …, N samples, the
equation is

Y = XW + E

where Y is an N-by-d matrix, X is an N-by-(p × d)
matrix, and E is an N-by-d matrix.
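A sketch of how this stacked form can be assembled in practice; the helper below (a hypothetical construction on toy data, not the study's series) builds X from the p previous samples so that Y = XW + E lines up row by row.

```python
import numpy as np

def lagged_design(y, p):
    """Rows x_n = [y_{n-1}, ..., y_{n-p}] flattened; targets are y_n."""
    T, d = y.shape
    X = np.hstack([y[p - i - 1 : T - i - 1] for i in range(p)])  # N x (p*d)
    Y = y[p:]                                                    # N x d
    return X, Y

y = np.random.default_rng(1).normal(size=(50, 3))  # toy series, d = 3
X, Y = lagged_design(y, p=2)
print(X.shape, Y.shape)  # (48, 6) (48, 3): N = T - p samples
```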
Multivariate vector
autoregressive time series model
For the multivariate linear regression model, with the
data set D = {X, Y}, the likelihood of the data is
given as

p(D \mid W, \Lambda) = (2\pi)^{-dN/2}\, |\Lambda|^{N/2} \exp\!\left( -\tfrac{1}{2}\, \mathrm{Tr}\!\left( \Lambda\, E_D(W) \right) \right)

where |\cdot| is the determinant and \mathrm{Tr}(\cdot) the trace, and

E_D(W) = (Y - XW)^{\mathsf{T}} (Y - XW)

is the error matrix.
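A sketch evaluating the log of this likelihood for given W and Λ, with toy inputs standing in for real data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 48, 3
X = rng.normal(size=(N, 2 * d))      # toy lagged design (p = 2)
W = rng.normal(size=(2 * d, d))      # hypothetical coefficient matrix
Y = X @ W + rng.normal(scale=0.5, size=(N, d))
Lam = np.eye(d) / 0.25               # precision = inverse error covariance

E = Y - X @ W
ED = E.T @ E                                   # E_D(W) = (Y - XW)'(Y - XW)
sign, logdet = np.linalg.slogdet(Lam)          # log|Lam|, numerically stable
loglik = (-0.5 * d * N * np.log(2 * np.pi)
          + 0.5 * N * logdet
          - 0.5 * np.trace(Lam @ ED))          # log of the likelihood above
print(loglik)
```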
Multivariate vector
autoregressive time series model
To facilitate matrix operations, the vec notation is
given as w = \mathrm{vec}(W), which denotes the columns of W being stacked
on top of each other.
To recover the matrix W, columns are "unstacked"
from the vector w.
This matrix transformation is a standard method for
implicitly defining a probability density over a
matrix.
Multivariate vector
autoregressive time series model
The maximum likelihood (ML) solution for the MAR
coefficients is

W_{ML} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} Y

and the ML error covariance, S_{ML}, is estimated as

S_{ML} = \frac{1}{N-k}\, E_D(W_{ML}) = \frac{1}{N-k}\, (Y - X W_{ML})^{\mathsf{T}} (Y - X W_{ML})

where k = p × d × d.
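A sketch of these two estimators on toy data (not the study's series); note the N - k denominator with k = p × d × d:

```python
import numpy as np

rng = np.random.default_rng(4)
p, d, N = 2, 3, 48
X = rng.normal(size=(N, p * d))                  # toy lagged design matrix
Y = X @ rng.normal(size=(p * d, d)) + rng.normal(scale=0.5, size=(N, d))

k = p * d * d                                    # number of MAR coefficients
W_ml = np.linalg.solve(X.T @ X, X.T @ Y)         # (X'X)^{-1} X'Y
resid = Y - X @ W_ml
S_ml = resid.T @ resid / (N - k)                 # ML error covariance estimate
print(W_ml.shape, S_ml.shape)                    # (6, 3) (3, 3)
```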
Multivariate vector
autoregressive time series model
Again, to facilitate matrix operations, the vec
notation is given as w = \mathrm{vec}(W), which denotes the columns of W being
concatenated.
This matrix transformation is a standard method for
implicitly defining a probability density over a
matrix.
Multivariate vector
autoregressive time series model
Using the vec notation, we define w_{ML} = \mathrm{vec}(W_{ML}).
Finally, the ML parameter covariance matrix for w_{ML} is
given as

\Sigma_{ML} = S_{ML} \otimes (X^{\mathsf{T}} X)^{-1}

where \otimes denotes the Kronecker product
(Anderson, 2003; Box & Tiao, 1973).
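A sketch of this Kronecker construction with toy dimensions (p = 2, d = 3, so k = 18):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(48, 6))        # toy lagged design, N = 48, p*d = 6
S_ml = np.eye(3) * 0.3              # toy d x d error covariance

# covariance of the stacked coefficient vector vec(W_ML)
Sigma_ml = np.kron(S_ml, np.linalg.inv(X.T @ X))
print(Sigma_ml.shape)               # (18, 18) = (p*d*d) x (p*d*d)
```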
The Population Model :
Bayesian SEM
Unknown quantities (e.g., parameters) are viewed
as random.
These quantities are assigned a probability
distribution (e.g., Normal, Poisson, Multinomial,
Beta, Gamma, etc.) that details a generating
process for a particular set of data.
In this study, our unknown population parameters
were modeled as random and then assigned
a joint probability distribution.
The Population Model :
Bayesian SEM
The sampling-based approach to Bayesian
estimation provides a solution for the random
parameter vector by estimating the posterior density
of a parameter.
This posterior distribution is defined as the product
of the likelihood function and the prior density
updated via Gibbs sampling and evaluated by
MCMC methods.
The Population Model :
Bayesian SEM
The present study includes Bayesian and
frequentist sampling theory: a "dual approach"
(Mood & Graybill, 1963; Box & Tiao, 1973).
In the dual approach, Bayesian posterior
information is used to suggest functions of the data
for use as estimators.
Inferences are then made by considering the
sampling properties of the Bayes estimators.
The Population Model :
Bayesian SEM
The goal is to examine the congruency between the
inferences obtained from a Monte Carlo study
based on the population Bayes estimators and the
full Bayesian solution relative to sample size and
time vector length.
Data generation & methods
Five multivariate autoregressive (AR1)
series vectors of varying lengths (5) and
sample sizes (5) were generated using SAS
v.9.2.
Autoregressive coefficients: .80, .70, .65,
.50, and .40.
Error variance components: .20, .20, .10,
.15, and .15.
Cross-lagged coefficients: .10, .10, .15,
.10, and .10.
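A sketch of this generation step in Python rather than the SAS v9.2 actually used; the coefficients are the ones listed above, but the cross-lag wiring (each series receiving a cross-lag from its neighbor) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(5)
ar = np.array([0.80, 0.70, 0.65, 0.50, 0.40])       # autoregressive coefficients
err_var = np.array([0.20, 0.20, 0.10, 0.15, 0.15])  # error variance components
cross = np.array([0.10, 0.10, 0.15, 0.10, 0.10])    # cross-lagged coefficients

T = 125
y = np.zeros((T, 5))
for t in range(1, T):
    noise = rng.normal(scale=np.sqrt(err_var))
    # own lag plus an (assumed) cross-lag on the neighboring series
    y[t] = ar * y[t - 1] + cross * np.roll(y[t - 1], 1) + noise
```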
Bayesian Vector Autoregressive
Model
[Path diagram legend: blue = contemporaneous; brown = cross-covariances; red = autoregression of time (t) on t-1.]
Design
Study design: completely crossed 5 (sample
size) × 5 (time series vector length).
Sample size conditions: N = 1, 3, 5, 10, 15.
Time series conditions: T = 25, 50, 75, 100,
125.
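The 25 crossed conditions can be enumerated directly; a small sketch:

```python
from itertools import product

sample_sizes = [1, 3, 5, 10, 15]
series_lengths = [25, 50, 75, 100, 125]
conditions = list(product(sample_sizes, series_lengths))
print(len(conditions))  # 25 completely crossed conditions
```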
Bayesian model
development
The structure of the autocorrelation, error, and
cross-correlation processes was selected
to examine a wide range of plausible
scenarios in behavioral science.
Number and direction of paths were
determined given the goals of the study and
previous work using the Occam’s Window
Model Selection Algorithm (Madigan &
Raftery, 1994; Price, 2008).
Bayesian model
development
Optimal model selection was based on
competing Bayes factors (i.e.,
p(data|M1)/p(data|M2)) and Deviance
Information Criterion (DIC) indices.
Development of the multivariate BVAR-SEM
proceeded by modeling the population
parameters as unknown but following
a random walk with drift.
Bayesian Model
Development
BVAR-SEM estimation of the population
models proceeded by using an approximately
noninformative (weakly vague), diffuse normal prior distribution
for structural regression weights and variance
components.
This prevents the possible introduction of parameter
estimation bias due to the potential for poorly
selected priors in situations where little prior
knowledge is available (Lee, 2007, p. 281;
Jackman, 2000).
Bayesian model priors
Semi-conjugate priors for parameters:
θ ~ multivariate normal N(0, 4).
Precision for the covariance matrix: Σ ~ inverse
Wishart distribution (a multivariate generalization
of the chi-square distribution).
Priors & Programming
These priors were selected based on (a) the
distributional properties of lagged vector
autoregressive models in normal linear
models following a random walk with drift,
and (b) their suitability for complex models
with small samples.
Bayesian estimation was conducted using
Mplus, v 6.2 (Muthén & Muthén, 2010).
Model convergence
After the MCMC burn-in phase (N = 1,000),
posterior distributions were evaluated using
time series and autocorrelation plots to
judge the behavior of the MCMC
convergence (N = 100,000 samples).
Posterior predictive p statistics ranged
between .61 and .68 for all 25 analyses.
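A sketch of the autocorrelation diagnostic mentioned above, applied to a toy, well-mixed chain after discarding a burn-in of 1,000 draws:

```python
import numpy as np

def acf(chain, max_lag=20):
    """Sample autocorrelations of an MCMC chain at lags 1..max_lag."""
    x = chain - chain.mean()
    var = x @ x
    return np.array([x[:-k] @ x[k:] / var for k in range(1, max_lag + 1)])

draws = np.random.default_rng(6).normal(size=101_000)  # toy stand-in for MCMC output
chain = draws[1_000:]                                  # drop the burn-in phase
print(np.round(acf(chain, 5), 3))                      # values near 0 suggest good mixing
```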
Step 2: Monte Carlo study
Monte Carlo simulation provides an empirical
way to observe the behavior of a given statistic
or statistics across a particular number of
random samples.
This part of the investigation focused on
examining the impact of length of time series
and sample size on the accuracy (i.e.,
parameter estimation bias) of the model to
recover the Bayes population parameter
estimates.
Monte Carlo Study
The second phase proceeded by generating data
using parameter estimates derived from the
Bayesian population model incorporating a lag-1
multivariate normal distribution for each of the
following conditions: (a) T = 25, N = 1, 3, 5, 10, 15;
(b) T = 50, N = 1, 3, 5, 10, 15; (c) T = 75, N = 1, 3,
5, 10, 15; (d) T = 100, N = 1, 3, 5, 10, 15; (e) T =
125, N = 1, 3, 5, 10, 15.
Mplus v. 6.2 was used to conduct the N = 1,000-replication
Monte Carlo study and derive power and bias
estimates.
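A conceptual sketch of this Monte Carlo logic (not the Mplus run itself): a simple univariate AR(1) estimator stands in for the full BVAR-SEM, with the autoregressive value and error variance borrowed from the study's C vector.

```python
import numpy as np

rng = np.random.default_rng(7)
phi_true, T, reps = 0.65, 75, 1000     # C-vector AR weight, series length, replications
estimates, significant = [], 0
for _ in range(reps):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi_true * y[t - 1] + rng.normal(scale=np.sqrt(0.10))
    x, z = y[:-1], y[1:]
    phi_hat = (x @ z) / (x @ x)                          # OLS estimate of the AR weight
    se = np.sqrt(np.mean((z - phi_hat * x) ** 2) / (x @ x))
    estimates.append(phi_hat)
    significant += abs(phi_hat / se) > 1.96              # rejects H0: phi = 0?

print(f"% bias: {100 * (np.mean(estimates) - phi_true) / phi_true:.2f}")
print(f"power:  {significant / reps:.2f}")
```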
Results: Over all conditions
When power estimates for parameters were < .80,
Bayesian credible intervals contained the null value
of zero 78% of the time.
Parameter estimates displayed power <
.80 in 51% of estimates at sample size N = 1.
A parameter value of zero was contained in the
95% Bayesian credible interval in 51% of the
estimates at sample size N=1.
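A sketch of the interval check behind these counts, using an equal-tailed interval as a stand-in for the HPD interval reported in the study, on hypothetical posterior draws:

```python
import numpy as np

def interval_contains_zero(draws, level=0.95):
    """Does the central (1 - level)/2 .. (1 + level)/2 interval contain zero?"""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return lo <= 0.0 <= hi

draws = np.random.default_rng(8).normal(loc=0.15, scale=0.12, size=10_000)  # toy posterior
print(interval_contains_zero(draws))  # True: zero remains credible for this parameter
```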
[Path diagram legend: blue = contemporaneous; brown = cross-covariances; red = autoregression of time (t) on t-1.]
Results: Path E on A
When the regression of path E on A exhibited
power < .80, population values of zero were also
observed within the Bayesian 95% credible interval
in 72% of sample size conditions.
Overall, a parameter value of zero was contained in
the 95% Bayesian credible interval in 84% of the
conditions.
Results: Path E on B
When the regression of path E on B exhibited
power < .80, Bayesian credible intervals contained
the null value of zero in 56% of the conditions.
A parameter value of zero was contained in the
95% Bayesian credible interval in 88% of the
conditions.
[Path diagram legend: blue = contemporaneous; brown = cross-covariances; red = autoregression of time (t) on t-1.]
Results: Path D on B
When the regression of path D on B exhibited
power < .80, Bayesian credible intervals contained
the null value of zero in 56% of the conditions.
A parameter value of zero was contained in the
95% Bayesian credible interval in 88% of the
conditions.
[Path diagram legend: blue = contemporaneous; brown = cross-covariances; red = autoregression of time (t) on t-1.]
Results: Path E on B
The regression of path E on B exhibited power <
.80 in 32% of conditions.
This occurred 24% of the time in the N=1 and N=3
sample size conditions for time series up to T = 75.
A parameter value of zero was contained in the
95% Bayesian credible interval in 24% of the
conditions.
T = 25 Condition

| Path              | N=1 Pop | Mean  | M.S.E. | % bias | power | N=3 Pop | Mean  | M.S.E. | % bias | power |
|-------------------|---------|-------|--------|--------|-------|---------|-------|--------|--------|-------|
| SERIESA ← SERIESB | -1.14   | -1.14 | 0.071  | -0.72  | 0.98  | -1.14   | -1.14 | 0.018  | -0.52  | 1.00  |
| SERIESC ← SERIESB | 1.60    | 1.60  | 0.053  | 0.11   | 1.00  | 1.63    | 1.62  | 0.015  | -0.25  | 1.00  |
| SERIESD ← SERIESB | 0.65    | 0.65  | 0.301  | -1.06  | 0.26* | 0.62    | 0.62  | 0.082  | 0.00   | 0.61* |
| SERIESE ← SERIESB | -0.16   | -0.15 | 0.077  | -4.46  | 0.09* | -0.18   | -0.17 | 0.023  | -7.10  | 0.20* |
| SERIESD ← SERIESC | 0.98    | 1.00  | 0.224  | 2.24   | 0.58* | 1.00    | 1.00  | 0.065  | 0.00   | 0.97  |
| SERIESA ← SERIESC | 0.50    | 0.49  | 0.058  | -1.82  | 0.60* | 0.50    | 0.50  | 0.015  | 0.00   | 0.98  |
| SERIESE ← SERIESC | 0.77    | 0.75  | 0.051  | -1.96  | 0.90  | 0.75    | 0.75  | 0.014  | 0.00   | 1.00  |
| SERIESD ← SERIESE | -0.34   | -0.34 | 0.193  | -0.18  | 0.12* | -0.37   | -0.38 | 0.059  | 3.27   | 0.36* |
| SERIESA ← SERIESE | -0.08   | -0.08 | 0.045  | -7.41  | 0.06* | -0.07   | -0.07 | 0.014  | 0.00   | 0.08* |
| SERIESD ← SERIESA | 1.78    | 1.77  | 0.044  | -0.48  | 1.00  | 1.79    | 1.80  | 0.014  | 0.21   | 1.00  |

Red = the null value θ0 lying in the credible interval for parameter θ; the null hypothesis therefore cannot be rejected (i.e., the null hypothesis remains credible). Blue = power < .80 (marked here with an asterisk).
T = 25 Condition

| Path              | N=5 Pop | Mean  | M.S.E. | % bias | power | N=10 Pop | Mean  | M.S.E. | % bias |
|-------------------|---------|-------|--------|--------|-------|----------|-------|--------|--------|
| SERIESA ← SERIESB | -1.14   | -1.13 | 0.010  | -0.35  | 1.00  | -1.14    | -1.14 | 0.005  | -0.26  |
| SERIESC ← SERIESB | 1.61    | 1.61  | 0.009  | -0.25  | 1.00  | 1.61     | 1.61  | 0.004  | -0.25  |
| SERIESD ← SERIESB | 0.64    | 0.64  | 0.048  | 0.63   | 0.84  | 0.64     | 0.63  | 0.020  | -0.94  |
| SERIESE ← SERIESB | -0.15   | -0.14 | 0.014  | -2.91  | 0.26* | -0.17    | -0.16 | 0.007  | -6.95  |
| SERIESD ← SERIESC | 1.00    | 1.00  | 0.036  | -0.01  | 0.99  | 1.00     | 1.00  | 0.017  | -0.01  |
| SERIESA ← SERIESC | 0.50    | 0.50  | 0.008  | -0.36  | 1.00  | 0.49     | 0.49  | 0.004  | -0.61  |
| SERIESE ← SERIESC | 0.75    | 0.75  | 0.008  | -0.27  | 1.00  | 0.76     | 0.76  | 0.017  | 0.40   |
| SERIESD ← SERIESE | -0.34   | -0.35 | 0.034  | 4.13   | 0.46* | -0.36    | -0.36 | 0.004  | 0.84   |
| SERIESA ← SERIESE | -0.08   | -0.07 | 0.008  | -4.00  | 0.12* | -0.08    | -0.07 | 0.004  | -4.00  |
| SERIESD ← SERIESA | 1.80    | 1.80  | 0.008  | 0.00   | 1.00  | 1.80     | 1.80  | 0.004  | 0.00   |

Red = the null value θ0 lying in the credible interval for parameter θ; the null hypothesis therefore cannot be rejected (i.e., the null hypothesis remains credible). Blue = power < .80 (marked here with an asterisk).
T = 50 Condition

| Path              | N=1 Pop | Mean  | M.S.E. | % bias | power | N=3 Pop | Mean  | M.S.E. | % bias | power |
|-------------------|---------|-------|--------|--------|-------|---------|-------|--------|--------|-------|
| SERIESA ← SERIESB | 1.54    | 1.55  | 0.066  | 0.78   | 1.00  | 1.54    | 1.53  | 0.023  | -0.34  | 1.00  |
| SERIESC ← SERIESB | 1.12    | 1.13  | 0.022  | 0.40   | 1.00  | 1.12    | 1.12  | 0.007  | -0.03  | 1.00  |
| SERIESD ← SERIESB | 0.42    | 0.42  | 0.102  | 0.17   | 0.29* | 0.41    | 0.40  | 0.032  | -3.21  | 0.62* |
| SERIESE ← SERIESB | 1.11    | 1.10  | 0.031  | -0.79  | 1.00  | -0.11   | -0.10 | 0.010  | -3.89  | 0.20* |
| SERIESD ← SERIESC | -0.82   | -0.83 | 0.082  | 1.58   | 0.83  | -0.82   | -0.83 | 0.030  | 1.58   | 0.99  |
| SERIESA ← SERIESC | -1.45   | -1.44 | 0.023  | -0.69  | 1.00  | -1.45   | -1.45 | 0.008  | -0.54  | 1.00  |
| SERIESE ← SERIESC | 0.64    | 0.64  | 0.022  | -0.64  | 0.98  | 0.63    | 0.62  | 0.007  | -1.60  | 1.00  |
| SERIESD ← SERIESE | 0.38    | 0.38  | 0.080  | 1.86   | 0.27* | 0.39    | 0.38  | 0.025  | -1.29  | 0.66* |
| SERIESA ← SERIESE | 0.17    | 0.18  | 0.022  | 2.69   | 0.23* | 0.17    | 0.18  | 0.006  | 2.69   | 0.57* |
| SERIESD ← SERIESA | 1.62    | 1.62  | 0.021  | -0.07  | 1.00  | 1.64    | 1.65  | 0.007  | 0.54   | 1.00  |
T = 75 Condition

| Path              | N=1 Pop | Mean  | M.S.E. | % bias   | power | N=3 Pop | Mean  | M.S.E. | % bias   | power |
|-------------------|---------|-------|--------|----------|-------|---------|-------|--------|----------|-------|
| SERIESA ← SERIESB | 1.08    | 1.12  | 0.021  | 3.54     | 1.00  | 1.10    | 1.12  | 0.006  | 1.45     | 1.00  |
| SERIESC ← SERIESB | 1.08    | 1.06  | 0.016  | -1.85    | 1.00  | 1.09    | 1.06  | 0.006  | -2.99    | 1.00  |
| SERIESD ← SERIESB | 0.01    | 0.09  | 0.345  | 841.00   | 0.23* | 0.01    | 0.08  | 0.014  | 742.00   | 0.25* |
| SERIESE ← SERIESB | 0.35    | 0.33  | 0.046  | -3.48    | 0.40* | 0.36    | 0.33  | 0.016  | -6.93    | 0.83  |
| SERIESD ← SERIESC | 0.31    | 0.22  | 0.038  | -27.66   | 0.39* | 0.30    | 0.23  | 0.015  | -23.33   | 0.75* |
| SERIESA ← SERIESC | -0.13   | -0.09 | 0.021  | -32.33   | 0.14* | -0.08   | -0.06 | 0.006  | -21.25   | 0.17* |
| SERIESE ← SERIESC | 0.88    | 0.96  | 0.053  | 9.32     | 0.99  | 0.93    | 0.96  | 0.015  | 3.83     | 1.00  |
| SERIESD ← SERIESE | 0.01    | 0.21  | 0.078  | 2021.00  | 0.36* | 0.01    | 0.20  | 0.049  | 1894.00  | 0.65* |
| SERIESA ← SERIESE | 0.01    | -0.36 | 0.140  | -3746.00 | 0.94  | 0.01    | -0.39 | 0.154  | -4003.00 | 1.00  |
| SERIESD ← SERIESA | 0.01    | -0.38 | 0.167  | -3900.00 | 0.88  | 0.01    | 0.08  | 0.170  | 695.00   | 1.00  |
T = 125 Condition

| Path              | N=1 Pop | Mean  | M.S.E. | % bias | power | N=3 Pop | Mean  | M.S.E. | % bias | power |
|-------------------|---------|-------|--------|--------|-------|---------|-------|--------|--------|-------|
| SERIESA ← SERIESB | 1.02    | 1.02  | 0.03   | -0.01  | 1.00  | 1.00    | 0.99  | 0.008  | -1.30  | 1.00  |
| SERIESC ← SERIESB | 1.33    | 1.32  | 0.01   | -0.08  | 1.00  | 1.33    | 1.32  | 0.003  | -0.75  | 1.00  |
| SERIESD ← SERIESB | 0.73    | 0.73  | 0.04   | -0.55  | 0.96  | 0.74    | 0.73  | 0.012  | -1.35  | 1.00  |
| SERIESE ← SERIESB | 0.05    | 0.05  | 0.01   | 0.67   | 0.08* | 0.05    | 0.05  | 0.003  | 0.00   | 0.18* |
| SERIESD ← SERIESC | -0.52   | -0.53 | 0.03   | 1.46   | 0.82  | -0.51   | -0.51 | 0.010  | 0.79   | 0.99  |
| SERIESA ← SERIESC | -1.47   | -1.47 | 0.01   | -0.14  | 1.00  | -1.44   | -1.43 | 0.002  | -0.42  | 1.00  |
| SERIESE ← SERIESC | 0.17    | 0.17  | 0.01   | 0.83   | 0.46* | 0.16    | 0.16  | 0.002  | 0.00   | 0.87  |
| SERIESD ← SERIESE | 0.86    | 0.87  | 0.03   | 0.23   | 0.99  | 0.86    | 0.87  | 0.009  | 0.70   | 1.00  |
| SERIESA ← SERIESE | 0.09    | 0.09  | 0.01   | 1.08   | 0.17* | 0.10    | 0.10  | 0.003  | -1.02  | 0.46* |
| SERIESD ← SERIESA | 1.54    | 1.54  | 0.01   | 0.26   | 1.00  | 1.52    | 1.53  | 0.003  | 0.66   | 1.00  |
Conclusions
When the autoregressive value drops below .70 and the
error variance below .20 (i.e., below the values in vectors A and B),
problems related to parameter bias and statistical power
consistently occur, regardless of sample size and vector
time length.
Percent bias in parameter estimates (> 5%) and/or
power (< .80) was particularly problematic in paths
involving the C, D, and E vectors across all sample size
conditions and all time vector conditions.
Recall that the C, D and E vectors included
autoregressive coefficients of .65 (σ2 =.10), .50 (σ2 =.10)
and .40 (σ2 =.15).
Conclusions
One-sided Bayesian hypothesis tests (i.e., H1: θ > 0)
conducted by using the posterior probability of the null
hypothesis concurred with frequentist sampling theory
power estimates.
However, when we move beyond hypothesis tests by
modeling the entire distribution of a
parameter, the Bayesian approach is more informative
(important for applied research).
This settles the question of where we are in the
parameter dimension given a particular probability value
(e.g., we can make probabilistic statements with greater
accuracy).
Conclusions
The Bayesian MAR provides a fully Bayesian
solution to multivariate time series models by
modeling the entire distribution of parameters – not
merely point estimates of distributions.
The Bayesian approach is more sensitive than the
frequentist approach because it models the full
distribution of the parameter dimension for time
series where the autoregressive components are
< .80.
THANK YOU FOR YOUR ATTENTION!
References
Anderson, T. W. (2003). An Introduction to Multivariate
Statistical Analysis (3rd ed.). New York: Wiley.
Ansari, A., & Jedidi, K. (2000). Bayesian factor analysis for
multilevel binary observations. Psychometrika, 64, 475–496.
Box, G. E. P., & Tiao, G. C. (1973). Bayesian Inference in Statistical
Analysis. Reading, MA: Addison-Wesley.
Chatfield, C. (2004). The Analysis of Time Series: An
Introduction (6th ed.). New York: Chapman & Hall.
References
Congdon, P. D. (2010). Applied Bayesian Hierarchical Methods.
Boca Raton: Chapman & Hall/CRC Press.
Gelman, A., Carlin, J. B., Stern, H., & Rubin, D.B. (2004).
Bayesian Data Analysis, 2nd ed. Boca Raton: CRC/Chapman
Hall.
Jackman, S. (2000). Estimation and inference via Bayesian
simulation: An introduction to Markov chain Monte Carlo.
American Journal of Political Science, 44(2), 375–404.
Lee, S. Y. (2007). Structural Equation Modeling: A Bayesian
Approach. New York: Wiley.
Lee S.Y., & Song X.Y. (2004). Bayesian model comparison of
nonlinear structural equation models with missing continuous
and ordinal data. British Journal of Mathematical and
Statistical Psychology, 57, 131-150.
References
Lütkepohl, H. (2006). New Introduction to Multiple Time Series
Analysis. Berlin: Springer-Verlag.
Price, L.R., Laird, A.R., Fox, P.T. (2009). Neuroimaging network
analysis using Bayesian model averaging. Presentation at The
International Meeting of the Psychometric Society, Durham,
New Hampshire.
Price, L.R., Laird, A.R., Fox, P.T., & Ingham, R., (2009). Modeling
dynamic functional neuroimaging data using structural
equation modeling. Structural Equation Modeling: A
Multidisciplinary Journal, 16, 147-162.
Scheines, R., Hoijtink, H., & Boomsma, A. (1999). Bayesian
estimation and testing of structural equation models.
Psychometrika, 64, 37–52.
Muthén, L. K., & Muthén, B. O. (2010). Mplus Version 6.0 [Computer
program]. Los Angeles, CA: Muthén & Muthén.