Applying bootstrap methods to
time series and regression
models
β€œAn Introduction to the Bootstrap” by Efron and Tibshirani, chapters 8-9
M.Sc. Seminar in statistics, TAU, March 2017
By Yotam Haruvi
1
The general problem
β€’ So far, we've seen so-called one-sample problems.
β€’ Our data was an 𝑖.𝑖.𝑑. sample from a single, unknown distribution 𝐹.
β€’ Note that 𝐹 may have been multidimensional.
β€’ We generated bootstrap samples from the estimate 𝐹̂, which gave each observation a probability of 1/𝑛.
β€’ But not all datasets comply with this simple probabilistic structure…
β€’ Examples?
2
Real world: an unknown probabilistic model 𝑃 generates the observed data 𝒙 = (π’™πŸ , … , 𝒙𝒏), from which we compute the statistic of interest πœƒΜ‚ = 𝑆(𝒙).
Bootstrap world: an estimated probabilistic model 𝑃̂ generates a bootstrap sample π’™βˆ— = (π’™πŸβˆ— , … , π’™π’βˆ—), from which we compute the bootstrap replication πœƒΜ‚βˆ— = 𝑆(π’™βˆ—).
We will focus today on this part: how the estimated probabilistic model 𝑃̂ is extracted from the observed data.
3
Agenda
We wish to extend the bootstrap method to other, more complex
data structures:
β€’ Time series
β€’ Regression
We will review several ad-hoc bootstrap methods for each of the
structures above.
But we'll start with a simple example: the two-sample problem.
4
Two-sample problem – the framework
β€’ For example, blood pressure measurements in treatment and placebo
groups.
β€’ Let 𝑛 denote the number of patients in the treatment group.
β€’ Let π‘š denote the number of patients in the placebo group.
β€’ Our data: (𝑧1 , … , 𝑧𝑛 , 𝑦1 , … , π‘¦π‘š ).
β€’ This isn’t a one-sample problem, since 𝒛 and π’š may come from
different distributions.
5
Two-sample problems – a bootstrap solution
β€’ The extension of the bootstrap is simple:
β€’ We denote the distribution of blood pressure in the treatment group by 𝐹. That is, 𝐹 is the distribution that generated 𝒛.
β€’ Similarly, 𝐺 is the distribution that generated π’š.
β€’ We estimate 𝐹 and 𝐺 separately.
β€’ 𝐹̂ gives probability 1/𝑛 to each of 𝑧1 , … , 𝑧𝑛 and 𝐺̂ gives probability 1/π‘š to each of 𝑦1 , … , π‘¦π‘š.
6
Two-sample problems – a bootstrap solution
β€’ A single bootstrap sample contains 𝑛 draws from 𝐹̂ and π‘š draws from 𝐺̂.
β€’ For each bootstrap sample, we calculate the difference of means π‘§Μ„βˆ— βˆ’ π‘¦Μ„βˆ—.
β€’ We can then estimate the standard error of the difference of means.
β€’ Which parameters from the β€œreal world” were mapped with no change to the bootstrap world? What is the justification for that choice?
7
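As a quick sketch of this procedure (my own illustration, not code from the book; the function name and the synthetic blood-pressure numbers are made up), the two-sample bootstrap SE of the mean difference can be written in a few lines of Python:

```python
import numpy as np

rng = np.random.default_rng(0)

def two_sample_boot_se(z, y, B=2000):
    """Bootstrap SE of the mean difference z-bar minus y-bar.
    Each bootstrap sample draws n values from z and m values from y,
    independently and with replacement, mirroring the separate
    estimates of F and G."""
    z, y = np.asarray(z), np.asarray(y)
    reps = np.empty(B)
    for b in range(B):
        z_star = rng.choice(z, size=len(z), replace=True)
        y_star = rng.choice(y, size=len(y), replace=True)
        reps[b] = z_star.mean() - y_star.mean()
    return reps.std(ddof=1)

# Synthetic blood-pressure measurements (illustration only):
z = rng.normal(120, 10, size=30)   # treatment group, n = 30
y = rng.normal(125, 10, size=25)   # placebo group, m = 25
se = two_sample_boot_se(z, y)
print(se)
```

Note that 𝑛 and π‘š are carried over unchanged from the real world: each bootstrap sample preserves the two group sizes.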
Time series
β€’ A time series 𝑦1 , … , 𝑦𝑁 is a dataset for which we have reason to believe that if 𝑑1 and 𝑑2 are β€œclose enough”, then 𝑦𝑑1 and 𝑦𝑑2 are also β€œclose”.
β€’ Example: measuring the level of a hormone in one subject, every 10 minutes, during an 8-hour time window.
β€’ We assume that all (48) observations have the same mean πœ‡.
8
Time series - illustration
[Figure: plot of luteinizing hormone level against time point.]
Diggle, 1990: 48 measurements taken from a healthy woman, every 10 minutes
9
Time series – the problem
β€’ We denote the centered time series by 𝑧𝑑 = 𝑦𝑑 βˆ’ 𝑦̄.
β€’ We'd like to fit a first-order autoregressive scheme - an AR(1) model:
β€’ 𝑧𝑑 = 𝛽 β‹… π‘§π‘‘βˆ’1 + πœ–π‘‘ , 𝑑 = 2, … , 48
β€’ βˆ’1 ≀ 𝛽 ≀ 1 , π”Όπœ–π‘‘ = 0
β€’ How well does this model fit?
β€’ What is the SE of 𝛽̂?
β€’ We'd like to apply the bootstrap method to answer that.
β€’ Can we use the β€œone-sample” bootstrap here?
10
Time series – a bootstrap solution
β€’ We estimate 𝛽 using LS: 𝛽̂ = argmin𝛽 βˆ‘π‘‘=2..48 (𝑧𝑑 βˆ’ π›½π‘§π‘‘βˆ’1)Β²
β€’ We estimate the error terms πœ–π‘‘ by πœ–Μ‚π‘‘ = 𝑧𝑑 βˆ’ π›½Μ‚π‘§π‘‘βˆ’1
β€’ A bootstrap sample is generated:
β€’ 𝑧1βˆ— = 𝑧1 = 𝑦1 βˆ’ 𝑦̄
β€’ 𝑧2βˆ— = 𝛽̂𝑧1βˆ— + πœ–2βˆ—
β€’ 𝑧3βˆ— = 𝛽̂𝑧2βˆ— + πœ–3βˆ—
β€’ …
β€’ 𝑧48βˆ— = 𝛽̂𝑧47βˆ— + πœ–48βˆ—
β€’ where the πœ–π‘‘βˆ— are drawn randomly with replacement from {πœ–Μ‚2 , … , πœ–Μ‚48}.
11
Time series – a bootstrap solution
β€’ For each bootstrap sample, we calculate the LS estimator π›½βˆ—.
β€’ We can now estimate the SE of 𝛽̂ by the empirical SE of all the π›½βˆ— values.
β€’ This extends easily to a second-order autoregressive scheme – AR(2).
12
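The recursion above can be sketched in Python as follows (my own illustration; the function names and the synthetic 48-point series are made up, not taken from the book):

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1_fit(z):
    """Least-squares estimate of beta in z_t = beta * z_{t-1} + eps_t."""
    return np.sum(z[1:] * z[:-1]) / np.sum(z[:-1] ** 2)

def ar1_residual_boot_se(y, B=1000):
    """SE of the LS estimate of beta via the residual bootstrap above."""
    z = y - y.mean()                       # centered series z_t
    beta = ar1_fit(z)
    resid = z[1:] - beta * z[:-1]          # estimated error terms, t = 2..n
    n = len(z)
    betas = np.empty(B)
    for b in range(B):
        eps = rng.choice(resid, size=n - 1, replace=True)
        z_star = np.empty(n)
        z_star[0] = z[0]                   # z*_1 = z_1
        for t in range(1, n):              # z*_t = beta-hat * z*_{t-1} + eps*_t
            z_star[t] = beta * z_star[t - 1] + eps[t - 1]
        betas[b] = ar1_fit(z_star)
    return beta, betas.std(ddof=1)

# Synthetic 48-point series with AR(1) structure (beta = 0.6, mean 2.0):
y = np.empty(48)
y[0] = 2.0
for t in range(1, 48):
    y[t] = 0.8 + 0.6 * y[t - 1] + rng.normal(0, 0.3)

beta_hat, se = ar1_residual_boot_se(y)
print(beta_hat, se)
```

The model itself generates each bootstrap series; only the innovations are resampled.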
Time series – moving blocks bootstrap
β€’ A different approach to time series – the moving blocks bootstrap.
β€’ Choose a block length 𝑙 (3 in our illustration).
β€’ Sample with replacement from all possible contiguous blocks of that length.
β€’ Concatenate those blocks until you get a sample of (approximately) size 𝑛.
[Figure: the original sample, and a bootstrap sample assembled from resampled blocks.]
13
Time series – moving blocks bootstrap
β€’ For each bootstrap sample, we calculate the LS estimator π›½βˆ—.
β€’ We can now estimate the SE of 𝛽̂ by the empirical SE of all the π›½βˆ— values.
β€’ This extends easily to a second-order autoregressive scheme – AR(2).
14
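A minimal sketch of the moving blocks scheme (my own code, not the book's; function names and the synthetic series are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def moving_blocks_sample(z, l):
    """One moving-blocks bootstrap sample: draw, with replacement,
    from all n - l + 1 contiguous blocks of length l, and concatenate
    until (approximately) n observations are collected."""
    n = len(z)
    blocks = [z[i:i + l] for i in range(n - l + 1)]
    out = []
    while len(out) < n:
        out.extend(blocks[rng.integers(len(blocks))])
    return np.array(out[:n])               # trim to exactly n

def ar1_fit(z):
    """Least-squares estimate of beta in z_t = beta * z_{t-1} + eps_t."""
    return np.sum(z[1:] * z[:-1]) / np.sum(z[:-1] ** 2)

def moving_blocks_se(y, l=3, B=1000):
    """SE of the AR(1) coefficient: the model is fit to each bootstrap
    sample, but it is NOT used to generate the samples."""
    z = y - y.mean()
    betas = [ar1_fit(moving_blocks_sample(z, l)) for _ in range(B)]
    return np.std(betas, ddof=1)

# Synthetic AR(1) series for illustration:
y = np.empty(60)
y[0] = 0.0
for t in range(1, 60):
    y[t] = 0.5 * y[t - 1] + rng.normal()

se = moving_blocks_se(y, l=3, B=500)
print(se)
```

Compare with the residual bootstrap: here resampling only assumes short-range dependence, not the AR(1) recursion.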
Time series – discussion of two methods
β€’ Advantage of the moving blocks approach: it doesn't depend on a specific model.
β€’ Note: we still use an AR model in this framework, since we fit it to each bootstrap sample. The difference is that we don't use it to generate the bootstrap sample!
β€’ Disadvantage of the moving blocks approach: how should we choose the block length 𝑙?
β€’ 𝑙 should be large enough so that observations more than 𝑙 time steps apart are approximately independent.
β€’ 𝑙 = 1 is the one-sample bootstrap, which implies no correlation between neighbors.
β€’ 𝑙 = 𝑛 is not helpful: there is only one block, so every bootstrap sample reproduces the original series and we will get the same estimators…
β€’ The authors state that there isn't (as of 1993) a solid method for choosing an optimal 𝑙.
15
Regression – the framework
β€’ Consider a regression model in which we observe pairs 𝒙i = (𝒄𝑖 , 𝑦𝑖 ),
where 𝒄𝑖 is a vector of length 𝑝 and 𝑖 = 1, … , 𝑛.
β€’ A model: 𝑦𝑖 = π’„π‘–πœ· + πœ–π‘– , where 𝜷 = (𝛽1 , … , 𝛽𝑝)α΅€
β€’ And a function 𝐺(𝒙, 𝒃), where 𝒃 ∈ ℝ𝑝 and 𝒙 ∈ ℝ𝑛×(𝑝+1), by which we measure the β€œgoodness of fit” of a model.
β€’ The classic framework also includes the assumption that the error
terms 𝝐 come from a single (centered) distribution, and that they are
independent of 𝒄.
16
Regression – the problem
β€’ The most common β€œfit function”: 𝐺(𝒙, 𝒃) = βˆ‘π‘–=1..𝑛 (𝑦𝑖 βˆ’ 𝒄𝑖𝒃)Β²
β€’ We can derive an analytical expression, not only for 𝜷̂, but for 𝑆𝐸(𝜷̂) as well.
β€’ If we assume normality, we can easily test 𝐻0: 𝛽𝑖 = 0.
β€’ But what if we're interested, for example, in the more robust Least Median of Squares model, in which 𝐺(𝒙, 𝒃) = median𝑖 (𝑦𝑖 βˆ’ 𝒄𝑖𝒃)Β²?
β€’ We can (numerically) calculate 𝜷̂ = argmin𝒃 {median𝑖 (𝑦𝑖 βˆ’ 𝒄𝑖𝒃)Β²}, but what about its SE?
17
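To make the β€œnumerically calculate” step concrete, here is a sketch of minimizing the LMS fit function with a crude random search (entirely my own illustration; the data, names, and the optimizer are made up, and a real analysis would use a proper derivative-free optimizer):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic data: intercept plus one covariate
n = 40
C = np.column_stack([np.ones(n), rng.normal(size=n)])
y = C @ np.array([1.0, 2.0]) + rng.normal(0, 0.3, size=n)

def lms_objective(b):
    """Least Median of Squares fit function G(x, b)."""
    return np.median((y - C @ b) ** 2)

def random_search(obj, b0, iters=2000, scale=0.5):
    """Crude random-search minimizer: keep the best point found."""
    best, best_val = b0.copy(), obj(b0)
    for i in range(iters):
        cand = best + rng.normal(scale=scale, size=best.shape)
        val = obj(cand)
        if val < best_val:
            best, best_val = cand, val
        if i % 500 == 499:
            scale *= 0.5                   # shrink the search radius
    return best, best_val

b0 = np.linalg.lstsq(C, y, rcond=None)[0]  # start from the LS solution
b_lms, val = random_search(lms_objective, b0)
print(b_lms, val)
```

The objective is not smooth (a median of squares), which is exactly why no closed-form SE is available and the bootstrap is attractive.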
Regression – Bootstrap solutions
We will cover two different ways in which we can generate bootstrap
samples:
β€’ Bootstrapping pairs
β€’ Bootstrapping residuals
18
Regression - Bootstrapping pairs
β€’ Bootstrapping pairs means that we draw (with replacement) 𝑛 pairs from 𝒙1 = (𝒄1 , 𝑦1), … , 𝒙𝑛 = (𝒄𝑛 , 𝑦𝑛) to create a single bootstrap sample π’™βˆ—.
β€’ For each bootstrap sample we calculate πœ·βˆ— – the minimizer of 𝐺(π’™βˆ—, 𝒃).
β€’ We can now estimate 𝑆𝐸(𝜷̂).
19
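A sketch of bootstrapping pairs (my own code and synthetic data; the LS fit stands in for any minimizer of 𝐺):

```python
import numpy as np

rng = np.random.default_rng(3)

def ols_fit(C, y):
    """Minimizer of the sum-of-squares fit function G. Any other
    minimizer (e.g. least median of squares) could be plugged in."""
    return np.linalg.lstsq(C, y, rcond=None)[0]

def pairs_boot_se(C, y, fit=ols_fit, B=1000):
    """Bootstrapping pairs: resample whole (c_i, y_i) rows together."""
    n = len(y)
    betas = np.empty((B, C.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # row indices, with replacement
        betas[b] = fit(C[idx], y[idx])
    return betas.std(axis=0, ddof=1)

# Synthetic design: intercept plus one covariate
n = 50
C = np.column_stack([np.ones(n), rng.normal(size=n)])
y = C @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, size=n)
ses = pairs_boot_se(C, y)
print(ses)
```

Because each row is resampled as a unit, the joint distribution of covariates and errors is preserved without assuming they are independent.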
Regression - Bootstrapping residuals
β€’ We’ve already seen it in context of time series…
β€’ Bootstrapping residuals requires that we first calculate 𝜷 using the
original sample.
β€’ Then we estimate the error terms πœ–π‘– = 𝑦i βˆ’ πœ·π’„π’Š and obtain an
empirical distribution of errors.
β€’ A bootstrap sample is generated: π’™βˆ—π’Š = π’„π’Š , πœ·π’„π’Š + πœ–π‘–βˆ— where πœ–π‘–βˆ— is
drawn with replacement from {πœ–1 , … , πœ–π‘› } .
β€’ For each bootstrap sample, we calculate 𝛽 βˆ— - the minimizer of
𝐺 π’™βˆ— , 𝒃 . We can now estimate 𝑆𝐸 𝜷 .
20
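The same scheme in code (again my own sketch with synthetic data), which differs from bootstrapping pairs only in how the bootstrap sample is built:

```python
import numpy as np

rng = np.random.default_rng(4)

def ols_fit(C, y):
    """Minimizer of the sum-of-squares fit function G."""
    return np.linalg.lstsq(C, y, rcond=None)[0]

def residual_boot_se(C, y, fit=ols_fit, B=1000):
    """Bootstrapping residuals: the covariates stay fixed; only the
    estimated error terms are resampled."""
    beta = fit(C, y)
    resid = y - C @ beta                   # estimated error terms
    fitted = C @ beta
    betas = np.empty((B, C.shape[1]))
    for b in range(B):
        eps = rng.choice(resid, size=len(y), replace=True)
        betas[b] = fit(C, fitted + eps)    # y*_i = c_i beta-hat + eps*_i
    return betas.std(axis=0, ddof=1)

# Synthetic design: intercept plus one covariate
n = 50
C = np.column_stack([np.ones(n), rng.normal(size=n)])
y = C @ np.array([1.0, 2.0]) + rng.normal(0, 0.5, size=n)
ses = residual_boot_se(C, y)
print(ses)
```

Note that every bootstrap sample here shares the original covariate matrix C, which is the point discussed on the next slide.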
Regression – discussion of two methods
β€’ We will prefer bootstrapping pairs if the assumption that the error terms and covariates are independent is violated.
β€’ In other words, bootstrapping residuals is (slightly) more sensitive to the assumption above (in practice, the differences aren't large).
β€’ When bootstrapping residuals, every bootstrap sample has exactly the same covariate vectors as the original sample. This structure is suitable for data in which there is no variability in the covariates.
β€’ As 𝑛 grows, bootstrapping pairs approaches bootstrapping residuals.
21
Conclusion
β€’ Some data structures (namely anything other than an 𝑖.𝑖.𝑑. sample) require more careful thinking about the process by which we extract 𝐹̂ from the observed data.
β€’ We've seen that in the presence of a statistical model, one way of dealing with this issue is bootstrapping residuals. We've applied it to a time series model as well as to a regression model.
β€’ The downside of bootstrapping residuals may be its reliance on some of the model's assumptions.
β€’ To tackle this problem, we've presented slightly more robust approaches: moving blocks in the context of time series, and bootstrapping pairs in the context of regression.
β€’ It turns out that in many cases, the different methods agree, even if not all model assumptions are justified.
22
Thank you!
23