Learning under limited attention and unknown precision of information
J. Daniel Aromí
IIEP - UBA - Conicet. UdeSA
June 2012
(First version April 2012)
Abstract:
A Bayesian agent is simultaneously learning about the payoff associated with a set of feasible actions and
the precision of the information used to generate the estimate. Learning is endogenous as the agent
selects different levels of scarce attention as a function of perceived precision. Variations in attention
due to different assessments of precision result in heavier tails and sharper peaks in the distribution of
estimation errors. Unlike the known precision case, learning activity can present delays and non-monotonicities due to revisions in the perception of precision. Another difference with the known precision case is that the sequence of updates shows correlated volatility, since a surprising signal leads to a higher level of attention, which affects the distribution of future revisions.
JEL Codes: D81, D84
Contact: [email protected]
1. Introduction
In any economic setting, decision making is assisted by learning processes that generate the
representations of the environment and, in particular, the value associated to different actions. Under
limited cognitive resources, these representations will necessarily be approximations whose accuracy
depends on the cognitive resources recruited for this task. For example, investment decisions demand
assessments of profitability. These assessments are built by acquiring information and inferring its
implications. Relevant information includes features and outcomes of past projects and the evolution of
plausible determinants of profitability. This information can be acquired with different levels of breadth
and precision. Additionally, the inference of the implications of available information can be made with
different levels of thoroughness. As a result the accuracy of the assessment will be a function of the
allocation of effort to the task.
In addition, for a given level of cognitive effort, the precision of the assessment will depend on specific
features of the task. These aspects include the quantity of relevant factors, the complexity of their
interaction, the ease with which main factors can be identified, and the facility with which their joint
influence can be established. It is evident that these features of the specific task cannot be known with
exactitude before developing the learning task. It is possible that random factors such as the perceived
consistency of available information will determine the initial assessment of precision. As a result,
determining the precision of the representation is part of the learning process.
In this way, learning involves the multiple tasks of allocating cognitive effort, acquiring a representation
of the environment and assessing the accuracy of the representation. These three aspects evolve jointly
as part of a multifaceted process. In this process, attention is a function of judgments on accuracy.
Assessed accuracy is a function of the consistency of incoming information. And, closing a cycle, the rate
of arrival of information is determined by attention levels.
This work provides a tractable framework that represents this type of process. In this Bayesian framework, beliefs with respect to payoffs and the precision of information are updated in response to the information flow. In addition, cognitive resources will be recruited in response to the assessment of
precision. This analysis constitutes one instance in which the relevance of learning processes is
magnified due to the existence of multidimensional uncertainty and endogenous cognitive decisions.
The present study reveals that the joint presence of precision uncertainty and limited attention has implications for the rate at which different magnitudes of estimation errors will be observed. Variations
in the assessments of precision lead to variations in attention levels or, in other words, different rates of
information flow. Beliefs consistent with high assessment of precision lead to low attention and
increasing likelihood of relatively large errors while beliefs consistent with low assessments of precision
lead to high levels of attention increasing the chances of relatively small mistakes. Hence, due to
endogenous variation in attention levels, small errors and large errors will occur at a higher rate. More
specifically, in the exercise developed below, despite starting from a Gaussian set-up, the distribution of
errors can present high levels of kurtosis. Fat tails and sharp peaks are observed as the likelihood of
extreme and small errors is higher than what would be expected when evaluating a fitted normal
distribution.
In the benchmark case, with known precision, learning can be characterized as a relatively tranquil
progression in which uncertainties gradually disappear and cognitive effort decreases as a consequence.
In contrast, with unknown precision, effort levels need not decrease with time. Learning can lead to
reassessments of the scope of ignorance. New information with surprising content can lead to posterior
beliefs in which precision is judged to be lower than previously held. This downward reassessment of
precision can cause an increment in the recruitment of cognitive resources with respect to what was
observed in previous periods. The opposite result holds in the case in which the initial information is
unsurprising. That is, little news brings about little news. As a result, the joint presence of precision
uncertainty and limited attention results in correlated volatility.
The analysis is developed under the assumption that the agent updates its beliefs using Bayes’ rule. One
key element that guarantees tractability is the identification of a convenient representation of beliefs
and signals. The current analysis exploits the fact that, given normally distributed signals and Normal-gamma prior beliefs, the posterior beliefs are also given by a distribution belonging to the Normal-gamma family of distributions. That is, the Normal-gamma prior distribution and the Normal-gamma posterior
distribution are conjugate distributions. For the case of identically distributed signals this is a standard
result in statistics.1 This work develops and makes use of a simple extension of this result for the case of
effortful signals with varying levels of precision.
The framework developed in the present study provides an expanded language for representing learning
dynamics. In particular, shocks to expectations can take different forms. The most common form of
expectational errors refers to the noise in the estimates of payoff function parameters. In this work, additional forms of belief errors are provided; these are errors regarding the assessment of the precision of current knowledge and incoming signals. This distinction is relevant, since different learning paths are associated with different profiles of errors in expectations.
With respect to related contributions, this work can be linked to the literature on systematic errors in
the calibration of confidence and the literature on rational inattention. In agreement with the first group of contributions,2 this work allows for discrepancies between correct confidence intervals and assessed confidence intervals. In contrast with this literature, this work does not focus on biases in the estimation of confidence but on the consequences associated with errors in its estimation. In line with the
literature on rational inattention3, this work allows for endogenous determination of costly information
flow. A key distinction of this contribution is that learning is performed in a context in which the proper
allocation of attention is not known.
1 For an exposition of this result see DeGroot (1986).
2 For applications in economics see Malmendier and Tate (2005) for overconfidence by CEOs, Camerer and Lovallo (1999) for overconfidence and entry, and Barber and Odean (2001) for overconfidence in financial investment decisions.
3 See for example Sims (2003), Woodford (2012), Mackowiak and Wiederholt (2010), Peng and Xiong (2006).
More generally, this work can be linked to contributions that measure or explain heavy-tailed distributions and correlated volatility in economic time series and, especially, financial time series.4 This
work provides a new mechanism that can explain heavy tails and clustered volatility. From this
perspective, learning about precision and endogenous recruiting of cognitive resources constitute one
potential explanation of these statistical properties of economic time series.
This work suggests that the consideration of models of learning that allow for rich forms of uncertainty
can be relevant in economic analysis. In this respect this work follows contributions such as Avery et al.
(1998), Sanguinetti et al. (1998) and Weitzman (2007). Avery et al. (1998) develop a framework in which
multidimensional uncertainty allows for herding and mispricing in financial markets. In one of their examples, mispricing can occur due to the joint presence of uncertainty regarding three aspects: the occurrence of a shock, its effect and the quality of traders' information. Sanguinetti et al. (1998) consider an economy in which a shock has changed its long-term growth path; the agents are uncertain about the location of the new growth path and the rate of adjustment. As a result of the two-parameter uncertainty, non-monotonic trajectories can be observed, as mistakes in initial beliefs can be associated, temporarily, with inferences that increase certain prediction errors. Weitzman (2007) suggests that asset
pricing puzzles can be rationalized by models that allow for uncertainty regarding the growth rate of the
economy and, additionally, uncertainty regarding the evolution of the parameters that determine the
stochastic path.
The next section develops the model and presents properties of the learning process. In section 3, the
decision problem is analyzed. Section 4 considers the model in the case of multiple periods. Concluding
remarks are provided in the last section.
2. Beliefs and Bayesian learning
We consider settings in which information allows an agent to select fitter actions. The agent is uncertain
about the value of a parameter that affects the relative payoffs of available actions. In addition, we
make two key assumptions regarding information and learning. First, the agent is uncertain about the
value of a parameter which governs the precision of information. Second, the agent receives
informative signals whose precision can be controlled by the allocation of effortful attention.
Prior beliefs are assumed to belong to the Normal-gamma family of distributions. That is, beliefs are given by a bi-variate four-parameter distribution function: $(\theta,\tau) \sim NG(m,\lambda,\alpha,\beta)$. This means that the precision parameter $\tau$ is believed to be distributed according to a Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$. Hence, the mean precision is believed to be given by $E(\tau) = \alpha\beta$ and the variance of the precision is $Var(\tau) = \alpha\beta^2$. Additionally, conditional on the value of the precision parameter, $\tau$, the parameter $\theta$ is believed to be distributed normally with mean $m$ and variance $1/(\lambda\tau)$. Hence, under these beliefs, the average $\theta$ is given by $m$ and the variance of $\theta$ is believed to be $1/((\alpha-1)\beta\lambda)$. Given $\theta$ and $\tau$, signals are distributed normally with mean $\theta$ and variance $1/(a\tau)$, where $a$ represents the effortful attention allocated to the acquisition of the signal. Hence, for this type of signal, higher effort is associated with higher precision, or lower variance.

4 For an early contribution see Mandelbrot (1963). Agent-based models in which agents use simple rules of thumb have also been developed to explain these statistical properties in financial markets; see for example Hommes (2006). Thurner, Farmer and Geanakoplos (2011) suggest that fat tails and clustered volatility in finance can be explained by leverage levels.
In the jargon of Normal-gamma distributions, $m$ is known as the center parameter and constitutes the mean expected value of $\theta$ independently of the value of the precision parameter. On the other hand, the parameter $\lambda$ is known as the precision multiplier and it determines the relative informational value of the prior estimate of $\theta$ versus the incoming signals.
The role of the shape parameter $\alpha$ and the scale parameter $\beta$ of the Gamma distribution can be described by observing that when $\alpha \to \infty$ and $\beta \to 0$ with $\alpha\beta \to \bar\tau$, there is convergence to the case in which the precision parameter is known, since the variance of the precision parameter converges to 0, $\alpha\beta^2 \to 0$. In other words, for a given mean value of the precision parameter, $\alpha\beta$, a high shape parameter and a low scale parameter are associated with a relatively certain value of the precision parameter, while a low shape parameter and a high scale parameter are associated with high uncertainty about the precision of information.
One reason to focus on prior beliefs of the Normal-gamma family is that learning from a group of independent normally distributed signals, with uninformative prior beliefs regarding the mean and variance of the signals, results in beliefs that can be represented as a bi-variate distribution of the Normal-gamma family. Given signals $x_1,\dots,x_n$ with $x_i \sim N(\theta, 1/\tau)$, the sample mean, $\bar x = \sum_i x_i / n$, is distributed normally with mean $\theta$ and variance $1/(\tau n)$. The product of the precision parameter and the sum of squared deviations, $\tau \sum_i (x_i - \bar x)^2$, is distributed according to a Chi-squared distribution with $n-1$ degrees of freedom. The Chi-squared distribution is a special case of the Gamma distribution.
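This standard sampling result can be checked numerically. The sketch below is only an illustration, assuming NumPy; the values of $\theta$, $\tau$ and $n$ are arbitrary and not taken from the paper.

```python
import numpy as np

# Monte Carlo check of the sampling distributions described above.
rng = np.random.default_rng(0)
theta, tau, n, reps = 2.0, 4.0, 10, 200_000

# reps samples of n signals, each distributed N(theta, 1/tau)
x = rng.normal(theta, 1.0 / np.sqrt(tau), size=(reps, n))

# Sample mean: normal with mean theta and variance 1/(tau*n)
xbar = x.mean(axis=1)

# tau times the sum of squared deviations: Chi-squared with n - 1 df,
# so its mean should be n - 1 and its variance 2(n - 1)
q = tau * ((x - xbar[:, None]) ** 2).sum(axis=1)
```

Across the simulated replications, `xbar` should have mean near $\theta$ and variance near $1/(\tau n)$, while `q` should have mean near $n-1$.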
2.1 Bayesian updating
To update the beliefs after observing the effortful signal, we present a simple extension of the standard result for the case in which the precision of the signal is a function of the attention level $a$.
Proposition 1:
Consider prior beliefs over $(\theta,\tau)$ given by $NG(m,\lambda,\alpha,\beta)$ and a signal $x$ with distribution $N(\theta, (a\tau)^{-1})$. Then, the posterior beliefs are given by $NG(m',\lambda',\alpha',\beta')$, where the updated parameters are given by:
$$m' = \frac{\lambda m + a x}{\lambda + a}$$
$$\lambda' = \lambda + a$$
$$\alpha' = \alpha + \frac{1}{2}$$
$$\beta' = \left[\beta^{-1} + \frac{\lambda a}{2(\lambda + a)}(x - m)^2\right]^{-1}$$
Proof:
Let
$$f_\tau(\tau) = \frac{\beta^{-\alpha}\tau^{\alpha-1}e^{-\tau/\beta}}{\Gamma(\alpha)}, \qquad f_\theta(\theta \mid \tau) = \frac{(\lambda\tau)^{1/2}}{(2\pi)^{1/2}}e^{-\lambda\tau(\theta-m)^2/2} \qquad \text{and} \qquad f_x(x \mid \theta,\tau) = \frac{(a\tau)^{1/2}}{(2\pi)^{1/2}}e^{-a\tau(x-\theta)^2/2}.$$
Then,
$$f(\theta,\tau \mid x) \propto f_\tau(\tau)\,f_\theta(\theta \mid \tau)\,f_x(x \mid \theta,\tau).$$
The following identity can be verified:
$$\lambda(\theta-m)^2 + a(x-\theta)^2 = (\lambda+a)\left(\theta - \frac{\lambda m + a x}{\lambda + a}\right)^2 + \frac{\lambda a}{\lambda + a}(x-m)^2$$
Hence, the conditional probability density function is given by:
$$f(\theta,\tau \mid x) = K\left[\tau^{1/2}e^{-(\lambda+a)\tau(\theta-m')^2/2}\right]\tau^{\alpha'-1}e^{-\tau/\beta'}$$
where $K$ is a constant that does not depend on $\theta$ or $\tau$. The expression inside brackets is proportional to a function that, up to a constant coefficient, equals the probability density function of a normal random variable with mean $m'$ and precision $\lambda'\tau$. The expression after the brackets equals, up to a constant coefficient, the probability density function of a random variable distributed according to a gamma distribution with shape parameter $\alpha'$ and scale parameter $\beta'$.
□
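The update in Proposition 1 is straightforward to implement. The following is a minimal sketch (the function name `ng_update` is ours; the Gamma scale parameterization follows the text):

```python
def ng_update(m, lam, alpha, beta, x, a):
    """Posterior Normal-gamma parameters after observing one signal x
    acquired with attention level a > 0 (Proposition 1)."""
    m_post = (lam * m + a * x) / (lam + a)   # precision-weighted average
    lam_post = lam + a                       # precision multiplier grows by a
    alpha_post = alpha + 0.5                 # shape grows by 1/2 per signal
    # A surprising signal (large (x - m)^2) shrinks the scale parameter.
    beta_post = 1.0 / (1.0 / beta + lam * a * (x - m) ** 2 / (2.0 * (lam + a)))
    return m_post, lam_post, alpha_post, beta_post
```

Note that a surprising signal lowers $\beta'$ and hence the assessed mean precision $\alpha'\beta'$, which, once attention is made endogenous below, feeds back into a higher attention choice.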
3. The allocation of attention
We analyze a decision problem where there exists uncertainty regarding a payoff function parameter and the
precision of information. In this section, we analyze a single period problem. In the next section, we
analyze a multiple period problem.
We consider a problem of an agent that can acquire information in order to make better decisions but faces an opportunity cost associated with this acquisition. The payoff of the agent is separable into a term reflecting the fitness of the actions and a term capturing the opportunity cost of the cognitive resources. The first term is a function of the difference between the action selected in each period, $s \in \mathbb{R}$, and the value of an unknown parameter, $\theta$. More precisely, this loss equals the mean squared difference between the action selected and the parameter value. A loss function that depends on the square of the error can be thought of as capturing instances in which big errors lead to a higher probability of big losses such as death or bankruptcy. Given the symmetry of the distribution representing beliefs regarding $\theta$, this loss is minimized by selecting an action equal to the posterior mean value of $\theta$, that is, $s = E[\theta] = m'$. We will assume that the agent always sets this optimal value for the action and in this way we focus exclusively on the selection of attention levels. The second term captures the assumption that, while allocating more attention leads to more precise estimates of $\theta$, there is an associated constant marginal cost equal to $c$. A natural interpretation is that allocating attention is costly because this implies reducing the level of attention allocated to other tasks.5 Hence, after selecting the optimal value for $s$, the payoff function is given by:
$$U(a) = -E[(m'-\theta)^2 \mid a] - ca$$
The problem is reduced to one in which the agent selects the attention level in order to minimize this loss function. The expectation in the expression above refers to the mean value as determined by Bayesian updating of the agent's prior beliefs for a given value of the effort level. In particular, the posterior mean expected value of the parameter, $m'$, is the value updated through Bayes' formula. The expected value can be worked out to obtain a simple expression:
$$E[(m'-\theta)^2 \mid a] = E\left[\left(\frac{\lambda}{\lambda+a}(m-\theta) + \frac{a}{\lambda+a}(x-\theta)\right)^2\right]$$
$$= \frac{\lambda^2}{(\lambda+a)^2}E[(m-\theta)^2] + \frac{a^2}{(\lambda+a)^2}E[(x-\theta)^2] = \frac{1}{(\lambda+a)\beta(\alpha-1)}$$
Note that this problem is well defined only when the shape parameter $\alpha$ is above 1; otherwise the variance assigned to $\theta$ is not a real number. In other words, for a given value of the mean expected precision, $\alpha\beta$, the problem is not well defined when the uncertainty regarding the variance is too high, that is, low $\alpha$ and high $\beta$.6
The single period problem takes a simple functional form:
$$\max_{a \ge 0} \; -\frac{1}{(\lambda+a)\beta(\alpha-1)} - ca$$
The first order conditions are given by:
$$\frac{1}{(\lambda+a)^2\beta(\alpha-1)} - c \le 0, \qquad a\left[\frac{1}{(\lambda+a)^2\beta(\alpha-1)} - c\right] = 0$$

5 Alternative formulations could allow for increasing marginal costs and capacity constraints. Most of the findings are expected to be observed under these alternative, less tractable, formulations.
6 The existence of a range of parameter values for which the variance is not a real number suggests that alternative formulations for the penalty associated with errors in estimation could be considered. For example, the loss from unfit actions could be calculated using percentiles of the estimated distribution of prediction errors.
The first order conditions are sufficient given the concavity of the objective function and the affine inequality constraint. The condition can be solved for an explicit expression of the optimal effort level. We summarize this result in Proposition 2.
Proposition 2:
Consider the problem of maximizing the payoff function $U(s,a) = -E[(s-\theta)^2 \mid a] - ca$ given Normal-gamma beliefs and signals distributed normally with variance $1/(a\tau)$. The optimal level of attention is given by:
$$a^* = \max\left\{0,\; \left[c\beta(\alpha-1)\right]^{-1/2} - \lambda\right\}$$
This expression shows that attention levels are a decreasing function of the prior belief parameters $\alpha$, $\beta$ and $\lambda$. This means that increases in either of the parameters that determine the mean value of precision, $\alpha$ or $\beta$, result in lower effort levels. An increase in the precision multiplier $\lambda$, which results in a lower variance of the prior estimate, is also associated with lower attention levels. Additionally, a higher marginal cost of attention $c$ results in less cognitive effort. There is a range over which no effort is allocated to the task; this is a property derived from the assumption of a constant marginal cost. Additionally, in this problem the agent is not constrained regarding the amount of attention that can be allocated in order to improve the precision of the estimate. In particular, as $\alpha$ converges to 1, the level of recruited cognitive resources grows without bound. These are specific properties of this representation and are not considered to play a key role in the main conclusions of this work.
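The closed form in Proposition 2 can be written directly; the sketch below is ours (function name and numerical values are illustrative), and the corner at $a^* = 0$ and the comparative statics just described follow immediately from it.

```python
def optimal_attention(lam, alpha, beta, c):
    """Optimal attention a* = max{0, [c*beta*(alpha - 1)]^(-1/2) - lam}
    (Proposition 2). Requires alpha > 1 for the problem to be well defined."""
    if alpha <= 1:
        raise ValueError("problem not well defined for alpha <= 1")
    return max(0.0, (c * beta * (alpha - 1)) ** -0.5 - lam)
```

For instance, with $\lambda = 1$, $\alpha = 2$, $\beta = 1$ and $c = 0.01$ the agent recruits $a^* = 9$; raising either the cost $c$ or the precision multiplier $\lambda$ pushes the solution toward the $a^* = 0$ corner.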
The explicit solution of the allocation problem permits a clear illustration of how the present framework allows for multiple configurations of beliefs that are compatible with a given effort level. Informally, this analysis indicates that high levels of attention can result from believing that the agent did not learn in the past, believing that learning is hard, or believing that it is really difficult to assess how difficult it is to learn. In the first case, cognitive effort can be high due to the perception of little flow of information in the past. In this case, a low $\lambda$ is the key factor that explains cognitive effort. Alternatively, high levels of attention can be due to the belief that, in general, information precision is low. In this case, beliefs are characterized by low values of $\alpha\beta$ and it is concluded that the value of past and current information is low. Due to the high costs associated with big estimation mistakes, the agent is willing to set a high level of attention even though the gain in knowledge from each unit of effort is low. Lastly, for a given assessment of mean precision, high uncertainty regarding the value of precision results in high levels of recruited attention. In symbols, given $\alpha\beta$, the recruitment of resources will be high when $\beta$ is high or, equivalently, $\alpha$ is small.
Using the rules for updating beliefs, the posterior distribution can be expressed through simple equations. If no attention is recruited, the belief system is unchanged.7 With positive levels of attention, posterior beliefs belong to the same family of distributions with new parameter levels that are given by:
$$m' = (1-k)m + kx$$
$$\lambda' = \lambda + a^* = \left[c\beta(\alpha-1)\right]^{-1/2}$$
$$\alpha' = \alpha + \frac{1}{2}$$
$$\beta' = \left[\beta^{-1} + \frac{\lambda k}{2}(x-m)^2\right]^{-1}$$
where $k = 1 - \lambda\left[c\beta(\alpha-1)\right]^{1/2}$ with $0 < k < 1$. This parameter can be interpreted as determining the weight given to the signal $x$ in the updated values of the center parameter $m$ and the scale parameter $\beta$. This coefficient is expressed in terms of the prior belief parameters and the cost of attention.

7 Naturally, minor modifications could result in a set-up in which agents perceive signals that do not require the recruitment of costly cognitive resources. For example, this type of formulation would be convenient for the study of situations in which the agent needs to decide between a large number of alternatives whose values are, to a large extent, independent.
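The joint dynamics can be seen by iterating the attention choice and the update equations period by period. The sketch below is only an illustration of the mechanism (the paper's multi-period analysis is developed in section 4); the true $(\theta,\tau)$ generating the signals and all numerical values are our own arbitrary choices.

```python
import numpy as np

def simulate_path(theta, tau, m, lam, alpha, beta, c, periods, rng):
    """Iterate the attention choice (Proposition 2) and Bayesian updating."""
    efforts = []
    for _ in range(periods):
        a = max(0.0, (c * beta * (alpha - 1)) ** -0.5 - lam)
        if a > 0:
            x = rng.normal(theta, 1.0 / np.sqrt(a * tau))  # signal precision a*tau
            k = a / (lam + a)                              # weight on the signal
            beta = 1.0 / (1.0 / beta + 0.5 * lam * k * (x - m) ** 2)
            m = (1.0 - k) * m + k * x
            lam, alpha = lam + a, alpha + 0.5
        efforts.append(a)
    return efforts

path = simulate_path(theta=0.0, tau=4.0, m=1.0, lam=1.0, alpha=2.0,
                     beta=1.0, c=0.01, periods=20, rng=np.random.default_rng(1))
```

Along such paths a surprising signal lowers $\beta$, which raises the next period's $a^*$, so attention and the size of subsequent revisions are serially correlated, while unsurprising signals can shut learning down: the delays, non-monotonic effort and correlated volatility described in the introduction.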
3.1 Heavy tails
This framework characterizes in a tractable way the interaction between assessments of information
precision and recruitment of cognitive resources. In particular, we would like to evaluate its implications
for the shape of the distribution of estimation errors. Consider a benchmark scenario of correct parameterization of precision. Then, shocks to beliefs that lower assessed precision lead to higher cognitive effort and higher precision of the posterior estimate of the parameter $\theta$. On the other hand,
shocks that increase estimated precision lead to lower attention and lower precision of the posterior.
These links indicate that errors in the assessment of precision are associated with a distribution of
posterior beliefs that, compared to the case of known precision, has higher probability of small and big
errors. That is, the distribution of errors is expected to show higher kurtosis.
We show this property for a particular profile of initial beliefs. First, consider a benchmark case given by a situation in which the precision multiplier is a constant and precision is known. That is, $\lambda = \lambda_0$. In addition, consider sequences satisfying $\alpha \to \infty$, $\beta \to 0$ and $\alpha\beta \to \bar\tau$. In the limit, cognitive effort is given by $a = \max\{0, (\bar\tau c)^{-1/2} - \lambda_0\}$. Additionally, assume that the center parameter of the prior beliefs is distributed normally with $m \sim N(\theta, 1/(\lambda_0\bar\tau))$. That is, errors are normally distributed, the estimate is unbiased and the precision multiplier parameter is properly calibrated. Then, in the benchmark case, the error of the posterior distribution also has a normal distribution with zero mean and variance equal to
$$Var(m'-\theta) = \frac{1}{\bar\tau(\lambda_0+a)} = \min\left\{\frac{1}{\lambda_0\bar\tau},\; \left(\frac{c}{\bar\tau}\right)^{1/2}\right\}.$$
Alternatively, a situation in which precision is uncertain can be considered. Assume that the precision multiplier parameter is, as before, $\lambda = \lambda_0$, but $\alpha = v$ and $\beta = \bar\tau/v$ with $v > 1$. In this case, the mean value of the precision parameter is correct, as $E(\tau) = \alpha\beta = \bar\tau$, but the parameter values imply that there remains uncertainty regarding its value, since $Var(\tau) = \alpha\beta^2 = \bar\tau^2/v$. This uncertainty disappears as $v \to \infty$. Then, attention levels would be given by $a = \max\{0, [\bar\tau c(v-1)/v]^{-1/2} - \lambda_0\}$. In this way we are able to detect one of the effects of uncertain precision. Compared to the benchmark case with certain precision, effort levels are higher. For a given expected value of precision, uncertainty regarding the value of the precision parameter leads to a higher perception of the benefits of allocating effortful attention. The key term that captures this relationship is given by $(v-1)/v$ in the denominator.
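Plugging in numbers makes the comparison concrete; the values of $\bar\tau$, $\lambda_0$, $c$ and $v$ below are our own illustrative choices.

```python
# Attention with known vs. uncertain precision; all values illustrative.
tau_bar, lam0, c = 4.0, 1.0, 0.01

# Benchmark (precision known): a = max{0, (tau_bar*c)^(-1/2) - lam0}
a_known = max(0.0, (tau_bar * c) ** -0.5 - lam0)

# Uncertain precision: beta*(alpha - 1) = tau_bar*(v - 1)/v < tau_bar
v = 2.0
a_uncertain = max(0.0, (tau_bar * c * (v - 1) / v) ** -0.5 - lam0)
```

With these numbers `a_known` equals 4 while `a_uncertain` is roughly 6.07: for the same mean precision, residual uncertainty about $\tau$ raises recruited attention, and the gap vanishes as $v \to \infty$.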
As indicated in the previous discussion, high attention is expected to generate small errors in the
estimation of . On the other hand, low attention is likely to generate relatively large errors. Hence, the
relative probability of large, medium and small errors is a function of variation in the assessment of
precision. To evaluate the impact of variations in the assessments of precision on the shape of the
distribution of estimation errors a specific form of variations in beliefs is proposed. The following
proposition gives a specific expression for the impact of this link in the case in which assessed precision
takes two values with equal probability. It is shown that variation leads to excess kurtosis, that is, to a
distribution of errors with a sharper peak and fatter tails.
Proposition 3:
Consider a setting in which, with probability ½, beliefs are given by $B_1 = (m, \lambda, \alpha_1, \beta_1)$, with associated attention level $a_1 > 0$, and, with probability ½, beliefs are given by $B_2 = (m, \lambda, \alpha_2, \beta_2)$, with associated attention level $a_2 > 0$. Then, the distribution of errors presents excess kurtosis if and only if $a_1 \ne a_2$. Excess kurtosis is given by
$$3\left[\frac{a_1/(\lambda+a_1)^2 - a_2/(\lambda+a_2)^2}{a_1/(\lambda+a_1)^2 + a_2/(\lambda+a_2)^2}\right]^2.$$
Proof:
For each configuration of the parameters, the errors are distributed normally, with mean equal to 0 and variances $\sigma_1^2 = a_1/[(\lambda+a_1)^2\tau]$ and $\sigma_2^2 = a_2/[(\lambda+a_2)^2\tau]$ respectively, where $\tau$ denotes the precision with which signals are generated. By definition, given normal distributions, the excess kurtosis for a given configuration of beliefs equals 0. In symbols, $\mu_{4,i}/\sigma_i^4 - 3 = 0$, where, with some abuse of notation, $\mu_{4,i}$ and $\sigma_i$ represent the fourth moment around the mean and the standard deviation for parameter configuration $i$. The variance of the error under the unknown parameter configuration is given by
$$Var(m'-\theta) = E[(m'-\theta)^2] = \int (m'-\theta)^2\left[0.5 f(\cdot \mid B_1) + 0.5 f(\cdot \mid B_2)\right] = (\sigma_1^2 + \sigma_2^2)/2.$$
Similarly, the fourth moment around the mean is given by
$$\mu_4 = E[(m'-\theta)^4] = \int (m'-\theta)^4\left[0.5 f(\cdot \mid B_1) + 0.5 f(\cdot \mid B_2)\right] = (\mu_{4,1} + \mu_{4,2})/2.$$
Hence:
$$ExcKurt(m'-\theta) = \frac{\mu_4}{\sigma^4} - 3 = \frac{2(\mu_{4,1}+\mu_{4,2})}{(\sigma_1^2+\sigma_2^2)^2} - 3 = \frac{6(\sigma_1^4+\sigma_2^4)}{(\sigma_1^2+\sigma_2^2)^2} - 3$$
$$= \frac{3(\sigma_1^4 - 2\sigma_1^2\sigma_2^2 + \sigma_2^4)}{(\sigma_1^2+\sigma_2^2)^2} = 3\left[\frac{\sigma_1^2-\sigma_2^2}{\sigma_1^2+\sigma_2^2}\right]^2 = 3\left[\frac{a_1/(\lambda+a_1)^2 - a_2/(\lambda+a_2)^2}{a_1/(\lambda+a_1)^2 + a_2/(\lambda+a_2)^2}\right]^2$$
where in the last equality we replaced the expression for the standard deviation of the errors in each parameter configuration.
Hence, there exists excess kurtosis if and only if attention levels are different for each configuration. In addition, kurtosis increases with the square of the difference in attention levels as a fraction of mean attention levels.
By postulating $m_1 = m_2 = m$, the analysis discards the effect of past information regarding the mean value of $\theta$ and focuses on the estimation errors that are explained by the assessment of precision and the newly acquired information. The result holds for any value of $m$ and $\lambda$. The result can be interpreted as describing the type of uncertainty faced by an observer that tries to assess the frequency of different errors given limited information regarding the agent's beliefs.
□
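The closed form in Proposition 3 can be verified by simulation. The sketch below draws errors from the equal-probability mixture of the two normal configurations; the values of $\lambda$, $\tau$, $a_1$ and $a_2$ are arbitrary illustrative choices, and $\tau$ cancels from the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, tau = 1.0, 4.0
a1, a2 = 2.0, 8.0

# Error standard deviations implied by each attention level:
# sigma_i^2 = a_i / ((lam + a_i)^2 * tau)
s1 = np.sqrt(a1 / ((lam + a1) ** 2 * tau))
s2 = np.sqrt(a2 / ((lam + a2) ** 2 * tau))

# Equal-probability mixture of the two error distributions
n = 1_000_000
sigma = np.where(rng.random(n) < 0.5, s1, s2)
err = rng.normal(0.0, sigma)

# Sample excess kurtosis
m4 = np.mean((err - err.mean()) ** 4)
kurt_sample = m4 / err.var() ** 2 - 3.0

# Closed form from Proposition 3
w1, w2 = a1 / (lam + a1) ** 2, a2 / (lam + a2) ** 2
kurt_formula = 3.0 * ((w1 - w2) / (w1 + w2)) ** 2
```

With different attention levels the sample excess kurtosis is strictly positive and close to the closed-form value.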
This result suggests an additional explanation for the heterogeneity in beliefs about structural
parameters. In this case, heterogeneity is explained by variations in the assessment of precision that
lead to variations in attention. In this case, the heterogeneity result takes the form of differences in the
variance of the estimate. In a context with exogenous attention this causal link between assessment of
precision and properties of the estimate would not be observed. In addition, in the case of normal
identically distributed signals, the estimates of mean and variance are independent.8
The following numerical example illustrates this property for a continuous multivariate distribution of beliefs.
Example 1:
Consider the parameter value $\theta = 0$. Suppose Normal-gamma initial beliefs with parameters distributed according to $m \sim N(0, 1/(\lambda\alpha\beta))$, $\lambda \sim U[0,6]$, $\alpha \sim U[1,2]$ and $\beta = zw$, where $z \sim \chi^2(2)$ and $w \sim U[0.5, 1.5]$. Finally, assume a cost parameter $c = 0.001$. The distributions of the example can be considered as a noisy version of the beliefs that would result after receiving three signals. Figure 1 presents the log-log plot of the histogram of the estimation errors for a simulation of size $10^6$. The histogram can be compared to the plot of the probability density function of a normal distribution with the same mean and variance as the sample errors of the simulation. As can be seen, the errors of the simulation present tails that are clearly heavier than the fitted normal benchmark.

8 See DeGroot (1986).
INSERT FIGURE 1
The property presented in Proposition 3 and illustrated in the previous example is similar to, but
different from, the heavy tailed distribution (in the Normal case, Student-t distribution) of an unknown
parameter estimated under uncertain variance. Uncertain variance with exogenous recruitment of
attention results in wider confidence intervals but generates no change in the shape of the distribution
of errors of the estimated parameter. In contrast, this work shows that under endogenous attention,
there are heavy tails in the distribution of errors.
3.2 Implications of heavy tails
So far the analysis has focused on the description of learning processes associated with individual decision problems. The next two examples intend to show how these findings can prove highly relevant in situations that transcend the individual decision problem. First, consider a case in which an external
observer, for example a regulator in a financial market, would like to learn about the rate of occurrence
of errors of different magnitude. Suppose that this actor is able to view a sample of errors that took
place in independent instances. It will be shown that a distribution of errors with high kurtosis implies a
less precise learning process. In addition, if the inference is made under the assumption that the shape
of the distribution of errors is not affected by variations in attention levels, that is, if it is assumed there
is no increment in kurtosis then, the bias and mean squared error of the estimate can be shown to
increase in a significant way. This situation is analyzed in example 2 below.
Second, uncertain precision and limited attention allow for the interaction of agents with precise estimates and low assessments of precision with agents with imprecise estimates and high assessments of
precision. This situation is expected to have implications for the frequency with which exchanges can
lead to extremely unfavorable terms for one side of the transaction. We evaluate this scenario in
example 3.
Example 2:
Consider parameter values and the distribution of beliefs as in example 1. The exercise focuses on the problem of characterizing the distribution of errors based on independent observations. In particular, consider the problem of estimating the 99th quantile for the absolute value of errors, that is, the 99th quantile for the distribution of $|m'-\theta|$. The estimation of this quantile can be required in analyses that evaluate the resiliency of systems by evaluating the expected loss in extreme scenarios.9
The distribution of $|m'-\theta|$ is approximated through the simulation of $10^7$ independent events. The approximated mean is 0.78 and the variance is approximately 0.57. The value of interest, the 99th percentile, is approximately 3.42.
The hypothetical learning task is developed based on information regarding errors observed across 100
independent instances. The estimation of the quantile is made through linear interpolation of the
cumulative probability function. After 10€ instances, the mean value of the estimations is 3.12, its
standard deviation is 0.75 and the mean absolute value of the errors equals 0.65.
This performance can be compared to the instance in which the distribution has zero excess kurtosis.
That is, the case of a normal distribution with the same mean and variance as in the previous case. In
this case, the true value of the 99th quantile can be rounded up to 2.52. As before, the estimation is
made after observing 100 independent occurrences. The 10€ estimations result in a mean estimate of
2.41, a standard deviation of 0.18 and mean absolute value of the errors equal to 0.22.
A comparison of these two cases shows that the estimation of errors is less precise in the case of
distributions with high kurtosis. The results suggest that the bias is larger, the estimate is more volatile
and the average error is large in the case of in which the distribution of errors is characterized by high
kurtosis.
At last, consider the original distribution with high kurtosis but imagine that the estimation of the 99th
percentile is made under the assumption that the distribution is normal. In this case the estimate equals
the 99th percentile of a normal distribution given the sample mean and sample variance that results
from the 100 independent occurrences. In this case the mean estimate equals 2.51 while the mean
squared error is 1.02. That is, the bias and the error of the estimate increase significantly if the change in
the shape of the distribution is not taken into account.
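The estimation exercise above can be sketched in a few lines. The snippet below is a minimal illustration rather than the paper's simulation: the heavy-tailed error distribution is replaced by a Student-t with 5 degrees of freedom (an assumption), the normal comparison case matches its variance, and the number of repetitions is scaled down; only the qualitative comparison of biases and mean absolute errors is the point.

```python
import math
import random

random.seed(1)

def q99(xs):
    """99th quantile via linear interpolation of the empirical CDF."""
    xs = sorted(xs)
    pos = 0.99 * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def student_t5():
    """Draw from a Student-t with 5 df: z / sqrt(chi2_5 / 5)."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(5))
    return z / math.sqrt(chi2 / 5.0)

def study(draw, n_obs=100, n_reps=2000):
    """Estimate the 99th quantile of |error| from n_obs draws, n_reps times."""
    true_q = q99([abs(draw()) for _ in range(200_000)])  # large-sample benchmark
    ests = [q99([abs(draw()) for _ in range(n_obs)]) for _ in range(n_reps)]
    mean_est = sum(ests) / n_reps
    mae = sum(abs(e - true_q) for e in ests) / n_reps
    return true_q, mean_est, mae

# heavy-tailed case vs. a normal with the same variance (Var(t_5) = 5/3)
q_t, mean_t, mae_t = study(student_t5)
q_n, mean_n, mae_n = study(lambda: random.gauss(0.0, math.sqrt(5.0 / 3.0)))

print(f"t(5):   true q99 {q_t:.2f}, mean estimate {mean_t:.2f}, MAE {mae_t:.2f}")
print(f"normal: true q99 {q_n:.2f}, mean estimate {mean_n:.2f}, MAE {mae_n:.2f}")
```

Under this stand-in distribution, the heavy-tailed case reproduces the qualitative pattern of the example: a downward-biased and substantially noisier estimate of the extreme quantile.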
Example 3:
Consider a two-stage setup with two agents. In the first period each agent obtains a signal about the
distribution of an unknown parameter $\theta$. Preferences and information are as developed in the model
described in the previous subsection. In the second stage, the agents use this information to value and
exchange a contract whose payoff is proportional to $\theta$. We assume that, given posterior beliefs, agents
participate in an exchange game. Agents submit their valuations truthfully and exchange the contract,
with the agent with the lowest valuation selling the contract. The truthful valuation of the contract by
each agent is a function of the mean and variance assigned to $\theta$ according to posterior beliefs:
$$V(\omega') = E_{\omega'}(\theta) - \frac{Var_{\omega'}(\theta)}{2} = \mu' - \frac{1}{2\lambda'(\alpha'-1)\beta'}$$
That is, we assume mean-variance preferences. Finally, we assume that the price of the contract equals
the average of the valuations of the sides participating in the transaction.
9
For example, this type of analysis is carried out in portfolio risk analysis under the label of Value at Risk.
This representation assumes that agents develop the learning task without an explicit representation of
the subsequent exchange stage. Nevertheless, the value of information resulting from the exchange
stage can be reflected in the values of the parameters of the first-stage payoff function. On the other
hand, the likelihood of trade and the price of the contract correspond to a situation in which agents
submit their valuations naively, without taking into account the value of information implicit in rivals'
actions or the market power implicit in two-sided bargaining. Hence, this exercise can be interpreted as
an analysis of the impact of heavy tails on trading when agents act naively.
The simulations evaluate the difference between the observed price and a benchmark value. A fair value
for the contract is not well defined in this context, in which agents have different beliefs regarding the
precision of their estimates and uncertainty can be reduced through a costly flow of information. In this
exercise, the benchmark value is calculated using the valuation function above together with average beliefs
regarding the value of $\theta$ and the average dispersion of this estimate. The mean absolute value of the
difference between the price and the benchmark value equals 0.66 and its standard deviation equals 0.875.
Consider alternatively a case with known precision, where the precision of the estimate of $\theta$ by each
agent equals the average precision in the previous setup and the value function is adjusted so that the
mean valuation is the same as in the previous example. Then the average absolute value of the
difference between the terms of trade and the benchmark price of the contract equals 0.62 and its standard
deviation equals 0.47.
This result illustrates how uncertain precision is associated with significantly larger deviations of the
terms of trade from what would be observed when expectations take their average values. The deviations
are larger on average, but the largest difference is in the volatility of the deviations. As expected
given the extreme distribution of errors, under uncertain precision and limited attention, prices that are
very close to average values and prices that are far off average values are both more likely than in the
known-precision case. Additionally, the example illustrates the problems with defining a fair price in a
context in which the information flow is endogenous and there are differences in the perception of the
value of this information.
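The pricing stage can be sketched directly from the valuation function. The snippet below is a hypothetical illustration: the posterior parameter values are invented for the example, and the variance formula $Var(\theta) = 1/(\lambda(\alpha-1)\beta)$ reflects one consistent reading of the scale-parameterized Normal-gamma beliefs, not code from the paper.

```python
def valuation(mu, lam, alpha, beta):
    """Mean-variance valuation: E[theta] - Var[theta]/2, with
    Var[theta] = 1/(lam*(alpha-1)*beta) under the Normal-gamma beliefs."""
    return mu - 1.0 / (2.0 * lam * (alpha - 1.0) * beta)

# hypothetical posteriors after the first-period learning stage:
# agent A holds a lower estimate but assesses higher precision than agent B
agent_a = (0.3, 4.0, 2.0, 0.9)
agent_b = (0.8, 4.0, 2.0, 0.4)

v_a = valuation(*agent_a)
v_b = valuation(*agent_b)
price = (v_a + v_b) / 2.0            # price = average of the two valuations
seller = "A" if v_a < v_b else "B"   # lowest valuation sells the contract
print(f"valuations: A={v_a:.3f}, B={v_b:.3f}; price={price:.3f}; seller={seller}")
```

Repeating this over simulated posteriors, and comparing the price with a benchmark built from average beliefs, produces the kind of deviation statistics reported above.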
4. The two period problem
This section presents a simple extension that allows for an analysis of the inter-temporal connections in
learning processes. The key assumptions of uncertain precision and endogenous determination of
attention are maintained. In this context, it can be assessed how innovations in beliefs are associated with
adjustments in the level of attention and, in this way, with the likelihood of further innovations in beliefs.
With this objective, a two-period model is analyzed. The payoff function of each period coincides with
the payoff function presented in the previous section. Each period contains two stages: the first stage
entails the determination of attention levels and the second stage involves the perception of an
informative signal and its incorporation into updated beliefs.
In the first stage of the first period, the initial level of attention $a$ is determined. Given the level of
attention $a$, the second stage of the first period involves perceiving signal $s$ and updating initial beliefs,
characterized by parameters $\omega = (\mu, \lambda, \alpha, \beta)$, according to Bayes' rule to get new parameters
$\omega' = (\mu', \lambda', \alpha', \beta')$.
The second period problem is identical to the one-period case analyzed in the previous section. That is,
given initial beliefs $\omega' = (\mu', \lambda', \alpha', \beta')$, the optimal level of attention is given by
$a' = \max\{0, (c(\alpha'-1)\beta')^{-1/2} - \lambda'\}$. After perceiving the associated informative signal $s'$, beliefs are
updated through Bayes' rule, resulting in a new set of parameters $\omega'' = (\mu'', \lambda'', \alpha'', \beta'')$. In particular, the
new center parameter is given by
$$\mu'' = \frac{\lambda'\mu' + a's'}{\lambda' + a'}.$$
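The two-period recursion can be written compactly. The sketch below assumes the Normal-gamma updating rules with a scale-parameterized gamma (so that $E[1/\tau] = 1/((\alpha-1)\beta)$); this is one consistent reading of the model's formulas, not code from the paper.

```python
def update(mu, lam, alpha, beta, a, s):
    """Normal-gamma update after observing signal s with attention a."""
    lam_new = lam + a
    mu_new = (lam * mu + a * s) / lam_new
    alpha_new = alpha + 0.5
    # scale parameter: 1/beta' = 1/beta + lam*a*(s-mu)^2 / (2*(lam+a))
    beta_new = 1.0 / (1.0 / beta + lam * a * (s - mu) ** 2 / (2.0 * lam_new))
    return mu_new, lam_new, alpha_new, beta_new

def optimal_attention(lam, alpha, beta, c):
    """a' = max{0, (c*(alpha-1)*beta)^(-1/2) - lam}."""
    return max(0.0, (c * (alpha - 1.0) * beta) ** -0.5 - lam)

# first period: beliefs (0, 3, 1.5, 1), attention a = 1, signal s = 2
omega1 = update(0.0, 3.0, 1.5, 1.0, 1.0, 2.0)   # -> (0.5, 4.0, 2.0, 0.4)
a_second = optimal_attention(omega1[1], omega1[2], omega1[3], 0.1)
```

With these numbers the surprising signal lowers the scale parameter enough that a full unit of second-period attention is recruited.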
4.1 Correlated volatility
The following result deals with a relationship between the innovations in each period. More specifically,
the result establishes a connection between changes in the value of the center parameter in the first
and second periods. For this purpose, an explicit expression is established for the variance of changes of
the center parameter in the second period conditional on the change in this parameter in the first
period. That is, the analysis focuses on the pattern followed by the expression $Var(\mu'' - \mu' \mid \mu' - \mu)$ as
a function of $\mu' - \mu$. The next proposition identifies the regions which determine whether the
conditional variance is a constant or an increasing function of $\mu' - \mu$.
Let $\bar{e}$ be such that if $\bar{e} = |\mu' - \mu| = \frac{a|s - \mu|}{\lambda + a}$ then $(c(\alpha' - 1)\beta')^{-1/2} = \lambda'$. That is, given initial beliefs and
attention level, $\bar{e}$ is the critical level above which positive levels of cognitive effort are recruited in the
second period.
Proposition 4:
Consider an initial profile of beliefs $\omega = (\mu, \lambda, \alpha, \beta)$ and initial attention level $a$. Additionally, let
$\omega' = (\mu', \lambda', \alpha', \beta')$ represent the updated parameters given signal $s$. There exists $\bar{e} \geq 0$ such that
$Var(\mu'' - \mu' \mid \mu' - \mu)$ is zero if $|\mu' - \mu| \leq \bar{e}$ and $Var(\mu'' - \mu' \mid \mu' - \mu)$ is strictly increasing in $|\mu' - \mu|$
for the range $\bar{e} < |\mu' - \mu|$.
Proof:
First, note that
$$\beta' = \left[\frac{1}{\beta} + \frac{\lambda a (s - \mu)^2}{2(\lambda + a)}\right]^{-1} = \left[\frac{1}{\beta} + \frac{\lambda(\lambda + a)(\mu' - \mu)^2}{2a}\right]^{-1}$$
decreases with $|\mu' - \mu|$. Also, for $(c(\alpha' - 1)\beta')^{-1/2} \leq \lambda'$ we have $a' = 0$, hence the variance is zero
for $\mu' - \mu$ satisfying the condition indicated in the proposition for this range of $|\mu' - \mu|$. Let $\bar{e}$ be such
that $(c(\alpha' - 1)\beta')^{-1/2} = \lambda'$ when $|\mu' - \mu| = \bar{e}$; equivalently, let
$$\bar{e} = \left[\frac{2a}{\lambda(\lambda + a)}\left(c(\lambda + a)^2(\alpha' - 1) - \frac{1}{\beta}\right)\right]^{1/2}.$$
Alternatively, for $|\mu' - \mu| > \bar{e}$, or equivalently $(c(\alpha' - 1)\beta')^{-1/2} > \lambda'$, we have
$a' = (c(\alpha' - 1)\beta')^{-1/2} - \lambda' > 0$ and
$$Var(\mu'' - \mu' \mid \mu' - \mu) = \left(\frac{a'}{\lambda' + a'}\right)^2 \left(\frac{1}{\lambda'} + \frac{1}{a'}\right)\frac{1}{(\alpha' - 1)\beta'} = \frac{a'}{\lambda'(\lambda' + a')(\alpha' - 1)\beta'},$$
where the variance was calculated for the Normal-gamma case. Replacing the expression for the
second-period attention level, $\lambda' + a' = (c(\alpha' - 1)\beta')^{-1/2}$:
$$Var(\mu'' - \mu' \mid \mu' - \mu) = \frac{1}{\lambda'(\alpha' - 1)\beta'} - \left(\frac{c}{(\alpha' - 1)\beta'}\right)^{1/2},$$
with derivative with respect to $(\alpha' - 1)\beta'$ given by
$$-\frac{1}{\lambda'\left((\alpha' - 1)\beta'\right)^2} + \frac{c^{1/2}}{2\left((\alpha' - 1)\beta'\right)^{3/2}} < 0 \quad \text{if} \quad (c(\alpha' - 1)\beta')^{-1/2} > \frac{\lambda'}{2},$$
which holds in this range since $a' > 0$ requires $(c(\alpha' - 1)\beta')^{-1/2} > \lambda'$. Since $(\alpha' - 1)\beta'$ decreases with
$|\mu' - \mu|$, this proves the proposition.
□
The proposition above identifies a chain of effects that determines the relationship between innovations
in each period. First, surprises, that is, high values of $|\mu' - \mu|$, lead to downward revisions in the beliefs
regarding precision; this is captured by the fall in $\beta'$. Second, subject to the condition that assures
positive levels of attention, these downward revisions are associated with higher attention levels $a'$. The
last link, the one connecting effort level and volatility in beliefs, is characterized by two effects. On one
hand, higher effort levels imply that new incoming information receives more weight, hence the
estimate of $\theta$ is expected to vary more intensely in the second period. On the other hand, the
information supplied by the signal will be less noisy, dampening the volatility in the estimate of $\theta$. The
previous result shows that information is processed in a way such that the first effect dominates.
It is worth emphasizing that this intertemporal connection between variations in estimates in
different periods would not hold if precision were known or attention were exogenously
determined. First, variation in attention is the source of the two channels that generate changes in the
conditional variance: both the signal weight and the precision of the signal are modified through
variation in attention levels. Second, under known precision there is no change in the scale parameter
which, in the analysis above, generates variation in attention levels. Hence, in these cases, large
informational shocks would not lead to variations in the expected size of innovations in beliefs in future
periods.
Example 4:
Consider a numerical example that illustrates how the conditional variance depends on the variation of
the center parameter in the previous period. Suppose initial parameter values given by
$\omega = (\mu, \lambda, \alpha, \beta) = (0, 3, 1.5, 1)$, marginal cost $c = 0.1$, precision given by $\tau = 1$ and initial effort given by
$a = 1$. In this way, for a given signal $s$ the updated beliefs involve parameter values
$$\omega' = (\mu', \lambda', \alpha', \beta') = \left(\frac{s}{4},\; 4,\; 2,\; \frac{1}{1 + 3s^2/8}\right).$$
As a result, the second-period effort level is given by $a' = \max\left\{0,\, (10 + 15s^2/4)^{1/2} - 4\right\}$. Finally, the
resulting conditional variance equals:
$$Var(\mu'' - \mu' \mid \mu' - \mu) = \begin{cases} 0 & \text{if } |s| < (8/5)^{1/2} \\[4pt] \dfrac{1 + 3s^2/8}{4} - \left(\dfrac{1 + 3s^2/8}{10}\right)^{1/2} & \text{if } (8/5)^{1/2} < |s|. \end{cases}$$
INSERT FIGURE 2
The example shows that for signal values sufficiently close to zero, $|s| < (8/5)^{1/2}$, there is no learning in
the second period and hence the variance is zero. In this case, the non-negativity constraint is binding:
the agent has a higher valuation for attention than for the information that the allocation of attention
generates. For this range of parameters, only the irreversibility of learning processes keeps the agent from
accepting a higher level of ignorance in exchange for receiving a payment equal to the cost (shadow
price) of the cognitive resources required to restore the original level of confidence.
If incoming information is sufficiently surprising, that is, if it satisfies $(8/5)^{1/2} < |s|$, the variance increases
with the size of the first-period surprise. The result holds because the effect of the higher weight assigned
to the incoming signal prevails over the effect of receiving less noisy signals.
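The piecewise variance can be checked numerically. The closed form in the snippet below is an algebraically equivalent rewriting of the example's conditional variance under its parameter values ($\omega = (0, 3, 1.5, 1)$, $a = 1$, $c = 0.1$) and the scale parameterization assumed in this reading; it is a verification sketch, not independent output.

```python
import math

def cond_var(s):
    """Var(mu'' - mu' | mu' - mu) for omega = (0, 3, 1.5, 1), a = 1, c = 0.1."""
    B = 1.0 + 3.0 * s ** 2 / 8.0                     # B = 1/beta'
    a_second = max(0.0, math.sqrt(10.0 * B) - 4.0)   # second-period attention
    if a_second == 0.0:
        return 0.0
    # equals a'/(lam'*(lam'+a')*(alpha'-1)*beta') after substituting lam'+a'
    return B / 4.0 - math.sqrt(B / 10.0)

threshold = math.sqrt(8.0 / 5.0)   # no second-period learning below this |s|
```

Below the threshold the variance is exactly zero; above it, the variance is continuous at the threshold and increasing in $|s|$, matching the proposition.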
4.2 Delay and non-monotonicity in cognitive effort
In this subsection the possibility of non-monotonicity in cognitive effort is evaluated. The two-period
problem can be represented as:
$$\max_{a} \; -E_{\omega}\left[(\mu' - \theta)^2\right] - c\,a + E_{\omega}\left[E_{\omega'}\left[-(\mu'' - \theta)^2 - c\,a'(\omega')\right]\right]$$
where $\omega'$ is the updated set of parameters given initial beliefs $\omega$, initial attention level $a$, and the costly
and costless signals. The function $a'(\omega')$ reflects the optimal decision of the second period, that
is, if $\lambda' > (c(\alpha' - 1)\beta')^{-1/2}$ then $a'(\omega') = 0$; alternatively, if $\lambda' < (c(\alpha' - 1)\beta')^{-1/2}$ then
$a'(\omega') = (c(\alpha' - 1)\beta')^{-1/2} - \lambda'$.
Two effort levels for the first period seem to constitute natural benchmarks. One corresponds to the
choice of a myopic agent that only considers the gains to be made in the current period; let $a^m(\omega)$
denote this level. The other case refers to a naïve analysis in which the attention level is selected
ignoring that additional effort can be exerted in the second period; let $a^n(\omega)$ represent this level.
The solution to the optimal level of attention in the first period, $a^*$, is expected to correspond to a value
that is larger than $a^m(\omega)$. This would be the case if the marginal return from allocating additional effort is
larger than in the myopic case, since the resulting information can be helpful in the second period.
Additionally, $a^*$ is expected to be smaller than $a^n(\omega)$. This would be the case if the marginal return to effort
in the standard problem is lower than the one in the naïve case due to the application of effort in the
second period.
This exercise can be compared to the case of known precision. It is easy to show that, given the constant
marginal cost of information and known precision, the optimal allocation of attention involves allocating
all effort in the first period. In contrast, with unknown precision, information feedback can lead to
reassessments that trigger additional effort. A downward reassessment of precision is a necessary but
not a sufficient condition for the allocation of additional attention in a subsequent period. Previous
information will, on average, result in a lower variance assigned to the estimates of $\theta$ and $\tau$. For
non-monotonicities in cognitive effort, the downward revision in precision needs to be large enough that it
overpowers the effect of the information accumulated in previous periods.
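The mechanism in this paragraph, information feedback triggering renewed effort, can be illustrated with a short sketch. The updating rules and attention formula below follow the scale-parameterized Normal-gamma reading assumed throughout; parameter values mirror example 4 and the two signal values are arbitrary choices for illustration.

```python
def update(mu, lam, alpha, beta, a, s):
    # Normal-gamma update (gamma in scale parameterization)
    lam_new = lam + a
    mu_new = (lam * mu + a * s) / lam_new
    beta_new = 1.0 / (1.0 / beta + lam * a * (s - mu) ** 2 / (2.0 * lam_new))
    return mu_new, lam_new, alpha + 0.5, beta_new

def attention(lam, alpha, beta, c):
    return max(0.0, (c * (alpha - 1.0) * beta) ** -0.5 - lam)

c = 0.1
a_first = 1.0                      # first-period attention, taken as given
omega = (0.0, 3.0, 1.5, 1.0)

# an unsurprising signal: precision is revised up, no further effort
calm = update(*omega, a_first, 0.5)
a_calm = attention(calm[1], calm[2], calm[3], c)

# a large surprise: precision is revised down, effort resumes and even
# exceeds the first-period level (a non-monotonicity)
surprise = update(*omega, a_first, 4.0)
a_surprise = attention(surprise[1], surprise[2], surprise[3], c)
```

With these numbers the calm path yields zero second-period attention, while the surprise path yields attention above the first-period level.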
The following numerical exercise provides examples of instances in which non-monotonicities are
observed.
Example 5:
Consider parameter values
= 1 and
parameters distributed according to
}~•(% − 1) and ~~U[0.5,1.5S.
= T = 0.001. Suppose Normal-gamma initial beliefs with
~ g0, h , ~U[0,6S, ~U[1,2S and /(2 )~}~ where
AC
According to the simulations, in the case in which the first-period effort is given by $a^*(\omega)$, the expected
value of precision decreases, that is, posterior beliefs satisfy $\alpha\beta > \alpha'\beta'$, with probability 44%.
Additionally, in this case, cognitive effort in the second period is positive with probability 0.24 and
higher than first-period effort with probability 0.07.
Alternatively, with first-period effort equal to $a^n(\omega)$, positive effort in the second period is observed
with probability 0.11 and second-period effort is higher than first-period effort with probability 0.03.
That is, even though past information flows tend to modify beliefs in a way that hinders the recruitment
of cognitive effort, the occurrence of large revisions of precision in the examples analyzed causes positive
and, sometimes, incremental levels of effort. The frequency of non-monotonicities would be higher in
contexts in which it is believed that the state of the world can change and, as a result, the value of past
information decreases with time. In this case, the variance assigned to the estimates decreases more
slowly with accumulated information, and a larger fraction of negative revisions of precision would generate
non-monotonicities in effort levels.
5. Concluding remarks
The present work develops a framework in which an agent is learning about multiple aspects. The value
of a parameter and the precision of its estimate are uncertain. Additionally, the agent determines
endogenously the level of cognitive resources allocated to this learning task. The results indicate that
this setup is associated with high kurtosis in the distribution of errors, non-monotonic recruitment of
cognitive resources and correlated volatility of the estimate of the parameter.
While this work does not focus on economic applications, it is reasonable to consider that numerous
economic scenarios are characterized by complex forms of uncertainty. In addition, in many situations
economic agents have the ability to allocate cognitive resources to advance their understanding if that is
judged convenient. Hence, the insights considered in this work can prove relevant in these types of
situations.
There are several aspects that have not been dealt with in the present work and whose consideration
seems valuable. First, this work deals with a setup in which the unknown parameters are stationary. This can
be interpreted as an analysis of learning surrounding a large innovation in the system. More generally, a
system in which the state of the world is not stationary could be considered. This environment requires
defining a rule through which the learner acknowledges these stochastic changes in the underlying
parameters.
Additionally, our setup assumes that signals are not correlated but, in any real-world situation, the
learner needs to assess the level of correlation of the signals received. This is another aspect that could
be considered in the representation of learning processes.
The analysis presented above considers individual learning processes and characterizes their properties.
Representations in which these learning processes unfold while agents interact with one another are a
tool that needs to be developed for the application of these insights to market dynamics. This
might require extending this framework to multiple-parameter settings in which learning is carried out
over both structural and strategic parameters.
The assumptions in this work provide ample room for learning from collecting data and introspection.
This implies bounds on the range of validity of the analysis. The approach focuses on situations in which
the application of sufficient attention can significantly improve assessments of the value of different actions.
For example, this captures the idea that investment or saving decisions can be greatly improved through
careful analysis of comprehensive evidence. But in many cases, learning from experience can be the
main source of knowledge. This is especially the case when the aspects that remain to be learned are so
complex that there is not enough capacity to advance understanding in a significant way through logical
reasoning and the collection of data.
6. Bibliography
Avery, C. and P. Zemsky, 1998, Multidimensional Uncertainty and Herd Behavior in Financial Markets,
American Economic Review, Vol. 88, No. 4, pp. 724-748.
Barber, B. and T. Odean, 2001, Boys Will Be Boys: Gender, Overconfidence, and Common Stock
Investment, Quarterly Journal of Economics, Vol. 116, No. 1, pp. 261-292.
Camerer, C. and D. Lovallo, 1999, Overconfidence and Excess Entry: An Experimental Approach,
American Economic Review, Vol. 89, No. 1, pp. 306-318.
DeGroot, M., 1986, Probability and Statistics, Addison-Wesley Publishing Company.
Hommes, C., 2006, Heterogeneous Agent Models in Economics and Finance, in L. Tesfatsion and
K. L. Judd (eds.), Handbook of Computational Economics, Vol. 2: Agent-Based Computational
Economics, North-Holland/Elsevier, Amsterdam.
Mackowiak, B. and M. Wiederholt, 2009, Optimal Sticky Prices under Rational Inattention,
American Economic Review, Vol. 99, pp. 769-803.
Malmendier, U. and G. Tate, 2005, CEO Overconfidence and Corporate Investment, Journal of
Finance, Vol. 60, No. 6, pp. 2661-2700.
Mandelbrot, B., 1963, The Variation of Certain Speculative Prices, Journal of Business, Vol. 36, pp. 394-419.
Peng, L. and W. Xiong, 2006, Investor Attention, Overconfidence and Category Learning, Journal of
Financial Economics, Vol. 80, No. 3.
Sanguinetti, P. and D. Heymann, 1998, Business Cycles from Misperceived Trends, Economic Notes, Vol.
27, No. 2.
Sims, C., 2003, Implications of Rational Inattention, Journal of Monetary Economics, Vol. 50, pp. 665-690.
Thurner, S., J. Farmer and J. Geanakoplos, 2011, Leverage Causes Fat Tails and Clustered Volatility, Cowles
Foundation Discussion Paper No. 1745R.
Weitzman, M., 2007, Subjective Expectations and Asset-Return Puzzles, American Economic Review, Vol.
97, No. 4.
Woodford, M., 2012, Inattentive Valuation and Reference-Dependent Choice, Columbia University,
mimeo.
Figure 1: Log-log plot of histogram of estimation errors (circles) vs. normal fitted distribution (line)
Figure 2: Conditional variance $Var(\mu'' - \mu' \mid \mu' - \mu)$ as a function of $\mu' - \mu$ – numerical example