Learning under limited attention and unknown precision of information

J. Daniel Aromí
IIEP - UBA - Conicet. UdeSA

June 2012 (First version April 2012)

Abstract: A Bayesian agent simultaneously learns about the payoff associated to a set of feasible actions and about the precision of the information used to generate the estimate. Learning is endogenous, as the agent selects different levels of scarce attention as a function of perceived precision. Variations in attention due to different assessments of precision result in heavier tails and sharper peaks in the distribution of estimation errors. Unlike the known-precision case, learning activity can present delays and non-monotonicities due to revisions in the perception of precision. Another difference with the known-precision case is that the sequence of updates shows correlated volatility, since a surprising signal leads to a higher level of attention, which affects the distribution of future revisions.

JEL Codes: D81, D84
Contact: [email protected]

1. Introduction

In any economic setting, decision making is assisted by learning processes that generate representations of the environment and, in particular, of the value associated to different actions. Under limited cognitive resources, these representations will necessarily be approximations whose accuracy depends on the cognitive resources recruited for the task. For example, investment decisions demand assessments of profitability. These assessments are built by acquiring information and inferring its implications. Relevant information includes features and outcomes of past projects and the evolution of plausible determinants of profitability. This information can be acquired with different levels of breadth and precision. Additionally, the inference of the implications of available information can be made with different levels of thoroughness. As a result, the accuracy of the assessment will be a function of the allocation of effort to the task.
In addition, for a given level of cognitive effort, the precision of the assessment will depend on specific features of the task. These aspects include the quantity of relevant factors, the complexity of their interaction, the ease with which main factors can be identified, and the facility with which their joint influence can be established. It is evident that these features of the specific task cannot be known exactly before undertaking the learning task. It is possible that random factors, such as the perceived consistency of available information, will determine the initial assessment of precision. As a result, determining the precision of the representation is part of the learning process. In this way, learning involves the multiple tasks of allocating cognitive effort, acquiring a representation of the environment and assessing the accuracy of that representation. These three aspects evolve jointly as part of a multifaceted process. In this process, attention is a function of judgments on accuracy. Assessed accuracy is a function of the consistency of incoming information. And, closing the cycle, the rate of arrival of information is determined by attention levels.

This work provides a tractable framework that represents this type of process. A Bayesian framework is provided in which beliefs with respect to payoffs and the precision of information are updated in response to the information flow. In addition, cognitive resources are recruited in response to the assessment of precision. This analysis constitutes one instance in which the relevance of learning processes is magnified due to the existence of multidimensional uncertainty and endogenous cognitive decisions. The present study reveals that the joint presence of precision uncertainty and limited attention has implications for the rate at which different magnitudes of estimation errors will be observed.
Variations in the assessments of precision lead to variations in attention levels or, in other words, different rates of information flow. Beliefs consistent with high assessments of precision lead to low attention, increasing the likelihood of relatively large errors, while beliefs consistent with low assessments of precision lead to high levels of attention, increasing the chances of relatively small mistakes. Hence, due to endogenous variation in attention levels, both small errors and large errors will occur at a higher rate. More specifically, in the exercise developed below, despite starting from a Gaussian set-up, the distribution of errors can present high levels of kurtosis. Fat tails and sharp peaks are observed, as the likelihood of extreme and small errors is higher than what would be expected when evaluating a fitted normal distribution. In the benchmark case, with known precision, learning can be characterized as a relatively tranquil progression in which uncertainties gradually disappear and cognitive effort decreases as a consequence. In contrast, with unknown precision, effort levels need not decrease with time. Learning can lead to reassessments of the scope of ignorance. New information with surprising content can lead to posterior beliefs in which precision is judged to be lower than previously held. This downward reassessment of precision can cause an increment in the recruitment of cognitive resources with respect to what was observed in previous periods. The opposite result holds in the case in which the initial information is unsurprising. That is, little news brings about little news. As a result, the joint presence of precision uncertainty and limited attention results in correlated volatility. The analysis is developed under the assumption that the agent updates its beliefs using Bayes' rule. One key element that guarantees tractability is the identification of a convenient representation of beliefs and signals.
The current analysis exploits the fact that, given normally distributed signals and Normal-gamma prior beliefs, the posterior beliefs are also given by a distribution belonging to the Normal-gamma family of distributions. That is, the Normal-gamma family is a conjugate prior for the normal likelihood. For the case of identically distributed signals this is a standard result in statistics.1 This work develops and makes use of a simple extension of this result for the case of effortful signals with varying levels of precision. The framework developed in the present study provides an expanded language for representing learning dynamics. In particular, shocks to expectations can take different forms. The most common form of expectational error refers to the noise in the estimates of payoff function parameters. In this work, additional forms of belief errors are provided: errors regarding the assessments of the precision of current knowledge and of incoming signals. This distinction is relevant, since different learning paths are associated to different profiles of errors in expectations. With respect to related contributions, this work can be linked to the literature on systematic errors in the calibration of confidence and the literature on rational inattention. In common with the first group of contributions2, this work allows for discrepancies between correct confidence intervals and assessed confidence intervals. In contrast with this literature, this work does not focus on biases in the estimation of confidence but on the consequences associated to errors in its estimation. In line with the literature on rational inattention3, this work allows for the endogenous determination of costly information flow. A key distinction of this contribution is that learning is performed in a context in which the proper allocation of attention is not known.

1 For an exposition of this result see DeGroot (1986).
2 For applications in economics, see Malmendier and Tate (2005) for overconfidence by CEOs, Camerer and Lovallo (1999) for overconfidence and entry, and Barber and Odean (2001) for overconfidence in financial investment decisions. 3 See for example Sims (2003), Woodford (2012), Mackowiak and Wiederholt (2010), Peng and Xiong (2006).

More generally, this work can be linked to contributions that measure or explain heavy-tailed distributions and correlated volatility in economic time series and, especially, financial time series4. This work provides a new mechanism that can explain heavy tails and clustered volatility. From this perspective, learning about precision and the endogenous recruiting of cognitive resources constitute one potential explanation of these statistical properties of economic time series. This work suggests that the consideration of models of learning that allow for rich forms of uncertainty can be relevant in economic analysis. In this respect this work follows contributions such as Avery et al. (1998), Sanguinetti et al. (1998) and Weitzman (2007). Avery et al. (1998) develop a framework in which multidimensional uncertainty allows for herding and mispricing in financial markets. In one of their examples, mispricing can occur due to the joint presence of uncertainty regarding three aspects: the occurrence of a shock, its effect and the quality of traders' information. Sanguinetti et al. (1998) consider an economy in which a shock has changed its long-term growth path; the agents are uncertain about the location of the new growth path and the rate of adjustment. As a result of the two-parameter uncertainty, non-monotonic trajectories can be observed, as mistakes in initial beliefs can be associated, temporarily, to inferences that increase certain prediction errors.
Weitzman (2007) suggests that asset pricing puzzles can be rationalized by models that allow for uncertainty regarding the growth rate of the economy and, additionally, uncertainty regarding the evolution of the parameters that determine the stochastic path. The next section develops the model and presents properties of the learning process. In section 3, the decision problem is analyzed. Section 4 considers the model in the case of multiple periods. Concluding remarks are provided in the last section.

2. Beliefs and Bayesian learning

We consider settings in which information allows an agent to select fitter actions. The agent is uncertain about the value of a parameter μ that affects the relative payoffs of available actions. In addition, we make two key assumptions regarding information and learning. First, the agent is uncertain about the value of a parameter τ which governs the precision of information. Second, the agent receives informative signals whose precision can be controlled by the allocation of effortful attention. Prior beliefs are assumed to belong to the Normal-gamma family of distributions. That is, beliefs are given by a bi-variate, four-parameter distribution function: (μ, τ) ~ NG(m, λ, α, β). This means that the precision parameter τ is believed to be distributed according to a Gamma distribution with shape parameter α and scale parameter β. Hence the mean precision is believed to be given by αβ and the variance of the precision is αβ². Additionally, conditional on the value of the precision parameter τ, the parameter μ is believed to be distributed normally with mean m and variance 1/(λτ).

4 For an early contribution see Mandelbrot (1963). Agent-based models in which agents use simple rules of thumb have also been developed to explain these statistical properties in financial markets; for example, see Hommes (2006). Thurner, Farmer and Geanakoplos (2011) suggest that fat tails and clustered volatility in finance can be explained by leverage levels.
Hence, under these beliefs, the mean of μ is given by m and the variance of μ is believed to be 1/((α − 1)βλ). Given μ and τ, signals are distributed normally with mean μ and variance 1/(eτ), where e represents the effortful attention allocated to the acquisition of the signal. Hence, for this type of signal, higher effort is associated to more precision or, equivalently, lower variance. In the jargon of Normal-gamma distributions, m is known as the center parameter and constitutes the expected value of μ independently of the value of the precision parameter. On the other hand, the parameter λ is known as the precision multiplier and it determines the relative informational value of the prior estimate of μ versus the incoming signals. The role of the shape parameter α and the scale parameter β of the Gamma distribution can be described by observing that when α → ∞ and β → 0 with αβ → τ̄, there is convergence to the case in which the precision parameter is known, since the variance of the precision parameter converges to 0, αβ² → 0. In other words, for a given mean value of the precision parameter αβ, a high shape parameter and a low scale parameter are associated with a relatively certain value of the precision parameter, while a low shape parameter and a high scale parameter are associated with high uncertainty about the precision of information. One reason to focus on prior beliefs of the Normal-gamma family is that learning from a group of independent normally distributed signals, with uninformative prior beliefs regarding the mean and variance of the signals, results in beliefs that can be represented as a bi-variate distribution of the Normal-gamma family. Given signals s1, …, sn, with si ~ N(μ, 1/τ), the sample mean s̄ = Σ si / n is distributed normally with mean μ and variance 1/(nτ). The product of the precision parameter and the sum of squared deviations, τ Σ (si − s̄)², is distributed according to a Chi-squared distribution with n − 1 degrees of freedom. The Chi-squared distribution is a special case of the Gamma distribution...
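The moments of the Normal-gamma beliefs stated above can be checked by simulation. The following minimal sketch (parameter values are illustrative, not taken from the paper) draws τ from a shape-scale Gamma distribution and μ conditionally normal, and verifies the stated mean and variance formulas:

```python
import numpy as np

# Sketch: verify the stated moments of Normal-gamma beliefs by simulation.
# tau ~ Gamma(shape=alpha, scale=beta) and mu | tau ~ N(m, 1/(lam*tau)).
rng = np.random.default_rng(0)
m, lam, alpha, beta = 1.0, 2.0, 3.0, 0.5

tau = rng.gamma(shape=alpha, scale=beta, size=1_000_000)
mu = rng.normal(loc=m, scale=1.0 / np.sqrt(lam * tau))

assert abs(tau.mean() - alpha * beta) < 1e-2                    # mean precision: alpha*beta
assert abs(tau.var() - alpha * beta ** 2) < 1e-2                # variance of precision: alpha*beta^2
assert abs(mu.mean() - m) < 1e-2                                # mean of mu: m
assert abs(mu.var() - 1.0 / ((alpha - 1) * beta * lam)) < 1e-2  # Var(mu) = 1/((alpha-1)*beta*lam)
```

Note that Var(μ) = E[1/(λτ)] involves E[1/τ] = 1/((α − 1)β), which is finite only for α > 1; this restriction reappears in the decision problem below.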
2.1 Bayesian updating

To update the beliefs after observing the effortful signal, we present a simple extension of the standard result for the case in which the precision of the signal is a function of the attention level e.

Proposition 1: Consider prior beliefs over (μ, τ) given by NG(m, λ, α, β) and a signal s with distribution N(μ, 1/(eτ)). Then, the posterior beliefs are given by NG(m′, λ′, α′, β′), where the updated parameters are given by:

m′ = (λm + es)/(λ + e)
λ′ = λ + e
α′ = α + 1/2
1/β′ = 1/β + λe(s − m)² / (2(λ + e))

Proof: Let f(τ) ∝ τ^(α−1) e^(−τ/β), f(μ | τ) ∝ (λτ)^(1/2) e^(−λτ(μ−m)²/2) and f(s | μ, τ) ∝ (eτ)^(1/2) e^(−eτ(s−μ)²/2). Then f(μ, τ | s) ∝ f(τ) f(μ | τ) f(s | μ, τ). The following identity can be verified:

λ(μ − m)² + e(s − μ)² = (λ + e)(μ − m′)² + (λe/(λ + e))(s − m)²

Hence, the conditional probability density function is given by:

f(μ, τ | s) = K [(λ′τ)^(1/2) e^(−λ′τ(μ−m′)²/2)] τ^(α′−1) e^(−τ/β′)

where K is a constant that does not depend on μ or τ. The expression inside brackets is, up to a constant coefficient, the probability density function of a normal random variable with mean m′ and precision λ′τ. The expression after the brackets equals, up to a constant coefficient, the probability density function of a random variable distributed according to a gamma distribution with shape parameter α′ and scale parameter β′. □

3. The allocation of attention

We analyze a decision where there exists uncertainty regarding a payoff function parameter and the precision of information. In this section, we analyze a single-period problem. In the next section, we analyze a multiple-period problem. We consider the problem of an agent that can acquire information in order to make better decisions but faces an opportunity cost associated to this acquisition. The payoff of the agent is separable into a term reflecting the fitness of the actions and a term capturing the opportunity cost of the cognitive resources.
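The updating rule of Proposition 1 above can be sketched and checked against brute-force Bayesian updating on a grid. The parameter names follow the text; the numerical values are illustrative, not taken from the paper:

```python
import numpy as np

# Sketch of the Proposition 1 update for Normal-gamma beliefs after an
# effortful signal s observed with attention level e.
def update(m, lam, alpha, beta, e, s):
    m_new = (lam * m + e * s) / (lam + e)
    lam_new = lam + e
    alpha_new = alpha + 0.5
    inv_beta_new = 1.0 / beta + lam * e * (s - m) ** 2 / (2.0 * (lam + e))
    return m_new, lam_new, alpha_new, 1.0 / inv_beta_new

m, lam, alpha, beta, e, s = 0.0, 1.0, 2.0, 1.0, 1.5, 1.2
m2, lam2, alpha2, beta2 = update(m, lam, alpha, beta, e, s)

# Brute-force check: prior density times likelihood on a (mu, tau) grid.
mu_g = np.linspace(-6.0, 6.0, 401)
tau_g = np.linspace(1e-3, 8.0, 400)
MU, TAU = np.meshgrid(mu_g, tau_g)
prior = TAU ** (alpha - 1) * np.exp(-TAU / beta) \
        * np.sqrt(TAU) * np.exp(-lam * TAU * (MU - m) ** 2 / 2.0)
like = np.sqrt(TAU) * np.exp(-e * TAU * (s - MU) ** 2 / 2.0)
post = prior * like
post /= post.sum()

# Grid moments match the closed-form Normal-gamma posterior.
assert abs((post * MU).sum() - m2) < 1e-2                # E[mu | s] = m'
assert abs((post * TAU).sum() - alpha2 * beta2) < 2e-2   # E[tau | s] = alpha' * beta'
```

The grid comparison confirms the conjugacy claim: the posterior computed numerically has the first moments implied by the updated Normal-gamma parameters.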
The first term is a function of the difference between the action selected in each period, a ∈ ℝ, and the value of an unknown parameter, μ. More precisely, this loss equals the mean squared difference between the action selected and the parameter value. A loss function that depends on the square of the error can be thought of as capturing instances in which big errors lead to a higher probability of big losses, such as death or bankruptcy. Given the symmetry of the distribution representing beliefs regarding μ, this loss is minimized by selecting an action equal to the posterior mean value of μ, that is, a = E[μ | s] = m′. We will assume that the agent always sets this optimal value for the action and in this way we focus exclusively on the selection of attention levels. The second term captures the assumption that, while allocating more attention leads to more precise estimates of μ, there is an associated constant marginal cost equal to c. A natural interpretation is that allocating attention is costly because it implies reducing the level of attention allocated to other tasks.5 Hence, after selecting the optimal value for a, the payoff function is given by:

U(e) = −E[(m′ − μ)² | e] − ce

The problem is reduced to one in which the agent selects the attention level in order to minimize this loss. The expectation in the expression above refers to the mean value as determined by Bayesian updating of the agent's prior beliefs for a given value of the effort level. In particular, the posterior mean m′ is the value updated through Bayes' formula. The expected value can be worked out to obtain a simple expression:

E[(m′ − μ)² | e] = E[ E[(m′ − μ)² | τ, e] ] = E[ 1/((λ + e)τ) ] = 1/((λ + e)(α − 1)β)

Note that this problem is well defined only when the shape parameter α is above 1; otherwise the variance assigned to μ is not a real number.
In other words, for a given value of the mean expected precision αβ, the problem is not well defined when the uncertainty regarding the variance is too high (low α and high β).6 The single-period problem takes a simple functional form:

max_{e ≥ 0}  −1/((λ + e)(α − 1)β) − ce

The first order conditions are given by:

1/((λ + e)²(α − 1)β) − c ≤ 0
e [1/((λ + e)²(α − 1)β) − c] = 0

The first order conditions are sufficient given the concavity of the objective function and the affine inequality constraint. The condition can be solved for an explicit expression of the optimal effort level. We summarize this result in proposition 2.

Proposition 2: Consider the problem of minimizing the loss function U(a, e) = −E[(a − μ)² | e] − ce given Normal-gamma beliefs and signals distributed normally with variance 1/(eτ). The optimal level of attention is given by:

e* = max{0, [c(α − 1)β]^(−1/2) − λ}

This expression shows that attention levels are a decreasing function of the prior belief parameters α and β. This means that increases in either of the parameters that determine the mean value of precision, α or β, result in lower effort levels. An increase in the precision multiplier λ, which results in a lower variance of the prior estimate, is also associated to lower attention levels. Additionally, a higher marginal cost of attention c results in less cognitive effort. There is a range over which no effort is allocated to the task; this is a property derived from the assumption of a constant marginal cost.

5 Alternative formulations could allow for increasing marginal costs and capacity constraints. Most of the findings are expected to be observed under these alternative, less tractable, formulations. 6 The existence of a range of parameter values for which the variance is not a real number suggests alternative formulations for the penalty associated to errors in estimation. For example, the loss from unfit actions could be calculated using percentiles of the estimated distribution of prediction errors.
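The closed form in Proposition 2 can be verified against a brute-force grid search over the expected loss. The following sketch uses illustrative parameter values (not the paper's calibration):

```python
import numpy as np

# Proposition 2: optimal attention level for Normal-gamma beliefs with
# expected loss 1/((lam+e)(alpha-1)*beta) + c*e.
def optimal_attention(lam, alpha, beta, c):
    return max(0.0, (c * (alpha - 1) * beta) ** -0.5 - lam)

def expected_loss(e, lam, alpha, beta, c):
    return 1.0 / ((lam + e) * (alpha - 1) * beta) + c * e

lam, alpha, beta, c = 1.0, 2.0, 0.5, 0.01
e_star = optimal_attention(lam, alpha, beta, c)

# Brute-force check: the closed form coincides with the grid minimizer.
grid = np.linspace(0.0, 50.0, 100_001)
e_grid = grid[np.argmin(expected_loss(grid, lam, alpha, beta, c))]
assert abs(e_star - e_grid) < 1e-2

# Corner solution: a sufficiently precise prior (high lam) makes zero effort optimal.
assert optimal_attention(20.0, 2.0, 0.5, 0.01) == 0.0
```

The second assertion illustrates the no-effort range discussed in the text: when λ already exceeds [c(α − 1)β]^(−1/2), the marginal value of the first unit of attention is below its cost.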
Additionally, in this problem the agent is not constrained regarding the amount of attention that can be allocated in order to improve the precision of the estimate. In particular, as α converges to 1, the level of recruited cognitive resources grows without bound. These are specific properties of this representation and are not considered to play a key role in the main conclusions of this work. The explicit solution of the allocation problem permits a clear illustration of how the present framework allows for multiple configurations of beliefs that are compatible with a given effort level. Informally, this analysis indicates that high levels of attention can result from believing that the agent did not learn in the past, believing that learning is hard, or believing that it is really difficult to assess how difficult it is to learn. In the first case, cognitive effort can be high due to the perception of little flow of information in the past. In this case, a low λ is the key factor that explains cognitive effort. Alternatively, high levels of attention can be due to the belief that, in general, information precision is low. In this case, beliefs are characterized by low values of αβ and it is concluded that the value of past and current information is low. Due to the high costs associated to big estimation mistakes, the agent is willing to set a high level of attention even though the gain in knowledge from each unit of effort is low. Lastly, for a given assessment of mean precision, high uncertainty regarding the value of precision results in high levels of recruited attention. In symbols, given αβ, recruitment of resources will be high when β is high or, equivalently, α is small. Using the rules for updating beliefs, the posterior distribution can be expressed through simple equations.
If no attention is recruited, the belief system is unchanged.7 With positive levels of attention, posterior beliefs belong to the same family of distributions, with new parameter levels given by:

m′ = (1 − k)m + ks
λ′ = λ + e = [c(α − 1)β]^(−1/2)
α′ = α + 1/2
1/β′ = 1/β + λk(s − m)²/2

where k = 1 − λ[c(α − 1)β]^(1/2), with 0 < k < 1. This parameter can be interpreted as determining the weight given to the signal s in the updated value of the center parameter m and the scale parameter β. This coefficient is expressed in terms of the prior belief parameters and the cost of attention.

3.1 Heavy tails

This framework characterizes in a tractable way the interaction between assessments of information precision and the recruitment of cognitive resources. In particular, we would like to evaluate its implications for the shape of the distribution of estimation errors. Consider a benchmark scenario of correct parameterization of precision. Then, shocks on beliefs that lower assessed precision lead to higher cognitive effort and higher precision of the posterior estimate of the parameter μ. On the other hand, shocks that increase estimated precision lead to lower attention and lower precision of the posterior. These links indicate that errors in the assessment of precision are associated with a distribution of posterior beliefs that, compared to the case of known precision, has a higher probability of small and big errors. That is, the distribution of errors is expected to show higher kurtosis. We show this property for a particular profile of initial beliefs. First, consider a benchmark case given by a situation in which the precision multiplier is a constant, λ = λ0, and precision is known: τ = τ̄. In addition, consider sequences satisfying α → ∞, β → 0 and αβ → τ̄. In the limit, cognitive effort is e = max{0, (cτ̄)^(−1/2) − λ0}. Additionally, assume that the center parameter of the prior beliefs is distributed normally, m ~ N(μ, 1/(λ0τ̄)). That is, errors are normally distributed, the estimate is unbiased and the precision multiplier parameter is properly calibrated. Then, in the benchmark case, the error of the posterior estimate also has a normal distribution, with zero mean and variance equal to Var(m′ − μ) = min{1/(λ0τ̄), (c/τ̄)^(1/2)}. Alternatively, a situation in which precision is uncertain can be considered. Assume that the precision multiplier parameter is, as before, λ = λ0, but α = γ and β = τ̄/γ with γ > 1. In this case the mean value of the precision parameter is correct, as E(τ) = αβ = τ̄, but the parameter values imply that there remains uncertainty regarding its value, since Var(τ) = αβ² = τ̄²/γ. This uncertainty disappears as γ → ∞. Then, attention levels are given by e = max{0, [cτ̄(γ − 1)/γ]^(−1/2) − λ0}. In this way we are able to detect one of the effects of uncertain precision. Compared to the benchmark case with certain precision, effort levels are higher. For a given expected value of precision, uncertainty regarding the value of the precision parameter leads to a higher perception of the benefits of allocating effortful attention. The key term that captures this relationship is the factor (γ − 1)/γ: since it is smaller than 1, it raises [cτ̄(γ − 1)/γ]^(−1/2) above the benchmark level. As indicated in the previous discussion, high attention is expected to generate small errors in the estimation of μ. On the other hand, low attention is likely to generate relatively large errors. Hence, the relative probability of large, medium and small errors is a function of variation in the assessment of precision.

7 Naturally, minor modifications could result in a set-up in which agents perceive signals that do not require the recruitment of costly cognitive resources. For example, this type of formulation would be convenient for the study of situations in which the agent needs to decide between a large number of alternatives whose values are, to a large extent, independent.
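The comparison between the known-precision benchmark and the uncertain-precision case can be sketched directly from the two effort formulas. Parameter values below are illustrative:

```python
# Effort under known precision tau_bar versus uncertain precision with the
# same mean (alpha = gamma, beta = tau_bar/gamma, gamma > 1), following the
# two formulas in the text.
def effort_known(tau_bar, lam0, c):
    return max(0.0, (c * tau_bar) ** -0.5 - lam0)

def effort_uncertain(tau_bar, lam0, c, gamma):
    # (alpha - 1) * beta = (gamma - 1) * tau_bar / gamma < tau_bar
    return max(0.0, (c * tau_bar * (gamma - 1) / gamma) ** -0.5 - lam0)

tau_bar, lam0, c = 1.0, 1.0, 0.01
for gamma in (1.5, 2.0, 10.0, 1e6):
    # Uncertainty about precision always (weakly) raises attention.
    assert effort_uncertain(tau_bar, lam0, c, gamma) >= effort_known(tau_bar, lam0, c)

# As gamma grows, the uncertainty vanishes and effort converges to the benchmark.
assert abs(effort_uncertain(tau_bar, lam0, c, 1e6) - effort_known(tau_bar, lam0, c)) < 1e-3
```

The loop makes the comparative static concrete: lower γ (more precision uncertainty) shrinks (α − 1)β and therefore raises the optimal attention level, exactly the channel emphasized in the text.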
To evaluate the impact of variations in the assessments of precision on the shape of the distribution of estimation errors, a specific form of variation in beliefs is proposed. The following proposition gives a specific expression for the impact of this link in the case in which assessed precision takes two values with equal probability. It is shown that variation leads to excess kurtosis, that is, to a distribution of errors with a sharper peak and fatter tails.

Proposition 3: Consider a setting in which, with probability ½, beliefs are given by B1 = NG(m, λ1, α1, β1), with associated attention level e1 > 0, and, with probability ½, beliefs are given by B2 = NG(m, λ2, α2, β2), with associated attention level e2 > 0. Then, the distribution of errors presents excess kurtosis if and only if e1 ≠ e2. Excess kurtosis is given by 3[(1/(λ1 + e1) − 1/(λ2 + e2)) / (1/(λ1 + e1) + 1/(λ2 + e2))]².

Proof: For each configuration of the parameters, the errors are distributed normally, with mean equal to 0 and variances σ1² and σ2² proportional to 1/(λ1 + e1) and 1/(λ2 + e2) respectively, since, given the true precision τ, the error of the posterior estimate under configuration i has variance 1/((λi + ei)τ). By definition, given normal distributions, excess kurtosis for a given configuration of beliefs equals 0. In symbols, μ4,i/σi⁴ − 3 = 0, where μ4,i denotes, with some abuse of notation, the fourth moment around the mean for parameter configuration i; that is, μ4,i = 3σi⁴. The variance of the error under an unknown parameter configuration is given by Var(m′ − μ) = E[(m′ − μ)²] = (σ1² + σ2²)/2. Similarly, the fourth moment around the mean is given by μ4 = E[(m′ − μ)⁴] = (μ4,1 + μ4,2)/2.
Hence:

ExKurt = μ4/[Var(m′ − μ)]² − 3 = 2(μ4,1 + μ4,2)/(σ1² + σ2²)² − 3 = 6(σ1⁴ + σ2⁴)/(σ1² + σ2²)² − 3
= 3(σ1⁴ − 2σ1²σ2² + σ2⁴)/(σ1² + σ2²)² = 3[(σ1² − σ2²)/(σ1² + σ2²)]²
= 3[(1/(λ1 + e1) − 1/(λ2 + e2)) / (1/(λ1 + e1) + 1/(λ2 + e2))]²

where in the last equality we replaced the expression for the standard deviation of the errors in each parameter configuration (the common factor 1/τ cancels). Hence there exists excess kurtosis if and only if attention levels are different for each configuration. In addition, kurtosis increases with the square of the difference in attention levels as a fraction of mean attention levels. By postulating m1 = m2 = m, the analysis discards the effect of past information regarding the mean value of μ and focuses on the estimation errors that are explained by assessments of precision and the newly acquired information. The result holds for any value of the remaining parameters. The result can be interpreted as describing the type of uncertainty faced by an observer that tries to assess the frequency of different errors given limited information regarding the agent's beliefs. □

This result suggests an additional explanation for heterogeneity in beliefs about structural parameters. In this case, heterogeneity is explained by variations in the assessment of precision that lead to variations in attention. Here the heterogeneity result takes the form of differences in the variance of the estimate. In a context with exogenous attention, this causal link between the assessment of precision and the properties of the estimate would not be observed. In addition, in the case of normal identically distributed signals, the estimates of the mean and variance are independent.8 The following numerical example illustrates this property for a continuous multivariate distribution of beliefs.

Example 1: Consider parameter value μ = 0. Suppose Normal-gamma initial beliefs with m ~ N(0, 1/(λτ)), λ ~ U[0, 6], α ~ U[1, 2] and β = xy, where x ~ χ²(2) and y ~ U[0.5, 1.5]. Finally, assume cost parameter c = 0.001.
The distributions of the example can be considered as a noisy version of the beliefs that would result after receiving three signals. Figure 1 presents the log-log plot of the histogram of the estimation errors for a simulation of size 10^6. The histogram can be compared to the plot of the probability density function of a normal distribution with the same mean and variance as the sample errors of the simulation. As can be seen, the errors of the simulation present tails that are clearly heavier than the fitted normal benchmark.

8 See DeGroot (1986).

INSERT FIGURE 1

The property presented in proposition 3 and illustrated in the previous example is similar to, but different from, the heavy-tailed distribution (in the normal case, a Student-t distribution) of an unknown parameter estimated under uncertain variance. Uncertain variance with exogenous recruitment of attention results in wider confidence intervals but generates no change in the shape of the distribution of errors of the estimated parameter. In contrast, this work shows that under endogenous attention there are heavy tails in the distribution of errors.

3.2 Implications of heavy tails

So far, the analysis has focused on the description of learning processes associated to individual decision problems. The next two examples intend to show how these findings can prove highly relevant in situations that transcend the individual decision problem. First, consider a case in which an external observer, for example a regulator in a financial market, would like to learn about the rate of occurrence of errors of different magnitudes. Suppose that this actor is able to view a sample of errors that took place in independent instances. It will be shown that a distribution of errors with high kurtosis implies a less precise learning process.
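Two of the claims in this section can be checked in a stylized simulation that uses a two-variance normal mixture, as in Proposition 3, as a stand-in for the heavy-tailed error distribution (the mixture parameters are illustrative, not the paper's calibration): the sample excess kurtosis matches the closed form 3[(σ1² − σ2²)/(σ1² + σ2²)]², and a tail-quantile estimate based on a small sample is noisier than under a normal distribution of equal variance:

```python
import numpy as np

# Equal mixture of two zero-mean normals: the Proposition 3 error distribution.
rng = np.random.default_rng(0)
s1, s2 = 0.3, 1.7

def sample(size):
    heavy = rng.integers(0, 2, size).astype(bool)
    return rng.normal(0.0, np.where(heavy, s2, s1), size)

# Claim 1: excess kurtosis equals 3*((s1^2 - s2^2)/(s1^2 + s2^2))^2.
x = sample(2_000_000)
excess_kurt = ((x - x.mean()) ** 4).mean() / x.var() ** 2 - 3.0
closed_form = 3.0 * ((s1 ** 2 - s2 ** 2) / (s1 ** 2 + s2 ** 2)) ** 2
assert abs(excess_kurt - closed_form) < 0.15

# Claim 2 (regulator's problem): estimating the 99th percentile of |error|
# from 100 observations is noisier under high kurtosis than under a normal
# distribution with the same overall variance.
sigma = np.sqrt((s1 ** 2 + s2 ** 2) / 2.0)
q_mix = np.array([np.quantile(np.abs(sample(100)), 0.99) for _ in range(4000)])
q_norm = np.array([np.quantile(np.abs(rng.normal(0, sigma, 100)), 0.99) for _ in range(4000)])
assert q_mix.std() > q_norm.std()
```

The second check previews the logic of example 2 below: the external observer's quantile estimate inherits the extra dispersion created by endogenous attention.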
In addition, if the inference is made under the assumption that the shape of the distribution of errors is not affected by variations in attention levels, that is, if it is assumed that there is no increment in kurtosis, then the bias and mean squared error of the estimate can be shown to increase in a significant way. This situation is analyzed in example 2 below. Second, uncertain precision and limited attention allow for the interaction of agents with precise estimates and low assessments of precision with agents with imprecise estimates and high assessments of precision. This situation is expected to have implications for the frequency with which exchanges lead to extremely unfavorable terms for one side of the transaction. We evaluate this scenario in example 3.

Example 2: Consider parameter values and distributions of beliefs as in example 1. The exercise focuses on the problem of characterizing the distribution of errors based on independent observations. In particular, consider the problem of estimating the 99th quantile of the absolute value of the errors, that is, the 99th quantile of the distribution of |m′ − μ|. The estimation of this quantile can be required in analyses that evaluate the resiliency of systems by evaluating the expected loss in extreme scenarios.9 The distribution of |m′ − μ| is approximated through the simulation of 10^5 independent events. The approximated mean is 0.78 and the variance is approximately 0.57. The value of interest, the 99th percentile, is approximately 3.42. The hypothetical learning task is developed based on information regarding errors observed across 100 independent instances. The estimation of the quantile is made through linear interpolation of the cumulative probability function. After 10^6 instances, the mean value of the estimations is 3.12, its standard deviation is 0.75 and the mean absolute value of the errors equals 0.65. This performance can be compared to the instance in which the distribution has zero excess kurtosis.
That is, the case of a normal distribution with the same mean and variance as in the previous case. In this case, the true value of the 99th quantile can be rounded to 2.52. As before, the estimation is made after observing 100 independent occurrences. The 10^6 estimations result in a mean estimate of 2.41, a standard deviation of 0.18 and a mean absolute value of the errors equal to 0.22. A comparison of these two cases shows that the estimation of errors is less precise in the case of distributions with high kurtosis. The results suggest that the bias is larger, the estimate is more volatile and the average error is larger in the case in which the distribution of errors is characterized by high kurtosis. Finally, consider the original distribution with high kurtosis but imagine that the estimation of the 99th percentile is made under the assumption that the distribution is normal. In this case the estimate equals the 99th percentile of a normal distribution given the sample mean and sample variance that result from the 100 independent occurrences. In this case the mean estimate equals 2.51 while the mean squared error is 1.02. That is, the bias and the error of the estimate increase significantly if the change in the shape of the distribution is not taken into account.

Example 3: Consider a two stage setup with two agents. In the first period each agent obtains a signal about the distribution of an unknown parameter θ. Preferences and information are as developed in the model described in the previous subsection. In the second stage, the agents use this information to value and exchange a contract whose payoff is proportional to θ. We assume that, given posterior beliefs, agents participate in an exchange game. Agents submit their valuations truthfully and exchange the contract, with the agent with the lowest valuation selling the contract.
The truthful valuation of the contract by each agent is a function of the mean and variance assigned to θ according to posterior beliefs:

V(ω') = E(θ) − Var(θ)/2 = μ' − β'/(2λ'(α' − 1))

That is, we assume mean-variance preferences. Finally, we assume that the price of the contract equals the average of the valuations of the two sides participating in the transaction. This representation assumes that agents develop the learning task without an explicit representation of the subsequent exchange stage. Nevertheless, the value of information resulting from the exchange stage can be reflected in the value of the parameters of the first stage payoff function. On the other hand, the likelihood of trade and the price of the contract correspond to a situation in which agents submit their valuations naively, without taking into account the value of information implicit in the rival's actions or the market power implicit in two sided bargaining. Hence, this exercise can be interpreted as an analysis of the impact of heavy tails on trading when agents act naively. The simulations evaluate the difference between the observed price and a benchmark value. A fair value for the contract is not well defined in this context, in which agents have different beliefs regarding the precision of their estimates and uncertainty can be reduced through a costly flow of information. In this exercise, the benchmark value is calculated using the valuation function above together with average beliefs regarding the value of θ and the average dispersion of this estimate. The mean absolute value of the difference between the price and the benchmark value equals 0.66 and its standard deviation equals 0.875.

9 For example, this type of analysis is carried out in portfolio risk analysis under the label of Value at Risk.
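A sketch of this exchange game follows. The belief draws below are illustrative assumptions, not the paper's simulation design: each agent values the contract at E(θ) − Var(θ)/2 with Var(θ) = β/(λ(α − 1)), the price is the average of the two valuations, and deviations from a benchmark computed at average beliefs are compared with a known-precision counterpart in which every agent assigns the average variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000   # number of simulated exchanges

# Illustrative draws of the two agents' posterior parameters (assumed).
mu = rng.normal(0.0, 1.0, (n, 2))
lam = rng.uniform(1.0, 5.0, (n, 2))
alpha = rng.uniform(1.5, 2.5, (n, 2))
beta = rng.exponential(2.0, (n, 2))

# Mean-variance valuation under Normal-gamma beliefs.
var = beta / (lam * (alpha - 1.0))
value = mu - var / 2.0
price = value.mean(axis=1)        # price = average of the two valuations

# Benchmark: valuation at the average belief about theta and the
# average dispersion of the estimate (population averages).
bench = mu.mean() - var.mean() / 2.0
dev = np.abs(price - bench)

# Known-precision counterpart: every agent assigns the average variance.
price_kp = (mu - var.mean() / 2.0).mean(axis=1)
dev_kp = np.abs(price_kp - bench)

print(dev.mean(), dev_kp.mean())  # uncertain precision: larger deviations
```

The dispersion in perceived variances adds a second source of price variation, so deviations of the terms of trade from the benchmark are larger under uncertain precision, as in the comparison reported in the text.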
Consider alternatively a case with known precision, where the precision of the estimate of θ by each agent equals the average precision in the previous setup and the value function is adjusted so that the mean valuation is the same as in the previous example. Then the average absolute value of the difference between the terms of trade and the benchmark price of the contract equals 0.62 and its standard deviation equals 0.47. This result illustrates how uncertain precision is associated with significantly larger deviations of the terms of trade from what would be observed when expectations take their average value. The deviations are larger on average, but the largest difference is in terms of the volatility of the deviations. As expected given the extreme distribution of errors, under uncertain precision and limited attention, prices that are very close to average values and prices that are greatly off average values are both more likely than in the known precision case. Additionally, the example illustrates the problems with defining a fair price in a context in which the information flow is endogenous and there are differences in the perception of the value of this information.

4. The two period problem

This section presents a simple extension that allows for an analysis of the inter-temporal connections in learning processes. The key assumptions of uncertain precision and endogenous determination of attention are maintained. In this context, it can be assessed how innovations in beliefs are associated with adjustments in the level of attention and, in this way, with the likelihood of further innovations in beliefs. With this objective, a two period model is analyzed. The payoff function of each period coincides with the payoff function presented in the previous section. Each period contains two stages: the first stage entails the determination of attention levels and the second stage involves the perception of an informative signal and its incorporation into updated beliefs.
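The intertemporal mechanism of this section can be previewed with a Monte Carlo sketch. It assumes the Normal-gamma updating rules and the square-root attention rule used throughout, with the parameter values of the numerical examples; surprising first period signals should be followed by larger second period revisions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Parameters as in the numerical examples (mu, lam, alpha, beta, cost, effort).
mu0, lam, alpha, beta, T, e1 = 0.0, 3.0, 1.5, 1.0, 0.1, 1.0
n = 200_000

# Draw the true precision and parameter from the prior, then the signal.
tau = rng.gamma(alpha, 1.0 / beta, n)
theta = mu0 + rng.normal(0.0, 1.0, n) / np.sqrt(lam * tau)
s = theta + rng.normal(0.0, 1.0, n) / np.sqrt(e1 * tau)

# First period Normal-gamma update.
lam1 = lam + e1
alpha1 = alpha + 0.5
beta1 = beta + (lam * e1 / (lam + e1)) * (s - mu0) ** 2 / 2.0
mu1 = (lam * mu0 + e1 * s) / (lam + e1)

# Second period attention and update (zero attention: no revision).
e2 = np.maximum(0.0, np.sqrt(beta1 / (T * (alpha1 - 1.0))) - lam1)
s2 = theta + rng.normal(0.0, 1.0, n) / np.sqrt(np.maximum(e2, 1e-12) * tau)
mu2 = np.where(e2 > 0, (lam1 * mu1 + e2 * s2) / (lam1 + e2), mu1)

# Correlated volatility: big first period revisions predict big second ones.
corr = np.corrcoef(np.abs(mu1 - mu0), np.abs(mu2 - mu1))[0, 1]
print(corr)
```

The positive correlation between the absolute revisions of the two periods is the correlated volatility property established formally below.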
In the first stage of the first period, the initial level of attention e is determined. Given the level of attention e, the second stage of the first period involves perceiving signal s and updating initial beliefs, characterized by parameters ω = (μ, λ, α, β), according to Bayes' rule to get new parameters ω' = (μ', λ', α', β'). The second period problem is identical to the one period case analyzed in the previous section. That is, given initial beliefs ω' = (μ', λ', α', β'), the optimal level of attention is given by

e' = max{0, (β'/(T(α' − 1)))^(1/2) − λ'}.

After perceiving the associated informative signal s', beliefs are updated through Bayes' rule, resulting in a new set of parameters ω'' = (μ'', λ'', α'', β''). In particular, the new center parameter is given by

μ'' = (λ'μ' + e's')/(λ' + e').

4.1 Correlated volatility

The following result deals with a relationship between the innovations in each period. More specifically, the result establishes a connection between changes in the value of the center parameter in the first and second periods. For this purpose, an explicit expression is established for the variance of changes of the center parameter in the second period conditional on the change in this parameter in the first period. That is, the analysis focuses on the pattern followed by the expression Var(μ'' − μ' | μ' − μ) as a function of μ' − μ. The next proposition identifies three regions which determine whether the conditional variance is constant or an increasing function of μ' − μ. Let ē be the value of |μ' − μ| = e|s − μ|/(λ + e) at which (β'/(T(α' − 1)))^(1/2) = λ'. That is, given initial beliefs and attention level, ē is the critical level above which positive levels of cognitive effort are recruited in the second period.

Proposition 4: Consider an initial profile of beliefs ω = (μ, λ, α, β) and an initial attention level e. Additionally, let ω' = (μ', λ', α', β') represent the updated parameters given signal s.
There exists ē ≥ 0 such that Var(μ'' − μ' | μ' − μ) is zero if |μ' − μ| ≤ ē and Var(μ'' − μ' | μ' − μ) is strictly increasing in |μ' − μ| for the range ē < |μ' − μ|.

Proof: First, note that λ' = λ + e and α' = α + 1/2 do not depend on the signal, while β' = β + (λe/(λ + e))(s − μ)²/2 is strictly increasing in |s − μ| and hence in |μ' − μ| = (e/(λ + e))|s − μ|. For |μ' − μ| ≤ ē we have (β'/(T(α' − 1)))^(1/2) ≤ λ', hence e' = 0, no signal is perceived in the second period and the conditional variance is zero. For |μ' − μ| > ē, conditional on ω' the innovation is μ'' − μ' = (e'/(λ' + e'))(s' − μ') and, in the Normal-gamma case, the predictive variance of the signal is Var(s' − μ') = (β'/(α' − 1))(1/e' + 1/λ'). Hence

Var(μ'' − μ' | μ' − μ) = (e'/(λ' + e'))² (β'/(α' − 1)) (λ' + e')/(e'λ') = (β'/(α' − 1)) e'/(λ'(λ' + e')).

Replacing the expression for the second period attention level, e' = (β'/(T(α' − 1)))^(1/2) − λ', this simplifies to

Var(μ'' − μ' | μ' − μ) = β'/(λ'(α' − 1)) − (Tβ'/(α' − 1))^(1/2).

The derivative of this expression with respect to β' is 1/(λ'(α' − 1)) − (1/2)(T/(β'(α' − 1)))^(1/2), which is strictly positive whenever β' > T(α' − 1)λ'²/4; since e' > 0 requires β' > T(α' − 1)λ'², the conditional variance is strictly increasing in β' and hence in |μ' − μ| on this range. □

The proposition above identifies a chain of effects that determine the relationship between innovations in each period. First, surprises, that is, high values of |μ' − μ|, lead to downward revisions in the beliefs regarding precision; this is captured by the fall in the expected precision α'/β'. Second, subject to the condition that assures positive levels of attention, these downward revisions are associated with higher attention levels e'. The last link, the one connecting effort level and volatility in beliefs, is characterized by two effects.
On one hand, higher effort levels imply that new incoming information will receive more weight, hence the estimate of θ is expected to vary more intensely in the second period. On the other hand, the information supplied by the signal will be less noisy, dampening the volatility of the estimate of θ. The previous result shows that information is processed in a way such that the first effect dominates.

It is worth emphasizing that this intertemporal connection between variations in estimates during different periods would not hold in the case in which precision is known or attention is exogenously determined. First, variation in attention is the source of the two channels that generate changes in the conditional variance: both the signal weight and the precision of the signal are modified through variation in attention levels. Second, under known precision there is no change in the scale parameter which, in the analysis above, generates variation in attention levels. Hence, in these cases, large informational shocks would not lead to variations in the expected size of innovations in beliefs in future periods.

Example 4: Consider a numerical example that illustrates how the conditional variance depends on the variation of the center parameter in the previous period. Suppose initial parameter values given by ω = (μ, λ, α, β) = (0, 3, 1.5, 1), marginal cost T = 0.1, precision given by η = 1 and initial effort given by e = 1. In this way, for a given signal s the updated beliefs involve parameter values:

ω' = (μ', λ', α', β') = (s/4, 4, 2, 1 + 3s²/8).

As a result, the second period effort level is given by

e' = max{0, (10 + 15s²/4)^(1/2) − 4}.

Finally, the resulting conditional variance equals:

Var(μ'' − μ' | μ' − μ) = 0 if (8/5)^(1/2) ≥ |s|, and
Var(μ'' − μ' | μ' − μ) = (1 + 3s²/8)/4 − (0.1(1 + 3s²/8))^(1/2) if (8/5)^(1/2) < |s|.

INSERT FIGURE 2

The example shows that for signal values sufficiently close to zero, |s| < (8/5)^(1/2), there is no learning in the second period and hence the variance is zero.
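The expressions of Example 4 can be checked with a short script, using the Normal-gamma update and the square-root attention rule as reconstructed here: the conditional variance is zero for |s| below (8/5)^(1/2) and increasing beyond it.

```python
import numpy as np

T = 0.1                                    # marginal cost of attention
mu, lam, alpha, beta = 0.0, 3.0, 1.5, 1.0  # initial beliefs of Example 4
e = 1.0                                    # first period attention

def cond_var(s):
    """Conditional variance of the second period innovation given signal s."""
    # Normal-gamma update after observing s with weight e.
    lam1 = lam + e
    alpha1 = alpha + 0.5
    beta1 = beta + (lam * e / (lam + e)) * (s - mu) ** 2 / 2.0
    # Second period attention: square-root rule with nonnegativity constraint.
    e2 = max(0.0, np.sqrt(beta1 / (T * (alpha1 - 1.0))) - lam1)
    # Innovation variance: (beta'/(alpha'-1)) * e' / (lam' * (lam' + e')).
    return (beta1 / (alpha1 - 1.0)) * e2 / (lam1 * (lam1 + e2))

threshold = np.sqrt(8.0 / 5.0)   # ~1.265: below this, no second period learning
print(cond_var(1.0), cond_var(2.0), cond_var(3.0))
```

For instance, for s = 2 the updated beliefs are (0.5, 4, 2, 2.5), second period attention equals 1 and the conditional variance equals 0.125, matching the closed form (1 + 3s²/8)/4 − (0.1(1 + 3s²/8))^(1/2).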
In this case, the nonnegativity constraint is binding: the agent has a higher valuation for attention than for the information that the allocation of attention generates. For this range of parameters, only the irreversibility of learning processes keeps the agent from accepting a higher level of ignorance in exchange for receiving a payment equal to the cost (shadow price) of the cognitive resources required to restore the original level of confidence. If incoming information is sufficiently surprising, that is, if it satisfies (8/5)^(1/2) < |s|, the variance increases with the size of the first period surprise. The result holds because the effect of the higher weight assigned to the incoming signal prevails over the effect of receiving less noisy signals.

4.2 Delay and non-monotonicity in cognitive effort

In this subsection the possibility of non-monotonicity in cognitive effort is evaluated. The two-period problem can be represented as:

max_e −E[(μ' − θ)² | ω] − Te + E[ E[−(μ'' − θ)² − T e(ω') | ω'] | ω ]

where ω' is the updated set of parameters given initial beliefs ω, the initial attention level, the costly signal and the costless signal. The function e(ω') reflects the optimal decision of the second period, that is, if λ' > (β'/(T(α' − 1)))^(1/2) then e(ω') = 0; alternatively, if λ' < (β'/(T(α' − 1)))^(1/2) then e(ω') = (β'/(T(α' − 1)))^(1/2) − λ'.

Two effort levels for the first period seem to constitute natural benchmarks. One corresponds to the choice of a myopic agent that only considers the gains to be made in the current period; let e^m(ω) denote this level. The other case refers to a naïve analysis in which the attention level is selected ignoring that additional effort can be exerted in the second period; let e^n(ω) represent this level. The solution to the optimal level of attention in the first period, e*, is expected to correspond to a value that is larger than e^m.
This would be the case if the marginal return from allocating additional effort is larger than in the myopic case, since the resulting information can be helpful in the second period. Additionally, e* is expected to be smaller than e^n. This would be the case if the marginal return to effort in the standard problem is lower than the one in the naïve case due to the application of effort in the second period.

This exercise can be compared to the case of known precision. It is easy to show that, given the constant marginal cost of information and known precision, the optimal allocation of attention involves allocating all effort in the first period. In contrast, with unknown precision, information feedback can lead to reassessments that lead to additional effort. A downward reassessment of precision is a necessary but not a sufficient condition for the allocation of additional attention in a subsequent period. Previous information will, on average, result in a lower variance assigned to the estimates of θ and of its precision. For non-monotonicities in cognitive effort, the downward revision in precision needs to be large enough so that it overpowers the effect of accumulated information from previous periods. The following numerical exercise provides examples of instances in which non-monotonicities are observed.

Example 5: Consider parameter values η = 1 and T = 0.001, with the true precision distributed U[0.5, 1.5). Suppose Normal-gamma initial beliefs with μ centered at zero, λ ~ U[0, 6), α ~ U[1, 2) and the scale parameter β distributed Exp(α − 1). According to the simulations, in the case in which first period effort is given by e*(ω), the expected value of precision decreases, that is, posterior beliefs satisfy α/β > α'/β', with probability 44%. Additionally, in this case, cognitive effort in the second period is positive with probability 0.24 and higher than first period effort with probability 0.07.
Alternatively, with first period effort equal to the naïve level e^n(ω), positive effort in the second period is observed with probability 0.11 and second period effort is higher than first period effort with probability 0.03. That is, even though past information flow tends to modify beliefs in a way that hinders the recruitment of cognitive effort, the occurrence of large revisions of precision in the examples analyzed causes positive and, sometimes, incremental levels of effort. The frequency of non-monotonicities would be higher in contexts in which it is believed that the state of the world can change and, as a result, the value of past information decreases with time. In this case, the variance assigned to the estimates decreases more slowly and a larger fraction of the negative revisions of precision would generate non-monotonicities in the effort level.

5. Concluding remarks

The present work develops a framework in which an agent is learning about multiple aspects: the value of a parameter and the precision of its estimate are both uncertain. Additionally, the agent determines endogenously the level of cognitive resources allocated to this learning task. The results indicate that this setup is associated with high kurtosis in the distribution of errors, non-monotonic recruitment of cognitive resources and correlated volatility of the estimate of the parameter. While this work does not focus on economic applications, it is reasonable to consider that numerous economic scenarios are characterized by complex forms of uncertainty. In addition, in many situations economic agents have the ability to allocate cognitive resources to advance their understanding if that is judged convenient. Hence, the insights considered in this work can prove relevant in this type of situation. There are several aspects that have not been dealt with in the present work and whose consideration seems valuable. First, this work deals with a setup in which the unknown parameters are stationary.
This can be interpreted as an analysis of learning surrounding a large innovation in the system. More generally, a system in which the state of the world is not stationary could be considered. This environment requires defining a rule through which the learner acknowledges these stochastic changes in the underlying parameters. Additionally, our setup assumes that signals are not correlated but, in any real world situation, the learner needs to assess the level of correlation of the signals received. This is another aspect that could be considered in the representation of learning processes. The analysis presented above considers an individual learning process and characterizes its properties. Representations in which this type of learning process is developed while interacting with other agents are a tool that needs to be developed for the application of these insights to market dynamics. This might require extending this framework to multiple parameter settings in which learning is carried out for both structural and strategic parameters. The assumptions in this work provide ample room for learning from collecting data and introspection. This implies bounds on the range of validity of the analysis. The approach focuses on situations in which the application of sufficient attention can significantly improve assessments of the value of different actions. For example, this captures the idea that investment or saving decisions can be greatly improved through careful analysis of comprehensive evidence. But in many cases, learning from experience can be the main source of knowledge. This is especially the case when the aspects that remain to be learned are so complex that there is not enough capacity to advance understanding in a significant way through logical reasoning and collection of data.

6. Bibliography

Avery, C. and P. Zemsky, 1998, Multidimensional Uncertainty and Herd Behavior in Financial Markets, American Economic Review, Vol. 88, No. 4, pp.
724-748.
Barber, B. and T. Odean, 2001, Boys will be boys: Gender, overconfidence, and common stock investment, Quarterly Journal of Economics, Vol. 116, No. 1, pp. 261-292.
Camerer, C. and D. Lovallo, 1999, Overconfidence and excess entry: an experimental approach, American Economic Review, Vol. 89, No. 1, pp. 306-318.
DeGroot, M., 1986, Probability and Statistics, Addison Wesley Publishing Company.
Hommes, C., 2006, Heterogeneous Agent Models in Economics and Finance, in L. Tesfatsion and K. L. Judd (editors), Handbook of Computational Economics, Vol. 2: Agent-Based Computational Economics, Handbooks in Economics Series, North-Holland/Elsevier, Amsterdam.
Mackowiak, B. and M. Wiederholt, 2009, Optimal Sticky Prices under Rational Inattention, American Economic Review, Vol. 99, pp. 769-803.
Malmendier, U. and G. Tate, 2005, CEO Overconfidence and Corporate Investment, Journal of Finance, Vol. 60, No. 6, pp. 2661-2700.
Mandelbrot, B., 1963, The variation of certain speculative prices, Journal of Business, Vol. 36, pp. 394-419.
Peng, L. and W. Xiong, 2006, Investor attention, overconfidence and category learning, Journal of Financial Economics, Vol. 80, Issue 3.
Sanguinetti, P. and D. Heymann, 1998, Business Cycles From Misperceived Trends, Economic Notes, Vol. 27, No. 2.
Sims, C., 2003, Implications of Rational Inattention, Journal of Monetary Economics, Vol. 50, pp. 665-690.
Thurner, S., J. Farmer and J. Geanakoplos, 2011, Leverage causes fat tails and clustered volatility, Cowles Foundation Discussion Paper No. 1745R.
Weitzman, M., 2007, Subjective Expectations and Asset-Return Puzzles, American Economic Review, Vol. 97, No. 4.
Woodford, M., 2012, Inattentive Valuation and Reference Dependent Choice, Columbia University, mimeo.

Figure 1: Log-log plot of histogram of estimation errors (circles) vs.
normal fitted distribution (line)

Figure 2: Conditional Variance Var(μ'' − μ' | μ' − μ) as a function of μ' − μ (numerical example)