Pairwise influences in dynamic choice: method and application

Stefano Nasini∗ Victor Martínez-de-Albéniz†

February 11, 2017

Abstract

Choices of different individuals over time exhibit pairwise associations in a wide range of economic contexts. Network models for the processes by which decisions propagate through social interaction have been studied before and widely applied to marketing, but only a few consider unknown network structures. In fact, while it is typically possible to directly observe individual choices, inferring individual influences (who influences whom) might be difficult in the general case, as it requires strong modeling assumptions on the cross-section dependencies of the associated multidimensional panels. This paper proposes a class of exponential random models to jointly deal with dynamic choices of individuals over items together with the structure of pairwise influences between them. We analyze the properties of the model and develop a statistical methodology to estimate its parameters. We then present an empirical marketing analysis of music broadcasting, where a set of songs is diffused over radio stations; we infer station-to-station influences. After uncovering the influences in the station network, we analyze the problem of deciding which station one should choose to first launch a song.

Key words: Diffusion on social networks, pairwise influences, multidimensional panel data, influencer marketing, music industry.

1 Introduction

When studying choices of multiple agents, spill-over and imitation emerge as a consequence of social interactions in a wide range of economically relevant contexts. They have been shown to affect individual and social outcomes to a large extent (Granovetter 1978, Schelling 2006, Goldenberg et al. 2001).
Clear examples are demand-side economies of scale – where the attractiveness of a commodity increases as a function of the total number of consumers (Blind 2004) – and also several classes of monopolistic competition – where product differentiation depends on the imitation between companies (Dixit and Stiglitz 1977). In fact, this condition covers a wide range of market structures where decision makers are influenced¹ by the decisions of their competitors through imitation, such as in the information technology, apparel retailing or music broadcasting industries. In the latter case, which is the application that we present in this paper, the degree of synchronized similarity between programme schedules of broadcasting stations might provide a hint on how much they imitate each other.

∗ Corresponding author: IESEG School of Management, (LEM CNRS 9221), Lille/Paris, France, [email protected]
† IESE Business School, University of Navarra, Barcelona, Spain, [email protected]. V. Martínez-de-Albéniz’s research was supported in part by the European Research Council - ref. ERC-2011-StG 283300-REACTOPS and by the Spanish Ministry of Economics and Competitiveness (Ministerio de Economía y Competitividad) - ref. ECO201459998-P.

From a statistical point of view, influence happens when the pattern of individual decisions is systematically anticipated by those of other individuals. In this case, we detect a repeated lagged association from which pairwise influences are empirically inferred. Network influence models have already been studied in different contexts (LeSage 1997, Arcidiacono et al. 2012, Granovetter 1978, Schelling 2006, Zeger and Karim 1991, Lee et al. 2006); see §2 for more details. Most existing references assume a well-known influence structure and focus on the problem of studying diffusion patterns and optimal starting conditions.
In fact, the multiplier effect that a sequence of influences might produce amplifies or shrinks the diffusion of ideas and choices. In contrast, we focus on the reverse problem of network discovery, by which observed decisions are tracked to detect patterns of imitation.

This paper proposes a novel parametric approach, which we call Pairwise Influences in Dynamic Choice models (PIDC models, from now on), to infer network patterns from cross-section dependencies in dynamic choice settings, based on the observation of sequential decisions of multiple individuals over multiple items. Formally, we observe a multidimensional panel with three dimensions $\{x_{ist} \mid i \in I, s \in S, t \in T\}$, namely the realization of a triple-index process defined on a suitable probability space, where $i$ is the item dimension (e.g., a song), $s$ is the individual dimension (e.g., a broadcasting station), and $t$ is the time dimension (e.g., a week)².

In this analysis, there is one central question regarding the model design: how to internalize in a parsimonious way the dynamic nature of cross-section dependencies and their effect on the diffusion and propagation across time (Bailey et al. 2012). Specifically, we construct an exponential random model for multidimensional panels that internalizes the effect of pairwise interactions between individuals on the joint distribution of the dynamic choices on items (Robins et al. 2007). Our model includes effects that connect individual-item levels across periods, based on hidden network patterns of dynamic similarities between individual decisions with respect to items. With this specification, we derive several analytical properties of how model parameters affect the probability distributions of panel samples.
The advantage of using an exponential random model over existing network influence models – such as the linear thresholds model, by Granovetter (1978) and Schelling (2006), and the independent cascade model by Goldenberg et al. (2001) – is the possibility to support a large variety of model specifications, including the recent empirical findings in influence marketing (Cialdini 2009, Brown and Fiorella 2013).

¹ The term influence is here adopted in a comprehensive sense to generalize those of imitation and spill-over, which are used in more particular contexts, such as social psychology and microeconomics respectively.
² This kind of data structure can be straightforwardly represented in terms of dynamic two-mode valued networks – a time-dependent real-valued association between two sets of nodes, say $S$ (for individuals) and $I$ (for items) – by establishing a correspondence between panel dimensions and network layers. This allows an immediate definition of individual correlations, in terms of the projection of the two-mode layers (individual–item connections) into a one-mode network (individual–individual connections), where links represent pairwise interactions between individuals such as transmission, spill-over or imitation, as described by Leydesdorff and Wagner (2008).

Our statistical methodology is then applied to the diffusion of songs across broadcasting stations in the UK. Our data contain the number of plays of each song over many broadcasting stations. These bring to the market highly correlated program schedules – i.e., choices of songs with high cross-section dependencies arising as a result of pairwise imitations. Our aim is to determine the influence of one station on another.
We show that, despite the large number of songs in the item dimension of the panel, the pattern of imitation between broadcasting stations (cross-section dependencies in the individual dimension) remains constant when only a few of them are used as a training set for estimation. Our results thus provide an empirical estimation of the strength of station-to-station influences, which can be used as a score for the ability to condition the future choices of other players in the market.

With knowledge of the influence structure, music producers can better select the station on which to first release their products. Indeed, a direct implication of the proposed modeling framework is to optimize introduction decisions in the music industry, assessed in the final part of the paper. We show that influence effects between broadcasting companies are decisive for maximizing the diffusion of a song in the first weeks after launch, but these effects fade away with time as the song reaches the entire network.

The rest of the paper is organized as follows. Section 2 reviews the three streams of literature which are relevant to our analysis: econometric models of dynamic choice, multidimensional panel data and exponential random graph models. Section 3 introduces and describes the music data set, along with the relevant measurements we aim to control in our probabilistic model. Section 4 provides a detailed description of the proposed exponential random model for this type of data set, embeds such a model in a general Bayesian framework and discusses the algorithmic aspects of the estimation method. The numerical results are presented in Section 5. Section 6 describes how the estimated model can help decision-makers better choose the stations where songs are released first, and Section 7 concludes. All the mathematical proofs of propositions are reported in Appendix A and additional foundations and plots in Appendices B and C.
2 Literature review

This paper is connected to several streams of literature, which are for convenience grouped in three main fields: multidimensional choice models for marketing analytics, random models of network formation, and diffusion through networks.

Multidimensional choice models for marketing analytics. Choice models consist in the probabilistic selection from sets of mutually exclusive alternatives (Train 2009, Görür et al. 2006, Kuksov and Villas-Boas 2010, Burda et al. 2009). In such a context, multidimensional panel data generally appear as a sequence of multiple choices (or outputs) by a fixed collection of individuals (companies, customers, etc.), e.g., Gao and Hitt (2004), Amemiya (1985). Multinomial logit models are typically adopted as a reference framework for individual choices (Kök et al. 2009, Martínez-de-Albéniz and Sáez-de-Tejada 2014), which can be easily generalized to cases of infinitely many alternatives by the Poisson asymptotic behavior (Simons and Johnson 1971), or properly parameterized to account for time-dependent properties in dynamic choice settings and correlated individual decisions (Tversky and Simonson 1993, Hardie et al. 1993).

In fact, a common issue when studying dynamic choices of multiple individuals over a set of items concerns the presence of cross-sectional dependencies either in the item or in the individual dimension. Despite the relevance of this problem, only recently have cross-sectional dependencies become central in marketing analytics and in the general econometric literature (Anselin 1988, LeSage and Pace 2009, Chudik et al. 2011). In fact, in the context of individual decisions, inferring influences, transmissions and spill-overs (who influences whom) requires strong modeling assumptions about the correlation structure of the individual dimension.
In some cases, when interactions between spatially distributed consumers, retailers or manufacturers are specified, spatial autoregressive processes have also been considered as models of panel data (LeSage 1997). Besides spatial proximity, cross-correlations of errors could arise as a result of interactions within socioeconomic networks, when an underlying structure of externalities, spill-overs and imitation between individual decisions can be assumed. For example, Arcidiacono et al. (2012) estimate individual spill-overs in an educational context by internalizing the within-group similarities of individual outcomes and choices. This is also the case of interest of this paper, where individual decisions – such as broadcasting decisions of companies in the music industry – depend on the decisions of other individuals.

In general, the explicit inclusion of pairwise spill-overs is hard to find in the panel data literature; systematic studies of such a structural analysis can only be encountered in the random graph and complex network literature, as discussed in the next paragraphs.

Random models of network formation. As noted by Castro and Nasini (2015), random models of network formation infer connection structures from purely stochastic processes where connections appear randomly in accordance with some distribution. Among the most general and widely adopted models in this class are the well-known exponential random graph models (Robins et al. 2007), ERGM from now on. These allow for a straightforward characterization of network features – such as propensities for homophily, mutuality, and triad closure – through choices of the sufficient statistics used in the model (Morris et al. 2008). In recent years, ERGM have been extended to embrace more complex settings, such as the ones associated with dynamic networks (Hanneke et al. 2010, Desmarais and Cranmer 2012), valued networks (Krivitsky 2012), two-mode networks (Wang et al.
2009), and mutual dependency between individual properties and network structure (Nasini et al. 2017). Enlarging the range of applicability of the ERGM to such cases allows the inclusion of many relevant statistical problems in the field of complex networks.

A further step in this generalization can be made by noting that ERGM do not need to be limited to explicitly observed networks: sufficient statistics can be included to mirror the hidden patterns of connection inferred from observed cross-section dependencies. In fact, statistical settings in which a given set of individuals $S$ is supposed to choose (or to be associated with) elements from a second set of items $I$ commonly appear in several applications, such as scholars associated with papers, customers associated with companies, etc. From this perspective, our approach is related to this stream of literature, where a class of exponential random models is adopted to estimate the hidden associations within the individual dimension $S$ of the multidimensional panel (station-to-station influences).

Diffusion through networks. Classical diffusion models studied over the years are the Bass differential diffusion model (Bass 1969) and the SIR epidemiological models (Kermack and McKendrick 1927). They account for the establishment and spread of diffusion over a fixed population, without explicitly taking into consideration connections at the individual level. However, when a structure of pairwise influences, transmissions and spill-overs is known, diffusion and propagation can be studied as fundamental processes taking place over the edges from node to node.

As noted by Kempe et al. (2015), network diffusion processes have a long history of study in the social sciences. More than forty years ago, the DeGroot learning model (DeGroot 1974) was one of the first approaches in this area. Its basic intuition was to describe the choice dynamics of a node from period to period by averaging the choices of its neighbours.
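The averaging rule behind the DeGroot model can be illustrated with a minimal sketch. The three-node weight matrix and the initial choices below are hypothetical and are not taken from the paper; the only assumption is that each row of the weight matrix sums to one.

```python
import numpy as np

def degroot_step(weights, opinions):
    """One DeGroot update: each node takes a weighted average of its neighbours."""
    return weights @ opinions

# Hypothetical 3-node network; every row sums to 1 (row-stochastic trust weights).
W = np.array([
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
])
x = np.array([1.0, 0.0, 0.0])  # initial choices of the three nodes

# Repeated averaging drives all nodes towards a common value.
for _ in range(200):
    x = degroot_step(W, x)

consensus = x
```

For this particular matrix the limiting value is the average of the initial choices weighted by the stationary distribution (1/4, 1/2, 1/4) of `W`, so every entry of `consensus` approaches 0.25.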
From the viewpoint of mathematical sociology, Granovetter (1978) and Schelling (2006) investigated the linear thresholds model in parallel. The underlying idea was to study binary states of nodes (i.e., active versus inactive), in such a way that each node has an independent threshold representing the total weight that its active neighbors must exert for it to become active. A similar model has been proposed in the context of marketing by Goldenberg et al. (2001), where any active node is capable of activating its neighbors with a fixed independent probability. This is called the independent cascade model.

The DeGroot learning model, the linear thresholds model and the independent cascade model are among the most widely studied mathematical frameworks for diffusion of influence taking place over the edges from node to node. Unfortunately, despite their large range of applicability, none of them allows for the simultaneous inclusion of multidimensional choices over a set of items whose attractiveness is non-homogeneous in time, as for songs.

There are alternative approaches to deal with diffusion and propagation, based on pure simulation schemes, such as multi-agent systems (Olfati-Saber et al. 2007, Kirman and Vriend 2001), but they do not allow inferential insights to be obtained statistically from empirical observations. In our statistical setting, the empirical probability of station-to-station influences – representing a score for the ability to condition the future choices of other players in the market – allows diffusion patterns to be probabilistically characterized.

3 The context: data on music broadcasting

We collected data about the program schedules of songs played in broadcasting companies (TV channels and radio stations) in the UK, from January 1, 2011 to March 31, 2014.
The data format consists of a sequence of tables identifying a unique moment (day and hour) in which a specified song is played in a specified broadcasting company. The information was obtained from BMAT, which has developed a technology (Vericast) to monitor the songs being played in real time on any radio station in any country. (This is done by identifying the musical ‘fingerprint’ of each song and matching it with their large database of songs.) The database included information about artists and names of each song, day and time of the day being played, and radio station of that play, but for simplicity we aggregated these plays by week across all stations. Table 1 describes the data set – an exploratory statistical analysis of such data sets has been carried out by Martínez-de-Albéniz and Sáez-de-Tejada (2014), with an emphasis on the time-variation of song popularity.

Stations                  22
Artists                   32,765
Songs                     74,712
Time periods (weeks)      255

Table 1: Descriptive statistics of the two data sets.

Despite the large number of songs broadcast, only a few of them are frequently played: 1,180 songs out of 65,531 capture half of the total plays (50% market share). Moreover, the number of songs whose market share is at least half that of the most broadcast song is just 30. These facts, as highlighted by the Pareto diagram in Figure 1, suggest that very few songs are quantitatively relevant in the analysis of dynamic broadcasting patterns.

Figure 1: Pareto diagram of the number of plays. (a) All the 65,531 songs played. (b) Top 1,180 songs played.

A similar behavior can be observed at the artist level. Among the most played artists is Bruno Mars. The weekly number of plays of two of his most popular songs is shown in Figure 2. They exhibit dynamics which resemble the short life cycle of fashion goods: the demand for songs evolves over a time window in which their popularity increases shortly after launch and then decreases³.

Figure 2: Time plots of numbers of plays of two songs of Bruno Mars across 22 stations. (a) Bruno Mars, Just the way you are in the UK. (b) Bruno Mars, Locked Out Of Heaven in the UK.

Within the British data set, broadcasting stations are partitioned in accordance with their music format – World-music (2 stations), Contemporary and Easy listening (7 stations), Rock music (6 stations), Top 40 and Urban (7 stations). This classification is based both on the Wikipedia description of the broadcasting companies and the information resulting from their corresponding webpages. Within a given radio format, individual decisions are much more homogeneous and the corresponding numbers of plays appear positively correlated. We take advantage of this in the model specification, see §4.3.

The overall picture emerging from this data description suggests the possibility of assessing the presence of pairwise influences in the individual broadcasting decisions within and between music formats. Influence happens when the pattern of plays of a collection of songs by a given station in a given period is systematically followed by other stations. In this case we empirically detect a repeated pattern, which is the basis of our estimation of pairwise influences. A probabilistic model for this particular type of network discovery is proposed in the next section.

4 Model definition and specification

A statistical model for influence discovery is presented in this section.

4.1 The PIDC model

Let $S$ and $I$ be the individual and item dimensions of a three-dimensional panel and let $x_{st} = [x_{s1t}\; x_{s2t}\; \dots\; x_{s|I|t}]^T$, $x_{it} = [x_{1it}\; x_{2it}\; \dots\; x_{|S|it}]^T$ and $x_t = [x_{1t}\; x_{2t}\; \dots\; x_{|S|t}]$ be the associated $|I|$-dimensional vector, $|S|$-dimensional vector and $|I| \times |S|$-dimensional matrix, respectively.
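As an illustration of this notation, the panel can be held in a three-dimensional array whose slices give the vectors and the matrix just defined. The sketch below uses synthetic counts and toy dimensions; the axis order `X[s, i, t]` is an assumption made here for convenience, not a convention stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stations, n_items, n_periods = 4, 3, 5   # |S|, |I|, |T| (toy sizes)

# X[s, i, t]: plays of item i by individual s in period t (synthetic counts).
X = rng.poisson(lam=2.0, size=(n_stations, n_items, n_periods))

s, i, t = 1, 2, 3
x_st = X[s, :, t]     # |I|-vector: all items for individual s at time t
x_it = X[:, i, t]     # |S|-vector: all individuals for item i at time t
x_t = X[:, :, t].T    # |I| x |S| matrix whose columns are the x_st vectors
```

Column `s` of `x_t` coincides with `x_st`, matching the definition of the matrix as the collection of the individual vectors.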
Thus, $x_{sit}$ represents the valued connection between individual $s \in S$ and item $i \in I$, which can be interpreted as the decision that the $s$th individual takes with respect to the $i$th item (e.g., the number of plays of song $i$ in station $s$) at time $t$.

³ When presenting our modeling framework, this time effect of song diffusion is treated as an exogenous social trend (possibly due to an appetite for novelty, or to a planned marketing strategy) which affects broadcasting decisions without being affected by them (Harvey 1989). This simplifying assumption is made explicit in §4.3 and allows estimating the shape (life cycle) of this exogenous social trend and its influence on each station.

Let $h : \mathbb{R} \to \mathbb{R}$ be non-negative and non-decreasing and $G : \mathbb{R}^2 \to \mathbb{R}$ be defined as $G(x, y) = x\,g(y)$, where $g$ is a non-decreasing real-valued function⁴. Sometimes the shorter notation $h_{ist}$ and $G_{ss'i\ell t}$ shall be adopted instead of $h(x_{ist})$ and $G(x_{ist}, x_{is'(t-\ell)})$. The latter is defined to capture the dynamic network pattern of cross-sectional dependencies, in terms of lagged similarities between individual choices.

We build a conditional model for the decision $x_{sit}$ that the $s$th individual takes with respect to the $i$th item at time $t$:

\[
P(x_{sit} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}) \;\propto\; \frac{1}{(h_{ist})^{\psi}} \exp\Big( \psi \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \sum_{s' \in S} \gamma_{\ell s s'}\, G_{ss'i\ell t} \Big). \tag{1}
\]

The joint probability distribution of the described three-dimensional panel (and its dynamic two-mode network correspondence⁵) can be properly defined as follows.

Definition 1 (Three-dimensional panel joint probability). Let the individual decisions at the first $\tau$ periods $x_1, \dots, x_\tau$ be known. The joint probability of observing a sequence of $|S|$ multidimensional choices is defined by assuming conditional independence between individuals and items and applying the product rule:

\[
P(x_{\tau+1}, \dots, x_T \mid \gamma) \;=\; \frac{1}{Z(\gamma)}\, \frac{1}{h_\Pi(x)}\, \exp\big( \psi\, x^T \tilde{\Gamma} g \big), \tag{2}
\]

where the lag $\tau_{\min} > 0$, the function $h_\Pi(x) = \prod_{t \in T} \prod_{i \in I} \prod_{s \in S} (h_{ist})^{\psi}$, the function $Z(\gamma)$ is a normalizing constant (also known as the partition function), and $\tilde{\gamma}$ is the $|T||I||S|^2$-dimensional vector obtained from stringing out the elements of matrix $\tilde{\Gamma}$ in lexicographic order. The matrix $\tilde{\Gamma}$ is the block matrix built from the lag matrices $\Gamma_{\tau_{\min}}, \dots, \Gamma_{\tau_{\max}}$, arranged so that the form $x^T \tilde{\Gamma} g$ collects the lagged terms of (1). The components of vector $g \in \mathbb{R}^{|I||S||T|}$ correspond to $g(x_{ist})$, sorted in lexicographic order with respect to items, individuals, and time periods.

⁴ The specifications of $h$ and $g$ are exogenous and given by the user for each application.
⁵ Note that, based on the previously introduced dynamic association between primary and secondary layers $S$ and $I$, the valued connections $G_{s,(s',\ell),t,i}$ result from the projection of the two-mode layers (individual–item connections) into a one-mode network (individual–individual connections).

The characterization of the sample space $X \subseteq \mathbb{R}^{|I||S||T|}$ of this multidimensional random variable can either include constraints on the play-list capacity $\sum_{i \in I} x_{ist} = N$, for each station $s$ at time $t$, or allow for unrestricted choices. The latter case is assumed in the following results, to avoid unnecessarily tedious notation. A micro-foundation of (2) is proposed in Appendix B, based on a decision setting where stations select a randomized play-list policy.

Assumption 1 (The underlying measure). The underlying measure $h : \mathbb{R} \to \mathbb{R}$ in model (2) is such that $h'(x)/h(x)$ is monotonically increasing and positive and $h(0) \geq 1$.

Assumption 1 is satisfied by most of the well-known exponential random models, such as the Poisson and the Gaussian model. We denote with $\mathcal{F}$ the so-defined family of three-dimensional panels with probability measure (2) and note that not all values of $\gamma$ lead to a well-defined probability distribution, due to the sufficient statistic $G_{ss'i\ell t}$ not being integrable.
The domain $D(\mathcal{F})$ is the set of all $\gamma$ which lead to a well-defined probability distribution, that is to say, $D(\mathcal{F}) = \{\gamma \in \mathbb{R}^q \mid Z(\gamma) < \infty\}$.

In the model, $\gamma_{\ell s s'}$ controls the effect of choice similarities between pairs of individuals $(s, s') \in S \times S$ – in the described application to the music industry, it captures a (possibly time-delayed) spill-over or imitation between couples of stations. While we present here a general specification, we give it more structure below, in §4.3.

Proposition 1. Consider the conditional distribution of $x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}$, as obtained from (1). $P(x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}})$ is unimodal when $h$ verifies Assumption 1.

Proposition 2. Consider again the conditional distribution of $x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}$ and let $H : \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ be defined as $H(x_1, \dots, x_n) = [h'(x_1)/h(x_1)\; \dots\; h'(x_n)/h(x_n)]^T$. Based on Proposition 1 and Assumption 1, we claim

\[
\mathrm{mod}[x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}] \;=\; H^{-1}\Big( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) - Z(\gamma)\, e \Big),
\]

where $\mathrm{mod}[\,\cdot\,]$ is the mode of the distribution.

Proposition 3. Let $m_{it} = \mathrm{mod}[x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}]$ and $\mu_{it} = E[x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}]$. Based on the Laplace approximation, when $\psi$ grows large we have

\[
E[x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}] \;\approx\; \frac{\sqrt{2\pi}\,(m_{it})^{2\psi}}{h(m_{it})^{\psi}\, Z(\gamma)} \exp\Big( \psi\,(m_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \Big),
\]

\[
V[x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}] \;\approx\; \frac{\sqrt{2\pi}\,(\mu_{it} - m_{it})^{2\psi}}{h(m_{it})^{\psi}\, Z(\gamma)} \exp\Big( \psi\,(m_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \Big),
\]

where $E[\,\cdot\,]$ and $V[\,\cdot\,]$ are respectively the expectation and the variance operators.

Known inequalities can be used to show the impact of the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$ on the diversity of choices (song variety) with respect to the mode.
Based on Gauss’s inequality (Gauss 1823) – using the version reported by Weisstein (2016) – we have

\[
P(|x_{i,t} - m_{it}| \geq k) \;\leq\; \frac{(32\pi)^{1/4}\,(\mu_{it} - m_{it})^{\psi}}{9k\, h(m_{it})^{\psi/2}\, Z(\gamma)^{1/2}} \exp\Big( \frac{\psi}{2}\,(m_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \Big).
\]

The immediate implication of the above inequalities is that the deviation of the station play-list frequencies from the mode decays monotonically with respect to the influence scales in the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$.

The sensitivity of PIDC models to the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$ can also be assessed, based on general properties of the exponential family of distributions.

Proposition 4. Consider the conditional model (1). We claim that

\[
\frac{\partial}{\partial \gamma_{\ell s s'}} E[x_{sit} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}] \;=\; g(x_{s'i,t-\ell})\, V[x_{sit} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}}], \tag{3}
\]

for each $i \in I$, $s \in S$.

This supports our statement concerning the dynamic expectations in the presence of cross-section dependencies, which turn out to be amplified or shrunk by the similarities between station play-lists.

4.2 PIDC model specifications

This subsection deals with the application of (2) to different statistical settings, associated with binary, count and continuous data. The aim is to assess the versatility of PIDC models to capture different forms of influences and to generalize well-known statistical models.

Binary data specifications. Consider data drawn from a multidimensional binary sample space $X \in \{0, 1\}^{|I||S||T|}$. In this case, we define the underlying measure as $h(x_{sit}) = 1$.

\[
x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}} \sim \mathrm{Bern}(\pi_{it}), \quad \text{where} \quad \frac{\pi_{it}}{1 - \pi_{it}} \;\propto\; \prod_{i \in I} \exp\Big( \sum_{s' \in S} \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \gamma_{\ell s s'}\, G_{ss'i\ell t} \Big). \tag{4}
\]

Note that for binary sample spaces $D(\mathcal{F}) = \mathbb{R}^q$, as long as $g$ is a well-defined real-valued function. Note also that this PIDC specification generalizes the voter model (see Clifford and Sudbury (1973) and Holley and Liggett (1975) for more details). In fact, in the classic voter model, nodes randomly pick at each stage one neighbor and adopt its choice.
The binary specification of the PIDC model (4) allows nodes to select choices which have not been selected by their neighbors. Moreover, it allows for non-uniform preferences among neighbors and non-Markovian dependence through the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$.

Count data specifications. Consider data drawn from a multidimensional discrete sample space $X \in \mathbb{Z}^{|I||S||T|}$. In this case, we define the underlying measure as $h(x_{sit}) = x_{sit}!$. Proposition 5 establishes a sufficient condition of integrability.

Proposition 5. A sufficient condition for $D(\mathcal{F}) = \mathbb{R}^q$ is that $g$ is bounded from above.

When no simultaneous dependencies are included ($\tau_{\min} > 0$), individual outcomes have the following distribution:

\[
x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}} \sim \mathrm{Pois}(\lambda_{it}), \quad \text{where} \quad \lambda_{it} = \exp\Big( \sum_{s' \in S} \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \gamma_{\ell s s'}\, g(x_{s'(t-\ell)}) \Big). \tag{5}
\]

Thus, the influence effects result in the joint shift of the conditional mean and variance of each item profile in each time period.

Continuous data specifications. Consider continuous data drawn from a multidimensional continuous sample space $X \in \mathbb{R}^{|I||S||T|}$. In this case, we define the underlying measure as $h(x_{sit}) = 1$. We let $\Gamma \in \mathbb{R}^{|S| \times |S|}$ be positive definite; then we have

\[
x_{it}, \dots, x_{i,t-\tau} \sim N(\mu, \Sigma), \quad \text{where} \quad \mu = 0 \quad \text{and} \quad \Sigma = -\frac{1}{2}\, \tilde{\Gamma}^{-1}. \tag{6}
\]

An analogous multidimensional Gaussian model for economic influence between firms was proposed by Kelly et al. (2013). Similarly, when $\tau_{\min} > 0$, we have

\[
x_{i,t} \mid x_{i,t-\tau_{\min}}, \dots, x_{i,t-\tau_{\max}} \sim \mathrm{Exp}(\lambda_{it}), \quad \text{where} \quad \lambda_{it} = \sum_{s' \in S} \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \gamma_{\ell s s'}\, g(x_{s'(t-\ell)}), \tag{7}
\]

and $\gamma_{\ell s}$ is the $s$th column vector of $\Gamma_\ell$. Also in this case, constraints must be imposed to guarantee $Z(\gamma) < \infty$, namely $D(\mathcal{F}) = \{\gamma \in \mathbb{R}^q \mid \gamma_s \sum_{\ell=\tau_{\min}}^{\tau_{\max}} g(x_{s'(t-\ell)}) < 0, \text{ for all } i \in I,\, t \in T\}$. Note that this allows us to work with negative data, in contrast with the previous specification.
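To make the count-data specification (5) concrete, the following sketch simulates the counts of a single item across a few stations. It is only an illustration under stated assumptions: `g(y) = y / (1 + y)` is one possible non-decreasing transform that is bounded from above (in the spirit of Proposition 5), the lags run from 1 to `L`, and the function name, the toy influence values and the initial counts are hypothetical.

```python
import numpy as np

def g(y):
    """Assumed influence transform: non-decreasing and bounded above by 1."""
    return y / (1.0 + y)

def simulate_poisson_pidc(gamma, x_init, n_periods, rng):
    """Simulate the counts of one item under a Poisson rate of the form (5).

    gamma  : array (L, S, S); gamma[k, s, r] is the influence of station r
             on station s at lag k + 1 (so tau_min = 1, tau_max = L).
    x_init : array (L, S) holding the first L observed periods, oldest first.
    """
    n_lags, n_stations, _ = gamma.shape
    x = np.zeros((n_periods, n_stations))
    x[:n_lags] = x_init
    for t in range(n_lags, n_periods):
        log_rate = np.zeros(n_stations)
        for k in range(n_lags):
            lag = k + 1
            log_rate += gamma[k] @ g(x[t - lag])   # sum over influencing stations
        x[t] = rng.poisson(np.exp(log_rate))
    return x

rng = np.random.default_rng(1)
gamma = np.zeros((2, 3, 3))
gamma[0, 1, 0] = 1.5   # station 0 influences station 1 at lag 1
gamma[1, 2, 1] = 1.0   # station 1 influences station 2 at lag 2
x0 = np.array([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
panel = simulate_poisson_pidc(gamma, x0, n_periods=20, rng=rng)
```

Because `g` is bounded, the Poisson rates stay finite for any real-valued `gamma`, which is the practical content of the integrability condition.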
Equation (2) might be conceived as a generalization where the sample space can be either continuous or discrete and the spill-over measurement is allowed to be arbitrarily specified, beyond the ones verifying the conditions considered in this subsection.

4.3 Further specifications: social media and community structure

In their naive version, PIDC models establish a conditional distribution of present individual states (outcomes and decisions) as a function of past states, where the transition happens through an unknown influence structure that we wish to infer from the empirical observation. Sometimes the pattern of influence can be assumed to be associated with a more structured setting, where individuals belong to different groups or where some of them are more influential than others. These cases are presented in this section.

Social media and the song life cycle. Let $s^*$ be a social media which is able to influence the choices of individuals in $S$. The dynamic choices of $s^*$ over items in $I$ are denoted as $x_{is^*,t}$. They are exogenous and cannot be affected by the ones of individuals in $S$. For every $s \in S$, the influence of $s^*$ over $s$ is included in the kernel of the PIDC model (2) as

\[
P(x_{ist} \mid \gamma) \;\propto\; \frac{1}{(h_{ist})^{\psi}} \exp\Big( \psi \sum_{s' \in S} \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \gamma_{\ell s s'}\, G_{ss'i\ell t} \;+\; \psi \sum_{\ell=\tau_{\min}}^{\tau_{\max}} G_{ss^*i\ell t} \Big), \tag{8}
\]

where $G_{ss^*i\ell t} = x_{ist}\, x_{is^*,t-\ell}$. From Proposition 3, this implies that the conditional expectation is shifted upward when the social media starts broadcasting songs at a higher intensity.

As discussed in §1 and illustrated in the plots in Figure 2, the broadcasting pattern of songs exhibits a time window in which their number of plays within the broadcasting industry quickly increases shortly after the release and then decreases. For any song $i \in I$, let $t_{0i}$ be the starting week when the song is launched.
A possible social media specification to account for the observed song life cycle is to define the exogenous attractiveness trajectory of the $i$th song as a Gamma kernel:

\[
x_{is^*t} =
\begin{cases}
\delta_{i0} + \delta_{i1}(t - t_{0i}) + \delta_{i2} \log(t - t_{0i}) & \text{if } t > t_{0i}, \\
-\infty & \text{otherwise.}
\end{cases} \tag{9}
\]

Thus, the social media dynamics is fully characterized by the tuning parameters $\delta_{i0}$, $\delta_{i1}$, and $\delta_{i2}$.

The community structure and the music formats. The presence of different music formats gives rise to a certain form of community structure in the network of broadcasting companies. Consider a set of music formats $K_F = \{\text{Contemporary}, \text{Top-40}, \text{World music}, \text{Rock}\}$ and a set of individual broadcasting companies $K_C$. The function $\kappa_C : K_C \to K_F$ assigns to each company a given format. Then, for each time lag $\ell$ the influence structure $\Gamma_\ell$ can be restricted as follows:

\[
\gamma_{\ell s_0 r} = \gamma_{\ell s_1 r}, \quad \text{for all } s_0, s_1, r \in S \text{ such that } \kappa_C(s_0) = \kappa_C(s_1) \neq \kappa_C(r), \tag{10}
\]

i.e., stations of the same format follow the same influence pattern with respect to other formats.

Note that when this community structure is not taken into account, the dimensionality of the parameter space of (2) is $(\tau_{\max} - \tau_{\min})|S|^2$, which grows dramatically when the number of individuals (stations) increases (the dimensionality of the parameter space can exceed that of the sample space). The inclusion of this community structure allows reducing the dimensionality. In our application, we partitioned the 22 broadcasting companies into 4 radio formats, so that the number of influence effects reduces from $22 \times 21 \times (\tau_{\max} - \tau_{\min})$ to $((7 \times 6) + (7 \times 6) + (6 \times 5) + 2) \times (\tau_{\max} - \tau_{\min})$, i.e., from 462 to 116 effects per lag, a reduction of 75%.

4.4 Estimation method

Classical inferential methods for the model parameters of (2) are encumbered by the intractability of the normalizing constant $1/Z(\gamma)$, which makes the numerical optimization of the likelihood function very challenging. However, under very specific conditions, (2) might be reduced to well-known probabilistic models, as discussed in §4.2.
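To see why the normalizing constant becomes a bottleneck, it may help to compute it by brute force in a deliberately small case. The sketch below assumes a single item and period, binary choices, simultaneous dependencies, $h = 1$, $\psi = 1$ and $g$ equal to the identity, so that the kernel reduces to $\exp(x^T \Gamma x)$; these simplifications and the function name are illustrative assumptions, not the paper's estimation procedure.

```python
import itertools
import numpy as np

def partition_function(Gamma):
    """Brute-force Z for x in {0,1}^n with kernel exp(x^T Gamma x)."""
    n = Gamma.shape[0]
    total = 0.0
    # The sum runs over all 2**n binary configurations.
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits, dtype=float)
        total += np.exp(x @ Gamma @ x)
    return total

# Without any influence, the 2**3 configurations are equally weighted.
Z_independent = partition_function(np.zeros((3, 3)))

# With a diagonal Gamma the sum factorizes into a product over units.
Z_diagonal = partition_function(np.diag([0.5, -1.0]))
```

The enumeration has $2^n$ terms, so even in this binary toy setting 22 units would already require about 4.2 million evaluations per item and period; with count or continuous data the sum is no longer finite, which motivates the simulation-based methods discussed next.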
Under general conditions, different approaches exist in the literature to deal with unknown normalizing constants: the pseudo-likelihood approach, proposed by Besag (1975) in the context of the analysis of data with spatial dependencies; the Monte Carlo Maximum Likelihood (MCML), introduced by Geyer (1996); and the auxiliary variable method, first described by Møller et al. (2006) in the context of Bayesian statistics. Murray et al. (2006) introduced a computational improvement to the auxiliary variable method, resulting in a specialized MCMC method known as the exchange algorithm. All these approaches are useful and widely accepted estimation methods in cases where the direct maximization of the likelihood is intractable.

Here we adapt the exchange algorithm to the PIDC class of models, by embedding the kernel of the probability function (2) into a Bayesian framework (Gamerman and Lopes 2006). Let $x^{(0)} \in X$ be the observed three-dimensional panel (in the form discussed in §4). Given a prior distribution π(γ), we can apply the Bayes rule: $P(\gamma \mid x^{(0)}_1 \dots x^{(0)}_{|T|}) \propto P(x^{(0)}_1 \dots x^{(0)}_{|T|} \mid \gamma)\, \pi(\gamma)$. It is well known that, if π(γ) is uniform over the parameter space, then the mode of the posterior distribution is equal to the maximum likelihood estimator of (2).

The exchange algorithm by Murray et al. (2006) is based on the simulation of the joint distribution of the parameter and the sample spaces, conditioned on the observed data set $x^{(0)}_1 \dots x^{(0)}_{|T|}$. Specifically, to generate the posterior parameter distribution, we use an arbitrary proposal distribution Q defined on the same support as the prior and the posterior (which has no impact on the resulting estimation, although it does affect the convergence time), and generate samples γ′ and x′, which obey a distribution governed by $x^{(0)}$, namely $P(x, \gamma \mid x^{(0)})$.

Algorithm 1 MCMC method for PIDC models.
 1: Initialize the posterior distribution of γ (arbitrarily)
 2: repeat
 3:   Draw γ′ from Q
 4:   if τmin > 0 then
 5:     Accept γ′ with probability $\min\left\{1, \dfrac{P(x^{(0)} \mid \gamma')\,\pi(\gamma')\,Q(\gamma)}{P(x^{(0)} \mid \gamma)\,\pi(\gamma)\,Q(\gamma')}\right\}$
 6:   else
 7:     Draw x′ from P(. | γ′) and accept γ′ with probability $\min\left\{1, \dfrac{P(x' \mid \gamma)\,P(x^{(0)} \mid \gamma')\,\pi(\gamma')\,Q(\gamma)}{P(x^{(0)} \mid \gamma)\,P(x' \mid \gamma')\,\pi(\gamma)\,Q(\gamma')}\right\}$
 8:   end if
 9:   Update the posterior distribution of γ
10: until Convergence

As summarized in step 7 of Algorithm 1, the double intractability is eliminated when simulating from $P(x, \gamma \mid x^{(0)})$ by the Metropolis-Hastings method, based on a proper definition of the proposal distribution, which is used to draw candidate points for the posterior. Note that in step 3 a new value from the parameter space is randomly proposed, and the computation of the acceptance probability of this candidate point depends on whether we are able to characterize Z(γ). In the affirmative case, such as when τmin > 0, the candidate point can be accepted based on the classical Metropolis-Hastings ratio in step 5. In the opposite case, we need the auxiliary variable in step 7: the normalizing constants of the observed and auxiliary terms cancel in the ratio, so Z(γ) never has to be computed. The chain converges to $P(\gamma \mid x^{(0)})$ in the limit. Clearly, even though we avoid calculating the normalizing constant, this remains a computationally intensive procedure.

5 Empirical results

The application analyzed in this paper concerns the occurrences of songs in the play lists of broadcasting stations, described in §3. Based on the model specifications presented in §4.3, the main goal is to estimate the dynamic attractiveness of songs (the exogenous social trend), along with the overall structure of imitations, as defined in model (2). Specifically, the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$ and the social media parameters $\{(\delta_{i0}, \delta_{i1}, \delta_{i2})\}_{i \in I}$ are estimated using the Bayesian approach described in §4.4.
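As a minimal sketch of the auxiliary-variable move (step 7 of Algorithm 1), the following Python code runs the exchange step on a toy one-parameter model rather than on the PIDC kernel. Everything in it is an illustrative assumption: the kernel q(x | θ) = exp(θx)/x! (a Poisson model with rate exp(θ), chosen because exact sampling is available), the flat prior, the symmetric Gaussian random-walk proposal, and all function names. Under these assumptions the prior and proposal ratios equal one and the x! terms cancel, so the acceptance ratio only involves θ times the sum of the data.

```python
import math
import random

def sample_poisson(rate, rng):
    """Exact Poisson draw (Knuth's multiplication method; fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def log_kernel(total, theta):
    """log q(x | theta) for a data set with sum `total`, dropping the x! terms
    (they cancel in the acceptance ratio)."""
    return theta * total

def exchange_sampler(x_obs, n_iter=2000, step=0.1, seed=0):
    """Toy exchange algorithm with a flat prior and a symmetric proposal."""
    rng = random.Random(seed)
    s_obs = sum(x_obs)
    theta, draws = 0.0, []
    for _ in range(n_iter):
        theta_new = theta + rng.gauss(0.0, step)  # step 3: draw from Q
        # step 7: auxiliary data set x' drawn exactly from P(. | theta_new)
        s_aux = sum(sample_poisson(math.exp(theta_new), rng) for _ in x_obs)
        # log of q(x0|theta') q(x'|theta) / (q(x0|theta) q(x'|theta'))
        log_ratio = (log_kernel(s_obs, theta_new) + log_kernel(s_aux, theta)
                     - log_kernel(s_obs, theta) - log_kernel(s_aux, theta_new))
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = theta_new
        draws.append(theta)  # step 9: update the posterior sample
    return draws

# Hypothetical data: 100 counts simulated with rate 4.
data_rng = random.Random(1)
x_obs = [sample_poisson(4.0, data_rng) for _ in range(100)]
draws = exchange_sampler(x_obs)
kept = draws[500:]  # discard a burn-in period
posterior_mean = sum(kept) / len(kept)
print(posterior_mean, math.log(sum(x_obs) / len(x_obs)))
```

In this toy setting the posterior mean of θ should lie close to the logarithm of the sample mean, which is the maximum likelihood estimate under the flat prior. For the PIDC model the same accept/reject logic would apply, but drawing x′ from P(. | γ′) is the expensive part.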
To assess the robustness of this model specification to different realizations of the broadcasting process, the estimation has been carried out on two sub-groups of songs. In fact, Figure 1 revealed that only a few songs were quantitatively relevant in the size of the overall broadcasting patterns. As noted in §3, the number of songs whose market share is at least half that of the most broadcasted song is just 30. Based on this, we assess the robustness of the model specification when only the first 50 songs (corresponding to 9% total market share) and the first 200 songs (corresponding to 21% market share) are used for the estimation. All the runs associated with the results presented here were carried out on a Dell PowerEdge R430 server with Xeon E5-2690 v3 CPUs and 128 GB of RAM, under a Windows Server R2 operating system.

5.1 Song life cycle and the influence structure

The numerical results presented here take into account both training sets, represented by the first 50 songs and the first 200 songs. The estimated song life cycles after their launch week t0 are reported. The colored envelopes in Figures 3 and 4 show a credible interval from the posterior predictive distribution of the number of plays of an arbitrary song, using respectively the top-50 and the top-200 songs as training sets.

(a) Top-5 songs. (b) Bottom-5 songs.
Figure 3: Life cycle for songs: the black lines are the number of plays of the top 5 songs (left plot) and the bottom 5 songs (right plot), within the analyzed top 50 songs, from their release t0. The colored envelope denotes their expectation plus or minus twice the standard deviation from the posterior predictive distribution, based on (9).

(a) Top-5 songs. (b) Bottom-5 songs.
Figure 4: Life cycle for songs: the black lines are the number of plays of the top 5 songs (left plot) and the bottom 5 songs (right plot), within the analyzed top 200 songs, from their release t0.
The colored envelope denotes their expectation plus or minus twice the standard deviation from the posterior predictive distribution, based on (9).

The estimated social media parameters $\{(\delta_{i0}, \delta_{i1}, \delta_{i2})\}_{i \in I}$ reflect the common life cycle of songs across stations and allow for an accurate aggregate prediction of future dynamics based on the initial success and propagation. These estimates provide an alternative to Martínez-de-Albéniz and Sáez-de-Tejada (2014), where individual peaks are computed taking into account competing song releases. Although the results are qualitatively similar for 50 and 200 songs, we observe a slightly increased variability when comparing the envelope in Figure 4 (associated with a training set with the first 200 songs) with the one in Figure 3 (associated with a training set with the first 50 songs).

Using the top-50 and the top-200 songs as training sets, the network plots in Figure 5 show the corresponding expectations, based on the posterior distribution, for each pair of music formats (the sizes of the depicted connections denote the expected values of γ, for all the marginal posteriors). The size of the influence effects can be interpreted as the sensitivity of the influenced station to increase its number of plays of a song if the influencer station played it more often than the average in previous periods (recall that this average may be driven by the social media dynamic (9)). For example, we can see that Top-40 radio stations are highly influenced by Rock stations. When γ is zero, the differentiation between broadcasting companies is controlled by the different effect of the social media s∗ on the corresponding stations. Appendix C reports the estimated network plots of pairwise influences within each of the four radio formats.
A second observation about these estimated influences is their consistency across both training sets, which supports the robustness of the estimated influences and the exhaustiveness of the information contained in the few most popular songs. Furthermore, we obtained a higher sparsity of the pairwise influence pattern when the analyzed collection of songs is enlarged (see also Appendix C).

(a) Top-50 songs training set. (b) Top-200 songs training set.
Figure 5: Network plots of the estimated pairwise influences between radio formats.

5.2 The time reaction of influence

The problem of studying a time window in which individuals are influenced (and react to the stimulus of others) is here approached from the statistical viewpoint of lag-length selection in time series (Ng and Perron 2001). This is particularly useful because it allows the model to determine how fast imitation occurs. Specifically, one will be able to assess whether choice is more or less simultaneous (with weight being put on low lags, e.g., ℓ = 1) or imitation takes its time (with more weight put on higher lags).

The estimation of §5.1 was carried out using τmin = 1 and τmax = 5 weeks. Table 2 reports the frequency distribution of the estimated γ parameters at each time lag, corresponding to the numerical results in §5.1. The influences between radio formats appear to be low with respect to all five lags. For both training sets, the highest influence seems to appear after two and three weeks. A graphical illustration of the dynamic influences within each music format is provided in Figures 6 and 7, for the two respective training sets with the 50 and the 200 top songs.

This level of analysis allows selecting the optimal time lag of pairwise influences, which is crucial to avoid overfitting in PIDC models. In fact, the high dimensionality of the parameter space of PIDC models calls for a parsimonious inclusion of lags in the model specification.
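To make the lag comparison concrete, the short Python sketch below ranks the five lags by the mean and by the median of the estimated γ values reported in Table 2. It assumes that the first block of Table 2 refers to the top-50 training set and the second block to the top-200 one; the dictionary layout and the helper name rank_lags are illustrative choices and not part of the estimation procedure.

```python
# Mean and median of the estimated gamma per lag (lags 1..5), copied from Table 2.
# Assumption: the first block of the table is the top-50 set, the second the top-200 set.
table2 = {
    "top-50": {
        "mean": [0.1300, 0.1245, 0.1527, 0.1082, 0.1060],
        "median": [0.0375, 0.0471, 0.0580, 0.0314, 0.0386],
    },
    "top-200": {
        "mean": [0.0929, 0.1264, 0.1394, 0.1117, 0.1086],
        "median": [0.0254, 0.0385, 0.0455, 0.0315, 0.0207],
    },
}

def rank_lags(values):
    """Return the lags (1-based) sorted from the largest to the smallest value."""
    return sorted(range(1, len(values) + 1), key=lambda lag: -values[lag - 1])

for training_set, stats in table2.items():
    for name, values in stats.items():
        print(training_set, name, rank_lags(values))
```

Under either summary, lag 3 ranks first in both training sets, and lag 2 ranks second in every case except the top-50 mean, where lag 1 is slightly ahead. A ranking of this kind is only a descriptive aid; it does not replace a formal lag-length selection criterion.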
Table 2: Frequency distribution of the estimated influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$ between radio formats.

Training set           Statistic       Γ1      Γ2      Γ3      Γ4      Γ5
Top-50 songs (§5.1)    Min             0.0000  0.0000  0.0000  0.0000  0.0000
                       1st quartile    0.0102  0.0090  0.0099  0.0093  0.0086
                       Median          0.0375  0.0471  0.0580  0.0314  0.0386
                       Mean            0.1300  0.1245  0.1527  0.1082  0.1060
                       3rd quartile    0.1645  0.1725  0.2218  0.1487  0.1430
                       Max             1.2740  1.2902  1.4604  1.2348  1.2486
Top-200 songs (§5.1)   Min             0.0000  0.0000  0.0000  0.0000  0.0000
                       1st quartile    0.0029  0.0029  0.0027  0.0028  0.0026
                       Median          0.0254  0.0385  0.0455  0.0315  0.0207
                       Mean            0.0929  0.1264  0.1394  0.1117  0.1086
                       3rd quartile    0.1117  0.1681  0.1453  0.1556  0.1499
                       Max             1.9504  2.1971  2.2798  1.8299  2.0376

6 Influence maximization by optimal propagation

Beyond the ability to predict expected outcomes, well-defined probabilistic models that capture pairwise influences can be helpful in choosing which station is best to diffuse a new song. As mentioned in §2, the DeGroot learning model, the linear threshold model and the independent cascade model are among the most widely studied mathematical frameworks for diffusion of influence taking place over the edges from node to node. The PIDC models we described in this paper can be included in this list, with the aim of designing a more flexible approach both as a process of diffusion and as a mathematical framework for statistical inference.

Given a full specification of any of these influence models, an influence optimization problem (Goldenberg et al. 2001, Olfati-Saber et al. 2007, Kirman and Vriend 2001) consists in choosing a good initial set of nodes to target in order to maximize the future diffusion. The influence of a set S0 of nodes, denoted σ(S0), might be expressed as the expected outcomes of nodes in S at the end of the process, or integrated over the whole process, given that S0 is the initial active set.
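As an illustration of how σ(S0) can be approximated by simulation, the Python sketch below estimates the expected number of active nodes under the independent cascade model, one of the classical frameworks listed above (it is not the PIDC model). The graph, the activation probabilities, the node labels and the function names are all hypothetical and serve only to show the mechanics.

```python
import random

def simulate_cascade(edges, seeds, rng):
    """One run of the independent cascade model; returns the set of active nodes.

    `edges` maps a node to a list of (neighbor, activation_probability) pairs.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        node = frontier.pop()
        for neighbor, prob in edges.get(node, []):
            # A newly active node gets a single chance to activate each
            # currently inactive neighbor.
            if neighbor not in active and rng.random() < prob:
                active.add(neighbor)
                frontier.append(neighbor)
    return active

def estimate_sigma(edges, seeds, n_runs=2000, seed=0):
    """Monte Carlo estimate of sigma(S0): average number of active nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(edges, seeds, rng)) for _ in range(n_runs))
    return total / n_runs

# Hypothetical three-station network with made-up activation probabilities.
toy_edges = {"A": [("B", 0.6), ("C", 0.2)], "B": [("C", 0.5)], "C": [("A", 0.1)]}
for station in ["A", "B", "C"]:
    print(station, estimate_sigma(toy_edges, [station]))
```

Comparing the estimates across candidate seed sets gives a simple, if computationally naive, way to rank launch options; the precision of each estimate is controlled by the number of simulated runs.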
Influence maximization problems are usually subject to a fixed cardinality constraint on S0, so that the number of initial individuals is fixed. Formally, the problem can be expressed as the following mathematical program:

$$ \max_{S_0 \subseteq S} \; \sigma(S_0), \quad \text{subject to } |S_0| = k \tag{11} $$

For most influence models, it is NP-hard to determine the optimum set for influence maximization (Kempe et al. 2015). Moreover, it is not clear how to evaluate σ(S0) in polynomial time, as its exact computation requires integrating over a combinatorial set. Chen et al. (2010) and Wang et al. (2012) proved that evaluating σ(S0) is generally #P-complete for the linear threshold model and the independent cascade model, respectively. However, as shown by Kempe et al. (2015), it is possible to obtain arbitrarily good approximations in polynomial time, by simulating the random choices and the diffusion process. In fact, for each simulated scenario κ, σκ(S0) is a submodular function both for the linear threshold model and the independent cascade model. Thus, a natural greedy hill-climbing strategy has been widely adopted to approximate the maximum influence to within a factor of (1 − 1/e), where e is the base of the natural logarithm.

(a) Rock music. (b) Contemporary music. (c) World-music. (d) Top-40 music.
Figure 6: Time series of the lagged influences within each music format, based on the top-50 songs training set.

(e) Rock music. (f) Contemporary music. (g) World-music. (h) Top-40 music.
Figure 7: Time series of the lagged influences within each music format, based on the top-200 songs training set.

In this section we examine the influence maximization problem under the estimated PIDC model. Since we focus on a concrete song, we drop the sub-index i in what follows, and define the influence
function as

$$ \begin{aligned} \sigma(S_0) &= \sum_{s \in S} E\left[x_{s,t_0+T} \mid x_{t_0} = y(S_0),\; x_t = 0 \;\forall t < t_0\right] \\ &= \sum_{s \in S} \sum_{w} w \, P\left(x_{s,t_0+T} = w \mid x_{t_0} = y(S_0),\; x_t = 0 \;\forall t < t_0\right) \\ &= \sum_{s \in S} \sum_{w} w \, \frac{1}{Z(\gamma, S_0)} \frac{1}{h_\Pi(z)} \exp\left(\psi z^T \tilde{\Gamma} g\right), \end{aligned} \tag{12} $$

where $z = [w, x]^T$ and S0 has been included in the normalizing constant notation Z(γ, S0) to make the dependency explicit.

Proposition 6. Consider the problem of maximizing the expected number of plays at time t0 + T by selecting an initial launching station at time t0, and let S∗ be the optimal initial set. We claim that

$$ \max_{S_0} \sigma(S_0) \geq \frac{|S^*| \exp(\psi \rho)}{Z(\gamma, S^*)}, $$

where ρ is the spectral radius of the influence structure $\sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell$ and $\|\cdot\|_1$ denotes the 1-norm of a matrix. This is equivalent to the summation, for each station s ∈ S and all time periods up to T, of the maximum column sums of absolute values of $\left(\sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\right)^{1/2}$.

The result is useful because it shows that the optimal solution is related to the spectral radius of the influence matrix, and in particular the lower bound can be conceived as a centrality index in the defined network of pairwise influences (Jackson 2010, Borgatti 2005, Bonacich 1987).

Let us start with the choice of station. To quantify the diffusion potential of a station s, we define

$$ c_{s,T} = \sum_{s' \in S} E\left[x_{s',t_0+T} \mid x_{t_0} = v_s\right], $$

where $v_s \in \{0,1\}^{|S|}$ is the sth canonical vector, with zeros everywhere except in the sth position. The measure $c_{s,T}$ thus assigns dynamic scores to all individuals in the network based on their ability to generate positive externalities in future periods.

Table 3: Propagation of the broadcasting decision at launch week t0 (expected plays).

Station         t0 + 1     t0 + 2     t0 + 3     t0 + 4     t0 + 5
BBC Radio 1     222.1906   196.1270   177.0395   163.3093   152.3401
Magic 105.4     219.4027   193.1600   175.4467   162.2060   152.0035
Key 103         213.6750   189.0823   172.8133   161.0081   151.3424
Galaxy 102      213.0615   188.4712   172.4151   160.5560   150.9959

Table 3 reports the average values of $c_{s,T}$ for the most and the least influential stations for T = 1, ..., 5. Several insights on the diffusion properties of the starting decisions result from Table 3.

• The impact of the launching company only propagates one and two weeks after the premiere and vanishes after the third week. A more contextual interpretation is that the long-run behavior of a song is quite insensitive to the initial marketing policy.
• BBC Radio 1 has the largest propagation effect.
• Galaxy 102 has the smallest propagation effect.
• The influences between stations of different formats are extremely small in size, suggesting that the main core of imitation happens between stations of the same music format (see Appendix C).

These insights deserve a careful interpretation. As already mentioned in §1, statistical models fail to distinguish between causality and co-variation, so that the only evidence detected by the estimated model is a systematic imitation of Top-40 stations (including BBC Radio 1) by Rock ones.

7 Conclusions

In this paper we developed a modeling framework for network discovery in influencer marketing that integrates time variation of individual decisions about each item with the structural information concerning their influences and spill-over. This model was crafted for fashion goods which experience significant dynamic variations in popularity, and was applied to the diffusion of songs over time. We could jointly address three aspects of statistical and decision analysis that are usually treated separately: i) modeling dynamic choices and cross-section dependencies in multidimensional panels, ii) discovering hidden structures of pairwise influences, iii) deducing diffusion patterns from the model specification.

We discussed some of the mathematical properties of the proposed exponential random model, along with a Bayesian estimation framework.
Our general specification resulted in high dimensionality arising from the quadratic growth of connections within the network, but a multi-level definition of pairwise externalities allowed us to reduce the dimensionality of the parameter space.

In the music industry application, our results provide an empirical estimation of the strength of station-to-station influences, which can be used as a score for the ability to condition the future choices of other players in the market. We found evidence that there were influencing links (one station copying another). With knowledge of the influence structure, we are able to support the decision of music producers about which station to choose for first diffusing their products. We showed that influence effects between broadcasting companies allow maximizing the diffusion of a song in the first weeks after launch, but these effects fade away with time as the song reaches the entire network.

The proposed methodology is general and might be extended to many other multidimensional panel settings. Applying the techniques developed in the paper to other environments may generate further insights on how to incorporate influence relationships while at the same time including time variations that apply to the entire network.

References

Amemiya, T. 1985. Advanced econometrics. Harvard University Press.
Anselin, L. 1988. Spatial econometrics: methods and models, Volume 4. Springer Science & Business Media.
Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler. 2012. Estimating spillovers using panel data, with an application to the classroom. Quantitative Economics 3 (3): 421–470.
Bailey, N., G. Kapetanios, and M. Pesaran. 2012. Exponent of cross-sectional dependence: Estimation and inference. Working paper, SSRN.
Bass, F. M. 1969. A New Product Growth for Model Consumer Durables. Management Science 15 (5): 215–227.
Besag, J. 1975. Statistical analysis of non-lattice data. The Statistician 24 (3): 179–195.
Blind, K. 2004.
The economics of standards: Theory, evidence, policy. Edward Elgar, Cheltenham.
Bonacich, P. 1987. Power and centrality: A family of measures. American Journal of Sociology 92 (5): 1170–1182.
Borgatti, S. P. 2005. Centrality and network flow. Social Networks 27 (1): 55–71.
Brown, D., and S. Fiorella. 2013. Influence marketing: How to create, manage, and measure brand influencers in social media marketing. Que Publishing.
Burda, M., M. Harding, and J. Hausman. 2009. Understanding choice intensity: A Poisson mixture model with logit-based random utility selective mixing. Working paper, MIT.
Castro, J., and S. Nasini. 2015. Mathematical programming approaches for classes of random network problems. European Journal of Operational Research 245 (2): 402–414.
Chen, W., Y. Yuan, and L. Zhang. 2010. Scalable influence maximization in social networks under the linear threshold model. In 2010 IEEE International Conference on Data Mining, 88–97. IEEE.
Chudik, A., M. H. Pesaran, and E. Tosetti. 2011. Weak and strong cross-section dependence and estimation of large panels. The Econometrics Journal 14 (1): C45–C90.
Cialdini, R. B. 2009. Influence: Science and practice, Volume 4. Pearson Education, Boston, MA.
Clifford, P., and A. Sudbury. 1973. A model for spatial conflict. Biometrika 60 (3): 581–588.
DeGroot, M. H. 1974. Reaching a Consensus. Journal of the American Statistical Association 69 (345): 118–121.
Desmarais, B. A., and S. J. Cranmer. 2012. Statistical mechanics of networks: Estimation and uncertainty. Physica A: Statistical Mechanics and its Applications 391 (4): 1865–1876.
Dixit, A. K., and J. E. Stiglitz. 1977. Monopolistic competition and optimum product diversity. The American Economic Review 67 (3): 297–308.
Gamerman, D., and H. F. Lopes. 2006. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press.
Gao, G., and L. Hitt. 2004. Information technology and product variety: Evidence from panel data. In ICIS 2004 Proceedings, 365–378.
Gauss, C.-F. 1823. Theoria combinationis observationum erroribus minimis obnoxiae. Henricus Dieterich, Göttingen.
Geyer, C. J. 1996. Estimation and optimization of functions. In Markov Chain Monte Carlo in Practice, ed. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, 241–258. Chapman and Hall, London.
Goldenberg, J., B. Libai, and E. Muller. 2001. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters 12 (3): 211–223.
Görür, D., F. Jäkel, and C. E. Rasmussen. 2006. A choice model with infinitely many latent features. In Proceedings of the 23rd International Conference on Machine Learning, 361–368. ACM.
Granovetter, M. 1978. Threshold models of collective behavior. American Journal of Sociology: 1420–1443.
Hanneke, S., W. Fu, E. P. Xing et al. 2010. Discrete temporal models of social networks. Electronic Journal of Statistics 4: 585–605.
Hardie, B. G., E. J. Johnson, and P. S. Fader. 1993. Modeling loss aversion and reference dependence effects on brand choice. Marketing Science 12 (4): 378–394.
Harvey, D. 1989. The condition of postmodernity, Volume 14. Blackwell, Oxford.
Holley, R. A., and T. M. Liggett. 1975. Ergodic theorems for weakly interacting infinite systems and the voter model. The Annals of Probability: 643–663.
Jackson, M. O. 2010. Social and economic networks. Princeton University Press.
Jaynes, E. T. 1957. Information theory and statistical mechanics. Physical Review 106 (4): 620.
Kelly, B., H. Lustig, and S. Van Nieuwerburgh. 2013. Firm volatility in granular networks. NBER Working Paper No. 19466.
Kempe, D., J. Kleinberg, and É. Tardos. 2015. Maximizing the Spread of Influence through a Social Network. Theory of Computing 11 (4): 105–147.
Kermack, W. O., and A. G. McKendrick. 1927. A contribution to the mathematical theory of epidemics. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Volume 115, 700–721. The Royal Society.
Kirman, A.
P., and N. J. Vriend. 2001. Evolving market structure: An ACE model of price dispersion and loyalty. Journal of Economic Dynamics and Control 25 (3–4): 459–502. Agent-based Computational Economics (ACE).
Kök, A. G., M. L. Fisher, and R. Vaidyanathan. 2009. Assortment planning: Review of literature and industry practice. In Retail supply chain management, 99–153. Springer.
Krivitsky, P. N. 2012. Exponential-family random graph models for valued networks. Electronic Journal of Statistics 6: 1100–1128.
Kuksov, D., and J. M. Villas-Boas. 2010. When more alternatives lead to less choice. Marketing Science 29 (3): 507–524.
Lee, Y., J. A. Nelder, and Y. Pawitan. 2006. Generalized linear models with random effects: unified analysis via h-likelihood. CRC Press.
LeSage, J., and R. K. Pace. 2009. Introduction to spatial econometrics. CRC Press.
LeSage, J. P. 1997. Bayesian estimation of spatial autoregressive models. International Regional Science Review 20 (1-2): 113–129.
Leydesdorff, L., and C. S. Wagner. 2008. International collaboration in science and the formation of a core group. Journal of Informetrics 2 (4): 317–325.
Martínez-de-Albéniz, V., and A. Sáez-de-Tejada. 2014. Dynamic choice models for cultural choice. Working paper, IESE Business School.
Møller, J., A. N. Pettitt, R. Reeves, and K. K. Berthelsen. 2006. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika 93 (2): 451–458.
Morris, M., M. S. Handcock, and D. R. Hunter. 2008. Specification of exponential-family random graph models: terms and computational aspects. Journal of Statistical Software 24 (4): 1548.
Murray, I., Z. Ghahramani, and D. J. C. MacKay. 2006. MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), 359–366. AUAI Press.
Nasini, S., V. M. de Albeniz, and T. Dehdarirad. 2017.
Conditionally exponential random models for individual properties and network structures: Method and application. Social Networks 48: 202–212.
Ng, S., and P. Perron. 2001. Lag length selection and the construction of unit root tests with good size and power. Econometrica 69 (6): 1519–1554.
Olfati-Saber, R., A. Fax, and R. M. Murray. 2007. Consensus and cooperation in networked multi-agent systems. Proceedings of the IEEE 95 (1): 215–233.
Robins, G., P. Pattison, Y. Kalish, and D. Lusher. 2007. An introduction to exponential random graph (p∗) models for social networks. Social Networks 29 (2): 173–191.
Schelling, T. C. 2006. Micromotives and macrobehavior. WW Norton & Company.
Simons, G., and N. Johnson. 1971. On the convergence of binomial to Poisson distributions. The Annals of Mathematical Statistics 42 (5): 1735–1736.
Train, K. 2009. Discrete choice methods with simulation. Cambridge University Press.
Tversky, A., and I. Simonson. 1993. Context-dependent preferences. Management Science 39 (10): 1179–1189.
Wang, C., W. Chen, and Y. Wang. 2012. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery 25 (3): 545–576.
Wang, P., K. Sharpe, G. L. Robins, and P. E. Pattison. 2009. Exponential random graph (p∗) models for affiliation networks. Social Networks 31 (1): 12–25.
Weisstein, E. W. 2016. Gauss's Inequality.
Zeger, S. L., and M. R. Karim. 1991. Generalized linear models with random effects; a Gibbs sampling approach. Journal of the American Statistical Association 86 (413): 79–86.

Appendix A: Proofs

Proposition 1

Proof. We start by noting that the unimodality of $P(x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}})$ is equivalent to the unimodality of its logarithm. So we define $f : R^n \to R$ as $f(x_{i,t}) = \log P(x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}})$ and $H : R^{|S|} \to R^{|S|}$ as $H(x_1, \dots, x_n) = [h'(x_1)/h(x_1) \; \dots \; h'(x_n)/h(x_n)]^T$.
A sufficient condition for f to be unimodal is that, for any positive ε, it verifies the following two implications: i) if ∇f(z) < 0 then ∇f(z + ε) < 0 (componentwise); ii) if ∇f(z) > 0 then ∇f(z − ε) > 0 (componentwise). Note that

$$ \nabla f(x_{it}) = \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) - H(x_{it}) - \log Z(\gamma)\, e. $$

Since $\delta = \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) - \log Z(\gamma)\, e$ is a constant, based on Assumption 1, for any component s ∈ S we can verify that i) if $\delta_s < H_s(x_{i,t})$ then $\delta_s < H_s(x_{i,t} + \varepsilon)$; ii) if $\delta_s > H_s(x_{i,t})$ then $\delta_s > H_s(x_{i,t} - \varepsilon)$; where $\delta_s$ and $H_s$ are the sth components of δ and H respectively.

Proposition 2

Proof. Consider the conditional distribution $x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}$, as obtained from (2), and maximize it with respect to $x_{i,t}$. From Proposition 1 we take $\nabla f(x_{it})$ and obtain the first order condition for the mode of $x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}$:

$$ \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) = \left[ \frac{h'(\mathrm{mod}(x_{1,it}))}{h(\mathrm{mod}(x_{1,it}))}, \; \dots, \; \frac{h'(\mathrm{mod}(x_{|S|,it}))}{h(\mathrm{mod}(x_{|S|,it}))} \right]^T, \quad \text{and} \quad \mathrm{mod}(x_{it}) = H^{-1}\left( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \right). $$

The second equality comes from the fact that Assumption 1 guarantees that H is invertible.

Proposition 3

Proof. Consider the Laplace method used to approximate integrals of the form

$$ \int \frac{1}{h(z)} \exp(f(x))\, dx \approx \sqrt{\frac{2\pi}{f''(\mathrm{mod}[f])}} \; \frac{1}{h(z)} \exp\left(f(\mathrm{mod}[f])\right), $$

where mod[f] is the maximizer of f and f″(mod[f]) is the Hessian of f at the point mod[f]. In our case,

$$ f(x) = \psi\, (x_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) + \log x_{it}. $$

After noting that f″(mod[f]) is diagonal, we replace the values corresponding to the PIDC model (1) and, defining $m_{it} = \mathrm{mod}[x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}]$ and $\mu_{it} = E[x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}]$, we have

$$ E[x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}] = \frac{\sqrt{2\pi\, (m_{it})^{2\psi}}}{h(m_{it})\, Z(\gamma)} \exp\left( \psi\, (m_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \right) $$

$$ V[x_{i,t} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}] = \frac{\sqrt{2\pi\, (\mu_{it} - m_{it})^{2\psi}}}{h(m_{it})\, Z(\gamma)} \exp\left( \psi\, (m_{it})^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, g(x_{i,t-\ell}) \right) $$

where E[ . ] and V[ . ] are respectively the expectation and the variance operators.

Proposition 4

Proof.
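Given an exponential random model on a sample space X with vector of natural parameters γ and sufficient statistics T, let $q_\gamma(x) = \exp(T(x)^T \gamma)/h(x)$ and w : X → R. Equation (4) can be deduced by differentiation:

$$ \begin{aligned} \frac{d}{d\gamma} E[x] &= \left( \frac{\partial}{\partial \gamma} \frac{1}{Z(\gamma)} \right) \left( \sum_{x \in \chi} x\, q_\gamma(x) \right) + \frac{1}{Z(\gamma)} \left( \sum_{x \in \chi} x\, \frac{\partial q_\gamma(x)}{\partial \gamma} \right) \\ &= - \left( \sum_{x \in \chi} x\, q_\gamma(x) \right) \left( \sum_{x \in \chi} T(x)\, q_\gamma(x) \right) \frac{1}{Z(\gamma)^2} + \left( \sum_{x \in \chi} T(x)\, x\, q_\gamma(x) \right) \frac{1}{Z(\gamma)} \\ &= -E[T(x)]\, E[x] + E[x\, T(x)] = \mathrm{Cov}[x, T(x)]. \end{aligned} $$

In the case of the conditional model (1), the sufficient statistic is $T_{\ell s s'} = x_{s,i,t}\, g(x_{s'i,t-\ell})$. Thus, we claim that

$$ \frac{\partial}{\partial \gamma_{\ell s s'}} E[x_{sit} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}] = g(x_{s'i,t-\ell})\, V[x_{sit} \mid x_{i,t-\tau_{\min}} \dots x_{i,t-\tau_{\max}}], \tag{13} $$

for each i ∈ I, s ∈ S.

Proposition 5

Proof. For every item i ∈ I and every time t ∈ T, consider model (2) and the probability of the total individual outcomes:

$$ P\left( \sum_{s \in S} x_{sit} = y_{it} \,\Big|\, x_{t-\tau_{\min}}, \dots, x_{t-\tau_{\max}} \right) \propto Q_\gamma(y_{it}) := \sum_{x_{s_1 it} + \dots + x_{s_{|S|} it} = y_{it}} \frac{1}{(h_{ist})^{\psi}} \exp\left( \psi \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \gamma_{\ell s s'} G_{s s' i \ell t} \right). $$

Let $\gamma_{\max}$ be the maximum element of the influence structure $\{\Gamma_\ell\}_{\ell=\tau_{\min}}^{\tau_{\max}}$. Then the following upper limits can be deduced:

$$ \begin{aligned} Q_\gamma(y_{it}) &\leq \exp\left( (\tau_{\max} - \tau_{\min})\, \gamma_{\max}\, (y_{it}\, g(y_{it})) \right) \sum_{x_{s_1 it} + \dots + x_{s_{|S|} it} = y_{it}} \; \prod_{r \in S} \left( \frac{1}{x_{rit}!} \right)^{\max\{\psi, 1\}} \\ &= \left( \frac{|S|^{y_{it}}}{y_{it}!} \right)^{\max\{\psi, 1\}} \exp\left( \gamma_{\max}\, (y_{it}\, g(y_{it})) \right) \qquad \text{[multinomial theorem]} \end{aligned} $$

Since $Q_\gamma(y_{it}) < \infty$ for any real $y_{it}$, a normalizing constant exists when $Q_\gamma(y_{it})$ goes to zero fast enough as $y_{it}$ grows large. By applying the ratio test to the convergence of the series, we find

$$ L = \lim_{y \to \infty} \frac{|S|^{y+1}\, y!\, \exp\left( \gamma_{\max} ((y+1)\, g(y+1)) \right)}{|S|^{y}\, (y+1)!\, \exp\left( \gamma_{\max} (y\, g(y)) \right)} = \lim_{y \to \infty} \frac{|S| \exp\left( \gamma_{\max} \left( (y+1)\, g(y+1) - y\, g(y) \right) \right)}{y}. $$

Note that a sufficient condition for Z(γ) < ∞ for all γ ∈ R is that L < 1. Convergence is guaranteed if g is non-decreasing and bounded from above, so that, for all y ≥ 0, we have $(y+1)\, g(y+1) - y\, g(y) \leq 1$.

Proposition 6

Proof. Consider the PIDC model (2) and let $S_0^*$ be the optimal first stage solution of (11), under a given PIDC model specification.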
Note that for any $S_0 \in 2^{|S|}$ verifying the capacity constraint, we have $\sigma(S_0^*) \geq \sigma(S_0)$. Based on (12) we see that

$$ \sigma(S_0) \geq \frac{1}{Z(\gamma, S_0)} \exp\left( \psi\, u^T \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell\, u \right), $$

where u is an arbitrary vector with norm |S∗|. Since this is true for all u, it is also true for the maximizer of $u^T \left( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell \right) u$, subject to $\|u\| = |S_0|$, which is known to be the norm of $\left( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell \right)^{1/2}$. Thus, we can write

$$ \sigma(S_0) \geq \frac{1}{Z(\gamma, S_0)}\, |S^*| \exp\left( \psi \left\| \left( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell \right)^{1/2} \right\|_1 \right) \geq \frac{\exp(\psi \rho)}{Z(\gamma, S_0)}, $$

where ρ is the spectral radius of $\left( \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \Gamma_\ell \right)^{1/2}$.

Appendix B: PIDC micro-foundation

This Appendix derives the statistical model for influence discovery (2) from a micro-founded decision setting in which broadcasting companies are assumed to target pre-defined play-list policies.

Each station s ∈ S wishes to pursue an optimal play-list policy, under a limited capacity of N maximum mutually exclusive portions (song plays)⁶. Thus, the total number of plays is fixed and must be distributed among |I| songs: $\sum_{i \in I} x_{sit} = N$.

At a given period t, let $\tilde{G}_{ss'\ell t} = [\tilde{G}_{s,s',1,\ell,t} \dots \tilde{G}_{s,s',|I|,\ell,t}]^T \in R^{|I|}$ be a measure of similarity between the |I| choices of station s at time t and the ones of station s′ at time period t − ℓ. Station payoffs over feasible play-list policies are axiomatically derived.

Axiom 1 (Monotonicity with respect to the influence level). The optimal value of $x_{sit}$ increases with respect to the target similarity $\tilde{G}_{s,s',i,\ell,t}$, for any other station s′ ∈ S at period t − ℓ.

Axiom 2 (Pareto-efficiency). The optimal play-list policy is Pareto-efficient with respect to two objectives: (i) maximizing the likelihood under a fully stochastic setting, (ii) maximizing the pre-defined target similarities.

Axiom 3 (Myopic with respect to the future). The optimal play-list policy at time t of a given station s ∈ S does not take into account the best response policy of any other station s′ ∈ S/{s} (stations are 'future outcomes takers').
Axiom 4 (Myopic with respect to the influence structure). The optimal value of $x_{sit}$ must be increasing with respect to the target similarity $\tilde{G}_{s,s',i,\ell,t}$, for any other station $s' \in S$ at period $t - \ell$ (stations are 'influence structure takers').

Axiom 5 (Invariance to a uniform change in the influence level). When all the similarities are multiplied by a common constant, the resulting station utility shall be proportional to that constant.

In a fully stochastic setting, the probability of a particular assignment $x_{st}$ is $N!\left(\prod_i x_{sit}!\right)^{-1} |I|^{-N}$. Thus, for each station $s$ at time $t$, the class of play-list policies satisfying the aforementioned assumptions consists in the maximization of the weighted geometric mean between item targets and relative frequency:

$$
u_s(x_{st}) \propto \underbrace{\left(\frac{N!}{\prod_{i\in I} x_{sit}!}\, |I|^{-N}\right)^{\phi}}_{\text{likelihood score}}\; \underbrace{\left(\prod_{i\in I} \prod_{s'\in S} \prod_{\ell=\tau_{\min}}^{\tau_{\max}} \big(\tilde{G}_{ss'i\ell t}\big)^{\gamma_{\ell s s'}\, x_{sit}/N}\right)^{1-\phi}}_{\text{target similarity level}}, \tag{14}
$$

where $\phi \in [0, 1]$ is a known weight, $\sum_{s',\ell} \gamma_{\ell s s'} = 1$ and $P(x_{sit}) \propto x_{sit}/N$. In this context, the play-list similarity $\tilde{G}_{ss'i\ell t}$ is treated as a constant with respect to current decisions (it might have been estimated from past observations or simply assumed by the station).

$^{6}$ There are $N^{|S|}$ possible assignments when portions and items are distinguishable. When the number of items per each portion $n_1 \dots n_M$ is fixed, there are $N!/\prod_j n_j$ possible assignments.

The optimal play-list policy can be obtained based on the maximum entropy principle derivation by Jaynes (1957). In fact, when $N$ is sufficiently large, taking the logarithm of (14) and using Stirling's approximation gives rise to

$$
\operatorname{argmax}\; u_s(x_{st}) \approx \operatorname{argmax}\; -\phi \sum_{i\in I} x_{sit} (\log x_{sit} - 1) + (1 - \phi) \sum_{j\in J} \sum_{i\in I} \theta_{sitj}\, \frac{x_{sit}}{N} \log T_{sitj}. \tag{15}
$$

The play-list policy (16) is then derived from the first-order condition on (15):

$$
P(x_{sit}) \propto \exp\left(\frac{1-\phi}{\phi} \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \sum_{i\in I} \sum_{s'\in S} \gamma_{\ell s s'} \log \tilde{G}_{ss'i\ell t}\right), \quad \text{with } \sum_{i\in I} P(x_{sit}) = 1. \tag{16}
$$

In this decision setting, the pairwise influence effects $\gamma_{\ell s s'}$ are known by the station (as the weights of the geometric mean in (14)), which takes decisions based on this pre-defined imitation pattern. The resulting play-list policy (16) follows an exponential random model, whose complete specification depends on the design of $\tilde{G}_{ss'i\ell t}$$^{7}$.

For mathematical tractability and for the interesting statistical properties described in the next subsection, a re-parametrization of the influence measure $\tilde{G}_{ss'i\ell t}$ is adopted for the general class of PIDC models. The following re-parametrization is considered:

$$
\tilde{G}_{ss'i\ell t} = \exp\left(G_{ss'i\ell t} - \frac{\log(h_{ist})}{\gamma_{\ell s s'}}\right),
$$

so that

$$
P(x_{sit}) \propto \frac{1}{(h_{ist})^{\psi}} \exp\left(\psi \sum_{\ell=\tau_{\min}}^{\tau_{\max}} \sum_{i\in I} \sum_{s'\in S} \gamma_{\ell s s'}\, G_{ss'i\ell t}\right).
$$

As discussed in the next subsection, when a collection of dynamic broadcasting decisions is observed (in terms of a three-dimensional panel $\{x_{ist} \mid i \in I, s \in S, t \in T\}$), $h$ and $G$ play the respective roles of the underlying measure and the sufficient statistics of the exponential random model so defined. This specification allows translating the strategic model for companies' decisions into a statistical framework for the empirical estimation of $\gamma$.

$^{7}$ Most well-known probability distributions belong to this family, which can take several forms when modeling high-dimensional data.

Appendix C: Influence plots

(a) Rock music format, Top-50 training set. (b) Rock music format, Top-200 training set.
Figure 8: Network plots of the estimated pairwise influences between rock music stations.

(a) Contemporary format, Top-50 training set. (b) Contemporary format, Top-200 training set.
Figure 9: Network plots of the estimated pairwise influences between contemporary music stations.

(a) World music format, Top-50 training set. (b) World music format, Top-200 training set.
Figure 10: Network plots of the estimated pairwise influences between world music stations.

(a) Top-40 music format, Top-50 training set.
(b) Top-40 music format, Top-200 training set. Figure 11: Network plots of the estimated pairwise influences between top-40 music stations.
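
Supplementary numerical illustration. The covariance identity used in Appendix A to derive equation (4), namely that the derivative of $E[x]$ with respect to the natural parameter equals $\mathrm{Cov}[x, T(x)]$, can be checked numerically on a small example. The following is a minimal sketch, not part of the paper's methodology: the finite support $\{0, \dots, 5\}$, the statistic $T(x) = x^2$, the underlying measure $h(x) = x!$, the scalar value $\gamma = 0.1$, and all function names are hypothetical choices made only for illustration.

```python
import math


def moments(gamma, support, T, h):
    """Return (E[x], E[T(x)], E[x T(x)]) under p(x) proportional to exp(gamma * T(x)) / h(x)."""
    weights = [math.exp(gamma * T(x)) / h(x) for x in support]
    Z = sum(weights)  # normalizing constant Z(gamma)
    probs = [w / Z for w in weights]
    e_x = sum(p * x for p, x in zip(probs, support))
    e_t = sum(p * T(x) for p, x in zip(probs, support))
    e_xt = sum(p * x * T(x) for p, x in zip(probs, support))
    return e_x, e_t, e_xt


def mean_derivative_fd(gamma, support, T, h, eps=1e-6):
    """Central finite-difference approximation of d E[x] / d gamma."""
    up = moments(gamma + eps, support, T, h)[0]
    down = moments(gamma - eps, support, T, h)[0]
    return (up - down) / (2 * eps)


# Hypothetical toy specification (assumed for illustration only).
support = list(range(6))


def T(x):
    return x * x  # assumed sufficient statistic


def h(x):
    return math.factorial(x)  # assumed underlying measure


gamma = 0.1

e_x, e_t, e_xt = moments(gamma, support, T, h)
covariance = e_xt - e_x * e_t  # Cov[x, T(x)] = E[x T(x)] - E[x] E[T(x)]
derivative = mean_derivative_fd(gamma, support, T, h)

print("Cov[x, T(x)]      =", covariance)
print("d E[x] / d gamma  =", derivative)
```

Under these assumptions the two printed quantities should agree up to finite-difference error. The same check applies to any finite sample space, since only the exponential form of $q_\gamma$ is used.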