Download Pairwise influences in dynamic choice: method and application

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Pairwise influences in dynamic choice: method and application
Stefano Nasini∗
Victor Martínez-de-Albéniz†
February 11, 2017
Abstract
Choices of different individuals over time exhibit pairwise associations in a wide range of
economic contexts. Network models for the processes by which decisions propagate through social
interaction have been studied before and widely applied to marketing, but only a few consider
unknown network structures. In fact, while it is typically possible to directly observe individual
choices, inferring individual influences (who influences who) might be difficult in the general case,
as it requires strong modeling assumptions on the cross-section dependencies of the associated
multidimensional panels. This paper proposes a class of exponential random models to jointly deal
with dynamic choices of individuals over items together with the structure of pairwise influences
between them. We analyze the properties of the model and develop a statistical methodology to
estimate its parameters. We then present an empirical marketing analysis of music broadcasting,
where a set of songs are diffused over radio stations; we infer station-to-station influences. After
uncovering the influences in the station network, we analyze the problem of deciding which station
one should choose to first launch a song.
Key words: Diffusion on social networks, pairwise influences, multidimensional panel data, influencer marketing, music industry.
1
Introduction
When studying choices of multiple agents, spill-over and imitation emerge as a consequence of social
interactions in a wide range of economically-relevant contexts. They have been proved to be capable
of affecting individual and social outcomes to a large extend (Granovetter 1978, Schelling 2006,
Goldenberg et al. 2001). Clear examples are demand-side economies of scale – where the attractiveness
of a commodity increases as a function of the total number of consumers (Blind 2004) – and also several
classes of monopolistic competition – where product differentiation depends on the imitation between
∗
†
Corresponding author: IESEG School of Management, (LEM CNRS 9221), Lille/Paris, France, [email protected]
IESE Business School, University of Navarra, Barcelona, Spain, [email protected]. V. Martínez-de-Albéniz’s re-
search was supported in part by the European Research Council - ref. ERC-2011-StG 283300-REACTOPS and by
the Spanish Ministry of Economics and Competitiveness (Ministerio de Economía y Competitividad) - ref. ECO201459998-P.
1
companies (Dixit and Stiglitz 1977). In fact, this condition covers a wide range of market structures
where decision makers are influenced1 by the decisions of their competitors though imitation, such as
in the information technology, apparel retailing or music broadcasting industries. In the latter case,
which is the application that we present in this paper, the degree of synchronized similarity between
programme schedules of broadcasting stations might provide a hint on how much they imitate each
other. From a statistical point of view, influence happens when the pattern of individual decisions is
systematically anticipated by the ones of other individuals. In this case, we detect a repeated lagged
association from which pairwise influences are empirically inferred.
Network influence models have been already studied in different contexts (LeSage 1997, Arcidiacono et al. 2012, Granovetter 1978, Schelling 2006, Zeger and Karim 1991, Lee et al. 2006), see §2
for more details. Most of the existing references assumes a well known influence structure and focus
on the problem of studying diffusion patterns and optimal starting conditions. In fact, the multiplier
effect that a sequence of influences might produce results in amplifying or shrinking the diffusion
of ideas and choices. In contrast, we focus on the reverse problem of network discovery, by which
observed decisions are tracked to detect patterns of imitation.
This paper proposes a novel parametric approach — we call it Pairwise Influences in Dynamic
Choice models (PIDC models, from now on) — to infer network patterns from cross-section dependencies in dynamic choice settings, based on the observation of sequential decisions of multiple
individuals over multiple items. Formally, we observe a multidimensional panel with three dimensions
{xist | i ∈ I, s ∈ S, t ∈ T }, namely the realization of a triple-index process defined on a suitable
probability space, where i is the item dimension (e.g., a song), s is the individual dimension (e.g., a
broadcasting station), t is the time dimension (e.g., a week)2 . In this analysis, there is one central
question regarding the model design: how to internalize in a parsimonious way the dynamic nature
of cross-section dependencies and their effect on the diffusion and propagation across time (Bailey
et al. 2012). Specifically, we construct an exponential random model for multidimensional panels that
internalize the effect of pairwise interactions between individuals on the joint distribution of the dynamic choices on items (Robins et al. 2007). Our model includes effects that connect individual-item
levels across periods, based on hidden network patterns of dynamic similarities between individual
decisions with respect to items. With such specification, we develop several analytical properties of
how model parameters affect the probability distributions of panel samples.
The advantage of using an exponential random model over existing network influence models –
such as the linear thresholds model, by Granovetter (1978) and Schelling (2006), and the independent
1
The term influence is here adopted in a comprehensive sense to generalize the ones of imitation, and spill-over,
which are used in more particular context, such as social psychology and microeconomics respectively.
2
This kind of data structure can be straightforwardly represented in term of dynamic two-mode valued networks –
a time-dependent real-valued association between two sets of nodes, say S (for individuals), and I (for items) – by
establishing a correspondence between panel dimensions and network layers. This allows an immediate definition of
individual correlations, in terms of the projection of the two-mode layers (individual–item connections) into a one-mode
network (individual–individual connections), where links represent pairwise interactions between individuals such as
transmission, spill-over or imitation, as described by Leydesdorff and Wagner (2008).
2
cascade model by Goldenberg et al. (2001) – is the possibility to support a large variety of model
specifications, including the recent empirical findings in influence marketing (Cialdini 2009, Brown
and Fiorella 2013).
Our statistical methodology is then applied to the diffusion of songs across broadcasting stations
in the UK. Our data contains the number of plays of each song over many broadcasting stations.
These bring to the market highly correlated program schedules – i.e., choices of songs with high crosssection dependencies arising as a result of pairwise imitations. Our aim is to determine the influence
of one station on another. We show that, despite the large amount of songs in the item dimension
of the panel, the pattern of imitation between broadcasting stations (cross–section dependencies in
the individual dimension) remains constant when only few of them are used as a training set for
estimation.
Our results thus provide an empirical estimation of the strength of station-to-station influences,
which can be used as a score for the ability of conditioning the future choices of other players in the
market. With knowledge of the influence structure, music producers can better choose which station
to choose for first diffusing their products. Indeed, a direct implication of the proposed modeling
framework is to optimize introduction decisions in the music industry, assessed in the final part of the
paper. We show that influence effects between broadcasting companies are determinant to maximize
the diffusion of a song in the first weeks after launch, but these effects fade away with time as the
song reaches the entire network.
The rest of the paper is organized as follows. Section 2 reviews the three streams of literature
which are relevant in our analysis: econometric models of dynamic choice, multidimensional panel
data and exponential random graph models. Section 3 introduces and describes the music data set,
along with the relevant measurements we aim to control in our probabilistic model. Section 4 provides
a detailed description of the proposed exponential random model for this type of data set, embeds
such model in a general Bayesian framework and discusses the algorithmic aspects of the estimation
method. The numerical results are presented in Section 5. Section 6 describes how the estimated
model can help decision-makers better choose the stations where songs are released first, and Section
7 concludes. All the mathematical proofs of propositions are reported in Appendix A and additional
foundations and plots in Appendices B and C.
2
Literature review
This paper is connected to several streams of literature, which are for convenience grouped in three
main fields: multidimensional choice models for marketing analytics, random models of network formation, diffusion through networks.
Multidimensional choice models for marketing analytics
Choice models consist in the prob-
abilistic selection from sets of mutually exclusive alternatives (Train 2009, Görür et al. 2006, Kuksov
and Villas-Boas 2010, Burda et al. 2009). In such context, multidimensional panel data generally
3
appears as a sequence of multiple choices (or outputs) by a fixed collection of individuals (companies,
customers, etc.), e.g., Gao and Hitt (2004), Amemiya (1985).
Multinomial logit models are typically adopted as a referential framework for individual choices
(Kök et al. 2009, Martínez-de-Albéniz and Sáez-de-Tejada 2014), which can be easily generalized
to cases of infinite alternative by the Poisson asymptotic behavior (Simons and Johnson 1971), or
properly parameterized to account for time-dependent properties in dynamic choice settings and
correlated individual decisions (Tversky and Simonson 1993, Hardie et al. 1993). In fact, a common
issue when studying dynamic choices of multiple individuals over a set of items concerns the presence
of cross-sectional dependencies either in the item or in the individual dimension. Despite the relevance
of such problem, only recently cross-sectional dependencies have become central in marketing analytics
and in the general econometric literature (Anselin 1988, LeSage and Pace 2009, Chudik et al. 2011).
In fact, in the context of individual decisions, inferring influences, transmissions and spill-overs (who
influences who) requires strong modeling assumptions about the correlation structure of the individual
dimension.
In some cases, when interactions between spatially distributed consumers, retailers or manufacturers are specified, spatial autoregressive processes have also been considered as models of panel
data (LeSage 1997). Besides spatial proximity, cross-correlations of errors could arise as a result of
interactions within socioeconomic networks, when an underlying structure of externalities, spill-overs
and imitation between individual decisions can be assumed. For example Arcidiacono et al. (2012)
estimates individual spill-overs in an educational context by internalizing the within-group similarities
of individual outcomes and choices. This is also the case of interest of this paper, where individual
decisions – such as broadcasting decision of companies in the music industry – depend on the decisions
of other individuals. In general, the explicit inclusion of pairwise spill-overs is hard to find in the
panel data literature; systematic studies of such a structural analysis can only be encountered in the
random graph and complex network literature, as discussed in the next paragraphs.
Random models of network formation
As noted by Castro and Nasini (2015), random models
of network formation infer connection structures from purely stochastic processes where connections
appear randomly in accordance with some distribution.
Among one of the most general and widely adopted models in this class are the well-known
exponential random graph models (Robins et al. 2007), ERGM from now on. These allow for a
straightforward characterization of network features – such as propensities for homophily, mutuality,
and triad closure – through choices of the sufficient statistics used in the model (Morris et al. 2008).
In recent years, ERGM have been extended to embrace more complex settings, such as the ones
associated to dynamic networks (Hanneke et al. 2010, Desmarais and Cranmer 2012), valued networks
(Krivitsky 2012), two-mode networks (Wang et al. 2009), mudual dependency between indivdual
properties and network structure (Nasini et al. 2017). Enlarging the range of applicability of the
ERGM to such cases allows the inclusion of many relevant statistical problems in the field of complex
networks.
4
A further step into this generalization can be made by noting that ERGM do not need to be
limited to explicitly observed networks: sufficient statistics can be included to mirror the hidden
patterns of connection inferred from observed cross-section dependencies. In fact, statistical settings
in which a given set of individuals S is supposed to choose (or to be associated with) elements
from a second set of items I commonly appear in several applications, such as scholars associated
to papers, customers associated to companies, etc. From this perspective, our approach is related
to this stream of literature, where a class of exponential random models is adopted to estimate the
hidden associations within the individual dimension S of the multidimensional panel (station-tostation influences).
Diffusion through networks
Classical diffusion models studied over the years are the Bass differ-
ential diffusion model (Bass 1969), and the SIR epidemiological models (Kermack and McKendrick
1927). They account for the establishment and spread of diffusion over a fixed population, without
explicitly taking into consideration connections at individual level. However, when a structure of
pairwise influences, transmissions and spill-overs is known, diffusion and propagation can be studied
as fundamental processes taking place over the edges from node to node.
As noted by Kempe et al. (2015), network diffusion processes have a long history of study in
the social sciences. More than forty years ago, the DeGroot learning model (DeGroot 1974) was
one of the first approaches in this area. His basic intuition was to design the choice dynamics of a
node from period to period by averaging out the choices of its neighbours. From the viewpoint of
the mathematical sociology, Granovetter (1978) and Schelling (2006) have parallelly investigated the
linear thresholds model. The underlying idea was to study binary states of nodes (i.e., active versus
inactive), in such a way that nodes have independent thresholds which represent the total weight
which has to be exerted by active neighbors in order to become active. A similar model has been
proposed in the context of marketing by Goldenberg et al. (2001), where any active nodes is capable of
activating his neighbors with a fixed independent probability. This is called the independent cascade
model. The DeGroot learning model, the linear thresholds model and the independent cascade model
are among the most widely studied mathematical frameworks for diffusion of influence taking place
over the edges from node to node. Unfortunately, despite their large range of applicability, none
of them allows for the simultaneous inclusion of multidimensional choices over a set of items whose
attractiveness is non-homogeneous in time, as for music songs. There are alternative approaches to
deal with diffusion and propagation, based on pure simulation schemes, such as multi-agent systems
(Olfati-Saber et al. 2007, Kirman and Vriend 2001), but they do not allow to statistically obtain
inferential insights from empirical observations.
In our statistical setting, the empirical probability of station-to-station influences – representing a
score for the ability of conditioning the future choices of other players in the market – allows diffusion
patterns to be probabilistically characterized.
5
3
The context: data on music broadcasting
We collected data about the program schedules of songs played in broadcasting companies (TV
channels and radio stations) in the UK, from January 1, 2011 to March 31, 2014. The data format
consists in a sequence of tables identifying a unique moment (day and hour) in which a specified song
is played in a specified broadcasting company. The information was obtained from BMAT, which has
developed a technology (Vericast) to monitor the songs being played in real time on any radio station
in any country. (This is done by identifying the musical ‘fingerprint’ of each song and matching it
with their large database of songs.) The database included information about artists and names of
each song, day and time of the day being played, and radio station of that play, but for simplicity we
aggregated these plays by week across all stations. Table 1 describes the data set – an exploratory
statistical analysis of such data sets has been carried out by Martínez-de-Albéniz and Sáez-de-Tejada
(2014), with an emphasis on the time-variation of song popularity.
Stations
22
Artists
32,765
Songs
74,712
Time periods (weeks)
255
Table 1: Descriptive statistics of the two data sets.
Despite the large amount of songs broadcasted, only few of them are frequently played: 1, 180
songs out of 65, 531 capture half of the total plays (50% market share). Moreover, the number of
songs whose market share is at least a half of the most broadcasted song is just 30. These facts, as
highlighted by the Pareto diagram in Figure 1, suggest that very few songs are quantitatively relevant
in the analysis of dynamic broadcasting patterns.
(a) All the 65, 531 songs played.
(b) Top 1, 180 songs played.
Figure 1: Pareto diagram of the number of plays.
A similar behavior can be observed at the artist level. Among the most played artists is Bruno
Mars. The weekly number of plays of two of his most popular songs is shown in Figure 2. They
exhibit a dynamics which resembles the short life cycle of fashion goods: the demand of songs evolves
6
on a time window in which their popularity increases shortly after launch and then decreases3 .
(a) Bruno Mars, Just the way you are in the UK.
(b) Bruno Mars, Locked Out Of Heaven in the UK.
Figure 2: Time plots of numbers of plays of two songs of Bruno Mars across 22 stations.
Within the British data set, broadcasting stations are partitioned in accordance with their music
format – World-music (2 stations), Contemporary and Easy listening (7 stations), Rock music (6
stations), Top 40 and Urban (7 stations). This classification is based both on the Wikipedia description of the broadcasting companies and the information resulting from their corresponding webpages.
Within a given radio format, individual decisions are much more homogeneous and the corresponding
number of plays appear positively correlated. We take advantage of this in the model specification,
see §4.3.
The overall picture emerging from this data description suggests the possibility of assessing the
presence of pairwise influences in the individual broadcasting decisions within and between music
formats. Influence happens when the pattern of plays of a collection songs by a given station in a
given period is systematically followed by other stations. In this case we empirically detect repeated
pattern which is the basis of our pairwise influences estimation.
A probabilistic model for this particular type of network discovery is proposed in the next section.
4
Model definition and specification
A statistical model for influence discovery is presented in this section.
4.1
The PIDC model
Let S and I be the individual and item dimensions of a three dimensional panel and let xst =
[xs1t xs2t . . . xs|I|t ]T , xit = [x1it x2it . . . x|S|it ]T and xt = [x1t x2t . . . x|S|t ] be the associated |I|dimensional, |S|-dimensional vectors and |I| × |S|-dimensional matrix correspondingly. Thus, xsit
represents the valued connection between individual s ∈ S and item i ∈ I, which can be interpreted
3
When presenting our modeling framework, this time effect of songs diffusions has been treated as an exogenous
social trend (possibly due to an appetite for novelty, or to a planned marketing strategy) which affects broadcasting
decisions without been affected by them (Harvey 1989). This simplifying assumption is made explicit in §4.3 and allows
estimating the shape (life cycle) of this exogenous social trend and its influence on each station.
7
as the decision that the sth individual takes with respect to the ith item (e.g., the number of plays of
song i in station s) at time t.
Let h : R −→ R be non-negative and non-decreasing and G : R2 −→ R be defined as G(x, y) =
xg(y), where g is a non-decreasing real valued function4 . Sometimes the shorter notation hist and
Gss0 i`t shall be adopted instead of h(xist ) and G(xist , xis0 (t−`) ). The latter is defined to capture
the dynamic network pattern of cross-sectional dependencies, in terms of lagged similarities between
individual choices.
We build a conditional model for the decision xsit that the sth individual takes with respect to
the ith item at time t:

P (xsit | xit−τmin . . . xit−τmax ) ∝
1
exp ψ
(hist )ψ

τX
max
X
`=τmin
s0 ∈S
γ`ss0 Gss0 i`t  .
(1)
The joint probability distribution of the described three-dimensional panel (and its dynamic twomode network correspondence5 ) can be properly defined as follows.
Definition 1 (Three-dimensional panel joint probability). Let the individual decisions at the first
τ periods x1 . . . , xτ be known. The joint probability of observing a sequence of |S| multidimensional
choices is defined by assuming conditional independence between individuals and items and applying
the product rule:
where the lag τmin
T 1
1
P (xτ +1 . . . xT | γ) =
exp ψx Γ̃g ,
(2)
Z(γ) hΠ (x)
Q
Q Q
> 0, the function hΠ (x) = t∈T i∈I s∈S (hist )−ψ , the function Z(γ) is a
normalizing constant (also known as the partition function), and γ̃ is the |T ||I||S|2 dimensional
vector obtained from stringing out the

Γτmin . . .


Γτmin



Γ̃ = 





elements of matrix Γ̃ in lexicographic order and

Γτ


...
Γτmax


Γτmin . . .
Γτmax



..
.


Γτmin Γτmin +1 

Γτmin
The components of vector g ∈ R|I||S||T | correspond to g(xist ), sorted in lexicographic order with respect
to items, individuals, and time periods.
The characterization of the sample space X ⊆ R|I||S||T | of this multidimensional random variable
P
can either include constraints on the play-list capacity i∈I xist = N , for each station s at time t,
4
5
The specification of h and g are exogenous and given by the user for each application.
Note that based on the previously introduced dynamic association between primary and secondary layers S and I,
the valued connections Gs,(s0 ,`),t,i result from the projection of the two-mode layers (individual–item connections) into
a one-mode network (individual–individual connections).
8
or allow for unrestricted choices. The latter case is assumed in the oncoming results, to avoid the
inclusion of unnecessarily tedious notation.
A micro-foundation of (2) is proposed in Appendix B, based on a decision setting where stations
select a randomized play-list policy.
Assumption 1 (The underlying measure). The underlying measure h : R −→ R in model (2) is such
that h0 (x)/h(x) is monotonically increasing and positive and h(0) ≥ 1.
Assumption 1 is verified by most of the well-known exponential random models, such as the
Poisson and the Gaussian model.
We denote with F the so-defined family of three-dimensional panels with probability measure
and note that not all values of γ lead to a well-defined probability distribution, due to the sufficient
statistic Gss0 i`t not being integrable. The domain D(F) is the set of all γs which lead to a well-defined
probability distribution, that is to say, D(F) = {γ ∈ Rq | Z(γ) < ∞}.
In the model, γ`ss0 controls the effect of choice similarities between pairs of individuals (s, s0 ) ∈
S × S – in the described application to the music industry, it captures a (possibly time-delayed)
spill-over or imitation between couples of stations. While we present here a general specification, we
give it more structure below, in §4.3.
Proposition 1. Consider the conditional distribution of xi,t | xi,t−τmin . . . xi,t−τmax , as obtained from
(1). P (xi,t | xi,t−τmin . . . xi,t−τmax ) is unimodal when h verifies Assumption 1.
Proposition 2. Consider again the conditional distribution of xi,t | xi,t−τmin . . . xi,t−τmax and let H :
R|S| −→ R|S| be defined as H(x1 , . . . , xn ) = [h0 (x1 )/h(x1 ) . . . h0 (xn )/h(xn )]T . Based on Proposition
1 and Assumption 1, we claim

mod[xi,t | xi,t−τmin . . . xi,t−τmax ] = H −1 
τX
max

Γ` g(xi,t−` ) − Z(γ)e
`=τmin
where mod[ . ] is the mode of the distribution.
Proposition 3. Let mit = mod[xi,t | xi,t−τmin . . . xi,t−τmax ] and µit = E[xi,t | xi,t−τmin . . . xi,t−τmax ].
Based on the Laplace approximation, when ψ grows large we have


√
τX
max
2ψ
2π (mit )
T
exp ψ
(mit ) Γ` g(xi,t−` )
E[xi,t | xi,t−τmin . . . xi,t−τmax ]
≈
ψ
h(mit ) Z(γ)
`=τmin
√
V[xi,t | xi,t−τmin . . . xi,t−τmax ]
≈


τX
max
2π (µit − mit )2ψ
T
exp ψ
(mit ) Γ` g(xi,t−` )
h(mit )ψ Z(γ)
`=τmin
where E[ . ] and V[ . ] are respectively the expectation and the variance operators.
max
Known inequalities can be used to show the impact of the influence structure {Γ` }τ`=τ
on the
min
diversity of choices (song variety) with respect to the mode. Based on the Gauss’s inequality (Gauss
9
1823) – using the version reported by Weisstein (2016)–, we have


τX
max
1
(32π)1/4 (µit − mit )ψ
T
exp  ψ
P (|xi,t − mit | ≥ k ) ≤
(mit ) Γ` g(xi,t−` )
2
9kh(mit )ψ/2 Z(γ)1/2
`=τmin
The immediate implication of the above inequalities is that the deviation of station play-lists
frequency from the mode decays monotonically with respect to the influence scales in the influence
max
structure {Γ` }τ`=τ
.
min
max
The sensitivity of PIDC models to the influence structure {Γ` }τ`=τ
can also be assessed, based
min
on general properties of the exponential family of distributions.
Proposition 4. Consider the conditional model (1). We claim that
∂
E [xsit | xi,t−τmin . . . xi,t−τmax ] = g(xs0 i,t−` )V [xsit | xi,t−τmin . . . xi,t−τmax ] ,
∂γ`ss0
(3)
for each i ∈ I, s ∈ S.
This supports our statement concerning the dynamic expectations in the presence of cross-section
dependencies, which result to be amplified or shrunk by the similarities between stations play-lists.
4.2
PIDC model specifications
This subsection deals with the application of (2) to different statistical settings, associated with
binary, count and continuous data. The aim is to assess the versatility of PIDC models to capture
different forms of influences and to generalize well-known statistical models.
Binary data specifications
X ∈
Consider data drawn from a multidimensional binary sample space
{0, 1}|I||S||T | .
xi,t
In this case, we define the underlying measure as h(xsit ) = 1.


max
Y
X τX
π it
| xi,t−τmin . . . xi,t−τmax ∼ Bern (π it ) , where
∝
exp 
γ`ss0 Gss0 i`t 
1 − πit
0
i∈I
(4)
s ∈S `=τmin
Note that for binary sample spaces D(F) = Rq , as long as g is a well-defined real valued function.
Note that this PIDC specification generalizes the voter model (see (Clifford and Sudbury 1973)
and (Holley and Liggett 1975) for more details). In fact, in the classic voter model, nodes randomly
pick at each stage one neighbor and adopt its choice. The binary specification of the PIDC model (4)
allows nodes to select choices which have not been selected by their neighbors. Moreover, it allows
for non-uniform preferences among neighbors and non-Markovian dependence thought the influence
max
structure {Γ` }τ`=τ
.
min
Count data specifications
Consider data, drawn from a multidimensional discrete sample space
X ∈ Z|I||S||T | . In this case, we define the underlying measure as h(xsit ) = xsit !. Proposition 5
establishes a sufficient condition of integrability.
10
Proposition 5. A sufficient conditions for D(F) = Rq is that g is bounded from above.
When no simultaneous dependencies are included (τmin > 0), individual outcomes have the following distribution:

xi,t | xi,t−τmin . . . xi,t−τmax ∼ Pois (λit ) , where λit = exp 
max
X τX
s0 ∈S

γ`ss0 g(xs0 (t−`) )
(5)
`=τmin
Thus, the influence effects result in the joint shift of the conditional mean and variance of each
item profile in each time period.
Continuous data specifications
tinuous sample space X ∈
R|I||S||T | .
Consider continuous data drawn from a multidimensional conIn this case, we define the underlying measure as h(xsit ) = 1.
We let Γ ∈ R|S|×|S| be positive definite, then we have
1
(6)
xit . . . xi,t−τ ∼ N (µ, Σ) , where µ = 0 and Σ = − Γ̃−1 .
2
An analogous multidimensional Gaussian model for economic influence between firms was proposed by Kelly et al. (2013). Similarly, when τmin > 0, we have
xi,t | xi,t−τmin . . . xi,t−τmax ∼ Exp (λit ) , where λit =
max
X τX
γ`ss0 g(xs0 (t−`) )
(7)
s0 ∈S `=τmin
and γ `s is the sth column vector of Γ` . Also in this case, constraints must be imposed to guarantee
P max
g(xs0 (t−`) ) < 0, for all i ∈ I, t ∈ T }.
Z(γ) < ∞, namely D(F) = {(γ) ∈ Rq | γ s τ`=τ
min
Note that this allows us to work with negative data, in contrast with the previous specification.
Equation (2) might be conceived as a generalization where the sample space can be both continuous
or discrete and the spill-over measurement is allowed to be arbitrarily specified, beyond the ones
verifying the conditions considered in this subsection.
4.3
Further specifications: social media and community structure
In its naive version, the PIDC models establish a conditional distribution of present individual states
(outcomes and decisions) as a function of past states, where the transaction happen through an
unknown influence structure that we wish to infer from the empirical observation. Sometimes the
pattern of influence can be assumed to be associated to a more structured setting, where individuals
belong to different groups or where some of them is more influential than others. These cases are
presented in this section.
Social media and the song life cycle
Let s∗ be a social media who is able to influence the
choices of individuals in S. The dynamic choices of s∗ over items in I are denoted as xis∗ ,t . They are
exogenous and cannot be affected by the ones of individuals in S. For every s ∈ S, the influence of
s∗ over s is included in the kernel of the PIDC model (2) as


τX
max
max
X τX
1
exp ψ
Gss0 i`t γ`ss0 Gss0 i`t + ψ
Gss∗ i`t 
P (xist | γ) ∝
(hist )ψ
0
s ∈S `=τmin
11
`=τmin
(8)
where Gss∗ i`t = xist xis∗ ,t−` . From Proposition 3, this implies that the conditional expectation is
shifted upward when the the social media starts broadcasting songs at a higher intensity. As discussed
in §1 and illustrated in the plots in Figure 2, the broadcasting pattern of songs exhibit a time window
in which their number of plays within the broadcasting industry quickly increases shortly after the
release and then decreases.
For any song i ∈ I, let t0i be the starting week when the song is launched. A possible social media
specification to account for the observed song life cycle is to define the exogenous attractiveness
trajectory of the ith song as a Gamma kernel:
(
δi0 + δi1 (t − t0i ) + δi2 log(t − t0i )
xis∗ t =
−∞
if t > t0i
otherwise
(9)
Thus, the social media dynamics is fully characterized by the tuning parameters δi0 , δi1 , and δi2 .
The community structure and the music formats
The presence of different music formats
gives rise to certain form of community structure in the network of broadcasting companies. Consider
a set of music formats KF = {Contemporary, Top-40, World music, Rock} and a set of of individual
broadcasting companies KC . Function κC : KC → KF assigns to each company a given format.
Then, for each time lag ` the influence structure Γ` can be restricted as follows:
γ`s0 r = γ`s1 r ,
for all s1 , s2 , r ∈ S, such that κC (s0 ) = κC (s1 ) 6= κC (r),
(10)
i.e., stations of the same format follow the same influence pattern with respect other formats.
Note that when this community structure is not taken into account, the dimensionality of the
parameter space of (2) is (τmax − τmin )|S|2 , which grows dramatically large when the number of
individuals (stations) increases (the dimensionality of the parameter space can exceed the one of the
sample space). The inclusion of this community structure allows reducing the dimensionality. In our
application, we partitioned the 22 broadcasting companies into 4 radio formats,so that the number of
influence effects reduces from 22 × 21 × (τmax − τmin ) to ((7 × 6) + (7 × 6) + (6 × 5) + 2) × (τmax − τmin ),
i.e., a reduction of 75%.
4.4
Estimation method
Classical inferential methods for the model parameters of (2) are encumbered by the intractability
of the normalizing constant 1/Z(γ), which makes the numerical optimization of the likelihood function very challenging. However, under very specific conditions, (2) might be reduced to well-known
probabilistic models, as discussed in §4.3. Under general conditions, different approaches exist in
the literature do deal with unknown normalizing constants: the pseudo-likelihood approach, proposed
by Besag (1975) in the context of the analysis of data with spatial dependencies; the Monte Carlo
Maximum Likelihood (MCML), introduced by Geyer (1996); or the auxiliary variable method, first described by Møller et al. (2006) in the context of Bayesian statistics. Murray et al. (2006) introduced
a computational improvement to the auxiliary variable method, resulting in a specialized MCMC
12
method, known as the exchange algorithm. All these approaches are useful and widely accepted
estimation methods in cases where the direct maximization of the likelihood is intractable.
Here we adapt the exchange algorithm to the PIDC class of models, by embedding the kernel of
the probability function (2) into a Bayesian framework (Gamerman and Lopes 2006). Let x(0) ∈ X be
the observed three-dimensional panel (in the form discussed in §4). Given a prior distribution π(γ),
(0)
(0)
(0)
(0)
we can apply the Bayes rule: P (γ | x1 . . . x|T | ) ∝ P (x1 . . . x|T | | γ) π(γ). It is well known that, if
π(γ) is uniform over the parameter space, then the mode of the posterior distribution is equal to the
maximum likelihood estimator of (2).
The exchange algorithm by Murray et al. (2006) is based on the simulation of the joint distribution
(0)
(0)
of the parameter and the sample spaces, conditioned to the observed data set x1 . . . x|T | . Specifically,
to generate the posterior parameter distribution, we use an arbitrary proposal distribution Q defined
on the same support as the prior and the posterior (which has no impact on the resulting estimation,
although it does affect the convergence time), and generate samples γ 0 and x0 , which will obey a
distribution governed by x(0) , P (x, γ | x(0) ).
Algorithm 1 MCMC method for PIDC models.
1: Initialize the posterior distribution of γ (arbitrarily)
2: repeat
3:
Draw γ 0 from Q
4:
if τmin > 0 then
8:
P (x(0) | γ 0 )π(γ 0 )Q(γ)
Accept γ with probability min 1,
P (x(0) | γ)π(γ)Q(γ 0 )
else
P (x0 | γ)P (x(0) | γ 0 )π(γ 0 )Q(γ)
Draw x0 from P (. | γ 0 ) and accept γ 0 with probability min 1,
P (x(0) | γ)P (x0 | γ 0 )π(γ)Q(γ 0 )
end if
9:
Update the posterior distribution of γ
5:
6:
7:
0
10: until Convergence
As summarized in step 7 of Algorithm 1, the double intractability is eliminated when simulating
from P (x, γ | x(0) ) by the Metropolis-Hastings method, based on a proper definition of the proposal
distribution – which is used to draw candidate points for the posterior. Note that in step 3 a new value
from the parameter space is randomly proposed, and the computation of the acceptance probability
of this candidate point depends on the whether we are able to characterize Z(γ). In the affirmative
case, such as when τmin > 0, the candidate point can be accepted based on the classical MetropolisHastings ratio in step 5. In the opposite case, we need the auxiliary variable step 7 to approximate
the unknown normalizing constant.
The chain converges to P (γ | x(0) ) in the limit. Clearly, even though we avoid calculating the
normalizing constant, this remains a computationally intensive procedure.
13
5
Empirical results
The application analyzed in this paper concerns the occurrences of songs in the play lists of broadcasting stations, described in §3. Based on the model specifications presented in §4.3, the main goal
is to estimate the dynamic attractiveness of songs (the exogenous social trend), along with the overall
max
structure of imitations, as defined in model (2). Specifically, the influence structure {Γ` }τ`=τ
and
min
the social media parameters {(δi0 , δi1 , δi2 )}i∈I is estimated using the Bayesian approach described in
§4.4.
To assess the robustness of this model specification to different occurrence of the broadcasting
process, the estimation has been carried out on two sub-groups of songs. In fact, Figure 1 revealed
that only few songs were quantitatively relevant in the size of the overall broadcasting patterns. As
noted in §3 the number of songs whose market share is at least a half of the most broadcasted song
is just 30. Based on this, we assess the robustness of the model specification, when only the first 50
songs – corresponding to 9% total market share – and the first 200 songs – corresponding to 21%
market share – are used for the estimation.
All the runs associated to the results presented here were carried out on a Dell PowerEdge R430
server with Xeon E5-2690 v3 CPUs and 128 GB of RAM, under a Window Server R2 operative
system.
5.1
Song life cycle and the influence structure
The numerical results presented hereby are taking into account both training sets represented by
the first 50 songs and the first 200 songs. The estimated song life-cycles after their launch week t0
are reported. The colored envelopes in Figures 3 and 4 shows a credible interval from the posterior
predictive distribution of the number of plays of an arbitrary song, using respectively the top-50 and
the top-200 songs as training sets.
(a) Top–5 songs.
(b) Bottom–5 songs.
Figure 3: Life cycle for songs: the black lines are the number of plays of the top 5 songs (left plot) and the bottom
5 songs (right plot), within the analyzed top 50 songs, from their release t0 . The colored envelope denotes their
expectation plus-minus twice standard deviation from the posterior predictive distribution, based on (9).
14
(a) Top–5 songs.
(b) Bottom–5 songs.
Figure 4: Life cycle for songs: the black lines are the number of plays of the top 5 songs (left plot) and the bottom
5 songs (right plot), within the analyzed top 200 songs, from their release t0 . The colored envelope denotes their
expectation plus-minus twice standard deviation from the posterior predictive distribution, based on (9)
.
The estimated social media parameters {(δi0 , δi1 , δi2 )}i∈I reflects the common life-cycle of song
across stations and allows for an accurate aggregate prediction of future dynamics based on the initial
success and propagation. These estimates provide an alternative to Martínez-de-Albéniz and Sáezde-Tejada (2014), where individual peaks are computed taking into account competing song releases.
Although the results are qualitatively similar for 50 and 200 songs, we observe a slightly increased
variability when comparing the envelope in Figure 4 (associated to a training set with the first 200
songs), with the one in Figure 3 (associated to a training set with the first 50 songs).
Using the top-50 and the top-200 songs as training sets, the network plots in Figure 5 shows the
corresponding expectations, based on the posterior distribution, for each pair of music formats (the
sizes of the depicted connections denote the expected values of γ, for all the marginal posteriors).
The size of the influence effects can be interpreted as the sensitivity of the influenced station to
increase its number of plays of a song if the influencer station played it more often than the average
in previous periods (recall that this average may be driven by the social media dynamic (9)). For
example, we can see that Top-40 radio stations are highly influenced from rock stations. When γ is
zero, the differentiation between broadcasting companies is controlled by the different effect of the
social media s∗ on the corresponding stations. Appendix C reports the estimated network plots of
pairwise influences within each of the four radio formats.
A second interpretation of these estimated influences is the consistency between both training
sets, supporting the robustness of the estimated influences and the exhaustively of the information
contained in the few most popular songs. Furthermore, we obtained a higher sparsity of the pairwise
influence pattern when the analyzed collection of song is enlarged (see also Appendix C).
15
(a) Top-50 songs training set.
(b) Top-200 songs training set.
Figure 5: Network plots of the estimated pairwise influences between radio formats.
5.2
The time reaction of influence
The problem of studying a time window in which individuals are influenced (and react to the stimulus
of others) is here taken into account from the statistical outlook of the lag-length selection in time
series (Ng and Perron 2001). This is particularly useful because it allows the model to determine
how fast imitation occurs. Specifically, one will be able to assess whether choice is more or less
simultaneous (with weight being put on low lags, e.g., ` = 1) or imitation takes its time (with more
weight put on higher lags).
The estimation of §5.1 was here carried out using τmin = 1 and τmax = 5 weeks. Table 2 reports the
frequency distribution of the estimate γ parameters at each time lag, corresponding to the numerical
results in §5.1 and §5.1.
The influences between radio formats appear to be low with respect to all the five lags. For
both training sets, the highest influence seems to appear after two and three weeks. A graphical
illustration of the dynamic influences within each music format is provided in Figures 6 and 7, for
the two respective training sets with the 50 and the 200 top songs.
This level of analysis allows selecting the optimal time lag of pairwise influences, which is crucial
to avoid overfitting in PIDC models. In fact, the high dimensionality of the parameter space of PIDC
models entails a parsimonious inclusion of lags in the model specification.
6
Influence maximization by optimal propagation
Beyond the ability to predict expected outcomes, well-defined probabilistic models which capture
pairwise influences can be helpful in choosing which station is best to diffuse a new song. As mentioned
in §2, the DeGroot learning model, the linear thresholds model and the independent cascade model
16
Training set
Distribution
Top-50 songs, Subsection §5.1
Top-200 songs, Subsection §5.1
Influence effects
Γ1
Γ2
Γ3
Γ4
Γ5
Min
0.0000
0.0000
0.0000
0.0000
0.0000
1st quartile
0.0102
0.0090
0.0099
0.0093
0.0086
Median
0.0375
0.0471
0.0580
0.0314
0.0386
Mean
0.1300
0.1245
0.1527
0.1082
0.1060
3rd quartile
0.1645
0.1725
0.2218
0.1487
0.1430
Max
1.2740
1.2902
1.4604
1.2348
1.2486
Min
0.0000
0.0000
0.0000
0.0000
0.0000
1st quartile
0.0029
0.0029
0.0027
0.0028
0.0026
Median
0.0254
0.0385
0.0455
0.0315
0.0207
Mean
0.0929
0.1264
0.1394
0.1117
0.1086
3rd quartile
0.1117
0.1681
0.1453
0.1556
0.1499
Max
1.9504
2.1971
2.2798
1.8299
2.0376
max
Table 2: Frequency distribution of the estimated influence structure {Γ` }τ`=τ
between radio formats.
min
are among the most widely studied mathematical framework for diffusion of influence taking place
over the edges from node to node. The PIDC models we described in this paper can be included in
this list, with the aim of designing a more flexible approach both as a process of diffusion and as a
mathematical framework for statistical inference.
Given a full specification of any of these influence models, an influence optimization problem
(Goldenberg et al. 2001, Olfati-Saber et al. 2007, Kirman and Vriend 2001) consists in choosing a
good initial set of nodes to target in order to maximize the future diffusion. The influence of a
set S0 of nodes, denoted σ(S0 ), might be expressed as the expected outcomes of nodes in S at the
end of the process or integrated over all the process, given that S0 is this initial active set. Influence
maximization problems are usually subject to a fixed cardinality constraint on S0 , so that the number
of initial individuals is fixed. Formally, it can be expressed as the following mathematical program:
max
S0 ⊆S
σ(S0 ), subject to |S0 | = k
(11)
For most of the influence models, it is NP-hard to determine the optimum set for influence maximization (Kempe et al. 2015). Moreover, it is not clear how to evaluate σ(S0 ) in polynomial time,
as its exact computation requires to integrate over a combinatorial set. Chen et al. 2010 and Wang
et al. 2012 proved that evaluating σ(S0 ) is generally #P-complete both for the linear threshold model
and the independent cascade model respectively. However, as shown by Kempe et al. 2015, it is
possible to obtain arbitrarily good approximations in polynomial time, by simulating the random
choices and diffusion process. In fact, for each simulated scenario κ, σκ (S0 ) is a submodular function
both for the linear threshold model and the independent cascade model. Thus, a natural greedy
hill-climbing strategy has been widely adopted to approximate the maximum influence to within a
factor of (1 − 1/e), where e is the base of the natural logarithm.
In this section we examine the influence maximization problem under the estimated PIDC model.
Since we focus on a concrete song, we drop the sub-index i in what follows, and define the influence
17
(a) Rock music.
(b) Contemporary music.
(c) World-music.
(d) Top-40 music.
Figure 6: Time series of the lagged influences within each music format, based on the top-50 songs training set.
(e) Rock music.
(f) Contemporary music.
(g) World-music.
(h) Top-40 music.
Figure 7: Time series of the lagged influences within each music format, based on the top-200 songs training set.
function as
σ(S0 ) =
X
E [xs,t0 +T | xt0 = y(S0 ), xt = 0 ∀t < t0 ]
s∈S
=
XX
wP (xs,t0 +T = w | xt0 = y(S0 ), xt = 0 ∀t < t0 )
(12)
s∈S w
=
XX
s∈S
where z =
[w, x]T
w
z
T 1
1
exp ψz Γ̃g ,
Z(γ, S0 ) hΠ (z)
and S0 has been explicitly included in the normalizing constant notation Z(γ, S0 )
to make the dependency explicit.
Proposition 6. Consider the problem of maximizing the expected number of plays at time t0 + T ,
under selecting an initial launching station at time t0 and let S ∗ be the optimal initial set. We claim
that
max σ(S0 ) ≥
S0
|S ∗ | exp (ψρ)
.
Z(γ, S ∗ )
where ||.||1 denotes the 1-norm of a matrix and ρ its spectral radius of the influence structure
Pτmax
`=τmin
Γ` .
This is equivalent to the summation, for each station s ∈ S and all time period up to T , of the maxiP
1/2
τmax
mum column sums of absolute values of
Γ
.
`=τmin `
The result is useful because it shows that the optimal solution is related to the spectral radius of
the influence matrix, and in particular the lower bound can be conceived as a centrality index in the
defined network of pairwise influences (Jackson 2010, Borgatti 2005, Bonacich 1987).
18
Let us start with the choice of station. To quantify the diffusion potential of a station s, we define
X
cs,T =
E [xs,t0 +T | xt0 = vs ] ,
s0 ∈S
where vs ∈ {0, 1}|S| is the s canonical vector with zeros everywhere except in sth position. The
measure cs,T thus assigns dynamic scores to all individuals in the network based on their ability to
generate positive externalities in the future periods.
Format
Expected plays
t0 + 1
t0 + 2
t0 + 3
t0 + 4
t0 + 5
BBC Radio 1
222.1906
196.1270
177.0395
163.3093
152.3401
Magic 105.4
219.4027
193.1600
175.4467
162.2060
152.0035
Key 103
213.6750
189.0823
172.8133
161.0081
151.3424
Galaxy 102
213.0615
188.4712
172.4151
160.5560
150.9959
Table 3: Propagation of the broadcasting decision at launch week t0 .
Table 3 reports the average values of cs,T for the most and the least influential stations for
T = 1, . . . , 5. Two important insights on diffusion properties of the starting decisions result from the
Table 3.
• The only impact of the launching company is propagate one and two weeks after the premiere and
vanishes after the third week. A more contextual interpretation is the fact that the long-run behavior
of a song is quite unsensitive to the initial marketing policy.
• BBC Radio 1 has the largest propagation effect.
• Galaxy 102 has the smallest propagation effect.
• The influence happening between stations of different formats are extremely small in size, suggesting
that the main core of imitation happen within stations of the same music formats (see Appendix C).
These insights deserve a careful interpretation. As already mentioned in §1, statistical models fail
to distinguish between causality and co-variation, so that the only evidence detected by the estimated
model is a systematic imitation of Top-40 stations (including BBC Radio 1) by Rock ones.
7
Conclusions
In this paper we developed a modeling framework for network discovery in influencer marketing that
integrates time variation of individual decisions about each item with the structural information
concerning their influences and spill-over. This model was crafted for fashion goods which experience
significant dynamic variations in popularity, and was applied to the diffusion of songs over time.
We could jointly face three aspects of statistical and decision analysis which are usually separately
treated: i) modeling dynamic choices and cross-section dependencies in multidimensional panels, ii)
discovering hidden structures of pairwise influences, iii) deducing diffusion patterns from the model
specification.
19
We discussed some of the mathematical properties of the proposed exponential random model,
along with a Bayesian estimation framework. Our general specification resulted in high dimensionality
arising from the quadratic growth of connections within the network, but a multi-level definition of
pairwise externalities allowed us to reduce the dimensionality of the parameter space.
In the music industry application, our results provide an empirical estimation of the strength of
station-to-station influences, which can be used as a score for the ability of conditioning the future
choices of other players in the market. We found evidence that there were influencing links (one station
copying another). With knowledge of the influence structure, we are able to support the decision of
music producers about which station to choose for first diffusing their products. We showed that
influence effects between broadcasting companies allows maximizing the diffusion of a song in the
first weeks after launch, but these effects fade away with time as the song reaches the entire network.
The proposed methodology is general and might be extended to many other multidimensional
panel settings. Applying the techniques developed in the paper to other environments may generate
further insights on how to incorporate influence relationships while at the same time including time
variations that apply to the entire network.
References
Amemiya, T. 1985. Advanced econometrics. Harvard university press.
Anselin, L. 1988. Spatial econometrics: methods and models, Volume 4. Springer Science & Business Media.
Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler. 2012. Estimating spillovers using panel data, with
an application to the classroom. Quantitative Economics 3 (3): 421–470.
Bailey, N., G. Kapetanios, and M. Pesaran. 2012. Exponent of cross-sectional dependence: Estimation and
inference. Working paper, SSRN.
Bass, F. M. 1969. A New Product Growth for Model Consumer Durables. Management Science 15 (5):
215–227.
Besag, J. 1975. Statistical analysis of non-lattice data. The statistician 24 (3): 179–195.
Blind, K. 2004. The economics of standards: Theory, evidence, policy. Edward Elgar, Cheltenham.
Bonacich, P. 1987. Power and centrality: A family of measures. American journal of sociology 92 (5): 1170–
1182.
Borgatti, S. P. 2005. Centrality and network flow. Social networks 27 (1): 55–71.
Brown, D., and S. Fiorella. 2013. Influence marketing: How to create, manage, and measure brand influencers
in social media marketing. Que Publishing.
Burda, M., M. Harding, and J. Hausman. 2009. Understanding choice intensity: A poisson mixture model
with logit-based random utility selective mixing. Working paper, MIT.
Castro, J., and S. Nasini. 2015. Mathematical programming approaches for classes of random network problems.
European Journal of Operational Research 245 (2): 402–414.
Chen, W., Y. Yuan, and L. Zhang. 2010. Scalable influence maximization in social networks under the linear
threshold model. In 2010 IEEE International Conference on Data Mining, 88–97. IEEE.
20
Chudik, A., M. H. Pesaran, and E. Tosetti. 2011. Weak and strong cross-section dependence and estimation
of large panels. The Econometrics Journal 14 (1): C45–C90.
Cialdini, R. B. 2009. Influence: Science and practice, Volume 4. Pearson Education Boston, MA.
Clifford, P., and A. Sudbury. 1973. A model for spatial conflict. Biometrika 60 (3): 581–588.
DeGroot, M. H. 1974. Reaching a Consensus. Journal of the American Statistical Association 69 (345):
118–121.
Desmarais, B. A., and S. J. Cranmer. 2012. Statistical mechanics of networks: Estimation and uncertainty.
Physica A: Statistical Mechanics and its Applications 391 (4): 1865–1876.
Dixit, A. K., and J. E. Stiglitz. 1977. Monopolistic competition and optimum product diversity. The American
Economic Review 67 (3): 297–308.
Gamerman, D., and H. F. Lopes. 2006. Markov chain monte carlo: stochastic simulation for bayesian inference.
CRC Press.
Gao, G., and L. Hitt. 2004. Information technology and product variety: Evidence from panel data. In ICIS
2004 Proceedings, 365 – 378.
Gauss, C.-F. 1823. Theoria combinationis observationum erroribus minimis obnoxiae.-gottingae, henricus
dieterich 1823. Henricus Dieterich.
Geyer, C. J. 1996. Estimation and optimization of functions. In Markov Chain Monte Carlo in Practice, ed.
D. S. W.R. Gilks, S. Richardson, 241–258. Chapman and Hall, London.
Goldenberg, J., B. Libai, and E. Muller. 2001. Talk of the network: A complex systems look at the underlying
process of word-of-mouth. Marketing letters 12 (3): 211–223.
Görür, D., F. Jäkel, and C. E. Rasmussen. 2006. A choice model with infinitely many latent features. In
Proceedings of the 23rd international conference on Machine learning, 361–368. ACM.
Granovetter, M. 1978. Threshold models of collective behavior. American journal of sociology:1420–1443.
Hanneke, S., W. Fu, E. P. Xing et al. 2010. Discrete temporal models of social networks. Electronic Journal
of Statistics 4:585–605.
Hardie, B. G., E. J. Johnson, and P. S. Fader. 1993. Modeling loss aversion and reference dependence effects
on brand choice. Marketing science 12 (4): 378–394.
Harvey, D. 1989. The condition of postmodernity, Volume 14. Blackwell Oxford.
Holley, R. A., and T. M. Liggett. 1975. Ergodic theorems for weakly interacting infinite systems and the voter
model. The annals of probability:643–663.
Jackson, M. O. 2010. Social and economic networks. Princeton University Press.
Jaynes, E. T. 1957. Information theory and statistical mechanics. Physical review 106 (4): 620.
Kelly, B., H. Lustig, and S. Van Nieuwerburgh. 2013. Firm volatility in granular networks. NBER Working
Paper No. 19466.
Kempe, D., J. Kleinberg, and Éva Tardos. 2015. Maximizing the Spread of Influence through a Social Network.
Theory of Computing 11 (4): 105–147.
Kermack, W. O., and A. G. McKendrick. 1927. A contribution to the mathematical theory of epidemics. In
Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, Volume
115, 700–721. The Royal Society.
21
Kirman, A. P., and N. J. Vriend. 2001. Evolving market structure: An ACE model of price dispersion and
loyalty. Journal of Economic Dynamics and Control 25 (354): 459 – 502. Agent-based Computational
Economics (ACE).
Kök, A. G., M. L. Fisher, and R. Vaidyanathan. 2009. Assortment planning: Review of literature and industry
practice. In Retail supply chain management, 99–153. Springer.
Krivitsky, P. N. 2012.
Exponential-family random graph models for valued networks.
Electron. J.
Statist. 6:1100–1128.
Kuksov, D., and J. M. Villas-Boas. 2010. When more alternatives lead to less choice. Marketing Science 29
(3): 507–524.
Lee, Y., J. A. Nelder, and Y. Pawitan. 2006. Generalized linear models with random effects: unified analysis
via h-likelihood. CRC Press.
LeSage, J., and R. K. Pace. 2009. Introduction to spatial econometrics. CRC press.
LeSage, J. P. 1997. Bayesian estimation of spatial autoregressive models. International Regional Science
Review 20 (1-2): 113–129.
Leydesdorff, L., and C. S. Wagner. 2008. International collaboration in science and the formation of a core
group. Journal of Informetrics 2 (4): 317–325.
Martínez-de-Albéniz, V., and A. Sáez-de-Tejada. 2014. Dynamic choice models for cultural choice. Working
paper, IESE Business School.
Møller, J., A. N. Pettitt, R. Reeves, and K. K. Berthelsen. 2006. An efficient Markov chain Monte Carlo
method for distributions with intractable normalising constants. Biometrika 93 (2): 451–458.
Morris, M., M. S. Handcock, and D. R. Hunter. 2008. Specification of exponential-family random graph models:
terms and computational aspects. Journal of statistical software 24 (4): 1548.
Murray, I., Z. Ghahramani, and D. J. C. MacKay. 2006. MCMC for doubly-intractable distributions. In
Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), 359–366.
AUAI Press.
Nasini, S., V. M. de Albeniz, and T. Dehdarirad. 2017. Conditionally exponential random models for individual
properties and network structures: Method and application. Social Networks 48:202 – 212.
Ng, S., and P. Perron. 2001. Lag length selection and the construction of unit root tests with good size and
power. Econometrica 69 (6): 1519–1554.
Olfati-Saber, R., A. Fax, and R. M. Murray. 2007. Consensus and cooperation in networked multi-agent
systems. Proceedings of the IEEE 95 (1): 215–233.
Robins, G., P. Pattison, Y. Kalish, and D. Lusher. 2007. An introduction to exponential random graph (p∗ )
models for social networks. Social networks 29 (2): 173–191.
Schelling, T. C. 2006. Micromotives and macrobehavior. WW Norton & Company.
Simons, G., and N. Johnson. 1971. On the convergence of binomial to Poisson distributions. The Annals of
Mathematical Statistics 42 (5): 1735–1736.
Train, K. 2009. Discrete choice methods with simulation. Cambridge university press.
Tversky, A., and I. Simonson. 1993. Context-dependent preferences. Management science 39 (10): 1179–1189.
Wang, C., W. Chen, and Y. Wang. 2012. Scalable influence maximization for independent cascade model in
large-scale social networks. Data Mining and Knowledge Discovery 25 (3): 545–576.
22
Wang, P., K. Sharpe, G. L. Robins, and P. E. Pattison. 2009. Exponential random graph (p∗ ) models for
affiliation networks. Social Networks 31 (1): 12–25.
Weisstein, E. W. 2016. Gauss’s Inequality.
Zeger, S. L., and M. R. Karim. 1991. Generalized linear models with random effects; a Gibbs sampling
approach. Journal of the American statistical association 86 (413): 79–86.
23
Appendix A: Proofs
Proposition 1
Proof. We start by noting that the unimodality of P (xi,t | xi,t−τmin . . . xi,t−τmax ) is equivalent to the
unimodality of its logarithm. So we define f : Rn −→ R as f (xi,t ) = log P (xi,t | xi,t−τmin . . . xi,t−τmax )
and H : R|S| −→ R|S| as H(x1 , . . . , xn ) = [h0 (x1 )/h(x1 ) . . . h0 (xn )/h(xn )]T . A sufficient condition for
f to be unimodal is that, for any positive ε, it verifies the following two implications:
i) if ∇f (z) < 0 then ∇f (z + ε) < 0 (componentwise);
i) if ∇f (z) > 0 then ∇f (z − ε) > 0 (componentwise).
Note that ∇f (xit ) =
Since δ =
Pτmax
`=τmin
τX
max
Γ` g(xi,t−` ) − H(xit ) − log Z(γ)e.
`=τmin
Γ` g(xi,t−` ) − Z(γ)e is a constant, based on Assumption 1, for any component
s ∈ S we can verify that
i) if δs < Hs (xi,t ) then δs < Hs (xi,t + ε);
ii) if δs > Hs (xi,t ) then δs > Hs (xi,t − ε);
where δs and Hs are the sth components of δ and H respectively.
Proposition 2
Proof. Consider the conditional distribution xi,t | xi,t−τmin . . . xi,t−τmax , as obtained from (2) and
maximize it with respect to xi,t . From Proposition 1 we define ∇f (xit ) and obtain the first order
condition for the mode of xi,t | xi,t−τmin . . . xi,t−τmax

τX
max
`=τmin


Γ` g(xi,t−` ) = 

h0 (mod(x1,it ))
h(mod(x1,it ))
..
.
h0 (mod(x1,it ))
h(mod(x1,it ))



τX
max


Γ` g(xi,t−` )
 , and mod(xit ) = H −1 

`=τmin
The second equality comes from the fact that Assumption 1 guarantee that H is invertible.
Proposition 3
Proof. Consider the Laplace method used to approximate integrals of the form
s
Z
1
2π
1
exp(f (x))dx ≈
exp (mod[f ])
00
h(z)
f (mod[f ]) h(z)
24
where mod[f ] is the maximizer of f and f 00 (mod[f ]) is the Hessian of f at point mod[f ]. In our case,


τX
max
T
f (x) = ψ 
(xit ) Γ` g(xi,t−` ) + log xit 
`=τmin
After noting that f 00 (mod[f ]) is diagonal, we replace the values corresponding to the PIDC model (1)
and defining mit = mod[xi,t | xi,t−τmin . . . xi,t−τmax ] and µit = E[xi,t | xi,t−τmin . . . xi,t−τmax ], we have


√
τX
max
2π (mit )2ψ
T
E[xi,t | xi,t−τmin . . . xi,t−τmax ]
=
exp ψ
(mit ) Γ` g(xi,t−` )
h(mit ) Z(γ)
`=τmin
√
V[xi,t | xi,t−τmin . . . xi,t−τmax ]
=


τX
max
2π (µit − mit )2ψ
T
exp ψ
(mit ) Γ` g(xi,t−` )
h(mit ) Z(γ)
`=τmin
where E[ . ] and V[ . ] are respectively the expectation and the variance operators.
Proposition 4
Proof. Given an an exponential random model on a sample space X with vector of natural parameter
and sufficient statistics γ and T respectively, let qγ (x) = exp(T (x)T γ)/h(x) and w : X → R.
Equation (4) can be deduced by differentiation:
!
X
d
∂
1
E [x] =
+
xqγ (x)
dγ
∂γ Z(γ)
x∈χ
!
=−
X
x∈χ
xqγ (x)
X
X ∂qγ (x)
x
∂γ
x∈χ
!
T (x)qγ (x)
x∈χ
1
Z(γ)2
!
+
1
Z(γ)
X
!
T (x)xqγ (x)
x∈χ
1
Z(γ)
= −E [T (x)] E [x] + E [xT (x)] = Cov [x, T (x)] .
In the case of the the conditional model (1), the sufficient statistic is T`ss0 = xs,i,t g(xs0 i,t−` ). Thus,
we claim that
∂
E [xsit | xi,t−τmin . . . xi,t−τmax ] = g(xs0 i,t−` )V [xsit | xi,t−τmin . . . xi,t−τmax ] ,
∂γ`ss0
(13)
for each i ∈ I, s ∈ S.
Proposition 5
Proof. For every item i ∈ I and every time t ∈ T , consider model (2) and the probability of the total
individual outcomes:
P
X
xsit
= yit xt−τmin , . . . , xt−τmax

!
∝ Qγ (yit ) :=
X
xs1t +...+xs|S|t =yst
s∈S
25
1
(hist )ψ
exp ψ
τX
max
`=τmin

γ`ss0 Gss0 i`t 
max
Let γmax be the maximum element among the influence structure {Γ` }τ`=τ
. Then the following
min
upper limits can be deduced:
max{ψ,1}


Qγ (yst ) ≤ 
exp ((τmax − τmin )γmax (yst g(yst )))
1
X
Y
xs1t +...+xs|S|t =yst
xsrt !



r∈S
max{ψ,1}
|S|yst
exp (γmax (yst g(yst )))
;
=
yst !
[multinomial theorem]
Since Qγ (yst ) < ∞ for any real yst , the a normalizing constant exists when Qγ (yst ) goes to zero when
yst grows large. By applying the ratio test to the convergence of the series, we find
|S|y+1 y! exp (γmax ((y + 1) g(y + 1)))
y→∞
|S|y (y + 1)! exp (γmax (y g(y)))
L = lim
|S|
exp (γmax (((y + 1) g(y + 1)) − (y g(y))))
y→∞ y
= lim
Note that a sufficient condition for Z(γ) < ∞ for all γ ∈ R is that L < 1. Convergence is guaranteed if
g is non-decreasing and bounded from above, then, for all y ≥ 0, we have (((y + 1) g(y + 1)) − (y g(y))) ≤
1.
Proposition 6
Proof. Consider the PIDC model (2) and let S0∗ be the optimal first stage solution of (11), under a
given PIDC model specification. Note that for any S0 ∈ 2|S| , verifying the capacity constraint, we
have σ(S0∗ ) ≥ σ(S0 ). Based on (12) we see that

σ(S0 ) ≥

1
T
exp ψu 
Z(γ, S0 )
τX
max
 
Γ`  u ,
`=τmin
where u is an arbitrary vector with norm |S ∗ |. Since this is true for all u, it is also true for the maxi
P
1/2
P
T
τmax
τmax
Γ
.
u,
subject
to
||u||
=
|S
|,
which
is
known
to
be
the
norm
of
mizer of u
Γ
0
`
`
`=τmin
`=τmin
Thus, we can write
 
1/2 
τ
max
1
exp (ψρ)

 X
σ(S0 ) ≥
|S ∗ | exp ψ 
Γ`   ≥
Z(γ, S0 )
Z(γ, S0 )
`=τmin
1
where ρ is the spectral radius of
P
τmax
`=τmin
Γ`
1/2
.
26
Appendix B: PIDC micro-foundation
This Appendix derives the statistical model for influence discovery (2) from a micro-founded
decision setting when broadcasting companies are assume to target pre-defined play-list policies.
Each station s ∈ S wishes to pursue an optimal play-list polity, under a limited capacity of N
maximum mutually exclusive portions (song’s plays)6 . Thus, the total number of plays is fixed and
P
must be distributed among |I| songs: i∈I xsit = N .
e ss0 `t = [G
e s,s0 ,1,`,t . . . G
e s,s0 ,|I|,`,t ]T ∈ R|I| be a measure of similarity between
At a given period t, let G
the |I| choices of station s at time t and the ones of station s0 at time period t − `.
Stations payoff over feasible play-list policies are are axiomatically derived.
Axiom 1 (Monotonicity with respect to the influence level). The optimal value of xsit increases with
e s,s0 ,i,`,t , for any other station s0 ∈ S at period t − `.
respect to the target similarity G
Axiom 2 (Pareto-efficiency). The optimal play-list policy is Pareto-efficient with respect to two objectives: (i) maximize the likelihood under a fully stochastic setting, (ii) maximizing the pre-defined
target similarities.
Axiom 3 (Myopic with respect to the future). The optimal play-list polity at time t of a given station
s ∈ S does not take into account the best response policy of any other stations s0 ∈ S/{s} (stations
are ’future outcomes taker’).
Axiom 4 (Myopic with respect to the influence structure). The optimal value of xsit must be increase s,s0 ,i,`,t , for any other station s0 ∈ S at period t − ` (stations
ing with respect to the target similarity G
are ’influence structure taker’).
Axiom 5 (Invariance to a uniform change in the influence level). When all the similarities are
multiplied by a common constant, the resulting station utility shall be proportional to that constant.
Q
In a fully stochastic setting, the probability of a particular assignment xst is N !( i xsit !)−1 |I|−N .
Thus, for each station s at time t, the class of play-lists policies verifying the aforementioned assumptions consists in the maximization of the weighted geometric mean between item targets and relative
frequency:
1−ϕ
!ϕ 
max
Y Y τY
Y N!
e ss0 i`t )γ`ss0 xsit /N 

|I|−N
(G
,
(14)
us (xst ) ∝
xsit !
0
i∈I s ∈S `=τmin
i∈I
|
{z
}|
{z
}
likelihood score
target similarity level
P
where ϕ ∈ [0, 1] is a known weight, s0 ,` γss0 ` = 1 and P (xsit ) ∝ xsit /N . In this context, the play-list
e ss0 i`t is treated as a constant with respect to current decisions (they might have been
similarity G
estimated from past observations or simply assumed by the station).
There are N |S| possible assignments when portions and items are distinguishable. When the number of items per
Q
each portion n1 . . . nM is fixed, there are N !/ j nj possible assignments.
6
27
The optimal play-list policy can be obtained based on the maximum entropy principle derivation
by Jaynes (1957). In fact, when N is sufficiently large, taking the logarithm of (14) and using the
Stirling’s approximation give rise to
argmax us (xst ) ≈ argmax − ϕ
X
xsit (log xsit − 1) + (1 − ϕ)
i∈I
XX
θsitj
j∈J i∈I
The play-list (16) is then derived from first order condition on (15):


τX
max X X
1
−
ϕ
e ss0 i`t  , with
P (xsit ) ∝ exp 
γ`ss0 log G
ϕ
0
`=τmin i∈I s ∈S
X
xsit
log Tsitj .
N
P (xsit ) = 1.
(15)
(16)
i∈I
In this decision setting, the pairwise influences effects γ`ss0 are known by the station —as the
weights of the geometric mean in (14)—, who take decisions based this pre-defined imitation pattern.
The resulting play-list policy (16) follows an exponential random model, whose complete specification
e ss0 i`t 7 .
depends on the design of G
For mathematical tractability and for the interesting statistical properties described in the next
e ss0 i`t is adopted for the general class of
subsection, a re-parametrization of the influence measure G
PIDC models.
The following re-parametrization is considered
e ss0 i`t
G
log(hist )
= exp Gss0 i`t −
,
γ`ss0

so that P (xsit ) ∝
1
(hist )ψ
exp ψ
τX
max

XX
γ`ss0 Gss0 i`t  .
`=τmin i∈I s0 ∈S
As discussed in the next subsection, when a collection of dynamic broadcasting decisions is observed (in terms of a three dimensional panel {xist | i ∈ I, s ∈ S, t ∈ T }), h and G play the respective
roles of underlaying measure and the sufficient statistics of the so defined exponential random model.
This specification allows translating the strategic model for companies decisions into a statistical
framework for the empirical estimation of γ.
7
Most well-known probability distributions belong to such family, which can take several forms when modeling highly
dimensional data.
28
Appendix C: Influence plots
(a) Rock music format, Top-50 training set.
(b) Rock music format, Top-200 training set.
Figure 8: Network plots of the estimated pairwise influences between rock music stations.
(a) Contemporary format, Top-50 training set.
(b) Contemporary format, Top-200 training set.
Figure 9: Network plots of the estimated pairwise influences between contemporary music stations.
29
(a) World music format, Top-50 training set.
(b) World music format, Top-200 training set.
Figure 10: Network plots of the estimated pairwise influences between world music stations.
(a) Top-40 music format, Top-50 training set.
(b) Top-40 music format, Top-200 training set.
Figure 11: Network plots of the estimated pairwise influences between top-40 music stations.
30