Download Tree and Forest Weights and Their Application to Nonuniform

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Digital Kenyon: Research, Scholarship, and Creative Exchange
Faculty Publications
Mathematics and Statistics
2-1999
Tree and Forest Weights and Their Application to
Nonuniform Random Graphs
Brian Jones
[email protected]
Follow this and additional works at: http://digital.kenyon.edu/math_pubs
Part of the Geometry and Topology Commons
Recommended Citation
Jones, B. D., Pittel, B. G. and Verducci, J. S. (1999). Tree and Forest Weights and Their Application to Nonuniform Random Graphs.
Annals of Applied Probability 9, No. 1, 197-215.
This Article is brought to you for free and open access by the Mathematics and Statistics at Digital Kenyon: Research, Scholarship, and Creative
Exchange. It has been accepted for inclusion in Faculty Publications by an authorized administrator of Digital Kenyon: Research, Scholarship, and
Creative Exchange. For more information, please contact [email protected].
Tree and Forest Weights and Their Application to Nonuniform Random Graphs
Author(s): Brian D. Jones, Boris G. Pittel and Joseph S. Verducci
Source: The Annals of Applied Probability, Vol. 9, No. 1 (Feb., 1999), pp. 197-215
Published by: Institute of Mathematical Statistics
Stable URL: http://www.jstor.org/stable/2667298
Accessed: 17-06-2015 13:14 UTC
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact [email protected].
Institute of Mathematical Statistics is collaborating with JSTOR to digitize, preserve and extend access to The Annals of
Applied Probability.
http://www.jstor.org
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
The Annals of Applied Pr-obability
1999, Vol. 9, No. 1, 197-215
TREE AND FOREST WEIGHTS AND THEIR APPLICATION
TO NONUNIFORM RANDOM GRAPHS
BY BRIAN D. JONES, BORIS G. PITTEL AND JOSEPH S. VERDUCCI
Kenyon College, Ohio State University and Ohio State University
For a complete graph K,, on n vertices with weighted edges, define the
weight of a spanning tree (more generally, spanning forest) as the product
of edge weights involved. Define the tree weight (forest weight) of Kn as
the total weight of all spanning trees (forests). The uniform edge weight
distribution is shown to maximize the tree weight, and an explicit bound
on the tree weight is formulated in terms of the overall variance of edge
weights as well as the variance of the sum of edge weights over nodes. An
application to sparse random graphs leads to a bound for the relative risk
of observing a spanning tree in a well-defined neighborhood of the uniform
distribution. An analogous result shows that, for each positive integer k,
the weight of all forests with k rooted trees is also maximized under the
uniform distribution. A key ingredient for the latter result is a formula
for the weight of forests of k rooted trees that generalizes Maxwell's rule
for spanning trees. Our formulas also enable us to show that the number
of trees in a random rooted forest is intrinsically divisible, that is, representable as a sum of n independent binary random variables ?j, with
parameter (1 + A)-1, A's being the eigenvalues of the Kirchhoff matrix.
This is directly analogous to the properties of the number of blocks in
a random set partition (Harper), of the size of the random matching set
(Godsil) and of the number of leaves in a random tree (Steele).
1. Introduction.
Various classes e of objects, such as permutations,
groups, trees and directed graphs have natural representations in terms of
matrices. Several authors [e.g., Beran (1979), Verducci (1989)] have investigated probability models on these classes in terms of exponential families
based on such matrix representations. That is, if R(7T) is a square matrix
representing object -T and 0 is a parameter matrix of the same dimensions,
then
(1)
p(
exp{tr[R(QT)'0]}
)
is the probability density (with respect to Haar measure for continuous groups
or counting measure for finite classes) of the associated family.
An area of interest is the form of the normalizing constant
(2)
C,(e) =
E
exp{tr[R(7T)'e ]}.
For example, when e is the symmetric group Sn of all permutations 7r of
[n] _ {1, .. ., n} and R(7T) is the corresponding permutation matrix, then
Received September 1997; revised March 1998.
AMS 1991 subject classifications. Primary 05C05; secondary 05C80.
Key words and phrases. Kirchhoff matrix, matrix-tree theorem, Maxwell's rule, spanning tree,
spectral graph theory, tree polynomial.
197
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
198
t/(O)
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
is the permanent of A _ [exp(Oij)]; that is,
n
s(0)
=
>1 H Ai
TESI
(i)
i=1
[see Minc (1978) for a thorough treatment of permanents]. The recently proved
van der Waerden conjecture [see van Lint and Wilson (1992), Chapter 12]
asserts that fris minimized over all doubly stochastic matrices A when all
the entries of A are equal (i.e., A = [1/n].) Extrema of qre for other classes v
is of central focus to this paper.
Of particular interest is the situation where v consists of all nn-2 spanning
trees -T on [n] (identified with the n vertices labelled 1, 2, .. ., n), and the
n x n symmetric matrix R(7T) has (i, j) and (j, i) entries 1 if the tree 7Thas
edge e = (i, j), [that is, if e c E(Q7T),
the set of all edges in the tree -i], and 0
otherwise. In this case, we can simplify notation by treating A as a symmetric
matrix (e.g., an adjacency matrix with entries indicating edge weights). Then
the n x n matrix A may be identified with its edge weights {ae, e c E(KJI,
and
tree(()
(3)
=
L H
ae
vre4 eeE(v)
= t(A)
is called the tree weight of A. The tree weight t(A) is also called the tree
polynomial [Farrell (1981)] of the complete graph Kn on n nodes with weighted
edges because it is a polynomial in the entries ae of A and the summation is
over all nn-2 spanning trees of Kn. Individual terms in (3) are called tree
products.
When A is the adjacency matrix of a simple graph G, then t(A) clearly
counts the number of spanning trees of G. In this paper, the tree weight t(A)
naturally arises with more general forms of A. For A = [?e] assigning odds Oe
to all edges e of Kn, t(A) becomes the key to calculating the probability of observing a spanning tree on n nodes when edges e c Kn, appear independently
with probabilities Pe = Oe/(l + Oe). Also, if Pe = ce/n (generating a "sparse"
random graph), then the expected number of tree components having k nodes
is easily shown to be asymptotically equal to a weighted sum of t(AS), where
S c [n], ISI = k and As = {ae: e c S x S}.
The authors conjectured [Jones (1995)] that, given the total edge weight w =
Le ae, t(A) attains its maximum at the uniform weight distribution a* = wIN,
where N = (n) is the number of edges in the edge set E(Kn). In Section 2 we
prove this conjecture and also obtain an upper bound for the ratio t(A)/t(A*)
expressed in terms of the first and second empirical moments of the vertex
weights ai- Lei ae.
We use these results in Section 3 to investigate a sparse random graph
model. Specifically, we look at the probability that the random graph is a
spanning tree, comparing it to the uniform case with Pe = c/n, c = N-1 >e Ce.
The limiting ratio of the corresponding probabilities is bounded from above
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
199
in terms of the first two empirical moments of the edge weights [ce] and the
vertex weights [ci]. The bound implies that, in a nondegenerate situation, the
probability is maximum in the uniform case for n large, provided that the
average edge weight c strictly exceeds 4 in the limit. We conclude the section
with a simple example indicating that this bound for c cannot be pushed below
2+ /2.
In Section 4 we extend our study to rooted forests. Recall that a graph is
called a forest if it is a collection of tree components, or equivalently, it has no
cycles. A tree is said to be rooted if a single node in that tree is designated as
a root, and a rooted forest is a forest in which all trees are rooted. Let v (resp.,
ek) denote the collection of all spanning rooted forests (consisting, resp., of k
trees). Let us define the corresponding forest weights by
f (A) =
H
fk(A) =E
g
ae,
vceV ec:E(v)
H
ae.
Tvck ec:E(v)
We show that, like the tree weight t(A), each of f k(A), (1 < k < n) and f (A)
attain their maximum values at the uniform case A*.
The formulas for fk(A) and f(A) are used to show that the number of
trees Xn in a random rooted forest (sampled with probability proportional
to its weight) is intrinsically divisible, that is, representable as a sum of n
independent Bernoulli random variables. More precisely, the success probability of the jth Bernoulli variable is 1/(1 + Aj), where Aj, (1 < j < n), are
the eigenvalues of the weighted Kirchhoff matrix corresponding to A. With
regard to this divisibility property, Xn is analogous to the number of sets in
the random set partition [Harper (1967)], the number of edges in the random
matching set [Heilmann and Lieb (1972), Godsil (1981)] and the number of
pendant vertices in a random tree [Steele (1987)]. We illustrate applicability
of the last result by showing that for the cube on n = 2k vertices, the random variable Xn is asymptotically Gaussian with mean and variance close
to 2k/k.
2. Bounds on tree weight.
The main results of this section are first, that
on the set of all edge-weight distributions A = [ae] of a given total weight w,
the tree weight function t(A) attains its maximum at the uniform distribution
A* = [a* = wIN], N = (n); and second, that a useful bound on t(A)/t(A*)
can be formulated in terms of the vertex weights ai = Leei ae.
THEOREM1.
maxj t(A): Yae
2.
THEOREM
=
w}
=
t(A*)
For any weight distribution A
t(A) < 1 (exaip)nexp[
nn-2
=
=
-
rae],
__n-
_"l
_
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
200
To prove the statements, first introduce the n x n Kirchhoff matrix M
[mij(A)] associated with the weight distribution A:
r-a(~J),if
-
mij = 4ai)
(4)
(4)
""'~~~'
~ai,
1
=
i#
fg+J
if i=j.
Here are two well-known properties of M.
LEMMA 1.
M(A) is nonnegative definite.
PROOF. Let Xtr= [xI,
.. .,
xnj c Rn. Then
xtrM(A)x
=
Laijx
-E
i,j
=
aijxixj
i,j
2 >aij(xi
-Xj)2
i, j
>
D.
Let Mi(A) denote the cofactor of the (i, i)th element in M(A). The following
lemma is frequently called Maxwell's rule [Maxwell (1892)].
LEMMA 2.
For each i c [n], t(A) = Mi(A).
For a proof see, for instance, Moon (1970), Theorem 5.2. The next lemma
is a well-known result about the average value of diagonal minors. See, for
example, Godsil (1993).
LEMMA3. Let K c [n] = {1, ..., n}, let M be an n x n symmetric matrix
with eigenvalues {Ai} and let MK be the determinant of the (n - k) x (n - k)
matrix obtained by deleting from M the rows and columns indexed in K. Then
Y,
L
MK=
Jc[n]: IJ=n-k
Kc[n]: IKI=k
H Ai.
jeJ
PROOFOF THEOREM1. Since 1 = [1, 1,..., i]tr is an eigenvector of M =
M(A) with eigenvalue 0, it follows from Lemma 1 that the smallest eigenvalue
An of M is 0. From Lemma 2 and Lemma 3 with k = 1,
1
n
A
t(A)=-LE
n i=l joi
l
71-
1
=1n n Ai
i=l
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
1
201
2WA
n n-1J
=
t(A*).
This last equation can be seen either by noting that the eigenvalues of M(A*)
are
=
A*=2W
n -1
n -i- '
and A* = 0, or by simply multiplying the number
common tree product
K
2W
of labelled trees by the
nn-2
A n-1
n(n - 1) J
Note. The essential parts of Theorem 1 appear in several earlier works; see
Grimmett (1976), Grone and Merris (1988). Nevertheless, Theorem 1 itself is
not stated in any previous work that we have found.
2. As in the previous proof, let A1
PROOFOFTHEOREM
> A_ = 0 be
the eigenvalues of the Kirchhoff matrix M(A). Using Lemma 3 with k = 2,
we get
E
AiAj
l<i<j<n-I
AiAi
1<i<j<n
=
E
(aiaj-a?.)
1<i<j<n
E
aiaj
l<i<j<n
aa]a-
=2[(Eai)
Next, applying the geometric-arithmetic
A,Aj, we get
(b1I )
(
2=iiUniA
Ai)
AiAj
mean inequality to
[(Li ai )2 - L
< [
(i2-1)
a?] (2)
2n1
which implies
n-IA
(5)
(ia
i n- (n
- 1(n-l)/2
-~~~~~~~~~~~~
1n n
_ -
(Li
n=
-A
(
-2
(n
/a?
Ei
v ai
,i-
ai_
1)/2
X
2
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
numbers
202
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
where the last inequality comes from 1 + x < ex. Finally, since t(A)
A )/n, (5) implies the result of the theorem. a
(FWi7-l
=
3. Application
to spanning trees in sparse random graphs.
Let
P = [Pe] be a symmetric matrix, with each entry Pe c [0, 1]. Introduce the
random graph model S(n, P) where edges appear independently and the
probability of an edge e being present equals Pe, e c E(Kn). We will let G
denote a realization of the random graph model 4. The special case in which
Pe p is referred to as the uniform case, and we denote it 4(n, p). This case
has been studied extensively since the 1950s, with works by Gilbert (1959),
Kelmans (1972), Grimmett and McDiarmid (1975), Bolloba's and Erd6s (1976)
and Stepanov (1970), to cite some examples.
A random graph model closely related to 4(n, p) is the model 4(n, m),
in which a fixed number of edges m = m(n) are distributed uniformly at
random among all N positions. For many graph properties, these two models are asymptotically equivalent as n -> oc with m = pN, provided that
p(l - p)N = o((pN)2), meaning that the variance of the number of edges
in the random graph is negligible compared to its expected value. The model
4(n, m) was the focus of Erd6s and Renyi's cornerstone works (1959, 1960),
and later Stepanov (1969) undertook a detailed study of 4(n, p). An ever expanding bibliography of random graph literature may be traced through the
texts by Bollobas (1985), Palmer (1985) and in a recent survey by Karon'ski
(1995).
Of special interest is the case when the edge probability p = c/n or m =
cn/2, both conditions meaning that the average vertex degree is asymptotic
to c. A major thrust of the Erd6s-Renyi-Stepanov studies is that the random
graph can be in one of the three phases, namely subcritical, nearcritical and
supercritical, dependent on whether c < 1, c is close to 1 or c > 1. It is when
c slightly exceeds 1 that the birth of a giant component takes place; see also
Barbour (1982), Bolloba's (1985), Pittel (1990), Janson, Knuth, Luczak and
Pittel (1993), Luczak, Pittel and Wierman (1994).
Our focus here is primarily on the nonuniform sparse model, when Pe =
ce/n, and Ce # c. With a notable exception of Stepanov (1970a, b) who studied the special case P(ij) = aiaj/n, not much is known about the behavior
of this random graph. Its analysis promises to be considerably more complicated since the classical graph-enumerating techniques can no longer be used
freely.
In this section we investigate the probability that G, a random graph from
the model 4(n, [Ce/n]), is a spanning tree. Let e and P(e) denote the set of
all spanning trees on Kn and the probability in question. The probability that
G is a given spanning tree T c e is
P(T)=
H
eeE(T)
(6)
H
Pe
(1-Pe)
e'oE(T)
Pe
I1)
_
ec:E(T)1-Pe
e
I1
(1-Pe)
e'ceE(Kn)
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
203
Clearly then,
P(e)=
P(T)
E
TEe
(7)
=
H
t(A)
(1 -Pe),
eeE(K,,)
where
[
A
In the uniform case [Pe
Po (4)
= c/n],
=
n
?_pe]
this probability simplifies to
2(c/n)n-1(1
-
c/n)(2)(n)
C(ce-c/2 )n-3
n
(2 e
Our intent is to show, in the sparse case [Pe] = [Ce/n] subject to some
mild balancing restrictions on [Ce], that P(e) also approaches 0 exponentially
fast, with the rate dominated by P0Q() in the sense that the relative risk
P(e)/IP0(e) is asymptotically bounded.
We assume that [Ce]depends on n in such a way that
(8)
limsupmaxpe
e
n-->oo
< 1,
and
lim c = btC > 0,
n-+oo
lim N-
n-+-oo
(9)
L(CeCe)2
C >0,
e
n
lim n-
(c-ic)2
vc > 0
and
i=1
lim (nN)-l
n-+-oo
C c3
=0.
e
The meaning of (9) is that the sample mean and the sample variance of
[Ce] and also the sample variance of the vertex-associated averages [ci] _
[eDi Ce/(n - 1)] all have finite limits. The generality of the conditions is in
that, aside from the mild uniform restriction (8), they are of a global nature.
In particular, we may regard the values [Ce] as being realized from a random
distribution with moment constraints given by (9).
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
204
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
THEOREM3.
Under conditions (8) and (9),
< exp j-C
1im SUp
COROLLARY1.
2/u2
For sufficiently large n, the uniform probability matrix
Pf{-} among all matrices satisfying conditions (8) and
[Pe = c-/n] maximizes
(9) with I-tc > 4.
We begin by noticing that, according to (7),
(10)
(10)
P{&}
t(A)
~~Po{f
elt=
0
Fle(l-Pe)
q
p
c/n
where
In view of (10), we need only to prove the following technical lemma.
LEMMA 4.
Under conditions (8) and (9),
lim sup
n--->o
t(A)
t(A0)
<
exp
p2
o-c2
-2--
C
-2/2
lim
n
n---coq=
exp
_4
}
.
PROOF. From Theorem 2, we have
n1L1Unm..
2
)nljnI
exM
1(trM
(12)
t(A)
where M=
[mij] is the Kirchhoff matrix for A
n
trM =
(13)
EE
=2
(Pe
2ZPe
e
=
[pe/(l
( ti=
-
2
1
Pe)] Therefore,
P
fPe
+ePl
2
p)
Pe
ee
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
205
TREE AND FOREST WEIGHTS
the last equality following from (8). However, with Pe = ce/n and (9),
= L?(Pe-P
>Z Pe
e
+ P)3
e
~~ee
e
n3 ,c
(c
c)+n3
)+(2
)n3
=o(l).
Therefore, (13) implies
trM
2EPe
(14)
=
-
2 +o(l)
+2Lp
e
e
e
)
(n -
L(Ce - c)2
1)jc-+ (nN)-
and thus
M
~~tr
=- 1
(15)
(15)
-2
c +(nN)-
+
L(Ce-)
o(n--
We now express >j m?. in terms of [Ce]. Observe that
Pe
Pe
1
eDi
1
(16)
? On
e+
-
e3D
=
since
(17)
Ci =
-
Ce
e3D
(1 + O(n-1))
+ O(n2LC2)
eDi
( Ce)/(n - 1); hence
it
Ci (1 ? O(n1))
+ O(n-2
Now choose a c (0, 1) and set 3 = 2
mean inequality we have
2c-in-2
-
?
c2) +
e
a. Applying the geometric-arithmetic
C2 = 2c-in-on-8
eDiez
n
C2
ez
eDi
(18)
2
-i
2
n-23(6
C 2)
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
206
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
and therefore
M=
+
c2(1
+
0(n-'
+
-2,
0((n
+
n-2a))
n-4)(Lc))
D
2
(19)
(
O(2a))
2
since 2a < 1 and
2,8
+
0
-2
n
)7
< 4. Then it follows that
n
it
(20)
n
ci
i=1
~
i=1~
~
2
n
( Lc2)
+o(n2
~~~~~i1
g
To bound the last term in (20), clearly
(
c2)
(21)
< (2
c2)
+ 2 252)
<
(4
<
16(L(Ce-C)2)2+-8(L(c
)2
(Ce-
[Nf1 >(C
< 4n4
-c)2)l252
-
5)2?+
+4n4cl44
4]
Q(n4),
-
the last estimate following from the second condition in (9). Combining
and (21),
_e1
e
(22)
11
l,i=
But 3-
2o3 < 0,
L(
25]n_
C j
-
Li2
-4)2
+ 0n
+
and thus by (9), the limit of (22) as
exponential term in
exp
(12)
:
C2]
n
+
O(n32:).
is
oo
c2~ +,u4.The
can now be evaluated completely:
J2(n-n-i
2)N
n-1LEi=1mz2
2
-
n-i
= exp
1 L),1
2
0(
(23)
2
[1 + O(n2)][
=
exp2
= exp{-2
1
(trM)2
-n 1
n2)2
(trnM/(n-2
2
(20)
=1
mz2/(n-1)
1))2
1
J
2
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
207
Recalling the definition of Ao and (15), we obtain
n
1
trM
n "n-1I
(24)
n
=
1
t(Ao)
N
Nj
(Ce,-
+
c2
j(cN)-
YL(Ce
2 C+
C[C+
[+ +
} +
/'j,4
o(n-l1)](
+ o(n-l)]
(1 -c/n
> O, c is asymptotically bounded away from 0,
Since cju
and the above expression converges to
o(n-')/-=
o(n-1)
(25)
- + btc} expf-/tc}
exp
= exp , -
Therefore,
2
lim supt(A)
t(A0)
(26)
2e
}
ILc2 -
holds.
Turning our attention to the asymptotic value of qn, defined in (15), we may
use condition (8) to write
logqn
Llog(1--)-(2)log
=
n
~~Ce
n
(1--)
(
22+?(n
Ce
2-
e
)no
1
~~~~
2n2~~~~~~~~
(27)
1
-
2n2
( 2 ) [(n)
C +
2
n
-
2]
+
2 + 0(n-)I
+o(1)
Q(~n-)
O-c2
4
Thus
_
lim q =exp
(28)
c}
and the lemma is proved. D
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
208
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
PROOF OF THEOREM3. Theorem 3 now follows easily from (10) and the
limiting bounds (26) and (28):
pim
lim
(29)
t(A)
i-el_
{}
sup
n-oo
=
lim
sup
t
n}
P0{
qn
t(A0)
< exp j '2(
,
4)
vc}
-
The corollary now follows since (TC2and v2 are nonnegative quantities.
Note. Having proved Theorem 3, we first thought that the uniform distribution might (asymptotically) maximize P(e) for [Ce] in a broader range of
buc,perhaps even for [uc > 1. As the example below shows, however, if [Ce]
is subject to the restrictions (9) only, then the lower bound for [uc cannot be
pushed below 2 + V2.
EXAMPLE 1. Two-Component Graph.
Let n = 2m, C = {1, 2, ..., m}, C =
{m + 1, m + 2, ... , 2m} and let cl, c2 be two positive constants. For e = (i, j),
set
C
e
| cl,
if i and j are both in the same component C or C,
if i and j are in different components.
|C2,
An easy computation shows that, besides the zero eigenvalue, the Kirchoff matrix M(A) with A = [Pe/(l-Pe)]
has an eigenvalue m(al+a2)
(of multiplicity
2m - 2), and another eigenvalue 2ma2 (of multiplicity 1). Here
1-
ai=
/n
i = 1,2.
It follows then from Lemma 2 and Lemma 3 that
t(A)
exp ( oC. 2
C2
2
+ C2/
~~~'Cl
(9) with
n
n-1C2 (C
The matrix [Ce] satisfies
Cl + C2
IC
2
,
2
C =
(Cl-C2_)2
>
c
4
So, using (10) and the formula for lim qn from Lemma 4, we obtain
P(e)
n-1C2
n-2 exp (Cl + C
2c
and
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
209
Taking the ratio and simplifying,
P(4e)
lim, po-
(30)
_C
=
C2
l2
exp
1-
C2 +
P
liIP()I-e
< exp[crc2(i
[
2l
(
-Jc
L
1
1)
-
)
-
c
4)]
where the last inequality (expected according to our theorem) follows from
xe-x < e-1. Notice that for p o-c/c small, the limiting ratio in (30) becomes
(31)
exp[
Pf
(i2
-
4,c + 2) +
O(p3)]
Thus, to guarantee that this quantity falls below 1, we need to have
btc < 2 -XN~2or
,uc > 2+X.
Thus, somewhat counterintuitively, the "bad" btc's fill the interval [2 ,
2 + V2]. We thought about a possibility to push the lower bound above 2 + VX
by partitioning [n] into r > 2 equal size blocks. Surprisingly, this didn't change
(31) and 2 + VX didn't budge! Could it be that (31) is actually true for a much
broader class of [Ce]?
Note. For any class v of graphs on [n], the random graph model S(n, P)
induces a probability distribution of the form given in (1) with the graph iT
being represented by its adjacency matrix R(IT) and the canonical parameter
matrix 0 having (i, j) = e entry Oe = log[Pf(e)/(1 - Pf(e)]. In this case, the
normalizing constant in (2) becomes Cr(O)ox PQ(e). The model derived from
these choices in (1) gives the conditional probability under S(n, P) of a graph
7Tgiven that -T E e. In the remainder of the paper, we investigate Pg(?) for
various classes ve of rooted forests. For such classes, we find a remarkable
factorization of the model that takes the form of (1), but with Oe = log Pg(e).
4. Extension to forests.
Recall some basic definitions: a tree is said to
be rooted if it contains a specific node that is identified as a root. A spanning
forest on the node set [n] is a set of disjoint trees (some of which may be
isolated nodes) whose node sets form a partition of [n].
Let K c [n] with IKI = k; let -K be the collection of spanning forests consisting of k trees, each rooted at one of the nodes in K and let ek = UKCNeK*
If a forest 7T E -ek is represented by its n x n adjacency matrix R(7T), then,
analogously to the tree weight, we define the k-forest weight fk of A = [ae] by
(32)
fk(A)=
L
7E ek
H ae
eeE((r)
Note that when k = 1, we have f (A) = nt(A), where
because a single tree on [n] may be rooted at any of its n
Given K c [n], let fK(A) denote the total weight of
each tree having exactly one vertex from K. [So f [n](A) =
Then we may formulate the following theorem.
the factor n arises
nodes.
all forests of trees,
t(A), in particular.]
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
210
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
THEOREM4.
fK=MK.
Consequently,
L
fk(A)=
MK(A),
KC[n]: KI=k
where, we recall, MK(A) denotes the determinant of the submatrix of M(A)
with rows and columns indexed by the nodes from KC.
Note. For a_--1, the second relation can be read out from Biggs (1974),
Chapter 6, Theorem 7.5. For the general case, Moon (1970) attributes the first
relation (with IKI = 2) to Percival (1953). Our proof combines elements from
Biggs and Moon.
PROOF. Let I be the n x N incidence matrix of Kn. Its rows are indexed by
the nodes and the columns by the edges. A column labelled e = (i, j) has two
nonzero entries, Iie = -IjeS both of unit absolute value. It is well known that
I is unimodular; that is, all nonsingular square submatrices of I have their
determinants equal to 4?1. Introduce a reduced incidence matrix IK obtained
from I by deleting the rows labelled by the nodes from K.
LEMMA 5. Let B be a square submatrix of order n-k
Then B is nonsingular if and only if:
Of IK, where k = IK.
(i) the edges corresponding to the (n-k) columns of B determine a spanning
subforest of Kn that consists of k trees; and
(ii) each of the k trees contains exactly one node from K.
For a proof, see Biggs (1974), Chapter 2, Lemma 7.4.
Let D be a diagonal N x N matrix, with Dee = aeSand define JK
Then, applying the Binet-Cauchy theorem,
det(IKDI
K)
=
det(JK
=
IKD1.
Jt)
{det
J
}2;
here the sum is over all (n - k) x (n - k) submatrices g of JK. Each such X
determines the corresponding submatrix B of IK, so that
det2=
[
ae]det 2B,
where the product is over all the edges e E EB that correspond to the (n - k)
columns of B. Now, by Lemma 5, Idet BI = 1 or 0, dependent upon whether
the corresponding (n - k) edges are the edges of a forest of k trees each having
exactly one vertex from K. We conclude that
det(IKDI
)=
fK(A).
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
211
Finally, for i E KC,
L
(IKDI K)ii=
(IK)ieae(Ie)ei
eeE(Kn)
L
-
(34)
[(IK)ie]2 ae
eczE(K,J
ae
=-L
eczE(K,): eDi
=Mii,
and for j E KC, j
#
i,
(IKDI
L
)ij=
(IK)ieae(IK)ej
eeE(K,J)
(35)
,
-
(IK)ie(IK)jeae
eeE(K,,)
=-a(i,j.
Therefore,
det(IKDItr)
det[mij]i, jeKC = MK. *D
=
The following is a generalization of Theorem 1 to the forest weight.
THEOREM5.
max{fk(A):
Le ae = W} = f k(A*) = (w/1N)n-kC()knn-k-.
PROOF. Recall that
fk(A)=
L MK(A)
KcN: IKI=k
=
E
Aj,
,
JcN: lJI=n-k jEJ
where the A's are the eigenvalues of the Kirchhoff matrix for the weighted
matrix A. But f k(A) is a Schur concave function
adjacency
(recall
A77 =
H?UiA
JcN: IJI=n-k jEJ
(36)
on {A1, ...,
A1z_1}
0) [see, e.g., Marshall and Olkin (1979), page 61)]. Hence
<
=(
n
n-1
I)
)(tr
n-k n
W(
k(n)
'=1 A
(M) )7
n-
n-k-
The inequality on line one becomes the equality in the uniform case A*
[w/N].
D
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
=
212
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
Let f(A) denote the total weight of all forests of rooted trees, that is,
n
f(A)=
Z fk(A).
k=1
COROLLARY
2. max{f(A):
e ae = w} =
f(A*) = (1 + wn/N)7z'.
For the general edge-weight distribution, summing both sides of the equation
L
fk(A)=
HAi
JcN: IJI=n-k jeJ
from k = 1 to n, we get
n-i
f(A)
=
fl>ni(1 + Aj) =
(37)
H (1 + Aj)
~~~~~~~~~j=1
= det(I,, + M(A)).
Note. Chung and Langlands (1996) [see also Chung (1997)] proved (for unit
edge weights) that the number of rooted forests spanning an induced subgraph equals the product of its vertex degrees and the Dirichlet eigenvalues
of the corresponding Laplacian operator. With the vertex degrees replaced by
the vertex weights, the corresponding formula would certainly hold for the
weighted case, too. If we consider Kn as a subgraph of Kn+l, with each of the
extra n edges being of unit weight, then (37) is a consequence of that general
identity.
5. On the intrinsic divisibilty of the number of rooted trees in a
random forest.
Formula (37) leads to a surprisingly concise description of
the distribution of the number of tree components in a random spanning forest
generated from model (1). That is, suppose we pick a forest of rooted trees
with probability proportional to its weight. Then Pk, the probability that the
random forest consists of k (rooted) trees is given by
Pk
E_J:
-
IJ=n-k fIjeJ Ay
f1n=1(1 + Aj)
Consequently, the probability generating function (p.g.f.) of X, the number of
trees in the forest, is
Lk=1 xk LJ: J =n-k lIjEJ Aj
Aj
_ij(+l=( + Aj)
I1=(x?
n
1=
A
(1 + Ax)
Ai~~~
IJ
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
213
TREE AND FOREST WEIGHTS
This formula means that X has the same distribution as the sum of n independent, binary random variables ? j, with
Pr(Ej = 0) = 1- Pr(8j = 1)
=
Ai
Thus the number of trees in the random rooted forest possesses what we would
like to call an intrinsic divisibility property. Among previously discovered random variables with this property are the number of blocks in a random set
partition [Harper (1967)], the number of edges in the random matching set
[Heilmann and Lieb (1972), Godsil (1981)], the number of edges common for
a fixed edge set and the random spanning tree [Godsil (1984)] and the number of leaves in a random tree [Steele (1987)]. The distribution of the latter
turned out to be intimately connected to that of the number of blocks in Harper
(1967). Jeff Kahn (1997) has pointed out to us that our result could also have
been obtained from that of Godsil (1984), via the embedding of Kn into Kn+l
For a highly readable survey of the area, the reader is referred to Pitman
(1997). In it probabilistic methods are used to get strong bounds on the coefficients of polynomials whose roots are all real. In our case where the roots are
non-negative, it is well-known that the coefficient sequence is unimodal.
Whenever intrinsic divisibility is established, it quickly leads to various
results, including the asymptotic behavior of the distribution in question; see
Harper (1967), Godsil (1981) and Kahn (1996), for instance. In our case, since
n
Var X
=
Pr(cj = O)Pr(8j = 1)
E
j=1
n
A
J (1 + A.)2'
we obtain: if the matrix A changes with n
n
A
then (X
E[X])/Var112
variable [Durrett (1991)].
-
oo in such a way that
J=(1+ Aj)2
X converges, in distribution, to the standard normal
EXAMPLE2. Consider the d-dimensional cube Qd. Assume that all its
d2d-1 edges are of unit weight. The nonzero eigenvalues of the Kirchhof
matrix in this case are 2m of multiplicity (id), (1 < m < d) [see Chung
(1997)]. Let Xd be the number of trees in a random rooted forest which spans
Qd. Then, using standard tricks, we obtain
2d-1
EXd=
1
E
1+A
j=1 l+A
=n=1
(
m)
1+2m
2d
(38)
= H(1 + 0(d-1);
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
214
B. D. JONES, B. G. PITTEL AND J. S. VERDUCCI
A
2d-1
Var Xd = Y
d
m=l
-
2d
1?
td
2m
m J(I + 2m)2
(1 + O(d-1)).
Thus we can claim that (Xd - EXd)Var-112 Xd converges, in distribution, to
the standard normal variable. As a weak consequence of this result we can
claim that almost all trees in the random spanning forest are of order d, the
dimension of the cube.
Although it might seem plausible that the number of trees in a random
forest of unrooted trees is also representable as a sum of binary random variables, this is easily seen not to be the case, since for n = 3 the corresponding
p.g.f. is (x3 + 3x2 + 3x)/7, which has complex roots.
The second author expresses his gratitude to Mark
Acknowledgments.
Jerrum, Jeff Kahn and David Wilson for stimulating discussions on random
trees and forests during his visit to Dimacs Center (Rutgers University) in the
spring of 1997. All three of us are grateful to a referee for a careful reading
of the paper and many thoughtful suggestions.
REFERENCES
BARBOUR,
A. (1982). Poisson convergence and random graphs. Math. Proc. Cambridge Philos. Soc.
92 349-359.
BERAN,R. (1979). Exponential models for directional data. Ann. Statist. 7 1162-1178.
BIGGS,N. (1974). Algebraic Graph Theory. Cambridge Univ. Press.
BOLLOBAS,
B. (1985). Random Graphs. Academic Press, New York.
BOLLOBAS,B. and ERD6s, P. (1976). Cliques in random graphs. Math Proc. Cambridge Philos.
Soc. 80 419-427.
CHUNG,F. (1997). Spectral Graph Theory. Amer. Math. Soc., Providence, RI.
CHUNG,F. R. K. and LANGLANDS,
R. P. (1996). A combinatorial Laplacian with vertex weights.
J Combin. Theory Ser. A 75 316-327.
DURRETT,R. (1991). Probability: Theory and Examples. Wadsworth & Brooks/Cole, Belmont, CA.
ERD6S, P. and RENYI,A. (1959). On random graphs I. Publ. Math. Debrecen 6 290-297.
ERD6S, P. and RENYI,A. (1960). On the evolution of random graphs. Publ. Math. Inst. Hungarian
Acad. Sci. 5 17-61.
FARRELL,E. J. (1981). Tree polynomials of complete graphs and complete bipartite graphs. Pure
Appl. Math. Sci. 13 7-13.
GILBERT,E. N. (1959). Random graphs. Ann. Math. Statist. 30 1141-1144.
GODSIL,C. D. (1981). Matching behaviour is asymptotically normal. Combinatorica 1 369-376.
GODSIL,C. D. (1984). Real graph polynomials. In Progress in Graph Theory 281-293. Academic
Press, Toronto.
GODSIL,C. D. (1993). Algebraic Combinatorics. Chapman & Hall, New York.
G. R. (1976). An upper bound for the number of spanning trees of a graph. Discrete
GRIMMETT,
Math. 16 323-324.
G. R. and McDLARMID, C. J. H. (1975). On colouring random graphs. Math Proc. CamGRIMMETT,
bridge Philos. Soc. 77 313-324.
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions
TREE AND FOREST WEIGHTS
215
GRONE, R. and MERRIS, R. (1988). A bound for the complexity of a simple graph. Discrete Math.
69 97-99.
HARPER, L. H. (1967). Sterling behavior is asymptotically normal. Ann. Math. Statist. 38 410-414.
HEILMANN, 0. J. and LIEB, E. H. (1972). Theory of monomer-dimer systems. Comm. Math. Phys.
25 190-232.
JANSON, S., KNUTH, D. E., LUCZAK,T. and PITTEL, B. (1993). The birth of a giant component.
Random Structures Algorithms 4 233-358.
JONES, B. D. (1995). Tree components in random graph processes with nonuniform edge proba-
bilities. Ph.D. dissertation, Dept. Statistics, Ohio State Univ.
KAHN, J. (1996). A normal law for matchings. Dept. Mathematics, Rutgers Univ. Preprint.
KAHN, J. (1997). Personal communication.
KARONSKI, M. (1995). Random graphs. In Handbook of Combinatorics I 351-380. MIT Press.
KELMANS, A. K. (1972). Asymptotic formulas for the probability of k-connectedness of random
graphs. Theory Probab. Appl. 17 243-254.
LuczAK,T., PITTEL, B. and WIERMAN, J. C. (1994). The structure of a random graph at the point
of phase transition. Trans. Amer. Math. Soc. 341 721-748.
MARSHALL, A. W. and OLKIN, I. (1979). Inequalities: Theory of Majorization and Its Applications.
Academic Press, New York.
MAXWELL, J. C. (1892). A Treatise on Electricity and Magnetism I, 3rd ed. Clarendon Press,
Oxford.
MINC, H. (1978). Permanents. Encyclopedia of Mathematics and Its Applications 6. AddisonWesley, Reading, MA.
MOON, J. W. (1970). Counting Labelled Trees. Clowes, London.
PALMER, E. M. (1985). Graphical Evolution. Wiley, New York.
PERCIVAL, W. S. (1953). The solution of passive electrical networks by means of mathematical
trees. Proceedings of the Institute of Electrical Engineers 100 143-150.
PITMAN, J. (1997). Probabilistic bounds on the coefficients of polynomials with only real zeros.
J Combin. Theory Ser. A 77 279-303.
PITTEL, B. (1990). On tree census and the giant component in sparse random graphs. In Random
Structures and Algorithms 1 311-342. Wiley, New York.
STEELE J. M. (1987). Gibbs measures on combinatorial objects and the central limit theorem for
an exponential family of random trees. Probab. Engrg. Inform. Sci. 1 47-59.
STEPANOV, V. E. (1969). Combinatorial algebra and random graphs. Theory Probab. Appl. 14
373-399.
STEPANOV, V. E. (1970a). On the probability of connectedness of a random graph G,,(t). Theory
Probab. Appl. 15 55-56.
STEPANOV, V. E. (1970b). Phase transitions in random graphs. Theory Probab. Appl. 15 187-203.
VAN LINT, J. H. and WILSON, R. M. (1992). A Course in Combinatorics. Cambridge Univ. Press.
VERDUCCI, J. S. (1989). Minimum majorization decomposition. In Contributions to Probability and
Statistics (L. J. Gleser, M. D. Perlman, S. J. Press and A. R. Sampson, eds.) 160-173.
Springer, New York.
B. G. PITTEL
B. D. JONES
DEPARTMENTOF MATHEMATICS
KENYON COLLEGE
ASCENSION HALL
GAMBIER, OHIO 43022
DEPARTMENT OF MATHEMATICS
OHIO STATE UNIVERSITY
MATHEMATICSTOWER
231 W. EIGHTEENTH AVENUE
COLUMBUS, OHIO 43210
J. S. VERDUCCI
DEPARTMENT OF STATISTICS
OHIO STATE UNIVERSITY
COCKINS HALL
1958 NEIL AVENUE
COLUMBUS, OHIO 43210
This content downloaded from 138.28.20.235 on Wed, 17 Jun 2015 13:14:53 UTC
All use subject to JSTOR Terms and Conditions