Graph Generation with Prescribed
Feature Constraints
Xiaowei Ying Xintao Wu
Univ. of North Carolina at Charlotte
2009 SIAM Conference on Data Mining, May 1, Sparks, Nevada
Framework
• Motivation
-- Generate graphs for publishing social networks
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint
Motivation
Publishing social networks: Privacy vs. Utility
Privacy issue: anonymization is not enough
Active/passive attacks [Backstrom et al., WWW07]
Subgraph attacks [M. Hay et al., VLDB08]
K-anonymity in social networks
[B. Zhou et al., ICDE08] [K. Liu et al., SIGMOD08]
Randomization approach
Local topology is changed: reduces re-identification risk
Links are randomized: link privacy is protected
Motivation
Publishing social networks: Privacy vs. Utility
Randomization Approach
-- Pure randomization can’t preserve many topological features. [Ying SDM08]
$\lambda_1$ -- the largest eigenvalue of the adjacency matrix
$\mu_2$ -- the second smallest eigenvalue of the Laplacian matrix
h -- harmonic mean of the shortest distance
C -- transitivity
How to generate graphs preserving data utility?
Motivation
Generate graphs for testing data mining results
-- Generate a set of graph samples s.t. a feature of the samples
satisfies a specified distribution.
Framework
• Motivation
-- Generate graphs for publishing social networks
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint
Switch and Uniform Graph Generator
Uniform switch procedure [Taylor, 1981]
1. Accessibility: can access all the graphs with the given degree sequence
2. Uniformity: all such graphs have the same probability of being generated
-- Preserves the degree sequence/distribution
3. Application: empirically learning the properties of graph features given the degree sequence
[Figure: switch example on a graph with nodes 1-5]
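A minimal sketch of one switch step, assuming the usual formulation in which two edges (a,b) and (c,d) are replaced by (a,d) and (c,b). The helper names (`norm`, `degrees`, `switch_once`) and the 5-node starting graph are hypothetical and not taken from the slides. Leaving the graph unchanged on a rejected switch is an assumption made here so that the walk stays on simple graphs.

```python
import random

def norm(u, v):
    """Store each undirected edge as a sorted tuple."""
    return (u, v) if u < v else (v, u)

def degrees(edges, n):
    """Degree of every node 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def switch_once(edges, rng):
    """Propose one switch: (a,b),(c,d) -> (a,d),(c,b).

    The graph is returned unchanged when the switch would create a
    self-loop or a duplicate edge, so the walk stays on simple graphs.
    """
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:            # pick the orientation of the second edge at random
        c, d = d, c
    new1, new2 = norm(a, d), norm(c, b)
    if len({a, b, c, d}) < 4 or new1 in edges or new2 in edges:
        return edges
    return (edges - {norm(a, b), norm(c, d)}) | {new1, new2}

# Hypothetical 5-node example with degree sequence [3, 2, 2, 2, 3] (nodes 0..4).
start = frozenset({(0, 1), (0, 2), (0, 4), (1, 3), (2, 4), (3, 4)})
rng = random.Random(0)
graph = start
for _ in range(1000):
    graph = switch_once(graph, rng)

print(degrees(start, 5), degrees(graph, 5))  # the two degree lists are identical
```

Every accepted switch removes one edge and adds one edge at each of the four endpoints, which is why the degree sequence cannot change.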
Graph Generator with FRC
How to generate a graph:
1. with the given degree sequence
2. with the feature range constraint (FRC): $S(\tilde{G}) \in R = [s_l, s_u]$, where $S(G) \in R$
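One way to combine the two requirements is to run the switch walk and keep a proposed graph only when its feature value stays inside the range. The sketch below does that, using transitivity as the feature S. The acceptance rule, the function names, the 6-node starting graph, and the range [0.2, 0.5] are illustrative assumptions, not details given on the slide.

```python
import random
from itertools import combinations

def transitivity(edges, n):
    """C = 3 * (number of triangles) / (number of connected triples)."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    triples = sum(len(s) * (len(s) - 1) // 2 for s in nbrs)
    # Each triangle is counted once per centre node, i.e. three times in total.
    closed = sum(1 for s in nbrs
                 for u, v in combinations(sorted(s), 2) if v in nbrs[u])
    return closed / triples if triples else 0.0

def propose_switch(edges, rng):
    """Switch (a,b),(c,d) -> (a,d),(c,b); return the input if the result is not simple."""
    (a, b), (c, d) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:
        c, d = d, c
    e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
    if len({a, b, c, d}) < 4 or e1 in edges or e2 in edges:
        return edges
    return (edges - {(a, b), tuple(sorted((c, d)))}) | {e1, e2}

def generate_with_frc(edges, n, feature, lo, hi, steps, rng):
    """Walk over graphs with the given degrees, restricted to lo <= feature <= hi."""
    if not lo <= feature(edges, n) <= hi:
        raise ValueError("starting graph must satisfy the range constraint")
    for _ in range(steps):
        candidate = propose_switch(edges, rng)
        if lo <= feature(candidate, n) <= hi:   # reject moves that leave the range R
            edges = candidate
    return edges

# Hypothetical start graph: two triangles joined by two edges (nodes 0..5).
start = frozenset({(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (0, 5)})
result = generate_with_frc(start, 6, transitivity, 0.2, 0.5, 500, random.Random(1))
print(transitivity(start, 6), transitivity(result, 6))
```

The output graph keeps the degree sequence of the start graph, and its transitivity lies in the requested range by construction.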
Framework
• Motivation
-- Generate graphs for publishing social networks
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint
Graph Generator and Privacy Issues
Privacy risks introduced by FRC
Attackers know:
1. The released graph preserves the true degree sequence
2. The true graph has its feature S within range R
What can attackers do?
With the released graph, attackers can explore the graph space
Graph Generator and Privacy Issues
Graph space:
$\text{Space} = \{G : G \text{ has the given degree sequence and } S(G) \in R\}$
Uniformly sample the graph space:
$N$ samples: $G_1, G_2, \ldots, G_N \in \text{Space}$
Attacker's confidence on link $(i,j)$:
$P(a_{ij} = 1 \mid \text{Space}) \approx \frac{1}{N} \sum_{k=1}^{N} G_k(i,j)$
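The confidence estimate is an average of adjacency indicators over the sampled graphs. A small sketch, assuming each sample is given as a set of undirected edges; the function name `link_confidence` and the four toy samples (4-cycles on nodes 0..3, all with degree sequence [2, 2, 2, 2]) are hypothetical.

```python
from collections import Counter

def link_confidence(samples):
    """Estimate P(a_ij = 1 | Space) as the fraction of samples containing edge (i, j)."""
    counts = Counter()
    for edges in samples:
        # Normalise to sorted tuples and count each edge once per sample.
        counts.update({tuple(sorted(e)) for e in edges})
    n_samples = len(samples)
    return {pair: c / n_samples for pair, c in counts.items()}

c1 = {(0, 1), (1, 2), (2, 3), (0, 3)}
c2 = {(0, 1), (1, 3), (2, 3), (0, 2)}
c3 = {(0, 2), (1, 2), (1, 3), (0, 3)}
conf = link_confidence([c1, c1, c2, c3])
print(conf[(0, 1)], conf[(0, 2)])  # 0.75 0.5
```

Pairs that never appear in any sample are simply absent from the result, which corresponds to an estimated probability of 0.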
FRC Can Jeopardize Privacy
-- A real network example
Network of US political books (105 nodes, 441 edges)
Books about US politics sold by Amazon.com. Edges represent frequent co-purchasing of books by the same buyers. Nodes have been given colors of blue, white, or red to indicate whether they are "liberal", "neutral", or "conservative".
http://www-personal.umich.edu/~mejn/netdata/
[Feature values listed for this network: $\lambda_1$, $\mu_2$, h, C]
FRC Can Jeopardize Privacy
-- A real network example
The attacker simply takes the t node pairs with the highest probabilities as candidate links
Top candidates can seriously jeopardize privacy!
Some features jeopardize privacy, while others do not
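The attack step described above amounts to ranking node pairs by estimated probability and checking how many of the top t are real links. A sketch with made-up numbers; `top_candidates`, `precision`, the confidence values, and the true edge set are all hypothetical.

```python
def top_candidates(confidence, t):
    """Return the t node pairs with the highest estimated link probability."""
    ranked = sorted(confidence.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _ in ranked[:t]]

def precision(candidates, true_edges):
    """Fraction of candidate pairs that are real links in the true graph."""
    if not candidates:
        return 0.0
    hits = sum(1 for pair in candidates if pair in true_edges)
    return hits / len(candidates)

confidence = {(0, 1): 0.9, (0, 2): 0.2, (1, 2): 0.7, (2, 3): 0.4}
true_edges = {(0, 1), (2, 3)}

candidates = top_candidates(confidence, 2)
print(candidates, precision(candidates, true_edges))  # [(0, 1), (1, 2)] 0.5
```

A precision close to 1 for small t would mean the attacker's top guesses are almost all true links.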
[Figure: Polbook network (105 nodes, 441 edges)]
FRC Can Jeopardize Privacy
-- More real network examples
[Figure panels: Polbook network (105 nodes, 441 edges); Enron email network (151 nodes, 869 edges)]
FRC Can Jeopardize Privacy
-- A theoretical result
FRC Can Jeopardize Privacy
-- A theoretical result
Conclusion: If the FRC specifies a sub-space close to the true graph, privacy is seriously breached
[Figure: the space of graphs with the given degree sequence, showing the true graph, the feature range constraint, and a distance d]
Framework
• Motivation
-- Generate graphs for publishing social networks
-- Generate graphs for testing data mining results
• Generator with feature range constraint
-- Privacy risks introduced by feature constraints
• Generator with feature distribution constraint
Graph Generator with FDC
Feature Distribution Constraint (FDC)
Uniform generator:
• gives the natural distribution f(x) of feature S, which is highly skewed within the range
How to generate graphs such that:
• the graphs have the given degree sequence
• the feature value follows the target distribution g(x)
[Figure: natural distribution f(x) and target distribution g(x)]
Graph Generator with FDC
Based on the Metropolis-Hastings method
The acceptance ratio depends on the target distribution g(x) and the natural distribution f(x)
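The slide does not give the acceptance formula. A common choice, assumed here, applies when proposals are symmetric and the unconstrained walk is uniform over states: accept a move from feature value x to x' with probability min(1, [g(x')/f(x')] / [g(x)/f(x)]). The sketch below illustrates that rule on a toy state space (ten states on a ring) standing in for the graph space. The states, the feature, and the values of f and g are all hypothetical.

```python
import random

def feature(state):
    """Toy feature S: 1 for the two 'rare' states, 0 for the other eight."""
    return 1 if state >= 8 else 0

# Natural distribution f of the feature under a uniform walk over 10 states,
# and a hypothetical target distribution g we would like to obtain instead.
f = {0: 0.8, 1: 0.2}
g = {0: 0.5, 1: 0.5}

def metropolis_hastings(steps, rng, n_states=10):
    """Return the empirical distribution of the feature along the chain."""
    state = 0
    visits = {0: 0, 1: 0}
    for _ in range(steps):
        proposal = (state + rng.choice((-1, 1))) % n_states  # symmetric proposal
        x, x_new = feature(state), feature(proposal)
        ratio = (g[x_new] / f[x_new]) / (g[x] / f[x])
        if rng.random() < min(1.0, ratio):
            state = proposal
        visits[feature(state)] += 1
    return {x: count / steps for x, count in visits.items()}

freq = metropolis_hastings(200_000, random.Random(0))
print(freq)  # feature frequencies should be close to the target g
```

Dividing by f corrects for the fact that a uniform walk already visits feature values in proportion to f, so the remaining weight g/f reshapes the feature distribution toward g.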
Graph Generator with FDC
Evaluation
Natural distribution:
Target distribution:
Summary
Graph generator with feature range constraint
-- Attackers can sample the graph space near the true graph and breach privacy.
Graph generator with feature distribution constraint
-- Generates a set of graph samples for statistical testing
Thank You!
Questions?
Acknowledgments
This work was supported in part by U.S. National
Science Foundation IIS-0546027 and CNS-0831204.
Graph Generator and Privacy Issues
Example: graphs with degree sequence {3,2,2,2,3}.
Are nodes 1 and 5 connected?
[Figure: published graph and true graph]
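For a graph this small the question can be checked by brute force: enumerate every simple labeled graph on nodes 1..5 with degree sequence {3,2,2,2,3} and count how many contain the link (1,5). The sketch assumes the degrees are listed in node order 1 to 5; the function name `graphs_with_degrees` is hypothetical.

```python
from itertools import combinations

def graphs_with_degrees(degrees):
    """Enumerate all simple labeled graphs on nodes 1..n with the given degree sequence."""
    n = len(degrees)
    pairs = list(combinations(range(1, n + 1), 2))
    n_edges = sum(degrees) // 2
    for edges in combinations(pairs, n_edges):
        deg = [0] * (n + 1)
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        if deg[1:] == list(degrees):
            yield set(edges)

space = list(graphs_with_degrees([3, 2, 2, 2, 3]))
with_link = sum(1 for graph in space if (1, 5) in graph)
print(len(space), with_link)  # 7 graphs, 6 of them contain the link (1, 5)
```

Under uniform sampling of this space, the link (1,5) is present with probability 6/7, so knowing the degree sequence alone already makes the attacker fairly confident.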
Graph Generator with FDC
Problem with the FRC generator:
Uniform generator:
• gives the natural distribution of feature S
• the distribution is highly skewed within the range
• generates biased feature values
[Figure: natural distribution of feature S, with the range and the real-world graph marked]