Statistical Estimation of Diffusion
Network Topologies
Background
- What is a diffusion network? ➢ A diffusion network is a directed graph that represents the diffusion relations between nodes (usually people and users)
- What can a diffusion network represent? ➢ Epidemic spread network (like COVID-19) ➢ Social network
- How can a diffusion network be used? ➢ Prediction of the number of cases ➢ Precise quarantine
- Diffusion network reconstruction aims at recovering these influence relations based on diffusion results observed from historical diffusion processes.
Problem Statement
➢ Diffusion process:
[Figure: an example network with nodes A, B, C, D, E shown at three successive steps of a diffusion process, ending at END]
- Start with some initially infected nodes
- Each newly infected node tries to infect each of its children exactly once, with a given probability
- Repeat until there are no newly infected nodes
➢ Assumptions:
- All diffusion processes are independent of each other
- All diffusion processes follow the same network topology
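The diffusion process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the edge list, the single shared infection probability `prob`, and the function names `simulate_diffusion` and `status_records` are hypothetical and do not come from the slides.

```python
import random

def simulate_diffusion(children, seeds, prob, rng):
    """Run one diffusion process and return the set of infected nodes.

    children: dict mapping a node to the list of its child nodes.
    seeds:    initially infected nodes.
    prob:     infection probability, assumed identical on every edge.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:  # stop when a round produces no new infections
        newly_infected = []
        for u in frontier:  # each infected node gets exactly one round of attempts
            for v in children.get(u, ()):
                if v not in infected and rng.random() < prob:
                    infected.add(v)
                    newly_infected.append(v)
        frontier = newly_infected
    return infected

def status_records(children, nodes, n_processes, n_seeds, prob, seed=0):
    """Simulate independent processes on the same topology and keep only
    the final 0/1 infection status of every node (no timestamps)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_processes):
        seeds = rng.sample(nodes, n_seeds)
        infected = simulate_diffusion(children, seeds, prob, rng)
        records.append([1 if v in infected else 0 for v in nodes])
    return records

# Hypothetical five-node topology, used only for illustration.
example_children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
example_nodes = ["A", "B", "C", "D", "E"]
example_records = status_records(example_children, example_nodes,
                                 n_processes=4, n_seeds=1, prob=0.5)
```

Each row of `example_records` is one process, and each column is one node, which is the shape of input the reconstruction problem starts from.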
Problem Statement
Infection Status Records
Process   A  B  C  D  E
S^1       0  1  1  1  0
…         …  …  …  …  …
S^β       1  0  1  1  0
We are given: a set S = {S^1, ..., S^β} of infection status (0/1) results observed on a diffusion network G in β diffusion processes
[Figure: the example network (nodes A, B, C, D, E) as observed in each diffusion process, up to Process #β]
TENDS Algorithm: Overview
[Flowchart: use infection MI to prune out less likely candidate parent nodes → for each node v_i in the graph, search parent nodes that maximize scoring criterion g → reach upper bound? N: keep searching; Y: output parent nodes for v_i]
- Pruning candidate parent nodes: calculate the infection MI value for each node pair and perform K-means to select candidate parent nodes.
- Greedy search for the parent node set F_i of each node v_i: calculate the corresponding scores, then repeatedly expand the parent node set F_i with the highest-scoring candidate parent node sets until the upper bound is reached or no candidate parent node is left.
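The two steps above can be sketched as follows. This is a rough sketch, not the authors' implementation: the slides do not say how K-means is configured, so a two-cluster split of the one-dimensional MI values is an assumption, as are the early stop when no candidate improves the score, the toy scoring function, and all function names.

```python
def two_means_threshold(values, iters=50):
    """Split 1-D values into a low and a high cluster (two-cluster k-means)
    and return the midpoint between the two cluster centers."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to separate
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2

def prune_candidates(mi_to_node, threshold):
    """Keep only the nodes whose MI value with the target exceeds the threshold."""
    return [j for j, value in mi_to_node.items() if value > threshold]

def greedy_parent_search(node, candidates, score, max_parents):
    """Grow the parent set one candidate at a time, always taking the
    candidate that gives the highest score.

    score(node, parents) stands in for the scoring criterion g.
    Stopping when no candidate improves the score is an assumption.
    """
    parents = []
    best = score(node, parents)
    remaining = list(candidates)
    while remaining and len(parents) < max_parents:
        top_score, top = max(((score(node, parents + [c]), c) for c in remaining),
                             key=lambda pair: pair[0])
        if top_score <= best:
            break
        parents.append(top)
        remaining.remove(top)
        best = top_score
    return parents, best

def toy_score(node, parents):
    """Hypothetical stand-in for g: rewards the made-up true parents
    A and C of node D and charges a small penalty per parent."""
    true_parents = {"A", "C"}
    return sum(1 for p in parents if p in true_parents) - 0.1 * len(parents)

# Hypothetical MI values between node D and every other node.
mi_to_d = {"A": 0.9, "B": 0.1, "C": 1.0, "E": 0.0}
tau = two_means_threshold(list(mi_to_d.values()))
candidates = prune_candidates(mi_to_d, tau)
parents_of_d, score_of_d = greedy_parent_search("D", candidates, toy_score, max_parents=3)
```

Pruning first keeps the greedy loop short: the score is only evaluated for candidates that survive the MI filter.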
TENDS Algorithm: Details
(1) Squeeze search space
➢ We screen out the insignificant candidate
parent nodes whose infections have low
correlations
➢ We modify the original MI metric as a new
version called infection MI to better measure
the positive correlation:
$$ IMI(X_i, X_j) = MI(X_i = 1, X_j = 1) + MI(X_i = 0, X_j = 0) - |MI(X_i = 1, X_j = 0)| - |MI(X_i = 0, X_j = 1)| $$
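A small sketch of how an infection MI value could be estimated from 0/1 status records. It assumes that infection MI adds the pointwise MI terms of the two agreeing outcomes and subtracts the absolute pointwise MI terms of the two disagreeing outcomes, and that each pointwise term has the standard form P(a, b) log(P(a, b) / (P(a) P(b))) with probabilities estimated by relative frequencies. The function names are hypothetical.

```python
import math

def pointwise_mi(xi, xj, a, b):
    """Contribution of the outcome (X_i = a, X_j = b) to the mutual
    information, with probabilities estimated by relative frequencies."""
    n = len(xi)
    p_ab = sum(1 for s, t in zip(xi, xj) if s == a and t == b) / n
    if p_ab == 0.0:  # an outcome that never occurs contributes nothing
        return 0.0
    p_a = sum(1 for s in xi if s == a) / n
    p_b = sum(1 for t in xj if t == b) / n
    return p_ab * math.log(p_ab / (p_a * p_b))

def infection_mi(xi, xj):
    """Reward agreeing statuses (1,1) and (0,0); penalize disagreeing
    statuses (1,0) and (0,1), so only positive correlation scores high."""
    return (pointwise_mi(xi, xj, 1, 1) + pointwise_mi(xi, xj, 0, 0)
            - abs(pointwise_mi(xi, xj, 1, 0)) - abs(pointwise_mi(xi, xj, 0, 1)))

# Status of two nodes over four diffusion processes (made-up data).
always_together = infection_mi([1, 1, 0, 0], [1, 1, 0, 0])
never_together = infection_mi([1, 1, 0, 0], [0, 0, 1, 1])
unrelated = infection_mi([1, 1, 0, 0], [1, 0, 1, 0])
```

Plain MI would give the first two pairs the same value, since both are perfectly dependent. Under the assumed form, the pair that is never infected together gets a negative value instead, which is the point of measuring positive correlation only.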
TENDS Algorithm: Details
(3) Upper bound on number of parent nodes
➢ From naïve constraints on the scoring criterion:
$$ g(v_i, F_i) \ge g(v_i, \emptyset) $$
we can derive an upper bound on the number of parent nodes for each node:
$$ |F_i| \le \log(\ldots_{F_i} + \ldots_i) $$
where the two terms inside the logarithm are indexed by F_i and by i respectively [their definitions are not legible in this copy]
TENDS Algorithm: Analysis
➢ Complexity:
- For infection mutual information: quadratic (one value per node pair)
- For subgraph structure scan: ~linear
- Total: quadratic
➢ Insights and key idea:
- We keep finding locally optimal structures: for each node, find its optimal parent nodes
- Infection mutual information is very useful to
roughly measure the infection relations
- To find the directed edges, we use asymmetric
likelihood as scoring criterion
Experiment Settings
➢ Performance Criterion: the F-score of the inferred directed edges is used to evaluate the accuracy of the algorithms.
➢ Benchmark Algorithms:
(1) submodularity-based approach MulTree: considers all propagation trees supported by the diffusion processes
(2) convex programming-based approach NetRate: a convex optimization method that finds the optimal topology and infers the edge weights as well
(3) infection timestamp-free approach LIFT: a non-temporal method that nevertheless requires the diffusion sources
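For reference, the performance criterion can be computed as below. The sketch assumes the usual F1 definition (harmonic mean of precision and recall over directed edges). The function name and the example edges are made up for illustration.

```python
def f_score(true_edges, inferred_edges):
    """F-score of inferred directed edges against the ground truth.

    Edges are (parent, child) tuples, so (A, B) and (B, A) are different:
    inferring an edge in the wrong direction counts as an error.
    """
    true_edges, inferred_edges = set(true_edges), set(inferred_edges)
    hits = len(true_edges & inferred_edges)
    if hits == 0:
        return 0.0
    precision = hits / len(inferred_edges)
    recall = hits / len(true_edges)
    return 2 * precision * recall / (precision + recall)

# Made-up example: two of four inferred edges are correct, one is reversed.
truth = [("A", "B"), ("B", "C"), ("C", "D")]
inferred = [("A", "B"), ("C", "B"), ("C", "D"), ("D", "E")]
example_f = f_score(truth, inferred)
```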
Experimental Evaluation
➢ Effect of Diffusion Network Size: we adopt five synthetic networks whose sizes vary from 100 to 300. We simulate 150 diffusion processes on each network. In each simulation, 0.15n nodes are randomly selected as the initially infected nodes.
[Figure: Effect of Diffusion Network Size: (a) F-score, (b) Running Time]
Experimental Evaluation
➢ Effect of Infection MI-based Pruning Method: we test the algorithms on NetSci and DUNF with different pruning thresholds, varying from 0.4τ to 2τ. For each MI threshold, we simulate 150 diffusion processes on each network. In each diffusion process, we randomly select 0.15n nodes as the initially infected nodes.
[Figure: Effect of Infection MI-based Pruning Method on NetSci: (a) F-score, (b) Running Time]
[Figure: Effect of Infection MI-based Pruning Method on DUNF: (a) F-score, (b) Running Time]
Takeaway
➢ Contribution: we proposed a diffusion network topology reconstruction method that uses a scoring criterion and an upper bound on the number of parent nodes, with the help of a pruning method based on infection mutual information.
➢ Exact timestamps in diffusion records are hard to get and misleading; we do not necessarily need them.
➢ Experiments showed that our method is robust across a wide range of network settings.