Statistical Estimation of Diffusion Network Topologies, Lecture notes of Network Design

These lecture notes introduce the concept of a diffusion network and how it can be used to represent epidemic spread and social networks. They also discuss diffusion network reconstruction and the problem statement of inferring network topology. The TENDS algorithm is introduced, which prunes candidate parent nodes and performs a greedy search for the parent node set of each node. The notes also include an analysis of the algorithm's complexity and key ideas.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

thehurts

Statistical Estimation of Diffusion Network Topologies
2020/4/15


Background

  • What is a diffusion network? ➢ A diffusion network is a directed graph that represents the diffusion relations between nodes (usually people or users).
  • What can a diffusion network represent? ➢ Epidemic spreading networks (like COVID-19) ➢ Social networks
  • How can a diffusion network be used? ➢ Prediction of the number of cases ➢ Precise quarantine
  • Diffusion network reconstruction aims at recovering these influence relations based on diffusion results observed from historical diffusion processes.

Problem Statement

➢ Diffusion process:

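[Figure: three snapshots of a diffusion process on a five-node network (A, B, C, D, E), from the initial infections to the end of the process]

(1) Start with some initially infected nodes.

(2) Each newly infected node tries to infect each of its children once, with a given probability.

(3) Repeat until there are no newly infected nodes.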

➢ Assumptions:

  • All diffusion processes are independent of each other
  • All diffusion processes follow the same network topology
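
The diffusion process and the two assumptions above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the lecture: the five-node edge list, the single shared infection probability `prob`, and the helper names `simulate_diffusion` and `collect_records` are all assumptions made for the example.

```python
import random


def simulate_diffusion(children, seeds, prob, rng):
    """Run one diffusion process and return the set of infected nodes.

    Each newly infected node gets exactly one chance to infect each of
    its still-uninfected children, succeeding with probability `prob`.
    """
    infected = set(seeds)
    frontier = list(seeds)
    while frontier:  # stop when a round produces no newly infected nodes
        newly_infected = []
        for parent in frontier:
            for child in children.get(parent, []):
                if child not in infected and rng.random() < prob:
                    infected.add(child)
                    newly_infected.append(child)
        frontier = newly_infected
    return infected


def collect_records(children, nodes, n_processes, n_seeds, prob, seed=0):
    """Simulate independent processes on the same topology.

    Returns one 0/1 row per process, with one column per node.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_processes):
        seeds = rng.sample(nodes, n_seeds)
        infected = simulate_diffusion(children, seeds, prob, rng)
        records.append([1 if v in infected else 0 for v in nodes])
    return records


# Hypothetical topology, chosen only for illustration.
children = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
nodes = ["A", "B", "C", "D", "E"]
records = collect_records(children, nodes, n_processes=4, n_seeds=1, prob=0.5)
for row in records:
    print(row)
```

Only the final infection statuses are kept, which matches the setting of the notes: no infection timestamps are recorded.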

Problem Statement

  • Given diffusion records:

We are given a set S = {S^1, ..., S^β} of infection status (0/1) results observed on a diffusion network G in β diffusion processes.

Infection Status Records

Process   A   B   C   D   E
S^1       0   1   1   1   0
S^2       1   0   1   1   0
…         …   …   …   …   …

[Figure: final infection statuses of the five-node network (A, B, C, D, E), one panel per diffusion process, up to Process #β]

TENDS Algorithm: Overview

[Flowchart]

(1) Use infection MI to prune out less likely candidate parent nodes.

(2) For each node v_i in the graph: search for parent nodes that maximize the scoring criterion g.

(3) Reach upper bound? If no (N), continue the search; if yes (Y), output the parent nodes for v_i.

  • Pruning candidate parent nodes: calculate the infection MI value for each node pair and perform K-means to select candidate parent nodes.

  • Greedy search for the parent node set F_i of each node v_i: calculate the corresponding scores, and then repeatedly expand the parent node set F_i with the highest-scoring candidate until reaching the upper bound or until no candidate parent node is left.
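
The greedy expansion step can be sketched as follows. The scoring criterion g itself is not given in this part of the notes, so the sketch plugs in a stand-in: a log-likelihood with a BIC-style penalty (`penalized_loglik`), which is an assumption and not the TENDS criterion. The early stop when no candidate improves the score is also an assumption added so that the stand-in behaves sensibly; `max_parents` plays the role of the upper bound, and nodes are referred to by column index.

```python
import math
from collections import Counter


def penalized_loglik(records, child, parents):
    """Stand-in score (an assumption, NOT the TENDS criterion g).

    Log-likelihood of the child's status given the joint status of
    `parents`, minus a BIC-style penalty per observed parent configuration.
    """
    joint = Counter()
    marginal = Counter()
    for row in records:
        key = tuple(row[p] for p in parents)
        joint[(key, row[child])] += 1
        marginal[key] += 1
    loglik = sum(c * math.log(c / marginal[key]) for (key, _), c in joint.items())
    penalty = 0.5 * math.log(len(records)) * len(marginal)
    return loglik - penalty


def greedy_parent_search(records, child, candidates, max_parents,
                         score=penalized_loglik):
    """Grow the parent set of `child` one best-scoring candidate at a time."""
    parents = []
    best = score(records, child, parents)
    remaining = [c for c in candidates if c != child]
    while remaining and len(parents) < max_parents:
        scored = [(score(records, child, parents + [c]), c) for c in remaining]
        top_score, top = max(scored)
        if top_score <= best:  # assumed early stop: no candidate helps
            break
        parents.append(top)
        remaining.remove(top)
        best = top_score
    return parents


# Toy records: column 1 always copies column 0, column 2 is unrelated.
toy = [[x0, x0, x2] for x0 in (0, 1) for x2 in (0, 1) for _ in range(10)]
print(greedy_parent_search(toy, child=1, candidates=[0, 2], max_parents=2))
```

Because each node's parent set is searched separately, the loop can be run independently for every node v_i once the candidate lists are available.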

TENDS Algorithm: Details


(1) Squeeze search space

➢ We screen out the insignificant candidate parent nodes whose infections have low correlations.

➢ We modify the original MI metric into a new version called infection MI to better measure the positive correlation:

$$
\mathrm{IMI}(X_i, X_j) = \mathrm{MI}(X_i = 1, X_j = 1) + \mathrm{MI}(X_i = 0, X_j = 0) - \left|\mathrm{MI}(X_i = 1, X_j = 0)\right| - \left|\mathrm{MI}(X_i = 0, X_j = 1)\right|
$$
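
A minimal sketch of the pruning step, under several assumptions that are not stated in the notes: infection MI combines the two agreeing pointwise MI terms positively and the two disagreeing ones negatively, each pointwise term is p(a, b) · log(p(a, b) / (p(a) p(b))) estimated from record frequencies, and the K-means step uses K = 2 on the pairwise values with the midpoint of the two centroids as threshold. The function names are hypothetical.

```python
import math


def pointwise_mi(records, i, j, a, b):
    """p(a,b) * log(p(a,b) / (p(a) p(b))) for X_i = a, X_j = b (assumed form)."""
    n = len(records)
    p_ab = sum(1 for r in records if r[i] == a and r[j] == b) / n
    if p_ab == 0.0:
        return 0.0  # convention: 0 * log 0 = 0
    p_a = sum(1 for r in records if r[i] == a) / n
    p_b = sum(1 for r in records if r[j] == b) / n
    return p_ab * math.log(p_ab / (p_a * p_b))


def infection_mi(records, i, j):
    """Agreeing statuses count positively, disagreeing ones negatively."""
    return (pointwise_mi(records, i, j, 1, 1)
            + pointwise_mi(records, i, j, 0, 0)
            - abs(pointwise_mi(records, i, j, 1, 0))
            - abs(pointwise_mi(records, i, j, 0, 1)))


def two_means_threshold(values, iters=100):
    """1-D K-means with K = 2; returns the midpoint of the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values fall in one cluster
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def candidate_parents(records, n_nodes):
    """Keep, for each node, the nodes whose infection MI exceeds the threshold."""
    imi = {(i, j): infection_mi(records, i, j)
           for i in range(n_nodes) for j in range(n_nodes) if i != j}
    tau = two_means_threshold(list(imi.values()))
    return {i: [j for j in range(n_nodes) if j != i and imi[(i, j)] > tau]
            for i in range(n_nodes)}


# Toy records: column 1 always copies column 0, column 2 is unrelated.
toy = [[x0, x0, x2] for x0 in (0, 1) for x2 in (0, 1) for _ in range(10)]
print(candidate_parents(toy, 3))
```

With this sign convention, two nodes whose statuses always disagree get a negative value, so only positively correlated pairs survive the pruning. The measure is symmetric, which is why edge directions have to come from the later scoring step.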

TENDS Algorithm: Details


(3) Upper bound on number of parent nodes

➢ From a naïve constraint on the scoring criterion,

$$
g(v_i, F_i) \ge g(v_i, \emptyset),
$$

we can derive an upper bound on the number of parent nodes for each node:

$$
|F_i| \le \log(\varphi_{F_i} + \delta_i)
$$

[Equation: the "where" clause defining the terms inside the logarithm is not legible in this copy]

TENDS Algorithm: Analysis


➢ Complexity:

  • For infection mutual information: quadratic
  • For subgraph structure scan: ~linear
  • Total: quadratic

➢ Insights and key ideas:

  • We keep finding locally optimal structures: the optimal parent nodes of each node.

  • Infection mutual information is very useful for roughly measuring the infection relations.

  • To find the directed edges, we use an asymmetric likelihood as the scoring criterion.

Experiment Settings

➢ Performance Criterion: the F-score of the inferred directed edges is used to evaluate the accuracy of the algorithms.

➢ Benchmark Algorithms:

(1) Submodularity-based approach MulTree: considers all propagation trees supported by the diffusion processes.

(2) Convex programming-based approach NetRate: a convex optimization method that finds the optimal topology and infers the edge weights as well.

(3) Infection timestamp-free approach LIFT: a non-temporal method, but it requires the diffusion sources.
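
The F-score over directed edges named as the performance criterion above can be computed as in the following sketch. It uses the standard definition (harmonic mean of precision and recall); the function name `f_score` and the example edge sets are hypothetical.

```python
def f_score(true_edges, inferred_edges):
    """F-score of inferred directed edges against the ground-truth edges.

    Edges are (source, target) pairs, so a reversed edge counts as wrong.
    """
    true_set, inferred_set = set(true_edges), set(inferred_edges)
    correct = len(true_set & inferred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(inferred_set)
    recall = correct / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one reversed edge and one spurious edge.
true_edges = [("A", "B"), ("B", "C"), ("C", "D")]
inferred_edges = [("A", "B"), ("C", "B"), ("C", "D"), ("D", "E")]
print(f_score(true_edges, inferred_edges))
```

Treating edges as ordered pairs matters here: an algorithm that recovers the right pairs of nodes but the wrong directions is penalized in both precision and recall.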

Experimental Evaluation

➢ Effect of Diffusion Network Size: we adopt five synthetic networks whose sizes n vary from 100 to 300. We simulate 150 diffusion processes on each network. In each simulation, 0.15n nodes are randomly selected as the initially infected nodes.

[Figure: (a) F-score, (b) Running Time]

Experimental Evaluation

➢ Effect of Infection MI-based Pruning Method: we test the algorithms on NetSci and DUNF with different pruning thresholds, varying from 0.4τ to 2τ. For each MI threshold, we simulate 150 diffusion processes on each network. In each diffusion process, we randomly select 0.15n nodes as the initially infected nodes.

[Figure: Effect of Infection MI-based Pruning Method on NetSci: (a) F-score, (b) Running Time]

[Figure: Effect of Infection MI-based Pruning Method on DUNF: (a) F-score, (b) Running Time]

Takeaway

➢ Contribution: we proposed a diffusion network topology reconstruction method that uses a scoring criterion and an upper bound on the number of parent nodes, with the help of a pruning method based on infection mutual information.

➢ Exact timestamps in diffusion records are hard to get and can be misleading; we do not necessarily need them.

➢ Experiments showed that our method is robust across a wide range of network settings.