

Cheat sheet for statistics for network science.
Type: Summary


Examples of common distributions:
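NORMAL: A continuous random variable X with density
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
is normally distributed, $X \sim N(\mu, \sigma^2)$, with mean $\mu$ and variance $\sigma^2$.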
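UNIFORM:
a. Discrete: $P(X = k) = \frac{1}{b-a+1}$ if $k = a, a+1, \dots, b$, and $0$ otherwise. Mean: $E[X] = \frac{a+b}{2}$.
b. Continuous: $f_X(x) = \frac{1}{b-a}$ if $a \le x \le b$, and $0$ otherwise. Mean: $E[X] = \frac{a+b}{2}$.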
HEAVY TAIL: A random variable $X$ with CCDF $\bar{F}(x) = P(X > x)$ is
heavy-tailed if the tail is not bounded by an exponential.
FAT TAIL: A random variable $X$ with CCDF $\bar{F}(x)$ is fat-tailed
if $\bar{F}(x) \sim x^{-\gamma}$ as $x \to \infty$ for some $\gamma > 0$.
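POWER LAW:
Continuous case: $p(x) = \frac{\alpha-1}{x_{\min}} \left(\frac{x}{x_{\min}}\right)^{-\alpha}$ for $x \ge x_{\min}$, with CCDF $\bar{F}(x) = \left(\frac{x}{x_{\min}}\right)^{-\alpha+1}$.
Discrete case: $p(k) \propto k^{-\alpha}$.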
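As a minimal sketch of the continuous case, assuming the standard form in which the CCDF is $(x/x_{\min})^{-\alpha+1}$ for $x \ge x_{\min}$, samples can be drawn by inverting that CCDF. The function names and the parameter values (alpha = 2.5, x_min = 1) are illustrative choices, not part of the original sheet.

```python
import random

def sample_power_law(alpha, x_min, n, seed=0):
    """Draw n samples from a continuous power law by inverse-transform sampling."""
    rng = random.Random(seed)
    # Solve u = (x / x_min) ** (-alpha + 1) for x; 1 - random() lies in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def empirical_ccdf(samples, x):
    """Fraction of samples strictly greater than x."""
    return sum(1 for s in samples if s > x) / len(samples)

samples = sample_power_law(alpha=2.5, x_min=1.0, n=20000)
theory = (10.0 / 1.0) ** (-2.5 + 1.0)  # theoretical CCDF at x = 10
print(empirical_ccdf(samples, 10.0), theory)
```

The two printed numbers should be close to each other; the empirical value fluctuates with the seed and the sample size.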
How to fit a power law:
1) Estimate the exponent $\alpha$ by Maximum Likelihood Estimation.
2) Estimate $x_{\min}$ through the Kolmogorov-Smirnov test statistic.
(a) Calculate the maximum distance, D, between the CDFs of data and
fit, (b) Take the $x_{\min}$ that minimizes the distance.
3) Goodness of fit: p-value. The p-value is the probability that a
data set of the same size that is truly drawn from the hypothesized
distribution would have a goodness of fit D worse than the observed value.
a. Create many (at least 10,000) synthetic power-law data sets
b. For each, fit a new power law and calculate the KS statistic
c. p-value = fraction of synthetic KS statistics that exceed the
KS statistic of the real data
d. If the p-value is small, the power law can be ruled out
4) Compare against alternative distributions (power law with cutoff,
lognormal, stretched exponential) using a likelihood ratio test
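The first two fitting steps can be sketched in plain Python. This assumes the continuous maximum-likelihood estimator $\hat{\alpha} = 1 + n / \sum_i \ln(x_i / x_{\min})$ and a brute-force scan over candidate $x_{\min}$ values. The helper names, the `min_tail` cutoff, and the synthetic data are assumptions made for illustration; the bootstrap p-value and the likelihood ratio test are not shown.

```python
import math
import random

def fit_alpha(tail, x_min):
    """Maximum-likelihood exponent for a continuous power law on x >= x_min."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def ks_distance(tail, x_min, alpha):
    """Maximum distance D between the empirical CDF and the fitted CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d

def fit_power_law(data, min_tail=50):
    """Scan candidate x_min values and keep the one with the smallest D."""
    best = None
    for x_min in sorted(set(data)):
        tail = [x for x in data if x >= x_min]
        if len(tail) < min_tail:  # too few points left for a stable fit
            break
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, x_min, alpha)
        if best is None or d < best[2]:
            best = (x_min, alpha, d)
    return best

# Synthetic test data: power law with alpha = 2.5 and x_min = 1 (illustrative).
rng = random.Random(1)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]
x_min_hat, alpha_hat, d_hat = fit_power_law(data)
print(x_min_hat, alpha_hat, d_hat)
```

For the p-value step, the same `fit_power_law` call would be repeated on each synthetic data set and the resulting D values compared with `d_hat`.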
Process of scientific understanding:
1) Measure
2) Infer
Statistical inference is the process of deducing
properties of
an underlying probability distribution by analysis of data.
We try to infer properties of a population by analyzing a
sample.
Statistical inference approaches fall into 2 categories:
Parametric estimation: Making an educated guess about
the value of a parameter or a value range of parameters.
Prediction: If we want to estimate something else, like
the next value $X_{n+1}$ in a sequence $X_1, \dots, X_n$.
i. State the null hypothesis $H_0$, such as $\theta = \theta_0$, and an
alternative hypothesis $H_1: \theta \neq \theta_0$. The
simplest explanation is the null hypothesis
or null model. It assumes that chance is
responsible for an observation.
ii. Statistical assumptions: independence,
confounding, normality, sample size
iii. Compute the test statistic $T$ from the data and
compare it to the reference distribution $F_T$ that
describes $T$ if $H_0$ is true
iv. If $T$ lies in the "extreme regions" of $F_T$, reject $H_0$
The p-value is the probability of obtaining a test
statistic at least as extreme as the one that was
actually observed, assuming that the null hypothesis
is true. If the p-value is lower than a predefined
significance level $\alpha$, then $H_0$ is rejected.
Otherwise, retain it, but this does not prove $H_0$!
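A small worked example of these steps, assuming a hypothetical coin-flip experiment: the null hypothesis is that the coin is fair (p0 = 0.5), the test statistic is the number of heads, and the reference distribution is binomial. "At least as extreme" is taken here to mean at least as far from the expected count, which is one common convention among several.

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """Two-sided p-value for observing k successes in n trials under H0: p = p0."""
    expected = n * p0
    observed_dev = abs(k - expected)
    # Sum the null probabilities of all outcomes at least as far from the
    # expected count as the observed one.
    return sum(
        comb(n, j) * p0**j * (1 - p0) ** (n - j)
        for j in range(n + 1)
        if abs(j - expected) >= observed_dev
    )

alpha = 0.05                      # predefined significance level
p = binomial_p_value(16, 20)      # 16 heads in 20 flips (made-up data)
print(p, "reject H0" if p < alpha else "retain H0")
```

With these made-up numbers the p-value is about 0.012, so the null hypothesis would be rejected at the 0.05 level.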
Take a null model, like ER or configuration.
Generate 1000+ realizations with parameters calibrated from
the data (here: choose p and N accordingly).
Compare network measures (degree, clustering, characteristic path
length, etc.) of the data with those of the random realizations.
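The null-model recipe above can be sketched without any graph library. The "observed" network here is a hypothetical ring lattice chosen only because its clustering is known to be high; the ER parameter p is calibrated from its edge count, and the measure compared is the average clustering coefficient. All names are illustrative.

```python
import random
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def er_graph(n, p, rng):
    """One realization of the Erdos-Renyi model G(n, p)."""
    adj = {i: set() for i in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def ring_lattice(n, k):
    """Hypothetical observed network: each node linked to k neighbours per side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

observed_graph = ring_lattice(30, 2)
n_nodes = len(observed_graph)
n_edges = sum(len(nbrs) for nbrs in observed_graph.values()) // 2
p = n_edges / (n_nodes * (n_nodes - 1) / 2)   # calibrate p from the data

observed = average_clustering(observed_graph)
rng = random.Random(0)
null_values = [average_clustering(er_graph(n_nodes, p, rng)) for _ in range(1000)]
# Empirical p-value: fraction of random graphs at least as clustered as the data.
p_value = sum(1 for c in null_values if c >= observed) / len(null_values)
print(observed, sum(null_values) / len(null_values), p_value)
```

A small empirical p-value suggests that the observed clustering is unlikely to arise by chance under the ER null model.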
Problem: If I test n true null hypotheses at level $\alpha$, then
on average I'll still falsely reject $\alpha n$ of them.
Example: 20 hypotheses to test, 0.05 level of significance.
P(at least one significant) = 1 - P(none significant) = 1 - (1 - 0.05)^{20} ≈ 0.64
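The arithmetic can be checked directly, assuming the 20 tests are independent and all null hypotheses are true. The function name is an illustrative choice.

```python
def family_wise_error_rate(n_tests, alpha):
    """Probability of at least one false rejection among independent true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests

fwer = family_wise_error_rate(20, 0.05)
expected_false_rejections = 0.05 * 20
print(round(fwer, 4), expected_false_rejections)  # 0.6415 1.0
```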
Clustering divides data into groups (clusters)
that are meaningful, useful, or both. Cluster analysis
is the study of techniques for automatically finding
classes.
Types of clusterings: Partitional (Each data object is in
exactly one subset.) vs
Hierarchical (A set of nested clusters organized as a
tree.)
vs Overlapping (Non-exclusive.) vs Fuzzy (Cluster
membership is a weight.)
K-MEANS. (Top-down)
a. Select K points as initial centroids.
b. Repeat:
i. Form K clusters by assigning each point to
its closest centroid.
ii. Recompute the centroid of each cluster.
c. Until: centroids do not change.
Advantages: easy, fast, works well with globular
clusters of same size and density
Disadvantages: need to know number of partitions,
sensitive to initial conditions, not effective under
several conditions (can get stuck in local minimum).
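The centroid-based loop above (commonly known as K-means) can be sketched as follows. Points are tuples, distances are squared Euclidean, and the optional `init` argument is a hypothetical convenience for supplying the initial centroids explicitly; none of these names come from the original sheet.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, init=None, seed=0, max_iter=100):
    """Return (centroids, clusters) after iterating assignment and update steps."""
    rng = random.Random(seed)
    # a. Select K points as initial centroids (random unless given explicitly).
    centroids = list(init) if init is not None else rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # b.i. Form K clusters by assigning each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[idx].append(p)
        # b.ii. Recompute the centroid of each cluster (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # c. Stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

blob_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
blob_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = k_means(blob_a + blob_b, 2, init=[(0, 0), (11, 11)])
print(centroids)
```

With random initialization the result depends on the seed; on symmetric data like this, some starting centroids can settle into a poor local minimum, which is the sensitivity noted in the disadvantages.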
K-MEDOID. The prototype is the medoid, which is a
data point itself.
DBSCAN.
a. Label all points as core, border or noise points.
(Select a radius epsilon. Count all points within that radius
of a point; this is the density of the point. Select a
parameter MinPts as the density threshold for core points.)
b. Eliminate noise points.
c. Put an edge between all core points that are
within epsilon of each other
d. Make each group of connected core points into
a separate cluster.
e. Assign each border point to one of the clusters
of its associated core points.
Advantages: resistant to noise, can handle clusters
of arbitrary shape and size
Disadvantages: cannot handle clusters with different
densities, can be computationally expensive
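The density-based steps above (the procedure usually called DBSCAN) can be sketched as a brute-force implementation. It assumes that a point counts itself toward its own density and that a point with at least `min_pts` points within `eps` is a core point; the function name and the label convention (-1 for noise) are illustrative.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    # a. Density of a point = number of points within eps (itself included).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighbours[i]) >= min_pts}
    labels = [-1] * n  # b. Noise points simply keep the label -1.
    cluster = 0
    # c./d. Core points within eps of each other are connected; each
    # connected group of core points becomes one cluster.
    for i in sorted(core):
        if labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            u = stack.pop()
            for v in neighbours[u]:
                if v in core and labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1
    # e. Each border point joins the cluster of one of its core neighbours.
    for i in range(n):
        if i not in core:
            for v in neighbours[i]:
                if v in core:
                    labels[i] = labels[v]
                    break
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The all-pairs neighbour search is quadratic in the number of points, which reflects the computational cost mentioned above; practical implementations use spatial indexes.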
GRAPH PARTITIONING. Divides a network into a pre-defined number of
smaller subgraphs. The cut is the set of links that
need to be removed to split a graph. To partition a
graph, we would like to minimize the cut, but also
get similarly sized sets.
Normalized cut = $\frac{\mathrm{Cut}(S,T)}{\mathrm{Vol}(S)} + \frac{\mathrm{Cut}(S,T)}{\mathrm{Vol}(T)}$
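A minimal sketch of the normalized cut for an undirected, unweighted graph given as an edge list. It assumes that Vol(S) is the sum of the degrees of the nodes in S, which is the usual definition but is not spelled out above; the example graph (two triangles joined by one link) is made up.

```python
def normalized_cut(edges, group_s):
    """Cut(S,T)/Vol(S) + Cut(S,T)/Vol(T) for the split S vs. the remaining nodes T."""
    s = set(group_s)
    # Links with exactly one endpoint in S must be removed to split the graph.
    cut = sum(1 for u, v in edges if (u in s) != (v in s))
    # Volume = sum of node degrees; every edge endpoint in S contributes 1.
    vol_s = sum((u in s) + (v in s) for u, v in edges)
    vol_t = 2 * len(edges) - vol_s
    return cut / vol_s + cut / vol_t

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
balanced = normalized_cut(edges, {0, 1, 2})   # cut = 1, volumes 7 and 7
lopsided = normalized_cut(edges, {0})         # cut = 2, volumes 2 and 12
print(balanced, lopsided)
```

The balanced split scores 2/7, far lower than the lopsided one, which shows how the volume terms penalize cutting off a tiny set.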
KERNIGHAN-LIN.
a. Divide the network into 2 arbitrary groups with
predefined size.
b. Inspect each pair of nodes between groups.
Identify the pair that results in the largest reduction
of the cut and swap them.
c. Repeat until no swap improves the cut.
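The swap procedure above can be sketched as a greedy loop. This follows the three steps as written and recomputes the cut for every candidate pair; it is a simplified illustration rather than the full Kernighan-Lin algorithm, which additionally locks swapped nodes within a pass. The function names and the example graph are assumptions.

```python
def cut_size(edges, group_a):
    """Number of links with exactly one endpoint in group_a."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))

def greedy_swap_partition(nodes, edges, group_a):
    """Swap the best node pair between the groups until no swap reduces the cut."""
    a = set(group_a)                 # a. start from an arbitrary split
    b = set(nodes) - a
    current = cut_size(edges, a)
    while True:
        best_gain, best_pair = 0, None
        # b. Inspect every pair (u in a, v in b) and keep the best swap.
        for u in sorted(a):
            for v in sorted(b):
                trial = (a - {u}) | {v}
                gain = current - cut_size(edges, trial)
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is None:        # c. stop when no swap improves
            break
        u, v = best_pair
        a.remove(u); a.add(v)
        b.remove(v); b.add(u)
        current -= best_gain
    return a, b, current

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
group_a, group_b, final_cut = greedy_swap_partition(range(6), edges, {0, 1, 3})
print(group_a, group_b, final_cut)
```

Because swaps exchange one node for one node, the predefined group sizes are preserved throughout.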
HIERARCHICAL CLUSTERING. Induces a dendrogram that records the
sequences of merges or splits
a. Compute the proximity matrix, if necessary.
b. Repeat:
i. Merge the closest two clusters (single link,
complete link, group average, centroid).
ii. Update the proximity matrix to reflect the
proximity between the new cluster and the
original clusters
c. Until: only one cluster remains.
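A minimal sketch of the merge loop above using single link (the distance between two clusters is the smallest distance between their members). For brevity it recomputes cluster proximities on every pass instead of maintaining a proximity matrix; the function name, the merge-record format, and the one-dimensional example points are illustrative.

```python
from math import dist

def single_link_clustering(points):
    """Return the merge sequence as (cluster_id_a, cluster_id_b, distance) tuples."""
    clusters = {i: [i] for i in range(len(points))}  # every point starts alone
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        # Find the closest two clusters under single link.
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge them into a new cluster; the record is one dendrogram level.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

points = [(0,), (1,), (5,), (6,), (20,)]
merges = single_link_clustering(points)
for a, b, d in merges:
    print(a, b, d)
```

Ids 0 to 4 refer to the original points and higher ids to merged clusters, so the list of merges describes the dendrogram from the leaves up.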