






An introduction to the theory and techniques of univariate and multivariate data analysis, including the concepts of continuous and discrete random variables, random samples, parametric estimation, and multivariate analyses such as PCA and cluster analysis. It also presents several analysis techniques, such as the assessment of clustering tendency, and several probability distributions, such as the normal, gamma and Poisson.
Type: Course summary







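Continuous random variable with density $f(x)$:
$f(x) \ge 0$ on the whole real line
$P(a \le X \le b) = \int_a^b f(x)\,dx$: the integral (area under the curve over the interval) from a to b of f(x) dx, for every a and b belonging to the whole real line
Total area equal to 1 -> $\int_{-\infty}^{+\infty} f(x)\,dx = 1$: the integral from minus infinity to plus infinity of f(x) dx (area under the curve)
Probability of a single point equal to zero -> $P(X = x_0) = 0$
Discrete random variable with probability function $p(x)$:
$p(x) \ge 0$ for every point x in the support $S_x$
$\sum_{x \in S_x} p(x) = 1$: the sum of the probabilities over all x belonging to the support is equal to 1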
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$: the integral from minus infinity to x of the density
$0 \le F(x) \le 1$
If $x_0 < x_1$ then $F(x_0) \le F(x_1)$
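Continuous: $E(X) = \int_{-\infty}^{+\infty} x\,f(x)\,dx$: the integral from minus infinity to plus infinity of x f(x) dx
Discrete: $E(X) = \sum_{x \in S_x} x\,p(x)$: the sum of x p(x) over the support
Variance (squared distance from the mean), continuous: $\operatorname{Var}(X) = \int_{-\infty}^{+\infty} (x - E(X))^2 f(x)\,dx$
Variance, discrete: $\operatorname{Var}(X) = \sum_{x \in S_x} (x - E(X))^2\,p(x)$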
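The definitions of expectation and variance can be checked numerically. The sketch below is only an illustration: the fair die and the standard normal density are assumed examples, not taken from the notes, and the integral is approximated with a simple midpoint rule on a truncated interval.

```python
import math

def discrete_mean_var(pmf):
    """pmf: dict mapping each support point x to its probability p(x)."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

def continuous_mean_var(pdf, lo, hi, steps=20_000):
    """Midpoint-rule approximation of E(X) and Var(X) over [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * h
    var = sum((x - mean) ** 2 * pdf(x) for x in xs) * h
    return mean, var

# Assumed example 1: a fair six-sided die (discrete case)
die = {x: 1 / 6 for x in range(1, 7)}
die_mean, die_var = discrete_mean_var(die)

# Assumed example 2: standard normal density, truncated to [-10, 10]
def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

norm_mean, norm_var = continuous_mean_var(std_normal_pdf, -10.0, 10.0)
```

The discrete case uses a sum over the support and the continuous case an integral, exactly as in the two pairs of formulas.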
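Normal: $f(x; \mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma^2 > 0$
Gamma: $f(x; \alpha, \beta)$, $x > 0$, $\alpha > 0$
Bernoulli: $p(0) = 1 - p$, $p(1) = p$
$E(X) = p$
$\operatorname{Var}(X) = p(1 - p)$
Binomial: $p(x; m, p)$
$E(X) = mp$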
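As a quick check of the moments listed for the Bernoulli and binomial cases, the following sketch sums over the support directly. The values m = 10 and p = 0.3 are arbitrary choices for illustration, and the binomial variance mp(1 - p) is the standard result rather than something stated in the notes.

```python
from math import comb

def binomial_pmf(x, m, p):
    """Probability of x successes in m independent trials with success probability p."""
    return comb(m, x) * p ** x * (1 - p) ** (m - x)

def moments(m, p):
    """Total probability, mean and variance computed by summing over the support 0..m."""
    support = range(m + 1)
    total = sum(binomial_pmf(x, m, p) for x in support)
    mean = sum(x * binomial_pmf(x, m, p) for x in support)
    var = sum((x - mean) ** 2 * binomial_pmf(x, m, p) for x in support)
    return total, mean, var

# Arbitrary illustrative parameters
total, mean, var = moments(10, 0.3)

# The Bernoulli distribution is the special case m = 1
bern_total, bern_mean, bern_var = moments(1, 0.3)
```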
Unbiased estimator: $E(\hat{\theta}) = \theta$
Positively biased estimator (over): $E(\hat{\theta}) > \theta$
Negatively biased estimator (under): $E(\hat{\theta}) < \theta$
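A small exact example may help to see bias in practice. The sketch below assumes a Bernoulli(1/2) population and samples of size 3 (both hypothetical choices), enumerates every possible sample, and compares the expected value of the variance estimator with divisor n against the one with divisor n - 1.

```python
from fractions import Fraction
from itertools import product

population = [0, 1]          # support of a Bernoulli(1/2), assumed for illustration
n = 3                        # sample size, assumed for illustration
true_var = Fraction(1, 4)    # p(1 - p) with p = 1/2

def var_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

# All 2**n samples are equally likely, so the expectation is a plain average
samples = list(product(population, repeat=n))
expected_div_n = sum(var_estimate(s, n) for s in samples) / len(samples)
expected_div_n_minus_1 = sum(var_estimate(s, n - 1) for s in samples) / len(samples)
```

Here `expected_div_n` falls below the true variance (a negatively biased estimator), while `expected_div_n_minus_1` equals it (an unbiased estimator).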
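Goodness of fit: check which theoretical distribution (according to the estimated parameters) best fits the distribution of our random sample
Chi-squared test: $\chi^2$ and p-value
Kolmogorov-Smirnov test: measures the maximum distance between one distribution and a theoretical one, or between two distributions (delta and p-value)
Likelihood ratio test: compares two nested models (LR and p-value)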
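The maximum-distance statistic can be computed by hand for the one-sample case. This is a minimal sketch under stated assumptions: the sample values are made up, the theoretical distribution is taken to be a standard normal, and only the distance (delta) is computed, not the p-value.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the theoretical `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # At x the empirical CDF jumps from i/n to (i+1)/n: check both sides
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]   # made-up data
d_stat = ks_statistic(sample, normal_cdf)
```

A small distance suggests the sample is compatible with the theoretical distribution. Turning the distance into a p-value needs the distribution of the statistic, which this sketch leaves out.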
Space: orthogonal axes, built from variables with high redundancy, which carry as much information
(variance) as possible and are linearly independent
o Between arrows: <90° positive correlation, >90° negative correlation (180° = perfectly negative), =90° no
correlation, length of the arrow = how well the variable is explained
o Arrows and PC: <90° well explained by the PC, same direction = positive
correlation, opposite direction = negative correlation
o Origin: mean centered
o Kaiser’s Rule: Variance > 1 (keep the PCs whose eigenvalue exceeds 1, on standardized variables)
o Proportion of variance explained: percentage of variance explained by the PC
o Scree plot: elbow method
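For two variables the principal component variances have a closed form, which makes the "proportion of variance explained" easy to illustrate. The sketch below uses illustrative data and mean-centers it first, as the notes require. It is not a general PCA routine, since more than two variables would need a proper eigendecomposition.

```python
import math

def pca_2d(xs, ys):
    """Variances of the two principal components of mean-centered 2-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]]
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return half_trace + root, half_trace - root

# Illustrative data: two strongly correlated (redundant) variables
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
l1, l2 = pca_2d(xs, ys)

# Proportion of variance explained by each PC
explained = [l1 / (l1 + l2), l2 / (l1 + l2)]
```

Because the two variables are highly redundant, the first component carries most of the variance and the second could be dropped.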
continuous
variable
Elbow method:
o Run the clustering for a range of values of k
o Compute the total within-cluster sum of squares (WSS) for each k
o Plot the WSS against the number of clusters k
o Search for the elbow on the plot
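The elbow steps can be sketched with a tiny one-dimensional k-means. Everything here is an assumption made for illustration: the data are made up, the clustering is plain Lloyd's algorithm with a deterministic start, and the plot is replaced by a dictionary of WSS values.

```python
def kmeans_1d(data, k, iters=100):
    """Lloyd's algorithm on 1-D data; returns (centers, total within sum of squares)."""
    xs = sorted(data)
    # Deterministic start: k points spread evenly over the sorted data
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            groups[nearest].append(x)
        # An empty group keeps its previous center
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wss

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.1, 9.0, 8.8]   # made-up data, three groups
wss_by_k = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
```

The WSS drops sharply up to k = 3 and only marginally afterwards, which is the elbow one would look for on the plot.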
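Average silhouette method:
o Run the clustering for a range of values of k
o Compute the average silhouette width (avg.sil) for each k
o Plot avg.sil against k
o Choose the k with the maximum avg.sil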
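The average silhouette width can be computed directly from its definition. The sketch below assumes one-dimensional points with absolute difference as dissimilarity, at least two clusters, and a silhouette of 0 for singleton clusters (a common convention). The data and labelings are made up.

```python
def average_silhouette(points, labels):
    """Mean silhouette width for 1-D points; requires at least two clusters."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    widths = []
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)   # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(abs(x - y) for y in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]          # made-up data
sil_good = average_silhouette(points, [0, 0, 0, 1, 1, 1])   # natural grouping
sil_bad = average_silhouette(points, [0, 1, 0, 1, 0, 1])    # mixed-up grouping
```

Repeating this for several candidate clusterings and keeping the one with the largest value corresponds to the "choose maximum" step.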
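Gap statistic:
o Compute the within sum of squares $WSS_k$ of the observed data clustered into k clusters
o Generate B = 500 reference data sets from a uniform distribution and compute the within sum of squares $WSS_{kb}$ of each
o Compute $GAP(k) = \frac{1}{B} \sum_{b=1}^{B} \log(WSS_{kb}) - \log(WSS_k)$
o Choose the smallest k -> $GAP(k) \ge GAP(k+1) - s_{k+1}$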
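Given the WSS values, the gap formula and the selection rule are short to write down. In this sketch the clustering itself is left out, and the definition of s_k as the standard deviation of the reference log-WSS values times sqrt(1 + 1/B) is an assumption taken from the usual formulation, since the notes do not define it. The gap values in the example are invented.

```python
import math

def gap_and_s(wss_k, wss_ref):
    """GAP(k) and s_k from the observed WSS and the B reference WSS values."""
    logs = [math.log(w) for w in wss_ref]
    b = len(logs)
    mean_log = sum(logs) / b
    gap = mean_log - math.log(wss_k)
    sd = math.sqrt(sum((v - mean_log) ** 2 for v in logs) / b)
    # Assumed definition: s_k = sd_k * sqrt(1 + 1/B)
    return gap, sd * math.sqrt(1 + 1 / b)

def choose_k(gaps, s):
    """Smallest k with GAP(k) >= GAP(k+1) - s_{k+1}; index 0 corresponds to k = 1."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return i + 1
    return len(gaps)

# Invented gap values for k = 1..5
gaps = [0.2, 0.5, 0.9, 0.85, 0.8]
s = [0.05] * 5
best_k = choose_k(gaps, s)
```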
Clusters with low dissimilarity are merged together, and we can cut the dendrogram at the height we
prefer.
Linkage method:
Algorithm:
Validation:
Correlation between the cophenetic distances and the original distances -> [0,1], to be maximized (the closer to 1, the better)
Each cluster is represented by the mean of all the observations in the cluster
Algorithm:
Each cluster is represented by a single observation (the medoid): the point with the minimum total
dissimilarity to all the other observations of the cluster.
Algorithm:
Each cluster comes from its own distribution and has its own vector of parameters
(mixture of Gaussian distributions)
$p(x; \theta) = w_1 f_1(x; \theta_1) + \dots + w_K f_K(x; \theta_K)$, where $w_k$ is the weight of component k
$\theta_k = (\mu_k, \Sigma_k)$
covariance matrix $\Sigma_k$ -> volume, orientation, shape (E = equal, V = variable, I = axis-aligned (orientation) or spherical (shape))
Choosing K: Maximize Bayesian Information Criterion
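The mixture density can be evaluated directly once the weights and component parameters are known. The sketch below is a one-dimensional illustration with two components and made-up parameters. It does not fit the model or compute the Bayesian Information Criterion; it only shows how the weighted sum works and how an observation is attributed to a component.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a univariate normal with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, mus, sigmas):
    """Weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def responsibilities(x, weights, mus, sigmas):
    """Probability that x was generated by each component."""
    parts = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(parts)
    return [p / total for p in parts]

# Made-up parameters: K = 2 components, weights summing to 1
weights, mus, sigmas = [0.4, 0.6], [0.0, 5.0], [1.0, 1.5]

# Midpoint-rule check that the mixture is itself a density (area close to 1)
lo, hi, steps = -20.0, 30.0, 50_000
h = (hi - lo) / steps
area = sum(mixture_pdf(lo + (i + 0.5) * h, weights, mus, sigmas)
           for i in range(steps)) * h

resp_at_zero = responsibilities(0.0, weights, mus, sigmas)
```

An observation near 0 is attributed almost entirely to the first component, which is how the fitted mixture assigns observations to clusters.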