Data Analysis - Unsupervised Statistical Learning (PCA, CA, Model-Based Clustering), lecture notes for Analisi Dei Dati (Data Analysis)

Università degli Studi di Catania (UNICT), Analisi Dei Dati

This document presents the tools for unsupervised statistical learning, with explanations, formulas, and examples. Contents: 1) Univariate Statistical Modelling, 2) Basics of Matrices, 3) Basics of Multivariate Statistics, 4) Principal Component Analysis, 5) Cluster Analysis, 6) Cluster Validation, 7) Model-Based Clustering

Type: Lecture notes

2021/2022

On sale since 19/04/2022

crasssass 🇮🇹


Data Analysis

SLIDE 1 – Univariate Statistical modelling

Random variable

A random variable (rv) is a variable whose numerical values are determined by the outcome of a random experiment. An rv can be: 1) discrete, if it can take no more than a countable number of values; 2) continuous, if it can take any value in an interval.

1) Discrete random variable

Probability (mass) function: The probability mass function (pmf) $p(x)$ of a discrete rv $X$ expresses the probability that $X$ takes the value $x$, as a function of $x$. The pmf $p : \mathbb{R} \to [0, 1]$ is defined as $p(x) = P(X = x)$.

Support: the support $S_x$ of a discrete rv $X$ is defined as the set of possible values of $X$, that is $S_x = \{x \in \mathbb{R} : p(x) > 0\}$.

Properties: The pmf satisfies the following properties:

1) $p(x) > 0$ for $x \in S_x$;

2) $p(x) = 0$ for $x \notin S_x$;

3) $\sum_{x \in S_x} p(x) = 1$.

Cumulative probability function: The cumulative distribution function (cdf) $F(x)$ of a discrete rv $X$ expresses the probability that $X$ does not exceed the value $x$, as a function of $x$. The cdf $F : \mathbb{R} \to [0, 1]$ is defined as

$F(x) = P(X \le x) = \sum_{\{t \in S_x;\, t \le x\}} p(t)$

where the notation indicates that the summation is over all possible values $t \in S_x$ that are less than or equal to $x$.

Properties: The cdf satisfies the following properties:

1) $0 \le F(x) \le 1$ for every $x \in \mathbb{R}$;

2) if $x_0$ and $x_1$ are two numbers such that $x_0 < x_1$, then $F(x_0) \le F(x_1)$.
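As a small illustration of the pmf and cdf definitions, the sketch below builds the cdf of a discrete rv by summing the pmf over the support points that do not exceed $x$. The fair six-sided die and the names `pmf` and `cdf` are assumptions chosen for the example, not part of the notes.

```python
from fractions import Fraction

# Hypothetical example: pmf of a fair six-sided die, stored as {value: probability}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def cdf(x, pmf):
    """F(x) = P(X <= x): sum p(t) over the support points t with t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))


# pmf properties: strictly positive on the support and summing to 1.
assert all(p > 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# The cdf is a step function: it stays flat between support points.
print(cdf(0, pmf), cdf(3, pmf), cdf(3.5, pmf), cdf(6, pmf))  # 0 1/2 1/2 1
```

Exact fractions are used so that the checks are not affected by floating-point rounding.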


Expectation: The expectation (also called mean or expected value) of a discrete rv $X$ is defined as $E(X) = \mu_X = \sum_{x \in S_x} x\, p(x)$, where the summation is over all possible values $x \in S_x$.

Generalization: Let $X$ be a discrete rv with pmf $p(x)$. Moreover, let $g(X)$ be some function of $X$. The expected value of $g(X)$ is defined as

$E[g(X)] = \sum_{x \in S_x} g(x)\, p(x)$

Variance: The expectation of the squared discrepancy about the mean, $(X - \mu_X)^2$, is called the variance, commonly denoted by $\sigma_X^2$, and it is given by

$E[(X - \mu_X)^2] = \sigma_X^2 = \sum_{x \in S_x} (x - \mu_X)^2\, p(x)$
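The definitions of expectation and variance for a discrete rv can be checked on a toy case. The following is a minimal sketch, assuming a fair six-sided die as the rv; the helper name `expectation` is hypothetical.

```python
from fractions import Fraction

# Illustrative pmf: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * p(x) over the support (g defaults to the identity)."""
    return sum(g(x) * p for x, p in pmf.items())


mu = expectation(pmf)                             # E(X)
var = expectation(pmf, lambda x: (x - mu) ** 2)   # E[(X - mu)^2]
second_moment = expectation(pmf, lambda x: x ** 2)

print(mu, var)  # 7/2 35/12
```

The variance is obtained from the same function by choosing $g(x) = (x - \mu_X)^2$, which is exactly the generalization stated above.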

2) Continuous random variables

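Probability density function: Let $X$ be a continuous rv. The probability density function (pdf) of $X$ is a function $f : \mathbb{R} \to \mathbb{R}_0^+$ with the following properties:

1) $f(x) \ge 0$ for any $x \in \mathbb{R}$;

2) $P(a \le X \le b) = \int_a^b f(x)\, dx$ for any $a, b \in \mathbb{R}$ with $a \le b$;

3) $\int_{-\infty}^{\infty} f(x)\, dx = 1$.

If $X$ is a continuous rv, then the probability of a single point $x_0$ is null, that is $P(X = x_0) = 0$.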

Cumulative distribution function: The cumulative distribution function $F(x)$ of a continuous rv $X$ is

$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt$

It easily follows that $f(x) = \frac{dF(x)}{dx}$.

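Expectation: The expectation of a continuous rv $X$ is defined as

$E(X) = \mu_X = \int_{-\infty}^{\infty} x\, f(x)\, dx$

Generalization: Let $X$ be a continuous rv with pdf $f(x)$. Moreover, let $g(X)$ be some function of $X$. The expected value of $g(X)$ is defined as

$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx$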
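The continuous-case definitions can be verified numerically. The sketch below assumes an exponential pdf with rate 2 (an arbitrary choice) and uses a simple midpoint rule on a truncated range; `integrate`, `RATE` and `UPPER` are names introduced only for this example.

```python
import math

RATE = 2.0  # assumed rate parameter of an exponential distribution


def pdf(x):
    """Exponential density: RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(h, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return sum(h(a + (i + 0.5) * width) for i in range(steps)) * width


UPPER = 40.0  # stands in for +infinity; the tail beyond it is negligible here

total = integrate(pdf, 0.0, UPPER)                             # should be close to 1
mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)             # E(X), close to 1 / RATE
var = integrate(lambda x: (x - mean) ** 2 * pdf(x), 0.0, UPPER)  # E[g(X)] with g(x) = (x - mean)^2
cdf_at_1 = integrate(pdf, 0.0, 1.0)                            # F(1), close to 1 - exp(-RATE)

print(round(total, 4), round(mean, 4), round(var, 4), round(cdf_at_1, 4))
```

A numerical integrator is used only to mirror the integrals in the definitions; for the exponential distribution all four quantities also have closed forms.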

Beta: suited for phenomena with support within an interval (for instance, a rate) (see below).

Exponential: suited for describing the time between events (births, deaths, etc.).

Uniform: suited to describe phenomena with maximum uncertainty.

Random sample

A random sample of size $n$ is the set $X_1, \dots, X_i, \dots, X_n$ of rv’s associated to $n$ independent and identically distributed (iid) observations of the rv $X$.

Observed sample: an observed sample of size $n$ is the set $x_1, \dots, x_i, \dots, x_n$ constituting the realizations of the rv’s $X_1, \dots, X_i, \dots, X_n$ through the $n$ sample units.

Sample (joint) distribution: the sample (joint) distribution is the joint distribution $f_n$ (or $p_n$) of $X_1, \dots, X_i, \dots, X_n$.

If $X$ is a continuous rv and has pdf $f(x; \vartheta)$, the joint density of the sample is given by

$f_n(x_1, \dots, x_i, \dots, x_n; \vartheta) = \prod_{i=1}^{n} f(x_i; \vartheta)$

If $X$ is a discrete rv and has pmf $p(x; \vartheta)$, the joint probability of the sample is given by

$p_n(x_1, \dots, x_i, \dots, x_n; \vartheta) = \prod_{i=1}^{n} p(x_i; \vartheta)$

Parametric inference

Parametric estimation: let $X_1, \dots, X_i, \dots, X_n$ be a random sample from a continuous rv $X$ having pdf $f(x; \vartheta)$, where $\vartheta = (\vartheta_1, \dots, \vartheta_r, \dots, \vartheta_k)'$. The parameter vector $\vartheta$ is unknown and we wish to “estimate” it based on the random sample. This is the typical context for parametric estimation. In other words, the functional form $f$ is known (or at least assumed such!), but $\vartheta$ is not!

Example: Let $\mu$ be the (population) mean of $X$. Suppose $\mu$ is unknown and we want to estimate it using the random sample. Note that here no functional form $f$ is assumed for the pdf of $X$!

This easily extends to discrete rv’s.

Point estimation

Let $X_1, \dots, X_i, \dots, X_n$ be a random sample from a population with density $f(x; \vartheta)$. We will indicate with $\tau(\vartheta)$ a function of the unknown parameters $\vartheta$ that we wish to estimate using the random sample.

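Example: Let $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$. In this case $\vartheta = (\mu, \sigma^2)'$ and we might want to estimate: 1) the coefficient of variation $\tau(\mu, \sigma^2) = \frac{\sqrt{\sigma^2}}{\mu}$; 2) the mean $\tau(\mu, \sigma^2) = \mu$; 3) the variance $\tau(\mu, \sigma^2) = \sigma^2$.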

Estimator: An estimator of $\tau(\vartheta)$ is defined as some (hopefully appropriate) function $T = T(X_1, \dots, X_i, \dots, X_n)$ of the sample variables. Remark that the estimator $T$, being a function of random variables, is itself a random variable!

Unfortunately, there is no estimator whose $MSE(T)$ is smaller than that of every other estimator uniformly in $\vartheta$.

Our goal: set up “desirable” conditions that $T$ must fulfil, so as to restrict the range of admissible estimators; hopefully there will be one better than the others!

Unbiased estimator (the ones we will study)

Unbiased estimator: the estimator $T$ of $\tau(\vartheta)$ is unbiased if $E(T) = \tau(\vartheta)$, for every $\vartheta$. The “central value of $T$”, that is $E(T)$, coincides with $\tau(\vartheta)$.

This property singles out a subset of estimators of particular interest, the unbiased estimators. On top of this, since their MSE coincides with their variance, to choose among unbiased estimators it is enough to look for the lowest variance.

Bias of an estimator

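If $E(T) \ne \tau(\vartheta)$, we say that the estimator is biased.

Bias of an estimator: the bias of an estimator $T$ of $\tau(\vartheta)$ is the difference $E(T) - \tau(\vartheta)$.

If $E(T) > \tau(\vartheta)$ the estimator $T$ is positively biased, with bias $E(T) - \tau(\vartheta) > 0$: $T$ tends to overestimate $\tau(\vartheta)$;

If $E(T) < \tau(\vartheta)$ the estimator $T$ is negatively biased, with bias $E(T) - \tau(\vartheta) < 0$: $T$ tends to underestimate $\tau(\vartheta)$.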
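A simulation can make the notion of bias concrete. The sketch below is an illustration under assumed settings (normal data with variance 4, samples of size 5, a fixed seed), not an example from the notes. It compares the variance estimator that divides by $n$ with the one that divides by $n - 1$, approximating $E(T)$ by an average over many samples.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA, N, REPS = 0.0, 2.0, 5, 100_000  # assumed settings; true variance is 4


def variance_estimates(sample):
    """Return (sum of squares / n, sum of squares / (n - 1)) for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)


sum_div_n = sum_div_n_minus_1 = 0.0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    a, b = variance_estimates(sample)
    sum_div_n += a
    sum_div_n_minus_1 += b

# Monte Carlo approximations of E(T) for the two estimators.
mean_div_n = sum_div_n / REPS                  # near 4 * (N - 1) / N = 3.2: negatively biased
mean_div_n_minus_1 = sum_div_n_minus_1 / REPS  # near 4: unbiased

print(round(mean_div_n, 2), round(mean_div_n_minus_1, 2))
```

The estimator that divides by $n$ tends to underestimate the variance, which matches the definition of a negatively biased estimator.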

Cramér-Rao Lower Bound (related to unbiased estimators)

Within the class of unbiased estimators, the most important feature is variance. Under certain regularity conditions it is possible to show that the variance of any unbiased estimator is greater than, or equal to, a quantity, which is the lower bound of the variance of unbiased estimators.

An unbiased estimator $T$ is efficient if its variance reaches the CR lower bound.

Maximum likelihood method

ML is perhaps the most widespread method for building estimators, although arguably not the easiest one (see the method of moments). Operationally, the ML method can be summarized in 3 steps.

→ Step 1

Keeping in mind that we are dealing with a random sample of iid observations, first we compute the likelihood function

$L(\vartheta; x_1, \dots, x_n) = \prod_{i=1}^{n} f(x_i; \vartheta)$

Remark: note that the formula is the same as the sample joint density $f_n(x_1, \dots, x_n; \vartheta)$, but this is now a function of $\vartheta$ given the sample.

Interpretation: $L(\vartheta; x_1, \dots, x_n)$ indicates the probability, or likelihood if you prefer, that the observed sample originates from $X$ having density $f(x; \vartheta)$.

Objective: our goal is to maximize $L(\vartheta; x_1, \dots, x_n)$ with respect to $\vartheta$.

→ Step 2

From the likelihood function we get to the log-likelihood function

$l(\vartheta; x_1, \dots, x_n) = \ln L(\vartheta; x_1, \dots, x_n)$

for two main reasons:

1) $l(\vartheta; x_1, \dots, x_n)$ does not alter the minima and the maxima of $L(\vartheta; x_1, \dots, x_n)$, since the logarithm is an increasing monotonic function;

2) the logarithmic function has some useful properties:

a) it easily handles the exponentials ($\ln a^x = x \ln a$ and $\ln e^x = x$) that most widespread distributions have (normal, gamma, etc.);

b) it easily handles products ($\ln \prod_{i=1}^{n} f(x_i) = \sum_{i=1}^{n} \ln f(x_i)$) like those of $L(\vartheta; x_1, \dots, x_n)$;

c) it accentuates differences (useful from a computational point of view) (see the figure below).

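→ Step 3 (perhaps not too important)

The ML estimator of $\vartheta$ is found by maximizing $l(\vartheta; x_1, \dots, x_n)$ with respect to $\vartheta = (\vartheta_1, \dots, \vartheta_r, \dots, \vartheta_k)'$, that is, by solving the system of $k$ equations obtained by equating the $k$ partial derivatives to 0:

$\frac{\partial\, l(\vartheta; x_1, \dots, x_n)}{\partial \vartheta_r} = 0, \quad r = 1, \dots, k$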
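The three steps can be sketched on a one-parameter case. The example below assumes an exponential model $f(x; \vartheta) = \vartheta e^{-\vartheta x}$ and a small made-up sample; neither comes from the notes. For this model, setting the derivative of the log-likelihood to 0 gives $\hat{\vartheta} = n / \sum_i x_i$, and a coarse grid search is used only to confirm that this value maximizes the log-likelihood.

```python
import math

sample = [0.8, 1.5, 0.3, 2.2, 1.2]  # made-up observations, for illustration only


def likelihood(theta, xs):
    """Step 1: L(theta; x_1..x_n) = product of f(x_i; theta), exponential model."""
    return math.prod(theta * math.exp(-theta * x) for x in xs)


def log_likelihood(theta, xs):
    """Step 2: l(theta; x_1..x_n) = sum of ln f(x_i; theta)."""
    return sum(math.log(theta) - theta * x for x in xs)


# Step 3: dl/dtheta = n / theta - sum(x_i) = 0  =>  theta_hat = n / sum(x_i).
theta_hat = len(sample) / sum(sample)

# Sanity check: a grid search over theta lands next to the closed-form solution.
grid = [0.01 * k for k in range(1, 501)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, sample))

print(round(theta_hat, 4), round(theta_grid, 2))
```

Because the logarithm is increasing, the same `theta_hat` also maximizes `likelihood`; the log version is simply easier to differentiate and numerically more stable.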

Goodness-of-fit tests

There are 3 main statistical tests to evaluate the goodness-of-fit of a statistical (theoretical) model (that we put under the null hypothesis $H_0$) to the empirical distribution (namely the distribution of the observed sample). In other words, there are 3 types of test that can be run to assess how well a hypothesized distribution describes a data set:

Pearson’s chi-square test;
Kolmogorov-Smirnov test;
Likelihood-ratio test.

Note: The null hypothesis is a hypothesis that is assumed true until there is evidence to the contrary. The alternative hypothesis is the opposite of the null one and is accepted only if there is strong evidence in its favour.

1) Pearson’s chi-square test

In Pearson’s chi-square goodness-of-fit test the sample data of size $n$, if of a continuous type, are divided into $s$ intervals (or classes or bins) $A_1, \dots, A_s$. Then the numbers of points $n_1, \dots, n_s$ that fall into the intervals $A_1, \dots, A_s$ are compared with the expected numbers of points $n\pi_1, \dots, n\pi_s$ (under the null model) in those intervals. The null hypothesis $H_0$ assigns the probabilities $\pi_1, \dots, \pi_s$ to $A_1, \dots, A_s$.

Hypotheses: the null and alternative hypotheses are:

$H_0 : (p_1, \dots, p_s) = (\pi_1, \dots, \pi_s)$ versus $H_1 : (p_1, \dots, p_s) \ne (\pi_1, \dots, \pi_s)$

where $p_1, \dots, p_s$ represent the true but unknown probabilities of $A_1, \dots, A_s$.

Remark: it is natural to assume that the rejection of $H_0$ depends on the discrepancy between $n_1, \dots, n_s$ and $n\pi_1, \dots, n\pi_s$.

Remarks: → The distribution of $\chi^2$ is asymptotic and evaluated under the null hypothesis.

→ Let $q$ be the number of unknown parameters of the null model. If these parameters need to be estimated from the sample data, then the degrees of freedom of the reference distribution become $s - 1 - q$.

The critical region of level $\alpha$ is defined as $\chi^2 > c$, where $c = \chi^2_{[(s-1);\, 1-\alpha]}$ is the quantile of order $1 - \alpha$ from $\chi^2_{(s-1)}$.
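To make the procedure concrete, here is a minimal sketch of Pearson's test. It assumes the usual form of the statistic, $\chi^2 = \sum_{j=1}^{s} (n_j - n\pi_j)^2 / (n\pi_j)$, which is not shown in this part of the notes. The counts are hypothetical (60 rolls of a die, tested against equal probabilities), and the critical value is the tabulated 0.95 quantile of the chi-square distribution with 5 degrees of freedom.

```python
def pearson_chi_square(observed, null_probs):
    """Sum over classes of (n_j - n * pi_j)^2 / (n * pi_j)."""
    n = sum(observed)
    return sum((n_j - n * pi_j) ** 2 / (n * pi_j)
               for n_j, pi_j in zip(observed, null_probs))


observed = [8, 9, 12, 11, 6, 14]  # hypothetical counts n_1..n_s, with n = 60
null_probs = [1 / 6] * 6          # probabilities pi_1..pi_s assigned by H0

CRITICAL_5_DF_95 = 11.0705        # tabulated chi-square quantile: s - 1 = 5 df, order 0.95

stat = pearson_chi_square(observed, null_probs)
reject = stat > CRITICAL_5_DF_95  # critical region of level alpha = 0.05

print(round(stat, 4), reject)  # 4.2 False
```

Here no parameter is estimated from the data ($q = 0$), so the reference distribution has $s - 1 = 5$ degrees of freedom.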

2) Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test (KS test) is a goodness-of-fit test that can be used for two different purposes.

one-sample KS test: it is used to compare the empirical cumulative distribution function $F(x)$ with the theoretical cumulative distribution function $\phi(x)$ under the null. The KS statistic quantifies the distance between $F(x)$ and $\phi(x)$ as $\sup_x |F(x) - \phi(x)|$.
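As an illustration, the sketch below computes the one-sample KS statistic in its usual form, the largest absolute gap between the empirical cdf and the theoretical cdf. The three-point sample and the Uniform(0, 1) null are assumptions chosen so that the result is easy to check by hand.

```python
def ks_statistic(sample, theoretical_cdf):
    """Largest absolute gap between the empirical cdf and the theoretical cdf.

    The empirical cdf jumps from (i - 1) / n to i / n at the i-th ordered
    observation, so both sides of each jump are checked.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        phi = theoretical_cdf(x)
        d = max(d, i / n - phi, phi - (i - 1) / n)
    return d


def uniform_cdf(x):
    """cdf of the Uniform(0, 1) distribution (the assumed null model)."""
    return min(max(x, 0.0), 1.0)


d = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
print(round(d, 4))  # 0.3
```

With these values the largest gap occurs at the last observation, where the empirical cdf reaches 1 while the uniform cdf is 0.7.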

3) Likelihood-ratio test

The critical region of level $\alpha$ is defined as $-2 \ln LR > c$, where $c = \chi^2_{[m;\, 1-\alpha]}$ is the quantile of order $1 - \alpha$ from $\chi^2_m$.

Interpretation of LR:

→ The numerator of LR corresponds to the likelihood of an observed outcome under the null hypothesis.

→ The denominator of LR corresponds to the maximum likelihood of an observed outcome, varying the parameters over the whole parameter space $\Theta$.

The numerator of this ratio cannot exceed the denominator, because the denominator is a maximum over a set that includes the null; so, LR is between 0 and 1.

→ Low values of LR mean that the observed result was much less likely to occur under $H_0$ as compared to $H_1$.

→ High values (close to 1) of LR mean that the observed outcome was nearly as likely to occur under the null hypothesis as under the alternative, and so the null hypothesis cannot be rejected.

Remark: The LR test requires the competing models to be nested, i.e. the more complex model can be transformed into the simpler model by imposing constraints on the former’s parameters.
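The interpretation of LR can be illustrated with a pair of nested models. The sketch below assumes normal data with known variance 1, where the null model fixes the mean at 0 and the larger model leaves it free (one constraint). The decision rule compares $-2 \ln LR$ with a chi-square quantile with 1 degree of freedom; this is the standard large-sample form and is an assumption here, as are the made-up data.

```python
import math


def log_likelihood(mu, xs, sigma=1.0):
    """Normal log-likelihood with known standard deviation sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - 0.5 * ((x - mu) / sigma) ** 2 for x in xs)


sample = [0.3, -0.5, 1.1, 0.8, -0.2, 0.6, 0.4, -0.1]  # made-up observations

mu_null = 0.0                       # value imposed by H0 (the simpler, nested model)
mu_hat = sum(sample) / len(sample)  # ML estimate over the whole parameter space

# LR = L(mu_null) / L(mu_hat), computed on the log scale for numerical stability.
lr = math.exp(log_likelihood(mu_null, sample) - log_likelihood(mu_hat, sample))
statistic = -2 * math.log(lr)

CRITICAL_1_DF_95 = 3.8415           # tabulated chi-square quantile: 1 df, order 0.95
reject = statistic > CRITICAL_1_DF_95

print(round(lr, 4), round(statistic, 4), reject)
```

Here LR is fairly close to 1, so the data are nearly as likely under the null as under the unrestricted model and the null is not rejected at the 5% level.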

SLIDE 2 – Basics of Matrices

Triangular matrix: In the mathematical discipline of linear algebra, a triangular matrix is a special kind of square matrix. A square matrix is called lower triangular if all the entries above the main diagonal are equal to 0. Similarly, a square matrix is called upper triangular if all the entries below the main diagonal are equal to 0.
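The two definitions translate directly into index checks. The following is a minimal sketch, assuming the matrix is square and stored as a list of rows; the function names are chosen for the example.

```python
def is_lower_triangular(m):
    """True if every entry above the main diagonal (column j > row i) is 0."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def is_upper_triangular(m):
    """True if every entry below the main diagonal (column j < row i) is 0."""
    n = len(m)
    return all(m[i][j] == 0 for i in range(n) for j in range(i))


lower = [[1, 0, 0],
         [4, 2, 0],
         [5, 6, 3]]
upper = [[1, 4, 5],
         [0, 2, 6],
         [0, 0, 3]]

print(is_lower_triangular(lower), is_upper_triangular(lower))  # True False
print(is_lower_triangular(upper), is_upper_triangular(upper))  # False True
```

A diagonal matrix passes both checks, since all of its off-diagonal entries are 0.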