→ In an ordered list, the value in the "middle" (50% of the observations lie below it, 50% above). Not affected by extreme values. ! Position: (n+1)/2. If the position is not a whole number, it falls between 2 values and we take their mean.
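A minimal Python sketch of the (n+1)/2 position rule. `median_by_position` is a hypothetical helper name, not a library function. The last call shows that replacing the largest value with an extreme one does not move the median.

```python
import math


def median_by_position(data):
    """Median via the 1-based position (n + 1) / 2 in the ordered data."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    pos = (n + 1) / 2
    # Even n: the position ends in .5, so average the two neighbouring values.
    # Odd n: floor and ceil coincide, so the middle value itself is returned.
    lo, hi = math.floor(pos), math.ceil(pos)
    return (xs[lo - 1] + xs[hi - 1]) / 2


print(median_by_position([7, 1, 3]))        # position 2 -> 3.0
print(median_by_position([7, 1, 3, 10]))    # position 2.5 -> (3 + 7) / 2 = 5.0
print(median_by_position([7, 1, 3, 1000]))  # extreme value, median still 5.0
```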

MODE

→ Measure of central tendency: the most frequently occurring value. Not affected by extreme values; used for both numerical and categorical data. It may not exist, and there may be several modes.
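A minimal Python sketch, assuming a hypothetical helper `modes` that returns a list, so that "no mode" (empty list) and "several modes" can both be represented. Treating the case where every value occurs only once as "no mode" is an assumed convention.

```python
from collections import Counter


def modes(data):
    """Return every most-frequent value (numbers or categories).

    Assumed convention: if no value occurs more than once, there is no
    mode and an empty list is returned.
    """
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # all values distinct: the mode does not exist
    # Several values may tie for the highest count (bimodal, multimodal).
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red", "green"]))  # ['red'] (categorical data)
print(modes([1, 2, 3]))                        # [] (no mode)
print(modes([1, 1, 2, 2, 3]))                  # [1, 2] (two modes)
```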

GEOMETRIC MEAN

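→ Most often used to average percentages, rates, ..., i.e. cases where the variable undergoes a cumulative change, typically over TIME. $\bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n}$ (on a calculator: the product raised to the power 1/n). For rates R_i, apply it to the growth factors (1 + R_i) and subtract 1 from the result.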
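A minimal Python sketch (Python 3.8+ for `math.prod`). `geometric_mean` and `average_rate` are hypothetical helper names, and rates are assumed to be given as decimals (0.10 = +10%). For rates, the mean is taken over the growth factors (1 + R) and 1 is subtracted at the end.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))


def average_rate(rates):
    """Average rate per period for rates given as decimals."""
    # Growth factors (1 + R) are positive as long as R > -1.
    factors = [1 + r for r in rates]
    return geometric_mean(factors) - 1


# +50% in year 1, then -50% in year 2.
print(average_rate([0.5, -0.5]))  # about -0.134, i.e. -13.4% per year
```

The arithmetic mean of the two rates would be 0%, yet 100 becomes 150 and then 75. Applying the geometric result twice (0.866 * 0.866 ≈ 0.75) reproduces the actual 25% loss.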

WEIGHTED MEAN

→ $\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots}{\sum w_i}$ (each value multiplied by its weight, divided by the sum of the weights)
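A minimal Python sketch of the weighted-mean formula. `weighted_mean` is a hypothetical helper name, and the grades and weights in the example are made-up illustration values.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical grades 80 and 90, where the second counts three times as much.
print(weighted_mean([80, 90], [1, 3]))  # (80*1 + 90*3) / 4 = 87.5
```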

MEASURES OF VARIABILITY (DISPERSION)

RANGE

→ Difference between the largest and the smallest observation. It ignores how the data are distributed between these two values and is sensitive to extreme values. Range = X_largest - X_smallest

QUARTILES

Quartiles split a (large) ordered data set into four quarters. The first quartile, Q1, separates approximately the smallest 25% of the data from the rest. Q2 is the median. The third quartile, Q3, separates approximately the smallest 75% of the data from the largest 25%. ! Positions in the ordered data:

Q1 is 0.25(n+1)
Q2 is 0.50(n+1)
Q3 is 0.75(n+1)

If the position is not a whole number, it falls between 2 values and we take their mean.
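A minimal Python sketch of the position rules for Q1, Q2 and Q3. It follows the convention used in these notes: a fractional position takes the mean of the two neighbouring values. Other textbooks and software round or interpolate instead, so their quartiles may differ slightly. `value_at_position` and `quartiles` are hypothetical helper names.

```python
import math


def value_at_position(sorted_data, pos):
    """Value at a 1-based position; a fractional position takes the mean
    of the two neighbouring values (the rule used in these notes)."""
    lo, hi = math.floor(pos), math.ceil(pos)
    return (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2


def quartiles(data):
    """Q1, Q2, Q3 at positions 0.25(n+1), 0.50(n+1), 0.75(n+1)."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    return tuple(value_at_position(xs, p * (n + 1)) for p in (0.25, 0.50, 0.75))


data = [10, 20, 30, 40, 50, 60, 70, 80]  # n = 8
# Positions 2.25, 4.5, 6.75 -> (20+30)/2, (40+50)/2, (60+70)/2
print(quartiles(data))  # (25.0, 45.0, 65.0)
```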

INTERQUARTILE RANGE

It addresses the range's sensitivity to extreme values: the lowest 25% and the highest 25% of the observations are set aside, and the range is computed on the central 50% of the data.

IQR = Q3 - Q1
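A minimal Python sketch comparing the range and the IQR, with quartile positions 0.25(n+1) and 0.75(n+1) and fractional positions averaged (an assumed convention). `iqr` is a hypothetical helper name, and the data are made up. Replacing the largest value with an extreme one changes the range a lot but leaves the IQR untouched.

```python
import math


def value_at_position(sorted_data, pos):
    """1-based position; fractional positions average the two neighbours."""
    lo, hi = math.floor(pos), math.ceil(pos)
    return (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2


def iqr(data):
    """Interquartile range Q3 - Q1."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    q1 = value_at_position(xs, 0.25 * (n + 1))
    q3 = value_at_position(xs, 0.75 * (n + 1))
    return q3 - q1


normal = [2, 4, 4, 5, 6, 7, 8]
with_outlier = [2, 4, 4, 5, 6, 7, 80]  # largest value replaced by an extreme one
for xs in (normal, with_outlier):
    print("range:", max(xs) - min(xs), "IQR:", iqr(xs))
# range: 6 IQR: 3.0
# range: 78 IQR: 3.0
```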

BOX PLOT

Graph that describes the shape of a distribution in terms of the five-number summary: X_smallest, Q1, median, Q3, X_largest.
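A minimal Python sketch of the five-number summary that a box plot draws. The box runs from Q1 to Q3 with a line at the median, and the whiskers reach the smallest and largest values. Quartiles use the (n+1) positions with fractional positions averaged, which is an assumed convention. `five_number_summary` is a hypothetical helper name.

```python
import math


def value_at_position(sorted_data, pos):
    """1-based position; fractional positions average the two neighbours."""
    lo, hi = math.floor(pos), math.ceil(pos)
    return (sorted_data[lo - 1] + sorted_data[hi - 1]) / 2


def five_number_summary(data):
    """(X_smallest, Q1, median, Q3, X_largest)."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    q1, q2, q3 = (value_at_position(xs, p * (n + 1)) for p in (0.25, 0.50, 0.75))
    return xs[0], q1, q2, q3, xs[-1]


smallest, q1, median, q3, largest = five_number_summary([10, 20, 30, 40, 50, 60, 70, 80])
print(smallest, q1, median, q3, largest)  # 10 25.0 45.0 65.0 80
# Rough symmetry check: equal distances on both sides of the median.
print(median - q1 == q3 - median)  # True
```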

VARIANCE

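It is the sum of the squared differences between each observation and the sample mean, divided by n - 1 (sample variance): $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$. For a whole population, divide by N and use the population mean μ instead.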
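A minimal Python sketch of the sample variance. It divides by n - 1, the usual sample convention and the one Python's `statistics.variance` uses; dividing by n gives the population variance. `sample_variance` is a hypothetical helper name. The last line shows why the differences are squared: the plain deviations from the mean always cancel out.

```python
def sample_variance(data):
    """S^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least 2 observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(sample_variance(data))  # 32 / 7, about 4.571

mean = sum(data) / len(data)
print(sum(x - mean for x in data))  # 0.0: unsquared deviations cancel out
```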

STANDARD DEVIATION

It is the most commonly used measure of dispersion. It shows the dispersion around the sample mean $\bar{X}$ and is the square root of the variance, $S = \sqrt{S^2}$, so it has the same units as the original data.

● low SD: observations are concentrated around the mean
● high SD: observations are widely dispersed around the mean
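A minimal Python sketch contrasting a low and a high standard deviation for two made-up data sets with the same mean (5). `sample_sd` is a hypothetical helper name, and it uses n - 1 in the denominator, as for the sample variance.

```python
import math


def sample_sd(data):
    """S: square root of the sample variance (n - 1 in the denominator)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


concentrated = [4, 5, 5, 6]  # mean 5, values close to it
dispersed = [1, 3, 7, 9]     # mean 5, values far from it
print(sample_sd(concentrated))  # about 0.82 (low SD)
print(sample_sd(dispersed))     # about 3.65 (high SD)
```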

COEFFICIENT OF VARIATION

It measures the relative dispersion and is usually expressed as a percentage. Because the SD is divided by the mean, the units cancel out, so it is useful to compare two or more data sets measured in different units. $CV = \frac{S}{\bar{X}} \cdot 100\%$ ! Also useful when two data sets have the same standard deviation but different means, e.g. the prices of two stocks over the last year: if both have S = 2 but stock A has a mean price of 10 and stock B a mean price of 100, then CV_A = 20% and CV_B = 2%, so stock B has the lower relative dispersion.

MEASURES OF THE RELATIONSHIPS BETWEEN TWO VARIABLES

COVARIANCE

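→ Measures the direction of the linear relationship between two variables. $cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$. Its size depends on the units of X and Y, so it does not measure the strength of the relationship.
● > 0 → they tend to move in the same direction
● < 0 → they tend to move in opposite directions
● = 0 → no linear relationship (this alone does not prove independence)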
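A minimal Python sketch of the sample covariance, dividing by n - 1 as for the sample variance. `sample_covariance` is a hypothetical helper name, and the data are made up. The last call rescales x (e.g. euros to cents) to show that the covariance's size depends on the units.

```python
def sample_covariance(x, y):
    """Sum of (X_i - mean X)(Y_i - mean Y), divided by n - 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


x = [1, 2, 3, 4, 5]
print(sample_covariance(x, [2, 4, 5, 4, 5]))   # 1.5  (> 0: same direction)
print(sample_covariance(x, [10, 8, 6, 4, 2]))  # -5.0 (< 0: opposite directions)
# Same relationship, x measured in units 100 times smaller: value x100.
print(sample_covariance([100 * v for v in x], [2, 4, 5, 4, 5]))  # 150.0
```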

CORRELATION COEFFICIENT

→ Measures the relative strength of the linear relationship between two variables. It is dimensionless and varies between -1 and 1. $r = \frac{cov(X, Y)}{S_X \cdot S_Y}$
● closer to -1 → stronger negative linear relationship
● closer to 1 → stronger positive linear relationship
● closer to 0 → weaker linear relationship
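A minimal Python sketch of the correlation coefficient as covariance divided by the product of the two standard deviations. The n - 1 factors cancel, so r is the same whichever denominator is used. `correlation` is a hypothetical helper name, and the data are made up. The last call shows that, unlike the covariance, r does not change when x is rescaled.

```python
import math


def correlation(x, y):
    """r = cov(X, Y) / (S_X * S_Y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (s_x * s_y)


x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))  # about 1: perfect positive line
print(correlation(x, [10, 8, 6, 4, 2]))  # about -1: perfect negative line
print(correlation(x, [2, 4, 5, 4, 5]))   # about 0.77: positive, not perfect
# Dimensionless: rescaling x leaves r unchanged.
print(correlation([100 * v for v in x], [2, 4, 5, 4, 5]))  # about 0.77 again
```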