Unit 5 (English), English Language Notes

Course: Introduction to Statistics (English), Professor: Ines Couso, Degree: Tourism, University: UNIOVI

Type: Notes

2013/2014

Uploaded on 24/11/2014

Two-dimensional variables
Facultad de Comercio, Turismo y Ciencias Sociales Jovellanos

- Introduction
- Joint frequencies
  - Joint frequencies
  - Joint and marginal distributions
- Conditional percentages
- Scatter plot
- Statistical dependence
  - Studying statistical dependence from double-entry tables
  - Studying statistical dependence from some coefficients

Introduction: limitations of univariate tables and statistics

- Example: Consider a sample of n = 10 families. Let us study two variables:
  - X = "household income (Eur per month)"
  - Y = "household saving (Eur per month)"

The respective frequency tables are:

    x_i     n_i.
    1500    2
    2000    4
    3000    3
    4000    1

    y_j     n_.j
    0       1
    50      2
    100     4
    200     1
    500     2

- From these tables you can calculate the average income, the average saving, variances, percentiles, etc. (univariate statistics).
- However, these frequency tables give no information about the RELATION between the two variables.

Limitations of univariate tables and statistics (II)

- The frequency tables of X and Y do not give enough information to answer the following questions:
  - Restrict attention to the families with an income higher than 2500 Eur. What percentage of those families save more than 400 Eur?
  - Is there a positive relation between the two variables? Do higher incomes correspond to higher savings?
- But we can answer them if we look at the original dataset:

    family    1     2     3     4     5     6     7     8     9     10
    income    1500  2000  1500  2000  2000  3000  2000  3000  4000  3000
    saving    0     50    50    200   100   500   100   100   500   100

- We need a different kind of table to summarize bivariate data.
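
The two questions above can be answered directly from the raw dataset. The following is a minimal Python sketch, not part of the original notes; the function names `conditional_percentage` and `mean_saving_by_income` are illustrative choices, and comparing the average saving per income level is only one informal way to look for a positive relation.

```python
# Raw data copied from the table of 10 families.
income = [1500, 2000, 1500, 2000, 2000, 3000, 2000, 3000, 4000, 3000]
saving = [0, 50, 50, 200, 100, 500, 100, 100, 500, 100]


def conditional_percentage(xs, ys, x_condition, y_condition):
    """Percentage of pairs meeting y_condition among those meeting x_condition."""
    selected = [y for x, y in zip(xs, ys) if x_condition(x)]
    hits = sum(1 for y in selected if y_condition(y))
    return 100 * hits / len(selected)


def mean_saving_by_income(xs, ys):
    """Average saving for each distinct income level, sorted by income."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return {x: sum(v) / len(v) for x, v in sorted(groups.items())}


# Question 1: among families earning more than 2500 Eur, how many save more than 400 Eur?
pct = conditional_percentage(income, saving, lambda x: x > 2500, lambda y: y > 400)
print(pct)  # 50.0

# Question 2: does the average saving grow with income?
means = mean_saving_by_income(income, saving)
print(means)
```

Four families earn more than 2500 Eur and two of them save more than 400 Eur, hence 50%. The average saving rises with each income level, which suggests a positive relation.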

Joint frequency tables

- Useful to summarize bivariate information when the sample size is large and the number of different pairs of values is small.
- Example (just an illustration of how joint frequency tables are built; in this example the sample size is small):
  - Row joint frequency table:

    income          1500  1500  2000  2000  2000  3000  3000  4000
    saving          0     50    50    100   200   100   500   500
    no. families    1     1     1     2     1     2     1     1

  - Double-entry table (each cell gives the number of individuals in each category):

    x_i \ y_j    0    50   100  200  500
    1500         1    1    0    0    0
    2000         0    1    2    1    0
    3000         0    0    2    0    1
    4000         0    0    0    0    1
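
Both tables can be built mechanically from the raw data. The sketch below is an illustration in Python using only the standard library; the variable names (`joint`, `table`, `x_values`, `y_values`) are assumptions of this example rather than notation from the notes.

```python
from collections import Counter

# Raw data for the 10 families (income and saving in Eur per month).
income = [1500, 2000, 1500, 2000, 2000, 3000, 2000, 3000, 4000, 3000]
saving = [0, 50, 50, 200, 100, 500, 100, 100, 500, 100]

# Row joint frequency table: one count per distinct (income, saving) pair.
joint = Counter(zip(income, saving))

# Double-entry table: rows are income values, columns are saving values.
x_values = sorted(set(income))
y_values = sorted(set(saving))
table = [[joint[(x, y)] for y in y_values] for x in x_values]

for (x, y), count in sorted(joint.items()):
    print(x, y, count)

for x, row in zip(x_values, table):
    print(x, row)
```

A `Counter` returns 0 for pairs that never occur, so the empty cells of the double-entry table are filled in without extra code.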

Joint and marginal distributions

- We can obtain the "marginal" frequency tables of X and Y from the double-entry table:

    x_i \ y_j    0    50   100  200  500   n_i.
    1500         1    1    0    0    0     2
    2000         0    1    2    1    0     4
    3000         0    0    2    0    1     3
    4000         0    0    0    0    1     1
    n_.j         1    2    4    1    2     n = 10

- The converse is not true: we cannot recover the joint frequencies from the marginal frequencies!
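
To see why the converse fails, it is enough to find two different joint tables with the same marginals. The following Python sketch computes the marginals of the income/saving table and of a second table; the second table (`other`) is a hypothetical one invented for this illustration, and the helper name `marginals` is likewise an assumption.

```python
def marginals(t):
    """Return (row totals n_i., column totals n_.j, sample size n)."""
    row_totals = [sum(row) for row in t]
    col_totals = [sum(col) for col in zip(*t)]
    return row_totals, col_totals, sum(row_totals)


# Double-entry table of income (rows) against saving (columns).
table = [
    [1, 1, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 2, 0, 1],
    [0, 0, 0, 0, 1],
]

# Hypothetical table: one family moved between cells in the first two rows.
other = [
    [0, 2, 0, 0, 0],
    [1, 0, 2, 1, 0],
    [0, 0, 2, 0, 1],
    [0, 0, 0, 0, 1],
]

print(marginals(table))  # ([2, 4, 3, 1], [1, 2, 4, 1, 2], 10)
print(marginals(other))  # ([2, 4, 3, 1], [1, 2, 4, 1, 2], 10)
print(table == other)    # False
```

The two tables differ, yet their marginal frequencies coincide, so the marginals alone cannot determine the joint frequencies.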

Joint frequency table: general notation

- General nomenclature for joint frequency tables:

    x_i \ y_j    y_1     ...   y_j     ...   y_l     n_i.
    x_1          n_11    ...   n_1j    ...   n_1l    n_1.
    ...          ...     ...   ...     ...   ...     ...
    x_i          n_i1    ...   n_ij    ...   n_il    n_i.
    ...          ...     ...   ...     ...   ...     ...
    x_k          n_k1    ...   n_kj    ...   n_kl    n_k.
    n_.j         n_.1    ...   n_.j    ...   n_.l    n

- n_ij: number of individuals in the sample taking the pair of values (x_i, y_j).
- n_i.: number of individuals in the sample taking the value X = x_i.
- n_.j: number of individuals in the sample taking the value Y = y_j.

Joint and marginal distributions: general formulas

If, for instance, X takes k = 4 and Y takes l = 5 different values in the sample, we use the following notation:

    x_i \ y_j    y_1     y_2     y_3     y_4     y_5     n_i.
    x_1          n_11    n_12    n_13    n_14    n_15    n_1.
    x_2          n_21    n_22    n_23    n_24    n_25    n_2.
    x_3          n_31    n_32    n_33    n_34    n_35    n_3.
    x_4          n_41    n_42    n_43    n_44    n_45    n_4.
    n_.j         n_.1    n_.2    n_.3    n_.4    n_.5    n

- Marginal frequencies of X:
  - n_1. = n_11 + n_12 + n_13 + n_14 + n_15
  - In general, n_i. = n_i1 + ... + n_il
- Marginal frequencies of Y:
  - n_.1 = n_11 + n_21 + n_31 + n_41
  - In general, n_.j = n_1j + ... + n_kj
- Sample size: n = n_1. + ... + n_k. = n_.1 + ... + n_.l
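
As a worked instance of these formulas, take the income/saving double-entry table with marginals, where x_2 = 2000 and y_3 = 100. The numbers below are read off that table; only the LaTeX typesetting is added here.

```latex
\begin{align*}
n_{2\cdot} &= n_{21} + n_{22} + n_{23} + n_{24} + n_{25} = 0 + 1 + 2 + 1 + 0 = 4,\\
n_{\cdot 3} &= n_{13} + n_{23} + n_{33} + n_{43} = 0 + 2 + 2 + 0 = 4,\\
n &= n_{1\cdot} + \dots + n_{4\cdot} = 2 + 4 + 3 + 1 = 10
   = 1 + 2 + 4 + 1 + 2 = n_{\cdot 1} + \dots + n_{\cdot 5}.
\end{align*}
```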

Statistical dependence

Increasing level of dependence:

1. STATISTICAL INDEPENDENCE. Example: X: gender, Y: eye color
2. STATISTICAL DEPENDENCE. Example: X: husband's age, Y: wife's age
3. FUNCTIONAL DEPENDENCE. Example: X: number of working hours, Y: total fee = 300 + 50X

Statistical independence

The value of X does not influence the value of Y, and vice versa. Examples:

- X = "age", Y = "last digit of the street number"
- X = "score in Maths", Y = "initial of grandmother's first name (numbered from 1 to 26)"

Statistical dependence

The value of X influences the value of Y, and vice versa. Examples:

- X = "height", Y = "weight"
- X = "husband's age", Y = "wife's age"
- X = "elevation", Y = "temperature"
- X = "income", Y = "saving"

- High statistical dependence: [figure not reproduced]
- Low statistical dependence: [figure not reproduced]

Functional dependence

Y can be written as a mathematical function of X. Examples:

- X = "number of working hours per project", Y = "total fee due for project": Y = 300 + 50X.
- X = "area of a bubble", Y = "radius of a bubble": Y = \sqrt{X / \pi}.

How to detect statistical independence from double-entry tables

Example: eye color vs. gender:

    gender \ eye color    blue-gray    hazel-green    brown-black    n_i.
    male                  8            24             16             48
    female                9            27             18             54
    n_.j                  17           51             34             n = 102

When X and Y are independent, all rows in the double-entry table are proportional to each other. Here both rows are in the ratio 1 : 3 : 2.
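
The proportionality of rows can be checked cell by cell. The sketch below is an illustration in Python and relies on a standard equivalent condition that the notes do not state explicitly: all rows are proportional exactly when n_ij * n = n_i. * n_.j for every cell. The function name `rows_proportional` is an assumption of this example.

```python
def rows_proportional(table):
    """True if n_ij * n == n_i. * n_.j in every cell (integer arithmetic, no rounding)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return all(
        cell * n == row_totals[i] * col_totals[j]
        for i, row in enumerate(table)
        for j, cell in enumerate(row)
    )


# Gender (rows) against eye color (columns).
eye_color = [[8, 24, 16], [9, 27, 18]]

# Income (rows) against saving (columns) for the 10 families.
income_saving = [
    [1, 1, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 2, 0, 1],
    [0, 0, 0, 0, 1],
]

print(rows_proportional(eye_color))      # True
print(rows_proportional(income_saving))  # False
```

In real samples the rows are rarely exactly proportional, so an exact check like this one is mainly useful for textbook tables.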

How to detect functional dependence from double-entry tables

Example: study of some defects in glass (bubbles). X = diameter (in mm), Y = area (in mm^2):

    diameter \ area    0.1    0.38    7.07    12.57    n_i.
    0.35               5      0       0       0        5
    0.70               0      3       0       0        3
    3                  0      0       2       0        2
    4                  0      0       0       1        1
    n_.j               5      3       2       1        n = 11

- When Y is a function of X, there is only one nonzero figure in each row (each value of X occurs with a single value of Y).
- When X is a function of Y, there is only one nonzero figure in each column (each value of Y occurs with a single value of X).
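
These checks are easy to automate. The sketch below is an illustration in Python, assuming that the rows of the table are indexed by the values of X and the columns by the values of Y; under that layout, Y is a function of X when every row has exactly one nonzero cell. The function names and the `squares` table are hypothetical, introduced only for this example.

```python
def single_nonzero_per_line(lines):
    """True if every line (row or column) has exactly one nonzero cell."""
    return all(sum(1 for v in line if v != 0) == 1 for line in lines)


def y_is_function_of_x(table):
    """Rows are X values: each X value must occur with a single Y value."""
    return single_nonzero_per_line(table)


def x_is_function_of_y(table):
    """Columns are Y values: each Y value must occur with a single X value."""
    return single_nonzero_per_line(zip(*table))


# Bubble diameter (rows) against bubble area (columns).
bubbles = [
    [5, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 1],
]

# Hypothetical table for X in {-1, 0, 1} (rows) and Y = X^2 in {0, 1} (columns).
squares = [
    [0, 3],
    [2, 0],
    [0, 4],
]

print(y_is_function_of_x(bubbles), x_is_function_of_y(bubbles))  # True True
print(y_is_function_of_x(squares), x_is_function_of_y(squares))  # True False
```

In the hypothetical `squares` table Y is determined by X, but the value Y = 1 occurs with two different values of X, so X is not a function of Y.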

Contingency coefficient

- Valid for categorical, ordinal and quantitative variables.
- Mathematical definition:

  \[ c = \frac{\chi^2}{n\,[\min\{k, l\} - 1]}, \]

  where \chi^2 is the \chi^2-coefficient,

  \[ \chi^2 = n \left[ \sum_{i=1}^{k} \sum_{j=1}^{l} \frac{n_{ij}^2}{n_{i\cdot}\, n_{\cdot j}} - 1 \right]. \]

- 0 ≤ c ≤ 1
- c = 0: statistical independence
- c = 1: functional dependence
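
The coefficient can be computed directly from a double-entry table. The sketch below is an illustration in Python and rests on an assumption: it takes \chi^2 = n [ \sum_i \sum_j n_ij^2 / (n_i. n_.j) - 1 ], which is the form of the \chi^2-coefficient under which c = 0 for proportional rows. It also assumes that no row or column total is zero. Exact fractions are used to avoid rounding, and the function name is hypothetical.

```python
from fractions import Fraction


def contingency_coefficient(table):
    """c = chi2 / (n * (min(k, l) - 1)), computed with exact fractions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    k, l = len(table), len(table[0])
    total = sum(
        Fraction(cell * cell, row_totals[i] * col_totals[j])
        for i, row in enumerate(table)
        for j, cell in enumerate(row)
    )
    chi2 = n * (total - 1)  # assumed form of the chi-square coefficient
    return chi2 / (n * (min(k, l) - 1))


eye_color = [[8, 24, 16], [9, 27, 18]]                                   # gender vs. eye color
bubbles = [[5, 0, 0, 0], [0, 3, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]]       # diameter vs. area
income_saving = [[1, 1, 0, 0, 0], [0, 1, 2, 1, 0], [0, 0, 2, 0, 1], [0, 0, 0, 0, 1]]

print(contingency_coefficient(eye_color))      # 0
print(contingency_coefficient(bubbles))        # 1
print(contingency_coefficient(income_saving))  # 11/24
```

The three results match the scale described above: 0 for the independent table, 1 for the functionally dependent one, and an intermediate value (about 0.46) for income and saving.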

Coefficient of determination

- Only valid for quantitative variables.
- Mathematical definition:

  \[ r^2 = \left[ \frac{\mathrm{Cov}(X, Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)} \right]^2, \]

  where Cov(X, Y) is the covariance of (X, Y) and SD denotes the standard deviation.

- 0 ≤ r^2 ≤ 1
- r^2 = 0 represents linear independence (no linear relation between X and Y)
- r^2 = 1 represents linear dependence (Y is an exact linear function of X)
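
As an illustration, r^2 can be computed for the income/saving data of the 10 families. The Python sketch below assumes that the covariance and variances divide by n; dividing by n - 1 instead would give the same r^2, because the factor cancels in the ratio. The function name and the `hours` data are hypothetical, introduced only for this example.

```python
def coefficient_of_determination(xs, ys):
    """r^2 = Cov(X, Y)^2 / (Var(X) * Var(Y)), using divisions by n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    # SD(X) * SD(Y) squared equals Var(X) * Var(Y).
    return cov ** 2 / (var_x * var_y)


# Income and saving of the 10 families (Eur per month).
income = [1500, 2000, 1500, 2000, 2000, 3000, 2000, 3000, 4000, 3000]
saving = [0, 50, 50, 200, 100, 500, 100, 100, 500, 100]
r2 = coefficient_of_determination(income, saving)
print(round(r2, 3))  # about 0.567

# Hypothetical hours for the fee example: Y = 300 + 50X is exactly linear in X.
hours = [1, 2, 3, 4]
fee = [300 + 50 * h for h in hours]
r2_fee = coefficient_of_determination(hours, fee)
print(round(r2_fee, 6))  # 1.0
```

Income and saving show a moderate linear relation, while the fee example reaches r^2 = 1 because the fee is an exact linear function of the hours.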