CATREG: Categorical Regression with Optimal Scaling using Alternating Least Squares, Study notes of Mathematical Statistics

CATREG is a statistical method that quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. It allows mixed optimal scaling levels and makes no distributional assumptions about the variables. These notes describe the algorithm, assumptions, and output of the CATREG procedure.

Typology: Study notes, 2011/2012. Uploaded on 10/31/2012 by sangawar.

4.5

(4)

118 documents

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!


CATREG

CATREG (Categorical regression with optimal scaling using alternating least squares) quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. The variables can be given mixed optimal scaling levels, and no distributional assumptions about the variables are made.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$n$  Number of analysis cases (objects)

$n_w$  Weighted number of analysis cases: $n_w = \sum_{i=1}^{n} w_i$

$n_{tot}$  Total number of cases (analysis + supplementary)

$w_i$  Weight of object $i$; $w_i = 1$ if cases are unweighted; $w_i = 0$ if object $i$ is supplementary.

$W$  Diagonal $n_{tot} \times n_{tot}$ matrix, with $w_i$ on the diagonal.

$p$  Number of predictor variables

$m$  Total number of variables

$r$  Index of response variable

$J_p$  Index set of predictor variables

$H$  The data matrix (category indicators), of order $n_{tot} \times m$, after discretization, imputation of missings, and listwise deletion, if applicable.

For variable $j$, $j = 1, \ldots, m$:

$k_j$  Number of categories of variable $j$ (number of distinct values in $h_j$, thus including supplementary objects)

$G_j$  Indicator matrix for variable $j$, of order $n_{tot} \times k_j$. The elements of $G_j$ are defined, for $i = 1, \ldots, n_{tot}$ and $r = 1, \ldots, k_j$, as

$$g_{(j)ir} = \begin{cases} 1 & \text{when the } i\text{th object is in the } r\text{th category of variable } j \\ 0 & \text{when the } i\text{th object is not in the } r\text{th category of variable } j \end{cases}$$

$D_j$  Diagonal $k_j \times k_j$ matrix, containing the weighted univariate marginals, i.e., the weighted column sums of $G_j$ ($D_j = G_j' W G_j$)

$f$  Degrees of freedom for the predictor variables, of order $p$

$S_j$  I-spline basis for variable $j$, of order $k_j \times (s_j + t_j)$ (see Ramsay (1988) for details)

$a_j$  Spline coefficient vector, of order $s_j + t_j$

$d_j$  Spline intercept

$s_j$  Degree of polynomial

$t_j$  Number of interior knots

The quantification matrices and parameter vectors are:

$y_r$  Category quantifications for the response variable, of order $k_r$

$y_j$, $j \in J_p$  Category quantifications for predictor variable $j$, of order $k_j$

$b$  Regression coefficients for the predictor variables, of order $p$

$v$  Accumulated contributions of predictor variables: $v = \sum_{j \in J_p} b_j G_j y_j$

Note: The matrices $W$, $G_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.
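To make the reduced storage concrete, here is a minimal Python sketch. It assumes each $G_j$ is kept as a list of 0-based category codes, one per object, and $W$ as a list of weights. The function names are hypothetical and are not taken from the CATREG program. With this layout, $D_j$, $G_j' W x$, and $G_j y_j$ each become a single pass over the objects.

```python
def marginals(codes, weights, k):
    """diag(D_j) = diag(G_j' W G_j): the weighted count of each category.

    codes[i] is the 0-based category of object i (reduced form of G_j);
    weights[i] is w_i, which is 0 for supplementary objects.
    """
    d = [0.0] * k
    for c, w in zip(codes, weights):
        d[c] += w
    return d


def weighted_category_sums(codes, weights, x, k):
    """G_j' W x, accumulated per category instead of a matrix product."""
    out = [0.0] * k
    for c, w, x_i in zip(codes, weights, x):
        out[c] += w * x_i
    return out


def expand(codes, y):
    """G_j y: each object receives the quantification of its category."""
    return [y[c] for c in codes]


# Five objects, three categories; the last object is supplementary (w = 0).
codes = [0, 1, 1, 2, 0]
weights = [1.0, 1.0, 1.0, 1.0, 0.0]
print(marginals(codes, weights, 3))                                # [1.0, 2.0, 1.0]
print(weighted_category_sums(codes, weights, [1, 2, 3, 4, 5], 3))  # [1.0, 5.0, 4.0]
print(expand(codes, [10, 20, 30]))                                 # [10, 20, 20, 30, 10]
```

Because the supplementary object has weight 0, it contributes nothing to the marginals or the weighted sums, yet it still receives a quantification in `expand`.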

Discretization

Discretization is done on the unweighted data.

Objective Function Optimization

Objective Function

The CATREG objective is to find the set of $y_r$, $b$, and $y_j$, $j \in J_p$, so that the function

$$\sigma(y_r; b; y_j) = \Big(G_r y_r - \sum_{j \in J_p} b_j G_j y_j\Big)' \, W \, \Big(G_r y_r - \sum_{j \in J_p} b_j G_j y_j\Big)$$

is minimal, under the normalization restriction $y_r' D_r y_r = n_w$. The quantifications of the response variable are also centered; that is, they satisfy $u' W G_r y_r = 0$, with $u$ denoting an $n$-vector with ones.
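A direct transcription of $\sigma$ can be useful when checking an implementation. The sketch below assumes the reduced storage of 0-based category codes rather than explicit indicator matrices. `catreg_loss` and its argument names are illustrative only.

```python
def catreg_loss(resp_codes, y_r, pred_codes, y_pred, b, weights):
    """sigma = (G_r y_r - sum_j b_j G_j y_j)' W (G_r y_r - sum_j b_j G_j y_j).

    resp_codes: category code of each object on the response variable
    pred_codes: one code list per predictor j in J_p
    y_pred:     one quantification vector y_j per predictor
    b:          regression coefficients b_j
    """
    loss = 0.0
    for i, w in enumerate(weights):
        # i-th element of v = sum_j b_j G_j y_j
        v_i = sum(b_j * y_j[codes[i]] for b_j, y_j, codes in zip(b, y_pred, pred_codes))
        resid = y_r[resp_codes[i]] - v_i
        loss += w * resid * resid
    return loss


# y_r = [-1, 1] with marginals [2, 2] is centered and satisfies y_r' D_r y_r = 4 = n_w.
resp = [0, 1, 0, 1]
print(catreg_loss(resp, [-1.0, 1.0], [resp], [[-1.0, 1.0]], [1.0], [1.0] * 4))  # 0.0
print(catreg_loss(resp, [-1.0, 1.0], [resp], [[-1.0, 1.0]], [0.5], [1.0] * 4))  # 1.0
```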

Optimal Scaling Levels

The following optimal scaling levels are distinguished in CATREG ($j = 1, \ldots, m$):

Nominal Equality restrictions only.

Spline Nominal  $y_j = d_j + S_j a_j$ (equality and spline restrictions).

Spline Ordinal  $y_j = d_j + S_j a_j$ (equality and monotonic spline restrictions),

with $a_j$ restricted to contain nonnegative elements (to guarantee monotonic I-splines).

Ordinal  $y_j \in C_j$ (equality and monotonicity restrictions).

The monotonicity restriction $y_j \in C_j$ means that $y_j$ must be located in the convex cone of all $k_j$-vectors with nondecreasing elements.

Numerical  $y_j \in L_j$ (equality and linearity restrictions).

The linearity restriction $y_j \in L_j$ means that $y_j$ must be located in the subspace of all $k_j$-vectors that are a linear transformation of the vector consisting of $k_j$ successive integers.

For each variable, these levels can be chosen independently. The general requirement for all options is that equal category indicators receive equal quantifications. For identification purposes, $y_j$ is always normalized so that $y_j' D_j y_j = n_w$.
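The ordinal and numerical restrictions amount to weighted least-squares projections onto $C_j$ and $L_j$. The sketch below assumes the weights are the category marginals (the diagonal of $D_j$) and that all of them are positive. For the monotone case it uses the pool-adjacent-violators algorithm. That algorithm is a standard way to project onto nondecreasing vectors, but it is not necessarily the exact variant CATREG uses. Both function names are hypothetical.

```python
def monotone_projection(y, w):
    """Weighted LS fit of y by a nondecreasing vector (pool adjacent violators).

    Assumes every weight in w is positive.
    """
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for y_c, w_c in zip(y, w):
        blocks.append([y_c, w_c, 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def linear_projection(y, w):
    """Weighted LS fit of y by alpha + beta * (1, 2, ..., k); assumes k >= 2."""
    x = range(1, len(y) + 1)
    total = sum(w)
    mx = sum(w_c * x_c for w_c, x_c in zip(w, x)) / total
    my = sum(w_c * y_c for w_c, y_c in zip(w, y)) / total
    sxy = sum(w_c * (x_c - mx) * (y_c - my) for w_c, x_c, y_c in zip(w, x, y))
    sxx = sum(w_c * (x_c - mx) ** 2 for w_c, x_c in zip(w, x))
    beta = sxy / sxx
    return [my + beta * (x_c - mx) for x_c in x]


print(monotone_projection([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))  # [1.0, 2.5, 2.5, 4.0]
print(monotone_projection([3.0, 1.0], [1.0, 3.0]))                      # [1.5, 1.5]
```

In CATREG the projected vector would then be normalized so that $y_j' D_j y_j = n_w$. That step is left out here.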

Optimization

Iteration scheme

Optimization is achieved by executing the following iteration scheme:

  1. Initialization I or II
  2. Update category quantifications response variable
  3. Update category quantifications and regression coefficients predictor variables
  4. Convergence test: repeat (2) and (3) or continue

Steps (1) through (4) are explained below.

(1) Initialization

I. Random
The initial category quantifications $\tilde{y}_j$ (for $j = 1, \ldots, m$) are defined as the $k_j$ category indicators of variable $j$, normalized such that $u' W G_j \tilde{y}_j = 0$ and $\tilde{y}_j' D_j \tilde{y}_j = n_w$, and the initial regression coefficients are the correlations with the response variable.

II. Numerical
In this case, the iteration scheme is executed twice. In the first cycle (initialized with initialization I), all variables are treated as numerical. The second cycle, with the specified scaling levels, starts with the category quantifications and regression coefficients from the first cycle.
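Initialization I can be sketched as follows. The sketch assumes categories are coded $0, \ldots, k_j - 1$ and that the category indicators are taken as the values $1, \ldots, k_j$. The function names are hypothetical. Centering and normalization need only the category marginals, because $u' W G_j y = \sum_c (D_j)_{cc}\, y_c$. Once both variables are standardized this way, $n_w^{-1} (G_r y_r)' W G_j y_j$ is their correlation, and that value serves as the initial coefficient.

```python
import math


def center_and_normalize(y, d, n_w):
    """Return y centered (sum_c d_c y_c = 0) and scaled so that y' D y = n_w."""
    mean = sum(d_c * y_c for d_c, y_c in zip(d, y)) / sum(d)
    centered = [y_c - mean for y_c in y]
    ss = sum(d_c * y_c * y_c for d_c, y_c in zip(d, centered))
    return [math.sqrt(n_w / ss) * y_c for y_c in centered]


def initial_quantifications(k, d, n_w):
    """Initialization I: the category indicators 1..k, centered and normalized."""
    return center_and_normalize([float(c) for c in range(1, k + 1)], d, n_w)


def initial_coefficient(resp_codes, y_r, codes, y_j, weights, n_w):
    """Initial b_j: the correlation n_w^{-1} (G_r y_r)' W G_j y_j."""
    return sum(w * y_r[c_r] * y_j[c_j]
               for w, c_r, c_j in zip(weights, resp_codes, codes)) / n_w


weights = [1.0] * 6
resp = [0, 1, 2, 0, 1, 2]
pred = [2, 1, 0, 2, 1, 0]          # reverses the response categories
d = [2.0, 2.0, 2.0]                # marginals of both variables
y_r = initial_quantifications(3, d, 6.0)
y_j = initial_quantifications(3, d, 6.0)
print(initial_coefficient(resp, y_r, pred, y_j, weights, 6.0))  # approximately -1.0
```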

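(2) Update category quantifications response variable

With the current quantifications $y_j$, $j \in J_p$, held fixed, the unconstrained update of $y_r$ is

$$\tilde{y}_r = D_r^{-1} G_r' W v$$

Next, $\tilde{y}_r$ is restricted according to the optimal scaling level of the response variable and normalized so that $(y_r^+)' D_r y_r^+ = n_w$, giving $y_r^+$.

(3) Update category quantifications and regression coefficients predictor variables

For updating predictor variable $j$, $j \in J_p$, first the contribution of variable $j$ is removed from $v$:

$$v_j = v - b_j G_j y_j$$

Then the unconstrained update of $y_j$ is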

$$\tilde{y}_j = D_j^{-1} G_j' W (G_r y_r - v_j)$$

Next, $\tilde{y}_j$ is restricted and normalized as in step (2) to obtain $y_j^+$.

Finally, we update the regression coefficient

$$b_j^+ = n_w^{-1} \, \tilde{y}_j' D_j y_j^+.$$
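For a predictor at the nominal level, the restriction leaves $\tilde{y}_j$ unchanged, and step (3) can be sketched as below. The sketch makes three assumptions: category codes are 0-based, every category has positive weight, and $G_r y_r$ and $v$ are already centered, so $\tilde{y}_j$ needs only rescaling. `update_nominal_predictor` is a hypothetical name.

```python
import math


def update_nominal_predictor(resp_codes, y_r, codes, y_j, b_j, v, weights, k, n_w):
    """One step-(3) update for a nominal predictor; returns (y_j_new, b_j_new, v_new)."""
    # Remove the contribution of variable j: v_j = v - b_j G_j y_j.
    v_j = [v_i - b_j * y_j[c] for v_i, c in zip(v, codes)]
    # Unconstrained update D_j^{-1} G_j' W (G_r y_r - v_j), by accumulation.
    d = [0.0] * k
    acc = [0.0] * k
    for w, c_r, c, v_ji in zip(weights, resp_codes, codes, v_j):
        d[c] += w
        acc[c] += w * (y_r[c_r] - v_ji)
    y_tilde = [a / d_c for a, d_c in zip(acc, d)]
    # Nominal level: no restriction, only normalization to y' D_j y = n_w.
    norm = math.sqrt(sum(d_c * t * t for d_c, t in zip(d, y_tilde)))
    y_new = [math.sqrt(n_w) * t / norm for t in y_tilde]
    # b_j^+ = n_w^{-1} y_tilde' D_j y_j^+
    b_new = sum(t * d_c * q for t, d_c, q in zip(y_tilde, d, y_new)) / n_w
    # Put the updated contribution of variable j back into v.
    v_new = [v_ji + b_new * y_new[c] for v_ji, c in zip(v_j, codes)]
    return y_new, b_new, v_new


y, b, v = update_nominal_predictor([0, 1, 0, 1], [-1.0, 1.0], [0, 1, 0, 1], [-1.0, 1.0],
                                   0.0, [0.0] * 4, [1.0] * 4, 2, 4.0)
print(b)  # 1.0
```

Because $y_j^+$ has the same direction as $\tilde{y}_j$, $b_j^+$ equals the least-squares coefficient of $G_r y_r - v_j$ on $G_j y_j^+$.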

(4) Convergence test

The difference between consecutive values of the squared multiple correlation coefficient,

$$R^2 = \Big( n_w^{-1/2} \, (G_r y_r)' W v \, (v' W v)^{-1/2} \Big)^2,$$

is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. Steps (2) and (3) are repeated as long as this difference exceeds $\varepsilon$.
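The convergence quantity can be computed directly from the response codes and $v$. The sketch below assumes $y_r$ is already normalized so that $(G_r y_r)' W G_r y_r = n_w$, which makes $R$ the correlation between $G_r y_r$ and $v$. The helper names are illustrative.

```python
import math


def r_squared(resp_codes, y_r, v, weights, n_w):
    """R^2 = (n_w^{-1/2} (G_r y_r)' W v (v' W v)^{-1/2})^2."""
    cross = sum(w * y_r[c] * v_i for w, c, v_i in zip(weights, resp_codes, v))
    vwv = sum(w * v_i * v_i for w, v_i in zip(weights, v))
    r = cross / math.sqrt(n_w * vwv)
    return r * r


def has_converged(r2_previous, r2_current, eps):
    """Stop once the change in R^2 between consecutive iterations is at most eps."""
    return r2_current - r2_previous <= eps


print(r_squared([0, 1, 0, 1], [-1.0, 1.0], [-0.5, 0.5, -0.5, 0.5], [1.0] * 4, 4.0))  # 1.0
```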

Diagnostics

Descriptive Statistics

The descriptives table gives the weighted univariate marginals and the weighted number of missing values (system-missing, user-defined missing, and values ≤ 0) for each variable.

Fit and error measures

The fit and the error for each iteration are reported in the History table.

Multiple R Square

$R^2$ as computed in step (4) in the last iteration.

Error  $(1 - R^2)^{1/2}$

Also, the increase in $R^2$ for each iteration is reported.

Summary Statistics

Multiple R

$$R = (R^2)^{1/2}$$

Multiple R Square

$$R^2$$

Adjusted Multiple R Square

$$1 - \frac{(1 - R^2)(n_w - 1)}{n_w - 1 - u'f}$$

with $u$ a $p$-vector of ones.
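The summary statistics follow directly from $R^2$, $n_w$, and $f$. In the sketch below the degrees of freedom `f` are passed in as hypothetical inputs, because this excerpt does not say how CATREG derives $f_j$ from a variable's scaling level. The function name is also illustrative.

```python
import math


def summary_statistics(r2, n_w, f):
    """Multiple R and adjusted R^2; f holds the degrees of freedom of the predictors."""
    u_f = sum(f)  # u'f with u a p-vector of ones
    adjusted = 1.0 - (1.0 - r2) * (n_w - 1.0) / (n_w - 1.0 - u_f)
    return math.sqrt(r2), adjusted


multiple_r, adjusted = summary_statistics(0.5, 11.0, [1, 1])
print(round(multiple_r, 4), round(adjusted, 4))  # 0.7071 0.375
```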

ANOVA Table

$$\begin{array}{lccc}
 & \text{Sum of Squares} & \text{df} & \text{Mean Sum of Squares} \\
\text{Regression} & n_w R^2 & u'f & n_w R^2 \, (u'f)^{-1} \\
\text{Residual} & n_w (1 - R^2) & n_w - 1 - u'f & n_w (1 - R^2)(n_w - 1 - u'f)^{-1}
\end{array}$$

$$F = \mathrm{MS}_{reg} / \mathrm{MS}_{res}$$
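The ANOVA table can be assembled from $R^2$, $n_w$, and $u'f$ alone. As in the summary statistics, the degrees-of-freedom vector `f` is a hypothetical input, and `anova_table` is an illustrative name.

```python
def anova_table(r2, n_w, f):
    """Regression/residual rows (SS, df, MS) and F = MS_reg / MS_res."""
    u_f = sum(f)                      # regression df: u'f
    df_res = n_w - 1.0 - u_f          # residual df: n_w - 1 - u'f
    ss_reg, ss_res = n_w * r2, n_w * (1.0 - r2)
    ms_reg, ms_res = ss_reg / u_f, ss_res / df_res
    return {
        "regression": (ss_reg, u_f, ms_reg),
        "residual": (ss_res, df_res, ms_res),
        "F": ms_reg / ms_res,
    }


table = anova_table(0.5, 11.0, [1, 1])
print(table["F"])  # 4.0
```

The two sums of squares always add up to $n_w$, because the transformed response is normalized to $(G_r y_r)' W G_r y_r = n_w$.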

F-value

$$F_j = \big( \mathrm{Beta}_j \, / \, \mathrm{SE}(\mathrm{Beta}_j) \big)^2$$

Zero-order correlation

Correlations between the transformed response variable $G_r y_r$ and the transformed predictor variables $G_j y_j$:

$$r_{rj} = n_w^{-1} (G_r y_r)' W G_j y_j$$
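Since both transformed variables are centered and normalized to $y' D y = n_w$, $r_{rj}$ is simply an ordinary weighted Pearson correlation. The sketch below checks this numerically on a small made-up example. Both function names are hypothetical.

```python
import math


def zero_order(resp_codes, y_r, codes, y_j, weights, n_w):
    """r_rj = n_w^{-1} (G_r y_r)' W G_j y_j."""
    return sum(w * y_r[a] * y_j[c] for w, a, c in zip(weights, resp_codes, codes)) / n_w


def weighted_pearson(x, y, weights):
    """Ordinary weighted Pearson correlation, for comparison."""
    total = sum(weights)
    mx = sum(w * a for w, a in zip(weights, x)) / total
    my = sum(w * b for w, b in zip(weights, y)) / total
    sxy = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    sxx = sum(w * (a - mx) ** 2 for w, a in zip(weights, x))
    syy = sum(w * (b - my) ** 2 for w, b in zip(weights, y))
    return sxy / math.sqrt(sxx * syy)


resp, codes = [0, 0, 1, 1], [0, 0, 0, 1]
y_r = [-1.0, 1.0]                                 # marginals [2, 2]
y_j = [-1 / math.sqrt(3), math.sqrt(3)]           # marginals [3, 1]; y' D y = 4
x = [y_r[c] for c in resp]
y = [y_j[c] for c in codes]
print(zero_order(resp, y_r, codes, y_j, [1.0] * 4, 4.0), weighted_pearson(x, y, [1.0] * 4))
```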

Partial correlation

$$\mathrm{PartialCorr}_j = b_j \left( \frac{1}{t_j} (1 - R^2) + b_j^2 \right)^{-1/2}$$

with $t_j$ the tolerance for variable $j$ (see below).

Part correlation

$$\mathrm{PartCorr}_j = b_j \, t_j^{1/2}$$

with $t_j$ the tolerance for variable $j$ (see below).
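The two formulas are linked by the familiar identity $\mathrm{PartialCorr}_j^2 = \mathrm{PartCorr}_j^2 / (1 - R^2 + \mathrm{PartCorr}_j^2)$, which follows by substituting $b_j^2 t_j$ for the squared part correlation. The sketch below checks that identity on made-up values of $b_j$, $t_j$, and $R^2$. The function names are illustrative.

```python
import math


def part_correlation(b_j, t_j):
    """PartCorr_j = b_j t_j^{1/2}."""
    return b_j * math.sqrt(t_j)


def partial_correlation(b_j, t_j, r2):
    """PartialCorr_j = b_j ((1 - R^2) / t_j + b_j^2)^{-1/2}."""
    return b_j / math.sqrt((1.0 - r2) / t_j + b_j ** 2)


part = part_correlation(0.5, 0.8)
partial = partial_correlation(0.5, 0.8, 0.4)
print(round(part, 4), round(partial, 4))  # 0.4472 0.5
```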

Importance

Pratt’s measure of relative importance (Pratt, 1987)

$$\mathrm{Imp}_j = b_j \, r_{rj} \, / \, R^2$$
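Suppose $b$ holds the least-squares coefficients of the standardized predictors. Then $b = R_p^{-1} r$ and $R^2 = \sum_j b_j r_{rj}$, so Pratt's importances sum to one. The sketch below illustrates this with two made-up predictors, using the closed-form two-predictor coefficients. `pratt_importance` is a hypothetical name.

```python
def pratt_importance(b, r_resp, r2):
    """Imp_j = b_j r_rj / R^2 for each predictor j."""
    return [b_j * r_j / r2 for b_j, r_j in zip(b, r_resp)]


# Made-up correlations: predictors with the response, and predictor 1 with predictor 2.
r_resp = [0.6, 0.4]
r12 = 0.2
b = [(r_resp[0] - r12 * r_resp[1]) / (1 - r12 ** 2),
     (r_resp[1] - r12 * r_resp[0]) / (1 - r12 ** 2)]
r2 = sum(b_j * r_j for b_j, r_j in zip(b, r_resp))
imp = pratt_importance(b, r_resp, r2)
print([round(x, 4) for x in imp])  # [0.7358, 0.2642]
```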

Tolerance

The tolerance for the optimally scaled predictor variables is given by

$$t_j = \left( r_p^{jj} \right)^{-1},$$

with $r_p^{jj}$ the $j$th diagonal element of $R_p^{-1}$, where $R_p$ is the correlation matrix of predictors that have regression coefficients > 0.

The tolerance for the original predictor variables is also reported and is computed in the same way, using the correlation matrix for the original predictor variables, discretized, imputed, and listwise deleted, if applicable.
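The tolerance $1 / (R_p^{-1})_{jj}$ equals $1 - R_j^2$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors. With two predictors this reduces to $1 - r^2$. The sketch below computes it with a small Gauss-Jordan inverse written in pure Python for illustration only. In practice a linear-algebra library would be used. Function names are hypothetical.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tolerances(corr):
    """t_j = 1 / r_p^{jj}, with r_p^{jj} the j-th diagonal element of R_p^{-1}."""
    inv = invert(corr)
    return [1.0 / inv[j][j] for j in range(len(corr))]


print([round(t, 4) for t in tolerances([[1.0, 0.6], [0.6, 1.0]])])  # [0.64, 0.64]
```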

Quantifications

The quantifications are $y_j$, $j = 1, \ldots, m$.

Predicted and residual values

There is an option to save the predicted values $v$ and the residual values $G_r y_r - v$.

Supplementary objects

For supplementary objects, predicted and residual values are computed. The category indicators of supplementary objects are replaced by the quantification of the category. If a category is used only by supplementary objects, the category indicator is replaced by a system-missing value.
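The replacement rule for supplementary objects can be sketched as follows. The sketch assumes 0-based category codes and a set of supplementary row indices, and it uses Python's `None` as a stand-in for a system-missing value. The function name is hypothetical.

```python
def quantify_supplementary(codes, y, supplementary):
    """Map each supplementary object's category to its quantification.

    codes: 0-based category of every object; y: quantifications y_j;
    supplementary: set of row indices of supplementary objects.
    None stands in for a system-missing value.
    """
    used_by_analysis = {c for i, c in enumerate(codes) if i not in supplementary}
    return {i: (y[codes[i]] if codes[i] in used_by_analysis else None)
            for i in sorted(supplementary)}


codes = [0, 1, 1, 2]  # rows 2 and 3 are supplementary; category 2 occurs only in row 3
print(quantify_supplementary(codes, [-1.0, 1.0, 0.5], {2, 3}))  # {2: 1.0, 3: None}
```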

Residual Plots

The residual plot for predictor variable $j$ displays two sets of points: unnormalized quantifications ($b_j y_j$) against category indicators, and residuals when the response variable is predicted from all predictor variables except variable $j$ ($G_r y_r - (v - b_j G_j y_j)$) against category indicators.
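The two point sets of the residual plot can be computed as below. The sketch assumes 0-based category codes and leaves the actual plotting out. `residual_plot_points` is a hypothetical name.

```python
def residual_plot_points(resp_codes, y_r, codes, y_j, b_j, v):
    """Points for the residual plot of predictor j.

    Returns (b_j * y_j per category, residuals G_r y_r - (v - b_j G_j y_j) per object);
    both are plotted against the category indicators of variable j.
    """
    unnormalized = [b_j * q for q in y_j]
    residuals = [y_r[c_r] - (v_i - b_j * y_j[c])
                 for c_r, c, v_i in zip(resp_codes, codes, v)]
    return unnormalized, residuals


# With a single predictor, v = b_j G_j y_j, so the residuals reduce to G_r y_r.
quant, resid = residual_plot_points([0, 1, 0, 1], [-1.0, 1.0], [0, 1, 0, 1],
                                    [-1.0, 1.0], 0.5, [-0.5, 0.5, -0.5, 0.5])
print(quant, resid)  # [-0.5, 0.5] [-1.0, 1.0, -1.0, 1.0]
```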

References

Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. 1972. Statistical inference under order restrictions. New York: John Wiley & Sons, Inc.