






CATREG is a statistical procedure that quantifies categorical variables using optimal scaling, yielding an optimal linear regression equation for the transformed variables. It allows mixed optimal scaling levels and makes no distributional assumptions about the variables. These notes cover the algorithm, assumptions, and output of the CATREG procedure.







CATREG (categorical regression with optimal scaling using alternating least squares) quantifies categorical variables using optimal scaling, resulting in an optimal linear regression equation for the transformed variables. The variables can be given mixed optimal scaling levels, and no distributional assumptions about the variables are made.
The following notation is used throughout this chapter unless otherwise stated:
$n$  Number of analysis cases (objects)
$n_w$  Weighted number of analysis cases, that is, the sum of the weights over the analysis cases: $n_w = \sum_{i=1}^{n} w_i$
$n_{tot}$  Total number of cases (analysis + supplementary)
$w_i$  Weight of object $i$; $w_i = 1$ if cases are unweighted; $w_i = 0$ if object $i$ is supplementary.
$W$  Diagonal $n_{tot} \times n_{tot}$ matrix, with $w_i$ on the diagonal.
$p$  Number of predictor variables
$m$  Total number of variables
$r$  Index of response variable
$J_p$  Index set of predictor variables
$H$  The data matrix (category indicators), of order $n_{tot} \times m$, after discretization, imputation of missing values, and listwise deletion, if applicable.

For variable $j$, $j = 1, \ldots, m$:
$k_j$  Number of categories of variable $j$ (number of distinct values in $h_j$, thus including supplementary objects)
$G_j$  Indicator matrix for variable $j$, of order $n_{tot} \times k_j$

The elements of $G_j$ are defined, for $i = 1, \ldots, n_{tot}$ and $r = 1, \ldots, k_j$, as
$$g_{(j)ir} = \begin{cases} 1 & \text{when the } i\text{th object is in the } r\text{th category of variable } j \\ 0 & \text{when the } i\text{th object is not in the } r\text{th category of variable } j \end{cases}$$

$D_j$  Diagonal $k_j \times k_j$ matrix containing the weighted univariate marginals, i.e., the weighted column sums of $G_j$ ($D_j = G_j' W G_j$)
$f$  Vector of degrees of freedom for the predictor variables, of order $p$
$S_j$  I-spline basis for variable $j$, of order $k_j \times (s_j + t_j)$ (see Ramsay (1988) for details)
$a_j$  Spline coefficient vector, of order $s_j + t_j$
$d_j$  Spline intercept
$s_j$  Degree of polynomial
$t_j$  Number of interior knots
The quantification matrices and parameter vectors are:
$b$  Regression coefficients for the predictor variables, of order $p$
$v$  Accumulated contributions of predictor variables: $v = \sum_{j \in J_p} b_j G_j y_j$
Note: The matrices $W$, $G_j$, and $D_j$ are exclusively notational devices; they are stored in reduced form, and the program fully profits from their sparseness by replacing matrix multiplications with selective accumulation.
Discretization is done on the unweighted data.
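As a small illustration of the note on sparseness, the weighted marginals on the diagonal of $D_j$ can be accumulated directly from a column of category codes without ever forming $G_j$. This is only a sketch under stated assumptions: the function names, the 0-based integer codes, and the sample data are hypothetical and are not part of the CATREG program.

```python
def weighted_marginals(codes, weights, k):
    """Diagonal of D_j = G_j' W G_j, accumulated without forming G_j."""
    d = [0.0] * k
    for c, w in zip(codes, weights):
        d[c] += w  # a supplementary object (w = 0) adds nothing
    return d


def indicator_matrix(codes, k):
    """Explicit n_tot x k_j indicator matrix G_j, built for comparison only."""
    return [[1 if c == r else 0 for r in range(k)] for c in codes]


codes = [0, 2, 1, 2, 0]    # hypothetical 0-based category codes of one variable
weights = [1, 1, 2, 1, 0]  # the last object is supplementary
d = weighted_marginals(codes, weights, 3)

# The same diagonal computed from the dense product G' W G
G = indicator_matrix(codes, 3)
dense = [sum(w * row[r] * row[r] for row, w in zip(G, weights)) for r in range(3)]
n_w = sum(weights)  # weighted number of analysis cases
```

Both routes give the same marginals, and their sum equals $n_w$ because every analysis case falls in exactly one category.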
Objective Function
The CATREG objective is to find the set of $y_r$, $b$, and $y_j$, $j \in J_p$, so that the function
$$\sigma(y_r; b; y_j) = \Big( G_r y_r - \sum_{j \in J_p} b_j G_j y_j \Big)' W \Big( G_r y_r - \sum_{j \in J_p} b_j G_j y_j \Big)$$
is minimal, under the normalization restriction $y_r' D_r y_r = n_w$. The quantifications of the response variable are also centered; that is, they satisfy $u' W G_r y_r = 0$, with $u$ denoting an $n_{tot}$-vector of ones.
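The objective can be evaluated object by object, because multiplying an indicator matrix by a quantification vector simply assigns each object the quantification of its category. The following is a minimal sketch, assuming hypothetical 0-based category codes and toy data; the function name `loss` is illustrative only.

```python
def loss(resp_codes, y_r, pred_codes, y_preds, b, weights):
    """Weighted squared distance between G_r y_r and v = sum_j b_j G_j y_j."""
    total = 0.0
    for i, w in enumerate(weights):
        z_i = y_r[resp_codes[i]]  # i-th element of G_r y_r
        v_i = sum(b_j * y_j[codes[i]]
                  for b_j, y_j, codes in zip(b, y_preds, pred_codes))
        total += w * (z_i - v_i) ** 2
    return total


# Toy data: one predictor whose categories coincide with the response categories
resp_codes = [0, 1, 0, 1]
y_r = [-1.0, 1.0]            # satisfies y_r' D_r y_r = n_w = 4 and is centered
pred_codes = [[0, 1, 0, 1]]
y_preds = [[-1.0, 1.0]]
weights = [1, 1, 1, 1]

perfect = loss(resp_codes, y_r, pred_codes, y_preds, [1.0], weights)
shrunken = loss(resp_codes, y_r, pred_codes, y_preds, [0.5], weights)
```

With a coefficient of 1 the predictor reproduces the response exactly and the loss is zero; halving the coefficient leaves a residual of 0.5 for each of the four objects.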
Optimal Scaling Levels
The following optimal scaling levels are distinguished in CATREG ($j = 1, \ldots, m$):
Nominal  Equality restrictions only.
Spline Nominal  $y_j = d_j + S_j a_j$ (equality and spline restrictions).
Spline Ordinal  $y_j = d_j + S_j a_j$ (equality and monotonic spline restrictions), with $a_j$ restricted to contain nonnegative elements (to guarantee monotonic I-splines).
Ordinal  $y_j \in C_j$ (equality and monotonicity restrictions).
The monotonicity restriction $y_j \in C_j$ means that $y_j$ must be located in the convex cone of all $k_j$-vectors with nondecreasing elements.
Numerical  $y_j \in L_j$ (equality and linearity restrictions).
The linearity restriction $y_j \in L_j$ means that $y_j$ must be located in the subspace of all $k_j$-vectors that are a linear transformation of the vector consisting of $k_j$ successive integers.
For each variable, these levels can be chosen independently. The general requirement for all options is that equal category indicators receive equal quantifications. For identification purposes, $y_j$ is always normalized so that $y_j' D_j y_j = n_w$.
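To make the ordinal restriction and the normalization concrete, the sketch below projects a vector onto the cone of nondecreasing vectors with a weighted pool-adjacent-violators pass and then rescales it. Using pool-adjacent-violators here is an assumption (it is a standard way to compute weighted monotone regression, not something these notes specify), and the function names and data are hypothetical. The sketch also assumes that every category has a positive weighted marginal.

```python
def monotone(y, d):
    """Weighted pool-adjacent-violators: nearest nondecreasing vector to y."""
    blocks = []  # each block holds [weighted mean, total weight, category count]
    for y_k, d_k in zip(y, d):
        blocks.append([y_k, d_k, 1])
        # Merge backwards while the last two blocks violate the ordering
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            w = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / w, w, c1 + c2])
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out


def normalize(y, d):
    """Center and scale y so that sum(d * y) = 0 and sum(d * y^2) = n_w."""
    n_w = sum(d)
    mean = sum(d_k * y_k for d_k, y_k in zip(d, y)) / n_w
    centered = [y_k - mean for y_k in y]
    ss = sum(d_k * c_k * c_k for d_k, c_k in zip(d, centered))
    scale = (n_w / ss) ** 0.5
    return [c_k * scale for c_k in centered]


d = [1, 1, 1, 1]                        # weighted marginals (diagonal of D_j)
restricted = monotone([1.0, 3.0, 2.0, 4.0], d)
y_j = normalize(restricted, d)
```

The two middle categories violate the ordering and are pooled to their weighted mean, after which the result is centered and scaled to the required norm.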
Optimization
Iteration scheme
Optimization is achieved by executing the following iteration scheme:
(1) Initialization
I. Random
The initial category quantifications $\tilde{y}_j$ (for $j = 1, \ldots, m$) are defined as the $k_j$ category indicators of variable $j$, normalized such that $u' W G_j \tilde{y}_j = 0$ and $\tilde{y}_j' D_j \tilde{y}_j = n_w$, and the initial regression coefficients are the correlations with the response variable.
II. Numerical
In this case, the iteration scheme is executed twice. In the first cycle (initialized with initialization I), all variables are treated as numerical. The second cycle, with the specified scaling levels, starts with the category quantifications and regression coefficients from the first cycle.
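Initialization I can be sketched as follows: take the category indicators, center and scale them with the weighted marginals, and use the weighted correlation with the response as the starting coefficient. The code assumes, hypothetically, that the category indicators are the consecutive integers 1 to $k_j$ stored as 0-based codes; all names and data are illustrative.

```python
def marginals(codes, weights, k):
    """Weighted category counts (diagonal of D_j)."""
    d = [0.0] * k
    for c, w in zip(codes, weights):
        d[c] += w
    return d


def initial_quantification(codes, weights, k):
    """Category indicators 1..k, centered and scaled to weighted norm n_w."""
    d = marginals(codes, weights, k)
    n_w = sum(d)
    raw = [float(r + 1) for r in range(k)]
    mean = sum(d_r * x for d_r, x in zip(d, raw)) / n_w
    centered = [x - mean for x in raw]
    scale = (n_w / sum(d_r * c * c for d_r, c in zip(d, centered))) ** 0.5
    return [c * scale for c in centered]


weights = [1, 1, 1, 1]
resp_codes = [0, 1, 2, 3]   # response with four categories
pred_codes = [0, 0, 1, 1]   # predictor with two categories
y_r = initial_quantification(resp_codes, weights, 4)
y_j = initial_quantification(pred_codes, weights, 2)

# Both transformed variables are centered with weighted variance 1, so the
# correlation reduces to a weighted mean of products.
n_w = sum(weights)
b_j = sum(w * y_r[cr] * y_j[cj]
          for w, cr, cj in zip(weights, resp_codes, pred_codes)) / n_w
```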
(2) Update category quantifications of the response variable
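(3) Update category quantifications and regression coefficients of the predictor variables
For predictor variable $j$, let $v_j = v - b_j G_j y_j$ denote the accumulated contributions with the contribution of variable $j$ removed. Then the unconstrained update of $y_j$ is
$$\tilde{y}_j = D_j^{-1} G_j' W (G_r y_r - v_j)$$
Next $\tilde{y}_j$ is restricted and normalized as in step (2) to obtain $y_j^+$.
Finally, we update the regression coefficient
$$b_j^+ = n_w^{-1} \tilde{y}_j' D_j y_j^+$$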
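One pass of the predictor update can be sketched as below. Several points are assumptions made for illustration: $v_j$ is taken to be $v$ minus the predictor's own contribution $b_j G_j y_j$ (consistent with the residuals used for the residual plots), the normalization is a pure rescaling to $y_j' D_j y_j = n_w$, and the optional `restrict` callback stands in for whatever scaling-level restriction applies (none for the nominal level). Names and data are hypothetical.

```python
def update_predictor(codes_j, y_j, b_j, z, v, weights, restrict=None):
    """Return updated (y_j, b_j, v) for one predictor; z holds G_r y_r."""
    k = len(y_j)
    # Remove this predictor's current contribution from v
    v_minus = [v_i - b_j * y_j[c] for v_i, c in zip(v, codes_j)]
    num = [0.0] * k
    d = [0.0] * k
    for c, w, z_i, vm_i in zip(codes_j, weights, z, v_minus):
        num[c] += w * (z_i - vm_i)  # accumulates G_j' W (G_r y_r - v_j)
        d[c] += w                   # accumulates the diagonal of D_j
    y_tilde = [num[r] / d[r] for r in range(k)]
    y_star = restrict(y_tilde, d) if restrict else y_tilde
    n_w = sum(d)
    scale = (n_w / sum(d_r * y * y for d_r, y in zip(d, y_star))) ** 0.5
    y_plus = [y * scale for y in y_star]
    b_plus = sum(d_r * yt * yp for d_r, yt, yp in zip(d, y_tilde, y_plus)) / n_w
    v_new = [vm_i + b_plus * y_plus[c] for vm_i, c in zip(v_minus, codes_j)]
    return y_plus, b_plus, v_new


z = [-1.0, -1.0, 1.0, 1.0]  # transformed response, centered, norm n_w = 4
codes_j = [0, 1, 1, 1]      # hypothetical predictor with two categories
weights = [1, 1, 1, 1]
y_plus, b_plus, v_new = update_predictor(
    codes_j, [0.0, 0.0], 0.0, z, [0.0] * 4, weights)
```

Because the single predictor is updated against a zero starting contribution, the new coefficient equals the correlation between the transformed response and the transformed predictor.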
(4) Convergence test
The difference between consecutive values of the squared multiple regression coefficient,
$$R^2 = n_w^{-1} \big( (G_r y_r)' W v \big)^2 (v' W v)^{-1},$$
is compared with the user-specified convergence criterion $\varepsilon$, a small positive number. Steps (2) and (3) are repeated as long as the loss difference exceeds $\varepsilon$.
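The convergence test can be sketched as a squared weighted correlation between the transformed response and the accumulated predictor contributions. This assumes both vectors are centered and that the response has weighted norm $n_w$, as required by the normalization restrictions; the default value of `eps` and the function names are hypothetical.

```python
def r_squared(z, v, weights):
    """Squared weighted correlation between z = G_r y_r and v."""
    n_w = sum(weights)
    zv = sum(w * z_i * v_i for w, z_i, v_i in zip(weights, z, v))
    vv = sum(w * v_i * v_i for w, v_i in zip(weights, v))
    return zv * zv / (n_w * vv)


def keep_iterating(r2_previous, r2_current, eps=1e-5):
    """True while the gain in R^2 still exceeds the convergence criterion."""
    return r2_current - r2_previous > eps


z = [-1.0, -1.0, 1.0, 1.0]
v = [-1.0, 1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
weights = [1, 1, 1, 1]
r2 = r_squared(z, v, weights)
```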
Descriptive Statistics
The Descriptives table gives the weighted univariate marginals and the weighted number of missing values (system-missing, user-defined missing, and values $\le 0$) for each variable.
Fit and error measures
The fit and the error for each iteration are reported in the History table.
Multiple R Square
$R^2$ as computed in step (4) in the last iteration.
Error
$(1 - R^2)^{1/2}$
Also, the increase in $R^2$ for each iteration is reported.
Summary Statistics
Multiple R
$R = (R^2)^{1/2}$
Multiple R Square
$R^2$
Adjusted Multiple R Square
$1 - (1 - R^2)(n_w - 1)(n_w - 1 - u'f)^{-1}$
with $u$ a $p$-vector of ones.
ANOVA Table

             Sum of Squares     df                 Mean Sum of Squares
Regression   $n_w R^2$          $u'f$              $n_w R^2 (u'f)^{-1}$
Residual     $n_w (1 - R^2)$    $n_w - 1 - u'f$    $n_w (1 - R^2)(n_w - 1 - u'f)^{-1}$

$F = MS_{reg} / MS_{res}$
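The summary and ANOVA quantities follow directly from $R^2$, $n_w$, and the degrees-of-freedom vector $f$. The sketch below uses hypothetical function names and made-up input values purely to show the arithmetic.

```python
def anova_table(r2, n_w, f):
    """Sums of squares, degrees of freedom, mean squares, and F."""
    df_reg = sum(f)                 # u'f
    df_res = n_w - 1 - df_reg
    ss_reg = n_w * r2
    ss_res = n_w * (1 - r2)
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res
    return {
        "ss_reg": ss_reg, "df_reg": df_reg, "ms_reg": ms_reg,
        "ss_res": ss_res, "df_res": df_res, "ms_res": ms_res,
        "F": ms_reg / ms_res,
    }


def adjusted_r_squared(r2, n_w, f):
    """1 - (1 - R^2)(n_w - 1) / (n_w - 1 - u'f)."""
    return 1 - (1 - r2) * (n_w - 1) / (n_w - 1 - sum(f))


# Made-up inputs: two predictors using 2 and 3 degrees of freedom
table = anova_table(r2=0.5, n_w=21, f=[2, 3])
adj = adjusted_r_squared(r2=0.5, n_w=21, f=[2, 3])
```

For these inputs the regression and residual sums of squares are both 10.5, the degrees of freedom are 5 and 15, and the F ratio is approximately 3.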
F-value
$F_j = \big( \mathrm{Beta}_j / SE(\mathrm{Beta}_j) \big)^2$
Zero-order correlation
Correlations between the transformed response variable $G_r y_r$ and the transformed predictor variables $G_j y_j$:
$r_{rj} = n_w^{-1} (G_r y_r)' W G_j y_j$
Partial correlation
$\mathrm{PartialCorr}_j = b_j \big( (1/t_j)(1 - R^2) + b_j^2 \big)^{-1/2}$
with $t_j$ the tolerance for variable $j$ (see below).
Part correlation
$\mathrm{PartCorr}_j = b_j t_j^{1/2}$
with $t_j$ the tolerance for variable $j$ (see below).
Importance
Pratt’s measure of relative importance (Pratt, 1987)
$\mathrm{Imp}_j = b_j r_{rj} / R^2$
Tolerance
The tolerance for the optimally scaled predictor variables is given by
$t_j = \big( r_p^{jj} \big)^{-1}$,
with $r_p^{jj}$ the $j$th diagonal element of $R_p^{-1}$, where $R_p$ is the correlation matrix of predictors that have regression coefficients > 0.
The tolerance for the original predictor variables is also reported and is computed in the same way, using the correlation matrix for the original predictor variables, discretized, imputed, and listwise deleted, if applicable.
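The coefficient-level statistics can be computed together once the predictor correlation matrix is available. The sketch below takes the tolerance as the reciprocal of the corresponding diagonal element of the inverse correlation matrix, which is the usual definition and an assumption about the intended formula. The small Gauss-Jordan inverse, the function names, and the numbers are all hypothetical; the zero-order correlations were chosen to be consistent with the coefficients, so that $R^2 = \sum_j b_j r_{rj}$.

```python
def inverse(a):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def coefficient_statistics(b, corr_pred, r_zero, r2):
    """Tolerance, partial and part correlation, and Pratt importance."""
    inv = inverse(corr_pred)
    rows = []
    for j, b_j in enumerate(b):
        t_j = 1.0 / inv[j][j]
        rows.append({
            "tolerance": t_j,
            "partial": b_j * ((1.0 - r2) / t_j + b_j ** 2) ** -0.5,
            "part": b_j * t_j ** 0.5,
            "importance": b_j * r_zero[j] / r2,
        })
    return rows


corr_pred = [[1.0, 0.6], [0.6, 1.0]]  # correlations among transformed predictors
b = [0.5, 0.3]                        # standardized regression coefficients
r_zero = [0.68, 0.60]                 # zero-order correlations with the response
r2 = sum(b_j * r_j for b_j, r_j in zip(b, r_zero))
stats = coefficient_statistics(b, corr_pred, r_zero, r2)
```

With two predictors correlated 0.6, both tolerances are 0.64, and the importance values sum to one because $R^2$ is the sum of the products $b_j r_{rj}$.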
Quantifications
The quantifications are $y_j$, $j = 1, \ldots, m$.
Predicted and residual values
There is an option to save the predicted values $v$ and the residual values $G_r y_r - v$.
Supplementary objects
For supplementary objects, predicted and residual values are computed. The category indicators of supplementary objects are replaced by the quantification of the category. If a category is only used by supplementary objects, the category indicator is replaced by a system-missing value.
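The handling of supplementary objects can be sketched as a lookup that returns a missing marker whenever a category has no weight from analysis cases. In this sketch `None` stands in for the system-missing value, and the function name, 0-based codes, and data are hypothetical.

```python
def transformed_column(codes, weights, y):
    """Replace each category code by its quantification, or None if the
    category is used only by supplementary objects (zero weighted marginal)."""
    d = [0.0] * len(y)
    for c, w in zip(codes, weights):
        d[c] += w
    return [y[c] if d[c] > 0 else None for c in codes]


codes = [0, 1, 0, 2]         # the last object falls in a category of its own
weights = [1, 1, 1, 0]       # and is supplementary
y = [-1.0, 1.0, 0.3]         # quantifications; the third one is not identified
column = transformed_column(codes, weights, y)
```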
Residual Plots
The residual plot for predictor variable $j$ displays two sets of points: unnormalized quantifications ($b_j y_j$) against category indicators, and residuals when the response variable is predicted from all predictor variables except variable $j$ ($G_r y_r - (v - b_j G_j y_j)$) against category indicators.
Barlow, R. E., Bartholomew, D. J., Bremner, J. M., and Brunk, H. D. 1972. Statistical inference under order restrictions. New York: John Wiley & Sons, Inc.