









Main points of this lecture are: Representing Data, Repmat, Few Random Things, Clear Memory, Linear Models, Vectorizing, Linear Regression










Roland Memisevic
January 25, 2007
- Working with MATLAB.
- Vectors, matrices, operations.
- Control structures.
- Scripts and functions.
- Plotting.
- Slicing/Logical indexing.
- Use 'repmat' to build new matrices from smaller ones. Example:

    x = [1;2]
    repmat(x,1,3) = [1 1 1
                     2 2 2]
    repmat(x,2,2) = [1 1
                     2 2
                     1 1
                     2 2]

- Used mostly to 'vectorize' things (more in a moment...)
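The shapes in the repmat example can be checked at the MATLAB prompt. This is only a minimal sketch; the variable names A and B are illustrative and not part of the lecture.

```matlab
x = [1; 2];            % 2x1 column vector
A = repmat(x, 1, 3);   % tile x once vertically, 3 times horizontally: 2x3
B = repmat(x, 2, 2);   % tile x twice vertically, twice horizontally: 4x2
disp(A);               % show the tiled matrices
disp(B);
```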
- We often deal with lots of data, and normally that data is given in the form of real-valued vectors. If not, we can often massage it accordingly (by performing feature extraction).
- It is convenient to use matrices to store data, for example by stacking the vectors column-wise:

    $X = (x_1 \; x_2 \; \dots \; x_N)$

- To display up to 3-d data given in this form, you could use

    scatter(X(1,:),X(2,:))           % display 2-d dataset
    scatter3(X(1,:),X(2,:),X(3,:))   % display 3-d dataset
- Many machine learning methods are based on simple 'neurons' computing:

    $w^T x$

- Perceptrons, back-prop networks, support vector machines, logistic regression, PCA, (nearest neighbors), ...
- Applying the linear function to data points stacked column-wise in a matrix X is simply $Y = w^T X$. In MATLAB:

    Y = w'*X;

- In MATLAB, writing things in matrix form can be faster than using loops. This is referred to as 'vectorization'.
- Another example: suppose you want to mean-center a set of vectors stored in X. Instead of

    m = mean(X,2);
    for i = 1 : size(X,2)
        X(:,i) = X(:,i) - m;
    end

  we could write:

    X = X - repmat(mean(X,2),1,size(X,2));
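To see that the loop and the repmat version of mean centering agree, one could run both on the same data and compare. The sketch below assumes a small random X; the names Xloop, Xvec and maxdiff are made up for this illustration.

```matlab
X = randn(3, 5);                 % 5 example vectors of dimension 3, stacked column-wise
m = mean(X, 2);                  % 3x1 mean vector (average over the columns)

% Loop version: subtract the mean from one column at a time
Xloop = X;
for i = 1 : size(X, 2)
    Xloop(:, i) = Xloop(:, i) - m;
end

% Vectorized version: tile the mean into a 3x5 matrix and subtract once
Xvec = X - repmat(m, 1, size(X, 2));

maxdiff = max(abs(Xloop(:) - Xvec(:)));   % largest entry-wise difference
```

Both versions subtract the same mean from every column, so maxdiff should be zero up to round-off.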
- Unfortunately, in reality we don't know w...
- Given only a set of examples $y_i$ and $x_i$, what would be a reasonable guess for w?
- Standard approach: minimize the sum of squared errors

    $E(w) = \sum_i (w^T x_i - y_i)^2$
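With the data stacked column-wise, the sum of squared errors can be evaluated in one vectorized line. The numbers below are toy values chosen only for illustration, and the variable names residuals and E are assumptions of this sketch.

```matlab
X = [1 2 3; 1 1 1];       % toy inputs: 3 points of dimension 2, stacked column-wise
Y = [1 3 4];              % toy targets, one per column of X
w = [2; -1];              % a candidate weight vector

residuals = w' * X - Y;   % 1x3 row holding w'*x_i - y_i for every i
E = sum(residuals .^ 2);  % sum of squared errors; here E = 1
```

For these values w'*X is [1 3 5], so only the third point contributes an error.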
- There are many ways of finding the w that minimizes $E(w)$.
- One very good way is this:
    - Start with a random guess $w_{init}$.
    - Follow the negative gradient $-\frac{\partial E}{\partial w}$, until the error stops changing a lot.
- The gradient is $\frac{\partial E}{\partial w} = 2 \sum_i (w^T x_i - y_i)\, x_i$.
- So we just have to iterate:

    $w \leftarrow w - \epsilon \, 2 \sum_i (w^T x_i - y_i)\, x_i,$

  where $\epsilon$ is a small learning rate, without which we will overshoot the minimum.
- With vectorization, learning takes about 5 lines in MATLAB:

    for iteration = 1 : 5000   % in practice: until stopping criterion satisfied
        grad = 2*sum(repmat(w'*X-Y,size(X,1),1).*X,2);
        w = w - epsilon * grad;
        err = sum((Y - w'*X).^2)   % just to check
    end
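The loop above needs X, Y, w and epsilon to exist already. A self-contained way to try it is sketched below. The synthetic data, the known vector w_true and the value of epsilon are assumptions made for this illustration, not part of the lecture.

```matlab
% Synthetic data (assumed for this sketch): noiseless targets from a known w_true
N = 100;
X = randn(2, N);               % inputs stacked column-wise
w_true = [1.5; -0.5];          % hypothetical "unknown" weights
Y = w_true' * X;               % 1xN row of targets

w = randn(2, 1);               % random initial guess
epsilon = 0.001;               % small learning rate, hand-picked for this data

for iteration = 1 : 5000
    % gradient of the sum of squared errors, as in the lecture
    grad = 2 * sum(repmat(w' * X - Y, size(X, 1), 1) .* X, 2);
    w = w - epsilon * grad;
end

err = sum((Y - w' * X) .^ 2);  % final sum of squared errors
disp(w');                      % should be close to w_true'
```

Because the targets are noiseless, w should end up very close to w_true. The same gradient can also be written without repmat as 2 * X * (w' * X - Y)'.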
- Classification: we are given some data with associated class labels, and want to learn the underlying function that maps inputs to labels.
- Examples: spam filtering, face recognition, intrusion detection, ... and many, many more.
- For now we consider only binary problems (2 classes). Most ideas can be quite easily extended to more than 2.
- Again, we will use a linear model. In particular:

    $f(x) = \mathrm{sgn}(w^T x)$

- This means that we try to separate classes using a hyper-plane.
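As a small illustration of the hyper-plane picture, the sketch below classifies three hand-picked 2-d points with a hand-picked w. All values are assumptions chosen for this example.

```matlab
w = [1; -1];               % normal vector of the hyper-plane w'*x = 0
X = [2 0 -1; 0 3 -4];      % three 2-d points stacked column-wise
labels = sign(w' * X);     % +1 on one side of the hyper-plane, -1 on the other
disp(labels);
```

Here w'*X is [2 -3 3], so the first and third points fall on the positive side and the second on the negative side.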
- Again we can use MATLAB to visualize what's going on:

    X = [randn(3,200)-ones(3,200)*1. ...
         randn(3,200)+ones(3,200)*1.8];   % produce some inputs
    Y = [zeros(1,200), ones(1,200)];      % produce some labels
    scatter3(X(1,:),X(2,:),X(3,:), 80, Y, 'filled');
- How well does some random w do on the training set?

    w = randn(3,1);
    Y_random = double(w'*X > 0);   % sgn(w'*x), mapped to the 0/1 labels used in Y
    scatter3(X(1,:),X(2,:),X(3,:),80,Y_random, 'filled');
    hold on;
    plot3([0 w(1)], [0 w(2)], [0 w(3)], 'k');   % show w
    hold off;
    sum(Y_random~=Y)/size(X,2)   % error rate over all data points
- A lot of machine learning is based on the simple 'neuron': $w^T x$
- We have looked at basic regression and classification.
- Usually a few lines in MATLAB.
- A couple of things were oversimplified here. For example, in practice we would adapt the learning rate in gradient descent, add an extra input dimension for the bias, etc.
- Can be easily applied to real data, e.g. spam, ...
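The extra input dimension for the bias mentioned above can be sketched as follows: append a constant 1 to every data point, so that the last entry of w acts as the bias. The placeholder data and the names Xb and out are assumptions of this sketch.

```matlab
X = randn(3, 400);                  % placeholder inputs, stacked column-wise
Xb = [X; ones(1, size(X, 2))];      % append a constant 1 as a 4th input dimension
w = randn(4, 1);                    % w(4) plays the role of the bias
out = w' * Xb;                      % same as w(1:3)' * X + w(4)
```

With this trick the regression and classification code stays unchanged. Only X gains one row and w one entry.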