
Deep Belief Networks - Machine Learning and Pattern Recognition - Lecture Notes, Study notes of Machine Learning

Birla Institute of Technology and Science Machine Learning

Main points of this lecture are: Deep Belief Networks, Existing Methods, Algorithms, Drawbacks, Gaussian Noise, Conditional Distributions, Learning Stacks, Topic Space, Semantic Hashing

Typology: Study notes

2012/2013

Uploaded on 04/30/2013 by bassu 🇮🇳


DEEP BELIEF NETWORKS WITH APPLICATIONS TO FAST DOCUMENT RETRIEVAL


Existing Methods

• One of the most popular and widely used algorithms in practice for document retrieval tasks is TF-IDF.
• TF-IDF weights each word by:
  – its frequency in the query document (Term Frequency);
  – the logarithm of the reciprocal of its frequency in the whole set of documents (Inverse Document Frequency).
• However, TF-IDF has several limitations:
  – It computes document similarity directly in the word-count space, which may be slow for large vocabularies.
  – It assumes that the counts of different words provide independent evidence of similarity.
  – It makes no use of semantic similarities between words.
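A minimal Python sketch of TF-IDF retrieval as described above. It assumes raw counts for the term frequency and log(N / df) for the inverse document frequency, where df is the number of documents that contain the word; the slide does not pin down these variants. The toy documents are hypothetical. Similarity is computed directly in the word-count space, which is exactly the first limitation listed.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict word -> weight) per tokenized document.

    Assumption: raw counts for TF and log(N / df) for IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({w: c * math.log(n_docs / df[w]) for w, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "the hockey team won the game".split(),
    "the hockey game went to overtime".split(),
    "the bank raised interest rates".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

A word that occurs in every document ("the" here) gets an IDF of log(1) = 0 and so contributes nothing to the similarity.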

Drawbacks of Existing Methods

• LSA (Latent Semantic Analysis) is a linear method, so it can only capture pairwise correlations between words.
• Numerous methods, in particular probabilistic versions of LSA, were introduced in the machine learning community.
• These models can be viewed as graphical models in which a single layer of hidden topic variables has directed connections to variables that represent word counts.
• There are limitations on the types of structure that can be represented efficiently by a single layer of hidden variables.
• We will build a network with multiple hidden layers and millions of parameters, and show that it can discover latent representations that work much better.

RBMs revisited

• A joint configuration (v, h) has an energy:

$$E(v, h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j W_{ij}.$$

• The probability that the model assigns to v is:

$$p(v) = \sum_h p(v, h) = \frac{1}{Z} \sum_h \exp(-E(v, h)),$$

where $Z = \sum_{v,h} \exp(-E(v, h))$ is the partition function.
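To make the two formulas concrete, the sketch below computes p(v) exactly for a tiny binary RBM by enumerating every joint state (v, h). The parameter values are hypothetical. Exact enumeration costs 2^(number of visible + hidden units) energy evaluations, so it is only feasible for toy models; for realistic sizes Z cannot be computed this way.

```python
import itertools
import math

def energy(v, h, b_vis, b_hid, W):
    """E(v, h) = -sum_i b_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j W_ij."""
    e = -sum(bi * vi for bi, vi in zip(b_vis, v))
    e -= sum(bj * hj for bj, hj in zip(b_hid, h))
    e -= sum(v[i] * h[j] * W[i][j] for i in range(len(v)) for j in range(len(h)))
    return e

def marginal_p_v(b_vis, b_hid, W):
    """Exact p(v) for every binary v by enumerating all joint states."""
    vs = list(itertools.product([0, 1], repeat=len(b_vis)))
    hs = list(itertools.product([0, 1], repeat=len(b_hid)))
    # Unnormalized marginal: sum_h exp(-E(v, h)).
    unnorm = {v: sum(math.exp(-energy(v, h, b_vis, b_hid, W)) for h in hs) for v in vs}
    Z = sum(unnorm.values())  # partition function
    return {v: u / Z for v, u in unnorm.items()}

p = marginal_p_v(b_vis=[0.1, -0.2, 0.3], b_hid=[0.0, 0.5],
                 W=[[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]])
print(round(sum(p.values()), 6))  # 1.0
```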

RBMs for count data

• Hidden units remain binary, and the visible word counts are modeled by the Poisson model.
• The energy is defined as:

$$E(v, h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j W_{ij} + \sum_i \log(v_i!).$$

• The conditional distributions over hidden and visible units are:

$$p(h_j = 1 \mid v) = \frac{1}{1 + \exp\left(-b_j - \sum_i W_{ij} v_i\right)},$$

$$p(v_i = n \mid h) = \mathrm{Poisson}\left(n, \exp\left(b_i + \sum_j h_j W_{ij}\right)\right),$$

where $\mathrm{Poisson}(n, \lambda) = e^{-\lambda} \lambda^n / n!$.
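A sketch of one Gibbs step in the count-data RBM, using the two conditional distributions above: sample binary hidden units given the word counts, then resample each count from a Poisson whose rate depends on the hidden units. The helper names, the toy parameters, and the use of Knuth's multiplication method for Poisson sampling are assumptions for illustration, not part of the lecture.

```python
import math
import random

def p_hidden_given_visible(v, b_hid, W):
    """p(h_j = 1 | v) = 1 / (1 + exp(-b_j - sum_i W_ij v_i)) for each hidden unit j."""
    return [1.0 / (1.0 + math.exp(-b_hid[j] - sum(W[i][j] * v[i] for i in range(len(v)))))
            for j in range(len(b_hid))]

def poisson_rates(h, b_vis, W):
    """Rate lambda_i = exp(b_i + sum_j h_j W_ij) of each visible word count."""
    return [math.exp(b_vis[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b_vis))]

def poisson_pmf(n, lam):
    """Poisson(n, lambda) = exp(-lambda) * lambda^n / n!."""
    return math.exp(-lam) * lam ** n / math.factorial(n)

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(0)
v = [3, 0, 1]                      # hypothetical counts for a 3-word vocabulary
b_vis, b_hid = [0.0, -1.0, 0.5], [0.1, -0.2]
W = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
p_h = p_hidden_given_visible(v, b_hid, W)
h = [1 if rng.random() < p else 0 for p in p_h]
v_new = [sample_poisson(lam, rng) for lam in poisson_rates(h, b_vis, W)]
```

The `log(v_i!)` term in the energy is what turns the visible conditional into a normalized Poisson rather than an unnormalized exponential.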

Learning Stacks of RBMs

[Figure: a stack of four RBMs with weights W1, W2, W3, W4 and layers of 2000, 1000, 500, and 30 units.]

• Perform greedy, layer-by-layer learning:
  – Learn and freeze W1 using the Poisson model.
  – Treat the existing feature detectors as if they were data.
  – Learn and freeze W2.
  – Greedily learn many layers.
• Each layer of features captures strong high-order correlations between the activities of units in the layer below.
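A NumPy sketch of greedy layer-by-layer learning. Assumptions, not from the slide: every layer is a binary RBM (the lecture uses the Poisson model for the first layer), each RBM is trained with full-batch one-step contrastive divergence (CD-1), and the layer sizes, learning rate, and toy data are hypothetical. Each trained RBM is frozen, and its hidden probabilities are passed up as the data for the next RBM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05, seed=0):
    """Train a binary-binary RBM with one-step contrastive divergence (CD-1).

    data: array of shape (n_samples, n_visible) with entries in [0, 1].
    """
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        p_h0 = sigmoid(data @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step down to the visibles and back up.
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        p_h1 = sigmoid(p_v1 @ W + b_hid)
        W += lr * (data.T @ p_h0 - p_v1.T @ p_h1) / n_samples
        b_vis += lr * (data - p_v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

def greedy_pretrain(data, layer_sizes, **kwargs):
    """Learn and freeze one RBM per layer, treating each layer's features as data."""
    layers, x = [], data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b_vis, b_hid))   # frozen from here on
        x = sigmoid(x @ W + b_hid)         # features become the next layer's data
    return layers, x

rng = np.random.default_rng(1)
toy = (rng.random((100, 20)) < 0.3).astype(float)   # hypothetical binary data
layers, top = greedy_pretrain(toy, [16, 8, 2], epochs=5)
print(top.shape)  # (100, 2)
```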

20 Newsgroups: 2-D topic space

[Figure: 2-D topic spaces found by the autoencoder and by LSA; class labels shown include talk.religion.misc, sci.cryptography, comp.graphics, rec.sport.hockey, misc.forsale, and talk.politics.mideast.]

• The 20 Newsgroups corpus contains 18,845 postings (11,314 training and 7,531 test) taken from Usenet newsgroups.
• We use a 2000-500-250-125-10 autoencoder to convert a document into a low-dimensional code.
• We used a simple “bag-of-words” representation.
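The encoder half of a 2000-500-250-125-10 autoencoder can be sketched as a NumPy forward pass from a bag-of-words vector to a 10-dimensional code. Assumptions: logistic hidden layers, a linear code layer, random weights standing in for the pretrained and fine-tuned ones, and randomly generated word counts in place of real documents.

```python
import numpy as np

def encode(x, weights, biases):
    """Map bag-of-words inputs to low-dimensional codes.

    Assumption: logistic hidden layers followed by a linear code layer.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h @ weights[-1] + biases[-1]

sizes = [2000, 500, 250, 125, 10]
rng = np.random.default_rng(0)
# Random weights stand in for the pretrained, fine-tuned ones.
weights = [0.01 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
counts = rng.poisson(0.05, size=(3, 2000)).astype(float)   # 3 hypothetical documents
codes = encode(counts, weights, biases)
print(codes.shape)  # (3, 10)
```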

Reuters Corpus: 2-D topic space

[Figure: 2-D topic spaces found by the autoencoder and by LSA; class labels shown include European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, and Accounts/Earnings.]

• We use a 2000-500-250-125-2 autoencoder to convert test documents into a two-dimensional code.
• The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test).
• We used a simple “bag-of-words” representation.

Semantic Hashing

[Figure: three panels, Recursive Pretraining, Unrolling, and Fine-tuning, for a network that maps a 2000-word bag-of-words input through two layers of 500 units to a 20-unit code layer; the unrolled autoencoder's weights are fine-tuned (W + ε), with Gaussian noise injected at the code layer.]

• Learn to map documents into semantic 20-D binary codes, and use these codes as memory addresses.
• We have the ultimate retrieval tool: given a query document, compute its 20-bit address and retrieve all of the documents stored at similar addresses with no search at all.
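A sketch of the lookup step in Python. Assumptions: each code unit is thresholded at 0.5 to give one address bit, the memory is a hash table keyed by address, and "similar addresses" means all addresses within a small Hamming radius of the query's address. For 20-bit codes and radius 2 that is 1 + 20 + 190 = 211 probes, independent of how many documents are stored. The 4-bit codes in the usage example are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def to_address(code, threshold=0.5):
    """Threshold real-valued code-unit activities into an integer address, one bit per unit."""
    address = 0
    for activity in code:
        address = (address << 1) | (1 if activity > threshold else 0)
    return address

def build_table(codes):
    """Store each document id at the memory address given by its binary code."""
    table = defaultdict(list)
    for doc_id, code in enumerate(codes):
        table[to_address(code)].append(doc_id)
    return table

def lookup(table, query_code, n_bits, radius=1):
    """Return documents stored within `radius` bit flips of the query's address.

    Only the addresses in the Hamming ball are probed; stored documents are never scanned.
    """
    base = to_address(query_code)
    hits = list(table.get(base, []))
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            addr = base
            for b in bits:
                addr ^= 1 << b          # flip the chosen bits
            hits.extend(table.get(addr, []))
    return hits

codes = [[0.9, 0.1, 0.8, 0.2],   # doc 0 -> 1010
         [0.9, 0.1, 0.7, 0.6],   # doc 1 -> 1011 (one bit away from doc 0)
         [0.1, 0.9, 0.2, 0.1]]   # doc 2 -> 0100 (three bits away from doc 0)
table = build_table(codes)
print(sorted(lookup(table, [0.8, 0.2, 0.9, 0.1], n_bits=4)))  # [0, 1]
```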

The Main Idea of Semantic Hashing

[Figure: a function f maps a document to an address in memory, where semantically similar documents are stored near each other.]

Semantic Hashing

[Figure: left, "Reuters 2-D Embedding of 20-bit codes", with clusters labeled European Community Monetary/Economic, Disasters and Accidents, Government Borrowing, Energy Markets, and Accounts/Earnings; right, precision (%) versus recall (%) for TF-IDF, TF-IDF using 20 bits, and Locality Sensitive Hashing.]

• We used a simple C implementation on the Reuters dataset (402,212 training and 402,212 test documents).
• For a given query, it takes about 0.5 milliseconds to create a short-list of about 3,000 semantically similar documents.
• It then takes 10 milliseconds to retrieve the top few matches from that short-list using TF-IDF, and the result is more accurate than full TF-IDF.
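The two-stage scheme (a short-list from semantic hashing, then TF-IDF on the short-list only) can be sketched as a re-ranking function. The sparse TF-IDF vectors and the short-list below are hypothetical, and cosine similarity is assumed as the TF-IDF scoring rule.

```python
import heapq
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return 0.0 if norm_u == 0.0 or norm_v == 0.0 else dot / (norm_u * norm_v)

def rerank(query_vec, shortlist, doc_vecs, top_k=3):
    """Second stage: score only the short-listed documents with TF-IDF cosine similarity."""
    scored = ((cosine(query_vec, doc_vecs[d]), d) for d in shortlist)
    return [d for _, d in heapq.nlargest(top_k, scored)]

doc_vecs = {
    0: {"hockey": 1.2, "game": 0.4},
    1: {"hockey": 0.3, "overtime": 1.1},
    2: {"bank": 1.1, "rates": 0.9},
    3: {"hockey": 1.0, "game": 0.5, "team": 0.2},
}
query = {"hockey": 1.0, "game": 0.5}
shortlist = [0, 1, 3]   # hypothetical output of the semantic-hashing stage
print(rerank(query, shortlist, doc_vecs, top_k=2))  # [0, 3]
```

Document 2 is never scored, which is where the speed-up comes from: the cost of the second stage grows with the short-list, not with the collection.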

Learning nonlinear embedding

• Learning a similarity measure over the input space X.
• Given a distance metric D (e.g. Euclidean), we can measure the similarity between two input vectors $x^n, x^k \in X$ by computing $D[f(x^n \mid W), f(x^k \mid W)]$.
• “Push-Pull” idea: pull points belonging to the same class together, and push points belonging to different classes apart.

[Figure: two copies of a 2000-500-500-10 network with shared weights W1, W2, W3 map x1 and x2 to f(x1) and f(x2), which are compared by d[f(x1), f(x2)].]

Learning Nonlinear NCA

• The probability that point n belongs to class a is:

$$p(c_n = a) = \sum_{k : c_k = a} p_{nk}.$$

• Maximize the expected number of correctly classified points on the training data:

$$O_{\mathrm{NCA}} = \frac{1}{N} \sum_{n=1}^{N} \sum_{k : c_k = c_n} p_{nk}.$$

• By considering a linear perceptron, we arrive at linear NCA.
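The slide does not define p_nk; this sketch assumes the standard NCA choice, a softmax over negative squared distances between codes: p_nk = exp(-||f(x^n) - f(x^k)||^2) / sum over l != n of exp(-||f(x^n) - f(x^l)||^2), with p_nn = 0. The codes are hypothetical values of f(x|W) supplied directly; no network is trained here. For nonlinear NCA, f would be the multilayer network above; for linear NCA, a linear map.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def neighbour_probs(codes):
    """p[n][k]: probability that point n picks point k as its neighbour (p[n][n] = 0)."""
    N = len(codes)
    p = []
    for n in range(N):
        logits = [-sq_dist(codes[n], codes[k]) if k != n else None for k in range(N)]
        m = max(l for l in logits if l is not None)      # for numerical stability
        exps = [math.exp(l - m) if l is not None else 0.0 for l in logits]
        total = sum(exps)
        p.append([e / total for e in exps])
    return p

def class_prob(p, labels, n, a):
    """p(c_n = a) = sum over k with c_k = a of p_nk."""
    return sum(p[n][k] for k in range(len(labels)) if labels[k] == a)

def nca_objective(codes, labels):
    """O_NCA = (1/N) sum_n sum_{k: c_k = c_n} p_nk."""
    p = neighbour_probs(codes)
    return sum(class_prob(p, labels, n, labels[n]) for n in range(len(codes))) / len(codes)

codes = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(round(nca_objective(codes, [0, 0, 1, 1]), 3))  # 1.0
```

With labels [0, 1, 0, 1] the same codes score close to 0, since each point's nearby neighbour belongs to the other class; training f(x|W) to raise this objective is the "push-pull" idea.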

2-D codes

[Figure: 2-D codes for classes labeled 0 to 9 and for newsgroup classes atheism, religion.christian, sci.cryptography, sci.space, rec.hockey, rec.autos, comp.windows, and comp.hardware.]