Inverse Document Frequency (IDF) and Document Similarity, Slides of Fundamentals of E-Commerce

Birla Institute of Technology and Science Fundamentals of E-Commerce

These slides cover the concepts of inverse document frequency (IDF) and full weighting (TF-IDF) in information retrieval. IDF measures how much a term helps to discriminate between documents, while TF-IDF calculates the weight of a term in a document based on its frequency and IDF. Document similarity is measured using the cosine coefficient of the documents' vector representations. The slides also cover document retrieval and evaluation measures, probabilistic retrieval, and latent semantic analysis.

Typology: Slides

2012/2013

Uploaded on 07/29/2013

masti 🇮🇳


Inverse document frequency (IDF)

• A term that occurs in a few documents is likely to be a better discriminator than a term that appears in most or all documents

• n_j - number of documents which contain the term ω_j

• n - total number of documents in the set

• Inverse document frequency

$\mathrm{IDF}_j = \log \frac{n}{n_j}$
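As a rough illustration of the IDF definition, the sketch below counts, for each term, the number of documents n_j that contain it and returns log(n / n_j). The function name, the toy collection, and the choice of the natural logarithm are assumptions made for this example; the slides do not specify a log base.

```python
import math

def inverse_document_frequency(documents):
    """Return {term: log(n / n_j)} for a list of tokenized documents."""
    n = len(documents)  # n: total number of documents in the set
    doc_counts = {}     # n_j: number of documents containing each term
    for tokens in documents:
        for term in set(tokens):  # count each term once per document
            doc_counts[term] = doc_counts.get(term, 0) + 1
    # Assumption: natural logarithm; the slides leave the base unspecified.
    return {term: math.log(n / n_j) for term, n_j in doc_counts.items()}

# Hypothetical toy collection, not taken from the slides.
docs = [
    ["web", "search", "engine"],
    ["web", "commerce"],
    ["web", "search", "ranking"],
]
idf = inverse_document_frequency(docs)
```

In this toy collection "web" appears in every document, so its IDF is log(3/3) = 0, while the rarer "commerce" receives the largest value, which matches the idea that rare terms discriminate better.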

Document Similarity

• Ranks documents by measuring the similarity between each document and the query

• Similarity between two documents d and d′ is a function s(d, d′) ∈ ℝ

• In a vector-space representation the cosine coefficient of two document vectors is a measure of similarity

Cosine Coefficient

• The cosine of the angle formed by two document vectors x and x′ is

$\cos(x, x') = \frac{x^{T} x'}{\|x\| \cdot \|x'\|}$

• Documents with many common terms will have vectors closer to each other than documents with fewer overlapping terms
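A minimal sketch of the cosine coefficient for two document vectors, computed as the dot product divided by the product of the vector norms. The function name, the toy vectors, and the convention of returning 0.0 for an all-zero vector are assumptions for this example.

```python
import math

def cosine_coefficient(x, y):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: an empty document is similar to nothing
    return dot / (norm_x * norm_y)

# Hypothetical term-count vectors over a four-term vocabulary.
d1 = [1, 1, 1, 0]
d2 = [1, 1, 0, 1]  # shares two terms with d1
d3 = [0, 0, 0, 1]  # shares no terms with d1

sim_overlap = cosine_coefficient(d1, d2)
sim_disjoint = cosine_coefficient(d1, d3)
```

Here d1 and d2 share two of their three terms and score 2/3, while d1 and d3 share none and score 0.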

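Retrieval and Evaluation Measures

• Precision (π) - Fraction of retrieved documents that are actually relevant

• Recall (ρ) - Fraction of relevant documents that are retrieved

$\pi = \frac{|R \cap R^{*}|}{|R|}, \qquad \rho = \frac{|R \cap R^{*}|}{|R^{*}|}$

where R is the set of retrieved documents and R* is the set of relevant documents.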
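The two measures can be sketched with Python sets: precision divides the number of retrieved-and-relevant documents by the size of the retrieved set, and recall divides it by the size of the relevant set. The function name, the document ids, and the choice to return 0.0 when a set is empty are assumptions for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents both retrieved and relevant
    # Assumption: define the measure as 0.0 when its denominator is empty.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids, not taken from the slides.
retrieved_ids = {1, 2, 3, 4}
relevant_ids = {2, 4, 5, 6, 7}
precision, recall = precision_recall(retrieved_ids, relevant_ids)
```

With two hits among four retrieved documents and five relevant ones, precision is 2/4 and recall is 2/5.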

Probabilistic Retrieval

• Probabilistic Ranking Principle (PRP) (Robertson, 1977)

– ranking of the documents in the order of decreasing probability of relevance to the user query

– probabilities are estimated as accurately as possible on the basis of available data

– overall effectiveness of such a system will be the best obtainable
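The ranking step of the principle can be sketched as a sort by decreasing estimated probability of relevance. How the probabilities are estimated is not covered here; the values, document names, and function name below are hypothetical.

```python
def rank_by_relevance_probability(estimates):
    """Return document ids ordered by decreasing estimated probability of relevance."""
    return sorted(estimates, key=estimates.get, reverse=True)

# Hypothetical probability estimates for one user query.
estimated = {"doc_a": 0.35, "doc_b": 0.80, "doc_c": 0.10}
ranking = rank_by_relevance_probability(estimated)
```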

Latent Semantic Analysis

• Why is it needed?

– serious problems for retrieval methods based on term matching

  • vector-space similarity approach works only if the terms of the query are explicitly present in the relevant documents

– rich expressive power of natural language

  • often queries contain terms that express concepts related to the text to be retrieved
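The term-matching problem can be illustrated with a small sketch: a document about the same concept as the query, but written with a different word, gets a cosine score of zero. This only demonstrates the motivation and does not implement latent semantic analysis; the vocabulary, query, documents, and helper names are all hypothetical.

```python
import math

def term_vector(tokens, vocabulary):
    """Term-count vector of a tokenized text over a fixed vocabulary."""
    return [tokens.count(term) for term in vocabulary]

def cosine(x, y):
    """Cosine coefficient; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norms if norms else 0.0

# Hypothetical vocabulary and texts, chosen to show a synonym mismatch.
vocabulary = ["automobile", "car", "engine", "repair"]
query = ["car", "repair"]
doc_match = ["car", "engine", "repair"]  # uses the query's own terms
doc_synonym = ["automobile", "engine"]   # related concept, different words

q = term_vector(query, vocabulary)
score_match = cosine(q, term_vector(doc_match, vocabulary))
score_synonym = cosine(q, term_vector(doc_synonym, vocabulary))
```

The second document is plausibly relevant to the query, yet it scores 0 because no query term appears in it explicitly.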
