






Students of Communication study E-Commerce as an auxiliary subject. These are the key points discussed in these lecture slides on E-Commerce: Advanced Crawling Techniques, Selective Crawling, Focused Crawling, Distributed Crawling, Web Dynamics, Downloads Documents, Reachable Pages, Navigates, Exhaustive Crawl, Selective Crawl
Typology: Slides







Exhaustive crawl
- broad coverage
- used by general-purpose search engines

Selective crawl
- fetch pages according to some criteria, e.g., popular pages, similar pages
- exploit semantic content, rich contextual aspects
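The difference between the two crawl types can be sketched as a best-first traversal in which the frontier is ordered by a score. This is only an illustrative sketch: the in-memory `toy_graph` stands in for downloading and parsing pages, and the URLs, scores, and the names `selective_crawl`, `toy_graph`, and `toy_scores` are all hypothetical.

```python
import heapq

def selective_crawl(graph, seeds, score, max_pages):
    """Visit up to max_pages pages, always expanding the best-scoring known URL.

    graph: dict mapping a URL to the URLs it links to (a stand-in for
           fetching a page and extracting its links).
    score: callable URL -> float; a higher score means "fetch sooner".
    """
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in graph.get(url, []):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited

# Hypothetical toy site; the scores are made up for illustration.
toy_graph = {
    "home": ["news", "about", "shop"],
    "news": ["news/1", "news/2"],
    "shop": ["shop/item"],
}
toy_scores = {"home": 1.0, "news": 0.9, "news/1": 0.8, "news/2": 0.7,
              "about": 0.3, "shop": 0.2, "shop/item": 0.1}

order = selective_crawl(toy_graph, ["home"],
                        lambda u: toy_scores.get(u, 0.0), max_pages=4)
print(order)  # ['home', 'news', 'news/1', 'news/2']
```

With a constant score and a large `max_pages`, the same loop visits every reachable page, which corresponds to the exhaustive case.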
$s_\theta^{(\xi)}(u)$
- $\xi$: relevance criterion
- $\theta$: parameters
- e.g., a Boolean relevance function
  - $s(u) = 1$: document is relevant
  - $s(u) = 0$: document is irrelevant
- length of the path from the site homepage to the document
- limit total number of levels retrieved from a site
- maximize coverage breadth

$$ s_\delta^{(depth)}(u) = \begin{cases} 1, & \text{if } |root(u) \leadsto u| < \delta \\ 0, & \text{otherwise} \end{cases} $$

- assign relevance according to which pages are more important than others
- estimate the number of backlinks

$$ s_\tau^{(backlinks)}(u) = \begin{cases} 1, & \text{if } indegree(u) > \tau \\ 0, & \text{otherwise} \end{cases} $$
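The depth and backlink criteria can be sketched as two Boolean scoring functions over a link graph. This is a minimal sketch under stated assumptions: the graph is a hypothetical in-memory dict, depth is taken as the shortest link distance from the root, and the in-degree is counted only over pages already in the graph (a running crawler would only have an estimate). All function and variable names are illustrative.

```python
from collections import deque

def link_depths(graph, root):
    """Breadth-first search: shortest link distance from root to each reachable page."""
    depths = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for link in graph.get(url, []):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

def score_depth(depths, url, delta):
    """1 if url is fewer than delta links away from the root, else 0."""
    return 1 if url in depths and depths[url] < delta else 0

def indegrees(graph):
    """Count, for each page, how many distinct known pages link to it."""
    counts = {}
    for src, links in graph.items():
        counts.setdefault(src, 0)
        for dst in set(links):  # a page linking twice is counted once
            counts[dst] = counts.get(dst, 0) + 1
    return counts

def score_backlinks(counts, url, tau):
    """1 if the known in-degree of url exceeds the threshold tau, else 0."""
    return 1 if counts.get(url, 0) > tau else 0

# Hypothetical toy graph for illustration.
site = {"home": ["a", "b"], "a": ["b", "c"], "b": ["c"], "c": ["d"]}
depths = link_depths(site, "home")
counts = indegrees(site)

print(score_depth(depths, "c", delta=3), score_depth(depths, "d", delta=3))      # 1 0
print(score_backlinks(counts, "b", tau=1), score_backlinks(counts, "a", tau=1))  # 1 0
```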
assign value of importance
- value is proportional to the popularity of the source document
- estimated by a measure of indegree of a page
use text categorization techniques

$$ s_\theta^{(topic)}(u) = P(c \mid d(u), \theta) $$
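One possible way to obtain a topic score of this form is a small multinomial naive Bayes classifier. The sketch below is an assumption, not the method the slides prescribe: the choice of naive Bayes with add-one smoothing, the whitespace tokenization, the training documents, and the names `train_naive_bayes` and `posterior` are all illustrative.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """labeled_docs: list of (category, text). Returns the model parameters theta."""
    word_counts = {}         # category -> Counter of word frequencies
    doc_counts = Counter()   # category -> number of training documents
    vocab = set()
    for category, text in labeled_docs:
        words = text.lower().split()
        word_counts.setdefault(category, Counter()).update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def posterior(theta, text):
    """Return {category: P(c | d, theta)} using add-one (Laplace) smoothing."""
    # Words never seen in training are ignored.
    words = [w for w in text.lower().split() if w in theta["vocab"]]
    total_docs = sum(theta["doc_counts"].values())
    vocab_size = len(theta["vocab"])
    log_joint = {}
    for c, counts in theta["word_counts"].items():
        n_c = sum(counts.values())
        lp = math.log(theta["doc_counts"][c] / total_docs)  # log prior
        for w in words:
            lp += math.log((counts[w] + 1) / (n_c + vocab_size))
        log_joint[c] = lp
    # Normalize in log space for numerical stability.
    m = max(log_joint.values())
    z = sum(math.exp(lp - m) for lp in log_joint.values())
    return {c: math.exp(lp - m) / z for c, lp in log_joint.items()}

# Made-up training data for illustration.
theta = train_naive_bayes([
    ("sports", "football match goal team"),
    ("sports", "team wins match"),
    ("finance", "stock market shares price"),
    ("finance", "market price falls"),
])
scores = posterior(theta, "team match goal")
print(round(scores["sports"], 3))  # 0.947
```

Here the score of a page for the topic "sports" is simply `scores["sports"]`; a real crawler would train on far more text than this toy set.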
score of the parent page is extended to its child URLs
anchor text is used for scoring pages
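The two ideas above (children inheriting the parent's score, and anchor text contributing to the score) can be sketched as a priority computation for links that have not been fetched yet. The linear blend, the `anchor_weight` parameter, the term-overlap measure, and the example URLs are all assumptions made for illustration.

```python
def anchor_score(anchor_text, topic_terms):
    """Fraction of anchor words that are topic terms (0.0 for an empty anchor)."""
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)

def child_priorities(parent_score, links, topic_terms, anchor_weight=0.5):
    """links: list of (url, anchor_text) pairs found on the parent page.

    Each unvisited child inherits the parent's score, blended with a score
    computed from its anchor text. The blend itself is an assumption.
    """
    return {
        url: (1 - anchor_weight) * parent_score
             + anchor_weight * anchor_score(anchor, topic_terms)
        for url, anchor in links
    }

# Hypothetical links extracted from a parent page with relevance 0.8.
priorities = child_priorities(
    parent_score=0.8,
    links=[("/crawl", "web crawling tutorial"), ("/contact", "contact us")],
    topic_terms={"web", "crawling", "crawler"},
)
print(priorities)
```

In this example "/crawl" gets a higher priority than "/contact" because two of its three anchor words match the topic terms, while both inherit the same share of the parent's score.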
classify crawled pages into categories
- use a topic taxonomy, provide example URLs, and mark categories of interest
- use a Bayesian classifier to find $P(c \mid p)$
- compute relevance score for each page

$$ R(p) = \sum_{c \in good} P(c \mid p) $$
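The relevance score, the sum of the class posteriors over the categories marked as good, can be sketched as follows. The posterior values and category names are made up for illustration, and `relevance` is a hypothetical helper name; the posteriors themselves would come from whatever classifier is in use.

```python
def relevance(posteriors, good_categories):
    """R(p): sum of P(c | p) over the categories marked as good."""
    return sum(prob for c, prob in posteriors.items() if c in good_categories)

# Hypothetical classifier output for one page.
page_posteriors = {"cycling": 0.5, "running": 0.25, "finance": 0.25}
r = relevance(page_posteriors, good_categories={"cycling", "running"})
print(r)  # 0.75
```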