






An overview of modeling human behavior on the web, focusing on web data and measurement issues, empirical client-side studies of browsing behavior, and probabilistic models of browsing behavior. It discusses the importance of understanding human digital behavior, the collection and identification of web data, and the impact of robots and human traffic on server-side data. The document also touches upon page requests, caching, and proxy servers.
Typology: Slides







Web data and measurement issues
Background:
Important to understand how data is collected
- Web data is collected automatically via software logging tools
- Advantage: no manual supervision required
- Disadvantage: data can be skewed (e.g. due to the presence of robot traffic)
- Important to identify robots (also known as crawlers or spiders)
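As a rough illustration of robot identification, the sketch below flags a client as a likely robot if its agent string contains a typical robot keyword or if it requested /robots.txt. The keyword list, the function name looks_like_robot, and the use of /robots.txt as a signal are assumptions made for this example, not a method given in the slides; robots that disguise their agent string would not be caught.

```python
# Hypothetical keyword list; real robots may use other agent strings.
ROBOT_HINTS = ("bot", "crawler", "spider")


def looks_like_robot(agent: str, requested_paths: list[str]) -> bool:
    """Heuristic check: True if the client is probably a robot."""
    agent_lower = agent.lower()
    # Signal 1: the agent string announces itself as a robot.
    if any(hint in agent_lower for hint in ROBOT_HINTS):
        return True
    # Signal 2: well-behaved robots usually fetch /robots.txt first.
    return "/robots.txt" in requested_paths
```

For example, looks_like_robot("Googlebot/2.1", []) is flagged by the agent keyword, while a browser-like agent that only requested ordinary pages is not.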
A time-series plot of Web requests
Number of page requests per hour as a function of time, from page requests in the www.ics.uci.edu Web server logs during the first week of April 2002.
Periodic spikes (can overload a server)
- Requests by “bad” robots
Lower-level constant stream of requests
- Requests by “good” robots
Daily pattern: Monday to Friday
Hourly pattern: peak around midday and low traffic from midnight to early morning
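A time series like the one described above can be built by bucketing request timestamps into hours. The following is a minimal sketch that assumes the timestamps have already been parsed into datetime objects; the function name requests_per_hour is made up for this example.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps: list[datetime]) -> dict[datetime, int]:
    """Count requests in each one-hour bucket, ordered by time."""
    counts: Counter = Counter()
    for ts in timestamps:
        # Truncate to the start of the hour so all requests in it share a key.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        counts[bucket] += 1
    return dict(sorted(counts.items()))


# Illustrative timestamps only, not data from the actual server logs.
sample = [
    datetime(2002, 4, 1, 12, 5),
    datetime(2002, 4, 1, 12, 40),
    datetime(2002, 4, 1, 3, 15),
]
hourly = requests_per_hour(sample)
```

Plotting the resulting counts against the hour keys would expose spikes and daily or hourly cycles of the kind listed above.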
- URL of the page requested
- Time and date of the request
- IP address of the requester
- Requester browser information (agent)
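The fields listed above can be pulled out of a server log line with a regular expression. The slides do not say which log format the server used, so this sketch assumes the widely used Apache "combined" log format; the sample line, the pattern, and the name parse_line are illustrative only.

```python
import re
from datetime import datetime

# Assumed layout: Apache combined log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert e.g. "01/Apr/2002:12:30:45 -0800" into a timezone-aware datetime.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record


# Made-up example line (192.0.2.1 is a documentation-only IP address).
SAMPLE_LINE = (
    '192.0.2.1 - - [01/Apr/2002:12:30:45 -0800] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/4.0"'
)
record = parse_line(SAMPLE_LINE)
```

Returning None for non-matching lines lets a caller skip malformed entries instead of stopping on the first bad line.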
Page requests, caching, and proxy servers
A graphical summary of how page requests from an individual user can be masked at various stages between the user’s local computer and the Web server.
- Browser caching
- Dynamic addressing in local network
- Proxy server caching
- Other users
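To make the masking effect of caching concrete, here is a toy simulation in which repeat views of a page are served from a browser cache and never reach the Web server. It assumes an unbounded cache that never expires, which is a simplification; the function name server_visible_requests is hypothetical.

```python
def server_visible_requests(page_views: list[str]) -> list[str]:
    """Return only the page views that would appear in the server log."""
    cache: set[str] = set()
    seen_by_server: list[str] = []
    for url in page_views:
        if url not in cache:
            # Cache miss: the request goes to the server and is logged.
            seen_by_server.append(url)
            cache.add(url)
        # Cache hit: served locally, so the server never sees this view.
    return seen_by_server


views = ["/a", "/b", "/a", "/c", "/b"]
logged = server_visible_requests(views)
```

Here the user viewed five pages but the server log would record only three requests, so server-side counts underestimate actual browsing.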