Modeling Human Behavior on the Web: Understanding Web Data and Browsing Patterns, Slides of Fundamentals of E-Commerce

An overview of modeling human behavior on the web, focusing on web data and measurement issues, empirical client-side studies of browsing behavior, and probabilistic models of browsing behavior. It discusses the importance of understanding human digital behavior, the collection and identification of web data, and the impact of robots and human traffic on server-side data. The document also touches upon page requests, caching, and proxy servers.

Typology: Slides

2012/2013

Uploaded on 07/30/2013

Modeling the Internet and the Web:
Modeling and Understanding Human Behavior on the Web

Outline

- Introduction
- Web Data and Measurement Issues
- Empirical Client-Side Studies of Browsing Behavior
- Probabilistic Models of Browsing Behavior
- Modeling and Understanding Search Engine Querying

Web data and measurement issues

Background:

- Important to understand how data is collected
- Web data is collected automatically via software logging tools
  - Advantage: no manual supervision required
  - Disadvantage: data can be skewed (e.g. due to the presence of robot traffic)
- Important to identify robots (also known as crawlers, spiders)

A time-series plot of Web requests

[Figure: number of page requests per hour as a function of time, from page requests in the www.ics.uci.edu Web server logs during the first week of April 2002.]
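The per-hour counts behind a plot of this kind can be sketched as follows. This is a minimal illustration, assuming the request timestamps have already been extracted from the log; the function name `requests_per_hour` and the sample timestamps are hypothetical and are not taken from the www.ics.uci.edu data.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps):
    """Count requests in each one-hour bucket, ordered by time."""
    counts = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return dict(sorted(counts.items()))


# Hypothetical timestamps, for illustration only.
sample = [
    datetime(2002, 4, 1, 11, 5),
    datetime(2002, 4, 1, 11, 40),
    datetime(2002, 4, 1, 12, 15),
]
hourly = requests_per_hour(sample)
```

Plotting the values of `hourly` against its keys would give a time series of requests per hour.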

Robot / human identification

Robot traffic consists of two components:

- Periodic spikes (can overload a server)
  - Requests by “bad” robots
- Lower-level constant stream of requests
  - Requests by “good” robots

Human traffic has:

- Daily pattern: Monday to Friday
- Hourly pattern: peak around midday and low traffic from midnight to early morning
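One way to turn the hourly pattern above into a rough screening rule is to check how much of a client's traffic falls in the low-traffic night hours. The sketch below is only a heuristic, not a method described in the slides: the night window (00:00 to 05:59), the 0.5 threshold, and the function names are assumptions chosen for illustration, and timestamps are assumed to be in the users' local time.

```python
from collections import defaultdict
from datetime import datetime

# Assumed reading of "midnight to early morning"; not specified in the slides.
NIGHT_HOURS = range(0, 6)


def night_share_by_client(requests):
    """Map each client IP to the fraction of its requests made at night.

    `requests` is an iterable of (client_ip, datetime) pairs.
    """
    total = defaultdict(int)
    night = defaultdict(int)
    for ip, ts in requests:
        total[ip] += 1
        if ts.hour in NIGHT_HOURS:
            night[ip] += 1
    return {ip: night[ip] / total[ip] for ip in total}


def likely_robots(requests, threshold=0.5):
    """Return clients whose night-time share meets the (assumed) threshold."""
    shares = night_share_by_client(requests)
    return sorted(ip for ip, share in shares.items() if share >= threshold)


# Hypothetical requests, for illustration only.
sample = [
    ("10.0.0.1", datetime(2002, 4, 1, 12, 0)),
    ("10.0.0.1", datetime(2002, 4, 1, 13, 0)),
    ("10.0.0.1", datetime(2002, 4, 2, 2, 0)),
    ("10.0.0.2", datetime(2002, 4, 1, 1, 0)),
    ("10.0.0.2", datetime(2002, 4, 1, 2, 0)),
    ("10.0.0.2", datetime(2002, 4, 1, 3, 0)),
    ("10.0.0.2", datetime(2002, 4, 1, 14, 0)),
]
flagged = likely_robots(sample)
```

A rule like this would miss robots that only run during the day and would flag night-shift users, so in practice it would be one signal among several.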

Server-side data

Data logging at Web servers:

- The Web server sends requested pages to the requester's browser
- It can be configured to archive these requests in a log file, recording:
  - URL of the page requested
  - Time and date of the request
  - IP address of the requester
  - Requester browser information (agent)
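As an illustration of the four logged fields, the sketch below parses one log line. The slides do not say which log format is used; this example assumes the widely used Combined Log Format, and the sample line, the pattern name `LOG_PATTERN`, and the function name `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# ip ident user [time] "method url protocol" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Extract URL, time, IP address, and agent; return None if no match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "ip": m.group("ip"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": m.group("url"),
        "agent": m.group("agent"),
    }


# Hypothetical log line, for illustration only.
line = (
    '192.0.2.1 - - [01/Apr/2002:12:00:00 -0800] '
    '"GET /index.html HTTP/1.0" 200 1024 "-" "ExampleBrowser/1.0"'
)
record = parse_line(line)
```

Returning `None` for lines that do not match lets a caller skip malformed entries instead of failing on them.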

Page requests, caching, and proxy servers

In theory, the requester's browser requests a page from a Web server and the request is processed.

In practice, there are:

- Other users
- Browser caching
- Dynamic addressing in the local network
- Proxy server caching
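The effect of browser caching on what the server sees can be illustrated with a small simulation. This is a deliberately simplified sketch: it assumes the browser caches every page after the first fetch and never expires it, which real browsers do not do; the function name and the sample browsing sequence are hypothetical.

```python
def server_visible_requests(page_sequence):
    """Return the requests that reach the server under a never-expiring cache.

    Pages already in the (assumed) browser cache are served locally and
    therefore never appear in the server log.
    """
    cache = set()
    visible = []
    for page in page_sequence:
        if page not in cache:
            visible.append(page)
            cache.add(page)
    return visible


# Hypothetical browsing sequence: the user views five pages in total.
browsing = ["/a", "/b", "/a", "/c", "/b"]
logged = server_visible_requests(browsing)
```

Here the user viewed five pages but the server would log only three requests, which is why server-side logs can understate actual browsing.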

Page requests, caching, and proxy servers

[Figure: a graphical summary of how page requests from an individual user can be masked at various stages between the user’s local computer and the Web server.]