Link Analysis Algorithms - Lecture Notes | COMP 140, Study notes of Computer Science

Material Type: Notes; Professor: Subramanian; Class: AN INTEGRATED INTRODUCTION TO COMPUTATIONAL AND PROBLEM SOLVING; Subject: Computer Science; University: Rice University; Term: Fall 2008;

Link analysis of directed graphs
Devika Subramanian
Comp 140
Fall 2008
Lecture derived from Ullman and Rajaraman

Link analysis algorithms

- PageRank algorithm
- Hubs and authorities algorithm

Simple recursive formulation

- Each link’s vote is proportional to the importance of its source page
- If page P with importance x has n outlinks, each link gets x/n votes
- Page P’s own importance is the sum of the votes on its inlinks
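The voting rule above can be made concrete with a short Python sketch of one round of voting. The following are all hypothetical and chosen only for illustration:

- the function name `one_round_of_votes`
- the dictionary input format
- the three-page graph (P links to Q and R, Q links to R, R links to P)

The sketch also assumes every page has at least one outlink, so that x/n is defined.

```python
def one_round_of_votes(outlinks, importance):
    """Return each page's new importance: the sum of the votes on its inlinks.

    outlinks maps each page to the list of pages it links to (hypothetical
    format); every page is assumed to have at least one outlink.
    """
    new_importance = {page: 0.0 for page in outlinks}
    for page, targets in outlinks.items():
        n = len(targets)                  # page has n outlinks
        vote = importance[page] / n       # each link gets x/n votes
        for target in targets:
            new_importance[target] += vote
    return new_importance


# Hypothetical graph: P -> Q, P -> R, Q -> R, R -> P; all pages start equal.
graph = {"P": ["Q", "R"], "Q": ["R"], "R": ["P"]}
scores = {"P": 1 / 3, "Q": 1 / 3, "R": 1 / 3}
print(one_round_of_votes(graph, scores))  # approximately P: 1/3, Q: 1/6, R: 1/2
```

The recursive formulation asks for scores that this round leaves unchanged.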

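An example

Three pages: Yahoo (y), Amazon (a) and M’soft (m).

- Yahoo links to itself and to Amazon.
- Amazon links to Yahoo and to M’soft.
- M’soft links to Amazon.

Each link from Yahoo therefore carries y/2 votes, each link from Amazon carries a/2, and the single link from M’soft carries m. The flow equations are:

- $y = y/2 + a/2$
- $a = y/2 + m$
- $m = a/2$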
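The three flow equations do not determine y, a and m on their own. If (y, a, m) is a solution, so is any multiple of it, and any one equation follows from the other two. A minimal numpy sketch, assuming we use the normalization y + a + m = 1 in place of the third equation, solves the example directly:

```python
import numpy as np

# Flow equations rearranged as coeffs @ [y, a, m] = rhs:
#   y = y/2 + a/2  ->   0.5*y - 0.5*a         = 0
#   a = y/2 + m    ->  -0.5*y + 1.0*a - 1.0*m = 0
# The third equation (m = a/2) is implied by the first two, so it is
# replaced by the normalization y + a + m = 1.
coeffs = np.array([
    [0.5, -0.5, 0.0],
    [-0.5, 1.0, -1.0],
    [1.0, 1.0, 1.0],
])
rhs = np.array([0.0, 0.0, 1.0])
y, a, m = np.linalg.solve(coeffs, rhs)
print(y, a, m)  # approximately y = a = 0.4, m = 0.2
```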

Matrix formulation of flow problem

- Matrix M has N rows and N columns, one for each page in the set of pages to be ranked.
- If page j has n outlinks:
  - if j points to i, $M_{ij} = 1/n$
  - else $M_{ij} = 0$
- Let r be a vector of length N:
  - $r_i$ is the importance score of page i
  - $|r| = 1$, i.e. the scores sum to 1
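A sketch of how M might be built in numpy from a list of outlinks. The function `build_flow_matrix` and its input format are hypothetical. The example graph is the Yahoo / Amazon / M’soft one, with indices 0, 1, 2 for the three pages:

- Yahoo links to itself and Amazon.
- Amazon links to Yahoo and M’soft.
- M’soft links to Amazon.

The sketch assumes no page is a dead end, since a page with n = 0 outlinks has no 1/n entries.

```python
import numpy as np


def build_flow_matrix(outlinks):
    """Return the N x N matrix with M[i, j] = 1/n if page j (n outlinks) points to i.

    outlinks[j] lists the indices of the pages that page j links to
    (hypothetical input format). All other entries stay 0.
    """
    N = len(outlinks)
    M = np.zeros((N, N))
    for j, targets in enumerate(outlinks):
        n = len(targets)
        for i in targets:
            M[i, j] = 1.0 / n
    return M


# 0 = Yahoo, 1 = Amazon, 2 = M'soft
M = build_flow_matrix([[0, 1], [0, 2], [1]])
print(M)
print(M.sum(axis=0))  # each column sums to 1 when no page is a dead end
```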

Matrix formulation (contd.)

- The flow equations can be succinctly represented as the matrix equation $r = Mr$, so r is an eigenvector of M with eigenvalue 1.
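The equation r = Mr says that r is an eigenvector of M with eigenvalue 1. One way to check the example is therefore to ask numpy for the eigenvectors of M. The sketch then rescales the eigenvector whose eigenvalue is 1 so its entries sum to 1. This is only a small-scale check, not how PageRank is computed on large graphs.

```python
import numpy as np

# M for the example: rows and columns ordered Yahoo, Amazon, M'soft.
M = np.array([
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 1.0],
    [0.0, 0.5, 0.0],
])

eigenvalues, eigenvectors = np.linalg.eig(M)
k = np.argmin(np.abs(eigenvalues - 1.0))  # position of the eigenvalue closest to 1
r = np.real(eigenvectors[:, k])
r = r / r.sum()                           # rescale so scores sum to 1 (also fixes the sign)
print(r)                                  # approximately [0.4, 0.4, 0.2]
print(np.allclose(M @ r, r))              # True: r satisfies r = M r
```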

Power iteration

- Initialize: $r^0 = [1/N, \ldots, 1/N]^T$
- Iterate: $r^{k+1} = M r^k$
- Stop when $\lVert r^{k+1} - r^k \rVert_1 < \varepsilon$

The essence of the PageRank algorithm!
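The three steps of power iteration translate almost line for line into plain Python. This is a sketch under a few assumptions:

- `power_iterate` is a hypothetical name.
- M is passed as a list of rows.
- `max_iter` is an added safeguard that is not part of the algorithm as stated.

```python
def power_iterate(M, eps=1e-10, max_iter=1000):
    """Power iteration for r = M r, with M given as a list of rows.

    Starts from the uniform vector and stops when the L1 distance between
    successive iterates drops below eps (or after max_iter steps).
    """
    N = len(M)
    r = [1.0 / N] * N                                  # r^0 = [1/N, ..., 1/N]^T
    for _ in range(max_iter):
        # r^{k+1} = M r^k, written out as a matrix-vector product
        r_next = [sum(M[i][j] * r[j] for j in range(N)) for i in range(N)]
        if sum(abs(x - y) for x, y in zip(r_next, r)) < eps:   # ||r^{k+1} - r^k||_1 < eps
            return r_next
        r = r_next
    return r


M = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
print(power_iterate(M))  # approximately [0.4, 0.4, 0.2]
```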

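Power iteration example

With pages ordered Yahoo (y), Amazon (a), M’soft (m):

$$M = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1 \\ 0 & 1/2 & 0 \end{pmatrix}, \qquad \begin{pmatrix} y \\ a \\ m \end{pmatrix} = M \begin{pmatrix} y \\ a \\ m \end{pmatrix}$$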
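Working the first few iterations by hand for this M, with entries in the order y, a, m, shows the scores starting from the uniform vector and settling toward (2/5, 2/5, 1/5). That limit satisfies the flow equations (y = a and m = a/2):

```latex
r^0 = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix},\;
r^1 = \begin{pmatrix} 1/3 \\ 1/2 \\ 1/6 \end{pmatrix},\;
r^2 = \begin{pmatrix} 5/12 \\ 1/3 \\ 1/4 \end{pmatrix},\;
r^3 = \begin{pmatrix} 3/8 \\ 11/24 \\ 1/6 \end{pmatrix},\;
\ldots \;\longrightarrow\;
r = \begin{pmatrix} 2/5 \\ 2/5 \\ 1/5 \end{pmatrix}
```

For instance, the y entry of $r^2$ is $(1/2)(1/3) + (1/2)(1/2) = 5/12$.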

The stationary distribution

- Where is the surfer at time t+1?
  - The surfer follows a link uniformly at random: $p(t+1) = M p(t)$
- Suppose the random walk reaches a state such that $p(t+1) = M p(t) = p(t)$
  - Then p(t) is called a stationary distribution for the random walk
- Our rank vector r satisfies $r = Mr$
  - So it is a stationary distribution for the random surfer
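The random-surfer reading can be checked by simulation. This hypothetical sketch walks a surfer over the Yahoo / Amazon / M’soft graph, choosing an outlink uniformly at random at each step. It reports the fraction of steps spent at each page. For a graph like this one (every page reachable from every other, plus a self-link on Yahoo), that long-run fraction approaches the stationary distribution. The function name, seed and step count are arbitrary choices.

```python
import random
from collections import Counter


def simulate_surfer(outlinks, steps, seed=0):
    """Return the fraction of steps a random surfer spends at each page.

    outlinks maps each page to the pages it links to (hypothetical format);
    at every step the surfer follows one outlink chosen uniformly at random.
    """
    rng = random.Random(seed)
    page = rng.choice(list(outlinks))          # arbitrary starting page
    visits = Counter()
    for _ in range(steps):
        page = rng.choice(outlinks[page])      # follow a random outlink
        visits[page] += 1
    return {p: visits[p] / steps for p in outlinks}


freqs = simulate_surfer({"y": ["y", "a"], "a": ["y", "m"], "m": ["a"]}, steps=200_000)
print(freqs)  # close to y: 0.4, a: 0.4, m: 0.2
```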

Existence and uniqueness of solution

- For graphs that satisfy certain conditions (for example, the graph is strongly connected and the random walk is aperiodic), the stationary distribution is unique. It is eventually reached no matter what the initial probability distribution is at time t = 0.
- The r vector is the scaled PageRank score.
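To see why some conditions are needed, consider a hypothetical four-page graph that fails them: it splits into two separate cycles, 0 <-> 1 and 2 <-> 3. This small numpy sketch shows two problems. The graph has two different stationary distributions, and a walk started at page 0 alternates forever instead of settling.

```python
import numpy as np

# Hypothetical graph: page 0 <-> page 1 and page 2 <-> page 3 (one outlink each).
M = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

# Two different vectors both satisfy p = M p, so the solution is not unique.
p1 = np.array([0.5, 0.5, 0.0, 0.0])
p2 = np.array([0.0, 0.0, 0.5, 0.5])
print(np.allclose(M @ p1, p1), np.allclose(M @ p2, p2))  # True True

# Starting at page 0 with certainty, the walk bounces between pages 0 and 1.
p = np.array([1.0, 0.0, 0.0, 0.0])
for t in range(4):
    print(t, p)
    p = M @ p
```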

Implementation (contd.)

- Repeat
  - $r^{k+1} = M r^k$
- Until successive versions of r are nearly equal
- numpy has functions for making vectors and matrices and for multiplying them.
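With numpy, the repeat/until loop reduces to a matrix-vector product (`M @ r`) plus a convergence test. This is a sketch under a few assumptions:

- "Nearly equal" is taken to mean an L1 distance below a tolerance `tol`.
- The function name `pagerank_numpy` and the `max_iter` cap are hypothetical.
- The function also returns how many iterations it used.

```python
import numpy as np


def pagerank_numpy(M, tol=1e-10, max_iter=1000):
    """Repeat r <- M r until successive versions of r are nearly equal.

    Returns (r, iterations). "Nearly equal" is assumed to mean an L1
    distance below tol; max_iter guards against iterations that never settle.
    """
    M = np.asarray(M, dtype=float)
    N = M.shape[0]
    r = np.full(N, 1.0 / N)                      # uniform starting vector
    for k in range(1, max_iter + 1):
        r_next = M @ r                           # one step: r^{k+1} = M r^k
        if np.abs(r_next - r).sum() < tol:       # successive versions nearly equal
            return r_next, k
        r = r_next
    return r, max_iter


M = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r, iterations = pagerank_numpy(M)
print(r, iterations)  # r is approximately [0.4, 0.4, 0.2]
```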

Hubs and authorities algorithm

- Called HITS (Hypertext-induced topic selection)
- Developed by Jon Kleinberg in 1998, at about the same time Page and Brin wrote their PageRank paper.

Defining hubs and authorities

- A good hub links to many good authorities
- A good authority is linked from many good hubs
- Model using two scores for each node
  - Hub score and authority score
  - Represented as vectors h and a
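The two definitions refer to each other, which suggests an iterative computation. The sketch below assumes the usual HITS updates:

- An authority's score is the sum of the hub scores of the pages linking to it.
- A hub's score is the sum of the authority scores of the pages it links to.

After each step, the scores are rescaled so the largest is 1. That normalization choice, the function name `hits` and the example graph are all assumptions made for illustration.

```python
def hits(outlinks, iterations=50):
    """Mutual-reinforcement sketch of hub and authority scores.

    outlinks maps each page to the pages it links to (hypothetical format).
    Assumed updates: authority(p) = sum of hub scores of pages linking to p;
    hub(p) = sum of authority scores of pages p links to. After each update
    the scores are divided by their maximum (one common normalization).
    """
    pages = list(outlinks)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is linked from many good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in outlinks.items():
            for dst in targets:
                auth[dst] += hub[src]
        top = max(auth.values()) or 1.0
        auth = {p: s / top for p, s in auth.items()}
        # A good hub links to many good authorities.
        hub = {p: sum(auth[dst] for dst in outlinks[p]) for p in pages}
        top = max(hub.values()) or 1.0
        hub = {p: s / top for p, s in hub.items()}
    return hub, auth


# Hypothetical graph: h1 and h2 link to x and y, h3 links only to x.
graph = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"], "x": [], "y": []}
hub, auth = hits(graph)
print(hub)   # h1 and h2 are the best hubs
print(auth)  # x is the best authority
```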

Defining link structure using transition matrix A

- HITS uses a matrix A with A[i, j] = 1 if page i links to page j, and 0 if not.
- $A^T$, the transpose of A, is similar to the PageRank matrix M, but $A^T$ has 1’s where M has fractions.
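Here is a small numpy sketch of this relationship on a hypothetical three-page graph:

- Page 0 links to pages 1 and 2.
- Page 1 links to page 2.
- Page 2 links to page 0.

Dividing each column j of $A^T$ by page j's number of outlinks turns the 1's into the fractions 1/n that make up M.

```python
import numpy as np

# HITS convention: A[i, j] = 1 if page i links to page j.
A = np.array([
    [0, 1, 1],   # page 0 links to pages 1 and 2
    [0, 0, 1],   # page 1 links to page 2
    [1, 0, 0],   # page 2 links to page 0
])

out_degree = A.sum(axis=1)   # number of outlinks of each page (row sums of A)
M = A.T / out_degree         # broadcasting divides column j of A^T by out_degree[j]
print(A.T)                   # 1's where links exist
print(M)                     # same pattern, but with fractions 1/n
```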