Google Page Rank Linear Algebra | Exercises Linear Algebra

LINEAR ALGEBRA APPLICATION: GOOGLE PAGERANK ALGORITHM.

JONATHAN MACHADO

Abstract. Google’s PageRank algorithm is what makes Google such a strong search engine. The pioneering PageRank algorithm redefined how a search engine operates. This paper provides the mathematical foundations needed to understand how the algorithm functions and presents a basic analysis of hyperlinks and their role in the PageRank algorithm. Ultimately, this paper sheds light on a neat application of linear algebra coupled with graph theory.

1. Introduction

Revolutionizing how the modern world operates, the Internet is a powerful medium through which anyone, regardless of location, can access endless information about any subject and communicate with others without bounds. All that is needed is a computer and access to the World Wide Web. One of the greatest results of the Internet was the establishment of hyperlinks. The World Wide Web is an extensive system of billions of web pages holding documents of information. Hyperlinks are the pathways from one web page to another, enabling communication between these pages: documents interact by referencing one another via links. Here lies the foundation of how the dominant search engine, Google, does its magic.

So, how does Google do it? Initially, Google breaks the web into sections, crawls through these segments, and adds them to its main index, which is kept across thousands of different machines [2]. This process is done daily to keep Google’s index of the web up to date. When a user visits Google and types in a query, the search engine sets off to find the most relevant and important web pages for what was searched. First, the query is decomposed into the individual words typed into the search engine [3]. Google then deploys programs known as spiders that search Google’s index, spread across many machines, for pages that include those words [1]. These spiders start from a few pages and follow the links on each page to other pages, continuing in this way until every page related to the query is indexed [3]. Google then combines all of these pages and sorts them by overall rank using over a hundred different ranking factors, such as the quality of the page (authoritative, low quality, or spam), the location of the words (in the title, URL, etc.), the proximity of the words (whether the words are next to each other in a sentence), and the time users have previously spent on the pages [1].
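The link-following behavior of the spiders described above is essentially a traversal of a directed graph whose nodes are pages and whose edges are hyperlinks. The sketch below is a minimal illustration under stated assumptions, not Google’s actual crawler: the `WEB` dictionary, its page names and contents, and the `crawl` function are all hypothetical. It visits pages breadth-first from a set of seed pages, follows each hyperlink at most once, and keeps the pages whose text contains every word of the query.

```python
from collections import deque

# Hypothetical toy web (illustrative assumption, not real data):
# each page name maps to (page text, list of outgoing hyperlinks).
WEB = {
    "A": ("linear algebra notes", ["B", "C"]),
    "B": ("graph theory and linear algebra", ["C", "D"]),
    "C": ("cooking recipes", ["A"]),
    "D": ("eigenvectors in linear algebra", []),
    "E": ("unreachable page about algebra", ["A"]),
}


def crawl(seeds, query):
    """Breadth-first traversal of WEB starting from the seed pages.

    Returns, in visit order, the pages whose text contains every query word.
    """
    words = query.lower().split()      # decompose the query into individual words
    seen = set(seeds)                  # pages already queued, so none is visited twice
    queue = deque(seeds)
    matches = []
    while queue:
        page = queue.popleft()
        text, links = WEB[page]
        tokens = set(text.lower().split())
        if all(w in tokens for w in words):
            matches.append(page)
        for nxt in links:              # follow each hyperlink on the current page
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return matches


print(crawl(["A"], "linear algebra"))  # prints ['A', 'B', 'D']
```

Starting from page `"A"`, the traversal never reaches page `"E"`, because no page links to it. This illustrates why a crawler’s coverage depends on the link structure of the web, which is exactly the structure that graph theory and linear algebra let us analyze.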

Notably, the famous PageRank algorithm created by Google’s founders is the most critical component in determining the overall rank of a page. Throughout the search process, PageRank is the main factor used to identify the most reputable and authoritative pages across the index. The development of the PageRank algorithm is what set Google apart from its competitors early on and made it the most successful and powerful search engine to date. The PageRank algorithm revolutionized how search engines retrieved pages from
