

















An overview of digital libraries covering content, information discovery, interoperability, user interaction, security, and preservation. It surveys digital libraries such as the National Digital Library, the Perseus Digital Library, and the Internet Archive, and proposes solutions that draw on databases, distributed systems, human-computer interaction, linguistics, natural language processing, artificial intelligence, and library science.


















October 28, 2003
National Digital Library (Library of Congress)
Digital Library of Georgia
Alexandria Digital Library
Perseus Digital Library
Internet Archive
ACM Digital Library
Georgia Tech Library Digital Initiative
Citeseer
How much content is there?
Audio collection of the works of Beethoven – 20 GB
Library floor of books on shelves – 50 GB
Academic research library – 2 TB
Printed collection of the Library of Congress – 10 TB
3 years of EOS data – 1 petabyte
All printed material – 200 petabytes
All words ever spoken by human beings – 5 exabytes
Source: http://www.jamesshuggins.com/h/tek1/how_big.htm
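To get a feel for these scales, the figures above can be compared directly. The sketch below assumes decimal units (1 TB = 10^12 bytes); the `SIZES` table and the `ratio` helper are illustrative names, not part of the original slides.

```python
# Decimal byte units (an assumption; the slides do not say decimal or binary).
GB, TB, PB, EB = 10**9, 10**12, 10**15, 10**18

# Sizes as listed on the slide.
SIZES = {
    "Audio collection of the works of Beethoven": 20 * GB,
    "Library floor of books on shelves": 50 * GB,
    "Academic research library": 2 * TB,
    "Printed collection of the Library of Congress": 10 * TB,
    "3 years of EOS data": 1 * PB,
    "All printed material": 200 * PB,
    "All words ever spoken by human beings": 5 * EB,
}

def ratio(larger, smaller):
    """Return how many copies of `smaller` fit into `larger`."""
    return SIZES[larger] / SIZES[smaller]

if __name__ == "__main__":
    print(ratio("3 years of EOS data",
                "Printed collection of the Library of Congress"))
    print(ratio("All words ever spoken by human beings",
                "All printed material"))
```

On these numbers, three years of EOS data is about one hundred times the printed Library of Congress collection.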
Digitized
Born digital
Documents
Images
Audio/video
Geo-spatial
Searchable databases
Commercial example: Mapquest
Maps
Satellite images
Find geographic entities based on name
Like the index in an atlas
Source: http://www.alexandria.ucsb.edu
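A name-based lookup of this kind can be sketched as a small gazetteer: a table from place names to coordinates, searched by name prefix. The entries, the approximate coordinates, and the `find_places` function are assumptions made for illustration; they do not reflect how the Alexandria Digital Library is actually implemented.

```python
# Hypothetical gazetteer entries: name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Atlanta": (33.75, -84.39),
    "Athens, Georgia": (33.95, -83.38),
    "Santa Barbara": (34.42, -119.70),
}

def find_places(gazetteer, prefix):
    """Return (name, coordinates) pairs whose name starts with `prefix`,
    ignoring case, sorted by name like an atlas index."""
    wanted = prefix.casefold()
    return sorted(
        (name, coords)
        for name, coords in gazetteer.items()
        if name.casefold().startswith(wanted)
    )

if __name__ == "__main__":
    for name, (lat, lon) in find_places(GAZETTEER, "at"):
        print(f"{name}: {lat}, {lon}")
```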
We make data faster than librarians can organize it (Is this a good thing?)
Search metadata
Author: William Shakespeare
Title: Hamlet
Year: 1601
Subject: Depressed Danish princes
Search content
“question outrageous fortune”
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more;
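The difference between the two search styles can be sketched in a few lines: metadata search matches catalog fields, while content search matches words in the full text. The `CATALOG` structure, its field names, and the second placeholder record are hypothetical and only serve the illustration.

```python
import re

# Hypothetical catalog; the second record is a placeholder, not a real work.
CATALOG = [
    {
        "author": "William Shakespeare",
        "title": "Hamlet",
        "year": 1601,
        "text": "To be, or not to be: that is the question: "
                "Whether 'tis nobler in the mind to suffer "
                "The slings and arrows of outrageous fortune",
    },
    {
        "author": "Example Author",
        "title": "Example Play",
        "year": 1900,
        "text": "A placeholder text with no famous lines",
    },
]

def tokenize(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def search_metadata(catalog, **fields):
    """Return titles of records whose metadata fields equal the given values."""
    return [
        record["title"]
        for record in catalog
        if all(str(record.get(key, "")).lower() == str(value).lower()
               for key, value in fields.items())
    ]

def search_content(catalog, query):
    """Return titles of records whose text contains every query word."""
    terms = tokenize(query)
    return [r["title"] for r in catalog if terms <= tokenize(r["text"])]

if __name__ == "__main__":
    print(search_metadata(CATALOG, author="William Shakespeare"))
    print(search_content(CATALOG, "question outrageous fortune"))
```

Metadata search depends on someone having filled in the fields consistently; content search needs no cataloging but has to index every word.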
Where does it come from?
What does it mean? E.g. “author” versus “creator” versus “artist”
How do you manage it?
What happens over time?
Example: Dublin core
Core elements
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
Other elements
Audience, abstract, replaces … 33 elements
Can be extended
Source: http://dublincore.org/documents/2003/03/04/dcmi-terms/
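One way to picture the "author" versus "creator" versus "artist" problem is a small crosswalk that maps local field names onto the fifteen core elements. The `SYNONYMS` table and the `to_dublin_core` function below are hypothetical; real crosswalks are considerably more involved.

```python
# The fifteen core elements listed above, in lower case.
CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical crosswalk from local field names to core element names.
SYNONYMS = {"author": "creator", "artist": "creator"}

def to_dublin_core(record):
    """Split a record into (core, extra): fields that map onto a core
    element, and fields that do not."""
    core, extra = {}, {}
    for key, value in record.items():
        name = SYNONYMS.get(key.lower(), key.lower())
        if name in CORE_ELEMENTS:
            core[name] = value
        else:
            extra[name] = value
    return core, extra

if __name__ == "__main__":
    local = {"Author": "William Shakespeare", "Title": "Hamlet", "Year": 1601}
    core, extra = to_dublin_core(local)
    print(core)   # fields expressed as core elements
    print(extra)  # fields with no mapping in this sketch
```

In this sketch "Year" is left unmapped on purpose: deciding whether it belongs under Date is exactly the kind of semantic question the slide raises.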
“A page is important if important pages point to it”
Stanford University
Bob’s Homepage
The PageRank equation
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
PR(A) – the PageRank of page A
PR(Ti) – the PageRank of a page Ti pointing to A
C(Ti) – the number of outgoing links of Ti
d – damping factor (between 0 and 1) so the computation converges
The theoretical justification
The “random surfer model”
Source: S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Proc. 1998 WWW Conf., 1998
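The equation can be solved by repeated substitution: start every page at rank 1 and reapply the formula until the values stop changing. The sketch below assumes a tiny hypothetical three-page graph in which every page has at least one outgoing link and every link target is itself a key of the graph; `pagerank`, `tol`, and `max_iter` are illustrative names.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to. This sketch
    assumes every page has at least one outgoing link.
    """
    pages = list(links)
    # Invert the graph: for each page, which pages point to it?
    incoming = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    ranks = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        updated = {
            page: (1 - d) + d * sum(ranks[t] / len(links[t])
                                    for t in incoming[page])
            for page in pages
        }
        change = max(abs(updated[p] - ranks[p]) for p in pages)
        ranks = updated
        if change < tol:
            break
    return ranks

if __name__ == "__main__":
    # Hypothetical graph: A links to B and C, B links to C, C links to A.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(graph).items()):
        print(page, round(rank, 4))
```

With this form of the equation the ranks sum to the number of pages rather than to 1. Page C ends up highest because both other pages point to it.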
Classic problem: “apple” versus “Apple”
Idea: pages you like are “important”
Calculate PageRank for other important pages
This is expensive!
Calculate PageRank over 3 billion pages for each user
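One common way to express "pages you like are important" is to give the (1-d) term only to the pages a user likes, instead of to every page. The sketch below uses that formulation as an assumption; the slides do not spell out the exact method, and `personalized_pagerank`, `liked`, and the three-page graph are hypothetical.

```python
def personalized_pagerank(links, liked, d=0.85, tol=1e-10, max_iter=1000):
    """PageRank in which the (1-d) share goes only to pages in `liked`.

    `links` maps each page to the pages it links to; every page is assumed
    to have at least one outgoing link. With `liked` equal to all pages,
    this reduces to the ordinary equation.
    """
    pages = list(links)
    incoming = {page: [] for page in pages}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    # Spread the total (1-d) * N base rank evenly over the liked pages.
    base = (1 - d) * len(pages) / len(liked)
    ranks = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        updated = {
            page: (base if page in liked else 0.0)
                  + d * sum(ranks[t] / len(links[t]) for t in incoming[page])
            for page in pages
        }
        change = max(abs(updated[p] - ranks[p]) for p in pages)
        ranks = updated
        if change < tol:
            break
    return ranks

if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(personalized_pagerank(graph, liked={"B"}))
    print(personalized_pagerank(graph, liked=set(graph)))
```

The cost concern on the slide is visible here: the whole iteration has to be rerun for each distinct `liked` set, which is one full computation per user.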
Create lots of links to your page
Weblogs
Anchor text, word similarity, other secret things
Legacy systems
Data schema
Data encoding
Data semantics
Query language
Result ranking
Limited interfaces
Current protocols solve only part of the problem
TCP, HTTP, SOAP, etc.
Search and retrieve “card catalog” information
Limited state search, retrieval and metadata
Protocols and toolkit for “Open Archives”
The old days
Limited screen real-estate
Limited PDA processing
Limited bandwidth
Innovative user interface
PDA/proxy interaction
Source: http://www-diglib.stanford.edu/~testbed/doc2/PowerBrowsing/desc.html
How to organize boxes of photos?
Digital photos have identifying information
Date
Maybe location
Source: Graham, Garcia-Molina, Paepcke and Winograd. "Time as Essence for Photo Browsing Through Personal Digital Libraries.” JCDL 2002
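Because every digital photo carries a date, a collection can be grouped into "events" by looking for long gaps between consecutive shots. The sketch below uses a simple fixed gap threshold, which is an assumption made for illustration and not the algorithm from the cited paper; `cluster_by_time` and `max_gap` are hypothetical names.

```python
from datetime import datetime, timedelta

def cluster_by_time(timestamps, max_gap=timedelta(hours=6)):
    """Group timestamps into clusters; a new cluster starts whenever the
    gap to the previous photo exceeds `max_gap`."""
    clusters = []
    for stamp in sorted(timestamps):
        if clusters and stamp - clusters[-1][-1] <= max_gap:
            clusters[-1].append(stamp)
        else:
            clusters.append([stamp])
    return clusters

if __name__ == "__main__":
    # Hypothetical capture times for five photos.
    photos = [
        datetime(2003, 7, 4, 10, 15),
        datetime(2003, 7, 4, 10, 40),
        datetime(2003, 7, 4, 13, 5),
        datetime(2003, 10, 28, 9, 0),
        datetime(2003, 10, 28, 9, 30),
    ]
    for event in cluster_by_time(photos):
        print(event[0].date(), len(event), "photos")
```

A fixed threshold is the simplest choice; it will split a long day trip or merge two short events, which is why more adaptive schemes exist.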