Digital Libraries: Content, Discovery, Interoperability, and User Experience

An overview of digital libraries, covering their content, information discovery, interoperability, user interaction, security, and preservation. It surveys digital libraries such as the National Digital Library, the Perseus Digital Library, and the Internet Archive, examines the problems that arise in each of these areas, and points to solutions drawn from databases, distributed systems, human-computer interaction, linguistics, natural language processing, artificial intelligence, and library science.

Digital libraries
Brian F. Cooper
October 28, 2003
Overview
What is a digital library?
Content
Information discovery
Interoperability
User interaction
Security
Preservation

What is a digital library?

• What is a library?
• What is a digital library?

Digital libraries

• National Digital Library (Library of Congress)
• Digital Library of Georgia
• Alexandria Digital Library
• Perseus Digital Library
• Internet Archive
• ACM Digital Library
• Georgia Tech Library Digital Initiative
• Citeseer
• Google

Content

• How much content is there?
  - Audio collection of the works of Beethoven – 20 GB
  - Library floor of books on shelves – 50 GB
  - Academic research library – 2 TB
  - Printed collection of the Library of Congress – 10 TB
  - 3 years of EOS data – 1 petabyte
  - All printed material – 200 petabytes
  - All words ever spoken by human beings – 5 exabytes

Source: http://www.jamesshuggins.com/h/tek1/how_big.htm

Content

• Where does the content come from?
  - Digitized
  - Born digital

Content

• Types of content
  - Documents
  - Images
  - Audio/video
  - Geo-spatial
  - Searchable databases

Geospatial content

• Alexandria Digital Library Project
  - Commercial example: Mapquest
• Georeferenced images
  - Maps
  - Satellite images
• Gazetteer
  - Find geographic entities based on name
  - Like the index in an atlas

Source: http://www.alexandria.ucsb.edu
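The gazetteer idea, looking up geographic entities by name much like the index in an atlas, can be sketched as a dictionary keyed on a normalized place name. This is only an illustration and not the Alexandria Digital Library's implementation; the `Place` and `Gazetteer` classes, the sample entries, and the approximate coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """One geographic entity (hypothetical record layout)."""
    name: str
    kind: str    # e.g. "city" or "river"
    lat: float   # degrees north
    lon: float   # degrees east


class Gazetteer:
    """Maps a normalized place name to every entity that carries it."""

    def __init__(self, places):
        self._by_name = {}
        for place in places:
            key = place.name.casefold()
            self._by_name.setdefault(key, []).append(place)

    def lookup(self, name):
        """Return all places with this name, ignoring case and outer spaces."""
        return self._by_name.get(name.strip().casefold(), [])


# Illustrative entries only; coordinates are approximate.
gazetteer = Gazetteer([
    Place("Athens", "city", 37.98, 23.73),     # Greece
    Place("Athens", "city", 33.95, -83.38),    # Georgia, USA
    Place("Santa Barbara", "city", 34.42, -119.70),
])

for match in gazetteer.lookup("athens"):
    print(match.name, match.lat, match.lon)
```

One name can map to several entities, which is why `lookup` returns a list: the caller (or a map display) still has to disambiguate between them.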

Use the librarian

• Idea: things are organized well
• But…
  - We make data faster than librarians can organize it
  - (Is this a good thing?)

Use the “card catalog”

• Search metadata

  Author: William Shakespeare
  Title: Hamlet
  Year: 1601
  Subject: Depressed Danish princes

• Search content

  “question outrageous fortune”

  To be, or not to be: that is the question:
  Whether 'tis nobler in the mind to suffer
  The slings and arrows of outrageous fortune,
  Or to take arms against a sea of troubles,
  And by opposing end them? To die: to sleep;
  No more;
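The difference between searching metadata and searching content can be sketched with a toy collection. The sketch below is an assumption-laden illustration, not how any particular digital library works: the `documents` layout, the function names, and the second sample document are invented for the example. Metadata search matches a catalog field; content search uses a small inverted index from words to the documents that contain them.

```python
import re
from collections import defaultdict

# Toy collection: catalog metadata plus full text (hypothetical layout).
documents = {
    "hamlet": {
        "metadata": {"author": "William Shakespeare", "title": "Hamlet",
                     "year": "1601"},
        "text": ("To be, or not to be: that is the question: "
                 "Whether 'tis nobler in the mind to suffer "
                 "The slings and arrows of outrageous fortune"),
    },
    "sonnet18": {
        "metadata": {"author": "William Shakespeare", "title": "Sonnet 18",
                     "year": "1609"},
        "text": "Shall I compare thee to a summer's day?",
    },
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def search_metadata(field, value):
    """Return ids of documents whose metadata field contains the value."""
    value = value.lower()
    return sorted(doc_id for doc_id, doc in documents.items()
                  if value in doc["metadata"].get(field, "").lower())


def build_inverted_index():
    """Map every word to the set of documents whose text contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in tokenize(doc["text"]):
            index[word].add(doc_id)
    return index


def search_content(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = build_inverted_index()
print(search_metadata("author", "shakespeare"))
print(search_content(index, "question outrageous fortune"))
```

The metadata query finds both works by author, while the content query finds only the document whose text contains all three query words.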

Metadata

• Issues
  - Where does it come from?
  - What does it mean?
    - E.g. “author” versus “creator” versus “artist”
  - How do you manage it?
  - What happens over time?

Metadata

• Example: Dublin Core
• Core elements
  - Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
• Other elements
  - Audience, abstract, replaces …
  - 33 elements
  - Can be extended

Source: http://dublincore.org/documents/2003/03/04/dcmi-terms/
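As a rough illustration of what a Dublin Core description looks like in practice, the sketch below serializes the Hamlet metadata from the earlier slide using a few of the core elements. The `dc` namespace URI is the standard one for the core element set; the wrapping `record` element and the `dublin_core_record` helper are assumptions made for this example, since Dublin Core itself does not prescribe a container.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element (hypothetical wrapper) with one dc:* child
    per metadata field."""
    record = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


record = dublin_core_record({
    "title": "Hamlet",
    "creator": "William Shakespeare",
    "date": "1601",
    "subject": "Depressed Danish princes",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Using the shared element name `creator` rather than `author` or `artist` is what lets different collections agree on what the field means.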

Google

• Basic idea: PageRank
  - “A page is important if important pages point to it”

[Figure: example link graph with the pages “Stanford University” and “Bob’s Homepage”]

The ugly math

• The PageRank equation

  PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

  - PR(A) – the PageRank of page A
  - PR(Ti) – the PageRank of a page Ti pointing to A
  - C(Ti) – the number of outgoing links of Ti
  - d – damping factor, so that the computation converges
• The theoretical justification
  - The “random surfer model”

Source: S. Brin and L. Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Proc. 1998 WWW Conf., 1998
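The equation can be evaluated by simple iteration: start every page at the same rank and repeatedly apply the formula until the values settle. The sketch below does this for a tiny hand-made graph. The page names, the link structure, the fixed iteration count, and the default d = 0.85 are assumptions for illustration; it makes no attempt to reflect how a production search engine computes ranks at scale.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T that
    link to A. `links` maps each page to the list of pages it points to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page T contributes PR(T) / C(T), where C(T) is
            # the number of outgoing links of T.
            incoming = sum(rank[src] / len(targets)
                           for src, targets in links.items()
                           if page in targets)
            new_rank[page] = (1 - d) + d * incoming
        rank = new_rank
    return rank


# Hypothetical three-page web.
links = {
    "stanford": ["bob"],
    "bob": ["stanford", "alice"],
    "alice": ["stanford"],
}

ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.3f}")
```

In this graph "stanford" receives links from both other pages and ends up ranked highest, and "bob" ranks above "alice" because its only incoming link comes from that highly ranked page.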

Personalized search

• Find information that is relevant to you
  - Classic problem: “apple” versus “Apple”
  - Idea: pages you like are “important”
  - Calculate PageRank for other important pages
• Challenges
  - This is expensive!
  - Calculate PageRank over 3 billion pages for each user
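One common way to realize "pages you like are important" is to bias the random surfer so that it jumps back only to the user's liked pages instead of to any page. The sketch below takes that approach as an assumption; the slides do not say which formulation is meant. The graph, the page names, and the normalized form of the equation (ranks sum to at most 1) are likewise illustrative choices.

```python
def personalized_pagerank(links, liked, d=0.85, iterations=100):
    """PageRank variant whose random jumps land only on `liked` pages.
    `liked` must be a non-empty set of pages that appear in `links`."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Jump distribution: uniform over liked pages, zero elsewhere.
    jump = {p: (1.0 / len(liked) if p in liked else 0.0) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            incoming = sum(rank[src] / len(targets)
                           for src, targets in links.items()
                           if page in targets)
            new_rank[page] = (1 - d) * jump[page] + d * incoming
        rank = new_rank
    return rank


# Hypothetical symmetric web: two blogs, two "apple" pages, one portal.
links = {
    "cooking_blog": ["apple_fruit", "news_portal"],
    "gadget_blog": ["apple_inc", "news_portal"],
    "news_portal": ["cooking_blog", "gadget_blog"],
    "apple_fruit": ["cooking_blog"],
    "apple_inc": ["gadget_blog"],
}

cook = personalized_pagerank(links, liked={"cooking_blog"})
tech = personalized_pagerank(links, liked={"gadget_blog"})
print(cook["apple_fruit"] > cook["apple_inc"])
print(tech["apple_inc"] > tech["apple_fruit"])
```

The link graph is the same for both users, yet the ranking of the two "apple" pages flips with the liked set. It also shows why the approach is expensive: every user needs a separate full computation.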

The ugly truth

• Google can be “gamed”
  - Create lots of links to your page
  - Weblogs
• PageRank is little used anymore
  - Anchor text, word similarity, other secret things
• Google is “broken”

Interoperability issues

• Legacy systems
• Data schema
• Data encoding
• Data semantics
• Query language
• Result ranking
• Limited interfaces
• Current protocols solve only part of the problem
  - TCP, HTTP, SOAP, etc.

DL-specific protocols

• Z39.
  - Search and retrieve “card catalog” information
• SDLIP/STARTS/SDARTS
  - Limited state search, retrieval and metadata
• OAI
  - Protocols and toolkit for “Open Archives”
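To make the OAI entry concrete, the sketch below builds request URLs in the style of OAI-PMH, the Open Archives Initiative Protocol for Metadata Harvesting, assuming that is the protocol the slide refers to. Requests are ordinary HTTP queries with a `verb` parameter; `Identify`, `ListRecords`, and the `oai_dc` metadata prefix come from that protocol. The repository address and the `oai_request_url` helper are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **arguments):
    """Build the URL for one harvesting request (hypothetical helper)."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used only to show the URL shape.
BASE_URL = "http://archive.example.org/oai"

identify_url = oai_request_url(BASE_URL, "Identify")
list_url = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")

print(identify_url)
print(list_url)
```

The harvester only has to know the base URL: it asks the repository to identify itself, then requests records in Dublin Core form, which is how the protocol sidesteps differences in each archive's internal schema.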

User interaction

• How can a user get the information?
  - The old days

Things have changed

Stanford PowerBrowser

• Browsing on PDAs
  - Limited screen real estate
  - Limited PDA processing
  - Limited bandwidth
• Solution
  - Innovative user interface
  - PDA/proxy interaction

Source: http://www-diglib.stanford.edu/~testbed/doc2/PowerBrowsing/desc.html

Browsing on mobile devices

• Stanford PowerBrowser

Browsing image libraries

• Stanford PhotoBrowser
  - How to organize boxes of photos?
  - Digital photos have identifying information
    - Date
    - Maybe location

Source: Graham, Garcia-Molina, Paepcke and Winograd. "Time as Essence for Photo Browsing Through Personal Digital Libraries." JCDL 2002
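Because every digital photo carries a date, one simple way to organize a "box of photos" is to group shots that were taken close together in time. The sketch below splits a collection wherever the gap between consecutive photos exceeds a threshold. It is a deliberately simplified stand-in and not the algorithm from the cited paper; the six-hour threshold, the function name, and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta


def cluster_by_time(timestamps, max_gap=timedelta(hours=6)):
    """Group timestamps into clusters, starting a new cluster whenever the
    gap to the previous photo is larger than max_gap."""
    clusters = []
    for stamp in sorted(timestamps):
        if clusters and stamp - clusters[-1][-1] <= max_gap:
            clusters[-1].append(stamp)
        else:
            clusters.append([stamp])
    return clusters


# Hypothetical capture times for five photos.
photos = [
    datetime(2002, 7, 4, 10, 0),
    datetime(2002, 7, 4, 10, 20),
    datetime(2002, 7, 4, 13, 5),
    datetime(2002, 7, 20, 9, 0),
    datetime(2002, 7, 20, 9, 30),
]

clusters = cluster_by_time(photos)
for cluster in clusters:
    print(cluster[0].date(), len(cluster), "photos")
```

With these sample times the photos fall into two groups, one per outing, which a browser could then present as separate events on a timeline.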

PhotoBrowser
