XML Document Analysis: Homework 3 for DB Course Spring 2005, Study notes of Database Management Systems (DBMS)

Instructions for homework 3 of a db course in spring 2005. The assignment involves using xml documents, specifically saxon xquery processor, to answer various queries about the data structures of auction.xml and thatt.xml. The document also includes instructions for converting thatt.xml into a new format and designing a relation schema for thesaurus data.

Typology: Study notes

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
CPS 216 Spring 2005
Homework #3
Assigned: Tuesday, March 29
Due: Tuesday, April 12
Problem 1.
Consider an XML document (from a project called XMark) modeling the data maintained by
an Internet auction site in /home/dbcourse/examples/xml-xmark/auction.xml. The
main entities modeled are: items, persons, open auctions, closed auctions, and categories.
iteminfo elements describe items that are for sale or that already have been sold.
Each item carries a unique identifier and bears properties like payment method (credit
card, money order, etc.), a reference to the seller, a description, etc., all encoded as
subelements. Each item belongs to a world region represented by the iteminfo’s
parent.
Open auctions are auctions in progress. Their properties include the bid history (i.e.,
increases over time) along with references to the bidders, a reference to the seller, a
reference to the iteminfo being sold, etc.
Closed auctions are auctions that are finished. Their properties include references to
the seller and the buyer, a reference to the respective iteminfo, the price, the
quantity of items sold, etc.
Persons are characterized by name, email address, phone number, mail address, profile
of their interests, a set of open auctions they watch, etc.
Categories feature a name and a description; they are used to implement classification
of items. A category graph links categories into a network.
The figure below illustrates the part of the document structure that is relevant to the problem:
site
regions people open_auctions closed_auctions
{africa,asia,...}
iteminfo
description
address
name
street
person
name
city country
zipcode
open_auction
bidder itemref seller
date
time personref
increase
initial
closed_auction
itemref
seller bu yer
price
incategory
On rack40, use the following query with saxonxq
<result>{doc("/home/dbcourse/examples/xml-xmark/auction.xml")}</result>
to get an overview of the document. Please refer to the document “XML Programming Notes”
on the course Web site for instructions on running saxonxq, the Saxon XQuery processor.
pf3
pf4

Partial preview of the text

Download XML Document Analysis: Homework 3 for DB Course Spring 2005 and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

CPS 216 Spring 2005 Homework # Assigned: Tuesday, March 29 Due: Tuesday, April 12

Problem 1.

Consider an XML document (from a project called XMark) modeling the data maintained by an Internet auction site in /home/dbcourse/examples/xml-xmark/auction.xml. The main entities modeled are: items, persons, open auctions, closed auctions, and categories.

  • iteminfo elements describe items that are for sale or that already have been sold. Each item carries a unique identifier and bears properties like payment method (credit card, money order, etc.), a reference to the seller, a description, etc., all encoded as subelements. Each item belongs to a world region represented by the iteminfo’s parent.
  • Open auctions are auctions in progress. Their properties include the bid history (i.e., increases over time) along with references to the bidders, a reference to the seller, a reference to the iteminfo being sold, etc.
  • Closed auctions are auctions that are finished. Their properties include references to the seller and the buyer, a reference to the respective iteminfo, the price, the quantity of items sold, etc.
  • Persons are characterized by name, email address, phone number, mail address, profile of their interests, a set of open auctions they watch, etc.
  • Categories feature a name and a description; they are used to implement classification of items. A category graph links categories into a network. The figure below illustrates the part of the document structure that is relevant to the problem: site

regions people open_auctions closed_auctions

{africa,asia,...}

iteminfo description

address

name

street

person name

city country

zipcode

open_auction bidder itemref seller date time personref

increase

initial

closed_auction itemref buyer seller

price

incategory

On rack40, use the following query with saxonxq {doc("/home/dbcourse/examples/xml-xmark/auction.xml")} to get an overview of the document. Please refer to the document “XML Programming Notes” on the course Web site for instructions on running saxonxq, the Saxon XQuery processor.

Write queries in XQuery to answer the following questions. Because Saxon does not have a sophisticated optimizer, query performance may be influenced by the way you write your queries. If a particular query takes forever to run, consider reordering loops and evaluating selections (filters) as early as possible.

For each question below, say (a), write your XQuery in a file named hw3-1-a.xquery, and generate the output file hw3-1-a.xml by running saxonxq hw3-1-a.xquery > hw3-1- a.xml. Turn in printout of all your .xquery and output .xml files. If the output file is too long, print out only the first page of the output.

(a) Find names of all items in “namerica” region. (b) Find names of all items that belong to “category0.” (c) Find names of all persons whose address is in “United States.” (d) Find all buyers who paid for more than $50 in a closed auction. (e) Find names of all person who has bidden in an open auction for an item whose name contains the string “cow.” (f) Find names of all persons with address in “United States” who never bought anything in closed auctions. (g) For each open auction whose seller's name is “Venkatavasu Takano,” print out the following information: <open_auction id=“...”> <bidder_name>...</bidder_name> <bidder_name>...</bidder_name> ... </open_auction>

Problem 2.

Consider an XML document describing a thesaurus (from http://zthes.z3950.org/) in /home/dbcourse/examples/xml-zthes/thatt.xml. The XML document contains a list of terms. Each term has a list of relations with other terms. For example, the following excerpt from thatt.xml means that the term INFORMATION ECONOMICS is related to terms INFORMATION, INFORMATION INDUSTRY, INFORMATION SYSTEMS, and INFORMATION THEORY: 663 INFORMATION ECONOMICS NT BT 662 INFORMATION RT 664

Show the raw output generated by your code; do not manually edit the output (other than truncating it for printout).

(b) Design and create a relation schema to store the thesaurus data using the schema- aware approach discussed in lecture. Your design should not have the redundancy described in (a). Populate the tables with the data from thatt.xml. You should write a program or script to convert the data into flat files and import them into DB2. Do not use individual INSERT statements. The choice of programming/scripting language and XML API is up to you. Instructions on how to load a DB2 database can be found in the document “DB2 SQL Programming Notes” on the course Web site; examples can be found in /home/dbcourse/examples/db-*/ directories.

Turn in the following:

  • CREATE TABLE statements you used in creating the relational schema, with keys and foreign keys properly declared.
  • Printout of the program/script you used to generate DB2 data files for loading.
  • Printout of the script you used to import data files into DB2.
  • Printout of the result of running the following query for each table you have loaded: SELECT * FROM table FETCH FIRST 10 ROWS ONLY;