
FluXQuery: An Optimizing XQuery Processor for Streaming XML Data

Christoph Koch†,∗, Stefanie Scherzinger‡, Nicole Schweikardt♮, Bernhard Stegmaier♯

†: Technische Universität Wien, Vienna, Austria, Email: [email protected]

‡: Technische Universität Wien, Vienna, Austria, Email: [email protected]

♮: Humboldt Universität zu Berlin, Berlin, Germany, Email: [email protected]

♯: Technische Universität München, Munich, Germany, Email: [email protected]

1 Introduction and Motivation

XML has established itself as the ubiquitous format for data exchange on the Internet. An imminent development is that of streams of XML data being exchanged and queried. Data management scenarios where XQuery [11] is evaluated on XML streams are becoming increasingly important and realistic, e.g. in e-commerce settings.

Naturally, query engines employed for stream processing are main-memory-based, yet contemporary XQuery engines consume main memory in large multiples of the actual size of the input documents (cf. [10, 8]). This excessive need for buffers has proven to be a serious scalability issue and a significant research challenge [10, 9, 5, 3].

So far, the efficient evaluation of XPath on streams has been closely investigated, to the point where state-of-the-art techniques use very little main memory [1, 4, 6, 7]. However, corresponding approaches to the effective and economical processing of XQuery on streams are still at a preliminary stage. XQuery, as a data-transformation query language, is of an entirely different nature than node-selecting XPath. This creates the need to develop sophisticated techniques for coping with and reducing main memory buffers during XQuery evaluation.

What is required is a well-principled machinery for processing XQuery which is parsimonious with resources in that it minimizes the amount of buffering necessary. Any such solution should be extensible and should leverage the large body of related earlier work by the database community. Under these considerations, such machinery needs to employ an algebraic view of queries and optimizations.

∗ Work supported by project Z29-N04 of the Austrian Science Fund (FWF).

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004

So far, no principled work exists on algebraic query optimization for structured data streams (such as XML, but unlike flat tuple streams, e.g. [2]) which takes into account the special features of stream processing. In particular, we lack an algebra for querying structured data which truly captures the spirit of stream processing and which prepares the ground for optimizing query evaluation using schema information.

In this demonstration, we present the FluXQuery engine as the first optimizing XQuery engine for streams. Optimization in FluXQuery is based on a new internal query language called FluX [8] which slightly extends the main structures of XQuery by a construct for event-based query processing. By allowing for the conscious use of main memory buffers, it supports reasoning over the employment of buffers during query evaluation.

2 The FluX Query Language

We consider the following XQuery Q in a bibliography domain, as found among the XML Query Use Cases [12] (XMP Q3):

<results>
{ for $b in $ROOT/bib/book return
<result> { $b/title } { $b/author } </result> }
</results>

This query lists the title(s) and authors of each book in the bibliography and groups them inside a “result” element. Note that the XQuery language requires that, within each book, titles are output before all authors.

Now the DTD

<!ELEMENT bib (book)*>
<!ELEMENT book (title|author)*>
