Adaptive XML Parsing for High-Performance Web Services: TDX & Permutation Phrase Grammar | Papers Theatre

An Adaptive XML Parser for Developing High-Performance Web Services

Wei Zhang and Robert A. van Engelen

Department of Computer Science

Florida State University, Tallahassee, FL 32306

{wzhang,engelen}@cs.fsu.edu

Abstract

This paper presents an adaptive XML parser based on table-driven XML (TDX) parsing technology. The technique can be used to develop extensible, high-performance Web services for large, complex systems that typically require extensible schemas. The parser integrates scanning, parsing, and validation into a single pass without backtracking by using compact tabular representations of schemas and a push-down automaton (PDA) at runtime. The tabular forms are constructed from a set of schemas or WSDL descriptions through the use of a permutation grammar. The engine is implemented as a PDA-based, table-driven driver and is therefore independent of any particular XML schema. When XML schemas are updated or extended, the tabular forms can be regenerated and loaded into the generic engine without redeploying the parser. This adaptive approach balances the need for performance against the requirements of reconstructing and redeploying the Web services. Our experiments show that the adaptive parser is usually about 5 times faster than traditional validating parsers, and that its performance is within 20% of the fastest fully compiled traditional validating parsers.
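To make the separation between schema tables and generic engine concrete, the following is a minimal sketch in Python. It assumes an LL(1)-style predictive parser as one way to realize a table-driven PDA; the toy `order` schema, the table layout, and the names `TABLE_V1`, `TABLE_V2`, `tokens`, and `validate` are illustrative assumptions, not the table format used by TDX.

```python
import re

# Hypothetical parsing table for a toy schema: <order> holds one <id>
# followed by one or more <item> elements. Keys are (nonterminal, lookahead
# token); values are the grammar symbols to push. Terminals are tag tokens.
TABLE_V1 = {
    ("Order", "<order>"): ["<order>", "Id", "Items", "</order>"],
    ("Id", "<id>"): ["<id>", "</id>"],
    ("Items", "<item>"): ["<item>", "</item>", "MoreItems"],
    ("MoreItems", "<item>"): ["<item>", "</item>", "MoreItems"],
    ("MoreItems", "</order>"): [],  # empty production ends the item list
}

# Extended schema: an optional <note> may follow the items.
# Only the table changes; the engine below is reused as-is.
TABLE_V2 = dict(TABLE_V1)
TABLE_V2[("MoreItems", "<note>")] = ["<note>", "</note>"]

TAG = re.compile(r"</?[A-Za-z_][\w.-]*>")


def tokens(xml):
    """Scan tag tokens; character data and attributes are ignored in this sketch."""
    return TAG.findall(xml) + ["$"]


def validate(xml, table, start="Order"):
    """Generic engine: a stack-driven loop with no schema knowledge of its own."""
    stack = ["$", start]
    toks = tokens(xml)
    pos = 0
    while stack:
        top = stack.pop()
        look = toks[pos]
        if top == "$" or top.startswith("<"):
            # Terminal symbol: it must match the next input token exactly.
            if top != look:
                return False
            pos += 1
        else:
            # Nonterminal: the table decides which production to expand.
            production = table.get((top, look))
            if production is None:
                return False
            stack.extend(reversed(production))
    return pos == len(toks)
```

In this sketch, a document containing a `<note>` element is rejected under `TABLE_V1` and accepted under `TABLE_V2`, while `validate` itself is unchanged. That mirrors the idea of regenerating tables without redeploying the engine.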

1. Introduction

The Extensible Markup Language (XML) format delivers key advantages in interoperability and is widely adopted as a standard for exchanging structured information by Web services. Web services technologies and applications have built on the success of XML by providing standardized delivery of structurally and semantically rich content over the Web, as defined by the Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL) W3C standards. However, the interoperability of XML Web services often comes at the price of reduced efficiency in message composition, transfer, and parsing compared to simple binary protocols. Several studies have evaluated the performance of SOAP and concluded that SOAP and XML incur a substantial performance penalty compared to binary protocols [4, 9, 10]. Parsing and validating XML against a schema is expensive [12, 19], as is deserializing it into usable in-memory objects for applications [6, 10].

Several efforts have addressed parsing and validation performance through grammar-based parser generation, leveraging XML schema languages such as DTD [23], XML Schema [14], and RELAX NG [8] at compile time. Compiled schema-specific parsers [7, 11, 15, 16, 20–22, 24, 25] have shown significant performance improvements. Schema-specific parsers encode parsing states and validation rules at compile time, exploiting the structure of the schema to increase processing efficiency at runtime.
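As a rough illustration of what "encoding parsing states and validation rules" in the parser itself can look like, here is a minimal hand-written sketch in Python. It assumes a toy schema in which `<order>` holds one `<id>` followed by one or more `<item>` elements. The function `validate_order` and the schema are hypothetical and are not output of any of the cited generators.

```python
import re

TAG = re.compile(r"</?[A-Za-z_][\w.-]*>")


def validate_order(xml):
    """Schema-specific validator: the schema lives in the control flow.

    Each position in the code corresponds to one parsing state, so no
    table lookup is needed at runtime. Character data is ignored here.
    """
    toks = iter(TAG.findall(xml))

    def expect(tag):
        # Consume one token and check that it is the required tag.
        return next(toks, None) == tag

    if not (expect("<order>") and expect("<id>") and expect("</id>")):
        return False

    tok = next(toks, None)
    if tok != "<item>":          # at least one <item> is required
        return False
    while tok == "<item>":
        if not expect("</item>"):
            return False
        tok = next(toks, None)

    # After the items, only the closing tag may follow, then end of input.
    return tok == "</order>" and next(toks, None) is None
```

Because the element order is fixed in the code, accepting a new element such as `<note>` would mean editing and rebuilding `validate_order`.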

However, each generated schema-specific parser must be built for the operating system, compiler, supporting libraries, and hardware on which the applications will run. The parser must be regenerated and redeployed whenever an XML schema is updated. This is a significant challenge for developing extensible Web services. A Web service is usually a long-term agreement that allows consumers to interact with the service. To address schema updates, service designers typically add new elements to their schema by changing the source code, adding the required business logic, and rebuilding the service. However, this approach only works for simple services. Consider, for example, a large business application that requires customizations to fit specific industries, countries, and customers. Exposing such business applications as Web services is difficult because they must remain customizable over time, and these customizations must work for all consumers, even consumers that have made changes to the application. Services must therefore be designed to be extensible, and extensible Web services typically require extensible XML schemas.

Our previous work [26] presents table-driven XML (TDX) parsing and validation for high-performance Web services. Our TDX technique utilizes a compact tabular representation of schemas and a push-down automaton (PDA) for single-pass parsing and validation with-


2. Overview of Adaptive pTDX-based Parser

4. Constructing Modular Tables

4.1. Token Table

4.2. Action Table

4.3. Parsing Table

5. Constructing Generic Engine

5.1. Scanning Mode

5.2. Parsing Mode

8. Conclusion

References