DJ: Bridging Java and Deductive Databases
Andrew B. Hall
Thesis submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Master of Science
in
Computer Science
Dr. Eli Tilevich, Chair
Dr. Osman Balci
Dr. Naren Ramakrishnan
May 30, 2008
Blacksburg, Virginia
Keywords and phrases: Database Management System (DBMS), Database, Deductive Database,
Column Oriented Database, Object Oriented (OO) Language, Orthogonal Persistence, Plain
Old Java Object (POJO), Plain Old Java Interface (POJI), Dynamic Proxy, Middleware


ABSTRACT

Modern society is intrinsically dependent on the ability to manage data effectively. While relational databases have been the industry standard for the past quarter century, recent growth in data volumes and complexity requires novel data management solutions. These trends have revitalized interest in deductive databases and highlighted the need for column-oriented data storage. However, programming technologies for enterprise computing were designed for the relational data management model (i.e., row-oriented data storage). Therefore, developers cannot easily incorporate emerging data management solutions into enterprise systems.

To address this problem, this thesis presents Deductive Java (DJ), a system that enables enterprise programmers to use a column-oriented deductive database in their Java applications. DJ does so without requiring that the programmer become proficient in deductive databases and their non-standardized, vendor-specific APIs. The design of DJ incorporates three novel features: (1) tailoring orthogonal persistence technology to the needs of a deductive database with column-oriented storage; (2) using Java interfaces as a primary mapping construct, thereby simplifying method call interception; and (3) providing facilities to deploy lightweight business rules.

DJ was developed in partnership with LogicBlox Inc., an Atlanta-based technology startup.
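The second design feature listed above is using Java interfaces as the primary mapping construct. It can be illustrated with the JDK's `java.lang.reflect.Proxy`. Because the programmer supplies only an interface, every getter and setter call reaches a single `InvocationHandler`, which can route the call to storage.

The sketch below is not DJ's actual API. `ColumnStore`, `Person`, and `PojiBinder` are hypothetical names. The in-memory "one map per attribute" store is an assumption that merely stands in for a column-oriented deductive database.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical column store: each attribute is its own column, mapping entity id -> value.
final class ColumnStore {
    private final Map<String, Map<Long, Object>> columns = new HashMap<>();

    Object get(String column, long id) {
        Map<Long, Object> col = columns.get(column);
        return col == null ? null : col.get(id);
    }

    void put(String column, long id, Object value) {
        columns.computeIfAbsent(column, c -> new HashMap<>()).put(id, value);
    }
}

// A plain old Java interface (POJI); the programmer writes no implementing class.
interface Person {
    String getName();
    void setName(String name);
    Integer getAge();
    void setAge(Integer age);
}

final class PojiBinder {
    // Returns a dynamic proxy that routes getX()/setX(v) to column "x" of one entity id.
    static <T> T bind(Class<T> iface, ColumnStore store, long id) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            int argc = (args == null) ? 0 : args.length;
            if (method.getDeclaringClass() == Object.class) {
                // Keep equals/hashCode/toString usable without touching the store.
                switch (name) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return iface.getSimpleName() + "#" + id;
                }
            }
            if (name.startsWith("get") && argc == 0) {
                return store.get(columnFor(name), id);
            }
            if (name.startsWith("set") && argc == 1) {
                store.put(columnFor(name), id, args[0]);
                return null;
            }
            throw new UnsupportedOperationException("Unmapped method: " + name);
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    // "getName" / "setName" -> column "name"
    private static String columnFor(String methodName) {
        String property = methodName.substring(3);
        return Character.toLowerCase(property.charAt(0)) + property.substring(1);
    }

    public static void main(String[] args) {
        ColumnStore store = new ColumnStore();
        Person sue = bind(Person.class, store, 1L);
        sue.setName("Sue");
        sue.setAge(42);
        System.out.println(sue.getName() + " " + sue.getAge()); // Sue 42
        System.out.println(store.get("name", 1L));              // Sue
    }
}
```

The interface-first choice matters here. `Proxy` can only implement interfaces, so intercepting calls on a concrete class would need bytecode generation. The getter/setter naming convention is an assumed mapping rule chosen for brevity.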

Ramakrishnan has always been incredibly welcoming whenever I have had the occasion to spend time with him, and I greatly enjoyed his “Data Mining” class.

Next, I would like to thank the kind folks at LogicBlox. The work presented in this thesis was the result of a partnership with the company. Wes Hunter and Dave Zook spent many hours helping us understand the concepts and challenges of the LogicBlox technology. Molham Aref believed in the project enough to fund us, and many other people at LogicBlox contributed technical insights, including Greg Brooks, Soeren Oleson, Mark Bloemeke, and Steve Coulson. Special thanks go to Erin Hunter, who handled all of the financial arrangements, both with Virginia Tech and for travel to Atlanta.

I am incredibly thankful for all of my friends and roommates, who were so supportive over the past two years. They were always there to say a kind word, provide encouragement, and in some cases even look over my thesis. My roommates put up with my hectic schedule and even managed not to see me for days at a time. I would also like to thank the graduate school partners and lab-mates with whom I worked here at Virginia Tech, especially Wes Tansey, who was always there to talk things over and to evaluate whether they made sense from an objective point of view.

Finally, I would like to thank my future employer, Microsoft Corporation, for the opportunity to apply the lessons I have learned as a student at Virginia Tech to solving real-world problems.


Contents

  • Abstract
  • Acknowledgments
  • List of Figures
  • List of Tables
  • List of Acronyms
  • 1 Introduction
    • 1.1 Motivation For Change
    • 1.2 Literature Review of Deductive Databases
      • 1.2.1 Database Evolution
      • 1.2.2 Brief History of Deductive Databases
    • 1.3 Statement of the Problem
    • 1.4 Statement of Objectives
    • 1.5 Overview of Thesis
  • 2 LogicBlox: A Real World Need
    • 2.1 LogicBlox Implementation Overview
    • 2.2 Overview of Column-Oriented Databases
    • 2.3 Objectives
    • 2.4 Challenges

  • 3 Related Work
    • 3.1 Database Connectivity
      • 3.1.1 Users
      • 3.1.2 Application Developers
      • 3.1.3 DBMS Vendors
      • 3.1.4 Connectivity Implementations
    • 3.2 Object Oriented Data Mapping
      • 3.2.1 Orthogonal Persistence
      • 3.2.2 Microsoft LINQ
    • 3.3 Applicability of Existing Technologies
      • 3.3.1 JDBC
      • 3.3.2 Java Hibernate
      • 3.3.3 Microsoft LINQ
  • 4 DJ: A Java to Deductive Database Bridge
    • 4.1 Solution Approaches
      • 4.1.1 JDBC Like Approach
      • 4.1.2 Meta Data Types
      • 4.1.3 Meta-Data Datalog Queries
      • 4.1.4 Database Representative Java Classes
      • 4.1.5 Classes versus Interfaces
      • 4.1.6 Meta-Data Bindings
    • 4.2 DJ Implementation
      • 4.2.1 Example in Datalog
      • 4.2.2 Basic “Bind” Functionality
      • 4.2.3 More Advanced “Bind” Functionality
      • 4.2.4 Getter – Setter Functionality
      • 4.2.5 Set Handling
      • 4.2.6 Business Rule Support
      • 4.2.7 Using the Annotated Interfaces
      • 4.2.8 Generating Annotated Interfaces
    • 4.3 Evaluation
      • 4.3.1 Annotated Approach
      • 4.3.2 Code Generation
  • 5 Conclusions and Future Research
    • 5.1 Conclusions
    • 5.2 Contributions
      • 5.2.1 Introduction of POJIs
      • 5.2.2 Object Oriented to Column Oriented Mapping
      • 5.2.3 Automatic deployment of business rules
    • 5.3 Future Research
      • 5.3.1 Dynamic Queries
      • 5.3.2 Asynchronous Queries
      • 5.3.3 Advanced Business Rules
      • 5.3.4 Transaction Management
      • 5.3.5 IDE Support
      • 5.3.6 Query Batching
      • 5.3.7 Multiple Data Sources
  • Bibliography

List of Figures

  • 2.1 Project Visualization
  • 3.1 Database communication through a vendor API
  • 3.2 Database communication through a vendor-independent standard
  • 3.3 The user’s perspective of the need for database connectivity
  • 3.4 The application developer’s perspective of the need for database connectivity
  • 3.5 The DBMS vendor’s perspective of the need for database connectivity
  • 3.6 ODBC Components
  • 3.7 A component DBMS architecture using OLE DB interfaces
  • 3.8 JDBC Architecture
  • 3.9 Orthogonal Persistence Architecture
  • 3.10 Java Hibernate Example
  • 3.11 Sample C# DLinq Application
  • 4.1 Code Generation Tool

List of Acronyms

  • API: Application Programming Interface
  • COM: Component Object Model
  • DB: Database
  • DBMS: Database Management System
  • RDBMS: Relational Database Management System
  • DDB: Deductive Database
  • POJO: Plain Old Java Object
  • POJI: Plain Old Java Interface
  • OO: Object Oriented
  • OOP: Object Oriented Programming
  • OOL: Object Oriented Language
  • ODBC: Open Database Connectivity
  • OLE DB: Object Linking and Embedding, Database
  • ISAM: Indexed Sequential Access Method
  • JDBC: Java Database Connectivity
  • OP: Orthogonal Persistence
  • ECRC: European Computer-Industry Research Centre
  • MCC: Microelectronics and Computer Technology Corporation
  • DJ: Deductive Java
  • VLDB: Very Large Databases
  • LB: LogicBlox
  • JNI: Java Native Interface
  • SQL: Structured Query Language
  • PL/SQL: Procedural Language / Structured Query Language
  • LINQ: Language INtegrated Query
  • IDE: Integrated Development Environment
  • XML: Extensible Markup Language

Andrew B. Hall Chapter 1. Introduction 2

With the increasing volume of global data and the growing range of applications for database technology, the field is currently on the edge of a new era. No one familiar with database technology can dispute that for the past 25 years traditional relational databases from large vendors, such as Oracle, DB2, and SQL Server, have been the gold standard. Almost every large-scale database development effort has used a traditional database on the back end. There are many reasons for the success of traditional relational databases, including excellent performance across a wide spectrum of applications, high reliability, developers' familiarity with the technology, and the support of large vendors.

1.1 Motivation For Change

Despite the overwhelming success and standardized use of traditional relational databases, some people are beginning to question whether they will continue to dominate database technology. For example, Michael Stonebraker of the Massachusetts Institute of Technology (MIT) argues that current mainstream database technology is no longer adequate, on its own, to support the needs of data management systems. He points out that modern relational technology traces its roots directly to the first relational database management system (RDBMS) of the 1970's [96]. He argues that even though the entire landscape of computing (users, processing power, applications, etc.) has changed since the 1970's, database vendors have chosen to stick with their traditional technology, a "one size fits all" strategy [95]. He acknowledges that many improvements to the technology have been made, but contends that they were made for the express purpose of continuing to sell the technology without re-architecting it [96]. In his paper "One Size Fits All? Part 2: Benchmarking Results" [94], Stonebraker presents benchmarking evidence that "the major RDBMSs can be beaten by specialized architectures by an order of magnitude or more in several application areas", including:

  • Text
  • Data Warehouses
  • Stream Processing
  • Scientific and intelligence databases

Nor is Stonebraker alone in his assessment of the current state of database technology. In an industry keynote at Very Large Databases 2007, Michael Brodie said "The confluence of limitations of conventional DBMSs, the explosive growth of previously unimagined applications and data, and the genuine need for problem solving will result in a new world of data management" [17]. He claims that, as a result of the growth of information between 2006 and 2010, less than 5% of the world's data will fall into the "relational" category. Brodie went on to say that the data management world should "embrace these opportunities and provide leadership" for new data management solutions as the field of computing continues to evolve.

This assessment should not surprise anyone familiar with the current computing landscape. As the use of computer and database technology has surged, so have the volumes of data, its applications, and the need for it. It is therefore not surprising that traditional technologies of the 1980's, designed for general-purpose use, can be outperformed by solutions designed for special-case data management scenarios. As Carrie Ballinger of TERADATA aptly points out, "dermatologists don't perform brain surgery, sprinters don't make good long distance runners, and gas-efficient commuter cars don't win stock car races. Nowhere does excellence in one category of endeavor translate to excellence in everything" [10]. The field of data management has seen many non-traditional technologies since its inception; however, very few, if any, have gained widespread use in industry. This thesis presents an alternative type of database technology called a "deductive database". It first gives a brief overview of deductive database technology and its applications. It then presents the research done to develop a bridge that provides deductive database functionality to the Java programming language in an intuitive way, hiding the intricacies of deductive technology from the programmer.

Deductive database systems generally divide their information into two categories:

  • Data or facts, which are normally represented by a predicate with constant arguments. For example, the fact parent(joe, sue) means that Sue is a parent of Joe. Here, parent is the name of a predicate, and this predicate is represented by storing in the database a relation of all the true tuples for this predicate. Thus, (joe, sue) would be one of the tuples in the stored relation [86].
  • Rules or program, normally written in the form

$$P \leftarrow Q_1, \ldots, Q_n$$

This rule is read declaratively as "$Q_1$ and ... and $Q_n$ implies $P$."

Databases of this form are termed Datalog databases [105]. The data or facts are referred to as the extensional database (EDB), and the rules are referred to as the intensional database (IDB) [86].
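The sketch below makes the EDB/IDB split concrete. It stores the `parent` relation as a set of tuples (the EDB) and computes a derived `ancestor` relation (the IDB) from two rules of the form above. Following the convention that parent(joe, sue) means Sue is a parent of Joe, ancestor(X, Y) means Y is an ancestor of X.

Some parts are assumptions for illustration, not taken from the text:
- The `ancestor` predicate and the class names are hypothetical.
- Naive bottom-up evaluation, which reapplies every rule until nothing new is derived, is only one evaluation strategy; real systems commonly use semi-naive evaluation instead.
- The code requires Java 16+ for records.

```java
import java.util.HashSet;
import java.util.Set;

// One tuple of a binary relation, e.g. parent(joe, sue).
record Fact(String first, String second) {}

final class AncestorProgram {
    // IDB rules (hypothetical example):
    //   ancestor(X, Y) <- parent(X, Y).
    //   ancestor(X, Z) <- parent(X, Y), ancestor(Y, Z).
    // Naive bottom-up evaluation: apply the rules to everything derived so far
    // and stop when an iteration adds no new tuple (a fixpoint).
    static Set<Fact> ancestor(Set<Fact> parent) {
        Set<Fact> result = new HashSet<>(parent); // first rule: copy parent
        boolean changed = true;
        while (changed) {
            Set<Fact> derived = new HashSet<>();
            for (Fact p : parent) {
                for (Fact a : result) {
                    if (p.second().equals(a.first())) { // join on the shared variable Y
                        derived.add(new Fact(p.first(), a.second()));
                    }
                }
            }
            changed = result.addAll(derived);
        }
        return result;
    }

    public static void main(String[] args) {
        // EDB: Sue is a parent of Joe, and Ann is a parent of Sue.
        Set<Fact> parent = Set.of(new Fact("joe", "sue"), new Fact("sue", "ann"));
        Set<Fact> result = ancestor(parent);
        System.out.println(result.contains(new Fact("joe", "ann"))); // true
        System.out.println(result.size());                           // 3
    }
}
```

Only `parent` is stored. `ancestor` exists solely as the result of applying rules, which is the sense in which the IDB is "intensional".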

It is important to observe that, despite the additional power that deductive databases offer, they never made their way into commercial technology. Many factors may have contributed to this, including the fact that research on deductive databases occurred in parallel with research on relational databases, so that many of its contributions were applied to relational database implementations. Over the years, relational databases have continued to add support for more and more scenarios and types of data through features such as triggers, stored procedures, and views, to name a few. In light of the failure of deductive databases to make inroads into industry, and the fact that a deductive database can be defined as an extension of a relational database, it is important to understand the basic history of database evolution (both relational and deductive).

Table 1.1: Major RDBMS Releases

Year   RDBMS
1976   System R (IBM)
1978   Oracle
1983   DB2 (IBM)
1987   SQL Server (UNIX Version)
1988   SQL Server (OS/2 Version)
1993   Microsoft SQL Server
1998   MySQL (Windows Version)

1.2.1 Database Evolution

Early database systems began to appear in the mid-1960's and fell into three paradigms: hierarchical systems [101], network model based systems [18], and inverted file systems. These systems had two major shortcomings. They failed to separate the conceptual relationships of data from its physical storage, and they provided only procedural programming language interfaces [34]. The relational model was first published by E. F. Codd of IBM Research, San Jose, California, in 1970 [19]. In this model, the physical storage of data was separated from its conceptual representation. A high-level query (logic) language, in the form of relational calculus and relational algebra, was introduced, allowing the programmer to specify what "was to be done" and the database system to decide "how it was to be done". This query language allowed programmers who were not computer specialists to write declarative queries, and the computer to answer those queries. The subsequent development of syntactic optimization techniques [104, 106] permitted relational database systems to process queries with efficiency competitive with existing hierarchical and network implementations. Table 1.1 gives a historical overview of major RDBMS releases.
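The separation of *what* from *how* can be seen in a small sketch of three relational algebra operators: selection, projection, and natural join. Several parts are assumptions made for illustration: the row-as-map encoding, the class and method names, and the `emp`/`dept` data. The nested-loop join is just one naive way of answering the query, and choosing a better plan is precisely the freedom that a declarative query leaves to the database system.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

// A relation is modeled here as a set of rows; each row maps attribute name -> value.
final class RelationalAlgebra {
    // Selection (sigma): keep the rows that satisfy a condition.
    static Set<Map<String, Object>> select(Set<Map<String, Object>> r,
                                           Predicate<Map<String, Object>> condition) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> row : r) {
            if (condition.test(row)) out.add(row);
        }
        return out;
    }

    // Projection (pi): keep only the named attributes; duplicates collapse in the set.
    static Set<Map<String, Object>> project(Set<Map<String, Object>> r, List<String> attributes) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> row : r) {
            Map<String, Object> projected = new HashMap<>();
            for (String a : attributes) projected.put(a, row.get(a));
            out.add(projected);
        }
        return out;
    }

    // Natural join: combine rows that agree on every attribute name they share.
    static Set<Map<String, Object>> join(Set<Map<String, Object>> r, Set<Map<String, Object>> s) {
        Set<Map<String, Object>> out = new HashSet<>();
        for (Map<String, Object> x : r) {
            for (Map<String, Object> y : s) {
                boolean agree = x.keySet().stream()
                        .filter(y::containsKey)
                        .allMatch(k -> Objects.equals(x.get(k), y.get(k)));
                if (agree) {
                    Map<String, Object> merged = new HashMap<>(x);
                    merged.putAll(y);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<Map<String, Object>> emp = Set.of(
                Map.<String, Object>of("name", "ann", "dept", "cs"),
                Map.<String, Object>of("name", "bob", "dept", "math"));
        Set<Map<String, Object>> dept = Set.of(
                Map.<String, Object>of("dept", "cs", "city", "Blacksburg"),
                Map.<String, Object>of("dept", "math", "city", "Atlanta"));
        // "Who works in Blacksburg?" = pi_name(sigma_city='Blacksburg'(emp JOIN dept))
        Set<Map<String, Object>> answer = project(
                select(join(emp, dept), row -> "Blacksburg".equals(row.get("city"))),
                List.of("name"));
        System.out.println(answer); // [{name=ann}]
    }
}
```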

Even though relational databases used a logic language in relational calculus, relational calculus was not formalized in terms of logic [75]. It was not until 1984 that Raymond Reiter first formalized databases in terms of logic, by observing that relational databases make many underlying assumptions about their data [89]. Specifically, he built on his previous work, observing that, with respect to negation, an assumption was being made that facts

Another key work that advanced the field was a 1984 publication by Minker, Gallaire, and Nicolas titled "Logic and Databases: A Deductive Approach" [74], which surveyed the work in the field at that time. It was followed by a second paper, titled "Logic and Databases: A Response" [38], responding to critics of the first. It was at this time that three major deductive database projects emerged. Work at the European Computer-Industry Research Centre (ECRC), under the direction of Jean-Marie Nicolas, began in 1984. The Logical Data Language (LDL) project began at the end of 1984 within the newly created Microelectronics and Computer Technology Corporation (MCC) research consortium, and in 1985 Stanford began the NAIL! (Not Another Implementation of Logic!) project.

Work by the ECRC can be divided into two major phases: an initial phase from 1984 to 1987, and a second phase from 1988 to 1990. The initial phase of research ('84-'87) led to the development of early prototypes of the deductive systems QSQ, SLD-AL, QoSaQ, and DedGin [108, 109], integrity checking for SQL in Soundcheck by H. Decker [31], and a combination of deductive and object oriented database ideas in KB2 by M. Wallace [112]. The second phase ('88-'90) led to more functional prototypes: Megalog by J. Bocca [15], and DedGin [111] and EKS-V1 [110] by Vieille. The EKS system supported integrity constraints and some forms of aggregation through recursion.

The LDL project was perhaps the most successful deductive database project in history. It began at the end of 1984 in the Database Program of the MCC. The objective was to build a database system with a powerful language for developing complex data-intensive and knowledge-based applications [116]. The name LDL first appeared in a paper by Tsur and Zaniolo [102] at VLDB 1986. That paper describes the main constructs in the language, such as complex terms and recursion, sets, set aggregates, and database schema declarations. The first prototype, completed in 1987, compiled LDL programs into an extended relational algebra language designed for a non-existent MCC parallel database machine. Therefore, in 1987, the deductive computing laboratory of MCC began development of a new stand-alone deductive database system for LDL, which was completed in 1989. The completion of the LDL prototype allowed the development of new applications that revealed the potential of LDL in several areas. These included the rapid prototyping of information systems, intelligent middleware and information brokering, data cleaning and conversion, and data mining. The subsequent delivery of LDL technology to MCC's shareholders resulted in the development of LDL++ in 1992 [118].

Beginning in 1992, the LDL++ system was deployed in several middleware applications. Notably, it served as an important component of the Infosleuth [56] technology suite, which was used to construct intelligent agents that supported the semantic interoperability of distributed databases and heterogeneous information resources. LDL++ continued to be developed at UCLA [115], resolving research issues that limited the language's power. Finally, a new version of LDL++ that included support for non-monotonic recursion and generalized user-defined set aggregates was developed in 1999 by Haixun Wang [113].

The NAIL! project began at Stanford in 1985. It was based on J.D. Ullman's paper "Implementation of Logical Query Languages for Databases" [103], with the goal of studying the optimization of logic using the database-oriented "all-solutions" model. The first paper on Magic Sets [11] and regular recursions [77, 78] resulted from this project in collaboration with the MCC group. The project also made many of the important contributions for handling negation and aggregation in logical rules. Stratified negation [40], well-founded negation [41], and modularly stratified negation [90] were all developed in connection with it. An initial prototype system named YAWN! ("Yet Another Window on NAIL!") [76] was built. It was eventually abandoned because the purely declarative paradigm was found to be unworkable for many applications. The revised system used a core language, called Glue, which consisted of single logical rules wrapped in traditional language constructs such as loops, procedures, and modules [80, 32, 33].
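Stratified negation, one of the contributions mentioned above, can be illustrated by evaluating rules in layers. A predicate may appear negated only if it is defined entirely in a lower stratum, and that lower stratum is computed to completion first. The `reachable`/`unreachable` program below is a hypothetical example and not code from NAIL!. It only shows the evaluation order that stratification enforces, and it requires Java 16+ for records.

```java
import java.util.HashSet;
import java.util.Set;

// A directed edge of a hypothetical graph, i.e. a tuple of edge(From, To).
record Edge(String from, String to) {}

final class StratifiedProgram {
    // Stratum 1 (no negation):
    //   reachable(X) <- start(X).
    //   reachable(Y) <- reachable(X), edge(X, Y).
    static Set<String> reachable(Set<String> start, Set<Edge> edges) {
        Set<String> reached = new HashSet<>(start);
        boolean changed = true;
        while (changed) { // iterate to a fixpoint
            changed = false;
            for (Edge e : edges) {
                if (reached.contains(e.from()) && reached.add(e.to())) {
                    changed = true;
                }
            }
        }
        return reached;
    }

    // Stratum 2 (negation refers only to the lower stratum):
    //   unreachable(X) <- node(X), not reachable(X).
    // reachable is fully computed before "not reachable(X)" is tested,
    // so the negated test never sees a partially derived relation.
    static Set<String> unreachable(Set<String> nodes, Set<String> start, Set<Edge> edges) {
        Set<String> reached = reachable(start, edges);
        Set<String> result = new HashSet<>(nodes);
        result.removeAll(reached);
        return result;
    }

    public static void main(String[] args) {
        Set<Edge> edges = Set.of(new Edge("a", "b"), new Edge("b", "c"), new Edge("d", "a"));
        Set<String> nodes = Set.of("a", "b", "c", "d");
        System.out.println(unreachable(nodes, Set.of("a"), edges)); // [d]
    }
}
```

Evaluating `unreachable` before `reachable` had reached its fixpoint could wrongly report `b` or `c` as unreachable. Stratification rules this ordering out.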

Many other deductive database implementations have arisen over the years, of which an excellent overview can be found in the book “Applications of Logic Databases” [82] edited by R. Ramakrishnan. These include the Aditi project [107] which began at the University of Melbourne in 1988, the CORAL project [84, 85] begun at the University of Wisconsin