Andrew B. Hall
Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science
Dr. Eli Tilevich, Chair
Dr. Osman Balci
Dr. Naren Ramakrishnan
May 30, 2008
Blacksburg, Virginia
Keywords and phrases: Database Management System (DBMS), Database, Deductive Database, Column Oriented Database, Object Oriented (OO) Language, Orthogonal Persistence, Plain Old Java Object (POJO), Plain Old Java Interface (POJI), Dynamic Proxy, Middleware
Modern society is intrinsically dependent on the ability to manage data effectively. While relational databases have been the industry standard for the past quarter century, recent growth in data volumes and complexity requires novel data management solutions. These trends have revitalized interest in deductive databases and highlighted the need for column-oriented data storage. However, programming technologies for enterprise computing were designed for the relational data management model (i.e., row-oriented data storage). Therefore, developers cannot easily incorporate emerging data management solutions into enterprise systems.
To address this problem, this thesis presents Deductive Java (DJ), a system that enables enterprise programmers to use a column-oriented deductive database in their Java applications. DJ does so without requiring the programmer to become proficient in deductive databases and their non-standardized, vendor-specific APIs. The design of DJ incorporates three novel features: (1) tailoring orthogonal persistence technology to the needs of a deductive database with column-oriented storage; (2) using Java interfaces as the primary mapping construct, thereby simplifying method call interception; and (3) providing facilities to deploy lightweight business rules.
DJ was developed in partnership with LogicBlox Inc., an Atlanta-based technology startup.
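Feature (2) rests on intercepting method calls made through a Java interface. This part of the thesis does not show how DJ does this, so the following is only a minimal sketch of the general technique, using the standard `java.lang.reflect.Proxy` API. The `Employee` interface, the `ColumnStore` class, and the `bind` helper are hypothetical illustrations, and the "database" is a set of in-memory maps rather than LogicBlox. The sketch shows that one generic handler can serve any getter/setter interface while storing each attribute in its own column.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class ProxyColumnSketch {

    // Hypothetical Plain Old Java Interface (POJI) for a persistent entity.
    public interface Employee {
        String getName();
        void setName(String name);
        Integer getSalary();
        void setSalary(Integer salary);
    }

    // Hypothetical in-memory column store: one map per attribute (column),
    // keyed by entity id, rather than one record per entity (row).
    public static final class ColumnStore {
        private final Map<String, Map<Integer, Object>> columns = new HashMap<>();

        public void put(String column, int entityId, Object value) {
            columns.computeIfAbsent(column, c -> new HashMap<>()).put(entityId, value);
        }

        public Object get(String column, int entityId) {
            Map<Integer, Object> col = columns.get(column);
            return col == null ? null : col.get(entityId);
        }

        public int columnSize(String column) {
            Map<Integer, Object> col = columns.get(column);
            return col == null ? 0 : col.size();
        }
    }

    // Returns a dynamic proxy that intercepts every call on the interface and
    // routes getX()/setX(v) to column "X" of the store for the given entity id.
    public static <T> T bind(Class<T> iface, ColumnStore store, int entityId) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            // equals, hashCode and toString are also dispatched to the handler.
            if (method.getDeclaringClass() == Object.class) {
                switch (name) {
                    case "equals": return proxy == args[0];
                    case "hashCode": return System.identityHashCode(proxy);
                    default: return iface.getSimpleName() + "#" + entityId;
                }
            }
            if (name.startsWith("set") && args != null && args.length == 1) {
                store.put(name.substring(3), entityId, args[0]);
                return null;
            }
            if (name.startsWith("get") && (args == null || args.length == 0)) {
                return store.get(name.substring(3), entityId);
            }
            throw new UnsupportedOperationException(method.toString());
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        ColumnStore store = new ColumnStore();
        Employee alice = bind(Employee.class, store, 1);
        Employee bob = bind(Employee.class, store, 2);
        alice.setName("Alice");
        alice.setSalary(100);
        bob.setName("Bob");
        System.out.println(alice.getName() + " " + alice.getSalary()); // Alice 100
        System.out.println(store.columnSize("Name"));   // 2 (one entry per entity)
        System.out.println(store.columnSize("Salary")); // 1 (bob has no salary yet)
    }
}
```

The mapping construct here is an interface, not a concrete class. For that reason the programmer's own types need no bytecode instrumentation: the JDK generates the proxy class at run time. A real mapping layer would also need to handle type conversion, entity identity, and transactions, none of which this sketch attempts.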
Ramakrishnan has always been incredibly welcoming whenever I have had the occasion to spend time with him, and I greatly enjoyed his “Data Mining” class.
Next, I would like to thank the kind folks at LogicBlox. The work presented in this thesis was the result of a partnership with the company. Wes Hunter and Dave Zook spent many hours helping us grasp the concepts, challenges, and details of the LogicBlox technology. Molham Aref believed in the project enough to fund us, and many other people at LogicBlox contributed technical insights, including Greg Brooks, Soeren Oleson, Mark Bloemeke, and Steve Coulson. Special thanks go to Erin Hunter, who handled all of the financial arrangements, both with Virginia Tech and for travel to Atlanta.
I am incredibly thankful for all of my friends and roommates, who were so supportive over the past two years. They were always there to say a kind word, provide encouragement, and in some cases even look over my thesis. My roommates put up with my hectic schedule, and even managed to not see me for days at a time. I would also like to thank the graduate school partners and lab-mates with whom I worked here at Virginia Tech, especially Wes Tansey, who was always there to talk things over and evaluate whether they made sense from an objective point of view.
Finally, I would like to thank my future employer, Microsoft Corporation, for the opportunity to apply the lessons I have learned as a student at Virginia Tech to solving real-world problems.
Abstract
Acknowledgments
List of Figures
List of Tables
List of Acronyms
1 Introduction
1.1 Motivation For Change
1.2 Literature Review of Deductive Databases
1.2.1 Database Evolution
1.2.2 Brief History of Deductive Databases
1.3 Statement of the Problem
1.4 Statement of Objectives
1.5 Overview of Thesis
2 LogicBlox: A Real World Need
2.1 LogicBlox Implementation Overview
2.2 Overview of Column-Oriented Databases
2.3 Objectives
2.4 Challenges
API: Application Programming Interface
COM: Component Object Model
DB: Database
DBMS: Database Management System
RDBMS: Relational Database Management System
DDB: Deductive Database
POJO: Plain Old Java Object
POJI: Plain Old Java Interface
OO: Object Oriented
OOP: Object Oriented Programming
OOL: Object Oriented Language
ODBC: Open Database Connectivity
OLE DB: Object Linking and Embedding, Database
ISAM: Indexed Sequential Access Method
JDBC: Java Database Connectivity
OP: Orthogonal Persistence
ECRC: European Computer-Industry Research Centre
MCC: Microelectronics and Computer Technology Corporation
DJ: Deductive Java
VLDB: Very Large Databases
LB: LogicBlox
JNI: Java Native Interface
SQL: Structured Query Language
PL/SQL: Procedural Language / Structured Query Language
LINQ: Language INtegrated Query
IDE: Integrated Development Environment
XML: Extensible Markup Language
Andrew B. Hall Chapter 1. Introduction 2
Due to the increasing volumes of global data and the growing range of applications for database technology, the field is currently on the edge of a new era. No one familiar with database technology would dispute that, for the past 25 years, traditional relational databases from large vendors, such as Oracle, DB2, and SQL Server, have been the gold standard. Almost every large-scale database development effort has used a traditional relational database on the back end. There are many reasons for the success of traditional relational databases, including strong performance across a wide spectrum of applications, high reliability, developers' familiarity with the technology, and the support of large vendors.
Despite the overwhelming success and standardized use of traditional relational databases, some people are beginning to question whether they will continue to dominate the future of database technology. For example, Michael Stonebraker of the Massachusetts Institute of Technology (MIT) argues that current mainstream database technology is no longer adequate, on its own, to support the needs of data management systems. He points out that modern relational technology traces its roots directly back to the first relational database management system (RDBMS) of the 1970s [96]. He argues that even though the entire landscape of computing (users, processing power, applications, etc.) has changed since the 1970s, database vendors have chosen to stick with their traditional technology, a “one size fits all” strategy [95]. He acknowledges that many improvements to the technology have been made, but contends that they were made for the express purpose of continuing to sell the technology without re-architecting it [96]. In his paper “One Size Fits All? Part 2: Benchmarking Results” [94], Stonebraker presents benchmarking evidence that “the major RDBMSs can be beaten by specialized architectures by an order of magnitude or more in several application areas”, including:
Nor is Stonebraker alone in his assessment of the current state of database technology. In an industry keynote at Very Large Databases (VLDB) 2007, Michael Brodie said “The confluence of limitations of conventional DBMSs, the explosive growth of previously unimagined applications and data, and the genuine need for problem solving will result in a new world of data management” [17]. He claims that, as a result of the growth of information between 2006 and 2010, less than 5% of the world’s data will fall into the “relational” category. Brodie went on to say that the data management world should “embrace these opportunities and provide leadership” for new data management solutions as the field of computing continues to evolve.
This assessment should not surprise anyone familiar with the current computing landscape. As the use of computer and database technology has surged, so have the volumes of data, its applications, and the need for it. It is therefore not surprising that the general-purpose technologies of the 1980s can be outperformed by solutions designed for specific data management scenarios. As Carrie Ballinger of TERADATA aptly points out, “dermatologists don’t perform brain surgery, sprinters don’t make good long distance runners, and gas-efficient commuter cars don’t win stock car races. Nowhere does excellence in one category of endeavor translate to excellence in everything” [10]. The field of data management has seen many non-traditional technologies since its inception; however, few if any of these have gained widespread use in industry. This thesis presents an alternative type of database technology called a “deductive database”. It first gives a brief overview of deductive database technology and its applications, and then presents the research done to develop a bridge that provides deductive database functionality to the Java programming language in an intuitive way, hiding the intricacies of deductive technology from the programmer.
Deductive database systems generally divide their information into two categories: facts (the stored data) and rules. A rule has the form

$$P \leftarrow Q_1, \ldots, Q_n$$

This rule is read declaratively as “$Q_1$ and $\ldots$ and $Q_n$ implies $P$.”
Databases of this form are termed Datalog databases [105]. The data or facts are referred to as the extensional database (EDB), and the rules are referred to as the intensional database (IDB) [86].
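To make the rule form and the EDB/IDB split concrete, here is a minimal Java sketch (Java 16 or later, for records) based on the classic ancestor example, which is not taken from this thesis. The `parent` facts play the role of the EDB, and two rules of the form P ← Q1, ..., Qn play the role of the IDB. `DatalogSketch`, `Pair`, and `ancestors` are hypothetical names. The evaluation strategy shown, naive bottom-up iteration to a fixpoint, is only one of several possible strategies; real deductive database systems use more efficient techniques.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DatalogSketch {

    // A binary fact such as parent(ann, bob); records give value-based equality.
    public record Pair(String first, String second) {}

    // Naive bottom-up evaluation of the IDB rules:
    //   ancestor(X, Y) <- parent(X, Y).
    //   ancestor(X, Z) <- parent(X, Y), ancestor(Y, Z).
    // The parent facts passed in form the EDB.
    public static Set<Pair> ancestors(Set<Pair> parent) {
        Set<Pair> ancestor = new HashSet<>(parent); // first rule
        boolean changed = true;
        while (changed) { // repeat until a fixpoint: no rule derives a new fact
            Set<Pair> derived = new HashSet<>();
            for (Pair p : parent) {
                for (Pair a : ancestor) {
                    if (p.second().equals(a.first())) { // join on shared variable Y
                        derived.add(new Pair(p.first(), a.second()));
                    }
                }
            }
            changed = ancestor.addAll(derived); // second rule
        }
        return ancestor;
    }

    public static void main(String[] args) {
        Set<Pair> parent = new HashSet<>(List.of(
                new Pair("ann", "bob"),
                new Pair("bob", "cal"),
                new Pair("cal", "dee")));
        Set<Pair> ancestor = ancestors(parent);
        System.out.println(ancestor.size()); // 6
        System.out.println(ancestor.contains(new Pair("ann", "dee"))); // true
    }
}
```

Each pass re-joins every parent fact with every ancestor fact derived so far, and evaluation stops once a pass adds nothing new. Semi-naive evaluation, which joins only against facts that are new since the previous pass, avoids much of this repeated work.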
Despite the additional power that deductive databases offer, they never made their way into commercial technology. Many factors may have contributed to this, including the fact that research on deductive databases occurred in parallel with research on relational databases, so that a large number of its contributions were applied to relational database implementations. Over the years, relational databases have continued to add support for more scenarios and types of data through features such as triggers, stored procedures, and views. Given the failure of deductive databases to make inroads into industry, and the fact that a deductive database can be defined as an extension of a relational database, it is important to understand the basic history of database evolution (both relational and deductive).
Table 1.1: Major RDBMS Releases

Year   RDBMS
1976   System R (IBM)
1978   Oracle
1983   DB2 (IBM)
1987   SQL Server (UNIX Version)
1988   SQL Server (OS/2 Version)
1993   Microsoft SQL Server
1998   MySQL (Windows Version)
Early database systems began to appear in the mid-1960s and fell into three paradigms: hierarchical systems [101], network model based systems [18], and inverted file systems. Two major shortcomings of these systems were that they failed to separate the conceptual relationships of data from its physical storage, and that they provided only procedural programming language interfaces [34]. The relational model was first published by E.F. Codd of IBM Research, San Jose, California, in 1970 [19]. In this model, the physical storage of data was separated from its conceptual representation, and high-level query (logic) languages in the form of relational calculus and relational algebra were introduced. These languages allowed the programmer to specify what “was to be done” and left the database system to decide “how it was to be done”. As a result, programmers who were not computer specialists could write declarative queries and have the computer answer them. The subsequent development of syntactic optimization techniques [104, 106] permitted relational database systems to process queries with efficiency competitive with existing hierarchical and network implementations. Table 1.1 gives a historical overview of major RDBMS releases.
Even though relational databases used a logic language in the form of relational calculus, they were not formalized in terms of logic [75]. It was not until 1984 that Raymond Reiter first formalized databases in terms of logic by observing that relational databases make many underlying assumptions about their data [89]. Specifically, he built on his previous work observing that with respect to negation, an assumption was being made that facts
Another key work that advanced the field was a 1984 publication by Minker, Gallaire, and Nicolas titled “Logic and Databases: A Deductive Approach”, which surveyed the work in the field at that time [74]. It was followed by a second paper, titled “Logic and Databases: A Response”, which responded to critics of the first [38]. At this time, three major deductive database projects emerged. Work at the European Computer-Industry Research Centre (ECRC) under the direction of Jean-Marie Nicolas began in 1984. The Logical Data Language (LDL) project began at the end of 1984 at the newly created Microelectronics and Computer Technology Corporation (MCC) research consortium, and in 1985 Stanford began the NAIL! (Not Another Implementation of Logic!) project.
Work by the ECRC can be divided into two major phases: an initial phase from 1984 to 1987, and a second phase from 1988 to 1990. The initial phase of research (’84-’87) led to the development of early prototypes of the deductive systems QSQ, SLD-AL, QoSaQ, and DedGin [108, 109], integrity checking for SQL in Soundcheck by H. Decker [31], and a combination of deductive and object-oriented database ideas in KB2 by M. Wallace [112]. The second phase (’88-’90) led to more functional prototypes: Megalog by J. Bocca [15], and DedGin [111] and EKS-V1 [110] by Vieille. The EKS system supported integrity constraints and some forms of aggregation through recursion.
The LDL project was perhaps the most successful deductive database project in history. It began at the end of 1984 in the Database Program of the MCC. The objective was to build a database system with a powerful language for developing complex data-intensive and knowledge-based applications [116]. The name LDL first appeared in a paper by Tsur and Zaniolo [102] at VLDB 1986; the paper describes the main constructs in the language, such as complex terms and recursion, sets, set aggregates, and database schema declarations. The first prototype, completed in 1987, compiled LDL programs into an extended relational algebra language designed for a non-existent MCC parallel database machine. Consequently, in 1987, the deductive computing laboratory of MCC began development of a new stand-alone deductive database system for LDL, which was completed in 1989. The completion of the LDL prototype allowed the development of new applications that revealed the potential of
LDL in several areas, including the rapid prototyping of information systems, intelligent middleware and information brokering, data cleaning and conversion, and data mining. The subsequent delivery of LDL technology to MCC’s shareholders resulted in the development of LDL++ in 1992 [118].
Beginning in 1992, the LDL++ system was deployed in several middleware applications, including as an important component of the InfoSleuth [56] technology suite, which was used to construct intelligent agents that supported the semantic interoperability of distributed databases and heterogeneous information resources. LDL++ continued to be developed at UCLA [115], where work resolved research issues that had limited the language’s power. Finally, a new version of LDL++ that included support for non-monotonic recursion and generalized user-defined set aggregates was developed in 1999 by Haixun Wang [113].
The NAIL! project began at Stanford in 1985. It was based on J.D. Ullman’s paper “Implementation of Logical Query Languages for Databases” [103], with the goal of studying the optimization of logic using the database-oriented “all-solutions” model. The first papers on Magic Sets [11] and on regular recursions [77, 78] resulted from this project in collaboration with the MCC group. Many important contributions for handling negation and aggregation in logical rules also came from the project: stratified negation [40], well-founded negation [41], and modularly stratified negation [90] were all developed in connection with it. An initial prototype system named YAWN! (“Yet Another Window on NAIL!”) [76] was built, but was eventually abandoned because the purely declarative paradigm was found to be unworkable for many applications. The revised system used a core language called Glue, which consisted of single logical rules wrapped in traditional language constructs such as loops, procedures, and modules [80, 32, 33].
Many other deductive database implementations have arisen over the years, of which an excellent overview can be found in the book “Applications of Logic Databases” [82] edited by R. Ramakrishnan. These include the Aditi project [107] which began at the University of Melbourne in 1988, the CORAL project [84, 85] begun at the University of Wisconsin