
Masaryk University

Faculty of Informatics

Data-Centric Web Application Framework

Jan Pazdziora

Dissertation

Brno, September 2003

Advisor: Prof. Dr Jiří Zlatuška, CSc.

Produced from DocBook XML source using The SAXON XSLT Processor from Michael Kay and XEP – an XSL Engine for PDF developed by RenderX, Inc. Printed using TrueType fonts Georgia by Microsoft Corporation and Luxi Mono by Bigelow & Holmes, Inc. and URW++.

Table of Contents

Abstract

Introduction

  • I. The Motivation: Area and Problem Statement
      1. Growing depth of Web technology deployment
      2. Text-based interfaces promote quick solutions
      3. Multiplicity of syntax and markup
      • 3.1. Server side includes
      • 3.2. CGI scripts and programs
      • 3.3. Templating techniques
      • 3.4. Access to database backend
      4. Lack of clear internal interfaces
      • 4.1. Interfaces in current Web applications
      • 4.2. Communication among team members
      5. The goal of this work
  • II. The Background: Characteristics of Web-Based Applications
      6. Network-based applications
      • 6.1. Applications and data on client
      • 6.2. Applications on client and data on server
      • 6.3. Both applications and data on server
      • 6.4. Conclusion of classification
      7. Request and response nature of HTTP
      8. Centralized server part
      • 8.1. Central and professional maintenance
      • 8.2. Fast development cycles
      • 8.3. Instant upgrades
      • 8.4. Scalability using external resources
      • 8.5. Performance
      9. Client side highly distributed
      • 9.1. Uncontrolled client side
      • 9.2. Increased feedback potential
      • 9.3. Cost efficiency
      • 9.4. Web in all information-handling environments
      10. Use of extreme programming
      • 10.1. Specification of the methodology
      • 10.2. Cost of change
      • 10.3. Extreme programming and Web projects
  • III. The Solution: Intermediary Data Layer
      11. Application data in native data structures
      12. Data matched against formal structure
      • 12.1. Data serialization
      • 12.2. Overview of the data structure description format
      • 12.3. Examples of data structure description document
      • 12.4. Operation of data-centric applications
      • 12.5. Design notes
      13. Data structure description reference
      • 13.1. Processing steps
      • 13.2. Element names
      • 13.3. Input parameters
      • 13.4. Including fragments of data structure description
      • 13.5. Data placeholders
      • 13.6. Regular elements
      14. Output postprocessing
      • 14.1. The presentation postprocessing
      • 14.2. Alternate HTTP response
      • 14.3. Operation in CGI mode
      15. Implementation
      • 15.1. RayApp reference implementation
      • 15.2. Performance of the implementation
      16. Structure description as a communication tool
      • 16.1. Analysis
      • 16.2. Development and tests of application code
      • 16.3. Development and tests of presentation layer
      • 16.4. Structure of the Web system
      • 16.5. Handling changes
      • 16.6. Audit and logging
      17. Multilingual Web systems
      • 17.1. Localized database data sources
      • 17.2. Application-transparent setting
      • 17.3. Localization of other database objects
      • 17.4. Localization of time and numeric values
      • 17.5. Preprocessed stylesheets for static texts
      • 17.6. Multiple content data sources
      18. Migration and implementation rules
      • 18.1. Separation of tasks
      • 18.2. Migration of existing applications
      19. Conclusion and further work
  • References
  • Curriculum Vitæ

List of Figures

  • 2.1. Server validates documents but sends even those not well-defined.
  • 4.1. Structure of Web application based on templates.
  • 4.2. Interfaces in Web systems.
  • 7.1. Requests and responses in user's work with Web-based system.
  • 11.1. Core of a Web application.
  • 11.2. Web application with native data output.
  • 12.1. DSD-based Web application.
  • 17.1. Database table courses.
  • 17.2. Localized database views.

List of Tables

  • 3.1. Tags and escaping in HTML.
  • 3.2. String quoting and escaping in Perl.
  • 6.1. Categories of network-based applications.
  • 8.1. Numbers of applications to Masaryk University.
  • 13.1. DSD: Reserved element names.
  • 13.2. DSD: Attributes of parameter specification.
  • 13.3. DSD: Parameter types.
  • 13.4. DSD: URIs allowed in typeref.
  • 13.5. DSD: Attributes of data placeholders.
  • 13.6. DSD: Built-in data types.
  • 13.7. DSD: Values of attribute multiple.
  • 13.8. DSD: Attributes for application localization.
  • 13.9. DSD: Conditional attributes for regular elements.
  • 13.10. DSD: Attributes for attribute management.
  • 15.1. RayApp performance.
  • 15.2. Apache HTTP server performance.

List of Examples

  • 2.1. A snippet of HTTP communication, using telnet as client.
  • 2.2. A simple CGI script.
  • 2.3. A simple HTML page.
  • 2.4. Simple page in valid XHTML.
  • 3.1. Server side includes.
  • 3.2. Part of email message.
  • 3.3. Perl code printing HTML markup.
  • 3.4. Using function calls to generate markup.
  • 3.5. PHP script producing HTML.
  • 3.6. PHP script printing data without escaping.
  • 3.7. Mixture of syntax in PHP script.
  • 3.8. PHP script running SQL queries without escaping.
  • 3.9. Perl script using bind parameters.
  • 3.10. SQL queries can still be created dynamically.
  • 4.1. Representing list of students in HTML.
  • 4.2. Script doing updates and producing HTML.
  • 4.3. Script calling module.
  • 7.1. HTTP request with GET method.
  • 7.2. Simple mod_perl handler.
  • 12.1. Data structure description with a single scalar placeholder.
  • 12.2. Output XML for various return data structures.
  • 12.3. Data structure description with only root element.
  • 12.4. Output XML for single root placeholder.
  • 12.5. Data structure description for a search application.
  • 12.6. Returned list of people for the search application.
  • 12.7. Output XML for list of people.
  • 12.8. Associative array is equivalent to regular array.
  • 12.9. Single person result for the search application.
  • 12.10. Output XML for a single-person result.
  • 12.11. Associative array naming with values.
  • 12.12. Serializing associative array.
  • 14.1. Perl module for data-centric processing in external process.
  • 14.2. Perl module for server-side postprocessing of CGI application.
  • 15.1. RayApp Synopsis.
  • 17.1. Code handling multiple languages with multiple branches.

Introduction

Suppose that Alice wants some piece of information from Bob. She walks into his office and asks him. Alternatively, if she is not in the same building, she may pick up a phone and call. Bob recognizes Alice's face or voice and tells her whatever she was inquiring about because he knows Alice is entitled to know. Maybe the information is public and he does not even have to verify Alice's identity: he is ready to tell the answer to anybody who cares to ask. Examples may include an employee asking the personnel department how many days off she has left in the current year, or the head of a department verifying her employees' bonuses. Typical queries about public information are students asking the registrar's office about the schedule of a course or exam at a university, or a customer interested in the price of a certain product that Bob's company might be selling.

Chances are, Bob does not hold all the knowledge in his head. He may use a computer to store and manage the information. So whenever somebody walks in or calls him on the phone to ask, he uses his computer to find the pieces of information needed, and then reads the answer from the screen of his monitor. Alternatively, Bob might not know the answer at all; nevertheless, he may redirect the question to someone else who holds the necessary information in his or her head, or computer. The questions that Bob answers may also come in paper letters or be faxed in, or may arrive in Bob's email folder. Then, Bob has to write the answers down.

In the described situation, Bob serves only as an intermediary who shifts information around for many questions and answers, often adding no extra value. What he may add, however, are mistakes and typos when reading and reciting the answers or writing them down, because people are notoriously poor at routine jobs such as reading and copying numbers or long texts, and certainly poorer than computers. Moreover, it is necessary either to catch Bob in his office at the time he is there, or to wait until he finds time to write the answer and send it back. If there are many people who need information stored somewhere in Bob's head, office or computer, Bob becomes an obstacle rather than a help in getting the answers. Everyone depends on a human with limited availability and throughput.

For Bob to give correct answers, he has to keep his knowledge up-to-date and keep records of any changes that happen in real life. Suppose Bob has schedules of courses at the university in his notepad or in his computer. Whenever a change occurs because two teachers swapped lecture rooms or a new course or a presentation by a visiting professor got scheduled, Bob has to be notified, so that the next time he is asked, he provides correct information. Thus, people pour into Bob's office or send in papers, to let him know about changes that have happened.

Since the information that Bob provides to Alice and other people is probably stored in his computer in some way, Alice and others may want to be able to access the information directly, to learn what they may want to know without bothering Bob. Or without being bothered with Bob. Bob in turn would have more time to provide specialized advice and consultations, or explain the data in nontypical cases.

In order to access information stored on a remote machine, computer networks are used as the core infrastructure that allows computers to communicate, effectively allowing users of those computers to drive them to fetch or change data on other machines. Preferred models of storing and retrieving information using computers have shifted over time, as the technology has evolved and the relative strengths of computing power and storage versus networking capabilities (all in relation to price) have changed. The shift went from centralized computer systems with no direct remote access, to terminal-accessible computers, and on towards highly fragmented computer systems in the form of desktop PCs (personal computers). It then went back, via local area networks (LAN) and their occasional interconnections, towards their total integration [FitzGerald01]. Nowadays, a computer network is practically a synonym for the Internet, using Transmission Control Protocol/Internet Protocol (TCP/IP, [Postel81, Comer95]) for connection-oriented data transfers, with a set of additional standards for controlling the network and for specialized tasks, for example the Domain Name System (DNS, [Mockapetris87]) for name resolution. From the technical point of view, any computer in the world can be connected to the network, and there is only one global network, the Internet. The computer network infrastructure needed to reach any computer and process information stored on it is in place.

Being able to transport pieces of information across computer networks using wires or waves is only the beginning of their reasonable utilization. It is more important whether the computers are able to conduct a communication which would empower a human (or software program) at one end with reasonable access to information stored and managed on the remote system. In order to achieve this task of reaching and utilizing the remote pieces of information, sets of communication protocols, standards and formats were defined and became de facto standards all around the globe. The most viable since the 1990s is the HyperText Transfer Protocol (HTTP, [HTTP]), which together with the HyperText Markup Language (HTML, [HTML]) formed the World Wide Web (WWW, or simply the Web, [Berners96]) as a user-friendly layer on top of the raw network.

Unified global protocols of the Web enable anybody who has a computer connected to some sort of network providing access to the Internet to fetch the news from CNN's Web site, or more specifically to use software that utilizes open standards to retrieve and display the document addressed by the http://www.cnn.com/ URL (Uniform Resource Locator, [Fielding94]). This is not to say that all computers are connected or that all people have access to the Internet. No, the world is not that far yet [Global02]. However, the point is that if somebody has a computer or is just getting one, it is reasonable to assume that such a device will be able to provide its users with access to information sources on the Internet, and notably on the Web. That includes hardware


of the organization may comprise hundreds of applications and their maintainability becomes the corner-stone of its long-term viability.

Observations about Web-based information system development and deployment depicted in this work come mainly from the development of the Information System of Masaryk University [ISMU02]. This system for study administration has been gradually built since December 1998, supporting Masaryk University's conversion towards credit-based study programs and providing an administrative and communication tool for all of its over twenty thousand full-time and nine thousand part-time students, all teachers, administrative staff and management of the university, faculties and departments. Alternative views were also gathered when the system was deployed at the Faculty of Humanity Studies of Charles University and at the private College of Finance and Administration. These two institutions outsourced the system and supplemented the in-house experience of the development team [Brandejs01] with a provider-supplier case.

The IS MU project uses open and progressive technologies, both during development and in production operation. However, as the number of applications in the system reached into the hundreds, existing styles of Web application development proved to be suboptimal. The experience with the current frameworks and the analysis of their shortcomings (Part I) come from direct daily use of common development approaches. The design of current dynamic applications on the Web is mostly user-interface-driven, even if some approaches try (and mostly fail) to separate content from presentation. Applications only use one output markup language, reducing reusability of the system when a different format is called for. In addition, the output HTML markup is spread across the code of individual applications and supporting modules, and the way output pages are generated leads very easily to non-valid documents.

In the Web environment, changes usually come from outside and are caused by external development in the technology. Thus, they cannot be avoided. External change and uncontrolled user bases are among the most prominent aspects of Web-based systems and affect the way dynamic administrative systems can be developed and deployed (Part II). Often, a minor change would be needed in the output format, leading to modification of all applications in the system because the markup is stored in those applications.

Therefore, a move from presentation-driven, HTML-bound development towards a framework which will be presentation-independent and data-centric is called for (Part III). Data structure description is designed to define the interface between business logic and presentation, bringing formalization to the application design, development, and testing, both in the content and in the presentation part. Using native data structures, the output markup is completely removed from the code of individual applications. Clean interfaces leave enough freedom for programmers and Web designers to perform their art, while bringing structure and validation to an otherwise textual Web system. On top of the data-centric model, multiple output formats and multilingual applications can be created, leaving the application code free of any format- or language-specific branches.
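The separation described above can be illustrated with a minimal Python sketch. It is not the data structure description format or the RayApp interface that Part III introduces; the function names `find_people` and `to_xml`, the sample records, and the `item` element name are assumptions made only for this illustration. The application code returns a plain native data structure and contains no markup, while a separate generic step serializes that structure for the presentation layer.

```python
import xml.etree.ElementTree as ET


def find_people(query):
    """Hypothetical application code: returns native data, no markup."""
    people = [
        {"id": 1, "name": "Alice"},
        {"id": 2, "name": "Bob"},
    ]
    matches = [p for p in people if query.lower() in p["name"].lower()]
    return {"people": matches}


def to_xml(tag, value):
    """Serialize nested dicts, lists, and scalars into an XML element."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(key, child))
    elif isinstance(value, list):
        # "item" is an arbitrary element name chosen for this sketch.
        for item in value:
            element.append(to_xml("item", item))
    else:
        element.text = str(value)
    return element


# The serialization step is generic; the application never prints markup.
document = ET.tostring(to_xml("result", find_people("ali")), encoding="unicode")
```

Because `find_people` knows nothing about the output format, a different serializer or stylesheet could be substituted without touching it, which is the property the paragraph above argues for.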

The framework is believed to be production ready, should the development team of the IS MU or of any other project wish to utilize it. This work includes both a business-oriented analysis to help project managers discover the new methodology and decide whether it ought to be deployed in the development of their Web systems, and a lower-level technical view of the solution aimed at application programmers, which should make them comfortable with the radically different approach and provide them with an implementation roadmap.


Chapter 1. Growing depth of Web technology deployment

While the original design goal of the World Wide Web (the Web) technology addressed the need of sharing and linking information to facilitate communication among individuals [Berners96], it has grown into a strong commercial channel and a way through which millions of business operations are conducted every day. Firms have realized the potential of the Web to bring (or take away) customers; other organizations can see its potential as a tool for cost reduction, free speech, and easy information distribution. Thus, even if the sharing capability of the Web is still in place, it also holds a strong position as a technical means of core operation of organizations and enterprises of any size [Tapscott96]. Such usage is more internally focused and often completely replaces the original internal infrastructure of the firm or institution. Due to these technological achievements, new virtual enterprises appear and the notions of firm and business get redefined [Zwass98]. The whole new e-commerce area is based on new communication paths being available, which enables new groups of people and entities to conduct communication and information exchange.

The speed and level to which firms and organizations adopt the Web technology depends on the benefits that they see coming from this medium. Internet-oriented businesses that based their core competence on the new communication channels and on customer and user behavior, like eBay, E-Trade, or Amazon.com [CENISSS00], certainly had a higher commitment to the Web from the very beginning of their operation than classical enterprises that only slowly accepted the Internet into their plans and carefully weighed each move, often without a clear long-term strategy [Raghunathan97].

The depth of Web applications deployment can be explored from various points of view, focusing on the technical or procedural integration. The technical aspect leads to generations [Copeland97] based on features that can be seen by the external user:

  1. Information publication, usually static pages.
  2. Queries against database providing on-line read-only access.
  3. Direct updates of the database (like orders or payments) possible.

The primary criterion in this classification is the direction of actions that end users can take with the data in the system.

From the operational point of view, it is more relevant to focus on levels to which the Web interface is linked to the core operation of the organization [Felix97]:

  1. Information in static pages, maintained either by hand or generated off-line from internal primary systems.
  2. A small number of dynamic services on top of this static data, for example searches, completely separated from the core data sources.
  3. Actions and changes can be conducted in the Web system, but the database is only occasionally synchronized with the primary systems.
  4. The Web applications access the same data sources as other systems in the organization, on-line. They support the same processes, often taking over their functions.

The change from level three to level four seems to be the most fundamental because only here is the Web interface considered equal to other, more traditional means of data handling and information processing.

The processes that are conducted in the organization and their relation to records in computer systems are yet another type of possible classification [Pazdziora01b]:

  1. Actions are done on paper or in other material form and passed over for later digitization, which is often done by separate specialized departments.
  2. The person who did the actions in real life records them into the information system themselves, and is responsible for this record.
  3. Actions are only done in the computer system and they are only considered finished when they are recorded in the database. Any material form (transcript, confirmation) which is produced is based on the primary electronic records.

Again, the shift to the last stage is the essential one because it modifies the notion of processes and actions in the organization. At the same time, it removes inconsistencies between what is the state of affairs in reality and what is recorded in the computer system.

Note that the classifications above are fairly independent: the information system may be fully on-line while the data is still entered based on paper records. And vice versa, all employees may be updating data in the Web-based system and be responsible for it, but the interconnection with the core information systems is only done off-line, in batch jobs. Nevertheless, a shift to higher levels in one classification is often accompanied by a reevaluation of the role of the Web-based system as a whole and followed by a shift in other categories as well.

While many large enterprises have already moved their operation to the higher levels of the previous classifications [Chang00], small and medium enterprises [NOIE01], educational institutions or state, municipal and public bodies [EC01, ESIS01] often find themselves at the very beginning. The Web is still seen as an unnecessary burden by some institutions, so when the external need finally prevails, only the simplest and fastest solution is used to cover the immediate pressure. This is quite understandable: big commitments cannot be expected when the technology and its outcomes are new for the organization.


Chapter 2. Text-based interfaces promote quick solutions

The beginning of the design of the World Wide Web dates back to 1989. The design was produced with flexibility and interoperability as the main concept. For example, the Universal Resource Identifier [URI] as the addressing mechanism on the Web was built around the following three design criteria [Berners94]:

  • Extensibility;
  • Completeness;
  • Printability.

While extensibility and completeness are usual requirements, printability serves a special goal. Printability means that the naming of resources and documents on the Web is text-based and human-readable. Thus, the address of a document can be, and is expected to be, written down or spelled out by a human, without any machine needed.

Likewise, the HyperText Transfer Protocol (HTTP) [HTTP] and HyperText Markup Language (HTML) [HTML] are textual formats, or at least text-based. In HTTP [Fielding99], the request line and request headers, the status line of the response, as well as the response headers are defined in terms of text lines. Even the separation of the HTTP message headers from the message body is denoted as an empty line in the protocol specification. The notion of lines suggests that the information is meant to be manipulated by text editors and in text-oriented programming paradigms, as opposed to protocols that are defined in terms of fields with binary types and binary content.

Example 2.1. A snippet of HTTP communication, using telnet as client.

    $ telnet www 80
    Trying 147.251.48.1...
    Connected to www.
    Escape character is '^]'.
    GET /~adelton/ HTTP/1.
    Host: www
    Accept-Language: cs,en

    HTTP/1.1 200 OK
    Date: Thu, 10 Jul 2003 13:55:33 GMT
    Server: Apache/1.3.27 (Unix) mod_ssl/2.8.12 OpenSSL/0.9.7a
    Content-Location: index.html.cs
    Vary: negotiate,accept-language,accept-charset
    Last-Modified: Mon, 30 Jun 2003 10:27:35 GMT
    Accept-Ranges: bytes
    Content-Length: 2857
    Connection: close
    Content-Type: text/html; charset=iso-8859-

... and many additional lines of HTTP response headers and body follow.
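The line-oriented structure visible in the transcript can be taken apart with ordinary string operations, which is a large part of why such protocols are easy to script. The following Python sketch is only an illustration under simplifying assumptions: the function name `split_http_message` and the sample response are made up here, and the code ignores details that a real HTTP client must handle, such as repeated headers, folded header lines, and chunked bodies.

```python
def split_http_message(raw: str):
    """Split a raw HTTP message into start line, headers, and body."""
    # The first empty line separates the headers from the body;
    # HTTP terminates each line with CR LF.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# A made-up response used only to exercise the function.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 13\r\n"
    "\r\n"
    "Hello world.\n"
)
start_line, headers, body = split_http_message(raw_response)
```

No binary field layout has to be known in advance; splitting on line ends and on the first colon is enough to recover the structure of the message.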

The text- and line-oriented nature of the core Web protocol makes it possible to watch and debug the Web operation using generic utilities like telnet, as demonstrated by a transcript of real communication in Example 2.1. Similarly, the simplest Web application in the form of a Common Gateway Interface (CGI) [NCSA95] script only takes a few lines in typical scripting environments. Example 2.2 shows a Perl script that outputs an HTTP header and body, separated by an empty line, denoted by two new-line characters. In order to make the program useful from a Web browser and to achieve some dynamic actions, processing of input parameters would likely need to be added. For the same reason, the content type might be changed to HTML, making use of HTML forms and hyperlinks. Nonetheless, both the input data and the output would still be in textual form.

Example 2.2. A simple CGI script.

    #!/usr/bin/perl
    # Print the HTTP header followed by an empty line, then the body.
    print "Content-Type: text/plain\n\n";
    print "Hello world.\n";
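The text notes that processing of input parameters would likely need to be added to make such a script useful. The sketch below shows that step in Python rather than Perl, purely as an illustration: the parameter name `name` and the function `cgi_response` are assumptions of this example, not part of the original script. It relies on the CGI convention that the server passes the query string to the program in the `QUERY_STRING` environment variable.

```python
#!/usr/bin/env python3
import os
import sys
from urllib.parse import parse_qs


def cgi_response(query_string: str) -> str:
    """Build a complete CGI response: header, empty line, body."""
    params = parse_qs(query_string)
    # "name" is a hypothetical parameter chosen for this sketch.
    name = params.get("name", ["world"])[0]
    return "Content-Type: text/plain\n\n" + f"Hello {name}.\n"


if __name__ == "__main__":
    # A CGI-capable server sets QUERY_STRING before running the script.
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

Both the input (a URL-encoded string) and the output (header lines and body) remain plain text, as the paragraph above points out. The response is sent as text/plain here; if the content type were changed to HTML, the parameter value would have to be escaped before being placed into the markup.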

The HTML standard is based on SGML (Standard Generalized Markup Language) [Goldfarb91] and again, it is a human-readable format which can be manipulated by text tools. Certainly more sophisticated editors and approaches can be found, but a simple HTML page like the one in Example 2.3 can be authored by anybody in a plain text editor.

Example 2.3. A simple HTML page.

Title Select what you like most: All for forty

Text-based formats tend to be open, non-proprietary, and easy to understand and handle. Data in text formats suggest that they can be created and manipulated by humans, not only by computers. Indeed, that was one of the design goals behind HTML. However, humans are well known for not being very good at routine work, and composing HTML pages in a text editor while trying to keep the syntax correct is certainly routine work that invites typos and omissions. Furthermore, when dealing with textual data, humans rely heavily on external knowledge and derived context. People can figure out missing pieces to understand the whole document or data set; they can work with partial information. Computers are only expected to do that when specifically told to, and certainly not in a case such as payroll handling.
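The kind of omission that slips into hand-written markup can be caught mechanically. The following Python sketch uses the standard `html.parser` module to report tags that are closed out of order or never closed. The class and function names are assumptions of this example, and the check is deliberately stricter than HTML itself, which allows some end tags (such as those of paragraphs or list items) to be omitted; it is closer in spirit to the well-formedness rules of XHTML than to a full validator.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML (a partial list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TagBalanceChecker(HTMLParser):
    """Track open tags on a stack and record mismatches."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> open and close at once.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_markup(markup: str):
    """Return a list of problems; an empty list means tags are balanced."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in checker.open_tags]
```

For instance, `check_markup("<p><b>bold</b>")` reports the paragraph that was never closed, which is exactly the sort of slip a human author makes and a forgiving browser silently accepts.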

Unfortunately, the Web browsers have greatly contributed to the fact that incorrect HTML code is used all around the Internet because they are too forgiving to broken syntax, only to help the
