



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An analysis of the SAS System as a cost-effective alternative to commercial Database Management Systems (DBMSs) for data storage. It discusses the features of DBMSs and how they compare to the SAS System, including data inventory services, update processing, relational data model, portability, multiple users access, and more.
Typology: Schemes and Mind Maps
1 / 6
This page cannot be seen from the preview
Don't miss anything!




Commercial Database Management Systems (DBMSs) provide applications with fast access to large quantities of data. In addition, many have other capabilities such as data integrity services, data sharing, application-creation tools, and report writing. Version 6 of the SAS® System also contains a number of similar features.
This paper examines the database features of the
services offered by several popular DBMSs. The conclusion is that the SAS System can provide a cost-effective alternative to a commercial DBMS for the storage of data.
Database Management Systems have been available for more than two decades and are frequently used as a repository for data. The applications that use this data are often not part of the DBMS and are either purchased from another vendor or developed by the user.
The SAS System is widely used as an application for data analysis. The data may come from a variety of repositories, including a number of DBMSs.
A definition of a DBMS is offered to use as the basis for answering the question posed in the paper's title. An inventory of features found in current DBMSs is provided and this inventory is compared to the DBMS features found in the SAS System.
With this background, an answer to the question of whether or not the SAS Systel)'l is a DBMS is given. More relevant, however, than the name you call your data repository are the features you really require from it.
An argument is made that the data management facilities in the Version 6 SAS System have matured sufficiently so that it is a viable candidate for your data repository.
Finally some of the DBMS features planned for future releases of the SAS System are identified.
A DBMS is a software package that provide's a repository for computerized data. The DBMS is responsible for storing the user's data in the repository and making it available upon demand, Users of the data are shielded from the details and peculiarities of the computer software and hardware by the DBMS. That is, a DBMS separates the application from the data. This separation is a key point and will be discussed in more detail.
A database is the term used in this paper for a logical collection of data managed by a DBMS. The terms record, row, and observation are synonyms as are column, field, and variable.
Data Separation
The objective is to separate the application from the data so that the application can focus on the external or logical aspects of the data such as analysis and presentation. The DBMS focuses on managing the internal or physical aspects of the data such as the type and quantity of storage devices and the bookkeeping necessary to support the data model.
As an example (in a relational data model), the application sees the data as rows and columns. The DBMS translates its internal storage structures into these rows and columns.
The fundamental responsibility of the DBMS, once the data are in the database, is to deliver the data back to an application. Query, selection, and update faciiities are manifestations of this responsibility.
Another benefit of data separation is data sharing. Once a database is created, its data can be accessed by mu ltiple applications.
Data Model
The data model defines the relationships that a,xist among the various data items in the database. Some examples of relationships are:
The Database Management System is responsible for supporting the relationships specified by the data model. Prior to DBMSs, this was the application's responsibility.
Beyond the Basics
Advancements in computer technology (e.g., more power, lower cost) placed additional burdens on DBMSs (e.g., user-friendly interfaces, improved performance). This brought demand for additional features from the DBMS.
As keepers of the data, DBMSs were required to solve these problems. Automatic query optimization, integrity constraints, high speed transactions, and point-and-click interfaces are a partial list of solutions provided by the DBMS vendors.
Although most DBMSs today have a variety of data presentation and analysis services, such features are not relevant to this discussion. Our focus here is on the storage and management of data.
In this section, features found in present-day DBMSs are identified. There may not be industry- wide agreement on the categories or definitions used here. This section is intended to serve as a general overview of the facilities available, not a comprehensive survey.
The features are divided into two general categories, basic and advanced. The basic features refiect the core functionality of a DBMS: data separation and data relationships. Tile more
advanced features are built upon the basic ones and reflect additions required by users to keep up with advancements in computer technology. There is no Significance to the order of presentation.
Examples of components in Release 6.08 of the SAS System are included with the description of each DBMS feature. The examples used here are not intended to be an exhaustive list of such components of the SAS System.
Basic
file management To create, 'populate, delete, and backup databases.
Examples of file management services in the SAS System are the DATA step and the COPY, CIMPORT, CPORT, and SOL procedures.
data inventory services To list and display information about the existing databases.
The DATASETS and CONTENTS procedures provide data inventory services in the SAS System.
query processing To retrieve the stored data, including data filtering, that is, selectionand projection.
The DATA step, SCL, the WHERE clause, and the PRINT, SOL, REPORT, and FSBROWSE procedures provide query processing in the SAS System.
update processing To change existing data in a database and add new data.
The DATA step, SCL, the SOL, APPEND, and FSEDIT procedures can be used for update processing in the SAS System.
relational data model To provide support for the data model that is most popular for new applications. (However, this is not a requirement for a system to be a DBMS.)
SAS data sets are composed of rows (observations) and columns (variables), and thus are relational tables.' The SOL procedure implements the de facto industry
No integrated audit trail currently exists for the SAS System. For a given application, the DATA step and SCL support user-written schemes for collecting such data.
rollforward To permit the recovery of a lost or damaged data set by the application of updates from an aud,it trail to an archived copy of the database.
The SAS System currently does not support a rollforward mechanism. For a given application, the DATA step and SCL support user-written schemes for collecting such data.
transactions with rollback To logically bind multiple updates into a single atomic update. That is, either all the updates are successfully applied to the database or none of them are applied. Rollback initiates the removal of pending updates in the atomic unit.
Currently there is no support for transactions in the SAS System
high volume transactions ," To provide very fast response time to a large number of requests, also known as On-Line Transaction Processing (OLTP). Here performance is of key importance .. The environment is usually highly interactive with many users. An exam.ple is an airline reservation system.
The SAS System has been tailored for fast sequential processing, and therefore is not well-suited to this type of application.
distributed data/distributed processing To support an environment with applications and data on separate platforms. A given database will reside entirely on a single platform.
SAS/CONNECT® software allows an application to access data from a different platform, and it permits the application to execute on another platform. SAS/ACCESS® software supports access to data on other platforms in some environments.
distributed databases To store parts of the same database on different platforms.
There is no support in theSAS System for distribution of a single data set across different platforms.
If you use the historical definition of a DBMS as a data repository that provides separation of data and applications, then the SAS System is clearly a DBMS.
If you choose a more contemporary definition of a DBMS. t~en the SAS System falls somewhat short of being a DBMS. It has a number of features found in many commercial DBMSs, but it does not have all of them.
However, this question .is really academic. A better question is "What specific requirements do you have for .your data repository?" If you have an OLTP environment, the SAS System will probably not satisfy your performance requirements. An Information Database environment that depends upon lots of rapid sequential. access to the databases, is likely to find the SAS System's performance very gOOd.
DBMS vendors position their product as a data repository. The applications that use the data are usually not provided by the DBMS vendor. The SAS System is positioned as a data analysis and information delivery system. That is, the SAS System is the application that uses the data.
The SAS System has facilities to access data in many different formats and repositories as has been mentioned earlier. Given that you want to process/analyze your data with the SAS System, then the question here is not access to the data but where the data are to be permanently stored.
There, are three basic choices for the data repository: flat/unstructured files, a commercial DBMS, or the SAS System. And there are SAS applications and non-SAS applications. With these variables, let's define six simple models:
model
primary application
data repository
non-SAS SAS non-SAS SAS non-SAS SAS
flat file flat file DBMS DBMS SAS System SAS System
The first two models are quite reasonable and common uses of flat files as data repositories. The SAS System, via the DATA step, has powerful facilities for accessing a wide variety of flat file formats.
Models 3 and 4 are the traditional ones with a DBMS as the data repository and non-DBMS applications as consumers of the data.
In a model 5 environment, Ihe DATA step can provide the data to applications in a wide variety of fiat file formats when the original data cannot be read by the applications.. The DATA step can produce multiple different fiat files, one for each of the different applications. While stored in SAS data sets, the data can be edited (to repair invalid values) and subseted prior to delivery to the applications.
The main premise of this paper is that model 6 is a viable model and should be carefully considered when deciding upon a data repository for SAS applications. The choice between model 4 and model 6 should be based upon the features you require from your data repository.
Version 6 of the SAS System lacks some features found in commercial DBMSs as has been described previously. If you do not have ani of these requirements for your data repository, then you should seriously consider using the SAS System.
The benefits of using the SAS System for the storage of your data include:
data analysis and data storage will eliminate the need for maintenance and system upgrades to another product (the DBMS), and it will provide a single source for problem resolution. Compatibility issues between different versions of the application software and the DBMS software will not exist.
The features listed below are under consideration for some future release of the SAS System. No details are given as the research and development is in progress and numerous issues remain to be resolved.