Download Database Design and Transaction Processing Fundamentals and more Lecture notes Database Management Systems (DBMS) in PDF only on Docsity!
UNIT- 4
Database Design: Informal Design Guidelines for Relation Schemas, Functional Dependencies, Normal Forms Based on Primary Keys. Transaction Processing: Introduction to transaction processing, transaction and system concept, desirable properties of transactions. (Text 1: 14.1 to 14.3) (Text 1: 20.1 to 20.3)
Database Design
1.Informal Design Guidelines for Relation Schemas
4 - informal guidelines that may be used as measures to determine the quality of relation schema design: ■ Making sure that the semantics of the attributes is clear in the schema ■ Reducing the redundant information in tuples ■ Reducing the NULL values in tuples ■ Disallowing the possibility of generating spurious tuples 14.1.1 Imparting Clear Semantics to Attributes in Relations The semantics of a relation refers to its meaning resulting from the interpretation of attribute values in a tuple.
14.1.2 Redundant Information in Tuples and Update Anomalie Storing natural joins of base relations leads to an additional problem referred to as update anomalies. These can be classified into insertion anomalies, deletion anomalies, and modification anomalies. a) insertion anomalies, b) deletion anomalies, c) modification anomalies.
14.2 Functional Dependencies
14.2.1 Definition of Functional Dependency Definition. A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]. Consider the relation schema EMP_PROJ in Figure 14.3(b); from the semantics of the attributes and the relation, we know that the following functional dependencies should hold
14.3 Normal Forms Based on Primary Keys
Normalization of data can be considered a process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of (1) minimizing redundancy and (2) minimizing the insertion, deletion, and update anomalies discussed
Definition- The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
- A superkey of a relation schema R = {A1, A2, … , An} is a set of attri butes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
- A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey anymore.
- If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys
- An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. An attribute is called nonprime if it is not a prime attribute — that is, if it is not a member of any candidate key. 1) First Normal Form Definition. The only attribute values permitted by 1NF are single atomic (or indivisible) values.
14.5.1 Decomposition of Relations not in BCNF
- consider Figure 14.14, which shows a relation TEACH with the following dependencies: FD1: {Student, Course} → Instructor FD2: Instructor → Course
- This dependency means that each instructor teaches one course is a constraint for this application.
- Note that {Student, Course} is a candidate key for this relation and that the dependencies shown follow the pattern in Figure 14.13(b), with Student as A, Course as B, and Instructor as C. Hence this relation is in 3NF but not BCNF.
- Decomposition of this relation schema into two schemas is not straightforward because it may be decomposed into one of the three following possible pairs: 1. R1 (Student, Instructor) and R2(Student, Course) 2. R1 (Course, Instructor) and R2(Course, Student) 3. R1 (Instructor, Course) and R2(Instructor, Student) All three decompositions lose the functional dependency FD1.
20.1 Introduction to Transaction Processing
20.1.1 Single-User versus Multiuser Systems
- A DBMS is single-user if at most one user at a time can use the system, and it is multiuser if many users can use the system— and hence access the database— concurrently.
- Multiple users can access databases—and use computer systems—simultaneously because of the concept of multiprogramming, which allows the operating system of the computer to execute multiple programs—or processes —at the same time.
- A single central processing unit (CPU) can only execute at most one process at a time.
- However, multiprogramming operating systems execute some commands from one
- process, then suspend that process and execute some commands from the next process, and so on. A process is resumed at the point where it was suspended whenever it gets
14.1 TO 14.3 COMPLETED
its turn to use the CPU again. Hence, concurrent execution of processes is actually- interleaved , as illustrated in Figure 20. If the computer system has multiple hardware processors (CPUs), parallel processing of multiple processes is possible, as illustrated by processes C and D in Figure 20. 20.1.2 Transactions, Database Items, Read and Write Operations, and DBMS Buffers A transaction is an executing program that forms a logical unit of database pro cessing. A transaction includes one or more database access operations—these can include insertion, deletion, modification (update), or retrieval operations. One way of specifying the transaction boundaries is by specifying explicit begin transaction and end transaction statements in an application program; If the database operations in a transaction do not update the database but only retrieve data, the transaction is called a read-only transaction ; otherwise it is known as a read-write transaction. A database is basically represented as a collection of named data items. The size of a data item is called its granularity. The basic database access operations that a transaction can include are as follows:
- read_item(X). Reads a database item named X into a program variable. To simplify our notation, we assume that the program variable is also named X.
- write_item(X). Writes the value of program variable X into the database item named X. 20.1.3 Why Concurrency Control Is Needed We discuss the types of problems we may encounter simple transactions if data run concurrently.
- The Lost Update Problem. This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect.
- The Temporary Update (or Dirty Read) Problem. This problem occurs when one transaction updates a database item and then the transaction fails for some reason.
- The Incorrect Summary Problem. If one transaction is calculating an aggregate summary function on a number of database items while other transactions are updating some of these items, the aggregate function may calculate some values before they are updated and others after they are updated.
- Physical problems and catastrophes. This refers to an endless list of problems that includes power or air-conditioning failure, fire, theft, sabotage, overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
20.2 Transaction and System Concepts
20.2.1 Transaction States and Additional Operations
For recovery purposes, the system needs to keep track of when each transaction starts, terminates, and commits, or aborts (see Section 20.2.3). Therefore, the recovery manager of the DBMS needs to keep track of the following operations:
- BEGIN_TRANSACTION. This marks the beginning of transaction execution.
- READ or WRITE. These specify read or write operations on the database items that are executed as part of a transaction.
- END_TRANSACTION. This specifies that READ and WRITE transaction operations have ended and marks the end of transaction execution.
- COMMIT_TRANSACTION. This signals a successful end of the transaction so that any changes (updates) executed by the transaction can be safely committed to the database and will not be undone.
- ROLLBACK (or ABORT). This signals that the transaction has ended unsuccessfully, so that any changes or effects that the transaction may have applied to the database must be undone. 20.2.2 The System Log
- To be able to recover from failures that affect transactions, the system maintains a log
- Typically, one (or more) main memory buffers, called the log buffers , hold the last part of the log file, so that log entries are first added to the log main memory buffer.
- When the log buffer is filled, or when certain other conditions occur, the log buffer is appended to the end of the log file on disk.
- In addition, the log file from disk is periodically backed up to archival storage (tape) to guard against catastrophic failures. The following are the types of entries—called log records —that are written to the log file and the corresponding action for each log record.
- In these entries, T refers to a unique transaction-id that is generated automatically by the system for each transaction and that is used to identify each transaction:
20.2.3 Commit Point of a Transaction
- A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all the transaction operations on the database have been recorded in the log.
- Beyond the commit point, the transaction is said to be committed , and its effect must be permanently recorded in the database. 20.2.4 DBMS-Specific Buffer Replacement Policies The DBMS cache will hold the disk pages that contain information currently being processed in main memory buffers.
- Domain Separation (DS) Method. In a DBMS, various types of disk pages exist: index pages, data file pages, log file pages, and so on.
- Hot Set Method: The hot set method determines for each database processing algorithm the set of disk pages that will be accessed repeatedly, and it does not replace them until their processing is completed.
- The DBMIN Method. This page replacement policy uses a model known as QLSM (query locality set model), which predetermines the pattern of page references for each algorithm for a particular type of database operation.
20.3 Desirable Properties of Transactions
Transactions should possess several properties, often called the ACID properties;
- Atomicity. A transaction is an atomic unit of processing; it should either be performed in its entirety or not performed at all.
- Consistency preservation. A transaction should be consistency preserving, meaning that if it is completely executed from beginning to end without interference from other transactions, it should take the database from one consistent state to another.
- Isolation. A transaction should appear as though it is being executed in isolation from other transactions, even though many transactions are executing concurrently.
- Durability or permanency. The changes applied to the database by a com mitted transaction must persist in the database. (20.1-20.3)- UNIT-4 COMPLETED