Database Design: Understanding Load, Frequency, and Redundancy, Lecture notes of Design

A series of lessons from a database design course, focusing on the concepts of load and frequency of operations, redundancy, and the translation of an ER schema into the relational model. It covers various types of relationships, including many-to-many, recursive, and one-to-many relationships, as well as weak entities and optional relationships. The document also discusses the importance of choosing main identifiers and the implications of redundancy on database access.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

brandonflowers
brandonflowers 🇬🇧

4

(13)

233 documents

1 / 31

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Database Design — 1CSC343 – Introduction to Databases
Week 11: Database Design
Database Design
Database Design
From an ER Schema to a Relational One
From an ER Schema to a Relational One
Restructuring an ER schema
Restructuring an ER schema
Performance Analysis
Performance Analysis
Analysis of Redundancies, Removing
Analysis of Redundancies, Removing
Generalizations
Generalizations
Translation into a Relational Schema
Translation into a Relational Schema
Database Design — 2CSC343 – Introduction to Databases
Supplier
Part
supplies
Customer
orders (1,N)
(1,N)
Part
name
Supplier
Part Customer
Part(Name,Description,Part#)
Supplier(Name, Addr)
Customer(Name, Addr)
Supplies(Name,Part#, Date)
Orders(Name,Part#)
Hierarchical
Network
Relational
Supplier Customer
Date
(1,N)
(1,1)
Designing a Database Schema
Designing a Database Schema
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f

Partial preview of the text

Download Database Design: Understanding Load, Frequency, and Redundancy and more Lecture notes Design in PDF only on Docsity!

CSC343 – Introduction to Databases Database Design — 1

Week 11: Database Design

Database Design Database Design From an ER Schema to a Relational OneFrom an ER Schema to a Relational One Restructuring an ER schemaRestructuring an ER schema Performance AnalysisPerformance Analysis Analysis of Redundancies, RemovingAnalysis of Redundancies, Removing GeneralizationsGeneralizations Translation into a Relational SchemaTranslation into a Relational Schema

CSC343 – Introduction to Databases Database Design — 2

Supplier

Part

supplies

(1,N)orders (1,N)Customer

Part name

Supplier

Part Customer

Part(Name,Description,Part#) Supplier(Name, Addr) Customer(Name, Addr) Supplies(Name,Part#, Date) Orders(Name,Part#)

Hierarchical

Network

Relational

Supplier Customer

Date (1,N)

(1,1)

Designing a Database Schema Designing a Database Schema

CSC343 – Introduction to Databases Database Design — 3

(Relational) Database Design

„ Given a conceptual schema (ER, but could also be a UML), generate a logical (relational) schema. „ This is not just a simple translation from one model to another for two main reasons:

  1. not all the constructs of the Entity- Relationship model can be translated naturally into the relational model;
  2. the schema must be restructured in such a way as to make the execution of the projected operations as efficient as possible. „ The topic is covered in section 3.5 of the textbook. This lecture unit uses material from other textbooks as well.

CSC343 – Introduction to Databases Database Design — 4

Database

Design

Process

CSC343 – Introduction to Databases Database Design — 7

Cost Model

„ The cost of an operation is measured in terms of the number of disk accesses required. A disk access is, generally, orders of magnitude more expensive than in-memory accesses, or CPU operations.

„ For a coarse estimate of cost, we assume that

9 a Read operation (for one entity or relationship) requires 1 disk access; 9 A Write operation (for one entity or relationship) requires 2 disk accesses (read from disk, change, write back to disk).

CSC343 – Introduction to Databases Database Design — 8

Employee-Department Example

CSC343 – Introduction to Databases Database Design — 9

Typical Operations

„ Operation 1: Assign an employee to a project. „ Operation 2: Find an employee record, including her department, and the projects she works for. „ Operation 3: Find records of employees for a department. „ Operation 4: For each branch, retrieve its departments, and for each department, retrieve the last names of their managers, and the list of their employees. „ Note: For UML class diagrams, these would be operations associated with persistent database classes.

CSC343 – Introduction to Databases Database Design — 10

Tables of Volumes and Operations

Concept Type Volume Branch E 10 Department E 80 Employee E 2000 Project E 500 Composition R 80 Membership R 1900 Management R 80 Participation R 6000

Operation Type Frequency Operation 1 I 50 per day Operation 2 I 100 per day Operation 3 I 10 per day Operation 4 B 2 per day

The volume of data and the general characteristics of the operations can be summed up using two special tables. Table of volumes Table of operations

I - Interactive B - Batch

CSC343 – Introduction to Databases Database Design — 13

Analysis

Steps

CSC343 – Introduction to Databases Database Design — 14

Analysis of Redundancies

„A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained through a series of retrieval operations) from other data in the database. „An Entity-Relationship schema may contain various forms of redundancy.

CSC343 – Introduction to Databases Database Design — 15

Examples of Redundancies

CSC343 – Introduction to Databases Database Design — 16

Deciding About Redundancies

„ The presence of a redundancy in a database may be 9 an advantage : a reduction in the number of accesses necessary to obtain the derived information; 9 a disadvantage : because of larger storage requirements, (but, usually at negligible cost) and the necessity to carry out additional operations in order to keep the derived data consistent****. „ The decision to maintain or eliminate a redundancy is made by comparing the cost of operations that involve the redundant information and the storage needed, in the case of presence or absence of redundancy.

CSC343 – Introduction to Databases Database Design — 19

Table of Accesses, with Redundancy

Concept Type Accesses Type Person Entity 1 W ResidenceTown RelationshipEntity 11 WW

Operation 1

Concept Type Accesses Type Town Entity 1 R

Operation 2

CSC343 – Introduction to Databases Database Design — 20

Table of Accesses, without

Redundancy

Concept Type Accesses Type Person Entity 1 W Residence Relationship 1 W

Operation 1

Operation 2 Concept Type Accesses Type Town Entity 1 R Residence Relationship 5000 R

CSC343 – Introduction to Databases Database Design — 21

Comparing the Cost of Operations

„ Presence of redundancy: 9 Operation 1: 1,500 write accesses per day; 9 The cost of operation 2 is almost negligible; 9 Counting twice the write accesses, we have a total of 3,000 accesses a day.

„ Absence of redundancy. 9 Operation 1: 1,000 write accesses per day; 9 Operation 2 however requires a total of 10, read accesses per day; 9 Counting twice the write accesses, we have a total of 12,000 accesses per day. Redundant data may improve performance!

CSC343 – Introduction to Databases Database Design — 22

Removing Generalizations

„The relational model does not allow direct representation of generalizations that may be present in an E-R diagram. „For example, here is an ER schema with generalizations:

CSC343 – Introduction to Databases Database Design — 25

General Rules For Removing Generalization

„ Option 1 is convenient when the operations involve the occurrences and the attributes of E 0 , E 1 and E 2 more or less in the same way. „ Option 2 is possible only if the generalization satisfies the coverage constraint (i.e., every instance of E 0 is either an instance of E 1 or E 2 ) and is useful when there are operations that apply only to occurrences of E 1 or E 2. „ Option 3 is useful when the generalization is not coverage-compliant and the operations refer to either occurrences and attributes of E 1 (E 2 ) or of E 0 , and therefore make distinctions between child and parent entities. „ Available options can be combined (see option 4)

CSC343 – Introduction to Databases Database Design — 26

Partitioning and Merging of

Entities and Relationships

„Entities and relationships of an E-R schema can be partitioned or merged to improve the efficiency of operations, using the following principle: Accesses are reduced by separating attributes of the same concept that are accessed by different operations and by merging attributes of different concepts that are accessed by the same operations.

CSC343 – Introduction to Databases Database Design — 27

Example of Partitioning

CSC343 – Introduction to Databases Database Design — 28

Deletion of Multi-Valued Attribute

CSC343 – Introduction to Databases Database Design — 31

Selecting a Primary Key

„ Every relation must have a unique primary key. „ The criteria for this decision are as follows: 9 Attributes with null values cannot form primary keys; 9 One/few attributes is preferable to many attributes; 9 Internal key preferable to external ones (weak entity); 9 A key that is used by many operations to access the instances of an entity is preferable to others. „ At this stage, if none of the candidate keys satisfies the above requirements, it may be best to introduce a new attribute (e.g., social insurance #, student #,…)

CSC343 – Introduction to Databases Database Design — 32

Translation into a Logical Schema

„The second step of logical design consists of a translation between different data models. „Starting from an E-R schema, an equivalent relational schema is constructed. By “equivalent”, we mean a schema capable of representing the same information. „We will deal with the translation problem systematically, beginning with the fundamental case, that of entities linked by many-to-many relationships.

CSC343 – Introduction to Databases Database Design — 33

Many-to-Many Relationships

Employee(Number, Surname, Salary) Project(Code, Name, Budget) Participation(Number, Code, StartDate)

CSC343 – Introduction to Databases Database Design — 34

Many-to-Many Recursive Relationships

Product(Code, Name, Cost) Composition(Part, SubPart, Quantity)

CSC343 – Introduction to Databases Database Design — 37

Weak Entities

Student(RegistrationNumber, University, Surname, EnrolmentYear) University(Name, Town, Address)

CSC343 – Introduction to Databases Database Design — 38

One-to-One Relationships

Head(Number, Name, Salary, Department, StartDate) Department(Name, Telephone, Branch) OR Head(Number, Name, Salary, StartDate)

Department(Name, Telephone, HeadNumber, Branch)

CSC343 – Introduction to Databases Database Design — 39

Optional One-to-One Relationships

Employee(Number, Name, Salary) Department(Name, Telephone, Branch, Head, StartDate) Or, if both entities are optional Employee(Number, Name, Salary) Department(Name, Telephone, Branch) Management(Head, Department, StartDate)

CSC343 – Introduction to Databases Database Design — 40

A Sample ER Schema