























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
A series of lessons from a database design course, focusing on the concepts of load and frequency of operations, redundancy, and the translation of an ER schema into the relational model. It covers various types of relationships, including many-to-many, recursive, and one-to-many relationships, as well as weak entities and optional relationships. The document also discusses the importance of choosing main identifiers and the implications of redundancy on database access.
Typology: Lecture notes
1 / 31
This page cannot be seen from the preview
Don't miss anything!
























CSC343 – Introduction to Databases Database Design — 1
Database Design Database Design From an ER Schema to a Relational OneFrom an ER Schema to a Relational One Restructuring an ER schemaRestructuring an ER schema Performance AnalysisPerformance Analysis Analysis of Redundancies, RemovingAnalysis of Redundancies, Removing GeneralizationsGeneralizations Translation into a Relational SchemaTranslation into a Relational Schema
CSC343 – Introduction to Databases Database Design — 2
Supplier
Part
supplies
(1,N)orders (1,N)Customer
Part name
Supplier
Part Customer
Part(Name,Description,Part#) Supplier(Name, Addr) Customer(Name, Addr) Supplies(Name,Part#, Date) Orders(Name,Part#)
Hierarchical
Network
Relational
Supplier Customer
Date (1,N)
(1,1)
CSC343 – Introduction to Databases Database Design — 3
Given a conceptual schema (ER, but could also be a UML), generate a logical (relational) schema. This is not just a simple translation from one model to another for two main reasons:
CSC343 – Introduction to Databases Database Design — 4
CSC343 – Introduction to Databases Database Design — 7
The cost of an operation is measured in terms of the number of disk accesses required. A disk access is, generally, orders of magnitude more expensive than in-memory accesses, or CPU operations.
For a coarse estimate of cost, we assume that
9 a Read operation (for one entity or relationship) requires 1 disk access; 9 A Write operation (for one entity or relationship) requires 2 disk accesses (read from disk, change, write back to disk).
CSC343 – Introduction to Databases Database Design — 8
CSC343 – Introduction to Databases Database Design — 9
Operation 1: Assign an employee to a project. Operation 2: Find an employee record, including her department, and the projects she works for. Operation 3: Find records of employees for a department. Operation 4: For each branch, retrieve its departments, and for each department, retrieve the last names of their managers, and the list of their employees. Note: For UML class diagrams, these would be operations associated with persistent database classes.
CSC343 – Introduction to Databases Database Design — 10
Concept Type Volume Branch E 10 Department E 80 Employee E 2000 Project E 500 Composition R 80 Membership R 1900 Management R 80 Participation R 6000
Operation Type Frequency Operation 1 I 50 per day Operation 2 I 100 per day Operation 3 I 10 per day Operation 4 B 2 per day
The volume of data and the general characteristics of the operations can be summed up using two special tables. Table of volumes Table of operations
I - Interactive B - Batch
CSC343 – Introduction to Databases Database Design — 13
CSC343 – Introduction to Databases Database Design — 14
A redundancy in a conceptual schema corresponds to a piece of information that can be derived (that is, obtained through a series of retrieval operations) from other data in the database. An Entity-Relationship schema may contain various forms of redundancy.
CSC343 – Introduction to Databases Database Design — 15
CSC343 – Introduction to Databases Database Design — 16
The presence of a redundancy in a database may be 9 an advantage : a reduction in the number of accesses necessary to obtain the derived information; 9 a disadvantage : because of larger storage requirements, (but, usually at negligible cost) and the necessity to carry out additional operations in order to keep the derived data consistent****. The decision to maintain or eliminate a redundancy is made by comparing the cost of operations that involve the redundant information and the storage needed, in the case of presence or absence of redundancy.
CSC343 – Introduction to Databases Database Design — 19
Concept Type Accesses Type Person Entity 1 W ResidenceTown RelationshipEntity 11 WW
Operation 1
Concept Type Accesses Type Town Entity 1 R
Operation 2
CSC343 – Introduction to Databases Database Design — 20
Concept Type Accesses Type Person Entity 1 W Residence Relationship 1 W
Operation 1
Operation 2 Concept Type Accesses Type Town Entity 1 R Residence Relationship 5000 R
CSC343 – Introduction to Databases Database Design — 21
Presence of redundancy: 9 Operation 1: 1,500 write accesses per day; 9 The cost of operation 2 is almost negligible; 9 Counting twice the write accesses, we have a total of 3,000 accesses a day.
Absence of redundancy. 9 Operation 1: 1,000 write accesses per day; 9 Operation 2 however requires a total of 10, read accesses per day; 9 Counting twice the write accesses, we have a total of 12,000 accesses per day. Redundant data may improve performance!
CSC343 – Introduction to Databases Database Design — 22
The relational model does not allow direct representation of generalizations that may be present in an E-R diagram. For example, here is an ER schema with generalizations:
CSC343 – Introduction to Databases Database Design — 25
General Rules For Removing Generalization
Option 1 is convenient when the operations involve the occurrences and the attributes of E 0 , E 1 and E 2 more or less in the same way. Option 2 is possible only if the generalization satisfies the coverage constraint (i.e., every instance of E 0 is either an instance of E 1 or E 2 ) and is useful when there are operations that apply only to occurrences of E 1 or E 2. Option 3 is useful when the generalization is not coverage-compliant and the operations refer to either occurrences and attributes of E 1 (E 2 ) or of E 0 , and therefore make distinctions between child and parent entities. Available options can be combined (see option 4)
CSC343 – Introduction to Databases Database Design — 26
Entities and relationships of an E-R schema can be partitioned or merged to improve the efficiency of operations, using the following principle: Accesses are reduced by separating attributes of the same concept that are accessed by different operations and by merging attributes of different concepts that are accessed by the same operations.
CSC343 – Introduction to Databases Database Design — 27
CSC343 – Introduction to Databases Database Design — 28
CSC343 – Introduction to Databases Database Design — 31
Every relation must have a unique primary key. The criteria for this decision are as follows: 9 Attributes with null values cannot form primary keys; 9 One/few attributes is preferable to many attributes; 9 Internal key preferable to external ones (weak entity); 9 A key that is used by many operations to access the instances of an entity is preferable to others. At this stage, if none of the candidate keys satisfies the above requirements, it may be best to introduce a new attribute (e.g., social insurance #, student #,…)
CSC343 – Introduction to Databases Database Design — 32
The second step of logical design consists of a translation between different data models. Starting from an E-R schema, an equivalent relational schema is constructed. By “equivalent”, we mean a schema capable of representing the same information. We will deal with the translation problem systematically, beginning with the fundamental case, that of entities linked by many-to-many relationships.
CSC343 – Introduction to Databases Database Design — 33
Employee(Number, Surname, Salary) Project(Code, Name, Budget) Participation(Number, Code, StartDate)
CSC343 – Introduction to Databases Database Design — 34
Product(Code, Name, Cost) Composition(Part, SubPart, Quantity)
CSC343 – Introduction to Databases Database Design — 37
Student(RegistrationNumber, University, Surname, EnrolmentYear) University(Name, Town, Address)
CSC343 – Introduction to Databases Database Design — 38
Head(Number, Name, Salary, Department, StartDate) Department(Name, Telephone, Branch) OR Head(Number, Name, Salary, StartDate)
Department(Name, Telephone, HeadNumber, Branch)
CSC343 – Introduction to Databases Database Design — 39
Employee(Number, Name, Salary) Department(Name, Telephone, Branch, Head, StartDate) Or, if both entities are optional Employee(Number, Name, Salary) Department(Name, Telephone, Branch) Management(Head, Department, StartDate)
CSC343 – Introduction to Databases Database Design — 40