Understanding ER Data Model and Entity Structuring in Logical Data Modeling, Study notes of Design

An introduction to logical data modeling, focusing on entities, attributes, and relationships in an ER data model. Entities are collections of related data elements, represented as rectangular boxes in diagrams. They can be kernel, associative, or characteristic entities. Attributes define the data elements of an entity and correspond to real-world properties. The process of normalization aims to provide efficient data storage by minimizing redundancy and repeating groups. One-to-many relationships exist when a single instance of an entity relates to many instances of another entity.

Typology: Study notes

2021/2022

Uploaded on 09/27/2022

kiras
kiras 🇬🇧

4.7

(21)

292 documents

1 / 31

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Creating a Logical
Data Model
Terms you’ll need to understand:
Entity Relationship Model
Entity
Kernel entity
Associative entity
Characteristic entity
Attribute
Identifier
Dependency
Primary key
Foreign key
Normalization
Denormalization
Normal forms
Relationship
Cardinality
Techniques you’ll need to master:
Entity Relationship Modeling
Database normalization/denormalization
Key and relationship definition
...............................................
2
02 0789731061 CH02 5/10/05 3:54 PM Page 29
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f

Partial preview of the text

Download Understanding ER Data Model and Entity Structuring in Logical Data Modeling and more Study notes Design in PDF only on Docsity!

Creating a Logical

Data Model

Terms you’ll need to understand:

✓ Entity Relationship Model ✓ Entity ✓ Kernel entity ✓ Associative entity ✓ Characteristic entity ✓ Attribute ✓ Identifier ✓ Dependency ✓ Primary key ✓ Foreign key ✓ Normalization ✓ Denormalization ✓ Normal forms ✓ Relationship ✓ Cardinality

Techniques you’ll need to master:

✓ Entity Relationship Modeling ✓ Database normalization/denormalization ✓ Key and relationship definition

...............................................

(^30) Chapter 2................................................

Introduction Designing a logical data model is all about preparing for the physical data design. Whether physical elements end up being a simple database, a com- plex data warehouse, or some other data store, the ideas behind the logical design remain the same. In this stage of development, try not to gear any design to a specific physical structure. It is important to keep in mind that we are still just planning things out, and the decisions regarding the physical ele- ments have yet to be made. Even though the physical elements have yet to be determined, there is a def- inite correlation between what you see in the logical model and what will end up being present when the model takes on a physical form. There is almost a one-to-one mapping between the objects that we discuss in the logical stage of development and similar objects in the physical databases, tables, files, and other physical articles. Though there are many data storage possibilities, the largest percentage of these will be databases—the storage mechanism of choice for most systems. Relational databases are databases in which data is organized into related objects. Each of the objects contained in a database is related to the others in some way. Relational databases, based on the paper written by Dr. E. F. Codd in 1970, store sets of data in relations, called tables. The tables are often related to one another through dependencies, but this is not required. In all the time that has passed since then, the modeling of data structures has remained based on this concept.

Data Modeling and

Logical Data Design Most individuals in the technology arena understand the physical compo- nents of a database system. It is easy to recognize data in a columnar format from spreadsheets, tables, data files, and other common data storage tech- niques. In a structured data storage system, however, much planning goes into the makeup of the storage system before any physical forms are taken. Long before a combination of files containing records and fields along with tables containing rows and columns form a database, the data content is ana- lyzed and a concept of the data, or logical structure is formed. This is an

(^32) Chapter 2................................................

Data Modeling To develop a data model, various application architectures can be considered. During logical modeling, variations in implementations do not need to be known. In meeting a specific business need, the data needs to be modeled in a pristine fashion. If you have a tendency toward one implementation or another, you could hamper this design goal. It is best to design the model as a generic data model before taking that model into future phases. In subse- quent stages of development, the database will take on its physical attributes and will be adjusted as applications take shape; yet these stages should not be stepped into prematurely. Focus first on the raw information as a base for the logical model.

Relational database design modeling was first developed by the database engineer Charles Bachman, in 1960, and then in 1976, database design modeling became the ER Model by Peter Chen. An ER Model allows a database to be defined in a simple and organized manner. Other modeling techniques have come and gone over the years, but the ER Model is the preferred technique used by most experienced database developers.

Modeling with the ER Data Model As mentioned previously, the ER data model is one of the most popular data models in use. An ER data model consists of three main components that are present in any model format: entities , attributes , and relationships. ➤ An entity is a discernible thing about which data is kept. In data model- ing, entities can be closely compared to the physical element of a table in a database. An entity is a collection of related data elements or attrib- utes and usually represents a major facet of the business problem. Supplier, product, employee, and order are all good examples of entities. ➤ Attributes are the characteristics given to an entity, such as ProductNumber and FirstName. Attributes of an entity represent a set of properties, each property being a data element within an entity. They will become the columns or fields in the physical database. ➤ Relationships show a logical link between two or more entities. Where two entities have a direct affiliation, a relationship is used to define the connection between the entities. A supplier entity may be related to orders; a product entity may be related to purchases. Relationships will normally establish constraints within the physical database.

............................................... .Creating a Logical Data Model^^33

Entities represent the primary elements of the ER Model. Each entity will represent a person, place, thing, or concept involved in a business process. It is usually easiest to determine the entities of a process by isolating the impor- tant players. When a customer places an order containing one or more prod- ucts, the process utilizes at least three major entities: customer , order , and product.

An ER Model is drawn out using an Entity Relationship Diagram (ERD). A rough diagram can be constructed using pencil and paper and a few simple shapes. For a more polished look a drawing tool is recommended, such as Microsoft Paint, or better yet a tool specifically used for the modeling process, such as Microsoft Visio. The actual tool used for drawing the dia- gram can vary, and you will find that different developers have their own preferences and reasons for preferring one tool over another. Most of the diagrams throughout this book have been developed using the two previous- ly mentioned programs, along with SQL Server’s own built-in tools. To illus- trate the concepts of the ER modeling process, we will be using a fictional company, Northwind Traders. This company will be familiar to many Microsoft Access users and it is also supplied with SQL Server as a sample database.

Entity Selection To begin the modeling process, you need to first isolate the entities required for the business process. A standard documentation technique is to draw entities as rectangular boxes with enough space to hold many attributes. An entity is the basic division of a database. In the logical design, entities are rep- resentative of the tables that will be present when the database development process moves into the physical design phase. Each entity exists as a separate individual data collection, unique from all the other entities.

Entities are generally the people, places, and things that make up a process. They can be qualified as one of three basic types: kernel entities , associative entities , and characteristic entities. These entity types are described further in the following list:

➤ A kernel entity exists on its own; it doesn’t define or provide descriptive information for other entities. An example of a kernel entity would be a product listing in an inventory model. The information contained in each kernel entity of a table represents the heart of the database model. ➤ Associative entities are needed to allow multiple kernel entities to be tied together. In the inventory system, a sales entity would be needed to tie a customer kernel entity to the products they have purchased. This same sales entity could be tied to another kernel entity, such as salespeople.

............................................... .Creating a Logical Data Model^^35

With further knowledge of the business process, other entities may come to light. In the case of the Northwind process, products all fall into separate cat- egories and the product categories are also tracked. This would require the addition of an additional entity, and the beginnings of the ER diagram would look similar to the example shown in Figure 2.2.

Figure 2.2 Northwind’s entities in an ER Model.

Entity structuring accommodates the initial stages of database design. When you’re designing an appropriate logical model, the data must be organized into these separate elements that will later make up the physical database tables. An entity is characterized by its attributes. Attributes are used to define the data elements of an entity. After the initial entities have been defined, the process of describing each entity through its characteristic prop- erties begins.

Attribute Definition Identifying attributes is the next step in ensuring a successful data modeling process. In defining attributes you are setting out to define entity composi- tion. Each entity will have descriptive elements that pertain solely to that element. An attribute is a descriptive element or property of an entity. Fields will represent the attributes when the logical design progresses to the phys- ical design stage. Attributes are characteristics or properties defined within a single entity, and they correspond to real-world properties of a place, thing, or concept.

(^36) Chapter 2................................................

Attributes such as names and addresses are almost always present for people and organizations. Other attributes provide further information for the enti- ty as required for the business process being defined.

Deciphering attributes from written descriptions and reports is more of a real-world scenario. The exam will provide the attributes; you will be expected to pick appropri- ate entities.

Try to find out the attributes that fit each of the entities. More attributes may have to be added later as the model becomes more complete. If you missed an attribute or added extra attributes at this time, they will likely be pointed out when the model is normalized. The normalization of a data model will be discussed later in this chapter, in the section “Data Normalization with Normal Forms.” Attribute decisions will vary from person to person, depending on your business scenario perspective for which the data is being modeled. The identification and creation of attributes is a developed skill; there is no true method for defining all attributes of an entity. Each business problem will require a variation of entity content, so the business process itself will lead to a lot of attribute choices. A few guidelines to use in the identification, creation, and naming of attrib- utes will help ease this process. The first is how you name your entities. A good name makes an attribute look professional and helps in its readability. Appropriate naming conventions are often developed as a corporate or development standard within an organization. Often mechanisms for short- ening names or using common prefixing or suffixing is part of a program- ming team’s standard. Here are some good guidelines that help in naming entities. Consistently following these guidelines will help to keep all of your designs up to the same standard: ➤ An entity or an attribute should be named in its singular form, thereby implying that it is only a single instance. An instance is a single occur- rence of an entity. ➤ The use of underscores (_), spaces, and other special characters is not a good habit because special characters have particular meanings in some software packages, and the mixture of text and other characters is diffi- cult to type. Try to distinguish a word from another by using mixed case, as in LastName instead of Last_Name.

(^38) Chapter 2................................................

flawless throughout the whole ER Model. When attributes are decomposed, different methods of ensuring data integrity can be applied to the broken- down segments rather than the attribute as a whole. It can be difficult to check the validity of an entire address, but when decomposed, the elements can be more easily checked. Decomposing also aids in the sorting of data for use in specific business processes, such as mass mailing. You will, in most cases, also be improving data retrieval performance when decomposed attributes are used. A generic attribute, Address, contains the street, city, region, postal code, and country. To get just the region of a customer in Washington, you have to select the whole Address attribute and parse it to find Washington, thereby degrading per- formance because of the redundant data retrieved. If you have four separate attributes, you can select the Region column and get your results more quick- ly. This same rule applies to updating data. It’s easier to update a single part of an Address than to parse and then update the whole Address attribute. As shown by the example provided in Figure 2.3, the Address has been fully decomposed for all entities, but the name has been decomposed only for the Employee entity.

Figure 2.3 Northwind’s entities with attributes.

Key Attributes The use of an attribute can vary from system to system, but some attributes will be present in most systems to help sort data and perform relationship ties

............................................... .Creating a Logical Data Model^^39

between one entity and another. A key attribute is almost always present within an entity to act as an identifier, much as a person’s name identifies that person as being a unique individual. Entities are usually interdependent: Each holds information that relates to other entities. These relationships can be defined by their correlated dependencies. Key attributes are also therefore used for the purpose of relating one entity to another.

Recognizing Key Attributes After all attributes have been defined and keys have begun to be recognized, the modeling process will be completed with the application of relationships and the normalization of data. These two processes are closely related, as you will see later, but before they can begin, key attributes must be recognized. These are specialized attributes referred to as identifiers. An identifier is an attribute or a set of attributes that defines one unique element of an entity. The use of identifiers allows for the individual selection of records from an entity. As the design progresses to the physical stage, identifiers will become the primary and foreign keys , allowing entities to be tied together through association or relationships. For Example, a product’s identifying attribute is usually a unique product ID.

Identification of Primary and Foreign Keys A primary key is a specialized attribute that is generally defined for each enti- ty. The primary key is almost always defined, though is not necessarily required for all entities in a data model. However, the provision of a primary key does allow for a considerable number of benefits and should be considered in every instance. When defining a primary key, you should keep various factors in mind. The primary key normally defines uniqueness in an entity in that every record of a table has its own unique primary key. Also, when defined, a primary key should not be permitted to be empty. If a primary key is empty, you have a situation in which data integrity is difficult (if not impossible) to maintain.

A primary key should be defined as a single attribute that doesn’t allow for duplicates or empty content. The primary key should be as small as possible. It is possible to create a compound primary key that uses multiple attributes or a key that contains a large number of bytes, but in the physical design this will increase the overhead and response time associated with data retrieval. A compound primary key consisting of multiple attributes is used in instances in which a singular attribute by itself does not enforce uniqueness within an entity. As we move through the modeling process, we will discover the need to use such a compound structure in some instances.

............................................... .Creating a Logical Data Model^^41

➤ Primary keys can be used only for columns or attributes that don’t allow empty entries. Allowing empty values would mean that a row would not be uniquely identified. ➤ The attribute chosen to hold a primary key must have values unique throughout the entity.

Foreign keys help in the relational process between two entities. When a pri- mary key is created on a parent entity, it is connected to another entity by linking to the other entity’s foreign key. For example, in an invoice situation, there are usually two entities: one for invoice general information and the other for invoice details. The invoice details would contain a hook on to the invoice general entity through the use of a foreign key, potentially the invoice number or a surrogate key.

Before we are ready to draw all relationships into the model, the data model should begin a process of normalization. Some of the relationships can no doubt already be seen; however, when normalization standards are applied to the model, more relationships may be found. Only after the model has been fully normalized will you have a complete view of the entity relationships.

Data Normalization The process of normalization is the division of entities in an attempt to pro- vide the most efficient use of data storage. At times, denormalization is planned redundancy that is subsequently performed to improve response time and better use of resources. The process of designing the logical structure of a database is an attempt to provide a degree of normalization combined with aspects of denormalization to produce optimum storage efficiency while still providing acceptable levels of performance and resource utilization. In most instances, data is fully normalized before any aspects of denormalization are considered. In fact, denormalization usually is not approached until the physical model is in development. Normalization is usually applied in practice from the outset of data model- ing. After you’re comfortable with all the modeling concepts, you will find yourself implementing normalized structures as a form of good practice. As

Keys are usually created as part of the table creation process, but they can be added to the table after the initial generation. The syntax for the creation of keys and their association to tables are discussed in Chapter 3.

(^42) Chapter 2................................................

with all other good development habits, you must first work through the concepts at a granular level before they begin to become second nature. Normalization in general refers to how you implement and store data. Normalization is a design process intended to eliminate duplicate data. In a normalized entity, the redundant data is removed and the entity is simplified to its most basic form. This usually leads to a more involved entity structure with more entities. In the same way, the process of database normalization and planned denormalization is the process of simplifying data and data design to achieve maximum performance and simplicity. This denormaliza- tion process involves the planned addition of redundant data. Although both normalization and denormalization are valid, necessary processes, the two achieve opposite goals. They don’t by themselves achieve maximum performance and simplicity, though they do strive for a perfect balance between performance (denormalization) and simplicity (normaliza- tion). Normalization means no duplicate data.

Data Normalization with Normal Forms In 1970, Dr. E. F. Codd designed three regulations a relational database adheres to, known as normal forms. Today they are known as the first, second, and third normal forms. (Normal forms do exceed three, but the first three are the only ones widely used.) The goal of the initial database design is to simplify the database into the third normal form. Using normal forms provides these basic advantages: ➤ No data redundancy contributing to data integrity ➤ Index columns for faster sorting and searching ➤ Smaller entities that reduce table locking and data contention ➤ Query optimization

Although in most cases a data model is taken only to the third normal form, it is worth noting that there are actually five forms. Because development to the fourth and fifth normal forms is not a requirement for the exam, we mentioned those forms here only for completeness.

The rules provided by these normal forms are discussed in the following sections.

(^44) Chapter 2................................................

Fourth and Fifth Normal Forms The fourth normal form dictates that a third normal form has no multival- ued dependencies. In other words, every value of an attribute must appear in at least one row with every other value of the other attribute. The fifth normal form is intended to eliminate joint dependency constraints. This is a theoretical consideration that is thought to have no practical value. If you disregard these forms, the design of the database might be less than perfect, but it should have no loss of functionality. Normalizing a database is seemingly good but it can hamper performance. In many cases a designer has to consider denormalizing a database. Planned redundancy or denormalization is often brought into the design to provide for better performance or to clarify data.

Denormalization Purposely adding redundant data and other fields that disobey normal forms is denormalization. Denormalizing as a process is more part of the physical design and will also be revisited during the implementation to improve per- formance. The concept is covered here for continuity and also to show the contrast with data normalization. After you have a logical design completely normalized, rarely will you keep it in that state as you proceed to the physi- cal design of the actual database.

Although normalization gives you a great deal of storage efficiency and might result in increased performance in some situations, there are some drawbacks to a com- pletely normalized database. You should consider the trade-offs in storage efficiency, performance, and maintainability in your final design.

If you go too far with the normalization process, you might actually reverse the effect you’re trying to achieve. Although normalization will reduce data redundancy, result in smaller tables with fewer rows, and provide a logical and consistent form, it will also require table joins for the implementation and will not allow for summary, duplicate, or other data that a user might expect to find in a single table. Normalizing a database design too far can decrease performance and make it difficult to alter the underlying table structure, and might make it harder to work with the data. Denormalization may occur at any number of levels. At the absolute extreme, a database schema can be completely duplicated to a number of servers across the network by implementing replication. This could be warranted if you need to distribute the access to the data across slow network

............................................... .Creating a Logical Data Model^^45

links or to multiple remote locations. Many advantages are gained through database replication, because the data is more easily available at the locations where it will be used. The drawback of this is increased maintenance of a number of servers. Also, if database replication isn’t configured properly, it could monopolize a WAN. In addition, if there are network problems or there is a poor setup, the data might not be synchronized to a level that keeps it up-to-date. Data can be maintained as an exact duplication against a num- ber of servers, but this would require a high-speed network and the configu- ration of a two-phase commit.

Other, simpler examples of planned denormalization would be to maintain complete address information for customers, suppliers, employees, and so on in the tables with the rest of their general information. This is what most users expect, and it is difficult to maintain a separate address table. There are no defined rules for denormalization, but some definite guidelines will help you understand what level might be appropriate in a given situation.

Data warehousing schemas often use a denormalized approach referred to as a star or snowflake schema. This schema structure takes advantage of typical decision support queries by using one central “fact” table for the subject area and many dimension tables containing denormalized descriptions of the facts.

There are also several other situations to consider. If a join requires the implementation of more than three tables, denormalization should be con- sidered. In some situations in which the number of columns in a table can grow very large, a denormalized structure would split the table into more easily handled portions and use a one-to-one relationship to connect the information.

The completed structure will have to be modified over time as the live use of the database warrants. Never consider a database design to be perfect or complete. It often takes several years of actual use to determine the best lev- els of normalization and denormalization to use.

Some forms of replication and data transfer accommodate redundancy by allowing the data to be on two servers simultaneously. If so, one server is usually treated as a read-only server and is used for offloading query processing from the updatable server. This is discussed in depth later in the book and is definitely an exam topic to be prepared for.

Relationships Between Entities Relationships are the final component in an ER Model, allowing for a logical linkage between one entity and another. A relationship in an ER Model

............................................... .Creating a Logical Data Model^^47

Identification of Relationships

Setting up the relationships will finalize a draft of the ER Model. This draft will undergo modifications as the database approaches a physical design. Because at this stage of design all the basic elements of data model have been completed using the Entity Relationship approach, we now have a working model allowing us to proceed further into development. A general listing of attributes for each entity and the relationships between these entities is an important springboard to use to progress through the database design to the eventual completed system.

Implementing the relationships and applying normalization principles are often performed as parallel processes because decisions made in one process effect the other process and vice versa. Normalization helps in determining cardinality and the cardinality is a requirement of each relation. The three basic cardinality types, one-to-one, one-to-many, and many-to-many, are partially a result of knowing the business scenario being modeled and par- tially derived from applying normal forms.

One-To-One Relationship

The one-to-one type of relationship occurs when one row or data element of an entity is associated with only one row or element in the second entity. This type is used mostly when an entity has an extraordinarily large number of attributes so the entity is split in two to make it easier to manage. Also, an extra entity might be desired when developing the physical storage locations of the data. By separating seldom-used data from more frequently used infor- mation, you can accommodate faster data retrieval and updates. It is for this reason that these types of relationships are pulled into the model until the physical design of the database has begun.

In modeling a one-to-one relationship, a common key must be present in each of the entities being related. This common key allows for the collective attributes of both entities to be retrieved using a single value. Consider, for example, a product that has many descriptive attributes. Two product entities could be used to separate the different properties. Each entity would use a product number or similar value as a key. This is illustrated in the model seg- ment shown in Figure 2.4.

In the preceding example the ProductCommon entity is used to store the attrib- utes that are most readily used, and the ProductAtypical entity contains other attributes that, though still needed, are less frequently used. It is much more common to find relationships existing in a one-to-many cardinality.

(^48) Chapter 2................................................

Figure 2.4 Two product entities in a one-to-one relationship.

One-To-Many Relationship One-to-many relationships exist when a single instance of an entity (the parent entity) relates to many instances of another entity (the child entity). One-to- many dependencies are a natural occurrence in the real world—for example, a customer will have many orders, and a manufactured product could have many components.

This relationship is a classic parent-child dependency. A foreign key in a child entity will point to the associated primary key of the parent. When this rela- tionship is related, removal of a parent could cause orphaning of the child because of its dependency. There are many examples of this type of rela- tionship in the Northwind order process. The following list represents those seen in the diagram to this point: ➤ One supplier to many products ➤ One order to many order details ➤ One product exists within many order details ➤ One employee has many orders ➤ One customer has many orders ➤ One shipper is used in many orders ➤ One category will contain many products

Although one-to-many usually establishes “many” as the normal numerical component, you can have zero or only one child row. Customers can have

One-to-many relationships can be expressed as many-to-one as well, though one-to- many is a common standard. It depends on how the relationship is being viewed.