Download Database Design: Normalization and Data Models and more Slides Data Structures and Algorithms in PDF only on Docsity!
1.204 Lecture 2
Data models, concluded
NNormali lizatition
Keys
• PrimaryPrimary keykey:: oneone oror moremore attributesattributes thatthat uniquelyuniquely
identify a record
- Name or identifying number, often system generated
- Composite keys are made up of two fields
- E.g., aircraft manufacturer and model number
Foreign keys
• PrimaryPrimary keykey ofof thethe independentindependent or parentor parent
entity type is maintained as a non-key
attribute in the dependent or child entity type
Foreign keys
DeptID DeptName 930 Receiving 378 Assembly 372 Finance 923 Pl i
EmpID DeptID EmpFirstName EmpLastName 4436 483 Brown John 4574 483 Jones Helen 5678 372 Smith Jane 5674 372 Crane Sally (^) 923 Planning 483 Construction
5674 372 Crane Sally 9987 923 Black Joe 5123 923 Green Bill 5325 483 Clinton Bob
Database requires a valid department number when employee is added Employee ID is the unique identifier of employees; department number is not needed as part of the employee primary key
Foreign keys (many-many
relationships)
- Primaryy key of py parent is used in pprimaryy keyy of child
Independent Dependent Independent
Vehicle can be driven by many drivers; driver can drive many vehicles
Many-to-many relationships with
foreign keys
Vehicle VehicleID VehicleMake VehicleModel
Vehicle Driver VehicleID DriverID
Driver VehicleID VehicleMake VehicleModel DriverID DriverName DriverLicenseNbr 35 Volvo Wagon 33 Ford Sedan 89 GMC Truck
VehicleID DriverID 35 900 35 253 89 900
DriverID DriverName DriverLicenseNbr 253 Ken A 900 Jen B
Never create an entity with vehicle1, vehicle2,…!
Five normal forms: preventing errors
• 1:1: AllAll ooccurrencesccurrences ofof anan entityentity mustmust containcontain tthehe
same number of attributes.
- No lists, no repeated attributes.
• 2: All non- primary key fields must be a function
of the primary key.
• 3: All non- primary key fields must not be a
function of other non- primary key fields.
• 4: A row must not contain two or more
independent multi-valued facts about an entity.
• 5: A record cannot be reconstructed from several
smaller record types. (Informal)
Examples based on William Kent, "A Simple Guide to Five Normal Forms in Relational Database Theory", Communications of the ACM 26(2), Feb. 1983
First normal form
• • All rows must beAll rows must be fixed lengthfixed length
– Does not allow variable length lists.
– Does not allow repeated fields, e.g., vehicle1,
vehicle2, vehicle3…
- However many columns you allow, you will always need one more…
- • UseUse a many-many relationship instead, always Seea many many relationship instead always. See vehicle-driver example
Third normal form
Employee Department DepartmentLocation 234 Finance Boston 223 Finance Boston 399 Operations Washington
• NonNon-keykey fieldsfields cannotcannot bebe aa ffunctionunction ofof otherother nonnon-
key fields
- Example that violates third normal form
- Key is employee
- Someone found it convenient to add department location for a report
- Department location is a function of department, which is not a key
- Problems:
- Department location is repeated in every employee record
- If department location changes, every record with it must be changed
- Data might become inconsistent
- If a department has no employees, there may be nowhere to store its location
Third normal form
• • SolutionSolution
– Two entity types: Employee and department
Employee Department 234 Finance 223 Finance 399 Operations
Department DepartmentLocation Finance Boston Operations Washington
TV: “The truth, the whole truth, and nothing but the truth”
DB: “The key, the whole key, and nothing but the key”
Fourth normal form
Employee Skill Language Brown cook English Smith typeyp^ German
• A row should not contain two or more
independent multi-valued facts about an
entity.
– Example that violates fourth normal form:
- AnAn employemployeeee maymay havhavee sevseveraleral skillsskills andand llanguagesanguages
– Problems
- Uncertainty in how to maintain the rows. Several approaches are possible and different consultants, analysts or programmers may (will) take different approaches, as shown on next slide
Employee Skill Language Brown cook Brown type BrownBrown FrenchFrench Brown German Brown Greek
Fourth normal form problems
- Blank fields ambiguous. Blank skill could mean:
- Person has no skill
- Attribute doesn’t apply to this employee
- Data is unknown
- Data may be found in another record (as in this case)
- Programmers will use all these assumptions over time, as will data entry staff, analysts, consultants and users - Disjoint format is used on this slide. Effectively same as 2 entity types.
• A record cannot be reconstructed from several
smaller record types.
• Example:
- Agents represent companies
- Companies make products
- Agents sell products
• Most general case (allows any combination):
Fifth normal form
Agent Company Product
Smith Ford car
Smith GM truck
- Smith does not sell Ford trucks nor GM cars
- If these are the business rules, a single entity is fine
- But…
Fifth normal form
• In most real cases a problem occurs
– If an agent sells a certain product and she
representts th the company, ththen shhe selllls tthhatt
product for that company.
Agent Company Product Smith Ford car Smith Ford truck Smith GM car Smith GM truck Jones Ford car
– We can reconstruct all true facts from 3 tables
instead of the single table:
Agent Company Smith Ford Smith GM Jones Ford
(Repetition of facts)
Agent Product Smith car Smith truck Jones car
Company Product Ford car Ford truck GM car GM truck
(No repetition of facts)
Fifth normal form
• ProblemsProblems wiwithth thethe singlesingle tabletable formform
- Facts are recorded multiple times. E.g., the fact that Smith sells cars is recorded twice. If Smith stops selling cars, there are 2 rows to update and one will be missed.
- Size of this table increases multiplicatively, while the normalized tables increase additively. With big operations, this is a big difference. - 100,000 x 100,000 is a lot bigger than 100,000 + 100,
• It’s much easier to write the business rules fIt’ h i t it th b i l from
5 th^ normal
- Rules are more explicit
- Supply chains usually have all sorts of 5th^ normal issues
Fifth normal form, concluded
• An example with a subtle set of conditions
Non-normal
Agent Company Product Smith Ford car Smith Ford truck SmithSmith GMGM carcar Smith GM truck Jones Ford car Jones Ford truck Brown Ford car Brown GM car Brown Toyota car Brown Toyota bus
Can yyou quickly q y deduce the business rules from this table?
Fifth normal
Agent Company SmithSmith FordFord Smith GM Jones Ford Brown Ford Brown GM Brown Toyota
Company Product FordFord carcar Ford truck GM car GM truck Toyota car Toyota bus
Agent Product SmithSmith carcar Smith truck Jones car Jones truck Brown car Brown bus
- Jones sells cars and GM makes cars, but Jones does not represent GM
- Brown represents Ford and Ford makes trucks, but Brown does not sell trucks
- Brown represents Ford and Brown sells buses, but Ford does not make buses