Download Data Base Management System Notes and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!
DATABASE MANAGEMENT SYSTEM
Complete Exam Notes
Lecturer in Computer Science — Thal University Bhakkar
Covers: Definitions • Detailed Notes • Short Q&A
Contents
Chapter 1: Introduction to DBMS 1.1 Key Definitions Database An organized collection of related data stored and accessed electronically. DBMS Database Management System — software that manages databases (e.g., MySQL, Oracle, PostgreSQL). Data Raw, unprocessed facts and figures without context. Information Processed, organized data that is meaningful and useful. Data Redundancy Unnecessary duplication of data in multiple places in a database. Data Inconsistency Different copies of the same data have conflicting values. Data Independence Ability to change schema at one level without affecting the level above. Query A request to retrieve, insert, update, or delete data from a database. Instance The actual data stored in a database at a particular moment. Schema The logical structure/design of the database (blueprint). 1.2 Advantages of DBMS over File System
- Eliminates data redundancy and inconsistency
- Provides data sharing among multiple users
- Enforces data integrity and security constraints
- Provides backup and recovery mechanisms
- Supports concurrent access by multiple users
- Data independence (logical and physical) 1.3 Disadvantages of DBMS
- High cost of hardware and software
- Complex structure — requires trained personnel (DBA)
- Larger overhead due to generality
- Single point of failure if DBMS crashes
Can be reduced by : Replication: Keeping multiple copies of database on multiple servers
Clustering: Multiple servers share the workload
Backup and recovery systems – regular backups to restore from.
1.4 Three-Schema Architecture (ANSI/SPARC)
The three-schema architecture separates the user's view from the physical storage. It has three levels:
External Level How individual users or groups see the data (user views / subschemas). Conceptual Level Describes what data is stored and relationships — hides physical details. Internal Level Physical storage of data on disk (files, indexes, storage format). 1.5 Data Independence Logical Data Independence Ability to change conceptual schema without changing external schema or application programs. Physical Data Independence Ability to change internal/physical schema without changing conceptual schema.
1.6 Database Languages.............................................................................................................................
DDL (Data Definition Language) Defines database structure. Commands: CREATE, ALTER, DROP, TRUNCATE. DML (Data Manipulation Language) Manipulates data. Commands: SELECT, INSERT, UPDATE, DELETE. DCL (Data Control Language) Controls access. Commands: GRANT, REVOKE. TCL (Transaction Control Language) Manages transactions. Commands: COMMIT, ROLLBACK, SAVEPOINT.
1.7 Roles in a Database Environment..........................................................................................................
DBA (Database Administrator) Person responsible for managing and maintaining the DBMS. Database Designer Designs the logical and physical structure of the database. End Users People who interact with the database using queries or applications. Application Programmer Writes programs that interact with the database.
1.8 Short Q&A — Introduction......................................................................................................................
Q: What is a DBMS?
A: A software system that enables users to define, create, maintain, and control access to a database.
Examples: MySQL, Oracle, SQL Server.
Q: What is the difference between a database and a DBMS?
A: A database is the actual collection of data, while a DBMS is the software used to manage and interact
with that data.
Q: What are the types of data independence?
A: Logical data independence (changing conceptual schema without affecting external schema) and
Physical data independence (changing internal schema without affecting conceptual schema).
Q: Name the three levels of the ANSI/SPARC architecture.
A: External level (user views), Conceptual level (logical structure), and Internal level (physical storage).
Q: What is data redundancy and how does DBMS reduce it?
A: Data redundancy is unnecessary duplication of data. DBMS reduces it by centralizing data and using
normalization techniques.
2.5 Short Q&A — Data Models....................................................................................................................
Q: What is an entity in the ER model?
A: An entity is a real-world object or concept that has a distinct existence. For example, Student,
Employee, and Product are entities.
Q: What is the difference between a strong entity and a weak entity?
A: A strong entity has its own primary key and exists independently. A weak entity depends on a strong
entity for its identification and uses a partial key.
Q: What is a multivalued attribute? Give an example.
A: An attribute that can hold more than one value for an entity. Example: A person can have multiple phone
numbers.
Q: What are the types of cardinality?
A: One-to-one (1:1), One-to-many (1:N), Many-to-one (N:1), and Many-to-many (M:N).
Q: What is a derived attribute?
A: An attribute whose value is derived from other attributes. Example: Age can be derived from Date of
Birth.
Chapter 3: Relational Model & SQL.................................................................................................................
3.1 Relational Model Terminology................................................................................................................
Relation (Table) A two-dimensional table consisting of rows and columns. Tuple (Row) A single record in a relation. Attribute (Column) A named column of a relation that represents a property. Domain The set of allowable values for an attribute. Degree Number of attributes (columns) in a relation. Cardinality Number of tuples (rows) in a relation. Primary Key An attribute (or set) that uniquely identifies each tuple in a relation. Foreign Key An attribute in one relation that references the primary key of another relation. Candidate Key An attribute or set of attributes that can uniquely identify a tuple. Super Key A set of attributes that can uniquely identify a tuple (may contain extra attributes). Alternate Key A candidate key that is not selected as the primary key. Composite Key A primary key made up of two or more attributes. NULL A special value indicating missing or unknown data.
3.2 Integrity Constraints...............................................................................................................................
Domain Constraint Each attribute value must be from the defined domain (data type). Key Constraint Primary key must be unique; no two tuples have the same primary key. Entity Integrity Primary key cannot be NULL. Referential Integrity A foreign key value must match an existing primary key value or be NULL. Null Integrity Specifies whether NULLs are allowed for an attribute.
3.3 Relational Algebra..................................................................................................................................
Selection (σ) Selects tuples that satisfy a condition. σ(condition)(Relation). Like WHERE in SQL. Projection (π) Selects specific columns. π(attributes)(Relation). Like SELECT in SQL. Union (∪) Combines tuples from two compatible relations, removes duplicates. Intersection (∩) Returns tuples common to both relations. Difference (−) Returns tuples in first relation but not in second. Cartesian Product (×) Combines every tuple of one relation with every tuple of another. Join ( ⋈ ) Combines related tuples from two relations based on a condition. Natural Join Join that automatically matches attributes with the same name. Rename (ρ) Renames a relation or its attributes.
MIN(col) Returns the minimum value.
3.7 SQL Joins...............................................................................................................................................
INNER JOIN Returns rows with matching values in both tables. LEFT JOIN (LEFT OUTER) Returns all rows from left table + matched rows from right (NULL if no match). RIGHT JOIN (RIGHT OUTER) Returns all rows from right table + matched rows from left. FULL OUTER JOIN Returns all rows from both tables; NULL where no match. SELF JOIN Joins a table with itself. CROSS JOIN Cartesian product — every row combined with every other row.
3.8 Short Q&A — Relational Model & SQL..................................................................................................
Q: What is the difference between Primary Key and Foreign Key?
A: A Primary Key uniquely identifies each row in its own table and cannot be NULL. A Foreign Key is an
attribute in one table that references the Primary Key in another table to establish a relationship.
Q: What is referential integrity?
A: Referential integrity ensures that a foreign key value in a table must match an existing primary key value
in the referenced table, or be NULL.
Q: Difference between WHERE and HAVING?
A: WHERE filters individual rows before grouping. HAVING filters groups after GROUP BY. HAVING is
used with aggregate functions.
Q: What is the difference between DELETE, DROP, and TRUNCATE?
A: DELETE removes specific rows (can use WHERE); DROP removes entire table including structure;
TRUNCATE removes all rows but keeps the structure. DELETE is DML, DROP/TRUNCATE are DDL.
Q: What is a NULL value?
A: NULL represents missing, unknown, or undefined data. It is not zero or empty string. Any arithmetic with
NULL results in NULL.
Q: What is a natural join?
A: A natural join combines two tables based on columns with the same name and compatible data types,
automatically eliminating duplicate columns.
Chapter 4: Normalization.................................................................................................................................
4.1 What is Normalization?..........................................................................................................................
Normalization is the process of organizing a database to reduce data redundancy and improve data
integrity. It involves dividing large tables into smaller ones and defining relationships between them.
4.2 Anomalies (Problems without Normalization)........................................................................................
Insertion Anomaly Cannot insert data without including other unrelated data. Deletion Anomaly Deleting a record unintentionally removes other needed data. Update Anomaly Updating data in one place but not others leads to inconsistency.
4.3 Functional Dependency.........................................................................................................................
A functional dependency X → Y means: given a value of X, we can uniquely determine the value of Y. X is
called the determinant.
Full Functional Dependency Y is fully dependent on the entire composite key X, not just a part of it. Partial Dependency Y depends on only a part of the composite primary key. Transitive Dependency X → Y and Y → Z, so X → Z (Y is a non-key attribute determining Z).
4.4 Normal Forms........................................................................................................................................
Normal Form Rule / Condition Eliminates 1NF All attributes must be atomic (no multivalued/composite). Each column holds single value. Must have a primary key. Repeating groups and multivalued attributes. 2NF Must be in 1NF. No partial dependencies (every non-key attribute must depend on the WHOLE primary key). Partial dependencies. 3NF Must be in 2NF. No transitive dependencies (non-key attributes must not depend on other non-key attributes). Transitive dependencies. BCNF Stronger version of 3NF. For every functional dependency X→Y, X must be a superkey. Anomalies not covered by 3NF.
4.5 Short Q&A — Normalization..................................................................................................................
Q: What is normalization?
A: Normalization is the process of organizing database tables to reduce redundancy and eliminate
update/insert/delete anomalies by applying a series of normal forms.
Q: What is a functional dependency?
Chapter 5: Transactions & Concurrency Control...........................................................................................
5.1 Transaction..........................................................................................................................................
A transaction is a logical unit of work that contains one or more SQL operations. It must be executed
completely or not at all.
5.2 ACID Properties...................................................................................................................................
Atomicity A transaction is treated as a single unit — either ALL operations succeed or NONE do. On failure, the system performs a rollback. Consistency A transaction brings the database from one valid state to another valid state, preserving all integrity constraints. Isolation Transactions execute independently of each other. Intermediate results are not visible to other transactions. Durability Once a transaction is committed, the changes are permanent — even in case of system failure.
5.3 Transaction States...............................................................................................................................
Active Transaction is being executed. Partially Committed Final operation has been executed but not yet committed. Committed Transaction completed successfully; changes are permanent. Failed An error occurred; transaction cannot proceed. Aborted Transaction was rolled back; database restored to state before transaction.
5.4 Concurrency Problems.........................................................................................................................
Dirty Read A transaction reads data written by another uncommitted transaction. Non-Repeatable Read A transaction reads the same data twice and gets different values (another transaction modified it in between). Phantom Read A transaction re-executes a query and finds new rows inserted by another transaction. Lost Update Two transactions read the same value, both modify it; one update overwrites the other.
5.5 Concurrency Control Techniques.........................................................................................................
Lock-Based Protocol Transactions acquire locks (Shared or Exclusive) before accessing data. Prevents concurrent conflicts. Shared Lock (S-Lock) Allows concurrent reads. Multiple transactions can hold S-lock on same data. Exclusive Lock (X- Lock) Required for write operations. Only one transaction can hold X-lock at a time.
Two-Phase Locking (2PL) Growing phase: acquire locks. Shrinking phase: release locks. Ensures serializability. Timestamp Ordering Each transaction is assigned a timestamp. Conflicts resolved based on timestamp order. Deadlock Two or more transactions wait for each other's locks indefinitely. Deadlock Prevention Assign priorities; abort lower-priority transactions. Deadlock Detection Use a wait-for graph; detect cycles.
5.6 Short Q&A — Transactions..................................................................................................................
Q: What are ACID properties?
A: Atomicity (all or nothing), Consistency (valid state to valid state), Isolation (independent execution),
Durability (committed changes are permanent).
Q: What is a dirty read?
A: A dirty read occurs when a transaction reads data that has been modified by another transaction but not
yet committed.
Q: What is deadlock in DBMS?
A: A deadlock occurs when two or more transactions are waiting for each other to release locks, resulting
in an indefinite wait. It can be detected using a wait-for graph.
Q: What is 2PL (Two-Phase Locking)?
A: A protocol with two phases: Growing phase (only acquire locks, cannot release) and Shrinking phase
(only release locks, cannot acquire). It ensures conflict serializability.
Q: What is the difference between COMMIT and ROLLBACK?
A: COMMIT permanently saves all changes made during the transaction. ROLLBACK undoes all changes
made during the transaction, restoring the database to its previous state.
A: A primary index is built on an ordered key field of the file. A secondary index is built on a non-ordering
field and provides an alternative access path.
Chapter 7: Database Recovery.....................................................................................................................
7.1 Failure Types.......................................................................................................................................
Transaction Failure Logical errors (bad input, constraint violation) or system errors causing abort. System Crash Hardware or software failure — volatile memory (RAM) is lost. Disk Failure Physical damage to disk — persistent storage is lost. Communication Failure Network failure in distributed databases.
7.2 Recovery Techniques..........................................................................................................................
Log-Based Recovery Every database modification is recorded in a log before being applied. Used for UNDO and REDO operations. UNDO Reverses all changes of an uncommitted transaction (used on failure). REDO Re-applies all changes of a committed transaction (ensures durability after crash). Checkpointing Periodically saves state of all committed transactions to disk, reducing recovery time. Shadow Paging Maintains a shadow copy of database pages. On commit, replace current with shadow. Deferred Update All updates written to log first; written to database only at commit. Immediate Update Updates written to database immediately; log used for undo on failure.
7.3 Short Q&A — Recovery.......................................................................................................................
Q: What is a checkpoint in database recovery?
A: A checkpoint is a point in the log where all data in memory (buffer) is written to disk. During recovery,
only transactions after the last checkpoint need to be redone or undone.
Q: What is the difference between UNDO and REDO?
A: UNDO reverses changes of uncommitted transactions after a failure. REDO reapplies changes of
committed transactions that may not have been written to disk.
Q: What is deferred vs immediate update?
A: In deferred update, changes are not written to database until transaction commits (no UNDO needed). In
immediate update, changes are written as they happen and the log is used for UNDO if failure occurs.
Q: What is a view in SQL?
A: A view is a virtual table created by a SELECT query. It does not store data physically but provides a way
to simplify complex queries and enhance security.
Q: What is a trigger?
A: A trigger is a stored procedure that automatically fires in response to specific events (INSERT, UPDATE,
DELETE) on a table.
Q: What is the CAP theorem?
A: The CAP theorem states that a distributed database system can only guarantee two of these three
properties simultaneously: Consistency, Availability, and Partition Tolerance.
Q: What is the difference between SQL and NoSQL?
A: SQL databases are relational, use structured schemas and tables, and are ACID compliant. NoSQL
databases are non-relational, flexible in schema, and designed for scalability and unstructured data.
Chapter 9: Quick Revision — Most Important MCQs....................................................................................
The following are high-frequency exam questions. Memorize these thoroughly.
Q: What does DDL stand for?
A: Data Definition Language. Commands: CREATE, ALTER, DROP, TRUNCATE.
Q: What does DML stand for?
A: Data Manipulation Language. Commands: SELECT, INSERT, UPDATE, DELETE.
Q: What is the full form of ACID?
A: Atomicity, Consistency, Isolation, Durability.
Q: What is a Primary Key?
A: An attribute (or combination) that uniquely identifies each row in a table and cannot be NULL.
Q: What is a Foreign Key?
A: An attribute in one table that references the primary key of another table to enforce referential integrity.
Q: What is 1NF?
A: A relation is in 1NF if all attributes contain atomic (single) values and there are no repeating groups.
Q: What is 2NF?
A: A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the entire
primary key (no partial dependency).
Q: What is 3NF?
A: A relation is in 3NF if it is in 2NF and there are no transitive dependencies (no non-key attribute depends
on another non-key attribute).
Q: What is BCNF?
A: Boyce-Codd Normal Form is a stronger version of 3NF where for every functional dependency X→Y, X
must be a superkey.
Q: What is an ER diagram?
A: Entity-Relationship diagram is a graphical representation of entities and their relationships in a database
system.
Q: What is a schema?
A: A schema is the logical structure/design of a database — defining tables, columns, data types, and
relationships.
Q: What is a cursor in SQL?
A: A cursor is a database object used to retrieve and process individual rows from a result set one at a
time.
Q: What is data warehousing?
A: A data warehouse is a large centralized repository of integrated data from multiple sources, used for
reporting and data analysis.
Q: What is OLTP vs OLAP?
A: OLTP (Online Transaction Processing) handles day-to-day transactions. OLAP (Online Analytical
Processing) handles complex queries for analysis and reporting.
Q: What is a superkey?
A: A superkey is any set of attributes that can uniquely identify a tuple. A primary key is a minimal
superkey.