








































































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This certification exam guide introduces databases and data engineering fundamentals, including data models, relational databases, SQL concepts, normalization, data pipelines, and basic data processing architectures. Learners gain a solid conceptual foundation in managing and engineering data systems. The guide is exam-oriented and supports validation of core database and data engineering competencies.
Typology: Exams
1 / 80
This page cannot be seen from the preview
Don't miss anything!









































































Question 1. Which of the following best describes the primary purpose of a relational database management system (RDBMS)? A) To store unstructured binary files B) To provide a hierarchical file system C) To manage data using tables with relationships and enforce integrity constraints D) To execute machine‑learning algorithms on raw data Answer: C Explanation: An RDBMS organizes data into tables (relations) and enforces primary‑key, foreign‑key, and domain constraints, enabling reliable relational operations. Question 2. In the context of database requirements, what does “data persistence” refer to? A) The ability of a system to cache data temporarily B) The guarantee that data remains stored after the application terminates C) The speed at which data can be retrieved D) The capability of a database to handle concurrent users Answer: B Explanation: Persistence means that once data is written to the database, it survives crashes, power loss, and program termination. Question 3. Which data model component represents the logical arrangement of entities and relationships before physical implementation? A) Physical data model B) Conceptual data model C) Logical data model D) External data model Answer: C Explanation: A logical data model defines tables, columns, and relationships independent of storage details, bridging conceptual and physical models.
Question 4. What is the main advantage of using a client‑server architecture for database interaction? A) Eliminates the need for network communication B) Allows the database to run on the same machine as the application C) Separates presentation logic from data storage, enabling scalability and centralized management D) Guarantees zero latency for all queries Answer: C Explanation: Client‑server separates the application (client) from the DBMS (server), facilitating multi‑user access, security, and scaling. Question 5. Which relational algebra operation would you use to retrieve only the “EmployeeName” column from the Employees table? A) σ (selection) B) π (projection) C) ρ (rename) D) ⋈ (join) Answer: B Explanation: Projection (π) selects specific attributes (columns) from a relation, discarding the rest. Question 6. In relational algebra, the selection operator (σ) is analogous to which SQL clause? A) GROUP BY B) ORDER BY C) WHERE D) HAVING Answer: C Explanation: σ filters rows based on a predicate, just as the WHERE clause filters rows in an SQL SELECT statement. Question 7. Which integrity constraint ensures that every row in a table can be uniquely identified? A) Referential integrity
D) Export, Transfer, Link Answer: A Explanation: ETL is the classic pipeline that extracts data from sources, transforms it to fit target schemas, and loads it into a destination system. Question 11. Which data processing paradigm is most suitable for handling continuous streams of sensor data in near real‑time? A) Batch processing B) Micro‑batch processing C) Stream processing D) Offline processing Answer: C Explanation: Stream processing ingests and processes data incrementally as it arrives, providing low‑latency results. Question 12. In a data lifecycle, which stage follows “ingestion”? A) Archiving B) Transformation C) Deletion D) Visualization Answer: B Explanation: After data is ingested, it typically undergoes cleaning, validation, and transformation before storage or analysis. Question 13. Which normalization form eliminates transitive dependencies? A) First Normal Form (1NF) B) Second Normal Form (2NF) C) Third Normal Form (3NF) D) Boyce‑Codd Normal Form (BCNF) Answer: C
Explanation: 3NF removes attributes that depend on non‑key attributes, thereby eliminating transitive dependencies. Question 14. When might denormalization be deliberately applied in a data warehouse? A) To reduce storage cost B) To simplify application code C) To improve read‑query performance for analytical workloads D) To enforce stricter referential integrity Answer: C Explanation: Denormalization adds redundant data to speed up complex analytical queries at the expense of extra storage and potential update overhead. Question 15. Which SQL statement is used to change the definition of an existing table by adding a new column? A) INSERT INTO B) ALTER TABLE … ADD COLUMN C) CREATE TABLE … AS D) UPDATE … SET Answer: B Explanation: ALTER TABLE with the ADD COLUMN clause modifies the table schema by adding a new attribute. Question 16. Which DML command would you use to permanently remove rows that meet a specific condition? A) DELETE FROM … WHERE … B) DROP TABLE … C) TRUNCATE TABLE … D) UPDATE … SET … WHERE … Answer: A Explanation: DELETE removes rows matching a WHERE clause while preserving table structure.
A) Ensures a column value is unique across the table B) Guarantees that a column cannot be NULL C) Validates that a column’s value satisfies a specified Boolean expression D) Establishes a relationship between two tables Answer: C Explanation: CHECK enforces custom rules (e.g., salary > 0) on column values. Question 21. In data integration, what does the “ELT” approach stand for? A) Extract, Load, Transform B) Encode, Load, Transfer C) Extract, Link, Transfer D) Export, Load, Transform Answer: A Explanation: ELT extracts data, loads it into the target system first, then transforms it using the target’s processing power. Question 22. Which data type would you choose to store a precise monetary amount in a relational database? A) FLOAT B) VARCHAR C) DECIMAL (or NUMERIC) D) BLOB Answer: C Explanation: DECIMAL/NUMERIC stores fixed‑point numbers without rounding errors, suitable for currency. Question 23. What is the primary reason to create an index on a column? A) To enforce referential integrity B) To speed up query retrieval for that column
C) To reduce storage space D) To automatically generate primary keys Answer: B Explanation: Indexes provide a data structure (e.g., B‑tree) that allows faster search and lookup operations. Question 24. Which SQL clause is used to limit the number of rows returned by a query in PostgreSQL? A) TOP B) LIMIT C) ROWCOUNT D) FETCH FIRST Answer: B Explanation: PostgreSQL uses LIMIT to restrict result set size; MySQL also supports LIMIT. Question 25. When designing a schema for both OLTP and OLAP workloads, which design principle is most appropriate? A) Fully normalized tables only B) Fully denormalized tables only C) A hybrid approach: normalized for transactional tables, denormalized star/snowflake schemas for analytical tables D) Use a single flat file for all data Answer: C Explanation: OLTP benefits from normalization to avoid anomalies, while OLAP often uses denormalized star/snowflake schemas for query performance. Question 26. Which of the following statements about ACID properties is FALSE? A) Atomicity ensures that a transaction is all‑or‑nothing. B) Consistency guarantees that a transaction leaves the database in a valid state. C) Isolation prevents concurrent transactions from interfering with each other.
Explanation: Validation checks data quality, preventing bad data from contaminating the target system. Question 30. Which of the following techniques is most effective for handling missing numeric values in a data cleaning process? A) Deleting the entire row B) Replacing with the column’s mean or median value C) Converting missing values to the string “NULL” D) Ignoring the missing values during analysis Answer: B Explanation: Imputing with mean/median preserves dataset size and reduces bias, especially when missingness is random. Question 31. In a relational database, what does the term “schema” refer to? A) The physical storage location of data files B) The set of tables, columns, data types, constraints, and relationships that define the database structure C) The SQL statements used to query data D) The backup schedule for the database Answer: B Explanation: A schema is the logical blueprint describing how data is organized. Question 32. Which of the following best describes the “star schema” used in data warehousing? A) A highly normalized set of tables with many foreign keys B) A central fact table linked to multiple denormalized dimension tables C) A single flat table containing all data D) A graph database model with nodes and edges Answer: B Explanation: Star schema places fact data at the center and dimension tables around it, facilitating simple, fast queries.
Question 33. What is the effect of the SQL clause “DISTINCT” in a SELECT statement? A) It sorts the result set alphabetically. B) It removes duplicate rows from the result set. C) It groups rows based on the selected columns. D) It limits the number of rows returned. Answer: B Explanation: DISTINCT filters out duplicate rows, returning only unique combinations of selected columns. Question 34. Which of the following is a common cause of a “deadlock” in a relational database? A) Two sessions trying to read the same row simultaneously B) Two sessions each holding a lock that the other needs to proceed C) A single session executing a very large query D) Insufficient disk space for temporary tables Answer: B Explanation: A deadlock occurs when transactions wait on each other’s locks, creating a cycle that must be broken. Question 35. In the context of concurrency control, what does “optimistic locking” assume? A) Conflicts will occur frequently, so locks are acquired early. B) Conflicts are rare, so transactions proceed without locks and verify at commit time. C) All rows are locked for the duration of a transaction. D) Only read locks are used, never write locks. Answer: B Explanation: Optimistic locking checks for conflicts at commit, reducing lock contention when conflicts are unlikely. Question 36. Which SQL clause would you use to rename a column in the result set without altering the underlying table? A) RENAME COLUMN
D) Use of SQL as the primary query language Answer: C Explanation: NoSQL databases typically allow schema‑less designs and are built to scale out across many nodes. Question 40. When performing batch processing, which of the following is a typical advantage? A) Immediate results as data arrives B) Ability to process massive volumes of data at scheduled intervals, optimizing resource usage C) Real‑time monitoring of individual events D) No need for data storage before processing Answer: B Explanation: Batch jobs run on large data sets during off‑peak times, allowing efficient use of compute resources. Question 41. Which SQL statement would you use to permanently remove a table and all its data? A) DELETE FROM table_name; B) TRUNCATE TABLE table_name; C) DROP TABLE table_name; D) ALTER TABLE table_name DELETE; Answer: C Explanation: DROP TABLE deletes the table definition and all stored data from the database. Question 42. In a data warehouse, what is the purpose of a “slowly changing dimension” (SCD) type 2? A) To overwrite old dimension values with new ones B) To keep only the most recent value in the dimension table C) To create a new row for each change, preserving historical versions D) To delete obsolete dimension rows automatically Answer: C
Explanation: SCD Type 2 tracks history by inserting a new row with a new surrogate key whenever an attribute changes. Question 43. Which of the following best describes a “foreign key cascade delete” behavior? A) Deleting a parent row automatically deletes all child rows referencing it. B) Deleting a child row automatically deletes the parent row. C) Deleting a row sets foreign‑key values to zero. D) Deleting a row causes the transaction to abort. Answer: A Explanation: Cascade delete propagates the delete operation from the referenced (parent) row to dependent (child) rows. Question 44. What is the primary function of a “materialized view” in a database? A) To store the result set of a query physically for faster read performance B) To enforce referential integrity constraints C) To replace indexes on large tables D) To provide a temporary table for a single session Answer: A Explanation: Materialized views cache query results, reducing execution time for expensive queries. Question 45. In SQL, which clause would you use to order the final result set by the “created_date” column in descending order? A) GROUP BY created_date DESC B) ORDER BY created_date DESC C) SORT BY created_date DESC D) HAVING created_date DESC Answer: B Explanation: ORDER BY determines the sort order of returned rows; DESC specifies descending order.
Question 49. Which of the following best describes the purpose of a “check constraint” that uses the expression “salary > 0”? A) To enforce that each employee has a unique salary B) To guarantee that salary values cannot be negative or zero C) To automatically calculate bonuses D) To link the salary column to another table Answer: B Explanation: The constraint prevents insertion or update of rows where salary is less than or equal to zero. Question 50. In a relational database, what does the term “cardinality” refer to? A) The size of the database file on disk B) The number of rows in a table or the number of occurrences in a relationship (e.g., one‑to‑many) C) The speed of query execution D) The type of primary key used Answer: B Explanation: Cardinality describes the count of elements in a set, such as rows in a table or the multiplicity of a relationship. Question 51. Which of the following is a primary advantage of using a “partitioned table” in a large database? A) It eliminates the need for indexes. B) It automatically compresses all data. C) It improves query performance and manageability by dividing data into smaller, more manageable pieces. D) It forces all rows into a single physical file. Answer: C Explanation: Partitioning allows queries to scan only relevant partitions, reducing I/O and simplifying maintenance tasks like archiving.
Question 52. What does the SQL clause “FOR UPDATE” do when appended to a SELECT statement? A) It creates a new table with the selected rows. B) It locks the selected rows for subsequent update operations within the same transaction. C) It updates the selected rows automatically. D) It forces the database to use a full table scan. Answer: B Explanation: FOR UPDATE acquires write locks on the retrieved rows, preventing other transactions from modifying them until commit or rollback. Question 53. In data engineering, what is the main difference between “schema on read” and “schema on write”? A) Schema on read validates data before insertion; schema on write validates after retrieval. B) Schema on read applies structure when data is accessed; schema on write enforces structure at ingestion time. C) Schema on read uses NoSQL databases; schema on write uses relational databases. D) Schema on read automatically generates indexes; schema on write does not. Answer: B Explanation: Schema on read defers structuring until query time (common in data lakes), whereas schema on write imposes a fixed schema during data load. Question 54. Which of the following SQL statements would correctly create a table with a composite primary key consisting of columns “order_id” and “product_id”? A) CREATE TABLE Orders (order_id INT, product_id INT, PRIMARY KEY (order_id, product_id)); B) CREATE TABLE Orders (order_id INT PRIMARY KEY, product_id INT PRIMARY KEY); C) CREATE TABLE Orders (order_id INT, product_id INT, UNIQUE (order_id, product_id)); D) CREATE TABLE Orders (order_id INT, product_id INT, CONSTRAINT pk_orders UNIQUE (order_id, product_id)); Answer: A Explanation: Declaring PRIMARY KEY (order_id, product_id) defines a composite key; option C creates a unique constraint but not a primary key.
B) They appear with NULL values for columns from the right table. C) They cause the query to fail. D) They are duplicated for each row in the right table. Answer: B Explanation: LEFT OUTER JOIN preserves all left‑side rows, inserting NULLs for non‑matching right‑side columns. Question 59. Which of the following is a common technique for handling “dirty” data that contains inconsistent date formats? A) Dropping the entire column. B) Converting all dates to a canonical ISO 8601 format during the cleaning stage. C) Storing dates as plain text without validation. D) Ignoring the column in all analyses. Answer: B Explanation: Standardizing date representations ensures consistency and enables reliable querying. Question 60. In a relational database, what does the term “self‑join” mean? A) Joining a table to itself to relate rows within the same table B) Joining two tables that have the same name but reside in different schemas C) Joining a view with its underlying base table D) Joining a table without specifying any join condition Answer: A Explanation: A self‑join uses two aliases of the same table to compare rows against each other (e.g., hierarchical relationships). Question 61. Which of the following best describes a “surrogate key”? A) A natural key derived from business data B) An automatically generated, system‑managed identifier used as a primary key C) A foreign key that references another table’s primary key
D) A composite key consisting of multiple columns Answer: B Explanation: Surrogate keys are usually integer sequences or UUIDs that have no business meaning but simplify joins and indexing. Question 62. In SQL, which clause would you use to filter groups after aggregation based on a condition that involves an aggregate function? A) WHERE B) HAVING C) ON D) GROUP BY Answer: B Explanation: HAVING evaluates conditions on aggregated data, while WHERE applies before aggregation. Question 63. Which of the following is an advantage of using column‑store storage for analytical workloads? A) Faster row‑level inserts and updates B) Improved compression and query performance when selecting a subset of columns C) Automatic enforcement of foreign keys D) Simplified transaction logging Answer: B Explanation: Columnar storage compresses similar data together and reads only needed columns, boosting analytical query speed. Question 64. When designing a data model for a many‑to‑many relationship, which of the following structures is typically used? A) A single table with two foreign keys and a composite primary key (junction table) B) Two tables with a shared primary key C) A view that merges the two tables D) A denormalized table containing duplicated data from both sides