

















ALBERTA DATA ENGINEER EXAM | QUESTIONS AND CORRECT ANSWERS (VERIFIED ANSWERS) PLUS RATIONALES 2026 Q&A | INSTANT DOWNLOAD PDF
Typology: Exams


















1 – 10: Data Engineering Foundations
1. Data engineering primarily focuses on:
A. Model training only
B. Building and maintaining data pipelines
C. Image processing
D. UI design
Answer: B
Rationale: Data engineering is about reliable data movement and storage systems.

2. ETL stands for:
A. Extract Transform Load
B. Encode Train Learn
C. Execute Transfer Log
D. Extract Test Load
Answer: A
Rationale: Core data pipeline process.

3. ELT differs from ETL because:
A. Data is deleted
B. Transformation happens after loading
C. No storage is used
D. No extraction occurs
Answer: B
Rationale: Modern cloud systems load first, transform later.

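Study illustration for questions 2 and 3: a minimal sketch of the ETL versus ELT ordering, assuming an in-memory SQLite database stands in for the target system. The `raw_rows` records, the table names, and the two function names are hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical source records: names in lowercase, ages as untidy strings.
raw_rows = [("alice", " 42 "), ("bob", "17")]


def run_etl(rows):
    """ETL: transform in application code, then load the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    # Transform step happens before anything reaches the database.
    cleaned = [(name.title(), int(age.strip())) for name, age in rows]
    # Load step stores only the cleaned rows.
    conn.executemany("INSERT INTO users VALUES (?, ?)", cleaned)
    return conn.execute("SELECT name, age FROM users ORDER BY name").fetchall()


def run_elt(rows):
    """ELT: load the raw rows first, then transform inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_users (name TEXT, age TEXT)")
    # Load step stores the rows exactly as extracted.
    conn.executemany("INSERT INTO raw_users VALUES (?, ?)", rows)
    # Transform step runs afterwards as SQL in the target system.
    conn.execute(
        "CREATE TABLE users AS "
        "SELECT upper(substr(name, 1, 1)) || substr(name, 2) AS name, "
        "CAST(trim(age) AS INTEGER) AS age "
        "FROM raw_users"
    )
    return conn.execute("SELECT name, age FROM users ORDER BY name").fetchall()
```

Both functions end with the same cleaned table; the difference is only where the transformation runs, which is the distinction question 3 tests.
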
4. Data pipelines are used to:
A. Train CNNs
B. Move and process data
C. Build UI
D. Encrypt files only
Answer: B
Rationale: Automate data flow.

5. Structured data is:
A. Images
B. Tabular data
C. Videos
D. Audio only
Answer: B
Rationale: Organized in rows/columns.

6. Unstructured data includes:
A. CSV files
B. JSON tables
C. Images and text
D. SQL tables
Answer: C
Rationale: No fixed schema.

7. Semi-structured data example:
A. CSV
B. JSON
C. Image
D. Video
Answer: B
Rationale: Has flexible schema.

8. Data ingestion means:
A. Deleting data

C. Removing pipelines
D. Training models
Answer: B
Rationale: Improves data quality.

13. Data pipeline orchestration tool example:
A. TensorFlow
B. Airflow
C. PyTorch
D. OpenCV
Answer: B
Rationale: Manages workflow scheduling.

14. DAG in data pipelines stands for:
A. Data Aggregation Group
B. Directed Acyclic Graph
C. Data Analysis Grid
D. Distributed AI Graph
Answer: B
Rationale: Pipeline dependency structure.

15. Batch processing means:
A. Real-time processing
B. Processing data in chunks
C. No processing
D. Random processing
Answer: B
Rationale: Periodic processing mode.

16. Stream processing means:
A. Delayed processing
B. Real-time data processing
C. Offline processing
D. Data deletion
Answer: B
Rationale: Continuous data flow.

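Study illustration for questions 13 and 14: a minimal sketch of how a directed acyclic graph fixes the run order of pipeline tasks. It uses Python's standard `graphlib` module (Python 3.9+), not Airflow's own API, and the task names in `pipeline` are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "load": {"aggregate"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Report whether the dependency graph can be scheduled at all."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False
```

A graph with a cycle (task A waits for B while B waits for A) has no valid run order, which is why orchestrators require the "acyclic" part of the name.
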
17. Kafka is used for:
A. Model training
B. Data streaming
C. Image processing
D. UI design
Answer: B
Rationale: Real-time data pipeline system.

18. Data latency refers to:
A. Storage size
B. Delay in data processing
C. Model accuracy
D. CPU speed
Answer: B
Rationale: Time lag in pipelines.

19. Idempotency ensures:
A. Random results
B. Same output on repeated execution
C. Faster GPUs
D. Data loss
Answer: B
Rationale: Important for pipelines.

20. Data lineage tracks:
A. Model accuracy
B. Data origin and transformations
C. CPU usage
D. Images
Answer: B
Rationale: Traceability of data flow.

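Study illustration for question 19: a minimal sketch contrasting a load step that is not idempotent with one that is, assuming each record carries a unique `id` field. The function names and record layout are hypothetical.

```python
def append_load(target, batch):
    """Not idempotent: re-running the same batch duplicates its rows."""
    target.extend(batch)


def upsert_load(target, batch):
    """Idempotent: writes are keyed by id, so repeated runs leave the same state."""
    for record in batch:
        target[record["id"]] = record


batch = [{"id": 1, "city": "Calgary"}, {"id": 2, "city": "Edmonton"}]

appended = []
append_load(appended, batch)
append_load(appended, batch)  # a retry after a suspected failure

upserted = {}
upsert_load(upserted, batch)
upsert_load(upserted, batch)  # the same retry

print(len(appended), len(upserted))
```

This is why idempotency matters for pipelines: a retried or replayed task must not change the result.
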
25. Partitioning data improves:
A. Latency
B. Parallel processing
C. Model accuracy only
D. Storage deletion
Answer: B
Rationale: Enables distributed computation.

26. Replication ensures:
A. Data loss
B. Fault tolerance
C. Lower storage
D. Random access
Answer: B
Rationale: Copies data across nodes.

27. Sharding is:
A. Data encryption
B. Splitting data across servers
C. Training models
D. Visualization
Answer: B
Rationale: Horizontal scaling method.

28. CAP theorem includes:
A. CPU, Accuracy, Precision
B. Consistency, Availability, Partition tolerance
C. Cache, API, Pipeline
D. None
Answer: B
Rationale: Distributed system tradeoffs.

29. ACID properties are used in:
A. ML models
B. Databases
C. CNNs
D. GPUs
Answer: B
Rationale: Transaction reliability.

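Study illustration for questions 25 and 27: a minimal sketch of hash-based sharding, one common way to split records across servers. The shard count and key format are hypothetical, and real systems often use other schemes such as range or consistent hashing.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process for strings and would not be stable across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys into num_shards lists that can be processed in parallel."""
    shards = [[] for _ in range(num_shards)]
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

Because the same key always lands on the same shard, each shard can be stored and processed independently of the others.
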
30. NoSQL databases are used for:
A. Structured only
B. Flexible schema data
C. Images only
D. Training models
Answer: B
Rationale: Scalable unstructured data.

31. SQL is used for:
A. Image processing
B. Structured queries
C. GPU computation
D. NLP only
Answer: B
Rationale: Relational data management.

32. Columnar databases are optimized for:
A. Writes
B. Analytics queries
C. Images
D. Training
Answer: B
Rationale: Faster aggregation.

33. OLTP systems handle:
A. Analytics
B. Transactions
C. Images

Answer: B
Rationale: Faster retrieval.

38. Cache is used for:
A. Long-term storage
B. Fast data access
C. Training only
D. Encryption
Answer: B
Rationale: Reduces latency.

39. Data warehouse tools include:
A. Spark only
B. Snowflake, BigQuery
C. OpenCV
D. TensorFlow
Answer: B
Rationale: Cloud analytics platforms.

40. Data lakehouse combines:
A. ML + CV
B. Data lake + warehouse
C. CPU + GPU
D. SQL + HTML
Answer: B
Rationale: Modern hybrid architecture.

41 – 60: Pipelines, Streaming & Cloud
Answer: B
Rationale: Continuous data flow.

42. Apache Airflow is used for:
A. Model training
B. Pipeline orchestration
C. Image processing
D. NLP
Answer: B
Rationale: Workflow scheduling.

43. Apache Kafka is a:
A. Database
B. Streaming platform
C. CNN model
D. GPU tool
Answer: B
Rationale: Real-time messaging system.

44. Data pipeline failure requires:
A. Ignore
B. Retry mechanism
C. Delete system
D. Stop ML
Answer: B
Rationale: Fault tolerance.

45. Schema-on-read means:
A. Schema defined before storage
B. Schema applied at query time
C. No schema
D. Random structure
Answer: B
Rationale: Used in data lakes.

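Study illustration for question 44: a minimal sketch of a retry mechanism with exponential backoff. The function name, the default attempt count, and the delays are hypothetical; orchestration tools usually provide their own retry settings, which this does not reproduce.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run task(), retrying on failure and doubling the wait between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            # Give up and surface the error once the attempts are exhausted.
            if attempt == max_attempts:
                raise
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable only so the wait can be replaced in tests. In practice, retries are safest when the retried task is idempotent.
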
50. Fault tolerance ensures:
A. System failure
B. System continues despite errors
C. No data
D. Slow processing
Answer: B
Rationale: Reliability in pipelines.

61 – 80: Advanced Engineering, Ethics & Optimization
54. Data encryption protects:
A. Speed
B. Confidentiality
C. Accuracy
D. Models
Answer: B
Rationale: Secures sensitive data.

55. ETL pipelines are often:
A. Manual
B. Automated
C. Random
D. Static
Answer: B
Rationale: Scheduled workflows.

56. Monitoring pipelines ensures:
A. Failures ignored
B. System health
C. No logging
D. Random outputs
Answer: B
Rationale: Operational stability.

57. Data skew refers to:
A. Balanced data
B. Uneven distribution
C. Model accuracy
D. Storage size
Answer: B
Rationale: Affects performance.

58. Feature engineering is:
A. Ignored in pipelines

D. Networking
Answer: B
Rationale: Stores unstructured and structured data.

63. BigQuery is a:
A. ML model
B. Cloud data warehouse
C. Image tool
D. API gateway
Answer: B
Rationale: Serverless analytics warehouse.

64. Azure Data Factory is used for:
A. Image processing
B. Data pipeline orchestration
C. GPU training
D. NLP only
Answer: B
Rationale: ETL workflow automation.

65. GCP Pub/Sub is used for:
A. Storage
B. Messaging/streaming
C. Training models
D. Compression
Answer: B
Rationale: Real-time event ingestion.

66. Cloud scalability means:
A. Fixed resources
B. On-demand resource expansion
C. Offline systems
D. Manual scaling only
Answer: B
Rationale: Elastic infrastructure.

67. Multi-cloud strategy means:
A. One cloud provider
B. Using multiple cloud providers
C. No cloud usage
D. Local servers only
Answer: B
Rationale: Avoids vendor lock-in.

68. Serverless architecture means:
A. No servers exist
B. Managed infrastructure abstraction
C. Manual scaling
D. Offline execution
Answer: B
Rationale: Cloud handles servers automatically.

69. Data lakehouse combines:
A. ML + CV
B. Data lake + data warehouse
C. CPU + GPU
D. SQL + NoSQL only
Answer: B
Rationale: Unified analytics architecture.

70. Cloud elasticity refers to:
A. Fixed capacity
B. Automatic scaling
C. Manual pipelines
D. Static storage
Answer: B
Rationale: Dynamic resource allocation.

75. Backpressure in streaming means:
A. Faster processing
B. System overload handling
C. Data deletion
D. Training delay
Answer: B
Rationale: Controls data flow rate.

76. Checkpointing in pipelines is used for:
A. UI design
B. Fault recovery
C. Image compression
D. Labeling
Answer: B
Rationale: Saves pipeline state.

77. Data deduplication removes:
A. Features
B. Duplicate records
C. Models
D. Pipelines
Answer: B
Rationale: Improves data quality.

78. Schema evolution refers to:
A. Static schema
B. Changing data structure over time
C. Model training
D. Image processing
Answer: B
Rationale: Handles changing data formats.

79. Event-driven pipelines are triggered by:
A. Manual execution
B. Data events
C. Random timing
D. GPU usage
Answer: B
Rationale: Reactive architecture.

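Study illustration for question 76: a minimal sketch of checkpointing for fault recovery, assuming the pipeline state is a single offset into an ordered list of records. The `checkpoint` dictionary stands in for durable storage, and all names here are hypothetical.

```python
def process_with_checkpoint(records, checkpoint, handler):
    """Resume from the saved offset and advance it after each record succeeds."""
    start = checkpoint.get("offset", 0)
    for offset in range(start, len(records)):
        handler(records[offset])
        # Saved only after the handler returns, so a crash in the handler
        # leaves the offset pointing at the record that still needs work.
        checkpoint["offset"] = offset + 1
```

After a crash, calling the function again with the same `checkpoint` skips the records already handled and continues from the one that failed.
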
80. Pipeline orchestration ensures:
A. Random execution
B. Ordered workflow execution
C. Data loss
D. Model deletion
Answer: B
Rationale: Manages task dependencies.

81 – 90: Data Quality, Governance & Security