ALBERTA DATA ENGINEER EXAM, Exams of Advanced Data Analysis

ALBERTA DATA ENGINEER EXAM | QUESTIONS AND CORRECT ANSWERS (VERIFIED ANSWERS) PLUS RATIONALES 2026 Q&A | INSTANT DOWNLOAD PDF

Typology: Exams

2025/2026

Available from 04/22/2026

wergnkses254

ALBERTA DATA ENGINEERING PIPELINE EXAM QUESTIONS AND CORRECT ANSWERS (VERIFIED ANSWERS) PLUS RATIONALE 2026 Q&A | INSTANT DOWNLOAD PDF
1–10: Data Engineering Foundations
1. Data engineering primarily focuses on:
A. Model training only
B. Building and maintaining data pipelines
C. Image processing
D. UI design
Answer: B
Rationale: Data engineering is about reliable data movement and storage systems.
2. ETL stands for:
A. Extract Transform Load
B. Encode Train Learn
C. Execute Transfer Log
D. Extract Test Load
Answer: A
Rationale: ETL is the core data pipeline pattern: extract data from sources, transform it, then load it into a target store.
3. ELT differs from ETL because:
A. Data is deleted
B. Transformation happens after loading
C. No storage is used
D. No extraction occurs
Answer: B
Rationale: In ELT, raw data is loaded into the target (often a cloud warehouse) first and transformed there later.
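
Questions 2 and 3 contrast ETL and ELT. The sketch below is a minimal illustration, assuming a hypothetical list of raw order records and an in-memory SQLite database standing in for the target system. The only difference between the two functions is where the transformation (cents to dollars) runs: in pipeline code before loading (ETL) or as SQL inside the target after loading the raw data (ELT).

```python
import sqlite3

# Hypothetical raw source records: (order_id, amount_in_cents)
RAW_ORDERS = [(1, 1250), (2, 990), (3, 4000)]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in the pipeline (here, Python) before loading."""
    transformed = [(oid, cents / 100) for oid, cents in RAW_ORDERS]  # cents -> dollars
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount_usd REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", transformed)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw data first, then transform inside the target with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ORDERS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, amount_cents / 100.0 AS amount_usd FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
print(etl_rows == elt_rows)  # True: same result, the transform just runs in a different place
```

A side effect of ELT visible here is that the untransformed `orders_raw` table stays in the target, so it can be re-transformed later without re-extracting.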
4. Data pipelines are used to:
A. Train CNNs
B. Move and process data
C. Build UI
D. Encrypt files only
Answer: B
Rationale: They automate the flow of data between systems.
5. Structured data is:
A. Images
B. Tabular data
C. Videos
D. Audio only
Answer: B
Rationale: It is organized in rows and columns.
6. Unstructured data includes:
A. CSV files
B. JSON tables
C. Images and text
D. SQL tables
Answer: C
Rationale: It has no fixed schema.
7. Semi-structured data example:
A. CSV
B. JSON
C. Image
D. Video
Answer: B
Rationale: It has a flexible schema.
8. Data ingestion means:
A. Deleting data

C. Removing pipelines
D. Training models
Answer: B
Rationale: Improves data quality.

13. An example of a data pipeline orchestration tool is:
A. TensorFlow
B. Airflow
C. PyTorch
D. OpenCV
Answer: B
Rationale: Airflow manages workflow scheduling.
14. DAG in data pipelines stands for:
A. Data Aggregation Group
B. Directed Acyclic Graph
C. Data Analysis Grid
D. Distributed AI Graph
Answer: B
Rationale: A DAG describes the dependency structure of pipeline tasks; because it has no cycles, a valid execution order always exists.
15. Batch processing means:
A. Real-time processing
B. Processing data in chunks
C. No processing
D. Random processing
Answer: B
Rationale: Data is collected and processed periodically in groups.
16. Stream processing means:
A. Delayed processing
B. Real-time data processing
C. Offline processing
D. Data deletion
Answer: B
Rationale: Records are processed continuously as they arrive.
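
Question 14 describes a pipeline as a DAG. A minimal sketch of what an orchestrator does with that graph, assuming a hypothetical five-task pipeline: it derives an order in which every task runs after its dependencies, and it rejects graphs that contain a cycle. This uses Python's standard `graphlib` module (Python 3.9+), not Airflow's API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)


def is_valid_dag(graph: dict) -> bool:
    """Return False if the dependency graph contains a cycle."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


print(is_valid_dag({"a": {"b"}, "b": {"a"}}))  # False: a and b wait on each other
```

Independent tasks (here the two extracts) have no fixed relative order, which is exactly what lets an orchestrator run them in parallel.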

17. Kafka is used for:
A. Model training
B. Data streaming
C. Image processing
D. UI design
Answer: B
Rationale: Kafka is a distributed event streaming platform used to build real-time data pipelines.
18. Data latency refers to:
A. Storage size
B. Delay in data processing
C. Model accuracy
D. CPU speed
Answer: B
Rationale: It is the time lag between data being produced and being available downstream.
19. Idempotency ensures:
A. Random results
B. Same output on repeated execution
C. Faster GPUs
D. Data loss
Answer: B
Rationale: Running an idempotent step more than once leaves the same result as running it once, which makes retries and reruns safe.
20. Data lineage tracks:
A. Model accuracy
B. Data origin and transformations
C. CPU usage
D. Images
Answer: B
Rationale: Traceability of data flow.
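
Question 19 can be made concrete with a small sketch, assuming a hypothetical target keyed by a record `id`. An append-only load is not idempotent: rerunning it after a retry duplicates rows. An upsert (write by key) is idempotent: running it twice leaves the same state as running it once.

```python
def append_load(target: list, batch: list) -> None:
    """Non-idempotent: every rerun appends the whole batch again."""
    target.extend(batch)


def upsert_load(target: dict, batch: list) -> None:
    """Idempotent: rows are written by key, so reruns overwrite instead of duplicating."""
    for row in batch:
        target[row["id"]] = row


# Hypothetical batch of records with a primary key "id".
batch = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

appended: list = []
upserted: dict = {}
for _ in range(2):  # simulate a retry that runs the same load twice
    append_load(appended, batch)
    upsert_load(upserted, batch)

print(len(appended))  # 4: duplicates after the retry
print(len(upserted))  # 2: same state as a single run
```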

25. Partitioning data improves:
A. Latency
B. Parallel processing
C. Model accuracy only
D. Storage deletion
Answer: B
Rationale: Independent partitions can be processed in parallel across workers, enabling distributed computation.
26. Replication ensures:
A. Data loss
B. Fault tolerance
C. Lower storage
D. Random access
Answer: B
Rationale: Copies of the data are kept on multiple nodes, so losing one node does not lose the data.
27. Sharding is:
A. Data encryption
B. Splitting data across servers
C. Training models
D. Visualization
Answer: B
Rationale: It is a horizontal scaling method.
28. The CAP theorem covers:
A. CPU, Accuracy, Precision
B. Consistency, Availability, Partition tolerance
C. Cache, API, Pipeline
D. None
Answer: B
Rationale: During a network partition, a distributed system must trade consistency against availability.
29. ACID properties are used in:
A. ML models
B. Databases
C. CNNs
D. GPUs
Answer: B
Rationale: They guarantee reliable database transactions.
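
Questions 25 and 27 mention partitioning and sharding. One common way to decide which server holds a record is hash-based sharding: hash the record key and take the result modulo the number of shards. The sketch below assumes hypothetical string user IDs and four shards; `shard_for` is an illustrative helper, not a library function.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in [0, num_shards)."""
    # hashlib gives the same hash in every process; Python's built-in hash()
    # on str is randomized per process, so it would route keys inconsistently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical records distributed across 4 shards.
shards = {i: [] for i in range(4)}
for user_id in ["u1", "u2", "u3", "u4", "u5", "u6"]:
    shards[shard_for(user_id, 4)].append(user_id)
print(shards)
```

A known drawback of plain modulo hashing is that changing the shard count remaps most keys; consistent hashing is the usual alternative when shards are added or removed often.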

30. NoSQL databases are used for:
A. Structured only
B. Flexible schema data
C. Images only
D. Training models
Answer: B
Rationale: They scale horizontally and handle semi-structured and unstructured data.
31. SQL is used for:
A. Image processing
B. Structured queries
C. GPU computation
D. NLP only
Answer: B
Rationale: Relational data management.
32. Columnar databases are optimized for:
A. Writes
B. Analytics queries
C. Images
D. Training
Answer: B
Rationale: Queries read only the columns they need, which speeds up scans and aggregations.
33. OLTP systems handle:
A. Analytics
B. Transactions
C. Images

Answer: B
Rationale: Faster retrieval.

38. Cache is used for:
A. Long-term storage
B. Fast data access
C. Training only
D. Encryption
Answer: B
Rationale: Reduces latency.
39. Data warehouse tools include:
A. Spark only
B. Snowflake, BigQuery
C. OpenCV
D. TensorFlow
Answer: B
Rationale: Cloud analytics platforms.
40. A data lakehouse combines:
A. ML + CV
B. Data lake + warehouse
C. CPU + GPU
D. SQL + HTML
Answer: B
Rationale: A modern hybrid architecture that adds warehouse-style management and querying on top of data-lake storage.
41–60: Pipelines, Streaming & Cloud
41. Real-time pipelines require:
A. Batch processing
B. Stream processing
C. Manual execution
D. Offline training
Answer: B
Rationale: Continuous data flow.

42. Apache Airflow is used for:
A. Model training
B. Pipeline orchestration
C. Image processing
D. NLP
Answer: B
Rationale: Workflow scheduling.
43. Apache Kafka is a:
A. Database
B. Streaming platform
C. CNN model
D. GPU tool
Answer: B
Rationale: Real-time messaging system.
44. Handling a data pipeline failure requires:
A. Ignoring it
B. A retry mechanism
C. Deleting the system
D. Stopping ML
Answer: B
Rationale: Retrying transient failures is a basic fault-tolerance mechanism.
45. Schema-on-read means:
A. Schema defined before storage
B. Schema applied at query time
C. No schema
D. Random structure
Answer: B
Rationale: Used in data lakes, where raw data is stored first and interpreted only when it is read.
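
Question 44 names a retry mechanism. A minimal sketch of retry with exponential backoff and jitter follows; `retry`, its parameters, and the `flaky` task are hypothetical names for illustration. Retrying is only safe when the retried step is idempotent (question 19), otherwise each retry may duplicate work.

```python
import random
import time


def retry(task, max_attempts=4, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call task(); if it raises one of retry_on, wait and try again.

    The wait doubles after each failed attempt (exponential backoff) plus a
    little random jitter. After max_attempts failures the last error is re-raised.
    Exceptions not listed in retry_on propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))


# Usage: a hypothetical flaky task that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# sleep is stubbed out so the example runs instantly.
result = retry(flaky, retry_on=(ConnectionError,), sleep=lambda seconds: None)
print(result, calls["n"])  # ok 3
```

Restricting `retry_on` to transient errors (network timeouts, connection resets) avoids pointlessly retrying bugs such as a malformed record, which will fail the same way every time.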

50. Fault tolerance ensures:
A. System failure
B. System continues despite errors
C. No data
D. Slow processing
Answer: B
Rationale: Reliability in pipelines.
61–80: Advanced Engineering, Ethics & Optimization
51. Data versioning tracks:
A. Images
B. Dataset changes over time
C. Models only
D. GPUs
Answer: B
Rationale: Reproducibility, since earlier versions of a dataset can be retrieved and compared.
52. Data governance ensures:
A. Chaos
B. Compliance and control
C. Random storage
D. Faster GPUs
Answer: B
Rationale: Data management rules.
53. GDPR relates to:
A. Gaming
B. Data privacy regulation
C. Image processing
D. Clustering
Answer: B
Rationale: The GDPR (General Data Protection Regulation) is the European Union's data privacy law.

54. Data encryption protects:
A. Speed
B. Confidentiality
C. Accuracy
D. Models
Answer: B
Rationale: Secures sensitive data.
55. ETL pipelines are often:
A. Manual
B. Automated
C. Random
D. Static
Answer: B
Rationale: They usually run as scheduled workflows.
56. Monitoring pipelines ensures:
A. Failures ignored
B. System health
C. No logging
D. Random outputs
Answer: B
Rationale: Operational stability.
57. Data skew refers to:
A. Balanced data
B. Uneven distribution
C. Model accuracy
D. Storage size
Answer: B
Rationale: When data is unevenly distributed across partitions, a few overloaded workers slow down the whole job.
58. Feature engineering is:
A. Ignored in pipelines

D. Networking
Answer: B
Rationale: Stores unstructured and structured data.

63. BigQuery is a:
A. ML model
B. Cloud data warehouse
C. Image tool
D. API gateway
Answer: B
Rationale: Serverless analytics warehouse.
64. Azure Data Factory is used for:
A. Image processing
B. Data pipeline orchestration
C. GPU training
D. NLP only
Answer: B
Rationale: ETL workflow automation.
65. GCP Pub/Sub is used for:
A. Storage
B. Messaging/streaming
C. Training models
D. Compression
Answer: B
Rationale: Real-time event ingestion.
66. Cloud scalability means:
A. Fixed resources
B. On-demand resource expansion
C. Offline systems
D. Manual scaling only
Answer: B
Rationale: Elastic infrastructure.

67. Multi-cloud strategy means:
A. One cloud provider
B. Using multiple cloud providers
C. No cloud usage
D. Local servers only
Answer: B
Rationale: Avoids vendor lock-in.
68. Serverless architecture means:
A. No servers exist
B. Managed infrastructure abstraction
C. Manual scaling
D. Offline execution
Answer: B
Rationale: Cloud handles servers automatically.
69. Data lakehouse combines:
A. ML + CV
B. Data lake + data warehouse
C. CPU + GPU
D. SQL + NoSQL only
Answer: B
Rationale: Unified analytics architecture.
70. Cloud elasticity refers to:
A. Fixed capacity
B. Automatic scaling
C. Manual pipelines
D. Static storage
Answer: B
Rationale: Dynamic resource allocation.

75. Backpressure in streaming means:
A. Faster processing
B. System overload handling
C. Data deletion
D. Training delay
Answer: B
Rationale: It controls the data flow rate: a slow consumer causes upstream producers to slow down so buffers do not grow without bound.
76. Checkpointing in pipelines is used for:
A. UI design
B. Fault recovery
C. Image compression
D. Labeling
Answer: B
Rationale: It saves pipeline state so processing can resume from the last checkpoint after a failure.
77. Data deduplication removes:
A. Features
B. Duplicate records
C. Models
D. Pipelines
Answer: B
Rationale: Improves data quality.
78. Schema evolution refers to:
A. Static schema
B. Changing data structure over time
C. Model training
D. Image processing
Answer: B
Rationale: Handles changing data formats.
79. Event-driven pipelines are triggered by:
A. Manual execution
B. Data events
C. Random timing
D. GPU usage
Answer: B
Rationale: Reactive architecture.
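
Question 75 describes backpressure. One simple way to get it is a bounded buffer between a fast producer and a slow consumer: when the buffer is full, the producer blocks until the consumer catches up. The sketch below uses Python's standard `queue.Queue` and threads purely as an illustration; the buffer size of 3, the 10 items, and the `None` end-of-stream marker are arbitrary assumptions, and streaming platforms implement flow control in their own ways.

```python
import queue
import threading
import time

# Bounded buffer: put() blocks once 3 items are waiting, which throttles the producer.
buffer: queue.Queue = queue.Queue(maxsize=3)
consumed = []
depths = []  # buffer depth observed after each put, to show it never exceeds maxsize
SENTINEL = None  # hypothetical end-of-stream marker


def producer(n: int) -> None:
    for i in range(n):
        buffer.put(i)  # blocks while the buffer is full (backpressure)
        depths.append(buffer.qsize())
    buffer.put(SENTINEL)


def consumer() -> None:
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        time.sleep(0.01)  # simulate a consumer slower than the producer
        consumed.append(item)


t_prod = threading.Thread(target=producer, args=(10,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(max(depths) <= 3)  # True: memory use stays bounded
```

Without `maxsize`, the producer would finish immediately and the queue would hold everything the consumer had not yet processed; with a real unbounded stream, that backlog grows until memory runs out.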

80. Pipeline orchestration ensures:
A. Random execution
B. Ordered workflow execution
C. Data loss
D. Model deletion
Answer: B
Rationale: Manages task dependencies.
81–90: Data Quality, Governance & Security
81. Data quality includes:
A. Random data
B. Accuracy, completeness, consistency
C. Model speed
D. GPU usage
Answer: B
Rationale: Ensures usable data.
82. Data governance defines:
A. Model architecture
B. Rules for data usage and control
C. Image processing
D. Training loops
Answer: B
Rationale: Data management policies.
83. Data lineage tracks:
A. Model loss
B. Data origin and transformations