AWS Academy Data Engineering Ultimate Exam, Exams of Technology

The AWS Academy Data Engineering Ultimate Exam provides comprehensive preparation for learners working with cloud-based data solutions and analytics platforms. The exam covers data pipelines, ETL processes, data lakes, storage services, databases, big data processing, analytics tools, security, automation, data governance, and AWS data engineering services. It helps candidates develop practical skills required for modern data engineering roles and AWS-focused analytics environments.

Typology: Exams

2025/2026

Available from 05/08/2026

nicky-jone

AWS Academy Data Engineering Ultimate Exam
**Question 1. Which term best describes the ability of a system to improve its performance
automatically through experience?**
A) Artificial Intelligence
B) Machine Learning
C) Deep Learning
D) Generative AI
Answer: B
Explanation: Machine Learning is the subset of AI that enables systems to learn from data without being
explicitly programmed.
**Question 2. In supervised learning, what is the primary difference between regression and
classification tasks?**
A) Regression predicts continuous values; classification predicts discrete categories.
B) Regression uses unlabeled data; classification uses labeled data.
C) Regression requires reinforcement signals; classification does not.
D) Regression is only for image data; classification is only for text.
Answer: A
Explanation: Regression outputs a numeric value, while classification assigns inputs to predefined
classes.
**Question 3. Which unsupervised learning technique groups similar data points based on distance
metrics?**
A) Decision Trees
B) K‑Means Clustering
C) Linear Regression
D) Q‑Learning
Answer: B
Explanation: K‑Means clusters data by minimizing intra‑cluster variance using distance calculations.
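
The clustering idea in Question 3 can be illustrated with a minimal sketch of Lloyd's algorithm in plain Python. The function names and the sample points are assumptions made for this illustration, and the initial centroids are passed in explicitly to keep the run deterministic; production libraries usually pick them randomly or with a smarter seeding scheme.

```python
def squared_distance(p, q):
    # Squared Euclidean distance between two equal-length tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    # Index of the centroid closest to the point.
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, initial_centroids, max_iterations=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)
```

Each pass alternates between assigning points to the nearest centroid and recomputing centroids as cluster means, which is what drives the intra‑cluster variance down.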

**Question 4. In reinforcement learning, what term describes the feedback signal that indicates how good an action was?**
A) Loss
B) Reward
C) Gradient
D) Bias
Answer: B
Explanation: The reward guides the agent toward optimal policies by indicating the desirability of actions.
**Question 5. Which component of a neural network is responsible for introducing non‑linearity?**
A) Weights
B) Biases
C) Activation Function
D) Input Layer
Answer: C
Explanation: Activation functions (e.g., ReLU, sigmoid) allow networks to model complex, non‑linear relationships.
**Question 6. Real‑time inferencing is best described as:**
A) Processing batch data overnight.
B) Generating predictions synchronously as requests arrive.
C) Storing model outputs for later analysis.
D) Training models on streaming data.
Answer: B
Explanation: Real‑time (synchronous) inferencing returns results immediately for each incoming request.
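
To make the point of Question 5 concrete, the sketch below builds a toy one‑input, two‑layer network with and without an activation function. The weights and the `two_layer` helper are hypothetical values chosen for illustration only.

```python
import math


def relu(x):
    # Rectified linear unit: passes positive values, zeroes out negatives.
    return max(0.0, x)


def sigmoid(x):
    # Logistic function: squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def two_layer(x, activation=None):
    # Hypothetical toy network: hidden = 2x - 1, output = 3 * hidden + 0.5.
    hidden = 2.0 * x - 1.0
    if activation is not None:
        hidden = activation(hidden)
    return 3.0 * hidden + 0.5


# Without an activation the two layers collapse into one linear map (6x - 2.5),
# so equal input steps produce equal output steps.
linear_steps = (two_layer(1.0) - two_layer(0.0), two_layer(2.0) - two_layer(1.0))

# With ReLU the same input steps no longer produce equal output steps.
relu_steps = (two_layer(1.0, relu) - two_layer(0.0, relu),
              two_layer(2.0, relu) - two_layer(1.0, relu))
print(linear_steps, relu_steps)
```

Stacking more linear layers without an activation would still yield a single linear function, which is why the activation is the component credited with non‑linearity.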

A) The increase in model size over time.
B) The degradation of model performance due to changes in data distribution.
C) The process of converting a model to a different framework.
D) The latency increase in inference after scaling.
Answer: B
Explanation: Model drift occurs when the statistical properties of input data shift, causing performance decay.
**Question 11. In the context of generative AI, what does the term “hallucination” describe?**
A) The model’s ability to generate high‑resolution images.
B) The creation of outputs that are plausible but factually incorrect.
C) The process of fine‑tuning a model on new data.
D) The reduction of model size through pruning.
Answer: B
Explanation: Hallucinations are fabricated statements that appear realistic yet lack factual basis.
**Question 12. The “attention” mechanism in transformer architectures primarily enables:**
A) Faster GPU utilization.
B) Parallel processing of sequence data.
C) Dynamic weighting of input token relevance.
D) Reduction of model parameters.
Answer: C
Explanation: Attention computes relevance scores between tokens, allowing the model to focus on important context.
**Question 13. In tokenization, which of the following best describes a “subword token”?**
A) A full sentence split by punctuation.
B) A single character.
C) A fragment of a word, such as “##ing”.
D) An entire paragraph treated as one token.
Answer: C
Explanation: Subword tokenization (e.g., WordPiece, BPE) breaks words into meaningful pieces to handle unknown words.
**Question 14. Increasing the temperature parameter during text generation will:**
A) Make the output more deterministic.
B) Reduce the number of tokens generated.
C) Increase randomness and creativity.
D) Force the model to use only the top‑1 token.
Answer: C
Explanation: Higher temperature softens the probability distribution, allowing less likely tokens to be selected.
**Question 15. Which inference parameter limits the cumulative probability mass of considered tokens?**
A) Top‑K
B) Top‑P (nucleus sampling)
C) Temperature
D) Max Tokens
Answer: B
Explanation: Top‑P selects the smallest set of tokens whose probabilities sum to a given threshold (e.g., 0.9).
**Question 16. When adapting a foundation model to a specific domain, the most efficient approach is often:**
A) Training a new model from scratch.
B) Fine‑tuning the pre‑trained model on domain data.

Answer: B
Explanation: RAG fetches relevant documents and injects them into the prompt, grounding the output in factual data.
**Question 20. In a typical RAG pipeline, which component converts raw text documents into numerical vectors for similarity search?**
A) Tokenizer
B) Embedding Model
C) Decoder
D) Optimizer
Answer: B
Explanation: Embedding models map text to dense vectors that can be indexed and compared for retrieval.
**Question 21. Amazon Bedrock Knowledge Bases are used to:**
A) Host large‑scale training jobs.
B) Store vector embeddings for RAG workflows.
C) Deploy containerized microservices.
D) Perform real‑time video transcoding.
Answer: B
Explanation: Bedrock Knowledge Bases manage document ingestion, embedding, and retrieval for RAG use cases.
**Question 22. Which AWS service provides built‑in content‑filtering guardrails for foundation models?**
A) Amazon SageMaker Clarify
B) Amazon Bedrock Guardrails
C) AWS WAF
D) Amazon Macie
Answer: B
Explanation: Bedrock Guardrails let you define policies for profanity, PII, and topic restrictions.
**Question 23. The primary purpose of Amazon SageMaker Clarify is to:**
A) Accelerate model training on GPU clusters.
B) Detect bias and provide explanations for model predictions.
C) Store large video datasets.
D) Manage container registries.
Answer: B
Explanation: Clarify offers bias detection and explainability tools for both training and inference.
**Question 24. Which of the following is a common method for protecting AI workloads at the network layer?**
A) Enabling IAM role trust relationships.
B) Deploying the workload inside a VPC with private subnets.
C) Using Amazon S3 versioning.
D) Encrypting model checkpoints with KMS.
Answer: B
Explanation: Placing resources in a VPC isolates them from the public internet and enables security controls.
**Question 25. Under the AWS Shared Responsibility Model, who is responsible for encrypting training data before it is uploaded to SageMaker?**
A) AWS
B) The customer
C) Both AWS and the customer equally
D) Third‑party encryption vendors only
Answer: B

**Question 29. In a transformer encoder‑decoder architecture, which part processes the input sequence to create contextual representations?**
A) Decoder
B) Encoder
C) Tokenizer
D) Positional Encoding Layer
Answer: B
Explanation: The encoder reads the source tokens and generates hidden states used by the decoder.
**Question 30. Which of the following best describes “parameter efficiency” in large language models?**
A) Using fewer training epochs.
B) Achieving comparable performance with fewer parameters via techniques like distillation.
C) Reducing the number of input features.
D) Storing parameters in compressed files.
Answer: B
Explanation: Parameter efficiency aims to maintain capability while lowering model size, often via distillation or pruning.
**Question 31. Which of these prompt engineering patterns helps the model reason step‑by‑step before answering?**
A) Zero‑shot prompting
B) Chain‑of‑Thought prompting
C) Temperature scaling
D) Top‑K sampling
Answer: B
Explanation: Chain‑of‑Thought encourages the model to generate intermediate reasoning steps.
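
The difference between the zero‑shot and Chain‑of‑Thought patterns in Question 31 comes down to what the prompt asks for. The sketch below shows one possible way to build each prompt as a plain string; the function names and the exact wording are assumptions for illustration, not a prescribed template.

```python
def zero_shot_prompt(question):
    # Asks for the answer directly, with no examples and no reasoning request.
    return f"Question: {question}\nAnswer:"


def chain_of_thought_prompt(question):
    # Asks the model to write out intermediate reasoning before the answer.
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, then give the final answer "
        "on a new line starting with 'Final answer:'.\n"
        "Reasoning:"
    )


# Hypothetical example question.
print(chain_of_thought_prompt("A pipeline ingests 120 files per hour. How many in 36 hours?"))
```

Either string would be sent to the model as ordinary input text; only the instruction changes.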

**Question 32. In the context of AI ethics, “explainability” primarily refers to:**
A) The ability to encrypt model weights.
B) Providing human‑readable reasons for a model’s decision.
C) Reducing model size for faster inference.
D) Ensuring the model runs on low‑power devices.
Answer: B
Explanation: Explainability offers transparency by showing why a specific output was produced.
**Question 33. Which type of attack attempts to manipulate a model’s output by inserting malicious instructions into the prompt?**
A) Data poisoning
B) Model inversion
C) Prompt injection
D) Side‑channel attack
Answer: C
Explanation: Prompt injection (or jailbreak) exploits the model’s interpretation of crafted inputs to bypass safeguards.
**Question 34. Watermarking AI‑generated content is primarily used for:**
A) Improving model accuracy.
B) Identifying that the content originated from an AI system.
C) Reducing token usage.
D) Enhancing token embedding quality.
Answer: B
Explanation: Watermarks embed detectable patterns so consumers can verify AI origin.
**Question 35. Which AWS service provides a fully managed speech‑to‑text capability?**
A) Amazon Polly

C) Using reinforcement learning to improve performance.
D) Pre‑training on billions of tokens.
Answer: B
Explanation: Few‑shot leverages a small number of examples supplied in the prompt to guide the model.
**Question 39. In the AWS Well‑Architected Framework, which pillar addresses the reliability of AI services?**
A) Security
B) Performance Efficiency
C) Reliability
D) Operational Excellence
Answer: C
Explanation: The Reliability pillar ensures that services can recover from failures and meet availability requirements.
**Question 40. Which of the following is a common technique to mitigate model bias during training?**
A) Increasing learning rate.
B) Using balanced class weights.
C) Disabling dropout.
D) Reducing batch size.
Answer: B
Explanation: Assigning class weights compensates for imbalanced representation, reducing biased predictions.
**Question 41. The process of converting a PDF document into vector embeddings for RAG is called:**
A) Tokenization
B) Chunking
C) Quantization
D) Pruning
Answer: B
Explanation: Chunking splits large documents into manageable pieces before embedding.
**Question 42. Which AWS service can automatically tag and redact personally identifiable information (PII) in model outputs?**
A) Amazon Macie
B) Amazon Bedrock Guardrails
C) AWS Config
D) Amazon GuardDuty
Answer: B
Explanation: Bedrock Guardrails allow custom policies to detect and redact PII in generated text.
**Question 43. What does “top‑K” sampling control during text generation?**
A) The maximum length of the output.
B) The number of most probable tokens considered at each step.
C) The temperature of the softmax distribution.
D) The proportion of the vocabulary used.
Answer: B
Explanation: Top‑K limits token selection to the K highest‑probability candidates.
**Question 44. Which of the following is a key advantage of using Amazon SageMaker JumpStart?**
A) It provides a marketplace for third‑party datasets.
B) It offers ready‑to‑deploy pre‑trained models with one‑click integration.
C) It automatically writes unit tests for your code.
D) It replaces the need for IAM policies.
Answer: B
Explanation: JumpStart streamlines model deployment by offering pre‑trained open‑source models.
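
The sampling parameters covered in the questions on temperature, Top‑P, and Top‑K can be sketched over a toy next‑token distribution. The helper names and the example logits below are assumptions for illustration; real inference stacks apply the same ideas over a full vocabulary.

```python
import math


def softmax(logits, temperature=1.0):
    # Divide logits by the temperature before normalizing; higher values
    # flatten the distribution, lower values sharpen it.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    # Rescale the kept token probabilities so they sum to 1.
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_k_filter(probs, k):
    # Keep only the k most probable token indices.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, ranked[:k])


def top_p_filter(probs, p):
    # Keep the smallest set of top-ranked tokens whose cumulative
    # probability reaches the threshold p (nucleus sampling).
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
print(softmax(logits, temperature=0.5))
print(softmax(logits, temperature=2.0))
print(top_k_filter([0.5, 0.3, 0.15, 0.05], k=2))
print(top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9))
```

A sampler would then draw the next token from the filtered, renormalized distribution, so Top‑K fixes how many candidates survive while Top‑P lets that number vary with the shape of the distribution.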

B) To ensure cosine similarity is equivalent to dot product.
C) To encrypt the vectors.
D) To increase token length.
Answer: B
Explanation: Normalizing vectors to unit length makes cosine similarity computation equal to dot product, simplifying retrieval.
**Question 49. Which of the following is NOT a typical use case for Amazon Rekognition?**
A) Face detection and analysis.
B) Object and scene detection in images.
C) Sentiment analysis of text.
D) Video frame analysis.
Answer: C
Explanation: Sentiment analysis is an NLP task handled by services like Amazon Comprehend, not Rekognition.
**Question 50. When deploying a model with Amazon SageMaker, which feature enables automatic scaling of inference instances based on request volume?**
A) SageMaker Pipelines
B) Multi‑Model Endpoint
C) Auto‑Scaling with Application Auto Scaling
D) SageMaker Debugger
Answer: C
Explanation: Application Auto Scaling can adjust the number of endpoint instances dynamically.
**Question 51. Which of the following best captures the concept of “parameter sharing” in transformer models?**
A) Using the same weight matrix across multiple layers.
B) Storing parameters in a shared S3 bucket.
C) Sharing GPU memory among models.
D) Duplicating the same parameters for each attention head.
Answer: A
Explanation: Parameter sharing reuses the same weight matrices across multiple layers (as in ALBERT), reducing the total parameter count.
**Question 52. In the context of AI governance, “data residency” refers to:**
A) The physical location where training data is stored.
B) The latency of data transfer across regions.
C) The version control of datasets.
D) The encryption algorithm used for data at rest.
Answer: A
Explanation: Data residency concerns compliance with regulations that dictate where data may be stored geographically.
**Question 53. Which of the following is a recommended practice for preventing prompt injection attacks?**
A) Increasing model temperature.
B) Sanitizing user inputs before concatenating them with system prompts.
C) Disabling IAM authentication.
D) Using larger batch sizes during inference.
Answer: B
Explanation: Input sanitization removes malicious instructions that could subvert model behavior.
**Question 54. What does “few‑shot” prompting with “chain‑of‑thought” combine?**
A) Providing no examples and random token sampling.
B) Supplying a few examples and asking the model to reason step‑by‑step.
C) Using top‑K sampling with high temperature.
D) Training a new model on a small dataset.
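
The claim in the vector‑normalization explanation above, that unit‑length vectors make cosine similarity equal to the dot product, can be checked with a short sketch. The helper names and the two example vectors are assumptions for illustration.

```python
import math


def dot(a, b):
    # Plain dot product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    # Scale a vector to unit (L2) length.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical embedding vectors.
a = [3.0, 4.0]
b = [4.0, 3.0]
print(cosine_similarity(a, b))
print(dot(normalize(a), normalize(b)))
```

Because the division by vector lengths is done once at indexing time, a vector store holding normalized embeddings can rank candidates with dot products alone.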

**Question 58. Which of the following is a common technique to reduce the size of a large language model for edge deployment?**
A) Increasing batch size during inference.
B) Model quantization (e.g., 8‑bit).
C) Adding more attention heads.
D) Using a higher learning rate.
Answer: B
Explanation: Quantization reduces precision of weights, shrinking model size and improving latency on constrained devices.
**Question 59. When using Amazon Bedrock, the “model ID” parameter is used to:**
A) Identify the AWS region.
B) Select a specific foundation model (e.g., Claude, Titan).
C) Set the IAM role for the request.
D) Define the output format.
Answer: B
Explanation: Model ID tells Bedrock which pre‑trained model to invoke for generation.
**Question 60. Which of the following best describes “knowledge distillation” in the context of LLMs?**
A) Converting a model to a different programming language.
B) Training a smaller “student” model to mimic the outputs of a larger “teacher” model.
C) Removing attention layers to simplify architecture.
D) Encrypting model weights for secure storage.
Answer: B
Explanation: Distillation transfers knowledge from a large model to a compact one, preserving performance.
**Question 61. In the context of AI model bias, “label bias” occurs when:**
A) The model’s architecture is too deep.
B) Training labels are systematically skewed or inaccurate.
C) Input features are normalized incorrectly.
D) The optimizer converges too quickly.
Answer: B
Explanation: Biased or noisy labels propagate bias into the learned model.
**Question 62. Which AWS service can be used to automatically redact PII from text generated by a Bedrock model before returning it to the user?**
A) Amazon Macie
B) Amazon Bedrock Guardrails
C) AWS Secrets Manager
D) Amazon Inspector
Answer: B
Explanation: Guardrails let you define policies that filter or redact sensitive information in model outputs.
**Question 63. What is the primary benefit of using “batch transform” in SageMaker for inference?**
A) Real‑time low‑latency responses.
B) Processing large datasets offline with automatic scaling.
C) Storing model artifacts in S3.
D) Enabling multi‑region deployment.
Answer: B
Explanation: Batch Transform runs inference on a whole dataset asynchronously, ideal for bulk processing.
**Question 64. Which of the following best defines “adversarial robustness” for an AI model?**
A) Ability to run on low‑power devices.
B) Resistance to intentionally crafted inputs that aim to cause mis‑prediction.