AWS Academy Data Engineering Ultimate Exam, Exams of Technology

The AWS Academy Data Engineering Ultimate Exam provides comprehensive preparation for learners working with cloud-based data solutions and analytics platforms. The exam covers data pipelines, ETL processes, data lakes, storage services, databases, big data processing, analytics tools, security, automation, data governance, and AWS data engineering services. It helps candidates develop practical skills required for modern data engineering roles and AWS-focused analytics environments.

Typology: Exams

2025/2026

Available from 05/08/2026

nicky-jone 🇮🇳

AWS Academy Data Engineering Ultimate Exam

**Question 1. Which term best describes the ability of a system to improve its performance automatically through experience?**

A) Artificial Intelligence

B) Machine Learning

C) Deep Learning

D) Generative AI

Answer: B

Explanation: Machine Learning is the subset of AI that enables systems to learn from data without being explicitly programmed.

**Question 2. In supervised learning, what is the primary difference between regression and classification tasks?**

A) Regression predicts continuous values; classification predicts discrete categories.

B) Regression uses unlabeled data; classification uses labeled data.

C) Regression requires reinforcement signals; classification does not.

D) Regression is only for image data; classification is only for text.

Answer: A

Explanation: Regression outputs a numeric value, while classification assigns inputs to predefined classes.

**Question 3. Which unsupervised learning technique groups similar data points based on distance metrics?**

A) Decision Trees

B) K‑Means Clustering

C) Linear Regression

D) Q‑Learning

Answer: B

Explanation: K‑Means clusters data by minimizing intra‑cluster variance using distance calculations.
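
The clustering loop described in Question 3 can be sketched in a few lines. This is a minimal illustration, not a production implementation: the sample points, the function names, and the evenly spaced choice of starting centroids are all assumptions made for the example (real libraries typically use random or k-means++ initialization).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Toy K-Means: alternate assignment and centroid-update steps."""
    # Assumption for reproducibility: start from evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members,
        # which is what reduces intra-cluster variance.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

With this data the first three points end up in one cluster and the last three in the other.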

**Question 4. In reinforcement learning, what term describes the feedback signal that indicates how good an action was?**

A) Loss

B) Reward

C) Gradient

D) Bias

Answer: B

Explanation: The reward guides the agent toward optimal policies by indicating the desirability of actions.

**Question 5. Which component of a neural network is responsible for introducing non‑linearity?**

A) Weights

B) Biases

C) Activation Function

D) Input Layer

Answer: C

Explanation: Activation functions (e.g., ReLU, sigmoid) allow networks to model complex, non‑linear relationships.

**Question 6. Real‑time inferencing is best described as:**

A) Processing batch data overnight.

B) Generating predictions synchronously as requests arrive.

C) Storing model outputs for later analysis.

D) Training models on streaming data.

Answer: B

Explanation: Real‑time (synchronous) inferencing returns results immediately for each incoming request.
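
To see why the activation function in Question 5 matters, the sketch below stacks two one-dimensional linear layers with and without a ReLU between them. The weights and biases are arbitrary values chosen for illustration, and the function names are hypothetical.

```python
def linear(w, b, x):
    """A one-dimensional linear layer: w * x + b."""
    return w * x + b


def relu(z):
    """Rectified linear unit: passes positives, zeroes out negatives."""
    return max(0.0, z)


def two_layers_no_activation(x):
    # Two stacked linear layers with nothing in between.
    return linear(2.0, 1.0, linear(-1.0, 0.5, x))


def collapsed_single_layer(x):
    # Algebraically, 2 * (-x + 0.5) + 1 == -2x + 2: one linear layer suffices.
    return linear(-2.0, 2.0, x)


def two_layers_with_relu(x):
    # Same layers, but with a ReLU between them.
    return linear(2.0, 1.0, relu(linear(-1.0, 0.5, x)))


for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_layers_no_activation(x), collapsed_single_layer(x), two_layers_with_relu(x))
```

Without the activation, the two layers always match a single linear layer. With ReLU, the output flattens once the inner value goes negative, so no single line can reproduce it.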

A) The increase in model size over time.

B) The degradation of model performance due to changes in data distribution.

C) The process of converting a model to a different framework.

D) The latency increase in inference after scaling.

Answer: B

Explanation: Model drift occurs when the statistical properties of input data shift, causing performance decay.

**Question 11. In the context of generative AI, what does the term “hallucination” describe?**

A) The model’s ability to generate high‑resolution images.

B) The creation of outputs that are plausible but factually incorrect.

C) The process of fine‑tuning a model on new data.

D) The reduction of model size through pruning.

Answer: B

Explanation: Hallucinations are fabricated statements that appear realistic yet lack factual basis.

**Question 12. The “attention” mechanism in transformer architectures primarily enables:**

A) Faster GPU utilization.

B) Parallel processing of sequence data.

C) Dynamic weighting of input token relevance.

D) Reduction of model parameters.

Answer: C

Explanation: Attention computes relevance scores between tokens, allowing the model to focus on important context.

**Question 13. In tokenization, which of the following best describes a “subword token”?**

A) A full sentence split by punctuation.

B) A single character.

C) A fragment of a word, such as “##ing”.

D) An entire paragraph treated as one token.

Answer: C

Explanation: Subword tokenization (e.g., WordPiece, BPE) breaks words into meaningful pieces to handle unknown words.

**Question 14. Increasing the temperature parameter during text generation will:**

A) Make the output more deterministic.

B) Reduce the number of tokens generated.

C) Increase randomness and creativity.

D) Force the model to use only the top‑1 token.

Answer: C

Explanation: Higher temperature softens the probability distribution, allowing less likely tokens to be selected.

**Question 15. Which inference parameter limits the cumulative probability mass of considered tokens?**

A) Top‑K

B) Top‑P (nucleus sampling)

C) Temperature

D) Max Tokens

Answer: B

Explanation: Top‑P selects the smallest set of tokens whose probabilities sum to a given threshold (e.g., 0.9).

**Question 16. When adapting a foundation model to a specific domain, the most efficient approach is often:**

A) Training a new model from scratch.

B) Fine‑tuning the pre‑trained model on domain data.

Answer: B

Explanation: RAG fetches relevant documents and injects them into the prompt, grounding the output in factual data.

**Question 20. In a typical RAG pipeline, which component converts raw text documents into numerical vectors for similarity search?**

A) Tokenizer

B) Embedding Model

C) Decoder

D) Optimizer

Answer: B

Explanation: Embedding models map text to dense vectors that can be indexed and compared for retrieval.

**Question 21. Amazon Bedrock Knowledge Bases are used to:**

A) Host large‑scale training jobs.

B) Store vector embeddings for RAG workflows.

C) Deploy containerized microservices.

D) Perform real‑time video transcoding.

Answer: B

Explanation: Bedrock Knowledge Bases manage document ingestion, embedding, and retrieval for RAG use cases.

**Question 22. Which AWS service provides built‑in content‑filtering guardrails for foundation models?**

A) Amazon SageMaker Clarify

B) Amazon Bedrock Guardrails

C) AWS WAF

D) Amazon Macie

Answer: B

Explanation: Bedrock Guardrails let you define policies for profanity, PII, and topic restrictions.

**Question 23. The primary purpose of Amazon SageMaker Clarify is to:**

A) Accelerate model training on GPU clusters.

B) Detect bias and provide explanations for model predictions.

C) Store large video datasets.

D) Manage container registries.

Answer: B

Explanation: Clarify offers bias detection and explainability tools for both training and inference.

**Question 24. Which of the following is a common method for protecting AI workloads at the network layer?**

A) Enabling IAM role trust relationships.

B) Deploying the workload inside a VPC with private subnets.

C) Using Amazon S3 versioning.

D) Encrypting model checkpoints with KMS.

Answer: B

Explanation: Placing resources in a VPC isolates them from the public internet and enables security controls.

**Question 25. Under the AWS Shared Responsibility Model, who is responsible for encrypting training data before it is uploaded to SageMaker?**

A) AWS

B) The customer

C) Both AWS and the customer equally

D) Third‑party encryption vendors only

Answer: B

**Question 29. In a transformer encoder‑decoder architecture, which part processes the input sequence to create contextual representations?**

A) Decoder

B) Encoder

C) Tokenizer

D) Positional Encoding Layer

Answer: B

Explanation: The encoder reads the source tokens and generates hidden states used by the decoder.

**Question 30. Which of the following best describes “parameter efficiency” in large language models?**

A) Using fewer training epochs.

B) Achieving comparable performance with fewer parameters via techniques like distillation.

C) Reducing the number of input features.

D) Storing parameters in compressed files.

Answer: B

Explanation: Parameter efficiency aims to maintain capability while lowering model size, often via distillation or pruning.

**Question 31. Which of these prompt engineering patterns helps the model reason step‑by‑step before answering?**

A) Zero‑shot prompting

B) Chain‑of‑Thought prompting

C) Temperature scaling

D) Top‑K sampling

Answer: B

Explanation: Chain‑of‑Thought encourages the model to generate intermediate reasoning steps.
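
The contextual representations produced by the encoder in Question 29 come from attention over the input tokens. The following is a rough sketch of scaled dot-product attention for a single query, assuming toy two-dimensional vectors; real transformers add learned projection matrices, multiple heads, and positional information, none of which is shown here. All names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    # Relevance score of every token (key) with respect to the query.
    weights = softmax([dot(query, k) / scale for k in keys])
    # Output is the relevance-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output


# Hypothetical toy inputs: three tokens with 2-D keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

Tokens whose keys align with the query (the first and third here) receive larger weights than the second, so they contribute more to the output vector.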

**Question 32. In the context of AI ethics, “explainability” primarily refers to:**

A) The ability to encrypt model weights.

B) Providing human‑readable reasons for a model’s decision.

C) Reducing model size for faster inference.

D) Ensuring the model runs on low‑power devices.

Answer: B

Explanation: Explainability offers transparency by showing why a specific output was produced.

**Question 33. Which type of attack attempts to manipulate a model’s output by inserting malicious instructions into the prompt?**

A) Data poisoning

B) Model inversion

C) Prompt injection

D) Side‑channel attack

Answer: C

Explanation: Prompt injection (or jailbreak) exploits the model’s interpretation of crafted inputs to bypass safeguards.

**Question 34. Watermarking AI‑generated content is primarily used for:**

A) Improving model accuracy.

B) Identifying that the content originated from an AI system.

C) Reducing token usage.

D) Enhancing token embedding quality.

Answer: B

Explanation: Watermarks embed detectable patterns so consumers can verify AI origin.

**Question 35. Which AWS service provides a fully managed speech‑to‑text capability?**

A) Amazon Polly

C) Using reinforcement learning to improve performance.

D) Pre‑training on billions of tokens.

Answer: B

Explanation: Few‑shot leverages a small number of examples supplied in the prompt to guide the model.

**Question 39. In the AWS Well‑Architected Framework, which pillar addresses the reliability of AI services?**

A) Security

B) Performance Efficiency

C) Reliability

D) Operational Excellence

Answer: C

Explanation: The Reliability pillar ensures that services can recover from failures and meet availability requirements.

**Question 40. Which of the following is a common technique to mitigate model bias during training?**

A) Increasing learning rate.

B) Using balanced class weights.

C) Disabling dropout.

D) Reducing batch size.

Answer: B

Explanation: Assigning class weights compensates for imbalanced representation, reducing biased predictions.

**Question 41. The process of converting a PDF document into vector embeddings for RAG is called:**

A) Tokenization

B) Chunking

C) Quantization

D) Pruning

Answer: B

Explanation: Chunking splits large documents into manageable pieces before embedding.

**Question 42. Which AWS service can automatically tag and redact personally identifiable information (PII) in model outputs?**

A) Amazon Macie

B) Amazon Bedrock Guardrails

C) AWS Config

D) Amazon GuardDuty

Answer: B

Explanation: Bedrock Guardrails allow custom policies to detect and redact PII in generated text.

**Question 43. What does “top‑K” sampling control during text generation?**

A) The maximum length of the output.

B) The number of most probable tokens considered at each step.

C) The temperature of the softmax distribution.

D) The proportion of the vocabulary used.

Answer: B

Explanation: Top‑K limits token selection to the K highest‑probability candidates.

**Question 44. Which of the following is a key advantage of using Amazon SageMaker JumpStart?**

A) It provides a marketplace for third‑party datasets.

B) It offers ready‑to‑deploy pre‑trained models with one‑click integration.

C) It automatically writes unit tests for your code.

D) It replaces the need for IAM policies.

Answer: B

Explanation: JumpStart streamlines model deployment by offering pre‑trained open‑source models.
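
The top‑K setting from Question 43, together with the temperature and top‑P parameters from Questions 14 and 15, can be compared on a toy distribution. The sketch below is an illustration only: the logits are made-up numbers, the function names are hypothetical, and it does not reflect how any particular model provider implements these filters.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize(probs, keep):
    """Zero out tokens not in `keep` and rescale the rest to sum to 1."""
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return renormalize(probs, order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return renormalize(probs, keep)


# Hypothetical logits for a four-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
cold = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=2.0)
after_top_k = top_k_filter(cold, k=2)
after_top_p = top_p_filter(cold, p=0.9)

print(cold)
print(hot)
print(after_top_k)
print(after_top_p)
```

On these logits, raising the temperature lowers the probability of the most likely token, top‑K with K=2 leaves two candidates, and top‑P with a 0.9 threshold leaves three.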

B) To ensure cosine similarity is equivalent to dot product.

C) To encrypt the vectors.

D) To increase token length.

Answer: B

Explanation: Normalizing vectors to unit length makes cosine similarity computation equal to dot product, simplifying retrieval.

**Question 49. Which of the following is NOT a typical use case for Amazon Rekognition?**

A) Face detection and analysis.

B) Object and scene detection in images.

C) Sentiment analysis of text.

D) Video frame analysis.

Answer: C

Explanation: Sentiment analysis is an NLP task handled by services like Amazon Comprehend, not Rekognition.

**Question 50. When deploying a model with Amazon SageMaker, which feature enables automatic scaling of inference instances based on request volume?**

A) SageMaker Pipelines

B) Multi‑Model Endpoint

C) Auto‑Scaling with Application Auto Scaling

D) SageMaker Debugger

Answer: C

Explanation: Application Auto Scaling can adjust the number of endpoint instances dynamically.

**Question 51. Which of the following best captures the concept of “parameter sharing” in transformer models?**

A) Using the same weight matrix across multiple layers.

B) Storing parameters in a shared S3 bucket.

C) Sharing GPU memory among models.

D) Duplicating the same parameters for each attention head.

Answer: A

Explanation: Some transformer variants (e.g., ALBERT) reuse the same weight matrices across layers to reduce total parameters.

**Question 52. In the context of AI governance, “data residency” refers to:**

A) The physical location where training data is stored.

B) The latency of data transfer across regions.

C) The version control of datasets.

D) The encryption algorithm used for data at rest.

Answer: A

Explanation: Data residency concerns compliance with regulations that dictate where data may be stored geographically.

**Question 53. Which of the following is a recommended practice for preventing prompt injection attacks?**

A) Increasing model temperature.

B) Sanitizing user inputs before concatenating them with system prompts.

C) Disabling IAM authentication.

D) Using larger batch sizes during inference.

Answer: B

Explanation: Input sanitization removes malicious instructions that could subvert model behavior.

**Question 54. What does “few‑shot” prompting with “chain‑of‑thought” combine?**

A) Providing no examples and random token sampling.

B) Supplying a few examples and asking the model to reason step‑by‑step.

C) Using top‑K sampling with high temperature.

D) Training a new model on a small dataset.

**Question 58. Which of the following is a common technique to reduce the size of a large language model for edge deployment?**

A) Increasing batch size during inference.

B) Model quantization (e.g., 8‑bit).

C) Adding more attention heads.

D) Using a higher learning rate.

Answer: B

Explanation: Quantization reduces precision of weights, shrinking model size and improving latency on constrained devices.

**Question 59. When using Amazon Bedrock, the “model ID” parameter is used to:**

A) Identify the AWS region.

B) Select a specific foundation model (e.g., Claude, Titan).

C) Set the IAM role for the request.

D) Define the output format.

Answer: B

Explanation: Model ID tells Bedrock which pre‑trained model to invoke for generation.

**Question 60. Which of the following best describes “knowledge distillation” in the context of LLMs?**

A) Converting a model to a different programming language.

B) Training a smaller “student” model to mimic the outputs of a larger “teacher” model.

C) Removing attention layers to simplify architecture.

D) Encrypting model weights for secure storage.

Answer: B

Explanation: Distillation transfers knowledge from a large model to a compact one, preserving performance.

**Question 61. In the context of AI model bias, “label bias” occurs when:**

A) The model’s architecture is too deep.

B) Training labels are systematically skewed or inaccurate.

C) Input features are normalized incorrectly.

D) The optimizer converges too quickly.

Answer: B

Explanation: Biased or noisy labels propagate bias into the learned model.

**Question 62. Which AWS service can be used to automatically redact PII from text generated by a Bedrock model before returning it to the user?**

A) Amazon Macie

B) Amazon Bedrock Guardrails

C) AWS Secrets Manager

D) Amazon Inspector

Answer: B

Explanation: Guardrails let you define policies that filter or redact sensitive information in model outputs.

**Question 63. What is the primary benefit of using “batch transform” in SageMaker for inference?**

A) Real‑time low‑latency responses.

B) Processing large datasets offline with automatic scaling.

C) Storing model artifacts in S3.

D) Enabling multi‑region deployment.

Answer: B

Explanation: Batch Transform runs inference on a whole dataset asynchronously, ideal for bulk processing.

**Question 64. Which of the following best defines “adversarial robustness” for an AI model?**

A) Ability to run on low‑power devices.

B) Resistance to intentionally crafted inputs that aim to cause mis‑prediction.
