Certified Generative AI Engineer Associate Exam (Databricks) | Exams Nursing

Certified Generative AI Engineer Associate Exam

(Databricks)

2026-2028 Latest Version: 6.0 questions with

verified answers and rationales | instant pdf

download .

Question: 1

A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.

Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)

A. Change embedding models and compare performance.
B. Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
D. Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
E. Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.

Answer: C, E
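The metric-driven comparison in option C can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the chunk ids, the two strategy names, and the relevance labels in `runs` are hypothetical, and relevance is treated as binary.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant chunk ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance NDCG: rewards relevant chunks that are ranked near the top."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, chunk_id in enumerate(retrieved[:k])
              if chunk_id in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# Hypothetical benchmark: for each chunking strategy, the ranked chunk ids
# returned for two known questions, paired with the ids labeled as relevant.
runs = {
    "by_paragraph": [(["p3", "p9", "p1"], {"p3", "p1"}), (["p7", "p2", "p4"], {"p7"})],
    "by_chapter":   [(["c2", "c5", "c1"], {"c1"}),       (["c4", "c3", "c6"], {"c6"})],
}

def mean_metric(metric, queries, k=3):
    """Average a retrieval metric over all benchmark queries."""
    return sum(metric(retrieved, relevant, k) for retrieved, relevant in queries) / len(queries)

scores = {name: mean_metric(ndcg_at_k, queries) for name, queries in runs.items()}
best = max(scores, key=scores.get)  # strategy with the highest mean NDCG@3
```

In this toy data both strategies reach the same recall@3, so NDCG is what separates them: it penalizes the chapter-level chunks for surfacing the relevant passage only in third position.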


Explanation: To optimize a chunking strategy for a Retrieval-Augmented Generation (RAG) application, the Generative AI Engineer needs a structured approach to evaluating the chunking strategy, ensuring that the chosen configuration retrieves the most relevant information and leads to accurate and coherent LLM responses. Here's why C and E are the correct strategies:

Strategy C: Evaluation Metrics (Recall, NDCG)

Define an evaluation metric: Common evaluation metrics such as recall, precision, or NDCG (Normalized Discounted Cumulative Gain) measure how well the retrieved chunks match the user's query and the expected response. Recall measures the proportion of relevant information retrieved. NDCG is often used when you want to account for both the relevance of retrieved chunks and the ranking or order in which they are retrieved.

Experiment with chunking strategies: Adjusting chunking strategies based on text structure (e.g., splitting by paragraph, chapter, or a fixed number of tokens) allows the engineer to experiment with various ways of slicing the text. Some chunks may better align with the user's query than others.

Evaluate performance: By using recall or NDCG, the engineer can methodically test various chunking strategies to identify which one yields the highest performance. This ensures that the chunking method provides the most relevant information when embedding and retrieving data from the vector store.

Strategy E: LLM-as-a-Judge Metric

Use the LLM as an evaluator: After retrieving chunks, the LLM can be used to evaluate the quality of answers based on the chunks provided. This could be framed as a "judge" function, where the LLM rates how well a given chunk answers previous user queries.

Optimize based on the LLM's judgment: By having the LLM assess previous answers and rate their relevance and accuracy, the engineer can collect feedback on how well different chunking configurations perform in real-world scenarios. This metric could be a qualitative judgment on how closely the retrieved information matches the user's intent.

Tune chunking parameters: Based on the LLM's judgment, the engineer can adjust the chunk size or structure to better align with the LLM's responses, optimizing retrieval for future queries.

By combining these two approaches, the engineer ensures that the chunking strategy is systematically evaluated using both quantitative

based on user queries.

User submits queries against an LLM: Users interact with the application by submitting their queries. These queries will be passed to the LLM.

LLM retrieves relevant documents: The LLM works with the vector store to retrieve the most relevant documents based on their vector representations.

LLM generates a response: Using the retrieved documents, the LLM generates a response that is tailored to the user's question.

Evaluate model: After generating responses, the system must be evaluated to ensure the retrieved documents are relevant and the generated response is accurate. Metrics such as accuracy, relevance, and user satisfaction can be used for evaluation.

Deploy it using Model Serving: Once the RAG pipeline is ready and evaluated, it is deployed using a model-serving platform such as Databricks Model Serving. This enables real-time inference and response generation for users.

By following these steps, the Generative AI Engineer ensures that the RAG application is both efficient and effective for the task of answering technical regulation questions.

Question: 3

A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries. Which metric should they monitor for their customer service LLM application in production?

A. Number of customer inquiries processed per unit of time
B. Energy usage per query
C. Final perplexity scores for the training of the model
D. HuggingFace Leaderboard values for the base LLM

Answer: A

Explanation: When deploying an LLM application for customer service inquiries, the primary focus is on measuring the operational efficiency and quality of the responses. Here's why A is the correct metric:

Number of customer inquiries processed per unit of time: This metric tracks the throughput of the customer service system, reflecting how many customer inquiries the LLM application can handle in a given time period (e.g., per minute or hour). High throughput is crucial in customer service applications where quick response times are essential to user satisfaction and business efficiency.

Real-time performance monitoring: Monitoring the number of queries processed is an important part of ensuring that the model is performing well under load, especially during peak traffic times. It also helps ensure the system scales properly to meet demand.

Why other options are not ideal:

B. Energy usage per query: While energy efficiency is a consideration, it is not the primary concern for a customer-facing application where user experience (i.e., fast and accurate responses) is critical.
C. Final perplexity scores for the training of the model: Perplexity is a metric for model training, but it doesn't reflect the real-time operational performance of an LLM in production.
D. HuggingFace Leaderboard values for the base LLM: The HuggingFace Leaderboard is more relevant during model selection and benchmarking. It is not a direct measure of the model's performance in a specific customer service application in production.

Focusing on throughput (inquiries processed per unit time) ensures that the LLM application is meeting business needs for fast and efficient customer service responses.

Question: 4

A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text. How should the Generative AI Engineer architect their system?

A. Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.
B. Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.

Iterating through each member’s profile individually could be computationally expensive in large teams. It also lacks the mention of using a vector store or an efficient retrieval mechanism.

Option D is the correct approach. Here’s why:

Embedding team profiles into a vector store: Using a vector store allows for efficient similarity searches on unstructured data. Embedding the team member profiles into vectors captures their semantics in a way that is far more flexible than keyword-based matching.

Using project scope for retrieval: Instead of matching keywords, this approach uses vector embeddings and similarity search (e.g., cosine similarity) to find the team members whose profiles most closely align with the project scope.

Filtering based on availability: Once the best-matched candidates are retrieved based on profile similarity, filtering them by availability ensures that the system provides a practically useful result.

This method efficiently handles large-scale datasets by leveraging vector embeddings and similarity search techniques, both of which are fundamental tools in Generative AI engineering for handling unstructured text.

Technical References:

Vector embeddings: In this approach, the unstructured text (employee profiles and project scopes) is converted into high-dimensional vectors using pretrained models (e.g., BERT, Sentence-BERT, or custom embeddings). These embeddings capture the semantic meaning of the text, making it easier to perform similarity-based retrieval.

Vector stores: Solutions like FAISS or Milvus allow storing and retrieving large numbers of vector embeddings quickly. This is critical when working with large teams where querying through individual profiles sequentially would be inefficient.

LLM Integration: Large language models can assist in generating embeddings for both employee profiles and project scopes. They can also assist in fine-tuning similarity measures, ensuring that the retrieval system captures the nuances of the text data.

Filtering: After retrieving the most similar profiles based on the project scope, filtering based on availability ensures that only team members who are free for the project are considered.

This system is scalable, efficient, and makes use of the latest techniques in Generative AI, such as vector embeddings and semantic search.

Question: 5

A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles. Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?

A. DatabricksIQ
B. Foundation Model APIs
C. Feature Serving
D. AutoML

Answer: C

Explanation:

Problem Context: The engineer is developing an LLM-powered live sports commentary platform that needs to provide real-time updates and analyses based on the latest game scores. The critical requirement here is the capability to access and integrate real-time data efficiently with the platform for immediate analysis and reporting.

Explanation of Options:

Option A: DatabricksIQ: While DatabricksIQ offers integration and data processing capabilities, it is more aligned with data analytics than with real-time feature serving, which is crucial for the immediate updates a live sports commentary context requires.

Option B: Foundation Model APIs: These APIs facilitate interactions with pre-trained models and could be part of the solution, but on their own they do not provide mechanisms to access real-time game scores.

Option C: Feature Serving: This is the correct answer, as feature serving specifically refers to the real-time provision of data (features) to models for prediction. This would be essential for an LLM that generates analyses based on live game data, ensuring that the commentary is current and based on the latest events in the sport.

Option D: AutoML: This tool automates the process of applying machine learning models to real-world problems, but it does not directly provide real-time data access, which is a critical requirement for the platform.

Thus, Option C (Feature Serving) is the most suitable tool for the platform, as it directly supports the real-time data needs of an LLM-powered sports commentary system.
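The pattern behind the Feature Serving answer (look up the freshest feature values at request time and place them in the prompt) can be sketched as below. This is only an illustration of the idea: the in-memory dictionary `online_features`, the functions `publish_score` and `build_prompt`, and the team names are hypothetical stand-ins, not the Databricks Feature Serving API.

```python
import time

# Hypothetical stand-in for an online feature endpoint: game_id -> latest features.
online_features = {}

def publish_score(game_id, home, away, home_score, away_score):
    """Called by the ingestion side whenever the score changes; overwrites the old value."""
    online_features[game_id] = {
        "home": home,
        "away": away,
        "home_score": home_score,
        "away_score": away_score,
        "updated_at": time.time(),
    }

def build_prompt(game_id):
    """Fetch the current features at request time and embed them in the LLM prompt."""
    f = online_features[game_id]
    return (
        "You are a live sports commentator. Use only the score below.\n"
        f"{f['home']} {f['home_score']} - {f['away_score']} {f['away']}\n"
        "Write a two-sentence analysis of the current state of the game."
    )

publish_score("game-1", "Lions", "Hawks", 1, 0)
publish_score("game-1", "Lions", "Hawks", 1, 1)  # a later update replaces the earlier one
prompt = build_prompt("game-1")
```

Because the lookup happens per request, the prompt always reflects the most recently published score rather than whatever was known when the model or index was built.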

An engineer is embedding customer support chat logs into a vector store. They want to ensure that queries about new products retrieve relevant past support tickets. Which strategy would most improve retrieval for unseen queries?

A. Use a general-purpose embedding model trained on web text.
B. Fine-tune the embedding model on company-specific chat logs.
C. Increase chunk size to include multiple tickets in one embedding.
D. Randomly shuffle tickets before embedding.

Answer: B

Explanation: Fine-tuning embeddings on domain-specific data improves semantic understanding and retrieval relevance for queries about new products. General-purpose embeddings may not capture domain-specific terminology.

Question 9

A Generative AI Engineer wants the LLM to summarize multiple retrieved documents for a user query. Some retrieved chunks are redundant or slightly overlapping. What is the best strategy?

A. Pass all chunks directly to the LLM without preprocessing.
B. Deduplicate overlapping chunks before sending them to the LLM.
C. Concatenate all chunks regardless of length.
D. Only use the first chunk retrieved.

Answer: B

Explanation: Deduplicating overlapping chunks reduces redundancy and improves the quality of the LLM’s generated summary. Passing overlapping information can lead to repetitive or bloated responses.

Question 10

An engineer is designing a RAG application with multiple source types: PDFs, HTML, and Word documents. What is the most effective approach for ingestion?

A. Convert all documents to plain text without preserving metadata.
B. Extract text along with metadata (page numbers, headings) and store in the vector store.
C. Store raw files in the vector store without conversion.
D. Only index PDFs as they are the most common source.

Answer: B

Explanation: Preserving metadata allows more precise retrieval and context (e.g., knowing which section a chunk comes from). Plain text without metadata loses important structural information.

Question 11

A Generative AI Engineer wants to evaluate different chunking strategies using an LLM-as-a-judge approach. Which workflow correctly implements this?

A. Feed chunks to the LLM and ask it to judge relevance against a known query and answer.
B. Only measure token length of chunks.
C. Compare chunks by file size only.
D. Randomly select chunks and measure retrieval time.

Answer: A

Explanation: Using the LLM as a judge evaluates chunk relevance qualitatively, complementing quantitative metrics like recall or NDCG.

Question 12

An engineer is deploying a RAG application in production. They notice query latency increases as the vector store grows. Which technique will improve retrieval speed?

A. Use approximate nearest neighbor (ANN) search in the vector store.
B. Increase chunk size to reduce the number of vectors.
C. Switch to storing raw text instead of embeddings.
D. Reduce the number of embeddings per document randomly.

Answer: A

Explanation: ANN search (as implemented in tools such as FAISS or Milvus) enables fast similarity search on large vector stores while largely maintaining retrieval quality.

Question 13

A Generative AI Engineer is integrating a knowledge base of product manuals into a RAG system. They want to ensure the LLM only retrieves relevant sections for a query. What strategy should they implement?

An engineer wants an LLM to summarize social media posts in real-time. Which approach ensures minimal latency?

A. Precompute embeddings and use feature serving to provide real-time data to the LLM.
B. Fetch raw posts directly and pass to LLM without embeddings.
C. Re-embed all posts at query time.
D. Only summarize posts once a day.

Answer: A

Explanation: Precomputing embeddings and using feature serving allows the LLM to quickly access relevant vectors for real-time summarization.

Question 17

A RAG system retrieves multiple chunks with partially conflicting information. What is the best strategy for the LLM?

A. Use a summarization prompt asking the LLM to reconcile conflicting information.
B. Pick the first chunk only.
C. Concatenate all chunks and output without reconciliation.
D. Remove all chunks and respond with “unknown.”

Answer: A

Explanation: LLM summarization with reasoning prompts helps reconcile conflicting information and produces coherent responses.

Question 18

An engineer wants to reduce memory usage in a vector store without losing retrieval quality. Which approach is most appropriate?

A. Reduce embedding dimensionality using PCA or quantization.
B. Increase chunk size to include more tokens per embedding.
C. Remove metadata from embeddings.
D. Store only a subset of documents randomly.

Answer: A

Explanation: Dimensionality reduction or vector quantization reduces storage while maintaining similarity relationships for retrieval.

Question 19

A Generative AI Engineer wants to ensure that the RAG system does not retrieve outdated information. What approach is most suitable?

A. Include timestamps in chunk metadata and filter by recency during retrieval.
B. Randomly shuffle chunks before retrieval.
C. Only index documents once and never update.
D. Remove all metadata and rely on LLM reasoning.

Answer: A

Explanation: Filtering by timestamps ensures the system retrieves the most recent and relevant information, preventing outdated responses.

Question 20

An engineer wants to evaluate the effectiveness of a new embedding model on retrieval tasks. Which workflow is most systematic?

A. Define a benchmark set of queries and expected answers, compute recall/NDCG, and compare models.
B. Pick a model randomly and hope it performs well.
C. Only measure token usage.
D. Compare training loss on embedding model.

Answer: A

Explanation: Using benchmark queries and quantitative retrieval metrics ensures systematic evaluation of embedding models.

Question 26

A Generative AI Engineer is designing a RAG system for medical research papers. The engineer wants to ensure the LLM answers accurately and avoids hallucinations. Which strategy will most improve factual accuracy?

A. Provide retrieved chunks as context to the LLM and include a prompt asking it to only use the provided information.
B. Allow the LLM to answer freely without context.
C. Randomly select chunks and concatenate them.
D. Only use metadata without full text chunks.

Answer: A

Explanation: Providing context with a grounding prompt ensures the LLM generates answers based on verified information, reducing hallucinations.

Question 27

An engineer notices that retrieval for some queries returns too many low-relevance chunks.

An engineer wants to ensure that the RAG system handles long queries effectively. What is the best approach?

A. Truncate queries to 50 tokens.
B. Embed the entire query and split it into semantic sub-queries for retrieval.
C. Ignore query length.
D. Only use keywords from the query.

Answer: B

Explanation: Breaking long queries into semantic sub-queries allows more precise retrieval without losing context, improving LLM response quality.

Question 31

A company wants to build a RAG system that answers customer questions from product documentation and knowledge articles. Which approach ensures the system stays up-to-date with new content?

A. Implement incremental indexing: add new documents and update embeddings regularly.
B. Only index content once.
C. Randomly shuffle old documents.
D. Replace all embeddings weekly without tracking new content.

Answer: A

Explanation: Incremental indexing ensures new content is included promptly, keeping the RAG system relevant and accurate.

Question 32

An engineer wants to evaluate user satisfaction for a customer support RAG system. Which metric is most suitable?

A. User feedback score (e.g., thumbs up/down or rating).
B. Number of embeddings.
C. Retrieval latency only.
D. Training loss of the base LLM.

Answer: A

Explanation: Direct user feedback measures real-world effectiveness and satisfaction, complementing quantitative retrieval metrics.
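The incremental indexing from Question 31 can be sketched as follows. It is a minimal illustration under stated assumptions: `toy_embed` is a placeholder for a real embedding model, `IncrementalIndex` is a hypothetical class rather than any vector store's API, and change detection by content hash is one possible design among several.

```python
import hashlib

def toy_embed(text):
    """Placeholder embedding (assumption): swap in a real embedding model."""
    return [len(text), sum(map(ord, text)) % 997]

class IncrementalIndex:
    """Hypothetical index that re-embeds only new or changed documents."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.vectors = {}  # doc_id -> embedding
        self.hashes = {}   # doc_id -> content hash recorded at last indexing

    def sync(self, documents):
        """Embed documents whose content hash is new or different.

        Returns the list of doc_ids that were (re-)embedded in this pass.
        """
        updated = []
        for doc_id, text in documents.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if self.hashes.get(doc_id) != digest:
                self.vectors[doc_id] = self.embed(text)
                self.hashes[doc_id] = digest
                updated.append(doc_id)
        return updated

index = IncrementalIndex()
first = index.sync({"faq": "How to reset a password", "guide": "Setup guide v1"})
second = index.sync({
    "faq": "How to reset a password",   # unchanged, so it is skipped
    "guide": "Setup guide v2",          # changed, so it is re-embedded
    "new": "Release notes",             # new, so it is embedded
})
```

Running `sync` on a schedule keeps the index current while avoiding the cost of recomputing embeddings for documents that have not changed.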

Question 33

A RAG application retrieves multiple documents that contain numeric data (e.g., statistics). What is the best approach for ensuring accurate numeric answers?

A. Include retrieved numeric data as context and instruct the LLM to calculate or reference only provided numbers.
B. Allow LLM to hallucinate numbers.
C. Exclude numeric data from chunks.
D. Only return textual summaries without numbers.

Answer: A

Explanation: Grounding the LLM on provided numeric data ensures accurate answers instead of fabricated values.

Question 34

An engineer notices retrieval for some queries returns outdated results even though new documents exist. Which design change will prevent this?

A. Add timestamp metadata to chunks and filter by recency during retrieval.
B. Only index old documents.
C. Remove metadata entirely.
D. Randomly select chunks.

Answer: A

Explanation: Filtering by timestamps ensures the system retrieves the most recent information.

Question 35

A RAG system is deployed for real-time sports commentary. Which architecture ensures minimal latency while updating the LLM with live scores?

A. Use feature serving to provide real-time embeddings and features to the LLM.
B. Precompute embeddings daily.
C. Only update scores once per game.
D. Pass raw feeds directly without preprocessing.

Answer: A

Explanation: Feature serving allows fast, real-time delivery of structured data to the LLM for low-latency updates.
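The timestamp filtering from Question 34 can be sketched as below: chunks older than a cutoff are dropped first, and the remainder is ranked by cosine similarity. The chunk records, the two-dimensional vectors, and the `retrieve_recent` function are hypothetical; a production vector store would normally apply such a metadata filter inside the query itself.

```python
import math
from datetime import datetime, timedelta

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_recent(query_vec, chunks, now, max_age_days, k=2):
    """Drop chunks older than max_age_days, then rank the rest by similarity."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [c for c in chunks if c["updated_at"] >= cutoff]
    fresh.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return fresh[:k]

now = datetime(2025, 1, 31)
chunks = [
    {"id": "old-policy", "vector": [1.0, 0.0], "updated_at": datetime(2023, 5, 1)},
    {"id": "new-policy", "vector": [0.9, 0.1], "updated_at": datetime(2025, 1, 20)},
    {"id": "unrelated",  "vector": [0.0, 1.0], "updated_at": datetime(2025, 1, 25)},
]
results = retrieve_recent([1.0, 0.0], chunks, now, max_age_days=90)
```

Here `old-policy` is the closest match by similarity alone, but the recency filter excludes it, so the newer `new-policy` chunk is ranked first.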

A RAG system retrieves chunks from multiple document sources. Some documents contain outdated procedures. How should the engineer prevent outdated information from being used?

A. Include validity or effective dates in metadata and filter accordingly.
B. Ignore dates entirely.
C. Only retrieve from PDFs.
D. Shuffle chunks randomly.

Answer: A

Explanation: Filtering by effective dates ensures only relevant and current information is used for responses.

Question 40

A Generative AI Engineer wants to measure end-to-end system efficiency of their RAG application. Which metric is most appropriate?

A. Query latency (time from user query to LLM response)
B. Model training loss
C. Number of embeddings
D. Token count per chunk

Answer: A

Explanation: Query latency reflects the operational efficiency of the system and user experience.

Question 41

An engineer wants to evaluate retrieval strategies for a RAG system without relying on the LLM. Which method is suitable?

A. Use quantitative metrics like recall, precision, or NDCG on a labeled dataset of queries and expected chunks.
B. Ask the LLM to rate its own retrieval.
C. Measure only chunk size.
D. Randomly select queries.

Answer: A

Explanation: Quantitative metrics provide objective evaluation of retrieval effectiveness without LLM bias.

Question 42

A RAG system retrieves hundreds of chunks per query, but the LLM has a token limit. Which approach resolves this problem?

A. Rank chunks by relevance and include only top-k in the prompt.
B. Concatenate all chunks regardless of size.
C. Randomly select chunks.
D. Split the query into multiple prompts.

Answer: A

Explanation: Selecting top-k relevant chunks ensures the LLM receives the most useful context without exceeding token limits.

Question 43

An engineer wants to ensure that a customer support RAG system is scalable to millions of queries per day. Which solution is most appropriate?

A. Use distributed vector stores and parallel retrieval.
B. Embed fewer documents to reduce retrieval load.
C. Use keyword search instead of embeddings.
D. Only process queries sequentially.

Answer: A

Explanation: Distributed vector stores and parallel retrieval allow scaling to large query volumes efficiently.

Question 44

A RAG system retrieves documents from multiple sources with slightly different formats. What is the best preprocessing strategy?

A. Normalize text and extract structured metadata to standardize chunks.
B. Use raw text as-is without preprocessing.
C. Remove metadata entirely.
D. Only process one source type.

Answer: A

Explanation: Normalization and structured metadata improve retrieval quality and allow the LLM to reason effectively across sources.
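The top-k selection from Question 42 can be sketched as below. The scores, the example chunks, and the `select_context` function are hypothetical, and counting whitespace-separated words is only a crude stand-in for a real tokenizer.

```python
def select_context(scored_chunks, k, token_budget,
                   count_tokens=lambda text: len(text.split())):
    """Take chunks in descending score order until k are chosen,
    skipping any chunk that would push the total over token_budget."""
    selected, used = [], 0
    ranked = sorted(scored_chunks, key=lambda pair: pair[0], reverse=True)
    for score, text in ranked:
        if len(selected) == k:
            break
        cost = count_tokens(text)  # word count as a rough token estimate (assumption)
        if used + cost > token_budget:
            continue
        selected.append(text)
        used += cost
    return selected

# Hypothetical (relevance score, chunk text) pairs returned by the retriever.
candidates = [
    (0.91, "Reset the router by holding the power button for ten seconds."),
    (0.40, "Our offices are closed on public holidays."),
    (0.87, "If the light stays red after a reset, contact support."),
    (0.85, "Firmware updates are installed automatically overnight."),
]
context = select_context(candidates, k=2, token_budget=25)
```

Capping both the number of chunks and the total size keeps the prompt within the model's limit while still favoring the most relevant material.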
