PrepIQ Informatica Data Engineering 10.2 Developer Professional Ultimate Exam

Focuses on validating skills in Informatica Data Engineering Integration using DEI/BDM 10.2. Topics include big data integration, mass ingestion, transformations, mappings, workflows, Hadoop ecosystem integration, Spark engine optimization, pushdown execution, and real-time streaming pipelines. The practice exam simulates real-world processing tasks such as ingesting large datasets, applying transformations efficiently, debugging jobs, monitoring performance, and tuning pipelines in distributed environments.

Typology: Exams

2025/2026

Available from 04/28/2026

shilpi-jain-3

**Question 1. Which component of the Hadoop ecosystem is primarily
responsible for storing large data sets across a cluster?**
A) MapReduce
B) YARN
C) HDFS
D) Hive
Answer: C
Explanation: HDFS (Hadoop Distributed File System) provides scalable,
fault-tolerant storage by distributing data blocks across multiple nodes.
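To make the block distribution in Question 1 concrete, here is a small arithmetic sketch. It assumes the Hadoop 2.x default block size of 128 MB and the default replication factor of 3. Both are configurable through `dfs.blocksize` and `dfs.replication`. The function name `hdfs_footprint` is hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3      # assumed default dfs.replication


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (logical blocks, block replicas, raw MB stored across the cluster)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies as much disk as the data it holds,
    # so raw usage is file size times replication, not blocks times block size.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(1000))  # → (8, 24, 3000)
```

A 1000 MB file therefore becomes 8 blocks. Each block is stored on 3 different DataNodes, so losing one node does not lose data.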
**Question 2. In the context of YARN, what does the ResourceManager do?**
A) Executes map and reduce tasks on nodes
B) Schedules containers and arbitrates resources across the cluster
C) Stores metadata about HDFS files
D) Provides a SQL-like query interface
Answer: B
Explanation: The YARN ResourceManager is the central authority that allocates
resources (CPU, memory) to applications by managing containers.
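Question 2 says the ResourceManager arbitrates CPU and memory by handing out containers. A toy first-fit allocator can picture that idea. This is only an illustration under stated assumptions. The `Node` class and `allocate` function are hypothetical. Real YARN placement is done by pluggable schedulers, such as the Capacity Scheduler or Fair Scheduler, driven by NodeManager heartbeats, and this sketch does not model them.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one NodeManager's free capacity."""
    name: str
    free_vcores: int
    free_memory_mb: int
    containers: list = field(default_factory=list)


def allocate(nodes, app_id, vcores, memory_mb):
    """Place one container on the first node with enough free CPU and memory.

    Returns the node name, or None if no node can satisfy the request
    (a real ResourceManager would keep the request pending instead).
    """
    for node in nodes:
        if node.free_vcores >= vcores and node.free_memory_mb >= memory_mb:
            node.free_vcores -= vcores
            node.free_memory_mb -= memory_mb
            node.containers.append(app_id)
            return node.name
    return None


if __name__ == "__main__":
    cluster = [Node("nm1", 4, 8192), Node("nm2", 8, 16384)]
    print(allocate(cluster, "app_1", 2, 6144))   # → nm1
    print(allocate(cluster, "app_2", 2, 4096))   # → nm2 (nm1 has only 2048 MB left)
    print(allocate(cluster, "app_3", 16, 1024))  # → None (no node has 16 vcores)
```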
**Question 3. Which Informatica engine enables the execution of transformations
on Spark?**
A) Blaze Engine
B) Smart Executor
C) Polyglot Computing Engine
D) Data Integration Engine
Answer: C
Explanation: Informatica’s Polyglot Computing Engine abstracts underlying
processing engines, allowing mappings to run on Spark when configured.
**Question 4. The Smart Executor in Informatica primarily provides which benefit?**
A) Automatic code generation for Java
B) Parallel execution of transformation logic on the target database
C) Dynamic allocation of compute resources based on workload
D) Real-time data profiling
Answer: C
Explanation: Smart Executor monitors runtime metrics and dynamically scales resources, optimizing performance for large data jobs.
**Question 5. Which layer of the Informatica abstraction model isolates developers from underlying Hadoop infrastructure?**
A) Physical Data Object layer
B) Logical Mapping layer
C) Integration Service layer
D) Abstraction Layer
Answer: D
Explanation: The Informatica abstraction layer hides Hadoop details, allowing developers to design mappings without handling low-level HDFS or YARN specifics.
**Question 6. When creating a Physical Data Object (PDO) for a Hive table, which property must be defined to enable partition pruning?**
A) Primary key
B) Partition columns
C) Data type of each column
D) File format
Answer: B
Explanation: Defining partition columns lets Informatica generate queries that prune unnecessary partitions, improving performance.
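The partition pruning in Question 6 can be sketched outside Informatica. When a table is partitioned on a column, a filter on that column lets the engine skip whole partitions before it reads any rows. The sketch below models a Hive-style layout in which each partition is a directory named `column=value`. The paths and the function name `prune_partitions` are hypothetical.

```python
def prune_partitions(partition_dirs, column, allowed_values):
    """Keep only partition directories whose `column=value` segment matches.

    Partitions that fail the predicate are never opened, which is the
    I/O saving that partition pruning provides.
    """
    allowed = {str(v) for v in allowed_values}
    selected = []
    for path in partition_dirs:
        # Parse segments such as "sale_date=2024-01-01" into a dict.
        parts = dict(seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg)
        if parts.get(column) in allowed:
            selected.append(path)
    return selected


if __name__ == "__main__":
    dirs = [
        "/warehouse/sales/region=EU/sale_date=2024-01-01",
        "/warehouse/sales/region=EU/sale_date=2024-01-02",
        "/warehouse/sales/region=US/sale_date=2024-01-01",
    ]
    # Rough equivalent of: SELECT ... FROM sales WHERE region = 'EU'
    for path in prune_partitions(dirs, "region", ["EU"]):
        print(path)
```

Only the two `region=EU` directories are returned. The `region=US` directory is skipped without being read.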

**Question 10. Which lookup mode provides the best performance for large reference data sets in a Hadoop environment?**
A) Connected, active
B) Connected, passive
C) Unconnected, active
D) Unconnected, passive
Answer: B
Explanation: A connected passive lookup loads the reference data once and keeps it in memory, reducing repeated I/O on large Hadoop tables.
**Question 11. In a dynamic mapping, what does the “Dynamic Port” feature allow?**
A) Automatic generation of target tables
B) Handling of schema changes without redesigning the mapping
C) Real-time monitoring of row counts
D) Encryption of data at rest
Answer: B
Explanation: Dynamic ports adapt to schema drift, enabling the mapping to process new or altered source columns without manual changes.
**Question 12. Which parameter type is evaluated only once at the start of a workflow execution?**
A) Mapping variable
B) Workflow variable
C) Object parameter with “runtime” scope
D) Parameter file entry
Answer: B
Explanation: Workflow variables are set before the workflow starts and remain constant throughout its execution.
**Question 13. Which task in Workflow Manager is used to pause a workflow until a specific file appears in a directory?**
A) Event Task
B) Timer Task
C) Decision Task
D) Command Task
Answer: A
Explanation: An Event Task can be configured to wait for a file-based event, such as the arrival of a trigger file.
**Question 14. When deploying an application, which Informatica service is responsible for executing the mapping logic?**
A) Repository Service
B) Integration Service
C) Domain Service
D) Monitoring Service
Answer: B
Explanation: The Integration Service runs the data integration jobs, including mappings and workflows, on the designated runtime engine.
**Question 15. In Big Data Streaming (BDS), which component provides the ability to process continuous data streams using micro-batches?**
A) Kafka Connect
B) Spark Streaming Engine
C) Flume Agent
D) Hive Streaming
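In Question 13, the Event Task holds the workflow until a trigger file arrives. Outside the Workflow Manager, the same behavior can be written as a polling loop. The sketch below is a hypothetical stand-in: the function name, trigger file name, poll interval and timeout are assumptions, not Informatica settings.

```python
import os
import tempfile
import time


def wait_for_file(path, poll_seconds=1.0, timeout_seconds=60.0):
    """Block until `path` exists, polling at a fixed interval.

    Returns True when the file appears and False if the timeout expires,
    so the caller can decide whether to fail or continue.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_seconds)
    return os.path.exists(path)  # one final check at the deadline


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        trigger = os.path.join(tmp, "orders.done")  # hypothetical trigger file
        print(wait_for_file(trigger, poll_seconds=0.1, timeout_seconds=0.3))  # → False
        open(trigger, "w").close()  # simulate the upstream job dropping the file
        print(wait_for_file(trigger, poll_seconds=0.1, timeout_seconds=0.3))  # → True
```

The function returns a boolean instead of raising an exception. The caller then decides whether a missing file should fail the run or route it down another branch.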

Answer: C
Explanation: ORC (Optimized Row Columnar) stores data in a columnar layout, offering high compression and fast query performance.
**Question 19. In Informatica, what does the “Pushdown Optimization” option do for a mapping executed on Hadoop?**
A) Forces all logic to run on the client machine
B) Pushes eligible transformation logic to the database or Hadoop engine for execution
C) Compresses source files before loading
D) Generates a Java source file for debugging
Answer: B
Explanation: Pushdown Optimization offloads compatible transformations to the underlying engine (e.g., Hive, Spark), reducing data movement.
**Question 20. Which transformation would you use to assign a sequential numeric value to each row in a mapping?**
A) Sequence Generator
B) Rank
C) Sorter
D) Filter
Answer: A
Explanation: The Sequence Generator produces a monotonically increasing number for each row, useful for surrogate keys.
**Question 21. When profiling data in the Developer tool, which metric indicates the percentage of null values in a column?**
A) Distinct Count
B) Null Count
C) Min Value
D) Average Length
Answer: B
Explanation: Null Count reports the number of rows where the column value is null, allowing calculation of the null percentage.
**Question 22. Which of the following best describes the role of the Repository Service?**
A) Executes data integration jobs
B) Manages metadata storage and versioning
C) Provides user authentication for the domain
D) Monitors workflow performance
Answer: B
Explanation: The Repository Service stores and manages metadata objects (mappings, sessions, workflows) and handles version control.
**Question 23. In a workflow, what does the “Decision Task” evaluate?**
A) Time-based triggers
B) File existence
C) Boolean expressions based on workflow variables
D) Completion status of previous tasks
Answer: C
Explanation: Decision Tasks use expressions that reference workflow variables to determine the next path in the workflow.
**Question 24. Which of the following is a key characteristic of a “connected” lookup transformation?**
A) It can be used only in the source qualifier
B) It passes data rows through the lookup regardless of match status
C) It requires a separate mapping call to retrieve lookup values

C) Real-time schema validation against a central repository
D) Encryption of schema metadata
Answer: B
Explanation: Dynamic Schema lets a source adapt to changes such as added, removed, or reordered columns, facilitating schema drift handling.
**Question 28. Which of the following best describes a “parameter file” in Informatica?**
A) A file that stores workflow logs
B) A file containing key-value pairs for runtime variable substitution
C) A file used to define security policies
D) A file that holds transformation code snippets
Answer: B
Explanation: Parameter files supply values for parameters and variables at runtime, enabling environment-specific configurations.
**Question 29. When using a “Union” transformation, what must be true about the input ports?**
A) All inputs must have the same number of ports and matching data types
B) Input ports can have different data types and will be auto-converted
C) Only one input can be connected at a time
D) Union can only be used with flat file sources
Answer: A
Explanation: Union merges rows from multiple pipelines, requiring each input to have identical port structures for consistent output.
**Question 30. What is the primary purpose of the “Transaction Control” transformation?**
A) To enforce data type conversions
B) To group rows into transactions for commit/rollback control
C) To generate surrogate keys
D) To perform row-level security checks
Answer: B
Explanation: Transaction Control lets you define commit and rollback points within a mapping based on row data, giving fine-grained transaction management.
**Question 31. Which of the following statements about “Incremental Loading” is correct?**
A) It always overwrites the entire target table
B) It loads only rows that have changed since the last load, typically using a high-water mark column
C) It requires a full table scan of the source each run
D) It can only be implemented with flat file sources
Answer: B
Explanation: Incremental loads use a change indicator (e.g., timestamp, version) to fetch only new or updated rows, reducing data volume.
**Question 32. In the context of NoSQL data sources, which Informatica connector is used to read from MongoDB?**
A) Hadoop Connector
B) NoSQL Connector
C) MongoDB Native Connector
D) JSON Connector
Answer: C
Explanation: The MongoDB Native Connector provides direct read/write capabilities for MongoDB collections.
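Question 31 describes the high-water mark approach, which Question 48 revisits. It can be sketched with Python's built-in `sqlite3` module standing in for the source database. The table `orders`, its `updated_at` column, and the function name `incremental_extract` are hypothetical. In a real job, the stored mark might come from a parameter or a control table.

```python
import sqlite3


def incremental_extract(conn, last_mark):
    """Fetch rows changed after `last_mark` and return (rows, new_mark).

    Only rows whose high-water mark column exceeds the stored value are read.
    The new mark is the largest value seen, or the old one if nothing changed.
    """
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_mark,),
    ).fetchall()
    new_mark = rows[-1][2] if rows else last_mark
    return rows, new_mark


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 10.0, "2024-01-01T08:00"),
            (2, 25.5, "2024-01-02T09:30"),
            (3, 7.25, "2024-01-03T11:15"),
        ],
    )
    rows, mark = incremental_extract(conn, "2024-01-01T23:59")
    print(rows)  # rows 2 and 3 only; row 1 is older than the mark
    print(mark)  # → 2024-01-03T11:15
```

ISO-8601 timestamps stored as text compare correctly as strings, which keeps the example dependency-free. A second run with the new mark returns no rows until the source changes again.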

Explanation: Command Tasks run operating-system commands or scripts on the host where the Integration Service resides.
**Question 36. Which of the following best describes “Schema Drift” in big data environments?**
A) Gradual performance degradation of a Hadoop cluster
B) Changes in source data structure (e.g., added columns) over time
C) Loss of metadata due to repository corruption
D) Increase in data volume beyond cluster capacity
Answer: B
Explanation: Schema drift refers to evolving source schemas that can break static mappings unless handled dynamically.
**Question 37. Which Informatica transformation can be used to rank rows based on a numeric column and keep only the top N rows?**
A) Sorter
B) Rank
C) Aggregator
D) Filter
Answer: B
Explanation: Rank assigns a rank number to rows based on a sort order and can limit output to a specified rank range.
**Question 38. When configuring a Kafka source in a streaming mapping, which property defines the offset reset behavior?**
A) bootstrap.servers
B) group.id
C) auto.offset.reset
D) key.deserializer
Answer: C
Explanation: The auto.offset.reset property determines where the consumer starts reading when no committed offset is found (earliest or latest).
**Question 39. Which of the following is NOT a valid execution mode for an Informatica mapping on a Hadoop cluster?**
A) Spark
B) Blaze
C) Hive
D) MapReduce only (without any higher-level engine)
Answer: D
Explanation: Informatica runs Hadoop mappings through the Spark, Blaze, or Hive engines (the Hive engine may itself use MapReduce underneath); you do not select “MapReduce only” directly.
**Question 40. In the Developer tool, what does the “Preview” button do for a source definition?**
A) Executes the entire mapping and writes to the target
B) Retrieves a sample of source rows for quick inspection
C) Generates the SQL code for the source query
D) Validates the mapping syntax only
Answer: B
Explanation: Preview fetches a limited number of rows from the source, allowing developers to verify column data and types.
**Question 41. Which of the following statements about “Pushdown Optimization – Full” is correct?**
A) It pushes only filter conditions to the source database
B) It pushes all eligible transformation logic, including joins and aggregations, to the source engine
C) It disables all pushdown and forces local execution

D) 1025
Answer: A
Explanation: The sequence starts at 1000 and adds 5 each time, producing 1000, 1005, and 1010; after two increments, the third value is 1000 + 2 × 5 = 1010.
**Question 45. Which transformation can be used to calculate a running total across rows in a mapping?**
A) Aggregator with group by set to none and a cumulative expression
B) Filter
C) Rank
D) Joiner
Answer: A
Explanation: An Aggregator without a group-by clause can compute cumulative expressions using the “running total” syntax.
**Question 46. In a streaming mapping, which component is responsible for maintaining state across micro-batches?**
A) Kafka Producer
B) Spark Structured Streaming checkpoint
C) Hive Metastore
D) HDFS NameNode
Answer: B
Explanation: Checkpointing in Spark Structured Streaming preserves state (offsets, aggregations) between micro-batches.
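The arithmetic behind Questions 44 and 45 is easy to check in plain Python. `sequence_values` mimics a sequence that starts at 1000 and increments by 5, the values used in the Question 44 explanation. `running_total` uses `itertools.accumulate` to show what a running total produces row by row. Both functions are illustrative stand-ins, not Informatica APIs.

```python
from itertools import accumulate, count, islice


def sequence_values(start, increment, n):
    """First n values of a sequence that starts at `start` and adds `increment`."""
    return list(islice(count(start, increment), n))


def running_total(amounts):
    """Cumulative sum emitted once per input row."""
    return list(accumulate(amounts))


if __name__ == "__main__":
    print(sequence_values(1000, 5, 3))  # → [1000, 1005, 1010]
    print(running_total([10, 20, 5]))   # → [10, 30, 35]
```

The third sequence value matches 1000 + 2 × 5. The running total emits one output per input row, which is the behavior Question 45 asks about.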
**Question 47. Which of the following is a valid reason to use a “Connected” lookup instead of an “Unconnected” lookup?**
A) When you need to reuse the same lookup logic in multiple mappings
B) When you want to pass additional columns downstream without extra expressions
C) When the lookup source is a flat file
D) When you need to perform a self-join on the source
Answer: B
Explanation: Connected lookups automatically forward rows downstream, allowing you to enrich data without extra mapping logic.
**Question 48. What does the “High-Water Mark” technique rely on for incremental loads?**
A) A checksum of the entire source table
B) A column (often timestamp or ID) that monotonically increases with each change
C) The total row count of the source
D) The size of the source file
Answer: B
Explanation: High-water mark columns indicate the latest processed value, enabling the extraction of only newer rows.
**Question 49. Which of the following statements about the “Smart Executor” is FALSE?**
A) It can automatically switch between Spark and Hive based on workload
B) It monitors runtime metrics to adjust resource allocation
C) It provides built-in data profiling during execution
D) It works only with on-premise Hadoop clusters
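Questions 10 and 47 rest on the same idea. A connected lookup sits in the pipeline, reads the reference data into a cache once, and returns one or more columns for every incoming row. The code below is a rough Python analogue. The reference data, column names, and function names are hypothetical, and a plain dict stands in for Informatica's lookup cache.

```python
def build_lookup_cache(reference_rows, key):
    """Load reference rows once into an in-memory dict keyed on `key`."""
    return {row[key]: row for row in reference_rows}


def connected_lookup(source_rows, cache, key, return_cols):
    """Enrich every source row with columns from the cache.

    Rows without a match still pass through with None in the returned
    columns, roughly mirroring how a connected lookup forwards rows downstream.
    """
    for row in source_rows:
        match = cache.get(row[key], {})
        yield {**row, **{col: match.get(col) for col in return_cols}}


if __name__ == "__main__":
    customers = [
        {"cust_id": 1, "name": "Acme", "tier": "gold"},
        {"cust_id": 2, "name": "Globex", "tier": "silver"},
    ]
    orders = [{"order_id": 10, "cust_id": 2}, {"order_id": 11, "cust_id": 3}]
    cache = build_lookup_cache(customers, "cust_id")  # reference data read once
    for enriched in connected_lookup(orders, cache, "cust_id", ["name", "tier"]):
        print(enriched)
```

Building the cache once and probing it per row avoids re-reading the reference table for each source row. This is the I/O saving that the Question 10 explanation attributes to caching.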

Answer: B
Explanation: Parallelism creates multiple partitions of the source data, allowing concurrent processing to improve throughput.
**Question 53. Which of the following is a primary benefit of using the “Hive” execution mode for a mapping?**
A) Real-time processing of streaming data
B) Ability to leverage existing HiveQL queries and tables
C) Automatic generation of Java code for custom logic
D) Direct write to NoSQL stores without transformation
Answer: B
Explanation: Hive execution translates mapping logic into HiveQL, allowing reuse of existing Hive tables and queries.
**Question 54. Which transformation can be used to split rows into multiple output groups based on a condition, without discarding any rows?**
A) Filter
B) Router
C) Aggregator
D) Joiner
Answer: B
Explanation: Router evaluates each group's condition and passes a row to every output group whose condition is true; rows that match no condition go to the default group, so no row is discarded as long as the default group is connected.
**Question 55. What does the “Cache Type” property of a lookup define?**
A) Whether the cache is stored in memory, on disk, or both
B) The data type of the lookup key column
C) The number of rows to cache per batch
D) The timeout for cache refresh
Answer: A
Explanation: Cache Type determines if the lookup cache resides entirely in memory, on disk, or uses a hybrid approach.
**Question 56. Which of the following is NOT a valid source type for a Physical Data Object in Informatica?**
A) Relational database
B) Hadoop HDFS file
C) Kafka topic
D) FTP server directory
Answer: D
Explanation: A Physical Data Object represents a defined source or target. Files on an FTP server can be read through a file connection, but an FTP directory is not a distinct PDO type, so D is the least accurate choice.
**Question 57. In a workflow, which task can be used to pause execution for a specific amount of time?**
A) Timer Task
B) Event Task
C) Decision Task
D) Command Task
Answer: A
Explanation: Timer Tasks introduce a delay based on a defined interval before proceeding to the next task.
**Question 58. Which of the following best explains the term “Polyglot” in Informatica’s computing engines?**
A) Ability to translate code into multiple programming languages
B) Support for executing mappings on different processing engines (Spark, Hive, etc.)