













Pass the Palantir Data Engineering Certification exam with confidence using this comprehensive 2025 test bank. Featuring 100+ practice questions with detailed rationales covering Foundry direct connections, agent-based connections, network egress policies, Fusion sheet sync, incremental batch sync, media sets, Virtual Tables, PySpark transforms, and more. Perfect for data engineers preparing for Palantir Foundry certification. Download now and ace your Palantir certification on the first try!
Typology: Exams














1. Which of the following is the correct sequence of steps to configure a direct connection in Foundry's managed SaaS platform?
A) configure a network policy → provision credentials → create the source in data connection → configure network egress policy
B) create the source in data connection → configure a network policy → configure network egress policy → provision credentials
C) provision credentials → configure network egress policy → create the source in data connection → configure a network policy
D) configure a network egress policy → provision credentials → create the source in data connection → configure a network policy
Answer: D
Rationale: The correct sequence begins with configuring a network egress policy to define allowed outbound traffic. Next, provision credentials for authentication to the external source. Then, create the source in data connection to establish the link. Finally, configure a network policy to manage inbound rules. This order ensures security and connectivity are properly established before the connection is fully configured.

2. You are responsible for integrating data from an Azure storage account into Foundry. To ensure optimal uptime and performance without managing additional infrastructure, which connection method should you configure?
A) Third-Party Sync Tool
B) Agent-based Connection
C) Manual Network Tunneling
D) Direct Connection
Answer: D
Rationale: Direct Connection is the recommended method for integrating cloud storage accounts like Azure into Foundry when you want optimal uptime and performance without managing additional infrastructure. It leverages Palantir's managed SaaS platform to handle connectivity, eliminating the need to deploy and maintain agents or configure complex network tunneling.

3. What is the minimum recommended amount of RAM for a Foundry agent host?
A) 12 GB
B) 8 GB
C) 32 GB
D) 16 GB
Answer: D
Rationale: The minimum recommended RAM for a Foundry agent host is 16 GB. This ensures adequate memory for running the agent software and handling data transfer operations without performance degradation. Lower memory configurations may lead to instability or reduced throughput.

4. Which of the following are part of securing a Foundry agent host? (Select two.)
A) Allow all inbound traffic to facilitate connectivity
B) Allow network traffic only from specific IPs
C) Open all ports for flexibility
D) Install antivirus software on the host
E) Ensure the agent host can talk to Palantir
F) Configure the firewall to block all traffic except to desired destinations
Answer: E, F
Rationale: Securing a Foundry agent host requires two key actions: (1) ensuring the agent host can communicate with Palantir services (outbound connectivity), and (2) configuring the firewall to block all traffic except to explicitly desired destinations (default-deny stance). Options A and C are insecure practices. Option B is too restrictive. Option D, while good general practice, is not specifically required for Foundry agent security.

5. A data engineer needs to integrate data from various legacy systems into Palantir AIP without modifying the existing data formats. Which feature of Palantir AIP facilitates this seamless integration?
A) Metadata Services
B) Virtual Tables

8. What are two responsibilities of Action types in the Palantir Ontology?
A) Define object properties
B) Capture data from operators
C) Orchestrate decision-making processes
D) Manage user permissions
Answer: B, C
Rationale: Action types in the Palantir Ontology serve two primary responsibilities: (B) capturing data from operators (user inputs that modify ontology objects) and (C) orchestrating decision-making processes (business logic that determines valid state transitions). Action types are kinetic elements that enable write-back functionality from applications to the ontology.

9. You are responsible for syncing a specific range of data from a Fusion spreadsheet to a dataset in Foundry to be used by Contour. After selecting the desired table range and initiating the sync, what must you ensure to avoid synchronization issues?
A) Ensure that the dataset has Viewer permissions
B) Export the synced data as a CSV file immediately after syncing
C) Only use table sync without any sheet sync in the Fusion sheet
D) Use both sheet sync and table sync within the same Fusion sheet
Answer: C
Rationale: To avoid synchronization issues, you should only use table sync without any sheet sync in the Fusion sheet. Using both sync types concurrently can create conflicts and inconsistent data states. Table sync provides a direct, reliable method for syncing specific ranges while avoiding the complexities of simultaneous sync mechanisms.

10. When syncing a table range from a Fusion sheet to a dataset in Foundry, which of the following conditions must be met to ensure that future changes in the spreadsheet are reflected in the dataset?
A) The user must have at least Editor permissions on the dataset
B) The dataset must be exported as a CSV file after each sync
C) Both sheet sync and table sync must be enabled concurrently
D) The user must have Viewer permissions on the dataset
Answer: A
Rationale: To ensure future spreadsheet changes are reflected in the synced dataset, the user must have at least Editor permissions on the dataset. Editor permissions allow the sync process to update the dataset with new or modified data from the Fusion sheet. Viewer permissions only allow read access, preventing updates.

11. Which Linux operating system version is specifically recommended for hosting a Foundry agent?
A) Ubuntu 18.
B) Fedora 34
C) Debian 10
D) Red Hat Enterprise Linux 8
Answer: D
Rationale: Red Hat Enterprise Linux 8 (RHEL 8) is the specifically recommended Linux distribution for hosting a Foundry agent. RHEL provides enterprise-grade stability, security features, and long-term support that align with Palantir's production requirements.

12. Which role is required to configure network egress policies in Foundry's managed SaaS platform?
A) Information Security Officer
B) User
C) Project Admin
D) Data Pipeline Developer
Answer: A
Rationale: The Information Security Officer role is required to configure network egress policies in Foundry's managed SaaS platform. This is because egress policies control outbound network traffic from the Foundry environment, which has security implications requiring appropriate permissions. Regular users, Project Admins, and Data Pipeline Developers do not have this level of network configuration access.

13. Which of the following components enhance security interoperability within Palantir AIP? (Select three.)
A) SAML integration for authentication
B) Using internal scripts for authorization
will only be transmitted to authorized, pre-approved destinations. This security-first approach prevents credential exposure to unauthorized endpoints.

16. Which connection method requires deploying and maintaining software within your network infrastructure?
A) Direct Connection
B) Agent-based Connection
C) Manual Network Tunneling
D) Third-Party Sync Tool
Answer: B
Rationale: Agent-based connections require deploying and maintaining Palantir's agent software within your network infrastructure. The agent acts as a secure bridge between your on-premise data sources and Foundry. While this provides flexibility for complex network environments, it introduces infrastructure management overhead compared to direct connections.

17. A data engineer needs to connect to a database that resides in a private subnet with no direct internet access. Which connection method is most appropriate?
A) Direct Connection
B) Agent-based Connection
C) Virtual Table
D) Fusion Sheet Sync
Answer: B
Rationale: Agent-based connection is most appropriate for databases in private subnets without direct internet access. The agent is deployed inside the private network and initiates outbound connections to Foundry, eliminating the need for inbound firewall rules or public IP addresses on the database server.

18. When configuring a direct connection to an external data source, what is the role of the network policy?
A) It encrypts data during transmission
B) It defines allowed inbound traffic to the Foundry environment
C) It provisions authentication credentials
D) It creates the source in data connection
Answer: B
Rationale: The network policy defines allowed inbound traffic to the Foundry environment from external sources. It specifies which IP addresses, ports, and protocols are permitted to communicate with Foundry services. This is distinct from egress policies, which control outbound traffic.

19. Which of the following best describes the difference between network policy and network egress policy in Foundry?
A) Network policy controls outbound traffic; egress policy controls inbound traffic
B) Network policy controls inbound traffic; egress policy controls outbound traffic
C) Both control inbound traffic but at different layers
D) Both control outbound traffic but for different protocols
Answer: B
Rationale: In Foundry's networking model, network policies control inbound traffic (connections coming into Foundry from external sources), while network egress policies control outbound traffic (connections initiated from Foundry to external destinations). This distinction is crucial for understanding security boundaries and connectivity requirements.

20. You are integrating data from an external SQL database and need to ensure data synchronization occurs only on weekdays between 9 AM and 5 PM. How should you configure this?
A) Set the transaction type to APPEND
B) Configure a schedule with specific time windows
C) Enable incremental sync with a WHERE clause
D) Use a Third-Party Sync Tool
Answer: B
Rationale: Configuring a schedule with specific time windows (e.g., weekdays 9 AM-5 PM) allows you to control when data synchronization occurs. This is configured in the data connection settings and is independent of incremental sync configuration, which controls which rows are synced, not when sync occurs.
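
The rationale for question 20 separates when a sync runs from which rows it picks up. The "when" half can be illustrated with a small predicate in plain Python. This is only a sketch: `in_sync_window`, `WINDOW_START`, and `WINDOW_END` are hypothetical names, not part of any Foundry API, and a real schedule is set in the data connection settings rather than in code like this.

```python
from datetime import datetime, time

# Hypothetical constants for illustration; not a Foundry API.
WINDOW_START = time(9, 0)   # 9 AM
WINDOW_END = time(17, 0)    # 5 PM (treated as exclusive in this sketch)


def in_sync_window(moment: datetime) -> bool:
    """Return True if `moment` falls on a weekday between 9 AM and 5 PM."""
    is_weekday = moment.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and WINDOW_START <= moment.time() < WINDOW_END
```

Note that the predicate says nothing about row selection; an incremental filter would be a separate concern, which mirrors the distinction drawn in the rationale.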
Rationale: The ignore_items_not_matching_schema=True parameter in put_dataset_files() filters files based on schema expectations, allowing you to upload only files that match specified criteria (such as file type). This is commonly used when a dataset contains mixed file types and only a subset should be uploaded to a media set.
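
The skip-instead-of-fail behaviour described in this rationale can be sketched in plain Python. The block below does not call the Foundry media set API: `select_uploadable` is a hypothetical stand-in, the "schema" is reduced to a set of allowed file suffixes, and raising an error when the flag is off is an assumption made for the illustration.

```python
from pathlib import PurePosixPath


def select_uploadable(paths, allowed_suffixes, ignore_items_not_matching_schema=False):
    """Hypothetical stand-in: return the files that would be uploaded.

    Files whose suffix is not in `allowed_suffixes` are skipped when the
    flag is True; otherwise the first mismatch raises ValueError
    (an assumption for this sketch).
    """
    allowed = {suffix.lower() for suffix in allowed_suffixes}
    selected = []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in allowed:
            selected.append(path)
        elif not ignore_items_not_matching_schema:
            raise ValueError(f"{path} does not match the expected schema")
    return selected
```

With a mixed list such as `["a.pdf", "b.jpeg", "c.PDF"]` and `{".pdf"}` as the allowed suffixes, passing `ignore_items_not_matching_schema=True` keeps only the two PDF entries.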

24. What is the first step to set up media sets in your Python transform in Foundry?
A) Initialize media sets using the @initialize_media_set decorator
B) Add a dependency on 'transforms-media' in your code repository
C) Create media sets directly in the Python code
D) Use the @media_set_input decorator to specify media sets
Answer: B
Rationale: The first step to set up media sets is to add a dependency on 'transforms-media' in your code repository's configuration (e.g., meta.yml or requirements file). This makes the media set APIs available. After adding the dependency, you can use decorators like @media_set_input and @media_set_output to work with media sets in transforms.

25. What does the ignore_items_not_matching_schema=True parameter do in the context of media set uploads?
A) It uploads all files regardless of schema
B) It skips files that don't conform to the expected schema
C) It converts non-matching files to match the schema
D) It raises an error for non-matching files
Answer: B
Rationale: When ignore_items_not_matching_schema=True, the put_dataset_files() method skips (ignores) files that don't conform to the expected schema, uploading only those that match. This is useful when a dataset contains a mix of file types (e.g., JPEG and PDF) and you want to upload only a specific type (e.g., PDFs).

26. Which of the following statements about Virtual Tables in Palantir AIP is correct?
A) Virtual Tables physically copy data into Foundry storage
B) Virtual Tables provide a virtual layer over external data sources without data movement
C) Virtual Tables require data format conversion
D) Virtual Tables are only available for cloud storage sources
Answer: B
Rationale: Virtual Tables provide a virtual abstraction layer over external data sources, allowing querying and integration without physically moving or converting the data. This is the key feature that enables seamless integration with legacy systems while preserving existing data formats.

27. A data engineer needs to connect Foundry to an external database that resides behind a corporate firewall with strict outbound-only rules. Which connection method should be used?
A) Direct Connection
B) Agent-based Connection
C) Virtual Table
D) Manual network tunneling
Answer: B
Rationale: Agent-based connection is ideal for databases behind corporate firewalls with outbound-only rules. The agent is deployed inside the corporate network and initiates outbound connections to Foundry, which is typically allowed by outbound-only firewall policies. This avoids the need to open inbound ports.

28. What is required for a Foundry agent host to communicate with Palantir services?
A) Inbound port 443 open
B) Outbound access to Palantir endpoints
C) Static public IP address
D) VPN connection to Palantir
Answer: B
Rationale: A Foundry agent host requires outbound access to Palantir service endpoints. The agent initiates connections to Palantir, not the reverse. This means outbound traffic (typically on port 443) must be allowed, while inbound access to the agent host is generally not required.

29. Which security practice is specifically recommended for Foundry agent host configuration?
A) Allow all inbound traffic for maximum connectivity
B) Configure firewall to block all traffic except to desired destinations
C) Disable all firewall rules for performance
D) Use default operating system security settings
Answer: B
Rationale: Using both sheet sync and table sync on the same Fusion sheet creates synchronization conflicts and should be avoided. The two sync mechanisms interfere with each other, leading to inconsistent data states and sync failures.

33. When configuring a direct connection, what is created after provisioning credentials?
A) Network egress policy
B) Network policy
C) The source in data connection
D) The agent host
Answer: C
Rationale: According to the correct sequence, after provisioning credentials, the next step is creating the source in data connection. This establishes the actual link between Foundry and the external data source using the previously configured egress policy and provisioned credentials.

34. Which type of connection method typically requires the most manual infrastructure management?
A) Direct Connection
B) Agent-based Connection
C) Virtual Table
D) REST API
Answer: B
Rationale: Agent-based connections require the most manual infrastructure management because you must deploy, configure, monitor, update, and maintain the agent software on your own infrastructure. Direct connections are managed by Palantir's SaaS platform, reducing operational overhead.

35. A company needs to connect Foundry to multiple on-premise data sources across different geographic locations. What should they consider?
A) Using a single agent for all locations
B) Deploying separate agents for each location or network segment
C) Direct connections are not possible from on-premise
D) Virtual Tables only work for cloud sources
Answer: B
Rationale: For multiple on-premise data sources across different geographic locations or network segments, deploying separate agents in each location is recommended. This ensures optimal network performance, maintains security boundaries, and provides fault isolation between locations.

36. What is the default transaction type for incremental batch sync in Foundry?
A) OVERWRITE
B) APPEND
C) MERGE
D) UPSERT
Answer: B
Rationale: APPEND is the transaction type used for incremental batch sync. It adds new rows to the existing dataset without modifying or deleting existing records. This is appropriate when the source system only adds new records over time (e.g., log data, event data).

37. Which of the following scenarios is best suited for a Direct Connection rather than an Agent-based Connection?
A) Connecting to a database in a private subnet
B) Integrating data from an Azure storage account
C) Connecting to an on-premise database behind a strict firewall
D) Accessing data from an air-gapped network
Answer: B
Rationale: Direct Connection is best suited for cloud storage services like Azure storage accounts because Palantir's managed SaaS platform can directly access these services over the internet without requiring an agent. Agent-based connections are typically better for on-premise databases behind firewalls or in private subnets.

38. What is the recommended approach for handling credentials when configuring multiple direct connections to the same external system?
A) Use the same credentials for all connections
B) Create separate credentials for each connection

41. When syncing data from Fusion to Foundry, what permission level is required on the target dataset for changes to propagate?
A) Viewer
B) Editor
C) Owner
D) No permissions required
Answer: B
Rationale: Editor permissions are required on the target dataset for changes to propagate from Fusion. Editor permissions allow write operations, which are necessary for the sync process to update the dataset with new or modified data from the Fusion sheet. Viewer permissions only allow read access.

42. Which connection method is most appropriate for integrating data from a legacy mainframe system that cannot support modern authentication protocols?
A) Direct Connection with OAuth
B) Agent-based Connection with custom credential handling
C) Virtual Table with SAML
D) Direct Connection with API keys
Answer: B
Rationale: Agent-based connections are most appropriate for legacy systems with limited authentication capabilities because the agent can handle credential management and custom authentication logic within your network. The agent acts as a bridge, handling modern authentication with Foundry while interfacing with the legacy system using its native protocols.

43. What is the purpose of the network policy in a direct connection configuration?
A) To define allowed outbound traffic to external sources
B) To define allowed inbound traffic to Foundry from external sources
C) To encrypt data in transit
D) To authenticate users accessing the data
Answer: B
Rationale: The network policy defines allowed inbound traffic to the Foundry environment from external sources. It specifies which external IP addresses, networks, or sources are permitted to initiate connections to Foundry services. This is the final step in the direct connection configuration sequence.

44. A team notices that their direct connection is failing frequently. Which of the following could be a cause?
A) Network egress policy configured before credentials
B) Network policy configured before source creation
C) Incomplete sequence: egress policy → credentials → source → network policy
D) Using the recommended 16 GB RAM for the agent host
Answer: C
Rationale: If any step in the correct sequence (network egress policy → credentials → source → network policy) is missing or out of order, the direct connection may fail or be incomplete. Each step establishes necessary components for secure, functional connectivity.

45. What type of data source is typically NOT suitable for Direct Connection?
A) Azure Blob Storage
B) AWS S
C) On-premise database in a private subnet without internet access
D) Google Cloud Storage
Answer: C
Rationale: On-premise databases in private subnets without internet access are not suitable for Direct Connection because Direct Connection requires the external source to be reachable over the internet. For such scenarios, an Agent-based Connection is required, as the agent can be deployed inside the private network.

46. When configuring a direct connection, when are credentials provisioned in the correct sequence?
A) First, before any network configuration
B) Second, after network egress policy but before source creation
C) Third, after source creation
D) Last, after network policy
Answer: B
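
Questions 44 and 46 both depend on the four-step order given in the rationales (network egress policy, then credentials, then source, then network policy). A small checker makes "missing or out of order" concrete. This is a study aid only: the step labels, `EXPECTED_ORDER`, and `check_sequence` are hypothetical names chosen for the sketch and do not correspond to any Foundry API.

```python
# Step order as stated in the rationales; the labels are hypothetical.
EXPECTED_ORDER = [
    "network_egress_policy",
    "credentials",
    "source",
    "network_policy",
]


def check_sequence(completed_steps):
    """Return a list of problems; an empty list means the order matches."""
    problems = []
    for position, expected in enumerate(EXPECTED_ORDER):
        if position >= len(completed_steps):
            # Fewer steps were completed than the sequence requires.
            problems.append(f"missing step: {expected}")
        elif completed_steps[position] != expected:
            actual = completed_steps[position]
            problems.append(f"step {position + 1} should be {expected}, got {actual}")
    return problems
```

Passing only the first two steps reports the last two as missing, which corresponds to the "incomplete sequence" case in question 44.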
Rationale: Direct connections are secured through network policies (inbound) and network egress policies (outbound), along with credential management. These layered security controls ensure that only authorized traffic flows between Foundry and external sources.

50. What is the recommended action if a direct connection's network policy needs to be updated?
A) Delete and recreate the entire connection
B) Update the network policy directly; other components remain configured
C) Re-provision credentials after updating the policy
D) The network policy cannot be updated after creation
Answer: B
Rationale: Network policies can be updated directly after the connection is established without recreating other components. Changes to network policies take effect without requiring re-provisioning of credentials or recreation of the data source. This allows for iterative security adjustments.

SECTION 2: DATA PIPELINE DEVELOPMENT – TRANSFORMS (Questions 51–100)
C) Use backslashes (\) for line breaks in chains
D) Limit chains to a maximum of 5 statements
E) Extract complex logic into separate functions
F) Nest multiple chains within a single expression block
Answer: D, E
Rationale: Recommended practices for PySpark expression chaining include: (D) limiting chains to a maximum of 5 statements to keep code manageable, and (E) extracting complex logic into separate functions to improve readability and reusability. Option A is not specifically recommended; Option B reduces readability; Option C is unnecessary with proper formatting; Option F increases complexity.

53. You need to inject a TransformContext into your Transform's compute function to access the current Spark session. How should you define the parameters of your compute function?
A) def compute(context, input, output):
B) def compute(input, output):
C) def compute(input, output, ctx):
D) def compute(ctx, input, output):
Answer: D
Rationale: To inject a TransformContext (which provides access to the Spark session among other things), the compute function should define parameters as def compute(ctx, input, output): where ctx is the TransformContext, followed by input datasets and output datasets in order. The context must be the first parameter.

54. Which of the following Python libraries is NOT recommended for training models in Foundry's Code Repositories?
A) scikit-learn
B) SparkML
C) PyTorch
D) TensorFlow
Answer: B
Rationale: SparkML is not recommended for training models in Foundry's Code Repositories due to compatibility and performance considerations. scikit-learn, PyTorch, and TensorFlow are all supported and commonly used for machine learning workloads within Foundry.
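
Returning to question 53, the signature in answer D can be tried out without a Foundry environment by using stand-ins. In the sketch below, `FakeContext` and `run_transform` are hypothetical helpers written for this illustration; in Foundry the context object is supplied by the transforms framework, and the inputs and outputs are dataset objects rather than Python lists.

```python
class FakeContext:
    """Stand-in for a TransformContext; exposes only what the sketch uses."""

    def __init__(self, spark_session):
        self.spark_session = spark_session


def compute(ctx, input, output):
    # Parameter order from answer D: context, then input, then output.
    # `input` shadows the Python builtin here only to mirror the question.
    session = ctx.spark_session
    output.append((session, [row.upper() for row in input]))


def run_transform(fn, spark_session, input_rows):
    """Hypothetical runner that calls fn(ctx, input, output) positionally."""
    output = []
    fn(FakeContext(spark_session), input_rows, output)
    return output
```

Calling `run_transform(compute, "session-1", ["a", "b"])` shows the context reaching the function body alongside the input and output arguments.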