Palantir Data Engineering Certification Exam: Q&A, Exams of Social Sciences

This document presents a series of questions and answers for the Palantir Data Engineering Certification exam. It covers topics such as configuring direct connections, integrating data from Azure storage, securing Foundry agent hosts, and using virtual tables for seamless data integration. The questions also address data transformation, security interoperability, pipeline implementation, and using Jupyter notebooks within Palantir AIP. The material is designed to test and strengthen understanding of Palantir Foundry's features and best practices for data engineering tasks, including data synchronization, data parsing, and security configuration. It also covers the kinetic elements of the Palantir Ontology and recommended practices for PySpark code readability.

Typology: Exams

2024/2025

Available from 06/06/2025

ROCKY-B

PALANTIR DATA ENGINEERING CERTIFICATION
EXAM
2. Which of the following is the correct sequence of steps to configure a direct connection in Foundry's managed SaaS platform?
   configure a network policy → provision credentials → create the source in data connection → configure network egress policy
   create the source in data connection → configure a network policy → configure network egress policy → provision credentials
   provision credentials → configure network egress policy → create the source in data connection → configure a network policy
   configure a network egress policy → provision credentials → create the source in data connection → configure a network policy
   Answers: configure a network egress policy → provision credentials → create the source in data connection → configure a network policy
5. You are responsible for integrating data from an Azure storage account into Foundry. To ensure optimal uptime and performance without managing additional infrastructure, which connection method should you configure?
   Third-Party Sync Tool
   Agent-based Connection
   Manual Network Tunneling
   Direct Connection
   Answers: Direct Connection
8. What is the minimum recommended amount of RAM for a Foundry agent host?
   12 GB
   8 GB
   32 GB
   16 GB
   Answers: 16 GB
9. Which of the following are part of securing a Foundry agent host? Select two.
   Allow all inbound traffic to facilitate connectivity.
   Allow network traffic only from specific IPs.
   Open all ports for flexibility.
   Install antivirus software on the host.
   Ensure the agent host can talk to Palantir.
   Configure the firewall to block all traffic except to desired destinations.
   Answers: Ensure the agent host can talk to Palantir. Configure the firewall to block all traffic except to desired destinations.

  1. A data engineer needs to integrate data from various legacy systems into Palantir AIP without modifying the existing data formats. Which feature of Palantir AIP facilitates this seamless integration? Metadata Services Virtual Tables REST Interfaces Palantir HyperAuto Pipelines - Answers :Virtual Tables
  2. Which of the following actions can be performed after successfully syncing a table range from a Fusion sheet to a dataset in Foundry? Select three. Change the branch of the dataset. Modify the export column type to match desired data types. Delete the original Fusion sheet without affecting the dataset. Use both sheet sync and table sync on the same Fusion sheet. Automatically merge changes from multiple Fusion sheets. Rename the synced dataset. - Answers :Change the branch of the dataset. Modify the export column type to match desired data types. Rename the synced dataset.
  3. Which open data format is used by default for transformed data in Palantir AIP to ensure compatibility with existing data architectures? JSON Parquet CSV Avro - Answers :Parquet
  4. Which of the following are responsibilities of Action types in the Palantir Ontology? Select two. Provide object type polymorphism Define link types Capture data from operators Author business logic Orchestrate decision-making processes Define object properties - Answers :Capture data from operators Orchestrate decision-making processes
  5. You are responsible for syncing a specific range of data from a Fusion spreadsheet to a dataset in Foundry to be used by Contour. After selecting the desired table range and initiating the sync, what must you ensure to avoid synchronization issues? Ensure that the dataset has Viewer permissions. Export the synced data as a CSV file immediately after syncing.

Aligning pipeline logic with the ontology's entity and relationship definitions. Using only default transformation settings without customization. Avoiding documentation to keep the pipeline simple. Manually verifying each pipeline run for consistency. Ensuring that data transformations preserve the integrity of semantic relationships. Implementing error handling to manage discrepancies between data sources and ontology requirements. - Answers :Aligning pipeline logic with the ontology's entity and relationship definitions. Ensuring that data transformations preserve the integrity of semantic relationships.

  1. You are assigned to maintain a critical data pipeline in Foundry that has been experiencing intermittent failures. To ensure timely resolution and support, which of the following support structures should you establish? Implement a ticketing system for tracking support requests and resolutions. Create detailed documentation outlining common issues and troubleshooting steps. Set up automated alerting for pipeline failures and performance issues. Restrict access to the pipeline only to senior data engineers. - Answers :Implement a ticketing system for tracking support requests and resolutions. Create detailed documentation outlining common issues and troubleshooting steps. Set up automated alerting for pipeline failures and performance issues.
  2. A data scientist wants to leverage their existing Jupyter notebooks within Palantir AIP for data analysis without switching to a different interface. Which feature of Palantir AIP should they utilize to achieve this? REST Interfaces Virtual Tables Palantir HyperAuto Pipelines Code Workspaces - Answers :Code Workspaces
  3. What are the kinetic elements in the Palantir Ontology?
     Objects, Properties, Links
     Actions, Functions
     Semantics, Interfaces
     Object Types, Link Types
     Answers: Actions, Functions
  4. Which Linux operating system version is specifically recommended for hosting a Foundry agent? Ubuntu 18. Fedora 34 Debian 10 Red Hat Enterprise Linux 8 - Answers :Red Hat Enterprise Linux 8
  1. What actions are performed when the ModelOutput.publish() method is called in Foundry's Code Repositories? Select two: It serializes the model using the ModelAdapter.save() method. It initializes the model adapter with the fresh model. It runs the model inference. It creates a new model version. - Answers :It serializes the model using the ModelAdapter.save() method. It creates a new model version.
  2. Which of the following statements correctly describes the behavior of the FileSystem.open() method in Foundry Transforms? it allows random access to any part of that file it automatically infers the file schema upon opening it returns a writable stream by default it provides a read-only stream without support for seek or tell methods - Answers :it provides a read-only stream without support for seek or tell methods
  3. Which of the following are recommended practices for chaining expressions in PySpark to enhance code readability? Select two.
     Isolate each logical group of transformations into separate code blocks.
     Chain as many expressions as possible for conciseness.
     Use backslashes (\) for line breaks in chains.
     Limit chains to a maximum of 5 statements.
     Extract complex logic into separate functions.
     Nest multiple chains within a single expression block.
     Answers: Limit chains to a maximum of 5 statements. Extract complex logic into separate functions.
  4. You need to inject a TransformContext into your Transform's compute function to access the current Spark session. How should you define the parameters of your compute function?
     def compute(context, input, output):
     def compute(input, output):
     def compute(input, output, ctx):
     def compute(ctx, input, output):
     Answers: def compute(ctx, input, output):
  5. You have a dataset in the Foundry filesystem that includes JPEG and PDF files, and you want to upload only the PDF files to a media set. Which parameter can you use in the put_dataset_files() method to achieve this? upload_specific_types=['pdf']

Download and extract all packages in the solved environment Compile the Python source code Link packages into the environment Verify package contents - Answers :Download and extract all packages in the solved environment Link packages into the environment Verify package contents

  1. Which of the following Python libraries is NOT recommended for training models in Foundry's Code Repositories? scikit-learn SparkML PyTorch TensorFlow - Answers :SparkML
  2. Which of the following are recommended practices for refactoring complex logical operations in PySpark transformations? Chain multiple 'filter()' and 'withColumn()' calls in a single line Extract complex logic into separate functions. Use deeply nested parentheses to encapsulate logical operations. Group logic into named variables. Keep logic expressions inside the same code block to 3 expressions at most. Duplicate code for better readability. - Answers :Extract complex logic into separate functions. Group logic into named variables. Keep logic expressions inside the same code block to 3 expressions at most.
  3. You are developing a Transform in Foundry that processes input dataframes using PySpark and needs to output multiple datasets based on different filters. Which decorator should you use to define this Transform?
     @transform_df
     @transform_pandas
     @transform_file
     @transform
     Answers: @transform
  4. You are performing a left join between two DataFrames in PySpark, but realize that the right DataFrame may have multiple matches for some keys, leading to duplicate rows in the output. According to the style guide, what should you do to prevent this 'join explosion'?
     Use .dropDuplicates() after the join
     Switch to an inner join to avoid duplicates
     Ensure the join key in the right DataFrame is unique
     Use a right join instead of a left join
     Answers: Ensure the join key in the right DataFrame is unique
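
As a study aid for the join-explosion question, the effect can be reproduced without Spark. The sketch below is plain Python over lists of dicts, not PySpark; `left_join`, `duplicate_keys`, and the sample rows are hypothetical names invented for illustration. It shows how a repeated key on the right side multiplies left rows, and how checking key uniqueness beforehand exposes the problem.

```python
from collections import Counter


def left_join(left, right, key):
    """Naive left join over lists of dicts: one output row per matching pair."""
    joined = []
    for left_row in left:
        matches = [r for r in right if r[key] == left_row[key]]
        if not matches:
            joined.append(dict(left_row))  # unmatched left rows are kept
        for right_row in matches:
            joined.append({**left_row, **right_row})
    return joined


def duplicate_keys(rows, key):
    """Return the key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


orders = [{"customer_id": 1, "order": "A"}, {"customer_id": 2, "order": "B"}]
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 1, "name": "Ann B."},  # second match for key 1
    {"customer_id": 2, "name": "Bob"},
]

exploded = left_join(orders, customers, "customer_id")
print(len(orders), "left rows became", len(exploded), "joined rows")

# Checking the right side before joining exposes the problem key.
print("non-unique keys:", duplicate_keys(customers, "customer_id"))
```

Deduplicating or aggregating the right side so that `duplicate_keys` returns an empty list keeps the joined row count equal to the left row count.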

  1. Which of the following are considered bad practices when performing joins in PySpark? Using dataframe aliases to disambiguate column names. Dropping unnecessary columns after the join. Ensuring the key you join on is unique when performing left joins. Using right joins. Explicitly specifying the join type. Allowing expressions that duplicate columns in the output. - Answers :Using right joins. Allowing expressions that duplicate columns in the output.
  2. When defining Transform logic level versioning (TLLV), which of the following factors are included in the default version string? Select three. The names of all input datasets All modules the Transform depends on The module where the Transform is defined Any project dependencies The runtime environment configuration All functions within the Transform - Answers :All modules the Transform depends on The module where the Transform is defined Any project dependencies
  3. When would you choose to use the 'Merge with fast-forward' mode in Foundry's Code Repositories? When you need to create a new commit that combines all changes from the pull request. When the target branch has diverged significantly from the source branch. When you want to maintain a detailed commit history with merge commits. When there are no additional changes on the target branch and you want a linear commit history. - Answers :When there are no additional changes on the target branch and you want a linear commit history.
  4. You want to leverage distributed processing in Foundry Transforms to handle files of varying sizes efficiently. Which Spark configuration properties should you adjust to control the partitioning of the FileStatus DataFrame? Select two. spark.executor.cores spark.executor.memory spark.sql.files.openCostInBytes spark.driver.memory spark.sql.files.maxPartitionBytes - Answers :spark.sql.files.openCostInBytes

Use Data Connection syncs to manage shared datasets - Answers :Treat the shared dataset as an input in only one pipeline and ignore it in others Create a new pipeline dedicated to building the shared dataset and have other pipelines treat it as an input

  1. You need to completely replace the existing data in a dataset with a new batch of data. Which type of transaction should you perform in Foundry? APPEND DELETE UPDATE SNAPSHOT - Answers :SNAPSHOT
  2. You have created a new repository named 'Data_Processor' for your shared Python library in Foundry. According to Conda's naming conventions, how will this repository name be published as a Conda package?
     data_processor
     Data-Processor
     data-processor
     data processor
     Answers: data-processor
  3. Which of the following actions can help debug a hanging build in Foundry? Select two. Restart the build Use Job Comparison tool Take a snapshot in Spark Enable AI error enhancer Download driver logs before canceling the build Upgrade the repository - Answers :Take a snapshot in Spark Download driver logs before canceling the build
  4. You have added a new Python library to your Foundry Code Repository, but when you try to import its modules in your code, they are not recognized. What should you do to resolve this issue? Switch to a different branch. Reinstall the entire environment. Manually edit the meta.yaml file. Restart Code Assist. - Answers :Restart Code Assist.
  5. Which of the following configurations can be used to customize Transform Logic Level Versioning (TLLV) in Foundry? Select two.
     Adding tllvFiles with specific file paths in transformsPython configuration.
     Setting tllv to false in transformsPython configuration.
     Using @transform decorator with multiple Output specifications.
     Enabling tllvIncludeDeps to prevent invalidation when dependencies change.
     Answers: Adding tllvFiles with specific file paths in transformsPython configuration. Setting tllv to false in transformsPython configuration.
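
The SNAPSHOT question in this group is easier to remember with a small model of how transactions build up a dataset's contents. The sketch below is a deliberately simplified plain-Python model, not a Foundry API: `dataset_view` and the file names are hypothetical, and only SNAPSHOT and APPEND transactions are modelled. It assumes that a view starts at the most recent SNAPSHOT and then layers later APPEND transactions on top.

```python
def dataset_view(transactions):
    """Return {path: content} visible after a list of (kind, files) transactions.

    Simplified model: the view starts at the latest SNAPSHOT and then applies
    every later APPEND. Other transaction types are not modelled here.
    """
    start = 0
    for index, (kind, _files) in enumerate(transactions):
        if kind == "SNAPSHOT":
            start = index  # remember the most recent SNAPSHOT

    view = {}
    for kind, files in transactions[start:]:
        if kind not in ("SNAPSHOT", "APPEND"):
            raise ValueError(f"transaction type not modelled: {kind}")
        view.update(files)
    return view


history = [
    ("SNAPSHOT", {"a.csv": "batch 1"}),
    ("APPEND", {"b.csv": "batch 2"}),
    ("SNAPSHOT", {"c.csv": "batch 3"}),  # replaces everything before it
    ("APPEND", {"d.csv": "batch 4"}),
]

print(dataset_view(history))
```

In this model the files from the first two transactions no longer appear in the view, which is why a SNAPSHOT is the transaction to use when a new batch should completely replace the existing data.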

  1. To prevent changes from being overwritten by other users, what is the recommended practice when working with branches in Foundry? Assign one active developer per individual branch. Have multiple users share the same branch. Restrict branch creation to administrators only. Merge changes frequently to avoid conflicts. - Answers :Assign one active developer per individual branch.
  2. To set up test coverage reporting in your Python repository using PyTest, which of the following steps should you perform? Select two.
     Create a pytest.ini file with coverage options.
     Install the coverage package using pip separately.
     Configure the build.gradle file to include coverage tasks.
     Add 'pytest-cov' to the test requirements in meta.yaml.
     Answers: Add 'pytest-cov' to the test requirements in meta.yaml. Create a pytest.ini file with coverage options.
  3. What are the necessary steps to configure an incremental batch sync for a JDBC connection in Foundry? Select three. Disable the preview functionality before configuring incremental sync. Set the transaction type to APPEND. Modify the SQL query to include a WHERE clause with the incremental column using the wildcard '?' Set the initial value of the incremental column to zero regardless of previous syncs. Use the Overwrite transaction type to ensure data consistency. Enable the Incremental option and configure the incremental state. - Answers :Set the transaction type to APPEND. Modify the SQL query to include a WHERE clause with the incremental column using the wildcard '?' Enable the Incremental option and configure the incremental state.
  4. Which of the following practices help in minimizing breaking changes when modifying dataset schemas? Select two. Modifying existing columns to repurpose them for new data types. Avoiding any changes to the schema to prevent breaking changes. Creating new columns instead of modifying or deleting existing ones. Deleting old columns immediately after adding new ones.

Datasets built by the schedule and used by other datasets within the same schedule. Find datasets consumed by external applications. Datasets built by the schedule that are not used by any other datasets in the schedule. - Answers :Datasets built by the schedule and used by other datasets within the same schedule.
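
The incremental JDBC sync question in this group combines three ideas: an APPEND-style target, a query with a `?` placeholder on the incremental column, and a stored incremental state. The sketch below imitates that pattern with Python's built-in `sqlite3` module as a stand-in for a JDBC source. The `events` table, the `incremental_sync` function, and the `state` dictionary are hypothetical; in Foundry the Data Connection sync manages the state itself.

```python
import sqlite3


def incremental_sync(conn, state):
    """Fetch only rows newer than the stored state, then advance the state."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id",
        (state["last_id"],),  # the '?' is bound to the last synced value
    ).fetchall()
    if rows:
        state["last_id"] = rows[-1][0]
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])

state = {"last_id": 0}  # nothing has been synced yet
target = []             # stands in for a dataset receiving appended rows

target.extend(incremental_sync(conn, state))  # first run: rows 1 and 2
conn.execute("INSERT INTO events (payload) VALUES ('c')")
target.extend(incremental_sync(conn, state))  # second run: only row 3

print(target, state)
```

Because each run appends only rows above the stored value, re-running the sync without new source rows adds nothing to the target.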

  1. Which of the following steps are necessary to set up test coverage reporting in your Python repository using PyTest?
     Apply the 'com.palantir.conda.pep8' Gradle plugin.
     Add 'pytest-cov' to the test requirements in meta.yaml.
     Create a pytest.ini file with coverage options.
     Set the 'coverage-report' parameter in build.gradle.
     Answers: Add 'pytest-cov' to the test requirements in meta.yaml. Create a pytest.ini file with coverage options.
  2. You are working on a PySpark transformation in Foundry and need to rename all columns of a DataFrame from uppercase to lowercase. The current implementation uses a for loop to iterate over the columns, which is causing performance issues. According to the PySpark style guide, what is the recommended approach to rename the columns efficiently? Manually specify each column rename operation. Use DataFrame.renameAll() method. Use a list comprehension with select() and alias(). Use withColumnRenamed() inside the for loop. - Answers :Use a list comprehension with select() and alias().
  3. Which decorator should you use if your Transform needs to handle file-based datasets rather than DataFrame objects? @transform_file @file_transform @transform_files @transform - Answers :@transform
  4. Which of the following actions will trigger re-resolution of Conda lock files in Foundry's Code Repositories? Select three.
     Running Task Runner.
     Upgrading to a newer template version.
     Changing the list of packages in the meta.yaml file.
     Adding a new branch to the repository.
     Modifying the build.gradle file.
     Deleting the hidden Conda lock files.
     Answers: Upgrading to a newer template version. Changing the list of packages in the meta.yaml file. Deleting the hidden Conda lock files.

  1. You are developing a Transform in Foundry that needs to read only JSON files from an input dataset for further processing. Which method and parameter should you use to list these files efficiently?
     filesystem.list('*.json')
     filesystem.open('*.json')
     filesystem.read_files('*.json')
     filesystem.ls(glob='*.json')
     Answers: filesystem.ls(glob='*.json')
  2. Which of the following are considered product types as defined in the release process? Use-case product Transform product Workflow product Ontology product Feature product Schema product - Answers :Use-case product Ontology product
  3. What determines the starting point for calculating a dataset view in Foundry? The latest SNAPSHOT transaction before that point in time. The earliest transaction in the dataset. The first DELETE transaction. The latest APPEND transaction. - Answers :The latest SNAPSHOT transaction before that point in time.
  4. Which decorator must be used when defining a Python transform that utilizes media sets in Foundry? @transform_media @transform @media_transform @media_set - Answers :@transform
  5. Which of the following steps are necessary for publishing a trained model in Foundry's Code Repositories? Select two. Use SparkML for training. Call ModelOutput.publish() to save the model. Author a model adapter. Write a Python transform to train the model. - Answers :Call ModelOutput.publish() to save the model.
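
For the JSON-listing question in this group, it can help to see how a glob pattern such as `*.json` selects file names. The sketch below uses Python's standard `fnmatch` module on a hard-coded list. The `ls` function here is a hypothetical local stand-in written for illustration, not Foundry's `filesystem.ls()`, whose handling of nested paths may differ.

```python
from fnmatch import fnmatchcase


def ls(paths, glob=None):
    """Yield the paths that match the glob pattern (all paths if glob is None)."""
    for path in paths:
        if glob is None or fnmatchcase(path, glob):
            yield path


files = ["a.json", "b.csv", "c.json", "notes.json.bak"]

json_files = list(ls(files, glob="*.json"))
print(json_files)

# Without the wildcard the pattern must equal the whole name, so nothing matches.
print(list(ls(files, glob=".json")))
```

Filtering at listing time means only the matching files are opened afterwards.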

Read the entire file into a string and split it by lines. Enable random access by using the seek method on the file stream. - Answers :Use FileSystem.open() to stream the file and process it line by line.
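
The answer above relies on reading a stream forward, one line at a time, rather than loading the whole file or seeking within it. The sketch below shows that access pattern using `io.StringIO` as a stand-in for the stream that `FileSystem.open()` would return. The function name and the sample text are hypothetical.

```python
import io


def count_matching_lines(stream, needle):
    """Count lines containing needle, reading the stream forward only."""
    count = 0
    for line in stream:  # lazy iteration: one line in memory at a time
        if needle in line:
            count += 1
    return count


# Stand-in for an opened dataset file; no seek() or tell() calls are needed.
raw = io.StringIO("ok\nerror: disk\nok\nerror: net\n")
errors = count_matching_lines(raw, "error")
print(errors)
```

Because the function never rewinds the stream, it works the same way on a read-only stream that does not support random access.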

  1. Which of the following actions are necessary to add an object type to your data lineage graph in Foundry? Select two. Open the View node properties panel and click the Settings icon next to the object type. Select the object type from the search results to add it to your data lineage graph. Use the Search Foundry tool in the right sidebar to find the desired object type. Select the dataset and filter the list of related artifacts to include object types. - Answers :Select the object type from the search results to add it to your data lineage graph. Use the Search Foundry tool in the right sidebar to find the desired object type.
  2. You initiated a build on a feature branch with a fallback chain of feature → master, where dataset A is on master. During the build, two jobs are executed serially: the first job writes to dataset B on the feature branch, and the second job writes to dataset C on the feature branch. What will be the state of dataset A after the build? Dataset A on the master branch is updated with new data. Dataset A on the feature branch is updated with new data. Both feature and master branches of dataset A are updated. Dataset A remains unchanged. - Answers :Dataset A remains unchanged.
  3. Which of the following file formats is recommended to store unstructured data within a Foundry dataset? Parquet Text JSON Avro - Answers :Text
  4. How can you disable specific PyLint messages, such as 'missing-module-docstring', in your Python project within Foundry? Use command-line arguments when running PyTest to disable the messages. Remove the associated code that triggers the messages. Edit the build.gradle file to exclude these messages. Modify the src/.pylintrc file to disable the specific messages. - Answers :Modify the src/.pylintrc file to disable the specific messages.
  5. Which of the following features are available under the Details view in Foundry's Dataset Preview? Select three.
     Adding custom metadata fields
     Viewing and downloading dataset files
     Comparing datasets
     Monitoring real-time data streams
     Editing the dataset schema
     Scheduling data syncs
     Answers: Adding custom metadata fields; Viewing and downloading dataset files; Editing the dataset schema
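
The fallback-chain question in this group (feature → master, dataset A on master) can be walked through with a toy model. The sketch below is plain Python with hypothetical names (`resolve_branch`, the `branches` mapping) and assumes that dataset A is read as an input. Inputs resolve to the first branch in the chain where the dataset exists, while outputs are written only on the build branch.

```python
def resolve_branch(existing_branches, fallback_chain):
    """Return the first branch in the fallback chain on which a dataset exists."""
    for branch in fallback_chain:
        if branch in existing_branches:
            return branch
    raise LookupError("dataset exists on no branch in the fallback chain")


# Branches on which each dataset currently exists.
branches = {"A": {"master"}, "B": set(), "C": set()}
fallback_chain = ["feature", "master"]
build_branch = fallback_chain[0]

# Dataset A is only read, so it resolves through the fallback chain.
input_branch = resolve_branch(branches["A"], fallback_chain)

# The two jobs write their outputs on the build branch only.
for output in ("B", "C"):
    branches[output].add(build_branch)

print(input_branch, branches)
```

Nothing in the build writes to dataset A, so its master branch is the same before and after, which matches the stated answer.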

  1. You have transitioned a data pipeline to maintenance mode and need to ensure it continues to meet user requirements. What should you define first before starting the maintenance process? The user access permissions for the pipeline The cost of maintaining the pipeline The pipeline's data scope and delivery expectations The technical architecture of upstream systems - Answers :The pipeline's data scope and delivery expectations
  2. You are developing a Transform within the 'Data Cleaning Project' in Foundry. Your Transform requires access to a dataset owned by the 'Customer Data Project.' According to Project references guidelines, what action should you take to include the 'Customer Data Project' dataset as an input for your Transform? Export the 'Customer Data Project' dataset to the 'Data Cleaning Project'. Add a Project reference to the 'Customer Data Project' dataset. Update the code repository's language packages to include the 'Customer Data Project'. Directly reference the dataset without any additional configuration. - Answers :Add a Project reference to the 'Customer Data Project' dataset.
  3. You are tasked with writing a model to the output dataset in Foundry using the pickle module. Which mode should you use when opening the file with FileSystem.open()? 'r' 'wb' 'rb' 'w' - Answers :'wb'
  4. You are tasked with setting up a new transform in Palantir Foundry that utilizes Palantir's OpenAI GPT-4 language model to analyze customer feedback. Which of the following steps should you perform first? Enable AIP on your Foundry enrollment. Import the OpenAiGptChatLanguageModel class in your Python file. Add the palantir_models library to your Code Repository. Write the compute function using the @transform decorator. - Answers :Enable AIP on your Foundry enrollment.

pandas numpy palantir_models scipy - Answers :palantir_models
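
For the pickle question in this group, the reason binary write mode is needed is that `pickle.dump` writes bytes. The sketch below uses the built-in `open()` on a temporary file as a stand-in for `FileSystem.open()`. The `model` dictionary is a hypothetical placeholder for a trained model.

```python
import os
import pickle
import tempfile

model = {"weights": [0.1, 0.2], "bias": 0.5}  # placeholder for a trained model
path = os.path.join(tempfile.mkdtemp(), "model.pickle")

# Text mode fails because pickle produces bytes, not str.
text_mode_failed = False
try:
    with open(path, "w") as f:
        pickle.dump(model, f)
except TypeError:
    text_mode_failed = True

# Binary write mode accepts the pickled bytes.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Reading it back uses binary read mode.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(text_mode_failed, restored == model)
```

The same pairing applies when loading the model later: write with 'wb', read with 'rb'.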

  1. Which Gradle plugin should be applied to enable the Spark anti-pattern linter in your Python project within Foundry? 'com.palantir.conda.pylint' 'com.palantir.conda.pep8' 'com.palantir.transforms.lang.pytest-defaults' 'com.palantir.transforms.lang.antipattern-linter' - Answers :'com.palantir.transforms.lang.antipattern-linter'
  2. Which of the following functions are performed by Foundry builds concerning branches? Select two. Resolving job inputs and outputs with respect to the build branch and fallback branches. Creating new branches for each build. Automatically deleting unused branches after a build. Compiling the build graph by collecting JobSpecs from branches. Merging changes from multiple branches into the build branch. Ensuring that builds modify all dataset branches. - Answers :Resolving job inputs and outputs with respect to the build branch and fallback branches. Compiling the build graph by collecting JobSpecs from branches.
  3. Which section within the Information panel of Foundry's Dataset Preview provides details such as the dataset's creation time, last update, and the users responsible for these actions? About Columns Schedules Data Preview - Answers :About
  4. Which of the following methods adhere to the recommended PySpark style when adding new columns to a DataFrame? Using withColumnRenamed to add new columns. Adding new columns directly without specifying the method. Using select with multiple expressions to add new columns. Using withColumn to add each new column individually. - Answers :Using withColumn to add each new column individually.
  1. In the recommended branching strategy, what is the primary role of the 'master' branch? It is used to create short-lived feature branches. It integrates schema changes at specific cadences. It is the production branch and is sourced with production data. It serves as the staging branch for testing new features. - Answers :It is the production branch and is sourced with production data.
  2. When defining Transform logic level versioning (TLLV), which of the following factors are included in the default version string? Select three. The module where the Transform is defined The runtime environment configuration The names of all input datasets All functions within the Transform Any project dependencies All modules the Transform depends on - Answers :The module where the Transform is defined Any project dependencies All modules the Transform depends on
  3. Which of the following statements accurately describe the purpose and functionality of retention policies in Foundry? Select three. Retention policies are used to enforce data governance requirements. Retention policies minimize storage costs by deleting data no longer needed. Retention policies remove file references from the dataset view based on specified criteria. Retention policies permanently delete files from the backing filesystem. Retention policies automatically version datasets. Retention policies replace existing dataset schemas. - Answers :Retention policies are used to enforce data governance requirements. Retention policies minimize storage costs by deleting data no longer needed. Retention policies remove file references from the dataset view based on specified criteria.
  4. When using the transform_df() decorator, what is the expected return type of the compute function? Python dictionary pandas.DataFrame pyspark.sql.DataFrame None - Answers :pyspark.sql.DataFrame