Cloud Data Engineer Python for China India Philippines Complete Exam Preparation and Study, Exams of Technology

A regionally focused cloud data engineering guide emphasizing Python development, data pipelines, and cloud analytics practices. It includes hands-on examples, exam practice questions, and localized learning insights to support certification success.

Typology: Exams

2025/2026

Available from 02/22/2026

shilpi-jain-3 🇮🇳

Cloud Data Engineer Python for China India Philippines Complete Exam Preparation and Study Guide

**Question 1. Which Python library is most appropriate for making asynchronous HTTP requests to a REST API for high-throughput data ingestion?**

A) requests

B) urllib3

C) aiohttp

D) http.client

Answer: C

Explanation: aiohttp provides an async API that allows many concurrent requests without blocking, making it ideal for high-throughput ingestion.
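
As a rough illustration of why non-blocking I/O helps ingestion throughput, the sketch below runs many simulated requests concurrently with asyncio. The network call is stubbed with `asyncio.sleep` so the example runs offline; `fake_fetch`, `ingest`, `run_demo`, and the URL pattern are hypothetical names, and in a real pipeline the stub would presumably be replaced by an aiohttp session call such as `session.get(url)`.

```python
import asyncio
import time


async def fake_fetch(url: str, latency: float = 0.05) -> dict:
    """Stand-in for an HTTP GET; sleeps instead of touching the network."""
    await asyncio.sleep(latency)  # yields control so other requests can proceed
    return {"url": url, "status": 200}


async def ingest(urls: list[str], max_concurrency: int = 10) -> list[dict]:
    """Fetch all URLs concurrently, capping the number in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> dict:
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(bounded(u) for u in urls))


def run_demo(n: int = 20) -> tuple[list[dict], float]:
    """Run the simulated ingestion and report the elapsed wall-clock time."""
    urls = [f"https://api.example.com/items/{i}" for i in range(n)]  # hypothetical endpoint
    start = time.perf_counter()
    results = asyncio.run(ingest(urls))
    return results, time.perf_counter() - start
```

Called as `results, elapsed = run_demo(20)`, the simulated requests overlap instead of running back to back, because each `await` hands control to the event loop while a request is waiting.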

**Question 2. In boto3, which method is used to upload a large file to Amazon S3 using multipart upload to improve reliability?**

A) put_object()

B) upload_fileobj()

C) multipart_upload()

D) upload_file()

Answer: D

Explanation: upload_file() automatically manages multipart upload for large files, handling retries and part size.

**Question 3. When using the Google Cloud Storage Python client, which parameter specifies the destination bucket and object name in a single call?**

A) bucket_name

B) destination_blob_name

C) source_file_name

D) blob_name

Answer: B

Explanation: destination_blob_name defines the full path (including object name) within the target bucket. In Google's sample upload code the bucket itself is passed separately as bucket_name.

**Question 4. Which Azure SDK class is used to interact with Azure Blob Storage in Python?**

A) BlobServiceClient

B) AzureBlobClient

C) StorageBlobClient

D) BlobContainerClient

Answer: A

Explanation: BlobServiceClient is the entry point for managing containers and blobs in Azure Storage.

**Question 5. In a Kafka producer written in Python (confluent-kafka), what does the linger.ms configuration (spelled linger_ms in kafka-python) control?**

A) Maximum size of a batch before sending

B) Time to wait for additional records before sending a batch

C) Number of retries on failure

D) Compression algorithm

Answer: B

A) spark.sql.shuffle.partitions

B) spark.default.parallelism

C) spark.sql.autoBroadcastJoinThreshold

D) spark.executor.instances

Answer: A

Explanation: spark.sql.shuffle.partitions determines how many partitions are created during shuffle operations.

**Question 9. In a serverless AWS Lambda function written in Python, which environment variable provides the name of the invoked Lambda function?**

A) AWS_LAMBDA_FUNCTION_NAME

B) LAMBDA_FUNCTION_NAME

C) AWS_FUNCTION_NAME

D) FUNCTION_NAME

Answer: A

Explanation: AWS_LAMBDA_FUNCTION_NAME is automatically set by the Lambda runtime.

**Question 10. Which of the following is a correct way to register a Python UDF in Google BigQuery using the google-cloud-bigquery library?**

A) client.register_udf('my_udf', my_function)

B) client.create_udf('my_udf', my_function)

C) client.query('CREATE TEMP FUNCTION my_udf(x INT64) AS ( ... )')

D) client.create_routine('my_udf', routine_type='SCALAR', language='PYTHON', arguments=..., body=...)

Answer: D

Explanation: BigQuery supports Python UDFs via the Routine API; create_routine defines a SCALAR routine with Python code. In the google-cloud-bigquery library, create_routine() takes a bigquery.Routine object, so option D is best read as a schematic of the routine's fields rather than a literal call signature.

**Question 11. Which file format combines columnar storage with ACID transaction support and is native to Delta Lake?**

A) Parquet

B) ORC

C) Delta

D) Avro

Answer: C

Explanation: Delta Lake uses the Delta format, built on Parquet files with transaction logs for ACID properties.

**Question 12. When performing Change Data Capture (CDC) from MySQL to a cloud data lake using Python, which MySQL feature provides a binary log of changes?**

A) General Log

B) Slow Query Log

C) Binary Log (binlog)

D) Error Log

Answer: C

A) BashOperator

B) PythonOperator

C) SparkSubmitOperator

D) HttpSensor

Answer: B

Explanation: PythonOperator runs a Python function as a task.

**Question 16. Which Airflow sensor is designed to wait for a file to appear in an S3 bucket?**

A) S3KeySensor

B) S3FileSensor

C) S3ObjectSensor

D) S3PathSensor

Answer: A

Explanation: S3KeySensor checks for the existence of a key (file) in S3.

**Question 17. In Terraform, which provider block is required to manage AWS resources via Python scripts executed as local-exec?**

A) provider "aws" {}

B) provider "python" {}

C) provider "local" {}

D) provider "aws_lambda" {}

Answer: A

Explanation: The AWS provider configures credentials; local-exec can invoke Python scripts.

**Question 18. Which Python package can generate realistic fake personal data for data masking purposes?**

A) faker

B) mockaroo

C) data-faker

D) random-person

Answer: A

Explanation: faker creates synthetic personal data (names, addresses, etc.) useful for masking.

**Question 19. When encrypting data client-side before uploading to Azure Blob Storage, which Azure service provides the key management?**

A) Azure Key Vault

B) Azure Storage Encryption

C) Azure Secrets Manager

D) Azure AD

Answer: A

Explanation: Azure Key Vault stores and manages encryption keys used for client-side encryption.

D) Couchbase

Answer: C

Explanation: pymongo is the official MongoDB driver for Python.

**Question 23. When using DynamoDB with Python (boto3), which method performs a conditional write that succeeds only if an attribute does not already exist?**

A) put_item(ConditionExpression=…)

B) update_item(ConditionExpression=…)

C) insert_item(…)

D) write_item(…)

Answer: A

Explanation: put_item with a ConditionExpression can enforce that an attribute is absent before inserting.

**Question 24. Which Python library enables vector similarity search against a Pinecone index?**

A) pinecone-client

B) pinecone-sdk

C) pinecone-py

D) pinecone-vector

Answer: A

Explanation: pinecone-client is the official Python client for interacting with Pinecone vector databases (newer releases are published under the package name pinecone).

**Question 25. In a data lake on S3, which naming convention best supports Hive partition pruning?**

A) /year=2023/month=07/day=15/…

B) /2023/07/15/…

C) /data_20230715_…

D) /partition/…

Answer: A

Explanation: Using key=value pairs (year=…, month=…) matches Hive’s partitioning scheme, enabling automatic pruning.

**Question 26. Which of the following file formats provides built-in schema evolution and is optimized for streaming writes?**

A) Parquet

B) Avro

C) ORC

D) JSON

Answer: B

Explanation: Avro supports schema evolution and is designed for efficient writes, especially in streaming pipelines.

**Question 27. In CloudWatch Logs, which Python SDK method retrieves log events for a given log stream?**

A) get_log_events()

Answer: B

Explanation: MEMORY_AND_DISK stores partitions in memory and writes overflow to disk, reducing recomputation.

**Question 30. Which Python decorator is used to cache the result of a function call in memory for the duration of the program?**

A) @lru_cache

B) @cache

C) @memoize

D) @cached_property

Answer: A

Explanation: @lru_cache from functools caches function results based on input arguments. With its default maxsize of 128 the least recently used entries can be evicted; @lru_cache(maxsize=None), or the equivalent functools.cache in Python 3.9+, keeps every result for the life of the process.

**Question 31. In Azure Data Factory, which Python activity allows you to run custom Python code on an Azure Batch pool?**

A) AzureFunctionActivity

B) DatabricksNotebookActivity

C) CustomActivity

D) PythonScriptActivity

Answer: C

Explanation: CustomActivity can execute arbitrary scripts, including Python, on Azure Batch.

**Question 32. Which GCP service offers a managed vector database that can be accessed via the google-cloud-aiplatform Python client?**

A) Vertex AI Matching Engine

B) BigQuery Vector Search

C) Cloud MemoryStore

D) Cloud Spanner

Answer: A

Explanation: Vertex AI Matching Engine (since renamed Vertex AI Vector Search) provides vector similarity search with a Python client.

**Question 33. When using pandas.read_json() with lines=True, what format is expected?**

A) A single JSON object

B) An array of JSON objects

C) NDJSON (newline-delimited JSON)

D) JSON with comments

Answer: C

Explanation: lines=True tells pandas to parse each line as a separate JSON record (NDJSON).

**Question 34. Which Python package provides a high-level API for building data pipelines that can run on multiple execution engines (e.g., Spark, Dask, Pandas)?**

A) luigi

B) prefect

Explanation: hashlib provides SHA-256 and other hash functions for binary data.

**Question 37. In Airflow, what does setting retries=3 and retry_delay=timedelta(minutes=5) on a task accomplish?**

A) The task will run three times in parallel with a 5-minute gap

B) After a failure, the task will be retried up to three times, waiting 5 minutes between attempts

C) The task will be skipped after three failures

D) The task will delay its first run by 5 minutes

Answer: B

Explanation: retries defines the maximum retry attempts; retry_delay sets the wait time between retries.

**Question 38. Which of the following is a best practice for handling large CSV files in a Lambda function?**

A) Load the entire file into memory using pandas

B) Use streaming/iterators (e.g., csv.reader) to process line by line

C) Write the file to /tmp and process it there

D) Increase Lambda memory to 10 GB

Answer: B

Explanation: Streaming processing avoids memory exhaustion, which is critical in Lambda’s limited environment.

**Question 39. When using boto3 to assume an IAM role in another AWS account, which method is called?**

A) sts.assume_role()

B) iam.assume_role()

C) sts.get_federation_token()

D) sts.get_session_token()

Answer: A

Explanation: sts.assume_role returns temporary credentials for the target role.

**Question 40. Which Python library provides a simple way to create Docker images from a Python script for containerized pipelines?**

A) docker-py

B) pyspark-docker

C) dockerfile-generator

D) docker

Answer: D

Explanation: The docker (docker-py) library allows programmatic creation and management of Docker images and containers.

**Question 41. In GCP, which service stores metadata about data lineage that can be accessed via the google-cloud-datacatalog Python client?**

A) Cloud Asset Inventory

B) Data Catalog

C) Cloud Logging

D) Cloud Trace
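
The streaming approach from Question 38 can be sketched with the standard library alone. The helper `sum_column` and the `amount` column are hypothetical, and csv.DictReader is used here as a header-aware variant of csv.reader; in a Lambda handler the text stream would typically wrap the downloaded object's body rather than an in-memory string.

```python
import csv
import io
from typing import TextIO


def sum_column(stream: TextIO, column: str) -> float:
    """Total a numeric column while holding only one row in memory at a time."""
    reader = csv.DictReader(stream)  # reads lazily, one row per iteration
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total


# In-memory stand-in for a large CSV file
sample = io.StringIO("order_id,amount\n1,10.5\n2,4.5\n3,5.0\n")
total = sum_column(sample, "amount")
```

Memory use stays roughly constant regardless of file size, since each row is discarded once it has been added to the running total.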

**Question 44. Which Python library can be used to interact with Apache Hive Metastore for managing table schemas?**

A) pyhive

B) hive-client

C) hms-api

D) impyla

Answer: A

Explanation: pyhive provides a DB-API compatible interface to Hive, allowing schema queries.

**Question 45. In Azure Synapse, which Python library is recommended for executing T-SQL statements via the serverless SQL pool?**

A) pyodbc

B) sqlalchemy-azure

C) azure-synapse-spark

D) synapse-sql-client

Answer: A

Explanation: pyodbc can connect to Synapse’s SQL endpoint using ODBC drivers.

**Question 46. Which GCP service provides a managed, auto-scaling Spark environment that can be accessed via the google-cloud-dataproc Python client?**

A) Dataflow

B) Dataproc

C) Composer

D) BigQuery

Answer: B

Explanation: Dataproc offers managed Spark clusters; the Python client manages jobs and clusters.

**Question 47. What is the primary advantage of using generators in Python for processing very large datasets?**

A) They automatically parallelize the code

B) They load the entire dataset into memory

C) They yield items one at a time, reducing memory usage

D) They compress data on the fly

Answer: C

Explanation: Generators produce items lazily, keeping only the current item in memory.

**Question 48. In Airflow, which parameter of the DAG object controls the maximum number of active runs for that DAG?**

A) max_active_runs

B) concurrency

C) schedule_interval

D) catchup

Answer: A
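
The lazy evaluation described in Question 47 can be sketched as follows. `read_records` and `batched` are hypothetical helpers, and the record source is simulated with range so the example runs without external data.

```python
from itertools import islice
from typing import Iterable, Iterator


def read_records(n: int) -> Iterator[dict]:
    """Yield records one at a time instead of building a full list."""
    for i in range(n):
        yield {"id": i, "value": i * 2}


def batched(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group a lazy stream into fixed-size batches, e.g. for bulk writes."""
    iterator = iter(records)
    # islice pulls at most `size` items; an empty list ends the loop
    while batch := list(islice(iterator, size)):
        yield batch


# Only one batch of up to 1,000 records is materialised at any moment
batch_sizes = [len(b) for b in batched(read_records(2_500), 1_000)]
```

Chaining generators this way keeps peak memory proportional to the batch size rather than to the total number of records.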
