




























































































Exam tests cloud-native data engineering with Python. Topics: data pipelines, ETL frameworks, API integrations, cloud storage/processing, and data security. Audience: data engineers and developers. Format: practical coding tasks, MCQs, and case studies. Difficulty: high due to coding + cloud integration knowledge. Certification validates ability to design and manage cloud data workflows using Python.
Typology: Exams





























































































Question 1. Which Python data structure is most appropriate for storing unique elements with fast lookup times?
A) List
B) Tuple
C) Set
D) Dictionary
Answer: C
Explanation: Sets in Python are designed to store unique elements and provide O(1) average time complexity for lookups, making them ideal for this purpose.

Question 2. What is the primary purpose of the 'with' statement in file handling?
A) To open a file for writing only
B) To automatically manage resource cleanup after file operations
C) To read data from a file
D) To create a new file if it does not exist
Answer: B
Explanation: The 'with' statement ensures that the file is properly closed after its suite finishes, even if an error occurs, managing resources efficiently.

Question 3. Which Python library is most commonly used for data manipulation and analysis?
A) NumPy
B) Pandas
C) Matplotlib
D) Seaborn
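As a quick illustration of Questions 1 and 2, the sketch below shows a set discarding duplicates and answering membership tests, followed by a 'with' block that closes its file on exit. The sample IDs and the temporary file path are made up for the example.

```python
import os
import tempfile

# Question 1: a set keeps one copy of each element and supports fast membership tests.
raw_ids = ["a1", "b2", "a1", "c3", "b2"]  # hypothetical sample data
unique_ids = set(raw_ids)
has_b2 = "b2" in unique_ids   # hash lookup, O(1) on average
has_z9 = "z9" in unique_ids

# Question 2: 'with' closes the file when the block exits, even if an error occurs.
path = os.path.join(tempfile.mkdtemp(), "ids.txt")  # throwaway location
with open(path, "w", encoding="utf-8") as handle:
    handle.write("\n".join(sorted(unique_ids)))

closed_after_block = handle.closed  # the file object is already closed here

with open(path, encoding="utf-8") as handle:
    lines_read = handle.read().splitlines()
```

No explicit close() call appears anywhere; the context manager handles it.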
Question 5. Which of the following file formats is optimized for big data storage and querying?
A) CSV
B) JSON
C) Parquet
D) TXT
Answer: C
Explanation: Parquet is a columnar storage file format optimized for big data processing and efficient querying, especially in distributed systems.

Question 6. Which exception handling block is used to execute cleanup code regardless of whether an exception was raised?
A) try-except
B) try-finally
C) except-else
D) try-except-else
Answer: B
Explanation: The 'finally' block executes code regardless of whether an exception occurred, often used for cleanup actions like closing files.

Question 7. Which package manager is used to install Python packages?
A) conda
B) pip
C) npm
D) apt-get
Answer: B
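The try-finally behavior from Question 6 can be sketched as follows. The function name and the `events` log are hypothetical, introduced only to make the execution order visible.

```python
events = []  # records the order in which things happen

def read_first_line(lines):
    """Return the first line, logging cleanup whether or not an error occurs."""
    events.append("open")
    try:
        return lines[0]           # raises IndexError on an empty list
    finally:
        events.append("cleanup")  # runs on both success and failure

first = read_first_line(["id,name", "1,Ada"])

try:
    read_first_line([])
except IndexError:
    events.append("handled")
```

The cleanup entry is logged before the caller's except clause runs, because 'finally' executes while the exception is still propagating out of the function.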
A) Creates a list from an iterable
B) Creates an array object for numerical computations
C) Performs matrix multiplication
D) Converts a list to a set
Answer: B
Explanation: np.array() converts a list or other iterable into a NumPy array, enabling efficient numerical operations and vectorization.

Question 10. Which Python library is most suitable for creating visualizations like bar plots and histograms?
A) Pandas
B) NumPy
C) Matplotlib
D) Scikit-learn
Answer: C
Explanation: Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Question 11. Which cloud SDK is used to interact with Amazon S3 in Python?
A) google-cloud-storage
B) boto3
C) azure-storage-blob
D) cloudstorage
Answer: B
Explanation: boto3 is the AWS SDK for Python, enabling programmatic access to S3 and other AWS services.
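To make the np.array() explanation above concrete, here is a minimal sketch, assuming NumPy is installed. The price values and the 1.2 multiplier are invented for illustration.

```python
import numpy as np

prices = [10.0, 12.5, 8.0]   # hypothetical sample values
arr = np.array(prices)       # list -> ndarray of float64

with_tax = arr * 1.2         # one vectorized multiply, no explicit Python loop
loop_version = [p * 1.2 for p in prices]  # equivalent element-by-element loop

same_result = np.allclose(with_tax, loop_version)
```

Both versions compute the same numbers; the vectorized form pushes the loop into compiled code, which is where the performance gain comes from on large arrays.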
B) google-cloud-bigquery
C) sqlalchemy
D) psycopg
Answer: B
Explanation: google-cloud-bigquery is the official client library for interacting with Google BigQuery from Python.

Question 14. How can you build a serverless data pipeline using AWS services?
A) Using EC2 instances directly
B) Using AWS Lambda functions triggered by events
C) Using Amazon S3 only
D) Using AWS Elastic Beanstalk
Answer: B
Explanation: AWS Lambda allows you to run code in response to events, enabling serverless and event-driven architectures for data pipelines.

Question 15. Which method in Pandas is used to handle missing data?
A) fillna()
B) drop_duplicates()
C) merge()
D) groupby()
Answer: A
Explanation: fillna() replaces missing values with specified data, aiding in cleaning datasets with null entries.
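For the event-driven pattern in Question 14, a Lambda function is an ordinary Python callable that receives the triggering event. The sketch below assumes an S3 upload notification as the trigger; the handler name, bucket, and key are hypothetical, and the actual transform step is left as a comment.

```python
from urllib.parse import unquote_plus

def handler(event, context):
    """Collect (bucket, key) pairs from an S3-style event notification."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    # A real pipeline would read and transform each object here.
    return {"processed": objects}

# Hypothetical event, trimmed to the fields the handler reads.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "raw/2024+report.csv"}}}
    ]
}
result = handler(sample_event, None)
```

Because the handler is a plain function, it can be exercised locally with a hand-built event before any deployment.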
C) To pause the execution of a program
D) To handle exceptions
Answer: A
Explanation: 'yield' turns a function into a generator, allowing it to produce a sequence of values lazily, which is useful for large datasets.

Question 18. Which control flow statement is used to execute a block of code multiple times?
A) if
B) while
C) break
D) pass
Answer: B
Explanation: 'while' loops repeatedly execute a block as long as a condition is true, enabling iteration.

Question 19. When working with large datasets in Pandas, which method helps to process data in chunks to avoid memory overload?
A) read_csv() with chunksize parameter
B) merge()
C) apply()
D) drop_duplicates()
Answer: A
Explanation: Setting chunksize in read_csv() allows reading large files in smaller, manageable pieces, helping manage memory usage.
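The chunked reading described in Question 19 might look like the sketch below, assuming pandas is installed. A small in-memory CSV stands in for a large file on disk, and the column names are invented.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,amount\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 8))

total = 0
chunk_sizes = []
# With chunksize set, read_csv yields DataFrames of at most 3 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=3):
    chunk_sizes.append(len(chunk))
    total += int(chunk["amount"].sum())
```

Only one chunk is held in memory at a time, so the running total can be computed over files far larger than available RAM.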
C) To enable multi-threading
D) To facilitate data visualization
Answer: B
Explanation: Vectorization allows NumPy to perform batch operations efficiently, significantly improving performance over explicit loops.

Question 22. Which method in Pandas is used to remove duplicate rows from a DataFrame?
A) dropna()
B) drop_duplicates()
C) merge()
D) groupby()
Answer: B
Explanation: drop_duplicates() removes duplicate rows, aiding in data cleaning to ensure data integrity.

Question 23. When connecting to Amazon Redshift using Python, which library is most commonly used?
A) psycopg2
B) pymysql
C) cx_Oracle
D) pyodbc
Answer: A
Explanation: psycopg2 is a PostgreSQL adapter, and Redshift is compatible with PostgreSQL, making it suitable for Redshift connections.
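A short sketch of drop_duplicates() from Question 22, assuming pandas is installed. The DataFrame contents are made up, and the `subset`/`keep` variant is shown only as one common option.

```python
import pandas as pd

# Hypothetical rows; the first two are exact duplicates.
df = pd.DataFrame({"id": [1, 1, 2], "city": ["Oslo", "Oslo", "Lima"]})

deduped = df.drop_duplicates()                        # keeps the first occurrence by default
by_id = df.drop_duplicates(subset="id", keep="last")  # compare on 'id' only, keep the last match
```

The original index labels are preserved, so a follow-up reset_index(drop=True) is often useful when a clean 0..n-1 index is needed.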
C) get_object()
D) list_objects()
Answer: B
Explanation: upload_file() uploads a local file to an S3 bucket, essential for programmatic data storage.

Question 26. Which Python library provides functions for easy plotting of statistical graphics?
A) Matplotlib
B) Seaborn
C) Plotly
D) Bokeh
Answer: B
Explanation: Seaborn builds on Matplotlib to provide a high-level interface for attractive statistical graphics.

Question 27. Which method is used to convert a Pandas DataFrame to a JSON string?
A) to_csv()
B) to_json()
C) to_dict()
D) to_html()
Answer: B
Explanation: to_json() serializes a DataFrame into JSON format, useful for data interchange.
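For Question 27, a minimal sketch of to_json(), assuming pandas is installed. The rows are invented, and `orient="records"` is just one of several layouts the method supports.

```python
import json

import pandas as pd

df = pd.DataFrame({"id": [1, 2], "name": ["Ada", "Lin"]})  # hypothetical rows

as_json = df.to_json(orient="records")  # a JSON array with one object per row
parsed = json.loads(as_json)            # round-trip back into Python objects
```

Called without a path argument, to_json() returns the JSON text as a string rather than writing a file.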