










































The Cassandra Ultimate Exam is a technical certification preparation resource focused on Apache Cassandra database administration and distributed data management systems. Topics include database architecture, data modeling, replication, cluster management, performance optimization, query language operations, troubleshooting, scalability, and security practices. This exam is ideal for database administrators, software developers, and IT professionals seeking expertise in NoSQL database technologies and enterprise-level data solutions.
Typology: Exams











































Question 1. Which components of the CAP theorem does Apache Cassandra prioritize?
A) Consistency
B) Availability
C) Partition tolerance
D) Both B and C
Answer: D
Explanation: Cassandra is designed as an AP system, emphasizing high availability and partition tolerance; by default it trades strong consistency for eventual consistency, although the consistency level is tunable per request.

Question 2. In Cassandra's gossip protocol, what is the primary purpose of the Phi Accrual Failure Detector?
A) To assign token ranges to new nodes
B) To calculate read repair probabilities
C) To detect node failures based on heartbeat arrival intervals
D) To balance data across data centers
Answer: C
Explanation: The Phi Accrual Failure Detector monitors the intervals between heartbeats and raises a suspicion level (phi); a node is considered down once phi crosses a configured threshold.

Question 3. Which partitioner, the default in current Cassandra releases, provides a uniform distribution of tokens across the ring by hashing partition keys?
A) RandomPartitioner
B) ByteOrderedPartitioner
C) Murmur3Partitioner
D) OrderPreservingPartitioner
Answer: C
Explanation: Murmur3Partitioner hashes partition keys with the MurmurHash3 algorithm and maps them to a 64-bit token space (-2^63 to 2^63-1), which distributes data evenly. RandomPartitioner, by contrast, uses MD5.
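
The token-based placement described in Question 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard library has no MurmurHash3, so the hypothetical `token_for` helper below uses the first 8 bytes of an MD5 digest as a stand-in hash that produces a signed 64-bit token, and the node names and tokens are made up. The ownership rule shown (a node owns the range ending at its own token, with wrap-around) reflects how a token ring is usually described, not Cassandra's actual source code.

```python
import bisect
import hashlib
import struct


def token_for(key):
    """Stand-in token function (assumption): NOT MurmurHash3.

    Takes the first 8 bytes of an MD5 digest and reads them as a signed
    64-bit integer, so the result falls in [-2**63, 2**63 - 1].
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return struct.unpack(">q", digest[:8])[0]


def build_ring(node_tokens):
    """Turn a {node_name: token} mapping into a list sorted by token."""
    return sorted((token, node) for node, token in node_tokens.items())


def owner_for_token(ring, token):
    """Return the node owning `token`: the first node whose token is >= it.

    Tokens larger than every node token wrap around to the first node.
    """
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token)
    if idx == len(ring):
        idx = 0  # wrap around the ring
    return ring[idx][1]


def owner(ring, key):
    """Hash a partition key and look up the node that owns its token."""
    return owner_for_token(ring, token_for(key))


if __name__ == "__main__":
    # Hypothetical three-node ring with hand-picked tokens.
    ring = build_ring({"node-a": -100, "node-b": 100, "node-c": 200})
    print(owner_for_token(ring, 0))   # falls in (-100, 100]
    print(owner(ring, "user:42"))
```

Because placement depends only on the key's hash, every coordinator computes the same owner for the same partition key without consulting a central directory.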
Question 4. When using SimpleStrategy, which factor determines how many replicas are stored for each piece of data?
A) Number of data centers
B) Replication Factor (RF)
C) Number of nodes in the cluster
D) Token range size
Answer: B
Explanation: SimpleStrategy places replicas on consecutive nodes around the ring based solely on the configured Replication Factor, without regard to rack or data center.

Question 5. Which snitch is most appropriate for a multi-region deployment on AWS?
A) SimpleSnitch
B) GossipingPropertyFileSnitch
C) RackInferringSnitch
D) Ec2MultiRegionSnitch
Answer: D
Explanation: Ec2MultiRegionSnitch treats each EC2 region as a data center and each availability zone as a rack, and uses public IP addresses for cross-region communication, which suits AWS multi-region clusters. GossipingPropertyFileSnitch is a common cloud-agnostic alternative.

Question 6. In a Cassandra primary key, the partition key is responsible for:
A) Sorting rows inside a partition
B) Determining which node stores the row
C) Enforcing uniqueness of columns
D) Defining secondary indexes
Answer: B
Explanation: The hash of the partition key determines the token, and thus the node that owns the row.

Question 7. Which of the following is true about clustering columns?
A) They are part of the partition key
D) To improve write throughput
Answer: B
Explanation: UDTs (user-defined types) let you embed a structured object (multiple fields) inside a column, promoting reuse without creating separate tables.

Question 11. Which statement about secondary indexes in Cassandra is correct?
A) They are stored on the coordinator node only
B) They provide constant-time lookups across the entire cluster
C) They work best on low-cardinality columns in small tables
D) They automatically shard data across all nodes
Answer: C
Explanation: Secondary indexes are efficient for low-cardinality columns on modestly sized tables; on high-cardinality columns or large tables they cause performance problems.

Question 12. Materialized views are primarily used to:
A) Replace primary keys with composite keys
B) Provide read-only, pre-computed query results without manual denormalization
C) Enable ACID transactions across tables
D) Store data in a different consistency level than the base table
Answer: B
Explanation: Materialized views automatically maintain a denormalized copy of the base table's data to serve specific query patterns.

Question 13. Which design pattern helps avoid "hot partitions" in a time-series workload?
A) Using a single static partition key
B) Adding a bucket suffix (e.g., day, hour) to the partition key
C) Storing timestamps as clustering columns only
D) Relying on secondary indexes for time filtering
Answer: B
Explanation: Adding a time bucket (day/hour) to the partition key spreads writes across many partitions, preventing any single partition from becoming a hotspot.

Question 14. Which CQL command creates a keyspace with NetworkTopologyStrategy for two data centers, DC1 with RF=3 and DC2 with RF=2?
A) CREATE KEYSPACE ks WITH replication = {'class':'SimpleStrategy','replication_factor':5};
B) CREATE KEYSPACE ks WITH replication = {'class':'NetworkTopologyStrategy','DC1':3,'DC2':2};
C) CREATE KEYSPACE ks WITH replication = {'class':'NetworkTopologyStrategy','replication_factor':5};
D) CREATE KEYSPACE ks WITH replication = {'class':'SimpleStrategy','DC1':3,'DC2':2};
Answer: B
Explanation: NetworkTopologyStrategy requires specifying each data-center name and its replication factor.

Question 15. Which CQL clause enables a conditional insert that succeeds only if the row does not already exist?
A) INSERT ... IF NOT EXISTS
B) INSERT ... USING TTL
C) INSERT ... IF EXISTS
D) INSERT ... USING TIMESTAMP
Answer: A
Explanation: The IF NOT EXISTS clause triggers a lightweight transaction (LWT) that checks for row existence before inserting.

Question 16. In a lightweight transaction, which consensus algorithm does Cassandra use?
A) Raft
B) Two-phase commit
C) Paxos
D) Zookeeper quorum
Answer: A
Explanation: Logged batches write a batch log to ensure atomicity across partitions, adding extra I/O overhead.

Question 20. Which built-in CQL function returns the current Unix timestamp in milliseconds?
A) now()
B) unixTimestamp()
C) toTimestamp(now())
D) toUnixTimestamp(now())
Answer: D
Explanation: toUnixTimestamp(now()) converts the timeuuid returned by now() to a Unix epoch timestamp in milliseconds.

Question 21. What is the primary purpose of the commit log in Cassandra's write path?
A) To store data in a column-family format
B) To provide durability by persisting writes before memtables are flushed
C) To index rows for faster reads
D) To manage compaction schedules
Answer: B
Explanation: The commit log records every mutation sequentially; if a node crashes before its memtables are flushed to SSTables, the unflushed writes are replayed from the commit log on restart.

Question 22. Memtables in Cassandra are:
A) Immutable on-disk files
B) In-memory write buffers that are flushed to SSTables when full or after a time interval
C) Compaction logs for merging SSTables
D) Temporary storage for hinted handoff data
Answer: B
Explanation: Memtables hold recent writes in memory; when they reach a threshold they are flushed to disk as SSTables.

Question 23. Which component helps the read path avoid unnecessary disk I/O by quickly ruling out absent keys?
A) Index Summary
B) Bloom Filter
C) Key Cache
D) Compaction Strategy
Answer: B
Explanation: Bloom filters are probabilistic data structures that report whether a partition key is definitely absent from, or possibly present in, an SSTable, preventing needless disk reads.

Question 24. What does the key cache store in Cassandra?
A) Full rows for hot partitions
B) Mapping of partition keys to SSTable locations
C) Results of recent secondary index lookups
D) Compaction progress metadata
Answer: B
Explanation: The key cache keeps the on-disk positions of partition keys, allowing faster seeks during reads.

Question 25. In SizeTieredCompactionStrategy (STCS), which condition triggers a compaction?
A) When an SSTable reaches a specific size threshold
B) When the number of SSTables in a size tier reaches a configurable count
C) When a time window expires
D) When the total number of tombstones surpasses a limit
Answer: B
Explanation: STCS groups SSTables of similar size into tiers and compacts a tier once it holds at least min_threshold SSTables (default 4).
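
The size-tiering rule from Question 25 can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not Cassandra's implementation: the function names are hypothetical, SSTables are represented only by their sizes, and a table joins a tier when its size lies within `bucket_low` and `bucket_high` times the tier's running average (0.5 and 1.5 mirror the documented STCS defaults, as does `min_threshold=4`). Options such as `max_threshold` and `min_sstable_size` are ignored.

```python
def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into tiers of similar size.

    A size joins the first tier whose running average `avg` satisfies
    avg * bucket_low <= size <= avg * bucket_high; otherwise it starts
    a new tier.
    """
    buckets = []  # each entry: {"avg": float, "sizes": [int, ...]}
    for size in sorted(sizes):
        for bucket in buckets:
            low = bucket["avg"] * bucket_low
            high = bucket["avg"] * bucket_high
            if low <= size <= high:
                bucket["sizes"].append(size)
                bucket["avg"] = sum(bucket["sizes"]) / len(bucket["sizes"])
                break
        else:
            # No existing tier is close enough in size: open a new one.
            buckets.append({"avg": float(size), "sizes": [size]})
    return [bucket["sizes"] for bucket in buckets]


def compaction_candidates(sizes, min_threshold=4):
    """Return the tiers holding at least `min_threshold` SSTables."""
    return [b for b in bucket_sstables(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    # Hypothetical SSTable sizes in MB.
    sizes_mb = [10, 11, 12, 9, 100, 110, 1000]
    print(bucket_sstables(sizes_mb))
    print(compaction_candidates(sizes_mb))
```

With the sample sizes above, only the tier of four small SSTables qualifies; the two mid-sized tables and the single large one wait until their tiers fill up.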
Answer: C
Explanation: With RF=3, QUORUM = floor(3/2) + 1 = 2. Since 2 (read) + 2 (write) = 4 > 3, using QUORUM for both reads and writes ensures strong consistency.

Question 30. Which anti-entropy mechanism stores write hints on a live node when a replica is temporarily down?
A) Read Repair
B) Hinted Handoff
C) Merkle Tree Repair
D) Gossip Repair
Answer: B
Explanation: With hinted handoff, the coordinator stores a hint for the unavailable replica; once that replica recovers, the hint is replayed to it.

Question 31. When is a background read repair triggered?
A) On every read request
B) Only when a read returns a mismatch between replicas
C) Periodically by the repair service regardless of reads
D) When a node joins the cluster
Answer: B
Explanation: Background read repair runs asynchronously after a read detects divergent data among replicas.

Question 32. Which nodetool command initiates a manual repair for a specific keyspace and table?
A) nodetool repair -ks keyspace -tb table
B) nodetool cleanup -ks keyspace -tb table
C) nodetool repair keyspace table
D) nodetool compact -ks keyspace -tb table
Answer: C
Explanation: The syntax nodetool repair <keyspace> <table> starts a repair process for the given keyspace and table.

Question 33. When adding a new node to a Cassandra cluster, which operation streams data to the newcomer?
A) Rebuild
B) Repair
C) Bootstrap
D) Decommission
Answer: C
Explanation: Bootstrap streams the appropriate token ranges from existing nodes to the new node, allowing it to become a full member.

Question 34. Which hardware characteristic most directly influences Cassandra's read latency?
A) CPU core count
B) Disk IOPS and latency
C) Network bandwidth
D) JVM heap size
Answer: B
Explanation: Reads often require disk I/O; faster disks (SSD/NVMe) with lower latency reduce read times.

Question 35. For a write-heavy workload, which JVM garbage collector is generally recommended for Cassandra 4.x?
A) ParallelGC
B) CMS
C) G1GC
D) ZGC
Question 39. To restore data from a snapshot to a newly added node, which tool is typically used?
A) sstableloader
B) nodetool repair
C) cqlsh COPY FROM
D) cassandra-restore
Answer: A
Explanation: sstableloader streams SSTables from the snapshot into the cluster, sending each row to the nodes that own its token range.

Question 40. Which of the following authentication mechanisms can be enabled in Cassandra to integrate with an existing LDAP directory?
A) PasswordAuthenticator only
B) AllowAllAuthenticator only
C) LDAPAuthenticator (via DSE)
D) KerberosAuthenticator (native)
Answer: C
Explanation: DataStax Enterprise provides LDAPAuthenticator to delegate authentication to an LDAP/AD server.

Question 41. Transparent Data Encryption (TDE) in Cassandra primarily protects data at:
A) Rest (on disk)
B) In transit (network)
C) In memory (heap)
D) During compaction only
Answer: A
Explanation: TDE, a DataStax Enterprise feature, encrypts SSTables and commit logs on disk, securing data at rest.

Question 42. TLS/SSL in Cassandra is used to protect:
A) Only client-to-node communication
B) Only node-to-node (internode) traffic
C) Both client-to-node and internode traffic when configured
D) Only data stored in snapshots
Answer: C
Explanation: By configuring client_encryption_options and server_encryption_options, Cassandra can encrypt both client-to-node and internode communication.

Question 43. In a multi-data-center deployment, which consistency level guarantees that the read reflects the most recent write in the local data center only?
A) ALL
B) QUORUM
C) LOCAL_QUORUM
D) EACH_QUORUM
Answer: C
Explanation: LOCAL_QUORUM requires a quorum of replicas within the local DC. When writes also use LOCAL_QUORUM, reads are fast and consistent within that DC, without waiting on remote data centers.

Question 44. When deploying Cassandra on Kubernetes using the K8ssandra operator, which custom resource defines the desired cluster topology?
A) CassandraCluster
B) CassandraDatacenter
C) CassandraNodePool
D) CassandraStatefulSet
Answer: B
Explanation: The CassandraDatacenter custom resource, managed by the cass-operator component that K8ssandra builds on, specifies the racks, the number of nodes, and other topology details.

Question 45. Which of the following is a recommended practice to avoid "hot partitions" caused by a monotonically increasing primary key?
C) It removes the need for the index altogether
D) It guarantees O(1) query performance
Answer: B
Explanation: Even with a secondary index, if the query filters on non-indexed columns, Cassandra may need to scan entire partitions, making ALLOW FILTERING risky.

Question 49. When configuring a keyspace for a single-region deployment with three racks, which replication strategy should you use to ensure rack-aware replication?
A) SimpleStrategy
B) NetworkTopologyStrategy with a single DC and RF=3
C) NetworkTopologyStrategy with three DC entries (one per rack)
D) GossipingPropertyFileSnitch only
Answer: B
Explanation: In a single-region setup, NetworkTopologyStrategy with the DC's RF set to 3 places replicas on distinct racks, provided the snitch reports rack information.

Question 50. Which token allocation method helps avoid token "skew" when adding multiple nodes at once?
A) Random token assignment
B) Manual token assignment using a uniform step size
C) Using the nodetool move command after node addition
D) Assigning all new nodes the same token range
Answer: B
Explanation: Manually calculating tokens with equal spacing ensures even data distribution and prevents skew.

Question 51. If a table has a clustering column defined as clustering_column DESC, what effect does this have on query results?
A) Rows are stored in ascending order but returned in descending order automatically
B) Rows are stored and returned in descending order within each partition
C) Cassandra ignores the DESC keyword and always stores ascending
D) It only affects the order of secondary index entries
Answer: B
Explanation: The DESC modifier tells Cassandra to store rows in descending order of the clustering column, so queries return them in that order without additional sorting.

Question 52. Which of the following is a limitation of using a map collection in a table?
A) Maps cannot be indexed at all
B) Maps cannot contain null values
C) Maps are limited to 100 entries per row
D) Updating a single entry of a frozen map rewrites the entire map in the row
Answer: D
Explanation: A frozen map is stored as a single value, so modifying one entry requires rewriting the whole map, which can be expensive for large maps. Non-frozen maps allow individual entries to be updated.

Question 53. In a multi-DC cluster, which consistency level ensures that a write is acknowledged by a quorum of replicas in each data center?
A) ALL
B) QUORUM
C) EACH_QUORUM
D) LOCAL_QUORUM
Answer: C
Explanation: EACH_QUORUM requires a quorum of replicas in every data center, guaranteeing cross-DC durability.

Question 54. Which of the following statements about nodetool repair is correct?
A) It only repairs data on the node where it is executed
B) It performs a full table scan regardless of the token ranges specified
Answer: D
Explanation: The - operator removes a specific element from a list by value.

Question 58. Which of the following is a recommended practice for sizing the JVM heap in a Cassandra node?
A) Set heap size equal to total RAM
B) Keep heap size between 8 GB and 16 GB to avoid long GC pauses
C) Use a heap larger than 32 GB for better caching
D) Disable the heap entirely and rely on off-heap storage
Answer: B
Explanation: Keeping the heap within 8-16 GB allows G1GC to manage pauses efficiently; larger heaps can cause long GC cycles.

Question 59. In a Cassandra query, which clause can be used to limit the number of rows returned without scanning the entire partition?
A) LIMIT
B) FETCH FIRST
C) TOP
D) ROWS ONLY
Answer: A
Explanation: The LIMIT clause stops reading after the specified number of rows, reducing I/O when the partition is large.

Question 60. Which of the following is true about the WRITETIME function in CQL?
A) It returns the timestamp used for the last write of the entire row
B) It can only be used on primary key columns
C) It returns the microsecond-precision write timestamp of a specific column
D) It is only available in Cassandra 5.0 and later
Answer: C
Explanation: WRITETIME(column_name) returns the write timestamp (in microseconds since the Unix epoch) for that particular column.
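
To make the microsecond unit in Question 60 concrete, the following Python sketch converts a WRITETIME-style value into a readable UTC time and picks the most recently written column. It is only an illustration: the helper names are hypothetical, the sample values are invented, and no Cassandra driver is involved.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def writetime_to_datetime(writetime_us):
    """Convert microseconds since the Unix epoch to a UTC datetime.

    Integer timedelta arithmetic avoids the rounding that dividing by
    1e6 and using floating-point seconds could introduce.
    """
    return EPOCH + timedelta(microseconds=writetime_us)


def latest_written_column(writetimes):
    """Given {column_name: writetime_us}, return the newest column.

    Each column carries its own timestamp, so different columns of the
    same row can report different write times.
    """
    return max(writetimes, key=writetimes.get)


if __name__ == "__main__":
    # Hypothetical per-column write timestamps for one row.
    row_writetimes = {
        "name": 1_700_000_000_000_000,
        "email": 1_700_000_000_123_456,
    }
    for column, ts in row_writetimes.items():
        print(column, writetime_to_datetime(ts).isoformat())
    print(latest_written_column(row_writetimes))
```

A common mistake is to treat the value as milliseconds, which yields dates tens of thousands of years in the future; dividing by 1,000,000 (not 1,000) gives seconds.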
Question 61. Which of the following describes the effect of setting gc_grace_seconds to a very low value (e.g., 0) on a table?
A) Tombstones are never removed, causing storage bloat
B) Tombstones become eligible for removal at the next compaction, risking resurrected deleted data during repairs
C) Write latency is reduced because no commit log is written
D) Reads become eventually consistent only after a full repair
Answer: B
Explanation: A low gc_grace_seconds means tombstones can be purged quickly, but if a replica was down during a delete, it may miss the tombstone and later resurrect the deleted row during repair.

Question 62. Which of the following options is the most efficient way to retrieve the latest N rows for a given partition key ordered by a descending clustering column?
A) SELECT * FROM table WHERE partition_key = ? ORDER BY clustering_column DESC LIMIT N;
B) SELECT * FROM table WHERE partition_key = ? LIMIT N;
C) SELECT * FROM table WHERE partition_key = ? ALLOW FILTERING;
D) SELECT * FROM table WHERE clustering_column = ? LIMIT N;
Answer: A
Explanation: Providing the partition key and ordering by the clustering column (which is stored in that order) allows Cassandra to read only the needed rows, and LIMIT stops after N rows.

Question 63. In a multi-region Cassandra setup, which is the minimum replication configuration that ensures each region stores a full copy of the data?
A) RF=2 with SimpleStrategy
B) NetworkTopologyStrategy with RF=3 in each region
C) NetworkTopologyStrategy with RF=1 per region
D) SimpleStrategy with RF equal to number of regions
Answer: C