This exam guide prepares developers for the Developer Associate credential focused on Apache Cassandra. Topics include data modeling, cluster architecture, replication, consistency, CQL, performance tuning, and fault tolerance. The guide emphasizes building scalable, highly available distributed data solutions.

Question 1. Which component in Cassandra’s write path guarantees durability before an insert is acknowledged to the client?
A) Memtable
B) Bloom filter
C) Commit log
D) SSTable
Answer: C
Explanation: The commit log records every mutation to disk before it is applied to the memtable, ensuring durability even if the node crashes.

Question 2. In Cassandra’s gossip protocol, what information is primarily exchanged between nodes?
A) Table schemas
B) Current node state and heartbeat timestamps
C) Query results
D) Compaction statistics
Answer: B
Explanation: Gossip disseminates each node’s state (up/down, load, token ranges) and heartbeat information to maintain cluster membership.

Question 3. Which replication strategy should be used for a multi-data-center deployment to control replicas per data center?
A) SimpleStrategy
B) NetworkTopologyStrategy
C) LocalStrategy
D) RackAwareStrategy
Answer: B
Explanation: NetworkTopologyStrategy lets you specify the number of replicas for each data center, which makes it the right choice for multi-DC clusters.

Question 4. What is the primary purpose of a Bloom filter in an SSTable?
A) Encrypt data at rest
B) Speed up row look-ups by probabilistically indicating key absence
C) Store secondary index data
D) Manage tombstone expiration
Answer: B
Explanation: Bloom filters provide a fast, memory-efficient way to test whether a key is definitely not in an SSTable. False positives are possible but false negatives are not, so they reduce unnecessary disk reads.

Question 5. When a read request can be satisfied from the memtable, which cache is bypassed?
A) Row cache
B) Key cache
C) Index cache
D) Commit log cache
Answer: B
Explanation: The key cache stores the locations of partitions within SSTables, so it is consulted only when SSTable data must be read. Data served from the memtable does not need it. In practice most reads merge memtable and SSTable data, so a read served purely from the memtable is a simplification.

Question 6. Which compaction strategy is best suited for write-heavy workloads with many small SSTables?
A) LeveledCompactionStrategy
B) SizeTieredCompactionStrategy
C) TimeWindowCompactionStrategy
D) DateTieredCompactionStrategy
D) They create a secondary index automatically
Answer: B
Explanation: Columns inside the first parentheses form a composite partition key, determining how data is distributed across nodes.

Question 10. Which of the following data types is NOT a collection in Cassandra?
A) set
B) list
C) map
D) tuple
Answer: D
Explanation: The collection types are set, list, and map. A tuple is a fixed-length, ordered group of typed values. It is neither a collection nor a user-defined type.

Question 11. When should a materialized view be avoided?
A) When you need eventual consistency
B) When the base table has high write throughput
C) When you need read-only access
D) When the view is defined on a single partition key
Answer: B
Explanation: Materialized views incur additional writes for each base-table mutation, so they are unsuitable for high-write tables.

Question 12. Which CQL statement correctly creates a keyspace with a replication factor of 3 in a single data center?
A) CREATE KEYSPACE ks WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':3};
B) CREATE KEYSPACE ks WITH REPLICATION = {'class':'NetworkTopologyStrategy','DC1':3};
C) CREATE KEYSPACE ks WITH REPLICATION = {'class':'SimpleStrategy','RF':3};
D) CREATE KEYSPACE ks WITH REPLICATION = {'class':'NetworkTopologyStrategy','replication_factor':3};
Answer: A
Explanation: SimpleStrategy uses the ‘replication_factor’ property, and option A follows the correct syntax. Option C uses an invalid property name (‘RF’). Option B is also valid when the data center is actually named DC1, and NetworkTopologyStrategy is generally preferred in production. Option A, however, does not depend on the data-center name.

Question 13. Which CQL clause is required to enforce a specific clustering order on a table?
A) WITH CLUSTERING ORDER BY
B) ORDER BY
C) CLUSTERING ORDER
D) USING CLUSTERING ORDER
Answer: A
Explanation: The table option “WITH CLUSTERING ORDER BY (col ASC/DESC)” defines the on-disk order of clustering columns.

Question 14. What does the CQL keyword ALLOW FILTERING do?
A) Enables secondary indexes on the query
B) Allows the query to bypass partition key restrictions, potentially scanning many rows
C) Forces the query to use the row cache
D) Optimizes the query for read-repair
Answer: B
Explanation: ALLOW FILTERING permits queries that would otherwise be rejected. Such queries cannot be answered from partition-key and clustering-key restrictions alone and may have to scan many partitions, possibly the whole table.
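The partition-key, clustering-order, and ALLOW FILTERING points in Questions 9, 13, and 14 can be illustrated with a toy in-memory model. This is a hypothetical sketch: the `ToyTable` class and its methods are invented for illustration and are not part of any Cassandra API. It assumes a partition key given as a tuple, a single clustering column, and CQL-style upsert semantics, and it ignores replication and storage entirely.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a CQL table (not a Cassandra API).

    partition key (tuple) -> {clustering value: row value}
    """

    def __init__(self, clustering_desc=False):
        self.partitions = defaultdict(dict)
        # True mirrors WITH CLUSTERING ORDER BY (col DESC)
        self.clustering_desc = clustering_desc

    def insert(self, partition_key, clustering, value):
        # Writing the same primary key again overwrites the row (upsert), as in CQL
        self.partitions[partition_key][clustering] = value

    def select_partition(self, partition_key):
        # Full partition key given: one direct lookup, rows returned in clustering order.
        # dict.get does not trigger the defaultdict factory, so no empty partition is created.
        rows = self.partitions.get(partition_key, {})
        return sorted(rows.items(), reverse=self.clustering_desc)

    def select_filtering(self, predicate):
        # No partition key: every partition must be visited, which is what
        # ALLOW FILTERING opts into. Returns the matches and the number of rows scanned.
        matches, scanned = [], 0
        for partition_key in self.partitions:
            for row in self.select_partition(partition_key):
                scanned += 1
                if predicate(row):
                    matches.append(row)
        return matches, scanned


if __name__ == "__main__":
    t = ToyTable(clustering_desc=True)
    t.insert(("sensor-1", "2024-01-01"), 10, "a")
    t.insert(("sensor-1", "2024-01-01"), 30, "b")
    t.insert(("sensor-1", "2024-01-02"), 20, "c")
    print(t.select_partition(("sensor-1", "2024-01-01")))  # [(30, 'b'), (10, 'a')]
    print(t.select_filtering(lambda row: row[1] == "c"))   # ([(20, 'c')], 3)
```

The lookup touches one partition, while the filtered query must read every row in every partition to find one match. That is why Cassandra refuses such queries unless ALLOW FILTERING is given explicitly.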
Explanation: During a read, if the coordinator detects differing data among replicas, it initiates a read-repair to synchronize them.

Question 18. Which of the following is true about hinted handoff?
A) It stores hints on the coordinator node for down replicas and replays them when they recover
B) It permanently stores hints in a separate keyspace
C) It only works for writes at CL=ALL
D) It replaces the need for repair
Answer: A
Explanation: Hinted handoff writes a small hint to the coordinator’s local storage, which is replayed when the target node comes back online.

Question 19. In a cluster with two data centers, each with RF=3, what Consistency Level ensures a write is persisted in a majority of replicas per data center?
A) QUORUM
B) EACH_QUORUM
C) LOCAL_QUORUM
D) ALL
Answer: B
Explanation: EACH_QUORUM requires a quorum in each data center (2 of 3), guaranteeing per-DC durability. By contrast, QUORUM needs only 4 of the 6 replicas overall, which could be 3 in one data center and 1 in the other. LOCAL_QUORUM covers only the coordinator’s data center.

Question 20. Which driver load-balancing policy is aware of token ranges and can route queries directly to the replica that owns the data?
A) RoundRobinPolicy
B) DCAwareRoundRobinPolicy
C) TokenAwarePolicy
D) LatencyAwarePolicy
Answer: C
Explanation: TokenAwarePolicy uses the partitioner’s token map to send requests to the replica node that owns the requested partition.

Question 21. What is the effect of setting “speculative_retry = '99percentile'” on a table?
A) The driver will retry failed queries automatically 99% of the time
B) The coordinator will issue a second read to another replica if the first read exceeds the 99th percentile latency
C) Writes will be duplicated to 99% of replicas for safety
D) The table will be compacted after 99% of its SSTables are merged
Answer: B
Explanation: Speculative retry triggers an additional read to a different replica when the initial read latency exceeds the defined percentile, reducing tail latency.

Question 22. Which CQL command creates a user-defined type (UDT) named address with fields street (text) and zip (int)?
A) CREATE TYPE address (street text, zip int);
B) CREATE UDT address (street text, zip int);
C) CREATE TYPE address WITH (street text, zip int);
D) CREATE TYPE address AS (street text, zip int);
Answer: A
Explanation: The correct syntax is “CREATE TYPE address (street text, zip int);”.
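Question 19 turns on quorum arithmetic, where quorum = floor(RF / 2) + 1. The sketch below uses a hypothetical helper, `required_acks`, which is not a driver API. It computes how many replica acknowledgements each consistency level needs for a given per-data-center replication map. It covers only the levels named around these questions and ignores timeouts and failure handling.

```python
def quorum(rf):
    """Majority of rf replicas: floor(rf / 2) + 1."""
    return rf // 2 + 1


def required_acks(consistency, rf_by_dc, local_dc=None):
    """Hypothetical helper: replica acknowledgements needed per data center.

    The key "*" means the acks may come from any data center.
    """
    total_rf = sum(rf_by_dc.values())
    if consistency == "ONE":
        return {"*": 1}
    if consistency == "QUORUM":
        # Majority of all replicas, counted across every data center
        return {"*": quorum(total_rf)}
    if consistency == "LOCAL_QUORUM":
        # Majority of replicas in the coordinator's data center only
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if consistency == "EACH_QUORUM":
        # A separate majority in every data center
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    if consistency == "ALL":
        return {"*": total_rf}
    raise ValueError(f"unsupported consistency level: {consistency}")


if __name__ == "__main__":
    rf = {"DC1": 3, "DC2": 3}
    for cl in ("QUORUM", "LOCAL_QUORUM", "EACH_QUORUM", "ALL"):
        print(cl, required_acks(cl, rf, local_dc="DC1"))
```

With two data centers at RF=3, QUORUM needs 4 acknowledgements from anywhere, so it can succeed with only one replica in the second data center. EACH_QUORUM forces 2 in each data center without requiring every replica to be up, as ALL would.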
Question 26. Which batch type should be used when you need atomicity across multiple tables?
A) UNLOGGED BATCH
B) LOGGED BATCH
C) COUNTER BATCH
D) BATCH WITH TIMEOUT
Answer: B
Explanation: A LOGGED BATCH is first written to the batchlog, so if the coordinator fails partway through, the batch is replayed until every mutation has been applied. The guarantee is that all mutations are eventually applied; failed mutations are retried from the batchlog, not rolled back. A batch that spans several partitions is also not isolated, so readers may briefly observe some of its mutations before others.

Question 27. Which statement about Lightweight Transactions (LWT) is correct?
A) They provide eventual consistency with lower latency than normal writes
B) They use the Paxos protocol to achieve linearizable consistency
C) They can only be executed on tables with a single partition key column
D) They bypass the commit log for faster performance
Answer: B
Explanation: LWTs employ Paxos to ensure that conditional updates (IF …) are linearizable across replicas.

Question 28. What is the purpose of the “nodetool repair” command?
A) To compact all SSTables on a node
B) To synchronize data between replicas for a given token range
C) To clear the key cache
D) To rebuild secondary indexes
Answer: B
Explanation: Repair runs anti-entropy processes to compare and reconcile differences among replicas for the specified token ranges.

Question 29. Which of the following is true about the row cache?
A) It caches entire rows, including all columns, and is useful for read-heavy workloads with small rows
B) It caches only primary key values
C) It automatically invalidates when a column is updated
D) It is enabled by default on all tables
Answer: A
Explanation: The row cache stores full rows in memory, reducing disk I/O for frequently accessed, relatively small rows.

Question 30. In cqlsh, which command displays the schema of a keyspace named “sales”?
A) DESCRIBE KEYSPACE sales;
B) SHOW KEYSPACE sales;
C) LIST KEYSPACE sales;
D) GET SCHEMA sales;
Answer: A
Explanation: “DESCRIBE KEYSPACE sales;” prints the CQL statements that define the keyspace and its tables.

Question 31. Which option best describes the effect of setting “compaction = {'class':'LeveledCompactionStrategy'}” on a table?
A) SSTables are merged based on size tiers only
B) Data is organized into levels to provide more predictable read latency
C) Compaction runs only during off-peak hours
B) Bucket data by time interval (e.g., day) as part of the partition key
C) Store timestamps as clustering columns only
D) Use a counter column for each timestamp
Answer: B
Explanation: Adding a time bucket (e.g., day) to the partition key limits the size of each partition, preventing “wide partitions”.

Question 35. Which CQL clause allows you to write a query that returns rows ordered by clustering column “event_time” descending?
A) ORDER BY event_time DESC
B) WITH CLUSTERING ORDER BY (event_time DESC)
C) SELECT … FROM table ORDER BY event_time DESC;
D) USING ORDER BY DESC(event_time)
Answer: C
Explanation: A SELECT can include “ORDER BY event_time DESC” only when the partition key is restricted (with = or IN), so the statement in option C also needs a WHERE clause on the partition key to run. The requested order must be either the table’s defined clustering order or its exact reverse. Option B is a CREATE TABLE option, not part of a query.

Question 36. How does the Java driver’s “RetryPolicy” affect query execution?
A) It determines which node to contact first
B) It decides whether to retry a request after certain failures (e.g., read timeout)
C) It changes the consistency level automatically
D) It encrypts the query payload
Answer: B
Explanation: RetryPolicy encapsulates logic to retry operations on specific errors, such as timeouts or unavailable replicas.
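The time-bucketing advice in Question 34 (and the bucketing key in Question 47) can be sketched as below. The helper names and the rule of one partition per sensor per UTC day are assumptions chosen for illustration. In a real data model the bucket size would normally be chosen from the expected write volume per partition.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts):
    """Hypothetical bucketing rule: one bucket per UTC calendar day.

    Expects a timezone-aware datetime.
    """
    return ts.astimezone(timezone.utc).date().isoformat()


def partition_key(sensor_id, ts):
    # Composite partition key (sensor_id, day): each partition holds at most one day of data
    return (sensor_id, day_bucket(ts))


def buckets_for_range(start, end):
    """Every day bucket a query over [start, end] must read, oldest first."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.isoformat())
        day += timedelta(days=1)
    return buckets


if __name__ == "__main__":
    ts = datetime(2024, 3, 1, 23, 59, tzinfo=timezone.utc)
    print(partition_key("sensor-1", ts))  # ('sensor-1', '2024-03-01')
    start = datetime(2024, 2, 28, 12, 0, tzinfo=timezone.utc)
    print(buckets_for_range(start, ts))   # ['2024-02-28', '2024-02-29', '2024-03-01']
```

The trade-off is that a query spanning N days must read N partitions, either one query per bucket or an IN on the bucket column. In return, no single partition grows without bound.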
Question 37. In a Cassandra cluster, what is the effect of increasing “num_tokens” per node from 1 to 256?
A) It reduces the total number of nodes required for a given data size
B) It improves data distribution uniformity and reduces the impact of node removal
C) It disables the gossip protocol
D) It forces the use of SimpleStrategy only
Answer: B
Explanation: More tokens per node lead to finer-grained token ranges, resulting in a more even data distribution and smoother rebalancing when nodes are added or removed.

Question 38. Which of the following statements about “hinted handoff” is FALSE?
A) Hints are stored on the coordinator node that receives the write
B) Hints are replayed automatically when the target node becomes available
C) Hints are persisted forever until the node recovers
D) Hints can be disabled per keyspace with “hinted_handoff_enabled = false”
Answer: C
Explanation: Hints are not kept indefinitely. A coordinator stores hints for a down replica only while that replica has been down for less than the maximum hint window (max_hint_window_in_ms, default 3 hours), and stored hints eventually expire. Option D is also inaccurate: hinted_handoff_enabled is a node-level setting in cassandra.yaml, not a keyspace option.

Question 39. Which CQL command removes a column named “email” from the table “users”?
A) ALTER TABLE users DROP email;
B) DELETE COLUMN email FROM users;
C) ALTER TABLE users REMOVE email;
D) DROP COLUMN email FROM users;
Answer: A
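The effect described in Question 37 can be reproduced with a small simulation. This is a sketch under simplifying assumptions:

- Tokens are drawn uniformly at random from the Murmur3 range [-2^63, 2^63).
- Replication is ignored.
- Each token owns the range from the previous token (exclusive) up to itself (inclusive).

Newer Cassandra versions can also place tokens with an allocation algorithm instead of at random, which this sketch does not model. The function names are hypothetical.

```python
import random

RING_MIN = -(2**63)  # Murmur3Partitioner token range is [-2**63, 2**63)
RING_SIZE = 2**64


def ownership(num_nodes, num_tokens, seed=0):
    """Fraction of the ring owned by each node when every node picks
    num_tokens random tokens (replication ignored)."""
    rng = random.Random(seed)
    ring = sorted(
        (rng.randrange(RING_MIN, RING_MIN + RING_SIZE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    if len(ring) == 1:
        return [1.0]  # a single token owns the whole ring
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] += (token - prev_token) % RING_SIZE
    return [share / RING_SIZE for share in owned]


def imbalance(shares):
    """Largest share divided by smallest share (1.0 means perfectly even)."""
    return max(shares) / min(shares)


if __name__ == "__main__":
    for tokens in (1, 16, 256):
        shares = ownership(num_nodes=6, num_tokens=tokens, seed=42)
        print(f"num_tokens={tokens:>3}  max/min ownership = {imbalance(shares):.2f}")
```

Expect the max/min ratio to fall toward 1.0 as num_tokens grows. The exact numbers depend on the random seed.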
Answer: B
Explanation: Because clustering is stored descending, range queries that request older timestamps (timestamp < ?) can be satisfied by scanning forward without filtering.

Question 43. Which of the following is true about “counter” columns?
A) They can be part of a primary key
B) They support TTL semantics
C) They require a special “counter” table type and cannot coexist with non-counter columns
D) They can be updated with regular UPDATE statements without special syntax
Answer: C
Explanation: In a table that contains a counter, every column outside the primary key must be a counter, so counters cannot be mixed with regular columns. Counters also cannot be part of the primary key and do not support TTL. They are modified only through increment/decrement syntax (UPDATE … SET c = c + 1).

Question 44. What does the “nodetool tpstats” command display?
A) Thread pool statistics, including pending and active tasks per operation type
B) Token placement statistics for the node
C) Disk space usage per keyspace
D) Current gossip state of the cluster
Answer: A
Explanation: tpstats reports per-thread-pool metrics, helping diagnose bottlenecks in reads, writes, and other operations.

Question 45. Which CQL option sets the compaction strategy when a table is created?
A) WITH compaction = {'class':'SizeTieredCompactionStrategy'}
B) WITH default_compaction = 'SizeTieredCompactionStrategy'
C) SET compaction_strategy = 'SizeTieredCompactionStrategy'
D) USING compaction_strategy = 'SizeTieredCompactionStrategy'
Answer: A
Explanation: The “WITH compaction = {…}” clause defines the compaction strategy and its options for the table.

Question 46. When a node joins a Cassandra cluster, which process is responsible for streaming the data it is now responsible for?
A) Gossip
B) Bootstrapping
C) Repair
D) Hinted handoff
Answer: B
Explanation: Bootstrapping streams the appropriate token ranges from existing nodes to the new node.

Question 47. Which of the following is a recommended way to reduce the impact of a large partition on read latency?
A) Increase the memtable flush threshold
B) Enable row cache on the table
C) Split the data into multiple partitions using a bucketing key
D) Set “gc_grace_seconds” to 0
Answer: C
Explanation: Bucketing the data creates smaller partitions, preventing a single large partition from dominating read latency.
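For Question 46, the range a bootstrapping node streams follows from ring arithmetic. The sketch below is a simplification. It assumes one token per node, a replication factor of 1, and distinct tokens, and the function names are hypothetical. It reports which range the new node takes over and which existing node currently owns that range and would therefore stream it.

```python
import bisect


def owner(ring, token):
    """Node owning `token` on a sorted ring of (token, node) pairs.

    A node owns the range (previous token, its token], wrapping around the ring.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def bootstrap_range(ring, new_token):
    """Range (start, end] that a node joining at new_token takes over, and its current owner.

    Assumes RF=1, one token per node, and that new_token is not already on the ring.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, new_token)
    start = tokens[i - 1]  # i == 0 wraps around to the last token
    current_owner = ring[i % len(ring)][1]
    return (start, new_token), current_owner


if __name__ == "__main__":
    ring = [(0, "A"), (100, "B"), (200, "C")]
    print(owner(ring, 150))            # C
    print(bootstrap_range(ring, 150))  # ((100, 150), 'C')
```

With a higher replication factor or with vnodes, the same idea applies to each range the new node becomes a replica for, so several existing nodes may stream data to it.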
Explanation: Cassandra’s AP nature means that replicas may temporarily diverge, but background mechanisms (hinted handoff, read repair, and anti-entropy repair) reconcile them eventually.

Question 51. Which command can be used to view the current replication factor of a keyspace named “analytics”?
A) DESCRIBE KEYSPACE analytics;
B) SELECT * FROM system_schema.keyspaces WHERE keyspace_name='analytics';
C) nodetool describecluster analytics
D) SHOW REPLICATION analytics;
Answer: B
Explanation: system_schema.keyspaces stores keyspace metadata, including the replication map for each keyspace, and can be queried from any client. In cqlsh, option A also shows the replication settings as part of the printed CREATE KEYSPACE statement.

Question 52. In a table with a map column “attributes”, how would you retrieve the value for key “color”?
A) SELECT attributes['color'] FROM table;
B) SELECT attributes.color FROM table;
C) SELECT map_get(attributes, 'color') FROM table;
D) SELECT get(attributes, 'color') FROM table;
Answer: A
Explanation: The map syntax uses square brackets to access a specific key: attributes['color']. Selecting a single map element this way requires Cassandra 4.0 or later.

Question 53. Which of the following is a side effect of setting “read_repair_chance = 1.0” on a table?
A) Every read will trigger a read-repair, increasing write traffic
B) Reads will be blocked until all replicas respond
C) The table will use LeveledCompactionStrategy automatically
D) Tombstones will never be purged
Answer: A
Explanation: A read_repair_chance of 1.0 forces read-repair on every read, potentially adding significant background write load. This table option was removed in Cassandra 4.0.

Question 54. What is the maximum number of columns that can be defined in a single Cassandra table?
A) 1024
B) 65535
C) Unlimited (practically limited by memory)
D) 32768
Answer: C
Explanation: Cassandra does not enforce a hard limit on column count; the practical limit is determined by memory and performance considerations.

Question 55. Which of the following is true about the “system_auth” keyspace?
A) It stores user-defined tables for application data
B) It contains tables for role and permission management used by the native authentication mechanism
C) It is automatically dropped when the cluster restarts
D) It cannot be encrypted
Answer: B
Explanation: system_auth holds tables such as roles and permissions for Cassandra’s built-in authentication and authorization.
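Read repair (Questions 17 and 53) reconciles replicas by last-write-wins: the replica value with the newest write timestamp wins, and replicas holding anything else are repaired. This sketch models a single cell per replica with a hypothetical `Cell` type. Real Cassandra reconciles per cell, handles tombstones, and has its own tie-breaking rules, none of which are modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """Hypothetical single-cell response: a value plus its write timestamp (microseconds)."""
    value: str
    timestamp: int


def reconcile(responses: Dict[str, Optional[Cell]]) -> Tuple[Cell, List[str]]:
    """Pick the newest cell (last-write-wins) and list the replicas that need repair.

    A response of None means the replica has no data for the key.
    Equal timestamps are not tie-broken here: max() keeps the first one it sees.
    """
    present = [cell for cell in responses.values() if cell is not None]
    if not present:
        raise ValueError("no replica returned data")
    winner = max(present, key=lambda cell: cell.timestamp)
    stale = sorted(replica for replica, cell in responses.items() if cell != winner)
    return winner, stale


if __name__ == "__main__":
    winner, stale = reconcile({
        "node1": Cell("blue", 1_000),
        "node2": Cell("red", 2_000),
        "node3": None,
    })
    print(winner)  # Cell(value='red', timestamp=2000)
    print(stale)   # ['node1', 'node3']
```

The stale list is what a coordinator would write the winning value back to. With read_repair_chance = 1.0, that comparison and write-back would happen on every read.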