Distributed Data at Web Scale
Eben Hewitt
Foreword by Jonathan Ellis, Apache Cassandra Project Chair
O'REILLY

Cassandra: The Definitive Guide
by Eben Hewitt

Copyright © 2011 Eben Hewitt. All rights reserved.
Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Mike Loukides
Production Editor: Holly Bauer
Copyeditor: Genevieve d'Entremont
Proofreader: Emily Quill
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History:
November 2010: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered trademarks of O'Reilly Media, Inc. Cassandra: The Definitive Guide, the image of a Paradise flycatcher, and related trade dress are trademarks of O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O'Reilly Media, Inc. was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and author assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.

This book uses RepKover™, a durable and flexible lay-flat binding.

ISBN: 978-1-449-39041-9
[M]
1304023728

What's In There?
Building from Source; Additional Build Targets; Building with Maven; Running Cassandra; On Windows; On Linux; Starting the Server; Running the Command-Line Client Interface; Basic CLI Commands; Help; Connecting to a Server; Describing the Environment; Creating a Keyspace and Column Family; Writing and Reading Data; Summary

3. The Cassandra Data Model
The Relational Data Model; A Simple Introduction; Clusters; Keyspaces; Column Families; Column Family Options; Columns; Wide Rows, Skinny Rows; Column Sorting; Super Columns; Composite Keys; Design Differences Between RDBMS and Cassandra; No Query Language; No Referential Integrity; Secondary Indexes; Sorting Is a Design Decision; Denormalization; Design Patterns; Materialized View; Valueless Column; Aggregate Key; Some Things to Keep in Mind; Summary

4. Sample Application
Data Design; Hotel App RDBMS Design; Hotel App Cassandra Design; Hotel Application Code; Creating the Database; Data Structures; Getting a Connection; Prepopulating the Database; The Search Application; Twissandra; Summary

5. The Cassandra Architecture
System Keyspace; Peer-to-Peer; Gossip and Failure Detection; Anti-Entropy and Read Repair; Memtables, SSTables, and Commit Logs; Hinted Handoff; Compaction; Bloom Filters; Tombstones; Staged Event-Driven Architecture (SEDA); Managers and Services; Cassandra Daemon; Storage Service; Messaging Service; Hinted Handoff Manager; Summary

6. Configuring Cassandra
Keyspaces; Creating a Column Family; Transitioning from 0.6 to 0.7; Replicas; Replica Placement Strategies; Simple Strategy; Old Network Topology Strategy; Network Topology Strategy; Replication Factor; Increasing the Replication Factor; Partitioners

Deleting; Batch Mutates; Batch Deletes; Range Ghosts; Programmatically Defining Keyspaces and Column Families; Summary

8. Clients
Basic Client API; Thrift; Thrift Support for Java; Exceptions; Thrift Summary; Avro; Avro Ant Targets; Avro Specification; Avro Summary; A Bit of Git; Connecting Client Nodes; Client List; Round-Robin DNS; Load Balancer; Cassandra Web Console; Hector (Java); Features; The Hector API; HectorSharp (C#); Chirper; Chiton (Python); Pelops (Java); Kundera (Java ORM); Fauna (Ruby); Summary

9. Monitoring
Logging; Tailing; General Tips; Overview of JMX and MBeans; MBeans; Integrating JMX; Interacting with Cassandra via JMX; Cassandra's MBeans; org.apache.cassandra.concurrent; org.apache.cassandra.db; org.apache.cassandra.gms; org.apache.cassandra.service; Custom Cassandra MBeans; Runtime Analysis Tools; Heap Analysis with JMX and JHAT; Detecting Thread Problems; Health Check; Summary

10. Maintenance
Getting Ring Information; Info; Ring; Getting Statistics; Using cfstats; Using tpstats; Basic Maintenance; Repair; Flush; Cleanup; Snapshots; Taking a Snapshot; Clearing a Snapshot; Load-Balancing the Cluster; loadbalance and streams; Decommissioning a Node; Updating Nodes; Removing Tokens; Compaction Threshold; Changing Column Families in a Working Cluster; Summary
11. Performance Tuning
Data Storage; Reply Timeout; Commit Logs; Memtables; Concurrency; Caching; Buffer Sizes; Using the Python Stress Test

Preface

Why Apache Cassandra?

Apache Cassandra is a free, open source, distributed data storage system that differs sharply from relational database management systems.

Cassandra first started as an incubation project at Apache in January of 2009. Shortly thereafter, the committers, led by Apache Cassandra Project Chair Jonathan Ellis, released version 0.3 of Cassandra, and have steadily made minor releases since that time. Though as of this writing it has not yet reached a 1.0 release, Cassandra is being used in production by some of the biggest properties on the Web, including Facebook, Twitter, Cisco, Rackspace, Digg, Cloudkick, Reddit, and more.

Cassandra has become so popular because of its outstanding technical features. It is durable, seamlessly scalable, and tuneably consistent. It performs blazingly fast writes, can store hundreds of terabytes of data, and is decentralized and symmetrical so there's no single point of failure. It is highly available and offers a schema-free data model.

Is This Book for You?

This book is intended for a variety of audiences.
It should be useful to you if you are:

* A developer working with large-scale, high-volume websites, such as Web 2.0 social applications
* An application architect or data architect who needs to understand the available options for high-performance, decentralized, elastic data stores
* A database administrator or database developer currently working with standard relational database systems who needs to understand how to implement a fault-tolerant, eventually consistent data store
* A manager who wants to understand the advantages (and disadvantages) of Cassandra and related columnar databases to help make decisions about technology strategy
* A student, analyst, or researcher who is designing a project related to Cassandra or other non-relational data store options

This book is a technical guide. In many ways, Cassandra represents a new way of thinking about data. Many developers who gained their professional chops in the last 15-20 years have become well-versed in thinking about data in purely relational or object-oriented terms. Cassandra's data model is very different and can be difficult to wrap your mind around at first, especially for those of us with entrenched ideas about what a database is (and should be).

Using Cassandra does not mean that you have to be a Java developer. However, Cassandra is written in Java, so if you're going to dive into the source code, a solid understanding of Java is crucial. Although it's not strictly necessary to know Java, it can help you to better understand exceptions, how to build the source code, and how to use some of the popular clients. Many of the examples in this book are in Java. But because of the interface used to access Cassandra, you can use Cassandra from a wide variety of languages, including C#, Scala, Python, and Ruby.
Finally, it is assumed that you have a good understanding of how the Web works, can use an integrated development environment (IDE), and are somewhat familiar with the typical concerns of data-driven applications. You might be a well-seasoned developer or administrator but still, on occasion, encounter tools used in the Cassandra world that you're not familiar with. For example, Apache Ivy is used to build Cassandra, and a popular client (Hector) is available via Git. In cases where I speculate that you'll need to do a little setup of your own in order to work with the examples, I try to support that.

What's in This Book?

This book is designed with the chapters acting, to a reasonable extent, as standalone guides. This is important for a book on Cassandra, which has a variety of audiences and is changing rapidly. To borrow from the software world, I wanted the book to be “modular”, sort of. If you're new to Cassandra, it makes sense to read the book in order; if you've passed the introductory stages, you will still find value in later chapters, which you can read as standalone guides.

Here is how the book is organized:

Chapter 1, Introducing Cassandra
This chapter introduces Cassandra and discusses what's exciting and different about it, who is using it, and what its advantages are.

Chapter 2, Installing Cassandra
This chapter walks you through installing Cassandra on a variety of platforms.

Chapter 12, Integrating Hadoop
In this chapter, written by Jeremy Hanna, we put Cassandra in a larger context and see how to integrate it with the popular implementation of Google's Map/Reduce algorithm, Hadoop.

Appendix
Many new databases have cropped up in response to the need to scale at Big Data levels, or to take advantage of a “schema-free” model, or to support more recent initiatives such as the Semantic Web.
Here we contextualize Cassandra against a variety of the more popular nonrelational databases, examining document-oriented databases, distributed hashtables, and graph databases, to better understand Cassandra's offerings.

Glossary
It can be difficult to understand something that's really new, and Cassandra has many terms that might be unfamiliar to developers or DBAs coming from the relational application development world, so I've included this glossary to make it easier to read the rest of the book. If you're stuck on a certain concept, you can flip to the glossary to help clarify things such as Merkle trees, vector clocks, hinted handoffs, read repairs, and other exotic terms.

This book is developed against Cassandra 0.6 and 0.7. The project team is working hard on Cassandra, and new minor releases and bug fix releases come out frequently. Where possible, I have tried to call out relevant differences, but you might be using a different version by the time you read this, and the implementation may have changed.

Finding Out More

If you'd like to find out more about Cassandra, and to get the latest updates, visit this book's companion website at http://www.cassandraguide.com.

It's also an excellent idea to follow me on Twitter at @ebenhewitt.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
Shows commands or other text that should be typed literally by the user.

Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.

This icon signifies a tip, suggestion, or general note.

This icon indicates a warning or caution.
Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Cassandra: The Definitive Guide by Eben Hewitt. Copyright 2011 Eben Hewitt, 978-1-449-39041-9.”

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at permissions@oreilly.com.

Safari® Enabled

Safari Books Online is an on-demand digital library that lets you easily search over 7,500 technology and creative reference books and videos to find the answers you need quickly.

With a subscription, you can read any page and watch any video from our library online. Read books on your cell phone and mobile devices. Access new titles before they are available for print, and get exclusive access to manuscripts in development and post feedback for the authors. Copy and paste code samples, organize your favorites,

I'm inspired by the many terrific developers who have contributed to Cassandra. Hats off for making such a pretty and powerful database.

As always, thank you to Alison Brown, who read drafts, gave me notes, and made sure that I had time to work; this book would not have happened without you.