



Material Type: Notes; Class: Distributed Software Develop; Subject: Computer Science; University: University of San Francisco (CA); Term: Unknown 1989;




More DHTs
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science — University of San Francisco – p. 1/
16-2: Distributed Hash Tables
• On Tuesday, we talked about Distributed Hash Tables
  ◦ Used Chord as an example
• Able to act as a distributed storage and indexing mechanism
• Maps "keys" (hashes of data) to "values" (coordinates in a space of nodes).
• Each node is responsible for a subset of the keyspace.

16-3: Desirable Properties
• Desirable properties of DHTs include:
  ◦ Always able to find data stored in the network
  ◦ Able to support entry and exit of nodes
  ◦ Able to tolerate node failure
  ◦ Efficient routing of queries

16-4: Chord
• Recall that Chord uses a 1-D ring to structure the network.
• Nodes keep a finger table that tells them the successors at various points in the network.
• Routing requires $O(\log n)$ messages.
• Can tolerate failures through replication of data.

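As a reminder of how the finger table is laid out, here is a minimal sketch. It assumes the usual Chord definition, in which entry i of node n's table points to the successor of (n + 2^i) mod 2^m on an m-bit ring. The helper names `successor` and `finger_table` are hypothetical, and the whole ring is passed in as a sorted list purely for illustration (a real node only knows part of it).

```python
from bisect import bisect_left

def successor(ring, ident):
    """Return the first node id >= ident, wrapping around the ring.

    `ring` is a sorted list of all node ids (an illustrative shortcut).
    """
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]

def finger_table(node, ring, m):
    """Finger i of `node` is the successor of (node + 2**i) mod 2**m."""
    return [successor(ring, (node + 2**i) % 2**m) for i in range(m)]
```

Because the fingers double in distance, each hop can roughly halve the remaining distance around the ring, which is where the logarithmic message count comes from.
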
16-5: CAN
• CAN stands for Content Addressable Network
• Developed at Berkeley at the same time as Chord was developed at MIT.
• Also provides hash-table-like functionality.
  ◦ get, store, delete

16-6: Address construction in CAN
• CAN uses a d-dimensional torus as its address space.
  ◦ Vector of d dimensions where each dimension "wraps around"
• Each node is responsible for a zone in this space.
• The hash function maps data into a point in this space.

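One way the mapping from a key to a point could look is sketched below. The use of SHA-1, the unit torus [0, 1)^d, and the helper name `key_to_point` are all assumptions made for illustration, not details taken from the CAN design.

```python
import hashlib
from typing import Tuple

def key_to_point(key: str, d: int = 2) -> Tuple[float, ...]:
    """Map a key to a point in the unit d-torus [0, 1)^d (hypothetical helper)."""
    coords = []
    for axis in range(d):
        # Hash the key together with the axis index so that each
        # coordinate is derived from a different digest.
        digest = hashlib.sha1(f"{axis}:{key}".encode("utf-8")).digest()
        # Keep 53 bits so the scaled value is an exact float below 1.0.
        top53 = int.from_bytes(digest[:8], "big") >> 11
        coords.append(top53 / 2**53)
    return tuple(coords)
```

The node whose zone contains the resulting point is the one responsible for the key.
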
• An example CAN in a 2-D space.
• Node 1 is responsible for data in the range ((0-0.25), (0.5-1))
• Node 4 is responsible for data in the range ((0.5-1), (0.5-1))

16-8: Routing in CAN
• Each node keeps track of the IP address and zone of its neighbors.
• A neighbor of a node is a node whose zone overlaps its zone in d − 1 dimensions and abuts it in the remaining dimension.
• In our example, 2, 3, and 4 are neighbors of 1.
  ◦ Remember that the space is really a torus.
• Routing is done by greedy search.
• Always forward the message to the neighbor that is closest to the data, using standard Euclidean distance (ties broken randomly).

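The greedy step can be sketched as follows, assuming a unit torus and assuming each neighbor is represented by the center point of its zone (both are simplifications for illustration). The names `torus_distance` and `next_hop` are hypothetical, and ties here go to the first neighbor found rather than being broken randomly.

```python
import math

def torus_distance(p, q):
    """Euclidean distance on the unit torus: each axis wraps around at 1.0."""
    total = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        delta = min(delta, 1.0 - delta)  # take the shorter way around the axis
        total += delta * delta
    return math.sqrt(total)

def next_hop(neighbors, target):
    """Pick the neighbor whose zone center is closest to the target point.

    `neighbors` maps a node id to the center of its zone (assumed layout).
    """
    return min(neighbors, key=lambda node: torus_distance(neighbors[node], target))
```

The wrap-around in `torus_distance` is what lets zones at opposite edges of the square count as close to each other.
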
16-9: Routing in CAN
• In a d-dimensional space of n evenly-sized zones,
  ◦ Routing requires $(d/4)\,n^{1/d}$ hops.
  ◦ Each node stores info about 2d neighbors.
• By increasing d, we can reduce the length of routes
• Cost: additional storage at each node.
• We get resistance to failure when routing for free
  ◦ If we are trying to route through a non-responsive node, just choose the "next-best" path.

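To get a feel for the trade-off, the two quantities above can be evaluated directly. This is only a back-of-the-envelope sketch; the function names are hypothetical.

```python
def avg_hops(n: int, d: int) -> float:
    """Route length (d/4) * n**(1/d) for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)

def neighbors_per_node(d: int) -> int:
    """Each node tracks two neighbors per dimension."""
    return 2 * d

if __name__ == "__main__":
    for d in (2, 3, 4, 6):
        print(d, round(avg_hops(4096, d), 2), neighbors_per_node(d))
```

For n = 4096 zones, going from d = 2 to d = 3 cuts the route length from roughly 32 hops to roughly 12, while the neighbor list grows from 4 to 6 entries.
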
16-10: Node joining
• New nodes are added to the system by choosing an existing zone and subdividing it.
• First, the new node must find an existing bootstrap node.
• Then it must find its coordinates within CAN
• Finally, it must update its neighbors' routing tables.

16-11: Node joining - bootstrapping
• It is assumed that the IP address of at least one current CAN node is known to the joining node.
• The designers use DNS to map a CAN hostname to one or more CAN nodes.
• This bootstrap node can provide the IP addresses of other nodes in the network that can be used as entry points.

16-12: Node joining - finding a zone to share
• The newly-joining node next randomly selects a point P in the coordinate space.
• It sends a JOIN message to that coordinate.
• This message is routed as all other CAN messages.
• When it reaches the node responsible for P, that node returns a reply to the newly-joining node.
• The node responsible for P then divides its zone in half and assigns half to the newly-joining node.

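The zone split in the last step might be modeled like this. The `Zone` class, the choice of which dimension to split, and which half goes to the new node are all assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Zone:
    lo: Tuple[float, ...]  # lower corner, one value per dimension (inclusive)
    hi: Tuple[float, ...]  # upper corner, one value per dimension (exclusive)

    def contains(self, point) -> bool:
        """True if the point lies inside this zone on every axis."""
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))

    def split(self, dim: int):
        """Halve the zone along `dim`; return (lower half, upper half)."""
        mid = (self.lo[dim] + self.hi[dim]) / 2
        lower_hi = self.hi[:dim] + (mid,) + self.hi[dim + 1:]
        upper_lo = self.lo[:dim] + (mid,) + self.lo[dim + 1:]
        return Zone(self.lo, lower_hi), Zone(upper_lo, self.hi)
```

For example, `Zone((0.5, 0.5), (1.0, 1.0)).split(0)` yields one half covering x in [0.5, 0.75) and another covering x in [0.75, 1.0); the existing node would keep one and hand the other to the joining node.
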
16-19: Improved routing metrics
• The basic CAN routing algorithm uses greedy search. It chooses the node that produces the greatest reduction in distance between the present node and the data to be found.
• An alternative is to also measure the round-trip time (RTT) between a node and each of its neighbors.
• When choosing where to forward a packet, select the node that maximizes progress over RTT.
• This favors low-latency paths at the IP level.

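A minimal sketch of the "progress over RTT" rule, assuming each candidate neighbor comes with its remaining distance to the target and a measured RTT greater than zero. The function name and the tuple layout are hypothetical.

```python
def pick_by_progress_over_rtt(current_dist, candidates):
    """Return the neighbor with the best (distance reduction / RTT) ratio.

    `candidates` is a list of (node, dist_to_target, rtt_seconds) tuples.
    Neighbors that make no progress toward the target are skipped.
    Returns None if no neighbor makes progress.
    """
    best, best_ratio = None, 0.0
    for node, dist, rtt in candidates:
        progress = current_dist - dist
        if progress <= 0:
            continue
        ratio = progress / rtt
        if ratio > best_ratio:
            best, best_ratio = node, ratio
    return best
```

Note that a neighbor offering less progress can still win if its link is much faster, which is the point of the metric.
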
16-20: Overloading zones
• The basic design assumes that each zone is assigned to exactly one node.
• Alternatively, multiple peers can share zones.
• Each peer maintains a list of other peers in its zone, in addition to the neighbor list.
• When a node receives an update from a neighbor, it computes the RTT for each of the nodes in that zone.
• It retains the node with the lowest RTT.
• Contents may be either divided amongst peers or else replicated.

16-21: CAN summary
• Like Chord, CAN provides an implementation of a DHT.
• Uses a d-dimensional space to divide data.
• Space is broken and merged as nodes enter and leave.
• Explicitly trades off storage for routing efficiency.
• Both Chord and CAN are active research projects (at MIT and Berkeley, respectively).

16-22: Coral
• Coral is a peer-to-peer Web content caching and distribution system.
• Designed to distribute the load of high-demand Web content.
• Also uses a form of DHTs
• Run by NYU
• To access "Coralized" content, append '.nyud.net:8090' to the hostname in a URL.

16-23: Coral in a Nutshell, pt 1
• Client sends a DNS request for foobar.com.nyud.net to its local resolver.
• DNS request is passed along to a Coral DNS server in the .net domain.
• The Coral DNS server probes the client to discover RTT, and the location of the last few network hops.
• Checks Coral to see if any HTTP proxies are near the client.
• DNS returns proxies (or sometimes nameservers) close to the client.

16-24: Coral in a Nutshell, pt 2
• The client then sends an HTTP GET to the specified proxy.
• If the proxy has a cache of the file, this is returned.
• Otherwise, the proxy looks up the object in Coral
• Object is either returned from another Coral node, or else from the original source.
• Coral proxy stores a reference to this item in Coral, indicating that the proxy now has a copy.

16-25: Coral DNS
• When a lookup is done in DNS for a Coral address (anything ending with nyud.net), one of the Coral DNS servers responsible for this domain tries to discover a Coral cache near the client.
• This cache's IP address will be returned as a result of the DNS lookup.
• RTT and traceroute-style information is used to try to determine the latency to the client.

16-26: Coral HTTP proxy
• The Coral proxies are responsible for caching and serving up content.
• Goal: fetch content from other clients whenever possible
  ◦ The whole idea is to ease the burden on the original host.
• Each proxy keeps a local cache.
• On request, does the proxy have the data?
  ◦ Yes. Send the data to the client. Done.
  ◦ No, but a proxy we know does. Fetch the data and return it to the client.
  ◦ No, and no nearby proxies have the data. Fetch the data from the origin.

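The three cases above can be sketched as a single function. Everything here is a hypothetical stand-in: the cache is a plain dict, and `find_peer_with`, `fetch_from_peer`, and `fetch_from_origin` are caller-supplied callables rather than real Coral APIs. Keeping a copy after a remote fetch is also an assumption of this sketch.

```python
def handle_request(url, local_cache, find_peer_with, fetch_from_peer, fetch_from_origin):
    """Serve `url`, preferring the local cache, then a peer proxy, then the origin.

    Returns (data, source) where source is "local", "peer", or "origin".
    """
    # Case 1: the proxy already has the data.
    if url in local_cache:
        return local_cache[url], "local"

    # Case 2: another proxy we know about has it.
    peer = find_peer_with(url)
    if peer is not None:
        data, source = fetch_from_peer(peer, url), "peer"
    else:
        # Case 3: nobody nearby has it, so go to the origin server.
        data, source = fetch_from_origin(url), "origin"

    # Keep a copy so later requests hit the local cache.
    local_cache[url] = data
    return data, source
```

The origin is contacted only in the last branch, which is how the scheme keeps load off the original host.
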
16-27: Coral Architecture
• Keys are hashed 160-bit identifiers (for data), or else hashed IP addresses (for nodes).
• Like Chord and CAN, this key is used to place the data in the system.
• This ID is then used to perform routing.

16-28: Routing and DSHTs
• Coral uses a variant of the DHT called a distributed sloppy hash table to handle routing lookups.
• Each node keeps a hash table that maps keys to nodes that are close to the node storing the data.
  ◦ In other words, they might hash to a node that's not quite right.
• When a piece of data is requested, a node looks up the key in its hash table and either finds:
  ◦ The node storing the data
  ◦ A node that is closer to the data (in keyspace terms)
• XOR of the keys is used to determine distance.
  ◦ Like Chord, Coral can find the actual node storing the data in a logarithmic number of hops.

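The XOR distance and the "closer node" step can be illustrated with a short sketch. Using SHA-1 to produce the 160-bit identifiers is an assumption here, and the helper names are hypothetical.

```python
import hashlib

def make_id(name: str) -> int:
    """Hash a name (data key or node address) to a 160-bit integer id."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Distance between two ids: their bitwise XOR read as an integer."""
    return a ^ b

def closest_known(key: int, known_ids):
    """Among the node ids we know about, return the one closest to `key`."""
    return min(known_ids, key=lambda node: xor_distance(node, key))
```

With small ids for readability, `closest_known(0b1010, [0b0010, 0b1000, 0b1111])` picks `0b1000`, since it differs from the key only in a low-order bit.
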
16-29: Sloppy Storage
• A potential problem for systems like Coral is the saturation of a few nodes hosting frequently-requested pieces of data.
  ◦ This is sometimes called hot-spot congestion
• Coral solves this by using sloppy storage.
• Data is often stored at a node close to the hash value, rather than at the hash value itself.

16-30: Sloppy Storage
• When a node wants to insert a key/value pair in Coral, it uses a two-phase technique.
• In the forward phase, it searches forward to the node with the hash value corresponding to the data to be stored.
• It stops whenever it finds a node that is both full and loaded for the key of the data to be stored.
  ◦ Full: The node stores enough values for the key that are sufficiently stale.
  ◦ Loaded: The node has received sufficient requests for the key within a given time period.
• If a full and loaded node is discovered, the data is stored there and the algorithm stops.
• Otherwise, the node is placed on a stack and routing continues.

16-37: Discussion
• SONs provide a middle ground between DHTs and unstructured P2P networks.
• Allow nodes to choose what to store and who to connect to.
• Require a mechanism for classifying nodes according to content.
  ◦ Most effective with files for which there is an external hierarchy, such as allmusic.
• Less theoretically sound than DHTs
  ◦ Most results are purely empirical.

16-38: Summary
• DHTs provide a structured way to store and index information using a P2P model.
• Chord uses a logical ring, CAN uses a multi-dimensional torus.
• Both use greedy search to locate data.
  ◦ Can provide efficient lookup as the network grows.
• Coral applies DHT technology to web caching and replication.
  ◦ Uses sloppy DHTs to deal with congestion issues.
• Semantic Overlay Networks are an alternative approach to solving the routing problem in P2P networks.
  ◦ Connect nodes with related content, and search only within those networks.