Lecture Slides on DHTs - Distributed Software Development | CS 682

Uploaded on 07/30/2009

Distributed Software Development
More DHTs
Chris Brooks
Department of Computer Science
University of San Francisco

16-2: Distributed Hash Tables
• On Tuesday, we talked about Distributed Hash Tables
  ◦ Used Chord as an example
• Able to act as a distributed storage and indexing mechanism
• Maps “keys” (hashes of data) to “values” (coordinates in a space of nodes)
• Each node is responsible for a portion of the keyspace.

16-3: Desirable Properties
• Desirable properties of DHTs include:
  ◦ Always able to find data stored in the network
  ◦ Able to support entry and exit of nodes
  ◦ Able to tolerate node failure
  ◦ Efficient routing of queries

16-4: Chord
• Recall that Chord uses a 1-D ring to structure the network.
• Nodes keep a finger table that tells them the successors at various points in the network.
• Routing requires $O(\log n)$ messages.
• Can tolerate failures through replication of data.
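
The slide only says that fingers point at "successors at various points". A minimal sketch of the standard Chord construction is shown below. Entry i of node n's table points at successor((n + 2^i) mod 2^m) on an identifier ring of size 2^m. The ring size m = 3 and the node IDs are illustrative values, not taken from the slides.

```python
from bisect import bisect_left
from typing import List


def successor(ident: int, ring: List[int]) -> int:
    """First node ID at or after `ident`, wrapping around (ring must be sorted)."""
    idx = bisect_left(ring, ident)
    return ring[idx % len(ring)]


def finger_table(n: int, ring: List[int], m: int) -> List[int]:
    """Entry i is the successor of (n + 2**i) mod 2**m, for i = 0..m-1."""
    return [successor((n + 2 ** i) % 2 ** m, ring) for i in range(m)]


ring = [0, 1, 3]  # node IDs on a 3-bit identifier ring (IDs 0..7)
for node in ring:
    print(node, finger_table(node, ring, m=3))
```

Successive entries double the distance they cover, so each hop can roughly halve the remaining distance to a key. This doubling is where the $O(\log n)$ message count comes from.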

16-5: CAN
• CAN stands for Content Addressable Network.
• Developed at Berkeley at the same time as Chord was developed at MIT.
• Also provides hash-table-like functionality:
  ◦ get, store, delete

16-6: Address construction in CAN
• CAN uses a d-dimensional torus as its address space.
  ◦ Points are vectors of d coordinates, where each dimension “wraps around”
• Each node is responsible for a zone in this space.
• The hash function maps data into a point in this space.
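
A small sketch of the two ideas on this slide follows: hashing a key to a point in the unit d-torus, and measuring distance when every dimension wraps around. CAN only needs a hash that spreads keys uniformly over the space. SHA-256 and the 4-bytes-per-coordinate split are our own illustrative choices, and the key "song.mp3" is hypothetical.

```python
import hashlib
import math
from typing import Tuple

Point = Tuple[float, ...]


def key_to_point(key: str, d: int) -> Point:
    """Map a key to a point in [0, 1)^d using 4 bytes of SHA-256 per coordinate."""
    if not 1 <= d <= 8:
        raise ValueError("this sketch supports 1 <= d <= 8 (32 digest bytes / 4)")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2 ** 32 for i in range(d)
    )


def torus_distance(a: Point, b: Point) -> float:
    """Euclidean distance where each coordinate wraps around at 1.0."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        delta = min(delta, 1.0 - delta)  # going "the other way around" may be shorter
        total += delta * delta
    return math.sqrt(total)


p = key_to_point("song.mp3", d=2)
print(p)
print(torus_distance((0.05, 0.5), (0.95, 0.5)))  # about 0.1, not 0.9, because x wraps
```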

16-7: Example
• An example CAN in a 2-D space.
• Node 1 is responsible for data in the range ((0-0.25), (0.5-1)).
• Node 4 is responsible for data in the range ((0.5-1), (0.5-1)).

16-8: Routing in CAN
• Each node keeps track of the IP address and zone of its neighbors.
• A neighbor is a node whose zone overlaps this node's zone in $d - 1$ dimensions and abuts it in the remaining dimension.
• In our example, 2, 3, and 4 are neighbors of 1.
  ◦ Remember that the space is really a torus.
• Routing is done by greedy search.
  ◦ Always forward the message to the neighbor that is closest to the data, using standard Euclidean distance (ties broken randomly).
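
The neighbor rule can be checked mechanically. A minimal sketch is below. Each zone is a list of half-open [lo, hi) spans, one per dimension. Nodes 1 and 4 use the ranges from the 16-7 example, and the "corner" zone is a hypothetical extra zone. The sketch compares boundaries with exact float equality. That assumption is safe here because zone boundaries produced by repeated halving are exact binary fractions.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # half-open interval [lo, hi) within [0, 1]
Zone = List[Span]           # one span per dimension


def spans_overlap(a: Span, b: Span) -> bool:
    """True if the two intervals share a stretch of positive length."""
    return max(a[0], b[0]) < min(a[1], b[1])


def spans_abut(a: Span, b: Span) -> bool:
    """True if one interval ends where the other starts; 1.0 and 0.0 coincide on the torus."""
    return a[1] % 1.0 == b[0] or b[1] % 1.0 == a[0]


def are_neighbors(z1: Zone, z2: Zone) -> bool:
    """CAN neighbors: spans overlap in d - 1 dimensions and abut in the remaining one."""
    d = len(z1)
    overlaps = sum(spans_overlap(s, t) for s, t in zip(z1, z2))
    abuts = sum(spans_abut(s, t) and not spans_overlap(s, t) for s, t in zip(z1, z2))
    return overlaps == d - 1 and abuts == 1


node1 = [(0.0, 0.25), (0.5, 1.0)]
node4 = [(0.5, 1.0), (0.5, 1.0)]
corner = [(0.25, 0.5), (0.0, 0.5)]
print(are_neighbors(node1, node4))   # True: they abut across the x = 1.0 / 0.0 seam
print(are_neighbors(node1, corner))  # False: the zones only touch at a corner
```

Nodes 1 and 4 do not touch inside the unit square. They still count as neighbors because the torus glues x = 1.0 back onto x = 0.0, which is the point of the "really a torus" remark.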

16-9: Routing in CAN
• In a d-dimensional space of n evenly-sized zones,
  ◦ Routing requires $\frac{d}{4}\, n^{1/d}$ hops on average.
  ◦ Each node stores info about $2d$ neighbors.
• By increasing d, we can reduce the length of routes.
• Cost: additional storage at each node.
• We get resistance to failure when routing for free.
  ◦ If we are trying to route through a non-responsive node, just choose the “next-best” path.
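
To make the storage-versus-route-length trade-off concrete, the sketch below evaluates the two formulas from this slide for a few values of d. The network size n = 4096 is an illustrative choice.

```python
def avg_route_hops(n: int, d: int) -> float:
    """Average CAN path length (d/4) * n**(1/d) for n evenly sized zones in d dimensions."""
    return (d / 4) * n ** (1 / d)


def neighbor_entries(d: int) -> int:
    """Each zone has 2d neighbors: one on each side along every dimension."""
    return 2 * d


n = 4096  # illustrative network size
for d in (2, 3, 4, 6, 12):
    print(f"d={d:2d}  hops~{avg_route_hops(n, d):5.1f}  neighbors={neighbor_entries(d)}")
```

For n = 4096, going from d = 2 to d = 4 cuts the average route from 32 hops to 8 while neighbor state only grows from 4 to 8 entries. Past a point, larger d stops paying off: d = 6 and d = 12 both give about 6 hops.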

16-10: Node joining
• New nodes are added to the system by choosing an existing zone and subdividing it.
• First, the new node must find an existing bootstrap node.
• Then it must find its coordinates within CAN.
• Finally, it must update its neighbors’ routing tables.

16-11: Node joining - bootstrapping
• It is assumed that the IP address of at least one current CAN node is known to the joining node.
• The designers use DNS to map a CAN hostname to one or more CAN nodes.
• This bootstrap node can provide the IP addresses of other nodes in the network that can be used as entry points.

16-12: Node joining - finding a zone to share
• The newly-joining node next randomly selects a point P in the coordinate space.
• It sends a JOIN message to that coordinate.
• This message is routed like all other CAN messages.
• When it reaches the node responsible for P, that node returns a reply to the newly-joining node.
• The node responsible for P then divides its zone in half and assigns half to the newly-joining node.
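
The join steps can be sketched as picking a random point P and then halving the owner's zone. Routing the JOIN message to P's owner is omitted here. The slides do not say which dimension is split or which half the newcomer receives. This sketch assumes the widest dimension is split (lowest index on ties) and that the new node gets the half containing P. `split_for_join` and `random_point` are hypothetical helper names.

```python
import random
from typing import List, Sequence, Tuple

Zone = List[Tuple[float, float]]  # half-open [lo, hi) per dimension


def random_point(d: int, rng: random.Random) -> Tuple[float, ...]:
    """The joining node picks a uniformly random point P in the unit d-torus."""
    return tuple(rng.random() for _ in range(d))


def contains(zone: Zone, point: Sequence[float]) -> bool:
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))


def split_for_join(zone: Zone, p: Sequence[float]) -> Tuple[Zone, Zone]:
    """Halve `zone`; return (half kept by the current owner, half given to the new node).

    Assumptions (not from the slides): split along the widest dimension, lowest index
    on ties, and hand the new node the half that contains P.
    """
    dim = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    lower, upper = list(zone), list(zone)
    lower[dim] = (lo, mid)
    upper[dim] = (mid, hi)
    return (lower, upper) if contains(upper, p) else (upper, lower)


whole = [(0.0, 1.0), (0.0, 1.0)]
kept, given = split_for_join(whole, (0.7, 0.2))
print(kept, given)  # [(0.0, 0.5), (0.0, 1.0)] [(0.5, 1.0), (0.0, 1.0)]
p = random_point(2, random.Random(42))
```

Splitting the widest dimension keeps zones close to square. This is one simple way to keep greedy distances meaningful as the network grows.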

16-19: Improved routing metrics
• The basic CAN routing algorithm uses greedy search. It chooses the neighbor that produces the greatest reduction in distance between the present node and the data to be found.
• An alternative is to also measure the round-trip time (RTT) between a node and each of its neighbors.
• When choosing where to forward a packet, select the neighbor that maximizes the ratio of progress to RTT.
• This favors low-latency paths at the IP level.
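
The sketch below contrasts the two forwarding rules. It assumes each node and neighbor is represented by a single point, such as its zone center. Progress is the drop in torus distance to the target. All positions and RTT values are hypothetical, chosen so that the two rules disagree.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, ...]


def torus_distance(a: Point, b: Point) -> float:
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        delta = min(delta, 1.0 - delta)  # each dimension wraps around
        total += delta * delta
    return math.sqrt(total)


def greedy_next_hop(target: Point, neighbors: Dict[str, Point]) -> Optional[str]:
    """Basic CAN rule: the neighbor closest to the target (ties: first found, not random)."""
    if not neighbors:
        return None
    return min(neighbors, key=lambda name: torus_distance(neighbors[name], target))


def rtt_weighted_next_hop(here: Point, target: Point, neighbors: Dict[str, Point],
                          rtt_ms: Dict[str, float]) -> Optional[str]:
    """Variant: among neighbors that make progress, maximize progress / RTT."""
    start = torus_distance(here, target)
    best, best_score = None, 0.0
    for name, pos in neighbors.items():
        progress = start - torus_distance(pos, target)
        if progress <= 0:
            continue  # moving here would not get us closer
        score = progress / rtt_ms[name]
        if score > best_score:
            best, best_score = name, score
    return best


here, target = (0.2, 0.5), (0.6, 0.5)
neighbors = {"A": (0.45, 0.5), "B": (0.3, 0.5)}
rtt_ms = {"A": 100.0, "B": 10.0}
print(greedy_next_hop(target, neighbors))                       # A: biggest step
print(rtt_weighted_next_hop(here, target, neighbors, rtt_ms))   # B: small step, low latency
```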

16-20: Overloading zones
• The basic design assumes that each zone is assigned to exactly one node.
• Alternatively, multiple peers can share zones.
• Each peer maintains a list of other peers in its zone, in addition to the neighbor list.
• When a node receives an update from a neighbor listing the peers in that neighbor's zone, it measures the RTT to each of them.
• It retains the peer with the lowest RTT as its contact for that zone.
• Contents may be either divided among peers or else replicated.
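
The selection step on this slide reduces to a per-zone minimum. In this minimal sketch, `choose_zone_contacts` is a hypothetical helper. The peer lists, addresses, and RTT table are invented stand-ins for real measurements.

```python
from typing import Callable, Dict, List


def choose_zone_contacts(peer_lists: Dict[str, List[str]],
                         measure_rtt: Callable[[str], float]) -> Dict[str, str]:
    """For each neighboring zone, keep the peer with the lowest measured RTT."""
    return {
        zone: min(peers, key=measure_rtt)
        for zone, peers in peer_lists.items()
        if peers  # skip zones that advertised no peers
    }


# Hypothetical peer lists received from neighbors, and hypothetical RTTs (ms).
peer_lists = {"zone-2": ["10.0.0.5", "10.0.0.9"], "zone-3": ["10.0.1.7"]}
fake_rtt = {"10.0.0.5": 42.0, "10.0.0.9": 8.5, "10.0.1.7": 20.0}
contacts = choose_zone_contacts(peer_lists, lambda ip: fake_rtt[ip])
print(contacts)
```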

16-21: CAN summary
• Like Chord, CAN provides an implementation of a DHT.
• Uses a d-dimensional space to divide data.
• The space is split and merged as nodes enter and leave.
• Explicitly trades off storage for routing efficiency.
• Both Chord and CAN are active research projects (at MIT and Berkeley, respectively).

16-22: Coral
• Coral is a peer-to-peer Web content caching and distribution system.
• Designed to distribute the load of high-demand Web content.
• Also uses a form of DHT.
• Run by NYU.
• To access “Coralized” content, append ‘.nyud.net:8090’ to the hostname in a URL (e.g., http://www.example.com.nyud.net:8090/).
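
The rewriting rule from this slide can be sketched with the standard library's URL parser. This is a minimal sketch, and `coralize` is a hypothetical helper name. For simplicity it rejects URLs that already carry a port or user info, because the slide does not say how those cases were handled.

```python
from urllib.parse import urlsplit, urlunsplit


def coralize(url: str, suffix: str = ".nyud.net:8090") -> str:
    """Rewrite http://host/path as http://host.nyud.net:8090/path."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in {url!r}")
    if parts.port is not None or parts.username is not None:
        raise ValueError("this sketch only handles URLs without a port or user info")
    return urlunsplit(
        (parts.scheme, parts.hostname + suffix, parts.path, parts.query, parts.fragment)
    )


print(coralize("http://www.example.com/index.html"))
```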

16-23: Coral in a Nutshell, pt 1
• The client sends a DNS request for foobar.com.nyud.net to its local resolver.
• The DNS request is passed along to a Coral DNS server for the nyud.net domain.
• The Coral DNS server probes the client to discover the RTT and the location of the last few network hops.
• It checks Coral to see if any HTTP proxies are near the client.
• DNS returns proxies (or sometimes nameservers) close to the client.

16-24: Coral in a Nutshell, pt 2
• The client then sends an HTTP GET to the specified proxy.
• If the proxy has a cached copy of the file, it is returned.
• Otherwise, the proxy looks up the object in Coral.
• The object is returned either from another Coral node or from the original source.
• The Coral proxy stores a reference to this item in Coral, indicating that the proxy now has a copy.

16-25: Coral DNS
• When a lookup is done in DNS for a Coral address (anything ending with nyud.net), one of the Coral DNS servers responsible for this domain tries to discover a Coral cache near the client.
• This cache’s IP address is returned as the result of the DNS lookup.
• RTT and traceroute-style information is used to try to determine the latency to the client.

16-26: Coral HTTP proxy
• The Coral proxies are responsible for caching and serving up content.
• Goal: fetch content from other Coral proxies whenever possible.
  ◦ The whole idea is to ease the burden on the original host.
• Each proxy keeps a local cache.
• On request, does the proxy have the data?
  ◦ Yes. Send the data to the client. Done.
  ◦ No, but a proxy we know does. Fetch the data and return it to the client.
  ◦ No, and no nearby proxies have the data. Fetch the data from the origin.
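
The three-way decision on this slide, together with the "advertise that I now have a copy" step from 16-24, can be sketched as follows. This is a toy model, not Coral's implementation. `ProxySketch` is hypothetical, and a shared dictionary stands in for the Coral index. Proximity is not modeled: any proxy that has advertised a copy counts as "a proxy we know".

```python
from typing import Callable, Dict, Set, Tuple


class ProxySketch:
    """Hypothetical Coral-style proxy. `directory` stands in for the Coral index
    (URL -> names of proxies holding a copy); proximity is not modeled."""

    def __init__(self, name: str, directory: Dict[str, Set[str]],
                 peers: Dict[str, "ProxySketch"], fetch_origin: Callable[[str], bytes]):
        self.name = name
        self.cache: Dict[str, bytes] = {}
        self.directory = directory
        self.peers = peers
        self.fetch_origin = fetch_origin
        peers[name] = self

    def get(self, url: str) -> Tuple[bytes, str]:
        # Case 1: we have it locally.
        if url in self.cache:
            return self.cache[url], "local"
        # Case 2: some other proxy has advertised a copy.
        holders = sorted(p for p in self.directory.get(url, set()) if p != self.name)
        if holders:
            data, source = self.peers[holders[0]].cache[url], "peer"
        else:
            # Case 3: no known proxy has it, so go to the origin server.
            data, source = self.fetch_origin(url), "origin"
        self.cache[url] = data
        # Advertise that this proxy now has a copy.
        self.directory.setdefault(url, set()).add(self.name)
        return data, source


origin_requests = []


def fetch_origin(url: str) -> bytes:
    origin_requests.append(url)
    return b"<html>hello</html>"


directory: Dict[str, Set[str]] = {}
peers: Dict[str, ProxySketch] = {}
p1 = ProxySketch("p1", directory, peers, fetch_origin)
p2 = ProxySketch("p2", directory, peers, fetch_origin)
for proxy in (p1, p2, p2):
    print(proxy.name, proxy.get("http://www.example.com/")[1])
```

The origin server is contacted only once. After that, the copy spreads from proxy to proxy, which is the load-shedding effect the slide describes.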

16-27: Coral Architecture
• Keys are hashed 160-bit identifiers (for data), or else hashed IP addresses (for nodes).
• Like Chord and CAN, Coral uses this key to place the data in the system.
• This ID is then used to perform routing.
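
A minimal sketch of putting data keys and node IDs in the same 160-bit space is shown below. It assumes SHA-1, whose digest is exactly 160 bits. Hashing a content item's URL is our illustrative choice of what to hash. The URL and the IP address (from a documentation range) are hypothetical.

```python
import hashlib


def key_for_data(url: str) -> int:
    """160-bit key for a piece of content, here the SHA-1 hash of its URL."""
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest(), "big")


def key_for_node(ip: str) -> int:
    """160-bit node ID, here the SHA-1 hash of the node's IP address string."""
    return int.from_bytes(hashlib.sha1(ip.encode("ascii")).digest(), "big")


k = key_for_data("http://www.example.com/")
n = key_for_node("192.0.2.10")
print(f"{k:040x}")
print(f"{n:040x}")
```

Because both kinds of identifier come from the same hash, "which node is closest to this key" is a well-defined question. That question is what the routing on the next slide answers.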

16-28: Routing and DSHTs
• Coral uses a variant of the DHT called a distributed sloppy hash table (DSHT) to handle routing lookups.
• Each node keeps a hash table that maps keys to nodes that are close to the node storing the data.
  ◦ In other words, they might hash to a node that’s not quite right.
• When a piece of data is requested, a node looks up the key in its hash table and finds either:
  ◦ The node storing the data, or
  ◦ A node that is closer to the data (in keyspace terms).
• The XOR of two keys is used to determine the distance between them.
  ◦ Like Chord, Coral can find the actual node storing the data in a logarithmic number of hops.
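
XOR distance makes each lookup step simple: move to the known node whose ID is XOR-closest to the key, and stop when nobody known is closer. The sketch uses 4-bit IDs instead of 160-bit ones for readability. Its routing tables are hypothetical, and it does not model the "sloppy" aspect, in which entries may point near the storing node rather than at it.

```python
from typing import Dict, List


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def lookup_path(key: int, start: int, tables: Dict[int, List[int]]) -> List[int]:
    """Repeatedly hop to the known node ID closest to `key` (by XOR) until none is closer."""
    path = [start]
    current = start
    while True:
        candidates = tables.get(current, [])
        if not candidates:
            return path
        best = min(candidates, key=lambda node_id: xor_distance(node_id, key))
        if xor_distance(best, key) >= xor_distance(current, key):
            return path  # nobody we know is closer; stop here
        current = best
        path.append(current)


tables = {
    0b0000: [0b0100, 0b1000],
    0b1000: [0b1100, 0b0000],
    0b1100: [0b1110, 0b1000],
    0b1110: [0b1111, 0b1100],
    0b1111: [0b1110],
}
print([format(n, "04b") for n in lookup_path(0b1111, 0b0000, tables)])
```

Each hop fixes one more leading bit of the key. With b-bit IDs, this gives at most b hops, which is the logarithmic bound the slide mentions.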

16-29: Sloppy Storage
• A potential problem for systems like Coral is the saturation of a few nodes hosting frequently-requested pieces of data.
  ◦ This is sometimes called hot-spot congestion.
• Coral solves this by using sloppy storage.
• Data is often stored at a node close to the hash value, rather than at the node whose ID matches the hash value itself.

16-30: Sloppy Storage
• When a node wants to insert a key/value pair in Coral, it uses a two-phase technique.
• In the forward phase, it searches forward toward the node whose ID corresponds to the hash value of the data to be stored.
• It stops whenever it finds a node that is both full and loaded for the key of the data to be stored.
  ◦ Full: the node already stores enough values for the key that are still sufficiently fresh (not stale).
  ◦ Loaded: the node has received sufficient requests for the key within a given time period.
• If a full and loaded node is discovered, the data is stored there and the algorithm stops.
• Otherwise, the node is placed on a stack and routing continues.
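
A sketch of the forward phase only, as this slide describes it, follows. Several parts are assumptions rather than slide content. The `path` is taken to be the precomputed list of nodes the lookup visits on its way toward the key. A value counts as "fresh" if at least half of the new value's TTL remains. The thresholds `max_values` and `max_requests` are hypothetical. `NodeState` and `forward_phase` are illustrative names, and the sketch stops where the forward phase stops without modeling what happens afterward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeState:
    """Hypothetical per-node bookkeeping for one Coral-like node."""
    name: str
    ttls: Dict[str, List[float]] = field(default_factory=dict)  # key -> remaining TTLs (s)
    requests: Dict[str, int] = field(default_factory=dict)      # key -> requests this window


def is_full(node: NodeState, key: str, new_ttl: float, max_values: int) -> bool:
    # Assumed rule: "fresh" means at least half of the new value's TTL remains.
    fresh = [t for t in node.ttls.get(key, []) if t >= new_ttl / 2]
    return len(fresh) >= max_values


def is_loaded(node: NodeState, key: str, max_requests: int) -> bool:
    return node.requests.get(key, 0) > max_requests


def forward_phase(path: List[NodeState], key: str, new_ttl: float,
                  max_values: int = 4, max_requests: int = 12
                  ) -> Tuple[Optional[NodeState], List[NodeState]]:
    """Walk `path` toward the key; stop at the first node that is both full and loaded.

    Returns (stopping node or None, stack of nodes visited before it).
    """
    stack: List[NodeState] = []
    for node in path:
        if is_full(node, key, new_ttl, max_values) and is_loaded(node, key, max_requests):
            return node, stack
        stack.append(node)  # not full-and-loaded: remember it and keep routing
    return None, stack


a = NodeState("a")
b = NodeState("b", ttls={"k": [100.0]}, requests={"k": 50})      # loaded but not full
c = NodeState("c", ttls={"k": [300.0] * 4}, requests={"k": 20})  # full and loaded
stop, stack = forward_phase([a, b, c], "k", new_ttl=300.0)
print(stop.name if stop else None, [n.name for n in stack])
```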

16-37: Discussion
• SONs (Semantic Overlay Networks) provide a middle ground between DHTs and unstructured P2P networks.
• Allow nodes to choose what to store and who to connect to.
• Require a mechanism for classifying nodes according to content.
  ◦ Most effective with files for which there is an external hierarchy, such as allmusic.
• Less theoretically sound than DHTs
  ◦ Most results are purely empirical.
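
The core SON idea, classifying nodes by content and then searching only the matching overlay, fits in a few lines. In this toy sketch, the node names, genre labels, and file names are hypothetical, and each node's classification is simply given as input. Producing that classification is the hard part the slide points at.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_overlays(node_genres: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """One overlay per category; a node joins every overlay matching its content."""
    overlays: Dict[str, Set[str]] = defaultdict(set)
    for node, genres in node_genres.items():
        for genre in genres:
            overlays[genre].add(node)
    return dict(overlays)


def search(overlays: Dict[str, Set[str]], node_files: Dict[str, Set[str]],
           genre: str, filename: str) -> List[str]:
    """Ask only the nodes in the matching overlay, not the whole network."""
    return sorted(n for n in overlays.get(genre, set()) if filename in node_files[n])


node_genres = {"n1": {"jazz"}, "n2": {"jazz", "rock"}, "n3": {"rock"}}
node_files = {"n1": {"so_what.mp3"}, "n2": {"paranoid.mp3"}, "n3": {"paranoid.mp3"}}
overlays = build_overlays(node_genres)
print(search(overlays, node_files, "jazz", "so_what.mp3"))
print(search(overlays, node_files, "rock", "paranoid.mp3"))
```

A query for a jazz file never touches n3. This shows the efficiency gain, and it also shows why a misclassified file can simply be missed.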

16-38: Summary
• DHTs provide a structured way to store and index information using a P2P model.
• Chord uses a logical ring; CAN uses a multi-dimensional torus.
• Both use greedy search to locate data.
  ◦ Can provide efficient lookup as the network grows.
• Coral applies DHT technology to web caching and replication.
  ◦ Uses sloppy DHTs to deal with congestion issues.
• Semantic Overlay Networks are an alternative approach to solving the routing problem in P2P networks.
  ◦ Connect nodes with related content, and search only within those networks.