Solved Problems for Distributed Software Develop - Slides | CS 682, Study notes of Software Engineering

Material Type: Notes; Class: Distributed Software Develop; Subject: Computer Science; University: University of San Francisco (CA); Term: Spring 2007;

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

Distributed Software Development
Problem Solving I
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science, University of San Francisco
Distributed Problem Solving
The preliminary portion of the course focused on techniques
for achieving properties or states in distributed systems.
Causal delivery, mutual exclusion, etc.
Now, we turn to the question of how to solve problems in a
distributed fashion, assuming that we have implemented
some of these properties.
Problem environments
One dimension along which we can characterize distributed
problem solving is according to the degree of autonomy or
self-interestedness of the participants.
How much can a protocol assume about the behavior and
motives of the participants?
Centrally controlled environments
At one extreme, all processes in a system are controlled by a
single individual or organization.
Beowulf cluster
Parallel computer
Intranet
This allows us to make fairly restrictive assumptions about
the behavior of system processes.
NFS, parallel computation (e.g. conjugate gradient)
Cooperative processes
We’ll also think about processes that are controlled by
separate individuals, but assumed to be cooperative.
SETI@Home, distributed.net
Meeting scheduling
TCP (originally)
In this case, we can assume that processes will act
benevolently, but that they will be heterogeneous.
Non-cooperative processes
We’ll also need to think about non-cooperative systems, in
which each process is self-interested.
Not necessarily malevolent, just concerned only about
its own performance.
This will require a different set of assumptions about how
our protocol should work.
Resource allocation, auctions, some scheduling
problems, file-sharing

TCP: an illustration

• TCP is an example of a protocol that was designed to work in a cooperative environment.
• Recall that TCP is built on top of IP.
  • IP provides packet-oriented, best-effort delivery (as does UDP, TCP’s sibling transport protocol).
  • TCP provides reliable, in-order delivery on top of IP.
• Sender A sends a packet to receiver B.
• B returns an acknowledgment (ACK) that the packet was received.
• If A does not receive an ACK before a timer expires, the packet is resent.
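The acknowledge-and-resend loop can be sketched as a small simulation. This is only an illustration, not TCP itself: it sends one packet at a time (stop-and-wait), and the function name `send_reliably`, the `loss_rate` parameter, and the random loss model are all assumptions made for the example.

```python
import random

def send_reliably(packets, loss_rate=0.3, max_tries=50, seed=0):
    """Stop-and-wait sketch: resend each packet until its ACK arrives.

    Returns (delivered, transmissions). Loss is simulated with a seeded
    random generator, so runs are repeatable.
    """
    rng = random.Random(seed)
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        for _ in range(max_tries):
            transmissions += 1
            packet_lost = rng.random() < loss_rate
            ack_lost = rng.random() < loss_rate
            if not packet_lost:
                # Receiver B accepts each sequence number only once, so a
                # resend caused by a lost ACK is not delivered twice.
                if len(delivered) == seq:
                    delivered.append(payload)
                if not ack_lost:
                    break  # A received the ACK before its timer expired
            # Otherwise A's timer expires and the packet is resent.
        else:
            raise RuntimeError(f"gave up on packet {seq}")
    return delivered, transmissions
```

With `loss_rate=0.0` every packet is sent exactly once; with losses, the data still arrives complete and in order, at the cost of extra transmissions.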


TCP: an illustration

• To improve transmission efficiency, TCP uses a concept called sliding windows.
• The sender has a “window” of size n. It sends all packets within that window.
• As the lowest-numbered packet in the window is acknowledged, the window “slides” upward, and more packets are sent.
• This improves transmission rates; the goal is for the network to be completely saturated.
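A minimal sketch of the sliding-window idea follows. It assumes a lossless channel that acknowledges packets in order, one at a time; the function name `sliding_window_send` and the event-list output are choices made for this illustration.

```python
def sliding_window_send(num_packets, window_size):
    """Return the sequence of events as ("send", seq) / ("ack", seq) tuples.

    Assumes no loss and in-order ACKs, so only the windowing logic shows.
    """
    events = []
    base = 0       # lowest-numbered packet that is not yet acknowledged
    next_seq = 0   # next packet that has not been sent yet
    while base < num_packets:
        # Send every unsent packet that fits in [base, base + window_size).
        while next_seq < min(base + window_size, num_packets):
            events.append(("send", next_seq))
            next_seq += 1
        # The lowest-numbered packet is acknowledged: the window slides up.
        events.append(("ack", base))
        base += 1
    return events
```

For five packets and a window of three, the sender transmits packets 0 to 2 immediately, then one new packet for each acknowledgment, so at most three packets are ever outstanding.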


TCP: an illustration

• The problem is how to deal with congestion.
  • Packets may be dropped by the receiver, or by intermediate hosts.
  • When should the sender resend?
  • Too slow → inefficiency
  • Too quickly → oversaturation is worsened.
• TCP uses an adaptive retransmission policy.
  • As connection performance changes, so does timeout duration.
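The slides do not say how the timeout adapts. One common approach, sketched here as an assumption, is to keep a smoothed round-trip-time estimate in the style of RFC 6298 and derive the timeout from it. The class name `RtoEstimator` is invented for this example, and the RFC's minimum-timeout clamp is left out.

```python
class RtoEstimator:
    """Smoothed RTT and retransmission timeout, in the style of RFC 6298."""

    ALPHA = 1 / 8   # weight of a new sample in the smoothed RTT
    BETA = 1 / 4    # weight of a new sample in the RTT variation

    def __init__(self):
        self.srtt = None     # smoothed round-trip time (seconds)
        self.rttvar = None   # smoothed RTT variation (seconds)

    def observe(self, rtt):
        """Fold one measured round-trip time in; return the new timeout."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Update the variation first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.timeout()

    def timeout(self):
        return self.srtt + 4 * self.rttvar
```

On a steady connection the timeout settles just above the measured round-trip time; a sudden slow sample raises both the estimate and the variation, so the sender waits longer before resending.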


TCP: an illustration

• The TCP congestion algorithm does the following (loosely):
  • When a packet is lost, halve the window size and double the timeout.
  • If all packets in a window are transmitted successfully, increase the window size by 1.
• There are lots of details in the implementation of this that I’m glossing over.
• The key point is this: this protocol works wonderfully, as long as everyone else also uses it.
  • Designed to minimize congestion over the entire Internet.
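The loose rule above (halve on loss, add one on success) can be written down directly. This is a sketch of that rule only, not of a real TCP stack: `aimd_step`, `simulate`, and the fixed link `capacity` used to decide when a loss happens are assumptions for the example, and the timeout is never shortened again here.

```python
def aimd_step(window, timeout, lost, min_window=1):
    """Apply one round of the rule: halve on loss, otherwise grow by 1."""
    if lost:
        return max(min_window, window // 2), timeout * 2
    return window + 1, timeout

def simulate(capacity, rounds, window=1, timeout=1.0):
    """Record the window size used in each round on a link of fixed capacity.

    Toy loss model: a round loses a packet whenever the window exceeds
    the capacity of the link.
    """
    history = []
    for _ in range(rounds):
        lost = window > capacity
        history.append(window)
        window, timeout = aimd_step(window, timeout, lost)
    return history
```

`simulate(8, 12)` gives `[1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 5, 6]`: the window climbs one step at a time, overshoots the capacity of 8, is halved, and starts climbing again.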


TCP: an illustration

• In the early days of the Internet, this was not a problem.
  • Small number of users, fewer bandwidth-saturating apps.
• Parallel download of images from web pages was the first concern.
• Later, non-TCP protocols (RTSP, proprietary schemes) implemented their own congestion control algorithms.
• These applications are not necessarily tuned to any sort of global optimum.


Tragedy of the Commons

• This is an example of a problem known as the tragedy of the commons.
  • Cost of using a resource is not borne equally by the beneficiaries of that resource.
  • Leads to overuse.
• Shared resources, such as networks, tend to be vulnerable to this problem.
• Game theory provides some ideas for dealing with this dilemma.
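A toy model can make the incentive visible. Two senders share one link; sender A always follows the cooperative halve-on-congestion rule, while sender B either cooperates or never backs off. Everything here is an assumption made for illustration: the function name `share_link`, the proportional split of a congested link, and the window update rule.

```python
def share_link(capacity, rounds, greedy_b=False):
    """Return the total packets delivered for senders (A, B) on a shared link."""
    win_a = win_b = 1
    got_a = got_b = 0.0
    for _ in range(rounds):
        offered = win_a + win_b
        congested = offered > capacity
        if congested:
            # Toy assumption: a congested link delivers `capacity` packets,
            # split in proportion to what each sender offered.
            got_a += capacity * win_a / offered
            got_b += capacity * win_b / offered
        else:
            got_a += win_a
            got_b += win_b
        # A always cooperates: halve on congestion, otherwise grow by 1.
        win_a = max(1, win_a // 2) if congested else win_a + 1
        if greedy_b:
            win_b += 1  # B never backs off
        else:
            win_b = max(1, win_b // 2) if congested else win_b + 1
    return got_a, got_b
```

When both cooperate they receive equal shares. When B is greedy, B ends up with nearly the whole link and A with almost nothing, so each sender has an individual incentive to defect even though universal defection congests the link for everyone.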


Symmetric-key encryption: a brief digression

• Symmetric-key encryption (or secret-key encryption) uses one key to encrypt and decrypt a message.
  • As opposed to public-key encryption, which uses pairs of keys.
• A series of bit shifts and bitwise operations (such as XOR) with a key are used to conceal a message.
• Secret-key encryption is “more secure” than public-key encryption in the sense that a shorter key is needed to provide the same level of security.


Symmetric-key encryption: a brief digression

• Two well-known algorithms: DES, RC5.
  • DES was developed at IBM in the 1970s and adopted as a U.S. federal standard in 1977.
  • RC5 was developed at RSA labs in the 90s.
• The only known practical way to defeat them is through exhaustive search of all keys.
• The size of the keyspace is $2^{\text{keysize}}$.
  • A 56-bit secret-key algorithm such as DES has a keyspace of $2^{56}$, or about 72 quadrillion keys.
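The keyspace arithmetic is easy to check directly. The helper names below are made up for the example; the only facts used are that a k-bit key has 2^k possible values and that an exhaustive search finds the key, on average, after trying half of them.

```python
def keyspace(key_bits):
    """Number of distinct keys for a key of the given length in bits."""
    return 2 ** key_bits

def expected_trials(key_bits):
    """Average number of keys tried before an exhaustive search succeeds."""
    return keyspace(key_bits) // 2

# A 56-bit key: 72,057,594,037,927,936 keys, about 72 quadrillion.
des_keys = keyspace(56)

# Every extra key bit doubles the work for the attacker.
assert keyspace(57) == 2 * keyspace(56)
```

Going from a 56-bit to a 64-bit key multiplies the search space by 256, and going to 72 bits multiplies it by 65,536.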


distributed.net

• History:
  • 1997: RC5-56 is cracked: 212 days, 34 quadrillion keys searched (47% of keyspace).
  • 2002: RC5-64 is cracked: 1757 days, over $1.16 \times 10^{19}$ keys (63% of keyspace) searched (270 GKeys/sec at completion).
  • RC5-72 is ongoing. (How long will this take at current speeds?)
• Other problems:
  • DES
  • Factoring
  • Golomb rulers
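The question about RC5-72 can be answered with back-of-the-envelope arithmetic. The sketch below assumes the search rate stays at the 270 GKeys/sec quoted for the end of RC5-64, which is a simplification since the rate changes as clients join and hardware improves. The function name `years_to_search` is invented for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_search(key_bits, keys_per_second, fraction=1.0):
    """Years needed to test `fraction` of a key_bits-bit keyspace."""
    return fraction * 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR

# Whole RC5-72 keyspace at 270 GKeys/sec: roughly 554 years.
full = years_to_search(72, 270e9)

# On average the key turns up after half the keyspace: roughly 277 years.
average = years_to_search(72, 270e9, fraction=0.5)

# The percentages quoted in the history are consistent with the key sizes.
rc5_56_fraction = 34e15 / 2 ** 56      # about 0.47
rc5_64_fraction = 1.16e19 / 2 ** 64    # about 0.63
```

Eight extra key bits make the problem 256 times larger, so a search that took under five years for 64 bits would take centuries for 72 bits at the same rate.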


How does it work?

• The keyspace is broken into a set of blocks.
• A master keyserver tracks all blocks:
  • Which are unprocessed
  • Which are currently being processed
  • Which are done.
• It communicates with a set of proxy keyservers.


How does it work?

• Proxies serve as a layer between clients and servers.
• Proxies request a block of keys, which are then handed out to clients on demand.
  • Avoids server bottleneck.
  • Round-robin DNS provides fault-tolerance; if one proxy fails, client uses the next available.
• When a client is done processing a block, it returns it to the server.
• Blocks that are unreturned after 90 days are reassigned.
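The bookkeeping described for the master keyserver can be sketched as a small state tracker. This is a hypothetical illustration, not distributed.net's code: the class `KeyServer`, its method names, and the use of integer day numbers for time are assumptions, and the proxy layer is left out.

```python
from enum import Enum

class State(Enum):
    UNPROCESSED = "unprocessed"
    IN_PROGRESS = "in progress"
    DONE = "done"

REASSIGN_AFTER_DAYS = 90  # from the slides: unreturned blocks are reassigned

class KeyServer:
    """Tracks which keyspace blocks are unprocessed, in progress, or done."""

    def __init__(self, num_blocks):
        self.state = {block: State.UNPROCESSED for block in range(num_blocks)}
        self.issued_on = {}  # block -> day number on which it was handed out

    def request_block(self, today):
        """Hand out an unprocessed or overdue block, or None if none is left."""
        for block, state in self.state.items():
            overdue = (state is State.IN_PROGRESS
                       and today - self.issued_on[block] > REASSIGN_AFTER_DAYS)
            if state is State.UNPROCESSED or overdue:
                self.state[block] = State.IN_PROGRESS
                self.issued_on[block] = today
                return block
        return None

    def return_block(self, block):
        """Mark a block as done once a client reports it finished."""
        self.state[block] = State.DONE
```

Because a block is only ever re-issued, never lost, a client that disappears delays its block by at most the reassignment period and does not affect any other block.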


Grid computing vs. Public resource computing

• distributed.net is an example of what’s sometimes referred to as public resource computing.
  • Unused computing resources are added into an application dynamically.
  • Focus is on easily integrating large numbers of clients into an application.
• Grid computing is a similar concept.
  • Applications can discover and use resources dynamically as they become available.
  • Difference is one of degree: grid computing tends to focus more on interactions between national-lab-level computing facilities.
  • Focus is on access control and rights, management of identity, automated service description, etc.


SETI@Home

• SETI stands for Search for Extraterrestrial Intelligence.
• Radio telescopes listen for transmissions from outer space.
  • SETI@Home uses signals captured by a telescope in Puerto Rico.
  • Either intended or unintended transmissions.
• Radio telescopes produce a vast amount of continuously-occurring data.
  • Approximately 35GB/day
• Standard SETI programs can only examine the data superficially.
• By dividing the data into small pieces, it can be distributed to clients worldwide for processing.


SETI@Home

• Data is captured by the telescope onto 35 GB magnetic tapes, then mailed to Berkeley.
• It is then broken into 0.25 MB chunks.
• Each chunk represents about 107 seconds of data in a 10kHz range of the electromagnetic spectrum.
• As with distributed.net, each chunk can be processed completely independently of the others.
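Splitting a recording into fixed-size chunks is simple to sketch. The helper names are invented for this example, and the count assumes decimal units (1 GB = 10^9 bytes, 0.25 MB = 250,000 bytes); the real splitter also preprocesses the data, which is not modeled here.

```python
def split_into_workunits(data, unit_size):
    """Cut a byte string into consecutive chunks of at most unit_size bytes."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]

def units_per_tape(tape_bytes, unit_bytes):
    """Number of chunks needed to cover a tape (ceiling division)."""
    return -(-tape_bytes // unit_bytes)

# One 35 GB tape cut into 0.25 MB chunks yields 140,000 work units.
per_tape = units_per_tape(35 * 10**9, 250_000)
```

Each chunk is self-contained, so the chunks from one tape can be handed to 140,000 different clients with no coordination between them.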


SETI@Home

• FFTs are used to extract signals at specific frequencies.
• Doppler effects are removed. (This is the computationally intensive part.)
• Looks for signals with a Gaussian shape (weak, then strong, then weak again).
  • Since the telescope is fixed and the Earth rotates, a signal will ‘move across it’ in about 12 seconds.
  • Earth-based transmissions will have a constant amplitude.
• Also looks for pulsed signals.
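To show what "extracting signals at specific frequencies" means, the sketch below computes a power spectrum and picks the strongest frequency bin. It is not the SETI@Home client's algorithm: it uses a naive O(n²) discrete Fourier transform in place of an FFT (same result, much slower), it skips Doppler removal and Gaussian fitting, and the function names are invented for the example.

```python
import cmath
import math

def dft_power(samples):
    """Power in each frequency bin below the Nyquist limit (naive DFT)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))) ** 2
        for k in range(n // 2)
    ]

def strongest_bin(samples):
    """Index of the frequency bin with the most power, ignoring bin 0 (DC)."""
    power = dft_power(samples)
    return max(range(1, len(power)), key=power.__getitem__)

# A strong tone at bin 5 plus a weaker tone at bin 12.
signal = [math.sin(2 * math.pi * 5 * t / 64) + 0.3 * math.sin(2 * math.pi * 12 * t / 64)
          for t in range(64)]
```

`strongest_bin(signal)` picks out bin 5, the stronger of the two tones, even though both are mixed together in the time-domain samples.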


The SETI@Home architecture

• Once data arrives at Berkeley, a splitter program preprocesses it and divides it into workunits (or chunks).
• These are then stored in a database.
• Clients interact with a data server that distributes workunits.
• The client may then disconnect and work on the data for as long as necessary.
• Results are then returned from the client to the server.


The SETI@Home architecture

• Data is distributed redundantly (the same block is sent to several clients).
  • This provides fault tolerance.
• Results are returned to the server, where they are written to a file, then processed and entered into a database.
• Once a workunit has enough results, it is considered complete and the results are aggregated.
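Redundant distribution with a completion threshold can be sketched as follows. The slides only say that results are "aggregated" once there are enough of them; the majority vote used here, the `quorum` of three, and the class name `WorkUnit` are assumptions made for the example.

```python
from collections import Counter

class WorkUnit:
    """Collects results for one work unit from several clients."""

    def __init__(self, unit_id, quorum=3):
        self.unit_id = unit_id
        self.quorum = quorum   # how many distinct clients must report
        self.results = {}      # client_id -> reported result

    def report(self, client_id, result):
        # A client that reports twice replaces its earlier result,
        # so it still counts only once toward the quorum.
        self.results[client_id] = result

    @property
    def complete(self):
        return len(self.results) >= self.quorum

    def aggregate(self):
        """Return the most common result once the unit is complete."""
        if not self.complete:
            return None
        value, _count = Counter(self.results.values()).most_common(1)[0]
        return value
```

With three reports of which two agree, the agreeing value wins, so a single faulty or slow client cannot corrupt or stall a work unit.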


The distributed search problem

• SETI@Home and distributed.net are both examples of distributed search.
  • Exhaustively examine a huge search space.
• This sort of problem has many characteristics that make it appealing for large-scale distributed computing:
  • All compute nodes are independent of each other.
    • No bottlenecks at client
    • No need for client-client communication
  • Failure of a compute node is easily tolerated.
  • Redundant computation of results is not a problem.
  • Clients can be stopped and restarted without problem.
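These properties can be seen in a minimal sketch of block-wise exhaustive search. The blocks are processed one after another here, in a shuffled order, to stand in for independent clients; the function names and the `is_key` predicate are hypothetical.

```python
import random

def search_block(start, stop, is_key):
    """Test every candidate in [start, stop); needs nothing from other blocks."""
    for candidate in range(start, stop):
        if is_key(candidate):
            return candidate
    return None

def distributed_search(keyspace_size, block_size, is_key, seed=0):
    """Split the keyspace into blocks and search them in arbitrary order."""
    blocks = [(start, min(start + block_size, keyspace_size))
              for start in range(0, keyspace_size, block_size)]
    # Shuffling shows that the order in which blocks finish does not matter.
    random.Random(seed).shuffle(blocks)
    for start, stop in blocks:
        found = search_block(start, stop, is_key)
        if found is not None:
            return found
    return None
```

Because `search_block` depends only on its own range, a block can be repeated, abandoned and reassigned, or run on any machine without changing the final answer.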
