Fundamentals - Distributed Software Develop | CS 682, Study notes of Software Engineering

Material Type: Notes; Class: Distributed Software Develop; Subject: Computer Science; University: University of San Francisco (CA); Term: Unknown 1989;

Uploaded on 07/30/2009

Distributed Software Development
Fundamentals
Chris Brooks
Department of Computer Science
University of San Francisco
Outline
Networking overview
Seven-layer model
Intro to Distributed Systems
Characteristics
Desirable Properties
Dealing with Time

TCP/IP in 30 minutes
• Goal: Understand how a network transmits messages at different layers.
• How is a network composed?
• What really happens when Firefox opens a connection to a web server?
• Note: this will be an overview; for more details, take the networking class.

Layering
• Modern network design takes advantage of the idea of layering.
• A particular service or module is constructed as a black box.
• Users of that service do not need to know its internals, just its interface.
• This makes it easy to later build new modules (or layers) that use the lower layers.
• For example, HTTP is built on top of TCP.
  • A web browser does not typically need to worry about the implementation of TCP, just that it works.
• Unlike OO modules, the layers in a networked system comprise protocols that span multiple machines.

The OSI seven-layer model
• ISO (a standards body) developed a reference model called OSI that defines the different layers needed for communication and specifies which layer should do each job.
• The goal is to produce an open protocol that allows for heterogeneous, extensible systems.
• A protocol is a specification describing the order and format of messages.
• An open protocol is one in which all of this information is publicly available.

The OSI seven-layer model
• Application
• Presentation
• Session
• Transport
• Network
• Data Link
• Physical

Message transmission across layers
• An application (such as a web browser) wants to send a message to another computer.
• That application constructs a message and passes it to the application layer.
• The application layer attaches a header to the message and passes it to the presentation layer.
• The presentation layer attaches a header and passes it to the session layer, and so on.

Message transmission across layers
• On the other end, the message is received by the physical layer, which strips off the appropriate header and passes the message up to the data link layer.
• This continues until the message reaches the application layer of the receiving machine.
• High-level layers don't need to worry about lower-level layers.
• Lower-level layers treat everything from higher layers as data to be sent.

Layers and packets
• Each layer constructs a packet containing a portion of the data to be transmitted.
• This packet has a data section and a header.
• The header contains origin and destination information, checksums, sequence numbers, and other identifying information.
• When a message is sent by TCP, a packet is constructed and passed down to the IP layer.
• This entire packet then becomes the data portion of the IP packet, which is passed down to the data link layer, and so on.
• On the other end, the lowest layer removes the header and checks the data integrity, then passes the data portion up to the next layer.
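
The wrapping and unwrapping described on the last few slides can be sketched in a few lines of Python. This is a toy model, not a real protocol stack: the bracketed layer name stands in for a real header, and the names `LAYERS`, `header_for`, `send`, and `receive` are made up for this illustration.

```python
# Layer names from the OSI slide, top of the stack first.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]


def header_for(layer: str) -> bytes:
    """Stand-in header: just the layer name in brackets (an assumption)."""
    return f"[{layer}]".encode("ascii")


def send(message: bytes) -> bytes:
    """Pass the message down the stack; each layer prepends its header."""
    packet = message
    for layer in LAYERS:
        # Everything received from the layer above is treated as opaque data.
        packet = header_for(layer) + packet
    return packet


def receive(packet: bytes) -> bytes:
    """Pass the packet up the stack; each layer strips its own header."""
    for layer in reversed(LAYERS):
        header = header_for(layer)
        if not packet.startswith(header):
            raise ValueError(f"missing {layer} header")
        packet = packet[len(header):]
    return packet


wire = send(b"hello")
print(wire)           # physical header is outermost, application innermost
print(receive(wire))  # the original message
```

The physical-layer header ends up outermost because it is added last, which is why the receiving side must strip headers starting from the lowest layer.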

Physical Layer
• This is the lowest-level layer, responsible for transmitting 0s and 1s.
• Governs transmission rates, full or half-duplex, etc.
• A modem works at the physical layer.
• Lots of interesting problems at this level that we won't get into ...

Data Link Layer
• The data link layer provides error handling for the physical layer.
• Individual bits are grouped together into frames.
• A checksum is then computed to detect transmission errors.
• The data link layer can then request a retransmission if an error is detected.
• Messages are numbered; the receiver can request retransmission of any message in a sequence.
• Each frame is a separate, distinct message.
• The data link layer provides error-free transmission to upper-level layers.
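
A minimal sketch of the framing idea, assuming a 2-byte sequence number and a CRC-32 checksum from Python's `zlib` module. The slides do not say which checksum or frame layout is used, so both are choices made for this example, as are the names `make_frame` and `check_frame`.

```python
import zlib


def make_frame(seq: int, payload: bytes) -> bytes:
    """Build a frame: 2-byte sequence number, payload, 4-byte CRC-32 trailer."""
    body = seq.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return (seq, payload) if the checksum matches, otherwise None."""
    body, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != trailer:
        # A real receiver would now request retransmission of this frame.
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


frame = make_frame(7, b"hello")
print(check_frame(frame))                                  # intact frame
damaged = frame[:3] + bytes([frame[3] ^ 0x01]) + frame[4:]  # flip one bit
print(check_frame(damaged))                                # detected error
```

The sequence number is what lets the receiver name the specific frame it wants resent.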

An example: HTTP
• HTTP is the protocol that drives the Web.
  • A side note/axe to grind: WWW != Internet!!
• It is a stateless protocol that uses TCP as its underlying protocol.
• The client sends a request, which is processed by the server.
• The server sends a reply, and the exchange is ended.

HTTP requests
• HTTP has a very simple message format.

    GET /~brooks/index.html HTTP/1.1
    Host: www.cs.usfca.edu
    Connection: close
    User-agent: Mozilla/4.0
    Accept-language: en

• You can try this out for yourself with telnet ...
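
As an alternative to telnet, the same request can be assembled and sent over a plain TCP socket. This is a sketch under assumptions: `build_request` and `fetch` are names invented here, port 80 is assumed, and the host and path from the slide may no longer be served.

```python
import socket


def build_request(host: str, path: str) -> bytes:
    """Assemble the request from the slide; lines are separated by CRLF."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "User-agent: Mozilla/4.0",
        "Accept-language: en",
        "",  # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str, port: int = 80) -> bytes:
    """Open a TCP connection, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


print(build_request("www.cs.usfca.edu", "/~brooks/index.html").decode("ascii"))
```

Because the request asks for `Connection: close`, the server ends the exchange by closing the connection, which is what terminates the read loop in `fetch`.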

HTTP
• There are lots of wrinkles and extensions to HTTP.
  • Cookies to help save state
  • CGI, SOAP to pass data and execute code as the result of an HTTP request
  • Web caching to store data closer to clients
• These are all possible because HTTP is an open protocol.
• This is also what makes it possible for different companies to write web browsers and web servers that seamlessly work together.

Summary
• The modern networking stack can be conceptually broken into a set of layers.
• Each layer has a specific, well-defined function.
  • Acts as a black box
• Higher-level layers build on the functionality of lower-level layers.
• We'll be primarily concerned with the Transport and Application layers.

What is a Distributed System?
• What is a distributed system?
  • (Coulouris) "A distributed system is one in which hardware or software components communicate or coordinate their actions only by passing messages."
  • (Tanenbaum) "A distributed system is a collection of independent computers that appear to the users of the system as a single computer."
  • (Lamport) "You know you have one when the crash of a computer you've never heard of stops you from getting any work done."
• All of these get at different aspects of the issue ...

Advantages of a distributed system
• Can share expensive resources or data
• Economics
  • A collection of PCs can provide better price/performance than a single mainframe.
• Speed
  • A distributed system will often have more computing power than a single mainframe.
• Inherent distribution
  • Often, your data/users/resources are geographically distributed.

Advantages of a distributed system
• Reliability
  • If one node fails, the rest of the system can continue.
• Incremental growth
  • Components can be added or replaced in small increments.

Disadvantages of distributed systems
• Software design is much more complicated.
  • Lack of appropriate tools/languages
  • Disagreement on principles: how much should users know about the system? How much should the system handle on a user's behalf?
• Potential network saturation
• Privacy and security issues
  • Allowing resources to be shared can lead to data leakage.
• Extra sysadmin work

Design Issues
• Transparency
• Flexibility
• Dependability
• Performance
• Scalability

Transparency
• The goal of transparency is a single-system image.
  • From the user's POV, it looks like a single machine.
• Types of transparency:
  • Location transparency - Users cannot tell where their resources are actually located.
  • Migration transparency - Resources can move without changing their names.
  • Replication transparency - The number of copies of a resource is hidden from users.
  • Concurrency transparency - Users can share resources without being aware of other users.
  • Parallelism transparency - A task can be run on multiple machines without the user being aware of it.
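
Location and migration transparency can be illustrated with a toy in-memory model in which callers only ever use a resource's name. The `Cluster` class, its methods, and the host names below are hypothetical and exist only for this sketch.

```python
class Cluster:
    """Toy cluster: maps resource names to hosts, and hosts to stored data."""

    def __init__(self):
        self.location = {}  # resource name -> host name
        self.storage = {}   # host name -> {resource name: contents}

    def create(self, name, host, contents):
        self.storage.setdefault(host, {})[name] = contents
        self.location[name] = host

    def migrate(self, name, new_host):
        """Move a resource to another host; its name does not change."""
        old_host = self.location[name]
        contents = self.storage[old_host].pop(name)
        self.storage.setdefault(new_host, {})[name] = contents
        self.location[name] = new_host

    def read(self, name):
        # The caller supplies only the name; the host lookup stays internal.
        return self.storage[self.location[name]][name]


cluster = Cluster()
cluster.create("/home/report.txt", "host-a", "draft")
before = cluster.read("/home/report.txt")
cluster.migrate("/home/report.txt", "host-b")
after = cluster.read("/home/report.txt")
print(before == after)  # the same name works before and after the move
```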

Transparency
• Is transparency always a good thing? What is the downside?

Communication paradigms
• Asynchronous: there is no bound on message delay.
• Synchronous:
  • Known upper bound $b$ on message delay
  • Every process $p$ has a local clock $C_p$ which drifts at a rate of $r > 0$, and $\forall p$ and $\forall t > t'$:
    $$(1 + r)^{-1}(t - t') \le C_p(t) - C_p(t') \le (1 + r)(t - t')$$
  • In English, clock drift has an upper and lower bound.
  • Also, bounds on the amount of time needed for a process to execute a single step.
• Synchronous communication allows you to implement approximately synchronized clocks, even in the presence of failure.
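
The drift condition can be checked numerically. The sketch below assumes the usual reading of the bound, namely that the time elapsed on a local clock between two real times $t' < t$ lies between $(t - t')/(1 + r)$ and $(1 + r)(t - t')$. The function name and the sample readings are invented for the example.

```python
def within_drift_bound(c_earlier, c_later, t_earlier, t_later, r):
    """Check (t - t')/(1 + r) <= C(t) - C(t') <= (1 + r)(t - t')."""
    if t_later <= t_earlier or r <= 0:
        raise ValueError("need t_later > t_earlier and r > 0")
    elapsed_real = t_later - t_earlier
    elapsed_clock = c_later - c_earlier
    return elapsed_real / (1 + r) <= elapsed_clock <= (1 + r) * elapsed_real


# Over 100 s of real time with r = 0.01, the local clock may advance
# by anything between roughly 99.01 s and 101 s.
print(within_drift_bound(0.0, 100.5, 0.0, 100.0, 0.01))  # inside the bound
print(within_drift_bound(0.0, 102.0, 0.0, 100.0, 0.01))  # runs too fast
```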

Dealing with time
• One of the fundamental problems in distributed systems is dealing correctly with time.
• Not only when things happened, but what order things happened in.
• We would like all processes to see relevant changes in the same order.
  • Example: updating a replicated database.
• Depending on the communication model, this may be quite difficult.
• Insight: often, it doesn't matter exactly what time an operation happens, but what order events occur in.
  • (exception: hard real-time systems)
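
A small sketch of why order matters for a replicated database. Two replicas receive the same pair of updates, a deposit and an interest payment, but in different orders. The account, amounts, and operation names are made up for the illustration.

```python
def apply_updates(balance, updates):
    """Apply (operation, amount) updates in the order given."""
    for op, amount in updates:
        if op == "deposit":
            balance += amount
        elif op == "interest":
            # amount is a whole-number percentage of the current balance
            balance = balance * (100 + amount) // 100
        else:
            raise ValueError(f"unknown operation: {op}")
    return balance


updates = [("deposit", 100), ("interest", 10)]
replica_a = apply_updates(1000, updates)            # deposit, then interest
replica_b = apply_updates(1000, reversed(updates))  # interest, then deposit
print(replica_a, replica_b)  # the replicas disagree
```

Neither replica needs to know the wall-clock time of either update; they only need to agree on which one comes first.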

Global time servers
• NTP is an Internet protocol that allows your machine to synchronize its clock with a remote source, thereby keeping it accurate.
• Is that all we need to do?
• Maybe. Maybe not.
  • What if we don't have an Internet connection, or NTP is blocked by our firewall?
  • Can we guarantee that all users use the same remote time server?
  • How often should they update?
  • What if users don't do this?

Logical time
• The algorithms we'll look at in this class will not need to depend on the absolute time that something happens.
• Instead, we'll be interested in the logical time, or causal order, in which events occur.
• As long as all processes agree on the order in which a set of events that influence each other occurs, we're OK.
• We'll spend time next week looking at this problem.
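
These slides do not present a specific algorithm for logical time. As one well-known illustration, here is a minimal sketch of a Lamport-style logical clock, in which each process keeps a counter and a received message pushes the counter past the sender's timestamp. The class and method names are chosen for this example.

```python
class LamportClock:
    """Counter-based logical clock for a single process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value timestamps the message."""
        return self.tick()

    def receive(self, message_time):
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                   # local event on p
stamp = p.send()           # p sends a message
arrival = q.receive(stamp)
print(stamp, arrival)      # the receive is numbered after the send
```

No physical clock is consulted anywhere; the only guarantee is that an event that influences another gets the smaller number.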

Summary
• There are lots of desirable properties and design issues for distributed systems.
  • Performance, scalability, reliability, flexibility, transparency
  • Often, we must sacrifice one for another.
  • Some (e.g. parallelism transparency) are not possible with today's technology.
• Communication can be either synchronous or asynchronous.
• Time is a very sticky problem to deal with in distributed systems.
• Characterizing types of failure will help us identify what our algorithms and systems can and cannot stand up to.