Distributed File Systems - Operating Systems, Distributed Computation - Lecture Slides, Slides of Operating Systems

In the Operating Systems, Distributed Computation course, we learn the core of programming. The main points discussed in these lecture slides are: Distributed File Systems, Naming and Transparency, Remote File Access, Stateful Versus Stateless Service, File Replication, Location Transparency, Location Independence, Naming Schemes, Approaches to Naming Files

Typology: Slides

2012/2013

Uploaded on 04/24/2013

banamala

17: Distributed File Systems
OPERATING SYSTEMS
Distributed File Systems
Docsity.com

DISTRIBUTED FILE SYSTEMS

Overview:

  • Background
  • Naming and Transparency
  • Remote File Access
  • Stateful versus Stateless Service
  • File Replication
  • An Example: AFS

Definitions

Clients, servers, and storage are dispersed across machines. Configuration and implementation may vary:

a) Servers may run on dedicated machines, OR
b) Servers and clients can be on the same machines.
c) The OS itself can be distributed (with the file system a part of that distribution).
d) A distribution layer can be interposed between a conventional OS and the file system.

Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level.

Performance is concerned with throughput and response time.

Naming and Transparency

Naming is the mapping between logical and physical objects.

  • Example: A user filename maps to <cylinder, sector>.
  • In a conventional file system, it's understood where the file actually resides; the system and disk are known.
  • In a transparent DFS, the location of a file, somewhere in the network, is hidden.
  • File replication means multiple copies of a file; mapping returns a SET of locations for the replicas.

Location transparency:

a) The name of a file does not reveal any hint of the file's physical storage location.
b) The file name still denotes a specific, although hidden, set of physical disk blocks.
c) This is a convenient way to share data.
d) It can expose the correspondence between component units and machines.
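The naming idea above can be sketched in a few lines of Python. This is only an illustration, not how any particular DFS stores its tables: `NameService`, `Location`, and the machine names are hypothetical, and `disk_block` stands in for the <cylinder, sector> pair. The point it shows is that the user-visible name carries no machine information, and that with replication a lookup returns a SET of locations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """One physical copy of a file (hypothetical representation)."""
    machine: str      # assumed server name, never shown to the user
    disk_block: int   # stand-in for <cylinder, sector>


class NameService:
    """Maps a logical file name to the locations of its replicas."""

    def __init__(self):
        self._table = {}  # logical name -> set of Location

    def add_replica(self, name, location):
        self._table.setdefault(name, set()).add(location)

    def lookup(self, name):
        # The caller passes a location-transparent name and gets back
        # every known replica; an unknown name yields an empty set.
        return frozenset(self._table.get(name, ()))


if __name__ == "__main__":
    names = NameService()
    names.add_replica("/docs/report.txt", Location("serverA", 120))
    names.add_replica("/docs/report.txt", Location("serverB", 77))
    for loc in sorted(names.lookup("/docs/report.txt"), key=lambda l: l.machine):
        print(loc)
```

Because the name "/docs/report.txt" never mentions serverA or serverB, the table can change without the user-visible name changing.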

Naming and Transparency

THE ANDREW DFS AS AN EXAMPLE:

  • Is location independent.
  • Supports file mobility.
  • Separation of FS and OS allows for diskless systems. These have lower cost and convenient system upgrades, but their performance is not as good.

NAMING SCHEMES:

There are three main approaches to naming files:

  1. Files are named with a combination of host and local name.
    • This guarantees a unique name, but it is NEITHER location transparent NOR location independent.
    • Same naming works on local and remote files. The DFS is a loose collection of independent file systems.

Naming and Transparency

NAMING SCHEMES:
  2. Remote directories are mounted to local directories.
    • So a local system seems to have a coherent directory structure.
    • The remote directories must be explicitly mounted. The files are location independent.
    • SUN NFS is a good example of this technique.
  3. A single global name structure spans all the files in the system.
    • The DFS is built the same way as a local filesystem. Location independent.
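The first two naming schemes can be contrasted with a small Python sketch. It is a simplification under stated assumptions: `parse_host_name`, `MountTable`, the "host:path" syntax, and the server and directory names are all hypothetical, and real mount handling (NFS included) involves far more than string matching. It only shows that in the host-plus-local-name scheme the machine is visible in the name, while with mounting the user sees one local path and a table decides which machine serves it.

```python
def parse_host_name(name):
    """Host + local name scheme: 'host:/local/path' -> (host, path)."""
    host, _, path = name.partition(":")
    return host, path


class MountTable:
    """Mount scheme: remote directories attached to local directories."""

    def __init__(self):
        self._mounts = {}  # local mount point -> (host, remote directory)

    def mount(self, local_dir, host, remote_dir):
        # Remote directories must be mounted explicitly before use.
        self._mounts[local_dir.rstrip("/")] = (host, remote_dir.rstrip("/"))

    def resolve(self, path):
        """Return (host, path on that host) for a user-visible path."""
        best = None
        for mount_point in self._mounts:
            if path == mount_point or path.startswith(mount_point + "/"):
                # The longest matching mount point wins.
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            return ("localhost", path)  # not under any mount: a local file
        host, remote_dir = self._mounts[best]
        return (host, remote_dir + path[len(best):])


if __name__ == "__main__":
    print(parse_host_name("serverA:/home/user/notes.txt"))
    table = MountTable()
    table.mount("/usr/shared", "fileserver", "/export/shared")
    print(table.resolve("/usr/shared/doc/a.txt"))
    print(table.resolve("/usr/local/bin/tool"))
```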

Remote File Access

CACHING

Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.

If required data is not already cached, a copy of the data is brought from the server to the user.

Perform accesses on the cached copy.

Files are identified with one master copy residing at the server machine; copies of (parts of) the file are scattered in different caches.

Cache Consistency Problem: keeping the cached copies consistent with the master file.
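A minimal Python sketch of the caching idea above. The class name `BlockCache`, the `fetch_from_server` callback, and the least-recently-used eviction rule are assumptions made for the example; the slides do not say which replacement policy is used. The sketch shows a miss bringing a block from the server and a repeated access being handled locally.

```python
from collections import OrderedDict


class BlockCache:
    """Client-side cache of recently accessed blocks (illustrative only)."""

    def __init__(self, fetch_from_server, capacity=4):
        # fetch_from_server is a hypothetical callable:
        # (file_id, block_no) -> bytes, standing in for a network request.
        self._fetch = fetch_from_server
        self._capacity = capacity
        self._blocks = OrderedDict()  # (file_id, block_no) -> data
        self.hits = 0
        self.misses = 0

    def read(self, file_id, block_no):
        key = (file_id, block_no)
        if key in self._blocks:
            # Repeated access: handled locally, no network traffic.
            self.hits += 1
            self._blocks.move_to_end(key)
            return self._blocks[key]
        # Not cached: bring a copy from the server to the client.
        self.misses += 1
        data = self._fetch(file_id, block_no)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict least recently used (assumed policy)
        return data
```

Note that this sketch ignores the cache consistency problem entirely: nothing tells the client when the master copy on the server changes.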

Remote File Access

CACHING

A remote service (RPC) has these characteristic steps:

a) The client makes a request for file access.
b) The request is passed to the server in message format.
c) The server makes the file access.
d) Return messages bring the result back to the client.

This is equivalent to performing a disk access for each request.
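The four steps above can be sketched as a round trip in Python. Everything here is an assumption for illustration: the JSON message layout, the names `remote_read` and `server_handle`, and the in-memory `SERVER_FILES` table are invented, and a direct function call stands in for the network. No real RPC library is being modelled.

```python
import json

# Hypothetical server-side storage standing in for the server's disk.
SERVER_FILES = {"/notes.txt": b"distributed file systems"}


def server_handle(request_bytes):
    """Server side: decode the message, do the access, encode the reply."""
    request = json.loads(request_bytes)                  # (b) request arrives as a message
    data = SERVER_FILES[request["path"]]                 # (c) server makes the file access
    start = request["offset"]
    chunk = data[start:start + request["count"]]
    return json.dumps({"data": chunk.decode("ascii")}).encode()  # (d) return message


def remote_read(path, offset, count, send=server_handle):
    """Client side: every call goes to the server; nothing is kept locally."""
    message = json.dumps(
        {"op": "read", "path": path, "offset": offset, "count": count}
    ).encode()                                           # (a) client makes the request
    reply = send(message)                                # stand-in for the network hop
    return json.loads(reply)["data"].encode("ascii")


if __name__ == "__main__":
    print(remote_read("/notes.txt", 0, 11))
```

Each call to `remote_read` costs one full round trip, which is why the slide compares remote service to performing a disk access for each request.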

Remote File Access

CACHE LOCATION:

What should be cached? << blocks <---> files >>.

Bigger sizes give a better hit rate; smaller sizes give better transfer times.

  • Caching on disk gives:
    • Better reliability.

  • Caching in memory gives:
    • The possibility of diskless workstations.
    • Greater speed.

Since the server cache is in memory anyway, caching in client memory as well allows a single caching mechanism to be used for both.

Remote File Access

CACHE UPDATE POLICY:

A write-through cache has good reliability, but the user must wait for writes to get to the server. Used by NFS.

Delayed write: write requests complete more rapidly. Data may be written over the previous cache write, saving a remote write. Poor reliability on a crash.

  • Flush sometime later tries to regulate the frequency of writes.
  • Write on close delays the write even longer.
  • Which would you use for a database file? For file editing?
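The two update policies can be compared with a short Python sketch. The classes `Server`, `WriteThroughCache`, and `DelayedWriteCache` are hypothetical, and the "server" is just a dictionary with a write counter. The sketch shows the trade-off stated above: write-through sends every write to the server, while delayed write lets later writes overwrite earlier ones in the cache and sends only the final value when flushed.

```python
class Server:
    """Stand-in for the file server; counts remote writes."""

    def __init__(self):
        self.blocks = {}
        self.remote_writes = 0

    def write(self, key, data):
        self.blocks[key] = data
        self.remote_writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, key, data):
        self.cache[key] = data
        self.server.write(key, data)   # caller waits for the server on every write


class DelayedWriteCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()             # blocks changed locally but not yet sent

    def write(self, key, data):
        self.cache[key] = data         # completes immediately; may overwrite
        self.dirty.add(key)            # an earlier unsent write to the same block

    def flush(self):
        # Called "sometime later" or on close. Anything still dirty
        # when the client crashes is lost.
        for key in list(self.dirty):
            self.server.write(key, self.cache[key])
        self.dirty.clear()
```

With three writes to the same block, the write-through version performs three remote writes, while the delayed version performs one at flush time.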

Remote File Access

CACHE CONSISTENCY:

The basic issue is how to determine that the client-cached data is consistent with what's on the server.

  • Client-initiated approach:

The client asks the server if the cached data is OK. What should be the frequency of "asking"? On file open, at fixed time intervals, ...?

  • Server-initiated approach:

Possibilities:
    • A and B both have the same file open. When A closes the file, B "discards" its copy. Then B must start over.
    • The server is notified on every open. If a file is opened for writing, then caching by other clients is disabled for that file.
    • Read/write permission is obtained for each block; caching is then disabled only for particular blocks.
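One way to picture the client-initiated approach is a validity check on file open, sketched below in Python. The use of a per-file version number is an assumption made for the example (the slides do not say how the server decides whether cached data is OK), and `FileServer` and `ValidatingClient` are hypothetical names.

```python
class FileServer:
    """Holds the master copy; bumps a version number on every write (assumed scheme)."""

    def __init__(self):
        self._files = {}  # path -> (version, data)

    def write(self, path, data):
        version = self._files.get(path, (0, b""))[0] + 1
        self._files[path] = (version, data)

    def get_version(self, path):
        return self._files[path][0]

    def fetch(self, path):
        return self._files[path]


class ValidatingClient:
    """Asks the server on each open whether its cached copy is still current."""

    def __init__(self, server):
        self._server = server
        self._cache = {}   # path -> (version, data)
        self.fetches = 0   # how many times the whole file was transferred

    def open(self, path):
        cached = self._cache.get(path)
        # The "asking" step: a small version query instead of a full transfer.
        if cached is None or cached[0] != self._server.get_version(path):
            self._cache[path] = self._server.fetch(path)
            self.fetches += 1
        return self._cache[path][1]
```

Checking only on open means a client that keeps a file open for a long time can still read stale data; checking more often costs more messages.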

Remote File Access

COMPARISON OF CACHING AND REMOTE SERVICE:
  • Many remote accesses can be handled by a local cache. There's a great deal of locality of reference in file accesses. Servers can be accessed only occasionally rather than for each access.
  • Caching causes data to be moved in a few big chunks rather than in many smaller pieces; this leads to considerable efficiency for the network.
  • Cache consistency is the major problem with caching. When there are infrequent writes, caching is a win. In environments with many writes, the work required to maintain consistency overwhelms caching advantages.
  • Caching requires a whole separate mechanism to support acquisition and storage of large amounts of data. Remote service merely does what's required for each call. As such, caching introduces an extra layer and mechanism and is more complicated than remote service.

Remote File Access

STATEFUL VS. STATELESS SERVICE:

Performance: stateful service performs better.

  • Don't need to parse the filename each time, or "open/close" the file on every request.
  • Stateful can have a read-ahead cache.

Fault Tolerance: A stateful server loses everything when it crashes.

  • Server must poll clients in order to renew its state.
  • Client crashes force the server to clean up its cached information.
  • Stateless remembers nothing so it can start easily after a crash.
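The difference can be made concrete with a small Python sketch. Both server classes, the integer handles, and the `crash` method are hypothetical and exist only to illustrate the trade-off above: the stateless request carries everything needed (name, offset, count), while the stateful server remembers the open file and position between requests and loses that table when it crashes.

```python
# Hypothetical file contents shared by both servers.
FILES = {"/log.txt": b"abcdefghij"}


class StatelessServer:
    def read(self, path, offset, count):
        # Every request is self-contained: the name is resolved each time,
        # and nothing is remembered afterwards.
        return FILES[path][offset:offset + count]


class StatefulServer:
    def __init__(self):
        self._open = {}          # handle -> [path, current position]
        self._next_handle = 1

    def open(self, path):
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = [path, 0]
        return handle

    def read(self, handle, count):
        # No name parsing per request; the server tracks the position.
        entry = self._open[handle]
        path, pos = entry
        data = FILES[path][pos:pos + count]
        entry[1] = pos + len(data)
        return data

    def close(self, handle):
        del self._open[handle]

    def crash(self):
        # Simulated crash: all per-client state is gone.
        self._open.clear()
```

After `crash()`, a client's old handle raises `KeyError` on the stateful server, whereas the same stateless request simply works again once the server is back.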

Remote File Access

FILE REPLICATION:
  • Duplicating files on multiple machines improves availability and performance.
  • Replicas are placed on failure-independent machines (they won't fail together).

Replication management should be "location-opaque".

  • The main problem is consistency: when one copy changes, how do other copies reflect that change? Often there is a tradeoff: consistency versus availability and performance.
  • Example:

"Demand replication" is like whole-file caching; reading a file causes it to be cached locally. Updates are done only on the primary file, at which time all other copies are invalidated.

  • Atomic and serialized invalidation isn't guaranteed (a message could get lost or a machine could crash).
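The demand replication example can be sketched in Python as follows. `Primary` and `Replica` are hypothetical classes, and invalidation is modelled as a direct method call, which is exactly the part a real system cannot rely on: as the last bullet notes, the invalidation message could be lost or a machine could crash.

```python
class Primary:
    """Holds the primary copy; all updates are applied here."""

    def __init__(self, data=b""):
        self.data = data
        self._replicas = []

    def register(self, replica):
        self._replicas.append(replica)

    def update(self, data):
        self.data = data
        # Invalidate all other copies. In this sketch the call always
        # succeeds; over a network it might not.
        for replica in self._replicas:
            replica.invalidate()


class Replica:
    """A local whole-file copy, created on first read."""

    def __init__(self, primary):
        self._primary = primary
        self._copy = None
        primary.register(self)

    def read(self):
        if self._copy is None:
            # Demand replication: reading the file causes it to be copied locally.
            self._copy = self._primary.data
        return self._copy

    def invalidate(self):
        self._copy = None
```

If an `invalidate` call were dropped, that replica would keep returning the old contents, which is the consistency-versus-availability tradeoff described above.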