Distributed File Systems - Advanced Operating Systems - Lecture Slides, Slides of Advanced Operating Systems

Main points of this lecture are: Distributed File Systems, Consistency and Replication, File Service, File Server Design, Sequence of Bytes, Sequence of Records, File Attributes, Local Copy of File, Issue of Buffering, Symbolic Links

Typology: Slides

2012/2013

Uploaded on 04/23/2013

atasi

CIS 620
Advanced Operating Systems
Lecture 11 Distributed File Systems,
Consistency and Replication


  • File service vs. file server
    • The file service is the specification.
    • A file server is a process running on a machine to implement the file service for (some) files on that machine.
  • A normal distributed system would have one file service but perhaps many file servers.
  • If we have very different kinds of file systems, we might not be able to have a single file service, since some functions may not be available on all of them.
  • File attributes
    • rwx and perhaps a (append)
      • This is really a subset of what an access control list (ACL) or a capability can express.
      • You get ACLs and capabilities by reading the columns and the rows, respectively, of the access matrix.
    • owner, group, various dates, size
    • dump, autocompress, immutable
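
The column/row reading of the access matrix can be sketched as follows. This is a minimal illustration, assuming rows are subjects and columns are objects; the users, file names, and rights in `ACCESS_MATRIX` are hypothetical.

```python
# Hypothetical access matrix: rows are subjects (users), columns are objects (files).
# Rights use the slide's letters: r, w, x, and a (append).
ACCESS_MATRIX = {
    "alice": {"notes.txt": {"r", "w"}, "log.txt": {"a"}},
    "bob": {"notes.txt": {"r"}},
}


def acl_for(obj):
    """Read one column: the ACL lists every subject's rights on this object."""
    return {
        subject: rights[obj]
        for subject, rights in ACCESS_MATRIX.items()
        if obj in rights
    }


def capabilities_for(subject):
    """Read one row: the capability list holds this subject's rights on each object."""
    return dict(ACCESS_MATRIX.get(subject, {}))


if __name__ == "__main__":
    print(acl_for("notes.txt"))
    print(capabilities_for("alice"))
```

An ACL is stored with the object, while a capability list travels with the subject; both are views of the same matrix.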
  • Upload/download vs. remote access.
    • Upload/download means the only file services supplied are read file and write file.

  • All modifications done on a local copy of file.
  • Conceptually simple at first glance.
  • Whole file transfers are efficient (assuming you are going to access most of the file) when compared to multiple small accesses.
  • Not an efficient use of bandwidth if you access only a small part of a large file.
  • Requires storage on client.
  • Note that metadata is written on a read (e.g., the access time), so if you want faithful semantics, either every client read must modify metadata on the server or all requests for metadata (e.g., ls or dir commands) must go to the server.
  • Cache consistency question.
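
The bandwidth trade-off between whole-file transfer and many small remote accesses can be made concrete with a rough cost model. This is only a sketch: the fixed per-request overhead and all the sizes used below are assumed numbers, not measurements.

```python
def bytes_moved(bytes_needed, request_size, per_request_overhead):
    """Total bytes on the wire when bytes_needed are fetched in requests of
    request_size, each paying a fixed (assumed) per_request_overhead."""
    requests = -(-bytes_needed // request_size)  # ceiling division
    return bytes_needed + requests * per_request_overhead


FILE_SIZE = 1_000_000  # hypothetical 1 MB file
BLOCK = 4096           # hypothetical remote-access request size
OVERHEAD = 100         # hypothetical header bytes per request

# Upload/download model: one request that always carries the whole file.
whole_file = bytes_moved(FILE_SIZE, FILE_SIZE, OVERHEAD)

# Remote access model, touching the whole file block by block.
remote_all = bytes_moved(FILE_SIZE, BLOCK, OVERHEAD)

# Remote access model, touching a single block of the file.
remote_one_block = bytes_moved(BLOCK, BLOCK, OVERHEAD)

if __name__ == "__main__":
    print(whole_file, remote_all, remote_one_block)
```

Under these assumptions the whole-file transfer wins when most of the file is read and loses badly when only one block is needed.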
  • Directories
  • Mapping from names to files/directories.
  • Contains rules for names of files and (sub)directories.

  • Hierarchy i.e. tree
    • (hard) links
  • With hard links the filesystem becomes a Directed Acyclic Graph instead of a simple tree.
  • Symbolic links
  • Symbolic not symmetric. Indeed asymmetric.
  • Consider: cd ~; mkdir dir1; touch dir1/file1; ln -s dir1/file1 file
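
The asymmetry can be observed directly. The sketch below replays the commands in a temporary directory standing in for the home directory, assuming the intended names are dir1, dir1/file1, and a symbolic link called file; it needs a platform that allows creating symbolic links.

```python
import os
import tempfile


def symlink_demo():
    """Create dir1/file1 and a symlink 'file' to it, then delete the target."""
    with tempfile.TemporaryDirectory() as home:  # stand-in for ~
        os.mkdir(os.path.join(home, "dir1"))
        target = os.path.join(home, "dir1", "file1")
        link = os.path.join(home, "file")
        with open(target, "w") as f:
            f.write("hello")
        # Like `ln -s dir1/file1 file`: the link stores only a path name.
        os.symlink(os.path.join("dir1", "file1"), link)

        result = {
            "link_is_symlink": os.path.islink(link),
            "target_is_symlink": os.path.islink(target),
            "stored_name": os.readlink(link),
        }
        os.remove(target)  # the target does not know about the link
        result["link_entry_remains"] = os.path.lexists(link)
        result["link_still_resolves"] = os.path.exists(link)
        return result


if __name__ == "__main__":
    print(symlink_demo())
```

The link points at the target by name, but the target holds no reference back, so removing the target leaves a dangling link.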
  • Imagine hard links pointing to directories (Unix does not permit this).

cd ~; mkdir B; mkdir C; mkdir B/D; mkdir B/E; ln B B/D/oh-my

  • Now you have a loop with honest looking links.
  • Normally you can't remove a directory (i.e. unlink it from its parent) unless it is empty.
  • But when a directory can have multiple hard links, you should permit removing (i.e. unlinking) one of them even if the directory is not empty.
  • So in the above example you could unlink B from A, where A is the directory that contains B (here the home directory).

  • Now you have garbage (unreachable, i.e. unnamable) directories B, D, and E.
  • For a centralized system you need a conventional garbage collector.
  • For a distributed system you need a distributed garbage collector, which is much harder.
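
The loop example can be modeled as a small graph to show why unlinking B leaves garbage. This is a centralized sketch only; it assumes A names the directory that contains B and C, and `LINKS` is a hypothetical in-memory stand-in for the on-disk directories.

```python
# Directory graph after the commands above: directory -> {entry name: directory}.
LINKS = {
    "A": {"B": "B", "C": "C"},
    "B": {"D": "D", "E": "E"},
    "C": {},
    "D": {"oh-my": "B"},  # the hard link that closes the loop
    "E": {},
}


def reachable(links, root):
    """Mark phase of a mark-and-sweep collector: everything nameable from root."""
    seen, stack = set(), [root]
    while stack:
        directory = stack.pop()
        if directory in seen:
            continue
        seen.add(directory)
        stack.extend(links[directory].values())
    return seen


def link_count(links, directory):
    """Number of directory entries that point at this directory."""
    return sum(
        1 for entries in links.values() for t in entries.values() if t == directory
    )


def garbage_after_unlink(links, root, parent, name):
    """Unlink parent/name, then report the directories no path from root can reach."""
    after = {d: dict(entries) for d, entries in links.items()}
    del after[parent][name]
    return sorted(set(after) - reachable(after, root)), link_count(after, name)


if __name__ == "__main__":
    print(garbage_after_unlink(LINKS, "A", "A", "B"))
```

After the unlink, B still has a link count of 1 because of D/oh-my, so counting links alone never frees it; only a reachability scan from the root finds B, D, and E.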

  • Transparency
    • Location transparency
      • Path name (i.e. full name of file) does not say where the file is located.
  • Examples
    • Machine + path naming
      • /machine/path
      • machine:path
    • Mounting remote file system onto local hierarchy
    • When done intelligently we get location transparency
  • Single namespace looking the same on all machines
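
Mounting remote file systems onto the local hierarchy can be pictured as a longest-prefix lookup in a mount table. The sketch below is illustrative only; the machine names and export paths in `MOUNTS` are hypothetical.

```python
# Hypothetical mount table: local mount point -> (machine, root path on that machine).
MOUNTS = {
    "/": ("local", "/"),
    "/home": ("serverA", "/export/home"),
    "/proj": ("serverB", "/vol/proj"),
}


def resolve(path):
    """Map a path in the single namespace to (machine, path on that machine)."""
    candidates = [
        m for m in MOUNTS
        if path == m or path.startswith(m.rstrip("/") + "/")
    ]
    mount_point = max(candidates, key=len)  # longest matching prefix wins
    machine, remote_root = MOUNTS[mount_point]
    rest = path[len(mount_point):].lstrip("/")
    if not rest:
        return machine, remote_root
    return machine, remote_root.rstrip("/") + "/" + rest


if __name__ == "__main__":
    print(resolve("/home/me/class-notes.html"))
    print(resolve("/etc/passwd"))
```

The path name itself carries no machine name, which is the location transparency described above; only the mount table knows where the file lives.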

  • Two level naming
    • We said above that a directory is a mapping from names to files (and subdirectories).
  • More formally, the directory maps the user name /home/me/class-notes.html to the OS name for that file, 143428 (the Unix inode number).
  • These two names are sometimes called the symbolic and binary names.

  • For some systems the binary names are available.
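
On Unix-like systems the binary name can be inspected through `os.stat`. The sketch below creates a file in a temporary directory and gives it a second symbolic name with a hard link; the file names are illustrative, and the inode numbers will differ from run to run.

```python
import os
import tempfile


def binary_names():
    """Return the inode numbers seen through two symbolic names of one file,
    plus the file's link count."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "class-notes.html")
        second = os.path.join(d, "alias.html")  # hypothetical second name
        with open(first, "w") as f:
            f.write("notes")
        os.link(first, second)  # hard link: a second directory entry, same inode
        return os.stat(first).st_ino, os.stat(second).st_ino, os.stat(first).st_nlink


if __name__ == "__main__":
    print(binary_names())
```

Both symbolic names map to the same binary name, which is what makes a hard link indistinguishable from the original entry.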
  • Redundant storage of files for availability
  • Naturally must worry about updates
    • When visible?
    • Concurrent updates?
  • Whenever you hear of a system that keeps multiple copies of something, an immediate question should be "are these immutable?". If the answer is no, the next question is "what are the update semantics?"
  • Sharing semantics
  • Unix semantics: a read returns the value stored by the last write.

  • Actually Unix doesn't quite do this.
    • If a write is large (several blocks), the kernel does a seek for each block.
    • During a seek, the process sleeps (in the kernel).
    • Another process can be writing a range of blocks that intersects the blocks of the first write.
    • Depending on disk scheduling, the file may end up holding some blocks from each write, so that there is no single "last write".
  • Perhaps Unix semantics means: a read returns the value stored by the last write, provided one exists.
  • Perhaps Unix semantics means: a write syscall should be thought of as a sequence of write-block syscalls, and similarly for reads. A read-block syscall returns the value of the last write-block syscall for that block.
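
The block-level reading of Unix semantics can be illustrated with a toy simulation. It assumes two processes, P1 and P2, each writing blocks 0 and 1 in order, and one particular interleaving chosen by the disk scheduler; the process names and the schedule are hypothetical.

```python
def apply_block_writes(num_blocks, schedule):
    """Apply write-block operations in the order the disk performs them.

    schedule is a list of (writer, block_number); each block ends up holding
    the name of the writer whose write-block reached it last.
    """
    disk = [None] * num_blocks
    for writer, block in schedule:
        disk[block] = writer
    return disk


# Each process issues block 0 then block 1, but the two writes interleave.
SCHEDULE = [("P1", 0), ("P2", 0), ("P2", 1), ("P1", 1)]
RESULT = apply_block_writes(2, SCHEDULE)

if __name__ == "__main__":
    print(RESULT)
```

Here block 0 holds P2's data and block 1 holds P1's, so neither write syscall as a whole was "last", yet every block still reflects the last write-block applied to it.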
  • Session semantics may mess up file-pointer semantics
    • The file pointer is shared across the fork so all the children of a parent share it.
    • But if the children run on another machine with session semantics, the file pointer can't be shared, since the other machine does not see the effect of the writes done by the parent.
  • Immutable files
  • Then there is "no problem"
  • Fine if you don't want to change anything
  • Can have "version numbers"
    • Old version may become inaccessible (at least under the current name)
    • With version numbers, if you use the name without a number you get the highest-numbered version
    • But really you do have the old (full) name accessible
      • VMS definitely did this
    • Note that directories are still mutable
    • Otherwise no create-file is possible
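
Immutable files with version numbers can be sketched as a mutable directory that maps each name to a growing list of frozen contents. The class and method names below are hypothetical, and the model is in-memory only.

```python
class VersionedDirectory:
    """Mutable directory of immutable file versions (illustrative sketch)."""

    def __init__(self):
        # name -> list of contents; list index i holds version number i + 1
        self._versions = {}

    def create(self, name, data):
        """Never overwrite: store data as a new version and return its number."""
        history = self._versions.setdefault(name, [])
        history.append(bytes(data))  # bytes objects cannot be modified in place
        return len(history)

    def read(self, name, version=None):
        """A name without a number yields the highest-numbered version."""
        history = self._versions[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise KeyError(f"{name};{version}")
        return history[version - 1]


if __name__ == "__main__":
    d = VersionedDirectory()
    d.create("notes", b"first draft")
    d.create("notes", b"second draft")
    print(d.read("notes"), d.read("notes", 1))
```

Only the directory changes when a file is "updated"; the old version stays readable under its full name, name plus number.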