Lecture Notes on Distributed File Systems, Study notes of Computer Science

These are lecture notes for Computer Science 425 (Distributed Systems), focused on distributed file systems. Topics include file systems, file attributes, file system modules, and requirements for distributed file systems. The notes also discuss the architecture and implementation of the Network File System (NFS) and the Andrew File System (AFS).

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

Copyright 2001, Medic T. Harandi
Student Notes Pages
2002, M. T. Harandi and J. Hou (modified: I. Gupta)
Computer Science 425
Distributed Systems
Lecture 22
Distributed File Systems
Reading: Chapter 8

File Systems
• A file is a collection of data with a user view (file structure) and a physical view (blocks).
• A directory is a file that provides a mapping from text names to internal file identifiers.
• File systems implement file management:
    • Naming and locating a file.
    • Accessing a file: create, delete, open, close, read, write, append, truncate.
    • Physical allocation of a file.
    • Security and protection of a file.
• A distributed file system (DFS) is a file system with distributed storage and distributed users. Files may be located remotely on servers, and accessed by multiple clients. E.g., SUN NFS and AFS.
• A DFS provides transparency of location, access, and migration of files.
• DFS systems use cache replicas for efficiency and fault tolerance.

File Attributes & System Modules
[Figure: a file laid out as a file attribute record followed by data blocks, and the stack of file system modules.]
File attribute record: length, creation timestamp, read timestamp, write timestamp, attribute timestamp, reference count, file type, ownership, access control list.
File system modules: directory module, file module, access control module, file access module, block module, device module.

File System Modules
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
(Single-host file system. A DFS may require additional components.)
Layered architecture: each layer depends only on the layers below it.

UNIX File System Operations
filedes = open(name, mode)
    Opens an existing file with the given name.
filedes = creat(name, mode)
    Creates a new file with the given name.
    Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes)
    Closes the open file filedes.
count = read(filedes, buffer, n)
    Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)
    Transfers n bytes to the file referenced by filedes from buffer.
    Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence)
    Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name)
    Removes the file name from the directory structure. If the file has no other links to it, it is deleted from disk.
status = link(name1, name2)
    Creates a new link (name2) for a file (name1).
status = stat(name, buffer)
    Gets the file attributes for file name into buffer.
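
The calls above can be tried out through Python's os module, which wraps the same UNIX system calls. This is only a minimal sketch: the file names and the helper demo_unix_file_ops are made up for illustration, and creat() is expressed as open() with the O_CREAT, O_WRONLY and O_TRUNC flags.

```python
import os
import tempfile

def demo_unix_file_ops(directory):
    """Exercise open/creat, write, lseek, read, link, stat and unlink once."""
    name1 = os.path.join(directory, "notes.txt")   # illustrative names only
    name2 = os.path.join(directory, "alias.txt")

    # creat(name, mode) behaves like open() with these three flags.
    filedes = os.open(name1, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    written = os.write(filedes, b"hello world")    # pointer advances by 11
    os.close(filedes)

    filedes = os.open(name1, os.O_RDONLY)
    first = os.read(filedes, 5)                    # pointer moves from 0 to 5
    pos = os.lseek(filedes, 6, os.SEEK_SET)        # absolute seek to offset 6
    second = os.read(filedes, 5)                   # reads starting at offset 6
    os.close(filedes)

    os.link(name1, name2)                          # second name, same file
    links_before = os.stat(name1).st_nlink
    os.unlink(name1)                               # data survives via name2
    links_after = os.stat(name2).st_nlink
    os.unlink(name2)                               # last link gone: file deleted

    return written, first, pos, second, links_before, links_after

with tempfile.TemporaryDirectory() as tmp:
    result = demo_unix_file_ops(tmp)
```

Note how the two read calls return different bytes only because the read-write pointer moved in between; that hidden per-open state is what the flat file service later in the lecture avoids.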

Distributed File System (DFS) Requirements
• Transparency: server-side changes should be invisible to the client side.
    • Access transparency: a single set of operations is provided for access to local and remote files.
    • Location transparency: all client processes see a uniform file name space.
    • Migration transparency: when files are moved from one server to another, users should not see it.
    • Performance transparency
    • Scaling transparency
• File Replication
    • A file may be represented by several copies for service efficiency and fault tolerance.
• Concurrent File Updates
    • Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing the same file.

DFS Requirements (2)

• Concurrent File Updates
    • One-copy update semantics: the file contents seen by all of the processes accessing or updating a given file are those they would see if only a single copy of the file existed.
• Fault Tolerance
    • At-most-once invocation semantics.
    • At-least-once semantics: OK for a server protocol designed for idempotent operations (i.e., duplicated requests do not result in invalid updates to files).
• Security
    • Access control list = per object, a list of allowed users and the access allowed to each.
    • Capability list = per user, a list of objects the user may access and the type of access allowed (could be different for each (user, object) pair).
    • User authentication: requesting clients need to be authenticated so that access control at the server is based on correct user identifiers.
• Efficiency
    • Whole file vs. block transfer
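
The two security structures above hold the same (user, object, rights) information grouped in different ways. The sketch below is illustrative only: the users, file names and helper functions are invented, and plain dictionaries stand in for whatever representation a real server would use.

```python
# Access control list: per object, the users allowed and their access rights.
acl = {
    "report.txt": {"alice": {"read", "write"}, "bob": {"read"}},
    "grades.db": {"alice": {"read"}},
}

def acl_to_capabilities(acl):
    """Regroup the same (user, object, rights) triples per user."""
    caps = {}
    for obj, users in acl.items():
        for user, rights in users.items():
            caps.setdefault(user, {})[obj] = set(rights)
    return caps

def allowed_by_acl(acl, user, obj, access):
    # Look up the object first, then the user within that object's list.
    return access in acl.get(obj, {}).get(user, set())

def allowed_by_caps(caps, user, obj, access):
    # Look up the user first, then the object within that user's list.
    return access in caps.get(user, {}).get(obj, set())

capabilities = acl_to_capabilities(acl)
```

Both checks give the same answer for any request; the difference is where the list lives and who presents it.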


Basic File Service Model

E.g., SUN NFS (Network File System) and AFS (Andrew File System)

• An abstract model:
    • Flat file service: implements create, delete, read, write, get attribute, set attribute and access control operations.
    • Directory service: is itself a client of (i.e., uses) the flat file service. Creates and updates directories (hierarchical file structures) and provides mappings between the names users give to files and the unique file IDs in the flat file structure.
    • Client service: a client of the directory and flat file services. Runs in each client's computer, integrating and expanding the flat file and directory services to provide a unified API (e.g., the full set of UNIX file operations). Holds information about the locations of the flat file server and directory server processes.

File Service Architecture

[Figure: the client computer runs application programs on top of a client module; the server computer runs the flat file service and the directory service.]

Flat File Service Operations

Read(FileId, i, n) -> Data (throws BadPosition)
    If 1 ≤ i ≤ Length(File): reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data) (throws BadPosition)
    If 1 ≤ i ≤ Length(File)+1: writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
    Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
    Removes the file from the file store.
GetAttributes(FileId) -> Attr
    Returns the file attributes for the file.
SetAttributes(FileId, Attr)
    Sets the file attributes (only those attributes that are not shaded in the file attribute record figure).

(1) Repeatable operations: no read-write pointer. Except for Create and Delete, the operations are idempotent, allowing the use of at-least-once RPC semantics.
(2) Stateless servers: no file descriptors. Stateless servers can be restarted after a failure and resume operation without the need to restore any state.

In contrast, the UNIX file operations are neither idempotent nor consistent, because (a) a read-write pointer is generated by the UNIX file system whenever a file is opened, and (b) if an operation is accidentally repeated, the automatic advance of the read-write pointer results in access to different positions of the file.
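
To make the idempotence argument concrete, here is a small in-memory sketch. It is an assumption-laden illustration, not the NFS or textbook implementation: FlatFileService, CursorFile and the use of integer UFIDs are all invented for the example. Positions are 1-based, as in the operation table.

```python
class BadPosition(Exception):
    pass

class FlatFileService:
    """Stateless-style interface: every call names the position explicitly."""

    def __init__(self):
        self._files = {}      # UFID -> bytearray (hypothetical integer UFIDs)
        self._next_id = 1

    def create(self):
        file_id = self._next_id
        self._next_id += 1
        self._files[file_id] = bytearray()
        return file_id

    def read(self, file_id, i, n):
        items = self._files[file_id]
        if not 1 <= i <= len(items):
            raise BadPosition(i)
        return bytes(items[i - 1:i - 1 + n])

    def write(self, file_id, i, data):
        items = self._files[file_id]
        if not 1 <= i <= len(items) + 1:
            raise BadPosition(i)
        # Overwrites from item i and extends the file if data runs past the end.
        items[i - 1:i - 1 + len(data)] = data

class CursorFile:
    """UNIX-style handle: the read-write pointer is hidden state."""

    def __init__(self, data):
        self._data = bytes(data)
        self._pos = 0

    def read(self, n):
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)   # a repeated call therefore sees other bytes
        return chunk
```

Repeating svc.write(fid, 1, b"hello") leaves the file unchanged, whereas repeating CursorFile.read(5) returns the next five bytes. That is why duplicated requests are harmless in the first interface but not in the second.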


Access Control

• In UNIX, the user's access rights are checked against the access mode requested in the open call, and the file is opened only if the user has the necessary rights.
• In a DFS, a user identity has to be passed with requests; the server first authenticates the user.
    • An access check is made whenever a file name is converted to a UFID (unique file ID), and the results are encoded in the form of a capability, which is returned to the client for future access.
        » Capability = per user, a list of objects the user may access and the type of access allowed (could be broken up per (user, object) pair)
    • A user identity is submitted with every client request, and an access check is performed for every file operation.

Directory Service Operations

Lookup(Dir, Name) -> FileId (throws NotFound)
    Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, File) (throws NameDuplicate)
    If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record. If Name is already in the directory, throws an exception.
UnName(Dir, Name) (throws NotFound)
    If Name is in the directory, the entry containing Name is removed from the directory. If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq
    Returns all the text names in the directory that match the regular expression Pattern. Like grep.

(1) Hierarchic file system: the client module provides a function that gets the UFID of a file given its pathname. The function interprets the pathname starting from the root, using Lookup to obtain the UFID of each directory in the path.

(2) Each server may hold several file groups, each of which is a collection of files located on the server. A file group identifier consists of IP address + date, and allows (i) file groups to migrate across servers, and (ii) clients to access file groups.
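
The pathname function described in note (1) can be sketched on top of the four directory operations. Everything here is a simplified assumption: directories live in an in-memory dictionary rather than in the flat file service, UFIDs are plain integers, and make_directory and resolve are hypothetical helper names.

```python
import re

class NotFound(Exception):
    pass

class NameDuplicate(Exception):
    pass

class DirectoryService:
    """In-memory sketch: each directory UFID maps text names to UFIDs."""

    def __init__(self):
        self._dirs = {}

    def make_directory(self, dir_id):
        # Hypothetical helper; the slide does not show directory creation.
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        if name in self._dirs[dir_id]:
            raise NameDuplicate(name)
        self._dirs[dir_id][name] = file_id

    def un_name(self, dir_id, name):
        if name not in self._dirs[dir_id]:
            raise NotFound(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern):
        # grep-like: keep names in which the pattern matches anywhere.
        return sorted(n for n in self._dirs[dir_id] if re.search(pattern, n))

def resolve(service, root_id, pathname):
    """Client-module function: walk the path from the root using Lookup."""
    current = root_id
    for component in pathname.strip("/").split("/"):
        if component:
            current = service.lookup(current, component)
    return current
```

For example, with "usr" registered in the root directory and "notes.txt" registered in usr, resolve(service, root, "/usr/notes.txt") performs two Lookup calls, one per path component.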


Server Caching

• File pages, directories and file attributes that have been read from the disk are retained in a main memory buffer cache.
• Read-ahead anticipates read accesses and fetches the pages following those that have most recently been read.
• In delayed-write, when a page has been altered, its new contents are written back to the disk only when the buffered page is required for another client.
    • In comparison, the UNIX sync operation writes pages to disk every 30 seconds.
• In write-through, data in write operations is stored in the memory cache at the server immediately and written to disk before a reply is sent to the client.
    • A better strategy for ensuring data integrity even when server crashes occur, but more expensive.
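
The difference between the two write policies shows up most clearly at a crash. The toy model below is only a sketch under stated assumptions: ServerCache, flush and crash are invented names, dictionaries stand in for memory and disk, and the flush trigger (buffer reuse or a periodic sync) is left to the caller.

```python
class ServerCache:
    """Toy server buffer cache with a selectable write policy."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.memory = {}      # page number -> data held in the buffer cache
        self.disk = {}        # page number -> data safely on disk
        self.dirty = set()    # pages changed in memory but not yet on disk

    def write(self, page, data):
        self.memory[page] = data
        if self.write_through:
            self.disk[page] = data    # on disk before the reply is sent
        else:
            self.dirty.add(page)      # delayed-write: disk update postponed
        return "ok"                   # the reply to the client

    def flush(self):
        # Stands in for whatever event forces dirty pages out.
        for page in sorted(self.dirty):
            self.disk[page] = self.memory[page]
        self.dirty.clear()

    def crash(self):
        # Main memory contents are lost; the disk survives.
        self.memory.clear()
        self.dirty.clear()
```

With write_through=False, a write followed by crash() loses the page even though the client already received "ok"; with write_through=True the page is still on disk.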


Client Caching

• A timestamp-based method is used to validate cached blocks before they are used.
• Each data item in the cache is tagged with
    • Tc: the time when the cache entry was last validated.
    • Tm: the time when the block was last modified at the server.
• A cache entry at time T is valid if (T - Tc < t) or (Tm_client = Tm_server).
• t = freshness interval
    » Compromise between consistency and efficiency
    » Sun Solaris: t is set adaptively between 3-30 seconds for files, 30-60 seconds for directories

Client Caching (Cont'd)

• When a cache entry is read, a validity check is performed.
    • If the first half of the validity condition (previous slide) is true, then the second half need not be evaluated.
    • If the first half is not true, Tm_server is obtained (via getattr() to the server) and compared against Tm_client.
• When a cached page (not the whole file) is modified, it is marked as dirty and scheduled to be flushed to the server.
    • Modified pages are flushed when the file is closed or a sync occurs at the client.
• Does not guarantee one-copy update semantics.
• More details in the textbook; please read up.
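
The validity check from the last two slides can be written down as a short function. This is a sketch, not the NFS client code: CacheEntry, is_valid and the getattr_tm callback (standing in for the getattr() call to the server) are hypothetical names, and resetting Tc after a successful comparison is an assumption based on the definition of Tc as the time of last validation.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    data: bytes
    tc: float          # Tc: time the entry was last validated
    tm_client: float   # Tm as recorded at the client

def is_valid(entry, now, freshness, getattr_tm):
    """Return (valid, contacted_server) for the timestamp validity condition."""
    if now - entry.tc < freshness:
        return True, False            # first half true: no server call needed
    tm_server = getattr_tm()          # stands in for getattr() to the server
    if entry.tm_client == tm_server:
        entry.tc = now                # revalidated, so the interval restarts
        return True, True
    return False, True                # stale: the block must be fetched again
```

A larger freshness interval t means fewer getattr() calls but a longer window in which a client may read stale data, which is the consistency versus efficiency compromise noted on the slide.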


Andrew File System (AFS)

• Two unusual design principles:
    • Whole file serving
        » Not in blocks
    • Whole file caching
        » Permanent cache, survives reboots
• Based on (validated) assumptions that
    • Most file accesses are by a single user
    • Most files are small
    • Even a client cache as "large" as 100MB is supportable (e.g., in RAM)
    • File reads are much more frequent than file writes, and typically sequential
• We'll see an overview only

Distribution of Processes in the Andrew File System

[Figure: each workstation runs user programs and a Venus process on top of a UNIX kernel; each server runs a Vice process on top of a UNIX kernel; workstations and servers communicate over the network.]

Vice and Venus are UNIX processes.

System Call Interception in AFS

[Figure: on a workstation, a user program issues UNIX file system calls to the UNIX kernel; the kernel serves local files through the UNIX file system and the local disk, and passes non-local file operations to Venus.]

The UNIX kernel is a modified version of BSD, designed to intercept open, close, and some other file system calls.

Implementation of File System Calls in AFS

open(FileName, mode)
    UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus.
    Venus: Check the list of files in the local cache. If the file is not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file.
    Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
    Venus: Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
    UNIX kernel: Open the local file and return the file descriptor to the application.
read(FileDescriptor, Buffer, length)
    UNIX kernel: Perform a normal UNIX read operation on the local copy.
write(FileDescriptor, Buffer, length)
    UNIX kernel: Perform a normal UNIX write operation on the local copy.
close(FileDescriptor)
    UNIX kernel: Close the local copy and notify Venus that the file has been closed.
    Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
    Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
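
The open and close paths above can be condensed into a toy model of callback promises. This is a sketch under simplifying assumptions, not AFS code: ViceServer and Venus are plain Python objects, the network and the local file system are omitted, and close() takes the new contents as an argument instead of observing writes to a local copy.

```python
class ViceServer:
    """Toy custodian: stores files and logs callback promises."""

    def __init__(self):
        self.files = {}
        self.promises = {}    # file name -> set of clients holding a promise

    def fetch(self, client, name):
        self.promises.setdefault(name, set()).add(client)   # log the promise
        return self.files[name]

    def store(self, client, name, data):
        self.files[name] = data
        holders = self.promises.get(name, set())
        for other in holders - {client}:
            other.callback(name)              # break the other clients' promises
        holders.intersection_update({client})

class Venus:
    """Toy client cache manager."""

    def __init__(self, server):
        self.server = server
        self.cache = {}       # file name -> whole-file copy
        self.valid = set()    # names with a valid callback promise
        self.fetches = 0

    def open(self, name):
        if name not in self.cache or name not in self.valid:
            self.cache[name] = self.server.fetch(self, name)
            self.valid.add(name)
            self.fetches += 1
        return self.cache[name]

    def close(self, name, data):
        if data != self.cache.get(name):      # local copy was changed
            self.cache[name] = data
            self.server.store(self, name, data)

    def callback(self, name):
        self.valid.discard(name)              # the cached copy may be stale
```

A second open() of an unchanged file costs no server traffic; the server is contacted again only after another client's close() has triggered a callback.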

Summary

• Distributed file system requirements: transparency, etc.
• NFS and AFS
• Vnodes (NFS), mounting, caching, whole file caching (AFS)
• Next lecture: Replication Control
    • Sections 15.1-15.3.
    • MP2 due Nov 11 (Sunday)
    • No office hours this Thursday (Nov 8)

Optional Material

The Mount Service in NFS

[Figure: directory trees on Server 1, the Client and Server 2, with remote mounts of server directories into the client's name space. Node names shown in the trees include student, usr, users, nfs, pet, jim, bob, staff, people, org, mth and john.]

Each server keeps a record of local files available for remote mounting. Clients use a mount command for remote mounting, providing name mappings.

Mount Service

• Clients use the UNIX mount command and specify the remote host name, the pathname of a directory in the remote filesystem and the local name with which it is to be mounted.
• The mount command communicates with the mount service process on the remote host via RPC.
    • The RPC operation takes the directory pathname and returns the file handle of the specified directory.
    • The location of the server (IP address and port number) and the file handle for the remote directory are passed on to the VFS module and the NFS client.
• On each server, there is a file with a well-known name (/etc/exports) containing the names of local filesystems that are available for remote mounting.

Automounter

• The automounter
    • Is a local NFS server at the client machine that mounts a remote directory dynamically whenever an empty mount point is referenced by a client.
    • Maintains a table of mount points (pathnames) with a reference to one or more NFS servers listed.
• When the NFS client module attempts to resolve a pathname that includes an empty mount point, it passes a lookup() request to the automounter.
• The automounter locates the required filesystem in its table and sends a probe request to each server listed.
• The filesystem on the first server to respond is then mounted at the client.
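
The server selection step can be sketched as follows. This is a simplification with invented names: Automounter.lookup and the probe callback are hypothetical, and instead of sending probes concurrently the sketch assumes probe(server) returns a response time in seconds (or None for no reply) and picks the smallest, which models "first server to respond".

```python
class Automounter:
    """Toy automounter: mounts the first-responding server on demand."""

    def __init__(self, table, probe):
        self.table = table      # mount point -> list of candidate NFS servers
        self.probe = probe      # hypothetical: server -> response time or None
        self.mounted = {}       # mount point -> server currently mounted

    def lookup(self, mount_point):
        if mount_point in self.mounted:
            return self.mounted[mount_point]   # no longer an empty mount point
        replies = [(self.probe(server), server)
                   for server in self.table[mount_point]]
        replies = [(t, server) for t, server in replies if t is not None]
        if not replies:
            raise LookupError("no server responded for " + mount_point)
        _, server = min(replies)               # fastest reply wins
        self.mounted[mount_point] = server
        return server
```

Listing several servers for one mount point gives a simple form of fault tolerance and load sharing, since an unresponsive or slow server is simply not chosen.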