Operating Systems Lecture 22: Distributed File Systems - Prof. Emery Berger, Study notes of Operating Systems

A set of lecture notes from a university course on operating systems, specifically lecture 22, which covers distributed file systems: remote file access, remote caching, network file systems, and cache update policies. It also touches on advanced file systems and their organization, as well as repairing file system inconsistencies and solutions like journaling file systems and log-structured file systems.

CMPSCI 377 Operating Systems Fall 2005
Lecture 22: December 8
Lecturer: Emery Berger Scribes: Billy Dean and Zeid Rusan

Last time: Distributed Computing

Today we’re going to learn about Distributed File Systems:

  • Remote File Access
  • Remote Caching
  • Network File Systems
  • Servers with state and without replication

22.1 Distributed File Systems

22.1.1 Remote File Access

When files are accessed remotely, the local system calls that operate on the file first have to be translated into RPCs (Remote Procedure Calls) that are sent to the machine holding the file.
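A minimal sketch of that translation, assuming a JSON message format and a made-up `read` operation (neither comes from the lecture, and this is not the NFS wire protocol). The client-side stub looks like a local call, but it only packs the arguments into a message and unpacks the reply. The "network" here is a direct function call standing in for a round trip.

```python
import json


class FileServer:
    """Hypothetical server side: keeps files in a dict and executes requests."""

    def __init__(self, files):
        self.files = files

    def handle(self, request_bytes):
        # Unmarshal the request, run the operation, marshal the reply.
        request = json.loads(request_bytes.decode("utf-8"))
        if request["op"] == "read":
            data = self.files[request["path"]]
            start = request["offset"]
            reply = {"ok": True, "data": data[start:start + request["count"]]}
        else:
            reply = {"ok": False, "error": "unknown op"}
        return json.dumps(reply).encode("utf-8")


class RemoteFileStub:
    """Client side: looks like a local read, but sends a message instead."""

    def __init__(self, transport):
        self.transport = transport  # any callable taking bytes, returning bytes

    def read(self, path, offset, count):
        request = {"op": "read", "path": path, "offset": offset, "count": count}
        reply_bytes = self.transport(json.dumps(request).encode("utf-8"))
        reply = json.loads(reply_bytes.decode("utf-8"))
        if not reply["ok"]:
            raise OSError(reply["error"])
        return reply["data"]


server = FileServer({"/notes.txt": "distributed file systems"})
stub = RemoteFileStub(server.handle)  # stand-in for a real network round trip
print(stub.read("/notes.txt", 0, 11))
```

Every call on the stub costs one request and one reply, which is why the caching options below matter.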

Remote caching on a local disk:

    • (+) Reduces access time
    • (+) Safe if a node fails, since the cached copy survives on disk
    • (-) Hard to keep the local copy consistent

Remote caching in local memory:

    • (+) Quick access time
    • (+) Works without a disk
    • (-) Hard to keep the in-memory copy consistent
    • (-) Cached data is lost if power is lost

22.2 Cache update policies

Write-through:

    • (+) Reliable
    • (-) Low performance

Write-back:

    • (+) Quick
    • (+) Reduces network traffic
    • (-) If the user's machine crashes, data not yet written back is lost
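The trade-off between the two policies can be sketched as follows. This is a toy model under stated assumptions: the class names, the `messages` counter, and the explicit `flush` call are all illustrative and not taken from any real file system.

```python
class Server:
    """Hypothetical file server that counts how many writes reach it."""

    def __init__(self):
        self.store = {}
        self.messages = 0

    def write(self, name, data):
        self.store[name] = data
        self.messages += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.write(name, data)  # every write goes to the server at once


class WriteBackCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()  # names modified locally but not yet sent

    def write(self, name, data):
        self.cache[name] = data  # update the local copy only
        self.dirty.add(name)

    def flush(self):
        # Send only the latest version of each dirty file.
        for name in sorted(self.dirty):
            self.server.write(name, self.cache[name])
        self.dirty.clear()


through_server, back_server = Server(), Server()
through, back = WriteThroughCache(through_server), WriteBackCache(back_server)
for version in ("v1", "v2", "v3"):
    through.write("report.txt", version)
    back.write("report.txt", version)

print(through_server.messages)  # one message per write
print(back_server.messages)     # nothing sent yet; a crash now loses all three writes
back.flush()
print(back_server.messages)     # a single message carrying the final version
```

Three writes to the same file cost three messages with write-through and one with write-back, but the write-back client holds the only copy of the data until it flushes.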

Cache consistency:

Client-initiated consistency: the client checks that its cached copy is consistent with the server at every file access or at given intervals. Server-initiated consistency: the server checks consistency on a timer:

  • Server detects conflicts and invalidates the cache
  • Server needs to know which clients have cached which parts of the files
  • Server also needs to know which clients are readers and which are writers
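A minimal sketch of the state a server needs for server-initiated consistency. It simplifies the description above in two labeled ways: it tracks whole files rather than parts of files, and it invalidates caches at the moment of a write rather than on a timer. All class and method names are hypothetical.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # path -> cached contents

    def invalidate(self, path):
        # Called by the server when the cached copy has become stale.
        self.cache.pop(path, None)


class StatefulServer:
    def __init__(self):
        self.files = {}
        self.cachers = {}  # path -> set of clients currently caching that file

    def read(self, client, path):
        data = self.files[path]
        client.cache[path] = data
        self.cachers.setdefault(path, set()).add(client)  # remember the reader
        return data

    def write(self, writer, path, data):
        self.files[path] = data
        # Conflict: every other client's cached copy is now out of date.
        for client in self.cachers.get(path, set()):
            if client is not writer:
                client.invalidate(path)
        self.cachers[path] = {writer}
        writer.cache[path] = data


server = StatefulServer()
server.files["/shared.txt"] = "v1"
alice, bob = Client("alice"), Client("bob")
server.read(alice, "/shared.txt")
server.read(bob, "/shared.txt")
server.write(alice, "/shared.txt", "v2")
print("/shared.txt" in bob.cache)       # bob's stale copy was invalidated
print(server.read(bob, "/shared.txt"))  # bob refetches the new contents
```

The `cachers` table is exactly the per-client state that makes such a server "stateful": if the server crashes and loses it, it no longer knows whose caches to invalidate.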

22.3 Network File Systems

A network file system (NFS) defines a set of Remote Procedure Call operations for remote access to files:

  1. Directory search, read directory entries
  2. Manipulating links and directories
  3. Accessing file attributes
  4. Read/Write files

NFS changes all requests into RPCs:

    • NFS is good because it doesn't rely on all nodes being the same
    • However, it does not guarantee consistency
    • Everything is done remotely; therefore...
    • It is 10 to 100 times slower than local procedure calls
    • Caching performs poorly as well, because it is all done through RPC. E.g., creating a new file requires four RPC round trips.

Tip: In Linux, the /tmp directory is usually on a local disk rather than on NFS. So you can copy the files you want to compile into /tmp first to get faster compiles.

Sun’s NFS:

  • The standard for UNIX

22-4 Lecture 22: December 8

  • Lots being updated
  • Scattered on disk; slow and non-atomic.

Note: Making inodes bigger or adding more levels of indirection gives you a bigger maximum file size, but each level of indirection costs an extra disk access (too much hopping).
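To make the capacity side of that trade-off concrete, here is a small worked calculation. The parameters (4 KiB blocks, 4-byte block pointers, 12 direct pointers) are assumptions chosen for illustration and are not figures from the lecture. The function assumes the inode holds a fixed number of direct pointers plus one pointer at each indirection level.

```python
def max_file_size(block_size, pointer_size, direct, levels):
    """Largest file, in bytes, addressable by `direct` direct pointers plus
    one single-indirect, one double-indirect, ... pointer up to `levels`."""
    pointers_per_block = block_size // pointer_size
    # A level-k indirect pointer reaches pointers_per_block ** k data blocks.
    blocks = direct + sum(pointers_per_block ** k for k in range(1, levels + 1))
    return blocks * block_size


# Assumed parameters: 4 KiB blocks, 4-byte pointers, 12 direct pointers.
for levels in range(4):
    size = max_file_size(4096, 4, 12, levels)
    print(f"{levels} indirection level(s): {size} bytes")
```

Each added level multiplies the reachable capacity by the number of pointers per block, but reading a block through a level-k pointer takes k extra block reads before the data itself, which is the "hopping" cost.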

22.4.3 Repairing File System Inconsistency

Unix's fsck command is a file system check: it detects and repairs structural problems by marching through the entire disk looking for faults in all the metadata. It is run after power outages and other unclean shutdowns. But:

    • It is SLOW
    • It might not work

The main problem: writes are not atomic, so a crash in the middle of an update leaves the file system in a state that then has to be recovered.

A solution:

22.4.4 Journaling File Systems

  • Automatically write all planned transactions into a log (the journal) before applying them.
  • Recovery in a journaling file system: some updates may not have been fully committed to the file system, so find their journal entries and replay the actions.

Journaling is great for metadata.

    • Fast
    • Guarantees consistency
    • Doesn’t guarantee zero data loss
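The log-then-replay idea can be sketched as below. This is a toy model, not the on-disk format of any real journaling file system: the journal is a Python list that is assumed to survive the crash, and the entry layout, the `crash_after` parameter, and the key names are all made up for illustration.

```python
class JournaledStore:
    def __init__(self):
        self.journal = []   # append-only log, assumed to survive a crash
        self.metadata = {}  # the file system structures updated in place

    def log(self, txid, updates):
        # Step 1: record the planned updates before touching the metadata.
        self.journal.append({"tx": txid, "updates": dict(updates), "committed": False})

    def commit(self, txid):
        # Step 2: mark the transaction as complete in the journal.
        for entry in self.journal:
            if entry["tx"] == txid:
                entry["committed"] = True

    def apply_in_place(self, updates, crash_after=None):
        # Step 3: apply the updates; optionally simulate a crash partway through.
        for i, (key, value) in enumerate(updates.items()):
            if crash_after is not None and i >= crash_after:
                return  # simulated power loss: remaining updates never happen
            self.metadata[key] = value

    def recover(self):
        # Replay every committed transaction; replaying twice is harmless.
        for entry in self.journal:
            if entry["committed"]:
                self.metadata.update(entry["updates"])


store = JournaledStore()
updates = {"inode 7 size": 4096, "free block count": 99}
store.log(1, updates)
store.commit(1)
store.apply_in_place(updates, crash_after=1)  # crash after the first update
print(store.metadata)  # inconsistent: only one of the two updates landed
store.recover()
print(store.metadata)  # consistent again after replay
```

Recovery only has to read the journal, not march through the whole disk the way fsck does. A transaction that was logged but never committed is simply ignored, which is why consistency is guaranteed but zero data loss is not.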

22.4.5 Log-structured File System

    • Preserves data integrity
    • Improves performance, especially for small writes

A log-structured file system schedules periodic compaction (segment cleaning) and maintains an inode map to keep track of where everything is located. If something goes wrong, you roll back to the last saved inode map.
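A minimal sketch of these pieces, assuming one block per file and an in-memory list standing in for the on-disk log. The class and method names are hypothetical, and real log-structured file systems are far more involved.

```python
class TinyLFS:
    def __init__(self):
        self.log = []         # append-only sequence of data blocks
        self.inode_map = {}   # file name -> index of its latest block in the log
        self.checkpoint = {}  # last saved copy of the inode map

    def write(self, name, data):
        self.log.append(data)  # never overwrite in place, always append
        self.inode_map[name] = len(self.log) - 1

    def read(self, name):
        return self.log[self.inode_map[name]]

    def save_checkpoint(self):
        self.checkpoint = dict(self.inode_map)

    def roll_back(self):
        # Old blocks are still in the log, so the old map is still valid.
        self.inode_map = dict(self.checkpoint)

    def compact(self):
        # Copy only the live blocks into a fresh log and drop the dead ones.
        new_log, new_map = [], {}
        for name, index in self.inode_map.items():
            new_log.append(self.log[index])
            new_map[name] = len(new_log) - 1
        self.log, self.inode_map = new_log, new_map
        # Block positions changed, so the old checkpoint no longer applies.
        self.checkpoint = dict(new_map)


fs = TinyLFS()
fs.write("a.txt", "v1")
fs.save_checkpoint()
fs.write("a.txt", "v2")  # appended; the v1 block is still in the log
print(fs.read("a.txt"), len(fs.log))
fs.roll_back()
print(fs.read("a.txt"))  # back to the version recorded in the checkpoint
```

Because every write is an append, small writes become sequential disk writes, and rolling back is just a matter of restoring an older inode map. The price is that dead blocks pile up until compaction reclaims them.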

Examples: Sprite LFS is a log-structured file system; ReiserFS and ext3 are journaling file systems. With Sprite LFS, data is safe and it is faster:

    • Outperforms UNIX FS for small writes and matches it for reads and large writes
    • Utilizes 70 percent of disk bandwidth for writing, even with the overhead of segment cleaning included

LFS structure:

  • Inode maps maintain the location of data, directories, etc.
  • Segments: large free extents for writing new data. LFS extends journaling to data (though hardware RAID can do this too)
  • Checkpoints and roll-forward are part of this.
  • Traditional file systems have integrity problems, but journaling solves them. Journaling is:
    • Fast
    • Stable