File System Implementation: Accessing Files and File System Optimizations, Study notes of Computer Science

An in-depth look at the implementation of file systems, focusing on how to find and access file blocks in DOS and UNIX systems. It also covers file system optimizations such as caching and free-space management, with examples and explanations of various file system concepts.

Typology: Study notes

Pre 2010

Uploaded on 09/24/2009

koofers-user-1pj

Operating Systems
CMPSC 473
File System Implementation
April 10, 2008 - Lecture 21
Instructor: Trent Jaeger

  • Last class:
    • File System Implementation Basics
  • Today:
    • File System Implementation Optimizations

Directory

  • Contains a sequence (table) of entries, one for each file.
  • In DOS, each entry has
    • [Fname, Extension, Attributes, Time, Date, Size, First Block #]
  • In UNIX, each entry has
    • [Fname, i-node #]
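
The two entry layouts above can be written down as simple records. This is only a sketch: the field names and types, and the `lookup` helper, are assumptions made for illustration and do not describe the on-disk formats of either system.

```python
from dataclasses import dataclass


@dataclass
class DosDirEntry:
    # Fields mirror the DOS entry listed in the notes; types are assumed.
    fname: str
    extension: str
    attributes: int
    time: int
    date: int
    size: int
    first_block: int  # number of the file's first block (start of its FAT chain)


@dataclass
class UnixDirEntry:
    # A UNIX entry only maps a name to an i-node number;
    # all other metadata lives in the i-node itself.
    fname: str
    inode_number: int


def lookup(entries, name):
    """Linear scan of a directory table; returns the matching entry or None.

    Simplification: only `fname` is compared, the DOS extension is ignored.
    """
    for entry in entries:
        if entry.fname == name:
            return entry
    return None
```

For example, `lookup([UnixDirEntry("a", 7)], "a").inode_number` gives the i-node number to fetch next, while a DOS lookup would hand back `first_block` directly.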

Accessing a file block in DOS \a\b\c

  • Go to the FAT entry of “\” (in memory)
  • Go to the corresponding data block(s) of “\” to find the entry for “a”
  • Read the 1st data block of “a” to check if “b” is present. Else, use the FAT entry to find the next block of “a” and search again for “b”, and so on. Eventually you will find the entry for “b”.
  • Read the 1st data block of “b” to check if “c” is present. .....
  • Read the relevant block of “c” by chasing the FAT entries in memory.
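
The steps above can be sketched as follows. This is a toy model under stated assumptions, not the real FAT layout: `fat` is a list where `fat[i]` is the block that follows block `i`, `EOF` is a made-up end-of-chain marker, and each directory block is represented as a dict from name to first block number.

```python
EOF = -1  # hypothetical end-of-chain marker


def chain(fat, first_block):
    """Yield the block numbers of a file by following its FAT chain."""
    block = first_block
    while block != EOF:
        yield block
        block = fat[block]


def find_entry(fat, disk, dir_first_block, name):
    """Scan a directory block by block; return the first block # of `name`."""
    for block in chain(fat, dir_first_block):
        entries = disk[block]  # assumed format: dict of name -> first block #
        if name in entries:
            return entries[name]
    raise FileNotFoundError(name)


def read_block(fat, disk, root_first_block, path, n):
    """Return the n-th (0-based) data block of the file named by `path`.

    `path` uses backslash separators, as in the DOS example in the notes.
    """
    first = root_first_block
    for name in path.strip("\\").split("\\"):
        first = find_entry(fat, disk, first, name)
    # Chase FAT entries (all in memory) until the n-th block is reached.
    for i, block in enumerate(chain(fat, first)):
        if i == n:
            return disk[block]
    raise IndexError(n)
```

Note that every step except the final one reads directory data blocks, while the chain itself is followed entirely in the in-memory FAT.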

Accessing a file block in UNIX /a/b/c

  • Get block after block of “b” till the entry for “c” is found (gives its i-node #)
  • Get the i-node of “c” from disk
  • Find out whether the block you are searching for is in the 1st 10 ptrs, or is 1-level, 2-level or 3-level indirect.
  • Based on this, you can either get the block directly or retrieve it after going through the levels of indirection.
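
The "which level is my block in?" decision can be sketched as a small calculation. The 10 direct pointers come from the notes; the number of pointers per indirect block (`ptrs_per_block`) is a parameter here because the notes do not give it, and the function name `locate` is made up for this example.

```python
NUM_DIRECT = 10  # from the notes: the first 10 pointers in the i-node are direct


def locate(block_index, ptrs_per_block):
    """Classify a logical block index within a file.

    Returns (level, offsets): level 0 means a direct pointer, levels 1..3
    mean that many indirect blocks must be read first. `offsets` lists the
    slot to follow at each step, outermost first.
    """
    if block_index < NUM_DIRECT:
        return 0, [block_index]
    rest = block_index - NUM_DIRECT
    for level in (1, 2, 3):
        span = ptrs_per_block ** level  # blocks reachable through this level
        if rest < span:
            offsets = []
            for depth in range(level - 1, -1, -1):
                offsets.append((rest // (ptrs_per_block ** depth)) % ptrs_per_block)
            return level, offsets
        rest -= span
    raise ValueError("block index beyond maximum file size")
```

With an assumed 256 pointers per block, `locate(9, 256)` stays in the i-node itself, while `locate(10, 256)` already costs one extra disk read for the single-indirect block.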
  • Imagine searching through the i-nodes each time you do a read() or write() on a file
  • Too much overhead!
  • However, once you have the i-node of the file (or a FAT entry in DOS), then it is fairly efficient!
  • You want to cache the i-node (or the id of the FAT entry) for a file in memory and keep re-using it.

  • Even after all this (i.e. bringing the pointers to blocks of a file into memory), it may not suffice, since we still need to go to disk to get the blocks themselves.
  • How do we address this problem?
    • Cache disk (data) blocks in main memory; this is called file caching

File Caching/Buffering

  • Cache the disk blocks that are needed in physical memory.
  • On a read() system call, first look up this cache to check if the block is present.
    • This is done in software
    • Lookup is done based on logical block id.
    • Typically uses some kind of “hashing”
  • If present, copy the block from the OS cache/buffer into the data structure passed by the user in the read() call.
  • Else, read the block from disk, put it in the OS cache and then copy it to the user data structure.
  • On a write, should we do write-back or write-through?
  • With write-back, you may lose data that is written if the machine goes down before the write-back
  • With write-through, you may be losing performance
    • Loss in opportunity to perform several writes at a time
    • Perhaps the write may not even be needed!
  • DOS uses write-through
  • In UNIX,
    • writes are buffered, and they are propagated in the background after a delay, i.e. every 30 secs there is a sync() call which propagates dirty blocks to disk.
    • This is usually done in the background.
    • Metadata (directories/i-nodes) writes are propagated immediately.
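
A minimal sketch of the read path and the two write policies described above. Everything here is an illustration: the class name `BufferCache`, the dict standing in for the disk, and the read/write counters are assumptions, and the periodic background sync() is reduced to an explicit method call.

```python
class BufferCache:
    def __init__(self, disk, write_through=False):
        self.disk = disk                  # assumed backing store: dict block id -> data
        self.write_through = write_through
        self.blocks = {}                  # hash table keyed by logical block id
        self.dirty = set()                # blocks modified but not yet on disk
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, block_id):
        if block_id not in self.blocks:   # miss: fetch from disk into the cache
            self.blocks[block_id] = self.disk[block_id]
            self.disk_reads += 1
        return self.blocks[block_id]      # stands in for the copy to the user buffer

    def write(self, block_id, data):
        self.blocks[block_id] = data
        if self.write_through:
            self._flush(block_id)         # every write goes to disk immediately
        else:
            self.dirty.add(block_id)      # write-back: defer until sync()

    def _flush(self, block_id):
        self.disk[block_id] = self.blocks[block_id]
        self.disk_writes += 1

    def sync(self):
        """Propagate all dirty blocks to disk (what the periodic sync does)."""
        for block_id in sorted(self.dirty):
            self._flush(block_id)
        self.dirty.clear()
```

Writing the same block three times costs three disk writes with `write_through=True` but only one at the next `sync()` with write-back, which is the lost opportunity the notes refer to. The price is that anything still in `dirty` disappears if the machine goes down first.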

Cache space is limited!

  • We need a replacement algorithm.
  • Here we can use LRU, since the OS gets called on each reference to a block and the management is done in software.
  • However, you typically do not evict on demand, i.e. only at the moment a free block is needed!
  • Use High and Low water marks:
    • When the # of free blocks falls below the Low water mark, evict blocks from memory till it reaches the High water mark.
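
One way to combine LRU with the two water marks is sketched below. The class and parameter names are made up for the example, and eviction runs inline inside `access` for simplicity, whereas a real OS would more likely do it from a background activity.

```python
from collections import OrderedDict


class LruBlockCache:
    def __init__(self, capacity, low_water, high_water):
        # high_water < capacity guarantees the most recent block survives eviction.
        assert 0 < low_water < high_water < capacity
        self.capacity = capacity
        self.low_water = low_water
        self.high_water = high_water
        self.blocks = OrderedDict()  # ordered from least to most recently used

    def free_blocks(self):
        return self.capacity - len(self.blocks)

    def access(self, block_id, data=None):
        """Reference a block; the OS sees every reference, so LRU order is exact."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # now the most recently used
        else:
            self.blocks[block_id] = data
        value = self.blocks[block_id]
        if self.free_blocks() < self.low_water:
            self.evict_to_high_water()
        return value

    def evict_to_high_water(self):
        """Evict least recently used blocks until free space reaches the high mark."""
        while self.free_blocks() < self.high_water:
            self.blocks.popitem(last=False)
```

Evicting in a batch means most misses find a free slot already waiting, instead of paying for an eviction (and possibly a disk write) on the critical path of each one.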

Block Sizes

  • Larger block sizes => higher internal fragmentation.
  • Larger block sizes => higher disk transfer rates
  • Median file size in UNIX environments ~ 1K
  • Typical block sizes are of the order of 512, 1K or 2K bytes.
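
The trade-off can be made concrete with a little arithmetic. The helper names below are made up for this sketch; it only counts whole blocks and ignores metadata.

```python
def blocks_needed(file_size, block_size):
    """Number of blocks (and hence block transfers) to hold the whole file."""
    return -(-file_size // block_size)  # ceiling division


def wasted_bytes(file_size, block_size):
    """Internal fragmentation: unused bytes in the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size
```

For a 1500-byte file, a 512-byte block size needs 3 blocks and wastes 36 bytes, while a 2K block size needs a single transfer but wastes 548 bytes.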

Free Space

  • Find the block to use when one is needed
    • Find space quickly
    • Keep storage reasonable
  • Options
    • Bit vector
    • Linked List
    • Grouping
    • Counting

Free-Space Management

  • Bit vector downside
    • Space
  • Example:
    • block size = 2^12 bytes
    • disk size = 2^30 bytes (1 gigabyte)
    • n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
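
The arithmetic in the example, together with a toy bit vector, can be sketched as follows. The convention that a 1 bit means "free", and the names `bitmap_size_bytes` and `FreeBitmap`, are assumptions for illustration.

```python
def bitmap_size_bytes(disk_bytes, block_bytes):
    """One bit per block, rounded up to whole bytes."""
    n_blocks = disk_bytes // block_bytes
    return (n_blocks + 7) // 8


class FreeBitmap:
    def __init__(self, n_blocks):
        self.bits = bytearray((n_blocks + 7) // 8)
        for block in range(n_blocks):     # assumed convention: 1 = free
            self.free(block)

    def allocate(self):
        """Return the lowest-numbered free block and mark it used."""
        for byte_index, byte in enumerate(self.bits):
            if byte:                      # skip bytes with no free block
                bit = (byte & -byte).bit_length() - 1   # lowest set bit
                self.bits[byte_index] &= ~(1 << bit) & 0xFF
                return byte_index * 8 + bit
        raise OSError("no free blocks")

    def free(self, block):
        self.bits[block // 8] |= 1 << (block % 8)
```

`bitmap_size_bytes(2**30, 2**12)` returns 32768, matching the 32K bytes above. The vector grows linearly with disk size, which is the space downside the notes mention.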

Free-Space Linked List