Log Structured FS, Lecture Slide - Computer Science, Slides of Computer Numerical Control

Log structure, File systems, Motivation, LFS writes, Floating inodes, LFS data structure, Compaction, Threading, Cleaning process, Cost benefit analysis, Postscript, Array of disk

Typology: Slides

2010/2011

Uploaded on 10/07/2011

Log Structured FS

Arvind Krishnamurthy

Spring 2004

Log Structured File Systems

- Radical, different approach to designing file systems
- Technology motivations: some technologies are advancing faster than others
  - CPUs are getting faster every year (x2 every 1-2 years)
  - Everything except the CPU will become a bottleneck (Amdahl’s law)
  - Disks are not getting much faster
  - Memory is growing in size dramatically (x2 every 1.5 years)
  - File systems → file caches are a good idea (cut down on disk bandwidth)

Motivation (contd.)

- File system motivations:
  - File caches help reads a lot
  - File caches do not help writes very much
    - Delayed writes help, but writes cannot be delayed forever
  - File caches make disk writes more frequent than disk reads
  - Files are mostly small -- too much synchronous I/O
  - Disk geometries are not predictable
  - RAID: a whole bunch of disks with data striped across them
    - Increases bandwidth, but does not change latency
    - Does not help small files (more on this later)

LFS Writes

- Treat disk as a tape!
  - Buffer recent writes in memory
  - Log is append-only: no overwrite in place
  - Log is the only thing on disk! It is the main storage structure
- When you create a small file (less than a block):
  - Write data block to memory log
  - Write file inode to memory log
  - Write directory block to memory log
  - Write directory inode to memory log
  - When the memory log accumulates to, say, 1 MB, or, say, 30 seconds have elapsed, write the log to disk as a single write
- No seeks for writes
- But inodes are now floating
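
The buffering rule above can be sketched in a few lines. This is only an illustration under assumptions: `LogBuffer`, `create_small_file`, and the list used as a stand-in for the disk are hypothetical names, and the 1 MB / 30 second limits are the "say" figures from the slide rather than fixed constants of any real LFS.

```python
import time

SEGMENT_BYTES = 1 << 20   # flush once about 1 MB is buffered (slide's example figure)
MAX_AGE_SECONDS = 30.0    # or once 30 seconds have elapsed (slide's example figure)


class LogBuffer:
    """In-memory tail of the log; `disk` is a plain list standing in for the device."""

    def __init__(self, disk, now=time.monotonic):
        self.disk = disk
        self.now = now
        self.pending = []        # (kind, payload) records not yet on disk
        self.pending_bytes = 0
        self.oldest = None       # arrival time of the first pending record

    def append(self, kind, payload):
        if self.oldest is None:
            self.oldest = self.now()
        self.pending.append((kind, payload))
        self.pending_bytes += len(payload)
        if self.should_flush():
            self.flush()

    def should_flush(self):
        if not self.pending:
            return False
        too_big = self.pending_bytes >= SEGMENT_BYTES
        too_old = self.now() - self.oldest >= MAX_AGE_SECONDS
        return too_big or too_old

    def flush(self):
        # One sequential write: every buffered record goes out together.
        if self.pending:
            self.disk.append(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0
            self.oldest = None


def create_small_file(log, data, file_inode, dir_block, dir_inode):
    # The four records the slide lists for creating a file smaller than a block.
    log.append("data", data)
    log.append("inode", file_inode)
    log.append("dir_block", dir_block)
    log.append("dir_inode", dir_inode)
```

Creating a file touches four structures but costs no disk write until the buffer fills or ages out, at which point everything leaves memory as one append.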

LFS: floating inodes

When write:
- Append data, inode, and piece of inode map to the log
- Record location of the piece of inode map in the map of inode map (in memory)
- Checkpoint the map of inode map once in a while

LFS Data structures

When read:
- From map of map, to inode map, to inode, to block
- Get some locality in inode map
- Cache a lot of hot pieces of inode map
- Number of I/Os per read: a little worse than FFS
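
A toy model may make the two levels of indirection easier to follow. It is a sketch under assumptions, not the Sprite layout: `TinyLFS`, `PIECE_SIZE`, and the use of a Python list as the log (with list indices as disk addresses) are all hypothetical.

```python
PIECE_SIZE = 4   # inodes covered by one piece of the inode map (hypothetical value)


class TinyLFS:
    def __init__(self):
        self.log = []          # append-only; a log address is an index into this list
        self.map_of_map = {}   # piece number -> log address of that inode-map piece

    def _append(self, record):
        self.log.append(record)
        return len(self.log) - 1

    def _load_piece(self, piece_no):
        addr = self.map_of_map.get(piece_no)
        return dict(self.log[addr]) if addr is not None else {}

    def write(self, inum, data):
        # Append the data, then the inode, then the updated inode-map piece.
        data_addr = self._append(data)
        inode_addr = self._append({"inum": inum, "block": data_addr})
        piece_no = inum // PIECE_SIZE
        piece = self._load_piece(piece_no)
        piece[inum] = inode_addr
        # Only the in-memory map of map is updated in place.
        self.map_of_map[piece_no] = self._append(piece)

    def read(self, inum):
        # map of map -> inode-map piece -> inode -> data block
        piece = self.log[self.map_of_map[inum // PIECE_SIZE]]
        inode = self.log[piece[inum]]
        return self.log[inode["block"]]
```

Rewriting a file appends three new records and leaves the old ones in the log as dead data, which is what later makes cleaning necessary.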

LFS Data structures (contd.)

When recover:
- Read checkpoint, get map of map
- Roll forward in log to update map of map
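
Recovery can be sketched the same way. The record format `(kind, piece_no, payload)` and the checkpoint dictionary below are assumptions made for illustration; the slide only says that the checkpoint holds the map of map and that the log is replayed from there.

```python
def checkpoint(log, map_of_map):
    # Snapshot the map of map together with the log length it reflects.
    return {"map_of_map": dict(map_of_map), "log_end": len(log)}


def recover(log, ckpt):
    # Start from the checkpointed map of map ...
    map_of_map = dict(ckpt["map_of_map"])
    # ... then roll forward over the records appended after the checkpoint.
    for addr in range(ckpt["log_end"], len(log)):
        kind, piece_no, _payload = log[addr]
        if kind == "imap_piece":
            map_of_map[piece_no] = addr   # the newest copy of a piece wins
    return map_of_map
```

Only the log tail written since the last checkpoint has to be scanned, so recovery time depends on how often checkpoints are taken rather than on disk size.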

Wrap Around Problem

- Pretty soon you run out of space on the disk
- Log needs to wrap around
- Two approaches:
  - Compaction
  - Threading
- Sprite (first implementation of LFS):
  - Combination of the two; open up free segments & avoid copying

Combined Solution

- Want benefits of both:
  - Compaction: big free space
  - Threading: leave long-living things in place so they aren’t copied again and again
- Solution: “segmented log”
  - Chop disk into a bunch of large “segments”
  - Compaction within segments
  - Threading among segments
  - Always write to the “current clean” segment before moving on to the next one
  - Segment cleaner: pick some segments and collect their live data together
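
The segment cleaner's job can be sketched as follows. The dictionary of segments, the `None` marker for dead blocks, and `SEGMENT_BLOCKS` are hypothetical simplifications; the sketch also assumes enough clean segments are supplied to hold the live data.

```python
SEGMENT_BLOCKS = 4   # blocks per segment (hypothetical; real segments are large)


def clean(segments, victims, clean_ids):
    """Compact the live blocks of `victims` into segments taken from `clean_ids`.

    `segments` maps segment id -> list of blocks, where None marks a dead block.
    Returns the list of segment ids that are free afterwards.
    """
    live = [b for v in victims for b in segments[v] if b is not None]
    free = list(clean_ids)
    for start in range(0, len(live), SEGMENT_BLOCKS):
        target = free.pop(0)                      # write into a clean segment
        segments[target] = live[start:start + SEGMENT_BLOCKS]
    for v in victims:
        segments[v] = []                          # the whole victim is reusable now
    return free + list(victims)
```

Segments that are not chosen as victims are left untouched, which is the threading half of the scheme: long-lived data in full segments is never copied.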

Recap

- In LFS, everything is stored in a single log
  - Carry over the data blocks and I-node data structures from Unix
  - Buffer writes and write them to disk as a sequential log
  - Use inode-map and inode-map-map to keep track of floating I-nodes
  - Cache (in memory) typically minimizes the cost of the extra levels of indirection
    - Inode-map-map and pieces of inode-map are cached in memory

Cleaning

- Eventually the log could fill the entire disk
- Reclaim the holes in the log. Two approaches:
  - Compaction of entire disk
  - Threading over live data
- LFS uses a hybrid strategy: it divides the disk into “segments”
  - Threads over non-empty segments
  - Segments guarantee that seek costs are amortized
  - Every once in a while, picks a few segments and compacts them to generate empty segments

Cleaning Process

- When to clean?
  - When the number of free segments falls below a certain threshold
- Choosing a segment to clean:
  - Will be based on the amount of live data it contains
  - Segment usage table: tracks the number of live bytes in each segment
  - When you rewrite I-nodes/data blocks, find the old segment in which they used to live, and decrement the usage count for the old segment
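
A minimal sketch of the bookkeeping described above, assuming a hypothetical `SegmentUsageTable` class that also remembers where each block currently lives (a real implementation would find the old location through the inode):

```python
class SegmentUsageTable:
    def __init__(self, num_segments, free_threshold):
        self.live_bytes = [0] * num_segments
        self.free_threshold = free_threshold
        self.location = {}    # block id -> (segment, size) of its current copy

    def record_write(self, block_id, segment, size):
        # A rewrite leaves a dead copy behind: decrement the old segment's count.
        old = self.location.get(block_id)
        if old is not None:
            old_segment, old_size = old
            self.live_bytes[old_segment] -= old_size
        self.live_bytes[segment] += size
        self.location[block_id] = (segment, size)

    def free_segments(self):
        return [s for s, n in enumerate(self.live_bytes) if n == 0]

    def should_clean(self):
        # Clean when the number of free segments falls below the threshold.
        return len(self.free_segments()) < self.free_threshold
```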

Cleaning Goals

- Want bimodal distribution:
  - Small number of low-utilized segments
    - So that cleaner can always find easy segments to clean
  - Large number of highly-utilized segments
    - So that disk is well utilized

[Figure: number of segments (# segs) plotted against segment utilization u]

Greedy Cleaner

- Greedy cleaner: pick the lowest “u” to clean
- Workload #1: uniform (pick random files to overwrite)
- Workload #2: hot-cold workload (90% of the updates to 10% of the files)
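
The greedy policy is simple enough to state as code. In this sketch `u` is taken to be live bytes divided by segment size, and the function names are hypothetical:

```python
def utilization(live_bytes, segment_bytes):
    return live_bytes / segment_bytes


def pick_greedy(live_bytes_per_segment, segment_bytes, count):
    """Return the ids of the `count` non-empty segments with the lowest u."""
    candidates = [
        (utilization(n, segment_bytes), seg)
        for seg, n in enumerate(live_bytes_per_segment)
        if n > 0                       # empty segments are already clean
    ]
    return [seg for _u, seg in sorted(candidates)[:count]]
```

For example, with live byte counts `[900, 0, 100, 500, 300]` and 1000-byte segments, asking for two victims selects segments 2 and 4.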

Greedy Cleaner

- Greedy strategy is not creating a bimodal distribution
- Slow-moving (cold) segments linger just above the cleaning threshold, which is likely to push the threshold high
- Separation of data into hot & cold data also didn’t help

Better Approach

- Cold segment space is more valuable: if you clean cold segments, it takes them longer to accumulate dead space and come back for cleaning
- Hot free space is less valuable: more of a hot segment's data will die soon anyway, so might as well wait a bit longer
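
The slide does not give a formula for this trade-off. The sketch below uses the benefit-to-cost ratio from the Sprite LFS paper by Rosenblum and Ousterhout, (1 - u) * age / (1 + u), where age stands for how long the segment's data has been stable; the function names and the sample numbers are hypothetical.

```python
def benefit_to_cost(u, age):
    """Ratio (1 - u) * age / (1 + u) for a segment with utilization u in [0, 1]."""
    # Benefit: free space gained (1 - u), weighted by how long it should stay free.
    # Cost: read the whole segment (1) and write back its live data (u).
    return (1.0 - u) * age / (1.0 + u)


def pick_cost_benefit(segments, count):
    """`segments` maps segment id -> (u, age); return the ids with the highest ratio."""
    ranked = sorted(segments, key=lambda s: benefit_to_cost(*segments[s]), reverse=True)
    return ranked[:count]
```

With a hot segment at u = 0.5 and age 1 and a cold segment at u = 0.75 and age 20, this policy cleans the cold segment first, whereas a greedy cleaner would pick the hot one because its utilization is lower.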

When is LFS good?

- LFS does well on “common” cases
- LFS degrades for “corner” cases

Why is this good research?

- Driven by keen awareness of technology trends
- Willing to radically depart from conventional practice
- Yet keeps sufficient compatibility to keep things simple and limit grunge work
- Provides insight with simplified math
- Simulation to evaluate and validate ideas
- Solid real implementation and measurements

Announcements

- Design review meetings:
  - Tomorrow from 2-4pm
  - Thursday from 2-4pm with Zheng Ma
- Suggested background readings:
  - RAID paper
  - Unix Time Sharing System paper

RAIDs and availability

- Suppose you need to store more data than fits on a single disk (e.g., large database or file servers). How should you arrange data across disks?
- Option 1: treat disks as a huge pool of disk blocks
  - Disk1 has blocks 1, 2, …, N
  - Disk2 has blocks N+1, N+2, …, 2N
  - …
- Option 2: stripe data across disks, with k disks:
  - Disk1 has blocks 1, k+1, 2k+1, …
  - Disk2 has blocks 2, k+2, 2k+2, …
  - …
- What are the advantages/disadvantages of the two options?
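
The two placements differ only in the arithmetic that maps a block number to a disk. A small sketch, using the slide's 1-based numbering for both blocks and disks (the function names are hypothetical):

```python
def contiguous_disk(block, blocks_per_disk):
    """Option 1: disk 1 holds blocks 1..N, disk 2 holds N+1..2N, and so on."""
    return (block - 1) // blocks_per_disk + 1


def striped_disk(block, num_disks):
    """Option 2: disk 1 holds blocks 1, k+1, 2k+1, ...; disk 2 holds 2, k+2, ..."""
    return (block - 1) % num_disks + 1


def disks_touched(blocks, place):
    # Which disks does a request for these blocks keep busy?
    return sorted({place(b) for b in blocks})
```

Reading blocks 1 through 4 with N = 100 and k = 4 touches only disk 1 under option 1 but all four disks under option 2, which is where striping gets its bandwidth.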

Writes to RAID 4

- Large writes that access all disks (say, a stripe of blocks)
  - Compute the parity block and store it on the parity disk
- Small writes. Two options:
  - Read current stripe of blocks, compute parity with the new block, write the new block and the parity block
  - Better option:
    - Read current version of block being written
    - Read current version of parity block
    - Compute how parity would change:
      - If a bit on the block changed, the corresponding parity bit needs to be flipped
    - Write new version of block
    - Write new version of parity block
- Disk containing parity block is updated on all writes
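
Both ways of computing parity come down to XOR. The sketch below models blocks as equal-length `bytes` objects; the function names are hypothetical.

```python
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(data_blocks):
    # Large write: parity is the XOR of every data block in the stripe.
    parity = bytes(len(data_blocks[0]))
    for block in data_blocks:
        parity = xor_blocks(parity, block)
    return parity


def small_write_parity(old_block, new_block, old_parity):
    # Bits that differ between old and new data are exactly the parity bits to flip.
    changed = xor_blocks(old_block, new_block)
    return xor_blocks(old_parity, changed)
```

The small-write path needs two reads and two writes no matter how wide the stripe is, and it produces the same parity that recomputing over the whole stripe would.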

Distributed Parity

- Parity blocks are distributed across disks
  - Spreads load evenly
  - Multiple writes could potentially be serviced at the same time
  - All disks can be used for servicing reads
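
The slide does not say how parity is placed. One simple possibility, assumed here purely for illustration, is to rotate the parity block by one disk per stripe (disks are numbered from 0 in this sketch, and the function names are hypothetical):

```python
def parity_disk(stripe, num_disks):
    """Disk (0-based) holding the parity block of `stripe` under simple rotation."""
    return stripe % num_disks


def data_disks(stripe, num_disks):
    p = parity_disk(stripe, num_disks)
    return [d for d in range(num_disks) if d != p]


def small_write_disks(stripe, data_disk, num_disks):
    # A small write touches the data disk and that stripe's parity disk.
    return {data_disk, parity_disk(stripe, num_disks)}
```

With five disks, a small write to stripe 0 on data disk 2 and another to stripe 1 on data disk 3 use disjoint disk pairs, so they can proceed in parallel, which a single dedicated parity disk would not allow.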

Comparison

- RAID-5 vs. normal disks:
  - RAID-5: better throughput, better reliability, good bandwidth for large reads, small waste of space
  - Normal disks: perform better for small writes
- RAID-1 vs. RAID-5: Which is better?
  - RAID-1 wastes more space
  - For small writes: RAID-1 is better
- HP-AutoRAID system:
  - Stores hot data in RAID-
  - Cold data in RAID-
  - Does automatic background propagation of data as working set changes