Membrane: Operating System Support for Restartable File Systems

Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale,

Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift

Computer Sciences Department, University of Wisconsin, Madison

Abstract

We introduce Membrane, a set of changes to the oper-

ating system to support restartable file systems. Mem-

brane allows an operating system to tolerate a broad

class of file system failures and does so while remain-

ing transparent to running applications; upon failure, the

file system restarts, its state is restored, and pending ap-

plication requests are serviced as if no failure had oc-

curred. Membrane provides transparent recovery through

a lightweight logging and checkpoint infrastructure, and

includes novel techniques to improve performance and

correctness of its fault-anticipation and recovery machin-

ery. We tested Membrane with ext2, ext3, and VFAT.

Through experimentation, we show that Membrane induces

little performance overhead and can tolerate a wide

range of file system crashes. More critically, Membrane

does so with little or no change to existing file systems,

thus improving robustness to crashes without mandating

intrusive changes to existing file-system code.

1 Introduction

Operating systems crash. Whether due to software

bugs [8] or hardware bit-flips [22], the reality is clear:

large code bases are brittle and the smallest problem in

software implementation or hardware environment can

lead the entire monolithic operating system to fail.

Recent research has made great headway in operating-

system crash tolerance, particularly in surviving device

driver failures [9, 10, 13, 14, 20, 31, 32, 37, 40]. Many

of these approaches achieve some level of fault toler-

ance by building a hard wall around OS subsystems using

address-space based isolation and microrebooting [2, 3]

said drivers upon fault detection. For example, Nooks

(and follow-on work with Shadow Drivers) encapsulate

device drivers in their own protection domain, thus mak-

ing it challenging for errant driver code to overwrite data

in other parts of the kernel [31, 32]. Other approaches

are similar, using variants of microkernel-based architec-

tures [7, 13, 37] or virtual machines [10, 20] to isolate

drivers from the kernel.

Device drivers are not the only OS subsystem, nor are

they necessarily where the most important bugs reside.

Many recent studies have shown that file systems contain

a large number of bugs [5, 8, 11, 25, 38, 39]. Perhaps

this is not surprising, as file systems are one of the largest

and most complex code bases in the kernel. Further,

file systems are still under active development, and new

ones are introduced quite frequently. For example, Linux

has many established file systems, including ext2 [34],

ext3 [35], and reiserfs [27], and still there is great interest in

next-generation file systems such as Linux ext4 and btrfs.

Thus, file systems are large, complex, and under develop-

ment, the perfect storm for numerous bugs to arise.

Because of the likely presence of flaws in their imple-

mentation, it is critical to consider how to recover from

file system crashes as well. Unfortunately, we cannot di-

rectly apply previous work from the device-driver litera-

ture to improving file-system fault recovery. File systems,

unlike device drivers, are extremely stateful, as they man-

age vast amounts of both in-memory and persistent data;

making matters worse is the fact that file systems spread

such state across many parts of the kernel including the

page cache, dynamically-allocated memory, and so forth.

On-disk state of the file system also needs to be consis-

tent upon restart to avoid any damage to the stored data.

Thus, when a file system crashes, a great deal more care is

required to recover while keeping the rest of the OS intact.

In this paper, we introduce Membrane, an operating

system framework to support lightweight, stateful recov-

ery from file system crashes. During normal operation,

Membrane logs file system operations, tracks file system

objects, and periodically performs lightweight checkpoints

of file system state. If a file system crash occurs,

Membrane parks pending requests, cleans up existing

state, restarts the file system from the most recent

checkpoint, and replays the in-memory operation log to

restore the state of the file system. Once finished with re-

covery, Membrane begins to service application requests

again; applications are unaware of the crash and restart

except for a small performance blip during recovery.
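The cycle described above (log operations, checkpoint periodically, restart from the checkpoint, replay the log, then resume) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not Membrane's implementation: the names `ToyFileSystem`, `ToyMembrane`, `submit`, and `recover` are hypothetical, file system state is reduced to one file position per descriptor, and a crash is simulated with an exception.

```python
import copy


class ToyFileSystem:
    """Stand-in for in-memory file system state: one offset per open file."""

    def __init__(self):
        self.positions = {}  # fd -> current file position in bytes

    def apply(self, op):
        kind, fd, nbytes = op
        if kind == "open":
            self.positions[fd] = 0
        elif kind == "write":
            self.positions[fd] += nbytes
        else:
            raise ValueError(f"unknown operation: {kind}")


class ToyMembrane:
    """Hypothetical recovery layer: a checkpoint plus an in-memory op log."""

    def __init__(self):
        self.fs = ToyFileSystem()
        self.checkpoint_state = copy.deepcopy(self.fs.positions)
        self.log = []  # operations completed since the last checkpoint

    def checkpoint(self):
        # Snapshot current state; earlier log entries are no longer needed.
        self.checkpoint_state = copy.deepcopy(self.fs.positions)
        self.log.clear()

    def recover(self):
        # Restart: discard the crashed instance, restore the checkpoint,
        # then replay operations that completed after the checkpoint.
        self.fs = ToyFileSystem()
        self.fs.positions = copy.deepcopy(self.checkpoint_state)
        for op in self.log:
            self.fs.apply(op)

    def submit(self, op, crash=False):
        try:
            if crash:
                raise RuntimeError("simulated file system crash")
            self.fs.apply(op)
        except RuntimeError:
            self.recover()
            self.fs.apply(op)  # retry the request that was in flight
        self.log.append(op)


def demo():
    m = ToyMembrane()
    m.submit(("open", 3, 0))
    m.submit(("write", 3, 4096))
    m.checkpoint()
    m.submit(("write", 3, 4096))
    m.submit(("write", 3, 4096), crash=True)  # crash, recover, retry
    return m.fs.positions[3]


print(demo())  # 12288
```

The caller of `submit` never sees the exception, which is the sense in which recovery is transparent in this sketch: the final file position is the same as if no crash had occurred.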

Membrane achieves its performance and robustness

through the application of a number of novel mechanisms.

For example, a generic checkpointing mechanism enables

low-cost snapshots of file-system state that serve as recovery

points after a crash with minimal support from existing

file systems. A page stealing technique greatly reduces

logging overheads of write operations, which would

otherwise increase time and space overheads. Finally, an

intricate skip/trust unwind protocol is applied to carefully

unwind in-kernel threads through both the crashed file


2.1 Nooks and Shadow Drivers

2.2 SafeDrive

2.3 CuriOS

3.3 Overview

[Figure: timeline of open("file") followed by write(4K) operations w0, w1, and w2 on FD 3. The file position is shown as 0, then 4K, then 8K as calls return success, and as "??" after the crash. Labeled events: Checkpoint, Crash!, Rollback, Unwind, Replay(w1), Retry(w2).]

3.4 Fault Detection

3.5 Fault Anticipation

4.1 Linux Background

4.2 Fault Detection

4.3 Fault Anticipation

[Figure: call path of an open() system call through ext2: sys_open(), do_sys_open(), filp_open(), open_namei(), vfs_create(), ext2_create(), ext2_addlink(), ext2_prepare_write(), block_prepare_write(), ext2_get_block(). Cleanup annotations: do_sys_open() cleanup (release fd); open_namei() cleanup (release namei data); block_prepare_write() cleanup (clear buffer, zero page, mark not dirty). Two points are labeled "fault membrane" and two are labeled "skip".]
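The call path and cleanup labels in the figure above suggest how a thread might be unwound after a fault: frames belonging to the crashed file system are skipped, while kernel frames are trusted to run their own cleanup. The following is a rough sketch of that idea only. The frame list and cleanup labels are taken from the figure; the `unwind` function, the "kernel"/"fs" ownership tags, and the trace format are assumptions made for illustration and are not the paper's mechanism.

```python
# Each frame: (function name, owner, cleanup actions).
# Names and cleanup labels follow the figure; ownership tags are assumed
# from the ext2_ prefix. Frames with no cleanup shown get an empty list.
CALL_STACK = [
    ("sys_open", "kernel", []),
    ("do_sys_open", "kernel", ["release fd"]),
    ("filp_open", "kernel", []),
    ("open_namei", "kernel", ["release namei data"]),
    ("vfs_create", "kernel", []),
    ("ext2_create", "fs", []),
    ("ext2_addlink", "fs", []),
    ("ext2_prepare_write", "fs", []),
    ("block_prepare_write", "kernel",
     ["clear buffer", "zero page", "mark not dirty"]),
    ("ext2_get_block", "fs", []),
]


def unwind(stack):
    """Walk from the innermost (faulting) frame outward.

    File system frames are skipped without running any of their code;
    kernel frames are trusted to perform their listed cleanup actions.
    """
    trace = []
    for name, owner, cleanup in reversed(stack):
        if owner == "fs":
            trace.append(f"skip {name}")
        else:
            for action in cleanup:
                trace.append(f"{name}: {action}")
    return trace


for step in unwind(CALL_STACK):
    print(step)
```

In this sketch the boundary between a skipped file system frame and the kernel frame above it is where a fault membrane would sit, so that the kernel caller sees an error return rather than resuming inside the crashed code.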

5.1 Transparency

5.3 Generality