






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The concept of file systems, which are used to organize and manage data on data storage devices such as hard disk drives, floppy disks, optical discs, or flash memory storage devices. It covers aspects such as space management, file names, directories, and metadata. It also discusses the utilities that are included in file systems to initialize, alter parameters of and remove an instance of the filesystem, and create, rename and delete directory entries and alter metadata associated with a directory.
Typology: Summaries
1 / 12
This page cannot be seen from the preview
Don't miss anything!







A file system (or filesystem ) is a means to organize data expected to be retained after a program terminates by providing procedures to store, retrieve and update data, as well as manage the available space on the device(s) which contain it. A file system organizes data in an efficient manner and is tuned to the specific characteristics of the device. There is usually a tight coupling between the operating system and the file system. Some filesystems provide mechanisms to control access to the data and metadata. Ensuring reliability is a major responsibility of a filesystem. Some filesystems provide a means for multiple programs to update data in the same file at nearly the same time. File systems are used on data storage devices such as hard disk drives, floppy disks, optical discs, or flash memory storage devices to maintain the physical location of the computer files. They may provide access to data on a file server by acting as clients for a network protocol (e.g. NFS, SMB, or 9P clients), or they may be virtual and exist only as an access method for virtual data (e.g. procfs). This is distinguished from a directory service and registry.
Example of slack space, demonstrated with 4,096-byte NTFS clusters: 100,000 files, each 5 bytes per file, equals 500, bytes of actual data, but requires 409,600,000 bytes of disk space to store File systems allocate space in a granular manner, usually multiple physical units on the device. The file system is responsible for organizing files and directories, and keeping track of which areas of the media belong to which file and which are not being used. For example, in Apple DOS of the early 1980s, 256- byte sectors on 140 kilobyte floppy disk used a track/sector map. This results in unused space when a file is not an exact multiple of the allocation unit, sometimes referred to as slack space. For a 512-byte allocation, the average unused space is 255 bytes. For a 64 KB clusters, the average unused space is 32KB. The size of the allocation unit is chosen when the file system is created. Choosing the allocation size based on the average size of the files expected to be in the filesystem can minimize the amount of unusable space. Frequently the default allocation may provide reasonable usage. If it can be anticipated that a file system will contain mostly small files a small cluster size should be chosen. Choosing an allocation size that is too small results in excessive overhead if the file system will contain mostly very large files. File system fragmentation occurs when unused space or single files are not contiguous. As a filesystem is used, files are created, modified and deleted. When a file is created the filesystem allocates space for the data. Some filesystems permit or require specifying an initial space allocation and subsequent incremental allocations as the file grows. As files are deleted the space they were allocated eventually is considered available for use by other files. This creates alternating used and unused areas of various sizes. This is free space fragmentation. When a file is created and there is not an area of contiguous space available for its initial allocation the space must be assigned in fragments. When a file is modified such that it becomes larger it may exceed the space initially allocated to it, another allocation must be assigned elsewhere and the file becomes fragmented. A file system may not make use of a storage device but can be used to organize and represent access to any data, whether it is stored or dynamically generated (e.g. procfs).
A file name (or filename ) is used to reference the storage location in the filesystem. Most filesystems have restrictions on the length of the filename. Some filesystems have case insensitive filenames. Most file system interface utilities place restrictions on the characters permitted in the filename restricting some special characters to provide a syntax to indicate a device, device type, directory prefix or file type.
These are typically not file system restrictions and utilities may provide a means to refer to files with embedded special characters such as enclosing the entire filename within quotes ("). Avoiding using special characters makes it easier for users to refer to files. Some filesystem utilities, editors and compilers treat prefixes and suffixes in a special way. These are usually merely conventions and not implemented within the filesystem.
File systems typically have directories (sometimes called folders) which allow the user to group files. This may be implemented by connecting the file name to an index in a table of contents or an inode in a Unix- like file system. Directory structures may be flat (i.e. linear), or allow hierarchies where directories may contain subdirectories. The first file system to support arbitrary hierarchies of directories was the file system in the Multics operating system.[1]^ The native file systems of Unix-like systems also support arbitrary directory hierarchies, as do, for example, Apple's Hierarchical File System and its successor HFS+ in classic Mac OS (HFS+ is still used in Mac OS X), the FAT file system in MS-DOS 2. and later andMicrosoft Windows, the NTFS file system in the Windows NT family of operating systems, and the ODS-2 and higher levels of the Files-11 file system in OpenVMS.
Other bookkeeping information is typically associated with each file within a file system. The length of the data contained in a file may be stored as the number of blocks allocated for the file or as a byte count. The time that the file was last modified may be stored as the file's timestamp. File systems might store the file creation time, the time it was last accessed, the time the file's meta-data was changed, or the time the file was last backed up. Other information can include the file's device type (e.g. block, character, socket, subdirectory, etc.), its owner user ID and group ID, and its access permission settings (e.g. whether the file is read-only, executable, etc.). Additional attributes can be associated on file systems, such as NTFS, XFS, ext2/ext3, some versions of UFS, and HFS+, using extended file attributes. Some file systems provide for user defined attributes such as the author of the document, the character encoding of a document or the size of an image. Some file systems allow for different data collections to be associated with one file name. These separate collections may be referred to as streams or forks. Apple has long used a forked file system on the Macintosh, and Microsoft supports streams in NTFS. Some file systems maintain multiple past revisions of a file under a single file name; the filename by itself retrieves the most recent version, while prior saved version can be accessed using a special naming convention such as "filename;4" or "filename(-4)" to access the version four saves ago.
File systems include utilities to initialize, alter parameters of and remove an instance of the filesystem. Some include the ability to extend or truncate the space allocated to the file system. Directory utilities create, rename and delete directory entries and alter metadata associated with a directory. They may include a means to create additional links to a directory (hard links in Unix), rename parent links (".." in Unix-like OS), and create bidirectional links to files. File utilities create, list, copy, move and delete files, and alter metadata. They may be able to truncate data, truncate or extend space allocation, append to, move, and modify files in-place. Depending on the underlying structure of the filesystem, they may provide a mechanism to prepend to, or truncate from, the beginning of a file, insert entries into the middle of a file or deletion entries from a file. Also in this category are utilities to free space for deleted files if the filesystem provides an undelete function. Some filesystems defer reorganization of free space, secure erasing of free space and rebuilding of hierarchical structures. They provide utilities to perform these functions at times of minimal activity. Included in this category is the infamous defragmentation utility. Some of the most important features of filesystem utilities involve supervisory activities which may involve bypassing ownership or direct access to the underlying device. These include high performance backup
Another approach is to partition the disk so that several filesystems with different attributes can be used. One filesystem, for use as browser cache, might be configured with a small allocation size. This has the additional advantage of keeping the frantic activity of creating and deleting files typical of browser activity in a narrow area of the disk and not interfering with allocations of other files. A similar partition might be created for email. Another partition, and filesystem might be created for the storage of audio or video files with a relatively large allocation. One of the filesystems may normally be set read-only and only periodically be set writable. Multiple filesystems on a single system has the additional benefit that in the event of a corruption of a single partition, the remaining filesystems will frequently be still intact. This includes virus destruction of the system partition or even a system that will not boot. Filesystem utilities which require dedicated access can effectively be completed piecemeal, in addition defragmentation may be more effective. Several system maintenance utilities such as virus scans and backups can also be processed in segments. For example it is not necessary to back up the filesystem containing videos along with all the other files if none have been added since the last backup.
All file systems have some functional limit that defines the maximum storable data capacity within that system. These functional limits are a best-guess effort by the designer to determine how large the storage systems will be right now, and how large storage systems are likely to become in the future. Disk storage has continued to increase at near exponentialrates (see Moore's law), so after a few years file systems have kept reaching design limitations that require computer users to repeatedly move to a newer system with ever greater capacity. File system complexity typically varies proportionally with the available storage capacity. The file systems of early 1980s home computers with 50 KB to 512 KB of storage would not be a reasonable choice for modern storage systems with hundreds of gigabytes of capacity. Likewise, modern file systems would not be a reasonable choice for these early systems, since the complexity of modern file system structures would consume most or all of the very limited capacity of the early storage systems.
File system types can be classified into disk/tape file systems, network file systems and special purpose file systems.
A disk file system takes advantages of the ability of disk storage media to randomly address data in a short amount of time. Additional considerations include the speed of accessing data following that initially requested and the anticipation that the following data may also be requested. This permits multiple users (or processes) access to various data on the disk without regard to the sequential location of the data. Examples include FAT (FAT12, FAT16, FAT32), exFAT, NTFS, HFS and HFS+, HPFS, UFS, ext2, ext3, ext4, btrfs, ISO 9660, Files-11, Veritas File System, VMFS, ZFS, ReiserFS and UDF. Some disk file systems are journaling file systems or versioning file systems.
ISO 9660 and Universal Disk Format (UDF) are two common formats that target Compact Discs, DVDs and Blu-ray discs. Mount Rainier is an extension to UDF supported by Linux 2.6 series and Windows Vista that facilitates rewriting to DVDs.
A flash file system considers the special abilities, performance and restrictions of flash memory devices. Frequently a disk file system can use a flash memory device as the underlying storage media but it is much better to use a filesystem specifically designed for a flash device.
A tape file system is a file system and tape format designed to store files on tape in a self-describing form. Magnetic tapes are sequential storage media with significantly longer random data access times than disks, posing challenges to the creation and efficient management of a general-purpose file system.
In a disk file system there is typically a master file directory, and a map of used and free data regions. Any file additions, changes, or removals require updating the directory and the used/free maps. Random access to data regions is measured in milliseconds so this system works well for disks. Tape requires linear motion to wind and unwind potentially very long reels of media. This tape motion may take several seconds to several minutes to move the read/write head from one end of the tape to the other. Consequently, a master file directory and usage map can be extremely slow and inefficient with tape. Writing typically involves reading the block usage map to find free blocks for writing, updating the usage map and directory to add the data, and then advancing the tape to write the data in the correct spot. Each additional file write requires updating the map and directory and writing the data, which may take several seconds to occur for each file. Tape file systems instead typically allow for the file directory to be spread across the tape intermixed with the data, referred to as streaming , so that time-consuming and repeated tape motions are not required to write new data. However, a side effect of this design is that reading the file directory of a tape usually requires scanning the entire tape to read all the scattered directory entries. Most data archiving software that works with tape storage will store a local copy of the tape catalog on a disk file system, so that adding files to a tape can be done quickly without having to rescan the tape media. The local tape catalog copy is usually discarded if not used for a specified period of time, at which point the tape must be re-scanned if it is to be used in the future. IBM has developed a file system for tape called the Linear Tape File System. The IBM implementation of this file system has been released as the open-source IBM Linear Tape File System — Single Drive Edition (LTFS—SDE) product. The Linear Tape File System uses a separate partition on the tape to record the index meta-data, thereby avoiding the problems associated with scattering directory entries across the entire tape.
Writing data to a tape is often a significantly time-consuming process that may take several hours. Similarly, completely erasing or formatting a tape can also take several hours. With many data tape technologies it is not necessary to format the tape before over-writing new data to the tape. This is due to the inherently destructive nature of overwriting data on sequential media. Because of the time it can take to format a tape, typically tapes are pre-formatted so that the tape user does not need to spend time preparing each new tape for use. All that is usually necessary is to write an identifying media label to the tape before use, and even this can be automatically written by software when a new tape is used for the first time.
Another concept for file management is the idea of a database-based file system. Instead of, or in addition to, hierarchical structured management, files are identified by their characteristics, like type of file, topic, author, or similar rich metadata. [1] IBM DB2 for i [2] (formerly known as DB2/400 and DB2 for i5/OS) is a database file system as part of the object based IBM i [3] operating system (formerly known as OS/400 and i5/OS), incorporating a single level store and running on IBM Power Systems (formerly known as AS/400 and iSeries), designed by Frank G. Soltis IBM's former chief scientist for IBM i. Around 1978 to 1988 Frank G. Soltis and his team at IBM Rochester have successfully designed and applied technologies like the database file system where others like Microsoft later failed to accomplish [4]. These technologies are informally known as 'Fortress Rochester' and were in few basic aspects extended from early Mainframe technologies but in many ways more advanced from a technology perspective. Some other projects that aren't "pure" database file systems but that use some aspects of a database file system: A lot of Web-CMS use a relational DBMS to store and retrieve files. Examples: XHTML files are stored as XML or text fields, image files are stored as blob fields; SQL SELECT (with optional XPath) statements retrieve the files, and allow the use of a sophisticated logic and more rich information associations than "usual file systems". Very large file systems, embodied by applications like Apache Hadoop and Google File System, use some database file system concepts.
In the Linux kernel, configfs and sysfs provide files that can be used to query the kernel for information and configure entities in the kernel. procfs maps processes and, on Linux, other operating system structures into a filespace.
In the late 1970s hobbyists saw the development of the microcomputer. Disk and digital tape devices were too expensive for hobbyists. An inexpensive basic data storage system was devised that used common audio cassette tape. When the system needed to write data, the user was notified to press "RECORD" on the cassette recorder, then press "RETURN" on the keyboard to notify the system that the cassette recorder was recording. The system wrote a sound to provide time synchronization, then modulated sounds that encoded a prefix, the data, a checksum and a suffix. When the system needed to read data, the user was instructed to press "PLAY" on the cassette recorder. The system would listen to the sounds on the tape waiting until a burst of sound could be recognized as the synchronization. The system would then interpret subsequent sounds as data. When the data read was complete, the system would notify the user to press "STOP" on the cassette recorder. It was primitive, but it worked (a lot of the time). Data was stored sequentially in an un-named format. Multiple sets of data could be written and located by fast- forwarding the tape and observing at the tape counter to find the approximate start of the next data region on the tape. The user might have to listen to the sounds to find the right spot to begin playing the next data region. Some implementations even included audible sounds interspersed with the data.
In a flat file system, there are no subdirectories. When floppy disk media was first available this type of filesystem was adequate due to the relatively small amount of data space available. CP/M machines featured a flat file system, where files could be assigned to one of 16 user areas and generic file operations narrowed to work on one instead of defaulting to work on all of them. These user areas were no more than special attributes associated with the files, that is, it was not necessary to define specific quota for each of these areas and files could be added to groups for as long as there was still free storage space on the disk. The Apple Macintosh also featured a flat file system, the Macintosh File System. It was unusual in that the file management program (Macintosh Finder) created the illusion of a partially hierarchical filing system on top of EMFS. This structure required every file to have a unique name, even if it appeared to be in a separate folder. While simple, flat file systems becomes awkward as the number of files grows and makes it difficult to organize data into related groups of files. A recent addition to the flat file system family is Amazon's S3, a remote storage service, which is intentionally simplistic to allow users the ability to customize how their data is stored. The only constructs are buckets (imagine a disk drive of unlimited size) and objects (similar, but not identical to the standard concept of a file). Advanced file management is allowed by being able to use nearly any character (including '/') in the object's name, and the ability to select subsets of the bucket's content based on identical prefixes.
Many operating systems include support for more than one file system. Sometimes the OS and the filesystem are so tightly interwoven it is difficult to separate out filesystem functions. There needs to be an interface provided by the operating system software between the user and the file system. This interface can be textual (such as provided by a command line interface, such as the Unix shell, or OpenVMS DCL) or graphical (such as provided by a graphical user interface, such as file browsers). If graphical, the metaphor of the folder , containing documents, other files, and nested folders is often used (see also: directory and folder).
Unix-like operating systems create a virtual file system, which makes all the files on all the devices appear to exist in a single hierarchy. This means, in those systems, there is one root directory, and every file existing on the system is located under it somewhere. Unix-like systems can use a RAM disk or network shared resource as its root directory.
Unix-like systems assign a device name to each device, but this is not how the files on that device are accessed. Instead, to gain access to files on another device, the operating system must first be informed where in the directory tree those files should appear. This process is called mounting a file system. For example, to access the files on a CD-ROM, one must tell the operating system "Take the file system from this CD-ROM and make it appear under such-and-such directory". The directory given to the operating system is called the mount point – it might, for example, be /media. The /media directory exists on many Unix systems (as specified in the Filesystem Hierarchy Standard) and is intended specifically for use as a mount point for removable media such as CDs, DVDs, USB drives or floppy disks. It may be empty, or it may contain subdirectories for mounting individual devices. Generally, only theadministrator (i.e. root user) may authorize the mounting of file systems. Unix-like operating systems often include software and tools that assist in the mounting process and provide it new functionality. Some of these strategies have been coined "auto-mounting" as a reflection of their purpose.
Linux supports many different file systems, but common choices for the system disk include the ext* family (such as ext2, ext3 and ext4), XFS, JFS, ReiserFS and btrfs.
The Sun Microsystems Solaris operating system in earlier releases defaulted to (non-journaled or non- logging) UFS for bootable and supplementary file systems. Solaris defaulted to, supported, and extended UFS. Support for other file systems and significant enhancements were added over time, including Veritas Software Corp. (Journaling) VxFS, Sun Microsystems (Clustering) QFS, Sun Microsystems (Journaling) UFS, and Sun Microsystems (open source, poolable, 128 bit compressible, and error-correcting) ZFS. Kernel extensions were added to Solaris to allow for bootable Veritas VxFS operation. Logging or Journaling was added to UFS in Sun's Solaris 7. Releases of Solaris 10, Solaris Express, OpenSolaris, and other open source variants of the Solaris operating system later supported bootable ZFS. Logical Volume Management allows for spanning a file system across multiple devices for the purpose of adding redundancy, capacity, and/or throughput. Legacy environments in Solaris may use Solaris Volume Manager (formerly known as Solstice DiskSuite.) Multiple operating systems (including Solaris) may use Veritas Volume Manager. Modern Solaris based operating systems eclipse the need for Volume Management through leveraging virtual storage pools in ZFS.
The family of FAT file systems is supported by almost all operating systems for personal computers, including all versions of Windows and MS-DOS/PC DOS and DR-DOS. (PC DOS is an OEM version of MS-DOS, MS-DOS was originally based on SCP's 86-DOS. DR-DOS was based on Digital Research's Concurrent DOS.) The FAT file systems are therefore well-suited as an universal exchange format between computers and devices of most any type and age. The FAT file system traces its roots back to an (incompatible) 8-bit FAT precursor in Stand-alone Disk BASIC and the short-lived M-DOS project. Over the years, the filesystem has been expanded from FAT12 to FAT16 and FAT32. Various features have been added to the file system including subdirectories, codepage support,extended attributes, and long filenames. Third-parties such as Digital Research have incorporated optional support for deletion tracking, and volume/directory/file-based multi-user security schemes to support file and directory passwords and permissions such as read/write/execute/delete access rights. Most of these extensions are not supported by Windows. The FAT12 and FAT16 file systems had a limit on the number of entries in the root directory of the file system and had restrictions on the maximum size of FAT-formatted disks orpartitions. FAT32 addresses the limitations in FAT12 and FAT16, except for the file size limit of close to 4 GB, but it remains limited compared to NTFS. FAT12, FAT16 and FAT32 also have a limit of eight characters for the file name, and three characters for the extension (such as .exe). This is commonly referred to as the 8.3 filenamelimit. VFAT, an optional extension to FAT12, FAT16 and FAT32, introduced in Windows 95 and Windows NT 3.5, allowed long file names (LFN) to be stored in the FAT filesystem in a backwards compatible fashion.
Main article: NTFS NTFS, introduced with the Windows NT operating system, allowed ACL-based permission control. Other features also supported by NTFS include hard links, multiple file streams, attribute indexing, quota tracking, sparse files, encryption, compression, and reparse points (directories working as mount-points for other file systems, symlinks, junctions, remote storage links), though not all these features are well- documented.[ citation needed ]
Main article: exFAT exFAT is a proprietary and patent-protected file system with certain advantages over NTFS with regards to file system overhead. exFAT is not backwards compatible with FAT file systems such as FAT12, FAT16 or FAT32. The file system is supported with newer Windows systems, such as Windows 2003, Windows Vista, Windows 2008, Windows 7 and more recently, support has been added for Windows XP.[8]^ Support in other operating systems is sparse since Microsoft has not published the specifications of the file system and implementing support for exFAT requires a license.
It may be advantageous or necessary to have files in a different file system than they currently exist. Reasons include the need for an increase in the space requirements beyond the limits of the current filesystem. The depth of path may need to be increased beyond the restrictions of the filesystem. There may be performance or reliability considerations. Providing access to another operating system which does not support existing filessytem is another reason.
In some cases conversion can be done in-place, although migrating the file system is more conservative, as it involves a creating a copy of the data and is recommended.[13]^ On Windows, FAT and FAT32 file systems can be converted to NTFS via the convert.exe utility, but not the reverse.[13]^ On Linux, ext2 can be converted to ext3 (and converted back), and ext3 can be converted to ext4 (but not back),[14]^ and both ext3 and ext4 can be converted to btrfs, and converted back until the undo information is deleted.
[15] (^) These conversions are possible due to using the same format for the file data itself, and relocating the metadata into empty space, in some cases using sparse file support.[15]
Migration has the disadvantage of requiring additional space although it may be faster. The best case is if there is unused space on media which will contain the final file system. For example, to migrate a FAT32 filesystem to an ext2 filesystem. First create a new ext2 filesystem, then copy the data to the filesystem, then delete the FAT32 filesystem. An alternative, when there is not sufficient space to retain the original filesystem until the new one is created, is to use a work area (such as a removable media). This takes longer but a backup of the data is a nice side effect.
In hierarchical file systems, files are accessed by means of a path that is a branching list of directories containing the file. Different filesystems have different limits on the depth of the path. Filesystems also have a limit on the length of an individual filename. Copying files with long names or located in paths of significant depth from one filesystem to another may cause undesirable results. This depends on how the utility doing the copying handles the discrepancy.
The FAT file system is simple yet robust. It offers reasonable performance even in light-weight implementations and is therefore widely adopted and supported by virtually all existing operating systems for personal computers as well as some home computers andembedded systems. This makes it a useful format for solid-state memory cards and a convenient way to share data between operating systems. FAT file systems are the default file system for removable media (with the exception of CDs and DVDs) and as such are commonly found on floppy disks, super-floppies, memory and flash memory cards or USB flash drives and are supported by most portable devices such as PDAs, digital cameras, camcorders, media players, or mobile phones. While FAT12 is omnipresent on floppy disks,FAT16 and FAT32 are typically found on the larger media. FAT was also commonly used on hard disks throughout the DOS and Windows 9x eras, but its use on hard drives has declined since the introduction of Windows XP, which primarily uses the newer NTFS. FAT is still used in hard drives expected to be used by multiple operating systems, such as in shared Windows and Linux environments. Due to the widespread use of FAT formatted media since its introduction many operating systems have provided support for FAT and subsequently VFAT and FAT32 through official or third-party file system handlers. For example, Linux, FreeBSD, BeOS and JNodeprovide inbuilt support for FAT. Mac OS 9 and Mac OS X also support FAT file systems on volumes other than the boot disk. AmigaOS supports FAT through the CrossDOS file system. For many purposes, the NTFS file system is superior to FAT in terms of features and reliability; its main drawbacks are the size overhead for small volumes and the very limited support by anything other than the NT-based versions of Windows, since the exact specification is a trade secret of Microsoft. The availability of NTFS-3G since mid 2006 has led to much improved NTFS support in Unix-like operating systems, considerably alleviating this concern. It is still not possible to use NTFS in DOS-like operating systems without third-party drivers, which in turn makes it difficult to use a DOS floppy for recovery purposes. Microsoft provided a recovery console to work around this issue, but for security reasons it severely limited what could be done through the Recovery Console by default. The movement of recovery utilities to boot CDs based on BartPE or Linux (with NTFS-3G) is finally eroding this drawback. Due to its long history of usage on desktops and portable computers meanwhile spanning over more than three decades and its frequent use in embedded solutions, the FAT file system continues to be the most widespread file system worldwide. For floppy disks, the FAT has been standardized as ECMA-107[4]^ and ISO/IEC 9293:1994[5]^ (superseding ISO 9293:1987[6]). These standards cover FAT12 and FAT16 with only short 8.3 filename support; long filenames with VFAT are partially patented.[7]