File System Persistence

Reading notes on the file-system persistence part of OSTEP

Between two write() calls, the system may lose power or crash, leaving the on-disk file only partially updated.

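When the system reboots and the file system is mounted again, it has to deal with the inconsistent on-disk image left behind by the crash. How to do that is the focus of these notes.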

Let's look concretely at how the disk can become inconsistent.

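When we append a new data block to a file, at least three on-disk structures must be updated: the inode, the data block, and the data bitmap. The system can crash at any point while these three are being written, and the consequences differ depending on which writes had already reached the disk.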

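Suppose only one of the three writes succeeds. If only the data block is written, the new data is lost but the file system is still consistent. If only the inode is written, it points to garbage and disagrees with the bitmap. If only the bitmap is written, the block is marked in use but nothing references it (a space leak).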
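The cases can be enumerated mechanically. The following is a minimal sketch, not taken from the notes: the function name classify and the outcome strings are made up for illustration, and it also covers the cases where two of the three writes land.

```python
from itertools import combinations

WRITES = ("inode", "bitmap", "data")

def classify(landed):
    """Describe the on-disk state given the set of writes that reached disk."""
    inode = "inode" in landed
    bitmap = "bitmap" in landed
    data = "data" in landed
    if inode and not bitmap:
        return "inconsistent: inode references a block the bitmap marks free"
    if bitmap and not inode:
        return "inconsistent: block marked in use but unreachable (space leak)"
    if inode and bitmap and not data:
        return "metadata consistent, but the inode points to garbage"
    if data and not inode:
        return "consistent, but the appended data is lost"
    return "consistent"

# Walk through every partial outcome: one write landed, then two.
for count in (1, 2):
    for landed in combinations(WRITES, count):
        print(", ".join(landed), "->", classify(set(landed)))
```

Note that the inode+bitmap case is the awkward one: the metadata agrees with itself, so a consistency check alone cannot detect that the data block holds garbage.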

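The basic idea of the first approach, a file system checker such as fsck, is to let inconsistencies happen and then repair them at reboot.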

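This approach has problems, however. For example, even when the inode and the data bitmap agree with each other, the inode may still point to a block containing garbage data.
Second, and this is the biggest problem, it is too slow: the whole disk has to be scanned in order to check all the related data structures.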
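To make the cost concrete, here is a toy sketch of one such check, the cross-check between inodes and the data bitmap. The in-memory layout (a dict of inode number to block pointers, a list of bitmap bits) and the name check_bitmap are assumptions for illustration; a real checker reads these structures from disk and performs many more checks.

```python
def check_bitmap(inodes, bitmap):
    """Cross-check block pointers against the data bitmap.

    Returns (leaked, missing):
      leaked  - blocks marked in use that no inode references
      missing - blocks referenced by an inode but marked free
    """
    referenced = set()
    # Every inode must be visited, so the cost grows with the size of the
    # file system, not with the amount of work in flight at crash time.
    for block_ptrs in inodes.values():
        referenced.update(block_ptrs)
    marked = {i for i, bit in enumerate(bitmap) if bit}
    return sorted(marked - referenced), sorted(referenced - marked)

# Hypothetical 8-block disk: block 6 is referenced but marked free,
# block 7 is marked in use but referenced by nobody.
inodes = {1: [4], 2: [5, 6]}
bitmap = [0, 0, 0, 0, 1, 1, 0, 1]
leaked, missing = check_bitmap(inodes, bitmap)
print("leaked:", leaked, "missing:", missing)
```

Even this single check touches every inode. It also says nothing about whether blocks 4 and 5 hold the right contents, which is the garbage-data problem mentioned above.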

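The alternative is journaling (write-ahead logging): before updating the disk in place, first record the pending operations (transactions) in a log, so that they can be replayed after a crash. The benefit is that recovery knows exactly which updates were in flight and does not need to scan the whole disk.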

How it is implemented

Journal write: Write the transaction, including a transaction-begin block, all pending data and metadata updates, and a transaction-end block, to the log; wait for these writes to complete.
Checkpoint: Write the pending metadata and data updates to their final locations in the file system.

Problem
The system may crash while the journal itself is being written.

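Improvement
Split the journal write into two steps: first write everything except the transaction-end block (TxE) and wait for it to complete, then write TxE.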

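This works because the disk guarantees that a write of a single 512-byte block is atomic. TxE fits in one such block, so it is either fully on disk or not there at all, and therefore each journaled transaction is either completely committed or not committed.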
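Recovery under this scheme can be sketched as follows. The record format (tuples tagged "TxB", "write", "TxE") and the dict standing in for the disk are assumptions made for illustration only. The point is that a transaction is replayed only if its TxE is present in the log.

```python
def recover(journal, disk):
    """Replay committed transactions from the journal onto the disk.

    journal: list of records, each one of
        ("TxB", tid), ("write", tid, addr, value), ("TxE", tid)
    disk: dict mapping an address to its current contents
    """
    pending = {}
    for record in journal:
        kind, tid = record[0], record[1]
        if kind == "TxB":
            pending[tid] = []
        elif kind == "write":
            _, _, addr, value = record
            pending.setdefault(tid, []).append((addr, value))
        elif kind == "TxE":
            # Commit block found: the whole transaction is safe to apply.
            for addr, value in pending.pop(tid, []):
                disk[addr] = value
    # Anything left in pending never got its TxE and is simply dropped.
    return disk

disk = {"inode": "I[v1]", "bitmap": "B[v1]"}
journal = [
    ("TxB", 1), ("write", 1, "inode", "I[v2]"),
    ("write", 1, "bitmap", "B[v2]"), ("TxE", 1),
    ("TxB", 2), ("write", 2, "inode", "I[v3]"),  # crash before TxE of txn 2
]
recover(journal, disk)
print(disk)
```

Transaction 1 is redone in full, while the half-written transaction 2 leaves no trace on the disk.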

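The journal has a finite size; once it fills up, no further transactions can be recorded.
The solution is a circular log (similar to a circular queue): once a transaction has been checkpointed, the system frees its journal space so that it can be reused.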
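A circular log behaves like a fixed-size ring buffer. The sketch below is a simplification under stated assumptions: the class name CircularJournal is made up, and each transaction occupies exactly one slot, whereas a real transaction spans a variable number of blocks. The head index plays the role of the pointer kept in the journal superblock.

```python
class CircularJournal:
    """Fixed-size journal whose slots are reused after checkpointing."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0   # oldest transaction that is not yet checkpointed
        self.used = 0   # number of live transactions in the journal

    def append(self, txn):
        """Log a transaction; return False if the journal is full."""
        if self.used == self.capacity:
            return False
        tail = (self.head + self.used) % self.capacity
        self.slots[tail] = txn
        self.used += 1
        return True

    def free_oldest(self):
        """Release the oldest transaction once it has been checkpointed."""
        if self.used == 0:
            raise IndexError("journal is empty")
        txn = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % self.capacity
        self.used -= 1
        return txn

journal = CircularJournal(2)
journal.append("T1")
journal.append("T2")
print(journal.append("T3"))   # journal is full at this point
journal.free_oldest()         # T1 was checkpointed, its slot is free again
print(journal.append("T3"))   # now fits, wrapping around to slot 0
```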

With these improvements, the protocol now looks like this:

Journal write: Write the contents of the transaction (containing TxB and the contents of the update) to the log; wait for these writes to complete.
Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction is now committed.
Checkpoint: Write the contents of the update to their final locations within the file system.
Free: Some time later, mark the transaction free in the journal by updating the journal superblock.

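The scheme above can be optimized further: every data block is written to disk twice, once to the journal and once to its final location.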

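The basic procedure is the same as before, except that the data is not written to the journal; only the metadata is (metadata journaling).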

Data write: Write data to final location; wait for completion (the wait is optional; see below for details).
Journal metadata write: Write the begin block and metadata to the log; wait for writes to complete.
Journal commit: Write the transaction commit block (containing TxE) to the log; wait for the write to complete; the transaction (including data) is now committed.
Checkpoint metadata: Write the contents of the metadata update to their final locations within the file system.
Free: Later, mark the transaction free in journal superblock.

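The data is written first to avoid reading garbage. If the metadata were committed first and the system crashed before the data reached disk, the inode would point to a block that still holds stale contents left over from whatever used it before.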
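The effect of the ordering can be shown with a toy simulation. Everything here is an assumption made for illustration: the function name read_after_crash, a single inode pointer, and block 5 as the target of the append. The "metadata" step stands for the metadata becoming durable, which in the real protocol happens through the journal commit.

```python
def read_after_crash(order, steps_done):
    """Apply the first steps_done steps of order, then read via the inode.

    Returns what a reader sees after the crash, or None if the inode
    does not point to the new block yet.
    """
    # Block 5 still holds leftovers from a previously deleted file.
    disk = {"inode_ptr": None, 5: "stale bytes"}
    actions = {
        "data": lambda: disk.__setitem__(5, "new data"),
        "metadata": lambda: disk.__setitem__("inode_ptr", 5),
    }
    for step in order[:steps_done]:
        actions[step]()
    ptr = disk["inode_ptr"]
    return None if ptr is None else disk[ptr]

# Crash after the first of the two steps, for both orderings.
print(read_after_crash(("metadata", "data"), 1))  # reader sees stale contents
print(read_after_crash(("data", "metadata"), 1))  # file simply was not extended
```

With data first, a crash between the two steps loses the append but never exposes stale contents. With metadata first, the file appears to have grown and the reader gets whatever was in the block before.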