I recently acquired the writeaheadlog.com domain. It seems like all the cool kids have interesting domains, like allthingsdistributed.com or crazybob.org, and I didn't want to be left out. your-name.com is so 2008. "Ahead" inspires forward-thinking and optimistism--full steam ahead! "Write" and "log" are appropriate and factual--this is a log of my writing. Taken together, write-ahead-logging is used in many database systems, hinting this will be a software and programming centric blog. Surprised such a great domain was still available; out of 2,267,233,742 Internet users I alone deem it worthy of $19.99/year.
So what exactly is a write-ahead log? Imagine your application has two counters. The constraint is the counters must be equal in all consistent states. We might double each counter during a transaction,
Problems arise when there are system failures. After output(A) there is a power outage so output(B) does not get executed. On disk, A is 20 but B is still 10, violating our contraint. We need a mechanism to handle such failures since they cannot be prevented.
Enter the log. The log records information about transactions so we can restore our system to a consistent state. The first log approach, the undo log, reverses the changes of incomplete transactions. In our example, upon recovery, changes to A are undone so A is once again 10 and (A == B == 10). The log is of course written to nonvolatile storage.
An undo log looks something like this,
When we update A we log a record indicate its before value 10. Likewise, when we change B from 10 to 20 we record its before value 10. Before outputting A and B to disk we must flush the log (the undo log, not the data log). Only after output(A) and output(B) are successful can we record <commit T>.
With the undo log in place, how do we recovery from failure? We read the undo log from the end (most recently written record) to the start and find incomplete transactions. Transactions with a <commit> record we can ignore because we know that <commit> can only be recorded after output has been successful. If there is no <commit> record for the transaction we cannot be certain that output was successful, so we use the undo information to revert the changes. <T, B, 10> sets B back to 10 and <T, A, 10> sets A back to 10. The undo records are idempotent so if there is a crash during recovery we can just recovery as usual, setting B to 10 even if B is already 10; setting A to 10 even if A is already 10. The undo log records <abort T> to indicate we aborted the transaction.
The undo log is great but there is one annoying performance issue. Before we can record <commit T> in the undo log we must do output(A) and output(B), incurring disk I/O. If we have a lot of transactions we keep having to do output in order to maintain the integrity of the undo log. We may want to buffer the output until a convenient time.
Enter the redo log. Instead of undoing a change we will record information (the new value v) so we can rerun transactions, reapplying the change if necessary. Before doing any output we must record the <commit> record. We write-ahead. The write-ahead logging rule,
Before modifying any database element X on disk, it is necessary that all log records pertaining to this modification of X, including both the update record <T, X, v> and the <COMMIT T> record, must appear on disk.
So our redo log will look something like this,
We record the new values (20 and 20) then commit then flush the log. We can only do output after the <commit T> record has been written and the log flushed. This solves the issue of buffering our output. We still will do disk I/O for the log flush but logs are sequential appends so it can be much faster than random outputs to random blocks on the disk (see my blog on Log Structured File Systems for Dummies, not that I'm calling you a dummy).
To recovery with a redo log we begin at the head of the log scanning forward (opposite of the undo log). If we see an incomplete transaction (no <commit T> record) we can ignore (except adding an <abort T>), knowing that no output was ever done. However, if we see a <commit T> record we don't know whether the output was successful or not, so we just redo the change, even if it's redundant. A would be set to 20, B would be set to 20 even though the log indicates the transaction committed with <commit T>. Like the undo log, the changes are idempotent so repeated calls are fine.
Now you know the inspiration for writeaheadlog.com. There is a lot more information in Database Systems: The Complete Book (which the above is a blatant plagarism of). For example, we can combine the undo and redo log to create a undo/redo log that stores before and after values. This allows us more flexibility in when we write the <commit T> record and when we need to flush the log and do output. There are also challenges of checkpointing so we don't have to read the entire log.
Incidentally, while very few of us have the skills (myself included) and desire to write a transaction/log/recovery manager, the undo/redo approach to maintaining consistency can be used to mimic transaction-like properties. Imagine remote HTTP calls or NoSQL systems that lack strict transaction support. We might use the undo/redo techniques/approach so our application can recover to a consistent state by repeatedly applying idempotent changes in the face of failures.
Comments