Suppose you are processing an input file, updating a database, and producing a report. At some point, you record how many input records have been processed and how many report lines or bytes have been written, and commit your updates in the database. Now you can restart: seek to the last checkpointed output file offset and write from there, and resume processing input at the saved record count. Your only vulnerability is the window from the start of writing the checkpoint to the commit. Discard incomplete restart-point records, since the commit has not happened. File writes need to be fsync()'d to disk at commit time. In fact, the safest place to put the checkpoint is in the db, so it does not exist for a future restart until the commit. If your process has two or more databases, there is a two-phase commit, prepare then commit, to minimize the window in time where one db is committed and the second is not.
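Here is a minimal sketch of that idea in Python with SQLite, just to make it concrete. It assumes a single database and a line-oriented input file; the checkpoint table, batch size, and file names are made up for illustration. The checkpoint row lives in the same database as the data, so it only becomes visible when the batch commits, and the report file is fsync()'d before that commit.

    import os, sqlite3

    BATCH = 1000   # commit a checkpoint every BATCH records (illustrative)

    def run(input_path, report_path, db_path):
        db = sqlite3.connect(db_path)
        db.execute("""CREATE TABLE IF NOT EXISTS checkpoint
                      (id INTEGER PRIMARY KEY CHECK (id = 1),
                       rec_no INTEGER, report_offset INTEGER)""")
        row = db.execute("SELECT rec_no, report_offset FROM checkpoint").fetchone()
        rec_no, offset = row if row else (0, 0)

        report = open(report_path, "ab")
        report.truncate(offset)                 # drop report bytes written after the last commit
        with open(input_path) as infile:
            for _ in range(rec_no):             # skip input records already processed
                next(infile)
            for line in infile:
                rec_no += 1
                # ... apply this record's updates to the database here ...
                line_out = ("processed record %d\n" % rec_no).encode()
                report.write(line_out)
                offset += len(line_out)
                if rec_no % BATCH == 0:
                    report.flush()
                    os.fsync(report.fileno())   # report durable on disk before the commit
                    db.execute("INSERT OR REPLACE INTO checkpoint VALUES (1, ?, ?)",
                               (rec_no, offset))
                    db.commit()                 # record count, offset and data commit together
        report.flush()
        os.fsync(report.fileno())
        db.execute("INSERT OR REPLACE INTO checkpoint VALUES (1, ?, ?)", (rec_no, offset))
        db.commit()
        report.close()
        db.close()

On a restart, anything written to the report after the last committed offset is simply cut off and rewritten, so the only state that matters is what the commit made durable.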
A lot of the value of an RDBMS or of middleware is in maintaining transactional boundaries (checkpoints).
Some actions are not repeatable, and to have checkpoints a process sometimes needs to be redesigned so that every action can be rolled back to the start of the transaction.
Sometimes you can design the process so that actions repeated due to a restart do not damage the outputs. For instance, for each input record a row is inserted with a timestamp. If you rerun after inserting half a file, there are extra rows with older timestamps but no new information; a cleanup process can detect and remove them. Or the loader can determine that the last insert for the key had identical information and discard records that change nothing, without inserting. In batch, you might do a first phase to find all differences and a second phase to apply them. If an interrupted second phase has to be rerun, or the whole file is rerun by accident, there is less (or no) phase-2 data to apply, as the file is already partly or completely applied.
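A sketch of that "discard records that change nothing" idea, again in Python/SQLite; the target table layout (key, data, loaded_at) is assumed for illustration:

    import sqlite3
    from datetime import datetime, timezone

    def load(db, records):
        """records is an iterable of (key, data) pairs."""
        for key, data in records:
            row = db.execute(
                "SELECT data FROM target WHERE key = ? ORDER BY loaded_at DESC LIMIT 1",
                (key,)).fetchone()
            if row and row[0] == data:
                continue                  # rerun or duplicate: nothing new, skip it
            db.execute("INSERT INTO target (key, data, loaded_at) VALUES (?, ?, ?)",
                       (key, data, datetime.now(timezone.utc).isoformat()))
        db.commit()

Because a record identical to the last loaded row is skipped, rerunning the same file from the top is harmless.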
The transaction or no-checkpoint data in an RDBMS can be handled two ways. In the first, as data is modified, new pages are created and the old pages are preserved unchanged; at the commit, the database installs the new pages and discards the old ones. While a session is updating pages, it may be the only session which sees the version on the new pages, or it may share that view (nutty idea). A long-running query or session transaction on the old pages may be given temporary 'ownership' of them, so they are not discarded until a commit or rollback there. Rollback is implicit on exit or disconnect. A process doing a query may get killed for owning too many uncommitted pages when space runs low or limits are hit, because other processes are committing pages and pushing ownership of the old pages onto it.
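To make the page-versioning idea concrete, here is a toy copy-on-write page store in Python. It is only an illustration of the mechanism described above, not how any real engine lays out pages: a writer builds new pages privately and installs them at commit, while a reader that started earlier keeps seeing the old pages.

    class PageStore:
        def __init__(self):
            self.current = {}                  # page number -> page contents

        def begin(self):
            return Txn(self)

    class Txn:
        def __init__(self, store):
            self.store = store
            self.snapshot = store.current      # old pages stay reachable via this reference
            self.new_pages = {}                # private copies, invisible to other sessions

        def read(self, page_no):
            return self.new_pages.get(page_no, self.snapshot.get(page_no))

        def write(self, page_no, contents):
            self.new_pages[page_no] = contents # copy-on-write: never touch the old page

        def commit(self):
            # install the new pages by publishing a fresh mapping; the old pages
            # go away once no reader still holds a snapshot that references them
            merged = dict(self.store.current)
            merged.update(self.new_pages)
            self.store.current = merged

    # A reader that began before the writer's commit still sees the old pages:
    store = PageStore()
    store.current = {1: "old"}
    reader, writer = store.begin(), store.begin()
    writer.write(1, "new")
    writer.commit()
    print(reader.read(1))          # "old" -- the snapshot is retained until the reader ends
    print(store.begin().read(1))   # "new"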
Obviously, two processes cannot update the same value in the database at the same time, so the second will wait on a lock. If process A locks x and then blocks wanting y, while process B locks y and then blocks wanting x, one process will be killed to remove the deadlock. It helps if everyone locks resources in alphabetical order or the like, so there is no deadlock. For bottlenecks, a process or rdbms schema redesign might be necessary.
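The "lock in alphabetical order" discipline looks like this in a small Python sketch; the resource names and the helper are made up for illustration:

    import threading

    locks = {"accounts": threading.Lock(), "orders": threading.Lock()}

    def with_resources(names, action):
        ordered = sorted(names)                 # one global order for everyone
        for name in ordered:
            locks[name].acquire()
        try:
            action()
        finally:
            for name in reversed(ordered):
                locks[name].release()

    # Both callers end up taking "accounts" before "orders", whichever order they ask in,
    # so neither can ever hold the lock the other is waiting for:
    with_resources(["orders", "accounts"], lambda: print("transfer applied"))
    with_resources(["accounts", "orders"], lambda: print("report written"))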
Locks can be more common if the rdbms does not handle multi-generational data as I described, but only supports one new and one current page set. The first user to modify the page locks out all others. A user querying the current page locks out any process desiring changes until it finishes, so the current page stays put. Some DBs are granular to the page for locks, others to the row, but too many row or page locks get promoted to a full table lock. Some DBs lock out just changes to the table; others lock all access to the table.