[rc5] Why Isn't RC5 Client Fail-Soft?

Mikus Grinbergs mikus at bga.com
Sat Jul 26 16:16:15 EDT 1997


In the past couple of weeks, while working with things other than
RC5, I have twice managed to "crash" my system - i.e., the only
"recovery" was to re-boot.  Both times, the current block that RC5
was working on was "lost" -- it had ALREADY been removed from
buff-in.rc5, but due to the crash it was NEVER recorded - neither
out on buff-out.rc5 (finished) nor back on buff-in.rc5 (checkpointed).

I imagine the way it works was designed to support multiple copies
of RC5 running simultaneously - if the block is removed from buff-in
as soon as it is "picked up", no other copy will "pick up" that same
block.  But what if the client that _has_ the block never gets the
chance to record its status?


Let me suggest using one more file - buff-work.rc5.  Then "picking
up" a block would mean moving it from buff-in to buff-work, and
"putting down" a block would mean moving it from buff-work to
buff-out.  And "checkpointing" would mean moving the block from
buff-work back to buff-in.
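
To make the three-file idea concrete, here is a minimal sketch in Python.
It is not how the RC5 client actually stores blocks: it assumes each
buffer is a text file holding one block record per line, and the helper
names (read_blocks, write_blocks, move_block, pick_up, put_down,
checkpoint) are made up for illustration.  Each move appends to the
destination before removing from the source, so a crash in between
leaves a duplicate rather than a lost block.  Locking between multiple
clients is left out.

```python
import os
import tempfile


def read_blocks(path):
    """Return the block records in a buffer file (empty list if missing)."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def write_blocks(path, blocks):
    """Rewrite a buffer file via a temp file plus rename, so a crash
    never leaves a half-written buffer behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.writelines(b + "\n" for b in blocks)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def move_block(block, src, dst):
    """Add the block to dst first, then remove it from src.
    A crash between the two steps duplicates the block instead of losing it."""
    write_blocks(dst, read_blocks(dst) + [block])
    remaining = read_blocks(src)
    remaining.remove(block)  # raises ValueError if the block is not in src
    write_blocks(src, remaining)


def pick_up(buff_in, buff_work):
    """Take the first block from buff-in and park it in buff-work."""
    blocks = read_blocks(buff_in)
    if not blocks:
        return None
    move_block(blocks[0], buff_in, buff_work)
    return blocks[0]


def put_down(block, buff_work, buff_out):
    """A finished block goes from buff-work to buff-out."""
    move_block(block, buff_work, buff_out)


def checkpoint(block, buff_work, buff_in):
    """An unfinished block goes from buff-work back to buff-in."""
    move_block(block, buff_work, buff_in)
```

A real client would presumably write updated progress into the record
when checkpointing; this sketch moves the record unchanged.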

With this change, those who did not wish to re-process blocks would
start RC5 normally, and it would only look at buff-in.  But a new
parameter (-restart) would allow an operator, after a system crash,
to first move the blocks (if any) from buff-work to buff-in, before
starting decoding -- thereby "recovering" those blocks which were
neither finished nor checkpointed before the system went down.
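
The recovery step could look roughly like the sketch below.  Again this
is only an illustration under the same assumption of one block record
per line; the recover and build_parser functions are hypothetical, and
-restart is the proposed parameter, not an existing client option.

```python
import argparse
import os


def recover(buff_work="buff-work.rc5", buff_in="buff-in.rc5"):
    """Append every block stranded in buff-work to buff-in, then empty
    buff-work.  Returns the number of blocks recovered."""
    if not os.path.exists(buff_work):
        return 0
    with open(buff_work) as f:
        stranded = [line.strip() for line in f if line.strip()]
    if stranded:
        # Append to buff-in before clearing buff-work, so a second crash
        # at this point still cannot lose the blocks.
        with open(buff_in, "a") as f:
            f.writelines(b + "\n" for b in stranded)
            f.flush()
            os.fsync(f.fileno())
    open(buff_work, "w").close()  # truncate buff-work
    return len(stranded)


def build_parser():
    """Command line with the proposed (hypothetical) -restart switch."""
    parser = argparse.ArgumentParser(description="fail-soft restart sketch")
    parser.add_argument("-restart", action="store_true",
                        help="move blocks from buff-work back to buff-in")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.restart:
        print("recovered", recover(), "block(s)")
    # normal decoding would start here, looking only at buff-in
```

Without -restart nothing touches buff-work, which matches the normal
start described above.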


mikus



