[RC5] Checkpoint file

Unrau, Trevor unraut at aecl.ca
Thu Oct 25 08:36:25 EDT 2001

You don't need separate .ini files for each client run to get unique
checkpoint files.  On my network, I have a single directory with the client
and buffer files, and a single .bat file with the line
"dnetc -ckpoint %HOSTNAME%" or something like that (I don't have it in front
of me right now -- check SET for the variable name).  Just run the .bat file
on each machine, and each client will create a checkpoint file named after
its host.

As well, I don't shut the clients down cleanly.  This prevents them from
writing partially completed blocks to the buff-in file.  When the clients
are restarted, they resume work using the block in their checkpoint files.
At most you lose about 5 minutes of work per machine.  This seems to prevent
problems with all the clients accessing the same buffer file at once.
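
For anyone starting clients from a script rather than a .bat file, the same idea can be sketched in Python.  This is only an illustration: the function name checkpoint_command and the ".ckp" suffix are made up for the example, and "-ckpoint" is simply the option quoted above.  (On Windows the machine name is normally found in the COMPUTERNAME variable, so check SET as suggested.)

```python
import socket


def checkpoint_command(hostname, client="dnetc"):
    """Build a client command line whose checkpoint file is named after the host."""
    # Keep only the short host name, so "Node07.example.org" becomes "node07".
    short_name = hostname.split(".")[0].lower()
    # "-ckpoint" is the option quoted in the message; the ".ckp" suffix is
    # an assumption of this sketch, not something the client requires.
    return [client, "-ckpoint", short_name + ".ckp"]


if __name__ == "__main__":
    # Print the command this machine would run; nothing is started here.
    print(" ".join(checkpoint_command(socket.gethostname())))
```

Because the name is derived from the host, every machine can share one directory and one script and still end up with its own checkpoint file.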

Trevor Unrau
Computer Systems Technologist,
Atomic Energy of Canada
Being right is highly overrated. Even a stopped clock is right twice a day.

-----Original Message-----
From: TimO [mailto:hairballmt at mcn.net]
Sent: Thursday October 25, 2001 1:18 AM
To: rc5 at lists.distributed.net
Subject: Re: [RC5] Checkpoint file

> >You *can* have separate buffer files, using the technique someone else
> >mentioned of separate config files.  As long as you are setting up a
> >config file per client, set each config file to use a unique checkpoint
> >file.
> >
> >Where buffer files are iffy, shared checkpoint files are absolutely not
> >an option.  The client assumes it owns the checkpoint file, and rewrites
> >it every time the work in process is saved.  If you are using checkpoint
> >files, you *must* guarantee that each client is using a separate config
> >file.
> Hmm, well, can this be done with the -ini option?  I will look for
> an option for checkpoint files, because it's no fun to set up 138
> ini files...  I use a script to start them all via ssh, so I could use the
> hostname as the name of the checkpoint file; that should be unique.
> >A clean shutdown of a client will result in all work being saved
> >properly, checkpoint or not.  The checkpoint file is really intended to
> >protect against unexpected shutdowns.
> But AFAIK that's not true if several clients use the same buffer file.  Then
> only the last work unit will be kept and all others discarded, so everything
> not yet finished will be lost...
> Or am I wrong?

You need unique config files and/or unique buffer/checkpoint files for
each client.  You said that you want to keep network traffic down, so
checkpoint files aren't a very good option for you.  dnetc -shutdown,
^C, and killall dnetc (on Linux, anyway) will all save the current state if
you have unique buffer files for each client (as specified by unique config
files).  Hitting the 'reset' button, pulling the plug, any other
catastrophic loss of power, or kill -9 pid_of_dnetc are the most common
events that would make a checkpoint file handy.  How often do those
occur?  Having unique buffer/checkpoint files requires you to have
unique config files for each client.
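
The difference between ^C or killall and kill -9 comes down to which signals a process is allowed to catch.  The sketch below is a generic Python illustration of that point, not the client's actual code; make_shutdown_handler, install, and save_state are hypothetical names.

```python
import signal
import sys


def make_shutdown_handler(save_state):
    """Return a signal handler that saves work in progress, then exits."""
    def handler(signum, frame):
        save_state()   # e.g. write the partially completed block to disk
        sys.exit(0)
    return handler


def install(save_state):
    """Save state on ^C (SIGINT) and on a plain kill or killall (SIGTERM)."""
    handler = make_shutdown_handler(save_state)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handler)
    # There is no entry for SIGKILL (kill -9): the operating system does not
    # let a process catch it, so no save routine can run.  The same holds for
    # a reset or power loss, which is where a checkpoint file helps.
```

A process set up this way exits cleanly on ^C or killall, while kill -9 ends it before any handler runs.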

-- TimO
                             No Cool .sig