[RC5] Stats suggestion

Kelly Byrd kbyrd at rotor.net
Fri Jan 5 09:53:19 EST 2001


Ben is right on this. When I first encountered the exponential average idea
I couldn't believe it would work well at all. But after actually running
real numbers through it (bandwidth counts in my case) it works extremely
well. You can tune the average to be more or less sensitive to recent data
(sort of like a 5-day average rather than a 90-day average) by changing the
"half-life" (I called it the decay factor). But nonetheless, the first
calculation requires all the historical data. After that, you only need
yesterday's average, today's new rate and the decay factor.
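The daily update Kelly describes can be written in a few lines. This is a sketch, not code from the stats server; the function name and the half-life-to-decay-factor conversion are my own illustration of the idea.

```python
def update_average(yesterdays_avg, todays_rate, half_life_days=7.0):
    """Fold today's rate into the running exponential average.

    The decay factor is chosen so that data half_life_days old
    carries half the weight of brand-new data.
    """
    decay = 0.5 ** (1.0 / half_life_days)   # per-day decay factor
    return decay * yesterdays_avg + (1.0 - decay) * todays_rate

# Daily processing: one multiply-add per participant, no history table.
avg = 0.0
for rate in [100, 100, 100, 200, 200, 200, 200]:
    avg = update_average(avg, rate)
```

Note that the update touches exactly the two values mentioned above: yesterday's average and today's rate.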

KB

-----Original Message-----
From: Ben Clifford [mailto:benc at hawaga.org.uk]
Sent: Friday, January 05, 2001 9:24 AM
To: rc5 at lists.distributed.net
Subject: Re: [RC5] Stats suggestion



On Sat, 6 Jan 2001, Stephen Berg wrote:

> Actually the comparison to a *nix uptime load average makes sense. 
> And that sounds like a better solution all around.  I guess the stats
> gurus will have to comment though.  I also like the idea of having a
> 7, 30 and 60 day value.  Or maybe 7, 30 and 90.

It will probably need a bit of playing with the data to see what people
prefer. There is no particular reason to have three - that's just the way
that unix does it for load average, so I was thinking about three different
counters. dnet could have more or fewer depending on various factors (e.g.
load on the server, different people's desires).

The larger the value, the longer it takes for changes to take effect - i.e.
with a 90-day half-life, if I massively increase my block rate starting
tomorrow (by purchasing a new 4-processor PC), it will take 90 days for the
average to get halfway to the new rate, and another 90 days for it to get
three quarters of the way. So we don't want to set the values too high.

> Wouldn't the
> calculation for this still need to retrieve daily values for every
> day being computed?  I think that was one of the drawbacks of adding
> this feature since it would have to do that for each and every
> participant along with the normal daily processing.

No no no - one of the main advantages of this, and the reason that I posted
it here, is that the calculation can take place with *only* two pieces of
info: (1) the number of blocks submitted today, (2) yesterday's average.

It seems a bit counter-intuitive, but if you write down all the formulae
and play about with them, it magically comes out needing only those two
pieces of information.
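The "magic" is just that the recurrence avg_n = d*avg_{n-1} + (1-d)*x_n unrolls into a weighted sum over the entire history, so keeping only yesterday's average implicitly keeps every past day. A sketch (illustrative names, not the stats server's code) showing the two forms agree:

```python
d = 0.5 ** (1.0 / 7.0)        # decay factor for a 7-day half-life
xs = [12, 30, 7, 19, 42]      # made-up daily block counts

# Incremental form: only yesterday's average and today's count.
avg = 0.0
for x in xs:
    avg = d * avg + (1.0 - d) * x

# Unrolled form: the value from k days ago has weight (1-d) * d**k.
n = len(xs)
unrolled = sum((1.0 - d) * d ** (n - 1 - i) * x for i, x in enumerate(xs))

assert abs(avg - unrolled) < 1e-9   # same number either way
```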

I expect this is one of the reasons why the unix people chose it.

A bit more explanation of what happens:

For half-life = 7 days:
  work done in the past 7 days counts for half of the average,
  work done in the 7 days prior to that counts as a quarter of the average,
  work done in the 7 days prior to that counts as an eighth of the average,
  work done in the 7 days prior to that counts as a sixteenth of the
average,
  and so on until the first block you ever submitted, which counts as one
squillionth of the average.

And within each 7 day period, the most recent day counts as worth more
than the earliest day.
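Both properties above fall straight out of the per-day weights; this sketch (my own, for illustration) totals the weight of each successive 7-day window.

```python
d = 0.5 ** (1.0 / 7.0)   # decay factor for a 7-day half-life

def window_weight(k):
    """Total weight of the 7-day window starting 7*k days ago,
    where the day `age` days ago carries weight (1-d) * d**age."""
    return sum((1.0 - d) * d ** age for age in range(7 * k, 7 * k + 7))

# window_weight(0) ~ 1/2, window_weight(1) ~ 1/4, window_weight(2) ~ 1/8 ...
# And within a window, the most recent day outweighs the earliest:
# (1-d) * d**0  >  (1-d) * d**6
```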

-- 
http://www.hawaga.org.uk/travel/ for my rotating world map applet
http://www.hawaga.org.uk/benc_key.txt PGP / GPG key 0x30F06950 - please use it!


--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest
