[RC5] Birthday Paradox

Tim Charron tcharron at interlog.com
Sat Mar 21 16:00:43 EST 1998


> At 23 people, the probability of everybody having a different birthday
> drops below 50%. So the probability of some two people having the *same*
> birthday (the opposite of the above case) rises above 50%. The probability
> of some pair of people sharing the same birthday in a group of 23 people is
> about 50.73%.

For a simple way to see the "paradox" part of this, add the following 
twist:  As each new person enters the room, they are forced to shake 
hands with everyone else who is already in the room, and to compare 
birthdays.  The second person introduces himself to 1 other person.  
The 3rd to 2 others, the 4th to 3 others, etc.  The 23rd person 
introduces himself to 22 others.  At this point, there have been 
1+2+3+4+5+...+22 = 253 introductions.  Looking at the problem this 
way, it's a bit easier to believe that there's a 50% chance of a 
duplicate birthday having been found.

> For the ECCp-97 challenge, the number of DPs is 1073741824. My calculations
> give n=38582 when the 50% mark is crossed. TimC recently said their 50%
> mark was around 450000. Did I screw up somewhere? Here's the perl program I
> used to calculate my results:

> Can anybody tell me where I've gone wrong?

There are 2^30 DPs, however, there are actually 2^97 different points 
possible.  The Pollard Rho algorithm that was used is actually a 
"random walk".  Each client just keeps generating points.  Each time 
a point with a particular property is found, it's sent to the server. 
 The random walk and the property chosen are both arbitrary, but are 
identical across all clients.  In the ECCp-97 case, they just counted 
a DP as any point with an x coordinate in a particular range that was 
2^30 wide (hence the 2^30 possible DPs).  The trick is that each step 
in the random walk is determined based on the actual co-ordinates of 
that point.  This ensures that once 2 different clients come across a 
duplicate point, their paths will be identical from that point 
forward.  Eventually, each of these clients will encounter the same 
DP and send it to the server, along with some parameters about how it 
was generated.  The server can then extract the solution to the 
problem.

There's a shortcut to calculate the 50% mark.  When there are "n" 
possible values, then once k=sqrt(n*pi/2) points are encountered, the 
chances of a duplicate are 50%.  For n=2^97, this works out to 
498901406404343.  Since only one in every 2^30 points are sent to the 
server, the server needs to find 498901406404343/2^30 = 
464638 points to cross the 50% mark.   My earlier calculation used 
k=1.2*sqrt(n), which gives 444873 points required.  For the next ECC 
contest, ECCp-109, n=2^109, and the contest is about 64 times more 
difficult than ECCp-97.

-- Tim
Tim Charron
tcharron at interlog.com
tcharron at ctfinance.com
http://www.interlog.com/~tcharron/

An idle computer is a terrible thing!  http://www.distributed.net/
--
To unsubscribe, send 'unsubscribe rc5' to majordomo at lists.distributed.net
rc5-digest subscribers replace rc5 with rc5-digest



More information about the rc5 mailing list