[RC5] Use of FPU on Intel 486 and P5 processors

gindrup at okway.okstate.edu gindrup at okway.okstate.edu
Tue Jan 6 16:04:27 EST 1998


     Although it does not entirely resolve this issue, 
     ftp://download.intel.com/design/mmx/manuals/24281603.PDF
     has:
     5.6.1 Using Integer Instructions to Hide Latencies of Floating-Point
     Instructions
        When a floating-point instruction depends on the result of the 
     immediately preceding instruction, and it is also a floating-point 
     instruction, it is advantageous to move integer instructions between 
     the two FP instructions, even if the integer instructions perform loop
     control. The following example restructures a loop in this manner:
     
     From reading around in that document, my best guess for the cause of 
     the stalls you're seeing is that the integer instructions are 
     referencing the same "piece" of memory as the FP instructions, and 
     there are significant penalties for quickly changing the access width 
     of memory operations.
     
     I'd be curious to know how much of the FP stack you're using since it 
     might be possible to schedule more than one key in the FPU with 
     staggered execution (to hide the latency of the other FPU key).
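
     As a rough illustration of the staggering idea (not code from the
     client, and not the example from the Intel manual), here is a minimal
     C sketch with two independent floating-point dependency chains
     interleaved in one loop.  The function name and the two-array setup
     are hypothetical, and whether the chains actually overlap depends on
     the compiler and the processor; the sketch only shows the dependency
     structure that would let one chain's latency be hidden behind the
     other.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: accumulate two independent streams in one loop.
 * Each addition to sa depends only on the previous value of sa, and each
 * addition to sb only on the previous value of sb, so the two chains never
 * wait on each other.  The integer loop control (i++, i < n) sits between
 * the floating-point operations.
 */
void sum_two_streams(const double *a, const double *b, size_t n,
                     double *sum_a, double *sum_b)
{
    assert(sum_a != NULL && sum_b != NULL);

    double sa = 0.0;  /* chain A */
    double sb = 0.0;  /* chain B, independent of chain A */

    for (size_t i = 0; i < n; i++) {
        sa += a[i];   /* next step of chain A */
        sb += b[i];   /* next step of chain B, no dependency on sa */
    }

    *sum_a = sa;
    *sum_b = sb;
}
```

     Processing two keys at once would follow the same pattern, with each
     key's working values forming one chain, assuming enough of the FP
     stack is free to hold both.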
     
     I'll be looking in my '486 manual tonight to see if the FPU is 
     supposed to stall the integer unit on that processor.  (Although it 
     might be throwing "Not an Instruction" exceptions on the '486).
            -- Eric Gindrup ! gindrup at Okway.okstate.edu


______________________________ Reply Separator _________________________________
Subject: [RC5] Use of FPU on Intel 486 and P5 processors 
Author:  <rc5 at llamas.net > at SMTP
Date:    1/6/98 10:48 AM


There has been some comment on the possibility of using the floating point unit 
(FPU) on Intel processors in parallel with the integer unit to help process more
keys.

I have tested this idea on a 486 and a P5 and found that instructions to the FPU
stall the integer pipeline for the duration of the FPU instruction.  

It may be different on other chips, or even later Intel chips, but I don't have 
access to those.  If after testing you discover that the FPU does execute in 
parallel without stalling the integer pipeline, I have available a 34-step FPU 
sequence that will do one cycle of round 1 of the key expansion.  Foolishly, 
though partly as an intellectual exercise, I developed this before testing the 
pipeline stalling.
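
For reference, one mixing step of the RC5 key expansion can be sketched in 
plain integer C as below.  This follows the published RC5 description for 
32-bit words; it is not the 34-step FPU sequence mentioned above, which was not 
posted.  The function names are hypothetical, and "one cycle" is assumed here 
to mean a single update of one S word and one L word.

```c
#include <assert.h>
#include <stdint.h>

/* Magic constants from the RC5 specification for 32-bit words. */
#define RC5_P32 0xB7E15163u
#define RC5_Q32 0x9E3779B9u

/* Rotate x left by n bits; only the low 5 bits of n are used. */
uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Initialise the expanded key table: S[0] = P, S[i] = S[i-1] + Q. */
void rc5_init_s(uint32_t *S, unsigned t)
{
    assert(t > 0);
    S[0] = RC5_P32;
    for (unsigned i = 1; i < t; i++)
        S[i] = S[i - 1] + RC5_Q32;
}

/*
 * One mixing step (names are illustrative):
 *   A = S[i] = (S[i] + A + B) <<< 3
 *   B = L[j] = (L[j] + A + B) <<< (A + B)
 * The caller advances i and j between steps.
 */
void rc5_mix_step(uint32_t *S_i, uint32_t *L_j, uint32_t *A, uint32_t *B)
{
    *A = *S_i = rotl32(*S_i + *A + *B, 3);
    *B = *L_j = rotl32(*L_j + *A + *B, *A + *B);
}
```

The first step always starts from S[0] = P with A = B = 0, so its result for A 
does not depend on the key and can be precomputed.  The data-dependent rotate 
in the second line is the part that is awkward to express with floating-point 
arithmetic.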

As MMX instructions use the FPU registers I suspect that they too will stall the
integer pipeline but this should really be tested.

Just to add some facts to this discussion.

Bruce Ford                                      b.ford at qut.edu.au
Systems Programmer
Teaching and Learning Support Services          Ph: +61 7 3864 3383
Queensland University of Technology
--
To unsubscribe, send 'unsubscribe rc5' to majordomo at llamas.net
rc5-digest subscribers replace rc5 with rc5-digest



