[RC5] Use of FPU on Intel 486 and P5 processors

Mary Conner trif at serv.net
Tue Jan 6 15:30:03 EST 1998


I had a look at the feasibility of using the FPU to do keys in parallel
with the regular integer pipelines on x86 processors, looking especially
at the Cyrix 6x86, since that is the processor I have.  The biggest
problem that I could see is that FPU instructions still have to go
through the integer pipelines for decoding.  The instruction prefetch
unit can deliver two instructions per cycle, one to each pipeline, but
FPU instructions can only go to the X pipeline.  Almost all of the
instructions inside the key cycle loop are rotls, adds, and xors, and
with the exception of rotl by register, all of them take only 1 cycle;
rotl by register takes 2 cycles.  Those instructions that make memory
accesses *might* add additional cycles, but most involve only registers.
Using the FPU is therefore no win unless the FPU instructions can be
scheduled behind a multi-cycle instruction that is known to be going to
the X pipeline.  I don't have the Pentium data book, but I have been
told that the Pentium rotl requires both pipelines, so sticking FPU
instructions behind the Pentium rotls might solve the integer stall
problem.  But if you stick an FPU instruction into a pipeline behind a
1-cycle instruction, that pipeline will stall: the next instruction it
was expecting to execute has been sent to the FPU, and its execution
unit has to wait an additional clock cycle for another instruction to
arrive.
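For reference, here is a minimal C sketch of the RC5-32/12 key schedule and block encryption as described in Rivest's RC5 specification; it is not the x86 assembly discussed in this thread, just a way to see where the rotls, adds, and xors come from.  Each key-expansion step does adds and a rotl by the constant 3, then adds and a rotl by the register value A + B, which is the rotl-by-register case.  The function names rc5_expand, rc5_encrypt, and rotl32 are hypothetical.  The RC5-64 keys being searched are 8 bytes long (b = 8); the example note below uses a 16-byte all-zero key only because that matches a published RC5-32/12/16 test vector.

```c
#include <stddef.h>
#include <stdint.h>

/* RC5-32/12/b: 32-bit words, 12 rounds, b-byte key (0 <= b <= 255). */
#define RC5_ROUNDS 12
#define RC5_T (2 * (RC5_ROUNDS + 1)) /* 26 words in the expanded table S */
#define RC5_P32 0xB7E15163u          /* magic constants from the RC5 spec */
#define RC5_Q32 0x9E3779B9u

/* Rotate left; only the low 5 bits of n are used.
   The "& 31u" on the right shift avoids an undefined shift by 32 when n is 0. */
static uint32_t rotl32(uint32_t x, uint32_t n) {
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Expand a b-byte key into the 26-word table S. */
static void rc5_expand(const uint8_t *key, size_t b, uint32_t S[RC5_T]) {
    uint32_t L[64] = {0};                  /* up to ceil(255 / 4) = 64 key words */
    size_t c = (b == 0) ? 1 : (b + 3) / 4; /* key length in 32-bit words */

    /* Load the key bytes into L as little-endian words. */
    for (size_t i = b; i-- > 0;)
        L[i / 4] = (L[i / 4] << 8) + key[i];

    /* Initialise S with the progression P, P + Q, P + 2Q, ... */
    S[0] = RC5_P32;
    for (size_t i = 1; i < RC5_T; i++)
        S[i] = S[i - 1] + RC5_Q32;

    /* Mix for 3 * max(t, c) steps: adds plus a rotl by the constant 3,
       then adds plus a rotl by the register value A + B. */
    uint32_t A = 0, B = 0;
    size_t steps = 3 * (RC5_T > c ? RC5_T : c);
    for (size_t k = 0, i = 0, j = 0; k < steps; k++) {
        A = S[i] = rotl32(S[i] + A + B, 3);
        B = L[j] = rotl32(L[j] + A + B, A + B);
        i = (i + 1) % RC5_T;
        j = (j + 1) % c;
    }
}

/* Encrypt one two-word block with an expanded table S:
   each half-round is an xor, a rotl by register, and an add. */
static void rc5_encrypt(const uint32_t S[RC5_T], const uint32_t pt[2], uint32_t ct[2]) {
    uint32_t A = pt[0] + S[0];
    uint32_t B = pt[1] + S[1];
    for (int i = 1; i <= RC5_ROUNDS; i++) {
        A = rotl32(A ^ B, B) + S[2 * i];
        B = rotl32(B ^ A, A) + S[2 * i + 1];
    }
    ct[0] = A;
    ct[1] = B;
}
```

For example, expanding an all-zero 16-byte key with rc5_expand and encrypting the block {0, 0} with rc5_encrypt should give the words 0xEEDBA521 and 0x6D8F4B15, the first test vector in Rivest's RC5 paper.  Note that every step of the mixing loop depends on the A and B produced by the step before it, which is why there is so little independent work to pair within a single key.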

On Tue, 6 Jan 1998 gindrup at okway.okstate.edu wrote:

> 
>      Although it doesn't entirely resolve this issue, 
>      ftp://download.intel.com/design/mmx/manuals/24281603.PDF
>      has:
>      5.6.1 Using Integer Instructions to Hide Latencies of Floating-Point
>      Instructions
>         When a floating-point instruction depends on the result of the 
>      immediately preceding instruction, and it is also a floating-point 
>      instruction, it is advantageous to move integer instructions between 
>      the two FP instructions, even if the integer instructions perform loop
>      control. The following example restructures a loop in this manner:
>      
>      From reading around in that document, my best guess for the cause of 
>      the stalls you're seeing is that the integer instructions are 
>      referencing the same "piece" of memory as the FP instructions, and 
>      there are significant penalties for quickly changing the access width 
>      of memory operations.
>      
>      I'd be curious to know how much of the FP stack you're using since it 
>      might be possible to schedule more than one key in the FPU with 
>      staggered execution (to hide the latency of the other FPU key).
>      
>      I'll be looking in my '486 manual tonight to see if the FPU is 
>      supposed to stall the integer unit on that processor.  (Although it 
>      might be throwing "Not an Instruction" exceptions on the '486.)
>             -- Eric Gindrup ! gindrup at Okway.okstate.edu
> 
> 
> ______________________________ Reply Separator _________________________________
> Subject: [RC5] Use of FPU on Intel 486 and P5 processors 
> Author:  <rc5 at llamas.net > at SMTP
> Date:    1/6/98 10:48 AM
> 
> 
> There has been some comment on the possibility of using the floating point unit 
> (FPU) on Intel processors in parallel with the integer unit to help process more
> keys.
> 
> I have tested this idea on a 486 and a P5 and found that instructions to the FPU
> stall the integer pipeline for the duration of the FPU instruction.  
> 
> It may be different on other chips, or even later Intel chips, but I don't have 
> access to those.  If, after testing, you discover that the FPU does execute in 
> parallel without stalling the integer pipeline, I have available a 34-step FPU 
> sequence that will do one cycle of round 1 of the key expansion.  Foolishly, 
> though partly as an intellectual exercise, I developed this before testing the 
> pipeline stalling.
> 
> As MMX instructions use the FPU registers I suspect that they too will stall the
> integer pipeline but this should really be tested.
> 
> Just to add some facts to this discussion.
> 
> Bruce Ford                                      b.ford at qut.edu.au
> Systems Programmer
> Teaching and Learning Support Services          Ph: +61 7 3864 3383
> Queensland University of Technology
> --
> To unsubscribe, send 'unsubscribe rc5' to majordomo at llamas.net
> rc5-digest subscribers replace rc5 with rc5-digest
> 
> 



