Thanks for sharing the code.
Can you tell me exactly which algorithm is used for multiplication and division in the code? I was curious.
Single-precision division takes more than 110 cycles, so I guess these algorithms may not be the ones used in real processors?
(I could not find a contact email, so I thought of raising an issue.)
This core uses the restoring division algorithm. I expect that some processors do use restoring division; however, there are more computationally efficient methods such as SRT (as in the famous Pentium bug) and Newton's method. These algorithms need fewer iterations to achieve the same precision: radix-4 SRT produces two quotient bits per iteration and so takes approximately half the number of iterations, while Newton's method roughly doubles the number of correct bits on each iteration. The multiplication uses the hard multiplier blocks in the FPGA.
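To illustrate why Newton's method needs so few iterations, here is a minimal Python sketch of the standard Newton-Raphson reciprocal iteration, x = x * (2 - d * x). It is not taken from this core or from any particular processor; the function name, the normalisation of the divisor to [0.5, 1) and the linear initial guess are assumptions made for the example.

```python
def newton_reciprocal(d: float, iterations: int) -> tuple[float, list[float]]:
    """Approximate 1/d with Newton-Raphson (illustrative sketch only).

    Assumes d has already been normalised to the range [0.5, 1),
    as a hardware divider would do by adjusting the exponent.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be normalised to [0.5, 1)")

    # Common linear initial guess for d in [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * d

    errors = []
    for _ in range(iterations):
        # Each step roughly squares the relative error |1 - d*x|.
        x = x * (2.0 - d * x)
        errors.append(abs(1.0 - d * x))
    return x, errors


if __name__ == "__main__":
    approx, errs = newton_reciprocal(0.75, 4)
    print(approx)
    for i, e in enumerate(errs, start=1):
        print(f"iteration {i}: relative error {e:.3e}")
```

The printed errors shrink much faster than one bit per step, which is the trade-off against the extra multipliers this approach needs.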
My aim was to prioritise area over speed in this library, because floating point operations can consume rather a lot of real estate in FPGA fabric. For example, division is achieved by performing repeated shifts and subtracts, and in this core the same shift and subtract logic is reused for each iteration of the algorithm. This means that each division takes ~100 cycles, but uses about 1/50th of the logic that a pipelined implementation would use. There is still room for improvement, but this was a pretty reasonable trade-off in my original application.
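For anyone following along, the shift and subtract loop can be sketched in a few lines of Python. This is a software model of textbook restoring division on unsigned integers, not a transcription of the Verilog in this core; the function name and the `bits` parameter are made up for the example. Each pass through the loop corresponds to one reuse of the same shift and subtract hardware.

```python
def restoring_divide(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Unsigned restoring division, producing one quotient bit per iteration.

    `bits` is the width of the dividend (hypothetical parameter for this sketch).
    Returns (quotient, remainder).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << bits):
        raise ValueError("dividend does not fit in the given width")

    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)

        # Trial subtraction.
        trial = remainder - divisor
        if trial >= 0:
            # Subtraction succeeded: keep it and record a 1 bit.
            remainder = trial
            quotient = (quotient << 1) | 1
        else:
            # Subtraction went negative: "restore" by keeping the old
            # remainder and record a 0 bit.
            quotient = quotient << 1
    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(100, 7, 8))  # (14, 2)
```

The loop runs once per quotient bit, which is why an iterative divider needs many cycles per result while occupying very little logic.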
If you are interested in a high-performance implementation of floating point operations, why not check out https://github.com/dawsonjon/verilog-math? It contains fully pipelined implementations of multiply, divide and square root, among others. In this implementation the single-precision divider has a latency of 36 clock cycles, but has a throughput of 1 operation per clock cycle. The design should be good for a few hundred MHz in a modern FPGA.
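To put the latency and throughput figures side by side, here is a rough back-of-the-envelope calculation in Python. It assumes the pipeline is kept full with no stalls and uses the approximate ~100 cycles per division quoted above for the iterative core; the function names are just for illustration.

```python
def pipelined_cycles(n_ops: int, latency: int = 36) -> int:
    """Cycles to finish n_ops divisions in a full pipeline (1 result per cycle)."""
    if n_ops <= 0:
        return 0
    # The first result appears after `latency` cycles, then one per cycle.
    return latency + (n_ops - 1)


def iterative_cycles(n_ops: int, cycles_per_op: int = 100) -> int:
    """Cycles to finish n_ops divisions when each one occupies the divider in turn."""
    return max(n_ops, 0) * cycles_per_op


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, pipelined_cycles(n), iterative_cycles(n))
```

For a single division the difference is modest, but for a long stream of divisions the pipelined design approaches one result per clock cycle.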