a question about multiplication and division algorithms #12

Open
hellorohan opened this issue May 16, 2017 · 1 comment

@hellorohan

Thanks for sharing the code.
Could you tell me which algorithms are used for multiplication and division in the code? I was curious.
Single-precision division takes more than 110 cycles, so I guess these may not be the algorithms used in real processors?
(I could not find a contact email, so I thought I would raise an issue.)

@dawsonjon
Owner

Hi,

This core uses the restoring division algorithm. I expect that some processors do use restoring division; however, there are more computationally efficient methods such as SRT (as in the famous Pentium bug) and Newton's method, which take approximately half the number of iterations to achieve the same precision. Multiplication uses the hard multiplier blocks in the FPGA.
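
As an aside, here is a minimal Python sketch of how restoring division produces one quotient bit per iteration. It is only an illustration of the algorithm on integers (the core applies the same idea to the significands in Verilog), not the actual implementation:

```python
def restoring_divide(dividend, divisor, bits):
    """Illustrative restoring division: one quotient bit per iteration."""
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        # Shift in the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial               # subtraction fits: quotient bit is 1
            quotient = (quotient << 1) | 1
        else:
            quotient = quotient << 1        # would go negative: keep ("restore") the remainder
    return quotient, remainder

print(restoring_divide(100, 7, 8))  # (14, 2), since 100 == 14 * 7 + 2
```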

My aim was to prioritise area over speed in this library, because floating point operations can consume rather a lot of real estate in FPGA fabric. For example, division is achieved by performing repeated shifts and subtracts, and in this core the same shift-and-subtract logic is reused for each iteration of the algorithm. This means that each division takes ~100 cycles, but uses about 1/50th of the logic that a pipelined implementation would use. There is still room for improvement, but this was a pretty reasonable trade-off in my original application.

If you are interested in a high-performance implementation of floating point operations, why not check out https://github.com/dawsonjon/verilog-math? It contains fully pipelined implementations of multiply, divide and square root, among others. In that implementation the single-precision divider has a latency of 36 clock cycles but a throughput of 1 operation per clock cycle. The design should be good for a few hundred MHz in a modern FPGA.
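
As a rough back-of-envelope comparison (assuming back-to-back independent operations and no stalls): the pipelined divider completes n divides in about 36 + (n - 1) cycles, whereas the iterative core needs roughly 100 * n cycles, so the pipeline buys roughly two orders of magnitude more throughput on long streams of operations at the cost of much more logic.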

Jon
