One of the best parts of Filament is the amount of control it gives the user by virtue of being an HDL. At the same time, the gen framework lets Filament hook up with accelerator generators, and some designs are simply better left to a program to optimize than optimized by hand. Although we could use external tools for every such component, one of their current limitations is that they cannot optimize in a controlled manner: their input and output interfaces are usually simple (all inputs arrive at the same time, as do all outputs) and cannot be constrained by the user.
Ideally, we would like to avoid forcing the user to write the routing code around such components when it is not performance-critical. For this, I propose adding an HLS-like system (possibly based on SDC modules) to do this sort of optimization automatically.
With this system, I hope in the future manually optimized Filament will be less common, and instead projects in Filament will look like a combination of blocks, some hand-optimized, some from accelerators, and even analog circuitry with RTL models, all represented by Filament interfaces and easily connected together without requiring the user to do mostly menial registering and manual pipelining.
For example, take a fixed-point multiply-add implemented entirely in Filament:
comp FixedMultiplyAdd[W, D]<'G: 1>(
    X: ['G, 'G+1] W,
    Y: ['G, 'G+1] W,
    Z: ['G, 'G+1] W,
) -> (
    out: ['G+1, 'G+2] W
) {
    mult := Mult[W, 2*W](X, Y); // 2*W being the output width here
    ext := SignExtend[W, 2*W](Z); // Extend Z before adding
    add := Add[2*W, 2*W](mult.out, ext.out);
    sliced := Slice[2*W, 2*W-D-1, W-D](add.out); // Slice the necessary bits
    out = sliced.out;
}
Assuming that the multiply and add cannot both fit in a single cycle, we want Filament to figure out that the second of the following two possible schedules is optimal, since it registers Z at width W rather than ext(Z) at width 2W.
// Extend first and then register
X * Y -> X*Y
(2W) + slice(X*Y+Z)
ext(Z) -> Z
// Register and then extend
X * Y -> X*Y
(2W) + slice(X*Y+Z)
Z -> ext(Z)
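To make the comparison concrete, here is a minimal Python sketch of the cost being compared. It assumes the only cost is the number of bits held in pipeline registers at the single cycle boundary; the function names are hypothetical and are not part of Filament.

```python
def register_bits(crossing_widths):
    """Total register bits: the sum of the widths of all values
    that cross the cycle boundary."""
    return sum(crossing_widths)


def schedule_costs(w):
    """Return (extend_then_register, register_then_extend) costs in bits."""
    # Extend first, then register: X*Y (2W) and ext(Z) (2W) cross the boundary.
    extend_then_register = register_bits([2 * w, 2 * w])
    # Register first, then extend: X*Y (2W) and Z (W) cross the boundary.
    register_then_extend = register_bits([2 * w, w])
    return extend_then_register, register_then_extend


print(schedule_costs(16))  # (64, 48)
```

For W = 16 the second schedule saves 16 register bits, which is the kind of difference the compiler would need to notice on its own.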
Furthermore, if instead we provide an unbound existential parameter to the compiler, we want it to figure out an optimal value for us, such as in the case of latency, where Filament should be able to find the optimal latency for a given frequency/technology specification:
comp FixedMultiplyAdd[W, D]<'G: 1>(
    X: ['G, 'G+1] W,
    Y: ['G, 'G+1] W,
    Z: ['G, 'G+1] W,
) -> (
    out: ['G+L, 'G+L+1] W
) with {
    some L where L >= 0;
} {
    mult := Mult[W, 2*W](X, Y); // 2*W being the output width here
    ext := SignExtend[W, 2*W](Z); // Extend Z before adding
    add := Add[2*W, 2*W](mult.out, ext.out);
    sliced := Slice[2*W, 2*W-D-1, W-D](add.out); // Slice the necessary bits
    out = sliced.out;
}
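One way such a latency could be found is ASAP scheduling with operator chaining, sketched below in Python. This is only an illustration under stated assumptions: the delays (in picoseconds) are invented placeholders rather than measurements for any real technology, every operator is assumed to fit within one clock period, and `asap_with_chaining` and `min_latency` are hypothetical names.

```python
def asap_with_chaining(ops, period):
    """Greedy ASAP schedule.

    ops: list of (name, delay, preds) in topological order.
    Returns {name: (cycle, finish_time_within_cycle)}.
    """
    sched = {}
    for name, delay, preds in ops:
        if delay > period:
            raise ValueError(f"{name} does not fit in one cycle")
        # The earliest possible cycle is the latest cycle of any predecessor.
        cycle = max((sched[p][0] for p in preds), default=0)
        # Chain after predecessors in the same cycle; values produced in
        # earlier cycles come out of a register and are ready at time 0.
        start = max((sched[p][1] for p in preds if sched[p][0] == cycle), default=0)
        if start + delay > period:
            # Does not fit in the remaining time: insert a register boundary.
            cycle, start = cycle + 1, 0
        sched[name] = (cycle, start + delay)
    return sched


def min_latency(ops, period, output):
    """Cycle in which `output` is computed, i.e. the value chosen for L."""
    return asap_with_chaining(ops, period)[output][0]


# Hypothetical delays in picoseconds.
OPS = [
    ("mult", 4000, []),
    ("ext", 100, []),
    ("add", 2000, ["mult", "ext"]),
    ("slice", 100, ["add"]),
]

print(min_latency(OPS, 5000, "slice"))   # 1
print(min_latency(OPS, 10000, "slice"))  # 0
```

With a 5000 ps clock the multiply and add cannot share a cycle, so L = 1; with a 10000 ps clock the whole datapath is combinational and L = 0.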
This would mean:
First, allowing some form of automated pipelining. For now, we can ignore resource usage, and only worry about inserting registers.
Then, we should support minimizing for register usage (bitwidths).
Next, we should replace certain primitives with "true" primitives: give them proper latency information and resource usage estimates (we could start with a handmade model for a specific FPGA). This would also mean allowing the compiler to violate certain resource limitations in the language, since currently every instance in Filament is explicit (and separate from an invocation). Perhaps this could be enabled or disabled via an attribute.
Further things, like nested component unrolling, could be even more interesting.
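The first two steps (inserting registers, then minimizing register bits) can be sketched together as a brute-force search over stage assignments. This is an illustrative Python sketch, not a proposed implementation: a real system would more likely use an SDC or ILP formulation, the delays are made-up picosecond values, and `best_schedule` is a hypothetical name.

```python
from itertools import product


def best_schedule(inputs, ops, output, latency, period):
    """Find the stage assignment with the fewest register bits.

    inputs: {name: width}, all available in stage 0.
    ops: list of (name, width, delay, preds) in topological order.
    Returns (register_bits, {op: stage}) or None if nothing is feasible.
    """
    width = dict(inputs)
    width.update({name: w for name, w, _, _ in ops})
    best = None
    for choice in product(range(latency + 1), repeat=len(ops)):
        stage = {name: 0 for name in inputs}
        stage.update({op[0]: s for op, s in zip(ops, choice)})
        arrival = {name: 0 for name in inputs}
        last_use = dict(stage)  # latest stage in which each value is read
        feasible = True
        for name, _, delay, preds in ops:
            # An operator cannot run before its operands are produced.
            if any(stage[p] > stage[name] for p in preds):
                feasible = False
                break
            # Combinational path within this stage must meet the clock period.
            start = max((arrival[p] for p in preds if stage[p] == stage[name]), default=0)
            arrival[name] = start + delay
            if arrival[name] > period:
                feasible = False
                break
            for p in preds:
                last_use[p] = max(last_use[p], stage[name])
        if not feasible:
            continue
        last_use[output] = latency  # the result must be held until stage L
        # Each value is registered once per stage boundary it crosses.
        bits = sum(width[v] * (last_use[v] - stage[v]) for v in width)
        if best is None or bits < best[0]:
            best = (bits, {op[0]: stage[op[0]] for op in ops})
    return best


W = 16
INPUTS = {"X": W, "Y": W, "Z": W}
OPS = [
    ("mult", 2 * W, 4000, ["X", "Y"]),
    ("ext", 2 * W, 100, ["Z"]),
    ("add", 2 * W, 2000, ["mult", "ext"]),
    ("slice", W, 100, ["add"]),
]

bits, stages = best_schedule(INPUTS, OPS, "slice", latency=1, period=5000)
print(bits, stages)  # 48 {'mult': 0, 'ext': 1, 'add': 1, 'slice': 1}
```

Under these assumed delays the search places the sign extension after the register boundary, matching the "register and then extend" schedule from the example.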
HLS Filament would allow for constrained optimization, letting optimized blocks fit into the timing constraints of a larger system rather than forcing routing logic to work around them.