Investigating Yosys's n-bit adder designs for ECP5 #59

bkushigian · 2022-05-25T19:58:58Z


Reverse engineering Yosys's n-bit adders for ECP5

I want to understand how Yosys creates its adders. This has several potential benefits:

- See how different features of the ECP5 architecture are used (e.g., carry chains?)
- Get a feel for possible design choices that aren't obvious at first glance
- Become familiar with the output format so I can target it when I compile our modules

Each adder I inspect is written in behavioral Verilog and translated to
structural Verilog with the script:

read -sv module.sv
hierarchy -top example
proc; opt; techmap; opt
synth_ecp5
write_verilog synth-ecp5.sv

I'll list the behavioral Verilog of each module as well as the structural
Verilog resulting from running Yosys on the behavioral Verilog.

A Quick Word on LUT4 Inputs

The inputs A..D to a LUT4 form a 4-bit address DCBA (notice the order is
reversed!). Here D is the MSB and A is the LSB. We can label the bits of INIT by
their hex address. For an INIT value of 0x0FF0 we would have the following:

  NIBBLE |   3  |   2  |   1  |   0  | 
  -------+------+------+------+------+
  INIT   | 0000 | 1111 | 1111 | 0000 |
  ADDR   | fedc | ba98 | 7654 | 3210 |

I've broken each 16-bit number up into nibbles. This is very useful since:

- inputs D and C move across nibbles (inter-nibble inputs)
- inputs B and A move within nibbles (intra-nibble inputs)

The following table shows how D and C select the nibble:

  D | C | Nibble
  --+---+-------
  0 | 0 |   0
  0 | 1 |   1
  1 | 0 |   2
  1 | 1 |   3
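To make the addressing concrete, here is a minimal Python model of a LUT4 under the convention above (Z is INIT bit {D, C, B, A}, with D as the MSB). The helper names `lut4` and `nibble` are hypothetical, chosen for illustration; they are not part of Yosys or any ECP5 tool.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4: Z = INIT[{D, C, B, A}], with D as the MSB of the address."""
    addr = (d << 3) | (c << 2) | (b << 1) | a
    return (init >> addr) & 1


def nibble(init: int, c: int, d: int) -> int:
    """Return the nibble of INIT selected by the inter-nibble inputs D and C."""
    return (init >> (4 * ((d << 1) | c))) & 0xF


# D and C pick a nibble of 0x0FF0; A and B would then pick a bit inside it.
for d in (0, 1):
    for c in (0, 1):
        print(f"D={d} C={c} -> nibble {(d << 1) | c}: {nibble(0x0FF0, c, d):04b}")
```

This prints the nibbles 0000, 1111, 1111, 0000 for (D, C) = 00, 01, 10, 11, matching the table.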

I find this high-level view helpful because DC and AB often work together in
'intuitive' ways, and this translates to inter- vs. intra-nibble movement.

For instance, we'll see a lot of symmetric LUT4 INIT values, where the first
and last nibbles are the same and the middle two nibbles are the same. This
means the output depends on C and D only through C XOR D: (D, C) = (0, 0) and
(1, 1) select identical nibbles, as do (0, 1) and (1, 0).

We'll also see one or both of inputs A and B held constant (e.g., to 0). This
means that only some entries within each nibble are ever read; the values of
the other entries don't matter.
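The symmetry test from the previous paragraph is easy to automate. Here is a sketch (the function names are hypothetical) that splits an INIT value into nibbles and checks whether nibble 0 equals nibble 3 and nibble 1 equals nibble 2, applied to INIT values that show up later in this post:

```python
def nibbles(init: int):
    """Split a 16-bit INIT into [n0, n1, n2, n3], where n0 is the least significant."""
    return [(init >> (4 * i)) & 0xF for i in range(4)]


def depends_only_on_c_xor_d(init: int) -> bool:
    """True if (D, C) = 00 and 11 select equal nibbles, and so do 01 and 10.

    In that case the output can only see C and D through C XOR D.
    """
    n = nibbles(init)
    return n[0] == n[3] and n[1] == n[2]


for init in (0x0FF0, 0x8778, 0x3CC3, 0x1777):
    print(f"{init:#06x}: {depends_only_on_c_xor_d(init)}")
```

Only 0x1777 fails the test: its nibbles (from nibble 0 up) are 7, 7, 7, 1.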

2 Bit Adder

A 2-bit adder. This can in theory be handled by a single slice.

Behavioral Verilog:

module example(input [1:0] a, input [1:0] b, output [1:0] out);
    assign out = a + b;
endmodule

Structural Verilog:

/* Generated by Yosys 0.16+41 */

module example(a, b, out);
  input  [1:0] a;
  wire   [1:0] a;
  input  [1:0] b;
  wire   [1:0] b;
  output [1:0] out;
  wire   [1:0] out;

  // Compute the LSB of the output
  LUT4 #(
    .INIT(16'h0ff0)
  ) LUT4_0 (
    // LUT address {D, C, B, A} = {b[0], a[0], 0, 0}
    .A(1'h0),
    .B(1'h0),
    .C(a[0]),
    .D(b[0]),
    .Z(out[0])
  );

  // Compute the MSB of the output
  LUT4 #(
    .INIT(16'h8778)
  ) LUT4_1 (
    .A(a[0]),
    .B(b[0]),
    .C(a[1]),
    .D(b[1]),
    .Z(out[1])
  );
endmodule
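Since the netlist is just two LUT4s, it can be checked exhaustively. This is a hypothetical sketch, not a Yosys feature: it re-implements each LUT4 with the INIT values and pin connections above (using the Z = INIT[{D, C, B, A}] convention) and compares against a + b truncated to two bits, since `out` is only 2 bits wide.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4: Z = INIT[{D, C, B, A}]."""
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1


def adder2_netlist(a: int, b: int) -> int:
    """Evaluate the two LUT4s of the synthesized 2-bit adder."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    out0 = lut4(0x0FF0, 0, 0, a0, b0)    # LUT4_0
    out1 = lut4(0x8778, a0, b0, a1, b1)  # LUT4_1
    return (out1 << 1) | out0


# The carry out of bit 1 is dropped because out is [1:0].
mismatches = [(a, b) for a in range(4) for b in range(4)
              if adder2_netlist(a, b) != (a + b) % 4]
print("mismatches:", mismatches)
```

It prints `mismatches: []`.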

This has the following structure:

Let's look at each of the LUTs:

`LUT4_0`

LUT4_0 computes the XOR of inputs a[0] and b[0] and wires it to out[0].
Notice that its INIT value 0x0ff0 is the XOR truth table 0b0110 with each bit
repeated 4 times:

    0000 1111 1111 0000
       ^    ^    ^    ^

Intra-nibble inputs A and B are always zero, so we always read the least
significant bit of a nibble (marked with ^ in the figure above), and the LSBs
of nibbles 3, 2, 1, 0 are 0, 1, 1, 0 respectively (again, just an XOR truth
table).

Inter-nibble inputs C and D choose which LSB to read: 0 when C = D and
1 otherwise (i.e., C XOR D).
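The "repeat each bit 4 times" construction can be written out directly. A minimal sketch with a hypothetical helper `repeat_bits`, which turns a 4-entry truth table indexed by {D, C} into an INIT value that ignores A and B:

```python
def repeat_bits(truth_table: int) -> int:
    """Expand a 4-entry truth table indexed by {D, C} into a 16-bit LUT4 INIT.

    Entry k becomes nibble k: 0xF if the entry is 1, else 0x0, so every
    value of A and B reads the same result.
    """
    init = 0
    for k in range(4):
        if (truth_table >> k) & 1:
            init |= 0xF << (4 * k)
    return init


XOR_TABLE = 0b0110  # entry k = (k >> 1) ^ (k & 1), i.e. D XOR C
print(f"{repeat_bits(XOR_TABLE):#06x}")  # 0x0ff0
```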

`LUT4_1`

LUT4_1 is a bit more complicated than LUT4_0 since it handles carry logic,
but if we view it as a slight modification of LUT4_0 it is actually very
simple.

Inter-nibble inputs C and D are wired to a[1] and b[1], and these will
act much like C and D did for a[0] and b[0] in the previous LUT.

Intra-nibble inputs A and B are wired to a[0] and b[0], and we need
these to reason about the carry value from the addition of a[0] and b[0].

We only need to carry when both a[0] and b[0] are 1, and this corresponds
to the most significant bit of each nibble (A = B = 1). So when either a[0] or
b[0] is 0 we perform the same calculation as in the previous LUT; when they are
both 1 we perform that calculation and then flip the result.

This can be accomplished by simply negating the MSB of each nibble in LUT4_0's INIT value:

         0x0FF0
       = 0000 1111 1111 0000  
    ---> 1000 0111 0111 1000
       = 0x8778
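Another way to get the same number is to evaluate the intended Boolean function at all 16 addresses. In this sketch `init_from_fn` is a hypothetical helper; the function passed in is out[1] = a[1] XOR b[1] XOR (a[0] AND b[0]), with LUT4_1's pin mapping (A = a[0], B = b[0], C = a[1], D = b[1]).

```python
from typing import Callable


def init_from_fn(f: Callable[[int, int, int, int], int]) -> int:
    """Build a LUT4 INIT by evaluating f(A, B, C, D) at every address {D, C, B, A}."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        init |= (f(a, b, c, d) & 1) << addr
    return init


# A = a[0], B = b[0], C = a[1], D = b[1]: sum bit 1 including the carry from bit 0.
out1_init = init_from_fn(lambda a, b, c, d: c ^ d ^ (a & b))
print(f"{out1_init:#06x}")        # 0x8778
print(f"{0x0FF0 ^ 0x8888:#06x}")  # 0x8778: flip bit 3 of every nibble of 0x0FF0
```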

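In fact Yosys could (should?) have used this INIT value for LUT4_0 in the
first place! LUT4_0 ties A and B to 0, so it only ever reads the LSB of each
nibble, and those LSBs (0, 1, 1, 0) are the same in 0x0FF0 and 0x8778.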
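This is quick to confirm with the same kind of model (the `lut4` helper is hypothetical): with A = B = 0 the two INIT values agree on all four (C, D) combinations, and they only differ at the addresses where A = B = 1.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4: Z = INIT[{D, C, B, A}]."""
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1


# With A and B tied to 0, only addresses 0, 4, 8 and 12 (the nibble LSBs) are read.
same = all(lut4(0x8778, 0, 0, c, d) == lut4(0x0FF0, 0, 0, c, d)
           for c in (0, 1) for d in (0, 1))
print(same)  # True

# The values only differ where A = B = 1, i.e. bit 3 of each nibble.
print(f"{0x8778 ^ 0x0FF0:#06x}")  # 0x8888
```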

4 Bit Adder

This one is more complicated, and I'll speculate on why in a moment. For now,
let's take what would seem to be a very easy solution: we can generalize what we
did above, right?

[==================    SLICE 0    ==================]

       0 -> A +--------+
       0 -> B |  LUT4  | Z ----->   out[0]
    a[0] -> C | 0x8778 |
    b[0] -> D +--------+
   
    a[0] -> A +--------+
    b[0] -> B |  LUT4  | Z ----->   out[1]
    a[1] -> C | 0x8778 |
    b[1] -> D +--------+
   
[==================    SLICE 1    ==================]

    a[1] -> A +--------+
    b[1] -> B |  LUT4  | Z ----->   out[2]
    a[2] -> C | 0x8778 |
    b[2] -> D +--------+
   
    a[2] -> A +--------+
    b[2] -> B |  LUT4  | Z ----->   out[3]
    a[3] -> C | 0x8778 |
    b[3] -> D +--------+

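The problem with this isn't (only) that a[1] and b[1] are used across slice
boundaries: the design is functionally wrong because it ignores carry
propagation. The carry into bit 2 is not just a[1] AND b[1]; a carry generated
at bit 0 also propagates through bit 1 whenever a[1] XOR b[1] = 1. So out[2]
also depends on a[0] and b[0], and out[3] depends on every lower input bit. For
example, a = 0b0001 and b = 0b0011 should give 0b0100, but this design computes
out[2] = a[2] XOR b[2] XOR (a[1] AND b[1]) = 0.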
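A brute-force comparison makes the problem concrete. This hypothetical sketch evaluates the hand-drawn design (INIT 0x8778 everywhere, each LUT fed with the previous bit pair) for all 256 input pairs and collects those where it disagrees with a + b mod 16.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4: Z = INIT[{D, C, B, A}]."""
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1


def bit(x: int, i: int) -> int:
    return (x >> i) & 1


def naive_adder4(a: int, b: int) -> int:
    """The hand-drawn 4-bit design: only the previous bit pair acts as 'carry'."""
    out = lut4(0x8778, 0, 0, bit(a, 0), bit(b, 0))
    for i in range(1, 4):
        out |= lut4(0x8778, bit(a, i - 1), bit(b, i - 1), bit(a, i), bit(b, i)) << i
    return out


bad = [(a, b) for a in range(16) for b in range(16)
       if naive_adder4(a, b) != (a + b) % 16]
print(bad[0])  # (1, 3): the design outputs 0 instead of 4
```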

Anyway, let's look at the behavioral and structural Verilog.

Behavioral Verilog

module example(input [3:0] a, input [3:0] b, output [3:0] out);
  assign out = a + b;
endmodule

Structural Verilog

/* Generated by Yosys 0.16+41 */

module example(a, b, out);
  input  [3:0] a;
  wire   [3:0] a;
  input  [3:0] b;
  wire   [3:0] b;
  output [3:0] out;
  wire   [3:0] out;
  wire tmp_wire;
  wire WIRE_MUX_B;
  wire WIRE_MUX_A;

  LUT4 #(
    .INIT(16'h0ff0)
  ) LUT4_0 (
    .A(1'h0),
    .B(1'h0),
    .C(a[0]),
    .D(b[0]),
    .Z(out[0])
  );

  LUT4 #(
    .INIT(16'h8778)
  ) LUT4_1 (
    .A(a[0]),
    .B(b[0]),
    .C(a[1]),
    .D(b[1]),
    .Z(out[1])
  );

  LUT4 #(
    .INIT(16'h1777)
  ) LUT4_CROSS_SLICE_CARRY (
    .A(a[1]),
    .B(b[1]),
    .C(a[0]),
    .D(b[0]),
    .Z(tmp_wire)
  );

  LUT4 #(
    .INIT(16'h3cc3)
  ) LUT4_out_2 (
    .A(1'h0),
    .B(tmp_wire),
    .C(a[2]),
    .D(b[2]),
    .Z(out[2])
  );

  LUT4 #(
    .INIT(16'hd42b)
  ) LUT4_to_MUX_B (
    .A(tmp_wire),
    .B(a[2]),
    .C(b[2]),
    .D(a[3]),
    .Z(WIRE_MUX_B)
  );

  LUT4 #(
    .INIT(16'h2bd4)
  ) LUT4_to_MUX_A (
    .A(tmp_wire),
    .B(a[2]),
    .C(b[2]),
    .D(a[3]),
    .Z(WIRE_MUX_A)
  );

  PFUMX MUX_out_3 (
    .ALUT(WIRE_MUX_B),
    .BLUT(WIRE_MUX_A),
    .C0(b[3]),
    .Z(out[3])
  );
endmodule
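As with the 2-bit adder, the whole netlist can be simulated exhaustively. In this sketch the `lut4`, `pfumx`, and `adder4_netlist` helpers are hypothetical. The PFUMX behavior is an assumption: Z = ALUT when C0 = 1 and Z = BLUT when C0 = 0, which is consistent with LUT4_to_MUX_B (wired to ALUT) handling the b[3] = 1 case.

```python
def lut4(init: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a LUT4: Z = INIT[{D, C, B, A}]."""
    return (init >> ((d << 3) | (c << 2) | (b << 1) | a)) & 1


def pfumx(alut: int, blut: int, c0: int) -> int:
    """ASSUMED PFUMX behavior: select ALUT when C0 = 1, BLUT when C0 = 0."""
    return alut if c0 else blut


def adder4_netlist(a: int, b: int) -> int:
    a0, a1, a2, a3 = [(a >> i) & 1 for i in range(4)]
    b0, b1, b2, b3 = [(b >> i) & 1 for i in range(4)]
    out0 = lut4(0x0FF0, 0, 0, a0, b0)
    out1 = lut4(0x8778, a0, b0, a1, b1)
    tmp_wire = lut4(0x1777, a1, b1, a0, b0)      # NOT(carry into bit 2)
    out2 = lut4(0x3CC3, 0, tmp_wire, a2, b2)
    mux_b = lut4(0xD42B, tmp_wire, a2, b2, a3)  # WIRE_MUX_B
    mux_a = lut4(0x2BD4, tmp_wire, a2, b2, a3)  # WIRE_MUX_A
    out3 = pfumx(mux_b, mux_a, b3)
    return out0 | (out1 << 1) | (out2 << 2) | (out3 << 3)


mismatches = [(a, b) for a in range(16) for b in range(16)
              if adder4_netlist(a, b) != (a + b) % 16]
print("mismatches:", mismatches)
```

Under that PFUMX assumption it prints `mismatches: []`.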

This code has the following layout:

This is more complicated than the 2-bit adder, but we can break it down:

Computing out[0] and out[1] is identical to the 2-bit adder.

To compute out[2] we need carry information from bit 1. This is provided by LUT4_CROSS_SLICE_CARRY, which takes a[0], b[0], a[1], and b[1], so it also accounts for a carry generated at bit 0 and propagated through bit 1. This table outputs 0 whenever the add at bit 1 results in a carry and 1 otherwise... in other words, tmp_wire holds the predicate "bit 1 does NOT carry".
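The carry LUT's INIT value can be re-derived from this description. A sketch with a hypothetical `init_from_fn` helper, using the pin mapping from the netlist (A = a[1], B = b[1], C = a[0], D = b[0]) and the majority function for the carry out of bit 1:

```python
def init_from_fn(f) -> int:
    """Build a LUT4 INIT by evaluating f(A, B, C, D) at every address {D, C, B, A}."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        init |= (f(a, b, c, d) & 1) << addr
    return init


def no_carry_out_of_bit1(a1: int, b1: int, a0: int, b0: int) -> int:
    c1 = a0 & b0                       # carry into bit 1
    c2 = (a1 & b1) | (c1 & (a1 | b1))  # carry out of bit 1 (majority of a1, b1, c1)
    return 1 - c2                      # the carry LUT stores the inverted carry


print(f"{init_from_fn(no_carry_out_of_bit1):#06x}")  # 0x1777
```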

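The output from the carry LUT (tmp_wire) is used in LUT4_out_2, along with inputs a[2] and b[2]. Because tmp_wire is the inverted carry, LUT4_out_2 computes out[2] = NOT(a[2] XOR b[2] XOR tmp_wire): nibbles 0 and 3 (C = D) are 0x3, which reads 1 when tmp_wire = 0, and nibbles 1 and 2 are 0xC, which reads 1 when tmp_wire = 1. The input LUT4_out_2.A is hardcoded to 0, and the INIT value ignores A anyway (in 0x3 and 0xC, neighboring bits that differ only in A are equal).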
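The same approach recovers LUT4_out_2's INIT (A unused, B = tmp_wire, C = a[2], D = b[2]). `init_from_fn` is the same hypothetical helper, repeated so the snippet runs on its own:

```python
def init_from_fn(f) -> int:
    """Build a LUT4 INIT by evaluating f(A, B, C, D) at every address {D, C, B, A}."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        init |= (f(a, b, c, d) & 1) << addr
    return init


# B = tmp_wire = NOT(carry into bit 2), so the carry itself is 1 - b.
out2_init = init_from_fn(lambda a, b, c, d: c ^ d ^ (1 - b))
print(f"{out2_init:#06x}")  # 0x3cc3
```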

To compute out[3] we need carry info from bit 2. This could be handled in the same way as the carry out of bit 1 was, but Yosys opts for another design. After folding bits 0 and 1 into tmp_wire, out[3] still depends on five signals (tmp_wire, a[2], b[2], a[3], and b[3]), one more than a LUT4 accepts, which is presumably why Yosys generates two LUTs, LUT4_to_MUX_B and LUT4_to_MUX_A, and combines them with a PFUMX into a 5-input function:
- LUT4_to_MUX_B computes the value of out[3] when b[3] = 1
- LUT4_to_MUX_A computes the value of out[3] when b[3] = 0
Both LUT4s take the same inputs: tmp_wire (the output of LUT4_CROSS_SLICE_CARRY), a[2], b[2], and a[3]. tmp_wire, a[2], and b[2] determine the carry out of bit 2, and a[3] is one of the bits to be added (the other being b[3]). Note that the INIT values of the two tables are bitwise negations of each other, since flipping b[3] flips out[3]. Finally, the PFUMX MUX_out_3 selects between these values using b[3] as the selector bit (C0), and the result is assigned to out[3].
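The two MUX LUTs can be checked the same way. This sketch (hypothetical helper names again) computes the carry out of bit 2 from tmp_wire, a[2], and b[2], then fixes b[3] to 1 or 0 to get each half of the 5-input function, using the pin mapping A = tmp_wire, B = a[2], C = b[2], D = a[3]:

```python
def init_from_fn(f) -> int:
    """Build a LUT4 INIT by evaluating f(A, B, C, D) at every address {D, C, B, A}."""
    init = 0
    for addr in range(16):
        a, b, c, d = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1
        init |= (f(a, b, c, d) & 1) << addr
    return init


def carry_into_bit3(tmp_wire: int, a2: int, b2: int) -> int:
    c2 = 1 - tmp_wire                    # tmp_wire is the inverted carry into bit 2
    return (a2 & b2) | (c2 & (a2 | b2))  # majority of a2, b2, c2


mux_b_init = init_from_fn(lambda t, a2, b2, a3: a3 ^ 1 ^ carry_into_bit3(t, a2, b2))  # b[3] = 1
mux_a_init = init_from_fn(lambda t, a2, b2, a3: a3 ^ 0 ^ carry_into_bit3(t, a2, b2))  # b[3] = 0
print(f"{mux_b_init:#06x} {mux_a_init:#06x}")  # 0xd42b 0x2bd4
print(mux_a_init == mux_b_init ^ 0xFFFF)       # True
```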

Also interesting: note that Yosys is separating uses of input bits 0 and 1 from input bits 2 and 3 (e.g., a[1] and a[2] are not used in the same LUT). I don't know if this is necessary or not.
