Explicit pointwise Conv1D implementation for "Latency" strategy #811

jmduarte
Member

@jmduarte jmduarte commented Jun 14, 2023

Description

This is mostly for discussion and to let others, such as @Duchstf, test it out. This PR adds an explicit pointwise Conv1D implementation, where the reuse factor (RF) is used to split the layer execution and reuse the existing module RF times.

Original pointwise Conv1D:

  • (in_width, n_chan) -> (in_width, n_filt)

This PR splits it into RF calls of

  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • (in_width/RF, n_chan) -> (in_width/RF, n_filt)
  • ...
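
The split above can be sketched in plain Python to check that running the layer on RF slices gives the same result as one full call. This is only an illustration, not the HLS code in this PR: the helper names are hypothetical, and it assumes the slices are contiguous along the width axis, which the description does not state.

```python
def pointwise_conv1d(x, w, b):
    """Pointwise (kernel size 1) Conv1D: (width, n_chan) -> (width, n_filt).

    x is a list of rows (one per position), w is n_chan x n_filt, b has n_filt entries.
    """
    n_chan, n_filt = len(w), len(b)
    return [
        [b[f] + sum(row[c] * w[c][f] for c in range(n_chan)) for f in range(n_filt)]
        for row in x
    ]


def pointwise_conv1d_split(x, w, b, rf):
    """Hypothetical helper: run the same layer on rf slices of in_width/rf positions each."""
    in_width = len(x)
    if in_width % rf != 0:
        raise ValueError(f"in_width={in_width} is not divisible by RF={rf}")
    step = in_width // rf
    out = []
    for i in range(rf):
        # Assumption: slice i covers positions [i*step, (i+1)*step).
        out.extend(pointwise_conv1d(x[i * step:(i + 1) * step], w, b))
    return out


x = [[1, 2], [3, 4], [5, 6], [7, 8]]  # in_width=4, n_chan=2
w = [[1, 0, 2], [0, 1, 3]]            # n_chan=2, n_filt=3
b = [0, 1, 0]
assert pointwise_conv1d_split(x, w, b, rf=2) == pointwise_conv1d(x, w, b)
```

Because a pointwise convolution treats every position independently, the slices can be processed one after another by the same module without changing the output.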

The initiation interval (II) scales roughly with RF (II ~ RF). To turn it on, set ConvImplementation for the layer named <layer>:

config["LayerName"]["<layer>"]["ConvImplementation"] = "Pointwise"
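
As a rough sketch of how that setting could be applied to several layers at once, the snippet below uses a plain dictionary as a stand-in for a per-layer config. The helper `enable_pointwise`, the layer name `conv1`, and the other keys shown are illustrative assumptions; only the `LayerName` / `ConvImplementation` entry comes from the line above.

```python
def enable_pointwise(config, layer_names):
    """Hypothetical helper: set ConvImplementation = "Pointwise" for the given layers."""
    layers = config.setdefault("LayerName", {})
    for name in layer_names:
        # Create the per-layer entry if it does not exist yet, then set the option.
        layers.setdefault(name, {})["ConvImplementation"] = "Pointwise"
    return config


# Stand-in config; "conv1" and the keys other than ConvImplementation are assumptions.
config = {
    "Model": {"Strategy": "Latency"},
    "LayerName": {"conv1": {"ReuseFactor": 4}},
}
enable_pointwise(config, ["conv1"])
assert config["LayerName"]["conv1"]["ConvImplementation"] == "Pointwise"
```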

Limitations:

  • Assumes in_width is divisible by RF
  • Hardcoded explicit execution up to RF = 120. Could be automated with code generation.
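
Both limitations can be checked before building a model. The following is a minimal sketch with hypothetical helper names; the limit of 120 is taken from the list above, and everything else is an assumption for illustration.

```python
MAX_EXPLICIT_RF = 120  # upper bound of the hardcoded explicit calls, per the limitation above


def valid_reuse_factors(in_width, max_rf=MAX_EXPLICIT_RF):
    """Return every RF that divides in_width and does not exceed max_rf."""
    return [rf for rf in range(1, min(in_width, max_rf) + 1) if in_width % rf == 0]


def check_reuse_factor(in_width, rf, max_rf=MAX_EXPLICIT_RF):
    """Raise ValueError if rf violates either limitation."""
    if rf < 1 or rf > max_rf:
        raise ValueError(f"RF={rf} is outside the supported range 1..{max_rf}")
    if in_width % rf != 0:
        raise ValueError(f"in_width={in_width} is not divisible by RF={rf}")


assert valid_reuse_factors(12) == [1, 2, 3, 4, 6, 12]
check_reuse_factor(12, 4)  # passes silently
```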

Type of change

  • New feature (non-breaking change which adds functionality)
  • A new research paper code implementation

Tests

See test/pytest/test_pointwiseconv.py

Checklist

  • I have read the guidelines for contributing.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • I have installed and run pre-commit on the files I edited or added.
  • I have added tests that prove my fix is effective or that my feature works.

@jmduarte
Member Author

pre-commit.ci autofix

@jmduarte jmduarte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jun 14, 2023
@Duchstf
Member

Duchstf commented Jun 20, 2023

@jmduarte Can you rebase this onto the latest changes from #815? I'd love to try this out after that!

@Duchstf
Member

Duchstf commented Jun 20, 2023

@jmduarte I'm actually trying this out now, but I just realized it is in Vivado. Is it possible to update this to Vitis? I would be happy to contribute if you want!

Comment on lines +188 to +218
pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[0], res_tmp[0], weights, biases);
pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[1], res_tmp[1], weights, biases);
if (CONFIG_T::reuse_factor > 2)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[2], res_tmp[2], weights, biases);
if (CONFIG_T::reuse_factor > 3)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[3], res_tmp[3], weights, biases);
if (CONFIG_T::reuse_factor > 4)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[4], res_tmp[4], weights, biases);
if (CONFIG_T::reuse_factor > 5)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[5], res_tmp[5], weights, biases);
if (CONFIG_T::reuse_factor > 6)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[6], res_tmp[6], weights, biases);
if (CONFIG_T::reuse_factor > 7)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[7], res_tmp[7], weights, biases);
if (CONFIG_T::reuse_factor > 8)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[8], res_tmp[8], weights, biases);
if (CONFIG_T::reuse_factor > 9)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[9], res_tmp[9], weights, biases);
if (CONFIG_T::reuse_factor > 10)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[10], res_tmp[10], weights, biases);
if (CONFIG_T::reuse_factor > 11)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[11], res_tmp[11], weights, biases);
if (CONFIG_T::reuse_factor > 12)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[12], res_tmp[12], weights, biases);
if (CONFIG_T::reuse_factor > 13)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[13], res_tmp[13], weights, biases);
if (CONFIG_T::reuse_factor > 14)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[14], res_tmp[14], weights, biases);
if (CONFIG_T::reuse_factor > 15)
    pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>(data_tmp[15], res_tmp[15], weights, biases);
if (CONFIG_T::reuse_factor > 16)
Member

I'm wondering if there is a better way to do this ...

Member Author

@jmduarte jmduarte Jun 21, 2023

Yes, I think we can use the code-generation machinery like this:

class GenerateConvIm2col(OptimizerPass):
    '''Generates code for im2col step of 1D/2D convolution'''

    def match(self, node):
        return isinstance(node, (Conv1D, Conv2D)) and node.model.config.get_config_value('IOType') == 'io_parallel'

    def transform(self, model, node):
        node_class = node.__class__.__name__
        if '1D' in node_class:
            self._generate_im2col_1d(node)
        elif '2D' in node_class:
            self._generate_im2col_2d(node)
        else:
            raise Exception(f'Cannot generate instructions for node {node.name} ({node_class})')

    def _generate_im2col_1d(self, node):
        code_str = node.model.config.backend.generate_conv1d_line_buffer_fn(
            node.get_attr('index'),
            node.get_attr('n_partitions'),
            node.get_input_variable().shape[0],
            node.get_input_variable().shape[1],
            kernel=node.get_attr('filt_width'),
            stride=node.get_attr('stride_width'),
            pad=(node.get_attr('pad_left'), node.get_attr('pad_right')),
        )
        node.set_attr('line_buffer_codegen', Source(code_str))

    def _generate_im2col_2d(self, node):
        code_str = node.model.config.backend.generate_conv2d_line_buffer_fn(
            node.get_attr('index'),
            node.get_attr('n_partitions'),
            node.get_input_variable().shape[0],
            node.get_input_variable().shape[1],
            node.get_input_variable().shape[2],
            kernel=(node.get_attr('filt_height'), node.get_attr('filt_width')),
            stride=(node.get_attr('stride_height'), node.get_attr('stride_width')),
            pad=(
                node.get_attr('pad_top'),
                node.get_attr('pad_bottom'),
                node.get_attr('pad_left'),
                node.get_attr('pad_right'),
            ),
        )
        node.set_attr('line_buffer_codegen', Source(code_str))

def generate_conv1d_line_buffer_fn(self, layer_idx, n_partitions, in_W, in_C, kernel=3, stride=1, pad=0, dilation=1):
    """Generate a C++ function that mimics the im2col algorithm. This function works for 1D convolution.

    The HLS compiler produces suboptimal designs for an im2col algorithm implementation, so a trick we use is
    to generate the result of the im2col transformation explicitly, instead of relying on loops. Since
    the result depends on the parameters of the convolution layer (the input size, the kernel size, stride etc),
    we need to do this for every convolution layer.

    Args:
        layer_idx (int): Index of layer ('index' attribute).
        n_partitions (int): Number of partitions to divide the input into.
            The pixels in each partition will be processed in parallel.
        in_W (int): Width of input.
        in_C (int): Number of channels.
        kernel (int, optional): Size of the kernel. Defaults to 3.
        stride (int, optional): Stride length. Defaults to 1.
        pad (int or Iterable, optional): Padding to apply. Defaults to 0.
            Specified as either a number or a list [left_pad, right_pad].
        dilation (int, optional): Dilation rate. Defaults to 1.

    Returns:
        str: Generated C++ function
    """
    if isinstance(pad, Iterable):
        pad_left = pad[0]
        pad_right = pad[1]
    else:
        pad_left = pad
        pad_right = pad
    im2col_matrix = self._compute_conv1d_im2col((in_W, in_C), kernel, stride, (pad_left, pad_right), dilation)
    generated_code = (
        "template<class data_T, typename CONFIG_T>\n"
        "class fill_buffer_{index} : public FillConv1DBuffer<data_T, CONFIG_T> {{\n"
        " public:\n"
        " static void fill_buffer(\n"
        " data_T data[CONFIG_T::in_width * CONFIG_T::n_chan],\n"
        " data_T buffer[CONFIG_T::n_pixels][CONFIG_T::filt_width * CONFIG_T::n_chan],\n"
        " const unsigned partition\n"
        " ) {{\n"
    ).format(index=layer_idx)
    indent = ' '
    for partition_idx, partition in enumerate(np.split(im2col_matrix, n_partitions)):
        generated_code += indent * 2 + f'if (partition == {partition_idx:>3}) {{\n'
        for pixel_idx, arr in enumerate(partition):
            buffer_stmts = []
            for j, v in enumerate(arr):
                if v == 0:
                    val = '0'
                else:
                    val = f'data[{int(v - 1)}]'
                buffer_stmts.append(f'buffer[{pixel_idx}][{j}] = {val:>10};')
            generated_code += indent * 3 + ' '.join(buffer_stmts) + '\n'
        generated_code += '\n' + indent * 2 + '}\n'
    generated_code += indent + '}\n'
    generated_code += '};\n'
    return generated_code
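
Applied to the hardcoded calls under review, the same string-building idea might look roughly like the sketch below. It is not the code from this PR or from any follow-up branch: the function name `generate_pointwise_calls` is hypothetical, and it only emits the call sequence for a given RF, so no `if (CONFIG_T::reuse_factor > N)` guards are needed.

```python
def generate_pointwise_calls(rf, indent="    "):
    """Hypothetical generator: emit one explicit C++ call per reuse-factor slice."""
    if rf < 1:
        raise ValueError("reuse factor must be at least 1")
    template = (
        "pointwise_conv_1d_latency_cl<data_T, res_T, CONFIG_T>"
        "(data_tmp[{i}], res_tmp[{i}], weights, biases);"
    )
    # One line per slice, indexed 0..rf-1, matching the data_tmp/res_tmp arrays above.
    return "\n".join(indent + template.format(i=i) for i in range(rf)) + "\n"


print(generate_pointwise_calls(3))
```

Generating the calls per layer would remove the fixed upper limit on RF, at the cost of one generated function per layer.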

Member Author

@jmduarte jmduarte Oct 8, 2023

@vloncar @Duchstf I started this branch to use the code generation machinery: jmduarte#20

direct diff w.r.t. main: main...jmduarte:split_pointwise_conv_by_rf_codegen
#881

is this better than the current approach?

@jmduarte jmduarte force-pushed the split_pointwise_conv_by_rf_rebase_latest branch from 90f9e10 to 56797e7 Compare June 21, 2023 14:24
@jmduarte jmduarte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Jun 21, 2023
@jmitrevs jmitrevs added this to the v0.8.0 milestone Aug 11, 2023
@jmduarte jmduarte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Sep 8, 2023
@jmduarte jmduarte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Sep 12, 2023
@jmduarte jmduarte changed the title Explicit Pointwise Conv1D implementaiton for "Latency" strategy Explicit pointwise Conv1D implementation for "Latency" strategy Oct 8, 2023
@jmduarte jmduarte added please test Trigger testing by creating local PR branch and removed please test Trigger testing by creating local PR branch labels Oct 9, 2023
@jmduarte
Member Author

Superseded by #881

@jmduarte jmduarte closed this Oct 20, 2023
JanFSchulte added a commit that referenced this pull request Dec 4, 2024
Pointwise Conv1D with code generation for "Latency" strategy (update of #811)