Releases: ARM-software/CMSIS-NN
v6.0.0
Release Notes
The following are the updates compared to the previous release, CMSIS-NN v5.0.0.
API Changes
- These are non-backward-compatible API changes; hence the release has a major version update. Please refer to arm_nnfunctions.h for more details.
- Int32 bias support for int16x8 convolution - arm_convolve_wrapper_s16/arm_convolve_s16 parameters updated
- Int16 input convolution support for MVEI - removed arm_convolve_fast_s16
- LSTM reimplementation - most LSTM API functions replaced or updated
- API function arm_convolve_1_x_n_s8_get_buffer_size parameters updated
Performance Improvements
- Performance improvements for int4 DW convolution
- MVE convolution improvements by avoiding unaligned access
- LSTM reimplementation - overall improvements
- MVE 1xN convolution using im2col
New Features
- MVEI packed int4 kernel support in FC, convolution and DW convolution
- LSTM reimplemented to align with TFLM reference kernel.
- LSTM support for int16 input
- DSP/MVEI support for Transpose convolution
- Support for grouped convolutions
- Non zero filter offset support for FC
- Int16 input convolution support for MVEI
- Int32 bias support for int16x8 convolution
General Improvements
- Unit tests refactoring started
Full Changelog: v5.0.0...v6.0.0
v5.0.0
Release Notes
The following are the updates compared to the previous release, CMSIS-NN v4.1.0.
API Changes
- Improved read efficiency in FC for the MVE extension required API changes.
- This is a non-backward-compatible API change; hence the release has a major version update.
- The affected APIs are arm_vector_sum_s8, arm_svdf_s8 and arm_svdf_s8_get_buffer_size_mve. Please refer to arm_nnfunctions.h for details.
Performance Improvements
- Improved read efficiency in FC for the MVE extension.
- As a result, FC and SVDF now calculate kernel sums in the prepare phase, before actual inference; this may increase memory usage for certain models.
New Features
- Packed int4 kernel support in FC, convolution and DW convolution for scalar version and DSP extension.
- Scalar/base support for new operator Transpose convolution.
General Improvements
- Extended unit test coverage.
Full Changelog: 23.08...v5.0.0
v4.1.0
Release Notes
The following are the updates compared to the previous release, CMSIS-NN v4.0.0.
Performance Improvements
- Improvements in LSTM, generic convolution, 1xN convolution, DW convolution and FC for MVE extension.
- Improvements in LSTM, generic convolution and int8/int16 elementwise mul for DSP extension.
New Features
- Script to extract model hyperparameters.
- Get size of buffers on host to support TVM use case.
- Dependency to CMSIS-Core is removed. CMSIS-NN can be built without including any other CMSIS module.
- A new DS_CNN_S model unit test is added that is used in End-to-End benchmark AudioMark.
General Improvements
- Extended unit test coverage.
Bug Fixes
- Fix potential out-of-bounds write in SVDF state data.
- Fix selection of correct int16 DW Convolution function.
- Workaround for a GCC 12.2 Internal Compiler Error affecting MVE.
- Fix error in buffer size calculation of DW Convolution wrapper for int8.
- Fix 'asm operand has impossible constraint' error for certain GCC compiler versions related to MVE optimizations.
CMSIS-NN 4.0.0
Release Notes
The following are the updates compared to the previous release, CMSIS 5.9.0.
Return Type Change
The return type of all APIs that return a status has changed. CMSIS-NN previously used error codes from CMSIS-DSP in the form of the enum 'arm_status'. This has been replaced by the enum 'arm_cmsis_nn_status'. The status values remain the same. It is recommended that users update the return type in their applications.
Removal of Legacy Functions
Neural Network (NN) operators which do not follow the quantization specification of TensorFlow Lite for Microcontrollers have been removed. Existing users can continue using them via the CMSIS 5.9.0 release.
As a consequence, the data type aliases q7_t, q15_t, q31_t and q63_t are replaced by int8_t, int16_t, int32_t and int64_t respectively.
New Operators
Scalar implementation of LSTM with unit tests. Optimizations for the DSP extension and the M-Profile Vector Extension (MVE) are planned for the next release.
New Features
These are new optimizations to existing operators.
- DSP extension optimization for int16 average pooling
- MVE optimization for int16 max and average pooling
- MVE optimization for int16 add and mul
- MVE optimization for int16 fully connected
- MVE and DSP extension optimization for int16 depthwise convolution
- MVE and DSP extension optimization for non-unity stride 1x1 convolution
Performance Improvements
- 3x3 depthwise convolution for DSP extension
- 1x1 convolution for MVE