
Technical Preview

Pre-release
@luluman luluman released this 03 Nov 10:00
· 3904 commits to master since this release

TPU-MLIR Project Update

Bug Fixes and Dependency Updates

  • Dependency Fix: Fixed the MLIRInputConversion dependency.
  • SDK Release Workflow: Fixed the tpu-mlir tag used for building and added a workflow file for the SDK release.
  • Softplus LoweringINT8: Fixed the bm1684 Softplus LoweringINT8 issue.
  • Slice Begin Index: Fixed the bm1684 Slice begin_index problem.
  • Mul Conflict Resolution: Partially fixed a conflict between the Mul output data sign and a chip restriction.

Feature Enhancements and Support

  • Subgraph Split Support: Enhanced support for subgraph split.
  • Quant IO List Note: Added a quant io list note for better quantization handling.
  • New Full Operation: Supported the aten::new_full operation.
  • Torch Flip for bm1684x: Added support for torch.flip for bm1684x.
  • Weight Input Shape Bind: Supported shape binding for weight inputs.
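
The two Torch-frontend additions above, aten::new_full and torch.flip, have simple tensor semantics. As an illustration only (this is not TPU-MLIR code, and the helper names `new_full` and `flip` are hypothetical), a minimal NumPy sketch of what these ops compute:

```python
import numpy as np

def new_full(like, shape, fill_value):
    # aten::new_full: a new tensor of `shape` filled with `fill_value`,
    # inheriting its dtype from the source tensor `like`
    return np.full(shape, fill_value, dtype=like.dtype)

def flip(x, dims):
    # torch.flip: reverse the order of elements along each axis in `dims`
    return np.flip(x, axis=tuple(dims))
```

A lowering pass would need to reproduce exactly these semantics on the target chip.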

Updates and Implementations for Specific Operations

  • Backend Update for sg2260: Updated the sg2260 backend for tag31.
  • ScatterElements Implementation: Implemented ScatterElements for any axis.
  • Unary Indexing Map: Added unary indexing map.
  • Binary Indexing Map: Added binary (add/sub/mul/div/min/max) indexing map.
  • Dynamic NMS Support: Added support for dynamic NMS on bm1684x.
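
ScatterElements with an arbitrary axis follows the ONNX operator's semantics: every element of `updates` is written into a copy of `data`, with `indices` replacing the coordinate along the chosen axis. A minimal NumPy reference sketch (illustrative only, not the TPU-MLIR implementation; `scatter_elements` is a hypothetical helper):

```python
import numpy as np

def scatter_elements(data, indices, updates, axis=0):
    # ONNX ScatterElements reference semantics for any axis:
    # for each position idx in `indices`, write updates[idx] into the
    # output at idx with its `axis` coordinate replaced by indices[idx].
    out = np.copy(data)
    axis = axis % data.ndim                   # allow negative axes
    coords = list(np.indices(indices.shape))  # one coordinate grid per dim
    coords[axis] = indices                    # redirect the scatter axis
    out[tuple(coords)] = updates
    return out
```

Building the full coordinate grid is what makes the sketch axis-agnostic, which is the property the release's implementation adds.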

Codebase and Documentation Refinements

  • Cleanup: Removed test/sg2260 dialect.
  • Documentation Update: Updated nntoolchain README and lib.
  • Codegen Documentation: Added documentation for codegen.
  • Template Format Update: Updated import mlir file template format.
  • Quick Start Docs Modification: Modified quick start docs for tpu-mlir.

Optimizations and Performance Improvements

  • Kernel Module Usage: Reverted to using the old kernel module.
  • MLIR Conv2D Optimization: Improved the 1684 MLIR Conv2D lowering with the 3ic optimization.
  • SWINT Quantization: Added swint quant for better performance.
  • Opt Parameter Addition: Added an optimization parameter.
  • Loop and Fusion Enhancements: Supported inner-loop interchange, the padOp transform, tensor-op collapse, fusion on linalg-on-tensors, and related transforms.