Technical Preview
Pre-release
TPU-MLIR Project Update
Bug Fixes and Dependency Updates
- Fix Dependency: Fixed the dependency of MLIRInputConversion.
- SDK Release Workflow: Fixed the tpu-mlir tag used for building and added a workflow file for SDK releases.
- Softplus LoweringINT8: Fixed the Softplus LoweringINT8 issue on bm1684.
- Slice Begin Index: Fixed the slice begin_index problem on bm1684.
- Mul Conflict Resolution: Partially fixed the conflict between the output data sign of mul and the chip restriction.
Feature Enhancements and Support
- Subgraph Split Support: Enhanced support for subgraph split.
- Quant IO List Note: Added quant io list note for better quantization handling.
- New Full Operation: Supported the aten::new_full operation.
- Torch Flip for bm1684x: Added support for torch.flip for bm1684x.
- Weight Input Shape Bind: Supported shape bind for weight input.
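For readers unfamiliar with the two newly supported PyTorch operations, the following is a minimal pure-Python sketch of the semantics they are generally understood to have. It is not the tpu-mlir implementation: the helper names `flip` and `new_full` are hypothetical, nested lists stand in for tensors, and dtype/device handling (which the real `new_full` inherits from the source tensor) is ignored.

```python
def flip(t, dims, _depth=0):
    """Reverse a nested list along every axis listed in dims.

    Mirrors the behaviour of torch.flip on a rectangular tensor:
    the result is a copy, and the input is left untouched.
    """
    if not isinstance(t, list):
        return t  # reached a scalar element
    items = [flip(x, dims, _depth + 1) for x in t]
    return items[::-1] if _depth in dims else items


def new_full(size, fill_value):
    """Build a nested list of shape `size` with every element set to fill_value."""
    if not size:
        return fill_value
    return [new_full(size[1:], fill_value) for _ in range(size[0])]


x = [[1, 2], [3, 4]]
rows_flipped = flip(x, dims=(0,))    # reverse the order of rows
both_flipped = flip(x, dims=(0, 1))  # reverse rows and columns
filled = new_full((2, 3), 7)         # 2x3 block of sevens
```

Because both operations only rearrange or broadcast values, a backend can check its output against a reference like this one element by element.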
Updates and Implementations for Specific Operations
- Backend Update for sg2260: Updated the sg2260 backend for tag31.
- ScatterElements Implementation: Implemented ScatterElements for any axis.
- Unary Indexing Map: Added unary indexing map.
- Binary Indexing Map: Added binary (add/sub/mul/div/min/max) indexing map.
- Dynamic NMS Support: Added support for dynamic NMS on bm1684x.
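To illustrate what "any axis" means for ScatterElements, here is a minimal reference sketch following the ONNX operator definition (without the optional reduction attribute). It is an assumption-laden illustration, not the tpu-mlir kernel: the function and helper names are hypothetical, and nested lists stand in for tensors.

```python
import copy
from itertools import product


def shape_of(t):
    """Return the shape of a rectangular nested list."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape


def get_at(t, idx):
    """Read the element at multi-dimensional index idx."""
    for i in idx:
        t = t[i]
    return t


def set_at(t, idx, value):
    """Write value at multi-dimensional index idx."""
    for i in idx[:-1]:
        t = t[i]
    t[idx[-1]] = value


def scatter_elements(data, indices, updates, axis=0):
    """Copy data, then for every position in indices write the matching
    update to the position whose coordinate along `axis` is replaced by
    the index value."""
    rank = len(shape_of(data))
    axis = axis % rank  # accept negative axes
    out = copy.deepcopy(data)
    for idx in product(*(range(n) for n in shape_of(indices))):
        target = list(idx)
        target[axis] = get_at(indices, idx)
        set_at(out, target, get_at(updates, idx))
    return out


data = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
indices = [[1, 0, 2], [0, 2, 1]]
updates = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
result_axis0 = scatter_elements(data, indices, updates, axis=0)
result_axis1 = scatter_elements([[1.0, 2.0, 3.0, 4.0, 5.0]], [[1, 3]], [[1.1, 2.1]], axis=1)
```

The only axis-dependent step is the single coordinate substitution, which is why one implementation can serve every axis.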
Codebase and Documentation Refinements
- Cleanup: Removed test/sg2260 dialect.
- Documentation Update: Updated nntoolchain README and lib.
- Codegen Documentation: Added documentation for codegen.
- Template Format Update: Updated the template format used when importing mlir files.
- Quick Start Docs Modification: Revised the tpu-mlir quick start docs.
Optimizations and Performance Improvements
- Kernel Module Usage: Reverted to using the old kernel module.
- MLIR Conv2D Optimization: Improved the bm1684 mlir conv2d with the 3ic optimization.
- SWINT Quantization: Added swint quant for better performance.
- Opt Parameter Addition: Added an optimization parameter.
- Loop and Fusion Enhancements: Added support for inner-loop interchange, the padOp transform, tensor op collapse, fusion on linalg-on-tensor, etc.