Releases
v1.7
Change Log
New Features
Added support for new operations including flash attention, custom op dynamic compile, and tpulang ops.
Enabled AttnReorder and added support for dynamic indices in ops like onehot, scatterelements, and cumsum.
Added --dump_dataframe option for bmodel_checker and support for transpose with order [1, 2, 3, 0].
Introduced the Watchpoint feature in TDB and added support for mixed-precision networks.
Optimized the DMA efficiency of flash attention and optimized the backend for various models.
Added support for local memory dump in pcie mode and added various quantization features like eva quant, swin quant, and detr quant.
Enhanced multi-core support including support for LayerNorm and GroupNorm in coreParallel, and multi-core data slice in tensorLocation.
Added new patterns for Cswin and Einsum operations.
Improved support for large language models (LLMs) on bm1688.
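For readers unfamiliar with the transpose order [1, 2, 3, 0] mentioned above, the following is a minimal pure-Python sketch of what that permutation does to a 4-D tensor. It assumes the common NumPy/ONNX convention in which output axis k takes its extent from input axis order[k]; the release notes do not spell out the toolchain's exact semantics, and the helper name transpose_flat is hypothetical.

```python
from itertools import product

def transpose_flat(flat, shape, order):
    """Permute the axes of a row-major flat list.

    Hypothetical helper for illustration only; assumes NumPy-style
    semantics where output axis k corresponds to input axis order[k].
    """
    new_shape = [shape[a] for a in order]
    # Row-major strides of the input tensor.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    out = []
    for idx in product(*(range(n) for n in new_shape)):
        # Map each output index back to its source offset in the input.
        src = sum(idx[k] * strides[order[k]] for k in range(len(order)))
        out.append(flat[src])
    return out, new_shape

shape = [2, 3, 4, 5]
flat = list(range(2 * 3 * 4 * 5))
out, new_shape = transpose_flat(flat, shape, [1, 2, 3, 0])
print(new_shape)  # [3, 4, 5, 2]
```

Under this convention the leading axis moves to the innermost position, so a tensor of shape [2, 3, 4, 5] becomes [3, 4, 5, 2].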
Bug Fixes
Fixed various bugs including kernel_module msg_id, SAM-VIT-encoder regression, and attention accuracy problems.
Addressed logical issues in AddToScale pattern and issues in fp_forward.
Resolved bugs in model info core dump, op liveRange in coreParallel, and DevParallel.
Fixed issues in model combine with io alone and bugs in various ops like interp, RotaryPosEmbPattern, and efficient-lite4 permute.
Performance Improvements
Improved the performance of TDB and the bmodel_checker for 1684x pcie.
Optimized facenet and fixed performance issues of 1688 multicore.
Enabled single-core mode optimizations where necessary.
Documentation and Testing
Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
Added test cases for custom tpulang, multi-core with subnets, and custom cpuop.
Fixed various documentation errors and updated the release note.
Other Changes
Added restrictions to tpulang ops and net test cases.
Adjusted descriptions and refined interfaces for better user experience.
Updated backend .so files and addressed sensitive words in the codebase.
Added support for int4 dtype in tpu_profile and ensured tool/scripts work in Python virtual environments.