TPU-MLIR v1.7 Release · sophgo/tpu-mlir

Change Log

Added support for new operations including flash attention, custom op dynamic compile, and tpulang ops.
Enabled AttnReorder and added support for dynamic indices in ops like onehot, scatterelements, and cumsum.
Added --dump_dataframe option for bmodel_checker and support for transpose with order [1, 2, 3, 0].
Introduced Watchpoint feature to TDB and added support for mixed-precision networks.
Optimized the DMA efficiency of flash attention and the backend for various models.
Added support for local memory dump in PCIe mode and added quantization features such as EVA quant, Swin quant, and DETR quant.
Enhanced multi-core support including support for LayerNorm and GroupNorm in coreParallel, and multi-core data slice in tensorLocation.
Added new patterns for CSWin and Einsum operations.
Improved support for large language models (LLMs) on BM1688.
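
As an illustration of the transpose order [1, 2, 3, 0] mentioned in the list above, the following NumPy sketch shows what that permutation does to a 4-D tensor. It assumes the usual ONNX/NumPy convention in which output axis i takes input axis order[i]; the helper name `permute_shape` and the example shape are hypothetical, and the code does not show how TPU-MLIR implements the op.

```python
import numpy as np

def permute_shape(shape, order):
    """Return the output shape of a transpose: output axis i takes input axis order[i]."""
    return tuple(shape[axis] for axis in order)

# Hypothetical 4-D input; the shape is chosen only for illustration.
order = [1, 2, 3, 0]
x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Apply the permutation: the leading axis moves to the last position.
y = np.transpose(x, order)

print(permute_shape(x.shape, order))  # (3, 4, 5, 2)
```

Under this convention, element `x[i, j, k, l]` ends up at `y[j, k, l, i]`, which is the same result as `np.moveaxis(x, 0, -1)`.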

Fixed various bugs including kernel_module msg_id, SAM-VIT-encoder regression, and attention accuracy problems.
Addressed logical issues in AddToScale pattern and issues in fp_forward.
Resolved bugs in model info core dump, op's liveRange in coreParallel, and DevParallel bugs.
Fixed issues in model combine with io alone and bugs in various ops like interp, RotaryPosEmbPattern, and efficient-lite4 permute.

Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
Added test cases for custom tpulang, multi-core with subnets, and custom cpuop.
Fixed various documentation errors and updated the release note.

Added restrictions to tpulang ops and net test cases.
Adjusted descriptions and refined interfaces for better user experience.
Updated backend .so files and addressed sensitive words in the codebase.
Added support for int4 dtype in tpu_profile and ensured tool/scripts work in Python virtual environments.