
TPU-MLIR v1.7 Release

charlesxzb released this 19 Apr 09:58
· 3813 commits to master since this release

Change Log

New Features

  • Added support for new operations, including flash attention, dynamic compilation of custom ops, and TPULang ops.
  • Enabled AttnReorder and added support for dynamic indices in ops such as OneHot, ScatterElements, and CumSum.
  • Added --dump_dataframe option for bmodel_checker and support for transpose with order [1, 2, 3, 0].
  • Introduced a Watchpoint feature in TDB and added support for mixed-precision networks.
  • Optimized the DMA efficiency of flash attention and tuned the backend for various models.
  • Added support for local memory dump in PCIe mode and added quantization features such as EVA quant, Swin quant, and DETR quant.
  • Enhanced multi-core support, including LayerNorm and GroupNorm in coreParallel and multi-core data slicing in tensorLocation.
  • Added new patterns for CSWin and Einsum operations.
  • Improved support for LLMs (large language models) on BM1688.
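
For readers unfamiliar with the permutation notation used above, here is a minimal sketch of what a transpose with order [1, 2, 3, 0] does to a 4-D tensor. It is plain Python for illustration only and says nothing about how TPU-MLIR implements the op; the helper names `permuted_shape` and `transpose`, and the example shape (2, 3, 4, 5), are assumptions made for this sketch.

```python
from itertools import product


def permuted_shape(shape, order):
    """Output shape when output axis k takes its extent from input axis order[k]."""
    return tuple(shape[axis] for axis in order)


def transpose(data, shape, order):
    """Permute a flat row-major buffer and return a new flat row-major list."""
    out_shape = permuted_shape(shape, order)
    # Row-major strides of the input tensor.
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    out = []
    for out_idx in product(*(range(dim) for dim in out_shape)):
        # Output axis k indexes input axis order[k].
        src = sum(out_idx[k] * strides[order[k]] for k in range(len(order)))
        out.append(data[src])
    return out


shape = (2, 3, 4, 5)   # hypothetical example shape, e.g. (N, C, H, W)
order = [1, 2, 3, 0]   # move the leading axis to the end
data = list(range(2 * 3 * 4 * 5))

out_shape = permuted_shape(shape, order)
out = transpose(data, shape, order)
print(out_shape)
```

Because this order only moves the leading axis to the end, the result is the same as viewing the input as a 2-D matrix of shape (N, C*H*W) and transposing it.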

Bug Fixes

  • Fixed bugs in the kernel_module msg_id, a SAM-VIT-encoder regression, and attention accuracy problems.
  • Fixed logic errors in the AddToScale pattern and issues in fp_forward.
  • Resolved a core dump in model info, bugs in op liveRange under coreParallel, and DevParallel bugs.
  • Fixed issues in model combine with io alone, and bugs in ops and patterns such as interp, RotaryPosEmbPattern, and the efficient-lite4 permute.

Performance Improvements

  • Improved the performance of TDB and bmodel_checker for BM1684X in PCIe mode.
  • Optimized facenet and fixed multi-core performance issues on BM1688.
  • Enabled single-core mode optimizations where necessary.

Documentation and Testing

  • Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
  • Added test cases for custom TPULang, multi-core with subnets, and custom cpuop.
  • Fixed various documentation errors and updated the release note.

Other Changes

  • Added restrictions to TPULang ops and net test cases.
  • Adjusted descriptions and refined interfaces for better user experience.
  • Updated backend .so files and addressed sensitive words in the codebase.
  • Added support for int4 dtype in tpu_profile and ensured tool/scripts work in Python virtual environments.