
TPU-MLIR v1.6 release

Released by @luluman on 23 Feb 16:56

Change Log

Bug Fixes

  • Fixed documentation errors and added checks for documentation errors during build.
  • Set the workaround for the ar.copy cycle issue to 0, avoiding potential data overwriting in in-place operations.
  • Addressed a bug in the Caffe DetectionOutput layer and fixed a hang on cv186x.
  • Corrected Mul buffer size alignment issues and made various other buffer size corrections.
  • Fixed issues with attention accuracy, RotaryPosEmbPattern, and op status validation before the matching process.
  • Addressed a series of backend bugs, including daily build errors, performance declines, and incorrect return values.
  • Fixed data_checker issues, an api_conv bug, and a local slice calculation bug.
  • Resolved an incorrect affineMap for the Pooling buffer and fixed a reshape bug for inner products.
  • Corrected Mul and Div dynamic support for local operations and fixed issues with Conv2d buffer size calculations.
  • Addressed various MatMul bugs, including FP8 support issues and quantization inconsistencies.

Features

  • Enabled multicore optimizations and added support for multi-core model tests.
  • Updated libbackend_1688.so and applied various backend updates for better performance and compatibility.
  • Introduced the groupParallel operation and support for dynamic input data generation.
  • Added support for new patterns such as the Permute fuse pattern and the splitQuantizedMLP pattern.
  • Implemented an npz compare visualizer tool and added support for the bm1688 backend.
  • Added a MatMul weight split case and improved Permute performance.
  • Added support for the img2col pattern, an attention interface, and several dialects for SG2260 operations.
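The npz compare visualizer mentioned above operates on `.npz` tensor dumps. The release notes do not describe its interface, but the underlying comparison can be sketched generically with NumPy; all names below (`compare_npz`, the tolerance defaults) are illustrative assumptions, not TPU-MLIR's actual API:

```python
import numpy as np

def compare_npz(ref_path, got_path, rtol=1e-2, atol=1e-2):
    """Compare two .npz tensor dumps key by key.

    Illustrative sketch only -- not the actual npz compare tool.
    Returns a per-tensor report of how many elements fall within
    the given relative/absolute tolerances.
    """
    ref, got = np.load(ref_path), np.load(got_path)
    report = {}
    for key in ref.files:
        if key not in got.files:
            report[key] = "missing in target file"
            continue
        a, b = ref[key], got[key]
        if a.shape != b.shape:
            report[key] = f"shape mismatch {a.shape} vs {b.shape}"
            continue
        close = np.isclose(a, b, rtol=rtol, atol=atol)
        report[key] = f"{close.mean():.1%} elements within tolerance"
    return report
```

A visualizer on top of such a report would typically highlight the tensors with the lowest match ratio so quantization regressions can be traced to a specific layer.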

Documentation Updates

  • Updated release notes and resolved issues with document formatting.
  • Standardized expression terminology and replaced sensitive words in documentation.

Performance Improvements

  • Improved local softmax performance and optimized dataFlow checking in coreMatch.
  • Enhanced performance for ViT-L INT8 4-batch operations and refined multi-core Conv handling.
  • Optimized ViT-B concurrency and addressed performance issues with MaxPool buffer sizes.