This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Intel® Neural Speed v0.2 Release

@kevinintel released this 22 Jan 14:41
· 138 commits to main since this release
abcc0f4

Highlights
Improvements
Examples
Bug Fixes
Validated Configurations

Highlights

  • Support Q4_0, Q5_0, and Q8_0 GGUF models, as well as AWQ
  • Enhance Tensor Parallelism with shared memory across multiple sockets in a single node
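
To illustrate what the new GGUF quantization support entails, here is a minimal, self-contained sketch of Q8_0-style block quantization, assuming the ggml scheme of 32-value blocks, each with one scale and signed 8-bit quants. This is an explanatory model only, not Neural Speed's actual implementation.

```python
# Illustrative sketch of GGUF-style Q8_0 block quantization.
# Assumption: blocks of 32 values, one scale per block, int8 quants (ggml layout).
BLOCK = 32

def quantize_q8_0(values):
    """Quantize a flat list of floats into (scale, int8 quants) blocks."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)          # largest magnitude in the block
        d = amax / 127.0 if amax else 1.0          # per-block scale
        quants = [max(-127, min(127, round(v / d))) for v in chunk]
        blocks.append((d, quants))
    return blocks

def dequantize_q8_0(blocks):
    """Reconstruct approximate floats: each value is quant * scale."""
    out = []
    for d, quants in blocks:
        out.extend(q * d for q in quants)
    return out
```

Q4_0 and Q5_0 follow the same block-plus-scale idea with 4-bit and 5-bit quants, trading a little accuracy for a smaller footprint.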

Improvements

  • Rename Bestla files and their usage (d5c26d4)
  • Update the Python API and reorganize scripts (40663e)
  • Enable AWQ with a Llama2 example (9be307f)
  • Enable clang-tidy (227e89)
  • Support multi-node Tensor Parallelism (TP) (6dbaa0)
  • Support accuracy calculation for GPTQ models (7b124aa)
  • Enable logging via NEURAL_SPEED_VERBOSE (a8d9e7)
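
The Tensor Parallelism items above amount to splitting a weight matrix across ranks, letting each rank compute a partial result, and summing the partials (an all-reduce). The sketch below shows this column-split scheme in plain Python; the function names are illustrative, not Neural Speed APIs, and real ranks would run on separate sockets or nodes rather than a loop.

```python
# Conceptual sketch of tensor-parallel matrix-vector multiply.
# Each "rank" owns a slice of the input columns of W and the matching
# slice of x; partial outputs are summed, mimicking an all-reduce.

def matvec(W, x):
    """Reference single-rank matvec: y = W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tensor_parallel_matvec(W, x, ranks=2):
    n_in = len(x)
    step = n_in // ranks
    partials = []
    for r in range(ranks):
        lo = r * step
        hi = (r + 1) * step if r < ranks - 1 else n_in
        # Each rank multiplies only its column slice against its input slice.
        partials.append([sum(row[j] * x[j] for j in range(lo, hi)) for row in W])
    # "All-reduce": sum the partial outputs elementwise.
    return [sum(p[i] for p in partials) for i in range(len(W))]
```

With shared memory between sockets on one node, the all-reduce step can read the partials directly instead of sending them over a network, which is the optimization the highlight refers to.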

Examples

  • Add a Magicoder example (749caca)
  • Enable the Whisper-large example (24b270)
  • Add a Dockerfile and README (f57d4e1)
  • Support multi-batch ChatGLM-V1 inference (c9fb9d)

Bug Fixes

  • Fix avx512-s8-dequant and an asymmetric-quantization bug (fad80b14)
  • Fix the warmup prompt length and add ns_log_level control (070b6b)
  • Fix convert: remove hardcoded AWQ settings (7729bb)
  • Fix the ChatGLM convert issue (7671467)
  • Fix a Bestla Windows compile issue (760e5f)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04