This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Intel® Neural Speed v0.2 Release
Highlights
- Support Q4_0, Q5_0, and Q8_0 GGUF models and AWQ
- Enhance Tensor Parallelism with shared memory across multiple sockets in a single node
Improvements
- Rename Bestla files and update their usage (d5c26d4)
- Update the Python API and reorganize scripts (40663e)
- Enable AWQ with a Llama2 example (9be307f)
- Enable clang-tidy (227e89)
- Support multi-node Tensor Parallelism (TP) (6dbaa0)
- Support accuracy calculation for GPTQ models (7b124aa)
- Enable logging via NEURAL_SPEED_VERBOSE (a8d9e7)
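As a hedged illustration of the new logging control: the notes above only name the NEURAL_SPEED_VERBOSE environment variable, so the level value shown below is an assumption for the sketch, not documented behavior.

```shell
# Sketch: enable runtime logging through the NEURAL_SPEED_VERBOSE
# environment variable introduced in this release. The integer level "1"
# is an assumption; consult the repository README for the actual values.
export NEURAL_SPEED_VERBOSE=1
```

An inference run launched from the same shell would then inherit the variable; it can also be set inline for a single invocation (e.g. `NEURAL_SPEED_VERBOSE=1 python <your_script>.py`, with the script name hypothetical here).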
Examples
- Add Magicoder example (749caca)
- Enable Whisper large example (24b270)
- Add Dockerfile and README (f57d4e1)
- Support multi-batch ChatGLM-V1 inference (c9fb9d)
Bug Fixing
- Fix avx512-s8-dequant and asymmetric-quantization-related bugs (fad80b14)
- Fix warmup prompt length and add ns_log_level control (070b6b)
- Fix conversion: remove hardcoded AWQ handling (7729bb)
- Fix the ChatGLM convert issue (7671467)
- Fix Bestla Windows compile issue (760e5f)
Validated Configurations
- Python 3.10
- Ubuntu 22.04