This repository has been archived by the owner on Aug 30, 2024. It is now read-only.
Intel® Neural Speed v0.2 Release
Highlights
- Support Q4_0, Q5_0, and Q8_0 GGUF models and AWQ
- Enhance Tensor Parallelism with shared memory across multiple sockets in a single node
Improvements
- Rename Bestla files and update their usage (d5c26d4)
- Update the Python API and reorganize scripts (40663e)
- Enable AWQ with a Llama2 example (9be307f)
- Enable clang-tidy (227e89)
- Support multi-node Tensor Parallelism (TP) (6dbaa0)
- Support accuracy calculation for GPTQ models (7b124aa)
- Enable logging via NEURAL_SPEED_VERBOSE (a8d9e7)
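As a hedged illustration of the new logging control: the notes above only name the NEURAL_SPEED_VERBOSE environment variable, so the level value shown below is an assumption for the sketch, not documented behavior.

```shell
# Sketch: enable runtime logging through the NEURAL_SPEED_VERBOSE
# environment variable introduced in this release. The integer level "1"
# is an assumption; consult the repository README for the actual values.
export NEURAL_SPEED_VERBOSE=1
```

An inference run launched from the same shell would then inherit the variable; it can also be set inline for a single invocation (e.g. `NEURAL_SPEED_VERBOSE=1 python <your_script>.py`, with the script name hypothetical here).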
Examples
- Add Magicoder example (749caca)
- Enable Whisper large example (24b270)
- Add Dockerfile and README (f57d4e1)
- Support multi-batch ChatGLM-V1 inference (c9fb9d)
Bug Fixing
- Fix avx512-s8-dequant and asymmetric-quantization-related bugs (fad80b14)
- Fix warmup prompt length and add ns_log_level control (070b6b)
- Fix conversion: remove hardcoded AWQ handling (7729bb)
- Fix the ChatGLM convert issue (7671467)
- Fix Bestla Windows compile issue (760e5f)
Validated Configurations
- Python 3.10
- Ubuntu 22.04