Intel® Neural Speed v0.3 Release
Highlights
- Contributed GPT-J inference to the MLPerf v4.0 submission (mlperf commits)
- Enabled 3-bit low-precision inference (ee40f28); see the sketch after this list
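The 3-bit path plugs into the same weight-only quantization knobs as the existing 4-bit path. Below is a minimal sketch, assuming the `Model.init` API shown in the project README and assuming the new option is spelled `weight_dtype="int3"` by analogy with `"int4"`; verify the exact flag in the docs.

```python
# Minimal sketch of 3-bit weight-only inference with Neural Speed.
# Assumption: weight_dtype="int3" follows the existing "int4"/"int8"
# naming accepted by Model.init; check the documentation for the exact flag.
from transformers import AutoTokenizer
from neural_speed import Model

model_name = "Intel/neural-chat-7b-v3-1"  # illustrative HF model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

model = Model()
# Quantize weights to 3 bits while computing in int8.
model.init(model_name, weight_dtype="int3", compute_dtype="int8")
outputs = model.generate(inputs, max_new_tokens=32)
```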
Improvements
- Optimize layer normalization (98ffee45)
- Update the Qwen Python API (51088a)
- Load processed models automatically (662553)
- Support continuous batching in offline and server modes (66cb9f5)
- Support loading models directly from Hugging Face (bb80273); see the sketch after this list
- Support AutoRound quantization (e2d3652)
- Enable OpenMP in BesTLA (3afae427)
- Enable logging via the NEURAL_SPEED_VERBOSE environment variable (a8d9e7); also shown in the sketch after this list
- Add YaRN RoPE scaling data structure (8c846d6)
- Improve Windows support (464239)
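Two of the items above are user-visible from Python: direct Hugging Face loading and NEURAL_SPEED_VERBOSE logging. Below is a minimal sketch assuming the `Model` API shown in the project README; the model id is illustrative, and treating any non-zero NEURAL_SPEED_VERBOSE value as "logging on" is an assumption, so check the docs for the exact level semantics.

```python
# Minimal sketch: load a model straight from the Hugging Face Hub and
# turn on Neural Speed's verbose logging via the environment variable.
import os
os.environ["NEURAL_SPEED_VERBOSE"] = "1"  # assumption: non-zero enables timing logs

from transformers import AutoTokenizer
from neural_speed import Model

model_name = "Intel/neural-chat-7b-v3-1"  # illustrative HF model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

model = Model()
# init() pulls the HF checkpoint, converts it, and quantizes the weights.
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, max_new_tokens=32)
```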
Examples
- Enable Qwen 1.8B (ea4b713)
- Enable Phi-2, Phi-1.5, and Phi-1 (c212d8)
- Support 3-bit & 4-bit GPTQ for GPT-J 6B (4c9070)
- Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
- Support Qwen GGUF inference (cd67b92); see the sketch after this list
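Qwen GGUF inference presumably follows the same pattern the README shows for Llama GGUF files: pass the GGUF file name alongside the repo id to the Transformers-like wrapper from intel_extension_for_transformers. A minimal sketch; the repo id and file name below are hypothetical placeholders, not verified artifacts.

```python
# Minimal sketch of GGUF inference through the Transformers-like API.
# The GGUF repo id and file name are hypothetical placeholders.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "<qwen-gguf-repo-id>"     # HF repo hosting the GGUF file
model_file = "<qwen-model>.Q4_0.gguf"  # GGUF weights inside that repo
tokenizer_name = "Qwen/Qwen-7B-Chat"   # tokenizer of the original model

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```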
Bug Fixing
- Fix a performance problem introduced by the log level (6833b2f, 6f85518f)
- Fix issues in the straightforward Python API (4c082b7)
- Fix a blocker on Windows platforms (4adc15)
- Fix the Whisper Python API (c97dbe)
- Fix Qwen loading & Mistral GPTQ conversion (d47984c)
- Fix clang-tidy issues (ad54a1f)
- Fix Mistral online loading issues (0470b1f)
- Handle models that require an HF access token (33ffaf07)
- Fix a GGUF conversion issue (5293ffa5)
- Fix GPTQ & AWQ conversion issues (150e752)
Validated Configurations
- Python 3.10
- Ubuntu 22.04