Intel® Neural Speed v0.3 Release
Highlights
- Contributed GPT-J inference to the MLPerf v4.0 submission (mlperf commits)
- Enabled 3-bit low-precision inference (ee40f28); see the sketch after this list
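The 3-bit path plugs into the same weight-only quantization knobs as the existing 4-bit path. Below is a minimal sketch, assuming the `Model.init` API shown in the project README and assuming the new option is spelled `weight_dtype="int3"` by analogy with `"int4"`; verify the exact flag in the docs.

```python
# Minimal sketch of 3-bit weight-only inference with Neural Speed.
# Assumption: weight_dtype="int3" follows the existing "int4"/"int8"
# naming accepted by Model.init; check the documentation for the exact flag.
from transformers import AutoTokenizer
from neural_speed import Model

model_name = "Intel/neural-chat-7b-v3-1"  # illustrative HF model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

model = Model()
# Quantize weights to 3 bits while computing in int8.
model.init(model_name, weight_dtype="int3", compute_dtype="int8")
outputs = model.generate(inputs, max_new_tokens=32)
```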
Improvements
- Optimize layer normalization (98ffee45)
- Update the Qwen Python API (51088a)
- Load processed models automatically (662553)
- Support continuous batching in offline and server modes (66cb9f5)
- Support loading models directly from Hugging Face (bb80273); see the sketch after this list
- Support AutoRound quantization (e2d3652)
- Enable OpenMP in BesTLA (3afae427)
- Enable logging via the NEURAL_SPEED_VERBOSE environment variable (a8d9e7); also shown in the sketch after this list
- Add YaRN RoPE scaling data structure (8c846d6)
- Improve Windows support (464239)
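Two of the items above are user-visible from Python: direct Hugging Face loading and NEURAL_SPEED_VERBOSE logging. Below is a minimal sketch assuming the `Model` API shown in the project README; the model id is illustrative, and treating any non-zero NEURAL_SPEED_VERBOSE value as "logging on" is an assumption, so check the docs for the exact level semantics.

```python
# Minimal sketch: load a model straight from the Hugging Face Hub and
# turn on Neural Speed's verbose logging via the environment variable.
import os
os.environ["NEURAL_SPEED_VERBOSE"] = "1"  # assumption: non-zero enables timing logs

from transformers import AutoTokenizer
from neural_speed import Model

model_name = "Intel/neural-chat-7b-v3-1"  # illustrative HF model id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids

model = Model()
# init() pulls the HF checkpoint, converts it, and quantizes the weights.
model.init(model_name, weight_dtype="int4", compute_dtype="int8")
outputs = model.generate(inputs, max_new_tokens=32)
```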
Examples
- Enable Qwen 1.8B (ea4b713)
- Enable Phi-2, Phi-1.5, and Phi-1 (c212d8)
- Support 3-bit & 4-bit GPTQ for GPT-J 6B (4c9070)
- Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
- Support Qwen GGUF inference (cd67b92); see the sketch after this list
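Qwen GGUF inference presumably follows the same pattern the README shows for Llama GGUF files: pass the GGUF file name alongside the repo id to the Transformers-like wrapper from intel_extension_for_transformers. A minimal sketch; the repo id and file name below are hypothetical placeholders, not verified artifacts.

```python
# Minimal sketch of GGUF inference through the Transformers-like API.
# The GGUF repo id and file name are hypothetical placeholders.
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "<qwen-gguf-repo-id>"     # HF repo hosting the GGUF file
model_file = "<qwen-model>.Q4_0.gguf"  # GGUF weights inside that repo
tokenizer_name = "Qwen/Qwen-7B-Chat"   # tokenizer of the original model

tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_file)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```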
Bug Fixing
- Fix a performance problem introduced by the log level (6833b2f, 6f85518f)
- Fix issues in the straightforward Python API (4c082b7)
- Fix a blocker on Windows platforms (4adc15)
- Fix the Whisper Python API (c97dbe)
- Fix Qwen loading & Mistral GPTQ conversion (d47984c)
- Fix clang-tidy issues (ad54a1f)
- Fix Mistral online loading issues (0470b1f)
- Handle models that require an HF access token (33ffaf07)
- Fix a GGUF conversion issue (5293ffa5)
- Fix GPTQ & AWQ conversion issues (150e752)
Validated Configurations
- Python 3.10
- Ubuntu 22.04