This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Intel® Neural Speed v0.3 Release

Released by @kevinintel on 23 Feb 12:57 · 107 commits to main since this release · 150e752

  • Highlights
  • Improvements
  • Examples
  • Bug Fixing
  • Validated Configurations

Highlights

  • Contributed GPT-J inference to MLPerf v4.0 submission (mlperf commits)
  • Enabled 3-bit low precision inference (ee40f28)

Improvements

  • Optimize layer normalization (98ffee45)
  • Update the Qwen Python API (51088a)
  • Load processed models automatically (662553)
  • Support continuous batching in offline and server modes (66cb9f5)
  • Support loading models from HF directly (bb80273)
  • Support AutoRound (e2d3652)
  • Enable OMP in BesTLA (3afae427)
  • Enable logging via the NEURAL_SPEED_VERBOSE environment variable (a8d9e7)
  • Add YaRN rope scaling data structure (8c846d6)
  • Improvements targeting Windows (464239)
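Several of the items above (direct Hugging Face loading, low-bit weight-only inference, and NEURAL_SPEED_VERBOSE logging) meet in the high-level Python API. A minimal sketch follows, assuming the `neural_speed.Model` interface with `init(model_name, weight_dtype=..., compute_dtype=...)` as shown in the project README; exact parameter names may differ across versions, so treat this as illustrative rather than canonical:

```python
import os

def generate_int4(model_name: str, prompt: str, max_new_tokens: int = 32):
    """Sketch: load a Hugging Face model directly through Neural Speed with
    int4 weight-only quantization. Imports are deferred so the file can be
    read or imported without neural_speed / transformers installed."""
    # Turn on the verbose logging added in this release (a8d9e7).
    os.environ["NEURAL_SPEED_VERBOSE"] = "1"

    from transformers import AutoTokenizer  # assumed dependency
    from neural_speed import Model          # Neural Speed high-level API

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt").input_ids

    model = Model()
    # weight_dtype="int4" follows the README example; passing "int3" would
    # exercise the new 3-bit path from this release (ee40f28), assuming the
    # installed version accepts it.
    model.init(model_name, weight_dtype="int4", compute_dtype="int8")
    return model.generate(inputs, max_new_tokens=max_new_tokens)
```

The same call shape is what the "load models from HF directly" item (bb80273) refers to: no separate convert/quantize step is needed before `init`.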

Examples

  • Enable Qwen 1.8B (ea4b713)
  • Enable Phi-2, Phi-1.5, and Phi-1 (c212d8)
  • Support 3-bit & 4-bit GPTQ for GPT-J 6B (4c9070)
  • Support Solar 10.7B with GPTQ (26c68c7, 90f5cbd)
  • Support Qwen GGUF inference (cd67b92)
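The GGUF item above can be exercised through the same Python API. This is a hedged sketch: loading a `.gguf` file via `Model.init_from_bin` is an assumption extrapolated from the README's binary-loading example, and both the function name and its acceptance of GGUF files should be verified against the installed version:

```python
def generate_from_gguf(model_name: str, gguf_path: str, prompt: str):
    """Sketch: run inference on a pre-converted GGUF file, e.g. a Qwen
    GGUF per this release (cd67b92). Imports are deferred so the file
    can be read without the dependencies installed."""
    from transformers import AutoTokenizer  # assumed dependency
    from neural_speed import Model

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt").input_ids

    model = Model()
    # Hypothetical GGUF load path; the README documents init_from_bin for
    # Neural Speed's own binary format, and GGUF support may route elsewhere.
    model.init_from_bin(model_name, gguf_path)
    return model.generate(inputs, max_new_tokens=32)
```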

Bug Fixing

  • Fix a performance regression introduced by the log level (6833b2f, 6f85518f)
  • Fix issues in the straightforward API (4c082b7)
  • Fix a blocker on Windows platforms (4adc15)
  • Fix the Whisper Python API (c97dbe)
  • Fix Qwen loading & Mistral GPTQ convert (d47984c)
  • Fix clang-tidy issues (ad54a1f)
  • Fix Mistral online loading issues (0470b1f)
  • Handle models that require an HF access token (33ffaf07)
  • Fix a GGUF conversion issue (5293ffa5)
  • Fix GPTQ & AWQ conversion issues (150e752)

Validated Configurations

  • Python 3.10
  • Ubuntu 22.04