
Merge branch 'master' into tj/plugin/template/test/compare-tensor-irdft
t-jankowski authored Nov 27, 2024
2 parents f6b83db + 4c4fb48 commit 225fdff
Showing 263 changed files with 8,797 additions and 5,871 deletions.
6 changes: 3 additions & 3 deletions .github/actions/cache/package-lock.json

Some generated files are not rendered by default.

6 changes: 3 additions & 3 deletions docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -32,7 +32,7 @@ What's new

* New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
* LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
  Mini-Instruct.
* Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
* Preview: support for Flax, a high-performance Python neural network library based on JAX.
Its modular design allows for easy customization and accelerated inference on GPUs.
@@ -87,8 +87,8 @@ Common
* A new constant constructor has been added, enabling constants to be created from a data pointer
  as shared memory. Additionally, it can take ownership of a shared, or other, object, avoiding
  a two-step process to wrap memory into ``ov::Tensor`` (see the sketch after this list).
* Files are now read via the async ReadFile API, reducing the bottleneck for LLM model load
times on GPU.
* Asynchronous file reading with mmap library has been implemented, reducing loading times for
model files, especially for LLMs.
* CPU implementation of SliceScatter operator is now available, used for models such as Gemma,
supporting increased LLM performance.
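
To make the constant-constructor bullet above concrete, here is a minimal Python sketch of the
two-step pattern the note says can now be avoided: wrap existing memory into an ``ov::Tensor``
without copying, then build a constant from that tensor. The ``Constant(tensor)`` overload is an
assumption about the Python bindings, and the new single-step C++ constructor itself is not
reproduced here.

.. code-block:: python

   import numpy as np
   import openvino as ov
   from openvino.runtime import op

   # A buffer already owned by the application (e.g., pre-loaded weights).
   weights = np.arange(12, dtype=np.float32).reshape(3, 4)

   # Step 1: wrap the existing memory in an ov.Tensor without copying.
   tensor = ov.Tensor(weights, shared_memory=True)

   # Step 2: build a constant node from that tensor.
   # Assumed overload -- exact signatures may differ between releases; the release
   # note describes a new constructor that collapses these two steps into one.
   constant = op.Constant(tensor)

   print(constant.get_shape(), constant.get_element_type())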

@@ -37,7 +37,7 @@ CPU
* Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* macOS 12.6 and above, 64-bit and ARM64
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit
* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit
* openSUSE Tumbleweed, 64-bit and ARM64
* Ubuntu 20.04 ARM64

@@ -65,7 +65,7 @@ GPU
* Ubuntu 22.04 long-term support (LTS), 64-bit
* Ubuntu 20.04 long-term support (LTS), 64-bit
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit
* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit

.. tab-item:: Additional considerations

2 changes: 1 addition & 1 deletion docs/articles_en/get-started/install-openvino.rst
@@ -21,7 +21,7 @@ Install OpenVINO™ 2024.5

<script type="module" crossorigin src="../_static/selector-tool/assets/index-Codcw3jz.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<iframe id="selector" src="../_static/selector-tool/selector-451bede.html" style="width: 100%; border: none" title="Download Intel® Distribution of OpenVINO™ Toolkit"></iframe>
<iframe id="selector" src="../_static/selector-tool/selector-2a63478.html" style="width: 100%; border: none" title="Download Intel® Distribution of OpenVINO™ Toolkit"></iframe>

OpenVINO 2024.5, described here, is not a Long-Term-Support version!
All currently supported versions are:
@@ -20,21 +20,22 @@ Install required dependencies:
pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
NOTE that for systems based on Intel® Core Ultra Processors Series 2 and 16 GB of RAM,
prompts longer then 1024 characters will not work with a model of 7B or more parameters,
Note that for systems based on Intel® Core Ultra Processors Series 2, more than 16GB of RAM
may be required to run prompts over 1024 tokens on models exceeding 7B parameters,
such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.

Export an LLM model via Hugging Face Optimum-Intel
##################################################

Since **symmetrically-quantized 4-bit (INT4) models are preferred for inference on NPU**, make
sure to export the model with the proper conversion and optimization settings.

| You may export LLMs via Optimum-Intel, using one of two compression methods:
| **group quantization** - for both smaller and larger models,
| **channel-wise quantization** - remarkably effective but for models exceeding 1 billion parameters.
You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or
``-1``, respectively. See the following examples, and the Python API sketch after them:

.. tab-set::

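As a complement to the CLI examples referenced above (truncated in this diff), here is a hedged
sketch of the same two compression choices through the Optimum-Intel Python API. The model ID and
output directory are placeholders, and the ``OVWeightQuantizationConfig`` arguments mirror the
``--sym`` / ``--group-size`` CLI options as I understand them.

.. code-block:: python

   from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

   model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model ID

   # Group quantization: symmetric INT4 with group size 128 (smaller and larger models).
   group_cfg = OVWeightQuantizationConfig(bits=4, sym=True, group_size=128)

   # Channel-wise quantization: group_size=-1, effective for models above ~1B parameters.
   channel_cfg = OVWeightQuantizationConfig(bits=4, sym=True, group_size=-1)

   # Export to OpenVINO IR and compress weights in one step; pick one of the two configs.
   model = OVModelForCausalLM.from_pretrained(
       model_id,
       export=True,
       quantization_config=channel_cfg,
   )
   model.save_pretrained("llama-2-7b-int4-cw")  # placeholder output directory

Per the guidance above, ``group_size=128`` is the safer general-purpose choice, while
``group_size=-1`` (channel-wise) is aimed at models above roughly 1 billion parameters.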
Binary file not shown.
