From f7393be074a92954ec6767364018817cbee92ced Mon Sep 17 00:00:00 2001
From: Nikita Malinin
Date: Mon, 18 Nov 2024 16:53:42 +0100
Subject: [PATCH] Update ReleaseNotes.md

---
 ReleaseNotes.md | 64 +++++++++++++++++++------------------------------
 1 file changed, 24 insertions(+), 40 deletions(-)

diff --git a/ReleaseNotes.md b/ReleaseNotes.md
index da00a15cf7a..ea26992649f 100644
--- a/ReleaseNotes.md
+++ b/ReleaseNotes.md
@@ -4,32 +4,32 @@ Post-training Quantization:
-- Breaking changes:
-  - ...
 - General:
-  - Switching from setup.py to pyproject.toml for project configuration.
+  - The main installation method was changed from `setup.py` to the `pyproject.toml` approach.
 - Features:
-  - (OpenVINO) Extended support of data-free and data-aware weights compression methods ([nncf.compress_weights()](docs/usage/post_training_compression/weights_compression/Usage.md#user-guide) API) with NF4 per-channel quantization, which makes compressed LLMs more accurate and faster on NPU.
   - Introduced `backup_mode` optional parameter in `nncf.compress_weights()` to specify the data type for embeddings, convolutions and last linear layers during 4-bit weights compression. Available options are INT8_ASYM by default, INT8_SYM, and NONE which retains the original floating-point precision of the model weights.
-  - (Experimental: Torch FX) Added experimental support for quantization and weights compression of [Torch FX](https://pytorch.org/docs/stable/fx.html) models. The compressed models can be directly executed via [torch.compile(compressed_model, backend="openvino")](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html), see [int8 quantization example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18). The list of supported features:
-    - INT8 quantization with SmoothQuant, MinMax, Fast Bias Correction and Bias Correction algorithms via nncf.quantize().
-    - Data free INT8 and INT4 weights compression with nncf.compress_weights().
-    - Data free mixed-precision data weights compression with nncf.compress_weights(). "ratio" parameter is specified the percent of the rest layers compressed to 4-bit, e.g. ratio=0.9 means 90% of layers compressed to the corresponding 4-bit data type and the rest to a `backup_mode`.
-  - (OpenVINO) Introduced a new option to cache and reuse statistics for the Weight Compression algorithm, reducing the time required to find optimal compression configurations. The [TinyLlama example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams) has been updated to showcase this feature.
   - Added the `quantizer_propagation_rule` parameter, providing fine-grained control over quantizer propagation. This advanced option is designed to improve accuracy for models where quantizers with different granularity could be merged to per-tensor, potentially affecting model accuracy.
-  - (Experimental: Torch) Added experimental model tracing and execution pre-post hooks based on TorchFunctionMode.
-  - ...
+  - Introduced `nncf.data.generate_text_data` API method that utilizes an LLM to generate data for further data-aware optimization. See the [example](examples/llm_compression/openvino/tiny_llama_synthetic_data/) for details.
+  - (OpenVINO) Extended support of data-free and data-aware weight compression methods for `nncf.compress_weights()` with NF4 per-channel quantization, which makes compressed LLMs more accurate and faster on NPU.
+  - (OpenVINO) Introduced a new option to cache and reuse statistics for the Weight Compression algorithm, reducing the time required to find optimal compression configurations. See the [TinyLlama example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/llm_compression/openvino/tiny_llama_find_hyperparams) for details.
+  - (TorchFX, Experimental) Added support for quantization and weight compression of [Torch FX](https://pytorch.org/docs/stable/fx.html) models. The compressed models can be directly executed via `torch.compile(compressed_model, backend="openvino")` (see details [here](https://docs.openvino.ai/2024/openvino-workflow/torch-compile.html)). Also, the [INT8 quantization example](https://github.com/openvinotoolkit/nncf/tree/develop/examples/post_training_quantization/torch_fx/resnet18) was added. The list of supported features:
+    - INT8 quantization with SmoothQuant, MinMax, FastBiasCorrection and BiasCorrection algorithms via `nncf.quantize()`.
+    - Data-free INT8, INT4 and mixed-precision weights compression with `nncf.compress_weights()`.
+  - (Torch2, Experimental) Added model tracing and execution pre-post hooks based on TorchFunctionMode.
 - Fixes:
   - Resolved an issue with redundant quantizer insertion before elementwise operations, reducing noise introduced by quantization.
+  - Fixed a type mismatch issue in `nncf.quantize_with_accuracy_control()`.
+  - Fixed the BiasCorrection algorithm for specific branching cases.
   - (OpenVINO) Fixed GPTQ weight compression method for Stable Diffusion models.
-  - (Torch, ONNX) Scaled dot product attention pattern quantization setup is aligned with OpenVINO.
-  - ...
+  - (OpenVINO) Fixed an issue with variational statistics processing for `nncf.compress_weights()`.
+  - (PyTorch, ONNX) Scaled dot product attention pattern quantization setup is aligned with OpenVINO.
 - Improvements:
-  - The `ultralytics` version has been updated to 8.3.22.
-  - Reduction in peak memory by 30-50% for data-aware weight compression with AWQ, SE, LoRA and mixed precision algorithms.
-  - Reduction in compression time by 10-20% for weight compression with AWQ algorithm.
+  - Reduction in peak memory by 30-50% for data-aware `nncf.compress_weights()` with AWQ, ScaleEstimation, LoRA and mixed-precision algorithms.
+  - Reduction in compression time by 10-20% for `nncf.compress_weights()` with the AWQ algorithm.
+  - Aligned the behavior of ignored subgraphs across different `networkx` versions.
+  - Extended ignored patterns with the RoPE block for the `nncf.ModelType.TRANSFORMER` scheme.
+  - (OpenVINO) Extended the ignored scope for the `nncf.ModelType.TRANSFORMER` scheme with the GroupNorm metatype.
   - (ONNX) SE-block ignored pattern variant for `torchvision` mobilenet_v3 has been extended.
-  - ...
 - Tutorials:
   - [Post-Training Optimization of Llama-3.2-11B-Vision Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/mllama-3.2/mllama-3.2.ipynb)
   - [Post-Training Optimization of YOLOv11 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov11-optimization/yolov11-object-detection.ipynb)
@@ -38,36 +38,20 @@ Post-training Quantization:
   - [Post-Training Optimization of LLM ReAct Agent Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-agent-react/llm-agent-react.ipynb)
   - [Post-Training Optimization of CatVTON Model](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/catvton/catvton.ipynb)
 - Known issues:
-  - ...
-
-Compression-aware training:
-
-- Breaking changes:
-  - ...
-- General:
-  - ...
-- Features:
-  - ...
-- Fixes:
-  - ...
-- Improvements:
-  - ...
-- Tutorials:
-  - ...
-- Known issues:
-  - ...
+  - (ONNX) The `nncf.quantize()` method can generate inaccurate INT8 results for models with the BiasCorrection algorithm.

 Deprecations/Removals:

-- nncf.torch.create_compressed_model() function has been deprecated for PyTorch backend.
-- Removed support for python 3.8.
-- The `tensorflow_addons` has been removed from the dependencies.
-- ...
+- (PyTorch) The `nncf.torch.create_compressed_model()` function has been deprecated.
+- Removed support for Python 3.8.
+- Removed `tensorflow_addons` from the dependencies.
 Requirements:

-- ONNX, ONNXRuntime versions were updated
+- Updated ONNX (1.17.0) and ONNXRuntime (1.19.2) versions.
 - Updated PyTorch (2.5.1) and Torchvision (0.20.1) versions.
+- Updated NumPy version (<2.2.0).
+- Updated Ultralytics (8.3.22).

 ## New in Release 2.13.0
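The `ratio` semantics described in the mixed-precision weight-compression notes above can be sketched in plain Python. This is an illustrative sketch, not NNCF code: `split_layers_by_ratio` is a hypothetical helper showing only the arithmetic — `ratio=0.9` sends roughly 90% of the compressible layers to the 4-bit data type, while the remainder stays in the `backup_mode` precision (INT8_ASYM by default).

```python
# Hypothetical illustration of nncf.compress_weights() `ratio` semantics.
# Not NNCF internals: real mixed-precision assignment is sensitivity-based,
# not a simple prefix split.

def split_layers_by_ratio(layers, ratio):
    """Return (four_bit_layers, backup_layers) for a given ratio in [0, 1]."""
    n_four_bit = int(len(layers) * ratio)
    # First n_four_bit layers go to the 4-bit data type,
    # the rest keep the backup precision (e.g. INT8_ASYM).
    return layers[:n_four_bit], layers[n_four_bit:]

layers = [f"linear_{i}" for i in range(10)]
four_bit, backup = split_layers_by_ratio(layers, ratio=0.9)
print(len(four_bit), len(backup))  # 9 1
```

With `ratio=0.9` and ten layers, nine land in the 4-bit group and one remains in the backup precision, matching the "90% of layers" wording in the notes.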
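The `backup_mode` options named above (INT8_ASYM by default) refer to asymmetric 8-bit quantization. A minimal sketch of that scheme, assuming nothing about NNCF's actual implementation (function names here are hypothetical, and real code operates on tensors with per-channel scales rather than Python lists):

```python
# Hypothetical sketch of asymmetric INT8 (INT8_ASYM-style) quantization:
# map floats to [0, 255] with a scale and zero point, so dequantization
# approximately recovers the original values.

def quantize_int8_asym(weights):
    """Quantize a list of floats to unsigned 8-bit values."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant weight tensor
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8_asym(q, scale, zero_point):
    """Reconstruct approximate float weights from quantized values."""
    return [(v - zero_point) * scale for v in q]

w = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_int8_asym(w)
w_hat = dequantize_int8_asym(q, scale, zp)
# Round-trip error stays within one quantization step.
assert all(abs(a - b) <= scale for a, b in zip(w, w_hat))
```

The asymmetric zero point is what lets a skewed range (e.g. mostly-positive weights) use all 256 levels; the symmetric INT8_SYM variant instead fixes the zero point at the middle of the range.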