Constants computed on-the-fly when model is serialized to IR #27705

slyalin · 2024-11-22T15:26:42Z

A way to avoid materializing big constants by computing them on-the-fly during IR serialization. It is going to be used in NNCF when compressing big models.

…ization to IR. This avoids materializing big computed constants in memory for the entire model at once. Instead, such Constants are created one by one during serialization to IR. It consists of two parts:

- A modification of the Serialize pass that looks for the `postponed_constant` runtime attribute on a node and, on the fly, replaces such a node with a Constant node using the `constant_fold` method.
- A Python helper, `make_postponed_constant`, that wraps an arbitrary tensor-creating callback in a custom Python operation carrying the `postponed_constant` runtime attribute. This operation is constant-folded into a Constant instance whose data is obtained by calling that callback.
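The idea behind the two parts above can be illustrated with a small standalone sketch. It does not use the OpenVINO API: `Node`, `Tensor`, and `serialize_constants` are hypothetical stand-ins, and only the attribute name `postponed_constant` comes from this PR. The point it shows is that a postponed constant exists in memory only while it is being written out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a tensor; not ov::Tensor.
using Tensor = std::vector<float>;

// Hypothetical simplified node: an ordinary constant keeps its data,
// a postponed one keeps only a callback plus a runtime attribute flag.
struct Node {
    std::map<std::string, bool> rt_info;
    Tensor data;                    // used by ordinary constants
    std::function<Tensor()> maker;  // used by postponed constants
};

// Appends every constant to `out` and returns the largest number of elements
// that had to be materialized temporarily for a single postponed constant.
std::size_t serialize_constants(const std::vector<Node>& nodes, std::vector<float>& out) {
    std::size_t peak = 0;
    for (const auto& node : nodes) {
        auto it = node.rt_info.find("postponed_constant");
        if (it != node.rt_info.end() && it->second) {
            assert(node.maker && "a postponed constant needs a maker callback");
            Tensor folded = node.maker();  // materialized only at this point
            peak = std::max(peak, folded.size());
            out.insert(out.end(), folded.begin(), folded.end());
        } else {                           // `folded` above is released per iteration
            out.insert(out.end(), node.data.begin(), node.data.end());
        }
    }
    return peak;
}
```

In this sketch the peak temporary size is that of the largest single postponed constant rather than the sum of all of them, which is the memory behaviour the description aims for.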

slyalin · 2024-11-22T15:30:49Z

src/core/src/pass/serialize.cpp

+            // TODO: As it was mentioned above that there was a chance to have a collision easily, should we limit the number of iterations in the following loop?
+            // FIXME: Loop over items that have matching hash keys only, not entire range [found, m_hash_to_file_positions.end()).
+            // FIXME: According to https://en.cppreference.com/w/cpp/container/multimap/find, std::multimap::find returns any item with the matching key, not necessarily
+            //        the first one. But the loop below checks only items starting from the `found` item, even if there may be other items with a matching hash before that position.
+            //        So we are not checking all possible matches.


@pavel-esir, please confirm that we have an issue here.
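One possible way to address both FIXME notes is `std::multimap::equal_range`, which returns exactly the range of items with the given key. The sketch below is a minimal standalone illustration, not the serializer code: `BlobCache`, its members, and the use of strings as blobs are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the blob deduplication cache in the serializer.
struct BlobCache {
    std::multimap<std::uint64_t, std::size_t> hash_to_pos;  // hash -> index in `blobs`
    std::vector<std::string> blobs;                         // models blobs already written

    void add(std::uint64_t hash, const std::string& blob) {
        blobs.push_back(blob);
        hash_to_pos.emplace(hash, blobs.size() - 1);
    }

    // Returns the index of an identical blob, or -1 if none was written yet.
    // equal_range visits all items with this hash and nothing else, so no
    // match is skipped and items with other keys are never compared.
    long find_existing(std::uint64_t hash, const std::string& blob) const {
        auto range = hash_to_pos.equal_range(hash);
        for (auto it = range.first; it != range.second; ++it) {
            assert(it->second < blobs.size());
            if (blobs[it->second] == blob) {
                return static_cast<long>(it->second);
            }
        }
        return -1;
    }
};
```

With colliding hashes, each candidate in the bucket is compared by content, and the loop stops at the end of the bucket instead of running to the end of the map.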

slyalin · 2024-11-22T15:37:20Z

src/bindings/python/src/openvino/runtime/utils/postponed_constant.py

+            return PostponedConstant.class_type_info
+
+        def evaluate(self, outputs, _):
+            maker().copy_to(outputs[0])


If we allow the assignment of new ov.Tensor instances in outputs, then this copying won't be required. Another (probably better) option is to pass outputs[0] to maker function to build the tensor in place.

slyalin added 2 commits November 22, 2024 11:22

Noticed issues in the hashing of serialized binary blobs.

205c1f1

slyalin requested a review from alexsu52 November 22, 2024 15:26

github-actions bot added the "category: Core" (OpenVINO Core, aka ngraph) and "category: Python API" (OpenVINO Python bindings) labels Nov 22, 2024
