[intel-npu] Publishing NPU_DEFER_WEIGHTS_LOAD property
csoka committed Nov 28, 2024
1 parent a1920c4 commit 960c435
Showing 5 changed files with 16 additions and 14 deletions.
@@ -146,6 +146,8 @@ offer a limited set of supported OpenVINO features.
ov::intel_npu::turbo
ov::intel_npu::tiles
ov::intel_npu::max_tiles
+ov::intel_npu::bypass_umd_caching
+ov::intel_npu::defer_weights_load
.. tab-item:: Read-only properties

@@ -168,7 +170,6 @@ offer a limited set of supported OpenVINO features.
ov::intel_npu::device_alloc_mem_size
ov::intel_npu::device_total_mem_size
ov::intel_npu::driver_version
-ov::intel_npu::bypass_umd_caching
.. note::
@@ -95,5 +95,12 @@ static constexpr ov::Property<int64_t> max_tiles{"NPU_MAX_TILES"};
*/
static constexpr ov::Property<bool> bypass_umd_caching{"NPU_BYPASS_UMD_CACHING"};

+/**
+ * @brief [Only for NPU Plugin]
+ * Type: boolean, default is false
+ * This option allows the weights load to be deferred until an inference request is created
+ */
+static constexpr ov::Property<bool> defer_weights_load{"NPU_DEFER_WEIGHTS_LOAD"};
+
} // namespace intel_npu
} // namespace ov
1 change: 1 addition & 0 deletions src/plugins/intel_npu/README.md
@@ -176,6 +176,7 @@ The following properties are supported:
| `ov::intel_npu::tiles`/</br>`NPU_TILES` | RW | Sets the number of npu tiles to compile the model for | `[0-]` | `-1` |
| `ov::intel_npu::max_tiles`/</br>`NPU_MAX_TILES` | RW | Maximum number of tiles supported by the device we compile for. Can be set for offline compilation. If not set, it will be populated by driver.| `[0-]` | `[1-6] depends on npu platform` |
| `ov::intel_npu::bypass_umd_caching`/</br>`NPU_BYPASS_UMD_CACHING` | RW | Bypass the caching of compiled models in UMD. | `YES`/ `NO`| `NO` |
+| `ov::intel_npu::defer_weights_load`/</br>`NPU_DEFER_WEIGHTS_LOAD` | RW | Delays loading the weights until an inference request is created. | `YES`/ `NO`| `NO` |

&nbsp;
### Performance Hint: Default Number of DPU Groups / DMA Engines
@@ -305,13 +305,6 @@ static constexpr ov::Property<BatchMode> batch_mode{"NPU_BATCH_MODE"};
*/
static constexpr ov::Property<int64_t> create_executor{"NPU_CREATE_EXECUTOR"};

-/**
- * @brief [Only for NPU Plugin]
- * Type: boolean, default is false
- * This option allows to omit loading the weights until inference is created
- */
-static constexpr ov::Property<bool> defer_weights_load{"NPU_DEFER_WEIGHTS_LOAD"};
-
/**
* @brief Read-only property to get the name of used backend
*/
12 changes: 6 additions & 6 deletions src/plugins/intel_npu/src/plugin/src/plugin.cpp
@@ -489,6 +489,12 @@ Plugin::Plugin()
[](const Config& config) {
return config.get<BYPASS_UMD_CACHING>();
}}},
+        {ov::intel_npu::defer_weights_load.name(),
+         {true,
+          ov::PropertyMutability::RW,
+          [](const Config& config) {
+              return config.get<DEFER_WEIGHTS_LOAD>();
+          }}},
// NPU Private
// =========
{ov::intel_npu::dma_engines.name(),
@@ -544,12 +550,6 @@ Plugin::Plugin()
[](const Config& config) {
return config.get<CREATE_EXECUTOR>();
}}},
-        {ov::intel_npu::defer_weights_load.name(),
-         {false,
-          ov::PropertyMutability::RW,
-          [](const Config& config) {
-              return config.get<DEFER_WEIGHTS_LOAD>();
-          }}},
{ov::intel_npu::dynamic_shape_to_static.name(),
{false,
ov::PropertyMutability::RW,
