Ecommerce | Index Bootstrap Command #83

andreas-gruenwald · 2020-06-10T21:25:16Z

Feature Request

I have another idea in mind.
We could add a mode to the bootstrap command that only creates "empty" ID rows for IDs that have no row yet, instead of processing the whole object(s):

https://github.com/pimcore/pimcore/blob/d4bf70250202c56fb81ece2d80a85c282daf67ac/bundles/EcommerceFrameworkBundle/Command/IndexService/BootstrapCommand.php#L131

The store table would then be used as a queue, and the ProcessPreparationQueue command would take care of the rest:

- Because in_preparation is not set, the entries will be processed.
- If a data object is part of the index, the missing data will be added and the entry leaves the queue.
- If a data object is not part of the index, the command will remove the row again.
- As opposed to the bootstrap command, restarting (e.g. every 15 minutes) is supported. So if the container is rebuilt during development, nothing bad happens.
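
A minimal sketch of the proposed flow, written in Python rather than PHP, with an in-memory dict standing in for the store table. The names `bootstrap_ids_only`, `process_preparation_queue`, `load_object` and `in_index` are hypothetical and not Pimcore APIs; the sketch only illustrates why the queue step could be stopped and restarted without losing work.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for the store table: ID -> prepared data.
# None marks an "empty" row that still has to be processed.
Store = Dict[int, Optional[dict]]


def bootstrap_ids_only(store: Store, all_ids: Iterable[int]) -> int:
    """Add an empty row for every ID without a row; no data objects are loaded."""
    added = 0
    for object_id in all_ids:
        if object_id not in store:
            store[object_id] = None
            added += 1
    return added


def process_preparation_queue(
    store: Store,
    load_object: Callable[[int], dict],
    in_index: Callable[[dict], bool],
    limit: Optional[int] = None,
) -> int:
    """Process open rows: fill the relevant ones, delete the irrelevant ones.

    `limit` simulates a run that is interrupted after a number of rows.
    """
    open_ids = [i for i, data in store.items() if data is None]
    if limit is not None:
        open_ids = open_ids[:limit]
    for object_id in open_ids:
        obj = load_object(object_id)  # the expensive step
        if in_index(obj):
            store[object_id] = {"id": object_id, "data": obj}
        else:
            del store[object_id]
    return len(open_ids)
```

Calling `process_preparation_queue` again after an interruption only picks up rows that are still empty, so rows handled by an earlier run are not loaded a second time.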

This would depend on pimcore/pimcore#6487 though.


fashxp · 2020-06-12T07:16:00Z

might be challenging due to

so loading the data object will be necessary in any case, and most probably this is the most expensive operation (not extracting data from it for the index)

andreas-gruenwald · 2020-06-12T08:06:07Z

> might be challenging ...

My idea was to ignore inIndex() and add rows to the store table regardless.
ProcessPreparationQueue will then load the entire data object and remove the irrelevant rows based on inIndex(). The sub-IDs will probably also be created on the fly.

So the effort of loading data objects is the same, but it is shifted from the bootstrap command to the ProcessPreparationQueue, where it is easier to set up.

Maybe there is something I do not see yet.

fashxp · 2020-06-12T08:21:08Z

but for knowing what IDs should be in index, you need to load the data object ... and the whole point of bootstrapping is knowing what IDs should be in index.
processing preparation queue command already needs to know what IDs are in index.

or am I missing something?

andreas-gruenwald · 2020-06-12T08:45:03Z

Here is the difference between the two approaches.
Let's assume that we have a system with 600.000 products. 200.000 are relevant for the product index.

Current mode:

BootstrapCommand:

1. Load the product ID list (will result in 600.000 IDs) (cost: low).
2. Iterate the IDs and load the 600.000 data objects (cost: high).
3. If a product is in index, then add a store-table entry, otherwise remove (existing) rows (cost: low).

For 600.000 products, without parallelization, let's say that the command will run for 48 hours in a project where the product data model is complex.

Alternative/additional mode with `BootstrapCommand` and `ProcessPreparationQueue`:

BootstrapCommand:

1. Load the product ID list (will result in 600.000 IDs) (cost: low).
2. Iterate the IDs and add an empty ID row if no entry exists yet (cost: low).

Because the data objects are not loaded in this step, the process will probably terminate after a couple of minutes instead of hours/days.

ProcessPreparationQueue:

1. All the empty rows that have been added by the BootstrapCommand in step 1 will be scanned, as in_preparation_queue=1 (cost: low).
2. Those entries that haven't been added to the index before will be processed (let's assume those are 400.000). If a product is in index, then the row will be updated, otherwise it will be deleted (cost: high).

There are two main differences:

1. In mode number 2, rows that are already in the index and already "prepared" remain untouched. Only the delta is processed, resulting in fewer data object reads.
2. The BootstrapCommand cannot resume if it is stopped unintentionally. Let's assume that the whole processing takes 48 hours, but after 30 hours the Symfony container is rebuilt on the DEV server. The command will stop, and restarting the BootstrapCommand will result in another 48-hour run. With approach number 2 this won't happen, as the ProcessPreparationQueue only processes the "open" records, not all data objects matching the product list condition.

Mode number 1 is probably still needed, so the two modes could coexist.
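
The numbers in this comparison can be worked through as a rough back-of-the-envelope sketch. It assumes a uniform cost per data object, reads the 400.000 open rows as the 600.000 products minus 200.000 rows that are already prepared, and ignores the cost of inserting empty rows. The function names are made up for illustration.

```python
def object_loads_full_bootstrap(total_products: int) -> int:
    """Current mode: every data object is loaded on every run."""
    return total_products


def object_loads_queue_mode(total_products: int, already_prepared: int) -> int:
    """Proposed mode: only rows that are not prepared yet are loaded."""
    return total_products - already_prepared


def hours_with_interruption_full(full_run_hours: int, interrupted_after: int) -> int:
    """Current mode: an interrupted run starts over from the beginning."""
    return interrupted_after + full_run_hours


def hours_with_interruption_queue(full_run_hours: int, interrupted_after: int) -> int:
    """Proposed mode: a restarted run continues with the remaining open rows."""
    remaining = full_run_hours - interrupted_after
    return interrupted_after + remaining


print(object_loads_full_bootstrap(600_000))       # 600000
print(object_loads_queue_mode(600_000, 200_000))  # 400000
print(hours_with_interruption_full(48, 30))       # 78
print(hours_with_interruption_queue(48, 30))      # 48
```

Under these assumptions, an interruption after 30 hours costs the current mode 78 hours in total, while the queue-based mode still finishes within the original 48 hours of work.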

markus-moser · 2020-09-18T18:57:29Z

We should definitely create something similar to what @andreas-gruenwald suggested. Currently, bootstrapping is a very big challenge in projects with many products and tenants.

fashxp · 2021-02-23T10:43:23Z

are there any BC breaks needed for that?

andreas-gruenwald · 2021-09-14T20:30:10Z

> are there any BC breaks needed for that?

I don't think any BC breaks are needed.
Also, the current behavior could remain the default.
The best way would probably be to implement a project-related pull request.

fashxp · 2021-09-15T07:13:05Z

I would love to see a PR for it :-)

fashxp added the PR Welcome label Mar 9, 2022

brusch transferred this issue from pimcore/pimcore May 31, 2023
