Ecommerce | Index Bootstrap Command #83

Open
andreas-gruenwald opened this issue Jun 10, 2020 · 8 comments

Comments

@andreas-gruenwald
Contributor

Feature Request

I have another idea.
We could add a mode to the bootstrap command that only creates "empty" ID rows for IDs that have no row yet, instead of processing the whole object(s):

https://github.com/pimcore/pimcore/blob/d4bf70250202c56fb81ece2d80a85c282daf67ac/bundles/EcommerceFrameworkBundle/Command/IndexService/BootstrapCommand.php#L131

The store table would then be used as a queue, and the ProcessPreparationQueue command would take care of the rest:

  • Because in_preparation is not set, the entries will be processed.
  • If a data object is part of the index, the missing data will be added and the entry leaves the queue.
  • If a data object is not part of the index, the command will remove the row again.
  • Unlike the bootstrap command, the queue command supports restarts (e.g. every 15 minutes). So if the container is rebuilt during development, nothing bad happens.

This would depend on pimcore/pimcore#6487 though.

@fashxp
Member

fashxp commented Jun 12, 2020

might be challenging due to

so loading the data object will be necessary in any case, and this is most probably the most expensive operation (not extracting data from it for the index)

@andreas-gruenwald
Contributor Author

might be challenging ...

  • My idea was to ignore inIndex() and add rows to the store table regardless.
  • ProcessPreparationQueue will then load the entire data object and remove the irrelevant rows based on inIndex(). The sub-IDs will probably also be created on the fly at that point.

So the effort of loading data objects is the same, but it is shifted from the bootstrap command to the ProcessPreparationQueue, where it is easier to set up.

Maybe there is something I do not see yet.

@fashxp
Member

fashxp commented Jun 12, 2020

but to know which IDs should be in the index, you need to load the data object ... and the whole point of bootstrapping is knowing which IDs should be in the index.
the ProcessPreparationQueue command already needs to know which IDs are in the index.

or am I missing something?

@andreas-gruenwald
Contributor Author

Here is the difference between the two approaches.
Let's assume that we have a system with 600.000 products. 200.000 are relevant for the product index.

Current mode:

BootstrapCommand:

  1. Load the product ID list (will result in 600.000 IDs) (cost: low).
  2. Iterate the IDs and load the 600.000 data objects (cost: high).
  3. If a product is in the index, add a store-table entry; otherwise remove any existing rows (cost: low).

For 600.000 products, without parallelization, let's say that the command will run for 48 hours in a project where the product data model is complex.


Alternative/additional mode with BootstrapCommand and ProcessPreparationQueue:

BootstrapCommand:

  1. Load the product ID list (will result in 600.000 IDs) (cost: low).
  2. Iterate the IDs and add an empty ID row if no entry exists yet (cost: low).

Because the data objects are not loaded in this step, the process will probably terminate after a couple of minutes instead of hours/days.
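
To make the "empty ID row" step concrete, here is a minimal sketch in Python with an in-memory SQLite table. It is only a model of the idea, not Pimcore code: the table layout, the function names, and the use of a NULL data column to mean "not prepared yet" are all assumptions made for illustration.

```python
import sqlite3


def create_store_table(conn):
    # Hypothetical, heavily simplified stand-in for the real store table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS store ("
        " id INTEGER PRIMARY KEY,"
        " data TEXT,"
        " in_preparation_queue INTEGER NOT NULL DEFAULT 1)"
    )


def seed_missing_ids(conn, product_ids):
    """Insert an empty, queued row for every ID that has no row yet.

    Existing rows are left untouched and no data object is loaded.
    Returns the number of rows that were actually inserted.
    """
    before = conn.execute("SELECT COUNT(*) FROM store").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO store (id, data, in_preparation_queue)"
        " VALUES (?, NULL, 1)",
        ((product_id,) for product_id in product_ids),
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM store").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
create_store_table(conn)
# Product 2 has already been prepared and must stay as it is.
conn.execute(
    "INSERT INTO store (id, data, in_preparation_queue) VALUES (2, 'prepared', 0)"
)
conn.commit()

inserted = seed_missing_ids(conn, [1, 2, 3])
print(inserted)  # 2: rows for products 1 and 3 were added
```

Because the insert ignores IDs that already have a row, running the seeding step a second time adds nothing, which is what makes it safe to repeat.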

ProcessPreparationQueue:

  1. All the empty rows that the BootstrapCommand added in its step 2 will be scanned, because in_preparation_queue=1 (cost: low).
  2. Entries that have not been added to the index before will be processed (let's assume there are 400.000 of them). If a product is in the index, the row will be updated; otherwise it will be deleted (cost: high).
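
The two queue steps above can be sketched as follows, again as a Python/SQLite model rather than actual Pimcore code. The callbacks load_object and in_index are hypothetical stand-ins for the expensive data object load and the inIndex() check, and max_rows only exists to simulate a run that is stopped early.

```python
import sqlite3


def process_preparation_queue(conn, load_object, in_index, max_rows=None):
    """Process queued rows and return how many data objects were loaded."""
    sql = "SELECT id FROM store WHERE in_preparation_queue = 1 ORDER BY id"
    queued_ids = [row[0] for row in conn.execute(sql)]
    if max_rows is not None:
        queued_ids = queued_ids[:max_rows]  # simulate an interrupted run
    for object_id in queued_ids:
        obj = load_object(object_id)  # the expensive step
        if in_index(obj):
            # Relevant for the index: fill in the data and leave the queue.
            conn.execute(
                "UPDATE store SET data = ?, in_preparation_queue = 0 WHERE id = ?",
                (f"index data for {object_id}", object_id),
            )
        else:
            # Not relevant for the index: drop the placeholder row.
            conn.execute("DELETE FROM store WHERE id = ?", (object_id,))
        conn.commit()  # commit per row so a stopped run keeps its progress
    return len(queued_ids)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE store (id INTEGER PRIMARY KEY, data TEXT,"
    " in_preparation_queue INTEGER NOT NULL)"
)
# Six empty rows from the bootstrap step, plus one row prepared earlier.
conn.executemany("INSERT INTO store VALUES (?, NULL, 1)", [(i,) for i in range(1, 7)])
conn.execute("INSERT INTO store VALUES (7, 'already prepared', 0)")
conn.commit()

loaded_ids = []


def load_object(object_id):
    loaded_ids.append(object_id)
    return {"id": object_id}


def in_index(obj):
    # Assumption for the demo: only even IDs belong in the index.
    return obj["id"] % 2 == 0


first_run = process_preparation_queue(conn, load_object, in_index, max_rows=2)
second_run = process_preparation_queue(conn, load_object, in_index)
remaining = conn.execute(
    "SELECT id, in_preparation_queue FROM store ORDER BY id"
).fetchall()
```

In this model the second run picks up exactly where the first one stopped, every queued object is loaded once across both runs, and the row that was already prepared is never loaded at all.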

There are two main differences:

  1. In mode number 2, rows that are already in the index and already "prepared" remain untouched. Only the delta is processed, resulting in fewer data object reads.
  2. The BootstrapCommand cannot resume if it is stopped unintentionally. Let's assume that the whole processing takes 48 hours, but after 30 hours the Symfony container on the DEV server is rebuilt. The command will stop, and restarting the BootstrapCommand will result in another 48-hour run. With approach number 2 this won't happen, as the ProcessPreparationQueue will only process the "open" records, not all data objects matching the product list condition. Mode number 1 is probably still needed, so the two modes could coexist.
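
As a rough back-of-the-envelope check of these numbers, the sketch below compares the two modes. It assumes that the run time is dominated by loading data objects, that every object costs about the same, and that the few minutes for inserting empty rows can be ignored. These are simplifying assumptions for illustration, not measurements.

```python
TOTAL_PRODUCTS = 600_000          # products matched by the product list
FULL_RUN_HOURS = 48               # assumed duration of one full bootstrap run
DELTA_PRODUCTS = 400_000          # assumed rows not yet prepared
INTERRUPTED_AFTER_HOURS = 30      # container rebuild stops the command here

# Average cost per loaded data object under the uniform-cost assumption.
seconds_per_object = FULL_RUN_HOURS * 3600 / TOTAL_PRODUCTS

# Mode 1, interrupted once: the first 30 hours are lost, then a full re-run.
mode1_hours = INTERRUPTED_AFTER_HOURS + FULL_RUN_HOURS

# Mode 2: only the delta is loaded; an interruption does not reset progress.
mode2_hours = FULL_RUN_HOURS * DELTA_PRODUCTS / TOTAL_PRODUCTS

print(mode1_hours)  # 78
print(mode2_hours)  # 32.0
```

Under these assumptions a single interruption turns mode 1 into roughly 78 hours of work, while mode 2 needs about 32 hours in total, however often the queue command is restarted.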

@markus-moser
Contributor

We should definitely create something similar to what @andreas-gruenwald suggested. Currently, bootstrapping is a very big challenge in projects with many products and tenants.

@fashxp
Member

fashxp commented Feb 23, 2021

are there any BC breaks needed for that?

@andreas-gruenwald
Contributor Author

andreas-gruenwald commented Sep 14, 2021

are there any BC breaks needed for that?

I don't think any BC breaks are needed.
Also, the current behavior could remain the default.
The best way would probably be to implement it as a project-related pull request.

@fashxp
Member

fashxp commented Sep 15, 2021

I would love to see a PR for it :-)

@brusch brusch transferred this issue from pimcore/pimcore May 31, 2023