Discussion : Wait for installer state to be ok before continuing. #110

zipkid · 2018-02-09T08:53:29Z

We often see the CRX installer fail because AEM is restarting or is otherwise unable to handle the necessary queries and commands.
This PR adds a 'wait for OK-to-install state' step to the CRX installer provider.
This kind of wait may also be needed in the other providers, though possibly with a check other than 'Sling+OSGi+Installer.json', or with no check at all.
This code is certainly not ready to be merged, but we would like to discuss where and how this could be done to ensure clean Puppet runs.

Maybe this should be part of https://github.com/bstopp/crx-packmgr-api-client-gem, but that is generated from https://github.com/bstopp/swagger-aem, which I don't know how to work with.

wimsymons · 2018-02-09T08:54:55Z

This might fix #82 as well.

wimsymons · 2018-02-09T08:59:33Z

lib/puppet/provider/aem_crx_package/ruby.rb

+    require 'net/http'
+    retries ||= @resource[:retries]
+    retry_timeout = @resource[:retry_timeout]
+    host = 'http://localhost:4502'


The fixed port '4502' should come from the "port" parameter; see the build_cfg method.

True, but for that I would have needed to modify the 'build_client' function, which I did not want to do for this PR. It is mainly intended as an entry point for discussion and definitely not as a final, clean-code PR.
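For illustration, here is a minimal sketch of deriving the base URL from a port setting instead of hard-coding it. The helper name `installer_base_url`, the `:port` key, and the fallback to 4502 are assumptions made for this example; the real provider would take the value from wherever build_cfg reads it.

```ruby
require 'uri'

# Hypothetical helper: builds the base URL from a resource-like hash.
# The :port key and the 4502 fallback are assumptions for this sketch.
def installer_base_url(resource, default_port: 4502)
  port = resource[:port] || default_port
  # Integer() raises on non-numeric input instead of silently using a bad port.
  URI::HTTP.build(host: 'localhost', port: Integer(port)).to_s
end

puts installer_base_url({ port: 4503 })
```

With something like this in place, the `host = 'http://localhost:4502'` line in the diff could become a call to the helper.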

zipkid · 2018-02-09T09:36:20Z

@bstopp, I have updated the .rubocop.yaml to 'TargetRubyVersion: 2.2'.
Can you trigger the checks please?

bstopp · 2018-02-10T01:13:14Z

Do you have a use case or manifest set that shows this occurring? What is causing the AEM restart: a Puppet change or a user-initiated change?

What is being experienced right now? A number of subsequent failures?

I am pretty certain I know what the issue is, and this won't solve it: the system already does a check with retries here when Puppet encounters the resource to apply it.

I was pretty sure I opened a ticket somewhere on the underlying issue; if I find it, I'll link it.

stevengssns · 2018-02-12T08:41:31Z

Hi @bstopp,

An exact example of what we have observed is the following:

In our setup we have a clean AEM 6.3 installation, followed by a Service Pack 1 and Cumulative Fix Pack 2 package installation. When we do a clean install, we have observed that the CFP is often (but not always) only partially installed. When going to the package manager, a substantial number of the sub-packages are still in an uninstalled state.
When reproducing the issue on a local workstation, I observed that one of the package install hooks of one of the CFP sub-packages threw an exception. The exception said that the Dynamic Class Loader service was no longer available.
When investigating further, it turned out that the installation of the CFP package started too soon. When the Service Pack gets installed and the package manager API returns, the package manager GUI shows the package as installed, but the installation is actually still in progress. This means that a lot of OSGi services are still being reloaded by the ongoing installation(s) when the next package installation starts.

To try and make the package installations more robust, we are trying to add a more reliable check on the installation state. This check is based on the Sling OSGi Installer JMX MBean which is mentioned in the following AEM Gem:

https://docs.adobe.com/content/ddc/en/gems/AEM-Sustenance---Best-Practices-for-deploying-AEM-Maintenance-Releases/_jcr_content/par/download/file.res/AEM-Sustenance-Best-Practices-Gems.pdf

In the meantime I have also learned that the following endpoint provides similar information, though it does not seem to be documented anywhere, and googling for it returns no Adobe search hits at all.

/crx/packmgr/installstatus.jsp

I've decompiled the code, and it simply checks whether the ActiveResourceCount attribute of the Sling OSGi Installer JMX MBean is '0'.

I hope this clarifies the necessity for these changes; if not, I can provide more info.
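As a rough sketch of what such a check could look like from the provider side: the code below fetches a status document over HTTP and applies the "count is zero" test. Since the response format of the endpoint is not documented, the caller passes a block that extracts the count; the method names, credentials, and the 'ActiveResourceCount' key in the usage line are assumptions for illustration only.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetches a status document and returns the response body as a String.
# base_url, path, user and password are supplied by the caller.
def fetch_status(base_url, path, user, password)
  uri = URI.join(base_url, path)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  raise "Unexpected HTTP status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  response.body
end

# Applies the check described above: idle when the active count is zero.
# The block extracts the count from the parsed JSON, because the actual
# document layout is an assumption. A missing value makes Integer() raise.
def installer_idle?(body)
  count = yield(JSON.parse(body))
  Integer(count).zero?
end
```

A call might look like `installer_idle?(fetch_status('http://localhost:4502', '/crx/packmgr/installstatus.jsp', 'admin', 'admin')) { |doc| doc['ActiveResourceCount'] }`, with the block adjusted to whatever the endpoint really returns.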

FYI, there are still a few issues to tackle or think about.

A package installation does not necessarily result in a single installation run. This means that when you observe ActiveResourceCount=0, another run might still start. When installing a Service Pack in particular, I observed ActiveResourceCount=0 several times before the installation was completely done. So a few successful calls in a row are probably needed before deciding the system is done (maybe also checking that the other attributes are no longer changing).
When the installation has failed for some reason, and some of the (new?) bundles are not starting, the ActiveResourceCount remains > 0. So no other package installations will be possible until this gets resolved. Not sure if this will always be desired.
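To make the "a few successful calls" idea concrete, here is a minimal sketch of a polling loop that only reports success after several consecutive zero readings. The method name, the `required_successes` parameter, and the injectable `sleeper` are assumptions for this example; `retries` and `retry_timeout` mirror the parameter names used in the diff.

```ruby
# Polls the block (which returns the current active resource count) until it
# has returned zero for required_successes consecutive polls.
# Returns true once that happens, false if retries run out first.
def wait_for_installer(required_successes: 3, retries: 30, retry_timeout: 1,
                       sleeper: ->(seconds) { sleep(seconds) })
  consecutive = 0
  retries.times do
    count = yield
    # Any non-zero reading resets the streak, since another run has started.
    consecutive = count.zero? ? consecutive + 1 : 0
    return true if consecutive >= required_successes

    sleeper.call(retry_timeout)
  end
  false
end
```

Because the method returns false rather than raising when the count never settles, the caller can decide whether a stuck installer should fail the Puppet run or only produce a warning, which leaves the second point above open.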

henrykuijpers · 2019-08-22T11:25:21Z

@bstopp any input on this?

bstopp · 2019-09-04T15:20:04Z

Can you confirm this wasn't fixed with v3.0.0?

Wait for installer state to be ok before continuing.

8375759

zipkid force-pushed the feature/wait_for_install_ok branch from f29031d to 8375759 Compare February 9, 2018 08:57

wimsymons reviewed Feb 9, 2018


Add check for InstalledResourceCount for 1 retry_timeout time-unit

e3aa8e1
