Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.x.x] Removed finalize() methods #5323

Open
wants to merge 1 commit into
base: develop-6.x.x
Choose a base branch
from

Conversation

reinhapa
Copy link
Member

@reinhapa reinhapa commented Jun 3, 2024

Removed all finalize() methods for the following classes:

  • exist-core/src/main/java/org/exist/util/GZIPInputSource.java
  • exist-core/src/main/java/org/exist/xmldb/AbstractRemoteResource.java
  • exist-core/src/main/java/org/exist/xmldb/RemoteResourceSet.java
  • exist-core/src/main/java/org/exist/xmlrpc/AbstractCachedResult.java

Remplaced finalize() method with shutdown hook:

  • exist-core/src/main/java/org/exist/util/io/TemporaryFileManager.java

Description:

Fixes a SAXParseException that occured in heavy use denoting that the document does missing a needed ending tag or having a premature end of file.

Removed all finalize() methods for the following classes:
- exist-core/src/main/java/org/exist/util/GZIPInputSource.java
- exist-core/src/main/java/org/exist/xmldb/AbstractRemoteResource.java
- exist-core/src/main/java/org/exist/xmldb/RemoteResourceSet.java
- exist-core/src/main/java/org/exist/xmlrpc/AbstractCachedResult.java

Remplaced finalize() method with shutdown hook:
- exist-core/src/main/java/org/exist/util/io/TemporaryFileManager.java

Signed-off-by: Patrick Reinhart <[email protected]>
@reinhapa reinhapa changed the title [cleanup] Removed finalize() methods [6.x.x] Removed finalize() methods Jun 3, 2024
Copy link
Contributor

@adamretter adamretter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. GZipInputSource

    1. InputSource is already highly problematic in its design. Historically classes that use this (including inside and outside of eXist-db code, e.g. Xerces, etc.) have not been good at closing the resources they get from calling InputSource#getByteStream() or InputSource#getReader(). The JavaDoc states:

    An InputSource object belongs to the application: the SAX parser shall never modify it in any way (it may modify a copy if necessary). However, standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser.

    So we should hope that the caller closes the stream, but that has not always been the case. eXist-db itself has created a bad precedent here, as in NativeBroker to store an XML document it needs to read and parse the XML document twice (due to a lack of transactions, it has to parse the document to validate it in the 1st phase, and then parse it again to store it in a 2nd phase); of course it does this by expecting an InputSource.

    1. The current design of GZipInputSource which tries to help with closing the stream either explicitly through an additional close method, or implicitly through a finalize method already has problems in its design of getByteStream()` - it only remembers the last stream, but it should have remembered all streams. This was added to remove a problem with file handles being leaked on Linux/Unix, and files remaining locked indefinetly on Windows.

    2. My suggestions would be that:

      1. Outside of eXist-db, we have to assume that the caller calls close on the obtained Reader or InputStream.
      2. You need to understand how InputSource is used within eXist-db (especially NativeBroker) before making any decisions about how to refactor it. If we could cleanup the eXist-db code so that it takes ownership correctly of InputStream/Reader and closes it when finished with it, we could remove the need for close entirely here.
  2. TemporaryFileManager

    1. The Singleton design of this is probably half of the problem, it may be better instead refactored into a BrokerPoolService implementation. This will provide it with a function that can be used to cleanup at database shutdown time.
    2. In the cleanup that you added, you remove the lock file before deleting any other outstanding files, this could lead to a race condition. To avoid this, the lock file must instead always be the last thing that is removed.
  3. AbstractRemoteResource

    1. We would need to do a complete audit of the code base to ensure that close is called everywhere this is used when it is finished with, and if not add those close calls. This is API is improved somewhat in XML:DB API v2 where Resource implements Closeable, but I suspect there are plenty of places in eXist-db that need updating to call close otherwise we risk leaking file handles.
  4. RemoteResourceSet

    1. Same concern as above for AbstractRemoteResource.
  5. AbstractCachedResult

    1. We would need to do a complete audit of the code base to ensure that close is called everywhere this is used when it is finished with, and if not add those close calls.

@dizzzz
Copy link
Member

dizzzz commented Jun 3, 2024

From what I understand , this is in essence a backport of v7?

@adamretter
Copy link
Contributor

this is in essence a backport of v7?

@dizzzz Does that mean the problems I mentioned above also present in v7?

@dizzzz
Copy link
Member

dizzzz commented Jun 4, 2024

Yes; That is what I remember, and Patrick confirmed this

@reinhapa
Copy link
Member Author

reinhapa commented Jun 4, 2024

this is in essence a backport of v7?

@dizzzz Does that mean the problems I mentioned above also present in v7?

Not all issues are present the same way as the Resource interface now also extends the AutoCloseable and the issues in the v7 code base have been addressed when switching to the current XMLDB::API version so far. The only thing missing would be to add AutoCloseable also to the ResourceSet interface.

I have digged into the described issue of stuncated resources at least to days now and it seems the the JVM will actually prematurely call the finalize() method on Objects, that are still referred in our case some times (only on the client side useage so far I can see it. I could solve it for my usage by storing the created Remote[XML/Binary]Resource instances within the RemoteResourceSet so far and closing those as the clear() method is called.

For the TemporaryFileManager I already switched to the better NIO cleanup mechanism by using the DELETE_ON_CLOSE open option. Here we could also look into the Cleaner API instead using the finalize() method in the future...

The AbstractCachedResult I have checked in v7 so far but here we should also look into the Cleaner API .

@adamretter all in all it would be great to have a in person discussion to see how we should takle the finalize() problem once and for all in order to be ready to move up in the new JDKs in the future...

  1. InputSource is already highly problematic in its design. Historically classes that use this (including inside and outside of eXist-db code, e.g. Xerces, etc.) have not been good at closing the resources they get from calling InputSource#getByteStream() or InputSource#getReader(). The JavaDoc states:

An InputSource object belongs to the application: the SAX parser shall never modify it in any way (it may modify a copy if necessary). However, standard processing of both byte and character streams is to close them on as part of end-of-parse cleanup, so applications should not attempt to re-use such streams after they have been handed to a parser.

As far as I understand the documentation the parsing instance is in charge to close potentially open streams at the end of the parse operation. That would mean that we need to do it or make sure that it is being done, right?

@dizzzz
Copy link
Member

dizzzz commented Aug 26, 2024

Please can you discuss this? @adamretter @reinhapa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants