After system restart, OrientDB fails to start #10250

Open
suneelkumarch opened this issue Jun 12, 2024 · 10 comments

@suneelkumarch

suneelkumarch commented Jun 12, 2024

OrientDB Version: 3.2.18

OS: docker image

Expected behavior

OrientDB is deployed as a container in a K8s cluster.
OrientDB runs fine during normal operation.
When a node or the cluster is restarted, OrientDB is expected to start and work normally.

Actual behavior

At times OrientDB fails to start and ends up in CrashLoopBackOff, with the following error:

Exception <ID> in storage plocal:/orientdb/databases/OSystem: 3.2.18 (build 7589013, branch UNKNOWN) [OLocalPaginatedStorage]

Complete StackTrace:

 INFO  System is started under an effective user : `999` [OEngineLocalPaginated]
 INFO  WAL maximum segment size is set to 2,511 MB [OrientDBDistributed]
 INFO  Databases directory: /orientdb/databases [OServer]
 INFO  Page size for WAL located in /orientdb/databases/OSystem is set to 4096 bytes. [CASDiskWriteAheadLog]
 INFO  DWL:OSystem: block size = 4096 bytes, maximum segment size = 506 MB [DoubleWriteLogGL]
 SEVER Exception `<ID>` in storage `plocal:/orientdb/databases/OSystem`: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]
com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of atomic operation inside of storage OSystem
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:146)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:531)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:590)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:517)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:87)
	at com.orientechnologies.orient.core.db.OSystemDatabase.openSystemDatabase(OSystemDatabase.java:86)
	at com.orientechnologies.orient.core.db.OSystemDatabase.checkServerId(OSystemDatabase.java:165)
	at com.orientechnologies.orient.core.db.OSystemDatabase.init(OSystemDatabase.java:153)
	at com.orientechnologies.orient.server.OServer.initSystemDatabase(OServer.java:1147)
	at com.orientechnologies.orient.server.OServer.activate(OServer.java:430)
	at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:49)
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeBucketSingleValueV1.<init>(CellBTreeBucketSingleValueV1.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.findBucket(CellBTreeSingleValueV1.java:1341)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.get(CellBTreeSingleValueV1.java:189)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readProperty(OClusterBasedStorageConfiguration.java:1819)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readConfiguration(OClusterBasedStorageConfiguration.java:922)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.load(OClusterBasedStorageConfiguration.java:253)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.lambda$open$1(OAbstractPaginatedStorage.java:537)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:140)
	... 10 more

suneelkumarch changed the title from "Failed to start OrientDB server" to "After system restart, OrientDB fails to start" on Jun 12, 2024
tglman added the bug label on Jun 20, 2024
tglman added this to the 3.2.x milestone on Jun 20, 2024

@scotthoye

I'm also occasionally encountering a similar NullPointerException in ODurablePage (using version 3.2.32). Here are a few example stacktraces:

SEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage] Exception `56B353D7` in storage `plocal:XXX`: 3.2.32 (build ${buildNumber}, branch UNKNOWN)
com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error.	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
	at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
	at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
	... 16 more
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
	... 34 more

Another:

Error on formatting message 'Exception `%08X` in storage `%s`: %s'. Exception: java.lang.IllegalArgumentException: can't parse argument number: buildNumberSEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.common.thread.ScalingThreadPoolExecutor] Exception in thread 'OrientDBEmbedded-1'
com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database 'XXX'
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:465)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error.	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
	at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
	... 5 more
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
	at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
	... 16 more
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
	... 34 more
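
A side note on the "Error on formatting message" line at the top of the second trace: the text "can't parse argument number: buildNumber" matches the message that `java.text.MessageFormat` raises when a pattern contains a non-numeric placeholder such as `{buildNumber}`. The sketch below only reproduces that message with the JDK. It assumes, without confirmation from this thread, that the unsubstituted `${buildNumber}` in the 3.2.32 version string reached a MessageFormat-style formatter; the class and method names are hypothetical. If that assumption holds, this secondary error is a logging artifact and is separate from the NullPointerException itself.

```java
import java.text.MessageFormat;

/** Hypothetical demo class; it does not use or depend on OrientDB. */
final class BuildNumberPlaceholderDemo {

  /**
   * Returns the message of the exception MessageFormat throws for the pattern,
   * or null when the pattern is accepted.
   */
  static String formatError(String pattern, Object... args) {
    try {
      MessageFormat.format(pattern, args);
      return null;
    } catch (IllegalArgumentException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    // Version string as it appears in the 3.2.32 log, with the placeholder left unsubstituted.
    String unfiltered = "3.2.32 (build ${buildNumber}, branch UNKNOWN)";
    // Version string as it appears in the 3.2.18 log, with a real build hash.
    String filtered = "3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN)";

    System.out.println("unfiltered -> " + formatError(unfiltered));
    System.out.println("filtered   -> " + formatError(filtered));
  }
}
```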

@tglman
Member

tglman commented Oct 16, 2024

Hi,

This looks like an issue in the storage logic. I remember that some fixes to the storage logic were made in some early releases of 3.2.x, so I suggest updating to the latest hotfix. Also, the issues seem to be mostly around the OSystem database, which can be safely removed if you are not using advanced features like auditing; it will be recreated. Let me know if you still have problems with the newer version of OrientDB.
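
A minimal sketch of the "remove OSystem and let it be recreated" step suggested above, written so that the directory is renamed rather than deleted and can be restored if needed. The `OSystemMover` class and the backup naming are hypothetical and not part of OrientDB; the default path `/orientdb/databases` is taken from the log in the original report. It assumes the server is stopped while it runs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper, not part of OrientDB. Run it only while the server is stopped. */
final class OSystemMover {

  /**
   * Renames <databasesDir>/OSystem to <databasesDir>/OSystem.bak-<suffix>.
   * Returns the backup path, or null when there is no OSystem directory to move.
   * Fails with an exception if the backup path already exists.
   */
  static Path moveAside(Path databasesDir, String suffix) throws IOException {
    Path osystem = databasesDir.resolve("OSystem");
    if (!Files.isDirectory(osystem)) {
      return null;
    }
    Path backup = databasesDir.resolve("OSystem.bak-" + suffix);
    // Both paths are in the same directory, so this is a plain rename of the folder.
    return Files.move(osystem, backup);
  }

  public static void main(String[] args) throws IOException {
    // Default taken from the "Databases directory" log line; pass another path as the first argument.
    Path databasesDir = Paths.get(args.length > 0 ? args[0] : "/orientdb/databases");
    Path backup = moveAside(databasesDir, String.valueOf(System.currentTimeMillis()));
    System.out.println(backup == null ? "No OSystem directory found" : "Moved OSystem to " + backup);
  }
}
```

Renaming keeps the old files available for later inspection. As noted in the replies that follow, this only helps when the failure is limited to OSystem.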

@suneelkumarch
Author

@tglman if the issue were only with the system DB (OSystem), then deleting it and restarting the OrientDB service would recreate it. However, in my case the issue is observed not just with the system DB (OSystem) but also with the application DB.

@scotthoye

scotthoye commented Oct 22, 2024

Thank you for the comments! A few more details/questions:

  • We originally saw this NullPointerException when using version 3.2.23. After upgrading to 3.2.32 (only two away from the latest release), we are still seeing the exceptions. We briefly tested with version 3.2.30 in between, and we did not observe the exception in that version, but it's possible that we didn't give it enough runtime to know for sure.
  • We have tried deleting everything (more than just OSystem) and starting fresh by re-populating the database. Things work fine at first, but the problem recurs after a few days of runtime for no known reason. The operations performed against the database are consistent in our test scenario, so we haven't determined a sequence of operations that leads to the exception state (it seems to happen randomly).
  • If we're not using the advanced features of OSystem, is there any way to disable it to avoid this NullPointerException? Is OSystem necessary?

Thanks again!

@tglman
Member

tglman commented Nov 4, 2024

Hi,

Are you using volumes for the data folder in your k8s deployment?

Regards

@scotthoye

Thanks, @tglman. I can't speak for the original reporter, but in my case I'm not using K8s at all, so there are no volumes for the data folder. I'm just running OrientDB locally on a Windows PC with only local access.

@suneelkumarch
Author

suneelkumarch commented Nov 5, 2024

@tglman Yes, in my case, I am using k8s volumes.

@suneelkumarch
Author

> Hi,
>
> Are you using volumes for the data folder in your k8s deployment?
>
> Regards

@tglman Are there any known issues with using k8s volumes with OrientDB?

@tglman
Member

tglman commented Nov 12, 2024

Hi @suneelkumarch,

We do suggest using volumes, because container filesystems are not designed for databases, so if you are using volumes you are doing it the right way.

I was double-checking this. If you are using volumes, these errors must come from somewhere else. Did you start seeing this issue after an upgrade, or do you also get these errors when using the database with the exact same OrientDB version that created it?

@suneelkumarch
Author

suneelkumarch commented Nov 12, 2024

Hi @tglman, we get these errors on the same version of OrientDB; no upgrades were performed. The problem appears after the system was restarted (not a graceful shutdown) while OrientDB was running as a Pod.
