Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

withShouldSkipBlockingWait will always fail to acquire lock if already exists and expired #93

Open
cristianocca opened this issue Aug 18, 2023 · 8 comments

Comments

@cristianocca
Copy link

cristianocca commented Aug 18, 2023

Acquiring a lock with withShouldSkipBlockingWait(true) (so we don't block waiting) does not attempt to retrieve a lock that is expired (e.g., owner died / past lease time) as opposed to not using this option. Basically, if a lock is stale in DDB, using withShouldSkipBlockingWait will never be able to pick up the lock and it will consider it as already taken.

@rahulvishwanatham
Copy link

I am facing the same issue, any solution to this please? A lock is stale in my DDB and no other instance is able to acquire the lock indefinitely. Is there a way to save the lastUpdatedTime to DDB, so that the isExpired() method of LockItem works well.

@a-lcardos
Copy link

Facing the same issue.

Can be easily reproduced

 val lockClient = AmazonDynamoDBLockClient(
            AmazonDynamoDBLockClientOptions.builder(ddb, "test_lock")
                .withTimeUnit(TimeUnit.SECONDS)
                .withLeaseDuration(5L)
                .withCreateHeartbeatBackgroundThread(false)
                .build())

        val lock1 = lockClient.acquireLock(
            AcquireLockOptions
                .builder("myTest")
                .withAdditionalTimeToWaitForLock(0)
                .withTimeUnit(TimeUnit.SECONDS)
                .build()
        )
        log.info { "First lock $lock1" }

        Thread.sleep(10000)
        log.info { "Slept a 10 sec. Lease time is 5 sec. Attempting to acquire lock again" }

        val lock2 = lockClient.acquireLock(
            AcquireLockOptions
                .builder("myTest")
                .withShouldSkipBlockingWait(true)
                .withAdditionalTimeToWaitForLock(1)
                .withTimeUnit(TimeUnit.SECONDS)
                .build()
        )
        log.info { "Lock $lock2" }

Fails in attempt to acquire the lock2

    01 Mar 2024 11:38:31,279 [INFO]  (Test worker): First lock LockItem{Partition Key=myTest, Sort Key=Optional.empty, Owner Name=---, Lookup Time=349680572, Lease Duration=5000, Record Version Number=a7c31f70-67ca-4105-af95-0bd75193df52, Delete On Close=true, Data=, Is Released=false}
    01 Mar 2024 11:38:41,289 [INFO]  (Test worker): Slept a 10 sec. Lease time is 5 sec. Attempting to acquire lock again

    com.amazonaws.services.dynamodbv2.model.LockCurrentlyUnavailableException: The lock being requested is being held by another client.
        at app//com.amazonaws.services.dynamodbv2.AmazonDynamoDBLockClient.acquireLock(AmazonDynamoDBLockClient.java:483)
        at app//com.integ.DynamoDbIntegrationTest.sample example(DynamoDbIntegrationTest.kt:136)

@a-lcardos
Copy link

README says this should work in a different way

If the lock does not exist or if the lock has been acquired by the other machine and is stale (has passed the lease
duration), this would successfully acquire the lock.

Issue is happening because the isExpired checks if its stale actually using the lock lookupTime

return LockClientUtils.INSTANCE.millisecondTime() - this.lookupTime.get() > this.leaseDuration.get();

The lookupTime is the time the item was read from the DynamoDb table

https://github.com/awslabs/amazon-dynamodb-lock-client/blame/master/src/main/java/com/amazonaws/services/dynamodbv2/AmazonDynamoDBLockClient.java#L987
https://github.com/awslabs/amazon-dynamodb-lock-client/blame/master/src/main/java/com/amazonaws/services/dynamodbv2/AmazonDynamoDBLockClient.java#L1020

By reading the documentation I was interpreting "stale" and "lease time" meaning: if an item is added at X time and the lease time is L, at time Y where Y>X+L the lock would be stale (or expired).

The implementation does not translate to that exactly.

When using withShouldSkipBlockingWait(false) if the lock is stale (my definition of stale) it takes the whole additional lease time plus the additional time you passed for it to acquire the lock.

Can be easily reproduced by changing the lock2 in the example above to use withShouldSkipBlockingWait(false)

    01 Mar 2024 11:55:37,123 [INFO]  (Test worker): Slept a 10 sec. Lease time is 5 sec. Attempting to acquire lock again
    01 Mar 2024 11:55:42,736 [INFO]  (Test worker): Lock LockItem{Partition Key=myTest, Sort Key=Optional.empty, Owner Name=----, Lookup Time=350712085, Lease Duration=5000, Record Version Number=0c6e2944-0819-456f-aec1-6f5a53428883, Delete On Close=true, Data=, Is Released=false}

(it took 5 seconds to acquire the lock after the 10 second sleep. The lease time of this lock was 5 seconds. The attempt to get the second lock should have immediately acquired the lock if the definition of stale was the one I added here)

@evanhat
Copy link

evanhat commented Mar 12, 2024

+1 @a-lcardos

I had withShouldSkipBlockingWait set to true, and I just spent 2 hours trying to figure out why the lease expiration wasn't working lol.

If my understanding is correct, it looks like the only way for a lock to "expire" (without calling release) is to call acquireLock() and let it block for the entire lease duration (?)

It seems like there's no way you can use this to have a non-blocking lock acquire w/ lease expiration. Maybe you could work around it by setting a ttl on you table items, but that's just super not ideal.

@cristianocca
Copy link
Author

This was fixed in the internally-maintained version of this package. Would be good to have the fix shipped to the public version as well.

@evanhat
Copy link

evanhat commented Mar 12, 2024

@cristianocca It looks like you're right! I guess this was resolved on branch v1.3.x 👀.

@evanhat
Copy link

evanhat commented Mar 13, 2024

@cristianocca It looks like you're also right about that version not being shipped to the public 🥲.

@shetsa-amzn
Copy link
Collaborator

We have merged the latest commit from @moshegood and this change has been release in version 1.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants