Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TTL support #144

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Add TTL support #144

wants to merge 8 commits into from

Conversation

exoego
Copy link

@exoego exoego commented Aug 18, 2021

Closes #103

  • Run background "scanner" job: using setInterval to support legacy Node.js not having Worker threads
  • The job scans items with the TTL attribute. Then, it delete the expired items.

TODO

  • Enable TTL Scanner background job only if TTL is enabled
  • Configurable scanning job interval
  • Delete items from GSI/LSI too
  • Testing with a table created w/ TTL, not via UpdateTimeToLive

@exoego exoego force-pushed the ttl-support branch 2 times, most recently from 9853879 to a0dae69 Compare August 22, 2021 07:17
@exoego exoego marked this pull request as ready for review August 22, 2021 07:17
@exoego
Copy link
Author

exoego commented Aug 22, 2021

I think This PR is now ready to be reviewd.

Also, I recommend to setup GitHub Actions via #146

db/index.js Outdated
})
})
})
}, 1000)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how long the expiry job may run if there are many tables or large tables. 🤔

I occasionally dump a lot of data into dynalite in order to exercise business logic code, in which case, table scan can easily take longer than 1s.

Btw., aws dynamodb expiry is not guaranteed to happen on the second, in fact it can be delayed by hours even a day.

Copy link
Author

@exoego exoego Aug 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw., aws dynamodb expiry is not guaranteed to happen on the second, in fact it can be delayed by hours even a day.

Yeah, that is true.
According to the docs, "TTL typically deletes expired items within 48 hours of expiration".

I am considering to make it configurable.
Currently, I set it 1000 to minimize test time.

table scan can easily take longer than 1s.

I am also considering to optimize table scan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per my comments below, I do still share overall concerns about efficiency – moreso seeing that folks like @dimaqq dump large amounts of data into this thing. I def think this can get figured out though!

@reutsharabani
Copy link

Any chance this is merged soon?

README.md Outdated Show resolved Hide resolved
@CLAassistant
Copy link

CLAassistant commented Aug 30, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@ryanblock ryanblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is rad – I love where you're going (where you went?) here. I know it's been a while, but I'd love to try to get this over the finish line!

db/index.js Outdated
var currentUnixSeconds = Math.round(Date.now() / 1000)

function logError(err, result) {
if (err) console.error("@@@", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk about @@@ but we should plug this into #10 (or whatever our verbose logging is, when ready).

db/index.js Outdated
}, {})
var data = {TableName: name, Key: itemKey}
var cb = function (err) {
// Noop ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prob a verbose log?

if (err) console.error("@@@", err)
}

lazyStream(tableDb.createKeyStream({}), logError)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a completely ignorant comment but is it possible this is an inefficient way to query our table names?

Overall I'm very curious as to how expensive this operation is to run. It's probably not likely to blow up a modern machine, but it's O(n^lots). I wonder if a more efficient approach would be to create an internal ttl index and query that instead of streaming out + operating on the contents of each table.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have time to create such an index for efficient traversing.

})
})
})
}, 3000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible this could be reliably reduced? Those seconds add up in test suites!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the fixed setTimeout(..., 3000) with setInterval(... 50) in 669eea4

On my machine, the test ends somewhere between 1900-2000 ms.

@ryanblock ryanblock marked this pull request as draft August 30, 2023 02:30
@tianshu123
Copy link

Hi, it looks to me that everything is ready to be merged. May I ask when will all the commits be merged?

@ryanblock
Copy link
Member

@tianshu123 we'll take next steps once I've heard back from @exoego. I haven't seen any remarks or resolved conversations.

@exoego exoego marked this pull request as ready for review November 28, 2023 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Does dynalite support TTL?
6 participants