Feature Request: Add options pertaining to snapshot expire schedule as part of config #151

Open
rams3sh opened this issue May 19, 2023 · 2 comments

Comments

rams3sh commented May 19, 2023

As discussed on Discord, some community members, including me, have been facing inconsistent timeouts and errors during the snapshot expiry process.

There seems to be a bug in Athena, and I have raised a case for it. In parallel, to work around the timeouts, I experimented with changing the schedule and the Step Functions timeout from 1 hour to 30 minutes, and it worked well for me. With this new setting and my volume of logs, expiry runs for about 16 minutes on average before failing with ICEBERG_VACUUM_MORE_RUNS_NEEDED, and the subsequent query then executes successfully in about 30 seconds on average.
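The fail-then-succeed pattern above can be sketched as a small retry loop. This is only an illustration, not how Matano runs expiry: `run_vacuum` is a hypothetical callable, assumed to execute the VACUUM statement and return an empty string on success or the Athena error message on failure.

```python
import time
from typing import Callable

MORE_RUNS_NEEDED = "ICEBERG_VACUUM_MORE_RUNS_NEEDED"


def vacuum_until_done(
    run_vacuum: Callable[[], str],
    max_runs: int = 5,
    pause_seconds: float = 0.0,
) -> int:
    """Re-run VACUUM while Athena reports that more runs are needed.

    run_vacuum is a hypothetical helper (an assumption of this sketch):
    it returns "" on success, or the error message on failure.
    Returns the number of runs it took to finish.
    """
    for attempt in range(1, max_runs + 1):
        error = run_vacuum()
        if not error:
            return attempt
        if MORE_RUNS_NEEDED not in error:
            # Any other failure is not retried here.
            raise RuntimeError(f"VACUUM failed: {error}")
        time.sleep(pause_seconds)
    raise RuntimeError(f"VACUUM still incomplete after {max_runs} runs")
```

Capping the number of runs keeps a stuck table from looping forever; the right cap would depend on log volume.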

It would be helpful if the schedule and the Step Functions timeout were exposed as part of the config, so that users can find the sweet spot where expiry works as expected for the size of the logs they ingest. This would also help in managing the Athena issue until it gets resolved.
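Until such a config option exists, the timeout half of this could be adjusted by hand. The sketch below assumes the expiry state machine's Amazon States Language definition is available as a JSON string and sets its top-level `TimeoutSeconds` field; the helper name `with_timeout` and the sample definition are made up for illustration.

```python
import json


def with_timeout(definition: str, minutes: int) -> str:
    """Return a copy of an ASL definition with a top-level timeout set."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    machine = json.loads(definition)
    # TimeoutSeconds at the top level bounds the whole execution.
    machine["TimeoutSeconds"] = minutes * 60
    return json.dumps(machine, indent=2)


# Hypothetical definition, only to show the shape of the change.
sample = json.dumps(
    {"StartAt": "Expire", "States": {"Expire": {"Type": "Succeed"}}}
)
updated = with_timeout(sample, 30)
```

The updated definition could then be applied with the Step Functions `update_state_machine` API call. A manual change like this may be overwritten the next time the stack is deployed.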

B161851 commented Mar 15, 2024

@rams3sh I am facing the same issue, but I didn't understand what you are suggesting. Can you please tell me how to avoid the timeout errors while running VACUUM from the Step Function? Please post a code snippet showing how to add the timeout variable.

rams3sh commented Mar 19, 2024

@B161851 There is an inherent issue in Athena that causes the timeouts.

As a workaround, I manually updated the EventBridge schedule that runs the VACUUM command, as there is no parameter in the Matano config to do it from the CLI. Running VACUUM more often means less data accumulates for each run to clean up. Please note that this is only a temporary fix; a permanent fix can only come from the Athena team at AWS.

As you add more data sources, you may start seeing the timeout even with the shorter interval.
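For anyone who wants to script the manual EventBridge change instead of using the console, here is a minimal sketch. It assumes a boto3 EventBridge client (`boto3.client("events")`) is passed in, and the rule name is something you would have to look up in your own deployment. Because `put_rule` replaces the rule rather than patching it, the existing fields are read first and carried over.

```python
def rate_expression(minutes: int) -> str:
    """Build an EventBridge rate expression such as 'rate(30 minutes)'."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    # EventBridge requires the singular unit when the value is 1.
    unit = "minute" if minutes == 1 else "minutes"
    return f"rate({minutes} {unit})"


def reschedule_rule(events_client, rule_name: str, minutes: int) -> str:
    """Change the schedule of an existing rule, keeping its other settings.

    events_client is assumed to be boto3.client("events"); rule_name is
    whatever your deployment named the expiry rule (not given in this thread).
    """
    current = events_client.describe_rule(Name=rule_name)
    params = {
        "Name": rule_name,
        "ScheduleExpression": rate_expression(minutes),
    }
    # Fields omitted from put_rule are reset, so copy over the ones that exist.
    for key in ("State", "Description", "RoleArn", "EventBusName"):
        if current.get(key):
            params[key] = current[key]
    events_client.put_rule(**params)
    return params["ScheduleExpression"]
```

Usage would look like `reschedule_rule(boto3.client("events"), "<your-rule-name>", 30)`. The rule's targets are left untouched, and a later redeploy may reset the schedule.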
