[aws] [s3] Introduce ignore_older & start_timestamp for S3 input allowing better registry cleanups #41817
Draft: Kavindu-Dodan wants to merge 8 commits into elastic:main from Kavindu-Dodan:feat/s3-input-start-time-and-ignore-old
+831
−57
Kavindu-Dodan added the labels enhancement, Team:obs-ds-hosted-services (Label for the Observability Hosted Services team) and backport-8.x (Automated backport to the 8.x branch with mergify) on Nov 27, 2024
botelastic (bot) added and removed the needs_team label (Indicates that the issue/PR needs a Team:* label) on Nov 27, 2024
6 tasks
Kavindu-Dodan force-pushed the feat/s3-input-start-time-and-ignore-old branch 2 times, most recently from 4924d70 to 79ae2c1 on November 27, 2024 22:32
Kavindu-Dodan commented on Nov 27, 2024
@@ -115,6 +115,7 @@ filebeat.inputs:
- Add support to source AWS cloudwatch logs from linked accounts. {pull}41188[41188]
- Jounrald input now supports filtering by facilities. {pull}41061[41061]
- Add support to include AWS cloudwatch linked accounts when using log_group_name_prefix to define log group names. {pull}41206[41206]
- AWS S3 input registry cleanup for untracked s3 objects. {pull}41694[41694]
Signed-off-by: Kavindu Dodanduwa <[email protected]>
# Conflicts:
#	x-pack/filebeat/input/awss3/states.go
#	x-pack/filebeat/input/awss3/states_test.go
…them Signed-off-by: Kavindu Dodanduwa <[email protected]>
Kavindu-Dodan force-pushed the feat/s3-input-start-time-and-ignore-old branch from 52fad61 to 6f5472c on December 3, 2024 23:06
Proposed commit message
Introduce ignore_older and start_timestamp properties to the AWS S3 input. This is a follow-up for #41694.
The configurations introduced here act as input object filters. If an object fails to match the derived filters, its entry is cleaned up from the registry, reducing filebeat memory consumption.
The introduced configurations are:
- ignore_older: a look-back duration; objects that fall outside it when a polling run starts are ignored
- start_timestamp: a timestamp; objects last modified before it are ignored
For both settings, the object's last modified timestamp is used for the comparison. See the Use cases section for further explanation.
Note: this is a follow-up for #41694, so the diff also contains its changes.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Disruptive User Impact
None, as the defaults are disabled. However, when the configurations introduced here are used, the following can affect the user:
- If start_timestamp is defined, objects with a last modified timestamp prior to that timestamp are ignored from processing (documented; see footnote 1).
- If ignore_older is defined, objects that do not fall within the look-back period when processing started (polling run) are ignored (documented; see footnote 1).
- If both start_timestamp & ignore_older are defined, the initial run will process all entries up to start_timestamp. Subsequent runs will not include entries that do not fall within ignore_older, even if processing failed for an object (documented; see footnote 1).
How to test this PR locally
Try different values of ignore_older & start_timestamp to see how data ingestion changes with them. See the Use cases section for further explanation.
Related issues
aws-s3 input's bucket polling accumulates state in the registry #39116
Use cases
Consider the diagrams below, where there are 3 objects, Object A, Object B and Object C, with respective last modified timestamps t1, t2 and t3.
Then consider how filebeat processes objects and tracks registry entries in the following scenarios.
Default behavior
If none of the configurations are used, filebeat processes all objects and the internal registry tracks them continuously unless they are removed from the bucket.
Use start_timestamp
If start_timestamp is used, objects newer than the timestamp are accepted for processing. The registry will grow unless objects are removed from the bucket.
Use ignore_older
If ignore_older is defined, the input processes objects within the provided duration, calculated from the current time. The registry tracks objects within the current timeframe; others are eventually cleaned up by subsequent runs.
Use both ignore_older & start_timestamp
If both properties are defined,
- … ignore_older duration).
- … ignore_older duration.
Footnotes
1. https://github.com/elastic/beats/pull/41817/files#diff-422765b7341c5bbf6de7af38927e34e00a5073b188585a7af3c4fee1175b64a6R574-R597
2. https://github.com/Kavindu-Dodan/data-gen