Update existing lists #144

Status: Open, 1 commit, base branch: main

bsyk (Contributor) commented Dec 1, 2024

To avoid having to delete and recreate all lists, we can update existing lists only when their contents have changed.

This processes the diff between the desired list of domains and the existing lists, removing entries that are no longer in the desired list and appending any new ones. To minimize the number of PATCH calls, it prefers to append new entries to lists that are already being patched for removals.

The priority for additions is:

  1. Add to lists we're already patching for removals, filling up to LIST_ITEM_SIZE entries.
  2. Add to existing lists with fewer than LIST_ITEM_SIZE entries.
  3. Create a new list.
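A minimal sketch of the planning step described above, assuming the existing lists have already been fetched into a dict mapping a list id to its domains. `plan_updates`, `Patch`, and the tiny value of `LIST_ITEM_SIZE` are hypothetical and are not the PR's actual code; no API calls are made.

```python
from dataclasses import dataclass, field

# Hypothetical capacity, kept tiny for the example; the project defines its own LIST_ITEM_SIZE.
LIST_ITEM_SIZE = 3


@dataclass
class Patch:
    """Changes to send in one PATCH request for an existing list."""
    remove: list[str] = field(default_factory=list)
    append: list[str] = field(default_factory=list)


def plan_updates(existing: dict[str, list[str]], desired: set[str],
                 size: int = LIST_ITEM_SIZE) -> tuple[dict[str, Patch], list[list[str]]]:
    """Return (patches keyed by existing list id, item lists for new lists to create)."""
    # Domains already present in any existing list are never re-added.
    present = {d for items in existing.values() for d in items}
    to_add = sorted(desired - present)  # sorted only to make the output deterministic
    patches: dict[str, Patch] = {}
    free: dict[str, int] = {}  # spare capacity per list once its removals are applied

    for list_id, items in existing.items():
        stale = [d for d in items if d not in desired]
        if stale:
            patches[list_id] = Patch(remove=stale)
        free[list_id] = size - (len(items) - len(stale))

    def fill(list_ids: list[str]) -> None:
        # Move as many pending additions as fit into each list, in order.
        for list_id in list_ids:
            take = min(free[list_id], len(to_add))
            if take > 0:
                patches.setdefault(list_id, Patch()).append.extend(to_add[:take])
                del to_add[:take]
                free[list_id] -= take

    # 1. Lists we are already patching for removals.
    fill(list(patches))
    # 2. Other existing lists that still have spare capacity.
    fill([list_id for list_id in existing if list_id not in patches])
    # 3. Whatever is left goes into new lists.
    new_lists = [to_add[i:i + size] for i in range(0, len(to_add), size)]
    return patches, new_lists
```

Lists that receive neither removals nor additions get no entry in `patches`, so no PATCH is sent for them; that is where the request savings come from.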

bsyk (Contributor, Author) commented Dec 1, 2024

This is quite a large diff. Happy to iterate if there are things you want to change.

vietthedev (Contributor) commented

I'm not sure list patching can work reliably. After a blocklist update, a domain can end up in a different list than before, which would mess up the patching process.

For example: we have two lists and the first one is full. If the blocklist update adds a new domain to the first list, the first list's last domain would be pushed into the second list. We wouldn't be able to check for that domain's existence without going through all the current lists.

Before blocklist update

1st list
domainA
domainAA
...
domainB

2nd list
domainC
domainD

After blocklist update

1st list
domainA
domainAA
domainAAA
...

2nd list
domainB
domainC
domainD

Things become even more complicated if the next update comes from a different set of blocklists.

Therefore, the only way to make this work reliably is to go through all the current lists, but that would defeat the purpose of saving time and requests, because there isn't an API to get all items of all the lists.
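The shift in the example above can be reproduced with a small sketch, assuming domains are kept sorted and split into consecutive lists of a fixed size (3 here, purely for illustration). `chunk` and `changed_positions` are hypothetical helpers: comparing the lists position by position flags both lists as changed even though only one domain was added.

```python
def chunk(domains: list[str], size: int) -> list[list[str]]:
    """Split an ordered domain list into consecutive lists of at most `size` items."""
    return [domains[i:i + size] for i in range(0, len(domains), size)]


def changed_positions(before: list[list[str]], after: list[list[str]]) -> list[int]:
    """Indices of lists whose contents differ when compared position by position."""
    count = max(len(before), len(after))
    return [i for i in range(count)
            if (before[i] if i < len(before) else None) != (after[i] if i < len(after) else None)]


before = chunk(sorted(["domainA", "domainAA", "domainB", "domainC", "domainD"]), 3)
after = chunk(sorted(["domainA", "domainAA", "domainAAA", "domainB", "domainC", "domainD"]), 3)
print(before)  # [['domainA', 'domainAA', 'domainB'], ['domainC', 'domainD']]
print(after)   # [['domainA', 'domainAA', 'domainAAA'], ['domainB', 'domainC', 'domainD']]
print(changed_positions(before, after))  # [0, 1]
```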

bsyk (Contributor, Author) commented Dec 3, 2024

This change does check all existing lists. It is reliable and idempotent: if an error occurs, running the same command again completes any missing changes.
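A sketch of why comparing membership across every list makes the update independent of which list a domain happens to sit in, and why re-running it is safe. `diff_lists` and `apply_diff` are hypothetical helpers; `apply_diff` is an in-memory stand-in for the PATCH calls and ignores the LIST_ITEM_SIZE limit for brevity.

```python
def diff_lists(existing: dict[str, list[str]],
               desired: set[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Compare membership across all existing lists, not list by list."""
    present = {d for items in existing.values() for d in items}
    removals: dict[str, list[str]] = {}
    for list_id, items in existing.items():
        stale = [d for d in items if d not in desired]
        if stale:
            removals[list_id] = stale
    additions = sorted(desired - present)
    return removals, additions


def apply_diff(existing: dict[str, list[str]], removals: dict[str, list[str]],
               additions: list[str], target: str) -> dict[str, list[str]]:
    """In-memory stand-in for the PATCH calls (hypothetical; ignores LIST_ITEM_SIZE)."""
    updated = {}
    for list_id, items in existing.items():
        stale = set(removals.get(list_id, []))
        updated[list_id] = [d for d in items if d not in stale]
    updated[target] = updated[target] + additions
    return updated


# domainB stays in list1; only domainAAA is reported as new.
existing = {"list1": ["domainA", "domainAA", "domainB"], "list2": ["domainC", "domainD"]}
desired = {"domainA", "domainAA", "domainAAA", "domainB", "domainC", "domainD"}
removals, additions = diff_lists(existing, desired)
print(removals, additions)  # {} ['domainAAA']
updated = apply_diff(existing, removals, additions, target="list2")
print(diff_lists(updated, desired))  # ({}, []) -- a second run has nothing left to do
```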

The benefit of this change is not only (slightly) faster operations but also avoiding any period in which the lists or rules are not applied, which would leave the network and its users able to reach hosts that should be blocked.

In the worst case, this flow makes at most 2x number_of_lists requests (a GET and a PATCH for every list).
In the normal case it makes more than 1x but fewer than 2x number_of_lists requests, since it would be unusual to have to patch every list.

The current flow makes 2x number_of_lists requests in the normal case (a DELETE and a POST to recreate every list).
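The request counts above, written out as a small sketch. It assumes one GET per existing list, one PATCH per list that changes, and one POST per newly created list for the patching flow, and one DELETE plus one POST per list for the current flow; any requests to enumerate the lists or update rules are not counted. Both function names are hypothetical.

```python
def patch_flow_requests(num_lists: int, lists_changed: int, new_lists: int = 0) -> int:
    """GET every existing list, PATCH only the changed ones, POST any new lists."""
    if not 0 <= lists_changed <= num_lists:
        raise ValueError("lists_changed must be between 0 and num_lists")
    return num_lists + lists_changed + new_lists


def recreate_flow_requests(num_lists: int) -> int:
    """DELETE every existing list, then POST it again."""
    return 2 * num_lists


print(patch_flow_requests(100, 10))   # 110
print(patch_flow_requests(100, 100))  # 200 (worst case)
print(recreate_flow_requests(100))    # 200
```

For example, with 100 lists of which 10 change, the patching flow needs 110 requests against 200 for delete-and-recreate; only when every list changes do the two flows cost the same.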
