Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

validate cache consistency on first access #815

Merged

Conversation

jmartin-tech
Copy link
Collaborator

Verify cache on startup without importing files unless rebuild is required.

This improves development workflow to ensure the that in progress plugins are found on cli launch, initial load times in development environments will delay for full rebuild.

Further enhancement to consider not addressed here:

  • user local plugins to tracked in upgrade plugin structure #384
  • targeted cache update of changed/added/removed plugins only (likely slated with user local support)
  • cache update base on filesystem changes when running in interactive mode (no issue yet needs discussion on value)

Testing:

  • A clean install will load the initial plugin file with limited delay.
python -m garak --list_generators

Monitor garak.log during the following:

  • Logs will report rebuilding plugin cache
touch garak/generators/litellm.py; python -m garak --list_generators
  • Logs will report rebuilding plugin cache
  • LiteLLM generators should not be reported in the output.
rm garak/generators/litellm.py; python -m garak --list_generators
  • Logs will report rebuilding plugin cache
  • LiteLLM generators should be reported in the output.
git checkout garak/generators/litellm.py; python -m garak --list_generators

Tests to be added shortly.

* validate on startup
* update on any plugin file change

Signed-off-by: Jeffrey Martin <[email protected]>
Comment on lines 70 to +71
if not os.path.exists(self._user_plugin_cache_filename):
update_user_file = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this is asserted before it's known whether there are any changes present (i.e. before it's known if any data will be written). should the mkdir on L79 etc. be skipped if there won't be changes?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If self._user_plugin_cache_filename does not exist it will be written to even if there are no changes required, this conditional detects a need to copy the file which will be a write. The mkdir is to ensure the full path the file will be placed in exists.

garak/_plugins.py Outdated Show resolved Hide resolved
Comment on lines +121 to +123
if mod_time > user_time:
# rebuild all for now, this can be more made more selective later
return False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

) as cache_file:
json.dump(local_cache, cache_file, cls=PluginEncoder, indent=2)
logging.debug("rebuilding plugin cache")
with PluginCache._mutex:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +157 to +158
sorted_keys = sorted(list(plugin_dict.keys()))
local_cache[plugin_type] = {i: plugin_dict[i] for i in sorted_keys}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this sorting redundant, seeing as the data goes into a set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sorting reduces churn in the package file when we commit updates. In testing of the original addition of the cache I found the order of plugins is not consistent unless sorted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok. this might be relying on non-guaranteed behavior in dict but i guess we can deal with that if it rears its head.

@jmartin-tech jmartin-tech force-pushed the feature/cached-plugin-enum-updates branch 2 times, most recently from 3545818 to 767157e Compare August 1, 2024 15:55
* raises assertion when package plugin cache is not found
* test for removed module file
* test missing package plugin cache

Signed-off-by: Jeffrey Martin <[email protected]>
@jmartin-tech jmartin-tech force-pushed the feature/cached-plugin-enum-updates branch from 767157e to a6db301 Compare August 1, 2024 15:56
Comment on lines +100 to +103
@pytest.mark.skipif(
sys.platform == "win32",
reason="Windows Github executor does not raise the ValueError",
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this add, however the test passes in a Win10 miniconda env on 3.10 and even in github actions on 3.12, which suggests the failure is github runner specific. Will dig deeper at another time and skip the Windows test for now.

@jmartin-tech jmartin-tech merged commit a78c2f8 into NVIDIA:main Aug 1, 2024
8 checks passed
@jmartin-tech jmartin-tech deleted the feature/cached-plugin-enum-updates branch August 1, 2024 16:33
@github-actions github-actions bot locked and limited conversation to collaborators Aug 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants