Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

955 - Download Config Validation for Batch Jobs #1016

Merged

Conversation

avrohomgottlieb
Copy link
Contributor

@avrohomgottlieb avrohomgottlieb commented Dec 4, 2024

Issue Number

Closes #955

Purpose/Implementation Notes

This PR removes the submitting of invalid jobs to Batch.

We had originally just submitted all project (or sample) download_config combinations to Batch, and let the invalid ones return early. However, spinning up Batch instances with jobs that will not create a computed file is not an optimal solution (and is not cost effective). As such, this PR tightens up dispatch_to_batch by first validating which download configs produce libraries with projects and samples and only then sending them through.

Changes are also applied to loader::generate_computed_files to maintain consistency among the load_data and dispatch_to_batch+generate_computed_file workflows.

The patches in test_loader had to be slightly refactored in order to accommodate for this change as well. When the dust settled, the constants GENERATED_PROJECT_DOWNLOAD_CONFIGS and GENERATED_SAMPLE_DOWNLOAD_CONFIGS were removed because they were no longer necessary.

Types of changes

What types of changes does your code introduce?

  • New feature (non-breaking change which adds functionality)

Functional tests

List out the functional tests you've completed to verify your changes work locally.

Checklist

  • Lint and unit tests pass locally with my changes

Screenshots

N/A

@avrohomgottlieb avrohomgottlieb changed the title Avrohom/955 add batch job download config validation 955 - Add batch job download config validation Dec 4, 2024
@avrohomgottlieb avrohomgottlieb changed the title 955 - Add batch job download config validation 955 - Download Config Validation for Batch Jobs Dec 4, 2024
…r used GENERATED_{PROJECT, SAMPLE}_DOWNLOAD_CONFIGS constants from common.py
@avrohomgottlieb
Copy link
Contributor Author

avrohomgottlieb commented Dec 5, 2024

Because this PR touches Library::get_project_libraries_from_download_config and Library::get_sample_libraries_from_download_config, I'd like to take the opportunity to move the first method to the Project model and change the name to Project::get_libraries, and the same for the second method, renaming it Sample::get_libraries.

@avrohomgottlieb avrohomgottlieb marked this pull request as ready for review December 5, 2024 17:34
@davidsmejia
Copy link
Contributor

Because this PR touches Library::get_project_libraries_from_download_config and Library::get_sample_libraries_from_download_config, I'd like to take the opportunity to move the first method to the Project model and change the name to Project::get_libraries, and the same for the second method, renaming it Sample::get_libraries.

This sounds good to me, we should make download_config optional then and return all libraries if not passed.

Comment on lines 156 to 157
if not libraries.exists():
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of deleting this we should throw an error here. We should test that we fail this as well.

@@ -269,6 +269,13 @@ def get_download_config_file_paths(self, download_config: Dict) -> List[Path]:
file_path for file_path in data_file_path_objects if file_path.parent.name != "merged"
]

def get_valid_download_configs(self) -> List[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's break this up into two different properties project.valid_download_configs and project.valid_download_config_names

if not libraries.exists():
return
raise ValueError(
"Invalid request: no libraries exist with this sample-download_config combination."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Invalid request: no libraries exist with this sample-download_config combination."
"Unable to find libraries for download_config."

if not libraries.exists():
return
raise ValueError(
"Invalid request: no libraries exist with this project-download_config combination."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Invalid request: no libraries exist with this project-download_config combination."
"Unable to find libraries for download_config."

return self.libraries.all()

if download_config not in common.PROJECT_DOWNLOAD_CONFIGS.values():
raise ValueError("Invalid download configuration passed. Unable to retrieve libraries.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Invalid download configuration passed. Unable to retrieve libraries.")
raise ValueError("Invalid download_config passed. Unable to retrieve libraries.")

return self.libraries.all()

if download_config not in common.SAMPLE_DOWNLOAD_CONFIGS.values():
raise ValueError("Invalid download configuration passed. Unable to retrieve libraries.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError("Invalid download configuration passed. Unable to retrieve libraries.")
raise ValueError("Invalid download_config passed. Unable to retrieve libraries.")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's replace the download config implementations with references to common here.

@avrohomgottlieb avrohomgottlieb merged commit 7368829 into dev Dec 6, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add download config validation for Batch job submission
2 participants