Publishing rework #451
…-cov-conflict downgrade pytest-cov to ~=2.12.1 to fix conflict with flake8
Update setup and requirements to remove ML packages
temporarily include jsonschema in tests
…tests remove references to dlhub in test files
Release updates for notebooks
summer student work and additional progress, including: - code coverage - bubble vis of repo - bug fixes - project README changes and improvements - OOP refactoring - test updates - updated example notebooks
… remove dataframe building section at end, clean up installs
add joblib to setup.py and requirements.txt, increment to foundry ver…
for oqmd notebook, remove reference to f.describe(), add train split,…
_read_json was printing a debug data frame. Removed.
* add initial directory-making functionality * add acl permission setting * add PUT request logiv and ACL setting, plus TODOs * add logic to delete acl rule after creation * add try/except handling to acl creation * add prepare query param so we don't need to make dirs; fix bug when rule_id is not set * clean up path joining logic, as well as comments * add capability to upload all files in a folder, instead of one individual file * update endpoint destination to use a UUID as the folder name * break out acl rule adding to its own function, tidy up * break out PUT request functionality * break out upload_folder() into upload_file() and integrate https functions into publish(), with proper params * change endpoint to NCSA, make usage more modular; small os.path bug fixes * reorder functions to be easier to read * add upload capability for single file, with error handling * fix logic bugs with destination path setting s.t. all subfolders are written to destination * cleanup var names in upload_folder() logic; making endpoint_dest path more robust * code cleanup and breakout helper functions to reduce size of publish() * add parameter checks to publish() and reduce param complexity * add docstrings, plus add test param to publish() * appease flake8 * add one more flake8 fix * fix auths in tests, add system test for HTTPS publication, small comments * add system test for HTTPS upload * break out https publishing into more unit-testable method * refactor function defs to work better for testing; add https upload unit test * fix bug where artifact was written to uploaded dataset * update os.walk block comparison to be more robust * update publish() docstring and add type hints * clean up imports, fix type hint for Response, add some context for Xtract file * WIP to separate helpers into submodule -- need to fix test and method design * fix typing discrepancy for requests.Response * update modification date * Temporarily remove ACL rule creation for https upload * Fix 
flake8 comment error * Fix flake8 once more * Fixing local tests, flake8, kwargs * Adding test data * Debug result on GHA * Debug result on GHA * Debug result on GHA * Debug result on GHA * add Ben's patch to submodule * generalize the included functions and move make_globus_link here from foundry object * move make_globus_link function to submodule * update tests to generalized input format * properly pass 'auths' object between functions * update modification date * prepend underscore to private function * correct call to upload_to_endpoint() in foundry.py * re-add ACL rule logic * update auth passing to be more user-friendly; includes test changes * Introduce a collection to hold authorizers It uses a dataclass so that we can annotate the type of authorizers that the tuple, then document them I put it in a new module, `foundry.auth` so that it can be used by both the foundry module and the https_upload module (avoiding circular dependencies) * alter args such that it's not possible for the user to have endpoint_id and gcs_auth_client misalign * change language to endpoint_auth_clients for clarity of purpose * docstring updates --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]>
* add initial directory-making functionality * add acl permission setting * add PUT request logiv and ACL setting, plus TODOs * add logic to delete acl rule after creation * add try/except handling to acl creation * add prepare query param so we don't need to make dirs; fix bug when rule_id is not set * clean up path joining logic, as well as comments * add capability to upload all files in a folder, instead of one individual file * update endpoint destination to use a UUID as the folder name * break out acl rule adding to its own function, tidy up * break out PUT request functionality * break out upload_folder() into upload_file() and integrate https functions into publish(), with proper params * change endpoint to NCSA, make usage more modular; small os.path bug fixes * reorder functions to be easier to read * add upload capability for single file, with error handling * fix logic bugs with destination path setting s.t. all subfolders are written to destination * cleanup var names in upload_folder() logic; making endpoint_dest path more robust * code cleanup and breakout helper functions to reduce size of publish() * add parameter checks to publish() and reduce param complexity * add docstrings, plus add test param to publish() * appease flake8 * add one more flake8 fix * fix auths in tests, add system test for HTTPS publication, small comments * add system test for HTTPS upload * break out https publishing into more unit-testable method * refactor function defs to work better for testing; add https upload unit test * fix bug where artifact was written to uploaded dataset * update os.walk block comparison to be more robust * update publish() docstring and add type hints * clean up imports, fix type hint for Response, add some context for Xtract file * WIP to separate helpers into submodule -- need to fix test and method design * fix typing discrepancy for requests.Response * update modification date * Temporarily remove ACL rule creation for https upload * Fix 
flake8 comment error * Fix flake8 once more * Fixing local tests, flake8, kwargs * Adding test data * Debug result on GHA * Debug result on GHA * Debug result on GHA * Debug result on GHA * add Ben's patch to submodule * generalize the included functions and move make_globus_link here from foundry object * move make_globus_link function to submodule * update tests to generalized input format * properly pass 'auths' object between functions * update modification date * prepend underscore to private function * correct call to upload_to_endpoint() in foundry.py * re-add ACL rule logic * update auth passing to be more user-friendly; includes test changes * Introduce a collection to hold authorizers It uses a dataclass so that we can annotate the type of authorizers that the tuple, then document them I put it in a new module, `foundry.auth` so that it can be used by both the foundry module and the https_upload module (avoiding circular dependencies) * alter args such that it's not possible for the user to have endpoint_id and gcs_auth_client misalign * change language to endpoint_auth_clients for clarity of purpose * docstring updates * fix bug from last round of review edits --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]>
* update publishing notebook example to use HTTPS upload primarily, along with minor fixes * add https upload methods and data * fix function call to publish_dataset * remove ACL rule code to fix error issue * update globus images in notebook * remove commented code * add missing scopes * appease flake overlords * Add search lambda authorizer (sl_authorizer) to dlhub_client instantiation * removed unnecessary scopes * update curation info in notebook --------- Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: isaac-darling <[email protected]> Co-authored-by: Logan Ward <[email protected]> Co-authored-by: Eric Blau <[email protected]>
-- adding back in dev
* adding ability to specify splits for loading * refining test * Update splits_to_load --> splits --------- Co-authored-by: blaiszik <[email protected]>
* update to 0.6.0 for HTTPS pub * Upload Foundry class load() function to default download using https (#340) * Update setup.py Fix version number for pyPI deploy * Update setup.py version for pyPI * Update requirements.txt to latest DLHub SDK This is needed to require upgrade of DLHub SDK for Foundry users when they upgrade Foundry. * Update version to 0.6.3 * incorporate load() to foundry.__init__() * automating api documentation using github action (#342) * CI: Automated documentation build * Removing remnants of XTract * CI: Automated documentation build * CI: Automated documentation build * Update README.md with contributing instructions (#357) * Update README.md with contributing instructions * Update PR language * merging in split specification * flake fixes * add jingrui examples (#363) * removed blank line * Load on init (#358) * incorporate load() to foundry.__init__() * merging in split specification * flake fixes * removed blank line * Adds note for quickstart globus set to false * Validating metadata before publishing * remove arguments from Foundry object that are duplicated with base class * Update setup.py version to 0.7.0 * refactor foundry to separate foundry instance from dataset objects * fine tuning search functionality * removing redefinition of FoundryDataset * address comments in PR review * remove unused import * updgrade setup-python from v2 to v4 * upgrade other setup-python from v2 to v4 * modify limit test --------- Co-authored-by: ascourtas <[email protected]> Co-authored-by: Ben Blaiszik <[email protected]> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Marshall McDonnell <[email protected]>
…ig to ignore said subfolder for line length, updated README.md
Codecov Report
@@           Coverage Diff           @@
##             dev     #451   +/-   ##
======================================
  Coverage       ?   74.01%
======================================
  Files          ?       12
  Lines          ?      943
  Branches       ?        0
======================================
  Hits           ?      698
  Misses         ?      245
  Partials       ?        0
Addressing #404: adapted the existing infrastructure to allow publishing a `FoundryDataset` object, as opposed to the previous approach of passing arguments to the `Foundry.publish()` function. This allows better validation of the required components, in particular the objects containing the datacite and metadata information.

To facilitate validation of the datacite and metadata information, the `dc_model.py` and `project_model.py` files were created using the datamodel code generator package, which took the existing jsonschema definitions for these objects from the MDF schema repository and created a pydantic representation to be used in the validation. To ensure reproducibility for any future changes to the schemas, the resulting .py files from the datamodel code generator were left fully intact. They are used as superclasses for the `FoundrySchema` and `FoundryDatacite` model definitions in the `models.py` file, which add the logic required for the pydantic classes to work well with the Foundry package.

The `dataset_publishing.ipynb` notebook was updated to reflect the new approach, but will require additional work once #403 is undertaken.
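
For reviewers who want a picture of the pattern described above, here is a minimal sketch. It is not the code in this PR: it uses stdlib dataclasses as a stand-in for the generated pydantic models so that it runs without dependencies, and every name in it (`GeneratedDatacite`, `DemoDatacite`, `DemoDataset`, `publish`, and the field names) is hypothetical. The idea it illustrates is that the generated class stays untouched, a subclass adds the package-specific checks, and publishing takes one validated object instead of loose arguments.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class GeneratedDatacite:
    """Stand-in for an auto-generated model; left untouched in this sketch."""
    titles: List[Dict[str, str]]
    creators: List[Dict[str, str]]


@dataclass
class DemoDatacite(GeneratedDatacite):
    """Hypothetical subclass that layers package-specific checks on top."""

    def __post_init__(self) -> None:
        # Checks run at construction time, so a bad object never exists.
        if not self.titles:
            raise ValueError("datacite needs at least one title")
        if not self.creators:
            raise ValueError("datacite needs at least one creator")


@dataclass
class DemoDataset:
    """Hypothetical dataset object bundling a name with validated metadata."""
    dataset_name: str
    datacite: DemoDatacite


def publish(dataset: DemoDataset) -> Dict[str, Any]:
    """Build a submission payload from an already-validated dataset object."""
    return {"name": dataset.dataset_name, "dc": asdict(dataset.datacite)}


good = DemoDataset(
    dataset_name="demo_dataset",
    datacite=DemoDatacite(
        titles=[{"title": "Demo dataset"}],
        creators=[{"creatorName": "Doe, Jane"}],
    ),
)
payload = publish(good)

# Invalid metadata is rejected when the object is built, before publish() runs.
try:
    DemoDatacite(titles=[], creators=[{"creatorName": "Doe, Jane"}])
    rejected = False
except ValueError:
    rejected = True
```

Keeping the generated base class unedited means it can be regenerated from the schema without losing the hand-written checks, which live only in the subclass.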