v0.7.1 - Code fixes, splits specification, and metadata validation and handling
This release fixes bugs caused by loading datasets on init(). That functionality has been removed for the time being, pending a more robust refactor in a future release.
In addition to code fixes, we've added the following functionality and improvements:
Users can now request a specific split when calling load_data(). Downloading only that split reduces the time and RAM required when only part of a dataset is needed.
Ex: tr = f.load_data(split="train")
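To illustrate why requesting a single split saves work, here is a minimal standard-library sketch. It is not the library's implementation: `load_splits`, `_reader`, and the in-memory "files" are hypothetical names invented for this example, and the real load_data() may behave differently.

```python
from typing import Callable, Dict, List, Optional

# Records which splits were actually read, so the saving is visible.
reads: List[str] = []


def _reader(name: str, rows: List[int]) -> Callable[[], List[int]]:
    """Hypothetical stand-in for downloading and parsing one split's file."""
    def read() -> List[int]:
        reads.append(name)
        return rows
    return read


def load_splits(
    loaders: Dict[str, Callable[[], List[int]]],
    split: Optional[str] = None,
) -> Dict[str, List[int]]:
    """Read every split, or only the requested one when `split` is given."""
    if split is None:
        return {name: read() for name, read in loaders.items()}
    if split not in loaders:
        raise ValueError(
            f"Unknown split {split!r}; expected one of {sorted(loaders)}"
        )
    # Only the requested split's reader is called; the others stay untouched.
    return {split: loaders[split]()}


loaders = {
    "train": _reader("train", [1, 2, 3]),
    "test": _reader("test", [4, 5]),
}

train_only = load_splits(loaders, split="train")
print(sorted(train_only), reads)
```

In this sketch the "test" reader is never invoked, which is the source of the time and memory saving described above.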
Also, dataset metadata are now validated with appropriate error handling, so a user publishing a new dataset is instantly notified if any part of their specification is incompatible with the metadata schema. This will improve the user experience for both dataset publishers and consumers.
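The idea of validating metadata and reporting every incompatibility at once can be sketched as follows. This is an assumption-laden illustration using only the standard library: the field names in `REQUIRED_FIELDS`, the `MetadataError` class, and `validate_metadata` are hypothetical and do not reflect the actual metadata schema or validation code in this release.

```python
from typing import Any, Dict, List

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {
    "short_name": str,
    "data_type": str,
    "splits": list,
}


class MetadataError(ValueError):
    """Raised when metadata does not match the (hypothetical) schema."""


def validate_metadata(metadata: Dict[str, Any]) -> None:
    """Collect all schema problems and raise one error that lists them."""
    problems: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(metadata[field], expected):
            problems.append(
                f"field '{field}' should be {expected.__name__}, "
                f"got {type(metadata[field]).__name__}"
            )
    if problems:
        raise MetadataError("Invalid dataset metadata: " + "; ".join(problems))


# A valid specification passes silently.
validate_metadata(
    {"short_name": "demo", "data_type": "tabular", "splits": ["train"]}
)

# An invalid one fails with a message naming each problem.
try:
    validate_metadata({"short_name": "demo", "splits": "train"})
except MetadataError as err:
    message = str(err)
    print(message)
```

Collecting all problems before raising lets a publisher fix the whole specification in one pass instead of discovering errors one at a time.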
Additionally, this release includes code cleanup, docs improvements, and new applied AI examples.
What's Changed
- Metadata error handling by @blue442 in #377
- remove breaking changes that load on init by @ascourtas in #406
- Split specification by @blue442 in #344
- automating api documentation using github action by @blue442 in #342
- Removing remnants of XTract by @blaiszik in #355
- Merge Split specification from dev into main (#344) by @blaiszik in #356
- Update README.md with contributing instructions by @ascourtas in #357
- add jingrui examples by @ascourtas in #363
- Adds note for quickstart globus set to false by @marshallmcdonnell in #374
Full Changelog: v0.7.0...v0.7.1