Release v0.7.1 - Code fixes, splits specification, and metadata validation and handling · MLMI2-CSSI/foundry

This release fixes bugs with loading datasets on init(). That functionality has been removed for now and will return after a more robust refactor in a future release.

In addition to code fixes, we've added the following functionality and improvements:

Users can now request specific splits when calling load_data(). This reduces the download time and RAM needed when only part of a dataset is required.

Ex: tr = f.load_data(split="train")
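To illustrate why requesting a single split saves time and memory, here is a minimal sketch of a split-aware loader. It is not Foundry's implementation: the function `load_data`, the helper `make_demo_dataset`, and the one-JSON-file-per-split layout are all assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def load_data(root, split=None):
    """Hypothetical stand-in for a split-aware loader (not Foundry's code).

    Reads only the file for the requested split; with split=None it reads
    every split found under `root`.
    """
    root = Path(root)
    # Assumed layout: each split lives in its own file, <root>/<split>.json
    available = sorted(p.stem for p in root.glob("*.json"))
    wanted = available if split is None else [split]
    missing = [s for s in wanted if s not in available]
    if missing:
        raise ValueError(f"unknown split(s) {missing}; available: {available}")
    # Only the wanted files are opened and parsed.
    return {s: json.loads((root / f"{s}.json").read_text()) for s in wanted}


def make_demo_dataset():
    """Write a tiny two-split dataset to a temporary directory."""
    root = Path(tempfile.mkdtemp())
    (root / "train.json").write_text(json.dumps({"x": [1, 2, 3], "y": [0, 1, 0]}))
    (root / "test.json").write_text(json.dumps({"x": [4], "y": [1]}))
    return root


if __name__ == "__main__":
    demo_root = make_demo_dataset()
    tr = load_data(demo_root, split="train")
    print(list(tr))  # only the "train" split was read
```

Because the unrequested files are never opened, the cost of a call scales with the size of the split asked for rather than with the whole dataset.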

Dataset metadata is now also validated with explicit error handling, so a user publishing a new dataset is notified immediately if any part of their specification is incompatible with the metadata schema. This improves the experience for both dataset publishers and consumers.
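The idea of validating metadata up front and reporting every problem at once can be sketched as follows. This is an illustration only: the schema fields in `REQUIRED_FIELDS`, the `MetadataValidationError` class, and `validate_metadata` are hypothetical names and do not reflect Foundry's actual metadata schema or API.

```python
# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"short_name": str, "data_type": str, "keys": list}


class MetadataValidationError(ValueError):
    """Raised when metadata does not match the (hypothetical) schema."""


def validate_metadata(metadata):
    """Check metadata against REQUIRED_FIELDS and report all problems together."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing required field '{field}'")
        elif not isinstance(metadata[field], expected):
            actual = type(metadata[field]).__name__
            problems.append(
                f"field '{field}' should be {expected.__name__}, got {actual}"
            )
    if problems:
        # Fail before anything is published, listing every incompatibility.
        raise MetadataValidationError("; ".join(problems))
    return metadata


if __name__ == "__main__":
    try:
        validate_metadata({"short_name": "demo", "keys": "not-a-list"})
    except MetadataValidationError as err:
        print(f"Invalid metadata: {err}")
```

Collecting all problems before raising lets a publisher fix the whole specification in one pass instead of discovering errors one at a time.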

Additionally, this release includes code cleanup, docs improvements, and new applied AI examples.

What's Changed

Metadata error handling by @blue442 in #377
remove breaking changes that load on init by @ascourtas in #406
Split specification by @blue442 in #344
automating api documentation using github action by @blue442 in #342
Removing remnants of XTract by @blaiszik in #355
Merge Split specification from dev into main (#344) by @blaiszik in #356
Update README.md with contributing instructions by @ascourtas in #357
add jingrui examples by @ascourtas in #363
Adds note for quickstart globus set to false by @marshallmcdonnell in #374

Full Changelog: v0.7.0...v0.7.1
