
v0.4.0

@nmichlo released this 31 Mar 10:32

Major Additions

  • Added disent.dataset.DisentIterDataset to complement DisentDataset, intended for iterable datasets without a defined length.
  • Added Cars3d64Data and SmallNorb64Data to disent.dataset.data. These classes are optimised versions of their respective datasets with their transforms pre-computed. This is much faster than resizing the observations during training, since most disentanglement benchmarks are based on datasets with 64x64 observations.
  • Added disent.dataset.sampling.GroundTruthRandomWalkSampler. This ground-truth dataset sampler simulates random walks through the factor space. For example, if there are two ground-truth factors x and y corresponding to a grid, this sampler simulates an agent randomly moving around that grid (see the sketch after this list).
  • Improvements to the registry. Augments, reconstruction losses and latent distributions can now be registered with disent using disent.registry.KERNELS, disent.registry.RECON_LOSSES and disent.registry.LATENT_HANDLERS. This affects:
    • disent.frameworks.helper.latent_distributions.make_latent_distribution
    • disent.frameworks.helper.reconstructions.make_reconstruction_loss
    • disent.dataset.transform._augment.get_kernel
  • Refactored disent.frameworks.DisentFramework, which now also supports the PyTorch Lightning training, validation and test steps.
  • Split the Ae and Vae hierarchy.
    • This is so that we can directly check if a framework is an instance of one or the other. Previously, Vae was a subclass of Ae, which was unintuitive.
  • Rewrote disent.registry to make it more intuitive and useful throughout disent. Custom regex resolvers can now also be registered, and there are now different types of registries. Registries also provide examples of each item that can be constructed. See disent.registry._registry for more information.
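As a rough illustration of the new random-walk sampler, the sketch below wires GroundTruthRandomWalkSampler into a DisentDataset. XYObjectData and ToImgTensorF32 are existing disent components used here for context; the sampler's constructor arguments and the keys of the returned dict are assumptions, so check the actual signatures before use.

```python
# Minimal sketch: sample observations along a random walk over the factor grid.
# NOTE: GroundTruthRandomWalkSampler's constructor defaults and the returned
#       dict keys are assumptions; check disent.dataset.sampling for details.
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.sampling import GroundTruthRandomWalkSampler
from disent.dataset.transform import ToImgTensorF32

data = XYObjectData()  # a small procedural ground-truth dataset
dataset = DisentDataset(
    data,
    sampler=GroundTruthRandomWalkSampler(),  # consecutive samples are neighbours in factor space
    transform=ToImgTensorF32(),              # convert observations to float32 CHW tensors
)

item = dataset[0]  # a dict of sampled observation(s), e.g. item['x_targ']
```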

Other Improvements

  • Improvements to disent.dataset.DisentDataset:
    • Added sampler, transform and augment properties.
    • Improved shallow_copy and unwrapped_shallow_copy logic and available arguments.
    • Can now return the ground-truth factors by specifying DisentDataset(return_factors=True) (see the sketch after this list).
    • Improved handling of batches and collating
  • Added state_space_copy(...) to disent.dataset.data.GroundTruthData; this returns a copy of the underlying state space.
    • disent.dataset.sampling samplers now store this copy of the state space instead of the original dataset.
  • Added sample(...) to disent.dataset.sampling.BaseDisentSampler, which is a more explicit alias for the original __call__(...) method.
  • to_img_tensor_u8 and to_img_tensor_f32 now check the size of observations before resizing; if the size is already correct, the resize is skipped and performance is greatly improved! This affects ToImgTensorF32 and ToImgTensorU8 from disent.dataset.transform.
  • Added factor_multipliers property to disent.dataset.util.state_space.StateSpace which allows custom implementations of pos_to_idx and idx_to_pos.
  • Added torch math helper functions to disent.nn.functional:
    • including torch_norm, torch_dist, torch_norm_euclidean, torch_norm_manhattan, and torch_dist_hamming.
  • Added triplet_soft_loss and dist_triplet_soft_loss to disent.nn.loss.triplet.
  • Added more modes to disent.nn.weights.init_model_weights.
  • Added FixedValueSchedule and MultiplySchedule to disent.schedule. These schedules are useful for holding a value constant throughout a run, or for overriding the values set in the config.
  • Added modify_name_keep_ext to disent.util.inout.paths, for adding prefixes or suffixes to file names without affecting the extension.
  • Added the decorator try_njit to disent.util.jit. This decorator tries to wrap the function with numba.njit; if numba is unavailable, a warning is displayed instead. Numba is an optional dependency and is intentionally not specified in the requirements.
  • Split disent.util.lightning.callbacks into separate files.
    • Added many new features and fixes to these callbacks.
  • Added disent.util.math.integer for computing the gcd and lcm with arbitrary precision values.
  • Added disent.util.visualize.vis_img with various features for visualising both torch tensors and numpy images.
    • Tensors are by default assumed to be in CHW format, while numpy arrays are assumed to be in HWC format. These defaults can be overridden.
    • See torch_to_images(...) and numpy_to_images(...) for more details.
    • Other duplicated functions throughout the library will be replaced with these in future.
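As noted above, DisentDataset can now return ground-truth factors. A minimal sketch follows, assuming the factors are returned under a 'factors' key; the exact key name is an assumption, so inspect the returned dict.

```python
# Minimal sketch: return ground-truth factors alongside observations.
# NOTE: the 'factors' key below is an assumption; inspect the returned dict.
from disent.dataset import DisentDataset
from disent.dataset.data import XYObjectData
from disent.dataset.transform import ToImgTensorF32

dataset = DisentDataset(XYObjectData(), transform=ToImgTensorF32(), return_factors=True)

item = dataset[0]
obs = item['x_targ']       # the transformed observation(s)
factors = item['factors']  # the corresponding ground-truth factor values (key name assumed)
```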

Breaking Changes

  • Temporarily removed DSpritesImagenetData. This dataset contains research code for my MSc and was not intended to be in previous releases. This will be re-added soon.
  • The default scale mode of disent.dataset.transform._augment.make_kernel changed from "sum" to "none".
    • This affects various other locations in the code, including disent.frameworks.helper.reconstructions.AugmentedReconLossHandler which uses kernels to augment loss functions.
  • Split the Ae and Vae hierarchy.
    • Vae is no longer a subclass of Ae.
  • Metrics are now instances of disent.metrics.utils.Metric.
    • This callable class can easily be created using the disent.metrics.utils.make_metric decorator over existing metric functions (see the sketch after this list).
    • The purpose of this change is to make metric default arguments self-contained. The Metric class has the functions compute and compute_fast, which wrap the underlying decorated function. Arguments can be overridden as usual; however, the two versions use different default arguments when called.
  • Renamed and removed functions inside disent.util.visualize.vis_latents.
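To illustrate the new Metric wrapper, here is a hedged sketch of decorating a custom metric function. The keyword arguments passed to make_metric (default_kwargs, fast_kwargs) and the metric signature are assumptions; see disent.metrics.utils.make_metric for the actual API.

```python
# Minimal sketch: wrap a plain metric function as a disent.metrics.utils.Metric.
# NOTE: make_metric's parameters shown here are assumptions.
from disent.metrics.utils import make_metric

@make_metric(
    'dummy',                                  # metric name (assumed parameter)
    default_kwargs=dict(num_samples=10_000),  # defaults used by .compute(...)
    fast_kwargs=dict(num_samples=1_000),      # defaults used by .compute_fast(...)
)
def metric_dummy(dataset, get_repr, num_samples: int = 10_000):
    # ... evaluate the representation over `num_samples` observations ...
    return {'dummy.score': 0.0}

# metric_dummy.compute(...)      uses the standard default arguments
# metric_dummy.compute_fast(...) uses the faster defaults
```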

Fixes

  • Fixed a disent.dataset.sampling.GroundTruthDistSampler numerical precision error when computing scaled factor distances. Without this fix there is up to a 1.5% chance of a sampling error on certain datasets.
  • Updated disent.nn.functional._pca for newer torch versions
  • Renamed the disent.nn.loss.softsort.torch_soft_sort(...) parameter dims_at_end to leave_dims_at_end, matching torch_soft_rank(...) (see the sketch after this list).
  • disent.nn.loss.triplet_mining.configured_idx_mine(...) now exits early if the mode is set to "none".
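A one-line sketch of the rename above; the input tensor and any remaining parameters are assumptions.

```python
# Minimal sketch of the renamed parameter; other arguments are assumptions.
import torch
from disent.nn.loss.softsort import torch_soft_sort

values = torch.randn(8, 4)
result = torch_soft_sort(values, leave_dims_at_end=True)  # previously: dims_at_end=True
```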

Config Changes

  • Removed augment/basic.yaml and added augment/example.yaml instead.
  • Added the run_plugins config group, which can be used to register a callback that the experiment runs in order to register custom items, such as new reconstruction losses or kernels, with the disent framework.
  • dataset/cars3d.yaml and dataset/smallnorb.yaml now point to the optimized 64x64 versions of the datasets by default.
  • Renamed disable_decoder to detach_decoder in Ae and Vae configs
  • Removed disable_posterior_scale option from Ae and Vae configs
  • models/*.yaml now directly point to a model target instead of a separate encoder and decoder
  • run_callbacks/*.yaml now directly point to class targets rather than using pre-defined keys
  • run_logging/*.yaml now directly point to class targets rather than using pre-defined keys
  • Rewrote experiment.run to be more general. The hydra and experiment functionality can now be used from anywhere.
    • Added the ability to register your own config overrides without extending or forking disent. This works by adding to the hydra search path: point the DISENT_CONFIGS_PREPEND environment variable at a new config folder, and anything inside it will recursively take priority over the existing experiment/config folder (see the sketch after this list).
  • Rewrote HydraDataModule to accept only the necessary arguments rather than the raw config. Configs have been updated accordingly to specify these parameters directly.
  • Added experiment.util.hydra_main which can be used anywhere to launch a hydra experiment using the disent configs.
    • hydra_main(...) is used to run an experiment that passes a config to the given callback
    • patch_hydra() can instead be used just to initialise hydra if you want to set everything up yourself. This registers the search path plugin that looks for DISENT_CONFIGS_PREPEND, as well as various OmegaConf resolvers, including:
      • ${exit:<msg>} exits the program if the value is accessed. This can be used to deprecate functionality, or to force variables to be overridden!
      • ${run_num:<root_dir>} returns the current experiment number
      • ${run_dir:<root_dir>,<name>} returns the current experiment folder with the name appended
      • ${fmt:"{:04d}",42} returns "0042", exactly like str.format
      • ${abspath:<rel_path>} converts a relative path to an absolute path using the original hydra working directory, not the changed experiment directory.
      • ${rsync_dir:<src>/<name>,<dst>/<name>} is useful, for example, if datasets are already prepared on a shared drive and need to be copied to a temporary drive!
  • Added experiment.util.path_utils, which adds support for automatically obtaining an experiment number from a directory of number-prefixed files. The number returned is the existing maximum plus one.
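Putting the above together, here is a minimal sketch of launching an experiment from your own script, with your own configs prepended to the hydra search path. The hydra_main parameter names and the config folder path are assumptions; check experiment.util.hydra_main for the actual signature.

```python
# Minimal sketch: run a disent experiment with custom configs prepended.
# NOTE: hydra_main's parameter names are assumptions, as is the config path.
import os

# anything inside this folder recursively takes priority over experiment/config;
# this must be set before hydra is initialised
os.environ['DISENT_CONFIGS_PREPEND'] = '/path/to/my/configs'  # hypothetical path

from experiment.util.hydra_main import hydra_main

def train(cfg):
    # `cfg` is the composed hydra config: run your experiment here
    ...

if __name__ == '__main__':
    hydra_main(callback=train)
```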

Test Changes

  • Updated tests.test_experiment to use the new experiment.util.hydra_main functionality.
  • Added pickle tests for frameworks.
  • Added tests for the torch norm functions.
  • Fixed the registry tests.
  • Added extensive tests for the new disent.util.visualize.vis_img functions and their returned datatypes.
  • Added tests for the temp_environ context manager.