Releases: lightly-ai/lightly
BYOL model, Refactoring and New Tutorial for Active Learning
New Model: BYOL
- This release adds a new model for self-supervised learning: BYOL (see https://arxiv.org/abs/2006.07733)
- Thanks @pranavsinghps1 for your contribution!
Improvements
- Refactored NTXent Loss. The new code is shorter and easier to understand.
- Added a scorer for semantic segmentation to do active learning with image segmentation
- Added color highlighting in CLI
- The CLI now returns the dataset_id when creating a new dataset
New Active Learning Tutorial using Detectron2
- This tutorial shows the full power of the lightly self-supervised embedding and active learning scorers
- Check it out here: https://docs.lightly.ai/tutorials/platform/tutorial_active_learning_detectron2.html
Models
- Bootstrap your own latent: A new approach to self-supervised Learning, 2020
- Barlow Twins: Self-Supervised Learning via Redundancy Reduction, 2021
- SimSiam: Exploring Simple Siamese Representation Learning, 2020
- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning, 2019
- SimCLR: A Simple Framework for Contrastive Learning of Visual Representations, 2020
Active Learning Refactoring and Minor Improvements
Instantiate shuffle tensor directly on device
This change makes our momentum encoders more efficient by directly instantiating temporary tensors on device instead of moving them there after instantiation. Thanks a lot to @guarin for pointing out the problem and swiftly fixing it!
Active Learning Refactoring
The new strategy of uploading active learning scores to a query tag instead of the preselected tag is now enforced. This makes our framework more flexible and easier to use, and it allows users to run several samplings with the same set of scores at the cost of a small computational overhead.
Additionally, active learning scores were renamed to match the current literature. We now support uncertainty sampling with the least confidence, margin, and entropy variants as described in http://burrsettles.com/pub/settles.activelearning.pdf, page 12f, chapter 3.1.
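The three uncertainty variants can be illustrated with a minimal sketch in plain Python. The function names are hypothetical and are not taken from the lightly package, and the convention that a higher score means a more uncertain prediction is an assumption made here for illustration.

```python
import math

# Hypothetical helpers for illustration only; not the lightly scorer API.
# Each function takes the predicted class probabilities of one sample
# and returns a score where higher means "more uncertain" (assumed convention).

def least_confidence(probs):
    """1 - max probability: high when the top prediction is weak."""
    return 1.0 - max(probs)

def margin(probs):
    """1 - (top1 - top2): high when the two best classes are close."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)

def entropy(probs):
    """Shannon entropy in nats: high when the distribution is flat."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

confident = [0.90, 0.05, 0.05]
uncertain = [0.40, 0.35, 0.25]
for score in (least_confidence, margin, entropy):
    print(score.__name__, score(confident), score(uncertain))
```

All three scores rank the flatter prediction as more uncertain; they differ in how much of the probability vector they take into account (the top class, the top two classes, or all classes).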
Minor Bug Fixes and Improvements
Better handling of edge cases when doing active learning for object detection.
More Powerful CLI Commands, Stability Improvements and Updated Documentation
Create a new dataset directly when running lightly-upload and lightly-magic
Just replace the argument dataset_id="your_dataset_id" with the argument new_dataset_name="your_dataset_name". To learn more, look at the docs.
Get only the newly added samples from a tag
lightly-download has the flag exclude_parent_tag
If this flag is set, the samples in the parent tag are excluded from the download. This is very practical when doing active learning and you only want the filenames newly added to the tag.
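Conceptually, excluding the parent tag amounts to a set difference between the filenames in a tag and those in its parent. The sketch below only illustrates that idea; the helper name and the example filenames are hypothetical, and it does not show how lightly-download implements the flag.

```python
# Hypothetical helper for illustration; not part of the lightly CLI or API.

def newly_added(tag_filenames, parent_tag_filenames):
    """Return the filenames in the tag that are not in its parent tag,
    keeping the order in which they appear in the tag."""
    parent = set(parent_tag_filenames)
    return [name for name in tag_filenames if name not in parent]

parent_tag = ["img_001.jpg", "img_002.jpg"]
tag = ["img_001.jpg", "img_002.jpg", "img_007.jpg", "img_009.jpg"]
print(newly_added(tag, parent_tag))
```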
ActiveLearningAgent has new attribute added_set
If you prefer getting the newly added samples from the active learning agent, just access its new attribute added_set.
Minor Updates and Fixes
Updated documentation and docstrings to make working with lightly simpler.
Minor bug fixes and improvements.
Hypersphere Loss, Stability Improvements and Updated Documentation
Hypersphere Loss (@EelcoHoogendoorn)
Implemented the loss function described here, which achieves results competitive with more widely cited losses (symmetric negative cosine similarity and contrastive loss) while providing better interpretability.
You can use the loss in combination with all other losses supported by lightly:
# initialize loss function
loss_fn = HypersphereLoss()
# generate two random transforms of images
t0 = transforms(images)
t1 = transforms(images)
# feed through (e.g. SimSiam) model
out0, out1 = model(t0, t1)
# calculate loss
loss = loss_fn(out0, out1)
Thank you, @EelcoHoogendoorn, for your contribution!
Minor Updates and Fixes
Updated documentation and docstrings to make working with lightly simpler.
Minor bug fixes and improvements.
Consistency Regularization, CLI update, and API client update
Consistency Regularization
This release contains an implementation of the CO2 (consistency contrast) regularization which can be used together with our contrastive loss function. We observed consistent (although marginal) improvements when applying the regularizer to our models!
lightly-version
A new CLI command was added to enable users to easily check the installed version from the command line. This is especially useful when working with different environments and it's not clear which version of lightly is being used. The command is:
> lightly-version
1.0.4
API client
Minor updates to the API client were made, enabling lightly to send exif data of images to the API and to make sampling requests with the sampling method ACTIVE_LEARNING, which simply returns the samples with the highest active learning score.
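Selecting the samples with the highest active learning score is a top-k selection. The sketch below shows that idea in plain Python; the function name and the example scores are hypothetical, and the actual selection runs on the API side.

```python
import heapq

# Hypothetical helper for illustration; the real selection happens in the API.

def sample_highest_scores(scores, n_samples):
    """Return the n_samples filenames with the highest score, best first.

    scores: dict mapping filename -> active learning score.
    """
    return heapq.nlargest(n_samples, scores, key=scores.get)

scores = {"a.jpg": 0.10, "b.jpg": 0.90, "c.jpg": 0.50, "d.jpg": 0.30}
print(sample_highest_scores(scores, 2))
```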
New Augmentation (Solarization) and Updates to README and Docs
Solarization
Solarization is an augmentation which inverts all pixels above a given threshold. It is applied in many papers about self-supervised learning, for example in BYOL and Barlow Twins.
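As a minimal sketch of the operation, assuming 8-bit pixel values given as a flat list, solarization can be written as follows. The function name and the default threshold are assumptions for illustration and do not reflect the lightly implementation.

```python
# Illustrative sketch on a flat list of 8-bit values (0..255);
# not the lightly transform.

def solarize(pixels, threshold=128):
    """Invert every pixel value strictly above the threshold."""
    return [255 - p if p > threshold else p for p in pixels]

row = [0, 100, 128, 200, 255]
print(solarize(row))
```

Dark pixels are left unchanged while bright pixels are mapped to their inverse, so bright regions of the image turn dark.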
Updates to README and Docs (multi GPU training)
The README received a code example to show how to use lightly. The documentation was polished and received a section about how to use lightly with multiple GPUs.
Experimental: Active Learning Scorers for Object Detection
Scorers for active learning with object detection were added. These scorers will not work with the API yet and are therefore also not yet documented.
Barlow Twins, a New Benchmarking Module and Updated Documentation
Barlow Twins (@AdrianArnaiz)
An implementation of the Barlow Twins architecture and loss for self-supervised learning is added. The approach measures the cross-correlation matrix between the outputs of two identical networks and makes it as close to the identity matrix as possible.
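The objective can be sketched in plain Python under a few assumptions: embeddings are given as lists of rows (one row per sample), every feature varies over the batch so its standard deviation is nonzero, and the off-diagonal weight lambda_ is a free parameter. The function names are hypothetical and this is not the lightly implementation.

```python
import math

# Illustrative sketch only; not the lightly BarlowTwinsLoss.

def normalize_columns(batch):
    """Standardize each feature to zero mean and unit std over the batch.

    Returns a list of D columns, each holding N values.
    Assumes every feature has nonzero standard deviation.
    """
    n = len(batch)
    normalized = []
    for col in zip(*batch):
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        normalized.append([(x - mean) / std for x in col])
    return normalized

def barlow_twins_loss(z_a, z_b, lambda_=5e-3):
    """Push the cross-correlation matrix of z_a and z_b toward the identity."""
    n = len(z_a)
    cols_a = normalize_columns(z_a)
    cols_b = normalize_columns(z_b)
    loss = 0.0
    for i, col_a in enumerate(cols_a):
        for j, col_b in enumerate(cols_b):
            # Entry (i, j) of the cross-correlation matrix.
            c_ij = sum(a * b for a, b in zip(col_a, col_b)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2   # diagonal should be 1
            else:
                loss += lambda_ * c_ij ** 2  # off-diagonal should be 0
    return loss

decorrelated = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
redundant = [[1, 1], [2, 2], [3, 3], [4, 4]]
print(barlow_twins_loss(decorrelated, decorrelated))
print(barlow_twins_loss(redundant, redundant))
```

The first call yields a loss of zero because both views agree and the two features are uncorrelated. The second call is penalized only through the off-diagonal term, since the two features carry the same information.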
Thank you @AdrianArnaiz for your contribution
Benchmarking Module
A benchmarking module is added for simpler evaluation of models using a kNN callback.
API Updates: Lightly Platform
You can now easily download your datasets from the Lightly Platform using the CLI:
lightly-download token=123 dataset_id=xyz output_dir=store/dataset/here
lightly-download token=123 dataset_id=xyz tag_name=my-tag output_dir=store/tag/here
Minor Updates and Fixes
Updated documentation and docstrings to make working with lightly simpler.
transforms can now be passed directly to the LightlyDataset. Learn more here.
Minor bug fixes and improvements.
Fix Imports
Fixes a bug introduced in the last release.
Possible in v1.1.1 but not in v1.1.0:
import lightly
dataset = lightly.data.LightlyDataset(input_dir='my/dataset')
Self-supervised Active Learning
Lightly gets support for Active-Learning
We're excited to offer our new active-learning functionality! Use the strong representations learned in a self-supervised fashion together with model predictions to further improve the data selection process.
This release introduces breaking changes with respect to the API calls.
Active-Learning
The self-supervised representations together with the model predictions provide a great basis for deciding which samples should be annotated and which ones are redundant.
This release brings a completely new interface with which you can add active-learning to your ML project with just a few lines of code:
- ApiWorkflowClient: lightly.api.api_workflow_client.ApiWorkflowClient
  The ApiWorkflowClient is used to connect to our API. The API handles the selection of the images based on embeddings and active-learning scores. To initialize the ApiWorkflowClient, you will need the datasetId and the token from the Lightly Platform.
- ActiveLearningAgent: lightly.active_learning.agents.agent.ActiveLearningAgent
  The ActiveLearningAgent builds the client interface of our active-learning framework. It helps with indicating which images are preselected and which ones to sample from. Furthermore, one can query it to get a new batch of images. To initialize an ActiveLearningAgent you need an ApiWorkflowClient.
- SamplerConfig: lightly.active_learning.config.sampler_config.SamplerConfig
  The SamplerConfig allows the configuration of a sampling request. In particular, you can set the number of samples, the name of the resulting selection, and the SamplingMethod. Currently, you can set the SamplingMethod to one of the following:
  - Random: Selects samples uniformly at random.
  - Coreset: Selects samples that are diverse.
  - Coral: Combines Coreset with scores to do active-learning.
- Scorer: lightly.active_learning.scorers.scorer.Scorer
  The Scorer takes as input the predictions of a pre-trained model on the set of unlabeled images. It evaluates different scores based on how certain the model is about the images and passes them to the API so the sampler can use them with Coral.
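To give an intuition for what selecting diverse samples can mean, here is a minimal sketch of greedy farthest-point (k-center) selection on embeddings, a common way to build a coreset. This is an assumption for illustration: the release notes do not describe how the API implements Coreset, and the function name and parameters are hypothetical.

```python
import math

# Illustrative greedy k-center sketch; not the selection code used by the API.

def greedy_coreset(embeddings, n_samples, start=0):
    """Pick indices so that each new pick is the point farthest
    from everything selected so far."""
    selected = [start]
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(e, embeddings[start]) for e in embeddings]
    while len(selected) < min(n_samples, len(embeddings)):
        next_idx = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(next_idx)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[next_idx]))
    return selected

# Two tight clusters around x=0 and x=10, plus one point in between.
embeddings = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 0.0)]
print(greedy_coreset(embeddings, 3))
```

With three samples requested, the selection covers both clusters and the point in between rather than picking near-duplicates from one cluster.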
Check out our documentation to learn more!
API (breaking)
With the refactoring of our API, we are switching to using a generated Python client. This leads to clearer and unified endpoints, fewer errors, and better error messages. Unfortunately, this means that previous versions of the package are no longer compatible with our new API.
Note that this only affects API calls. Using the package for self-supervised learning is unaffected.
SimSiam and Refactoring of Models and Dataset
This release contains breaking changes. The models SimCLR and MoCo, the LightlyDataset, and the BaseCollateFunction were refactored. These changes were necessary to make the code base easier to understand.
SimSiam (@busycalibrating)
An implementation of the SimSiam self-supervised framework is introduced. It relies on a siamese network architecture and aims to maximize similarity between two augmentations of one image.
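The similarity objective can be sketched as a symmetric negative cosine similarity between the two views. The sketch below works on plain Python lists and therefore leaves out gradient handling (such as the stop-gradient used in the SimSiam paper); the function and argument names are hypothetical and this is not the lightly loss.

```python
import math

# Illustrative sketch on plain lists; not the lightly loss implementation.

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def symmetric_negative_cosine(p0, z1, p1, z0):
    """Average negative cosine similarity over both view pairings.

    p0, p1: predictions for view 0 and view 1 (assumed names)
    z0, z1: target representations of view 0 and view 1 (assumed names)
    """
    return -0.5 * (cosine_similarity(p0, z1) + cosine_similarity(p1, z0))

aligned = symmetric_negative_cosine([1.0, 2.0], [2.0, 4.0], [3.0, 0.0], [1.0, 0.0])
orthogonal = symmetric_negative_cosine([1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [3.0, 0.0])
print(aligned, orthogonal)
```

The loss reaches its minimum of -1 when the representations of both augmentations point in the same direction, regardless of their length.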
Refactoring: LightlyDataset
The LightlyDataset is refactored such that the constructor now always expects an input directory input_dir which indicates where the images are stored. To use a LightlyDataset with any PyTorch dataset, the class method LightlyDataset.from_torch_dataset can be used.
1.0.7 (incompatible)
>>> dataset = LightlyDataset(from_folder='path/to/data')
>>>
>>> dataset = LightlyDataset(root='./', name='cifar10', download=True)
1.0.8
>>> dataset = LightlyDataset(input_dir='path/to/data')
>>>
>>> torch_dataset = torchvision.datasets.CIFAR10(root='./', download=True)
>>> dataset = LightlyDataset.from_torch_dataset(torch_dataset)
Refactoring: BaseCollateFunction
The BaseCollateFunction now returns a tuple of augmented image batches along with the labels and filenames (aug0, aug1), labels, filenames where aug0 and aug1 are both of shape bsz x channels x H x W.
Refactoring: SimCLR, MoCo and NTXentLoss
In accordance with the changes of the BaseCollateFunction, SimCLR and MoCo now expect the augmented images separately instead of as a single batch. Similarly, the NTXentLoss now requires two separate batches of representations as inputs.
1.0.7 (incompatible)
>>> # batch size is 128
>>> batch, labels, filenames = next(iter(dataloader))
>>> batch.shape
torch.Size([256, 3, 32, 32])
>>> # number of features is 64
>>> y = simclr(batch)
>>> y.shape
torch.Size([256, 64])
>>> loss = ntx_ent_loss(y)
1.0.8
>>> # batch size is 128
>>> (batch0, batch1), labels, filenames = next(iter(dataloader))
>>> batch0.shape
torch.Size([128, 3, 32, 32])
>>> batch1.shape
torch.Size([128, 3, 32, 32])
>>> # number of features is 64
>>> y0, y1 = simclr(batch0, batch1)
>>> y0.shape
torch.Size([128, 64])
>>> y1.shape
torch.Size([128, 64])
>>> loss = ntx_ent_loss(y0, y1)
Documentation Updates
A tutorial about how to use the SimSiam model is added along with some minor changes and improvements.
Minor Changes
Private functions are hidden from autocompletion.