v0.4.0

@matteo-grella matteo-grella released this 17 Jan 02:42
· 1020 commits to main since this release

Added

  • Various new test cases (improving the coverage).
  • nlp.embeddings.syncmap package.
  • ml.nn.recurrent.srnn.BiModel which implements a bidirectional variant of the Shuffling Recurrent Neural Networks (SRNN).
  • Configurable timeout and request limit for all HTTP and gRPC servers (see also the commands' help).

Changed

  • The implementation of all CLI commands has been refactored so that the docker-entrypoint can reuse all other cli.App objects instead of just running separate executables. As a result, the Dockerfile now builds a single executable file, and the final image is much smaller.
  • All dependencies have been upgraded to the latest version.
  • Custom error definitions have been simplified, using fmt.Errorf instead of functions from github.com/pkg/errors.
  • Custom binary data serialization of matrices and models is now achieved with Go's encoding/gob. Many specific functions and methods have been replaced by fewer and simpler encoding/decoding methods compatible with gob. A list of important related changes follows.
    • utils.kvdb.KeyValueDB is no longer an interface, but a struct which directly implements the former "badger backend".
    • utils.SerializeToFile and utils.DeserializeFromFile now handle generic interface{} objects, instead of values implementing Serializer and Deserializer.
    • mat32 and mat64 custom serialization functions (e.g. MarshalBinarySlice, MarshalBinaryTo, ...) are replaced by implementations of BinaryMarshaler and BinaryUnmarshaler interfaces on Dense and Sparse matrix types.
    • PositionalEncoder.Cache and AxialPositionalEncoder.Cache fields (from ml.encoding.pe package) are now public.
    • All types implementing nn.Model interface are registered for gob serialization (in init functions).
    • embeddings.Model.UsedEmbeddings type is now nlp.embeddings.syncmap.Map.
    • As a consequence, you will have to re-serialize all your models.
  • Flair converter now sets the vocabulary directly in the model, instead of creating a separate file.
  • sequencelabeler.Model.LoadParams has been renamed to Load.
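
To illustrate the gob-related changes listed above, here is a minimal sketch of how a matrix type with unexported fields can implement BinaryMarshaler and BinaryUnmarshaler, and how a type behind an interface can be registered for gob in an init function. The names Dense, Model, Linear and roundTrip are hypothetical stand-ins, not spaGO's actual types, and the byte layout (little-endian header followed by 4-byte values) is an assumption made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"fmt"
	"math"
)

// Dense is a hypothetical stand-in for a dense float32 matrix type.
// It is NOT spaGO's mat32.Dense; the layout below is an assumption.
type Dense struct {
	rows, cols int
	data       []float32
}

// MarshalBinary implements encoding.BinaryMarshaler, which lets gob
// encode the type even though all of its fields are unexported.
func (d *Dense) MarshalBinary() ([]byte, error) {
	buf := make([]byte, 8+4*len(d.data))
	binary.LittleEndian.PutUint32(buf[0:], uint32(d.rows))
	binary.LittleEndian.PutUint32(buf[4:], uint32(d.cols))
	for i, v := range d.data {
		binary.LittleEndian.PutUint32(buf[8+4*i:], math.Float32bits(v))
	}
	return buf, nil
}

// UnmarshalBinary implements encoding.BinaryUnmarshaler.
func (d *Dense) UnmarshalBinary(b []byte) error {
	if len(b) < 8 {
		return fmt.Errorf("dense: short buffer: %d bytes", len(b))
	}
	d.rows = int(binary.LittleEndian.Uint32(b[0:]))
	d.cols = int(binary.LittleEndian.Uint32(b[4:]))
	n := d.rows * d.cols
	if len(b) != 8+4*n {
		return fmt.Errorf("dense: expected %d bytes, got %d", 8+4*n, len(b))
	}
	d.data = make([]float32, n)
	for i := range d.data {
		d.data[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[8+4*i:]))
	}
	return nil
}

// Model is a hypothetical interface standing in for nn.Model.
type Model interface{ NumParams() int }

// Linear is a hypothetical concrete model holding one weight matrix.
type Linear struct{ W *Dense }

// NumParams returns the number of weights in the model.
func (l *Linear) NumParams() int { return len(l.W.data) }

// Concrete types stored behind an interface must be registered
// before gob can encode or decode them.
func init() {
	gob.Register(&Linear{})
}

// roundTrip encodes a Model to memory with gob and decodes it back.
func roundTrip(m Model) (Model, error) {
	var buf bytes.Buffer
	// Encoding a pointer to the interface makes gob record the concrete type.
	if err := gob.NewEncoder(&buf).Encode(&m); err != nil {
		return nil, fmt.Errorf("encode: %w", err)
	}
	var out Model
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, fmt.Errorf("decode: %w", err)
	}
	return out, nil
}

func main() {
	in := &Linear{W: &Dense{rows: 1, cols: 2, data: []float32{1.5, -2}}}
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.NumParams())
}
```

The errors in the sketch are built with fmt.Errorf only, in the spirit of the error-handling change mentioned above; the %w verb is a standard-library feature and not something this release is known to rely on.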

Removed

  • In relation to the aforementioned gob serialization changes:
    • nn.ParamSerializer and related functions
    • nn.ParamsSerializer and related functions
    • utils.Serializer and utils.Deserializer interfaces
    • utils.ReadFull function
  • sequencelabeler.Model.LoadVocabulary
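
With the Serializer and Deserializer interfaces removed, file helpers can accept any gob-encodable value. The following is a rough sketch of that idea only: saveGob, loadGob, Vocabulary and demo are hypothetical names, and the signatures of spaGO's own utils.SerializeToFile and utils.DeserializeFromFile are not reproduced here.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// saveGob is a hypothetical helper: it writes any gob-encodable value to path.
func saveGob(path string, v interface{}) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return fmt.Errorf("create %s: %w", path, err)
	}
	// Report a failed Close only if nothing else went wrong first.
	defer func() {
		if cerr := f.Close(); err == nil && cerr != nil {
			err = fmt.Errorf("close %s: %w", path, cerr)
		}
	}()
	if err = gob.NewEncoder(f).Encode(v); err != nil {
		return fmt.Errorf("encode %s: %w", path, err)
	}
	return nil
}

// loadGob is a hypothetical helper: it decodes the file at path into v,
// which must be a pointer to a compatible type.
func loadGob(path string, v interface{}) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open %s: %w", path, err)
	}
	defer f.Close()
	if err := gob.NewDecoder(f).Decode(v); err != nil {
		return fmt.Errorf("decode %s: %w", path, err)
	}
	return nil
}

// Vocabulary is a hypothetical payload with exported fields only,
// so gob needs no custom methods to handle it.
type Vocabulary struct {
	Labels []string
	Index  map[string]int
}

// demo saves a Vocabulary to a temporary file and loads it back.
func demo() (Vocabulary, error) {
	dir, err := os.MkdirTemp("", "gobdemo")
	if err != nil {
		return Vocabulary{}, fmt.Errorf("temp dir: %w", err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "vocab.gob")
	in := Vocabulary{
		Labels: []string{"A", "B"},
		Index:  map[string]int{"A": 0, "B": 1},
	}
	if err := saveGob(path, in); err != nil {
		return Vocabulary{}, err
	}
	var out Vocabulary
	if err := loadGob(path, &out); err != nil {
		return Vocabulary{}, err
	}
	return out, nil
}

func main() {
	v, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(v.Labels, v.Index["B"])
}
```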

Fixed

  • docker-entrypoint sub-command hugging-face-importer has been renamed to huggingface-importer, just like the main command itself.
  • docker-entrypoint sub-command can be correctly specified without leading ./ or / when run from a Docker container.
  • BREAKING: mat32.Matrix serialization has been fixed: each value is now serialized as 4 bytes (instead of 8, as for a float64). Serialized 32-bit models will now be about half the size! Unfortunately, you will have to re-serialize your models (sorry!).
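
The size difference behind the mat32 fix can be seen with a small sketch using only the standard library. The functions encode32 and encodeAs64 are hypothetical, and the little-endian byte order is an assumption for the example; the release notes do not state which layout spaGO uses.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encode32 writes each float32 as 4 bytes (hypothetical helper).
func encode32(vals []float32) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, vals); err != nil {
		return nil, fmt.Errorf("encode32: %w", err)
	}
	return buf.Bytes(), nil
}

// encodeAs64 widens each value to float64 before writing, so every
// value costs 8 bytes; this mimics the wasteful behaviour described above.
func encodeAs64(vals []float32) ([]byte, error) {
	wide := make([]float64, len(vals))
	for i, v := range vals {
		wide[i] = float64(v)
	}
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, wide); err != nil {
		return nil, fmt.Errorf("encodeAs64: %w", err)
	}
	return buf.Bytes(), nil
}

func main() {
	vals := []float32{0.5, 1.25, -3}
	narrow, err := encode32(vals)
	if err != nil {
		panic(err)
	}
	wide, err := encodeAs64(vals)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(narrow), len(wide)) // prints "12 24"
}
```

Because the two encodings differ in length per value, data written in the old 8-byte form cannot be read back as 4-byte values, which is why previously serialized models need to be re-serialized.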