
Releases: AbsaOSS/spline

Re-written, server-based Spline version, powered by ArangoDB

16 Dec 17:31

New vision

In this release we have completely revised the vision and architecture of Spline.
Starting with the 0.4 release, Spline begins its journey from a simple Spark-only lineage tracking tool towards a more generic concept: a cross-framework data lineage tracking solution. The new vision covers much broader aspects of lineage tracking, including (to a certain extent) real-time monitoring, error tracking, impact analysis and more. Spline 0.4.0 is the first version of that "new" Spline. It does not contain any brand-new features compared to Spline 0.3.9; instead, it provides a brand-new foundation and architecture.

New Architecture

Spline core is now split into two main parts - a Spline server and a Spline agent.
The Spline server is implemented as the Spline REST Gateway, which exposes two independent REST APIs: the Producer API (used by the agent to send metadata to the server) and the Consumer API (used by the Spline UI or other parties to read the collected lineage data). Although both APIs will evolve in future versions, we'll try our best to maintain backward compatibility for the Producer API.
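The split between the two APIs can be illustrated with a small in-memory sketch. This is not the Spline REST Gateway and does not reflect its endpoints or data model: the class names, method names and record shape below are all hypothetical, chosen only to show a write-only producer side and a read-only consumer side sharing one store.

```python
from typing import Dict, List


class LineageStore:
    """Hypothetical shared storage (the real server uses ArangoDB)."""

    def __init__(self) -> None:
        self.records: List[Dict] = []


class ProducerApi:
    """Write-only facade: what an agent would use to send metadata."""

    def __init__(self, store: LineageStore) -> None:
        self._store = store

    def publish(self, record: Dict) -> int:
        # Store a copy so later changes by the caller do not leak in.
        self._store.records.append(dict(record))
        return len(self._store.records) - 1  # toy identifier: list position


class ConsumerApi:
    """Read-only facade: what a UI or another client would use."""

    def __init__(self, store: LineageStore) -> None:
        self._store = store

    def find_by_output(self, output: str) -> List[Dict]:
        # Return copies of every record that wrote to the given output.
        return [dict(r) for r in self._store.records if r.get("output") == output]
```

Because the two facades only share the store, either one can change its interface without affecting the other, which is the property the two independent APIs are meant to provide.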

Migration from Spline 0.3

Spline 0.4 comes with a command-line migration tool that can be used to migrate old Spline 0.3 data from MongoDB to the new Spline 0.4 storage, which is now based on ArangoDB.

Atlas support

Atlas integration has been removed (again) from Spline 0.4.0 and will most likely be re-introduced in a different form in one of the future Spline versions (see #279).

Improvements in Spark Agent

There were also a number of improvements in Spark operations support. For example, we added support for Delta, Kafka and JDBC (as both source and target for batch jobs), as well as support for some Spark SQL commands (e.g. create table as, drop table and others).
See https://github.com/AbsaOSS/spline/blob/release/0.4.0/spark/agent/README.md

Containerization

Spline is now much easier to try out and use in the cloud, as all its moving parts are available as Docker containers: ArangoDB, Spline REST Gateway and Spline UI.
The Spline Agent for Spark is now much smaller (it has fewer dependencies) and, just like in Spline 0.3.9, is shipped as a pre-built bundle for three major Spark versions (2.2, 2.3 and 2.4): https://search.maven.org/search?q=spark-agent-bundle
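As a rough sketch of how a build script might pick the right bundle, the helper below maps a full Spark version to a bundle name. The naming pattern `spark-agent-bundle-<major.minor>` is an assumption inferred from the search link above, and `bundle_artifact` is a hypothetical helper; check Maven Central for the exact artifact coordinates.

```python
# Spark versions the release notes list as having a pre-built bundle.
SUPPORTED_SPARK_VERSIONS = ("2.2", "2.3", "2.4")


def bundle_artifact(spark_version: str, prefix: str = "spark-agent-bundle") -> str:
    """Return the assumed bundle name for a Spark version such as '2.4.4'."""
    # Keep only the major and minor components, e.g. '2.4.4' -> '2.4'.
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor not in SUPPORTED_SPARK_VERSIONS:
        raise ValueError(f"no pre-built bundle listed for Spark {spark_version}")
    # Assumed naming pattern, not confirmed by the release notes.
    return f"{prefix}-{major_minor}"
```

Failing loudly on an unlisted Spark version is deliberate: silently falling back to the nearest bundle could hide binary incompatibilities.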

Bugfix release

29 Jul 16:23
release/0.3.9

[maven-release-plugin] copy for tag release/0.3.9

Bugfix release

17 May 19:41
release/0.3.8

[maven-release-plugin] copy for tag release/0.3.8

PySpark, Codeless init, Uber JAR, 'saveAsTable' etc, bugfixes

02 May 11:58

In addition to bugfixes and performance improvements this release introduces the following features:

  1. PySpark support
  2. Codeless initialization (via spark.sql.queryExecutionListeners property)
  3. Support for saveAsTable, insertInto commands and JDBC datasource.

Also Spline is now available as an uber-JAR.
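Codeless initialization means the listener is registered through Spark configuration rather than in job code. A minimal sketch of assembling such a `spark-submit` command follows. Only the property name comes from these notes; the helper `codeless_init_args` is hypothetical, and the listener class must be taken from the Spline documentation (the value used in the usage example is a placeholder, not the real class name).

```python
from typing import Dict, List, Optional

# Property named in the release notes.
LISTENER_PROPERTY = "spark.sql.queryExecutionListeners"


def codeless_init_args(
    listener_class: str,
    app_path: str,
    extra_conf: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build spark-submit arguments that register a listener via --conf only."""
    conf = {LISTENER_PROPERTY: listener_class}
    conf.update(extra_conf or {})
    args = ["spark-submit"]
    # Sort keys so the generated command line is deterministic.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


# Usage with a placeholder class name (replace with the documented listener class).
print(" ".join(codeless_init_args("com.example.PlaceholderListener", "job.py")))
```

The same property can also be set in `spark-defaults.conf`, which would enable the listener for every job without touching individual submit commands.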

Spark 2.4 + Atlas support

01 Feb 19:40

This release adds Spark 2.4 support and brings back Apache Atlas integration (which was removed in previous releases). It also fixes a few bugs.

Bugfix release

26 Nov 21:20

Fixed bugs:

  • #53 null and None literals fail to deserialize
  • #57 Child datatypes are missing for certain lineages

Maintenance release

19 Oct 22:27
    Release Notes - Spline - Data Lineage for Spark - Version 0.3.2

Bug

  • [SL-141] - Still hitting MongoDB's 16 MB document size limit after the lineage monolith doc split
  • [SL-145] - Spline UI times out on building lineage overview
  • [SL-148] - ERROR TypeError: Cannot read property '_typeHint' of undefined
  • [SL-149] - UI: Infinite scrolling works wrong
  • [SL-150] - Lineage overview loading time improvements

Story

  • [SL-137] - Improve filter predicate visualization #6h

Task

  • [SL-140] - Check Spark configuration for MongoDB connection string
  • [SL-146] - Update License info + sanitizing

MongoDB script fix, Spark compat fix, attribute highlighting fix

19 Apr 11:07
    Release Notes - Spline - Data Lineage for Spark - Version 0.3.1

Bug

  • [SL-117] - Fix MongoDB migration script
  • [SL-116] - Fix Spark 2.3 compatibility
  • [SL-133] - Operation details attribute not highlighted on node change

APPEND write mode support. Various performance improvements.

09 Apr 17:02
    Release Notes - Spline - Data Lineage for Spark - Version 0.3.0

Bug

  • [SL-93] - ParallelCompositeDataLineageReader semantic is confusing
  • [SL-95] - HDFS persistence layer not working
  • [SL-103] - Only capture successful lineages
  • [SL-115] - Timeout error on dataset descriptors request
  • [SL-130] - TypeError: Cannot read property 'schema' of undefined
  • [SL-131] - E11000 duplicate key error collection: spline-dev.attributes index: _id_ dup key

Story

  • [SL-21] - Search
  • [SL-38] - Mongo: Split lineage document
  • [SL-72] - Investigation whether support of structured streaming would be possible
  • [SL-91] - Listener for Stream Processing
  • [SL-92] - Mongo 3.6 support
  • [SL-97] - Support APPEND operation
  • [SL-99] - UI: Dataset list infinite scrolling
  • [SL-106] - Harvester for Structured Streaming Nodes

Task

  • [SL-116] - Upgrade to Spark 2.3
  • [SL-117] - Persistent model migration

UI performance improvements. Mongo storage fixes.

19 Mar 23:49
    Release Notes - Spline - Data Lineage for Spark - Version 0.2.7

Bug

  • [SL-126] - When persisting to Mongo keys contain "."

Story

  • [SL-100] - Optimize UI to handle wide datasets
  • [SL-108] - Authentication to Secured Mongo Instances