Releases: AbsaOSS/spline
Re-written, server based Spline version, powered by ArangoDB
New vision
In this release we have completely revised the vision and architecture of Spline.
Starting with the 0.4 release, Spline has begun its journey from a simple Spark-only lineage tracking tool towards a more generic concept - a cross-framework data lineage tracking solution. The new vision covers much broader aspects of lineage tracking, including (to a certain extent) real-time monitoring, error tracking, impact analysis and more. Spline 0.4.0 is the first version of that "new" Spline. Compared to Spline 0.3.9 it doesn't contain any brand-new features yet, but rather provides a brand-new foundation and architecture.
New Architecture
Spline core is now split into two main parts - a Spline server and a Spline agent.
The Spline server is implemented in the form of the Spline REST Gateway that exposes two independent REST APIs - the Producer API (used by the Agent to send metadata to the server), and the Consumer API (used by the Spline UI or other parties to get the collected lineage data). Although both APIs will evolve in future versions, we'll try our best to maintain backward compatibility for the Producer API.
Migration from Spline 0.3
Spline 0.4 comes with a command-line migration tool that can be used to migrate old Spline 0.3 data from MongoDB to the new Spline 0.4 storage, which is now based on ArangoDB.
Atlas support
Atlas integration has been removed (again) from Spline 0.4.0 and will most likely be re-introduced in some other form in one of the future Spline versions (see #279).
Improvements in Spark Agent
There were also a number of improvements in Spark operations support. For example, we added support for Delta, Kafka and JDBC (as both source and target for batch jobs), and also support for some Spark SQL commands (e.g. create table as, drop table and others).
See https://github.com/AbsaOSS/spline/blob/release/0.4.0/spark/agent/README.md
Containerization
Spline is now much easier to try out and use in clouds as all its moving parts are implemented as Docker containers - ArangoDB, Spline REST Gateway and Spline UI.
The Spline Agent for Spark is now much smaller (due to fewer dependencies) and, just as in the previous Spline 0.3.9, is shipped in the form of a pre-built bundle for three major Spark versions - 2.2, 2.3 and 2.4 - https://search.maven.org/search?q=spark-agent-bundle
Bugfix release
release/0.3.9 [maven-release-plugin] copy for tag release/0.3.9
Bugfix release
release/0.3.8 [maven-release-plugin] copy for tag release/0.3.8
PySpark, Codeless init, Uber JAR, 'saveAsTable' etc, bugfixes
In addition to bugfixes and performance improvements this release introduces the following features:
- PySpark support
- Codeless initialization (via the spark.sql.queryExecutionListeners property)
- Support for saveAsTable, insertInto commands and the JDBC datasource.
Also Spline is now available as an uber-JAR.
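As a rough illustration of codeless initialization, the sketch below assembles a spark-submit command line that registers a listener through the spark.sql.queryExecutionListeners property, so the job code itself needs no Spline-specific calls. The listener class name, the JAR file name and the job script name are assumptions made for this example and are not taken from these release notes; check the Spline README for the values matching your version.

```python
# Sketch: build a spark-submit argument list that enables lineage tracking
# purely via configuration (no changes to the job code).
# ASSUMPTIONS (not from the release notes): the listener class name below,
# the JAR name "spline-bundle.jar" and the script name "my_job.py" are
# placeholders for illustration only.
SPLINE_LISTENER = "za.co.absa.spline.core.listener.SplineQueryExecutionListener"  # assumed


def build_spark_submit(app, listener=SPLINE_LISTENER, jars=()):
    """Return spark-submit arguments with the query execution listener set."""
    args = ["spark-submit"]
    if jars:
        # --jars expects a comma-separated list of JAR paths
        args += ["--jars", ",".join(jars)]
    # spark.sql.queryExecutionListeners takes comma-separated class names
    args += ["--conf", f"spark.sql.queryExecutionListeners={listener}"]
    args.append(app)
    return args


cmd = build_spark_submit("my_job.py", jars=["spline-bundle.jar"])
print(" ".join(cmd))
```

The same property could equally be placed in spark-defaults.conf; passing it on the command line simply keeps the example self-contained.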
Spark 2.4 + Atlas support
This release adds Spark 2.4 support and brings back Apache Atlas integration (which was removed in previous releases). It also fixes a few bugs.
Bugfix release
Maintenance release
Release Notes - Spline - Data Lineage for Spark - Version 0.3.2
Bug
- [SL-141] - Still hitting 16Mb MongoDB's document size limit after the lineage monolith doc split
- [SL-145] - Spline UI times out on building lineage overview
- [SL-148] - ERROR TypeError: Cannot read property '_typeHint' of undefined
- [SL-149] - UI: Infinite scrolling works wrong
- [SL-150] - Lineage overview loading time improvements
Story
- [SL-137] - Improve filter predicate visualization #6h
Task
MongoDB script fix, Spark compat fix, attribute highlighting fix
APPEND write mode support. Various performance improvements.
Release Notes - Spline - Data Lineage for Spark - Version 0.3.0
Bug
- [SL-93] - ParallelCompositeDataLineageReader semantic is confusing
- [SL-95] - HDFS persistence layer not working
- [SL-103] - Only capture successful lineages
- [SL-115] - Timeout error on dataset descriptors request
- [SL-130] - TypeError: Cannot read property 'schema' of undefined
- [SL-131] - E11000 duplicate key error collection: spline-dev.attributes index: _id_ dup key
Story
- [SL-21] - Search
- [SL-38] - Mongo: Split lineage document
- [SL-72] - Investigation whether support of structured streaming would be possible
- [SL-91] - Listener for Stream Processing
- [SL-92] - Mongo 3.6 support
- [SL-97] - Support APPEND operation
- [SL-99] - UI: Dataset list infinite scrolling
- [SL-106] - Harvester for Structured Streaming Nodes