Releases: realtimedatalake/rtdl

v0.2.0

31 May 04:48
4900a10

V0.2.0 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into Parquet, and save the files to the destination configured in your stream. rtdl can write files locally, to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (log in with Username: rtdl and Password: rtdl1234). rtdl supports writing in the Delta Lake table format, as well as integration with the AWS Glue and Snowflake External Tables metadata catalogs.
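
As a rough sketch of that flow (hypothetical: the notes name a createStream call and the ports above, but the exact endpoint paths and payload fields below are illustrative guesses, not a verified API reference):

    # 1. Configure a stream via the API on port 80. The createStream call
    #    is named in these notes; this path and payload shape are assumptions.
    curl -X POST http://localhost:80/createStream \
      -H 'Content-Type: application/json' \
      -d '{"name": "example-stream", "compression_type_id": 1}'

    # 2. Send a JSON event to the ingest endpoint on port 8080; rtdl
    #    processes it into Parquet and writes it to the stream's sink.
    #    The /ingest path is likewise an assumption.
    curl -X POST http://localhost:8080/ingest \
      -H 'Content-Type: application/json' \
      -d '{"stream_id": "<id from step 1>", "event": "signup", "user_id": "u-123"}'

    # 3. Query the resulting files in Dremio's web UI at http://localhost:9047.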

What's new? 💥

  • Upgrading to v0.2.0 requires following the steps in our upgrade guide.
  • Added Delta Lake support.
  • Switched to file-based configuration storage (removed dependency on PostgreSQL).

What doesn't work/what's next on the roadmap? 🚴🏼

  • Community contribution: Stateful Function for PII detection and masking.
  • Make AWS Glue, Snowflake External Tables, and Delta Lake support configurable on a per-stream basis.
  • git integration for stream configurations.
  • Research and implementation for Apache Hudi, Apache Iceberg, and Project Nessie.
  • Graphical user interface.
  • Dremio Cloud support.

v0.1.2

03 Apr 22:03

V0.1.2 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to
configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into
Parquet, and save the files to the destination configured in your stream. rtdl can write files
locally, to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data
via Dremio's web UI at http://localhost:9047 (log in with Username: rtdl and Password: rtdl1234).

What's new? 💥

  • Added HDFS support.
  • Added AWS Glue support.
  • Added Snowflake External Tables support.

What doesn't work/what's next on the roadmap? 🚴🏼

  • Community contribution: Stateful Function for PII detection and masking.
  • Move stream configurations to JSON files instead of SQL.
  • git integration for stream configurations.
  • Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie.
  • Graphical user interface.
  • Dremio Cloud support.

v0.1.1

07 Mar 05:00
1d5cf01

V0.1.1 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to
configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into
Parquet, and save the files to the destination configured in your stream. rtdl can write files
locally, to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via
Dremio's web UI at http://localhost:9047 (log in with Username: rtdl and Password: rtdl1234).

What's new? 💥

  • Replaced Kafka & ZooKeeper with Redpanda.
  • Added support for HDFS.
  • Fixed an issue with handling booleans when writing Parquet.
  • Added several logo variants and a banner to the public directory.

What doesn't work/what's next on the roadmap? 🚴🏼

  • Dremio Cloud support.
  • Apache Hudi support.
  • Start using GitHub Projects for work tracking.
  • Research and implementation for Apache Iceberg, Delta Lake, and Project Nessie.
  • Community contribution: Stateful Function for PII detection and masking.
  • Graphical user interface.

v0.1.0

21 Feb 21:01

V0.1.0 - Current status -- what works and what doesn't

What works? 🚀

rtdl's initial feature set is built and working. You can use the API on port 80 to
configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into
Parquet, and save the files to the destination configured in your stream. rtdl can write files
locally, to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data with
Dremio on port 9047 (log in with Username: rtdl and Password: rtdl1234).

What's new? 💥

  • Added support for Azure Blob Storage V2. Note that for events written to Azure Blob Storage
    V2, it can take up to one minute for the data to appear in Dremio.
  • Added support for GZIP and LZO compression in addition to SNAPPY (the default). Specify
    compression_type_id as 2 for GZIP or 3 for LZO; see the sketch after this list.
  • Added support for Segment webhooks. You can set up the rtdl ingester endpoint as a webhook in
    Segment. You will need to create a stream whose stream_alt_id is either the Source ID or the
    Write Key from the API Keys tab of Settings for the Source connected to the Webhook
    Destination (also shown in the sketch below).
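
A minimal sketch combining these two options in one createStream payload (compression_type_id, stream_alt_id, and the createStream call are named above; the endpoint path and the remaining fields are assumptions for illustration):

    # Hypothetical payload: compression_type_id 2 selects GZIP (3 would
    # select LZO), and stream_alt_id carries the Segment Source ID or
    # Write Key. Path and other fields are assumptions.
    curl -X POST http://localhost:80/createStream \
      -H 'Content-Type: application/json' \
      -d '{
            "name": "segment-events",
            "compression_type_id": 2,
            "stream_alt_id": "<Segment Source ID or Write Key>"
          }'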

What doesn't work/what's next on the roadmap? 🚴🏼

  • Start using GitHub Projects for work tracking.
  • Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie.
  • Writing to HDFS.
  • Graphical user interface.

v0.0.2

09 Feb 21:37
159eebe

V0.0.2 - Current status -- what works and what doesn't

What works? 🚀

rtdl is not full-featured yet, but it is currently functional. You can use the API on port 80 to
configure streams that ingest JSON events from an rtdl endpoint on port 8080, process them into
Parquet, and save the files to the destination configured in your stream. rtdl can write files
locally, to AWS S3, and to GCP Cloud Storage, and you can query your data with Dremio on port
9047 (log in with Username: rtdl and Password: rtdl1234).

What's new? 💥

  • Switched from Apache Hive Metastore + Presto to Dremio. Dremio works for all storage types;
    this was incorrectly noted as not functioning in the original release notes.
  • Added support for using a flattened JSON object as the value of the gcp_json_credentials field
    in the createStream API call. Previously, you had to double-quote everything and flatten it
    yourself (see the sketch after this list).
  • Added CONTRIBUTING.md and decided to use a DCO over a CLA. tl;dr: sign off when you commit,
    e.g. git commit -s -m "..."
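
A hedged sketch of the gcp_json_credentials change (the field name and the createStream call are documented above; the endpoint path and the rest of the payload are assumptions):

    # Before this release the GCP service-account key had to be flattened
    # and double-quoted into a string; a JSON object now works directly.
    # Hypothetical payload shape, for illustration only.
    curl -X POST http://localhost:80/createStream \
      -H 'Content-Type: application/json' \
      -d '{
            "name": "gcs-stream",
            "gcp_json_credentials": {
              "type": "service_account",
              "project_id": "<your-project-id>",
              "client_email": "<service-account>@<project>.iam.gserviceaccount.com"
            }
          }'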

What doesn't work/what's next on the roadmap? 🚴🏼

  • Add support for Azure Blob Storage.
  • Add support for Segment webhooks as a source.
  • Add support for more compression types; currently only the default Snappy compression is
    supported.

v0.0.1

27 Jan 04:35
Preparations for making the repo public.