Releases: realtimedatalake/rtdl
v0.2.0
V0.2.0 - Current status -- what works and what doesn't
What works? 🚀
rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON from an rtdl endpoint on port 8080, process it into Parquet, and save the files to a destination configured in your stream. rtdl can write files locally or to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (log in with username `rtdl` and password `rtdl1234`). rtdl supports writing in the Delta Lake table format as well as integration with the AWS Glue and Snowflake External Tables metadata catalogs.
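As a quick illustration, the sketch below posts a JSON event to the ingester. The `/ingest` path and the payload fields are illustrative assumptions, not rtdl's documented contract; check the README for the real route and schema.

```python
# Hedged sketch: sending one JSON event to rtdl's ingest endpoint on port 8080.
# The /ingest path and the field names below are assumptions for illustration.
import requests

event = {
    "stream_id": "<your-stream-id>",   # hypothetical field tying the event to a configured stream
    "payload": {"user_id": 42, "action": "page_view"},
}

resp = requests.post("http://localhost:8080/ingest", json=event, timeout=10)
resp.raise_for_status()
print(resp.status_code)  # expect a 2xx on success
```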
What's new? 💥
- Upgrading to v0.2.0 requires following the steps in our upgrade guide.
- Added Delta Lake support.
- Switched to file-based configuration storage (removed dependency on PostgreSQL).
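For a rough picture of what file-based configuration means in practice, this sketch writes a stream definition to a JSON file. Only `stream_alt_id` and `compression_type_id` appear elsewhere in these notes; every other field name, the file name, and the directory layout are assumptions, not rtdl's actual schema.

```python
# Hedged sketch: persisting a stream definition on disk now that rtdl stores
# configuration in files instead of PostgreSQL. Fields other than
# stream_alt_id and compression_type_id are hypothetical.
import json

stream = {
    "stream_alt_id": "my-source-id",           # field referenced in these release notes
    "compression_type_id": 2,                  # per the v0.1.0 notes: 2 = GZIP, 3 = LZO (SNAPPY is default)
    "destination": "s3://my-bucket/events/",   # hypothetical destination field
    "table_format": "delta",                   # hypothetical toggle for the new Delta Lake support
}

with open("my_stream.json", "w") as f:  # assumed file name and location
    json.dump(stream, f, indent=2)
```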
What doesn't work/what's next on the roadmap? 🚴🏼
- Community contribution: Stateful Function for PII detection and masking.
- Make AWS Glue, Snowflake External Tables, and Delta Lake support configurable on a per-stream basis.
- Git integration for stream configurations.
- Research and implementation for Apache Hudi, Apache Iceberg, and Project Nessie.
- Graphical user interface.
- Dremio Cloud support.
v0.1.2
V0.1.2 - Current status -- what works and what doesn't
What works? 🚀
rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON from an rtdl endpoint on port 8080, process it into Parquet, and save the files to a destination configured in your stream. rtdl can write files locally or to HDFS, AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (log in with username `rtdl` and password `rtdl1234`).
What's new? 💥
- Added HDFS support.
- Added AWS Glue support.
- Added Snowflake External Tables support.
What doesn't work/what's next on the roadmap? 🚴🏼
- Community contribution: Stateful Function for PII detection and masking.
- Move stream configurations to JSON files instead of SQL.
- Git integration for stream configurations.
- Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie.
- Graphical user interface.
- Dremio Cloud support.
v0.1.1
V0.1.1 - Current status -- what works and what doesn't
What works? 🚀
rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON from an rtdl endpoint on port 8080, process it into Parquet, and save the files to a destination configured in your stream. rtdl can write files locally or to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data via Dremio's web UI at http://localhost:9047 (log in with username `rtdl` and password `rtdl1234`).
What's new? 💥
- Replaced Kafka & ZooKeeper with Redpanda.
- Added support for HDFS.
- Fixed issue with handling booleans when writing Parquet.
- Added several logo variants and a banner to the public directory.
What doesn't work/what's next on the roadmap? 🚴🏼
- Dremio Cloud support.
- Apache Hudi support.
- Start using GitHub Projects for work tracking.
- Research and implementation for Apache Iceberg, Delta Lake, and Project Nessie.
- Community contribution: Stateful Function for PII detection and masking.
- Graphical user interface.
v0.1.0
V0.1.0 - Current status -- what works and what doesn't
What works? 🚀
rtdl's initial feature set is built and working. You can use the API on port 80 to configure streams that ingest JSON from an rtdl endpoint on port 8080, process it into Parquet, and save the files to a destination configured in your stream. rtdl can write files locally or to AWS S3, GCP Cloud Storage, and Azure Blob Storage, and you can query your data with Dremio on port 9047 (log in with username `rtdl` and password `rtdl1234`).
What's new? 💥
- Added support for Azure Blob Storage V2. Please note that for events written to Azure Blob Storage V2, it can take up to 1 minute for data to appear in Dremio.
- Added support for GZIP and LZO compression in addition to the default SNAPPY. Specify `compression_type_id` as 2 for GZIP or 3 for LZO.
- Added support for Segment webhooks. You can set up the rtdl ingester endpoint as a webhook in Segment. You will need to create a stream with `stream_alt_id` set to either the Source ID or the Write Key from the API Keys tab of Settings for the Source connected to the Webhook Destination (a hedged sketch follows this list).
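As a non-authoritative sketch, registering a Segment-backed stream might look like the call below. `createStream`, `stream_alt_id`, and `compression_type_id` come from these notes; the `/createStream` HTTP route and everything else are assumptions.

```python
# Hedged sketch: creating a stream whose stream_alt_id is the Segment Write Key,
# so events arriving via the Segment webhook are routed to this stream.
# The /createStream route and any field not named in these notes are assumptions.
import requests

payload = {
    "stream_alt_id": "SEGMENT_WRITE_KEY",  # Write Key (or Source ID) from Segment's API Keys tab
    "compression_type_id": 2,              # 2 = GZIP, 3 = LZO; omit for the SNAPPY default
}

resp = requests.post("http://localhost:80/createStream", json=payload, timeout=10)
resp.raise_for_status()
```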
What doesn't work/what's next on the roadmap? 🚴🏼
- Start using GitHub Projects for work tracking
- Research and implementation for Apache Hudi, Apache Iceberg, Delta Lake, and Project Nessie
- Writing to HDFS
- Graphical User Interface
v0.0.2
V0.0.2 - Current status -- what works and what doesn't
What works? 🚀
rtdl is not full-featured yet, but it is currently functional. You can use the API on port 80 to configure streams that ingest JSON from an rtdl endpoint on port 8080, process it into Parquet, and save the files to a destination configured in your stream. rtdl can write files locally, to AWS S3, and to GCP Cloud Storage, and you can query your data with Dremio on port 9047 (log in with username `rtdl` and password `rtdl1234`).
What's new? 💥
- Switched from Apache Hive Metastore + Presto to Dremio. Dremio works for all storage types; this was incorrectly noted as not functioning in the original release notes.
- Added support for using a flattened JSON object as the value for the `gcp_json_credentials` field in the `createStream` API call. Previously, you had to double-quote everything and flatten it (a hedged sketch follows this list).
- Added CONTRIBUTING.md and decided to use a DCO over a CLA. tl;dr: use `-s` when you commit, like `git commit -s -m "..."`.
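To illustrate the `gcp_json_credentials` change, a `createStream` call can now embed the credentials object directly, roughly as below. Only `gcp_json_credentials` and `createStream` are taken from these notes; the route, the file name, and the remaining details are assumptions.

```python
# Hedged sketch: passing the GCP service-account JSON as a plain (flattened)
# JSON object rather than a double-quoted, escaped string.
# The /createStream route and fields other than gcp_json_credentials are assumptions.
import json
import requests

with open("service-account.json") as f:  # assumed local credentials file
    creds = json.load(f)

payload = {
    "gcp_json_credentials": creds,  # embedded as a JSON object, no manual quoting needed
    # ...other stream fields...
}

resp = requests.post("http://localhost:80/createStream", json=payload, timeout=10)
resp.raise_for_status()
```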
What doesn't work/what's next on the roadmap? 🚴🏼
- Add support for Azure Blob Storage
- Add support for Segment Webhooks as a source
- Add support for more compression types; currently only the default Snappy compression is supported
v0.0.1
Preparations for making the repo public.