This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

⚠️ This branch includes examples for an unsupported version of Materialize (0.26.x).

Materialize + Redpanda + dbt Hack Day


Welcome to the first virtual Hack Day hosted by Materialize and our good friends at Redpanda and dbt Labs! The goal of this event is to encourage knowledge-sharing between our communities (we've already learned so much just putting it together!) and to give you a taste of what building streaming analytics pipelines with this stack looks like.

What to expect

Maybe you've never used dbt. Maybe you're new to streaming. Maybe you're new to both dbt and streaming. But guess what: it doesn't really matter!

We'll kick things off with a quick intro to each of the projects and go over the details of the event to make sure you're all set! From there, you can choose your own Hack Day adventure. We are also giving you somewhere to start!

How does it work?

👾 Build

Throughout the day, folks from all three projects will be available to bounce ideas off, support you with your project, or just... chat. To get in touch with us, join the official Slack channel or reach out in Troubleshooting!

🤲 Share

At the end of the event, we encourage you to share your projects, experiments, and learnings in Show and tell! This can be a link to a GitHub repo with your project, a blog post, or just a plain text recap of your Hack Day... whatever feels right.

💥 Get out there!

As a "Thank you!" for joining us and getting your hack on, we'd love to send you some swag! We might also reach out about promoting and showcasing your work more widely in the data community.

Where to start

We want everyone to be able to get up and running in a reasonable amount of time, and to find something fun to work on regardless of their level of expertise with each tool. For this reason, the repo includes a sample project with enough plumbing to spin up an end-to-end setup that you can play around with, extend, or completely modify:

demo_overview

To get started, fork this repo, clone it and navigate to the sample_project directory:

git clone https://github.com/<github-username>/mz-hack-day-2022.git

cd mz-hack-day-2022/sample_project

Where to go from here?

There's a lot more you can do as you ramp up! In case you need some ideas, here are a few seed challenges:

| Tool | Challenge |
|------|-----------|
| Materialize | Replace the JSON file with a Postgres database that pushes changes to the aircraft reference data into Materialize, either through Redpanda+Debezium or directly. |
| Materialize | Push data from a materialized view to a web app using TAIL. You can use our Node.js and Materialize guide as a reference! |
| Redpanda | Create a producer for a new data source or adapt the existing one to use pandaproxy instead. |
| Redpanda | Give WASM transforms (beta) a try for data pre-processing (e.g. cleaning, masking). |
| dbt | Add a sink model that outputs the results of the fct_flight materialized view back to Redpanda. |
| dbt | Incorporate macros from the materialize-dbt-utils package into your models. |
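
If you pick the pandaproxy challenge, a producer can be as small as an HTTP POST. The sketch below is one possible starting point, not the sample project's producer: it assumes pandaproxy is reachable at `localhost:8082` and uses a hypothetical topic name (`flight_information`), so adjust both to match your setup. It uses only the Python standard library and the Kafka REST-style JSON format that pandaproxy accepts.

```python
import json
import urllib.request

# Assumptions (not taken from this repo): where pandaproxy listens and
# which topic to produce to. Change both to match your environment.
PROXY_URL = "http://localhost:8082"
TOPIC = "flight_information"


def build_request(records, topic=TOPIC, base_url=PROXY_URL):
    """Build a POST request that produces JSON records to a topic.

    Each record is wrapped as {"value": ...}, the shape expected for the
    application/vnd.kafka.json.v2+json content type.
    """
    body = json.dumps({"records": [{"value": r} for r in records]})
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


def produce(records, **kwargs):
    """Send the records and return the proxy's decoded JSON response."""
    request = build_request(records, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())
```

With the stack running, calling `produce([{"callsign": "TEST01"}])` (a made-up record) should post a single message; swap in whatever your data source emits. Keeping `build_request` separate from `produce` makes it easy to inspect the payload without a running broker.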

Resources

Documentation

Alternative data sources

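| Source | Requires authentication? | Rate limited? | Link |
|--------|--------------------------|---------------|------|
| Citi Bike NYC | | | Citi Bike GBFS real-time feed |
| Network Rail UK | ☑️ | | RTPPM |
| Twitter | ☑️ | ☑️ | Twitter API v2 |
| Twitch | ☑️ | ☑️ | Twitch API |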

If you know about other cool data sources you'd like to add to the list, feel free to open an issue or a pull request with suggestions!