diff --git a/blog/index.html b/blog/index.html index 353f26f..deb2350 100644 --- a/blog/index.html +++ b/blog/index.html @@ -200,6 +200,14 @@

2023

+

+ Who's watching the watchdog? + 11/22 +

+ + + +

A side project story: Hacker Gifts (2018-2024) 10/29 diff --git a/blogpost-contexts/index.html b/blogpost-contexts/index.html index 304d028..f12d83e 100644 --- a/blogpost-contexts/index.html +++ b/blogpost-contexts/index.html @@ -230,6 +230,14 @@

Related posts:

+ + + + + + + + diff --git a/cdtmp/index.html b/cdtmp/index.html index 4790854..7a2340b 100644 --- a/cdtmp/index.html +++ b/cdtmp/index.html @@ -213,6 +213,14 @@

Related posts:

+ + + + + + + + diff --git a/copy-with-syntax/index.html b/copy-with-syntax/index.html index e198bd8..99ec63c 100644 --- a/copy-with-syntax/index.html +++ b/copy-with-syntax/index.html @@ -217,6 +217,14 @@

Related posts:

+ + + + + + + + diff --git a/ctrl-r/index.html b/ctrl-r/index.html index 9823d74..1f937b0 100644 --- a/ctrl-r/index.html +++ b/ctrl-r/index.html @@ -212,6 +212,14 @@

Related posts:

+ + + + + + + + diff --git a/e2e-tests/index.html b/e2e-tests/index.html index 88c592c..ea49a7e 100644 --- a/e2e-tests/index.html +++ b/e2e-tests/index.html @@ -269,6 +269,14 @@

Related posts:

+ + + + + + + + diff --git a/feed.xml b/feed.xml index 227fd3d..ca68ea7 100644 --- a/feed.xml +++ b/feed.xml @@ -5,12 +5,40 @@ https://frantic.im/favicon.png Occasional posts on technology and stuff - 2023-10-24T19:13:00.962Z + 2023-11-22T18:31:03.008Z Alex Kotliarskyi + + https://frantic.im/whos-watching-the-watchdog + Who's watching the watchdog? + 2023-11-22T12:00:00+00:00 + + + Making reliable systems that expect things to go wrong + At my current company we have an automated pipeline for processing customer orders. It’s pretty complex: it talks to multiple services, trains models, stores large files, updates the database, and sends emails and push notifications.

+

Sometimes things get stuck because of a temporary third-party outage or a bug in our code.

+

So we built a watchdog service: it monitors the stream of orders and makes sure each order gets processed within a reasonable timeframe (3 hours). The watchdog only checks the final invariant: was the order fulfilled and delivered to the customer? It doesn’t care about any of the intermediate steps.

+

This system has saved us many times. When the watchdog finds a stuck order, it posts to a dedicated Slack channel. We investigate the problem and fix the root cause, so hopefully we won’t see new orders stuck for the same reason.

+

But who’s watching the watchdog? What if it fails to run?

+

It actually happened to us once. The watchdog runs on our job scheduling system, and that system went down. No orders were getting processed, and the watchdog wasn’t running either. The alerts channel in Slack was blissfully silent.

+

To cover this case, we need a system that watches the watchdog. We use these two:

+ +

Both systems work the same way: they expect a regular cron job to “check in” on a predefined schedule. If a check-in is missed, something is likely wrong, and we get an alert in Slack.

+

Complex systems always find surprising ways to fail. By adding an end-to-end quality watchdog (and a way to watch the watchdog), you create a positive feedback loop of detecting issues and hardening the system.

+ + ]]>
+
+ https://frantic.im/hacker-gifts A side project story: Hacker Gifts (2018-2024) @@ -652,46 +680,4 @@ Things you can do: ]]> - - https://frantic.im/octave - A side project story: octave.im (2013-2016) - 2021-02-23T12:00:00+00:00 - - - A story about my attempt at SaaS - It all started around 2013: I was going through a course on Machine Learning by Andrew Ng.

-

The practical part of the course depended on GNU Octave (an open source math toolkit), but installing it on a Mac was a huge pain. I did manage to do it, but noticed that many people on the forums complained about the same thing.

-

So I had a brilliant idea: wouldn’t it be great if Octave was available as SaaS, with fancy features like a built-in code editor, command line and plots?

-

Node, React & Docker

-

I built the first prototype in one night on June 8, 2013. I used NodeJS 0.10-ish with socket.io on the server side and CodeMirror with some plugins on the frontend.

-

In October that year I rewrote the frontend in React — the experience of doing so was amazing! React was young (createClass/autobind/mixins) but its programming model “clicked” with me. I remember hanging out in their IRC channel looking for help with autoscrolling. I was really impressed at how quick and friendly the response was (thanks @sophiebits!).

-

The initial version of the backend would just run octave in a dedicated folder. My second iteration used Docker, which at the time was very new and unproven. It all ran on a Digital Ocean 2GB RAM droplet.

-

The killer feature was displaying plots inline in a REPL. You can see it on this gif:

-

-

It worked through a clever hack: I pre-configured Octave to use gnuplot with special arguments that made it save the graph to a file (instead of showing it on the screen). My NodeJS backend listened to filesystem changes and notified the frontend when it detected the update.

-

Product market fit

-

I tried to promote octave.im to the students of the ML course. I posted the link on the forums a couple of times and added it to the course wiki page (which was surprisingly well hidden). The reception among students was really positive, but the course moderators weren’t happy: they wanted some kind of validation that it was a serious thing (which it wasn’t).

-

Overall I had more than 3500 people sign up over the course of several years. Unfortunately I didn’t keep any metrics screenshots. The twitter account, @OctaveCloud, got 57 followers (organically).

-

Speaking of which, I used Mixpanel and loved its simple API and dashboards. They even sent me a free T-shirt :)

-

Total profit: -$420

-

Like every other hacker out there, I also hoped to make it sustainable, so in October 2015 I added a $4 monthly subscription with a 2-week trial. To be honest, I wasn’t very serious about it at that point. I just wanted to play with Stripe and see if people would actually pay. And they did! Overall I collected about $300 in revenue.

-

An interesting thing I noticed was that people would subscribe and then stop using the product without unsubscribing (I did have an unsubscribe button on the profile page, no questions asked). I ended up manually cancelling a bunch of subscriptions on Stripe without updating the app DB, so people could still use the service (which they didn’t anyway).

-

In numbers

- -

Screenshot, for posterity:

-

- - ]]>
-
- \ No newline at end of file diff --git a/figma/og_watchdog.png b/figma/og_watchdog.png new file mode 100644 index 0000000..371b051 Binary files /dev/null and b/figma/og_watchdog.png differ diff --git a/good-errors-leave-trace/index.html b/good-errors-leave-trace/index.html index 5e74d03..5ee3e24 100644 --- a/good-errors-leave-trace/index.html +++ b/good-errors-leave-trace/index.html @@ -266,6 +266,14 @@

Related posts:

+ + + + + + + + diff --git a/hacker-gifts/index.html b/hacker-gifts/index.html index 7fc0ca8..cf9e9fb 100644 --- a/hacker-gifts/index.html +++ b/hacker-gifts/index.html @@ -257,6 +257,14 @@

Related posts:

+ + + + + + + + diff --git a/hello-world/index.html b/hello-world/index.html index 343ef7b..40d81b5 100644 --- a/hello-world/index.html +++ b/hello-world/index.html @@ -206,6 +206,14 @@

Related posts:

+ + + + + + + + diff --git a/how-not-to-flux-loops/index.html b/how-not-to-flux-loops/index.html index 7098177..676db3e 100644 --- a/how-not-to-flux-loops/index.html +++ b/how-not-to-flux-loops/index.html @@ -250,6 +250,14 @@

Related posts:

+ + + + + + + + diff --git a/how-not-to-flux-set-actions/index.html b/how-not-to-flux-set-actions/index.html index f02659b..097cac2 100644 --- a/how-not-to-flux-set-actions/index.html +++ b/how-not-to-flux-set-actions/index.html @@ -283,6 +283,14 @@

Related posts:

+ + + + + + + + diff --git a/how-to-convince-your-boss-to-use-react-native/index.html b/how-to-convince-your-boss-to-use-react-native/index.html index ac0df75..600d1db 100644 --- a/how-to-convince-your-boss-to-use-react-native/index.html +++ b/how-to-convince-your-boss-to-use-react-native/index.html @@ -253,6 +253,14 @@

Related posts:

+ + + + + + + + diff --git a/keynote/index.html b/keynote/index.html index 857ceda..3ffe5f7 100644 --- a/keynote/index.html +++ b/keynote/index.html @@ -248,6 +248,14 @@

Related posts:

+ + + + + + + + diff --git a/macos-app-shortcuts/index.html b/macos-app-shortcuts/index.html index a2d730a..108cf47 100644 --- a/macos-app-shortcuts/index.html +++ b/macos-app-shortcuts/index.html @@ -224,6 +224,14 @@

Related posts:

+ + + + + + + + diff --git a/no-constraints-no-fun/index.html b/no-constraints-no-fun/index.html index 231ae60..8d90c55 100644 --- a/no-constraints-no-fun/index.html +++ b/no-constraints-no-fun/index.html @@ -208,6 +208,14 @@

Related posts:

+ + + + + + + + diff --git a/notify-on-completion/index.html b/notify-on-completion/index.html index 9e4b07c..38c8451 100644 --- a/notify-on-completion/index.html +++ b/notify-on-completion/index.html @@ -244,6 +244,14 @@

Related posts:

+ + + + + + + + diff --git a/octave/index.html b/octave/index.html index f143c04..b14b553 100644 --- a/octave/index.html +++ b/octave/index.html @@ -230,6 +230,14 @@

Related posts:

+ + + + + + + + diff --git a/onityper/index.html b/onityper/index.html index a3aed3c..27d8a5f 100644 --- a/onityper/index.html +++ b/onityper/index.html @@ -292,6 +292,14 @@

Related posts:

+ + + + + + + + diff --git a/plotting-ideas/index.html b/plotting-ideas/index.html index 2bd2edc..d941e64 100644 --- a/plotting-ideas/index.html +++ b/plotting-ideas/index.html @@ -228,6 +228,14 @@

Related posts:

+ + + + + + + + diff --git a/react-and-javascript-in-5-min/index.html b/react-and-javascript-in-5-min/index.html index 6e131a8..776e97a 100644 --- a/react-and-javascript-in-5-min/index.html +++ b/react-and-javascript-in-5-min/index.html @@ -396,6 +396,14 @@

Related posts:

+ + + + + + + + diff --git a/react-api-evolution/index.html b/react-api-evolution/index.html index ee358c4..18b2354 100644 --- a/react-api-evolution/index.html +++ b/react-api-evolution/index.html @@ -459,6 +459,14 @@

Related posts:

+ + + + + + + + diff --git a/react-conf-2018/index.html b/react-conf-2018/index.html index 7b4d0c3..96db4cc 100644 --- a/react-conf-2018/index.html +++ b/react-conf-2018/index.html @@ -303,6 +303,14 @@

Related posts:

+ + + + + + + + diff --git a/replacing-jekyll/index.html b/replacing-jekyll/index.html index 4e7332e..30f7e44 100644 --- a/replacing-jekyll/index.html +++ b/replacing-jekyll/index.html @@ -239,6 +239,14 @@

Related posts:

+ + + + + + + + diff --git a/side-projects-are-hard/index.html b/side-projects-are-hard/index.html index 9d9ad11..7319eed 100644 --- a/side-projects-are-hard/index.html +++ b/side-projects-are-hard/index.html @@ -230,6 +230,14 @@

Related posts:

+ + + + + + + + diff --git a/test-plan/index.html b/test-plan/index.html index f78597c..372074a 100644 --- a/test-plan/index.html +++ b/test-plan/index.html @@ -216,6 +216,14 @@

Related posts:

+ + + + + + + + diff --git a/the-first-react-native-app/index.html b/the-first-react-native-app/index.html index 3262075..c1ef951 100644 --- a/the-first-react-native-app/index.html +++ b/the-first-react-native-app/index.html @@ -240,6 +240,14 @@

Related posts:

+ + + + + + + + diff --git a/using-redux-with-flow/index.html b/using-redux-with-flow/index.html index 1b4831f..7d15bf2 100644 --- a/using-redux-with-flow/index.html +++ b/using-redux-with-flow/index.html @@ -300,6 +300,14 @@

Related posts:

+ + + + + + + + diff --git a/whos-watching-the-watchdog/index.html b/whos-watching-the-watchdog/index.html new file mode 100644 index 0000000..f0ac6f7 --- /dev/null +++ b/whos-watching-the-watchdog/index.html @@ -0,0 +1,239 @@ + + + + + + + Who's watching the watchdog? / frantic.im + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +

Who's watching the watchdog?

+
+

At my current company we have an automated pipeline for processing customer orders. It’s pretty complex: it talks to multiple services, trains models, stores large files, updates the database, and sends emails and push notifications.

+

Sometimes things get stuck because of a temporary third-party outage or a bug in our code.

+

So we built a watchdog service: it monitors the stream of orders and makes sure each order gets processed within a reasonable timeframe (3 hours). The watchdog only checks the final invariant: was the order fulfilled and delivered to the customer? It doesn’t care about any of the intermediate steps.
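Here is a minimal Python sketch of the end-to-end check described above. It assumes that each order record exposes a creation timestamp and a single `delivered` flag standing in for the final invariant. `Order`, `find_stuck_orders` and `format_alert` are hypothetical names, not the actual service, and posting to Slack is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# From the post: orders should be processed within 3 hours.
DEADLINE = timedelta(hours=3)


@dataclass
class Order:
    # Hypothetical record; the real pipeline's schema is not described.
    order_id: str
    created_at: datetime
    delivered: bool  # the final invariant: fulfilled and delivered


def find_stuck_orders(orders: list[Order], now: datetime,
                      deadline: timedelta = DEADLINE) -> list[Order]:
    """Return orders past the deadline that still violate the final invariant.

    Intermediate pipeline steps are deliberately ignored; only `delivered` counts.
    """
    return [o for o in orders
            if not o.delivered and now - o.created_at > deadline]


def format_alert(order: Order, now: datetime) -> str:
    # Message text for the alerts channel; the actual Slack call is omitted.
    hours = (now - order.created_at).total_seconds() / 3600
    return f"Order {order.order_id} stuck for {hours:.1f}h"


now = datetime(2023, 11, 22, 12, 0, tzinfo=timezone.utc)
orders = [
    Order("a", now - timedelta(hours=5), delivered=False),  # stuck
    Order("b", now - timedelta(hours=5), delivered=True),   # done
    Order("c", now - timedelta(hours=1), delivered=False),  # still within the deadline
]
for order in find_stuck_orders(orders, now):
    print(format_alert(order, now))  # prints: Order a stuck for 5.0h
```

Checking only the end state keeps the watchdog independent of how the pipeline is built. A new failure mode in any intermediate step still shows up as an undelivered order.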

+

This system has saved us many times. When the watchdog finds a stuck order, it posts to a dedicated Slack channel. We investigate the problem and fix the root cause, so hopefully we won’t see new orders stuck for the same reason.

+

But who’s watching the watchdog? What if it fails to run?

+

It actually happened to us once. The watchdog runs on our job scheduling system, and that system went down. No orders were getting processed, and the watchdog wasn’t running either. The alerts channel in Slack was blissfully silent.

+

To cover this case, we need a system that watches the watchdog. We use these two:

+ +

Both systems work the same way: they expect a regular cron job to “check in” on a predefined schedule. If a check-in is missed, something is likely wrong, and we get an alert in Slack.
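This check-in pattern is often called a heartbeat or a dead man's switch. Below is a minimal Python sketch of the idea. `HeartbeatMonitor` and its methods are hypothetical and are not the API of either hosted service. The 5-minute grace period, which absorbs jitter in the cron schedule, is an assumption, and sending the Slack alert is left out.

```python
from datetime import datetime, timedelta, timezone


class HeartbeatMonitor:
    """Hypothetical dead man's switch: flags jobs that stop checking in."""

    def __init__(self) -> None:
        self._schedule: dict[str, tuple[timedelta, timedelta]] = {}  # job -> (period, grace)
        self._last_seen: dict[str, datetime] = {}  # job -> last check-in time

    def register(self, job: str, period: timedelta, now: datetime,
                 grace: timedelta = timedelta(minutes=5)) -> None:
        self._schedule[job] = (period, grace)
        # Start the clock at registration so a job that never checks in is caught too.
        self._last_seen[job] = now

    def check_in(self, job: str, now: datetime) -> None:
        self._last_seen[job] = now

    def overdue(self, now: datetime) -> list[str]:
        """Jobs whose last check-in is older than their period plus grace."""
        return [job for job, (period, grace) in self._schedule.items()
                if now - self._last_seen[job] > period + grace]


start = datetime(2023, 11, 22, 12, 0, tzinfo=timezone.utc)
monitor = HeartbeatMonitor()
monitor.register("watchdog", period=timedelta(minutes=15), now=start)

# The watchdog pings at the end of every successful run.
monitor.check_in("watchdog", start + timedelta(minutes=15))
print(monitor.overdue(start + timedelta(minutes=25)))  # prints: []

# The scheduler goes down, so no more check-ins arrive.
print(monitor.overdue(start + timedelta(minutes=40)))  # prints: ['watchdog']
```

The key design choice is where the monitor runs. If it lived on the same job scheduler as the watchdog, the outage described above would silence both. That is presumably why separate services handle this role.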

+

Complex systems always find surprising ways to fail. By adding an end-to-end quality watchdog (and a way to watch the watchdog), you create a positive feedback loop of detecting issues and hardening the system.

+ + + + + + + + + +
+
+

Hello! This text lives here to convince you to subscribe. If you are reading this, consider clicking that subscribe button for more details.

+

I write about programming, software design and side projects Subscribe

+
+
+ +
+ + + + +