Skip to content

Latest commit

 

History

History
90 lines (64 loc) · 3.97 KB

README.md

File metadata and controls

90 lines (64 loc) · 3.97 KB

crowd

High Available Reverse Proxy for Asynchronous Message Consumption

Status

⚠️ At the moment the project is just a spike, don't take it too seriously.

Motivation

The project started because of a common usecase for an endpoint (HTTP) to ingest data, where the user on the client side expects to receive a delivery confirmation that the message has been received, yet the response itself doens't matter.

On a company that I used to work for, that was the case for the entry point of leads, which is the source of 💰 that must be always available.

As the image above shows, both the database writes and mailing were triggered on time of the request. We'll have seen that, is a pretty common pattern. The problem is that scaling it creates all sorts of issues, mostly unnecessary ones. People try to scale the database to handle the back-pressure, which even if you have to do it, you don't want to depend entirely on a database write to handle enormous data ingestion.

Also, you probably handle different loads during the day, or time of the year, which would make you pay for a database on the worst-case-scenario but only make use of it during the spikes.

e.g.:

Request:

POST /leads HTTP/1.1
Content-Type: application/json
Accept: application/json
Content-Length: 327
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0.1 Safari/605.1.15

{"message":"this is a lead message", "shop": "a23m07b23"}

Response:

HTTP/1.1 200
Content-Type: application/json;charset=UTF-8
Vary: Accept-Encoding
Connection: keep-alive
Date: Sun, 09 Dec 2018 11:07:54 GMT
Content-Encoding: gzip
Transfer-Encoding: Identity

{"status": "OK"}

crowd: An alternative

The whole point of crowd is to have a high available endpoint to avoid package loss, disregarding the processing. If you need to return something to the user on the time of the request, it's probably not for you.

Which is a quite common pattern for data-ingestion in general: statistics, events, tracking, order-placement, queue items, ...

That way you can easily control an acceptable throughput to the rest of your system.

You can use crowd to handle back-pressure on existing endpoints and scale your current REST APIs to consume the messages later:

Or even as an entrypoint to your stream/queues with multiple consumers:

Disclaimer

It is important to notice that, for most projects, you could probably rewrite your endpoints to push directly to queues and call it a day. However, the scenario where this idea was conceived would involve having to come up with a standard way of doing this across multiple languages, frameworks, teams,... which would be more expensive than implementing only once and solving it as an infrastructural problem.

Contributing

Commands

make help                           Lists the available commands
make install                        Installs app dependencies
make run                            Runs the app
make test                           Runs tests
make compose                        Runs docker compose dependencies
make decompose                      Stops docker compose dependencies