Scalable data ingestion architecture with microservices #53
Comments
I agree with this. I'm not sure that bundling in the individual libs/packages (specifically for encoders) is the best route forward. How would one accomplish the layering part?
Good question! One rarely finds a one-size-fits-all solution, but I take inspiration from the systems that drive modern technology (Elasticsearch, Netflix, Kubernetes). To get a clearer idea I'd need to understand the expected super-rag workflow in detail (maybe a flow chart), but essentially we are looking at two fundamental design patterns:
a) Producer-Consumer: the traditional Queue-Broker/Exchange-Celery pub/sub as you know it.
b) Control Plane - Data Plane: the pipeline is entirely driven by the topic message [payload]. It is comparable to LangGraph's "Graph", composed of nodes and edges that implement an upstream/downstream communication system.
From my experience, the traditional design (a) might not fit the kind of scalability modern technology demands (and super-rag might be such a case), unless you want a DevOps team shooting themselves in the foot as queues start overflowing overnight.
You may find detailed context in these articles:
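To make pattern (b) a bit more concrete, here is a minimal in-process sketch of a payload-driven pipeline. All names (`Message`, `STAGES`, `run_pipeline`, the `split` and `count` stages) are hypothetical and not taken from super-rag. In a real deployment each stage would presumably be a separate worker consuming from a topic; here the stages are plain functions so the routing idea stays visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """Hypothetical topic message: the payload carries its own route."""
    document: str
    # Control information: ordered stage names still to be applied.
    route: List[str]
    # Data: results accumulated by the stages that already ran.
    results: Dict[str, object] = field(default_factory=dict)


def split(msg: Message) -> None:
    # Illustrative stage: naive sentence splitting.
    msg.results["chunks"] = msg.document.split(". ")


def count(msg: Message) -> None:
    # Illustrative stage: depends on the output of the upstream "split" stage.
    msg.results["n_chunks"] = len(msg.results["chunks"])


# Registry of available stages (the "nodes"); the route in each message
# supplies the "edges".
STAGES: Dict[str, Callable[[Message], None]] = {"split": split, "count": count}


def run_pipeline(msg: Message) -> Message:
    """Apply stages in the order the message itself dictates."""
    while msg.route:
        stage_name = msg.route.pop(0)
        STAGES[stage_name](msg)
    return msg
```

The point of the sketch is that nothing outside the message decides what happens next: adding a new strategy means registering a stage and naming it in the route, without touching the other stages.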
@teocns I don't think we should worry about this, since we prefer horizontal scaling over serverless functions at the moment. So that's not a big deal, I guess.
@elisalimli Are we looking at a monolithic multi-threaded worker application sharing the same process runtime?
I added some optimisations in there now to allow for some concurrency without hitting rate limits.
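The thread does not show which optimisation was added, so the following is only a sketch of one common way to get concurrency while staying under a provider's limits: capping the number of in-flight requests with `asyncio.Semaphore`. The names `gather_bounded` and `fake_embed` are hypothetical, and a semaphore bounds concurrency rather than requests per second, so it is an approximation of rate limiting.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def gather_bounded(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[int]],
    limit: int = 5,
) -> List[int]:
    """Run `worker` over `items` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: str) -> int:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_embed(text: str) -> int:
    # Stand-in for an I/O-bound API call (hypothetical).
    await asyncio.sleep(0.01)
    return len(text)
```

For example, `asyncio.run(gather_bounded(["a", "bb", "ccc"], fake_embed, limit=2))` processes three items with no more than two requests active at once.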
@homanp That's a perfect solution for non-blocking I/O, though let's keep in mind that it runs on a single CPU core. Depending on how we plan to deploy this in the future, we might want to use multiprocessing pools and let the user specify
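As a rough illustration of the multiprocessing idea, here is a sketch that spreads CPU-bound work over a process pool. The setting the user would specify is not shown in the comment above; this sketch assumes it is the number of worker processes. `resolve_workers`, `cpu_heavy` and `process_all` are hypothetical names.

```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional


def resolve_workers(requested: Optional[int] = None) -> int:
    """Use the caller's value if given, else one process per available core."""
    if requested is not None:
        if requested < 1:
            raise ValueError("worker count must be at least 1")
        return requested
    return os.cpu_count() or 1


def cpu_heavy(text: str) -> int:
    # Stand-in for CPU-bound work such as encoding (hypothetical).
    # It is a top-level function so it can be pickled and sent to workers.
    return sum(ord(ch) for ch in text)


def process_all(texts: List[str], workers: Optional[int] = None) -> List[int]:
    """Run cpu_heavy over texts in separate processes, preserving order."""
    with ProcessPoolExecutor(max_workers=resolve_workers(workers)) as pool:
        return list(pool.map(cpu_heavy, texts))
```

When calling `process_all` from a script, wrap the call in an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.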
Makes sense
I see two ways forward:
Not sure which approach is best atm.
For the moment, yes.
I have now decoupled the
While building the Docker image, I noticed that it was 7 GB in size and that the build took 5 minutes to complete.
It turns out PyTorch is responsible for pulling in the massive NVIDIA driver libraries packaged in `torch[gpu]`, while its sibling `torch[cpu]` bottoms out at 56 MB.
pip list sorted by size
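The original listing is not reproduced here. As a sketch of how such a size-sorted package list could be produced, the snippet below sums the on-disk size of every file recorded for each installed distribution, using only the standard library's `importlib.metadata`. The helper names `installed_sizes` and `format_size` are hypothetical, and the numbers will differ from what `pip` or Docker report, since only files listed in each package's metadata are counted.

```python
import os
from importlib import metadata
from typing import Dict, List, Tuple


def installed_sizes() -> List[Tuple[str, int]]:
    """Return (package name, size in bytes) pairs, largest first."""
    sizes: Dict[str, int] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or "unknown"
        total = 0
        # dist.files is None when the distribution has no file record.
        for package_path in dist.files or []:
            try:
                total += os.path.getsize(str(package_path.locate()))
            except OSError:
                # Recorded file is missing on disk; skip it.
                continue
        sizes[name] = sizes.get(name, 0) + total
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


def format_size(num_bytes: int) -> str:
    """Render a byte count with a binary unit suffix."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


if __name__ == "__main__":
    for package, size in installed_sizes()[:10]:
        print(f"{format_size(size):>12}  {package}")
```

Running this inside the built image should make heavy dependencies stand out at the top of the list.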
Being new to Poetry, I went through several community discussions covering the same kind of issue:
While the solution could have been as simple as adding `extras = [ "cpu" ]` or setting the `/cpu` branch as the `torch` wheels source URL, that wasn't possible.
The last resort is bundling a list of `.whl` URLs passed to PyTorch's dependency source by intersecting:
The list got quite long, and the build took longer (3x compared to the original).
The XY problem?
Super-rag's vision is a highly available and scalable API backed by workers, which points to a microservice-oriented architecture.
Torch is a heavy-lifting CPU/GPU-bound toolkit that should be decoupled from the IO-bound API. It is one use case, and as the project grows, other "strategies" will probably be implemented, each with its own use cases.
It is essential for workers' images to be minimal. The image size directly impacts launch time [availability]: you want a worker's image to be pulled, loaded into memory, and started as quickly as possible.
Therefore, analyzing and understanding the use cases for dependencies helps identify common libraries or services that can be defined as reusable, granular image layers.
For now, my proposed solution is to have individual images (or layers) that ship only with what is strictly necessary for a given triplet (platform, architecture, Python version).
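One way to picture the triplet idea is a small helper that derives the triplet for the current interpreter and maps it to an image tag. This is only a sketch: the `Triplet` type, the `image_tag` function, the `encoder-cpu` strategy name and the tag format are all hypothetical, not an existing super-rag convention.

```python
import platform
import sys
from typing import NamedTuple


class Triplet(NamedTuple):
    """(platform, architecture, Python version) as used in the proposal."""
    os_name: str
    arch: str
    python: str


def current_triplet() -> Triplet:
    """Describe the interpreter this code is running on."""
    return Triplet(
        os_name=sys.platform,              # e.g. "linux" or "darwin"
        arch=platform.machine().lower(),   # e.g. "x86_64" or "arm64"
        python=f"{sys.version_info.major}.{sys.version_info.minor}",
    )


def image_tag(strategy: str, triplet: Triplet) -> str:
    """Build a hypothetical image tag for one strategy and one triplet."""
    return f"{strategy}-{triplet.os_name}-{triplet.arch}-py{triplet.python}"
```

For instance, `image_tag("encoder-cpu", Triplet("linux", "x86_64", "3.11"))` yields `encoder-cpu-linux-x86_64-py3.11`, so each worker pulls only the image built for its own environment.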
Feel free to brainstorm with me on this subject; ideas are always welcome!