C++ Inference Engine from scratch

I am developing this project to learn C++ and get hands-on experience with inference engines.

How to build

CMake will fail to configure if any of these system dependencies are missing: protobuf, GoogleTest (gtest), Google Benchmark and yaml-cpp.

This starts an HTTP server and uses Python to send requests to it. You can do the same from the command line with curl.

Optimize CUDA kernels. The GEMM kernel is very naive at the moment.
Add dynamic batching to the Go server.
Add graph optimizations.
Add input validation to the Go server.
Optimize memory allocator usage: check available memory during loading, since total memory usage can be estimated fairly accurately.
Improve error handling.
Explore NVTX profiling.
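To illustrate why a naive GEMM leaves performance on the table, here is a minimal CPU sketch in C++ (not the project's CUDA kernel) contrasting a plain triple loop with a loop-tiled version. Tiling is the usual first optimization for GEMM; in a CUDA kernel each thread block would typically stage its tiles of A and B in shared memory. The function names `gemm_naive` and `gemm_tiled` and the tile size of 32 are hypothetical choices for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrices: A is M x K, B is K x N, C is M x N.

// Naive triple loop: every C(i, j) walks a full row of A and a full column
// of B, so B is streamed from memory M times with poor cache locality.
void gemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Hypothetical tile size; the best value depends on cache (or shared memory) size.
constexpr std::size_t TILE = 32;

// Tiled version: compute C one TILE x TILE block at a time, walking K in
// TILE-sized chunks so the active blocks of A, B and C stay small enough
// to remain in cache while they are reused.
void gemm_tiled(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += TILE)
        for (std::size_t j0 = 0; j0 < N; j0 += TILE)
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                const std::size_t i_end = std::min(i0 + TILE, M);
                const std::size_t j_end = std::min(j0 + TILE, N);
                const std::size_t k_end = std::min(k0 + TILE, K);
                for (std::size_t i = i0; i < i_end; ++i)
                    for (std::size_t k = k0; k < k_end; ++k) {
                        const float a = A[i * K + k];  // reused across the j loop
                        for (std::size_t j = j0; j < j_end; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
}
```

Both functions compute the same product; only the memory access order changes. Because the summation order differs, results on arbitrary floats may differ in the last few bits.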
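The memory-estimation idea from the list could look roughly like the following C++ sketch. It assumes every tensor's shape and element size are known once the model is loaded, and it treats each tensor as a separate allocation, so the total is an upper bound that ignores buffer reuse, alignment padding and allocator fragmentation. `TensorInfo`, `tensor_bytes`, `estimate_total_bytes` and `fits_in_memory` are hypothetical names, not this project's API. On the GPU, `available_bytes` could come from the free-memory value reported by the CUDA runtime's `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical description of a tensor (weight or activation) in a loaded model.
struct TensorInfo {
    std::string name;
    std::vector<std::size_t> shape;  // empty shape means a scalar
    std::size_t element_size;        // bytes per element, e.g. 4 for float32
};

// Bytes for one tensor: product of its dimensions times the element size.
std::size_t tensor_bytes(const TensorInfo& t) {
    assert(t.element_size != 0);
    const std::size_t count = std::accumulate(t.shape.begin(), t.shape.end(),
                                              std::size_t{1}, std::multiplies<>());
    return count * t.element_size;
}

// Upper-bound estimate: assumes each tensor gets its own allocation.
std::size_t estimate_total_bytes(const std::vector<TensorInfo>& tensors) {
    std::size_t total = 0;
    for (const auto& t : tensors) total += tensor_bytes(t);
    return total;
}

// Check during loading, before allocating anything, whether the model fits.
bool fits_in_memory(const std::vector<TensorInfo>& tensors, std::size_t available_bytes) {
    return estimate_total_bytes(tensors) <= available_bytes;
}
```

Failing early at load time with a clear message is friendlier than running out of memory halfway through the first inference request.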

This project wasn't designed with external contributions in mind, but if you fancy it, improvements are welcome!

I enjoy writing technical blog posts, and I've written a few about this project: