I am developing this project to learn C++ and get hands-on experience with inference engines.
- Clone and build the project:

```bash
git clone [email protected]:MichalPitr/inference_engine.git
cd inference_engine
sh build.sh
```
CMake will fail if system dependencies are missing: protobuf, GTest, Google Benchmark, and yaml-cpp.
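On Debian/Ubuntu, something like the following should install them; these package names are an assumption and may differ on other distributions.

```bash
# Assumed Debian/Ubuntu package names; adjust for your distribution.
sudo apt-get install -y protobuf-compiler libprotobuf-dev \
    libgtest-dev libbenchmark-dev libyaml-cpp-dev
```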
The steps below start an HTTP server and use a Python script to send requests. You can do the equivalent with curl from the command line; a sketch follows the steps.
- Build the project as described above, then start the Go server:

```bash
cd server
go run main.go
```
- In another terminal, activate the Python virtual environment and send a request:

```bash
cd utils
source venv/bin/activate
python infer_server.py
```
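If you prefer curl, a request might look like this sketch. The port, route, and payload shape here are assumptions, not the server's confirmed API; check server/main.go and utils/infer_server.py for the actual contract.

```bash
# Hypothetical request: the port, route, and JSON fields are assumptions;
# see server/main.go for the real handler and expected payload.
curl -X POST http://localhost:8080/infer \
    -H "Content-Type: application/json" \
    -d '{"input": [0.0, 0.1, 0.2, 0.3]}'
```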
- Optimize CUDA kernels. GEMM is very naive at the moment.
- Add dynamic batching to the Go server.
- Add graph optimizations.
- Add input validation to the Go server.
- Optimize memory allocator usage: available memory should be checked during model loading, since total memory usage can be estimated fairly accurately.
- Improve error handling.
- Explore NVTX profiling (see the sketch after this list).
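For the NVTX item, Nsight Systems can already capture CUDA API calls, kernels, and NVTX ranges out of the box. A minimal sketch, where the binary path and argument are placeholders for whatever build.sh actually produces:

```bash
# Capture a timeline with CUDA and NVTX tracing enabled.
# "./build/inference_engine model.yaml" is a placeholder invocation;
# substitute the actual binary and arguments.
nsys profile --trace=cuda,nvtx -o report ./build/inference_engine model.yaml
```

Annotating hot paths with nvtxRangePushA/nvtxRangePop then makes them show up as named ranges in the timeline.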
This project wasn't designed with external contributions in mind, but if you fancy, improvements are welcome!
I enjoy writing technical blog posts and I've written some about this project: