Release v1.1.1 · milakov/nnForge

All convolutional updaters, testers and hessians in the CUDA backend now use a space-filling curve, which improves performance when training large networks
Training now runs concurrently with loading/processing of input data at all stages, with data loaded in a separate host thread, CUDA backend only
In-memory supervised data reader added
Added NVTX profiling for reading input data, CUDA backend only
Fixed:
- Binding a texture to a linear buffer that is too large
- Incorrect average subsampling backprop in the CUDA backend for non-even configs
- Performance on Windows with the WDDM driver
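
The idea of loading input data in a separate host thread, noted above, can be sketched roughly as follows. This is a minimal illustration in standard C++ and not nnForge's actual code: `load_chunk`, `process_chunk` and `run_all_chunks` are hypothetical names, and the "processing" step is a plain sum standing in for a training step on the device.

```cpp
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for reading one chunk of input data.
std::vector<float> load_chunk(std::size_t chunk_id, std::size_t chunk_size)
{
    std::vector<float> chunk(chunk_size);
    std::iota(chunk.begin(), chunk.end(), static_cast<float>(chunk_id * chunk_size));
    return chunk;
}

// Hypothetical stand-in for the work done on one chunk (e.g. a training step).
double process_chunk(const std::vector<float>& chunk)
{
    return std::accumulate(chunk.begin(), chunk.end(), 0.0);
}

// Processes chunk i while chunk i + 1 is being loaded in a separate host thread.
double run_all_chunks(std::size_t chunk_count, std::size_t chunk_size)
{
    double total = 0.0;
    if (chunk_count == 0)
        return total;

    // Start loading the first chunk asynchronously.
    std::future<std::vector<float>> next =
        std::async(std::launch::async, load_chunk, std::size_t{0}, chunk_size);

    for (std::size_t i = 0; i < chunk_count; ++i)
    {
        // Wait for the chunk that was being loaded in the background.
        std::vector<float> current = next.get();

        // Kick off loading of the following chunk before processing this one.
        if (i + 1 < chunk_count)
            next = std::async(std::launch::async, load_chunk, i + 1, chunk_size);

        total += process_chunk(current);
    }
    return total;
}
```

Calling `run_all_chunks(3, 4)` loads three chunks of four values each; each load after the first overlaps with processing of the previous chunk.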
