NVDLA accelerates neural network inference in two steps:
- Optimize the trained neural network for the DLA hardware and convert the graph to DLA hardware instructions. The converted graph is saved to a flatbuffer file called a loadable. This step uses the NVDLA compiler and runs offline on the host system.
- Run the inference job on the DLA using the loadable from step 1. This step uses the NVDLA runtime and runs on the target system.
This section explains how to run the test application on the available NVDLA platforms and lists its dependencies. The first dependency is a loadable generated by the NVDLA compiler. Refer to NVDLA Compiler for details on generating a loadable for a network.
The ResNet-50 model from https://github.com/KaimingHe/deep-residual-networks is verified on this platform for all configurations (nv_full/nv_large/nv_small) and is a good starting point.
This section explains how to run the test application in a Docker container that has pre-built binaries for the nv_full configuration.
docker pull nvdla/vp
docker run -it -v /home:/home nvdla/vp
cd /usr/local/nvdla
aarch64_toplevel -c aarch64_nvdla.lua
mount -t 9p -o trans=virtio r /mnt
cd /mnt
insmod drm.ko
insmod opendla_1.ko
Expected output after installing NVDLA driver
[ 310.625140] opendla: loading out-of-tree module taints kernel.
[ 310.629362] 0 . 12 . 5
[ 310.629567] reset engine done
[ 310.633122] [drm] Initialized nvdla 0.0.0 20171017 for 10200000.nvdla on minor 0
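Rather than reading the log by eye, the driver load can be confirmed by searching the kernel log for the DRM initialization line shown in the expected output. The sketch below is only an illustration: `nvdla_driver_ready` is a hypothetical helper name, and it assumes the log text arrives on standard input (for example from `dmesg`, if the rootfs provides it).

```shell
# Hypothetical helper: succeeds if the kernel log on stdin contains the
# "[drm] Initialized nvdla" line shown in the expected output.
nvdla_driver_ready() {
    grep -q '\[drm\] Initialized nvdla'
}

# Usage on the target (assumes dmesg is available in the rootfs):
#   dmesg | nvdla_driver_ready && echo "NVDLA driver loaded"
```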
./nvdla_runtime --loadable fast-math.nvdla --image 0000.jpg
fast-math.nvdla : loadable generated from NVDLA compiler
0000.jpg : 224x224 image for ResNet-50 model
Executing ResNet-50 on the virtual platform takes a very long time: about 2.5 hours for fp16 and about 5 hours for int8. The simulator may appear to hang at times; wait for it to finish.
ctrl+a x
This section explains how to run the test application on the virtual platform without any pre-built binaries. Assumption: the host system is Ubuntu 16.04.
- Install system requirements as per System Requirements for Virtual Platform
- Build and install Buildroot
- Create required directories
mkdir -p /usr/local/nvdla/images/linux-4.13.3
- Copy Linux kernel image, rootfs and drm driver
cp {buildroot-root}/output/images/Image /usr/local/nvdla/images/linux-4.13.3/
cp {buildroot-root}/output/images/rootfs.ext4 /usr/local/nvdla/images/linux-4.13.3/
cp {buildroot-root}/output/build/linux-4.13.3/drivers/gpu/drm/drm.ko /usr/local/nvdla/
- Build NVDLA Kernel Driver
- Copy NVDLA kernel driver
cp {sw-repo-root}/kmd/port/linux/opendla.ko /usr/local/nvdla/
- Build virtual simulator
- Install virtual simulator
cp {vp-repo-root}/build/bin/aarch64_toplevel /usr/bin/
cp {vp-repo-root}/build/lib/libcosim_sc_wrapper.so /usr/lib/
cp {vp-repo-root}/build/lib/libnvdla.so /usr/lib/
cp {vp-repo-root}/build/lib/libqbox-nvdla.so /usr/lib/
cp {vp-repo-root}/build/lib/liblog.so /usr/lib/
cp {vp-repo-root}/build/lib/libnvdla_cmod.so /usr/lib/
cp {vp-repo-root}/build/lib/libsimplecpu.so /usr/lib/
cp {vp-repo-root}/conf/aarch64_nvdla.lua /usr/lib/
cp {vp-repo-root}/libs/qbox.build/share/qemu/efi-virtio.rom /usr/local/nvdla
- Build NVDLA runtime
- Copy runtime lib and app
cp {sw-repo-root}/umd/out/apps/runtime/nvdla_runtime/nvdla_runtime /usr/local/nvdla
cp {sw-repo-root}/umd/out/core/src/runtime/libnvdla_runtime/libnvdla_runtime.so /usr/local/nvdla
- Download ResNet-50 caffe model from https://github.com/KaimingHe/deep-residual-networks
- Generate loadable using NVDLA Compiler
- Run simulator
- Insert kernel driver modules
- Run application
- Exit virtual simulator
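The "Install virtual simulator" step above copies one binary and six shared libraries by hand. A minimal sketch of the same copies as a loop follows. The function name `install_vp` and its two arguments are assumptions of this example: the first stands in for `{vp-repo-root}`, and the optional second is a staging prefix so the copies can be tried in a scratch directory first. The configuration file and ROM copies are left as in the listing.

```shell
# Sketch: copy the virtual simulator binary and the libraries listed above.
#   $1  path standing in for {vp-repo-root}
#   $2  optional staging prefix (empty means install under /)
install_vp() {
    vp_root=$1
    prefix=${2:-}
    mkdir -p "$prefix/usr/bin" "$prefix/usr/lib" || return 1
    cp "$vp_root/build/bin/aarch64_toplevel" "$prefix/usr/bin/" || return 1
    for lib in libcosim_sc_wrapper libnvdla libqbox-nvdla liblog libnvdla_cmod libsimplecpu; do
        # Stop at the first missing library so a partial build is noticed.
        cp "$vp_root/build/lib/$lib.so" "$prefix/usr/lib/" || return 1
    done
}

# Example: install_vp "$HOME/vp" /tmp/stage
```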
This section explains how to run the ResNet-50 caffe model on the FireSim platform. This platform currently supports only the nv_large and nv_small configurations.
- If FireSim is not yet set up, complete the setup first and then return here; otherwise jump to step 2.
- Configure FireSim for NVDLA (nv_large/nv_small)
- Build NVDLA runtime for RISC-V
export TOP={firesim-nvdla-repo}/sw/firesim-software/nvdla/sw/umd
make TOOLCHAIN_PREFIX={firesim-nvdla-repo}/riscv-tools-install/bin/riscv64-unknown-linux-gnu- runtime
- Copy NVDLA runtime lib and test application
cp {firesim-nvdla-repo}/sw/firesim-software/nvdla/sw/umd/out/core/src/runtime/libnvdla_runtime/libnvdla_runtime.so {firesim-nvdla-repo}/sw/firesim-software/workloads/nvdla/overlay/root/nvdla/
cp {firesim-nvdla-repo}/sw/firesim-software/nvdla/sw/umd/out/apps/runtime/nvdla_runtime/nvdla_runtime {firesim-nvdla-repo}/sw/firesim-software/workloads/nvdla/overlay/root/nvdla/
- Copy loadable and image
cp {some-path}/fast-math.nvdla {firesim-nvdla-repo}/sw/firesim-software/workloads/nvdla/overlay/root/nvdla/
cp {some-path}/0000.jpg {firesim-nvdla-repo}/sw/firesim-software/workloads/nvdla/overlay/root/nvdla/
- Build NVDLA software
cd firesim-nvdla/sw/firesim-software
./marshal -v build workloads/nvdla.json
./marshal install workloads/nvdla.json
- Launch simulation
- After logging in to the system, run the NVDLA runtime test application
./nvdla_runtime --loadable fast-math.nvdla --image 0000.jpg
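A file missing from the overlay only shows up after the simulation has booted, so it can help to check the overlay directory before building the workload image. The sketch below looks for the four files copied in the steps above. `check_overlay` is a hypothetical helper name; the file names are the ones used in this section.

```shell
# Sketch: verify the overlay directory holds everything the test needs.
# Prints each missing file and returns non-zero if any are absent.
check_overlay() {
    overlay=$1
    missing=0
    for f in libnvdla_runtime.so nvdla_runtime fast-math.nvdla 0000.jpg; do
        if [ ! -f "$overlay/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path from the copy steps above):
#   check_overlay {firesim-nvdla-repo}/sw/firesim-software/workloads/nvdla/overlay/root/nvdla
```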
The NVDLA compiler optimizes a neural network for the DLA hardware architecture and creates the list of hardware instructions needed to run inference on the DLA. The compiler can be built from source code, or the pre-compiled binary can be used directly.
Usage: ./nvdla_compiler [options] --prototxt <prototxt_file> --caffemodel <caffemodel_file>
where options include:
-h print this help message
-o <outputpath> outputs wisdom files in 'outputpath' directory
--profile <basic|default|performance|fast-math> computation profile (default: fast-math)
--cprecision <fp16|int8> compute precision (default: fp16)
--configtarget <nv_full|nv_large|nv_small> target platform (default: nv_full)
--calibtable <int8 calibration table> calibration table for INT8 networks (default: 0.00787)
--quantizationMode <per-kernel|per-filter> quantization mode for INT8 (default: per-kernel)
--batch batch size (default: 1)
--informat <ncxhwx|nchw|nhwc> input data format (default: nhwc)
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel -o . --profile fast-math --cprecision int8 --configtarget nv_small --calibtable resnet50.json --quantizationMode per-filter --batch 1 --informat nhwc
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel -o . --profile fast-math --cprecision int8 --configtarget nv_large --calibtable resnet50.json --quantizationMode per-filter --batch 1 --informat nhwc
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel -o . --profile fast-math --cprecision fp16 --configtarget nv_full --batch 1 --informat nhwc
./nvdla_compiler --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel -o . --profile fast-math --cprecision int8 --configtarget nv_full --calibtable resnet50.json --quantizationMode per-filter --batch 1 --informat nhwc
Once compilation succeeds, the compiler generates a .nvdla file in the output directory specified with the -o argument. For example, the commands above generate fast-math.nvdla in the current directory.
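The example commands above differ only in the target, the precision, and the two INT8-specific options. The sketch below builds the same command lines from those inputs, using only flags from the usage text. The function name `compiler_cmd` and the default calibration file name are illustrative, and the function prints the command instead of running it. Note that the examples in this section show only int8 for nv_small and nv_large; the sketch does not check that combination.

```shell
# Sketch: print an nvdla_compiler command line for a target and precision.
#   $1  target: nv_full | nv_large | nv_small
#   $2  precision: fp16 | int8
#   $3  optional calibration table (used for int8 only)
compiler_cmd() {
    target=$1
    precision=$2
    calib=${3:-resnet50.json}
    cmd="./nvdla_compiler --prototxt ResNet-50-deploy.prototxt --caffemodel ResNet-50-model.caffemodel -o . --profile fast-math --cprecision $precision --configtarget $target"
    if [ "$precision" = int8 ]; then
        # The INT8 examples pass a calibration table and a quantization mode.
        cmd="$cmd --calibtable $calib --quantizationMode per-filter"
    fi
    echo "$cmd --batch 1 --informat nhwc"
}

compiler_cmd nv_full fp16
compiler_cmd nv_small int8
```

Piping the printed line to a shell, or replacing `echo` with `eval`, would run the compiler; printing first makes it easy to compare against the examples above.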
To modify the NVDLA compiler, update the source code and rebuild it as below
export TOP={sw-repo-root}/umd
make compiler
Note:
If the compiler build fails with a linking error against the protobuf library, rebuild the protobuf library as below
./configure --enable-shared
make check
sudo make install
The NVDLA runtime runs inference on a DLA platform using a loadable generated by the NVDLA compiler. The runtime can be built from source code, or the pre-compiled binary for ARM64 or RISC-V can be used directly.
Usage: ./nvdla_runtime [-options] --loadable <loadable_file>
where options include:
-h print this help message
-s launch test in server mode
--image <file> input jpg/pgm file
--normalize <value> normalize value for input image
--mean <value> comma separated mean value for input image
--rawdump dump raw dimg data
./nvdla_runtime --loadable fast-math.nvdla --image 0000.jpg --rawdump
To modify the NVDLA runtime, update the source code and rebuild it as below
export TOP={sw-repo-root}/umd
make TOOLCHAIN_PREFIX=<path_to_toolchain> runtime
For example:
export TOP={sw-repo-root}/umd
make TOOLCHAIN_PREFIX={buildroot-root}/output/host/bin/aarch64-linux-gnu- runtime
export TOP={firesim-nvdla-repo}/sw/firesim-software/nvdla/sw/umd
make TOOLCHAIN_PREFIX={firesim-nvdla-repo}/riscv-tools-install/bin/riscv64-unknown-linux-gnu- runtime
The ARM64 build depends on the Buildroot installation.
The RISC-V build depends on the RISC-V tools installation.
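The two build examples above differ only in `TOP` and the toolchain prefix. The sketch below prints the matching commands for each architecture. The function name and the variables `SW_ROOT`, `BUILDROOT` and `FIRESIM` are assumptions of this example, standing in for `{sw-repo-root}`, `{buildroot-root}` and `{firesim-nvdla-repo}`.

```shell
# Sketch: print the runtime build commands for a given architecture.
# SW_ROOT, BUILDROOT and FIRESIM are placeholder variables for
# {sw-repo-root}, {buildroot-root} and {firesim-nvdla-repo}.
runtime_build_cmd() {
    case $1 in
        arm64)
            top="$SW_ROOT/umd"
            prefix="$BUILDROOT/output/host/bin/aarch64-linux-gnu-"
            ;;
        riscv)
            top="$FIRESIM/sw/firesim-software/nvdla/sw/umd"
            prefix="$FIRESIM/riscv-tools-install/bin/riscv64-unknown-linux-gnu-"
            ;;
        *)
            echo "unknown architecture: $1" >&2
            return 1
            ;;
    esac
    echo "export TOP=$top"
    echo "make TOOLCHAIN_PREFIX=$prefix runtime"
}

# Example:
#   SW_ROOT=$HOME/sw BUILDROOT=$HOME/buildroot runtime_build_cmd arm64
```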
The NVDLA kernel driver for the ARM64 virtual platform is loaded as a kernel module. Its source code is at https://github.com/nvdla/sw/tree/master/kmd and a pre-built binary is at https://github.com/nvdla/sw/blob/master/prebuilt/arm64-linux/
Register mappings for the nv_small/nv_large and nv_full configurations are different, so the pre-built directory includes two binaries:
- opendla_1.ko : for nv_full
- opendla_2.ko : for nv_large and nv_small
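Because loading the wrong module for a configuration is an easy mistake, the mapping above can be captured in a small lookup. This is a sketch only; `driver_for_config` is a hypothetical helper name.

```shell
# Sketch: map a hardware configuration to the pre-built driver module.
driver_for_config() {
    case $1 in
        nv_full)           echo opendla_1.ko ;;
        nv_large|nv_small) echo opendla_2.ko ;;
        *)
            echo "unknown configuration: $1" >&2
            return 1
            ;;
    esac
}

# Example on the target:
#   insmod "$(driver_for_config nv_small)"
```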
Define DLA_2_CONFIG to build the driver for the nv_small or nv_large configuration; otherwise keep it undefined.
make KDIR={buildroot-root}/output/build/linux-4.13.3 ARCH=arm64 CROSS_COMPILE={buildroot-root}/output/host/bin/aarch64-linux-gnu-
Refer to buildroot for Linux kernel and toolchain
Currently FireSim is the only available RISC-V platform. The NVDLA kernel driver is integrated into the Linux kernel; see https://github.com/nvdla/riscv-linux/tree/firesim-nvdla/drivers/nvdla for the source. The riscv-linux repo is a sub-module of https://github.com/nvdla/firesim-nvdla, so it does not need to be cloned separately. It is cloned and built as part of the FireSim setup.
To update the NVDLA kernel driver, edit the code at {firesim-nvdla-repo-root}/sw/firesim-software/riscv-linux/drivers/nvdla, then run the commands below from {firesim-nvdla-repo-root}/sw/firesim-software/ to build and install the driver
./marshal -v build workloads/nvdla.json
./marshal install workloads/nvdla.json
The platforms below are available for NVDLA development and verification
More details at http://nvdla.org/vp.html
The Docker container comes with all system requirements for building the virtual simulator pre-installed. If you are not using the Docker container, refer to installing system requirements.
git clone https://github.com/nvdla/hw.git
cd hw
git checkout origin/nvdlav1
Options to select for nv_full configuration
Enter project names (Press ENTER to use: nv_full):nv_full
Enter c pre-processor path (Press ENTER to use: /home/utils/gcc-4.9.3/bin/cpp):/usr/bin/cpp
Enter g++ path (Press ENTER to use: /home/utils/gcc-4.9.3/bin/g++):/usr/bin/g++
Enter perl path (Press ENTER to use: /home/utils/perl-5.8.8/bin/perl):/usr/bin/perl
Enter java path (Press ENTER to use: /home/utils/java/jdk1.8.0_131/bin/java):/usr/bin/java
Enter systemc path (Press ENTER to use: /usr/local/systemc-2.3.0/):
OPTIONAL: Enter verilator path (Press ENTER to use: verilator):
OPTIONAL: Enter clang path (Press ENTER to use: clang):
tools/bin/tmake -build cmod_top
Download and Build VP
git clone https://github.com/nvdla/vp.git
cd vp
git submodule update --init --recursive
cmake -DCMAKE_INSTALL_PREFIX=[install dir] -DSYSTEMC_PREFIX=[systemc prefix] -DNVDLA_HW_PREFIX=[nvdla_hw prefix] -DNVDLA_HW_PROJECT=[nvdla_hw project name]
For example:
cmake -DCMAKE_INSTALL_PREFIX=build -DSYSTEMC_PREFIX=/usr/local/systemc-2.3.0/ -DNVDLA_HW_PREFIX=/odla/vpr/nv_full -DNVDLA_HW_PROJECT=nv_full
make install
git clone https://github.com/nvdla/hw.git
cd hw
git checkout origin/master
Options to select for nv_large configuration
Enter project names (Press ENTER if use: nv_small nv_small_256 nv_small_256_full nv_medium_512 nv_medium_1024_full nv_large):nv_large
Using designware or not [1 for use/0 for not use] (Press ENTER if use: 1):
Enter design ware path (Press ENTER if use: /home/tools/synopsys/syn_2011.09/dw/sim_ver):
Enter c pre-processor path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/cpp):/usr/bin/cpp
Enter gcc path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/gcc):/usr/bin/gcc
Enter g++ path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/g++):/usr/bin/g++
Enter perl path (Press ENTER if use: /home/utils/perl-5.10/5.10.0-threads-64/bin/perl):/usr/bin/perl
Enter java path (Press ENTER if use: /home/utils/java/jdk1.8.0_131/bin/java):/usr/bin/java
Enter systemc path (Press ENTER if use: /home/ip/shared/inf/SystemC/1.0/20151112/systemc-2.3.0/GCC472_64_DBG):/usr/local/systemc-2.3.0
Enter python path (Press ENTER if use: /home/tools/continuum/Anaconda3-5.0.1/bin/python):/usr/bin/python
Enter vcs_home path (Press ENTER if use: /home/tools/vcs/mx-2016.06-SP2-4):
Enter novas_home path (Press ENTER if use: /home/tools/debussy/verdi3_2016.06-SP2-9):
Enter verdi_home path (Press ENTER if use: /home/tools/debussy/verdi3_2016.06-SP2-9):
OPTIONAL: Enter verilator path (Press ENTER to use: verilator):
OPTIONAL: Enter clang path (Press ENTER to use: /home/utils/llvm-4.0.1/bin/clang):
tools/bin/tmake -build cmod_top
Download and Build VP
git clone https://github.com/nvdla/vp.git
cd vp
git submodule update --init --recursive
cmake -DCMAKE_INSTALL_PREFIX=[install dir] -DSYSTEMC_PREFIX=[systemc prefix] -DNVDLA_HW_PREFIX=[nvdla_hw prefix] -DNVDLA_HW_PROJECT=[nvdla_hw project name]
For example:
cmake -DCMAKE_INSTALL_PREFIX=build -DSYSTEMC_PREFIX=/usr/local/systemc-2.3.0/ -DNVDLA_HW_PREFIX=/odla/vpr/nv_large -DNVDLA_HW_PROJECT=nv_large
make install
git clone https://github.com/nvdla/hw.git
cd hw
git checkout origin/master
Options to select for nv_small configuration
Enter project names (Press ENTER if use: nv_small nv_small_256 nv_small_256_full nv_medium_512 nv_medium_1024_full nv_large):nv_small
Using designware or not [1 for use/0 for not use] (Press ENTER if use: 1):
Enter design ware path (Press ENTER if use: /home/tools/synopsys/syn_2011.09/dw/sim_ver):
Enter c pre-processor path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/cpp):/usr/bin/cpp
Enter gcc path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/gcc):/usr/bin/gcc
Enter g++ path (Press ENTER if use: /home/utils/gcc-4.8.2/bin/g++):/usr/bin/g++
Enter perl path (Press ENTER if use: /home/utils/perl-5.10/5.10.0-threads-64/bin/perl):/usr/bin/perl
Enter java path (Press ENTER if use: /home/utils/java/jdk1.8.0_131/bin/java):/usr/bin/java
Enter systemc path (Press ENTER if use: /home/ip/shared/inf/SystemC/1.0/20151112/systemc-2.3.0/GCC472_64_DBG):/usr/local/systemc-2.3.0
Enter python path (Press ENTER if use: /home/tools/continuum/Anaconda3-5.0.1/bin/python):/usr/bin/python
Enter vcs_home path (Press ENTER if use: /home/tools/vcs/mx-2016.06-SP2-4):
Enter novas_home path (Press ENTER if use: /home/tools/debussy/verdi3_2016.06-SP2-9):
Enter verdi_home path (Press ENTER if use: /home/tools/debussy/verdi3_2016.06-SP2-9):
OPTIONAL: Enter verilator path (Press ENTER to use: verilator):
OPTIONAL: Enter clang path (Press ENTER to use: /home/utils/llvm-4.0.1/bin/clang):
tools/bin/tmake -build cmod_top
Download and Build VP
git clone https://github.com/nvdla/vp.git
cd vp
git submodule update --init --recursive
cmake -DCMAKE_INSTALL_PREFIX=[install dir] -DSYSTEMC_PREFIX=[systemc prefix] -DNVDLA_HW_PREFIX=[nvdla_hw prefix] -DNVDLA_HW_PROJECT=[nvdla_hw project name]
For example:
cmake -DCMAKE_INSTALL_PREFIX=build -DSYSTEMC_PREFIX=/usr/local/systemc-2.3.0/ -DNVDLA_HW_PREFIX=/odla/vpr/nv_small -DNVDLA_HW_PROJECT=nv_small
make install
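The three VP builds above use the same cmake invocation with a different hardware prefix and project name. The sketch below prints that invocation for a given configuration. `vp_cmake_cmd` is a hypothetical helper name, and the install prefix `build` and default SystemC path are taken from the examples in this section.

```shell
# Sketch: print the VP cmake command for a hardware configuration.
#   $1  project name, e.g. nv_full, nv_large or nv_small
#   $2  path of the built hw tree, e.g. /odla/vpr/nv_small
#   $3  optional SystemC prefix
vp_cmake_cmd() {
    project=$1
    hw_prefix=$2
    systemc=${3:-/usr/local/systemc-2.3.0/}
    echo "cmake -DCMAKE_INSTALL_PREFIX=build -DSYSTEMC_PREFIX=$systemc -DNVDLA_HW_PREFIX=$hw_prefix -DNVDLA_HW_PROJECT=$project"
}

vp_cmake_cmd nv_small /odla/vpr/nv_small
```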
FireSim-NVDLA is a fork of the FireSim FPGA-accelerated full-system simulator integrated with NVIDIA Deep Learning Accelerator (NVDLA).
https://github.com/nvdla/firesim-nvdla is forked from https://github.com/CSL-KU/firesim-nvdla to run NVDLA native test application on FireSim platform.
Original FireSim+NVDLA integration is maintained by the Computer Systems Design Laboratory at the University of Kansas. FireSim-NVDLA runs on the Amazon FPGA cloud (EC2 F1 instance).
To work with FireSim-NVDLA, first, you need to learn how to use FireSim. It is recommended to follow the steps in the FireSim documentation (v1.6.0) to set up the simulator and run a single-node simulation. Please make sure that you are following the right version of the documentation. The only difference in setup is you use the URL of this repository when cloning in Setting up the FireSim Repo:
git clone https://github.com/nvdla/firesim-nvdla
cd firesim-nvdla
./build-setup.sh fast
After successfully running a single-node simulation, come back to this guide and follow the rest of instructions to run test application on FireSim platform.
Note: Make sure that you are using FPGA Developer AMI - 1.6.0. Version 1.5.0 no longer works due to issues related to Python.
Configure FireSim to simulate the target which has the NVDLA model. For that, in firesim-nvdla/deploy/config_runtime.ini, change the parameter defaulthwconfig to firesim-quadcore-no-nic-nvdla-ddr3-llc4mb. Additionally, change workloadname to nvdla.json. Your final config_runtime.ini should look like this:
# RUNTIME configuration for the FireSim Simulation Manager
# See docs/Advanced-Usage/Manager/Manager-Configuration-Files.rst for documentation of all of these params.
# This references a section from config_hwconfigs.ini
# In homogeneous configurations, use this to set the hardware config deployed
# for all simulators
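The two parameter changes described for config_runtime.ini can also be scripted. The sketch below assumes each parameter is stored as `key=value` at the start of a line, which should be checked against your copy of the file. `set_ini_key` is a hypothetical helper name, and it refuses to add a key that is not already present.

```shell
# Sketch: replace the value of an existing key=value line in an ini-style file.
# Assumes the key already appears as "key=..." at the start of a line.
set_ini_key() {
    file=$1
    key=$2
    value=$3
    grep -q "^$key=" "$file" || { echo "key not found: $key" >&2; return 1; }
    # -i.bak is accepted by both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak "s|^$key=.*|$key=$value|" "$file" && rm -f "$file.bak"
}

# Example:
#   set_ini_key firesim-nvdla/deploy/config_runtime.ini defaulthwconfig firesim-quadcore-no-nic-nvdla-ddr3-llc4mb
#   set_ini_key firesim-nvdla/deploy/config_runtime.ini workloadname nvdla.json
```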
sudo apt-get update
sudo apt-get install g++ cmake libboost-dev python-dev libglib2.0-dev libpixman-1-dev liblua5.2-dev swig libcap-dev libattr1-dev default-jdk
Steps required if using Ubuntu higher than 14.04
sudo apt-get install python-software-properties
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-4.8
sudo apt-get install g++-4.8
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 50
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 50
wget -O systemc-2.3.0a.tar.gz http://www.accellera.org/images/downloads/standards/systemc/systemc-2.3.0a.tar.gz
tar -xzvf systemc-2.3.0a.tar.gz
cd systemc-2.3.0a
sudo mkdir -p /usr/local/systemc-2.3.0/
mkdir objdir
cd objdir
../configure --prefix=/usr/local/systemc-2.3.0
sudo make install
wget -O YAML-1.24.tar.gz http://search.cpan.org/CPAN/authors/id/T/TI/TINITA/YAML-1.24.tar.gz
tar -xzvf YAML-1.24.tar.gz
cd YAML-1.24
perl Makefile.PL
sudo make install
wget -O IO-Tee-0.65.tar.gz http://search.cpan.org/CPAN/authors/id/N/NE/NEILB/IO-Tee-0.65.tar.gz
tar -xzvf IO-Tee-0.65.tar.gz
cd IO-Tee-0.65
perl Makefile.PL
sudo make install
cpan -i Capture::Tiny
cpan -i XML::Simple
git clone https://github.com/nvdla/buildroot
cd buildroot
make qemu_aarch64_virt_defconfig
make menuconfig
* Target Options -> Target Architecture -> AArch64 (little endian)
* Target Options -> Target Architecture Variant -> cortex-A57
* Toolchain -> Custom kernel headers series -> 4.13.x
* Toolchain -> Toolchain type -> External toolchain
* Toolchain -> Toolchain -> Linaro AArch64 2017.08
* Toolchain -> Toolchain origin -> Toolchain to be downloaded and installed
* Toolchain -> Copy gdb server to the Target
* Kernel -> () Kernel version -> 4.13.3
* Kernel -> Kernel configuration -> Use the architecture default configuration
* System configuration -> Enable root login with password -> Y
* System configuration -> Root password -> nvdla
* Target Packages -> Show packages that are also provided by busybox -> Y
* Target Packages -> Networking applications -> openssh -> Y
* Target Packages -> Debugging, profiling and benchmark -> gdb -> Y
* Target Packages -> Debugging, profiling and benchmark -> full debugger -> Y
make -j4
The toolchain is downloaded to {buildroot-root}/output/host/bin/ (prefix aarch64-linux-gnu-) and can be used to build the NVDLA kernel driver and NVDLA runtime for ARM64.
Linux kernel 4.13.3 is downloaded to {buildroot-root}/output/build/linux-4.13.3 and can be used to build the NVDLA kernel driver for ARM64.