See 2020.1 Vitis Application Acceleration Development Flow Tutorials |
In this lab, you will experience the acceleration potential by running the application first as a software-only version and then as an optimized FPGA-accelerated version using a precompiled FPGA accelerator.
-
Run the following command to set up the application.
# Source the Vitis runtime environment export LAB_WORK_DIR=<Downloaded Github repository>/Hardware_Accelerators/Design_Tutorials/02-bloom
-
Next, build the C application:
-
Navigate to the
cpu_src
directory. -
Use the following command to run the original application with the number of documents as the argument, and generate the golden output file for comparison.
cd $LAB_WORK_DIR/cpu_src/ make run
The generated output compute scores are stored in the host code in the
cpu_profile_score
array that represents the outputs for the total number of specified documents. The results will look similar to the following:./host 100000 Initializing data Creating documents - total size : 1398.903 MBytes (349725824 words) Creating profile weights Total execution time of CPU | 2949.3867 ms Compute Hash processing time | 2569.3266 ms Compute Score processing time | 380.0601 ms -------------------------------------------------------------------- Execution COMPLETE
-
-
Run the application on the FPGA. For the purposes of this lab, the FPGA accelerator is implemented with an 8x parallelization factor.
-
Eight input words are processed in parallel, producing eight output flags in parallel during each clock cycle.
To run the optimized application on the FPGA, run the following
make
command.make run_fpga SOLUTION=1
The following output displays.
Processing 1398.905 MBytes of data Splitting data in 8 sub-buffers of 174.863 MBytes for FPGA processing -------------------------------------------------------------------- Executed FPGA accelerated version | 427.1341 ms ( FPGA 230.345 ms ) Executed Software-Only version | 3057.6307 ms -------------------------------------------------------------------- Verification: PASS
The computed throughput is:
Throughput = Total data/Total time = 1.39 GB/427.1341ms = 3.25 GB/s
By efficiently leveraging FPGA acceleration, the throughput of the application increases by a factor of 7.
-
In this step, you observed the acceleration that can be achieved using an FPGA. Next, you will architect the application for the application and dive into what functions can be accelerated by profiling the original applications. You will also define the interface boundaries and performance constraints to achieve the desired acceleration.
Return to Getting Started Pathway — Return to Start of Tutorial
Copyright© 2020 Xilinx