
Add a new framework


The AI performance evaluation platform is a framework- and backend-agnostic platform. Different frameworks/backends can be plugged into the platform with minimal change. The harness drives the frameworks running on device to collect the desired metrics. However, different frameworks may be invoked differently, and they may also generate their metrics in different formats. Thus, the harness needs a customized driver for each framework that converts the generated metrics to a unified format.

Framework specific driver

In the harness codebase, the benchmarking/frameworks directory contains all the code changes needed to onboard a new framework. Each added framework lives in its own subdirectory and implements a class that inherits from the base class in framework_base.py. The class needs to implement all the methods defined in framework_base.py.

You can use the existing frameworks (caffe2, oculus, tflite) as examples when adding your own framework. A rough sketch of such a driver class is shown below.
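The following is a minimal, hypothetical sketch of what a new driver might look like. The method names (getName, runBenchmark), the import path, the platform.runBenchmark call, and the benchmark dictionary layout are all illustrative assumptions; the authoritative set of required methods is whatever framework_base.py declares, and the existing framework subdirectories are the reference implementations.

```python
# Illustrative sketch only: names and import path are assumptions.
from frameworks.framework_base import FrameworkBase  # assumed module path


class MyFramework(FrameworkBase):
    def __init__(self, tempdir, args):
        super(MyFramework, self).__init__()
        self.tempdir = tempdir
        self.args = args

    def getName(self):
        # Name used to select this framework when running the harness.
        return "my_framework"

    def runBenchmark(self, info, benchmark, platform):
        # Compose the command that invokes the framework binary on the
        # device, run it through the platform object, and convert the
        # raw output into the unified metric format.
        cmd = ["/data/local/tmp/my_framework_bench",   # hypothetical binary
               "--model", benchmark["model"]]          # hypothetical field
        raw_output = platform.runBenchmark(cmd)        # assumed platform API
        return self._toUnifiedFormat(raw_output)

    def _toUnifiedFormat(self, raw_output):
        # Parse framework-specific output into the dictionary-of-dictionaries
        # format described in the "Unified format" section below.
        results = {}
        # ... framework-specific parsing goes here ...
        return results
```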

Unified format

The unified format is a dictionary of dictionaries. The key of the outer dictionary is only used to match metrics across different benchmark runs (e.g. treatment and control); it is not part of the metrics and is not reported. The value of the outer dictionary is another dictionary, which contains the metrics to be reported. The content of this inner dictionary is metric dependent, but it most often contains the following fields (an example follows the list):

  • type: a unique identifier. It may be NET to indicate the metric is for an entire model, or an operator name. It can be any other meaningful identifier for your purpose. Entries with the same type refer to the same entity.
  • metric: a string indicating the metric. Usually type and metric together uniquely identify an entry in one benchmark run.
  • values: a list of float numbers containing the metric values collected over multiple iterations of one benchmark run.
  • unit: a string indicating the unit of the metric.
  • info_string (optional): a string that provides more information on the metric or model. Its content must be identical across all iterations of one benchmark run.
  • summary (optional): the summary statistics over multiple iterations of the metric. It is usually calculated from the raw data saved in values. However, in some frameworks (e.g. tflite), the framework binary reports the statistics directly instead of the raw metric values. In that case, summary can be explicitly populated and the harness will use it directly.
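Below is a hedged example of what such a result dictionary might look like. The outer keys ("NET latency", "conv1 latency"), the operator name, the numbers, and the fields inside summary are all made up for illustration; only the field names listed above come from this page.

```python
# Illustrative example of the unified format (values are fabricated).
metrics = {
    "NET latency": {
        "type": "NET",                  # whole-model entry
        "metric": "latency",
        "values": [12.1, 11.8, 12.4],   # one value per iteration
        "unit": "ms",
    },
    "conv1 latency": {
        "type": "conv1",                # per-operator entry (hypothetical op)
        "metric": "latency",
        "values": [3.2, 3.1, 3.3],
        "unit": "ms",
        "summary": {                    # optional; shape of this dict is an
            "mean": 3.2,                # assumption, e.g. when the framework
            "p50": 3.2,                 # binary reports statistics directly
        },
    },
}
```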