
NVIDIA: Can I use the Xavier MLPerf Inference code base for MLPerf Inference benchmarking on Jetson TX2 and Nano? #15

Open
Mamtesh11 opened this issue Jan 27, 2020 · 19 comments

Comments


Mamtesh11 commented Jan 27, 2020

I tried the MLPerf Inference code base published by NVIDIA for Xavier (closed division) and it reproduces the published results. I want to use this code on TX2 and Nano. I tried it with the --gpu_only flag, but it didn't work. Can a submitter from NVIDIA help with MLPerf benchmarking on TX2 and Nano using the published code base?

@psyhtest

@Mamtesh11 That's an excellent question; I was wondering the same. The answer from NVIDIA was "maybe", but some modifications and experimentation are needed. We (dividiti) are looking into this right now.

/cc @nvpohanh

@Mamtesh11

@psyhtest Yes, I saw them and tried to reproduce the same; the run_harness process got killed in the MobileNet MultiStream scenario.

@psyhtest

We generated TensorRT plans for the Xavier configuration on a machine with a GTX 1080 (compute capability 6.1). Unfortunately, we then failed to deploy them on both TX1 (compute capability 5.3) and TX2 (compute capability 6.2), e.g.:

[TensorRT] ERROR: INVALID_CONFIG: The engine plan file is generated on an incompatible device, expecting compute 6.2 got compute 6.1, please rebuild. 
[TensorRT] ERROR: engine.cpp (1324) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
[TensorRT] ERROR: INVALID_STATE: std::exception 
[TensorRT] ERROR: INVALID_CONFIG: Deserialize the cuda engine failed.

@nvpohanh Is there a way to specify a different compute capability for the target?

@nvpohanh

@psyhtest You have to generate the plans on the same GPU that you plan to run them on. Have you tried generating the plans on TX1 and/or TX2?
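
For reference, here is a minimal sketch (assuming the TensorRT 6/7-era Python API and an ONNX model; file names are illustrative, not the actual harness code) of building an FP16 plan directly on the target device:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_plan(onnx_path, plan_path):
    # Build on the device that will run the plan (e.g. TX1/TX2), since
    # serialized plans are tied to the GPU's compute capability.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30    # 1 GiB of workspace for tactic selection
    config.set_flag(trt.BuilderFlag.FP16)  # TX1/TX2 support FP16, not INT8
    engine = builder.build_engine(network, config)
    with open(plan_path, 'wb') as f:
        f.write(engine.serialize())
```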

@psyhtest

@nvpohanh Yes, but we have maxed out our 128 GB SD card on TX1, which had at least 70 GB of free space when we started :). I've ordered a 400 GB SD card now. And we don't even have external storage on our TX2 module. Is there a way to build only specific things like ResNet?

@psyhtest

Further to my previous question, is it still necessary to download COCO and the object detection models if all I want is to generate TensorRT plans for ResNet?

@nvpohanh

@psyhtest To build engines for a specific benchmark-scenario combination, run make generate_engines RUN_ARGS="--benchmarks=<BENCHMARK> --scenarios=<SCENARIOS>". Also, I don't think generating engines requires the "real" dataset; TRT uses random data for auto-tuning anyway.
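
For example, to build only the ResNet engines for the Offline scenario (check the harness for the exact benchmark and scenario spellings):

```
make generate_engines RUN_ARGS="--benchmarks=resnet --scenarios=Offline"
```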

@psyhtest

@nvpohanh How about calibration? Don't you need real data for that?

@nvpohanh

@psyhtest Calibration can be shared across GPUs (in most cases). Could you try that?

Also, what's the point of trying INT8 on TX1 and/or TX2? Would FP32 suffice?
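
For illustration, a minimal sketch (assuming the TensorRT Python API) of a calibrator that only reuses an existing calibration cache, e.g. one produced on a different GPU, instead of re-running calibration on the target:

```python
import tensorrt as trt

class CachedCalibrator(trt.IInt8EntropyCalibrator2):
    """Reuse an existing INT8 calibration cache rather than re-calibrating.
    The cache stores per-tensor scale factors, which are not tied to a
    particular device."""

    def __init__(self, cache_path):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_path = cache_path

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        # Returning None tells TensorRT there are no new calibration batches,
        # so it relies on the cache returned below.
        return None

    def read_calibration_cache(self):
        with open(self.cache_path, 'rb') as f:
            return f.read()

    def write_calibration_cache(self, cache):
        pass  # keep the existing cache as-is
```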

@psyhtest

@nvpohanh We suspected the same about sharing: calibration done on a GTX 1080 seems to be quite similar to that done on TX1.

On TX1 and TX2, we get around 1.8x speedup with FP16 over FP32. We thought we would get extra speedup with INT8. Unfortunately, neither device supports INT8, according to this matrix. Given that, FP16 is our best option, which we already support via CK.

@nvpohanh

Glad that FP16 works and gives you the speedup!

@psyhtest

Thanks! But I'm now wondering about @Mamtesh11's original question. Will the way NVIDIA constructed optimized TensorRT plans for Xavier work for the older devices with FP32/FP16 support only? (As I understand it, the Nano is equivalent to the TX1, compute-capability-wise.)

@nvpohanh

@psyhtest Do you mean generating the TRT plans on Xavier and then running them on TX1? I am afraid that won't work, since TRT requires that you generate plans on the same GPU you run them on. On the other hand, using TRT to generate plans on TX1 etc. should work.

@psyhtest

@nvpohanh I understand that I need to generate and run plans on the same platform. But, IIRC, you construct one graph (SSD Large?) layer by layer. Do you specify the main data type there explicitly? Because if you do and that data type is INT8, then that won't work on any pre-Xavier hardware, right?
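
Something along these lines in the network-definition API, say (an illustrative sketch with made-up names and shapes, not the actual harness code):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network()
inp = network.add_input("input", trt.DataType.FLOAT, (3, 224, 224))
kernel = np.zeros((64, 3, 3, 3), dtype=np.float32)
bias = np.zeros((64,), dtype=np.float32)
conv = network.add_convolution(inp, 64, (3, 3), kernel, bias)
conv.precision = trt.DataType.INT8           # explicit per-layer precision
conv.set_output_type(0, trt.DataType.INT8)   # explicit output data type
```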

@psyhtest

(That's what I meant by "the way NVIDIA constructed optimized TensorRT plans for Xavier".)

@nvpohanh

We put all the configurable settings in the config files, like this one: https://github.com/mlperf/inference_results_v0.5/blob/master/closed/NVIDIA/measurements/Xavier/resnet/Offline/config.json

Under the hood, the script simply parses the config files and sets the TensorRT settings accordingly. Therefore, to run on older hardware, you just need to make sure the config file has the correct settings, or you can look into the scripts and find the right TensorRT settings.

We don't currently have plans to provide official config files for the MLPerf benchmarks on older hardware, but feel free to let me know if you run into any issues.
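
Purely as an illustration, the kind of override one might try in such a config file for pre-Xavier GPUs looks something like the following; the field names and values here are assumptions and should be checked against the linked config.json and the harness scripts rather than copied:

```json
{
  "precision": "fp16",
  "gpu_batch_size": 32
}
```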

@psyhtest

Thanks @nvpohanh, will do!

@psyhtest

Please see @ens-lg4's comment here.

@psyhtest

@nvpohanh Given the issues with running the object detection models on TX1/TX2 that we reported, I'm wondering whether the Xavier AGX binaries are going to work on the upcoming Xavier NX. In particular, are you aware of any required changes, e.g. to the input layout, induced by the DLAs?
