Benchmark against faster Python trk loader #16
Hi @arokem, I've just updated my repository with two additional implementations that visibly speed up loading. For now, I just named them …
Hi @emanuele, I noticed that the numba version first loads all of the data into memory before processing it, whereas the other methods only read the requested streamlines from the file (via a sequence of file seeks and reads).
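For context, here is a minimal sketch of the seek-and-read pattern described above, assuming the byte offsets and point counts for each streamline have already been indexed (both arrays are hypothetical, not part of either implementation):

```python
import numpy as np

# Illustrative only: given precomputed byte offsets and point counts,
# pull just the requested streamlines from disk instead of the whole file.
def read_streamlines(path, byte_offsets, n_points, indices):
    streamlines = []
    with open(path, "rb") as f:
        for i in indices:
            f.seek(byte_offsets[i])                       # jump to streamline i
            pts = np.fromfile(f, dtype=np.float32, count=n_points[i] * 3)
            streamlines.append(pts.reshape(-1, 3))
    return streamlines
```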
@MarcCote, true indeed. The numba version is by no means complete; at the moment it uses too much memory. But in practice it is the most frequently used in our lab, because it loads large tractograms in a few seconds and memory is not the main issue for us: a trade-off of memory for time. In the future, I'll try to improve on that side. Thanks for the feedback!
I made some improvements to parsing the streamline lengths. It works by exploiting how uint32 and float32 represent numbers in binary. Here's the relevant part. I'm curious to see what you all think; I'm probably missing some corner cases, right?
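For readers without the linked code, here is a rough sketch of the kind of trick being described, assuming the standard TRK layout (a 1000-byte header, then each streamline stored as an int32 point count followed by its float32 values); the function and its parameters are illustrative:

```python
import numpy as np

# Read the whole body once as uint32; the same buffer can later be
# viewed as float32 (raw.view(np.float32)) to recover the coordinates,
# so the lengths are found by hopping between count fields instead of
# parsing every point.
def trk_lengths(path, header_size=1000, n_scalars=0, n_properties=0):
    raw = np.fromfile(path, dtype=np.uint32, offset=header_size)
    per_point = 3 + n_scalars            # float32 values stored per point
    lengths = []
    i = 0
    while i < raw.size:
        n_pts = int(raw[i])              # the int32 count field
        lengths.append(n_pts)
        i += 1 + n_pts * per_point + n_properties
    return np.asarray(lengths, dtype=np.uint32)
```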
I have created an experimental JavaScript implementation that includes a benchmark against other formats. These results may be very different from Python's. In particular, JavaScript does not natively support the uint64 offsets required by TRX, nor the float16 used by the default converter. I realize this issue is specifically about Python, but each of you may have insights regarding the JavaScript implementation and opportunities for optimization. One could also use the same datasets to evaluate both the native Python and C++ implementations. The …
Pretty cool demo. It is intriguing that trk and tck perform so much better than trx! Do you have an intuition why? Also, a naive question: did you compare the streamlines loaded from the different files to make sure the data was properly loaded? I don't see that in trx_bench.mjs.
TRK and TCK have no compression. The performance of both is actually pretty slow, because they interleave the track-length and vertex-position information; they just look fast relative to compressed formats. Segmenting the offsets and the positions makes them a lot faster. The VTK file requires a lot of disk space because tff_convert_tractogram uses float64 for the vertex positions.
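To illustrate what segmenting buys you, here is a sketch of a TRX-style layout, where the positions live in one contiguous array and a separate offsets array marks the streamline boundaries (the file names follow the TRX naming convention but are illustrative):

```python
import numpy as np

# One contiguous coordinate block plus a separate offsets array:
# streamline i is a single slice, with no interleaved length fields
# to scan past.
positions = np.fromfile("positions.3.float32", dtype=np.float32).reshape(-1, 3)
offsets = np.fromfile("offsets.uint32", dtype=np.uint32)

def streamline(i):
    end = offsets[i + 1] if i + 1 < len(offsets) else len(positions)
    return positions[offsets[i]:end]
```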
For this experiment, were the TRX files compressed (DEFLATE)? How does it compare if they are not compressed (STORED)? And does JavaScript support memmap (reading a whole chunk into memory at a time), or does it have to read the bytes into memory sequentially?
@MarcCote: The JavaScript code is used by NiiVue, which allows users to drag and drop TRK, TCK, VTK, FIB, and TRX streamlines on any web-enabled device (phone, tablet, computer). This provides a zero-footprint test for these formats. The minimal demo also lets you visualize the DPS (properties) and DPV (scalars) of TRK and TRX files, if they are available.
|
@neurolabusc Thank you for this, I will dig into it this weekend! I was wondering: is it possible for TRX to use "zero" compression (STORED instead of DEFLATE)? In Python, thanks to the zip header, we don't have to unzip at all and all the files are accessible directly (just as if the archive were a folder). This would be closer to trk/tck without compression: the 'base' file format, rather than a choice of compression layered on top of it.
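As a sketch of what "accessible directly" means in Python (the archive and entry names are illustrative): the payload of a STORED entry sits verbatim in the archive, so once the local-header offset is known, the array can be memory-mapped in place rather than inflated into memory:

```python
import struct
import zipfile
import numpy as np

def memmap_entry(archive, name, dtype):
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(name)
        assert info.compress_type == zipfile.ZIP_STORED
    with open(archive, "rb") as f:
        f.seek(info.header_offset)
        header = f.read(30)                  # fixed part of the local file header
        name_len, extra_len = struct.unpack("<HH", header[26:30])
    data_offset = info.header_offset + 30 + name_len + extra_len
    count = info.file_size // np.dtype(dtype).itemsize
    return np.memmap(archive, dtype=dtype, mode="r",
                     offset=data_offset, shape=(count,))

positions = memmap_entry("bundle.trx", "positions.3.float32", np.float32)
```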
@frheault This is a good point. My original test was with the raw output from your tff_convert_tractogram script, which does use STORED, not DEFLATE. However, I had only tested uncompressed TRX with pako, and did not retry it after I discovered the faster fflate. I retested with fflate, and it does a great job. The remaining big performance penalty for TRX is the use of float16, which is not native to JavaScript (or x86-64). The …
I would strongly advocate for the specification to be changed to optionally allow offsets to be UINT32: this is the maximum most graphics languages allow, it is sufficient for most processed datasets, and it results in smaller files. Likewise, it would be great if your converter gave the user the option of float32 precision for the positions. As an aside, your latest modifications to tff_convert_tractogram.py lead to a segmentation fault when I try to convert any tracts, for example from here:
This is a regression, as an earlier commit (14f7cac) works fine.
That regression is a good reminder that we need tests for everything, or at least smoke testing.
I did a little research: while x86-64 does not natively support float16, it includes SSE/AVX instructions for converting half (float16) to single (float32) precision, _mm_cvtph_ps/_mm256_cvtph_ps, and the reverse, _mm_cvtps_ph/_mm256_cvtps_ph. Likewise, ARM has the similar NEON instructions VCVT_F32_F16 and VCVT_F16_F32. These allow low-level languages like C and sophisticated libraries like NumPy to rapidly decode/encode float16. Unfortunately, they are not available in JavaScript.
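For comparison, the NumPy side is a one-liner, which is presumably why Python does not pay the same float16 penalty (the file name is illustrative):

```python
import numpy as np

# NumPy exposes IEEE half precision directly; decoding a float16
# position block is a single vectorized cast to float32.
half = np.fromfile("positions.3.float16", dtype=np.float16)
positions = half.astype(np.float32).reshape(-1, 3)
```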
At some point, we should benchmark against @emanuele's https://github.com/emanuele/load_trk, for a comparison with a more optimized Python trk reader.