Does FFTW plan reuse make sense? #57
Comments
Any news on this? I wondered whether this idea is relevant for using FFTW or for interpreting the results of gearshifft.
To finally give an answer on this: I plotted upload time against total time to get the ratio. Upload refers to the memcpy operation, and the timer measured a contribution of ~40% to the total solution time in the worst case. But does this really come from memcpy? Download is the same memcpy operation, just in the other direction, and it is smooth and fast, with no significant times. So the long upload time might come from a cache warm-up. The rshiny tool is going to get an update to examine such statistics. At the moment I do not plan to change the fftw client in gearshifft to avoid the memcpy calls in the aforementioned cases.
Thanks for the update. Interesting findings, I believe. Are these results from multi-threaded or single-threaded runs? I am asking because it need not be warm-up only; in a multi-threaded scenario it could also be cache line thrashing.
True, this is multi-threaded. The single-threaded benchmark is still running on taurus. Let's see what we get there.
FFTW_MEASURE means that FFTW overwrites the input and output buffers during the planning stage.
After planning, the buffers can be filled with data (memcpy).
Plan reuse means having only one plan at a time.
For plans not created with FFTW_MEASURE (FFTW_ESTIMATE, wisdom), I think it is NOT worthwhile to reuse FFTW plans, as they do not allocate temporary buffers (are we sure?).
But: it might be worthwhile in terms of memcpy. We could skip the memcpy step as long as padding is not required. The input data coming from BenchmarkExecutor is aligned, but not padded with respect to the FFT, so the memcpy step would only be required when padding is needed (Inplace Real2Complex).
I have to look at the results w.r.t. upload and download times.
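To make the padding case concrete, here is a minimal sketch of why Inplace Real2Complex still needs a copy. It assumes the usual FFTW in-place layout, where a last dimension of `n` reals must be stored with a row length of `2 * (n/2 + 1)` reals. The helper names `padded_row_length` and `copy_with_padding` are hypothetical and are not part of gearshifft or FFTW.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Row length (in reals) that an in-place real-to-complex transform needs
// for a last dimension of n reals: room for n/2 + 1 complex outputs.
std::size_t padded_row_length(std::size_t n) {
    return 2 * (n / 2 + 1);
}

// Hypothetical helper: copy `rows` contiguous, unpadded rows of `n` reals
// into a buffer whose rows are padded for an in-place transform.
// The padding elements are left at zero.
std::vector<double> copy_with_padding(const std::vector<double>& src,
                                      std::size_t rows, std::size_t n) {
    assert(src.size() == rows * n);
    const std::size_t stride = padded_row_length(n);
    std::vector<double> dst(rows * stride, 0.0);
    for (std::size_t r = 0; r < rows; ++r) {
        // One memcpy per row; this is the copy that cannot be skipped.
        std::memcpy(dst.data() + r * stride,
                    src.data() + r * n,
                    n * sizeof(double));
    }
    return dst;
}
```

For out-of-place or complex-to-complex transforms the source rows already have the layout the transform expects, so under the assumption above the aligned benchmark buffer could in principle be handed to the plan directly instead of being copied.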