Benchmark #8
Brilliant! Your project looks awesome, I'll take a look at that! Thanks for sharing it!
Thanks! I've been thinking about a tool and a place to test Fortran fpm packages, not with the intention of competing, but with the aim of improving the packages.
That's a very good initiative. How are you thinking about proceeding? Would you like PRs to centralize the benchmarks and try to have them published with a GitHub Action? I forked your project to try it out; I managed to get results with gfortran, but I'm hitting a few dependency issues with ifort and ifx. I thought that having a companion sphinx-gallery would be a good way of keeping the plots neatly organized.
Yes, exactly. I think this may be the easiest way to get the results.
Each benchmark has an index. In your test, I noticed that the last one for … Here are the flags I used for each compiler: fpm.rsp. I used …
It looks great. However, I am not familiar with it. I will take a look at it. If you could provide it, that would be great.
Oops, that was a typo; fixed. Yes, I saw the dependencies and managed to install blas/lapack for running with gfortran. But for the Intel compilers I had to add a bunch of stuff (https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html), and I still have other compile errors and no time to actually solve them :s ... I'll take another look later. Also, I was thinking that the benchmarks would be more interesting with -O3 instead of -Ofast, which even compiler developers do not recommend: https://fortran-lang.discourse.group/t/is-ofast-in-gfortran-or-fast-flag-in-intel-fortran-safe-to-use/2755/4. Regarding setting up a sphinx-gallery connected to a project, here they have an example and describe how to connect it with a source project. For inspiration, I always look at PyVista: they have a GitHub repo with all the sources and a secondary repo that is automatically fed, https://github.com/pyvista/pyvista/tree/main/doc ... something like this should work with a …
You can also use -qmkl instead of -llapack and -lblas. By the way, you are right; I will replace -Ofast with -O3.
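As a sketch of what the `-qmkl` suggestion looks like in practice, the fpm invocations below show one possible way to pass the flags for each compiler (the exact profile and any extra flags are up to the project; these commands are illustrative, not the project's fixed recipe):

```shell
# Intel ifx: -qmkl pulls in BLAS/LAPACK from oneMKL,
# replacing manual -lblas -llapack linking.
fpm run --compiler ifx --flag "-O3 -qmkl"

# gfortran equivalent against the reference libraries:
fpm run --compiler gfortran --flag "-O3" --link-flag "-llapack -lblas"
```

This avoids the oneMKL link-line-advisor dance entirely for the Intel toolchain, since the compiler driver resolves the MKL libraries itself.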
Alright, I will take a look at it. Thank you! If you find the time, you can send a pull request for the dot_product or any other implementations.
Perfect. If you get started with that, here are a few dependencies that I use for sphinx projects: numpydoc for the documentation style within the Python scripts
So this worked; I had to comment out … in the fpm.toml. A couple of questions: How do you measure the speedup? It seems like the ratio is inverted when I look at the plots and the values in the data. I didn't check where you compute it, but I would have expected something like … Is the reference value systematically the benchmark placed first?
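The ratio convention being questioned above can be illustrated with a tiny sketch (the timing values here are made up, and the actual computation lives in the project's Python script, not in this snippet):

```python
# Sketch: the usual speedup convention. The reference is the first
# benchmark; speedup_i = t_ref / t_i, so a method twice as fast as
# the reference gets 2.0, not 0.5 (hypothetical timings).
elapsed = {"dot_product": 0.00012, "kahan": 0.00006}  # seconds, made up

t_ref = elapsed["dot_product"]  # first benchmark taken as reference
speedup = {name: t_ref / t for name, t in elapsed.items()}

print(speedup)  # the reference method always maps to 1.0
```

With this convention, values above 1.0 mean "faster than the reference", which is what one would expect to read off the plots.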
Perfect!
You can find it here: link to the code. Yes, you are right; it is currently inverted. Thank you for pointing it out. Feel free to send a pull request (PR) if you have time; otherwise I will change it as soon as possible.
Yes, exactly. I tried to provide an example demo with some comments here: link to the demo.
The results change quite a bit from one run to another, for instance here with ifort and the following flags: … This is very nice and interesting; I think that from a statistical point of view it is more than acceptable. I was just wondering, then, how the actual time of the function could be separated from the intermediate operations included to avoid excessive optimization. This time also changes the ratio as … Oh, I just saw that the label of the abscissa should be updated to …
I am working on the speed-up plots. I will write to you again here.
I'm wondering if something like this could help to get a clearer view:

```fortran
call bench%start_benchmark(1,'dot_product','a = dot_product(u,v)',[p])
time = 0._rk !> a variable declared as time(0:1)
do nl = 1,bench%nloops
   time(0) = time(0) + timer() !> a function pointer using the selected method
   u = u + real(nl,rk) ! to prevent the compiler from optimizing (loop-invariant)
   v = v + real(nl,rk) ! to prevent the compiler from optimizing (loop-invariant)
   time(1) = time(1) + timer()
   a = dot_product(u,v)
end do
call bench%stop_benchmark(cmp_gflops, extract_time = time(1)-time(0)) !> an optional argument to extract
!! from the analysis the time that is not associated with the function being benchmarked
```

?
I changed the speed-up plot to show all problem sizes. I also tried to plot the average weighted speed-up; however, I'm not sure whether this provides valuable insights:
I think there are many factors such as the temperature of the CPU, other processes running during benchmarking, different random numbers,... However, I updated the code to use the same random numbers consistently.
I noticed this before. But if you measure this time, you need to account for the time of the timer function itself! In my opinion, for large problem sizes this could simply be avoided. Or maybe calculate it once outside the benchmarking object, then subtract it. [edited:] Please check the latest results for the dot product generated by the GitHub Actions workflow: https://github.com/gha3mi/forbenchmark/tree/main/benchmarks/dot
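The idea of calibrating the timer once, outside the benchmarked loop, can be sketched as follows (a Python stand-in for the Fortran timer; the names and the dummy workload are illustrative):

```python
# Sketch: estimate the timer's own cost once, then subtract it from
# each measured interval instead of timing the timer inside the loop.
import time

def timer_overhead(samples=100_000):
    """Average cost of one timer call, estimated once up front."""
    t0 = time.perf_counter()
    for _ in range(samples):
        time.perf_counter()
    return (time.perf_counter() - t0) / samples

overhead = timer_overhead()

t0 = time.perf_counter()
total = sum(range(1_000_000))  # dummy workload being benchmarked
elapsed = time.perf_counter() - t0 - 2 * overhead  # two timer calls bracket it
print(f"timer overhead ~ {overhead:.2e} s, corrected elapsed = {elapsed:.6e} s")
```

For large problem sizes the correction is negligible, which matches the point above that it can often simply be skipped.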
Excellent! These results are very interesting! I'll push a version as is, though locally I had to: …
I tried something:

```fortran
time = 0._rk
call bench%start_benchmark(7,'kahan', "a = fprod_kahan(u,v)",[p])
do nl = 1,bench%nloops
   time(0) = timer()
   u = u + real(nl,rk) ! to prevent compiler from optimizing (loop-invariant)
   v = v + real(nl,rk) ! to prevent compiler from optimizing (loop-invariant)
   time(1) = time(1) + timer() - time(0)
   a = fprod_kahan(u,v)
end do
call bench%stop_benchmark(cmp_gflops)
print *, 'inner time: ', time(1)/bench%nloops
...
real(8) function timer() result(y)
   call cpu_time(y)
end function
```

And got results along the lines of:

```
Meth.: kahan; Des.: a = fprod_kahan(u,v) ; Argi.:100000
Elapsed time :  0.000060600 [s]
Speedup      :  0.987 [-]
Performance  :  1.650 [GFLOPS]
inner time:     5.958200000000069E-005
```

So basically most of the time is actually spent in the two lines avoiding the optimization, and the dot product is almost transparent! Maybe it would be better to test with larger arrays or to split the loop differently.
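For readers unfamiliar with the `kahan` variant being timed above: the compensated dot product keeps a correction term that recovers rounding error. The actual `fprod_kahan` is the Fortran implementation in the benchmarked package; this Python version is only an illustrative sketch of the algorithm:

```python
# Illustrative Kahan-compensated dot product (sketch, not fprod_kahan).
def kahan_dot(u, v):
    s = 0.0  # running sum
    c = 0.0  # compensation: low-order bits lost in previous additions
    for x, y in zip(u, v):
        term = x * y - c    # fold the previous correction into the next term
        t = s + term
        c = (t - s) - term  # what the addition just rounded away
        s = t
    return s

print(kahan_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # exact case: 32.0
print(kahan_dot([0.1] * 10, [1.0] * 10))            # compensated sum of 0.1s
```

The extra subtract-and-compare operations per element are cheap, which is consistent with the near-1.0 speedup measured against the plain dot product.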
Thanks! I merged your PR. Today was busy; I'll take a look at the last messages later.
Hi,
I'm currently working on the ForBenchmark project and have generated some results for the dot_product here. If you are interested, you can add your dot_product implementation to this benchmark. Fpm makes it easy to include it as a dependency, and a Python script will generate the results.
Best,
Ali