Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slower than Julia for Lorenz system? #3

Open
Nicholaswogan opened this issue Sep 5, 2021 · 8 comments
Open

Slower than Julia for Lorenz system? #3

Nicholaswogan opened this issue Sep 5, 2021 · 8 comments

Comments

@Nicholaswogan
Copy link

Here are some micro julia benchmarks for Lorenz equations: https://github.com/Nicholaswogan/NumbaLSODA/tree/main/benchmark

With the best Julia methods, integrating to 100 seconds and interpolating takes a few milliseconds. I compared this to FLINT ( see attached), which is slower by a factor of at least 5.

Does this look right?

I had to adjust some of the FLINT compiler options in the CMakeLists.txt because I was getting errors. So maybe these adjustments are responsible for the speed difference.

I used gfortran 11 and compiled in Release mode.

FLINT-benchmark.zip

@princemahajan
Copy link
Owner

princemahajan commented Sep 6, 2021

Interesting, I can reproduce it. Following are my results. Even without interpolation, I am seeing it's taking more time than Julia's numbers you published. Need to investigate this.

 --- FLINT Stat Results (         100  loops)  ---
  
 A. Natural Step-size
      Method   Time (ms)      FCalls    Accepted    Rejected
  DOP54        0.328E+01       63617       10575          30
  DOP853       0.172E+01       37349        2466         705
  Verner65E    0.344E+01       53682        6640          78
  Verner98R    0.281E+01       39628        2141         358
 B. Interpolated Grid
      Method   Time (ms)      FCalls    Accepted    Rejected
  DOP54        0.594E+01       63617       10575          30
  DOP853       0.344E+01       44747        2466         705
  Verner65E    0.688E+01       60322        6640          78
  Verner98R    0.578E+01       50333        2141         358

@princemahajan
Copy link
Owner

princemahajan commented Sep 6, 2021

@Nicholaswogan So I tried to reproduce your Julia benchmark results without using the benchmarking function but with a simple loop to make sure the FLINT test script and your script are identical. And I am seeing much worse performance for Julia code this way and not entirely sure why. See your modified Julia script and results on my machine.

`using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t)
@inbounds begin
dx = p[1](u[2]-u[1])
dy = u[1]
(p[2]-u[3]) - u[2]
dz = u[1]*u[2] - p[3]*u[3]
end
SA[dx,dy,dz]
end
u0 = SA[1.0,0.0,0.0]
p = SA[10.0,28.0,8/3]
tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)
#@Btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
#@Btime solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)

rtime = @Elapsed for i in 1:100
solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
end
println("Tsit5: ", rtime/100*1000, " ms")

rtime = @Elapsed for i in 1:100
solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern8: ", rtime/100*1000, " ms")`

julia> include("TestLorenz.jl")
Lorenz
Tsit5: 15.803541000000001 ms
Vern8: 38.870095 ms

julia> include("TestLorenz.jl")
Lorenz
Tsit5: 13.455046 ms
Vern8: 38.820251999999996 ms

@Nicholaswogan
Copy link
Author

@princemahajan I get the same times as you, when I run that exact julia script. But for some reason I get faster times when I uncomment the @btime:

using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t)
    @inbounds begin
        dx = p[1]*(u[2]-u[1])
        dy = u[1]*(p[2]-u[3]) - u[2]
        dz = u[1]*u[2] - p[3]*u[3]
    end
    SA[dx,dy,dz]
end
u0 = SA[1.0,0.0,0.0]
p = SA[10.0,28.0,8/3]
tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)
@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
@btime solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)

rtime = @elapsed for i in 1:100
solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8) # 2.612 ms (30 allocations: 59.22 KiB)
end
println("Tsit5: ", rtime/100*1000, " ms")

rtime = @elapsed for i in 1:100
solve(prob,Vern8(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern8: ", rtime/100*1000, " ms")

The results are

julia> include("TestLorenz.jl")
  2.446 ms (31 allocations: 59.23 KiB)
  1.399 ms (32 allocations: 65.72 KiB)
Tsit5: 2.60718273 ms
Vern8: 1.63626658 ms

julia> include("TestLorenz.jl")
  2.450 ms (31 allocations: 59.23 KiB)
  1.404 ms (32 allocations: 65.72 KiB)
Tsit5: 2.66984644 ms
Vern8: 1.7514147299999998 ms

@princemahajan
Copy link
Owner

yes i see that too. this is weird.

I also like to point out that generally in my benchmarking codes, I perturb the initial conditions slightly before every integration in the loop, this is closer to reality as you not gonna integrate the exact same initial conditions for the same time span again and again in practice. In julia diffeq, this will require the ODEProblem() invocation to be part of the timed loop. But in FLINT, you specify initial conditions directly in Integrate() method so this is more efficient this way.

@princemahajan
Copy link
Owner

I thought I found the issue but not. With @cpuelapsed macro that is closer to cpu_time() function I am using in FLINT for timing the code, the results are always same. FLINT's solvers do seem to be taking more time for this equation than Julia's TSIT5 and Verner8 solvers. If I can extract the number of function calls made and accepted/rejected steps from Julia solvers, then we will get an idea why is that.

julia> include("TestLorenz.jl")
Tsit5: 12.633589 ms
Vern8: 35.851274000000004 ms
Tsit5: 3.43 ms
Vern8: 1.72 ms

@Nicholaswogan
Copy link
Author

You can get Julia ODE solver stats with

sol = solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-8,abstol=1.0e-8);
sol.destats

result is

DiffEqBase.DEStats
Number of function 1 evaluations:                  70647
Number of function 2 evaluations:                  0
Number of W matrix evaluations:                    0
Number of linear solves:                           0
Number of Jacobians created:                       0
Number of nonlinear solver iterations:             0
Number of nonlinear solver convergence failures:   0
Number of rootfind condition calls:                0
Number of accepted steps:                          11774
Number of rejected steps:                          0

@princemahajan
Copy link
Owner

princemahajan commented Sep 6, 2021 via email

@princemahajan
Copy link
Owner

Following are the test results from the latest FLINT code and Julia Diffeq package. In these results, Julia's Tsit5 is slower, DP5 and DP8 take almost same time, and Vern6 and Vern9 are faster than FLINT's corresponding solvers.

 --- FLINT Stats ---
  
 A. Internal Step-size
      Method     Time(s)      FCalls    Accepted    Rejected Total Steps
  DOP54        0.127E+01      253225       42078         151       42229
  DOP853       0.297E+00       74888        5839         438        6277
  Verner65E    0.105E+01      177861       22123         125       22248
  Verner98R    0.453E+00       73507        4385         223        4608
 B. Interpolated Grid
      Method     Time(s)      FCalls    Accepted    Rejected Total Steps
  DOP54        0.230E+01      253225       42078         151       42229
  DOP853       0.672E+00       92405        5839         438        6277
  Verner65E    0.266E+01      199984       22123         125       22248
  Verner98R    0.123E+01       95432        4385         223        4608
julia> sol = main();
Tsit5: 1.4893084 s
DP5: 1.2646739 s
DP8: 0.2551725 s
Vern6: 0.8604503 s
Vern9: 0.2845905 s

============
julia script:

##

using OrdinaryDiffEq, StaticArrays, BenchmarkTools, Sundials, LSODA, ModelingToolkit

function lorenz_static(u,p,t)
    @inbounds begin
        dx = p[1]*(u[2]-u[1])
        dy = u[1]*(p[2]-u[3]) - u[2]
        dz = u[1]*u[2] - p[3]*u[3]
    end
    SA[dx,dy,dz]
end


##

function main()

u0 = SA[1.0,0.0,0.0]
p = SA[10.0,28.0,8/3]
tspan = (0.0,100.0)

prob = ODEProblem(lorenz_static,u0,tspan,p)
#@btime solve(prob,Tsit5(),saveat=0.1,reltol=1.0e-11,abstol=1.0e-14) # 2.612 ms (30 allocations: 59.22 KiB)
#@btime solve(prob,Vern9(),saveat=0.1,reltol=1.0e-11,abstol=1.0e-14); # 1.803 ms (31 allocations: 65.70 KiB)
sol = 0



rtime = @elapsed for i in 1:100
    sol = solve(prob,Tsit5(),reltol=1.0e-11,abstol=1.0e-14, dense=false) # 2.612 ms (30 allocations: 59.22 KiB)
end
println("Tsit5: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,DP5(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("DP5: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,DP8(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("DP8: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,Vern6(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern6: ", rtime, " s")

rtime = @elapsed for i in 1:100
    sol = solve(prob,Vern9(),reltol=1.0e-11,abstol=1.0e-14,dense=false); # 1.803 ms (31 allocations: 65.70 KiB)
end
println("Vern9: ", rtime, " s")

return sol

end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants