IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

nsajko · 2022-04-20T14:17:05Z

This line causes a lot of allocation, reported by --track-allocation=all:

Tulip.jl/src/IPM/HSD/step.jl

Line 216 in 2b055fd

@timeit hsd.timer "ξg_" ξg_ = (ξg + ξtk / pt.τ

So there's possibly room for improvement at or around that line.

The text was updated successfully, but these errors were encountered:

mtanneau · 2022-04-21T13:16:29Z

I tried a couple of different syntaxes, and it looks like, when using BigFloat arithmetic, things like

dot((ξxzl ./ pt.xl) .* dat.lflag, dat.l .* dat.lflag)

allocate a lot of memory indeed.
IIRC, I chose that syntax because then I didn't have to worry about dimensions (everything has the same size, and what needs to be zero is always zero).

Based on your comment, I think a better design choice would be to drop these .* flag and slice the arrays instead.
Something like

dot( (ξxzl ./ pt.xl)[dat.lflag], dat.l[dat.lflag])

nsajko · 2022-04-21T16:18:24Z

A quick thought, maybe useful: I think that part of the reason for the current inefficiency is the bool-float multiplication method from base/bool.jl Base.:*(x::Bool, y::T)::promote_type(Bool,T) where T<:AbstractFloat, which takes special care for the case when the float value is NaN or infinity. I'm not sure how does Tulip handle NaN internally, but an idea that comes to mind is this:
If it's known that y (of floating point type) is finite and not NaN, it would probably be faster to convert x from Bool to UInt8 or UInt before multiplying.

mtanneau · 2022-04-21T17:11:05Z

The main reason for using bool-float multiplication was because some of these can be Inf or NaN, yes. They get multiplied by false and get reset to zero, which makes the math work.

I did not realize at the time it could cause big performance issues.

nsajko · 2023-01-06T00:31:23Z

I started optimizing IPM/HSD using MutableArithmetics with good results, however I wonder which arithmetics are supposed to be supported for HSD? It's likely that I'll break some non-BigFloat arithmetics while developing, so I'm wondering how much do the current tests cover this?

We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. This is the output with the script from the previous commit: ``` exp = 6: 27.391877 seconds (142.35 M allocations: 21.646 GiB, 11.38% gc time, 23.70% compilation time) elapsed time (ns): 27391877044 gc time (ns): 3116697350 bytes allocated: 23242616545 pool allocs: 142136686 non-pool GC allocs: 9144 malloc() calls: 118007 realloc() calls: 88649 free() calls: 116981 minor collections: 506 full collections: 0 exp = 9: 289.050404 seconds (1.12 G allocations: 175.440 GiB, 29.73% gc time) elapsed time (ns): 289050404133 gc time (ns): 85940338982 bytes allocated: 188377360520 pool allocs: 1117528611 non-pool GC allocs: 75494 malloc() calls: 728937 realloc() calls: 554273 free() calls: 743507 minor collections: 4186 full collections: 3 ``` Fixes ds4dm#122

@benchmark

We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122

@benchmark

We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122

@benchmark

We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init_csv` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

nsajko commented Apr 20, 2022

mtanneau commented Apr 21, 2022

nsajko commented Apr 21, 2022

mtanneau commented Apr 21, 2022

nsajko commented Jan 6, 2023

IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122

Comments

nsajko commented Apr 20, 2022

mtanneau commented Apr 21, 2022

nsajko commented Apr 21, 2022

mtanneau commented Apr 21, 2022

nsajko commented Jan 6, 2023