-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IPM/HSD: for BigFloat optimization, computing ξg_ in solve_newton_system! causes a big portion of total allocations #122
Comments
I tried a couple of different syntaxes, and it looks like, when using BigFloat arithmetic, things like dot((ξxzl ./ pt.xl) .* dat.lflag, dat.l .* dat.lflag) allocate a lot of memory indeed. Based on your comment, I think a better design choice would be to drop these dot( (ξxzl ./ pt.xl)[dat.lflag], dat.l[dat.lflag]) |
A quick thought, maybe useful: I think that part of the reason for the current inefficiency is the bool-float multiplication method from base/bool.jl |
The main reason for using bool-float multiplication was because some of these can be Inf or NaN, yes. They get multiplied by I did not realize at the time it could cause big performance issues. |
I started optimizing IPM/HSD using MutableArithmetics with good results, however I wonder which arithmetics are supposed to be supported for HSD? It's likely that I'll break some non-BigFloat arithmetics while developing, so I'm wondering how much do the current tests cover this? |
We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. This is the output with the script from the previous commit: ``` exp = 6: 27.391877 seconds (142.35 M allocations: 21.646 GiB, 11.38% gc time, 23.70% compilation time) elapsed time (ns): 27391877044 gc time (ns): 3116697350 bytes allocated: 23242616545 pool allocs: 142136686 non-pool GC allocs: 9144 malloc() calls: 118007 realloc() calls: 88649 free() calls: 116981 minor collections: 506 full collections: 0 exp = 9: 289.050404 seconds (1.12 G allocations: 175.440 GiB, 29.73% gc time) elapsed time (ns): 289050404133 gc time (ns): 85940338982 bytes allocated: 188377360520 pool allocs: 1117528611 non-pool GC allocs: 75494 malloc() calls: 728937 realloc() calls: 554273 free() calls: 743507 minor collections: 4186 full collections: 3 ``` Fixes ds4dm#122
We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122
We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122
We now do basically this for a logical vector l: `dot(a[l], b[l])`, instead of `dot(a .* l, b .* l) as before. This is as suggested here: ds4dm#122 (comment) Helps with the performance of the BigFloat arithmetic. A Julia script and a Unix shell script were used to conduct an experiment for assesing the impact of this commit and the previous commit on performance. The scripts and the resulting CSV file follow. The benchmark experiment is conducted and the CSV created by removing sources of system load on a computer and running the shell script four times: once with the `init_csv` command and three times with the `run` command, each time checking out a different commit in the Tulip git repo. Unix (Bourne) shell script: ```sh set -u command=$1 julia_opts='-O3 --min-optlevel=3 --heap-size-hint=5G --depwarn=error --warn-overwrite=yes' script=tulip_benchmark.jl case "$command" in init_csv) printf '%s,%s,%s,%s\n' 'Tulip version' estimator 'measurement type' value ;; run) tulip_version=$2 $PATH_TO_JULIA_BIN $julia_opts "$script" "$tulip_version" ;; *) printf '%s\n' error 2>&1 exit 1 ;; esac ``` Julia script: ```julia const benchmark_seconds = 500 const polynomial_degree = 20 setprecision(BigFloat, 12 * 2^7) using BenchmarkTools import FindMinimaxPolynomial, # v0.2.3 Tulip, MathOptInterface const FMP = FindMinimaxPolynomial const MMX = FMP.Minimax const PPTI = FMP.PolynomialPassingThroughIntervals const NE = FMP.NumericalErrorTypes const to_poly = FMP.ToSparsePolynomial.to_sparse_polynomial const mmx = MMX.minimax_polynomial const error_type_relative = NE.RelativeError() const MOI = MathOptInterface const itv_max_err = FMP.ApproximateInfinityNorm.interval_max_err function make_lp() lp = Tulip.Optimizer{BigFloat}() # Remove iteration limit just in case MOI.set(lp, MOI.RawOptimizerAttribute("IPM_IterationsLimit"), 2000) # Disable presolve, speeds things up #MOI.set(lp, MOI.RawOptimizerAttribute("Presolve_Level"), 0) lp end const itv = (-big"2.0"^-3, big"45.0") odd_monomials(n::Int) = 1:2:n sind_mmx(n::Int) = mmx( make_lp, sind, (itv,), odd_monomials(n), # Small factor to have less variance in the results initial_perturb_factor = 1//(2^20), # We're benchmarking LP, so disable other stuff worst_segments_density = 5, worst_segments_breadth_limit = 2, worst_segments_depth_ratio = 1/2, # Exit right after the first step exit_condition = true, ) function report(estimator; benchmark, benchmark_name) b = estimator(benchmark) println("$benchmark_name,$estimator,time,$(b.time)") println("$benchmark_name,$estimator,gctime,$(b.gctime)") println("$benchmark_name,$estimator,memory,$(b.memory)") println("$benchmark_name,$estimator,allocs,$(b.allocs)") end function report(;benchmark, benchmark_name) for quantile in (minimum, median, maximum) report( quantile, benchmark = benchmark, benchmark_name = benchmark_name, ) end end report( benchmark = (@benchmark sind_mmx(polynomial_degree) seconds=benchmark_seconds), benchmark_name = first(ARGS), ) ``` CSV results: ```csv Tulip version,estimator,measurement type,value v0.9.5,minimum,time,2.63759609e8 v0.9.5,minimum,gctime,3.4505713e7 v0.9.5,minimum,memory,495299296 v0.9.5,minimum,allocs,3629874 v0.9.5,median,time,4.2594161e8 v0.9.5,median,gctime,1.3250321e8 v0.9.5,median,memory,495299296 v0.9.5,median,allocs,3629874 v0.9.5,maximum,time,4.55935021e8 v0.9.5,maximum,gctime,1.40426286e8 v0.9.5,maximum,memory,495299296 v0.9.5,maximum,allocs,3629874 MutableArithmetics for IPM/HSD,minimum,time,2.57993117e8 MutableArithmetics for IPM/HSD,minimum,gctime,2.9403466e7 MutableArithmetics for IPM/HSD,minimum,memory,442052896 MutableArithmetics for IPM/HSD,minimum,allocs,3238720 MutableArithmetics for IPM/HSD,median,time,4.22323273e8 MutableArithmetics for IPM/HSD,median,gctime,1.282365305e8 MutableArithmetics for IPM/HSD,median,memory,442052896 MutableArithmetics for IPM/HSD,median,allocs,3238720 MutableArithmetics for IPM/HSD,maximum,time,4.56330849e8 MutableArithmetics for IPM/HSD,maximum,gctime,1.57061172e8 MutableArithmetics for IPM/HSD,maximum,memory,442052896 MutableArithmetics for IPM/HSD,maximum,allocs,3238720 IPM/HSD: use logical slicing ...,minimum,time,2.40996648e8 IPM/HSD: use logical slicing ...,minimum,gctime,2.5335783e7 IPM/HSD: use logical slicing ...,minimum,memory,386588512 IPM/HSD: use logical slicing ...,minimum,allocs,2833356 IPM/HSD: use logical slicing ...,median,time,3.76039574e8 IPM/HSD: use logical slicing ...,median,gctime,1.06930941e8 IPM/HSD: use logical slicing ...,median,memory,386588512 IPM/HSD: use logical slicing ...,median,allocs,2833356 IPM/HSD: use logical slicing ...,maximum,time,4.00347376e8 IPM/HSD: use logical slicing ...,maximum,gctime,1.27260987e8 IPM/HSD: use logical slicing ...,maximum,memory,386588512 IPM/HSD: use logical slicing ...,maximum,allocs,2833356 ``` Fixes ds4dm#122
This line causes a lot of allocation, reported by
--track-allocation=all
:Tulip.jl/src/IPM/HSD/step.jl
Line 216 in 2b055fd
So there's possibly room for improvement at or around that line.
The text was updated successfully, but these errors were encountered: