Precompute expensive computations #1911

heshpdx · 2024-09-06T04:42:08Z

Analysis showed some easy opportunities to gain performance by replacing expensive FP divides with FP multiplies. We can hoist the reciprocal computation out of the hot loops and hold the inverse values instead, so we don't incur the cost of the divide on every loop iteration. On most CPUs, FP divides are much slower than FP multiplies.
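
A minimal sketch of the transformation, for illustration only. It is not the code changed in this PR: the function names, the `pivot` parameter, and the use of `std::vector<double>` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Baseline: one FP divide per element inside the hot loop.
void scale_by_divide(std::vector<double>& values, double pivot) {
  for (std::size_t i = 0; i < values.size(); ++i) values[i] /= pivot;
}

// Hoisted: a single divide outside the loop, then one multiply per element.
void scale_by_reciprocal(std::vector<double>& values, double pivot) {
  const double inv_pivot = 1.0 / pivot;  // computed once, outside the loop
  for (std::size_t i = 0; i < values.size(); ++i) values[i] *= inv_pivot;
}
```

Note that `x * (1.0 / p)` can differ from `x / p` in the last bit, which is why compilers generally do not make this substitution on their own under strict IEEE semantics. Results after the change may therefore not be bit-identical.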

This technique already exists in the codebase, for example:
https://github.com/ERGO-Code/HiGHS/blob/5ce7a27/src/util/HFactor.cpp#L726

I measured a 4.5% performance gain with this patch when solving i_n13.mps or netlarge.mps from https://plato.asu.edu/ftp/lptestset/network/.

Replace hot FDIVs with FMULs through saving off the inverse values outside the loops, so we don't incur the cost on each loop iteration.

jajhall · 2024-09-06T13:08:16Z

Thanks, but I don't observe these improvements running interactively on my laptop. I'm travelling, so will try again more rigorously on a PC when I get home.

Which of the netlarge problems was it, by the way? I tried all four.

heshpdx · 2024-09-06T17:36:08Z

Thank you for accepting the patch. Performance will vary based on each microarchitecture's relative difference in div/mul latencies. I was measuring on an Ampere Altra aarch64. The input file was netlarge2.mps, sorry for the confusion.

heshpdx · 2024-09-06T17:41:56Z

I have a formatting fix. How do I offer it here?
heshpdx@1aad204

jajhall · 2024-09-06T17:48:03Z

Thanks, but don't worry. I will rebase your original PR into a new branch of latest, and do some experiments.

jajhall · 2024-09-06T18:09:01Z

> I have a formatting fix. How do I offer it here?
> heshpdx@1aad204

Hmm... could you create a PR from heshpdx@1aad204 to https://github.com/ERGO-Code/HiGHS/tree/consider-1911 please, as I seem to be getting some merge conflicts if I try to do it

heshpdx · 2024-09-06T20:25:27Z

> Hmm... could you create a PR from heshpdx@1aad204 to https://github.com/ERGO-Code/HiGHS/tree/consider-1911 please, as I seem to be getting some merge conflicts if I try to do it

Ok I made this. #1914
Good luck!
thanks

heshpdx · 2024-09-09T16:43:30Z

I ran on another ARM server today at the office. Data below is from a Huawei Taishan (Kunpeng 920 CPU) using gcc-11.4. We see +6.5% on netlarge2.mps and +3.0% on i_n13.mps. The results are reproducible.

master, both cmdlines:

$ ./build_slow/bin/highs --presolve off netlarge2.mps
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 1s
      32234     5.6806565648e+08 Pr: 3320(127608) 6s
      39954     5.7448429400e+08 Pr: 0(0) 10s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :         10.28

$ ./build_slow/bin/highs --presolve off i_n13.mps
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
       3758     1.0903244690e+11 Pr: 3273(7.23338e+08); Du: 0(3.56085e-07) 5s
      18892     2.3143298601e+11 Pr: 4190(1.832e+09); Du: 0(6.65129e-07) 10s
      35759     3.0645568816e+11 Pr: 4820(2.1231e+09); Du: 0(1.15216e-06) 15s
      55327     3.7490448124e+11 Pr: 4795(2.85531e+09); Du: 0(1.20775e-06) 20s
      76691     4.3049351710e+11 Pr: 3951(2.55723e+09); Du: 0(1.14716e-06) 26s
      97243     4.7801930842e+11 Pr: 3712(3.05969e+09); Du: 0(1.30255e-06) 31s
     119911     5.4374059208e+11 Pr: 3460(3.26244e+09); Du: 0(7.58261e-07) 36s
     143610     6.0941433079e+11 Pr: 3167(2.70235e+09); Du: 0(8.41314e-07) 41s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 47s
     153754     8.4257734541e+11 Pr: 4957(1.68588e+08); Du: 0(3.47095e-07) 52s
     157447     8.4733822071e+11 Pr: 3977(4.79445e+07); Du: 0(3.77339e-07) 58s
     162382     8.4746305066e+11 Pr: 1560(3.59236e+06); Du: 0(8.37662e-07) 63s
     164255     8.4771620650e+11 Pr: 0(0) 66s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         65.88

branch, both cmdlines:

$ ./build_fast/bin/highs --presolve off netlarge2.mps
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 1s
      34459     5.7169393747e+08 Pr: 2465(94118) 6s
      39954     5.7448429400e+08 Pr: 0(0) 10s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          9.65

$ ./build_fast/bin/highs --presolve off i_n13.mps
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
       4009     1.1549166576e+11 Pr: 3358(7.88891e+08); Du: 0(4.21773e-07) 5s
      20156     2.3955273869e+11 Pr: 4316(1.54803e+09); Du: 0(9.59695e-07) 10s
      38401     3.1406280397e+11 Pr: 4913(2.09319e+09); Du: 0(1.4121e-06) 15s
      59160     3.8742694719e+11 Pr: 4571(2.96417e+09); Du: 0(9.75124e-07) 20s
      81425     4.3975960129e+11 Pr: 3883(2.66298e+09); Du: 0(1.6854e-06) 25s
     103418     4.9796966643e+11 Pr: 3612(3.52669e+09); Du: 0(1.34964e-06) 30s
     127217     5.6565078787e+11 Pr: 3398(3.17893e+09); Du: 0(1.47158e-06) 36s
     148750     7.0635807639e+11 Pr: 5374(2.2728e+09); Du: 0(7.14663e-07) 41s
     150562     8.2155681588e+11 Pr: 5700(4.94399e+08); Du: 0(5.56892e-07) 47s
     154296     8.4528485598e+11 Pr: 4974(1.38991e+08); Du: 0(5.65943e-07) 52s
     158001     8.4738292208e+11 Pr: 3704(3.85488e+07); Du: 0(5.99151e-07) 57s
     162954     8.4771851567e+11 Pr: 1151(2.01813e+06); Du: 0(1.14222e-06) 62s
     164255     8.4771620650e+11 Pr: 0(0) 64s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         63.92

heshpdx · 2024-09-12T04:11:51Z

Today I got access to an AMD server in the OCI cloud; /proc/cpuinfo reports an "AMD EPYC 7J13 64-Core Processor". I used taskset to pin every run to the same core, since each core in a many-core machine has a different latency to memory. We see +2.1% on netlarge2.mps and +2.2% on i_n13.mps. The results are reproducible.

master, both cmdlines:

$ taskset -c 4 ./build_slow/bin/highs --presolve off ../netlarge2.mps 
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 0s
      39954     5.7448429400e+08 Pr: 0(0) 5s
      39954     5.7448429400e+08 Pr: 0(0) 5s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          5.31

$ taskset -c 4 ./build_slow/bin/highs --presolve off ../i_n13.mps 
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
      26542     2.7401949548e+11 Pr: 4632(2.08721e+09); Du: 0(8.79338e-07) 5s
      71526     4.1866844793e+11 Pr: 4111(2.86759e+09); Du: 0(9.52449e-07) 10s
     118674     5.3873873561e+11 Pr: 3436(3.11155e+09); Du: 0(7.13305e-07) 15s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 21s
     157447     8.4733822071e+11 Pr: 3977(4.79445e+07); Du: 0(3.77339e-07) 26s
     164255     8.4771620650e+11 Pr: 0(0) 29s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         29.34

branch, both cmdlines:

$ taskset -c 4 ./build_fast/bin/highs --presolve off ../netlarge2.mps 
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 0s
      39954     5.7448429400e+08 Pr: 0(0) 5s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          5.20

$ taskset -c 4 ./build_fast/bin/highs --presolve off ../i_n13.mps 
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
      27781     2.7834600388e+11 Pr: 4527(2.04886e+09); Du: 0(8.92919e-07) 5s
      73555     4.2503174321e+11 Pr: 3967(2.78817e+09); Du: 0(9.40745e-07) 10s
     121654     5.4910876317e+11 Pr: 3363(2.99491e+09); Du: 0(8.3465e-07) 15s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 20s
     158001     8.4738292208e+11 Pr: 3704(3.85488e+07); Du: 0(5.99151e-07) 26s
     164255     8.4771620650e+11 Pr: 0(0) 29s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         28.70
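
The quoted gains can be recomputed from the "HiGHS run time" lines in the logs above. The helpers below are a hypothetical sketch of the two common conventions, which give slightly different percentages for the same pair of timings.

```cpp
// Speedup convention: how much faster the branch runs, in percent.
double speedup_percent(double t_master, double t_branch) {
  return (t_master / t_branch - 1.0) * 100.0;
}

// Runtime-reduction convention: how much wall-clock time is saved, in percent.
double time_reduction_percent(double t_master, double t_branch) {
  return (1.0 - t_branch / t_master) * 100.0;
}
```

With the AMD run times above (5.31 s vs 5.20 s for netlarge2.mps, 29.34 s vs 28.70 s for i_n13.mps), `speedup_percent` gives roughly 2.1% and 2.2%.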

jajhall · 2024-09-12T11:41:45Z

Thanks. I've just run these two models on my desktop, and get a small improvement.

I'll run through the Mittelmann test set to extend the sample.

fwesselm · 2024-09-13T09:23:47Z

@jajhall I could run the MIPLIB etc., if you like.

jajhall · 2024-09-13T09:28:45Z

> @jajhall I could run the MIPLIB etc., if you like.

Thanks, but I think that the LP test set is enough as a sanity check that nothing is broken. That said, MIP behaviour will change, so you'll need a reference set of results for future comparison.

I'm not checking for performance as a means of deciding whether to make the change, as it's hard to imagine it ever being worse. I'm just intrigued to see what difference it makes.

Precompute expensive computations

08e57a6

jajhall changed the base branch from master to latest September 6, 2024 11:02

jajhall merged commit 0f4b0d0 into ERGO-Code:latest Sep 6, 2024
109 of 110 checks passed

jajhall mentioned this pull request Sep 6, 2024

Revert "Precompute expensive computations" #1912

Merged

heshpdx mentioned this pull request Sep 9, 2024

Replace expensive computations #1914

Merged
