
Precompute expensive computations #1911

Merged: 1 commit, Sep 6, 2024


heshpdx (Contributor) commented Sep 6, 2024

Analysis showed some easy opportunities to gain performance by replacing expensive FP divides with FP multiplies. We can hoist the computation out of the hot loops and hold reciprocal values instead. This way we don't incur the cost of a divide on every loop iteration. Most CPUs execute FP divides much more slowly than FP multiplies.

This technique already exists in the codebase, for example:
https://github.com/ERGO-Code/HiGHS/blob/5ce7a27/src/util/HFactor.cpp#L726
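As a minimal illustration of the pattern (the function names below are hypothetical and not taken from HiGHS), the "before" version divides inside the loop, while the "after" version computes the reciprocal once and multiplies. GCC's `-freciprocal-math` (implied by `-ffast-math`) allows the compiler to make this substitution itself; without it, the rewrite has to be done by hand because the two forms are not guaranteed to give bit-identical results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not HiGHS code.

// Before: one floating-point divide per element inside the hot loop.
void scale_column_div(std::vector<double>& column, double pivot) {
  for (std::size_t i = 0; i < column.size(); ++i) column[i] /= pivot;
}

// After: a single divide hoisted out of the loop, then one multiply per
// element. x * (1.0 / p) can differ from x / p in the last bit, because the
// reciprocal is itself rounded.
void scale_column_mul(std::vector<double>& column, double pivot) {
  assert(pivot != 0.0);
  const double inv_pivot = 1.0 / pivot;
  for (std::size_t i = 0; i < column.size(); ++i) column[i] *= inv_pivot;
}
```

Because of that last-bit difference, a change like this can alter the solver's path (for example, MIP branching decisions) even when the final answers agree.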

I measured a 4.5% performance improvement with this patch when solving i_n13.mps or netlarge.mps from https://plato.asu.edu/ftp/lptestset/network/.

Replace hot FDIVs with FMULs through saving off the inverse values
outside the loops, so we don't incur the cost on each loop iteration.
jajhall changed the base branch from master to latest September 6, 2024 11:02
jajhall merged commit 0f4b0d0 into ERGO-Code:latest Sep 6, 2024
109 of 110 checks passed
jajhall (Member) commented Sep 6, 2024

Thanks, but I don't observe these improvements running interactively on my laptop. I'm travelling, so I'll try again more rigorously on a PC when I get home.

Which of the netlarge problems was it, by the way? I tried all four.

heshpdx (Contributor, author) commented Sep 6, 2024

Thank you for accepting the patch. Performance will vary with each microarchitecture's relative difference between divide and multiply latencies. I was measuring on an Ampere Altra (aarch64). The input file was netlarge2.mps; sorry for the confusion.
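To get a rough feel for the divide/multiply gap on a particular core, a small microbenchmark along these lines may help. It is a sketch under stated assumptions, not how the figures in this thread were measured: the function names are hypothetical, and it times a dependent chain of operations so that each iteration waits on the previous result, which makes latency (rather than throughput) dominate.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Dependent chain: each step needs the previous result, so the loop runs at
// roughly one divide latency per iteration.
double chain_div(double x, double d, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) x = x / d;
  return x;
}

// Same chain with the reciprocal hoisted out: one multiply per iteration.
double chain_mul(double x, double d, std::size_t n) {
  assert(d != 0.0);
  const double inv_d = 1.0 / d;
  for (std::size_t i = 0; i < n; ++i) x = x * inv_d;
  return x;
}

// Wall-clock seconds taken by f(x, d, n). The result is written to *out so
// the computation is not discarded as dead code.
template <typename F>
double time_seconds(F f, double x, double d, std::size_t n, double* out) {
  const auto t0 = std::chrono::steady_clock::now();
  *out = f(x, d, n);
  const auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(t1 - t0).count();
}
```

A plausible way to use it is to call both chains with `n` around 100 million and `d` slightly above 1 (for example 1.0000001, which keeps `x` well away from subnormals), compiled without `-ffast-math`; the ratio of the two times then gives a rough per-core divide-to-multiply latency ratio. Expect the ratio to differ between, say, an Ampere Altra and an x86 laptop, which is consistent with the varying results reported here.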

heshpdx (Contributor, author) commented Sep 6, 2024

I have a formatting fix. How do I offer it here?
heshpdx@1aad204

jajhall (Member) commented Sep 6, 2024

Thanks, but don't worry. I will rebase your original PR into a new branch of latest, and do some experiments.

jajhall (Member) commented Sep 6, 2024

> I have a formatting fix. How do I offer it here?
> heshpdx@1aad204

Hmm... could you create a PR from heshpdx@1aad204 to https://github.com/ERGO-Code/HiGHS/tree/consider-1911, please? I seem to be getting some merge conflicts if I try to do it.

heshpdx (Contributor, author) commented Sep 6, 2024

> Hmm... could you create a PR from heshpdx@1aad204 to https://github.com/ERGO-Code/HiGHS/tree/consider-1911 please, as I seem to be getting some merge conflicts if I try to do it

OK, I made it: #1914
Good luck!
Thanks

heshpdx (Contributor, author) commented Sep 9, 2024

I ran on another ARM server today at the office. Data below is from a Huawei Taishan (Kunpeng 920 CPU) using gcc-11.4. We see +6.5% on netlarge2.mps and +3.0% on i_n13.mps. The results are reproducible.

master, both cmdlines:

$ ./build_slow/bin/highs --presolve off netlarge2.mps
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 1s
      32234     5.6806565648e+08 Pr: 3320(127608) 6s
      39954     5.7448429400e+08 Pr: 0(0) 10s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :         10.28

$ ./build_slow/bin/highs --presolve off i_n13.mps
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
       3758     1.0903244690e+11 Pr: 3273(7.23338e+08); Du: 0(3.56085e-07) 5s
      18892     2.3143298601e+11 Pr: 4190(1.832e+09); Du: 0(6.65129e-07) 10s
      35759     3.0645568816e+11 Pr: 4820(2.1231e+09); Du: 0(1.15216e-06) 15s
      55327     3.7490448124e+11 Pr: 4795(2.85531e+09); Du: 0(1.20775e-06) 20s
      76691     4.3049351710e+11 Pr: 3951(2.55723e+09); Du: 0(1.14716e-06) 26s
      97243     4.7801930842e+11 Pr: 3712(3.05969e+09); Du: 0(1.30255e-06) 31s
     119911     5.4374059208e+11 Pr: 3460(3.26244e+09); Du: 0(7.58261e-07) 36s
     143610     6.0941433079e+11 Pr: 3167(2.70235e+09); Du: 0(8.41314e-07) 41s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 47s
     153754     8.4257734541e+11 Pr: 4957(1.68588e+08); Du: 0(3.47095e-07) 52s
     157447     8.4733822071e+11 Pr: 3977(4.79445e+07); Du: 0(3.77339e-07) 58s
     162382     8.4746305066e+11 Pr: 1560(3.59236e+06); Du: 0(8.37662e-07) 63s
     164255     8.4771620650e+11 Pr: 0(0) 66s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         65.88

branch, both cmdlines:

$ ./build_fast/bin/highs --presolve off netlarge2.mps
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 1s
      34459     5.7169393747e+08 Pr: 2465(94118) 6s
      39954     5.7448429400e+08 Pr: 0(0) 10s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          9.65

$ ./build_fast/bin/highs --presolve off i_n13.mps
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
       4009     1.1549166576e+11 Pr: 3358(7.88891e+08); Du: 0(4.21773e-07) 5s
      20156     2.3955273869e+11 Pr: 4316(1.54803e+09); Du: 0(9.59695e-07) 10s
      38401     3.1406280397e+11 Pr: 4913(2.09319e+09); Du: 0(1.4121e-06) 15s
      59160     3.8742694719e+11 Pr: 4571(2.96417e+09); Du: 0(9.75124e-07) 20s
      81425     4.3975960129e+11 Pr: 3883(2.66298e+09); Du: 0(1.6854e-06) 25s
     103418     4.9796966643e+11 Pr: 3612(3.52669e+09); Du: 0(1.34964e-06) 30s
     127217     5.6565078787e+11 Pr: 3398(3.17893e+09); Du: 0(1.47158e-06) 36s
     148750     7.0635807639e+11 Pr: 5374(2.2728e+09); Du: 0(7.14663e-07) 41s
     150562     8.2155681588e+11 Pr: 5700(4.94399e+08); Du: 0(5.56892e-07) 47s
     154296     8.4528485598e+11 Pr: 4974(1.38991e+08); Du: 0(5.65943e-07) 52s
     158001     8.4738292208e+11 Pr: 3704(3.85488e+07); Du: 0(5.99151e-07) 57s
     162954     8.4771851567e+11 Pr: 1151(2.01813e+06); Du: 0(1.14222e-06) 62s
     164255     8.4771620650e+11 Pr: 0(0) 64s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         63.92

heshpdx (Contributor, author) commented Sep 12, 2024

Today I got access to an AMD server in the OCI cloud; /proc/cpuinfo says it is an "AMD EPYC 7J13 64-Core Processor". I used taskset to make sure both builds run on the same core, since each core in a many-core machine has a different latency to memory. We see +2.1% on netlarge2.mps and +2.2% on i_n13.mps. The results are reproducible.
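`taskset -c 4` works by setting the process's CPU affinity mask through the Linux `sched_setaffinity` system call before running the program. A minimal sketch of doing the same from inside a program is below; `pin_to_cpu` is a hypothetical helper name, and the code assumes Linux with glibc (g++ defines `_GNU_SOURCE` by default, which the `CPU_*` macros need).

```cpp
#include <cassert>
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_ZERO/CPU_SET (Linux, glibc)

// Pin the calling thread to one logical CPU, comparable in effect to
// launching the program under `taskset -c <cpu>`. Threads created after this
// call inherit the mask. Returns true if the kernel accepted the new mask.
bool pin_to_cpu(int cpu) {
  assert(cpu >= 0 && cpu < CPU_SETSIZE);
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  // A pid of 0 means the calling thread.
  return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Using the `taskset` command, as done here, has the advantage of needing no change to HiGHS itself, and it keeps the master and branch builds on exactly the same core.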

master, both cmdlines:

$ taskset -c 4 ./build_slow/bin/highs --presolve off ../netlarge2.mps 
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 0s
      39954     5.7448429400e+08 Pr: 0(0) 5s
      39954     5.7448429400e+08 Pr: 0(0) 5s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          5.31

$ taskset -c 4 ./build_slow/bin/highs --presolve off ../i_n13.mps 
Running HiGHS 1.7.2 (git hash: 5ce7a2753): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
      26542     2.7401949548e+11 Pr: 4632(2.08721e+09); Du: 0(8.79338e-07) 5s
      71526     4.1866844793e+11 Pr: 4111(2.86759e+09); Du: 0(9.52449e-07) 10s
     118674     5.3873873561e+11 Pr: 3436(3.11155e+09); Du: 0(7.13305e-07) 15s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 21s
     157447     8.4733822071e+11 Pr: 3977(4.79445e+07); Du: 0(3.77339e-07) 26s
     164255     8.4771620650e+11 Pr: 0(0) 29s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         29.34

branch, both cmdlines:

$ taskset -c 4 ./build_fast/bin/highs --presolve off ../netlarge2.mps 
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   netlarge2 has 40000 rows; 1160000 cols; 2320000 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 1e+04]
  Bound  [2e+04, 4e+05]
  RHS    [1e+00, 4e+02]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 13613(800000) 0s
      39954     5.7448429400e+08 Pr: 0(0) 5s
Model   status      : Optimal
Simplex   iterations: 39954
Objective value     :  5.7448429400e+08
HiGHS run time      :          5.20

$ taskset -c 4 ./build_fast/bin/highs --presolve off ../i_n13.mps 
Running HiGHS 1.7.2 (git hash: 1aad20439): Copyright (c) 2024 HiGHS under MIT licence terms
LP   i_n13 has 8192 rows; 741455 cols; 1482910 nonzeros
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [1e+00, 4e+03]
  Bound  [1e+00, 9e+06]
  RHS    [9e+06, 9e+06]
Solving LP without presolve, or with basis, or unconstrained
Using EKK dual simplex solver - serial
  Iteration        Objective     Infeasibilities num(sum)
          0     0.0000000000e+00 Pr: 2(1.80671e+07) 0s
      27781     2.7834600388e+11 Pr: 4527(2.04886e+09); Du: 0(8.92919e-07) 5s
      73555     4.2503174321e+11 Pr: 3967(2.78817e+09); Du: 0(9.40745e-07) 10s
     121654     5.4910876317e+11 Pr: 3363(2.99491e+09); Du: 0(8.3465e-07) 15s
     149942     8.0849647520e+11 Pr: 5858(6.52321e+08); Du: 0(3.75338e-07) 20s
     158001     8.4738292208e+11 Pr: 3704(3.85488e+07); Du: 0(5.99151e-07) 26s
     164255     8.4771620650e+11 Pr: 0(0) 29s
Model   status      : Optimal
Simplex   iterations: 164255
Objective value     :  8.4771620650e+11
HiGHS run time      :         28.70
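The percentages quoted for both machines appear to be computed from the "HiGHS run time" lines as 100 × (t_master / t_branch − 1). The sketch below recomputes them from the times in the logs above; the names are hypothetical, and the results agree with the quoted figures to within about 0.1 percentage point.

```cpp
#include <cassert>
#include <cstdio>

// Speedup of the patched build over master, in percent.
double speedup_percent(double t_master, double t_branch) {
  assert(t_master > 0.0 && t_branch > 0.0);
  return 100.0 * (t_master / t_branch - 1.0);
}

struct Run {
  const char* machine;
  const char* model;
  double t_master;  // "HiGHS run time" for build_slow (master), seconds
  double t_branch;  // "HiGHS run time" for build_fast (branch), seconds
};

// Run times copied from the logs in this thread.
const Run kRuns[] = {
    {"Kunpeng 920", "netlarge2", 10.28, 9.65},
    {"Kunpeng 920", "i_n13", 65.88, 63.92},
    {"EPYC 7J13", "netlarge2", 5.31, 5.20},
    {"EPYC 7J13", "i_n13", 29.34, 28.70},
};

void print_speedups() {
  for (const Run& r : kRuns)
    std::printf("%-12s %-10s %+.1f%%\n", r.machine, r.model,
                speedup_percent(r.t_master, r.t_branch));
}
```

Single runs like these are sensitive to noise, so repeating each measurement a few times and comparing medians would make small differences such as 2% more convincing.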

jajhall (Member) commented Sep 12, 2024

Thanks. I've just run these two models on my desktop, and get a small improvement.

I'll run through the Mittelmann test set to extend the sample.

fwesselm (Contributor) commented

@jajhall I could run the MIPLIB etc., if you like.

jajhall (Member) commented Sep 13, 2024

> @jajhall I could run the MIPLIB etc., if you like.

Thanks, but I think that the LP test set is enough as a sanity check that nothing is broken. That said, MIP behaviour will change, so you'll need a reference set of results for future comparison.

I'm not checking for performance as a means of deciding whether to make the change, as it's hard to imagine it ever being worse. I'm just intrigued to see what difference it makes.
