Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

tasmith4 · 2022-09-09T20:16:27Z

Bug Report

@trilinos/tpetra @crdohrm

Description

Tpetra's CrsMatrix::doImport seems to use more memory than necessary. First noticed by the Sierra/SD team in #6648 and #6663. Initial (partial) fix in #6831. After that, further investigation shows doImport still has roughly 100% higher memory usage than Sierra/SD's hand-rolled importer. A comment in Sierra's MATHLIB/gdsw2/TpetraUtil.C suggests the memory growth is believed to come from Tpetra::StaticProfile, but it's unclear whether that's still correct after #6831 merged.

Data collected on 9/2/22 by @tasmith4 from a Sierra/SD test (instructions to reproduce included below; results in Mbytes -- Min memory / Avg memory / Max memory / Max high-water-mark memory):

Using Sierra/SD's importSquareMatrix
- Before: 513/536/548/623
- After: 603/615/629/685
Using Tpetra's CrsMatrix::doImport
- Before: 514/553/593/667
- After: 687/712/745/831

Steps to Reproduce

Prerequisite: obtain Sierra code access
Clone the Sierra code and test repositories
Edit MATHLIB/gdsw2/Preconditioner.C, inside Preconditioner::factorOverlapMatrix:
- Before and after m_TUtil.getTpetraMatrix(m_preconditionerMatrix, overlapMap, overlapMatrix); add a call to m_memTime->OutputMemoryAndTiming(/* some text here */);
Assign the reproducing Sierra test: assign -p Salinas_rtest/acceptance/thermalExpansion/perf_layered_frusta_eng.test
Build: bake -e debug --targets-from-tests=assigned.tests
Run the test: testrun -e debug --save-all-results
Go to the test directory reported by testrun, open dd_memory.dat, and find the output text you specified in the OutputMemoryAndTiming call
To switch between the hand-rolled variant and built-in Tpetra variant, find the #define FIX_TRILINOS_GDSW line in MATHLIB/gdsw2/TpetraUtil.C, edit as appropriate, rebuild, and rerun the test

The text was updated successfully, but these errors were encountered:

tasmith4 · 2022-09-09T20:40:40Z

Notes after some review of this problem:

Implementation of importSquareMatrix is in MATHLIB/gdsw2/TpetraFunctionsGDSW.[h|C]
importSquareMatrix has 3 main pieces:
- constructDistributor
  - hard to draw exact parallels, but seems to include some of the LID-level plan computation that would happen in the Tpetra::Import constructor
- communicateRowMap
  - seems to include more of the LID-level plan computation
- communicateMatrixData
  - packs data directly out of the matrix (i.e. rakes out values directly, doesn't use packAndPrepare)
  - sends the data with the Distributor constructed above
  - makes a brand-new CrsMatrix and inserts the values it received with CrsMatrix::insertLocalValues
  - calls CrsMatrix::fillComplete

Top guesses for where to look first (based on key differences in implementations):

packAndPrepare and/or unpackAndCombine -- both are skipped in the Sierra/SD implementation
- Is there some inefficiency in CrsMatrix's implementation of these functions?
- Does the DistObject transfer code handle the pack/unpack inefficiently given what we know about how CrsMatrix will pack? (i.e. has Sierra/SD found an efficiency by specializing for CrsMatrix?)
Import constructor -- also skipped in favor of direct communication
- Does it do more work/use more memory than needed?

github-actions · 2023-09-10T12:23:29Z

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

tasmith4 added type: bug The primary issue is a bug in Trilinos code or tests pkg: Tpetra client: Sierra All issues that primarily impacts SNL Sierra codes labels Sep 9, 2022

github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Sep 10, 2023

csiefer2 added DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. and removed MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. labels Sep 12, 2023

jhux2 added this to Tpetra Aug 12, 2024

jhux2 moved this to In Progress in Tpetra Aug 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

tasmith4 commented Sep 9, 2022

tasmith4 commented Sep 9, 2022

github-actions bot commented Sep 10, 2023

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

Comments

tasmith4 commented Sep 9, 2022

Bug Report

Description

Steps to Reproduce

tasmith4 commented Sep 9, 2022

github-actions bot commented Sep 10, 2023