Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

Open
tasmith4 opened this issue Sep 9, 2022 · 2 comments
Open

Tpetra: Unexplained memory usage in CrsMatrix::doImport #11004

tasmith4 opened this issue Sep 9, 2022 · 2 comments
Labels
client: Sierra All issues that primarily impacts SNL Sierra codes DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@tasmith4
Copy link
Contributor

tasmith4 commented Sep 9, 2022

Bug Report

@trilinos/tpetra @crdohrm

Description

Tpetra's CrsMatrix::doImport seems to use more memory than necessary. First noticed by the Sierra/SD team in #6648 and #6663. Initial (partial) fix in #6831. After that, further investigation shows doImport still has roughly 100% higher memory usage than Sierra/SD's hand-rolled importer. A comment in Sierra's MATHLIB/gdsw2/TpetraUtil.C suggests the memory growth is believed to come from Tpetra::StaticProfile, but it's unclear whether that's still correct after #6831 merged.

Data collected on 9/2/22 by @tasmith4 from a Sierra/SD test (instructions to reproduce included below; results in Mbytes -- Min memory / Avg memory / Max memory / Max high-water-mark memory):

  • Using Sierra/SD's importSquareMatrix
    • Before: 513/536/548/623
    • After: 603/615/629/685
  • Using Tpetra's CrsMatrix::doImport
    • Before: 514/553/593/667
    • After: 687/712/745/831

Steps to Reproduce

  • Prerequisite: obtain Sierra code access
  • Clone the Sierra code and test repositories
  • Edit MATHLIB/gdsw2/Preconditioner.C, inside Preconditioner::factorOverlapMatrix:
    • Before and after m_TUtil.getTpetraMatrix(m_preconditionerMatrix, overlapMap, overlapMatrix); add a call to m_memTime->OutputMemoryAndTiming(/* some text here */);
  • Assign the reproducing Sierra test: assign -p Salinas_rtest/acceptance/thermalExpansion/perf_layered_frusta_eng.test
  • Build: bake -e debug --targets-from-tests=assigned.tests
  • Run the test: testrun -e debug --save-all-results
  • Go to the test directory reported by testrun, open dd_memory.dat, and find the output text you specified in the OutputMemoryAndTiming call
  • To switch between the hand-rolled variant and built-in Tpetra variant, find the #define FIX_TRILINOS_GDSW line in MATHLIB/gdsw2/TpetraUtil.C, edit as appropriate, rebuild, and rerun the test
@tasmith4 tasmith4 added type: bug The primary issue is a bug in Trilinos code or tests pkg: Tpetra client: Sierra All issues that primarily impacts SNL Sierra codes labels Sep 9, 2022
@tasmith4
Copy link
Contributor Author

tasmith4 commented Sep 9, 2022

Notes after some review of this problem:

  • Implementation of importSquareMatrix is in MATHLIB/gdsw2/TpetraFunctionsGDSW.[h|C]
  • importSquareMatrix has 3 main pieces:
    • constructDistributor
      • hard to draw exact parallels, but seems to include some of the LID-level plan computation that would happen in the Tpetra::Import constructor
    • communicateRowMap
      • seems to include more of the LID-level plan computation
    • communicateMatrixData
      • packs data directly out of the matrix (i.e. rakes out values directly, doesn't use packAndPrepare)
      • sends the data with the Distributor constructed above
      • makes a brand-new CrsMatrix and inserts the values it received with CrsMatrix::insertLocalValues
      • calls CrsMatrix::fillComplete

Top guesses for where to look first (based on key differences in implementations):

  • packAndPrepare and/or unpackAndCombine -- both are skipped in the Sierra/SD implementation
    • Is there some inefficiency in CrsMatrix's implementation of these functions?
    • Does the DistObject transfer code handle the pack/unpack inefficiently given what we know about how CrsMatrix will pack? (i.e. has Sierra/SD found an efficiency by specializing for CrsMatrix?)
  • Import constructor -- also skipped in favor of direct communication
    • Does it do more work/use more memory than needed?

@github-actions
Copy link

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

@github-actions github-actions bot added the MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. label Sep 10, 2023
@csiefer2 csiefer2 added DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. and removed MARKED_FOR_CLOSURE Issue or PR is marked for auto-closure by the GitHub Actions bot. labels Sep 12, 2023
@jhux2 jhux2 added this to Tpetra Aug 12, 2024
@jhux2 jhux2 moved this to In Progress in Tpetra Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client: Sierra All issues that primarily impacts SNL Sierra codes DO_NOT_AUTOCLOSE This issue should be exempt from auto-closing by the GitHub Actions bot. pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests
Projects
Status: In Progress
Development

No branches or pull requests

2 participants