Feature/batched quda deflation #76

Open: wants to merge 6 commits into develop

leonhostetler (Collaborator)

This pull request implements QUDA deflation for ks_spectrum, with support for multiple right-hand sides. Previously, deflation in MILC was done on the CPU.

Key points:

  1. To use all of the features, ks_spectrum must be compiled with WANT_QUDA, WANT_FN_CG_GPU, and WANT_EIG_GPU all set to true
  2. QUDA deflation is implemented for UML, CG, and CGZ, for both single and multiple right-hand sides, but it applies only to the even-parity solves
  3. Eigenvector files are loaded and saved directly by QUDA; MILC's corresponding functions are bypassed
  4. Using fresh_ks_eigen with ks_spectrum triggers QUDA's eigensolver internally; MILC's eigensolver functions are bypassed
  5. This functionality also depends on changes on the QUDA side. Until those are merged into QUDA develop, you can use the leonhostetler/milc_batched_deflation branch of https://github.com/leonhostetler/quda.git
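Multiple right-hand-side support means several sources can share a single solver call. The following is a minimal sketch of how sources might be grouped, assuming they are processed in order in batches of at most `nrhs`. `count_batches`, `batch_size`, and `print_batches` are hypothetical helpers, not MILC or QUDA functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of batched solver calls needed for
   n_src right-hand sides when at most nrhs are solved per call. */
int count_batches(int n_src, int nrhs)
{
    assert(n_src >= 0 && nrhs > 0);
    return (n_src + nrhs - 1) / nrhs; /* ceiling division */
}

/* Hypothetical helper: size of batch b (0-based); the last batch
   may be smaller than nrhs. */
int batch_size(int n_src, int nrhs, int b)
{
    assert(nrhs > 0 && b >= 0 && b < count_batches(n_src, nrhs));
    int remaining = n_src - b * nrhs;
    return remaining < nrhs ? remaining : nrhs;
}

/* Print which source indices would share one batched solve. */
void print_batches(int n_src, int nrhs)
{
    int nb = count_batches(n_src, nrhs);
    for (int b = 0; b < nb; b++) {
        int start = b * nrhs;
        printf("batch %d: sources %d..%d\n", b, start,
               start + batch_size(n_src, nrhs, b) - 1);
    }
}
```

For example, 10 sources with `nrhs = 4` give three solver calls that handle 4, 4, and 2 sources.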

More details:

Running ks_spectrum with fresh_ks_eigen now works. There is therefore no longer any need for a two-part process in which the eigenvectors are generated with QUDA's standalone eigensolver and then loaded by MILC's ks_spectrum for the deflated solves. The ks_spectrum application can now perform both the eigensolve and the CG solves in the same run.

This is implemented for UML, CG, and CGZ; however, deflation occurs only for the even-parity solves. For UML, where the odd-parity solve merely polishes the odd solution reconstructed from the even solution, this works well because the odd solve typically requires far fewer iterations. For CG and CGZ, however, only the even half of the problem is sped up by deflation. If odd-parity deflation is needed, we will have to think about how best to implement it in the future.
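The parity rule can be summarized as a predicate. This is a sketch, not MILC code. The enum and function names are hypothetical, and `deflate_requested` stands in for a propagator's `deflate yes/no` setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Solver choices named in this PR. */
typedef enum { SOLVER_UML, SOLVER_CG, SOLVER_CGZ } solver_t;
typedef enum { PARITY_EVEN, PARITY_ODD } parity_t;

/* QUDA deflation is applied only to even-parity solves,
   and only if the propagator requested it. */
bool use_deflation(solver_t solver, parity_t parity, bool deflate_requested)
{
    assert(solver == SOLVER_UML || solver == SOLVER_CG || solver == SOLVER_CGZ);
    return deflate_requested && parity == PARITY_EVEN;
}

/* For UML the odd solve only polishes a reconstructed solution, so the
   undeflated half is cheap; for CG and CGZ the odd half is a full solve. */
bool odd_solve_is_polish(solver_t solver)
{
    return solver == SOLVER_UML;
}
```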

Note that eigenvector files are loaded and saved from within QUDA, not MILC. This was both the simplest way to interface with the QUDA solver and the way that keeps memory usage to a minimum. If MILC loaded the eigenvectors and then passed them to QUDA, host memory usage would double, which is not feasible given the size of the eigenvectors. Instead, MILC passes only the filenames to QUDA. If, e.g., one wants to use non-QUDA eigenvectors with the QUDA deflated solver, one would need a separate utility to convert the file to a QUDA-readable format and save it to disk before running ks_spectrum with the QUDA solver.
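To give a sense of scale, the size of an eigenvector set can be estimated directly. This sketch assumes each eigenvector is an even-parity staggered color vector (3 complex components per even site) and ignores file headers and padding. `eigvec_bytes` and `report` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes for n_vec even-parity eigenvectors on an nx*ny*nz*nt lattice:
   V/2 sites x 3 colors x 2 reals (re, im) x bytes_per_real. */
uint64_t eigvec_bytes(uint64_t nx, uint64_t ny, uint64_t nz, uint64_t nt,
                      uint64_t n_vec, uint64_t bytes_per_real)
{
    uint64_t volume = nx * ny * nz * nt;
    assert(volume % 2 == 0);
    return (volume / 2) * 3 * 2 * bytes_per_real * n_vec;
}

/* Illustrative example: 32^3 x 64 lattice, 512 eigenvectors. */
void report(void)
{
    uint64_t single = eigvec_bytes(32, 32, 32, 64, 512, 4);
    uint64_t dbl    = eigvec_bytes(32, 32, 32, 64, 512, 8);
    printf("single: %.1f GiB, double: %.1f GiB\n",
           single / 1073741824.0, dbl / 1073741824.0);
}
```

On a 32^3 x 64 lattice, one single-precision vector is 24 MiB, so 512 vectors occupy 12 GiB (24 GiB in double precision). A second host copy would double these figures.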

QUDA deflation should work for varying quark masses. If different propagators use different quark masses, the eigenvectors remain the same, but the eigenvalues must be updated because they depend on the quark mass through a shift of $+4m^2$. This is handled automatically: the eigenvalues are kept as long as the quark mass does not change, and they are recalculated when it does.
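If the eigenvalues take the form $\lambda_i + 4m^2$ with mass-independent $\lambda_i$, then changing the mass from $m$ to $m'$ shifts every eigenvalue by $4(m'^2 - m^2)$, while the eigenvectors stay fixed. The following is a minimal sketch of that bookkeeping. It assumes the update is applied as a shift; the PR states only that the eigenvalues are recalculated, not how. The struct and function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cached eigenvalues, valid for the quark mass stored in `mass`. */
typedef struct {
    double *eval;
    size_t  n;
    double  mass;
} eig_cache_t;

/* If the quark mass changes, shift every eigenvalue by
   4*(m_new^2 - m_old^2); eigenvectors are not touched.
   Returns 1 if the eigenvalues were updated, 0 if they were kept. */
int update_eigenvalues(eig_cache_t *c, double m_new)
{
    assert(c != NULL);
    if (m_new == c->mass)
        return 0; /* same mass: keep the eigenvalues as they are */
    double shift = 4.0 * (m_new * m_new - c->mass * c->mass);
    for (size_t i = 0; i < c->n; i++)
        c->eval[i] += shift;
    c->mass = m_new;
    return 1;
}
```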

In a real-world application that computes many correlators, the job is typically split into readin sets. The gauge field is loaded for the first readin set, and "continue" is used for subsequent readin sets. Because QUDA can preserve the deflation space, the eigenvectors are handled in a similar manner. For the first readin set, the eigenvectors are either read in or generated. For subsequent readin sets, one must still include the parameters for reloading or generating the eigenvectors; however, they are ignored because QUDA simply continues with the initial set of eigenvectors. Thus, with multiple readin sets, one does not have to worry that unnecessary time is spent reloading the eigenvectors and recomputing the eigenvalues. This also means that one cannot change eigenvector sets during a run. This behavior could be modified by changing qep.preserve_deflation_space if desired, but I don't think it's necessary: if one wants to switch to different eigenvectors, one might as well do a separate run.
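The readin-set behavior amounts to a flag that is checked before the eigenvector parameters are acted on. This is a sketch under stated assumptions: `defl_state_t` and `prepare_deflation` are hypothetical, and the `preserve` field only imitates the role of QUDA's `preserve_deflation_space`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state imitating a preserved deflation space. */
typedef struct {
    bool loaded;         /* a deflation space is already resident */
    bool preserve;       /* analogue of qep.preserve_deflation_space */
    int  times_prepared; /* how often vectors were read or computed */
} defl_state_t;

/* Called once per readin set. The eigenvector parameters are always
   supplied, but they take effect only when no space is resident
   (or when preservation is switched off). */
void prepare_deflation(defl_state_t *s, const char *eig_source)
{
    assert(s != NULL && eig_source != NULL);
    if (s->loaded && s->preserve) {
        printf("reusing resident eigenvectors; ignoring '%s'\n", eig_source);
        return;
    }
    printf("preparing eigenvectors from '%s'\n", eig_source);
    s->times_prepared++;
    s->loaded = true;
}
```

With preservation on, three readin sets prepare the eigenvectors once; with it off, every readin set would reload or regenerate them.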

The eigensolver precision can be set in the input parameter file via eigensolver_prec. Single precision should typically be fine. With such "sloppy" eigenvectors, however, it is important that the deflation is repeated periodically during the CG solve. This is controlled by the tol_restart parameter.

When eigenvectors are saved to disk, they are saved in single precision. This could easily be changed, but there should be no need to save them in double precision, since single-precision eigenvectors are fine provided that tol_restart is set to a reasonable value.

Note that QUDA's block TRLM does not seem to be working well yet, so leave block_size at 1.

In general, when using QUDA deflation, the ks_spectrum application will need an input file with parameters like:

max_number_of_eigenpairs 512 		# How many eigenvectors to use for deflation
tol_restart 1e-2 			# How often to do the redeflation

When loading eigenvectors from file, use a parameter block like:

reload_parallel_ks_eigen [filename]	# This works with both single file and partfile formats
file_number_of_eigenpairs 512		# In case the file has more eigenvectors than will be used to deflate
forget_ks_eigen 			# Don't save the eigenvectors to file
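The relationship between file_number_of_eigenpairs and max_number_of_eigenpairs in the reload block can be expressed as a simple check. This sketch assumes that a file holding fewer eigenpairs than requested should be treated as an error; the PR does not say how that case is handled, and `eigenpairs_to_use` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical check: the file may hold more eigenpairs than are used
   for deflation, but not fewer. Returns the number of eigenpairs to
   deflate with, or -1 on a configuration error. */
int eigenpairs_to_use(int max_number_of_eigenpairs,
                      int file_number_of_eigenpairs)
{
    assert(max_number_of_eigenpairs > 0);
    if (file_number_of_eigenpairs < max_number_of_eigenpairs) {
        fprintf(stderr, "file has %d eigenpairs, but %d were requested\n",
                file_number_of_eigenpairs, max_number_of_eigenpairs);
        return -1;
    }
    return max_number_of_eigenpairs;
}
```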

Alternatively, when generating fresh eigenvectors, use a parameter block like:

fresh_ks_eigen				# Run QUDA's eigensolver
save_partfile_ks_eigen [filename] 	# Use save_parallel_ks_eigen for single file format or forget_ks_eigen to discard
Max_Lanczos_restart_iters 1000		# Max number of Lanczos restart iterations
eigenval_tolerance 1e-12		# Eigenvalue tolerance
Lanczos_max 1024			# Size of Krylov space, corresponds to QUDA's n_kr
Lanczos_restart 1000			# Deprecated, does nothing as far as I can tell
eigensolver_prec 1			# Precision in eigensolver, double=2, single=1, half=0
batched_rotate 20			# Size of batch_rotate
Chebyshev_alpha 0.1			# Must be larger than 4*m^2 for largest quark mass that will be deflated
Chebyshev_beta 0			# Leave at 0 for QUDA to estimate internally
Chebyshev_order 100			# Chebyshev order
block_size 1				# block_size>1 implies block TRLM (doesn't work well yet?)
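Several comments in this block state constraints that could be checked before a run starts. Below is a hypothetical validator covering three of them. The struct mirrors only a subset of the input parameters and is not MILC's actual parameter structure.

```c
#include <assert.h>
#include <stdio.h>

/* Subset of the eigensolver parameters listed above. */
typedef struct {
    int    eigensolver_prec; /* 2 = double, 1 = single, 0 = half */
    double chebyshev_alpha;
    int    block_size;
} eig_params_t;

/* Hypothetical sanity check. Returns the number of problems found:
   - Chebyshev_alpha must exceed 4*m^2 for the largest deflated mass
   - eigensolver_prec must be 0, 1, or 2
   - block_size should be 1 (block TRLM is not yet reliable) */
int check_eig_params(const eig_params_t *p, double max_deflated_mass)
{
    assert(p != NULL);
    int problems = 0;
    double bound = 4.0 * max_deflated_mass * max_deflated_mass;
    if (!(p->chebyshev_alpha > bound)) {
        fprintf(stderr, "Chebyshev_alpha %g must exceed 4*m^2 = %g\n",
                p->chebyshev_alpha, bound);
        problems++;
    }
    if (p->eigensolver_prec < 0 || p->eigensolver_prec > 2) {
        fprintf(stderr, "eigensolver_prec must be 0, 1, or 2\n");
        problems++;
    }
    if (p->block_size != 1) {
        fprintf(stderr, "block_size should be 1 (block TRLM)\n");
        problems++;
    }
    return problems;
}
```

For example, with a largest deflated mass of 0.1, 4*m^2 = 0.04, so Chebyshev_alpha 0.1 passes.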

Also, don't forget to set

deflate yes/no

in the propagator stanzas.

stevengottlieb (Collaborator) commented Dec 22, 2024 via email
