Feature/batched quda deflation #76

leonhostetler · 2024-12-22T14:41:15Z

This pull request implements QUDA deflation for ks_spectrum with support for multiple right-hand sides. Previously, MILC performed deflation on the CPU.

Key points:

1. To use all of the features, ks_spectrum must be compiled with WANT_QUDA, WANT_FN_CG_GPU, and WANT_EIG_GPU all set to true.
2. QUDA deflation is implemented for UML, CG, and CGZ, for both single and multiple right-hand sides, but it applies only to the even-parity solves.
3. Eigenvector files are loaded and saved directly by QUDA; MILC's corresponding functions are bypassed.
4. Using fresh_ks_eigen with ks_spectrum triggers QUDA's eigensolver internally; MILC's eigensolver functions are bypassed.
5. This functionality also depends on changes on the QUDA side. Until those are merged into QUDA develop, you can use the leonhostetler/milc_batched_deflation branch of https://github.com/leonhostetler/quda.git

More details:

Using ks_spectrum with fresh_ks_eigen now works, so there is no longer any need for a two-part process in which the eigenvectors are generated with QUDA's standalone eigensolver and then loaded by MILC's ks_spectrum for deflation. The ks_spectrum application can now handle both the eigensolve and the CG solves in a single run.

This is implemented for UML, CG, and CGZ; however, deflation occurs only for the even-parity solves. For UML, where the odd-parity solve merely polishes the odd solution reconstructed from the even solution, this works well, since the odd solve typically requires far fewer iterations. For CG and CGZ, however, only the even half of the problem is sped up by deflation. If odd-parity deflation is needed, we will have to think about how best to implement it.

Note that eigenvector files are loaded and saved from within QUDA, not MILC. This was both the simplest way to interface with the QUDA solver and the way that minimized memory usage. If MILC loaded the eigenvectors and then passed them to QUDA, host memory usage would double, which is not feasible given the size of the eigenvectors. Instead, MILC and QUDA exchange only filenames. If, e.g., one wants to use non-QUDA eigenvectors with the QUDA deflated solver, one would need a separate utility to convert the file to a QUDA-readable format and save it to disk before running ks_spectrum with the QUDA solver.
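To put "the size of the eigenvectors" in numbers, here is a back-of-the-envelope sketch. It assumes, hypothetically, that each eigenvector is a staggered color vector stored on even sites only (3 complex numbers per site) and uses an illustrative 64^3 x 96 lattice with 512 eigenvectors; neither the lattice size nor the function names come from this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes needed for n_vec even-parity staggered eigenvectors.
 * Assumption (illustrative): 3 complex numbers per even site,
 * real_bytes = 4 for single precision or 8 for double precision. */
uint64_t eigvec_bytes(uint64_t nx, uint64_t ny, uint64_t nz, uint64_t nt,
                      uint64_t n_vec, uint64_t real_bytes)
{
    assert((nx * ny * nz * nt) % 2 == 0);  /* even/odd split needs an even volume */
    uint64_t even_sites = nx * ny * nz * nt / 2;
    uint64_t per_vector = even_sites * 3 /* colors */ * 2 /* re, im */ * real_bytes;
    return per_vector * n_vec;
}

double to_gib(uint64_t bytes)
{
    return (double)bytes / (1024.0 * 1024.0 * 1024.0);
}

void print_footprint(void)
{
    uint64_t single = eigvec_bytes(64, 64, 64, 96, 512, 4);
    uint64_t dbl = eigvec_bytes(64, 64, 64, 96, 512, 8);
    printf("single: %.0f GiB, double: %.0f GiB\n", to_gib(single), to_gib(dbl));
    /* A second host copy (MILC's, in addition to QUDA's) doubles the footprint. */
    printf("two single-precision copies: %.0f GiB\n", to_gib(2 * single));
}
```

Under these assumptions, `print_footprint()` reports 144 GiB in single precision, 288 GiB in double precision, and 288 GiB for two single-precision host copies, which illustrates why keeping a single copy inside QUDA matters.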

QUDA deflation should work fine for varying masses. If different propagators use different quark masses, the eigenvectors remain the same, but the eigenvalues must be updated, because they depend on the quark mass through an additive $4m^2$ term. This is handled automatically: the eigenvalues are preserved as long as the quark mass is unchanged and are recalculated when it changes.
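The arithmetic behind the mass dependence is simple: if each eigenvalue carries an additive $4m^2$, then moving from mass $m$ to $m'$ gives $\lambda_i(m') = \lambda_i(m) + 4(m'^2 - m^2)$ with unchanged eigenvectors. The sketch below applies that shift analytically and only when the mass actually changes. It is a hypothetical illustration (the `eval_cache` struct and function are not MILC or QUDA API), and QUDA may instead recompute the eigenvalues from the eigenvectors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache of eigenvalues valid at a single quark mass. */
typedef struct {
    double *evals;  /* eigenvalues of the mass-dependent operator */
    size_t n;       /* number of eigenpairs */
    double mass;    /* mass the cached eigenvalues correspond to */
} eval_cache;

/* Shift cached eigenvalues to new_mass using
 * lambda(m') = lambda(m) + 4 (m'^2 - m^2).
 * Returns 1 if the eigenvalues were updated, 0 if the mass was unchanged. */
int eval_cache_set_mass(eval_cache *c, double new_mass)
{
    assert(c != NULL && (c->n == 0 || c->evals != NULL));
    if (new_mass == c->mass)
        return 0;  /* same mass: keep the eigenvalues as they are */
    double shift = 4.0 * (new_mass * new_mass - c->mass * c->mass);
    for (size_t i = 0; i < c->n; i++)
        c->evals[i] += shift;
    c->mass = new_mass;
    return 1;
}
```

For example, going from $m = 0.01$ to $m' = 0.02$ raises every eigenvalue by $4(0.0004 - 0.0001) = 0.0012$.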

In a real-world application that computes many correlators, the job is typically split into readin sets. The gauge field is loaded for the first readin set, and "continue" is used for subsequent ones. Because QUDA can preserve the deflation space, the eigenvectors are handled similarly: for the first readin set, they are either read in or generated. Subsequent readin sets must still include the parameters for reloading or generating the eigenvectors, but these are ignored because QUDA simply continues with the initial set. Thus, with multiple readin sets, one does not have to worry about unnecessary time spent reloading the eigenvectors and recomputing the eigenvalues. It also means that one cannot change eigenvector sets during a run. This behavior could be changed via qep.preserve_deflation_space if desired, but I don't think that's necessary: to switch to different eigenvectors, one might as well do a separate run.
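The readin-set behavior amounts to a guard that acts only on the first set. The following is a minimal sketch, assuming a hypothetical static flag as a stand-in for the deflation space that QUDA preserves (controlled by qep.preserve_deflation_space in the text); the enum and function names are illustrative, not MILC's actual code.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative stand-ins for the fresh_ks_eigen / reload_*_ks_eigen options. */
typedef enum { EIGEN_FRESH, EIGEN_RELOAD } eigen_start;

/* Models a deflation space that persists across readin sets. */
static int deflation_space_ready = 0;

/* Returns 1 if eigenvectors were generated or loaded for this readin set,
 * 0 if the eigenvector parameters were parsed but ignored. */
int prepare_deflation_space(eigen_start start, const char *filename)
{
    if (deflation_space_ready)
        return 0;  /* later readin sets: keep the first set of eigenvectors */
    if (start == EIGEN_FRESH) {
        printf("running eigensolver\n");
    } else {
        assert(filename != NULL);
        printf("loading eigenvectors from %s\n", filename);
    }
    deflation_space_ready = 1;
    return 1;
}
```

Only the first call does any work; later calls return immediately regardless of what the input file requests, which is also why eigenvector sets cannot be switched mid-run.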

One can adjust the eigensolver precision in the input parameter file. Single precision is typically fine. With such "sloppy" eigenvectors, however, it is important to repeat the deflation periodically during the CG solve; this is controlled by the tol_restart parameter.
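One way to picture tol_restart is to assume that the solver re-deflates each time the residual norm has dropped by another factor of tol_restart since the last deflation. Under that assumption (a simplified model for illustration, not a statement of QUDA's exact restart criterion), the hypothetical helper below counts how many deflations a solve from relative residual 1 down to a target tolerance would perform.

```c
#include <assert.h>

/* Count deflations (including the initial one) for a solve from relative
 * residual 1 to target_tol, assuming a re-deflation whenever the residual
 * falls below tol_restart times its value at the previous deflation. */
int count_deflations(double tol_restart, double target_tol)
{
    assert(tol_restart > 0.0 && tol_restart < 1.0);
    assert(target_tol > 0.0 && target_tol < 1.0);
    int deflations = 1;       /* deflate the initial residual */
    double checkpoint = 1.0;  /* residual at the most recent deflation */
    while (checkpoint * tol_restart > target_tol) {
        checkpoint *= tol_restart;  /* next restart point reached before convergence */
        deflations++;
    }
    return deflations;
}
```

With tol_restart 1e-2 and a target of 1e-9, this model gives five deflations (the initial one plus four re-deflations); a larger tol_restart such as 1e-1 re-deflates more often.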

When eigenvectors are saved to disk, they are saved in single precision. This could be modified easily, but there should be no need to save them in double precision since single precision eigenvectors are fine provided that tol_restart is reasonable.

Note that QUDA's block TRLM does not seem to be working well yet, so leave block_size at 1.

In general, when using QUDA deflation, the ks_spectrum application will need an input file with parameters like:

```
max_number_of_eigenpairs 512    # How many eigenvectors to use for deflation
tol_restart 1e-2                # How often to do the redeflation
```

When loading eigenvectors from file, use a parameter block like:

```
reload_parallel_ks_eigen [filename]    # This works with both single file and partfile formats
file_number_of_eigenpairs 512          # In case the file has more eigenvectors than will be used to deflate
forget_ks_eigen                        # Don't save the eigenvectors to file
```

Alternatively, when generating fresh eigenvectors, use a parameter block like:

```
fresh_ks_eigen                       # Run QUDA's eigensolver
save_partfile_ks_eigen [filename]    # Use save_parallel_ks_eigen for single file format or forget_ks_eigen to discard
Max_Lanczos_restart_iters 1000       # Max number of Lanczos restart iterations
eigenval_tolerance 1e-12             # Eigenvalue tolerance
Lanczos_max 1024                     # Size of Krylov space, corresponds to QUDA's n_kr
Lanczos_restart 1000                 # Deprecated, does nothing as far as I can tell
eigensolver_prec 1                   # Precision in eigensolver, double=2, single=1, half=0
batched_rotate 20                    # Size of batch_rotate
Chebyshev_alpha 0.1                  # Must be larger than 4*m^2 for largest quark mass that will be deflated
Chebyshev_beta 0                     # Leave at 0 for QUDA to estimate internally
Chebyshev_order 100                  # Chebyshev order
block_size 1                         # block_size>1 implies block TRLM (doesn't work well yet?)
```
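Several comments in the parameter blocks above imply constraints that are cheap to check before a long run. The sketch below is a hypothetical pre-flight check (not part of MILC; the struct simply mirrors a few input parameters) covering: Chebyshev_alpha must exceed $4m^2$ for the largest deflated mass, eigensolver_prec must be 0, 1, or 2, and block_size should stay at 1. It also flags a file with fewer eigenpairs than max_number_of_eigenpairs, which is an assumption about intended use rather than a documented rule.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mirror of a few deflation-related input parameters. */
typedef struct {
    int max_number_of_eigenpairs;
    int file_number_of_eigenpairs;  /* 0 when generating fresh eigenvectors */
    double chebyshev_alpha;
    int eigensolver_prec;           /* double=2, single=1, half=0 */
    int block_size;
} deflation_params;

/* Returns the number of problems found; max_mass is the largest quark mass
 * that will be deflated. Diagnostics are written to stderr. */
int check_deflation_params(const deflation_params *p, double max_mass)
{
    assert(p != NULL);
    int problems = 0;
    double four_m2 = 4.0 * max_mass * max_mass;
    if (p->chebyshev_alpha <= four_m2) {
        fprintf(stderr, "Chebyshev_alpha %g must exceed 4*m^2 = %g\n",
                p->chebyshev_alpha, four_m2);
        problems++;
    }
    if (p->eigensolver_prec < 0 || p->eigensolver_prec > 2) {
        fprintf(stderr, "eigensolver_prec must be 0 (half), 1 (single), or 2 (double)\n");
        problems++;
    }
    if (p->block_size != 1) {
        fprintf(stderr, "block_size should be 1 (block TRLM does not work well yet)\n");
        problems++;
    }
    if (p->file_number_of_eigenpairs > 0 &&
        p->file_number_of_eigenpairs < p->max_number_of_eigenpairs) {
        fprintf(stderr, "file_number_of_eigenpairs < max_number_of_eigenpairs\n");
        problems++;
    }
    return problems;
}
```

With the example values above (alpha 0.1, single precision, block_size 1, 512 eigenpairs in the file) and a largest mass of 0.1, where $4m^2 = 0.04$, the check reports no problems.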

Also, don't forget to set

```
deflate yes/no
```

in the propagator stanzas.

stevengottlieb · 2024-12-22T15:35:24Z

This looks great, Leon. Your explanation of the details is super. Thanks, Steve


leonhostetler added 6 commits December 14, 2024 15:37

- Added QUDA deflation for UML, CG, and CGZ for single right-hand side (10c485f)
- Fixed fresh and save options for QUDA eigenvectors (edcfd92)
- Updated some interfacing with quda (2535fbf)
- Fixed deflate savebuf was overwriting mass savebuf (eb301ba)
- QUDA batched deflation for multiple right hand sides (7a6d501)
- Added to input parameters (33e58df)

leonhostetler mentioned this pull request Dec 22, 2024 in "MILC batched deflation" lattice/quda#1529 (Open)
