Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bs mmerge #7

Open
wants to merge 1,175 commits into
base: BSM
Choose a base branch
from
Open

Bs mmerge #7

wants to merge 1,175 commits into from

Conversation

pittlerf
Copy link

Hi,
I have put together the input files, and corrected an error in the smearing routine for the scalar field correlation functions. The openmpi version is not included yet.

Cheers

Feri

@kostrzewa
Copy link
Owner

@pittlerf @urbach @Marcogarofalo
I will only be able to start reviewing this next week (Feb 20th), but there are a number of other things which I need to treat with higher priority right now.

Copy link
Owner

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine to go in once the requested changes have been made. I would prefer to see the contractions modularised because 4000 lines is a heck of a long source file...

@@ -276,7 +276,7 @@ _sse_store(r);
(r).c1 -= I * (s).c1; \
(r).c2 -= I * (s).c2;

#define complex_times_vector(r,c,s) \
#define _complex_times_vector(r,c,s) \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you remove the redefinition of this in line 688?

read_input.l Outdated
mu03_BSM = c;
if(myverbose != 0) printf(" BSM parameter mu03 set to %f\n", mu03_BSM);
}
{SPC}*propagatorsonthefly{EQL}{DIGIT}+ {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually follow the convention of yes and no for boolean inputs

@@ -40,7 +40,7 @@ libsolver_TARGETS = bicgstab_complex gmres incr_eigcg eigcg restart_X ortho \
generate_dfl_subspace dfl_projector \
cg_mms_tm cg_mms_tm_nd solver_field sumr mixed_cg_her index_jd \
dirac_operator_eigenvectors spectral_proj \
jdher_su3vect cg_her_su3vect eigenvalues_Jacobi
jdher_su3vect cg_her_su3vect eigenvalues_Jacobi eigenvalues_krylov
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot build the code right now because eigenvalues_krylov.c is missing

@pittlerf
Copy link
Author

Hi,

I have set up the yes / no input parameters and I created a new library for the contractions and put the necessary functions there. In this way the main executable is only 627 line.

Cheers

Feri

Copy link
Owner

@kostrzewa kostrzewa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, this is looking excellent now. Would you like to claim ownership of your additions?

A remaining issue, I think, is that there is no Makefile.in for the new subdirectory, would you agree?

About the initialisations, I would only worry if these are a significant overhead, if they are not, we are ready to pull this in.

@@ -0,0 +1,32 @@
/***********************************************************************
*
* Copyright (C) 2015 Mario Schroeck
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be "Ferenc Pittler" :)

@@ -0,0 +1,2381 @@
/***********************************************************************
*
* Copyright (C) 2009 Carsten Urbach
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearly, you are the author :)

@@ -380,10 +380,12 @@ if( strcmp(scalar_input_filename, "create_random_scalarfield") == 0 ) {
printf("\n# square norm of the source: ||w||^2 = %e\n\n", squarenorm);
fflush(stdout);
}

//initialize BSM2f operator
init_D_psi_BSM2f();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of interest, what is the overhead of initialising and freeing this one per gauge configuration?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's a tiny fraction of total time, then this can remain, of course.

@pittlerf
Copy link
Author

Hi,

Thanks for noting me, I claimed the authorship and added a Makefile.in for the contractions directory.

Cheers

Feri

@kostrzewa
Copy link
Owner

I think there might be a small issue when the local lattice volume is just two lattice sites:

$ mpirun -np 16 ./test_DslashBSM2
parameter rho_BSM set to 1.000000
parameter eta_BSM set to 1.000000
parameter  m0_BSM set to 0.000000
# Creating the following cartesian grid for a 4 dimensional parallelisation:
# 2 x 2 x 2 x 2
# The code was compiled with -D_GAUGE_COPY
# The code was compiled for non-blocking MPI calls (spinor and gauge)

# The number of processes is 16 
# The lattice size is 4 x 4 x 4 x 4
# The local lattice size is 2 x 2 x 2 x 2
# Initialising rectangular gauge action stuff
# The lattice is correctly mapped by the index arrays

# The computed plaquette value is 1.218448e-01.
Testing inversion

# square norm of the source: ||w||^2 = 1.228800e+04

cg_her_bi: Inversion took 0.162697 seconds for 42 iterations
Operator inversion time: t_F = 0.162984 sec 

time 0.0200288296
Operator application time: t_F = 0.020029 sec 

# || D_F w ||^2 = 8.4426846639613839e+04
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
Check consistency of D and D^dagger
< D_F w, v >        = 1.3171860303588308e+04 + I*(-5.7667488605880299e+03)
Operator dagger application time: t_F = 0.001365 sec 	 
< w, D_F^dagger v > = 1.3171860303588310e+04 + I*(-5.7667488605880308e+03)

| < D_F w, v > - < w, D_F^dagger v > | = 2.0336919783401661e-12

@kostrzewa
Copy link
Owner

@pittlerf were you able to understand why particular parallelisations seem to produce errors at the 1e-12 level?

@pittlerf
Copy link
Author

I will check it now.

@kostrzewa
Copy link
Owner

Hi Feri, you keep adding changes to this without seemingly addressing any of the points in this review. Should I just close this then and we forget about it?

@kostrzewa
Copy link
Owner

In particular, there was a version of this which was okay to pull in, then you added lots of new stuff and now I have no idea what is what...

@@ -469,7 +469,7 @@ int main(int argc, char *argv[]){
}//end of loop for spinor and color source degrees of freedom
}
if (g_cart_id == 0){
snprintf(contractions_fname,200,"bsmcontractions.%.4d.%d.%.8d",nstore, src_idx, iscalar);
snprintf(contractions_fname,200,"bsmcontractions.%.4d.%d.%.8d",nstore, isample, iscalar);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isample is only for stochastic sources

@pittlerf
Copy link
Author

Hi,
Of course I can remove invert_save and using the value of propagatorsonthefly_BSM in op_invert. For the other thing I saw that it is within a loop index-ed by isample, that was my only reason to include it. Besides, the whole code is now written for source index zero only. Actually I recently changed the format of the output (was asked by Petros).
Feri

@pittlerf
Copy link
Author

Hi, I checked the contractions code using parallelization with local lattice dimension 222*2. I got the same results. I divided actually a 4**4 lattice in each direction to two MPI threads, and checked the output file for the contraction routine. I use the propagator on the fly option, so the parallelization of the operator was also tested.

@kostrzewa
Copy link
Owner

Hi, I checked the contractions code using parallelization with local lattice dimension 222*2. I got the same results. I divided actually a 4**4 lattice in each direction to two MPI threads, and checked the output file for the contraction routine. I use the propagator on the fly option, so the parallelization of the operator was also tested.

Yes, but the problem is not at the contraction level and there it very likely depends on the parameters whether an issue will or will not show up. However, at the operator level (in test_Dslash), it seems that there is something going wrong and it would be good to understand why that is. It may well be that 1e-12 is to be expected, but then: can we trust residuals coming from an operator that has differences at the 1e-12 level?

@kostrzewa
Copy link
Owner

@pittlerf

Of course I can remove invert_save and using the value of propagatorsonthefly_BSM in op_invert. For the other thing I saw that it is within a loop index-ed by isample, that was my only reason to include it. Besides, the whole code is now written for source index zero only. Actually I recently changed the format of the output (was asked by Petros).

It seems to me that what is necessary is simply a change in the way the BSM inversion is handled in op_invert() and at the BSM operator level. The operator struct has pointers for four propagators, which is clearly insufficient for a full two-flavour inversion (and even more insufficient when also the inverse of the conjugate operator is required). As a result, extending the operator struct is probably the way to go. One could, for instance, add a pointer to an array of spinor fields. For the two-flavour operators, this could then be set to point to some allocated memory for four sets of four e/o propagators (16 volume/2 spinor fields) to store the down-source M and M^dagger propagators as well ass the up-source M and M^dagger propagators. After the call to inverter(), these can be cleanly extracted in the contraction code, without adding further auxiliary fields or additional functions which replicate lots of code.

@pittlerf
Copy link
Author

Hi,
I have printed out the whole vector using different parallelization and I found aggreement upto 20 digits (I plot exactly that many digits). I do not actually now how MPI_ALLREDUCE works, but that can be the only difference. I tested, when I use the kahan summation the error will be twice as large when I use 2 MPI process. But even this linear scaling fails for large number of processes, for example when I use 16 the error will only 4 times larger.
Feri

kostrzewa pushed a commit that referenced this pull request Apr 27, 2017
check qphix input parameters and set some defaults in the interface
kostrzewa and others added 12 commits October 10, 2017 17:42
…mergetest

conflict as a result of the data type change for the clover field
…r selection logic when QPhiX is not used for a monomial
…oken, as are the qphix base classes and the wrapper functions for the full-spinor operators. This is due to github issue etmc#404 which should be fixed if we ever need the tests again (this should be done at some point, especially if we want to get the QPhiX interface tested under Travis)
Fix GH#400, DDalphaAMG interface should use correct TM_USE_OMP prepro…
pittlerf and others added 28 commits December 7, 2020 14:02
…d make use of ternary operators in computing signs, removing static declaration from static const int sign
add some FIXMEs from our discussion on the 4th of March, 2021
…ng the non-local interaction for the scalar field to F(x+-mu)-F(x)
…so in the rest we have to take into account 3.5r0
…e of csw we actually add 1+0.5*csw*clover instead of 0.5*(1+csw*clover), after the modification code still passes Marco's test
pittlerf pushed a commit to pittlerf/tmLQCD that referenced this pull request Sep 10, 2021
synch up with quda_work and quda_work_add_actions branches
if ( (vectorcurrentcurrent_BSM == 1 ) || ( axialcurrentcurrent_BSM == 1 )){
for(src_idx = 0; src_idx < 12; src_idx++ )
{
snprintf(prop_fname,200,"bsm2prop.%.4d.%.2d.%02d.%.8d.inverted",nstore, T_global-1, src_idx, iscalar);
Copy link
Owner

@kostrzewa kostrzewa Oct 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pittlerf What does T_global-1 encode here in the isample position?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, yeah, sorry yes, I just thought tat in the case of point sources that we use (just one at (0,0,0)), this might be the place, but of coarse, it should have a standalone place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants