Add kokkos and its simtbx/LS49 tests to Azure build #728

Merged
bkpoon merged 16 commits into master from kokkos_azure on Mar 30, 2022

phyy-nx
Contributor

@phyy-nx phyy-nx commented Jan 25, 2022

Co-authored-by: Felix Wittwer [email protected]
Co-authored-by: Billy K. Poon [email protected]
Co-authored-by: Nicholas Sauter [email protected]

@nksauter
Contributor

Actually the Kokkos build would only be expected to work with std c++ >= 11 and probably not with Python 2.7. Is it possible to readjust for that?

@bkpoon
Member

bkpoon commented Jan 26, 2022

The XFEL CI tests are only in Python 3 and have C++11 enabled.

But the flag is for configure.py, not bootstrap.py. The change should be

--config-flags="--enable_kokkos"

I just updated the commit and adjusted the formatting.

@bkpoon
Member

bkpoon commented Jan 26, 2022

On macOS, it looks like the same error that @mewall had. The gpu extension is not built since --enable_cuda is not provided, but something needs it?

On linux, the KOKKOS_CXXFLAGS has echo as the first item. It should not be there.

@bkpoon
Member

bkpoon commented Jan 26, 2022

Wait, kokkos is not enabled for macOS.

# only build kokkos on linux and if kokkos is enabled
if sys.platform.startswith('linux') and env_etc.enable_kokkos:
  env_simtbx.SConscript("kokkos/SConscript", exports={'env': env_simtbx})

@bkpoon
Member

bkpoon commented Jan 26, 2022

@JBlaschke, would #675 help with getting rid of echo?

@phyy-nx
Contributor Author

phyy-nx commented Jan 26, 2022

The echo is coming from here:

kokkos_cxxflags = subprocess.check_output(
  ['make', '-f', 'Makefile.kokkos', 'print-cxx-flags'],
  cwd=os.environ['KOKKOS_PATH'])

I looked into Makefile.kokkos and there are lots of calls to echo, wrapped in shell commands. Maybe those aren't working on Azure? That's why I tried to print os.environ['SHELL'], but it appears that's not set on Azure.
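
For reference, a minimal sketch of printing the shell without raising a KeyError when the variable is missing. The helper name `describe_shell` and the placeholder string are made up for illustration; only `os.environ.get` is standard.

```python
import os

def describe_shell(environ=os.environ):
    """Return the value of SHELL, or a placeholder when it is not set.

    Hypothetical helper: a plain os.environ['SHELL'] lookup raises
    KeyError when the variable is absent, whereas .get() does not.
    """
    return environ.get("SHELL", "<SHELL not set>")

if __name__ == "__main__":
    print("SHELL:", describe_shell())
```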

@phyy-nx
Contributor Author

phyy-nx commented Jan 26, 2022

Contents of kokkos_cxxflags right before it is used:

['echo', '"-std=c++14', '-march=core-avx2', '-mtune=core-avx2', '-fopenmp', '-I./', '-I/__w/1/modules/kokkos/core/src', '-I/__w/1/modules/kokkos/containers/src', '-I/__w/1/modules/kokkos/algorithms/src"']
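
One possible workaround, sketched under assumptions: it does not explain why make emits the echo command on Azure, it only tolerates it. The function name `clean_make_flags` is hypothetical and is not part of the SConscript. It assumes the make output looks like the list above, that is, an optional leading echo followed by one double-quoted string of flags.

```python
import shlex

def clean_make_flags(raw_output):
    """Split make output into compiler flags, tolerating a stray `echo`.

    Hypothetical helper, not the actual SConscript code. Accepts the
    bytes or str returned by subprocess.check_output.
    """
    if isinstance(raw_output, bytes):
        raw_output = raw_output.decode()
    flags = []
    for line in raw_output.splitlines():
        # shlex.split removes the surrounding double quotes, so the
        # quoted flag string arrives as a single token.
        tokens = shlex.split(line)
        if tokens and tokens[0] == "echo":
            tokens = tokens[1:]  # drop the echoed command name
        for token in tokens:
            # Break the formerly quoted string into individual flags.
            flags.extend(token.split())
    return flags

if __name__ == "__main__":
    sample = b'echo "-std=c++14 -fopenmp -I./"\n'
    print(clean_make_flags(sample))
```

If make prints both the echoed command and its real output, this sketch would return each flag twice, so a deduplication step may be needed in that case.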

@@ -100,6 +111,9 @@ steps:
chmod +x $(Pipeline.Workspace)/modules/xfel_regression/merging_test_data/merge_thermo.csh
export OMP_NUM_THREADS=4
libtbx.run_tests_parallel module=uc_metrics module=simtbx module=xfel_regression module=LS49 nproc=4
echo "DEBUG"
cat mp4k/rank_0*.err
echo "DEBUG2"
Member

@bkpoon bkpoon Jan 27, 2022

These debugging lines will cause this step to always pass since the last command will always run correctly.
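
A small sketch of the effect, assuming the step runs as a plain shell script without `set -e`. The script strings and the helper name `run_script` are illustrative stand-ins and are not taken from the pipeline: `false` plays the role of a failing test run and `echo DEBUG` the trailing debug output.

```python
import subprocess

def run_script(script):
    """Run a shell script with sh and return its exit status."""
    result = subprocess.run(["sh", "-c", script], stdout=subprocess.DEVNULL)
    return result.returncode

# A failing command followed by debug output: the script's status is
# that of the last command, so the failure is masked.
masked = run_script("false\necho DEBUG\n")

# Saving the status before the debug output and exiting with it
# afterwards keeps the failure visible.
preserved = run_script("false\nstatus=$?\necho DEBUG\nexit $status\n")

if __name__ == "__main__":
    print("masked:", masked, "preserved:", preserved)
```

If the debug lines are kept temporarily, saving the test command's status and exiting with it at the end is one way to keep failures visible.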

Member

We were going to set OMP_NUM_THREADS to 2 to get rid of the OpenMP warning and I'm testing a change that enables kokkos on macOS.

@bkpoon
Member

bkpoon commented Jan 27, 2022

Now there are 4 remaining test failures:

2022-01-27T18:59:52.3610355Z libtbx.python "/__w/1/modules/LS49/adse13_187/cyto_batch.py" N_total=1 test_pixel_congruency=True mosaic_method=double_random mosaic_spread_samples=50 write_output=False test_without_mpi=True log.outdir=mp1k nxmx_local_data=/global/cfs/cdirs/m3562/der/master_files/run_000795.JF07T32V01_master.h5 context=kokkos_gpu [FAIL] 2.2s
2022-01-27T18:59:52.3611179Z   Time:  2.22
2022-01-27T18:59:52.3611567Z   Return code: 1
2022-01-27T18:59:52.3611860Z   OKs: 0
2022-01-27T18:59:52.3612564Z libtbx.python "/__w/1/modules/LS49/adse13_187/tst_multipanel_argchk.py" N_total=1 mosaic_spread_samples=50 test_without_mpi=True log.outdir=mp2k nxmx_local_data=/global/cfs/cdirs/m3562/der/master_files/run_000795.JF07T32V01_master.h5 context=kokkos_gpu [FAIL] 1.9s
2022-01-27T18:59:52.3613376Z   Time:  1.87
2022-01-27T18:59:52.3613671Z   Return code: 1
2022-01-27T18:59:52.3614293Z   OKs: 0
2022-01-27T18:59:52.3615077Z libtbx.python "/__w/1/modules/LS49/adse13_187/tst_write_file_action.py" N_total=1 mosaic_method=double_random mosaic_spread_samples=50 test_without_mpi=True log.outdir=mp3k write_output=False nxmx_local_data=/global/cfs/cdirs/m3562/der/master_files/run_000795.JF07T32V01_master.h5 context=kokkos_gpu [FAIL] 1.8s
2022-01-27T18:59:52.3615871Z   Time:  1.84
2022-01-27T18:59:52.3616154Z   Return code: 1
2022-01-27T18:59:52.3616549Z   OKs: 0
2022-01-27T18:59:52.3618254Z libtbx.python "/__w/1/modules/LS49/adse13_187/tst_write_file_action.py" N_total=1 write_output=True write_experimental_data=True mosaic_spread_samples=62 test_without_mpi=True log.outdir=mp4k nxmx_local_data=/global/cfs/cdirs/m3562/der/master_files/run_000795.JF07T32V01_master.h5 mask_file=/global/cfs/cdirs/m3562/nks/adse13_187/13_221/event_648.mask context=kokkos_gpu [FAIL] 1.9s
2022-01-27T18:59:52.3619147Z   Time:  1.91
2022-01-27T18:59:52.3619446Z   Return code: 1
2022-01-27T18:59:52.3619829Z   OKs: 0

The change for using kokkos on macOS has not been committed yet.

@bkpoon
Member

bkpoon commented Mar 14, 2022

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@bkpoon
Member

bkpoon commented Mar 15, 2022

/azp run "XFEL CI"

@azure-pipelines

No pipelines are associated with this pull request.

@bkpoon
Member

bkpoon commented Mar 15, 2022

/azp run XFEL CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bkpoon
Member

bkpoon commented Mar 17, 2022

/azp run XFEL CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@bkpoon
Member

bkpoon commented Mar 17, 2022

/azp run XFEL CI

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

phyy-nx and others added 6 commits March 30, 2022 06:46
@bkpoon bkpoon merged commit 03272eb into master Mar 30, 2022
@bkpoon bkpoon deleted the kokkos_azure branch March 30, 2022 20:50
3 participants