Different numbers of iterations when running code on different hardware platforms #1

Lynkzhang · 2018-04-06T11:03:29Z

I ran AMG on Xeon, KNL, and KNM platforms. On KNM/KNL, the number of iterations is much higher than on Xeon with the same configuration. Moreover, when I run with 1 OMP thread per process, the number of iterations becomes 0 on KNM/KNL. Is there a way (such as a flag) to control the number of iterations, or is this a problem in the code?

ulrikeyang · 2018-04-06T14:07:59Z

That should not happen. I have not observed any such behavior on the various machines where I ran and tested the code; however, I never ran it on KNM/KNL. How did you build and run the code?

Lynkzhang · 2018-04-07T07:29:02Z

Thank you so much for replying!
Here is how I compile:
source /opt/intel/parallel_studio_xe_2018.1.038/bin/psxevars.sh intel64
export I_MPI_CC=icc
export I_MPI_CXX=icpc
export I_MPI_F77=ifort
export I_MPI_F90=ifort
make
In the makefile, I added the flags: -ipo -xHost

Here is how I run it; I am using SDE for profiling:
mpiexec -host $HOST -genv OMP_NUM_THREADS=1 -n 64 bash -c "/home/Lynkzhang/test/precision/dep/sde-external-8.16.0-2018-01-30-lin/sde64 -sse-sde -global_region -mix_omit_per_thread_stats -mix_omit_per_function_stats -start_ssc_mark 111:repeat -stop_ssc_mark 222:repeat -iform 1 -omix oSDE/"$MPI_LOCALRANKID".txt -knm -- ./test/amg -problem 1 -P 4 4 4 -n 64 64 64"

Put simply, I am running something like:
mpiexec -host $HOST -genv OMP_NUM_THREADS=1 -n 64 ./amg -problem 1 -P 4 4 4 -n 64 64 64

Here is the output:

Running with these driver parameters:
solver ID = 1

Laplacian_27pt:
(Nx, Ny, Nz) = (256, 256, 256)
(Px, Py, Pz) = (4, 4, 4)

=============================================
Generate Matrix:

Spatial Operator:
wall clock time = 17.962424 seconds
wall MFLOPS = 0.000000
cpu clock time = 17.950000 seconds
cpu MFLOPS = 0.000000

RHS vector has unit components
Initial guess is 0

IJ Vector Setup:

RHS and Initial Guess:
wall clock time = 3.546373 seconds
wall MFLOPS = 0.000000
cpu clock time = 3.550000 seconds
cpu MFLOPS = 0.000000

=============================================
Problem 1: AMG Setup Time:

PCG Setup:
wall clock time = 81.694932 seconds
wall MFLOPS = 0.000000
cpu clock time = 81.700000 seconds
cpu MFLOPS = 0.000000

FOM_Setup: nnz_AP / Setup Phase Time: 0.000000e+00

Walltime of the main kernel: 16.958590 sec

Problem 1: AMG-PCG Solve Time:

PCG Solve:
wall clock time = 17.080244 seconds
wall MFLOPS = 0.000000
cpu clock time = 17.090000 seconds
cpu MFLOPS = 0.000000

Iterations = 0
Final Relative Residual Norm = 0.000000e+00

FOM_Solve: nnz_AP * Iterations / Solve Phase Time: 0.000000e+00

Figure of Merit (FOM_1): 1.480463e+06

Lynkzhang · 2018-04-10T06:41:00Z

The problem is caused by compiling on KNM/KNL with the -xHost flag. The results become normal after I remove this flag.

Lynkzhang · 2018-04-11T07:53:42Z

We found that the code has problems when it is compiled with the AVX-512 instruction set (for example, with the flag -xMIC-AVX512).

ulrikeyang · 2018-04-11T13:13:03Z

Can you give me some more information on what that means? And what type of problem do you see?

Lynkzhang · 2018-04-12T05:45:22Z

As I mentioned before, the problem is that the number of iterations becomes abnormal.
In particular, when running with one thread per process, the number is 0.
Under other conditions (such as running the binary directly with ./amg), the number of iterations changes to a different random value each time.
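
For anyone trying to reproduce this, one way to compare builds (for example, with and without -xHost) is to pull the iteration count and final residual out of each run's output and flag runs that look wrong. The following is only a minimal sketch: the function names `parse_amg_summary` and `looks_suspicious` are hypothetical and not part of AMG, and the rule "zero iterations or a missing residual is suspicious" is an assumption based on the output posted above.

```python
import re


def parse_amg_summary(output: str) -> dict:
    """Extract the iteration count and final relative residual from driver output.

    Only the two summary lines shown in the posted output are matched:
    "Iterations = N" and "Final Relative Residual Norm = X".
    """
    iters = re.search(r"^\s*Iterations\s*=\s*(\d+)", output, re.MULTILINE)
    resid = re.search(
        r"^\s*Final Relative Residual Norm\s*=\s*([0-9.eE+-]+)",
        output,
        re.MULTILINE,
    )
    return {
        "iterations": int(iters.group(1)) if iters else None,
        "residual": float(resid.group(1)) if resid else None,
    }


def looks_suspicious(summary: dict) -> bool:
    """Heuristic (an assumption): a solve that reports zero iterations,
    or whose summary lines cannot be parsed, probably did not run correctly."""
    return summary["iterations"] in (None, 0) or summary["residual"] is None


if __name__ == "__main__":
    # Sample text copied from the output posted earlier in this thread.
    sample = "Iterations = 0\nFinal Relative Residual Norm = 0.000000e+00\n"
    summary = parse_amg_summary(sample)
    print(summary, "suspicious:", looks_suspicious(summary))
```

Saving the output of several runs of the same binary and passing each through `parse_amg_summary` would also make the run-to-run variation in iteration counts easy to see.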

ulrikeyang · 2018-04-13T15:47:30Z

What does using -xMIC-AVX512 change?

Lynkzhang closed this as completed Apr 10, 2018

Lynkzhang reopened this Apr 11, 2018
