-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read unstructured domain resample #18
Read unstructured domain resample #18
Conversation
…ction so it can be imported and called
I set all the tests for unstructured meshes to be known failures as they fail the tolerance test. I feel the tolerance is set to low for these tests, I could just set a tolerance which would make all the tests pass. Would that be the best thing to do? The maximum relative error for these tests are
Which are much larger than the current tolerance value of 0.005. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jeff-cole
this is looking good, i have some detail comments within this review that would benefit from explaining or amending.
xios_examples/shared_testing.py
Outdated
elif nc_method == 'data_func': | ||
# create a netCDF file from an analytic function | ||
os.chdir(cls.test_dir) | ||
import xios_examples.gen_netcdf as gn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please may we keep import
s at the top of the module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved as requested
xios_examples/gen_netcdf.py
Outdated
|
||
def create_ncfile_unstructured(ncmeshout, meshin_file, meshin_varname, func, add_bounds=True, data_prefix='', data_suffix=''): | ||
|
||
# Create unstructured mesh from lfric mesh netCDF file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please may we have this as a triple-quote docstring for the function, rather than a hash-comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
def create_ncfile(ncfile, nlat, nlon, func, dim_prefix='', dim_suffix='', data_prefix='', data_suffix=''): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please may we have a triple-quote docstring for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
mesh_varname=defaults['mesh_varname'], | ||
nlat=defaults['nlat'], nlon=defaults['nlon'], | ||
nlatr=defaults['nlatr'], nlonr=defaults['nlonr']): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a docstring would help here.
One unknown on reading: can file_out be a path that includes directories and file name, or is it limited to file name within current working dir? (see shared_testing comments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring added. Yes you can use full paths.
xios_examples/shared_testing.py
Outdated
inf], cwd=cls.test_dir, check=True) | ||
elif nc_method == 'data_func': | ||
# create a netCDF file from an analytic function | ||
os.chdir(cls.test_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd like to avoid changing directories within the test runs if possible.
could gn.run(
take a full path, using cls.test_dir
to avoid the os.chdir
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this works fine, changed
# these tests are hitting exceptions with XIOS3 | ||
# but not XIOS2, so skip for XIOS3 runner | ||
setattr(TestResampleDomain, tname, | ||
unittest.skip(TestResampleDomain.make_a_resample_test(f))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please may i check, are there exceptions with XIOS3 for these tests, different from the known_failures
?
if so, what is XIOS3 reporting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must attempt I never tried these tests with XIOS3 just assumed they did not work as per the other tests. I not sure what the other tests failures are, is there any information on that? Anyway I tried running with XIOS3 and got this error
In file "field.cpp", function "void xios::CField::setContextClient(xios::CContextClient*)", line 1496 -> Grid not defined for odatax (if field is an input field, set read_access to true)
which didn't happen with XIOS2. The field is an input field but I don't know what set read_access to true
means, I have already got mode="read"
, do you have any idea on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a look at the documentation and there is a read_access attribute
but it is for a field not a file as I originally tried. From the user guide
In order not to waste memory, the instant data of a field can be read from the
model only if:
• it is part of a file whose attribute mode is “read”
• or its attribute read_access is set to “true”.
In any other case, trying to access the field data would cause an error.
Looks like this has changed in XIOS3
After adding the read_access
attribute I no longer got the above error I now get this error
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7f38a25483ff in ???
#1 0x7f38a61a4caa in _ZNKSs4_Rep12_M_is_leakedEv
at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1715013693798/work/libstdc++-v3/include/bits/cow_string.h:214
#2 0x7f38a61a5da7 in _ZNSs4_Rep7_M_grabERKSaIcES2_
at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1715013693798/work/libstdc++-v3/include/bits/cow_string.h:265
#3 0x7f38a61a5dee in _ZNSsC2ERKSs
at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1715013693798/work/libstdc++-v3/include/bits/cow_string.h:546
#4 0x55c4acc707c0 in _ZN4xios20CTransformationPaths23getNextTransformationIdEv
at /home/users/jcole/software/xios3_trunk_r2629/src/transformation/transformation_path.cpp:97
#5 0x55c4ac99c806 in _ZN4xios5CGrid24buildTransformationGraphERNS_17CGarbageCollectorEbPS0_ddRS3_bPNS_6CFieldE
at /home/users/jcole/software/xios3_trunk_r2629/src/node/grid.cpp:2141
#6 0x55c4ac91ef0c in _ZN4xios6CField18buildWorkflowGraphERNS_17CGarbageCollectorE
at /home/users/jcole/software/xios3_trunk_r2629/src/node/field.cpp:746
#7 0x55c4ac652500 in _ZN4xios8CContext15closeDefinitionEv
at /home/users/jcole/software/xios3_trunk_r2629/src/node/context.cpp:1208
#8 0x55c4acb28e41 in cxios_context_close_definition
at /home/users/jcole/software/xios3_trunk_r2629/src/interface/c/icdata.cpp:133
#9 0x55c4ac2bfe37 in __idata_MOD_xios_close_context_definition
at /home/users/jcole/software/xios3_trunk_r2629/config_jasmin_debug/ppsrc/xios/interface/fortran/idata.f90:691
#10 0x55c4ac26f7cd in initialise
at /home/users/jcole/software/git/jeff-cole/tcd-XIOS-demonstration/xios_examples/read_unstructured_domain_resample/resample.F90:62
#11 0x55c4ac26f45c in resample
at /home/users/jcole/software/git/jeff-cole/tcd-XIOS-demonstration/xios_examples/read_unstructured_domain_resample/resample.F90:18
#12 0x55c4ac26fd98 in main
at /home/users/jcole/software/git/jeff-cole/tcd-XIOS-demonstration/xios_examples/read_unstructured_domain_resample/resample.F90:9
Is this similat to what you saw for the other tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @jeff-cole
Many of the tests run fine with XIOS2 & XIOS3 (there are some exceptions, but we'd like to keep these to a minimum if we can, and explore to try and have version-portable example code)
Where & how did you run these tests to get the invalid_memory_reference
error please?
i've just tried branching from your branch and enabling these test for XIOS3 in a fork-branch:
https://github.com/mo-marqh/tcd-XIOS-demonstration/tree/read_unstructured_domain_resample
mo-marqh@7640c1e
I have set off the github runners within my fork seems to run these tests for XIOS3 and complete, with the same tolerance known-error, but no Segmentation fault
https://github.com/mo-marqh/tcd-XIOS-demonstration/actions/runs/9157120279/job/25172852674#step:6:92
it would be good to try and understand this environment difference if we can.
In the meantime, please may you remove these skip patterns for these tests and run them on github for this branch?
thank you
mark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @mo-marqh
I think using exceptedFailure hides errors other than not meeting the tolerance requirements. When I turned off skipping the XIOS3 test it passed with 4 known failures. When I then increased the tolerance value and removed the known failures it failed with a python netCDF4 problem, which was previously masked by using exceptedFailure.
I fixed the netCDF4 problem (I was using a feature which was added after the version being using by the github runners) and now it fails with the same error as I got above (not using read_access
). I will fix this and see if I get the invalid_memory_reference
error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get the same invalid_memory_reference
error, this time I got
/home/runner/work/tcd-XIOS-demonstration/tcd-XIOS-demonstration/xios_examples/read_unstructured_domain_resample/xios_client_0.err
-> error : WARNING: Unexpected request for buffer to communicate with server 0
In file "type_impl.hpp", function "void xios::CType<T>::_checkEmpty() const [with T = int]", line 210 -> Data is not initialized
I also tried running it on archer2 (not with the trunk but 2 versions earlier) and this seemed to get further but hung.
All of the read_axis/read_domain tests skip running XIOS3, can you remember what problem they had which made you skip them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original runs were done on jasmin, this is what gave the invalid_memory_reference
error,
mesh_file = 'mesh_C12.nc' | ||
|
||
|
||
# A list of input `.cdl` files where XIOS is known to produce different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a comment inconsistency, there are no input .cdl files
here, i think, they are generated from Python code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this to not refer to .cdl files but I don't really understand what this comment means, there is no list here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't see where this file is being used in any of the code, is it required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is needed to get the mesh topology information to write the LFRic unstructured netCDF file. It is referenced in file test_resample_cases.py
when you define the TestResampleDomain
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and used in the gn.run
call in file shared_testing.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's clear, thank you.
given that this is an 84k binary file, i wonder whether it would be useful for repository readability to include this as a plain text .cdl
file and to build it using ncgen
in the test preparation? it's not much of a size change, it would be 134k
I'm a bit wary of storing binary files within the repository as it's harder to understand what they are, and not possible to track changes meaningfully
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this module being used?
I can't see how it's being called from anything else? (am I missing this?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this isn't used, I will delete it
import numpy as np | ||
|
||
class dataFunc: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class docstring please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
i'd like to do a bit more digging into this and assess this tolerance value and what might be appropriate for us to target. if we can do this as part of the review, then that's okay, but we can merge this as 'known failures' and spawn a follow up ticket to address tolerances for resampling |
hi @jeff-cole i think what i would like would be to set these tolerances such that they pass for these values, and Then i'd like to create a separate issue with the aim of justifying these tolerances, so that we can show willing, and have evidence beyond just values that worked. does this seem reasonable to you? |
… only versions >=1.6.0 of python netCDF4 support using Dimension instances
This test case extends read_domain_decomp_resample to regrid data on unstructured LFRic mesh to rectilinear grid.