
Software shouldn't change their inputs #12

Open
FerreolS opened this issue Feb 14, 2022 · 23 comments

Comments

@FerreolS
Member

To prevent any surprise and to simplify tracking by OImaging, the software should update only the OUTPUT PARAM table. It can add new image HDUs if needed, but it shouldn't change the HDUs present in the input file.
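
For illustration only, a minimal sketch of that behaviour (Python with astropy, regardless of what each reconstruction code is actually written in; the extension names and the `reconstructed_image` / `output_params_header` arguments are placeholders):

```python
# Sketch only: copy every input HDU unchanged, then append new HDUs.
# "OUTPUT PARAM" and "OUTPUT_IMG" are placeholder extension names following
# the wording of this thread, not necessarily the spec.
from astropy.io import fits

def write_output(input_file, output_file, reconstructed_image, output_params_header):
    with fits.open(input_file) as hdul:
        out = fits.HDUList([hdu.copy() for hdu in hdul])   # input HDUs untouched
        # new image HDU holding the reconstruction
        out.append(fits.ImageHDU(data=reconstructed_image, name="OUTPUT_IMG"))
        # new table HDU carrying the output parameters in its header
        out.append(fits.BinTableHDU(header=output_params_header, name="OUTPUT PARAM"))
        out.writeto(output_file, overwrite=True)
```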

@jsy1001
Contributor

jsy1001 commented Feb 14, 2022

Could you be more precise as to what is required? BSMEM doesn't intentionally change any of the input tables, but it reads their contents into memory and writes out new tables containing the information it read in. Hence there may be changes such as removal of keywords that BSMEM isn't expecting.

The order of the tables is changed, but I would hope that OImaging is robust to that.

@FerreolS
Member Author

Yes, I forgot this case. Changing the order of keywords and table values is ok as it does not change the information content.

@FerreolS
Member Author

The software should add an extra column with the modeled values to the OI_VIS2, OI_VIS, and OI_T3 (and OI_FLUX if needed) tables. As suggested by @emmt, the software must flag the rows that were not used (with a NaN?) rather than removing these rows.
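
For illustration, roughly what that could look like for an OI_VIS2 table with Python/astropy (a sketch only: the column name `NS_MODEL_VIS2` is illustrative and not necessarily the one defined in the interface document, and `model_vis2` / `used` are placeholder arrays):

```python
# Sketch: append a model column to an existing OI_VIS2 binary table,
# writing NaN for rows the reconstruction did not use.
# "NS_MODEL_VIS2" is an illustrative column name only.
import numpy as np
from astropy.io import fits

def add_model_column(vis2_hdu, model_vis2, used):
    model = np.where(used, model_vis2, np.nan)    # unused rows flagged with NaN
    # format "D" assumes one scalar per row; multi-wavelength data would need
    # a vector format such as "16D"
    new_cols = fits.ColDefs([fits.Column(name="NS_MODEL_VIS2", format="D", array=model)])
    # original columns are kept unchanged, only the new one is appended
    return fits.BinTableHDU.from_columns(vis2_hdu.columns + new_cols,
                                         header=vis2_hdu.header)
```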

@jsy1001
Contributor

jsy1001 commented Feb 15, 2022

@FerreolS By "not used" do you mean excluded by USE_VIS etc., flagged (True value in the FLAG column), or something else? Why is it necessary to flag unused data in a new way if the excluded data can be inferred from the input parameters and/or FLAG columns?

@FerreolS
Member Author

My concern was about WISARD, which needs T3 and VIS2 at the same timestamp. If for some reason there is no corresponding VIS2, the T3 will not be used. Its value should stay in the table but must be flagged somehow. Am I right @GillesDuvert ?

@jsy1001
Contributor

jsy1001 commented Feb 15, 2022

There is a potential edge case that could apply to BSMEM. It concerns the UVMAX parameter, which isn't implemented in OImaging yet. This tells BSMEM to filter the input data at the specified UV radius.

At the moment, BSMEM applies a UV radius filter in two situations: (a) if UVMAX is specified and is smaller than the maximum UV radius in the data; (b) if the initial image pixel size is larger than \lambda_min / (6 B_max). At the moment if case (b) applies the corresponding UVMAX value is written to the input parameters HDU of the output file, potentially overwriting the original UVMAX specified by the user.
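
For reference, my reading of case (b) in numbers: the criterion "pixel size larger than \lambda_min / (6 B_max)" is equivalent to saying the data's UV radius B/\lambda exceeds 1 / (6 × pixel size), so a UVMAX value can be derived from the pixel size. A rough illustration using the values from the log excerpt quoted later in this thread:

```python
# Rough illustration (values taken from the log excerpt in this thread):
# relate the pixel size to the implied maximum usable UV radius.
import math

MAS_TO_RAD = math.radians(1.0) / 3.6e6      # milliarcseconds to radians

pixel_size_mas = 0.1                         # "Pixel size: 0.1000 mas" in the log
pixel_size_rad = pixel_size_mas * MAS_TO_RAD

# criterion: pixel_size <= lambda_min / (6 * B_max)
# equivalently: UV radius B/lambda must stay below 1 / (6 * pixel_size)
uvmax_implied = 1.0 / (6.0 * pixel_size_rad)     # in waves
print(f"implied UVMAX ~ {uvmax_implied:.3e} waves")
# ~3.4e8 waves, well above the 4.954e7 waves in the log, so no filtering
# should be triggered in that example
```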

I'm not sure of the best solution here - suggestions welcome!

@GillesDuvert

Hi, I have mixed feelings about #12 (comment) above.
Even if some (u,v) point was not used because V2 is missing, the reconstruction is an image that holds everywhere, so a model V2 and a model T3 are available at each (u,v) point. The same goes for measurements that are too noisy, which WISARD drops (the error on the 'myopic' complex visibilities is way too large). If the observed V2, say, is flagged (or NULL?), that does not prevent writing a model_v2 IMHO.
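
To make that concrete: once an image exists, model complex visibilities (and hence model V2 and T3) can be evaluated at any (u,v) point by a direct Fourier sum, whether or not the corresponding datum was used. A minimal numpy sketch (not taken from WISARD or any of the codes):

```python
# Minimal sketch (not from WISARD): evaluate the model complex visibility of
# an image at an arbitrary (u, v) point given in waves, via a direct Fourier sum.
import numpy as np

def model_vis(image, pixel_size_rad, u, v):
    ny, nx = image.shape
    # pixel sky coordinates in radians, image assumed centred
    x = (np.arange(nx) - nx / 2) * pixel_size_rad
    y = (np.arange(ny) - ny / 2) * pixel_size_rad
    xx, yy = np.meshgrid(x, y)
    # V(u, v) = sum_pixels I(x, y) * exp(-2*pi*i*(u*x + v*y)), normalised by the flux
    return np.sum(image * np.exp(-2j * np.pi * (u * xx + v * yy))) / np.sum(image)

# model squared visibility at one (u, v) point, used in the fit or not:
# vis2_model = np.abs(model_vis(img, pix_rad, u, v)) ** 2
```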

@GillesDuvert

But, back to the issue title, the main motive is to have the whole input image + input image header unchanged, yes?
(something I would have sworn was the case for Wisard)

@GillesDuvert

done.

@FerreolS
Member Author

At the moment, BSMEM applies a UV radius filter in two situations: (a) if UVMAX is specified and is smaller than the maximum UV radius in the data; (b) if the initial image pixel size is larger than \lambda_min / (6 B_max). At the moment if case (b) applies the corresponding UVMAX value is written to the input parameters HDU of the output file, potentially overwriting the original UVMAX specified by the user.

This case raises two concerns for me:

  • a parameter is (over)written in the INPUT PARAM table: should we rather keep the original value in INPUT and put the new value in OUTPUT? In that case the GUI could detect this change and inform the user.
  • the "Pixel size too large for data" message is in the log, but the user's attention is not drawn to it. How should we raise warnings? With a WARNING keyword (containing just a flag or a complete warning message), by parsing the log for warnings, or by popping up such predictable warnings in the GUI when the run button is clicked? (One possible shape of the keyword option is sketched below.)
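
A possible shape for the keyword option, purely illustrative (neither the WARNING keyword nor the message text comes from the spec):

```python
# Illustrative only: record a human-readable warning in the OUTPUT PARAM
# header so the GUI can surface it without parsing the log.
from astropy.io import fits

output_params = fits.Header()
output_params["UVMAX"] = 4.954e7      # value actually used by the software
output_params["WARNING"] = "Pixel size too large for data; UVMAX reduced"
hdu = fits.BinTableHDU(header=output_params, name="OUTPUT PARAM")
```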

@FerreolS
Member Author

BTW, it seems that on this particular UVMAX issue, the message "ERROR : ** Message: 15:42:23.762: Pixel size too large for data" is in the log even if the resolution is OK:

BSMEM_CI_VERSION: bsmem-v2.2.1-2021-12-16T14:08:21+00:00
cmd: "bsmem-ci /usr/local/tomcat/temp/uws/UPLOAD_1645113576690_inputfile /usr/local/tomcat/temp/uws/UPLOAD_1645113576690_inputfile.out"
ERROR : ** Message: 15:59:36.727: Pixel size too large for data
ERROR : ** Message: 15:59:36.731: Negative/zero values in prior image are not permitted. All have been replaced by 1e-08.
Bispectrum noise:	Classic elliptic approximation 
Maximum uv radius:	4.954e+07 waves
Array resolution:	2.0820 mas
Pixel size:		0.1000 mas
Image dimension:	200 pixels
Image fov:		20.00 mas
Pix/fastest fringe:	41.6396
Entropy functional:	Gull-Skilling entropy
Hyperparameter scheme:	Chi2 = N method
Maximum n# iterations:	50
Initial flux:		0.010000

@jsy1001
Contributor

jsy1001 commented Feb 17, 2022

BTW, it seems that on this particular UVMAX issue, the message "ERROR : ** Message: 15:42:23.762: Pixel size too large for data" is in the log even if the resolution is OK:

Indeed that is a bug. I was hoping we could agree on the correct way of handling UVMAX before I fix it.

@emmt
Collaborator

emmt commented Feb 21, 2022

MiRA does raise an error or at least a warning if the input parameters are wrong (including the choice of the pixel size). The specified input parameters should not be modified by MiRA. Sometimes the error is not straightforward to interpret, e.g. a pixel size larger than the field of view results in an error because MiRA cannot create a 0x0 pixel image (this can be improved, but being smart about all possible errors may be endless...). I can replace warnings such as:

WARNING - Current pixelsize (1 mas) is too large which will yield aliasing.  Maximum pixelsize to avoid aliasing is 0.854623 mas.

with an error if this seems preferable, in spite of the fact that some people like to override such advice ;-)

@FerreolS
Member Author

FerreolS commented Mar 1, 2022

I will add in the document that the INPUT PARAM table (and INIT_IMG image) must remain unchanged. If for some reason (like the UVMAX case) the actual value used by the software differs, this actual value must be written in the OUTPUT PARAM table.

Is it ok for you?

If needed OImaging can easily track this change.
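
For illustration, a sketch (Python/astropy, just to show the idea) of how a client such as OImaging could detect such a change by comparing the two parameter tables; the extension names follow the wording of this thread:

```python
# Sketch: report parameters whose value in OUTPUT PARAM differs from the
# user-supplied value in INPUT PARAM. Extension names follow this thread.
from astropy.io import fits

def changed_params(output_file):
    with fits.open(output_file) as hdul:
        inp = hdul["INPUT PARAM"].header
        out = hdul["OUTPUT PARAM"].header
    skip = {"XTENSION", "BITPIX", "NAXIS", "NAXIS1", "NAXIS2", "PCOUNT",
            "GCOUNT", "TFIELDS", "EXTNAME", "COMMENT", "HISTORY", ""}
    return {key: (inp[key], out[key])
            for key in inp
            if key not in skip and key in out and inp[key] != out[key]}
```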

@emmt
Collaborator

emmt commented Mar 1, 2022

To avoid too many specific rules, why not just have all input parameters copied into the output parameters with their actual (unchanged or not) values?

@FerreolS
Member Author

FerreolS commented Mar 1, 2022

As you wish, but in my opinion it will make change detection harder for human readers (there are still some people who read keywords). For OImaging it is not an issue, I guess.

@FerreolS
Member Author

FerreolS commented Mar 1, 2022

Adding the point "software should not alter or destroy other HDUs, blindly copying them into the output file" from #15 (comment), as it is much more related to this issue.

@jsy1001
Contributor

jsy1001 commented Mar 2, 2022

I will add in the document that the INPUT PARAM table (and INIT_IMG image) must remain unchanged. If for some reason (like the UVMAX case) the actual value used by the software differs, this actual value must be written in the OUTPUT PARAM table.

Is it ok for you?

If needed OImaging can easily track this change.

Yes this is ok. See https://gitlab.com/jsy1001/bsmem/-/issues/10

@buthanoid
Member

buthanoid commented Mar 3, 2022

The benefit of not modifying the input parameters is traceability. Using OImaging or not, it allows you to remember what you gave to the software to build the output file you are reading.
The drawback is that placing an input parameter in the output table is misleading. Firstly, it can make the user think that it was not used as input (whereas it actually was); secondly, the parameter may only make sense as input, so seeing it as output can be confusing.

Concerning tracing whether the software modified the input parameters or not, OImaging can do it as long as the new value of the parameter is somewhere in the output file. Of course, if both old and new values are in the output file, it is easier for OImaging.

The current spec already asks the software to copy the input parameters table into the output file. However, it also asks the software to complete it with any missing keywords that the software needs.

I reckon that sometimes the input image is modified.

My (current) point of view is that it is simpler if the software modifies the input parameters:

  • the input values used by the software are in the input table
  • there is no input params in the output table

In any case, the user must be correctly informed when a parameter has been changed. They must not believe that the software used exactly what they gave as input.

@jsy1001
Contributor

jsy1001 commented Mar 3, 2022

@buthanoid So are you proposing that when the reconstruction software modifies an explicit input parameter, both the input and used values should go in the input parameters table?

If so, how do we store both values? I was thinking of appending something to the parameter name (e.g. keywords [param] and [param]-OLD), but we will run into the 8 character limit for keywords (unless we use the hierarchical keyword convention).
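
For what it's worth, the hierarchical keyword convention does lift the 8-character limit; astropy, for example, writes HIERARCH cards for longer keys. A purely illustrative sketch of the [param] / [param]-OLD idea (the keyword names are just the example from this comment):

```python
# Illustrative: store both the value actually used and the user-supplied one.
# Keys longer than 8 characters are written as HIERARCH cards.
from astropy.io import fits

hdr = fits.Header()
hdr["UVMAX"] = 3.5e7                    # value actually used (fits in 8 characters)
hdr["HIERARCH UVMAX-OLD"] = 4.954e7     # original user value, needs a HIERARCH card
```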

@buthanoid
Member

buthanoid commented Mar 3, 2022

Not exactly, I was proposing that the software should only store the value it actually used. So old values are lost. With a bit of work OImaging could retrieve them.
Note: I find the other solutions good too.

@jsy1001
Contributor

jsy1001 commented Mar 7, 2022

Hi, I have mixed feelings about #12 (comment) above. Even if some (u,v) point was not used because V2 is missing, the reconstruction is an image that holds everywhere, so a model V2 and a model T3 are available at each (u,v) point. The same goes for measurements that are too noisy, which WISARD drops (the error on the 'myopic' complex visibilities is way too large). If the observed V2, say, is flagged (or NULL?), that does not prevent writing a model_v2 IMHO.

At the moment bsmem does not follow this suggestion. The behaviour is documented in docs/common_interface.md, which reads:

A copy of the input data is written to the output file. Only the tables containing selected data are copied. Within the tables, unselected data are replaced by NULL values.
Model data, corresponding to the selected input data only, are written to the output file using the additional columns defined in Table 4 of the interface document.

This means that using the output as input to another reconstruction will only work as expected if the data selection parameters (TARGET, USE_*, UV_MAX) are unchanged. Changing bsmem to write all of the input data (whether selected or not) and model values would involve a fair amount of work.

What do the other algorithms do?

@FerreolS
Member Author

FerreolS commented Mar 9, 2022

This issue was also discussed in #5, where @emmt said:

Ok that seems to be a better idea to keep all input data.

For the model of unused data, using NULL is the more logical choice (and then omitting the column if all elements are NULL). Using the current image it is however possible to compute a model for all input data. It could then be an option to do that. On output, there could be an optional column of flags indicating for each datum whether: the datum was used and a model provided, the datum was not used but a model provided, or neither the datum was used nor a model provided. I just mention this possibility for completeness, but it may add too much complexity for the image reconstruction software. So your solution has my personal preference as it is consistent and the simplest (we could add the optional column of flags later if it appears to be really needed).
