A number of issues and PRs have arisen suggesting that PINT's parsing of .tim files should be revisited to make sure we parse these files appropriately. There is considerable discussion, but it is spread over multiple issues and PRs, so I want to gather it here in the hope that we can establish what PINT's parsing should do, to be implemented later.
A summary of the problem
There are a number of formats for .tim file entries:
Parkes, Princeton, and ITOA. These are FORTRAN-style text formats in which each field occupies fixed columns. They are mostly easy to distinguish from one another, although some care is needed. There may or may not be whitespace between columns. These formats do not support custom flags.
Tempo2. This is a C/Python-style free-form text format with no fixed columns. Fields are delimited by whitespace, and the first field can and does contain more or less arbitrary non-whitespace text (often a filename). This format supports custom flags. By specification and custom, the presence of such TOAs is signalled by the command "FORMAT 1" (see below).
Commands. These include "FORMAT 1" to signal the presence of tempo2-format TOAs, "JUMP" to introduce an (old-style) JUMP, "TIME" to signal that some TOAs need adjustment, and "INCLUDE" to signal that additional TOAs should be read from another file.
Comments. These are conventionally lines starting with "#" or "C" (though the latter can easily conflict with tempo2-format fields). By a common custom, a TOA is excised by prepending "C " to it, and scripts that programmatically generate or recover such commented-out TOAs are common.
Informal comments. Traditionally, TEMPO and TEMPO2 silently ignore any line they don't recognize and treat it as a comment.
The problem, or at least a problem, is that some files "in the wild" intermix these different kinds of entry in a variety of ways. The flexibility of the tempo2 format means it can easily be confused with Parkes, Princeton, and ITOA TOAs. The flexibility of the comment format also complicates parsing, and the existence of informal comments makes error reporting much more difficult.
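To make the comment ambiguity concrete, here is a minimal Python sketch of a line classifier. `classify_line` and its `c_needs_space` switch are hypothetical names, not PINT functions, and the command set covers only the four commands named above. The sketch shows that a naive "starts with C" rule swallows a tempo2 TOA whose filename begins with "C", while requiring "C" to be a separate field does not. It deliberately makes no attempt to tell tempo2 TOAs from fixed-column TOAs or from informal comments.

```python
# Only the commands named in this issue; real .tim files can contain more.
COMMANDS = {"FORMAT", "JUMP", "TIME", "INCLUDE"}


def classify_line(line, c_needs_space=True):
    """Classify one .tim line as 'blank', 'comment', 'command', or 'toa'.

    Hypothetical helper for illustration. 'toa' means "anything else",
    which lumps together tempo2 TOAs, fixed-column TOAs, and informal comments.
    """
    fields = line.split()
    if not fields:
        return "blank"
    first = fields[0]
    if first.startswith("#"):
        return "comment"
    if c_needs_space and first == "C":
        # Safer rule: "C" must be its own whitespace-delimited field,
        # as produced by the "C " excision convention.
        return "comment"
    if not c_needs_space and first.startswith("C"):
        # Naive rule: any leading "C" marks a comment, which also catches
        # tempo2 TOAs whose first field (e.g. a filename) begins with "C".
        return "comment"
    if first in COMMANDS:
        return "command"
    return "toa"


t2 = "C1234.ar 1440.000 55000.1234567890123 1.2 ao -fe L-wide"
print(classify_line(t2))                       # toa
print(classify_line(t2, c_needs_space=False))  # comment (misclassified)
print(classify_line("C " + t2))                # comment (excised TOA)
```

Note that an informal comment such as "edited by hand" also comes back as "toa" here, which is exactly why error reporting gets hard once such lines are tolerated.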
How should we change PINT's parsing to improve this situation?
Suggestions that have been floated:
Treat all lines after FORMAT 1 in a given file as tempo2 format. (Should we also accept tempo2-format entries that are not preceded by FORMAT 1?)
Let users choose between strict and best-guess parsing modes. In a strict mode, users declare that their files contain only tempo2-format TOAs signalled by FORMAT 1 and no informal comments, and any deviation raises an exception. In a best-guess mode, PINT tries its best to turn anything it sees into a TOA.
Make the smallest possible change to PINT's parsing necessary to fix existing bugs.
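The strict versus best-guess suggestion can be sketched roughly as follows. This is a hypothetical Python outline, not PINT's API. `parse_tim_lines`, `TimParseError`, and the record dictionary are invented for illustration, and the "best-guess" branch is simplified: it only warns about and skips lines, whereas a real best-guess mode would also try the fixed-column formats. Strict mode enforces the first suggestion above, so every TOA-like line must follow FORMAT 1 and parse as tempo2.

```python
import warnings


class TimParseError(ValueError):
    """Raised in strict mode when a line does not fit the expected layout."""


COMMANDS = {"FORMAT", "JUMP", "TIME", "INCLUDE"}


def parse_tim_lines(lines, strict=True):
    """Return tempo2-format TOA records from an iterable of .tim lines.

    Hypothetical sketch: strict mode raises on the first unexpected line;
    best-guess mode (strict=False) warns and skips it instead.
    """
    toas = []
    seen_format = False
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "C":
            continue  # blank line or formal comment
        if fields[0] == "FORMAT":
            seen_format = fields[1:] == ["1"]
            continue
        if fields[0] in COMMANDS:
            continue  # JUMP/TIME/INCLUDE handling is omitted from this sketch
        problem = None
        if not seen_format:
            # Could be a Parkes/Princeton/ITOA TOA or an informal comment.
            problem = "TOA-like line before FORMAT 1"
        elif len(fields) < 5:
            # A tempo2 TOA needs at least: name, frequency, MJD, error, site.
            problem = "too few fields for a tempo2 TOA"
        else:
            try:
                toas.append({
                    "name": fields[0],
                    "freq": float(fields[1]),
                    "mjd": fields[2],  # kept as text to avoid float64 precision loss
                    "error": float(fields[3]),
                    "site": fields[4],
                    "flags": fields[5:],
                })
            except ValueError:
                problem = "non-numeric frequency or error"
        if problem is None:
            continue
        if strict:
            raise TimParseError(f"line {lineno}: {problem}: {line!r}")
        warnings.warn(f"line {lineno}: skipped ({problem}): {line!r}")
    return toas
```

With the lines "FORMAT 1", "b.ar 1400.0 55000.123456789 1.5 ao -fe L", and an informal note, best-guess mode returns one TOA and emits one warning, while strict mode raises `TimParseError` at the note. Keeping the policy in a single `strict` switch means both modes share the same field checks.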
I wanted to add just a bit more about the way tempo2 reads .tim files:
Yes, indeed, if "FORMAT" appears anywhere, it assumes the tempo2-style format. If "HEAD" appears anywhere (including after FORMAT), it assumes ".tpo" format, whatever that is. I'm guessing this use case doesn't happen often.
Only some mixed formats are supported.
FORMAT will cause non-tempo2 TOAs to parse incorrectly.
Lack of FORMAT will cause tempo2 TOAs to parse incorrectly.
Commands such as JUMP and TIME are accumulated as they are read, i.e. in parse order, just like tempo, and apply to all subsequent TOAs regardless of inferred format. You probably don't want this, so if you somehow managed to parse a file with both tempo- and tempo2-style TOAs, it would only work as expected if the tempo2-style TOAs preceded the tempo-style ones.
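The whole-file format detection and the parse-order command state described above can be modelled in a few lines. This is a toy Python sketch, not tempo2's code. `detect_file_format` and `attach_command_state` are hypothetical names. Checking the keyword as the first field of a line is an assumption about what "appears anywhere" means. Treating TIME as a running sum and consecutive JUMP lines as on/off toggles are also simplifying assumptions. The one point it is meant to show is the one above: state set by a command carries forward to every later TOA, whatever that TOA's format.

```python
def detect_file_format(lines):
    """Whole-file decision: any HEAD wins, else any FORMAT means tempo2 style.

    Assumption: the keyword is looked for as the first field of a line.
    """
    first_fields = [line.split()[0] for line in lines if line.split()]
    if "HEAD" in first_fields:
        return "tpo"
    if "FORMAT" in first_fields:
        return "tempo2"
    return "tempo"


def attach_command_state(lines):
    """Return (first_field, time_offset, jump_group) for each TOA line, in file order.

    Toy model: TIME adds to a running offset and JUMP lines toggle a jump
    group on and off. Nothing resets this state when the TOA format changes.
    """
    time_offset = 0.0
    in_jump = False
    jump_id = 0
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "C":
            continue  # blank line or comment
        if fields[0] == "TIME":
            time_offset += float(fields[1])
        elif fields[0] == "JUMP":
            in_jump = not in_jump
            if in_jump:
                jump_id += 1
        elif fields[0] in ("FORMAT", "HEAD", "INCLUDE"):
            continue
        else:
            rows.append((fields[0], time_offset, jump_id if in_jump else None))
    return rows
```

For example, a TIME line placed before a block of tempo-format TOAs would, in this model, also shift any tempo2 TOAs further down the same file unless it is explicitly undone.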
In summary, tempo2 sometimes parses .tim files with mixed formats successfully, but this does not hold in general, and many cases will yield erroneous results.
Thus, the safe thing is to require that all TOAs in a file share the same format. Files in different formats can still be combined through a metafile that pulls them in with "INCLUDE".
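Here is a minimal Python sketch of the "one format per file, combine with INCLUDE" rule. The function names and the in-memory `files` mapping, which stands in for the filesystem, are hypothetical. The sketch also assumes that a file counts as tempo2 only if it contains FORMAT 1 with no TOA-like line before it. It does not model whether tempo2 lets an included file inherit its parent's FORMAT. Instead, each file's format is decided from that file alone.

```python
COMMANDS = {"FORMAT", "JUMP", "TIME", "INCLUDE"}


def is_toa(fields):
    """True for a line that is not blank, a comment, or a known command."""
    return (bool(fields) and not fields[0].startswith("#")
            and fields[0] != "C" and fields[0] not in COMMANDS)


def file_format(name, lines):
    """Return 'tempo2' or 'tempo' for one file, rejecting mixed files."""
    fmt = "tempo"
    toa_seen = False
    for line in lines:
        fields = line.split()
        if fields[:2] == ["FORMAT", "1"]:
            if toa_seen:
                raise ValueError(f"{name}: TOAs precede FORMAT 1, so the file mixes formats")
            fmt = "tempo2"
        elif is_toa(fields):
            toa_seen = True
    return fmt


def expand_includes(name, files, _stack=()):
    """Yield (file, format, line) for every TOA reachable from `name`.

    `files` maps hypothetical file names to lists of lines. INCLUDE is
    followed depth-first, and include cycles raise ValueError.
    """
    if name in _stack:
        raise ValueError("INCLUDE cycle: " + " -> ".join(_stack + (name,)))
    lines = files[name]
    fmt = file_format(name, lines)
    for line in lines:
        fields = line.split()
        if fields[:1] == ["INCLUDE"]:
            yield from expand_includes(fields[1], files, _stack + (name,))
        elif is_toa(fields):
            yield name, fmt, line
```

A metafile containing only INCLUDE lines can then combine an older fixed-column file with a FORMAT 1 file, while a file that puts TOAs ahead of its FORMAT 1 line is rejected rather than silently misparsed.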
How do programs address this?
Issues/PRs where this is discussed: #1320 #1319 #1271 #730 #731