Cannot read VDIF streams with non-monotonically increasing frame number #13

mhvk opened this issue Dec 17, 2015 · 9 comments

mhvk commented Dec 17, 2015

@ishengyang, @pharaofranz: taking this out of e-mails to GitHub so we don't forget. This is not completely trivial, so when it gets addressed depends on how urgent it is. Note that I did merge #12, so the legacy headers now do get recognized properly in master.

A Mark 5B file converted to VDIF using jive5ab uses a somewhat peculiar ordering of threads and frames:

thread frame

0      0
0      1
1      0
1      1
2      0
2      1
3      0
3      1
4      0
4      1
5      0
5      1
6      0
6      1
7      0
7      1
0      2
0      3

It would be nice to ensure that the stream reader can read this, perhaps by explicitly telling it that there are 8 threads. A possible issue is that with the above ordering, one cannot seek to a particular frame number in the raw data file, so this may require more generally addressing the fact that the data file can have gaps or inconsistent ordering.


mhvk commented Jun 1, 2017

Part of making VDIF more robust is in https://github.com/mhvk/baseband/tree/vdif-more-robust

But that would not work all that well either, or at least it would be very inefficient. Better might be to read quite a large number of frames at the same time and then select the right frame number. In particular, in my ~/python/drao_vdif.py conversion script (see https://gist.github.com/mhvk/11889bf460885f0f178a62ea4d0008f5), I use:

# Imports assumed here (not shown in the original snippet); the HeaderParser
# location may differ between baseband versions.
import numpy as np
from baseband.vdif.header import VDIFHeader0
from baseband.vlbi_base.header import HeaderParser


class DRAOVDIFHeader(VDIFHeader0):
    # Extend the EDV=0 header with the DRAO-specific words.
    _header_parser = VDIFHeader0._header_parser + HeaderParser(
        (('link', (3, 16, 4)),
         ('slot', (3, 20, 6)),
         ('eud2', (5, 0, 32))))

    def __new__(cls, words, edv=None, verify=True, **kwargs):
        return object.__new__(cls)

    def verify(self):
        pass

    @classmethod
    def fromfile(cls, fh, edv=0, verify=False):
        self = super(DRAOVDIFHeader, cls).fromfile(fh, edv=0, verify=False)
        # Correct wrong bps.
        self.mutable = True
        self['bits_per_sample'] = 3
        return self


# Memory-map the raw file as 32-bit words; each frame is 5032 bytes
# (32-byte header plus payload), i.e., 5032 // 4 words.
fh = np.memmap('drao/' + files[0], dtype=np.uint32, mode='r')
fil = fh.reshape(-1, 5032 // 4)
# Pass the first 8 words of every frame, i.e., all headers at once.
header = DRAOVDIFHeader(fil.T[:8], edv=0, verify=False)

This gives a multi-D header in which one could do header['frame_nr'] == frame_nr and get all headers in one go.
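As a rough sketch of how that selection could look (assuming the multi-D header built above, with files[0] and the frame layout coming from the gist):

# Sketch only: 'header' above holds the headers of all frames at once, so a
# comparison gives a boolean array with one entry per frame in the file.
frame_nr = 2
matches = header['frame_nr'] == frame_nr
indices = np.nonzero(matches)[0]
# The corresponding raw frames can then be taken straight from the memory map.
wanted_frames = fil[indices]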


mhvk commented Mar 14, 2018

Looking at this again, I think the best solution would in fact be to read and reorder the file ourselves (using the raw VDIF writer). It might make sense to just document a work-around.
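
A minimal sketch of that idea, assuming baseband's raw ('rb'/'wb') VDIF readers and writers and a file small enough to buffer every frame in memory (file names are placeholders):

from baseband import vdif

# Read every frame from the oddly ordered file (sketch; buffers it all in memory).
frames = []
with vdif.open('oddly_ordered.vdif', 'rb') as fr:
    n_bytes = fr.seek(0, 2)
    fr.seek(0)
    while fr.tell() < n_bytes:
        frames.append(fr.read_frame())

# Sort by time, frame number, and thread, then write out a properly ordered copy.
frames.sort(key=lambda f: (f.header['seconds'], f.header['frame_nr'],
                           f.header['thread_id']))
with vdif.open('reordered.vdif', 'wb') as fw:
    for frame in frames:
        fw.write_frame(frame)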


mhvk commented Mar 8, 2019

For really regularly behaved data, an intermediate file reader that does the reordering on the fly is also possible.
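
A sketch of what such an on-the-fly reordering could look like, assuming a fixed, known number of threads and that frames with the same frame number are only locally interleaved (the generator and its names are illustrative, not an existing baseband API):

from collections import defaultdict

from baseband import vdif


def reordered_frames(filename, nthread=8):
    # Buffer frames per (seconds, frame_nr) and yield each group, sorted by
    # thread id, as soon as all nthread frames of that group have been seen.
    buffers = defaultdict(list)
    with vdif.open(filename, 'rb') as fr:
        n_bytes = fr.seek(0, 2)
        fr.seek(0)
        while fr.tell() < n_bytes:
            frame = fr.read_frame()
            key = (frame.header['seconds'], frame.header['frame_nr'])
            buffers[key].append(frame)
            if len(buffers[key]) == nthread:
                for f in sorted(buffers.pop(key),
                                key=lambda f: f.header['thread_id']):
                    yield f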


IMFardz commented Mar 8, 2019

I also have the same issue, so I thought it would be a good idea to post it here.

I have 8-threaded VDIF files whose threads are not ordered. This causes baseband to miscalculate the sample shape of the data. I think baseband currently determines the number of threads by reading one frame at a time, recording the thread numbers, and stopping once a thread number repeats.

For instance, If I do the following:

In [1]: from baseband import vdif
In [2]: fh = vdif.open('gk049c_gb_no0025.vdif')
In [3]: fh.sample_shape
Out[3]: SampleShape(nthread=4)

Now if I check the thread ids by repeatedly using the lines
In [6]: rh = vdif.open('gk049c_gb_no0025.vdif', 'rb')
In [32]: arr = rh.read_frame()
In [33]: arr.header

I find that the thread ids are: 1, 3, 5, 7, 1. I do find, however, that threads 0, 2, 4 and 6 pop up later in the scan. I think the first stretch of frames contains only the odd-numbered threads. I had to read several thousand frames (~3000) before I found an even-numbered thread. It could also be that even threads appear early in the file too and I am just very unlucky.


mhvk commented Mar 9, 2019

@IMFardz - could you provide a bit more detail: once the even-numbered threads appear, do they have the same frame number? I think the trick would be to just start a loop and print only header['frame_nr'] and header['thread_id']. The reason I ask is that it may be that, for some reason, the even ones were simply not on for the first stretch of time, and that after the offset everything is OK. That would be much easier to deal with...


IMFardz commented Mar 9, 2019

Hmm, so I looped through the file using the following loop:

num = fh.seek(-1, 2)
fh.seek(0)
while fh.tell() < num:
    arr = fh.read_frame()
    header = arr.header
    print(header['frame_nr'], header['thread_id'])

And I think I got something somewhat interesting.

For the first 170 frames, it repeats the pattern 1, 3, 5, 7 as follows:
168 7, 169 1, 169 3, 169 5, 169 7, ...
However, after frame 170, it starts counting the even threads.
170 1, 170 3, 0 0, 0 2, 170 5, 170 7, 0 4, 0 6, 1 0, 1 2,
Unfortunately, it does not maintain a steady pattern of thread numbers: sometimes it reads two odd threads, then two even threads, then four odd ones, then two or three or four odd ones, and so on. It is not completely random, however. The odd threads are always ordered (i.e., 1 appears before 3, which appears before 5, which appears before 7, and the same for the even ones). In the case of this specific dataset, the even threads represent one polarization and the odd threads represent the other, so perhaps this is not so surprising.

I think my best bet would be to separate this VDIF file into two different VDIF files, one with the even threads and another with the odd threads. That would give me two separate VDIF files, each with four threads that have ordered thread ids.

Let me know if my conclusion makes sense.


mhvk commented Mar 9, 2019

@IMFardz - interestingly different... I think you could indeed just write two files. Alternatively, open the file in two readers, one for odd and one for even, and just collect the right frames and write them out to a new file that is properly ordered.
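
A rough sketch of the even/odd split itself, assuming baseband's raw VDIF reader and writer ('rb'/'wb'); output file names are placeholders, and frames within each parity stay in their original order:

from baseband import vdif

# Sketch: route each frame to an even- or odd-thread output file.
with vdif.open('gk049c_gb_no0025.vdif', 'rb') as fr, \
        vdif.open('even_threads.vdif', 'wb') as f_even, \
        vdif.open('odd_threads.vdif', 'wb') as f_odd:
    n_bytes = fr.seek(0, 2)
    fr.seek(0)
    while fr.tell() < n_bytes:
        frame = fr.read_frame()
        if frame.header['thread_id'] % 2 == 0:
            f_even.write_frame(frame)
        else:
            f_odd.write_frame(frame)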


IMFardz commented Mar 15, 2019

@mhvk I tried to write a sample script that does what you just explained. As far as I can tell, it is pretty simple and works fine. I think it could be adapted pretty easily to serve similar purposes related to this issue. Let me know if you see any glaring issues with it.

Thanks!

split_pol.txt


mhvk commented Mar 22, 2019

@IMFardz - yes, that looks pretty good! Small comments (not functionally important, but to help understanding): opening the files with 'rb' and 'wb' allows you to avoid using fh_raw; this will also ensure you get the right file size. On the latter, the total number of bytes is just fh.seek(0, 2) (where fh is the binary, not stream, filehandle).
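
For reference, a minimal illustration of that last point (file name as in the example above):

from baseband import vdif

with vdif.open('gk049c_gb_no0025.vdif', 'rb') as fb:  # binary, not stream, filehandle
    n_bytes = fb.seek(0, 2)  # seeking to the end returns the total number of bytes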

Anyway, I think this would be a good one for adding to the documentation!
