Cannot read VDIF streams with non-monotonically increasing frame number #13

mhvk opened this issue Dec 17, 2015 · 9 comments

mhvk commented Dec 17, 2015

@ishengyang, @pharaofranz: taking this out of e-mails to GitHub so we don't forget. This is not completely trivial, so when it gets addressed depends on how urgent it is. Note that I did merge #12, so the legacy headers now do get recognized properly in master.

A Mark 5B file converted to VDIF using jive5ab uses a somewhat peculiar ordering of threads and frames:

thread frame

0      0
0      1
1      0
1      1
2      0
2      1
3      0
3      1
4      0
4      1
5      0
5      1
6      0
6      1
7      0
7      1
0      2
0      3

It would be nice to ensure that the stream reader can read this, perhaps by explicitly telling it that there are 8 threads. A possible issue is that with the above ordering, one cannot seek to a particular frame number in the raw data file, so this may require more generally addressing the fact that the data file can have gaps or inconsistent ordering.


mhvk commented Jun 1, 2017

Part of making VDIF more robust is in https://github.com/mhvk/baseband/tree/vdif-more-robust

But that would not work all that well either, or at least it would be very inefficient. Better might be to read quite a large number of frames at the same time and then select the right frame number. In particular, in my ~/python/drao_vdif.py conversion script (see https://gist.github.com/mhvk/11889bf460885f0f178a62ea4d0008f5), I use:

# Imports assumed here (not shown in the original snippet); the HeaderParser
# location may differ between baseband versions.
import numpy as np
from baseband.vdif.header import VDIFHeader0
from baseband.vlbi_base.header import HeaderParser


class DRAOVDIFHeader(VDIFHeader0):
    # Extend the EDV=0 header with the DRAO-specific words.
    _header_parser = VDIFHeader0._header_parser + HeaderParser(
        (('link', (3, 16, 4)),
         ('slot', (3, 20, 6)),
         ('eud2', (5, 0, 32))))

    def __new__(cls, words, edv=None, verify=True, **kwargs):
        return object.__new__(cls)

    def verify(self):
        pass

    @classmethod
    def fromfile(cls, fh, edv=0, verify=False):
        self = super(DRAOVDIFHeader, cls).fromfile(fh, edv=0, verify=False)
        # Correct wrong bps.
        self.mutable = True
        self['bits_per_sample'] = 3
        return self


# Memory-map the raw file as 32-bit words; each frame is 5032 bytes
# (32-byte header plus payload), i.e., 5032 // 4 words.
fh = np.memmap('drao/' + files[0], dtype=np.uint32, mode='r')
fil = fh.reshape(-1, 5032 // 4)
# Pass the first 8 words of every frame, i.e., all headers at once.
header = DRAOVDIFHeader(fil.T[:8], edv=0, verify=False)

This gives a multi-D header in which one could do header['frame_nr'] == frame_nr and get all headers in one go.
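As a rough sketch of how that selection could look (assuming the multi-D header built above, with files[0] and the frame layout coming from the gist):

# Sketch only: 'header' above holds the headers of all frames at once, so a
# comparison gives a boolean array with one entry per frame in the file.
frame_nr = 2
matches = header['frame_nr'] == frame_nr
indices = np.nonzero(matches)[0]
# The corresponding raw frames can then be taken straight from the memory map.
wanted_frames = fil[indices]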


mhvk commented Mar 14, 2018

Looking at this again, I think the best solution would in fact be to read and reorder the file ourselves (using the raw VDIF writer). It might make sense to just document a work-around.
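
A minimal sketch of that idea, assuming baseband's raw ('rb'/'wb') VDIF readers and writers and a file small enough to buffer every frame in memory (file names are placeholders):

from baseband import vdif

# Read every frame from the oddly ordered file (sketch; buffers it all in memory).
frames = []
with vdif.open('oddly_ordered.vdif', 'rb') as fr:
    n_bytes = fr.seek(0, 2)
    fr.seek(0)
    while fr.tell() < n_bytes:
        frames.append(fr.read_frame())

# Sort by time, frame number, and thread, then write out a properly ordered copy.
frames.sort(key=lambda f: (f.header['seconds'], f.header['frame_nr'],
                           f.header['thread_id']))
with vdif.open('reordered.vdif', 'wb') as fw:
    for frame in frames:
        fw.write_frame(frame)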


mhvk commented Mar 8, 2019

For really regularly behaved data, an intermediate file reader that does the reordering on the fly is also possible.
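
A sketch of what such an on-the-fly reordering could look like, assuming a fixed, known number of threads and that frames with the same frame number are only locally interleaved (the generator and its names are illustrative, not an existing baseband API):

from collections import defaultdict

from baseband import vdif


def reordered_frames(filename, nthread=8):
    # Buffer frames per (seconds, frame_nr) and yield each group, sorted by
    # thread id, as soon as all nthread frames of that group have been seen.
    buffers = defaultdict(list)
    with vdif.open(filename, 'rb') as fr:
        n_bytes = fr.seek(0, 2)
        fr.seek(0)
        while fr.tell() < n_bytes:
            frame = fr.read_frame()
            key = (frame.header['seconds'], frame.header['frame_nr'])
            buffers[key].append(frame)
            if len(buffers[key]) == nthread:
                for f in sorted(buffers.pop(key),
                                key=lambda f: f.header['thread_id']):
                    yield f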


IMFardz commented Mar 8, 2019

I also have the same issue, so I thought it would be a good idea to post it here.

I have 8-threaded VDIF files whose threads are not ordered. This causes baseband to miscalculate the sample shape of the data. I think baseband currently determines the number of threads by reading one frame at a time, recording the thread numbers, and stopping once a thread number repeats.

For instance, If I do the following:

In [1]: from baseband import vdif
In [2]: fh = vdif.open('gk049c_gb_no0025.vdif')
In [3]: fh.sample_shape
Out[3]: SampleShape(nthread=4)

Now if I check the thread ids by repeatedly using the lines
In [6]: rh = vdif.open('gk049c_gb_no0025.vdif', 'rb')
In [32]: arr = rh.read_frame()
In [33]: arr.header

I find that the thread ids are: 1, 3, 5, 7, 1. I do find, however, that threads 0, 2, 4 and 6 pop up later in the scan. I think the first stretch of frames contains only the odd-numbered threads. I had to read several thousand frames (~3000) before I found an even-numbered thread. It could also be that even threads appear early in the file too and I am just very unlucky.


mhvk commented Mar 9, 2019

@IMFardz - could you provide a bit more detail: once the even-numbered threads appear, do they have the same frame number? I think the trick would be to just start a loop and print only header['frame_nr'] and header['thread_id']. The reason I ask is that it may be that, for some reason, the even ones were simply not on for the first stretch of time, and that after the offset everything is OK. That would be much easier to deal with...


IMFardz commented Mar 9, 2019

Hmm, so I looped through the file using the following loop:

num = fh.seek(-1, 2)
fh.seek(0)
while fh.tell() < num:
    arr = fh.read_frame()
    header = arr.header
    print(header['frame_nr'], header['thread_id'])

And I think I got something somewhat interesting.

For the first 170 frames, it repeats the pattern 1, 3, 5, 7 as follows:
168 7, 169 1, 169 3, 169 5, 169 7, ...
However, after frame 170, it starts counting the even threads.
170 1, 170 3, 0 0, 0 2, 170 5, 170 7, 0 4, 0 6, 1 0, 1 2,
Unfortunately, it does not maintain a steady pattern of thread numbers: sometimes it reads two odd threads, then two even threads, then four odd ones, then two or three or four odd ones, and so on. It is not completely random, however. The odd threads are always ordered (i.e., 1 appears before 3, which appears before 5, which appears before 7, and the same for the even ones). In the case of this specific dataset, the even threads represent one polarization and the odd threads represent the other, so perhaps this is not so surprising.

I think my best bet would be to separate this VDIF file into two different VDIF files, one with the even threads and another with the odd threads. That would give me two separate VDIF files, each with four threads that have ordered thread ids.

Let me know if my conclusion makes sense.


mhvk commented Mar 9, 2019

@IMFardz - interestingly different... I think you could indeed just write two files. Alternatively, open the file in two readers, one for odd and one for even, and just collect the right frames and write them out to a new file that is properly ordered.
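
A rough sketch of the even/odd split itself, assuming baseband's raw VDIF reader and writer ('rb'/'wb'); output file names are placeholders, and frames within each parity stay in their original order:

from baseband import vdif

# Sketch: route each frame to an even- or odd-thread output file.
with vdif.open('gk049c_gb_no0025.vdif', 'rb') as fr, \
        vdif.open('even_threads.vdif', 'wb') as f_even, \
        vdif.open('odd_threads.vdif', 'wb') as f_odd:
    n_bytes = fr.seek(0, 2)
    fr.seek(0)
    while fr.tell() < n_bytes:
        frame = fr.read_frame()
        if frame.header['thread_id'] % 2 == 0:
            f_even.write_frame(frame)
        else:
            f_odd.write_frame(frame)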


IMFardz commented Mar 15, 2019

@mhvk I tried to write a sample script that does what you just explained. As far as I can tell, it is pretty simple and works fine. I think it could be adapted pretty easily to serve similar purposes related to this issue. Let me know if you see any glaring issues with it.

Thanks!

split_pol.txt


mhvk commented Mar 22, 2019

@IMFardz - yes, that looks pretty good! Small comments (not functionally important, but to help understanding): opening the files with 'rb' and 'wb' allows you to avoid using fh_raw; this will also ensure you get the right file size. On the latter, the total number of bytes is just fh.seek(0, 2) (where fh is the binary, not stream, filehandle).
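
For reference, a minimal illustration of that last point (file name as in the example above):

from baseband import vdif

with vdif.open('gk049c_gb_no0025.vdif', 'rb') as fb:  # binary, not stream, filehandle
    n_bytes = fb.seek(0, 2)  # seeking to the end returns the total number of bytes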

Anyway, I think this would be a good one for adding to the documentation!
