Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow users to turn verification off? #133

Closed
cczhu opened this issue Mar 8, 2018 · 6 comments
Closed

Allow users to turn verification off? #133

cczhu opened this issue Mar 8, 2018 · 6 comments

Comments

@cczhu
Copy link
Contributor

cczhu commented Mar 8, 2018

During sequential decode, approximately 10% of the time is taken up by verifying header and payload integrity. While this is essential for selective or initial sequential decode, if the user is re-analyzing data they know to be error-free, it would be useful to pass verify=False to stream readers for the performance boost.

@mhvk
Copy link
Owner

mhvk commented Mar 14, 2018

I guess this could have multiple parts. Two from from simple to complex so far:

  1. Just pass on verify when reading a frame or header;
  2. Avoid even more tests (such as whether frames are properly consecutive).

It may be that we end up allowing a string-valued argument, but True and False should do sensible default things.

(The package-novice label is for option 1.)

@cczhu
Copy link
Contributor Author

cczhu commented May 31, 2018

Created a PR to address 1.

As for 2., looking at profiling results for sequential decoding various file formats, only VDIF may require a significant amount of verification time (~20% of the total frame read-in and decode time for an nthread = 8 file). This is probably because each frameset contains multiple frames, all of which are checked - Mark 4 headers have fairly expensive individual assert statements, but when compared against an nthread = 8 VDIF file there are fewer headers per sample. Of the items checked, these two

assert (h0['sync_pattern'] ==
            h0._header_parser.defaults['sync_pattern'])
assert fr0.payload.shape == (fr0.header.samples_per_frame, fr0.header.nchan) 

are the most expensive, together taking about 33% of the total verification time.

(A non-negligible amount of time apparently is also spent navigating the class inheritance structure to track down a method, since it takes a few µs more to run header.verify() than it is to manually run all the checks in verify.)

I'm not sure how to speed up the second assert, but for the first one we can just store the sync pattern in the VDIF header class definition, since these are set by the EDV. Better would be to limit the amount of verification to one header per frameset, by default, then adding a strict option to verify which also checks whether the headers are consistent with each other (#117).

@mhvk
Copy link
Owner

mhvk commented Jun 5, 2018

Agreed we should at least factor out the sync_pattern; the payload shape test would be irrelevant for cases where a header is passed in (such as in fromfile) since it is effectively derived from the header in the first place.

@cczhu
Copy link
Contributor Author

cczhu commented Jun 5, 2018

So in that case, in addition to what's been done in #233, should we move the sync_pattern to the VDIF headers, and simply remove the sample shape check in the frame verify? The only time when the latter breaks is when the user manually instantiates a class using a header and payload they independently created.

@mhvk
Copy link
Owner

mhvk commented Jun 5, 2018

I think for VDIFFrame, I'd rather just pass verify=False in the cls(...) call in fromfile, since we know the verification has already been done (the other test for complex data is also completely pointless for the fromfile case, since it is derived from the header anyway).

@cczhu cczhu mentioned this issue Jun 6, 2018
@cczhu
Copy link
Contributor Author

cczhu commented Jun 6, 2018

Solved (for now) by #233 and #241.

@cczhu cczhu closed this as completed Jun 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants