
Writing and Reading Sections of .e57 File #244

Open
JonKirkland opened this issue Apr 18, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@JonKirkland

Hello,
Currently I only know how to read and write e57 files by storing the data for an entire file in a buffer. Since I have some large .e57s I would like to work with, I was wondering if it is possible to:
Read points 0 to 5 million, write those 5 million points, then read points 5 to 10 million, and so on, so that less memory is used.
Looking at the docs I saw CompressedVectorReader.seek(), but I was not able to get it working, and I have not been able to find any example in the tests. If anyone could outline a way to do this I would greatly appreciate it.

@asmaloney
Owner

The code for seek() looks like this:

   void CompressedVectorReaderImpl::seek( uint64_t /*recordNumber*/ )
   {
      checkImageFileOpen( __FILE__, __LINE__, static_cast<const char *>( __FUNCTION__ ) );

      // !!! implement
      throw E57_EXCEPTION1( ErrorNotImplemented );
   }

This is related to #79 - though you are also asking for a batch/streaming interface.

(I've mentioned in other places that I started a new implementation from scratch a while ago. I'd implemented batched reading the way you describe because I think it makes a lot of sense!)

@asmaloney asmaloney added the enhancement New feature or request label Apr 18, 2023
@JonKirkland
Author

Sorry to revive this, but is the fact that libe57 uses Xerces preventing file streaming? I'm just curious, as I've been using this library a lot and have started poking around the code to gain a better understanding.
Great project btw.

@asmaloney
Owner

is the fact that libe57 uses Xerces preventing file streaming?

Nope - that's a separate issue. The issue with Xerces is that it is like using a sledgehammer to push a tack into cork - and it's been a constant source of problems to include & build. A small simple implementation like pugixml would be better. The structure of libE57Format's code, however, makes replacing the XML a fair bit of work.

For streaming, I think it would be possible to implement CompressedVectorReaderImpl::seek and use it somehow (which I believe was the original intent), but not efficiently because the library doesn't implement certain features from the standard (e.g. indexing).

@dancergraham

For streaming, I think it would be possible to implement CompressedVectorReaderImpl::seek and use it somehow (which I believe was the original intent), but not efficiently because the library doesn't implement certain features from the standard (e.g. indexing).

Any ideas / pointers on how this would be done? Are you referring to the ASCE standard for e57 files?

@asmaloney
Owner

Are you referring to the ASCE standard for e57 files?

The ASTM standard specifies a way to set up indices. Up until 3.2, this library (and the "reference" one) didn't include any index packets at all even though at least one is required by the standard.

I think the seek method was supposed to use these indices to quickly jump to a record (hence the param recordNumber). If these indices were implemented properly, then you could jump to a specific range of points efficiently (e.g. "read 100k points starting from record 11,56,278").

In my other E57 implementation, reading & processing is done in batches instead of all points at once, which I think is a better way to handle reading in general. Something like:

   PointData pd = <read from file structure>;
   pd.setBatchsize( 1024 * 100 );
   auto readCallback = [](const PointRecordList &inList) {
      // process the points - inList is "batch size" in length (or however many are left to be read)
   };
   pd.readByBatch( readCallback );

Something like this could probably be implemented on top of libE57Format with a bit of work.

@JonKirkland
Author

Hello, I was looking at this again two months ago and spent a little time trying to implement batching/chunking to reduce process memory. Here is a rundown of how the reading works, plus some pointers to help anyone else who wants to have a look get started.

First, memory is allocated when a Data3D header is passed to the Data3DPointsData_t constructor: it uses the point count to size the various buffers so there is enough memory to read in all the data. This needs changing; we probably want to add another constructor that takes a batch-size argument.

Secondly, the SetUpData3DPointsData() function checks the structure of the data and maps the various fields (like color, cartesianX, intensity) to the corresponding buffers; the vector of SourceDestBuffers just tells the CompressedVectorReader where to read each field to. So I don't think this needs changing for batching.

Next, the actual reading of the data is done using the CompressedVectorReader returned from the above function, inside CompressedVectorReaderImpl::read().

Within read() the main logic lives in three functions:

BitpackDecoder::inputProcess() - I am not sure what this actually does, so I don't know whether it needs to be changed.

earliestPacketNeededForInput() - This gets the memory offset at which to read the next packet of data. To implement batching without seek() (i.e. reading from 0 to the point count, but without using all the memory), the offset of the last batch needs to be saved, possibly as part of the reader class, so the offset can be tracked between batches.

feedPacketToDecoders() - This takes the offset as an argument; we possibly don't need to change anything here, but I'm not sure.

I remember I got stuck somewhere, but I also realize I haven't tried everything I wrote above, so I'll probably have another go at it within the next couple of weeks if no one else does.
