Add support for the indexed users db format #934

DaleFarnsworth · 2021-10-05T14:02:24Z

The indexed format is a tree-structured database. Each
unique string is stored only once in the db and referenced through
pointers by each DMR ID entry that uses that string.

The new format uses about half the space of the standard
md380 userdb format.
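To illustrate the "store each string once, reference it by pointer" idea, here is a minimal sketch of a string pool with interning. The names `string_pool`, `pool_intern`, `user_record` and the fixed pool size are hypothetical; this is not the on-disk layout documented in README-INDEXEDDB.md, only the general deduplication technique.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string pool: every distinct string is stored once and
 * records refer to it by offset. Illustration only, NOT the layout
 * described in README-INDEXEDDB.md. */
#define POOL_SIZE 4096

typedef struct {
    char data[POOL_SIZE];  /* NUL-terminated strings packed back to back */
    uint32_t used;         /* bytes of data[] in use */
} string_pool;

/* Return the offset of s in the pool, appending it (with its NUL) only
 * if an identical string is not already present. Returns UINT32_MAX if
 * the pool is full. A linear scan keeps the sketch short. */
uint32_t pool_intern(string_pool *p, const char *s)
{
    assert(p != NULL && s != NULL);
    size_t len = strlen(s) + 1;
    uint32_t off = 0;
    while (off < p->used) {
        if (strcmp(p->data + off, s) == 0)
            return off;                       /* already stored: reuse it */
        off += (uint32_t)strlen(p->data + off) + 1;
    }
    if (p->used + len > POOL_SIZE)
        return UINT32_MAX;
    memcpy(p->data + p->used, s, len);
    off = p->used;
    p->used += (uint32_t)len;
    return off;
}

/* A record holds offsets into the pool, not copies of its strings. */
typedef struct {
    uint32_t dmr_id;
    uint32_t city_off;
    uint32_t state_off;
} user_record;
```

Two users in the same city end up with the same `city_off`, so the city text is stored only once no matter how many entries use it.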


DaleFarnsworth · 2021-10-05T14:09:38Z

Hi Travis. Please review and comment. I've been running various iterations of this code and the new db format for a couple of months now without any issues. The changes I made to usersdb.c support both the standard userdb format and this new indexed tree-structured format. The new format also begins with a single ASCII line containing "0", so if the new format is installed on a radio running old firmware, it just looks like a 0-length database. After support for the new db format is added to md380tools, the firmware will support either format.
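The backward-compatibility trick relies on the leading ASCII number line. A minimal sketch of how a reader might parse that line is below, assuming the line is a plain decimal number terminated by `'\n'`. The helper name `db_header_value` is hypothetical, and how the new firmware then tells the two formats apart is specified in README-INDEXEDDB.md and is not reproduced here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: parse the leading ASCII decimal line of a users db
 * image (e.g. "0\n"). Returns the value, or -1 if the line is malformed
 * or overflows. On success *body_start is the index just past '\n'. */
long db_header_value(const char *buf, size_t len, size_t *body_start)
{
    long value = 0;
    size_t i = 0;
    assert(buf != NULL && body_start != NULL);
    if (len == 0 || buf[0] < '0' || buf[0] > '9')
        return -1;
    while (i < len && buf[i] >= '0' && buf[i] <= '9') {
        int d = buf[i] - '0';
        if (value > (LONG_MAX - d) / 10)
            return -1;                 /* would overflow a long */
        value = value * 10 + d;
        i++;
    }
    if (i >= len || buf[i] != '\n')
        return -1;                     /* number must end the first line */
    *body_start = i + 1;
    return value;
}
```

An old reader that trusts this value sees 0 and treats the database as empty, which is why an indexed image is harmless on old firmware.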

A description of the format is contained in README-INDEXEDDB.md.

The repo at https://github.com/DaleFarnsworth/md380IndexedUserDB contains C programs that convert both ways between the standard db format and this new indexed format. The conversion back and forth is lossless.

Thanks.

travisgoodspeed · 2021-10-07T18:25:04Z

Any volunteers to review this code? At first glance it's a worthy contribution, but I'm too burned out on this project to review the code thoroughly myself.

rogerclarkmelbourne · 2021-10-07T20:48:53Z

There is another, further compression that can be applied, as long as you only need upper- and lower-case ASCII letters, digits, space and comma, because that is a total of only 64 unique characters rather than 256.

Hence 4 ASCII bytes can be packed into 3 bytes.
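A minimal sketch of the 6-bit packing described above: with a 64-symbol alphabet each character needs 6 bits, so 4 characters (24 bits) fit in 3 bytes. The alphabet order and the names `char_to_code`, `pack4` and `unpack4` are assumptions for illustration; this is not claimed to be the encoding Connect Systems or any other manufacturer uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-symbol alphabet from the suggestion above:
 * A-Z, a-z, 0-9, space and comma (26 + 26 + 10 + 2 = 64). */
static const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789 ,";
_Static_assert(sizeof(ALPHABET) - 1 == 64, "alphabet must have 64 symbols");

/* Map a character to its 6-bit code, or -1 if it is not in the alphabet. */
int char_to_code(char c)
{
    const char *p = (c == '\0') ? NULL : strchr(ALPHABET, c);
    return p ? (int)(p - ALPHABET) : -1;
}

/* Pack 4 characters (4 x 6 = 24 bits) into 3 bytes, big-endian.
 * Returns 0 on success, -1 if any character is outside the alphabet. */
int pack4(const char in[4], uint8_t out[3])
{
    assert(in != NULL && out != NULL);
    uint32_t bits = 0;
    for (int i = 0; i < 4; i++) {
        int code = char_to_code(in[i]);
        if (code < 0)
            return -1;
        bits = (bits << 6) | (uint32_t)code;
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
    return 0;
}

/* Inverse of pack4: recover 4 characters from 3 bytes. */
void unpack4(const uint8_t in[3], char out[4])
{
    uint32_t bits = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    for (int i = 3; i >= 0; i--) {
        out[i] = ALPHABET[bits & 0x3F];
        bits >>= 6;
    }
}
```

A string whose length is not a multiple of four would need padding (for example with spaces) before packing, which this sketch leaves out.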

AFAIK, this is the compression method used by some manufacturers, such as Connect Systems.

DaleFarnsworth · 2021-10-07T21:55:28Z

We don't need comma, but the users db I currently use contains '#', '&', "'" (single quote), '(', ')', '*', '+', '-', '.', '/', ':', ';', '=', '?', '@', ']', '_', '`', '|', and '$'. We could avoid some of these with cleanup, but I think we'd still benefit from having at least space, dash, ampersand, and period. I find it tough to get down to the required 64-character alphabet.
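Whether a given users db fits a 64-character alphabet can be checked by counting the distinct byte values it contains. A minimal sketch, with the hypothetical helper name `count_distinct_chars`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count how many distinct byte values occur in a buffer, e.g. the text
 * of a users db. A result above 64 means a plain 6-bit-per-character
 * encoding cannot represent the data without remapping or cleanup. */
size_t count_distinct_chars(const unsigned char *buf, size_t len)
{
    bool seen[256] = {false};
    size_t distinct = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (!seen[buf[i]]) {
            seen[buf[i]] = true;
            distinct++;
        }
    }
    return distinct;
}
```

Running this over the record text (excluding field separators and line endings, depending on how they would be encoded) gives the alphabet size that any packing scheme would have to cover.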

One of my (admittedly self-imposed) requirements was that the current database contents be fully supported. I don't plan to add any character string compression, but others are welcome to do so.

rogerclarkmelbourne · 2021-10-07T22:20:55Z

No worries

It was just a suggestion, as it does yield about 30% extra compression on the entire uncompressed string for each record.

However, it would yield less compression on your shorter substrings.

BTW, I initially thought your compression also handled completely duplicate records, where people have 2 or 3 IDs with exactly the same information in each, apart from the ID.

I wonder if you could somehow add that as some sort of special case.
But you'd need to see how much compression that yielded.

There are also a large number of IDs which hardly ever get used. HamDigital.org used to maintain a list of active IDs which could be downloaded with activity-range limits of up to 1 year or more.

And I recall that only about 50% of the IDs were ever active in any given year.

Of course, if DMR MARC supported TA (Talker Alias), none of this would be necessary ;-)

And I don't know why no one has written an extension to MMDVMHost to inject TA. That would fix the problem for the large number of people using hotspots etc. on DMR MARC, and potentially for all DMR MARC repeaters that use MMDVMHost.

Unfortunately, I don't have time to update MMDVMHost, because I'm busy with loads of other projects.

DaleFarnsworth · 2021-10-07T22:35:14Z

My method already stores only one record when multiple DMR IDs have the same callsign, name, etc. It's not a special case.
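One way sharing can fall out of the layout without a special case is an ID table whose entries point into a shared record table. A minimal sketch under that assumption; `id_entry`, `lookup_record` and the 16-bit record index are hypothetical and not taken from the PR's usersdb.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a table of DMR IDs sorted ascending, each pointing
 * at an index in a shared record table. IDs whose callsign, name, etc.
 * are identical point at the same record, so it is stored once. */
typedef struct {
    uint32_t dmr_id;
    uint16_t record_index;
} id_entry;

/* Binary search over ids (sorted by dmr_id).
 * Returns the record index, or -1 if the ID is not present. */
int lookup_record(const id_entry *ids, size_t n, uint32_t dmr_id)
{
    size_t lo = 0, hi = n;
    assert(ids != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ids[mid].dmr_id < dmr_id)
            lo = mid + 1;
        else if (ids[mid].dmr_id > dmr_id)
            hi = mid;
        else
            return ids[mid].record_index;
    }
    return -1;
}
```

Here a user holding three IDs costs three small `id_entry` rows but only one record.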

rogerclarkmelbourne · 2021-10-07T22:44:34Z

OK.

That's good to know.

DaleFarnsworth · 2021-10-07T23:48:15Z

It looks like (back-of-the-envelope guesstimate) we could save an extra 10% (on a full database containing names, cities, states and countries) by encoding the most frequently occurring character pairs as unused character values. That is, encode the current characters as values 0 to <number_of_unique_characters>-1 and use values <number_of_unique_characters> to 255 to represent the most frequently occurring character pairs. The decoding would be quite simple. I think I'll code it up and see what it gives us.
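The decode side of the character-pair idea can be sketched as a single table lookup per code. This is a guess at the general shape, similar in spirit to byte-pair encoding but single-level; it is not the prototype mentioned later in the thread, and `pair_codec` and `pair_decode` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code table: codes below n_single decode to one character,
 * codes from n_single up to n_single + n_pairs - 1 decode to a pair. */
typedef struct {
    const char *singles;     /* singles[c] for c < n_single */
    size_t n_single;
    const char (*pairs)[2];  /* pairs[c - n_single] for pair codes */
    size_t n_pairs;
} pair_codec;

/* Decode len codes into out (capacity cap). Returns the number of
 * characters written, or (size_t)-1 on an unknown code or overflow. */
size_t pair_decode(const pair_codec *pc, const uint8_t *in, size_t len,
                   char *out, size_t cap)
{
    size_t w = 0;
    assert(pc != NULL && pc->n_single + pc->n_pairs <= 256);
    for (size_t i = 0; i < len; i++) {
        size_t c = in[i];
        if (c < pc->n_single) {
            if (w + 1 > cap)
                return (size_t)-1;
            out[w++] = pc->singles[c];
        } else if (c - pc->n_single < pc->n_pairs) {
            const char *p = pc->pairs[c - pc->n_single];
            if (w + 2 > cap)
                return (size_t)-1;
            out[w++] = p[0];
            out[w++] = p[1];
        } else {
            return (size_t)-1;   /* code not assigned in this table */
        }
    }
    return w;
}
```

Each pair code saves one byte where it is used, so the overall gain depends on how often the chosen pairs actually occur in the data.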

If that's implemented, it will be independent of the current code, so I would still appreciate someone's careful review of this PR as it currently stands.

rogerclarkmelbourne · 2021-10-07T23:56:09Z

I wouldn't be using it with md380tools, and unfortunately I'm also mega busy with other projects, so this one would not get looked at for several months.

DaleFarnsworth · 2021-10-08T06:23:57Z

I prototyped the character-pair compression. My estimate of 10% savings was way off: it's actually only 4%, and that's on the indexed file. The savings relative to the original file are less than 2%. I don't know that it's worth it.
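The two percentages are consistent because the indexed file is already about half the size of the standard file. Writing $S_{\text{orig}}$ and $S_{\text{idx}}$ for the two file sizes and $\Delta S$ for the bytes saved by pair encoding (symbols introduced here for illustration):

```latex
\frac{\Delta S}{S_{\text{orig}}}
  = \frac{\Delta S}{S_{\text{idx}}} \cdot \frac{S_{\text{idx}}}{S_{\text{orig}}}
  \approx 0.04 \times 0.5 = 0.02
```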
