Updated NodeNorm preferred label to match Babel's #300

gaurav · 2024-11-04T22:38:15Z

I changed Babel's preferred name lookup algorithm a while ago (TranslatorSRI/Babel#330), but I didn't make the matching change to NodeNorm's preferred name lookup. This PR updates NodeNorm's prefix boost order to match Babel's and brings its algorithm as close to Babel's as possible.
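For reference, prefix boosting can be sketched in a few lines of Python. This is a minimal illustration, not the code in NodeNorm or Babel. It assumes a clique is a list of (CURIE, label) pairs in clique order. The boost list and CURIEs below are made up, and so is the precedence between `demote_labels_longer_than` and the boost order: here long labels lose to any short label, and prefix order breaks ties. The real per-type boost lists are the ones from TranslatorSRI/Babel#330.

```python
from typing import List, Optional, Sequence, Tuple

# (CURIE, label) pairs in clique order; labels may be empty.
Identifier = Tuple[str, str]


def prefix_rank(curie: str, boost_prefixes: Sequence[str]) -> int:
    """Position of the CURIE's prefix in the boost list; unlisted prefixes rank last."""
    prefix = curie.split(":", 1)[0]
    if prefix in boost_prefixes:
        return list(boost_prefixes).index(prefix)
    return len(boost_prefixes)


def choose_preferred_label(
    identifiers: Sequence[Identifier],
    boost_prefixes: Sequence[str],
    demote_labels_longer_than: int = 15,
) -> Optional[str]:
    # Ignore identifiers without a usable label.
    labeled: List[Identifier] = [(c, l) for c, l in identifiers if l and l.strip()]
    if not labeled:
        return None
    # sorted() is stable, so equally ranked prefixes keep their clique order.
    ordered = sorted(labeled, key=lambda pair: prefix_rank(pair[0], boost_prefixes))
    # Demote long labels: fall back to them only if no short label exists.
    short = [pair for pair in ordered if len(pair[1]) <= demote_labels_longer_than]
    return (short or ordered)[0][1]


# Hypothetical clique and boost order, for illustration only.
clique = [
    ("PUBCHEM.COMPOUND:3", "2-acetoxybenzoic acid"),
    ("MESH:2", "aspirin"),
    ("CHEBI:1", "acetylsalicylic acid"),
]
boost = ["CHEBI", "MESH", "PUBCHEM.COMPOUND"]
print(choose_preferred_label(clique, boost, 15))  # aspirin
print(choose_preferred_label(clique, boost, 40))  # acetylsalicylic acid
```

Changing the length threshold alone flips the result, which is why the `demote_labels_longer_than` value gets tuned back and forth in the commits below.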

Babel's algorithm does something significantly different from what NodeNorm's algorithm tries to do. When generating conflated synonym files, Babel uses the preferred name algorithm to find the best name for each unconflated clique, then picks the first of those preferred names when conflating multiple cliques. By the time we get to the create_node() code in NodeNorm, however, we've lost track of what the subcliques are, so instead we just run the "preferred label" algorithm on all the labels within the conflated clique and hope for the best.
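The difference between the two approaches is easiest to see on a toy example. The sketch below is hypothetical. It assumes the subcliques of a conflation are known and ordered, and it uses a simplified "best-ranked prefix wins" rule (`pick_by_prefix`, made up here) in place of the real preferred-label algorithm. The identifiers and boost order are also made up.

```python
from typing import Optional, Sequence, Tuple

Identifier = Tuple[str, str]  # (CURIE, label)


def pick_by_prefix(identifiers: Sequence[Identifier], boost_prefixes: Sequence[str]) -> Optional[str]:
    """Toy stand-in for the preferred-label algorithm: best-ranked prefix wins."""
    labeled = [(c, l) for c, l in identifiers if l]
    if not labeled:
        return None

    def rank(curie: str) -> int:
        prefix = curie.split(":", 1)[0]
        return boost_prefixes.index(prefix) if prefix in boost_prefixes else len(boost_prefixes)

    # min() returns the first of several equally ranked entries, preserving clique order.
    return min(labeled, key=lambda pair: rank(pair[0]))[1]


def babel_style(subcliques: Sequence[Sequence[Identifier]], boost_prefixes: Sequence[str]) -> Optional[str]:
    """Name each unconflated clique in order and keep the first preferred name found."""
    for subclique in subcliques:
        name = pick_by_prefix(subclique, boost_prefixes)
        if name:
            return name
    return None


def flattened_style(subcliques: Sequence[Sequence[Identifier]], boost_prefixes: Sequence[str]) -> Optional[str]:
    """Run the algorithm once over every label in the conflated clique."""
    merged = [pair for subclique in subcliques for pair in subclique]
    return pick_by_prefix(merged, boost_prefixes)


# A gene clique conflated with a protein clique (made-up identifiers).
conflated = [
    [("NCBIGene:1", "GENE1")],
    [("UniProtKB:P1", "Protein one"), ("PR:P1", "protein 1")],
]
boost = ["PR", "UniProtKB", "NCBIGene"]
print(babel_style(conflated, boost))      # GENE1
print(flattened_style(conflated, boost))  # protein 1
```

Flattening lets a highly boosted prefix from a later subclique override the name of the first subclique, which is the mismatch described above.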

This PR modifies NodeNorm to approximate Babel's algorithm. Although we lose track of the subcliques, when we know that we're dealing with a conflation, we can walk through the identifiers one by one and try to find a subclique with at least one non-empty label. We use a set() to make this as efficient as possible.
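A minimal sketch of that walk follows, under stated assumptions; it is not the code merged in this PR. `get_subclique` is a hypothetical lookup, standing in for a Redis query, that returns the unconflated clique containing a CURIE. `choose` is whatever preferred-label function is in use. The set records every identifier already covered by an earlier lookup, so each subclique is fetched at most once.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

Identifier = Tuple[str, str]  # (CURIE, label)


def conflated_preferred_label(
    conflated_ids: Sequence[str],
    get_subclique: Callable[[str], List[Identifier]],
    choose: Callable[[List[Identifier]], Optional[str]],
) -> Optional[str]:
    """Walk the conflated identifiers in order; name the first subclique that has a label."""
    seen: Set[str] = set()
    for curie in conflated_ids:
        if curie in seen:
            # Already covered by an earlier lookup; skip the extra query.
            continue
        subclique = get_subclique(curie)
        seen.add(curie)
        seen.update(c for c, _ in subclique)
        if any(label for _, label in subclique):
            return choose(subclique)
    return None


# Usage with an in-memory dict standing in for Redis (made-up identifiers).
sub_gene = [("NCBIGene:1", ""), ("HGNC:1", "")]
sub_protein = [("UniProtKB:P1", "protein one"), ("PR:P1", "Protein 1")]
index = {c: sub for sub in (sub_gene, sub_protein) for c, _ in sub}
calls: List[str] = []


def lookup(curie: str) -> List[Identifier]:
    calls.append(curie)
    return index[curie]


def first_label(subclique: List[Identifier]) -> Optional[str]:
    return next((label for _, label in subclique if label), None)


ids = ["NCBIGene:1", "HGNC:1", "UniProtKB:P1", "PR:P1"]
print(conflated_preferred_label(ids, lookup, first_label))  # protein one
print(calls)  # ['NCBIGene:1', 'UniProtKB:P1']
```

In the worst case, where no subclique has a label until the last one, this does one lookup per subclique; the set keeps it from doing one per identifier.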

Ultimately, we should get rid of even this simplified code (#299) and just read the preferred name calculated by Babel for every clique, which is now present in the NodeNorm output files. In the meantime, I don't think we'll hit the worst-case performance of the walk very often.
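Reading the precomputed name could then be as simple as the sketch below. The `"preferred_name"` key and the one-JSON-record-per-line layout are assumptions about the output files, not a documented format. The fallback stands in for the existing label-selection code while older files are still around.

```python
import json
from typing import Callable, Optional


def preferred_name_for(record_line: str, fallback: Callable[[dict], Optional[str]]) -> Optional[str]:
    """Prefer the name Babel already computed; recompute only if it is missing.

    The "preferred_name" key is a hypothetical field name.
    """
    record = json.loads(record_line)
    name = record.get("preferred_name")
    if name:
        return name
    return fallback(record)
```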

Updated NodeNorm preferred label to match Babel's.

fed781e

gaurav requested a review from cbizon November 4, 2024 22:38

gaurav and others added 7 commits November 4, 2024 18:09

Added on:push trigger for testing.

0d1545f

Wrapped a map() in a list().

c7751e1

Deleted on:push trigger after testing.

4f4517d

Updated preferred_name_boost_prefixes to sync with Babel.

b3590f8

We use the order in TranslatorSRI/Babel#330

Added on:push trigger for testing.

31bc57f

Fixed possible bug in label choosing.

8f64ec3

Get rid of trying to sync preferred label algorithm.

67a46c9

gaurav removed the request for review from cbizon November 7, 2024 15:56

gaurav added 2 commits November 7, 2024 11:14

Revert "Get rid of trying to sync preferred label algorithm."

d09c560

This reverts commit 67a46c9.

Improve documentation.

5b2e20f

gaurav requested a review from cbizon November 7, 2024 16:25

gaurav and others added 14 commits November 7, 2024 15:16

Unreverted the simpler algorithm.

6e8d1fa

Increased demote_labels_longer_than to 40.

ab4404d

Reduced demote_labels_longer_than to 20.

7faf8aa

Revert "Unreverted the simpler algorithm."

78cfb42

This reverts commit 6e8d1fa.

Support Babel's preferred labels for conflated cliques.

a867dea

Improved code, maybe fixed bug.

b0021f1

More bugfixes.

3e13196

Oops.

3312137

More bugfixes.

20d1fd2

Reduced demote_labels_longer_than to 15.

4cf160d

This will bring it in line with Babel.

Removed on:push trigger after testing.

0ea9919

Slightly improved algorithm to avoid unnecessary queries to Redis.

fa32307

Added on:push for testing.

4800f1d

Removed on:push trigger.

6fb7675

gaurav merged commit 73b56dc into master Nov 8, 2024

gaurav deleted the sync-nodenorm-label-with-babel branch November 8, 2024 05:35
