Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versification dictionary fills up with custom versifications and causes exception to be thrown #1367

Open
Enkidu93 opened this issue Dec 5, 2024 · 11 comments

Comments

@Enkidu93
Copy link

Enkidu93 commented Dec 5, 2024

We're encountering this issue in Machine in the Machine/Serval/ScriptureForge pipeline. See here.

A quick fix could be as simple as changing the Add in this line to a TryAdd. But given that we really don't need the Load method to save versifications to an internal dictionary for our use-case, a better solution might be to just have a Load function that does not add anything to the dictionary.

(I'm happy to code up a solution and put in a PR myself if I have the go-ahead).

@tombogle
Copy link
Contributor

tombogle commented Dec 5, 2024

Anyone can create a PR! (Not everyone can get past the review...)

@tombogle
Copy link
Contributor

tombogle commented Dec 5, 2024

I am a little curious as to why you're needing to create a bunch of custom versifications. Usually, those are pretty rare. But your use case already sounds a little different.

@Enkidu93
Copy link
Author

Enkidu93 commented Dec 5, 2024

Anyone can create a PR! (Not everyone can get past the review...)

OK! I look forward to it haha

I am a little curious as to why you're needing to create a bunch of custom versifications. Usually, those are pretty rare. But your use case already sounds a little different.

It's possible that some of these are the result of test projects - not sure. In any case, we'd like Machine to be robust in these circumstances.

What's the rationale for storing the versifications in the static dictionary in the first place? Caching?

@tombogle
Copy link
Contributor

tombogle commented Dec 5, 2024

I'm not 100% sure, but I believe the static caching has two purposes: performance and also to ensure referential equality. I think we want to ensure that each distinct versification is only loaded once so we do not have to deep-compare two identical versifications. @FoolRunning could probably shed more light on that and correct me if I'm wrong.

@FoolRunning
Copy link
Collaborator

FoolRunning commented Dec 6, 2024

Yes, it is for caching. The ScrVers is the wrapper for the versification with the internal mappings being done through the Versification class (The Versification class should never be referenced directly in an application). The ScrVers class is serializable so it can be deserialized from an XML file. What we didn't want is for each created instance of ScrVers to create a new copy of the versification tables (which would be very wasteful). This is the reason that all versifications are cached until the program is exited (i.e. in a static variable).

Getting a "duplicate key" exception suggests that the versification table is not be used correctly (Not your fault - there isn't any documentation 😅). You shouldn't ever load the same versification table more than once. If a versification is customized, it should have a unique name (usually the ID of the scripture project).

As for having it fill up, that is usually not a problem for most applications. I'm not sure what your application does, but if you're filling up with thousands of custom versifications, then one option would be to create a PR with a new method on Versification.Table that can remove a cached versification - but the application would have to be very sure that there are no ScrVers instances floating around referencing a removed Versification instance.

EDIT: Forgot to mention that you should, ideally, not be calling Versification.Table.Load at all - you should just be creating ScrVers implementations (giving it a name or type) and let that handle the loading.

@FoolRunning
Copy link
Collaborator

FoolRunning commented Dec 6, 2024

For reference, here is simplified version of Paratext's implementation of the versification table (set via Versification.Table.Implementation) that handles custom versifications from projects (The name contains a "-" followed by the ID of the project):

public class ParatextVersificationTable : Versification.Table
{
    #region Cosntants
    public const string customVersFilename = "custom.vrs";
    internal const char customVersificationSeparator = '-';
    #endregion

    #region Overrides of Table
    protected override Versification Get(string versName)
    {
        if (!Exists(versName))
        {
            LoadVersification(ref versName);
        }

        return base.Get(versName);
    }

    public override bool VersificationFileExists(string versName)
    {
        string[] parts = versName.Split(new[] { customVersificationSeparator }, 2);
        if (parts.Length == 1)
            return base.VersificationFileExists(versName); // Not a custom versification

        ScrText scrText = ScrTextCollection.FindById(HexId.FromStrSafe(parts[1]), parts[1]);
        return scrText != null && scrText.FileManager.Exists(customVersFilename);
    }

    protected override bool HandleVersificationLineError(InvalidVersificationLineException ex)
    {
        string frmMsg = GetLocalizedErrorMessage(ex);
        string message = frmMsg != null ? string.Format(frmMsg, ex.LineText, ex.FileName) : ex.Message;

        message += "\n\n" + Localizer.Str("ParatextVersificationTable_1", "For more information, search help for 'custom versification'.");

        Alert.Show(message, string.Format("Error in versification for project: {0}"), Path.GetDirectoryName(ex.FileName), AlertLevel.Warning);

        return true;
    }
    #endregion

    #region Private helper methods
    private static string GetLocalizedErrorMessage(InvalidVersificationLineException ex)
    {
        switch (ex.Type)
        {
            case VersificationLoadErrorType.InvalidSyntax:
                return "Invalid syntax in line [{0}]\nFile: {1}";
            case VersificationLoadErrorType.MissingName:
                return "Versification file must contain a name line";
            case VersificationLoadErrorType.NoSegmentsDefined:
                return "No segments defined in line [{0}]\nFile: {1}";
            case VersificationLoadErrorType.DuplicateExcludedVerse:
                return "Duplicate excluded verse in line [{0}]\nFile: {1}";
            case VersificationLoadErrorType.DuplicateSegment:
                return "Duplicate verse segment in line [{0}]\nFile: {1}";
            case VersificationLoadErrorType.InvalidManyToOneMap:
                return "Must map to or from a single verse in line [{0}]\nFile: {1}";
            case VersificationLoadErrorType.UnspecifiedSegmentLocation:
                return "Special '-' segment must be first segment in line [{0}]\nFile: {1}";
            default: return null;
        }
    }

    private void LoadVersification(ref string versName)
    {
        // If a custom versification was asked for, make sure it's loaded before we attempt to get it.
        string[] parts = versName.Split(new[] { customVersificationSeparator }, 2);
        if (parts.Length > 1)
        {
            ScrVersType versType = EnumUtils.ParseString<ScrVersType>(parts[0]);
            if (versType == ScrVersType.Unknown)
                versType = ScrVersType.English;

            ScrVers baseVers = new ScrVers(versType);
            string nameOrId = parts[1];
            ScrText scrText = ScrTextCollection.FindById(HexId.FromStrSafe(nameOrId), nameOrId);
            if (scrText == null || !scrText.FileManager.Exists(customVersFilename))
                versName = parts[0]; // project does not have a custom versification, so fallback to the built-in version
            else
            {
                using (TextReader reader = scrText.FileManager.OpenFileForRead(customVersFilename))
                {
                    // path for custom.sty is just used for error messages - see HandleVersificationLineError
                    Load(reader, Path.Combine(scrText.Name, customVersFilename), baseVers, versName);
                }
            }
        }
    }
    #endregion
}

Hopefully that clears up some things.

EDIT: If you're using ParatextData in your application, this implementation is there, it just needs to be set:
Versification.Table.Implementation = new ParatextVersificationTable()
or, ideally, you are calling ParatextData.Initialize (which also sets it).

@Enkidu93
Copy link
Author

Enkidu93 commented Dec 9, 2024

Yes, it is for caching. The ScrVers is the wrapper for the versification with the internal mappings being done through the Versification class (The Versification class should never be referenced directly in an application). The ScrVers class is serializable so it can be deserialized from an XML file. What we didn't want is for each created instance of ScrVers to create a new copy of the versification tables (which would be very wasteful). This is the reason that all versifications are cached until the program is exited (i.e. in a static variable).

Getting a "duplicate key" exception suggests that the versification table is not be used correctly (Not your fault - there isn't any documentation 😅). You shouldn't ever load the same versification table more than once. If a versification is customized, it should have a unique name (usually the ID of the scripture project).

As for having it fill up, that is usually not a problem for most applications. I'm not sure what your application does, but if you're filling up with thousands of custom versifications, then one option would be to create a PR with a new method on Versification.Table that can remove a cached versification - but the application would have to be very sure that there are no ScrVers instances floating around referencing a removed Versification instance.

EDIT: Forgot to mention that you should, ideally, not be calling Versification.Table.Load at all - you should just be creating ScrVers implementations (giving it a name or type) and let that handle the loading.

This is very helpful - thank you! Just now seeing your edits. Sounds like, yes, maybe there's a bug on our end since it sure seems like this is an issue that shouldn't be happening.

Here's the relevant snippet of code on our end that's calling the Load method:

...
            var scrVersType = (int?)settingsDoc.Root.Element("Versification") ?? (int)ScrVersType.English;
            var versification = new ScrVers((ScrVersType)scrVersType);
            if (Exists("custom.vrs"))
            {
                var guid = (string)settingsDoc.Root.Element("Guid");
                string versName = ((ScrVersType)scrVersType).ToString() + "-" + guid;
                if (Versification.Table.Implementation.Exists(versName))
                {
                    versification = new ScrVers(versName);
                }
                else
                {
                    using (var reader = new StreamReader(Open("custom.vrs")))
                    {
                        versification = Versification.Table.Implementation.Load(
                            reader,
                            "custom.vrs",
                            versification,
                            versName
                        );
                    }
                }
            }
...

Does something stand out to you about this as incorrect, @FoolRunning?

@FoolRunning
Copy link
Collaborator

FoolRunning commented Dec 9, 2024

@Enkidu93, Yeah, this is what I was trying to say. You shouldn't be loading the versifications like that. You should just always be doing versification = new ScrVers(versName) and having the ScrVers makes a call to the Versification.Table and it determines if it needs to load. That's why I gave the implementation of the versification table that handles loading of custom versifications - you'll need to do that instead of calling Versification.Table.Implementation.Load - you should never have to directly call load in your application.
If that's sill not clear, I can try post more code.

@Enkidu93
Copy link
Author

@Enkidu93, Yeah, this is what I was trying to say. You shouldn't be loading the versifications like that. You should just always be doing versification = new ScrVers(versName) and having the ScrVers makes a call to the Versification.Table and it determines if it needs to load. That's why I gave the implementation of the versification table that handles loading of custom versifications - you'll need to do that instead of calling Versification.Table.Implementation.Load - you should never have to directly call load in your application. If that's sill not clear, I can try post more code.

OK, I think I'm understanding now. So we'll need to create a custom Versification.Table.Implementation that mirrors the code you've pasted above. So there's no default behavior that appropriately handles custom versifications? I still feel like I'm missing something or that our code is calling the library improperly more than just in how it's calling Load directly.

@Enkidu93
Copy link
Author

@ddaspit, see here.

@FoolRunning
Copy link
Collaborator

@Enkidu93, Sorry for the slow response.
Yes, the default implementation does not handle custom versifications so you have to create an implementation that does handle them. The reason for this is that it's up to each application to figure out how to handle custom versifications. Since you're loading custom versifications from Paratext, it's probably best to do it in a similar way. As I said before, you can make it much easier on yourself by just using the implementation in ParatextData.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants