
Matching proximate phrases and precedence for dictionary combinations for actors #66

Open
philip-schrodt opened this issue Dec 14, 2018 · 0 comments


In the Arabic validation set arabic_gsr_validation_18-11-14.xml, sentence 5b6757616203c433883a1f0b produces a target actor coded USAMED, whereas the actual target is "American soldiers" (جندي_أميركي_), which should code to USAMIL. The MED (media) agent comes from the word موقع (site/location), which appears in both the sentence and the agent dictionary, and which a chain of dependencies (possibly the result of a parsing error) connects to أميركي (American).

However, the phrase جندي_أميركي_ is in the actor dictionary and should have taken precedence: once a country-agent combination has been matched, there is no need to look further for agents. At least, this is how TABARI and PETR-1 worked, so it is still implicit in the UDP dictionaries. Also, if multiple agents are present, the more proximate one should take priority (جندي (soldiers) is in the agent dictionary), or at the very least, if agents were being concatenated, you would get USAMILMED or USAMEDMIL.

This is, granted, a somewhat odd situation, since موقع probably shouldn't be in the agent dictionary in the first place: it is too general. (It is there, presumably, as a synonym for موقع_إلكتروني (website), and got there via automated translation.) But the agent-assignment precedence rules for dictionaries and proximity apply more generally.
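
To make the two precedence rules concrete, here is a minimal sketch in Python. It is not the actual UniversalPetrarch code: the dictionaries, the function name `code_actor`, and the use of linear token distance as a stand-in for dependency-chain proximity are all assumptions made for illustration.

```python
from typing import List, Optional

# Hypothetical toy dictionaries; the codes mirror the ones discussed in this issue.
ACTOR_PHRASES = {("جندي", "أميركي"): "USAMIL"}   # actor dictionary (multi-word phrases)
COUNTRY_WORDS = {"أميركي": "USA"}                # bare country words
AGENT_WORDS = {"جندي": "MIL", "موقع": "MED"}     # agent dictionary


def code_actor(tokens: List[str], country_idx: int) -> Optional[str]:
    """Code the actor anchored at tokens[country_idx] (illustrative only)."""
    # Rule 1: an actor-dictionary phrase that covers the country token wins
    # outright; no further agent lookup is done.
    for phrase, code in ACTOR_PHRASES.items():
        n = len(phrase)
        first = max(0, country_idx - n + 1)
        last = min(country_idx, len(tokens) - n)
        for start in range(first, last + 1):
            if tuple(tokens[start:start + n]) == phrase:
                return code

    country = COUNTRY_WORDS.get(tokens[country_idx])
    if country is None:
        return None

    # Rule 2: otherwise attach only the nearest agent. Token distance is an
    # assumed proxy for proximity; ties go to the earlier token.
    agents = [(abs(i - country_idx), i)
              for i, tok in enumerate(tokens) if tok in AGENT_WORDS]
    if not agents:
        return country
    _, nearest = min(agents)
    return country + AGENT_WORDS[tokens[nearest]]
```

With these toy dictionaries, `code_actor(["موقع", "جندي", "أميركي"], 2)` returns `USAMIL` via the phrase match, even though موقع is also present as an agent. A real fix would presumably measure proximity over the dependency parse rather than token positions.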
