Weird behavior for string operations with escaped characters #115

nusakan · 2024-08-29T22:50:26Z

EYE v10.19.7 (2024-08-24)
SWI-Prolog version 9.0.4

I may be wrong but some eye reasoner behaviors may be incorrect.

Escaped characters issue

Some builtin string: <http://www.w3.org/2000/10/swap/string#> operations have unexpected behaviors when used with strings containing backslashes ("\\", "\t" ...) characters, especially when regular expressions are involved.

All provided examples are using the Notation3 syntax. (for which the single backslash "\" is not a valid syntax).

For example:

Expression	Expected	Obtained
`"\n" string:length ?v`	`?v` is `1`	`?v` is `2`
`("a\\b" "(\\\\)" "-") string:replace ?v`	`?v` is `"a-b"`	`?v` is `"a--b"`
`("a\\b" "(.*)") string:scrape "a\\b"`	Match	No match
Other not tested string manipulation operations	whatever	Not confident...

A quick fix may be implemented for specific builtin core string functions.

However, it seems that the internal representation of literals relies on the N3 strings serialization syntax:
"\t" is represented as \ and t which produces these weird behaviors (even "segfault"s in some more complex cases).

So the quick fix may not be way to go: it introduces supplementary computational efforts, and only addresses partially the issue.

Thanks for your attention (and all the work you have already done).

Quick & dirty fix (`v10.19.7` - for illustration purpose only).

Not for production use - it may introduce critical issues, and does not solve all the problems
Only for string builtin functions described here: https://w3c-cg.github.io/n3Builtins/ (string:scrape, string:replace, string:length)
Some statements wrongly match (see Sample test cases below )
the escape_atoms rule ensure the conversion

%
% New rule
%
escape_atoms(X, Y) :-
    when(
        (   ground(X)
        ),
        (   atom_codes(X,W),
            escape_codes(V,W),
            atom_codes(Y,V)
        )
    ),
    when(
        (   ground(Y)
        ),
        (   atom_codes(Y,V),
            escape_codes(V,W),
            atom_codes(X,W)
        )
    ).

%
% Modified rules
%
'<http://www.w3.org/2000/10/swap/string#length>'(literal(A, _), B) :-
    when(
        (   ground(A)
        ),
        (   escape_atoms(A,V),
            sub_atom(V, 0, B, 0, _)
        )
    ).

'<http://www.w3.org/2000/10/swap/string#replace>'([literal(X, _), literal(Search, _), literal(Replace, _)], literal(Y, type('<http://www.w3.org/2001/XMLSchema#string>'))) :-
    when(
        (   ground([X, Search, Replace])
        ),
        (   (   atomic_list_concat(['(', Search, ')'], Se),
                regex(Se, X, [St|_]),    % Convert back to unescaped 
                escape_atoms(S,St)   % strings before processing
            ->  (   sub_atom(Search, 0, 1, _, '^')
                ->  atom_concat(S, T, X),
                    atom_concat(Replace, T, Y)
                ;   (   sub_atom(Search, _, 1, 0, '$')
                    ->  atom_concat(T, S, X),
                        atom_concat(T, Replace, Y)
                    ;   atom_codes(X, XC),
                        string_codes(S, SC),
                        atom_codes(Replace, RC),
                        subst([[[0'$, 0'1], SC]], RC, TC),
                        subst([[SC, TC]], XC, YC),
                        atom_codes(Y, YC)
                    )
                )
            ;   Y = X
            )
        )
    ).

'<http://www.w3.org/2000/10/swap/string#scrape>'([literal(X, _), literal(Y, _)], literal(Z, type('<http://www.w3.org/2001/XMLSchema#string>'))) :-
    when(
        (   ground([X, Y])
        ),
        (   regex(Y, X, [W|_]),
            atom_string(V,W),
            escape_atoms(Z,V)
        )
    ).

Sample test cases

@prefix string: <http://www.w3.org/2000/10/swap/string#> .
@prefix : <http://example.org/test#>.

{
  "\t" string:length 1 .
  ("\\" "\t") string:concatenation "\\\t" .
  "\\\t" string:contains "\\\t" .
  "x\\A\tx" string:containsIgnoringCase "\\a\t" .
  "end\\" string:endsWith "\\" .
  "\\A\t" string:equalIgnoringCase "\\a\t" .
  ("[%s]" "\\\t\n") string:format "[\\\t\n]" .
  "\\\t1" string:greaterThan "\\\t0" .
  "\\\t0" string:lessThan "\\\t1" .
  "\\" string:matches "\\\\" .
  "\t" string:matches "\\t" .
  # "\t" string:matches "\t" .      # WARNING: wrongly resolved with no error
  "A\tB" string:notEqualIgnoringCase "a\\tB" .
  "\\\t" string:notGreaterThan "\\\t" .
  "\\\t" string:notLessThan "\\\t" .
  "\\\t" string:notMatches "\\n" .
  # "\\\t" string:notMatches "\t" . # WARNING: wrongly resolved with no error
  ("a\\b" "(\\\\)" "\t[$1]\t") string:replace "a\t[\\]\tb" .
  ("\\" "(.*)") string:scrape "\\" .
  ("\\" "(\\\\)") string:scrape "\\" .
  ("\t" "(.*)") string:scrape "\t" .
  ("\t" "(.*)")!string:scrape string:length 1 .
  "\\start" string:startsWith "\\" .
} => {
  :test :passed true .
} .

The text was updated successfully, but these errors were encountered:

josd · 2024-08-30T19:16:06Z

Thank you very much for your fix which is more or less done in the latest eye and lingua.

nusakan · 2024-09-01T00:04:55Z

Thank you for your fast commit.

It seems however that there are still few failures. I've written a small test suite in Notation3 (extended to other string: rules), covering a larger scope (like false positives/negatives or errors).

I've uploaded here: string_test.n3.txt
(renamed to .txt as github does not recognize .n3 as text files)

You may run it with:

eye --quiet --nope string_test.n3

This test suite may not be quiet exact (I've not found an explicit documentation for the all the string: rules), you may review it before considering the test should pass or fail.

Here are the failing properties as exposed by the test suite alongside the involved tests (see the test suite for reference) (some core builtin rules are still there) with EYE v10.19.8 (2024-08-30) (master branch):

@prefix string: <http://www.w3.org/2000/10/swap/string#>.
@prefix : <http://example.com/string_test.n3#>.

string:greaterThan :fail (:t090 :t091 :t092 :t093 :t094 :t095).
string:lessThan :fail (:t110 :t111 :t112 :t113 :t114 :t115).
string:matches :fail (:t140).
string:notContainsRoughly :fail (:t150).
string:notGreaterThan :fail (:t173 :t174 :t175 :t170 :t171 :t172).
string:notLessThan :fail (:t183 :t184 :t185 :t180 :t181 :t182).
string:notMatches :fail (:t190).
string:replaceAll :fail (:t210 :t211 :t215).
string:scrapeAll :fail (:t231 :t232).
string:search :fail (:t244).
string:substring :fail (:t261 :t262).
string:upperCase :fail (:t270).
string:contains :fail (:t020).
string:containsIgnoringCase :fail (:t030).
string:containsRoughly :fail (:t040).
string:endsWith :fail (:t050).
string:format :error (:t081).

Best regards

josd · 2024-09-01T20:53:06Z

Thank you for your fast test development👍
Your cases should also perfectly fit in https://github.com/eyereasoner/eye/blob/master/reasoning/bi/biP.n3#L111
Anyway, I first try to fix the issues and some 30 of them are now fixed in https://github.com/eyereasoner/eye/releases/tag/v10.19.9 and https://github.com/eyereasoner/lingua/releases/tag/v1.4.1

The remaining ones are now

string:matches :fail (:t140).
string:notMatches :fail (:t190).
string:replaceAll :fail (:t210 :t215).
string:scrapeAll :fail (:t231 :t232).
string:search :fail (:t244).
string:substring :fail (:t262).
string:format :error (:t081).
string:replaceAll :error (:t211 :t212).

josd · 2024-09-03T22:38:48Z

We are now at

string:matches :fail (:t140).
string:notMatches :fail (:t190).
string:replaceAll :fail (:t210 :t215).
string:scrapeAll :fail (:t231 :t232).
string:search :fail (:t244).

nusakan · 2024-09-04T08:21:05Z

It appears that raised errors are independent of tests success or failure.

Errors such as:

** ERROR ** eam ** error(syntax_error(missing closing parenthesis),_624)
[...]

.. are raised with the following passing tests:

:t203 :assertNone { ("a\tb" "\\" "-") string:replace "a-tb" } .
:t213 :assertNone { ("a\tb" ("\\") ("-")) string:replaceAll "a-tb" } .
:t223 :assertNone { ("a\tb" "(\\)" ) string:scrape ?t223v } .
:t234 :assertNone { ("a\tb\nc" "(\\)") string:scrapeAll ("t") } .

While the following failing tests do not raise any error:

:t215 :assertNone { ("\tb" ("\t" "b") ("x" "y")) string:replaceAll "xy" } .
:t232 :assertNone { ("a\tb\nc" "(\t|\n)") string:scrapeAll ("\t" "\n") } .
:t244 :assertNone { ("a\\\t" "(\\+)") string:search ?t244v } .

I see no problem for regular expressions expressed with wrong syntax to have errors turned into warning messages, provided it follows the SNAF rule, as it provides substantial information (even if hard to debug). However for consistency reasons, the same behavior should be adopted in any case.

josd · 2024-09-04T14:28:08Z

I guess those exceptions might have to do with SWI-Prolog/packages-pcre#24 so will stay tuned.
After a few tweaks we are now at

string:replaceAll :fail (:t211 :t215).
string:scrapeAll :fail (:t231 :t232).
string:search :fail (:t244).

josd added a commit that referenced this issue Aug 30, 2024

fixing issue #115

1cccfa1

josd added a commit to eyereasoner/lingua that referenced this issue Aug 30, 2024

fixing eye issue eyereasoner/eye#115

469b4d2

josd added a commit that referenced this issue Sep 1, 2024

further fixing issue #115

0e28f21

josd added a commit to eyereasoner/lingua that referenced this issue Sep 1, 2024

further fixing issue eyereasoner/eye#115

e504421

josd added a commit that referenced this issue Sep 3, 2024

further fixing issue #115

8176540

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Weird behavior for string operations with escaped characters #115

Weird behavior for string operations with escaped characters #115

nusakan commented Aug 29, 2024 •

edited

Loading

josd commented Aug 30, 2024

nusakan commented Sep 1, 2024

josd commented Sep 1, 2024

josd commented Sep 3, 2024 •

edited

Loading

nusakan commented Sep 4, 2024

josd commented Sep 4, 2024

Weird behavior for string operations with escaped characters #115

Weird behavior for string operations with escaped characters #115

Comments

nusakan commented Aug 29, 2024 • edited Loading

Escaped characters issue

Quick & dirty fix (v10.19.7 - for illustration purpose only).

Sample test cases

josd commented Aug 30, 2024

nusakan commented Sep 1, 2024

josd commented Sep 1, 2024

josd commented Sep 3, 2024 • edited Loading

nusakan commented Sep 4, 2024

josd commented Sep 4, 2024

nusakan commented Aug 29, 2024 •

edited

Loading

Quick & dirty fix (`v10.19.7` - for illustration purpose only).

josd commented Sep 3, 2024 •

edited

Loading