filtering w/o child selection #109

Open
grzegorz-herman opened this issue Jul 19, 2021 · 22 comments

@grzegorz-herman

First of all, thanks for the standardization efforts!

Unless I am missing something, the current draft (https://datatracker.ietf.org/doc/html/draft-ietf-jsonpath-base-01) allows filtering only in the context of child selection. I understand the semantics of [?(<predicate>)] as follows:

  • we have a current list of items (as produced by selectors earlier on the path),
  • for each of the items, we examine its subitems - elements if it is an array, values if it is an object, none otherwise,
  • we evaluate <predicate> on each subitem (with the special selector @ denoting the subitem),
  • the resulting list consists of those subitems for which the predicate evaluates to (something) true.
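
A minimal Python sketch of the semantics listed above, as this post reads them. The names `subitems` and `child_filter` and the sample input are illustrative assumptions, not taken from the draft; a Python callable stands in for <predicate>.

```python
def subitems(item):
    """Elements of an array, values of an object, nothing for anything else."""
    if isinstance(item, list):
        return list(item)
    if isinstance(item, dict):
        return list(item.values())
    return []


def child_filter(items, predicate):
    """[?(<predicate>)] as read above: test every subitem of every current item."""
    return [sub for item in items for sub in subitems(item) if predicate(sub)]


# Hypothetical current list: an array, an object, and a primitive.
current = [[1, 20, 3], {"x": 40, "y": 5}, 60]

# The lambda stands in for a predicate such as @ > 10.
selected = child_filter(current, lambda at: at > 10)
print(selected)  # [20, 40]
```

Note that 60 is dropped even though it satisfies the predicate: under this reading the filter only ever looks at subitems, never at the current items themselves.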

The problem I see is that there seems to be absolutely no way to filter the current list of items. In some situations one can work around it, but sometimes not. Assume I have a value

{
  "a": { "v": true, "w": true },
  "b": { "v": false, "w": true },
  "c": { "v": true, "w": false },
  "d": { "v": true, "w": true }
}

and I would like to select those subobjects with keys "a", "b" or "c" (but not "d") whose element "v" is true (i.e., the resulting list should contain { "v": true, "w": true } and { "v": true, "w": false }). I cannot apply the filtering at the top level, as it would include the key "d"; I cannot apply it one level below, as it would have to include "w"; and there are no other levels.

Imagine that we had an independent filtering selector, say {<predicate>}, with the semantics of evaluating <predicate> on each current item (with @ denoting the item) and leaving those which satisfy it (note: I am not proposing this exact syntax here, just the semantics). Then I could intuitively solve the example above with $['a','b','c']{@.v}.

(as a side note, with the above the syntax [?(<predicate>)] would become equivalent to [*]{<predicate>})
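
To make the proposed semantics concrete, here is a rough Python sketch, assuming nodelists are plain Python lists of values. The helper names (`select_names`, `wildcard`, `independent_filter`, `v_is_true`) are hypothetical and only illustrate the idea; they are not part of any draft or implementation.

```python
doc = {
    "a": {"v": True, "w": True},
    "b": {"v": False, "w": True},
    "c": {"v": True, "w": False},
    "d": {"v": True, "w": True},
}


def select_names(items, names):
    """['a','b','c']: for each object in the list, pick the named members."""
    return [item[n] for item in items if isinstance(item, dict)
            for n in names if n in item]


def wildcard(items):
    """[*]: all elements of arrays and all values of objects."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(item)
        elif isinstance(item, dict):
            out.extend(item.values())
    return out


def independent_filter(items, predicate):
    """{<predicate>}: keep the current items themselves when they match."""
    return [item for item in items if predicate(item)]


def v_is_true(at):
    """Stand-in for the predicate @.v."""
    return isinstance(at, dict) and at.get("v") is True


# $['a','b','c']{@.v}
result = independent_filter(select_names([doc], ["a", "b", "c"]), v_is_true)

# $[*]{@.v}, which the side note equates with $[?(@.v)]
via_wildcard = independent_filter(wildcard([doc]), v_is_true)
```

Here `result` holds the subobjects under "a" and "c", while `via_wildcard` additionally contains the one under "d".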

Some implementations (jayway IIRC) actually “solve” the problem above by applying the filtering predicate in [?(...)] to items or subitems, apparently depending on whether the items are arrays or not. IMO following this path in the standard is a bad idea, as it can quickly become confusing (what if the list contains a mixture of arrays, objects, and primitives?).

Please, consider adding an independent filtering selector to the standard!

(as another side note, binding the “current item” @ is currently also attached to filtering; do you think it would be worthwhile to make it independent as well?)

@gregsdennis
Collaborator

gregsdennis commented Jul 19, 2021

You can already do this in the existing syntax:

$['a','b','c'][?(@.v)]

This first selects the nodes under a, b, and c. Then, from that resulting set, it selects all the values that have v = true. It produces the following nodes:

  • $.a
  • $.c

The one thing we don't have a syntax for is negative selection, e.g. selecting everything except d. Interestingly this SO question recently popped up asking for something similar.

@grzegorz-herman
Author

@gregsdennis, IMO what you suggest might work only by accident (or deliberate extension of the semantics by some implementations), but according to the wording of the current draft it should not. Please correct my reasoning in the following:

  • “a selector acts on each of the nodes in its input nodelist and concatenates the resultant nodelists to form the result nodelist of the selector” (3.4, third paragraph) - we will thus be working with nodelists (not to be confused with arrays);
  • “The root selector "$" [...] produces as output a list consisting of one node: the argument itself.” (3.4, first paragraph) - the first nodelist has one element, the node with the whole document;
  • “A union [...] selects the concatenation of the lists (in the order of the selectors) of nodes selected by the union elements.” (3.5.8.2). The nodelists for each of the indices "a", "b" and "c" each consist of a single entry (the appropriate subobject of the root), and thus the nodelist after this selector contains these three subobjects;
  • “During iteration process each array element or object member is visited and its value -- accessible via symbol "@" -- [...] is tested against a boolean expression” (3.5.9.1, second paragraph) - for each of the three nodes in the current nodelist (root subobjects keyed by "a", "b" and "c"), the items bound to @ by the filter are the values in the node, which are all booleans; none of them contains a member named "v", and thus the resulting nodelist is empty.
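
The four steps above can be traced in a small Python sketch. This is only one reading of the quoted draft text; the function names are made up for illustration, and nodelists are modelled as plain lists of values.

```python
doc = {
    "a": {"v": True, "w": True},
    "b": {"v": False, "w": True},
    "c": {"v": True, "w": False},
    "d": {"v": True, "w": True},
}


def root(document):
    """$: a nodelist holding one node, the argument itself."""
    return [document]


def union_of_names(nodelist, names):
    """['a','b','c']: concatenate, per input node, the nodes picked by each name."""
    return [node[n] for node in nodelist if isinstance(node, dict)
            for n in names if n in node]


def child_filter(nodelist, predicate):
    """[?(...)]: visit the children of each input node, binding each to @."""
    out = []
    for node in nodelist:
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = list(node)
        else:
            children = []
        out.extend(child for child in children if predicate(child))
    return out


def has_true_v(at):
    """Stand-in for @.v: only an object can have a member named "v"."""
    return isinstance(at, dict) and bool(at.get("v"))


step1 = root(doc)                             # one node: the whole document
step2 = union_of_names(step1, ["a", "b", "c"])  # three subobjects
step3 = child_filter(step2, has_true_v)       # @ is bound to booleans only
print(step3)  # []
```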

@gregsdennis
Collaborator

gregsdennis commented Jul 19, 2021

Yeah, I think you're right. Mine would return nothing.

What you're looking for is more along the lines of

$[?(@.v)]

except that it omits $.d.

We have discussed allowing @ to have access to the key value. There's a proposal somewhere for a function like key(@) within an expression to capture a key when iterating over objects. We didn't go into that deeply because we decided to postpone defining functions for now.

If such a thing were to be supported, you could do

$[?(key(@) != 'd' && @.v)]
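
As a rough illustration of what such a filter would need, the Python sketch below passes both the key and the value to the predicate. The `key` parameter stands in for the hypothetical key(@); nothing here reflects an agreed design.

```python
doc = {
    "a": {"v": True, "w": True},
    "b": {"v": False, "w": True},
    "c": {"v": True, "w": False},
    "d": {"v": True, "w": True},
}


def child_filter_with_key(node, predicate):
    """Filter children of one node; predicate sees (key, value) of each child."""
    if isinstance(node, dict):
        members = list(node.items())
    elif isinstance(node, list):
        members = list(enumerate(node))
    else:
        members = []
    return [value for key, value in members if predicate(key, value)]


# Stand-in for $[?(key(@) != 'd' && @.v)]
result = child_filter_with_key(
    doc,
    lambda key, at: key != "d" and isinstance(at, dict) and bool(at.get("v")),
)
```

With the example value from the first post, `result` contains the subobjects under "a" and "c".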

@grzegorz-herman
Author

Thanks for the answer. Yes, the examples in the draft (Table 6) have the function index, which seems to do that. However, from the point of view of pure semantics, this solution feels more like a workaround: why should I use some special function to filter on the key, when there is a dedicated “core” syntax for selecting subobjects by key?

And then, even such a key-accessing function would not allow writing a path with the desired semantics of “select the whole (root) object if it satisfies the given predicate”, as there would be no place where the initial singleton nodelist could be made into an array to be filtered.

Please note that I am not looking for a solution to any specific problem. My actual use case needs a special JSONPath implementation anyway, so I have simply extended what is currently in the draft by independent filtering (using the {<predicate>} syntax). I am just very much into programming languages, and I hope my somewhat “purist” remarks might help to make the standard even better.

By the way, I would be interested to learn how much the standardization effort here is (only) aimed at codifying what the existing implementations are doing, or, in other words, how strong the requirement of backwards compatibility with the status quo is. I see some possibilities for streamlining the semantics, but things could potentially break here and there, so I am uncertain if it even makes sense to share those here.

@gregsdennis
Collaborator

You'd probably be interested in reviewing our charter, and in the existing project that compares various implementations' levels of support.

In summary, we are interested in not breaking existing implementations as much as possible, but that's secondary to creating a query language that makes sense. For example, @goessner's original blog post mentions that expressions are to support the syntax of the environment in which they're run (e.g. JavaScript, Python, etc.). However, we determined that this doesn't support interoperability well, so we have to define our own syntax for expressions.

@goessner
Collaborator

@grzegorz-herman : You are correctly pointing to a certain insufficiency of JSONPath here.

I would like to elaborate here a little more. Both arguments ...

{
    "a": 1,
    "b": 2,
    "c": 3
}

[
    1,
    2,
    3
]

will yield the identical value list [1,2,3] for the query $[?(@)], though with different node lists.
So this conforms to the spec ...

During iteration process each array element or object member is
visited and its value -- accessible via symbol "@" -- or one of its
descendants -- uniquely defined by a relative path -- is tested
against a boolean expression "boolean-expr".

So it is vitally important for @ to give us each value in order to build a relative path as in @.v above.
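
The difference between the value list and the node list can be sketched in Python as follows. The path notation (`$['a']`, `$[0]`) and the helper name `filter_children` are assumptions made for this illustration only.

```python
def filter_children(argument, predicate):
    """Return (path, value) pairs for the children that satisfy predicate."""
    if isinstance(argument, dict):
        members = [(f"$['{k}']", v) for k, v in argument.items()]
    elif isinstance(argument, list):
        members = [(f"$[{i}]", v) for i, v in enumerate(argument)]
    else:
        members = []
    return [(path, value) for path, value in members if predicate(value)]


# bool stands in for the predicate @ (keep every truthy value).
object_nodes = filter_children({"a": 1, "b": 2, "c": 3}, bool)
array_nodes = filter_children([1, 2, 3], bool)

object_values = [value for _, value in object_nodes]
array_values = [value for _, value in array_nodes]
object_paths = [path for path, _ in object_nodes]
array_paths = [path for path, _ in array_nodes]

print(object_values == array_values)  # True
print(object_paths == array_paths)    # False
```

The predicate only ever receives the value; the key or index is known to the iteration but not exposed through @.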

Note that this has nothing to do with the fact that we are iterating over the root node. The inability to filter against the index during iteration is inherent.

{
    "obj": {
        "a": 1,
        "b": 2,
        "c": 3
    }
}

also won't work via $.obj[?(@ != 'd')] or $[?(@[*] != 'd')] (useless anyway).

We simply have no mechanism yet to get the node-index during iteration.

Imagine we had a symbol § that gave us the node index instead of the node value during iteration. We could then write $.obj[?(§ != 'd')], but not $[?(§[*] != 'd')], even if § were smart enough to resolve correctly when it occurs at the start of a relative path. We would probably also need a complementary function value then. So I see no value in introducing a new syntax element here.

@grzegorz-herman: In this context I do not understand ...

... why should I use some special function to filter on the key, when there is a dedicated “core” syntax for selecting subobjects by key?

... as @ is unable to be resolved to its index (key).

Having functions like index in our expression language seems to gain importance through discussions like this. $.obj[?(index(@) != 'd')] would obviously do what we want.

... am I missing something here?

Thanks

--
sg

@danielaparker

danielaparker commented Sep 6, 2021

Assume I have a value

{
  "a": { "v": true, "w": true },
  "b": { "v": false, "w": true },
  "c": { "v": true, "w": false },
  "d": { "v": true, "w": true }
}

and I would like to select those subobjects with keys "a", "b" or "c" (but not "d") whose element "v" is true (i.e., the resulting list should contain { "v": true, "w": true } and { "v": true, "w": false }). I cannot apply the filtering at the top level, as it would include the key "d"; I cannot apply it one level below, as it would have to include "w"; and there are no other levels.

It may be interesting to compare how this requirement would be met in JSONiq, an ISO/IEC approved, OASIS standard. In the JSONiq syntax, one way would be:

let $data := {
  "a": { "v": true, "w": true },
  "b": { "v": false, "w": true },
  "c": { "v": true, "w": false },
  "d": { "v": true, "w": true }
}

for $key in jn:keys($data),
    $x in $data($key)
where ($key eq "a" or $key eq "b" or $key eq "c") and $x."v" eq true
return $x

Output:

{
  "v": true,
  "w": true
}
{
  "v": true,
  "w": false
}

@goessner
Collaborator

goessner commented Sep 8, 2021

... interesting ...

@key would be another syntax approach equivalent to function key(@) then.

@gregsdennis
Collaborator

I think I prefer the function syntax. It opens the door for other functions.

@danielaparker

danielaparker commented Sep 8, 2021

I think I prefer the function syntax. It opens the door for other functions.

@gregsdennis, I don't think it does open the door for other functions. This is based on prior experience with supporting functions in two JSONPath implementations (C++ and C#), one allowing for custom user-defined functions, and also on prior experience with two JMESPath implementations (C++ and C#) with built-in functions as per the JMESPath Specification.

I had earlier suggested key(@) here without fully thinking it through, but I no longer think it's a good idea. The difficulty is what it requires of the arguments passed to functions. Support for key(@) as a general function would require arguments to come in pairs, either path-value pairs or, minimally, key-value pairs. And of course not all function arguments are associated with paths. That makes for a messy function interface and departs from prior experience, which suggests that functions should operate on values only.

Of course, an implementation could support key(@), but I think it would be as a special case, using internal details unavailable to say, a user taking advantage of an extension function interface.
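
A small Python sketch of the interface problem described in this comment. The `Located` type and the functions `length` and `key` are hypothetical illustrations; they are not taken from any of the implementations mentioned.

```python
from typing import Any, NamedTuple, Optional


class Located(NamedTuple):
    """An argument carried as a key-value pair instead of a bare value."""
    key: Optional[object]  # member name or array index; None if there is none
    value: Any


def length(value):
    """A value-only function: works on any value, wherever it came from."""
    return len(value)


def key(argument: Located):
    """A key function needs the pair, and fails for arguments without a key."""
    if argument.key is None:
        raise ValueError("argument is not associated with a key")
    return argument.key


member = Located("a", {"v": True, "w": True})  # @ while iterating an object
literal = Located(None, "some literal")        # e.g. a literal or computed value

print(length(member.value))  # 2
print(key(member))           # a
```

Calling `key(literal)` raises `ValueError`, which is the kind of special case a value-only function interface never has to deal with.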

@remorhaz
Contributor

remorhaz commented Sep 8, 2021

I think I prefer the function syntax. It opens the door for other functions.

Introducing functions also brings a permanent temptation to add new functions to the list; that's not bad in itself, but it requires a good versioning strategy for JSONPath. And user-defined functions can cause interoperability problems. We should think twice before opening this door.

On the other hand, introducing a "current index" symbol binds the operation to the current context. With functions we can build queries like $.a.b[?(key(@.c.d)=="e")], but such a query can easily be replaced by $.a.b[?(@.c["e"].d)], so we probably don't need to detect the index in an arbitrary context.

@remorhaz
Contributor

remorhaz commented Sep 8, 2021

And if we use, for example, # as the current index symbol, we can also make it possible to access the whole path. For example (given the path a->b->c->d):

  • #[1, 3] selects c and a;
  • #[0] is equivalent to # and selects d;
  • #[0:2] selects d and c, and so on.

That can be quite a powerful syntax.
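
The three examples above are consistent with indexing the path from the current node backwards. A Python sketch of that reading, where the function name `hash_index` is made up for illustration:

```python
def hash_index(path, selector):
    """Index into the path, counting from the current node (0) towards the root."""
    nearest_first = path[::-1]
    if isinstance(selector, slice):
        return nearest_first[selector]
    if isinstance(selector, int):
        return [nearest_first[selector]]
    return [nearest_first[i] for i in selector]


path = ["a", "b", "c", "d"]  # a->b->c->d, root side first

print(hash_index(path, (1, 3)))       # ['c', 'a']
print(hash_index(path, 0))            # ['d']
print(hash_index(path, slice(0, 2)))  # ['d', 'c']
```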

@grzegorz-herman
Author

The discussion seems to focus more and more on accessing the current key/index. I just wanted to point out that, while such access would indeed solve the example problem, the real intent/question of this issue is different: should JSONPath have a filtering construct operating on the “current list of values”, not tied to child selection. I hope this question does not get lost. Thanks!

@remorhaz
Contributor

remorhaz commented Sep 8, 2021

should JSONPath have a filtering construct operating on the “current list of values”

I've re-read your initial post:

I would like to select those subobjects with keys "a", "b" or "c" (but not "d") whose element "v" is true

In fact, we may be lacking an intersection operation over unions. Some possible examples of this:

  1. $['a', 'b', 'c']#[?(@.v == true)] - operator applies over two (or more) separate unions;
  2. $['a', 'b', 'c' # ?(@.v == true)] - operator applies over two (or more) groups of selectors within a single union.

In both cases # (the symbol is arbitrary) acts as an intersection operator over nodelists, and the result of applying it is the list of nodes that exist in all of the nodelists.
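
A rough Python sketch of such an intersection, assuming nodes are identified by their path and both operands are evaluated against the same root. The helper names and the path notation are illustrative assumptions.

```python
doc = {
    "a": {"v": True, "w": True},
    "b": {"v": False, "w": True},
    "c": {"v": True, "w": False},
    "d": {"v": True, "w": True},
}


def by_names(document, names):
    """['a', 'b', 'c'] as a nodelist of (path, value) pairs."""
    return [(f"$['{n}']", document[n]) for n in names if n in document]


def by_filter(document, predicate):
    """[?(...)] as a nodelist of (path, value) pairs."""
    return [(f"$['{k}']", v) for k, v in document.items() if predicate(v)]


def intersect(first, *others):
    """Nodes present in every nodelist, in the order of the first one."""
    common = {path for path, _ in first}
    for nodelist in others:
        common &= {path for path, _ in nodelist}
    return [(path, value) for path, value in first if path in common]


def v_is_true(at):
    """Stand-in for @.v == true."""
    return isinstance(at, dict) and at.get("v") is True


result = intersect(by_names(doc, ["a", "b", "c"]), by_filter(doc, v_is_true))
print([path for path, _ in result])  # ["$['a']", "$['c']"]
```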

I don't know whether these examples look more "intuitive" than the semantics of "alternative filtering" originally proposed by @grzegorz-herman. Personally, I find the concept of intersection more natural. If we're working with sets, why not use classic set operations?

P.S.:

Some implementations (jayway IIRC) actually “solve” the problem above by applying the filtering predicate in [?(...)] to items or subitems, apparently depending on whether the items are arrays or not.

And so does my PHP implementation (which I've tried to keep as close as possible to JayWay's), but I think that it's absolutely ugly.

@grzegorz-herman
Author

@remorhaz:

In fact, we may be lacking an intersection operation over unions. Some possible examples of this:

  1. $['a', 'b', 'c']#[?(@.v == true)] - operator applies over two (or more) separate unions;
  2. $['a', 'b', 'c' # ?(@.v == true)] - operator applies over two (or more) groups of selectors within a single union.

I find the second version somewhat weird: on the left-hand side of the intersection operator we have indices, while on the right – a predicate on values.

You can view the first as either an intersection of two unions, or as a regular child selection (['a', 'b', 'c']) followed by the filtering I am suggesting (#[?(@.v == true)], in my non-syntax simply {@.v == true}). The latter has the advantage of allowing an additional selector in between, e.g. $['a','b'].g{@.v == true}.

Some implementations (jayway IIRC) actually “solve” the problem above by applying the filtering predicate in [?(...)] to items or subitems, apparently depending on whether the items are arrays or not.

And so does my PHP implementation (which I've tried to keep as close as possible to JayWay's), but I think that it's absolutely ugly.

I wholeheartedly agree!

@remorhaz
Contributor

remorhaz commented Sep 9, 2021

I find the second version somewhat weird: on the left-hand side of the intersection operator we have indices, while on the right – a predicate on values.

A union (at least conceptually) is just a combination of selectors; it allows you to mix any predicates, like this: $['a', ?(@.v == true), 'b'].

Update: I've looked over the draft and confirmed that a union is defined like this:

Union selector "[,,...,]", holding a comma delimited list of index, index wild card, array slice, and filter selectors.

@remorhaz
Contributor

remorhaz commented Sep 9, 2021

And yes, you're right that your {...} syntax is technically fully equivalent to my first variant #[...]; the only difference is that my # works as a binary "intersect" operator over unions (or their parts) that are already defined in the syntax, whereas you define a completely new selector. But what should it be called? How should its input node list be defined?

@danielaparker

@remorhaz, possibly follow the approach of XPath 2 and later and support union and intersect operators? Union ("|") to combine the nodes returned by two or more paths, intersect to return the nodes common to both paths? And allow these operators to be grouped with left and right parentheses?

It could be argued that this is a natural evolution of JSONPath, much as XPath 1 evolved. It could also be argued that that's not JSONPath anymore.

@remorhaz
Contributor

remorhaz commented Sep 9, 2021

@danielaparker, yes, the "problem" is that union syntax evolved naturally to [..., ...] in JSONPath, where , works like | in XPath and there's no grouping. I would like to find a way not to break this paradigm.

In fact, we can place all this grammar into union-entry. We can make two entries:

  • "nested union" entry (.., ...);
  • binary intersect operator over entries.

And get something like this:

$['a', ('b', 'c') intersect ?(@.val == true)]

That could be translated to following functions:

union(
  index('a'),
  intersection(
    union(
      index('b'),
      index('c')
    ),
    filter(@.val == true)
  )
)

Of course this syntax has tons of pitfalls, but it follows the XPath way in general. But we can also go another way:

$['a', {['b', 'c'], ?(@.val == true)}]

In other words, if we defined union like [..., ...], then why not follow the naturally evolved pattern and define intersection like {..., ...}? And I see no problems in nesting these structures.

In this case, the solution to @grzegorz-herman's problem would be written like this:

${['a', 'b', 'c'], ?(@.val == true)}

What would you say?
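
For illustration, the functional translation given earlier in this comment can be sketched as composable Python selectors. Everything here is an assumption made for the sketch: the combinator names mirror the pseudocode, nodes are (key, value) pairs, `filter_` has a trailing underscore to avoid shadowing Python's built-in, and the sample document is invented.

```python
def index(name):
    """Select the member with the given name, if present."""
    def select(node):
        if isinstance(node, dict) and name in node:
            return [(name, node[name])]
        return []
    return select


def filter_(predicate):
    """Select the members whose value satisfies predicate."""
    def select(node):
        if not isinstance(node, dict):
            return []
        return [(k, v) for k, v in node.items() if predicate(v)]
    return select


def union(*selectors):
    """Concatenate the results of the selectors, in order."""
    def select(node):
        return [n for selector in selectors for n in selector(node)]
    return select


def intersection(first, *rest):
    """Keep the nodes of the first selector that every other selector also selects."""
    def select(node):
        result = first(node)
        for selector in rest:
            keys = {k for k, _ in selector(node)}
            result = [(k, v) for k, v in result if k in keys]
        return result
    return select


def val_is_true(at):
    """Stand-in for @.val == true."""
    return isinstance(at, dict) and at.get("val") is True


# Invented sample document for this sketch.
doc = {"a": {"val": False}, "b": {"val": True}, "c": {"val": False}, "d": {"val": True}}

# $['a', {['b', 'c'], ?(@.val == true)}]
nested = union(index("a"),
               intersection(union(index("b"), index("c")), filter_(val_is_true)))

# ${['a', 'b', 'c'], ?(@.val == true)}
flat = intersection(union(index("a"), index("b"), index("c")), filter_(val_is_true))

print([k for k, _ in nested(doc)])  # ['a', 'b']
print([k for k, _ in flat(doc)])    # ['b']
```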

@glyn
Collaborator

glyn commented Sep 9, 2021 via email

@danielaparker

@danielaparker, yes, the "problem" is that union syntax evolved naturally to [..., ...] in JSONPath, where , works like | in XPath and there's no grouping. I would like to find a way not to break this paradigm.

But we also can follow another way:

$['a', {['b', 'c'], ?(@.val == true)}]

In other words, if we defined union like [..., ...], then why not follow the naturally evolved pattern and define intersection like {..., ...}? And I see no problems in nesting these structures.

In this case, the solution to @grzegorz-herman's problem would be written like this:

${['a', 'b', 'c'], ?(@.val == true)}

Reading the first suggestion made me feel woozy :-) I thought the second was easier on the eyes. I don't love or hate the notation, and could probably get used to it. I'm not personally enthusiastic about {..., ...} for intersection, I think I would prefer [[..., ...]]. But I'm used to being a minority of one.

@cabo
Member

cabo commented Jan 17, 2022

112 output:

Consensus: We acknowledge but won't address this issue

(Please read the minutes for more details. JSONPath-base is about selection, not projection.
Added a revisit-after-based label so we think about this more once -base is done.)

7 participants