Support for math operators in expressions #419

Open
gregsdennis opened this issue Mar 8, 2023 · 19 comments
@gregsdennis
Collaborator

I think mathematical operators would be a beneficial addition (😏) to the expression syntax. It would allow things like

to check for model consistency. (Arguably, you just wouldn't serialize c as it should be a calculated field, but people do stranger things.)

There are doubtless other use cases.

I have support for this currently in my library. It's really easy to implement, and I don't think it would be too hard to specify.

I think this is within our charter as @goessner's original implementations supported "underlying scripting language" for expressions, which undoubtedly supported these operators.

@gregsdennis
Collaborator Author

(I'm happy to defer this until after we've sorted out our function typing issues.)

@goessner
Collaborator

goessner commented Mar 8, 2023

Well ... this might be useful indeed. But implementing arithmetic and specifying it in a clean way are two very different things.

We might deal then with:

  • 0.1+0.2 == 0.3 problem.
  • Division by Zero.
  • Define EPSILON
  • Should we allow the + operator to also concatenate strings?
  • Explicit number type
  • rounding
  • sqrt ... where to stop ?
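
A few of the items in this list can be made concrete with a short Python sketch. It only illustrates the behaviour of IEEE 754 doubles and one possible division-by-zero policy; the tolerance value and the `safe_div` helper are assumptions made for illustration, not anything proposed in this thread.

```python
import math

# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact_equal = (0.1 + 0.2 == 0.3)

# A tolerance-based comparison; rel_tol is an arbitrary illustrative choice
# and stands in for whatever "EPSILON" a specification might define.
close_enough = math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9)


def safe_div(a, b):
    """Return a / b, or None when b is zero (one possible policy among several)."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


# In Python, + on two strings concatenates; a specification would have to
# state explicitly whether that is allowed in a filter expression.
concatenated = "50" + "50"
```

Each of these choices (tolerance, result of dividing by zero, meaning of + on strings) would need to be pinned down in spec text rather than left to the host language.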

Alternatively, I can imagine that a function similar to CSS calc would be easy to implement and easier to specify.

@glyn
Collaborator

glyn commented Mar 8, 2023

Yes, first class support for mathematical operators will entail a lot of spec work. Function extensions could be used instead.

I suggest we defer this issue and tag it "revisit-after-base-done".

@gregsdennis
Collaborator Author

The comparison indicates that many implementations support a path like $[?(@.key+50==100)], but it's split about 50/50 between reading that as

  • a math operation: @.key + 50
  • a "key+50" key

I wonder how adding in a couple spaces would do: $[?(@.key + 50==100)]. This should differentiate whether math operations are supported.

@cabo
Member

cabo commented Mar 9, 2023

member-name-shorthand cannot contain a +, so recognizing @.key is not a problem.
(The problem is that adding math adds a ton of additional considerations.
E.g., what if @.key is "50" and not 50, etc.)
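
The tokenization point can be sketched in Python. The pattern below is an ASCII-only simplification of member-name-shorthand (the real grammar also admits non-ASCII characters), and `read_shorthand` is a hypothetical helper named for this illustration only.

```python
import re

# Assumption: ASCII-only approximation of member-name-shorthand, i.e. a letter
# or underscore followed by letters, digits, or underscores. No '+' or '-'.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def read_shorthand(expr, pos):
    """Read '@.' plus a shorthand name starting at pos; return (name, next_pos)."""
    if not expr.startswith("@.", pos):
        raise ValueError("expected '@.'")
    match = NAME.match(expr, pos + 2)
    if match is None:
        raise ValueError("expected a member name")
    return match.group(0), match.end()


query_fragment = "@.key+50==100"
name, next_pos = read_shorthand(query_fragment, 0)
remainder = query_fragment[next_pos:]
```

Because the name stops at the first character that cannot be part of a shorthand name, the `+` is left over for whatever the expression grammar decides to do with it.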

@gregsdennis
Collaborator Author

member-name-shorthand cannot contain a +

Yeah, it's understood that those implementations aren't spec-compliant.

The problem is that adding math adds a ton of additional considerations. E.g., what if @.key is "50" and not 50, etc.

Yeah, it's understood that we'd have to do that stuff. I don't think we should shy away from it, though.

I still think this is within our charter.

@cabo added the enhancement label Mar 9, 2023
@ohler55

ohler55 commented Mar 11, 2023

Personal bias here, but I've found simple math operators (-, +, *, /) very useful in practice. With a decision on what to return for a divide by zero, I think most end users would like the extra flexibility that math operators provide.

The one limitation I've had users question is why a - character cannot be in a token, since it can be confused with a minus sign when the token is used in an expression.

@gregsdennis
Collaborator Author

We currently forbid - in the shorthand name syntax (requiring the brackets syntax instead), so that's not a problem.

@glyn
Collaborator

glyn commented Mar 12, 2023

Deferring until after base done.

@goessner
Collaborator

Follow up of #449:

Take the following arithmetic example: (a + b + c)*d/e <= 42, where a,b,c,d,e are members of the current node.

Using a set of small (binary) functions results in the query

$.arr[?div(prod(sum(sum(@.a,@.b),@.c),@.d),@.e) <= 42]

whereas using a calc function looks like

$.arr[?calc('(@.a+@.b+@.c)*@.d/@.e') <= 42]

I predict, most users will prefer the latter syntax.

We need here a function calc

  • expecting a single argument of type string.
  • returning the resulting number value or false (or Nothing) in case of an invalid argument.
  • having access to its environment via closure concept.

The string argument must contain a pure arithmetic expression, that means

  • only a limited set of (binary?) arithmetic operators is allowed (+,-,*,/,%,**).
  • operands need to be
    • number literals
    • singular nodelists containing number values
    • functions returning number values or singular nodelists containing number values
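
A minimal Python sketch of such a function, under stated assumptions: operands are limited to number literals and `@.name` members of the current node (no nested function calls), `None` stands in for Nothing, and Python's own `ast` module is borrowed as the arithmetic parser. The names `calc`, `NOTHING`, and the `member_` prefix are hypothetical choices for this illustration.

```python
import ast
import operator
import re

NOTHING = None  # stand-in for "Nothing"; an assumption of this sketch

_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_MEMBER = re.compile(r"@\.([A-Za-z_][A-Za-z0-9_]*)")
_PREFIX = "member_"  # '@.a' is rewritten to the Python name 'member_a'


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _evaluate(tree, node):
    """Evaluate a whitelisted subset of Python's expression AST against node."""
    if isinstance(tree, ast.Expression):
        return _evaluate(tree.body, node)
    if isinstance(tree, ast.Constant) and _is_number(tree.value):
        return tree.value
    if isinstance(tree, ast.Name) and tree.id.startswith(_PREFIX):
        value = node[tree.id[len(_PREFIX):]]
        if not _is_number(value):
            raise TypeError("operand is not a number")
        return value
    if isinstance(tree, ast.UnaryOp) and isinstance(tree.op, ast.USub):
        return -_evaluate(tree.operand, node)
    if isinstance(tree, ast.BinOp) and type(tree.op) in _BINARY:
        left = _evaluate(tree.left, node)
        right = _evaluate(tree.right, node)
        return _BINARY[type(tree.op)](left, right)
    raise ValueError("unsupported syntax")


def calc(expression, node):
    """Return the numeric value of a pure arithmetic expression, or NOTHING."""
    source = _MEMBER.sub(lambda m: _PREFIX + m.group(1), expression.strip())
    try:
        result = _evaluate(ast.parse(source, mode="eval"), node)
    except (SyntaxError, ValueError, TypeError, KeyError, ArithmeticError):
        return NOTHING
    return result if _is_number(result) else NOTHING
```

Anything outside the whitelist (function calls, comparisons, strings) is rejected, which is one way to read the "pure arithmetic expression" restriction. The sketch takes the current node as an explicit second argument instead of capturing it through a closure.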

When Greg says regarding inline arithmetic:

I have support for this currently in my library. It's really easy to implement, and I don't think it would be too hard to specify.

Then implementing the calc function would be even easier due to encapsulation. An implementation able to parse JSONPath queries shouldn't find parsing isolated arithmetic expressions extremely challenging. Specifying that function should be a lot easier than specifying inline arithmetic with all its side effects.

Then there is another charming aspect of this approach.

Imagine the following scenario: A user is supplying a set of parts of simple geometry, holding the part-descriptions in a JSON array.

Each part description is redundancy-free and holds geometric and material properties. The part mass might be a measure of the selling price. So if we want to find all cuboids with a mass less than 20 (kg), we can start the query

$.parts[[email protected]=='cuboid' && calc('@.a*@.b*@.c*@.rho') < 20]

where a,b,c in [m] are the cuboid dimensions and rho its density in [kg/m^3].

In case we know - as the JSON author - that the part mass is frequently requested, we can even put into the header section of the JSON data

{
  "mass": {
    "cuboid": "@.a*@.b*@.c*@.rho",
    "sphere": "4/3*3.14*@.r**3*@.rho",
    "cylinder": "3.14*@.r**2*@.h*@.rho"
  },
  "parts": [...]
}

also the arithmetic expressions for other part masses. This way we can reformulate the query above to

$.parts[?calc($.mass[?index(@)!=''][email protected]) < 20]

which of course then requires the useful index function most recently discussed in #156.

Apart from that, having simple strings holding arithmetic expressions allows us to store them in JSON for reuse, in the same way as we can with JSONPath queries or, preferably, with normalized paths as strings.

That you cannot do conceptually with the barely readable mult/div/sum approach.

@gregsdennis:

I fail to see how a calc() function would be any different than just including math operators in expressions. You'd still have to specify what is valid as a parameter to calc() and how that works. It seems easier to just define math operators and be done with it.

... no, due to the strong encapsulation and sharply restricted syntax explained above.

@cabo:

Of course, this would break any attempt to have an extensible function interface, ...

I don't see this, please elaborate.

... because calc would need to include half of JSONPath’s syntax and would need access to all the related functionality as well.

... again no: due to the strong encapsulation and sharply restricted syntax of pure arithmetic expressions, implementation should be easy, as Greg already mentioned above.

Stefan

@ohler55

ohler55 commented Mar 23, 2023

If we are considering the ease of use for the end user, I would think $.parts[?(@.x == @.y + 3)] or $.parts[?(@.x == (@.y + 3))] would be the most natural.

It shouldn't really matter how hard it is to implement if it is better for the end users. Anyone undertaking the task of implementing the spec will have to be competent anyway, so a little more work shouldn't be that large a hurdle. (IMHO)

@goessner
Collaborator

@ohler55 ... I understand this very well from a user's point of view. But there will be a lot of spec work to do along the way. So what we are discussing here is how functions, and in which form, can help add arithmetic expressions to queries while still achieving sufficient user acceptance.

I would applaud it if some implementers gained experience in the meantime by implementing, side by side,

  • inline arithmetic in queries.
  • arithmetic encapsulated in a calc function.

Then they can help to identify edge cases, type collisions and handling of numeric anomalies.

@danielaparker

Follow up of #449:

Take the following arithmetic example: (a + b + c)*d/e <= 42, where a,b,c,d,e are members of the current node.

Using a set of small (binary) functions results in the query

$.arr[?div(prod(sum(sum(@.a,@.b),@.c),@.d),@.e) <= 42]

whereas using a calc function looks like

$.arr[?calc('(@.a+@.b+@.c)*@.d/@.e') <= 42]

But you don't need a calc function to support that notation, it's very straightforward to incorporate numeric operators into the script expression language, with the usual precedence and associativity. For example, for two C++ and .Net implementations described here, given the following document,

{"arr":[{"a":2,"b":3,"c":5,"d":8,"e":2},{"a":2,"b":3,"c":5,"d":10,"e":2}]}

and query

$.arr[?(@.a+@.b+@.c)*@.d/@.e <= 42]

the result is

[{"a":2,"b":3,"c":5,"d":8,"e":2}]
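
The arithmetic behind this result can be checked with a few lines of plain Python. This is not how either implementation works internally; it simply applies (a + b + c)*d/e <= 42 to each array element with ordinary precedence and associativity. The `matches` helper is a name invented for this sketch.

```python
import json

document = json.loads(
    '{"arr":[{"a":2,"b":3,"c":5,"d":8,"e":2},{"a":2,"b":3,"c":5,"d":10,"e":2}]}'
)


def matches(node):
    # (a + b + c) * d / e <= 42, evaluated on the members of one element.
    return (node["a"] + node["b"] + node["c"]) * node["d"] / node["e"] <= 42


selected = [node for node in document["arr"] if matches(node)]
```

The first element gives (2+3+5)*8/2 = 40, which passes; the second gives (2+3+5)*10/2 = 50, which does not.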

That is, it's very straightforward if @.a, @.b, etc. evaluate to values. I'm not sure what it would mean if they were to evaluate to nodelists.

Daniel

@ohler55

ohler55 commented Mar 23, 2023

I took the approach described by @danielaparker in OjG, but there is no reason all three of the proposed approaches could not be implemented. Having said that, picking one approach as the minimum and offering the others as extensions might be a way to resolve this.

@gregsdennis
Collaborator Author

gregsdennis commented Mar 23, 2023

I agree with @danielaparker and @ohler55: these operators need to be supported in general expressions, not merely inside some function.

Then implementing the calc function would be even easier due to encapsulation... Specifying that function should be a lot easier than specifying inline arithmetic with all its side effects. - @goessner

I don't see how the level of effort for supporting them in a function is any less than to support them in general expressions. If anything I think it's more effort because you have to explain why this syntax is valid only inside of this function.

$.parts[[email protected]=='cuboid' && calc('@.a*@.b*@.c*@.rho') < 20]

From a parsing perspective, this is much more complicated than

$.parts[[email protected]=='cuboid' && @.a*@.b*@.c*@.rho < 20]

From a user perspective, calc() is unnecessary.

Regarding the "expressions in data" concept, we don't currently support data specifying a path anywhere, and doing so opens a whole new can of worms that we'd need to consider. It's paving the way for an exec() function that executes code.

That you cannot do conceptually with the barely readable mult/div/sum approach.

No one is advocating for this approach. Sure, calc() is better than these, but calc() is measurably worse than just supporting math in expressions.

@goessner
Collaborator

Hmm ... as an outcome of this discussion, the realisation is maturing that inline arithmetic is developing into a de facto standard in current implementations, which is also the natural thing users expect.

It seems best to defer activities in that direction until after the base is done, which in fact was the reason why Glyn closed this issue.

@gregsdennis
Collaborator Author

gregsdennis commented Apr 19, 2023

I came up with this for basic math support:

math-expr = binary-math-expr / unary-math-expr
binary-math-expr = math-operand binary-math-operator math-operand
unary-math-expr = unary-math-operator (number / singular-query / value-function-expr / math-group)
math-operand = number / singular-query / value-function-expr / math-expr / math-group
math-group = "(" math-expr ")"
binary-math-operator = "+" / "-" / "*" / "/"
unary-math-operator = "-"

We'd then add math-expr as an option on comparable

comparable = literal / singular-query / value-function-expr / math-expr

I believe this gives support for addition, subtraction, multiplication, division, and grouping, though it doesn't give operator precedence as yet (I'm working on that).

It does allow multiple negations (e.g. ----4), which is weird. There's also an ambiguity in -4 now between

  • negative 4 as a number
  • positive 4 that has been negated

In the end, I'm not sure it makes much of a difference; maybe it saves an operation to have it as "negative 4." Given the outcome is the same, maybe we just let implementations decide how they want to handle it.

It also doesn't prevent division by zero, but we'd have to contend with a path or a function returning zero anyway. I think the math-expr evaluating to Nothing is fine. That would result in a "false" comparison which just wouldn't select the node.

Similarly any path or ValueType function which returns a non-number could result in a Nothing evaluation as well.
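
A rough Python sketch of these semantics, assuming a sentinel object for Nothing and assuming that an ordering comparison involving Nothing is false. The names `Nothing`, `divide`, and `less_than` are hypothetical and exist only for this illustration.

```python
class Nothing:
    """Sentinel for 'no value'; a stand-in name chosen for this sketch."""

    def __repr__(self):
        return "Nothing"


NOTHING = Nothing()


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def divide(left, right):
    """Division that yields NOTHING for non-numbers and for a zero divisor."""
    if not (is_number(left) and is_number(right)) or right == 0:
        return NOTHING
    return left / right


def less_than(left, right):
    # Assumption: a '<' comparison where either side is Nothing is false.
    if left is NOTHING or right is NOTHING:
        return False
    return left < right


nodes = [{"a": 10, "b": 2}, {"a": 10, "b": 0}, {"a": "10", "b": 2}]
selected = [n for n in nodes if less_than(divide(n["a"], n["b"]), 20)]
```

Only the first node is selected: the second divides by zero and the third has a string operand, so both evaluate to Nothing and fail the comparison without raising an error.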

This doesn't support string concatenation (yet).

@gregsdennis
Collaborator Author

Does the ABNF need to give operator precedence?

4+5*6 is syntactically valid whether or not the syntax understands that * should be performed before +.

@glyn reopened this Dec 20, 2023
@cabo
Member

cabo commented Dec 20, 2023

Does the ABNF need to give operator precedence?

The principle of least surprise says yes: Implementers will expect the AST they derive from the ABNF to be directly useful for a tree interpreter.
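
To illustrate what a precedence-aware grammar buys, here is a small Python sketch of a layered recursive-descent parser in which each grammar level maps to one method, so the resulting tree already nests * and / below + and -. It handles number literals only (no singular queries or function calls), and every name in it is an assumption made for this example rather than proposed grammar.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def sum_expr(self):  # lowest precedence: + and -, left-associative
        node = self.product_expr()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.product_expr())
        return node

    def product_expr(self):  # binds tighter: * and /, left-associative
        node = self.unary_expr()
        while self.peek() in ("*", "/"):
            node = (self.take(), node, self.unary_expr())
        return node

    def unary_expr(self):  # allows repeated negation such as ----4
        if self.peek() == "-":
            self.take()
            return ("neg", self.unary_expr())
        return self.primary()

    def primary(self):
        token = self.take()
        if token == "(":
            node = self.sum_expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise ValueError("expected a number or '('")
        return float(token)


def parse(text):
    parser = Parser(text)
    tree = parser.sum_expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return tree


def evaluate(tree):
    if isinstance(tree, float):
        return tree
    if tree[0] == "neg":
        return -evaluate(tree[1])
    op, left, right = tree[0], evaluate(tree[1]), evaluate(tree[2])
    return {"+": left + right, "-": left - right, "*": left * right}[op] if op != "/" else left / right
```

With this layering, `parse("4+5*6")` yields a tree whose root is `+` and whose right child is the `*` node, so a tree interpreter can evaluate it directly. A flat grammar would accept the same text but leave precedence to a later pass.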
