SEP-0006 issue that may be very confusing for users #194

Open
VladimirAlexiev opened this issue Feb 2, 2024 · 4 comments

Comments

@VladimirAlexiev
Contributor

@afs @JervenBolleman

https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0006/sep-0006.md#scope

A sub-select may have variables of the same name that are not lateral-joined to a variable of that name from the LHS.
The inner ?s in the SELECT ?label is not the outer ?s, because the SELECT ?label does not pass out ?s.

This is very confusing for me because here's how I think:

  • the RHS is a "subroutine" that's executed in a loop
  • Its inputs are the variables bound in the LHS
  • Its outputs are the variables projected out in SELECT

So it's confusing for me that to use a var as an input, I need to mention it in the "output" clause.
This comment by @frensjan #100 (comment) is in a similar vein.
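
To make the case concrete, here is a minimal sketch in the style of the SEP-0006 scope example. The `?s a ?type` pattern on the LHS and the `rdfs:label` data are assumptions for illustration. In the first query the sub-select does not project `?s`, so the inner `?s` is a separate variable and each outer row is paired with every label in the data:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * {
  ?s a ?type .
  LATERAL {
    # ?s is not projected, so this ?s is hidden from the outer ?s
    SELECT ?label { ?s rdfs:label ?label }
  }
}
```

Projecting `?s` is what turns it into an "input" under the current proposal, so each outer `?s` gets only its own labels:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT * {
  ?s a ?type .
  LATERAL {
    # ?s is projected, so it is lateral-joined to the outer ?s
    SELECT ?s ?label { ?s rdfs:label ?label }
  }
}
```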

I understand that by SPARQL algebra, it's not possible to equate the inner-but-not-exported ?s to the outer ?s.
However, is it possible to return an error or warning, because it's virtually certain that the user didn't mean to use the same var name for two different variables? SEP-0006 goes on:

There needs to be a new syntax restriction: there can be no variable ...

So I'm asking: can a similar syntax restriction be enforced for the case in question?

@afs
Collaborator

afs commented Feb 3, 2024

Hi @VladimirAlexiev

As you note, this isn't an issue that is specific to LATERAL, although it needs considering as part of LATERAL.

However, is it possible to return an error or warning

It would be possible. A warning is always possible - that's not defined one way or the other by the specification.

(SEP-0007) requires naming apart, but it could be an error instead. An error would lead to a different confusing situation: normally the name could be changed with no effect, but now changing it would have an effect (e.g. when the subpart of the query was developed separately).

The choice seems to me to be whether to aim to support the wider range of queries or restrict if possibly confusing. I prefer the former if it does not add too much burden. "possibly confusing" is a judgement which is risky in a spec (IMO).

There was a suggestion of enumerating variables LATERAL[?x,?y]. That does add a further consideration for the query writer.

Other possibilities are

  • implicit adding to the projection - that seems to be confusing where there are doubly nested SELECTs (unlikely to be common! but possible).
  • keeping the name the same and substituting.

The latter is related to the use of Substitution/SEP-0007 in parameterized queries (#57). In this case, it seems natural and desirable to allow the inner same-name ?s to be substituted for a value.
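
As a rough illustration of the two options, take a RHS sub-select of the form `SELECT ?label { ?s rdfs:label ?label }`, as in the SEP-0006 scope discussion, and assume `<http://example/a>` as a hypothetical value of the outer `?s`. This is one reading of the options, not wording from either SEP. With implicit adding to the projection, the sub-select would behave roughly as if `?s` had been listed:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Option 1 (sketch): ?s is treated as if it were projected
SELECT ?s ?label { ?s rdfs:label ?label }
```

With substitution, the name stays the same and, for the LHS row where `?s` is `<http://example/a>`, the sub-select would be evaluated roughly as:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Option 2 (sketch): the outer value replaces the inner same-name ?s
SELECT ?label { <http://example/a> rdfs:label ?label }
```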

SEP-0007 is also in one of the EXISTS proposals.

My preference is to have one mechanism to explain for these different uses, possibly with specific minor variations, rather than defining independent mechanisms case by case that are similar for the user much of the time.

@VladimirAlexiev
Contributor Author

hi @afs and @JervenBolleman !
The RHS can be a pattern not a SELECT, right?
In that case are all RHS variables considered to be "exported"?

(I tried to think up a useful case, but cannot come up with one...
Maybe if the RHS pattern includes magic predicates like geo:sfContains or expensive functions like geof:sfContains.
And so, executing the RHS would be more efficient if all LHS vars are bound before executing the RHS?)
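
A sketch of the kind of query this describes, with a plain group pattern rather than a SELECT as the RHS. The data shape (`geo:hasGeometry`/`geo:asWKT` on both regions and places) and the variable names are assumptions for illustration. On one reading of the SEP-0006 scope rules, a plain pattern has no projection, so nothing hides `?regionWkt` from the LHS. Whether binding the LHS variables first is actually faster depends on the engine.

```sparql
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT * {
  # LHS: binds ?region and ?regionWkt
  ?region geo:hasGeometry/geo:asWKT ?regionWkt .

  # RHS: a plain pattern that uses ?regionWkt from the LHS
  LATERAL {
    ?place geo:hasGeometry/geo:asWKT ?placeWkt .
    FILTER( geof:sfContains(?regionWkt, ?placeWkt) )
  }
}
```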

@frensjan

frensjan commented Feb 5, 2024

The RHS of LATERAL can be SELECT, right? This is even a pretty important use case I think. An example:

SELECT * {
  # select profiles
  ?profile a :Profile .

  # and the last three posts for each profile
  LATERAL {
    SELECT ?profile ?post {
      ?post a :Post ;
            :author ?profile ;
            :timestamp ?timestamp .
    }
    ORDER BY DESC( ?timestamp )
    LIMIT 3
  }
}

In this query ?profile is a variable in the input role for the SELECT clause.

Personally, I'd be in favour of a form in which variables bound in the sub-select don't need to be listed in the SELECT clause. E.g.:

SELECT * {
  # select some post
  ?reference a :Post ;
             :timestamp ?referenceTimestamp .

  # and up to three posts before it
  LATERAL {
    SELECT ?post {
      ?post a :Post ;
            :author ?profile ;
            :timestamp ?timestamp .
      FILTER( ?timestamp < ?referenceTimestamp )
    }
    ORDER BY DESC( ?timestamp )
    LIMIT 3
  }
}

In particular because in the second example here, ?referenceTimestamp can never be in the output role in the sub-query. So adding it to the SELECT clause only brings about confusion.
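
For comparison, a sketch of what the scope rule quoted at the top of this issue would seem to require for this query to behave as intended: `?referenceTimestamp` has to be listed in the inner SELECT even though it is only ever an input. The `PREFIX :` declaration is an assumed placeholder namespace, since the examples above leave it out.

```sparql
PREFIX : <http://example/>

SELECT * {
  # select some post
  ?reference a :Post ;
             :timestamp ?referenceTimestamp .

  # and up to three posts before it
  LATERAL {
    # ?referenceTimestamp is projected only so that it joins with the LHS
    SELECT ?post ?referenceTimestamp {
      ?post a :Post ;
            :author ?profile ;
            :timestamp ?timestamp .
      FILTER( ?timestamp < ?referenceTimestamp )
    }
    ORDER BY DESC( ?timestamp )
    LIMIT 3
  }
}
```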

@VladimirAlexiev
Contributor Author

#195 considers more basic cases (without LATERAL) inspired by @frensjan
