Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support bulk-fetch using JOIN #171

Open
andyjefferson opened this issue Mar 7, 2017 · 5 comments
Open

Support bulk-fetch using JOIN #171

andyjefferson opened this issue Mar 7, 2017 · 5 comments

Comments

@andyjefferson
Copy link
Member

andyjefferson commented Mar 7, 2017

If we have a JDOQL query like
SELECT FROM Person WHERE this.firstName == :value

then this becomes
SELECT P.* FROM PERSON P WHERE P.FIRST_NAME = ?

If a Person has a Set

then if the addresses field is in the fetch plan we already support a bulk-fetch mode "EXISTS" giving SQL of
SELECT A.* FROM ADDRESS A WHERE EXISTS (SELECT P.ID FROM PERSON P WHERE P.FIRST_NAME = ? AND A.PERSON_ID = P.ID)

We could potentially have a bulk-fetch mode "JOIN" as
SELECT A.* FROM PERSON P, ADDRESS A WHERE A.PERSON_ID = P.ID AND P.FIRST_NAME = ?

The reason why this is more complicated than the EXISTS case is that for EXISTS we can make use of the backing store getIteratorStatement for the basic statement, and then put the original query in an EXISTS clause. Here we need to start from the basic query (but clearing the select) and then adding the join to the element, while catering for all different combinations of set/list/collection whether with embedded elements or not, and whether via FK or JoinTable.

@Hexiaoqiao
Copy link

Hexiaoqiao commented Nov 9, 2017

any plan to optimize the generated SQL?

@andyjefferson
Copy link
Member Author

No. That is dependent on contributions, this being open source and all. As per the "unresourced" tag on this issue

@shawnweeks
Copy link

Another option that is supported on several databases is to use an "IN" clause instead of "EXISTS" which would be implicitly converted to a join without any of the risks associated with inadvertent many to many relationships. That would probably be a lot easier to implement than the join as the SQL is closely related to exists. I normally work on Hadoop Projects but since I'm looking at using some of this I'll start getting familiar with the code base and see if I can help.

@shawnweeks
Copy link

Based on some testing against PostgreSQL 10.3, MaraiDB 10.2 and Oracle 12.2 this optimization is already happening with "EXISTS" and "IN". I can post the test scripts for other folks to look at but assuming you can use a relatively modern release of your database software you're already getting the benefit of using a join.

@andyjefferson
Copy link
Member Author

Thx for your input, interesting to hear.

A comparison of the 3 "bulk"/"batch" options (EXISTS, IN, JOIN) for EclipseLink JPA is present on this link https://java-persistence-performance.blogspot.co.uk/2010/08/batch-fetching-optimizing-object-graph.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants