Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for batch operations (R2DBC) #27229

Conversation

MarkMarkyMarkus
Copy link

Add support for batch operations to the DefaultDatabaseClient:

databaseClient.sql("INSERT INTO legoset (id, name, manual) VALUES(:id, :name, :manual)")
				.bind("id", 42055)
				.bind("name", "SCHAUFELRADBAGGER")
				.bindNull("manual", Integer.class)
				.add()
				.bind("id", 2021)
				.bind("name", "TOM")
				.bindNull("manual", Integer.class)
				.fetch().rowsUpdated()

Closes #259

Demo app: springR2dbcBatchExample.tar.gz

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Jul 30, 2021
@sbrannen sbrannen added in: data Issues in data modules (jdbc, orm, oxm, tx) type: enhancement A general enhancement labels Jul 30, 2021
@patrickhuy
Copy link

@sbrannen @mp911de this looks very intresting for me, could you share some insights what the next steps for this PR should be?

@mp911de
Copy link
Member

mp911de commented Oct 7, 2021

Batching through .add() would only properly work if the SQL doesn't change. DatabaseClient uses named parameters and expands named parameter markers using collection arguments into native bind markers. The number of bind markers depends on the actual collection size. This is useful when using IN clauses. I'm not quite sure about the direction.

We could follow the idea of NamedParameter of expanding the SQL using the first bindings and raising an exception later on if the expanded parameter count doesn't match the number of parameters found in a collection argument.

Other than that, please check out the contribution guide to learn how we accept contributions.

@MarkMarkyMarkus
Copy link
Author

Thanks for the response.

We could follow the idea of NamedParameter of expanding the SQL using the first bindings and raising an exception later on if the expanded parameter count doesn't match the number of parameters found in a collection argument.

Do you mean that we should check the expanded parameters count in NamedParameterUtils.substituteNamedParameters() or NamedParameter.addPlaceholder()?

@Andro999b
Copy link

Is there any plans to add batching?

@gabfssilva
Copy link

Batching through .add() would only properly work if the SQL doesn't change.

@mp911de, correct me if I'm wrong, but the SPI has two ways of batching, right? One that allows you to send a single command with multiple parameters (via parameter binding) & another that allows you to pass multiple commands and, although both are sent in a single roundtrip, the latter one is slower since the database will have to parse every single command, so if your command won't change, you should stick with the first approach.

I'm asking because the implementation of this PR is probably the most optimized (since it uses the first approach), although less flexible. Isn't there a way to add both ways to the DatabaseClient, so the developer would choose which way fits better for them?

@mp911de
Copy link
Member

mp911de commented Oct 26, 2022

@gabfssilva your perception is correct. Parametrized batching parses the command once and sends a execute command to the database for each binding set. I think it would be possible to bridge the .add(…)-based mechanism into DatabaseClient. The other type of batching, where commands are concatenated with ; can be used already because you're in control of the SQL statement.

@bwhyman
Copy link

bwhyman commented Jan 22, 2023

It works for me now.
spring-data-r2dbc: 6.0.1; jasync-r2dbc-mysql: 2.1.12.

@Autowired
private io.r2dbc.spi.ConnectionFactory connectionFactory;
@Transactional
    public Mono<Void> update(List<User> users) {
        String sql = "update user u set u.address=? where s.id=?";
        return DatabaseClient.create(connectionFactory).sql(sql).filter(statement -> {
            for (User u : users) {
                statement.bind(0, u.getAddress()).bind(1, s.getId()).add();
            }
            return statement;
        }).then();
    }

@snicoll
Copy link
Member

snicoll commented Aug 26, 2023

@mp911de I am happy helping the merge process but I could use a review on your side first.

@snicoll snicoll added the status: waiting-for-feedback We need additional information before we can continue label Aug 26, 2023
@mp911de
Copy link
Member

mp911de commented Aug 28, 2023

From the spec, we've learned that statement.….add().….add().….execute() (no trailing .add() before .execute()) is a pretty sharp edge leading easily to quite some confusion when using that API.

Considering batching with DatabaseClient would allow us to come up with a smoother API and an approach, that doesn't create DefaultGenericExecuteSpec instances for each bound parameter.

As a starting point, how about a bind (or params as in JdbcClient) method lambda, something along the lines of:

databaseClient.sql("INSERT INTO legoset (id, name, manual) VALUES(:id, :name, :manual)")
				.bind(params -> params.bind("name", name1).bind("manual", manual1))
				.bind(params -> params.bind("name", name2).bind("manual", manual2))
				.fetch();

The lambda parameter would be a simple interface carrying bind methods only to avoid confusion with Statement:

interface Params {
  Params bind(String name, Object value);
  Params bind(int index, Object value);
  Params bindNull(String name, Class<?> type);
  Params bindNull(int index, Class<?> type);
}

@snicoll
Copy link
Member

snicoll commented Sep 11, 2023

@MarkMarkyMarkus would you have some cycle to amend your PR with the review of Mark? If not, that's cool we can take over when time permits.

@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Sep 11, 2023
@MarkMarkyMarkus
Copy link
Author

Yes, I will take a look.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Sep 11, 2023
@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Sep 15, 2023
@MarkMarkyMarkus MarkMarkyMarkus force-pushed the feature/add-batch-operations branch from 063786f to 9bf751a Compare September 20, 2023 15:09
@snicoll snicoll removed the status: waiting-for-feedback We need additional information before we can continue label Sep 27, 2023
@jhoeller
Copy link
Contributor

Note that we intentionally designed our new JdbcClient without batch operations, so it is not a priority for R2DBC either. That said, if we can agree on an implementation style for multiple parameter bindings that is not at odds with the existing API design and that is reasonable to use with named parameters as well as indexed parameters, we may revisit this for both JDBC and R2DBC.

@jhoeller jhoeller removed the status: waiting-for-triage An issue we've not yet triaged or decided on label Nov 23, 2023
@jhoeller jhoeller added this to the 6.x Backlog milestone Nov 23, 2023
@hantsy
Copy link
Contributor

hantsy commented May 5, 2024

Hope there is a bind method to accept a list/array as params directly to avoid bind object one by one.

.bind(
  List.of(
    Tuple.of(param1, param2, param3...), 
    Tuple.of(param1, param2, param3...),
    ... 
  )
)

@jhoeller jhoeller modified the milestones: 6.x Backlog, General Backlog Jun 18, 2024
@sdeleuze sdeleuze self-assigned this Oct 17, 2024
@sdeleuze
Copy link
Contributor

sdeleuze commented Oct 17, 2024

I did some test with https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring-webflux in order to use batch update instead of individual requests. Under high latency, batch update from this PR allows to increase the throughput by 50%, so I would be in favor of trying to move forward on this PR if we can find the right API.

Speaking about the API and related to @hantsy comment, I think we should evolve the API to make it easier to use a collection of data to bind, because if I am not mistaken, the current one is not practical to use for this very popular (likely the most popular) use case. I had to do something like that:

// Wrapper class is needed because variable used in lambda must be final or effectively final
var wrapper = new Object(){ GenericExecuteSpec spec =
        databaseClient.sql("UPDATE world SET randomnumber=:randomnumber WHERE id = :id"); };
worlds.forEach(world -> wrapper.spec = wrapper.spec.bind(params ->
        params.bind("id", world.id).bind("randomnumber", world.randomnumber)));
return wrapper.spec.fetch().rowsUpdated();

Maybe we should just have a GenericExecuteSpec method that takes a list/collection for the values to bind and a lambda to do binding logic, as a modernized version of JdbcTemplate#batchUpdate. I am not sure we should keep the variant proposed in this PR (I am not fully convinced we need batch updates for specifying the binding via the fluent API), but happy to get feedback on this.

I was not able to make it work with UPDATE world SET randomnumber=$2 WHERE id = $1 (which works for individual update) and had to use UPDATE world SET randomnumber=:randomnumber WHERE id = :id for the batch update variant. I am not sure why.

@bwhyman I was not able to make your filter() based proposal working.

@jhoeller @mp911de Any thoughts?

@bwhyman
Copy link

bwhyman commented Oct 19, 2024

databaseClient.inConnection(conn -> {
            Statement statement = conn.createStatement(sql);
            for (int i = 0; i < timetables.size(); i++) {
                var tb = timetables.get(i);
                statement.bind(0, snowflake.nextId())
                        .bind(1, collid)
                        .bind(2, tb.getStartweek())
                        .bind(3, tb.getEndweek());
                // The add() method cannot be called for the last time
                if (i < timetables.size() - 1) {
                    statement.add();
                }
            }
            return Flux.from(statement.execute()).collectList();
        });

R2DBC:3.3.1
Statement add() method cannot be called for the last time, it is very wired.

@sdeleuze sdeleuze added the status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team label Oct 21, 2024
@sdeleuze
Copy link
Contributor

sdeleuze commented Oct 28, 2024

@bwhyman I was able to use batch updates with what you suggested but using inConnectionMany instead of inConnection. Except the if (i < timetables.size() - 1) part, it is pretty straightforward.

@mp911de @jhoeller Do you think we could relax the requirement of skipping the last add() to make it easier to use? If we do, I am not sure we need a dedicated API, we can maybe just create a related issue, close this PR, and document something like:

databaseClient.inConnectionMany(conn -> {
            Statement statement = conn.createStatement(sql);
            for (World world : worlds) {
                statement.bind(0, world.randomnumber).bind(1, world.id).add();
            }
            return Flux.from(statement.execute()).collectList();
        });

I am also wondering if dedicated batch update support would provide even more performance benefits (the improvement I get in Techempower are visible here).

@mp911de
Copy link
Member

mp911de commented Oct 29, 2024

Implicit batching with the current API opens pathways for a new class of bugs.

DatabaseClient.sql(…).bind(0, foo).bind(0, bar) uses a single statement while DatabaseClient.sql(…).bind(params -> …).bind(params -> …) runs the same statement with two parameter binding sets. Ideally, after the new method's first invocation, the calling code should never get a chance to call bind(position|name, …). Also, we should have a method that indicates that we're entering batch mode to make it obvious (andBind(params -> …)?). Probably someone can come up with a better name, though.

I like bind(params -> …) as a general addition.

@sdeleuze
Copy link
Contributor

I like bind(params -> …) as a general addition.

Ok so if nobody objects, I will close this PR unmerged, create a related issue for bind(params -> …) and work on that tentatively targeting Spring Framework 6.2.0.

@sdeleuze
Copy link
Contributor

sdeleuze commented Oct 29, 2024

Closing this (old) PR unmerged in favor of #33812.

@sdeleuze sdeleuze closed this Oct 29, 2024
@sdeleuze sdeleuze added status: superseded An issue that has been superseded by another and removed status: waiting-for-internal-feedback An issue that needs input from a member or another Spring Team labels Oct 29, 2024
@jhoeller jhoeller removed this from the General Backlog milestone Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in: data Issues in data modules (jdbc, orm, oxm, tx) status: superseded An issue that has been superseded by another type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for batch insert