
Add support for batch insert #259

Open
deblockt opened this issue Dec 18, 2019 · 14 comments
Labels
status: pending-design-work Needs design work before any code can be developed

Comments

@deblockt

I have tried to perform a batch insert using DatabaseClient, but I have not found a solution.

I have a workaround using a Statement, but to get a Statement I have to create my own Connection.

Do you plan to support batch inserts with add(…), as the Statement SPI does?

@istarion

You can obtain a connection with the ConnectionFactoryUtils helper method:
val connection = ConnectionFactoryUtils.getConnection(connectionFactory).awaitSingle()

@deblockt
Author

Thanks, so far I have used ConnectionAccessor to get the same connection as DatabaseClient.

@mp911de
Member

mp911de commented Jan 22, 2020

Currently, we don't have support for batching. There are two types of batches that R2DBC supports:

  • Unparametrized via Connection.createBatch()
  • Parametrized via Statement.add()

Unparametrized batches could be supported through execute(List<String>). However, the execute interfaces expose bind(…) methods that are not appropriate in that context, so we would need to introduce another set of interfaces.
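In the meantime, an unparametrized batch can be run at the R2DBC SPI level. A minimal sketch, assuming a configured `connectionFactory` and Project Reactor on the classpath (table and values are illustrative):

```java
import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactory;
import io.r2dbc.spi.Result;
import reactor.core.publisher.Flux;

// Acquire a connection, run an unparametrized batch, and release the
// connection when the result stream terminates.
Flux<? extends Result> results = Flux.usingWhen(
    connectionFactory.create(),
    connection -> connection.createBatch()
        .add("INSERT INTO foo VALUES ('one')")
        .add("INSERT INTO foo VALUES ('two')")
        .execute(),
    Connection::close);
```

This bypasses DatabaseClient entirely, so no named-parameter processing or escaping is applied to the SQL strings.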

Parametrized statements seem straightforward, but there's a caveat due to named parameter resolution. Named parameter support considers whether a bound value is a collection type; if so, the SQL expansion creates a parameter placeholder for each element in the Collection.

Example:

client.execute("INSERT INTO foo VALUES(:my_param)")
    .bind("my_param", "a-value")

Resulting SQL (Postgres syntax):

INSERT INTO foo VALUES($1)

client.execute("INSERT INTO foo VALUES(:my_param)")
    .bind("my_param", Arrays.asList("one", "two"))

Resulting SQL (Postgres syntax):

INSERT INTO foo VALUES($1, $2)

For batching, this example makes little sense as binding a collection to an INSERT has very little use. The point I'm trying to make is that if the parameter multiplicity changes across bindings with named parameters processing enabled, the resulting SQL is no longer the same and we cannot use batching.

client.execute("INSERT INTO foo VALUES(:my_param)")
    .bind("my_param", Arrays.asList("one"))
    .add()
    .bind("my_param", Arrays.asList("one", "two"))

@mp911de mp911de added the status: pending-design-work Needs design work before any code can be developed label Jan 22, 2020
@gjgarryuan

gjgarryuan commented Jun 18, 2020

@mp911de

I tried the code sample you provided above

client.execute("INSERT INTO foo VALUES(:my_param)")
    .bind("my_param", Arrays.asList("one"))
    .add()
    .bind("my_param", Arrays.asList("one", "two"))

I assume client is a DatabaseClient, whose bind method returns a BindSpec rather than a Statement, so Statement#add is not available.

Is it possible to use parameterized statements to form batching via DatabaseClient currently?

In addition, is it possible to use a parameterized insert statement with multiple values? For instance, INSERT INTO foo VALUES ('foo', 'bar'), ('FOO', 'BAR'). Can I do something like:

final List<Object[]> tuples = new ArrayList<>();
tuples.add(new Object[] {"foo", "bar"});
tuples.add(new Object[] {"FOO", "BAR"});

client.execute("INSERT INTO foo VALUES :tuples")
    .bind("tuples", tuples)

@spachip

spachip commented Jul 15, 2020

Hi, is there any update on this? I couldn't find support for batch operations using DatabaseClient. Can you confirm that batch support is currently limited to the Statement and Connection objects?

@abhinaba-chakraborty-by

abhinaba-chakraborty-by commented Sep 15, 2020

Hey @mp911de,
I have a scenario where my table has an auto-generated id column and I need to bulk insert items into the db and fetch the generated ids. Is there any way I can achieve that?

This is my table:

CREATE TABLE test_table (
  `id` SERIAL NOT NULL,
  `name` VARCHAR(100) NOT NULL,
  `created_date` DATETIME NOT NULL,
  PRIMARY KEY (`id`)
);

To save a list of items, the code I am using:

String initialSql = "INSERT INTO test_table(`name`,`created_date`) VALUES ";

List<String> values =
    dummyEntities.stream()
        .map(dummyEntity -> "('" + dummyEntity.getName() + "','"
            + dummyEntity.getCreatedDate().atZoneSameInstant(ZoneId.of("UTC")).toLocalDateTime().toString() + "')")
        .collect(Collectors.toList());

String sqlToExecute = initialSql + String.join(",", values);
client.execute(sqlToExecute)
    .//Then what?

The generated SQL statement (from DEBUG Logs):

2020-09-15 18:59:32.613 DEBUG 44801 --- [actor-tcp-nio-1] o.s.d.r2dbc.core.DefaultDatabaseClient   : Executing SQL statement [INSERT INTO test_table(`name`,`created_date`) VALUES ('Abhi57','2020-09-15T13:29:29.951964'),('Abhi92','2020-09-15T13:29:29.952023')]

I even tried using ConnectionFactory, still no clue

    Mono.from(connectionFactory.create())
        .map(Connection::createBatch)
        .map(batch -> {
          dummyEntities.forEach(dummyEntity -> {
            String sql = String.format("INSERT INTO `test_table` (`name`,`created_date`) VALUES ('%s','%s');", dummyEntity.getName(),
                dummyEntity.getCreatedDate().atZoneSameInstant(ZoneId.of("UTC")).toLocalDateTime().toString());
            batch.add(sql);
          });
          return batch;
        })
        .flatMap(batch -> Mono.from(batch.execute()))
        .//Then what?

@mp911de
Member

mp911de commented Sep 22, 2020

Depending on the database type, you need to tell the database to return generated keys (see the Postgres documentation on RETURNING). Then, extract the generated keys by consuming Result.map((row, metadata) -> row.get("name_of_id_column", Long.class)).
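Applied to the earlier example, that could look like the following sketch (hedged: Postgres RETURNING syntax, illustrative literals, and a configured `connectionFactory` are assumed):

```java
import io.r2dbc.spi.Connection;
import io.r2dbc.spi.ConnectionFactory;
import reactor.core.publisher.Flux;

// Batch the inserts, ask the database to return the generated keys via
// RETURNING, and map each result row to the id column.
Flux<Long> generatedIds = Flux.usingWhen(
    connectionFactory.create(),
    connection -> Flux.from(connection.createBatch()
            .add("INSERT INTO test_table(name, created_date) VALUES ('Abhi57', now()) RETURNING id")
            .add("INSERT INTO test_table(name, created_date) VALUES ('Abhi92', now()) RETURNING id")
            .execute())
        .flatMap(result -> result.map((row, metadata) -> row.get("id", Long.class))),
    Connection::close);
```

Other databases expose generated keys differently (e.g. MySQL's last insert id), so the SQL side of this sketch is database-specific.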

@luiccn

luiccn commented Dec 17, 2020

Any updates on this? It's quite an important feature to be overlooked for so long. Tomorrow it's the 1 year birthday of no batch inserts :(

@mp911de
Member

mp911de commented Dec 17, 2020

Thanks @luiccn for reminding us. Meanwhile, DatabaseClient went into Spring Framework and we need to move this ticket there.

Keep in mind that contributing to open source is essential if you want something to happen sooner than it would take the regular way. We'd be more than happy to work with you on the final design if you find the time to come up with a proposal and a pull request.

@aoudiamoncef

Hi @mp911de,

I'm interested in working on this issue. Could you please give me more information about it?

Thanks

@markusheiden

markusheiden commented Jul 9, 2023

Is there still no batching possible, except directly via the Connection?

Using the raw batches at the connection is risky because all escaping has to be done "by hand".

Batches perform way better than multiple inserts, even if using Statement.add().add()....execute(). I wonder why Statement.add() does not use a batch under the hood?

@mp911de
Member

mp911de commented Jul 13, 2023

Batches perform way better than multiple inserts

Databases provide various options for batching. Users expect batching to apply multiple operations with a noticeable throughput gain.
One approach is to concatenate multiple (unparametrized) SQL operations and send them in a single request (INSERT INTO foo …; UPDATE bar …; DELETE FROM baz …).

DatabaseClient has a textual interface that allows you to specify such a query.

even if using Statement.add().add()....execute(). I wonder why Statement.add() does not use a batch under the hood?

The other approach to batching that we've seen in databases is the usage of prepared statements. Each prepared statement operation requires multiple roundtrips (prepare, execute, close). We've seen that API in some databases such as Postgres. Some drivers implement a prepared statement cache that can reuse prepared statements if the SQL text and the parameter types match a previously executed statement. That way, Statement.add() leads to the reuse of prepared statements. However, there is no database API (among Postgres, SQL Server, H2, MariaDB) to run a parametrized statement with a table of bindings expecting batching-like behavior.

And that leads us back to the starting point: if you can render SQL statements (e.g. inserts) with safely inlined parameters and concatenate them into a semicolon-separated SQL string, then running that query yields the best performance from a networking perspective.
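A sketch of that approach with DatabaseClient (hedged: client is assumed to be a configured org.springframework.r2dbc.core.DatabaseClient, the statements are illustrative, and all values must be safely escaped before inlining):

```java
// Render each statement with inlined, already-escaped values, then join
// them into one semicolon-separated string and send a single request.
String batch = String.join(";",
    "INSERT INTO foo VALUES ('one')",
    "UPDATE bar SET name = 'two' WHERE id = 1",
    "DELETE FROM baz WHERE id = 3");

client.sql(batch)
    .fetch()
    .rowsUpdated();
```

Because the values are inlined rather than bound, this trades the driver's escaping for manual escaping, which is the risk @markusheiden points out below.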

@markusheiden

markusheiden commented Sep 11, 2023

Thanks for pointing me to the batch usage of DatabaseClient. We will use that.

Though DatabaseClient does the job, it requires boilerplate code, e.g. for batch inserts or updates:

  1. You have to concatenate the same insert or update statement multiple times for DatabaseClient#sql.
  2. You have to ensure that the binds match the number of statements and the order of parameters therein.
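A sketch of that boilerplate, with hypothetical table and column names (client is again assumed to be a DatabaseClient):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

List<String> names = List.of("one", "two");

// 1. Repeat the same insert once per row, giving each repetition
//    unique parameter names.
String sql = IntStream.range(0, names.size())
    .mapToObj(i -> "INSERT INTO foo(name) VALUES (:name" + i + ")")
    .collect(Collectors.joining(";"));

// 2. Keep the binds aligned with the repeated statements by index.
DatabaseClient.GenericExecuteSpec spec = client.sql(sql);
for (int i = 0; i < names.size(); i++) {
    spec = spec.bind("name" + i, names.get(i));
}
```

The index-suffixed parameter names and the loop are exactly the bookkeeping the two points above describe.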

Thus I would still very much like batching support to be more integrated into the API, e.g.:

  • Repositories would support batching, e.g. for saveAll().
  • Statement#add would result in batching.

Please don't consider this as a complaint. It is just a feature request :-)

@Serjohn27

+1 for the above. Not sure what the underlying complexity is, but I would love to see batch inserts through the saveAll() API, similar to JDBC!
