rgw/sfs: sqlite_modern_cpp blobs, users and objects changes #246

0xavi0 · 2023-11-14T08:45:58Z

Adds another step on moving the sqlite code to sqlite_modern_cpp.

Adds the code to store/retrieve ceph types as blobs. Adds a new mechanism to declare when a type that has encode and decode functions needs to be stored as a blob.

In order to add a new type, simply add the type to the following tuple.

using BlobTypes = std::tuple<
    rgw::sal::Attrs, ACLOwner, rgw_placement_rule,
    std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>,
    RGWUserCaps, std::list<std::string>, std::map<int, std::string>,
    RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf,
    std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>
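
One way a tuple like this can drive blob detection is a compile-time membership check over its element types. This is a minimal sketch under assumptions, not the code from this PR: `tuple_contains`, `is_blob_type`, `FakeOwner` and `FakeQuota` are hypothetical names, and the stand-in structs replace the real ceph types so the snippet compiles on its own.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for ceph types (assumption, not real ceph code).
struct FakeOwner {};
struct FakeQuota {};

// Hypothetical, shortened version of the tuple listing the blob types.
using BlobTypes = std::tuple<FakeOwner, std::map<std::string, int>>;

// Primary template, specialised below for std::tuple.
template <typename T, typename Tuple>
struct tuple_contains;

// True if T is exactly one of the tuple's element types.
template <typename T, typename... Ts>
struct tuple_contains<T, std::tuple<Ts...>>
    : std::disjunction<std::is_same<T, Ts>...> {};

template <typename T>
inline constexpr bool is_blob_type = tuple_contains<T, BlobTypes>::value;

// Adding a type to BlobTypes is the only change needed to flip this check.
static_assert(is_blob_type<FakeOwner>);
static_assert(!is_blob_type<FakeQuota>);
```

With a trait of this shape, the binding code can branch on `is_blob_type<T>` (for example with `if constexpr`) instead of needing one explicit specialisation per type.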

Adds a new header file (dbabpi_type_wrapper.h) that should be included when declaring type bindings for sqlite_modern_cpp.
We need this to avoid circular dependencies with the bind_col_in_db function from sqlite_modern_cpp.
The following operator (from the main sqlite_modern_cpp.h file):

	template<typename T> database_binder &operator<<(database_binder& db, index_binding_helper<T> val) {
		db._next_index(); --db._inx;
		int result = bind_col_in_db(db._stmt.get(), val.index, std::forward<T>(val.value));
		if(result != SQLITE_OK)
			exceptions::throw_sqlite_error(result, db.sql(), sqlite3_errmsg(db._db.get()));
		return db;
	}

calls bind_col_in_db, but the function specialisation needs to be declared first.
So, when declaring type bindings we should only include sqlite_modern_cpp/type_wrapper.h
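
The ordering constraint can be illustrated with plain overloads. This is a simplified sketch, not the library's code: the `demo` namespace, `bind_value` and `bind` are hypothetical names. The point it shows is that the type-specific function has to be visible before the generic template that calls it.

```cpp
#include <map>
#include <string>

namespace demo {

// 1. The type-specific binding is declared first.
inline int bind_value(const std::map<int, std::string>& value) {
  return static_cast<int>(value.size());
}

// 2. The generic caller comes afterwards. For a std::map argument,
//    argument-dependent lookup only searches namespace std, so an overload
//    in namespace demo is found only if it was already declared here.
template <typename T>
int bind(const T& value) {
  return bind_value(value);
}

}  // namespace demo
```

Swapping the two declarations would make `demo::bind` fail to compile for `std::map`, which is why a header holding only the binding declarations has to be included before the main library header.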

Changes all BLOB types in Users to the binding types.

Changes all functions in sqlite_users to use sqlite_modern_cpp.

Deletes the conversion files for Users. (A conversion is still needed because we have to create the RGWUserInfo object required by the SAL layer; the only conversion required is db->SAL.)

Changes all functions in sqlite_objects to use sqlite_modern_cpp.

Checklist

Tracker (select at least one)
- References tracker ticket
- Very recent bug; references commit where it was introduced
- New feature (ticket optional)
- Doc update (no ticket needed)
- Code cleanup (no ticket needed)
Documentation (select at least one)
- Updates relevant documentation
- No doc update is appropriate
Tests (select at least one)
- Includes unit test(s)
- Includes integration test(s)
- Includes bug reproducer
- No tests


0xavi0 · 2023-11-14T10:43:14Z

I think I'll need to work on getting connections from the connection pool, given that the concurrency unit tests fail.
So consider this as NOT READY for review.

tserong · 2023-11-15T08:18:25Z

I think I'll need to work on getting connections from the connection pool, given that the concurrency unit tests fail.
So consider this as NOT READY for review.

Just looking at the current implementation of DBConn::get()...

ceph/src/rgw/driver/sfs/sqlite/dbconn.h

Lines 288 to 290 in 5f48c7b

    
  dbapi::sqlite::database get() {
    return dbapi::sqlite::database(get_storage()->filename());
  }

...this isn't using the connection pool at all. That's just going to create a new dbapi::sqlite::database every time it's called. We'd need something like the existing std::unordered_map<std::thread::id, Storage> storage_pool, but for dbapi::sqlite::databases instead of Storages.
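
A per-thread pool of the kind described above could look roughly like this. It is a sketch under assumptions, not the DBConn implementation: `FakeConnection` stands in for the real connection type, and `PerThreadPool` and its members are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <unordered_map>

// Hypothetical stand-in for a database connection.
struct FakeConnection {
  int id;
};

class PerThreadPool {
 public:
  // Returns the calling thread's connection, creating it on first use.
  std::shared_ptr<FakeConnection> get() {
    const auto tid = std::this_thread::get_id();
    {
      // Fast path: the thread already has a connection.
      std::shared_lock lock(mutex_);
      auto it = pool_.find(tid);
      if (it != pool_.end()) {
        return it->second;
      }
    }
    // Slow path: take the exclusive lock and create the entry if missing.
    std::unique_lock lock(mutex_);
    auto& slot = pool_[tid];
    if (!slot) {
      slot = std::make_shared<FakeConnection>(FakeConnection{next_id_++});
    }
    return slot;
  }

  std::size_t size() const {
    std::shared_lock lock(mutex_);
    return pool_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::thread::id, std::shared_ptr<FakeConnection>> pool_;
  int next_id_ = 0;
};
```

Repeated calls from the same thread return the same connection, so a new one is only opened the first time a given thread asks for it.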

Signed-off-by: Xavi Garcia <[email protected]>

This adds a temporary workaround that allows sqlite_modern_cpp and sqlite_orm to be used together. It adds a new pool of connections holding a std::shared_ptr<sqlite3>, because the database objects from sqlite_modern_cpp have a constructor that takes that pointer, and it is the only thing they need. This should be reworked once the code is fully migrated to sqlite_modern_cpp only. Signed-off-by: Xavi Garcia <[email protected]>

0xavi0 · 2023-11-15T16:07:31Z

I've added a temporary workaround to be able to use sqlite_modern_cpp along with sqlite_orm.
As the initial connection (and creation) of the database is done with the sqlite_orm code, I needed to store a std::shared_ptr<sqlite3> for that connection.

I'm not storing the whole dbapi::sqlite::database object because the constructor that takes the std::shared_ptr<sqlite3> is only doing this:

database(std::shared_ptr<sqlite3> db):
			_db(db) {}

As all the code before this patch was using a copy of dbapi::sqlite::database I thought it was more straightforward to keep using the copy (given that the copy is not expensive).

We may prefer to revisit this once the initial connection is done with sqlite_modern_cpp.

Also... I'm using the same locks, as the actual code might request the database connection from code based on both libraries.

It's a bit messy right now but I don't know if we should spend too much time on something that will be possibly deleted.

I would not merge this PR before Friday, so it's not released next week, that would give us more time to finish all the migration work before next release.
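
The cheap-copy argument above can be sketched with a wrapper that holds nothing but a shared pointer. This is an illustration under assumptions: `FakeHandle` stands in for `sqlite3`, and this `Database` class is a hypothetical reduction, not `dbapi::sqlite::database` itself.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for the sqlite3 handle.
struct FakeHandle {};

// Wrapper whose only state is the shared handle, like the constructor
// quoted above suggests.
class Database {
 public:
  explicit Database(std::shared_ptr<FakeHandle> db) : db_(std::move(db)) {}

  const FakeHandle* raw() const { return db_.get(); }

 private:
  std::shared_ptr<FakeHandle> db_;
};
```

Copying such a wrapper only bumps a reference count, and every copy refers to the same underlying handle.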

giubacc · 2023-11-16T14:21:19Z

src/rgw/driver/sfs/sqlite/bindings/blob.h

+    std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>,
+    RGWUserCaps, std::list<std::string>, std::map<int, std::string>,
+    RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf,
+    std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>;


Should we add also: RGWBucketInfo?
Saying this because I'm using encode/decode functions over this type but maybe not needed.

If we store the whole bucket info as a blob we can't then query over specific bucket columns. That's why we didn't store buckets that way.

giubacc · 2023-11-16T14:33:40Z

src/rgw/driver/sfs/sqlite/sqlite_objects.cc

+      << bucket_id << object_name;
+  std::optional<DBObject> ret_object;
+  for (auto&& row : rows) {
+    ret_object = DBObject(row);


std::move(row), otherwise you are calling the non-move constructor of DBObject (hopefully the move constructor of DBObject would be faster than the classical copy constructor).

std::move just does a cast. We also would need to add the constructor that has the rvalue, otherwise it would still call the copy one.

Good point, though, I will experiment and see if we can avoid copies.

The && here is a universal (forwarding) reference, which accepts everything.

std::move just does a cast.

These considerations arise because when you read auto&& row you'd expect the row reference to then be bound to move-enabled functions/constructors/methods.
Otherwise there is no need to use the && notation in the for.

Good point, though, I will experiment and see if we can avoid copies.

yup, these considerations are valid only if we support a DBObject(.. &&row) constructor (and if with that implementation we can obtain some advantages)

Regarding the move semantics for all conversions... I've done a few experiments and checked the sqlite modern cpp library, and I think we can make use of moves, but we'll still need extra work to take advantage of them in the conversion utils file.
Right now we're not using move semantics for the optional assignment values, but I guess it could be beneficial in the future.
We said to explore this in a future PR, when the whole port to sqlite modern cpp is finished.

Anyway... as the rows are defined with the && notation I'm using std::move and changed the constructors involved.
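
The point discussed in this thread, that std::move is only a cast and that an rvalue constructor must exist for a move to happen, can be sketched as follows. `Row`, `Target` and `convert` are hypothetical names for illustration, not the types from this PR.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type holding one column.
struct Row {
  std::string payload;
};

// Hypothetical conversion target with both constructors.
struct Target {
  std::string payload;
  bool built_by_move = false;

  // Copy path: used whenever the argument is an lvalue.
  explicit Target(const Row& row) : payload(row.payload) {}

  // Move path: only reachable when the caller passes an rvalue.
  explicit Target(Row&& row)
      : payload(std::move(row.payload)), built_by_move(true) {}
};

inline std::vector<Target> convert(std::vector<Row>& rows, bool use_move) {
  std::vector<Target> out;
  out.reserve(rows.size());
  for (auto&& row : rows) {
    if (use_move) {
      // The cast makes the Row&& constructor eligible.
      out.emplace_back(std::move(row));
    } else {
      // `row` is a named lvalue here, so the const Row& constructor runs.
      out.emplace_back(row);
    }
  }
  return out;
}
```

Without the `Row&&` constructor, both branches would end up in the copy constructor, which matches the remark that the cast alone is not enough.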

giubacc · 2023-11-16T14:38:24Z

src/rgw/driver/sfs/sqlite/sqlite_query_utils.h

+  auto rows = db << fmt::format("SELECT * FROM {};", table_name);
+  std::vector<Target> ret;
+  for (auto&& row : rows) {
+    ret.emplace_back(Target(row));


std::move(row) , same considerations from above

giubacc · 2023-11-16T14:38:47Z

src/rgw/driver/sfs/sqlite/sqlite_query_utils.h

+         << column_value;
+  std::vector<Target> ret;
+  for (auto&& row : rows) {
+    ret.emplace_back(Target(row));


std::move(row) , same considerations from above

giubacc · 2023-11-16T14:39:21Z

src/rgw/driver/sfs/sqlite/sqlite_query_utils.h

+         << key_value;
+  std::optional<Target> ret;
+  for (auto&& row : rows) {
+    ret = Target(row);


std::move(row) , same as above

giubacc · 2023-11-16T14:42:20Z

src/rgw/driver/sfs/sqlite/sqlite_users.cc

+  dbapi::sqlite::database db = conn->get();
+  auto rows = db << R"sql(SELECT user_id FROM users;)sql";
+  std::vector<std::string> ret;
+  for (std::tuple<std::string> row : rows) {


maybe you can use the && notation here if row can be moved and use std::move(row) in the subsequent constructor

We know in this case that the type returned by the query is std::tuple<std::string>.

I used the && notation in the templated code because we don't know the type at that point.
I've also tested using the deduction form (auto&&) and the tuple form and in both cases the library is creating a row_iterator & and we can always cast to an rvalue using std::move. Anyway, all this will be revisited when the port to sqlite modern cpp is finished.

In this case it's the std::string inside the tuple that we can move.
I've changed the code to move the string.
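
Moving the string out of each single-column tuple could look like this. It is a sketch with a plain vector standing in for the query result, and `collect_ids` is a hypothetical name.

```cpp
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Takes the rows by value so the strings inside can be moved out safely.
inline std::vector<std::string> collect_ids(
    std::vector<std::tuple<std::string>> rows) {
  std::vector<std::string> ret;
  ret.reserve(rows.size());
  for (auto& row : rows) {
    // Move the string element itself; the tuple is left holding a
    // valid but unspecified string.
    ret.emplace_back(std::move(std::get<0>(row)));
  }
  return ret;
}
```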

giubacc · 2023-11-16T14:47:38Z

src/rgw/driver/sfs/sqlite/sqlite_users.cc

+    SELECT user_id FROM access_keys
+    WHERE access_key = ?;)sql"
+                 << key;
+  for (std::tuple<std::string> row : rows) {


I'd avoid using a for in this way for readability, if we are interested in the first record, there should be a more straightforward way to obtain that.
Also, same considerations on moves, maybe you can use the && notation here if row can be moved and use std::move(row) in the subsequent assignment

I'm afraid this library is not offering direct access to rows.
sqlite modern cpp is not creating an intermediate vector with all results in it, like sqlite_orm.
This will give us hopefully faster access, but all we get from the library is the classic methods for iterators (begin, end, operator++...)

This library calls sqlite's step function (sqlite3_step) on every iteration, so there is no straightforward way to access the first row of a query (without copying the result to an intermediate vector as stated above). We get the data as it is accessed, right after the call to sqlite3_step.

I'm afraid this library is not offering direct access to rows.

I think this is more than reasonable, otherwise it would mean the library would load in memory an arbitrary amount of rows, I really hope the library is implementing an iterator like behavior :) .

I think the range-based for in C++ is just syntactic sugar over an iterator taken from the collection you are iterating on.
If that is the case, it should not be difficult to access the first element with a rows.begin() statement.
Anyway, as you prefer, saying this just because I don't like seeing a for used in this way, but it is just personal taste.

oh, do you mean by doing this?

auto row = *rows.begin();

I tried that, but it didn't work as expected.

I tried doing this:

if (rows.begin() != rows.end()) {
  std::tuple<std::string> val = *rows.begin();
}

The above didn't work because calling begin changes the state internally, so the next time we call begin the value is not returned as expected. I gave up because, although I dislike the for thing, there's a lot to do with the port.

This seems to work fine, so I've changed the code to do it this way:

auto iter = rows.begin();
if (iter != rows.end()) {
  std::tuple<std::string> row = *iter;
  ...
}

Yup, as said, this is a matter of personal taste, not really an issue ;).
Thanks!
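
The behaviour described in this thread can be reproduced with a toy single-pass range whose begin() advances an internal cursor. This is a hypothetical model of the effect, not sqlite_modern_cpp's implementation; `SinglePassRows` and `first_row` are illustrative names.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy range: every begin() and every ++ fetches the next row, similar to
// stepping a prepared statement.
class SinglePassRows {
 public:
  explicit SinglePassRows(std::vector<std::string> data)
      : data_(std::move(data)) {}

  class iterator {
   public:
    iterator(SinglePassRows* owner, bool at_end)
        : owner_(owner), at_end_(at_end) {}
    const std::string& operator*() const { return owner_->current_; }
    iterator& operator++() {
      at_end_ = !owner_->step();
      return *this;
    }
    bool operator!=(const iterator& other) const {
      return at_end_ != other.at_end_;
    }

   private:
    SinglePassRows* owner_;
    bool at_end_;
  };

  // Calling begin() consumes a row, so it must be called only once.
  iterator begin() { return iterator(this, !step()); }
  iterator end() { return iterator(this, true); }

 private:
  bool step() {
    if (next_ >= data_.size()) {
      return false;
    }
    current_ = data_[next_++];
    return true;
  }

  std::vector<std::string> data_;
  std::size_t next_ = 0;
  std::string current_;
};

// Keeps the iterator from a single begin() call and reuses it.
inline std::optional<std::string> first_row(SinglePassRows& rows) {
  auto iter = rows.begin();
  if (iter != rows.end()) {
    return *iter;
  }
  return std::nullopt;
}
```

With this model, a second begin() on a one-row result already compares equal to end(), which mirrors why comparing `rows.begin()` and then dereferencing a fresh `rows.begin()` did not behave as expected.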

irq0 · 2023-11-16T15:14:40Z

src/rgw/driver/sfs/sqlite/dbconn.cc

@@ -187,6 +196,24 @@ StorageRef DBConn::get_storage() {
  }
 }

+dbapi::sqlite::database DBConn::get() {


If a thread already called get_storage(), this returns that connection (set in on_open) with the sqlite init we expect (busy timeout, error log, profiling settings).

If that thread did not, this is a new connection with sqlite modern cpp default sqlite_open settings.

We should avoid mixing sqlite connections with different initializations.

Would it suffice to just call get_storage() in the catch, ignore the result and take storage_pool_new.at() which was set in on_open?

Good point (I was wondering about that too). I suspect that workaround you suggested should work.

Yup, that's something I was afraid of. I will try that workaround.

I changed this to use get_connection() to open the db the same way for all cases, while we finish porting the rest of the code.

irq0 · 2023-11-16T15:16:20Z

src/rgw/driver/sfs/sqlite/bindings/blob.h


-template <>
-struct __is_sqlite_blob<rgw_placement_rule> : std::true_type {};
+// list of types that are stored as blobs and have the encode/decode functions


This is nice! A comment like "To add automagic blob handling for a datastructure with ceph encode/decode add it here" would be awesome. I suspect we'll have to blobify something else eventually

sure, I'll add a comment.
In fact I was also thinking to do the same for this part:

// by default type's decode function is under the ceph namespace
template <typename T>
struct __ceph_ns_decode : std::true_type {};

template <typename T>
inline constexpr bool ceph_ns_decode = __ceph_ns_decode<T>::value;

// specialize the ones that are not under the ceph namespace
template <>
struct __ceph_ns_decode<RGWAccessControlPolicy> : std::false_type {};

template <>
struct __ceph_ns_decode<RGWQuotaInfo> : std::false_type {};

template <>
struct __ceph_ns_decode<RGWObjectLock> : std::false_type {};

template <>
struct __ceph_ns_decode<RGWUserCaps> : std::false_type {};

template <>
struct __ceph_ns_decode<ACLOwner> : std::false_type {};

template <>
struct __ceph_ns_decode<rgw_placement_rule> : std::false_type {};

The idea is to add a tuple and avoid having to specialise this way.
I'll add more verbose comments to explain how to add a new blob type.

irq0 · 2023-11-16T15:21:05Z

src/rgw/driver/sfs/sqlite/dbconn.h

 // TODO(https://github.com/aquarist-labs/s3gw/issues/788): Make
 // dbapi::sqlite::database the primary interface for sqlite3.
 class DBConn {
 private:
  std::unordered_map<std::thread::id, Storage> storage_pool;
+  std::unordered_map<std::thread::id, ConnectionNewLib> storage_pool_new;


Just a general discussion note:

It's a bit suboptimal that sqlite_modern_cpp wants a shared ptr. I'd rather have DBConn own all the sqlite connections with a unique_ptr. Its lifetime is already equal to sfs / RGW.

The nice thing is that with sqlite_modern_cpp's std::shared_ptr constructor we can take full charge of the sqlite database initialization.

irq0 · 2023-11-16T15:23:53Z

src/rgw/driver/sfs/sqlite/sqlite_query_utils.h

+    const std::string& key_name, const KeyType& key_value
+) {
+  auto rows =
+      db << fmt::format("SELECT * FROM {} WHERE {} = ?;", table_name, key_name)


Add a LIMIT 1?

yup, will do
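
The suggested change amounts to appending LIMIT 1 to the single-row lookup. A minimal sketch of the resulting query text, using plain string concatenation instead of fmt::format so it has no extra dependencies; `select_one_by_key` is a hypothetical helper name.

```cpp
#include <string>

// Builds the single-row lookup with a bound parameter for the key value.
// Table and column names are assumed to come from trusted code, not from
// user input, since they are spliced into the SQL text.
inline std::string select_one_by_key(
    const std::string& table_name, const std::string& key_name) {
  return "SELECT * FROM " + table_name + " WHERE " + key_name +
         " = ? LIMIT 1;";
}
```

With the limit in place, SQLite can stop after the first matching row even when the key column is not unique.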


0xavi0 force-pushed the sqlite-modern-blobs-users-objects branch from 2a30782 to 5f4ae90 Compare November 14, 2023 09:03

0xavi0 marked this pull request as draft November 14, 2023 10:22

jecluis assigned 0xavi0 Nov 14, 2023

jecluis added kind/enhancement Change that positively impacts existing code area/rgw-sfs RGW & SFS related labels Nov 14, 2023

0xavi0 added 2 commits November 15, 2023 16:18

rgw/sfs: Fix sqlite_users bug

0e21b59

Signed-off-by: Xavi Garcia <[email protected]>

0xavi0 marked this pull request as ready for review November 15, 2023 16:07

0xavi0 requested review from irq0, tserong, jecluis and giubacc November 15, 2023 16:07

jecluis modified the milestones: v0.23.0, v0.22.0, v0.24.0 Nov 15, 2023

jecluis added the priority/1 Should be fixed for next release label Nov 15, 2023

giubacc reviewed Nov 16, 2023


irq0 reviewed Nov 16, 2023


0xavi0 requested review from irq0 and giubacc November 22, 2023 16:52

rgw/sfs: Few changes around sqlite modern cpp adoption

d63d69c

Started using move semantics for conversions (more to come). DBConn::get uses sqlite_orm's `get_storage` so we are opening the database in the same way. More blob generalisation. Signed-off-by: Xavi Garcia <[email protected]>

0xavi0 force-pushed the sqlite-modern-blobs-users-objects branch from 4e2c8fe to d63d69c Compare November 23, 2023 08:44

jecluis approved these changes Dec 4, 2023


jecluis merged commit 1971d89 into aquarist-labs:s3gw Dec 4, 2023
5 checks passed
