Skip to content
This repository has been archived by the owner on Jan 3, 2024. It is now read-only.

rgw/sfs: sqlite_modern_cpp blobs, users and objects changes #246

Merged

Conversation

0xavi0
Copy link

@0xavi0 0xavi0 commented Nov 14, 2023

Adds another step on moving the sqlite code to sqlite_modern_cpp.

Adds the code to store/retrieve ceph types as blobs. Adds a new mechanism to declare when a type that has encode and decode functions needs to be stored as a blob.

In order to add a new type, simple add the type to the following tuple.

using BlobTypes = std::tuple<
    rgw::sal::Attrs, ACLOwner, rgw_placement_rule,
    std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>,
    RGWUserCaps, std::list<std::string>, std::map<int, std::string>,
    RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf,
    std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>

Adds a new header (dbabpi_type_wrapper.h) file that should be included when declaring type bindings for sqlite_modern_cpp.
We need this to avoid circular dependencies with the bind_col_in_db function from sqlite_moden_cpp.
This (from the main sqlite_modern_cpp.h file):

	template<typename T> database_binder &operator<<(database_binder& db, index_binding_helper<T> val) {
		db._next_index(); --db._inx;
		int result = bind_col_in_db(db._stmt.get(), val.index, std::forward<T>(val.value));
		if(result != SQLITE_OK)
			exceptions::throw_sqlite_error(result, db.sql(), sqlite3_errmsg(db._db.get()));
		return db;
	}

calls bind_col_in_db, but the function specialisation needs to be declared first.
So, when declaring type bindings we should only include sqlite_modern_cpp/type_wrapper.h

Changes all BLOB types in Users to the binding types.

Changes all functions in sqlite_users to use sqlite_modern_cpp.

Deletes the conversions files for Users. (conversion is still needed because we need to create the RGWUserInfo object needed for SAL layer, the conversion required is only db->SAL )

Changes all funcions in sqlite_objects to use sqlite_modern_cpp

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

Adds another step on moving the sqlite code to sqlite_modern_cpp.

Adds the code to store/retrieve ceph types as blobs.
Adds a new mechanism to declare when a type that has encode and decode
functions needs to be stored as a blob.

In order to add a new type, simple add the type to the following tuple.
```c++
using BlobTypes = std::tuple<
    rgw::sal::Attrs, ACLOwner, rgw_placement_rule,
    std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>,
    RGWUserCaps, std::list<std::string>, std::map<int, std::string>,
    RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf,
    std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>
```

Adds a new header (`dbabpi_type_wrapper.h`) file that should be included when declaring type
bindings for `sqlite_modern_cpp`.
We need this to avoid circular dependencies with the `bind_col_in_db`
function from `sqlite_moden_cpp`.
This (from the main `sqlite_modern_cpp.h` file):
```c++
	template<typename T> database_binder &operator<<(database_binder& db, index_binding_helper<T> val) {
		db._next_index(); --db._inx;
		int result = bind_col_in_db(db._stmt.get(), val.index, std::forward<T>(val.value));
		if(result != SQLITE_OK)
			exceptions::throw_sqlite_error(result, db.sql(), sqlite3_errmsg(db._db.get()));
		return db;
	}
```
calls `bind_col_in_db`, but the function specialisation needs to be
declared first.
So, when declaring type bindings we should only include
`sqlite_modern_cpp/type_wrapper.h`

Changes all BLOB types in Users to the binding types.
Changes all functions in `sqlite_users` to use `sqlite_modern_cpp`.
Deletes the conversions files for Users. (conversion is still needed
because we need to create the `RGWUserInfo` object needed for SAL
layer, the conversion required is only db->SAL )

Changes all funcions in `sqlite_objects` to use `sqlite_modern_cpp`

Signed-off-by: Xavi Garcia <[email protected]>
@0xavi0 0xavi0 force-pushed the sqlite-modern-blobs-users-objects branch from 2a30782 to 5f4ae90 Compare November 14, 2023 09:03
@0xavi0 0xavi0 marked this pull request as draft November 14, 2023 10:22
@0xavi0
Copy link
Author

0xavi0 commented Nov 14, 2023

I think I'll need to work on getting connections from the connection pool, given that the concurrency unit tests fail.
So consider this as NOT READY for review.

@jecluis jecluis added kind/enhancement Change that positively impacts existing code area/rgw-sfs RGW & SFS related labels Nov 14, 2023
@tserong
Copy link
Member

tserong commented Nov 15, 2023

I think I'll need to work on getting connections from the connection pool, given that the concurrency unit tests fail.
So consider this as NOT READY for review.

Just looking at the current implementation of DBConn::get()...

dbapi::sqlite::database get() {
return dbapi::sqlite::database(get_storage()->filename());
}

...this isn't using the connection pool at all. That's just going to create a new dbapi::sqlite::database every time it's called. We'd need something like the existing std::unordered_map<std::thread::id, Storage> storage_pool, but for dbapi::sqlite::databases instead of Storages.

This adds a temporary workaround for being able to use sqlite_modern_cpp
and sqlite_orm together.

It adds a new pool of connections of std::shared_ptr<sqlite> because the
new database objects from sqlite_modern_cpp has a constructor that gets
that in its constructor and its the only thing needed.

This should be reworked once the code if fully migrated to
sqlite_modern_cpp only.

Signed-off-by: Xavi Garcia <[email protected]>
@0xavi0
Copy link
Author

0xavi0 commented Nov 15, 2023

I've added a temporary workaround to be able to use sqlite modern cpp along with sqlite_orm.
As the initial connection (and creation) of the database is done with the sqlite_orm code I needed to store the a std::shared_ptr<sqlite3> for that connection.

I'm not storing the whole dbapi::sqlite::database object because the constructor that takes the std::shared_ptr<sqlite3> is only doing this:

database(std::shared_ptr<sqlite3> db):
			_db(db) {}

As all the code before this patch was using a copy of dbapi::sqlite::database I thought it was more straightforward to keep using the copy (given that the copy is not expensive).

We maybe prefer to revisit this when the initial connection is done with sqlite modern cpp.

Also... I'm using the same locks as the actual code might request the database connection from code based on both libraries.

It's a bit messy right now but I don't know if we should spend too much time on something that will be possibly deleted.

I would not merge this PR before Friday, so it's not released next week, that would give us more time to finish all the migration work before next release.

@0xavi0 0xavi0 marked this pull request as ready for review November 15, 2023 16:07
@jecluis jecluis modified the milestones: v0.23.0, v0.22.0, v0.24.0 Nov 15, 2023
@jecluis jecluis added the priority/1 Should be fixed for next release label Nov 15, 2023
std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>,
RGWUserCaps, std::list<std::string>, std::map<int, std::string>,
RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf,
std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add also: RGWBucketInfo?
Saying this because I'm using encode/decode functions over this type but maybe not needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we store the whole bucket info as a blob we can't then query over specific bucket columns. That's why we didn't store buckets that way.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right!

<< bucket_id << object_name;
std::optional<DBObject> ret_object;
for (auto&& row : rows) {
ret_object = DBObject(row);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move(row) otherwise you are calling the non-move constructor of DBObject (hopefully move constructor of DBObject could be more fast than the classical copy constructor)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move just does a cast. We also would need to add the constructor that has the rvalue, otherwise it would still call the copy one.

Good point, though, I will experiment and see if we can avoid copies.

The && is here as an universal reference which accepts everything.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move just does a cast.

These considerations because when you read auto&& row you'd expect the row reference to be then bound with move enabled functions/constructor/methods.
Otherwise no need to use the && notation in the for.

Good point, though, I will experiment and see if we can avoid copies.

yup, this considerations are valid only if we support a DBObject(.. &&row) constructor (and if with that implementation we can obtain some advantages)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the move semantics for all conversions... I've done a few experiments and checked the sqlite modern cpp library and I think we can get use of moves, but we'll still need extra work for taking the benefit in the conversion utils file.
Right now we're not using move semantics for the optional assignment values, but I guess it could be beneficial in the future.
We said to explore this in a future PR, when the whole port to sqlite modern cpp is finished.

Anyway... as the rows are defined with the && notation I'm using std::move and changed the constructors involved.

auto rows = db << fmt::format("SELECT * FROM {};", table_name);
std::vector<Target> ret;
for (auto&& row : rows) {
ret.emplace_back(Target(row));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move(row) , same considerations from above

<< column_value;
std::vector<Target> ret;
for (auto&& row : rows) {
ret.emplace_back(Target(row));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move(row) , same considerations from above

<< key_value;
std::optional<Target> ret;
for (auto&& row : rows) {
ret = Target(row);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move(row) , same as above

dbapi::sqlite::database db = conn->get();
auto rows = db << R"sql(SELECT user_id FROM users;)sql";
std::vector<std::string> ret;
for (std::tuple<std::string> row : rows) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can use the && notation here if row can be moved and use std::move(row) in the subsequent constructor

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We know in this case that the type returned by the query is std::tuple<std::string>.

I used the && notation in the templated code because we don't know the type at that point.
I've also tested using the deduction form (auto&&) and the tuple form and in both cases the library is creating a row_iterator & and we can always cast to an rvalue using std::move. Anyway, all this will be revisited when the port to sqlite modern cpp is finished.

In this case it's the std::sting inside the tuple the one that we can move.
I've changed the code to move the string.

SELECT user_id FROM access_keys
WHERE access_key = ?;)sql"
<< key;
for (std::tuple<std::string> row : rows) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd avoid using a for in this way for readability, if we are interested in the first record, there should be a more straightforward way to obtain that.
Also, same considerations on moves, maybe you can use the && notation here if row can be moved and use std::move(row) in the subsequent assignment

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this library is not offering direct access to rows.
sqlite modern cpp is not creating an intermediate vector with all results in it, like sqlite_orm.
This will give us hopefully faster access, but all we get from the library is the classic methods for iterators (begin, end, operator++...)

This library is calling sqlite next functions for every iteration, so there is not straightforward way to access the first row of a query (without copying the result to an intermediate vector as stated above). We get the data as it is accessed right after the call of sqlite3 next.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this library is not offering direct access to rows.

I think this is more than reasonable, otherwise it would mean the library would load in memory an arbitrary amount of rows, I really hope the library is implementing an iterator like behavior :) .

I think enhanced for in C++ is just a sugar-syntax wrapper using an iterator taken from the collection you are iterating on.
If this were the case should not be difficult accessing the first element with a row.begin() statement.
Anyway, as you prefer, saying this just because I don't like seeing a for used in this way, but it is just personal taste.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, do you mean by doing this?

auto row = *rows.begin(); 

I tried that, but it didn't work as expected.

I tried doing this:

if (rows.begin() != rows.end()) {
  std::tuple<std::string> val = *rows.begin();
}

The above didn't work because calling begin changes the state internally and the next time we call begin value is not returned as expected. I gave up because although I dislike the for thing there's a lot to do with the port.

This seems to work fine, so I've changed the code to do it this way:

auto iter = rows.begin();
if (iter != rows.end()) {
  std::tuple<std::string> row = *iter;
  ...
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, as said, this is matter of personal tastes, not really an issue ;).
Thanks!

@@ -187,6 +196,24 @@ StorageRef DBConn::get_storage() {
}
}

dbapi::sqlite::database DBConn::get() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a thread already called get_storage() this returns that connection (set in on_open) with the sqlite init we exect (busy timeout, error log, profiling settings)

If that thread did not, this is a new connection with sqlite modern cpp default sqlite_open settings.

We should avoid mixing sqlite connections with different initializations.

Would it suffice to just call get_storage() in the catch, ignore the result and take storage_pool_new.at() which was set in on_open?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point (I was wondering about that too). I suspect that workaround you suggested should work.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, that's something I was afraid of. I will try that workaround.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to use get_connection() to open the db the same way for all cases, meanwhile we finish porting the rest of the code.


template <>
struct __is_sqlite_blob<rgw_placement_rule> : std::true_type {};
// list of types that are stored as blobs and have the encode/decode functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! A comment like "To add automagic blob handling for a datastructure with ceph encode/decode add it here" would be awesome. I suspect we'll have to blobify something else eventually

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'll add a comment.
In fact I was also thinking to do the same for this part:

// by default type's decode function is under the ceph namespace
template <typename T>
struct __ceph_ns_decode : std::true_type {};

template <typename T>
inline constexpr bool ceph_ns_decode = __ceph_ns_decode<T>::value;

// specialize the ones that are not under the ceph namespace
template <>
struct __ceph_ns_decode<RGWAccessControlPolicy> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWQuotaInfo> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWObjectLock> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWUserCaps> : std::false_type {};
template <>
struct __ceph_ns_decode<ACLOwner> : std::false_type {};
template <>
struct __ceph_ns_decode<rgw_placement_rule> : std::false_type {};

Adding a tuple and avoid to have to specialise this way.
I'll add more verbose comments to explain how to add a new blob type

// TODO(https://github.com/aquarist-labs/s3gw/issues/788): Make
// dbapi::sqlite::database the primary interface for sqlite3.
class DBConn {
private:
std::unordered_map<std::thread::id, Storage> storage_pool;
std::unordered_map<std::thread::id, ConnectionNewLib> storage_pool_new;
Copy link
Member

@irq0 irq0 Nov 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general discussion note:

It's a bit suboptimal that sqlite_modern_cpp want's a shared ptr. I'd rather have DBConn own all the sqlite connections with a unique_ptr. It's lifetime is already equal to sfs / RGW.

The nice thing is that with sqlite_modern_cpp's std::shared_ptr constructor we can take full charge of the sqlite database initialization.

const std::string& key_name, const KeyType& key_value
) {
auto rows =
db << fmt::format("SELECT * FROM {} WHERE {} = ?;", table_name, key_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a LIMIT 1?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, will do

@0xavi0 0xavi0 requested review from irq0 and giubacc November 22, 2023 16:52
Started using move semantics for conversions (more to come).
DBConn::get uses sqlite_orm's `get_storage` so we are opening the
database in the same way.
More blob generalisation.

Signed-off-by: Xavi Garcia <[email protected]>
@0xavi0 0xavi0 force-pushed the sqlite-modern-blobs-users-objects branch from 4e2c8fe to d63d69c Compare November 23, 2023 08:44
@jecluis jecluis merged commit 1971d89 into aquarist-labs:s3gw Dec 4, 2023
5 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/rgw-sfs RGW & SFS related kind/enhancement Change that positively impacts existing code priority/1 Should be fixed for next release
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants