-
Notifications
You must be signed in to change notification settings - Fork 10
rgw/sfs: sqlite_modern_cpp blobs, users and objects changes #246
rgw/sfs: sqlite_modern_cpp blobs, users and objects changes #246
Conversation
Adds another step on moving the sqlite code to sqlite_modern_cpp. Adds the code to store/retrieve ceph types as blobs. Adds a new mechanism to declare when a type that has encode and decode functions needs to be stored as a blob. In order to add a new type, simple add the type to the following tuple. ```c++ using BlobTypes = std::tuple< rgw::sal::Attrs, ACLOwner, rgw_placement_rule, std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>, RGWUserCaps, std::list<std::string>, std::map<int, std::string>, RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf, std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info> ``` Adds a new header (`dbabpi_type_wrapper.h`) file that should be included when declaring type bindings for `sqlite_modern_cpp`. We need this to avoid circular dependencies with the `bind_col_in_db` function from `sqlite_moden_cpp`. This (from the main `sqlite_modern_cpp.h` file): ```c++ template<typename T> database_binder &operator<<(database_binder& db, index_binding_helper<T> val) { db._next_index(); --db._inx; int result = bind_col_in_db(db._stmt.get(), val.index, std::forward<T>(val.value)); if(result != SQLITE_OK) exceptions::throw_sqlite_error(result, db.sql(), sqlite3_errmsg(db._db.get())); return db; } ``` calls `bind_col_in_db`, but the function specialisation needs to be declared first. So, when declaring type bindings we should only include `sqlite_modern_cpp/type_wrapper.h` Changes all BLOB types in Users to the binding types. Changes all functions in `sqlite_users` to use `sqlite_modern_cpp`. Deletes the conversions files for Users. (conversion is still needed because we need to create the `RGWUserInfo` object needed for SAL layer, the conversion required is only db->SAL ) Changes all funcions in `sqlite_objects` to use `sqlite_modern_cpp` Signed-off-by: Xavi Garcia <[email protected]>
2a30782
to
5f4ae90
Compare
I think I'll need to work on getting connections from the connection pool, given that the concurrency unit tests fail. |
Just looking at the current implementation of ceph/src/rgw/driver/sfs/sqlite/dbconn.h Lines 288 to 290 in 5f48c7b
...this isn't using the connection pool at all. That's just going to create a new |
Signed-off-by: Xavi Garcia <[email protected]>
This adds a temporary workaround for being able to use sqlite_modern_cpp and sqlite_orm together. It adds a new pool of connections of std::shared_ptr<sqlite> because the new database objects from sqlite_modern_cpp has a constructor that gets that in its constructor and its the only thing needed. This should be reworked once the code if fully migrated to sqlite_modern_cpp only. Signed-off-by: Xavi Garcia <[email protected]>
I've added a temporary workaround to be able to use I'm not storing the whole database(std::shared_ptr<sqlite3> db):
_db(db) {} As all the code before this patch was using a copy of We maybe prefer to revisit this when the initial connection is done with Also... I'm using the same locks as the actual code might request the database connection from code based on both libraries. It's a bit messy right now but I don't know if we should spend too much time on something that will be possibly deleted. I would not merge this PR before Friday, so it's not released next week, that would give us more time to finish all the migration work before next release. |
std::map<std::string, RGWAccessKey>, std::map<std::string, RGWSubUser>, | ||
RGWUserCaps, std::list<std::string>, std::map<int, std::string>, | ||
RGWQuotaInfo, std::set<std::string>, RGWBucketWebsiteConf, | ||
std::map<std::string, uint32_t>, RGWObjectLock, rgw_sync_policy_info>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add also: RGWBucketInfo?
Saying this because I'm using encode/decode functions over this type but maybe not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we store the whole bucket info as a blob we can't then query over specific bucket columns. That's why we didn't store buckets that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right!
<< bucket_id << object_name; | ||
std::optional<DBObject> ret_object; | ||
for (auto&& row : rows) { | ||
ret_object = DBObject(row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(row)
otherwise you are calling the non-move constructor of DBObject (hopefully move constructor of DBObject could be more fast than the classical copy constructor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move
just does a cast. We also would need to add the constructor that has the rvalue, otherwise it would still call the copy one.
Good point, though, I will experiment and see if we can avoid copies.
The && is here as an universal reference which accepts everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move just does a cast.
These considerations because when you read auto&& row
you'd expect the row
reference to be then bound with move enabled functions/constructor/methods.
Otherwise no need to use the &&
notation in the for.
Good point, though, I will experiment and see if we can avoid copies.
yup, this considerations are valid only if we support a DBObject(.. &&row)
constructor (and if with that implementation we can obtain some advantages)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the move semantics for all conversions... I've done a few experiments and checked the sqlite modern cpp library and I think we can get use of moves, but we'll still need extra work for taking the benefit in the conversion utils file.
Right now we're not using move semantics for the optional assignment values, but I guess it could be beneficial in the future.
We said to explore this in a future PR, when the whole port to sqlite modern cpp is finished.
Anyway... as the rows are defined with the && notation I'm using std::move and changed the constructors involved.
auto rows = db << fmt::format("SELECT * FROM {};", table_name); | ||
std::vector<Target> ret; | ||
for (auto&& row : rows) { | ||
ret.emplace_back(Target(row)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(row)
, same considerations from above
<< column_value; | ||
std::vector<Target> ret; | ||
for (auto&& row : rows) { | ||
ret.emplace_back(Target(row)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(row) , same considerations from above
<< key_value; | ||
std::optional<Target> ret; | ||
for (auto&& row : rows) { | ||
ret = Target(row); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::move(row)
, same as above
dbapi::sqlite::database db = conn->get(); | ||
auto rows = db << R"sql(SELECT user_id FROM users;)sql"; | ||
std::vector<std::string> ret; | ||
for (std::tuple<std::string> row : rows) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can use the &&
notation here if row
can be moved and use std::move(row)
in the subsequent constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know in this case that the type returned by the query is std::tuple<std::string>
.
I used the && notation in the templated code because we don't know the type at that point.
I've also tested using the deduction form (auto&&
) and the tuple form and in both cases the library is creating a row_iterator &
and we can always cast to an rvalue using std::move
. Anyway, all this will be revisited when the port to sqlite modern cpp is finished.
In this case it's the std::sting inside the tuple the one that we can move.
I've changed the code to move the string.
SELECT user_id FROM access_keys | ||
WHERE access_key = ?;)sql" | ||
<< key; | ||
for (std::tuple<std::string> row : rows) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd avoid using a for
in this way for readability, if we are interested in the first record, there should be a more straightforward way to obtain that.
Also, same considerations on moves, maybe you can use the && notation here if row can be moved and use std::move(row) in the subsequent assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this library is not offering direct access to rows.
sqlite modern cpp is not creating an intermediate vector with all results in it, like sqlite_orm.
This will give us hopefully faster access, but all we get from the library is the classic methods for iterators (begin, end, operator++...)
This library is calling sqlite next functions for every iteration, so there is not straightforward way to access the first row of a query (without copying the result to an intermediate vector as stated above). We get the data as it is accessed right after the call of sqlite3 next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid this library is not offering direct access to rows.
I think this is more than reasonable, otherwise it would mean the library would load in memory an arbitrary amount of rows, I really hope the library is implementing an iterator like behavior :) .
I think enhanced for in C++ is just a sugar-syntax wrapper using an iterator taken from the collection you are iterating on.
If this were the case should not be difficult accessing the first element with a row.begin() statement.
Anyway, as you prefer, saying this just because I don't like seeing a for used in this way, but it is just personal taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, do you mean by doing this?
auto row = *rows.begin();
I tried that, but it didn't work as expected.
I tried doing this:
if (rows.begin() != rows.end()) {
std::tuple<std::string> val = *rows.begin();
}
The above didn't work because calling begin
changes the state internally and the next time we call begin
value is not returned as expected. I gave up because although I dislike the for
thing there's a lot to do with the port.
This seems to work fine, so I've changed the code to do it this way:
auto iter = rows.begin();
if (iter != rows.end()) {
std::tuple<std::string> row = *iter;
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, as said, this is matter of personal tastes, not really an issue ;).
Thanks!
@@ -187,6 +196,24 @@ StorageRef DBConn::get_storage() { | |||
} | |||
} | |||
|
|||
dbapi::sqlite::database DBConn::get() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a thread already called get_storage() this returns that connection (set in on_open) with the sqlite init we exect (busy timeout, error log, profiling settings)
If that thread did not, this is a new connection with sqlite modern cpp default sqlite_open settings.
We should avoid mixing sqlite connections with different initializations.
Would it suffice to just call get_storage() in the catch, ignore the result and take storage_pool_new.at() which was set in on_open?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point (I was wondering about that too). I suspect that workaround you suggested should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's something I was afraid of. I will try that workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this to use get_connection()
to open the db the same way for all cases, meanwhile we finish porting the rest of the code.
|
||
template <> | ||
struct __is_sqlite_blob<rgw_placement_rule> : std::true_type {}; | ||
// list of types that are stored as blobs and have the encode/decode functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice! A comment like "To add automagic blob handling for a datastructure with ceph encode/decode add it here" would be awesome. I suspect we'll have to blobify something else eventually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I'll add a comment.
In fact I was also thinking to do the same for this part:
// by default type's decode function is under the ceph namespace
template <typename T>
struct __ceph_ns_decode : std::true_type {};
template <typename T>
inline constexpr bool ceph_ns_decode = __ceph_ns_decode<T>::value;
// specialize the ones that are not under the ceph namespace
template <>
struct __ceph_ns_decode<RGWAccessControlPolicy> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWQuotaInfo> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWObjectLock> : std::false_type {};
template <>
struct __ceph_ns_decode<RGWUserCaps> : std::false_type {};
template <>
struct __ceph_ns_decode<ACLOwner> : std::false_type {};
template <>
struct __ceph_ns_decode<rgw_placement_rule> : std::false_type {};
Adding a tuple and avoid to have to specialise this way.
I'll add more verbose comments to explain how to add a new blob type
// TODO(https://github.com/aquarist-labs/s3gw/issues/788): Make | ||
// dbapi::sqlite::database the primary interface for sqlite3. | ||
class DBConn { | ||
private: | ||
std::unordered_map<std::thread::id, Storage> storage_pool; | ||
std::unordered_map<std::thread::id, ConnectionNewLib> storage_pool_new; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general discussion note:
It's a bit suboptimal that sqlite_modern_cpp want's a shared ptr. I'd rather have DBConn own all the sqlite connections with a unique_ptr. It's lifetime is already equal to sfs / RGW.
The nice thing is that with sqlite_modern_cpp's std::shared_ptr constructor we can take full charge of the sqlite database initialization.
const std::string& key_name, const KeyType& key_value | ||
) { | ||
auto rows = | ||
db << fmt::format("SELECT * FROM {} WHERE {} = ?;", table_name, key_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a LIMIT 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, will do
Started using move semantics for conversions (more to come). DBConn::get uses sqlite_orm's `get_storage` so we are opening the database in the same way. More blob generalisation. Signed-off-by: Xavi Garcia <[email protected]>
4e2c8fe
to
d63d69c
Compare
Adds another step on moving the sqlite code to sqlite_modern_cpp.
Adds the code to store/retrieve ceph types as blobs. Adds a new mechanism to declare when a type that has encode and decode functions needs to be stored as a blob.
In order to add a new type, simple add the type to the following tuple.
Adds a new header (
dbabpi_type_wrapper.h
) file that should be included when declaring type bindings forsqlite_modern_cpp
.We need this to avoid circular dependencies with the
bind_col_in_db
function fromsqlite_moden_cpp
.This (from the main
sqlite_modern_cpp.h
file):calls
bind_col_in_db
, but the function specialisation needs to be declared first.So, when declaring type bindings we should only include
sqlite_modern_cpp/type_wrapper.h
Changes all BLOB types in Users to the binding types.
Changes all functions in
sqlite_users
to usesqlite_modern_cpp
.Deletes the conversions files for Users. (conversion is still needed because we need to create the
RGWUserInfo
object needed for SAL layer, the conversion required is only db->SAL )Changes all funcions in
sqlite_objects
to usesqlite_modern_cpp
Checklist