Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1952 unique index2 #2228

Merged
merged 16 commits into from
Apr 17, 2018
Merged

1952 unique index2 #2228

merged 16 commits into from
Apr 17, 2018

Conversation

mariusztrela
Copy link

@mariusztrela mariusztrela commented Mar 14, 2018

#1952

short description

It has been made 2 changes:

  • simplification of index by_location in operation_object

before: ordered_unique( block, trx_in_block, op_in_trx, virtual_op, id )
after: ordered_non_unique( block )

This index doesn't attend in consensus, therefore there is't any problem with potential incorrect working. This index is used only during API call.

  • removing two excess indexes in comment_vote_object
    before: by_comment_voter, by_voter_comment, by_voter_last_update, by_comment_weight_voter
    after: by_comment_voter, by_voter_comment

Missing indexed are simulated by algorithm.

correctness tests
During 1 week were made hundred of thousands API calls to builds: develop, 1952-unique-index2( using pyresttest tool ).
Comparision showed, that result from above servers are identical.

performance tests

  • reindex( develop branch vs 1952-unique-index2 branch ): branch 1952-unique-index2 faster 10 minutes

  • Time of 5000 different calls[ s ]( local machine, 1 thread, used tool pyresttest )
    list_votes;by_voter_last_update ( 12min. 23 sec develop ) ( 12min. 20 sec. 1952-unique-index2 )

  • Time of 4000 different calls[ s ]( local machine, 1 thread, used tool pyresttest )
    get_ops_in_block ( 435s develop ) ( 435s 1952-unique-index2 )

Memory consumption

  • on fullnode without AH there was ~9GB less memory used
  • on consensus node 300MB less

@mvandeberg
Copy link
Contributor

@theoreticalbts The changes to the indices seems reasonable to me. I want your opinion on them as well. Currently, database_api is WIP, so I am not worried about the dense implementation or it even being correct right now.

@theoreticalbts
Copy link
Contributor

theoreticalbts commented Mar 26, 2018

  • pay_curators() refactor: Should be faster to sort() a vector< const comment_object* > instead of using a set< proxy_item >.
  • get_ops_in_block refactoring, similarly.
  • list_votes re-implementation is problematic (see explanation).

So about the last item, I didn't go through the code in detail. (It is quite complicated and uses template magic.) But I suspect it narrows down the set of objects as much as possible using an index, then sorts all the objects in the set into an in-memory structure. The problem with this approach is it defeats the purpose of pagination. The purpose of pagination is, by limiting the number of requested objects to M objects, we can limit the work done by an API call to O(M) database operations, even if there are N objects (where N is much larger than M). I didn't read the code in detail, but I suspect the new approach must be to retrieve all N objects, sort them, then throw all but M away. The problem is that now you're doing O(N) operations. It's not something you can really avoid -- you have to read all the objects to reconstruct the information in the deleted index, and that means O(N) operations just to even see them all.

Since we're breaking API's with appbase anyway, I suggest we simply delete database_api code for querying any of the deleted indexes. If users want to sort in ways not supported by built-in indexes, it's up to them to implement an external process that does the necessary indexing, or uses the sort-in-memory approach. If this is too harsh on backward compatibility (for example condenser cannot function without some of the deleted API options), we should either make the now-unnecessary indexes either (a) a compile-time option, or (b) a run-time option implemented by proxy objects in some plugin.

@mvandeberg
Copy link
Contributor

I am fine removing unused indices from database_api code, but we need to keep them in for the time being if they are used by condenser_api. To be clear, appbase is not breaking APIs. We are deprecating old, inefficient, and inconsistent APIs with a properly designed API. If, as part of that redesign, we want to change how certain data is returned, that is fine. We do need to support the old APIs in condenser_api for the time being however. Any change that breaks condenser_api needs to have a compatibility fix. If it is something we cannot change now, but want to in the future, comment it in source with an FC_TODO.

@mariusztrela
Copy link
Author

  • comment_vote_object instead of proxy_item
    I agree with you - fix is done.

  • vector instead of set
    I don't agree, because for vector double scanning is needed( inserting + sorting ). For set is only inserting.

  • list_votes re-implementation is problematic

In fact if we want to retrieve N records, now we have to retrieve N + K additional information to close actual subset.
Despite that additional records, there is no problem with performance. It was done many tests regarding to performance and no any slowdown was detected.

@mvandeberg
Copy link
Contributor

I agree that using a set instead of a vector for comment vote sorting is best. vector is alright, but as @mariusztrela points out, we need to insert and then sort, whereas we can insert into a sorted data structure with set. If we absolutely wanted to use a flat structure, a flat_set would be the correct choice.

Copy link
Contributor

@mvandeberg mvandeberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in agreement with @theoreticalbts regarding the database_api implementation. The purpose of the API is to return objects with the indices that we need for consensus. If we find a way to eliminate a consensus index, by definition of database_api, it does not need to be returned. Instead of a complicated reconstruction of the sort order in the API, let's simply eliminate it from the API. The API is WIP. If we don't ship it with the sort order, then it won't be used and we won't have to maintain a complicated implementation.

@@ -37,17 +37,30 @@ DEFINE_API_IMPL( account_history_api_chainbase_impl, get_ops_in_block )
{
return _db.with_read_lock( [&]()
{
std::multiset< api_operation_object > tmp_result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FC has multiset serialization. Let's change the returned type in get_ops_in_block_return to the multiset type to eliminate copying to a vector. When we implement this for rocksdb account history plugin, we can override the comparator to always return true so that it behaves as a queue.

@mariusztrela
Copy link
Author

mariusztrela commented Apr 17, 2018

@mvandeberg It was made changes according to your tips.

@mvandeberg mvandeberg merged commit 42cf1a1 into develop Apr 17, 2018
@mvandeberg mvandeberg deleted the 1952-unique-index2 branch April 17, 2018 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants