1952 unique index2 #2228

mariusztrela · 2018-03-14T10:49:31Z

#1952

short description

Two changes were made:

simplification of index by_location in operation_object

before: ordered_unique( block, trx_in_block, op_in_trx, virtual_op, id )
after: ordered_non_unique( block )

This index does not take part in consensus, so simplifying it cannot cause incorrect consensus behavior. It is used only during API calls.

removal of two redundant indexes in comment_vote_object
before: by_comment_voter, by_voter_comment, by_voter_last_update, by_comment_weight_voter
after: by_comment_voter, by_voter_comment

The removed indexes are emulated algorithmically.
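As a rough illustration of what emulating a removed index can look like, the sketch below answers a by_voter_last_update style query using only a by_voter_comment ordering. It is not the code from this PR: the `vote` struct, the `std::map` standing in for the chainbase index, and the function name `votes_by_last_update` are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for comment_vote_object.
struct vote {
    std::string voter;
    std::uint64_t comment;      // id of the voted comment
    std::uint32_t last_update;  // time of the last change to the vote
};

// Stand-in for the surviving by_voter_comment index: ordered by (voter, comment).
using by_voter_comment = std::map<std::pair<std::string, std::uint64_t>, vote>;

// Emulates a by_voter_last_update query for one voter:
// range-scan that voter's entries, then order them by (last_update, comment).
std::vector<vote> votes_by_last_update(const by_voter_comment& idx,
                                       const std::string& voter) {
    std::vector<vote> out;
    auto it = idx.lower_bound(std::make_pair(voter, std::uint64_t{0}));
    for (; it != idx.end() && it->first.first == voter; ++it) {
        out.push_back(it->second);
    }
    std::sort(out.begin(), out.end(), [](const vote& a, const vote& b) {
        return std::tie(a.last_update, a.comment) <
               std::tie(b.last_update, b.comment);
    });
    return out;
}
```

The cost of this approach grows with the number of votes cast by the voter, not with the number of results the caller asked for.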

correctness tests
Over one week, hundreds of thousands of API calls were made against two builds, develop and 1952-unique-index2 (using the pyresttest tool).
Comparison showed that the results from both servers are identical.

performance tests

reindex (develop branch vs 1952-unique-index2 branch): the 1952-unique-index2 branch is 10 minutes faster
Time of 5000 different calls (local machine, 1 thread, pyresttest tool)
list_votes;by_voter_last_update: 12 min 23 s on develop, 12 min 20 s on 1952-unique-index2
Time of 4000 different calls (local machine, 1 thread, pyresttest tool)
get_ops_in_block: 435 s on develop, 435 s on 1952-unique-index2

Memory consumption

on a full node without AH, ~9 GB less memory was used
on a consensus node, 300 MB less

mvandeberg · 2018-03-14T19:14:03Z

@theoreticalbts The changes to the indices seem reasonable to me. I want your opinion on them as well. Currently, database_api is WIP, so I am not worried about the dense implementation or it even being correct right now.

theoreticalbts · 2018-03-26T22:03:00Z

pay_curators() refactor: Should be faster to sort() a vector< const comment_object* > instead of using a set< proxy_item >.
get_ops_in_block refactoring, similarly.
list_votes re-implementation is problematic (see explanation).

So about the last item, I didn't go through the code in detail. (It is quite complicated and uses template magic.) But I suspect it narrows down the set of objects as much as possible using an index, then sorts all the objects in the set into an in-memory structure. The problem with this approach is it defeats the purpose of pagination. The purpose of pagination is, by limiting the number of requested objects to M objects, we can limit the work done by an API call to O(M) database operations, even if there are N objects (where N is much larger than M). I didn't read the code in detail, but I suspect the new approach must be to retrieve all N objects, sort them, then throw all but M away. The problem is that now you're doing O(N) operations. It's not something you can really avoid -- you have to read all the objects to reconstruct the information in the deleted index, and that means O(N) operations just to even see them all.
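The complexity argument above can be made concrete with a small sketch. It is an illustration only, not code from this PR: `std::map` stands in for an index ordered by the requested key, and the names `page`, `object`, `paginate_with_index` and `paginate_by_sorting` are hypothetical. Each function reports how many stored objects it had to touch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// A page of results plus the number of stored objects that were touched.
struct page {
    std::vector<int> values;
    std::size_t visited = 0;
};

// Hypothetical stored object: a sort key and a payload.
struct object {
    int key;
    int value;
};

// With a matching index: seek to `start`, then walk at most `limit` entries.
page paginate_with_index(const std::map<int, int>& by_key, int start,
                         std::size_t limit) {
    page p;
    for (auto it = by_key.lower_bound(start);
         it != by_key.end() && p.values.size() < limit; ++it) {
        p.values.push_back(it->second);
        ++p.visited;
    }
    return p;
}

// Without a matching index: read and sort all N objects, then keep `limit`.
page paginate_by_sorting(std::vector<object> objects, int start,
                         std::size_t limit) {
    page p;
    p.visited = objects.size();  // every object is read at least once
    std::sort(objects.begin(), objects.end(),
              [](const object& a, const object& b) { return a.key < b.key; });
    auto it = std::lower_bound(
        objects.begin(), objects.end(), start,
        [](const object& o, int k) { return o.key < k; });
    for (; it != objects.end() && p.values.size() < limit; ++it) {
        p.values.push_back(it->value);
    }
    return p;
}
```

Both functions return the same page, but the first touches M objects after an O(log N) seek, while the second touches all N before it can return anything.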

Since we're breaking APIs with appbase anyway, I suggest we simply delete database_api code for querying any of the deleted indexes. If users want to sort in ways not supported by built-in indexes, it's up to them to implement an external process that does the necessary indexing, or uses the sort-in-memory approach. If this is too harsh on backward compatibility (for example condenser cannot function without some of the deleted API options), we should make the now-unnecessary indexes either (a) a compile-time option, or (b) a run-time option implemented by proxy objects in some plugin.

mvandeberg · 2018-03-26T22:25:10Z

I am fine removing unused indices from database_api code, but we need to keep them in for the time being if they are used by condenser_api. To be clear, appbase is not breaking APIs. We are deprecating old, inefficient, and inconsistent APIs in favor of a properly designed API. If, as part of that redesign, we want to change how certain data is returned, that is fine. We do need to support the old APIs in condenser_api for the time being, however. Any change that breaks condenser_api needs to have a compatibility fix. If it is something we cannot change now, but want to in the future, comment it in source with an FC_TODO.

mariusztrela · 2018-04-12T08:12:22Z

comment_vote_object instead of proxy_item
I agree with you; the fix is done.
vector instead of set
I don't agree, because a vector needs two passes (inserting, then sorting), whereas a set needs only the insertion.
list_votes re-implementation is problematic

In fact, to return N records we now have to read N + K records; the K additional records are needed to complete the current subset.
Despite those additional records, performance is not a problem. Many performance tests were run and no slowdown was detected.

mvandeberg · 2018-04-13T16:43:10Z

I agree that using a set instead of a vector for comment vote sorting is best. vector is alright, but as @mariusztrela points out, we need to insert and then sort, whereas we can insert into a sorted data structure with set. If we absolutely wanted to use a flat structure, a flat_set would be the correct choice.
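For reference, the two options discussed here can be sketched side by side. This is an illustration under assumptions, not the pay_curators() code: `weighted_vote`, `by_weight_desc`, `order_with_set` and `order_with_vector` are hypothetical names. Both variants perform O(N log N) comparisons; they differ mainly in allocation pattern and memory locality.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for a vote that is ordered by weight.
struct weighted_vote {
    std::uint64_t weight;
    std::uint32_t voter_id;
};

// Heavier votes first; ties are broken by voter id so the order is total.
struct by_weight_desc {
    bool operator()(const weighted_vote& a, const weighted_vote& b) const {
        if (a.weight != b.weight) return a.weight > b.weight;
        return a.voter_id < b.voter_id;
    }
};

// Option 1: insert into a sorted container, with no separate sort step.
std::vector<std::uint32_t> order_with_set(const std::vector<weighted_vote>& in) {
    std::set<weighted_vote, by_weight_desc> sorted(in.begin(), in.end());
    std::vector<std::uint32_t> out;
    for (const auto& v : sorted) out.push_back(v.voter_id);
    return out;
}

// Option 2: keep contiguous storage and sort it once.
std::vector<std::uint32_t> order_with_vector(std::vector<weighted_vote> in) {
    std::sort(in.begin(), in.end(), by_weight_desc{});
    std::vector<std::uint32_t> out;
    for (const auto& v : in) out.push_back(v.voter_id);
    return out;
}
```

As long as each (weight, voter_id) pair is unique, both functions produce the same order, so the choice between them is about cost rather than correctness.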

mvandeberg

I am in agreement with @theoreticalbts regarding the database_api implementation. The purpose of the API is to return objects with the indices that we need for consensus. If we find a way to eliminate a consensus index, by definition of database_api, it does not need to be returned. Instead of a complicated reconstruction of the sort order in the API, let's simply eliminate it from the API. The API is WIP. If we don't ship it with the sort order, then it won't be used and we won't have to maintain a complicated implementation.

mvandeberg · 2018-04-13T16:49:19Z

libraries/plugins/apis/account_history_api/account_history_api.cpp

@@ -37,17 +37,30 @@ DEFINE_API_IMPL( account_history_api_chainbase_impl, get_ops_in_block )
 {
   return _db.with_read_lock( [&]()
   {
+      std::multiset< api_operation_object > tmp_result;


FC has multiset serialization. Let's change the returned type in get_ops_in_block_return to the multiset type to eliminate copying to a vector. When we implement this for rocksdb account history plugin, we can override the comparator to always return true so that it behaves as a queue.
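A minimal sketch of this suggestion, using only the standard library, might look as follows. The `op` struct is a reduced, hypothetical stand-in for api_operation_object, `std::multimap` stands in for the ordered_non_unique( block ) index, and the function signature is an assumption. For the queue-like variant the sketch uses a comparator that always returns false, not true, because a `std::multiset` comparator must be a strict weak ordering; the thread does not say which exact form the rocksdb plugin would use.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <tuple>

// Hypothetical, reduced stand-in for api_operation_object.
struct op {
    std::uint32_t block;
    std::uint32_t trx_in_block;
    std::uint16_t op_in_trx;
    std::uint64_t virtual_op;
};

// Orders operations by their position inside a block.
struct by_position {
    bool operator()(const op& a, const op& b) const {
        return std::tie(a.trx_in_block, a.op_in_trx, a.virtual_op) <
               std::tie(b.trx_in_block, b.op_in_trx, b.virtual_op);
    }
};

// Treats all elements as equivalent. Since C++11, std::multiset inserts an
// equivalent element after the existing ones, so iteration follows
// insertion order and the container behaves like a queue.
struct keep_insertion_order {
    bool operator()(const op&, const op&) const { return false; }
};

// Stand-in for the simplified by_location index: ordered_non_unique( block ).
using by_block = std::multimap<std::uint32_t, op>;

// Collects the operations of one block. The multiset orders them on insert
// and is returned directly, without copying into a vector.
std::multiset<op, by_position> get_ops_in_block(const by_block& idx,
                                                std::uint32_t block) {
    std::multiset<op, by_position> result;
    auto range = idx.equal_range(block);
    for (auto it = range.first; it != range.second; ++it) {
        result.insert(it->second);
    }
    return result;
}
```

With `keep_insertion_order` as the comparator, a backend that already yields operations in the right order can fill the same container type without paying for any reordering.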

mariusztrela · 2018-04-17T12:43:53Z

@mvandeberg Changes were made according to your suggestions.

mariusztrela assigned mvandeberg, mariusztrela and vogel76 Mar 14, 2018

goldibex mentioned this pull request Mar 27, 2018

Eliminate Unused Indices #1950

Closed

mariusztrela added the status/WIP label Apr 11, 2018

mariusztrela force-pushed the 1952-unique-index2 branch from 6090e6d to 3e70491 Compare April 12, 2018 08:10

mariusztrela added status/ReadyToMerge and removed status/WIP labels Apr 12, 2018

mvandeberg suggested changes Apr 13, 2018

Mariusz-Trela added 16 commits April 17, 2018 13:45

Issue #1952 - review unique index.

ecf406f

Small correction + refactoring.

fce6538

Work in 'list_votes': by_comment_voter, by_voter_comment, by_voter_la…

55ef9d5

…st_update.

Issue 1952 - definitely work: by_voter_comment and `by_voter_last_u…

94e16b6

…pdate`.

Refactoring + added processing 'by_comment_weight_voter' flag.

e123604

Refactoring part 2 + some comments.

78b7b77

After merge.

e1d5a11

Refactoring - part 1.

9fe382f

Refactoring - part 2.

14e5edd

Refactoring - part 3.

49009c7

Small fix.

dbb4d24

Small fix 2.

11fa822

Small fix 3.

c53c3a0

Simplifing objects held in std::set

2e87dd9

Refactoring in 'votes_sorter'

2170a8d

Refactoring + removing excess types of sort.

65b72ae

mariusztrela force-pushed the 1952-unique-index2 branch from 3e70491 to 65b72ae Compare April 17, 2018 12:04

mvandeberg approved these changes Apr 17, 2018

mvandeberg merged commit 42cf1a1 into develop Apr 17, 2018

mvandeberg deleted the 1952-unique-index2 branch April 17, 2018 17:49
