ABI-less Decoding of Calls + Event #2

Open
wants to merge 1 commit into
base: main

sourabhniyogi

This PR adds new BigQuery UDFs that decode EVM calls and events without knowing an ABI, via 2 key functions:

  • abi_functions.PARSE_CALL(input)
  • abi_functions.PARSE_EVENT(data, topics)

To support ABI-less decoding, each of the above functions uses a large (~1MB) library containing 2 compressed maps, one covering known call methods and one covering known event topics. Using the call's 4-byte methodID or the event's topic[0], the most likely ABI is synthesized from the map and decoding is attempted. For events, if the number of topics observed does not match the ABI guess, a combinatorial search is run over all possible indexed combinations, and the first successful decode is considered valid.
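To illustrate the combinatorial search over indexed parameters, here is a minimal Python sketch. It is not the code in this PR: the helper names `decode_static` and `decode_event` are hypothetical, only three static types are handled, and "success" is assumed to mean that every 32-byte word passes a basic type check.

```python
from itertools import combinations

WORD = 32  # ABI word size in bytes


def decode_static(type_name, word):
    """Decode one 32-byte word for a few static types; raise ValueError on mismatch."""
    if len(word) != WORD:
        raise ValueError("expected a 32-byte word")
    value = int.from_bytes(word, "big")
    if type_name == "uint256":
        return value
    if type_name == "address":
        if value >> 160:  # an address must fit in the low 20 bytes
            raise ValueError("address has non-zero high bytes")
        return "0x" + word[12:].hex()
    if type_name == "bool":
        if value > 1:
            raise ValueError("bool must be 0 or 1")
        return bool(value)
    raise ValueError("unsupported type: " + type_name)


def decode_event(types, topics, data):
    """Try every choice of indexed parameters and return the first that decodes."""
    n_indexed = len(topics) - 1  # topics[0] is the event signature hash
    if n_indexed < 0 or n_indexed > len(types):
        return None
    for indexed in combinations(range(len(types)), n_indexed):
        non_indexed = [i for i in range(len(types)) if i not in indexed]
        if len(data) != WORD * len(non_indexed):
            continue  # data length cannot match this guess
        try:
            values = [None] * len(types)
            # Indexed parameters are read from topics[1:], in order.
            for topic, i in zip(topics[1:], indexed):
                values[i] = decode_static(types[i], topic)
            # Non-indexed parameters are read from consecutive words of data.
            for slot, i in enumerate(non_indexed):
                values[i] = decode_static(types[i], data[slot * WORD:(slot + 1) * WORD])
        except ValueError:
            continue  # this guess failed; try the next combination
        return {"indexed": list(indexed), "values": values}
    return None
```

Because the first success wins, an ambiguous log (one where several combinations pass the type checks) resolves to whichever combination is enumerated first, so the enumeration order matters.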

How It Works:

  • Big Picture: the approach taken in PARSE_CALL and PARSE_EVENT is to build a 1MB library of 2 compressed maps: call_map and event_map
  • A signatures table is compiled by aggregating ABI signatures from multiple open-source repos, giving 2.8MM+ records. However, most of these hex signatures have never been observed on chain.
  • So, we tally actual observations (numObservations from logs and transactions) and presence in ABI contracts (numContracts from contracts) in crypto_ethereum.{ EVM Chains to build a reduced dataset
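The reduction step can be sketched in Python as follows. This is a sketch under assumptions, not the PR's implementation: the record field names `hex_signature` and `text_signature`, the function `build_reduced_map`, and the ranking rule (most observations first, then most contracts) are all hypothetical, and the sample selectors and counts in the usage example are placeholders.

```python
def build_reduced_map(records):
    """Keep, per hex signature, the text signature with the strongest on-chain evidence."""
    best = {}
    for rec in records:
        # Drop signatures never observed on chain and absent from every contract ABI.
        if rec["numObservations"] == 0 and rec["numContracts"] == 0:
            continue
        key = rec["hex_signature"]
        # Assumed ranking: compare observations first, then contract count.
        score = (rec["numObservations"], rec["numContracts"])
        if key not in best or score > best[key][0]:
            best[key] = (score, rec["text_signature"])
    return {key: sig for key, (_, sig) in best.items()}


# Placeholder records: two colliding candidates and one never-seen signature.
sample = [
    {"hex_signature": "0xaaaaaaaa", "text_signature": "foo(uint256)",
     "numObservations": 12, "numContracts": 3},
    {"hex_signature": "0xaaaaaaaa", "text_signature": "bar(address)",
     "numObservations": 900, "numContracts": 1},
    {"hex_signature": "0xbbbbbbbb", "text_signature": "baz()",
     "numObservations": 0, "numContracts": 0},
]
reduced = build_reduced_map(sample)
```

Keeping a single candidate per hex signature is what would let the map shrink from millions of records to something that fits in a ~1MB library, at the cost of occasionally guessing the wrong ABI when signatures collide.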

@allenday
Member

Hi @sourabhniyogi, I've requested that the referenced JS files be added to a GCP-owned bucket.

I'll merge the PR after we're able to get them added to the new location and update the diff here.

Thanks.
