Polkadot Analytics Platform (Stage 2) #1969
Conversation
The Polkadot Analytics Platform aims to build a comprehensive data analysis and visualization tool for the Polkadot ecosystem. This is a follow-up grant application for the project w3f#1420.
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
I have read and hereby sign the Contributor License Agreement.
recheck
Co-authored-by: Nikhil W3F <[email protected]>
Changing all quotes (>) to bold (**) in markdown format.
Thanks for the heads-up @nikw3f, I changed all the quotes (>) to bold (**) in the markdown format.
@rbrandao I added a few inline comments, feel free to have a look.
| **1** | Substrate-ETL extraction scripts | Extraction scripts to gather data from Substrate-ETL. |
| **2** | Polkadot Data extraction scripts | Extraction scripts to gather Polkadot-related data not available on Substrate-ETL. |
| **3** | GitHub Data extraction scripts | Extraction scripts to gather data from GitHub, which will be used to support queries relating to network engagement and innovation. |
This is currently too vague. Could you dive more into specifics so we know what kind of data we can expect to be extracted from the various sources?
In the previous stage, we extended the POnto ontology to structure relevant concepts of the Polkadot ecosystem, including entities from example queries of the Substrate-ETL project and the related RFP mentioned in this application.
In M1, the idea is to implement scripts to gather data to support the previously analyzed queries. That is, gathering data from Substrate-ETL, Substrate-based assets and services (such as Polkassembly, polkadot.js, and others), and GitHub (in order to answer questions about the ecosystem evolution).
In deliverable 1, we will extract data from all the SQL tables specified in Substrate-ETL. In deliverable 2, we will explore APIs like the ones used by Polkassembly to extract OpenGov data and answer the queries specified in the RFP. In deliverable 3, we will use the GitHub API to extract data from repositories related to the Polkadot ecosystem, aiming to answer questions regarding dev-community engagement, e.g., which parachain had more PRs in a given time interval.
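As a rough illustration of the kind of extraction script planned for deliverable 3, the sketch below counts pull requests per repository over a date range using the GitHub REST search API. The repository names, date range, and script structure are placeholders for illustration, not the deliverable itself.

```python
# Illustrative sketch only: counting PRs per repository in a time window
# via the GitHub REST search API. Repositories and dates are placeholders.
import requests

SEARCH_URL = "https://api.github.com/search/issues"

def count_prs(repo: str, since: str, until: str, token: str = "") -> int:
    """Number of pull requests opened in `repo` between `since` and `until` (YYYY-MM-DD)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    query = f"repo:{repo} type:pr created:{since}..{until}"
    resp = requests.get(SEARCH_URL, params={"q": query, "per_page": 1}, headers=headers)
    resp.raise_for_status()
    return resp.json()["total_count"]

if __name__ == "__main__":
    # Example repositories; the real list would come from the concepts/data sources mapping.
    repos = ["paritytech/polkadot-sdk", "moonbeam-foundation/moonbeam"]
    counts = {repo: count_prs(repo, "2023-01-01", "2023-06-30") for repo in repos}
    print(max(counts, key=counts.get), counts)
```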
Adjusting application to address @takahser's comments
Adding funding estimates info to the "Future plans" section.
Thanks for the application, @rbrandao. Could you add an architecture diagram that shows where and how the deliverables fit into the complete architecture? And ideally a deliverable in the form of a demo showing how the results of this grant will be used in the future? Basically what the testing guide would be, but with the future work in mind ("This question will be converted into this query and this is the result of that query")?
| **0d.** | Docker | Docker images for the ETL workflows. |
| **0e.** | Article | We will write a Medium post with an overview of semantic ETL workflows and their tasks. |
| **1.** | Concepts and data sources mapping | Extension of POnto, mapping relevant entities and corresponding data sources. This mapping will be used later on to automate the proposed semantic ETL pipelines to continuously sync the KB with the latest ecosystem state. |
| **2.** | Ontology alignment | Scripts to perform ontology alignment with the extracted data as entities in the POnto ontology. |
What is ontology alignment?
In our project, we consider "ontology alignment" to be the task of establishing correspondences between the POnto ontology and the schema structures used by the different data sources. This process aims to enrich the extracted data with semantics, allowing structured queries and knowledge consumption.
In general, ontologies are formal representations of knowledge that define the concepts and relationships within a specific domain. When extracting data from multiple sources, each source may employ its own ontology, leading to semantic heterogeneity. Ontology alignment aims to resolve these differences by mapping equivalent or related concepts across the different sources, ensuring interoperability and coherence in the integrated data.
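To make this concrete, here is a minimal sketch of such a mapping step using rdflib. The namespace URI, the Parachain class, the property names, and the source field names are assumptions made for illustration; the actual POnto terms and source schemas may differ.

```python
# Illustrative sketch only: aligning one record from an external source schema
# with ontology terms. Namespace, class, and property names are assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

PONTO = Namespace("https://www.mobr.ai/ponto#")  # placeholder namespace URI

# Mapping from a hypothetical source schema (e.g. a Substrate-ETL table) to ontology terms.
FIELD_MAP = {
    "para_id": PONTO.hasParaId,  # assumed property name
    "name": RDFS.label,
}

def align_record(record: dict, graph: Graph) -> URIRef:
    """Create an individual for one extracted record and attach aligned properties."""
    individual = URIRef(PONTO[f"parachain/{record['para_id']}"])
    graph.add((individual, RDF.type, PONTO.Parachain))  # assumed class name
    for field, prop in FIELD_MAP.items():
        if field in record:
            graph.add((individual, prop, Literal(record[field])))
    return individual

g = Graph()
align_record({"para_id": 2004, "name": "ExampleChain"}, g)
print(g.serialize(format="turtle"))
```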
Updating application considering @semuelle's feedback
Hi @semuelle, we extended the architectural diagram (see Figure 2) reflecting your suggestions. The orange boxes highlight the components that we will develop in the current application. We are focusing on the Data Layer, developing the extraction scripts (in M1) and structuring these scripts into "semantic workflows" (in M2) that will perform alignment with the POnto ontology and inject entities as individuals in the KB.
We updated the milestone 2 deliverable table, extending the tutorial (deliverable 0b) to comprise the demo you suggested in the Jupyter notebook. In addition, Figure 1 illustrates the main steps of the user interaction you mentioned (from the CNL query input to result visualization, and final user feedback).
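For a flavor of the query side of that interaction, the sketch below shows how a CNL question such as "Which parachains are in the knowledge base?" could be answered by a SPARQL query against the populated KB. The file path, namespace, and class name are illustrative assumptions, not the platform's actual query engine.

```python
# Illustrative sketch only: a SPARQL query that a CNL question could compile to,
# executed against the KB graph. Namespace and class names are assumptions.
from rdflib import Graph

KB_FILE = "kb.ttl"  # placeholder path to the populated knowledge base

SPARQL = """
PREFIX ponto: <https://www.mobr.ai/ponto#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?parachain ?label WHERE {
    ?parachain a ponto:Parachain ;
               rdfs:label ?label .
}
"""

g = Graph()
g.parse(KB_FILE, format="turtle")
for row in g.query(SPARQL):
    print(row.parachain, row.label)
```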
Hi @nikw3f, just saw that the GitHub bot added the "stale" label to this PR. Anything pending on our side?
@rbrandao sorry for the delay here, we've currently got a bit of a backlog. I removed the label. We'll be back with more feedback soon!
Hi @rbrandao. Thanks for the updates. I just noticed that you are estimating another 55,000 USD required after this grant to get to the final stage of the project. Including the previous grants, that's over 120,000 USD. Given the scope of other, much smaller grants from already established teams, this seems disproportionate.
Since I won't be able to reply to comments in the next few days, I will withhold my vote and let the rest of the committee give their view.
Thanks a lot for the new application. However, after taking a look at your previous deliveries and your GitHub org, I decided not to support this application for now. In general, I think that UI is better suited for most of the aspects that the application tries to cover. Additionally, I wasn't convinced by your deliveries, nor do I see any progress since the delivery in your GitHub org. That said, I wish you all the best with the project and will ping the rest of the committee again.
Thanks for the feedback @semuelle. If I may ask, what do you think would be a proportional value for the scope of work we are proposing? I mean, a broad solution encompassing mechanisms to structure diverse data sources, aligned through an ontology in a KB that can be queried via a controlled natural language, led by a team of experts with a proven record of delivering similar solutions in different industries. From our experience, 120k for such a project is below market.
Hi @Noc2, thanks for the comment.
Maybe I'm missing something here? The UI (frontend) has to retrieve data from a source in the backend. As explained in our application, we are pulling data from different sources (including chain data from ETLs) that would be aligned in a Knowledge Base. There is no way to get all of that in the UI without backend assets.
It is really unfortunate that you aren't convinced by our deliveries. We delivered all of the proposed milestones according to the roadmap defined in our previous applications, and we did not miss any deadline whatsoever. Regarding the progress on our GitHub org, we have been working on private projects in our startup. I don't think it is fair to assess the progress of a project that wasn't approved in the first place.
@rbrandao thanks for your patience and the work you put into this. However, unfortunately the w3f grants committee decided not to pursue your proposal further. The reasons for this decision include:
At this point, getting your project funded through alternative funding sources like the treasury seems worth exploring. We'd like to add that you're still welcome to apply for grants at our program in the future. We wish you all the best for the future of your project and thank you for your interest in our program!
Project Abstract
The Polkadot Analytics Platform aims to build a comprehensive data analysis and visualization tool for the Polkadot ecosystem. The platform will allow users to retrieve and analyze data from various Polkadot-related sources (e.g., different parachains and components such as browser wallets), aligned with the POnto ontology [1, 2, 3]. Users will be able to specify their queries using a controlled natural language (CNL), and the platform will provide a query engine to process these queries. Additionally, the platform will provide a UI to support constructing queries and visualizing informative artifacts that represent query results, as well as support for composing customizable dashboards from these artifacts.
In its current stage, the platform is composed of a knowledge base (KB) and its initial representation with the POnto ontology [4]. The current grant proposal focuses on populating this KB through the creation of semantic ETL pipelines, i.e., information extraction workflows that will extract, reuse, and integrate data from different sources, aligning and structuring domain knowledge in the KB. We will create data extraction mechanisms to gather data from various Polkadot-related sources using Substrate interfaces (e.g., using Substrate-ETL), as well as off-chain sources (e.g., using the GitHub API).
[1] POnto source code: https://github.com/mobr-ai/POnto
[2] POnto documentation: https://www.mobr.ai/ponto
[3] POnto scientific paper: https://github.com/mobr-ai/POnto/raw/main/deliverables/milestone3/article.pdf
[4] Polkadot Analytics Platform source code: https://github.com/mobr-ai/PolkadotAnalytics
This is a follow-up grant application for the projects *A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem* and *A Polkadot Analytics Platform: Stage 1*.
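At a high level, each semantic ETL pipeline described in the abstract can be thought of as an extract-transform-load cycle over the KB. The skeleton below is a minimal, assumed illustration of that structure; the function names, types, and Turtle-file KB are placeholders, not the actual implementation.

```python
# Minimal, assumed skeleton of a semantic ETL cycle: extract raw records from a
# source, transform them into aligned RDF triples, and persist them in the KB.
# Names and the Turtle-serialized KB are illustrative placeholders.
from typing import Callable, Iterable
from rdflib import Graph

Extractor = Callable[[], Iterable[dict]]                # e.g. pull rows from Substrate-ETL or GitHub
Transformer = Callable[[Iterable[dict], Graph], None]   # e.g. align records with POnto terms

def run_pipeline(extract: Extractor, transform: Transformer, kb_path: str) -> None:
    """Run one extract-transform-load cycle against a Turtle-serialized KB."""
    graph = Graph()
    graph.parse(kb_path, format="turtle")   # load the current KB state
    records = extract()                     # extraction step (M1 scripts)
    transform(records, graph)               # alignment step (M2 semantic workflows)
    graph.serialize(destination=kb_path, format="turtle")  # persist the updated KB
```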
Grant level
Application Checklist