Polkadot Analytics Platform (Stage 2) #1969

Closed (13 commits)


@rbrandao (Contributor)

Project Abstract

The Polkadot Analytics Platform aims to build a comprehensive data analysis and visualization tool for the Polkadot ecosystem. The platform will allow users to retrieve and analyze data from various Polkadot-related sources (e.g., different parachains and components such as browser wallets), aligned with the POnto ontology [1, 2, 3]. Users will be able to specify their queries in a controlled natural language (CNL), and the platform will provide a query engine to process these queries. Additionally, the platform will provide a UI that supports constructing queries and visualizing informative artifacts that represent query results, as well as composing customizable dashboards from these artifacts.

In its current stage, the platform consists of a knowledge base (KB) and its initial representation with the POnto ontology [4]. The current grant proposal focuses on populating this KB through the creation of semantic ETL pipelines, i.e., information extraction workflows that will extract, reuse, and integrate data from different sources, aligning and structuring domain knowledge in the KB. We will create data extraction mechanisms to gather data from various Polkadot-related sources, both on-chain through Substrate interfaces (e.g., using Substrate-ETL) and off-chain (e.g., using the GitHub API).
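
The semantic ETL pipeline described above can be pictured as three chained steps: extract rows from a source, align each row with ontology terms, and load the resulting triples into the KB. The sketch below is a minimal, stdlib-only illustration under stated assumptions: the row shape, the `ex:` identifiers, and the `ponto:` class and property names are hypothetical placeholders rather than actual POnto terms, and an in-memory set stands in for the real KB or triple store.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator

# A (subject, predicate, object) triple: the unit of knowledge stored in the KB.
Triple = tuple[str, str, str]


@dataclass
class KnowledgeBase:
    """In-memory stand-in for the KB; a real deployment would use a triple store."""
    triples: set[Triple] = field(default_factory=set)

    def load(self, new_triples: Iterable[Triple]) -> int:
        """Add triples and return how many were not already present."""
        before = len(self.triples)
        self.triples.update(new_triples)
        return len(self.triples) - before


def extract_blocks() -> Iterator[dict]:
    """Hypothetical extractor yielding rows shaped like a blocks table."""
    yield {"chain": "polkadot", "number": 100, "author": "1Alice"}
    yield {"chain": "polkadot", "number": 101, "author": "1Bob"}


def align_block(row: dict) -> list[Triple]:
    """Map one source row onto hypothetical ontology terms (not real POnto names)."""
    block = f"ex:block/{row['chain']}/{row['number']}"
    return [
        (block, "rdf:type", "ponto:Block"),
        (block, "ponto:blockNumber", str(row["number"])),
        (block, "ponto:authoredBy", f"ex:account/{row['author']}"),
    ]


def run_pipeline(extract: Callable[[], Iterable[dict]],
                 align: Callable[[dict], list[Triple]],
                 kb: KnowledgeBase) -> int:
    """Extract -> align -> load; returns the number of new triples."""
    triples = (t for row in extract() for t in align(row))
    return kb.load(triples)


kb = KnowledgeBase()
print(run_pipeline(extract_blocks, align_block, kb))  # 6
print(run_pipeline(extract_blocks, align_block, kb))  # 0: re-running adds nothing new
```

Storing triples in a set makes re-running the pipeline idempotent, which is one simple way to keep repeated syncs from duplicating knowledge in the KB.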

[1] POnto source code: https://github.com/mobr-ai/POnto
[2] POnto documentation: https://www.mobr.ai/ponto
[3] POnto scientific paper: https://github.com/mobr-ai/POnto/raw/main/deliverables/milestone3/article.pdf
[4] Polkadot Analytics Platform source code: https://github.com/mobr-ai/PolkadotAnalytics

This is a follow-up grant application to the projects "A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem" and "A Polkadot Analytics Platform: Stage 1".

Grant level

  • Level 1: Up to $10,000, 2 approvals
  • Level 2: Up to $30,000, 3 approvals
  • Level 3: Unlimited, 5 approvals (for >$100k: Web3 Foundation Council approval)

Application Checklist

  • The application template has been copied and aptly renamed (project_name.md).
  • I have read the application guidelines.
  • Payment details have been provided (bank details via email or BTC, Ethereum (USDC/DAI) or Polkadot/Kusama (USDT) address in the application).
  • The software delivered for this grant will be released under an open-source license specified in the application.
  • The initial PR contains only one commit (squash and force-push if needed).
  • The grant will only be announced once the first milestone has been accepted (see the announcement guidelines).
  • I prefer the discussion of this application to take place in a private Element/Matrix channel. My username is: @_______:matrix.org (change the homeserver if you use a different one)

This is a follow-up grant application for the project: w3f#1420

@github-actions (Contributor)

github-actions bot commented Sep 13, 2023

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

@rbrandao (Contributor, Author)

I have read and hereby sign the Contributor License Agreement.

@rbrandao (Contributor, Author)

recheck

@rbrandao (Contributor, Author)

Thanks for the heads-up, @nikw3f. I changed all the quotes (>) to bold (**) in the Markdown.

@nikw3f added the "ready for review" label (The project is ready to be reviewed by the committee members.) on Sep 21, 2023

@takahser (Collaborator) left a comment

@rbrandao, I added a few inline comments; feel free to have a look.

applications/polkadot_analytics_platform_stage2.md (outdated, resolved)
applications/polkadot_analytics_platform_stage2.md (outdated, resolved)
Comment on lines 350 to 352
| **1** | Substrate-ETL extraction scripts | Extraction scripts to gather data from Substrate-ETL. |
| **2** | Polkadot data extraction scripts | Extraction scripts to gather Polkadot-related data not available on Substrate-ETL. |
| **3** | GitHub data extraction scripts | Extraction scripts to gather data from GitHub, which will be used to support queries related to network engagement and innovation. |
@takahser (Collaborator)

This is currently too vague. Could you go into more detail, so we know which kinds of data we can expect to be extracted from the various sources?

@rbrandao (Contributor, Author), Sep 25, 2023

In the previous stage, we extended the POnto ontology to structure relevant concepts of the Polkadot ecosystem, including entities from example queries of the Substrate-ETL project and the related RFP mentioned in this application.

In M1, we plan to implement scripts that gather the data needed to support the previously analyzed queries: data from Substrate-ETL, from Substrate-based assets and services (such as Polkassembly, polkadot.js, and others), and from GitHub (to answer questions about the ecosystem's evolution).

In deliverable 1, we will extract data from all the SQL tables specified in Substrate-ETL. In deliverable 2, we will explore APIs such as those used by Polkassembly to extract OpenGov data and answer the queries specified in the RFP. In deliverable 3, we will use the GitHub APIs to extract data from repositories related to the Polkadot ecosystem, aiming to answer questions about dev-community engagement, e.g., which parachain had the most PRs in a specific time interval.
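
For deliverable 3, the example question (which parachain had the most PRs in a given time interval) reduces to filtering pull requests by creation date and counting them per parachain. A minimal sketch follows, assuming PR records shaped like the GitHub REST API's pull request objects (an ISO 8601 `created_at` timestamp and `base.repo.full_name`). The repository names and the repository-to-parachain mapping are hypothetical, and fetching the records (pagination, authentication) is left out.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical mapping from GitHub repositories to the parachain they belong to.
REPO_TO_PARACHAIN = {
    "example-org/chain-a-node": "Parachain A",
    "example-org/chain-a-ui": "Parachain A",
    "example-org/chain-b-node": "Parachain B",
}


def parse_ts(value: str) -> datetime:
    """Parse a GitHub-style UTC timestamp such as '2023-09-01T10:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prs_per_parachain(pulls, start: datetime, end: datetime) -> Counter:
    """Count PRs created in [start, end), grouped by parachain; unmapped repos are skipped."""
    counts = Counter()
    for pr in pulls:
        created = parse_ts(pr["created_at"])
        if start <= created < end:
            parachain = REPO_TO_PARACHAIN.get(pr["base"]["repo"]["full_name"])
            if parachain is not None:
                counts[parachain] += 1
    return counts


# Hypothetical sample records standing in for API responses.
sample_pulls = [
    {"created_at": "2023-09-01T10:00:00Z", "base": {"repo": {"full_name": "example-org/chain-a-node"}}},
    {"created_at": "2023-09-05T12:30:00Z", "base": {"repo": {"full_name": "example-org/chain-a-ui"}}},
    {"created_at": "2023-09-07T08:15:00Z", "base": {"repo": {"full_name": "example-org/chain-b-node"}}},
    {"created_at": "2023-10-02T09:00:00Z", "base": {"repo": {"full_name": "example-org/chain-b-node"}}},
]

start = datetime(2023, 9, 1, tzinfo=timezone.utc)
end = datetime(2023, 10, 1, tzinfo=timezone.utc)
counts = prs_per_parachain(sample_pulls, start, end)
print(counts.most_common())  # [('Parachain A', 2), ('Parachain B', 1)]
```

Using a half-open interval `[start, end)` avoids double-counting PRs that fall exactly on the boundary between two consecutive intervals.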

applications/polkadot_analytics_platform_stage2.md (outdated, resolved)
Adjusting application to address @takahser's comments
Adding funding estimates info to the "Future plans" section.

@semuelle (Member) left a comment

Thanks for the application, @rbrandao. Could you add an architecture diagram that shows where and how the deliverables fit into the complete architecture? And, ideally, a deliverable in the form of a demo showing how the results of this grant will be used in the future? Basically what the testing guide would be, but with the future work in mind ("This question will be converted into this query, and this is the result of that query")?

| **0d.** | Docker | Docker images for the ETL workflows. |
| **0e.** | Article | We will write a Medium post with an overview of semantic ETL workflows and their tasks. |
| **1.** | Concepts and data sources mapping | Extension of POnto, mapping relevant entities and their corresponding data sources. This mapping will later be used to automate the proposed semantic ETL pipelines, continuously syncing the KB with the latest ecosystem state. |
| **2.** | Ontology Alignment | Scripts to perform ontology alignment, representing the extracted data as entities in the POnto ontology. |
@semuelle (Member)

What is ontology alignment?

@rbrandao (Contributor, Author)

In our project, we define "ontology alignment" as the task of establishing correspondences between the POnto ontology and the schema structures used by different data sources. This process aims to enrich the extracted data with semantics, enabling structured queries and knowledge consumption.

In general, ontologies are formal representations of knowledge that define the concepts and relationships within a specific domain. When extracting data from multiple sources, each source may employ its own ontology, leading to semantic heterogeneity. Ontology alignment aims to resolve these differences by mapping equivalent or related concepts across different sources, ensuring interoperability and coherence in the integrated data.
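
As a concrete illustration of this idea, consider two sources that describe the same governance referendum using different field names. A minimal alignment step maps each source's fields onto shared ontology properties and then merges records that refer to the same entity. The source names, field names, and `ponto:` property names below are hypothetical, chosen only to show the mechanism; they are not taken from Substrate-ETL, Polkassembly, or POnto.

```python
# Hypothetical correspondences from each source schema to ontology properties.
ALIGNMENTS = {
    "chain_table": {
        "referendum_index": "ponto:referendumId",
        "proposer_address": "ponto:proposedBy",
        "track": "ponto:track",
    },
    "governance_api": {
        "id": "ponto:referendumId",
        "proposer": "ponto:proposedBy",
        "title": "ponto:title",
    },
}


def align(source: str, record: dict) -> dict:
    """Rename a source record's fields to ontology properties, dropping unmapped fields."""
    mapping = ALIGNMENTS[source]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def integrate(records: list[tuple[str, dict]]) -> dict:
    """Merge aligned records that share the same referendum identifier.

    Assumes every record carries an identifier; on conflicting values,
    the record processed later wins.
    """
    merged: dict = {}
    for source, record in records:
        aligned = align(source, record)
        key = aligned["ponto:referendumId"]
        merged.setdefault(key, {}).update(aligned)
    return merged


records = [
    ("chain_table", {"referendum_index": 42, "proposer_address": "1Alice", "track": "root"}),
    ("governance_api", {"id": 42, "proposer": "1Alice", "title": "Upgrade runtime", "views": 10}),
]
kb_view = integrate(records)
print(kb_view[42]["ponto:track"], "|", kb_view[42]["ponto:title"])  # root | Upgrade runtime
```

Real ontology alignment also has to handle cases that a field-renaming table cannot, such as unit conversions or concepts without a one-to-one counterpart; this sketch covers only the simplest, one-to-one case.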

Updating application considering @semuelle's feedback
@rbrandao (Contributor, Author)

rbrandao commented Sep 27, 2023

> Thanks for the application, @rbrandao. Could you add an architecture diagram that shows where and how the deliverables fit into the complete architecture?

Hi @semuelle, we extended the architecture diagram (see Figure 2) to reflect your suggestions. The orange boxes highlight the components that we will develop under the current application. We are focusing on the Data Layer: developing the extraction scripts (in M1) and structuring these scripts into "semantic workflows" (in M2) that will perform alignment with the POnto ontology and inject entities as individuals into the KB.

> And ideally a deliverable in form of a demo showing how the results of this grant will be used in the future? Basically what the testing guide would be, but with the future work in mind ("This question will be converted in this query and this is the result of that query")?

We updated the milestone 2 deliverable table, extending the tutorial (deliverable 0b) to include the demo you suggested in the Jupyter notebook. In addition, Figure 1 illustrates the main steps of the user interaction you mentioned (from the CNL query input to result visualization and final user feedback).

@github-actions bot added the "stale" label on Oct 12, 2023
@rbrandao (Contributor, Author)

Hi @nikw3f, I just saw that the GitHub bot added the "stale" label to this PR. Is anything pending on our side?

@takahser self-requested a review on October 12, 2023, 11:53
@takahser removed the "stale" label on Oct 12, 2023
@takahser (Collaborator)

@rbrandao sorry for the delay here, we've currently got a bit of a backlog. I removed the label. We'll be back with more feedback soon!

@takahser requested a review from @nikw3f on October 12, 2023, 11:54
@takahser requested a review from @semuelle on October 12, 2023, 11:54
@rbrandao (Contributor, Author)

rbrandao commented Oct 12, 2023

> @rbrandao sorry for the delay here, we've currently got a bit of a backlog. I removed the label. We'll be back with more feedback soon!

Thanks for the prompt reply @takahser. We are looking forward to it.

@semuelle (Member) left a comment

Hi @rbrandao. Thanks for the updates. I just noticed that you are estimating another 55,000 USD required after this grant to get to the final stage of the project. Including the previous grants, that's over 120,000 USD. Given the scope of other, much smaller grants from already established teams, this seems disproportionate.
Since I won't be able to reply to comments in the next days, I will withhold my vote and let the rest of the committee give their view.

@Noc2 (Collaborator) left a comment

Thanks a lot for the new application. However, after taking a look at your previous deliveries and your GitHub org, I decided not to support this application for now. In general, I think that UI is better suited for most of the aspects that the application tries to cover. Additionally, I wasn't convinced by your deliveries, nor do I see any progress since the delivery in your GitHub org. That said, I wish you all the best with the project and will ping the rest of the committee again.

@rbrandao (Contributor, Author)

> Hi @rbrandao. Thanks for the updates. I just noticed that you are estimating another 55,000 USD required after this grant to get to the final stage of the project. Including the previous grants, that's over 120,000 USD. Given the scope of other, much smaller grants from already established teams, this seems disproportionate. Since I won't be able to reply to comments in the next days, I will withhold my vote and let the rest of the committee give their view.

Thanks for the feedback, @semuelle. If I may ask, what do you think would be a proportional amount for the scope of work we are proposing? That is, a broad solution encompassing mechanisms to structure diverse data sources, aligned through an ontology in a KB that can be queried using a controlled natural language, and led by a team of experts with a proven record of delivering similar solutions in different industries. In our experience, 120k for such a project is below market rate.

@rbrandao (Contributor, Author)

Hi @Noc2, thanks for the comment.

> In general, I think that UI is better suited for most of the aspects that the application tries to cover.

Maybe I'm missing something here? The UI (frontend) has to retrieve data from a backend source. As explained in our application, we are pulling data from different sources (including chain data from ETLs) that will be aligned in a knowledge base. There is no way to get all of that into the UI without backend components.

> Additionally, I wasn't convinced by your deliveries, nor do I see any progress since the delivery in your GitHub org.

It is really unfortunate that you aren't convinced by our deliveries. We delivered all of the proposed milestones according to the roadmap defined in our previous applications, and we did not miss a single deadline. Regarding progress on our GitHub org, we have been working on private projects at our startup. I don't think it is fair to assess progress on a project that wasn't approved in the first place.

@takahser (Collaborator)

@rbrandao thanks for your patience and the work you put into this. However, unfortunately the w3f grants committee decided not to pursue your proposal further. The reasons for this decision include:

  • the amount of funds you'd require to finish your projects (>$120k) in comparison to the value that we perceive is being created for the community in return
  • the vague formulation of some parts of the proposal
  • concerns related to the deliveries in your previous grant

At this point, seeking funding for your project through alternative sources such as the treasury seems worth exploring. We'd like to add that you're still welcome to apply for grants through our program in the future.

We wish you all the best for the future of your project and thank you for your interest in our program!

@takahser closed this on Oct 25, 2023