Polkadot Analytics Platform (Stage 2) #1969
Conversation
The Polkadot Analytics Platform aims to build a comprehensive data analysis and visualization tool for the Polkadot ecosystem. This is a follow-up grant application for the project w3f#1420.
CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅
I have read and hereby sign the Contributor License Agreement.
recheck
Co-authored-by: Nikhil W3F <[email protected]>
Changing all quotes (>) to bold (**) in markdown format.
Thanks for the heads-up @nikw3f, I changed all the quotes (>) to bold (**) in the markdown format.
@rbrandao I added a few inline comments, feel free to have a look.
| **1** | Substrate-ETL extraction scripts | Extraction scripts to gather data from Substrate-ETL. |
| **2** | Polkadot Data extraction scripts | Extraction scripts to gather Polkadot-related data not available on Substrate-ETL. |
| **3** | GitHub Data extraction scripts | Extraction scripts to gather data from GitHub, which will be used to support queries relating to network engagement and innovation. |
This is currently too vague. Could you dive more into specifics so we know what kind of data we can expect to be extracted from the various sources?
In the previous stage, we extended the POnto ontology to structure relevant concepts of the Polkadot ecosystem, including entities from example queries of the Substrate-ETL project and the related RFP mentioned in this application.
In M1, the idea is to implement scripts to gather data to support the previously analyzed queries. That is, gathering data from Substrate-ETL, Substrate-based assets and services (such as Polkassembly, polkadot.js, and others), and GitHub (in order to answer questions about the ecosystem evolution).
In deliverable 1, we will extract data from all the SQL tables specified in Substrate-ETL. In deliverable 2, we will explore APIs like the ones used by Polkassembly to extract OpenGov data and answer the queries specified in the RFP. In deliverable 3, we will use the GitHub API to extract data from repositories related to the Polkadot ecosystem, aiming to answer questions regarding dev-community engagement, e.g., which parachain had more PRs in a given time interval.
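As a rough illustration of the kind of extraction script planned for deliverable 3, the sketch below counts pull requests per repository over a date range using the GitHub REST search API. The repository names, date range, and script structure are placeholders for illustration, not the deliverable itself.

```python
# Illustrative sketch only: counting PRs per repository in a time window
# via the GitHub REST search API. Repositories and dates are placeholders.
import requests

SEARCH_URL = "https://api.github.com/search/issues"

def count_prs(repo: str, since: str, until: str, token: str = "") -> int:
    """Number of pull requests opened in `repo` between `since` and `until` (YYYY-MM-DD)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    query = f"repo:{repo} type:pr created:{since}..{until}"
    resp = requests.get(SEARCH_URL, params={"q": query, "per_page": 1}, headers=headers)
    resp.raise_for_status()
    return resp.json()["total_count"]

if __name__ == "__main__":
    # Example repositories; the real list would come from the concepts/data sources mapping.
    repos = ["paritytech/polkadot-sdk", "moonbeam-foundation/moonbeam"]
    counts = {repo: count_prs(repo, "2023-01-01", "2023-06-30") for repo in repos}
    print(max(counts, key=counts.get), counts)
```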
Adjusting application to address @takahser's comments
Adding funding estimates info to the "Future plans" section.
Thanks for the application, @rbrandao. Could you add an architecture diagram that shows where and how the deliverables fit into the complete architecture? And ideally a deliverable in the form of a demo showing how the results of this grant will be used in the future? Basically what the testing guide would be, but with the future work in mind ("This question will be converted into this query and this is the result of that query")?
| **0d.** | Docker | Docker images for the ETL workflows. |
| **0e.** | Article | We will write a Medium post with an overview of semantic ETL workflows and their tasks. |
| **1.** | Concepts and data sources mapping | Extension of POnto, mapping relevant entities and corresponding data sources. This mapping will be used later on to automate the proposed semantic ETL pipelines to continuously sync the KB with the latest ecosystem state. |
| **2.** | Ontology alignment | Scripts to perform ontology alignment with the extracted data as entities in the POnto ontology. |
What is ontology alignment?
In our project, we consider "ontology alignment" to be the task of establishing correspondences between the POnto ontology and the schema structures used by the different data sources. This process aims to enrich the extracted data with semantics, allowing structured queries and knowledge consumption.
In general, ontologies are formal representations of knowledge that define the concepts and relationships within a specific domain. When extracting data from multiple sources, each source may employ its own ontology, leading to semantic heterogeneity. Ontology alignment aims to resolve these differences by mapping equivalent or related concepts across the different sources, ensuring interoperability and coherence in the integrated data.
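To make this concrete, here is a minimal sketch of such a mapping step using rdflib. The namespace URI, the Parachain class, the property names, and the source field names are assumptions made for illustration; the actual POnto terms and source schemas may differ.

```python
# Illustrative sketch only: aligning one record from an external source schema
# with ontology terms. Namespace, class, and property names are assumptions.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

PONTO = Namespace("https://www.mobr.ai/ponto#")  # placeholder namespace URI

# Mapping from a hypothetical source schema (e.g. a Substrate-ETL table) to ontology terms.
FIELD_MAP = {
    "para_id": PONTO.hasParaId,  # assumed property name
    "name": RDFS.label,
}

def align_record(record: dict, graph: Graph) -> URIRef:
    """Create an individual for one extracted record and attach aligned properties."""
    individual = URIRef(PONTO[f"parachain/{record['para_id']}"])
    graph.add((individual, RDF.type, PONTO.Parachain))  # assumed class name
    for field, prop in FIELD_MAP.items():
        if field in record:
            graph.add((individual, prop, Literal(record[field])))
    return individual

g = Graph()
align_record({"para_id": 2004, "name": "ExampleChain"}, g)
print(g.serialize(format="turtle"))
```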
Updating application considering @semuelle's feedback
Hi @semuelle, we extended the architectural diagram (see Figure 2) reflecting your suggestions. The orange boxes highlight the components that we will develop in the current application. We are focusing on the Data Layer, developing the extraction scripts (in M1) and structuring these scripts into "semantic workflows" (in M2) that will perform alignment with the POnto ontology and inject entities as individuals in the KB.
We updated the milestone 2 deliverable table, extending the tutorial (deliverable 0b) to comprise the demo you suggested in the Jupyter notebook. In addition, Figure 1 illustrates the main steps of the user interaction you mentioned (from the CNL query input to result visualization, and final user feedback).
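For a flavor of the query side of that interaction, the sketch below shows how a CNL question such as "Which parachains are in the knowledge base?" could be answered by a SPARQL query against the populated KB. The file path, namespace, and class name are illustrative assumptions, not the platform's actual query engine.

```python
# Illustrative sketch only: a SPARQL query that a CNL question could compile to,
# executed against the KB graph. Namespace and class names are assumptions.
from rdflib import Graph

KB_FILE = "kb.ttl"  # placeholder path to the populated knowledge base

SPARQL = """
PREFIX ponto: <https://www.mobr.ai/ponto#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?parachain ?label WHERE {
    ?parachain a ponto:Parachain ;
               rdfs:label ?label .
}
"""

g = Graph()
g.parse(KB_FILE, format="turtle")
for row in g.query(SPARQL):
    print(row.parachain, row.label)
```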
Hi @nikw3f, just saw that the GitHub bot added the "stale" label to this PR. Anything pending on our side?
@rbrandao sorry for the delay here, we've currently got a bit of a backlog. I removed the label. We'll be back with more feedback soon!
Hi @rbrandao. Thanks for the updates. I just noticed that you are estimating another 55,000 USD required after this grant to get to the final stage of the project. Including the previous grants, that's over 120,000 USD. Given the scope of other, much smaller grants from already established teams, this seems disproportionate.
Since I won't be able to reply to comments in the next few days, I will withhold my vote and let the rest of the committee give their view.
Thanks a lot for the new application. However, after taking a look at your previous deliveries and your GitHub org, I decided not to support this application for now. In general, I think that UI is better suited for most of the aspects that the application tries to cover. Additionally, I wasn't convinced by your deliveries, nor do I see any progress since the delivery in your GitHub org. That said, I wish you all the best with the project and will ping the rest of the committee again.
Thanks for the feedback @semuelle. If I may ask, what do you think would be a proportional value for the scope of work we are proposing? I mean, a broad solution encompassing mechanisms to structure diverse data sources, aligned through an ontology in a KB that can be queried via a controlled natural language, led by a team of experts with a proven record of delivering similar solutions in different industries. From our experience, 120k for such a project is below market.
Hi @Noc2, thanks for the comment.
Maybe I'm missing something here? The UI (frontend) has to retrieve data from a source in the backend. As explained in our application, we are pulling data from different sources (including chain data from ETLs) that would be aligned in a Knowledge Base. There is no way to get all of that in the UI without backend assets.
It is really unfortunate that you aren't convinced by our deliveries. We delivered all of the proposed milestones according to the roadmap defined in our previous applications, and we did not miss any deadline whatsoever. Regarding the progress on our GitHub org, we have been working on private projects in our startup. I don't think it is fair to assess the progress of a project that wasn't approved in the first place.
@rbrandao thanks for your patience and the work you put into this. However, unfortunately the w3f grants committee decided not to pursue your proposal further. The reasons for this decision include:
At this point, getting your project funded through alternative funding sources like the treasury seems worth exploring. We'd like to add that you're still welcome to apply for grants at our program in the future. We wish you all the best for the future of your project and thank you for your interest in our program!
Project Abstract
The Polkadot Analytics Platform aims to build a comprehensive data analysis and visualization tool for the Polkadot ecosystem. The platform will allow users to retrieve and analyze data from various Polkadot-related sources (e.g., different parachains and components such as browser wallets), aligned with the POnto ontology [1, 2, 3]. Users will be able to specify their queries using a controlled natural language (CNL), and the platform will provide a query engine to process these queries. Additionally, the platform will provide a UI to support constructing queries and visualizing informative artifacts that represent query results, as well as support for composing customizable dashboards from these artifacts.
In its current stage, the platform is composed of a knowledge base (KB) and its initial representation with the POnto ontology [4]. The current grant proposal focuses on populating this KB through the creation of semantic ETL pipelines, i.e., information extraction workflows that will extract, reuse, and integrate data from different sources, aligning and structuring domain knowledge in the KB. We will create data extraction mechanisms to gather data from various Polkadot-related sources using Substrate interfaces (e.g., using Substrate-ETL), as well as off-chain sources (e.g., using the GitHub API).
[1] POnto source code: https://github.com/mobr-ai/POnto
[2] POnto documentation: https://www.mobr.ai/ponto
[3] POnto scientific paper: https://github.com/mobr-ai/POnto/raw/main/deliverables/milestone3/article.pdf
[4] Polkadot Analytics Platform source code: https://github.com/mobr-ai/PolkadotAnalytics
This is a follow-up grant application for the projects *A Knowledge-Oriented Approach to Enhance Integration and Communicability in the Polkadot Ecosystem* and *A Polkadot Analytics Platform: Stage 1*.
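At a high level, each semantic ETL pipeline described in the abstract can be thought of as an extract-transform-load cycle over the KB. The skeleton below is a minimal, assumed illustration of that structure; the function names, types, and Turtle-file KB are placeholders, not the actual implementation.

```python
# Minimal, assumed skeleton of a semantic ETL cycle: extract raw records from a
# source, transform them into aligned RDF triples, and persist them in the KB.
# Names and the Turtle-serialized KB are illustrative placeholders.
from typing import Callable, Iterable
from rdflib import Graph

Extractor = Callable[[], Iterable[dict]]                # e.g. pull rows from Substrate-ETL or GitHub
Transformer = Callable[[Iterable[dict], Graph], None]   # e.g. align records with POnto terms

def run_pipeline(extract: Extractor, transform: Transformer, kb_path: str) -> None:
    """Run one extract-transform-load cycle against a Turtle-serialized KB."""
    graph = Graph()
    graph.parse(kb_path, format="turtle")   # load the current KB state
    records = extract()                     # extraction step (M1 scripts)
    transform(records, graph)               # alignment step (M2 semantic workflows)
    graph.serialize(destination=kb_path, format="turtle")  # persist the updated KB
```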
Grant level
Application Checklist