Roadmap
Welcome to the ExtractNet 2021 roadmap.
Our roadmap typically looks out 12-18 months and sets out the topics we want to work on. We don't start our roadmap from a blank sheet.
Before we go into the details, we want to reiterate the values of ExtractNet:
- ExtractNet will be a core library for extraction algorithms, one that is adaptable to changes in the new decade
- Allow other users to extend ExtractNet beyond simple content extraction (e.g. shopping items, forum discussion threads)
For 2021 we will continue to focus on the following points:
- Increase content extraction performance in terms of accuracy and F1
- Blocked-content detection
- Streamline model releases and increase the release pace, ideally to every 6 months
- Implement space/newline classification when joining text blocks
- Detect when the input HTML contains only limited content due to a paywall
- Return the extraction success rate as a confidence value
These are examples of some of the work we will be focusing on in the next 12-18 months. We continuously tune the plan based on feedback, and we will develop our next roadmap about 12 months from now. Please follow along and let us know what you think!
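
For readers unfamiliar with the F1 metric named in the first focus point, the sketch below shows one common way to score extracted text against a reference: token-level F1 over whitespace-separated words. This is only an illustration. The function name `token_f1` and the whitespace tokenization are assumptions made for this example, not ExtractNet's API or its actual evaluation procedure.

```python
from collections import Counter


def token_f1(extracted: str, reference: str) -> float:
    """Token-level F1 between extracted text and a reference (gold) text.

    Hypothetical helper for illustration; not part of ExtractNet.
    """
    extracted_tokens = extracted.split()
    reference_tokens = reference.split()
    if not extracted_tokens or not reference_tokens:
        # Both empty counts as a perfect match; one empty scores 0.
        return float(extracted_tokens == reference_tokens)

    # Multiset intersection: a shared token counts at most as often
    # as it appears in both texts.
    overlap = sum((Counter(extracted_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(extracted_tokens)  # share of extracted tokens that are correct
    recall = overlap / len(reference_tokens)     # share of reference tokens that were recovered
    return 2 * precision * recall / (precision + recall)
```

Under this definition, an extraction that keeps all the reference text but also pulls in navigation or advertising text has perfect recall and reduced precision, which lowers its F1.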