
Roadmap

theblackcat102 edited this page Apr 20, 2021 · 1 revision

Welcome to the ExtractNet roadmap for 2021

Our roadmap typically looks 12-18 months ahead and sets out the topics we want to work on. We don't start the roadmap from a blank sheet.

Values

Before we go into the details, we want to reiterate the values of ExtractNet:

  • ExtractNet will be a core library for extraction algorithms that can adapt to changes in the new decade

  • Allow users to extend ExtractNet beyond simple content extraction (e.g. shopping items, forum discussion threads)

Themes for 2021

For 2021 we will continue to focus on the following points:

  • Increase content extraction performance in terms of accuracy and F1

  • Blocked-content detection

  • Streamline model releases and increase the release pace, ideally to one release every 6 months

Performance

  • Implement space/newline classification when joining adjacent text blocks.
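
To illustrate what space/newline classification at join time could look like, here is a minimal sketch. It is not ExtractNet's API: `heuristic_separator` and `join_blocks` are hypothetical names, and the punctuation rule is only a stand-in for whatever classifier is eventually implemented.

```python
from typing import Callable, Iterable

# Hypothetical sketch: none of these names come from ExtractNet.
# The rule below stands in for a real (possibly learned) classifier.


def heuristic_separator(prev_block: str, next_block: str) -> str:
    """Pick the separator to place between two adjacent text blocks."""
    # Assumption: a block that ends a sentence or introduces a list
    # closes a paragraph, so the next block starts on a new line.
    if prev_block.rstrip().endswith((".", "!", "?", ":")):
        return "\n"
    # Otherwise treat the next block as a continuation of the same line.
    return " "


def join_blocks(
    blocks: Iterable[str],
    choose_separator: Callable[[str, str], str] = heuristic_separator,
) -> str:
    """Join extracted text blocks, choosing a separator for each pair."""
    # Drop empty blocks and surrounding whitespace before joining.
    cleaned = [b.strip() for b in blocks if b.strip()]
    if not cleaned:
        return ""
    parts = [cleaned[0]]
    for prev, nxt in zip(cleaned, cleaned[1:]):
        parts.append(choose_separator(prev, nxt))
        parts.append(nxt)
    return "".join(parts)
```

Passing the separator function as a parameter means a trained classifier could later replace the heuristic without changing the join logic.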

Features

  • Detect when the input HTML contains only limited content because of a paywall

  • Return a confidence value indicating how likely the extraction succeeded

Summary

These are examples of the work we will focus on over the next 12-18 months. We continuously tune the plan based on feedback, and we will develop our next roadmap about 12 months from now. Please follow along and let us know what you think!
