Option to cutout documents from images before further processing #554
Comments
Hey there 👋 Actually we have tackled this internally a few weeks back and it will be integrated into docTR soon 😄 Cheers! |
Hi 👋, in that case I would say: once your model is ready, let's compare both approaches :) |
That's a good idea indeed, if you could have a runnable Colab notebook so that we can compare this 👍 (not opening a PR, just sharing it here) |
Only a very basic example, but for test purposes it should be enough :) PS: I have also found that it works much better if the image is first resized much smaller and the detected points are then calculated back in relation to the original size before calling _four_point_transform (not in the Colab example) |
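To illustrate the PS above, here is a minimal sketch of mapping corner points found on a downscaled copy back to the original resolution. It is not the code from the Colab: the helper names `downscale_factor` and `corners_to_original`, the `max_side` parameter, and all numbers are assumptions made up for illustration, and `_four_point_transform` itself is not reproduced here.

```python
import numpy as np


def downscale_factor(original_shape, max_side=500):
    """Return the factor (<= 1.0) that shrinks the longest image side to max_side."""
    height, width = original_shape[:2]
    return min(1.0, max_side / max(height, width))


def corners_to_original(corners_small, scale):
    """Map (x, y) corners found on the downscaled image back to original pixels."""
    corners_small = np.asarray(corners_small, dtype=np.float32)
    return corners_small / scale


# Hypothetical numbers: a photo 4000 px high and 3000 px wide, with corner
# detection run on a copy whose longest side was shrunk to 500 px.
original_shape = (4000, 3000)  # (height, width)
scale = downscale_factor(original_shape, max_side=500)

# Hypothetical (x, y) document corners detected on the small copy.
corners_small = [(50, 40), (330, 45), (340, 460), (45, 455)]
corners_original = corners_to_original(corners_small, scale)
```

The rescaled corners would then be passed, together with the full-resolution image, to the perspective transform, so the cutout keeps the original level of detail while detection runs on the cheaper small copy.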
Thanks a lot, it looks promising for single page docs! |
@fg-mindee Have a nice weekend |
Well sure, but perhaps we could change your colab to make it work for multiple pages? Regarding the segmentation option, no colab but it will be integrated into docTR within a week or 2! |
@fg-mindee |
@fg-mindee |
@fg-mindee BUT: this works great, but for prod it would need many checks |
Nice 👍 I'm only concerned about the color filtering, which seems to be key to the performance of this method. It's usually not robust in bad lighting or other degraded conditions. For the segmentation-based approach, I'll have to check and will let you know next week 👍 |
@fg-mindee
|
For sure, we need to conduct some thorough evaluations now to ensure that this method is robust (or can be made robust)! We'll check next week with @charlesmindee, in the meantime, if you have any idea to make it more robust, feel free to iterate on this approach 👍 |
@fg-mindee |
@fg-mindee |
We should be able to have something in December but for now there is already a lot on our plate 😅 |
@charlesmindee would you mind taking a look at integrating your implementation in docTR for release 0.6.0? 🙏 |
@charlesmindee @frgfm any update on whether we will keep it for 0.6.0? :) |
This is more up to @charlesmindee for the integration 👍 Generally speaking:
So in this case, that should be kept for 0.6.0 yes :) |
@frgfm @charlesmindee |
Mmmh, I think we should consider document edge segmentation as a separate task that can be handled by docTR. That way, people could pass it to the corresponding model without making the core pipeline too complex for now |
Sounds good to me 👍 |
Topic for |
@fg-mindee
@charlesmindee
What do you think about an option in DocumentFile.from_images(..., try_cutout=True) which does the following:
Example
I currently have a modified, more stable version running in our company :)
Use case: for example, mobile phone photos of documents
It would be nice if I could implement this in docTR as well :)
What do you think?