Question about training Tnet model only using real data #3

Sanster · 2022-08-17T09:38:45Z

Have you ever tried training Tnet using only real data (with unsupervised training)? I am curious whether it can converge. Thanks.

wkema · 2022-08-18T06:16:54Z

This is a good idea. Tnet could converge with some regularizers. Unfortunately, Tnet has no idea what a flat document looks like if you only use real data for unsupervised training.

In my experiment, all the input images were "unwarped" to a barrel-like distortion. I wouldn't be surprised if it converged to other distortions lol.

We added a lot of tricks to finally make it work, but the quantitative results could not even match the model trained on synthetic data only.

I would be interested/excited to see if any unsupervised methods could achieve better results :)

Sanster · 2022-08-18T08:50:14Z

Thank you for sharing the experiment details! I am currently trying to implement a document dewarping method to achieve the following effect (recorded from https://www.textin.com/experience/text_auto_removal):

[Video attachment: 2022-08-18.2.07.59.mov]

Among several methods, PaperEdge gets very good results.

[Image comparison: PaperEdge | DDCP | docTr]

Sanster · 2022-08-19T09:31:26Z

As far as I know, the closest approach to self-supervision is probably Fourier Document Restoration for Robust Document Dewarping and Recognition. The authors released the dataset but, unfortunately, not the code.

wkema · 2022-08-20T04:21:23Z

> As far as I know, the closest approach to self-supervision is probably Fourier Document Restoration for Robust Document Dewarping and Recognition. The authors released the dataset but, unfortunately, not the code.

lol yeah I read that paper. The ideas are very similar. Might be a concurrent work lol.

hanquansanren · 2022-09-27T06:37:48Z

I have tried to reproduce FDRNet recently, but it seems hard to converge.

> As far as I know, the closest approach to self-supervision is probably Fourier Document Restoration for Robust Document Dewarping and Recognition. The authors released the dataset but, unfortunately, not the code.

zbzzz · 2022-11-01T11:25:58Z

Hello, do you happen to have the DocUNet dataset?

wkema · 2022-11-02T01:55:17Z

> Hello, do you happen to have the DocUNet dataset?

Sorry, I just found out that the data server in my previous lab is down, so neither the DocUNet benchmark nor the Doc3D dataset is accessible.

If you just need the benchmark dataset, I have a backup copy on Google Drive:
scan.zip
https://drive.google.com/file/d/1IxeS8wwwXQUBt6grcUcNoszL2UyHCSBb/view?usp=sharing
crop.zip
https://drive.google.com/file/d/1w5_eimkpS2lpB9w-XKc8uKby5GDN8NIf/view?usp=share_link
eval.zip
https://drive.google.com/file/d/1RpjNxTF6hg2lv65qiYRWfsy9UGNFgal0/view?usp=share_link

As for the Doc3D dataset, it is too large to put on Google Drive. I am not sure when the data server will be back online. Sorry for the inconvenience.

ZhangXueBang · 2022-11-02T02:37:59Z

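> Thank you for sharing the experiment details! I am currently trying to implement a document dewarping method to achieve the following effect (recorded from https://www.textin.com/experience/text_auto_removal):
>
> [Video attachment: 2022-08-18.2.07.59.mov]
>
> Among several methods, PaperEdge gets very good results.
>
> [Image comparison: PaperEdge | DDCP | docTr]

Hello, sorry to bother you. I have just started working on this project, but the server in the author's lab is down, so I cannot access the Doc3D dataset. I saw in the comments that you have worked on this project, so I wanted to ask whether you still have a copy of the dataset. If so, I would be very grateful if you could share it.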

hanquansanren · 2022-11-02T02:46:45Z

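The dataset is huge, about 1 TB, so transferring it over the network is very difficult. If you are in mainland China, perhaps I could share it with you offline.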

Sanster · 2022-11-02T02:47:24Z

The Doc3D data is just too large... Better to wait for the author's server to come back.

ZhangXueBang · 2022-11-02T03:00:10Z

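> The dataset is huge, about 1 TB, so transferring it over the network is very difficult. If you are in mainland China, perhaps I could share it with you offline.

Yes, it really is too large. Thank you for your reply. I am in Dalian, China, and sharing offline would be too much trouble for you, so please don't worry about it. I saw that the author posted the benchmark dataset above. Would that dataset, together with the DIW dataset, be enough to run this project?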

ZhangXueBang · 2022-11-02T03:00:55Z

> The Doc3D data is just too large... Better to wait for the author's server to come back.

OK, understood. Thank you for your reply.

ZhangXueBang · 2022-11-02T11:48:43Z

> The Doc3D data is just too large... Better to wait for the author's server to come back.

Hello, could you tell me which part of the dataset the images listed in bgtex.txt correspond to?
/nfs/bigretina/kema/data/dtd/images/perforated/perforated_0103.jpg
/nfs/bigretina/kema/data/dtd/images/perforated/perforated_0089.jpg
/nfs/bigretina/kema/data/dtd/images/perforated/perforated_0015.jpg
/nfs/bigretina/kema/data/dtd/images/perforated/perforated_0069.jpg
/nfs/bigretina/kema/data/dtd/images/perforated/perforated_0144.jpg
I keep getting path errors during training.
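
The entries listed above are absolute paths on the author's machine, and their layout (`dtd/images/<category>/<category>_NNNN.jpg`) looks like the Describable Textures Dataset (DTD). A minimal sketch of one possible workaround, assuming DTD has been downloaded and extracted locally: rewrite the prefix in bgtex.txt to point at the local copy. The function names, the output file name, and the local root `/data/dtd` are hypothetical and not part of the PaperEdge repository.

```python
from pathlib import Path

# Prefix taken from the bgtex.txt entries quoted in this thread.
OLD_PREFIX = "/nfs/bigretina/kema/data/dtd"


def remap_paths(lines, new_root, old_prefix=OLD_PREFIX):
    """Replace the author's absolute prefix with a local DTD root (assumed)."""
    root = str(new_root).rstrip("/")
    remapped = []
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines
        if path.startswith(old_prefix):
            path = root + path[len(old_prefix):]
        remapped.append(path)
    return remapped


def rewrite_list_file(src, dst, new_root):
    """Write a remapped copy of src to dst; return entries missing on disk."""
    lines = Path(src).read_text().splitlines()
    remapped = remap_paths(lines, new_root)
    Path(dst).write_text("\n".join(remapped) + "\n")
    return [p for p in remapped if not Path(p).is_file()]
```

For example, `rewrite_list_file("bgtex.txt", "bgtex_local.txt", "/data/dtd")` would write the remapped list and return any files that are still missing, which makes it easier to tell a wrong root from an incomplete download.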

zbzzz · 2022-11-13T02:44:30Z

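> The Doc3D data is just too large... Better to wait for the author's server to come back.

Hello, may I ask how you downloaded such a large dataset? My computer does not have enough storage. Also, could another dataset be used instead?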

Sanster · 2022-11-13T04:43:52Z

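> The Doc3D data is just too large... Better to wait for the author's server to come back.

> Hello, may I ask how you downloaded such a large dataset? My computer does not have enough storage. Also, could another dataset be used instead?

We originally downloaded it from the author's server. As far as I know, no other dataset is as complete. Another option is to generate the data yourself with the author's code: https://github.com/sagniklp/doc3D-renderer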

zbzzz · 2022-11-14T01:29:47Z

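> The Doc3D data is just too large... Better to wait for the author's server to come back.

> Hello, may I ask how you downloaded such a large dataset? My computer does not have enough storage. Also, could another dataset be used instead?

> We originally downloaded it from the author's server. As far as I know, no other dataset is as complete. Another option is to generate the data yourself with the author's code: https://github.com/sagniklp/doc3D-renderer

Thank you very much for your reply. The script uses the bpy package. After installing Blender, it still fails to run in Python. How can I fix this?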
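
On the bpy problem: bpy is Blender's Python API and ships with Blender's bundled interpreter, so a system Python usually cannot import it after a normal Blender install. One common approach is to run the script through Blender itself with `--background` and `--python`. A minimal sketch, assuming the `blender` executable is on PATH; the script name `render_mesh.py` in the usage note is only a placeholder, and the renderer may require a specific Blender version that is not checked here.

```python
import shutil
import subprocess


def blender_command(script, script_args=(), blender="blender"):
    """Build a command that runs `script` with Blender's bundled Python."""
    # --background: run without the GUI; --python: execute the given script.
    cmd = [blender, "--background", "--python", str(script)]
    if script_args:
        # Blender stops parsing options at "--"; the script can read the
        # remaining arguments from sys.argv.
        cmd += ["--", *map(str, script_args)]
    return cmd


def run_in_blender(script, script_args=(), blender="blender"):
    """Run the script inside Blender; fail early if Blender is not found."""
    if shutil.which(blender) is None:
        raise FileNotFoundError(f"{blender!r} not found on PATH")
    return subprocess.run(
        blender_command(script, script_args, blender), check=True
    )
```

Calling `run_in_blender("render_mesh.py")` is then equivalent to typing `blender --background --python render_mesh.py` in a shell, and `import bpy` inside that script should resolve because it runs in Blender's own interpreter.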

erenxjw · 2023-03-28T07:07:29Z

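I am a student and would appreciate some help. I would like to know how people trained the author's two released pretrained models themselves. I would be very grateful if anyone could share their code. My email is [email protected]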

leonodelee · 2023-04-24T07:24:12Z

Have you solved this? It is also mentioned in #18 (comment).

yy769405513 · 2023-10-31T09:41:23Z

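> The dataset is huge, about 1 TB, so transferring it over the network is very difficult. If you are in mainland China, perhaps I could share it with you offline.

Hello, I would be very grateful if you were willing to share this dataset. I am located in Hangzhou.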
