You are correct. Supervision signals are indeed required for both the Coarse and Fine stages of LoFTR. It took me some time and effort to adapt LoFTR to the training data of GIM. In comparison, adapting DKM and SuperGlue was simpler. In theory, GIM can extract correspondences from any (unedited) video, so it should be able to provide supervision signals for your task (perhaps your task involves reconstruction or localization from medical surgery videos).
I am almost done organizing everything, and I will release the remaining code by December (or within this year), so please wait a little longer.
Hello! Excellent work — thank you for your contribution to the community.
I would like to apply a similar idea (generating a matching dataset from videos to train models such as LoFTR). Since GIM's training code has not been released yet, some specifics of the dataset construction are unclear to me. ScanNet provides depth maps and camera parameters, which can be used to compute pixel-level correspondences between an image pair as ground-truth supervision. Web videos obviously cannot provide such parameters, so my guess is that your approach is to build your own dataloader and pass the ground-truth matches directly to the model (specifically, to the compute_supervision_coarse and compute_supervision_fine functions), rather than computing them the way the original LoFTR code does.
I am not sure whether my understanding is correct — please correct me if I am wrong. The reason I ask is that my task involves medical surgery videos, for which a dataset like ScanNet or MegaDepth cannot be constructed, so your web-video approach seems well suited to this kind of task.
Thank you very much!