
about thumos14 label #8

Open
menghuaa opened this issue Apr 27, 2022 · 17 comments

@menghuaa

Hello, in THUMOS14, CliffDiving is a subclass of Diving, and the CliffDiving instances in the annotation file also belong to Diving. Why don't you use this prior knowledge: remove the CliffDiving instances from the Diving class during training, and add a Diving prediction for each predicted CliffDiving instance during post-processing?
I think an action instance belonging to two categories may make the training difficult to converge.
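For reference, a minimal sketch of the suggested label engineering, assuming instances and predictions are stored as (video, start, end, class) and (video, start, end, class, score) tuples; the function names and tuple layout are hypothetical, not part of this repository:

```python
def drop_cliffdiving_from_diving(gt_instances):
    """Training side: drop Diving annotations that merely duplicate CliffDiving ones."""
    cliff = {(v, s, e) for (v, s, e, c) in gt_instances if c == "CliffDiving"}
    return [(v, s, e, c) for (v, s, e, c) in gt_instances
            if not (c == "Diving" and (v, s, e) in cliff)]


def add_diving_for_cliffdiving(predictions):
    """Post-processing side: every CliffDiving prediction also counts as Diving."""
    extra = [(v, s, e, "Diving", score) for (v, s, e, c, score) in predictions
             if c == "CliffDiving"]
    return predictions + extra
```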

@Pilhyeon
Owner

Thanks for your suggestion!

In fact, I have noticed some papers on fully-supervised temporal action localization that use such a label engineering technique.

However, to my knowledge, existing weakly-supervised approaches do not use it.

Therefore, we did not adopt it for a fair comparison with the previous works, although it may bring some performance gains.

@menghuaa
Author

> Thanks for your suggestion!
>
> In fact, I have noticed some papers on fully-supervised temporal action localization that use such a label engineering technique.
>
> However, to my knowledge, existing weakly-supervised approaches do not use it.
>
> Therefore, we did not adopt it for a fair comparison with the previous works, although it may bring some performance gains.

Thanks for your reply. For the point annotations of THUMOS14, SF-Net provides four annotation files. Are these four files manually annotated? And are the THUMOS14 point annotations "uniformly sampled from the ground truth" mentioned in your paper generated by yourselves, or provided by another paper?

@Pilhyeon
Owner

Pilhyeon commented May 2, 2022

As I have stated in the paper, we used the automatically generated point-level labels that are provided by Moltisanti et al. (CVPR'19).

The point-level labels can be found on their project page, specifically the 'train_df_ts_in_gt.csv' file.
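For anyone loading those labels, a rough sketch with pandas is given below; the column names in 'train_df_ts_in_gt.csv' are an assumption here, so check the file header before relying on them:

```python
import pandas as pd

# Assumed columns: a video identifier, an annotated timestamp, and an action class.
df = pd.read_csv("train_df_ts_in_gt.csv")

points = {}
for _, row in df.iterrows():
    # Group the single-frame (point) annotations by video.
    points.setdefault(row["video"], []).append((row["ts"], row["class"]))
```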

@menghuaa
Author

menghuaa commented May 2, 2022

> As I have stated in the paper, we used the automatically generated point-level labels that are provided by Moltisanti et al. (CVPR'19).
>
> The point-level labels can be found on their project page, specifically the 'train_df_ts_in_gt.csv' file.

In the paper, you perform experiments comparing different label distributions: Manual, Uniform, and Gaussian. Where did you get the Manual and Uniform labels?

@Pilhyeon
Owner

Pilhyeon commented May 2, 2022

The manual labels are provided by SF-Net, while the Uniform-distributed labels are generated using ground-truth intervals in the dataset construction stage before the training starts.
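A minimal sketch of how Uniform-distributed point labels could be drawn from ground-truth intervals, assuming one point per instance; the function below is illustrative, not the repository's actual dataset-construction code:

```python
import random

def sample_uniform_points(gt_instances, seed=0):
    """For each ground-truth (start, end, class) interval, draw one labeled
    timestamp uniformly at random from within the interval."""
    rng = random.Random(seed)
    return [(rng.uniform(start, end), cls) for start, end, cls in gt_instances]
```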

@menghuaa
Author

menghuaa commented May 2, 2022

> The manual labels are provided by SF-Net, while the Uniform-distributed labels are generated using ground-truth intervals in the dataset construction stage before the training starts.

I see, but SF-Net provides four single-frame text files. Are these four files manually annotated? Do you use only one of the txt files?

@Pilhyeon
Owner

Pilhyeon commented May 2, 2022

All four files contain manually labeled annotations from different annotators. For selection, we followed the SF-Net official code, which randomly chooses the annotator id for each video in the dataset construction stage.
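A minimal sketch of that selection step; the file names follow the 'THUMOS2.txt' mentioned in the next comment and are otherwise guesses, and SF-Net's actual code may differ:

```python
import random

# Assumed annotation files, one per human annotator.
annotation_files = ["THUMOS1.txt", "THUMOS2.txt", "THUMOS3.txt", "THUMOS4.txt"]

def pick_annotators(video_names, seed=0):
    """Randomly assign one annotator (i.e., one annotation file) to each video."""
    rng = random.Random(seed)
    return {name: rng.choice(annotation_files) for name in video_names}
```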

@menghuaa
Author

menghuaa commented May 2, 2022

> All four files contain manually labeled annotations from different annotators. For selection, we followed the SF-Net official code, which randomly chooses the annotator id for each video in the dataset construction stage.

Thanks for your reply. Have you noticed that in the annotation file THUMOS2.txt, the CliffDiving videos are not also labeled with their parent class Diving? In the other annotation files, the CliffDiving videos still belong to the parent class Diving.

@Pilhyeon
Owner

Pilhyeon commented May 3, 2022

I am not sure whether there are any papers that reduce the CliffDiving class to the Diving class.
An example of the opposite case is the WTAL-C codebase, which is widely used as a baseline for many other works.
You may check how others handle it by navigating their code links here.

@menghuaa
Author

> I am not sure whether there are any papers that reduce the CliffDiving class to the Diving class. An example of the opposite case is the WTAL-C codebase, which is widely used as a baseline for many other works. You may check how others handle it by navigating their code links here.

Hi, I find that the split_test.txt you provide lacks three videos, for example video_test_0000270. I would like to know the reason.

@Pilhyeon
Owner

I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as that of SSN.

In the SSN paper, the authors mention that "2 falsely annotated videos (“270”, “1496”) in the test set are excluded in evaluation", and they used only 210 testing videos for evaluation.
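For concreteness, a minimal sketch of applying that exclusion when constructing the test list; the helper below is illustrative, not the repository's actual code:

```python
# Videos excluded from the THUMOS'14 test split, following SSN / STPN.
EXCLUDED = {"video_test_0000270", "video_test_0001496"}

def filter_test_split(all_test_videos):
    """Drop the falsely annotated videos before building split_test.txt."""
    return [v for v in all_test_videos if v not in EXCLUDED]
```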

@menghuaa
Author

> I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as that of SSN.
>
> In the SSN paper, the authors mention that "2 falsely annotated videos (“270”, “1496”) in the test set are excluded in evaluation", and they used only 210 testing videos for evaluation.
Thank you very much.

@daidaiershidi

> I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as that of SSN.
> In the SSN paper, the authors mention that "2 falsely annotated videos (“270”, “1496”) in the test set are excluded in evaluation", and they used only 210 testing videos for evaluation.
> Thank you very much.

Hello, I'm sorry to bother you. I am a beginner and would like to ask why some fully supervised methods, such as ActionFormer, use feature lengths that are inconsistent with the feature lengths you provide. Is it because I3D uses different sampling rates when extracting features?

@Pilhyeon
Owner

> I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as that of SSN.
> In the SSN paper, the authors mention that "2 falsely annotated videos (“270”, “1496”) in the test set are excluded in evaluation", and they used only 210 testing videos for evaluation.
> Thank you very much.
>
> Hello, I'm sorry to bother you. I am a beginner and would like to ask why some fully supervised methods, such as ActionFormer, use feature lengths that are inconsistent with the feature lengths you provide. Is it because I3D uses different sampling rates when extracting features?

The feature lengths depend on the sampling rate and the total number of frames. ActionFormer adopts a smaller stride of 4 (vs. 16 for ours) with a video fps of 30 (vs. 25 for ours).
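As a rough back-of-the-envelope check of why the lengths differ, assuming a hypothetical 60-second video and ignoring boundary handling:

```python
duration_sec = 60  # hypothetical video length

# features ≈ duration * fps / stride (boundary frames ignored for simplicity)
ours = duration_sec * 25 // 16          # fps 25, stride 16 -> ~93 features
actionformer = duration_sec * 30 // 4   # fps 30, stride 4  -> ~450 features

print(ours, actionformer)
```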

@wj0323i

wj0323i commented Feb 28, 2024

Hello, I would like to ask: the point label is frame-level, while the video is divided into 16-frame segments. So how is the point-level classification loss applied? One is a frame and the other is a segment. Looking forward to your reply.

@Pilhyeon
Owner

> Hello, I would like to ask: the point label is frame-level, while the video is divided into 16-frame segments. So how is the point-level classification loss applied? One is a frame and the other is a segment. Looking forward to your reply.

The segment within which the labeled point (frame) falls is used as a positive sample for the point-level loss.
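A minimal sketch of that mapping, assuming non-overlapping 16-frame segments and a per-segment class-score tensor `cas` of shape (T, num_classes); the names and the plain cross-entropy here are illustrative, not the repository's actual loss:

```python
import torch
import torch.nn.functional as F

def point_level_loss(cas, point_frames, point_labels, frames_per_segment=16):
    """cas: (T, C) segment-level class logits.
    point_frames: annotated frame indices; point_labels: their class indices."""
    seg_idx = torch.tensor(
        [min(f // frames_per_segment, cas.size(0) - 1) for f in point_frames],
        dtype=torch.long)
    targets = torch.tensor(point_labels, dtype=torch.long)
    # Each labeled point supervises the segment it falls into.
    return F.cross_entropy(cas[seg_idx], targets)
```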

@wj0323i

wj0323i commented Feb 29, 2024

> Hello, I would like to ask: the point label is frame-level, while the video is divided into 16-frame segments. So how is the point-level classification loss applied? One is a frame and the other is a segment. Looking forward to your reply.
>
> The segment within which the labeled point (frame) falls is used as a positive sample for the point-level loss.

Thank you for your reply, and I wish you a happy life!
