about thumos14 label #8
Thanks for your suggestion! In fact, I have noticed some papers on fully-supervised temporal action localization that use such a label engineering technique. However, to my knowledge, existing weakly-supervised approaches do not use it. Therefore, we did not adopt it, to keep the comparison with previous works fair, although it may bring some performance gains.
Thanks for your reply. For the point annotations of THUMOS'14, SF-Net provides four annotation files. Are these four files manually annotated? And for the THUMOS'14 point annotations that your paper describes as uniformly sampled from the ground truth, were they generated by yourselves or provided by another paper?
As I have stated in the paper, we used the automatically generated point-level labels that are provided by Moltisanti et al. (CVPR'19). The point-level labels can be found on their project page, specifically the 'train_df_ts_in_gt.csv' file.
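For reference, loading those labels is straightforward with pandas; the sketch below assumes column names such as 'video_id', 'point_ts', and 'class', which may differ from the actual header of 'train_df_ts_in_gt.csv', so check the file before using it.

```python
# Minimal sketch of loading the single-timestamp labels with pandas.
# The column names ('video_id', 'point_ts', 'class') are assumptions;
# check the header of train_df_ts_in_gt.csv before relying on them.
import pandas as pd

df = pd.read_csv('train_df_ts_in_gt.csv')

# Group the labeled points by video: video_id -> list of (timestamp, class).
points_per_video = {}
for _, row in df.iterrows():
    points_per_video.setdefault(row['video_id'], []).append((row['point_ts'], row['class']))
```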
In the paper, you perform experiments comparing different label distributions: Manual, Uniform, and Gaussian. Where did you get the Manual and Uniform labels?
The Manual labels are provided by SF-Net, while the Uniform-distributed labels are generated from the ground-truth intervals in the dataset construction stage, before training starts.
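As a rough sketch of that construction step (not the repository's exact code), a Uniform-distributed point can be drawn once per ground-truth interval like this:

```python
import random

def sample_uniform_points(gt_segments):
    """Draw one labeled point uniformly at random inside each ground-truth
    (start_sec, end_sec, label) interval. Done once at dataset construction."""
    return [(random.uniform(start, end), label) for start, end, label in gt_segments]

# Example: two ground-truth instances of the same class
print(sample_uniform_points([(12.3, 18.9, 'Diving'), (40.0, 44.5, 'Diving')]))
```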
I see; SF-Net provides the four single-frame text files. Are these four files manually annotated? Do you use one of the txt files?
All four files contain manual annotations from different annotators. For selection, we followed the SF-Net official code, which randomly chooses an annotator id for each video in the dataset construction stage.
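A sketch of that selection step, assuming the four files are named THUMOS1.txt through THUMOS4.txt (illustrative names) and that one annotator id is fixed per video at dataset construction time:

```python
import random

annotation_files = ['THUMOS1.txt', 'THUMOS2.txt', 'THUMOS3.txt', 'THUMOS4.txt']
video_list = ['video_validation_0000051', 'video_validation_0000052']  # illustrative ids

# One random annotator id per video, fixed before training starts.
annotator_choice = {vid: random.randrange(len(annotation_files)) for vid in video_list}
```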
Thanks for your reply. Have you noticed that in the annotation file THUMOS2.txt, the CliffDiving videos are not marked with their parent class Diving, whereas in the other annotation files the CliffDiving videos still belong to the parent class Diving?
I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as in SSN. In the SSN paper, the authors mentioned that "2 falsely annotated videos (“270”, “1496”) in the test set are excluded in evaluation", and they used only 210 testing videos for evaluation.
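In code, that exclusion is just a filter over the test list; the full video ids below are assumed to follow the standard THUMOS'14 test naming:

```python
EXCLUDED = {'video_test_0000270', 'video_test_0001496'}  # falsely annotated test videos

def filter_test_videos(all_test_videos):
    """Drop the two excluded videos so evaluation runs only on the
    remaining test videos (210 in the setting described above)."""
    return [v for v in all_test_videos if v not in EXCLUDED]
```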
Hello, I'm sorry to bother you. I am a beginner and would like to ask why some fully-supervised methods, such as ActionFormer, use feature lengths that are inconsistent with the feature lengths you provide. Is it because I3D uses different sampling rates when extracting features?
The feature lengths depend on the sampling rate and the total number of frames. ActionFormer adopts a smaller stride of 4 (vs. 16 for ours) with a video fps of 30 (vs. 25 for ours).
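Concretely, the number of snippets is roughly the total frame count divided by the stride, so the two settings above give very different feature lengths. A toy calculation (ignoring the clip window at the end of the video):

```python
def num_snippets(duration_sec, fps, stride):
    """Approximate feature length: total frames // frame stride between snippets."""
    return int(duration_sec * fps) // stride

# A 100-second video:
print(num_snippets(100, fps=25, stride=16))  # 156 features (this repo's setting)
print(num_snippets(100, fps=30, stride=4))   # 750 features (ActionFormer's setting)
```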
Hello, I would like to ask: the point label is at the frame level, while the video is divided into 16-frame segments. So how is the point-level classification loss applied? One is a frame and the other is a segment. Looking forward to your reply.
The segment within which the labeled point (frame) falls is used as a positive sample for the point-level loss.
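In other words, a labeled frame index is mapped to its segment index (frame // 16), and the loss is applied to that segment's class scores. A minimal PyTorch sketch of the idea, not the paper's exact loss:

```python
import torch
import torch.nn.functional as F

SNIPPET_LEN = 16  # frames per segment

def point_level_loss(snippet_logits, point_frames, point_labels):
    """snippet_logits: (T, C) per-segment class scores.
    point_frames / point_labels: frame indices and class indices of labeled points.
    Each labeled frame selects the segment it falls into as a positive sample."""
    snippet_idx = torch.as_tensor(point_frames) // SNIPPET_LEN  # frame -> segment
    targets = torch.as_tensor(point_labels)
    return F.cross_entropy(snippet_logits[snippet_idx], targets)
```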
Thank you for your reply, and I wish you a happy life!
Hello, in THUMOS14, CliffDiving is a subclass of Diving, and the CliffDiving action instances in the annotation file also belong to Diving. Why don't you use this prior knowledge to remove the CliffDiving instances from the Diving class during training, and then add a Diving prediction for each predicted CliffDiving instance during post-processing?
I think an action instance belonging to two categories may make the training difficult to converge.
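For completeness, the suggested post-processing step would amount to duplicating each predicted CliffDiving instance under the parent class; a hypothetical sketch (not used in the paper):

```python
def add_parent_class(detections, child='CliffDiving', parent='Diving'):
    """detections: list of dicts with keys 'label', 'start', 'end', 'score'.
    Returns the detections plus a Diving copy of every CliffDiving prediction."""
    extra = [dict(d, label=parent) for d in detections if d['label'] == child]
    return detections + extra
```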