A curated publication list on weakly-supervised temporal action localization.
This repository was built to help readers navigate the mainstream literature on weakly-supervised temporal action localization.
Please note that only accepted papers (for reliability) from conferences (for brevity) are included here.
Last updated: 2021/09/17 (ICCV'21 added)
The mean average precisions (mAPs) under the standard intersection over union (IoU) thresholds are reported.
For example, '@0.5' indicates the mAP score at the IoU threshold of 0.5.
AVG denotes the average mAP over the IoU thresholds from 0.1 to 0.7 with a step size of 0.1 (for THUMOS14) or from 0.5 to 0.95 with a step size of 0.05 (for both versions of ActivityNet).
In addition, links to implementations are provided where available, tagged with the framework used. The prefixes 'o-' and 'u-' indicate official and unofficial implementations, respectively.
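
To make the '@' thresholds concrete, the following is a minimal sketch of temporal IoU between one predicted segment and one ground-truth segment. The `Segment` type and the function names are hypothetical, chosen only for illustration. Treating IoU equal to the threshold as a match is an assumption here, and a full mAP evaluation additionally ranks predictions by confidence and matches them per class, which this sketch leaves out.

```python
from typing import Tuple

# Hypothetical representation: (start, end) of a segment in seconds.
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Segment, gt: Segment, threshold: float) -> bool:
    """Assumed matching rule: a prediction counts when IoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# The segments overlap for 4 s and together span 8 s, so IoU = 0.5.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))
print(is_true_positive((2.0, 8.0), (4.0, 10.0), 0.5))  # counted at '@0.5'
print(is_true_positive((2.0, 8.0), (4.0, 10.0), 0.7))  # rejected at '@0.7'
```

Under this rule, the same prediction can be correct at a loose threshold and incorrect at a strict one, which is why mAP generally falls as the threshold increases.
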
[Note]
*: use of additional trimmed videos
†: use of additional information such as action count, pose, and audio

Results on THUMOS14:

ID | Year | Venue | Model (or Authors) | @0.1 | @0.2 | @0.3 | @0.4 | @0.5 | @0.6 | @0.7 | AVG | code |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2017 | CVPR | UntrimmedNets | 44.4 | 37.7 | 28.2 | 21.1 | 13.7 | - | - | - | [o-matlab] |
2 | 2017 | ICCV | Hide-and-seek | 36.4 | 27.8 | 19.5 | 12.7 | 6.8 | - | - | - | [o-torch] |
3 | 2018 | CVPR | STPN | 52.0 | 44.7 | 35.5 | 25.8 | 16.9 | 9.9 | 4.3 | 27.0 | [u-tensorflow] |
4 | 2018 | ECCV | AutoLoc | - | - | 35.8 | 29.0 | 21.2 | 13.4 | 5.8 | - | [o-caffe] |
5 | 2018 | ECCV | W-TALC | 55.2 | 49.6 | 40.1 | 31.1 | 22.8 | - | 7.6 | - | [o-pytorch] [o-tensorflow] |
6 | 2018 | MM | Zhong et al. | 45.8 | 39.0 | 31.1 | 22.5 | 15.9 | - | - | - | |
7 | 2019 | AAAI | TSRNet* | 55.9 | 46.9 | 38.3 | 28.1 | 18.6 | 11.0 | 5.6 | 29.2 | |
8 | 2019 | AAAI | STAR† | 68.8 | 60.0 | 48.7 | 34.7 | 23.0 | - | - | - | |
9 | 2019 | ICLR | MAAN | 59.8 | 50.8 | 41.1 | 30.6 | 20.3 | 12.0 | 6.9 | 31.6 | [o-pytorch] |
10 | 2019 | CVPR | Liu et al. | 57.4 | 50.8 | 41.2 | 32.1 | 23.1 | 15.0 | 7.0 | 32.4 | [o-pytorch] |
11 | 2019 | ICIP | Park et al. | - | - | 40.2 | 32.2 | 21.7 | - | 9.2 | - | |
12 | 2019 | ICIP | ACN | - | - | 35.9 | 30.7 | 24.2 | 15.7 | 7.4 | - | |
13 | 2019 | MM | ASSG | 65.6 | 59.4 | 50.4 | 38.7 | 25.4 | 15.0 | 6.6 | 37.3 | |
14 | 2019 | ICCV | CleanNet | - | - | 37.0 | 30.9 | 23.9 | 13.9 | 7.1 | - | |
15 | 2019 | ICCV | TSM | - | - | 39.5 | - | 24.5 | - | 7.1 | - | |
16 | 2019 | ICCV | 3C-Net† | 59.1 | 53.5 | 44.2 | 34.1 | 26.6 | - | 8.1 | - | [o-pytorch] |
17 | 2019 | ICCV | Nguyen et al. | 60.4 | 56.0 | 46.6 | 37.5 | 26.8 | 17.6 | 9.0 | 36.3 | |
18 | 2020 | AAAI | PreTrimNet† | 57.5 | 50.7 | 41.4 | 32.1 | 23.1 | 14.2 | 7.7 | 32.4 | |
19 | 2020 | AAAI | BaS-Net | 58.2 | 52.3 | 44.6 | 36.0 | 27.0 | 18.6 | 10.4 | 35.3 | [o-pytorch] |
20 | 2020 | AAAI | RPN | 62.3 | 57.0 | 48.2 | 37.2 | 27.9 | 16.7 | 8.1 | 36.8 | |
21 | 2020 | WACV | WSGN | 57.9 | 51.2 | 42.0 | 33.1 | 25.1 | 16.7 | 8.9 | 33.6 | |
22 | 2020 | WACV | Islam and Radke | 62.3 | - | 46.8 | - | 29.6 | - | 9.7 | - | [o-pytorch] |
23 | 2020 | WACV | Rashid et al. | 63.7 | 56.9 | 47.3 | 36.4 | 26.1 | - | - | - | [o-pytorch] |
24 | 2020 | CVPR | ActionBytes | - | - | 43.0 | 35.8 | 29.0 | - | 9.5 | - | |
25 | 2020 | CVPR | DGAM | 60.0 | 54.2 | 46.8 | 38.2 | 28.8 | 19.8 | 11.4 | 37.0 | [o-pytorch] |
26 | 2020 | CVPR | Gong et al. | - | - | 46.9 | 38.9 | 30.1 | 19.8 | 10.4 | - | [o-pytorch] |
27 | 2020 | ECCV | EM-MIL | 59.1 | 52.7 | 45.5 | 36.8 | 30.5 | 22.7 | 16.4 | 37.7 | |
28 | 2020 | ECCV | A2CL-PT | 61.2 | 56.1 | 48.1 | 39.0 | 30.1 | 19.2 | 10.6 | 37.8 | [o-pytorch] |
29 | 2020 | ECCV | TSCN | 63.4 | 57.6 | 47.8 | 37.7 | 28.7 | 19.4 | 10.2 | 37.8 | |
30 | 2020 | MM | ACM-BANet | 64.6 | 57.7 | 48.9 | 40.9 | 32.3 | 21.9 | 13.5 | 40.0 | |
31 | 2021 | WACV | RefineLoc | - | - | 40.8 | 32.7 | 23.1 | 13.3 | 5.3 | - | [o-pytorch] |
32 | 2021 | AAAI | Liu et al. | - | - | 50.8 | 41.7 | 29.6 | 20.1 | 10.7 | - | |
33 | 2021 | AAAI | ACSNet | - | - | 51.4 | 42.7 | 32.4 | 22.0 | 11.7 | - | |
34 | 2021 | AAAI | HAM-Net | 65.9 | 59.6 | 52.2 | 43.1 | 32.6 | 21.9 | 12.5 | 41.1 | [o-pytorch] |
35 | 2021 | AAAI | Lee et al. | 67.5 | 61.2 | 52.3 | 43.4 | 33.7 | 22.9 | 12.1 | 41.9 | [o-pytorch] |
37 | 2021 | CVPR | ASL | 67.0 | - | 51.8 | - | 31.1 | - | 11.4 | - | [o-pytorch] |
38 | 2021 | CVPR | CoLA | 66.2 | 59.5 | 51.5 | 41.9 | 32.2 | 22.0 | 13.1 | 40.9 | [o-pytorch] |
39 | 2021 | CVPR | AUMN | 66.2 | 61.9 | 54.9 | 44.4 | 33.3 | 20.5 | 9.0 | 41.5 | |
40 | 2021 | CVPR | TS-PCA | 67.6 | 61.1 | 53.4 | 43.4 | 34.3 | 24.7 | 13.7 | 42.6 | |
41 | 2021 | CVPR | UGCT | 69.2 | 62.9 | 55.5 | 46.5 | 35.9 | 23.8 | 11.4 | 43.6 | |
42 | 2021 | MM | CO2-Net | 70.1 | 63.6 | 54.5 | 45.7 | 38.3 | 26.4 | 13.4 | 44.6 | |
43 | 2021 | ICCV | D2-Net | 65.7 | 60.2 | 52.3 | 43.4 | 36.0 | - | - | - | [o-pytorch] |
44 | 2021 | ICCV | FAC-Net | 67.6 | 62.1 | 52.6 | 44.3 | 33.4 | 22.5 | 12.7 | 42.2 | [o-pytorch] |
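
The AVG column in the table above can be reproduced from the seven per-threshold columns. The sketch below uses a hypothetical helper, `average_map`, and assumes the average is rounded to one decimal place and left unreported whenever any threshold is missing ('-'). Papers may compute AVG from unrounded values, so small differences are possible for other rows.

```python
from typing import Optional, Sequence


def average_map(maps: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of per-threshold mAPs, or None if any threshold is unreported."""
    if any(m is None for m in maps):
        return None
    return round(sum(maps) / len(maps), 1)


# mAP at IoU 0.1 ... 0.7, copied from the STPN and UntrimmedNets rows above.
stpn = [52.0, 44.7, 35.5, 25.8, 16.9, 9.9, 4.3]
untrimmednets = [44.4, 37.7, 28.2, 21.1, 13.7, None, None]

print(average_map(stpn))           # 27.0, matching the AVG column
print(average_map(untrimmednets))  # None, since @0.6 and @0.7 are missing
```

The same check does not apply to the ActivityNet tables, because their AVG is taken over ten thresholds (0.5 to 0.95) while only three of them are listed.
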

Results on ActivityNet 1.2:

ID | Year | Venue | Model (or Authors) | @0.5 | @0.75 | @0.95 | AVG | code |
---|---|---|---|---|---|---|---|---|
4 | 2018 | ECCV | AutoLoc | 27.3 | 15.1 | 3.3 | 16.0 | [o-caffe] |
5 | 2018 | ECCV | W-TALC | 37.0 | - | - | 18.0 | [o-pytorch] [o-tensorflow] |
6 | 2018 | MM | Zhong et al. | 27.3 | 14.7 | 2.9 | 15.6 | |
10 | 2019 | CVPR | Liu et al. | 36.8 | 22.0 | 5.6 | 22.4 | [o-pytorch] |
11 | 2019 | ICIP | Park et al. | 33.7 | - | - | - | |
12 | 2019 | ICIP | ACN | 30.4 | 15.4 | 3.7 | 17.0 | |
14 | 2019 | ICCV | CleanNet | 37.1 | 20.3 | 5.0 | 21.6 | |
15 | 2019 | ICCV | TSM | 28.3 | 17.0 | 3.5 | 17.1 | |
16 | 2019 | ICCV | 3C-Net† | 37.2 | - | - | 21.7 | [o-pytorch] |
19 | 2020 | AAAI | BaS-Net | 38.5 | 24.2 | 5.6 | 24.3 | [o-pytorch] |
20 | 2020 | AAAI | RPN | 37.6 | 23.9 | 5.4 | 23.3 | |
22 | 2020 | WACV | Islam and Radke | 35.2 | - | - | - | [o-pytorch] |
23 | 2020 | WACV | Rashid et al. | 29.4 | - | - | - | [o-pytorch] |
24 | 2020 | CVPR | ActionBytes | 39.4 | - | - | - | |
25 | 2020 | CVPR | DGAM | 41.0 | 23.5 | 5.3 | 24.4 | [o-pytorch] |
26 | 2020 | CVPR | Gong et al. | 40.0 | 25.0 | 4.6 | 24.6 | [o-pytorch] |
27 | 2020 | ECCV | EM-MIL | 37.4 | - | - | 20.3 | |
29 | 2020 | ECCV | TSCN | 37.6 | 23.7 | 5.7 | 23.6 | |
31 | 2021 | WACV | RefineLoc | 38.7 | 22.6 | 5.5 | 23.2 | [o-pytorch] |
32 | 2021 | AAAI | Liu et al. | 39.2 | 25.6 | 6.8 | 25.5 | |
33 | 2021 | AAAI | ACSNet | 40.1 | 26.1 | 6.8 | 26.0 | |
34 | 2021 | AAAI | HAM-Net | 41.0 | 24.8 | 5.3 | 25.1 | [o-pytorch] |
35 | 2021 | AAAI | Lee et al. | 41.2 | 25.6 | 6.0 | 25.9 | [o-pytorch] |
36 | 2021 | ICLR | Lee et al.† | 44.8 | 26.7 | 1.0 | 26.0 | |
37 | 2021 | CVPR | ASL | 40.2 | - | - | 25.8 | [o-pytorch] |
38 | 2021 | CVPR | CoLA | 42.7 | 25.7 | 5.8 | 26.1 | [o-pytorch] |
39 | 2021 | CVPR | AUMN | 42.0 | 25.0 | 5.6 | 25.5 | |
41 | 2021 | CVPR | UGCT | 41.8 | 25.3 | 5.9 | 25.8 | |
42 | 2021 | MM | CO2-Net | 43.3 | 26.3 | 5.2 | 26.4 | |
43 | 2021 | ICCV | D2-Net | 42.3 | 25.5 | 5.8 | 26.0 | [o-pytorch] |

Results on ActivityNet 1.3:

ID | Year | Venue | Model (or Authors) | @0.5 | @0.75 | @0.95 | AVG | code |
---|---|---|---|---|---|---|---|---|
3 | 2018 | CVPR | STPN | 29.3 | 16.9 | 2.6 | - | [u-tensorflow] |
7 | 2019 | AAAI | TSRNet* | 33.1 | 18.7 | 3.3 | 21.8 | |
8 | 2019 | AAAI | STAR† | 31.1 | 18.8 | 4.7 | - | |
9 | 2019 | ICLR | MAAN | 33.7 | 21.9 | 5.5 | - | [o-pytorch] |
10 | 2019 | CVPR | Liu et al. | 34.0 | 20.9 | 5.7 | 21.2 | [o-pytorch] |
13 | 2019 | MM | ASSG | 32.3 | 20.1 | 4.0 | - | |
15 | 2019 | ICCV | TSM | 30.3 | 19.0 | 4.5 | - | |
17 | 2019 | ICCV | Nguyen et al. | 36.4 | 19.2 | 2.9 | - | |
18 | 2020 | AAAI | PreTrimNet† | 34.8 | 20.9 | 5.3 | 22.5 | |
19 | 2020 | AAAI | BaS-Net | 34.5 | 22.5 | 4.9 | 22.2 | [o-pytorch] |
28 | 2020 | ECCV | A2CL-PT | 36.8 | 22.0 | 5.2 | 22.5 | [o-pytorch] |
29 | 2020 | ECCV | TSCN | 35.3 | 21.4 | 5.3 | 21.7 | |
30 | 2020 | MM | ACM-BANet | 37.6 | 24.7 | 6.5 | 24.4 | |
32 | 2021 | AAAI | Liu et al. | 35.1 | 23.7 | 5.6 | 23.2 | |
33 | 2021 | AAAI | ACSNet | 36.3 | 24.2 | 5.8 | 23.9 | |
35 | 2021 | AAAI | Lee et al. | 37.0 | 23.9 | 5.7 | 23.7 | [o-pytorch] |
39 | 2021 | CVPR | AUMN | 38.3 | 23.5 | 5.2 | 23.5 | |
40 | 2021 | CVPR | TS-PCA | 37.4 | 23.5 | 5.9 | 23.7 | |
41 | 2021 | CVPR | UGCT | 39.1 | 22.4 | 5.8 | 23.8 | |
44 | 2021 | ICCV | FAC-Net | 37.6 | 24.2 | 6.0 | 24.0 | [o-pytorch] |
- [UntrimmedNets] | CVPR'17 | UntrimmedNets for Weakly Supervised Action Recognition and Detection |
[pdf]
|[o-matlab]
- [Hide-and-seek] | ICCV'17 | Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-supervised Object and Action Localization |
[pdf]
|[o-torch]
- [STPN] | CVPR'18 | Weakly Supervised Action Localization by Sparse Temporal Pooling Network |
[pdf]
|[u-tensorflow]
- [AutoLoc] | ECCV'18 | AutoLoc: Weakly-supervised Temporal Action Localization in Untrimmed Videos |
[pdf]
|[o-caffe]
- [W-TALC] | ECCV'18 | W-TALC: Weakly-supervised Temporal Activity Localization and Classification |
[pdf]
|[o-pytorch]
|[o-tensorflow]
- [Zhong et al.] | MM'18 | Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector |
[pdf]
- [TSRNet*] | AAAI'19 | Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision |
[pdf]
- [STAR†] | AAAI'19 | Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection |
[pdf]
- [MAAN] | ICLR'19 | Marginalized Average Attentional Network for Weakly-Supervised Learning |
[pdf]
- [Liu et al.] | CVPR'19 | Completeness Modeling and Context Separation for Weakly Supervised Temporal Action Localization |
[pdf]
|[o-pytorch]
- [Park et al.] | ICIP'19 | Graph Regularization Network with Semantic Affinity for Weakly-Supervised Temporal Action Localization |
[pdf]
- [ACN] | ICIP'19 | Action Coherence Network for Weakly Supervised Temporal Action Localization |
[pdf]
- [ASSG] | MM'19 | Adversarial Seeded Sequence Growing for Weakly-Supervised Temporal Action Localization |
[pdf]
- [CleanNet] | ICCV'19 | Weakly Supervised Temporal Action Localization through Contrast based Evaluation Networks |
[pdf]
- [TSM] | ICCV'19 | Temporal Structure Mining for Weakly Supervised Action Detection |
[pdf]
- [3C-Net†] | ICCV'19 | 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization |
[pdf]
|[o-pytorch]
- [Nguyen et al.] | ICCV'19 | Weakly-supervised Action Localization with Background Modeling |
[pdf]
- [PreTrimNet†] | AAAI'20 | Multi-Instance Multi-Label Action Recognition and Localization Based on Spatio-Temporal Pre-Trimming for Untrimmed Videos |
[pdf]
- [BaS-Net] | AAAI'20 | Background Suppression Network for Weakly-supervised Temporal Action Localization |
[pdf]
|[o-pytorch]
- [RPN] | AAAI'20 | Relational Prototypical Network for Weakly Supervised Temporal Action Localization |
[pdf]
- [WSGN] | WACV'20 | Weakly Supervised Gaussian Networks for Action Detection |
[pdf]
- [Islam and Radke] | WACV'20 | Weakly Supervised Temporal Action Localization Using Deep Metric Learning |
[pdf]
|[o-pytorch]
- [Rashid et al.] | WACV'20 | Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks |
[pdf]
|[o-pytorch]
- [ActionBytes] | CVPR'20 | ActionBytes: Learning from Trimmed Videos to Localize Actions |
[pdf]
- [DGAM] | CVPR'20 | Weakly-Supervised Action Localization by Generative Attention Modeling |
[pdf]
|[o-pytorch]
- [Gong et al.] | CVPR'20 | Learning Temporal Co-Attention Models for Unsupervised Video Action Localization |
[pdf]
|[o-pytorch]
- [EM-MIL] | ECCV'20 | Weakly-Supervised Action Localization with Expectation-Maximization Multi-Instance Learning |
[pdf]
- [A2CL-PT] | ECCV'20 | Adversarial Background-Aware Loss for Weakly-supervised Temporal Activity Localization |
[pdf]
|[o-pytorch]
- [TSCN] | ECCV'20 | Two-Stream Consensus Network for Weakly-Supervised Temporal Action Localization |
[pdf]
- [ACM-BANet] | MM'20 | Action Completeness Modeling with Background Aware Networks for Weakly-Supervised Temporal Action Localization |
[pdf]
- [RefineLoc] | WACV'21 | RefineLoc: Iterative Refinement for Weakly-Supervised Action Localization |
[pdf]
|[o-pytorch]
- [Liu et al.] | AAAI'21 | Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context |
[pdf]
- [ACSNet] | AAAI'21 | ACSNet: Action-Context Separation Network for Weakly Supervised Temporal Action Localization |
[pdf]
- [HAM-Net] | AAAI'21 | A Hybrid Attention Mechanism for Weakly-Supervised Temporal Action Localization |
[pdf]
|[o-pytorch]
- [Lee et al.] | AAAI'21 | Weakly-supervised Temporal Action Localization by Uncertainty Modeling |
[pdf]
|[o-pytorch]
- [Lee et al.†] | ICLR'21 | Cross-attentional Audio-visual Fusion for Weakly-supervised Action Localization |
[pdf]
- [ASL] | CVPR'21 | Weakly Supervised Action Selection Learning in Video |
[pdf]
|[o-pytorch]
- [CoLA] | CVPR'21 | CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning |
[pdf]
|[o-pytorch]
- [AUMN] | CVPR'21 | Action Unit Memory Network for Weakly Supervised Temporal Action Localization |
[pdf]
- [TS-PCA] | CVPR'21 | The Blessings of Unlabeled Background in Untrimmed Videos |
[pdf]
- [UGCT] | CVPR'21 | Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection |
[pdf]
- [CO2-Net] | MM'21 | Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization |
[pdf]
- [D2-Net] | ICCV'21 | D2-Net: Weakly-Supervised Action Localization via Discriminative Embeddings and Denoised Activations |
[pdf]
|[o-pytorch]
- [FAC-Net] | ICCV'21 | Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization |
[pdf]
|[o-pytorch]
If you have any suggestions or find missing papers, please feel free to contact me.