Jinwoo's list of Computer Vision and Machine Learning papers, code, project websites, and other resources.
- Incremental Tube Construction for Human Action Detection - H. S. Behl et al, arXiv2017.
- Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos - R. Hou et al, arXiv2017.
- Online Real time Multiple Spatiotemporal Action Localisation and Prediction - G. Singh et al, arXiv2016.
- Multi-region two-stream R-CNN for action detection - X. Peng and C. Schmid, ECCV2016. [code]
- Spot On: Action Localization from Pointly-Supervised Proposals - P. Mettes et al, ECCV2016.
- Deep Learning for Detecting Multiple Space-Time Action Tubes in Videos - S. Saha et al, BMVC2016. [code] [project web]
- Learning to track for spatio-temporal action localization - P. Weinzaepfel et al, ICCV2015.
- Action detection by implicit intentional motion clustering - W. Chen and J. Corso, ICCV2015.
- Finding Action Tubes - G. Gkioxari and J. Malik, CVPR2015. [code] [project web]
- APT: Action localization proposals from dense trajectories - J. van Gemert et al, BMVC2015. [code]
- Spatio-Temporal Object Detection Proposals - D. Oneata et al, ECCV2014. [code] [project web]
- Action localization with tubelets from motion - M. Jain et al, CVPR2014.
- Spatiotemporal deformable part models for action detection - Y. Tian et al, CVPR2013. [code]
- Action localization in videos through context walk - K. Soomro et al, ICCV2015.
- Fast Action Proposals for Human Action Detection and Search - G. Yu and J. Yuan, CVPR2015. Note: Aka FAP. The code for FAP is NOT available online.
- DAPs: Deep Action Proposals for Action Understanding - V. Escorcia et al, ECCV2016. [code] [raw data]
- Online Action Detection using Joint Classification-Regression Recurrent Neural Networks - Y. Li et al, ECCV2016. Note: RGB-D action detection.
- Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs - Z. Shou et al, CVPR2016. [code] Note: Aka S-CNN.
- Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos - F. Caba Heilbron et al, CVPR2016. [code] Note: Aka SparseProp. Depends on C3D.
- Actionness Estimation Using Hybrid Fully Convolutional Networks - L. Wang et al, CVPR2016. [code] Note: The released code is incomplete; it contains only a demo, not the training code. [project web]
- Learning Activity Progression in LSTMs for Activity Detection and Early Detection - S. Ma et al, CVPR2016.
- End-to-end Learning of Action Detection from Frame Glimpses in Videos - S. Yeung et al, CVPR2016. [code] [project web] Note: This method uses reinforcement learning.
- Bag-of-fragments: Selecting and encoding video fragments for event detection and recounting - P. Mettes et al, ICMR2015.
- Deep Temporal Linear Encoding Networks - A. Diba et al, arXiv2016.
- Temporal Convolutional Networks: A Unified Approach to Action Segmentation and Detection - C. Lea et al, arXiv2016. [code]
- Long-term Temporal Convolutions - G. Varol et al, arXiv2016. [project web] [code]
- Temporal Segment Networks: Towards Good Practices for Deep Action Recognition - L. Wang et al, arXiv2016. [code]
- Dynamic Image Networks for Action Recognition - H. Bilen et al, CVPR2016. [code] [project web]
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description - J. Donahue et al, CVPR2015. [code] [project web]
- Describing Videos by Exploiting Temporal Structure - L. Yao et al, ICCV2015. [code] Note: From the same group as the RCN paper "Delving Deeper into Convolutional Networks for Learning Video Representations".
- Two-Stream SR-CNNs for Action Recognition in Videos - L. Wang et al, BMVC2016.
- Real-time Action Recognition with Enhanced Motion Vector CNNs - B. Zhang et al, CVPR2016. [code]
- Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors - L. Wang et al, CVPR2015. [code]
- Convolutional Two-Stream Network Fusion for Video Action Recognition - C. Feichtenhofer et al, CVPR2016. [code]
- Learning Spatiotemporal Features with 3D Convolutional Networks - D. Tran et al, ICCV2015. [the official Caffe code] [project web] Note: Aka C3D. Other implementations: [TensorFlow], [TensorFlow + Keras], [Keras C3D Project web]: [Keras code], [Pretrained weights]
- Slicing Convolutional Neural Network for Crowd Video Understanding - J. Shao et al, CVPR2016. [code]
- Two-Stream (RGB and Flow) Pretrained Model Weights
- ActivityNet. Note: The organizers provide a download script and evaluation code here.
- Charades
- THUMOS14. Note: It overlaps with the UCF-101 dataset.
- THUMOS15. Note: It overlaps with the UCF-101 dataset.
- HOLLYWOOD2: Spatio-Temporal annotations
- UCF-101, with annotations provided by THUMOS-14, a list of corrupted annotations, corrected UCF-101 annotations, and different versions of the annotations. Some precomputed spatio-temporal action detection results are also available.
- UCF-50
- UCF-Sports. Note: The train/test split link on the official website is broken; you can download the split from here instead.
- HMDB
- J-HMDB
- LIRIS-HARL
- KTH
- MSR Action. Note: It overlaps with the KTH dataset.
- Sports-1M
- YouTube-8M, technical report
- YouTube-BB, technical report
- Efficiently scaling up crowdsourced video annotation - C. Vondrick et al, IJCV2013. [code]
- The Design and Implementation of ViPER - D. Mihalcik and D. Doermann, Technical report.
- Faster R-CNN - S. Ren et al, NIPS2015. [official MatCaffe code], [PyCaffe], [TensorFlow], [Keras]
- YOLO - J. Redmon et al, CVPR2016. [official code], [TensorFlow]
- SSD - W. Liu et al, ECCV2016. [official PyCaffe code], [TensorFlow], [Keras]
- Mask R-CNN - K. He et al, ICCV2017. [TensorFlow], [PyTorch]
- Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields - Z. Cao et al, CVPR2017. [code] Note: Depends on [caffe RT pose].