This repo contains annotated research papers. The notes and annotations were prepared using Logseq.
- ImageNet Classification with Deep Convolutional Neural Networks [Paper]
- Spatial Transformer Networks [Paper]
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [Paper]
- MLP-Mixer: An all-MLP Architecture for Vision [Paper]
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [Paper]
- You Only Look Once: Unified, Real-Time Object Detection [Paper]
- YOLO9000: Better, Faster, Stronger [Paper]
- Focal Loss for Dense Object Detection [Paper]
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification [Paper]
- FaceNet: A Unified Embedding for Face Recognition and Clustering [Paper]
- A Discriminative Feature Learning Approach for Deep Face Recognition [Paper]
- Learning Deep Features for Discriminative Localization [Paper]
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization [Paper]