
SdP-Net - SlapDash Net

This is a not-too-serious weekend project called SlapDash-Net, which can be considered a less serious variation on the ViT architecture. We use some encoder-type transformer layers together with some register tokens. Before the encoder layers we introduce some convolution layers in a highly slapdash manner. These dudes will be trained on the ImageNet-1k/22k datasets.
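
For the curious, here is a minimal PyTorch sketch of the idea. It assumes a single conv stem layer before a patch-embedding convolution, learnable register tokens prepended to the patch tokens, and stock pre-norm transformer encoder blocks; the class name SdPNetSketch, the layer sizes, and the mean-pooled classification head are illustrative choices, not the actual implementation.

```python
# Illustrative sketch only, not the repo's implementation:
# conv stem -> patch embedding -> register tokens -> transformer encoder.
import torch
import torch.nn as nn

class SdPNetSketch(nn.Module):
    def __init__(self, img_size=224, patch_size=16, conv_kernel=7,
                 embed_dim=768, depth=12, num_heads=12,
                 num_registers=4, num_classes=1000):
        super().__init__()
        # slapdash conv stem before the encoder (kernel size from the table below)
        self.stem = nn.Sequential(
            nn.Conv2d(3, embed_dim // 4, kernel_size=conv_kernel,
                      stride=2, padding=conv_kernel // 2),
            nn.GELU(),
        )
        # patchify the stem output into tokens (stem already halved the resolution)
        self.patch_embed = nn.Conv2d(embed_dim // 4, embed_dim,
                                     kernel_size=patch_size // 2, stride=patch_size // 2)
        num_patches = (img_size // patch_size) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        # learnable register tokens ("Vision Transformers Need Registers")
        self.registers = nn.Parameter(torch.zeros(1, num_registers, embed_dim))
        # dropout=0.2 here is layer-wide; the README applies it only to the FFN parts
        block = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           dropout=0.2, activation="gelu",
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(self.stem(x))               # (B, D, H', W')
        x = x.flatten(2).transpose(1, 2)                 # (B, N, D)
        x = x + self.pos_embed
        regs = self.registers.expand(x.shape[0], -1, -1)
        x = torch.cat([regs, x], dim=1)                  # prepend register tokens
        x = self.encoder(x)
        x = self.norm(x.mean(dim=1))                     # mean-pool over all tokens
        return self.head(x)
```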

Our motto in coming up with SdP-Net:

  • No promise to get very high accuracy,
  • No prior assumption that exactly the same idea might have been used elsewhere,
  • No attempt to tweak hyperparameters more than needed,
  • We would like to hybridize things,
  • We do bizarre combinations for the mere reason that we would like to!!!,
  • In SdP-Net we trust!

Training Details

Size   Params   Blocks   Patch size   Conv size   Embed dim   Top-1 Acc
XXS    55M      7        16           7           128         ?
S      76M      12       16           7           512         ?
M      86M      12       16           7           768         ?
L      86M      12       16           7           768         ?
XL     86M      15       16           7           768         79.8

Bitter lesson: the biggest model reaches 79.8% top-1 accuracy on ImageNet-1k. The others are still training, time permitting.

Optimizers

AdamW with lr = 0.001875 (= 0.001 * batch_size / 512), weight decay 0.05, and cosine annealing with warm restarts, preceded by 5 warm-up epochs.
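
A sketch of how this schedule could be wired up in plain PyTorch is below; the warm-up handling, the total epoch count, and the implied global batch size of 960 are assumptions, and the actual training script may differ.

```python
# Optimizer/schedule sketch: AdamW + linear warm-up + cosine annealing with warm restarts.
import torch

model = torch.nn.Linear(8, 8)          # stand-in for the actual SdP-Net model

base_lr = 0.001 * 960 / 512            # = 0.001875, i.e. 0.001 * batch_size / 512
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

warmup_epochs, total_epochs = 5, 300   # total_epochs is an assumption, not from the README
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=total_epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... run one training epoch here ...
    optimizer.step()                   # placeholder step so the scheduler has something to follow
    scheduler.step()                   # stepping per epoch; per-iteration stepping is also common
```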

Augmentation and Regularization

RandAugment + random erasing + random resized crops + CutMix + MixUp + Dropout(0.2) (applied only to the FFN parts of the attention blocks)
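
One way to assemble this recipe is sketched below with torchvision.transforms.v2 (the repo may well use timm instead); the crop size, erasing probability, and dummy batch are illustrative guesses.

```python
# Augmentation sketch: per-image transforms plus batch-level CutMix/MixUp.
import torch
from torchvision.transforms import v2

train_tf = v2.Compose([
    v2.RandomResizedCrop(224, antialias=True),    # random resized crop
    v2.RandAugment(),                             # RandAugment
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    v2.RandomErasing(p=0.25),                     # random erasing
])

# CutMix / MixUp operate on whole batches of (images, labels) inside the training loop.
cutmix_or_mixup = v2.RandomChoice([v2.CutMix(num_classes=1000),
                                   v2.MixUp(num_classes=1000)])

images = torch.rand(4, 3, 224, 224)               # dummy batch for illustration
labels = torch.randint(0, 1000, (4,))
images, labels = cutmix_or_mixup(images, labels)  # labels become soft (mixed) targets
```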


TODO

  1. Gating mechanism in FFN?
  2. EMA Model (This is important for future use!!! See the EMA sketch after this list.)
  3. Gradient Accumulation -- larger learning rate (ok!!!)
  4. Register tokens (ViTs need registers)
  5. Stochastic Depth (Further research is needed!!!)
  6. No more batchnorm layers (Layer norm is implemented here!!!)
  7. If possible, binary cross-entropy loss instead of cross-entropy loss (ResNet strikes back!!!)
  8. Write kind-of-a unit-test for intermediate activations!!! (ok!!!)
  9. Write trainer class from scratch -- if possible do some subclassing kinda thing!!!
  10. Use KeLü activation instead of GELU (KeLü is implemented but may not be fully optimized!)
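
Regarding item 2, a minimal EMA sketch is below. ModelEMA is a hypothetical helper name, not something from this repo; it simply keeps a shadow copy of the weights and updates it as an exponential moving average after each optimizer step.

```python
# Minimal EMA sketch (hypothetical helper, not the repo's code).
import copy
import torch

class ModelEMA:
    def __init__(self, model: torch.nn.Module, decay: float = 0.9999):
        self.ema = copy.deepcopy(model).eval()   # shadow copy holding the averaged weights
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # ema = decay * ema + (1 - decay) * current, applied to params and buffers
        for ema_t, t in zip(self.ema.state_dict().values(), model.state_dict().values()):
            if ema_t.dtype.is_floating_point:
                ema_t.mul_(self.decay).add_(t, alpha=1.0 - self.decay)
            else:
                ema_t.copy_(t)                   # e.g. integer buffers like step counters

# Usage: ema = ModelEMA(model); call ema.update(model) after each optimizer step,
# and evaluate with ema.ema instead of model.
```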

About

An Experimental Attention-Based ConvNet
