SlapDash-Net (SdP-Net) is a not-too-serious weekend project: a slapdash variation on the ViT architecture. We use standard transformer encoder layers together with some register tokens, and before the encoder we introduce a few convolution layers in a highly slapdash manner. The models will be trained on the ImageNet-1k/22k datasets. A rough sketch of the intended layout follows the list below.
- No promise of very high accuracy,
- No claim of novelty: the same idea may well have been used elsewhere,
- No attempt to tune hyperparameters more than necessary,
- We like to hybridize things,
- We try bizarre combinations for one simple reason: because we want to!!!,
- In SdP-Net we trust!
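A minimal sketch of the intended layout, not the actual implementation: the stem widths, strides, number of register tokens, and the class name `SdPNetSketch` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class SdPNetSketch(nn.Module):
    """Rough layout: slapdash conv stem -> patch embedding -> [CLS] + register tokens -> ViT encoder."""

    def __init__(self, img_size=224, patch_size=16, conv_size=7, embed_dim=768,
                 depth=12, num_heads=12, num_registers=4, num_classes=1000):
        super().__init__()
        # Slapdash convolution applied before the transformer encoder (widths/strides are guesses).
        self.conv_stem = nn.Sequential(
            nn.Conv2d(3, embed_dim // 4, kernel_size=conv_size, stride=2,
                      padding=conv_size // 2),
            nn.GELU(),
        )
        # Patch embedding on top of the conv features (stride halved because the stem downsamples by 2).
        self.patch_embed = nn.Conv2d(embed_dim // 4, embed_dim,
                                     kernel_size=patch_size // 2, stride=patch_size // 2)
        num_patches = (img_size // patch_size) ** 2
        # Learnable [CLS] + register tokens ("Vision Transformers Need Registers").
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.registers = nn.Parameter(torch.zeros(1, num_registers, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1 + num_registers, embed_dim))
        # Pre-norm encoder blocks; note nn.TransformerEncoderLayer also applies dropout to the
        # attention output, whereas the repo restricts Dropout(0.2) to the FFN part.
        block = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           dim_feedforward=4 * embed_dim, dropout=0.2,
                                           activation="gelu", batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=depth)
        self.norm = nn.LayerNorm(embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(self.conv_stem(x))           # (B, D, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)                  # (B, N, D)
        b = x.shape[0]
        tokens = torch.cat([self.cls_token.expand(b, -1, -1),
                            self.registers.expand(b, -1, -1), x], dim=1)
        tokens = self.encoder(tokens + self.pos_embed)
        return self.head(self.norm(tokens[:, 0]))         # classify from the [CLS] token
```

Instantiating the M row of the table below would look like `SdPNetSketch(embed_dim=768, depth=12, patch_size=16, conv_size=7)`.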
| Size | Params | Blocks | Patch size | Conv size | Embed dim | Top-1 acc (%) |
|---|---|---|---|---|---|---|
| XXS | 55M | 7 | 16 | 7 | 128 | ? |
| S | 76M | 12 | 16 | 7 | 512 | ? |
| M | 86M | 12 | 16 | 7 | 768 | ? |
| L | 86M | 12 | 16 | 7 | 768 | ? |
| XL | 86M | 15 | 16 | 7 | 768 | 79.8 |
Bitter lesson: the biggest model reaches 79.8% top-1 on ImageNet-1k. The others are still training, time permitting.
AdamW: lr = 0.001875 (= 0.001 × batch_size / 512), weight decay 0.05, cosine annealing with warm restarts on top of 5 warm-up epochs.
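A minimal sketch of this recipe in PyTorch. Only the lr scaling rule, weight decay 0.05, and 5 warm-up epochs come from the line above; the batch size, total epoch count, restart period, and warm-up start factor are assumptions.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts, LinearLR, SequentialLR

model = torch.nn.Linear(768, 1000)       # stand-in for the real SdP-Net model

batch_size = 960                         # assumption; 0.001 * 960 / 512 = 0.001875
base_lr = 0.001 * batch_size / 512       # linear lr scaling rule from the line above
optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

warmup_epochs, total_epochs = 5, 300     # 300 total epochs is an assumption
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=warmup_epochs)
cosine = CosineAnnealingWarmRestarts(optimizer, T_0=total_epochs - warmup_epochs)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... run one training epoch here ...
    scheduler.step()                     # epoch-level stepping keeps the sketch simple
```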
RandAugment + Random Erasing + random resized crop + CutMix + MixUp + Dropout(0.2) (applied only to the FFN part of each attention block).
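A sketch of the data-side pipeline using torchvision's v2 transforms. The magnitudes, probabilities, and normalization stats are assumptions, and the Dropout(0.2) lives inside the model's FFN, not here.

```python
import torch
from torchvision.transforms import v2

NUM_CLASSES = 1000

# Per-image transforms: random resized crop + RandAugment + random erasing.
train_tf = v2.Compose([
    v2.RandomResizedCrop(224, antialias=True),
    v2.RandAugment(),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    v2.RandomErasing(p=0.25),
])

# Batch-level CutMix / MixUp, picked at random for each batch.
cutmix_or_mixup = v2.RandomChoice([v2.CutMix(num_classes=NUM_CLASSES),
                                   v2.MixUp(num_classes=NUM_CLASSES)])

def collate_with_mix(batch):
    # default_collate stacks the (image, label) pairs, then the whole batch is mixed.
    images, labels = torch.utils.data.default_collate(batch)
    return cutmix_or_mixup(images, labels)
```

The collate function would be handed to the `DataLoader` as `collate_fn=collate_with_mix`.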
# TODO
- Gating mechanism in FFN? (see the sketch after this list)
- EMA Model (This is important for future use!!!)
- Gradient accumulation -- allows a larger effective batch size and learning rate (ok!!!)
- Register tokens (ViTs need registers!)
- Stochastic Depth (Further research is needed!!!)
- No more BatchNorm layers (LayerNorm is used here!!!)
- If possible, binary cross-entropy loss instead of cross-entropy loss (ResNet strikes back!!!)
- Write a kind-of unit test for intermediate activations!!! (ok!!!)
- Write a trainer class from scratch -- if possible, with some subclassing!!!
- Use the KeLü activation instead of GELU (KeLü is implemented but may not be fully optimized!)
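On the FFN-gating TODO above: a minimal SwiGLU-style gated FFN that could replace the plain MLP in each encoder block. The class name and hidden width are illustrative, and the activation could be swapped for KeLü once it is optimized.

```python
import torch.nn as nn

class GatedFFN(nn.Module):
    """SwiGLU-style feed-forward block: down(act(gate(x)) * up(x))."""

    def __init__(self, dim, hidden_dim=None, dropout=0.2):
        super().__init__()
        hidden_dim = hidden_dim or 4 * dim
        self.gate = nn.Linear(dim, hidden_dim)
        self.up = nn.Linear(dim, hidden_dim)
        self.down = nn.Linear(hidden_dim, dim)
        self.act = nn.SiLU()              # could be swapped for GELU or KeLü
        self.drop = nn.Dropout(dropout)   # the Dropout(0.2) that lives only in the FFN

    def forward(self, x):
        return self.down(self.drop(self.act(self.gate(x)) * self.up(x)))
```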