KeLü (Keen Learning Unit): a new activation function.
It has a continuous derivative at 0, while its third derivative has singularities at -3.5 and 3.5. Furthermore, compared to GELU it decays to zero a bit faster. Both Flux and Jax implementations are included. For comparison, we implement three well-known networks.
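The exact closed form lives in the Flux and Jax sources. Purely to illustrate the shape described above, a gate that is identically zero well below the origin, identically one well above it, smooth at 0, and with higher-derivative kinks at ±3.5, here is a hedged JAX sketch; the quintic-smoothstep gate and the name `kelu_like` are assumptions for illustration, not the repository's definition.

```python
import jax.numpy as jnp

A = 3.5  # knot location taken from the text (third-derivative singularities at +/-3.5)

def kelu_like(x):
    """Illustrative KeLu-style activation, NOT the repository's exact formula:
    x times a C^2 quintic-smoothstep gate that is exactly 0 for x <= -A and
    exactly 1 for x >= A, so the output vanishes faster than GELU for large
    negative inputs and the gate's third derivative jumps at +/-A."""
    t = jnp.clip((x + A) / (2.0 * A), 0.0, 1.0)
    gate = t * t * t * (10.0 - 15.0 * t + 6.0 * t * t)  # quintic smoothstep
    return x * gate
```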
Both the Jax and Flux directories include implementations of these papers, organized under their respective abbreviations.
All of the above models are trained with standard augmentation techniques (see the SdP repo).
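As a rough sketch only, and assuming that "standard augmentation" here means the usual CIFAR-style pad-and-random-crop plus random horizontal flip (the actual pipeline is in the SdP repo), a minimal JAX version could look like this:

```python
import jax
import jax.numpy as jnp

def augment(key, images):
    """Hypothetical 'standard' augmentation: reflect-pad, random crop, random
    horizontal flip. Only an illustration; not the repository's pipeline.
    images: (N, H, W, C) floats."""
    k_crop, k_flip = jax.random.split(key)
    n, h, w, c = images.shape
    pad = 4  # assumed padding, as commonly used for 32x32 CIFAR images
    padded = jnp.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # One random (row, col) crop offset per image.
    offs = jax.random.randint(k_crop, (n, 2), 0, 2 * pad + 1)
    crop = jax.vmap(
        lambda img, o: jax.lax.dynamic_slice(img, (o[0], o[1], 0), (h, w, c))
    )(padded, offs)
    # Random horizontal flip with probability 0.5.
    flip = jax.random.bernoulli(k_flip, 0.5, (n, 1, 1, 1))
    return jnp.where(flip, crop[:, :, ::-1, :], crop)
```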
Act. | Depth | Patch_size | Kernel_size | Embed_Dim | Acc (%) | Loss |
---|---|---|---|---|---|---|
ReLU | 8 | 2 | 5 | 384 | 77.79 | 1.075 |
GELU | 8 | 2 | 5 | 384 | 78.04 | 1.083 |
Swish | 8 | 2 | 5 | 384 | 78.26 | 1.052 |
KeLu | 8 | 2 | 5 | 384 | 78.53 | 1.043 |
KeLu | 12 | 2 | 5 | 384 | 79.63 | 0.9787 |
GELU | 12 | 2 | 5 | 384 | 79.14 | 0.9995 |
Act. | Depth | Patch_size | Kernel_size | Embed_Dim | Acc (%) | Loss |
---|---|---|---|---|---|---|
ReLU | 8 | 2 | 5 | 256 | 93.16 | 0.4382 |
GELU | 8 | 2 | 5 | 256 | 93.23 | 0.4281 |
KeLu | 8 | 2 | 5 | 256 | 93.44 | 0.4274 |
Note: With 150-epoch training, I am not able to reproduce the CIFAR-10 results reported in the "Patches Are All You Need?" article. This is probably due to differences in the penalization (regularization) methods.
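For context on the hyperparameter columns above: in a ConvMixer-style model, Patch_size is the stride of the patch-embedding convolution, Kernel_size the depthwise-convolution kernel, Embed_Dim the channel width, and Depth the number of mixer blocks. The following Flax sketch (normalization layers omitted, names and defaults illustrative rather than the repository's) shows where those knobs and the pluggable activation enter:

```python
from typing import Callable
import jax.numpy as jnp
import flax.linen as nn

class ConvMixerSketch(nn.Module):
    """Simplified ConvMixer-style network, BatchNorm omitted for brevity.
    Only meant to show where Depth, Patch_size, Kernel_size, Embed_Dim and
    the activation from the tables enter; not the repository's exact code."""
    embed_dim: int = 384
    depth: int = 8
    patch_size: int = 2
    kernel_size: int = 5
    num_classes: int = 10          # set to the dataset's class count
    act: Callable = nn.gelu        # swap in ReLU / Swish / KeLu for the other rows

    @nn.compact
    def __call__(self, x):         # x: (N, H, W, 3)
        # Patch embedding: convolution with stride equal to the patch size.
        x = self.act(nn.Conv(self.embed_dim,
                             (self.patch_size, self.patch_size),
                             strides=(self.patch_size, self.patch_size))(x))
        for _ in range(self.depth):
            # Depthwise ("spatial mixing") convolution with a residual connection.
            y = self.act(nn.Conv(self.embed_dim,
                                 (self.kernel_size, self.kernel_size),
                                 padding="SAME",
                                 feature_group_count=self.embed_dim)(x))
            x = x + y
            # Pointwise ("channel mixing") 1x1 convolution.
            x = self.act(nn.Conv(self.embed_dim, (1, 1))(x))
        x = jnp.mean(x, axis=(1, 2))  # global average pooling
        return nn.Dense(self.num_classes)(x)
```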
Act. | Acc (%) | Loss |
---|---|---|
GELU | 78.04 | 1.083 |
KeLu | 78.53 | 1.043 |
#Params | Embed_Dim | #Heads | #Blocks | KeLu val. loss | GELU val. loss |
---|---|---|---|---|---|
55M | 384 | 6 | 10 | | |
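The #Heads and #Blocks columns suggest a transformer-style model; in that setting the GELU vs. KeLu comparison only swaps the activation inside each block's feed-forward MLP. A hedged Flax sketch of that MLP, with illustrative names and a conventional 4x expansion factor that is assumed rather than taken from the repository:

```python
from typing import Callable
import flax.linen as nn

class TransformerMLP(nn.Module):
    """Illustrative feed-forward sub-block of a transformer layer; the activation
    (GELU vs. KeLu) is the only piece that changes between the two runs."""
    embed_dim: int = 384
    expansion: int = 4             # assumed; not taken from the repository
    act: Callable = nn.gelu        # pass the KeLu implementation for the KeLu column

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(self.expansion * self.embed_dim)(x)
        x = self.act(x)
        return nn.Dense(self.embed_dim)(x)
```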