Deep-Model-Play

Reproductions of basic modern deep learning models.

Model

  • Attention
  • Multi-Head Attention
  • GPT-2
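
For orientation, here is a minimal pure-Python sketch of the attention computation that the models above build on. It is not this repository's code. The function names (`softmax`, `attention`, `multi_head_attention`) and the nested-list data layout are assumptions made for illustration, and the multi-head variant leaves out the learned projection matrices that a real implementation such as GPT-2 uses.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=False):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    q, k, v are lists of row vectors (one row per token).
    With causal=True, token i may only attend to tokens j <= i,
    which is the masking used by decoder-only models.
    """
    d_k = len(k[0])
    outputs, weights = [], []
    for i, q_i in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d_k)
            for k_j in k
        ]
        if causal:
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        outputs.append(
            [sum(w_j * v_j[c] for w_j, v_j in zip(w, v)) for c in range(len(v[0]))]
        )
        weights.append(w)
    return outputs, weights


def multi_head_attention(x, num_heads, causal=False):
    """Self-attention split across heads (illustrative only).

    Each head sees its own slice of the feature dimension and the head
    outputs are concatenated. The learned W_Q, W_K, W_V and W_O
    projections are deliberately omitted to keep the sketch short.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in x]
        out, _ = attention(part, part, part, causal)
        heads.append(out)
    # Concatenate the per-head outputs for every token.
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(x))]
```

Calling `attention(x, x, x, causal=True)` gives self-attention over a sequence `x`. Each row of the returned weights sums to 1, and with the causal mask the first token attends only to itself.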

Optimizer

The following optimizers are custom implementations based on my own understanding. Issues and questions are welcome!

  • SGD
  • Momentum SGD
  • Nesterov SGD
  • Adam
  • Nadam
  • AdamW (may still contain bugs)

Weight decay is not supported yet. It will be added soon!
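
As a rough reference for the update rules listed above, the sketch below shows textbook single-step versions of SGD, momentum SGD, and Adam on plain Python lists. It is not taken from this repository. The function names, argument order, and default hyperparameters are assumptions, and the `weight_decay` argument only illustrates how decoupled (AdamW-style) decay is commonly applied; it defaults to `0.0`, which gives plain Adam.

```python
import math


def sgd_step(params, grads, lr=0.1):
    """Vanilla SGD: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grads)]


def momentum_step(params, grads, velocity, lr=0.1, beta=0.9):
    """Momentum SGD: v <- beta * v + g, then p <- p - lr * v."""
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    params = [p - lr * v for p, v in zip(params, velocity)]
    return params, velocity


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """One Adam step; t is the 1-based step count.

    With weight_decay > 0 the decay term is applied directly to the
    parameters, separately from the adaptive gradient term (the
    decoupled, AdamW-style formulation).
    """
    # Exponential moving averages of the gradient and squared gradient.
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grads)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grads)]
    new_params = []
    for p, mi, vi in zip(params, m, v):
        m_hat = mi / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = vi / (1 - beta2 ** t)  # bias-corrected second moment
        update = m_hat / (math.sqrt(v_hat) + eps) + weight_decay * p
        new_params.append(p - lr * update)
    return new_params, m, v
```

A training loop would keep `m`, `v` (and `velocity` for momentum) as state between calls, initialise them to zeros, and increase `t` by one on every step.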