Official implementation for paper "Knowledge Diffusion for Distillation" (DiffKD), NeurIPS 2023
```
git clone https://github.com/hunto/DiffKD.git --recurse-submodules
cd DiffKD
```
The implementation of DiffKD is in `classification/lib/models/losses/diffkd`.
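For orientation, the sketch below illustrates the core idea of the loss: the student feature is treated as a noisy version of the teacher feature, denoised by a small network trained on noised teacher features, and the denoised result is distilled toward the teacher. This is a simplified, hypothetical PyTorch example; the class name `DiffKDSketchLoss`, the convolutional denoiser, the fixed noise scale, and the unweighted loss sum are assumptions for illustration and are not the modules used in the repository.

```python
# Minimal, illustrative sketch of a DiffKD-style feature distillation loss.
# See classification/lib/models/losses/diffkd for the actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffKDSketchLoss(nn.Module):
    def __init__(self, channels: int, denoise_steps: int = 5, noise_std: float = 0.1):
        super().__init__()
        self.denoise_steps = denoise_steps
        self.noise_std = noise_std
        # Lightweight denoising network (stand-in for the diffusion model).
        self.denoiser = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        teacher_feat = teacher_feat.detach()
        # 1) Train the denoiser to reconstruct teacher features from noised copies.
        noisy_teacher = teacher_feat + self.noise_std * torch.randn_like(teacher_feat)
        diffusion_loss = F.mse_loss(self.denoiser(noisy_teacher), teacher_feat)

        # 2) Treat the student feature as a noisy teacher feature, denoise it
        #    iteratively, then distill the denoised feature toward the teacher.
        denoised = student_feat
        for _ in range(self.denoise_steps):
            denoised = self.denoiser(denoised)
        distill_loss = F.mse_loss(denoised, teacher_feat)

        return distill_loss + diffusion_loss


# Hypothetical usage with spatially aligned 512-channel, 7x7 feature maps.
loss_fn = DiffKDSketchLoss(channels=512)
student = torch.randn(2, 512, 7, 7)
teacher = torch.randn(2, 512, 7, 7)
loss = loss_fn(student, teacher)
```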
- classification: prepare your environment and datasets following the `README.md` in `classification`.
```
cd classification
sh tools/dist_train.sh 8 ${CONFIG} ${MODEL} --teacher-model ${T_MODEL} --experiment ${EXP_NAME}
```
Example script for reproducing DiffKD with a ResNet-34 teacher and a ResNet-18 student under the B1 baseline setting:
```
sh tools/dist_train.sh 8 configs/strategies/distill/diffkd/diffkd_b1.yaml tv_resnet18 --teacher-model tv_resnet34 --experiment diffkd_res34_res18
```
- Baseline settings (`R34-R18` and `R50-MBV1`): `CONFIG=configs/strategies/distill/TODO`
| Student | Teacher | DiffKD | MODEL | T_MODEL | Log | Ckpt |
|---------|---------|--------|-------|---------|-----|------|
| ResNet-18 (69.76) | ResNet-34 (73.31) | 72.20 | `tv_resnet18` | `tv_resnet34` | log | ckpt |
| MobileNet V1 (70.13) | ResNet-50 (76.16) | 73.24 | `mobilenet_v1` | `tv_resnet50` | to be reproduced | |
This project is released under the Apache 2.0 license.
```bibtex
@article{huang2023knowledge,
  title={Knowledge Diffusion for Distillation},
  author={Huang, Tao and Zhang, Yuan and Zheng, Mingkai and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
  journal={arXiv preprint arXiv:2305.15712},
  year={2023}
}
```