A Fast Knowledge Distillation Framework for Visual Recognition
ECCV 2022

Zhiqiang Shen Eric Xing

Carnegie Mellon University
Mohamed bin Zayed University of Artificial Intelligence

[Paper]

[Bibtex]

[PyTorch Code]

Abstract

Knowledge Distillation (KD) has been recognized as a useful tool in many visual tasks, such as the supervised classification and self-supervised representation learning, while the main drawback of a vanilla KD framework lies in its mechanism that most of the computational overhead is consumed on forwarding through the giant teacher networks, which makes the whole learning procedure in a low-efficient and costly manner. Recent solution ReLabel proposes to generate a label map for the global image. During training, it obtains the cropped region-level label by RoI align on the pre-generated complete label map, which can obtain the supervision efficiently without forwarding the teachers multiple times. However, as the teachers used for KD are from the regular multi-crop training, this strategy exists several mismatches between the global label-map and region-level label, which leads to performance decline. In this paper, we propose a Fast Knowledge Distillation (FKD) framework that simulates the distillation training phase and generates soft labels following the multi-crop KD procedure, meanwhile enjoying the faster training speed than ReLabel as we have no post-processes like RoI align and softmax operations. Our FKD is even more efficient than the conventional classification framework when employing multi-crop in the same image for data loading. We achieve 80.1% using ResNet-50 on ImageNet-1K, outperforming ReLabel by 1.2% while being faster. We also demonstrate the efficiency advantage of FKD on the self-supervised learning task.

Paper

Z. Shen and E. Xing.
A Fast Knowledge Distillation Framework for Visual Recognition.
ECCV 2022.

[Paper] | [Bibtex] | [GitHub (Supervised Learning)] | [GitHub (Self-supervised Learning)] | [GitHub (Fast MEAL V2)]

Download (soft labels):

1. Supervised Learning on ImageNet-1K [Github]

Soft Labels from EfficientNet-L2-ns-475 pre-trained model:

More soft labels from stronger networks will be available soon.

⇒ Marginal Smoothing with Top-5 (200-crop) [9.5G]

⇒ Marginal Smoothing with Top-5 (500-crop) [23.6G] (Recommended)
[fp16 version on Google Drive (13G)]
[fp16 version on OneDrive (13G)]

Fast MEAL V2 [Github]:

Soft Labels from SENet154 + ResNet152 v1s ensemble (same as MEAL V2 paper):

2. Self-Supervised Learning on ImageNet-1K [Github]

Soft Labels from MoCo V2-800ep and SwAV/RN50-w4 pre-trained model:

⇒ Full Label (400-crop) [TBA]

Please cite our paper if you use our code and soft labels, or find this help your research:

@article{shen2021afast,
    title = {A Fast Knowledge Distillation Framework for Visual Recognition},
    author = {Shen, Zhiqiang and Xing, Eric},
    journal = {arXiv preprint arXiv:2112.01528},
    year = {2021}
}

Comments, questions to Zhiqiang Shen

*Website template from Swapping Autoencoder.

Abstract

Paper

Download (soft labels):

1. Supervised Learning on ImageNet-1K [Github]

Soft Labels from EfficientNet-L2-ns-475 pre-trained model:

More soft labels from stronger networks will be available soon.

⇒ Marginal Smoothing with Top-5 (200-crop) [9.5G]

⇒ Marginal Smoothing with Top-5 (500-crop) [23.6G] (Recommended)
[fp16 version on Google Drive (13G)]
[fp16 version on OneDrive (13G)]

⇒ Marginal Smoothing with Top-10 (500-crop) [40.0G]

⇒ Hardening (500-crop) [6.5G]

⇒ Smoothing (500-crop) [9.3G]

Fast MEAL V2 [Github]:

Soft Labels from SENet154 + ResNet152 v1s ensemble (same as MEAL V2 paper):

⇒ Marginal Smoothing with Top-5 (200-crop) [9.8G]

⇒ Marginal Smoothing with Top-10 (200-crop) [17.1G]

⇒ Hardening (200-crop) [2.9G]

⇒ Smoothing (200-crop) [4.0G]

2. Self-Supervised Learning on ImageNet-1K [Github]

Soft Labels from MoCo V2-800ep and SwAV/RN50-w4 pre-trained model:

⇒ Full Label (400-crop) [TBA]

Abstract

Paper

Download (soft labels):

1. Supervised Learning on ImageNet-1K [Github]

Soft Labels from EfficientNet-L2-ns-475 pre-trained model:

More soft labels from stronger networks will be available soon.

⇒ Marginal Smoothing with Top-5 (200-crop) [9.5G]

⇒ Marginal Smoothing with Top-5 (500-crop) [23.6G] (Recommended) [fp16 version on Google Drive (13G)] [fp16 version on OneDrive (13G)]

⇒ Marginal Smoothing with Top-10 (500-crop) [40.0G]

⇒ Hardening (500-crop) [6.5G]

⇒ Smoothing (500-crop) [9.3G]

Fast MEAL V2 [Github]:

Soft Labels from SENet154 + ResNet152 v1s ensemble (same as MEAL V2 paper):

⇒ Marginal Smoothing with Top-5 (200-crop) [9.8G]

⇒ Marginal Smoothing with Top-10 (200-crop) [17.1G]

⇒ Hardening (200-crop) [2.9G]

⇒ Smoothing (200-crop) [4.0G]

2. Self-Supervised Learning on ImageNet-1K [Github]

Soft Labels from MoCo V2-800ep and SwAV/RN50-w4 pre-trained model:

⇒ Full Label (400-crop) [TBA]

⇒ Marginal Smoothing with Top-5 (500-crop) [23.6G] (Recommended)
[fp16 version on Google Drive (13G)]
[fp16 version on OneDrive (13G)]