Open Access

Knowledge Distillation: A Free-teacher Framework Driven by Word-Vector


DOI: 10.23977/CNCI2020023

Author(s)

Chuyi Zou

Corresponding Author

Chuyi Zou

ABSTRACT

Knowledge distillation (KD) is an effective method for transferring knowledge from a large teacher network to a small student network in order to improve the generalization ability of the student, which satisfies the low-memory and fast-inference requirements of practical deployment. Existing KD methods typically require a pre-trained teacher as a first step to discover useful knowledge, and subsequently transfer that knowledge to the student network. However, this procedure involves two complex training stages and incurs the expensive computational cost of pre-training a teacher. In this paper, we propose a free-teacher framework driven by word vectors to address this limitation. By utilizing existing word-vector packages (such as 'GoogleNews-vectors-negative300'), we construct a semantic similarity matrix. This matrix provides additional soft labels similar to a conventional teacher model's outputs, while requiring no extra training cost. Extensive evaluations show that our approach improves the generalization performance of a variety of deep neural networks, competitive with alternative methods, on two image classification datasets, CIFAR10 and CIFAR100, without incurring extra expensive training cost.
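To illustrate the idea described in the abstract, the minimal sketch below shows one way a semantic similarity matrix over class names could be built from pre-trained word vectors and softened into teacher-like soft labels. It is an assumption-laden illustration, not the paper's implementation: the function names, the temperature value, and the use of random placeholder vectors (standing in for embeddings that would normally be loaded from a package such as 'GoogleNews-vectors-negative300', e.g. via gensim) are all hypothetical.

```python
import numpy as np

# CIFAR10 class names; in the paper's setting each class name would be looked up
# in a pre-trained word-vector package. Random 300-d vectors are placeholders here.
class_names = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]
rng = np.random.default_rng(0)
word_vectors = {name: rng.standard_normal(300) for name in class_names}

def semantic_similarity_matrix(names, vectors):
    """Cosine similarity between every pair of class-name embeddings (C x C)."""
    V = np.stack([vectors[n] / np.linalg.norm(vectors[n]) for n in names])
    return V @ V.T

def soft_labels(sim_matrix, temperature=4.0):
    """Soften each row with a temperature, mimicking a teacher's soft outputs."""
    logits = sim_matrix / temperature
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

S = semantic_similarity_matrix(class_names, word_vectors)
P = soft_labels(S)  # row c is the soft target for ground-truth class c
print(P[class_names.index("cat")].round(3))
```

Under this reading, row c of the softened matrix would stand in for the teacher's output distribution in a standard KD objective (e.g. a KL-divergence term combined with cross-entropy on the hard labels), so no teacher network ever needs to be trained.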

KEYWORDS

Knowledge Distillation; Classification; Deep Learning

All published work is licensed under a Creative Commons Attribution 4.0 International License.
