Knowledge Distillation for Machine Translation
			
				DOI: 10.23977/csic.2018.0933			
			
				Author(s)
				Zhen Li, Dan Qu, Chaojie Xie, Xuejuan Wei
			 
			
				
Corresponding Author
				Zhen Li			
			
				
ABSTRACT
Encoder-Decoder is a recently proposed architecture for Neural Machine Translation (NMT). Convolutional Neural Network (CNN) models based on this framework have achieved significant success in NMT tasks. Challenges remain in their practical use: a CNN model needs large amounts of bilingual sentence pairs for training, and each new bilingual dataset requires retraining the translation model. Although some successful results have been reported, avoiding model overfitting caused by the scarcity of parallel corpora remains an important research direction. This paper introduces a simple and efficient knowledge distillation method for regularization that mitigates overfitting in CNN training by transferring the knowledge of a source model to an adapted model for low-resource languages in the NMT task. Experimental results on an English-Czech dataset show that our model alleviates the overfitting problem, achieves better generalization, and improves performance on a low-resource language translation task.
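
The page does not include an implementation, but the regularization idea described in the abstract, training a student against both the gold references and a teacher's softened output distribution, can be sketched as follows. This is a minimal illustration in PyTorch: the function name distillation_loss, the temperature T, the mixing weight alpha, and the padding index are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      T=2.0, alpha=0.5, pad_id=0):
    """Interpolate a teacher-student KL term with cross-entropy on gold tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) raw scores
    gold_ids: (batch, seq_len) reference target-token indices
    """
    # Soft targets: the teacher's distribution, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between student and teacher; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)

    # Standard cross-entropy against the gold translation, ignoring padding.
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         gold_ids.view(-1), ignore_index=pad_id)

    return alpha * kd + (1 - alpha) * ce
```

In the transfer setting the abstract describes, the teacher would be the model trained on the high-resource source language pair and the student the model being adapted to the low-resource pair, so the teacher's distribution acts as a regularizer against overfitting the small parallel corpus.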
			
				
KEYWORDS
Neural Machine Translation (NMT), Convolutional Neural Network (CNN), Knowledge Distillation, Encoder-Decoder, Low-Resource