Knowledge Distillation for Machine Translation
			
				DOI: 10.23977/csic.2018.0933			
			
				Author(s)
				Zhen Li, Dan Qu, Chaojie Xie, Xuejuan Wei
			 
			
				
Corresponding Author
				Zhen Li			
			
				
ABSTRACT
Encoder-Decoder is a recently proposed architecture for Neural Machine Translation (NMT). Convolutional Neural Network (CNN) models based on this framework have achieved significant success in NMT tasks. Challenges remain in their practical use: a CNN model needs large amounts of bilingual sentence pairs for training, and each new bilingual dataset requires retraining the translation model. Although some successful results have been reported, avoiding model overfitting caused by the scarcity of parallel corpora remains an important research direction. This paper introduces a simple and efficient knowledge distillation method for regularization that mitigates overfitting in CNN training by transferring the knowledge of a source model to an adapted model for low-resource languages in the NMT task. Experimental results on an English-Czech dataset show that our model alleviates the overfitting problem, achieves better generalization, and improves performance on a low-resource language translation task.
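
The page does not include an implementation, but the regularization idea described in the abstract, training a student against both the gold references and a teacher's softened output distribution, can be sketched as follows. This is a minimal illustration in PyTorch: the function name distillation_loss, the temperature T, the mixing weight alpha, and the padding index are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_ids,
                      T=2.0, alpha=0.5, pad_id=0):
    """Interpolate a teacher-student KL term with cross-entropy on gold tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) raw scores
    gold_ids: (batch, seq_len) reference target-token indices
    """
    # Soft targets: the teacher's distribution, softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)

    # KL divergence between student and teacher; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures (Hinton et al., 2015).
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * (T * T)

    # Standard cross-entropy against the gold translation, ignoring padding.
    vocab = student_logits.size(-1)
    ce = F.cross_entropy(student_logits.view(-1, vocab),
                         gold_ids.view(-1), ignore_index=pad_id)

    return alpha * kd + (1 - alpha) * ce
```

In the transfer setting the abstract describes, the teacher would be the model trained on the high-resource source language pair and the student the model being adapted to the low-resource pair, so the teacher's distribution acts as a regularizer against overfitting the small parallel corpus.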
			
				
KEYWORDS
Neural Machine Translation (NMT), Convolutional Neural Network (CNN), Knowledge Distillation, Encoder-Decoder, Low-Resource