Through knowledge distillation, a student model can imitate the output of a teacher model to improve its generalization ability without increasing its computational complexity. However, in existing knowledge distillation research, the efficiency of knowledge transfer is still unsatisfactory, especially when distilling from pre-trained language models (PTMs) such as the Robustly optimized BERT approach (RoBERTa) into a student model with a different architecture. To address this issue, this paper proposes a knowledge-distillation-based prediction framework (RTLSTM) for Chinese emotion classification. In RTLSTM, a new triple loss strategy is proposed for training the student BiLSTM, combining supervised learning, distillation, and word vector losses. This strategy enables the student to learn more fully from the teacher RoBERTa model and to retain 99% of the teacher's language understanding capability. We carried out emotion classification experiments on five Chinese datasets to compare RTLSTM with baseline models. The experimental results show that RTLSTM outperforms the baseline models in the RNN group in prediction performance with a similar number of parameters. Moreover, RTLSTM matches the prediction performance of the PTM-group baselines while using 92% fewer parameters and 83% less prediction time.
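To illustrate how such a triple loss might be assembled, the following is a minimal PyTorch sketch. The function name `triple_loss`, the loss weights `alpha` and `beta`, the temperature, and the use of KL divergence and MSE for the distillation and word vector terms are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def triple_loss(student_logits, teacher_logits, labels,
                student_word_vecs, teacher_word_vecs,
                temperature=2.0, alpha=0.5, beta=0.1):
    """Hypothetical sketch of a triple loss combining supervised,
    distillation, and word-vector terms. Weights, temperature, and the
    specific distance measures are assumptions for illustration."""
    # Supervised loss: cross-entropy against the ground-truth emotion labels.
    ce = F.cross_entropy(student_logits, labels)

    # Distillation loss: KL divergence between the softened teacher and
    # student output distributions (a common KD formulation, assumed here).
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Word vector loss: MSE between the student's token representations and
    # the teacher's word vectors, assumed projected to the same dimension.
    wv = F.mse_loss(student_word_vecs, teacher_word_vecs)

    # Weighted combination of the three terms.
    return (1 - alpha) * ce + alpha * kd + beta * wv
```

In this sketch, the supervised term anchors the student to the ground-truth labels, the distillation term transfers the teacher's soft predictions, and the word vector term encourages the student's representations to track the teacher's; the relative weights would be tuned per dataset.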