National University of Defense Technology
College of Computer Science, National University of Defense Technology
1007-130X
43-1258/TP
1973
Computer Engineering & Science (计算机工程与科学)
Wang Zhiying (王志英)
Monthly
1-3 months
19216
42-153
¥796.00
0.9643
410073
Document-level neural machine translation has developed over many years and made considerable progress, but most work approaches the problem from the modeling side, using contextual word information to build effective network structures while neglecting cross-sentence discourse structure and rhetorical information as guidance for the model. To address this problem, and guided by Rhetorical Structure Theory, a method is proposed that encodes discourse units and rhetorical structure tree features separately. Experimental results show that the proposed method strengthens the encoder's ability to represent discourse structure and rhetorical information. With this change, the model's translation results improve markedly on multiple datasets and surpass several strong baseline models, and translation quality also improves clearly under the proposed quantitative evaluation method and in human analysis.