Details
Document type: Journal article
Chinese title: 中文分词技术研究进展综述
英文题名:A Summary of the Research Progress of Chinese Word Segmentation Technology
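Authors: Zhong Xinyu (钟昕妤)[1]; Li Yan (李燕)[1]
First author: Zhong Xinyu (钟昕妤)
Affiliation: [1] School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou, Gansu 730101
First affiliation: School of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine
Year: 2023
Volume: 22
Issue: 2
Pages: 225
Journal (Chinese title): 软件导刊
Journal (English title): Software Guide
Funding: Graduate Innovation Fund Project of Gansu University of Chinese Medicine (2022CX137).
Language: Chinese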
Keywords: Chinese word segmentation; deep learning; corpus dependence; multi-domain word segmentation
Abstract: Chinese word segmentation, a fundamental task in enabling machines to process Chinese text, has been a research hotspot in recent years. Its output has a far-reaching effect on downstream processing tasks, which makes it well worth studying. A comprehensive analysis of the word segmentation literature from the past five years shows that future research will be dominated by fusion methods built on neural network models, with the aim of achieving more accurate and efficient segmentation. The development and wider application of segmentation technology also face several bottlenecks that limit its performance. Besides the traditional problems of ambiguity and unknown (out-of-vocabulary) words, segmentation now faces new challenges such as dependence on corpus scale and quality, and multi-domain segmentation. Breakthroughs on these new problems will be one of the main focuses of future research.