Named Entity Recognition of TCM Medical Records Based on BiLSTM-CRF
Cited by: 10
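Document type: Journal article
Chinese title: 基于BiLSTM-CRF的中医医案命名实体识别
English title: Named Entity Recognition of TCM Medical Records Based on BiLSTM-CRF
Authors: Yang Yanling (羊艳玲)[1]; Li Yan (李燕)[1]; Zhong Xinyu (钟昕妤)[1]; Xu Lina (徐丽娜)[1]
First author: Yang Yanling (羊艳玲)
Affiliation: [1] School of Information Engineering, Gansu University of Chinese Medicine, Lanzhou, Gansu 730000, China
First affiliation: School of Information Engineering (Educational Technology Center), Gansu University of Chinese Medicine
Year: 2021
Volume: 38
Issue: 11
Pages: 15
Chinese journal name: 中医药信息
English journal name: Information on Traditional Chinese Medicine
Funding: National Administration of Traditional Chinese Medicine project (2305181101): Health information platform for TCM diagnosis and treatment areas in primary medical and health institutions of Gansu Province.
Language: Chinese
Keywords: BiLSTM-CRF model; named entity recognition; TCM medical records; information extraction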
Abstract:
Objective: To propose a named entity recognition (NER) method using a deep learning hybrid model that combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF), addressing the difficulty of word segmentation and the variety and ambiguity of entity types in TCM medical records.
Methods: A BiLSTM-CRF model was built for NER on manually annotated hypertension medical records from renowned veteran TCM physicians. The model uses BiLSTM as the feature extractor and CRF for sequence labeling. After the corpus was annotated, five categories of entities in the medical records were recognized: diseases, symptoms, syndromes, treatment methods, and prescriptions.
Results: A total of 435 collected medical records were sequence-labeled, and entity categories were recognized from the constructed vector representations. After the number of training rounds (epochs) was increased, the overall test results showed a precision of 81.3% and an accuracy of 90.13%. By category, precision was 73.87% for diseases, 75.93% for symptoms, 72.33% for syndromes, 68.13% for treatment methods, and 89.15% for prescriptions, the highest of the five.
Conclusion: The BiLSTM-CRF model can effectively recognize named entity categories in TCM medical records and improves recognition accuracy, providing useful data support for clinical diagnosis.
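The abstract reports both a precision and an accuracy figure, overall and per entity category. The record does not say how either was computed, so the following is only a minimal sketch of one common convention: entity-level precision per category and token-level accuracy, both derived from BIO tag sequences. The BIO scheme, the label names `SYM` and `DIS`, and all function names are assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def extract_entities(tags):
    """Return the set of (start, end, type) spans in a BIO tag sequence.

    Assumes BIO tagging (an assumption; the record does not name the scheme).
    Stray I- tags that do not follow a matching B- tag are ignored.
    """
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        # Close the open span when the current tag does not continue it.
        if start is not None and tag != f"I-{etype}":
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def precision_by_type(gold, pred):
    """Entity-level precision per type: exact-match predicted spans / all predicted spans."""
    correct, total = defaultdict(int), defaultdict(int)
    for gold_tags, pred_tags in zip(gold, pred):
        gold_spans = extract_entities(gold_tags)
        for span in extract_entities(pred_tags):
            total[span[2]] += 1
            correct[span[2]] += span in gold_spans
    return {t: correct[t] / total[t] for t in total}


def token_accuracy(gold, pred):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


# Hypothetical one-sentence example: the predicted disease span is one token short.
gold = [["B-SYM", "I-SYM", "O", "B-DIS", "I-DIS", "I-DIS"]]
pred = [["B-SYM", "I-SYM", "O", "B-DIS", "I-DIS", "O"]]
per_type = precision_by_type(gold, pred)
accuracy = token_accuracy(gold, pred)
```

In this toy case the truncated disease span counts as wrong at the entity level while five of the six tokens are still tagged correctly, which illustrates how token-level accuracy can sit well above entity-level precision, as in the figures reported above.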