Tag-TextRank: A Tag-Based Webpage Keyword Extraction Method*
***, Wang Bin, Shi Zhiwei, Cui Yachao, Li Hengxun
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
E-mail: ******@ict.
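Abstract: Keyword extraction is the process of extracting representative keywords from text, and it has important applications in text processing. This paper attempts to improve the quality of webpage keyword extraction by using social tags, a new information source that has received wide attention in recent years. Based on a statistical analysis of tag data, we propose a framework for keyword extraction with tags and present a concrete implementation, Tag-TextRank. Building on TextRank, the method uses each tag of the target document to bring in related documents. From these documents it estimates the edge weights of the term graph and computes the importance of each term. Finally, the term weights computed under the different tags are fused. Experiments on a public corpus show that Tag-TextRank outperforms the classic keyword extraction method TextRank on every evaluation metric and generalizes well.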
Keywords: social tags; keyword extraction; TextRank; Tag-TextRank
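For reference, the term-importance step can be written with the standard weighted TextRank iteration, here computed separately for each tag. This is a sketch under stated assumptions, not the paper's own formulation. The per-tag edge weights $w^{(t)}_{ji}$ would be estimated from the documents related to tag $t$. $N(V)$ is the set of neighbours of term $V$ in the undirected term graph, and $d$ is the damping factor, commonly set to 0.85. $T$ is the set of tags of the target page. The abstract does not say how the per-tag results are fused, so the weighted sum with fusion weights $\lambda_t$ is a hypothetical choice made for illustration.

```latex
WS^{(t)}(V_i) = (1-d) + d \sum_{V_j \in N(V_i)}
  \frac{w^{(t)}_{ji}}{\sum_{V_k \in N(V_j)} w^{(t)}_{jk}} \, WS^{(t)}(V_j),
\qquad
S(V_i) = \sum_{t \in T} \lambda_t \, WS^{(t)}(V_i),
\quad \sum_{t \in T} \lambda_t = 1
```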
Tag-TextRank: A Webpage Keyword Extraction Method Based on Tags
Li Peng, Wang Bin, Shi Zhiwei, Cui Yachao, Li Hengxun
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
E-mail: ******@ict.
Abstract: Keyword extraction, the task of extracting representative keywords from text, is useful in a wide range of text processing applications. Meanwhile, tags have attracted extensive attention in recent years as a new information resource. This paper applies tag information to the webpage keyword extraction task and proposes a tag-based method called Tag-TextRank. Tag-TextRank introduces relevant documents for each tag of the target webpage. This lets it estimate the edge weights of the page's term graph, and hence the importance of each term, more accurately. After that, the tag-dependent importance scores of each term are combined.
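The Python sketch below illustrates the pipeline the abstract describes. It relies on several assumptions that the abstract does not specify:

- The related documents for each tag are supplied as pre-tokenized lists; the retrieval step is not shown.
- Edge weights are raw co-occurrence counts within a sliding window over the target page plus that tag's related documents.
- Per-tag TextRank scores are fused by a weighted sum with caller-supplied tag weights.

The function names `cooccurrence_weights`, `weighted_textrank`, and `tag_textrank` are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict


def cooccurrence_weights(docs, window=2):
    """Count undirected co-occurrences of distinct terms within `window` tokens."""
    weights = defaultdict(float)
    for tokens in docs:
        for i, a in enumerate(tokens):
            # Pair token i with the next (window - 1) tokens.
            for b in tokens[i + 1 : i + window]:
                if a != b:
                    weights[frozenset((a, b))] += 1.0
    return weights


def weighted_textrank(terms, weights, d=0.85, iters=50):
    """Weighted TextRank on an undirected term graph restricted to `terms`."""
    terms = set(terms)
    nbrs = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        if a in terms and b in terms:  # keep only edges between candidate terms
            nbrs[a][b] = w
            nbrs[b][a] = w
    out_sum = {v: sum(nbrs[v].values()) for v in terms}
    score = {v: 1.0 for v in terms}
    for _ in range(iters):
        # Each term receives score from its neighbours in proportion to edge weight.
        score = {
            v: (1 - d) + d * sum(w / out_sum[u] * score[u] for u, w in nbrs[v].items())
            for v in terms
        }
    return score


def tag_textrank(doc_tokens, tag_docs, tag_weights, d=0.85, window=2, top_k=5):
    """Hypothetical Tag-TextRank sketch.

    doc_tokens:  tokenized target page; its distinct tokens are the candidates
    tag_docs:    {tag: [tokenized related documents]} (assumed given)
    tag_weights: {tag: fusion weight}, assumed to sum to 1
    """
    candidates = set(doc_tokens)
    fused = defaultdict(float)
    for tag, docs in tag_docs.items():
        # Assumption: estimate edge weights from the page plus this tag's documents.
        weights = cooccurrence_weights([doc_tokens] + docs, window)
        for term, s in weighted_textrank(candidates, weights, d).items():
            fused[term] += tag_weights[tag] * s
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(fused, key=lambda t: (-fused[t], t))[:top_k]


page = ["tag", "data", "improve", "keyword", "extraction", "keyword", "graph"]
related = {
    "nlp": [["keyword", "extraction", "graph"], ["keyword", "graph", "ranking"]],
    "web": [["tag", "data", "keyword"]],
}
print(tag_textrank(page, related, {"nlp": 0.6, "web": 0.4}, top_k=3))
```

Running one TextRank per tag and fusing afterwards keeps each tag's evidence separate until the last step. A single graph built from all related documents at once would be a simpler alternative, but it would lose that separation.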