Journal of Textile Research, 2018, Vol. 39, Issue 10: 156-161. doi: 10.13475/j.fzxb.20171010106
Abstract:
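Current approaches to collecting online home textile resources are inefficient when processing massive volumes of web resources, especially deep web resources. To address this problem, an automated method for extracting online home textile resources is proposed. The method first exploits the finiteness and convergence of query interface attributes to build a domain model that identifies deep web query interfaces. It then fills in these query interfaces automatically with home textile domain keywords to extract deep web home textile resources. To filter out noise unrelated to the topic, each returned result page is segmented into visual blocks, a block importance model is trained on labeled block samples, and the trained model is used to discard the off-topic blocks. Experimental results show that, compared with a rule-based method, the domain model improves the positive predictive value and the accuracy of deep web query interface identification by 8% and 6%, respectively. For noise filtering, the F-measure (the harmonic mean of precision and recall) of the block importance model is on average 12.90% higher than that of the rule-based method across 3 levels.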
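The abstract does not specify how the domain model or the form filling is implemented. The sketch below illustrates one plausible reading under stated assumptions: because the attribute labels of a domain's query interfaces are finite and converge, a vocabulary of frequently seen labels can approximate the domain model, and a candidate form is accepted when enough of its labels fall in that vocabulary. The helper names (`build_domain_vocabulary`, `is_domain_query_interface`, `build_queries`), the `min_support` and `threshold` values, and the assumption that interfaces are simple GET forms are all hypothetical, not the authors' method. The `f_measure` helper shows the harmonic-mean metric used in the evaluation.

```python
from collections import Counter
from urllib.parse import urlencode


def build_domain_vocabulary(sample_interfaces, min_support=2):
    """Keep attribute labels that appear in at least `min_support` sample forms.

    Assumption (hypothetical): since a domain's query interface attributes are
    finite and converge, frequent labels approximate the domain model.
    """
    counts = Counter()
    for labels in sample_interfaces:
        # Count each normalized label once per interface.
        counts.update({label.strip().lower() for label in labels})
    return {label for label, n in counts.items() if n >= min_support}


def is_domain_query_interface(labels, vocabulary, threshold=0.5):
    """Hypothetical rule: accept a form if enough of its labels are in the vocabulary."""
    normalized = {label.strip().lower() for label in labels}
    if not normalized:
        return False
    hits = len(normalized & vocabulary)
    return hits / len(normalized) >= threshold


def build_queries(action_url, field_name, keywords):
    """Fill a (hypothetical) GET search form with domain keywords, one URL per keyword."""
    return [f"{action_url}?{urlencode({field_name: kw})}" for kw in keywords]


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    samples = [
        ["Keyword", "Category", "Material"],
        ["keyword", "material", "size"],
        ["Category", "Size", "Price"],
    ]
    vocab = build_domain_vocabulary(samples)
    print(sorted(vocab))
    print(is_domain_query_interface(["Keyword", "Material", "Color"], vocab))
    print(is_domain_query_interface(["Username", "Password"], vocab))
    print(build_queries("http://example.com/search", "q", ["bed sheet"]))
    print(round(f_measure(0.8, 0.6), 4))
```

In this toy run, "price" appears in only one sample and is dropped from the vocabulary, so a login form with no shared labels is rejected while a form sharing two of its three labels is accepted. A learned classifier, as the abstract's trained block importance model suggests for the later stage, could replace the fixed threshold.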