Journal of Textile Research ›› 2024, Vol. 45 ›› Issue (05): 155-164. doi: 10.13475/j.fzxb.20230502601

• Apparel Engineering •

Occlusive clothing image segmentation based on context extraction and attention fusion

GU Meihua, HUA Wei, DONG Xiaoxiao, ZHANG Xiaodan

  1. School of Electronics and Information, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China
  • Received: 2023-05-11  Revised: 2024-01-08  Published: 2024-05-15  Online: 2024-05-31
  • Author biography: GU Meihua (1980—), female, associate professor, Ph.D. Her main research interests are image processing and analysis. E-mail: gumh2001@163.com
  • Supported by: National Natural Science Foundation of China (61901347); General Projects of the Science and Technology Department of Shaanxi Province (2022JM-146, 2024JC-YBMS-491)

Abstract:

To address the low accuracy of occluded clothing image segmentation, an instance segmentation method for occluded clothing images that combines context extraction with an attention mechanism is proposed. Taking Mask R-CNN as the base network, a context extraction module is first used to optimize the output features of ResNet, fusing multi-path features at different rates to capture contextual information from multiple receptive fields and strengthening the ability to recognize and extract feature representations of occluded clothing. A residual connection of channel attention and spatial attention is then introduced to adaptively capture the semantic inter-dependencies of occluded clothing images in the spatial and channel dimensions, reducing the probability of mis-localization and mis-identification caused by the redundant contextual relationships amplified when the context extraction module processes the feature maps. Finally, the computation principle of the object detection loss CIoU is adopted as the criterion for non-maximum suppression; by attending to both the overlapping and non-overlapping regions of the predicted and ground-truth boxes, the optimal target box of the occluded clothing is selected to the greatest extent and the predicted box fits the ground-truth box more closely. The results show that, compared with other methods, the improved method markedly alleviates mis-segmentation of clothing images with different degrees of occlusion and extracts more accurate clothing instances, and its average segmentation accuracy on occluded clothing images is 4.4% higher than that of the original model.

Objective Visual analysis of clothing has attracted increasing attention, but conventional clothing parsing methods fail to capture rich information about clothing details because of factors such as complex backgrounds and mutual occlusion between garments. Therefore, a novel clothing image instance segmentation method is proposed to effectively extract and segment multi-pose and mutually occluded target clothing in complex scenes, supporting subsequent clothing analysis, retrieval, and related tasks and better meeting the needs of personalized clothing design, retrieval, and matching.

Method The output features of ResNet were optimized by a context extraction module to enhance the recognition and extraction of feature representations of occlusive clothing. An attention mechanism with a residual connection was then introduced to adaptively capture the semantic inter-dependencies of occlusive clothing images in the spatial and channel dimensions. As the last step, the CIoU computation principle was adopted as the criterion for non-maximum suppression, taking into account both the overlapping and non-overlapping regions of the predicted box and the ground-truth box, so as to select the optimal target box covering the occlusive clothing to the fullest extent.
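To make the CIoU criterion concrete, the following is a minimal NumPy sketch of greedy non-maximum suppression scored by CIoU instead of plain IoU; the (x1, y1, x2, y2) box format, the 0.5 threshold, and the function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def ciou(box_a, box_b):
    """CIoU of two boxes in (x1, y1, x2, y2) format:
    CIoU = IoU - rho^2 / c^2 - alpha * v, where rho is the centre distance,
    c the diagonal of the smallest enclosing box, and v an aspect-ratio term."""
    # Plain IoU
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)

    # Normalised centre distance rho^2 / c^2
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    # Aspect-ratio consistency term
    wa, ha = box_a[2] - box_a[0], box_a[3] - box_a[1]
    wb, hb = box_b[2] - box_b[0], box_b[3] - box_b[1]
    v = (4 / np.pi ** 2) * (np.arctan(wb / (hb + 1e-9)) - np.arctan(wa / (ha + 1e-9))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou - rho2 / c2 - alpha * v

def ciou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS that suppresses a box when its CIoU with a kept box exceeds threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        order = np.array([j for j in order[1:] if ciou(boxes[i], boxes[j]) <= threshold], dtype=int)
    return keep

boxes = np.array([[10, 10, 60, 60], [12, 12, 58, 64], [100, 100, 150, 150]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(ciou_nms(boxes, scores))  # [0, 2]: the second box is suppressed by the first
```

Because CIoU subtracts centre-distance and aspect-ratio penalties from IoU, overlapping boxes with clearly different centres or shapes score lower and are less likely to suppress each other, which is the behaviour the method relies on when neighbouring garments overlap.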

Results In qualitative comparison with Mask R-CNN, Mask Scoring R-CNN, and YOLACT, the proposed method showed stronger mask perception and inference ability, effectively decoupling the overlapping relationships between occluded clothing instances and producing visually more accurate segmentation. For quantitative analysis, average precision (AP) was used as the evaluation metric: the segmentation accuracy APm averaged over different IoU thresholds reached 49.3%, 3.6% higher than the original model. Comparing the segmentation accuracy of each improved model under different occlusion degrees showed that the original Mask R-CNN had the lowest accuracy at every occlusion level, whereas with the CEM, AM, and CIoU optimizations the accuracy of the improved model under minor occlusion (APL1), moderate occlusion (APL2), and severe occlusion (APL3) increased by 4.3%, 4.2%, and 4.8%, respectively, the largest gain being obtained for severely occluded clothing. Finally, the accuracy of the proposed method was compared with that of Mask R-CNN, Mask Scoring R-CNN, SOLOv1, and YOLACT. The overall accuracy of YOLACT for clothing with different degrees of occlusion was slightly lower, Mask Scoring R-CNN was slightly more accurate than Mask R-CNN, and SOLOv1 achieved accuracy similar to Mask R-CNN. The proposed method was significantly more accurate than the other methods at all occlusion degrees; APL3 for severely occlusive clothing improved the most, being 4.8% higher than Mask R-CNN and 4.2%-11.1% higher than the other models.
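The AP figures above follow the COCO-style convention of averaging precision over IoU thresholds from 0.50 to 0.95, plus the single-threshold AP50 and AP75 values. A minimal sketch of that evaluation with pycocotools is shown below, assuming COCO-format annotation and result files; the file names are placeholders, and this is not necessarily the exact evaluation script used in the paper.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground-truth annotations and detection results
coco_gt = COCO("annotations/instances_val.json")
coco_dt = coco_gt.loadRes("segm_results.json")

# iouType="segm" evaluates mask AP (averaged over IoU 0.50:0.95, plus AP50/AP75)
evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75, and size-stratified metrics
```

Occlusion-level metrics such as APL1-APL3 would then be obtained by running the same evaluation separately on ground-truth subsets grouped by occlusion degree.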

Conclusion By embedding the context extraction module, the attention mechanism module, and the CIoU computation strategy into the Mask R-CNN network, a novel clothing instance segmentation model is constructed with enhanced ability to recognize and extract clothing features. The semantic inter-dependencies between occluded clothing feature maps in the spatial and channel dimensions are captured, the segmentation accuracy for each garment is improved, and the optimal target box is predicted for each clothing instance, which improves the accuracy of the model in segmenting occlusive clothing instances. A series of comprehensive experiments demonstrates the feasibility and effectiveness of the proposed method, providing a new idea for research on clothing image instance segmentation.

Key words: image segmentation, occlusive clothing, context extraction, attention mechanism, CIoU computational principle

CLC number: TS941.2

Fig. 1  Segmentation results of occluded clothing images

Fig. 2  Structure of the improved Mask R-CNN clothing image instance segmentation model
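For orientation, the base of the architecture in Fig. 2 is a standard Mask R-CNN. The snippet below loads the stock torchvision implementation purely as a reference point; the paper's context extraction module, attention embedding, and CIoU-based NMS are additions to this kind of network and are not part of the off-the-shelf model.

```python
import torch
import torchvision

# Stock Mask R-CNN (ResNet-50 + FPN) from torchvision, used only as the
# unmodified baseline; weights=None avoids downloading pretrained parameters.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None)
model.eval()

# In eval mode the model takes a list of CHW image tensors and returns, per
# image, a dict with bounding boxes, labels, scores, and instance masks.
with torch.no_grad():
    prediction = model([torch.rand(3, 512, 512)])[0]
print(sorted(prediction.keys()))  # ['boxes', 'labels', 'masks', 'scores']
```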

Fig. 3  Network structure of the context extraction module
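As a rough PyTorch-style illustration of the multi-receptive-field idea behind the context extraction module in Fig. 3, the sketch below fuses parallel 3×3 convolution branches with different dilation rates back into the input feature map; the number of branches, the rates, the channel width, and the residual fusion are assumptions for illustration, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class ContextExtractionModule(nn.Module):
    """Illustrative multi-rate context block: parallel dilated 3x3 convolutions
    capture different receptive fields and are fused back to the input width."""

    def __init__(self, channels=256, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for r in rates
        )
        # 1x1 convolution fuses the concatenated branches back to `channels`
        self.fuse = nn.Conv2d(channels * len(rates), channels, 1)

    def forward(self, x):
        context = torch.cat([branch(x) for branch in self.branches], dim=1)
        # Residual addition keeps the original backbone feature accessible
        return x + self.fuse(context)

# Example: refine one backbone/FPN-level feature map
feat = torch.randn(1, 256, 64, 64)
print(ContextExtractionModule()(feat).shape)  # torch.Size([1, 256, 64, 64])
```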

Fig. 4  Embedding of the attention mechanism
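Fig. 4 concerns how channel and spatial attention are embedded with a residual connection. The sketch below gives one plausible CBAM-style arrangement in the spirit of reference [27], with channel attention followed by spatial attention and the result added back to the input; the serial order, the pooling choices, and the reduction ratio are assumptions rather than the paper's exact wiring.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled))

class ResidualAttention(nn.Module):
    """Channel attention followed by spatial attention, added back to the input."""
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        refined = x * self.ca(x)
        refined = refined * self.sa(refined)
        return x + refined  # residual connection around the attention block

feat = torch.randn(1, 256, 64, 64)
print(ResidualAttention(256)(feat).shape)  # torch.Size([1, 256, 64, 64])
```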

Fig. 5  Schematic diagram of CIoU prediction box optimization

Tab. 1  Comparison of segmentation accuracy of each improved model under different IoU thresholds

CEM  AM  CIoU  APm/%  AP50m/%  AP75m/%
               45.7   75.9     51.8
               46.2   77.3     53.2
               46.0   76.9     53.4
               46.4   77.3     54.2
               47.8   79.8     56.7
               49.3   80.4     57.0

Tab. 2  Comparison of segmentation accuracy of each improved model under different occlusion degrees

CEM  AM  CIoU  APL1/%  APL2/%  APL3/%
               61.2    57.3    47.5
               63.3    59.2    49.1
               62.4    58.5    48.5
               62.8    58.5    48.4
               63.9    59.9    49.8
               65.5    61.5    52.3

Fig. 6  Loss curves of the improved models

Fig. 7  Comparison of clothing instance segmentation results of different methods

Tab. 3  Comparison of clothing instance segmentation accuracy of different models (unit: %)

Method                    AP50   AP75   APL1   APL2   APL3
YOLACT[9]                 72.9   43.1   53.8   50.1   41.2
Mask Scoring R-CNN[29]    79.3   57.2   63.7   59.1   48.1
SOLOv1[30]                79.1   54.1   61.4   56.5   46.7
Mask R-CNN[7]             78.7   53.8   61.2   57.3   47.5
Proposed method           87.8   63.1   65.5   61.5   52.3
[1] TOU Wu, WANG Xiaoyu, GAO Yakun, et al. Apparel style recognition based on improved edge detection algorithm[J]. Journal of Textile Research, 2021, 42(10): 157-162. doi: 10.13475/j.fzxb.20201205006
[2] WU Chuanbin, LIU Li, FU Xiaodong, et al. Combining significant region detection and hand sketching for garment image retrieval[J]. Journal of Textile Research, 2019, 40(7): 174-181.
[3] KURNIA R, HERYANI W. Fashion harmony of blouse color and pants/skirts for women clothing using fuzzy logic[C]// International Conference on Multimedia and Image Processing. Brunei: IEEE, 2016: 57-60.
[4] ZHAO B, WU X, PENG Q, et al. Clothing cosegmentation for shopping images with cluttered background[J]. IEEE Transactions on Multimedia, 2016, 18(6): 1111-1123.
[5] LIU F, DAI S. Normal distribution sampling convolutional neural network for fine-grained image classification[C]// Proceedings of 2019 Chinese Intelligent Systems Conference. Singapore: Springer, 2020: 645-652.
[6] HUANG Tao, LI Hua, ZHOU Gui, et al. A review of instance segmentation methods[J]. Journal of Frontiers of Computer Science and Technology, 2023, 17(4): 810-825.
[7] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2980-2988.
[8] ZHANG Xuyi, CAO Jiale. A single-stage instance segmentation network based on contour point mask refinement[J]. Acta Optica Sinica, 2020, 40(21): 113-121.
[9] BOLYA D, ZHOU C, XIAO F, et al. YOLACT: real-time instance segmentation[C]// IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 9157-9166.
[10] BOLYA D, ZHOU C, XIAO F, et al. YOLACT++: better real-time instance segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(2): 1108-1121.
[11] LI K, MALIK J. Amodal instance segmentation[C]// Computer Vision-ECCV 2016. Amsterdam: Springer International Publishing, 2016: 677-693.
[12] SALEH K, SZÉNÁSI S, VÁMOSSY Z. Occlusion handling in generic object detection: a review[C]// 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics. Slovakia: IEEE, 2021: 477-484.
[13] KE L, TAI Y. Deep occlusion-aware instance segmentation with overlapping bilayers[C]// IEEE Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 4019-4028.
[14] WANG P, YUILLE A. DOC: deep occlusion estimation from a single image[C]// Computer Vision-ECCV 2016. Amsterdam: Springer International Publishing, 2016: 545-561.
[15] ZHOU Y Z, ZHU Y, YE Q X, et al. Weakly supervised instance segmentation using class peak response[C]// IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3791-3800.
[16] LI K, HARIHARAN B, MALIK J. Iterative instance segmentation[C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 3659-3667.
[17] LIU Z W, LUO P, QIU S. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]// IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1305-1311.
[18] SRIRAM G, GANESH Babu T R, PRAVEENA R, et al. Classification of leukemia and leukemoid using VGG-16 convolutional neural network architecture[J]. Molecular & Cellular Biomechanics, 2022, 19(1): 29-40.
[19] GE Y Y, ZHANG R M, WANG X G, et al. DeepFashion2: a versatile benchmark for detection, pose estimation, segmentation and re-identification of clothing images[C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5337-5345.
[20] JIA M L, SHI M Y, MIKHAIL S, et al. Fashionpedia: ontology, segmentation, and an attribute localization dataset[C]// Computer Vision-ECCV 2020. [s.l.]: Springer International Publishing, 2020: 316-332.
[21] GU Meihua, LIU Jie, LI Liyao, et al. Combining feature learning and attention mechanism for garment image segmentation[J]. Journal of Textile Research, 2022, 43(11): 163-171. doi: 10.13475/j.fzxb.20210901109
[22] HUA Wei, GU Meihua, LI Liyao, et al. Improved SOLOv2 algorithm for garment image segmentation[J]. Basic Sciences Journal of Textile Universities, 2021, 34(4): 74-81.
[23] WANG X L, ZHANG R F, KONG T, et al. SOLOv2: dynamic and fast instance segmentation[C]// Advances in Neural Information Processing Systems. Vancouver: NIPS Foundation, 2020: 17721-17732.
[24] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
doi: 10.1109/TPAMI.2016.2577031 pmid: 27295650
[25] YUAN Mingyang, SONG Yalin, ZHANG Chao, et al. Underwater object detection based on GA-RetinaNet[J]. Computer Systems & Applications, 2023, 32(6): 80-90.
[26] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[27] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the 15th European Conference on Computer Vision. Munich: Springer International Publishing, 2018: 3-19.
[28] ZHENG Z H, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression[C]// AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 12993-13000.
[29] HUANG Z J, HUANG L C, GONG Y C, et al. Mask scoring R-CNN[C]// IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 6409-6418.
[30] WANG X L, KONG T, SHEN C H, et al. SOLO: Segmenting objects by locations[C]// Computer Vision-ECCV 2020. Glasgow: Springer International Publishing, 2020: 649-665.