Journal of Textile Research ›› 2024, Vol. 45 ›› Issue (09): 164-174. DOI: 10.13475/j.fzxb.20230904501

• Apparel Engineering •

Lightweight parser-free virtual try-on based on mixed knowledge distillation and feature enhancement techniques

HOU Jue1,2, DING Huan1, YANG Yang1,2, LU Yinwen1, YU Lingjie3, LIU Zheng2,4

1. School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    2. Key Laboratory of Silk Culture Inheritance and Digital Technology of Product Design, Ministry of Culture and Tourism, Hangzhou, Zhejiang 310018, China
    3. School of Textile Science and Engineering, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China
    4. International Institute of Fashion Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  • Received: 2023-09-18  Revised: 2024-03-23  Online: 2024-09-15  Published: 2024-09-15
  • Contact: LIU Zheng, E-mail: koala@zstu.edu.cn

Abstract:

Objective To address the problems of inaccurate clothing deformation, texture distortion, and high computational cost in image-based virtual try-on systems, this paper proposes a lightweight parser-free virtual try-on method based on mixed knowledge distillation and feature enhancement techniques.

Method Firstly, an improved appearance flow estimation method was proposed that integrates global features and calibrates the flow computed at different scales, thereby enhancing the accuracy of appearance flow estimation. Secondly, a lightweight try-on network based on depthwise separable convolution was constructed, with knowledge distillation used to decouple the virtual try-on process from image segmentation (parsing) results. Finally, a garment texture complexity (GTC) index based on the pixel-wise average gradient was proposed to quantitatively analyze the texture complexity of clothing; based on this index, the VITON dataset was divided into a simple texture set, a moderately complex texture set, and a highly complex texture set. A minimal sketch of such a gradient-based index is given below.
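The abstract only states that GTC is derived from the pixel-wise average gradient of the garment image; the following is a minimal illustrative sketch of such an index, not the paper's exact formulation. The function name `garment_texture_complexity`, the Sobel-based gradient, and the optional masking step are assumptions introduced here.

```python
from typing import Optional

import cv2
import numpy as np


def garment_texture_complexity(garment_bgr: np.ndarray,
                               mask: Optional[np.ndarray] = None) -> float:
    """Illustrative GTC-style index: mean per-pixel gradient magnitude.

    garment_bgr: H x W x 3 uint8 garment image.
    mask: optional H x W binary mask of the garment region; if given,
          the average is taken over garment pixels only.
    """
    gray = cv2.cvtColor(garment_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Horizontal and vertical gradients (Sobel is an assumption; the paper
    # only says the index is based on the pixel-wise average gradient).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    if mask is not None:
        grad_mag = grad_mag[mask > 0]
    return float(grad_mag.mean())


# Garment images could then be binned into simple / moderately complex /
# highly complex texture sets by thresholding this score; the thresholds
# used to split VITON are dataset-specific and not given in the abstract.
```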

Results The proposed model was verified and analyzed on the VITON dataset. Compared with the SOTA (state-of-the-art) model, the number of parameters and the computational complexity (FLOPs) were reduced by 70.12% and 42.38%, respectively, yielding a faster, lighter model that better meets the deployment requirements of the mobile Internet. Moreover, the experimental results showed that the scores of the proposed model on the image quality metrics (FID, LPIPS, PSNR, KID) improved by 5.06%, 28.57%, 3.71%, and 33.33%, respectively, over the SOTA model. In the garment-complexity segmented analysis, the KID and LPIPS scores of the proposed model improved over the SOTA model by 48.08%, 30.45%, 1.03%, 35.54%, 30.41%, and 12.94%, respectively, showing that the proposed method is superior to the other methods in restoring and preserving original clothing details when warping clothing images with complex textures.
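These percentages are consistent with the tabulated results in Tab. 1 and Tab. 2, taking Style-VTON as the SOTA reference (an inference from the reported numbers, not an explicit statement in this abstract):

(88.41 - 26.41)/88.41 ≈ 70.1%   (parameters, Tab. 2)
(542.9 - 312.9)/542.9 ≈ 42.4%   (FLOPs, Tab. 2)
(8.89 - 8.44)/8.89 ≈ 5.06%      (FID, Tab. 1)
(0.07 - 0.05)/0.07 ≈ 28.57%     (LPIPS, Tab. 1)
(27.69 - 26.70)/26.70 ≈ 3.71%   (PSNR, Tab. 1)
(0.12 - 0.08)/0.12 ≈ 33.33%     (KID, Tab. 1)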

Conclusion A lightweight parser-free virtual try-on method based on mixed knowledge distillation and feature enhancement techniques is proposed. It uses an efficient appearance flow estimation method to reduce registration errors, loss of complex textures, and distortion during clothing warping. In addition, mixed knowledge distillation combined with depthwise separable convolution is shown to effectively reduce the size and computational complexity of the final model and to speed up inference. Finally, a quantitative index for characterizing the complexity of clothing texture is proposed, and the VITON test set is partitioned accordingly. Experiments on the VITON test set show that the evaluation results of the proposed method surpass those of the best-performing existing virtual try-on methods, and that the proposed method also handles garments with complex patterns better than other methods. The ablation experiments further confirm that the proposed components bring a clear improvement to the final virtual try-on results.
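The lightweight student network relies on depthwise separable convolution to cut parameters and FLOPs. Below is a minimal PyTorch sketch of such a block; the layer ordering, normalization, and activation choices are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv, replacing a standard 3x3 conv with
    far fewer parameters and FLOPs."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.pointwise(self.depthwise(x))))


# Rough parameter count for a 256 -> 256 channel layer:
#   standard 3x3 conv:   3*3*256*256 ≈ 590k weights
#   depthwise separable: 3*3*256 + 256*256 ≈ 68k weights
```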

Key words: virtual try-on, appearance flow, knowledge distillation, feature enhancement technique, garment texture complexity, apparel e-commerce

CLC Number: 

  • TS942.8

Fig. 1 Network backbone

Fig. 2 Backbone of coarse flow estimator

Fig. 3 Basic ResNet structure of each model: (a) GN-T; (b) GN-S

Tab. 1 Quantitative comparison of methods on the VITON test dataset

Method           FID     SSIM   LPIPS   KID/10^-2   PSNR
CP-VTON+         22.36   0.82   0.12    0.92        21.81
ACGPN            17.95   0.84   0.11    0.73        23.11
PF-AFN           10.09   0.89   0.06    0.21        27.25
Style-VTON        8.89   0.91   0.07    0.12        26.70
RT-VTON          11.66   —      —       —           —
SDAFN            10.42   0.87   0.08    0.16        26.48
Proposed model    8.44   0.90   0.05    0.08        27.69

Fig. 4 Quantitative image-quality evaluation of six methods on each garment-complexity subset

Tab. 2 Comparison of model size and computational complexity of six methods

Method           Parameters/10^6   FLOPs/10^8
CP-VTON+         40.41             136.0
ACGPN            >100              >1 000
SDAFN            37.65             747.0
PF-AFN           73.20             689.3
Style-VTON       88.41             542.9
Proposed model   26.41             312.9

Fig. 5 Qualitative comparison of methods on the VITON test dataset: (a) Clothing style 1; (b) Clothing style 2; (c) Clothing style 3; (d) Clothing style 4; (e) Clothing style 5; (f) Clothing style 6

Tab. 3 Quantitative comparison for model configurations (configurations 1-4 differ in which of the appearance flow calibration, coarse flow estimation, and fine flow estimation modules are enabled)

Configuration     FID    KID/10^-2   SSIM
Configuration 1   9.62   0.115       0.891
Configuration 2   9.41   0.110       0.893
Configuration 3   8.54   0.089       0.902
Configuration 4   8.44   0.083       0.903

Fig. 6 Qualitative comparison for model configurations: (a) Clothing style 7; (b) Clothing style 8

Tab. 4 Quantitative comparison for different training loss function configurations

Method           FID    SSIM   LPIPS   KID/10^-2   PSNR
Without L_SKD    8.60   0.89   0.06    0.09        27.48
With L_SKD       8.44   0.90   0.05    0.08        27.69
[1] BHATNAGAR B L, TIWARI G, THEOBALT C, et al. Multi-garment net: learning to dress 3D people from images[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 5420-5430.
[2] MIR A, ALLDIECK T, PONS-MOLL G. Learning to transfer texture from clothing images to 3D humans[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 7023-7034.
[3] ZHAO F, XIE Z, KAMPFFMEYER M, et al. M3D-VTON: a monocular-to-3D virtual try-on network[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 13239-13249.
[4] DUCHON J. Splines minimizing rotation-invariant semi-norms in Sobolev spaces[C]// Proceedings of the Constructive Theory of Functions of Several Variables. Berlin: Springer, 1977: 85-100.
[5] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27(4): 2670-2680.
[6] HAN X, WU Z, WU Z, et al. VITON: an image-based virtual try-on network[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 7543-7552.
[7] GONG K, LIANG X, ZHANG D, et al. Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 932-940.
[8] CAO Z, SIMON T, WEI S-E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 7291-7299.
[9] HAN X, HU X, HUANG W, et al. Clothflow: a flow-based model for clothed person generation[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 10471-10480.
[10] CHOPRA A, JAIN R, HEMANI M, et al. Zflow: gated appearance flow-based virtual try-on with 3D priors[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, QC, Canada: IEEE, 2021: 5433-5442.
[11] LEE S, GU G, PARK S, et al. High-resolution virtual try-on with misalignment and occlusion-handled conditions[C]// Proceedings of the European Conference on Computer Vision. Tel-Aviv, Israel: Springer, 2022: 204-219.
[12] XIE Z, HUANG Z, DONG X, et al. GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023: 23550-23559.
[13] BAI S, ZHOU H, LI Z, et al. Single stage virtual try-on via deformable attention flows[C]// Proceedings of the European Conference on Computer Vision. Tel-Aviv, Israel: Springer, 2022: 409-425.
[14] GE Y, SONG Y, ZHANG R, et al. Parser-free virtual try-on via distilling appearance flows[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA: IEEE, 2021: 8485-8493.
[15] HE S, SONG Y Z, XIANG T. Style-based global appearance flow for virtual try-on[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022: 3470-3479.
[16] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 4401-4410.
[17] GÜLER R A, NEVEROVA N, KOKKINOS I. DensePose: dense human pose estimation in the wild[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 7297-7306.
[18] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2117-2125.
[19] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// Proceedings of the Medical Image Computing and Computer-Assisted Intervention: MICCAI 2015. Munich, Germany: Springer, 2015: 234-241.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
[21] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 1251-1258.
[22] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 4510-4520.
[23] JOHNSON J, ALAHI A, FEI-FEI L. Perceptual losses for real-time style transfer and super-resolution[C]// Proceedings of the Computer Vision: ECCV 2016. Amsterdam, The Netherlands: Springer, 2016: 694-711.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014. DOI: 10.48550/arXiv.1409.1556.
[25] SUN D, ROTH S, BLACK M J. A quantitative analysis of current practices in optical flow estimation and the principles behind them[J]. International Journal of Computer Vision, 2014(106): 115-137.
[26] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017(30): 6626-6637.
[27] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. DOI: 10.1109/TIP.2003.819861.
[28] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 586-595.
[29] SUTHERLAND J D, ARBEL M, GRETTON A. Demystifying MMD GANs[C]// International Conference on Learning Representations. Vancouver, Canada: ICLR, 2018: 1-36.
[30] HORE A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// Proceedings of the 2010 20th International Conference on Pattern Recognition. Istanbul, Turkey: IEEE, 2010: 2366-2369.
[31] MINAR M R, TUAN T T, AHN H, et al. CP-VTON+: clothing shape and texture preserving image-based virtual try-on[C]// Proceedings of the CVPR Workshops. Seattle, WA, USA: IEEE, 2020: 10-14.
[32] YANG H, ZHANG R, GUO X, et al. Towards photo-realistic virtual try-on by adaptively generating-preserving image content[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 7850-7859.
[33] YANG H, YU X, LIU Z. Full-range virtual try-on with recurrent tri-level transform[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022: 3460-3469.