Journal of Textile Research ›› 2024, Vol. 45 ›› Issue (09): 164-174. DOI: 10.13475/j.fzxb.20230904501

• Apparel Engineering •

Lightweight parser-free virtual try-on based on mixed knowledge distillation and feature enhancement techniques

HOU Jue1,2, DING Huan1, YANG Yang1,2, LU Yinwen1, YU Lingjie3, LIU Zheng2,4

1. School of Fashion Design & Engineering, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
    2. Key Laboratory of Silk Culture Inheritance and Digital Technology of Product Design, Ministry of Culture and Tourism, Hangzhou, Zhejiang 310018, China
    3. School of Textile Science and Engineering, Xi'an Polytechnic University, Xi'an, Shaanxi 710048, China
    4. International Institute of Fashion Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  • Received: 2023-09-18  Revised: 2024-03-23  Online: 2024-09-15  Published: 2024-09-15
  • Contact: LIU Zheng, E-mail: koala@zstu.edu.cn

Abstract:

Objective To address the problems of inaccurate clothing deformation, texture distortion, and high computational cost in image-based virtual try-on systems, this paper proposes a lightweight parser-free virtual try-on method based on mixed knowledge distillation and feature enhancement techniques.

Method Firstly, an improved appearance flow estimation method was proposed that integrates global features and calibrates the flow computed at different scales, thereby enhancing the accuracy of appearance flow estimation. Secondly, a lightweight try-on network based on depthwise separable convolution was constructed, with knowledge distillation used to decouple the virtual try-on process from image segmentation (parsing) results. Finally, a garment texture complexity (GTC) index based on the pixel-wise average gradient was proposed to quantitatively analyze the texture complexity of clothing; based on this index, the VITON dataset was divided into a simple texture set, a moderately complex texture set, and a highly complex texture set. A minimal sketch of such a gradient-based index is given below.
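The abstract only states that GTC is derived from the pixel-wise average gradient of the garment image; the following is a minimal illustrative sketch of such an index, not the paper's exact formulation. The function name `garment_texture_complexity`, the Sobel-based gradient, and the optional masking step are assumptions introduced here.

```python
from typing import Optional

import cv2
import numpy as np


def garment_texture_complexity(garment_bgr: np.ndarray,
                               mask: Optional[np.ndarray] = None) -> float:
    """Illustrative GTC-style index: mean per-pixel gradient magnitude.

    garment_bgr: H x W x 3 uint8 garment image.
    mask: optional H x W binary mask of the garment region; if given,
          the average is taken over garment pixels only.
    """
    gray = cv2.cvtColor(garment_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Horizontal and vertical gradients (Sobel is an assumption; the paper
    # only says the index is based on the pixel-wise average gradient).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    if mask is not None:
        grad_mag = grad_mag[mask > 0]
    return float(grad_mag.mean())


# Garment images could then be binned into simple / moderately complex /
# highly complex texture sets by thresholding this score; the thresholds
# used to split VITON are dataset-specific and not given in the abstract.
```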

Results The proposed model was verified and analyzed on the VITON dataset. Compared with the SOTA (state-of-the-art) model, the number of parameters and the computational complexity (FLOPs) were reduced by 70.12% and 42.38%, respectively, yielding a faster, lighter model that better meets the deployment requirements of the mobile Internet. Moreover, the experimental results showed that the scores of the proposed model on the image quality metrics (FID, LPIPS, PSNR, KID) improved by 5.06%, 28.57%, 3.71%, and 33.33%, respectively, over the SOTA model. In the garment-complexity segmented analysis, the KID and LPIPS scores of the proposed model improved over the SOTA model by 48.08%, 30.45%, 1.03%, 35.54%, 30.41%, and 12.94%, respectively, showing that the proposed method is superior to the other methods in restoring and preserving original clothing details when warping clothing images with complex textures.
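These percentages are consistent with the tabulated results in Tab. 1 and Tab. 2, taking Style-VTON as the SOTA reference (an inference from the reported numbers, not an explicit statement in this abstract):

(88.41 - 26.41)/88.41 ≈ 70.1%   (parameters, Tab. 2)
(542.9 - 312.9)/542.9 ≈ 42.4%   (FLOPs, Tab. 2)
(8.89 - 8.44)/8.89 ≈ 5.06%      (FID, Tab. 1)
(0.07 - 0.05)/0.07 ≈ 28.57%     (LPIPS, Tab. 1)
(27.69 - 26.70)/26.70 ≈ 3.71%   (PSNR, Tab. 1)
(0.12 - 0.08)/0.12 ≈ 33.33%     (KID, Tab. 1)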

Conclusion A lightweight parser-free virtual try-on method based on mixed knowledge distillation and feature enhancement techniques is proposed. It uses an efficient appearance flow estimation method to reduce registration errors, loss of complex textures, and distortion during clothing warping. In addition, mixed knowledge distillation combined with depthwise separable convolution is shown to effectively reduce the size and computational complexity of the final model and to speed up inference. Finally, a quantitative index for characterizing the complexity of clothing texture is proposed, and the VITON test set is partitioned accordingly. Experiments on the VITON test set show that the evaluation results of the proposed method surpass those of the best-performing existing virtual try-on methods, and that the proposed method also handles garments with complex patterns better than other methods. The ablation experiments further confirm that the proposed components bring a clear improvement to the final virtual try-on results.
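The lightweight student network relies on depthwise separable convolution to cut parameters and FLOPs. Below is a minimal PyTorch sketch of such a block; the layer ordering, normalization, and activation choices are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3 conv
    followed by a 1x1 pointwise conv, replacing a standard 3x3 conv with
    far fewer parameters and FLOPs."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.norm = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.pointwise(self.depthwise(x))))


# Rough parameter count for a 256 -> 256 channel layer:
#   standard 3x3 conv:   3*3*256*256 ≈ 590k weights
#   depthwise separable: 3*3*256 + 256*256 ≈ 68k weights
```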

Key words: virtual try-on, appearance flow, knowledge distillation, feature enhancement technique, garment texture complexity, apparel e-commerce

CLC Number: 

  • TS942.8

Fig. 1 Network backbone

Fig. 2 Backbone of coarse flow estimator

Fig. 3 Basic ResNet structure of each model: (a) GN-T; (b) GN-S

Tab. 1 Quantitative comparison of methods on the VITON test dataset

Method           FID     SSIM   LPIPS   KID/10^-2   PSNR
CP-VTON+         22.36   0.82   0.12    0.92        21.81
ACGPN            17.95   0.84   0.11    0.73        23.11
PF-AFN           10.09   0.89   0.06    0.21        27.25
Style-VTON        8.89   0.91   0.07    0.12        26.70
RT-VTON          11.66   —      —       —           —
SDAFN            10.42   0.87   0.08    0.16        26.48
Proposed model    8.44   0.90   0.05    0.08        27.69

Fig. 4 Quantitative image-quality evaluation of six methods on each garment-complexity subset

Tab. 2 Comparison of model size and computational complexity of six methods

Method           Parameters/10^6   FLOPs/10^8
CP-VTON+         40.41             136.0
ACGPN            >100              >1 000
SDAFN            37.65             747.0
PF-AFN           73.20             689.3
Style-VTON       88.41             542.9
Proposed model   26.41             312.9

Fig. 5 Qualitative comparison of methods on the VITON test dataset: (a) Clothing style 1; (b) Clothing style 2; (c) Clothing style 3; (d) Clothing style 4; (e) Clothing style 5; (f) Clothing style 6

Tab. 3 Quantitative comparison for model configurations (configurations 1-4 differ in which of the appearance flow calibration, coarse flow estimation, and fine flow estimation modules are enabled)

Configuration     FID    KID/10^-2   SSIM
Configuration 1   9.62   0.115       0.891
Configuration 2   9.41   0.110       0.893
Configuration 3   8.54   0.089       0.902
Configuration 4   8.44   0.083       0.903

Fig. 6 Qualitative comparison for model configurations: (a) Clothing style 7; (b) Clothing style 8

Tab. 4 Quantitative comparison for different training loss function configurations

Method           FID    SSIM   LPIPS   KID/10^-2   PSNR
Without L_SKD    8.60   0.89   0.06    0.09        27.48
With L_SKD       8.44   0.90   0.05    0.08        27.69
[1] BHATNAGAR B L, TIWARI G, THEOBALT C, et al. Multi-garment net: learning to dress 3D people from images[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 5420-5430.
[2] MIR A, ALLDIECK T, PONS-MOLL G. Learning to transfer texture from clothing images to 3D humans[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 7023-7034.
[3] ZHAO F, XIE Z, KAMPFFMEYER M, et al. M3D-VTON: a monocular-to-3D virtual try-on network[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE, 2021: 13239-13249.
[4] DUCHON J. Splines minimizing rotation-invariant semi-norms in Sobolev spaces[C]// Proceedings of the Constructive Theory of Functions of Several Variables. Berlin: Springer, 1977: 85-100.
[5] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27(4): 2670-2680.
[6] HAN X, WU Z, WU Z, et al. VITON: an image-based virtual try-on network[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 7543-7552.
[7] GONG K, LIANG X, ZHANG D, et al. Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 932-940.
[8] CAO Z, SIMON T, WEI S-E, et al. Realtime multi-person 2D pose estimation using part affinity fields[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 7291-7299.
[9] HAN X, HU X, HUANG W, et al. Clothflow: a flow-based model for clothed person generation[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019: 10471-10480.
[10] CHOPRA A, JAIN R, HEMANI M, et al. Zflow: gated appearance flow-based virtual try-on with 3D priors[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal, QC, Canada: IEEE, 2021: 5433-5442.
[11] LEE S, GU G, PARK S, et al. High-resolution virtual try-on with misalignment and occlusion-handled conditions[C]// Proceedings of the European Conference on Computer Vision. Tel-Aviv, Israel: Springer, 2022: 204-219.
[12] XIE Z, HUANG Z, DONG X, et al. GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, BC, Canada: IEEE, 2023: 23550-23559.
[13] BAI S, ZHOU H, LI Z, et al. Single stage virtual try-on via deformable attention flows[C]// Proceedings of the European Conference on Computer Vision. Tel-Aviv, Israel: Springer, 2022: 409-425.
[14] GE Y, SONG Y, ZHANG R, et al. Parser-free virtual try-on via distilling appearance flows[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA: IEEE, 2021: 8485-8493.
[15] HE S, SONG Y Z, XIANG T. Style-based global appearance flow for virtual try-on[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022: 3470-3479.
[16] KARRAS T, LAINE S, AILA T. A style-based generator architecture for generative adversarial networks[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019: 4401-4410.
[17] GÜLER R A, NEVEROVA N, KOKKINOS I. DensePose: dense human pose estimation in the wild[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 7297-7306.
[18] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 2117-2125.
[19] RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]// Proceedings of the Medical Image Computing and Computer-Assisted Intervention: MICCAI 2015. Munich, Germany: Springer, 2015: 234-241.
[20] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016: 770-778.
[21] CHOLLET F. Xception: deep learning with depthwise separable convolutions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017: 1251-1258.
[22] SANDLER M, HOWARD A, ZHU M, et al. MobileNetV2: inverted residuals and linear bottlenecks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 4510-4520.
[23] JOHNSON J, ALAHI A, FEI-FEI L. Perceptual losses for real-time style transfer and super-resolution[C]// Proceedings of the Computer Vision: ECCV 2016. Amsterdam, The Netherlands: Springer, 2016: 694-711.
[24] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014. DOI: 10.48550/arXiv.1409.1556.
[25] SUN D, ROTH S, BLACK M J. A quantitative analysis of current practices in optical flow estimation and the principles behind them[J]. International Journal of Computer Vision, 2014(106): 115-137.
[26] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017(30): 6626-6637.
[27] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. DOI: 10.1109/TIP.2003.819861.
[28] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE, 2018: 586-595.
[29] SUTHERLAND J D, ARBEL M, GRETTON A. Demystifying MMD GANs[C]// International Conference on Learning Representations. Vancouver, Canada: ICLR, 2018: 1-36.
[30] HORE A, ZIOU D. Image quality metrics: PSNR vs. SSIM[C]// Proceedings of the 2010 20th International Conference on Pattern Recognition. Istanbul, Turkey: IEEE, 2010: 2366-2369.
[31] MINAR M R, TUAN T T, AHN H, et al. CP-VTON+: clothing shape and texture preserving image-based virtual try-on[C]// Proceedings of the CVPR Workshops. Seattle, WA, USA: IEEE, 2020: 10-14.
[32] YANG H, ZHANG R, GUO X, et al. Towards photo-realistic virtual try-on by adaptively generating-preserving image content[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020: 7850-7859.
[33] YANG H, YU X, LIU Z. Full-range virtual try-on with recurrent tri-level transform[C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, LA, USA: IEEE, 2022: 3460-3469.