This work aims to achieve accurate, robust, real-time detection of barrel packaging text on tobacco slurry preparation production lines under complex interference, including curved-surface deformation, reflection, low contrast, and stain occlusion, thereby providing reliable front-end detection support for automatic packaging information recognition and production traceability.

To address the principal-direction mismatch, boundary discontinuity, and missed detections that these interference conditions cause, a geometry-aware multi-branch feature fusion model named Vim-DFUMNet was proposed. The model was designed around three key requirements: geometric alignment, global modeling, and multi-scale collaborative fusion. Specifically, PRSS was used to alleviate the principal-direction deviation caused by curved-surface projection; P-VimNet was employed to enhance long-range dependency modeling for curved text; and DFUM was designed to coordinate high-level semantic information with low-level boundary details, thereby improving the continuity of representation, boundary integrity, and detection stability of barrel packaging text in complex industrial scenarios.

Comparative experiments, ablation studies, and visualization analyses were conducted on a self-built industrial barrel packaging dataset. The dataset contained 600 original images, expanded to 1,500 images through data augmentation and divided into training, validation, and test sets at a ratio of 6:2:2. On the test set, the proposed method achieved a precision of 95.0%, a recall of 92.4%, an F1-score of 93.7%, and a detection speed of 46 FPS. Compared with the baseline DBNet++, precision, recall, and F1-score improved by 7.2, 9.2, and 8.3 percentage points, respectively; compared with TextMamba, the F1-score improved by a further 2.2 percentage points.
The proposed method effectively improves the geometric alignment capability, boundary integrity, and detection stability of barrel packaging text under complex industrial interference conditions, including curved-surface deformation, reflection-induced boundary discontinuity, low contrast, and local stains. While maintaining real-time performance, it provides technical support for automatic barrel packaging information acquisition, online detection, and production traceability.
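The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and not taken from the paper: the helper names `f1_score` and `split_sizes` are hypothetical, and the baseline values are inferred under the assumption that the stated improvements over DBNet++ are absolute differences in percentage points.

```python
from typing import List, Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_sizes(total: int, ratio: Sequence[int]) -> List[int]:
    """Split `total` items by an integer ratio; any remainder goes to the first part."""
    unit = sum(ratio)
    sizes = [total * r // unit for r in ratio]
    sizes[0] += total - sum(sizes)
    return sizes


# Reported test-set precision and recall of the proposed method.
reported_f1 = f1_score(0.950, 0.924)

# Assumption: the gains over DBNet++ are absolute (percentage points), which
# implies a baseline precision of 87.8% and a baseline recall of 83.2%.
baseline_f1 = f1_score(0.950 - 0.072, 0.924 - 0.092)

# 1,500 augmented images divided 6:2:2 into training, validation, and test sets.
train, val, test = split_sizes(1500, (6, 2, 2))

print(f"proposed F1 = {reported_f1:.3f}")
print(f"implied baseline F1 = {baseline_f1:.3f}")
print(f"train/val/test = {train}/{val}/{test}")
```

Under that assumption, the implied baseline F1-score rounds to 85.4%, which is 8.3 points below the reported 93.7%, so the three stated improvements are mutually consistent when read as percentage-point differences.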
Key words
barrel packaging text detection / Vision Mamba / differentiable binarization / geometry-aware / multi-branch feature fusion
References
[1] BAEK Y, LEE B, HAN D, et al. Character Region Awareness for Text Detection[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2020.
[2] WANG W H, XIE E Z, LI X, et al. Shape Robust Text Detection with Progressive Scale Expansion Network[C]// Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2020.
[3] LIAO M H, WAN Z Y, YAO C, et al. Real-Time Scene Text Detection with Differentiable Binarization[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11474-11481.
[4] ZHU J J, WANG G D. TransText: Improving Scene Text Detection via Transformer[J]. Digital Signal Processing, 2022, 130: 103698.
[5] YE M Y, ZHANG J, ZHAO S S, et al. DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Transformer[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2023, 37(3): 3241-3249.
[6] ZHU L, LIAO B, ZHANG Q, et al. Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model[C]// Proceedings of the 41st International Conference on Machine Learning. [s.l.]: PMLR, 2024.
[7] ZHAO Q Y, YAN Y, WANG D H. TextMamba: Scene Text Detector with Mamba[C]// Proceedings of 2025 International Joint Conference on Neural Networks (IJCNN). Rome: IEEE, 2025: 1-8.
[8] TIAN Z, HUANG W L, HE T, et al. Detecting Text in Natural Image with Connectionist Text Proposal Network[C]// Proceedings of Computer Vision-ECCV 2016. Cham: Springer, 2016.
[9] ZHOU X Y, YAO C, WEN H, et al. EAST: An Efficient and Accurate Scene Text Detector[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017.
[10] LONG S B, RUAN J Q, ZHANG W J, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes[C]// Proceedings of Computer Vision-ECCV 2018. Cham: Springer, 2018.
[11] LIU Y L, CHEN H, SHEN C H, et al. ABCNet: Real-Time Scene Text Spotting with Adaptive Bezier-Curve Network[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Seattle: IEEE, 2020.
[12] ZHU Y Q, CHEN J Y, LIANG L Y, et al. Fourier Contour Embedding for Arbitrary-Shaped Text Detection[C]// Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville: IEEE, 2021.
[13] LIAO M H, ZOU Z S, WAN Z Y, et al. Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 919-931.
[14] ZHAO Y N, HU Z, DING F Q, et al. GDText-VM: An Arbitrary-Shaped Scene Text Detector Based on Globally Deformable VMamba[J]. Complex & Intelligent Systems, 2025, 11(8): 348.
[15] LE T T H, HWANG Y, KADIPTYA A Y, et al. A Robust Framework for Coffee Bean Package Label Recognition: Integrating Image Enhancement with Vision-Language OCR Models[J]. Sensors, 2025, 25(20): 6484.
[16] YEH W C, LIAO S Y, HUANG C L. Label Recognition on Metal Surfaces in Semiconductor Industry by YOLO Object Detection Model[J]. The International Journal of Advanced Manufacturing Technology, 2025, 138(3): 1349-1363.
[17] WU S Y, CHANG F. A Dual-Engine Fusion Optical Character Recognition Method for Fast Identification and Key Information Extraction of Drug Labels[J]. Alexandria Engineering Journal, 2025, 128: 1027-1036.
[18] GU A, GOEL K, RÉ C. Efficiently Modeling Long Sequences with Structured State Spaces[EB/OL]. 2021: arXiv: 2111.00396. https://arxiv.org/abs/2111.00396
[19] GU A, DAO T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces[EB/OL]. 2023: arXiv: 2312.00752. https://arxiv.org/abs/2312.00752
[20] XIN Q Y, ZHANG C, WANG Y H, et al. DRA-Net: Dynamic Feature Fusion Upsampling and Text-Region Focus for Ancient Chinese Scene Text Detection[J]. Electronics, 2025, 14(16): 3324.
[21] CHEN Z, WANG J H, WANG W H, et al. FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation[EB/OL]. 2021: arXiv: 2111.02394. https://arxiv.org/abs/2111.02394