Barrel Packaging Text Detection Algorithm Based on Vision Mamba Backbone

ZANG Haoke¹, ZHANG Yihan¹, LI Bohao², SU Hongjun¹, BAO Fengwei¹, HAN Shaolong³,*

doi:10.19554/j.cnki.1001-3563.2026.09.029


Packaging Engineering (Technical Section), 2026, Vol. 47, Issue 9: 275-285. DOI: 10.19554/j.cnki.1001-3563.2026.09.029

Automation and Intelligent Technology

Abstract

Objective: To achieve high-accuracy, robust, real-time detection of barrel packaging text on tobacco slurry preparation production lines under complex interference, including curved-surface deformation, reflection, low contrast, and stain occlusion, and thereby to provide reliable front-end detection support for automatic packaging information recognition and production traceability.

Methods: Under curved-surface deformation, reflection interference, low contrast, and local stains, barrel packaging text is prone to principal-direction mismatch, boundary discontinuity, and missed detection. To address these problems, a geometry-aware multi-branch feature fusion model, Vim-DFUMNet, is proposed. The model is designed around three key requirements: geometric alignment, global modeling, and multi-scale collaborative fusion. PRSS alleviates the principal-direction deviation caused by curved-surface projection, P-VimNet strengthens long-range dependency modeling for curved text, and DFUM coordinates high-level semantic information with low-level boundary details. Together, these components improve the continuous representation capability, boundary integrity, and detection stability of barrel packaging text in complex industrial scenarios. Comparative experiments, ablation studies, and visualization analyses were conducted on a self-built industrial barrel packaging dataset. The dataset contains 600 originally collected images, expanded to 1 500 images through data augmentation and divided into training, validation, and test sets at a ratio of 6:2:2.

Results: On the test set, the proposed method achieved a precision of 95.0%, a recall of 92.4%, an F₁-score of 93.7%, and a detection speed of 46 FPS. Compared with the baseline DBNet++, precision, recall, and F₁-score improved by 7.2%, 9.2%, and 8.3%, respectively. Compared with TextMamba, the F₁-score improved by a further 2.2%.

Conclusion: The proposed method effectively improves the geometric alignment capability, boundary integrity, and detection stability of barrel packaging text under complex industrial interference, including curved-surface deformation, reflection-induced boundary discontinuity, low contrast, and local stains. While maintaining real-time performance, it provides technical support for automatic barrel packaging information acquisition, online detection, and production traceability.
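
As a quick consistency check on the figures reported in the abstract, the minimal Python sketch below recomputes the F₁-score from the stated precision and recall and derives the image counts implied by the 6:2:2 split of 1 500 images. The function names `f1_score` and `split_counts` are hypothetical and are not taken from the paper. The DBNet++ values are not stated in the abstract; they are inferred here under the assumption that the reported gains are absolute percentage points.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_counts(total: int, ratios=(6, 2, 2)) -> tuple:
    """Divide `total` images according to integer ratios.

    Any remainder left by integer division goes to the first (training) part.
    """
    parts = [total * r // sum(ratios) for r in ratios]
    parts[0] += total - sum(parts)
    return tuple(parts)


# Test-set results reported for the proposed method, in percent.
precision, recall = 95.0, 92.4
f1 = f1_score(precision, recall)

# ASSUMPTION: the gains over DBNet++ (7.2, 9.2) are absolute percentage
# points, so the baseline values below are inferred, not quoted.
base_precision = precision - 7.2
base_recall = recall - 9.2
base_f1 = f1_score(base_precision, base_recall)

train, val, test = split_counts(1500)

print(f"proposed F1: {f1:.1f}")
print(f"inferred baseline F1: {base_f1:.1f}")
print(f"train/val/test images: {train}/{val}/{test}")
```

Under that assumption, the recomputed F₁-scores, rounded to one decimal place, are consistent with the reported 8.3 gain, and the split corresponds to 900, 300, and 300 images.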

Cite this article

ZANG Haoke, ZHANG Yihan, LI Bohao, SU Hongjun, BAO Fengwei, HAN Shaolong. Barrel Packaging Text Detection Algorithm Based on Vision Mamba Backbone[J]. Packaging Engineering, 2026, 47(9): 275-285. https://doi.org/10.19554/j.cnki.1001-3563.2026.09.029

CLC number: TB487; TP391.41
