An Online 3D Packing Algorithm Based on Tree Structures and Deep Reinforcement Learning

ZHANG Changyong^*, ZHANG Yuhao, LI Zheng

doi:10.19554/j.cnki.1001-3563.2026.05.015

Packaging Engineering (Technical Section), 2026, Vol. 47, Issue 5: 130-143

Automation and Intelligent Technology

Abstract

This work proposes an online loading method that integrates tree structures with deep reinforcement learning (DRL) to address two problems common to existing online 3D packing algorithms in dynamic, complex, and large-scale packing scenarios: low computational efficiency and a learning space that is prone to explosion. A hierarchical coupled optimisation framework was constructed. In the outer layer, a buffer zone and a temporary backup space were introduced, and a tree-structured planner generated action sequences covering cargo selection, loading, and "remove-and-rearrange" operations, overcoming the inability of traditional methods to correct past decisions. In the inner layer, a topology-space fusion perception network was designed that combines a convolutional neural network (CNN) with a graph convolutional network (GCN) to extract features of the container's remaining geometric space and of the cargo support structure; a DRL agent then outputs the optimal placement position and evaluates the state value, guiding the tree search towards efficient pruning. Experiments showed that, under complex real-world constraints such as physical support and stability, the method improved container space utilisation by approximately 15% over existing mainstream DRL algorithms while maintaining stable real-time response on large-scale cargo sequences of 50 to 200 items. The proposed algorithm effectively balances long-sequence planning with local space optimisation in online packing and has significant value for engineering applications.
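The outer-layer idea described in the abstract (a buffer of candidate items plus value-guided pruning of a search tree) can be illustrated with a minimal sketch. This is not the authors' implementation. The learned CNN/GCN critic is replaced here by a hand-written heuristic `value`, items are not rotated, and the "remove-and-rearrange" action and temporary backup space are omitted. All names (`State`, `place`, `value`, `search`, `pack_online`), the heightmap representation, the 0.8 support threshold, and the beam and depth settings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Item = Tuple[int, int, int]    # (length, width, height) in grid cells
Action = Tuple[int, int, int]  # (buffer index, x, y)


@dataclass(frozen=True)
class State:
    heights: Tuple[Tuple[int, ...], ...]  # heights[x][y] = top of the stack at that cell
    packed: int = 0                       # total item volume placed so far


def empty_state(length: int, width: int) -> State:
    return State(tuple((0,) * width for _ in range(length)))


def place(state: State, item: Item, x: int, y: int, max_h: int,
          min_support: float = 0.8) -> Optional[State]:
    """Drop `item` at (x, y); return None if it violates a constraint."""
    l, w, h = item
    rows = state.heights
    if x + l > len(rows) or y + w > len(rows[0]):
        return None                       # footprint leaves the container
    cells = [rows[i][j] for i in range(x, x + l) for j in range(y, y + w)]
    z = max(cells)                        # resting height of the item's base
    if z + h > max_h:
        return None                       # exceeds container height
    if sum(c == z for c in cells) / len(cells) < min_support:
        return None                       # too little of the base is supported
    new_rows = tuple(
        tuple(z + h if x <= i < x + l and y <= j < y + w else rows[i][j]
              for j in range(len(rows[0])))
        for i in range(len(rows)))
    return State(new_rows, state.packed + l * w * h)


def value(state: State, max_h: int) -> float:
    """Assumed stand-in for a learned critic: reward volume, penalise voids."""
    area = len(state.heights) * len(state.heights[0])
    voids = sum(map(sum, state.heights)) - state.packed
    return (state.packed - 0.5 * voids) / (area * max_h)


def search(state: State, buffer: List[Item], max_h: int,
           depth: int = 2, beam: int = 3) -> Tuple[float, Optional[Action]]:
    """Depth-limited tree search; only the `beam` best children are expanded."""
    if depth == 0 or not buffer:
        return value(state, max_h), None
    children = []
    for k, item in enumerate(buffer):
        for x in range(len(state.heights)):
            for y in range(len(state.heights[0])):
                nxt = place(state, item, x, y, max_h)
                if nxt is not None:
                    children.append((value(nxt, max_h), k, x, y, nxt))
    if not children:
        return value(state, max_h), None
    children.sort(key=lambda c: c[0], reverse=True)   # value-guided pruning
    best_score, best_action = float("-inf"), None
    for _, k, x, y, nxt in children[:beam]:
        score, _ = search(nxt, buffer[:k] + buffer[k + 1:], max_h, depth - 1, beam)
        if score > best_score:
            best_score, best_action = score, (k, x, y)
    return best_score, best_action


def pack_online(stream: List[Item], length: int, width: int, max_h: int,
                buffer_size: int = 3) -> Tuple[State, float]:
    """Refill the buffer from the stream, then place the item the search picks."""
    state, queue, buffer = empty_state(length, width), list(stream), []
    while True:
        while queue and len(buffer) < buffer_size:
            buffer.append(queue.pop(0))
        _, action = search(state, buffer, max_h)
        if action is None:                # buffer empty or nothing fits
            break
        k, x, y = action
        state = place(state, buffer.pop(k), x, y, max_h)
    return state, state.packed / (length * width * max_h)
```

A call such as `pack_online([(2, 2, 2)] * 4, 4, 4, 2)` returns the final heightmap state together with the space utilisation. In a learned variant, `value` would be replaced by the critic's state-value estimate and the child ordering by the policy's placement scores; the pruning step itself would keep the same shape.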

Citation

ZHANG Changyong, ZHANG Yuhao, LI Zheng. An Online 3D Packing Algorithm Based on Tree Structures and Deep Reinforcement Learning[J]. Packaging Engineering, 2026, 47(5): 130-143. https://doi.org/10.19554/j.cnki.1001-3563.2026.05.015

CLC number: V353; TB181
