Printing Registration Recognition Method Based on Oversampling Imbalanced Training Dataset

JIAN Chuan-xia, YE Rong, LIN Hao, HE Xin, DU Mei-jian

Packaging Engineering (Technical Section), 2020, Issue 21: 251-260.

DOI: 10.19554/j.cnki.1001-3563.2020.21.037


Abstract

Objective: Imbalanced training sets of printing mark images lead to low accuracy when recognizing the registration status of minority-class samples. To address this, an improved SMOTE oversampling method for the training set is proposed to raise the recognition accuracy of minority-class data. Methods: Texture features are extracted from the gray-level run-length matrix (GLRLM) of each printing mark image to form the multi-dimensional input feature data of the model. Oversampling parameters for the minority-class samples are derived from their neighborhood information, and different oversampling strategies are applied to the minority-class samples to balance the training set. The balanced training set is then used to build a support vector machine (SVM) model that recognizes the printing registration status. Results: On different imbalanced printing datasets, the proposed method obtained an average geometric mean of classification accuracy (Gmean) of 0.8507, a recall (Re) of 0.7192, and an area under the ROC curve (A) of 0.8549. Conclusion: On different imbalanced printing registration datasets, the proposed method outperforms the SMOTE, IS, and SVM methods compared in the experiments.
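The abstract does not give the details of the improved oversampling rule or the exact metric formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It shows classic SMOTE interpolation (not the paper's improved variant, which derives per-sample oversampling parameters from neighborhood information) and a G-mean computed as the square root of recall times specificity, which is a common definition assumed here. The function names `smote_oversample` and `recall_and_gmean`, the parameter `k`, and the toy data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Classic SMOTE (assumed baseline, not the paper's improved variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest neighbours of x among the other minority samples.
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def recall_and_gmean(y_true, y_pred, positive=1):
    """Return (recall, G-mean), with G-mean = sqrt(recall * specificity).

    This G-mean formula is an assumption; the abstract does not state it.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, math.sqrt(recall * specificity)


# Toy two-dimensional minority-class feature vectors (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(minority, n_new=6, k=2)
```

In a full pipeline the synthetic points would be appended to the minority class before training the classifier, and recall and G-mean would be computed on a held-out test set that is left unbalanced.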

Cite this article

JIAN Chuan-xia, YE Rong, LIN Hao, HE Xin, DU Mei-jian. Printing Registration Recognition Method Based on Oversampling Imbalanced Training Dataset[J]. Packaging Engineering, 2020(21): 251-260. https://doi.org/10.19554/j.cnki.1001-3563.2020.21.037

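Funding

Guangdong Provincial Key Laboratory of Cyber-Physical Systems project (2016B030301008); Guangdong University of Technology Youth Fund Key Project (17QNZD001); 2019-2020 College Students' Innovation and Entrepreneurship Training Program (xj201911845014, 201911845008, xj202011845015, xj202011845016)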
