Complex traffic scene detection method based on YOLO-NPDL
ZHANG Haochen, ZHANG Zhulin*, SHI Ruiyan, CAO Shijie, WANG Wenhan, LEI Zhennuo
School of Automotive Engineering, Shandong Jiaotong University, Jinan 250357, China
Abstract: To improve the detection accuracy of vehicle object detection models in complex traffic scenes, YOLOv8n (you only look once version 8 nano) is used as the baseline model. A Neck-ARW neck structure with a composite backbone (comprising an auxiliary detection branch, RepBlock modules, and weighted skip feature connections) is designed to reduce the information loss caused by information bottlenecks along the depth of the network. The RepBlock structural re-parameterization module is introduced, which uses a multi-branch structure during training to improve feature extraction. A P2 detection layer is added to capture more fine-grained features of small objects and enrich the flow of small-object feature information through the network. A Dynamic Head detection head is adopted, which unifies scale-aware, spatial-aware, and task-aware self-attention in a single framework to improve detection performance. Finally, the layer-adaptive magnitude-based pruning (LAMP) algorithm is applied to remove redundant parameters, yielding the YOLO-NPDL (Neck-ARW, P2, Dynamic Head, LAMP) vehicle object detection model. On the UA-DETRAC (University at Albany Detection and Tracking) dataset, experiments on the RepBlock embedding position, a comparison of neck structures, pruning experiments, ablation experiments, and a model performance comparison are conducted to verify the mean average precision (mAP) of the YOLO-NPDL model.
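The abstract states only that RepBlock uses a multi-branch structure during training; it does not say how the branches are handled at inference. Assuming RepBlock is built from RepVGG-style blocks (a 3x3 conv + BN branch, a 1x1 conv + BN branch, and an identity BN branch, the latter requiring equal input/output channels and stride 1), the branches can be folded into one 3x3 convolution with a bias. The single-channel, pure-Python sketch below checks that equivalence numerically; all function names and numbers are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
BN = Tuple[float, float, float, float]  # (gamma, beta, running_mean, running_var)
EPS = 1e-5


def conv2d_same(image: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Single-channel cross-correlation with zero padding that preserves H x W."""
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = bias
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(value: float, p: BN) -> float:
    """Inference-mode batch normalization of a single value."""
    gamma, beta, mean, var = p
    return gamma * (value - mean) / math.sqrt(var + EPS) + beta


def fold_bn(kernel: Grid, p: BN) -> Tuple[Grid, float]:
    """Fold an inference-mode BN into the preceding bias-free convolution."""
    gamma, beta, mean, var = p
    scale = gamma / math.sqrt(var + EPS)
    return [[v * scale for v in row] for row in kernel], beta - mean * scale


def centre_3x3(c: float) -> Grid:
    """A 3x3 kernel whose only non-zero weight is c at the centre."""
    return [[0.0, 0.0, 0.0], [0.0, c, 0.0], [0.0, 0.0, 0.0]]


def multi_branch_forward(image: Grid, k3: Grid, k1: Grid,
                         bn3: BN, bn1: BN, bn_id: BN) -> Grid:
    """Training-time block: (3x3 conv + BN) + (1x1 conv + BN) + (identity + BN)."""
    a, b = conv2d_same(image, k3), conv2d_same(image, k1)
    return [[batch_norm(a[i][j], bn3) + batch_norm(b[i][j], bn1)
             + batch_norm(image[i][j], bn_id)
             for j in range(len(image[0]))] for i in range(len(image))]


def reparameterize(k3: Grid, k1: Grid, bn3: BN, bn1: BN, bn_id: BN) -> Tuple[Grid, float]:
    """Merge the three branches into one 3x3 kernel and one bias for inference."""
    w3, b3 = fold_bn(k3, bn3)
    w1, b1 = fold_bn(centre_3x3(k1[0][0]), bn1)  # 1x1 kernel padded to 3x3
    wid, bid = fold_bn(centre_3x3(1.0), bn_id)   # identity as a 3x3 kernel
    kernel = [[w3[i][j] + w1[i][j] + wid[i][j] for j in range(3)] for i in range(3)]
    return kernel, b3 + b1 + bid


if __name__ == "__main__":
    image = [[0.5, -1.0, 2.0, 0.0], [1.5, 0.25, -0.75, 1.0], [0.0, 2.0, 1.0, -0.5]]
    k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
    k1 = [[0.7]]
    bn3, bn1, bn_id = (1.2, 0.1, 0.05, 0.9), (0.8, -0.2, 0.1, 1.1), (0.5, 0.3, -0.1, 0.7)
    train = multi_branch_forward(image, k3, k1, bn3, bn1, bn_id)
    kernel, bias = reparameterize(k3, k1, bn3, bn1, bn_id)
    infer = conv2d_same(image, kernel, bias)
    diff = max(abs(a - b) for ra, rb in zip(train, infer) for a, b in zip(ra, rb))
    print("max |train - fused| =", diff)
```

Because convolution and inference-mode BN are both linear in the input, the sum of branches collapses into a single kernel; this is why a multi-branch design can add training capacity without requiring the extra branches to be kept at deployment.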
The experimental results show that embedding the RepBlock module in both the auxiliary detection branch and the neck backbone gives better feature extraction for multi-scale objects and retains more detail during training, but increases both the number of parameters and the amount of computation. With the Neck-ARW neck structure, the mean average precision metrics EmAP50 and EmAP50-95 increase by 1.1% and 1.7%, respectively, while the number of parameters decreases by about 17.9%, making it the better structure. At a pruning rate of 1.3, the number of parameters and the amount of computation decrease by about 38.0% and 24.0%, respectively, leaving few redundant channels and a more compact structure. Compared with YOLOv8n, YOLO-NPDL, with essentially the same number of parameters, increases recall by 2.7%, EmAP50 by 2.7% to 94.7%, and EmAP50-95 by 6.4% to 79.7%. Compared with widely used YOLO-series models, YOLO-NPDL achieves higher detection accuracy with fewer parameters. In real complex traffic scenarios, including distant targets, rain, and night scenes, YOLO-NPDL shows no obvious false or missed detections and detects more distant small vehicles, giving better overall detection performance.
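The pruning results above rest on the LAMP score, which rescales each weight's squared magnitude by the sum of squared magnitudes of all weights in the same layer that are at least as large; the entries with the smallest scores are then removed globally across layers. The minimal pure-Python sketch below illustrates that ranking. Since the abstract reports removing redundant channels, the sketch assumes one magnitude per channel (for example, the L2 norm of a channel's filter); the layer names, magnitudes, and the `prune_fraction` parameter are hypothetical, and `prune_fraction` is not the paper's "pruning rate of 1.3", whose exact definition the abstract does not give.

```python
from typing import Dict, List, Tuple


def lamp_scores(magnitudes: List[float]) -> List[float]:
    """LAMP score per entry: m_u^2 / sum of m_v^2 over entries v ranked at or above u.

    Entries are ranked by absolute magnitude in ascending order, so the
    largest entry of every layer always receives a score of exactly 1.
    """
    order = sorted(range(len(magnitudes)), key=lambda i: abs(magnitudes[i]))
    scores = [0.0] * len(magnitudes)
    tail = 0.0  # running sum of squares of the current entry and all larger ones
    for idx in reversed(order):
        sq = magnitudes[idx] ** 2
        tail += sq
        scores[idx] = sq / tail if tail > 0 else 0.0
    return scores


def lamp_global_prune(layers: Dict[str, List[float]],
                      prune_fraction: float) -> Dict[str, List[bool]]:
    """Return a keep-mask per layer after dropping the lowest-scoring fraction globally."""
    scored: List[Tuple[float, str, int]] = []
    for name, mags in layers.items():
        for i, s in enumerate(lamp_scores(mags)):
            scored.append((s, name, i))
    scored.sort()
    n_prune = int(prune_fraction * len(scored))
    masks = {name: [True] * len(mags) for name, mags in layers.items()}
    for _, name, i in scored[:n_prune]:
        masks[name][i] = False
    return masks


if __name__ == "__main__":
    # Hypothetical per-channel magnitudes for two layers.
    layers = {"conv_a": [0.9, 0.1, 0.5, 0.05], "conv_b": [3.0, 2.0, 0.2, 1.0]}
    print(lamp_global_prune(layers, prune_fraction=0.5))
    # {'conv_a': [True, False, True, False], 'conv_b': [True, True, False, False]}
```

In this toy case, plain global magnitude pruning would remove 0.05, 0.1, 0.2 and 0.5, leaving `conv_a` with a single channel; the LAMP ranking instead keeps `conv_a`'s 0.5 channel and prunes `conv_b`'s 1.0 channel, which illustrates the layer-adaptive behaviour the method is named for.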
Keywords: object detection; complex traffic scene; YOLOv8n; Neck-ARW; RepBlock; LAMP algorithm
