Optimizing YOLOv8 for Parking Space Detection: Comparative Analysis of Custom Backbone Architectures

Department of Computer Science and Engineering
The University of Texas at Arlington

*Indicates First Author
An overview of YOLOv8 architecture

YOLOv8 is part of the You Only Look Once (YOLO) family of object detectors, featuring an anchor-free detection head, decoupled classification and regression branches, and improved scalability. Its architecture consists of three main components: the backbone for feature extraction, the neck (typically a PAN/FPN structure) for multi-scale feature fusion, and the head for object classification and bounding box regression. In this project, we replaced the default YOLOv8 backbone with four custom architectures—ResNet-18, VGG16, EfficientNetV2, and GhostNet—to evaluate their performance in detecting parking space occupancy on the PKLot dataset. Each backbone was integrated into the YOLOv8 pipeline, and the resulting models were benchmarked on accuracy (mAP), inference speed (FPS), and computational efficiency (parameter count). This modular experimentation helped us understand the trade-offs between lightweight and high-capacity backbones in real-world detection scenarios.
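The speed side of this benchmark is conceptually simple: average the per-image forward-pass time and invert it. The sketch below illustrates that measurement pattern with a toy stand-in for a detector's forward pass (the `benchmark` helper and the fake frames are our own illustration, not code from the project):

```python
import time

def benchmark(model_fn, inputs, warmup=2):
    """Time model_fn over inputs; return (mean latency in ms, FPS)."""
    for x in inputs[:warmup]:   # warm-up calls, excluded from timing
        model_fn(x)
    start = time.perf_counter()
    for x in inputs:
        model_fn(x)
    elapsed = time.perf_counter() - start
    latency_ms = 1000.0 * elapsed / len(inputs)
    return latency_ms, 1000.0 / latency_ms  # FPS is the inverse of latency

# Toy stand-in for a detector's forward pass: sum a "frame" of pixel values.
frames = [[i] * 1000 for i in range(64)]
latency_ms, fps = benchmark(sum, frames)
```

In the actual experiments, `model_fn` is a YOLOv8 variant's inference call and the inputs are PKLot images; warm-up runs matter there because the first GPU calls include kernel compilation and memory allocation overhead.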

Abstract

Parking space occupancy detection is a critical component in the development of intelligent parking management systems. Traditional object detection approaches, such as YOLOv8, provide fast and accurate vehicle detection across parking lots but can struggle with borderline cases, such as partially visible vehicles, small vehicles (e.g., motorcycles), and poor lighting conditions. In this work, we perform a comprehensive comparative analysis of customized backbone architectures integrated with YOLOv8 on the PKLot dataset in terms of detection accuracy and computational efficiency. Experimental results highlight each architecture’s strengths and trade-offs, providing insight into selecting suitable models for parking occupancy detection.

Architecture Overview

YAML Configuration

YOLOv8 allows model customization via a YAML configuration file. YAML (YAML Ain't Markup Language) is a human-readable data serialization language commonly used for storing and organizing configuration settings. The YAML file for YOLOv8 defines the model architecture by specifying the layers and their configurations. When a model is loaded or trained, the YAML file is passed to a parser that constructs the model's architecture dynamically at runtime. A YOLOv8 YAML configuration typically consists of the number of classes (nc), model scales, and the layer definitions grouped under backbone and head. Each layer entry is a list of the form [from, number, module, args]. Each custom backbone is implemented as a torchvision model and plugged into the YAML file.
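As an illustration, a trimmed YOLOv8-style model definition might look like the fragment below. The specific layer choices, channel widths, and indices here are hypothetical and abbreviated, not the exact configurations used in our experiments:

```yaml
# Hypothetical, trimmed YOLOv8-style model definition.
nc: 2  # number of classes, e.g., space-empty / space-occupied

backbone:
  # [from, number, module, args]
  - [-1, 1, Conv, [64, 3, 2]]       # stem: stride-2 convolution
  - [-1, 1, Conv, [128, 3, 2]]
  - [-1, 3, C2f, [128, True]]
  # ... remaining backbone layers elided

head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]]       # fuse with an earlier backbone feature map
  # ... remaining head layers elided
  - [[15, 18, 21], 1, Detect, [nc]] # detection head over three feature scales
```

Here `from` gives the index of the input layer (-1 meaning the previous layer, a list meaning multiple inputs to fuse), `number` is how many times the module is repeated, and `args` are the module's constructor arguments. Swapping in a custom backbone amounts to replacing the entries under `backbone` while keeping the head's feature-map indices consistent.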

Results

All models achieved high precision and recall, but subtle differences emerge when comparing stricter metrics like mAP50:95. YOLO-EfficientNetv2 achieved the best overall detection performance, with the highest mAP50:95 (0.986) and excellent precision (0.998), while maintaining a moderate computational cost (56.4 GFLOPs) and lower inference time (4.1 ms). YOLO-ResNet-18 offered a strong balance, matching the mAP50 score (0.994) and achieving strong precision (0.998) with slightly lower localization accuracy (mAP50:95 = 0.976). YOLO-VGG16, while achieving high precision (0.998), showed a slightly lower recall (0.985) and mAP50:95 (0.985), and was the most computationally expensive model (262.1 GFLOPs), despite a moderate inference time (3.3 ms), making it less suitable for lightweight deployment. YOLOv8n maintained competitive precision and recall (both 0.996) and fast inference (0.9 ms), demonstrating its suitability for real-time applications with minimal compute. YOLO-Ghost-P2 achieved the fastest inference (1.5 ms) among custom modifications with the lowest parameter count (1.6M), but its slightly lower recall (0.978) and mAP50:95 (0.896) suggest a higher likelihood of missed detections.

Overall, models like YOLO-VGG16 and EfficientNet excel in detection accuracy, while YOLOv8n and Ghost-P2 are better suited for edge applications where inference speed and model size are critical. Precision-recall trade-offs highlight the need to match model choice with the deployment scenario’s tolerance for false positives and missed detections.
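The per-image inference times reported above translate directly into throughput, which is often the figure of merit for edge deployment. A minimal sketch of the conversion, using the latencies reported in our results (and assuming batch size 1, excluding pre- and post-processing):

```python
# Per-image inference times (ms) as reported in the results above.
times_ms = {
    "YOLO-EfficientNetv2": 4.1,
    "YOLO-VGG16": 3.3,
    "YOLOv8n": 0.9,
    "YOLO-Ghost-P2": 1.5,
}

# Throughput in frames per second: FPS = 1000 / latency_ms.
fps = {name: round(1000.0 / t, 1) for name, t in times_ms.items()}
```

By this conversion, YOLOv8n sustains over 1000 frames per second while even the slowest variant exceeds 200 FPS, so all models comfortably clear real-time video rates; the differentiator for edge use is therefore memory footprint and GFLOPs rather than raw speed.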

Comparisons

To better understand the training behavior of each YOLO variant, we present key performance metrics plotted over training epochs. YOLOv8n and YOLO-Ghost-P2 showed the fastest drop in loss, attributed to their lightweight architectures, while YOLO-EfficientNet and YOLO-VGG16 required more epochs to stabilize due to deeper networks. Other plots are readily available in the code repository.

Paper

BibTeX

          
@misc{pokhrel2025optimizingyolov8parkingspace,
  title={Optimizing YOLOv8 for Parking Space Detection: Comparative Analysis of Custom YOLOv8 Architecture},
  author={Apar Pokhrel and Gia Dao},
  year={2025},
  eprint={2505.17364},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2505.17364},
}