The "Wait Time Optimization and Analysis of Interactions in Public Areas" project, part of the CTE SQUARE Pesaro, aims to monitor and analyze the flow of people using advanced Computer Vision (CV) and Deep Learning (DL) techniques. By installing webcams in Piazza del Popolo in Pesaro and employing YOLOv8 (You Only Look Once) neural networks, the project seeks to reduce waiting times and better understand people’s behaviors and attention in public spaces.
- Project Overview
- Supervision
- Features
- Installation
- Usage
- Results
- Architecture Modifications
- Video Processing
- Contributions
- License
This project focuses on three main objectives:
- Wait Time Analysis: Monitoring the duration people spend in specific areas to identify bottlenecks.
- Model Optimization: Enhancing DL models for deployment on edge devices.
- Practical Implementation: Installing webcams and deploying models in real-world scenarios.
Supervision is an open-source computer vision library that simplifies building detection-based applications. In this project it is used to handle model detections, annotate frames, and define polygonal zones for time-in-zone analysis.
- Real-time Wait Time Analysis
- Edge Device Optimization
- Deployment in Public Areas
To install the supervision package in a Python>=3.8 environment, use the following command:

```bash
pip install supervision
```

To verify that the installation was successful, run the following commands:

```python
import supervision
print(supervision.__version__)
```
Download and install Miniconda from the official Miniconda website.
Ensure that Conda is added to your environment variables during the installation process.
Once Miniconda is installed and configured, open your terminal (or Anaconda Prompt on Windows) and install the package with:

```bash
conda install -c conda-forge supervision
```

Then re-run the verification commands above; if no errors occur and the version number is displayed, the installation was successful.
Clone the repository and navigate to the root directory:

```bash
git clone https://github.com/andreaFaccenda00/DeepVisionAnalytics.git
cd DeepVisionAnalytics
```

Set up the Python environment and activate it:

```bash
python3 -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install --upgrade pip
```

Perform a headless install:

```bash
pip install -e "."
```

For desktop installation, use:

```bash
pip install -e ".[desktop]"
```

Install the required dependencies listed in requirements.txt:

```bash
pip install -r requirements.txt
```
If you want to test time-in-zone analysis on your own video, you can use this script to design custom zones and save them as a JSON file. The script opens a window where you can draw polygons on the source image or video file; the polygons are then saved as a JSON file.

- `--source_path`: Path to the source image or video file for drawing polygons.
- `--zone_configuration_path`: Path where the polygon annotations will be saved as a JSON file.

Keyboard shortcuts:

- `enter` - finish drawing the current polygon.
- `escape` - cancel drawing the current polygon.
- `q` - quit the drawing window.
- `s` - save the zone configuration to a JSON file.

```bash
python scripts/draw_zones.py \
    --source_path "data/people.mp4" \
    --zone_configuration_path "data/config.json"
```
design_zones.mp4
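To see how a saved zone configuration could be consumed downstream, here is a minimal sketch of a point-in-polygon test over the JSON file. It assumes the file stores a list of polygons, each a list of `[x, y]` vertices; the sample `config_json` string is hypothetical, not the project's actual output.

```python
import json

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of [x, y] vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray going right from (x, y)
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical config with one square zone, mimicking data/config.json
config_json = '[[[100, 100], [300, 100], [300, 300], [100, 300]]]'
zones = json.loads(config_json)

print(point_in_polygon(200, 200, zones[0]))  # True: inside the square
print(point_in_polygon(50, 50, zones[0]))    # False: outside
```

A detection's anchor point (e.g. the bottom center of a bounding box) would be passed through such a test each frame to decide zone membership.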
We trained YOLOv8 variants (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l) on the MOTSynth and EuroCity Persons datasets. MOTSynth is a synthetic dataset generated using GTA V, while EuroCity Persons consists of urban images from various European cities.
- Datasets:
  - MOTSynth: Large-scale synthetic dataset for pedestrian detection, segmentation, and tracking.
  - EuroCity Persons: Urban images from multiple European cities for realistic pedestrian detection.
- Training Split: 80% for training, 10% for validation, 10% for testing.
- Image Size: 640x640 pixels.
| Hyperparameter | YOLOv8n | YOLOv8s | YOLOv8m | YOLOv8l |
|---|---|---|---|---|
| Batch size | 16 | 16 | 16 | 16 |
| Image size | 640x640 | 640x640 | 640x640 | 640x640 |
| Epochs | 250 | 250 | 250 | 250 |
| Early stopping | 30 | 30 | 30 | 30 |
| Optimizer | SGD | SGD | SGD | SGD |
| Initial learning rate | 0.01 | 0.01 | 0.01 | 0.01 |
| LR reduction factor | 0.01 | 0.01 | 0.01 | 0.01 |
| Momentum | 0.95 | 0.95 | 0.95 | 0.95 |
| Weight decay | 0.0001 | 0.0001 | 0.0001 | 0.0001 |
| IOU threshold | 0.7 | 0.7 | 0.7 | 0.7 |
| Detection limit | 300 | 300 | 300 | 300 |
| Mixed precision | Yes | Yes | Yes | Yes |
| Warmup epochs | 10 | 10 | 10 | 10 |
| Warmup momentum | 0.5 | 0.5 | 0.5 | 0.5 |
| Warmup LR | 0.1 | 0.1 | 0.1 | 0.1 |
| Masking ratio | 4 | 4 | 4 | 4 |
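As a sketch, the table maps onto Ultralytics `train()` keyword arguments roughly as follows. The argument names are assumptions about which API knobs correspond to each table row, and `pedestrian.yaml` is a hypothetical dataset config, not a file from this repository.

```python
# Hedged mapping of the hyperparameter table onto Ultralytics train() kwargs.
train_args = dict(
    data="pedestrian.yaml",   # hypothetical dataset definition
    batch=16,                 # Batch size
    imgsz=640,                # Image size
    epochs=250,               # Epochs
    patience=30,              # Early stopping
    optimizer="SGD",          # Optimizer
    lr0=0.01,                 # Initial learning rate
    lrf=0.01,                 # LR reduction factor
    momentum=0.95,            # Momentum
    weight_decay=0.0001,      # Weight decay
    iou=0.7,                  # IOU threshold
    max_det=300,              # Detection limit
    amp=True,                 # Mixed precision
    warmup_epochs=10,         # Warmup epochs
    warmup_momentum=0.5,      # Warmup momentum
    warmup_bias_lr=0.1,       # Warmup LR
    mask_ratio=4,             # Masking ratio
)

# With ultralytics installed, training would then look roughly like:
# from ultralytics import YOLO
# YOLO("yolov8s.yaml").train(**train_args)
```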
The trained YOLOv8 variants were tested on the SOMPT22 dataset in addition to the MOTSynth and EuroCity Persons datasets. SOMPT22 was used exclusively for testing to provide a rigorous evaluation of the model's capability in urban surveillance scenarios.
| Dataset | Model | Inference | mAP@50 | mAP@50-95 | Precision | Recall |
|---|---|---|---|---|---|---|
| MOTSynth | YOLOv8n | 1.4ms | 0.729 | 0.430 | 0.816 | 0.585 |
| | YOLOv8s | 2.1ms | 0.749 | 0.449 | 0.836 | 0.606 |
| | YOLOv8m | 4.5ms | 0.777 | 0.483 | 0.849 | 0.648 |
| | YOLOv8l | 7.6ms | 0.764 | 0.470 | 0.860 | 0.617 |
| EuroCity Persons | YOLOv8n | 1.7ms | 0.602 | 0.401 | 0.721 | 0.287 |
| | YOLOv8s | 3.7ms | 0.629 | 0.439 | 0.846 | 0.298 |
| | YOLOv8m | 7.9ms | 0.633 | 0.415 | 0.844 | 0.388 |
| | YOLOv8l | 13.5ms | 0.621 | 0.425 | 0.900 | 0.328 |
| MOTSynth + EuroCity | YOLOv8n | 1.2ms | 0.741 | 0.454 | 0.824 | 0.600 |
| | YOLOv8s | 2.2ms | 0.778 | 0.485 | 0.850 | 0.651 |
| | YOLOv8m | 4.6ms | 0.781 | 0.488 | 0.854 | 0.652 |
| | YOLOv8l | 7.6ms | 0.786 | 0.487 | 0.859 | 0.659 |
| COCO | YOLOv8n | 1.2ms | 0.626 | 0.434 | 0.799 | 0.377 |
| | YOLOv8s | 2.2ms | 0.641 | 0.459 | 0.810 | 0.447 |
| | YOLOv8m | 4.6ms | 0.649 | 0.471 | 0.809 | 0.463 |
| | YOLOv8l | 7.9ms | 0.643 | 0.465 | 0.809 | 0.431 |
YOLOv8s was chosen for its balance of rapid inference (2.4ms), high precision (0.956), and significant recall (0.742), making it suitable for real-time surveillance and monitoring.
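The trade-off behind that choice can be sketched with the MOTSynth + EuroCity rows from the table above: pick the most accurate model whose per-frame inference time fits a latency budget. The 3 ms budget below is purely illustrative.

```python
# Rows from the MOTSynth + EuroCity section of the results table:
# (model, inference time in ms, mAP@50)
results = [
    ("YOLOv8n", 1.2, 0.741),
    ("YOLOv8s", 2.2, 0.778),
    ("YOLOv8m", 4.6, 0.781),
    ("YOLOv8l", 7.6, 0.786),
]

def pick_model(results, latency_budget_ms):
    """Return the most accurate model whose inference time fits the budget."""
    feasible = [r for r in results if r[1] <= latency_budget_ms]
    return max(feasible, key=lambda r: r[2])

print(pick_model(results, 3.0))  # → ('YOLOv8s', 2.2, 0.778)
```

Under an illustrative 3 ms/frame budget, YOLOv8s is the best feasible option; YOLOv8m and YOLOv8l gain under one point of mAP@50 at two to three times the latency.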
All trained models can be accessed via this [OneDrive link](https://univpm-my.sharepoint.com/:f:/g/personal/s1119359_studenti_univpm_it/EsUTP-Jwi65GlAL8wUo6WzIBGzOsCs2ITQL4bhMY-cACqg?e=iM0VQh).
The following images demonstrate the performance and evaluation metrics of the YOLOv8s model trained on MOTSynth & EuroCity Persons:
These results illustrate the model's accuracy in detecting pedestrians, its confidence at various thresholds, and the improvements in training metrics over time.
In the "Dwell Time Analysis for People Flow" project, the YOLOv8 neural network architecture was modified for improved small object detection and computational efficiency:
- Parameter Reduction: The network was reduced from 11.2 million to 2.2 million parameters, enhancing computational efficiency and image processing speed.
- Depth Reduction: The network depth was limited to 90 layers, optimizing execution speed while maintaining small object detection capability.
- Layer Removal: Layers responsible for medium and large object detection were removed, reducing model complexity and parameters.
- Gradient Flow Optimization: Unnecessary connections were reduced to improve gradient flow efficiency, making the model lighter and faster.
- Small Object Focus: The modified structure was optimized to detect small objects, the main focus of this use case.
The performance of the modified YOLOv8s network, optimized for small object detection, was evaluated with the following metrics:
- Box Precision (P): 0.826
- Recall (R): 0.641
- mAP@50: 0.766
- mAP@50-95: 0.47
These modifications resulted in significant inference speed improvements, from 30 fps to 45 fps, with a reduction in parameters from 11.2 million to 2.2 million, making the model optimal for real-time applications. Despite a slight decrease in performance metrics, the speed improvement was a crucial priority.
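As a quick sanity check on those figures:

```python
# Reported figures from the section above
fps_before, fps_after = 30, 45
params_before, params_after = 11.2e6, 2.2e6

speedup = fps_after / fps_before                     # 1.5x faster inference
param_reduction = 1 - params_after / params_before   # ~80% fewer parameters

print(f"{speedup:.2f}x speedup, {param_reduction:.0%} parameter reduction")
```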
Script to run object detection on a video file using the Ultralytics YOLOv8 model. Key parameters include:

- `--zone_configuration_path`: Path to the zone configuration JSON file.
- `--source_video_path`: Path to the source video file.
- `--weights`: Path to the model weights file. Default is `'yolov8s_pedestrian.pt'`.
- `--device`: Computation device (`'cpu'`, `'mps'` or `'cuda'`). Default is `'cuda'`.
- `--classes`: List of class IDs to track. If empty, all classes are tracked.
- `--confidence_threshold`: Confidence level for detections (`0` to `1`). Default is `0.3`.
- `--iou_threshold`: IOU threshold for non-max suppression. Default is `0.7`.
To run this code, ensure you have all the required libraries installed and the correct file paths set for your video, configuration, and model weights. Execute the script as follows:
```bash
python ultralytics_static_video.py \
    --zone_configuration_path "data/config.json" \
    --source_video_path "data/people.mp4" \
    --weights 'yolov8s_pedestrian.pt' \
    --device 'cuda' \
    --classes 0 \
    --confidence_threshold 0.3 \
    --iou_threshold 0.7
```
The script will process the video, detect and track objects, annotate zones of interest, and calculate the time spent in each zone. The output will be saved as an annotated video.
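The time-in-zone calculation can be sketched as follows. This per-tracker frame counting is an assumption about how such a script could work, not a copy of the repository's implementation, and the 30 fps frame rate is illustrative.

```python
from collections import defaultdict

FPS = 30  # assumed frame rate of the source video

# frames_in_zone[tracker_id] accumulates frames a tracked person spends in a zone
frames_in_zone = defaultdict(int)

def update(tracker_ids_in_zone):
    """Call once per frame with the tracker IDs currently inside the zone."""
    for tid in tracker_ids_in_zone:
        frames_in_zone[tid] += 1

def dwell_time_seconds(tracker_id):
    """Convert the accumulated frame count to seconds."""
    return frames_in_zone[tracker_id] / FPS

# Simulate 90 frames: tracker 7 stays the whole time, tracker 3 leaves halfway
for frame in range(90):
    ids = {7} if frame >= 45 else {7, 3}
    update(ids)

print(dwell_time_seconds(7))  # 90 frames / 30 fps = 3.0
print(dwell_time_seconds(3))  # 45 frames / 30 fps = 1.5
```

The annotated output video would then overlay each tracked person with their accumulated dwell time.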
pedestrian_analysis.online-video-cutter.com.mp4
Script to run object detection on an RTSP video stream using the Ultralytics YOLOv8 model. Key parameters include:
- `--zone_configuration_path`: Path to the zone configuration JSON file.
- `--rtsp_url`: Complete RTSP URL for the video stream.
- `--weights`: Path to the model weights file. Default is `'yolov8s_pedestrian.pt'`.
- `--device`: Computation device (`'cpu'`, `'mps'` or `'cuda'`). Default is `'cuda'`.
- `--classes`: List of class IDs to track. If empty, all classes are tracked.
- `--confidence_threshold`: Confidence level for detections (`0` to `1`). Default is `0.3`.
- `--iou_threshold`: IOU threshold for non-max suppression. Default is `0.7`.
```bash
python ultralytics_stream_example.py \
    --zone_configuration_path "data/config.json" \
    --rtsp_url "rtsp://localhost:8554/live0.stream" \
    --weights "yolov8s_pedestrian.pt" \
    --device "cuda" \
    --classes 0 \
    --confidence_threshold 0.3 \
    --iou_threshold 0.7
```
Contributions are welcome! Please open an issue or submit a pull request for any improvements or features.
This project is licensed under the MIT License. See the LICENSE file for details.