danhlephuoc / BerlinV2X

Support code and documentation for the Berlin V2X dataset

Berlin V2X

Pandas NumPy Jupyter

Download the dataset on IEEE Dataport

Read about the problem statement for ITU AI/ML in 5G Challenge 2023

The Berlin V2X dataset offers high-resolution GPS-located wireless measurements across diverse urban environments in the city of Berlin for both cellular and sidelink radio access technologies, acquired with up to 4 cars over 3 days. The data thus enables a variety of machine learning (ML) studies towards vehicle-to-anything (V2X) communication.

In the following, an overview of the data is provided. For a detailed description of the measurement campaign please refer to the paper.

Requirements

We strongly recommend working in Python with the following libraries:

Furthermore, we suggest some additional libraries to process and analyze the data, such as:

File overview

Berlin V2X includes several data files in parquet format. For easy onboarding, we provide GPS-located and labelled data frames, merged and resampled to 1 second, for:

  • Cellular measurements (cellular_dataframe.parquet).
  • Sidelink measurements (sidelink_dataframe.parquet).

All data sources are also provided at high resolution for:

  • Ping traces - per car (x4).
  • Iperf traces - per car (x4) for downlink measurements + server (x1) for uplink measurements.
  • TCPdump data for the sidelink - per car (x4).
  • GPS data - per car (x4).
  • MobileInsight traces - per car (x4) and message type (x50, check directory mi for details).
    • The complete messages are zipped per device. The following message types, which have been merged into the cellular dataframe, are also provided unzipped for quick access:
      • LTE_PHY_Serv_Cell_Measurement
      • LTE_RRC_Serv_Cell_Info
      • LTE_PHY_PDSCH_Stat_Indication
      • LTE_PHY_PUSCH_Tx_Report

| Data category | Source | Tool | Sampling interval | Features |
|---|---|---|---|---|
| Quality of Service | DME | ping | 1 s | Delay |
| Quality of Service | DME | iperf | 1 s | DL Datarate, jitter |
| Quality of Service | Server | iperf | 1 s | UL Datarate, jitter |
| Cellular | DME | MobileInsight | 10 ms | PHY: SNR, RSRP, RSRQ, RSSI |
| Cellular | DME | MobileInsight | 20 ms | PDSCH/PUSCH: Assigned RBs, TB Size, DL MCS, UL Tx Power |
| Cellular | DME | MobileInsight | Event-based | RRC: Cell Identity, DL/UL frequency, DL/UL bandwidth |
| Sidelink | SDR UE | tcpdump | Event-based | SNR, RSRP, RSRQ, RSSI, Noise Power, Rx Power, Rx Gain |
| Position | GPS | | 1 s | Latitude, Longitude, Altitude, Velocity, Heading |
| Side information | Internet database | HERE API | 5 min | Traffic Jam Factor, Traffic Street Name, Traffic Distance |
| Side information | Internet database | DarkSky | 1 hour | Cloud cover, Humidity, Precipitation Intensity & Probability, Temperature, Pressure, Wind Speed |
| Metadata | | | | Scenario, operator, drive type, target datarate, direction |

For merging and preprocessing details check the notebooks in the preprocess folder.
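
As a rough illustration of what "merged and resampled to 1 second" can mean in pandas, the sketch below averages a 10 ms trace to 1 s and joins it with a 1 s GPS trace. It is not the code from the preprocess notebooks: the function name, the column names (`rsrp_dbm`, `latitude`, `longitude`), the aggregation choices (mean and first) and the synthetic timestamps are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def resample_and_merge(phy: pd.DataFrame, gps: pd.DataFrame) -> pd.DataFrame:
    """Average a high-rate trace to 1 s and join it with a 1 s GPS trace.

    Both frames are assumed to carry a DatetimeIndex. The column names
    used in the demo below are illustrative, not the dataset's own.
    """
    phy_1s = phy.resample("1s").mean()   # one averaged row per second
    gps_1s = gps.resample("1s").first()  # first GPS fix of each second
    # The inner join keeps only the seconds present in both traces.
    return phy_1s.join(gps_1s, how="inner")


# Synthetic demo: 3 s of 10 ms samples plus 3 GPS fixes (arbitrary date).
t0 = pd.Timestamp("2000-01-01 10:00:00")
phy = pd.DataFrame(
    {"rsrp_dbm": np.linspace(-100.0, -90.0, 300)},
    index=pd.date_range(t0, periods=300, freq="10ms"),
)
gps = pd.DataFrame(
    {"latitude": [52.51, 52.52, 52.53], "longitude": [13.40, 13.41, 13.42]},
    index=pd.date_range(t0, periods=3, freq="1s"),
)
merged = resample_and_merge(phy, gps)
print(merged)
```

Event-based sources such as RRC messages would probably need a different aggregation (for example a forward fill) than the mean used here.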

Quickstart

You can directly load the cellular or sidelink merged dataframe in pandas and inspect the columns. Some general information on each column can be found in sidelink_info.csv and cellular_info.csv.
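
A minimal loading sketch, assuming the dataset has been placed in a local `data` folder and that a parquet engine such as pyarrow is installed. The helper `describe_columns` is a hypothetical convenience function written for this example, not part of the repository.

```python
from pathlib import Path

import pandas as pd

DATA_DIR = Path("data")  # assumed location of the downloaded dataset


def describe_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and share of missing values."""
    return pd.DataFrame(
        {"dtype": df.dtypes.astype(str), "missing_ratio": df.isna().mean()}
    )


cellular_path = DATA_DIR / "cellular_dataframe.parquet"
if cellular_path.exists():
    # read_parquet needs a parquet engine such as pyarrow or fastparquet.
    cellular = pd.read_parquet(cellular_path)
    print(cellular.shape)
    print(describe_columns(cellular).head(20))
else:
    print(f"{cellular_path} not found; download the dataset first")
```

The same pattern should apply to sidelink_dataframe.parquet.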

In order to reuse the code in analyze and preprocess, place the dataset under data. The publication figures will be saved to plots.

Data sources

  1. MobileInsight
  2. Iperf
  3. Ping
  4. RUDE & CRUDE
  5. TCP Dump
  6. GPS traces
  7. Here API
  8. DarkSky API

MobileInsight

All information captured by MobileInsight from available LTE channels. The available information also depends on the modem of the measurement device.

Information from the following message types is provided in the merged cellular dataframe:

  • LTE_PHY_Serv_Cell_Measurement
  • LTE_RRC_Serv_Cell_Info
  • LTE_PHY_PDSCH_Stat_Indication
  • LTE_PHY_PUSCH_Tx_Report

The merged data is based on the signal strength and quality PHY Serving Cell information for both primary and secondary cells (marked as PCell and SCell, respectively). This has been enriched with RRC information on the cells and the aggregated number of allocated resource blocks and transport block size in downlink and uplink from PDSCH/PUSCH, respectively. The shared channel information also allows us to include the transmitted power in uplink and the modulation and coding scheme (MCS) in downlink.

More details about the preprocessing of MobileInsight data can be found under the mi folder.

Iperf

Iperf is a speed test application that enables measuring the bandwidth and jitter of a UDP or TCP connection.

In the measurement campaign, iperf was run on both a DME and a server to record throughput measurements with a granularity of 1 s. For experiments that require high accuracy, note that the iperf measurements merged into the cellular dataframe are extracted from the destination, i.e., the DME for downlink and the server for uplink.

Ping

Delay measurements were collected with the console command ping.

RUDE

The packets were transmitted according to scenarios S1 and S2:

  • S1: Cooperative awareness messages (CAM) of length 69 bytes at a packet transmission rate of 20 Hz and modulation and coding scheme (MCS) 8 using two sub-channels.
  • S2: Collective perception messages (CPM) of length 1000 bytes (including IP header and payload) at a rate of 50 Hz with MCS 12 using ten sub-channels.

We aggregate the information on the received packets into 1-second intervals to estimate the packet error rate. For detailed insight into the sidelink signal parameters, check the dataset publication or RUDE's documentation.
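
One way to picture the 1-second aggregation is to count received packets per second and compare the count with the nominal transmission rate (20 Hz for S1, 50 Hz for S2). The sketch below does this on synthetic timestamps. The function name and the formula PER = 1 - received / expected are assumptions for illustration and may differ from the actual preprocessing.

```python
import pandas as pd


def packet_error_rate(rx_times: pd.Series, tx_rate_hz: float) -> pd.Series:
    """Estimate the per-second PER from packet reception timestamps.

    PER = 1 - received / expected, where `expected` is the nominal
    transmission rate (e.g. 20 packets per second for S1).
    The result is clipped to the range [0, 1].
    """
    ones = pd.Series(1, index=pd.DatetimeIndex(rx_times))
    received = ones.resample("1s").sum()  # packets received in each second
    return (1.0 - received / tx_rate_hz).clip(lower=0.0, upper=1.0)


# Synthetic demo: 2 s of 20 Hz transmission where every 4th packet is lost.
t0 = pd.Timestamp("2000-01-01 10:00:00")
sent = pd.date_range(t0, periods=40, freq="50ms")
received = pd.Series(sent[[i for i in range(40) if i % 4 != 0]])
per = packet_error_rate(received, tx_rate_hz=20)
print(per)
```

Seconds before the first and after the last received packet are not covered by this sketch, so a complete outage at the edges of a trace would go unnoticed.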

TCP Dump

TCP Dump is a packet analyzer that allows tracking transmitted packets and their properties (e.g. payload, size of the packet). The received sidelink packets were decoded with TCPdump and parsed into parquet files for every receiver. The sidelink data extracted from the incoming messages for any given sidelink UE are provided as separate parquet files.

GPS

GPS data is collected for each device with a granularity of 1 second.

The GPS traces are enriched with the variable Pos in Ref Round, i.e., the position in the reference round. This variable maps latitude and longitude onto the distance driven by an arbitrarily chosen car from an arbitrarily chosen point in an arbitrarily chosen round. In this way, Pos in Ref Round allows wireless parameters to be analyzed against a single one-dimensional spatial value (see add_pos_in_ref_round.ipynb for details). For the sidelink data, the distance between cars is also provided.
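
The idea behind Pos in Ref Round can be sketched as a nearest-neighbour lookup: each position is assigned the cumulative driven distance of the closest sample on the reference round. The code below is a simplified illustration on synthetic planar coordinates in metres. The function name and the nearest-sample rule are assumptions, and the actual notebook may work differently.

```python
import numpy as np


def pos_in_ref_round(ref_xy: np.ndarray, query_xy: np.ndarray) -> np.ndarray:
    """Map planar points onto the distance driven along a reference track.

    ref_xy:   (N, 2) planar coordinates in metres, in driving order.
    query_xy: (M, 2) planar coordinates in metres of the points to map.
    Returns the cumulative driven distance of the nearest reference sample.
    """
    steps = np.linalg.norm(np.diff(ref_xy, axis=0), axis=1)
    driven = np.concatenate([[0.0], np.cumsum(steps)])
    # Pairwise distances of shape (M, N); fine for a sketch, but a KD-tree
    # would scale better for long traces.
    dists = np.linalg.norm(query_xy[:, None, :] - ref_xy[None, :, :], axis=2)
    return driven[np.argmin(dists, axis=1)]


# Synthetic L-shaped reference round sampled every 10 m.
ref = np.array([[0, 0], [10, 0], [20, 0], [20, 10], [20, 20]], dtype=float)
query = np.array([[11.0, 2.0], [21.0, 19.0]])
print(pos_in_ref_round(ref, query))
```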

Distance is computed in all cases after conversion to planar coordinates for simplicity. The error should be negligible due to the small area that is covered by the data.
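
To get a feeling for the size of this error, the sketch below compares a planar distance with the great-circle (haversine) distance for two points roughly 1.5 km apart. The equirectangular projection is an assumption, since the projection used for the dataset is not stated here, and the coordinates are illustrative points rather than dataset samples.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def to_planar(lat_deg, lon_deg, lat0_deg, lon0_deg):
    """Equirectangular projection around (lat0, lon0); returns (x, y) in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg - lon0_deg) * math.cos(math.radians(lat0_deg))
    y = EARTH_RADIUS_M * math.radians(lat_deg - lat0_deg)
    return x, y


def planar_distance(p1, p2, origin):
    """Euclidean distance in metres between two (lat, lon) points after projection."""
    x1, y1 = to_planar(*p1, *origin)
    x2, y2 = to_planar(*p2, *origin)
    return math.hypot(x2 - x1, y2 - y1)


def haversine_distance(p1, p2):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


origin = (52.52, 13.405)   # illustrative projection centre
car_a = (52.5200, 13.4050)
car_b = (52.5300, 13.4200)
d_planar = planar_distance(car_a, car_b, origin)
d_sphere = haversine_distance(car_a, car_b)
print(f"planar: {d_planar:.1f} m, haversine: {d_sphere:.1f} m")
```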

The merged GPS traces on IEEE Dataport also include the side information from the HERE and DarkSky APIs.

HERE API

The information about traffic density was downloaded from the HERE Traffic API every 5 minutes during the measurements.

To determine the Traffic Jam Factor at a given location, the closest route in the API data was calculated. For debugging purposes, the name of the street from which this Traffic Jam Factor was taken is saved in the Traffic Street Name column, and the distance to this street is saved in the Traffic Distance column.
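
The closest-route lookup can be sketched as a point-to-polyline distance search that returns the three values stored in the dataframe. The `routes` structure below, with the keys `name`, `jam_factor` and `shape`, is hypothetical and does not mirror the HERE API response format. Coordinates are assumed to be planar metres.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all in planar metres."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to the range [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def closest_route(position, routes):
    """Return (jam_factor, street_name, distance) of the route closest to position."""
    best = None
    for route in routes:
        shape = route["shape"]  # polyline with at least two points
        dist = min(point_segment_distance(position, a, b)
                   for a, b in zip(shape, shape[1:]))
        if best is None or dist < best[2]:
            best = (route["jam_factor"], route["name"], dist)
    return best


# Synthetic routes; names and jam factors are made up for the example.
routes = [
    {"name": "Street A", "jam_factor": 2.5, "shape": [(0, 0), (100, 0)]},
    {"name": "Street B", "jam_factor": 7.0, "shape": [(0, 50), (100, 50), (100, 150)]},
]
print(closest_route((40.0, 10.0), routes))
```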

DarkSky

The information about Cloud cover, Humidity, Precipitation Intensity & Probability, Temperature, Pressure and Wind Speed was downloaded from the DarkSky API.

Time granularity is 1 hour and location granularity is 0.01 degrees in both latitude and longitude.
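
Given these granularities, a GPS sample can be matched to a weather record by flooring its timestamp to the hour and binning its coordinates to hundredths of a degree. The sketch below shows one way to do that in pandas. The column names, the rounding rule for the bins and the temperature values are assumptions for illustration, not taken from the dataset.

```python
import pandas as pd


def attach_weather(gps: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Left-join hourly, 0.01-degree weather records onto GPS samples.

    Coordinates are binned as integer hundredths of a degree to avoid
    comparing floats for equality. All column names are illustrative.
    """
    keyed = gps.assign(
        hour=gps["timestamp"].dt.floor("h"),
        lat_bin=(gps["latitude"] * 100).round().astype("int64"),
        lon_bin=(gps["longitude"] * 100).round().astype("int64"),
    )
    return keyed.merge(weather, on=["hour", "lat_bin", "lon_bin"], how="left")


# Synthetic demo data (arbitrary date, made-up temperatures).
gps = pd.DataFrame({
    "timestamp": pd.to_datetime(["2000-01-01 10:15:00", "2000-01-01 11:05:00"]),
    "latitude": [52.5234, 52.5178],
    "longitude": [13.4012, 13.3989],
})
weather = pd.DataFrame({
    "hour": pd.to_datetime(["2000-01-01 10:00:00", "2000-01-01 11:00:00"]),
    "lat_bin": [5252, 5252],
    "lon_bin": [1340, 1340],
    "temperature": [21.5, 23.0],
})
enriched = attach_weather(gps, weather)
print(enriched[["timestamp", "temperature"]])
```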

Reference

ITU AI/ML in 5G Challenge

The Berlin V2X dataset forms the problem statement "Multi-environment automotive QoS prediction using AI/ML" at ITU AI/ML in 5G Challenge 2023. Read more on ml5g-2023.

Examples

Publication figures

The code to generate the publication figures can be found in the analyze folder.

AI4Mobile

AI4Mobile is a research project funded by the Federal Ministry for Education and Research (BMBF), from the announcement Artificial Intelligence in Communication Networks within the scope of the High-Tech Strategy of the German Federal Government.

The scope of the project is the study of AI-aided wireless systems for mobility in industry and traffic. More information at ai4mobile.org.

Citation

If you use the dataset, please cite it as:

@inproceedings{hernangomez2023berlin,
  title = {Berlin {{V2X}}: {{A Machine Learning Dataset}} from {{Multiple Vehicles}} and {{Radio Access Technologies}}},
  shorttitle = {Berlin {{V2X}}},
  booktitle = {2023 {{IEEE}} 97th {{Vehicular Technology Conference}} ({{VTC2023-Spring}})},
  author = {Hernang{\'o}mez, Rodrigo and Geuer, Philipp and Palaios, Alexandros and Sch{\"a}ufele, Daniel and Watermann, Cara and {Taleb-Bouhemadi}, Khawla and Parvini, Mohammad and Krause, Anton and Partani, Sanket and Vielhaus, Christian and Kasparick, Martin and K{\"u}lzer, Daniel F. and Burmeister, Friedrich and Fitzek, Frank H. P. and Schotten, Hans D. and Fettweis, Gerhard and Sta{\'n}czak, S{\l}awomir},
  year = {2023},
  month = jun,
  pages = {1--5},
  address = {{Florence, Italy}},
  issn = {2577-2465},
  doi = {10.1109/VTC2023-Spring57618.2023.10200750},
  copyright = {All rights reserved},
}

License:Creative Commons Zero v1.0 Universal

