The dataset used for evaluation, which contains 300k image pairs across three diverse simulation scenarios, is publicly available via Zenodo and through Hugging Face Datasets.
Urban Sound Propagation: [sound_baseline, sound_reflection, sound_diffraction, sound_combined]
Each sound example includes: `lat`, `long`, `db`, `soundmap`, `osm`, `soundmap_512`, `temperature`, `humidity`, `yaw`, and `sample_id`.
Lens Distortion: [lens_p1, lens_p2]
Each lens example includes: `fx`, `k1`, `k2`, `k3`, `p1`, `p2`, `cx`, and `label_path`.
Dynamics of rolling and bouncing movements: [ball_roll, ball_bounce]
Each ball example includes: `ImgName`, `StartHeight`, `GroundIncli`, `InputTime`, `TargetTime`, `input_image`, and `target_image`.
Data is divided into `train`, `test`, and `eval` splits. For efficient storage and faster uploads, the data is converted to Parquet files, with image data stored as binary blobs.
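If you prefer to work with the Parquet files directly (for example, from the Zenodo download) rather than through the Hugging Face loader shown next, a minimal sketch along the following lines can decode an image column back into a PIL image. The shard path and the assumption that image columns hold encoded image bytes are illustrative only, not guarantees of the published layout.

```python
import io

import pandas as pd
from PIL import Image

# Hypothetical shard path; actual file names depend on the download.
df = pd.read_parquet("data/sound_combined/train.parquet")

row = df.iloc[0]
# Assumes image columns such as "osm" and "soundmap_512" hold encoded image bytes;
# depending on how the blobs are stored, you may need row["osm"]["bytes"] instead.
osm_img = Image.open(io.BytesIO(row["osm"]))
target_img = Image.open(io.BytesIO(row["soundmap_512"]))
print(osm_img.size, target_img.size)
```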
You can load and use the dataset with the Hugging Face `datasets` library. For example, to load the `sound_combined` variant:

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

dataset = load_dataset("mspitzna/physicsgen", name="sound_combined", trust_remote_code=True)

# Access a sample from the training split.
sample = dataset["train"][0]
input_img = sample["osm"]
target_img = sample["soundmap_512"]

# Plot input vs. target image for a single sample.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
ax1.imshow(input_img)
ax2.imshow(target_img)
plt.show()
```
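The same loading call works for the other variants; only the `name` argument changes. For example (variant names taken from the lists above):

```python
from datasets import load_dataset

# Lens distortion and ball dynamics variants follow the same pattern.
lens_ds = load_dataset("mspitzna/physicsgen", name="lens_p1", trust_remote_code=True)
ball_ds = load_dataset("mspitzna/physicsgen", name="ball_roll", trust_remote_code=True)
```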
The code is located in the GitHub repository.
Project Structure:

```
project_root/
│
├── data/                             # Dedicated data folder
│   └── urban_sound_25k_baseline/     # Download this via the provided DOI
│       ├── test/
│       │   ├── test.csv
│       │   ├── soundmaps/
│       │   └── buildings/
│       │
│       └── pred/                     # Your predictions
│           ├── y_0.png
│           └── ...
│
└── eval_scripts/
    ├── lens_metrics.py
    └── sound_metrics.py
```
The indexing system for predicted sound propagation images in the `pred` folder aligns directly with the rows of the `test.csv` dataframe. Each predicted image file, named `y_{index}.png`, corresponds to the test data's row at the same index, with index 0 referring to the dataframe's first row.
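As a minimal sketch of this convention, assuming the directory layout above, the snippet below writes one prediction per `test.csv` row; the all-zero image is merely a placeholder for your model's output.

```python
import os

import numpy as np
import pandas as pd
from PIL import Image

test_csv = "data/urban_sound_25k_baseline/test/test.csv"
pred_dir = "data/urban_sound_25k_baseline/pred"
os.makedirs(pred_dir, exist_ok=True)

df = pd.read_csv(test_csv)
for index in range(len(df)):
    # Placeholder prediction: replace with your model's output for this row.
    prediction = np.zeros((512, 512), dtype=np.uint8)
    # File name must follow the y_{index}.png convention, index 0 = first row.
    Image.fromarray(prediction).save(os.path.join(pred_dir, f"y_{index}.png"))
```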
Description: `sound_metrics.py` evaluates sound propagation predictions by comparing them to ground truth noise maps, reporting Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) errors.

Usage:

```bash
python sound_metrics.py --data_dir data/true --pred_dir data/pred --output evaluation.csv
```

Arguments:

- `--data_dir`: Directory containing the true sound maps and `test.csv`.
- `--pred_dir`: Directory containing the predicted sound maps.
- `--output`: Path to save the evaluation results.

Description: `lens_metrics.py` evaluates the accuracy of facial landmark predictions by comparing them to ground truth images.
Usage:

```bash
python lens_metrics.py --data_dir data/true --pred_dir data/pred --output results/
```

Arguments:

- `--data_dir`: Directory containing the true label images and `test.csv`.
- `--pred_dir`: Directory containing the predicted landmark images.
- `--output`: Directory to save the results.

The table below presents baseline performance metrics for various architectural approaches, reporting mean absolute error (MAE) and weighted mean absolute percentage error (wMAPE) separately for line-of-sight (LoS) and non-line-of-sight (NLoS) regions (a rough sketch of these metric definitions follows the table).
Condition | Architecture | LoS MAE | NLoS MAE | LoS wMAPE | NLoS wMAPE | Runtime / Sample (ms) |
---|---|---|---|---|---|---|
Baseline | Simulation | 0.00 | 0.00 | 0.00 | 0.00 | 204700 |
Baseline | convAE | 3.67 | 2.74 | 20.24 | 67.13 | 0.128 |
Baseline | VAE | 3.92 | 2.84 | 21.33 | 75.58 | 0.124 |
Baseline | UNet | 2.29 | 1.73 | 12.91 | 37.57 | 0.138 |
Baseline | Pix2Pix | 1.73 | 1.19 | 9.36 | 6.75 | 0.138 |
Baseline | DDPM | 2.42 | 3.26 | 15.57 | 51.08 | 3986.353 |
Baseline | SD(w.CA) | 3.76 | 3.34 | 17.42 | 35.18 | 2961.027 |
Baseline | SD | 2.12 | 1.08 | 13.23 | 32.46 | 2970.86 |
Baseline | DDBM | 1.61 | 2.17 | 17.50 | 65.24 | 3732.21 |
Diffraction | Simulation | 0.00 | 0.00 | 0.00 | 0.00 | 206000 |
Diffraction | convAE | 3.59 | 8.04 | 13.77 | 32.09 | 0.128 |
Diffraction | VAE | 3.92 | 8.22 | 14.46 | 32.57 | 0.124 |
Diffraction | UNet | 0.94 | 3.27 | 4.22 | 22.36 | 0.138 |
Diffraction | Pix2Pix | 0.91 | 3.36 | 3.51 | 18.06 | 0.138 |
Diffraction | DDPM | 1.59 | 3.27 | 8.25 | 20.30 | 3986.353 |
Diffraction | SD(w.CA) | 2.46 | 7.72 | 10.14 | 31.23 | 2961.027 |
Diffraction | SD | 1.33 | 5.07 | 8.15 | 24.45 | 2970.86 |
Diffraction | DDBM | 1.35 | 3.35 | 11.22 | 23.56 | 3732.21 |
Reflection | Simulation | 0.00 | 0.00 | 0.00 | 0.00 | 251000 |
Reflection | convAE | 3.83 | 6.56 | 20.67 | 93.54 | 0.128 |
Reflection | VAE | 4.15 | 6.32 | 21.57 | 92.47 | 0.124 |
Reflection | UNet | 2.29 | 5.72 | 12.75 | 80.46 | 0.138 |
Reflection | Pix2Pix | 2.14 | 4.79 | 11.30 | 30.67 | 0.138 |
Reflection | DDPM | 2.74 | 7.93 | 17.85 | 80.38 | 3986.353 |
Reflection | SD(w.CA) | 3.81 | 6.82 | 19.78 | 81.61 | 2961.027 |
Reflection | SD | 2.53 | 5.26 | 15.04 | 55.27 | 2970.86 |
Reflection | DDBM | 1.93 | 6.38 | 18.34 | 79.13 | 3732.21 |
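As a rough illustration of the metric definitions used above, the sketch below computes MAE and wMAPE over a boolean region mask, assuming the standard wMAPE definition; the exact LoS/NLoS masking and any weighting applied in `sound_metrics.py` may differ.

```python
import numpy as np

def masked_mae_wmape(true_map, pred_map, mask, eps=1e-8):
    """Illustrative MAE and wMAPE over the pixels selected by `mask`.
    Assumes the standard wMAPE definition sum|err| / sum|true| * 100;
    the benchmark's official implementation may differ."""
    t = true_map[mask].astype(np.float64)
    p = pred_map[mask].astype(np.float64)
    abs_err = np.abs(t - p)
    mae = abs_err.mean()
    wmape = 100.0 * abs_err.sum() / (np.abs(t).sum() + eps)
    return mae, wmape

# Example with random data and a dummy "line-of-sight" mask.
rng = np.random.default_rng(0)
true_map = rng.uniform(0, 100, size=(256, 256))
pred_map = true_map + rng.normal(0, 2, size=(256, 256))
los_mask = np.zeros((256, 256), dtype=bool)
los_mask[:, :128] = True
print(masked_mae_wmape(true_map, pred_map, los_mask))
```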
The table presents a comparative analysis of different models' performance in accurately predicting facial landmarks under varying lens distortion settings, represented by the coefficients $p_1$ and $p_2$ (see the sketch after the table). It details the combined error, X Error, Y Error, and Shift for each model, highlighting how each model copes with horizontal and vertical distortion impacts separately.
Model | Comb. | X Err. | Y Err. | Shift | Runtime / Sample (ms) |
---|---|---|---|---|---|
$p_1 \neq 0, p_2 = 0$ | |||||
Simulation | 0.00 | 0.00 | 0.00 | 0.00 | 153.205 |
convAE | 11.93 | 6.75 | 8.13 | 1.38 | 0.110 |
VAE | 11.53 | 6.55 | 7.83 | 1.28 | 0.122 |
UNet | 2.82 | 1.28 | 2.15 | 0.87 | 0.118 |
Pix2Pix | 2.00 | 0.99 | 1.43 | 0.44 | 0.122 |
DDPM | 1.93 | 0.94 | 1.39 | 0.45 | 3970.603 |
SD(w.CA) | 3.09 | 1.59 | 2.21 | 0.62 | 2991.678 |
SD | 2.79 | 1.41 | 2.01 | 0.60 | 2997.576 |
$p_1 = 0, p_2 \neq 0$ | |||||
Simulation | 0.00 | 0.00 | 0.00 | 0.00 | 153.205 |
convAE | 10.56 | 8.35 | 4.77 | 2.21 | 0.110 |
VAE | 10.40 | 8.26 | 4.62 | 3.64 | 0.122 |
UNet | 2.36 | 1.33 | 1.60 | 0.27 | 0.117 |
Pix2Pix | 1.77 | 1.02 | 1.14 | 0.13 | 0.123 |
DDPM | 2.13 | 1.39 | 1.23 | 0.16 | 3970.603 |
SD(w.CA) | 2.85 | 1.60 | 1.94 | 0.34 | 2991.678 |
SD | 2.44 | 1.38 | 1.64 | 0.26 | 2997.576 |
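For context, $p_1$ and $p_2$ are the tangential coefficients of the standard Brown-Conrady distortion model (with $k_1$, $k_2$, $k_3$ radial, $f_x$ the focal length, and $c_x$ the principal point). The sketch below assumes this conventional formulation, which is not necessarily the simulator's exact implementation, to show why $p_1$ mainly produces vertical shifts and $p_2$ mainly horizontal ones.

```python
def distort_points(x, y, k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Brown-Conrady radial + tangential distortion on normalized image
    coordinates; an assumed convention for illustration only."""
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_d = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return x_d, y_d

# p1 mainly shifts points vertically, p2 mainly horizontally,
# matching the separate horizontal/vertical impacts discussed above.
print(distort_points(0.2, 0.2, p1=0.05))  # y shifts more than x
print(distort_points(0.2, 0.2, p2=0.05))  # x shifts more than y
```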
The table evaluates the performance of the generative baselines (convAE, VAE, UNet, Pix2Pix, DDPM, and the SD variants) on four key error metrics: Position X, Position Y, Rotation, and Roundness. These metrics assess each model's ability to accurately predict ball position, rotation, and shape in a controlled simulation environment, highlighting their precision in handling geometric distortions.
Model | Position X | Position Y | Rotation | Roundness | Error |
---|---|---|---|---|---|
convAE | 4.24 $\pm$ 3.9 | 6.08 $\pm$ 5.9 | 12.2 $\pm$ 8.6 | 1.06 $\pm$ 0.0 | 99% |
VAE | 4.69 $\pm$ 6.1 | 6.25 $\pm$ 6.9 | 31.0 $\pm$ 40 | 0.90 $\pm$ 0.1 | 95% |
UNet | 5.53 $\pm$ 7.5 | 10.8 $\pm$ 12 | 15.2 $\pm$ 23 | 0.74 $\pm$ 0.2 | 28% |
Pix2Pix | 6.28 $\pm$ 8.0 | 11.7 $\pm$ 13 | 17.2 $\pm$ 21 | 0.56 $\pm$ 0.1 | 11% |
DDPM | 7.91 $\pm$ 9.0 | 15.5 $\pm$ 14 | 32.9 $\pm$ 34 | 0.61 $\pm$ 0.2 | 5.7% |
SD(w.CA) | 40.0 $\pm$ 49 | 24.8 $\pm$ 23 | 61.1 $\pm$ 52 | 0.53 $\pm$ 0.2 | 7.3% |
SD | 8.55 $\pm$ 12 | 16.2 $\pm$ 14 | 34.2 $\pm$ 38 | 0.47 $\pm$ 0.1 | 2% |
We welcome contributions from the research community! If you have conducted research on sound propagation prediction and have results that outperform those listed in our leaderboard, or if you’ve developed a new architectural approach that shows promise, we invite you to share your findings with us.
Please submit your results, along with a link to your publication, to martin.spitznagel@hs-offenburg.de. Submissions should include detailed performance metrics and a description of the methodology used. Accepted contributions will be updated on the leaderboard.
Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025.
The preprint is available here: https://arxiv.org/abs/2503.05333
This dataset is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).
We express our gratitude for the financial support provided by the German Federal Ministry of Education and Research (BMBF). This project is part of the “Forschung an Fachhochschulen in Kooperation mit Unternehmen (FH-Kooperativ)” program, within the joint project KI-Bohrer, and is funded under the grant number 13FH525KX1.