GRaF: Generalizable Radio-Frequency Radiance Fields for Spatial Spectrum Synthesis

Kang Yang1, Yuning Chen1, Wan Du1

1 UC Merced

CVPR 2026

GRaF Overview

GRaF Overview: training scene, testing scene, and synthesized spatial spectra

In the training scene, Radio-Frequency (RF) signals from each transmitter are measured across all surrounding directions by the receiver to form a spatial spectrum. Trained on this scene, GRaF synthesizes spectra for arbitrary transmitter locations in unseen scenes.

Abstract

We present GRaF, Generalizable Radio-Frequency (RF) Radiance Fields, a framework that models RF signal propagation to synthesize spatial spectra at arbitrary transmitter or receiver locations, where each spectrum measures signal power across all surrounding directions at the receiver. Synthesizing spatial spectra is a fundamental capability for a broad range of wireless networking and sensing tasks, including indoor localization, beam management, and channel estimation. Existing neural radiance field (NeRF)-based methods can produce high-fidelity spectra but must be retrained from scratch for every new scene, which severely limits their scalability and practical deployment.

GRaF achieves generalization by exploiting an interpolation theory: the spatial spectrum at a target transmitter can be well approximated using spectra from a small set of geographically proximate transmitters. Two key components realize this idea: (1) a geometry-aware Transformer encoder that captures spatial correlations among neighboring transmitters to learn scene-independent latent representations, and (2) a neural ray tracing algorithm that, given the latent representations, estimates the spatial spectrum at the receiver by aggregating directional contributions along rays. Across multiple wireless technologies and environments, GRaF generalizes to previously unseen scenes while preserving the synthesis quality of state-of-the-art per-scene methods, and substantially reduces the deployment cost.
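To make the interpolation idea concrete, here is a minimal sketch of a naive baseline: pick the k nearest transmitters and blend their spectra with inverse-distance weights. This is not GRaF's method, which replaces the fixed weighting with a learned Transformer encoder. The function names, the value of `k`, and the array layout (one 2D spectrum per transmitter) are assumptions made for illustration.

```python
import numpy as np

def nearest_neighbors(target_pos, tx_positions, k):
    """Return indices and distances of the k transmitters closest to target_pos."""
    dists = np.linalg.norm(np.asarray(tx_positions) - np.asarray(target_pos), axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

def idw_spectrum(target_pos, tx_positions, tx_spectra, k=3, eps=1e-9):
    """Inverse-distance-weighted blend of the k nearest neighbor spectra.

    tx_positions: (N, 3) transmitter coordinates (assumed layout).
    tx_spectra:   (N, H, W) one spatial spectrum per transmitter (assumed layout).
    """
    idx, d = nearest_neighbors(target_pos, tx_positions, k)
    weights = 1.0 / (d + eps)      # closer neighbors count more
    weights /= weights.sum()       # normalize so the weights sum to 1
    # Contract the neighbor axis: (k,) x (k, H, W) -> (H, W)
    return np.tensordot(weights, np.asarray(tx_spectra)[idx], axes=1)
```

A fixed weighting like this ignores walls and reflectors between transmitters, which is one reason a learned, geometry-aware fusion may be preferable.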

Architecture

GRaF architecture: wireless scene representation, latent voxel features, and neural-driven ray tracing

GRaF is built around two cooperating modules. The wireless scene representation (left) takes the target transmitter's position together with the positions and measured spectra of a small set of geographically proximate neighbor transmitters. A ResNet-18 feature extractor encodes each neighbor's spectrum, a Transformer encoder fuses the neighbors with respect to the target's geometry, and a radiance & attenuation MLP produces a scene-independent latent voxel feature v_{i,s}, where i indexes the M rays cast from the receiver and s denotes the voxel's position along the i-th ray.
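The ray and voxel indexing can be sketched as follows: cast M rays from the receiver and sample a fixed number of voxel positions along each one, so that every latent feature has a well-defined (i, s) location. The angular grid (full azimuth, upper-hemisphere elevation), the uniform depth spacing, and all function and parameter names are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np

def ray_directions(n_azimuth, n_elevation):
    """Unit direction vectors on an azimuth x elevation grid, shape (M, 3)."""
    az = np.deg2rad(np.arange(n_azimuth) * 360.0 / n_azimuth)
    el = np.deg2rad(np.linspace(0.0, 90.0, n_elevation))  # assumed upper hemisphere
    az, el = np.meshgrid(az, el, indexing="ij")
    dirs = np.stack(
        [np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)], axis=-1
    )
    return dirs.reshape(-1, 3)  # M = n_azimuth * n_elevation rays

def sample_voxels(rx_pos, dirs, n_samples, max_range):
    """Voxel positions along each ray, shape (M, S, 3); entry [i, s] is voxel s on ray i."""
    rx_pos = np.asarray(rx_pos, dtype=float)
    depths = np.linspace(max_range / n_samples, max_range, n_samples)
    return rx_pos[None, None, :] + dirs[:, None, :] * depths[None, :, None]
```

In this layout, a per-voxel feature tensor would have shape (M, S, D) for some feature width D, with one feature vector per sampled position.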

The neural-driven ray tracing module (right) consumes these latent voxel features and synthesizes the spatial spectrum at the receiver. A multi-head Transformer aggregates voxel contributions along each ray, and a real-and-imaginary pooling step combines them as complex-valued signals so that constructive and destructive interference are preserved. The predicted spectrum is supervised against the ground-truth measurement with an L2 loss. Because the latent representation is conditioned only on transmitter geometry and neighbor spectra rather than on any specific room, the trained model transfers to unseen scenes without per-scene retraining.
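A small numerical sketch shows why combining contributions as complex-valued signals matters. Summing phasors before taking power lets out-of-phase contributions cancel, whereas summing per-contribution powers cannot. The functions below are illustrative assumptions, not the model's actual pooling layer; `l2_loss` is written here as a mean squared error, which is one common reading of "L2 loss".

```python
import numpy as np

def coherent_power(amplitudes, phases):
    """Sum contributions as complex phasors along the last axis, then take power."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    phases = np.asarray(phases, dtype=float)
    field = np.sum(amplitudes * np.exp(1j * phases), axis=-1)
    return np.abs(field) ** 2

def incoherent_power(amplitudes):
    """Sum per-contribution powers; phase information is discarded."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.sum(amplitudes ** 2, axis=-1)

def l2_loss(pred, target):
    """Mean squared error between predicted and measured spectra."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((pred - target) ** 2)
```

With two unit-amplitude contributions, the coherent sum gives a power near 0 when they are half a cycle apart and 4 when they are in phase, while the incoherent sum gives 2 in both cases.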

Qualitative Results

Visual comparison of synthesized spatial spectra at multiple receiver positions in an unseen scene. GRaF reproduces the directional structure of the ground-truth spectra more faithfully than the per-scene NeRF^2 baseline.

Qualitative results: ground-truth vs. NeRF^2 vs. GRaF spatial spectra across seven receiver positions

BibTeX

@inproceedings{Yang2026_GRaF,
  author    = {Kang Yang and Yuning Chen and Wan Du},
  title     = {Generalizable Radio-Frequency Radiance Fields for Spatial Spectrum Synthesis},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}