Abstract

In this work, we introduce SCALAR-NeRF, a novel framework tailored for scalable large-scale neural scene reconstruction. We structure the neural representation as an encoder-decoder architecture, where the encoder maps 3D point coordinates to encoded features and the decoder produces geometric values, namely volume densities or signed distances, together with colors. Our approach first trains a coarse global model on the entire image dataset. Subsequently, we partition the images into smaller blocks using KMeans, with each block modeled by a dedicated local model. We enlarge the bounding box of each local block to increase the overlapping regions across different blocks. Notably, the decoder from the global model is shared across distinct blocks, thereby promoting alignment in the feature space of the local encoders. We further propose an effective and efficient methodology to fuse the outputs of these local models into the final reconstruction. Employing this coarse-to-fine strategy, our method outperforms state-of-the-art NeRF methods and demonstrates scalability for large-scale scene reconstruction.
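
The sketch below illustrates the coarse-to-fine pipeline described above. It is a minimal, non-authoritative illustration in PyTorch: it assumes the KMeans partition operates on camera positions, that each block's bounding box is enlarged by a fixed scale factor, and that the global decoder is frozen while the local encoders are trained. The names and sizes (`Encoder`, `SharedDecoder`, `partition_cameras`, `feat_dim`, `bbox_scale`) are hypothetical and not taken from the paper.

```python
# Minimal sketch of the SCALAR-NeRF-style pipeline (assumptions noted above).
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class Encoder(nn.Module):
    """Hypothetical MLP that maps 3D point coordinates to encoded features."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))

    def forward(self, xyz):
        return self.net(xyz)


class SharedDecoder(nn.Module):
    """Decodes features into a geometry value (density or signed distance) and RGB color."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 4))

    def forward(self, feat):
        out = self.net(feat)
        return out[..., :1], torch.sigmoid(out[..., 1:])  # geometry, color


def partition_cameras(cam_positions, num_blocks=3, bbox_scale=1.2):
    """Cluster camera positions with KMeans; return per-block image masks and
    bounding boxes enlarged by `bbox_scale` so neighboring blocks overlap."""
    labels = KMeans(n_clusters=num_blocks, n_init=10).fit_predict(cam_positions)
    blocks = []
    for k in range(num_blocks):
        pts = torch.as_tensor(cam_positions[labels == k], dtype=torch.float32)
        center = (pts.max(0).values + pts.min(0).values) / 2
        half = (pts.max(0).values - pts.min(0).values) / 2 * bbox_scale
        blocks.append((labels == k, torch.stack([center - half, center + half])))
    return blocks


# The decoder trained with the coarse global model is frozen and shared by
# every block, so all local encoders are trained against the same feature space.
shared_decoder = SharedDecoder()
for p in shared_decoder.parameters():
    p.requires_grad_(False)

cam_positions = np.random.rand(100, 3)          # placeholder camera centers
blocks = partition_cameras(cam_positions)       # KMeans blocks with enlarged boxes
local_encoders = [Encoder() for _ in blocks]    # one dedicated local encoder per block
```

In this sketch, sharing a single frozen decoder forces every local encoder to emit features in the same space, which is what makes the subsequent fusion of per-block outputs into the final reconstruction straightforward.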

Qualitative Analysis on the Tanks-and-Temples dataset



Ground Truth | NeRF | NGP (size=2^21) | Block-NeRF | Ours

Qualitative Results for individual blocks and the final fused model





Block 0 | Block 1 | Block 2 | SCALAR-NeRF