Papers summary: Neural Fields in Visual Computing and Beyond

Contents

Applications of the neural fields method
Foundation of the neural fields method

We present a summary of a review article that draws on over 250 papers about neural fields and runs to 36 pages. Let’s compress it even further for readers in a hurry.

The paper presents the foundations and applications of the neural fields method. The method was introduced in 1998, but it has gained traction only in the last two years thanks to its usefulness in visual and 3D computing. The review's authors also include articles that use the neural fields method without naming it explicitly.

Let’s start off by showing you the carrot, that is, the method's capabilities.

Applications of the neural fields method

Reconstruction of a 3D shape
Differentiable rendering was a huge improvement to the neural fields methodology. A 3D object can be reconstructed from 2D camera footage or sparse 3D LiDAR point clouds. This technique allows the reconstruction of a 3D shape even from a single 2D image. Exemplary results can be seen here.
Dynamic reconstruction of a 3D scene
Especially impressive results of the neural fields method are presented here. The model allows for image segmentation with remarkable temporal consistency. The network can also produce depth maps. It even allows scene editing, i.e. removing some of the objects from the scene or altering their size and orientation. These are truly outstanding results.
Synthesis of human shape
The method is used to render both the 3D face and the body of a human. A human model can be created from a few images, and novel views of the human can then be generated. See an example here.
Simultaneous Localization and Mapping (SLAM)
Neural fields can also be used to build 3D maps of an environment, which is a handy skill for mobile robots. In this example, the neural network is trained continuously while the robot moves around the environment. If you want to learn more about the SLAM problem, you can check our SLAM series here.

This is only the tip of the iceberg of possible uses of the neural fields method. For a much more extensive list, see the original article.

Is the carrot big enough for you? Follow it to learn the foundations of the method.

Foundation of the neural fields method

In physics, a field is a quantity defined for all spatial and/or temporal coordinates. The quantity can be either a scalar or a vector. The gravitational field is a good physical example of a vector field, i.e. for every location in space, we have a force (a 3D vector) that is exerted on a unit mass. Images and audio recordings can also be interpreted as fields. An image maps 2D coordinates to an RGB intensity vector [R, G, B], while an audio recording maps a time coordinate to a scalar amplitude.
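
To make the definition concrete, here is a minimal sketch of the three fields mentioned above written as plain Python functions of their coordinates. The function names, the 440 Hz tone and the synthetic gradient image are our own illustrative assumptions, and the Earth is treated as a point mass at the origin.

```python
import math

GM_EARTH = 3.986004418e14  # m^3 / s^2, standard gravitational parameter of Earth


def gravity_field(x, y, z):
    """Vector field: force per unit mass (m/s^2) at position (x, y, z) in metres.

    Assumption: the Earth is a point mass located at the origin.
    """
    r = math.sqrt(x * x + y * y + z * z)
    scale = -GM_EARTH / r**3  # points back towards the origin
    return (scale * x, scale * y, scale * z)


def audio_field(t, frequency_hz=440.0):
    """Scalar field: amplitude of a pure tone at time t in seconds (illustrative)."""
    return math.sin(2.0 * math.pi * frequency_hz * t)


def image_field(u, v):
    """Vector field: a synthetic gradient image mapping (u, v) in [0, 1]^2 to [R, G, B]."""
    return [u, v, 1.0 - u]


# Query each field at one coordinate.
print(gravity_field(6.371e6, 0.0, 0.0))  # roughly at the Earth's surface
print(audio_field(0.001))
print(image_field(0.5, 0.25))
```

Each function takes coordinates and returns the field value at that point, which is exactly the interface a neural field has to imitate.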

So you came here to know what a neural field is. A neural field is a field that is approximated by a neural network. In other words, it is a neural network that takes space-time coordinates as an input and outputs the field value. A multilayer perceptron (MLP) is often chosen as a neural representation of a field.

Note: Other names for neural fields are implicit neural representations, neural implicits and coordinate-based neural networks.

Steps of a neural field algorithm:

Sample coordinates in space-time
Feed sampled coordinates to the neural network to obtain field values
Calculate reconstruction error loss based on the difference between the neural network output and the actual field values (e.g. RGB intensity of an image)
Run an optimisation procedure to minimise the loss and find the best neural network weights
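
The four steps above can be sketched in a few dozen lines of plain Python. This is a toy illustration under our own assumptions, not the setup of any paper in the review: the field is a one-cycle sine wave over time (an audio-like scalar field), the network is a tiny MLP with one hidden tanh layer, and the optimiser is plain gradient descent with hand-written gradients. All names and hyperparameters are hypothetical.

```python
import math
import random


def target_field(t):
    """Ground-truth field (assumed): amplitude of a one-cycle sine at time t in [0, 1]."""
    return math.sin(2.0 * math.pi * t)


def init_mlp(hidden, rng):
    """Weights of a tiny MLP: 1 input coordinate -> `hidden` tanh units -> 1 output."""
    return {
        "w1": [rng.uniform(-2.0, 2.0) for _ in range(hidden)],
        "b1": [0.0] * hidden,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "b2": 0.0,
    }


def forward(p, t):
    """Step 2: query the network at coordinate t; also return the hidden activations."""
    h = [math.tanh(w * t + b) for w, b in zip(p["w1"], p["b1"])]
    return sum(w * a for w, a in zip(p["w2"], h)) + p["b2"], h


def train_step(p, coords, lr):
    """Steps 3 and 4: mean squared error on the batch, then one gradient-descent update."""
    n, hidden = len(coords), len(p["w1"])
    g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
    for t in coords:
        y, h = forward(p, t)
        d = 2.0 * (y - target_field(t)) / n  # d(loss)/d(output) for this sample
        g_b2 += d
        for j in range(hidden):
            g_w2[j] += d * h[j]
            dz = d * p["w2"][j] * (1.0 - h[j] ** 2)  # back through tanh
            g_w1[j] += dz * t
            g_b1[j] += dz
    for j in range(hidden):
        p["w1"][j] -= lr * g_w1[j]
        p["b1"][j] -= lr * g_b1[j]
        p["w2"][j] -= lr * g_w2[j]
    p["b2"] -= lr * g_b2


def field_mse(p, n_points=100):
    """Reconstruction error over a dense, evenly spaced grid of coordinates."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    return sum((forward(p, t)[0] - target_field(t)) ** 2 for t in ts) / n_points


def fit_field(steps=1000, batch=32, hidden=8, lr=0.02, seed=0):
    """Fit the MLP to the field and report the error before and after training."""
    rng = random.Random(seed)
    params = init_mlp(hidden, rng)
    before = field_mse(params)
    for _ in range(steps):
        coords = [rng.random() for _ in range(batch)]  # step 1: sample coordinates
        train_step(params, coords, lr)
    return params, before, field_mse(params)


params, mse_before, mse_after = fit_field()
print(f"MSE before: {mse_before:.4f}, after: {mse_after:.4f}")
```

After training, `params` is the neural representation of the signal: calling `forward(params, t)` at any time coordinate returns the approximated amplitude. A real application would typically use a deep learning framework with automatic differentiation instead of hand-written gradients.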


With the above method, we can convert an image, an audio recording or any other field into its neural representation. If we use a neural network with a smaller number of parameters than the size of the image, bang, we’ve just compressed the image. We can simply recover the compressed image by plugging the pixel coordinates into the neural network and reading RGB values from the network output.
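
As a rough worked example of the compression argument, the sketch below counts the values stored in an uncompressed image and the parameters of a small MLP. The 256×256 image size and the 2→64→64→3 layer widths are our own assumptions chosen for illustration, the comparison assumes each value takes the same number of bits, and it says nothing about how faithfully such a small network would reconstruct a particular image.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def raw_image_values(width, height, channels=3):
    """Number of intensity values stored in an uncompressed image."""
    return width * height * channels


image_values = raw_image_values(256, 256)         # 256 * 256 * 3 = 196608
network_values = mlp_param_count([2, 64, 64, 3])  # 192 + 4160 + 195 = 4547
ratio = image_values / network_values
print(f"{image_values} pixel values vs {network_values} parameters, ratio {ratio:.1f}")
```

The input width is 2 because the network takes a pixel coordinate (x, y), and the output width is 3 because it returns [R, G, B].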


These are just the basics. To successfully encode and reconstruct some data, you may need to use one of the following techniques:

Prior learning and conditioning are used for reconstruction from incomplete sensor signals
Hybrid representations with discrete data structures are used to improve memory and computation efficiency
Forward maps connect the reconstructed field to what a sensor observes, e.g. rendering a 3D scene into 2D images so that a 3D object can be recovered from 2D observations
A suitable choice of network architecture helps to get rid of blurriness
Manipulation of neural fields makes it possible to edit representations, e.g. rotating an object in a 2D image as if it were a 3D model


That covers the foundations of the method. If you’d like to dive deeper, we strongly recommend reading the original paper.

As you well know, there is plenty more interesting news on the internet. If you find something really curious, please share it with us! We promise to share science shortcuts regularly, so stay tuned! While you wait for the next episode, learn more about outstanding projects from our blog posts.