3D Deep Learning with Python

Name: 3D Deep Learning with Python | Design and develop your computer vision model with 3D data using PyTorch3D and more
Brand: Packt Publishing
Price: 32.39 EUR
Availability: OnlineOnly

Design and develop your computer vision model with 3D data using PyTorch3D and more

Xudong Ma, David Farrugia, Vishakh Hegde, Lilit Yolyan (Author)

Packt Publishing

1st Edition

Published on 31 October 2022

236 pages

E-Book

ePUB with Adobe-DRM

E-Book

ePUB without DRM

978-1-80323-368-0 (ISBN)

Content

Cover
Title Page
Copyright and Credits
Contributors
Table of Contents
Preface
PART 1: 3D Data Processing Basics
Chapter 1: Introducing 3D Data Processing
Technical requirements
Setting up a development environment
3D data representation
Understanding point cloud representation
Understanding mesh representation
Understanding voxel representation
3D data file format - Ply files
3D data file format - OBJ files
Understanding 3D coordination systems
Understanding camera models
Coding for camera models and coordination systems
Summary
Chapter 2: Introducing 3D Computer Vision and Geometry
Technical requirements
Exploring the basic concepts of rendering, rasterization, and shading
Understanding barycentric coordinates
Light source models
Understanding the Lambertian shading model
Understanding the Phong lighting model
Coding exercises for 3D rendering
Using PyTorch3D heterogeneous batches and PyTorch optimizers
A coding exercise for a heterogeneous mini-batch
Understanding transformations and rotations
A coding exercise for transformation and rotation
Summary
PART 2: 3D Deep Learning Using PyTorch3D
Chapter 3: Fitting Deformable Mesh Models to Raw Point Clouds
Technical requirements
Fitting meshes to point clouds - the problem
Formulating a deformable mesh fitting problem into an optimization problem
Loss functions for regularization
Mesh Laplacian smoothing loss
Mesh normal consistency loss
Mesh edge loss
Implementing the mesh fitting with PyTorch3D
The experiment of not using any regularization loss functions
The experiment of using only the mesh edge loss
Summary
Chapter 4: Learning Object Pose Detection and Tracking by Differentiable Rendering
Technical requirements
Why we want to have differentiable rendering
How to make rendering differentiable
What problems can be solved by using differentiable rendering
The object pose estimation problem
How it is coded
An example of object pose estimation for both silhouette fitting and texture fitting
Summary
Chapter 5: Understanding Differentiable Volumetric Rendering
Technical requirements
Overview of volumetric rendering
Understanding ray sampling
Using volume sampling
Exploring the ray marcher
Differentiable volumetric rendering
Reconstructing 3D models from multi-view images
Summary
Chapter 6: Exploring Neural Radiance Fields (NeRF)
Technical requirements
Understanding NeRF
What is a radiance field?
Representing radiance fields with neural networks
Training a NeRF model
Understanding the NeRF model architecture
Understanding volume rendering with radiance fields
Projecting rays into the scene
Accumulating the color of a ray
Summary
PART 3: State-of-the-art 3D Deep Learning Using PyTorch3D
Chapter 7: Exploring Controllable Neural Feature Fields
Technical requirements
Understanding GAN-based image synthesis
Introducing compositional 3D-aware image synthesis
Generating feature fields
Mapping feature fields to images
Exploring controllable scene generation
Exploring controllable car generation
Exploring controllable face generation
Training the GIRAFFE model
Frechet Inception Distance
Training the model
Summary
Chapter 8: Modeling the Human Body in 3D
Technical requirements
Formulating the 3D modeling problem
Defining a good representation
Understanding the Linear Blend Skinning technique
Understanding the SMPL model
Defining the SMPL model
Using the SMPL model
Estimating 3D human pose and shape using SMPLify
Defining the optimization objective function
Exploring SMPLify
Running the code
Exploring the code
Summary
Chapter 9: Performing End-to-End View Synthesis with SynSin
Technical requirements
Overview of view synthesis
SynSin network architecture
Spatial feature and depth networks
Neural point cloud renderer
Refinement module and discriminator
Hands-on model training and testing
Summary
Chapter 10: Mesh R-CNN
Technical requirements
Overview of meshes and voxels
Mesh R-CNN architecture
Graph convolutions
Mesh predictor
Demo of Mesh R-CNN with PyTorch
Demo
Summary
Index
Other Books You May Enjoy

1 Introducing 3D Data Processing

This chapter introduces basic concepts that are fundamental to 3D deep learning and that are used frequently in later chapters. We will start by setting up our development environment and installing the necessary software packages, including Anaconda, Python, PyTorch, and PyTorch3D. We will then look at the most frequently used ways to represent 3D data (point clouds, meshes, and voxels) and at how to manipulate them and convert between them. Next, we will move on to 3D data file formats, such as PLY and OBJ files, and then discuss 3D coordinate systems. Finally, we will discuss camera models, which describe how 3D data is mapped to 2D images.

After reading this chapter, you will be able to debug 3D deep learning algorithms more easily by inspecting output data files. With a solid understanding of coordinate systems and camera models, you will be ready to build on that knowledge and learn about more advanced 3D deep learning topics.

In this chapter, we're going to cover the following main topics:

Setting up a development environment and installing Anaconda, PyTorch, and PyTorch3D
3D data representation
3D data formats - PLY and OBJ files
3D coordinate systems and conversion between them
Camera models - perspective and orthographic cameras

Technical requirements

To run the example code snippets in this book, you will need a computer, ideally one with a GPU. However, the snippets can also be run on a CPU only.

The recommended computer configuration includes the following:

A GPU such as the GTX series or RTX series with at least 8 GB of memory
Python 3
The PyTorch and PyTorch3D libraries

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.

Setting up a development environment

Let us first set up a development environment for all the coding exercises in this book. We recommend using a Linux machine for all the Python code examples in this book:

We will first set up Anaconda. Anaconda is a widely used Python distribution built around the CPython implementation. One advantage of Anaconda is its package management system, conda, which makes it easy to create virtual environments. The individual edition of Anaconda is free for solo practitioners, students, and researchers. For detailed installation instructions, we recommend visiting anaconda.com; the easiest way to install Anaconda is usually to run the installer script downloaded from that website. After setting up Anaconda, run the following command to create a virtual environment with Python 3.7:
$ conda create -n python3d python=3.7

This command creates a virtual environment named python3d with Python 3.7. To use this virtual environment, we need to activate it first.

Activate the newly created virtual environment with the following command:
$ conda activate python3d
Install PyTorch. Detailed instructions on installing PyTorch can be found on its web page at www.pytorch.org/get-started/locally/. For example, I will install PyTorch 1.9.1 on my Ubuntu desktop with CUDA 11.1, as follows:
$ conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
Install PyTorch3D. PyTorch3D is an open source Python library for 3D computer vision recently released by Facebook AI Research. It provides many utility functions for manipulating 3D data. It is designed with deep learning in mind, so almost all 3D data, such as cameras, point clouds, and meshes, can be handled in mini-batches. Another key feature of PyTorch3D is its implementation of an important 3D deep learning technique called differentiable rendering. The biggest advantage of PyTorch3D as a 3D deep learning library, however, is its close integration with PyTorch.

PyTorch3D may need some dependencies; detailed instructions on how to install them can be found on the PyTorch3D GitHub home page at github.com/facebookresearch/pytorch3d. Once all the dependencies have been installed by following those instructions, PyTorch3D itself can be installed by running the following command:

$ conda install pytorch3d -c pytorch3d
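
After the installation steps above, it can be useful to confirm that the packages are visible from the activated environment. The following is a minimal sketch using only the Python standard library; the helper name check_environment is hypothetical and not part of PyTorch or PyTorch3D. It only reports whether each package can be found, so it runs even if an installation step failed.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "pytorch3d")):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


status = check_environment()
print("Python version:", sys.version.split()[0])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'NOT found'}")

# If PyTorch is present, also report its version and whether CUDA is usable.
if status["torch"]:
    import torch

    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
```

If CUDA is reported as unavailable, the code snippets can still be run on the CPU, as noted in the technical requirements.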

Now that we have set up the development environment, let's go ahead and start learning about 3D data representation.

3D data representation

In this section, we will learn about the most frequently used representations of 3D data: point clouds, meshes, and voxels. Choosing a data representation is a particularly important design decision for many 3D deep learning systems. For example, point clouds do not have a grid-like structure, so convolutions usually cannot be applied to them directly. Voxel representations do have a grid-like structure; however, they tend to consume a large amount of memory. We will discuss the pros and cons of these 3D representations in more detail in this section.
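
To make the memory cost of voxel grids concrete, here is a small back-of-the-envelope calculation. It assumes a dense cubic grid that stores one 4-byte value (for example, a float32) per voxel; both the function name dense_grid_bytes and that storage layout are assumptions made for illustration.

```python
def dense_grid_bytes(resolution, bytes_per_voxel=4):
    """Memory in bytes of a dense resolution^3 grid (4 bytes = one float32 per voxel)."""
    return resolution ** 3 * bytes_per_voxel


for resolution in (32, 64, 128, 256, 512):
    mib = dense_grid_bytes(resolution) / 2**20
    print(f"{resolution}^3 grid: {mib:,.1f} MiB")
```

Under these assumptions, doubling the resolution along each axis multiplies the memory by eight: a 128^3 grid takes 8 MiB, while a 256^3 grid already takes 64 MiB for a single example.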

Understanding point cloud representation

A 3D point cloud is a very straightforward representation of 3D objects: each point cloud is just a collection of 3D points, and each 3D point is represented by one three-dimensional tuple (x, y, z). The raw measurements of many depth cameras are usually 3D point clouds.
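
As an illustration of how simple this representation is, the sketch below stores a tiny point cloud as a plain Python list of (x, y, z) tuples and computes its centroid and axis-aligned bounding box. The coordinates are made up, and the helper names centroid and bounding_box are hypothetical.

```python
# A tiny point cloud: each point is one (x, y, z) tuple. The values are made up.
points = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (0.0, 0.0, 4.0),
]


def centroid(cloud):
    """Mean position of all points, computed per axis."""
    n = len(cloud)
    return tuple(sum(p[axis] for p in cloud) / n for axis in range(3))


def bounding_box(cloud):
    """Axis-aligned bounding box as (minimum corner, maximum corner)."""
    lo = tuple(min(p[axis] for p in cloud) for axis in range(3))
    hi = tuple(max(p[axis] for p in cloud) for axis in range(3))
    return lo, hi


print("number of points:", len(points))
print("centroid:", centroid(points))
print("bounding box:", bounding_box(points))
```

In practice, a point cloud with N points is usually stored as an array or tensor of shape (N, 3) rather than a list of tuples, but the content is the same.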

From a deep learning point of view, 3D point clouds are an unordered and irregular data type. Unlike regular images, where we can define neighboring pixels for each individual pixel, there is no clear and regular definition of neighboring points for each point in a point cloud, so convolutions usually cannot be applied to point clouds directly. Thus, special types of deep learning models need to be used for processing point clouds, such as PointNet: https://arxiv.org/abs/1612.00593.
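
Because a point cloud is unordered, a model that consumes it should give the same output regardless of the order in which the points are listed. One common way to achieve this, used by PointNet, is to compute a feature for each point independently and then aggregate with a symmetric function such as an element-wise maximum. The sketch below illustrates only that idea: per_point_features is a hand-written toy function standing in for a learned network, and none of this is the PointNet implementation.

```python
import random


def per_point_features(point):
    """Toy per-point feature map (a stand-in for a learned shared network)."""
    x, y, z = point
    return (x + y + z, x * x + y * y + z * z, x - z)


def global_feature(cloud):
    """Aggregate per-point features with an element-wise max, a symmetric function."""
    features = [per_point_features(p) for p in cloud]
    return tuple(max(column) for column in zip(*features))


cloud = [(0.0, 1.0, 2.0), (3.0, -1.0, 0.5), (-2.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)  # same points, different order

print("same global feature after shuffling:",
      global_feature(cloud) == global_feature(shuffled))
```

The maximum does not depend on the order of its inputs, so any reordering of the points produces the same global feature.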

Another issue with point clouds as training data for 3D deep learning is heterogeneity: within one training dataset, different point clouds may contain different numbers of 3D points. One way to avoid this issue is to force all the point clouds to have the same number of points. However, this may not always be possible; for example, the number of points returned by a depth camera may differ from frame to frame.

Heterogeneous data may create some difficulties for mini-batch gradient descent when training deep learning models. Most deep learning frameworks assume that each mini-batch contains training examples of the same size and dimensions. Such homogeneous data is preferred because it can be processed most efficiently by modern parallel processing hardware, such as GPUs. Handling heterogeneous mini-batches efficiently requires some additional work. Luckily, PyTorch3D provides several ways of handling heterogeneous mini-batches efficiently, which is important for 3D deep learning.
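
Two common layouts for a heterogeneous mini-batch are a padded layout, where every example is padded to the length of the longest one, and a packed layout, where all examples are concatenated and an index records which example each point belongs to. PyTorch3D's batching structures expose list, padded, and packed views of the same data; the plain-Python sketch below only mimics the padded and packed ideas and is not the library's implementation. The function names to_padded and to_packed are hypothetical.

```python
def to_padded(clouds, pad_value=(0.0, 0.0, 0.0)):
    """Pad every cloud to the longest length; return the padded batch and true lengths."""
    max_len = max(len(c) for c in clouds)
    padded = [list(c) + [pad_value] * (max_len - len(c)) for c in clouds]
    lengths = [len(c) for c in clouds]
    return padded, lengths


def to_packed(clouds):
    """Concatenate all clouds; record which cloud each point came from."""
    packed = [p for c in clouds for p in c]
    cloud_index = [i for i, c in enumerate(clouds) for _ in c]
    return packed, cloud_index


# Three clouds with 2, 3, and 1 points (made-up coordinates).
clouds = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0), (0.0, 3.0, 0.0)],
    [(5.0, 5.0, 5.0)],
]
padded, lengths = to_padded(clouds)
packed, cloud_index = to_packed(clouds)

print("padded shape:", (len(padded), len(padded[0]), 3), "lengths:", lengths)
print("packed points:", len(packed), "cloud index:", cloud_index)
```

The padded layout has a regular shape that maps well onto GPU operations but wastes memory on padding, whereas the packed layout stores no padding but needs the extra index to separate the examples again.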

Understanding mesh representation

Meshes are another widely used 3D data representation. Like a point cloud, each mesh contains a set of 3D points, which are called vertices. In addition, each mesh contains a set of polygons called faces, which are defined on the vertices.
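
A minimal sketch of this structure is shown below, assuming triangular faces: the vertices are a list of 3D points, and each face is a tuple of three indices into that list. The example mesh (a unit square split into two triangles) and the helper names face_normal and face_area are made up for illustration.

```python
# A unit square in the z = 0 plane, split into two triangular faces.
vertices = [
    (0.0, 0.0, 0.0),  # vertex 0
    (1.0, 0.0, 0.0),  # vertex 1
    (1.0, 1.0, 0.0),  # vertex 2
    (0.0, 1.0, 0.0),  # vertex 3
]
faces = [(0, 1, 2), (0, 2, 3)]  # each face lists three vertex indices


def face_normal(face):
    """Unnormalized normal of a triangle: cross product of two edge vectors."""
    a, b, c = (vertices[i] for i in face)
    u = tuple(b[k] - a[k] for k in range(3))
    v = tuple(c[k] - a[k] for k in range(3))
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def face_area(face):
    """Area of a triangle: half the length of the cross product of two edges."""
    nx, ny, nz = face_normal(face)
    return 0.5 * (nx * nx + ny * ny + nz * nz) ** 0.5


for face in faces:
    print(face, "normal:", face_normal(face), "area:", face_area(face))
```

Because the faces refer to vertices by index, moving a vertex automatically moves every face that uses it.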

In most data-driven applications, meshes are the result of post-processing raw measurements from depth cameras. They are also often created manually during 3D asset design. Compared to point clouds, meshes contain additional geometric information: they encode topology and carry surface-normal information. This additional information is especially useful when training learning models. For example, graph convolutional neural networks usually treat meshes as graphs and define convolutional operations using the vertex neighborhood information.
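
The vertex neighborhood information mentioned above can be derived directly from the faces: two vertices are neighbors if they share an edge of some face. The sketch below builds that adjacency and performs one unweighted averaging step over each vertex's neighborhood. This is only a simplified stand-in for a graph convolution, which would additionally apply learned weights; the helper names and the feature values are hypothetical.

```python
from collections import defaultdict

faces = [(0, 1, 2), (0, 2, 3)]  # two triangles sharing the edge (0, 2)


def vertex_neighbors(faces):
    """Map each vertex index to the set of vertices it shares an edge with."""
    neighbors = defaultdict(set)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return dict(neighbors)


def average_over_neighbors(values, neighbors):
    """One aggregation step: mean of a vertex's own value and its neighbors' values."""
    result = {}
    for v, nbrs in neighbors.items():
        group = [values[v]] + [values[n] for n in sorted(nbrs)]
        result[v] = sum(group) / len(group)
    return result


neighbors = vertex_neighbors(faces)
values = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}  # a made-up scalar feature per vertex
smoothed = average_over_neighbors(values, neighbors)

print("neighbors:", neighbors)
print("smoothed:", smoothed)
```

Unlike an image, where every pixel has the same number of neighbors, the number of neighbors here varies from vertex to vertex, which is why graph-based operations are used instead of regular convolutions.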

Meshes have heterogeneous data issues similar to those of point clouds. Again, PyTorch3D provides efficient ways of handling heterogeneous mini-batches of mesh data, which makes 3D deep learning efficient.

Understanding voxel representation

Another important 3D data representation is the voxel representation. A voxel is the 3D counterpart of a pixel. A pixel is defined by dividing a 2D rectangle into smaller rectangles, each of which is one pixel. Similarly, a voxel is defined by dividing a 3D cube into smaller cubes, each of which is one voxel. The process is shown in the following figure:

Figure 1.1 - Voxel representation is the 3D counterpart of 2D pixel representation, where a cubic space is divided into small volume...
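
The subdivision described above can be sketched in a few lines. The following example divides the cube [0, 1]^3 into resolution^3 equally sized voxels and records which voxels contain at least one point of a point cloud (an occupancy grid stored as a set of indices). The function name voxelize, the choice of cube, and the sample points are assumptions made for illustration.

```python
def voxelize(points, resolution, lo=0.0, hi=1.0):
    """Return the set of (i, j, k) voxel indices occupied by at least one point.

    The cube [lo, hi]^3 is divided into resolution^3 equally sized voxels.
    Points outside the cube are ignored.
    """
    size = (hi - lo) / resolution
    occupied = set()
    for point in points:
        if all(lo <= c <= hi for c in point):
            # Clamp so that points exactly on the upper boundary land in the last voxel.
            index = tuple(min(int((c - lo) / size), resolution - 1) for c in point)
            occupied.add(index)
    return occupied


points = [(0.1, 0.1, 0.1), (0.15, 0.2, 0.05), (0.9, 0.9, 0.9), (1.0, 0.5, 0.0)]
grid = voxelize(points, resolution=4)

print("occupied voxels:", sorted(grid))
print("occupancy:", len(grid), "of", 4 ** 3)
```

The first two points fall into the same voxel, so the four points occupy only three of the 64 voxels; a dense grid would still store all 64 cells.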
