
3D Deep Learning with Python
Description
With this hands-on guide to 3D deep learning, developers working with 3D computer vision will be able to put their knowledge to work and get up and running in no time. Complete with step-by-step explanations of essential concepts and practical examples, this book lets you explore and gain a thorough understanding of state-of-the-art 3D deep learning. You'll see how to use PyTorch3D for basic 3D mesh and point cloud data processing, including loading and saving PLY and OBJ files, projecting 3D points into camera coordinates using perspective or orthographic camera models, rendering point clouds and meshes to images, and much more. As you implement some of the latest 3D deep learning algorithms, such as differentiable rendering, NeRF, SynSin, and Mesh R-CNN, you'll see how the PyTorch3D library makes coding these deep learning models easier. By the end of this deep learning book, you'll be ready to implement your own 3D deep learning models confidently.

Content
- Cover
- Title Page
- Copyright and Credits
- Contributors
- Table of Contents
- Preface
- PART 1: 3D Data Processing Basics
- Chapter 1: Introducing 3D Data Processing
- Technical requirements
- Setting up a development environment
- 3D data representation
- Understanding point cloud representation
- Understanding mesh representation
- Understanding voxel representation
- 3D data file format - Ply files
- 3D data file format - OBJ files
- Understanding 3D coordination systems
- Understanding camera models
- Coding for camera models and coordination systems
- Summary
- Chapter 2: Introducing 3D Computer Vision and Geometry
- Technical requirements
- Exploring the basic concepts of rendering, rasterization, and shading
- Understanding barycentric coordinates
- Light source models
- Understanding the Lambertian shading model
- Understanding the Phong lighting model
- Coding exercises for 3D rendering
- Using PyTorch3D heterogeneous batches and PyTorch optimizers
- A coding exercise for a heterogeneous mini-batch
- Understanding transformations and rotations
- A coding exercise for transformation and rotation
- Summary
- PART 2: 3D Deep Learning Using PyTorch3D
- Chapter 3: Fitting Deformable Mesh Models to Raw Point Clouds
- Technical requirements
- Fitting meshes to point clouds - the problem
- Formulating a deformable mesh fitting problem into an optimization problem
- Loss functions for regularization
- Mesh Laplacian smoothing loss
- Mesh normal consistency loss
- Mesh edge loss
- Implementing the mesh fitting with PyTorch3D
- The experiment of not using any regularization loss functions
- The experiment of using only the mesh edge loss
- Summary
- Chapter 4: Learning Object Pose Detection and Tracking by Differentiable Rendering
- Technical requirements
- Why we want to have differentiable rendering
- How to make rendering differentiable
- What problems can be solved by using differentiable rendering
- The object pose estimation problem
- How it is coded
- An example of object pose estimation for both silhouette fitting and texture fitting
- Summary
- Chapter 5: Understanding Differentiable Volumetric Rendering
- Technical requirements
- Overview of volumetric rendering
- Understanding ray sampling
- Using volume sampling
- Exploring the ray marcher
- Differentiable volumetric rendering
- Reconstructing 3D models from multi-view images
- Summary
- Chapter 6: Exploring Neural Radiance Fields (NeRF)
- Technical requirements
- Understanding NeRF
- What is a radiance field?
- Representing radiance fields with neural networks
- Training a NeRF model
- Understanding the NeRF model architecture
- Understanding volume rendering with radiance fields
- Projecting rays into the scene
- Accumulating the color of a ray
- Summary
- PART 3: State-of-the-art 3D Deep Learning Using PyTorch3D
- Chapter 7: Exploring Controllable Neural Feature Fields
- Technical requirements
- Understanding GAN-based image synthesis
- Introducing compositional 3D-aware image synthesis
- Generating feature fields
- Mapping feature fields to images
- Exploring controllable scene generation
- Exploring controllable car generation
- Exploring controllable face generation
- Training the GIRAFFE model
- Frechet Inception Distance
- Training the model
- Summary
- Chapter 8: Modeling the Human Body in 3D
- Technical requirements
- Formulating the 3D modeling problem
- Defining a good representation
- Understanding the Linear Blend Skinning technique
- Understanding the SMPL model
- Defining the SMPL model
- Using the SMPL model
- Estimating 3D human pose and shape using SMPLify
- Defining the optimization objective function
- Exploring SMPLify
- Running the code
- Exploring the code
- Summary
- Chapter 9: Performing End-to-End View Synthesis with SynSin
- Technical requirements
- Overview of view synthesis
- SynSin network architecture
- Spatial feature and depth networks
- Neural point cloud renderer
- Refinement module and discriminator
- Hands-on model training and testing
- Summary
- Chapter 10: Mesh R-CNN
- Technical requirements
- Overview of meshes and voxels
- Mesh R-CNN architecture
- Graph convolutions
- Mesh predictor
- Demo of Mesh R-CNN with PyTorch
- Demo
- Summary
- Index
- Other Books You May Enjoy
1
Introducing 3D Data Processing
In this chapter, we are going to discuss some basic concepts that are fundamental to 3D deep learning and that will be used frequently in later chapters. We will learn about the most frequently used 3D data formats, as well as the ways to manipulate them and convert between them. We will start by setting up our development environment and installing all the necessary software packages, including Anaconda, Python, PyTorch, and PyTorch3D. We will then talk about the most frequently used ways to represent 3D data - for example, point clouds, meshes, and voxels. Next, we will move on to the 3D data file formats, such as PLY and OBJ files, and then discuss 3D coordinate systems. Finally, we will discuss camera models, which describe how 3D data is mapped to 2D images.
After reading this chapter, you will be able to debug 3D deep learning algorithms easily by inspecting output data files. With a solid understanding of coordinate systems and camera models, you will be ready to build on that knowledge and learn about more advanced 3D deep learning topics.
In this chapter, we're going to cover the following main topics:
- Setting up a development environment and installing Anaconda, PyTorch, and PyTorch3D
- 3D data representation
- 3D data formats - PLY and OBJ files
- 3D coordinate systems and conversion between them
- Camera models - perspective and orthographic cameras
Technical requirements
To run the example code snippets in this book, you will need a computer, ideally one with a GPU. However, running the code snippets on a CPU alone is possible.
The recommended computer configuration includes the following:
- A GPU such as the GTX series or RTX series with at least 8 GB of memory
- Python 3
- The PyTorch and PyTorch3D libraries
The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.
Setting up a development environment
Let us first set up a development environment for all the coding exercises in this book. We recommend using a Linux machine for all the Python code examples in this book:
- We will first set up Anaconda. Anaconda is a widely used Python distribution that bundles the CPython implementation. One advantage of using Anaconda is its package management system, which lets users create virtual environments easily. The individual edition of Anaconda is free for solo practitioners, students, and researchers. To install Anaconda, we recommend visiting the website, anaconda.com, for detailed instructions. The easiest way to install Anaconda is usually by running a script downloaded from their website. After setting up Anaconda, run the following command to create a virtual environment with Python 3.7:
$ conda create -n python3d python=3.7
This command will create a virtual environment with Python version 3.7. To use this virtual environment, we need to activate it first.
- Activate the newly created virtual environment with the following command:
$ source activate python3d
- Install PyTorch. Detailed instructions on installing PyTorch can be found on its web page at www.pytorch.org/get-started/locally/. For example, I will install PyTorch 1.9.1 on my Ubuntu desktop with CUDA 11.1, as follows:
$ conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
- Install PyTorch3D. PyTorch3D is an open source Python library for 3D computer vision recently released by Facebook AI Research. PyTorch3D provides many utility functions to easily manipulate 3D data. It is designed with deep learning in mind, so almost all 3D data, such as cameras, point clouds, and meshes, can be handled in mini-batches. Another key feature of PyTorch3D is the implementation of a very important 3D deep learning technique, called differentiable rendering. However, the biggest advantage of PyTorch3D as a 3D deep learning library is its close ties to PyTorch.
PyTorch3D may need some dependencies, and detailed instructions on how to install these dependencies can be found on the PyTorch3D GitHub home page at github.com/facebookresearch/pytorch3d. After all the dependencies have been installed by following the instructions from the website, installing PyTorch3D can be easily done by running the following command:
$ conda install pytorch3d -c pytorch3d
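Before moving on, it can help to confirm that the packages are visible from inside the activated environment. The following is a minimal sketch rather than part of the official installation instructions; the helper name `missing_packages` is made up for this illustration, and it only checks that the top-level modules `torch` and `pytorch3d` can be found, not that CUDA works:

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in the active environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Top-level import names of PyTorch and PyTorch3D.
    required = ["torch", "pytorch3d"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If a package is reported as missing, the most likely cause is that the command was run outside the `python3d` environment.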
Now that we have set up the development environment, let's go ahead and start learning data representation.
3D data representation
In this section, we will learn about the most frequently used representations of 3D data: point clouds, meshes, and voxels. Choosing a data representation is a particularly important design decision for many 3D deep learning systems. For example, point clouds do not have grid-like structures, so convolutions usually cannot be applied to them directly. Voxel representations have grid-like structures; however, they tend to consume a large amount of memory. We will discuss the pros and cons of these 3D representations in more detail in this section.
Understanding point cloud representation
A 3D point cloud is a very straightforward representation of 3D objects, where each point cloud is just a collection of 3D points, and each 3D point is represented by one three-dimensional tuple (x, y, z). The raw measurements of many depth cameras are usually 3D point clouds.
From a deep learning point of view, a 3D point cloud is an unordered and irregular data type. Unlike regular images, where we can define neighboring pixels for each individual pixel, there is no clear and regular definition of neighboring points for each point in a point cloud, so convolutions usually cannot be applied to point clouds. Thus, special types of deep learning models need to be used for processing point clouds, such as PointNet: https://arxiv.org/abs/1612.00593.
Another issue for point clouds as training data for 3D deep learning is the heterogeneous data issue - that is, within one training dataset, different point clouds may contain different numbers of 3D points. One approach for avoiding such a heterogeneous data issue is forcing all the point clouds to have the same number of points. However, this may not always be possible - for example, the number of points returned by depth cameras may be different from frame to frame.
The heterogeneous data may create some difficulties for mini-batch gradient descent in training deep learning models. Most deep learning frameworks assume that each mini-batch contains training examples of the same size and dimensions. Such homogeneous data is preferred because it can be most efficiently processed by modern parallel processing hardware, such as GPUs. Handling heterogeneous mini-batches in an efficient way needs some additional work. Luckily, PyTorch3D provides many ways of handling heterogeneous mini-batches efficiently, which are important for 3D deep learning.
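To make the heterogeneous data issue concrete, here is a minimal, dependency-free sketch of one common workaround: padding every point cloud in a mini-batch to the length of the largest one and keeping a mask that marks the real points. The function name `pad_point_clouds` and the filler value are assumptions made for this illustration; this is not PyTorch3D's implementation, whose batching utilities are covered in Chapter 2.

```python
def pad_point_clouds(clouds, pad_value=0.0):
    """Pad a non-empty list of point clouds to a common number of points.

    Returns (padded, mask): padded[i][j] is point j of cloud i or a filler
    point, and mask[i][j] is True only where the point is real.
    """
    max_points = max(len(cloud) for cloud in clouds)
    filler = (pad_value, pad_value, pad_value)
    padded, mask = [], []
    for cloud in clouds:
        n_missing = max_points - len(cloud)
        padded.append(list(cloud) + [filler] * n_missing)
        mask.append([True] * len(cloud) + [False] * n_missing)
    return padded, mask


# Two clouds with different numbers of (x, y, z) points.
clouds = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.5, 0.5, 0.5)],
]
padded, mask = pad_point_clouds(clouds)
print(len(padded[0]), len(padded[1]))  # 3 3
print(mask[1])  # [True, False, False]
```

After padding, every cloud has the same shape and can be stacked into a single tensor, at the cost of some wasted memory and the need to ignore masked entries in the loss.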
Understanding mesh representation
Meshes are another widely used 3D data representation. Like a point cloud, each mesh contains a set of 3D points, called vertices. In addition, each mesh also contains a set of polygons called faces, which are defined on vertices.
In most data-driven applications, meshes are a result of post-processing from raw measurements of depth cameras. They are also often created manually during the process of 3D asset design. Compared to point clouds, meshes contain additional geometric information, encode topology, and have surface-normal information. This additional information becomes especially useful in training learning models. For example, graph convolutional neural networks usually treat meshes as graphs and define convolutional operations using the vertex neighboring information.
Like point clouds, meshes have heterogeneous data issues. Again, PyTorch3D provides efficient ways for handling heterogeneous mini-batches for mesh data, which makes 3D deep learning efficient.
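The extra information a mesh carries can be illustrated with a small sketch in plain Python. It stores a mesh as a list of vertices plus a list of faces (tuples of vertex indices), derives the vertex neighboring information that a graph-based model could use, and computes a surface normal for a triangular face. The helper names `vertex_neighbors` and `face_normal` are hypothetical and chosen only for this example:

```python
def vertex_neighbors(num_vertices, faces):
    """Map each vertex index to the sorted indices of vertices sharing an edge with it."""
    neighbors = [set() for _ in range(num_vertices)]
    for face in faces:
        # Consecutive vertices of a polygon (wrapping around) form its edges.
        for a, b in zip(face, face[1:] + face[:1]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return [sorted(n) for n in neighbors]


def face_normal(vertices, face):
    """Unit normal of a non-degenerate triangular face (cross product of two edges)."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = (vertices[i] for i in face[:3])
    ux, uy, uz = x1 - x0, y1 - y0, z1 - z0
    vx, vy, vz = x2 - x0, y2 - y0, z2 - z0
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    length = (nx * nx + ny * ny + nz * nz) ** 0.5
    return (nx / length, ny / length, nz / length)


# A unit square in the z = 0 plane, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

print(vertex_neighbors(len(vertices), faces))  # [[1, 2, 3], [0, 2], [0, 1, 3], [0, 2]]
print(face_normal(vertices, faces[0]))  # (0.0, 0.0, 1.0)
```

Neither the neighbor lists nor the normals can be recovered from the vertex positions alone, which is why a mesh is a richer representation than a point cloud of the same vertices.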
Understanding voxel representation
Another important 3D data representation is voxel representation. A voxel is the 3D counterpart of a pixel. A pixel is defined by dividing a rectangle in 2D into smaller rectangles, and each small rectangle is one pixel. Similarly, a voxel is defined by dividing a 3D cube into smaller cubes, and each small cube is called one voxel. The processes are shown in the following figure:
Figure 1.1 - Voxel representation is the 3D counterpart of 2D pixel representation, where a cubic space is divided into small volume...
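The subdivision described above can be sketched in a few lines of plain Python. This illustrative example (the function name `voxelize` and the unit-cube bounds are assumptions, not something defined in the book) maps each point of a point cloud to the index of the voxel that contains it:

```python
def voxelize(points, grid_size, lower=0.0, upper=1.0):
    """Return the set of occupied (i, j, k) voxel indices for points inside a cube.

    The cube spans [lower, upper] on each axis and is split into
    grid_size x grid_size x grid_size voxels. Points outside it are ignored.
    """
    cell = (upper - lower) / grid_size
    occupied = set()
    for point in points:
        if all(lower <= c <= upper for c in point):
            # Clamp so points exactly on the upper boundary fall into the last voxel.
            index = tuple(min(int((c - lower) / cell), grid_size - 1) for c in point)
            occupied.add(index)
    return occupied


points = [(0.1, 0.1, 0.1), (0.15, 0.12, 0.2), (0.9, 0.9, 0.9), (1.0, 0.0, 0.5)]
occupied = voxelize(points, grid_size=4)
print(sorted(occupied))  # [(0, 0, 0), (3, 0, 2), (3, 3, 3)]

# A dense grid stores every voxel, occupied or not, so its size grows cubically.
for n in (4, 64, 256):
    print(n, n ** 3)  # 4 64, then 64 262144, then 256 16777216
```

The cubic growth in the last loop is why voxel grids tend to consume a large amount of memory: doubling the resolution along each axis multiplies the number of voxels by eight.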