List of Contributors
Preface
Acknowledgments
Glossary
Abbreviations
1. Why Graphics Processing Units (Perri Needham, Andreas W. Götz and Ross C. Walker)
1.1 A Historical Perspective of Parallel Computing
1.2 The Rise of the GPU
1.3 Parallel Computing on Central Processing Units
1.4 Parallel Computing on Graphics Processing Units
1.5 GPU-Accelerated Applications
References
2. GPUs: Hardware to Software (Perri Needham, Andreas W. Götz and Ross C. Walker)
2.1 Basic GPU Terminology
2.2 Architecture of GPUs
2.3 CUDA Programming Model
2.4 Programming and Optimization Concepts
2.5 Software Libraries for GPUs
2.6 Special Features of CUDA-Enabled GPUs
References
3. Overview of Electronic Structure Methods (Andreas W. Götz)
3.1 Introduction
3.2 Hartree-Fock Theory
3.3 Density Functional Theory
3.4 Basis Sets
3.5 Semiempirical Methods
3.6 Density Functional Tight Binding
3.7 Wave Function-Based Electron Correlation Methods
Acknowledgments
References
4. Gaussian Basis Set Hartree-Fock, Density Functional Theory, and Beyond on GPUs (Nathan Luehr, Aaron Sisto and Todd J. Martínez)
4.1 Quantum Chemistry Review
4.2 Hardware and CUDA Overview
4.3 GPU ERI Evaluation
4.4 Integral-Direct Fock Construction on GPUs
4.5 Precision Considerations
4.6 Post-SCF Methods
4.7 Example Calculations
4.8 Conclusions and Outlook
References
5. GPU Acceleration for Density Functional Theory with Slater-Type Orbitals (Hans van Schoot and Lucas Visscher)
5.1 Background
5.2 Theory and CPU Implementation
5.3 GPU Implementation
5.4 Conclusion
References
6. Wavelet-Based Density Functional Theory on Massively Parallel Hybrid Architectures (Luigi Genovese, Brice Videau, Damien Caliste, Jean-François Méhaut, Stefan Goedecker and Thierry Deutsch)
6.1 Introductory Remarks on Wavelet Basis Sets for Density Functional Theory Implementations
6.2 Operators in Wavelet Basis Sets
6.3 Parallelization
6.4 GPU Architecture
6.5 Conclusions and Outlook
References
7. Plane-Wave Density Functional Theory (Maxwell Hutchinson, Paul Fleurat-Lessard, Ani Anciaux-Sedrakian, Dusan Stosic, Jeroen Bédorf and Sarah Tariq)
7.1 Introduction
7.2 Theoretical Background
7.3 Implementation
7.4 Optimizations
7.5 Performance Examples
7.6 Exact Exchange with Plane Waves
7.7 Summary and Outlook
Acknowledgments
References
Appendix A: Definitions and Conventions
Appendix B: Example Kernels
8. GPU-Accelerated Sparse Matrix-Matrix Multiplication for Linear Scaling Density Functional Theory (Ole Schütt, Peter Messmer, Jürg Hutter and Joost VandeVondele)
8.1 Introduction
8.2 Software Architecture for GPU-Acceleration
8.3 Maximizing Asynchronous Progress
8.4 Libcusmm: GPU Accelerated Small Matrix Multiplications
8.5 Benchmarks and Conclusions
Acknowledgments
References
9. Grid-Based Projector-Augmented Wave Method (Samuli Hakala, Jussi Enkovaara, Ville Havu, Jun Yan, Lin Li, Chris O'Grady and Risto M. Nieminen)
9.1 Introduction
9.2 General Overview
9.3 Using GPUs in Ground-State Calculations
9.4 Time-Dependent Density Functional Theory
9.5 Random Phase Approximation for the Correlation Energy
9.6 Summary and Outlook
Acknowledgments
References
10. Application of Graphics Processing Units to Accelerate Real-Space Density Functional Theory and Time-Dependent Density Functional Theory Calculations (Xavier Andrade and Alán Aspuru-Guzik)
10.1 Introduction
10.2 The Real-Space Representation
10.3 Numerical Aspects of the Real-Space Approach
10.4 General GPU Optimization Strategy
10.5 Kohn-Sham Hamiltonian
10.6 Orthogonalization and Subspace Diagonalization
10.7 Exponentiation
10.8 The Hartree Potential
10.9 Other Operations
10.10 Numerical Performance
10.11 Conclusions
10.12 Computational Methods
Acknowledgments
References
11. Semiempirical Quantum Chemistry (Xin Wu, Axel Koslowski and Walter Thiel)
11.1 Introduction
11.2 Overview of Semiempirical Methods
11.3 Computational Bottlenecks
11.4 Profile-Guided Optimization for the Hybrid Platform
11.5 Performance
11.6 Applications
11.7 Conclusion
Acknowledgement
References
12. GPU Acceleration of Second-Order Møller-Plesset Perturbation Theory with Resolution of Identity (Roberto Olivares-Amaya, Adrian Jinich, Mark A. Watson and Alán Aspuru-Guzik)
12.1 Møller-Plesset Perturbation Theory with Resolution of Identity Approximation (RI-MP2)
12.2 A Mixed-Precision Matrix Multiplication Library
12.3 Performance of Accelerated RI-MP2
12.4 Example Applications
12.5 Conclusions
References
13. Iterative Coupled-Cluster Methods on Graphics Processing Units (A. Eugene DePrince III, Jeff R. Hammond and C. David Sherrill)
13.1 Introduction
13.2 Related Work
13.3 Theory
13.4 Algorithm Details
13.5 Computational Details
13.6 Results
13.7 Conclusions
Acknowledgments
References
14. Perturbative Coupled-Cluster Methods on Graphics Processing Units: Single- and Multi-Reference Formulations (Wenjing Ma, Kiran Bhaskaran-Nair, Oreste Villa, Edoardo Aprà, Antonino Tumeo, Sriram Krishnamoorthy and Karol Kowalski)
14.1 Introduction
14.2 Overview of Electronic Structure Methods
14.3 NWChem Software Architecture
14.4 GPU Implementation
14.5 Performance
14.6 Outlook
Acknowledgments
References
Index
Perri Needham¹, Andreas W. Götz² and Ross C. Walker¹,²
¹ San Diego Supercomputer Center, UCSD, La Jolla, CA, USA
² Department of Chemistry and Biochemistry, UCSD, La Jolla, CA, USA
The first general-purpose electronic computers capable of storing instructions came into existence in 1950. That is not to say, however, that the use of computers to solve electronic structure problems had not already been considered, or even realized. From as early as 1930, scientists used a less advanced form of computation to solve their quantum mechanical problems: a group of assistants working simultaneously on mechanical calculators, which was nonetheless an early parallel computing "machine" [1]. It was clear from the beginning that solutions to electronic structure problems could not be carried forward to many-electron systems without some computational device to lessen the mathematical burden. Today's computational scientists rely heavily on parallel electronic computers.
Parallel electronic computers can be broadly classified as having either multiple processing elements in the same machine (shared memory) or multiple machines coupled together to form a cluster/grid of processing elements (distributed memory). These arrangements make it possible to perform calculations concurrently across multiple processing elements, enabling large problems to be broken down into smaller parts that can be solved simultaneously (in parallel).
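The idea of breaking one large problem into parts that are computed simultaneously can be made concrete with a small example. The Python sketch below (illustrative only, not taken from this chapter) splits a large summation across several worker processes on a shared-memory machine; the function names, the problem size, and the number of workers are arbitrary choices made for the illustration.

```python
# Minimal sketch (not from the chapter): one large task is broken into
# smaller parts that are computed simultaneously on a shared-memory machine.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi): one 'smaller part' of the full problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:            # one process per processing element
        total = sum(pool.map(partial_sum, chunks))
    print(total == n * (n - 1) // 2)       # sanity check against the closed form
```

A distributed-memory version of the same decomposition would exchange the partial results between machines explicitly, for example with a message-passing library, rather than collecting them within a single process's address space.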
The first electronic computers were primarily designed for, and funded by, military projects to assist in World War II and the start of the Cold War [2]. The first working programmable digital computer, Konrad Zuse's Z3 [3], was an electromechanical device that became operational in 1941 and was used by the German aeronautical research organization. Colossus, developed by the British for cryptanalysis during World War II, was the world's first programmable electronic digital computer and was responsible for the decryption of valuable German military intelligence from 1944 onwards. Colossus was a purpose-built machine that read encrypted messages from paper tape and determined the encryption settings of the German Lorenz cipher. It was not until 1955, however, that the first commercially available general-purpose machine capable of floating-point arithmetic appeared: the IBM 704 (see Figure 1.1).
Figure 1.1 Photograph taken in 1957 at NASA featuring an IBM 704 computer, the first commercially available general-purpose computer with floating-point arithmetic hardware [4]
A common measure of compute performance is floating-point operations per second (FLOPS). The IBM 704 was capable of a mere 12,000 floating-point additions per second and required 1500-2000 ft² of floor space. Compare this to modern smartphones, which are capable of around 1.5 gigaFLOPS [5] thanks to the invention of the integrated circuit in 1958 and six subsequent decades of refinement. To put this in perspective, if the floor footprint of an IBM 704 were instead covered with modern-day smartphones laid side by side, the computational capacity of that floor space would grow from 12,000 to around 20,000,000,000,000 (2 × 10¹³) FLOPS. This is equivalent to every person on the planet carrying out roughly 2800 floating-point additions per second. Statistics like these make it exceptionally clear just how far computer technology has advanced, and, while mobile internet and games might seem like the apex of the technology's capabilities, this progress has also opened doorways to explore scientific questions computationally in ways previously believed impossible.
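The back-of-the-envelope comparison above can be reproduced in a few lines. In the sketch below the smartphone footprint, the exact floor area, and the world population are illustrative assumptions rather than figures from the text, so the result only needs to land in the same ballpark as the roughly 2 × 10¹³ FLOPS quoted above.

```python
# Rough check of the floor-space comparison; all inputs are illustrative assumptions.
FT2_TO_CM2 = 929.03                  # one square foot in cm^2
floor_cm2 = 1750 * FT2_TO_CM2        # midpoint of the 1500-2000 ft^2 range
phone_cm2 = 7 * 14                   # assumed ~7 cm x 14 cm per handset
phone_flops = 1.5e9                  # ~1.5 gigaFLOPS per smartphone [5]
population = 7.3e9                   # assumed world population

phones = floor_cm2 / phone_cm2
total_flops = phones * phone_flops
print(f"{phones:,.0f} phones -> {total_flops:.2e} FLOPS")         # ~2e13 FLOPS
print(f"{total_flops / population:,.0f} additions per person/s")  # a few thousand
```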
Computers today find use in many different areas of science and industry, from weather forecasting and film making to genetic research, drug discovery, and nuclear weapon design. Without computers, many scientific endeavors would not be possible.
While the performance of individual computers continued to advance, the thirst for computational power for scientific simulation was such that by the late 1950s discussions had turned to utilizing multiple processors, working in harmony, to address more complex scientific problems. The 1960s saw the birth of parallel computing with the invention of multiprocessor systems. The first recorded example of a commercially available multiprocessor (parallel) computer was Burroughs Corporation's D825, released in 1962, which had four processors that accessed up to 16 memory modules via a crossbar switch (see Figure 1.2).
Figure 1.2 Photograph of Burroughs Corporation's D825 parallel computer [6]
This was followed in the 1970s by the concept of single-instruction multiple-data (SIMD) processor architectures, forming the basis of vector parallel computing. SIMD is an important concept in graphics processing unit (GPU) computing and is discussed in the next chapter.
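As a rough illustration of the SIMD idea, that is, applying a single instruction to many data elements at once, the sketch below contrasts an explicit element-by-element loop with a vectorized NumPy expression. NumPy is used here only as an analogy for data-parallel execution; it is not GPU code and is not drawn from this chapter.

```python
# Illustration of the SIMD idea: one operation applied to many data elements.
import numpy as np

a = np.arange(100_000, dtype=np.float32)
b = np.ones_like(a)

# Scalar style: one addition per loop iteration.
c_loop = np.empty_like(a)
for i in range(a.size):
    c_loop[i] = a[i] + b[i]

# Data-parallel style: the single expression "a + b" applies the same
# addition across all elements, much as SIMD hardware applies one
# instruction to a whole vector of operands.
c_simd = a + b

assert np.array_equal(c_loop, c_simd)
```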
Parallel computing opened the door to tackling complex scientific problems, including modeling the electrons in molecular systems by quantum mechanical means (the subject of this book). To give an example, optimizing the geometry of all but the smallest molecular systems using sophisticated electronic structure methods can take days (if not weeks) on a single processing element (compute core). Parallelizing the calculation over multiple compute cores can significantly cut down the required computing time and thus enables a researcher to study complex molecular systems in more practical time frames, achieving insights otherwise thought inaccessible. The use of parallel electronic computers in quantum chemistry was pioneered in the early 1980s by the Italian chemist Enrico Clementi and co-workers [7]. Their parallel computer consisted of 10 compute nodes, loosely coupled into an array, and was used to calculate the Hartree-Fock (HF) self-consistent field (SCF) energy of a small fragment of DNA represented by 315 basis functions. At the time this was a considerable achievement. However, it was just the start, and by the late 1980s parallel programs had been developed for a wide range of quantum chemistry methods. These included HF methods to calculate the energy and nuclear gradients of a molecular system [8-11], the transformation of two-electron integrals [8, 9, 12], second-order Møller-Plesset perturbation theory [9, 13], and the configuration interaction method [8]. The development of parallel computing in quantum chemistry was dictated by developments in the available technologies. In particular, the advent of application programming interfaces (APIs) such as the message-passing interface (MPI) library [14] made parallel computing much more accessible to quantum chemists, along with developments in hardware technology that drove down the cost of parallel computing machines [10].
While parallel computing has long found widespread use in scientific computing, until recently it was reserved for those with access to high-performance computing (HPC) resources. However, for reasons discussed in the following, all modern computer architectures exploit parallel technology, and effective parallel programming is vital to utilizing the computational power of modern devices. Parallel processing is now standard across all devices fitted with modern-day processor architectures. In his 1965 paper [15], Gordon E. Moore first observed that the number of transistors (in principle, directly related to performance) on integrated circuits was doubling approximately every 2 years (see Figure 1.3).
Figure 1.3 Microprocessor transistor counts 1971-2011. Until recently, the number of transistors on integrated circuits has been following Moore's Law [16], doubling approximately every 2 years
Since this observation was announced, the semiconductor industry has preserved the trend by ensuring that chip performance doubles roughly every 18 months through improved transistor efficiency and/or transistor count. To meet these performance goals, the semiconductor industry has now pushed chip design close to the limits of what is physically possible: the minimum feasible size of a transistor, the rate at which heat can be dissipated, and the speed of light.
"The size of transistors is approaching the size of atoms, which is a fundamental barrier" [17].
At the same time, the clock frequencies cannot be easily increased since both clock frequency and transistor density increase the power density, as illustrated by Figure 1.4. Processors are already operating at a power density that exceeds that of a hot plate and are approaching that of the core of a nuclear reactor.
Figure 1.4 Illustration of the ever-increasing power density within silicon chips with decreasing gate length. Courtesy Intel Corporation [18]
In order to continue scaling with Moore's law while keeping power densities manageable, chip manufacturers have taken to increasing the number of cores per processor rather than the number of transistors per core. Most processors produced today comprise multiple cores and so are, by definition, parallel processing machines. In terms of processor performance this is a tremendous boon to science and industry; however, the increasing number of cores brings with it increased complexity for the programmer, who must work harder to fully utilize the available compute power. It is becoming more and more difficult for applications to achieve good scaling with increasing core counts, and hence...