List of Contributors
Preface
Acknowledgments
Glossary
Abbreviations
1. Why Graphics Processing Units (Perri Needham, Andreas W. Götz and Ross C. Walker)
1.1 A Historical Perspective of Parallel Computing
1.2 The Rise of the GPU
1.3 Parallel Computing on Central Processing Units
1.4 Parallel Computing on Graphics Processing Units
1.5 GPU-Accelerated Applications
References
2. GPUs: Hardware to Software (Perri Needham, Andreas W. Götz and Ross C. Walker)
2.1 Basic GPU Terminology
2.2 Architecture of GPUs
2.3 CUDA Programming Model
2.4 Programming and Optimization Concepts
2.5 Software Libraries for GPUs
2.6 Special Features of CUDA-Enabled GPUs
References
3. Overview of Electronic Structure Methods (Andreas W. Götz)
3.1 Introduction
3.2 Hartree-Fock Theory
3.3 Density Functional Theory
3.4 Basis Sets
3.5 Semiempirical Methods
3.6 Density Functional Tight Binding
3.7 Wave Function-Based Electron Correlation Methods
Acknowledgments
References
4. Gaussian Basis Set Hartree-Fock, Density Functional Theory, and Beyond on GPUs (Nathan Luehr, Aaron Sisto and Todd J. Martínez)
4.1 Quantum Chemistry Review
4.2 Hardware and CUDA Overview
4.3 GPU ERI Evaluation
4.4 Integral-Direct Fock Construction on GPUs
4.5 Precision Considerations
4.6 Post-SCF Methods
4.7 Example Calculations
4.8 Conclusions and Outlook
References
5. GPU Acceleration for Density Functional Theory with Slater-Type Orbitals (Hans van Schoot and Lucas Visscher)
5.1 Background
5.2 Theory and CPU Implementation
5.3 GPU Implementation
5.4 Conclusion
References
6. Wavelet-Based Density Functional Theory on Massively Parallel Hybrid Architectures (Luigi Genovese, Brice Videau, Damien Caliste, Jean-François Méhaut, Stefan Goedecker and Thierry Deutsch)
6.1 Introductory Remarks on Wavelet Basis Sets for Density Functional Theory Implementations
6.2 Operators in Wavelet Basis Sets
6.3 Parallelization
6.4 GPU Architecture
6.5 Conclusions and Outlook
References
7. Plane-Wave Density Functional Theory (Maxwell Hutchinson, Paul Fleurat-Lessard, Ani Anciaux-Sedrakian, Dusan Stosic, Jeroen Bédorf and Sarah Tariq)
7.1 Introduction
7.2 Theoretical Background
7.3 Implementation
7.4 Optimizations
7.5 Performance Examples
7.6 Exact Exchange with Plane Waves
7.7 Summary and Outlook
Acknowledgments
References
Appendix A: Definitions and Conventions
Appendix B: Example Kernels
8. GPU-Accelerated Sparse Matrix-Matrix Multiplication for Linear Scaling Density Functional Theory (Ole Schütt, Peter Messmer, Jürg Hutter and Joost VandeVondele)
8.1 Introduction
8.2 Software Architecture for GPU-Acceleration
8.3 Maximizing Asynchronous Progress
8.4 Libcusmm: GPU Accelerated Small Matrix Multiplications
8.5 Benchmarks and Conclusions
Acknowledgments
References
9. Grid-Based Projector-Augmented Wave Method (Samuli Hakala, Jussi Enkovaara, Ville Havu, Jun Yan, Lin Li, Chris O'Grady and Risto M. Nieminen)
9.1 Introduction
9.2 General Overview
9.3 Using GPUs in Ground-State Calculations
9.4 Time-Dependent Density Functional Theory
9.5 Random Phase Approximation for the Correlation Energy
9.6 Summary and Outlook
Acknowledgments
References
10. Application of Graphics Processing Units to Accelerate Real-Space Density Functional Theory and Time-Dependent Density Functional Theory Calculations (Xavier Andrade and Alán Aspuru-Guzik)
10.1 Introduction
10.2 The Real-Space Representation
10.3 Numerical Aspects of the Real-Space Approach
10.4 General GPU Optimization Strategy
10.5 Kohn-Sham Hamiltonian
10.6 Orthogonalization and Subspace Diagonalization
10.7 Exponentiation
10.8 The Hartree Potential
10.9 Other Operations
10.10 Numerical Performance
10.11 Conclusions
10.12 Computational Methods
Acknowledgments
References
11. Semiempirical Quantum Chemistry (Xin Wu, Axel Koslowski and Walter Thiel)
11.1 Introduction
11.2 Overview of Semiempirical Methods
11.3 Computational Bottlenecks
11.4 Profile-Guided Optimization for the Hybrid Platform
11.5 Performance
11.6 Applications
11.7 Conclusion
Acknowledgement
References
12. GPU Acceleration of Second-Order Møller-Plesset Perturbation Theory with Resolution of Identity (Roberto Olivares-Amaya, Adrian Jinich, Mark A. Watson and Alán Aspuru-Guzik)
12.1 Møller-Plesset Perturbation Theory with Resolution of Identity Approximation (RI-MP2)
12.2 A Mixed-Precision Matrix Multiplication Library
12.3 Performance of Accelerated RI-MP2
12.4 Example Applications
12.5 Conclusions
References
13. Iterative Coupled-Cluster Methods on Graphics Processing Units (A. Eugene DePrince III, Jeff R. Hammond and C. David Sherrill)
13.1 Introduction
13.2 Related Work
13.3 Theory
13.4 Algorithm Details
13.5 Computational Details
13.6 Results
13.7 Conclusions
Acknowledgments
References
14. Perturbative Coupled-Cluster Methods on Graphics Processing Units: Single- and Multi-Reference Formulations (Wenjing Ma, Kiran Bhaskaran-Nair, Oreste Villa, Edoardo Aprà, Antonino Tumeo, Sriram Krishnamoorthy and Karol Kowalski)
14.1 Introduction
14.2 Overview of Electronic Structure Methods
14.3 NWChem Software Architecture
14.4 GPU Implementation
14.5 Performance
14.6 Outlook
Acknowledgments
References
Index
Perri Needham¹, Andreas W. Götz² and Ross C. Walker¹,²
¹ San Diego Supercomputer Center, UCSD, La Jolla, CA, USA
² Department of Chemistry and Biochemistry, UCSD, La Jolla, CA, USA
The first general-purpose electronic computers capable of storing instructions came into existence in 1950. That is not to say, however, that the use of computers to solve electronic structure problems had not already been considered, or even realized. From as early as 1930, scientists used a less advanced form of computation to solve their quantum mechanical problems: a group of assistants working simultaneously on mechanical calculators, which was nonetheless an early parallel computing "machine" [1]. It was clear from the beginning that solutions to electronic structure problems could not be carried forward to many-electron systems without some computational device to lessen the mathematical burden. Today's computational scientists rely heavily on parallel electronic computers.
Parallel electronic computers can be broadly classified as having either multiple processing elements in the same machine (shared memory) or multiple machines coupled together to form a cluster/grid of processing elements (distributed memory). These arrangements make it possible to perform calculations concurrently across multiple processing elements, enabling large problems to be broken down into smaller parts that can be solved simultaneously (in parallel).
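The idea of breaking one large problem into parts that are computed simultaneously can be made concrete with a small example. The Python sketch below (illustrative only, not taken from this chapter) splits a large summation across several worker processes on a shared-memory machine; the function names, the problem size, and the number of workers are arbitrary choices made for the illustration.

```python
# Minimal sketch (not from the chapter): one large task is broken into
# smaller parts that are computed simultaneously on a shared-memory machine.
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi): one 'smaller part' of the full problem."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    with Pool(workers) as pool:            # one process per processing element
        total = sum(pool.map(partial_sum, chunks))
    print(total == n * (n - 1) // 2)       # sanity check against the closed form
```

A distributed-memory version of the same decomposition would exchange the partial results between machines explicitly, for example with a message-passing library, rather than collecting them within a single process's address space.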
The first electronic computers were primarily designed for, and funded by, military projects to assist in World War II and the start of the Cold War [2]. The first working programmable digital computer, Konrad Zuse's Z3 [3], was an electromechanical device that became operational in 1941 and was used by the German aeronautical research organization. Colossus, developed by the British for cryptanalysis during World War II, was the world's first programmable electronic digital computer and was responsible for the decryption of valuable German military intelligence from 1944 onwards. Colossus was a purpose-built machine that read encrypted messages from paper tape and determined the encryption settings of the German Lorenz cipher. It was not until 1955, however, that the first commercially available general-purpose machine capable of floating-point arithmetic appeared: the IBM 704 (see Figure 1.1).
Figure 1.1 Photograph taken in 1957 at NASA featuring an IBM 704 computer, the first commercially available general-purpose computer with floating-point arithmetic hardware [4]
A common measure of compute performance is floating-point operations per second (FLOPS). The IBM 704 was capable of a mere 12,000 floating-point additions per second and required 1500-2000 ft² of floor space. Compare this to modern smartphones, which are capable of around 1.5 gigaFLOPS [5] thanks to the invention of the integrated circuit in 1958 and six subsequent decades of refinement. To put this in perspective, if the floor footprint of an IBM 704 were instead covered with modern-day smartphones laid side by side, the computational capacity of that floor space would grow from 12,000 to around 20,000,000,000,000 (2 × 10¹³) FLOPS. This is equivalent to every person on the planet carrying out roughly 2800 floating-point additions per second. Statistics like these make it exceptionally clear just how far computer technology has advanced, and, while mobile internet and games might seem like the apex of the technology's capabilities, this progress has also opened doorways to explore scientific questions computationally in ways previously believed impossible.
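The back-of-the-envelope comparison above can be reproduced in a few lines. In the sketch below the smartphone footprint, the exact floor area, and the world population are illustrative assumptions rather than figures from the text, so the result only needs to land in the same ballpark as the roughly 2 × 10¹³ FLOPS quoted above.

```python
# Rough check of the floor-space comparison; all inputs are illustrative assumptions.
FT2_TO_CM2 = 929.03                  # one square foot in cm^2
floor_cm2 = 1750 * FT2_TO_CM2        # midpoint of the 1500-2000 ft^2 range
phone_cm2 = 7 * 14                   # assumed ~7 cm x 14 cm per handset
phone_flops = 1.5e9                  # ~1.5 gigaFLOPS per smartphone [5]
population = 7.3e9                   # assumed world population

phones = floor_cm2 / phone_cm2
total_flops = phones * phone_flops
print(f"{phones:,.0f} phones -> {total_flops:.2e} FLOPS")         # ~2e13 FLOPS
print(f"{total_flops / population:,.0f} additions per person/s")  # a few thousand
```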
Computers today find use in many different areas of science and industry, from weather forecasting and film making to genetic research, drug discovery, and nuclear weapon design. Without computers, many scientific endeavors would not be possible.
While the performance of individual computers continued to advance, the thirst for computational power for scientific simulation was such that by the late 1950s discussions had turned to utilizing multiple processors, working in harmony, to address more complex scientific problems. The 1960s saw the birth of parallel computing with the invention of multiprocessor systems. The first recorded example of a commercially available multiprocessor (parallel) computer was Burroughs Corporation's D825, released in 1962, which had four processors that accessed up to 16 memory modules via a crossbar switch (see Figure 1.2).
Figure 1.2 Photograph of Burroughs Corporation's D825 parallel computer [6]
This was followed in the 1970s by the concept of single-instruction multiple-data (SIMD) processor architectures, forming the basis of vector parallel computing. SIMD is an important concept in graphics processing unit (GPU) computing and is discussed in the next chapter.
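As a rough illustration of the SIMD idea, that is, applying a single instruction to many data elements at once, the sketch below contrasts an explicit element-by-element loop with a vectorized NumPy expression. NumPy is used here only as an analogy for data-parallel execution; it is not GPU code and is not drawn from this chapter.

```python
# Illustration of the SIMD idea: one operation applied to many data elements.
import numpy as np

a = np.arange(100_000, dtype=np.float32)
b = np.ones_like(a)

# Scalar style: one addition per loop iteration.
c_loop = np.empty_like(a)
for i in range(a.size):
    c_loop[i] = a[i] + b[i]

# Data-parallel style: the single expression "a + b" applies the same
# addition across all elements, much as SIMD hardware applies one
# instruction to a whole vector of operands.
c_simd = a + b

assert np.array_equal(c_loop, c_simd)
```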
Parallel computing opened the door to tackling complex scientific problems, including modeling the electrons in molecular systems by quantum mechanical means (the subject of this book). To give an example, optimizing the geometry of all but the smallest molecular systems using sophisticated electronic structure methods can take days (if not weeks) on a single processing element (compute core). Parallelizing the calculation over multiple compute cores can significantly cut down the required computing time and thus enables a researcher to study complex molecular systems in more practical time frames, achieving insights otherwise thought inaccessible. The use of parallel electronic computers in quantum chemistry was pioneered in the early 1980s by the Italian chemist Enrico Clementi and co-workers [7]. Their parallel computer consisted of 10 compute nodes, loosely coupled into an array, and was used to calculate the Hartree-Fock (HF) self-consistent field (SCF) energy of a small fragment of DNA represented by 315 basis functions. At the time this was a considerable achievement. However, it was just the start, and by the late 1980s parallel programs had been developed for a wide range of quantum chemistry methods. These included HF methods to calculate the energy and nuclear gradients of a molecular system [8-11], the transformation of two-electron integrals [8, 9, 12], second-order Møller-Plesset perturbation theory [9, 13], and the configuration interaction method [8]. The development of parallel computing in quantum chemistry was dictated by developments in the available technologies. In particular, the advent of application programming interfaces (APIs) such as the message-passing interface (MPI) library [14] made parallel computing much more accessible to quantum chemists, along with developments in hardware technology that drove down the cost of parallel computing machines [10].
While parallel computing has long found widespread use in scientific computing, until recently it was reserved for those with access to high-performance computing (HPC) resources. However, for reasons discussed in the following, all modern computer architectures exploit parallel technology, and effective parallel programming is vital to utilizing the computational power of modern devices. Parallel processing is now standard across all devices fitted with modern-day processor architectures. In his 1965 paper [15], Gordon E. Moore first observed that the number of transistors (in principle, directly related to performance) on integrated circuits was doubling approximately every 2 years (see Figure 1.3).
Figure 1.3 Microprocessor transistor counts 1971-2011. Until recently, the number of transistors on integrated circuits has been following Moore's Law [16], doubling approximately every 2 years
Since this observation was announced, the semiconductor industry has preserved the trend by ensuring that chip performance doubles roughly every 18 months through improved transistor efficiency and/or transistor count. To meet these performance goals, the semiconductor industry has now pushed chip design close to the limits of what is physically possible: the minimum feasible size of a transistor, the rate at which heat can be dissipated, and the speed of light.
"The size of transistors is approaching the size of atoms, which is a fundamental barrier" [17].
At the same time, the clock frequencies cannot be easily increased since both clock frequency and transistor density increase the power density, as illustrated by Figure 1.4. Processors are already operating at a power density that exceeds that of a hot plate and are approaching that of the core of a nuclear reactor.
Figure 1.4 Illustration of the ever-increasing power density within silicon chips with decreasing gate length. Courtesy Intel Corporation [18]
In order to continue scaling with Moore's law while keeping power densities manageable, chip manufacturers have taken to increasing the number of cores per processor rather than the number of transistors per core. Most processors produced today comprise multiple cores and so are, by definition, parallel processing machines. In terms of processor performance this is a tremendous boon to science and industry; however, the increasing number of cores brings with it increased complexity for the programmer, who must work harder to fully utilize the available compute power. It is becoming more and more difficult for applications to achieve good scaling with increasing core counts, and hence...