High Performance Python

Practical Performant Programming for Humans
 
 
O'Reilly (Publisher)
  • Published April 30, 2020
  • 468 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-4920-5497-9 (ISBN)
 
Your Python code may run correctly, but you need it to run faster. Updated for Python 3, this expanded edition shows you how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. By exploring the fundamental theory behind design choices, High Performance Python helps you gain a deeper understanding of Python's implementation.

How do you take advantage of multicore architectures or clusters? Or build a system that scales up and down without losing reliability? Experienced Python programmers will learn concrete solutions to many issues, along with war stories from companies that use high-performance Python for social media analytics, productionized machine learning, and more.

  • Get a better grasp of NumPy, Cython, and profilers
  • Learn how Python abstracts the underlying computer architecture
  • Use profiling to find bottlenecks in CPU time and memory usage
  • Write efficient programs by choosing appropriate data structures
  • Speed up matrix and vector computations
  • Use tools to compile Python down to machine code
  • Manage multiple I/O and computational operations concurrently
  • Convert multiprocessing code to run on local or remote clusters
  • Deploy code faster using tools like Docker

  • English
  • Sebastopol, USA
  • 6.67 MB
  • Intro
  • Foreword
  • Preface
  • Who This Book Is For
  • Who This Book Is Not For
  • What You'll Learn
  • Python 3
  • Changes from Python 2.7
  • License
  • How to Make an Attribution
  • Errata and Feedback
  • Conventions Used in This Book
  • Using Code Examples
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • 1. Understanding Performant Python
  • The Fundamental Computer System
  • Computing Units
  • Memory Units
  • Communications Layers
  • Putting the Fundamental Elements Together
  • Idealized Computing Versus the Python Virtual Machine
  • Idealized computing
  • Python's virtual machine
  • So Why Use Python?
  • How to Be a Highly Performant Programmer
  • Good Working Practices
  • Some Thoughts on Good Notebook Practice
  • Getting the Joy Back into Your Work
  • 2. Profiling to Find Bottlenecks
  • Profiling Efficiently
  • Introducing the Julia Set
  • Calculating the Full Julia Set
  • Simple Approaches to Timing: print and a Decorator
  • Simple Timing Using the Unix time Command
  • Using the cProfile Module
  • Visualizing cProfile Output with SnakeViz
  • Using line_profiler for Line-by-Line Measurements
  • Using memory_profiler to Diagnose Memory Usage
  • Introspecting an Existing Process with PySpy
  • Bytecode: Under the Hood
  • Using the dis Module to Examine CPython Bytecode
  • Different Approaches, Different Complexity
  • Unit Testing During Optimization to Maintain Correctness
  • No-op @profile Decorator
  • Strategies to Profile Your Code Successfully
  • Wrap-Up
  • 3. Lists and Tuples
  • A More Efficient Search
  • Lists Versus Tuples
  • Lists as Dynamic Arrays
  • Tuples as Static Arrays
  • Wrap-Up
  • 4. Dictionaries and Sets
  • How Do Dictionaries and Sets Work?
  • Inserting and Retrieving
  • Deletion
  • Resizing
  • Hash Functions and Entropy
  • Dictionaries and Namespaces
  • Wrap-Up
  • 5. Iterators and Generators
  • Iterators for Infinite Series
  • Lazy Generator Evaluation
  • Wrap-Up
  • 6. Matrix and Vector Computation
  • Introduction to the Problem
  • Aren't Python Lists Good Enough?
  • Problems with Allocating Too Much
  • Memory Fragmentation
  • Understanding perf
  • Making Decisions with perf's Output
  • Enter numpy
  • Applying numpy to the Diffusion Problem
  • Memory Allocations and In-Place Operations
  • Selective Optimizations: Finding What Needs to Be Fixed
  • numexpr: Making In-Place Operations Faster and Easier
  • A Cautionary Tale: Verify "Optimizations" (scipy)
  • Lessons from Matrix Optimizations
  • Pandas
  • Pandas's Internal Model
  • Applying a Function to Many Rows of Data
  • Which OLS implementation should we use?
  • Applying lstsq to our rows of data
  • Building DataFrames and Series from Partial Results Rather than Concatenating
  • There's More Than One (and Possibly a Faster) Way to Do a Job
  • Advice for Effective Pandas Development
  • Wrap-Up
  • 7. Compiling to C
  • What Sort of Speed Gains Are Possible?
  • JIT Versus AOT Compilers
  • Why Does Type Information Help the Code Run Faster?
  • Using a C Compiler
  • Reviewing the Julia Set Example
  • Cython
  • Compiling a Pure Python Version Using Cython
  • pyximport
  • Cython Annotations to Analyze a Block of Code
  • Adding Some Type Annotations
  • Cython and numpy
  • Parallelizing the Solution with OpenMP on One Machine
  • Numba
  • Numba to Compile NumPy for Pandas
  • PyPy
  • Garbage Collection Differences
  • Running PyPy and Installing Modules
  • A Summary of Speed Improvements
  • When to Use Each Technology
  • Other Upcoming Projects
  • Graphics Processing Units (GPUs)
  • Dynamic Graphs: PyTorch
  • Basic GPU Profiling
  • Performance Considerations of GPUs
  • When to Use GPUs
  • Foreign Function Interfaces
  • ctypes
  • cffi
  • f2py
  • CPython Module
  • Wrap-Up
  • 8. Asynchronous I/O
  • Introduction to Asynchronous Programming
  • How Does async/await Work?
  • Serial Crawler
  • Gevent
  • tornado
  • aiohttp
  • Shared CPU-I/O Workload
  • Serial
  • Batched Results
  • Full Async
  • Wrap-Up
  • 9. The multiprocessing Module
  • An Overview of the multiprocessing Module
  • Estimating Pi Using the Monte Carlo Method
  • Estimating Pi Using Processes and Threads
  • Using Python Objects
  • Replacing multiprocessing with Joblib
  • Intelligent caching of function call results
  • Random Numbers in Parallel Systems
  • Using numpy
  • Finding Prime Numbers
  • Queues of Work
  • Asynchronously adding jobs to the Queue
  • Verifying Primes Using Interprocess Communication
  • Serial Solution
  • Naive Pool Solution
  • A Less Naive Pool Solution
  • Using Manager.Value as a Flag
  • Using Redis as a Flag
  • Using RawValue as a Flag
  • Using mmap as a Flag
  • Using mmap as a Flag Redux
  • Sharing numpy Data with multiprocessing
  • Synchronizing File and Variable Access
  • File Locking
  • Locking a Value
  • Wrap-Up
  • 10. Clusters and Job Queues
  • Benefits of Clustering
  • Drawbacks of Clustering
  • $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
  • Skype's 24-Hour Global Outage
  • Common Cluster Designs
  • How to Start a Clustered Solution
  • Ways to Avoid Pain When Using Clusters
  • Two Clustering Solutions
  • Using IPython Parallel to Support Research
  • Parallel Pandas with Dask
  • Parallelized apply with Swifter on Dask
  • Vaex for bigger-than-RAM DataFrames
  • NSQ for Robust Production Clustering
  • Queues
  • Pub/sub
  • Distributed Prime Calculation
  • Other Clustering Tools to Look At
  • Docker
  • Docker's Performance
  • Advantages of Docker
  • Wrap-Up
  • 11. Using Less RAM
  • Objects for Primitives Are Expensive
  • The array Module Stores Many Primitive Objects Cheaply
  • Using Less RAM in NumPy with NumExpr
  • Understanding the RAM Used in a Collection
  • Bytes Versus Unicode
  • Efficiently Storing Lots of Text in RAM
  • Trying These Approaches on 11 Million Tokens
  • list
  • set
  • More efficient tree structures
  • Directed acyclic word graph
  • Marisa trie
  • Using tries (and DAWGs) in production systems
  • Modeling More Text with Scikit-Learn's FeatureHasher
  • Introducing DictVectorizer and FeatureHasher
  • Comparing DictVectorizer and FeatureHasher on a Real Problem
  • SciPy's Sparse Matrices
  • Tips for Using Less RAM
  • Probabilistic Data Structures
  • Very Approximate Counting with a 1-Byte Morris Counter
  • K-Minimum Values
  • Bloom Filters
  • LogLog Counter
  • Real-World Example
  • 12. Lessons from the Field
  • Streamlining Feature Engineering Pipelines with Feature-engine
  • Feature Engineering for Machine Learning
  • The Hard Task of Deploying Feature Engineering Pipelines
  • Leveraging the Power of Open Source Python Libraries
  • Feature-engine Smooths Building and Deployment of Feature Engineering Pipelines
  • Helping with the Adoption of a New Open Source Package
  • Developing, Maintaining, and Encouraging Contribution to Open Source Libraries
  • Highly Performant Data Science Teams
  • How Long Will It Take?
  • Discovery and Planning
  • Managing Expectations and Delivery
  • Numba
  • A Simple Example
  • Best Practices and Recommendations
  • Getting Help
  • Optimizing Versus Thinking
  • Adaptive Lab's Social Media Analytics (2014)
  • Python at Adaptive Lab
  • SoMA's Design
  • Our Development Methodology
  • Maintaining SoMA
  • Advice for Fellow Engineers
  • Making Deep Learning Fly with RadimRehurek.com (2014)
  • The Sweet Spot
  • Lessons in Optimizing
  • Conclusion
  • Large-Scale Productionized Machine Learning at Lyst.com (2014)
  • Cluster Design
  • Code Evolution in a Fast-Moving Start-Up
  • Building the Recommendation Engine
  • Reporting and Monitoring
  • Some Advice
  • Large-Scale Social Media Analysis at Smesh (2014)
  • Python's Role at Smesh
  • The Platform
  • High Performance Real-Time String Matching
  • Reporting, Monitoring, Debugging, and Deployment
  • PyPy for Successful Web and Data Processing Systems (2014)
  • Prerequisites
  • The Database
  • The Web Application
  • OCR and Translation
  • Task Distribution and Workers
  • Conclusion
  • Task Queues at Lanyrd.com (2014)
  • Python's Role at Lanyrd
  • Making the Task Queue Performant
  • Reporting, Monitoring, Debugging, and Deployment
  • Advice to a Fellow Developer
  • Index

File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; macOS; Linux): Install the free Adobe Digital Editions software before downloading (see E-Book Help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-Book Help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The ePUB file format is well suited to novels and nonfiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection: if the necessary requirements are not met, you will not be able to open the e-book. You must therefore set up your reading hardware before downloading.

When using the Adobe Digital Editions reading software, please note: after installing it, we strongly recommend authorizing it with your personal Adobe ID.

You can find more information in our E-Book Help.


Download (available immediately)

€58.49
incl. 5% VAT
Download / single-user license
ePUB with Adobe DRM
see system requirements