
Python Data Science Handbook
Description
Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all: IPython, NumPy, pandas, Matplotlib, scikit-learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you'll learn how:
- IPython and Jupyter provide computational environments for scientists using Python
- NumPy includes the ndarray for efficient storage and manipulation of dense data arrays
- Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data
- Matplotlib includes capabilities for a flexible range of data visualizations
- Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms

Content
- Intro
- Copyright
- Table of Contents
- Preface
- What Is Data Science?
- Who Is This Book For?
- Why Python?
- Outline of the Book
- Installation Considerations
- Conventions Used in This Book
- Using Code Examples
- O'Reilly Online Learning
- How to Contact Us
- Part I. Jupyter: Beyond Normal Python
- Chapter 1. Getting Started in IPython and Jupyter
- Launching the IPython Shell
- Launching the Jupyter Notebook
- Help and Documentation in IPython
- Accessing Documentation with ?
- Accessing Source Code with ??
- Exploring Modules with Tab Completion
- Keyboard Shortcuts in the IPython Shell
- Navigation Shortcuts
- Text Entry Shortcuts
- Command History Shortcuts
- Miscellaneous Shortcuts
- Chapter 2. Enhanced Interactive Features
- IPython Magic Commands
- Running External Code: %run
- Timing Code Execution: %timeit
- Help on Magic Functions: ?, %magic, and %lsmagic
- Input and Output History
- IPython's In and Out Objects
- Underscore Shortcuts and Previous Outputs
- Suppressing Output
- Related Magic Commands
- IPython and Shell Commands
- Quick Introduction to the Shell
- Shell Commands in IPython
- Passing Values to and from the Shell
- Shell-Related Magic Commands
- Chapter 3. Debugging and Profiling
- Errors and Debugging
- Controlling Exceptions: %xmode
- Debugging: When Reading Tracebacks Is Not Enough
- Profiling and Timing Code
- Timing Code Snippets: %timeit and %time
- Profiling Full Scripts: %prun
- Line-by-Line Profiling with %lprun
- Profiling Memory Use: %memit and %mprun
- More IPython Resources
- Web Resources
- Books
- Part II. Introduction to NumPy
- Chapter 4. Understanding Data Types in Python
- A Python Integer Is More Than Just an Integer
- A Python List Is More Than Just a List
- Fixed-Type Arrays in Python
- Creating Arrays from Python Lists
- Creating Arrays from Scratch
- NumPy Standard Data Types
- Chapter 5. The Basics of NumPy Arrays
- NumPy Array Attributes
- Array Indexing: Accessing Single Elements
- Array Slicing: Accessing Subarrays
- One-Dimensional Subarrays
- Multidimensional Subarrays
- Subarrays as No-Copy Views
- Creating Copies of Arrays
- Reshaping of Arrays
- Array Concatenation and Splitting
- Concatenation of Arrays
- Splitting of Arrays
- Chapter 6. Computation on NumPy Arrays: Universal Functions
- The Slowness of Loops
- Introducing Ufuncs
- Exploring NumPy's Ufuncs
- Array Arithmetic
- Absolute Value
- Trigonometric Functions
- Exponents and Logarithms
- Specialized Ufuncs
- Advanced Ufunc Features
- Specifying Output
- Aggregations
- Outer Products
- Ufuncs: Learning More
- Chapter 7. Aggregations: min, max, and Everything in Between
- Summing the Values in an Array
- Minimum and Maximum
- Multidimensional Aggregates
- Other Aggregation Functions
- Example: What Is the Average Height of US Presidents?
- Chapter 8. Computation on Arrays: Broadcasting
- Introducing Broadcasting
- Rules of Broadcasting
- Broadcasting Example 1
- Broadcasting Example 2
- Broadcasting Example 3
- Broadcasting in Practice
- Centering an Array
- Plotting a Two-Dimensional Function
- Chapter 9. Comparisons, Masks, and Boolean Logic
- Example: Counting Rainy Days
- Comparison Operators as Ufuncs
- Working with Boolean Arrays
- Counting Entries
- Boolean Operators
- Boolean Arrays as Masks
- Using the Keywords and/or Versus the Operators &/|
- Chapter 10. Fancy Indexing
- Exploring Fancy Indexing
- Combined Indexing
- Example: Selecting Random Points
- Modifying Values with Fancy Indexing
- Example: Binning Data
- Chapter 11. Sorting Arrays
- Fast Sorting in NumPy: np.sort and np.argsort
- Sorting Along Rows or Columns
- Partial Sorts: Partitioning
- Example: k-Nearest Neighbors
- Chapter 12. Structured Data: NumPy's Structured Arrays
- Exploring Structured Array Creation
- More Advanced Compound Types
- Record Arrays: Structured Arrays with a Twist
- On to Pandas
- Part III. Data Manipulation with Pandas
- Chapter 13. Introducing Pandas Objects
- The Pandas Series Object
- Series as Generalized NumPy Array
- Series as Specialized Dictionary
- Constructing Series Objects
- The Pandas DataFrame Object
- DataFrame as Generalized NumPy Array
- DataFrame as Specialized Dictionary
- Constructing DataFrame Objects
- The Pandas Index Object
- Index as Immutable Array
- Index as Ordered Set
- Chapter 14. Data Indexing and Selection
- Data Selection in Series
- Series as Dictionary
- Series as One-Dimensional Array
- Indexers: loc and iloc
- Data Selection in DataFrames
- DataFrame as Dictionary
- DataFrame as Two-Dimensional Array
- Additional Indexing Conventions
- Chapter 15. Operating on Data in Pandas
- Ufuncs: Index Preservation
- Ufuncs: Index Alignment
- Index Alignment in Series
- Index Alignment in DataFrames
- Ufuncs: Operations Between DataFrames and Series
- Chapter 16. Handling Missing Data
- Trade-offs in Missing Data Conventions
- Missing Data in Pandas
- None as a Sentinel Value
- NaN: Missing Numerical Data
- NaN and None in Pandas
- Pandas Nullable Dtypes
- Operating on Null Values
- Detecting Null Values
- Dropping Null Values
- Filling Null Values
- Chapter 17. Hierarchical Indexing
- A Multiply Indexed Series
- The Bad Way
- The Better Way: The Pandas MultiIndex
- MultiIndex as Extra Dimension
- Methods of MultiIndex Creation
- Explicit MultiIndex Constructors
- MultiIndex Level Names
- MultiIndex for Columns
- Indexing and Slicing a MultiIndex
- Multiply Indexed Series
- Multiply Indexed DataFrames
- Rearranging Multi-Indexes
- Sorted and Unsorted Indices
- Stacking and Unstacking Indices
- Index Setting and Resetting
- Chapter 18. Combining Datasets: concat and append
- Recall: Concatenation of NumPy Arrays
- Simple Concatenation with pd.concat
- Duplicate Indices
- Concatenation with Joins
- The append Method
- Chapter 19. Combining Datasets: merge and join
- Relational Algebra
- Categories of Joins
- One-to-One Joins
- Many-to-One Joins
- Many-to-Many Joins
- Specification of the Merge Key
- The on Keyword
- The left_on and right_on Keywords
- The left_index and right_index Keywords
- Specifying Set Arithmetic for Joins
- Overlapping Column Names: The suffixes Keyword
- Example: US States Data
- Chapter 20. Aggregation and Grouping
- Planets Data
- Simple Aggregation in Pandas
- groupby: Split, Apply, Combine
- Split, Apply, Combine
- The GroupBy Object
- Aggregate, Filter, Transform, Apply
- Specifying the Split Key
- Grouping Example
- Chapter 21. Pivot Tables
- Motivating Pivot Tables
- Pivot Tables by Hand
- Pivot Table Syntax
- Multilevel Pivot Tables
- Additional Pivot Table Options
- Example: Birthrate Data
- Chapter 22. Vectorized String Operations
- Introducing Pandas String Operations
- Tables of Pandas String Methods
- Methods Similar to Python String Methods
- Methods Using Regular Expressions
- Miscellaneous Methods
- Example: Recipe Database
- A Simple Recipe Recommender
- Going Further with Recipes
- Chapter 23. Working with Time Series
- Dates and Times in Python
- Native Python Dates and Times: datetime and dateutil
- Typed Arrays of Times: NumPy's datetime64
- Dates and Times in Pandas: The Best of Both Worlds
- Pandas Time Series: Indexing by Time
- Pandas Time Series Data Structures
- Regular Sequences: pd.date_range
- Frequencies and Offsets
- Resampling, Shifting, and Windowing
- Resampling and Converting Frequencies
- Time Shifts
- Rolling Windows
- Example: Visualizing Seattle Bicycle Counts
- Visualizing the Data
- Digging into the Data
- Chapter 24. High-Performance Pandas: eval and query
- Motivating query and eval: Compound Expressions
- pandas.eval for Efficient Operations
- DataFrame.eval for Column-Wise Operations
- Assignment in DataFrame.eval
- Local Variables in DataFrame.eval
- The DataFrame.query Method
- Performance: When to Use These Functions
- Further Resources
- Part IV. Visualization with Matplotlib
- Chapter 25. General Matplotlib Tips
- Importing Matplotlib
- Setting Styles
- show or No show? How to Display Your Plots
- Plotting from a Script
- Plotting from an IPython Shell
- Plotting from a Jupyter Notebook
- Saving Figures to File
- Two Interfaces for the Price of One
- Chapter 26. Simple Line Plots
- Adjusting the Plot: Line Colors and Styles
- Adjusting the Plot: Axes Limits
- Labeling Plots
- Matplotlib Gotchas
- Chapter 27. Simple Scatter Plots
- Scatter Plots with plt.plot
- Scatter Plots with plt.scatter
- plot Versus scatter: A Note on Efficiency
- Visualizing Uncertainties
- Basic Errorbars
- Continuous Errors
- Chapter 28. Density and Contour Plots
- Visualizing a Three-Dimensional Function
- Histograms, Binnings, and Density
- Two-Dimensional Histograms and Binnings
- plt.hist2d: Two-Dimensional Histogram
- plt.hexbin: Hexagonal Binnings
- Kernel Density Estimation
- Chapter 29. Customizing Plot Legends
- Choosing Elements for the Legend
- Legend for Size of Points
- Multiple Legends
- Chapter 30. Customizing Colorbars
- Customizing Colorbars
- Choosing the Colormap
- Color Limits and Extensions
- Discrete Colorbars
- Example: Handwritten Digits
- Chapter 31. Multiple Subplots
- plt.axes: Subplots by Hand
- plt.subplot: Simple Grids of Subplots
- plt.subplots: The Whole Grid in One Go
- plt.GridSpec: More Complicated Arrangements
- Chapter 32. Text and Annotation
- Example: Effect of Holidays on US Births
- Transforms and Text Position
- Arrows and Annotation
- Chapter 33. Customizing Ticks
- Major and Minor Ticks
- Hiding Ticks or Labels
- Reducing or Increasing the Number of Ticks
- Fancy Tick Formats
- Summary of Formatters and Locators
- Chapter 34. Customizing Matplotlib: Configurations and Stylesheets
- Plot Customization by Hand
- Changing the Defaults: rcParams
- Stylesheets
- Default Style
- FiveThirtyEight Style
- ggplot Style
- Bayesian Methods for Hackers Style
- Dark Background Style
- Grayscale Style
- Seaborn Style
- Chapter 35. Three-Dimensional Plotting in Matplotlib
- Three-Dimensional Points and Lines
- Three-Dimensional Contour Plots
- Wireframes and Surface Plots
- Surface Triangulations
- Example: Visualizing a Möbius Strip
- Chapter 36. Visualization with Seaborn
- Exploring Seaborn Plots
- Histograms, KDE, and Densities
- Pair Plots
- Faceted Histograms
- Categorical Plots
- Joint Distributions
- Bar Plots
- Example: Exploring Marathon Finishing Times
- Further Resources
- Other Python Visualization Libraries
- Part V. Machine Learning
- Chapter 37. What Is Machine Learning?
- Categories of Machine Learning
- Qualitative Examples of Machine Learning Applications
- Classification: Predicting Discrete Labels
- Regression: Predicting Continuous Labels
- Clustering: Inferring Labels on Unlabeled Data
- Dimensionality Reduction: Inferring Structure of Unlabeled Data
- Summary
- Chapter 38. Introducing Scikit-Learn
- Data Representation in Scikit-Learn
- The Features Matrix
- The Target Array
- The Estimator API
- Basics of the API
- Supervised Learning Example: Simple Linear Regression
- Supervised Learning Example: Iris Classification
- Unsupervised Learning Example: Iris Dimensionality
- Unsupervised Learning Example: Iris Clustering
- Application: Exploring Handwritten Digits
- Loading and Visualizing the Digits Data
- Unsupervised Learning Example: Dimensionality Reduction
- Classification on Digits
- Summary
- Chapter 39. Hyperparameters and Model Validation
- Thinking About Model Validation
- Model Validation the Wrong Way
- Model Validation the Right Way: Holdout Sets
- Model Validation via Cross-Validation
- Selecting the Best Model
- The Bias-Variance Trade-off
- Validation Curves in Scikit-Learn
- Learning Curves
- Validation in Practice: Grid Search
- Summary
- Chapter 40. Feature Engineering
- Categorical Features
- Text Features
- Image Features
- Derived Features
- Imputation of Missing Data
- Feature Pipelines
- Chapter 41. In Depth: Naive Bayes Classification
- Bayesian Classification
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Example: Classifying Text
- When to Use Naive Bayes
- Chapter 42. In Depth: Linear Regression
- Simple Linear Regression
- Basis Function Regression
- Polynomial Basis Functions
- Gaussian Basis Functions
- Regularization
- Ridge Regression (L2 Regularization)
- Lasso Regression (L1 Regularization)
- Example: Predicting Bicycle Traffic
- Chapter 43. In Depth: Support Vector Machines
- Motivating Support Vector Machines
- Support Vector Machines: Maximizing the Margin
- Fitting a Support Vector Machine
- Beyond Linear Boundaries: Kernel SVM
- Tuning the SVM: Softening Margins
- Example: Face Recognition
- Summary
- Chapter 44. In Depth: Decision Trees and Random Forests
- Motivating Random Forests: Decision Trees
- Creating a Decision Tree
- Decision Trees and Overfitting
- Ensembles of Estimators: Random Forests
- Random Forest Regression
- Example: Random Forest for Classifying Digits
- Summary
- Chapter 45. In Depth: Principal Component Analysis
- Introducing Principal Component Analysis
- PCA as Dimensionality Reduction
- PCA for Visualization: Handwritten Digits
- What Do the Components Mean?
- Choosing the Number of Components
- PCA as Noise Filtering
- Example: Eigenfaces
- Summary
- Chapter 46. In Depth: Manifold Learning
- Manifold Learning: "HELLO"
- Multidimensional Scaling
- MDS as Manifold Learning
- Nonlinear Embeddings: Where MDS Fails
- Nonlinear Manifolds: Locally Linear Embedding
- Some Thoughts on Manifold Methods
- Example: Isomap on Faces
- Example: Visualizing Structure in Digits
- Chapter 47. In Depth: k-Means Clustering
- Introducing k-Means
- Expectation-Maximization
- Examples
- Example 1: k-Means on Digits
- Example 2: k-Means for Color Compression
- Chapter 48. In Depth: Gaussian Mixture Models
- Motivating Gaussian Mixtures: Weaknesses of k-Means
- Generalizing E-M: Gaussian Mixture Models
- Choosing the Covariance Type
- Gaussian Mixture Models as Density Estimation
- Example: GMMs for Generating New Data
- Chapter 49. In Depth: Kernel Density Estimation
- Motivating Kernel Density Estimation: Histograms
- Kernel Density Estimation in Practice
- Selecting the Bandwidth via Cross-Validation
- Example: Not-so-Naive Bayes
- Anatomy of a Custom Estimator
- Using Our Custom Estimator
- Chapter 50. Application: A Face Detection Pipeline
- HOG Features
- HOG in Action: A Simple Face Detector
- 1. Obtain a Set of Positive Training Samples
- 2. Obtain a Set of Negative Training Samples
- 3. Combine Sets and Extract HOG Features
- 4. Train a Support Vector Machine
- 5. Find Faces in a New Image
- Caveats and Improvements
- Further Machine Learning Resources
- Index
- About the Author
- Colophon
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so you need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.