
scikit-learn Cookbook, Second Edition

Person
Trent Hauck is a data scientist living and working in the Seattle area. He grew up in Wichita, Kansas, and received his undergraduate and graduate degrees from the University of Kansas. He is the author of Instant Data Intensive Apps with pandas How-to (Packt Publishing), a book that can get you up to speed quickly with pandas and associated technologies.
Content
- Cover
- Copyright
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Table of Contents
- Preface
- Chapter 1: High-Performance Machine Learning - NumPy
- Introduction
- NumPy basics
- How to do it...
- The shape and dimension of NumPy arrays
- NumPy broadcasting
- Initializing NumPy arrays and dtypes
- Indexing
- Boolean arrays
- Arithmetic operations
- NaN values
- How it works...
- Loading the iris dataset
- Getting ready
- How to do it...
- How it works...
- Viewing the iris dataset
- How to do it...
- How it works...
- There's more...
- Viewing the iris dataset with Pandas
- How to do it...
- How it works...
- Plotting with NumPy and matplotlib
- Getting ready
- How to do it...
- A minimal machine learning recipe - SVM classification
- Getting ready
- How to do it...
- How it works...
- There's more...
- Introducing cross-validation
- Getting ready
- How to do it...
- How it works...
- There's more...
- Putting it all together
- How to do it...
- There's more...
- Machine learning overview - classification versus regression
- The purpose of scikit-learn
- Supervised versus unsupervised
- Getting ready
- How to do it...
- Quick SVC - a classifier and regressor
- Making a scorer
- How it works...
- There's more...
- Linear versus nonlinear
- Black box versus not
- Interpretability
- A pipeline
- Chapter 2: Pre-Model Workflow and Pre-Processing
- Introduction
- Creating sample data for toy analysis
- Getting ready
- How to do it...
- Creating a regression dataset
- Creating an unbalanced classification dataset
- Creating a dataset for clustering
- How it works...
- Scaling data to the standard normal distribution
- Getting ready
- How to do it...
- How it works...
- Creating binary features through thresholding
- Getting ready
- How to do it...
- There's more...
- Sparse matrices
- The fit method
- Working with categorical variables
- Getting ready
- How to do it...
- How it works...
- There's more...
- DictVectorizer class
- Imputing missing values through various strategies
- Getting ready
- How to do it...
- How it works...
- There's more...
- A linear model in the presence of outliers
- Getting ready
- How to do it...
- How it works...
- Putting it all together with pipelines
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using Gaussian processes for regression
- Getting ready
- How to do it...
- Cross-validation with the noise parameter
- There's more...
- Using SGD for regression
- Getting ready
- How to do it...
- How it works...
- Chapter 3: Dimensionality Reduction
- Introduction
- Reducing dimensionality with PCA
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using factor analysis for decomposition
- Getting ready
- How to do it...
- How it works...
- Using kernel PCA for nonlinear dimensionality reduction
- Getting ready
- How to do it...
- How it works...
- Using truncated SVD to reduce dimensionality
- Getting ready
- How to do it...
- How it works...
- There's more...
- Sign flipping
- Sparse matrices
- Using decomposition to classify with DictionaryLearning
- Getting ready
- How to do it...
- How it works...
- Doing dimensionality reduction with manifolds - t-SNE
- Getting ready
- How to do it...
- How it works...
- Testing methods to reduce dimensionality with pipelines
- Getting ready
- How to do it...
- How it works...
- Chapter 4: Linear Models with scikit-learn
- Introduction
- Fitting a line through data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Fitting a line through data with machine learning
- Getting ready
- How to do it...
- Evaluating the linear regression model
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using ridge regression to overcome linear regression's shortfalls
- Getting ready
- How to do it...
- Optimizing the ridge regression parameter
- Getting ready
- How to do it...
- How it works...
- There's more...
- Bayesian ridge regression
- Using sparsity to regularize models
- Getting ready
- How to do it...
- How it works...
- LASSO cross-validation - LASSOCV
- LASSO for feature selection
- Taking a more fundamental approach to regularization with LARS
- Getting ready
- How to do it...
- How it works...
- There's more...
- References
- Chapter 5: Linear Models - Logistic Regression
- Introduction
- Using linear methods for classification - logistic regression
- Loading data from the UCI repository
- How to do it...
- Viewing the Pima Indians diabetes dataset with pandas
- How to do it...
- Looking at the UCI Pima Indians dataset web page
- How to do it...
- View the citation policy
- Read about missing values and context
- Machine learning with logistic regression
- Getting ready
- Define X, y - the feature and target arrays
- How to do it...
- Provide training and testing sets
- Train the logistic regression
- Score the logistic regression
- Examining logistic regression errors with a confusion matrix
- Getting ready
- How to do it...
- Reading the confusion matrix
- General confusion matrix in context
- Varying the classification threshold in logistic regression
- Getting ready
- How to do it...
- Receiver operating characteristic - ROC analysis
- Getting ready
- Sensitivity
- A visual perspective
- How to do it...
- Calculating TPR in scikit-learn
- Plotting sensitivity
- There's more...
- The confusion matrix in a non-medical context
- Plotting an ROC curve without context
- How to do it...
- Perfect classifier
- Imperfect classifier
- AUC - the area under the ROC curve
- Putting it all together - UCI breast cancer dataset
- How to do it...
- Outline for future projects
- Chapter 6: Building Models with Distance Metrics
- Introduction
- Using k-means to cluster data
- Getting ready
- How to do it...
- How it works...
- Optimizing the number of centroids
- Getting ready
- How to do it...
- How it works...
- Assessing cluster correctness
- Getting ready
- How to do it...
- There's more...
- Using MiniBatch k-means to handle more data
- Getting ready
- How to do it...
- How it works...
- Quantizing an image with k-means clustering
- Getting ready
- How to do it...
- How it works...
- Finding the closest object in the feature space
- Getting ready
- How to do it...
- How it works...
- There's more...
- Probabilistic clustering with Gaussian mixture models
- Getting ready
- How to do it...
- How it works...
- Using k-means for outlier detection
- Getting ready
- How to do it...
- How it works...
- Using KNN for regression
- Getting ready
- How to do it...
- How it works...
- Chapter 7: Cross-Validation and Post-Model Workflow
- Introduction
- Selecting a model with cross-validation
- Getting ready
- How to do it...
- How it works...
- K-fold cross validation
- Getting ready
- How to do it...
- There's more...
- Balanced cross-validation
- Getting ready
- How to do it...
- There's more...
- Cross-validation with ShuffleSplit
- Getting ready
- How to do it...
- Time series cross-validation
- Getting ready
- How to do it...
- There's more...
- Grid search with scikit-learn
- Getting ready
- How to do it...
- How it works...
- Randomized search with scikit-learn
- Getting ready
- How to do it...
- Classification metrics
- Getting ready
- How to do it...
- There's more...
- Regression metrics
- Getting ready
- How to do it...
- Clustering metrics
- Getting ready
- How to do it...
- Using dummy estimators to compare results
- Getting ready
- How to do it...
- How it works...
- Feature selection
- Getting ready
- How to do it...
- How it works...
- Feature selection on L1 norms
- Getting ready
- How to do it...
- There's more...
- Persisting models with joblib or pickle
- Getting ready
- How to do it...
- Opening the saved model
- There's more...
- Chapter 8: Support Vector Machines
- Introduction
- Classifying data with a linear SVM
- Getting ready
- Load the data
- Visualize the two classes
- How to do it...
- How it works...
- There's more...
- Optimizing an SVM
- Getting ready
- How to do it...
- Construct a pipeline
- Construct a parameter grid for a pipeline
- Provide a cross-validation scheme
- Perform a grid search
- There's more...
- Randomized grid search alternative
- Visualize the nonlinear RBF decision boundary
- More meaning behind C and gamma
- Multiclass classification with SVM
- Getting ready
- How to do it...
- OneVsRestClassifier
- Visualize it
- How it works...
- Support vector regression
- Getting ready
- How to do it...
- Chapter 9: Tree Algorithms and Ensembles
- Introduction
- Doing basic classifications with decision trees
- Getting ready
- How to do it...
- Visualizing a decision tree with pydot
- How to do it...
- How it works...
- There's more...
- Tuning a decision tree
- Getting ready
- How to do it...
- There's more...
- Using decision trees for regression
- Getting ready
- How to do it...
- There's more...
- Reducing overfitting with cross-validation
- How to do it...
- There's more...
- Implementing random forest regression
- Getting ready
- How to do it...
- Bagging regression with nearest neighbors
- Getting ready
- How to do it...
- Tuning gradient boosting trees
- Getting ready
- How to do it...
- There's more...
- Finding the best parameters of a gradient boosting classifier
- Tuning an AdaBoost regressor
- How to do it...
- There's more...
- Writing a stacking aggregator with scikit-learn
- How to do it...
- Chapter 10: Text and Multiclass Classification with scikit-learn
- Using LDA for classification
- Getting ready
- How to do it...
- How it works...
- Working with QDA - a nonlinear LDA
- Getting ready
- How to do it...
- How it works...
- Using SGD for classification
- Getting ready
- How to do it...
- There's more...
- Classifying documents with Naive Bayes
- Getting ready
- How to do it...
- How it works...
- There's more...
- Label propagation with semi-supervised learning
- Getting ready
- How to do it...
- How it works...
- Chapter 11: Neural Networks
- Introduction
- Perceptron classifier
- Getting ready
- How to do it...
- How it works...
- There's more...
- Neural network - multilayer perceptron
- Getting ready
- How to do it...
- How it works...
- Philosophical thoughts on neural networks
- Stacking with a neural network
- Getting ready
- How to do it...
- First base model - neural network
- Second base model - gradient boost ensemble
- Third base model - bagging regressor of gradient boost ensembles
- Some functions of the stacker
- Meta-learner - extra trees regressor
- There's more...
- Chapter 12: Create a Simple Estimator
- Introduction
- Create a simple estimator
- Getting ready
- How to do it...
- How it works...
- There's more...
- Trying the new GEE classifier on the Pima diabetes dataset
- Saving your trained estimator
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePub file format works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the requirements above are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise your reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.