scikit-learn : Machine Learning Simplified

Name: scikit-learn : Machine Learning Simplified | Implement scikit-learn into every step of the data science pipeline
Brand: Packt Publishing
Price: 82.99 EUR
Availability: OnlineOnly

Trent Hauck, Guillermo Moncecchi, Raul G. Tompson, Gavin Hackeling (Authors)

Packt Publishing

Published on 15 April 2025

530 pages

E-Book

ePUB with Adobe-DRM

978-1-78883-152-9 (ISBN)

€82.99 incl. 7% VAT

E-Book Single Licence

Available for download

Persons

Hauck Trent :
Trent Hauck is a data scientist living and working in the Seattle area. He grew up in Wichita, Kansas and received his undergraduate and graduate degrees from the University of Kansas. He is the author of the book Instant Data Intensive Apps with pandas How-to (Packt Publishing), a book that can get you up to speed quickly with pandas and other associated technologies.

Moncecchi Guillermo :
Guillermo Moncecchi is a Natural Language Processing researcher at the Universidad de la República of Uruguay. He received a PhD in Informatics from the Universidad de la República, Uruguay and a PhD in Language Sciences from the Université Paris Ouest, France. He has participated in several international projects on NLP. He has almost 15 years of teaching experience in Automata Theory, Natural Language Processing, and Machine Learning. He also works as Head Developer at the Montevideo Council and has led the development of several public services for the council, particularly in the Geographical Information Systems area. He is one of the leaders of the Montevideo Open Data movement, promoting the publication and exploitation of the city's data.

Tompson Raul G :
Raul Garreta is a Computer Engineer with extensive experience in the theory and application of Artificial Intelligence (AI), specializing in Machine Learning and Natural Language Processing (NLP). He has an entrepreneurial profile and a strong interest in applying science, technology, and innovation to the Internet industry and startups. He has worked at many software companies, handling everything from video games to implantable medical devices. In 2009, he co-founded Tryolabs with the objective of applying AI to the development of intelligent software products, and he serves as the company's CTO and Product Manager. Besides the application of Machine Learning and NLP, Tryolabs' expertise lies in the Python programming language, and the company has served many clients in Silicon Valley. Raul has also contributed to the development of the Python community in Uruguay, co-organizing local PyDay and PyCon conferences. He has been an assistant professor at the Computer Science Institute of the Universidad de la República in Uruguay since 2007, where he has worked on the courses on Machine Learning, NLP, and Automata Theory and Formal Languages. He is also finishing his Master's degree in Machine Learning and NLP. He is very interested in the research and application of Robotics, Quantum Computing, and Cognitive Modeling. He is not only a technology enthusiast and science fiction lover (geek) but also a big fan of the arts, such as cinema, photography, and painting.

Hackeling Gavin :
Gavin Hackeling develops machine learning services for large-scale document and image classification at an advertising network in New York. He received his Master's degree from New York University's Interactive Telecommunications Program, and his Bachelor's degree from the University of North Carolina.

Content

Cover
Copyright
Credits
Table of Contents
Preface
Module 1: Learning scikit-learn: Machine Learning in Python
Chapter 1: Machine Learning - A Gentle Introduction
Installing scikit-learn
Our first machine learning method - linear classification
Evaluating our results
Machine learning categories
Important concepts related to machine learning
Summary
Chapter 2: Supervised Learning
Image recognition with Support Vector Machines
Text classification with Naïve Bayes
Explaining Titanic hypothesis with decision trees
Predicting house prices with regression
Summary
Chapter 3: Unsupervised Learning
Principal Component Analysis
Clustering handwritten digits with k-means
Alternative clustering methods
Summary
Chapter 4: Advanced Features
Feature extraction
Feature selection
Model selection
Grid search
Parallel grid search
Summary
Module 2: scikit-learn Cookbook
Chapter 1: Premodel Workflow
Introduction
Getting sample data from external sources
Creating sample data for toy analysis
Scaling data to the standard normal
Creating binary features through thresholding
Working with categorical variables
Binarizing label features
Imputing missing values through various strategies
Using Pipelines for multiple preprocessing steps
Reducing dimensionality with PCA
Using factor analysis for decomposition
Kernel PCA for nonlinear dimensionality reduction
Using truncated SVD to reduce dimensionality
Decomposition to classify with DictionaryLearning
Putting it all together with Pipelines
Using Gaussian processes for regression
Defining the Gaussian process object directly
Using stochastic gradient descent for regression
Chapter 2: Working with Linear Models
Introduction
Fitting a line through data
Evaluating the linear regression model
Using ridge regression to overcome linear regression's shortfalls
Optimizing the ridge regression parameter
Using sparsity to regularize models
Taking a more fundamental approach to regularization with LARS
Using linear methods for classification - logistic regression
Directly applying Bayesian ridge regression
Using boosting to learn from errors
Chapter 3: Building Models with Distance Metrics
Introduction
Using KMeans to cluster data
Optimizing the number of centroids
Assessing cluster correctness
Using MiniBatch KMeans to handle more data
Quantizing an image with KMeans clustering
Finding the closest objects in the feature space
Probabilistic clustering with Gaussian Mixture Models
Using KMeans for outlier detection
Using k-NN for regression
Chapter 4: Classifying Data with scikit-learn
Introduction
Doing basic classifications with Decision Trees
Tuning a Decision Tree model
Using many Decision Trees - random forests
Tuning a random forest model
Classifying data with support vector machines
Generalizing with multiclass classification
Using LDA for classification
Working with QDA - a nonlinear LDA
Using Stochastic Gradient Descent for classification
Classifying documents with Naïve Bayes
Label propagation with semi-supervised learning
Chapter 5: Postmodel Workflow
Introduction
K-fold cross validation
Automatic cross validation
Cross validation with ShuffleSplit
Stratified k-fold
Poor man's grid search
Brute force grid search
Using dummy estimators to compare results
Regression model evaluation
Feature selection
Feature selection on L1 norms
Persisting models with joblib
Module 3: Mastering Machine Learning with scikit-learn
Chapter 1: The Fundamentals of Machine Learning
Learning from experience
Machine learning tasks
Training data and test data
Performance measures, bias, and variance
An introduction to scikit-learn
Installing scikit-learn
Installing pandas and matplotlib
Summary
Chapter 2: Linear Regression
Simple linear regression
Evaluating the model
Multiple linear regression
Polynomial regression
Regularization
Applying linear regression
Fitting models with gradient descent
Summary
Chapter 3: Feature Extraction and Preprocessing
Extracting features from categorical variables
Extracting features from text
Extracting features from images
Data standardization
Summary
Chapter 4: From Linear Regression to Logistic Regression
Binary classification with logistic regression
Spam filtering
Binary classification performance metrics
Calculating the F1 measure
ROC AUC
Tuning models with grid search
Multi-class classification
Multi-label classification and problem transformation
Summary
Chapter 5: Nonlinear Classification and Regression with Decision Trees
Decision trees
Training decision trees
Decision trees with scikit-learn
Summary
Chapter 6: Clustering with K-Means
Clustering with the K-Means algorithm
Evaluating clusters
Image quantization
Clustering to learn features
Summary
Chapter 7: Dimensionality Reduction with PCA
An overview of PCA
Performing Principal Component Analysis
Using PCA to visualize high-dimensional data
Face recognition with PCA
Summary
Chapter 8: The Perceptron
Activation functions
Binary classification with the perceptron
Limitations of the perceptron
Summary
Chapter 9: From the Perceptron to Support Vector Machines
Kernels and the kernel trick
Maximum margin classification and support vectors
Classifying characters in scikit-learn
Summary
Chapter 10: From the Perceptron to Artificial Neural Networks
Nonlinear decision boundaries
Feedforward and feedback artificial neural networks
Approximating XOR with Multilayer perceptrons
Classifying handwritten digits
Summary
Bibliography
Index
