
scikit-learn : Machine Learning Simplified
Persons
Trent Hauck:
Trent Hauck is a data scientist living and working in the Seattle area. He grew up in Wichita, Kansas and received his undergraduate and graduate degrees from the University of Kansas. He is the author of Instant Data Intensive Apps with pandas How-to (Packt Publishing), a book that can get you up to speed quickly with pandas and other associated technologies.
Guillermo Moncecchi:
Guillermo Moncecchi is a Natural Language Processing researcher at the Universidad de la República of Uruguay. He received a PhD in Informatics from the Universidad de la República, Uruguay and a PhD in Language Sciences from the Université Paris Ouest, France. He has participated in several international projects on NLP. He has almost 15 years of teaching experience in Automata Theory, Natural Language Processing, and Machine Learning. He also works as Head Developer at the Montevideo Council and has led the development of several public services for the council, particularly in the Geographical Information Systems area. He is one of the leaders of the Montevideo Open Data movement, promoting the publication and exploitation of the city's data.
Raul Garreta:
Raul Garreta is a Computer Engineer with extensive experience in the theory and application of Artificial Intelligence (AI), specializing in Machine Learning and Natural Language Processing (NLP). He has an entrepreneurial profile and a strong interest in applying science, technology, and innovation to the Internet industry and startups. He has worked in many software companies, handling everything from video games to implantable medical devices. In 2009, he co-founded Tryolabs with the objective of applying AI to the development of intelligent software products, where he serves as CTO and Product Manager. Besides the application of Machine Learning and NLP, Tryolabs specializes in the Python programming language and has served many clients in Silicon Valley. Raul has also worked on the development of the Python community in Uruguay, co-organizing local PyDay and PyCon conferences. He has been an assistant professor at the Computer Science Institute of the Universidad de la República in Uruguay since 2007, where he works on the courses on Machine Learning, NLP, and Automata Theory and Formal Languages. He is also finishing his Master's degree in Machine Learning and NLP. He is very interested in the research and application of Robotics, Quantum Computing, and Cognitive Modeling. He is not only a technology enthusiast and science fiction lover (geek) but also a big fan of the arts, such as cinema, photography, and painting.
Gavin Hackeling:
Gavin Hackeling develops machine learning services for large-scale document and image classification at an advertising network in New York. He received his Master's degree from New York University's Interactive Telecommunications Program, and his Bachelor's degree from the University of North Carolina.
Content
- Cover
- Copyright
- Credits
- Table of Contents
- Preface
- Module 1: Learning scikit-learn: Machine Learning in Python
- Chapter 1: Machine Learning - A Gentle Introduction
- Installing scikit-learn
- Our first machine learning method - linear classification
- Evaluating our results
- Machine learning categories
- Important concepts related to machine learning
- Summary
- Chapter 2: Supervised Learning
- Image recognition with Support Vector Machines
- Text classification with Naïve Bayes
- Explaining Titanic hypothesis with decision trees
- Predicting house prices with regression
- Summary
- Chapter 3: Unsupervised Learning
- Principal Component Analysis
- Clustering handwritten digits with k-means
- Alternative clustering methods
- Summary
- Chapter 4: Advanced Features
- Feature extraction
- Feature selection
- Model selection
- Grid search
- Parallel grid search
- Summary
- Module 2: scikit-learn Cookbook
- Chapter 1: Premodel Workflow
- Introduction
- Getting sample data from external sources
- Creating sample data for toy analysis
- Scaling data to the standard normal
- Creating binary features through thresholding
- Working with categorical variables
- Binarizing label features
- Imputing missing values through various strategies
- Using Pipelines for multiple preprocessing steps
- Reducing dimensionality with PCA
- Using factor analysis for decomposition
- Kernel PCA for nonlinear dimensionality reduction
- Using truncated SVD to reduce dimensionality
- Decomposition to classify with DictionaryLearning
- Putting it all together with Pipelines
- Using Gaussian processes for regression
- Defining the Gaussian process object directly
- Using stochastic gradient descent for regression
- Chapter 2: Working with Linear Models
- Introduction
- Fitting a line through data
- Evaluating the linear regression model
- Using ridge regression to overcome linear regression's shortfalls
- Optimizing the ridge regression parameter
- Using sparsity to regularize models
- Taking a more fundamental approach to regularization with LARS
- Using linear methods for classification - logistic regression
- Directly applying Bayesian ridge regression
- Using boosting to learn from errors
- Chapter 3: Building Models with Distance Metrics
- Introduction
- Using KMeans to cluster data
- Optimizing the number of centroids
- Assessing cluster correctness
- Using MiniBatch KMeans to handle more data
- Quantizing an image with KMeans clustering
- Finding the closest objects in the feature space
- Probabilistic clustering with Gaussian Mixture Models
- Using KMeans for outlier detection
- Using k-NN for regression
- Chapter 4: Classifying Data with scikit-learn
- Introduction
- Doing basic classifications with Decision Trees
- Tuning a Decision Tree model
- Using many Decision Trees - random forests
- Tuning a random forest model
- Classifying data with support vector machines
- Generalizing with multiclass classification
- Using LDA for classification
- Working with QDA - a nonlinear LDA
- Using Stochastic Gradient Descent for classification
- Classifying documents with Naïve Bayes
- Label propagation with semi-supervised learning
- Chapter 5: Postmodel Workflow
- Introduction
- K-fold cross validation
- Automatic cross validation
- Cross validation with ShuffleSplit
- Stratified k-fold
- Poor man's grid search
- Brute force grid search
- Using dummy estimators to compare results
- Regression model evaluation
- Feature selection
- Feature selection on L1 norms
- Persisting models with joblib
- Module 3: Mastering Machine Learning with scikit-learn
- Chapter 1: The Fundamentals of Machine Learning
- Learning from experience
- Machine learning tasks
- Training data and test data
- Performance measures, bias, and variance
- An introduction to scikit-learn
- Installing scikit-learn
- Installing pandas and matplotlib
- Summary
- Chapter 2: Linear Regression
- Simple linear regression
- Evaluating the model
- Multiple linear regression
- Polynomial regression
- Regularization
- Applying linear regression
- Fitting models with gradient descent
- Summary
- Chapter 3: Feature Extraction and Preprocessing
- Extracting features from categorical variables
- Extracting features from text
- Extracting features from images
- Data standardization
- Summary
- Chapter 4: From Linear Regression to Logistic Regression
- Binary classification with logistic regression
- Spam filtering
- Binary classification performance metrics
- Calculating the F1 measure
- ROC AUC
- Tuning models with grid search
- Multi-class classification
- Multi-label classification and problem transformation
- Summary
- Chapter 5: Nonlinear Classification and Regression with Decision Trees
- Decision trees
- Training decision trees
- Decision trees with scikit-learn
- Summary
- Chapter 6: Clustering with K-Means
- Clustering with the K-Means algorithm
- Evaluating clusters
- Image quantization
- Clustering to learn features
- Summary
- Chapter 7: Dimensionality Reduction with PCA
- An overview of PCA
- Performing Principal Component Analysis
- Using PCA to visualize high-dimensional data
- Face recognition with PCA
- Summary
- Chapter 8: The Perceptron
- Activation functions
- Binary classification with the perceptron
- Limitations of the perceptron
- Summary
- Chapter 9: From the Perceptron to Support Vector Machines
- Kernels and the kernel trick
- Maximum margin classification and support vectors
- Classifying characters in scikit-learn
- Summary
- Chapter 10: From the Perceptron to Artificial Neural Networks
- Nonlinear decision boundaries
- Feedforward and feedback artificial neural networks
- Approximating XOR with Multilayer perceptrons
- Classifying handwritten digits
- Summary
- Bibliography
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will unfortunately not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our ebook Help page.