Practical Machine Learning

 
 
Packt Publishing Limited
  • 1st edition
  • published on 30 January 2016
  • 468 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-78439-401-1 (ISBN)
 
Tackle the real-world complexities of modern machine learning with innovative, cutting-edge techniques

About This Book
  • Fully-coded working examples using a wide range of machine learning libraries and tools, including Python, R, Julia, and Spark
  • Comprehensive practical solutions taking you into the future of machine learning
  • Go a step further and integrate your machine learning projects with Hadoop

Who This Book Is For
This book has been created for data scientists who want to see machine learning in action and explore its real-world application. With guidance on everything from the fundamentals of machine learning and predictive analytics to the latest innovations set to lead the big data revolution into the future, this is an unmissable resource for anyone dedicated to tackling current big data challenges. Knowledge of programming (Python and R) and mathematics is advisable if you want to get started immediately.

What You Will Learn
  • Implement a wide range of algorithms and techniques for tackling complex data
  • Get to grips with some of the most powerful languages in data science, including R, Python, and Julia
  • Harness the capabilities of Spark and Hadoop to manage and process data successfully
  • Apply the appropriate machine learning technique to address real-world problems
  • Get acquainted with deep learning and find out how neural networks are being used at the cutting edge of machine learning
  • Explore the future of machine learning and dive deeper into polyglot persistence, semantic data, and more

In Detail
Finding meaning in increasingly large and complex datasets is a growing demand of the modern world. Machine learning and predictive analytics have become the most important approaches to uncover data gold mines. Machine learning uses complex algorithms to make improved predictions of outcomes based on historical patterns and the behaviour of data sets. It can deliver dynamic insights into trends, patterns, and relationships within data, which are immensely valuable to business growth and development.

This book explores an extensive range of machine learning techniques, uncovering hidden tricks and tips for several types of data using practical, real-world examples. While machine learning can be highly theoretical, this book offers a refreshing hands-on approach without losing sight of the underlying principles. Inside, a full exploration of the various algorithms gives you high-quality guidance so you can begin to see just how effective machine learning is at tackling contemporary challenges of big data.

This is the only book you need to implement a whole suite of open source tools, frameworks, and languages in machine learning. We will cover the leading data science languages, Python and R, and the underrated but powerful Julia, as well as a range of other big data platforms including Spark, Hadoop, and Mahout. Practical Machine Learning is an essential resource for modern data scientists who want to get to grips with the real-world application of machine learning.

With this book, you will not only learn the fundamentals of machine learning but also dive deep into the complexities of real-world data before moving on to using Hadoop and its wider ecosystem of tools to process and manage your structured and unstructured data.

You will explore different machine learning techniques for both supervised and unsupervised learning; from decision trees to Naive Bayes classifiers and linear and clustering methods, you will learn strategies for a truly advanced approach to the statistical analysis of data. The book also explores the cutting-edge advancements in machine learning, with worked examples and guidance on deep learning and reinforcement learning, providing you with practical demonstrations and samples that help take the theory, and the mystery, out of even the most advanced machine learning methodologies.

Style and approach
A practical data science tutorial designed to give you an insight into the practical application of machine learning, this book takes you through complex concepts and tasks in an accessible way. Featuring information on a wide range of data science techniques, Practical Machine Learning is a comprehensive data science resource.
  • English
  • Birmingham
  • United Kingdom
978-1-78439-401-1 (9781784394011)
1784394017 (ISBN-10)
Sunila Gollapudi works as Vice President Technology with Broadridge Financial Solutions (India) Pvt. Ltd., a wholly owned subsidiary of the US-based Broadridge Financial Solutions Inc. (BR). She has close to 14 years of hands-on experience in the IT services space. She currently runs the Architecture Center of Excellence from India and plays a key role in the company's big data and data science initiatives. Prior to joining Broadridge, she held key positions at leading global organizations. She specializes in Java, distributed architecture, big data technologies, advanced analytics, machine learning, semantic technologies, and data integration tools. Sunila represents Broadridge in global technology leadership and innovation forums, most recently at IEEE for her work on semantic technologies and their role in business data lakes. Her signature strength is her ability to stay connected with the ever-changing global technology landscape, where new technologies mushroom rapidly, and to connect the dots and architect practical solutions for business delivery. A postgraduate in computer science, she published her first book, Getting Started with Greenplum for Big Data Analytics (Packt Publishing), on Greenplum, a big data warehouse solution. She is also a noted Indian classical dancer at both national and international levels and a painter, in addition to being a mother and a wife.
  • Cover
  • Copyright
  • Credits
  • Foreword
  • About the Author
  • Acknowledgments
  • About the Reviewers
  • www.PacktPub.com
  • Preface
  • Chapter 1: Introduction to Machine learning
  • Machine learning
  • Definition
  • Core Concepts and Terminology
  • What is learning?
  • Data
  • Labeled and unlabeled data
  • Tasks
  • Algorithms
  • Models
  • Data and inconsistencies in Machine learning
  • Under-fitting
  • Over-fitting
  • Data instability
  • Unpredictable data formats
  • Practical Machine learning examples
  • Types of learning problems
  • Classification
  • Clustering
  • Forecasting, prediction or regression
  • Simulation
  • Optimization
  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Reinforcement learning
  • Deep learning
  • Performance measures
  • Is the solution good?
  • Mean squared error (MSE)
  • Mean absolute error (MAE)
  • Normalized MSE and MAE (NMSE and NMAE)
  • Solving the errors: bias and variance
  • Some complementing fields of Machine learning
  • Data mining
  • Artificial intelligence (AI)
  • Statistical learning
  • Data science
  • Machine learning process lifecycle and solution architecture
  • Machine learning algorithms
  • Decision tree based algorithms
  • Bayesian method based algorithms
  • Kernel method based algorithms
  • Clustering methods
  • Artificial neural networks (ANN)
  • Dimensionality reduction
  • Ensemble methods
  • Instance based learning algorithms
  • Regression analysis based algorithms
  • Association rule based learning algorithms
  • Machine learning tools and frameworks
  • Summary
  • Chapter 2: Machine learning and Large-scale datasets
  • Big data and the context of large-scale Machine learning
  • Functional versus Structural - A methodological mismatch
  • Commoditizing information
  • Theoretical limitations of RDBMS
  • Scaling-up versus Scaling-out storage
  • Distributed and parallel computing strategies
  • Machine learning: Scalability and Performance
  • Too many data points or instances
  • Too many attributes or features
  • Shrinking response time windows - need for real-time responses
  • Highly complex algorithm
  • Feed forward, iterative prediction cycles
  • Model selection process
  • Potential issues in large-scale Machine learning
  • Algorithms and Concurrency
  • Developing concurrent algorithms
  • Technology and implementation options for scaling-up Machine learning
  • MapReduce programming paradigm
  • High Performance Computing (HPC) with Message Passing Interface (MPI)
  • Language Integrated Queries (LINQ) framework
  • Manipulating datasets with LINQ
  • Graphics Processing Unit (GPU)
  • Field Programmable Gate Array (FPGA)
  • Multicore or multiprocessor systems
  • Summary
  • Chapter 3: An Introduction to Hadoop's Architecture and Ecosystem
  • Introduction to Apache Hadoop
  • Evolution of Hadoop (the platform of choice)
  • Hadoop and its core elements
  • Machine learning solution architecture for big data (employing Hadoop)
  • The Data Source layer
  • The Ingestion layer
  • The Hadoop Storage layer
  • The Hadoop (Physical) Infrastructure layer - supporting appliance
  • Hadoop platform / Processing layer
  • The Analytics layer
  • The Consumption layer
  • Explaining and exploring data with Visualizations
  • Security and Monitoring layer
  • Hadoop core components framework
  • Writing to and reading from HDFS
  • Handling failures
  • HDFS command line
  • RESTful HDFS
  • MapReduce
  • MapReduce architecture
  • What makes MapReduce cater to the needs of large datasets?
  • MapReduce execution flow and components
  • Developing MapReduce components
  • Hadoop 2.x
  • Hadoop ecosystem components
  • Hadoop installation and setup
  • Installing JDK 1.7
  • Creating a system user for Hadoop (dedicated)
  • Disable IPv6
  • Steps for installing Hadoop 2.6.0
  • Starting Hadoop
  • Hadoop distributions and vendors
  • Summary
  • Chapter 4: Machine Learning Tools, Libraries, and Frameworks
  • Machine learning tools - A landscape
  • Apache Mahout
  • How does Mahout work?
  • Installing and setting up Apache Mahout
  • Setting up Maven
  • Setting up Apache Mahout using Eclipse IDE
  • Setting up Apache Mahout without Eclipse
  • Mahout Packages
  • Implementing vectors in Mahout
  • R
  • Installing and setting up R
  • Integrating R with Apache Hadoop
  • Approach 1 - Using R and Streaming APIs in Hadoop
  • Approach 2 - Using the Rhipe package of R
  • Approach 3 - Using RHadoop
  • Summary of R/Hadoop integration approaches
  • Implementing in R (using examples)
  • Julia
  • Installing and setting up Julia
  • Downloading and using the command line version of Julia
  • Using Juno IDE for running Julia
  • Using Julia via the browser
  • Running the Julia code from the command line
  • Implementing in Julia (with examples)
  • Using variables and assignments
  • Numeric primitives
  • Data structures
  • Working with Strings and String manipulations
  • Packages
  • Interoperability
  • Graphics and plotting
  • Benefits of adopting Julia
  • Integrating Julia and Hadoop
  • Python
  • Toolkit options in Python
  • Implementation of Python (using examples)
  • Installing Python and setting up scikit-learn
  • Apache Spark
  • Scala
  • Programming with Resilient Distributed Datasets (RDD)
  • Spring XD
  • Summary
  • Chapter 5: Decision Tree based learning
  • Decision trees
  • Terminology
  • Purpose and uses
  • Constructing a Decision tree
  • Handling missing values
  • Considerations for constructing Decision trees
  • Decision trees in a graphical representation
  • Inducing Decision trees - Decision tree algorithms
  • Greedy Decision trees
  • Benefits of Decision trees
  • Specialized trees
  • Oblique trees
  • Random forests
  • Evolutionary trees
  • Hellinger trees
  • Implementing Decision trees
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 6: Instance and Kernel Methods Based Learning
  • Instance-based learning (IBL)
  • Nearest Neighbors
  • Value of k in KNN
  • Distance measures in KNN
  • Case-based reasoning (CBR)
  • Locally weighted regression (LWR)
  • Implementing KNN
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Kernel methods-based learning
  • Kernel functions
  • Support Vector Machines (SVM)
  • Inseparable Data
  • Implementing SVM
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 7: Association Rules based learning
  • Association rules based learning
  • Association rule - a definition
  • Apriori algorithm
  • Rule generation strategy
  • FP-growth algorithm
  • Apriori versus FP-growth
  • Implementing Apriori and FP-growth
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 8: Clustering based learning
  • Clustering-based learning
  • Types of clustering
  • Hierarchical clustering
  • Partitional clustering
  • The k-means clustering algorithm
  • Convergence or stopping criteria for the k-means clustering
  • K-means clustering on disk
  • Advantages of the k-means approach
  • Disadvantages of the k-means algorithm
  • Distance measures
  • Complexity measures
  • Implementing k-means clustering
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 9: Bayesian learning
  • Bayesian learning
  • Statistician's thinking
  • Important terms and definitions
  • Probability
  • Types of probability
  • Distribution
  • Bernoulli distribution
  • Binomial distribution
  • Bayes' theorem
  • Naïve Bayes classifier
  • Multinomial Naïve Bayes classifier
  • The Bernoulli Naïve Bayes classifier
  • Implementing Naïve Bayes algorithm
  • Using Mahout
  • Using R
  • Using Spark
  • Using scikit-learn
  • Using Julia
  • Summary
  • Chapter 10: Regression based learning
  • Regression analysis
  • Revisiting statistics
  • Properties of expectation, variance, and covariance
  • ANOVA and F Statistics
  • Confounding
  • Effect modification
  • Regression methods
  • Simple regression or simple linear regression
  • Multiple regression
  • Polynomial (non-linear) regression
  • Generalized Linear Models (GLM)
  • Logistic regression (logit link)
  • Odds ratio in logistic regression
  • Poisson regression
  • Implementing linear and logistic regression
  • Using Mahout
  • Using R
  • Using Spark
  • Using scikit-learn
  • Using Julia
  • Summary
  • Chapter 11: Deep learning
  • Background
  • The human brain
  • Neural networks
  • Neuron
  • Synapses
  • Artificial neurons or perceptrons
  • Neural Network size
  • Neural network types
  • Backpropagation algorithm
  • Softmax regression technique
  • Deep learning taxonomy
  • Convolutional neural networks (CNN/ConvNets)
  • Convolutional layer (CONV)
  • Pooling layer (POOL)
  • Fully connected layer (FC)
  • Recurrent Neural Networks (RNNs)
  • Restricted Boltzmann Machines (RBMs)
  • Deep Boltzmann Machines (DBMs)
  • Autoencoders
  • Implementing ANNs and Deep learning methods
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 12: Reinforcement Learning
  • Reinforcement Learning (RL)
  • The context of Reinforcement Learning
  • Examples of Reinforcement Learning
  • Evaluative Feedback
  • The Reinforcement Learning problem - the world grid example
  • Markov Decision Process (MDP)
  • Basic RL model - agent-environment interface
  • Delayed rewards
  • The policy
  • Reinforcement Learning - key features
  • Reinforcement learning solution methods
  • Dynamic Programming (DP)
  • Generalized Policy Iteration (GPI)
  • Monte Carlo methods
  • Temporal difference (TD) learning
  • Sarsa - on-Policy TD
  • Q-Learning - off-Policy TD
  • Actor-critic methods (on-policy)
  • R Learning (Off-policy)
  • Summary
  • Chapter 13: Ensemble learning
  • Ensemble learning methods
  • The wisdom of the crowd
  • Key use cases
  • Recommendation systems
  • Anomaly detection
  • Transfer learning
  • Stream mining or classification
  • Ensemble methods
  • Supervised ensemble methods
  • Unsupervised ensemble methods
  • Implementing ensemble methods
  • Using Mahout
  • Using R
  • Using Spark
  • Using Python (scikit-learn)
  • Using Julia
  • Summary
  • Chapter 14: New generation data architectures for Machine learning
  • Evolution of data architectures
  • Emerging perspectives & drivers for new age data architectures
  • Modern data architectures for Machine learning
  • Semantic data architecture
  • The business data lake
  • Semantic Web technologies
  • Vendors
  • Multi-model database architecture / polyglot persistence
  • Vendors
  • Lambda Architecture (LA)
  • Vendors
  • Summary
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see the e-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see the e-book help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The EPUB file format is very well suited to novels and non-fiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately be unable to open the e-book, so you must prepare your reading hardware before downloading.

Further information is available in our e-book help.


Download (available immediately)

€37.41
incl. 19% VAT
Download / single license
