
Scala: Guide for Data Science Professionals
Content
- Cover
- Copyright
- Credits
- Preface
- Table of Contents
- Module 1: Scala for Data Science
- Chapter 1: Scala and Data Science
- Data science
- Programming in data science
- Why Scala?
- When not to use Scala
- Summary
- References
- Chapter 2: Manipulating Data with Breeze
- Code examples
- Installing Breeze
- Getting help on Breeze
- Basic Breeze data types
- An example - logistic regression
- Towards re-usable code
- Alternatives to Breeze
- Summary
- References
- Chapter 3: Plotting with breeze-viz
- Diving into Breeze
- Customizing plots
- Customizing the line type
- More advanced scatter plots
- Multi-plot example - scatterplot matrix plots
- Managing without documentation
- Breeze-viz reference
- Data visualization beyond breeze-viz
- Summary
- Chapter 4: Parallel Collections and Futures
- Parallel collections
- Futures
- Summary
- References
- Chapter 5: Scala and SQL through JDBC
- Interacting with JDBC
- First steps with JDBC
- JDBC summary
- Functional wrappers for JDBC
- Safer JDBC connections with the loan pattern
- Enriching JDBC statements with the "pimp my library" pattern
- Wrapping result sets in a stream
- Looser coupling with type classes
- Creating a data access layer
- Summary
- References
- Chapter 6: Slick - A Functional Interface for SQL
- FEC data
- Invokers
- Operations on columns
- Aggregations with "Group by"
- Accessing database metadata
- Slick versus JDBC
- Summary
- References
- Chapter 7: Web APIs
- A whirlwind tour of JSON
- Querying web APIs
- JSON in Scala - an exercise in pattern matching
- Extraction using case classes
- Concurrency and exception handling with futures
- Authentication - adding HTTP headers
- Summary
- References
- Chapter 8: Scala and MongoDB
- MongoDB
- Connecting to MongoDB with Casbah
- Inserting documents
- Extracting objects from the database
- Complex queries
- Casbah query DSL
- Custom type serialization
- Beyond Casbah
- Summary
- References
- Chapter 9: Concurrency with Akka
- GitHub follower graph
- Actors as people
- Hello world with Akka
- Case classes as messages
- Actor construction
- Anatomy of an actor
- Follower network crawler
- Fetcher actors
- Routing
- Message passing between actors
- Queue control and the pull pattern
- Accessing the sender of a message
- Stateful actors
- Follower network crawler
- Fault tolerance
- Custom supervisor strategies
- Life-cycle hooks
- What we have not talked about
- Summary
- References
- Chapter 10: Distributed Batch Processing with Spark
- Installing Spark
- Acquiring the example data
- Resilient distributed datasets
- Building and running standalone programs
- Spam filtering
- Lifting the hood
- Data shuffling and partitions
- Summary
- Reference
- Chapter 11: Spark SQL and DataFrames
- DataFrames - a whirlwind introduction
- Aggregation operations
- Joining DataFrames together
- Custom functions on DataFrames
- DataFrame immutability and persistence
- SQL statements on DataFrames
- Complex data types - arrays, maps, and structs
- Interacting with data sources
- Standalone programs
- Summary
- References
- Chapter 12: Distributed Machine Learning with MLlib
- Introducing MLlib - Spam classification
- Pipeline components
- Evaluation
- Regularization in logistic regression
- Cross-validation and model selection
- Beyond logistic regression
- Summary
- References
- Chapter 13: Web APIs with Play
- Client-server applications
- Introduction to web frameworks
- Model-View-Controller architecture
- Single page applications
- Building an application
- The Play framework
- Dynamic routing
- Actions
- Interacting with JSON
- Querying external APIs and consuming JSON
- Creating APIs with Play: a summary
- REST APIs: best practice
- Summary
- References
- Chapter 14: Visualization with D3 and the Play Framework
- GitHub user data
- Do I need a backend?
- JavaScript dependencies through web-jars
- Towards a web application: HTML templates
- Modular JavaScript through RequireJS
- Bootstrapping the applications
- Client-side program architecture
- Drawing plots with NVD3
- Summary
- References
- Appendix: Pattern Matching and Extractors
- Pattern matching in for comprehensions
- Pattern matching internals
- Extracting sequences
- Summary
- Reference
- Module 2: Scala Data Analysis Cookbook
- Chapter 1: Getting Started with Breeze
- Introduction
- Getting Breeze - the linear algebra library
- Working with vectors
- Working with matrices
- Vectors and matrices with randomly distributed values
- Reading and writing CSV files
- Chapter 2: Getting Started with Apache Spark DataFrames
- Introduction
- Getting Apache Spark
- Creating a DataFrame from CSV
- Manipulating DataFrames
- Creating a DataFrame from Scala case classes
- Chapter 3: Loading and Preparing Data - DataFrame
- Introduction
- Loading more than 22 features into classes
- Loading JSON into DataFrames
- Storing data as Parquet files
- Using the Avro data model in Parquet
- Loading from RDBMS
- Preparing data in DataFrames
- Chapter 4: Data Visualization
- Introduction
- Visualizing using Zeppelin
- Creating scatter plots with Bokeh-Scala
- Creating a time series MultiPlot with Bokeh-Scala
- Chapter 5: Learning from Data
- Introduction
- Supervised and unsupervised learning
- Gradient descent
- Predicting continuous values using linear regression
- Binary classification using LogisticRegression and SVM
- Binary classification using LogisticRegression with Pipeline API
- Clustering using K-means
- Feature reduction using principal component analysis
- Chapter 6: Scaling Up
- Introduction
- Building the Uber JAR
- Submitting jobs to the Spark cluster (local)
- Running the Spark Standalone cluster on EC2
- Running the Spark Job on Mesos (local)
- Running the Spark Job on YARN (local)
- Chapter 7: Going Further
- Introduction
- Using Spark Streaming to subscribe to a Twitter stream
- Using Spark as an ETL tool
- Using StreamingLogisticRegression to classify a Twitter stream using Kafka as a training stream
- Using GraphX to analyze Twitter data
- Module 3: Scala for Machine Learning
- Chapter 1: Getting Started
- Mathematical notation for the curious
- Why machine learning?
- Why Scala?
- Model categorization
- Taxonomy of machine learning algorithms
- Tools and frameworks
- Source code
- Let's kick the tires
- Summary
- Chapter 2: Hello World!
- Modeling
- Designing a workflow
- Assessing a model
- Summary
- Chapter 3: Data Preprocessing
- Time series
- Moving averages
- Fourier analysis
- The Kalman filter
- Alternative preprocessing techniques
- Summary
- Chapter 4: Unsupervised Learning
- Clustering
- Dimension reduction
- Performance considerations
- Summary
- Chapter 5: Naïve Bayes Classifiers
- Probabilistic graphical models
- Naïve Bayes classifiers
- Multivariate Bernoulli classification
- Naïve Bayes and text mining
- Pros and cons
- Summary
- Chapter 6: Regression and Regularization
- Linear regression
- Regularization
- Numerical optimization
- The logistic regression
- Summary
- Chapter 7: Sequential Data Models
- Markov decision processes
- The hidden Markov model (HMM)
- Conditional random fields
- CRF and text analytics
- Comparing CRF and HMM
- Performance consideration
- Summary
- Chapter 8: Kernel Models and Support Vector Machines
- Kernel functions
- The support vector machine (SVM)
- Support vector classifier (SVC)
- Anomaly detection with one-class SVC
- Support vector regression (SVR)
- Performance considerations
- Summary
- Chapter 9: Artificial Neural Networks
- Feed-forward neural networks (FFNN)
- The multilayer perceptron (MLP)
- Evaluation
- Benefits and limitations
- Summary
- Chapter 10: Genetic Algorithms
- Evolution
- Genetic algorithms and machine learning
- Genetic algorithm components
- Implementation
- GA for trading strategies
- Advantages and risks of genetic algorithms
- Summary
- Chapter 11: Reinforcement Learning
- Introduction
- Learning classifier systems
- Summary
- Chapter 12: Scalable Frameworks
- Overview
- Scala
- Scalability with Actors
- Akka
- Apache Spark
- Summary
- Appendix A: Basic Concepts
- Scala programming
- Mathematics
- Finances 101
- Suggested online courses
- References
- Bibliography
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e. "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the requirements above are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID immediately after installing it.
For more information, see our ebook Help page.