
Java: Data Science Made Easy
Persons
Jennifer L. Reese:
Jennifer L. Reese studied computer science at Tarleton State University. She also earned her M.Ed. from Tarleton in December 2016. She currently teaches computer science to high-school students. Her interests include integrating computer science concepts with other academic disciplines, increasing diversity in computer science courses, and applying data science to the field of education. She has co-authored two books: Java for Data Science and Java 7 New Features Cookbook. She previously worked as a software engineer. In her free time she enjoys reading, cooking, and traveling, especially to any destination with a beach. She is a musician and appreciates a variety of musical genres.
Richard M. Reese:
Richard Reese has worked in industry and academia for the past 29 years. For 10 years he provided software development support at Lockheed, and at one point he developed a C-based network application. He was a contract instructor providing software training to industry for 5 years. Richard is currently an Associate Professor at Tarleton State University in Stephenville, Texas. He is the author of various books and video courses, including Natural Language Processing with Java, Java for Data Science, and Getting Started with Natural Language Processing in Java.
Alexey Grigorev:
Alexey Grigorev is a skilled data scientist, machine learning engineer, and software developer with more than 8 years of professional experience. He started his career as a Java developer at a number of large and small companies, but later switched to data science. Currently, Alexey works as a data scientist at Simplaex, where he uses Java and Python daily for data cleaning, data analysis, and modeling. His areas of expertise are machine learning and text mining.
Content
- Cover
- Copyright
- Credits
- Table of Contents
- Preface
- Module 1: Java for Data Science
- Chapter 1: Getting Started with Data Science
- Problems solved using data science
- Understanding the data science problem-solving approach
- Using Java to support data science
- Acquiring data for an application
- The importance and process of cleaning data
- Visualizing data to enhance understanding
- The use of statistical methods in data science
- Machine learning applied to data science
- Using neural networks in data science
- Deep learning approaches
- Performing text analysis
- Visual and audio analysis
- Improving application performance using parallel techniques
- Assembling the pieces
- Summary
- Chapter 2: Data Acquisition
- Understanding the data formats used in data science applications
- Overview of CSV data
- Overview of spreadsheets
- Overview of databases
- Overview of PDF files
- Overview of JSON
- Overview of XML
- Overview of streaming data
- Overview of audio/video/images in Java
- Data acquisition techniques
- Using the HttpUrlConnection class
- Web crawlers in Java
- Creating your own web crawler
- Using the crawler4j web crawler
- Web scraping in Java
- Using API calls to access common social media sites
- Using OAuth to authenticate users
- Handling Twitter
- Handling Wikipedia
- Handling Flickr
- Handling YouTube
- Searching by keyword
- Summary
- Chapter 3: Data Cleaning
- Handling data formats
- Handling CSV data
- Handling spreadsheets
- Handling Excel spreadsheets
- Handling PDF files
- Handling JSON
- Using the JSON streaming API
- Using the JSON tree API
- The nitty gritty of cleaning text
- Using Java tokenizers to extract words
- Java core tokenizers
- Third-party tokenizers and libraries
- Transforming data into a usable form
- Simple text cleaning
- Removing stop words
- Finding words in text
- Finding and replacing text
- Data imputation
- Subsetting data
- Sorting text
- Data validation
- Validating data types
- Validating dates
- Validating e-mail addresses
- Validating ZIP codes
- Validating names
- Cleaning images
- Changing the contrast of an image
- Smoothing an image
- Brightening an image
- Resizing an image
- Converting images to different formats
- Summary
- Chapter 4: Data Visualization
- Understanding plots and graphs
- Visual analysis goals
- Creating index charts
- Creating bar charts
- Using country as the category
- Using decade as the category
- Creating stacked graphs
- Creating pie charts
- Creating scatter charts
- Creating histograms
- Creating donut charts
- Creating bubble charts
- Summary
- Chapter 5: Statistical Data Analysis Techniques
- Working with mean, mode, and median
- Calculating the mean
- Using simple Java techniques to find mean
- Using Java 8 techniques to find mean
- Using Google Guava to find mean
- Using Apache Commons to find mean
- Calculating the median
- Using simple Java techniques to find median
- Using Apache Commons to find the median
- Calculating the mode
- Using ArrayLists to find multiple modes
- Using a HashMap to find multiple modes
- Using Apache Commons to find multiple modes
- Standard deviation
- Sample size determination
- Hypothesis testing
- Regression analysis
- Using simple linear regression
- Using multiple regression
- Summary
- Chapter 6: Machine Learning
- Supervised learning techniques
- Decision trees
- Decision tree types
- Decision tree libraries
- Using a decision tree with a book dataset
- Testing the book decision tree
- Support vector machines
- Using an SVM for camping data
- Testing individual instances
- Bayesian networks
- Using a Bayesian network
- Unsupervised machine learning
- Association rule learning
- Using association rule learning to find buying relationships
- Reinforcement learning
- Summary
- Chapter 7: Neural Networks
- Training a neural network
- Getting started with neural network architectures
- Understanding static neural networks
- A basic Java example
- Understanding dynamic neural networks
- Multilayer perceptron networks
- Building the model
- Evaluating the model
- Predicting other values
- Saving and retrieving the model
- Learning vector quantization
- Self-Organizing Maps
- Using a SOM
- Displaying the SOM results
- Additional network architectures and algorithms
- The k-Nearest Neighbors algorithm
- Instantaneously trained networks
- Spiking neural networks
- Cascading neural networks
- Holographic associative memory
- Backpropagation and neural networks
- Summary
- Chapter 8: Deep Learning
- Deeplearning4j architecture
- Acquiring and manipulating data
- Reading in a CSV file
- Configuring and building a model
- Using hyperparameters in ND4J
- Instantiating the network model
- Training a model
- Testing a model
- Deep learning and regression analysis
- Preparing the data
- Setting up the class
- Reading and preparing the data
- Building the model
- Evaluating the model
- Restricted Boltzmann Machines
- Reconstruction in an RBM
- Configuring an RBM
- Deep autoencoders
- Building an autoencoder in DL4J
- Configuring the network
- Building and training the network
- Saving and retrieving a network
- Specialized autoencoders
- Convolutional networks
- Building the model
- Evaluating the model
- Recurrent Neural Networks
- Summary
- Chapter 9: Text Analysis
- Implementing named entity recognition
- Using OpenNLP to perform NER
- Identifying location entities
- Classifying text
- Word2Vec and Doc2Vec
- Classifying text by labels
- Classifying text by similarity
- Understanding tagging and POS
- Using OpenNLP to identify POS
- Understanding POS tags
- Extracting relationships from sentences
- Using OpenNLP to extract relationships
- Sentiment analysis
- Downloading and extracting the Word2Vec model
- Building our model and classifying text
- Summary
- Chapter 10: Visual and Audio Analysis
- Text-to-speech
- Using FreeTTS
- Getting information about voices
- Gathering voice information
- Understanding speech recognition
- Using CMUSphinx to convert speech to text
- Obtaining more detail about the words
- Extracting text from an image
- Using Tess4j to extract text
- Identifying faces
- Using OpenCV to detect faces
- Classifying visual data
- Creating a Neuroph Studio project for classifying visual images
- Training the model
- Summary
- Chapter 11: Mathematical and Parallel Techniques for Data Analysis
- Implementing basic matrix operations
- Using GPUs with DeepLearning4j
- Using map-reduce
- Using Apache's Hadoop to perform map-reduce
- Writing the map method
- Writing the reduce method
- Creating and executing a new Hadoop job
- Various mathematical libraries
- Using the jblas API
- Using the Apache Commons math API
- Using the ND4J API
- Using OpenCL
- Using Aparapi
- Creating an Aparapi application
- Using Aparapi for matrix multiplication
- Using Java 8 streams
- Understanding Java 8 lambda expressions and streams
- Using Java 8 to perform matrix multiplication
- Using Java 8 to perform map-reduce
- Summary
- Chapter 12: Bringing It All Together
- Defining the purpose and scope of our application
- Understanding the application's architecture
- Data acquisition using Twitter
- Understanding the TweetHandler class
- Extracting data for a sentiment analysis model
- Building the sentiment model
- Processing the JSON input
- Cleaning data to improve our results
- Removing stop words
- Performing sentiment analysis
- Analysing the results
- Other optional enhancements
- Summary
- Module 2: Mastering Java for Data Science
- Chapter 1: Data Science Using Java
- Data science
- Machine learning
- Supervised learning
- Unsupervised learning
- Clustering
- Dimensionality reduction
- Natural Language Processing
- Data science process models
- CRISP-DM
- A running example
- Data science in Java
- Data science libraries
- Data processing libraries
- Math and stats libraries
- Machine learning and data mining libraries
- Text processing
- Summary
- Chapter 2: Data Processing Toolbox
- Standard Java library
- Collections
- Input/Output
- Reading input data
- Writing output data
- Streaming API
- Extensions to the standard library
- Apache Commons
- Commons Lang
- Commons IO
- Commons Collections
- Other commons modules
- Google Guava
- AOL Cyclops React
- Accessing data
- Text data and CSV
- Web and HTML
- JSON
- Databases
- DataFrames
- Search engine - preparing data
- Summary
- Chapter 3: Exploratory Data Analysis
- Exploratory data analysis in Java
- Search engine datasets
- Apache Commons Math
- Joinery
- Interactive Exploratory Data Analysis in Java
- JVM languages
- Interactive Java
- Joinery shell
- Summary
- Chapter 4: Supervised Learning - Classification and Regression
- Classification
- Binary classification models
- Smile
- JSAT
- LIBSVM and LIBLINEAR
- Encog
- Evaluation
- Accuracy
- Precision, recall, and F1
- ROC and AU ROC (AUC)
- Result validation
- K-fold cross-validation
- Training, validation, and testing
- Case study - page prediction
- Regression
- Machine learning libraries for regression
- Smile
- JSAT
- Other libraries
- Evaluation
- MSE
- MAE
- Case study - hardware performance
- Summary
- Chapter 5: Unsupervised Learning - Clustering and Dimensionality Reduction
- Dimensionality reduction
- Unsupervised dimensionality reduction
- Principal Component Analysis
- Truncated SVD
- Truncated SVD for categorical and sparse data
- Random projection
- Cluster analysis
- Hierarchical methods
- K-means
- Choosing K in K-Means
- DBSCAN
- Clustering for supervised learning
- Clusters as features
- Clustering as dimensionality reduction
- Supervised learning via clustering
- Evaluation
- Manual evaluation
- Supervised evaluation
- Unsupervised evaluation
- Summary
- Chapter 6: Working with Text - Natural Language Processing and Information Retrieval
- Natural Language Processing and information retrieval
- Vector Space Model - Bag of Words and TF-IDF
- Vector space model implementation
- Indexing and Apache Lucene
- Natural Language Processing tools
- Stanford CoreNLP
- Customizing Apache Lucene
- Machine learning for texts
- Unsupervised learning for texts
- Latent Semantic Analysis
- Text clustering
- Word embeddings
- Supervised learning for texts
- Text classification
- Learning to rank for information retrieval
- Reranking with Lucene
- Summary
- Chapter 7: Extreme Gradient Boosting
- Gradient Boosting Machines and XGBoost
- Installing XGBoost
- XGBoost in practice
- XGBoost for classification
- Parameter tuning
- Text features
- Feature importance
- XGBoost for regression
- XGBoost for learning to rank
- Summary
- Chapter 8: Deep Learning with DeepLearning4J
- Neural Networks and DeepLearning4J
- ND4J - N-dimensional arrays for Java
- Neural networks in DeepLearning4J
- Convolutional Neural Networks
- Deep learning for cats versus dogs
- Reading the data
- Creating the model
- Monitoring the performance
- Data augmentation
- Running DeepLearning4J on GPU
- Summary
- Chapter 9: Scaling Data Science
- Apache Hadoop
- Hadoop MapReduce
- Common Crawl
- Apache Spark
- Link prediction
- Reading the DBLP graph
- Extracting features from the graph
- Node features
- Negative sampling
- Edge features
- Link Prediction with MLlib and XGBoost
- Link suggestion
- Summary
- Chapter 10: Deploying Data Science Models
- Microservices
- Spring Boot
- Search engine service
- Online evaluation
- A/B testing
- Multi-armed bandits
- Summary
- Bibliography
- Index
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; macOS; Linux): Install the free reader Adobe Digital Editions before downloading (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the PocketBook app before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle only to a limited extent).
The PDF file format displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" copy protection. If the requirements are not met, you will not be able to open the eBook, so prepare your reading device before downloading.
Please note: We strongly recommend that you authorize your reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.