Thinking Data Science
Description
This definitive guide to machine learning projects answers the questions aspiring and experienced data scientists frequently face. Are you unsure which technology to use for your ML development? Should you choose GOFAI, ANN/DNN, or transfer learning? Can you rely on AutoML for model development? What if a client provides gigabytes or terabytes of data for building analytic models? How do you handle high-frequency, dynamic datasets? This book provides practitioners with a consolidated view of the entire data science process in a single "cheat sheet."
The core challenge for a data scientist is to extract meaningful information from huge datasets so that businesses can build better strategies. Many machine learning algorithms and neural networks are designed to perform analytics on such datasets, and choosing the most suitable one for a given dataset can be a daunting decision. Although there is no single answer, a systematic approach to problem solving is essential. This book describes a range of ML algorithms conceptually and discusses a structured process for selecting ML/DL models. Its key focus is consolidating the available algorithms and techniques for designing efficient ML models. Thinking Data Science will help practising data scientists, academics, researchers, and students who want to build ML models using the appropriate algorithms and architectures, whether the data is small or big.

Person
Poornachandra Sarang has, over an IT career spanning four decades, consulted for large IT organizations on the design and architecture of systems using state-of-the-art technologies. He has authored several books covering a wide range of emerging technologies. Dr. Sarang is a Ph.D. advisor for Computer Science and Engineering and is on the thesis advisory committee for aspiring doctoral candidates. He has designed and delivered courses and curricula for universities at the postgraduate level, as well as courses and workshops on emerging technologies for industry. He is a familiar face at technical and research conferences, where he delivers both keynote and technical talks.
Content
Data Science Process.- Dimensionality Reduction - Creating Manageable Training Datasets.- Classical Algorithms - Overview.- Regression Analysis.- Decision Tree.- Ensemble - Bagging and Boosting.- K-Nearest Neighbors.- Naive Bayes.- Support Vector Machines: A Supervised Learning Algorithm for Classification and Regression.- Clustering Overview.- Centroid-based Clustering.- Connectivity-based Clustering.- Gaussian Mixture Model.- Density-based.- BIRCH.- CLARANS.- Affinity Propagation Clustering.- STING.- CLIQUE.- Artificial Neural Networks.- ANN-based Applications.- Automated Tools.- Data Scientist's Ultimate Workflow.