Introduction
Amazon Web Services (AWS) is one of the leading cloud-computing platforms in the industry today. At the time this book was written, AWS offered more than 100 services, each of which resided in one of 18 different service categories. For someone who is new to cloud computing or to the AWS ecosystem, the sheer number of services on offer can be daunting. It can be difficult to know where to begin and what services to focus on.
Developers who are new to machine learning as well as experienced data scientists are often not aware of the power of the public cloud and AWS's offerings in the machine learning space in particular. In the past, cloud-based machine learning offerings have been limited in the types of algorithms they could support and the level of customization that was possible. All of this changed when Amazon announced SageMaker-a service that provided the ability to build machine learning models based on Amazon's implementation of cutting-edge algorithms, as well as the option to build custom models with frameworks such as Scikit-learn and Google TensorFlow.
Real-world use cases of cloud-based machine learning models are not based on using the model in isolation, but instead rely on a number of supporting systems such as databases, load balancers, API gateways, and identity providers, all of which are provided by AWS. This book is written to provide both seasoned machine learning experts and enthusiasts alike an introduction to a selection of AWS machine learning services that are based on pre-trained models, as well as step-by-step examples of how to train and deploy your own custom models on Amazon SageMaker. For enthusiasts who are new to machine learning, this book also provides a selection of chapters that cover the fundamentals of machine learning such as data preprocessing, visualization, feature engineering, and the use of common Python libraries such as NumPy, Pandas, and Scikit-learn.
This book at all times attempts to balance between theory and practice, giving you enough visibility into the underlying concepts and providing you with the best practices and practical advice that you can apply at your workplace right away. I have also made every attempt to keep the content up-to-date and relevant. Even though this makes the book susceptible to being outdated in a few rare instances, I am confident the content will remain useful and relevant through the next versions of the AWS services.
Who This Book Is For
This book is best suited for software developers who wish to learn about machine learning in general and how to leverage machine learning-specific offerings from AWS. The book is also useful to data scientists, system architects, and application architects, who want to get an introduction to some of the commonly used AWS services in the machine learning space.
If you are new to both machine learning and AWS, I advise that you read all chapters from start to finish. If you are an experienced data scientist, you may want to skip ahead to Part 2 to learn about machine learning-specific AWS services.
What This Book Covers
This book covers building and training machine learning models with Python on the AWS cloud, as well as a number of ready-to-use machine learning services such as Amazon Rekognition, Amazon Comprehend, and Amazon Lex.
The book also covers general high-level concepts of machine learning, including feature engineering, data visualization, as well as supporting AWS services that are used to build machine learning systems such as Amazon IAM, Amazon Cognito, Amazon S3, Amazon DynamoDB, and AWS Lambda.
The model-building and evaluation code in this book is written in Python 3. Services provided by Amazon, Apple, and Google are updated frequently and therefore sometimes you may encounter a newer version of a screen when you follow the instructions in a chapter.
How This Book Is Structured
This book consists of 18 chapters that are grouped into two parts, and four appendices. The first part consists of five chapters and covers the fundamentals of machine learning using Python. This part covers techniques for feature engineering, data visualization, model building, and model evaluation using Pandas, NumPy, Matplotlib, and Scikit-learn. The examples developed in this part make use of Jupyter Notebook and are aimed at readers who are new to machine learning.
Part 2 covers building machine learning applications using AWS services. This part starts with introducing the basics of commonly used AWS services such as Amazon S3, Amazon DynamoDB, and AWS Lambda. It then proceeds to AWS services that deal specifically with machine learning such as Amazon Comprehend, Amazon Lex, Amazon Machine Learning, and Amazon SageMaker. Two chapters are dedicated to Amazon SageMaker; the first one covers building and deploying models using built-in algorithms and Scikit-learn, and the second one covers building and deploying a model with Google TensorFlow. Not all chapters in this part include source code, but where applicable, you can download the source code that accompanies each chapter using a GitHub link. Some of the chapters in this part require you to upload files to Amazon S3; you will need to substitute the names of buckets in the examples with those from your own account.
The chapters in Part 1 include:
- Introduction to Machine Learning (Chapter 1) This is an introduction to the types of machine learning systems, their applications, and tools used to build machine learning systems.
- Data Collection and Preprocessing (Chapter 2) This chapter covers sources that can be used to obtain training data, techniques to explore datasets, and basic feature engineering.
- Data Visualization with Python (Chapter 3) This chapter covers techniques to visualize datasets using Matplotlib.
- Creating Machine Learning Models with Scikit-learn (Chapter 4) This chapter covers techniques to build and train classification and regression models using Scikit-learn.
- Evaluating Machine Learning Models (Chapter 5) This chapter covers techniques to evaluate the quality of a machine learning model.
The chapters in Part 2 include:
- Introduction to Amazon Web Services (Chapter 6) This chapter is a brief primer on cloud computing and Amazon Web Services. It also covers commonly encountered service and deployment models.
- AWS Global Infrastructure (Chapter 7) This chapter introduces AWS regions, availability zones, and edge locations.
- Identity and Access Management (Chapter 8) This chapter introduces one of the key services provided by AWS to secure your resources in the Amazon cloud. It also provides instructions to sign up for an account under the AWS free tier.
- Amazon S3 (Chapter 9) This chapter introduces one the most commonly used storage services provided by AWS, Amazon Simple Storage Service (S3).
- Amazon Cognito (Chapter 10) This chapter introduces Amazon's cloud-based OAuth2.0-compliant identity management solution, Amazon Cognito.
- Amazon DynamoDB (Chapter 11) This chapter introduces Amazon's managed NoSQL database service, Amazon DynamoDB.
- AWS Lambda (Chapter 12) This chapter introduces AWS Lambda, a service designed to allow you to run code in the Amazon cloud without having to provision or manage any infrastructure.
- Amazon Comprehend (Chapter 13) This chapter introduces Amazon Comprehend, a cloud-based natural language processing service that you can integrate into your applications to analyze the contents of text documents.
- Amazon Lex (Chapter 14) This chapter introduces Amazon Lex, a cloud-based service that you can use to create chatbots and integrate them into your applications.
- Amazon Machine Learning (Chapter 15) This chapter introduces Amazon Machine Learning, a fully managed cloud-based service that you can use to build and deploy simple machine learning models without any programming.
- Amazon SageMaker (Chapter 16) This chapter introduces Amazon SageMaker, a cloud-based machine learning service that can be used to train and deploy both built-in and custom machine learning models.
- Using Google Tensorflow with Amazon SageMaker (Chapter 17) This chapter introduces Google's Tensorflow framework and covers the use of Amazon SageMaker to build and deploy Tensorflow models.
- Amazon Rekognition (Chapter 18) This chapter introduces Amazon Rekognition, a fully managed cloud-based service that can be used to add computer vision capabilities to your applications.
The appendices cover the following topics:
- Anaconda and Jupyter Notebook Setup (Appendix A) This appendix provides instructions to install the Anaconda distribution and set up a Jupyter Notebook server on your local computer.
- AWS Resources Needed to Use This Book (Appendix B) This appendix provides information on the AWS resources that you need to set up in your account in order to follow along with the examples in the book.
- Installing and Configuring the AWS CLI (Appendix C) This appendix provides instructions to download and install the AWS CLI tool.
- Introduction to NumPy and Pandas (Appendix D) This appendix provides an introduction to two Python libraries commonly used by data scientists: NumPy and...