Machine Learning for Business Analytics

Name: Machine Learning for Business Analytics | Concepts, Techniques, and Applications in Python
Brand: Wiley
Price: 123.99 EUR
Availability: OnlineOnly

Concepts, Techniques, and Applications in Python

Galit Shmueli, Peter C. Bruce, Peter Gedeck, Nitin R. Patel (Authors)

Wiley (Publisher)

2nd Edition

Published on 2 June 2025

1276 pages

E-Book

ePUB with Adobe-DRM

978-1-394-28680-5 (ISBN)

€123.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Machine Learning for Business Analytics: Concepts, Techniques, and Applications in Python is a comprehensive introduction to and an overview of the methods that underlie modern AI. This best-selling textbook covers both statistical and machine learning (AI) algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation, network analytics and generative AI. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for responsible use of machine learning techniques.

This is the second Python edition of Machine Learning for Business Analytics. This edition also includes:

A new chapter on generative AI (large language models or LLMs, and image generation)
An expanded chapter on deep learning
A new chapter on experimental feedback techniques including A/B testing, uplift modeling, and reinforcement learning
A new chapter on responsible data science
Updates and new material based on feedback from instructors teaching MBA, Masters in Business Analytics and related programs, undergraduate, diploma and executive courses, and from their students
A full chapter of cases demonstrating applications for the machine learning techniques
End-of-chapter exercises with data
A companion website with more than two dozen data sets, and instructor materials including exercise solutions, slides, and case solutions

This textbook is an ideal resource for upper-level undergraduate and graduate level courses in AI, data science, predictive analytics, and business analytics. It is also an excellent reference for analysts, researchers, and data science practitioners working with quantitative data in management, finance, marketing, operations management, information systems, computer science, and information technology.

Content

Foreword by Gareth James xxi

Preface to the Second Python Edition xxiii

Acknowledgments xxvii

Part I Preliminaries

Chapter 1 Introduction 3

1.1 What Is Business Analytics? 3

1.2 What Is Machine Learning? 5

1.3 Machine Learning, AI, and Related Terms 5

1.4 Big Data 7

1.5 Data Science 8

1.6 Why Are There So Many Different Methods? 8

1.7 Terminology and Notation 9

1.8 Road Maps to This Book 12

Order of Topics 13

Chapter 2 Overview of the Machine Learning Process 17

2.1 Introduction 18

2.2 Core Ideas in Machine Learning 18

2.3 The Steps in a Machine Learning Project 22

2.4 Preliminary Steps 23

2.5 Predictive Power and Overfitting 37

2.6 Building a Predictive Model 43

2.7 Using Python for Machine Learning on a Local Machine 49

2.8 Automating Machine Learning Solutions 49

2.9 Ethical Practice in Machine Learning 54

Problems 55

Part II Data Exploration and Dimension Reduction

Chapter 3 Data Visualization 61

3.1 Uses of Data Visualization 62

3.2 Data Examples 64

3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 66

3.4 Multidimensional Visualization 75

3.5 Specialized Visualizations 90

Problems 98

Chapter 4 Dimension Reduction 101

4.1 Introduction 102

4.2 Curse of Dimensionality 102

4.3 Practical Considerations 103

4.4 Data Summaries 103

4.5 Correlation Analysis 108

4.6 Reducing the Number of Categories in Categorical Variables 109

4.7 Converting a Categorical Variable to a Numerical Variable 109

4.8 Principal Component Analysis 111

4.9 Dimension Reduction Using Regression Models 121

4.10 Dimension Reduction Using Classification and Regression Trees 121

Problems 123

Part III Performance Evaluation

Chapter 5 Evaluating Predictive Performance 129

5.1 Introduction 130

5.2 Evaluating Predictive Performance 131

5.3 Judging Classifier Performance 137

5.4 Judging Ranking Performance 150

5.5 Oversampling 156

Problems 162

Part IV Prediction and Classification Methods

Chapter 6 Multiple Linear Regression 167

6.1 Introduction 168

6.2 Explanatory vs. Predictive Modeling 168

6.3 Estimating the Regression Equation and Prediction 170

6.4 Variable Selection in Linear Regression 176

Problems 188

Chapter 7 k-Nearest Neighbors (k-NN) 193

7.1 The k-NN Classifier (Categorical Outcome) 194

7.2 k-NN for a Numerical Outcome 203

7.3 Advantages and Shortcomings of k-NN Algorithms 205

Problems 207

Chapter 8 The Naive Bayes Classifier 209

8.1 Introduction 209

8.2 Applying the Full (Exact) Bayesian Classifier 212

8.3 Solution: Naive Bayes 213

8.4 Advantages and Shortcomings of the Naive Bayes Classifier 224

Problems 226

Chapter 9 Classification and Regression Trees 229

9.1 Introduction 230

9.2 Classification Trees 232

9.3 Evaluating the Performance of a Classification Tree 241

9.4 Avoiding Overfitting 246

9.5 Classification Rules from Trees 252

9.6 Classification Trees for More Than Two Classes 252

9.7 Regression Trees 253

9.8 Advantages and Weaknesses of a Tree 256

9.9 Improving Prediction: Random Forests and Boosted Trees 258

Problems 264

Chapter 10 Logistic Regression 267

10.1 Introduction 268

10.2 The Logistic Regression Model 269

10.3 Example: Acceptance of Personal Loan 272

10.4 Evaluating Classification Performance 277

10.5 Variable Selection 280

10.6 Logistic Regression for Multi-Class Classification 281

10.7 Example of Complete Analysis: Predicting Delayed Flights 285

Problems 298

Chapter 11 Neural Nets 301

11.1 Introduction 302

11.2 Concept and Structure of a Neural Network 302

11.3 Fitting a Network to Data 303

11.4 Required User Input 316

11.5 Exploring the Relationship Between Predictors and Outcome 317

11.6 Deep Learning 318

11.7 Advantages and Weaknesses of Neural Networks 329

Problems 331

Chapter 12 Discriminant Analysis 333

12.1 Introduction 334

12.2 Distance of a Record from a Class 336

12.3 Fisher's Linear Classification Functions 337

12.4 Classification Performance of Discriminant Analysis 341

12.5 Prior Probabilities 342

12.6 Unequal Misclassification Costs 342

12.7 Classifying More Than Two Classes 344

12.8 Advantages and Weaknesses 347

Problems 348

Chapter 13 Generating, Comparing, and Combining Multiple Models 351

13.1 Ensembles 352

13.2 Automated Machine Learning (AutoML) 359

13.3 Explaining Model Predictions 365

13.4 Summary 366

Problems 368

Chapter 14 Experiments, Uplift Models, and Reinforcement Learning 371

14.1 A/B Testing 372

14.2 Uplift (Persuasion) Modeling 377

14.3 Reinforcement Learning 384

14.4 Summary 393

Problems 395

Part V Mining Relationships Among Records

Chapter 15 Association Rules and Collaborative Filtering 399

15.1 Association Rules 400

15.2 Collaborative Filtering 413

15.3 Summary 427

Problems 429

Chapter 16 Cluster Analysis 433

16.1 Introduction 434

16.2 Measuring Distance Between Two Records 437

16.3 Measuring Distance Between Two Clusters 443

16.4 Hierarchical (Agglomerative) Clustering 445

16.5 Non-Hierarchical Clustering: The k-Means Algorithm 453

Problems 459

Part VI Forecasting Time Series

Chapter 17 Handling Time Series 463

17.1 Introduction 464

17.2 Descriptive vs. Predictive Modeling 465

17.3 Popular Forecasting Methods in Business 465

17.4 Time Series Components 466

17.5 Data Partitioning and Performance Evaluation 470

Problems 474

Chapter 18 Regression-Based Forecasting 477

18.1 A Model with Trend 478

18.2 A Model with Seasonality 484

18.3 A Model with Trend and Seasonality 486

18.4 Autocorrelation and ARIMA Models 488

Problems 498

Chapter 19 Smoothing and Deep Learning Methods for Forecasting 509

19.1 Smoothing Methods: Introduction 510

19.2 Moving Average 510

19.3 Simple Exponential Smoothing 515

19.4 Advanced Exponential Smoothing 518

19.5 Deep Learning for Forecasting 521

Problems 527

Part VII Data Analytics

Chapter 20 Social Network Analytics 537

20.1 Introduction 538

20.2 Directed vs. Undirected Networks 538

20.3 Visualizing and Analyzing Networks 539

20.4 Social Data Metrics and Taxonomy 544

20.5 Using Network Metrics in Prediction and Classification 550

20.6 Business Uses of Social Network Analysis 556

20.7 Summary 557

Problems 559

Chapter 21 Text Mining 561

21.1 Introduction 562

21.2 The Tabular Representation of Text 562

21.3 Bag-of-Words vs. Meaning Extraction at Document Level 563

21.4 Preprocessing the Text 564

21.5 Implementing Machine Learning Methods 573

21.6 Example: Online Discussions on Autos and Electronics 573

21.7 Deep Learning Approaches 577

21.8 Example: Sentiment Analysis of Movie Reviews 578

21.9 Summary 581

Problems 584

Chapter 22 Responsible Data Science 587

22.1 Introduction 588

22.2 Unintentional Harm 589

22.3 Legal Considerations 591

22.4 Principles of Responsible Data Science 592

22.5 A Responsible Data Science Framework 595

22.6 Documentation Tools 599

22.7 Example: Applying the RDS Framework to the COMPAS Example 603

22.8 Summary 613

Problems 614

Chapter 23 Generative AI 617

23.1 The Transformative Power of Generative AI 617

23.2 What is Generative AI? 619

23.3 Data and Infrastructure Requirements 621

23.4 Adapting Models for Specific Purposes 623

23.5 Prompt Engineering 624

23.6 Uses of Generative AI 625

23.7 Caveats and Concerns 629

23.8 Summary 631

Problems 633

Part VIII Cases

Chapter 24 Cases 639

24.1 Charles Book Club 639

24.2 German Credit 646

24.3 Tayko Software Cataloger 651

24.4 Political Persuasion 655

24.5 Taxi Cancellations 659

24.7 Direct-Mail Fundraising 665

24.8 Catalog Cross-Selling 668

24.9 Time-Series Case: Forecasting Public Transportation Demand 670

24.10 Loan Approval 672

References 675

Index 677

CHAPTER 1
Introduction

1.1 WHAT IS BUSINESS ANALYTICS?

Business Analytics (BA) is the practice and art of bringing quantitative data to bear on decision-making. The term means different things to different organizations.

Consider the role of analytics in helping newspapers survive the transition to a digital world. One tabloid newspaper with a working-class readership in Britain had launched a web version of the paper and ran tests on its home page to determine which images produced more hits: cats, dogs, or monkeys. This simple application, for this company, was considered analytics. By contrast, the Washington Post has a highly influential audience that is of interest to big defense contractors: it is perhaps the only newspaper where you routinely see advertisements for aircraft carriers. In the digital environment, the Post can track readers by time of day, location, and user subscription information. In this fashion, the display of the aircraft carrier advertisement in the online paper may be focused on a very small group of individuals, say, the members of the House and Senate Armed Services Committees who will be voting on the Pentagon's budget.

Business Analytics, or more generically, analytics, includes a range of data analysis methods. Many powerful applications involve little more than counting, rule-checking, and basic arithmetic. For some organizations, this is what is meant by analytics.

The next level of business analytics, now termed Business Intelligence (BI), refers to data visualization and reporting for understanding "what happened and what is happening." This is done using charts, tables, and dashboards to display, examine, and explore data. BI, which earlier consisted mainly of generating static reports, has evolved into more user-friendly and effective tools and practices, such as creating interactive dashboards that allow the user not only to access real-time data but also to interact with it directly. Effective dashboards are those that tie directly into company data and give managers a tool to quickly see what might not be readily apparent in a large, complex database. One such tool for industrial operations managers displays customer orders in a single two-dimensional display, using color and bubble size as added variables, showing customer name, type of product, size of order, and length of time to produce.

Business Analytics now typically includes BI as well as sophisticated data analysis methods, such as statistical models and machine learning algorithms used for exploring data, quantifying and explaining relationships between measurements, and predicting new records. Methods like regression models are used to describe and quantify "on average" relationships (e.g., between advertising and sales), to predict new records (e.g., whether a new patient will react positively to a medication), and to forecast future values (e.g., next week's web traffic).
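As a small illustration of the "on average" relationship mentioned above, the following sketch fits a straight line to advertising and sales figures by ordinary least squares and then predicts sales for a new advertising level. It is a minimal sketch, not the book's own code: the data values, the variable names, and the helper fit_line are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data (assumption): advertising spend and resulting sales,
# both in thousands of dollars.
ad_spend = [10, 20, 30, 40, 50]
sales = [24, 46, 64, 87, 104]

intercept, slope = fit_line(ad_spend, sales)

# Describe: the slope is the average change in sales per extra unit of spend.
print(f"average effect: {slope:.2f} sales units per unit of advertising")

# Predict: estimated sales for a new record with ad spend of 60.
predicted_sales = intercept + slope * 60
print(f"predicted sales at spend 60: {predicted_sales:.1f}")
```

The same fitted line serves both purposes named in the text: the slope describes the average relationship, and the full equation yields a prediction for a new record.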

Readers familiar with earlier editions of this book may have noticed that the book title has changed from Data Mining for Business Intelligence to Data Mining for Business Analytics and, finally, in this edition to Machine Learning for Business Analytics. The first change reflected the advent of the term BA, which overtook the earlier term BI to denote advanced analytics. Today, BI is used to refer to data visualization and reporting. The second change reflects how the term machine learning has overtaken the older term data mining.

WHO USES PREDICTIVE ANALYTICS?

The widespread adoption of predictive analytics, coupled with the accelerating availability of data, has increased organizations' capabilities throughout the economy. A few examples are as follows:

Credit scoring: One long-established use of predictive modeling techniques for business prediction is credit scoring. A credit score is not some arbitrary judgment of creditworthiness; it is based mainly on a predictive model that uses prior data to predict repayment behavior.

Future purchases: A more recent (and controversial) example is Target's use of predictive modeling to classify sales prospects as "pregnant" or "not-pregnant." Those classified as pregnant could then be sent sales promotions at an early stage of pregnancy, giving Target a head start on a significant purchase stream.

Tax evasion: The US Internal Revenue Service found it was 25 times more likely to find tax evasion when enforcement activity was based on predictive models, allowing agents to focus on the most-likely tax cheats (Adapted from Siegel, 2013).

The Business Analytics toolkit also includes statistical experiments, the most common of which is known to marketers as A/B testing. These are often used for pricing decisions:

Orbitz, the travel site, found that it could price hotel options higher for Mac users than Windows users.
Staples online store found it could charge more for staplers if a customer lived far from a Staples store.
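To make the idea of an A/B test concrete, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion rates. The text does not say how Orbitz or Staples analyzed their experiments, so the choice of test, the function name two_proportion_z_test, and all counts below are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts (assumption): visitors randomly shown price A or price B,
# and how many of them went on to buy.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion between the two variants is unlikely to be due to chance alone. Whether the difference matters commercially is a separate business judgment.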

Beware the organizational setting where analytics is a solution in search of a problem: a manager, knowing that business analytics and machine learning are hot areas, decides that her organization must deploy them too, to capture that hidden value that must be lurking somewhere. Successful use of analytics and machine learning requires both an understanding of the business context where value is to be captured and an understanding of exactly what the machine learning methods do.

1.2 WHAT IS MACHINE LEARNING?

In this book, machine learning (or data mining) refers to business analytics methods that go beyond counts, descriptive techniques, reporting, and methods based on business rules. While we do introduce data visualization, which is commonly the first step into more advanced analytics, the book focuses mostly on the more advanced data analytics tools. Specifically, it includes statistical and machine learning methods that inform decision-making, often in an automated fashion. Prediction is typically an important component, often at the individual level. Rather than "what is the relationship between advertising and sales," we might be interested in "what specific advertisement, or recommended product, should be shown to a given online shopper at this moment?" Or we might be interested in clustering customers into different "personas" that receive different marketing treatment and then assigning each new prospect to one of these personas.
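The persona idea above can be sketched with a bare-bones k-means procedure: group existing customers around centroids, then assign a new prospect to the nearest centroid. This is an illustrative sketch only. The customer features, the starting centroids, and the helper names assign and k_means are hypothetical, and a real analysis would normally rescale the features first and use an established library.

```python
from math import dist


def assign(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))


def k_means(points, centroids, n_iter=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign(p, centroids)].append(p)
        # New centroid = coordinate-wise mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[k]
            for k, c in enumerate(clusters)
        ]
    return centroids


# Hypothetical customers (assumption): (orders per year, average basket value).
customers = [(2, 20), (3, 25), (2, 30), (20, 90), (22, 110), (18, 100)]

# Start from two existing customers as initial centroids (an arbitrary choice).
centroids = k_means(customers, centroids=[customers[0], customers[3]])

# Assign a new prospect to the closest persona.
new_prospect = (19, 95)
persona = assign(new_prospect, centroids)
print(f"personas: {centroids}, new prospect -> persona {persona}")
```

Here persona 0 gathers the occasional, low-spend customers and persona 1 the frequent, high-spend ones. Each new prospect is simply matched to whichever centroid it sits closest to.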

The era of Big Data has accelerated the use of machine learning. Machine learning methods, with their power and automaticity, have the ability to cope with huge amounts of data and extract value.

1.3 MACHINE LEARNING, AI, AND RELATED TERMS

The field of analytics is growing rapidly, both in terms of the breadth of applications and in terms of the number of organizations using advanced analytics. As a result, there is considerable overlap and inconsistency of definitions. Terms have also changed over time.

The older term data mining itself means different things to different people. To the general public, it may have a general, somewhat hazy and pejorative meaning of digging through vast stores of (often personal) data in search of something interesting. Data mining, as it refers to analytic techniques, has largely been superseded by the term machine learning. Other terms that organizations use are predictive analytics, predictive modeling, and most recently machine learning and artificial intelligence (AI).

Many practitioners, particularly those from the IT and computer science communities, use the term AI to refer to all the methods discussed in this book. AI originally referred to the general capability of a machine to act like a human, and, in its earlier days, existed mainly in the realm of science fiction and the unrealized ambitions of computer scientists. More recently, it has come to encompass the methods of statistical and machine learning discussed in this book, as the primary enablers of that grand vision, and sometimes the term is used loosely to mean the same thing as machine learning. More broadly, it includes generative capabilities such as the creation of images, audio, and video.

Statistical Modeling vs. Machine Learning

A variety of techniques for exploring data and building models have been around for a long time in the world of statistics: linear regression, logistic regression, discriminant analysis, and principal components analysis, for example. However, the core tenets of classical statistics (computing is difficult and data are scarce) do not apply in machine learning applications where both data and computing power are plentiful.

This gives rise to Daryl Pregibon's description of "data mining" (in the sense of machine learning) as "statistics at scale and speed" (Pregibon, 1999). Another major difference between the fields of statistics and machine learning is the focus in statistics on inference from a sample to the population regarding an "average effect," for example, "a $1 price increase will reduce average demand by 2 boxes." In contrast, the focus in machine learning is on predicting individual records: "the predicted demand for one person given a $1 price increase is 1 box, while for another person it is 3 boxes."...
