Machine Learning for Business Analytics

Name: Machine Learning for Business Analytics | Concepts, Techniques and Applications with JMP Pro
Brand: Wiley
Price: 108.99 EUR
Availability: OnlineOnly

Concepts, Techniques and Applications with JMP Pro

Galit Shmueli, Peter C. Bruce, Mia L. Stephens, Muralidhara Anandamurthy, Nitin R. Patel (Authors)

Wiley (Publisher)

2nd edition

Published May 2, 2023

1042 pages

E-book

ePUB with Adobe DRM

978-1-119-90385-7 (ISBN)

108.99 € incl. 7% VAT

E-book single license

Available as download

Contents

Foreword

Preface

Acknowledgments

Part I Preliminaries

1 Introduction

1.1 What Is Business Analytics?

1.2 What Is Machine Learning?

1.3 Machine Learning, AI, and Related Terms

1.4 Big Data

1.5 Data Science

1.6 Why Are There So Many Different Methods?

1.7 Terminology and Notation

1.8 Road Maps to This Book

2 Overview of the Machine Learning Process

2.1 Introduction

2.2 Core Ideas in Machine Learning

2.3 The Steps in a Machine Learning Project

2.4 Preliminary Steps

2.5 Predictive Power and Overfitting

2.6 Building a Predictive Model with JMP Pro

2.7 Using JMP Pro for Machine Learning

2.8 Automating Machine Learning Solutions

2.9 Ethical Practice in Machine Learning

Part II Data Exploration and Dimension Reduction

3 Data Visualization

3.1 Introduction

3.2 Data Examples

3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots

3.4 Multidimensional Visualization

3.5 Specialized Visualizations

3.6 Summary: Major Visualizations and Operations, According to Machine Learning Goal

4 Dimension Reduction

4.1 Introduction

4.2 Curse of Dimensionality

4.3 Practical Considerations

Part III Performance Evaluation

5 Evaluating Predictive Performance

5.1 Introduction

5.2 Evaluating Predictive Performance

Part IV Prediction and Classification Methods

6 Multiple Linear Regression

6.1 Introduction

6.2 Explanatory vs. Predictive Modeling

6.3 Estimating the Regression Equation and Prediction

6.4 Variable Selection in Linear Regression

7 k-Nearest Neighbors (k-NN)

7.1 The k-NN Classifier (Categorical Outcome)

8 The Naive Bayes Classifier

8.1 Introduction

9 Classification and Regression Trees

9.1 Introduction

9.2 Classification Trees

9.3 Growing a Tree for Riding Mowers Example

9.4 Evaluating the Performance of a Classification Tree

9.5 Avoiding Overfitting

9.6 Classification Rules from Trees

9.7 Classification Trees for More Than Two Classes

9.8 Regression Trees

9.9 Advantages and Weaknesses of a Single Tree

9.10 Improving Prediction: Random Forests and Boosted Trees

10 Logistic Regression

10.1 Introduction

10.2 The Logistic Regression Model

10.3 Example: Acceptance of Personal Loan

10.4 Evaluating Classification Performance

10.5 Variable Selection

10.6 Logistic Regression for Multi-class Classification

10.7 Example of Complete Analysis: Predicting Delayed Flights

11 Neural Nets

11.1 Introduction

11.2 Concept and Structure of a Neural Network

11.3 Fitting a Network to Data

11.4 User Input in JMP Pro

11.5 Exploring the Relationship Between Predictors and Outcome

11.6 Deep Learning

11.7 Advantages and Weaknesses of Neural Networks

12 Discriminant Analysis

12.1 Introduction

12.2 Distance of an Observation from a Class

12.3 From Distances to Propensities and Classifications

12.4 Classification Performance of Discriminant Analysis

12.5 Prior Probabilities

12.6 Classifying More Than Two Classes

12.7 Advantages and Weaknesses

13 Generating, Comparing, and Combining Multiple Models

13.1 Ensembles

13.2 Automated Machine Learning (AutoML)

13.3 Summary

Part V Intervention and User Feedback

14 Interventions: Experiments, Uplift Models, and Reinforcement Learning

14.1 Introduction

14.2 A/B Testing

14.3 Uplift (Persuasion) Modeling

14.4 Reinforcement Learning

14.5 Summary

Part VI Mining Relationships Among Records

15 Association Rules and Collaborative Filtering

15.1 Association Rules

15.2 Collaborative Filtering

15.3 Summary

16 Cluster Analysis

16.1 Introduction

16.2 Measuring Distance Between Two Records

16.3 Measuring Distance Between Two Clusters

16.4 Hierarchical (Agglomerative) Clustering

16.5 Nonhierarchical Clustering: The K-Means Algorithm

Part VII Forecasting Time Series

17 Handling Time Series

17.1 Introduction

17.2 Descriptive vs. Predictive Modeling

17.3 Popular Forecasting Methods in Business

17.4 Time Series Components

17.5 Data Partitioning and Performance Evaluation

18 Regression-Based Forecasting

18.1 A Model with Trend

18.2 A Model with Seasonality

18.3 A Model with Trend and Seasonality

18.4 Autocorrelation and ARIMA Models

19 Smoothing and Deep Learning Methods for Forecasting

19.1 Introduction

19.2 Moving Average

19.3 Simple Exponential Smoothing

19.4 Advanced Exponential Smoothing

19.5 Deep Learning for Forecasting

Part VIII Data Analytics

20 Text Mining

20.1 Introduction

20.2 The Tabular Representation of Text: Document-Term Matrix and "Bag-of-Words"

20.3 Bag-of-Words vs. Meaning Extraction at Document Level

20.4 Preprocessing the Text

20.5 Implementing Machine Learning Methods

20.6 Example: Online Discussions on Autos and Electronics

20.7 Example: Sentiment Analysis of Movie Reviews

20.8 Summary

21 Responsible Data Science

21.1 Introduction

21.2 Unintentional Harm

21.3 Legal Considerations

21.4 Principles of Responsible Data Science

21.5 A Responsible Data Science Framework

21.6 Documentation Tools

21.7 Example: Applying the RDS Framework to the COMPAS Example

21.8 Summary

Part IX Cases

22 Cases

22.1 Charles Book Club

22.2 German Credit

22.3 Tayko Software Cataloger

22.4 Political Persuasion

22.5 Taxi Cancellations

22.6 Segmenting Consumers of Bath Soap

22.7 Catalog Cross-Selling

22.8 Direct-Mail Fundraising

22.9 Time Series Case: Forecasting Public Transportation Demand

22.10 Loan Approval

Index

1 INTRODUCTION

1.1 WHAT IS BUSINESS ANALYTICS?

Business analytics (BA) is the practice and art of bringing quantitative data to bear on decision-making. The term means different things to different organizations.

Consider the role of analytics in helping newspapers survive the transition to a digital world. One tabloid newspaper with a working-class readership in Britain launched a web version of the paper and ran tests on its home page to determine which images produced more hits: cats, dogs, or monkeys. This simple application, for this company, was considered analytics. By contrast, the Washington Post has a highly influential audience that is of interest to big defense contractors: it is perhaps the only newspaper where you routinely see advertisements for aircraft carriers. In the digital environment, the Post can track readers by time of day, location, and user subscription information. In this fashion, the display of the aircraft carrier advertisement in the online paper may be focused on a very small group of individuals: say, the members of the House and Senate Armed Services Committees who will be voting on the Pentagon's budget.

Business analytics, or more generically, analytics, includes a range of data analysis methods.

Many powerful applications involve little more than counting, rule checking, and basic arithmetic. For some organizations, this is what is meant by analytics.

The next level of business analytics, now termed business intelligence (BI), refers to the use of data visualization and reporting to become aware of and understand "what happened and what is happening." This is done with charts, tables, and dashboards that display, examine, and explore data. Business intelligence, which earlier consisted mainly of generating static reports, has evolved into more user-friendly and effective tools and practices, such as creating interactive dashboards that allow the user not only to access real-time data, but also to directly interact with it. Effective dashboards are those that tie directly to company data and give managers a tool to see quickly what might not readily be apparent in a large complex database. One such tool for industrial operations managers displays customer orders in one two-dimensional display using color and bubble size as added variables. The resulting 2 by 2 matrix shows customer name, type of product, size of order, and length of time to produce.

Business analytics now typically includes BI as well as sophisticated data analysis methods, such as statistical models and machine learning algorithms used for exploring data, quantifying and explaining relationships between measurements, and predicting new records. Methods like regression models are used to describe and quantify "on average" relationships (e.g., between advertising and sales), to predict new records (e.g., whether a new patient will react positively to a medication), and to forecast future values (e.g., next week's web traffic).
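To make the regression idea concrete, here is a minimal Python sketch of a one-predictor least-squares fit that both describes an "on average" relationship (the slope) and predicts a new record. The function name fit_line and the advertising and sales figures are hypothetical, invented purely for illustration; the book itself works in JMP Pro rather than in code.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance of x and y divided by variance of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (made up for this sketch): advertising spend and sales
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.1, 13.9, 16.2, 17.8, 20.0]

intercept, slope = fit_line(advertising, sales)

# Descriptive use: the slope is the average change in sales per unit of advertising
print(f"average effect of one extra unit of advertising: {slope:.2f}")

# Predictive use: estimate sales for a new record with advertising = 6.0
new_advertising = 6.0
predicted_sales = intercept + slope * new_advertising
print(f"predicted sales at advertising={new_advertising}: {predicted_sales:.2f}")
```

The same fitted line serves both purposes: the slope summarizes the average relationship, while plugging in a new predictor value yields a prediction for an individual record.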

Readers familiar with the earlier edition of this book might have noticed that the book title changed from Data Mining for Business Analytics to Machine Learning for Business Analytics. The change reflects the more recent term BA, which overtook the earlier term BI to denote advanced analytics. Today, BI is used to refer to data visualization and reporting. The change from data mining to machine learning reflects today's common use of machine learning to refer to algorithms that learn from data. This book uses primarily the term machine learning.

WHO USES PREDICTIVE ANALYTICS?

The widespread adoption of predictive analytics, coupled with the accelerating availability of data, has increased organizations' capabilities throughout the economy. A few examples:

Credit scoring: One long-established use of predictive modeling techniques for business prediction is credit scoring. A credit score is not some arbitrary judgement of creditworthiness; it is based mainly on a predictive model that uses prior data to predict repayment behavior.

Future purchases: A more recent (and controversial) example is Target's use of predictive modeling to classify sales prospects as "pregnant" or "not-pregnant." Those classified as pregnant could then be sent sales promotions at an early stage of pregnancy, giving Target a head start on a significant purchase stream.

Tax evasion: The US Internal Revenue Service found it was 25 times more likely to find tax evasion when enforcement activity was based on predictive models, allowing agents to focus on the most likely tax cheats (Siegel, 2013).

The business analytics toolkit also includes statistical experiments, the most common of which is known to marketers as A/B testing. These are often used for pricing decisions:

Orbitz, the travel site, found that it could price hotel options higher for Mac users than for Windows users.
The Staples online store found that it could charge more for staplers if a customer lived far from a Staples store.
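An A/B test of the kind described above compares an outcome rate between two randomly assigned groups. The following Python sketch shows one common way to analyze such a test, a two-proportion z-test; it is an illustrative assumption, not the method used by the companies mentioned. The function name two_proportion_z_test and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between groups A and B.

    Returns (z statistic, p-value) using the pooled-proportion standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: price A shown to 5000 visitors, price B to 5000 others
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the difference in conversion rates is unlikely to be due to chance alone; whether the difference matters for the business is a separate question.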

Beware the organizational setting where analytics is a solution in search of a problem: a manager, knowing that business analytics and machine learning are hot areas, decides that her organization must deploy them too, to capture that hidden value that must be lurking somewhere. Successful use of analytics and machine learning requires both an understanding of the business context where value is to be captured and an understanding of exactly what the machine learning methods do.

1.2 WHAT IS MACHINE LEARNING?

In this book, machine learning or data mining refers to business analytics methods that go beyond counts, descriptive techniques, reporting, and methods based on business rules. While we do introduce data visualization, which is commonly the first step into more advanced analytics, the book focuses mostly on the more advanced data analytics tools. Specifically, it includes statistical and machine learning methods that inform decision-making, often in automated fashion. Prediction is typically an important component, often at the individual level. Rather than "what is the relationship between advertising and sales?" we might be interested in "what specific advertisement, or recommended product, should be shown to a given online shopper at this moment?" Or we might be interested in clustering customers into different "personas" that receive different marketing treatment, then assigning each new prospect to one of these personas.
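The persona idea can be sketched in a few lines of Python: once a clustering step has produced a center for each persona, a new prospect is assigned to the persona whose center is closest. The persona names, the two features (visits per month, average basket value), and the centroid values below are all hypothetical, and the nearest-centroid rule is just one simple assignment approach assumed here for illustration.

```python
from math import dist

# Hypothetical persona centroids: (visits per month, average basket value)
# In practice these would come from a clustering step on historical customers.
personas = {
    "bargain hunter": (2.0, 15.0),
    "loyal regular": (8.0, 40.0),
    "big spender": (4.0, 120.0),
}

def assign_persona(prospect, centroids):
    """Return the name of the centroid closest to the prospect (Euclidean distance)."""
    return min(centroids, key=lambda name: dist(prospect, centroids[name]))

new_prospect = (7.5, 45.0)
print(assign_persona(new_prospect, personas))
```

Because the two features are on different scales, basket value dominates the distance in this sketch; in a real application the features would typically be standardized first.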

The era of Big Data has accelerated the use of machine learning. Machine learning methods, with their power and automaticity, have the ability to cope with huge amounts of data and extract value.

1.3 MACHINE LEARNING, AI, AND RELATED TERMS

The field of analytics is growing rapidly, both in terms of the breadth of applications, and in terms of the number of organizations using advanced analytics. As a result, there is considerable overlap and inconsistency in terms of definitions. Terms have also changed over time.

The older term data mining means different things to different people. To the general public, it may have a general, somewhat hazy and pejorative meaning of digging through vast stores of (often personal) data in search of something interesting. Data mining, as it refers to analytic techniques, has largely been superseded by the term machine learning.

Other terms that organizations use are predictive analytics, predictive modeling, and most recently machine learning and artificial intelligence (AI).

Many practitioners, particularly those from the IT and computer science communities, use the term AI to refer to all the methods discussed in this book. AI originally referred to the general capability of a machine to act like a human, and, in its earlier days, existed mainly in the realm of science fiction and the unrealized ambitions of computer scientists. More recently, it has come to encompass the methods of statistical and machine learning discussed in this book, as the primary enablers of that grand vision, and sometimes the term is used loosely to mean the same thing as machine learning. More broadly, it includes generative capabilities such as the creation of images, audio, and video.

Statistical Modeling vs. Machine Learning

A variety of techniques for exploring data and building models have been around for a long time in the world of statistics: linear regression, logistic regression, discriminant analysis, and principal components analysis, for example. But the core tenets of classical statistics (computing is difficult and data are scarce) do not apply in machine learning applications where both data and computing power are plentiful.

This is what gives rise to Daryl Pregibon's description of "data mining" (in the sense of machine learning) as "statistics at scale and speed" (Pregibon, 1999). Another major difference between the fields of statistics and machine learning is the focus in statistics on inference from a sample to the population regarding an "average effect," for example, "a $1 price increase will reduce average demand by 2 boxes." In contrast, the focus in machine learning is on predicting individual records: "the predicted demand for person...
