Adversarial Machine Learning

Name: Adversarial Machine Learning | Mechanisms, Vulnerabilities, and Strategies for Trustworthy AI
Brand: Wiley-ISTE
Price: 73.99 EUR
Availability: OnlineOnly

Mechanisms, Vulnerabilities, and Strategies for Trustworthy AI

Jason Edwards (Author)

Wiley-ISTE (Publisher)

1st Edition

Published on 6 January 2026

542 pages

E-Book

ePUB with Adobe-DRM

978-1-394-40204-5 (ISBN)

€73.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Enables readers to understand the full lifecycle of adversarial machine learning (AML) and how AI models can be compromised

Adversarial Machine Learning is a definitive guide to one of the most urgent challenges in artificial intelligence today: how to secure machine learning systems against adversarial threats.

This book explores the full lifecycle of adversarial machine learning (AML), providing a structured, real-world understanding of how AI models can be compromised and what can be done about it.

The book walks readers through the different phases of the machine learning pipeline, showing how attacks emerge during training, deployment, and inference. It breaks down adversarial threats into clear categories based on attacker goals: whether to disrupt system availability, tamper with outputs, or leak private information. With clarity and technical rigor, it dissects the tools, knowledge, and access attackers need to exploit AI systems.

In addition to diagnosing threats, the book provides a robust overview of defense strategies, from adversarial training and certified defenses to privacy-preserving machine learning and risk-aware system design. Each defense is discussed alongside its limitations, trade-offs, and real-world applicability.

Readers will gain a comprehensive view of today's most dangerous attack methods, including:

Evasion attacks that manipulate inputs to deceive AI predictions
Poisoning attacks that corrupt training data or model updates
Backdoor and trojan attacks that embed malicious triggers
Privacy attacks that reveal sensitive data through model interaction and prompt injection
Generative AI attacks that exploit the new wave of large language models

Blending technical depth with practical insight, Adversarial Machine Learning equips developers, security engineers, and AI decision-makers with the knowledge they need to understand the adversarial landscape and defend their systems with confidence.

Content

Preface

Acknowledgments

From the Author

Introduction

About the Companion Website

1 The Age of Intelligent Threats

The Rise of AI as a Security Target

Fragility in Intelligent Systems

Categories of AI: Predictive, Generative, and Agentic

Milestones in Adversarial Vulnerability

Intelligence as an Attack Multiplier

Why This Book and Who It's For

Recommendations

Conclusion

Key Concepts

2 Anatomy of AI Systems and Their Attack Surfaces

The Architecture of Predictive, Generative, and Agentic AI

The AI Development Lifecycle: From Data to Deployment

Classical Machine Learning vs. Modern AI Pipelines

Identifying Entry Points: Training, Inference, and Supply Chain

Security Debt in the Model Development Lifecycle

Recommendations

Conclusion

Key Concepts

3 The Adversary's Playbook

Threat Actors: Profiles, Motivations, and Objectives

White-Box Attack Techniques and Methodologies

Black-Box Attack Techniques and Methodologies

Gray-Box Attack Techniques and Methodologies

Operationalizing AI Attacks: Tactical Methodologies and Execution

Advanced Multi-Stage and Coordinated AI Attacks

Recommendations

Conclusion

Key Concepts

4 Evasion Attacks: Tricking AI Models at Inference

Core Principles and Mechanisms of Evasion Attacks

Gradient-Based Evasion Techniques

Linguistic and Textual Evasion Methods

Image- and Vision-Based Evasion Techniques

Evasion Attacks on Time-Series and Sequential Models

Recommendations

Conclusion

Key Concepts

5 Poisoning Attacks: Compromising AI Systems During Training

Fundamentals and Mechanisms of Training-Time Poisoning

Label Manipulation and Clean-Label Poisoning Techniques

Backdoor and Trojan Insertion in Training Data

Poisoning Attacks on Federated and Distributed Learning Systems

Poisoning Attacks Against Reinforcement Learning (RL) Systems

Poisoning Attacks on Transfer Learning and Fine-Tuning Processes

Recommendations

Conclusion

Key Concepts

6 Privacy Attacks: Extracting Secrets from AI Models

Core Mechanisms and Objectives of AI Privacy Attacks

Membership Inference Techniques

Model Inversion Attacks and Data Reconstruction

Attribute and Property Inference Attacks

Model Extraction and Functionality Reconstruction

Exploiting Privacy Leakage Through Prompting Generative AI

Recommendations

Conclusion

Key Concepts

7 Backdoor and Trojan Attacks: Embedding Hidden Behaviors in AI Models

Fundamental Concepts of AI Backdoors and Trojans

Backdoor Trigger Design and Optimization

Data Poisoning Methods for Backdoor Embedding

Trojan Attacks in Transfer and Fine-Tuning Scenarios

Embedding Backdoors in Federated and Decentralized Training

Advanced Trigger Embedding in Generative and Agentic AI Models

Recommendations

Conclusion

Key Concepts

8 The Generative AI Attack Surface

Architectural Foundations of Large Language Models

How Generative Architectures Expand Attack Opportunities

Exploiting Fine-Tuning as an Adversarial Vector

Prompt Engineering as an Adversarial Exploitation Pathway

Technical Risks in Retrieval-Augmented Generation Systems

Leveraging Model Internals for Generative AI Exploitation

Recommendations

Conclusion

Key Concepts

9 Prompt Injection and Jailbreak Techniques

Technical Foundations of Prompt Injection Attacks

Direct Prompt Injection Methods and Input Crafting

Indirect Prompt Injection via External or Retrieved Content

Jailbreak Techniques and Semantic Boundary Exploitation

Token-Level and Embedding Space Manipulations

Contextual and Conversational Injection Strategies

Recommendations

Conclusion

Key Concepts

10 Data Leakage and Model Hallucination

Technical Mechanisms of Data Leakage in Generative Models

Membership and Attribute Inference via Generative Outputs

Model Inversion and Training Data Reconstruction

Hallucination Exploitation in Generative Outputs

Prompt-Based Extraction of Memorized Data

Exploiting Multi-Modal and Cross-Modal Leakage in Generative Models

Recommendations

Conclusion

Key Concepts

11 Adversarial Fine-Tuning and Model Reprogramming

Technical Foundations of Adversarial Fine-Tuning

Semantic Perturbation Methods for Adversarial Fine-Tuning

Embedding Covert Behaviors via Adversarial Prompt Conditioning

Advanced Trojan Embedding via Fine-Tuning Gradients

Cross-Model and Transferable Adversarial Fine-Tuning Attacks

Model Reprogramming via Adversarial Fine-Tuning Techniques

Recommendations

Conclusion

Key Concepts

12 Agentic AI and Autonomous Threat Loops

Technical Foundations of Agentic AI Systems

Technical Manipulation of Autonomous Decision Loops

Exploitation of Agentic Memory and Context Management

Agentic Tool Integration and External API Exploitation

Technical Embedding of Autonomous Chain Injection

Exploitation of Environmental Interactions and Stateful Vulnerabilities

Recommendations

Conclusion

Key Concepts

13 Securing the AI Supply Chain

Technical Mechanisms of Supply Chain Poisoning in AI Models

Artifact and Model Checkpoint Contamination Techniques

Technical Exploitation of Third-Party AI Libraries and Frameworks

Dataset Provenance and Annotation Manipulation Techniques

Technical Exploitation of Hosted and Cloud-based Model Infrastructure

Artifact Repositories and Model Zoo Contamination Methods

Recommendations

Conclusion

Key Concepts

14 Evaluating AI Robustness and Response Strategies

Technical Foundations of AI Robustness Evaluation

Metrics for Evaluating AI Security and Robustness

Robust Optimization Methods and Adversarial Training

Certified Robustness and Formal Verification Techniques

Technical Benchmarking Tools and Evaluation Frameworks

Technical Analysis of Robustness Across Model Architectures and Modalities

Recommendations

Conclusion

Key Concepts

15 Building Trustworthy AI by Design

Technical Foundations of Security-by-Design in AI Systems

Robust Embedding and Representation Learning Methods

Technical Approaches to Adversarially Robust Architectures

Technical Integration of Formal Verification in Model Design

Technical Frameworks for Runtime Anomaly Detection and Filtering

Technical Embedding of Model Interpretability and Transparency

Recommendations

Conclusion

Key Concepts

16 Looking Ahead: Security in the Era of Intelligent Agents

Technical Foundations of Future Agentic AI Systems

Emerging Technical Attack Vectors in Agentic Systems

Technical Exploitation of Multi-Modal and Cross-Domain Agentic Capabilities

Future Technical Capabilities in Automated Adversarial Generation

Technical Mechanisms for Evaluating Advanced Agentic Robustness

Technical Embedding of Ethical Constraints and Safety Mechanisms

Recommendations

Conclusion

Key Concepts

Glossary

Index

1
The Age of Intelligent Threats

Artificial intelligence (AI) has shifted from an experimental novelty to a core dependency across industries, embedding itself into critical infrastructure, decision-making systems, and operational workflows. As organizations embrace predictive, generative, and agentic models, the threat landscape expands, not just in scale but in character. These systems do not behave like conventional software; they adapt, generalize, and sometimes misfire in ways that defy deterministic reasoning. Understanding their fragility, attack surfaces, and adversarial vulnerabilities is essential to securing AI deployments in environments where trust, reliability, and resilience are paramount.

From the earliest examples of input manipulation in spam filters to today's sophisticated prompt injection and model inversion attacks, adversarial threats have evolved alongside the capabilities of AI itself. This chapter traces the arc of that evolution, grounding the reader in the practical realities of how intelligent systems fail, and how they are made to fail. It categorizes AI not by function alone but by the unique risks each class of model introduces, highlighting how complexity, autonomy, and tool integration multiply the consequences of exploitation. With real-world incidents driving regulatory scrutiny and operational risk, AI security professionals can no longer afford to rely on conventional security patterns.

The Rise of AI as a Security Target

AI systems have rapidly evolved from academic experiments into critical infrastructure supporting a wide range of high-stakes domains. In sectors such as finance, healthcare, energy, transportation, and national defense, AI no longer plays a supporting role; it drives decision-making, automates processes, and interfaces directly with sensitive data and physical systems. Its role in fraud detection, medical imaging analysis, real-time logistics, and mission planning positions AI as both a powerful enabler and a high-value target. The transition from novelty to operational necessity has placed AI firmly within the adversary's crosshairs, fundamentally altering the security landscape.

The widespread adoption of AI in safety-critical applications introduces systemic risk. Unlike traditional software, AI behavior is not defined by fixed rules but emerges from data and training. This creates a unique and often poorly understood attack surface. Autonomous systems, including diagnostic models and autonomous vehicles, can exhibit brittle behavior under subtle adversarial manipulation. Because these models operate with limited transparency, identifying when a model has been compromised or is behaving maliciously is inherently difficult. As AI becomes increasingly embedded in decisions that affect lives, livelihoods, and national stability, ensuring its trustworthy operation is no longer optional; it is existential.

Ironically, the very properties that make AI valuable (its adaptability, scalability, and autonomy) are the same ones that render it uniquely vulnerable. Unlike static software, machine learning systems learn from data, which can be poisoned. They make decisions based on statistical patterns, which can be perturbed. And they often generalize in unpredictable ways, which can be exploited. Adversaries do not need to understand every line of code to compromise an AI system; instead, they manipulate the inputs, the training pipeline, or the deployment environment to subvert the system's outputs. In doing so, attackers can cause silent misclassifications, insert backdoors, or extract sensitive data with minimal effort.
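
To make the point about poisoned training data concrete, here is a minimal sketch that is not taken from the book. It assumes a toy one-dimensional nearest-centroid classifier; the function names, the data values, and the injected points are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def fit_centroids(samples):
    """Train a toy nearest-centroid model from (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Hypothetical clean training set: class 0 clusters near 2, class 1 near 8.
clean = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]

# Assumed attack: three injected points that look like class 0 but carry label 1.
poison = [(2.0, 1), (3.0, 1), (4.0, 1)]

clean_model = fit_centroids(clean)
poisoned_model = fit_centroids(clean + poison)

probe = 4.5
print(predict(clean_model, probe))     # label from the clean model
print(predict(poisoned_model, probe))  # label after poisoning
```

In this sketch no model code changes, only the data. The injected points pull the class 1 centroid from 8.0 to 5.5, so the probe at 4.5 switches from class 0 to class 1.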

The commoditization of AI components has further accelerated this exposure. Pre-trained models, open-source codebases, and public Application Programming Interfaces (APIs) have democratized access to powerful AI capabilities. While this open ecosystem fuels innovation, it also provides adversaries with everything they need to reverse engineer, test, and exploit models. It is now trivial for threat actors to download a state-of-the-art model, identify its failure modes, and craft tailored attacks, then deploy those attacks against a nearly identical model used in production by a target organization. This mirrors the early days of cybersecurity, when shared protocols and codebases inadvertently created monocultures that were ripe for exploitation.

Despite these risks, security practices around AI remain immature. In many organizations, model development proceeds without threat modeling, secure coding practices, or even basic logging and monitoring of inference behavior. The drive to innovate has consistently outpaced the incentive to secure. Unlike traditional IT systems, where security is regulated and risk is well understood, AI security lacks standardized frameworks, maturity models, or widespread institutional knowledge. This asymmetry between adoption and defense has led to a significant gap between the operational importance of AI and the protections it receives.

This gap is now visible in real-world incidents. State-sponsored attackers have used prompt injection and model leakage to extract sensitive information from AI-powered chatbots. Cybercriminals have leveraged AI for automated fraud, synthetic identity creation, and model evasion. Insider threats have used backdoored models to manipulate recommendations and access restricted data. Even researchers, with no malicious intent, have repeatedly demonstrated how easy it is to bypass model safeguards, infer training data, or repurpose models for tasks far outside their intended scope. These cases illustrate not hypothetical concerns, but present-day vulnerabilities with significant implications.

Trust in AI systems is fragile. Users, organizations, and governments are increasingly aware that models which perform well under benign conditions may collapse under pressure. A model may classify images accurately 99% of the time, but mislabel a stop sign when a sticker is applied. A chatbot may follow instructions faithfully until a cleverly phrased prompt triggers malicious behavior. This brittleness undermines trust, not only in individual systems but in the broader AI paradigm. As AI becomes more pervasive, maintaining this trust requires more than accuracy benchmarks; it demands demonstrable robustness and transparency under adversarial conditions.
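
The gap between an accuracy benchmark and robustness under adversarial conditions can be illustrated with a small sketch. It assumes a toy one-dimensional threshold classifier and a hypothetical evaluation set; none of the names or numbers come from the book. "Robust accuracy" here means the fraction of points that stay correctly classified under any shift of at most `eps`.

```python
def predict(x, threshold=0.5):
    """Toy 1-D classifier: class 1 at or above the threshold, else class 0."""
    return int(x >= threshold)


def clean_accuracy(data):
    """Fraction of (x, label) pairs classified correctly as given."""
    return sum(predict(x) == y for x, y in data) / len(data)


def robust_accuracy(data, eps):
    """Fraction of pairs still correct under any shift of at most eps.

    For a single threshold, the worst case lies at an interval endpoint,
    so checking x - eps and x + eps is sufficient.
    """
    return sum(
        predict(x - eps) == y and predict(x + eps) == y for x, y in data
    ) / len(data)


# Hypothetical evaluation set; half the points sit close to the boundary.
data = [(0.10, 0), (0.30, 0), (0.45, 0), (0.48, 0),
        (0.52, 1), (0.55, 1), (0.70, 1), (0.90, 1)]

print(clean_accuracy(data))            # accuracy on unmodified inputs
print(robust_accuracy(data, eps=0.1))  # accuracy under bounded perturbation
```

On this made-up data the classifier is perfect on unmodified inputs, yet only half the points survive a perturbation budget of 0.1. The clean benchmark alone would hide that difference.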

Importantly, securing AI cannot rely solely on traditional perimeter defenses. Firewalls, access controls, and runtime monitoring are necessary but insufficient. Machine learning systems are not static codebases; they are statistical processes embedded in data ecosystems. Securing them requires understanding how they learn, generalize, and respond to manipulation. The attacker's advantage lies in the mismatch between a model's behavior under test conditions and its behavior under adversarial use. The defender must shift from securing the infrastructure around the model to understanding and fortifying the model itself.

This demands a new mindset. AI security is not about preventing every conceivable attack; it is about raising the cost of compromise, detecting anomalies, and building systems that degrade gracefully under pressure. It is about integrating red teaming, robustness evaluation, and adversarial testing into the machine learning lifecycle. It is about acknowledging that some models are not safe for deployment and creating governance processes to decide when and where AI can be trusted. And it is about aligning AI development with broader cybersecurity, privacy, and ethical frameworks to ensure resilience by design, not as an afterthought.

As AI continues to evolve, so too will the sophistication of its adversaries. The same capabilities that enable zero-shot generalization and autonomous reasoning can be weaponized to evade detection, generate disinformation, or coordinate attacks. The boundary between AI developer and security engineer will blur. Defending AI will require interdisciplinary collaboration between data scientists, software engineers, cybersecurity experts, and policymakers. No single discipline holds the keys to secure AI; it is a collective responsibility shaped by shared risk.

In this landscape, adversarial machine learning is no longer a niche research area; it is a frontline discipline. It provides the tools to understand how AI systems fail, how those failures can be induced, and how to build defenses that anticipate, rather than react to, attack. As AI becomes integral to societal infrastructure, the security community must treat machine learning systems with the same rigor, scrutiny, and urgency applied to any other critical system. The age of intelligent threats has arrived, and the era of passive AI security must end.

Fragility in Intelligent Systems

Machine learning systems, especially deep learning models, are not engineered in the traditional sense; they are optimized. This distinction lies at the heart of their fragility. These models do not follow programmed logic; instead, they learn statistical patterns from data. As a result, their internal representations of the world can diverge drastically from human intuition. A model may classify millions of examples correctly during evaluation, yet fail dramatically in the presence of a carefully crafted, imperceptible change. This discrepancy between surface-level performance and underlying stability is not an implementation bug; it is an architectural truth of modern AI. Table 1.1 maps adversarial risks to key phases in the AI system lifecycle to illustrate where defenders must apply security controls...
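
How a small, carefully crafted change can flip a decision is easiest to see on a linear model. The sketch below is an illustration under stated assumptions, not the book's method: the weights, the input, and the budget `eps` are hypothetical, and the perturbation follows the spirit of the fast gradient sign method, stepping each feature against the sign of the gradient (which, for a linear score, is simply the weight vector).

```python
def score(w, b, x):
    """Linear decision score; positive means class 1, otherwise class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def sign_perturb(w, b, x, eps):
    """Move every feature by at most eps against the current decision.

    For a linear score the gradient with respect to the input is w, so
    stepping along -sign(score) * sign(w) lowers the margin fastest
    under a per-feature budget of eps.
    """
    s = sign(score(w, b, x))
    return [xi - eps * s * sign(wi) for wi, xi in zip(w, x)]


# Hypothetical model: 1,000 features with weights of alternating sign.
n = 1000
w = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
b = 0.0
# Hypothetical input: features near 0.5, nudged slightly toward class 1.
x = [0.5 + 0.01 * sign(wi) for wi in w]

x_adv = sign_perturb(w, b, x, eps=0.02)

print(score(w, b, x) > 0)      # decision on the original input
print(score(w, b, x_adv) > 0)  # decision on the perturbed input
print(max(abs(a - c) for a, c in zip(x_adv, x)))  # largest per-feature change
```

No feature moves by more than 0.02, but the 1,000 small shifts all push in the direction the model is most sensitive to, so the score swings from about +1.0 to about -1.0 and the predicted class flips.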

Schweitzer Fachinformationen
