Natural Language Processing
A Machine Learning Perspective
Cambridge University Press
2nd Edition
Will be published approx. on 31 December 2026
Book
Paperback/Softback
691 pages
978-1-009-56095-5 (ISBN)
Description
This gentle introduction to the most important techniques in natural language processing uses a unified mathematical and algorithmic framework and gradually increases in complexity. Topics covered range from n-gram language models to large language models (LLMs), from perceptron to deep learning, from text classification to structured prediction (e.g., sequence labelling, segmentation, and parsing) and generation, and from discrete representation to neural representation of linguistic structures. This book provides a comprehensive overview of NLP, making it ideal for upper-level undergraduate and graduate students in computer science and a valuable reference for researchers and engineers. Exercises of varying difficulty are provided, as well as teaching slides and tutorial videos. The new edition features three new chapters on pre-trained language models and large language models, as well as a new preliminary chapter that gives an overview of data and model as a framework for NLP methods.
More details
Edition
2nd Revised edition
Language
English
Place of publication
Cambridge
United Kingdom
Edition type
Revised edition
Product notice
Paperback (trade)
Illustrations
Worked examples or Exercises
ISBN-13
978-1-009-56095-5 (9781009560955)
Persons
Yue Zhang is Professor at the School of Engineering at Westlake University and Fellow of the Association for Computational Linguistics. He received his Ph.D. from the University of Oxford, and worked as a postdoctoral research associate at the University of Cambridge. He has served as PC chair for EMNLP 2022, Test-of-Time Committee chair for ACL 2024 and 2025, and editor for multiple journals.
Zhiyang Teng is a researcher at ByteDance. He has worked at TikTok since 2023. Previously, he was Research Assistant Professor at NTU Singapore and a research fellow at Westlake University. He received his Ph.D. from the Singapore University of Technology and Design in 2018. His research focuses on large language modelling and multimodal reasoning.
Content
Preface; Notation
Part I. Basics: 1. Introduction; 2. Data and model; 3. Counting relative frequencies; 4. Feature vector representation and discriminative text classification; 5. Neuron; 6. Information, entropy, and word representation; 7. Latent variables and EM
Part II. Structures: 8. Generative sequence labelling; 9. Discriminative sequence labelling; 10. Sequence segmentation; 11. Predicting tree structures; 12. Transition-based methods for structured prediction; 13. Bayesian network
Part III. Deep Learning: 14. A paradigm shift to neural network; 15. Sequence representation; 16. Neural structured prediction; 17. Representing structures; 18. Sequence-to-sequence models; 19. Transformer pre-training; 20. Deep latent variable models; 21. Language models as competent generalists; 22. Large language models and beyond
Bibliography; Index