
Data Clustering with Python
From Theory to Implementation
Guojun Gan(Author)
Chapman & Hall/CRC (Publisher)
1st Edition
Published on 14 September 2025
Book
Hardback
248 pages
978-1-032-97156-8 (ISBN)
Description
Data clustering, an interdisciplinary field with diverse applications, has gained increasing popularity since its origins in the 1950s. Over the past six decades, researchers from various fields have proposed numerous clustering algorithms. In 2011, I wrote a book on implementing clustering algorithms in C++ using object-oriented programming. While C++ offers efficiency, its steep learning curve makes it less ideal for rapid prototyping. In the years since, Python has surged in popularity and has been the most widely used programming language since 2022. Its simplicity and extensive scientific libraries make it an excellent choice for implementing clustering algorithms.
Features:
Introduction to Python programming fundamentals
Overview of key concepts in data clustering
Implementation of popular clustering algorithms in Python
Practical examples of applying clustering algorithms to datasets
Access to associated Python code on GitHub
This book extends my previous work by implementing clustering algorithms in Python. Unlike the object-oriented approach in C++, this book uses a procedural programming style, as Python allows many clustering algorithms to be implemented concisely. The book is divided into two parts: the first introduces Python and key libraries like NumPy, Pandas, and Matplotlib, while the second covers clustering algorithms, including hierarchical and partitional methods. Each chapter includes theoretical explanations, Python implementations, and practical examples, with comparisons to scikit-learn where applicable.
This book is ideal for anyone interested in clustering algorithms, with no prior Python experience required. The complete source code is available at: https://github.com/ganml/dcpython.
More details
Language
English
Place of publication
United Kingdom
Publishing group
Taylor & Francis Ltd
Target group
Professional and scholarly
Academic
Illustrations
40 line drawings, black and white; 40 illustrations, black and white
Dimensions
Height: 240 mm
Width: 161 mm
Thickness: 19 mm
Weight
558 g
ISBN-13
978-1-032-97156-8 (9781032971568)
Other editions

E-Book
09/2025
1st Edition
Chapman and Hall
€73.99
Available for download

Person
Guojun Gan is an Associate Professor in the Department of Mathematics at the University of Connecticut, where he has been since August 2014. Prior to that, he worked at a large life insurance company in Toronto, Canada for six years and a hedge fund in Oakville, Canada for one year. He earned a BS degree from Jilin University, Changchun, China, in 2001 and MS and PhD degrees from York University, Toronto, Canada, in 2003 and 2007, respectively.
Content
1. Python Programming 101. 2. The NumPy Library. 3. The Pandas Library. 4. The Matplotlib Library. 5. Introduction to Data Clustering. 6. Agglomerative Hierarchical Algorithms. 7. DIANA. 8. The k-means Algorithm. 9. The c-means Algorithm. 10. The k-prototypes Algorithm. 11. The Genetic k-modes Algorithm. 12. The FSC Algorithm. 13. The Gaussian Mixture Algorithm. 14. The KMTD Algorithm. 15. The Probability Propagation Algorithm. 16. A Spectral Clustering Algorithm. 17. A Mean-Shift Algorithm.