Machine Learning Methods for Scientific Data Compression
CRC Press
1st Edition
Expected publication date: approx. 25 November 2026
Book
Hardback
208 pages
978-1-041-22976-6 (ISBN)
Description
This groundbreaking book, Machine Learning Methods for Scientific Data Compression, delivers an essential exploration into the rapidly evolving field of data reduction for scientific applications. As scientific simulations generate petabytes of data, traditional compression methods falter in maintaining critical fidelity. This work introduces novel machine learning approaches, from advanced autoencoders to generative foundation models, all designed to achieve unprecedented compression ratios while rigorously guaranteeing the accuracy of both primary data and quantities of interest.
Dive into comprehensive chapters covering autoencoders, constrained and guaranteed autoencoders, adaptive data reduction, and attention-based hierarchical methods. Discover the power of guaranteed conditional diffusion and the revolutionary potential of foundation models for scientific data. The book culminates in a unified framework for scalable, high-fidelity data reduction, showcasing practical GPU-accelerated pipelines and experimental results across diverse domains like climate modeling, turbulent flow, and plasma physics. This resource provides the tools and insights needed to accelerate scientific discovery by getting smarter faster with data.
The book is a must-read for researchers, data scientists, and engineers grappling with the challenges of managing and analyzing colossal scientific datasets in the age of exascale computing.
More details
Language
English
Place of publication
Boca Raton, Florida
United States
Publishing group
Taylor & Francis
Target group
Professional Practice & Development
Illustrations
12
67 color illustrations, 67 color photographs or color raster images, 12 black-and-white tables
Dimensions
Height: 234 mm
Width: 156 mm
ISBN-13
978-1-041-22976-6 (9781041229766)
Other editions
Xiao Li | Jaemoon Lee | Tania Banerjee
Machine Learning Methods for Scientific Data Compression
E-Book
approx. 11/2026
Taylor & Francis
€68.49
Not yet available
Xiao Li | Jaemoon Lee | Tania Banerjee
Machine Learning Methods for Scientific Data Compression
E-Book
approx. 11/2026
CRC Press
€68.49
Not yet available
Persons
Xiao Li is a Ph.D. student at the University of Florida, specializing in machine learning for scientific data reduction, large language models, generative AI, and AI for science. He holds M.S.E. and B.S. degrees from Sun Yat-sen University.
Jaemoon Lee is a postdoctoral associate at Oak Ridge National Laboratory. He earned his Ph.D. and M.S. from the University of Florida, focusing on machine learning, physics-informed neural networks, large language models, and data compression.
Tania Banerjee, Ph.D., is an Assistant Professor at the University of Houston. Her research integrates high-performance computing with AI and ML for data-driven solutions in transportation, healthcare, cybersecurity, and large-scale scientific data compression.
Liangji Zhu is a Ph.D. student at the University of Florida. His research areas include machine learning for predictive analytics, scientific data compression, generative AI, spatiotemporal modeling, and AI for science.
Qian Gong is a computer scientist at Oak Ridge National Laboratory. She holds a Ph.D. from Duke University, and her research interests encompass lossy compression, data management, and AI-based surrogate modeling for scientific applications.
Scott Klasky is a Distinguished Scientist at Oak Ridge National Laboratory, leading efforts in high-performance data management and data reduction for scientific computing. He founded ADIOS and developed MGARD.
Rahul Sengupta, Ph.D., is an Adjunct Research Scientist at the University of Florida. His research applies machine learning models to sequential and time-series data, particularly in transportation engineering.
Anand Rangarajan is a Professor at the University of Florida, specializing in machine learning, computer vision, medical and hyperspectral imaging, and the science of consciousness.
Sanjay Ranka is a Distinguished Professor at the University of Florida. His research focuses on high-performance computing and big data science, with applications in CFD, healthcare, and transportation. He is a Fellow of IEEE and AAAS.
Author
Professor, University of Florida, Gainesville, USA
Content
1. Introduction
2. Autoencoders
3. Constrained Autoencoders
4. Guaranteed Autoencoders
5. Adaptive Data Reduction
6. Attention and Hierarchical Methods
7. Guaranteed Conditional Diffusion
8. Foundation Models