This book gives readers the "big picture" through a comprehensive survey of the domain of big data processing systems. For the past decade, the Hadoop framework has dominated the world of big data processing, yet recently academia and industry have started to recognize its limitations in several application domains, and thus it is now gradually being replaced by a collection of engines dedicated to specific verticals (e.g., structured data, graph data, and streaming data). The book explores this new wave of systems, which it refers to as Big Data 2.0 processing systems.
After Chapter 1 presents the general background of the big data phenomenon, Chapter 2 provides an overview of general-purpose big data processing systems that allow their users to develop big data processing jobs for different application domains. In turn, Chapter 3 examines systems that have been introduced to support the SQL flavor on top of the Hadoop infrastructure and provide competitive and scalable performance in the processing of large-scale structured data. Chapter 4 discusses several systems that have been designed to tackle the problem of large-scale graph processing. Chapter 5 focuses on systems designed to provide scalable solutions for processing big data streams, and on other systems introduced to support the development of data pipelines between various types of big data processing jobs and systems. Next, Chapter 6 covers the emerging frameworks and systems in the domain of scalable machine learning and deep learning processing. Lastly, Chapter 7 shares conclusions and an outlook on future research challenges. This new and considerably enlarged second edition not only contains the completely new Chapter 6, but also offers refreshed content on the state of the art in all domains of big data processing over recent years.
Overall, the book offers a valuable reference guide for professionals, students, and researchers in the domain of big data processing systems. Further, its comprehensive content will hopefully encourage readers to pursue further research on the subject.
Sherif Sakr is the Head of the Data Systems Group at the Institute of Computer Science, University of Tartu, Estonia. His research interests lie in data and information management in general, and particularly in big data processing systems, big data analytics, data science, and big data management in cloud computing platforms. He has published more than 150 refereed research publications in international journals and conferences. Sherif is an ACM Senior Member and an IEEE Senior Member, and in 2017 he was appointed to serve as an ACM Distinguished Speaker and as an IEEE Distinguished Speaker. In addition, he is serving as the Editor-in-Chief of the Springer Encyclopedia of Big Data Technologies, and is also serving as a Co-Chair for the European Big Data Value Association (BDVA) TF6-Data Technology Architectures Group. In 2019, he received the best Arab scholar award from the Abdul Hameed Shoman Foundation.
Introduction.- General-Purpose Big Data Processing Systems.- Large-Scale Processing Systems of Structured Data.- Large-Scale Graph Processing Systems.- Large-Scale Stream Processing Systems.- Large-Scale Machine/Deep Learning Frameworks.- Conclusions and Outlook.