Chapter 1
Introduction to Energy Efficiency in Large-Scale Distributed Systems
Jean-Marc Pierson1 and Helmut Hlavacs2
1IRIT, University of Toulouse, France
2Faculty of Computer Science, University of Vienna, Austria
1.1 Energy Consumption Status
The demand for research on energy efficiency in large-scale systems is supported by several incentives [1-3], including financial incentives from governments or institutions to energy-efficient industries and companies [4, 5]. Indeed, studies such as [6] reported as early as 2006 that information technology (IT) accounts for 5% to 10% of the growing global electricity demand and for about 2% of total energy use, while data centers alone account for 14% of the information and communication technology (ICT) footprint. It was projected that by 2020 the energy demand of data centers would represent 18% of the ICT footprint, with the carbon footprint rising at an annual 7% pace and doubling between 2007 and 2020 [7]. The study of Koomey [8] in 2011 highlights that the rise in energy consumption is not as bad as expected in 2007: between 2005 and 2010, the electricity demand of data centers increased worldwide by (only) about 56% instead of the projected doubling, and by as little as 36% in the United States. Altogether, the electricity used worldwide for operating data centers in 2010 accounted for about 1.3% of total electricity use.
The past 5 years have witnessed an increase in research focusing specifically on energy reduction. While this has been a major concern in embedded systems for decades, the problem is quite new in large-scale infrastructures, where performance has long been the sole parameter to optimize. The motivation comes from two complementary concerns: first, the electrical cost of running such an infrastructure is nowadays equivalent to the purchase cost of the equipment over a 4-year usage period [9]. Second, electricity providers are not always able to deliver the power needed to run the machines, capping the amount of electricity delivered to one particular client.
Modern usage of IT relies on the existence of large data centers, high-performance computing infrastructures, and high-performance networks, both core and mobile.
Cloud computing is one of the major evolutions in IT of the past decade. It relies mainly on data centers, some hosting thousands of servers. In 2010, Google was already hosting 900,000 servers (almost certainly 1 million today, and estimated by some to be more than 1.5 million). In 2013, Microsoft's CEO Steve Ballmer claimed to host more than 1 million servers. Amazon is estimated to have about the same number. For 1 million servers, at about 200 W per server plus roughly 50 W for cooling and electricity distribution losses, this represents a total power consumption of 250 MW, or about 2.2 TWh per year. However, [8] shows that less than 1% of the electricity used by data centers worldwide was attributable to Google's data center operations: the big players are often cited as examples, but they represent only a small share of the problem. When they exhibit better energy efficiency, it must be remembered that most other companies are less advanced and the average is far from these big players.
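This back-of-the-envelope figure can be checked with a short calculation; the per-server wattage and overhead below are the rough assumptions stated in the text, not measured values:

```python
# Annual electricity use of a hypothetical 1-million-server fleet,
# using the chapter's rough per-server assumptions.
SERVERS = 1_000_000
IT_WATTS = 200          # assumed average draw per server
OVERHEAD_WATTS = 50     # assumed cooling + distribution losses per server
HOURS_PER_YEAR = 8760

total_power_w = SERVERS * (IT_WATTS + OVERHEAD_WATTS)
energy_twh = total_power_w * HOURS_PER_YEAR / 1e12  # Wh -> TWh

print(total_power_w / 1e6, "MW")     # 250.0 MW
print(round(energy_twh, 2), "TWh")   # 2.19 TWh per year
```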
While supercomputers have traditionally been compared by their raw performance, now measured in PFlops (petaflops), they are also assessed by their energy efficiency. The ranking of supercomputers by energy efficiency emphasizes the number of GFlops (gigaflops) they can deliver per watt. For instance, the Tianhe-2 machine, which leads the top performance list (Top500 1), delivers a computing power of over 33 PFlops with an energy efficiency of 1.9 GFlops/W, while the CINECA machine, which tops the green list (Green500 list 2) with an energy efficiency of 3.9 GFlops/W, delivers a computing power of less than 2 PFlops. Nevertheless, supercomputers are getting greener, or more exactly, their energy efficiency is continuously increasing, while their total energy consumption is nevertheless still growing. Despite this trend, it will be difficult to achieve exascale computing within 20 MW by 2020, the limit set by the US Department of Energy (DoE).
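The relation underlying these comparisons is simply power = performance / efficiency. A minimal sketch, using the Top500/Green500 figures quoted above:

```python
# Power draw implied by a machine's performance and efficiency:
# P [W] = performance [GFlops] / efficiency [GFlops/W].
def power_mw(pflops: float, gflops_per_watt: float) -> float:
    gflops = pflops * 1e6                  # 1 PFlops = 1e6 GFlops
    return gflops / gflops_per_watt / 1e6  # W -> MW

print(round(power_mw(33, 1.9), 1))  # Tianhe-2: ~17.4 MW
print(round(power_mw(2, 3.9), 2))   # CINECA-class machine: ~0.51 MW

# Efficiency needed for 1 EFlops (1000 PFlops) within the DoE's 20 MW cap:
print(1000 * 1e6 / 20e6, "GFlops/W")  # 50.0 GFlops/W
```

At roughly 17 MW for 33 PFlops, this also shows why a thirtyfold jump to exascale cannot come from scale alone: the 20 MW cap demands about a 25-fold efficiency improvement over Tianhe-2.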
Network operators are among the most power-consuming players. Telecom Italia [10] estimated that its consumption represented 1% of the Italian total power consumption in 2011 (compared to 0.7% in 2008). Similarly, British Telecom estimates its share of electricity usage in the United Kingdom at 0.7% (2.3 TWh), the same as that of NTT in Japan. These numbers account not only for the networks but also for the associated infrastructures needed to operate them. For instance, for Telecom Italia, 65% of the electricity is consumed in the networks (wired and mobile) and 10% by its data centers. However, these numbers do not account for the equipment at the clients' premises. In France, a study by IDATE [11] shows that the total electricity consumption of the telecom sector was 8.5 TWh in 2012 (up from 6.7 TWh in 2008). The share is 40% for wired and mobile networks, 6% for data centers, 24% for the ADSL boxes at client premises, and, finally, 18% for the fixed and mobile phones themselves. We can note that the total energy consumption of the Internet boxes in clients' homes is estimated at 3.3 TWh in 2012 (40 million boxes).
The share of power consumption among server components is evolving continuously, owing to improvements in the electronics of individual components. The processor (central processing unit, CPU) and memory together account for about 54% of the total consumption, with a rough 37% share for the CPU and 17% for memory, while the other components consume less: Peripheral Component Interconnect (PCI) slots (23%), motherboard (12%), disk (6%), and fans (5%) [12] (see Chapter 2 for details). When graphics processing units (GPUs) are present, they can represent up to a tremendous 50% share of the total consumption. It is, therefore, not surprising that most efforts have been put into reducing the power consumption of processors and memory. However, despite the need for proportional computing demonstrated as early as 2007 [13], current servers do not consume power in proportion to their usage. This makes the large body of work on switching off components (consolidation in clouds) or using them at lower speed and capacity (dynamic voltage and frequency scaling (DVFS) for the CPU, Low Power Idle (LPI) for network cards, spin-down for hard disks) very valuable. It should be noted that the situation is improving: 5 years ago, a server consumed as much as 50% of its peak power when idle; now this has dropped to 20%, and peak power itself is decreasing. One can wonder whether work based on the nonproportionality of power consumption will still be relevant in the future. We believe that the delay is still long enough to see achievements in this dimension, and also that the aforementioned research may be used in conjunction with, and transferred to, lower levels (the component architecture) to allow for actual proportional computing.
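The nonproportionality argument can be illustrated with a commonly used first-order linear power model, P(u) = P_idle + (P_peak - P_idle) * u, where u is the utilization in [0, 1]. The 200 W peak value below is illustrative; the 50% and 20% idle fractions are the figures quoted above:

```python
def server_power(utilization: float, idle_w: float, peak_w: float) -> float:
    """Linear power model: an idle floor plus a load-proportional part."""
    return idle_w + (peak_w - idle_w) * utilization

PEAK_W = 200.0  # illustrative peak power

# At 30% load, an older server (idle = 50% of peak) vs a newer one (20%):
old = server_power(0.3, 0.50 * PEAK_W, PEAK_W)  # 130.0 W
new = server_power(0.3, 0.20 * PEAK_W, PEAK_W)  # 88.0 W
print(old, new)
# A truly proportional server would draw only 0.3 * 200 = 60.0 W,
# which is why consolidation onto fewer, busier servers saves energy.
```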
One must also not forget the impact of cooling on the global consumption, especially in data centers and large-scale networking equipment rooms. The power usage effectiveness (PUE), promoted since 2007 as a criterion for assessing the power efficiency of infrastructures, is the ratio of the global power usage to the power usage for IT. While it was common to have a PUE of 2 or more (meaning that as much electricity was used for the infrastructure, mainly cooling and distribution losses, as for the IT equipment itself), state-of-the-art values are now at about 1.5 or 1.6: an overhead of 50-60% of the IT power is still spent on cooling the IT equipment and on distribution losses. However, as outlined earlier, many data centers do not operate with state-of-the-art solutions, and their PUE is more likely to be about 1.8 or 1.9 [8].
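PUE itself is a simple ratio; a minimal sketch with illustrative facility figures:

```python
# PUE = total facility power / power delivered to the IT equipment.
def pue(total_kw: float, it_kw: float) -> float:
    return total_kw / it_kw

# An illustrative facility drawing 1500 kW in total for 1000 kW of IT load:
value = pue(1500.0, 1000.0)
print(value)        # 1.5 (a state-of-the-art value)
print(value - 1.0)  # 0.5 -> 50% overhead on top of the IT power
```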
Energy concerns have been integrated into many works at the different levels of the IT stack: the hardware, network, middleware, and software levels of large-scale distributed systems, whether high-performance computing (HPC), clouds, or networks.
In the following, we present some actions undertaken at these levels, in particular within the scope of a European-funded initiative.
1.2 Target of the Book
The focus and context of this book is large-scale distributed systems. We will not study embedded systems, nor do we investigate hardware-specific optimizations for energy saving. Instead, we focus on energy-efficient computation and communication in large-scale distributed systems. These systems consist of thousands of heterogeneous elements that communicate via heterogeneous networks and provide different memory, storage, and processing capabilities. Examples of very large-scale distributed systems are computational and data grids, data centers, clouds, core and sensor networks, and so on.
The target audiences of the book are manifold: from IT and environmental researchers to operators of large-scale systems, up to small and medium-sized enterprises (SMEs) and startups willing to understand the global picture and the state of the art in the field. It helps in building strategies and understanding upcoming developments in the rapidly evolving field of energy efficiency, to speed up the transfer of technologies to industry [14].
1.3 The Cost Action IC0804
This section introduces the European Cooperation in Science and Technology (COST) Action IC0804. The COST Action instrument is a 4-year funding scheme in European research framework aimed at helping the development of...