
SmartWatts: Self-Calibrating Software-Defined Power Meter for Containers

Guillaume Fieni, Univ. Lille / Inria, France, [email protected]
Romain Rouvoy, Univ. Lille / Inria / IUF, France, [email protected]
Lionel Seinturier, Univ. Lille / Inria, France, [email protected]

Abstract—Fine-grained power monitoring of software activities becomes unavoidable to maximize the power usage efficiency of data centers. In particular, achieving an optimal scheduling of containers requires the deployment of software-defined power meters that go beyond the granularity of hardware power monitoring sensors, such as Power Distribution Units (PDU) or Intel's Running Average Power Limit (RAPL), to deliver power estimations of activities at the granularity of software containers.

However, the definition of the underlying power models that estimate the power consumption remains a long and fragile process that is tightly coupled to the host machine.

To overcome these limitations, this paper introduces SmartWatts: a lightweight power monitoring system that adopts online calibration to automatically adjust the CPU and DRAM power models in order to maximize the accuracy of runtime power estimations of containers. Unlike state-of-the-art techniques, SmartWatts does not require any a priori training phase or hardware equipment to configure the power models and can therefore be deployed, at no cost, on a wide range of machines, including those with the latest power optimizations.

Index Terms—Energy, Containers, Power model

I. INTRODUCTION

Modern data centers are continuously trying to maximize the power usage efficiency (PUE) of their hardware and software infrastructures to reduce their operating cost and, eventually, their carbon emissions. While physical power meters offer a suitable solution to monitor the power consumption of physical servers, they fail to support energy profiling at a finer granularity: dealing with the software services that are distributed across such infrastructures. To overcome this limitation, software-defined power meters build on power models to estimate the power consumption of software artifacts in order to identify potential energy hotspots and leaks in software systems [1] or improve the management of resources [2]. However, existing software-defined power meters integrate power models that are statically designed, or learned prior to any deployment in production [3], [4]. This may result in inaccurate power estimations when facing unforeseen environments or workloads, thus affecting the exploitation process. As many distributed infrastructures, such as clusters or data centers, have to deal with the scheduling of unforeseen jobs, in particular when handling black-box virtual machines, the adoption of such static power models [5] is inadequate in production. We therefore believe that the state of the art in this domain should move towards the integration of more dynamic power models that can adjust themselves at runtime to better reflect the variations of the underlying workloads and to cope with the potential heterogeneity of the host machines.

In this paper, we therefore introduce SmartWatts, a self-calibrating software-defined power meter that can automatically adjust its CPU and DRAM power models to meet the power accuracy requirements of monitored software containers. Our approach builds on the principles of sequential learning and proposes to exploit coarse-grained power monitors, like Running Average Power Limit (RAPL), which is commonly available on modern Intel and AMD micro-architecture generations, to control the estimation error.

We have implemented SmartWatts as an open source power meter that integrates our self-calibrating approach, which is in charge of automatically adjusting the power model whenever some deviation from the ground truth is detected. When triggered, the computation of a new power model aggregates the past performance metrics from all the deployed containers to infer a more accurate power model and to seamlessly update the software-defined power meter configuration, without any interruption. The deployment of SmartWatts in various environments, ranging from private clouds to distributed HPC clusters, demonstrates that SmartWatts can ensure accurate real-time power estimations (less than 3.5% of error on average, at a frequency of 2 Hz) at the granularity of processes, containers and virtual machines. Interestingly, the introduction of sequential learning in software-defined power meters eliminates the learning phase, which usually lasts from minutes to hours or days, depending on the complexity of the hosting infrastructure [3], [5].

Additionally, our software-defined approach does not require any specific hardware investment, as SmartWatts can build upon embedded power sensors, like RAPL, whenever they are available. The code of SmartWatts is made available online as open-source software1 to encourage its deployment at scale and to foster the adoption and reproduction of our results. The key contributions of this paper can therefore be summarized as follows:

1) a self-calibrating power modelling approach,
2) CPU & DRAM models supporting power states,
3) an open source implementation of our approach,
4) an assessment on container-based environments.

1 https://github.com/powerapi-ng/smartwatts-formula

In the remainder of this paper, we start by providing some background on state-of-the-art power models and their limitations (cf. Section II) prior to introducing our contribution (cf. Section III). Then, we detail the implementation of SmartWatts as an extension of the BitWatts middleware framework (cf. Section IV) and we assess its validity on three scenarios (cf. Section V). We conclude and provide some perspectives for this work in Section VI.

II. RELATED WORK

A. Hardware Power Meters

Over the years, hardware power meters have evolved to deliver hardware-level power measurements with different levels of granularity, from physical machines to electronic components.

The WattProf [6] power monitoring platform supports the profiling of High Performance Computing (HPC) applications.

This solution is based on a custom board, which can collect raw power measurements from various hardware components (CPU, disk, memory, etc.) from sensors connected to power lines. The board can connect up to 128 sensors that can be sampled at up to 12 kHz. As in [7], the authors argue that this solution is able to perform per-process power estimation, but they only validate their approach while running a single application.

WattWatcher [4] is a tool that can characterize the power consumption of workloads. The authors use several calibration phases to build a power model that fits a CPU architecture.

This power model uses a predefined set of Hardware Performance Counters (HWPC) as input parameters. The authors use a special power model generator that can target any CPU architecture, provided the architecture has been carefully described.

RAPL [8] offers specific hardware performance counters (HWPC) to report on the energy consumption of the CPU since the "Sandy Bridge" micro-architecture for Intel (2011) and "Zen" for AMD (2017). Intel divides the system into domains (PP0, PP1, PKG, DRAM) that report the energy consumption according to the requested context. The PP0 domain represents the core activity of the processor (cores + L1 + L2 + L3), the PP1 domain the uncore activities (LLC, integrated graphics card, etc.), PKG represents the sum of PP0 and PP1, and the DRAM domain exhibits the DRAM energy consumption.

Desrochers et al. demonstrate the accuracy of the DRAM power estimations of RAPL, especially on Intel Xeon processors [9].
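As a side note, RAPL reports cumulative energy counters rather than instantaneous power, so a power sample is typically derived from two consecutive energy readings. The sketch below illustrates this derivation; it is not part of the paper's artifact, and the sysfs path and the maximum counter value are illustrative (on Linux, the powercap interface exposes such counters, e.g. in /sys/class/powercap/intel-rapl:0/energy_uj).

```python
# Minimal sketch (not from the paper's code): deriving an average power
# sample from two consecutive cumulative RAPL energy readings, expressed
# in microjoules, while handling counter wraparound.

def rapl_power_watts(energy_uj_t0, energy_uj_t1, interval_s, max_energy_uj):
    """Average power (Watts) over the interval, handling wraparound."""
    delta = energy_uj_t1 - energy_uj_t0
    if delta < 0:  # the cumulative counter wrapped around its maximum
        delta += max_energy_uj
    return (delta / 1e6) / interval_s  # microjoules -> joules -> watts

# Example: 30 J consumed over a 0.5 s sampling period (2 Hz) -> 60.0 W
print(rapl_power_watts(5_000_000, 35_000_000, 0.5, 262_143_328_850))
```

The wraparound handling matters in practice because the energy counters are bounded and overflow after a machine-dependent amount of energy has been consumed.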

B. Software-Defined Power Meters

To get rid of the hardware cost imposed by the above solutions, the design of power models has been regularly considered by the research community over the last decade, in particular for CPU [5], [10]–[13]. Notably, as most architectures do not provide fine-grained power measurement capabilities, McCullough et al. [12] argue that power models are the first step towards enabling dynamic power management for power proportionality at all levels of a system.

While standard operating system metrics (CPU, memory, disk, or network), directly computed by the kernel, tend to exhibit a large error rate due to their lack of precision [11], [13], HWPC events can be directly gathered from the processor (e.g., number of retired instructions, cache misses, non-halted cycles). Modern processors provide a variable number of HWPC events, depending on the generation of the micro-architecture and the model of the CPU. As shown by Bellosa [10] and Bircher [14], some HWPC events are highly correlated with the processor power consumption, while the authors of [15] concluded that not all HWPC events are relevant, as they may not be directly correlated with dynamic power.

Power modelling often builds on these raw metrics to apply learning techniques [16] that correlate the metrics with hardware power measurements using various regression models, which are so far mostly linear [12]. Three key components are commonly considered to train a power model: a) the workload(s) to run during sampling, b) the minimal set of input parameters, and c) the class of regression to use [16]–[19].

The workloads used along the training phase have to be carefully selected to capture the targeted system. In this domain, many benchmarks have been considered, but they are mostly a) designed for a given architecture [16], [20], b) manually selected [5], [17]–[19], [21]–[24], or even c) private [17].

Unfortunately, this often leads to the design of power models that are tailored to a given processor architecture and manually tuned (for a limited set of power-aware features) [16], [17], [20], [21], [23]–[26].

C. Limitations & Opportunities

To the best of our knowledge, the state of the art in hardware power meters often imposes hardware investments to provide power measurements with a high accuracy, but a coarse granularity, while software-defined power meters target fine-grained power monitoring, but often fail to reach high accuracy on any architecture and/or workload.

This paper clearly differs from the state of the art by providing an open source, modular, and self-adaptive implementation of a self-calibrating software-defined power meter: SmartWatts. As far as we know, our implementation is the first to deliver both CPU and DRAM power estimations at runtime for any software packaged as processes, containers or virtual machines. Unlike existing approaches published in the literature, the approach we describe is i) architecture agnostic, ii) processor aware, and iii) dynamic. So far, the state of the art fails to deploy software-defined power meters in production because i) the model learning phase can last from minutes to days, ii) the power models are often bound to a specific context of execution that does not take into account hardware energy-optimization states, and iii) the reference power measurements require specific hardware to be installed on a large number of nodes. This therefore calls for methods that can automatically adapt to the hardware and workload diversities of heterogeneous environments in order to maintain the accuracy of power measurements at scale.

Fig. 1. Overview of SmartWatts

III. SMARTWATTS POWER MONITORING

We therefore propose to support self-calibrating power models that leverage Reference Measurements and Hardware Performance Counters (HWPC) to estimate the power consumption at the granularity of software containers along multiple resources: CPU and DRAM. More specifically, our contribution builds upon two widely available system interfaces: RAPL to collect baseline measurements for CPU and DRAM power consumptions, as well as Linux's perf events interface to capture the HWPC events used to estimate the per-container power consumption from resource-specific power models, which are adjusted at runtime.

A. Overview of SmartWatts

Figure 1 introduces the general architecture of SmartWatts. SmartWatts manages at runtime a set of self-calibrated power models ($M^f_{res}$) for each power-monitorable resource res (e.g., CPU, DRAM). These power models are then used by SmartWatts to estimate the power consumptions of i) the host, $\hat{p}_{res}$, and ii) all the hosted containers c: $\hat{p}_{res}(c)$.

SmartWatts uses $\hat{p}_{res}$ to continuously assess the accuracy of the managed power models ($M^f_{res}$) and to ensure that the estimated power consumption does not diverge from the baseline measurements reported by RAPL ($p^{rapl}_{res}$, cf. Section III-B). Whenever the estimation error ($\varepsilon_{res}$) diverges from the baseline measurements beyond a configured threshold, SmartWatts automatically triggers a new online calibration process of the diverging power model to better match the current input workload.

To better capture the dynamic power consumption of the host, SmartWatts needs to isolate its static consumption. To do so, we use a dedicated component that activates when the machine is at rest, e.g., after booting (cf. Section III-C), to monitor the power activity of the host. In addition to the static constant, SmartWatts estimates the power consumption of the host from a set of raw input values that refer to HWPC events, which are selected at runtime (cf. Section III-E).

This design ensures that SmartWatts keeps adjusting its power models to maximize the accuracy of power estimations. Therefore, unlike state-of-the-art power monitoring solutions, SmartWatts does not suffer from estimation errors due to the adoption of an inappropriate power model, as it autonomously optimizes the underlying power model whenever an accuracy anomaly is detected.

B. Modelling the Host Power Consumption

For each resource $res \in \{pkg, dram\}$ exposed by the RAPL interface, the associated power consumption $p^{rapl}_{res}$ can be modelled as:

$p^{rapl}_{res} = p^{static}_{res} + p^{dyn}_{res}$ (1)

where $p^{static}_{res}$ refers to the static power consumption of the monitored resource (cf. Section III-C), and $p^{dyn}_{res}$ reflects the dynamic power dissipated by the processor along the sampling period.

Then, we can compute a power model $M^f_{res} = [\beta_0, \ldots, \beta_n]$ that correlates, for a given frequency f (among available frequencies F, cf. Section III-D), the dynamic power consumption ($\hat{p}^{dyn}_{res}$) to the raw metrics reported by a set of Hardware Performance Counter (HWPC) events (cf. Section III-E), $E^f_{res} = [e_0, \ldots, e_n]$:

$\exists f \in F, \hat{p}^{dyn}_{res} = M^f_{res} \cdot E^f_{res}$ (2)

We build $M^f_{res}$ from a Ridge regression (a linear least squares regression with l2 regularization) applied on the past k samples $S^f_k = \langle p^{dyn}_{res}, E^f_{res} \rangle$, with $p^{dyn}_{res} = p^{rapl}_{res} - p^{static}_{res}$. By comparing $\hat{p}^{dyn}_{res} + p^{static}_{res}$ with $p^{rapl}_{res}$, we can continuously estimate the error $\varepsilon_{res} = |p^{dyn}_{res} - \hat{p}^{dyn}_{res}|$ in order to monitor the accuracy of the power model $M^f_{res}$. Whenever the error exceeds a given threshold set by the administrator, a new power model is generated for the frequency f by integrating the latest samples.
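The calibration step above can be sketched as follows. The actual implementation relies on Scikit-Learn's Ridge estimator; this dependency-free two-feature version with a closed-form normal-equation solve, and the synthetic event counts it is fed, are illustrative only.

```python
# Minimal sketch of the calibration step: a ridge regression (least squares
# with l2 regularization) mapping HWPC event counts to the dynamic power
# p_dyn = p_rapl - p_static. Two features only, solved in closed form.

def fit_ridge_2d(samples, alpha=1.0):
    """samples: list of ((e0, e1), p_dyn) tuples -> model [b0, b1]."""
    # Accumulate the normal equations (X^T X + alpha*I) b = X^T y.
    a00 = a01 = a11 = y0 = y1 = 0.0
    for (e0, e1), p in samples:
        a00 += e0 * e0; a01 += e0 * e1; a11 += e1 * e1
        y0 += e0 * p;   y1 += e1 * p
    a00 += alpha; a11 += alpha
    det = a00 * a11 - a01 * a01           # 2x2 solve by Cramer's rule
    return [(y0 * a11 - y1 * a01) / det, (a00 * y1 - a01 * y0) / det]

def predict(model, events):
    # p_dyn_hat = M . E (Equation 2)
    return sum(b * e for b, e in zip(model, events))

# Synthetic samples drawn from p_dyn = 2e-8*cycles + 5e-7*llc_misses:
samples = [((c, m), 2e-8 * c + 5e-7 * m)
           for c, m in [(1e9, 1e6), (2e9, 3e6), (5e8, 2e6), (3e9, 1e7)]]
model = fit_ridge_2d(samples, alpha=1e-3)
print(predict(model, (2e9, 5e6)))  # close to 42.5 W
```

The l2 penalty is what makes the fit robust when HWPC events are strongly correlated with one another, which is common in practice.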

C. Isolating the Static Power Consumption

Isolating the static power consumption of a node is a challenging issue, as it requires reaching a quiescent state in order to capture the power consumption of the host at rest. To capture this information, we designed and implemented a power logger component that runs as a lightweight daemon with low priority and periodically logs the package and DRAM power consumptions reported by RAPL. Then, we compute the median value and the interquartile range (IQR) from the gathered measurements to define the $p^{static}_{res}$ constant as: $p^{static}_{res} = median_{res} - 1.5 \times IQR_{res}$. This approach intends to filter out outliers reported by RAPL, including periodic measurement errors we observed, and to consider the lowest power consumption observed along a given period of time.
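The rule above can be sketched in a few lines. The helper names and the idle samples are illustrative, and the quantile interpolation scheme is one reasonable choice among several; the paper only specifies the median − 1.5 × IQR rule itself.

```python
# Minimal sketch of the static-power isolation rule: from idle RAPL power
# samples, keep p_static = median - 1.5 * IQR to filter out upward outliers.

def static_power(idle_samples_w):
    s = sorted(idle_samples_w)
    n = len(s)
    def quantile(q):
        # Linear interpolation between closest ranks.
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])
    median = quantile(0.5)
    iqr = quantile(0.75) - quantile(0.25)
    return median - 1.5 * iqr

# Idle PKG samples (Watts), including a spurious spike reported by RAPL:
print(static_power([10.1, 10.0, 10.2, 10.1, 14.0]))  # about 9.95 W
```

Note how the 14 W spike influences neither the median nor the IQR, so the resulting static constant stays close to the true idle floor.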

By default, SmartWatts assumes that the static consumption of the host does not need to be spread across the active containers. However, other power accounting policies can be implemented. For example, by reporting an empty static consumption, SmartWatts will share it across the running containers depending on their activity.

D. Monitoring Power States & HWPC Events

As previously introduced, the accuracy of a power model $M^f_{res}$ strongly depends on i) the selection of relevant input features (HWPC events $e_n$) and ii) the acquisition of input values that are evenly distributed along the reference power consumption range. This is one of the reasons why the input workloads used in standard calibration phases are often critical to capture an accurate power model that reflects the power consumption of a host for a given class of applications.

SmartWatts rather promotes a self-calibrating approach that does not impose the choice of a specific benchmark or workload, but exploits the ongoing activity variations of the host machine to continuously adjust its power models. To achieve this, SmartWatts monitors selected sets of HWPC events and stores the associated samples in memory. To better deal with the power features of hardware components, we group the input samples per operating frequency. This allows us to calibrate frequency-specific power models whenever an estimation error arises, with the goal of converging automatically to a stable and precise power model over time.

By balancing the samples along the range of frequencies operated by the processor, SmartWatts ensures that the power model learning phase does not overfit the current context of execution, which may lead to the generation of unstable power models, thus impacting the accuracy of the power measurements. The sampling tuples $S^f_k$ are grouped in memory as frequency layers $L^f_{res} = [S^f_0, \ldots, S^f_n]$, which are the raw features we maintain to build $M^f_{res}$.

To store the samples in the layer corresponding to the current frequency of the processor, SmartWatts computes the average running frequency as follows:

$F_{avg} = F_{base} \times \frac{APERF}{MPERF}$ (3)

where $F_{base}$ is the processor base frequency constant extracted from the Model Specific Register (MSR) PLATFORM_INFO. APERF and MPERF are MSR-based counters that increment at the current and maximum frequencies, respectively. These counters are continuously updated, hence they report on a precise average frequency without consuming the limited HWPC slots. Interestingly, the performance power states, such as P-states and Turbo Boost, will be accounted for by these counters, as they act mainly on the frequency of the core in order to boost the performance. The idle optimization states (C-states) will also be accounted for, as they mainly reduce the average frequency of the core towards its Max Efficiency Frequency before it is powered down.
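In practice this computation operates on counter deltas taken over a sampling period, since APERF and MPERF are cumulative. The sketch below illustrates the arithmetic; the counter values are made up, and reading the actual MSRs (e.g. via /dev/cpu/*/msr on Linux) is left out.

```python
# Minimal sketch of Equation (3): estimating the average running frequency
# over a sampling period from deltas of the APERF/MPERF MSR counters.

def avg_frequency_mhz(f_base_mhz, aperf_delta, mperf_delta):
    # APERF advances at the actual clock, MPERF at the reference clock,
    # so their ratio scales the base frequency up (turbo) or down (idle).
    return f_base_mhz * aperf_delta / mperf_delta

# A core running turbo at ~1.3x its 2100 MHz base frequency:
print(avg_frequency_mhz(2100, 1_300_000, 1_000_000))  # 2730.0 MHz
```

The resulting value is then snapped to the nearest frequency layer so that the sample is stored with the other samples acquired at that operating point.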

E. Selecting the Correlated HWPC Events

The second challenge of SmartWatts consists in selecting at runtime the relevant HWPC events that can be exploited to accurately estimate the power consumption. To do so, we list the available events exposed by the host's Performance Monitoring Units (PMU) and we evaluate their correlation with the power consumption reported by RAPL. Instead of testing each available HWPC event, we narrow the search using the PMU associated with the modelled component, i.e., we consider the HWPC events from the core PMU to model the PKG power consumption. As reference events, we consider unhalted-cycles for the package and llc-misses for the DRAM, which are the standard HWPC events available across many processor architectures and have been widely used by the state of the art to design power models [3]–[5].

To elect an HWPC event as a candidate for the power model, we first compute the Pearson coefficient $r_{e,p}$ for the n values reported by each monitored HWPC event e and the base power consumption p reported by RAPL:

$r_{e,p} = \frac{\sum_{i=1}^{n}(e_i - \bar{e})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n}(e_i - \bar{e})^2}\,\sqrt{\sum_{i=1}^{n}(p_i - \bar{p})^2}}$ (4)

Then, SmartWatts stores the list of HWPC events that exhibit a better correlation coefficient r than the baseline event for DRAM and PKG. This list of elected HWPC events is further used as input features to implement the PKG and DRAM power models exploited by SmartWatts.
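The selection rule can be sketched as follows. The event names and sample values are illustrative only; the comparison against the baseline event (unhalted-cycles for PKG, llc-misses for DRAM) follows the rule stated above.

```python
# Minimal sketch of Equation (4) and the election rule: keep the HWPC
# events whose Pearson correlation with RAPL power beats the baseline event.
from math import sqrt

def pearson(e, p):
    n = len(e)
    me, mp = sum(e) / n, sum(p) / n
    cov = sum((ei - me) * (pi - mp) for ei, pi in zip(e, p))
    norm_e = sqrt(sum((ei - me) ** 2 for ei in e))
    norm_p = sqrt(sum((pi - mp) ** 2 for pi in p))
    return cov / (norm_e * norm_p)

def elect_events(candidates, baseline_name, power):
    r_base = abs(pearson(candidates[baseline_name], power))
    return [name for name, values in candidates.items()
            if abs(pearson(values, power)) > r_base]

power = [10.0, 20.0, 30.0, 40.0]                      # RAPL samples (W)
candidates = {
    "unhalted-cycles": [1.0, 2.1, 2.9, 4.2],          # baseline event
    "instructions-retired": [1.0, 2.0, 3.0, 4.0],     # strongly correlated
    "branch-misses": [5.0, 1.0, 4.0, 2.0],            # poorly correlated
}
print(elect_events(candidates, "unhalted-cycles", power))
```

Here only instructions-retired correlates with the power samples more strongly than the baseline, so it alone is elected as an input feature.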

F. Estimating the Container Power Consumption

Given that we learn the power model $M^f_{res}$ from aggregated events, $E^f_{res} = \sum_{c \in C} E^f_{res}(c)$, we can predict the power consumption of any container c by applying the inferred power model $M^f_{res}$ to the container's events $E^f_{res}(c)$:

$\exists f \in F, \forall c \in C, \hat{p}^{dyn}_{res}(c) = M^f_{res} \cdot E^f_{res}(c)$ (5)

In theory, one can expect that $\hat{p}^{dyn}_{res} = p^{dyn}_{res}$ if the model perfectly estimates the dynamic power consumption but, in practice, the predicted value may introduce an error $\varepsilon_{res} = |p^{dyn}_{res} - \hat{p}^{dyn}_{res}|$. Therefore, we cap the power consumption of any container c as:

$\forall c \in C, \lceil \hat{p}^{dyn}_{res}(c) \rceil = p^{dyn}_{res} \times \frac{\hat{p}^{dyn}_{res}(c)}{\hat{p}^{dyn}_{res}}$ (6)

to ensure that $p^{dyn}_{res} = \sum_{c \in C} \lceil \hat{p}^{dyn}_{res}(c) \rceil$, thus avoiding potential outliers. Thanks to this approach, we can also report on the confidence interval of the power consumption of containers by scaling down the observed global error:

$\forall c \in C, \varepsilon_{res}(c) = \frac{\hat{p}^{dyn}_{res}(c)}{\hat{p}^{dyn}_{res}} \times \varepsilon_{res}$ (7)

In the following sections, we derive and implement the above formulae to report on the power consumption of pkg and dram resources. Our empirical evaluations report on the capped power consumptions for pkg ($\lceil \hat{p}^{dyn}_{pkg} \rceil$) and dram ($\lceil \hat{p}^{dyn}_{dram} \rceil$), as well as the associated errors $\varepsilon_{pkg}$ and $\varepsilon_{dram}$, respectively.
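The capping and error-scaling rules can be sketched together. The container names and power values are illustrative; the two expressions follow Equations (6) and (7).

```python
# Minimal sketch of Equations (5)-(7): per-container estimates are scaled so
# that the capped values sum (up to rounding) to the measured dynamic power,
# and the global estimation error is spread proportionally.

def attribute_power(p_dyn, estimates, error):
    """estimates: {container: p_dyn_hat(c)}; returns ({capped}, {errors})."""
    total_hat = sum(estimates.values())
    capped = {c: p_dyn * e / total_hat for c, e in estimates.items()}    # Eq. 6
    errors = {c: (e / total_hat) * error for c, e in estimates.items()}  # Eq. 7
    return capped, errors

# The model slightly over-estimates (42 W predicted vs. 40 W from RAPL):
capped, errors = attribute_power(
    p_dyn=40.0,
    estimates={"web": 21.0, "db": 14.0, "batch": 7.0},
    error=2.0)
print(capped)  # each estimate scaled down; the sum matches the 40 W budget
```

Because every container is scaled by the same factor, the relative ordering of the per-container estimates is preserved while the global budget reported by RAPL is respected.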

Fig. 2. Deployment of SmartWatts

IV. IMPLEMENTATION OF SMARTWATTS

We implemented SmartWatts as a modular software system that can run atop a wide diversity of production environments. As depicted in Figure 2, our open source implementation of SmartWatts mostly relies on two software components, a sensor and a power meter, which are connected through a MongoDB database.2 MongoDB offers a flexible and persistent buffer to store input metrics and power estimations. The sensor is designed as a lightweight process that is intended to run on target nodes with a limited impact.

The power meter is a remote service that can be deployed whenever needed. SmartWatts uses this feature to support both online and post mortem power estimations, depending on use cases.

A. Client-side Sensor

The sensor component consists of a lightweight software daemon deployed on all the nodes that need to be monitored.

Static power isolation: When the node boots, the sensor starts the idle consumption isolation phase (cf. Section III-C) by monitoring the PKG and DRAM power consumptions reported by RAPL, along with the global idle CPU time and the fork, exec and exit process control activities provided by the Linux process information pseudo-filesystem (procfs). Whenever a process control activity occurs or the global idle CPU time does not exceed 99% during this phase, the power samples are discarded to prevent background activities from impacting the static power isolation process. As stated in Section III-C, this phase is only required when the idle attribution policy considers the idle consumption as a power leakage. It does not need to be run again as long as there is no change in the hardware configuration of the machine (specifically, CPU or DRAM changes).

Event selection: Once completed, the sensor switches to the event selection phase (cf. Section III-E). To select the Hardware Performance Counters that most accurately estimate the power consumption of a given node, SmartWatts needs to identify the HWPC events statistically correlated with the power consumption of the components. For that, the sensor monitors, over a (configurable) period of 30 ticks, the power consumption reported by RAPL and the maximum number of HWPC events that can be monitored simultaneously without multiplexing, as multiplexing can add significant noise and distort the correlation coefficients of the events. The maximal number of simultaneous HWPC events depends on the micro-architecture of the CPU and is detected at runtime using the PMU detection feature of the libpfm4 library.3 We then correlate the power consumption with the values of the monitored HWPC events and rank them by highest correlation with RAPL and lowest correlation across the other HWPC events.

2 https://www.mongodb.com

Whenever possible, fixed HWPC event counters are selected in priority to avoid consuming a programmable counter.

Control groups: SmartWatts leverages the control groups (Cgroups) implemented by Linux to support a wide range of monitoring granularities, from single processes, to software containers (Docker),4 to virtual machines (using libvirt).5 The sensor also implements a kernel module that is in charge of configuring the Cgroups to monitor the power consumption of kernel and system activities, which is not supported by default. To do so, this module defines two dedicated Cgroups for the roots of the system and kernel process hierarchies.

Event monitoring: Once done with the above preliminary phases, the sensor automatically starts to monitor the selected HWPC events together with RAPL measurements for the DRAM and CPU components at a given frequency, and it reports these samples to the MongoDB backend (cf. Section III-D). The sensor monitors the selected HWPC events for the host and all the Cgroups synchronously to ensure that all the reported samples are consistent when computing the power models.

B. Server-side Power Meter

The power meter is implemented as a software service that needs to be deployed on a single node only (e.g., the master of a cluster). The power meter can be used online to produce real-time power estimations or offline to conduct post mortem analysis. This component consumes the input samples stored in the MongoDB database and produces power estimations accordingly. SmartWatts adopts a modular architecture based on the actor programming model, which we use to integrate a wide range of input/output data storage technologies (MongoDB, InfluxDB, etc.) and to implement power estimations at scale by devoting one actor per power model.

Power modelling: The power meter provides an abstraction to build power models. In this paper, the power model we report on is handled by Scikit-Learn, which is the de facto standard Python library for general-purpose machine learning.6 We embed the Ridge regression of Scikit-Learn in an actor, which is in charge of delivering a power estimation whenever a new sample is fetched from the database.

Model calibration: When the error reported by the power model exceeds the threshold defined by the user, the power meter triggers a new calibration of the power model to take into account the latest samples. This new power model is checked against the last sample to estimate its accuracy. If the resulting estimation error is below the configured threshold, then the actor is updated accordingly.

3 http://perfmon2.sourceforge.net
4 https://docker.com
5 https://libvirt.org
6 https://scikit-learn.org

Power estimation: Power estimations are delivered at the scale of a node and for the Cgroups of interest. The scope of these Cgroups can reflect the activity of the node's kernel and system, as well as any job or service running in the monitored environment. These power estimations can then be aggregated by owner, service identifier or any other key, depending on use cases. They can also be aggregated over time to report on the energy footprint of a given software system.

V. VALIDATION OF SMARTWATTS

This section assesses the efficiency and the accuracy of SmartWatts to evaluate the power consumption of running software containers.

A. Evaluation Methodology

We follow the experimental guidelines reported in [27] to enforce the quality of our results.

Testbeds & workloads: While our production-scale deployments of SmartWatts cover both Kubernetes and OpenStack clusters, for the purpose of this paper, we chose to report on more standard benchmarks, like stress-ng7 and NASA's NAS Parallel Benchmarks (NPB) [28], to highlight the benefits of our approach.

Our setups are reproduced on the Grid'5000 testbed infrastructure,8 which provides multiple clusters composed of powerful nodes. In this evaluation, we use a Dell PowerEdge C6420 server with two Intel Xeon Gold 6130 processors (Skylake) and 192 GB of memory (12 slots of 16 GB DDR4 2666 MT/s RDIMMs). We use the Ubuntu 18.04.3 LTS Linux distribution running the 4.15.0-55-generic kernel, where only a minimal set of daemons is running in the background. As stated in Section IV-A, we use Cgroups to monitor the activity of the running processes independently.

In the case of the system services managed by systemd and the services running in Docker containers, their Cgroups membership is automatically handled as part of their lifetime management.

For this host, the reported TDP is 125 Watts for the CPU and 26 Watts for the DRAM. These values were obtained from the PKG_POWER_INFO and DRAM_POWER_INFO Model Specific Registers (MSR). The energy and performance optimization features of the CPU, i.e., Hardware P-States (HWP), Hyper-Threading (HT), Turbo Boost (TB) and C-states, are fully enabled and use the default configuration of the distribution. The default CPU scaling driver and governor for the distribution are intel_pstate and powersave.

In all our experiments, we configure SmartWatts to report power measurements twice a second (2 Hz) with an error threshold of 5 Watts for the PKG and 1 Watt for the DRAM.

7 https://launchpad.net/stress-ng
8 https://www.grid5000.fr

Objectives: We evaluate SmartWatts with the following criteria:

1) the quality of the power estimations when running sequential and parallel workloads;
2) the accuracy and stability of the power models across different workloads;
3) the overhead of the SmartWatts sensor component on the monitored host.

Reproducibility: For the sake of reproducible research, SmartWatts, the necessary tools, deployment scripts and resulting datasets are open-source and publicly available on GitHub.9

B. Experimental Results

Quality of estimations: Figure 3 first reports on the PKG and DRAM power consumptions we obtained with SmartWatts. The first line (rapl) refers to the ground truth power measurements we sample for the PKG and the DRAM via the HWPC events RAPL_ENERGY_PKG and RAPL_ENERGY_DRAM, respectively. The second line (global) refers to the power measurements estimated by SmartWatts for the PKG and the DRAM components from CPU_CLK_THREAD_UNHALTED:REF_P, CPU_CLK_THREAD_UNHALTED:THREAD_P, INSTRUCTIONS_RETIRED (fixed counters), and LLC_MISSES (programmable counter). This list of events has been automatically selected by the sensor component as presenting the best correlation with RAPL samples, as described in Section III-E. The errors of the power models are further discussed in Figures 5 and 6.

The lines kernel and system isolate the power consumption induced by kernel and system activities, respectively. Kernel activities include device-specific background operations, such as Network Interface Controller (NIC) and disk I/O processing queues, while system activities cover the different services, like the SSH server and the Docker daemon, running on the node.

The remaining lines report on the individual power consumption of a set of NPB benchmarks, which are executed in sequence (lu, ep, ft) or concurrently (ft, cg, ep, lu, mg) with a variable number of cores (ranging from 8 to 32 cores).

One can observe that SmartWatts supports the isolation of power consumption at the process level by leveraging Linux cgroups. This granularity allows SmartWatts to monitor processes, containers or virtual machines alike.
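The cgroup-level isolation can be illustrated by the simplest form of ratio-based attribution; this is a sketch under our own assumptions (SmartWatts' actual formula also accounts for idle power and per-frequency models): the measured package power is split across cgroups proportionally to the HwPC events each cgroup generated.

```python
# Simplified sketch of ratio-based power attribution per cgroup.
def attribute_power(pkg_power_w: float,
                    events_per_cgroup: dict[str, int]) -> dict[str, float]:
    """Split the measured package power across cgroups proportionally
    to the events (e.g. unhalted cycles) each cgroup generated."""
    total = sum(events_per_cgroup.values())
    if total == 0:
        return {name: 0.0 for name in events_per_cgroup}
    return {name: pkg_power_w * count / total
            for name, count in events_per_cgroup.items()}
```

Because cgroups wrap processes, containers and virtual machines alike, the same attribution applies uniformly to all three.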

We also run stress-ng to observe potential side effects on the kernel activity by starting 32 workers that attempt to flood the host with UDP packets to random ports (cf. Figure 4).

While it remains negligible compared to the power consumption of the UDP flood process (2.971 W vs. 120.322 W on average), one can observe that this stress induces a lot of kernel activity to handle I/O, while the rest of the system is not severely impacted.

9 https://github.com/powerapi-ng/smartwatts-formula

Fig. 3. Evolution of the PKG & DRAM power consumption along time and containers

Fig. 4. Illustrating the activity of the kernel when flooding UDP

One can also observe that our sensor induces a negligible overhead (less than 0.2 Watts) with regards to the consumption of surrounding activities.

Estimation accuracy: Figures 5 and 6 report on the distribution of the estimation errors we observed per frequency and globally (right part of the plots) for the above scenario.

We also report on the number of estimations produced for each frequency (upper part of the plots). While the error thresholds for the CPU and DRAM are set to 5 Watts and 1 Watt, one can observe that SmartWatts succeeds in estimating the power consumption with less than 4 Watts and 0.5 Watt of error for the PKG and DRAM components, respectively. The only case where the estimation error grows beyond this threshold refers to the 1000 MHz frequency of the CPU (cf. Figure 5). 1000 MHz is the idle frequency of the node, and the sporadic triggering of activities at this frequency induces a chaotic workload that is more difficult for SmartWatts to capture, given the limited number of samples acquired at this frequency (102 samples against 2868 samples for the 2700 MHz frequency).

The DRAM component, however, exhibits a more straightforward behavior to model with the selected HwPC events and therefore reports an excellent accuracy, no matter the operating frequency of the CPU package (cf. Figure 6).

The accuracy of the power models generated by SmartWatts is further detailed in Table I. While our approach succeeds in delivering accurate estimations of the power consumption of both the CPU and DRAM components, the maximum error refers to the bootstrapping phase of the sensor, which requires acquiring a sufficiently representative number of samples in order to build a stable and accurate power model.

Fig. 5. Global & per-frequency error rate of the PKG power models

Fig. 6. Global & per-frequency error rate of the DRAM power models

TABLE I
PER-SOCKET PKG & DRAM POWER MODELS ACCURACY

Resource  Socket  err_min   err_max    err_mean  err_std
PKG       0       0.000 W   123.888 W  3.337 W   5.071 W
PKG       1       0.002 W   103.893 W  3.278 W   4.459 W
DRAM      0       0.000 W   89.600 W   0.577 W   2.403 W
DRAM      1       0.000 W   39.702 W   0.600 W   1.270 W

Model stability:

Beyond the capability to accurately estimate the power consumption of software containers, we are also interested in assessing the capability of SmartWatts to generate stable power models over time. Tables II and III therefore report, for each frequency, metrics about the stability of the power models. In particular, we look at the number of correct estimations produced by the power models at a given frequency. Given our input workloads, we can observe that SmartWatts succeeds in reusing a given power model for up to 592 estimations, depending on the frequency. While we observed that the stability of our power models strongly depends on the sampling frequency, the error threshold, as well as the input workloads, one should note that calibrating a power model for a given frequency does not take more than a couple of milliseconds, which is perfectly acceptable when monitoring software systems in production.
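The reuse-or-recalibrate policy described above can be sketched as follows. The class and method names are illustrative assumptions, not SmartWatts' actual code: a per-frequency model is kept while its estimates stay within the error threshold, and is refitted from the accumulated samples as soon as an estimate exceeds it.

```python
# Illustrative sketch of the per-frequency reuse-or-recalibrate policy.
class FrequencyLayer:
    def __init__(self, threshold_w: float):
        self.threshold_w = threshold_w
        self.samples: list[tuple[list[float], float]] = []  # (events, rapl)
        self.reuses = 0  # consecutive estimations served by the current model

    def report(self, estimate_w: float, rapl_w: float,
               events: list[float]) -> bool:
        """Record a sample; return True if the current model can be kept,
        False if the caller should refit it from self.samples."""
        self.samples.append((events, rapl_w))
        if abs(estimate_w - rapl_w) <= self.threshold_w:
            self.reuses += 1
            return True
        self.reuses = 0
        return False
```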

TABLE II
PKG POWER MODELS STABILITY PER FREQUENCY

Frequency  models  total  min  max  mean    std
1000       86      102    1    24   1.545   2.899
2400       38      241    1    30   5.738   7.404
2500       63      415    1    50   4.414   6.778
2600       59      623    1    62   5.417   9.881
2700       392     2868   1    592  4.961   26.175
2800       47      1024   1    271  14.840  44.907
2900       21      201    1    107  7.730   20.535
3000       132     613    1    171  4.347   15.030
3100       27      165    1    72   6.600   13.898
3200       43      269    1    126  6.255   19.509
3300       35      319    1    90   8.861   19.585
3400       8       47     1    22   5.875   7.180

TABLE III
DRAM POWER MODELS STABILITY PER FREQUENCY

Frequency  models  total  min  max  mean    std
1000       27      102    2    38   11.333  12.103
2400       17      241    1    44   11.476  11.470
2500       34      421    1    87   6.682   12.068
2600       67      636    1    95   6.913   13.754
2700       280     2880   1    538  9.260   37.811
2800       19      1025   1    349  29.285  83.863
2900       21      190    1    35   7.037   9.146
3000       46      601    1    85   10.732  15.487
3100       20      163    1    48   8.150   12.533
3200       27      261    1    42   9.666   11.187
3300       27      309    1    78   11.444  16.158
3400       11      47     1    10   4.272   3.635

Monitoring overhead:

Regarding the runtime overhead of SmartWatts, one can observe in Figure 4 that the power consumption of SmartWatts is negligible compared to that of the hosted software containers. To estimate this overhead, we leverage the fact that the sensor component is running inside a software container, thus enabling SmartWatts to estimate its own power consumption. In particular, one can note in Table IV that the sensor power consumption represents 1.2 Watts for the PKG and 0.06 Watts for the DRAM, on average, when running at a frequency of 2 Hz. The usage of Hardware Performance Counters (HwPC) is well known for its very low impact on the observed system, hence it does not induce runtime performance penalties [5], [19], [29], [30]. Additionally, we carefully bounded the cost of sampling these HwPC events by executing as few instructions as possible on the monitored nodes.

TABLE IV
PER-COMPONENT POWER CONSUMPTION OF THE SENSOR

Power  min    max       mean     std
PKG    0.0 W  52.078 W  1.241 W  6.559 W
DRAM   0.0 W  29.966 W  0.065 W  0.566 W

Fig. 7. Deployment of Kubernetes IoT backend services across 6 nodes

By proposing a lightweight and packaged software solution that can be easily deployed across monitored hosts, we facilitate the integration of power monitoring in large-scale computing infrastructures. Furthermore, the modular architecture of SmartWatts can accommodate existing monitoring infrastructures, like Kubernetes Metrics or OpenStack Ceilometer, to report on the power consumption of applications. The following section therefore demonstrates this capability by deploying a distributed case study atop a Kubernetes cluster.

C. Tracking the Energy Consumption of Distributed Systems

To further illustrate the capabilities of SmartWatts, we take inspiration from [31] to deploy a distributed software system that processes messages forwarded by IoT devices through a pipeline of processors connected by a Kafka cluster to a Cassandra storage backend. Figure 7 depicts the deployment of this distributed system on a Kubernetes cluster composed of 1 master and 5 slave nodes. The input workload consists of a producer injecting messages into the cluster with a throughput ranging from 10 to 100 MB/s.

Figure 8 reports on the evolution of the power consumption per service while injecting the workload from the master node.

One can observe that, when increasing the message throughput, the most impacted service is the Consumer, which requires extensive energy to process all the messages enqueued by the Kafka service. This saturation of the Consumer service seems to represent a core bottleneck in the application.

To dive further into this problem, we consider another perspective on the deployment in order to investigate the source of this efficiency limitation. While the execution of this workload requires 1.32 MJoules of energy to process the whole dataset, Figure 9 further dives into the distribution of the energy consumption of individual pods along the PKG and DRAM components as a Sankey diagram [32]. This diagram builds on the capability of SmartWatts to aggregate power estimations along time to report on the energy consumption, as well as its capacity to track power consumption from software processes (on the left-hand side) down to hardware components (on the right-hand side). This diagram can therefore be used to better understand how a distributed software system takes advantage of the underlying hardware components to execute a given workload.

Fig. 8. Monitoring of service-level power consumptions

In particular, one can observe that 91% of the energy is spent by the CPU package, while the Consumer service drains 65% of the energy consumption of the monitored scenario. Interestingly, one can observe that this energy consumption is evenly distributed across the 5 slaves, thus fully benefiting from the pod replication support of Kubernetes. The observed energy overhead is not due to the saturation of a single node, but rather seems to be distributed across the nodes, therefore highlighting an issue in the code of the Consumer service. This issue is related to the acknowledgement of write requests by the Cassandra service, which prevents the Consumer service from processing pending messages.
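The aggregation behind such a Sankey diagram can be sketched as follows. This is an illustration under our own assumptions, not the actual tooling: each power sample taken at a fixed period is integrated over time into Joules, and the totals are grouped by (pod, hardware component) to obtain the flows of the diagram.

```python
# Illustrative sketch: integrate per-pod power samples over time to get
# energy in Joules, grouped by hardware component (the Sankey flows).
def energy_flows(samples, period_s: float = 0.5):
    """samples: iterable of (pod, component, power_w) tuples, one per
    sampling period of period_s seconds.
    Returns {(pod, component): energy_joules}."""
    flows: dict[tuple[str, str], float] = {}
    for pod, component, power_w in samples:
        key = (pod, component)
        # Energy (J) = power (W) x duration (s) for each sample.
        flows[key] = flows.get(key, 0.0) + power_w * period_s
    return flows
```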

We believe that, thanks to SmartWatts, system administrators and developers can collaborate on identifying energy hotspots in their deployments and on adjusting the configuration accordingly.

VI. CONCLUSION

Power consumption is a critical concern in modern computing infrastructures, from clusters to data centers. While the state of practice offers tools to monitor the power consumption at a coarse granularity (e.g., nodes, sockets), the literature fails to propose generic power models that can be used to estimate the power consumption of software artefacts.

In this paper, we therefore reported on a novel approach, named SmartWatts, to deliver per-container power estimations for the PKG and DRAM components. In particular, we propose to support self-calibrating power models to estimate the PKG and DRAM power consumption of software containers. Unlike static power models that are trained for a specific workload, our power models leverage sequential learning principles to be adjusted online in order to match unexpected workload evolutions and thus maximize the accuracy of power estimations.
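The sequential-learning principle can be sketched as an online least-squares update. This is an illustration only: SmartWatts actually refits per-frequency linear models from its sample history, and the stochastic-gradient step below is just one simple way to adjust a linear power model toward the RAPL reference, one sample at a time.

```python
# Illustrative online adjustment of a linear power model:
# estimate = weights . events, nudged toward the RAPL reference.
def sgd_update(weights: list[float], events: list[float],
               rapl_w: float, lr: float = 0.01) -> list[float]:
    """One sequential-learning step on a single (events, rapl) sample."""
    estimate = sum(w * x for w, x in zip(weights, events))
    error = estimate - rapl_w
    # Gradient of the squared error with respect to each weight.
    return [w - lr * error * x for w, x in zip(weights, events)]
```

Repeating this update over incoming samples drives the model estimate toward the measured power, which is the essence of online calibration without any a priori training phase.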

While we demonstrate this approach using Intel RAPL and Linux's perf events interface, we strongly believe that it can be used as a solid basis and generalized to other architectures and system components. In particular, we are working on the validation of our approach with the AMD Ryzen architecture (including its support for RAPL).

Fig. 9. Distribution of the energy consumption across nodes and resources

Thanks to SmartWatts, system administrators and developers can monitor the power consumption of individual containers and identify potential optimizations to apply in the distributed systems they manage. Instead of addressing performance issues by adding more resources, we believe that SmartWatts can favorably contribute to increasing the energy efficiency of distributed software systems at large.

ACKNOWLEDGEMENT

The authors would like to thank Joël Penhoat for his insightful feedback on this version of the paper.

REFERENCES

[1] A. Noureddine, R. Rouvoy, and L. Seinturier, "Monitoring energy hotspots in software - Energy profiling of software code," Autom. Softw. Eng., 2015.
[2] D. C. Snowdon, E. L. Sueur, S. M. Petters, and G. Heiser, "Koala: a platform for OS-level power management," in EuroSys. ACM, 2009, pp. 289–302.
[3] M. Colmant, R. Rouvoy, M. Kurpicz, A. Sobe, P. Felber, and L. Seinturier, "The Next 700 CPU Power Models," Journal of Systems and Software, 2018. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0164121218301377
[4] M. LeBeane, J. H. Ryoo, R. Panda, and L. K. John, "WattWatcher: Fine-grained power estimation for emerging workloads," in Computer Architecture and High Performance Computing (SBAC-PAD), 2015 27th International Symposium on, 2015.
[5] M. Colmant, M. Kurpicz, P. Felber, L. Huertas, R. Rouvoy, and A. Sobe, "Process-level Power Estimation in VM-based Systems," in Proceedings of the 10th European Conference on Computer Systems, 2015.
[6] M. Rashti, G. Sabin, D. Vansickle, and B. Norris, "WattProf: A Flexible Platform for Fine-Grained HPC Power Profiling," in 2015 IEEE International Conference on Cluster Computing, 2015.
[7] R. Ge, X. Feng, S. Song, H.-C. Chang, D. Li, and K. Cameron, "PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications," IEEE Transactions on Parallel and Distributed Systems, 2010.
[8] E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan, "Power-Management Architecture of the Intel Microarchitecture Code-Named Sandy Bridge," IEEE Micro, 2012.
[9] S. Desrochers, C. Paradis, and V. M. Weaver, "A validation of DRAM RAPL power measurements," in Proceedings of the Second International Symposium on Memory Systems, MEMSYS 2016, Alexandria, VA, USA, October 3-6, 2016, B. Jacob, Ed. ACM, 2016, pp. 455–470. [Online]. Available: http://doi.acm.org/10.1145/2989081.2989088
[10] F. Bellosa, "The Benefits of Event-Driven Energy Accounting in Power-sensitive Systems," in Proceedings of the 9th Workshop on ACM SIGOPS European Workshop: Beyond the PC: New Challenges for the Operating System, 2000.
[11] A. Kansal, F. Zhao, J. Liu, N. Kothari, and A. A. Bhattacharya, "Virtual Machine Power Metering and Provisioning," in Proceedings of the 1st ACM Symposium on Cloud Computing, 2010.
[12] J. C. McCullough, Y. Agarwal, J. Chandrashekar, S. Kuppuswamy, A. C. Snoeren, and R. K. Gupta, "Evaluating the Effectiveness of Model-based Power Characterization," in Proceedings of the USENIX Annual Technical Conference, 2011.
[13] D. Versick, I. Wassmann, and D. Tavangarian, "Power Consumption Estimation of CPU and Peripheral Components in Virtual Machines," SIGAPP Appl. Comput. Rev., 2013.
[14] W. Bircher and L. John, "Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events," in Proceedings of the IEEE International Symposium on Performance Analysis of Systems Software, ser. ISPASS '07, 2007.
[15] S. Rivoire, P. Ranganathan, and C. Kozyrakis, "A Comparison of High-level Full-system Power Models," in Proceedings of the Conference on Power Aware Computing and Systems, 2008.
[16] R. Bertran, M. Gonzalez, X. Martorell, N. Navarro, and E. Ayguade, "Decomposable and Responsive Power Models for Multicore Processors Using Performance Counters," in Proceedings of the 24th ACM International Conference on Supercomputing, 2010.
[17] Y. Zhai, X. Zhang, S. Eranian, L. Tang, and J. Mars, "HaPPy: Hyperthread-aware Power Profiling Dynamically," in Proceedings of the USENIX Annual Technical Conference, 2014.
[18] R. Zamani and A. Afsahi, "A Study of Hardware Performance Monitoring Counter Selection in Power Modeling of Computing Systems," in Proceedings of the 2012 International Green Computing Conference, 2012.
[19] M. F. Dolz, J. Kunkel, K. Chasapis, and S. Catalán, "An analytical methodology to derive power models based on hardware and software metrics," Computer Science - Research and Development, 2015.
[20] C. Isci and M. Martonosi, "Runtime Power Monitoring in High-End Processors: Methodology and Empirical Data," in Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003.
[21] W. L. Bircher, M. Valluri, J. Law, and L. K. John, "Runtime identification of microprocessor energy saving opportunities," in Proceedings of the International Symposium on Low Power Electronics and Design, 2005.
[22] G. Contreras and M. Martonosi, "Power Prediction for Intel XScale® Processors Using Performance Monitoring Unit Events," in Proceedings of the International Symposium on Low Power Electronics and Design, 2005.
[23] T. Li and L. K. John, "Run-time Modeling and Estimation of Operating System Power Consumption," SIGMETRICS Perform. Eval. Rev., 2003.
[24] H. Yang, Q. Zhao, Z. Luan, and D. Qian, "iMeter: An integrated VM power model based on performance profiling," Future Generation Computer Systems, 2014.
[25] M. Y. Lim, A. Porterfield, and R. Fowler, "SoftPower: Fine-grain Power Estimations Using Performance Counters," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010.
[26] K. Shen, A. Shriraman, S. Dwarkadas, X. Zhang, and Z. Chen, "Power containers: An OS facility for fine-grained power and energy management on multicore servers," in Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '13. New York, NY, USA: ACM, 2013, pp. 65–76. [Online]. Available: http://doi.acm.org/10.1145/2451116.2451124
[27] E. van der Kouwe, D. Andriesse, H. Bos, C. Giuffrida, and G. Heiser, "Benchmarking Crimes: An Emerging Threat in Systems Security," CoRR, vol. abs/1801.02381, 2018.
[28] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, L. Dagum, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, R. S. Schreiber et al., "The NAS parallel benchmarks," International Journal of High Performance Computing Applications, 1991.
[29] M. Kurpicz, A. Orgerie, and A. Sobe, "How much does a VM cost? Energy-proportional accounting in VM-based environments," in 2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), Feb 2016, pp. 651–658.
[30] G. Prekas, M. Primorac, A. Belay, C. Kozyrakis, and E. Bugnion, "Energy Proportionality and Workload Consolidation for Latency-critical Applications," in Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015.
[31] M. Colmant, P. Felber, R. Rouvoy, and L. Seinturier, "WattsKit: Software-Defined Power Monitoring of Distributed Systems," in CCGrid. IEEE Computer Society / ACM, 2017, pp. 514–523.
[32] R. Lupton and J. Allwood, "Hybrid Sankey diagrams: Visual analysis of multidimensional data for understanding resource use," Resources, Conservation and Recycling, vol. 124, pp. 141–151, 2017.
