diff --git a/inefficiency.tex b/inefficiency.tex
index cfa6fe3..597e3ed 100644
--- a/inefficiency.tex
+++ b/inefficiency.tex
@@ -151,14 +151,36 @@ We propose two methods for computing $E_{min}$:
 
 \end{itemize}
 
-We are working towards designing efficient energy prediction models for CPU,
-memory and network components.
-%
-Our models consider cross-component interactions on performance and energy
-consumption.
-%
-In this work we demonstrate how to use inefficiency, deferring predicting and
-optimizing $E_{min}$ to future work.
+%We are working towards designing efficient energy prediction models for CPU,
+%memory and network components.
+%
+%Our models consider cross-component interactions on performance and energy
+%consumption.
+%
+%%%%%%%% MODEL %%%%%%%%%%
+We designed efficient models to predict the performance and energy consumption
+of the CPU and memory at various voltage and frequency settings for a given
+application. We plan to use these models to estimate $E_{min}$ for a given set
+of instructions.
+%We envision a system capable of scaling voltage and frequency of CPU and only
+%frequency of DRAM.
+Our models consider cross-component interactions on performance and energy.
+The performance model uses hardware performance counters to measure the amount
+of time each component is $Busy$ completing work, $Idle$ stalled on the other
+component, and $Waiting$ for more work. We designed a systematic methodology to
+scale these states to estimate the execution time of a given workload at
+different voltage and frequency settings. In our model, the $Idle$ time of one
+component depends on the settings of the other component. The $Busy$ time of
+each component scales with its own frequency; however, the part of the $Busy$
+time that overlaps with the other component is constrained by the slower component.
+
+We combine the predicted performance with the power models of the CPU and
+memory described in Section~\ref{subsec-energy-models} to estimate the energy
+consumption of the CPU and memory. Our model has an average prediction error of
+4\% across the SPEC CPU2006 benchmarks, with a maximum error of 10\% except for
+$gobmk$ (18\%) and $lbm$ (24\%). In this work we demonstrate how to use
+inefficiency, deferring optimization of $E_{min}$ prediction to future work.
+%%%%% END OF MODEL %%%%%%
 
 \subsection{Managing Inefficiency}
 %
@@ -183,4 +205,4 @@ We leave building some of these algorithms into a system as future work.
 %
 In this paper, we characterize the optimal performance point under different
 inefficiency constraints and illustrate that the stability of these points
-have implications for future algorithms.
+has implications for future algorithms.
diff --git a/introduction.tex b/introduction.tex
index 018bfb4..1de7cb4 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -32,7 +32,7 @@ energy constraints.
 Our work represents two advances over previous efforts.
 %
 First, while previous works have explored energy minimizations using DVFS
-under performance constraints focusing on reducing slack, we are the first to
+under performance constraints focusing on reducing slack~\cite{deng2012coscale}, we are the first to
 study the potential DVFS settings under an energy constraint.
 %
 Specifying performance constraints for servers is appropriate, since they are
@@ -86,7 +86,7 @@ management algorithms.
 %
 \end{enumerate}
 
-We use the \texttt{gem5} simulator, the Android smartphone platform and Linux
+We use the \texttt{Gem5} simulator, the Android smartphone platform and Linux
 kernel, and an empirical power model to (1) measure the inefficiency of several
 applications for a wide range of frequency settings, (2) compute performance
 clusters, and (3) study how they evolve.
diff --git a/optimal_performance.tex b/optimal_performance.tex
index 9dfb4c8..ccb7819 100644
--- a/optimal_performance.tex
+++ b/optimal_performance.tex
@@ -61,7 +61,7 @@ simulation noise, the algorithm selects the settings with highest CPU (first)
 and memory frequency as this setting is bound to have highest performance among
 the other possibilities.
 
-Figure~\ref{gobmk-optimal} plots the optimal settings for Gobmk for all
+Figure~\ref{gobmk-optimal} plots the optimal settings for $gobmk$ for all
 benchmark samples (each of length 10 million instructions) across multiple
 inefficiency constraints. At low inefficiencies, the optimal settings follow
 the trends in CPI (cycles per instruction) and MPKI (misses per thousand
diff --git a/paper.tex b/paper.tex
index 5a8df8e..13e3ac9 100644
--- a/paper.tex
+++ b/paper.tex
@@ -46,7 +46,17 @@
 Geoffrey Challen, Mark Hempstead}
 }
 \else
-\author{\IEEEauthorblockN{Paper \thepapernumber}\vspace*{-0.1in}}
+%\author{\IEEEauthorblockN{Paper \thepapernumber}\vspace*{-0.1in}}
+
+\author{%
+  \IEEEauthorblockN{Rizwana Begum, David Werner and Mark Hempstead}
+  \IEEEauthorblockA{Drexel University\\
+  {\rm \tt{\{rb639,daw77,mhempstead\}@drexel.edu}}}
+  \and
+  \IEEEauthorblockN{Guru Prasad and Geoffrey Challen}
+  \IEEEauthorblockA{University at Buffalo\\
+  {\rm \tt \{gurupras,challen\}@buffalo.edu}}
+}
 
 \hypersetup{
   pdfinfo={
diff --git a/performance_clusters.tex b/performance_clusters.tex
index 7b58b5f..98918f3 100644
--- a/performance_clusters.tex
+++ b/performance_clusters.tex
@@ -178,7 +178,7 @@
 constant.
 %Note that
 Our algorithm is not practical for real systems, it knows the characteristics
 of the future samples and their performance clusters in the beginning of a stable
-region.% (and therefore is impractical to implement in real systems).
+region. % (and therefore is impractical to implement in real systems).
 We are currently designing algorithms that are capable of tuning the system while
 running the application as future work. In Section~\ref{sec-algo-implications}, we
diff --git a/system_methodology.tex b/system_methodology.tex
index 77d2adf..cdcc8e6 100644
--- a/system_methodology.tex
+++ b/system_methodology.tex
@@ -71,6 +71,7 @@
 and degrade performance simultaneously.}
 
 \subsection{Energy Models}
+\label{subsec-energy-models}
 We developed energy models for the CPU and DRAM for our studies.
 Gem5 comes with the energy models for various DRAM chipsets. The
 DRAMPower~\cite{drampower-tool} model is integrated into Gem5 and computes the
@@ -79,7 +80,7 @@ Gem5 lacks a model for CPU energy consumption. We developed a processor power
 model based on empirical measurements of a PandaBoard~\cite{pandaboard-url}
 evaluation board. The board includes a OMAP4430~chipset with a Cortex~A9
 processor; this chipset is used in the mobile platform we want to emulate, the
-Samsung Nexus S. We ran microbenchmarks designed to stress the Pandaboard to
+Samsung Nexus S. We ran microbenchmarks designed to stress the PandaBoard to
 its full utilization and measured power consumed using an Agilent~34411A
 multimeter. Because of the limitations of the platform, we could only measure
 peak dynamic power. Therefore to model different voltage levels we scaled it
@@ -119,7 +120,7 @@ purposes, we have configured the CPU clock domain frequency to have a range of
 For the memory system, we simulated a LPDDR3 single channel, one rank memory
 access using an open-page policy.
 Timing and current parameters for LPDDR3 are configured as specified in
-micron data sheet~\cite{micronspec-url}. Memory clock domain is configured with a
+Micron data sheet~\cite{micronspec-url}. Memory clock domain is configured with a
 frequency range of 200MHz to 800MHz. As mentioned earlier, we did not scale
 memory voltage. The power supplies---VDD and VDD2---for LPDDR3 are fixed at
 1.8V and 1.2V respectively.
@@ -142,7 +143,7 @@ finer frequency steps was difficult as it would have resulted in more than
 hours.
 
 We collected samples of a fixed amount of work so that each sample would
-represent the same work even across different frequencies. In gem5, we collectd
+represent the same work even across different frequencies. In Gem5, we collected
 performance and energy consumption data every 10~million user mode
 instructions.
 %this fixed sample of work makes .
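
The scaling rule described in the new model paragraph added to inefficiency.tex can be summarized with the following sketch. The split of $Busy$ time into overlapped and non-overlapped parts, the reference frequencies $f^{ref}$, and the symbol names are illustrative assumptions rather than notation taken from the paper:

\begin{equation*}
T_{pred}(f_c, f_m) \;\approx\;
  \underbrace{B_c^{solo}\,\frac{f_c^{ref}}{f_c} \;+\; B_m^{solo}\,\frac{f_m^{ref}}{f_m}}_{\text{non-overlapped $Busy$ time}}
  \;+\;
  \underbrace{\max\!\left(B_c^{ov}\,\frac{f_c^{ref}}{f_c},\; B_m^{ov}\,\frac{f_m^{ref}}{f_m}\right)}_{\text{overlapped $Busy$ time, bound by the slower component}}
  \;+\; W
\end{equation*}

Here $B_c$ and $B_m$ are the CPU and memory $Busy$ times measured at the reference setting, each scaled by its own frequency ratio; the overlapped portion is limited by whichever component is slower at the chosen setting, which in turn determines the $Idle$ time of the other component; and $W$ is the $Waiting$ time, assumed insensitive to the frequency settings.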