Commit 5ae0e5d5998b8b27833e594d0a83615a3a99552f

Authored by Rizwana Begum
1 parent 0e5dc2a4

more edits

abstract.tex
... ... @@ -2,13 +2,13 @@
2 2  
3 3 Battery lifetime continues to be a top complaint about smartphones. Dynamic
4 4 voltage and frequency scaling (DVFS) has existed for mobile device CPUs for some
5   -time, and provides a tradeoff between energy and performance. DVFS is beginning
6   -to be applied to memory as well to make more energy-performance tradeoffs
7   -possible.
  5 +time, and provides a tradeoff between energy and performance. Dynamic frequency
  6 +scaling is beginning to be applied to memory as well to make more
  7 +energy-performance tradeoffs possible.
8 8  
9   -We present the first characterization of the behavior and optimal frequency
10   -settings of workloads running both under \textit{energy constraints} and on
11   -systems with \textit{both} CPU and memory DVFS, an environment representative
  9 +We present the first characterization of the behavior of the optimal frequency
  10 +settings of workloads running both under \textit{energy constraints} and on
  11 +systems capable of CPU DVFS and memory DFS, an environment representative
12 12 of next-generation mobile devices. Our results show that continuously using
13 13 the optimal frequency settings results in a large number of frequency
14 14 transitions which end up hurting performance. However, by permitting a small
... ...
inefficiency.tex
... ... @@ -7,7 +7,7 @@ management algorithms for mobile systems should optimize performance under
7 7 \textit{energy constraints}.
8 8 %
9 9 While several researchers have proposed algorithms that work under energy
10   -constraints, these approaches require that the constraints are expressed in
  10 +constraints, these approaches require that the constraints be expressed in
11 11 terms of absolute energy~\cite{mobiheld09-cinder,ecosystem}.
12 12 %
13 13 For example, rate-limiting approaches take the maximum energy that can be
... ... @@ -24,7 +24,7 @@ Energy consumption varies across applications, devices, and operating
24 24 conditions, making it impractical to choose an absolute energy budget.
25 25 %
26 26 Also, applying absolute energy constraints may slow down applications to the
27   -point that total energy consumption \textit{increases} and
  27 +point where total energy consumption \textit{increases} and
28 28 performance is degraded.
29 29  
30 30 Other metrics that incorporate energy take the form of $Energy * Delay^n$.
... ... @@ -34,7 +34,7 @@ We argue that while the energy-delay product can be used as a
34 34 \textit{constraint} to specify how much energy can be used to improve
35 35 performance.
36 36 %
37   -A effective constraint should be (1) relative to the applications inherent
  37 +An effective constraint should be (1) relative to the application's inherent
38 38 energy needs and (2) independent of applications and devices.
39 39 %
40 40 Because it uses absolute energy, the energy-delay product meets neither of
... ... @@ -57,7 +57,7 @@ inefficiency: $I = \frac{E}{E_{min}}$.
57 57 %
58 58 An \textit{inefficiency} of $1$ represents an application's most efficient
59 59 execution, while $1.5$ indicates that the application consumed $50\%$ more
60   -energy that its most efficient execution.
  60 +energy than its most efficient execution.
61 61 %
62 62 Inefficiency is independent of workloads and devices and avoids the problems
63 63 inherent to absolute energy constraints.
... ... @@ -143,7 +143,7 @@ We propose two methods for computing $E_{min}$:
143 143  
144 144 \item \textbf{Predicting and learning:} The overhead of the $E_{min}$ computation
145 145 can be further reduced by predicting $E_{min}$ based on previous observations
146   - and learning continuously.
  146 + and by continuous learning.
147 147 %
148 148 A variety of learning based approaches~\cite{li2009machine} have
149 149 been proposed in the past to estimate various metrics and application phases
... ...
inefficiency_speedup.tex
... ... @@ -7,7 +7,7 @@ the past
7 7 %researchers have used it
8 8 to make power performance trade-offs. To the best of our knowledge, prior
9 9 work has not studied the system level energy-performance trade-offs of combined
10   -CPU and memory DVFS.
  10 +CPU and memory frequency scaling.
11 11 %considering the interaction between CPU and memory
12 12 %frequency scaling.
13 13 We take a first step and explore these trade-offs and show that incorrect
... ... @@ -71,8 +71,8 @@ inefficiency budget as needed c) and deliver the best performance.
71 71 %\end{enumerate}
72 72  
73 73 Consequently, like other constraints used by algorithms such as performance, power and absolute energy, inefficiency
74   -also allows energy management algorithms to waste system energy. We suggest
75   -that, even though inefficiency doesn't completely eliminate the problem of
  74 +also allows energy management algorithms to waste system energy. We argue
  75 +that, although inefficiency doesn't completely eliminate the problem of
76 76 wasting energy, it mitigates the problem. For example, rate limiting approaches
77 77 waste energy because the energy budget is specified for a given time interval
78 78 and does not require a specific amount of work to be done within that budget.
... ...
introduction.tex
... ... @@ -30,14 +30,14 @@ To better understand these systems, we characterize how the most performant
30 30 CPU and memory frequency settings change for multiple workloads under various
31 31 energy constraints.
32 32  
33   -Our work represents two advances over previous efforts.
  33 +Our work presents two advances over previous efforts.
34 34 %
35 35 First, while previous works have explored energy minimizations using DVFS
36 36 under performance constraints focusing on reducing slack, we are the first to
37 37 study the potential DVFS settings under an energy constraint.
38 38 %
39 39 Specifying performance constraints for servers is appropriate, since they are
40   -both wall-powered and have terms of service that must be met.
  40 +both wall-powered and have quality of service constraints that must be met.
41 41 %
42 42 Therefore, they do not have to and cannot afford to sacrifice too much
43 43 performance.
... ... @@ -53,7 +53,7 @@ energy constraints and it is both application and device independent---unlike
53 53 existing metrics.
54 54  
55 55 Second, we are the first to characterize optimal frequency settings for
56   -systems providing both CPU and memory DVFS.
  56 +systems providing CPU DVFS and memory DFS.
57 57 %
58 58 We find that closely tracking the optimal settings during execution produces
59 59 many transitions and large frequency transition overhead.
... ... @@ -65,7 +65,7 @@ We characterize the relationship between the amount of performance loss and
65 65 the rate of tuning for several benchmarks, and introduce the concepts of
66 66 \textit{performance clusters} and \textit{stable regions} to aid the process.
67 67  
68   -We make following four contributions:
  68 +We make the following contributions:
69 69 %
70 70 \begin{enumerate}
71 71 %
... ... @@ -74,7 +74,7 @@ system to express the amount of extra energy that can be used to improve
74 74 performance.
75 75 %
76 76 \item We study the energy-performance trade-offs of systems that are capable
77   -of both CPU and memory DVFS for multiple applications. We show that poor
  77 +of CPU DVFS and memory DFS for multiple applications. We show that poor
78 78 frequency selection can hurt both performance and energy consumption.
79 79 %
80 80 \item We characterize the optimal frequency settings for multiple
... ... @@ -87,7 +87,7 @@ management algorithms.
87 87 %
88 88 \end{enumerate}
89 89  
90   -We use the \texttt{gem5} simulator, the Android smartphone platform and Linux
  90 +We use the \texttt{Gem5} simulator, the Android smartphone platform and Linux
91 91 kernel, and an empirical power model to (1) measure the inefficiency of
92 92 several applications for a wide range of frequency settings, (2) compute
93 93 performance clusters, and (3) study how performance clusters evolve.
... ... @@ -112,4 +112,4 @@ studies their characteristics.
112 112 %
113 113 Section~\ref{sec-algo-implications} presents implications of
114 114 using performance clusters on energy-management algorithms, and
115   -Section~\ref{sec-conclusions} concludes.
  115 +Section~\ref{sec-conclusions} summarizes and concludes the paper.
... ...
optimal_performance.tex
... ... @@ -5,7 +5,7 @@
5 5 \centering
6 6 \includegraphics[width=\columnwidth]{figures/plots/496/2d_best_point_variation_mulineff/gobmk_2d_stable_point_mulineff_cpi_mpki.pdf}
7 7 \vspace{-0.5em}
8   -\caption{\textbf{Optimal Performance Point for \text{Gobmk} Across Inefficiencies:} At
  8 +\caption{\textbf{Optimal Performance Point for \textit{gobmk} Across Inefficiencies:} At
9 9 low inefficiency budgets, the optimal frequency settings follow CPI of the
10 10 application, and select high memory frequencies for memory intensive phases. % with
11 11 %high CPI.
... ... @@ -36,7 +36,7 @@ inefficiency budget is a function of workload.}
36 36 \end{subfigure}%
37 37 \vspace{0.5em}
38 38 \caption{\textbf{Performance Clusters of \textit{milc.}}
39   -\textit{Milc} is CPU intensive to a large extent with some memory intensive
  39 +\textit{milc} is CPU intensive to a large extent with some memory intensive
40 40 phases. At higher thresholds, while CPU frequency is tightly bound, performance
41 41 clusters cover a wide range of memory settings due to small performance
42 42 difference across these frequencies. }
... ... @@ -61,7 +61,7 @@ simulation noise, the algorithm selects the settings with highest CPU (first)
61 61 and then memory frequency as this setting is bound to have highest performance among
62 62 the other possibilities.
63 63  
64   -Figure~\ref{gobmk-optimal} plots the optimal settings for $gobmk$ for all
  64 +Figure~\ref{gobmk-optimal} plots the optimal settings for \textit{gobmk} for all
65 65 benchmark samples (each of length 10~M instructions) across multiple
66 66 inefficiency constraints. At low inefficiencies, the optimal settings follow
67 67 the trends in CPI (cycles per instruction) and MPKI (misses per thousand
... ...
performance_clusters.tex
... ... @@ -4,7 +4,7 @@
4 4 \centering
5 5 \includegraphics[width=\columnwidth]{./figures/plots/496/stable_line_plots/lbm_stable_lineplot_annotated_5.pdf}
6 6 \vspace{-0.5em}
7   -\caption{\textbf{Stable Regions and Transitions for \textit{Lbm} with
  7 +\caption{\textbf{Stable Regions and Transitions for \textit{lbm} with
8 8 Threshold of 5\% and Inefficiency Budget of 1.3:} Solid lines represent the
9 9 stable regions and vertical dashed lines mark the transitions made by
10 10 \textit{lbm}.}
... ...
system_methodology.tex
... ... @@ -25,7 +25,7 @@ performance for energy savings.
25 25 %voltage could result in data corruption since the memory array itself is
26 26 %asynchronous.
27 27 As no current hardware systems support memory frequency scaling,
28   -we resort to Gem5~\cite{Binkert:gem5}, a cycle-accurate full system simulator
  28 +we resort to \texttt{Gem5}~\cite{Binkert:gem5}, a cycle-accurate full system simulator
29 29 %as a platform
30 30 to perform our studies.
31 31  
... ... @@ -34,21 +34,21 @@ to perform our studies.
34 34 \centering
35 35 \includegraphics[width=0.75\columnwidth]{./figures/plots/systemBlockDiagram.pdf}
36 36 \caption{\textbf{System Block Diagram}: Blocks that are newly added or
37   - significantly modified from Gem5 origin implementation are shaded.}
  37 + significantly modified from \texttt{Gem5}'s original implementation are shaded.}
38 38 \label{fig-system-block-diag}
39 39 \end{figure}
40 40  
41 41 %We envision a system that consists of a CPU capable of tuning its voltage and
42 42 %frequency and memory that supports frequency scaling.
43   -Current Gem5 versions provide the infrastructure necessary to change CPU
44   -frequency and voltage; we extended Gem5 DVFS to incorporate memory frequency
45   -scaling. As shown in Figure~\ref{fig-system-block-diag}, Gem5 provides a DVFS
  43 +Current \texttt{Gem5} versions provide the infrastructure necessary to change CPU
  44 +frequency and voltage; we extended \texttt{Gem5} DVFS to incorporate memory frequency
  45 +scaling. As shown in Figure~\ref{fig-system-block-diag}, \texttt{Gem5} provides a DVFS
46 46 controller device that provides an interface for the OS to control frequency at
47 47 runtime. We developed a memory frequency governor similar to existing Linux CPU
48 48 frequency governors. Timing and current parameters of DRAM are scaled with its
49 49 frequency as described in the technical note from Micron~\cite{micronpower-TN-url}.
50 50 %that are capable of tuning memory frequency at runtime.
51   -The blocks that we added or significantly modified from Gem5's original
  51 +The blocks that we added or significantly modified from \texttt{Gem5}'s original
52 52 implementation are shaded in Figure~\ref{fig-system-block-diag}.
53 53  
54 54 \begin{figure*}[t]
... ... @@ -75,15 +75,15 @@ and degrade performance simultaneously.}
75 75  
76 76 \subsection{Energy Models}
77 77 \label{subsec-energy-models}
78   -We developed energy models for the CPU and DRAM for our studies. Gem5 comes
  78 +We developed energy models for the CPU and DRAM for our studies. \texttt{Gem5} comes
79 79 with the energy models for various DRAM chipsets. The
80   -DRAMPower~\cite{drampower-tool} model is integrated into Gem5 and computes the
  80 +DRAMPower~\cite{drampower-tool} model is integrated into \texttt{Gem5} and computes the
81 81 memory energy consumption periodically during the benchmark execution. However,
82   -Gem5 lacks a model for CPU energy consumption. We developed a processor power
  82 +\texttt{Gem5} lacks a model for CPU energy consumption. We developed a processor power
83 83 model based on empirical measurements of a PandaBoard~\cite{pandaboard-url}
84 84 evaluation board. The board includes a OMAP4430~chipset with a Cortex~A9
85 85 processor; this chipset is used in the mobile platform we want to emulate, the
86   -Samsung Nexus S. We ran microbenchmarks designed to stress the PandaBoard to
  86 +Galaxy Nexus S. We ran microbenchmarks designed to stress the PandaBoard to
87 87 its full utilization and measured power consumed using an Agilent~34411A
88 88 multimeter. Because of the limitations of the platform, we could only measure
89 89 peak dynamic power. Therefore, to model different voltage levels we scaled it
... ... @@ -97,7 +97,7 @@ processor is not computing, but unlike leakage power, background power scales
97 97 with clock frequency. We measure background power by calculating the
98 98 difference between the CPU power consumption in its power on idle state and
99 99 deep sleep mode (not clocked). Because background power is clocked, it is
100   -scaled in a similar manner to dynamic power. Leakage power comprises up to
  100 +scaled in a similar manner to dynamic power. Leakage power comprises up to
101 101 30\% of microprocessor peak power consumption~\cite{power7} and is linearly
102 102 proportional to supply voltage~\cite{leakage-islped02}.
103 103  
... ... @@ -109,8 +109,8 @@ proportional to supply voltage~\cite{leakage-islped02}.
109 109  
110 110 \subsection{Experimental Methodology}
111 111 Our simulation infrastructure is based on Android~4.1.1 ``Jelly Bean'' run on
112   -the Gem5 full system simulator. We use default core configuration provided by
113   -Gem5 in revision 10585, that is designed to reflect ARM Cortex-A15 processor
  112 +the \texttt{Gem5} full system simulator. We use the default core configuration provided by
  113 +\texttt{Gem5} in revision 10585, which is designed to reflect an ARM Cortex-A15 processor
114 114 with L1 cache size of 64~KB with access latency of 2 core cycles and a unified
115 115 L2 cache of size 2~MB with hit latency of 12 core cycles. The CPU and caches
116 116 operate under the same clock domain. For our purposes, we have configured the
... ... @@ -147,12 +147,12 @@ benchmarks that have interesting and unique phases.
147 147 %hours.
148 148  
149 149 We collected samples of a fixed amount of work so that each sample would
150   -represent the same work even across different frequencies. In Gem5, we collected
  150 +represent the same work even across different frequencies. In \texttt{Gem5}, we collected
151 151 performance and energy consumption data every 10~million user mode
152 152 instructions.
153 153 %this fixed sample of work makes .
154 154 %By collecting data for a fixed amount of work (instructions) we are able to study frequency scaling for workloads; the alternative sampling in time .
155   -Gem5 provides a mechanism to distinguish between user mode and
  155 +\texttt{Gem5} provides a mechanism to distinguish between user mode and
156 156 kernel mode instructions. We used this feature to remove periodic OS traffic and enable a fair comparison
157 157 across simulations of different CPU and memory frequencies. We used the collected
158 158 performance and energy data to study the impact of workload dynamics on the
... ... @@ -162,7 +162,7 @@ a given inefficiency budget. Note that, all our studies are performed using
162 162 performance or energy. The interplay of performance and energy consumption of
163 163 CPU and memory frequency scaling is complex as pointed out by
164 164 CoScale~\cite{deng2012coscale}. In the next section, we measure and characterize
165   -the larger space of all system level performance and energy trade-offs
  165 +the larger space of system level performance and energy trade-offs
166 166 of various CPU and memory frequency settings.
167 167  
168 168 %Although individual energy-performance trade-offs of DVFS for CPU and
... ...