Commit 175cc70418a55f0e91b66121d6cbfd76afc6f804

Authored by Rizwana Begum
1 parent 2758a2d6

final submission

algorithm_implications.tex
@@ -29,12 +29,12 @@ detect stable regions of clusters containing both memory and CPU settings.
 
 \item \textit{Offline Analysis:} Another approach that can be taken to reduce
 the number of tuning events is offline analysis of the applications. An
-application can be profiled once offline to identify regions in which the
-performance cluster is stable. The profiled information of the stable region
+application can be profiled offline to identify regions in which the
+performance cluster is stable. The profile information of the stable region
 lengths, positions, and available settings can then be used at run time to enable
 the system to predict how long it can go without tuning. Algorithms can also
 extend the usage of the profiled information to new applications that may have
-phases that match with already profiled data. Previous work has already proposed
+phases that match with existing profiled data. Previous work has already proposed
 using offline analysis methods to detect application
 phases~\cite{Lau:2006:CGO:phase}, which would be directly applicable here in our
 system.
performance_clusters.tex
@@ -213,7 +213,7 @@ available settings increase with inefficiency increasing the average length of
 stable regions. At an inefficiency budget of 1.6, the average length of a stable region
 increases drastically as shown in Figure~\ref{box-lengths}(b), which requires much less
 transitions with 1\% cluster threshold and no transitions with higher cluster thresholds of 3\%
-and 5\%. Note that there is only one point on the box plot for 3\% and 5\%
+and 5\%. Note that there is only one point on the box plot of bzip2 for 3\% and 5\%
 cluster thresholds at inefficiency of 1.6, because the benchmark is covered entirely by only one region.
 %and therefore no distribution is available.
 However, \textit{gobmk} has rapidly changing phases and
@@ -339,8 +339,9 @@ available to make better energy-performance trade-offs. Therefore average number
 of samples for which one setting can be chosen decreases. For example, with 70
 frequency settings sample 7 through sample 10 can always run at CPU frequency of 900MHz
 and memory frequency of 300MHz. With 496 frequency settings, sample 7
-runs at one setting, sample 8-9 runs at another setting and sample 10 runs at a
-different setting due to the availability of more (and better) choices.
+runs at 900MHz, sample 8-9 run at 950MHz and sample 10 runs at 980MHz of CPU
+frequency. Fine frequency steps increase the availability of more (and better)
+choices, resulting in smaller stable region lengths.
 %\XXXnote{sounds wordy -Dave}.
 In our system, we observed only a small improvement in performance (\textless
 1\%) with an increased number of frequency steps when
@@ -359,6 +360,7 @@ critical in deciding the correct size of the search space.
 Figure~(a) plots performance clusters collected using 100MHz of frequency step for both CPU and
 memory. Figure~(b) plots
 performance clusters collected using frequency steps of 30MHz for CPU and 40MHz
+for memory. We simulate frequency range of 100MHz-1000MHz for CPU and 200MHz-800MHz
 for memory.}
 \label{sensitivity}
 \end{figure}