Commit 175cc70418a55f0e91b66121d6cbfd76afc6f804
1 parent 2758a2d6
final submission
Showing 2 changed files with 8 additions and 6 deletions.
algorithm_implications.tex
@@ -29,12 +29,12 @@ detect stable regions of clusters containing both memory and CPU settings.

 \item \textit{Offline Analysis:} Another approach that can be taken to reduce
 the number of tuning events is offline analysis of the applications. An
-application can be profiled once offline to identify regions in which the
-performance cluster is stable. The profiled information of the stable region
+application can be profiled offline to identify regions in which the
+performance cluster is stable. The profile information of the stable region
 lengths, positions, and available settings can then be used at run time to enable
 the system to predict how long it can go without tuning. Algorithms can also
 extend the usage of the profiled information to new applications that may have
-phases that match with already profiled data. Previous work has already proposed
+phases that match with existing profiled data. Previous work has already proposed
 using offline analysis methods to detect application
 phases~\cite{Lau:2006:CGO:phase}, which would be directly applicable here in our
 system.
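The offline-analysis idea in this hunk (profiled stable-region lengths and positions used at run time to predict how long the system can go without tuning) can be sketched as a lookup. This is a minimal illustration, not the authors' implementation: the profile format (a sorted list of `(start_sample, length, setting)` tuples) and the function name `samples_until_retune` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical offline-profile format (an assumption, not from the paper):
# (start_sample, length_in_samples, (cpu_mhz, mem_mhz)), sorted by start_sample,
# with non-overlapping regions.
Region = Tuple[int, int, Tuple[int, int]]


def samples_until_retune(profile: List[Region], sample: int) -> int:
    """Samples, counting from `sample`, that can run before tuning is needed.

    Returns 0 when `sample` lies outside every profiled stable region,
    i.e. the system should tune now.
    """
    starts = [start for start, _, _ in profile]
    i = bisect_right(starts, sample) - 1  # last region starting at or before sample
    if i < 0:
        return 0
    start, length, _ = profile[i]
    return max(0, start + length - sample)


if __name__ == "__main__":
    # Illustrative profile only; these regions are made up.
    profile = [(0, 7, (1000, 800)), (7, 4, (900, 300)), (11, 5, (700, 400))]
    print(samples_until_retune(profile, 7))   # 4: samples 7-10 share one setting
    print(samples_until_retune(profile, 9))   # 2
    print(samples_until_retune(profile, 20))  # 0: past the last region
```

A binary search over region start positions keeps the run-time lookup cheap; matching phases of new applications against existing profiles, as the text suggests, would need a separate phase-matching step not shown here.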
performance_clusters.tex
@@ -213,7 +213,7 @@ available settings increase with inefficiency increasing the average length of
 stable regions. At an inefficiency budget of 1.6, the average length of a stable region
 increases drastically as shown in Figure~\ref{box-lengths}(b), which requires much less
 transitions with 1\% cluster threshold and no transitions with higher cluster thresholds of 3\%
-and 5\%. Note that there is only one point on the box plot for 3\% and 5\%
+and 5\%. Note that there is only one point on the box plot of bzip2 for 3\% and 5\%
 cluster thresholds at inefficiency of 1.6, because the benchmark is covered entirely by only one region.
 %and therefore no distribution is available.
 However, \textit{gobmk} has rapidly changing phases and
@@ -339,8 +339,9 @@ available to make better energy-performance trade-offs. Therefore average number
 of samples for which one setting can be chosen decreases. For example, with 70
 frequency settings sample 7 through sample 10 can always run at CPU frequency of 900MHz
 and memory frequency of 300MHz. With 496 frequency settings, sample 7
-runs at one setting, sample 8-9 runs at another setting and sample 10 runs at a
-different setting due to the availability of more (and better) choices.
+runs at 900MHz, sample 8-9 run at 950MHz and sample 10 runs at 980MHz of CPU
+frequency. Fine frequency steps increase the availability of more (and better)
+choices, resulting in smaller stable region lengths.
 %\XXXnote{sounds wordy -Dave}.
 In our system, we observed only a small improvement in performance (\textless
 1\%) with an increased number of frequency steps when
@@ -359,6 +360,7 @@ critical in deciding the correct size of the search space.
 Figure~(a) plots performance clusters collected using 100MHz of frequency step for both CPU and
 memory. Figure~(b) plots
 performance clusters collected using frequency steps of 30MHz for CPU and 40MHz
+for memory. We simulate frequency range of 100MHz-1000MHz for CPU and 200MHz-800MHz
 for memory.}
 \label{sensitivity}
 \end{figure}
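The setting counts 70 and 496 used in the text are consistent with the simulated ranges added to the caption, assuming each range is swept from its lower bound in fixed steps that include the upper bound: 100MHz steps give 10 CPU x 7 memory = 70 settings, and 30MHz/40MHz steps give 31 x 16 = 496 settings. The sketch below reproduces that arithmetic and measures stable-region lengths as runs of consecutive samples that share one setting. The evenly stepped, inclusive grid is an assumption (the source states only the ranges and step sizes), and the helper names are hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def frequency_steps(lo_mhz: int, hi_mhz: int, step_mhz: int) -> List[int]:
    """Evenly spaced frequencies from lo_mhz up to hi_mhz, inclusive.

    Assumption: the sweep starts at the lower bound; hi_mhz is included
    only if it falls on the grid.
    """
    return list(range(lo_mhz, hi_mhz + 1, step_mhz))


def num_settings(cpu_step_mhz: int, mem_step_mhz: int) -> int:
    """Number of (CPU, memory) frequency pairs in the simulated ranges."""
    cpu = frequency_steps(100, 1000, cpu_step_mhz)  # CPU: 100MHz-1000MHz
    mem = frequency_steps(200, 800, mem_step_mhz)   # memory: 200MHz-800MHz
    return len(cpu) * len(mem)


def stable_region_lengths(settings: Sequence[Hashable]) -> List[int]:
    """Lengths of maximal runs of consecutive samples with the same setting."""
    return [sum(1 for _ in run) for _, run in groupby(settings)]


if __name__ == "__main__":
    print(num_settings(100, 100))  # 70 (10 CPU x 7 memory)
    print(num_settings(30, 40))    # 496 (31 CPU x 16 memory)
    # Samples 7-10 from the text: one setting vs. three distinct CPU frequencies.
    print(stable_region_lengths([(900, 300)] * 4))
    print(stable_region_lengths([900, 950, 950, 980]))
```

With the per-sample values quoted for samples 7-10 (a single 900MHz/300MHz setting with 70 settings; CPU at 900, 950, 950, and 980MHz with 496 settings), `stable_region_lengths` returns `[4]` and `[1, 2, 1]` respectively, which illustrates the text's point that finer frequency steps shorten stable regions.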