Commit 175cc70418a55f0e91b66121d6cbfd76afc6f804
1 parent: 2758a2d6
final submission
Showing 2 changed files with 8 additions and 6 deletions
algorithm_implications.tex
@@ -29,12 +29,12 @@ detect stable regions of clusters containing both memory and CPU settings.
 
 \item \textit{Offline Analysis:} Another approach that can be taken to reduce
 the number of tuning events is offline analysis of the applications. An
-application can be profiled once offline to identify regions in which the
-performance cluster is stable. The profiled information of the stable region
+application can be profiled offline to identify regions in which the
+performance cluster is stable. The profile information of the stable region
 lengths, positions, and available settings can then be used at run time to enable
 the system to predict how long it can go without tuning. Algorithms can also
 extend the usage of the profiled information to new applications that may have
-phases that match with already profiled data. Previous work has already proposed
+phases that match with existing profiled data. Previous work has already proposed
 using offline analysis methods to detect application
 phases~\cite{Lau:2006:CGO:phase}, which would be directly applicable here in our
 system.
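The run-time use of an offline profile described in this hunk can be sketched as follows. This is a minimal illustration, assuming a profile is stored as a sorted list of stable regions with a start position, a length, and the settings valid across the region. The names `StableRegion` and `lookup_region`, and the sample profile, are hypothetical and do not come from the commit.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class StableRegion:
    start: int           # index of the first sample in the region
    length: int          # number of samples the region covers
    settings: frozenset  # (cpu_mhz, mem_mhz) pairs usable for the whole region


def lookup_region(profile, sample):
    """Return (region, samples_left) for the region covering `sample`.

    `profile` must be sorted by `start`. Returns None when no profiled
    region covers the sample, in which case the system would have to tune.
    """
    starts = [region.start for region in profile]
    i = bisect_right(starts, sample) - 1
    if i < 0:
        return None
    region = profile[i]
    end = region.start + region.length
    if sample >= end:
        return None
    # samples_left counts the current sample plus the rest of the region,
    # i.e. how long the system can go before the next tuning event.
    return region, end - sample


# Hypothetical profile: two stable regions found by offline analysis.
profile = [
    StableRegion(start=0, length=7, settings=frozenset({(1000, 400)})),
    StableRegion(start=7, length=4, settings=frozenset({(900, 300)})),
]

region, samples_left = lookup_region(profile, 8)
print(sorted(region.settings), samples_left)
```

Storing lengths and positions explicitly keeps the run-time lookup to a binary search, so the cost of consulting the profile stays small compared with a tuning event.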
performance_clusters.tex
@@ -213,7 +213,7 @@ available settings increase with inefficiency increasing the average length of
 stable regions. At an inefficiency budget of 1.6, the average length of a stable region
 increases drastically as shown in Figure~\ref{box-lengths}(b), which requires much less
 transitions with 1\% cluster threshold and no transitions with higher cluster thresholds of 3\%
-and 5\%. Note that there is only one point on the box plot for 3\% and 5\%
+and 5\%. Note that there is only one point on the box plot of bzip2 for 3\% and 5\%
 cluster thresholds at inefficiency of 1.6, because the benchmark is covered entirely by only one region.
 %and therefore no distribution is available.
 However, \textit{gobmk} has rapidly changing phases and
@@ -339,8 +339,9 @@ available to make better energy-performance trade-offs. Therefore average number
 of samples for which one setting can be chosen decreases. For example, with 70
 frequency settings sample 7 through sample 10 can always run at CPU frequency of 900MHz
 and memory frequency of 300MHz. With 496 frequency settings, sample 7
-runs at one setting, sample 8-9 runs at another setting and sample 10 runs at a
-different setting due to the availability of more (and better) choices.
+runs at 900MHz, sample 8-9 run at 950MHz and sample 10 runs at 980MHz of CPU
+frequency. Fine frequency steps increase the availability of more (and better)
+choices, resulting in smaller stable region lengths.
 %\XXXnote{sounds wordy -Dave}.
 In our system, we observed only a small improvement in performance (\textless
 1\%) with an increased number of frequency steps when
@@ -359,6 +360,7 @@ critical in deciding the correct size of the search space.
 Figure~(a) plots performance clusters collected using 100MHz of frequency step for both CPU and
 memory. Figure~(b) plots
 performance clusters collected using frequency steps of 30MHz for CPU and 40MHz
+for memory. We simulate frequency range of 100MHz-1000MHz for CPU and 200MHz-800MHz
 for memory.}
 \label{sensitivity}
 \end{figure}
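The setting counts quoted in the text (70 and 496) follow from the frequency ranges and steps given in the caption, and the effect of finer steps on stable region length can be illustrated with a small sketch. It assumes that a stable region is a run of consecutive samples whose performance clusters share at least one setting, and it uses a greedy left-to-right split. The function names and the toy clusters are hypothetical; only the frequencies 900MHz, 950MHz, 980MHz and the pair (900MHz, 300MHz) are taken from the example in the diff.

```python
from itertools import product


def frequency_grid(cpu_step_mhz, mem_step_mhz):
    """All (cpu_mhz, mem_mhz) pairs over the simulated ranges in the caption."""
    cpu = range(100, 1001, cpu_step_mhz)  # 100MHz-1000MHz
    mem = range(200, 801, mem_step_mhz)   # 200MHz-800MHz
    return list(product(cpu, mem))


def stable_regions(clusters):
    """Greedily split samples into runs that share at least one setting.

    `clusters[i]` is the set of settings acceptable for sample i.
    Returns a list of (first_index, last_index, common_settings).
    """
    regions = []
    start, common = 0, None
    for i, cluster in enumerate(clusters):
        if common is None:
            start, common = i, set(cluster)
        elif common & cluster:
            common &= cluster  # region continues with the shared settings
        else:
            regions.append((start, i - 1, common))
            start, common = i, set(cluster)
    if common is not None:
        regions.append((start, len(clusters) - 1, common))
    return regions


print(len(frequency_grid(100, 100)))  # coarse grid: 10 CPU x 7 memory steps
print(len(frequency_grid(30, 40)))    # fine grid: 31 CPU x 16 memory steps

# Toy clusters for samples 7-10 (list indices 0-3), hypothetical data.
coarse = [{(900, 300)}, {(900, 300), (1000, 300)}, {(900, 300)}, {(800, 300), (900, 300)}]
fine = [{900}, {950}, {950}, {980}]  # CPU frequency only, as in the text

print(stable_regions(coarse))
print(stable_regions(fine))
```

With the coarse clusters all four samples fall into a single region, while the fine clusters split into three regions, which mirrors the shorter stable regions described for 496 settings.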