diff --git a/algorithm_implications.tex b/algorithm_implications.tex
index 008065f..67daa08 100644
--- a/algorithm_implications.tex
+++ b/algorithm_implications.tex
@@ -29,12 +29,12 @@ detect stable regions of clusters containing both memory and CPU settings.
 \item \textit{Offline Analysis:} Another approach that can be taken to
 reduce the number of tuning events is offline analysis of the
 applications. An
-application can be profiled once offline to identify regions in which the
-performance cluster is stable. The profiled information of the stable region
+application can be profiled offline to identify regions in which the
+performance cluster is stable. The profile information of the stable region
 lengths, positions, and available settings can then be used at run time to enable
 the system to predict how long it can go without tuning. Algorithms can also
 extend the usage of the profiled information to new applications that may have
-phases that match with already profiled data. Previous work has already proposed
+phases that match with existing profiled data. Previous work has already proposed
 using offline analysis methods to detect application
 phases~\cite{Lau:2006:CGO:phase}, which would be directly applicable here in our
 system.
diff --git a/performance_clusters.tex b/performance_clusters.tex
index ee83e38..228ed16 100644
--- a/performance_clusters.tex
+++ b/performance_clusters.tex
@@ -213,7 +213,7 @@ available settings increase with inefficiency increasing the average length of
 stable regions. At an inefficiency budget of 1.6, the average length of a stable region
 increases drastically as shown in Figure~\ref{box-lengths}(b), which requires much less
 transitions with 1\% cluster threshold and no transitions with higher cluster thresholds of 3\%
-and 5\%. Note that there is only one point on the box plot for 3\% and 5\%
+and 5\%. Note that there is only one point on the box plot of bzip2 for 3\% and 5\%
 cluster thresholds at inefficiency of 1.6, because the benchmark is covered
 entirely by only one region. %and therefore no distribution is available.
 However, \textit{gobmk} has rapidly changing phases and
@@ -339,8 +339,9 @@ available to make better energy-performance trade-offs. Therefore average number
 of samples for which one setting can be chosen decreases. For example, with 70
 frequency settings sample 7 through sample 10 can always run at CPU frequency of
 900MHz and memory frequency of 300MHz. With 496 frequency settings, sample 7
-runs at one setting, sample 8-9 runs at another setting and sample 10 runs at a
-different setting due to the availability of more (and better) choices.
+runs at 900MHz, sample 8-9 run at 950MHz and sample 10 runs at 980MHz of CPU
+frequency. Fine frequency steps increase the availability of more (and better)
+choices, resulting in smaller stable region lengths.
 %\XXXnote{sounds wordy -Dave}.
 In our system, we observed only a small improvement in performance
 (\textless 1\%) with an increased number of frequency steps when
@@ -359,6 +360,7 @@ critical in deciding the correct size of the search space.
 Figure~(a) plots performance clusters collected using
 100MHz of frequency step for both CPU and memory. Figure~(b) plots performance
 clusters collected using frequency steps of 30MHz for CPU and 40MHz
+for memory. We simulate frequency range of 100MHz-1000MHz for CPU and 200MHz-800MHz
 for memory.}
 \label{sensitivity}
 \end{figure}