Commit 787d6a3551c2fdc6e865c54ffce9c0b9368ebad6
1 parent
03df0645
made some small changes
Showing 6 changed files with 192 additions and 10 deletions
conclusion.tex
| ... | ... | @@ -8,7 +8,7 @@ measure by describing the multiple ways in which it would aid in the |
| 8 | 8 | management of energy and other resources on battery-powered smartphones. |
| 9 | 9 | Using an energy consumption dataset collected on \PhoneLab{} we have explored |
| 10 | 10 | separately several potential inputs to a value measure and determined how |
| 11 | -they weight energy consumption. And finally, we have presented results from a | |
| 11 | +they weight energy consumption. Finally, we have presented results from a | |
| 12 | 12 | failed effort to formulate an effective value measure. While this first |
| 13 | 13 | attempt was unsuccessful, we hope to engage the mobile systems community in |
| 14 | 14 | this effort so that more sophisticated and successful value measures can be | ... | ... |
conclusion.tex~
0 → 100644
| 1 | +\section{Conclusions} | |
| 2 | +\label{sec-conclusion} | |
| 3 | + | |
| 4 | +To conclude, we have argued that our inability to estimate app value is a | |
| 5 | +critical weakness that is threatening our successes at accurately estimating | |
| 6 | +and attributing energy consumption. We have motivated the need for a value | |
| 7 | +measure by describing the multiple ways in which it would aid in the | |
| 8 | +management of energy and other resources on battery-powered smartphones. | |
| 9 | +Using an energy consumption dataset collected on \PhoneLab{} we have explored | |
| 10 | +separately several potential inputs to a value measure and determined how | |
| 11 | +they weight energy consumption. And finally, we have presented results from a | |
| 12 | +failed effort to formulate an effective value measure. While this first | |
| 13 | +attempt was unsuccessful, we hope to engage the mobile systems community in | |
| 14 | +this effort so that more sophisticated and successful value measures can be | |
| 15 | +developed. | |
| 16 | + | |
| 17 | +\section*{Acknowledgments} | |
| 18 | + | |
| 19 | +Students and faculty working on estimating app value are supported by NSF | |
| 20 | +awards | |
| 21 | +\href{http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1205656}{1205656} | |
| 22 | +and | |
| 23 | +\href{http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1423215}{1423215}. | |
| 24 | +The authors thank the anonymous reviewers for their feedback. | ... | ... |
introduction.tex
| ... | ... | @@ -3,7 +3,7 @@ |
| 3 | 3 | Measuring app energy consumption\footnote{\small To avoid confusion between |
| 4 | 4 | app and energy usage, we use \textit{consumption} exclusively when referring |
| 5 | 5 | to energy usage and \textit{usage} exclusively when referring to user |
| 6 | -interaction with apps.} on mobile devices is nearly a solved problem, due to | |
| 6 | +interaction with apps.} on mobile devices is nearly a solved problem. This is due to | |
| 7 | 7 | great strides made in both generating and validating energy models that |
| 8 | 8 | deliver accurate runtime energy consumption |
| 9 | 9 | estimates~\cite{mansdi,vedge-nsdi13,pathak2011,pathak2012,yoon} and in |
| ... | ... | @@ -23,7 +23,7 @@ including: |
| 23 | 23 | |
| 24 | 24 | \item Will this change to an app make it more energy efficient? |
| 25 | 25 | |
| 26 | -\item Is a particular app an energy virus? | |
| 26 | +\item Is a particular app an \textit{energy virus}? | |
| 27 | 27 | |
| 28 | 28 | \item How should the limited energy resources on a given app be prioritized? |
| 29 | 29 | |
| ... | ... | @@ -41,8 +41,8 @@ of apps in order to evaluate two video conferencing tools, web browsers, or |
| 41 | 41 | email clients. Developers can determine whether a new feature delivers value |
| 42 | 42 | more or less efficiently than the rest of their app and better understand the |
| 43 | 43 | differences in energy consumption across different users. Measuring value |
| 44 | -allows a rigorous definition of an \textit{energy virus} as an app that | |
| 45 | -delivers little or no value per joule, and for systems to reward efficient | |
| 44 | +allows a rigorous definition of an \textit{energy virus} as \textit{an app that | |
| 45 | +delivers little or no value per joule}, and for systems to reward efficient | |
| 46 | 46 | apps by prioritizing limited resources based on app value or energy |
| 47 | 47 | efficiency. After all the progress we have made in computing the |
| 48 | 48 | denominator---energy consumption---we believe that the search for the missing |
| ... | ... | @@ -56,7 +56,7 @@ different usage patterns. It must be efficient to compute, since it should |
| 56 | 56 | not compete for the same limited energy resources that it is intended to help |
| 57 | 57 | manage. Ideally it should require little to no user input, since this will |
| 58 | 58 | make it burdensome and error-prone. And to make matters worse, there is no |
| 59 | -obvious way to measure ground truth to compare against---even in the lab. | |
| 59 | +obvious way to measure ground truth to compare against---even in a lab. | |
| 60 | 60 | Despite all these challenges, however, even a semi-accurate value measure |
| 61 | 61 | would greatly benefit energy management on battery-constrained smartphones. |
| 62 | 62 | With users continuing to report battery lifetime as their top concern with | ... | ... |
results.tex
| ... | ... | @@ -28,7 +28,7 @@ component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} |
| 28 | 28 | participants in November, 2013, via an over-the-air (OTA) platform update. |
| 29 | 29 | The resulting 2~month dataset of 67~GB of compressed log files represents |
| 30 | 30 | \num{6806} user days during which \num{1328}~apps were started \num{277785} |
| 31 | -times and used for a total of \num{15224} hours of active use by | |
| 31 | +times, and used for a total of \num{15224} hours of active use by | |
| 32 | 32 | 107~\PhoneLab{} participants. |
| 33 | 33 | |
| 34 | 34 | Our analysis begins by investigating several components of a possible value |
| ... | ... | @@ -37,7 +37,8 @@ consumed by each app. Next, we formulate a simple measure of content |
| 37 | 37 | delivery by measuring usage of the screen and audio output devices and test |
| 38 | 38 | it through a survey completed by 47~experiment participants. Unfortunately, |
| 39 | 39 | our results are inconclusive and open to several possible interpretations |
| 40 | -which we discuss. | |
| 40 | +which we discuss. We present our results in tabular format where, for each measure, we | |
| 41 | +rank the 10 best-performing and 10 worst-performing apps in descending order. | |
| 41 | 42 | |
| 42 | 43 | \newpage |
| 43 | 44 | |
| ... | ... | @@ -152,6 +153,6 @@ interpreted as a sign that we need a more sophisticated value measure |
| 152 | 153 | incorporating more of the potential inputs we have previously discussed. |
| 153 | 154 | However, on one level the results are very encouraging: most users were |
| 154 | 155 | willing to consider removing one or more apps if that app would improve their |
| 155 | -battery lifetime. Clearly users are making this decision based on some idea | |
| 156 | +battery lifetime. Clearly, users are making this decision based on some idea | |
| 156 | 157 | of each app's value---the challenge is to replicate their choices using the |
| 157 | 158 | information we have available to us. | ... | ... |
results.tex~
0 → 100644
| 1 | +\section{Results} | |
| 2 | +\label{sec-results} | |
| 3 | + | |
| 4 | +To examine the potential components of a value measure further, we utilize a | |
| 5 | +large dataset of energy consumption measurements collected by an IRB-approved | |
| 6 | +experiment run on the \PhoneLab{} testbed. \PhoneLab{} is a public smartphone | |
| 7 | +platform testbed located at the University at | |
| 8 | +Buffalo~\cite{phonelab-sensemine13}. 220~students, faculty, and staff carry | |
| 9 | +instrumented Android Nexus~5 smartphones and receive subsidized service in | |
| 10 | +return for willingness to participate in experiments. \PhoneLab{} provides | |
| 11 | +access to a representative group of participants balanced between genders and | |
| 12 | +across a wide variety of age brackets, making our results more | |
| 13 | +representative. | |
| 14 | + | |
| 15 | +Understanding fine-grained energy consumption dynamics required more | |
| 16 | +information than Android normally exposes to apps. In addition, to explore | |
| 17 | +components of our value measure we also wanted to capture information about | |
| 18 | +app usage---including foreground and background time and use of the display | |
| 19 | +and audio interface---that was not possible to measure on unmodified Android | |
| 20 | +devices. So to collect our dataset we took advantage of \PhoneLab{}'s ability | |
| 21 | +to modify the Android platform itself. We instrumented the | |
| 22 | +\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components in the Android platform | |
| 23 | +to record usage of the screen and audio, and altered the ActivityManagerService | |
| 24 | +package to record energy consumption at each app transition. This allows energy | |
| 25 | +consumption by components such as the screen to be accurately attributed to | |
| 26 | +the foreground app, a feature that Android's internal battery monitoring | |
| 27 | +component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} | |
| 28 | +participants in November, 2013, via an over-the-air (OTA) platform update. | |
| 29 | +The resulting 2~month dataset of 67~GB of compressed log files represents | |
| 30 | +\num{6806} user days during which \num{1328}~apps were started \num{277785} | |
| 31 | +times, and used for a total of \num{15224} hours of active use by | |
| 32 | +107~\PhoneLab{} participants. | |
| 33 | + | |
| 34 | +Our analysis begins by investigating several components of a possible value | |
| 35 | +measure and shows the effect of using each to weight the overall energy | |
| 36 | +consumed by each app. Next, we formulate a simple measure of content | |
| 37 | +delivery by measuring usage of the screen and audio output devices and test | |
| 38 | +it through a survey completed by 47~experiment participants. Unfortunately, | |
| 39 | +our results are inconclusive and open to several possible interpretations | |
| 40 | +which we discuss. | |
| 41 | + | |
| 42 | +\newpage | |
| 43 | + | |
| 44 | +\subsection{Total Energy} | |
| 45 | + | |
| 46 | +\input{./figures/tables/tableALL.tex} | |
| 47 | + | |
| 48 | +Clearly, ranking apps by total energy consumption computed by adding all | |
| 49 | +foreground and background energy consumptions over the entire study says | |
| 50 | +much more about app popularity than it does about anything else. | |
| 51 | +Table~\ref{table-total} shows the top and bottom energy-consuming apps over | |
| 52 | +the entire study. As expected, popular apps such as the Android Browser, | |
| 53 | +Facebook, and the Android Phone component consume the most energy, while the | |
| 54 | +list of low consumers is dominated by apps with few installs. This table does | |
| 55 | +serve, however, to identify the popular apps in use by \PhoneLab{} | |
| 56 | +participants, and as a point of comparison for the remainder of our results. | |
| 57 | + | |
| 58 | +\subsection{Power} | |
| 59 | + | |
| 60 | +Computing each app's power consumption by dividing its total energy consumption | |
| 61 | +by the total time it was running, either in the background or | |
| 62 | +foreground, reveals more information, as shown in Table~\ref{table-rate}. Our | |
| 63 | +results identify Facebook Messenger, Google+, and the Super-Bright LED | |
| 64 | +Flashlight as apps that rapidly consume energy, while the Bank of America and | |
| 65 | +Weather Channel apps consume energy slowly. Differences between apps in | |
| 66 | +similar categories may begin to identify apps with problematic energy | |
| 67 | +consumption, such as contrasting the high energy consumption of Facebook Messenger | |
| 68 | +with other messaging clients such as WhatsApp, Twitter, and Android | |
| 69 | +Messaging. | |
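
The power computation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the tuple layout, and the choice of joules and seconds as units are assumptions made for the example.

```python
# Hypothetical sketch: average power per app, computed as total energy
# divided by total running time (foreground plus background).

def average_power_watts(energy_joules, foreground_seconds, background_seconds):
    """Return average power in watts over the app's total running time."""
    running_seconds = foreground_seconds + background_seconds
    if running_seconds <= 0:
        raise ValueError("app has no recorded running time")
    return energy_joules / running_seconds


def rank_by_power(apps):
    """Sort (name, energy_joules, fg_seconds, bg_seconds) tuples by power.

    Returns a list of (name, watts) pairs, highest power first.
    """
    rated = [
        (name, average_power_watts(energy, fg, bg))
        for name, energy, fg, bg in apps
    ]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)
```

Ranking by this rate rather than by total energy removes the effect of how long an app was running, which is why it separates apps that total energy consumption alone groups together.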
| 70 | + | |
| 71 | +\subsection{Foreground Energy Efficiency} | |
| 72 | + | |
| 73 | +Isolating the foreground component of execution time provides a better | |
| 74 | +measure of value, since it excludes the time that users spend ignoring apps. | |
| 75 | +Table~\ref{table-foreground} shows a measure of energy efficiency computed by | |
| 76 | +%utilizing foreground time alone as our value measure. | |
| 77 | +dividing total foreground energy consumption by total foreground time of an | |
| 78 | +app. Some surprising changes | |
| 79 | +from the power results can be seen. A number of apps have remained in their former | |
| 80 | +categories: Bank of America, which was identified as a low-power app, is also | |
| 81 | +a highly efficient app when using foreground time as the value measure; and | |
| 82 | +Facebook Messenger, which was identified as a high-power app, is also marked | |
| 83 | +as inefficient. Other apps, however, have switched categories. ESPN | |
| 84 | +Sportscenter and Yahoo Mail do not consume much power, but also don't spend | |
| 85 | +much time in the foreground; interestingly, none of the high-power apps | |
| 86 | +looked better when their foreground usage was considered. | |
| 87 | + | |
| 88 | +\subsection{Content Energy Efficiency} | |
| 89 | + | |
| 90 | +Finally, we use the data we collected by instrumenting the | |
| 91 | +\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a | |
| 92 | +simple measure of content delivery. We measure the audio and video frame | |
| 93 | +rates and combine them into a single measure by using bit-rates corresponding | |
| 94 | +to a 30~fps YouTube-encoded video and 128~kbps two-channel audio, with the | |
| 95 | +weights representing the fact that a single frame of video contains much more | |
| 96 | +content than a single sample of audio. We use this combined metric as the | |
| 97 | +value measure and again use it to weight the energy consumption of each app, | |
| 98 | +with the results shown in Table~\ref{table-content}. | |
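
One way to read the combination described above is sketched below. The text gives the 30~fps and 128~kbps figures but not the video bit-rate or the audio sample rate, so both constants marked `ASSUMED_` are hypothetical placeholders, as are the function names. The sketch only shows the shape of the weighting, not the values the authors used.

```python
# Hypothetical sketch of a content-delivery value measure.
# Given in the text: 30 fps video, 128 kbps two-channel audio.
# NOT given in the text (assumed here for illustration only):
#   - the video bit-rate
#   - the audio sample rate

VIDEO_FPS = 30
AUDIO_BITS_PER_SECOND = 128_000
ASSUMED_VIDEO_BITS_PER_SECOND = 1_000_000  # placeholder, not from the paper
ASSUMED_AUDIO_SAMPLE_RATE_HZ = 44_100      # placeholder, not from the paper

# Per-unit weights: one video frame carries far more bits than one audio sample.
BITS_PER_VIDEO_FRAME = ASSUMED_VIDEO_BITS_PER_SECOND / VIDEO_FPS
BITS_PER_AUDIO_SAMPLE = AUDIO_BITS_PER_SECOND / ASSUMED_AUDIO_SAMPLE_RATE_HZ


def content_bits(video_frames, audio_samples):
    """Combine frame and sample counts into one content estimate, in bits."""
    return (video_frames * BITS_PER_VIDEO_FRAME
            + audio_samples * BITS_PER_AUDIO_SAMPLE)


def content_per_joule(video_frames, audio_samples, energy_joules):
    """Content delivered per joule consumed; higher means more efficient."""
    if energy_joules <= 0:
        raise ValueError("energy consumption must be positive")
    return content_bits(video_frames, audio_samples) / energy_joules
```

Because every redrawn frame counts equally under this weighting, an app that animates a spinner would score as well as one that plays video, which is one weakness of so simple a measure.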
| 99 | + | |
| 100 | +Comparing with the foreground energy efficiency again shows several | |
| 101 | +interesting changes. Yahoo Mail, which foreground energy efficiency marked as | |
| 102 | +inefficient, looks more efficient when content delivery is considered. While | |
| 103 | +it is possible that one \PhoneLab{} participant uses it to read email very | |
| 104 | +quickly, it may be more likely that it uses a ``spinner'' or other fancy UI | |
| 105 | +elements that generate artificially high frame rates without delivering much | |
| 106 | +information. The inability to distinguish between meaningless and meaningful | |
| 107 | +video frame content is a significant weakness of this simple approach. | |
| 108 | +YouTube and Candy Crush Saga both earn high marks, which is encouraging given | |
| 109 | +that they are very different apps but also might be a result of overweighting | |
| 110 | +screen refreshes. The Android Clock is also an unsurprising result, as it | |
| 111 | +requires almost no energy to generate a relatively large number of screen | |
| 112 | +redraws in timer and stopwatch mode. | |
| 113 | + | |
| 114 | +\subsection{Survey Results and Discussion} | |
| 115 | + | |
| 116 | +\begin{figure*}[t] | |
| 117 | +\centering | |
| 118 | +\includegraphics[width=\textwidth]{./figures/survey.pdf} | |
| 119 | + | |
| 120 | +\caption{\small \textbf{Survey Results.} The height of each bar demonstrates how | |
| 121 | +many of the suggested apps the user is willing to remove for better battery | |
| 122 | +life, with suggestions based on overall usage or our new content-delivery | |
| 123 | +efficiency measure. Our new measure does not convincingly out-perform the | |
| 124 | +straw man.} | |
| 125 | + | |
| 126 | +\label{fig-survey} | |
| 127 | + | |
| 128 | +\end{figure*} | |
| 129 | + | |
| 130 | +To continue the evaluation of our simple content-based value measure, we | |
| 131 | +prepared a survey for the 107~\PhoneLab{} participants who contributed data | |
| 132 | +to our experiment. Our goal was to determine if users would be more willing | |
| 133 | +to remove inefficient apps, as defined using our content-based measure. As a | |
| 134 | +baseline, we also asked users about the apps that consumed the most energy. | |
| 135 | +We used each participant's data to generate a custom survey containing | |
| 136 | +questions about 9 apps: the 3 least efficient apps as computed by our | |
| 137 | +content-based value measure, the 3 apps that used the most energy on their | |
| 138 | +smartphone during the experiment, and 3 apps chosen at random. For each we | |
| 139 | +asked them a simple question: ``If it would improve your battery life, would | |
| 140 | +you uninstall or stop using this app?'' To compute an aggregate score for | |
| 141 | +both the content-based and usage-based measures, we give each measure 1~point | |
| 142 | +for a ``Yes'', 0.5~points for a ``Maybe'' and 0~points for a ``No''. | |
| 143 | +47~participants completed the survey, and the results are shown in | |
| 144 | +Figure~\ref{fig-survey}. For each user, if the score of one measure is higher | |
| 145 | +than the other, it is considered a ``win'' for the former. | |
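
The scoring rule above can be sketched as follows. The point values and the per-user "win" rule come from the text; the function names, the answer strings as dictionary keys, and the explicit handling of ties are assumptions made for this example.

```python
# Hypothetical sketch of the per-user survey scoring described in the text.

POINTS = {"Yes": 1.0, "Maybe": 0.5, "No": 0.0}


def measure_score(answers):
    """Sum the points for one measure's suggested apps for a single user."""
    return sum(POINTS[answer] for answer in answers)


def compare_measures(content_answers, usage_answers):
    """Return which measure wins for this user: 'content', 'usage', or 'tie'."""
    content = measure_score(content_answers)
    usage = measure_score(usage_answers)
    if content > usage:
        return "content"
    if usage > content:
        return "usage"
    return "tie"  # the text does not say how ties are treated; assumed here
```

For example, a user who answers "Yes", "Maybe", "No" for the three content-based suggestions scores 1.5 for that measure, and the measure wins for that user only if the usage-based suggestions score lower.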
| 146 | + | |
| 147 | +Overall the results are inconclusive, with the content-delivery measure not | |
| 148 | +clearly outperforming the straw-man usage measure at predicting which apps | |
| 149 | +each user would be willing to remove to save battery life. Given the crude | |
| 150 | +nature of our metric, this is not particularly surprising, and can be | |
| 151 | +interpreted as a sign that we need a more sophisticated value measure | |
| 152 | +incorporating more of the potential inputs we have previously discussed. | |
| 153 | +However, on one level the results are very encouraging: most users were | |
| 154 | +willing to consider removing one or more apps if that app would improve their | |
| 155 | +battery lifetime. Clearly, users are making this decision based on some idea | |
| 156 | +of each app's value---the challenge is to replicate their choices using the | |
| 157 | +information we have available to us. | ... | ... |
usage.tex
| ... | ... | @@ -143,7 +143,7 @@ same approach can also be applied to determine how much of any limited system |
| 143 | 143 | resource to allocate to each app, |
| 144 | 144 | %with high-value apps gaining priority over |
| 145 | 145 | %the processor, memory allocation, networking bandwidth and limited storage. |
| 146 | -Together these resources allocation measures can be designed to ensure that | |
| 146 | +Together these resource allocation measures can be designed to ensure that | |
| 147 | 147 | high-value apps run smoothly at the expense of lower-value apps. |
| 148 | 148 | |
| 149 | 149 | \subsection{Summary of Requirements} | ... | ... |