Commit 787d6a3551c2fdc6e865c54ffce9c0b9368ebad6
1 parent 03df0645
made some small changes
Showing 6 changed files with 192 additions and 10 deletions
conclusion.tex
| @@ -8,7 +8,7 @@ measure by describing the multiple ways in which it would aid in the | @@ -8,7 +8,7 @@ measure by describing the multiple ways in which it would aid in the | ||
| 8 | management of energy and other resources on battery-powered smartphones. | 8 | management of energy and other resources on battery-powered smartphones. |
| 9 | Using an energy consumption dataset collected on \PhoneLab{} we have explored | 9 | Using an energy consumption dataset collected on \PhoneLab{} we have explored |
| 10 | separately several potential inputs to a value measure and determined how | 10 | separately several potential inputs to a value measure and determined how |
| 11 | -they weight energy consumption. And finally, we have presented results from a | 11 | +they weight energy consumption. Finally, we have presented results from a |
| 12 | failed effort to formulate an effective value measure. While this first | 12 | failed effort to formulate an effective value measure. While this first |
| 13 | attempt was unsuccessful, we hope to engage the mobile systems community in | 13 | attempt was unsuccessful, we hope to engage the mobile systems community in |
| 14 | this effort so that more sophisticated and successful value measures can be | 14 | this effort so that more sophisticated and successful value measures can be |
conclusion.tex~
0 → 100644
| 1 | +\section{Conclusions} | ||
| 2 | +\label{sec-conclusion} | ||
| 3 | + | ||
| 4 | +To conclude, we have argued that our inability to estimate app value is a | ||
| 5 | +critical weakness that is threatening our successes at accurately estimating | ||
| 6 | +and attributing energy consumption. We have motivated the need for a value | ||
| 7 | +measure by describing the multiple ways in which it would aid in the | ||
| 8 | +management of energy and other resources on battery-powered smartphones. | ||
| 9 | +Using an energy consumption dataset collected on \PhoneLab{} we have explored | ||
| 10 | +separately several potential inputs to a value measure and determined how | ||
| 11 | +they weight energy consumption. And finally, we have presented results from a | ||
| 12 | +failed effort to formulate an effective value measure. While this first | ||
| 13 | +attempt was unsuccessful, we hope to engage the mobile systems community in | ||
| 14 | +this effort so that more sophisticated and successful value measures can be | ||
| 15 | +developed. | ||
| 16 | + | ||
| 17 | +\section*{Acknowledgments} | ||
| 18 | + | ||
| 19 | +Students and faculty working on estimating app value are supported by NSF | ||
| 20 | +awards | ||
| 21 | +\href{http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1205656}{1205656} | ||
| 22 | +and | ||
| 23 | +\href{http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=1423215}{1423215}. | ||
| 24 | +The authors thank the anonymous reviewers for their feedback. |
introduction.tex
| @@ -3,7 +3,7 @@ | @@ -3,7 +3,7 @@ | ||
| 3 | Measuring app energy consumption\footnote{\small To avoid confusion between | 3 | Measuring app energy consumption\footnote{\small To avoid confusion between |
| 4 | app and energy usage, we use \textit{consumption} exclusively when referring | 4 | app and energy usage, we use \textit{consumption} exclusively when referring |
| 5 | to energy usage and \textit{usage} exclusively when referring to user | 5 | to energy usage and \textit{usage} exclusively when referring to user |
| 6 | -interaction with apps.} on mobile devices is nearly a solved problem, due to | 6 | +interaction with apps.} on mobile devices is nearly a solved problem. This is due to |
| 7 | great strides made in both generating and validating energy models that | 7 | great strides made in both generating and validating energy models that |
| 8 | deliver accurate runtime energy consumption | 8 | deliver accurate runtime energy consumption |
| 9 | estimates~\cite{mansdi,vedge-nsdi13,pathak2011,pathak2012,yoon} and in | 9 | estimates~\cite{mansdi,vedge-nsdi13,pathak2011,pathak2012,yoon} and in |
| @@ -23,7 +23,7 @@ including: | @@ -23,7 +23,7 @@ including: | ||
| 23 | 23 | ||
| 24 | \item Will this change to an app make it more energy efficient? | 24 | \item Will this change to an app make it more energy efficient? |
| 25 | 25 | ||
| 26 | -\item Is a particular app an energy virus? | 26 | +\item Is a particular app an \textit{energy virus}? |
| 27 | 27 | ||
| 28 | \item How should the limited energy resources on a given app be prioritized? | 28 | \item How should the limited energy resources on a given app be prioritized? |
| 29 | 29 | ||
| @@ -41,8 +41,8 @@ of apps in order to evaluate two video conferencing tools, web browsers, or | @@ -41,8 +41,8 @@ of apps in order to evaluate two video conferencing tools, web browsers, or | ||
| 41 | email clients. Developers can determine whether a new feature delivers value | 41 | email clients. Developers can determine whether a new feature delivers value |
| 42 | more or less efficiently than the rest of their app and better understand the | 42 | more or less efficiently than the rest of their app and better understand the |
| 43 | differences in energy consumption across different users. Measuring value | 43 | differences in energy consumption across different users. Measuring value |
| 44 | -allows a rigorous definition of an \textit{energy virus} as an app that | ||
| 45 | -delivers little or no value per joule, and for systems to reward efficient | 44 | +allows a rigorous definition of an \textit{energy virus} as \textit{an app that |
| 45 | +delivers little or no value per joule}, and for systems to reward efficient | ||
| 46 | apps by prioritizing limited resources based on app value or energy | 46 | apps by prioritizing limited resources based on app value or energy |
| 47 | efficiency. After all the progress we have made in computing the | 47 | efficiency. After all the progress we have made in computing the |
| 48 | denominator---energy consumption---we believe that the search for the missing | 48 | denominator---energy consumption---we believe that the search for the missing |
| @@ -56,7 +56,7 @@ different usage patterns. It must be efficient to compute, since it should | @@ -56,7 +56,7 @@ different usage patterns. It must be efficient to compute, since it should | ||
| 56 | not compete for the same limited energy resources that it is intended to help | 56 | not compete for the same limited energy resources that it is intended to help |
| 57 | manage. Ideally it should require little to no user input, since this will | 57 | manage. Ideally it should require little to no user input, since this will |
| 58 | make it burdensome and error-prone. And to make matters worse, there is no | 58 | make it burdensome and error-prone. And to make matters worse, there is no |
| 59 | -obvious way to measure ground truth to compare against---even in the lab. | 59 | +obvious way to measure ground truth to compare against---even in a lab. |
| 60 | Despite all these challenges, however, even a semi-accurate value measure | 60 | Despite all these challenges, however, even a semi-accurate value measure |
| 61 | would greatly benefit energy management on battery-constrained smartphones. | 61 | would greatly benefit energy management on battery-constrained smartphones. |
| 62 | With users continuing to report battery lifetime as their top concern with | 62 | With users continuing to report battery lifetime as their top concern with |
results.tex
| @@ -28,7 +28,7 @@ component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} | @@ -28,7 +28,7 @@ component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} | ||
| 28 | participants in November, 2013, via an over-the-air (OTA) platform update. | 28 | participants in November, 2013, via an over-the-air (OTA) platform update. |
| 29 | The resulting 2~month dataset of 67~GB of compressed log files represents | 29 | The resulting 2~month dataset of 67~GB of compressed log files represents |
| 30 | \num{6806} user days during which \num{1328}~apps were started \num{277785} | 30 | \num{6806} user days during which \num{1328}~apps were started \num{277785} |
| 31 | -times and used for a total of \num{15224} hours of active use by | 31 | +times, and used for a total of \num{15224} hours of active use by |
| 32 | 107~\PhoneLab{} participants. | 32 | 107~\PhoneLab{} participants. |
| 33 | 33 | ||
| 34 | Our analysis begins by investigating several components of a possible value | 34 | Our analysis begins by investigating several components of a possible value |
| @@ -37,7 +37,8 @@ consumed by each app. Next, we formulate a simple measure of content | @@ -37,7 +37,8 @@ consumed by each app. Next, we formulate a simple measure of content | ||
| 37 | delivery by measuring usage of the screen and audio output devices and test | 37 | delivery by measuring usage of the screen and audio output devices and test |
| 38 | it through a survey completed by 47~experiment participants. Unfortunately, | 38 | it through a survey completed by 47~experiment participants. Unfortunately, |
| 39 | our results are inconclusive and open to several possible interpretations | 39 | our results are inconclusive and open to several possible interpretations |
| 40 | -which we discuss. | 40 | +which we discuss. We present our results in tabular format where, for each measure, we |
| 41 | +rank the 10 best-performing and 10 worst-performing apps in descending order. | ||
| 41 | 42 | ||
| 42 | \newpage | 43 | \newpage |
| 43 | 44 | ||
| @@ -152,6 +153,6 @@ interpreted as a sign that we need a more sophisticated value measure | @@ -152,6 +153,6 @@ interpreted as a sign that we need a more sophisticated value measure | ||
| 152 | incorporating more of the potential inputs we have previously discussed. | 153 | incorporating more of the potential inputs we have previously discussed. |
| 153 | However, on one level the results are very encouraging: most users were | 154 | However, on one level the results are very encouraging: most users were |
| 154 | willing to consider removing one or more apps if that app would improve their | 155 | willing to consider removing one or more apps if that app would improve their |
| 155 | -battery lifetime. Clearly users are making this decision based on some idea | 156 | +battery lifetime. Clearly, users are making this decision based on some idea |
| 156 | of each app's value---the challenge is to replicate their choices using the | 157 | of each app's value---the challenge is to replicate their choices using the |
| 157 | information we have available to us. | 158 | information we have available to us. |
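The sentence added to results.tex in this commit describes tables that list, for each measure, the 10 best-performing and 10 worst-performing apps in descending order. A minimal Python sketch of such a ranking is given below. The function name `rank_extremes`, the dictionary input, and the example values are illustrative assumptions and are not taken from the paper's tooling.

```python
from typing import Dict, List, Tuple


def rank_extremes(
    measure: Dict[str, float], n: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the top-n and bottom-n apps, each listed in descending order.

    `measure` maps an app name to its value under one measure
    (hypothetical input format). If fewer than 2*n apps are present,
    the two lists overlap.
    """
    ranked = sorted(measure.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n], ranked[-n:]


if __name__ == "__main__":
    # Made-up values for illustration only.
    example = {"app_a": 3.0, "app_b": 1.0, "app_c": 2.0, "app_d": 4.0}
    top, bottom = rank_extremes(example, n=2)
    print(top)
    print(bottom)
```

Whether a high value counts as "best" depends on the measure being ranked, so the caller decides which of the two lists to label as best and which as worst.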
results.tex~
0 → 100644
| 1 | +\section{Results} | ||
| 2 | +\label{sec-results} | ||
| 3 | + | ||
| 4 | +To examine the potential components of a value measure further, we utilize a | ||
| 5 | +large dataset of energy consumption measurements collected by an IRB-approved | ||
| 6 | +experiment run on the \PhoneLab{} testbed. \PhoneLab{} is a public smartphone | ||
| 7 | +platform testbed located at the University at | ||
| 8 | +Buffalo~\cite{phonelab-sensemine13}. 220~students, faculty, and staff carry | ||
| 9 | +instrumented Android Nexus~5 smartphones and receive subsidized service in | ||
| 10 | +return for willingness to participate in experiments. \PhoneLab{} provides | ||
| 11 | +access to a group of participants balanced between genders and | ||
| 12 | +across a wide variety of age brackets, making our results more | ||
| 13 | +representative. | ||
| 14 | + | ||
| 15 | +Understanding fine-grained energy consumption dynamics required more | ||
| 16 | +information than Android normally exposes to apps. In addition, to explore | ||
| 17 | +components of our value measure we also wanted to capture information about | ||
| 18 | +app usage---including foreground and background time and use of the display | ||
| 19 | +and audio interface---that was not possible to measure on unmodified Android | ||
| 20 | +devices. So to collect our dataset we took advantage of \PhoneLab{}'s ability | ||
| 21 | +to modify the Android platform itself. We instrumented the | ||
| 22 | +\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components in the Android platform | ||
| 23 | +to record usage of the screen and audio, and altered the ActivityManagerService | ||
| 24 | +package to record energy consumption at each app transition. This allows energy | ||
| 25 | +consumption by components such as the screen to be accurately attributed to | ||
| 26 | +the foreground app, a feature that Android's internal battery monitoring | ||
| 27 | +component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} | ||
| 28 | +participants in November, 2013, via an over-the-air (OTA) platform update. | ||
| 29 | +The resulting 2~month dataset of 67~GB of compressed log files represents | ||
| 30 | +\num{6806} user days during which \num{1328}~apps were started \num{277785} | ||
| 31 | +times, and used for a total of \num{15224} hours of active use by | ||
| 32 | +107~\PhoneLab{} participants. | ||
| 33 | + | ||
| 34 | +Our analysis begins by investigating several components of a possible value | ||
| 35 | +measure and shows the effect of using each to weight the overall energy | ||
| 36 | +consumed by each app. Next, we formulate a simple measure of content | ||
| 37 | +delivery by measuring usage of the screen and audio output devices and test | ||
| 38 | +it through a survey completed by 47~experiment participants. Unfortunately, | ||
| 39 | +our results are inconclusive and open to several possible interpretations | ||
| 40 | +which we discuss. | ||
| 41 | + | ||
| 42 | +\newpage | ||
| 43 | + | ||
| 44 | +\subsection{Total Energy} | ||
| 45 | + | ||
| 46 | +\input{./figures/tables/tableALL.tex} | ||
| 47 | + | ||
| 48 | +Clearly, ranking apps by total energy consumption computed by adding all | ||
| 49 | +foreground and background energy consumptions over the entire study says | ||
| 50 | +much more about app popularity than it does about anything else. | ||
| 51 | +Table~\ref{table-total} shows the top and bottom energy-consuming apps over | ||
| 52 | +the entire study. As expected, popular apps such as the Android Browser, | ||
| 53 | +Facebook, and the Android Phone component consume the most energy, while the | ||
| 54 | +list of low consumers is dominated by apps with few installs. This table does | ||
| 55 | +serve, however, to identify the popular apps in use by \PhoneLab{} | ||
| 56 | +participants, and as a point of comparison for the remainder of our results. | ||
| 57 | + | ||
| 58 | +\subsection{Power} | ||
| 59 | + | ||
| 60 | +Computing each app's power consumption by scaling its total energy consumption | ||
| 61 | +against the total time it was running, either in the background or | ||
| 62 | +foreground, reveals more information, as shown in Table~\ref{table-rate}. Our | ||
| 63 | +results identify Facebook Messenger, Google+, and the Super-Bright LED | ||
| 64 | +Flashlight as apps that rapidly consume energy, while the Bank of America and | ||
| 65 | +Weather Channel apps consume energy slowly. Differences between apps in | ||
| 66 | +similar categories may begin to identify apps with problematic energy | ||
| 67 | +consumption, such as contrasting the high energy consumption of Facebook Messenger | ||
| 68 | +with other messaging clients such as WhatsApp, Twitter, and Android | ||
| 69 | +Messaging. | ||
| 70 | + | ||
| 71 | +\subsection{Foreground Energy Efficiency} | ||
| 72 | + | ||
| 73 | +Isolating the foreground component of execution time provides a better | ||
| 74 | +measure of value, since it excludes the time that users spend ignoring apps. | ||
| 75 | +Table~\ref{table-foreground} shows a measure of energy efficiency computed by | ||
| 76 | +%utilizing foreground time alone as our value measure. | ||
| 77 | +dividing total foreground energy consumption by total foreground time of an | ||
| 78 | +app. Some surprising changes | ||
| 79 | +from the power results can be seen. A number of apps have remained in their former | ||
| 80 | +categories: Bank of America, which was identified as a low-power app, is also | ||
| 81 | +a highly efficient app when using foreground time as the value measure; and | ||
| 82 | +Facebook Messenger, which was identified as a high-power app, is also marked | ||
| 83 | +as inefficient. Other apps, however, have switched categories. ESPN | ||
| 84 | +Sportscenter and Yahoo Mail do not consume much power, but also don't spend | ||
| 85 | +much time in the foreground; interestingly, none of the high-power apps | ||
| 86 | +looked better when their foreground usage was considered. | ||
| 87 | + | ||
| 88 | +\subsection{Content Energy Efficiency} | ||
| 89 | + | ||
| 90 | +Finally, we use the data we collected by instrumenting the | ||
| 91 | +\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a | ||
| 92 | +simple measure of content delivery. We measure the audio and video frame | ||
| 93 | +rates and combine them into a single measure by using bit-rates corresponding | ||
| 94 | +to a 30~fps YouTube-encoded video and 128~kbps two-channel audio, with the | ||
| 95 | +weights representing the fact that a single frame of video contains much more | ||
| 96 | +content than a single sample of audio. We use this combined metric as the | ||
| 97 | +value measure and again use it to weight the energy consumption of each app, | ||
| 98 | +with the results shown in Table~\ref{table-content}. | ||
| 99 | + | ||
| 100 | +Comparing with the foreground energy efficiency again shows several | ||
| 101 | +interesting changes. Yahoo Mail, which foreground energy efficiency marked as | ||
| 102 | +inefficient, looks more efficient when content delivery is considered. While | ||
| 103 | +it is possible that one \PhoneLab{} participant uses it to read email very | ||
| 104 | +quickly, it may be more likely that it uses a ``spinner'' or other fancy UI | ||
| 105 | +elements that generate artificially high frame rates without delivering much | ||
| 106 | +information. The inability to distinguish between meaningless and meaningful | ||
| 107 | +video frame content is a significant weakness of this simple approach. | ||
| 108 | +YouTube and Candy Crush Saga both earn high marks, which is encouraging given | ||
| 109 | +that they are very different apps but also might be a result of overweighting | ||
| 110 | +screen refreshes. The Android Clock is also an unsurprising result, as it | ||
| 111 | +requires almost no energy to generate a relatively large number of screen | ||
| 112 | +redraws in timer and stopwatch mode. | ||
| 113 | + | ||
| 114 | +\subsection{Survey Results and Discussion} | ||
| 115 | + | ||
| 116 | +\begin{figure*}[t] | ||
| 117 | +\centering | ||
| 118 | +\includegraphics[width=\textwidth]{./figures/survey.pdf} | ||
| 119 | + | ||
| 120 | +\caption{\small \textbf{Survey Results.} The height of each bar demonstrates how | ||
| 121 | +many of the suggested apps the user is willing to remove for better battery | ||
| 122 | +life, with suggestions based on overall usage or our new content-delivery | ||
| 123 | +efficiency measure. Our new measure does not convincingly out-perform the | ||
| 124 | +straw man.} | ||
| 125 | + | ||
| 126 | +\label{fig-survey} | ||
| 127 | + | ||
| 128 | +\end{figure*} | ||
| 129 | + | ||
| 130 | +To continue the evaluation of our simple content-based value measure, we | ||
| 131 | +prepared a survey for the 107~\PhoneLab{} participants who contributed data | ||
| 132 | +to our experiment. Our goal was to determine if users would be more willing | ||
| 133 | +to remove inefficient apps, as defined using our content-based measure. As a | ||
| 134 | +baseline, we also asked users about the apps that consumed the most energy. | ||
| 135 | +We used each participant's data to generate a custom survey containing | ||
| 136 | +questions about 9 apps: the 3 least efficient apps as computed by our | ||
| 137 | +content-based value measure, the 3 apps that used the most energy on their | ||
| 138 | +smartphone during the experiment, and 3 apps chosen at random. For each we | ||
| 139 | +asked them a simple question: ``If it would improve your battery life, would | ||
| 140 | +you uninstall or stop using this app?'' To compute an aggregate score for | ||
| 141 | +both the content-based and usage-based measures, we give each measure 1~point | ||
| 142 | +for a ``Yes'', 0.5~points for a ``Maybe'' and 0~points for a ``No''. | ||
| 143 | +47~participants completed the survey, and the results are shown in | ||
| 144 | +Figure~\ref{fig-survey}. For each user, if the score of one measure is higher | ||
| 145 | +than the other, it is considered a ``win'' for the former. | ||
| 146 | + | ||
| 147 | +Overall the results are inconclusive, with the content-delivery measure not | ||
| 148 | +clearly outperforming the straw-man usage measure at predicting which apps | ||
| 149 | +each user would be willing to remove to save battery life. Given the crude | ||
| 150 | +nature of our metric, this is not particularly surprising, and can be | ||
| 151 | +interpreted as a sign that we need a more sophisticated value measure | ||
| 152 | +incorporating more of the potential inputs we have previously discussed. | ||
| 153 | +However, on one level the results are very encouraging: most users were | ||
| 154 | +willing to consider removing one or more apps if that app would improve their | ||
| 155 | +battery lifetime. Clearly, users are making this decision based on some idea | ||
| 156 | +of each app's value---the challenge is to replicate their choices using the | ||
| 157 | +information we have available to us. |
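The survey scoring described in results.tex~ (1 point for "Yes", 0.5 for "Maybe", 0 for "No", and a per-user "win" for the measure with the higher score) can be sketched in Python as follows. This is only an illustration under stated assumptions: the data layout, the function names, and the handling of ties are hypothetical, since the text does not say how equal scores are counted.

```python
from typing import Dict, List

# Point values as stated in the survey description.
POINTS = {"Yes": 1.0, "Maybe": 0.5, "No": 0.0}


def measure_score(answers: List[str]) -> float:
    """Sum the points for one user's answers about one measure's apps."""
    return sum(POINTS[a] for a in answers)


def tally_wins(surveys: List[Dict[str, List[str]]]) -> Dict[str, int]:
    """Count per-user wins for the content-based and usage-based measures.

    Each survey is a dict with keys "content" and "usage" (assumed layout),
    holding that user's answers for the apps suggested by each measure.
    Equal scores are counted as a tie, which is an assumption.
    """
    wins = {"content": 0, "usage": 0, "tie": 0}
    for survey in surveys:
        content = measure_score(survey["content"])
        usage = measure_score(survey["usage"])
        if content > usage:
            wins["content"] += 1
        elif usage > content:
            wins["usage"] += 1
        else:
            wins["tie"] += 1
    return wins


if __name__ == "__main__":
    # Made-up responses for illustration only.
    sample = [
        {"content": ["Yes", "Maybe", "No"], "usage": ["No", "No", "Maybe"]},
        {"content": ["No", "No", "No"], "usage": ["Yes", "No", "No"]},
    ]
    print(tally_wins(sample))
```

Tracking ties separately keeps the win counts from overstating either measure when a user answers identically for both sets of apps.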
usage.tex
| @@ -143,7 +143,7 @@ same approach can also be applied to determine how much of any limited system | @@ -143,7 +143,7 @@ same approach can also be applied to determine how much of any limited system | ||
| 143 | resource to allocate to each app, | 143 | resource to allocate to each app, |
| 144 | %with high-value apps gaining priority over | 144 | %with high-value apps gaining priority over |
| 145 | %the processor, memory allocation, networking bandwidth and limited storage. | 145 | %the processor, memory allocation, networking bandwidth and limited storage. |
| 146 | -Together these resources allocation measures can be designed to ensure that | 146 | +Together these resource allocation measures can be designed to ensure that |
| 147 | high-value apps run smoothly at the expense of lower-value apps. | 147 | high-value apps run smoothly at the expense of lower-value apps. |
| 148 | 148 | ||
| 149 | \subsection{Summary of Requirements} | 149 | \subsection{Summary of Requirements} |