diff --git a/abstract.tex b/abstract.tex index ab5a0fc..e398c9b 100644 --- a/abstract.tex +++ b/abstract.tex @@ -5,7 +5,7 @@ measures alone are not sufficient to enable effective energy management on battery-constrained mobile devices. What is urgently needed is a way to put energy consumption into context by measuring the \textit{value} delivered by mobile apps. While difficult to compute, an accurate value measure would -enable cross-app comparison, app improvement, energy virus detection, and +enable cross-app comparison, app improvement, energy inefficient app detection, and effective runtime energy allocation and prioritization. Our paper motivates the problem, describes requirements for a value measure, discusses and evaluates several possible inputs to such a measure, and presents results diff --git a/introduction.tex b/introduction.tex index f996850..a64b885 100644 --- a/introduction.tex +++ b/introduction.tex @@ -39,9 +39,9 @@ Armed with a measure of value we can return to the difficult questions posed above. By computing efficiency users can perform apples-to-apples comparisons of apps in order to evaluate two video conferencing tools, web browsers, or email clients. Developers can determine whether a new feature delivers value -more or less efficiently than the rest of their app and better understand +more or less efficiently than the rest of their app and understand better the differences in energy consumption across different users. Measuring value -allows a rigorous definition of energy virus as an app that delivers little +allows a rigorous definition of an \textit{energy virus} as an app that delivers little or no value per joule, and for systems to reward efficient apps by prioritizing limited resources based on app value or energy efficiency. After all the progress we have made in computing the denominator---energy @@ -69,7 +69,7 @@ how useful such a measure would be while also formulating design requirements for the value measure itself. Section~\ref{sec-measure} presents an overview of possible inputs into such a measure and discussion of how each could be measured and how useful it might be. In Section~\ref{sec-results} we present -at formulating a value measure based on content delivered through the video +our initial effort at formulating a value measure based on content delivered through the video display and audio output---an attempt that we consider a failure based on the result of a user survey, but a failure that we hope sheds some light on this difficult challenge. diff --git a/metric.tex b/metric.tex index 4a5ab76..2135804 100644 --- a/metric.tex +++ b/metric.tex @@ -95,7 +95,7 @@ can compute accurate values while consuming small amounts of energy. However, while this framework is conceptually appealing, fitting each app into it requires app-specific features that we are trying to avoid: content is measured in messages for the chat client, frames for the video player, and -the accuracy of the step value for the pedometer. This raises the question of +the step value accuracy for the pedometer. This raises the question of whether a single measure of content delivery requiring no app-specific knowledge can be utilized in all cases. We explore this question in more detail, as well as differences between the other value measure inputs we have diff --git a/paper.tex b/paper.tex index bff6d28..10a6b53 100644 --- a/paper.tex +++ b/paper.tex @@ -33,7 +33,7 @@ Apps} \href{http://ubicomp.org/ubicomp2014/}{\textit{HotMobile'15}}, February 12--13, 2015, Santa Fe, NM, USA\\ ACM 978-1-4503-3391-7/15/02$\ldots$\$15.00.\\ - \url{http://dx.doi.org/10.1145/2699343.2699361}} + \url{http://dx.doi.org/10.1145/2699343.2699360}} \begin{document} diff --git a/results.tex b/results.tex index 476f1b0..9a0a686 100644 --- a/results.tex +++ b/results.tex @@ -19,9 +19,9 @@ app usage---including foreground and background time and use of the display and audio interface---that was not possible to measure on unmodified Android devices. So to collect our dataset we took advantage of \PhoneLab{}'s ability to modify the Android platform itself. We instrumented the -\texttt{SurfaceFlinger} and \texttt{AudioFlinger} Android platform components -to record usage of the screen and audio, and altered the Activity Services -package to record energy consumption at each app transition, allowing energy +\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components in the Android platform +to record usage of the screen and audio, and altered the ActivityManagerService +package to record energy consumption at each app transition. This allows energy consumption by components such as the screen to be accurately attributed to the foreground app, a feature that Android's internal battery monitoring component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{} @@ -37,13 +37,14 @@ consumed by each app. Next, we formulate a simple measure of content delivery by measuring usage of the screen and audio output devices and test it through a survey completed by 47~experiment participants. Unfortunately, our results are inconclusive and open to several possible interpretations -which we conclude by discussing. +which we discuss. \subsection{Total Energy} \input{./figures/tables/tableALL.tex} -Clearly, ranking apps by total energy consumption over the entire study says +Clearly, ranking apps by total energy consumption computed by adding all +foreground and background energy consumptions over the entire study says much more about app popularity than it does about anything else. Table~\ref{table-total} shows the top and bottom energy-consuming apps over the entire study. As expected, popular apps such as the Android Browser, @@ -70,8 +71,10 @@ Messaging. Isolating the foreground component of execution time provides a better measure of value, since it ignores the time that users spend ignoring apps. Table~\ref{table-foreground} shows a measure of energy efficiency computed by -utilizing foreground time alone as our value measure. Some surprising changes -from the power results can be seen. Some apps have remaining in their former +%utilizing foreground time alone as our value measure. +dividing total foreground energy consumption by total foreground time of an +app. Some surprising changes +from the power results can be seen. A number of apps have remained in their former categories: Bank of America, which was identified as a low-power app, is also a highly-efficient app when using foreground time as the value measure; and Facebook Messenger, which was identified as a high-power app, is also marked @@ -82,7 +85,7 @@ looked better when their foreground usage was considered. \subsection{Content Energy Efficiency} -Finally, we the data we collected by instrumenting the +Finally, we use the data we collected by instrumenting the \texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a simple measure of content delivery. We measure the audio and video frame rates and combine them into a single measure by using bit-rates corresponding @@ -94,7 +97,7 @@ with the results shown in Table~\ref{table-content}. Comparing with the foreground energy efficiency again shows several interesting changes. Yahoo Mail, which foreground energy efficiency marked as -inefficiency, looks more efficient when content delivery is considered. While +inefficient, looks more efficient when content delivery is considered. While it is possible that one \PhoneLab{} participant uses it to read email very quickly, it may be more likely that it uses a ``spinner'' or other fancy UI elements that generate artificially high frame rates without delivering much @@ -136,7 +139,8 @@ you uninstall or stop using this app?'' To compute an aggregate score for both the content-based and usage based measures, we give each measure 1~point for a ``Yes'', 0.5~points for a ``Maybe'' and 0~points for a ``No''. 47~participants completed the survey, and the results are shown in -Figure~\ref{fig-survey}. +Figure~\ref{fig-survey}. For each user, if the score of one measure is higher +than the other, it is considered a ``win'' for the former. Overall the results are inconclusive, with the content-delivery measure not clearly outperforming the straw-man usage measure at predicting which apps diff --git a/submitted/3.pdf b/submitted/3.pdf new file mode 100644 index 0000000..da9c3f6 --- /dev/null +++ b/submitted/3.pdf diff --git a/submitted/4.pdf b/submitted/4.pdf new file mode 100644 index 0000000..fd8cee7 --- /dev/null +++ b/submitted/4.pdf diff --git a/usage.tex b/usage.tex index d82a547..89ac9b2 100644 --- a/usage.tex +++ b/usage.tex @@ -11,19 +11,19 @@ problem: what is the value of an app? All smartphone users intuitively realize that smartphone apps differ in value---an email client, for example, is probably more valuable than a app -that makes farting sounds. But is it possible to quantify these subjective +that makes random sounds. But is it possible to quantify these subjective distinctions and produce a value measure? To argue that this is possible we present two experiments that elucidate smartphone app value in the form of both ordinal and cardinal utilities: % \begin{enumerate} -\item An adversary will require you to remove some number of apps from your +\item You will be required to remove some number of apps from your smartphone. Order the apps you are currently using from least important to most important. The N least important apps will be removed. -\item Your smartphone will require you to create an energy budget for the -apps you use. During any discharging cycle, once an app runs out of energy +\item You will be required to create an energy budget for the +apps you use on your smartphone. During any discharging cycle, once an app runs out of energy you will not be able to use it until you plug in your smartphone. Allocate battery percentages to each app you use. @@ -31,7 +31,9 @@ battery percentages to each app you use. % We plan to engage smartphone users in studies to explore in more detail which of these approaches is more effective, comparing them by comparing users' -levels of satisfaction under each scenario. For our value measure we are +levels of satisfaction under each scenario. In the first experiment we ask users +to uninstall apps because often apps have a background component that keeps consuming +energy even when not used by users any more. For our value measure we are hopeful that users will prove capable of assigning cardinal utilities to apps---as in the second experiment---since this matches most directly with our proposed value measure and could provide ground truth for a value measure @@ -47,7 +49,7 @@ these setups are the only way or the right way to measure value. In both cases low value measures have fairly extreme consequences---the app is actually removed or rendered unusable. This may cause users to overvalue essential tools such as communication apps and undervalue inessential apps -that nevertheless provide them with a great deal of enjoyment such as a game. +that nevertheless provide them with a great deal of enjoyment such as games. However, given that our goal is a value measure that can be paired with and used to allocate energy, and that energy exhaustion has such severe consequences on the usability of all apps, a more extreme experimental setup @@ -63,10 +65,10 @@ The most powerful use of a value measure would be to compare apps by comparing their energy efficiency, therefore overcoming the most critical flaw in current attempts to compare or categorize apps by their energy consumption alone~\cite{carat-sensys13}. Consider attempting to compare a -chat client and videoconferencing app by only measuring their energy +chat client and video conferencing app by only measuring their energy consumption. Unless it is terribly written, the chat client will consume less energy. But this does not mean that it is efficient, or that the -videoconferencing app is not. Ultimately, all the energy consumption +video conferencing app is not. Ultimately, all the energy consumption comparison truly reveals is that the two apps do different things---which we already knew. @@ -75,20 +77,22 @@ same app difficult. Given an app that consumes twice as much energy on Alice's smartphone than on Bob's, the question of why is left unanswered by pure energy measures. Even if usage time can be used to normalize the comparison, power consumption alone cannot incorporate differences due to the -different app features or configurations used by Alice and Bob. +different app features or app configurations used by Alice and Bob. By computing value and, thus, energy efficiency, we can overcome these weaknesses. A value measure should allow us to compare the efficiency of two apps in different categories based on how efficiently they use energy to -deliver user value, making it possible to compare games to email clients to -video players. Comparisons within the same app category should allow users to +deliver user value. +%, making it possible to compare games to email clients to video players. +Comparisons within the same app category should allow users to select the most efficient email client or web browser. Aggregating results over all users, differences in app energy efficiency should reflect how well the app is written and how well it predicts and adapts to users, not just differences in the core features it provides. When comparing two users using -the same app, differences in efficiency should reflect different -configurations or differences in how efficiently the app provides certain -features. +the same app, differences in efficiency should reflect differences in +app configurations or app features. +%different app configurations or differences in how efficiently the app provides certain +%features. \subsection{Evaluating App Changes} @@ -97,9 +101,11 @@ and deliver more value per joule. Today's energy profiling tools may be able to show the energy impact of adding a new feature or changing the way that a particular feature is implemented, but energy consumption alone is not sufficient to apply Amdahl's Law properly to the problem of improving app -energy efficiency. For example, if a particular feature consumes a great deal -of energy but adds little value, it is possible that it should be eliminated, -not improved. Overall developers should strive to make the parts of their app +energy efficiency. +%For example, if a particular feature consumes a great deal +%of energy but adds little value, it is possible that it should be eliminated, +%not improved. Overall +Developers should strive to make the parts of their app that generate a large amount of value as energy-efficient as possible, remove parts that generate little value while consuming a great deal of energy, and defer work on everything else. @@ -108,10 +114,10 @@ defer work on everything else. A measure of app value makes it possible to produce a rigorous definition of the term \textit{energy virus}: an app that produces little to no value per -joule. The choice of threshold will require some study, as it is unlikely +joule. The choice of threshold will require some study, as it is unlikely and impossible to produce a single efficiency cutoff that cleanly separates -malicious apps from ones that are merely poorly-written. Note also that this -definition of energy virus can be made on a per-user basis. This is important +malicious apps from ones that are merely poorly-written. This +definition of energy virus can also be made on a per-user basis. This is important since a non-malicious but poorly-written app that continues to consume energy even long after the user has stopped using it---and it has stopped providing value---functions as an energy virus for that user, but may not for a user @@ -119,7 +125,7 @@ that interacts with it more frequently. \subsection{Prioritizing System Resources} -An app value measure should able to able to be used to prioritize limited +An app value measure should be able to be used to prioritize limited system resources, particularly energy but also storage, memory, networking bandwidth and processor time. While mechanisms differ, most previous attempts to control energy consumption rely on some form of rate control which @@ -141,14 +147,15 @@ are likely many ways to combine energy consumption with a value measure in order to prioritize energy consumption, it is not clear that energy consumption can be prioritized effectively without some measure of value. The same approach can also be applied to determine how much of any limited system -resource to allocate to each app, with high-value apps gaining priority over -the processor, memory allocation, networking bandwidth and limited storage. -Together these resources allocation measures are designed to ensure that +resource to allocate to each app, +%with high-value apps gaining priority over +%the processor, memory allocation, networking bandwidth and limited storage. +Together these resources allocation measures can be designed to ensure that high-value apps run smoothly at the expense of lower-value apps. \subsection{Summary of Requirements} -The uses cases above give rise to a set of requirements for a possible value +The use cases above give rise to a set of requirements for a possible value measurement: % \begin{itemize} @@ -162,7 +169,7 @@ inputs, requiring that it be calculable given data from a single user. \item It should enable targeted development by highlighting what parts of an app generate value and what parts do not. -\item It should be able to be efficiently computed to not overly consume the +\item It should be efficiently computable without unduly consuming the resources that it is designed to help manage. \item It should be derived with little to no input from the user.