\section{Results} \label{sec-results}

To examine the potential components of a value measure further, we utilize a large dataset of energy consumption measurements collected by an IRB-approved experiment run on the \PhoneLab{} testbed. \PhoneLab{} is a public smartphone platform testbed located at the University at Buffalo~\cite{phonelab-sensemine13}. 220~students, faculty, and staff carry instrumented Android Nexus~5 smartphones and receive subsidized service in return for their willingness to participate in experiments. \PhoneLab{} provides access to a group of participants balanced between genders and across a wide variety of age brackets, making our results more representative.

Understanding fine-grained energy consumption dynamics required more information than Android normally exposes to apps. In addition, exploring components of our value measure required capturing information about app usage---including foreground and background time and use of the display and audio interfaces---that cannot be measured on unmodified Android devices. To collect our dataset, we therefore took advantage of \PhoneLab{}'s ability to modify the Android platform itself. We instrumented the \texttt{SurfaceFlinger} and \texttt{AudioFlinger} components in the Android platform to record usage of the screen and audio, and altered the \texttt{ActivityManagerService} component to record energy consumption at each app transition. This allows energy consumption by components such as the screen to be accurately attributed to the foreground app, a feature that Android's internal battery monitoring component (the Fuel Gauge) lacks. Our changes were distributed to \PhoneLab{} participants in November 2013 via an over-the-air (OTA) platform update. The resulting two-month dataset of 67~GB of compressed log files represents \num{6806} user-days during which \num{1328}~apps were started \num{277785} times and actively used for a total of \num{15224} hours by 107~\PhoneLab{} participants.

Our analysis begins by investigating several components of a possible value measure and shows the effect of using each to weight the overall energy consumed by each app. Next, we formulate a simple measure of content delivery by measuring usage of the screen and audio output devices, and test it through a survey completed by 47~experiment participants. Unfortunately, our results are inconclusive and open to several possible interpretations, which we discuss. We present our results in tables that, for each measure, rank the 10 best-performing and 10 worst-performing apps in descending order.

%\newpage

\subsection{Total Energy}

\input{./figures/tables/tableALL.tex}

Ranking apps by total energy consumption---computed by adding all foreground and background energy consumption over the entire study---clearly says much more about app popularity than it does about anything else. Table~\ref{table-total} shows the top and bottom energy-consuming apps over the entire study. As expected, popular apps such as the Android Browser, Facebook, and the Android Phone component consume the most energy, while the list of low consumers is dominated by apps with few installs. This table does serve, however, to identify the popular apps in use by \PhoneLab{} participants, and provides a point of comparison for the remainder of our results.
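To make the per-app attribution concrete, the sketch below illustrates how these totals can be computed from a time-ordered stream of app-transition records. It is a minimal illustration, not our analysis pipeline, and the record format (timestamp, foreground app, cumulative discharge counter) is hypothetical rather than the actual \PhoneLab{} log schema.

\begin{verbatim}
from collections import defaultdict

def total_energy_per_app(transitions):
    """transitions: time-ordered (timestamp_s, foreground_app,
    cumulative_discharge_uAh) tuples, one per app transition.
    Field names are illustrative, not the PhoneLab schema."""
    totals = defaultdict(float)
    for (_, app, e0), (_, _, e1) in zip(transitions, transitions[1:]):
        # Charge the energy drawn between consecutive transitions
        # to the app in the foreground over that interval.
        totals[app] += e1 - e0
    return totals
\end{verbatim}

Summing these per-interval charges over the study period yields the per-app figures ranked in Table~\ref{table-total}.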
\subsection{Power}

Computing each app's power consumption by dividing its total energy usage by the total time it was running, either in the background or foreground, reveals more information, as shown in Table~\ref{table-rate}. Our results identify Facebook Messenger, Google+, and the Super-Bright LED Flashlight as apps that consume energy rapidly, while the Bank of America and Weather Channel apps consume energy slowly. Differences between apps in similar categories may begin to identify apps with problematic energy consumption: contrast, for example, the high energy usage of Facebook Messenger with that of other messaging clients such as WhatsApp, Twitter, and Android Messaging.

\subsection{Foreground Energy Efficiency}

Isolating the foreground component of execution time provides a better measure of value, since it excludes the time that users spend ignoring apps. Table~\ref{table-foreground} shows a measure of energy efficiency computed by dividing each app's total foreground energy consumption by its total foreground time. Some surprising changes from the power results can be seen. A number of apps remained in their former categories: Bank of America, which was identified as a low-power app, is also a highly efficient app when using foreground time as the value measure; and Facebook Messenger, which was identified as a high-power app, is also marked as inefficient. Other apps, however, switched categories. ESPN Sportscenter and Yahoo Mail do not consume much power, but also do not spend much time in the foreground; interestingly, none of the high-power apps looked better when their foreground usage was considered.

\subsection{Content Energy Efficiency}

Finally, we use the data we collected by instrumenting the \texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a simple measure of content delivery. We measure the audio and video frame rates and combine them into a single measure using bit-rates corresponding to a 30~fps YouTube-encoded video and 128~kbps two-channel audio, with the weights representing the fact that a single frame of video contains much more content than a single sample of audio; one possible formalization is sketched at the end of this subsection. We use this combined metric as the value measure and again use it to weight the energy consumption of each app, with the results shown in Table~\ref{table-content}.

Comparing with the foreground energy efficiency again shows several interesting changes. Yahoo Mail, which foreground energy efficiency marked as inefficient, looks more efficient when content delivery is considered. While it is possible that one \PhoneLab{} participant uses it to read email very quickly, it may be more likely that it uses a ``spinner'' or other fancy UI elements that generate artificially high frame rates without delivering much information. The inability to distinguish between meaningless and meaningful video frame content is a significant weakness of this simple approach. YouTube and Candy Crush Saga both earn high marks, which is encouraging given that they are very different apps, but this might also be a result of overweighting screen refreshes. The Android Clock's high ranking is also unsurprising, as it requires almost no energy to generate a relatively large number of screen redraws in timer and stopwatch modes.
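One plausible formalization of the combined content measure described above, with the weighting constants written symbolically since they follow from the chosen reference encodings, is
\[
v \;=\; \frac{B_{\mathrm{video}}}{30~\mathrm{fps}}\, f_{\mathrm{video}}
\;+\; \frac{128~\mathrm{kbps}}{R_{\mathrm{audio}}}\, f_{\mathrm{audio}},
\]
where $f_{\mathrm{video}}$ and $f_{\mathrm{audio}}$ are the measured video frame and audio sample rates, $B_{\mathrm{video}}$ is the bit-rate of the reference 30~fps YouTube encoding, and $R_{\mathrm{audio}}$ is the sample rate of the 128~kbps two-channel reference stream; the first fraction is thus bits per video frame and the second bits per audio sample. Content energy efficiency then divides the total content delivered, $\int v\,dt$, by the energy the app consumed. The symbols $B_{\mathrm{video}}$ and $R_{\mathrm{audio}}$ are our notation for this sketch rather than constants given above.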
\subsection{Survey Results and Discussion}

\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{./figures/survey.pdf}
\caption{\small \textbf{Survey Results.} The height of each bar shows how many of the suggested apps the user is willing to remove for better battery life, with suggestions based on overall usage or our new content-delivery efficiency measure. Our new measure does not convincingly outperform the straw man.}
\label{fig-survey}
\end{figure*}

To continue the evaluation of our simple content-based value measure, we prepared a survey for the 107~\PhoneLab{} participants who contributed data to our experiment. Our goal was to determine whether users would be more willing to remove inefficient apps, as defined using our content-based measure. As a baseline, we also asked users about the apps that consumed the most energy. We used each participant's data to generate a custom survey containing questions about 9~apps: the 3 least efficient apps as computed by our content-based value measure, the 3 apps that used the most energy on their smartphone during the experiment, and 3 apps chosen at random. For each app, we asked a simple question: ``If it would improve your battery life, would you uninstall or stop using this app?'' To compute an aggregate score for both the content-based and usage-based measures, we give each measure 1~point for a ``Yes'', 0.5~points for a ``Maybe'', and 0~points for a ``No''. 47~participants completed the survey, and the results are shown in Figure~\ref{fig-survey}. For each user, if one measure's score is higher than the other's, we consider it a ``win'' for that measure; this computation is sketched at the end of the section.

Overall the results are inconclusive, with the content-delivery measure not clearly outperforming the straw-man usage measure at predicting which apps each user would be willing to remove to save battery life. Given the crude nature of our metric, this is not particularly surprising, and it can be interpreted as a sign that we need a more sophisticated value measure incorporating more of the potential inputs we have previously discussed. However, on one level the results are very encouraging: most users were willing to consider removing one or more apps if doing so would improve their battery lifetime. Clearly, users are making this decision based on some idea of each app's value---the challenge is to replicate their choices using the information we have available to us.
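For reference, the scoring and win computation described above amount to the following minimal sketch. The response coding (Yes~$=$~1, Maybe~$=$~0.5, No~$=$~0) follows the description in the text; the function names are ours.

\begin{verbatim}
POINTS = {"Yes": 1.0, "Maybe": 0.5, "No": 0.0}

def score(responses):
    # Aggregate score for one measure's three suggested apps.
    return sum(POINTS[r] for r in responses)

def winner(content_responses, usage_responses):
    # The measure with the higher aggregate score earns a "win".
    c, u = score(content_responses), score(usage_responses)
    if c > u:
        return "content"
    if u > c:
        return "usage"
    return "tie"
\end{verbatim}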