\section{Results}
\label{sec-results}

To examine the potential components of a value measure further, we utilize a
large dataset of energy consumption measurements collected by an IRB-approved
experiment run on the \PhoneLab{} testbed. \PhoneLab{} is a public smartphone
platform testbed located at the University at
Buffalo~\cite{phonelab-sensemine13}. 220~students, faculty, and staff carry
instrumented Android Nexus~5 smartphones and receive subsidized service in
return for willingness to participate in experiments. \PhoneLab{} provides
access to a group of participants balanced between genders and across a wide
variety of age brackets, which makes our results more representative.

Understanding fine-grained energy consumption dynamics required more
information than Android normally exposes to apps. In addition, to explore
components of our value measure we also wanted to capture information about
app usage---including foreground and background time and use of the display
and audio interface---that was not possible to measure on unmodified Android
devices. To collect our dataset, we therefore took advantage of \PhoneLab{}'s
ability to modify the Android platform itself. We instrumented the
\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components in the Android
platform to record usage of the screen and audio, and altered the
\texttt{ActivityManagerService} package to record energy consumption at each
app transition. This allows energy
consumption by components such as the screen to be accurately attributed to
the foreground app, a feature that Android's internal battery monitoring
component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{}
participants in November~2013 via an over-the-air (OTA) platform update.
The resulting two-month dataset, comprising 67~GB of compressed log files, represents
\num{6806} user days during which \num{1328}~apps were started \num{277785}
times, and used for a total of \num{15224} hours of active use by
107~\PhoneLab{} participants.

Our analysis begins by investigating several components of a possible value
measure and shows the effect of using each to weight the overall energy
consumed by each app. Next, we formulate a simple measure of content
delivery by measuring usage of the screen and audio output devices and test
it through a survey completed by 47~experiment participants. Unfortunately,
our results are inconclusive and open to several possible interpretations,
which we discuss. We present our results in tabular form: for each measure,
we rank the 10~best-performing and 10~worst-performing apps in descending order.

\newpage

\subsection{Total Energy}

\input{./figures/tables/tableALL.tex}

Ranking apps by total energy consumption, computed by summing each app's
foreground and background energy consumption over the entire study, says
much more about app popularity than it does about anything else.
Table~\ref{table-total} shows the top and bottom energy-consuming apps over
the entire study. As expected, popular apps such as the Android Browser,
Facebook, and the Android Phone component consume the most energy, while the
list of low consumers is dominated by apps with few installs. This table does
serve, however, to identify the popular apps in use by \PhoneLab{}
participants, and as a point of comparison for the remainder of our results.

\subsection{Power}

Computing each app's power consumption by dividing its total energy usage by
the total time it was running, either in the background or foreground,
reveals more information, as shown in Table~\ref{table-rate}. Our results
identify Facebook Messenger, Google+, and the Super-Bright LED Flashlight as
apps that consume energy rapidly, while the Bank of America and Weather
Channel apps consume energy slowly. Differences between apps in similar
categories may begin to identify apps with problematic energy consumption:
for example, Facebook Messenger's high energy usage contrasts with that of
other messaging clients such as WhatsApp, Twitter, and Android
Messaging.
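
As a minimal sketch of this computation, using notation introduced here only
for illustration, let $E_a^{\mathrm{fg}}$ and $E_a^{\mathrm{bg}}$ denote the
energy that app~$a$ consumed in the foreground and background over the study,
and let $T_a^{\mathrm{fg}}$ and $T_a^{\mathrm{bg}}$ denote the corresponding
running times. The average power used to rank apps can then be written as
\[
  P_a = \frac{E_a^{\mathrm{fg}} + E_a^{\mathrm{bg}}}
             {T_a^{\mathrm{fg}} + T_a^{\mathrm{bg}}},
\]
so an app with low total energy consumption can still rank as high-power if
it ran only briefly.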

\subsection{Foreground Energy Efficiency}

Isolating the foreground component of execution time provides a better
measure of value, since it excludes the time during which users are not
interacting with an app.
Table~\ref{table-foreground} shows a measure of energy efficiency computed by
%utilizing foreground time alone as our value measure. 
dividing an app's total foreground energy consumption by its total foreground
time. Some surprising changes from the power results can be seen. A number of
apps have remained in their former categories: Bank of America, which was
identified as a low-power app, is also a highly efficient app when using
foreground time as the value measure; and Facebook Messenger, which was
identified as a high-power app, is also marked as inefficient. Other apps,
however, have switched categories. ESPN Sportscenter and Yahoo Mail do not
consume much power, but also do not spend much time in the foreground;
interestingly, none of the high-power apps
looked better when their foreground usage was considered.
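
In notation introduced here only for illustration, with
$E_a^{\mathrm{fg}}$ the total foreground energy consumption of app~$a$ and
$T_a^{\mathrm{fg}}$ its total foreground time, the quantity ranked in
Table~\ref{table-foreground} can be written as
\[
  R_a^{\mathrm{fg}} = \frac{E_a^{\mathrm{fg}}}{T_a^{\mathrm{fg}}}.
\]
Under this reading, a lower value of $R_a^{\mathrm{fg}}$ corresponds to a more
efficient app, since less energy is spent per unit of foreground time. Unlike
average power computed over all running time, this ratio uses neither
background energy nor background time.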

\subsection{Content Energy Efficiency}

Finally, we use the data we collected by instrumenting the
\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a
simple measure of content delivery. We measure the audio and video frame
rates and combine them into a single measure by using bit-rates corresponding
to a 30~fps YouTube-encoded video and 128~kbps two-channel audio, with the
weights representing the fact that a single frame of video contains much more
content than a single sample of audio. We use this combined metric as the
value measure and again use it to weight the energy consumption of each app,
with the results shown in Table~\ref{table-content}.
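
The exact weights are not spelled out above. One plausible reading, offered
as an assumption rather than as the definitive formula, is a linear
combination of the bits delivered through each interface. Let $F_a$ be the
number of video frames and $S_a$ the number of audio samples produced by
app~$a$, let $B_v$ be the bit rate of the 30~fps reference video, let
$B_s = 128$~kbps be the reference audio bit rate, and let $f_s$ be the audio
sampling rate; $B_v$ and $f_s$ are not stated here, and all symbols are
introduced only for illustration. Then
\[
  w_v = \frac{B_v}{30~\mathrm{fps}}, \qquad
  w_s = \frac{B_s}{f_s}, \qquad
  C_a = w_v F_a + w_s S_a,
\]
where $w_v$ is in bits per frame and $w_s$ in bits per sample. For
hypothetical values of $B_v = 1$~Mbps and $f_s = 44.1$~kHz, this gives
$w_v \approx \num{33333}$ bits per frame and $w_s \approx 2.9$ bits per
sample, which reflects the much larger weight placed on a video frame than on
an audio sample. If $E_a$ denotes the energy consumption of app~$a$, the
content energy efficiency can then be expressed as energy per unit of
content, $E_a / C_a$, again as an assumption about how the weighting is
applied.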

Comparing with the foreground energy efficiency again shows several
interesting changes. Yahoo Mail, which foreground energy efficiency marked as
inefficient, looks more efficient when content delivery is considered. While
it is possible that one \PhoneLab{} participant uses it to read email very
quickly, it may be more likely that it uses a ``spinner'' or other fancy UI
elements that generate artificially high frame rates without delivering much
information. The inability to distinguish between meaningless and meaningful
video frame content is a significant weakness of this simple approach.
YouTube and Candy Crush Saga both earn high marks, which is encouraging given
that they are very different apps, but might also be a result of overweighting
screen refreshes. The Android Clock is also an unsurprising result, as it
requires almost no energy to generate a relatively large number of screen
redraws in timer and stopwatch mode.

\subsection{Survey Results and Discussion}

\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{./figures/survey.pdf}

\caption{\small \textbf{Survey Results.} The height of each bar demonstrates how
many of the suggested apps the user is willing to remove for better battery
life, with suggestions based on overall usage or our new content-delivery
efficiency measure. Our new measure does not convincingly out-perform the
straw man.}

\label{fig-survey}

\end{figure*}

To continue the evaluation of our simple content-based value measure, we
prepared a survey for the 107~\PhoneLab{} participants who contributed data
to our experiment. Our goal was to determine if users would be more willing
to remove inefficient apps, as defined using our content-based measure. As a
baseline, we also asked users about the apps that consumed the most energy.
We used each participant's data to generate a custom survey containing
questions about 9 apps: the 3 least efficient apps as computed by our
content-based value measure, the 3 apps that used the most energy on their
smartphone during the experiment, and 3 apps chosen at random. For each app,
we asked the participant a simple question: ``If it would improve your
battery life, would you uninstall or stop using this app?'' To compute an
aggregate score for both the content-based and usage-based measures, we give
each measure 1~point for a ``Yes'', 0.5~points for a ``Maybe'', and 0~points
for a ``No''.
47~participants completed the survey, and the results are shown in
Figure~\ref{fig-survey}. For each user, if the score of one measure is higher 
than the other, it is considered a ``win'' for the former.
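
The scoring can be sketched as follows, in notation introduced here only for
illustration. Let $A_m(u)$ be the three apps suggested to participant~$u$ by
measure~$m$ (content-based or usage-based), and let $r_{u,a}$ be the response
of~$u$ for app~$a$. With
\[
  s(r) = \left\{
    \begin{array}{ll}
      1   & \mbox{if $r$ is ``Yes''},\\
      0.5 & \mbox{if $r$ is ``Maybe''},\\
      0   & \mbox{if $r$ is ``No''},
    \end{array}
  \right.
  \qquad
  S_m(u) = \sum_{a \in A_m(u)} s(r_{u,a}),
\]
each score lies between 0 and 3, and measure~$m$ wins for participant~$u$
when $S_m(u)$ exceeds the score of the other measure. As a hypothetical
example, a participant who answers ``Yes'', ``Maybe'', and ``No'' for the
content-based suggestions and ``Maybe'', ``No'', and ``No'' for the
usage-based suggestions yields scores of 1.5 and 0.5, a win for the
content-based measure. Under this reading, equal scores count as a win for
neither measure.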

Overall the results are inconclusive, with the content-delivery measure not
clearly outperforming the straw-man usage measure at predicting which apps
each user would be willing to remove to save battery life. Given the crude
nature of our metric, this is not particularly surprising, and can be
interpreted as a sign that we need a more sophisticated value measure
incorporating more of the potential inputs we have previously discussed.
However, on one level the results are very encouraging: most users were
willing to consider removing one or more apps if that app would improve their
battery lifetime. Clearly, users are making this decision based on some idea
of each app's value---the challenge is to replicate their choices using the
information we have available to us.