\section{Results}
\label{sec-results}

To examine the potential components of a value measure further, we utilize a
large dataset of energy consumption measurements collected by an IRB-approved
experiment run on the \PhoneLab{} testbed. \PhoneLab{} is a public smartphone
platform testbed located at the University at
Buffalo~\cite{phonelab-sensemine13}. 220~students, faculty, and staff carry
instrumented Android Nexus~5 smartphones and receive subsidized service in
return for willingness to participate in experiments. \PhoneLab{} provides
access to a group of participants balanced between genders and across a wide
variety of age brackets, making our results more representative.

Understanding fine-grained energy consumption dynamics required more
information than Android normally exposes to apps. In addition, to explore
components of our value measure we also wanted to capture information about
app usage---including foreground and background time and use of the display
and audio interface---that was not possible to measure on unmodified Android
devices. To collect our dataset, we therefore took advantage of \PhoneLab{}'s
ability to modify the Android platform itself. We instrumented the
\texttt{SurfaceFlinger} and \texttt{AudioFlinger} Android platform components
to record usage of the screen and audio, and altered the Activity Services
package to record energy consumption at each app transition, allowing energy
consumption by components such as the screen to be accurately attributed to
the foreground app, a feature that Android's internal battery monitoring
component (the Fuel Gauge) lacks. Changes were distributed to \PhoneLab{}
participants in November~2013 via an over-the-air (OTA) platform update.
The resulting two-month dataset of 67~GB of compressed log files represents
\num{6806} user days during which \num{1328}~apps were started \num{277785}
times and actively used for a total of \num{15224}~hours by
107~\PhoneLab{} participants.

Our analysis begins by investigating several components of a possible value
measure and shows the effect of using each to weight the overall energy
consumed by each app. Next, we formulate a simple measure of content
delivery by measuring usage of the screen and audio output devices and test
it through a survey completed by 47~experiment participants. Unfortunately,
our results are inconclusive and open to several possible interpretations,
which we discuss at the end of this section.

\subsection{Total Energy}

\input{./figures/tables/tableALL.tex}

Clearly, ranking apps by total energy consumption over the entire study says
much more about app popularity than it does about anything else.
Table~\ref{table-total} shows the top and bottom energy-consuming apps over
the entire study. As expected, popular apps such as the Android Browser,
Facebook, and the Android Phone component consume the most energy, while the
list of low consumers is dominated by apps with few installs. This table does
serve, however, to identify the popular apps in use by \PhoneLab{}
participants, and as a point of comparison for the remainder of our results.

\subsection{Power}

Computing each app's power consumption by dividing its total energy usage by
the total time it was running, either in the background or foreground,
reveals more information, as shown in Table~\ref{table-rate}. Our results
identify Facebook Messenger, Google+, and the Super-Bright LED Flashlight as
apps that consume energy rapidly, while the Bank of America and Weather
Channel apps consume energy slowly. Differences between apps in similar
categories may begin to identify apps with problematic energy consumption:
for example, Facebook Messenger consumes energy at a higher rate than other
messaging clients such as WhatsApp, Twitter, and Android Messaging.
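
As a sketch of this computation, using notation that is introduced here only
for illustration, let $E_a$ be the total energy attributed to app~$a$ over
the study, and let $T^{\mathrm{fg}}_a$ and $T^{\mathrm{bg}}_a$ be its total
foreground and background running time. The average power for the app then
takes the form
\[
  P_a = \frac{E_a}{T^{\mathrm{fg}}_a + T^{\mathrm{bg}}_a} .
\]
Because background time enters the denominator, an app that is rarely used
but often left running can show a low $P_a$ without necessarily being
efficient.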

\subsection{Foreground Energy Efficiency}

Isolating the foreground component of execution time provides a better
measure of value, since it ignores the time that users spend ignoring apps.
Table~\ref{table-foreground} shows a measure of energy efficiency computed by
using foreground time alone as our value measure. Some surprising changes
from the power results can be seen. Some apps have remained in their former
categories: Bank of America, which was identified as a low-power app, is also
a highly efficient app when using foreground time as the value measure; and
Facebook Messenger, which was identified as a high-power app, is also marked
as inefficient. Other apps, however, have switched categories. ESPN
Sportscenter and Yahoo Mail do not consume much power, but also do not spend
much time in the foreground, and so appear inefficient under this measure.
Interestingly, none of the high-power apps looked better when their
foreground usage was considered.
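
One way to make this measure concrete, assuming that efficiency is expressed
as value delivered per unit of energy (both the normalization and the
notation are assumptions of this sketch), is
\[
  \eta^{\mathrm{fg}}_a = \frac{T^{\mathrm{fg}}_a}{E_a} ,
\]
where $T^{\mathrm{fg}}_a$ is the total time app~$a$ spent in the foreground
and $E_a$ is the total energy attributed to it. Under this form, an app with
modest overall energy consumption can still score poorly if it accumulates
very little foreground time.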

\subsection{Content Energy Efficiency}

Finally, we use the data we collected by instrumenting the
\texttt{SurfaceFlinger} and \texttt{AudioFlinger} components to compute a
simple measure of content delivery. We measure the audio and video frame
rates and combine them into a single measure by using bitrates corresponding
to a 30~fps YouTube-encoded video and 128~kbps two-channel audio, with the
weights representing the fact that a single frame of video contains much more
content than a single sample of audio. We use this combined metric as the
value measure and again use it to weight the energy consumption of each app,
with the results shown in Table~\ref{table-content}.
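
A minimal sketch of how such a combination could be written follows; every
symbol is introduced here for illustration and the exact form is an
assumption. Let $F_a$ be the number of video frames and $S_a$ the number of
audio samples delivered by app~$a$, let $B_v$ be the bitrate of the 30~fps
reference video (its value is not stated here), let $B_s = 128$~kbps be the
reference audio bitrate, and let $r_s$ be the audio sample rate (also
unstated). Per-frame and per-sample weights, and the resulting content
measure, could then take the form
\[
  w_v = \frac{B_v}{30}, \qquad
  w_s = \frac{B_s}{r_s}, \qquad
  C_a = w_v F_a + w_s S_a .
\]
Since $r_s$ is typically in the tens of thousands of samples per second
while the video rate is 30~frames per second, $w_v$ is far larger than $w_s$
for comparable bitrates, which is the sense in which a single frame carries
much more content than a single audio sample.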

Comparing with the foreground energy efficiency results again shows several
interesting changes. Yahoo Mail, which foreground energy efficiency marked as
inefficient, looks more efficient when content delivery is considered. While
it is possible that one \PhoneLab{} participant uses it to read email very
quickly, it may be more likely that it uses a ``spinner'' or other fancy UI
elements that generate artificially high frame rates without delivering much
information. The inability to distinguish between meaningless and meaningful
video frame content is a significant weakness of this simple approach.
YouTube and Candy Crush Saga both earn high marks, which is encouraging given
that they are very different apps but also might be a result of overweighting
screen refreshes. The Android Clock is also an unsurprising result, as it
requires almost no energy to generate a relatively large number of screen
redraws.

\subsection{Survey Results and Discussion}

\begin{figure*}[t]
\centering
\includegraphics[width=\textwidth]{./figures/survey.pdf}

\caption{\textbf{Participant responses to energy-inefficient app suggestions.} The height of each bar
	shows how many of the suggested apps the user is willing to remove for better battery life.}

\label{fig-survey}

\end{figure*}

To continue the evaluation of our simple content-based value measure, we
prepared a survey for the 107~\PhoneLab{} participants who contributed data
to our experiment. Our goal was to determine if users would be more willing
to remove inefficient apps, as defined using our content-based measure. As a
baseline, we also asked users about the apps that consumed the most energy.
We used each participant's data to generate a custom survey containing
questions about 9 apps: the 3 least efficient apps as computed by our
content-based value measure, the 3 apps that used the most energy on their
smartphone during the experiment, and 3 apps chosen at random. For each we
asked them a simple question: ``If it would improve your battery life, would
you uninstall or stop using this app?'' To compute an aggregate score for
both the content-based and usage-based measures, we give each measure 1~point
for a ``Yes'', 0.5~points for a ``Maybe'', and 0~points for a ``No''.
47~participants completed the survey, and the results are shown in
Figure~\ref{fig-survey}.
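
The scoring rule can be written compactly as follows, with notation
introduced here only for illustration; computing the score per participant
is an assumption of this sketch. For participant~$u$ and measure~$m$, let
$A_{u,m}$ be the three apps suggested by that measure and let $r_{u,a}$ be
the participant's response for app~$a$. Then
\[
  S_{u,m} = \sum_{a \in A_{u,m}} s(r_{u,a}), \qquad
  s(\mathrm{Yes}) = 1, \quad
  s(\mathrm{Maybe}) = 0.5, \quad
  s(\mathrm{No}) = 0 .
\]
For example, a participant who answers ``Yes'', ``Maybe'', and ``No'' for the
three apps suggested by one measure gives that measure
$1 + 0.5 + 0 = 1.5$ out of a possible 3~points.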

Overall the results are inconclusive, with the content-delivery measure not
clearly outperforming the straw-man usage measure at predicting which apps
each user would be willing to remove to save battery life. Given the crude
nature of our metric, this is not particularly surprising, and can be
interpreted as a clear sign that we need a more sophisticated value measure
incorporating several of the potential inputs we have previously discussed.
However, on one level the results are very encouraging: most users were
willing to consider removing one or more apps if that app would improve their
battery lifetime. Clearly users are making this decision based on some idea
of each app's value---the challenge is to replicate their choices using the
information we have available to us.