\section{From Uncertainty to Certainty}
\label{sec-certainty}

While \texttt{maybe} allows programmers to specify multiple alternatives,
ultimately only one alternative may be used. Either a single, globally best
alternative must be chosen, or a deterministic decision procedure must be
selected. Before discussing options for \textit{adapting} an app to its
runtime environment, we first explain our runtime's support for
\texttt{maybe} alternatives, including \textit{a posteriori} evaluation and
data-collection. Then, we discuss how \texttt{maybe} testing enables a
variety of different adaptation patterns.

\subsection{Evaluating Alternatives}

The optional \texttt{better} block of
a \texttt{maybe} statement allows programmers to provide app-specific
\textit{a posteriori} evaluation logic. However, in many cases, we expect
that \texttt{maybe} statements will be used to achieve common objectives such
as improving performance or saving energy. To streamline application
development, our current system assesses \texttt{maybe} statements without a
\texttt{better} block in terms of both energy and performance. In cases where
one alternative is a clear winner in terms of both energy and performance,
that alternative will be used---although note that this selection may still
need to be time-varying since it may depend on exogenous factors.

Cases where \texttt{maybe} blocks provide an energy-performance tradeoff
require more investigation to handle. We are exploring several options,
including collapsing both metrics into a single score by computing the
energy-delay product of each alternative, or allowing users to set a single
per-app preference indicating whether energy or performance should determine
\texttt{maybe} alternative selection.
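
As a sketch of the first option, suppose alternative $i$ consumes energy $E_i$ and takes time $t_i$ to complete. These symbols are introduced here only for illustration. The energy-delay product and the resulting selection would then be
\begin{equation}
\mathrm{EDP}_i = E_i \cdot t_i, \qquad i^{*} = \arg\min_i \mathrm{EDP}_i .
\end{equation}
For example, an alternative that uses 2\,J over 3\,s ($\mathrm{EDP} = 6$\,J\,s) would be preferred to one that uses 4\,J over 2\,s ($\mathrm{EDP} = 8$\,J\,s), even though the latter is faster.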

When \texttt{maybe} statements include a \texttt{better} block, their custom
evaluation logic is expected to merge any relevant factors into a single
score, although \texttt{better} blocks can still record other relevant
information to aid in understanding app behavior. Specifically,
\texttt{better} blocks return a JSON object where the ``score'' key is used
as the score but all other keys are recorded for later analysis. Our standard
energy and performance \texttt{better} block records energy and performance
separately in the JSON object but combines them as required into a single
score. Note that because the \texttt{better} block delivers information to
the developer through the \texttt{maybe} service, as described below,
\texttt{maybe} statements can be connected with end-to-end app performance
metrics that would normally not be visible on the device. \texttt{better}
blocks may also want to query the user directly, and we are considering ways
to adjust their semantics to make this possible.
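
To make the shape of the returned object concrete, the following Python sketch shows what a combined energy and performance \texttt{better} block might return. It is illustrative only: the function name, the key names other than ``score'', the use of the energy-delay product as the combining rule, and the convention that lower scores are better are all assumptions, not a description of our prototype.

```python
import json


def default_better(energy_joules, latency_seconds):
    """Hypothetical stand-in for a combined energy/performance better block."""
    # Assumption: metrics are merged with the energy-delay product,
    # and a lower score is treated as better.
    score = energy_joules * latency_seconds
    return json.dumps({
        "score": score,                      # the only key used for selection
        "energy_joules": energy_joules,      # recorded for later analysis
        "latency_seconds": latency_seconds,  # recorded for later analysis
    })


result = json.loads(default_better(2.0, 3.0))
```

Only the ``score'' entry would drive selection; the remaining entries travel with it so that the developer can later inspect why an alternative scored as it did.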

In some cases we expect that \texttt{better} blocks may need to know which
alternative was executed in order to determine the score---for example, if
the two alternatives produce different quality output while trading off
performance or energy consumption. The simplest solution is to have each
alternative set a variable indicating that it was used, but we believe that
eventually including some form of labeling syntax for \texttt{maybe}
alternatives may be beneficial.

As a side note, if a \texttt{maybe} alternative encounters an error, the
system automatically falls back to another alternative and assigns the
error-generating alternative the worst possible score. In this case, custom
evaluation logic is not executed for the failing alternative.
While \texttt{maybe} is not designed as a way to avoid errors, the existence
of other alternatives gives our system a way to work around
failures that are sometimes caused by environmental factors.
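
The fallback behavior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: \texttt{run\_with\_fallback} and \texttt{WORST\_SCORE} are hypothetical names, alternatives are tried in list order, and lower scores are assumed to be better so that infinity is the worst score.

```python
import math

WORST_SCORE = math.inf  # assumption: lower scores are better


def run_with_fallback(alternatives, better):
    """Try alternatives in order; return (label, output, scores)."""
    scores = {}
    for label, alternative in alternatives:
        try:
            output = alternative()
        except Exception:
            # The failing alternative gets the worst score and its
            # custom evaluation logic is skipped.
            scores[label] = WORST_SCORE
            continue
        scores[label] = better(output)
        return label, output, scores
    raise RuntimeError("all maybe alternatives failed")


def fetch_from_network():
    raise OSError("network unreachable")


def read_from_cache():
    return "cached result"


label, output, scores = run_with_fallback(
    [("fetch", fetch_from_network), ("cache", read_from_cache)],
    better=lambda out: 1.0,
)
```

Here the network alternative fails, so the cached alternative supplies the output and only it is passed to the evaluation function.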

A final question concerns when a \texttt{maybe} alternative should be assessed.
In some
cases it may be appropriate for post-mortem evaluation to happen immediately after execution.
In others, it may be necessary to execute the same alternative over a period
of time to perform a fair comparison. As described previously, 
\texttt{better} blocks can indicate explicitly whether or not to continue
evaluating the alternative, and we are determining how to make a similar
choice available to \texttt{maybe} statements that do not use customized
evaluation logic. In both cases, however, the \texttt{maybe} system allows
developers continuous per-statement control over alternative evaluation and
selection as described in more detail later in this section.

\subsection{\texttt{\large maybe} Alternative Testing}

We next describe the pre- and post-deployment testing that helps developers to
design an \textit{adaptation} policy, a strategy for ultimately selecting 
between alternatives. While the \texttt{maybe} system
automates many of the tedious tasks normally associated with large-scale
testing, we still provide ways for the developer to guide and even override
any step in the process.

\subsubsection{Runtime Control}

To begin, we briefly outline how our Android prototype system implements the
\texttt{maybe} construct. We (1) rewrite each \texttt{maybe} code block to an
\texttt{if-else} statement controlled by a call into the \texttt{maybe}
system and (2) generate a similar setter for each \texttt{maybe} variable.
Variable values and code branches are now all under the control of a separate
\texttt{maybe} service which can be deployed as a separate app or
incorporated into the Android platform itself. It is responsible for
communicating with the global \texttt{maybe} server to retrieve adaptation
parameters for all \texttt{maybe}-enabled apps on the smartphone. When possible,
we avoid interprocess communication during each
\texttt{maybe} decision by caching decisions in the app. The local
\texttt{maybe} service pushes a cache invalidation message whenever a cached
decision changes. The
\texttt{maybe} service tracks when alternatives change, runs both generic and
app-specific alternative assessment logic when appropriate, and returns
testing results to the \texttt{maybe} server.
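
The rewriting and caching steps can be illustrated with a small Python sketch. Our prototype targets Android, so this is not its code: \texttt{MaybeClient}, \texttt{choose}, \texttt{invalidate}, and the statement identifier are hypothetical names, and a plain function call stands in for interprocess communication with the \texttt{maybe} service.

```python
class MaybeClient:
    """Hypothetical in-app stub that caches decisions from the maybe service."""

    def __init__(self, fetch_decision):
        self._fetch_decision = fetch_decision  # stands in for the IPC call
        self._cache = {}
        self.ipc_calls = 0

    def choose(self, statement_id):
        # Only contact the service when no cached decision exists.
        if statement_id not in self._cache:
            self.ipc_calls += 1
            self._cache[statement_id] = self._fetch_decision(statement_id)
        return self._cache[statement_id]

    def invalidate(self, statement_id):
        # Called when the service pushes a cache invalidation message.
        self._cache.pop(statement_id, None)


def sync_data(client):
    # What a two-alternative maybe block might look like after rewriting.
    if client.choose("sync-strategy") == 0:
        return "upload immediately"
    else:
        return "batch and upload later"


decisions = {"sync-strategy": 0}
client = MaybeClient(decisions.get)
first = sync_data(client)
second = sync_data(client)           # served from the cache
calls_before_change = client.ipc_calls
decisions["sync-strategy"] = 1       # the service changes its decision
client.invalidate("sync-strategy")
third = sync_data(client)
```

The second call is served from the cache, and only the invalidation forces a new round trip to the service.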

Because unpredictable changes to variables at runtime could cause crashes or
incorrect program behavior, our \texttt{maybe} prototype currently only
changes the values of these variables when they are set using a
\texttt{maybe} statement. If the app wants to enable periodic readaptation of
certain global variables, such as the interval controlling a timer, it can do
so by periodically resetting the value using another \texttt{maybe}
statement. This ensures that \texttt{maybe} variables only change when the
developer expects them to.

\subsubsection{Simulation or Emulation}

Pre-deployment simulation or emulation may provide a way to efficiently
assess \texttt{maybe} blocks without involving users. Building simulation
environments that accurately reflect all of the uncertainties inherent to
mobile systems programming, however, is difficult. To complicate matters,
\texttt{maybe} alternatives may depend on details of user interaction that
are difficult to know \textit{a priori}, particularly when new apps and app
functionalities are being investigated. In most cases, therefore, we believe that
post-deployment testing will still be required.

However, pre-deployment testing may still be a valuable approach, particularly
when many uncertainties are present and a correspondingly large
number of \texttt{maybe} statements are being used. Because many
\texttt{maybe} statements can cause the adaptation space to explode, simulation
may help the developer identify which \texttt{maybe} statements are likely to
have a significant impact on performance and should be assessed first. Other
\texttt{maybe} statements can be assessed later or eliminated altogether.

\subsubsection{Split Testing}

Eventually code containing a number of \texttt{maybe} statements will be
deployed on thousands or millions of devices. At this point, large-scale
split testing and learning can begin. If the user community is large enough,
then it may be possible to collect statistically significant results even for
all possible combinations of \texttt{maybe} alternatives. For apps
with a small number of users, or a comparatively large number of \texttt{maybe}
statements, we can collect data for variations of one \texttt{maybe} statement
at a time while holding the others constant. Once an adaptation policy has been
designed and deployed for the statement being tested, we begin to vary and
measure the next \texttt{maybe} statement. We
will provide a web interface allowing the developer to determine
which \texttt{maybe} statements should be tested at any given time.
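
Varying one statement at a time can be sketched as follows. The function \texttt{assign}, the statement names, and the use of a CRC of the user identifier to spread users across alternatives are assumptions chosen for illustration; any stable assignment of users to alternatives would serve the same purpose.

```python
import zlib


def assign(baseline, under_test, num_alternatives, user_id):
    """Return a per-user configuration that varies only one maybe statement."""
    config = dict(baseline)  # all other statements keep their baseline choice
    # A stable hash keeps each user on the same alternative across runs.
    bucket = zlib.crc32(user_id.encode("utf-8")) % num_alternatives
    config[under_test] = bucket
    return config


baseline = {"sync-strategy": 0, "image-quality": 1, "retry-interval": 0}
configs = [
    assign(baseline, "image-quality", 3, user_id)
    for user_id in ("alice", "bob", "carol", "dave")
]
```

Because the assignment depends only on the user identifier, repeated runs by the same user contribute to the same arm of the test.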

Each time a \texttt{maybe} statement is reached or \texttt{maybe} variable is
set, the \texttt{maybe} system records:

\begin{itemize}

\item what \texttt{maybe} was reached;

\item what alternative was used and why.  This includes all environmental
features used to make the selection, as well as any other available 
provenance information;

\item what \texttt{better} block evaluated the alternative, and the entire
JSON object it returned, including the score;

\item and a variety of other environmental and configuration parameters
that the user permits access to: a user identifier, device
and platform information, networking provider and conditions, location,
battery level, and so on.

\end{itemize}
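
One possible shape for such a record is sketched below in Python. The class and field names, and the example values, are hypothetical and chosen only to mirror the items listed above; they do not describe the actual format used by our prototype.

```python
from dataclasses import asdict, dataclass, field
import json


@dataclass
class MaybeRecord:
    """Hypothetical log entry for one maybe decision."""
    statement_id: str        # which maybe was reached
    alternative: int         # which alternative was used
    selection_reason: dict   # features and provenance behind the choice
    better_block: str        # which better block evaluated the alternative
    better_result: dict      # entire JSON object returned, including score
    environment: dict = field(default_factory=dict)  # user-permitted context


record = MaybeRecord(
    statement_id="sync-strategy",
    alternative=1,
    selection_reason={"policy": "split-test", "network": "wifi"},
    better_block="default-energy-performance",
    better_result={"score": 6.0, "energy_joules": 2.0, "latency_seconds": 3.0},
    environment={"device_model": "example-phone", "battery_level": 0.8},
)
payload = json.dumps(asdict(record))  # serialized form that could be uploaded
```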

This dataset is periodically uploaded to the \texttt{maybe} server by the
\texttt{maybe} service and used to drive the adaptation approaches discussed
next.

\subsubsection{Simultaneous Split Testing}

While large-scale split testing is intended to provide good coverage over all
possible sources of uncertainty we have discussed, it still normally requires
that only one choice be made at any given time---implying that two
alternatives may \textit{never} be evaluated under identical conditions. For
\texttt{maybe} code blocks, however, we are exploring the idea of performing
\textit{simultaneous} split testing. In this model the app forks at the top
of the \texttt{maybe} block, executes and scores all alternatives, and then
continues with the outputs from the best alternative at the bottom of the
\texttt{maybe} statement. On single-core devices this can be done in serial,
while the growing number of multi-core smartphones provides the option of
doing this in parallel. The benefit of this approach is that each alternative
is executed under near-identical conditions. The drawbacks include the
overhead of the redundant executions and the possibility of interference
between alternatives executing in parallel. However, we are excited to
explore this option in our prototype.
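
A serial version of this idea can be sketched in Python. The sketch makes several assumptions that should be read as such: deep-copying a state dictionary stands in for forking the app, the function and alternative names are hypothetical, and lower scores are treated as better.

```python
import copy


def simultaneous_split_test(state, alternatives, better):
    """Run every alternative from the same starting state and keep the best."""
    results = []
    for label, alternative in alternatives:
        snapshot = copy.deepcopy(state)  # stands in for forking the app
        output = alternative(snapshot)
        results.append((better(output), label, output))
    best_score, best_label, best_output = min(results, key=lambda r: r[0])
    scores = {label: score for score, label, _ in results}
    return best_label, best_output, scores


def fast(state):
    state["log"].append("fast")
    return {"quality": 1, "cost": 2.0}


def thorough(state):
    state["log"].append("thorough")
    return {"quality": 3, "cost": 5.0}


state = {"log": []}
label, output, scores = simultaneous_split_test(
    state,
    [("fast", fast), ("thorough", thorough)],
    better=lambda out: out["cost"] / out["quality"],  # cost per unit quality
)
```

In this sketch the side effects of the alternatives are confined to their snapshots. A real implementation would also have to decide how to carry the winning alternative's side effects forward, which the sketch does not address.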

\subsection{\texttt{\large maybe} Endgames}

The entire \texttt{maybe} approach is predicated on the premise that a right
choice exists among the alternatives, even if it depends on many
factors and uncertainties. We continue by discussing how the dataset
generated by post-deployment testing can be consumed to determine how to
correctly choose \texttt{maybe} alternatives at runtime.

\subsubsection{Simple Cases}

In the simplest case, testing may reveal that a single alternative performs
the best on all devices, for all users, at all times. In this situation, the
\texttt{maybe} system may offer a way for the developer to immediately cease
testing of that alternative and even automatically rewrite that portion of
code to remove the \texttt{maybe} statement. However, it is also possible
that the situation may change in the future when a new device, or Android
version, or battery technology is introduced, and so the programmer may also
choose to preserve the alternatives to enable later retesting and possible
adaptation.

The slightly more complicated case is when testing reveals that alternatives
provide stable tradeoffs between energy and performance---one block always
saves energy at the cost of performance. In this case the system only has to
determine whether to prioritize energy or performance. While this decision
seems simple, it is itself complicated by differences in battery capacity,
charging habits, mixtures of installed apps, and the importance of the app to
each user. However, the stability of the alternatives' outcomes means that
once an energy or performance policy decision has been made, the choice of
alternative has also been made.

\subsubsection{Static Adaptation}

In the more complicated cases, testing reveals that the choice of alternative
depends on some subset of the factors driving uncertainty in mobile systems
programming. We break this group into two subsets, depending on whether the
adaptation is time varying (dynamic) or not (static). We begin with the
second, somewhat easier case.

If the alternative is determined through static adaptation then the correct
choice is a function of some unchanging (or very slowly changing) aspect of
the deployed environment. Examples include the model of the device, overall
network conditions in the country in which the device is being used, the
other apps installed on the device, or some slowly changing user
characteristic such as gender, age, or charging habits. In this case it is
possible that the correct alternative can be determined through some
clustering based on these features, and once determined will remain the best
choice over long time intervals.
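
As a much simpler stand-in for clustering, the following Python sketch groups test results by a single static feature, the device model, and picks the alternative with the lowest mean score in each group. The function name, the record layout, and the lower-is-better convention are assumptions made for illustration.

```python
from collections import defaultdict


def learn_static_policy(records):
    """Map each device model to the alternative with the lowest mean score."""
    by_model = defaultdict(lambda: defaultdict(list))
    for device_model, alternative, score in records:
        by_model[device_model][alternative].append(score)

    policy = {}
    for device_model, by_alt in by_model.items():
        means = {alt: sum(s) / len(s) for alt, s in by_alt.items()}
        policy[device_model] = min(means, key=means.get)
    return policy


records = [
    ("phone-a", 0, 4.0), ("phone-a", 0, 6.0), ("phone-a", 1, 7.0),
    ("phone-b", 0, 9.0), ("phone-b", 1, 3.0),
]
policy = learn_static_policy(records)
```

A real policy would likely combine several features and would need enough samples per group to be trustworthy; the sketch ignores both concerns.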

\subsubsection{Dynamic Adaptation}

If the choice of alternative depends on dynamic factors such as the accuracy
of location services, the amount of energy left in the battery, or the type
of network the device is currently connected to, then it is possible that no
single alternative can be chosen even for a single user. Instead, the
\texttt{maybe} system allows developers to evaluate one or more strategies to
drive the runtime alternative selection process.

Note that \texttt{better} blocks are \textit{not} intended to accomplish this
kind of adaptation. First, they run after the \texttt{maybe} block has been
executed, not before. Second, a single strategy defeats the flexibility
inherent to the \texttt{maybe} approach and would devolve into the same
fragile decision-making logic we are trying to avoid.

Instead, the \texttt{maybe} system allows developers to experiment with and
evaluate a variety of different dynamic adaptation strategies deployed in a
companion library, with the choice guided by post-deployment testing. For
example, if the performance of an alternative is discovered to be correlated
with the presence of a network link with over a certain bandwidth, then that
adaptation strategy can be connected with that particular \texttt{maybe}
block.
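
The bandwidth example might look as follows in a Python sketch. All names here are hypothetical, including the \texttt{STRATEGIES} registry, the statement identifier, and the 5\,Mbps threshold, which is a placeholder rather than a measured value.

```python
def bandwidth_strategy(threshold_mbps):
    """Build a strategy that picks alternative 0 on sufficiently fast links."""
    def choose(context):
        return 0 if context["bandwidth_mbps"] >= threshold_mbps else 1
    return choose


# Hypothetical registry connecting maybe statements to strategies.
STRATEGIES = {"prefetch-images": bandwidth_strategy(5.0)}


def choose_alternative(statement_id, context):
    # Runs before the maybe block, unlike a better block, which runs after it.
    return STRATEGIES[statement_id](context)


on_wifi = choose_alternative("prefetch-images", {"bandwidth_mbps": 20.0})
on_cellular = choose_alternative("prefetch-images", {"bandwidth_mbps": 1.0})
```

Keeping the strategy separate from the \texttt{maybe} block means that the threshold, or the whole strategy, can be replaced as testing data accumulates without touching app code.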

Observe that in some cases of dynamic adaptation, what begins as a
\texttt{maybe} statement may effectively end as an \texttt{if-else} block
switching on some of the same static thresholds that we attacked to motivate
our system. However, through the process of arriving at this point we have
determined several things that were initially unknown: what the alternatives
accomplish, that a single threshold works for all users, and what that
threshold is. And if the developer chooses to maintain that statement as a
\texttt{maybe} block, they can continue to perform testing and modify their
adaptation strategy as devices and users change.

Another benefit of this approach is that time-varying decisions can be
outsourced to developers with expertise in the particular area driving
adaptation policy decisions. For example, if the developer chooses to make an
energy-performance tradeoff dynamically based on the energy conditions at
that time, they can connect that \texttt{maybe} statement to a sophisticated
machine learning algorithm written by an expert in energy adaptation, instead
of being forced to implement their own simplistic approach.

\subsubsection{Manual Adaptation}

In some cases even our best efforts to automatically adapt may fail, and it
may be impossible to predict which alternative is best for a particular user
using a particular device at a particular time. If the differences between
the alternatives are small, then it may be appropriate to simply fall back to
a best-effort choice. However, if the differences between the alternatives
are significant then the \texttt{maybe} alternatives may need to be exposed
to the user through a settings menu. Fortunately, information obtained
through testing can still be presented to the user to guide their decision.
Note that this requires labeling alternatives in a human-readable way.