\section{From Uncertainty to Certainty}
\label{sec-certainty}

While \texttt{maybe} allows programmers to specify multiple alternatives, ultimately only one alternative can be executed at runtime. Either a single, globally-optimal alternative must be identified, or a deterministic decision procedure must be developed. Before discussing options for adapting an app to its runtime environment, we first explain our runtime's support for \texttt{maybe} alternatives, including \textit{a posteriori} evaluation and data collection. Then, we discuss how \texttt{maybe} testing enables a variety of different adaptation patterns.

\subsection{Evaluating Alternatives}

The optional \texttt{evaluate} block of a \texttt{maybe} statement allows programmers to provide app-specific \textit{a posteriori} evaluation logic. However, in many cases, we expect that \texttt{maybe} statements will be used to achieve common objectives such as improving performance or saving energy. To streamline application development, our current system evaluates \texttt{maybe} statements without an \texttt{evaluate} block by measuring both energy and performance. In cases where one alternative optimizes both, that alternative will be used, although the decision may still vary over time because it depends on time-varying factors such as network availability. When alternatives produce an energy-performance tradeoff, we are exploring several options, including collapsing both metrics into a single score by computing the energy-delay product (EDP) of each alternative, or allowing users to set a per-app energy or performance preference. \texttt{evaluate} blocks can also record other information to aid adaptation. While the \texttt{score} value is used to evaluate the alternative, the entire JSON object returned by the \texttt{evaluate} block is delivered to the developer for later analysis. This allows \texttt{maybe} statements to be connected with end-to-end app performance metrics not visible on the device.
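
To make the EDP option concrete, the following is a minimal Python sketch and not the code of our Android prototype. The record type \texttt{Measurement}, the functions \texttt{edp}, \texttt{evaluate\_default}, and \texttt{choose\_alternative}, and all field names other than \texttt{score} are hypothetical names introduced for illustration. The sketch also assumes that a lower EDP is preferred and that repeated measurements of one alternative are averaged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One observed execution of a maybe alternative (hypothetical record)."""
    alternative: str
    energy_joules: float
    delay_seconds: float


def edp(m: Measurement) -> float:
    """Energy-delay product: collapses energy and delay into one number."""
    return m.energy_joules * m.delay_seconds


def evaluate_default(m: Measurement) -> dict:
    """Build a JSON-like result; only the 'score' key is named in the text,
    the remaining keys are illustrative extras for later analysis."""
    return {
        "score": edp(m),
        "alternative": m.alternative,
        "energy_joules": m.energy_joules,
        "delay_seconds": m.delay_seconds,
    }


def choose_alternative(measurements: list) -> str:
    """Return the alternative with the lowest mean EDP (assumed: lower is better)."""
    totals = {}  # alternative -> (sum of EDP, number of samples)
    for m in measurements:
        total, count = totals.get(m.alternative, (0.0, 0))
        totals[m.alternative] = (total + edp(m), count + 1)
    if not totals:
        raise ValueError("no measurements to compare")
    return min(totals, key=lambda a: totals[a][0] / totals[a][1])
```

Calling \texttt{choose\_alternative} on measurements collected for two alternatives returns the label of the one with the lower mean EDP. A per-app energy or performance preference could be modeled by weighting the two factors before multiplying, which this sketch does not attempt.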
We expect that some \texttt{evaluate} blocks may need to know which alternative was executed to compute a score---for example, if the two alternatives produce different quality output. We are exploring the use of automatically-generated labels to aid this process. If a \texttt{maybe} alternative throws an error, the system will bypass the \texttt{evaluate} block and give it the worst possible score. By integrating a form of record-and-replay~\cite{gomez2013reran}, it may be possible to roll back the failed alternative and retry another. While \texttt{maybe} is intended to enable adaptation, not avoid errors, the existence of other alternatives provides a way to work around failures caused by uncertainty. Fault tolerance may also encourage developers to use \texttt{maybe} statements to prototype alternatives to existing well-tested code. A final question concerns when a \texttt{maybe} alternative should be evaluated. Some alternatives may require evaluation immediately after execution. Others may require repeated execution over a longer period of time to perform a fair comparison. As described previously, \texttt{evaluate} blocks can indicate explicitly whether or not to continue evaluating the alternative, and we are determining how to make a similar choice available to \texttt{maybe} statements without \texttt{evaluate} blocks. In addition, \texttt{evaluate} blocks can store state across multiple alternative executions allowing them to evaluate not only micro- but also macro-level decisions. In both cases, however, the \texttt{maybe} system allows developers continuous per-statement control over alternative choice and evaluation as described in more detail later in this section. \subsection{\texttt{\large maybe} Alternative Testing} We next describe the pre- and post-deployment testing that helps developers to design an \textit{adaptation} policy, a strategy for ultimately selecting between alternatives. 
While the \texttt{maybe} system automates many of the tedious tasks normally associated with large-scale testing, we still provide ways for the developer to guide and control any step in the process. \subsubsection{Runtime control} To begin, we briefly outline how our Android prototype implements the \texttt{maybe} statement. We (1) rewrite each \texttt{maybe} conditional to an \texttt{if-else} statement controlled by a call into the \texttt{maybe} system and (2) generate a similar setter for each \texttt{maybe} variable. Variable values and code branches are now all under the control of a separate \texttt{maybe} service which can be deployed as a separate app or incorporated into the Android platform. It is responsible for communicating with the global \texttt{maybe} server to retrieve adaptation parameters for all \texttt{maybe}-enabled apps on the smartphone. When possible, we avoid interprocess communication during each \texttt{maybe} decision by caching decisions in the app, with the \texttt{maybe} service delivering cache invalidation messages when particular decisions change. The \texttt{maybe} service tracks when alternative decisions change, runs \texttt{evaluate} evaluation logic when appropriate, and returns testing results to the \texttt{maybe} server. \sloppypar{Because unexpected runtime variable changes could cause crashes or incorrect behavior, we only alter \texttt{maybe} variables when they are (re)initialized, not at arbitrary points during execution. If the app wants to enable periodic readaptation of certain variables, such as the interval controlling a timer, it can do so by periodically resetting the value using another \texttt{maybe} statement. This ensures that \texttt{maybe} variables only change when expected.} \subsubsection{Simulation or emulation} Pre-deployment simulation or emulation may provide a way to efficiently evaluate \texttt{maybe} statements without involving users. 
Building simulation environments that accurately reflect all of the uncertainties inherent to mobile systems programming, however, is difficult. To complicate matters, \texttt{maybe} alternatives may depend on details of user interaction that are difficult to know \textit{a priori}, particularly when new apps or features are being developed. So in most cases we believe post-deployment testing will be required. However, pre-deployment testing may still be a valuable approach, particularly when a large number of \texttt{maybe} statements are being used. Since this can explode the adaptation space, simulations may be able to help guide the developer's choices of which \texttt{maybe} statements may have a significant impact on performance and should be evaluated first. Other \texttt{maybe} statements can be evaluated later or eliminated.

\subsubsection{Split testing}

Eventually code containing a number of \texttt{maybe} statements will be deployed on thousands or millions of devices. At this point, large-scale split testing and data-driven learning can begin. If the user community is large enough, it may be possible to collect statistically-significant results even for all possible permutations of \texttt{maybe} alternatives. For apps with a small number of users, or a large number of \texttt{maybe} statements, we can collect data for variations of one or several \texttt{maybe} statements while holding the rest constant. As an adaptation policy is designed and deployed for the statement being tested, we begin to vary and measure the next group of \texttt{maybe} statements. Developers can observe and control the testing process through a web interface. Each time a \texttt{maybe} statement is reached or \texttt{maybe} variable is set, the \texttt{maybe} system records:
%
\begin{itemize}
\item what \texttt{maybe} was reached;
\item what alternative was used and why.
This includes all environmental features used to make the decision, as well as any other available provenance information;
\item what \texttt{evaluate} block evaluated the alternative, and the entire JSON object it returned, including the score;
\item and a variety of other environmental and configuration parameters that the user permits access to: a user identifier; device and platform information; networking provider and conditions; location; battery level; and so on.
\end{itemize}
This dataset is periodically uploaded to the \texttt{maybe} server and used to drive the adaptation approaches discussed next.

\subsubsection{Simultaneous split testing}

While large-scale split testing is intended to provide good coverage over all possible sources of uncertainty we have discussed, it still normally requires that only one decision be made at any given time, implying that two alternatives may never be evaluated under identical conditions. For \texttt{maybe} statements, however, we are exploring the idea of performing \textit{simultaneous} split testing. In this model the app forks at the top of the \texttt{maybe} statement, executes and scores all alternatives, and then continues with the outputs from the best alternative at the bottom of the \texttt{maybe} statement. On single-core devices this can be done in serial, while the growing number of multi-core smartphones provides the option of doing this in parallel. The benefit of this approach is that each alternative is executed under near-identical conditions. The drawbacks include the overhead of the redundant executions and the possibility for interference between alternatives executing in parallel.

\subsection{\texttt{\large maybe} Endgames}

The entire \texttt{maybe} approach is predicated on the assumption that there exists, among the alternatives, a right decision, even if it depends on many factors and uncertainties.
We continue by discussing how the dataset generated by post-deployment testing can be used to determine how to correctly choose \texttt{maybe} alternatives at runtime. \subsubsection{Simple cases} In the simplest case, testing may reveal that a single alternative performs the best on all devices, for all users, at all times. In this situation, the \texttt{maybe} system may offer a way for the developer to immediately cease testing of that alternative and even automatically rewrite that portion of code to remove the \texttt{maybe} statement. However, it is also possible that the situation may change in the future when a new device, or Android version, or battery technology is introduced, and so the programmer may also choose to preserve the flexibility in case it is useful in the future. The slightly more complicated case is when testing reveals that alternatives provide stable tradeoffs between energy and performance---one alternative always saves energy at the cost of performance. In this case the system only has to determine whether to prioritize energy or performance. While this decision seems simple, it is itself complicated by differences in battery capacity, charging habits, mixtures of installed apps, and the importance of the app to each user. However, the stability of the alternatives' outcomes means that once an energy or performance policy decision has been made, the choice of alternative has also been made. \subsubsection{Static adaptation} In the more complicated cases, testing reveals that the choice of alternative depends on some subset of the factors driving uncertainty in mobile systems programming. We break this group into two subsets, depending on whether the adaptation is time varying (dynamic) or not (static). We begin with the second, somewhat easier case. If the alternative is determined through static adaptation then the correct decision is a function of some unchanging (or very-slowly changing) aspect of the deployed environment. 
Examples include the device model, average network conditions, the other apps installed on the device, or user characteristics such as gender, age, or charging habits. In this case it is possible that the correct alternative can be determined through clustering based on these features, and once determined will remain the best choice for a long time.

\subsubsection{Dynamic adaptation}

If the choice of alternative depends on dynamic factors such as the accuracy of location services, the amount of energy left in the battery, or the type of network the device is currently connected to, then it is possible that no single alternative can be chosen even for a single user. Instead, the \texttt{maybe} system allows developers to evaluate one or more strategies to drive the runtime alternative selection process. Note that \texttt{evaluate} blocks are \textit{not} intended to accomplish this kind of adaptation. First, they run after the \texttt{maybe} statement has been executed, not before. Second, a per-\texttt{maybe} strategy defeats the flexibility inherent to the \texttt{maybe} approach and would devolve into the fragile decision-making we are trying to avoid. Instead, the \texttt{maybe} system allows developers to experiment with and evaluate a variety of different dynamic adaptation strategies deployed in a companion library, with the decision guided by post-deployment testing. For example, if the performance of an alternative is discovered to be correlated with a link providing a certain amount of bandwidth, then that adaptation strategy can be connected with that particular \texttt{maybe} statement. Observe that in some cases of dynamic adaptation, what begins as a \texttt{maybe} statement may effectively end as an \texttt{if-else} statement switching on a static threshold, the same approach we attacked to motivate our system.
However, through the process of arriving at this point we have determined several things that were initially unknown: (1) what the alternatives accomplish, (2) that a single threshold works for all users, and (3) what that threshold is. And by maintaining the choice as a \texttt{maybe} statement, developers can continue adapting as devices, users, and networks change. Another benefit of this approach is that time-varying decisions can be outsourced to developers with expertise in the particular area affecting adaptation decisions. For example, by exposing an energy-performance tradeoff through a \texttt{maybe} statement, a developer allows it to be connected to a sophisticated machine learning algorithm written by an expert in energy adaptation, instead of their own ad-hoc approach.

\subsubsection{Manual adaptation}

In some cases even our best efforts to automatically adapt may fail, and it may be impossible to predict which alternative is best for a particular user using a particular device at a particular time. If the differences between the alternatives are small, then it may be appropriate to simply fall back to a best-effort decision. However, if the differences between the alternatives are significant then the \texttt{maybe} alternatives may need to be exposed to the user through a settings menu. Fortunately, information obtained through testing can still be presented to the user to guide their decision. Note that this requires labeling alternatives in a human-readable way.

\subsection{Continuous Adaptation}
\label{subsec-continuous}

Finally, even once a decision process for a particular \texttt{maybe} alternative has been developed, it should be periodically revisited as users, devices, networks, batteries, and other factors affecting mobile apps continue to change.
To enable continuous adaptation, developers can configure \texttt{maybe} statements to continue to periodically experiment with alternatives other than the one selected by the alternative testing process. Changes in alternative performance relative to the expectations established during the last round of alternative testing may trigger a large-scale reexamination of that \texttt{maybe} statement using the same process described above.
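
One way to picture this loop is the Python sketch below. It is an illustration under stated assumptions and not the \texttt{maybe} system's implementation: the class \texttt{ContinuousAdapter}, its parameters \texttt{explore\_rate}, \texttt{tolerance}, and \texttt{min\_samples}, and the relative-deviation test are all hypothetical choices. Occasional experimentation is modeled as a fixed exploration probability, and a reexamination is flagged when the recent mean score of any alternative drifts from the baseline established in the last testing round.

```python
import random


class ContinuousAdapter:
    """Hypothetical sketch: mostly use the selected alternative, sometimes
    try another one, and flag a retest when scores drift from expectations."""

    def __init__(self, alternatives, selected, expected,
                 explore_rate=0.05, tolerance=0.2, rng=None):
        self.alternatives = list(alternatives)
        self.selected = selected            # winner of the last testing round
        self.expected = dict(expected)      # baseline mean score per alternative
        self.explore_rate = explore_rate    # probability of experimenting
        self.tolerance = tolerance          # allowed relative drift from baseline
        self.rng = rng or random.Random()
        self.recent = {a: [] for a in self.alternatives}

    def choose(self):
        """Pick the alternative to run for this execution."""
        others = [a for a in self.alternatives if a != self.selected]
        if others and self.rng.random() < self.explore_rate:
            return self.rng.choice(others)
        return self.selected

    def record(self, alternative, score):
        """Store a score observed after running an alternative."""
        self.recent[alternative].append(score)

    def needs_retest(self, min_samples=5):
        """True if any alternative's recent mean deviates from its baseline
        by more than `tolerance` (relative), given enough samples."""
        for alt, scores in self.recent.items():
            baseline = self.expected.get(alt)
            if baseline is None or len(scores) < min_samples:
                continue
            mean = sum(scores) / len(scores)
            if abs(mean - baseline) > self.tolerance * abs(baseline):
                return True
        return False
```

In use, an app would call \texttt{choose} each time the \texttt{maybe} statement is reached, pass the resulting score to \texttt{record}, and periodically check \texttt{needs\_retest} to decide whether to request a new round of large-scale testing. The exploration rate trades testing overhead against how quickly changes are noticed.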