Blame view

certainty.tex 15.4 KB
Geoffrey Challen authored
1
2
\section{From Uncertainty to Certainty}
\label{sec-certainty}
Geoffrey Challen authored
3
Geoffrey Challen authored
4
While \texttt{maybe} allows programmers to specify multiple alternatives,
Geoffrey Challen authored
5
6
7
8
ultimately only one alternative can be executed at runtime. Either a single,
globally-optimal alternative must be identified, or a deterministic decision
procedure must be developed. Before discussing options for adapting an app to
its runtime environment, we first explain our runtime's support for
Geoffrey Challen authored
9
\texttt{maybe} alternatives, including \textit{a posteriori} evaluation and
Geoffrey Challen authored
10
data collection. Then, we discuss how \texttt{maybe} testing enables a
Geoffrey Challen authored
11
variety of different adaptation patterns.
Geoffrey Challen authored
12
Geoffrey Challen authored
13
14
15
16
17
\subsection{Evaluating Alternatives}

The optional \texttt{evaluate} block of a \texttt{maybe} statement allows
programmers to provide app-specific \textit{a posteriori} evaluation logic.
However, in many cases, we expect that \texttt{maybe} statements will be used
Geoffrey Challen authored
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
to achieve common objectives such as improving performance or saving energy. To
streamline application development, our current system evaluates \texttt{maybe}
statements without a \texttt{evaluate} block by measuring both energy and
performance. In cases where one alternative optimizes both, that alternative
will be used---although the decision may still be time-varying due to
dependence on time-varying factors such as network availability. When
alternatives produce an energy-performance tradeoff we are exploring several
options, including collapsing both metrics into a single score by computing the
energy-delay product (EDP) of each alternative, or allowing users to set a
per-app energy or performance preference.

\texttt{evaluate} blocks can also record other information to aid adaptation.
While the \texttt{score} value is used to evaluate the alternative, the
entire JSON object returned by the \texttt{evaluate} block is delivered to
the developer for later analysis. This allows \texttt{maybe} statements to be
connected with end-to-end app performance metrics not visible on the device.
We expect that some \texttt{evaluate} blocks may need to know which
alternative was executed to compute a score---for example, if the two
alternatives produce different quality output. We are exploring the use of
automatically-generated labels to aid this process.
Geoffrey Challen authored
38
39
40

If a \texttt{maybe} alternative throws an error, the system will bypass the
\texttt{evaluate} block and give it the worst possible score. By integrating
Geoffrey Challen authored
41
a form of record-and-replay~\cite{gomez2013reran}, it may be possible to roll
Geoffrey Challen authored
42
43
44
45
46
back the failed alternative and retry another. While \texttt{maybe} is
intended to enable adaptation, not avoid errors, the existence of other
alternatives provides a way to work around failures caused by uncertainty.
Fault tolerance may also encourage developers to use \texttt{maybe}
statements to prototype alternatives to existing well-tested code.
Geoffrey Challen authored
47
48

A final question concerns when a \texttt{maybe} alternative should be
Geoffrey Challen authored
49
50
51
52
evaluated. Some alternatives may require evaluation immediately after
execution. Others may require repeated execution over a longer period of time
to perform a fair comparison. As described previously, \texttt{evaluate}
blocks can indicate explicitly whether or not to continue evaluating the
Geoffrey Challen authored
53
54
55
56
57
58
59
alternative, and we are determining how to make a similar choice available to
\texttt{maybe} statements without \texttt{evaluate} blocks. In addition,
\texttt{evaluate} blocks can store state across multiple alternative
executions allowing them to evaluate not only micro- but also macro-level
decisions. In both cases, however, the \texttt{maybe} system allows
developers continuous per-statement control over alternative choice and
evaluation as described in more detail later in this section.
Geoffrey Challen authored
60
Geoffrey Challen authored
61
62
\subsection{\texttt{\large maybe} Alternative Testing}
Geoffrey Challen authored
63
64
65
66
67
We next describe the pre- and post-deployment testing that helps developers
to design an \textit{adaptation} policy, a strategy for ultimately selecting
between alternatives. While the \texttt{maybe} system automates many of the
tedious tasks normally associated with large-scale testing, we still provide
ways for the developer to guide and control any step in the process.
Geoffrey Challen authored
68
Geoffrey Challen authored
69
\subsubsection{Runtime control}
Geoffrey Challen authored
70
Geoffrey Challen authored
71
To begin, we briefly outline how our Android prototype implements the
Geoffrey Challen authored
72
73
\texttt{maybe} statement. We (1) rewrite each \texttt{maybe} conditional to
an \texttt{if-else} statement controlled by a call into the \texttt{maybe}
Geoffrey Challen authored
74
75
76
system and (2) generate a similar setter for each \texttt{maybe} variable.
Variable values and code branches are now all under the control of a separate
\texttt{maybe} service which can be deployed as a separate app or
Geoffrey Challen authored
77
78
79
80
81
82
83
84
85
86
incorporated into the Android platform. It is responsible for communicating
with the global \texttt{maybe} server to retrieve adaptation parameters for
all \texttt{maybe}-enabled apps on the smartphone. When possible, we avoid
interprocess communication during each \texttt{maybe} decision by caching
decisions in the app, with the \texttt{maybe} service delivering cache
invalidation messages when particular decisions change. The \texttt{maybe}
service tracks when alternative decisions change, runs \texttt{evaluate}
evaluation logic when appropriate, and returns testing results to the
\texttt{maybe} server.
Geoffrey Challen authored
87
88
89
90
91
92
93
\sloppypar{Because unexpected runtime variable changes could cause crashes or
incorrect behavior, we only alter \texttt{maybe} variables when they are
(re)initialized, not at arbitrary points during execution. If the app wants
to enable periodic readaptation of certain variables, such as the interval
controlling a timer, it can do so by periodically resetting the value using
another \texttt{maybe} statement. This ensures that \texttt{maybe} variables
only change when expected.}
Geoffrey Challen authored
94
Geoffrey Challen authored
95
\subsubsection{Simulation or emulation}
Geoffrey Challen authored
96
97

Pre-deployment simulation or emulation may provide a way to efficiently
Geoffrey Challen authored
98
99
100
101
102
evaluate \texttt{maybe} statements without involving users. Building
simulation environments that accurately reflect all of the uncertainties
inherent to mobile systems programming, however, is difficult. To complicate
matters, \texttt{maybe} alternatives may depend on details of user
interaction that are difficult to know \textit{a priori}, particularly when
Geoffrey Challen authored
103
104
new apps or features are being developed. So in most cases we believe
post-deployment testing will be required.
Geoffrey Challen authored
105
Geoffrey Challen authored
106
107
108
109
However, pre-deployment testing may still be a valuable approach,
particularly when a large number of \texttt{maybe} statements are being used.
Since this can explode the adaptation space, simulations may be able to help
guide the developer's choices of which \texttt{maybe} statements may have a
Geoffrey Challen authored
110
111
significant impact on performance and should be evaluated first. Other
\texttt{maybe} statements can be evaluated later or eliminated.
Geoffrey Challen authored
112
Geoffrey Challen authored
113
\subsubsection{Split testing}
Geoffrey Challen authored
114
115
116

Eventually code containing a number of \texttt{maybe} statements will be
deployed on thousands or millions of devices. At this point, large-scale
Geoffrey Challen authored
117
118
119
120
121
122
split testing and data-driven learning can begin. If the user community is
large enough, it may be possible to collect statistically-significant results
even for all possible permutations of \texttt{maybe} alternatives. For apps
with a small number of users, or a large number of \texttt{maybe} statements,
we can collect data for variations of one or several \texttt{maybe}
statements while holding the rest constant. As an adaptation policy is
Geoffrey Challen authored
123
designed and deployed for the statement being tested, we begin to vary and
Geoffrey Challen authored
124
125
measure the next group of \texttt{maybe} statements. Developers can observe
and control the testing process through a web interface.
Geoffrey Challen authored
126
127
128

Each time a \texttt{maybe} statement is reached or \texttt{maybe} variable is
set, the \texttt{maybe} system records:
Geoffrey Challen authored
129
%
Geoffrey Challen authored
130
131
132
133
\begin{itemize}

\item what \texttt{maybe} was reached;
Oliver Kennedy authored
134
\item what alternative was used and why.  This includes all environmental
Geoffrey Challen authored
135
features used to make the decision, as well as any other available 
Oliver Kennedy authored
136
provenance information;
Geoffrey Challen authored
137
Geoffrey Challen authored
138
\item what \texttt{evaluate} block evaluated the alternative, and the entire
Oliver Kennedy authored
139
JSON object it returned, including the score;
Geoffrey Challen authored
140
Oliver Kennedy authored
141
\item and a variety of other environmental and configuration parameters 
Oliver Kennedy authored
142
that the user permits access to: A user identifier; device
Geoffrey Challen authored
143
144
145
146
and platform information; networking provider and conditions; location;
battery level; and so on.

\end{itemize}
Geoffrey Challen authored
147
Geoffrey Challen authored
148
149
This dataset is periodically uploaded to the \texttt{maybe} server and used
to drive the adaptation approaches discussed next.
Geoffrey Challen authored
150
Geoffrey Challen authored
151
\subsubsection{Simultaneous split testing}
Geoffrey Challen authored
152
153
154

While large-scale split testing is intended to provide good coverage over all
possible sources of uncertainty we have discussed, it still normally requires
Geoffrey Challen authored
155
that only one decision be made at any given time---implying that two
Geoffrey Challen authored
156
alternatives may never be evaluated under identical conditions. For
Geoffrey Challen authored
157
\texttt{maybe} statements, however, we are exploring the idea of performing
Geoffrey Challen authored
158
\textit{simultaneous} split testing. In this model the app forks at the top
Geoffrey Challen authored
159
160
161
162
163
164
165
166
of the \texttt{maybe} statement, executes and scores all alternatives, and
then continues with the outputs from the best alternative at the bottom of
the \texttt{maybe} statement. On single-core devices this can be done in
serial, while the growing number of multi-core smartphones provides the
option of doing this in parallel. The benefit of this approach is that each
alternative is executed under near-identical conditions. The drawbacks
include the overhead of the redundant executions and the possibility for
interference between alternatives executing in parallel.
Geoffrey Challen authored
167
Geoffrey Challen authored
168
169
\subsection{\texttt{\large maybe} Endgames}
Geoffrey Challen authored
170
The entire \texttt{maybe} approach is predicated on the fact that there does
Geoffrey Challen authored
171
exist, among the alternatives, a right decision, even if it depends on many
Geoffrey Challen authored
172
factors and uncertainties. We continue by discussing how the dataset
Geoffrey Challen authored
173
generated by post-deployment testing can be used to determine how to
Geoffrey Challen authored
174
correctly choose \texttt{maybe} alternatives at runtime.
Geoffrey Challen authored
175
Geoffrey Challen authored
176
\subsubsection{Simple cases}
Geoffrey Challen authored
177
178
179

In the simplest case, testing may reveal that a single alternative performs
the best on all devices, for all users, at all times. In this situation, the
Geoffrey Challen authored
180
181
182
183
184
\texttt{maybe} system may offer a way for the developer to immediately cease
testing of that alternative and even automatically rewrite that portion of
code to remove the \texttt{maybe} statement. However, it is also possible
that the situation may change in the future when a new device, or Android
version, or battery technology is introduced, and so the programmer may also
Geoffrey Challen authored
185
choose to preserve the flexibility in case it is useful in the future.
Geoffrey Challen authored
186
Geoffrey Challen authored
187
The slightly more complicated case is when testing reveals that alternatives
Geoffrey Challen authored
188
provide stable tradeoffs between energy and performance---one alternative always
Geoffrey Challen authored
189
190
191
192
193
194
195
saves energy at the cost of performance. In this case the system only has to
determine whether to prioritize energy or performance. While this decision
seems simple, it is itself complicated by differences in battery capacity,
charging habits, mixtures of installed apps, and the importance of the app to
each user. However, the stability of the alternatives' outcomes means that
once an energy or performance policy decision has been made, the choice of
alternative has also been made.
Geoffrey Challen authored
196
Geoffrey Challen authored
197
\subsubsection{Static adaptation}
Geoffrey Challen authored
198
Geoffrey Challen authored
199
200
201
202
203
In the more complicated cases, testing reveals that the choice of alternative
depends on some subset of the factors driving uncertainty in mobile systems
programming. We break this group into two subsets, depending on whether the
adaptation is time varying (dynamic) or not (static). We begin with the
second, somewhat easier case.
Geoffrey Challen authored
204
Geoffrey Challen authored
205
If the alternative is determined through static adaptation then the correct
Geoffrey Challen authored
206
decision is a function of some unchanging (or very-slowly changing) aspect of
Geoffrey Challen authored
207
the deployed environment. Examples include the device model, average network
Geoffrey Challen authored
208
209
210
211
conditions, the other apps installed on the device, or user characteristics
such as gender, age, or charging habits. In this case it is possible that the
correct alternative can be determined through clustering based on these
features, and once determined will remain the best choice for a long time.
Geoffrey Challen authored
212
Geoffrey Challen authored
213
\subsubsection{Dynamic adaptation}
Geoffrey Challen authored
214
215
216
217

If the choice of alternative depends on dynamic factors such as the accuracy
of location services, the amount of energy left in the battery, or the type
of network the device is currently connected to, then it is possible that no
Geoffrey Challen authored
218
219
220
single alternative can be chosen even for a single user. Instead, the
\texttt{maybe} system allows developers to evaluate one or more strategies to
drive the runtime alternative selection process.
Geoffrey Challen authored
221
Geoffrey Challen authored
222
223
224
225
Note that \texttt{evaluate} blocks are \textit{not} intended to accomplish
this kind of adaptation. First, they run after the \texttt{maybe} statement
has been executed, not before. Second, per-\texttt{maybe} strategy defeats
the flexibility inherent to the \texttt{maybe} approach and would devolve
Geoffrey Challen authored
226
227
228
229
into the fragile decision-making we are trying to avoid.

Instead, the \texttt{maybe} system allows developers to experiment with and
evaluate a variety of different dynamic adaptation strategies deployed in a
Geoffrey Challen authored
230
companion library, with the decision guided by post-deployment testing. For
Geoffrey Challen authored
231
232
233
example, if the performance of an alternative is discovered to be correlated
with a link providing a certain amount of bandwidth, then that adaptation
strategy can be connected with that particular \texttt{maybe} statement.
Geoffrey Challen authored
234
235

Observe that in some cases of dynamic adaptation, what begins as a
Geoffrey Challen authored
236
237
\texttt{maybe} statement may end as effectively \texttt{if-else} statement
switching on a static threshold---the same approach we attacked to motivate
Geoffrey Challen authored
238
our system. However, through the process of arriving at this point we have
Geoffrey Challen authored
239
240
241
242
determined several things that were initially unknown: (1) what the
alternatives accomplish, (2) that a single threshold works for all users, and
(3) what that threshold is. And by maintaining the choice as a \texttt{maybe}
statement, they can continue adaptating as devices, users, and networks
Geoffrey Challen authored
243
change.
Geoffrey Challen authored
244
Geoffrey Challen authored
245
Another benefit of this approach is that time-varying decisions can be
Geoffrey Challen authored
246
247
248
249
250
outsourced to developers with expertise in the particular area affecting
adaptation decisions. For example, by exposing an energy-performance tradeoff
through a \texttt{maybe} statement, a developer allows it to be connected to
a sophisticated machine learning algorithm written by an expert in energy
adaptation, instead of their own ad-hoc approach.
Geoffrey Challen authored
251
Geoffrey Challen authored
252
\subsubsection{Manual adaptation}
Geoffrey Challen authored
253
254
255
256
257

In some cases even our best efforts to automatically adapt may fail, and it
may be impossible to predict which alternative is best for a particular user
using a particular device at a particular time. If the differences between
the alternatives are small, then it may be appropriate to simply fall back to
Geoffrey Challen authored
258
a best-effort decision. However, if the differences between the alternatives
Geoffrey Challen authored
259
260
261
262
are significant then the \texttt{maybe} alternatives may need to be exposed
to the user through a settings menu. Fortunately, information obtained
through testing can still be presented to the user to guide their decision.
Note that this requires labeling alternatives in a human-readable way.
Geoffrey Challen authored
263
264
265
266
267
268
269
270
271
272
273
274
275
276

\subsection{Continuous Adaptation}
\label{subsec-continuous}

Finally, even once a decision process for a particular \texttt{maybe}
alternative has been developed, it should be periodically revisited as users,
devices, networks, batteries, and other factors affecting mobile apps
continue to change. To enable continuous adaptation, developers can configure
\texttt{maybe} statements to continue to periodically experiment with
alternatives other than the one selected by the alternative testing process.
Changes in alternative performance relative to the expectations established
during the last round of alternative testing may trigger a large-scale
reexamination of that \texttt{maybe} statement using the same process
described above.