Architecture of the Ubik runtime =======================================
Haldean Brown First draft: Sep 2016
Last updated: Sep 2016
Status: Needs revision
While the value encodings have remained the same, the scheduler
(now called the evaluator) has been significantly reworked to
improve performance; that section needs rewriting.
The Ubik runtime has a relatively simple architecture; the basic idea is
that Ubik values are allocated by the GC system and flow through the
scheduler, which is responsible for producing a "minimal value" from an
existing value. The scheduler attempts to minimize all terminal values,
and in so doing evaluates the result of a computation.
Ubik values come in 9 types:
STR: a UTF-8 encoded string
encoded as byte-arrays with a length
RAT: a rational number
encoded as a 64-bit signed numerator and a 64-bit unsigned
BOO: a boolean
encoded as a C11 _Boolean (which is stored in a byte)
TUP: a fixed-length typed tuple
encoded as a list of values, a list of types and a size
FUN: a function
encoded as a list of nodes, an optional evaluator, and the
index of the node that represents the result of the function.
More on node encodings in a second.
PAP: a partially-applied function
encoded as a pair of values and an integer; the first value
represents the function, the second represents the argument,
and the integer represents the number of arguments the result
can take before it is fully applied.
MUL: a polymorphic multimethod
TYP: a type
IMP: an implementation of an interface
Encoding functions -----------------------------------------------------
Functions are encoded through an implicit directed acyclic graph of
nodes, each of which takes some number of inputs and computes a result.
These nodes perform very simple operations; there are currently 8 types:
APPLY: applies a function to an argument
contains a node index to the function and a node index to
VALUE: represents a constant value
contains a pair of values; one is the type and one is the
value of the constant value.
LOAD: loads a value from the environment
contains a URI to load from.
STORE: stores a value in the environment
contains a node index whose value will be stored, and the
URI at which the value will be stored.
INPUT: represents an argument to the function
contains the argument index this node corresponds to (i.e.,
if the node represents the first argument, it's argument
index is zero).
REF: adopts the type and value of a referrent node
contains the node index of the referrent node. These are
used to make compilation easier; in an ideal world, the
compiler would optimize these out entirely.
COND: branches based on a boolean
takes the node indexes of three nodes: the condition, the
value if true, and the value if false.
NATIVE: stores the result of a native computation
this is a little piece of magic that enables the graph
representation of functions that are implemented by the
runtime instead of in Ubik. If the graph has a custom
evaluator (which we'll discuss in the scheduler section)
then this node will be filled with the result of the
evaluator when the evaluator completes, as if by magic. It
has no subfields, as it is only a container for a result.
Each node also contains an ID (a word that is only used for debugging)
and a flag that determines whether the node is terminal or not. Terminal
nodes are enqueued as "goals" in the scheduler when an evaluation of the
function is requested. This makes it possible to force eventual STORE
effects without keeping them in the dependency chain of the graph's
Scheduling computation -------------------------------------------------
The scheduler's job is to take a FUN, PAP, or MUL value and collapse
them to their minimal values (i.e., if the FUN, PAP, or MUL has
input-arity 0, it evaluates the function). This operation is called
"evaluation". A user of the runtime can request evaluation of a value or
of multiple values; this is called "enqueuing" the values.
When an unreducable value is enqueued (like a STR or a BOO), the
scheduler halts as the value has already been reduced. Additionally, if
a value is executable but is not fully applied (for example, you've
provided one of the two arguments to a function) the scheduler halts, as
the value has already been reduced.
When a graph is enqueued, it is wrapped in a graph executor
(ubik_exec_graph). The executor is responsible for encoding all of the
information specific to an invoking of the function. This means:
- the calculated value of each node (including the values of the
arguments, if any)
- the calculated type of each node
- the readiness of each node (i.e., whether the node is ready to
evaluate or is still waiting on a piece of data or one of the
nodes it depends on)
- the value being evaluated
- the environment in which the graph will be executed
- an optional callback that will be called when the graph's result
is done being evaluated
Following this, we walk the dependency tree of the terminal nodes to
determine their evaluation status. The evaluation status is determined
- if the node is a COND node, the node is marked as waiting on the
evaluation of the condition.
- if the node has any dependencies, set the WAIT flags for each
dependency. The maximum number of dependencies for any node type
is 3, so there are 3 such flags: WAIT_D1, WAIT_D2, and WAIT_D3.
The mapping from field to dependency index is arbitrary but
consistent for a given node type (i.e., D1 for a VALUE node is
always the value, and D2 is always the type).
- if the node has no dependencies, set the READY flag
These flags are set on the graph executor. All READY nodes are enqueued
in the "ready" queue; all WAIT nodes are enqueued in the "wait" queue.
The scheduler then begins the evalulation loop:
- it takes a node executor off of the ready queue
- it evalutates the node in the executor
- it stores the resulting value and type in the graph executor
- it finds the tasks dependent on the calculated value and clears
the dependency-wait flag associated with the evaluated node in
each task. If all wait flags have been cleared on the dependent
task, the node is marked as ready.
- the scheduler then does a "ready-sweep", in which it moves node
executors that are marked as ready from the wait queue to the
The process then repeats, until the ready queue is empty. At that point,
if there are still nodes in the wait queue, the scheduler is deadlocked
and is unable to evaluate the computation; this can happen due to
circular node dependencies, or waiting on data that never appears. If
the wait queue is empty, the computation has completed successfully.