Reowolf 1.2: Release Notes

We are happy to release this milestone of the Reowolf project: Reowolf version 1.2. This is an alpha release. The milestone improves the concurrency of the Protocol Description Language (PDL) run-time interpreter. This post summarizes the improvements, and further lays out the milestones we will be working on next. This release is sponsored by the Next Generation Internet fund.

For this release, we have migrated to Gitlab as our public facing repository. Gitlab includes an issue tracker that is open for alpha users to submit bug reports and feature requests. The release tag is v1.2.0. The software is licensed under the MIT license.

The following aspects of the language and its run-time implementation have been improved, and in the sections below we demonstrate their functionality with small examples:

  1. Decentralized synchronization implementation. In previous versions, the PDL run-time interpreter communicated among all components to synchronize them, and in doing so required a central leader. We have improved this aspect of the implementation: the run-time now dynamically discovers the neighboring components with which synchronization takes place, so no single component acts as the central leader any longer. This improvement allows different components in the system to run at different speeds, and allows the composition of slow components (e.g. in embedded systems) with fast components (e.g. running on dedicated high-performance hardware).
  2. Multi-threaded run-time implementation. In previous versions, all components ran on a single processor. We have adapted the run-time interpreter of PDL so that it distributes the execution of components over multiple threads, and can influence the scheduling of components using information provided in the protocol.
  3. We have added the capability for multiple communications to take place over a single port between synchronizations. In previous versions, either no or exactly one datum could be communicated over a port between synchronizations. This change makes it possible to let data transmission and the synchronization between components race each other.
  4. We changed the syntax of PDL to more directly express the branches in speculative executions. Since speculative execution in some cases leads to expensive but discarded computations, it seems sensible to make it explicit in a PDL specification where this happens.
  5. We have performed some initial experimentation with error handling in sync blocks. Components might encounter runtime errors, which can make them unavailable for future interactions over the ports they own. Components might also be programmed in such a way that the synchronization algorithm cannot find a satisfactory global behaviour. In all such cases we need to be able to tell the programmer what went wrong.

Furthermore, this release has fixed some minor bugs that were present in previous releases. The final section shows the roadmap ahead, explaining what milestones will be worked on next, and our plans for the future.

Decentralized synchronization

The PDL run-time interpreter of versions 1.0 and 1.1 made use of a centralized synchronization algorithm. In this release we have replaced it with a decentralized synchronization algorithm. This section explains in more detail what this means for the execution of protocol programs written in PDL.

The centralized synchronization algorithm assumed authority over all of the running components: every component was executed until reaching the point in its sync block where a decision had to be made about which speculative branch could be committed to memory. This made it a lot simpler to provide an initial implementation, but had the downside that unrelated components had their execution speed linked to one another. The region over which synchronization had to be achieved (the so-called sync region) spanned the entire constellation of components.

The decentralized synchronization algorithm instead discovers the synchronous regions based on the ports involved in a synchronous code block. If two components communicate with one another synchronously, then they belong to a single synchronous region, and consequently they have to wait on one another until consensus on their shared behaviour has been achieved. Conversely, if two components do not share any ports to communicate with, then they can run independently of one another.
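
To make this concrete, here is a minimal sketch (the composite name two_regions is made up for illustration, and it reuses the sender and receiver primitives shown later in this post) that composes two sender/receiver pairs which do not share any ports. Under the decentralized algorithm each pair forms its own synchronous region, so the pairs reach consensus and progress independently of one another:

composite two_regions() {
    channel to_a -> from_a;
    channel to_b -> from_b;
    new sender(to_a);     // first pair: sender and receiver share a channel,
    new receiver(from_a); //   so they end up in the same synchronous region
    new sender(to_b);     // second pair: shares no ports with the first pair,
    new receiver(from_b); //   so it synchronizes independently
}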

Planned for the next release is a mechanism for more control over how synchronous regions are constructed. Consider the following composite component, in which two workers perform some kind of asynchronous work. We may assume that each time they have finished some work, they will send the result over their out<u32> port.

composite network(out<u32> result) {
    channel output_a -> input_a;
    channel output_b -> input_b;
    new worker(output_a); // create a worker
    new worker(output_b); // create another worker
    new merger(input_a, input_b, result); // merge the streams produced by the workers
}
primitive merger(in<u32> a, in<u32> b, out<u32> c) {
    while (true) {
        sync {
            if (fires(a)) {
                auto m = get(a);
                put(c, m);
            } else if (fires(b)) {
                auto m = get(b);
                put(c, m);
            }
        }
    }
}

It is perfectly reasonable to block the execution of the workers if there is nobody to receive their results on the other end of the merger. But if there is a component eagerly awaiting results on the result port, then the two workers should be able to run independently, producing results on input_a and input_b as fast as possible. Providing this kind of control is one of the goals of Reowolf 1.3.

Multi-threaded run-time

So far, the Reowolf runtime has been implemented to run on a single thread to simplify development. With this release we have moved to a multi-threaded runtime; in essence it is a green-thread scheduler. Because we are working towards an implementation of the Reowolf runtime that operates within the context of the operating system kernel, the scheduler has been kept simple.

The scheduler makes sure that components which are solely waiting on input are not scheduled until they receive messages that allow them to continue executing. Likewise, when a component has finished its execution (cleanly, or due to an error), its peers are notified such that they are no longer able to send messages to a component that cannot receive them.

As a simple example, suppose we wish to execute the following code:

primitive producer(out<u32> output) {
    sync put(output, 1);
    sync put(output, 2);
}

primitive consumer(in<u32> input) {
    sync {
        auto value = get(input);
        assert(value == 1);
    }

    // Exit without receiving the second value
}

Then we are greeted by the following error message:

ERROR: attempted to 'put' on port that is no longer owned
 +-  at 4:10
 | 
 |      sync put(output, 2);
 |           ~~~~~~~~~~~~~~

 +-  Stack trace:
 | component producer:4

Because we are still running in user space, the scheduler is implemented such that it exits once it is certain that all components have finished their work and no new components can be created through the API.

Multiple port firings

The synchronization algorithm is tasked with finding an agreeable global behaviour of all of the participating components in a sync region. Finding this solution is achieved by considering (among other things) the ports that each component has used in its interaction with its peers. The previous implementation imposed the requirement that a port could either fire exactly once, or not fire at all. As a result, a component could only put on a particular port once per sync block.

The new decentralized algorithm seeks a solution in a different way: data is still transmitted through ports, but a programmer may now use a port multiple times within a single sync block. So in the following simple example we may safely expect the receiver to always execute its second behaviour.

primitive sender(out<u32> output) {
    sync {
        // sender will only allow exactly two messages to be sent
        put(output, 1);
        put(output, 2);
    }
}

primitive receiver(in<u32> input) {
    sync {
        auto num = 0;

        fork      num = 1; // first behaviour: receive once
        or fork   num = 2; // second behaviour: twice
        or        num = 3; // third time's a charm?

        // num is now 1, 2 or 3
        auto index = 0;
        while (index < num) {
            auto value = get(input);
            index += 1;
        }
    }
}
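
In this example, only the second branch is consistent with the sender, which puts exactly two messages within the same sync block; during consensus the branches in which num is 1 or 3 are discarded, so the receiver always commits to num == 2.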

Explicit forking

As hinted at in the example above, firing a port multiple times no longer meshes well with the concept of the fires function. We have also come to realize that the fires predicate was redundant: as a programmer you first have to make sure that the assertion fires(port) == true holds, and then remember to actually use that port. Conversely, asserting that a port is silent must be followed by not using that port. Any other use of the port is a runtime error.
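
As a minimal sketch of this redundancy (using the old fires predicate from versions 1.0 and 1.1, with a hypothetical component name), the following component asserts that its port fires but then forgets to use it, a mismatch that only shows up as a runtime error:

primitive forgetful(in<u32> input) {
    sync {
        if (fires(input)) {
            // We claimed that 'input' fires, but never call get(input).
            // This mismatch between the assertion and the actual port
            // usage is only detected at runtime.
        }
    }
}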

To better reflect the usage of ports, we have replaced the firing predicate with a more explicit fork statement. As an example, consider the following snippet that uses the old fires predicate:

u32 hash = 0;
if (fires(request) && fires(response)) {
    hash = compute_hash(get(request));
    put(response, hash);
} else if (fires(response)) {
    hash = compute_hash(default_request);
    put(response, hash);
} else {
    assert(false); // do not allow any other combination
}

This would now be written as:

fork {
    hash = compute_hash(get(request));
    put(response, hash);
    // Used both 'request' and 'response'
} or {
    hash = compute_hash(default_request);
    put(response, hash);
    // Used only 'response'
} // No more behaviour specified
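
Note that the closing assert(false) from the fires-based snippet is no longer needed: the fork statement enumerates exactly the behaviours the component allows, and any combination of port usage that is not listed is simply not a behaviour of the component.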

Roadmap

After this release we can continue our work in the following directions:

  • Allowing further control over the synchronous region in which the synchronization algorithm seeks consensus about the global behaviour of a set of composed components.
  • Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, BGP.
  • We are interested in the SCION internet architecture, and are investigating whether the Reowolf connector API can be used for programming internet applications that run on top of SCION, and whether we can specify components in PDL that allow applications to make use of all capabilities SCION networks offer. Towards this, we are setting up a subnet that will be connected to the SCIONlab network. Our experiences will be published in a series of blog posts.
  • Take the first steps towards integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration, so that we can offer the Reowolf connector API to user-mode processes. Further, we are investigating the compilation of PDL component specifications into loadable kernel modules, thereby increasing the performance of applications that can instantiate pre-compiled components.
  • Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. We have submitted an article that describes the formal semantics for centralized synchronization, but we still need to investigate how to adapt the semantics to deal with decentralized synchronization. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– December 17, 2021