Reowolf 1.1: Release Notes

We are happy to release this milestone of the Reowolf project: Reowolf version 1.1. This is an alpha release. The milestone improves the structural aspects of Protocol Description Language (PDL), which increases the declarative aspects of protocol descriptions needed for modeling Internet protocols (e.g. TCP, UDP, ICMP, DNS). This post summarizes the improvements, and further lays out the milestones we will be working on next. This release would not be here without Max Henger, who joined the Reowolf project in November 2020, whose contributions have had a major impact on the feature completeness of this release. This release is sponsored by the Next Generation Internet fund.

The Git repository associated to this release can be checked out here. The release tag is v1.1.0. The software is licensed under the MIT license.

The following aspects of the language have been improved, and in the sections below we demonstrate their functionality by small examples:

  1. Introduced algebraic data types (“enum”, “union”, “struct”) for user-defined structuring of data. For handling elements, we introduced “if let” statements for deconstructing “enum” and “union” types, and field dereferencing of “struct” types. The “if let” statement allows extensibility of the type definition of “union” and “enum” types. We also introduced constructor literals for constructing elements in data types, including arrays.
  2. Introduced a type system, an algorithm for type checking, and a type inference system. Type checking ensures that the execution of protocols do not misinterpret data (i.e. avoiding “type confusion”), thus rules out a class of errors.
  3. Introduced generic types in function and datatype definitions, by the use of type variables. Ad-hoc polymorphism is a structural feature of the language only, and is erased during the execution of protocols (in a process called monomorphization). Ad-hoc polymorphism is also available for the port types, allowing rich type information to describe the kind of messages that components can exchange over time.
  4. Improved usability. The module system and namespaces are implemented. This is important for protocols that are authored by multiple independent developers, to avoid namespace conflict. Further, error messages are improved, that increases the usability of the system and makes life easier for developers that use PDL.

The final section shows the roadmap ahead, explaining what milestones will be worked on in the future.

Algebraic Data Types

We have introduced a system for user-defined algebraic data types, in addition to the primitive types (for signed, unsigned integer handling, and arrays). User-defined types can be declared to aid the protocol programmer in structuring data. There are three kinds of user-defined data types:

  • enumeration types (enum)
  • tagged union types (union)
  • product types (struct)

Enumeration and tagged union types serve a similar purpose: to discriminate different cases. Tagged unions further allow for data to be stored per variant, whereas enumeration types do not store data. Enumerations are used for named constants.

For example, consider an enumeration of DNS record types (we list only here a few variants):

enum DNSRecordType { A, NS, CNAME, SOA, MX, TXT }

Then one can access each constant as DNSRecordType::A, DNSRecordType::NS, et cetera.

A classic examples of disjoint union types are to implement so-called option types. For example, in places where some variant of the enumeration above is expected or no value can be given, one may use the following data type:

union OptionalDNSRecordType { None, Some(DNSRecordType) }

To construct an element of a tagged union type, one uses the constructor for each variant. In the case of no-argument variants, this is similar in use as constants in enumerations. For variants that accept parameters, these have to be supplied to the variant constructor. For example, OptionalDNSRecordType::Some(DNSRecordType::A) is an element of OptionalDNSRecordType.

To be able to test what variant is used, we introduce the “if let” statement. This statement tests the variant of a union type and at the same time performs a pattern match to extract data from the matched variant.

auto x = OptionalDNSRecordType::Some(DNSRecordType::A);

if (let OptionalDNSRecordType::Some(y) = x) {
    // y is bound to DNSRecordType::A here
    if (let y = DNSRecordType::A) {
        assert true;
    } else {
        assert false;
    }
}

Product types can be used for modeling structured data, such as packet headers. Each field has an associated type, thus constraining what elements can be stored in the type. Product types are also known as records. For example, here is a structure modeling the UDP packet header:

struct UDPHeader {
    u16 source_port,
    u16 dest_port,
    u16 length,
    u16 checksum
}

Elements of product types can be constructed by a constructor literal, that takes an element for each of the fields that are part of the product. For example, UDPHeader{ source_port: 82, dest_port: 1854, length: 0, checksum: 0 } constructs an element of the type defined above.

Further, user-defined types may be recursive, and thus allow modeling interesting structures such as binary trees. Consequently, one can define recursive functions that traverse such structures. See for example:

union Tree {
    Leaf(u8),
    Node(Tree,u8,Tree)
}
func mirror(Tree t) -> Tree {
    if (let Tree::Leaf(u) = t) {
        return t;
    }
    if (let Tree::Node(l, n, r) = t) {
        return Tree::Node(mirror(r), n, mirror(l));
    }
    return Leaf(0); // not reachable
}

Type Checking and Inference

The type checker ensures that protocols written in PDL are type safe: it is ensured statically that no element of one type is assigned to another type. It is still possible to transform elements from one type to another, either by the means of casting or by calling a function.

The type checker in Reowolf futhermore elaborates protocols in which not sufficient typing information is supplied. This is called type inference. This reduces the need for protocol programmers to supply typing information, if such information can be deduced automatically from the surrounding context.

The type checker and inference system works in tandem with user-defined algebraic data types. Also, in pattern matching constructs such as the “if let” statement, the types of the variables occurring in patterns are automatically inferred.

Consider the following function, that computes the size of a binary tree. It declares an automatic variable (s) that contains the result of the function. Further, it automatically infers the type of the pattern variables (l, n, r) that follows from the definition of the Node variant in the Tree data type.

func size(Tree t) -> u32 {
    auto s = 0;
    if (let Tree::Node(l, n, r) = t) {
        s += 1;
        s += size(l) + size(r);
    } else {
        s = 1;
    }
    return s;
}

We shall now consider a number of negative examples. One assigns two different variables (val16 and val32), and then leaves unspecified the type of the third variable (a) by the use of the auto keyword.

func error1() -> u32 {
    u16 val16 = 123;
    u32 val32 = 456;
    auto a = val16;
    a = val32;
    return 0;
}

In this case, an error is triggered, because there exists no type to which both 16-bit unsigned integers and 32-bit unsigned integers can be assigned. The same kind of error occurs whenever one performs an operation on two different types. Reowolf has no automatic implicit casting. This type strictness is added to ensure that code is never ambiguous. However, casting operators are used to explicitly mark where casting happens in the code.

func error2() -> s32 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = b + l;
    return 0;
}
func good1() -> s32 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast<s64>(b) + l;
    return r;
}
func good2() -> s32 {
    s8 b = 0b00;
    s64 l = 1234;
    auto r = cast(b) + l; // type inferencer can make the jump
    return r;
}

Generic Types and Functions

Reowolf now supports generic type parameters, that can be used both in user-defined data type definitions and in function definitions. Generic type parameters are also used by the type checker and type inferencer. For example, it is possible to define the generic option type:

union Option<T> { None, Some(T) }

The generic type can be instantiated by a concrete type, including the primitive types such as integers. It is also possible to define generic functions, for example:

func some<T>(Option<T> s) -> T {
    if (let Option::Some(c) = s) { return c; }
    while (true) {} // does not terminate for Option::None
}

Furthermore, generic types are also added to input and output ports: this allows protocol programmers to specify precisely what value is expected during communication. For example, the sync channel is defined as:

primitive sync<T>(in<T> i, out<T> o) {
    while (true) {
        sync {
            if (fires(i) && fires(o)) {
                auto m = get(i);
                put(o, m);
}   }   }   }

The sync channel can then be instantiated by different concrete types: e.g. sync<u8> is a byte channel, and sync<u8[]> is a byte array channel. The additional type information is useful to avoid communicating with incompatible message types.

Usability Improvements

The module and namespace system is improved. Protocol descriptions live in their own namespace (each domain name is a separate namespace) to prevent naming conflicts of definitions among multiple protocol authors. Importing symbols from other modules is allowed, and checks for naming conflicts among symbols imported from other modules and locally defined symbols.

An important aspect of this release is to have user-friendly error messages. This helps the protocol programmer to identify the error in the protocol description, that can be quickly resolved. For example:

struct Pair<T1, T2>{ T1 first, T2 second }
func bar(s32 arg1, s8 arg2) -> s32 {
    auto shoo = Pair<s32, s8>{ first: arg1, seond: arg2 };
    return arg1;
}

produces the following user-friendly error message:

ERROR: This field does not exist on the struct 'Pair'
 +-  at 3:45
 | 
 |              auto shoo = Pair<s32, s8>{ first: arg1, seond: arg2 };
 |                                                      ~~~~~

Roadmap

After this release we can continue our work in the following directions:

  • The semantics of Reowolf’s sync block has to be adapted to make it possible to be driven by an efficient distributed consensus algorithm. For this, we introduced so-called scoped sync statements, that allows for a run-time discovery of neighboring components.
  • Modelling existing transport layer protocols, such as TCP and UDP, as Reowolf protocols. This allows us to convincingly demonstrate the expressiveness of the protocol description language, and to compare our implementation’s efficiency with existing networking stacks. These transport layer implementations would make use of native IP components. Further ahead, we can model existing Internet protocols such as ICMP, DNS, HTTP, ….
  • Make first approaches to integrating Reowolf into the operating system kernel. We are exploring which operating system is most suitable for integration. Considering that our user-mode implementation is written in Rust, we are seeking whether our kernel implementation can also be written (mostly) in Rust.
  • Work on the specification of the Protocol Description Language (PDL), leading to a standardization track. Part of this specification work is the need to formalize, in an unambiguous manner, the semantics of protocols specified in PDL. Formalized semantics increases the future potential for formal verification of protocols, and allows us to define the correctness criteria of Reowolf implementations.

We will keep you updated!

The Reowolf Team
– June 4, 2021