Skip to content

A future proof package and import system

Marten Lohstroh edited this page Jul 22, 2020 · 27 revisions

This is a collection of thoughts on the design of a reliable package and import system that is ready for future applications. At this stage, this page mostly represents my personal view (Christian Menard). I will also focus on the C++ target here as this is the target I know best. The C target is not a good example for these considerations as there is a fundamental design issue with the C target. Since the code generator places all code in a single generated .c file and does things like #include reactor.c to avoid the need for Makefiles, it circumvents many of the issues that come with imports that I will outline here. It simply ignores file scopes and namespaces altogether.

The status quo

The current import system is lean and simple. Write import Bar.lf in Foo.lf and every reactor defined in Bar.lf will be visible in the file scope Foo.lf. Bar.lf is looked up simply by scanning the directory Foo.lf is placed in. This works well for the simple programs and tests we have right now, but does not scale. I identify the following problems:

  1. There is no notion of separate namespaces. Every reactor that Bar.lf defines becomes visible in Foo.lf. If both files define a Reactor Foo, there is a name clash and the import would be ill-formed. There should be a mechanism to distinguish the two definitions of Foo, such as using fully qualified names: Foo.Foo and Bar.Foo.

  2. There is no concept for importing files from a directory structure. It is unclear how Foo.lf could import my/lib/Bar.lf.

  3. There is no concept for packages or libraries that can be installed on the system. How could we import Reactors from a library that someone else provided?

These are the more obvious issues that we have talked about. However, there are more subtle ones that we haven't been discussed in depth (or at least not in the context of the import system design discussion). The open question is: What does importing a LF file actually mean? Obviously, an import should bring Reactors defined in another files into local scope. But what should happen with the other structures that are part of an LF file, namely target properties and preambles? That is not specified and our targets use a best practice approach. But this is far away from a good design that is scalable and future proof.

A quick dive into the C++ code generator

Before I discuss the problems with preambles and target properties, I would like to give you a quick overview of how the C++ code generator works. Consider the following LF program consisting of two files Foo.lf and Bar.lf:

// Bar.lf

reactor Bar {
 reaction(startup) {=
   // do something bar like
 =}
}
// Foo.lf

import Bar.lf

reactor Foo {
 bar = new Bar();
 reaction(startup) {=
   // do something foo like
 =}
}

Now let us have a look on what the C++ code generator does. It will produce a file structure like this:

CMakeLists.txt
main.cc
Bar/
  Bar.cc
  Bar.hh
Foo/
  Foo.cc
  Foo.hh

We can ignore CMakeLists.txt and main.cc for our discussion here. The former specifies how the whole program can be build and the latter contains the main() function and some code that is required to get the application up and running. For each processed <file>.lf file, the code generator creates a directory <file>. For each reactor <reactor> defined in <file>.lf, it will create <file>/<reactor>.cc and <file>/<reactor>.hh. The header file declares a class representing the reactor like this:

// Bar/Bar.hh

# pragma once

#include "reactor-cpp/reactor-cpp.hh"

class Bar : public reactor::Reacor {
 private:
  // default actions
  reactor::StartupAction startup {"startup", this};
  reactor::ShutdownAction shutdown {"shutdown", this};

 public:
  /* ... */

 private:
  // reaction bodies 
  r0_body();
};

The corresponding Bar/Bar.cc will look something like this:

#include "Bar/Bar.hh"

/* ... */

Bar::r0_body() {
  // do something bar like
}

Similarly, Foo.hh and Foo.cc will be generated. However, since Foo.lf imports Bar.lf and instantiated the reactor Bar it must be made visible. This is done by an include directive in the generated code like so:

// Foo/Foo.hh
### 
# pragma once

#include "reactor-cpp/reactor-cpp.hh"
#include "Bar/Bar.hh"

class Foo : public reactor::Reacor {
 private:
  // default actions
  reactor::StartupAction startup;
  reactor::ShutdownAction shutdown;

  // reactor instances
  Bar bar;
 public:
  /* ... */

 private:
  // reaction bodies 
  r0_body();
};

The problem with preambles

The problems with preamble in the context of imports were already discussed in a related issue, but I would like to summarize the problem here. While the examples above worked nicely even with imports, things get messy as soon as we introduce a preamble. Let's try this:

// Bar.lf

reactor Bar {
 preamble {=
   struct bar_t {
     int x;
     std::string y;
   };

   bar_t bar_func {
     return bar_t(42, "hello")
   }
 =}
 output out:bar_t;
 reaction(startup) -> out {=
   out.set(bar_fuc());
 =}
}
// Foo.lf

import Bar.lf

reactor Foo 
 bar = new Bar();
 reaction(bar.out) {=
   auto& value = bar.out.get();
   std::cout << "Received {" << value->x << ", " << value->y << "}\n";
 =}
}

This would be expected to print Received {32, hello}. However, before we can even compile this program, we need to talk about what should happen with the preamble during code generation and how the import affects it. So where should the preamble go? The first thing that comes to mind, is to embed it in the header file Bar.hh something like this:

// Bar/Bar.hh

# pragma once

#include "reactor-cpp/reactor-cpp.hh"

// preamble
struct bar_t {
  int x;
  std::string y;
};

bar_t bar_func {
  return bar_t(42, "hello")
}

class Bar : public reactor::Reacor {
 /* ... */
};

If we embed the preamble like this and compile the program ,then the compiler is actually happy and processes all *.cc files without any complaints. But, there is a huge problem while linking the binary. The linker sees multiple definitions of bar_func and has no idea which one to use. Why is that? Well, the definition of bar_func is contained in a header file. This should never be done in C/C++! Since includes translate to a plain-text replacement by the preprocessor, Bar.cc will contain the full definition of bar_func. As Foo.cc imports Foo.hh which imports Bar.hh, also Foo.cc will contain the full definition. And since main.cc also has to include Foo.hh, main.cc will also contain the full definition of bar_func. So we have multiple definitions of the same function and the linker rightfully reports this as an error.

So what should we do? We could place the preamble in Bar.cc instead. This ensures that only Bar.cc sees the definition of bar_func. But then the compiler complains. Neither Bar.hh nor Foo.hh see type declaration of bar_t. Note that there is a dependency of Foo.lf on the preamble in Bar.lf. The import system should somehow take care of this dependency! Also note that this has not appeared as a problem in C as the code generator places everything in the same compilation unit. Foo will see the preamble of Bar as long as Foo is generated before Bar.

But how to solve it for C++ where the code is split in multiple compilation units (which really should be happening in C as well)? What we do at the moment is annotating the preamble with private and public keywords. This helps to split the preamble up and decide what to place in the header and what to place in the source file. For instance:

// Bar.lf

reactor Bar {
 public preamble {=
   struct bar_t {
     int x;
     std::string y;
   };
 =}
 private preamble {=
   bar_t bar_func {
     return bar_t(42, "hello")
   }
 =}
 output out:bar_t;
 reaction(startup) -> out {=
   out.set(bar_fuc());
 =}
}

This makes the type bar_t visible as part of the public interface of Bar. Both the code generated for Bar and the code generated for Foo will see the definition of bar_t. This is realized by placing the public preamble in Bar.hh The function bar_func is part of Bar's private interface. It is only visible with the reactor definition of Bar and is not propagated by an import. This is realized by simply placing the private preamble in Bar.cc. This makes the compiler finally happy and when get an executable program private and public preambles provide a mechanism to define what is propagated on an import and what is not. I think this is an important distinction even in languages other than C/C++ that do not have this weird separation of source and header file.

I am sorry for this lengthy diversion into things that happened in the past where we actually want to talk about how things should work in the future. However, understanding this issue is important and when talking about other solutions we should not forget that it exists.

The problem with target properties

It is also not well-defined what should happen with target properties when importing a .lf file. Apparently the common practice is simply ignoring the existence of other target declarations and only considering the target declaration of the .lf that contains the main reactor. I think this works reasonably well for our small programs. But it will cause problems when either programs become larger or we introduce new target properties where it is unclear what piece of code they reference. Let us have a look at the existing target properties for C++. How should those different properites be handled on an import? Which scope do they actually apply to? We haven't really talked about this.

fast, keepalive, threads and timeout are easy. They apply to the main reactor. Since we do not import main reactors from other files, it is clear that we really want to use the properties defined in the main compilation unit. So our current strategy works in this case. Although there are still some subtelties. For instance, if a library file defines keepalive=true and fast=false because it uses physical actions, should any file importing this library file be allowed to override these properties. Probably not, because it doesn't make sense if physical actions are involved. But a careless user of the library might not be aware of that. So maybe it isn't that clear after all.

build-type, cmake-include, compile, logging and no-runtime-validation influence how the application is build. They are used for generating the CMakeLists.txt file. So their is quite clear: they apply to the whole compilation of the given application. Again it is a simple solution to only consider the target properties of the file containing the main reactor since this can be considered the file that 'drives' the compilation. But what if an imported .lf relies on an external library and uses the cmake-include property to tell cmake to look this library up, make the library header files visible and link our generated code to that library (fortunately this can be done with 2 lines in cmake). Should this target property really be ignored by our import? Probably not, because it will lead to compile errors if the authot of the main .lf file does not configure cmake-include properly. So there should be some kind of merging mechanism for cmake-include. Should this be done for the other properties as well? I am not sure and I actually don't know how the merging would work.

So this raises a lot of questions that we currently have no answer to. I believe we need to find answers for these questions in order to create a well working import and package system. This gets only more complicated when we add more properties such as the proposed files directive. We should really consider what properties actually apply to and if they influence the way imports work.

The work in progress

To be continued... I want to describe here what is happening on the new_import and the (potential) problems this brings.

Possible solutions

To be continued... I would like to show a few possible soltions that have come to mind and that we discussed already.

Concrete proposal

With the risk of overlooking some of the issues discussed above, I'd like to outline a concrete proposal. To me, at least, it is easier to reason about these issues in a context with a few more constraints. Hopefully, this can serve as a starting point that we can tweak/adjust as needed. Note: this proposal borrows from the previous proposal written by Soroush. Based on my experience with Xtext, I have confidence that what is described below is feasible to implement.

Import/export

  1. One LF file can contain multiple reactor definitions and at most one main reactor.
  2. Any reactor class defined outside of the current file has to be imported explicitly.
  3. The visibility of a reactor class can be limited using a modifier in the class definition.
  • Should the default visibility be public or private? I have no strong preference either way.
  1. An import statement must specify which reactor classes to import. This is necessary because if we populate a global scope using the uriImport mechanism, the local scope provider needs to know which definition to link to if there happen to exist multiple among the set of included files.
  2. An LF file in an import statement is specified by a path relative to the directory of the file in which the import statement occurs or relative to a series of directories in a wider search path.
  • Eclipse uses .project files to identify the root of a project; we can look for that.
  • We can look for our own kind of manifest files as well. These can list additional locations to search. This is compatible with the idea of developing a package system. I personally like this approach better than using an environment variable.
  1. I personally find fully qualified names excess generality and have never felt compelled to use them in Java. IMO, they lead to code that's difficult to read and a pain to format. To keep things simple, I suggest we don't support them. Instead, we should provide a mechanism for renaming imported reactor classes to avoid naming conflicts.

Syntax

Import := 'import' <ID> <Rename>? (',' <ID> <Rename>?)* 'from' <PATH>

Rename := 'as' <ID>

Preambles

A preamble allows for the inclusion of verbatim target code that may be necessary for reactors to function. Currently, there are two scopes in which preambles can appear: (1) file scope and (2) reactor class scope. Moreover, there exist visibility modifiers to label preambles private or public. A public preamble is intended to contain code that is necessary for the use of a reactor that is in scope. A private preamble is intended to contain code that is necessary for the implementation of a reactor that is in scope. Only the Cpp code generator can currently effectively separate these LF scope levels. It achieves this by putting each reactor class definition in its own file.

LF file scope

Reactor class scope

Target Properties