Aims

Scope / goals

We'll try to work towards an implementation that's "real-world capable". We will cut back on features and performance, but not in a way that would prevent them from being added or improved easily later on.

The compiler, if everything goes according to plan, will not itself depend on another Scheme system to run (it will use itself to run, though). The VM is exempt from this aim.

The full version of the workshop plan involves building a low-level interpreter/VM, a compiler for a dynamic programming language (with a bottom-up development approach), and possibly an implementation of a programming language in which to implement the VM (with a top-down development approach). The work will create incentives for work on static analysis (type systems and proofs), although such work is not itself part of the plan. You can choose to participate in only certain parts of the project.

The aims

The ultimate success would be if some of us find the result worthwhile to use; but it is also a success if we learn a lot about building a programming system, structuring projects, and working productively in a group.
Likely, the biggest value the end product could have for us is ease of extension and adaptability; thus good structure (program design) is important.
Good system design means that parts can be changed or recombined easily, and that there is a good understanding of how the system works.
This is similar to one of the aims of Scheme, BTW:
"Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary." (R5RS intro)
correctness: any valid source program should work; there should be no corner cases that fail.
This is what makes it possible to build languages on top of each other without risk of triggering cases that were overlooked.
Examples of what to avoid: imprecise GC (see GC), or excessive lifetime of lexical variables (see "why not interpret" in VM). We may deviate from R5RS in the treatment of internal define to make it more uniform.
security: programs should not be able to access resources or information they haven't explicitly been given access to, and they should not be able to influence programs running in other security contexts except through resources that those programs have agreed to share with them.
This means that memory access must be handled safely and that no inconsistent or invalid state may be reachable (or, if that happens anyway, the program causing it must be terminated without ill effect on others).
performance optimization will be deferred until after the above aims are reached: it is outside the current scope of the workshop.
That should be fine, as performance will come from (a) optimizer passes in the compiler, which can only be added once the rest is working anyway; (b) avoiding interpretation overhead, which could be achieved by generating LLVM code instead of using our own VM (or alternatively by adding JIT compilation through LLVM to the VM), or by similar means that can also naturally be added later; and (c) invasive or low-level changes (below the 'security boundary') such as elimination of runtime type/range checks, memory lifetimes based on dynamic scope, GC improvements, and improved inter-thread/process communication, all of which are better added once the bigger picture is stable, as they tend to work against the good design (modularity / understandability), correctness, and security aims.
That said, let's avoid decisions that would restrict or inhibit future optimization. E.g. implementing the VM in a programming language that only has slow implementations would later necessitate a rewrite, which would be more work and disruption than writing it in a fast language from the start (note that the VM test suite will be big, for security reasons!). Of course the boundary is fluid here: we might decide that performance improvements will come from eliminating the VM, in which case the slowness of the VM implementation would be irrelevant, although this case may not be so clear-cut (e.g. LLVM compiles rather slowly, the security goals might not be reachable with it, and portability may be a concern).

The language

The system should ultimately support R5RS Scheme, including first-class continuations that are efficient in both time and memory, and universal tail-call optimization.
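To make the tail-call requirement concrete, here is a minimal sketch of a trampoline, one common way to run tail calls in constant stack space. It is written in Python purely for illustration; the names trampoline and count_down are hypothetical, and nothing here is meant as the technique the workshop will actually use.

```python
def trampoline(result):
    """Keep calling returned thunks until a non-callable value appears.

    Assumption for this sketch: final results are never callable.
    """
    while callable(result):
        result = result()
    return result


def count_down(n):
    """A loop written as a tail call: instead of calling itself directly,
    it returns a thunk describing the next call."""
    if n == 0:
        return "done"
    return lambda: count_down(n - 1)


# Far more iterations than Python's default recursion limit allows,
# yet the stack never grows, because each step returns before the next runs.
final = trampoline(count_down(100_000))
```

A direct recursive count_down of this depth would overflow Python's stack; with the trampoline, only one frame is live at a time, which is the property tail-call optimization guarantees for every tail call.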

We may implement simpler languages than Scheme first (and Scheme might be implemented on top of the simpler one(s)).

But we may defer or ignore parts of R5RS less interesting to us:

numeric tower: we will only implement integers, and maybe rationals. (This saves implementing or interfacing to real and complex operations, their storage, and the parsing of floating-point numbers.)
vectors
hygienic¹ macros (define-syntax). Instead we'll implement a simple non-hygienic macro system (similar to define-macro or Common Lisp's defmacro or Clojure macros), although features to make it possible to consciously write macros that are hygienic should be provided.
¹ Hygiene means that through macro expansion, no confusion (shadowing) arises in the code between identifiers written by the user and those that are inserted by macros.
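The capture problem described in the footnote, and the kind of feature that lets a macro author avoid it consciously, can be sketched as follows. This is an illustration in Python, not the planned macro system: s-expressions are modelled as nested lists with strings as symbols, and my-or, gensym, and the tiny evaluate function are all hypothetical names introduced for the example.

```python
import itertools

_counter = itertools.count()


def gensym(prefix="g"):
    """Return a fresh symbol. Assumption: the reader never accepts '#'
    inside an identifier, so user code cannot contain this name."""
    return f"{prefix}#{next(_counter)}"


def expand_or_naive(a, b):
    # (my-or a b) => (let ((t a)) (if t t b)), using the fixed name "t"
    return ["let", [["t", a]], ["if", "t", "t", b]]


def expand_or_gensym(a, b):
    # Same expansion, but the temporary gets a fresh name each time.
    tmp = gensym("t")
    return ["let", [[tmp, a]], ["if", tmp, tmp, b]]


def evaluate(expr, env):
    """Evaluate just enough forms for the demonstration: symbols,
    literals, single-body let, and if (only False counts as false)."""
    if isinstance(expr, str):
        return env[expr]
    if not isinstance(expr, list):
        return expr
    head = expr[0]
    if head == "let":
        _, bindings, body = expr
        new_env = dict(env)
        for name, value_expr in bindings:
            new_env[name] = evaluate(value_expr, env)
        return evaluate(body, new_env)
    if head == "if":
        _, test, then, otherwise = expr
        chosen = then if evaluate(test, env) is not False else otherwise
        return evaluate(chosen, env)
    raise ValueError(f"unsupported form: {head!r}")


# The user happens to have a variable called t and writes (my-or #f t).
user_env = {"t": 5}
captured = evaluate(expand_or_naive(False, "t"), user_env)
hygienic = evaluate(expand_or_gensym(False, "t"), user_env)
```

With the naive expansion, the user's t is shadowed by the macro's temporary, so captured is False instead of the user's value; with the gensym version, hygienic is 5 as the user intended.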

Approach & features

Write most of the system in the language that it implements. But build it up from scratch, meaning: go through the bootstrapping problems that come from not having at hand an implementation of the language in which we want to implement it.
Use continuation-passing style as a language level, so as to make implementation of first-class continuations straightforward and to "really get to understand" continuations.
debugging / introspection features (and useful error messages)
maintain precise source location information as much as possible
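To illustrate why continuation-passing style makes first-class continuations straightforward, here is a small sketch in Python (chosen only for illustration; the function names mul_cps, call_cc, and product_cps are hypothetical). Once every function takes its continuation k as an explicit argument, call/cc needs no special machinery: it just wraps k in an ordinary procedure.

```python
def mul_cps(a, b, k):
    """Multiplication in CPS: pass the result to the continuation k."""
    return k(a * b)


def call_cc(f, k):
    """call/cc in CPS: give f a procedure that, when invoked, discards
    its own continuation and resumes the captured one (k) instead."""
    def reified(value, _ignored_k):
        return k(value)
    return f(reified, k)


def product_cps(numbers, k, trace):
    """Multiply all numbers, escaping immediately with 0 if one is zero.
    trace records each multiplication that actually runs."""
    def body(escape, k_body):
        def loop(i, k_loop):
            if i == len(numbers):
                return k_loop(1)
            if numbers[i] == 0:
                # Jump straight out, skipping all pending multiplications.
                return escape(0, k_loop)

            def after_rest(rest):
                trace.append((numbers[i], rest))
                return mul_cps(numbers[i], rest, k_loop)

            return loop(i + 1, after_rest)
        return loop(0, k_body)
    return call_cc(body, k)


def identity(value):
    return value


trace_full, trace_escape = [], []
full = product_cps([2, 3, 4], identity, trace_full)
escaped = product_cps([2, 0, 4], identity, trace_escape)
```

In the second call the pending multiplications are never run, so trace_escape stays empty: invoking the captured continuation abandons the work that had piled up in k_loop. Note that Python itself does not optimize tail calls, so this sketch only works for short lists; a real CPS implementation depends on the tail-call guarantee mentioned under "The language".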

Mon, 23 Jun 2014 14:05:52 +0100