"Safe, S-expression based, Similar-to-C programming"

S3C is a language concept by Christian for writing programs "in C", but with an S-expression based syntax, stronger typing, and syntax extension facilities, plus an implementation used during development that features complete safety at runtime (by using runtime type checking, invalidation of stack memory upon leaving a scope, and using destructors offered by the GC of the host language to verify correct memory handling). The idea is to then use extensive rules-based tests to try to make programs written in it secure (total absence of runtime errors in the safe implementation means absence of errors / security holes in the translated C code). This could serve as a bridge for learning to go from 'plain programming' (runtime type checking, writing tests) to verifying safety statically (using static type systems and proofs).
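To make the runtime checks described above more concrete, here is a minimal sketch in Python (standing in for the Scheme host language) of two of them: runtime type checking on stores, and invalidation of stack memory when a scope is left. All names (StackSlot, Scope, InvalidatedAccess, dangling) are hypothetical and chosen for illustration only; nothing here is taken from the S3C mockup.

```python
class InvalidatedAccess(Exception):
    """Raised when a stack slot is used after its scope has been left."""


class StackSlot:
    """One local variable; remembers its declared type and whether it is alive."""

    def __init__(self, ctype, value):
        self.ctype = ctype
        self.alive = True
        self._value = None
        self.store(value)

    def _check_alive(self):
        if not self.alive:
            raise InvalidatedAccess("stack slot used after its scope was left")

    def load(self):
        self._check_alive()
        return self._value

    def store(self, value):
        self._check_alive()
        # Runtime type check: the stored value must match the declared type.
        if type(value) is not self.ctype:
            raise TypeError(
                "expected %s, got %s" % (self.ctype.__name__, type(value).__name__)
            )
        self._value = value


class Scope:
    """Owns the slots declared inside it and invalidates them on exit."""

    def __enter__(self):
        self.slots = []
        return self

    def local(self, ctype, value):
        slot = StackSlot(ctype, value)
        self.slots.append(slot)
        return slot

    def __exit__(self, exc_type, exc, tb):
        for slot in self.slots:
            slot.alive = False
        return False  # never swallow exceptions


def dangling():
    """Analogue of returning the address of a local variable in C."""
    with Scope() as scope:
        x = scope.local(int, 42)
        return x  # the slot escapes, but is invalidated when the scope exits
```

Calling load() on the slot returned by dangling() raises InvalidatedAccess, whereas the corresponding C program would silently read stale stack memory. A rules-based test suite would then try to provoke such exceptions across many inputs.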

S3C might also be a good fit for those who haven't done C-level programming before, for learning what security gaps such programming leaves open and how to test against them thoroughly (a 'runtime counterpart' to proofs).

It might also be interesting to actually try to add some static verification (static type system) later on, although probably after doing the rest of the project first due to time constraints.

The project repository (currently just containing a mockup) is here (example).

Reasons this may be a good idea:

instead of diving right into a programming language that requires static thinking (types, proofs), the VM implementors could use one that follows the easy/familiar paradigm of runtime typing, but at the same time get a feeling of what kind of automatic testing is required to ensure that the program won't fail or be insecure. This feeling may help later on when learning statically checked programming styles.
for those not already knowing C, this would give a(n interesting) way to learn it while still being supported in working towards the security aim.
it would give a second (first, ordered by time) chance to write a simple compiler, in a different style: top-down (we already know what we want to compile and will implement it right away in a language that's stable (Scheme, excepted from the Aims as this goes into the construction of the VM, not the Scheme compiler¹)), featuring a stack, in the C backend dealing with statically declared object sizes, but not requiring GC or first-class continuations.
¹ once the Scheme compiler is ready, it could run the S3C compiler and things would come full circle.
the programming style (delayed type checking, program modification without restarts) might actually be more productive than using other languages. (Note that we'll (hopefully) have more powerful runtime debugging features than most dynamically typed programming languages offer (first-class continuations, attaching additional info to objects, macros to introduce debug points without editing the program).)
S3C may also be useful for application level programming, embedded in Scheme programs for parts that need to be fast. And it's going to be easier to convert a Scheme program into S3C than to rewrite it in C/Rust/ATS/similar.
the program could probably be translated to languages like Rust, ATS, or asm.js (or LLVM or Intel/ARM/... assembly) instead of C with minimal changes to the S3C language (while still retaining its safe implementation for added development features).
The safe implementation of S3C could allow access to Scheme objects, which would allow the VM to represent VML objects with Scheme objects instead of raw bytes, and then make use of the Scheme GC to verify that the VM doesn't leak or prematurely drop VML memory (and could allow tracing the objects' use through the execution). (Would it be possible to prove such safety statically? Using Rust: also by relying on two allocation modes (one using Rust objects, the other raw bytes)?)
(it might also be useful to work on a 'real-world' Scheme program together before doing the actual Scheme implementation on top of VML: it will use more language features than SICP does (so far), and will give those with different Scheme backgrounds a chance to exchange experiences. Contributors participating in S3C development may then also be better equipped to help those joining in the last part of the project.)
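The point above about using the host GC to verify memory handling can be sketched as follows. This is only an illustration under assumptions: Python's weakref.finalize stands in for whatever finalizer mechanism the Scheme host offers, and the names Handle, free, read and _report_if_leaked are hypothetical.

```python
import gc
import weakref


def _report_if_leaked(state, leak_log):
    # Runs when the host GC reclaims a Handle. If the program never
    # called free() on it, the corresponding C code would have leaked.
    if not state["freed"]:
        leak_log.append(state["name"])


class Handle:
    """Stands in for a manually managed allocation in the safe implementation."""

    def __init__(self, name, leak_log):
        # The state lives in a separate dict so that the finalizer does
        # not keep the Handle itself alive.
        self._state = {"name": name, "freed": False}
        weakref.finalize(self, _report_if_leaked, self._state, leak_log)

    def free(self):
        if self._state["freed"]:
            raise RuntimeError("double free of " + self._state["name"])
        self._state["freed"] = True

    def read(self):
        if self._state["freed"]:
            raise RuntimeError("use after free of " + self._state["name"])
        return self._state["name"]


def run_example():
    """Allocate two handles, free only one, and return the leak log."""
    leak_log = []
    a = Handle("a", leak_log)
    a.free()
    b = Handle("b", leak_log)  # never freed: a leak in the C translation
    del a, b
    gc.collect()  # make sure the finalizers have run
    return leak_log
```

Under these assumptions run_example() reports only the handle that was dropped without free(). Premature drops show up the other way round, as a "use after free" error raised by read().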

Reasons this may be a bad idea:

while S3C isn't difficult conceptually, it's still going to be quite a bit of work. (Two ways to allocate memory, two ways to pass it; various different integer sizes, unsigned/signed, with arithmetic operations for all of these sizes and converting between the sizes; how easy will it be to abstract sizes away?) Some corners haven't been thought through fully yet (like: will it really not matter whether objects are implemented with bytes or with Scheme objects (-> disallow pointer arithmetic)). There's a chance it might not be ready in time.
If it looks like S3C can be finished later, the VM could just be run in the S3C mockup for the time being, though. Otherwise, perhaps write the VM in PreScheme instead (which is a subset of Scheme) or even plain Scheme. (There would probably not be enough time to do it in Rust/ATS anymore at that point.)
Note that if necessary, we could save much time by dropping memory allocation / GC development and using Scheme objects to represent VML objects.
value of investment: it is somewhat likely that one encounters Rust in another context, while it is very likely that actual usage of S3C would be restricted to personal projects (except that it might be a nice way to produce C code in the context of a project built in C).
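On the question raised above of how easily integer sizes can be abstracted away, one possible approach is to describe each integer type by data (bit width and signedness) and share the arithmetic. The following Python sketch is an assumption about how a safe implementation might do this, not a description of S3C. In particular, it treats every out-of-range result as an error, also for unsigned types (where C would wrap), and it requires explicit conversions between types. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """A fixed-width integer type, described by data rather than by code."""

    bits: int
    signed: bool

    @property
    def lo(self):
        return -(1 << (self.bits - 1)) if self.signed else 0

    @property
    def hi(self):
        return (1 << (self.bits - 1)) - 1 if self.signed else (1 << self.bits) - 1

    def check(self, n):
        if not (self.lo <= n <= self.hi):
            raise OverflowError("%d does not fit into %r" % (n, self))
        return n


@dataclass(frozen=True)
class Int:
    """A value tagged with its integer type; checked on construction."""

    ctype: IntType
    value: int

    def __post_init__(self):
        self.ctype.check(self.value)

    def __add__(self, other):
        # Assumption: no implicit promotion, both operands must share a type.
        if not isinstance(other, Int) or other.ctype != self.ctype:
            raise TypeError("mixed integer types need an explicit conversion")
        return Int(self.ctype, self.value + other.value)

    def convert(self, target):
        # Range-checked conversion; raises OverflowError if the value is lost.
        return Int(target, self.value)


U8 = IntType(8, signed=False)
I8 = IntType(8, signed=True)
```

With this layout, adding a new size is a one-line definition such as IntType(16, signed=True), and only the checking rules in Int need to be written once.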

Plan

S3C only has a mock implementation so far, thus if we are to use it to implement the VM, we'll have to implement it first. This will take 2-3 months perhaps, thus VM implementation would be delayed (we might want to implement the VM at the same time as S3C, though).

But perhaps that would actually suit the group fine: SICP studying could go along at the same time, and once S3C (and perhaps also the VM) are ready, people just interested in implementing Scheme on top could start off with the benefit of some more SICP knowledge. Especially chapter 4 should be useful. Chapter 5 is about writing a compiler: perhaps it makes sense to study this chapter as part of the workshop, we/Christian will have to think about this. Or the workshop could be followed while postponing study of chapter 5 to afterwards. -> To be decided.

The timeline would entail:

meeting where Christian describes each part of the project, with room for questions and discussion to help group members to decide whether they would like to participate in the S3C and/or VM development phase.
write safe implementation of S3C (in Scheme). Perhaps write VM at the same time already. (~2 months)
write translator to C. (This could be done later assuming performance isn't too low without it.) (~1 month)
Write VM if not already done (= implement VML). This is hopefully easier for participants to join: for those already knowing C, this will just entail learning a slightly different syntax; for those not knowing C, this should be easier than learning C as it has a safe evaluation environment that will point out bugs easily.
Grow Scheme on top of VML, with a bigger or changed participant group. This part should be even easier for newcomers: nothing other than familiarity with the language to be implemented (Scheme) will be required.
The ultimate goal is reached when the system is able not only to run the Scheme compiler (which is always self-hosted), but also the S3C compiler, i.e. fully hosting itself.

Todo

These should be done before the plan is finalized.

look into PreScheme, see how suitable it would be; look at what memory manipulation operations are offered by it. (S3C would still make some sense because it could have different backends (although that could presumably be added to PreScheme, too, of course), and perhaps because it would allow for separate compilation units whereas PreScheme might be limited to whole program compilation.)
we will need memory manipulation routines even for S3C's mock implementation; take those from PreScheme or extend them.
a library in Scheme similar to QuickCheck is needed, for rules-based testing. Decide whether to write one ourselves incrementally, or look into (and adapt/extend) existing QuickCheck variant(s) in Scheme.
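As a rough idea of what the smallest useful version of such a rules-based testing library could look like, here is a sketch in Python using only the standard random module. It is deliberately much simpler than QuickCheck (no shrinking of counterexamples, no generator combinators), and every name in it is hypothetical.

```python
import random


def check_property(prop, gen, runs=200, seed=0):
    """Try prop on `runs` generated cases.

    Returns the first failing case, or None if every case passed.
    A fixed seed keeps failures reproducible.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        case = gen(rng)
        if not prop(case):
            return case
    return None


def gen_int_list(rng):
    """Generator: a list of 0 to 10 small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]


def reverse_twice_is_identity(xs):
    """A rule that holds for all lists."""
    return list(reversed(list(reversed(xs)))) == xs


def sorting_keeps_first_element(xs):
    """A deliberately wrong rule, to show a counterexample being found."""
    return sorted(xs)[:1] == xs[:1]
```

check_property(reverse_twice_is_identity, gen_int_list) returns None, while the wrong rule yields a concrete list as counterexample. For S3C the properties would instead state that a program runs in the safe implementation without raising any runtime error.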

Mon, 23 Jun 2014 14:05:52 +0100