The language (instruction set) that the VM understands.
Suggestion: a strict and impure lambda calculus (i.e. simplified Scheme) in continuation-passing style, so that the VM does not need to implement a call/continuation stack, and first-class continuations are directly available.
The VM will not analyze the programs, though, which means that variable lifetimes need to be encoded in the program (explicitely obliterate variables when they are no longer needed, and every lambda form (representing a continuation) needs to explicitely tell which variables need to be captured (kept alive)).
Variables could have names, or numbers. Numbers being more likely what actual hardware would implement. The number of variables might be limited, or we may make it unlimited.
Details to be worked out (as part of the VM implementation effort, or perhaps as an exercise?).
I've done some experiments myself already, but I think it's useful if you give it a try yourself first, then we can compare our thoughts/results. Christian.
Actual hardware would read binary representations ('byte code') for efficiency and ease of implementation. Our VM could do the same, but the question then becomes how to write our first programs for it: do we want to prove our hardcore-ness by first writing an assembler in binary? Probably rather 'cheat' and implement a cross assembler, i.e. in an existing programming language outside the VM. But that would mean that the assembler could not be used by higher levels running on the VM itself (i.e. the compiler).
Alternatively, the manufacturer of the 'machine' (our VM group) could provide an assembler for us, i.e. a program in bytecode (how will they produce it?). Or, the VM could contain a text file reader (assembler) built into it (let's assume that the process used to build the machine allows one to do this economically).
(The machine could also offer functionality ('opcodes' / procedures / functions) to emit code directly into an internal representation (whatever that is, perhaps an abstract syntax tree), that compilers would use to output code. (LLVM does offer this, aside bytecode and text formats.))
what are the minimal features for this language? What a minimal / good syntax? (An S-expression based representation will probably make sense, but perhaps another than parenthesis-based serialization would make sense here? Or, another S-expression based representation than using lambda forms, since the latter will lead to deep indentation levels.)
optional: would some kind of Forth(-like language) be a good alternative? See the 'Reverse Polish notation' exercise on Exercises.
Spoiler: see Copycat for an experimental implementation of such a language, in Scheme, from me. Christian.