There are many implementation choices with different performance characteristics. But we're not worrying about performance as long as it won't create barriers for optimizing later on.
But the collection must be precise, to fulfill the correctness aim: algorithms that treat some objects as in-use when they are not make programs that append data to children of such objects (like with functional streams) run out of memory.
Mark & sweep on top of malloc/free would be one choice, but it seems that a copying collector makes more sense: that's closer to what most systems implement, and avoids the malloc overhead[1]. It will be easy to do: we won't have big chunks of memory unless we implement vectors, too, thus there's no temptation to optimize by separating big objects into a heap.
[1] although that's a moot point if we end up allocating every Scheme object as an object in the host language anyway for safety reasons. (But how much safety could that approach give, given that allocation/deallocation would be handled by our GC, not the host language, and the behaviour of the mutator (Scheme) can't be limited or known statically?)
Simple reference counting would not work because Scheme programs can create cyclic references (through explicit mutation, but also closures with recursive definitions will probably be implemented using mutation). Hybrid variants are too complex for a first implementation.
(If we're implementing the memory layout ourselves, then) We'll probably use standard pointer tagging (differentiation between pointers and 'immediate objects' by marking the low bits not used in pointer values). The number of lowlevel object types may be low: see Objects.
The safe implementation of S3C will use destructors for memory allocation verification, thus if we get to the point where we want to run S3C on our system, we'll want to implement destructors to support S3C. (Also, perhaps weak references, they can be useful for other purposes.)