To be silly, I have added a binary representation of a Merthese code file. Here's the spec:
Header (7 bytes):
Bytes 0 - 4: "MERTH" - {0x4D, 0x45, 0x52, 0x54, 0x48}
Byte 5: Version - 0x69
Byte 6: Extension flags (bit flags, OR them together to enable several extensions):
    0b00000000 - None
    0b00000001 - Kerm
    0b00000010 - Nikky
    0b00000100 - Tev
    0b00001000 - Ashbad
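As a rough sketch, checking the header and pulling out the extension flags could look like this. The names `MerthExtensions` and `MerthHeader` are made up for illustration; only the byte values and flag bits come from the spec above.

```csharp
using System;

// Hypothetical names; the bit values are the ones listed in the spec.
[Flags]
public enum MerthExtensions : byte
{
    None   = 0b00000000,
    Kerm   = 0b00000001,
    Nikky  = 0b00000010,
    Tev    = 0b00000100,
    Ashbad = 0b00001000,
}

public static class MerthHeader
{
    private static readonly byte[] Magic = { 0x4D, 0x45, 0x52, 0x54, 0x48 }; // "MERTH"
    public const byte Version = 0x69;
    public const int Length = 7;

    // Returns true and the enabled extensions when the first 7 bytes are a valid header.
    public static bool TryParse(byte[] data, out MerthExtensions extensions)
    {
        extensions = MerthExtensions.None;
        if (data == null || data.Length < Length) return false;
        for (var i = 0; i < Magic.Length; i++)
        {
            if (data[i] != Magic[i]) return false;
        }
        if (data[5] != Version) return false;
        extensions = (MerthExtensions)data[6];
        return true;
    }
}
```

A caller would then test individual flags with something like `extensions.HasFlag(MerthExtensions.Nikky)`.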
After the header, you can start reading bit by bit, most significant bit of each byte first. Read up to and including the first 0 bit to get a vanilla Merthese command:
Merthese
0 - NOP
10 - M
110 - E
1110 - R
11110 - T
111110 - H
1111110 - Ext.
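The table above is a plain unary code: count the 1 bits before the terminating 0. Here is a minimal sketch of a decoder for the vanilla commands only. `VanillaDecoder` is a hypothetical name, the bits are assumed to be read most significant bit first (which is how the compiler further down packs them), and the sketch simply stops when it reaches an extension prefix instead of decoding it.

```csharp
using System;
using System.Collections.Generic;

public static class VanillaDecoder
{
    // Index = number of 1 bits seen before the terminating 0.
    private static readonly string[] Names = { "NOP", "M", "E", "R", "T", "H" };

    public static List<string> Decode(byte[] body)
    {
        var commands = new List<string>();
        var ones = 0;
        foreach (var b in body)
        {
            for (var bit = 7; bit >= 0; bit--)
            {
                if (((b >> bit) & 1) == 1)
                {
                    ones++;
                    continue;
                }
                if (ones >= Names.Length)
                {
                    // 1111110: the following bits belong to an extension, not handled here.
                    commands.Add("Ext.");
                    return commands;
                }
                commands.Add(Names[ones]);
                ones = 0;
            }
        }
        // Any trailing 1 bits without a terminating 0 are ignored.
        return commands;
    }
}
```

Note that a code can straddle a byte boundary, which is why the count of 1 bits is kept across bytes.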
If you've read 1111110, this signals that we're in an extension. You should then read to the next 0 bit to determine which extension, and read to the next 0 after that to get the extension code.
0 - Merthing @ Kerm
    0 - K
10 - Merthing @ Kerm w/ Nikky
    0 - N
    10 - I
    110 - Y
110 - Merthing @ Kerm w/ Nikky && tev
    0 - V
1110 - Merthing and Ashbad are awesome
    0 - A
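To make the three-part layout concrete (extension prefix, then extension selector, then extension code), here is a small sketch that builds the full bit string for each extension letter. `ExtensionCodes`, `Unary`, and `Encode` are hypothetical names; the bit patterns follow from the tables above.

```csharp
using System;

public static class ExtensionCodes
{
    private const string ExtPrefix = "1111110";

    // Unary code: n one-bits followed by a terminating zero.
    private static string Unary(int n) => new string('1', n) + "0";

    // Full bit string = extension prefix + extension selector + code within that extension.
    public static string Encode(char letter) => letter switch
    {
        'K' => ExtPrefix + Unary(0) + Unary(0), // Kerm
        'N' => ExtPrefix + Unary(1) + Unary(0), // Nikky
        'I' => ExtPrefix + Unary(1) + Unary(1),
        'Y' => ExtPrefix + Unary(1) + Unary(2),
        'V' => ExtPrefix + Unary(2) + Unary(0), // Tev
        'A' => ExtPrefix + Unary(3) + Unary(0), // Ashbad
        _ => throw new ArgumentException("Not an extension letter: " + letter),
    };
}
```

For example, `Encode('K')` gives `111111000` and `Encode('Y')` gives `111111010110`.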
After you've determined the right command, you can execute it based on which extensions are enabled. As per the Merthese definitions, overlapping letters should be determined at runtime, so if you encounter a "10" while the Kerm flag is set, you must decide at runtime which action to perform. In the case of the 'n' command (Nikky extension), you should read the next 7 bits (Merthese is ASCII-based).
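Reading the 7 bits that follow an 'n' command could be sketched like this. Two assumptions that the spec does not state outright: the 7 bits form one ASCII character, and they are stored most significant bit first like everything else. `AsciiBits` and `Read7` are hypothetical names.

```csharp
using System;

public static class AsciiBits
{
    // Reads 7 bits starting at bitOffset (bit 0 = most significant bit of data[0])
    // and returns them as an ASCII character. Assumes MSB-first ordering.
    public static char Read7(byte[] data, int bitOffset)
    {
        var value = 0;
        for (var i = 0; i < 7; i++)
        {
            var pos = bitOffset + i;
            var bit = (data[pos / 8] >> (7 - (pos % 8))) & 1;
            value = (value << 1) | bit;
        }
        return (char)value;
    }
}
```

Because the offset is in bits, the 7-bit value is free to straddle a byte boundary.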
Here's an implementation in C#
Code:
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

namespace Merthese.Compiler;

public static class Merthpiler
{
    // Compiles ASCII Merthese source into the binary format.
    // Returns the number of body bytes written (the 7-byte header is not counted).
    public static async Task<int> MerthpileAsync(Stream asciiInput, Stream binaryOutput, bool kerm, bool nikky, bool tev, bool ashbad, bool emitNops, CancellationToken cancellationToken = default)
    {
        try
        {
            var flag = 0
                | (kerm ? 0b00000001 : 0)
                | (nikky ? 0b00000010 : 0)
                | (tev ? 0b00000100 : 0)
                | (ashbad ? 0b00001000 : 0);
            // Header: "MERTH", version 0x69, extension flags.
            byte[] buffer = [0x4D, 0x45, 0x52, 0x54, 0x48, 0x69, (byte)flag];
            await binaryOutput.WriteAsync(buffer.AsMemory(0, 7), cancellationToken);
            await binaryOutput.FlushAsync(cancellationToken);
            var emittedCount = 0;
            // Pending bits, kept as a string of '0'/'1' characters.
            string emitBuffer = "";
            while (asciiInput.CanRead)
            {
                cancellationToken.ThrowIfCancellationRequested();
                var c = await asciiInput.ReadAsync(buffer.AsMemory(0, 1), cancellationToken);
                if (c <= 0)
                {
                    break;
                }
                emitBuffer += (char)buffer[0] switch
                {
                    'm' => "10",
                    'e' => "110",
                    'r' => "1110",
                    't' => "11110",
                    'h' => "111110",
                    // 1111110 = extension
                    // 11111100 = kerm
                    'k' when kerm => "111111000",
                    // 111111010 = nikky
                    'n' when nikky => "1111110100",
                    'i' when nikky => "11111101010",
                    'y' when nikky => "111111010110",
                    // 1111110110 = tev
                    'v' when tev => "11111101100",
                    // 11111101110 = ashbad
                    'a' when ashbad => "111111011100",
                    // Anything else becomes a NOP, or is skipped when emitNops is false.
                    _ => emitNops ? "0" : "",
                };
                if (emitBuffer.Length >= 8)
                {
                    buffer[0] = 0;
                    var byteCount = (emitBuffer.Length / 8) * 8;
                    for (var i = 0; i < byteCount; i++)
                    {
                        // Pack most significant bit first within each byte.
                        var fixedEndianIndex = 7 - (i % 8);
                        var bit = emitBuffer[i] == '1' ? 1 : 0;
                        buffer[0] |= (byte)(bit << fixedEndianIndex);
                        if (i % 8 == 7)
                        {
                            // A full byte has been assembled; write it out.
                            await binaryOutput.WriteAsync(buffer.AsMemory(0, 1), cancellationToken);
                            buffer[0] = 0;
                            emittedCount++;
                        }
                    }
                    emitBuffer = emitBuffer[byteCount..];
                }
                await binaryOutput.FlushAsync(cancellationToken);
            }
            if (emitBuffer.Length > 0)
            {
                // Leftover bits are right-aligned in the last byte; the leading zero bits decode as NOPs.
                buffer[0] = 0;
                for (var i = 0; i < emitBuffer.Length; i++)
                {
                    var fixedEndianIndex = (emitBuffer.Length - 1) - i;
                    var bit = emitBuffer[i] == '1' ? 1 : 0;
                    buffer[0] |= (byte)(bit << fixedEndianIndex);
                }
                await binaryOutput.WriteAsync(buffer.AsMemory(0, 1), cancellationToken);
                emittedCount++;
            }
            await binaryOutput.FlushAsync(cancellationToken);
            return emittedCount;
        }
        catch
        {
            await binaryOutput.FlushAsync(cancellationToken);
            throw;
        }
    }
}
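The bit-packing step is the fiddly part of the compiler, so here it is isolated as a tiny synchronous sketch. `BitPacker.Pack` is a hypothetical helper, not part of the compiler above. It differs in one respect: it pads the final byte with trailing zero bits, whereas the compiler right-aligns the leftover bits. Either way the padding zeros decode as NOPs.

```csharp
using System;

public static class BitPacker
{
    // Packs a string of '0'/'1' characters into bytes, most significant bit first.
    // The last byte is padded with trailing zero bits.
    public static byte[] Pack(string bits)
    {
        var bytes = new byte[(bits.Length + 7) / 8];
        for (var i = 0; i < bits.Length; i++)
        {
            if (bits[i] == '1')
            {
                bytes[i / 8] |= (byte)(1 << (7 - (i % 8)));
            }
        }
        return bytes;
    }
}
```

For instance, the source "mer" encodes to the bits `101101110`, which packs into the two bytes 0xB7 0x00.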
I'm currently working on the virtual machine that will run the compiled files, and will post a GitHub repo containing the Compiler, Runner, and a CLI wrapper for them once that's done.