Millfork: a middle-level programming language targeting 6502- and Z80-based microcomputers and home consoles
This project is maintained by KarolS
Every encoding is defined in a .tbl file with an appropriate name.
The file is looked up in the directories on the include path, first directly, then in the encoding subdirectory.
The file is a UTF-8 text file, with each line having a specific meaning.
In the specifications below, the angle brackets <> are not meant literally:
lines starting with #, ; or // are comments.
ALIAS=<another encoding name> defines this encoding to be an alias for another encoding. No other lines are allowed in the file.
NAME=<name> defines the name for this encoding. Required.
BUILTIN=<internal name> defines this encoding to be a UTF-based encoding. <internal name> may be one of UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE. If this directive is present, the only other allowed directive in the file is the NAME directive.
EOT=<xx>, where <xx> are two hex digits, defines the string terminator byte. Required, unless BUILTIN is present. There have to be two digits; EOT=0 is invalid.
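The directive rules above can be sketched as a small validator. This is an illustrative sketch, not Millfork's own parser: the function name `parse_directives` is hypothetical, and ignoring blank lines and surrounding whitespace, as well as accepting lowercase hex digits, are assumptions not stated in the text. ALIAS files and character-mapping lines are deliberately left out.

```python
import re

BUILTIN_NAMES = {"UTF-8", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE"}
COMMENT_PREFIXES = ("#", ";", "//")
TWO_HEX_DIGITS = re.compile(r"[0-9A-Fa-f]{2}")


def parse_directives(text):
    """Collect and validate NAME, BUILTIN and EOT from .tbl text (hypothetical helper)."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()  # assumption: surrounding whitespace is insignificant
        if not line or line.startswith(COMMENT_PREFIXES):
            continue  # assumption: blank lines are skipped like comments
        key, sep, value = line.partition("=")
        if sep and key in ("NAME", "BUILTIN", "EOT"):
            found[key] = value  # every other kind of line is ignored in this sketch
    if "NAME" not in found:
        raise ValueError("NAME is required")
    if "BUILTIN" in found:
        if found["BUILTIN"] not in BUILTIN_NAMES:
            raise ValueError("unknown BUILTIN encoding: " + found["BUILTIN"])
        if "EOT" in found:
            raise ValueError("only NAME may accompany BUILTIN")
    elif "EOT" not in found:
        raise ValueError("EOT is required unless BUILTIN is present")
    elif not TWO_HEX_DIGITS.fullmatch(found["EOT"]):
        raise ValueError("EOT needs exactly two hex digits")
    return found
```

For example, `parse_directives("NAME=demo\nEOT=00\n")` returns both directives, while the same text with `EOT=0` raises `ValueError`.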
lines like <xx>=<c>, where <xx> are two hex digits and <c> is either a non-whitespace character or a BMP Unicode codepoint written as U+xxxx, define the byte <xx> to correspond to character <c>. There have to be two digits; 0=@ is invalid.
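As an illustration of the single-byte form, the sketch below parses one `<xx>=<c>` line into a byte value and a character. The name `parse_byte_mapping` is hypothetical, and accepting lowercase hex digits is an assumption; the actual Millfork parser may behave differently.

```python
import re

BYTE_LINE = re.compile(r"([0-9A-Fa-f]{2})=(.+)")
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4})")


def parse_byte_mapping(line):
    """Parse '<xx>=<c>' into (byte value, character). Hypothetical helper."""
    m = BYTE_LINE.fullmatch(line)
    if m is None:
        raise ValueError("expected two hex digits, '=', then a character")
    byte = int(m.group(1), 16)
    target = m.group(2)
    cp = CODEPOINT.fullmatch(target)
    if cp is not None:
        # BMP codepoint written as U+xxxx
        return byte, chr(int(cp.group(1), 16))
    if len(target) == 1 and not target.isspace():
        # a single literal non-whitespace character
        return byte, target
    raise ValueError("right-hand side must be one character or U+xxxx")
```

With this sketch, `40=@` maps byte `0x40` to `@`, `20=U+0020` maps byte `0x20` to a space, and `0=@` is rejected.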
lines like <xx>-<xx>=<c><c><c><c>, where <c> is repeated an appropriate number of times, define characters for multiple byte values. In this kind of line, characters cannot be represented as Unicode codepoints.
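The range form can be pictured as assigning one character per consecutive byte value. The sketch below assumes the range is inclusive at both ends, so that "an appropriate number of times" means exactly `hi - lo + 1` characters; that reading and the name `expand_byte_range` are assumptions.

```python
import re

RANGE_LINE = re.compile(r"([0-9A-Fa-f]{2})-([0-9A-Fa-f]{2})=(.*)")


def expand_byte_range(line):
    """Expand '<xx>-<xx>=<c>...' into {byte value: character}. Hypothetical helper."""
    m = RANGE_LINE.fullmatch(line)
    if m is None:
        raise ValueError("expected '<xx>-<xx>=' followed by characters")
    lo, hi = int(m.group(1), 16), int(m.group(2), 16)
    chars = m.group(3)  # taken literally: U+xxxx is not interpreted in this form
    if hi < lo or len(chars) != hi - lo + 1:
        raise ValueError("character count does not match the byte range")
    return {lo + i: c for i, c in enumerate(chars)}
```

Under these assumptions, `41-44=ABCD` yields the same four mappings as the separate lines `41=A`, `42=B`, `43=C` and `44=D`.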
lines like <c>=<xx>, <c>=<xx><xx> etc. define secondary or alternate characters that are going to be represented as one or more bytes. Each byte has to be written with two digits; @=0 is invalid. Problematic characters (space, =, #, ;) can be written as Unicode codepoints U+xxxx.
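A minimal sketch of the character-to-bytes form follows. The name `parse_secondary` is hypothetical, and rejecting the problematic characters when written literally is an assumption drawn from the note about writing them as U+xxxx.

```python
import re

SECONDARY_LINE = re.compile(
    r"(U\+[0-9A-Fa-f]{4}|[^\s=#;])=((?:[0-9A-Fa-f]{2})+)"
)


def parse_secondary(line):
    """Parse '<c>=<xx>...' into (character, bytes). Hypothetical helper."""
    m = SECONDARY_LINE.fullmatch(line)
    if m is None:
        raise ValueError("expected a character, '=', then pairs of hex digits")
    left, hex_digits = m.groups()
    # A left-hand side longer than one character can only be the U+xxxx form.
    char = chr(int(left[2:], 16)) if len(left) > 1 else left
    return char, bytes.fromhex(hex_digits)
```

Here `@=40` maps `@` to a single byte, `U+0020=20` maps the space character, and `@=0` is rejected because the byte is not written with two digits.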
a line like a-z=<xx> is equivalent to lines a=<xx>, b=<xx+$01> all the way to z=<xx+$19>.
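The equivalence above amounts to walking the character range and the byte values in step. The sketch below is illustrative only: `expand_char_range` is a hypothetical name, and rejecting ranges that would run past byte `$FF` is an assumption.

```python
import re

CHAR_RANGE_LINE = re.compile(r"(.)-(.)=([0-9A-Fa-f]{2})")


def expand_char_range(line):
    """Expand 'a-z=<xx>' into {character: byte value}. Hypothetical helper."""
    m = CHAR_RANGE_LINE.fullmatch(line)
    if m is None:
        raise ValueError("expected '<c>-<c>=' followed by two hex digits")
    first, last = ord(m.group(1)), ord(m.group(2))
    base = int(m.group(3), 16)
    count = last - first + 1
    if count < 1 or base + count - 1 > 0xFF:
        raise ValueError("empty range or byte value overflow")
    # The i-th character of the range gets the byte base + i.
    return {chr(first + i): base + i for i in range(count)}
```

For `a-z=61`, this gives `a` the byte `$61` and `z` the byte `$61 + $19 = $7A`, with 26 entries in total.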
a line like KATAKANA=>DECOMPOSE means that katakana characters with dakuten or handakuten should be split into the base character and the standalone dakuten/handakuten.
HIRAGANA=>DECOMPOSE works the same way for hiragana characters.
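To illustrate what such a split looks like, the sketch below uses Unicode canonical decomposition (NFD) and then swaps the combining marks for the spacing ones. This is not necessarily how Millfork implements it: using U+309B and U+309C as the "standalone" marks is an assumption, `decompose_kana` is a hypothetical name, and the sketch handles katakana and hiragana together rather than per directive.

```python
import unicodedata

# Assumption: the standalone marks are the spacing forms U+309B and U+309C.
COMBINING_TO_STANDALONE = {
    "\u3099": "\u309B",  # combining dakuten -> spacing dakuten
    "\u309A": "\u309C",  # combining handakuten -> spacing handakuten
}


def decompose_kana(text):
    """Split voiced kana into base kana plus a standalone mark. Hypothetical helper."""
    out = []
    for ch in text:
        parts = unicodedata.normalize("NFD", ch)
        if len(parts) == 2 and parts[1] in COMBINING_TO_STANDALONE:
            out.append(parts[0])
            out.append(COMBINING_TO_STANDALONE[parts[1]])
        else:
            out.append(ch)  # characters without a (han)dakuten pass through
    return "".join(out)
```

With this sketch, ガ becomes カ followed by ゛, and ぴ becomes ひ followed by ゜, so an encoding only needs byte values for the base kana and the two marks.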
lines like {<escape code>}=<xx>, {<escape code>}=<xx><xx> etc. define escape codes. It’s a good practice to define these when possible: {q}, {apos}, {n}, {lbrace}, {rbrace}, {yen}, {pound}, {cent}, {euro}, {copy}, {pi}, {nbsp}, {shy}.
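An escape-code line can be read as a name in braces followed by one or more two-digit bytes. The sketch below shows that shape; `parse_escape` is a hypothetical name, and the byte values used in the example are purely illustrative, not taken from any real Millfork encoding.

```python
import re

ESCAPE_LINE = re.compile(r"\{([^{}]+)\}=((?:[0-9A-Fa-f]{2})+)")


def parse_escape(line):
    """Parse '{<escape code>}=<xx>...' into (name, bytes). Hypothetical helper."""
    m = ESCAPE_LINE.fullmatch(line)
    if m is None:
        raise ValueError("expected '{name}=' followed by pairs of hex digits")
    return m.group(1), bytes.fromhex(m.group(2))
```

For instance, a line such as `{n}=0D` would bind the escape code `n` to the single byte `$0D`, and `{n}=0D0A` would bind it to two bytes.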