Millfork: a middle-level programming language targeting 6502- and Z80-based microcomputers and home consoles
This project is maintained by KarolS
Every platform is defined in an .tbl
file with an appropriate name.
The file is looked up in the directories on the include path, first directly, then in the encoding
subdirectory.
TODO: document the file format.
default
– default console encoding (can be omitted)
scr
– default screencodes
(usually the same as default
, a notable exception are the Commodore computers)
ascii
– standard ASCII
petscii
or pet
– PETSCII (ASCII-like character set used by Commodore machines from VIC-20 onward)
petsciijp
or petjp
– PETSCII as used on Japanese versions of Commodore 64
origpetscii
or origpet
– old PETSCII (Commodore PET with original ROMs)
oldpetscii
or oldpet
– old PETSCII (Commodore PET with newer ROMs)
geos_de
– text encoding used by the German version of GEOS for C64
cbmscr
or petscr
– Commodore screencodes
cbmscrjp
or petscrjp
– Commodore screencodes as used on Japanese versions of Commodore 64
apple2
– original Apple II charset ($A0–$DF)
apple2e
– Apple IIe charset
apple2c
– alternative Apple IIc charset
apple2gs
– Apple IIgs charset
macroman
– Macintosh Western Latin charset
bbc
– BBC Micro character set
sinclair
– ZX Spectrum character set
zx80
– ZX80 character set
zx81
– ZX81 character set
jis
or jisx
– JIS X 0201
iso_de
, iso_no
, iso_se
, iso_yu
– various variants of ISO/IEC-646
iso_dk
, iso_fi
– aliases for iso_no
and iso_se
respectivelydmcs
– DEC Multinational Character Set
lics
– Lotus International Character Set
iso8859_1
, iso8859_2
, iso8859_3
,
iso8859_4
, iso8859_5
, iso8859_7
,
iso8859_9
, iso8859_10
, iso8859_13
,
iso8859_14
, iso8859_15
, iso8859_13
–
ISO 8859-1, ISO 8859-2, ISO 8859-3,
ISO 8859-4, ISO 8859-5, ISO 8859-7,
ISO 8859-9, ISO 8859-10, ISO 8859-13,
ISO 8859-14, ISO 8859-15, ISO 8859-16,
iso1
, latin1
– aliases for iso8859_1
iso2
, latin2
– aliases for iso8859_2
iso3
, latin3
– aliases for iso8859_3
iso4
, latin4
– aliases for iso8859_4
iso5
– alias for iso8859_5
iso7
– alias for iso8859_7
iso9
, latin5
, – aliases for iso8859_9
iso10
, latin6
– aliases for iso8859_10
iso13
, latin7
– aliases for iso8859_13
iso14
, latin8
– aliases for iso8859_14
iso_15
, latin9
, latin0
– aliases for iso8859_15
iso16
, latin10
– aliases for iso8859_16
brascii
– BraSCII
cp437
, cp850
, cp851
, cp852
, cp855
, cp858
, cp866
–
DOS codepages 437, 850, 851, 852, 855, 858, 866
mazovia
– Mazovia encoding
kamenicky
– Kamenický encoding
cp1250
, cp1251
, cp1252
, cp1253
, cp1254
, cp1257
– Windows codepages 1250, 1251, 1252, 1253, 1254, 1257
msx_intl
, msx_jp
, msx_ru
, msx_br
– MSX character encoding, International, Japanese, Russian and Brazilian respectively
msx_us
, msx_uk
, msx_fr
, msx_de
– aliases for msx_intl
cpc_en
, cpc_fr
, cpc_es
, cpc_da
– Amstrad CPC character encoding, English, French, Spanish and Danish respectively
pcw
or amstrad_cpm
– Amstrad CP/M encoding, the US variant (language 0), as used on PCW machines
pokemon1en
, pokemon1jp
, pokemon1es
, pokemon1fr
– text encodings used in 1st generation Pokémon games,
English, Japanese, Spanish/Italian and French/German respectively
pokemon1it
, pokemon1de
– aliases for pokemon1es
and pokemon1fr
respectivelyatascii
or atari
– ATASCII as seen on Atari 8-bit computers
atasciiscr
or atariscr
– screencodes used by Atari 8-bit computers
koi7n2
or short_koi
– KOI-7 N2
koi8r
, koi8u
, koi8ru
, koi8e
, koi8f
, koi8t
– various variants of KOI-8
vectrex
– built-in Vectrex font
galaksija
– text encoding used on Galaksija computers
trs80m1
– text encoding used on TRS-80 Model 1
trs80m3
– text encoding used on TRS-80 Model 3
coco
– text encoding used on Tandy Color Computer
cocoscr
– Tandy Color Computer screencodes
z1013
– text encodind used on Robotron Z1013
ebcdic
– EBCDIC codepage 037 (partial coverage)
utf8
– UTF-8
utf16be
, utf16le
– UTF-16BE and UTF-16LE
utf32be
, utf32le
– UTF-32BE and UTF-32LE
When programming for Commodore,
use petscii
for strings you’re printing using standard I/O routines
and petsciiscr
for strings you’re copying to screen memory directly.
When programming for Atari,
use atascii
for strings you’re printing using standard I/O routines
and atasciiscr
for strings you’re copying to screen memory directly.
Escape sequences allow for including characters in the string literals that would be otherwise impossible to type.
Some escape sequences may expand to multiple characters. For example, in several encodings {n}
expands to {x0D}{x0A}
.
{x00}
–{xff}
– a character of the given hexadecimal value
{copyright_year}
– this expands to the current year in digits
{program_name}
– this expands to the name of the output file without the file extension
{program_name_upper}
– the same, but uppercased
{nullchar}
– the null terminator for strings ("{nullchar}"
is equivalent to ""z
).
The exact value of {nullchar}
is encoding-dependent:
vectrex
encoding it’s {x80}
,zx80
encoding it’s {x01}
,zx81
encoding it’s {x0b}
,petscr
and petscrjp
encodings it’s {xe0}
,apple2e
encoding it’s {x7f}
,atasciiscr
encoding it’s {xdb}
,pokemon1*
encodings it’s {x50}
,cocoscr
encoding it’s exceptionally two bytes: {xd0}
utf16be
and utf16le
encodings it’s exceptionally two bytes: {x00}{x00}
utf32be
and utf32le
encodings it’s exceptionally four bytes: {x00}{x00}{x00}{x00}
{x00}
(this may be a subject to change in future versions).{apos}
– apostrophe/single quote (available everywhere except for zx80
, zx81
and galaksija
)
{q}
– double quote symbol (available everywhere except for pokemon1*
encodings)
{n}
– new line
{b}
– backspace
{lbrace}
, {rbrace}
– opening and closing curly brace (only in encodings that support braces)
{up}
, {down}
, {left}
, {right}
– control codes for moving the cursor
{white}
, {black}
, {red}
, {green}
, {blue}
, {cyan}
, {yellow}
, {purple}
–
control codes for changing the text color (petscii
, petsciijp
, sinclair
only)
{bgwhite}
, {bgblack}
, {bgred}
, {bggreen}
, {bgblue}
, {bgcyan}
, {bgyellow}
, {bgpurple}
–
control codes for changing the text background color (sinclair
only)
{reverse}
, {reverseoff}
– inverted mode on/off
{yen}
, {pound}
, {cent}
, {euro}
, {copy}
– yen symbol, pound symbol, cent symbol, euro symbol, copyright symbol
{nbsp}
, {shy}
– non-breaking space, soft hyphen
{pi}
– letter π
{u0000}
–{u1fffff}
– Unicode codepoint (available in UTF encodings only)
For ISO/DOS/Windows/UTF encodings, consult external sources.
Encoding | lowercase letters | backslash | currencies | intl | card suits |
---|---|---|---|---|---|
pet , |
yes¹ | no | £ | none | yes¹ |
origpet |
yes¹ | yes | none | yes¹ | |
oldpet |
yes² | yes | none | yes² | |
petscr |
yes¹ | no | £ | none | yes¹ |
petjp |
no | no | ¥ | katakana³ | yes³ |
petscrjp |
no | no | ¥ | katakana³ | yes³ |
geos_de |
yes | no | no | ||
sinclair , bbc |
yes | yes | £ | none | no |
zx80 , zx81 |
no | no | £ | none | no |
apple2 |
no | yes | none | no | |
atascii |
yes | yes | none | yes | |
atasciiscr |
yes | yes | none | yes | |
z1013 |
yes | yes | none | yes | |
jis |
yes | no | ¥ | both kana | no |
dmcs ,lics |
yes | yes | ¢£¥ | Western | no |
brascii ,macroman |
yes | yes | ¢£¥ | Western | no |
msx_intl ,msx_br |
yes | yes | ¢£¥ | Western | yes |
msx_jp |
yes | no | ¥ | katakana | yes |
msx_ru |
yes | yes | Russian⁴ | yes | |
koi7n2 |
no | yes | Russian⁵ | no | |
koi8* |
yes | yes | Russian | no | |
cpc_en |
yes | yes | £ | none | yes |
cpc_es |
yes | yes | Spanish⁶ | yes | |
cpc_fr |
yes | no | £ | French⁷ | yes |
cpc_da |
yes | no | £ | Nor/Dan. | yes |
vectrex |
no | yes | none | no | |
coco ,cocoscr |
no | yes | none | no | |
pokemon1jp |
no | no | both kana | no | |
pokemon1en |
yes | no | none | no | |
pokemon1fr |
yes | no | Ger/Fre. | no | |
pokemon1es |
yes | no | Spa/Ita. | no | |
galaksija |
no | no | Yugoslav⁸ | no |
pet
, origpet
and petscr
cannot display card suit symbols and lowercase letters at the same time.
Card suit symbols are only available in graphics mode,
in which lowercase letters are displayed as uppercase and uppercase letters are displayed as symbols.
oldpet
cannot display card suit symbols and lowercase letters at the same time.
Card suit symbols are only available in graphics mode, in which lowercase letters are displayed as symbols.
petjp
and petscrjp
cannot display card suit symbols and katakana at the same time.
Card suit symbols are only available in graphics mode, in which katakana is displayed as symbols.
Letter Ё and uppercase Ъ are not available.
Only uppercase. Letters Ё and Ъ are not available.
No accented vowels.
Some accented vowels are not available.
Letter Đ is not available.
If the encoding does not support lowercase letters (e.g. apple2
, petjp
, petscrjp
, koi7n2
, vectrex
),
then text and character literals containing lowercase letters are automatically converted to uppercase.
Only unaccented Latin and Cyrillic letters will be converted as such.
Accented Latin letters will not be converted and will fail to compile without -flenient-encoding
.
To detect if your default encoding does not support lowercase letters, test 'A' == 'a'
.
The table below may be incomplete.
Encoding | new line | braces | backspace | cursor movement | text colour | reverse | background colour |
---|---|---|---|---|---|---|---|
pet ,petjp |
yes | no | no | yes | yes | yes | no |
origpet |
yes | no | no | yes | no | yes | no |
oldpet |
yes | no | no | yes | no | yes | no |
petscr , petscrjp |
no | no | no | no | no | no | no |
geos_de |
no | no | no | no | no | yes | no |
sinclair |
yes | yes | no | yes | yes | yes | yes |
zx80 ,zx81 |
yes | no | yes | yes | no | no | no |
ascii , iso_* |
yes | yes | yes | no | no | no | no |
iso8869_* , cp* |
yes | yes | yes | no | no | no | no |
apple2 |
no | yes | no | no | no | no | no |
apple2 |
no | no | no | no | no | no | no |
apple2e |
no | yes | no | no | no | no | no |
apple2gs |
no | yes | no | no | no | no | no |
atascii |
yes | no | yes | yes | no | no | no |
atasciiscr |
no | no | no | no | no | no | no |
msx_* |
yes | yes | yes | yes | no | no | no |
koi7n2 |
yes | no | yes | no | no | no | no |
koi8* |
yes | yes | yes | no | no | no | no |
vectrex |
no | no | no | no | no | no | no |
coco |
yes | no | yes | no | no | no | no |
cocoscr |
no | no | no | no | no | no | no |
utf* |
yes | yes | yes | no | no | no | no |
all the rest | yes | yes | no | no | no | no | no |