Millfork: a middle-level programming language targeting 6502- and Z80-based microcomputers and home consoles
This project is maintained by KarolS
Decimal: 1, 10
Binary: %0101, 0b101001
Quaternary: 0q2131
Octal: 0o172
Hexadecimal: $D323, 0x2a2
Digits can be separated by underscores for readability. Underscores are also allowed between the radix prefix and the digits:
123_456
0x01_ff
0b_0101_1111__1100_0001
$___________FF
When using Intel syntax for inline assembly, another hexadecimal syntax is available: 0D323H, 2a2h, 3e_h.
It is not allowed anywhere else.
The type of a literal is the smallest type of undefined signedness that can fit either the unsigned or signed representation of the value: 200 is a byte, 4000 is a word, 75000 is an int24, etc.
However, padding the literal on the left with zeroes changes the type to the smallest type that can fit the smallest number with the same number of digits and without padding.
For example, 0002 is of type word, as 1000 does not fit in one byte.
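The sizing rule above can be modelled in a few lines. The following is a sketch in Python, not Millfork code and not the compiler's implementation. The function name `literal_size_in_bytes` is hypothetical, only decimal literals are handled, and it assumes that underscores do not count as digits.

```python
def literal_size_in_bytes(text: str) -> int:
    """Model of the literal typing rule for decimal literals only.

    Returns the size in bytes of the inferred type:
    1 = byte, 2 = word, 3 = int24, and so on.
    Assumption: underscores are ignored when counting digits.
    """
    digits = text.replace("_", "")
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("only decimal literals are modelled here")

    value = int(digits)
    # A zero-padded literal is sized as the smallest unpadded number
    # with the same digit count, e.g. 0002 is sized like 1000.
    if len(digits) > 1 and digits[0] == "0":
        value = 10 ** (len(digits) - 1)

    size = 1
    while value >= 256 ** size:
        size += 1
    return size
```

Under this model, `literal_size_in_bytes("200")` gives 1 (byte), while `literal_size_in_bytes("0002")` gives 2 (word), matching the examples in the text.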
String literals can be used as either array initializers or expressions of type pointer.
String literals are equivalent to constant arrays. Writing to them via their pointer is undefined behaviour.
If a string literal is used as an expression, then the text data will be located in the default code segment, regardless of which code segment the current function is located in. This may be subject to change in future releases.
String literals are surrounded with double quotes and optionally followed by the name of the encoding:
"this is a string" ascii
"this is also a string"
If no encoding name is specified, then the default encoding is used.
Two encoding names are special and refer to platform-specific encodings: default and scr.
You can also append z to the name of the encoding to make the string zero-terminated.
This means that the string will have a string terminator appended, usually a single byte.
The exact value of that byte is encoding-dependent:
in the vectrex encoding it’s 128,
in the zx80 encoding it’s 1,
in the zx81 encoding it’s 11,
in the petscr and petscrjp encodings it’s 224,
in the atascii encoding it’s 219,
in the utf16be and utf16le encodings it’s exceptionally two bytes: 0, 0,
in other encodings it’s 0 (this may be subject to change in future versions).
"this is a zero-terminated string" asciiz
"this is also a zero-terminated string"z
The byte constant nullchar is defined to be equal to the string terminator in the default encoding (or, in other words, to '{nullchar}'), and the byte constant nullchar_scr is defined to be equal to the string terminator in the scr encoding ('{nullchar}'scr).
You can override the values of nullchar and nullchar_scr by defining the preprocessor features NULLCHAR and NULLCHAR_SCR respectively.
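The terminator values listed above can be summarised as a lookup table. This is a Python sketch for illustration only. The names `TERMINATORS` and `terminator_for` are hypothetical and the table contains nothing beyond the values stated in the text.

```python
# Terminator bytes per encoding, as listed in the text above.
TERMINATORS = {
    "vectrex": [128],
    "zx80": [1],
    "zx81": [11],
    "petscr": [224],
    "petscrjp": [224],
    "atascii": [219],
    "utf16be": [0, 0],  # exceptionally two bytes
    "utf16le": [0, 0],
}


def terminator_for(encoding: str) -> list:
    """Return the terminator bytes for an encoding; other encodings use 0."""
    return list(TERMINATORS.get(encoding, [0]))
```

For example, `terminator_for("ascii")` falls through to the default single zero byte, while `terminator_for("utf16le")` yields two zero bytes.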
Warning: If you define UTF-16 or UTF-32 to be your default or screen encoding, you will encounter several problems:
nullchar and nullchar_scr will still be bytes, equal to zero
the string module in the Millfork standard library will not work correctly

You can also prepend p to the name of the encoding to make the string length-prefixed.
The length is measured in bytes and doesn’t include the zero terminator, if present. In all encodings except for UTF-16 and UTF-32 the prefix takes one byte, which means that length-prefixed strings cannot be longer than 255 bytes.
In case of UTF-16, the length prefix contains the number of code units, so the number of bytes divided by two, which allows for strings of practically unlimited length.
The length is stored as two bytes and is always little endian, even in case of the utf16be encoding or a big-endian processor.
In case of UTF-32, the length prefix contains the number of Unicode codepoints, so the number of bytes divided by four.
The length is stored as four bytes and is always little endian, even in case of the utf32be encoding or a big-endian processor.
"this is a Pascal string" pascii
"this is also a Pascal string"p
"this is a zero-terminated Pascal string"pz
Note: A string that’s both length-prefixed and zero-terminated does not count as a normal zero-terminated string! To pass it to a function that expects a zero-terminated string, add 1 (or, in case of UTF-16, 2, or UTF-32, 4):
pointer p
p = "test"pz
// putstrz(p) // won't work correctly
putstrz(p + 1) // ok
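To see why the pointer has to be advanced, it helps to look at the bytes. Here is a Python sketch of the layout for a single-byte encoding. The function `layout` is hypothetical, plain ASCII stands in for the encoding, and the terminator is assumed to be a single zero byte.

```python
def layout(text: str, length_prefixed: bool = False,
           zero_terminated: bool = False) -> bytes:
    """Model the in-memory bytes of a string literal (ASCII only)."""
    body = text.encode("ascii")
    prefix = b""
    if length_prefixed:
        # One-byte prefix: the length excludes the terminator
        # and cannot exceed 255.
        if len(body) > 255:
            raise ValueError("length-prefixed strings are limited to 255 bytes")
        prefix = bytes([len(body)])
    suffix = b"\x00" if zero_terminated else b""
    return prefix + body + suffix


pz = layout("test", length_prefixed=True, zero_terminated=True)
# pz starts with the length byte 4; skipping it leaves a plain
# zero-terminated string, which is what p + 1 points at.
after_prefix = pz[1:]
```

In this model `pz` is `b"\x04test\x00"` and `after_prefix` is `b"test\x00"`, which is the data a function expecting a zero-terminated string needs to see.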
Most characters between the quotes are interpreted literally.
To allow characters that cannot be inserted normally,
each encoding may define escape sequences.
Every encoding is guaranteed to support at least {q} for double quote and {apos} for single quote/apostrophe.
The number of bytes used to represent given characters may differ from the number of the characters.
For example, the petjp, msx_jp and jis encodings represent ポ as two separate characters, and therefore two bytes.
For the list of all text encodings and escape sequences, see this page.
In some encodings, multiple characters are mapped to the same byte value, for compatibility with multiple variants.
If the characters in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option -flenient-encoding is used, then literals using the default and scr encodings replace unsupported characters with supported ones and skip unsupported escape sequences, and a warning is issued.
For example, if -flenient-encoding is enabled, then the literal "£¥↑ž©ß" is equivalent to:
"£Y↑z(C)ss" if the default encoding is pet
"£Y↑z©ss" if the default encoding is bbc
"?Y^z(C)ss" if the default encoding is ascii
"?Y^ž(C)ss" if the default encoding is iso_yu
"?Y^z(C)ß" if the default encoding is iso_de
"?¥^z(C)ss" if the default encoding is jisx
"£¥^z(C)β" if the default encoding is msx_intl
Note that the final length of the string may vary.
Character literals are surrounded by single quotes and optionally followed by the name of the encoding:
'x' ascii
'W'
Character literals have to be separated from preceding operators with whitespace:
a='a' // wrong
a = 'a' // ok
From the type system point of view, they are constants of type byte.
If the character cannot be represented as one byte, an error is raised.
For the list of all text encodings and escape sequences, see this page.
If the character in the literal cannot be encoded in a particular encoding, an error is raised.
However, if the command-line option -flenient-encoding is used, then literals using the default and scr encodings replace unsupported characters with supported ones.
If the replacement is one character long, only a warning is issued, otherwise an error is raised.
You can create a constant of a given struct type by listing constant values of fields as arguments:
struct point { word x, word y }
point(5,6)
An array is initialized with either:
(only byte arrays) a string literal
(only byte arrays) a file expression
a for-style expression
(only byte arrays) a format, followed by an array initializer:
    @word_le: for every term of the array initializer, emit two bytes, first being the low byte of the value, second being the high byte: @word_le [$1122] is equivalent to [$22, $11]
    @word_be: like the above, but in the opposite order: @word_be [$1122] is equivalent to [$11, $22]
    @word: equivalent to @word_le on little-endian architectures and @word_be on big-endian architectures
    @long, @long_le, @long_be: similar, but with four bytes: @long_le [$11223344] is equivalent to [$44, $33, $22, $11] and @long_be [$11223344] is equivalent to [$11, $22, $33, $44]
    @struct: every term of the initializer is interpreted as a struct constructor (see below) and treated as a list of bytes with no padding: @struct [s(1, 2)] is equivalent to [1, 2] when struct s {byte x, byte y} is defined, and @struct [s2(1, 2), s2(3, 4)] is equivalent to [1, 0, 2, 0, 3, 0, 4, 0] on little-endian machines when struct s2 {word x, word y} is defined
a list of literals and/or other array initializers, surrounded by brackets:
array a = [1, 2]
array b = "----" scr
array c = ["hello world!" ascii, 13]
array d = file("d.bin")
array d1 = file("d.bin", 128)
array e = file("d.bin", 128, 256)
array f = for x,0,until,8 [x * 3 + 5] // equivalent to [5, 8, 11, 14, 17, 20, 23, 26]
array(point) g = [point(2,3), point(5,6)]
array(point) i = for x,0,until,100 [point(x, x+1)]
Trailing commas ([1, 2,]) are not allowed.
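The byte order produced by the @word_le, @word_be, @long_le and @long_be formats can be checked with a short model. This is a Python sketch and not Millfork code. The function `expand_format` is hypothetical, taking the element width in bytes and the byte order as parameters.

```python
def expand_format(values, width: int, little_endian: bool) -> list:
    """Flatten each value into `width` bytes in the requested byte order."""
    order = "little" if little_endian else "big"
    out = []
    for value in values:
        # int.to_bytes raises OverflowError if the value does not fit.
        out.extend(value.to_bytes(width, order))
    return out


word_le = expand_format([0x1122], 2, little_endian=True)       # [$22, $11]
word_be = expand_format([0x1122], 2, little_endian=False)      # [$11, $22]
long_le = expand_format([0x11223344], 4, little_endian=True)   # [$44, $33, $22, $11]
# The fields of s2(1, 2), s2(3, 4) laid out as little-endian words:
struct_words = expand_format([1, 2, 3, 4], 2, little_endian=True)
```

The last line mirrors the @struct example with word fields: with no padding between fields, the result is the same as expanding each field in declaration order.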
String literals are laid out in the arrays as-is, flat.
To have an array of pointers to strings, wrap each string in pointer(...):
// a.length = 12; identical to [$48, $45, $4C, $4C, $4F, 0, $57, $4F, $52, $4C, $44, 0]
array a = [ "hello"z, "world"z ]
// b.length = 2
array(pointer) b = [ pointer("hello"z), pointer("world"z) ]
The parameters for file are: file path, optional start offset, optional length (if only two parameters are present, then the second one is assumed to be the start offset).
The file expression is expanded at compile time to an array of bytes equal to the bytes contained in the file.
If the start offset is present, then that many bytes at the start of the file are skipped.
If the length is present, then only that many bytes are taken; otherwise, all bytes until the end of the file are taken.
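The offset and length semantics amount to a slice of the file contents. A Python sketch follows, operating on in-memory bytes rather than a real file. The function `file_slice` is hypothetical, and how the compiler treats an offset or length past the end of the file is not covered by the text, so it is not modelled here.

```python
def file_slice(contents: bytes, start: int = 0, length=None) -> bytes:
    """Model which bytes a file expression would take from `contents`."""
    tail = contents[start:]          # skip `start` bytes
    if length is None:
        return tail                  # take everything up to the end
    return tail[:length]             # take only `length` bytes


data = bytes(range(10))              # stand-in for the contents of d.bin
whole = file_slice(data)             # like file("d.bin")
from_offset = file_slice(data, 2)    # like file("d.bin", 2)
window = file_slice(data, 2, 3)      # like file("d.bin", 2, 3)
```

Here `from_offset` holds bytes 2 through 9 and `window` holds bytes 2, 3 and 4.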
The for-style expression has a variable, a starting index, a direction, a final index, and a parameterizable array initializer.
The initializer is repeated for every value of the variable in the given range.
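The expansion can be pictured as a loop that concatenates one copy of the initializer per value of the variable. The Python sketch below models only the until direction, which is the one used in the examples above. The function `for_until` is hypothetical, and it assumes that until excludes the final index, as the eight-element result in the example implies.

```python
def for_until(start: int, end: int, initializer) -> list:
    """Repeat `initializer(x)` for x = start .. end - 1 and flatten the result."""
    out = []
    for x in range(start, end):
        out.extend(initializer(x))
    return out


# Model of: for x,0,until,8 [x * 3 + 5]
table = for_until(0, 8, lambda x: [x * 3 + 5])
```

This reproduces the list given in the example, [5, 8, 11, 14, 17, 20, 23, 26].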
Struct constructors look like a function call, where the callee name is the name of the struct type
and the parameters are the values of fields in the order of declaration.
Fields of arithmetic, pointer and enum types are declared using normal expressions.
Fields of struct types are declared using struct constructors.
Fields of union types cannot be declared.
What might be useful is the fact that the compiler allows for certain built-in functions in constant expressions only:
sin(x, n): returns n·sin(xπ/128)
cos(x, n): returns n·cos(xπ/128)
tan(x, n): returns n·tan(xπ/128)
min(x,...): returns the smallest argument
max(x,...): returns the largest argument
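The trigonometric formulas use an angle unit in which a full turn is 256 steps, since x = 256 gives 256π/128 = 2π. The Python sketch below evaluates the formulas as written. The functions `const_sin` and `const_cos` are hypothetical stand-ins, and the results are left as floating-point numbers because the text does not say how the compiler rounds them.

```python
import math


def const_sin(x: float, n: float) -> float:
    """n * sin(x * pi / 128), as given in the text; rounding is not modelled."""
    return n * math.sin(x * math.pi / 128)


def const_cos(x: float, n: float) -> float:
    """n * cos(x * pi / 128), as given in the text; rounding is not modelled."""
    return n * math.cos(x * math.pi / 128)


quarter_turn = const_sin(64, 100)   # 64 steps is a quarter turn, so sin is 1
start = const_cos(0, 127)           # cos(0) is 1
```

A typical use of such functions is filling a lookup table at compile time, for example by combining sin(x, n) with a for-style array initializer.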