Millfork: a middle-level programming language targeting 6502- and Z80-based microcomputers and home consoles
This project is maintained by KarolS
The default options provide literally no optimizations.
Consider using at least -O1 for quick compilation and -O4 for release builds.
Inlining can drastically improve performance. Add -finline to the command line.
If you’re not using self-modifying code or code generation, enabling interprocedural optimizations (-fipo) and stdlib optimizations (-foptimize-stdlib) can also help.
For convenience, all options useful for debug builds can be enabled with -Xd, and for release builds with -Xr.
6502 only: If you are sure the target will have a CPU that supports so-called illegal/undocumented 6502 instructions, consider adding the -fillegals option. Good examples of such targets are NES and C64.
Consider adding align(fast) or even align(256) to arrays which you want to access quickly.
6502 only: Consider adding align(fast) to the hottest functions.
If you have an array of structs, consider adding align(X) to the definition of the struct, where X is a power of two. Even if this makes the struct 12 bytes instead of 11, it can still improve performance.
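The alignment tips above could look roughly like the following fragment. It is only a sketch: sine_table, fast_sine, enemy and enemies are hypothetical names, and the sizes are arbitrary.

```millfork
// Hypothetical declarations; names and sizes are for illustration only.

// A 256-byte lookup table aligned to a 256-byte boundary.
array sine_table[256] align(256)

// 6502: a frequently called function aligned for speed.
byte fast_sine(byte angle) align(fast) {
    return sine_table[angle]
}

// A struct with 3 bytes of fields; align(4) should pad it to 4 bytes.
struct enemy align(4) {
    byte x
    byte y
    byte hp
}

// An array of those structs, itself aligned for fast access.
array(enemy) enemies[8] align(fast)
```

Whether the padding pays off depends on how the array is indexed, so it is worth comparing the generated code with and without the align clause.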
Use the smallest type you need. Note that Millfork supports integers of any size from 1 to 16 bytes.
Consider using multiple arrays instead of arrays of structs.
Avoid reusing temporary variables. Giving each temporary value its own variable makes it easier for the optimizer to eliminate the variable entirely.
Mark the most frequently used local variables as register. This increases the chance that those variables, and not the less frequently used ones, are inlined into registers or put in the zeropage.
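As a rough sketch of two of the variable tips above (separate arrays instead of an array of structs, and a register local), assuming hypothetical names throughout:

```millfork
// Hypothetical example; all names are made up for illustration.

// Array of structs: the fields of each element are interleaved in memory.
struct enemy {
    byte x
    byte y
    byte hp
}
array(enemy) enemies[8]

// Alternative: one byte array per field, indexed by the same enemy number.
array enemy_x[8]
array enemy_y[8]
array enemy_hp[8]

void move_enemies_right() {
    // Hint that i is the most frequently used local in this function.
    register byte i
    for i,0,until,8 {
        enemy_x[i] += 1
    }
}
```

Both layouts are declared here only for comparison; a real program would pick one.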
Write many functions with no parameters and use -finline. This simplifies the job for the optimizer and increases the chance that certain powerful optimizations apply.
Avoid passing many parameters to functions. Try to minimize the number of bytes passed as parameters and returned as return values.
In for loops that use a byte-sized variable and whose body does not involve function calls or further loops, use a unique iteration variable. Such a variable has a better chance of being stored in a CPU register.
For example:
byte i
byte j
for i,0,until,30 { .... }
for j,0,until,40 { .... }
is usually better than:
byte i
for i,0,until,30 { .... }
for i,0,until,40 { .... }
8080/Z80 only: The previous tip also applies to loops using word-sized variables.
When the iteration order is not important, use paralleluntil or parallelto. The compiler will try to choose the optimal iteration order.
Since 0.3.18: When the iteration order is not important, use for ix,ptr:array to iterate over arrays of structs.
6502 only: When iterating over an array larger than 256 bytes whose element count is a composite number, consider splitting it into slices smaller than 256 bytes and accessing all of them within the same iteration. For example, instead of:
word i
for i,0,paralleluntil,1000 {
    screen[i] = ' 'scr
}
consider:
byte i
for i,0,paralleluntil,250 {
    screen[i+000] = ' 'scr
    screen[i+250] = ' 'scr
    screen[i+500] = ' 'scr
    screen[i+750] = ' 'scr
}
Note that the compiler might do this optimization automatically for simpler loops with certain iteration ranges, but it is not guaranteed.
Avoid 16-bit arithmetic. Try to keep calculations 8-bit for as long as you can. If you can calculate the upper and lower byte of a 16-bit value separately, it’s usually better to do so.
Avoid arithmetic larger than 16-bit.
Use nonet if you are sure that the result of shifting will fit into 9 bits.
Use nonet when doing byte addition that you want to promote to a word.
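The arithmetic tips above might be applied as in the sketch below. The function names are hypothetical, and the last function assumes that word variables expose their bytes as .lo and .hi.

```millfork
// Hypothetical helper functions; names are illustrative only.

// Adding two bytes: the sum needs at most 9 bits.
word add_bytes(byte a, byte b) {
    return nonet(a + b)
}

// Shifting a byte left by one: the result needs at most 9 bits.
word double_byte(byte a) {
    return nonet(a << 1)
}

// Building a 16-bit value from two halves computed separately as bytes.
// Assumes the .lo and .hi byte accessors on word variables.
word combine(byte high, byte low) {
    word w
    w.hi = high
    w.lo = low
    return w
}
```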