Meisaka Wave2 Vector RISC CPU
Address | Register | +0 | +1 | +2 | +3
---|---|---|---|---|---
0x0000 | c0 | x | y | z | w
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
0x001c | c7 | x | y | z | w
0x0020 | r0 | x | y | z | w
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
0x0038 | r6 | x | y | z | w
0x003c | ri | PC | y | z | w
0x0040 ⋮ 0x00ff | Private Memory Region | | | |
⋮ | ⋮ | | | |
See Memory for the full map.
Architecture
The Wave2 CPU has 8 constant registers (c0 to c7) and 8 general purpose registers (r0 to r6, plus ri, whose x word holds the program counter). IO is memory mapped with some modularity.
Wave2's architecture is designed such that most operations are SIMD, affecting all four words of their respective vectors.
In addition, it is a partially-sandboxed multi-user simulation. Each user has multiple cores, and all users and their cores are executed concurrently. Cores can run code from anywhere in memory, including the shared memory region.
The bytes in memory are stored in little-endian order. Bytes from user writes, such as from chat or when loading binaries, are interpreted as big-endian, and then written to memory in little-endian.
The vector words are also in little-endian order, such that the least significant component, x, is the first in memory. The memory order begins at the least significant byte, with the least significant word first.
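The following Rust sketch makes the byte ordering concrete. The helper names are hypothetical. It assumes a vector is held as four u16 words in x, y, z, w order. It lays that vector out as memory bytes and parses a word typed as big-endian hex, as a user write would be.

```rust
/// Lay out a 4-word vector [x, y, z, w] as little-endian bytes:
/// x comes first in memory, and each word stores its low byte first.
fn vector_to_bytes(v: [u16; 4]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (i, word) in v.iter().enumerate() {
        let le = word.to_le_bytes();
        out[2 * i] = le[0];
        out[2 * i + 1] = le[1];
    }
    out
}

/// Interpret a user-supplied hex word (written big-endian, e.g. "ABCD")
/// as a 16-bit value; returns None for invalid input.
fn parse_user_word(text: &str) -> Option<u16> {
    u16::from_str_radix(text, 16).ok()
}
```

With the example values above, the vector [0xABCD, 0xEF39, 0, 0] would occupy the bytes CD AB 39 EF 00 00 00 00.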
All CPU registers are SIMD vectors, holding a quartet of 16-bit words.
Some special instructions can operate on vectors as an octuplet of 8-bit words.
Memory is addressed by 16-bit words. Each memory address maps to a single word within its vector.
For example:
0x0000 => 0xABCD
0x0001 => 0xEF39
See Memory for more on the memory and its layout.
SIMD Architecture
The Wave2 CPU architecture is "SIMD by default" where most operations are inherently SIMD.
Every register is a SIMD vector of 4 words, and most instructions operate on all four words of both the source and destination register.
Additionally, while loading and storing memory can be performed one word at a time, it's also possible to atomically load and store entire vectors as long as they are stored with proper 4-word alignment.
Instructions that are said to operate on bytes instead of words are also SIMD, and treat the vector as eight 8-bit words for that operation.
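The difference between the word view and the byte view of the same register can be sketched in Rust as below. The function names are hypothetical. Wrapping on overflow is an assumption, because this page does not state the overflow behaviour of the add instructions.

```rust
type Vector = [u16; 4];

/// 16-bit SIMD add: each of the four words is added independently,
/// wrapping on overflow (assumed).
fn add16(a: Vector, b: Vector) -> Vector {
    let mut out = [0u16; 4];
    for i in 0..4 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// 8-bit SIMD add: the same 64 bits are treated as eight bytes, so a
/// carry out of a low byte never reaches the byte above it.
fn add8(a: Vector, b: Vector) -> Vector {
    let mut out = [0u16; 4];
    for i in 0..4 {
        let [alo, ahi] = a[i].to_le_bytes();
        let [blo, bhi] = b[i].to_le_bytes();
        out[i] = u16::from_le_bytes([alo.wrapping_add(blo), ahi.wrapping_add(bhi)]);
    }
    out
}
```

For example, adding 0x0001 to 0x00ff gives 0x0100 as a word operation, but 0x0000 as a byte operation, because the low byte wraps without carrying.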
Swizzling
Vector swizzling generally refers to the re-ordering of the components of a vector. Wave2 can perform vector swizzling both directly on a register to alter the vector's contents, or when specifying the vector components as part of an instruction.
Swizzle Instruction
Altering the content of a register.
The Swizzle instruction can be used to rearrange a vector's components arbitrarily. One use case might be altering which of the components in a vector is the least significant component, useful for when an instruction operates on only that value.
See the page on the Swizzle instruction for more.
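Below is a minimal Rust model of the swizzle rule. It assumes each letter of a four-letter pattern picks the source component for the corresponding output lane. Under that assumption, ri.xxyz reproduces the "push pc" behaviour listed in the instruction encoding examples: y takes x, z takes y, and w takes z. The helper is hypothetical.

```rust
/// Apply a 4-letter swizzle pattern such as "xxyz": output lane i takes
/// the input component named by the i-th letter. Returns None if the
/// pattern is not exactly four letters drawn from x, y, z, w.
fn swizzle(v: [u16; 4], pattern: &str) -> Option<[u16; 4]> {
    let letters: Vec<char> = pattern.chars().collect();
    if letters.len() != 4 {
        return None;
    }
    let mut out = [0u16; 4];
    for (i, c) in letters.iter().enumerate() {
        let src = match c {
            'x' => 0,
            'y' => 1,
            'z' => 2,
            'w' => 3,
            _ => return None,
        };
        out[i] = v[src];
    }
    Some(out)
}
```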
Swizzling Operands
Selecting a register's vector components.
When using a register as an operand to an instruction, in some cases it is possible (or even required) to specify which word components of the register are relevant to the operation.
A register's components are specified using a period followed by 1 to 4 word specifiers (x, y, z, or w), the number of which depends on the instruction's requirements.
For example, when copying values using the Move instruction, you can specify a subset of the components for the move using a swizzle:
; These two are equivalent
mov r0, r1
mov r0.xyzw, r1.xyzw
; Copy only the first word
mov r0.x, r1.x
; Copy only the third word
mov r0.z, r1.z
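The component-masked moves above can be modelled in Rust as follows. This is a hypothetical sketch. It assumes the listed components are copied lane-for-lane, as in mov r0.x, r1.x, and that every other component of the destination is left untouched.

```rust
/// Copy only the components named in `mask` (e.g. "x", "z", "yzw")
/// from `src` into `dst`; unnamed components of `dst` are unchanged.
fn masked_move(dst: [u16; 4], src: [u16; 4], mask: &str) -> [u16; 4] {
    let mut out = dst;
    for c in mask.chars() {
        let i = match c {
            'x' => 0,
            'y' => 1,
            'z' => 2,
            'w' => 3,
            _ => continue, // skip anything that is not a component letter
        };
        out[i] = src[i];
    }
    out
}
```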
Another example is the Word Select instructions, which operate on the least significant word of a register and can select an arbitrary word from a register as the source word for the operation:
; Copy the value from r1.w into r0.x
wmove r0, r1.w
; Add r0.z into r0.x
wadd r0, r0.z
; Conditional skip by subtracting a comparison result from the program counter
gt.w r0, r0, r1
wsub ri, r0.x
halt ; Halts only if r0 was greater than r1
jmp :continue
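Here is a rough Rust model of the Word Select semantics: only the destination's x word changes, and the source can be any word (x = 0 through w = 3). The function names mirror the mnemonics but are hypothetical, and 16-bit wrapping arithmetic is assumed.

```rust
type Vector = [u16; 4];

/// wmove dst, src.<word>: copy one source word into dst.x.
fn wmove(dst: &mut Vector, src: Vector, word: usize) {
    dst[0] = src[word];
}

/// wadd dst, src.<word>: dst.x = dst.x + src.<word>, wrapping at 16 bits.
fn wadd(dst: &mut Vector, src: Vector, word: usize) {
    dst[0] = dst[0].wrapping_add(src[word]);
}

/// wsub dst, src.<word>: dst.x = dst.x - src.<word>, wrapping at 16 bits.
fn wsub(dst: &mut Vector, src: Vector, word: usize) {
    dst[0] = dst[0].wrapping_sub(src[word]);
}

/// swap dst, src.<word>: exchange dst.x with one source word.
fn swap(dst: &mut Vector, src: &mut Vector, word: usize) {
    std::mem::swap(&mut dst[0], &mut src[word]);
}
```

Under the wrapping assumption, subtracting a 0xffff comparison result from ri.x with wsub advances the program counter by one. That is what makes the conditional skip above work.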
For other situations in which operand swizzling is useful, please see the documentation on the individual instructions.
Memory
System memory map:
0(0x0000) ..= 31(0x001F) => System constant registers, accessible via load
32(0x0020) ..= 63(0x003F) => System mutable registers, accessible via load/store
64(0x0040) ..= 255(0x00FF) => Non-volatile Core private RAM
256(0x0100) ..= 767(0x02FF) => Non-volatile Core banked RAM area
768(0x0300) ..= 1023(0x03FF) => VM and Ship I/O control area
1024(0x0400) ..= 2047(0x07FF) => Remote memory access aperture
2048(0x0800) ..= 4095(0x0FFF) => Volatile private RAM
4096(0x1000) ..= 12287(0x2FFF) => Shared RAM (all user cores can read/write)
Within the memory map, the "pre-loadable" memory (0x0000 to 0x02FF) is a block of memory that can
be written using the "!vm code ..." or "!vm write ..." commands.
Additionally, it is generally non-volatile, and can be expected to be persisted.
Memory outside the "pre-loadable" range cannot be written via chat commands;
only the CPU itself can access the other regions.
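The memory map can be expressed as a lookup, sketched here in Rust with hypothetical names. Addresses above 0x2FFF are not described on this page, so the sketch reports them as Unmapped rather than guessing their behaviour.

```rust
/// Regions from the system memory map (word addresses).
#[derive(Debug, PartialEq)]
enum Region {
    ConstantRegisters,
    MutableRegisters,
    PrivateRam,
    BankedRam,
    IoControl,
    RemoteAperture,
    VolatilePrivateRam,
    SharedRam,
    Unmapped,
}

fn region(addr: u16) -> Region {
    match addr {
        0x0000..=0x001f => Region::ConstantRegisters,
        0x0020..=0x003f => Region::MutableRegisters,
        0x0040..=0x00ff => Region::PrivateRam,
        0x0100..=0x02ff => Region::BankedRam,
        0x0300..=0x03ff => Region::IoControl,
        0x0400..=0x07ff => Region::RemoteAperture,
        0x0800..=0x0fff => Region::VolatilePrivateRam,
        0x1000..=0x2fff => Region::SharedRam,
        _ => Region::Unmapped, // not listed in the map; behaviour unknown
    }
}

/// "Pre-loadable" memory, writable via chat commands, ends at 0x02ff.
fn is_preloadable(addr: u16) -> bool {
    addr <= 0x02ff
}
```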
//// Memory and I/O space to control "the ship"
- the ship: is visually a triangle floating around a wrapping 2D space
- it has integrated physics (aka motion: impulse, velocity, acceleration, moment, mass)
- ships spawn based on chat commands (or activity) in some random orientation,
(planned: possibly with some default program)
- ships have their own Wave2 core with these properties:
- own C registers, R registers, Instruction register, and private memory area
- own banked memory mapping, but banked memory can be accessed by all cores within the CPU
- own I/O space mapping, but I/O devices are shared between cores within each CPU
- remote aperture is shared with main core
- shares the volatile private memory area with main core
- same limitations on accessing the global shared memory area
- the ship core is not required to actually control the ship, but does by default
- the ship core can be halted independently of the main core
- the ship core is not started nor halted by the typical chat commands
+0 +1 +2 +3 Address
+----+----+----+----+
c0 0x0000 | X | Y | Z | W | 0x0000 (beginning of !vm write)
: : | | | | | :
c7 0x001c | X | Y | Z | W | 0x001f
+----+----+----+----+
r0 0x0020 | X | Y | Z | W | 0x0020
: : | | | | | :
r6 0x0038 | X | Y | Z | W | 0x003b
+----+----+----+----+
ri 0x003c | pc | Y | Z | W | 0x003f
+----+----+----+----+
+0 +1 +2 +3 Address
+----+----+----+----+ <- start of address wrap
0x0040 | P P P P | 0x0043 (beginning of !vm code)
: | Private Memory | : 192 words
0x00fc | P P P P | 0x00ff
+----+----+----+----+
0x0100 | BP BP BP BP | 0x0103 (banked memory ROM or persist RAM)
: | Private Memory | : 512 words
0x02fc | BP BP BP BP | 0x02ff (end of pre-loadable memory)
+----+----+----+----+ <- end of address wrap
0x0300 | | 0x0303 (Module I/O, VM control and ship modules)
: | | :
0x03fc | | 0x03ff
+----+----+----+----+ <-
0x0400 |0000 0000 0000 0000| 0x0403 (agent remote memory aperture)
: | | :
0x07fc |0000 0000 0000 0000| 0x07ff
+----+----+----+----+ <-
0x0800 | M M M M | 0x0803 (volatile private memory area)
: | | :
0x0ffc | M M M M | 0x0fff
+----+----+----+----+ <- end of private protection area
+----+----+----+----+ <- start of address wrap and shared protection area
0x1000 | S S S S | 0x1003 (shared memory aperture)
: | Shared Memory | :
0x2ffc | S S S S | 0x2fff
+----+----+----+----+ <- end of address wrap
Instructions
Instructions are arranged in categories:
0x0 => System, 0x1 => WSelect, 0x2 => Extra2, 0x3 => Extra3,
0x4 => Move, 0x5 => Swizzle, 0x6 => Load, 0x7 => Store,
0x8 => Math8, 0x9 => Math16, 0xA => Shift8, 0xB => Shift16,
0xC => BitOp, 0xD => SpecOp, 0xE => Extra14, 0xF => Extra15
//// Example of instructions and their encodings:
Move r0, c0 => 8004 // move c0 into r0
ScatterInc r4, c5 => 5c37 // scatter store values in c5 at the 4 addresses in r4, incrementing each address
Swizzle.xxyz ri => f905 // push pc onto a "stack" (ri.w = ri.z, ri.z = ri.y, ri.y = ri.x, ri.x = ri.x)
Xor r1, r1 => 996c // clear r1 to all zero
CompareEq16 r0, r1 => 8979 // test each field of r0 against r1, store 0xffff or 0 into r0 fields
Move.yzw r0, r1 => 89e4 // move only y,z,w of r1 into r0, leaving r0.x unchanged
SubRev16 ri, r0 => f889 // compute ri - r0, put result into ri
// if r0 is set to 0 or 0xffff by a compare,
// this will conditionally skip the next instruction
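The seven encodings above suggest a field layout, though this is an inference from these examples and not a documented format. The category appears to sit in the lowest nibble. The two register fields appear to sit in the top two nibbles, numbered c0 to c7 = 0 to 7, r0 to r6 = 8 to 14, and ri = 15. This is also consistent with the System encodings 0x0n10 listed further down. The Rust sketch below splits a word along those guessed boundaries. Its names are hypothetical.

```rust
const CATEGORIES: [&str; 16] = [
    "System", "WSelect", "Extra2", "Extra3", "Move", "Swizzle", "Load", "Store",
    "Math8", "Math16", "Shift8", "Shift16", "BitOp", "SpecOp", "Extra14", "Extra15",
];

/// Register name for a 4-bit register field (inferred numbering).
fn reg_name(field: u16) -> String {
    match field {
        0..=7 => format!("c{}", field),
        8..=14 => format!("r{}", field - 8),
        _ => "ri".to_string(),
    }
}

/// Split a word into (category, top-nibble register, second-nibble register,
/// middle nibble). Field positions are inferred from the examples above.
fn decode(word: u16) -> (&'static str, String, String, u16) {
    let category = CATEGORIES[(word & 0xf) as usize];
    let mid = (word >> 4) & 0xf;
    let second = reg_name((word >> 8) & 0xf);
    let first = reg_name((word >> 12) & 0xf);
    (category, first, second, mid)
}
```

Two caveats apply. For ScatterInc r4, c5 (5c37), this split puts c5 in the top nibble and r4 below it, so stores seem to order their fields differently from their assembly operands. The Swizzle example (f905) uses the middle bits for its pattern instead of a register.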
//// VM and Ship Modules
- "module 0" is always the control registers
- a slot can be unmapped with ID: 0x0000, all registers in these slots will read as zero
- control registers are a module that can be mapped
into other slots by using the thread ID: 0x0001 or 0x0002
- modules are memory mapped into the I/O control area (0x320 .. 0x3ff)
by storing the desired module ID into the respective
"Core module selection" (CMS*) registers within the control area
- Mapping a module ID will cause the module to "activate"
- There are limits on the number of modules that can be active
- The lowest numbered IDs have activation priority
- The same module can be mapped into multiple slots at the same time
- Modules will not lose their data nor configuration while unmapped
- Modules can lose their configuration in the same way volatile memory can
- ship modules will have some default configuration
- each module takes up 0x20 words
0x300 => VM and module control area
0x320 => module 1 default 0x1000 = CRS0
0x340 => module 2 default 0x1001 = CRS1
0x360 => module 3 default 0x0000 = unmapped
0x380 => module 4 default 0x4000 = flight controls module
0x3a0 => module 5 default 0x4040 = radar 0
0x3c0 => module 6 default 0x4050 = NAV 0
0x3e0 => module 7 default 0x4051 = NAV 1
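Each slot is 0x20 words and the slots start at 0x300, so a register's address is the slot base plus its offset within the module. The Rust helpers below sketch that arithmetic and are hypothetical. For example, SHCC at offset F+0x1c in slot 4 lands on 0x39c, the address used by the Ship Color Cycler example.

```rust
/// Base address of I/O slot `slot` (0 = VM control area at 0x300, 1..=7 = modules).
fn slot_base(slot: u16) -> Option<u16> {
    if slot > 7 {
        return None;
    }
    Some(0x300 + 0x20 * slot)
}

/// Address of the register at `offset` words within a slot.
fn slot_register(slot: u16, offset: u16) -> Option<u16> {
    if offset >= 0x20 {
        return None; // each module occupies 0x20 words
    }
    slot_base(slot).map(|base| base + offset)
}
```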
- each ship will have "radar" scanning module(s) that grant:
- distance to objects (objects are detected within a narrow cone)
- configurable sweep angle
- target information: target ID, distance
- target ID can be used by "NAV" modules to obtain: distance, velocity, and headings
- each ship will have "NAV" navigation modules that grant:
- relative distance to target
- relative velocity to target
- headings to target
- able to accept the target ID from radar modules
- able to target absolute coordinates
- each ship has a laser comm device:
- remote memory access to a target which is within line-of-sight
- must be facing the front of another ship to access it, the target ship can not be accessed from other sides
- remote access is limited to the volatile memory area of the other ship
//// Ship Local Coordinate System
Axes Headings
0x0000
+Y ^
/\ 0x7000 | 0x1000
/ \ \ | /
-X/ \+X 0x6000 <----*----> 0x2000
/ \ / | \
+-|--|-+ 0x5000 | 0x3000
-Y v
0x4000
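In the diagram, headings run clockwise from +Y, with 0x2000 at +X. A full turn therefore appears to be 0x8000, which matches the 0x0000 - 0x7fff range accepted by the radar's RSSH. The Rust sketch below converts to and from degrees under that assumption; the names are hypothetical. Whether the flight module's RH and CAH registers use the same scale is not stated here, since RH is described as 8.8 fixed point.

```rust
/// Convert a heading (assumed 0x8000 per full turn, clockwise from +Y) to degrees.
fn heading_to_degrees(heading: u16) -> f64 {
    (heading % 0x8000) as f64 * 360.0 / 32768.0
}

/// Inverse conversion; the input is normalised into [0, 360) first.
fn degrees_to_heading(degrees: f64) -> u16 {
    let d = degrees.rem_euclid(360.0);
    ((d * 32768.0 / 360.0).round() as u32 % 0x8000) as u16
}
```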
//// Fields common to all modules:
0 = (R ) always reads as zero
M = (R/W) scratch memory, these may be freely written with any value
//// VM Control Area (VCA)
CSTA = (RMW) Core status,
bit 0: Set = Running, Clear = Halted
(this bit can be set to start the core, can not be cleared by writing)
See Core Start below.
other bits may be non zero
CID = (R ) Core ID, this is also the module ID for the VCA
CER = (R/W) Core exception register save
CRI = (R/W) Core instruction register save
CCRL = (R/W) (Read: 0) (Write: Copy Constant Register stores to core's C registers)
CUID = (R ) Core user ID
CPRT = (R/W) Core protection (only applies while the core is running):
bits 1,0: private memory area (PMA) from volatile private area (VPMA)
PMA access, while PC points at the volatile private memory area:
00 = no protection
01 = readonly, writes ignored
10 = read as zero, writes ignored
11 = any access raises exception
bits 3,2: private memory area (PMA) from shared memory (SMA)
PMA access, while PC points at shared memory:
00 = no protection
01 = readonly, writes ignored
10 = read as zero, writes ignored
11 = any access raises exception
bits 5,4: volatile private area (VPMA) from shared memory (SMA)
VPMA access, while PC points at shared memory:
00 = no protection
01 = readonly, writes ignored
10 = read as zero, writes ignored
11 = any access raises exception
bit 6: set = prevent CER, CRI writes from other cores
bit 7: set = prevent CBK, CMS, CRS, CCRL writes from other cores
bit 8: set = exception on PC in private memory (callback protection)
bit 9: set = exception on PC in volatile private memory
bit 10: set = exception on PC in shared memory
bit 11: future use
bit 12: set = exception on core halt and invalid instructions
bit 13: set = exception on core sleep instructions
bit 14: future use
bit 15: future use
CBK = (R/W) Core Bank Select
this register is mirrored in two different words
they both hold the same value and writing to either will set the other
CMS* = (R/W) Core module selection (contains a module ID)
C+0x00 (0x300) [ CSTA, CID , CPRT, 0 ]
C+0x04 (0x304) [ CCRL, ----- CUID ----- ]
C+0x08 (0x308) [ -------- CER -------- ]
C+0x0c (0x30c) [ -------- CRI -------- ]
C+0x10 (0x310) [ 0000, CBK , CBK , f000 ]
C+0x14 (0x314) [ 0 , 0 , 0 , 0 ]
C+0x18 (0x318) [ CID , CMS1, CMS2, CMS3 ]
C+0x1c (0x31c) [ CMS4, CMS5, CMS6, CMS7 ]
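The following Rust sketch assembles a CPRT word from the bit layout above. The enum and constant names are hypothetical; only the bit positions come from the CPRT description.

```rust
/// Protection level for each 2-bit CPRT access field.
#[derive(Clone, Copy)]
enum Protect {
    Open = 0b00,      // no protection
    ReadOnly = 0b01,  // readonly, writes ignored
    ReadZero = 0b10,  // read as zero, writes ignored
    Exception = 0b11, // any access raises exception
}

const CPRT_LOCK_CER_CRI: u16 = 1 << 6; // prevent CER, CRI writes from other cores
const CPRT_LOCK_MODULES: u16 = 1 << 7; // prevent CBK, CMS, CRS, CCRL writes
const CPRT_EXC_PC_PRIVATE: u16 = 1 << 8;
const CPRT_EXC_PC_VOLATILE: u16 = 1 << 9;
const CPRT_EXC_PC_SHARED: u16 = 1 << 10;
const CPRT_EXC_HALT: u16 = 1 << 12;
const CPRT_EXC_SLEEP: u16 = 1 << 13;

/// Bits 1,0: PMA from VPMA; bits 3,2: PMA from SMA; bits 5,4: VPMA from SMA.
fn cprt(pma_from_vpma: Protect, pma_from_sma: Protect, vpma_from_sma: Protect, flags: u16) -> u16 {
    (pma_from_vpma as u16) | ((pma_from_sma as u16) << 2) | ((vpma_from_sma as u16) << 4) | flags
}
```

For instance, making the private area read-only to code running in shared memory, blocking all shared-memory access to the volatile area, and locking CER/CRI would give 0x0074.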
//// Core Start
When CSTA bit 0 goes from unset to set
from any core writing to this register
A core start will be performed on the core that this VCA controls:
- CPRT settings will be applied
- CPRT will protect itself from other cores
- CRI will be loaded into Ri
- CUID is updated
- C registers will be loaded from this core's Constant Register Store
- The core will begin running instructions
//// Exceptions
The VCA gives access to exceptions management:
An exception when triggered will:
- Save Ri to the CRI register in the VCA
- Clear the private memory exceptions bit in CPRT
- Load Ri with the contents of CER from the VCA
- If Ri.PC points at the VCA, the CPU will halt
- Set CER to [0x0308, 0, 0, 0]
this makes CER point at itself within the VCA
you are expected to reload CER if you want to handle further exceptions
//// Constant Register Store (Module ID: 0x1001 core 1 CRS, 0x1002 core 2 CRS)
CS** = (R/W) values used to reload all the C registers with
C+0x00 [ CS0x, CS0y, CS0z, CS0w ]
C+0x04 [ CS1x, CS1y, CS1z, CS1w ]
C+0x08 [ CS2x, CS2y, CS2z, CS2w ]
C+0x0c [ CS3x, CS3y, CS3z, CS3w ]
C+0x10 [ CS4x, CS4y, CS4z, CS4w ]
C+0x14 [ CS5x, CS5y, CS5z, CS5w ]
C+0x18 [ CS6x, CS6y, CS6z, CS6w ]
C+0x1c [ CS7x, CS7y, CS7z, CS7w ]
//// Ship Control Modules
MSTS = (R ) Module status
MMID = (R ) Module ID
//// Flight Control Module (Module ID: 0x4000)
- there is only one flight control module, it is always active even if not mapped.
RRV = (R/W) requested relative velocity - X and Y in signed 16b in 8.8 format
CRV = (R ) current relative velocity - X and Y in signed 16b in 8.8 format
RH = (R/W) requested absolute/relative heading - signed 16b in 8.8 format
CAH = (R ) absolute heading
EEN = (R/W) engine controls bitflags:
bit 0 - set: allow Y+ engine to change velocity
bit 1 - set: allow Y- engine to change velocity
bit 2 - set: allow X+ engine to change velocity
bit 3 - set: allow X- engine to change velocity
bit 4 - set: allow engines to change heading
bit 5 - heading mode - set: absolute, clear: relative
SHCC = (R/W) set Ship colour
SHCM = (R/W) background mix (0x00 - ship matches background, 0xff - ship matches set colour)
// Flight control memory layout overview:
// (default mapping)
F+0x00 (0x380) [ MSTS, MMID, 0 , 0 ]
F+0x04 (0x384) [ RRVx, RRVy, M , M ]
F+0x08 (0x388) [ CRVx, CRVy, 0 , 0 ]
F+0x0c (0x38c) [ RH , M , M , M ]
F+0x10 (0x390) [ CAH , 0 , 0 , 0 ]
F+0x14 (0x394) [ EEN , M , M , M ]
F+0x18 (0x398) [ 0 , 0 , 0 , 0 ]
F+0x1c (0x39c) [ SHCC, SHCM, 0 , 0 ]
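The velocity registers use 8.8 fixed point, meaning the stored value is the real number multiplied by 256. Below is a minimal Rust sketch of the conversion. Saturating at the i16 range is a design choice of this sketch, since the module's overflow behaviour is not documented here. The EEN bits are included as hypothetical constants.

```rust
/// Real number to signed 8.8 fixed point, saturating at the i16 range.
fn to_fixed_8_8(value: f64) -> i16 {
    let scaled = (value * 256.0).round();
    scaled.clamp(i16::MIN as f64, i16::MAX as f64) as i16
}

/// Signed 8.8 fixed point back to a real number.
fn from_fixed_8_8(raw: i16) -> f64 {
    raw as f64 / 256.0
}

// EEN engine-control bits, from the flag list above.
const EEN_Y_POS: u16 = 1 << 0;
const EEN_Y_NEG: u16 = 1 << 1;
const EEN_X_POS: u16 = 1 << 2;
const EEN_X_NEG: u16 = 1 << 3;
const EEN_HEADING: u16 = 1 << 4;
const EEN_ABSOLUTE_HEADING: u16 = 1 << 5;
```

Requesting a relative velocity of 1.5 would mean writing 0x0180 to RRVx. Enabling all four engines plus heading changes gives an EEN value of 0x1f.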
//// Radar/Scanning functions (Module IDs: 0x4040 - 0x4047)
- up to 4 scanner modules can be active at once
RSSH = (R/W) scan heading select: 0x0000 - 0x7fff selects a fixed scan heading, 0xffff selects auto scan
RHLS = (R ) heading of last scan
RNSR = (R ) number of signatures returned
RSDT = (R ) distance to signature - unsigned 16b in 8.8 format
RSID = (R ) signature ID
// Radar0 (default mapping)
R+0x00 (0x3a0) [ MSTS, MMID, 0 , 0 ]
R+0x04 (0x3a4) [ RSSH, 0 , RHLS, RNSR ]
R+0x08 (0x3a8) [ RSDT, ----- RSID ----- ]
R+0x0c (0x3ac) [ RSDT, ----- RSID ----- ]
R+0x10 (0x3b0) [ RSDT, ----- RSID ----- ]
R+0x14 (0x3b4) [ RSDT, ----- RSID ----- ]
R+0x18 (0x3b8) [ RSDT, ----- RSID ----- ]
R+0x1c (0x3bc) [ RSDT, ----- RSID ----- ]
//// NAV Module (Module ID: 0x4050 - 0x4057)
- up to 4 NAV modules can be active at once
NAS* = (R ) current absolute screen X,Y pixel position
NTS* = (R/W) NAV Target absolute screen pixel X,Y
NTGT = (R/W) Target Selector (0 - Fixed point; 1 - Beacon; 2 - Nearest Signature; N - Signature)
NTGI = (R/W) Target ID (3 words)
NRD* = (R ) NAV Target relative distance X,Y
NRV* = (R ) NAV Target relative velocity X,Y
NAHT = (R ) NAV Target absolute heading towards
NAHF = (R ) NAV Target absolute heading away
NRHT = (R ) NAV Target relative heading towards
NRHF = (R ) NAV Target relative heading away
// NAV0 NAV1 (default mappings)
N+0x00 (0x3c0/0x3e0) [ MSTS, MMID, 0 , 0 ]
N+0x04 (0x3c4/0x3e4) [ NASx, NASy, 0 , 0 ]
N+0x08 (0x3c8/0x3e8) [ NTSx, NTSy, M , M ]
N+0x0c (0x3cc/0x3ec) [ NTGT, ----- NTGI ----- ]
N+0x10 (0x3d0/0x3f0) [ NRDx, NRDy, 0 , 0 ]
N+0x14 (0x3d4/0x3f4) [ NRVx, NRVy, 0 , 0 ]
N+0x18 (0x3d8/0x3f8) [ NAHT, 0 , NAHF, 0 ]
N+0x1c (0x3dc/0x3fc) [ NRHT, 0 , NRHF, 0 ]
Stream Overlay
Shared Memory
//// Shared Memory
Memory within the shared RAM range is visualized as blocky pixels
The display matrix is 256 pixels wide, by 32 pixels tall
Ascending pixel addresses are rastered from left to right, top down.
Pixel address from position is: 0x1000 + (0x100 * Y) + X
The pixel format is rrrr rggg gggb bbbb, i.e. 0xf800 is pure red;
this format is called RGB565
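The pixel arithmetic can be sketched in Rust with hypothetical helper names. rgb565 packs 5-bit red, 6-bit green and 5-bit blue channels. pixel_address applies 0x1000 + (0x100 * Y) + X, with bounds checks for the 256 by 32 display.

```rust
/// Pack 5-bit red, 6-bit green and 5-bit blue into an RGB565 word
/// (rrrr rggg gggb bbbb). Out-of-range inputs are masked.
fn rgb565(r: u8, g: u8, b: u8) -> u16 {
    ((r as u16 & 0x1f) << 11) | ((g as u16 & 0x3f) << 5) | (b as u16 & 0x1f)
}

/// Shared-memory address of the pixel at (x, y) on the 256x32 display.
fn pixel_address(x: u16, y: u16) -> Option<u16> {
    if x >= 256 || y >= 32 {
        return None;
    }
    Some(0x1000 + 0x100 * y + x)
}
```

rgb565(31, 0, 0) gives 0xf800 (pure red), and pixel_address(255, 31) gives 0x2fff, the last shared-memory word.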
Memory accesses will behave differently depending on
where the CPU is currently executing instructions:
- when PC is within private memory:
+ any action that increments an address (load, store, PC fetch)
with the address 0x00ff: will wrap to 0x0040
i.e. load with increment with address of 0x00fe
will read from 0x00fe, 0x00ff, 0x0040, 0x0041
the register will point at 0x0042 after.
+ loads and stores to private memory are atomic
+ loads and stores to volatile private memory are only atomic when
address has alignment 4, i.e. (address modulo 4) equals 0
+ loads and stores to shared memory are non-atomic
shared memory will be accessed a single word at a time
and incur significant delay between words
- when PC is within shared memory:
+ any action that increments an address (load, store, PC fetch)
with the address 0x2fff: will wrap to 0x1000
+ loads and stores to private memory are non-atomic
private memory will be accessed a single word at a time
and incur significant delay between words
+ loads and stores to shared memory are only atomic when
address has alignment 4, i.e. (address modulo 4) equals 0
such access will be without delay
non-aligned access will be non-atomic, and have only slight delay
+ registers should be used for fast transfer between memory regions
loads and stores to the register area (via load/store) of private memory still incur delays
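The wrap-around increment can be modelled as below. This Rust sketch uses hypothetical names and covers only the two wrap points stated in the list. It reproduces the 0x00fe example: it reads 0x00fe, 0x00ff, 0x0040 and 0x0041, then leaves 0x0042 in the register.

```rust
/// Increment rule while PC is in private memory: 0x00ff wraps to 0x0040.
fn next_private(addr: u16) -> u16 {
    if addr == 0x00ff { 0x0040 } else { addr.wrapping_add(1) }
}

/// Increment rule while PC is in shared memory: 0x2fff wraps to 0x1000.
fn next_shared(addr: u16) -> u16 {
    if addr == 0x2fff { 0x1000 } else { addr.wrapping_add(1) }
}

/// Addresses touched by a 4-word load-with-increment starting at `start`,
/// plus the address left in the register afterwards.
fn load_inc_addresses(start: u16, next: fn(u16) -> u16) -> ([u16; 4], u16) {
    let mut addrs = [0u16; 4];
    let mut a = start;
    for slot in addrs.iter_mut() {
        *slot = a;
        a = next(a);
    }
    (addrs, a)
}
```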
Space Ships
Space Stations
Asteroids
Economy
Combat
Examples
Ship Color Cycler
This program cycles the user's ship through various predefined color constants.
.memory
000f 00ff 0ff0 ff00
f000 f00f 0f0f f0f0
fff0 0fff f0ff ff0f
dead beef cafe feed
000f 039c 0800
.code
wmov r1, c4.x
wmov r2, c4.y
wmov r3, c4.z
:loop
mov r4.x, [r0.x+]
mov [r2.x], r4.x
mov r5.x, r0.x
ge.w r5, r5, r1
sub.w ri, ri, r5
sub.w r0, r0, r0
slp.w r3
jmp :loop
The .memory section first lays out the color constants, and then the three values we use later in the program: 0x000f, which is the number of color constants (and subsequently the loop counter wrap point); 0x039c, which is the default address of the flight module's color register SHCC; and finally 0x0800, which controls the sleep duration at the end of each loop iteration. We end the memory declaration by declaring the start of the .code region.
The program begins with three lines using the WSelect.Move instruction to pull the three constants defined in c4.xyz into three registers for referencing individually later.
wmov r1, c4.x
wmov r2, c4.y
wmov r3, c4.z
Following this we define the loop start point using a label.
:loop
At the top of the loop we load a color by using r0.x as an address. r0.x should be 0 by default (see note 1), so the first load reads c0.x, which contains 0x000f as defined by the memory region at the top of the file. The + inside the pointer syntax means the address will be incremented following the move operation, so r0.x will increment from 0 to 1 in the first iteration.
mov r4.x, [r0.x+]
Next we update the ship color by writing to the color register in the Flight module. By default the flight module is mapped such that the color register is at 0x39c, which we stored into r2 earlier. Here we use the value of r2.x as an address to store the color data we just loaded into r4.
mov [r2.x], r4.x
Now we make a copy of the counter and store it in r5.x so we can perform arithmetic on it without clobbering the original counter value.
We then use the Greater Than or Equal comparison ge to compare the counter r5 with the total number of colors we stored earlier in r1.
The ge comparison will store either 0x0000 for true or 0xffff for false into r5.
mov r5.x, r0.x
ge.w r5, r5, r1
Note that the .w modifier on the ge instruction specifies that this math instruction operates on whole 16-bit words, rather than the .b size modifier, which operates on 8-bit bytes.
For the next step we are going to use the result of the comparison to skip an instruction. We want to reset the loop counter, but only if we have reached the color count.
Here we use a special trick where we subtract the result of the comparison from the Program Counter. The comparison result is either 0x0000 or 0xffff, and subtracting 0xffff with 16-bit wraparound is equivalent to adding 1, so we can use the result to skip an instruction conditionally.
So subtracting the result when false will add 1 to the Program Counter, skipping the next instruction, where we reset the counter to 0.
sub.w ri, ri, r5
sub.w r0, r0, r0
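The following small Rust simulation checks the counter logic. It assumes, as stated above, that ge yields 0x0000 for true and 0xffff for false. It also assumes each instruction occupies one word, which places the counter reset sub.w r0, r0, r0 at 0x48, given that :loop is at 0x43. The names are hypothetical.

```rust
/// Comparison result as described in this walkthrough: 0x0000 true, 0xffff false.
fn ge_flag(a: u16, b: u16) -> u16 {
    if a >= b { 0x0000 } else { 0xffff }
}

/// `sub.w ri, ri, r5` on the PC word: `pc` already points past the subtract,
/// so subtracting 0xffff wraps around to pc + 1 and skips one instruction.
fn after_skip(pc: u16, flag: u16) -> u16 {
    pc.wrapping_sub(flag)
}

/// Color addresses read by `mov r4.x, [r0.x+]` over `iterations` loop passes.
fn color_addresses(count: u16, iterations: usize) -> Vec<u16> {
    const RESET_ADDR: u16 = 0x48; // assumed address of `sub.w r0, r0, r0`
    let mut r0: u16 = 0; // assumes r0.x starts at 0
    let mut reads = Vec::new();
    for _ in 0..iterations {
        reads.push(r0); // load from address r0.x...
        r0 = r0.wrapping_add(1); // ...then post-increment it
        let flag = ge_flag(r0, count); // ge.w r5, r5, r1 on the copy in r5
        if after_skip(RESET_ADDR, flag) == RESET_ADDR {
            r0 = 0; // the reset was not skipped
        }
    }
    reads
}
```

With a count of 15, the loop reads addresses 0 through 14 (c0.x through c3.z) and then starts over at 0.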
At the end of each loop iteration, we sleep for the duration we stored into r3 earlier, which still holds the value we copied from the constant registers.
slp.w r3
Finally we jump (see note 2) back to the loop point.
jmp :loop
1. Even if r0 is not 0 here at the beginning as we assume (for example, if we were running another program and did not reset our VM core before writing this program), the worst case is that it will display random junk data as a color before being reset to 0 and iterating as normal.
2. Under the hood, the jump instruction is a load instruction that uses the incrementing Program Counter to load a value into itself, namely the value stored at the following address. In this case the jump instruction translates into
mov ri.x, [ri.x+]
followed by the literal value of the :loop label, which for this program would be 0x43, since it is the fourth instruction in the private memory region, which starts at 0x40.
Wave2 Assembly Syntax Highlighting
nimphious.wave2-lsp
Provides syntax highlighting for the Wave2 assembly language.
Wave2 Assembly Language Server
nimphious.wave2-assembly
Language Server for the Wave2 Assembly Language syntax.
Wave2 Debugging Emulator
Not yet publicly available, needs cleanup and possibly a rewrite.
Wave2 Assembler
Wave2 binaries, which can be loaded into the simulator or converted into runic for use as chat load commands, can be built using the Wave2 Assembler.
The assembler accepts w2s files either via stdin or by filename. The w2s file is compiled into Wave2 bytecode and then either output to stdout or written to the provided output file.
There are three output modes available: standard, binary, and runic.
- Standard mode simply outputs the words in big endian plaintext hexadecimal.
- Binary mode outputs a Wave2 binary file.
- Runic mode outputs the words as Wave2 runes, and can optionally be wrapped in chat commands to be posted directly to stream.
Wave2 Binary Format
The binary output of the assembler is in the Wave2 binary format.
These binaries can be loaded directly by Wave2 tools such as the debugger, and contain a description of the
Runic Output
Installation
Make sure Rust, and therefore Cargo, is installed, and then run:
cargo install --git https://github.com/zeb-hicks/wave2_assembler
Or clone the repository locally and install using:
cargo install --path /path/to/wave2_assembler
If you have cargo binaries available on your path then you should be able to easily run the assembler from the command line like so:
waveasm
Wave2 Assembly
Syntax
Wave2 Assembly's syntax is very similar to NASM-flavoured assembly. The operands are ordered Intel style (instruction destination, source), and comments begin with a semicolon ;.
Section Directives
The first thing you'll see in most Wave2 assembly files is the .memory directive, which defines the start of a region containing a sequence of values to be stored in the constant registers.
There are two directives at present, .memory and .code, each simply defining the beginning of their eponymous regions. These two directives also correspond to the two VM commands !write (see note 1) and !code respectively.
.memory
1234 5678
.code
mov r0, c0
; etc...
Structure
The core syntax typically takes the following structure:
instruction | size | operands | |
---|---|---|---|---
add | .w | r0, | r0, | c1
mov | | r0, | r1 |
swizzle | | r4.zyyw | |
mul | | r1, | r2 |
Some instructions, such as nop and halt, require nothing more than the mnemonic itself. Those that take more typically follow the pattern above, with the operands ordered as described in the Operands section below.
Mnemonics
The first part of any typical instruction is the mnemonic, i.e. the identifier of the instruction being used. Some of these map directly to the respective bytecode operations, while others choose the operation from context. For example, the mov instruction selects move for register-to-register operations, store for register-to-memory operations, and load for memory-to-register operations.
Here is a non-exhaustive list of some instructions and the actual opcode they resolve to:
mov r0, r1 ; `mov r0, r1` - move r1 into r0
mov r3, [r2] ; `load r3, [r2]` - load the memory value at [r2] into r3
mov [r5], r1 ; `store [r5], r1` - store the value of r1 into memory at [r5]
sub.w r0, r0, r1 ; `sub16 r0, r1` - 16-bit Subtract. r0 = r0 - r1
sub.b r4, c2, r4 ; `rsub8 r4, c2` - 8-bit Reverse subtract. r4 = c2 - r4
Size Specifiers
The math and bit shift operations both require size specifiers: since there are math and bit shift instructions for both bytes and words, we need to specify which data type to operate on. For more information on how the size of each operation affects the output, see the appropriate pages for details.
Operands
Most instructions take operands, which specify the registers or data the operation acts on.
As described earlier, operands follow the Intel operand order, similar to NASM. This means that pointers are bracketed, such as [r0.x]. More pertinent to this section, operands are typically ordered destination, source, so if you want to move the value c2 into register r0, you use the order:
mov r0, c2
This is because r0 is the destination and c2 is the source. An easy way to remember this may be to think of the order as similar to writing out a math equation: r0 = c2.
Some instructions take three operands, such as the math instructions. The three operands in this case are in the order dest, lhs, rhs. Here lhs and rhs are the left- and right-hand sides of the math equation, and dest is the register the arithmetic is performed on, and therefore the register the result is stored in.
Literals
Literal values can be either decimal or hex. Decimal values do not require any special prefix or enclosing symbols, and as such can be written 12, 99, or similar.
Hex values must be prefixed with $ so that they are not ambiguous with the constant register identifiers: c0 is a register, while $c0 is the hex value 0xc0.
Any instruction that takes a literal value (such as the bit shift instructions) will accept either kind of literal.
ror.w r0, $a
rol.b r1, 2
asl.w r5, $4
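The hypothetical Rust helper below illustrates the literal rule. A $ prefix means hex, bare digits mean decimal, and anything else (such as the register name c0) is not a literal. Limiting values to 16 bits is an assumption of this sketch; individual instructions, such as the shifts, may accept a narrower range.

```rust
/// Parse a Wave2 assembly literal: `$`-prefixed hex or plain decimal.
/// Returns None for anything else, such as a register name like `c0`.
fn parse_literal(token: &str) -> Option<u16> {
    if let Some(hex) = token.strip_prefix('$') {
        u16::from_str_radix(hex, 16).ok()
    } else if !token.is_empty() && token.chars().all(|c| c.is_ascii_digit()) {
        token.parse().ok()
    } else {
        None
    }
}
```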
Inline Values
Sometimes it might be necessary to include raw values arbitrarily within the code memory. This can be achieved with the inline value syntax, which is simply a ! followed by the hex value to be placed at that address. The hex value must not exceed 0xffff.
; An alternative way to store a literal value from code memory.
; This is equivalent to using the macro `set r0, $f000`
mov r0.x, [ri.x+]
!f000
1. While the !write command starts at address 0x0000, it can, given enough values, overflow into the !code command's region, which starts at 0x0040. This means that if you also write starting values to all the registers, your write command can continue into the code region, potentially making a separate !code command unnecessary.
Instructions
Most of the architecture's opcodes have corresponding assembly instructions, with some notable exceptions and additions.
System
The CPU opcodes for the system instructions halt and sleep are both implemented, and an additional mnemonic for the 0-length sleep, nop, is also available.
halt
sleep 12
nop
Word Select
The word select instructions map to the native opcodes. All word select operations work on a single arbitrary source word and the destination's .x word, and can be used as follows:
wmove r0, r1.y ; Copy `r1.y` to `r0.x`
swap r0, r1.x ; Swap `r0.x` and `r1.x`
wadd r1, c0.z ; Add `c0.z` to `r1.x`
wsub r4, r1.w ; Subtract `r1.w` from `r4.x`
Move
The assembler combines the move, load, and store opcodes all into the move instruction which picks the appropriate opcode from the provided operands.
If a move is performed between two registers, then the opcode used will be move as normal. However if either the source or destination operands are a pointer, the opcode will instead be either load or store respectively.
For example:
mov r0, r1 ; move
mov r1, [c2] ; load
mov [r5], r0 ; store
For further detail on the move instruction refer to the move instruction section.
Swizzle
The swizzle instruction maps pretty directly to the native opcodes using the syntax:
swizzle r0.xxxx
swizzle r4.zyzx
swizzle r1.yzwx
swizzle ri.xxyz
The swizzled words can be in any order, but must specify exactly four words.
Math
The math instructions support both sizes of all native math opcodes; however, instead of extra mnemonics, a size specifier .b or .w is used.
Additionally, the reverse variants are merged with the normal variants: you specify the order of the operands yourself. This means that math instructions take three operands rather than two, with the caveat that the destination operand must also be one of the left- or right-hand-side operands.
; size dst lhs rhs
sub.w r4, r4, c0 ; `sub16 r4, c0`
sub.w r4, c0, r4 ; `rsub16 r4, c0`
Note that the order of c0 and r4 is swapped.
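The operand-order rule can be sketched as a small lowering function in Rust. This is a hypothetical illustration, not the assembler's code. When dst matches lhs, the normal opcode is used with rhs as the source. When dst matches rhs, the reverse opcode is used with lhs as the source. The text does not say which form the real assembler picks when all three operands are the same register; this sketch picks the normal one.

```rust
#[derive(Debug, PartialEq)]
enum Lowered<'a> {
    Normal { dst: &'a str, src: &'a str },  // e.g. sub16 dst, src: dst = dst - src
    Reverse { dst: &'a str, src: &'a str }, // e.g. rsub16 dst, src: dst = src - dst
}

/// Lower a three-operand `op dst, lhs, rhs` to the two-operand native form.
/// Returns None when dst is neither lhs nor rhs, which is not allowed.
fn lower<'a>(dst: &'a str, lhs: &'a str, rhs: &'a str) -> Option<Lowered<'a>> {
    if dst == lhs {
        Some(Lowered::Normal { dst, src: rhs })
    } else if dst == rhs {
        Some(Lowered::Reverse { dst, src: lhs })
    } else {
        None
    }
}
```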
Shift
Like the Math instructions, the shift instructions take a size specifier, since the native opcodes come in both 8-bit and 16-bit variants.
The shift instructions can also take either literals or registers for the shift amount.
rol.w r0, 3
asr.b r4, c0
asl.w r2, $a
Bitwise
The bitwise instructions map straightforwardly to their native opcodes.
and r0, r1
xor r4, c0
all r5
Special
Wave2 has some special extended instructions that map to the native opcodes for doing things like horizontal add, 32-bit multiplication, etc.
mul r0, c0
wadd r4, r0
System
System executes special instructions based on extra bits:
0x0000 => Halt the core
0x__10 => Sleep will suspend the core for a certain number of ticks
0x0010 => SleepNop - any sleep duration of zero acts like a no-op.
the lower 3 bits of the destination field select where the sleep duration comes from:
0x0n10 => source field used as a number of ticks (n = 0 to 15)
0x1s10 => lowest byte of source register as number of ticks
0x2s10 => high byte of the lowest word of source register as number of ticks
0x3s10 => lowest word of source register as number of ticks
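Following the encoding table above, a sleep word is built from a mode in the top nibble, a 4-bit source field below it, and the fixed 0x10 low byte. The Rust sketch below is hypothetical. It assumes the register numbering c0 to c7 = 0 to 7, r0 to r6 = 8 to 14, and ri = 15, which this page does not spell out but which matches the register layout in memory. The assembler's .l, .h and .w modes appear to correspond to modes 1, 2 and 3.

```rust
/// Sleep duration sources, following the encoding table above.
enum SleepSource {
    Literal(u16),  // 0x0n10: n ticks, n in 0..=15
    LowByte(u16),  // 0x1s10: lowest byte of register s (assembler `.l`)
    HighByte(u16), // 0x2s10: high byte of the lowest word (assembler `.h`)
    Word(u16),     // 0x3s10: lowest word of register s (assembler `.w`)
}

/// Encode a sleep instruction word; returns None if the 4-bit field overflows.
fn encode_sleep(src: SleepSource) -> Option<u16> {
    let (mode, field) = match src {
        SleepSource::Literal(n) => (0u16, n),
        SleepSource::LowByte(s) => (1, s),
        SleepSource::HighByte(s) => (2, s),
        SleepSource::Word(s) => (3, s),
    };
    if field > 0xf {
        return None;
    }
    Some((mode << 12) | (field << 8) | 0x0010)
}
```

Under these assumptions, sleep.w r3 would encode as 0x3b10, and nop (a zero-tick literal sleep) as 0x0010.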
Halt
The halt instruction maps to the halt opcode directly.
The following halt mnemonics are available:
hlt
halt
Sleep
The sleep instruction takes an optional mode specifier and a single operand for the number of ticks to sleep.
The syntax is as follows:
Mnemonic
Sleep has the following mnemonic forms:
slp
sleep
Mode
Sleep optionally takes one of the following mode specifiers:
.h takes the high byte of the source register's low word (.x) as the tick count source.
.l takes the low byte of the source register as the tick count source.
.w takes the low word (.x) of the source register as the tick count source.
Duration
If no mode is specified, the tick count is taken as a literal in the range 0 to 15 inclusive.
If a mode is specified, then sleep takes a register as a source for the tick count.
Examples
slp $f
sleep.w r0
slp.h c4
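To connect these examples with the System encodings, here is a hypothetical Python sketch of how an assembler might encode them. It assumes register numbers follow the register memory map (c0-c7 = 0-7, r0-r6 = 8-14, ri = 15) and that `$` marks a hex literal, as elsewhere in this document. `encode_sleep` and `sleep_ticks` are illustrative names only.

```python
# Hypothetical encoder for the sleep mnemonic, for illustration only.
# Assumption: register numbers follow the register memory map
# (c0-c7 = 0-7, r0-r6 = 8-14, ri = 15).

REGISTERS = {f"c{i}": i for i in range(8)}
REGISTERS.update({f"r{i}": 8 + i for i in range(7)})
REGISTERS["ri"] = 15

MODES = {"l": 0x1, "h": 0x2, "w": 0x3}  # mode -> value of the destination field


def encode_sleep(mode, operand):
    """Encode `slp[.mode] operand` as a System word of the form 0x?s10."""
    if mode is None:
        ticks = int(operand.lstrip("$"), 16)  # `$` marks a hex literal
        if not 0 <= ticks <= 15:
            raise ValueError("literal tick count must be 0-15")
        return (ticks << 8) | 0x0010
    return (MODES[mode] << 12) | (REGISTERS[operand] << 8) | 0x0010


def sleep_ticks(mode, x_word):
    """Tick count taken from the source register's x word for each mode."""
    if mode == "l":
        return x_word & 0xFF
    if mode == "h":
        return (x_word >> 8) & 0xFF
    if mode == "w":
        return x_word & 0xFFFF
    raise ValueError(f"unknown sleep mode: {mode}")


words = [encode_sleep(None, "$f"), encode_sleep("w", "r0"), encode_sleep("h", "c4")]
```

Under these assumptions, `slp $0` encodes to 0x0010, the SleepNop word.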
Nop
The sleep opcode has a special case where sleeping for zero ticks is equivalent to a nop, which is provided as an instruction for convenience.
Nop has one mnemonic, and takes no operands:
nop
Word Select
The word select instructions allow some limited non-SIMD operations to be performed at the word level between the x word of the destination register and an arbitrary word from a source register.
Sometimes it may be necessary to perform a simple operation on only a single word, without having to sanitise the entire source vector or copy it to a temporary register and then copy the result back.
For example, suppose you want to add a specific word from the constant registers to the program counter ri.x. You could copy the entire ri register to a temporary register, perform the arithmetic there with the constant register, and then do a one-word move back to ri. Alternatively, you can use the Word Select instructions to perform the arithmetic more directly, as in this case:
.memory ; This is c0.z, we want to add it to ri.x
; ↓↓↓↓
dead beef 0010 f00d
.code
mov r0, c0 ; Make a copy of the constant we can manipulate
swizzle r0.zwxy ; Swizzle the copy so that the value we want is at `r0.x`
add.w ri, ri, r0 ; Add the swizzled vector to `ri`
; Assuming ri was `$0052 $0001 $2000 $0000`
; `ri` now contains: `$0062 $f00e $fead $beef`
; `ri.yzw` have been clobbered with the other three
; potentially unrelated words
; Alternatively:
wadd ri, c0.z ; Adds `c0.z` to `ri.x` without touching `ri.yzw`
; `ri` now contains: `$0062 $0001 $2000 $0000`
; We avoided clobbering the other three words using
; the word select instruction `wadd`.
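The arithmetic in this example can be rechecked with a short Python sketch. It assumes add.w wraps each word at 16 bits. It also ignores the fact that writing ri.x would redirect execution. The helper names are hypothetical.

```python
# Re-computing the word select example; helper names are hypothetical.


def swizzle(vec, pattern):
    """Destination word i receives source word pattern[i]."""
    return [vec["xyzw".index(c)] for c in pattern]


def add_w(a, b):
    """add.w: word-wise addition wrapping at 16 bits."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]


c0 = [0xDEAD, 0xBEEF, 0x0010, 0xF00D]  # x, y, z, w from `.memory`
ri = [0x0052, 0x0001, 0x2000, 0x0000]  # assumed starting value of ri

r0 = swizzle(c0, "zwxy")                    # mov r0, c0 / swizzle r0.zwxy
simd = add_w(ri, r0)                        # add.w ri, ri, r0 clobbers ri.yzw
wadd = [(ri[0] + c0[2]) & 0xFFFF] + ri[1:]  # wadd ri, c0.z touches only ri.x
```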
Word Select Modes
The Word Select instruction has the following modes with the specified mnemonics available:
Mnemonics | Operation | Description |
---|---|---|
wmov, wmove | Word Move | Similar to the move instruction, this copies the data from the source's selected word into the destination's x word. |
wswap | Word Swap | Swap exchanges the contents of the specified source word and the destination's x word. |
wadd | Word Add | Add performs addition between the specified source word and the destination's x word, storing the result in the destination's x word. |
wsub | Word Subtract | Subtract performs subtraction between the specified source word and the destination's x word, storing the result in the destination's x word. |
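The four modes can be summarised in one hypothetical Python function. Three points are assumptions rather than documented behaviour. First, wsub computes the destination's x word minus the source word. Second, arithmetic wraps at 16 bits. Third, the destination and source are distinct registers, so aliasing such as `wswap r3, r3.z` is not modelled.

```python
# Hypothetical model of the four Word Select modes.
# Assumptions: wsub computes dst.x - source word, arithmetic wraps at 16 bits,
# and dst and src are distinct registers (aliasing is not modelled).


def word_select(op, dst, src, word):
    """Return (new_dst, new_src) after `<op> dst, src.<word>`."""
    i = "xyzw".index(word)
    dst, src = list(dst), list(src)
    if op in ("wmov", "wmove"):
        dst[0] = src[i]
    elif op == "wswap":
        dst[0], src[i] = src[i], dst[0]  # only wswap writes the source
    elif op == "wadd":
        dst[0] = (dst[0] + src[i]) & 0xFFFF
    elif op == "wsub":
        dst[0] = (dst[0] - src[i]) & 0xFFFF
    else:
        raise ValueError(f"unknown word select op: {op}")
    return dst, src


r3 = [0x0001, 0x0002, 0x0003, 0x0004]
r1 = [0x0005, 0x0006, 0x0007, 0x0008]
new_r3, new_r1 = word_select("wswap", r3, r1, "z")  # wswap r3, r1.z
```

Only the x word of the destination ever changes, and only wswap writes back to the source. This is why the source must be writable for that mode.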
Destination
The destination operand takes a general purpose writable register, whose x word will be written to.
Source
The source operand takes any register, from which the word specified by the swizzle selector is read.
If the operation is wswap, then the source register must also be writable.
Swizzle
The source operand also requires a word to be selected using the swizzle syntax.
A period . followed by a single x, y, z, or w is used to specify the word to be selected.
Examples
wmove ri, c0.z
wadd r0, c1.y
wsub r4, r0.w
wswap r3, r1.z
Move
Copying data between registers and memory is done using the move instructions.
The following two equivalent mnemonics for move are available:
mov
move
The assembler combines the move, load, and store opcodes into a single move instruction, picking the appropriate opcode given the provided operands.
Standard Move
When provided with two registers as operands, the move instruction is translated to the move opcode as normal. This results in the data being copied directly from the source register to the destination register.
Store
If the destination operand is a pointer, the
Load
Skip
A special case exists when loading any value into the c0 constant register: the load is a no-op that advances the program counter by the number of words loaded.
The skip mnemonics are as follows:
skip, skip1
skip2
skip3
skip4
Each numbered skip instruction advances the program counter by the number in its name, and skip is the same as skip1 in that it also advances the PC by one.
Specifying Words
Scatter
Gather
(0x4) Move copies source register words to destination register
-> extra bits set to one will not copy the respective word (performs a "Mix" operation)
(0x5) Swizzle re-arranges or copies the destination register words according to bits in "extra" and "source"
-> given register [X,Y,Z,W] source: 0bWWZZ, extra: 0bYYXX
-> every two bit index specifies which source words to swizzle from.
(0x6) Load and (0x7) Store use the word(s) in the source register as an address
-> (0x6) Load copies memory into the destination register
-> (0x7) Store copies destination register into memory
-> "extra" bits used as modifiers:
-> upper 2 bits of extra, specify the number of words to load or store
(words are always accessed starting at X):
0x0 => 4 words, dst: XYZW
0x4 => 3 words, dst: XYZ
0x8 => 2 words, dst: XY
0xC => 1 word, dst: X
-> lower 2 bits of extra, specify the mode:
access words sequentially
word X from source as address
increments the access address after each load/store
0x0 -> Load / Store - source register unchanged
0x1 -> LoadInc / StoreInc - updates source register with address after last access
scatter/gather modes
respective XYZW words in source used as address for
the matching XYZW values to/from destination register
increments each access address by 1 after the load/store
0x2 -> Gather / Scatter - source register unchanged
0x3 -> GatherInc / ScatterInc - updates source register with accessed elements incremented
useful special Load/Store encodings:
LoadInl1 R_.x , #n => 0x_fd6 0xNNNN
LoadInl2 R_.xy , #n,n => 0x_f96 0xNNNN 0xNNNN
LoadInl3 R_.xyz , #n,n,n => 0x_f56 0xNNNN 0xNNNN 0xNNNN
LoadInl4 R_.xyzw, #n,n,n,n => 0x_f16 0xNNNN 0xNNNN 0xNNNN 0xNNNN
Skip1 => 0x0fd6 // skip the next instruction or word of data
Skip2 => 0x0f96 // skip the next 2 instructions or words of data
Skip3 => 0x0f56 // skip the next 3 instructions or words of data
Skip4 => 0x0f16 // skip the next 4 instructions or words of data
Skip encodes C0 as the destination; only C0 should be used.
Other C registers as destinations are reserved for future use
and may have undesired side effects.
The "#" is the vector size in number of bits, which can be 8 or 16.
Each group of vector-size bits is operated on independently of the other vector bits;
operations are performed in parallel across all vector bits between the registers.
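The Load/Store "extra" nibble packs a word count and a mode. The hypothetical Python helpers below decode it and rebuild the Skip encodings listed above. They assume the layout dst(15-12) | src(11-8) | extra(7-4) | opcode(3-0), which is consistent with those Skip words, and that ri is register 15.

```python
# Hypothetical helpers for the Load (0x6) / Store (0x7) "extra" nibble.
# Assumption: instruction layout is dst(15-12) | src(11-8) | extra(7-4) | opcode(3-0).

LOAD_MODES = ["Load", "LoadInc", "Gather", "GatherInc"]
STORE_MODES = ["Store", "StoreInc", "Scatter", "ScatterInc"]


def decode_extra(extra, opcode=0x6):
    """Split the extra nibble into (word count, mode name)."""
    count = 4 - (extra >> 2)  # 0x0 -> 4 words ... 0xC -> 1 word
    names = LOAD_MODES if opcode == 0x6 else STORE_MODES
    return count, names[extra & 0x3]


def encode_load(dst, src, count, mode):
    """Encode a Load word; `mode` is an index into LOAD_MODES."""
    if not 1 <= count <= 4:
        raise ValueError("count must be 1 to 4 words")
    extra = ((4 - count) << 2) | mode
    return (dst << 12) | (src << 8) | (extra << 4) | 0x6


def skip(n):
    """SkipN: LoadInc of n words into c0 (register 0) through ri (register 15)."""
    return encode_load(0x0, 0xF, n, 1)


skips = [skip(n) for n in range(1, 5)]
```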
Swizzle
Swizzle rearranges or copies the destination register words according to the provided four-word swizzle on the given destination register.
Mnemonics
Swizzle has the following synonymous mnemonics:
swi
swizzle
Destination
The swizzle instruction expects a destination register on which the swizzle is performed. The destination must be a general purpose writable register.
Swizzle
The actual swizzle operation to be performed is specified after the destination register's identifier.
This swizzle comes in the form of four letters identifying the source words to be copied into the destination in the order x, y, z, and w.
For example, since this swizzle places the xyzw words into the same order, it is a no-op:
swizzle r0.xyzw
Whereas the following swaps the values of the x and y words:
swizzle r0.yxzw
Swizzling is also not limited to swapping; you can copy values to multiple destinations. For example, if we instead wanted to place a copy of x into the y word and leave the original x intact:
swizzle r0.xxzw
Swizzling can be useful for building small stack-like structures, for copying data within a vector to be used in another SIMD instruction, or even for rearranging the words so that an instruction that can only access a register's x word can reach its more significant words.
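The examples in this section can be modelled in Python, together with the native encoding from the opcode notes, where source holds 0bWWZZ and extra holds 0bYYXX. This is a sketch under the following assumptions. Each 2-bit field names the word copied into X, Y, Z or W. The word layout is dst | source | extra | opcode. r4 is register 12, following the register memory map. The function names are hypothetical.

```python
# Hypothetical model of swizzle, plus its native encoding (opcode 0x5).
# Assumptions: layout dst(15-12) | source(11-8) | extra(7-4) | opcode(3-0);
# source holds 0bWWZZ and extra holds 0bYYXX, where each 2-bit field names
# the word copied into X, Y, Z or W.

INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}


def apply_swizzle(vec, pattern):
    if len(pattern) != 4:
        raise ValueError("a swizzle must name exactly four words")
    return [vec[INDEX[c]] for c in pattern]


def encode_swizzle(dst, pattern):
    sel = [INDEX[c] for c in pattern]  # selected source word for X, Y, Z, W
    extra = (sel[1] << 2) | sel[0]     # 0bYYXX
    source = (sel[3] << 2) | sel[2]    # 0bWWZZ
    return (dst << 12) | (source << 8) | (extra << 4) | 0x5


r0 = [0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD]
same = apply_swizzle(r0, "xyzw")     # no-op
swapped = apply_swizzle(r0, "yxzw")  # swap x and y
copied = apply_swizzle(r0, "xxzw")   # copy x into y
```

Under these assumptions, `swizzle r4.zyzx` would encode as 0xc265.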
Macros
The assembler has some useful macros that make writing programs more ergonomic by abstracting away things like conditional jumps or inline literals.
The following macros are currently supported:
- Jump - Write to the program counter ergonomically
- Set - Assign labels or literals directly to registers
- Labels - Store named positions within the program
Set
Rationale
The set macro allows you to ergonomically store values directly into registers.
In simple programs you can store your values in the constant registers and use them as normal, for example:
.memory
0000 1111 2222 3333
4444 5555 6666 7777
.code
mov r0, c0 ; Set r0 to [0000, 1111, 2222, 3333]
wmov r1, c1.z ; Set r1.x to 6666
However, there are situations where you may need more values than can fit into the constant registers.
One way to solve this problem might be to store a larger set of values in other memory regions, or to construct a series of programs that load chunks of data elsewhere before loading the actual program.
Alternatively, the CPU has a convenient trick we can use for loading values into registers directly from the program memory.
The Program Counter
Since the program counter contains the address of the next instruction to be executed, we can use it as a pointer and load the word at that address as a literal value. For example:
mov r0.x, [ri.x]
nop
Since the nop instruction has the literal value 0x0010, this will load the value 0x0010 into r0.x and then execute the nop instruction. However, this limits us to values that are valid instructions, and also has the side effect of executing instructions we may not want executed.
Luckily, there is a trick we can use to avoid this problem: the incrementing load/store instructions. When using a value as a pointer, we can increment that value after using it. In this example we can avoid executing the nop by incrementing the program counter after we use it as a pointer:
mov r0.x, [ri.x+] ; The + here means ri.x will be incremented after reading
nop ; This will now be skipped
This now means that the value in this location does not need to be a valid instruction.
We can use the immediate value syntax to store arbitrary data in program memory:
mov r0.x, [ri.x+]
!b0fa
; Now r0.x = 0xb0fa
The set macro can make this more ergonomic. The following compiles to the same bytecode:
set r0, $b0fa
Additionally, we can use the multi-word (SIMD) incrementing load to similar effect:
mov r0.xyz, [ri.x+]
!1111
!2222
!3333
; Equivalent to:
set3 r0, $1111, $2222, $3333
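What `set` expands to, and why the literals are never executed, can be sketched in Python. The sketch makes three assumptions. r0 is register 8 and ri is register 15, following the register memory map. The expansion matches the LoadInl encodings in the opcode notes. ri.x already points past the load when it executes. `assemble_set` and `run_inline_load` are hypothetical names.

```python
# Hypothetical model of what `set` / `set3` expand to and how the inline load runs.
# Assumptions: register r0 is number 8, ri is 15, and ri.x already holds the
# address of the following word when the load executes.


def assemble_set(dst, values):
    """Expand setN into a LoadInc through ri.x followed by N literal words."""
    n = len(values)
    if not 1 <= n <= 4:
        raise ValueError("set takes 1 to 4 values")
    extra = ((4 - n) << 2) | 0x1  # word count + LoadInc mode
    return [(dst << 12) | (0xF << 8) | (extra << 4) | 0x6] + list(values)


def run_inline_load(memory, pc, n):
    """Execute the load at `pc`; return (loaded words, address of next instruction)."""
    pointer = pc + 1                        # ri.x points past the load itself
    loaded = [memory[pointer + i] for i in range(n)]
    return loaded, pointer + n              # the increment skips the literals


program = assemble_set(8, [0x1111, 0x2222, 0x3333])  # set3 r0, $1111, $2222, $3333
memory = {0x0040 + i: word for i, word in enumerate(program)}
loaded, next_pc = run_inline_load(memory, 0x0040, 3)
```

Execution resumes at the word after the last literal, so the literal data never has to be a valid instruction.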
Labels as Data
In addition to setting immediate values, we can also take advantage of the assembler's preprocessor to use the addresses of labels as values at runtime.
set r4, :some_label ; r4.x now contains the address of :some_label
This can be useful for adding offsets to jump locations, or even rewriting jumps when copying code into a new memory location (such as when copying a program into shared memory) among other uses.
Jump
The Wave2 architecture does not normally support any type of flow control other than mutating the program counter directly.
This means that in order to perform jumps or conditional execution, we need to perform arithmetic with the program counter.
The simplest method is to move values directly into the PC. A typical example is jumping back to the start of private memory to make your program loop.
.memory ;0x0000
039c 0040
.code ;0x0040
mov r0, [c0.x]
notdst r0
mov [c0.x], r0
wmov ri, c0.y
On the final line we use the word select move instruction wmov to move the second word of c0 (c0.y) into ri.x, the program counter.
Wave2Assembly provides some helpful macros for modifying the program counter more ergonomically.
We can use labels to mark locations to jump to more easily:
:loop
; loop code goes here
jmp :loop
Additionally we can give jump the value of a register to jump to the address stored therein:
jmp r4
It's also possible to combine these two by using the set macro to place a labelled address into a register, and then perform arithmetic on that before jumping to that address. This would enable offsetting your jump conditionally, such as for selecting one of several subroutines, or implementing function lookup tables, jump lists, etc.
Labels
Labels are a way to store the position/address of a location in program memory for jumping or referring to elsewhere without having to use offsets or manually count instructions or addresses.
Labels are declared using a colon : character followed by letters, numbers, or underscores.
; Declare the :start label to jump back to later.
:start
swizzle r0.yzwx
mov r1.x, [ri.x+]
...
; Use the label defined earlier to jump back to the start of the program.
jmp :start
Unlike other macros, labels perform a post-processing step on the program after the rest of the instructions have been generated, in order to track the correct memory locations and inject them into the necessary places.
Usage
Labels can be used in jump instructions as shown above, and can also be used in set instructions to assign the value of a label to a register, such as:
:begin
sub.w r0, r0, r0
set r1, :begin
set r2, $0010 ; nop
mov [r1.x], r2
jmp :begin
This program overwrites the instruction sub.w r0, r0, r0 with a nop. It does this by setting r1 to the address of the label :begin, which is the memory address of the subtract instruction. It then uses r1.x as a pointer to write the value 0x0010, stored in r2, into that address.