Meisaka Wave2 Vector RISC CPU

0x0000	c0	x	y	z	w
⋮	⋮	⋮
0x001c	c7	x	y	z	w
0x0020	r0	x	y	z	w
⋮	⋮	⋮
0x0038	r6	x	y	z	w
0x003c	ri	PC	y	z	w
0x0040 ⋮ 0x00ff		Private Memory Region
⋮	⋮
See memory for the full map...

Architecture

The Wave2 CPU has 8 constant and 8* general purpose registers, and IO is memory mapped with some modularity.

Wave2's architecture is designed such that most operations are SIMD, affecting all four words of their respective vectors.

In addition, it is a partially-sandboxed multi-user simulation. Each user has multiple cores, and all users and their cores are executed concurrently. Cores can run code from anywhere in memory, including the shared memory region.

The bytes in memory are stored in little-endian order. Bytes from user writes, such as from chat or when loading binaries, are interpreted as big-endian, and then written to memory in little-endian.

The vector words are also in little-endian order, such that the least significant component X is the first in memory. The memory order begins at the least significant byte, with the least significant word first.

All CPU registers are SIMD vectors, holding a quartet of 16-bit words.
Some special instructions can operate on vectors as an octuplet of 8-bit words.

Memory is addressed by 16-bit words. Each memory address maps to a single word within its vector.

For example:

0x0000 => 0xABCD
0x0001 => 0xEF39

See Memory for more on the memory and its layout.
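
As a rough illustration of the byte and word ordering described above, the following Python sketch packs one vector into bytes. The helper names `parse_user_hex` and `pack_vector` are hypothetical, and the last two example words are made up to fill out the vector.

```python
import struct

def parse_user_hex(text):
    """Read whitespace-separated hex words as a user would type them (big-endian digits)."""
    return [int(token, 16) for token in text.split()]

def pack_vector(words):
    """Lay out four 16-bit words in memory order: X first, each word low byte first."""
    return struct.pack("<4H", *words)

words = parse_user_hex("abcd ef39 0000 0001")
raw = pack_vector(words)
print(raw.hex())  # the bytes as they would sit in little-endian memory
```

Reading the bytes back with `struct.unpack("<4H", raw)` recovers the original four words, X first.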

SIMD Architecture

The Wave2 CPU architecture is "SIMD by default" where most operations are inherently SIMD.

Every register is a SIMD vector of 4 words, and most instructions operate on all four words of both the source and destination register.

Additionally, while loading and storing memory can be performed one word at a time, it's also possible to atomically load and store entire vectors as long as they are stored with proper 4-word alignment.

Instructions that are said to operate on bytes instead of words are also SIMD, and treat the vector as eight 8-bit words for that operation.
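
To make the word and byte views concrete, here is a small Python model of a lane-wise add. It is only a sketch: the function names are hypothetical, wrap-around on overflow is assumed, and the byte view assumes the low byte of each word comes first, following the little-endian layout described earlier.

```python
def add16(a, b):
    """Add two vectors as four independent 16-bit lanes (wrap-around assumed)."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]

def to_bytes(vec):
    """Split four 16-bit words into eight 8-bit lanes, low byte of each word first."""
    out = []
    for word in vec:
        out += [word & 0xFF, (word >> 8) & 0xFF]
    return out

def from_bytes(lanes):
    """Rebuild four 16-bit words from eight 8-bit lanes."""
    return [lanes[i] | (lanes[i + 1] << 8) for i in range(0, 8, 2)]

def add8(a, b):
    """Add two vectors as eight independent 8-bit lanes (wrap-around assumed)."""
    return from_bytes([(x + y) & 0xFF for x, y in zip(to_bytes(a), to_bytes(b))])

a = [0x01FF, 0, 0, 0]
b = [0x0101, 0, 0, 0]
print([hex(w) for w in add16(a, b)])  # carry crosses the byte boundary
print([hex(w) for w in add8(a, b)])   # carry is discarded per byte
```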

Swizzling

Vector swizzling generally refers to the re-ordering of the components of a vector. Wave2 can perform vector swizzling either directly on a register, to alter the vector's contents, or when specifying the vector components as part of an instruction.

Swizzle Instruction

Altering the content of a register.

The Swizzle instruction can be used to rearrange a vector's components arbitrarily. One use case might be altering which of the components in a vector is the least significant component, useful for when an instruction operates on only that value.

See the page on the Swizzle instruction for more.
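
A minimal Python model of the rearrangement may help. It assumes that each letter of the pattern names the source lane for the destination lane at that position, which matches the `Swizzle.xxyz ri` example in the instruction encodings (w takes z, z takes y, y takes x). The function name is hypothetical.

```python
LANES = "xyzw"

def swizzle(vec, pattern):
    """Return a new 4-word vector whose i-th lane is read from the lane named by pattern[i]."""
    if len(pattern) != 4 or any(c not in LANES for c in pattern):
        raise ValueError("pattern must be exactly four of x, y, z, w")
    return [vec[LANES.index(c)] for c in pattern]

print(swizzle([1, 2, 3, 4], "xxyz"))  # shifts x, y, z up one lane, keeps x
```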

Swizzling Operands

Selecting a register's vector components.

When using a register as an operand to an instruction, in some cases it is possible (or even required) to specify which word components of the register are relevant to the operation.

A register's components are specified using a period followed by 1-4 word specifiers, x y z or w, the number of which depends on the instruction's requirements.

For example, when copying values using the Move instruction, you can specify a subset of the components for the move using a swizzle:

; These two are equivalent
mov r0, r1
mov r0.xyzw, r1.xyzw

; Copy only the first word
mov r0.x, r1.x

; Copy only the third word
mov r0.z, r1.z

Another example is the Word Select instructions, which operate on the least significant word of a register, and can select an arbitrary word from a register as the source word for the operation:

; Copy the value from r1.w into r0.x
wmove r0, r1.w

; Add r0.z into r0.x
wadd r0, r0.z

; Conditional skip by subtracting a comparison result from the program counter
gt.w r0, r0, r1
wsub ri, r0.x
halt ; Halts only if r0 was greater than r1
jmp :continue

For other situations in which operand swizzling is useful, please see the documentation on the individual instructions.
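
The two operand forms above can be modelled in a few lines of Python. This is a sketch with hypothetical function names; it assumes a component mask on a move copies only the named lanes, and that word-select arithmetic wraps at 16 bits.

```python
LANES = "xyzw"

def masked_move(dst, src, mask="xyzw"):
    """Copy only the lanes named in mask from src into a copy of dst."""
    out = list(dst)
    for lane in mask:
        i = LANES.index(lane)
        out[i] = src[i]
    return out

def word_add(dst, src, lane):
    """Model of a word-select add: dst.x += src.<lane>, other lanes untouched."""
    out = list(dst)
    out[0] = (out[0] + src[LANES.index(lane)]) & 0xFFFF
    return out

r0 = [1, 2, 3, 4]
r1 = [9, 9, 9, 9]
print(masked_move(r0, r1, "z"))  # like: mov r0.z, r1.z
print(word_add(r0, r0, "z"))     # like: wadd r0, r0.z
```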

Memory

System memory map:
   0(0x0000) ..=    31(0x001F) => System constant registers, accessible via load
  32(0x0020) ..=    63(0x003F) => System mutable registers, accessible via load/store
  64(0x0040) ..=   255(0x00FF) => Non-volatile Core private RAM
 256(0x0100) ..=   767(0x02FF) => Non-volatile Core banked RAM area
 768(0x0300) ..=  1023(0x03FF) => VM and Ship I/O control area
1024(0x0400) ..=  2047(0x07FF) => Remote memory access aperture
2048(0x0800) ..=  4095(0x0FFF) => Volatile private RAM
4096(0x1000) ..= 12287(0x2FFF) => Shared RAM (all user cores can read/write)

Within the memory map, the "pre-loadable" memory is a block of memory that can
be written using the "!vm code ..." or "!vm write ..." commands.
Additionally, it is generally non-volatile, and can be expected to persist.
Memory outside the "pre-loadable" range cannot be written via chat commands;
only the CPU itself can access the other regions.
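
The memory map above can be expressed as a small lookup table. The Python below is a sketch with hypothetical names; the pre-loadable range of 0x0000 to 0x02FF is taken from the diagram markers "beginning of !vm write" and "end of pre-loadable memory".

```python
# (first address, last address, description), copied from the system memory map
REGIONS = [
    (0x0000, 0x001F, "constant registers"),
    (0x0020, 0x003F, "mutable registers"),
    (0x0040, 0x00FF, "core private RAM"),
    (0x0100, 0x02FF, "core banked RAM"),
    (0x0300, 0x03FF, "VM and ship I/O control"),
    (0x0400, 0x07FF, "remote memory aperture"),
    (0x0800, 0x0FFF, "volatile private RAM"),
    (0x1000, 0x2FFF, "shared RAM"),
]

def region_of(address):
    """Return the name of the region containing a word address, or None if unmapped."""
    for first, last, name in REGIONS:
        if first <= address <= last:
            return name
    return None

def is_preloadable(address):
    """True if the address lies in the range writable by chat commands."""
    return 0x0000 <= address <= 0x02FF

print(region_of(0x039C), is_preloadable(0x039C))
```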

//// Memory and I/O space
 to control "the ship"
 - the ship: is visually a triangle floating around a wrapping 2D space
 - it has integrated physics (aka motion: impulse, velocity, acceleration, moment, mass)
 - ships spawn based on chat commands (or activity) in some random orientation,
   (planned: possibly with some default program)
 - ships have their own Wave2 core with these properties:
   - own C registers, R registers, Instruction register, and private memory area
   - own banked memory mapping, but banked memory can be accessed by all cores within the CPU
   - own I/O space mapping, but I/O devices are shared between cores within each CPU
   - remote aperture is shared with main core
   - shares the volatile private memory area with main core
   - same limitations on accessing the global shared memory area
   - the ship core is not required to actually control the ship, but does by default
   - the ship core can be halted independently of the main core
   - the ship core is not started nor halted by the typical chat commands

            +0   +1   +2   +3  Address
          +----+----+----+----+
c0 0x0000 |  X |  Y |  Z |  W | 0x0000  (beginning of !vm write)
:     :   |    |    |    |    |    :
c7 0x001c |  X |  Y |  Z |  W | 0x001f
          +----+----+----+----+
r0 0x0020 |  X |  Y |  Z |  W | 0x0020
:     :   |    |    |    |    |    :
r6 0x0038 |  X |  Y |  Z |  W | 0x003b
          +----+----+----+----+
ri 0x003c | pc |  Y |  Z |  W | 0x003f
          +----+----+----+----+
            +0   +1   +2   +3  Address
          +----+----+----+----+         <- start of address wrap
   0x0040 |  P    P    P    P | 0x0043  (beginning of !vm code)
      :   |   Private Memory  |    :     192 words
   0x00fc |  P    P    P    P | 0x00ff
          +----+----+----+----+
   0x0100 | BP   BP   BP   BP | 0x0103  (banked memory ROM or persist RAM)
      :   |   Private Memory  |    :     512 words
   0x02fc | BP   BP   BP   BP | 0x02ff  (end of pre-loadable memory)
          +----+----+----+----+         <- end of address wrap
   0x0300 |                   | 0x0303  (Module I/O, VM control and ship modules)
      :   |                   |    :
   0x03fc |                   | 0x03ff
          +----+----+----+----+         <-
   0x0400 |0000 0000 0000 0000| 0x0403  (agent remote memory aperture)
      :   |                   |    :
   0x07fc |0000 0000 0000 0000| 0x07ff
          +----+----+----+----+         <-
   0x0800 |  M    M    M    M | 0x0803  (volatile private memory area)
      :   |                   |    :
   0x0ffc |  M    M    M    M | 0x0fff
          +----+----+----+----+         <- end of private protection area
          +----+----+----+----+         <- start of address wrap and shared protection area
   0x1000 |  S    S    S    S | 0x1003  (shared memory aperture)
      :   |   Shared Memory   |    :
   0x2ffc |  S    S    S    S | 0x2fff
          +----+----+----+----+        <- end of address wrap

Instructions

Instructions are arranged in categories:
0x0 => System, 0x1 => WSelect, 0x2 => Extra2,  0x3 => Extra3,
0x4 => Move,   0x5 => Swizzle, 0x6 => Load,    0x7 => Store,
0x8 => Math8,  0x9 => Math16,  0xA => Shift8,  0xB => Shift16,
0xC => BitOp,  0xD => SpecOp,  0xE => Extra14, 0xF => Extra15

//// Example of instructions and their encodings:
Move r0, c0        => 8004  // move c0 into r0
ScatterInc r4, c5  => 5c37  // scatter store values in c5 at the 4 addresses in r4, incrementing each address
Swizzle.xxyz ri    => f905  // push pc onto a "stack" (ri.w = ri.z, ri.z = ri.y, ri.y = ri.x, ri.x = ri.x)
Xor r1, r1         => 996c  // clear r1 to all zero
CompareEq16 r0, r1 => 8979  // test each field of r0 against r1, store 0xffff or 0 into r0 fields
Move.yzw r0, r1    => 89e4  // move only y,z,w of r1 into r0, leaving r0.x unchanged
SubRev16 ri, r0    => f889  // compute ri - r0, put result into ri
                            // if r0 is set to 0 or 0xffff by a compare,
                            // this will conditionally skip the next instruction

//// VM and Ship Modules
 - "module 0" is always the control registers
 - a slot can be unmapped with ID: 0x0000, all registers in these slots will read as zero
 - control registers are a module that can be mapped
   into other slots by using the thread ID: 0x0001 or 0x0002
 - modules are memory mapped into the I/O control area (0x320 .. 0x3ff)
   by storing the desired module ID into the respective
   "Module slot selection" (TMS*) registers within the control area
 - Mapping a module ID will cause the module to "activate"
 - There are limits on the number of modules that can be active
 - The lowest numbered IDs have activation priority
 - The same module can be mapped into multiple slots at the same time
 - Modules will not lose their data nor configuration while unmapped
 - Modules can lose their configuration in the same way volatile memory can
 - ship modules will have some default configuration
 - each module takes up 0x20 words
0x300 => VM and module control area
0x320 => module 1 default 0x1000 = CRS0
0x340 => module 2 default 0x1001 = CRS1
0x360 => module 3 default 0x0000 = unmapped
0x380 => module 4 default 0x4000 = flight controls module
0x3a0 => module 5 default 0x4040 = radar 0
0x3c0 => module 6 default 0x4050 = NAV 0
0x3e0 => module 7 default 0x4051 = NAV 1
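
Since every slot is 0x20 words wide and slot 0 starts at 0x300, the address of a module register follows from the slot number and the register offset. A minimal Python sketch, with hypothetical function names:

```python
IO_BASE = 0x300    # start of the VM and module control area
SLOT_SIZE = 0x20   # each module takes up 0x20 words

def slot_base(slot):
    """Return the first address of module slot 0..7."""
    if not 0 <= slot <= 7:
        raise ValueError("slot must be in 0..7")
    return IO_BASE + slot * SLOT_SIZE

def register_address(slot, offset):
    """Return the address of the register at a word offset inside a slot."""
    if not 0 <= offset < SLOT_SIZE:
        raise ValueError("offset must be in 0..0x1f")
    return slot_base(slot) + offset

# Flight controls are mapped into slot 4 by default.
print(hex(register_address(4, 0x1C)))
```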

 - each ship will have "radar" scanning module(s) that grant:
   - distance to objects (objects are detected within a narrow cone)
   - configurable sweep angle
   - target information: target ID, distance
   - target ID can be used by "NAV" modules to obtain: distance, velocity, and headings
 - each ship will have "NAV" navigation modules that grant:
   - relative distance to target
   - relative velocity to target
   - headings to target
   - able to accept the target ID from radar modules
   - able to target absolute coordinates
 - each ship has a laser comm device:
   - remote memory access to a target which is within line-of-sight
   - must be facing the front of another ship to access it, the target ship can not be accessed from other sides
   - remote access is limited to the volatile memory area of the other ship

//// Ship Local Coordinate System
   Axes             Headings
                     0x0000
    +Y                  ^
    /\          0x7000  |  0x1000
   /  \               \ | /
-X/    \+X  0x6000 <----*----> 0x2000
 /      \             / | \
 +-|--|-+       0x5000  |  0x3000
    -Y                  v
                     0x4000
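
Reading the diagram, headings appear to increase clockwise from +Y in steps of 0x2000 per quarter turn, so a full turn is 0x8000. Under that assumption (inferred from the diagram, not stated in the text), a heading converts to degrees and to a unit vector in ship-local axes as sketched below. The names are hypothetical.

```python
import math

FULL_TURN = 0x8000  # assumption: 0x2000 per quarter turn, as drawn in the diagram

def heading_to_degrees(heading):
    """Convert a heading value to degrees clockwise from +Y."""
    return (heading % FULL_TURN) * 360.0 / FULL_TURN

def heading_to_vector(heading):
    """Return an (x, y) unit vector for a heading, with 0x0000 pointing along +Y."""
    angle = math.radians(heading_to_degrees(heading))
    return (math.sin(angle), math.cos(angle))

print(heading_to_degrees(0x2000), heading_to_vector(0x2000))
```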

//// Fields common to all modules:
0    = (R  ) always reads as zero
M    = (R/W) scratch memory, these may be freely written with any value

//// VM Control Area (VCA)
CSTA = (RMW) Core status,
    bit 0: Set = Running, Clear = Halted
    (this bit can be set to start the core, can not be cleared by writing)
    See Core Start below.
    other bits may be non zero
CID  = (R  ) Core ID, this is also the module ID for the VCA
CER  = (R/W) Core exception register save
CRI  = (R/W) Core instruction register save
CCRL = (R/W) (Read: 0) (Write: Copy Constant Register stores to core's C registers)
CUID = (R  ) Core user ID
CPRT = (R/W) Core protection (only applies while the core is running):
    bits 1,0:  private memory area (PMA) from volatile private area (VPMA)
            PMA access, while PC points at the volatile private memory area:
        00 = no protection
        01 = readonly, writes ignored
        10 = read as zero, writes ignored
        11 = any access raises exception
    bits 3,2:  private memory area (PMA) from shared memory (SMA)
            PMA access, while PC points at shared memory:
        00 = no protection
        01 = readonly, writes ignored
        10 = read as zero, writes ignored
        11 = any access raises exception
    bits 5,4:  volatile private area (VPMA) from shared memory (SMA)
            VPMA access, while PC points at shared memory:
        00 = no protection
        01 = readonly, writes ignored
        10 = read as zero, writes ignored
        11 = any access raises exception
    bit 6: set = prevent CER, CRI writes from other cores
    bit 7: set = prevent CBK, CMS, CRS, CCRL writes from other cores
    bit 8: set = exception on PC in private memory (callback protection)
    bit 9: set = exception on PC in volatile private memory
    bit 10: set = exception on PC in shared memory
    bit 11: future use
    bit 12: set = exception on core halt and invalid instructions
    bit 13: set = exception on core sleep instructions
    bit 14: future use
    bit 15: future use
CBK  = (R/W) Core Bank Select
    this register is mirrored in two different words
    they both hold the same value and writing to either will set the other
CMS* = (R/W) Core module selection (contains a module ID)

C+0x00 (0x300) [ CSTA, CID , CPRT, 0    ]
C+0x04 (0x304) [ CCRL, ----- CUID ----- ]
C+0x08 (0x308) [ -------- CER  -------- ]
C+0x0c (0x30c) [ -------- CRI  -------- ]
C+0x10 (0x310) [ 0000, CBK , CBK , f000 ]
C+0x14 (0x314) [ 0   , 0   , 0   , 0    ]
C+0x18 (0x318) [ CID , CMS1, CMS2, CMS3 ]
C+0x1c (0x31c) [ CMS4, CMS5, CMS6, CMS7 ]
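
The CPRT bit fields described above can be unpacked as follows. This Python sketch only restates the bit assignments listed for CPRT; the function and key names are hypothetical.

```python
ACCESS_MODES = [
    "no protection",
    "readonly, writes ignored",
    "read as zero, writes ignored",
    "any access raises exception",
]

FLAG_BITS = {
    6: "prevent CER, CRI writes from other cores",
    7: "prevent CBK, CMS, CRS, CCRL writes from other cores",
    8: "exception on PC in private memory",
    9: "exception on PC in volatile private memory",
    10: "exception on PC in shared memory",
    12: "exception on core halt and invalid instructions",
    13: "exception on core sleep instructions",
}

def decode_cprt(value):
    """Split a 16-bit CPRT value into its three access fields and its set flags."""
    return {
        "pma_from_vpma": ACCESS_MODES[value & 0x3],
        "pma_from_sma": ACCESS_MODES[(value >> 2) & 0x3],
        "vpma_from_sma": ACCESS_MODES[(value >> 4) & 0x3],
        "flags": [name for bit, name in sorted(FLAG_BITS.items()) if value & (1 << bit)],
    }

print(decode_cprt(0x041D))
```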

//// Core Start
When CSTA bit 0 goes from unset to set
from any core writing to this register
A core start will be performed on the core that this VCA controls:
 - CPRT settings will be applied
 - CPRT will protect itself from other cores
 - CRI  will be loaded into Ri
 - CUID is updated
 - C registers will be loaded from this core's Constant Register Store
 - The core will begin running instructions

//// Exceptions
The VCA gives access to exceptions management:
An exception when triggered will:
 - Save Ri to the CRI register in the VCA
 - Clear the private memory exceptions bit in CPRT
 - Load Ri with the contents of CER from the VCA
 - If Ri.PC points at the VCA, the CPU will halt
 - Set CER to [0x0308, 0, 0, 0]
   this makes CER point at itself within the VCA
   you are expected to reload CER if you want to handle further exceptions

//// Constant Register Store (Module ID: 0x1001 core 1 CRS, 0x1002 core 2 CRS)
CS** = (R/W) values used to reload all the C registers with
C+0x00 [ CS0x, CS0y, CS0z, CS0w ]
C+0x04 [ CS1x, CS1y, CS1z, CS1w ]
C+0x08 [ CS2x, CS2y, CS2z, CS2w ]
C+0x0c [ CS3x, CS3y, CS3z, CS3w ]
C+0x10 [ CS4x, CS4y, CS4z, CS4w ]
C+0x14 [ CS5x, CS5y, CS5z, CS5w ]
C+0x18 [ CS6x, CS6y, CS6z, CS6w ]
C+0x1c [ CS7x, CS7y, CS7z, CS7w ]

//// Ship Control Modules
MSTS = (R  ) Module status
MMID = (R  ) Module ID

//// Flight Control Module (Module ID: 0x4000)
 - there is only one flight control module, it is always active even if not mapped.
RRV  = (R/W) requested relative velocity - X and Y in signed 16b in 8.8 format
CRV  = (R  ) current relative velocity - X and Y in signed 16b in 8.8 format
RH   = (R/W) requested absolute/relative heading - signed 16b in 8.8 format
CAH  = (R  ) absolute heading
EEN  = (R/W) engine controls bitflags:
        bit 0 - set: allow Y+ engine to change velocity
        bit 1 - set: allow Y- engine to change velocity
        bit 2 - set: allow X+ engine to change velocity
        bit 3 - set: allow X- engine to change velocity
        bit 4 - set: allow engines to change heading
        bit 5 - heading mode - set: absolute, clear: relative
SHCC = (R/W) set Ship colour
SHCM = (R/W) background mix (0x00 - ship matches background, 0xff - ship matches set colour)

// Flight control memory layout overview:
//     (default mapping)
F+0x00 (0x380) [ MSTS, MMID, 0   , 0    ]
F+0x04 (0x384) [ RRVx, RRVy, M   , M    ]
F+0x08 (0x388) [ CRVx, CRVy, 0   , 0    ]
F+0x0c (0x38c) [ RH  , M   , M   , M    ]
F+0x10 (0x390) [ CAH , 0   , 0   , 0    ]
F+0x14 (0x394) [ EEN , M   , M   , M    ]
F+0x18 (0x398) [ 0   , 0   , 0   , 0    ]
F+0x1c (0x39c) [ SHCC, SHCM, 0   , 0    ]
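
Several flight control registers use a signed 16-bit value in 8.8 format. The Python sketch below assumes the usual reading of that notation: two's complement, with 8 integer bits and 8 fractional bits, so one unit is 256. The function names are hypothetical.

```python
def from_fixed_8_8(raw):
    """Convert a raw 16-bit register value (signed 8.8) to a float."""
    raw &= 0xFFFF
    if raw & 0x8000:
        raw -= 0x10000  # interpret as two's complement
    return raw / 256.0

def to_fixed_8_8(value):
    """Convert a float to a raw 16-bit signed 8.8 value, rejecting out-of-range input."""
    raw = int(round(value * 256))
    if not -0x8000 <= raw <= 0x7FFF:
        raise ValueError("value does not fit in signed 8.8")
    return raw & 0xFFFF

print(hex(to_fixed_8_8(1.5)), from_fixed_8_8(0xFF80))
```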

//// Radar/Scanning functions (Module IDs: 0x4040 - 0x4047)
 - up to 4 scanner modules can be active at once

RSSH = (R/W) select scan heading, 0x0000 - 0x7fff (select scan heading), 0xffff (auto scan)
RHLS = (R  ) heading of last scan
RNSR = (R  ) number of signatures returned
RSDT = (R  ) distance to signature - unsigned 16b in 8.8 format
RSID = (R  ) signature ID

//     Radar0 (default mapping)
R+0x00 (0x3a0) [ MSTS, MMID, 0   , 0    ]
R+0x04 (0x3a4) [ RSSH, 0   , RHLS, RNSR ]
R+0x08 (0x3a8) [ RSDT, ----- RSID ----- ]
R+0x0c (0x3ac) [ RSDT, ----- RSID ----- ]
R+0x10 (0x3b0) [ RSDT, ----- RSID ----- ]
R+0x14 (0x3b4) [ RSDT, ----- RSID ----- ]
R+0x18 (0x3b8) [ RSDT, ----- RSID ----- ]
R+0x1c (0x3bc) [ RSDT, ----- RSID ----- ]

//// NAV Module (Module ID: 0x4050 - 0x4057)
 - up to 4 NAV modules can be active at once

NAS* = (R  ) current absolute screen X,Y pixel position
NTS* = (R/W) NAV Target absolute screen pixel X,Y
NTGT = (R/W) Target Selector (0 - Fixed point; 1 - Beacon; 2 - Nearest Signature; N - Signature)
NTGI = (R/W) Target ID (3 words)
NRD* = (R  ) NAV Target relative distance X,Y
NRV* = (R  ) NAV Target relative velocity X,Y
NAHT = (R  ) NAV Target absolute heading towards
NAHF = (R  ) NAV Target absolute heading away
NRHT = (R  ) NAV Target relative heading towards
NRHF = (R  ) NAV Target relative heading away

//      NAV0  NAV1 (default mappings)
N+0x00 (0x3c0/0x3e0) [ MSTS, MMID, 0   , 0    ]
N+0x04 (0x3c4/0x3e4) [ NASx, NASy, 0   , 0    ]
N+0x08 (0x3c8/0x3e8) [ NTSx, NTSy, M   , M    ]
N+0x0c (0x3cc/0x3ec) [ NTGT, ----- NTGI ----- ]
N+0x10 (0x3d0/0x3f0) [ NRDx, NRDy, 0   , 0    ]
N+0x14 (0x3d4/0x3f4) [ NRVx, NRVy, 0   , 0    ]
N+0x18 (0x3d8/0x3f8) [ NAHT, 0   , NAHF, 0    ]
N+0x1c (0x3dc/0x3fc) [ NRHT, 0   , NRHF, 0    ]

Stream Overlay

Shared Memory

//// Shared Memory
Memory within the shared RAM range is visualized as blocky pixels
The display matrix is 256 pixels wide, by 32 pixels tall
Ascending pixel addresses are rastered from left to right, top down.
Pixel address from position is: 0x1000 + (0x100 * Y) + X
The pixel format is rrrr rggg gggb bbbb => i.e. 0xf800 is pure RED
this mode is called RGB565
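
The address formula and the pixel format translate directly into code. A minimal Python sketch with hypothetical function names; the channel values are taken as already scaled to 5, 6 and 5 bits.

```python
SHARED_BASE = 0x1000
WIDTH, HEIGHT = 256, 32

def pixel_address(x, y):
    """Return the shared-memory address of the pixel at column x, row y."""
    if not (0 <= x < WIDTH and 0 <= y < HEIGHT):
        raise ValueError("pixel is outside the 256x32 display")
    return SHARED_BASE + 0x100 * y + x

def rgb565(r, g, b):
    """Pack red (0..31), green (0..63) and blue (0..31) into one RGB565 word."""
    return ((r & 0x1F) << 11) | ((g & 0x3F) << 5) | (b & 0x1F)

print(hex(pixel_address(255, 31)), hex(rgb565(31, 0, 0)))
```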

Memory accesses will behave differently depending on
where the CPU is currently executing instructions:
 - when PC is within private memory:
   + any action that increments an address (load, store, PC fetch)
     with the address 0x00ff: will wrap to 0x0040
     i.e. load with increment with address of 0x00fe
          will read from 0x00fe, 0x00ff, 0x0040, 0x0041
          the register will point at 0x0042 after.
   + loads and stores to private memory are atomic
   + loads and stores to volatile private memory are only atomic when
     address has alignment 4, i.e. (address modulo 4) equals 0
   + loads and stores to shared memory are non-atomic
     shared memory will be accessed a single word at a time
     and incur significant delay between words
 - when PC is within shared memory:
   + any action that increments an address (load, store, PC fetch)
     with the address 0x2fff: will wrap to 0x1000
   + loads and stores to private memory are non-atomic
     private memory will be accessed a single word at a time
     and incur significant delay between words
   + loads and stores to shared memory are only atomic when
     address has alignment 4, i.e. (address modulo 4) equals 0
     such access will be without delay
     non-aligned access will be non-atomic, and have only slight delay
   + registers should be used for fast transfer between memory regions
     loads and stores to the register area (via load/store) of private memory still incurs delays
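
The wrap behaviour described above can be modelled with a small address stepper. This Python sketch follows the wrap points given in the text (0x00ff to 0x0040 while PC is in private memory, 0x2fff to 0x1000 while PC is in shared memory); the function names are hypothetical and timing is not modelled.

```python
def next_address(addr, first=0x0040, last=0x00FF):
    """Advance one word, wrapping from `last` back to `first`."""
    return first if addr == last else (addr + 1) & 0xFFFF

def vector_addresses(addr, first=0x0040, last=0x00FF):
    """Return the four word addresses touched by an incrementing access and the pointer afterwards."""
    touched = []
    for _ in range(4):
        touched.append(addr)
        addr = next_address(addr, first, last)
    return touched, addr

touched, after = vector_addresses(0x00FE)
print([hex(a) for a in touched], hex(after))
```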

Space Ships

Space Stations

Asteroids

Economy

Combat

Examples

Ship Color Cycler

This program cycles the user's ship through various predefined color constants.

.memory
000f 00ff 0ff0 ff00
f000 f00f 0f0f f0f0
fff0 0fff f0ff ff0f
dead beef cafe feed
000f 039c 0800
.code
wmov r1, c4.x
wmov r2, c4.y
wmov r3, c4.z
:loop
mov r4.x, [r0.x+]
mov [r2.x], r4.x
mov r5.x, r0.x
ge.w r5, r5, r1
sub.w ri, ri, r5
sub.w r0, r0, r0
slp.w r3
jmp :loop

The .memory section first lays out the color constants, and then the three values we use later in the program: 0x000f, which is the number of color constants (and consequently the loop counter wrap point); 0x039c, which is the default address of the flight module's color register SHCC; and finally 0x0800, which controls the sleep duration at the end of each loop iteration. We end the memory declaration by declaring the start of the .code region.

The program begins with three lines using the WSelect.Move instruction to pull the three constants defined in c4.xyz into three registers for referencing individually later.

wmov r1, c4.x
wmov r2, c4.y
wmov r3, c4.z

Following this we define the loop start point using a label.

:loop

At the top of the loop we make a copy of the color by referring to the address of r0.x which should be 0 by default¹, equivalent to c0.x which contains 0x000f as defined by the memory region at the top of the file. The + inside the pointer syntax means the address will be incremented following the move operation, so r0.x will increment from 0 to 1 in the first iteration.

mov r4.x, [r0.x+]

Next we update the ship color by writing to the color register in the Flight module. By default the flight module is mounted such that the color register is at 0x39c, which we stored into r2 earlier. Here we use the value of r2.x as the address at which to store the color data we just loaded into r4.

mov [r2.x], r4.x

Now we make a copy of the counter and store it in r5.x so we can perform arithmetic on it without clobbering the original counter value.

We then use the Greater Than or Equal comparison ge to compare the counter r5 with the total number of colors we stored earlier in r1.

The ge comparison will store either 0x0000 for true or 0xffff for false into r5.

mov r5.x, r0.x
ge.w r5, r5, r1

Note that the .w modifier on the ge instruction specifies that this math instruction operates on whole 16-bit words, rather than the .b size modifier that operates on 8-bit bytes.

For the next step we are going to use the result of the comparison to skip an instruction. We want to reset the loop counter, but only if we have reached the color count.

Here we use a special trick where we subtract the result of the comparison from the Program Counter. Since the comparison result is either 0 or 0xffff, and subtracting 0xffff with 16-bit wraparound is equivalent to adding 1, we can conditionally skip an instruction based on the result we got earlier.

So subtracting the result when false will add 1 to the Program Counter, skipping the next instruction where we reset the counter to 0.

sub.w ri, ri, r5
sub.w r0, r0, r0
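
The arithmetic behind the skip can be checked in isolation. This Python sketch models only the 16-bit wrap-around subtraction on the program counter; the function name and the example address are illustrative, and instruction fetch timing is not modelled.

```python
def pc_after_subtract(pc, compare_result):
    """Subtract a comparison result (0x0000 or 0xffff) from PC with 16-bit wrap-around."""
    return (pc - compare_result) & 0xFFFF

print(hex(pc_after_subtract(0x0048, 0x0000)))  # PC unchanged, next instruction runs
print(hex(pc_after_subtract(0x0048, 0xFFFF)))  # PC advanced by one, next instruction skipped
```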

At the end of each loop iteration, we sleep for the duration we stored into r3 earlier which is still holding the value we copied from the constant registers.

slp.w r3

Finally we jump² back to the loop point.

jmp :loop

¹ Even if r0 is not 0 at the beginning as we assume, such as if we were running another program and did not reset our VM core before writing this program, the worst case is that it displays random junk data as a color before being reset to 0 and iterating as normal.
² Under the hood, the jump instruction is a load instruction where we use the incrementing Program Counter to store a value into itself, which loads the value at the following address. In this case the jump instruction would translate into mov ri.x, [ri.x+] followed by the literal value of the :loop label, which for this program would be 0x43 since it is the fourth instruction in the private memory region, which starts at 0x40.

Wave2 Assembly Syntax Highlighting

`nimphious.wave2-lsp`

Provides syntax highlighting for the Wave2 assembly language.

Wave2 Assembly Language Server

`nimphious.wave2-assembly`

Language Server for the Wave2 Assembly Language syntax.

Wave2 Debugging Emulator

Not yet publicly available, needs cleanup and possibly a rewrite.

Wave2 Assembler

Github | Installation

The Wave2 Assembler builds Wave2 binaries, which can be loaded into the simulator or converted into runic for use as chat load commands.

The assembler accepts w2s files either via stdin or filename. The w2s file is then compiled into Wave2 bytecode and then output either to stdout or written to the provided output file.

There are three output modes available: standard, binary, and runic.

Standard mode simply outputs the words in big endian plaintext hexadecimal.
Binary mode outputs a Wave2 binary file.
Runic mode outputs the words as Wave2 runes, and can optionally be wrapped in chat commands to be posted directly to stream.

Wave2 Binary Format

The binary output of the assembler is in the Wave2 binary format.

These binaries can be loaded directly by Wave2 tools such as the debugger, and contain a description of the

Runic Output

Installation

Make sure Rust (and therefore Cargo) is installed, and then run:

cargo install --git https://github.com/zeb-hicks/wave2_assembler

Or clone the repository locally and install using:

cargo install --path /path/to/wave2_assembler

If you have cargo binaries available on your path then you should be able to easily run the assembler from the command line like so:

waveasm

Wave2 Assembly

Syntax

Wave2 Assembly's syntax is very similar to the NASM flavoured assembly. The operands are ordered Intel style (instruction destination, source) and use the semicolon ; for comments.

Section Directives

The first thing you'll see in most Wave2 assembly files is the .memory directive, which defines the start of a region containing a sequence of values to be stored in the constant registers.

There are two directives at present, .memory and .code, each simply defining the beginning of their eponymous regions. These two directives also correspond to the two VM commands !write¹ and !code respectively.

.memory
1234 5678
.code
mov r0, c0
; etc...

Structure

The core syntax typically takes the following structure:

instruction	size	operands
add	.w	r0,	r0,	c1
mov		r0,	r1
swizzle		r4.zyyw
mul		r1,	r2


Not all instructions require anything more than the instruction mnemonic itself, such as nop and halt, however the ones that do typically follow the pattern above, as ordered in the below operands section.

Mnemonics

The first part of any typical instruction is the mnemonic, i.e. the identifier of the instruction being used. Some of these map directly to the respective bytecode operations, others choose the operation from context such as the move instruction selecting move for register-to-register operations, store for register-to-memory operations, and load for memory-to-register operations.

Here is a non-exhaustive list of some instructions and the actual opcode they resolve to:

mov r0, r1       ; `mov r0, r1` - move r1 into r0
mov r3, [r2]     ; `load r3, [r2]` - load the memory value at [r2] into r3
mov [r5], r1     ; `store [r5], r1` - store the value of r1 into memory at [r5]
sub.w r0, r0, r1 ; `sub16 r0, r1` - 16-bit Subtract. r0 = r0 - r1
sub.b r4, c2, r4 ; `rsub8 r4, c2` - 8-bit Reciprocal subtract. r4 = c2 - r4

Size Specifiers

The math and bit shift operations both require size specifiers. Because there are separate math and bit shift instructions for bytes and for words, we need to specify which data type to operate on. For more information on how the size of each operation affects the output, see the appropriate pages for details.

Operands

Most instructions take operands which specify which registers or data on which the operation takes place.

The operand order is, as described earlier, the Intel operand order, similar to NASM. This means that pointers are bracketed, such as [r0.x], and, pertinent to this section, the order of the operands is typically destination, source. If you want to move a value c2 into register r0, you use the order:

mov r0, c2

This is because r0 is the destination, and c2 is the source. An easy way to remember this may be thinking of the order as being similar to writing out a math equation: $r_{0} = c_{2}$

Some instructions take three operands, such as the math instructions. The three operands in this case are in the order dest, lhs, rhs, where lhs and rhs are the left- and right-hand sides of the math equation, and the destination is the register the arithmetic is performed on and therefore the register the value will be stored in.

Literals

Literal values can be either decimal or hex. Decimal values do not require any special prefix or enclosing symbols, and as such can be written 12 or 99 or similar.

Hex values must be prefixed with $ so that they are not ambiguous and do not conflict with the constant register identifiers: c0 is a register, while $c0 is a hex value.

Any instruction that takes a literal value (such as the bit shift instructions) will accept either kind of literal.

ror.w r0, $a
rol.b r1, 2
asl.w r5, $4
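
The two literal forms can be told apart by the `$` prefix alone. The Python below is a sketch of that rule, not the assembler's actual parser; the function name is hypothetical and range checking is left out.

```python
def parse_literal(token):
    """Parse a literal: `$` prefix means hex, otherwise decimal."""
    token = token.strip()
    if token.startswith("$"):
        return int(token[1:], 16)
    return int(token, 10)

print(parse_literal("$a"), parse_literal("2"), parse_literal("$4"))
```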

Inline Values

Sometimes it might be necessary to include raw values arbitrarily within the code memory. This can be achieved with the inline value syntax, which is simply a ! followed by the hex value to be placed at that address. The hex value must not exceed 0xffff.

; An alternative way to store a literal value from code memory.
; This is equivalent to using the macro `set r0, $f000`
mov r0.x, [ri.x+]
!f000

¹ The !write command starts at address 0x0000, but given enough values it can overflow into the !code command's region, which starts at 0x0040. This means that if you also write starting values to all the registers, your write command can continue into the code region, potentially making a separate !code command unnecessary.

Instructions

Most of the architecture's opcodes have corresponding assembly instructions, with some notable exceptions and additions.

System

The CPU opcodes for the system instructions halt and sleep are both implemented, and an additional mnemonic for the 0-length sleep nop is also available.

halt
sleep 12
nop

Word Select

The word select instructions map to the native opcodes. All word select operations work on a single arbitrary source word, and the destination .X word, and can be used as follows:

wmove r0, r1.y  ; Copy `r1.y` to `r0.x`
swap r0, r1.x   ; Swap `r0.x` and `r1.x`
wadd r1, c0.z   ; Add `c0.z` to `r1.x`
wsub r4, r1.w   ; Subtract `r1.w` from `r4.x`

Move

The assembler combines the move, load, and store opcodes all into the move instruction which picks the appropriate opcode from the provided operands.

If a move is performed between two registers, then the opcode used will be move as normal. However if either the source or destination operands are a pointer, the opcode will instead be either load or store respectively.

For example:

mov r0, r1   ; move
mov r1, [c2] ; load
mov [r5], r0 ; store

For further detail on the move instruction refer to the move instruction section.
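
The selection rule above can be sketched as a small function. This is an illustration only, with hypothetical names; it treats any bracketed operand as a pointer and rejects memory-to-memory moves, which is an assumption not covered by the text.

```python
def is_pointer(operand):
    """True if the operand is written in bracketed pointer syntax, such as [r5]."""
    operand = operand.strip()
    return operand.startswith("[") and operand.endswith("]")

def select_opcode(dst, src):
    """Pick move, load or store from the shape of the two operands."""
    if is_pointer(dst) and is_pointer(src):
        raise ValueError("memory-to-memory moves are not modelled here")
    if is_pointer(src):
        return "load"
    if is_pointer(dst):
        return "store"
    return "move"

for dst, src in (("r0", "r1"), ("r1", "[c2]"), ("[r5]", "r0")):
    print(f"mov {dst}, {src} -> {select_opcode(dst, src)}")
```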

Swizzle

The swizzle instruction maps pretty directly to the native opcodes using the syntax:

swizzle r0.xxxx
swizzle r4.zyzx
swizzle r1.yzwx
swizzle ri.xxyz

The swizzled words can be in any order, but must specify exactly four words.

Math

The math instructions support both sizes of all native math opcodes, however instead of extra mnemonics a size specifier .b or .w is used.

Additionally, the reciprocal variants are merged with the normal variants, and you specify the order of the operands yourself. This means that math instructions take three operands rather than two, with the caveat that the destination operand must also be one of the left- or right-hand-side operands.

;  size  dst  lhs  rhs
sub.w    r4,  r4,  c0 ;  `sub16 r4, c0`
sub.w    r4,  c0,  r4 ; `rsub16 r4, c0`

Note that the order of c0 and r4 is swapped.
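
The operand-order rule can be modelled with a small Python sketch. The helper `native_math` is hypothetical; the `sub16` and `rsub16` spellings come from the comments above, while using an `8` suffix for the `.b` size is an assumption.

```python
def native_math(op: str, size: str, dst: str, lhs: str, rhs: str) -> str:
    """Pick the normal or reciprocal native form for `op.size dst, lhs, rhs`."""
    bits = {"b": "8", "w": "16"}[size]  # assumed suffixes; only 16 appears in the text
    if dst == lhs:
        # dst = dst op rhs: the normal variant
        return f"{op}{bits} {dst}, {rhs}"
    if dst == rhs:
        # dst = lhs op dst: the reciprocal variant
        return f"r{op}{bits} {dst}, {lhs}"
    raise ValueError("destination must also be the left- or right-hand operand")


print(native_math("sub", "w", "r4", "r4", "c0"))
print(native_math("sub", "w", "r4", "c0", "r4"))
```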

Shift

Like the Math instructions, the shift instructions take a size specifier, since the native opcodes come in both 8-bit and 16-bit variants.

The shift instructions can also take either literals or registers for the shift amount.

rol.w r0, 3
asr.b r4, c0
asl.w r2, $a

Bitwise

The bitwise instructions map straightforwardly to their native opcodes.

and r0, r1
xor r4, c0
all r5

Special

Wave2 has some special extended instructions that map to the native opcodes for doing things like horizontal add, 32-bit multiplication, etc.

mul r0, c0
wadd r4, r0

System

System executes special instructions based on extra bits:
0x0000=> Halt the core
0x__10=> Sleep will suspend the core for a certain number of ticks
 0x0010 => SleepNop - any sleep duration of zero acts like a no-op.
 the lower 3 bits of the destination field control the sleep duration:
 0x0n10 => source field used as a number of ticks (n = 0 to 15)
 0x1s10 => lowest byte of source register as number of ticks
 0x2s10 => high byte of the lowest word of source register as number of ticks
 0x3s10 => lowest word of source register as number of ticks
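
The sleep patterns listed above can be reproduced with a short Python sketch. The names `SLEEP_SOURCES` and `encode_sleep` are hypothetical, and the value passed for a register source is whatever register number the CPU uses, which this sketch does not assume.

```python
# Top nibble of the instruction word, taken from the 0x0n10 .. 0x3s10 patterns.
SLEEP_SOURCES = {
    "literal": 0x0,    # 0x0n10: the source field is the tick count itself
    "low_byte": 0x1,   # 0x1s10: lowest byte of the source register
    "high_byte": 0x2,  # 0x2s10: high byte of the lowest word
    "low_word": 0x3,   # 0x3s10: lowest word of the source register
}

HALT = 0x0000


def encode_sleep(kind: str, source_field: int) -> int:
    """Build a sleep instruction word from a duration kind and a source nibble."""
    if not 0 <= source_field <= 0xF:
        raise ValueError("the source field is a single nibble (0 to 15)")
    return (SLEEP_SOURCES[kind] << 12) | (source_field << 8) | 0x0010


NOP = encode_sleep("literal", 0)  # a zero-tick sleep acts as a no-op
print(f"nop = 0x{NOP:04x}, sleep 12 = 0x{encode_sleep('literal', 12):04x}")
```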

Halt

The halt instruction maps to the halt opcode directly.

The following halt mnemonics are available:

hlt
halt

Sleep

The sleep instruction takes a mode specifier, and a single operand for the number of ticks to sleep.

The syntax is as follows:

Mnemonic

Sleep has the following mnemonic forms:

slp
sleep

Mode

Sleep optionally takes a mode specifier, in one of the following forms:
.h takes the high byte of the source register's low word (.x) as the tick count.
.l takes the low byte of the source register as the tick count.
.w takes the low word (.x) of the source register as the tick count.

Duration

If no mode is specified, the tick count is taken as a literal in the range of 0 to 15 inclusive.

If a mode is specified, then sleep takes a register as a source for the tick count.

Examples

slp $f
sleep.w r0
slp.h c4

Nop

The sleep opcode has a special case where sleeping for zero ticks is equivalent to a nop, which is provided as an instruction for convenience.

Nop has one mnemonic, and takes no operands:

nop

Word Select

The word select instructions allow some limited non-SIMD operations to be performed at the word level between the x word of the destination register, and an arbitrary word from a source register.

Sometimes it may be necessary to perform simple operations only on a single word without having to sanitise the entire source vector, or copying to a temporary register and then copying the result back.

For example, suppose you want to add a specific word from the constant registers to the program counter ri.x. You could copy the entire ri vector to a temporary register, perform the arithmetic there with the constant register, and then do a 1-word move back to ri. Alternatively, you can use the Word Select instructions to perform the arithmetic more directly.

In such a case as:

.memory ; This is c0.z, we want to add it to ri.x
        ; ↓↓↓↓
dead beef 0010 f00d
.code
mov r0, c0       ; Make a copy of the constant we can manipulate
swizzle r0.zwxy  ; Swizzle the constant so that the value we want is at `r0.x`
add.w ri, ri, r0 ; Add the result to `ri`
                 ; Assuming ri was    `$0052 $0001 $2000 $0000`
                 ; `ri` now contains: `$0062 $f00e $fead $beef`
                 ; `ri.yzw` have been clobbered with the other three
                 ; potentially unrelated words

                 ; Alternatively:

wadd ri, c0.z    ; Adds `c0.z` to `ri.x` without touching `ri.yzw`
                 ; `ri` now contains: `$0062 $0001 $2000 $0000`
                 ; We avoided clobbering the other three words using
                 ; the word select instruction `wadd`.
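
The two outcomes in the listing above can be checked with a small Python model. It treats each register as a list of four 16-bit words in x, y, z, w order; the helper names are hypothetical, and the sketch ignores the program counter advancing while the instructions execute.

```python
MASK = 0xFFFF  # words are 16 bits wide


def simd_add(a, b):
    """Add all four words pairwise, as a SIMD add does."""
    return [(x + y) & MASK for x, y in zip(a, b)]


def wadd(dst, src, word):
    """Add one selected word of src to dst.x only, leaving dst.yzw untouched."""
    result = list(dst)
    result[0] = (dst[0] + src["xyzw".index(word)]) & MASK
    return result


c0 = [0xDEAD, 0xBEEF, 0x0010, 0xF00D]
ri = [0x0052, 0x0001, 0x2000, 0x0000]

r0 = [c0[2], c0[3], c0[0], c0[1]]  # effect of `swizzle r0.zwxy` on a copy of c0
clobbered = simd_add(ri, r0)       # the SIMD route
precise = wadd(ri, c0, "z")        # the word select route

print(" ".join(f"${w:04x}" for w in clobbered))
print(" ".join(f"${w:04x}" for w in precise))
```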

Word Select Modes

The Word Select instruction has the following modes with the specified mnemonics available:

Mnemonics	Operation	Description
`wmov` `wmove`	Word Move	Similar to the `move` instruction, this copies the data from the source's selected word into the destination's `x` word.
`wswap`	Word Swap	Swap exchanges the contents of the specified source word and the destination's `x` word.
`wadd`	Word Add	Add performs addition between the specified source word and the destination's `x` word, storing the result in the destination's `x` word.
`wsub`	Word Subtract	Subtract performs subtraction between the specified source word and the destination's `x` word, storing the result in the latter.

Destination

The destination operand takes a general purpose writable register; its x word is the one that will be written to.

Source

The source operand takes any register; the word specified by the swizzle selector is the one that will be read.

If the operation is swap then the source register must also be writable.

Swizzle

The source operand also requires a word to be selected using the swizzle syntax.

A period . followed by a single x y z or w is used to specify the word to be selected.

Examples

wmove ri, c0.z
wadd r0, c1.y
wsub r4, r0.w
wswap r3, r1.z
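
The four modes can be summarised as a Python sketch operating on four-word lists. The function `word_select` is hypothetical, and wrapping results to 16 bits on overflow or underflow is an assumption, as the text does not state it.

```python
def word_select(op, dst, src, word):
    """Apply one word select mode; returns the new (dst, src) vectors."""
    i = "xyzw".index(word)
    dst, src = list(dst), list(src)
    if op == "wmov":
        dst[0] = src[i]
    elif op == "wswap":
        dst[0], src[i] = src[i], dst[0]     # this is why src must be writable
    elif op == "wadd":
        dst[0] = (dst[0] + src[i]) & 0xFFFF  # assumed 16-bit wraparound
    elif op == "wsub":
        dst[0] = (dst[0] - src[i]) & 0xFFFF  # dst.x minus the selected word
    else:
        raise ValueError(f"unknown word select mode: {op}")
    return dst, src


print(word_select("wswap", [1, 2, 3, 4], [10, 20, 30, 40], "z"))
```

Only the destination's x word and, for wswap, the selected source word ever change; every other word is passed through untouched.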

Move

Copying data between registers and memory is done using the move instructions.

The following two equivalent mnemonics for move are available:

mov
move

The assembler combines the move, load, and store opcodes into a single instruction move, picking the appropriate opcode given the provided operands.

Standard Move

When provided with two registers as operands, the move instruction is translated to the move opcode as normal. This results in the data being copied from the source to the destination registers directly.

Store

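If the destination operand is a pointer, the move instruction is translated to the store opcode, which copies the source register into memory at the address held in the destination.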

Load

Skip

There is a special case when loading into the c0 constant register: the load acts as a no-op that only advances the program counter by the number of words loaded.

The skip mnemonics are as follows:

skip, skip1
skip2
skip3
skip4

Each numbered skip instruction advances the program counter by the number in its name; skip is the same as skip1 and advances the PC by one.
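
As a hedged illustration, the skip words can be rebuilt from the "extra" bit layout given in the Load/Store opcode reference. The nibble order (destination, source, extra, opcode) and the use of 0xF for `ri` are read off the listed encodings; the function names are hypothetical.

```python
OPCODE_LOAD = 0x6
WORD_COUNT_BITS = {4: 0x0, 3: 0x4, 2: 0x8, 1: 0xC}  # upper 2 bits of extra
MODE_BITS = {"load": 0x0, "load_inc": 0x1, "gather": 0x2, "gather_inc": 0x3}
RI = 0xF  # source nibble seen in the LoadInl and Skip encodings
C0 = 0x0  # destination nibble seen in the Skip encodings


def encode_load(dst: int, src: int, words: int, mode: str) -> int:
    """Pack a Load instruction word as dst, src, extra, opcode nibbles."""
    extra = WORD_COUNT_BITS[words] | MODE_BITS[mode]
    return (dst << 12) | (src << 8) | (extra << 4) | OPCODE_LOAD


def encode_skip(words: int) -> int:
    """A skip is a LoadInc through ri whose destination is c0."""
    return encode_load(C0, RI, words, "load_inc")


for n in (1, 2, 3, 4):
    print(f"skip{n} = 0x{encode_skip(n):04x}")
```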

Specifying Words

Scatter

Gather

(0x4) Move copies source register words to destination register
 -> extra bits set to one will not copy the respective word (performs a "Mix" operation)
(0x5) Swizzle re-arranges or copies the destination register words according to bits in "extra" and "source"
 -> given register [X,Y,Z,W] source: 0bWWZZ, extra: 0bYYXX
 -> every two bit index specifies which source words to swizzle from.
(0x6) Load and (0x7) Store use the word(s) in the source register as an address
 -> (0x6) Load copies memory into the destination register
 -> (0x7) Store copies destination register into memory
 -> "extra" bits used as modifiers:
 -> upper 2 bits of extra, specify the number of words to load or store
    (words are always accessed starting at X):
    0x0 => 4 words, dst: XYZW
    0x4 => 3 words, dst: XYZ
    0x8 => 2 words, dst: XY
    0xC => 1 word,  dst: X
 -> lower 2 bits of extra, specify the mode:
    sequential modes:
      access words sequentially,
      word X from source as address,
      increments the access address after each load/store
     0x0 -> Load    / Store    - source register unchanged
     0x1 -> LoadInc / StoreInc - updates source register with address after last access
    scatter/gather modes:
      respective XYZW words in source used as address for
      the matching XYZW values to/from destination register,
      increments each access address by 1 after the load/store
     0x2 -> Gather    / Scatter    - source register unchanged
     0x3 -> GatherInc / ScatterInc - updates source register with accessed elements incremented
useful special Load/Store encodings:
LoadInl1 R_.x   , #n       => 0x_fd6 0xNNNN
LoadInl2 R_.xy  , #n,n     => 0x_f96 0xNNNN 0xNNNN
LoadInl3 R_.xyz , #n,n,n   => 0x_f56 0xNNNN 0xNNNN 0xNNNN
LoadInl4 R_.xyzw, #n,n,n,n => 0x_f16 0xNNNN 0xNNNN 0xNNNN 0xNNNN

Skip1 => 0x0fd6 // skip the next instruction or word of data
Skip2 => 0x0f96 // skip the next 2 instructions or words of data
Skip3 => 0x0f56 // skip the next 3 instructions or words of data
Skip4 => 0x0f16 // skip the next 4 instructions or words of data

Skip encodes C0 as the destination. Only C0 should be used;
other C registers as destinations are reserved for future use
and may have undesired side effects.

The "#" is the vector size in number of bits, which can be 8 or 16.
Each group of vector-size bits is operated on independently of the other vector bits;
operations run in parallel across all vector bits between the registers.
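
The four Load modes described above (Load, LoadInc, Gather, GatherInc) can be modelled in Python as follows. This is a sketch under stated assumptions: memory is a plain dict of word addresses, "address after last access" is taken to mean the last accessed address plus one, and the function name `load` is hypothetical.

```python
def load(memory, dst, src, words, mode):
    """Model one Load-family instruction; returns the new (dst, src) vectors."""
    dst, src = list(dst), list(src)
    if mode in ("load", "load_inc"):
        address = src[0]                   # word X of the source is the address
        for i in range(words):             # words are always filled starting at X
            dst[i] = memory[address & 0xFFFF]
            address += 1
        if mode == "load_inc":
            src[0] = address & 0xFFFF      # assumed: one past the last access
    elif mode in ("gather", "gather_inc"):
        for i in range(words):             # each source word is its own address
            dst[i] = memory[src[i]]
            if mode == "gather_inc":
                src[i] = (src[i] + 1) & 0xFFFF
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dst, src


memory = {0x40: 0x1111, 0x41: 0x2222, 0x42: 0x3333, 0x50: 0xAAAA}
print(load(memory, [0, 0, 0, 0], [0x40, 0, 0, 0], 3, "load_inc"))
print(load(memory, [0, 0, 0, 0], [0x50, 0x40, 0, 0], 2, "gather_inc"))
```

Store and Scatter would mirror this with the copy direction reversed.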

Swizzle

Swizzle rearranges or copies the words of the destination register according to the provided four-word swizzle.

Mnemonics

Swizzle has the following synonymous mnemonics:

swi
swizzle

Destination

The swizzle instruction expects a destination register on which the swizzle is performed. The destination must be a general purpose writable register.

Swizzle

The actual swizzle operation to be performed is specified after the destination register's identifier.

This swizzle comes in the form of four letters identifying the source words to be copied into the destination in the order x y z and w.

For example, since this swizzle places the xyzw words into the same order, this is a no-op:

swizzle r0.xyzw

Whereas the following swaps the values of the x and y words:

swizzle r0.yxzw

Swizzling is also not limited to swapping, and you can copy values to multiple destinations. For example, if we instead wanted to place a copy of x into the y word and leave the original x intact:

swizzle r0.xxzw

Swizzling can be useful for building small stack-like structures, for copying data within a vector to be used in another SIMD instruction, or even for rearranging the words to provide access to a register's more significant words to an instruction that can only access the x word of a register.
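
The three examples above can be reproduced with a minimal Python model of the operation. The function name `swizzle` and the representation of a register as a four-element list in x, y, z, w order are illustrative choices, not part of the assembler.

```python
def swizzle(vec, selector):
    """Return a new vector whose x, y, z, w words are picked by `selector`."""
    if len(selector) != 4:
        raise ValueError("a swizzle must specify exactly four words")
    # Every letter names the word of the ORIGINAL vector to copy from.
    return [vec["xyzw".index(letter)] for letter in selector]


r0 = [0x000A, 0x000B, 0x000C, 0x000D]
print(swizzle(r0, "xyzw"))  # identity
print(swizzle(r0, "yxzw"))  # x and y exchanged
print(swizzle(r0, "xxzw"))  # x duplicated into y
```

Because all four reads come from the original vector, a swap such as `yxzw` needs no temporary.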

Macros

The assembler has some useful macros that make writing programs more ergonomic by abstracting away things like conditional jumps or inline literals.

The following macros are currently supported:

Jump - Write to the program counter ergonomically
Set - Assign labels or literals directly to registers
Labels - Store named positions within the program

Set

Rationale

The set macro allows you to ergonomically store values directly into registers.

In simple programs you will just store your values in the constant registers and use them as normal. An example of this might be:

.memory
0000 1111 2222 3333
4444 5555 6666 7777
.code
mov r0, c0 ; Set r0 to [0000, 1111, 2222, 3333]
wmov r1, c1.z ; Set r1.x to 6666

However there are situations where you may need more values than can fit into the constant registers.

One way to solve this problem might be to store a larger set of values in other memory regions, or to construct a series of programs that load chunks of data elsewhere before loading the actual program.

Alternatively, the CPU has a convenient trick we can use for loading values into registers directly from the program memory.

The Program Counter

Since the program counter holds the address of the next instruction to be executed, we can use it as a pointer and load that next instruction's word as a literal value. For example:

mov r0.x, [ri.x]
nop

Since the nop instruction has the literal value 0x0010, this will load the value 0x0010 into r0.x and then execute the nop instruction. However this limits us to values that are valid instructions, and also has the side effect of executing instructions we may not want executed.

Luckily we have a trick we can leverage to avoid this problem, and that is the increment load/store instructions. When using a value as a pointer, we can increment that value after using it. In this example we can avoid executing the nop by incrementing the program counter after we use it as a pointer:

mov r0.x, [ri.x+] ; The + here means ri.x will be incremented after reading
nop ; This will now be skipped

This now means that the value in this location does not need to be a valid instruction.

We can use the immediate value syntax to store arbitrary data in program memory:

mov r0.x, [ri.x+]
!b0fa
; Now r0.x = 0xb0fa

The set macro can make this more ergonomic. The following compiles to the same bytecode:

set r0, $b0fa

Additionally we can take advantage of the multi-word form of the same increment load to similar effect:

mov r0.xyz, [ri.x+]
!1111
!2222
!3333

; Equivalent to:

set3 r0, $1111, $2222, $3333

Labels as Data

In addition to setting immediate values, we can also take advantage of the assembler's preprocessor to use the addresses of labels as values at runtime.

set r4, :some_label ; r4.x now contains the address of :some_label

This can be useful for adding offsets to jump locations, or even rewriting jumps when copying code into a new memory location (such as when copying a program into shared memory) among other uses.

Jump

The Wave2 architecture does not normally support any type of flow control other than mutating the program counter directly.

This means that in order to perform jumps or conditional execution, we need to perform arithmetic with the program counter.

The simplest method is to move values directly into the PC. A typical example might be to jump back to the start of private memory to cause your program to loop.

.memory ;0x0000
039c 0040
.code ;0x0040
mov r0, [c0.x]
notdst r0
mov [c0.x], r0
wmov ri, c0.y

On the final line we use the Word Select move instruction to move the second word of c0 into ri.x, the program counter.

Wave2Assembly provides some helpful macros for modifying the program counter more ergonomically.

We can use labels to mark locations to jump to more easily:

:loop

; loop code goes here

jmp :loop

Additionally we can give jump the value of a register to jump to the address stored therein:

jmp r4

It's also possible to combine these two by using the set macro to place a labelled address into a register, and then perform arithmetic on that before jumping to that address. This would enable offsetting your jump conditionally, such as for selecting one of several subroutines, or implementing function lookup tables, jump lists, etc.

Labels

Labels are a way to store the position/address of a location in program memory for jumping or referring to elsewhere without having to use offsets or manually count instructions or addresses.

Labels are declared using a colon : character followed by letters, numbers, or underscores.

; Declare the :start label to jump back to later.
:start
swizzle r0.yzwx
mov r1.x, [ri.x+]

...

; Use the label defined earlier to jump back to the start of the program.
jmp :start

Unlike other macros, labels perform a post-processing step on the program after the rest of the instructions have been generated, in order to track the correct memory locations and inject them into the necessary places.
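
A post-processing step of this kind is commonly implemented as two passes, and the Python below is a minimal sketch of that idea rather than the assembler's real implementation. The item format (integers for finished words, `":name"` strings for declarations, `("ref", ":name")` tuples for uses) and the function name `resolve_labels` are assumptions; the base address 0x0040 matches the start of the code region.

```python
def resolve_labels(items, base=0x0040):
    """Resolve label references; returns (words, label_addresses)."""
    addresses = {}
    address = base
    for item in items:  # pass 1: record where each label lands
        if isinstance(item, str) and item.startswith(":"):
            addresses[item] = address  # a declaration occupies no memory
        else:
            address += 1               # everything else is one word
    words = []
    for item in items:  # pass 2: inject the recorded addresses
        if isinstance(item, str):
            continue
        if isinstance(item, tuple):
            words.append(addresses[item[1]])
        else:
            words.append(item)
    return words, addresses


program = [":start", 0x0010, ("ref", ":end"), 0x0010, ":end", 0x0000]
words, labels = resolve_labels(program)
print([f"0x{w:04x}" for w in words], labels)
```

Running the first pass to completion before patching is what allows a reference to a label declared later in the program, such as `:end` here.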

Usage

Labels can be used in jump instructions as shown above, and can also be used in set instructions to assign the value of a label to a register, such as:

:begin
sub.w r0, r0, r0
set r1, :begin
set r2, $0010 ; nop
mov [r1.x], r2
jmp :begin

This program overwrites the instruction sub.w r0, r0, r0 with a nop. It sets r1 to the address of the label :begin, which is the memory address of the subtract instruction, and then uses r1.x as a pointer to write the value 0x0010, stored in r2, to that address.
