Jump to content

Fast Math in assembly language


Recommended Posts

When programming in assembly language, quite often you will need to do calculations that would normally require floating point math.  Those subroutines are available, the same ones that BASIC uses, but they are very slow.  And if you're programming in assembly, you're doing so because it's much faster than BASIC, so using those routines can defeat the purpose, particularly if you need to do a lot of such calculations.

Ideally in assembly language you want to have most of your variables consisting of single bytes, not the four used for floating point.  It's having to move all those bytes around and doing calculations involving all of them that makes the floating point subroutines slow.  And if you're going for speed, you can make a tradeoff: increased speed at the expense of accuracy.  A function involving a single byte can just be a lookup table.

Over the last several years I have developed a number of lookup tables and subroutines that enable some very fast math on the 6502 and now 65c02.  Inspired by @svenvandevelde's question about ATAN2, I decided to compile many of them into a single 8kb file that can be loaded into banked RAM.  I have other such routines and lookup tables, but I decided to just include those that would most likely be useful to the broadest range of applications.  These lookup tables and functions will work on all revisions of the X16 emulator, and indeed will work on any 65c02 system as long as the file is loaded at $A000.  The functions are all called through JMP redirects in page BF, and those JMP table locations will not change if I do revisions on the code.  I'm pretty sure I killed all the bugs, but some might have slipped through.

I'll be using further posts in this thread as sort of a user guide, but attached to this post is the file FASTMATH.BIN (the lookup tables and functions) and a text file containing the BASIC program that made all of the lookup tables (but don't copy and paste it into the emulator or it will overwrite FASTMATH.BIN).  When using FASTMATH.BIN, you must use a relocated load to put it into banked RAM, otherwise it will load into memory at $7000 and won't work.  Or you could load it into its default location at $7000 and just copy it up into the RAM bank of your choice.

This software is released under an MIT license, meaning you can use it freely for any purpose, commercial or non-commercial, as long as you leave the MIT license in there (it's located at $BF7F).  And please, these functions are all just approximations accurate to about 1%, so don't use the software for controlling a nuclear power plant or medical equipment.

maketables.txt FASTMATH.BIN

  • Like 4
  • Thanks 2
Link to comment
Share on other sites

Posted (edited)

A single byte can represent either an integer in the range 0-255 or a signed integer in the range -128 to 127; the difference is in how we think of what bit 7 represents.  If bit 7 is equivalent to 128, it's an unsigned integer, and if it's equivalent to -128 it's a signed integer.   The signed integer representation is sometimes called Two's Complement notation.  

In two's complement notation, a number can be negated simply by only two lines of code:

EOR #$FF

INC A

So -127 is just 7F EOR FF plus 1 = $81, and -1 is $FF.

FASTMATH uses this two's complement notation, with a couple of twists.  First, $80 is considered an invalid number.  Only the range from -127 to +127 is used.  These numbers are considered the numerator of a fraction, with 127 as the denominator, so -127..127 represents the range [-1.0 .. +1.0].  It isn't quite a fixed point notation; it's a fractional notation.  Only the numbers in the range [-1.0 .. 1.0] have any meaning to FASTMATH functions.

The exception to this is trigonometric functions; an angle is not treated the same way as other numbers.  For angles, we divide the circle up into 256 "binary degrees" or "bigrees".  So for angles, $80 bigrees = pi radians = 180 degrees. So to add 180 degrees to an angle, just EOR #$80.

There are 26 lookup tables in FASTMATH that are potentially useful.  For each of these, you can put the input in the X (or Y) register and just read the output:  LDA HALF,X will return half the value of X in the accumulator, for instance.

All of these lookup tables are listed as DATA statements in the text file posted above, with REM statements describing each one, so if you just want to use a few of these tables in your own code you can just extract what you need.

Be sure to set the RAM bank pointer to whatever bank holds FASTMATH when using these tables.

name  address  description

MBY2    A000    multiply by 2, truncated at 7F and 81
HALF    A100    multiply by 64/127
MLT5    A200    multiply by 32/127
MLT4    A300    multiply by 16/127
MLT3    A400    multiply by 8/127
MLT2    A500    multiply by 4/127
MLT1    A600    multiply by 2/127
MLT0    A700    multiply by 1/127
FABS    A800    absolute value
DIV6    A900    127*64/index; 00 if absolute value of index is less than 64; $80 if index is zero
DIV5    AA00    127*32/index; 00 if absolute value of index is less than 32; $80 if index is zero
DIV4    AB00    127*16/index; 00 if absolute value of index is less than 16; $80 if index is zero
DIV3    AC00    127*8/index; 00 if absolute value of index is less than 8; $80 if index is zero
DIV2    AD00    127*4/index; 00 if absolute value of index is less than 4; $80 if index is zero
DIV1    AE00    127*2/index; 00 if absolute value of index is 1; $80 if index is zero
DIV0    AF00    127/index; $80 if index is zero
FSIN    B000    sine of index
FCOS    B100    cosine of index
SQRT    B200    square root of absolute value of index
SQAR    B300    square of index
SQA3    B400    2/3 of square of index
ML3F    B500    multiply by 63/127
M2/3    B600    multiply by 2/3
M1/3    B700    multiply by 1/3
FACO    B800    arccosine
FLAT    B900    arctangent of slopes from -1 to 1

The final two lookup tables aren't directly useful to the user; they're just included here for completeness.

NOR2    BA00    used for 2d normalization
NOR3    BA80    used for 3d normalization

Edited by Ed Minchau
  • Thanks 1
Link to comment
Share on other sites

Posted (edited)

There are 36 functions included in FASTMATH.  I'll just list them here for now, and will go into more detail (and show the code) in subsequent posts.

These functions are all given a 4-character label, but you can label them whatever you want in your own code.  The JMP redirect adds three precious cycles to each subroutine, but it's worth it to keep your code working even if I revise this file later.  The time in cycles (including the JMP redirect and RTS, but not your JSR call) for each subroutine is also listed ; variable time subroutines are listed with a mean time in brackets.  Some of those are so short that the call and redirect is a significant fraction of the total time, so for those you may want to simply copy them inline in your code;  the code for the really short ones will be listed in subsequent posts when I go into more detail.

It is assumed that if you're using these functions you're not also using BASIC, so these functions use zero page locations F0 to FF as work space.

Be sure to set the RAM bank pointer to whatever bank holds FASTMATH when using these functions.

    name    addr    time    description
ARITHMETIC (SCALAR) FUNCTIONS                
    ADDT    BF99    (25)    add values in X and Y, truncate on overflow
    SUBT    BF9C    (28)    subtract X – Y, truncate on overflow
    FAVG    BF9F    19      X(64/127) + Y(63/127) {approximately X / 2 + Y / 2}
    FHDF    BFA2    19      X(64/127) – Y(63/127) {approximately X / 2 – Y / 2}
    FCMP    BFA5    23      compare X to Y
    FMIN    BFA8    (39)    minimum of X and Y
    FMAX    BFAB    (39)    maximum of X and Y
    FMLT    BFAE    47      multiply X by Y, faster version
    MULT    BFB1    (87)    multiply X by Y, more accurate version
    FDIV    BFB4    (108)   divide X by Y
    FDV2    BFB7    (88)    divide X by Y, no error checking
                
SINGLE VECTOR FUNCTIONS            
negation                
    NEXY    BFC9    25      [X,Y] = [-X,-Y]
    NEG2    BFCC    33      negate 2d vector pointed to by X
    NEG3    BFCF    45      negate 3d vector pointed to by X
doubling                
    VDXY    BFD2    21      [X,Y] = 2[X,Y]
    VD2D    BFD5    37      double the 2d vector pointed to by X
    VD3D    BFD8    51      double the 3d vector pointed to by X
multiply vector by constant                
    VML2    BF93    (220)   multiply 2d vector (pointed to by X) by the value in Y
    VML3    BF96    (326)   multiply 3d vector (pointed to by X) by the value in Y
length of a vector                
    LEXY    BFC0    34      length of vector [X,Y] divided by sqrt(2)
    LEN2    BFC3    48      length of 2d vector pointed to by X divided by sqrt(2)
    LEN3    BFC6    63      length of 3d vector pointed to by X divided by sqrt(3)
trignometry                
    RTXY    BFBA    (212)   convert radius X angle Y to coordinates (X,Y)
    ATN2    BFBD    (185)   convert coordinates (X,Y) to an angle
normalization to length 1.0                
    NOXY    BFF3    (292)   normalize [X,Y] to unit length
    NRM2    BFF6    (340)   normalize 2d vector pointed to by X to unit length
    NRM3    BFF9    (495)   normalize 3d vector pointed to by X to unit length
                
                
TWO-VECTOR FUNCTIONS            
addition and subtraction                
    VAD2    BFDB    77      add two 2d vectors pointed to by X and Y (divided by 2)
    VAD3    BFDE    111     add two 3d vectors pointed to by X and Y (divided by 2)
    VSB2    BFE1    77      subtract two 2d vectors pointed to by X and Y (divided by 2)
    VSB3    BFE4    111     subtract two 3d vectors pointed to by X and Y (divided by 2)
dot product                
    DOT2    BFE7    (163)   dot product / 2 of two 2d vectors pointed to by X and Y
    DOT3    BFEA    (263)   dot product / 3 of two 3d vectors pointed to by X and Y
angle between two vectors                
    ANG2    BFED    (492)   angle between two 2d vectors pointed to by X and Y
    ANG3    BFF0    (619)   angle between two 3d vectors pointed to by X and Y
cross product                
    CROS    BFFC    (486)   cross product / 2 of two 3d vectors pointed to by X and Y
Edited by Ed Minchau
  • Thanks 1
Link to comment
Share on other sites

On 5/13/2022 at 1:43 PM, rje said:

Do you suppose this might be a candidate for one of the spare ROM banks?

 

If so I would add about 4kb more in tables and code and the remaining 4kb would echo the Kernel redirects and JSRFAR etc from bank 4.

  • Like 2
Link to comment
Share on other sites

Posted (edited)

It just seems too useful, and there seems to be room for it.  I could use it instead of using my own half-baked routines; it could shrink my codebase, which sometimes is a good thing.

 

 

Edited by rje
  • Like 1
Link to comment
Share on other sites

Posted (edited)
On 5/13/2022 at 1:49 PM, rje said:

It just seems too useful, and there appears to be room.

 

Well, if a bunch of people actually do start using it, then maybe the team will consider it.  In the meantime I'll just keep posting more documentation below.  I need a few other people to try it anyway just in case I missed a bug somewhere.  

I made up the lookup tables on a spreadsheet which also generated the text file above, but I wrote all the code in the META/L editor, which isn't exactly conducive to sharing on GitHub.  I'm still in the process of revising the editor for rev40/41 (got sidetracked doing fastmath), but when I release that I'll also include the EDITMATH.PRG file I used to write the code, so people can poke around in it and prod it with tiny sticks.  Until then, I'll just be posting short snippets of code here as well, in the standard notation.  Maybe I'll include screenshots from the editor too.

Edited by Ed Minchau
Link to comment
Share on other sites

Posted (edited)

Ed,

This is very kind of you sharing these libraries. I'm sure that there are more people who have build such libraries over the past (30?) years. I went through your list and what really makes it interesting is to jointly discuss some of these algorithms to understand the logic applied.  Like SQRT    B200    square root of absolute value of index

Edited by svenvandevelde
  • Like 1
Link to comment
Share on other sites

Posted (edited)

I can see I need to add titles to that list... done.

In that example, B200 is the address of the start of the square root table. The index into the table is the input, the value stored at (B200 plus the index) is the square root of the index.  If the index is negative,  the value stored at that address is the square root of the absolute value of the index.  i.e.:

Find square root of -1/2:

LDX #$C0;  minus one half

LDA SQRT,X; loads the value stored at B2C0

CPX #$80

This returns $5A (90/127 = 0.70866... close enough to 0.7071) and the carry set indicates i. 

 

Something similar can be done with all of those lookup tables.  All of them except FSIN and FCOS take one of these fractional data types as the index of a 256 byte LUT.  (FSIN and FCOS take an angle as an index.) Simply reading the table offset by X or Y gives the answer, in 4 cycles.  The subroutines use several of these tables at once to do more complex things, but the lookup tables are all available for the user too.

Edited by Ed Minchau
Link to comment
Share on other sites

Posted (edited)

Arithmetic functions: addition and subtraction

We're using fractions in the range of -1 to 1 as inputs to the addition and subtraction functions.  There's a problem with that: the range of the sum (or difference) of two numbers in the range [-1..1] is [-2..2].  There are two ways to handle this problem: either by truncating the result or by scaling.  Both methods are available.  ADDT and SUBT truncate the result on overflow, and FAVG and FHDF divide the result by two.

Note that for all of the functions listed, the tables used are also included.  That way if you only want a few of the functions, you can simply copy the appropriate tables out of the text file above and copy the code listed for each function.

function name  ADDT    
address        BF99    
time (mean)    (25) cycles
description    add values in X and Y, truncate on overflow    
inputs         X, Y    
registers used A   
tables used    none
outputs        A, C, N, Z    
notes          will clip at $7F on overflow high or at $81 on overflow low;
               carry set on overflow; Z set on result 00; N set if result negative

function name  SUBT
address        BF9C
time (mean)    (28) cycles
description    subtract X – Y, truncate on overflow
inputs         X, Y
registers used A
tables used    none
outputs        A, C, N, Z
notes          will clip at $7F on overflow high or at $81 on overflow low;
               carry set on overflow; Z set on result 00; N set if result negative

function name  FAVG
address        BF9F
time (mean)    19 cycles
description    X(64/127) + Y(63/127) {approximately X / 2 + Y / 2}
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        A
notes          average value of X and Y

function name  FHDF
address        BFA2
time (mean)    19 cycles
description    X(64/127) – Y(63/127) {approximately X / 2 – Y / 2}
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        A
notes          half the difference between X and Y

Here's what the code looks like in my editor:

note that rather than trying to parse the parameters to figure out what the command addressing mode is, 
the META/L editor has a unique 4 character code for each command to remove all ambiguity.  
The first three characters are the same as the standard notation, but the fourth is the addressing mode as follows:
	A absolute address
	X absolute address offset by X
	Y absolute address offset by Y
	# immediate
	I implied (also used for Accumulator mode on those commands that also have Absolute addressing mode, like LSR)
	Z zero page indexed by X (i.e. LDA 04,X is LDAZ 04)
	/ zero page indirect
	+ indexed indirect (i.e. LDA (04,X) is LDA+ 04)
	- indirect indexed (i.e. LDA (04),Y is LDA- 04)
	0 zero page
there are some exceptions that just made sense to me, such as CLCF for clear carry flag

arith1.png.b71c1fbafb7ffb35c08e0093b630ee95.png

FAVG and FHDF are so short that if you're trying to squeeze in as many calculations per second as possible, it's better to write them inline with your code rather than calling the subroutine; 6 cycles for JSR, 3 cycles for the JMP redirect, and 6 cycles for the RTS are more cycles than the calculation itself.  My editor uses a nonstandard notation, so in standard notation those are

LDA A100,X
CLC
ADC B500,Y 

and

LDA A100,X
SEC
SBC B500,Y

 

Edited by Ed Minchau
Link to comment
Share on other sites

Posted (edited)

Arithmetic operations: compare, minimum, maximum

Since these numbers are in two's complement notation, simple CMP commands won't properly compare the numbers; negative numbers appear bigger than positive to CMP.  FCMP produces the correct Z, C, and N flags.  If the invalid input of $80 is used, it is considered negative infinity.

function name  FCMP
address        BFA5
time (mean)    23 cycles
description    compare X to Y
inputs         X, Y
registers used A
tables used    none
outputs        Z,N,C
notes          Z set if X = Y, C set if X >= Y, N set if X<Y
	

The minimum and maximum of two numbers  are useful for things like a chess program's minimax algorithm.  The maximum is also equivalent to OR in fuzzy logic.

function name  FMIN
address        BFA8
time (mean)    (39) cycles
description    minimum of X and Y
inputs         X, Y
registers used A
tables used    none
outputs        A
notes          returns the minimum

function name  FMAX
address        BFAB
time (mean)    (39) cycles
description    maximum of X and Y
inputs         X, Y
registers used A
tables used    none
outputs        A
notes          returns the maximum

In the META/L editor:

arith2a.png.1612f079c2738a123828c87ae823f527.png

Edited by Ed Minchau
Link to comment
Share on other sites

Posted (edited)

arithmetic operations: multiplication and division

There are two different multiplication subroutines.  Both of them are fairly accurate.  FMLT is about twice as fast as MULT, but it is slightly less accurate and it changes the values in X and Y.  MULT isn't 100% accurate but all results are within $01 of the correct value, and MULT preserves the values in X and Y.

function name  FMLT
address        BFAE
time (mean)    47 cycles
description    multiply X by Y, faster version
inputs         X, Y
registers used A, X, Y
outputs        A
tables used    HALF, ML3F
notes          error standard deviation = 0.6757

function name  MULT
address        BFB1
time (mean)    (87) cycles
description    multiply X by Y, more accurate version
inputs         X, Y
registers used A
tables used    FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0
outputs        A
notes          error standard deviation = 0.4301

The code for FMLT is displayed below.  It uses inline versions of FAVG and FHDF, taking advantage of the identity

(x/2 + y/2)^2 - (x/2 - y/2)^2

= x^2/4 + 2xy/4 + y^2/4 - x^2/4 + 2xy/4 - y^2/4

= 4xy/4

= xy

arith2b.png.5cdb15459927fc2a28d89f190ea540a3.png

Here's the code for MULT:

arith3.png.2b2ce99fe209b60008e6606d74c5869a.png

MULT just adds fractions if the corresponding bit in the absolute value of X is equal to 1.  The BBR commands just skip that addition if the bit in absolute value of X is zero. That absolute value is taken at the beginning if X is negative, and then bit 7 in the temporary zero page variable is set. At the end of the calculation, if bit 7 is set the result is negated.  The only command not shown on this image is the RTS.

There are two entries to the division subroutine.  FDIV is the first, and does the error checking on the inputs.  FDV2 is a little farther down and skips that error checking step.

function name  FDIV
address        BFB4
time (mean)    (108) cycles
description    divide X by Y
inputs         X, Y
registers used A
tables used    FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0
outputs        A
notes          error standard deviation = 0.4726, returns carry set if Y=$00 or Y=$80 or X=$80 or abs(X)>abs(Y)

function name  FDV2
address        BFB7
time (mean)    (88) cycles
description    divide X by Y, no error checking
inputs         X, Y
registers used A
tables used    FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0
outputs        A
notes          error standard deviation = 0.4726

 Here's the code for FDIV in the META/L editor:

arith4.png.b635a070aa44a54d4f17e48ae29a8635.png

arith5a.png.dfeac48db8357ae6588940675d3d5c98.png

The multiplication and division functions are pretty important to the rest of the subroutines, so I wanted to make certain that they were actually close to correct.  A nice thing about having two-parameter functions that can each only take 256 possible values is that it's easy to calculate all possible results for all possible combinations of inputs on a spreadsheet, and compare the results to an ideal target.  The targets were all done in floating point, and only converted into single byte format at the last step.  I was then able to compare the calculations of these subroutines against the target, and make histograms of the differences between calculation and target.

FMLTerror.png.8616cced20b7b5d0716271bdb1529efa.png

MULTerror.png.74aca181efaf20aeec4370f792343f1d.png

FDIVerror.png.e1e4afb6ab0f351291279c08cd657991.png

There is another division method using logarithms, but it's only about 20 cycles faster than FDIV and the accuracy is pretty bad, so I didn't include the log/antilog tables in the file:

DIvLOGerror.png.4e28650b18bcc3cfbf24d018e35aa888.png

Edited by Ed Minchau
Link to comment
Share on other sites

Vectors are objects with two or more numerical parameters.  These can be passed to the vector subroutines in one of three ways:

- the vector X and Y coordinates can be stored in the X and Y registers

- the vector coordinates can be stored in two consecutive locations in zero page, the first of which is pointed to by X or Y

- the vector coordinates can be stored in three consecutive locations in zero page, the first of which is pointed to by X or Y

 

Single Vector operations: multiplication by -1

function name  NEXY
address        BFC9
time (mean)    25 cycles
description    [X,Y] = [-X,-Y]
inputs         X, Y
registers used A, X, Y
tables used    none
outputs        X, Y
notes          negates the vector in [X,Y]

function name  NEG2
address        BFCC
time (mean)    33 cycles
description    negate 2d vector pointed to by X
inputs         X
registers used A
tables used    none
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector

function name  NEG3
address        BFCF
time (mean)    45 cycles
description    negate 3d vector pointed to by X
inputs         X
registers used A
tables used    none
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector

vectN.png.ed273ade285503303f97cccb42082ba2.png

 

Single Vector operations: multiplication by 2

note that these operations do not check for overflow, they simply look up a doubling amount in the MBY2 table, which clips overflow at 7F or 81

function name  VDXY
address        BFD2
time (mean)    21 cycles
description    [X,Y] = 2[X,Y]
inputs         X, Y
registers used A, X, Y
tables used    MBY2
outputs        X, Y
notes          will clip at $7F on overflow high or at $81 on overflow low

function name  VD2D
address        BFD5
time (mean)    37 cycles
description    double the 2d vector pointed to by X
inputs         X
registers used A, Y
tables used    MBY2
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector;
               will clip at $7F on overflow high or at $81 on overflow low

function name  VD3D
address        BFD8
time (mean)    51 cycles
description    double the 3d vector pointed to by X
inputs         X
registers used A, Y
tables used    MBY2
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector;
               will clip at $7F on overflow high or at $81 on overflow low

vectD.png.744e2071339cb6ad1ab9eda02cf6e456.png

 

Single Vector operations: multiply by a constant

function name  VML2
address        BF93
time (mean)    (220) cycles
description    multiply 2d vector (pointed to by X) by the value in Y
inputs         X, Y
registers used A
tables used    FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector

function name  VML3
address        BF96
time (mean)    (326) cycles
description    multiply 3d vector (pointed to by X) by the value in Y
inputs         X, Y
registers used A
tables used    FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0
outputs        zero page
notes          the input vector in zero page pointed to by X is also the result vector

vectM.png.3a257b8f63e2910d11f4d174d282b0f8.png

 

Edited by Ed Minchau
Link to comment
Share on other sites

Vector Length

function name  LEXY
address        BFC0
time (mean)    34 cycles
description    length of vector [X,Y] divided by sqrt(2)
inputs         X, Y
registers used A
tables used    SQAR, SQRT
outputs        A
notes          length value is divided by sqrt(2) to keep it in the range [0..1]

function name  LEN2
address        BFC3
time (mean)    48 cycles
description    length of 2d vector pointed to by X divided by sqrt(2)
inputs         X
registers used A, Y
tables used    SQAR, SQRT
outputs        A
notes          length value is divided by sqrt(2) to keep it in the range [0..1]

function name  LEN3
address        BFC6
time (mean)    63 cycles
description    length of 3d vector pointed to by X divided by sqrt(3)
inputs         X
registers used A, Y
tables used    SQA3, SQRT
outputs        A
notes          length value is divided by sqrt(3) to keep it in the range [0..1]

vectLxy.png.d4b310b434b78a1c2b3ee386d677f474.png

vectL23.png.41540c7a19f6250831cb60d72d0f0db2.png

Edited by Ed Minchau
Link to comment
Share on other sites

Single vector trigonometric operations:

function name  RTXY
address        BFBA
time (mean)    (212) cycles
description    convert radius X angle Y to coordinates (X,Y)
inputs         X, Y
registers used A, X, Y
tables used    FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, FSIN, FCOS
outputs        X, Y
notes          if the radius is negative, then the absolute value of the radius is used and the angle changes by $80;
               radius of 00 or $80 returns [0,0]

function name  ATN2
address        BFBD
time (mean)    (185) cycles
description    convert (X,Y) to an angle
inputs         X, Y
registers used A, X, Y
tables used    FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, FLAT
outputs        A
notes          error standard deviation = 0.2881

I did a complete analysis of the ATN2 function, and it turns out it's the most accurate of all the functions; there's only one division in the operation, and when it uses the FLAT table, which returns the arctangent of the slope, the minor errors in division mostly don't matter:

ATN2error.png.71740e3a51ac50ae28a69ae668b4194a.png

vectRTXY.png.8af38c576864b8890bd3752680c7bd92.png

vectATN2a.png.6a6d5daefd1d481c02d3345e3635fb0f.png

vectATN2b.png.946ab3c13c6c2fd6869b999a20aacadf.png

Edited by Ed Minchau
Link to comment
Share on other sites

Single vector: Normalization

These subroutines are used if you want to convert a 2d or 3d vector into a vector with the same direction but unit length.

function name  NOXY
address        BFF3
time (mean)    (292) cycles
description    normalize [X,Y] to unit length
inputs         X, Y
registers used A, X, Y
tables used    SQAR, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR2
outputs        X, Y, C
notes          carry set if input vector is [0,0]

function name  NRM2
address        BFF6
time (mean)    (340) cycles
description    normalize 2d vector pointed to by X to unit length
inputs         X
registers used A, Y
tables used    SQAR, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR2
outputs        zero page, C
notes          the input vector in zero page pointed to by X is also the result vector; carry set if input vector is [0,0]

function name  NRM3
address        BFF9
time (mean)    (495) cycles
description    normalize 3d vector pointed to by X to unit length
inputs         X
registers used A
tables used    SQA3, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR3
outputs        zero page, C
notes          the input vector in zero page pointed to by X is also the result vector; carry set if input vector is [0,0,0]

vectNOXYa.png.85354ab07d8d38ead1919763a5f89152.png

vectNOXYb.png.5b0c5c4113c6b3ad792d48b5b2bc6b0d.png

vectNOXYc.png.351d28f117504b0509778f16e1a41143.png

vectNOXYd.png.85ee62b841168be53e882d119f80ad30.png

 

Edited by Ed Minchau
Link to comment
Share on other sites

Two-vector operations:  addition and subtraction

These functions add or subtract the individual components of two vectors.  Since this addition or subtraction could yield a value in the range [-2..2], the result is half the value of the sum or difference, to keep each component in the [-1..1] range.

function name  VAD2
address        BFDB
time (mean)    77 cycles
description    add two 2d vectors pointed to by X and Y
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        zero page
notes          result in [FD,FE] is half the length of the actual result to keep all components in the range [-1..1]

function name  VAD3
address        BFDE
time (mean)    111 cycles
description    add two 3d vectors pointed to by X and Y
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        zero page
notes          result in [FD,FE,FF] is half the length of the actual result to keep all components in the range [-1..1]

function name  VSB2
address        BFE1
time (mean)    77 cycles
description    subtract two 2d vectors pointed to by X and Y
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        zero page
notes          result in [FD,FE] is half the length of the actual result to keep all components in the range [-1..1]

function name  VSB3
address        BFE4
time (mean)    111 cycles
description    subtract two 3d vectors pointed to by X and Y
inputs         X, Y
registers used A
tables used    HALF, ML3F
outputs        zero page
notes          result in [FD,FE,FF] is half the length of the actual result to keep all components in the range [-1..1]

vectADD3.png.e5c8cb58862cdac8c920bb3a1870ec93.png

vectADD3b.png.4ec3f47f9cf97a6de357ce79e3464ec9.png

vectADD3c.png.5452b24b846aa4179799e21327ace57e.png

Edited by Ed Minchau
Link to comment
Share on other sites

Two-vector operations: dot product and angle between two vectors

The vector dot product is an intermediate step on the way to finding the angle between two vectors.  There's a 2d and a 3d version.

function name  DOT2
address        BFE7
time (mean)    (163) cycles
description    dot product / 2 of two 2d vectors pointed to by X and Y
inputs         X, Y
registers used A, X, Y
tables used    HALF, ML3F
outputs        A
notes          result is half the dot product to keep it in the range [-1..1]

function name  DOT3
address        BFEA
time (mean)    (263) cycles
description    dot product / 3 of two 3d vectors pointed to by X and Y
inputs         X, Y
registers used A, X, Y
tables used    HALF, ML3F, M2/3, M1/3
outputs        A
notes          result is one-third the dot product to keep it in the range [-1..1]

The angle between two vectors is given by the equation 

theta = arccosine ((VdotW)/((LenV)(LenW)))

If the carry is clear when calling this function the result will be as above.  If the carry is set, the result will be the negative of theta.

function name  ANG2
address        BFED
time (mean)    (492) cycles
description    angle between two 2d vectors pointed to by X and Y
inputs         X, Y, C
registers used A, X, Y
tables used    SQAR, SQRT, FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, HALF, ML3F
outputs        A
notes          calculates theta=arccos((X dot Y)/((len X)(len Y))); if carry is set on input, the output angle will be negative

function name  ANG3
address        BFF0
time (mean)    (619) cycles
description    angle between two 3d vectors pointed to by X and Y
inputs         X, Y, C
registers used A, X, Y
tables used    SQA3, SQRT, FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, HALF, ML3F, M2/3, M1/3
outputs        A
notes          calculates theta=arccos((X dot Y)/((len X)(len Y))); if carry is set on input, the output angle will be negative

Note that the dot product is divided by 2 in 2d and divided by 3 in 3d; similarly the lengths are divided by sqrt(2) in 2d and by sqrt(3) in 3d; when combined in the ANG2 and ANG3 functions, these scaling factors all cancel out.

vectDOTa.png.e759e52c847c382029e6c06c256cec15.png

vectDOTb.png.7338f1e90d50b4b46358a13986b16210.png

vectDOTc.png.8489b83415ae089398458cd9ee8da79e.png

Edited by Ed Minchau
Link to comment
Share on other sites

Vector Cross Product

The cross product is the vector perpendicular to the plane described by two vectors; the cross product is a vector that is perpendicular to both of the given vectors.  This only works in 3d.  This is very useful for things like light reflection when ray tracing.  The resulting vector is half of the actual cross product, to prevent results outside the [-1..1] range.

The cross product result vector will be stored in zero page at FD, FE, and FF.

function name  CROS
address        BFFC
time (mean)    (486) cycles
description    cross product / 2 of two 3d vectors pointed to by X and Y
inputs         X, Y
registers used A, X, Y
tables used    HALF, ML3F
outputs        zero page
notes          result vector in [FD,FE,FF] is half the length of the actual cross product to keep all components in the range [-1..1]

vectCROSa.png.cce1218b0b60e7284ab4e33634d97caa.png

vectCROSb.png.5ac9e76ee9417a7048a10e48c8c0035d.png

Edited by Ed Minchau
Link to comment
Share on other sites

On 5/16/2022 at 9:29 PM, Ed Minchau said:

Zsound should probably find a home in ROM, too.

While I concur, when I was talking about a "home in ROM", even though my open source Sweet16VM is "fat" compared to Woz's masterwork of spaghetti 6502 code ... it's still under 0.75K, so it it's going to find a home in ROM, it has to ride along with something else that is (1) more substantial but (2) under 16K in its own right.

The alternative is to implement enough applets using the Sweet16 VM to justify putting the whole collection in a ROM, but I am working 11 hour shifts at present, so I probably don't have the time for that

Edited by BruceMcF
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

Please review our Terms of Use