Ed Minchau Posted May 13 Share Posted May 13 When programming in assembly language, quite often you will need to do calculations that would normally require floating point math. Those subroutines are available, the same ones that BASIC uses, but they are very slow. And if you're programming in assembly, you're doing so because it's much faster than BASIC, so using those routines can defeat the purpose, particularly if you need to do a lot of such calculations. Ideally in assembly language you want to have most of your variables consisting of single bytes, not the four used for floating point. It's having to move all those bytes around and doing calculations involving all of them that makes the floating point subroutines slow. And if you're going for speed, you can make a tradeoff: increased speed at the expense of accuracy. A function involving a single byte can just be a lookup table. Over the last several years I have developed a number of lookup tables and subroutines that enable some very fast math on the 6502 and now 65c02. Inspired by @svenvandevelde's question about ATAN2, I decided to compile many of them into a single 8kb file that can be loaded into banked RAM. I have other such routines and lookup tables, but I decided to just include those that would most likely be useful to the broadest range of applications. These lookup tables and functions will work on all revisions of the X16 emulator, and indeed will work on any 65c02 system as long as the file is loaded at $A000. The functions are all called through JMP redirects in page BF, and those JMP table locations will not change if I do revisions on the code. I'm pretty sure I killed all the bugs, but some might have slipped through. I'll be using further posts in this thread as sort of a user guide, but attached to this post is the file FASTMATH.BIN (the lookup tables and functions) and a text file containing the BASIC program that made all of the lookup tables (but don't copy and paste it into the emulator or it will overwrite FASTMATH.BIN). When using FASTMATH.BIN, you must use a relocated load to put it into banked RAM, otherwise it will load into memory at $7000 and won't work. Or you could load it into its default location at $7000 and just copy it up into the RAM bank of your choice. This software is released under an MIT license, meaning you can use it freely for any purpose, commercial or non-commercial, as long as you leave the MIT license in there (it's located at $BF7F). And please, these functions are all just approximations accurate to about 1%, so don't use the software for controlling a nuclear power plant or medical equipment. maketables.txt FASTMATH.BIN 4 2 Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 13 Author Share Posted May 13 (edited) A single byte can represent either an integer in the range 0-255 or a signed integer in the range -128 to 127; the difference is in how we think of what bit 7 represents. If bit 7 is equivalent to 128, it's an unsigned integer, and if it's equivalent to -128 it's a signed integer. The signed integer representation is sometimes called Two's Complement notation. In two's complement notation, a number can be negated simply by only two lines of code: EOR #$FF INC A So -127 is just 7F EOR FF plus 1 = $81, and -1 is $FF. FASTMATH uses this two's complement notation, with a couple of twists. First, $80 is considered an invalid number. Only the range from -127 to +127 is used. These numbers are considered the numerator of a fraction, with 127 as the denominator, so -127..127 represents the range [-1.0 .. +1.0]. It isn't quite a fixed point notation; it's a fractional notation. Only the numbers in the range [-1.0 .. 1.0] have any meaning to FASTMATH functions. The exception to this is trigonometric functions; an angle is not treated the same way as other numbers. For angles, we divide the circle up into 256 "binary degrees" or "bigrees". So for angles, $80 bigrees = pi radians = 180 degrees. So to add 180 degrees to an angle, just EOR #$80. There are 26 lookup tables in FASTMATH that are potentially useful. For each of these, you can put the input in the X (or Y) register and just read the output: LDA HALF,X will return half the value of X in the accumulator, for instance. All of these lookup tables are listed as DATA statements in the text file posted above, with REM statements describing each one, so if you just want to use a few of these tables in your own code you can just extract what you need. Be sure to set the RAM bank pointer to whatever bank holds FASTMATH when using these tables. name address description MBY2 A000 multiply by 2, truncated at 7F and 81 HALF A100 multiply by 64/127 MLT5 A200 multiply by 32/127 MLT4 A300 multiply by 16/127 MLT3 A400 multiply by 8/127 MLT2 A500 multiply by 4/127 MLT1 A600 multiply by 2/127 MLT0 A700 multiply by 1/127 FABS A800 absolute value DIV6 A900 127*64/index; 00 if absolute value of index is less than 64; $80 if index is zero DIV5 AA00 127*32/index; 00 if absolute value of index is less than 32; $80 if index is zero DIV4 AB00 127*16/index; 00 if absolute value of index is less than 16; $80 if index is zero DIV3 AC00 127*8/index; 00 if absolute value of index is less than 8; $80 if index is zero DIV2 AD00 127*4/index; 00 if absolute value of index is less than 4; $80 if index is zero DIV1 AE00 127*2/index; 00 if absolute value of index is 1; $80 if index is zero DIV0 AF00 127/index; $80 if index is zero FSIN B000 sine of index FCOS B100 cosine of index SQRT B200 square root of absolute value of index SQAR B300 square of index SQA3 B400 2/3 of square of index ML3F B500 multiply by 63/127 M2/3 B600 multiply by 2/3 M1/3 B700 multiply by 1/3 FACO B800 arccosine FLAT B900 arctangent of slopes from -1 to 1 The final two lookup tables aren't directly useful to the user; they're just included here for completeness. NOR2 BA00 used for 2d normalization NOR3 BA80 used for 3d normalization Edited May 15 by Ed Minchau 1 Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 13 Author Share Posted May 13 (edited) There are 36 functions included in FASTMATH. I'll just list them here for now, and will go into more detail (and show the code) in subsequent posts. These functions are all given a 4-character label, but you can label them whatever you want in your own code. The JMP redirect adds three precious cycles to each subroutine, but it's worth it to keep your code working even if I revise this file later. The time in cycles (including the JMP redirect and RTS, but not your JSR call) for each subroutine is also listed ; variable time subroutines are listed with a mean time in brackets. Some of those are so short that the call and redirect is a significant fraction of the total time, so for those you may want to simply copy them inline in your code; the code for the really short ones will be listed in subsequent posts when I go into more detail. It is assumed that if you're using these functions you're not also using BASIC, so these functions use zero page locations F0 to FF as work space. Be sure to set the RAM bank pointer to whatever bank holds FASTMATH when using these functions. name addr time description ARITHMETIC (SCALAR) FUNCTIONS ADDT BF99 (25) add values in X and Y, truncate on overflow SUBT BF9C (28) subtract X – Y, truncate on overflow FAVG BF9F 19 X(64/127) + Y(63/127) {approximately X / 2 + Y / 2} FHDF BFA2 19 X(64/127) – Y(63/127) {approximately X / 2 – Y / 2} FCMP BFA5 23 compare X to Y FMIN BFA8 (39) minimum of X and Y FMAX BFAB (39) maximum of X and Y FMLT BFAE 47 multiply X by Y, faster version MULT BFB1 (87) multiply X by Y, more accurate version FDIV BFB4 (108) divide X by Y FDV2 BFB7 (88) divide X by Y, no error checking SINGLE VECTOR FUNCTIONS negation NEXY BFC9 25 [X,Y] = [-X,-Y] NEG2 BFCC 33 negate 2d vector pointed to by X NEG3 BFCF 45 negate 3d vector pointed to by X doubling VDXY BFD2 21 [X,Y] = 2[X,Y] VD2D BFD5 37 double the 2d vector pointed to by X VD3D BFD8 51 double the 3d vector pointed to by X multiply vector by constant VML2 BF93 (220) multiply 2d vector (pointed to by X) by the value in Y VML3 BF96 (326) multiply 3d vector (pointed to by X) by the value in Y length of a vector LEXY BFC0 34 length of vector [X,Y] divided by sqrt(2) LEN2 BFC3 48 length of 2d vector pointed to by X divided by sqrt(2) LEN3 BFC6 63 length of 3d vector pointed to by X divided by sqrt(3) trignometry RTXY BFBA (212) convert radius X angle Y to coordinates (X,Y) ATN2 BFBD (185) convert coordinates (X,Y) to an angle normalization to length 1.0 NOXY BFF3 (292) normalize [X,Y] to unit length NRM2 BFF6 (340) normalize 2d vector pointed to by X to unit length NRM3 BFF9 (495) normalize 3d vector pointed to by X to unit length TWO-VECTOR FUNCTIONS addition and subtraction VAD2 BFDB 77 add two 2d vectors pointed to by X and Y (divided by 2) VAD3 BFDE 111 add two 3d vectors pointed to by X and Y (divided by 2) VSB2 BFE1 77 subtract two 2d vectors pointed to by X and Y (divided by 2) VSB3 BFE4 111 subtract two 3d vectors pointed to by X and Y (divided by 2) dot product DOT2 BFE7 (163) dot product / 2 of two 2d vectors pointed to by X and Y DOT3 BFEA (263) dot product / 3 of two 3d vectors pointed to by X and Y angle between two vectors ANG2 BFED (492) angle between two 2d vectors pointed to by X and Y ANG3 BFF0 (619) angle between two 3d vectors pointed to by X and Y cross product CROS BFFC (486) cross product / 2 of two 3d vectors pointed to by X and Y Edited May 14 by Ed Minchau 1 Quote Link to comment Share on other sites More sharing options...

rje Posted May 13 Share Posted May 13 Do you suppose this might be a candidate for one of the spare ROM banks? Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 13 Author Share Posted May 13 On 5/13/2022 at 1:43 PM, rje said: Do you suppose this might be a candidate for one of the spare ROM banks? If so I would add about 4kb more in tables and code and the remaining 4kb would echo the Kernel redirects and JSRFAR etc from bank 4. 2 Quote Link to comment Share on other sites More sharing options...

rje Posted May 13 Share Posted May 13 (edited) It just seems too useful, and there seems to be room for it. I could use it instead of using my own half-baked routines; it could shrink my codebase, which sometimes is a good thing. Edited May 13 by rje 1 Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 13 Author Share Posted May 13 (edited) On 5/13/2022 at 1:49 PM, rje said: It just seems too useful, and there appears to be room. Well, if a bunch of people actually do start using it, then maybe the team will consider it. In the meantime I'll just keep posting more documentation below. I need a few other people to try it anyway just in case I missed a bug somewhere. I made up the lookup tables on a spreadsheet which also generated the text file above, but I wrote all the code in the META/L editor, which isn't exactly conducive to sharing on GitHub. I'm still in the process of revising the editor for rev40/41 (got sidetracked doing fastmath), but when I release that I'll also include the EDITMATH.PRG file I used to write the code, so people can poke around in it and prod it with tiny sticks. Until then, I'll just be posting short snippets of code here as well, in the standard notation. Maybe I'll include screenshots from the editor too. Edited May 13 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

svenvandevelde Posted May 14 Share Posted May 14 (edited) Ed, This is very kind of you sharing these libraries. I'm sure that there are more people who have build such libraries over the past (30?) years. I went through your list and what really makes it interesting is to jointly discuss some of these algorithms to understand the logic applied. Like SQRT B200 square root of absolute value of index Edited May 14 by svenvandevelde 1 Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 14 Author Share Posted May 14 (edited) I can see I need to add titles to that list... done. In that example, B200 is the address of the start of the square root table. The index into the table is the input, the value stored at (B200 plus the index) is the square root of the index. If the index is negative, the value stored at that address is the square root of the absolute value of the index. i.e.: Find square root of -1/2: LDX #$C0; minus one half LDA SQRT,X; loads the value stored at B2C0 CPX #$80 This returns $5A (90/127 = 0.70866... close enough to 0.7071) and the carry set indicates i. Something similar can be done with all of those lookup tables. All of them except FSIN and FCOS take one of these fractional data types as the index of a 256 byte LUT. (FSIN and FCOS take an angle as an index.) Simply reading the table offset by X or Y gives the answer, in 4 cycles. The subroutines use several of these tables at once to do more complex things, but the lookup tables are all available for the user too. Edited May 14 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 14 Author Share Posted May 14 (edited) Arithmetic functions: addition and subtraction We're using fractions in the range of -1 to 1 as inputs to the addition and subtraction functions. There's a problem with that: the range of the sum (or difference) of two numbers in the range [-1..1] is [-2..2]. There are two ways to handle this problem: either by truncating the result or by scaling. Both methods are available. ADDT and SUBT truncate the result on overflow, and FAVG and FHDF divide the result by two. Note that for all of the functions listed, the tables used are also included. That way if you only want a few of the functions, you can simply copy the appropriate tables out of the text file above and copy the code listed for each function. function name ADDT address BF99 time (mean) (25) cycles description add values in X and Y, truncate on overflow inputs X, Y registers used A tables used none outputs A, C, N, Z notes will clip at $7F on overflow high or at $81 on overflow low; carry set on overflow; Z set on result 00; N set if result negative function name SUBT address BF9C time (mean) (28) cycles description subtract X – Y, truncate on overflow inputs X, Y registers used A tables used none outputs A, C, N, Z notes will clip at $7F on overflow high or at $81 on overflow low; carry set on overflow; Z set on result 00; N set if result negative function name FAVG address BF9F time (mean) 19 cycles description X(64/127) + Y(63/127) {approximately X / 2 + Y / 2} inputs X, Y registers used A tables used HALF, ML3F outputs A notes average value of X and Y function name FHDF address BFA2 time (mean) 19 cycles description X(64/127) – Y(63/127) {approximately X / 2 – Y / 2} inputs X, Y registers used A tables used HALF, ML3F outputs A notes half the difference between X and Y Here's what the code looks like in my editor: note that rather than trying to parse the parameters to figure out what the command addressing mode is, the META/L editor has a unique 4 character code for each command to remove all ambiguity. The first three characters are the same as the standard notation, but the fourth is the addressing mode as follows: A absolute address X absolute address offset by X Y absolute address offset by Y # immediate I implied (also used for Accumulator mode on those commands that also have Absolute addressing mode, like LSR) Z zero page indexed by X (i.e. LDA 04,X is LDAZ 04) / zero page indirect + indexed indirect (i.e. LDA (04,X) is LDA+ 04) - indirect indexed (i.e. LDA (04),Y is LDA- 04) 0 zero page there are some exceptions that just made sense to me, such as CLCF for clear carry flag FAVG and FHDF are so short that if you're trying to squeeze in as many calculations per second as possible, it's better to write them inline with your code rather than calling the subroutine; 6 cycles for JSR, 3 cycles for the JMP redirect, and 6 cycles for the RTS are more cycles than the calculation itself. My editor uses a nonstandard notation, so in standard notation those are LDA A100,X CLC ADC B500,Y and LDA A100,X SEC SBC B500,Y Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 14 Author Share Posted May 14 (edited) Arithmetic operations: compare, minimum, maximum Since these numbers are in two's complement notation, simple CMP commands won't properly compare the numbers; negative numbers appear bigger than positive to CMP. FCMP produces the correct Z, C, and N flags. If the invalid input of $80 is used, it is considered negative infinity. function name FCMP address BFA5 time (mean) 23 cycles description compare X to Y inputs X, Y registers used A tables used none outputs Z,N,C notes Z set if X = Y, C set if X >= Y, N set if X<Y The minimum and maximum of two numbers are useful for things like a chess program's minimax algorithm. The maximum is also equivalent to OR in fuzzy logic. function name FMIN address BFA8 time (mean) (39) cycles description minimum of X and Y inputs X, Y registers used A tables used none outputs A notes returns the minimum function name FMAX address BFAB time (mean) (39) cycles description maximum of X and Y inputs X, Y registers used A tables used none outputs A notes returns the maximum In the META/L editor: Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 14 Author Share Posted May 14 (edited) arithmetic operations: multiplication and division There are two different multiplication subroutines. Both of them are fairly accurate. FMLT is about twice as fast as MULT, but it is slightly less accurate and it changes the values in X and Y. MULT isn't 100% accurate but all results are within $01 of the correct value, and MULT preserves the values in X and Y. function name FMLT address BFAE time (mean) 47 cycles description multiply X by Y, faster version inputs X, Y registers used A, X, Y outputs A tables used HALF, ML3F notes error standard deviation = 0.6757 function name MULT address BFB1 time (mean) (87) cycles description multiply X by Y, more accurate version inputs X, Y registers used A tables used FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0 outputs A notes error standard deviation = 0.4301 The code for FMLT is displayed below. It uses inline versions of FAVG and FHDF, taking advantage of the identity (x/2 + y/2)^2 - (x/2 - y/2)^2 = x^2/4 + 2xy/4 + y^2/4 - x^2/4 + 2xy/4 - y^2/4 = 4xy/4 = xy Here's the code for MULT: MULT just adds fractions if the corresponding bit in the absolute value of X is equal to 1. The BBR commands just skip that addition if the bit in absolute value of X is zero. That absolute value is taken at the beginning if X is negative, and then bit 7 in the temporary zero page variable is set. At the end of the calculation, if bit 7 is set the result is negated. The only command not shown on this image is the RTS. There are two entries to the division subroutine. FDIV is the first, and does the error checking on the inputs. FDV2 is a little farther down and skips that error checking step. function name FDIV address BFB4 time (mean) (108) cycles description divide X by Y inputs X, Y registers used A tables used FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0 outputs A notes error standard deviation = 0.4726, returns carry set if Y=$00 or Y=$80 or X=$80 or abs(X)>abs(Y) function name FDV2 address BFB7 time (mean) (88) cycles description divide X by Y, no error checking inputs X, Y registers used A tables used FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0 outputs A notes error standard deviation = 0.4726 Here's the code for FDIV in the META/L editor: The multiplication and division functions are pretty important to the rest of the subroutines, so I wanted to make certain that they were actually close to correct. A nice thing about having two-parameter functions that can each only take 256 possible values is that it's easy to calculate all possible results for all possible combinations of inputs on a spreadsheet, and compare the results to an ideal target. The targets were all done in floating point, and only converted into single byte format at the last step. I was then able to compare the calculations of these subroutines against the target, and make histograms of the differences between calculation and target. There is another division method using logarithms, but it's only about 20 cycles faster than FDIV and the accuracy is pretty bad, so I didn't include the log/antilog tables in the file: Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Vectors are objects with two or more numerical parameters. These can be passed to the vector subroutines in one of three ways: - the vector X and Y coordinates can be stored in the X and Y registers - the vector coordinates can be stored in two consecutive locations in zero page, the first of which is pointed to by X or Y - the vector coordinates can be stored in three consecutive locations in zero page, the first of which is pointed to by X or Y Single Vector operations: multiplication by -1 function name NEXY address BFC9 time (mean) 25 cycles description [X,Y] = [-X,-Y] inputs X, Y registers used A, X, Y tables used none outputs X, Y notes negates the vector in [X,Y] function name NEG2 address BFCC time (mean) 33 cycles description negate 2d vector pointed to by X inputs X registers used A tables used none outputs zero page notes the input vector in zero page pointed to by X is also the result vector function name NEG3 address BFCF time (mean) 45 cycles description negate 3d vector pointed to by X inputs X registers used A tables used none outputs zero page notes the input vector in zero page pointed to by X is also the result vector Single Vector operations: multiplication by 2 note that these operations do not check for overflow, they simply look up a doubling amount in the MBY2 table, which clips overflow at 7F or 81 function name VDXY address BFD2 time (mean) 21 cycles description [X,Y] = 2[X,Y] inputs X, Y registers used A, X, Y tables used MBY2 outputs X, Y notes will clip at $7F on overflow high or at $81 on overflow low function name VD2D address BFD5 time (mean) 37 cycles description double the 2d vector pointed to by X inputs X registers used A, Y tables used MBY2 outputs zero page notes the input vector in zero page pointed to by X is also the result vector; will clip at $7F on overflow high or at $81 on overflow low function name VD3D address BFD8 time (mean) 51 cycles description double the 3d vector pointed to by X inputs X registers used A, Y tables used MBY2 outputs zero page notes the input vector in zero page pointed to by X is also the result vector; will clip at $7F on overflow high or at $81 on overflow low Single Vector operations: multiply by a constant function name VML2 address BF93 time (mean) (220) cycles description multiply 2d vector (pointed to by X) by the value in Y inputs X, Y registers used A tables used FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0 outputs zero page notes the input vector in zero page pointed to by X is also the result vector function name VML3 address BF96 time (mean) (326) cycles description multiply 3d vector (pointed to by X) by the value in Y inputs X, Y registers used A tables used FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0 outputs zero page notes the input vector in zero page pointed to by X is also the result vector Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Vector Length function name LEXY address BFC0 time (mean) 34 cycles description length of vector [X,Y] divided by sqrt(2) inputs X, Y registers used A tables used SQAR, SQRT outputs A notes length value is divided by sqrt(2) to keep it in the range [0..1] function name LEN2 address BFC3 time (mean) 48 cycles description length of 2d vector pointed to by X divided by sqrt(2) inputs X registers used A, Y tables used SQAR, SQRT outputs A notes length value is divided by sqrt(2) to keep it in the range [0..1] function name LEN3 address BFC6 time (mean) 63 cycles description length of 3d vector pointed to by X divided by sqrt(3) inputs X registers used A, Y tables used SQA3, SQRT outputs A notes length value is divided by sqrt(3) to keep it in the range [0..1] Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Single vector trigonometric operations: function name RTXY address BFBA time (mean) (212) cycles description convert radius X angle Y to coordinates (X,Y) inputs X, Y registers used A, X, Y tables used FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, FSIN, FCOS outputs X, Y notes if the radius is negative, then the absolute value of the radius is used and the angle changes by $80; radius of 00 or $80 returns [0,0] function name ATN2 address BFBD time (mean) (185) cycles description convert (X,Y) to an angle inputs X, Y registers used A, X, Y tables used FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, FLAT outputs A notes error standard deviation = 0.2881 I did a complete analysis of the ATN2 function, and it turns out it's the most accurate of all the functions; there's only one division in the operation, and when it uses the FLAT table, which returns the arctangent of the slope, the minor errors in division mostly don't matter: Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Single vector: Normalization These subroutines are used if you want to convert a 2d or 3d vector into a vector with the same direction but unit length. function name NOXY address BFF3 time (mean) (292) cycles description normalize [X,Y] to unit length inputs X, Y registers used A, X, Y tables used SQAR, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR2 outputs X, Y, C notes carry set if input vector is [0,0] function name NRM2 address BFF6 time (mean) (340) cycles description normalize 2d vector pointed to by X to unit length inputs X registers used A, Y tables used SQAR, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR2 outputs zero page, C notes the input vector in zero page pointed to by X is also the result vector; carry set if input vector is [0,0] function name NRM3 address BFF9 time (mean) (495) cycles description normalize 3d vector pointed to by X to unit length inputs X registers used A tables used SQA3, MBY2, FABS,HALF, MLT5, MLT4, MLT3, MLT2, MLT1, MLT0, NOR3 outputs zero page, C notes the input vector in zero page pointed to by X is also the result vector; carry set if input vector is [0,0,0] Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Two-vector operations: addition and subtraction These functions add or subtract the individual components of two vectors. Since this addition or subtraction could yield a value in the range [-2..2], the result is half the value of the sum or difference, to keep each component in the [-1..1] range. function name VAD2 address BFDB time (mean) 77 cycles description add two 2d vectors pointed to by X and Y inputs X, Y registers used A tables used HALF, ML3F outputs zero page notes result in [FD,FE] is half the length of the actual result to keep all components in the range [-1..1] function name VAD3 address BFDE time (mean) 111 cycles description add two 3d vectors pointed to by X and Y inputs X, Y registers used A tables used HALF, ML3F outputs zero page notes result in [FD,FE,FF] is half the length of the actual result to keep all components in the range [-1..1] function name VSB2 address BFE1 time (mean) 77 cycles description subtract two 2d vectors pointed to by X and Y inputs X, Y registers used A tables used HALF, ML3F outputs zero page notes result in [FD,FE] is half the length of the actual result to keep all components in the range [-1..1] function name VSB3 address BFE4 time (mean) 111 cycles description subtract two 3d vectors pointed to by X and Y inputs X, Y registers used A tables used HALF, ML3F outputs zero page notes result in [FD,FE,FF] is half the length of the actual result to keep all components in the range [-1..1] Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Two-vector operations: dot product and angle between two vectors The vector dot product is an intermediate step on the way to finding the angle between two vectors. There's a 2d and a 3d version. function name DOT2 address BFE7 time (mean) (163) cycles description dot product / 2 of two 2d vectors pointed to by X and Y inputs X, Y registers used A, X, Y tables used HALF, ML3F outputs A notes result is half the dot product to keep it in the range [-1..1] function name DOT3 address BFEA time (mean) (263) cycles description dot product / 3 of two 3d vectors pointed to by X and Y inputs X, Y registers used A, X, Y tables used HALF, ML3F, M2/3, M1/3 outputs A notes result is one-third the dot product to keep it in the range [-1..1] The angle between two vectors is given by the equation theta = arccosine ((VdotW)/((LenV)(LenW))) If the carry is clear when calling this function the result will be as above. If the carry is set, the result will be the negative of theta. function name ANG2 address BFED time (mean) (492) cycles description angle between two 2d vectors pointed to by X and Y inputs X, Y, C registers used A, X, Y tables used SQAR, SQRT, FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, HALF, ML3F outputs A notes calculates theta=arccos((X dot Y)/((len X)(len Y))); if carry is set on input, the output angle will be negative function name ANG3 address BFF0 time (mean) (619) cycles description angle between two 3d vectors pointed to by X and Y inputs X, Y, C registers used A, X, Y tables used SQA3, SQRT, FABS, DIV6, DIV5, DIV4, DIV3, DIV2, DIV1, DIV0, HALF, ML3F, M2/3, M1/3 outputs A notes calculates theta=arccos((X dot Y)/((len X)(len Y))); if carry is set on input, the output angle will be negative Note that the dot product is divided by 2 in 2d and divided by 3 in 3d; similarly the lengths are divided by sqrt(2) in 2d and by sqrt(3) in 3d; when combined in the ANG2 and ANG3 functions, these scaling factors all cancel out. Edited May 16 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 15 Author Share Posted May 15 (edited) Vector Cross Product The cross product is the vector perpendicular to the plane described by two vectors; the cross product is a vector that is perpendicular to both of the given vectors. This only works in 3d. This is very useful for things like light reflection when ray tracing. The resulting vector is half of the actual cross product, to prevent results outside the [-1..1] range. The cross product result vector will be stored in zero page at FD, FE, and FF. function name CROS address BFFC time (mean) (486) cycles description cross product / 2 of two 3d vectors pointed to by X and Y inputs X, Y registers used A, X, Y tables used HALF, ML3F outputs zero page notes result vector in [FD,FE,FF] is half the length of the actual cross product to keep all components in the range [-1..1] Edited May 15 by Ed Minchau Quote Link to comment Share on other sites More sharing options...

BruceMcF Posted May 16 Share Posted May 16 On 5/13/2022 at 3:48 PM, Ed Minchau said: If so I would add about 4kb more in tables and code and the remaining 4kb would echo the Kernel redirects and JSRFAR etc from bank 4. This is exactly the kind of ROM home I would like to see for a Sweet16 VM. 1 Quote Link to comment Share on other sites More sharing options...

Ed Minchau Posted May 17 Author Share Posted May 17 On 5/16/2022 at 5:11 PM, BruceMcF said: This is exactly the kind of ROM home I would like to see for a Sweet16 VM. Zsound should probably find a home in ROM, too. Quote Link to comment Share on other sites More sharing options...

BruceMcF Posted May 18 Share Posted May 18 (edited) On 5/16/2022 at 9:29 PM, Ed Minchau said: Zsound should probably find a home in ROM, too. While I concur, when I was talking about a "home in ROM", even though my open source Sweet16VM is "fat" compared to Woz's masterwork of spaghetti 6502 code ... it's still under 0.75K, so it it's going to find a home in ROM, it has to ride along with something else that is (1) more substantial but (2) under 16K in its own right. The alternative is to implement enough applets using the Sweet16 VM to justify putting the whole collection in a ROM, but I am working 11 hour shifts at present, so I probably don't have the time for that Edited May 19 by BruceMcF 2 Quote Link to comment Share on other sites More sharing options...

## Recommended Posts

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.