geek504

My First C64/X16 Programs!


Just fooling around with C64 BASIC and the new X16 BASIC commands! Code is too simple to share compared to some other games and demos you guys made!

It's been a while since I had to work with variable names where only the first 2 characters are significant!

"PALETTE.BAS"

x16-palette.PNG.97d2f7729973fe03515dcfc555ed3a73.PNG

"MANDELBROT.BAS" - Took forever to generate this 150px by 150px by 16 greyscale. 6502/200MHz please!

x16-mandelbrot.PNG.0514c0f73057c26e4c269744e8c6c4cc.PNG


Look at the demo directories; there are some Mandelbrot programs already. There are two 320x240@256 versions ... a full run takes some 20-30 minutes.

5 minutes ago, SerErris said:

Look at the demo directories there are some Mandelbrot already. Two 320x240@256 versions ... it is running some 20-30 minutes for a full run.

You used pure C64 BASIC with all those POKEs, right? 🙂 I used the new X16 BASIC commands, and the result is somewhat smaller:

20 SCREEN 128
30 MAXDWELL = 150
40 COLRS = 16
50 NROW = 150
60 NCOL = 150
70 YOFFSET = 1
80 XOFFSET = 1
90  INPUT "LOWER LEFTHAND CORNER, REAL PART"; AA
100 INPUT "LOWER LEFTHAND CORNER, IMAG. PART"; BB
110 INPUT "LENGTH OF SIDE"; SIDE
120 CLS
140 LINE 0, 0, NCOL + XOFFSET, 0, 16
150 LINE NCOL + XOFFSET, 0, NCOL + XOFFSET, NROW + YOFFSET, 16
160 LINE NCOL + XOFFSET, NROW + YOFFSET, 0, NROW + YOFFSET, 16
170 LINE 0, NROW + YOFFSET, 0, 0, 16
250 HIGHDWELL = 0
260 GAP = SIDE / NROW
270 AC = AA
280 FOR X = XOFFSET TO NCOL - 1 + XOFFSET
290 AC = AC + GAP
300 BC = BB
310 FOR Y = YOFFSET TO NROW - 1 + YOFFSET
320 BC = BC + GAP
330 AZ = 0
340 BZ = 0
350 CNT = 0
360 SZE = 0
370 IF (SZE < 4) AND (CNT < MAXDWELL) GOTO 380  
375 GOTO 470
380 TEMP = AZ * AZ - BZ * BZ + AC
390 BZ = 2 * AZ * BZ + BC
400 AZ = TEMP
410 SZE = AZ * AZ + BZ * BZ
420 CNT = CNT + 1
425 GOTO 370
470 IF (CNT < MAXDWELL) AND (CNT > HIGHDWELL) THEN HIGHDWELL = CNT
480 IF CNT = MAXDWELL THEN PSET X, NROW - Y + 1, 16: GOTO 490
483 RE = CNT-INT(CNT/(COLRS-1))*(COLRS-1) 
485 PSET X, NROW - Y + 1, RE + 1 + 16
490 NEXT Y
520 NEXT X
530 GET A$ : IF A$="" THEN GOTO 530
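
For readers who want to check the inner loop outside the emulator, here is a minimal Python sketch of what lines 330-485 of the listing do: iterate the point until it escapes or MAXDWELL is reached, then map the count to a colour index. The function names `dwell` and `colour` are made up for this illustration; the variable names mirror the BASIC listing.

```python
# Sketch of the escape-time loop from BASIC lines 330-485 (names mirror the listing).
MAXDWELL = 150
COLRS = 16

def dwell(ac, bc, maxdwell=MAXDWELL):
    """Iterate z -> z*z + c until |z|^2 >= 4 or maxdwell iterations have run."""
    az = bz = 0.0
    cnt = 0
    while az * az + bz * bz < 4 and cnt < maxdwell:
        # Same update as lines 380-400: new real part first, then the imaginary part.
        az, bz = az * az - bz * bz + ac, 2 * az * bz + bc
        cnt += 1
    return cnt

def colour(cnt, maxdwell=MAXDWELL, colrs=COLRS):
    """Points that never escape get colour 16; the others cycle through 17..31."""
    if cnt == maxdwell:
        return 16
    return cnt % (colrs - 1) + 1 + 16   # line 483 computes the same remainder by hand
```

Calling `dwell(0, 0)` runs the full 150 iterations, which is why points inside the set dominate the run time.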

Your full-screen Mandelbrot took only 20-30 minutes running in BASIC? I was impatient and let mine run overnight, so I don't know how fast my code was, even though it only draws a smaller 150x150 image.

I will try cc65 and see how much faster it becomes... unless you already did that 🙂


And much slower. I write the color directly to VRAM using the auto-increment feature of VERA. But your version is also a very good example: it is more in BASIC style and easy to understand. The issue with C64 BASIC, and with this version too, is that easy-to-read code is heavily inefficient (read: slow).
 

To speed up the initial diagram, you should reduce MAXDWELL to 30. You will not be able to see any difference at that resolution ...

I am not sure if anyone has done a C implementation. Next I am aiming for an assembler version. The required math makes my head spin already. We need long int mul/div ... phew

Edited by SerErris

I am working on a version in Prog8 that uses floating point via the kernel routines (so in the end it may not be that much faster than a basic variant, but we'll see once it is finished)


Btw: I can't get fractal256.bas to work (garbled screen), and fastfract256 doesn't work either (it seems to be missing part of the code at the end). I'm looking in the x16-demo git repository. So I used part of their source to create my own BASIC version of the Mandelbrot program.

Turns out the prog8 compiled one is significantly faster (4x) even though it is using the same ROM routines as Basic does for the actual floating point calculations. It took slightly more than 11 minutes to finish the task (256 * 200 * 16 image) whereas the basic one took 45 minutes for the identical picture.

 

Edited by desertfish

Hi, fractal256.bas works. I tried it from the repository. It looks like a garbled screen that is getting even more garbled ... however, after 10 seconds or so you see some pink/violet movement in the first line ... that is the fractal creeping over the screen, just not fast :-).

fastfractal256.bas was unfortunately uploaded with a lot of things missing; not sure how that happened. I attached the correct version here ... Hope that helps in the meantime.

Ah, and of course: these files need to be copied and pasted into the emulator. They are not BAS files in the sense of the X16 SAVE command (not tokenized, but plain text).

BTW: The garbled screen exists because both programmers (the fract256 author and I) were too lazy to wipe the screen before activating it. The wipe process is pretty slow.

Also, the two programs use different VRAM layouts. fract256 starts the bitmap at $0000 and therefore overwrites the CHARRAM of Layer0. The CHARROM is also copied to $0F800 and sits in the middle of the screen, where you can see it. That is overwritten by fract256 as well. After the program finishes, you need to reset the X16 because you cannot read anything anymore.

My approach preserves the CHARRAM of Layer0 by starting at $4000, copying the CHARROM to $1F000 and changing the Tilebase accordingly. Works great.

I also believe that the CHARROM should be copied to $1F000 by default to keep it out of the way. It really sits in a spot where it hurts for 320x240 modes.

fastfract256.bas

8 hours ago, desertfish said:

Turns out the prog8 compiled one is significantly faster (4x) even though it is using the same ROM routines as Basic does for the actual floating point calculations.

Prog8 - that's a very interesting language you developed. What's the history of the language?

I can't see how one can make it faster other than perhaps writing a fixed-point scaled-integer math library (faster but less precise) OR asking for some sort of coprocessor chip (maybe inside VERA itself?) with various mathematical functions, including vector math, to allow for some cool 3D games! Actually I saw someone create 16-bit LOOKUP tables for fixed-point scaled-integer math covering various math functions, burned into 1Mx8 EPROM pairs. How cool would it be to have INSTANT math calculations on a "simple" 8-bit machine? Many geek points there...

It's been a while and I haven't had time to read the manuals, BUT how many bits wide are the C64/X16's math functions? I have been in 32-bit/64-bit land far too long...

Edited by geek504


The 6502 (and 65c02) only have 8 bit add and subtract. The carry flag has to be used to perform operations on any larger values, and multiplication/division must be done with loops and bit shifts.
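
As a rough illustration of what that means in practice, here is a Python sketch that mimics the two techniques: a 16-bit addition built from two 8-bit additions chained through the carry, and an 8x8 multiplication done with shifts and adds. It only models the arithmetic, not actual 6502 instructions or cycle counts, and the function names are invented for this example.

```python
def add16(a, b):
    """Add two 16-bit values one byte at a time, passing the carry along
    (the same idea as CLC, ADC low bytes, ADC high bytes)."""
    lo = (a & 0xFF) + (b & 0xFF)
    carry = lo >> 8                                  # carry out of the low byte
    hi = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry
    result = ((hi & 0xFF) << 8) | (lo & 0xFF)
    return result, hi >> 8                           # (16-bit result, final carry)

def mul8(a, b):
    """8x8 -> 16-bit multiply by shift-and-add: one loop pass per multiplier bit."""
    product = 0
    for _ in range(8):
        if b & 1:            # lowest multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1              # shift multiplicand left
        b >>= 1              # shift multiplier right
    return product
```

The multiply loop always runs eight passes, which hints at why a software multiply costs on the order of a hundred cycles or more on the real CPU.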


@geek504 Discussing Prog8 is probably best done in the topic I made in the "X16 General Chat" subforum?  Can you perhaps ask your question again there?

Also the C64's (and CX16's) math functions operate on 5-byte floats, i.e. 40 bits. Internally they work with 6 bytes, I think, for intermediate rounding precision, but the float values stored in memory occupy 5 bytes. Of course, all floating point operations are implemented in software in the ROM.

On 8/31/2020 at 4:47 PM, desertfish said:

Also the C64 (and CX16's) math functions operate on 5-byte  floats i.e. 40 bits.   Internally they work with 6 bytes I think for intermediate rounding precision, but the float values stored in memory occupy 5 bytes.  Of course all floating point operations are implemented in software in the ROM

Four bytes of that are the number part, the "mantissa", with the fifth byte the exponent ... the "x10^[__]" part in scientific notation, but since it is a binary floating point it is really "x2^[___]".

This is more precision than "standard" 32bit float, and for most floating point applications that the CX16 can handle makes long (64bit) floats pretty much redundant. A standard 32bit float is 23 bits mantissa but thanks to a trick it represents a 24 bit ... three byte ... numeric part, because floating point slides the binary number until the leading bit in front of the "binary" point (not "decimal" point) is a "1", and if you know what it is, you don't have to store it. IOW, if the result of an operation is 0.0011011101...x2^12, that is converted to 1.1011101...x2^9, and only the bits after the binary point are stored. That is an unsigned value, with the sign of the mantissa in the high bit of the floating point number and bits 23-30 as an unsigned value that represents  (exponent+127), so 2^0 is binary 127 ($7F).

So standard floating point numbers can PRECISELY represent every integer up to +/-16,777,216 ... about +/-16.7 million. Outside of that range, they can only precisely represent integers that have an appropriate power of 2 as a factor.
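
The 16,777,216 limit is easy to check on a modern machine. The following Python sketch round-trips values through the standard 32-bit float format using the `struct` module and reads back the stored exponent field; the helper names are made up for this example.

```python
import struct

def as_float32(x):
    """Round a Python float to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def exponent_field(x):
    """Return the 8-bit biased exponent stored in bits 23-30 of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

# 2^24 survives the round trip, 2^24 + 1 does not (it needs a 25th bit).
print(as_float32(16777216.0), as_float32(16777217.0))
# 1.0 is 1.0 x 2^0, so the stored exponent is 0 + 127.
print(hex(exponent_field(1.0)))
```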

By contrast, the Microsoft 6502 "extended" floating point numbers (at Commodore's insistence) can precisely represent every integer up to +/-4,294,967,296 ... about +/-4.3 billion. The reason for Commodore's insistence is that if you do exact accounting, you actually represent dollar values as an integer number of CENTS, so standard 32bit floats can only precisely represent +/-$167,772.16, and to Commodore's way of thinking, that wasn't big enough. A simple eight digit calculator can do better (using signed-magnitude Binary Coded Decimal arithmetic) .... +/-$999,999.99 ... and they weren't going to have an expensive computer system beaten by an eight digit calculator!!!
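
To compare the two formats side by side, here is a small Python sketch that rounds an integer to a given number of significant bits. It is only a model of the rounding (round to nearest, ties to even), not the ROM's actual routine, and `round_to_significand` is a name invented for this example.

```python
import math

def round_to_significand(n, bits):
    """Round integer n to the nearest value that fits a `bits`-bit significand."""
    if n == 0:
        return 0
    mantissa, exponent = math.frexp(n)       # n = mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    scaled = round(mantissa * 2 ** bits)     # keep `bits` significant bits
    return int(math.ldexp(scaled, exponent - bits))

# 24-bit significand (standard 32-bit float) versus 32-bit significand (5-byte float)
print(round_to_significand(2**24 + 1, 24))   # no longer exact
print(round_to_significand(2**24 + 1, 32))   # still exact
print(round_to_significand(2**32 + 1, 32))   # no longer exact
```

In cents, the 32-bit limit of 4,294,967,296 corresponds to $42,949,672.96, compared with $167,772.16 for the 24-bit case.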

This is actually twice the range of xForth's "double cell" integers, because the floating point are sign+size, while Forth has native signed integers that run from -2,147,483,648 to +2,147,483,647. So while floating point is generally LESS "precision" than scaled fixed point of the same size ... C64 floating point is actually roughly twice as precise as scaled signed 32bit fixed point (it's the same precision as 32bit unsigned, because if the data is unsigned then the mantissa sign flag being clear doesn't give any extra information).

Still, +/- 2 billion tends to be enough for lots of purposes when you have numbers that don't fit into the signed +/-32,000 ish or unsigned 64 thousand ish of 16bit integers.

Edited by BruceMcF

@BruceMcF very refreshing history! I am happy about Commodore's insistence, which gives us a useful extended floating point representation for the X16.

I guess having an external co-processor would make those ROM routines useless, and that doesn't seem to be in the spirit of 8-bit computing... Then again, an external expansion card that functions as a co-processor or as look-up tables would very much be in the spirit of 8-bit hacking!

Just food for thought...


An external math coprocessor could maybe be done on an I/O extension. But it would most likely not even do any floating-point math ... a good mul/div for integers longer than 16 bits (e.g. 32x32=64-bit mul) would already be great. But to be honest, that is not really required for any games ... 16-bit math would be good enough. A math chip would "only" speed up the calculation. Especially if you are talking about wireframe graphics like ELITE, the limited CPU is really slowing it down. However ... all the 8-bit computers had to live with no multiply available.

The 6502 series even has to live with the fact that every addition needs to go to memory.

This is now deviating a lot from the original topic. I have no clue how math coprocessors (the 8087, for instance) actually worked, but for the 6502 there is no such thing available. A math coprocessor would be something mapped into the I/O address space with, for example, two data registers and a command register. The two data registers could then be multiplied, divided, squared, etc. The biggest issue would be the handover: waiting for the coprocessor to produce the result and then continuing. The 6502 has no support for that kind of integration. Maybe a BRK could work, but any interrupt (not just the coprocessor's) would reset it. Maybe a quick loop polling a flag register would do. However, the whole thing would be asynchronous and therefore difficult to implement.
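
To make the polling idea concrete, here is a toy Python model of such a card: two operand registers, a command register and a busy flag that the CPU polls in a loop. Everything about it (the register layout, the command codes, the latency of 3 polls) is a hypothetical assumption for illustration; no such X16 card is described in this thread.

```python
class MathCard:
    """Toy model of a memory-mapped math card. The layout is hypothetical."""
    CMD_MUL, CMD_DIV = 1, 2

    def __init__(self, latency=3):
        self.a = self.b = self.result = 0   # operand and result registers
        self.latency = latency              # assumed number of polls an operation takes
        self.busy_ticks = 0
        self._pending = None

    def write_command(self, cmd):
        """Writing the command register starts the operation."""
        self._pending = cmd
        self.busy_ticks = self.latency

    def read_status(self):
        """Each status read models one pass of the CPU's polling loop."""
        if self.busy_ticks:
            self.busy_ticks -= 1
            if self.busy_ticks == 0:
                self._finish()
            return 1                        # card was still busy during this poll
        return 0

    def _finish(self):
        if self._pending == self.CMD_MUL:
            self.result = self.a * self.b
        elif self._pending == self.CMD_DIV:
            self.result = self.a // self.b

def multiply(card, x, y):
    """Load operands, issue the command, then busy-wait on the flag register."""
    card.a, card.b = x, y
    card.write_command(MathCard.CMD_MUL)
    polls = 0
    while card.read_status():
        polls += 1
    return card.result, polls
```

The `polls` count stands in for the cycles the CPU burns while waiting, which is the handover cost discussed above.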

Interesting read: https://retrocomputing.stackexchange.com/questions/9173/how-did-the-8086-interface-with-the-8087-fpu-coprocessor

Edited by SerErris

22 minutes ago, SerErris said:

However ... all the 8bit computers had to live with no multiply available. 

This was a surprising discovery for me, since the only 8-bit assembly I did before learning 65C02 for the X16 was Motorola/Freescale 68HC11, which did have 8x8 multiplication, thanks to the ability to combine the 2 8-bit accumulators (A and B) into a single 16-bit accumulator (D). I had assumed all these years that that was standard for a lot of 8-bit CPUs, but was actually a very specialized feature to support the 68HC11's signal processing capabilities, being a primarily embedded microcontroller variant of the 6800, which did not have built-in multiplication despite having the same register structure.

1 hour ago, SerErris said:

But to be honest, that is not really required for any games... 16bit math would be good enough. A math chip would "only" speed up the calculation. Esp. if you are talking about wireframe graphics like ELITE, the limited CPU is really slowing it down.

I was thinking exactly for 3D games (e.g. Wolfenstein 3D). With the asynchronous aspect you brought up, I believe that the best would then be the 16-bit look-up tables. It'll only be a matter of memory fetching the pre-computed answer from ROM akin to how VERA works. Look at the link: http://wilsonminesco.com/16bitMathTables/

It provides 6502 code implementation for the look-up table via BUS, SERIAL, PARALLEL, and MEMORY MAP I/O.

  file name    table size    comments
   SQUARE.HEX     256KB    partly for multiplication.  32-bit output
   INVERT.HEX     256KB    partly for division, to multiply by the inverse.  32-bit output.
   SIN.HEX        128KB    sines, also for cosines and tangents
   ASIN.HEX       128KB    arcsines, also for arccosines
   ATAN.HEX        64KB    ends at 1st cell of LOG2.HEX (next)
   LOG2.HEX       128KB    also for logarithms in other bases
   ALOG2.HEX      128KB    also for  antilogs  in other bases
   LOG2-A.HEX     128KB    logs of 1 to 1+65535/65536 (ie, 1.9999847), first range for LOG2(X+1) where X starts at 0
   ALOG2-A.HEX    128KB    antilogs of 0 to 65535/65536 (ie, .9999847), the first range for 2^x-1
   LOG2-B.HEX     128KB    logs of 1 to 1+65535/1,048,576 (ie, 1.06249905), a 16x zoom-in range for LOG2(X+1)
   ALOG2-B.HEX    128KB    antilogs of 0 to 65535/1,048,576 (ie, .06249905), a 16x zoom-in range for 2^x-1
   SQRT1.HEX       64KB    square roots,  8-bit truncated output
   SQRT2.HEX       64KB    square roots,  8-bit  rounded  output 
   SQRT3.HEX      128KB    square roots, 16-bit  rounded  output
   BITREV.HEX     128KB    set of bit-reversing tables, up to 14-bit, particularly useful for FFTs
   BITREV15.HEX   128KB    15-bit bit-reversing table (not included in EPROM)
   MULT.HEX       128KB    multiplication table like you had in 3rd grade, but up to 255x255

   MathTbls.zip            all the tables, zipped, including BITREV15.HEX which is not in the supplied EPROMs
   ROM0.HEX                a single Intel Hex file for ROM0 as I plan to supply it (also available zipped)
   ROM1.HEX                a single Intel Hex file for ROM1 as I plan to supply it (also available zipped)
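
The listing notes that the squares table is "partly for multiplication". One common way to multiply with a squares table is the identity a*b = ((a+b)^2 - (a-b)^2) / 4. The Python sketch below shows that idea for 8-bit operands with a small in-memory table; it is an illustration of the identity only and does not reproduce the layout or width of SQUARE.HEX.

```python
# Table of squares for 0..510, enough for the sum of two 8-bit operands.
SQUARES = [n * n for n in range(511)]

def mul_via_squares(a, b):
    """Multiply two 8-bit values with two table look-ups and a subtraction:
    a*b = ((a+b)^2 - (a-b)^2) / 4."""
    return (SQUARES[a + b] - SQUARES[abs(a - b)]) >> 2
```

The division by 4 is exact because (a+b) and (a-b) always have the same parity, so the difference of their squares is a multiple of 4.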

 

Edited by geek504

That is by far too much memory ... we only have 2 MB at most. Yes, that would be very fast; no, it would not work, because you could not do anything besides calculations anymore.

There are other efficient solutions that still rely partially on lookup tables, but that is too much.

Regardless of the size of the tables and how fast they make the 3D calculations, I cannot see something like Wolfenstein 3D or any filled graphics working on the X16. The areas have to be filled by the CPU, and filling an odd-shaped part of the screen in VRAM is not fast. It is even slower than filling an ordinary memory area, because you need to load the VERA ADDR registers frequently. Each change is a 3-byte write to the VERA registers for a single byte of data, IF the data cannot be reached by auto-increment. And that is most likely the case with any vector graphics.

Eventually you would end up copying the whole screen from a memory buffer to VERA every frame, and that alone will be relatively slow: an estimated 10 cycles per pixel in 256-color mode. In 16-color mode you can do 2 pixels in 10 cycles. That will still be 384,000 cycles per frame (320x240), which is roughly 1/20 second with the 65C02 running at 8 MHz. At that resolution the copy process alone could manage 20 fps. However, you also need to do all the heavy lifting (e.g. calculating the next image) and write all of that to normal RAM. That will cost about as much again (just the loop and the write) and will already halve your frame rate to 10 fps. And depending on the difficulty of the calculation ... there is really not much headroom.
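
The estimate above can be reproduced with a few lines of arithmetic. This Python sketch takes the 10 cycles per write from the post and assumes an 8 MHz CPU clock, which is the value that makes the 1/20 second figure come out; the function name is invented for this example.

```python
WIDTH, HEIGHT = 320, 240
CYCLES_PER_WRITE = 10        # estimate taken from the post
CPU_HZ = 8_000_000           # assumed 8 MHz 65C02

def copy_cost(pixels_per_write):
    """Return (cycles per frame, frames per second) for copying one full frame."""
    cycles = WIDTH * HEIGHT // pixels_per_write * CYCLES_PER_WRITE
    return cycles, CPU_HZ / cycles

print(copy_cost(1))   # 256-color mode: one pixel per byte written
print(copy_cost(2))   # 16-color mode: two pixels per byte written
```

In 256-color mode the copy alone already costs twice as many cycles, so the frame rate ceiling is half that of 16-color mode before any game logic runs.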

So you would need to reduce the viewport massively. Full Screen DOOM or Wolfenstein is not realistic. (I know you did not mention Full Screen). 

Edited by SerErris

14 minutes ago, SerErris said:

That is by far too much memory .. we only have max 2MB. Yes that would be very fast - no this is not working cause you cannot do anything anymore besides doing calculations. 

There are other efficient solutions, that still rely partially on lookup tables, but that is tooo much.

I don't think any one program would need all of these, unless it was a scientific calculator or really advanced spreadsheet. If you need to do one or two of these calculations very often, it would be totally worthwhile to take up the space in banked RAM.

19 minutes ago, SlithyMatt said:

I don't think any one program would need all of these, unless it was a scientific calculator or really advanced spreadsheet. If you need to do one or two of these calculations very often, it would be totally worthwhile to take up the space in banked RAM.

I was thinking to use an external card using Memory Map I/O akin to VERA chip with its own memory banks, i.e. the external 2MB RAM accessible by the 32 bytes I/O memory. The X16's RAM would be preserved.

I guess we can forget about shaded 3D but maybe a fast game using wireframe instead? Last resort... 200MHz 65c02!

sw2.jpg.067f25f80f735c8b7abeadd264b1349f.jpg

sw1.jpg.7744ca2e87911acbf2c366914999907b.jpg

27 minutes ago, SlithyMatt said:

I don't think any one program would need all of these, unless it was a scientific calculator or really advanced spreadsheet. If you need to do one or two of these calculations very often, it would be totally worthwhile to take up the space in banked RAM.

Sure, but even the smallest of the tables above is 64 KB ... so several banks (actually 8). And each lookup first needs to work out the bank, then switch to that bank, and then look up the value. Yes, it is possible, but also slower than expected. Larger tables will need even more banks.
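
The bank arithmetic described here can be sketched as follows. The Python snippet assumes the X16's 8 KB banked RAM window at $A000-$BFFF and a table stored in consecutive banks; the starting bank number `first_bank` and the function name are hypothetical choices for this illustration.

```python
BANK_SIZE = 8 * 1024         # size of one RAM bank (8 KB)
WINDOW_BASE = 0xA000         # banked RAM appears at $A000-$BFFF

def locate(index, first_bank=1):
    """Map a byte offset inside a large table to (bank number, CPU address)."""
    bank, offset = divmod(index, BANK_SIZE)
    return first_bank + bank, WINDOW_BASE + offset

print(64 * 1024 // BANK_SIZE)   # number of banks a 64 KB table occupies
print(locate(8192))             # first byte of the table's second bank
```

On the real machine the `divmod` step is cheap because the bank size is a power of two, but the bank-select write before every lookup is the extra cost mentioned above.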

12 minutes ago, geek504 said:

I was thinking to use an external card using Memory Map I/O akin to VERA chip with its own memory banks, i.e. the external 2MB RAM accessible by the 32 bytes I/O memory. The X16's RAM would be preserved.

 

Using the SD card ... not sure how long you would have to wait for a single load, and also not sure how that would even work. The SD card will most likely still emulate an I/O device. But maybe I am wrong. Even if you can access the SD card at block level, you need to set up the VERA registers, calculate which block you need, load the full block somewhere and then select the value inside the block. So most likely that is also not a good procedure in the sense of a fast lookup. It might even be slower than calculating the value. Maybe someone has a good idea how that could be implemented, but I cannot see a fast way. Of course, we do not know yet how access to the SD card through VERA will work.

 

14 minutes ago, SerErris said:

Using the SD card ... not sure how long you have to wait for a single load and also not sure how that even shall work.

Not using SD card nor using VERA. Another independent circuit inserted into one of X16's slots as per spec below:

Expansion

  • Four expansion slots with access to CPU databus
  • Each slot has its own 32-bytes of mapped RAM
  • 8 general-purpose I/O lines available (user port)


Ah ... got it. That might potentially work, but will be quite expensive. You need a card, and all of that and then the SD card itself (which will be minor cost obviously compared to the rest). But yes - might work.

7 minutes ago, SerErris said:

Ah ... got it. That might potentially work, but will be quite expensive. You need a card, and all of that and then the SD card itself (which will be minor cost obviously compared to the rest). But yes - might work.

What's wrong with good old ROM chips? Actually, EEPROMs would be better.


Thanks for posting the tables, Mr. Geek 504.  It is a Big Gulp for sure, but it's always nice to discuss options, be it ROM or a card or whatever.  Wolf3D on the X16.... whoo....

 

Did you say you were new to the Commodore Line of Stuff?  You do know about the KERNAL though, right?  And the X16's pseudo-16-bit routines?  And Bruce's variant of SWEET16?

 

Edited by rje

