Everything posted by Snickers11001001

  1. Wow, that's actually quite mature already. Is there a 'howto' to try it out on the X16 emulator? Is it a custom ROM or something?
  2. Love, love, love these projects. If I can posit any advice from the cheap seats (i.e., from someone like me who has absolutely no ability to make such a thing, but who is a user of BASIC and enjoys it), it would be this: there are things that BASIC needs, but not everything from 'gen 3' BASICs like Visual BASIC needs to be included or is feasible. Remember, at bottom, this platform is still an 8-bit machine with no prefetch, no branch prediction, no modern pipelining, no cache memory, almost no maths, and a 16-bit address space. It's running at 8 MHz, maybe, unless they (a) can't get it to work at that speed, in which case it will be a 4 MHz machine, or (b) release the faster X8 FPGA version (which won't have the banked memory, ZP space, or space in the pages below $0800 that the X16 has), and which probably won't be able to fit all your features anyway. Just take a look at the discussion near the end of the "BASIC 2 vs BASIC 7" thread about the impact of just a few more instructions in the byte fetching/parsing core between the C64 and the later Plus/4 and C128, in terms of the performance hit those later machines took for their better BASICs. If you're having to bank-switch, for example, surely it takes a hit. Tokenizing a lot of stuff inline (e.g., constants, jumps, variable memory locations) is a great idea, but I suggest a simple escape-code structure using byte codes.
Parser finds, say, the PETSCII code for '@' not inside quotes, and it knows the next two bytes are a small int in 16-bit signed format; it finds the PETSCII code for [english pound] (which looks a bit like a mutant F), and it knows the next 5 bytes are the exponent and mantissa for a float; it finds the token for 'GOTO' or 'GOSUB', and it knows the next two bytes are the actual 16-bit address of the jump destination in the code, instead of the PETSCII numeric representation of a line number; it finds the PETSCII code for '%', and it knows the next two bytes are the 16-bit address of the value, followed by the name of an int-style variable in memory. (At execution it just needs to fetch the value; during a LIST operation it grabs the variable name at that address+2 until it hits the terminator.) Yeah, OK, the modern way to do many of these things would be with a hash table, but I caution you to consider the performance impact on an 8-bit machine. If you use the idea of inlining 16-bit addresses for jump locations to speed up execution, of course, then there are other issues. With line numbers, your LIST routine needs only follow the 16-bit address, grab the line number at that address, and put it on the screen during a LIST; but with LABELS, you will need to set up a data structure (probably a linked list) that the interpreter can consult during LIST operations to regurgitate the labels or line numbers when the user lists the code, and that metadata has to get saved with the program. That's actually a better place to use banked memory... the performance cost of swapping banks is not as important when listing the code. I don't think it's feasible to tokenize at runtime; it needs to happen as you enter things.
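For what it's worth, the escape-byte idea can be sketched in a few lines of Python (the byte value and helper names here are mine, purely illustrative; $40 just happens to be PETSCII '@'):

```python
import struct

TOK_SMALLINT = 0x40   # illustrative escape code: '@' means "next 2 bytes are a 16-bit signed int"

def tokenize_int(n: int) -> bytes:
    """What the tokenizer might emit for a small integer constant."""
    return bytes([TOK_SMALLINT]) + struct.pack("<h", n)   # little-endian, like the 6502

def detokenize_int(buf: bytes, pos: int):
    """What LIST would do: see the escape byte, decode the inline value."""
    assert buf[pos] == TOK_SMALLINT
    (n,) = struct.unpack_from("<h", buf, pos + 1)
    return n, pos + 3   # decoded value, plus the offset where parsing resumes

tokens = tokenize_int(-1234)
value, nxt = detokenize_int(tokens, 0)   # -> -1234, resuming at offset 3
```

Execution never has to re-parse ASCII digits; LIST just reverses the encoding. The float and jump-address escapes would work the same way with 5- and 2-byte payloads.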
  3. Check out the links provided; they really do help. RND takes one of three kinds of parameter. RND(1) [or any positive parameter, generally] generates a random number between 0 and 1 based on the next 'seed' -- an internal number that is behind the maths of this pseudo-random number generator -- and updates the seed. RND(0) does the same thing but uses a byte of the internal clock as the seed. RND with a negative number [e.g., RND(-2.39997)] uses the supplied parameter as, in essence, the seed. Which means calling it with the same negative number gives the same random number between 0 and 1. So A=RND(1) is the basic function. But to get what you want, you have to do something with that random decimal value it spits out, which again is generated as a number greater than 0 and less than 1. If you want an integer (no decimal fraction) from 0 to 9, then reason it out: you need to use the INT() function too. You take that decimal number between 0 and 1 and multiply it by the highest number you want + 1 (assuming you want the entire range from 0 to the highest number). R=INT(RND(1)*10) will cause the variable 'R' to have a random number from 0 to 9. (Think about it: even if the RND() function itself returns .99999999, multiplying it by 10 gets you 9.999999, and taking the INT() of that returns a 9.) About the INT() function: it's not 'rounding' in the sense you might be used to, i.e., in the event the fractional component is greater than .5 it does not 'round up'. Instead, INT() is a 'floor' style rounding, which means the output is always less than or equal to the input. With a positive number, say INT(9.9999), it returns the largest integer not greater than the value supplied, which is to say it disregards the decimal fraction, i.e., it returns 9 in this example. But with a negative number, say INT(-10.999), it ALSO returns the largest integer not greater than the value supplied, i.e., -11 here.
All of that is why INT(RND(1)*10) gives you a value between 0 and 9. So, if you want it to return two random single-digit numbers -- let's say you want to put them in variables to use later -- you need two calls of this function:
10 N1=INT(RND(1)*10)
20 N2=INT(RND(1)*10)
30 PRINT N1, N2
EDIT: Just read more carefully and saw your thing about wanting either a 1 or 2... that means you multiply by the DIFFERENCE between the lowest number and highest number you want, +1. To have a lower boundary you just add it as the base to your calculations, i.e.:
N1 = 1+INT(RND(1)*2)
N2 = 1+INT(RND(1)*2)
To test these, try this FOR/NEXT loop from the command line:
FOR L=1 TO 10: PRINT 1+INT(RND(1)*2):NEXT
After it prints its sequence of 1s and 2s, scroll your cursor up and hit enter on the same command again and watch the sequence change. Voilà. Pseudo-randoms of either 1 or 2. EDIT: Geez, I was pretty sloppy in answering this... so I went in and fixed some typos and added clarity. Substance is the same, it just reads better.
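The same scaling rule works anywhere, not just in BASIC; here's a quick Python analogue (the function name is mine, just for illustration):

```python
import random

def rnd_range(low: int, high: int, rnd=random.random) -> int:
    """low + INT(RND(1) * (high - low + 1)), as reasoned out above."""
    return low + int(rnd() * (high - low + 1))

# Even the extreme outputs of the generator stay inside the requested range:
assert rnd_range(0, 9, rnd=lambda: 0.9999999) == 9   # .9999999*10 -> 9.999999 -> 9
assert rnd_range(1, 2, rnd=lambda: 0.0) == 1         # the lower bound is the base
samples = {rnd_range(1, 2) for _ in range(1000)}     # only ever 1s and 2s
```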
  4. I'm glad you're still working on this. I don't care what happens with the X8/X16 thing/release. I will play this game on the emulator you specify if that's all that's available. Looks cool, and it's still really neat to watch someone move a project along like this in sorta real time.
  5. Strider... good one. I remember watching some play-throughs of the NES version on youtube some time ago, and laughing at what the 'changes in battle' effect did to their video compression. Artifacts galore! LOL.
  6. As long as it fits in 256 bytes! Now you're triggering the dark underbelly of nostalgia, LOL! I remember a buddy gnashing his teeth for days trying to fit a sprite animator thing he was making/adapting into the tape buffer on his C64. I think he wound up throwing away the custom character set for his game and using the C64 RAM at $C000, since it was 4K of space.
  7. I don't think there would 'be' any other 8K. As I understand it (and I should emphasize, as Bruce mentioned, this is all unofficial, the result of surmises and speculation, given we don't have formal docs of X8 functionality), there's no "RAM under ROM" beneath the BASIC and KERNAL on the X8, so I don't see how there'd be an alternative 'page 1' of RAM at $A000 to $BFFF. There's only 128K of RAM for the entire FPGA: 64K of it is VERA video memory, and the other 64K is mapped into the 65C02 address space. If the stuff at $A000 to $BFFF is 'KERNAL buffers and CBDOS' stuff, then that is what exists at that location. There's nowhere else that could hold 8K of what the X16 maps in as 'bank 1' RAM for that range. It's really only a headache for those who use BASIC and want to use helpers / wedges etc. I would presume that if you don't need the KERNAL or BASIC, you can use their address space as RAM (and absent the KERNAL's need for $A000 to $BFFF you could use that too)... but I must be clear that's just a presumption on my part. The X8 FPGA could be set up so its logic simply does not permit writes to those ranges. I have no idea how to read Verilog, and the sources are 6 months old anyhow. And don't forget that even though VERA is reduced to 64K of video RAM on the X8, if you're not using all of it for display, some of it can be a convenient place to park 'pure data' -- sounds, gfx, music data, etc. -- especially with what should be fairly easy access through that 256-byte window. In the end, I'll buy the X8 if it's all that comes out, but I'm personally cheering for and wanting to hold out for the X16 as it was spec'd -- whether in through-hole or FPGA only -- for many reasons.
  8. Sheesh. So on the X8, we would lose the ability to put ML code for BASIC helpers/wedges at $0400 to $04FF, $0500 to $05FF, and part of the page at $0700. Additionally, since there's no longer banked RAM, the 8K at $A000 to $BFFF -- which on the X16 is 'page 0' of the banked RAM, reserved 'for KERNAL/CBDOS variables and buffers' -- is the only RAM in that range. So, uh, if you want to include a machine code routine as a BASIC helper (such as sound effects like the simplest sound library, a little routine to handle collision detection, your own interrupt handler for music, or a wedge to implement some currently unavailable bitmap stuff from BASIC (e.g., circle, flood fill)), where do we put it? Sounds like it has to go in the 39K of actual BASIC memory now. Anyone know the 'POKEs' for the X16/X8 to move down the top-of-BASIC-memory pointer, like addresses $37 and $38 on the C64?!
  9. I've been going through Matt's videos. I am _not_ an assembly guru or even passably good at it, so forgive me if I'm all wet. But there are two data ports on VERA (well, at least the X16 VERA; the X8 is a whole different ball of wax, apparently). You have to set things up to point to the VERA address and port you want, and the stride. The VERA memory range includes (a) the video memory; (b) the PSG registers; and (c) the sprite registers (also the palette). Three different activities in the structure of a game's code, and only two ports into that space, which require set-up to read/write. That's all I meant, and that is the sort of contention I had in mind when I wrote that. It seemed to me that if you've got a routine using VERA data ports 0 and 1 to move some video data in preparation for a scroll or context change, or to work the VERA sprite registers, and then an interrupt for music fires, the music routine must include code to save which data port was selected, the VERA address that port had been pointing at, and the stride value, and then restore all of this before exiting back to regular execution. Otherwise, when the program flow gets back to what was happening before the audio code, the data port/stride stuff will have wrong values. Seems to me you'd have to store/restore $9F20, $9F21, $9F22, and $9F25 at least. Looks like those cover the L, M, H portions of the VERA address, the inc/dec stuff, and the port select bit. So to store, that's 4 LDA absolutes at 4 cycles each, plus 4 STAs at 4 cycles each (3 each if you have 4 ZP addresses you can set aside as temp storage for these, or I guess you could push them on the stack if you're sure it will have room), and the same when your music handler is ready to exit -- the reverse to put everything back. Reading through this now, I realize I'm stuck in the 'long ago,' thinking at C64 CPU speeds and the old 'cycles per refresh' thing, but those cycles are probably of negligible impact with the 8 MHz clock on the X16.
Still, until I heard the extra timing considerations on the YM chip in this thread, I had thought it might be easier to just have a music routine that needn't concern itself with saving/restoring VERA state info and just play the dang music. Obviously very naïve thinking, as it turns out. I meant no offense at all at what you're doing, and I thought I made it clear that I love this demo. I hope my ramblings weren't taken as any sort of criticism. Cheers.
  10. Do you have any links to the source of those addresses/pages on the X8? Is the memory at page $0600 affected? What 'part' of page $0700 is also spoken for?
  11. This is awesome! Great work. How well do you think this can mesh with using VERA for moving sprites, moving/scrolling tiles, and updating lots of video data? I think that's been my concern about using VERA for music and not just sound effects: contention and resource traffic jams, especially in a game. The nice thing about the YM chip, it has seemed to me in theory, was the possibility of having some nice interrupt-driven music that did not need to use a bunch of cycles each interrupt storing and restoring all the VERA registers and state information. I'll be following this with interest!
  12. Absolutely disgusting. Short version: it's tulip mania (on purpose), designed to puff up a bubble...
  13. I'm working through Matt's videos, and the format would be hard to follow -- stop, pause, 'oh, what did that say?', or 'dang it, I didn't catch that part of the code on the screen, gotta rewind' -- except that he also did some color coding which he incorporated into his scripts, and then includes links to the code etc. in the description. That's pretty helpful. I think with any sort of code, you've got to have a written form that folks can look at and refer to beyond just video, especially if they want to go back and look later while they're trying something out. BTW, don't use the 'convert/optimize' thread I did as any sort of a template. I was pretty good at BASIC back in the day, but then had a career in a completely non-computer field before retiring. This year I picked up BASIC again for the first time, and the optimization thread was literally me writing it up AS I was going through the process of trying to figure out how to implement optimization ideas I had spotted. (I revisited it again recently, as my game project had to go on hiatus until the 'change in product direction' kerfuffle sorts itself out.) But that entire exercise would have been better and more concise if I had planned and outlined it more from the start, and then applied the 20% rule (i.e., 'cut 20% from your first draft no matter what' to force brevity and focus) before putting everything up. As it was, I had some notes on things to try and a sort of order, and just typed right into the forum 'new post' window until I covered what I wanted, before making a few screenshots, and that was it. No editorial process at all, and it probably shows. Looking forward to your series.
  14. EDITED: Well, I tried both the Plus/4 and C128 emulators and, on both, the following worked fine:
10 A$=""
20 A=ASC(A$)
30 PRINT A
So, just a matter of tracing the assembly code for ASC() and seeing where the difference is. ('just,' hahaha)
  15. I always overcomplicated things! My approach back in the day was to do it all in one line for the read, null handling, and conversion to a number:
20 GET#8, A$: A=1+(A$=""):IF A THEN A=ASC(A$)
Funny how there are so many different ways. BTW, I have no idea how mine performed; it never mattered when you knew you were waiting on the 1541!
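The trick in that one-liner is that CBM BASIC comparisons evaluate to -1 for true and 0 for false, and GET# hands back an empty string for a CHR$(0) byte. A small Python sketch of the same logic (the function name is mine, just for illustration):

```python
def byte_value(a_string: str) -> int:
    """Mimic A=1+(A$=""):IF A THEN A=ASC(A$) from the one-liner above.

    1+(A$="") is 1+(-1)=0 for an empty string and 1+0=1 otherwise, so
    ASC() only runs on a non-empty string -- and an empty read conveniently
    ends up as the byte value 0 anyway.
    """
    a = 1 + (-1 if a_string == "" else 0)
    if a:
        a = ord(a_string[0])
    return a

assert byte_value("") == 0     # the 'null' case: no ASC() call, value stays 0
assert byte_value("A") == 65   # the normal case: ASC("A")
```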
  16. All this talk about an app store, software development leading sales, etc., all that... It seems almost chimerical. Especially in this thread, since the whole point is that the core architectural details are potentially in transition or at least may include an additional platform.
  17. That would be pretty rough. I make typos enough that I cringe to think of debugging that.
  18. So.... I was browsing trigonometry-tricks sites. This was actually research for another project. But in the course of reading through all that stuff, I learned there's a thing called the cosine double-angle identity, and its possible application here was immediately apparent. The identity provides that for any angle 'n', the value of cosine(2*n) is equivalent to the value of (2*cosine(n)^2)-1. That's sort of interesting, because the cosine stack in my demo has both COS(n) and COS(n+n). To implement an optimization based on this, the result of the expression I've called the precursor has to be prepared (call it 'n'). Then you get COS(n) and store it, let's say to 'X' [i.e., X=COS(n)]. Then the cosine stack becomes (X+2*X*X-1+COS(5*n)). Now there are only two COS() operations per cosine-stack evaluation, instead of three. But the question is: is a single cosine operation in that stack so slow that it would potentially save time to replace it with a more convoluted expression that requires an extra variable store, two extra multiplications, and several extra variable fetches? Could all that additional work nevertheless be faster? Apparently, yes. Implementing the double-angle identity to get rid of one of the COS() calls actually did cut the 'precomputation' part of the revised Proteus Demo down from 1 minute and 39 seconds to 1 minute and 25 seconds, which brought the overall time down to 3:32 from 3:47. Despite increasing the number of BASIC operations, the total cycles for all those added operations still wound up being less than the cycles incurred by a single COS() operation. So, sort of interesting, eh?
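The substitution is easy to sanity-check outside of BASIC. A quick Python sketch (function names are mine, purely illustrative) confirming the rewritten stack matches the original:

```python
import math

def cos_stack_direct(n: float) -> float:
    """The original stack: COS(n)+COS(n+n)+COS(5*n) -- three COS() calls."""
    return math.cos(n) + math.cos(n + n) + math.cos(5 * n)

def cos_stack_identity(n: float) -> float:
    """Same value via cos(2n) = 2*cos(n)^2 - 1, so COS() runs only twice."""
    x = math.cos(n)                          # X = COS(n), stored and reused
    return x + 2 * x * x - 1 + math.cos(5 * n)

# Agrees everywhere, including at the demo's magic-number scale of 0.0327:
for n in (0.0, 0.0327, 1.5, -2.25, 144 * 0.0327):
    assert math.isclose(cos_stack_direct(n), cos_stack_identity(n))
```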
  19. As someone put it in the comments, 4 and a half minutes of absolute proof the 80s were the best evah! Man, it's my childhood and teen years, in terms of the pop culture we all knew and loved, boiled down and distilled into a hell of a potent nostalgia fix.
  20. It's not an 80s 'meme,' but this video will make you grin from ear to ear if you're an 80s kid.
  21. Here's the newest version 2.0 with the precalc. Let me know if you want a copy/paste of what I stuck in the downloads -- i.e., the three-in-one combo with the slow, fast, and fastest versions. Cheers.
1 X=.: Y=.:I=.:G=199:T=.:U=.:F=91:J=.:D=.0327:C=20:O=.:A%=.:A=144:B=2.25
2 E=160:K=A*A:L=.5:DIMQ%(64,A):SCREEN$80:TI$="000000":GOSUB8
3 FORO=-ATOASTEPB:T=O/B:U=T+E:A%=L+SQR(K-O*O):J=ABS(T):FORI=-ATOA%
4 X=I+U:IFX>=.THENY=G-(Q%(J,ABS(I))-T+F):IFY<GTHENLINEX,Y,X,G,1:PSETX,Y,.
5 NEXT:NEXT
6 PRINT"\X97\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11\X11 TIME:"TI$"\X13\X11\X11"TAB(23)"THE PROTEUS DEMO"
7 FORI=-1TO0:GETK$:I=(K$=""):NEXT:SCREEN 0:COLOR1,6:CLS:LIST:END
8 PRINT"\X11\X1D\X1D\X1D\X97*PRE-COMPUTING FAST LOOKUP TABLE*":J=D:U=C:FORT=.TO64:X=T*B:Y=X*X
9 FORI=.TOA:X=SQR(Y+I*I)*J:Q%(T,I)=U*(COS(X)+COS(X+X)+COS(5*X)):NEXT:X=4*T+27
10 RECTX,C,X+2,27,11:NEXT:SCREEN $80: RETURN
  22. Proteus View File This is a single program listing that contains three different versions of the Proteus Demo from my thread in HOWTOs on converting and optimizing BASIC programs. Version 0.2 was from very early in the conversion/optimization process in that thread and, consequently, is quite slow. It's also got the original author's bad scaling coefficients, which make the output look a little gnarly. Version 1.0 was originally what I thought would be the 'fully optimized' version, with all sorts of things (documented in the thread) done to squeeze out better performance. And of course the scaling is fixed, so the output looks better. This was supposed to be the end of the thread, except that... Version 2.0 takes advantage of something I noticed about the calculations, which led to the idea of having the program start off by precomputing a table that lets us avoid having BASIC redundantly perform a bunch of the most expensive operations in the program. Even considering (and counting) the minute-and-a-half-plus it takes to initially compute the lookup table, the trick resulted in nearly halving the time to complete plotting the output compared to the previous fastest version. (Of course, I now wonder if some of the better math and coding gurus have been rolling their eyes all along, just wondering when, if ever, I might figure this part out...) Just RUN it and pick A, B or C from the menu. When it's done plotting and puts up the elapsed time (it's in HHMMSS format), you can press any key, and it will LIST the program lines that correspond to the version that just completed plotting. I hope folks find it helpful having the three versions (and the howto thread) in terms of seeing how the thing evolved. You will notice that the more tweaking you do for speed, the more opaque and confusing the program becomes in terms of ever expecting someone with fresh eyes to see what the heck is going on.
That 'early' version is included, in part, because it's much easier to follow than the others. More info at the thread here: Submitter Snickers11001001 Submitted 08/27/21 Category Demos
  23. Version 2.1; 1.0; 0.2

    17 downloads

  24. Well, yesterday was longer than I expected. Sheesh. I don't like medical stuff, but it's especially annoying when you have a thing that gets bumped several hours at a time because the docs are all in the ER working on a motor vehicle accident. Oh, and "no, you still cannot eat or drink anything but water, sir." Geez, just cancel my thing and let's reschedule, but don't keep me in there extra hours just waiting. Ah, well, I guess it's medicine in a medium-sized city in 'current year' USA. I'm very sorry I didn't get back until late, and then had errands and other things today, so I had to delay writing this up. Before I start... Scott: awesome work on some benchmarking for variable cost/overhead. One possible addition would be a line (before everything else) where you initialize a bunch of dummy scalar variables (DA,DB,DC,DD,DE,DF,DG,DH, etc.). You can use DIM to initialize regular/scalar variables -- just call the command with the variables separated by commas, e.g., DIM DA,DB,DC,DD,DE -- which should let you fit 20+ dummy variables on one line. You can REM it out to duplicate the results you already have, and then un-REM the line and see how scalar variable performance changes when the ones in your benchmarking routines effectively become later-initialized scalars and have to do the 'nope, nope' stuff in view of all the others earlier in the pecking order.... Anyway, now to the thought process and the optimization that got the time of the 'Proteus' demo down to 3 minutes and 47 seconds.... As I said above, the key was looking at the "precursor" expression that creates the value that goes into the COS() functions in the cosine stack (with the result of all that, in turn, getting tweaked into the 'y' pixel coordinate). EDITED: Yes, I know what I call the 'precursor' is the 'angle' that gets fed to the cosine functions.
But it's not an angle, not really, because if the original author wanted to convert degrees to radians (s)he ought to have multiplied by pi/180 (approx. 0.01745), and the coefficient used was 0.0327, which is not an even multiple of that or anything. It's just a magic number the original author fiddled with until the output matched what was wanted. So call it an angle if you wish, but it's just the 'precursor' to me! At any rate... In the "0652" time version of the demo above, that precursor was at the beginning of line 4: Y=SQR(J+I*I)*D. We obfuscated it a bit in prior optimizations by kicking part of the calculation up into the outer loop (J=O*O), but in essence the expression generates the square root of the sum of two squares and then multiplies the result by a magic number the original author came up with to get everything to work on the screen the way (s)he wanted. The things that are squared are the values of our two loop indexing variables (outer 'O' and inner 'I'). My brainwave was to look at this and think about a basic mathematical identity: N squared is equal to (-N) squared. A positive times a positive is a positive; a negative times a negative is also a positive.... This is helpful because each of the main loop indexing variables swings between negative and positive. The outer loop runs from -144 to 144 with a step of 2.25, so there are 129 different values of 'O' as it iterates. Digging deeper, that step increment is an even divisor into 144, which means that indexing variable will have 64 negative values, the 0 value, and 64 positive values that are the SAME (save for the change in sign) as the 64 negative values. But that means the outer loop variable effectively makes only 65 possible different contributions to that square-root-of-the-sum-of-two-squares expression. Whether 'O' is 144 or -144, -141.75 or 141.75, the value of 'O' participates in the 'precursor' expression only by being multiplied by itself.
The sign of 'O' is effectively rendered immaterial for THAT part of the calculations in this program. Likewise, the inner loop runs from -144 to some varying positive number calculated from the value of the outer loop indexing variable, with an increment step of 1. The way the endpoint is calculated, it cannot be higher than +144. Once again, that means (keeping in mind the loop also swings through the 0 value while iterating) that although the inner loop runs up to 288 iterations, the inner loop indexing variable can only supply 145 different contributions to that 'square root of the sum of two squares' expression... The sign of the value of 'I' is immaterial to the precursor-to-cosine-stack combo, since (-I)*(-I) is the same as I*I. Synthesizing all this, we start to see something: our inner loop has been calculating the extremely expensive combination of what I've been calling the 'precursor' and that triple cosine stack (and a couple of multiplication operations into the bargain) to get the 'y' coordinate just under 32,000 times over the course of running the program -- i.e., whenever the more cheaply calculated 'x' coordinate lands within the valid coordinates of the bitmap screen. HOWEVER, our examination above has just demonstrated that the calculations in the "evaluate precursor, feed it to the cosine stack" sequence are always going to be performed on what can effectively be treated as a maximum of 9425 (65 x 145) combinations of inputs. It means we might be able to make a lookup table! A 65x145 table. But whether we can implement such a table, and what we put in it, is limited by memory constraints. For example, if it turns out that we MUST store floating point values for this to work, we cannot use a regular array, and this idea probably dies: a float array takes 5 bytes per entry. So, um, yikes! That's 9425 * 5 = 47,125 bytes, which is way WAY more than the 39K of memory available to BASIC.
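The counting argument (and the memory arithmetic) is easy to double-check. A quick Python sketch, just mirroring the demo's loop bounds:

```python
# Outer loop: -144 TO 144 STEP 2.25 -> 129 values; inner loop: -144 TO 144 STEP 1.
outer = [-144 + k * 2.25 for k in range(129)]
inner = range(-144, 145)

distinct_outer = {abs(o) for o in outer}   # sign squared away: 65 distinct values
distinct_inner = {abs(i) for i in inner}   # likewise: 145 distinct values
entries = len(distinct_outer) * len(distinct_inner)   # 65 * 145 = 9425 table cells

float_bytes = entries * 5   # 5-byte CBM floats: 47,125 bytes -- blows past 39K
int_bytes = entries * 2     # 2-byte integers:   18,850 bytes -- under 20K, fits
```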
I suppose we could make an artificial fixed-point array using POKEs/PEEKs, storing two bytes in memory for each value: one for the integer portion and one for the fractional portion. But frankly, it seems to me that using BASIC to calculate, store, and fetch both pieces might be too costly, since all our 'overhead' would have to be done in BASIC. (It gives me an idea for a USR() routine, however, for some future project.) I didn't bother trying that, because it turns out we can store our 9425 values in an integer array. In C64-style BASIC, that means we have a range of -32768 to 32767 for each value, and it takes only 2 bytes of memory per entry. That's less than 20K of memory cost here. We can do that. But... does it actually help us to only be able to store integers? To see about this, let's reason it out. Looking at the plotting sequence as a whole, it is apparent that the 'y' pixel coordinate is calculated and left (as one of my updates above mentions) as a float, and fed to the graphics commands as-is; the commands simply disregard the fractional portion. So can we throw out the fractional portion earlier? Let's see. The final 'y' coordinate for a pixel location is evaluated as: Y=G-({n}-T+F), where {n} is the result of that 'precursor' and cosine stack. 'F' is a constant that acts like the "vertical position" knob on an old analog TV -- changing it moves the entire image up or down within the bitmap. So far, I've kept it at the original author's suggested value of 90. 'T' is the outer loop index value 'O' divided by the step increment 'B' (constant of 2.25), which is to say T ranges from -64 to +64. OH! LOOK! Now we see how, despite the sign of the outer loop indexing variable being discarded in the calculation of the precursor expression, the sign (pos/neg) of 'O' nevertheless DOES matter and plays into the ultimate 'y' pixel coordinate, because T=O/B gives a different sign depending on whether 'O' is positive or negative. Nice.
That also tells us that what I refer to as {n} above (the result of the precursor and cosine stack) is the thing we should store in our table, if possible. OK, so reasoning further: the result from the precursor/cosine stack (including the multiplication by the original author's constant 'C') is a float. It is a small float that we can be sure will not exceed BASIC's valid integer variable range of -32768 to 32767. We know this because we can see that after the tweaks -- subtracting 'T' (a value in the range -64 to 64) and adding 'F' (const. of 90), and then taking the result and subtracting it from 'G' (const. 199) -- it will, most of the time, yield a valid vertical pixel coordinate between 0 and 199. That's great news. It means we can throw that {n} into our integer table. BUT. Doing so means discarding the fractional portion earlier than normal. Here's what that does, and what we need to do to deal with it. Suppose that prior to using our table (i.e., in the earlier program versions) some {n} value, after getting tweaked by 'T' and 'F', winds up yielding a result of 19.378. That number then got subtracted from 199 to give a 'y' (vertical) pixel coordinate of 179.622. Feeding that to our graphics commands resulted in the fractional component being disregarded, meaning a pixel would be plotted at vertical position 179. But now suppose we stick the result of {n} into an integer array variable. It loses its fractional component right at that moment. Which means that same value, after adjustment by 'T' and 'F', winds up returning 19 instead of 19.378. And subtracting 19 from 199 yields a vertical pixel coordinate of 180. It's off by one. That's the only problem. We need to subtract an additional 1 from 199 to get the 'correct' pixel coordinate, due to the 'premature' removal of the fractional component when we use the table. Remember that 'F' constant I just mentioned (the one that's like a 'vertical position' setting)?
We will change it from 90 to 91, and that will do the trick. Well, that's it. That's how it works. I'll put up a screenshot of the listing tomorrow. Also, I'm going to put up a combined 'final' program in the Demos section, probably with a little menu that lets you pick between three versions of this sucker. I'll try to include one of the very early versions from this thread (if I can find one I saved), the 1.0 version that was the best performer prior to this breakthrough, and then this version 2.0 update. Having all three in one program will hopefully help folks make a close study of the listing and see how things evolved. [EDITED: OK, the 3-in-1 demo is in 'Demos' in the downloads section.] One final word: this latest optimization really improved performance, but only by eating literally HALF of all the memory available to BASIC. That's often a thing... sometimes you can gain a lot of speed by throwing memory at a problem. And for a free-standing demo like this, hey... whatever works. Many famous C64 demo-scene machine language programs pre-calculate all sorts of stuff using every possible available byte of memory. But if this were just a small part of another program, it would clearly be a bridge too far. That said, given what the values in the table look like, I have an idea for an alternative implementation that might be able to get away with using only 10K of memory... but there's no way it would be this fast. Cheers! ETA: Here's that listing....
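The off-by-one reasoning above can also be checked with a quick sketch (Python, with math.floor standing in for BASIC's floor-style INT(); the numbers are the example values from the post, and T=0 is just an arbitrary pick):

```python
import math

G, T = 199, 0             # G is the screen-bottom constant; pick T = 0 for the example
n = 19.378 + T - 90       # choose {n} so that ({n} - T + 90) comes out to 19.378

# Earlier versions: {n} stays a float, truncation happens only at the very end.
y_old = math.floor(G - (n - T + 90))       # floor(179.622) -> 179

# Table version: storing {n} in an integer array floors it immediately.
y_early = G - (math.floor(n) - T + 90)     # -> 180, off by one

# The fix from the post: bump the 'vertical position' constant F from 90 to 91.
y_fixed = G - (math.floor(n) - T + 91)     # -> 179 again
```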