kliepatsch

Members
  • Content Count: 116
  • Joined
  • Last visited
  • Days Won: 1

kliepatsch last won the day on May 31

kliepatsch had the most liked content!

Community Reputation: 97 Excellent


  1. Well, I could actually try this "branch before doing the expensive work" thing in the image-to-double-PETSCII converter (somewhere on this forum) and see if it makes any difference. Thanks for that.
  2. That's a nice description. Yes, slightly "blurry" in the sense that the image is made out of patches of color.
  3. Heya, progress here. I managed to port the search over to the GPU. Search time went from 10-15 minutes down to 15 seconds for a full-on brute-force double PETSCII search. To run it, you need a CUDA-capable GPU, the CUDA toolkit installed, and, in addition to the aforementioned Python packages, Numba. Some CUDA-capable GPUs are not listed on the official CUDA website simply because they are older... mine, for instance, is not listed as CUDA-capable, yet did the job just fine: Nvidia GeForce MX130. It was a pain to get it working, but it was a nice challenge to try to get the maximum computing power out of my machine. Find the script attached. double_petscii_brute_force_8.py
  4. I finally got around to trying to make custom instruments and some more music with the MPL. Ctrl+C & Ctrl+V it into the emulator. HARPY.TXT
  5. +1 on a note-to-frequency table. Many apps will use it, and the numerical values will be pretty much the same everywhere. I would vote for having all the low frequency bytes in an array at indices 0...127 and all the corresponding high frequency bytes at 128...255, so that you don't need to multiply the note value by 2 to get the correct index, but can access both via LDA freq_lo, y and LDA freq_hi, y. But that's a matter of taste. Either way, such a table would be nice. (I have to mention that Concerto uses more than 128 valid note values, so the whole table requires more than 256 bytes, which makes the aforementioned separation into low and high bytes strictly necessary to be able to access all entries.)
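To illustrate the lo/hi split layout, here is a minimal Python sketch that builds such a 256-byte table. The tuning reference (note 69 = A4 = 440 Hz) and the VERA PSG frequency-word scaling (word = f_Hz * 2^17 / 48828.125) are my assumptions for the example, not taken from the post.

```python
# Build a note-to-frequency table laid out as suggested above:
# low bytes at indices 0..127, high bytes at 128..255, so both halves
# are reachable with LDA freq_lo,y / LDA freq_hi,y without doubling y.
# Assumptions: 128 equal-tempered notes, note 69 = A4 = 440 Hz, and
# VERA's PSG frequency-word scaling (word = f_Hz * 2**17 / 48828.125).

def build_freq_table(a4=440.0):
    words = []
    for note in range(128):
        f_hz = a4 * 2 ** ((note - 69) / 12)
        word = min(round(f_hz * 2**17 / 48828.125), 0xFFFF)
        words.append(word)
    lo = [w & 0xFF for w in words]
    hi = [w >> 8 for w in words]
    return lo + hi  # 256 bytes: freq_lo at 0..127, freq_hi at 128..255

table = build_freq_table()
# Note value y is then read as lo = table[y], hi = table[128 + y].
```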
  6. Yes, I think the idea was to have a bigger version of the SID. But the VERA's PSG and the SID differ in a few aspects. Most obviously, the number of voices: the SID has 3, the PSG has 16. Volume control: the SID has analog ADSR envelopes. They sound nice and smooth, but they aren't very flexible, and the maximum volume cannot be changed. The PSG is more flexible: you can simply set a number as the voice's volume. This flexibility comes at the cost that the volume has to be updated manually. And the volume control is kinda coarse, so sometimes you can hear when the volume is updated to an adjacent level. Those are the two points where the PSG beats the SID. The other points all go to the SID. It has analog sound generation, while the PSG is fully digital. The difference becomes noticeable at high-pitched sounds, where the PSG output can start to sound really unpleasant due to aliasing. The PSG also has no ring modulation, and, most importantly, NO FILTER. I think those are the most important points. But let's not forget that the X16 also comes with an FM chip, which nicely complements the PSG and can't really be compared to the SID.
  7. One could also do the math on how much one needs to pitch up a sample at, say, 20 kHz, so that it has the original pitch when played back at the target sample rate. If I am not mistaken, the pitch in semitones should be st = log_2(f_e/f_t) * 12, where f_e is the sampling frequency you are exporting at and f_t is the sampling frequency of your target platform. The log_2 gives the pitch in octaves; multiply it by 12 to get semitones. Does that make sense?
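The formula works out like this in a couple of lines of Python (the function name and the worked example are mine, for illustration):

```python
import math

# Semitone offset from the formula above: st = log2(f_e / f_t) * 12,
# where f_e is the export sample rate and f_t the target platform's rate.
def pitch_shift_semitones(f_export, f_target):
    return math.log2(f_export / f_target) * 12
```

For example, a sample exported at 20 kHz but played back at 10 kHz runs at half speed, so it must be pitched up by log2(2) * 12 = 12 semitones (one octave) before export.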
  8. True. Didn't think of that.
  9. I guess not. The software that comes with the system is stored in read-only memory (ROM) and contains the Kernal, BASIC and perhaps other utilities. Petscii Robots is designed to be loaded into RAM and run from there. Most notably, it is larger than 16 kB (even the shareware version is more than 18 kB, so the full version will be bigger still: https://www.commanderx16.com/forum/index.php?/files/file/156-attack-of-the-petscii-robots-shareware/). Since ROM only occupies 16 kB of address space, it would be very hard to redesign the game to run out of ROM.
  10. Thanks. Yes, the input image is scaled down (and cropped if necessary) to 640x480 pixels. It's then subdivided into 8x8 pixel areas, and the best match for each 8x8 tile is sought. My implementation does all the 8x8 areas at once, making heavy use of NumPy expressions. NumPy is one of the most important Python libraries, if not the most important, offering highly optimized numerical routines and great flexibility. Yes, I also thought about using the GPU but haven't tried it yet; it could provide a significant speedup. The operations that do the heavy lifting are mostly arithmetic, with few comparisons and little branching, so it should be possible to accelerate the method with a GPU, I think. I also think it would be desirable to find an algorithm that finds decent solutions with a fraction of the computational effort. My idea is a two-stage algorithm that tries all the foreground bitmasks, selects a couple of them, say 10, according to some clever criteria, and then systematically tries all background combinations for each of the 10 foreground candidates. This would already reduce the computational effort by a factor of 20 to 25 over the dumb brute-force method.
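The proposed two-stage pruning could look something like this on a toy single-tile grayscale model. Everything here is an illustrative assumption, not the actual script: the charset is just a list of 8x8 boolean bitmasks, and the stage-1 "clever criterion" is a cheap two-region fit.

```python
import numpy as np

def fit_error(tile, masks):
    # Sum of squared errors when each region in `masks` is
    # approximated by its own mean value.
    err = 0.0
    for m in masks:
        if m.any():
            region = tile[m]
            err += ((region - region.mean()) ** 2).sum()
    return err

def two_stage_search(tile, charset, K=10):
    # Stage 1: rank every foreground character by a cheap 2-region fit
    # (pixels under the mask vs. pixels outside it) and keep the top K.
    stage1 = [fit_error(tile, [c, ~c]) for c in charset]
    candidates = np.argsort(stage1)[:K]
    # Stage 2: exhaustive background search, but only for the K survivors.
    best = (np.inf, None, None)
    for i in candidates:
        c1 = charset[i]
        for j, c2 in enumerate(charset):
            # Three visible regions: fg char, bg char's fg, bg char's bg.
            err = fit_error(tile, [c1, ~c1 & c2, ~c1 & ~c2])
            if err < best[0]:
                best = (err, i, j)
    return best
```

With N characters, stage 2 costs K*N full evaluations instead of N*N, which is where the claimed factor of roughly N/K comes from.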
  11. I doubt the limiting factor is data transfer between the processes, as the parallelization is almost trivial. In fact, I think each process holds a copy of all the relevant variables (I think they are even initialized in every process, e.g. the image is loaded, the palette determined, etc., but I am not totally sure about that) -- so no communication is necessary except for the final results. I have tried parallelization with the "ray" library, which claims to solve issues with large numeric datasets and accessing identical data from multiple processes, but it didn't help. I have also thought about splitting the problem up differently, i.e. doing more in-place computation and fewer memory-intensive tasks. One obvious way to achieve this would be to compute each 8x8 pixel tile individually. That way, all necessary variables could fit into the CPU cache, so memory accesses should be a lot faster. With that in mind, I ran a few tests on a very simple problem earlier, but couldn't find any noticeable difference between different ways of splitting it up. I guess the present version is roughly as efficient as I can make it, and improvements will come from better algorithms. Well, if there *is* an example where you can actually see different performance for different ways of splitting it up, I will be happy to see it!
  12. What times can you get if you increase the "N_processes" variable? Or did you already do that to achieve that time?
  13. The foreground layer can use any of the 256 colors, while the background layer can only use the first 16 colors; both foreground and background must be set. My script determines a palette and finds the best solutions using that palette. Improvements could be made by finding a better/different palette. See a few examples below:
  14. Hi, following up on the brief discussion in this thread, I wanted to see how far I can get with image to double PETSCII conversion. As a first step, I wrote a converter to single PETSCII in Python. It works very well and takes about 6 seconds on my machine to find a very good solution. It can pick a custom color palette that works well for the input image. As the second step, I wanted to write a brute-force double PETSCII search and see how far I could optimize it. My brute-force method tries all possible combinations of foreground and background characters, picks the closest colors from the palette for each of them, and keeps the best combinations. Though very inefficient, this is the gold standard to compare other double PETSCII conversion algorithms against. A naive single-threaded implementation takes about 25 minutes to convert an image. I am afraid that none of the parallelization techniques I tried worked as well as I hoped. Even with 8 processes on my machine with 8 CPUs, it still takes more than 10 minutes to complete. It seems that on my laptop, the search algorithm is severely memory-bandwidth limited. (I tried the Python modules "threading", "multiprocessing" and also "ray", and found no significant difference in performance between them.) Anyway, find the script plus supporting files in the attachment. To run it, you need Python 3 with the following packages installed: numpy, pillow, scipy, matplotlib. brute_force_double_petscii.zip
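The brute-force search described above can be illustrated on a toy single-tile grayscale model. The charset, palette, and error metric here are simplified stand-ins of my own (the actual script works on RGB images and the full PETSCII set): every (foreground, background) character pair is tried, each visible region is snapped to the nearest palette value, and the pair with the lowest squared error wins.

```python
import numpy as np

# Toy brute force for ONE 8x8 grayscale tile: try every (foreground,
# background) character pair, fill each visible region with the nearest
# palette value, and keep the pair with the lowest squared error.
def brute_force_tile(tile, charset, palette):
    palette = np.asarray(palette, dtype=float)

    def nearest(value):
        # Snap a region's mean to the closest available palette entry.
        return palette[np.abs(palette - value).argmin()]

    best = (np.inf, None, None)
    for i, c1 in enumerate(charset):
        for j, c2 in enumerate(charset):
            # Three visible regions: fg char, bg char's fg, bg char's bg.
            regions = [c1, ~c1 & c2, ~c1 & ~c2]
            err = 0.0
            for m in regions:
                if m.any():
                    color = nearest(tile[m].mean())
                    err += ((tile[m] - color) ** 2).sum()
            if err < best[0]:
                best = (err, i, j)
    return best
```

The N*N loop over character pairs is exactly why the exhaustive search is so expensive, and why pruning the foreground candidates first pays off.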
  15. In that version, it definitely isn't as much of a challenge anymore to get past the first obstacle! After 15-20 minutes of playing, I regularly got past 10-20 pipes. My high score was 27. I went back to version 1.1 and tried again. I have definitely improved over yesterday: I raised my high score to 5, and was able to get 4 multiple times in a row! But still, the first obstacle was where I got stuck most frequently. I think I suck at this type of game, but it is simple and enjoyable, and I think training pays off. I would definitely include the 1.1 degree of difficulty, because it is a nice challenge. I remember the satisfaction when I first got past the first obstacle. Life could be so simple. Edit: By including it I mean that an easier mode should also be included.