
New demo uploaded: PSG Audio Test or Why We Only Need Vera.



PSG Audio Test or Why We Only Need Vera.


As has been pointed out recently there is tooling for the X16, especially in terms of trackers.

There have been a few great examples of music on the forums, either playing an Amiga/C64 mod or via an X16 tracker. And these are great.

That said, I did think maybe the community was missing a trick. If the goal is to make music for the X16, do we need to be able to run the tracker on it? Doesn't that make writing everything harder? If you have a modern machine, it would be easier and quicker to create a tracker on that. UIs are easy. MIDI keyboard integration is a mere NuGet package away. You can hand-edit JSON files if you want. IO is trivial. That's not to say writing a tracker for the X16 is wrong, it's just a different goal.

So that's what I did. Apart from the emulation of the X16's PSG, which took a while to get going, it wasn't so bad to do, WPF aside. I've ended up with an application which lets me produce X16 music. It can export an .asm file which can be imported into ca65, making integration into a project really easy with just two calls. It only uses a single-digit number of lines of CPU time and 32 bytes in the ZP, so it is pretty lean. That said, it does not yet support PCM audio or commands.

The music in the attached file is sourced from part of a demo file that comes with FamiTracker. What did occur to me while doing this is that VRC6 ( https://en.wikipedia.org/wiki/Memory_management_controller#VRC6 ) music is pretty damn good. In fact I'm sure a player for .ftm files could be written. (Given how long it took to get just the first third of a demo file working, I might write a pattern importer myself..!)

For me, the audio quality demonstrates that Vera's PSG and some form of PCM is all that's needed for audio on the X8/16. (Sharpen your pitchforks!) It just needs to be a bit louder!

What next? Like all projects that have gone from 'Proof of Concept' to 'Production' in one step, this one has ended up with some of the code being a bit crap. Especially on the WPF side! If anyone is interested, I'll try to shore the code up and will post an explanation of how it works soon.

For now the display shows four counters: Frame, Line, Pattern, and Next Line. Source is the VRAM address. (I use VERA to stream out the data for the patterns, as it makes life much easier. I can't overstate how useful this feature is.)

The bottom table is:

  • VERA registers
  • Address of the instrument data
  • Instrument Position
  • Command
  • Note Number
  • Instrument Repeat
  • Command parameters (2 bytes)
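
The VRAM streaming mentioned above can be pictured with a toy model. This is only a sketch of the idea, not the player's actual code: the `DataPort` class, the pattern bytes, and the address `0x1000` are all made up for illustration. The point is that once the address and stride are set, each read from a VERA data port returns the next byte and moves the address on by itself, so the CPU never has to maintain its own pointer into the pattern data.

```python
class DataPort:
    """Toy model of a VERA data port: each read returns the byte at the
    current VRAM address, then advances the address by the stride."""

    def __init__(self, vram, address=0, stride=1):
        self.vram = vram
        self.address = address
        self.stride = stride

    def read(self):
        value = self.vram[self.address]
        # Auto-increment: the next read sees the following byte.
        self.address = (self.address + self.stride) % len(self.vram)
        return value


vram = bytearray(128 * 1024)  # VERA has 128 KB of VRAM
# Hypothetical pattern bytes, placed at an arbitrary address.
vram[0x1000:0x1004] = bytes([0x3C, 0x01, 0x00, 0xFF])

port = DataPort(vram, address=0x1000)
line = [port.read() for _ in range(4)]  # four sequential reads, no pointer maths
print([hex(b) for b in line], hex(port.address))
```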

 


This is awesome!  Great work. 

How well do you think this can mesh with using VERA for moving sprites, moving/scrolling tiles, and updating lots of video data? I think that's been my concern about using VERA for music and not just sound effects: contention and resource traffic jams, especially in a game. The nice thing about the YM chip, it has seemed to me in theory, was the possibility of having some nice interrupt-driven music that did not need to use a bunch of cycles each interrupt storing and restoring all the VERA registers and state information.

I'll be following this with interest!


This project seems very promising! 
In general I agree: productivity on a contemporary computer is much better than working directly on the X16, even in an emulator.


So you made a tracker for a PC that can produce a demo like the included file. Where can we see that tracker? Screenshots, video, the tracker itself?
Also, how did you produce the music in the .prg file? Did you transform/decode an existing format, or did you manually enter the values in your tracker?


6 hours ago, Snickers11001001 said:

The nice thing about the YM chip, it has seemed to me in theory, was the possibility of having some nice interrupt-driven music that did not need to use a bunch of cycles each interrupt storing and restoring all the VERA registers and state information.

From all that I have read, the YM has a major flaw: writing to a register requires the YM to be ready. On real hardware, reaching that ready state takes around 156 CPU cycles at the current configuration. I bet even with all the extra VERA overhead, PSG writes take less than that 🙂

Edited by Squall_FF8

10 hours ago, Yazwho said:

That said, I did think maybe the community was missing a trick. If the goal is to make music for the X16 do we need to be able to run the tracker on it?

If the goal is just to make music that can be played by the X16, then it is totally fine and your solution is great. Many people will follow along and will be grateful. Besides, even in the old days some commercial software for some machines was created on different, more powerful machines.

And for example, if we are talking about game consoles (say the Atari 2600 or NES), there is absolutely no way of writing code or creating graphics/music directly on the system. So it is a totally fine approach to use a separate, capable system.

But when we are talking about computers, there are some people who wish to create everything for the machine directly on that same machine. It's kinda part of the fun. )


12 hours ago, Snickers11001001 said:

How well do you think this can mesh with using VERA for moving sprites, moving/scrolling tiles, and updating lots of video data?

I think this is a very lightweight music player in terms of CPU. If you leave out the demo's graphics, I would guess you'd barely notice that the X16 is doing something. It boils down to how Yazwho implemented the instruments.

Well, and there's the fact that graphics and music both mess with the VERA registers. But if, in an interrupt, you save the VERA address at the beginning and restore it at the end, music and graphics should work flawlessly next to each other.

Edited by kliepatsch

 

14 hours ago, Snickers11001001 said:

How well do you think this can mesh with using VERA for moving sprites, moving/scrolling tiles, and updating lots of video data? I think that's been my concern about using VERA for music and not just sound effects: contention and resource traffic jams, especially in a game. The nice thing about the YM chip, it has seemed to me in theory, was the possibility of having some nice interrupt-driven music that did not need to use a bunch of cycles each interrupt storing and restoring all the VERA registers and state information.

I'm not sure what you mean. Where would any contention come from? 

The purple line at the top is the CPU time that the playback is using. I've not measured, but it looks to be around 1% CPU usage for 7 voices. So once done it should come in under 2% w/o PCM. In this example the audio uses ~3k VRAM and ~2k RAM. Compared to typical audio usage of the 80s, this is minimal; as a comparison, in RMC's interview with Rob Hubbard, he said audio was typically given 10% of both CPU and memory. (It's a good book, you can grab a copy here: https://rmcretro.store/ )

The YM audio is a bit of a pain to use, given the clock speed difference you have to wait 144 CPU cycles between each write. You can of course do other things while waiting for the ready flag, but that makes it much harder to integrate into an application. Napkin maths of 144 cycles and 7 voices would be around 1% of CPU usage.
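
For anyone who wants to check the napkin maths, here is a rough sketch. The 144-cycle wait and the 7 voices come from the post; the 8 MHz CPU clock, the 60 frames per second, and the idea of exactly one write per voice per frame are assumptions made only to get a number out.

```python
# Napkin maths for the YM busy wait, under assumed clock and frame rates.
CPU_HZ = 8_000_000        # assumed X16 CPU clock
FRAMES_PER_SECOND = 60    # assumed VSYNC rate
BUSY_WAIT_CYCLES = 144    # wait between YM writes quoted in the post
WRITES_PER_FRAME = 7      # assumption: one write per voice per frame

cycles_per_frame = CPU_HZ / FRAMES_PER_SECOND


def busy_wait_share(writes_per_frame, wait_cycles=BUSY_WAIT_CYCLES):
    """Fraction of one frame's CPU cycles spent waiting on the YM."""
    return writes_per_frame * wait_cycles / cycles_per_frame


share = busy_wait_share(WRITES_PER_FRAME)
print(f"{share:.2%} of a frame")  # prints "0.76% of a frame"
```

A real tune would issue a varying number of writes per frame, so this only shows the order of magnitude.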


4 hours ago, Yazwho said:

The YM audio is a bit of a pain to use, given the clock speed difference you have to wait 144 CPU cycles between each write.

Wait! I'm confused - because I don't do this in my games and the music works nevertheless. I write as fast as possible to the YM until there is a KEY_ON command - or I have to yield because of a delay. This is then synced with the vsync interrupt. So yes, for sure there are more than 144 CPU cycles between each KEY_ON command, but not between each write.


6 minutes ago, AndyMt said:

Wait! I'm confused - because I don't do this in my games and the music works nevertheless. I write as fast as possible to the YM until there is a KEY_ON command - or I have to yield because of a delay. This is then synced with the vsync interrupt. So yes, for sure there are more than 144 CPU cycles between each KEY_ON command, but not between each write.

I don't know the YM well enough. Just going on what I've read, e.g. https://ayce.dev/emptyx16.html#9f41h---ym2151-register-data-w--status-r

No idea if the emulator handles it correctly.

Edited by Yazwho

 

2 hours ago, AndyMt said:

Wait! I'm confused - because I don't do this in my games and the music works nevertheless. I write as fast as possible to the YM until there is a KEY_ON command - or I have to yield because of a delay. This is then synced with the vsync interrupt. So yes, for sure there are more than 144 CPU cycles between each KEY_ON command, but not between each write.

The busy delay is incurred after each and every write to the YM data port, not just KeyON/OFF.

I worked with @StephenHorn on the box16 emulator project while he was refactoring that emulator to use the new BSD-licensed YMFM core library. I've learned quite a bit about how this chip works.

R38 has several inaccuracies in the YM implementation - the core it uses does not emulate the busy behavior at all. You can bang away on it all day long and it won't skip a beat. The library just handles it. Furthermore, I don't think the IRQ functionality is emulated, and I also think the timer behavior might not be implemented either - I'll go re-test this and let y'all know...

Box16 has extended the YM support in several ways - it supports YM IRQs as well as busy flag behavior. Some of this has been a matter of the group's interpretation of other example code and/or the official YM2151 application manual. In other words, the current behavior in Box16 is our best guess as to the real behavior. Stephen has done an excellent job integrating this code and dealing with a colossal time synchronization nightmare, by the way. Kudos to his hard work! One of his decisions was to make these "enhancements" disabled by default to maintain functional parity with the official emulator. They are activated by command-line arguments.

Interestingly, when testing the accuracy improvements, one of my VGM playback routines went nuts and played an entire song back in less than 1.5 seconds. It turns out that the original VGM data stream contains commands that enable YM IRQs. My player just used WAI as a poor man's IRQ handler to wait for VSYNC, so it mistook these YM IRQs for VBLANK IRQs. The player was built to read the busy flag, though, and even with strict enforcement of the busy state it doesn't drop any updates, so that seems to work properly. After removing the IRQ enable messages from the VGM, it plays back perfectly on Box16 even when it enforces busy flags.

 

Real Hardware?

The big question is how will this work on the real thing? As I said above, the behavior is currently our best guess based on other sources, none of which is real hardware.

One thing I can say is that @SlithyMatt's Chase Vault game works on real HW, and the game's YM routine uses busy flag reads to ensure that it does not write to the chip when it's busy - so that's a good sign. Beyond that, I've sent a basic read test program to Kevin, as the YM reading functionality has never been subjected to testing on X16 hardware the way writing has been. So far, the "hello world" results are interesting. Kevin's only been able to test at 2 and 4 MHz, which the math says should work just fine on the YM. 8 MHz is the really important one, but as the system is currently experiencing other 8 MHz-related problems, Kevin wasn't able to test at that speed.

In short - the test program successfully read IRQ status flags w/o any errors at 2 and 4 MHz. Interestingly, the busy flag read test never observed the flag being set, but I think there might've been a bug in my code.

I'm personally convinced that the write delay specification is actually a little bit different in reality than you might expect from reading the application manual. That is, I strongly suspect that the "64 YM clock delay" is not a minimum but a maximum. The data sheet definitely has other errata, and I'm convinced this is just another example.
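
As a rough cross-check of where the ~144-cycle figure in this thread comes from, the "64 YM clock delay" can be converted into CPU cycles. This is only a sketch: the YM clock of 3.579545 MHz is an assumption here, and whether 64 clocks is a minimum or a maximum is exactly the open question above.

```python
import math

YM_CLOCK_HZ = 3_579_545   # assumed YM2151 clock on the X16
BUSY_YM_CLOCKS = 64       # busy period from the application manual, per the post


def busy_cpu_cycles(cpu_hz):
    """CPU cycles that elapse during a 64-YM-clock busy period, rounded up."""
    return math.ceil(BUSY_YM_CLOCKS * cpu_hz / YM_CLOCK_HZ)


for mhz in (2, 4, 8):
    print(f"{mhz} MHz: {busy_cpu_cycles(mhz * 1_000_000)} CPU cycles")
```

Under that assumed clock, the 8 MHz case rounds up to the same 144 cycles quoted earlier, and the slower CPU speeds need proportionally fewer cycles of waiting.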

I have a real YM to test with, but don't have everything lined up enough to be able to put it on a breadboard and do real testing just yet. If I can get that done, I'm definitely going to poke around with the real chip's busy state flag to see what makes it tick.

YM and write performance / overhead:

As for writing performance with the YM, I'd like to point out that in my experience with generating byte streams of PSG writes and FM writes, the FM music data streams are much smaller than the PSG streams. This is completely to be expected. That's because the FM chip does so much of the modulation in hardware that you don't need to update it nearly as often to get decent-sounding music. PSGs must be constantly modified for every little thing, be it pulse width modulation, vibrato, etc - all of which is done in hardware on the FM chips. Consequently, you end up performing SIGNIFICANTLY fewer writes to an FM chip than to a PSG over the course of a tune. Still, sitting around for ~144 clock cycles waiting on the chip to finish chewing the last bite is not ideal.

I believe one way to handle YM writes may be to queue up the writes into a ring buffer, and have an IRQ that empties the ring buffer. You can set the YM's timers to run a little slower than the actual rate the YM could drain the buffer in order to cut down on the IRQ overhead. This would have the effect of spreading the load evenly throughout a frame if that's what you'd like to do. Kevin has confirmed that the YM2151's IRQ line is indeed connected to the system, so this should be doable on the real system.
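
The ring buffer idea can be sketched in a few lines. This is a model of the data structure only, not X16 code: `YMWriteQueue`, `on_timer_irq`, and the register/value pairs are hypothetical names and numbers chosen for illustration, and the `write_to_chip` callback stands in for the actual write to the YM ports.

```python
class YMWriteQueue:
    """Fixed-size ring buffer of pending (register, value) writes."""

    def __init__(self, capacity=64):
        self.slots = [None] * capacity
        self.head = 0    # next slot to drain
        self.tail = 0    # next slot to fill
        self.count = 0   # number of queued writes

    def push(self, register, value):
        """Queue a write; returns False when full so the caller can retry later."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = (register, value)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def on_timer_irq(self, write_to_chip):
        """Send at most one queued write per timer IRQ. Returns True if one was sent."""
        if self.count == 0:
            return False
        register, value = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        write_to_chip(register, value)
        return True


# Usage: the music code queues writes in a burst, the "IRQ" drains them in order.
sent = []
queue = YMWriteQueue(capacity=4)
for reg, val in [(0x20, 0xC7), (0x28, 0x4A), (0x08, 0x78)]:  # made-up writes
    queue.push(reg, val)
while queue.on_timer_irq(lambda r, v: sent.append((r, v))):
    pass
print(sent)
```

Draining one write per IRQ keeps each interrupt short, and since the timer is set to fire no faster than the chip can accept writes, the handler should not need to poll the busy flag. The trade-off is that a burst of writes is spread over several timer periods instead of landing at once.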


7 hours ago, Yazwho said:

I'm not sure what you mean. Where would any contention come from? 

I've been going through Matt's videos.   I am _not_ an assembly guru or even passably good at it, so forgive me if I'm all wet. 

But there's 2 data ports on the VERA (well, at least X16 VERA, the X8 is a whole different ball of wax apparently).   You have to set things up to point to the VERA address and port you want and the stride.   The VERA memory range includes (a) the video memory; (b) the PSG registers; and (c) the sprite registers.  (also palette).    Three different activities in the structure of a game code and only two ports into that space, which require set-up to read/write.   That's all I meant and is the sort of contention I had in mind when I wrote that.  

It seemed to me that if you've got a routine using VERA data ports 0 and 1 to move some video data in preparation for a scroll or context change, or to work the VERA sprite registers, then if an interrupt for music fires, the music routine must include code to save which data port was selected, the VERA address the data port had been pointing at, and the stride value, and then restore all this before exiting back to regular execution. Otherwise, when the program flow gets back to what was happening before the audio code, the data port/stride stuff will have the wrong values.

Seems to me you'd have to store/restore $9F20, $9F21, $9F22, and $9F25 at least. Looks like those cover the L, M, H portions of the VERA address, the inc/dec stuff, and the port select bit. So to store, that's 4 LDA absolutes at 4 cycles each, plus 4 STAs at 4 cycles each (3 each if you have 4 ZP addresses you can set aside as temp storage for these, or I guess you could push them on the stack if you're sure it will have room), and the same in reverse when your music handler is ready to exit, to put everything back. Reading through this now, I realize I'm stuck in the 'long ago', thinking at C64 CPU speeds and the old 'cycles per refresh' thing, but those cycles are probably of negligible impact with the 8 MHz clock on the X16.
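
To put a number on "negligible", the cycle counts above can simply be added up. A rough sketch: the instruction timings are the usual 6502 ones quoted in the post (4 cycles for absolute loads and stores, 3 for zero page), while the 60 frames per second and the single save/restore per frame are assumptions for illustration.

```python
# Cycle cost of saving and restoring four VERA registers around a music call.
CYCLES = {"LDA abs": 4, "STA abs": 4, "LDA zp": 3, "STA zp": 3}
REGISTERS = [0x9F20, 0x9F21, 0x9F22, 0x9F25]  # address L/M/H and port select


def save_restore_cycles(use_zero_page):
    """Total cycles to copy the registers out and back in again."""
    store = CYCLES["STA zp"] if use_zero_page else CYCLES["STA abs"]
    load = CYCLES["LDA zp"] if use_zero_page else CYCLES["LDA abs"]
    save = len(REGISTERS) * (CYCLES["LDA abs"] + store)      # read VERA, stash
    restore = len(REGISTERS) * (load + CYCLES["STA abs"])    # fetch, write VERA
    return save + restore


cycles_per_frame = 8_000_000 / 60  # assumed 8 MHz CPU and 60 frames per second
for use_zp in (False, True):
    total = save_restore_cycles(use_zp)
    print(f"zero page={use_zp}: {total} cycles, "
          f"{total / cycles_per_frame:.3%} of a frame")
```

Either way it comes out well under a tenth of a percent of a frame, which supports the "negligible impact" conclusion, at least when only one data port's state has to be preserved.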

Still, until I heard the extra timing considerations on the YM chip in this thread, I had thought it might be easier to just have a music routine that needn't concern itself with saving/restoring VERA state info and just play the dang music.   Obviously very naïve thinking,  as it turns out.   

I meant no offense at all at what you're doing, and I thought I made it clear that I love this demo.   I hope my ramblings weren't taken as any sort of criticism.  

Cheers.   

 


2 minutes ago, Snickers11001001 said:

then if an interrupt for music fires

For my player, it's not triggered from its own interrupt. You call the audio routine once per frame, that's it. It's up to the caller to save anything that is needed, but as you control it you'd try to time it so you didn't need to.

 

5 minutes ago, Snickers11001001 said:

I hope my ramblings weren't taken as any sort of criticism.  

Not at all 🙂


On 9/9/2021 at 5:21 PM, ZeroByte said:

I worked with @StephenHorn on the box16 emulator project while he was refactoring that emulator to use the new BSD-licensed YMFM core library. I've learned quite a bit about how this chip works.

I tried to get box16 to run, but have failed to compile my own compatible rom.bin (I can't get the toolchain to work on Windows, despite the instructions being pretty clear). I then used a rom.bin from an older R39 release I had lying around, but of course this doesn't work either. Any ideas?
