Request for opinions on sound library functionality


ZeroByte

As I've hinted at in other posts, I'm currently developing a sound API and tool suite, Zsound. I want to release it soon, but before doing so, I'd like to have the base features working well and in a state where additional features do not break previously-published API behavior. Currently, the library only has music playback support, plus tools for importing music and FM instruments from various sources - mostly VGM, and not just YM2151-type VGMs: you can import Sega Megadrive tunes or YM2203 tunes with the tool.

The goals of this library are speed and simplicity.

Keeping that in mind, consider this:

For the playback routine, it currently just loops indefinitely on tunes with loops, or stops playback when the tune ends. The process is fairly opaque to the program using the library, and I think it would be useful to have some external control over the player's behavior - for instance you may wish to have a tune loop only a certain number of times, or synchronize events to the music ending, etc.

Currently, it is possible to know whether music is playing or not based on the frame delay counter - 0 = not playing, nonzero = playing. There is no way to know how many times the tune has looped, if any.

The question is: What makes more sense for controlling the looping behavior?

Ideas I've had:
1. Have a variable like "num_loops" which simply increments every time a tune loops.
2. Make the start_music() function take options that define the behavior, and leave it up to the player to behave accordingly.
3. Have the player do callbacks whenever the end of the tune / loop point is reached.

Each one has pros and cons. I'm inclined to go with the first option and leave it up to the program to decide how to send signals to the music player based on its state - such as triggering a fadeout() whenever num_loops > X.
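To make the first option concrete, here's a minimal sketch in C (the library itself is 6502 assembly, and all names here are illustrative, not the actual Zsound API) of how an application might poll a loop counter each frame:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical player state for option 1: the player increments num_loops
   at each loop point, and the application polls it once per frame. */
typedef struct {
    uint8_t delay;      /* frame delay counter: 0 = not playing */
    uint8_t num_loops;  /* incremented by the player each time the tune loops */
} player_state;

/* Application-side policy: start a fadeout once the tune has looped enough. */
bool should_fadeout(const player_state *p, uint8_t max_loops)
{
    return p->delay != 0 && p->num_loops >= max_loops;
}
```

The library stays tiny; all policy lives in the application.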

So - what do you think makes the most sense for this?
Also, what sort of controls would you expect to see for a music player routine beyond: start, stop, fadeout?
Note - I don't currently have volume control implemented, but I'm planning on doing it in a way that wouldn't interfere with the current functionality, so this is not a showstopper for the initial "alpha" release.


I think the problem with idea 1 is that you don't know the maximum number of loops a given application might care about. If it doesn't care, you don't need it. If it does care, it might only need one byte, or two, or three... I would be inclined to omit that default counter, since it might not be used or useful depending on the consumer of the library.

Idea #2 is okay, but it assumes that you can always know in advance how many loops you will require, and that the count will always fit into a byte / word / more.

Idea #3 is best, because the application consuming the library can do whatever makes sense for itself. It can omit the callback, in which case the library doesn't call back at all. If the application needs more or less space for such a variable, it knows how much room it needs. Or maybe it doesn't need to count at all, but some event might allow it to determine "now is the time to stop after this loop, which I could not possibly have known about before now". Presumably the callback could return a value that the library would use to know "I am looping indefinitely and shall continue to do so", or "I am looping but this should be the last loop", or "I am not looping, but I've reached the end, so the application can tell me to restart even though this isn't a loop", or even "I am not looping, but the user can tell me to start a brand new tune randomly or by some other criteria".

Of course, this is easy for me to suggest, since I'm not writing the library. I just think one callback routine can provide infinite flexibility that isn't available for #1 or #2.

If you *did* want to allow #1 or #2, it would be trivial for the library to provide built in callback functions that could be registered instead of a user defined callback that provided the easy functionality for applications that don't want to worry about it.
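A rough sketch of that shape in C (everything here - the enum values, the signature, the built-in callback - is an assumption about what such an API could look like, not Zsound's actual interface):

```c
#include <stdint.h>

/* Hypothetical idea #3: the application registers an end-of-tune callback,
   and its return value tells the player what to do next. */
typedef enum {
    LOOP_CONTINUE,  /* keep looping indefinitely */
    LOOP_LAST,      /* finish this pass, then stop */
    LOOP_STOP       /* stop immediately */
} loop_action;

typedef loop_action (*eof_callback)(uint8_t loops_so_far);

/* A built-in callback the library could ship for the simple "loop N times"
   case, so applications that don't care never write a callback themselves. */
uint8_t loop_limit = 2;  /* how many passes to allow */

loop_action loop_n_times(uint8_t loops_so_far)
{
    return (loops_so_far + 1 >= loop_limit) ? LOOP_LAST : LOOP_CONTINUE;
}
```

Registering loop_n_times would then recover the behavior of ideas #1/#2 for free.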

Edited by Scott Robison

I think the callback idea is best - even more so because I'm thinking of a callback that triggers after a "pattern" or "bar", so that you might be able to sync up some behavior with, say, a drum beat or the rhythm of the music.

Would that be possible? Then we could create music games or puzzles or interesting visual effects (strobe lights and whatnot).

 


On 11/17/2021 at 11:36 AM, Scott Robison said:

If you *did* want to allow #1 or #2, it would be trivial for the library to provide built in callback functions that could be registered instead of a user defined callback that provided the easy functionality for applications that don't want to worry about it.

I basically figure a byte is enough, and if any program wants more than that, it can implement whatever size variable is required - just occasionally check n_loops and update your own counter accordingly. I mean, does a song really need a finite number of loops > 255? Even if each loop were just 1 second, that's about 4 minutes, 15 seconds. If the music is 15 sec per loop, that's an hour and 4 minutes.

I agree that callbacks give the most flexibility, but they also introduce complexity. Maybe I should just make the step_music routine return a boolean "reached_eof" which is true whether it looped or halted.
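That "reached_eof" return could look something like this sketch (the stub player and names are hypothetical stand-ins, not the real stepper): the library reports the end-of-tune condition, and the caller owns a counter of whatever width it wants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the real step routine: returns true on any tick where the
   tune hit its end, whether it looped or halted. */
bool step_music_stub(uint16_t tick, uint16_t tune_ticks)
{
    return (tick % tune_ticks) == tune_ticks - 1;
}

/* The application keeps its own loop counter, at whatever width it needs. */
uint32_t count_loops(uint16_t ticks_to_run, uint16_t tune_ticks)
{
    uint32_t loops = 0;
    for (uint16_t t = 0; t < ticks_to_run; t++)
        if (step_music_stub(t, tune_ticks))
            loops++;
    return loops;
}
```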


And I will not be critical of you for any decision you make. I was just answering the question as posed and trying to think of "ultimate flexibility" realizing that all engineering is about analyzing the problem and deciding which things are good to have and which are too much. Given my lack of musical / audio programming experience, I am mainly able to help answer questions in the "generic library" context. 🙂


On 11/17/2021 at 12:57 PM, desertfish said:

I'm thinking of a callback that triggers after a "pattern" or "bar" so that you might be able to sync up some behavior with say, a drum beat or the rhythm of the  music?

would that be possible?

What we're talking about would only happen whenever a tune ends/loops, so it wouldn't be useful for this sort of functionality.

However, the music data format does allow for such a thing to be done, potentially. Currently there is an undefined 4-byte command which I wrote into the spec as a placeholder for the PCM digi sample track. The player just NOPs these, skipping 4 bytes whenever they're found in a data stream. (I haven't figured out what sort of PCM commands might be needed.) Essentially, this is a kind of "event" command, e.g. start/stop playback of a digi sample. It would be possible to define a generic "trigger" as one case of these 4-byte commands. The trigger might be defined as "send 3 bytes to the event callback routine" - and then it would be up to the program to decide what to do with those 3 bytes.

The danger of such a thing is that it introduces "compatibility" issues - i.e. a ZSM file containing triggers would immediately become non-portable. Suppose one program decides to use triggers, and decides to use the 3 bytes for whatever behavior, and another program does something different. ZSMs for project A would cause unpredictable behavior if loaded and played back in project B.

So, I guess I'd prefer to have a set of standardized event types over a generic "send these three bytes to a callback routine" - or maybe even just call it a "sync" frame that doesn't pass any data and doesn't possess any particular meaning - kind of like "Send IRQ" - why? I dunno - just send it.

Or - maybe it should be "sync: track#" - so here's a trigger that means "something interesting just happened in track #" - do whatever you want or not. K, thx, bai.
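Sketched in C, the "sync: track#" dispatch might look like this. The opcode value and payload layout are made up - the spec only reserves an undefined 4-byte command - but the dispatch shape would be something like:

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_SYNC 0x40  /* hypothetical opcode for the sync event */

typedef void (*sync_callback)(uint8_t track);

sync_callback on_sync = NULL;
uint8_t last_sync_track = 0xFF;

/* Example application handler: remember which track just signaled. */
void remember_track(uint8_t track) { last_sync_track = track; }

/* Called by the player when it hits a 4-byte command in the stream.
   Unknown opcodes are NOPed and skipped, as the player already does. */
void handle_command(const uint8_t cmd[4])
{
    if (cmd[0] == CMD_SYNC && on_sync != NULL)
        on_sync(cmd[1]);  /* "something interesting happened in track cmd[1]" */
}

/* Tiny demonstration: register a handler and feed one sync command. */
uint8_t demo_sync(void)
{
    on_sync = remember_track;
    uint8_t cmd[4] = { CMD_SYNC, 3, 0, 0 };
    handle_command(cmd);
    return last_sync_track;
}
```

Because unknown opcodes are skipped, a player without a registered callback would still play such a file correctly - which limits the portability problem to behavior, not playback.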

Honestly, though, this is starting to sound like a job for MIDI. A MIDI playback library that lets you load a sound font, etc and play back MIDI files would be pretty useful for the community, and you could definitely embed events in there for sync-to-music stuff - like some rail shmup where there're enemies that emit pulses of energy in sync with the bass line of the music, and the lighting in the corridor changes when the music goes into the bridge vs the chorus, etc.

 

Edited by ZeroByte

Here are some general functions that might be helpful.

playFromTo(FILENAME, START, END)

- Plays part of a file.

loopFromTo(FILENAME, START, END, ITERATIONS)

- Plays part of a file for a set number of iterations.

newSound = genEffect(FILENAME, START, END, EFFECT)

- Creates a new sound file from part of a file and applies one of a number of sound effects.

applyEffect(SOUND, EFFECT)

- Applies a commonly used sound effect, such as reverb, reverse, pitch shifting, fade in, fade out, etc.

newSound = join(SOUND, SOUND)

- Joins two sounds into one.

The from-to format for playing different sounds stored in a single file is a concept used by other sound libraries. I don't know how helpful it is, but it is "a thing". Being able to rip portions of songs/sounds and use them with joins and effects would allow for nifty stuff such as simulated record scratching or scene-based ambience.


I hadn't considered "utility" functions from the perspective of an editor utility as candidates for inclusion in the library. It's definitely food for thought - many of the functions mentioned (such as join(sound1, sound2)) are pretty much the type of thing that should be left up to the application to implement for whatever purpose - but that does bring up the point that the library should expose the necessary ingredients to facilitate such things.

I think seek() / rewind() / advance() make a lot of sense. These would require some functionality that's also needed for the ability to preempt music playback on one or more channels with SFX and then resume the music once the FX is done.

My estimation is that it will require two pages of RAM (512 bytes) for the YM cache and 64 bytes for the PSG cache. The YM, being write-only, requires a page of memory to shadow it, and another page to cache the state for "ghost writes" while a voice is suspended. This is why I'm strongly leaning towards having the library use a bank of HIRAM for its workspace, to minimize the main memory footprint...
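The budget above can be sketched as a C struct (the layout and field names are assumptions for illustration; the library itself is assembly): the YM2151's 256 write-only registers get a shadow page plus a ghost-write page, and the VERA PSG's 16 voices x 4 registers need 64 bytes.

```c
#include <stdint.h>

/* Assumed workspace layout matching the 512 + 64 byte estimate. */
typedef struct {
    uint8_t ym_shadow[256];  /* last value written to each YM register */
    uint8_t ym_ghost[256];   /* cached state while a voice is preempted by SFX */
    uint8_t psg_cache[64];   /* 16 PSG voices x 4 bytes each */
} sound_workspace;
```

At 576 bytes total, this fits comfortably inside one 8 KB HIRAM bank with plenty of room left for stream pointers and counters.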

So my takeaways are that it would be good for the player to be able to seek to a certain spot in the music and apply effects to the playback, and that I need to consider some non-playback-related things like functions to compute the duration of a tune, the time offsets of the loop start and end, etc.

Effects would need to be limited to fairly simple things like volume adjustments and pitch transposition.


This is why it's good to ask for feedback.

I really hadn't given any thought to making routines in zsound.lib for manipulating ZSM data (ZSM = zerosound streaming music) on the X16 itself, not as a player but just to work with the data.

For instance, creating a ZSM file on the X16 itself. There is a little complexity involved on the encoding side, so it makes sense that anyone wanting to generate a ZSM stream from whatever routines they use to produce sound would benefit from having encoder routines available to record it as ZSM. For instance, if someone were to make an on-system tracker and wanted it to export a ZSM of the results, it would be useful to have routines to facilitate that, so you don't have to write your own encoder.


On 11/19/2021 at 4:28 PM, kliepatsch said:

What will be the timing unit of the format? Will it be vsync, or will more precise timing be supported?

I'm going with "ticks" as the timing unit. The file has a 2-byte field in the header for "playback rate" (Hz), which defines how long a tick is. Right now, the playmusic routine in my library is speed-agnostic - it would be more aptly named step_music. It only advances one tick per call, so it can be called at whatever rate, and that's the playback speed it produces. I plan to make a new routine and call it playmusic instead. This one will be designed to be called at 60Hz intervals; it will work out how many times to call step_music on each frame, keeping a fixed-point ticks-per-frame tally. If you want to make an arbitrary-resolution time source using the VIA or whatever, that still works - just call step_music directly. As I see it, there's really not a lot of benefit to making a ZSM with higher resolution in the time domain, but I felt this would be useful to have in the spec.

In fact, it was just the other night that I started figuring out how to do the divide-by-60 in assembly, specifically for this new routine. I'm thinking I can do it with shifts: take the value >> 6, then take that answer >> 4 and add it back, do another >> 4 and add, and a final >> 2, with the final shift being a ROR (to load the carry flag), then add zero to the fractional and integer parts to carry in the one.
Basically /60 is /4 then /15. 15 is close to 16, so just divide by 16 and add back roughly 1/15th of the result - which is what the repeated >> 4 does: keep propagating the error down off the least significant bit, and if carry is set, round up.
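The same shift-and-add idea can be checked in C before committing it to assembly. This sketch uses x/60 = (x/4)/15 and the geometric series 1/15 = 1/16 + 1/16² + ..., so x/60 ≈ x>>6 + x>>10 + x>>14 + x>>18; eight fractional guard bits plus a final rounding add stand in for the ROR/carry trick (the guard-bit width is my choice, not taken from the post):

```c
#include <stdint.h>

/* Shift-only approximation of x/60, rounded to the nearest integer. */
uint16_t div60_approx(uint16_t x)
{
    uint32_t f = (uint32_t)x << 8;  /* 8 fractional bits of working precision */
    /* x/60 ~= x/64 + x/1024 + x/16384 + x/262144 (series terms as shifts) */
    uint32_t q = (f >> 6) + (f >> 10) + (f >> 14) + (f >> 18);
    return (uint16_t)((q + 128) >> 8);  /* round to nearest, drop fraction */
}
```

With four series terms the result is exact (to nearest) across the full 16-bit range, including the 44100 -> 735 ticks/frame case.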

Edited by ZeroByte

On 11/19/2021 at 4:28 PM, kliepatsch said:

What will be the timing unit of the format? Will it be vsync, or will more precise timing be supported?

I've now implemented the time-resolution support in both the library's player and the ZSM generation script. As a test, I generated Sonic The Hedgehog's Green Hill Zone at the VGM-native 44100 Hz just to see how much more CPU-expensive that is. It's pretty steep - roughly 20% of the visible raster time.

My simple player produces a "raster bar" at the top-right corner of the screen.
This is how much raster time is taken up as a minimum when it calls step_music 735 times per frame:

[screenshot: raster bar showing the CPU time consumed at 44100 Hz playback]

For comparison, when the same tune is re-sampled to 60Hz and only called once per frame, the minimum is actually zero pixels of the raster bar shown... typically one or two raster lines are used except when YM voices re-patch for instrument changes.

Obviously, this "step it 735 times per frame" method is still not as accurate (or as CPU-intensive) as generating IRQs at 44.1 kHz, but it demonstrates that the format handles it well.

Also, now that the playback rate support is there, the player can now speed up or slow down the playback in real time.


On 11/23/2021 at 5:28 AM, kliepatsch said:

That's an interesting result! How did you set up the timing of 44100 Hz? And how is the CPU time being measured?

SIMPLEPLAYER.PRG draws the CPU bar using a "deluxe" version of the old-school C64 trick where you INC $D020 and then DEC $D020 when the routine is finished, creating a "raster color bar" on the border. The player makes this bar by pointing L0 at L1's screen memory area (i.e. switching the display to use L0 instead of L1), creating a second screen memory area at $4000 and clearing it, then drawing the bar at the right edge of the screen from top to bottom. Then L0 is activated and L1 is disabled, so nothing looks different on screen.

Then whenever it calls the player, it first unhides L1 and then when the player finishes, it re-hides L1. In order to put the bar at the top of the screen, I used a LINE IRQ on line 0 to trigger the music instead of just using VSYNC, as that would result in most of the activity happening during VBLANK - thus not visible on screen.

As for the timing - zsound's zsmplayer has 4 main routines for playing music: startmusic, playmusic, stopmusic, and stepmusic.

startmusic: sets up all the pointers and counters and such for the selected tune (pass it a pointer to where you loaded the ZSM into HIRAM). Among other things, it calculates the ticks/frame value from the tune's playback Hz in the header. In this case, the ZSM header says 44100 Hz, which startmusic calculates as 735 ticks/frame.

stepmusic: processes exactly 1 tick of the music. You can call this directly if you want to accurately play back a tune at some rate other than 60Hz. If that kind of accuracy isn't required, you can just use playmusic.

playmusic: will play back any tune by calling it once per frame. It just uses the ticks/frame value to know how many times to call stepmusic. It keeps residual fractional ticks so if it's 3.5 ticks per frame, then it would call stepmusic 4 times every other frame.

stopmusic - duh.
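The fractional-tick bookkeeping described for playmusic can be sketched in C (the 24.8 fixed-point layout and names are my assumptions, not the library's internals): accumulate ticks-per-frame each frame, step by the integer part, and carry the fraction forward.

```c
#include <stdint.h>

typedef struct {
    uint32_t ticks_per_frame;  /* 24.8 fixed point: 3.5 ticks -> 0x0380 */
    uint32_t accum;            /* running fractional-tick remainder */
} tick_sched;

/* How many times to call stepmusic on this frame. */
uint16_t ticks_this_frame(tick_sched *s)
{
    s->accum += s->ticks_per_frame;
    uint16_t n = (uint16_t)(s->accum >> 8);  /* whole ticks owed */
    s->accum &= 0xFF;                        /* keep only the fraction */
    return n;
}
```

For 3.5 ticks/frame this alternates 3, 4, 3, 4, ... exactly as described; a 44100 Hz tune (735 ticks/frame) steps a constant 735 times with no drift.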

 

Edited by ZeroByte

This is also how Deflemask works when making NTSC YM2151 tracks - every delay is a multiple of 735. My VGM conversion code just divides all delay values by 735 to make them 60Hz ticks. If a source VGM were for PAL, it would have delays in multiples of 882, but Deflemask defaults to NTSC, as it is intended to create tracks for Sega arcade boards.


My VGM-to-ZSM conversion script does this as well. VGMs natively use a 44100 Hz sample rate. Until yesterday, it always just resampled time to 60 Hz. Since I wanted the format to support higher resolutions in case they're needed for some reason, I needed to make the reference implementations support the feature too, right? 🙂

Now my encoder does 60 Hz by default but supports a command-line flag, -t, so you could use -t240 to get 4 ticks per frame of time resolution. It's a shame the emulators don't support VIA 6522 timer IRQs, because I'd like to try some of the tunes from City Connection at 44100 "in the lab" - I suspect the PSG would be more audible.


Thanks for the clarification. I missed that part.

On 11/22/2021 at 11:34 PM, ZeroByte said:

Obviously, this "step it 735 times per frame" method is still not as accurate (or as CPU-intensive) as generating IRQs at 44.1 kHz

So you're doing 60 Hz playback with two different methods, one with and one without prior conversion of the music data to 60 Hz. Gotcha 🙂


On 11/23/2021 at 9:08 AM, kliepatsch said:

Thanks for the clarification. I missed that part

So you're doing 60 Hz playback with two different methods, one with and one without prior conversion of the music data to 60 Hz. Gotcha 🙂

Yep - the main difference is that the "without prior conversion" method allows the on-system player to play the ticks at a smooth rate or just clump them together as required by the application. 😉

