pouët.net

SSDPCM1 Super by Algotech
screenshot added by algorithm on 2017-06-28 18:24:11
release date : june 2017
added on the 2017-06-28 18:24:11 by algorithm


comments

Overview

This sample compression method is an enhancement of the ssdpcm1 routine that was originally used in the "channels" demo by algotech.

The original ssdpcm1 routine worked by having the encoder choose a single optimum step size for each chunk covering a fixed time frame (20 ms or so).

The decoder would then shape the sampled output from the bit stream, adjusting the sample upwards or downwards by the current chunk's step size.
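A rough sketch of that decode idea (Python pseudocode under assumptions - the clamping range and starting value are illustrative, this is not the actual 6502 routine):

def decode_chunk(bits, step, start=128):
    # one step size for the whole chunk, one bit per output sample
    out, value = [], start
    for b in bits:                       # bits: iterable of 0/1
        value = value + step if b else value - step
        value = max(0, min(255, value))  # clamp to the 8-bit range
        out.append(value)
    return out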

This new implementation has the following enhancements:

Encoder
16-byte look-ahead (in comparison with single-value prediction). This allows the encoder to make locally worse decisions that result in higher overall quality for the chunk.
A chunk can now use any two step sizes rather than one, at the cost of only a very small amount of additional space. The encoder selects one of the two step sizes for every 8 bytes of sample data in the chunk and marks the choice with a single bit, so a whole 256 bytes of sample data only requires 4 extra bytes ((256/8)/8). This bypasses the limitation of a single step size per chunk, allowing one of two optimum step sizes to be used per 8-byte group - higher quality for only around a 10% increase in file size (or less when using a lower sampling rate).
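An illustration of the two-step-size selection (a Python sketch under assumptions, not the actual encoder tool; it chooses bits greedily per sample rather than using the 16-byte look-ahead described above):

def encode_group(samples, step, start):
    # encode one 8-sample group with a fixed step; return bits, end value, error
    bits, value, err = [], start, 0
    for s in samples:
        up = min(255, value + step)
        down = max(0, value - step)
        if abs(up - s) <= abs(down - s):
            bits.append(1); value = up
        else:
            bits.append(0); value = down
        err += (value - s) ** 2
    return bits, value, err

def encode_chunk(samples, step_pair, start=128):
    # step_pair: the two candidate step sizes chosen for this chunk
    control_bits, sample_bits, value = [], [], start
    for i in range(0, len(samples), 8):
        group = samples[i:i + 8]
        results = [(encode_group(group, step, value), sel)
                   for sel, step in enumerate(step_pair)]
        (bits, value, _), sel = min(results, key=lambda r: r[0][2])
        control_bits.append(sel)     # 1 control bit per 8-sample group
        sample_bits.extend(bits)     # 1 data bit per sample
    return control_bits, sample_bits

How the step pair itself is picked (for example by trying candidate pairs and keeping the one with the lowest total error) is left out of the sketch.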

Decoder
Constant CPU usage per frame, achieved by playing back from the stack with a decrementing pointer, with the decoded data pushed to the stack ahead of time. This saves a lot of headaches and lowers CPU usage.
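A loose analogy of that scheme (a Python sketch of the behaviour only, not the real 6502 stack trick): the main loop decodes a whole chunk into a buffer ahead of time, and the per-sample interrupt just pulls one pre-decoded byte via a decrementing index, so its cost is identical on every frame.

class Playback:
    def __init__(self):
        self.buffer = []
        self.index = 0

    def refill(self, decoded):
        # main loop: "push" the next chunk's decoded samples ahead of time
        self.buffer = list(reversed(decoded))
        self.index = len(self.buffer)

    def nmi_tick(self):
        # "NMI": constant, tiny amount of work per sample update
        self.index -= 1
        return self.buffer[self.index]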

The demo was originally just going to be a proof of concept with a text screen and audio playing back, but there was some (small) amount of CPU time left over for a few simple effects synced to the audio.

Some more tech details.

The original Axel-F soundtrack was 180 seconds in duration. By cutting up the samples into 4-bar segments, I was able to reduce this to approximately 100 seconds of unique samples, consisting of 50 four-bar segments. As the sample rate is nearly 11 kHz (10800 Hz), unpacked this would not only use over a megabyte of disk space, it would also be impossible to stream at this sample rate (unless there were heavily looping, repeating sections).

Each 4-bar sample is 22032 bytes of 8-bit data and consists of over a hundred 216-byte segments.
Each 216-byte segment has its own step values and is condensed into the following:

2 bytes - the two optimum step values for the chunk
4 bytes - a 27-bit control code selecting step1 or step2 for each 8-byte section
27 bytes - bit stream (27*8 = 216 one-bit values)

Hence 216 bytes are packed into 33 bytes, giving a size reduction of nearly 7:1.
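A sketch of that 33-byte layout in Python (bit order and field order here are assumptions for illustration; the real on-disk format may differ):

def pack_segment(step_pair, control_bits, sample_bits):
    # step_pair: two step bytes; control_bits: 27 x 0/1; sample_bits: 216 x 0/1
    assert len(control_bits) == 27 and len(sample_bits) == 216
    def to_bytes(bits):
        out = bytearray((len(bits) + 7) // 8)
        for i, b in enumerate(bits):
            if b:
                out[i // 8] |= 0x80 >> (i % 8)   # MSB-first packing (assumed)
        return bytes(out)
    return bytes(step_pair) + to_bytes(control_bits) + to_bytes(sample_bits)

# 2 + 4 + 27 = 33 bytes per 216 samples, i.e. 10800 Hz * 33/216 = 1650 bytes/second.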

Why 216 bytes per frame?

In order for this method to sound OK, it ideally needs to be pushed at a higher sample rate (unlike ssdpcm2, it is not feasible to use at lower sample rates). Using more frequent updates for the step sizes would increase the quality further, however.

The total number of cycles per frame on PAL is 19656, and 19656/216 gives an exact integer value (91), which allows the NMI to update the same number of times every frame. (Of course there are ways to constantly adjust $dd04 per frame to approximate a fractional value - and that does work, but I opted for the other approach instead.)
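The arithmetic, spelled out (Python):

cycles_per_frame = 312 * 63    # PAL: 312 raster lines x 63 cycles/line = 19656
print(cycles_per_frame % 216)  # 0 - the update rate divides the frame exactly
print(cycles_per_frame // 216) # 91 cycles between sample updates
print(216 * 50)                # 10800 samples/second at ~50 frames/second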


disk streaming

The next issue was to get the streaming to work well...

The NMI uses 34 cycles (including jmp $dd0c to save a cycle). The translation table for the write to $d418 is inside the NMI, which only saves/restores the accumulator. A cycle gain per update would have been possible by including the translation within the decoder, but RAM space was limited (as the decoder uses custom code per 8-bit pattern for a speed gain). That would have saved nearly 4 raster lines per frame... not much.

It would have been possible to save 5 cycles per update by using only the Y or X register in the NMI and not saving/restoring it, but that register could then not feasibly be used outside the NMI.

Hence 34 cycles plus some latency would on average consume 38 cycles or so per update.

That does not sound like much, but 216 updates per frame x 38 cycles is 8208 cycles, which equates to over 130 raster lines of usage just for the NMI update (playing from a page buffer).
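The same kind of arithmetic for the playback budget (Python):

nmi_cycles = 216 * 38    # updates per frame x ~38 cycles per update = 8208
print(nmi_cycles / 63)   # ~130 raster lines per frame spent purely on playback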

Combine this with the decoder in the IRQ that decompresses the sample data in real time, and this results in even more usage.

This leaves little CPU time for the loader and cuts the approximate loading speed down to around 2K per second or less.

Now, this would not be a problem if there were many repeated consecutive sample loops (which would allow the loader to load the next segment in time), but half a dozen or more unique 4-bar patterns can be played back consecutively.

Now you may calculate that 10800 Hz packs down to 1650 bytes per second, which should leave enough time to load consecutively... wrong.

There are mechanical delays in the floppy drive. In particular, I am using over 50 files on the disk, while only 42 files can be cached this way using Bitfire. Also, as samples can be loaded from different areas of the disk, further delays can be present. This was found out the hard way when noticing that VICE, Turbo Chameleon and 1541U2 do not implement the mechanical floppy delays.

To resolve the issue, I have implemented dual-stage caching, so that even if a trigger to load the next file arrives while the previous one is still loading, it is handled just fine. I have also given some "breathing" space in between every two chunk loads to ensure that a fresh batch of load requests will not result in an overrun. Utilising more buffer slots and reorganising the order in which chunks are loaded also ensures that the required chunks are loaded in time.
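A loose model of the dual-stage caching behaviour (a Python sketch of the idea, not the actual Bitfire-based loader code): requests go into a two-slot queue, so a trigger that arrives while the previous file is still streaming in is simply queued instead of being lost, and anything beyond that is treated as an overrun.

from collections import deque

class DualStageCache:
    def __init__(self, load_fn):
        self.load_fn = load_fn   # blocking "load this file" callback
        self.pending = deque()   # current load plus one queued request

    def request(self, filename):
        if len(self.pending) >= 2:
            raise RuntimeError("overrun - give the loader breathing space")
        self.pending.append(filename)

    def tick(self):
        # called regularly; completes at most one pending load per call
        if self.pending:
            self.load_fn(self.pending.popleft())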

Overall, together with the sequencing, the whole 180-second (3-minute) track has been condensed to less than 1K per second at an 11 kHz sample playback rate - approximately 164K of sample data on the disk.

effects

Now onto the effects. There is not much RAM left, and only a very small amount of CPU time per frame is available due to the ssdpcm decoder running every frame plus the NMI, hence what you see on the screen are just some visualiser-based effects.

These are adjusted based on the sample that is actually playing, and the various effects are synced to the sample segments being played back.

Sample output

The actual writes that produce the sample in the NMI use the Mahoney method, which allows a single write to $d418 via the relevant amplitude table, producing higher than 4-bit playback. There is auto-detection in the demo that should adjust depending on whether your C64 has an old or a new SID. Be warned, however, that the old-SID results sound more distorted and varied. For higher quality, a new SID revision is recommended (although this may also vary between different C64 machines). If using an emulator, make sure you select ReSID and the 8580. It will not work on FastSID emulation.
added on the 2017-06-28 18:25:47 by algorithm
efforts+platform
rulez added on the 2017-07-01 20:50:06 by frog
Teh "crazy" scientist - Algorithm, had done another great research project!
rulez added on the 2017-07-01 22:56:22 by sim
would be even cooler if you were streaming the prodigy :P
rulez added on the 2017-07-01 22:57:59 by dodke
Dodke, Don't worry, that is for later (At SSDPCM2 quality) with full music video :-)
added on the 2017-07-01 23:18:34 by algorithm
Top quality, as always! And as usual, tasty explanations from the author :)
rulez added on the 2017-07-27 21:08:29 by lvd
Very interesting tech!
rulez added on the 2021-04-27 13:02:38 by Dresdenboy
