Announcing TurboQOA - Streaming QOA Codec

Something I have wanted to write about but never got around to: QOA (Quite OK Audio) is a very simple, lossy audio codec by the same person who brought you QOI, the Quite OK Image format. There is no real reason to use QOA. Until I found one, and found that no one had implemented what I needed yet. So I did.

The use case for me is streaming audio over WebSockets for my TTS engine. I know WebRTC exists, but that thing is a pain in the butt to support unless you are using Electron or running in the browser. The far simpler solution is to dump audio over WebSockets (luckily there are a lot of libraries for that) and decode it on the client side. I had the brilliant idea to support Opus for this. However, interfacing with libopus is hard and I haven't gotten it to work yet. The libopusenc library does streaming encode no problem, but opusfile, the high-level decoder, does not support streaming inputs, and hacks would incur large overhead, making it useless for realtime applications. So I was stuck with raw PCM for a while, which is horrible for bandwidth usage. Hence, QOA. It's about 1/5 the size of PCM, and I can write my own codec, so streaming IO is not going to be a problem. (Opus, by default, is another 3.5x smaller than QOA.)
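To put rough numbers on that 1/5 claim, here's a back-of-envelope sketch. It ignores QOA's small frame and slice header overhead, so the real ratio is slightly less than 5x:

```c
// PCM: 16 bits per sample per channel.
static double pcm_kbps(int sample_rate, int channels) {
    return sample_rate * channels * 16.0 / 1000.0;
}

// QOA: a slice packs 20 samples into 8 bytes, i.e. 64/20 = 3.2 bits
// per sample. Frame/slice headers add a little on top; ignored here.
static double qoa_kbps(int sample_rate, int channels) {
    return sample_rate * channels * (64.0 / 20.0) / 1000.0;
}

// 48 kHz stereo: 1536 kbps of PCM vs. ~307 kbps of QOA -- 16/3.2 = 5x.
```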

Honestly, I have no idea how QOA works internally, so I'm not going to introduce readers to that; read the author's blog post for it. TurboQOA is a complete rewrite, based on the official QOA spec, partially on my previous MagicQOA library (a streaming decoder in C++23), with some constants copied from the reference implementation. I decided to give myself a challenge and write it in plain C. Fun fact: a few months back I tried to contract someone for around USD $400 to write the library, which is basically my own rate for the amount of time I estimated it would take. People said it's not enough. So I did it myself, and I beat the time estimate. Glad I didn't pay that guy.

Anyway, TurboQOA, since it's streaming, has a more complicated API than the reference implementation. Experienced C developers should have no problem with it, though.

// decode
#include <turboqoa/turboqoa.h>
#include <stdint.h>
#include <stdlib.h>

// Create a decoder
struct TurboQOADecoder *decoder = turboqoa_decoder_create();

// Prepare your input buffer. The decoder is streaming, so you can add more data and invoke decode to get more samples
uint8_t *input = ...;
size_t input_size = ...;

int16_t* pcm_buffer = malloc(4096 * sizeof(int16_t)); // output buffer
while(!turboqoa_decoder_decode_done(decoder)) {
    enum TurboQOADecoderWants wants;       // Reason for the decoder to pause decoding
    size_t consumed;                       // How many bytes from the input buffer were consumed
    size_t samples_written;                // How many samples were written to the output buffer
    enum TurboQOADecoderError error = turboqoa_decoder_decode(decoder, input, input_size,
        &consumed, pcm_buffer, 4096, &samples_written, &wants);
    if(error != TURBOQOA_DECODER_ERROR_NONE) {
        // handle error here. the decoder should not be used after an error
        break;
    }
    input += consumed;
    input_size -= consumed;

    // Make use of the samples in pcm_buffer[0:samples_written]
    // ...

    if(wants == TURBOQOA_DECODER_WANTS_MORE_DATA) {
        // Load more data into the input buffer
        // input = ...
        // input_size = ...
    }
}

free(pcm_buffer);
turboqoa_decoder_destroy(decoder);

Encoding works the same way.

// encode (the caller code here happens to be C++; the library itself is plain C)
#include <turboqoa/turboqoa.h>
#include <fstream>
#include <iostream>

size_t sample_rate = 48000;
size_t channels = 2;
size_t samples_per_channel = ...; // set to 0 if you don't know how many samples you have (streaming mode)

// Create an encoder. The callback is called whenever the encoder wants to write encoded data.
// You could point it at a file like I do here, or a network socket, or a buffer
std::ofstream out("out.qoa", std::ios::binary);
struct TurboQOAEncoder* encoder = turboqoa_encoder_create(sample_rate, channels, samples_per_channel,
    [](void* user_data, const uint8_t* data, size_t size) {
        std::ofstream* out = (std::ofstream*)user_data;
        out->write((const char*)data, size);
    }, &out);

// Like the decoder, the encoder is also streaming. You can keep feeding it
// with data and invoke encode to get more frames out
int16_t* pcm = ...;
size_t num_samples = ...;
while(!turboqoa_encoder_encode_done(encoder)) {
    enum TurboQOAEncoderWants wants;
    size_t consumed;
    enum TurboQOAEncoderError error = turboqoa_encoder_encode(encoder, pcm, num_samples, &consumed, &wants);
    if(error != TURBOQOA_ENCODER_ERROR_NONE) {
        std::cout << "Error while encoding. Code: " << error << std::endl;
        break;
    }
    pcm += consumed;
    num_samples -= consumed;

    if(wants == TURBOQOA_ENCODER_WANTS_MORE_DATA) {
        // Load more data into the input buffer
        // pcm = ...
        // num_samples = ...
    }
}

turboqoa_encoder_destroy(encoder);

There is also a dead simple API for encoding and decoding an entire buffer at once, which I figure someone will find useful.

int16_t* turboqoa_decode_buffer(const uint8_t* data, size_t size, uint8_t* num_channels, uint32_t* sample_rate);
uint8_t* turboqoa_encode_buffer(const int16_t* data, size_t size, uint8_t num_channels, uint32_t sample_rate, size_t* out_size);

Pretty cool, huh? Here's the code

Side note

I only discovered that MPV and FFmpeg have support for QOA after I made this post. It would have made debugging the codec a lot easier. Dang it!

Future work if I ever get to it

After reading the reference QOA implementation, I found a feature in the file format that is not being utilized for maximum quality. QOA splits its audio stream into frames, and each frame stores an explicit LMS (Least Mean Squares) predictor state in its header. The reference encoder simply inherits the predictor state from the previous frame, which works. But ideally some heuristic could be used to find the best predictor state for each frame. The decoder already reads the predictor state from the frame header, so that would be a free quality boost. Though I haven't figured out a sane algorithm for it yet. And it's late today.
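For the curious, the idea can be sketched like this. The `lms_predict`/`lms_update` pair below follows the LMS predictor from the QOA spec; the search part (`frame_error`, `best_candidate`) is purely hypothetical and not part of TurboQOA's API. It scores each candidate starting state by total squared prediction error over the frame and keeps the best one:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { int history[4]; int weights[4]; } LmsState;

// Prediction and weight update, as described in the QOA spec.
static int lms_predict(const LmsState *lms) {
    int p = 0;
    for (int i = 0; i < 4; i++)
        p += lms->weights[i] * lms->history[i];
    return p >> 13;
}

static void lms_update(LmsState *lms, int sample, int residual) {
    int delta = residual >> 4;
    for (int i = 0; i < 4; i++)
        lms->weights[i] += lms->history[i] < 0 ? -delta : delta;
    for (int i = 0; i < 3; i++)
        lms->history[i] = lms->history[i + 1];
    lms->history[3] = sample;
}

// Total squared prediction error over one frame for a given starting
// state. (A real encoder would score the quantized residuals instead,
// but this captures the idea.) Takes the state by value, so candidates
// are not mutated.
static uint64_t frame_error(LmsState start, const int16_t *samples, size_t n) {
    uint64_t err = 0;
    for (size_t i = 0; i < n; i++) {
        int residual = samples[i] - lms_predict(&start);
        err += (uint64_t)((int64_t)residual * residual);
        lms_update(&start, samples[i], residual);
    }
    return err;
}

// Pick whichever candidate state predicts this frame best; the winner's
// history and weights would then be written into the frame header.
static size_t best_candidate(const LmsState *candidates, size_t num,
                             const int16_t *samples, size_t n) {
    size_t best = 0;
    uint64_t best_err = frame_error(candidates[0], samples, n);
    for (size_t i = 1; i < num; i++) {
        uint64_t e = frame_error(candidates[i], samples, n);
        if (e < best_err) { best_err = e; best = i; }
    }
    return best;
}
```

The hard part, of course, is generating good candidates in the first place; brute force over the full state space is hopeless, so this only helps once you have a few plausible guesses (inherited state, a state re-trained on the frame, etc.).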

Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict

I run TLGS, a major search engine on Gemini. Used by Buran by default.


  • marty1885 \at protonmail.com
  • Matrix: @clehaxze:matrix.clehaxze.tw
  • Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df