
How to write performant realtime audio code

  • Writer: Niccolo Abate
  • Oct 3, 2023
  • 9 min read

Updated: Oct 4, 2023

The goal of this article is to break down the knowledge of performant realtime audio programming I have accumulated, hopefully in an approachable way, with examples. I still have much to learn, but I hope to distill what I do know. Please don’t hesitate to teach me something or discuss.


I will cover some highlights of realtime audio programming, such as heap memory allocation, discuss thread structure considerations at length, and finish with some lower level optimizations.


I will bring this up a few times throughout, but I want to make this note before starting. It is important to think about your goals for any given part of the development cycle. At the beginning you might want to focus on getting something working so you can test earlier, without worrying as much about optimization. If instead your goal is to achieve something well defined but make it fast, you might want to think about performance from the get go. Some of what I will talk about you should always keep in mind; some of it you might never need. It is all good to understand, but make sure your code works properly and produces a desirable result first, so you aren’t optimizing something that isn’t worth it.


Now, on to the audio programming:




Don’t allocate heap memory during the audio processing call

Heap memory should never be allocated during the audio processing call because it introduces unpredictability, meaning potential spikes in processing time, or hanging up in the worst case (we will discuss this further with threads). Allocating heap memory may require a call into the operating system to find and map free memory, and that call can take an unbounded amount of time.


Common heap allocation pitfalls are allocating buffers (such as audio buffers) and initializing or copying dynamically sized data structures (such as vectors).


There are a couple solutions to consider:


Preallocate your memory

The main solution to this problem is preallocating all your memory. Instead of creating an audio buffer inside the scope of the audio processing function (which typically would create the structure on the stack, but allocate the actual data onto the heap), create a buffer scoped outside the audio processing function and allocate the memory ahead of time. Now the same memory will be reused every block, instead of allocating a new block of temporary memory every time.
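As a minimal sketch (the prepare / process lifecycle here is illustrative, standing in for whatever setup callback your framework provides):

```cpp
#include <vector>

// Illustrative processor: prepare() is called once, off the audio thread,
// before playback; process() is called on the audio thread every block.
class Processor
{
public:
    // Allocate once, up front, for the largest block we will ever see.
    void prepare (int maxBlockSize)
    {
        scratch.resize (maxBlockSize); // heap allocation happens here
    }

    // Reuse the preallocated buffer every block -- no allocation here.
    void process (const float* in, float* out, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
            scratch[i] = in[i] * 0.5f; // some intermediate work

        for (int i = 0; i < numSamples; ++i)
            out[i] = scratch[i];
    }

private:
    std::vector<float> scratch; // lives outside the processing call
};
```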


Stack based data structures

Another trick that can be leveraged under the right circumstances is a stack based, dynamically sized data structure: for example, a StackVector that behaves like a normal vector but manages a fixed size buffer on the stack instead of heap memory. This can be a simple and effective way to mitigate the heap issues, but it should only be used where the maximum amount of data is known ahead of time and is not too large.


Stack based data structures can be implemented through use of a stack allocator in conjunction with STL data structures. However, this is not a trivial thing to do in practice, depending on the allocation and deallocation strategy of the given data structure, and I am not going to get into this right now (maybe in a later post). With that in mind, consider when to simply create a stack based data structure from scratch, such as a simple stack vector. If you go this route, I recommend maintaining the STL API if possible, so your structure can be used interchangeably (for the functionality you need to support).
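A rough sketch of what such a from-scratch structure might look like (a hypothetical StackVector keeping a small slice of the std::vector API):

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity vector living entirely on the stack (no heap allocation).
// Only a small slice of the std::vector API is shown.
template <typename T, std::size_t Capacity>
class StackVector
{
public:
    void push_back (const T& value)
    {
        assert (count < Capacity); // the caller guarantees the maximum size
        items[count++] = value;
    }

    void clear() noexcept              { count = 0; }
    std::size_t size() const noexcept  { return count; }

    T&       operator[] (std::size_t i)       { return items[i]; }
    const T& operator[] (std::size_t i) const { return items[i]; }

    T* begin()  { return items.data(); }
    T* end()    { return items.data() + count; }

private:
    std::array<T, Capacity> items {};
    std::size_t count = 0;
};
```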




Handle threads properly

Understanding and handling threads properly is a very important component of performant realtime audio code, especially as the scale of the project increases. Even for small projects, however, it is good to have some understanding of threads and the thread structure of your program.


Most realtime audio environments have some version of the following thread structure: audio thread, GUI / Message thread, and other / worker threads (when needed).


The audio thread is the most important thread, doing the audio / DSP work – the audio processing call.


The GUI / Message thread is the main other thread of the program, handling non-audio processing including GUI, message passing, and other functional considerations of the program. This thread handles GUI callback functions and processes, which are usually called / updated at a lower rate than the audio thread.


Other / worker threads are additional threads employed to accomplish any number of goals, typically either some other loosely synchronized process or a truly parallel version of part of the program in pursuit of better performance. Use of these threads is very dependent on the particular program. Benefits of truly parallel solutions in the audio processing call are rare, for reasons I will discuss later in this section, though I expect truly parallel solutions to become more common.


Now, on to thread handling best practices.


Never let the audio thread hang.

We never want the audio thread to hang, as even a short hang up can lead to audible artifacts in the output if a block isn’t completed in time.


There are a few main things to think about in regard to this:

  1. Be careful with critical sections (mutexes / locks). Critical sections can still find use in audio code, but they should be used sparingly. If you are going to use a critical section, have the audio thread try the lock instead of hanging on it, and let the audio thread move on and try again next block if the lock isn’t free (see the sketch after this list).

  2. Never hang on another thread to finish something (again, we will discuss truly parallel solutions later). This is related to the last point, but more general. With most synchronization schemes there will be some sort of lock or barrier mechanism, and we want to make sure we essentially try the lock instead of hanging on it. If we set another thread to process a chunk of data (say, graphics processing) and it doesn’t finish in time for the next chunk from the audio thread, let the audio thread move on – dropping a frame of graphics or queuing up the data, but never blocking the audio thread. Once again, the audio thread takes priority: it typically isn’t a big deal if we drop a frame of graphical data, but it is a big deal if we introduce artifacts into the audio output by hanging up.

  3. Never make any calls that might hang (namely OS / system calls). Think back to the memory allocation section. That same reasoning applies to system calls, file I/O, etc.
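As a minimal sketch of the try-the-lock pattern from point 1 (the mutex and parameter here are assumptions standing in for whatever state the message thread edits):

```cpp
#include <mutex>

std::mutex paramLock;          // also taken by the message thread when editing
float      pendingGain = 1.0f; // written by the message thread under the lock
float      activeGain  = 1.0f; // owned by the audio thread

void processBlock (float* buffer, int numSamples)
{
    // Try the lock. If the message thread happens to hold it, keep using
    // last block's value and try again next block -- never wait.
    if (paramLock.try_lock())
    {
        activeGain = pendingGain;
        paramLock.unlock();
    }

    for (int i = 0; i < numSamples; ++i)
        buffer[i] *= activeGain;
}
```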


Break up the work appropriately between threads.

We want to minimize the work of the audio thread; it should do only the bare minimum amount of work to achieve our desired goal, no more, no less. With that in mind, we want to factor out all not-strictly-necessary code, namely graphics processing. A graphical interface will always have some effect on the audio thread – the graphical representation in our software has meaning only in conjunction with the audio, after all. At the very least, some amount of synchronization must take place between the threads and some control parameters must be passed between them; for more advanced interfaces, additional data must be passed and further processing might be necessary. In practice, completely factoring out the graphics processing while maintaining proper synchronization and thread safety can be a complex process, but it is very fruitful.
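As a minimal sketch of the simplest form of this data passing – a single level value published by the audio thread for a GUI meter, with an atomic instead of a lock (richer interfaces usually need something more, such as a lock-free FIFO):

```cpp
#include <algorithm>
#include <atomic>
#include <cmath>

// Shared between threads: the audio thread publishes, the GUI polls at
// its own (lower) rate. No locks, no blocking on either side.
std::atomic<float> meterLevel { 0.0f };

// Audio thread: publish the latest peak, never wait on the GUI.
void processBlock (const float* in, int numSamples)
{
    float peak = 0.0f;
    for (int i = 0; i < numSamples; ++i)
        peak = std::max (peak, std::fabs (in[i]));

    meterLevel.store (peak, std::memory_order_relaxed);
}

// GUI thread: read whatever the most recent value is on each repaint.
float getMeterLevelForDisplay()
{
    return meterLevel.load (std::memory_order_relaxed);
}
```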


I have another article, about how I lowered my plugin's CPU usage by 66% with one fix, that addresses exactly this type of issue during the development of DelayCat.


Similarly, you might want to create extra threads to handle a separate piece of work that isn’t part of the main audio processing. Considerations for handling these threads are very similar to those we just discussed for the GUI / message thread. The main question is when you should do this, and why. Using additional threads like this is a good idea when you have a large process that affects the audio output on a much longer timescale, or has no effect on the audio output at all. Examples might include running statistical analysis (dimension reduction, clustering, etc.), learning algorithms, or other heavyweight processes.
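A rough sketch of one way such a handoff might look, with a single preallocated slot and an atomic flag (runExpensiveAnalysis is a hypothetical placeholder for the heavyweight process):

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <chrono>
#include <thread>

constexpr int analysisSize = 1024;

std::array<float, analysisSize> analysisBuffer {};  // preallocated slot
std::atomic<bool> bufferReady { false };

void runExpensiveAnalysis (const std::array<float, analysisSize>&); // hypothetical

// Audio thread: drop off a copy only if the worker consumed the last one;
// otherwise just skip this block -- never wait.
void postForAnalysis (const float* block, int numSamples)
{
    if (numSamples == analysisSize
        && ! bufferReady.load (std::memory_order_acquire))
    {
        std::copy (block, block + numSamples, analysisBuffer.begin());
        bufferReady.store (true, std::memory_order_release);
    }
}

// Worker thread: poll for new data on its own, much slower schedule.
void analysisLoop (std::atomic<bool>& running)
{
    while (running.load())
    {
        if (bufferReady.load (std::memory_order_acquire))
        {
            runExpensiveAnalysis (analysisBuffer);
            bufferReady.store (false, std::memory_order_release);
        }
        std::this_thread::sleep_for (std::chrono::milliseconds (10));
    }
}
```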


Make use of truly parallel solutions. Maybe.

One way of increasing performance past the sequential limit is to create a parallel implementation of the algorithm. This is an increasingly important improvement as the number of CPU cores in most processors continues to grow. However, there are some potential issues for realtime audio in this area.


In order to illuminate these, let’s discuss when parallel implementations are most effective: sequentially independent algorithms (i.e. the algorithm can be broken into smaller pieces that can be done in any order and don’t depend on each other) with large amounts of data (the data must be sufficiently large to warrant parallelization and to outweigh the overhead of launching and synchronizing threads).


Audio algorithms are often sequentially dependent. Consider a delay line (particularly one with feedback), which also covers delay based algorithms such as filters, choruses, algorithmic reverbs, etc. In this case, each sample may depend on the last (depending on the delay time), meaning we can’t break the block of samples into chunks and compute them independently. This is one example, but the point generalizes to many different audio processes.
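A minimal sketch of that dependency in code:

```cpp
#include <cstddef>
#include <vector>

// Feedback delay: y[n] = x[n] + feedback * y[n - delay].
// Each output sample reads an *earlier output* sample, so whenever the
// delay is shorter than the block, samples cannot be computed out of order.
struct FeedbackDelay
{
    std::vector<float> line;      // preallocated, length >= delaySamples
    std::size_t writePos = 0;
    std::size_t delaySamples = 1;
    float feedback = 0.5f;

    void process (float* buffer, int numSamples)
    {
        for (int i = 0; i < numSamples; ++i)
        {
            std::size_t readPos = (writePos + line.size() - delaySamples) % line.size();
            float y = buffer[i] + feedback * line[readPos];
            line[writePos] = y;   // later samples depend on this write
            writePos = (writePos + 1) % line.size();
            buffer[i] = y;
        }
    }
};
```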


Perhaps more importantly, realtime audio operates on blocks that are fairly small on the scale of parallel algorithms – block sizes typically range from 64 to 1024 samples. Instead of the problem being to process 10,000,000 samples of audio offline all at once, we process in small chunks so that realtime playback can be achieved, which limits the benefit of parallelization; the overhead of launching and synchronizing threads may outweigh the gains. While thread pooling helps with the launch cost, this still often ends up being a fight not worth picking. There are some problems that I am sure warrant parallel solutions, however, and I again want to note that I believe their number will only increase.


As an exercise, consider the FFT. The FFT is a plausible candidate because the work can be broken up effectively, you have a known buffer size, usually in the range of 1024 to 4096 samples (slightly better than the block sizes we discussed before), and it is an expensive enough algorithm to warrant the attention. From what I have read, there are some impressive results; however, parallel FFT algorithms still aren’t common, and I haven’t tried them myself in any realtime code.




Cache data

Think about when you can cache data to avoid recomputing the same things unnecessarily. For the most part, I wouldn’t sacrifice code clarity for this on the very small scale – code clarity is important even if you aren’t sharing the code base, and the compiler can do a lot at that level – but in the higher level parts of your program structure it is important to keep caching in mind.


For example, you might want to read and cache a value at the beginning of every block instead of updating it every sample. Or you might want to update a cached value only when a parameter changes, and only read from the cache otherwise. Alternatively, you might have an expensive function you need to poll – a graph or other complicated function – and it might be worth caching it in a lookup table at whatever accuracy is necessary.
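As a small sketch of the parameter-change case (the dB-to-linear conversion stands in for any expensive update):

```cpp
#include <cmath>

// Recompute the relatively expensive gain conversion only when the
// parameter actually changes; otherwise just read the cached value.
struct CachedGain
{
    float gainDb  = 0.0f;  // last parameter value seen
    float gainLin = 1.0f;  // cached linear gain

    void process (float* buffer, int numSamples, float newGainDb)
    {
        if (newGainDb != gainDb)  // parameter changed since last block
        {
            gainDb  = newGainDb;
            gainLin = std::pow (10.0f, gainDb / 20.0f); // done once, not per sample
        }

        for (int i = 0; i < numSamples; ++i)
            buffer[i] *= gainLin; // cheap cached read
    }
};
```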


As always, this isn’t applicable in every situation, but it should be kept in mind.


Use approximations and special cases

Similarly, think about when you can get away with and benefit from using approximations or taking advantage of special cases.


We don’t always need 100% accuracy or fidelity in audio processing. Sometimes we do, but other times the performance benefit we would get from lowering the fidelity or approximating a value outweighs the perceptual benefit. In fact, often the perceptual difference won’t even be noticeable, and sometimes the cheaper version may even be preferred. Consider panning, for example: you might be perfectly well off with a simpler panning algorithm, and the performance gain might be worth it, especially if you are applying the panning often.
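A sketch of two pan laws side by side – whether the cheaper one is acceptable is a perceptual judgment for your particular use:

```cpp
#include <cmath>

// Constant-power pan: two trig evaluations per update, even level sweep.
void panConstantPower (float pan, float& gainL, float& gainR) // pan in [0, 1]
{
    const float angle = pan * 1.5707963f; // pi / 2
    gainL = std::cos (angle);
    gainR = std::sin (angle);
}

// Linear pan: no trig at all, at the cost of a perceived dip at center.
void panLinear (float pan, float& gainL, float& gainR)
{
    gainL = 1.0f - pan;
    gainR = pan;
}
```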


Additionally, we can sometimes take advantage of the constrained environment of our audio code to make optimizations. An example of this is not checking edge cases that we know can never occur. I have even seen math operations redefined because the standard library implementations must account for lots of edge cases that we know will never arise in our code.
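A small sketch of this kind of special-case optimization: wrapping a delay line index that we know only ever advances by one sample, so a compare replaces the general modulo:

```cpp
#include <cstddef>

// General case: handles any index, at the cost of an integer division.
std::size_t wrapGeneral (std::size_t index, std::size_t length)
{
    return index % length;
}

// Constrained case: the index advances by one per sample, so it can only
// ever overrun by one step -- a compare-and-reset is all we need.
std::size_t wrapByOne (std::size_t index, std::size_t length)
{
    return (index >= length) ? 0 : index;
}
```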


As a note here, I once again want to advocate for safe code, especially in earlier stages of development – consider checking some more of these cases that shouldn’t be possible, and/or using the standard library math implementations first. This doesn't need to be included in release, but unexpected things will happen during development, and a little bit of foresight here can save a huge amount of debugging time later.


Take advantage of your compiler

Compilers do a lot for us, and it is important to think about how to best take advantage of them. I am just going to lay out a couple of tips and tricks I have picked up, but I am sure there are others, and I am still hoping to learn more about how to best make use of my compiler.


Make sure optimizations are fully turned on for release. Chances are, they are already, but we want to make sure we are getting every bit of optimization out of the compiler.
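For reference, that typically means something like the following (the exact flags depend on your build setup):

Clang / GCC: -O3

MSVC: /O2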


Disable exception handling. Obviously, this can’t be done if you are using exceptions in any of your code, or want to catch exceptions from any library code, but I rarely use exceptions (if at all) in my audio code, and so there is no reason to include the extra performance hit involved with supporting them (though it seems to be minimal).


To do this, you simply need to include the following compilation flag(s), depending on your compiler:

Clang: -fno-exceptions

GCC: -fno-exceptions

MSVC: /EHs-c- /D_HAS_EXCEPTIONS=0




Once again, I hope this article was helpful or interesting, and please don’t hesitate to discuss, or teach me something new :)


 
 
 
