Ideal AUX philosophy and compromises

Posted: September 27th, 2018, 8:21 pm
by bjkwon
As mentioned here, in AUX, we describe sounds or sound processing as a conceptual entity rather than an object to implement in a device. This philosophy stands firm because, in most cases, the audio signals we deal with in AUXLAB come with an unambiguous mathematical equation or algorithm; sounds can therefore be precisely defined in the AUXLAB environment without an explicit sample-by-sample description of the digital waveform. But what if the sound, as it exists at the conceptual level, does not automatically come with an unambiguous, deterministic representation? In other words, what if we have a relatively clear idea of the sound we want, but the idea is clear only at the conceptual level and more information (i.e., clarification) is required to actually make it? At its core, the philosophy is that AUX should be a machine that converts concepts or ideas of sounds into actual sounds without low-level housekeeping chores. I admit that this philosophy is somewhat idealistic. In reality, there are plenty of cases where "concepts only" will not work and more specifications, which some users may at times consider "too technical," are necessary to make the actual sound.

This is where two noble goals of software design collide--ease of use vs. versatility. Software that focuses too much on the former often lacks general and comprehensive features. On the other hand, software that boasts versatility often demands a "learning curve" from the user, because it is not easy to use on Day 1. Like many other software architects, I want both. Then, how? Compromises must be made somewhere in the middle, and we need to make wise decisions about where and how to make them.

In the end, I want AUX code to 1) be consistent with our conception of sounds (easy to use) AND 2) be unambiguous and deterministic as is. Here, I am giving up more on the latter than on the former; i.e., the syntax should stay intuitive and represent ideas and concepts. If that is not sufficient to define one and only one sound, well, then we should at least document the details so that capable users can dig deeper and make use of them.

Before we proceed, let's clarify what we mean by "unambiguous and deterministic" vs "not deterministic." Time to consider some technical details. Here we have two cases:
  • Sounds or sound processing with a unique, unambiguous, and deterministic representation
  • Sounds or sound processing where our conceptual description doesn't provide a deterministic representation of the signals
First, the sound represented by tone(f,d) is one and only one. (Granted, a complete description needs two more parameters, the peak amplitude and the starting phase, which are omitted here; but that's an easy problem to resolve.) Here, given the AUX code, we can determine the signal completely and unambiguously with a mathematical expression. Another example is a so-called tone-glide, where the frequency of the tone changes (glides) from f1 at the beginning to f2 at the end. This is also a case where one and only one mathematical expression produces it. So you write in AUX: tone([f1 f2],d). Then AUXLAB will make it for you.
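For readers who do want to see what sits behind a call like tone(f,d) or tone([f1 f2],d), here is a minimal Python sketch. It is not AUXLAB's implementation. It assumes a linear glide, a duration given in seconds, and a sampling rate passed as a parameter (fs, amp, and phase are names chosen for this illustration); AUXLAB's own units and defaults may differ. The only real content is that the phase of a glide is the integral of its instantaneous frequency.

```python
import math

def tone(freq, dur, fs=16000, amp=1.0, phase=0.0):
    """Sine tone, or linear tone-glide if freq is a pair (f1, f2).

    Hypothetical helper for illustration only; dur is in seconds.
    """
    if isinstance(freq, (int, float)):
        f1 = f2 = float(freq)          # steady tone: start and end frequency agree
    else:
        f1, f2 = freq                  # glide from f1 to f2 over the duration
    n = int(round(dur * fs))
    out = []
    for i in range(n):
        t = i / fs
        # Instantaneous frequency f(t) = f1 + (f2 - f1) * t / dur.
        # The phase is 2*pi times the integral of f(t) from 0 to t.
        ph = 2 * math.pi * (f1 * t + (f2 - f1) * t * t / (2 * dur)) + phase
        out.append(amp * math.sin(ph))
    return out

steady = tone(1000, 0.01)              # 10 ms of a 1000 Hz tone
glide = tone((500, 2000), 0.5)         # 0.5 s glide from 500 Hz to 2000 Hz
```

Note that simply plugging a time-varying frequency into sin(2*pi*f(t)*t) would give the wrong glide; integrating the frequency is the step people most often miss.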

At this point, if you wonder whether one needs to know how to actually make the tone-glide, I would say it depends on who you are. If you are an engineer, you do need to be able to do it yourself, if need be. If you are an experimental psychologist doing research on human auditory perception, or an architectural acoustician measuring some characteristic of a room, and you have no clue how to generate a tone-glide from scratch, most likely you are forgiven. In either case, what's important is understanding the perceptual entity of this thing called a "tone-glide," a tone that begins at one frequency and ends at a different frequency, rather than how the equation is actually written or how to do the low-level coding to implement it.

A third example is shifting the spectrum of a sound by a certain frequency. Once again, it is done with a relatively straightforward algorithm and one can produce the result indisputably with a certain level of training in signal processing. Given the signal x, another signal with the same spectrum but shifted by d Hz in frequency is x->d. There is no ambiguity.
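One common way to carry out such a shift, sketched here in Python, is to form the analytic signal (positive frequencies only), multiply it by a complex exponential at the shift frequency, and take the real part. This is an illustration under my own assumptions, not necessarily how AUXLAB implements it: the function names are hypothetical, and a direct O(N^2) DFT is used only to keep the sketch free of dependencies, so it is suitable for short signals.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal of a real sequence via a direct O(N^2) DFT."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                   # DC and Nyquist bins are kept once
        elif k < (n + 1) // 2:
            spec[k] *= 2               # positive frequencies are doubled
        else:
            spec[k] = 0                # negative frequencies are removed
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def shift_spectrum(x, shift_hz, fs):
    """Shift every spectral component of x upward by shift_hz."""
    z = analytic_signal(x)
    return [(z[t] * cmath.exp(2j * math.pi * shift_hz * t / fs)).real
            for t in range(len(z))]

fs = 1000
x = [math.cos(2 * math.pi * 100 * t / fs) for t in range(200)]  # 100 Hz tone
y = shift_spectrum(x, 50, fs)                                   # now a 150 Hz tone
```

Unlike a change of pitch, this moves every component by the same number of Hz, so harmonic relationships between components are not preserved.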

On the other hand, there are plenty of cases where our conceptualization of sounds or sound processing is insufficient to describe them. One example is filtering. Say you have a customer who wants a "generic" lowpass filter with a cutoff frequency of 1000 Hz. If you are an engineer, you would naturally ask for more technical specifications--the filter order, the slope, the stopband attenuation, the passband ripple, etc. Even with the full technical spec, there are multiple ways to build the filter, each with its pros and cons. A naive customer who doesn't understand how filters are designed often finds it perplexing that this has to be so complicated; all he wants is a damned lowpass filter with a 1000 Hz cutoff frequency, something he considers common and easy to make. You might tell him that there is no such thing as a "common" filter spec and ask him to be more specific. Well, good luck with that. More often than not, he would be your boss. Then you might as well figure out what he needs from the limited information without confronting him, perhaps by collecting some examples he would find acceptable and making something similar.

The lesson here is that, in the field of acoustics and audio engineering, both perspectives have valid points. There are plenty of professionals without a full knowledge of signal processing but with a great deal of insight into sounds and auditory perception. Or they may know signal processing perfectly well, so it's not about their level of knowledge; rather, within their domain there may truly be some common filter specifications that they use frequently and that engineers are not familiar with unless they have interacted with those professionals enough.

Therefore, however awkward it may seem from an engineer's perspective, in the scope of AUX I decided to acknowledge the need for "generic" filtering based on cutoff frequencies and included built-in functions for it: lpf, hpf, bpf, and bsf, all of which filter the signal with an IIR filter with "default" parameters. Now you can simply respond to your boss with just one line of AUX:

x.lpf(1000)

Chances are, he will be satisfied. Of course, if he is not, you can pass more parameters to the function and fine-tune the output. You might say that the AUX philosophy has been compromised, because the meaning of x.lpf(1000) is not universal. There is a risk of disagreement between my choice of "default" filtering parameters and yours. But, from the user's perspective, having default parameters and evaluating whether they suit their needs (and if not, they can always supply additional parameters; that's an easy solution) is better than having no defaults at all just because the defaults may never be perfect. Therefore, in terms of the practical functionality of the software, I consider this an acceptable compromise. Those who still do not agree with this approach can simply ignore these four functions, design their own filters, and use the filt or filtfilt function, as done in MATLAB.
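To make the idea of a "default" filter concrete, here is a small Python sketch of a lowpass filter where only the cutoff is required and everything else falls back to a default. The defaults below (a second-order Butterworth-style biquad, Q = 1/sqrt(2), fs = 16000) are my assumptions for this illustration and are not the parameters AUXLAB's lpf actually uses; the function names are hypothetical as well.

```python
import cmath
import math

def lowpass_coeffs(cutoff_hz, fs=16000, q=1 / math.sqrt(2)):
    """Second-order lowpass (biquad) coefficients, normalized so a[0] == 1."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2 * q)
    cosw = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 - cosw) / 2 / a0, (1 - cosw) / a0, (1 - cosw) / 2 / a0]
    a = [1.0, -2 * cosw / a0, (1 - alpha) / a0]
    return b, a

def lpf(x, cutoff_hz, fs=16000, q=1 / math.sqrt(2)):
    """Filter x with the default lowpass; extra arguments fine-tune it."""
    b, a = lowpass_coeffs(cutoff_hz, fs, q)
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        # Direct-form I difference equation of the biquad
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def gain_at(freq_hz, cutoff_hz, fs=16000, q=1 / math.sqrt(2)):
    """Magnitude response of the filter at freq_hz."""
    b, a = lowpass_coeffs(cutoff_hz, fs, q)
    z = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs((b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z))

y = lpf([1.0] * 2000, 1000)            # the "one-liner": only the cutoff is given
```

With these defaults the gain at the cutoff is 1/sqrt(2), about -3 dB. A user who wants a resonant peak or a different slope changes q or cascades sections, which is exactly the kind of detail the one-line call hides.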

It goes even further. Another example is time-stretching (or time-compressing) or pitch-altering a signal. For those who haven't studied signal processing enough: this is completely different from the spectrum shifting mentioned above. Much like filtering, there is no single algorithm for this type of processing that everyone indisputably agrees on. Worse than filtering, not all engineers can do this off the top of their heads. That is why most algorithms for these kinds of processing are proprietary and usually included in expensive DAW packages. Regardless of engineers' opinions, for "audio people," these seemingly simple features--just stretch or compress the signal in time without changing the pitch, or raise or lower the pitch while keeping the duration--are very appealing and much desired. Therefore, I included them as built-in functions.

x.tscale(1.1) //make the signal 1.1 times longer
x.fscale(5) //increase the pitch by 5 semitones

Notice that these examples deviate from the AUX philosophy even more than filtering does, because they require more advanced algorithms that are not readily understandable to many users. Again, I think users would be happy to have them as built-in functions, because the alternative is not having them at all. Plus, AUXLAB is free; you can't beat the cost. If, for whatever reason, you don't like the performance of these functions, you are welcome to develop your own (and please share them with others). Now that it has a debugger, AUXLAB offers a significantly more user-friendly environment for developing audio processing algorithms than any other platform.
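To show why time-stretching is not a one-formula operation, here is a deliberately naive Python sketch using plain overlap-add (OLA): windowed frames are read from the input at one hop size and written to the output at another. This is not the algorithm behind tscale and is not meant as a usable replacement; the function name, frame size, and window choice are assumptions made for this illustration.

```python
import math

def ola_stretch(x, factor, frame=256):
    """Naive overlap-add time stretch; output is about factor times longer.

    Requires len(x) >= frame. Illustration only.
    """
    hop_out = frame // 2                       # fixed output (synthesis) hop
    hop_in = hop_out / factor                  # input (analysis) hop
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    n_frames = int((len(x) - frame) / hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        start_in = int(round(m * hop_in))      # where the frame is read
        start_out = m * hop_out                # where the frame is written
        for i in range(frame):
            y[start_out + i] += win[i] * x[start_in + i]
            norm[start_out + i] += win[i]
    # Divide by the summed window so the overall level is preserved
    return [v / w if w > 1e-8 else 0.0 for v, w in zip(y, norm)]

x = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(4000)]
y = ola_stretch(x, 1.1)                        # roughly 1.1 times as many samples
```

Plain OLA makes no attempt to keep the waveform aligned from one frame to the next, so tonal sounds pick up audible artifacts. Practical algorithms differ mainly in how they deal with that alignment problem, which is where the disagreement and the engineering effort go.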

Anyway, there is no regret for allowing these compromises. With proper justifications, functionality can override philosophy.

What do you think?

Re: Ideal AUX philosophy and compromises

Posted: April 18th, 2019, 9:39 pm
by bjkwon
Update---the tscale and fscale functions mentioned above were based on the phase vocoder and the performance was not satisfactory. Sometimes it was embarrassingly bad, so I decided to leave them out of the built-in functions list.

But, a little later, I came across the WSOLA algorithm, http://www.audiolabs-erlangen.de/resour ... SMtoolbox/, which showed very good performance. So I adopted the algorithm (i.e., I translated their MATLAB code into my C++ code) and put these functions back, but with different names, as follows:

x.timestretch(1.1) //make the signal 1.1 times longer
x.pitchscale(1.1) //increase the pitch by a factor of 1.1
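Note the change of units: the old fscale took semitones, while pitchscale takes a frequency ratio. A short Python sketch of the conversion, assuming equal-tempered semitones (the helper names are hypothetical):

```python
import math

def semitones_to_ratio(semitones):
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)

def ratio_to_semitones(ratio):
    """Equal-tempered semitones corresponding to a frequency ratio."""
    return 12.0 * math.log2(ratio)

r = semitones_to_ratio(5)      # about 1.335, the ratio for 5 semitones up
s = ratio_to_semitones(1.1)    # about 1.65 semitones for a ratio of 1.1
```

So a ratio of 1.1 is a much smaller change than the 5 semitones in the earlier example.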

Although the WSOLA algorithm works quite well, the point I was making in the original post still stands. A lot of people, including myself, seem to be happy with it, but it is not perfect by any means. These functions are there for practical reasons, and WSOLA is the best choice for now, but it is not the ultimate choice.