Machine Intelligence Applications in Music

This post is a review of a cloud-based, AI-driven application designed to separate vocals and other instruments from audio files. I imagine the main users are musicians who would like to remix a track, since access to a cappella vocals makes that possible. There may be other uses as well, such as creating tracks for karaoke or sing-along exercises. I was not aware of cloud-based approaches to music track processing until I searched for ways to get hold of a cappella vocals. I had found a few a cappella tracks online but concluded that they were not widely available. However, the search did produce some interesting resources. The first group was tutorials on using existing computer-based audio processing tools, such as Audacity or VST-based plugins. Most of these tutorials seemed to conclude that one can reduce the level of the non-vocal parts of the mix, but artifacts and residual audio from other tracks (such as guitar and drums) are to be expected. The second group was a series of links to online, cloud-based services. These all offer a small number of free uses, after which one needs to set up an account and make a small payment for more processing time. The costs are uniformly quite fair across the services.

I'll write here about LALAL.AI. Although I do some work with machine learning in my day job, I have not delved into the specifics of the approaches used here and at other sites. The simple thing to say is that, of the three services I tried, LALAL.AI's results came significantly closer to meeting my goals. It is probably best to describe the goals and workflow briefly.

The goal: I wanted to remix some 90s-era indie rock tracks in the style of Chicago House music. Why? Just to play some electronic music gigs with friends in a way that audiences would be able to engage with. So, the need was for a cappella tracks in which the vocal is distinct, other instruments are not obviously present, and the sound quality is reasonably good. In this genre, vocals are not always surrounded by other instruments, so the vocal tracks should sound good on their own.

The method: LALAL.AI asks you to upload your tracks. They accept common audio formats at various compression, bit rate, and sample rate levels. You can then specify what kind of processing you are looking for. LALAL.AI has clearly done some significant research and development here, as you can request processing to isolate vocals, guitar, and percussion. After a brief processing time, about as long as it takes to start a cup of tea, the track is available to download. The user interface is clean and simple to use.



The results meet the goals. So far, LALAL.AI is producing the best results of any method I have found or am aware of, whether software-based, cloud-based, or analog. The results stand out in the following ways: there is less contamination from other tracks, the tonal balance of the vocal is largely intact, and there are fewer artifacts. These artifacts are common to machine learning approaches. Imagine a synthesizer line that parallels a vocal: machine learning approaches all have some trouble with this, and the result can be a vocal track that sounds a bit like a DX100 is vocoding along with it. LALAL.AI is good at suppressing most of this artifact, which suggests to me that they are doing something a bit more sophisticated than other services when suppressing signals that share characteristics with the vocal. Their website provides a view behind the curtain: they have been updating the software. The current version is neural network-based and called Phoenix; prior iterations were Cassiopeia and Rocknet. Phoenix handled the problem of separating percussive instruments from vocal information nearly perfectly. I have not yet explored instrument separation extensively; the results so far are promising but depend on the complexity of the track. Some musical genres are more amenable to this process than others. For example, shoegazing tracks remain a challenge for most of these applications.

LALAL.AI




