realtime vocal harmonization with Caffeine

I’ve written a Caffeine class which, in real time, takes detected pitches from a melody and chords, and sends re-voiced versions of the chords to a harmonizer, which renders them using shifted copies of the melody. It’s an example of an aggregate audio plugin, which builds a new feature from other plugins running in Ableton Live.

re-creating a classic

Way back in 1991, before the Auto-Tune algorithm was popularized in 1998, a Canadian company called IVL Technologies developed a hardware harmonizer, the Vocalist VHM5. It generated five-part vocal harmonies live, from sung melodies and chords played via MIDI. It had a simple but effective model of vocal formants, which enabled it to shift a sung note to natural-sounding new pitches, and also to correct the pitch of the sung note itself. It also had very fast pitch detection.

My favorite feature, though, was how it combined those capabilities when voicing chords. In what was called “vocoder mode”, it would adjust the pitches of incoming MIDI chords to be as close as possible to the current pitch of a sung melody, a kind of closed voicing. If the melody moved more than half an octave away from a chord voice, the rendered chord voice would shift up or down by however many octaves were needed to bring it back within half an octave of the melody. With kinetic melodies and dense chords, this becomes a simple but compelling voice-leading technique. It’s even more compelling when the voices are spatialized in a stereo or 3D audio field, with reverb, reflections, and other post-processing.
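To make the re-voicing rule concrete, here is a minimal Python sketch of it. This is not IVL's algorithm or the Caffeine (Smalltalk) code; the function names and the use of MIDI note numbers are assumptions made for illustration. Each chord voice is moved by whole octaves only when it is more than half an octave (six semitones) from the melody, so its pitch class never changes.

```python
HALF_OCTAVE = 6   # semitones
OCTAVE = 12       # semitones

def revoice_note(note, melody):
    """Move a MIDI note by whole octaves until it is within half an octave of the melody."""
    while note - melody > HALF_OCTAVE:
        note -= OCTAVE
    while melody - note > HALF_OCTAVE:
        note += OCTAVE
    return note

def revoice_chord(chord, melody):
    """Re-voice every note of a chord (a list of MIDI note numbers) around the melody."""
    return [revoice_note(note, melody) for note in chord]

# C major played low (C3, E3, G3) under a melody on C5, then the melody rises to G5.
played = [48, 52, 55]
first = revoice_chord(played, 72)   # [72, 76, 67]
second = revoice_chord(first, 79)   # [84, 76, 79]
```

Re-voicing from the currently rendered voicing, rather than from the played chord each time, means a voice sitting exactly a tritone from the melody stays where it is. A real implementation would also need to keep the results inside the 0 to 127 MIDI note range.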

It’s also computationally inexpensive. The IVL pitch-detection and shifting algorithms were straightforward for off-the-shelf digital signal processing chips to perform, and the Auto-Tune algorithm is orders of magnitude cheaper. One of the audio plugins I use in the Ableton Live audio environment, Harmony Engine by Antares, implements Auto-Tune’s pitch shifting. Another, MIDI Guitar by Jam Origin, does polyphonic pitch detection. With these plugins, I have all the live MIDI information necessary to implement closed re-voicing, and the pitch shifting for rendering it. I suppose I would call this “automated closed-voice harmonization”.


Caffeine runs in a web browser, which, along with Live, has access to all the MIDI interfaces provided by the host operating system. Using the WebMIDI API, I can receive and schedule MIDI events in Smalltalk, exchanging music information with Live and its plugins. With MIDI as one possible transport layer, I’ve developed a Smalltalk model of music events based upon sequences and simultaneities. One kind of simultaneity is the chord, a collection of notes sounded at the same time. In my implementation, a chord performs its own re-voicing, while also taking care to send a minimum of MIDI messages to Live. For example, only the notes which were adjusted in response to a melodic change are rescheduled. The other notes simply remain on, requiring no sent messages. Caffeine also knows how many pitch-shifted copies of the melody can be created by the pitch-shifting plugin, and culls the least-recently-activated voices from chords, to remain within that number.
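The two bookkeeping ideas in that paragraph, sending messages only for voices that changed and dropping the least-recently-activated voices, can be sketched in Python as follows. This is an illustration under assumptions, not the Smalltalk implementation: the function names are hypothetical, voicings are plain lists of MIDI note numbers, and using sets means two voices that land on the same pitch are treated as one.

```python
def voicing_changes(sounding, target):
    """Return (notes_off, notes_on) needed to move from one voicing to another.

    Notes present in both voicings are left alone, so they need no messages.
    """
    sounding, target = set(sounding), set(target)
    return sorted(sounding - target), sorted(target - sounding)

def midi_messages(notes_off, notes_on, channel=0, velocity=100):
    """Build raw three-byte MIDI messages: note-off (0x80) first, then note-on (0x90)."""
    messages = [(0x80 | channel, note, 0) for note in notes_off]
    messages += [(0x90 | channel, note, velocity) for note in notes_on]
    return messages

def cull_voices(activation_order, max_voices):
    """Keep the most recently activated voices.

    activation_order lists notes oldest first; max_voices is how many shifted
    copies the pitch-shifting plugin can produce (an assumed parameter).
    """
    if max_voices <= 0:
        return []
    return activation_order[-max_voices:]

# Only the two voices that moved produce messages; the note 76 stays on.
off, on = voicing_changes([72, 76, 67], [84, 76, 79])
messages = midi_messages(off, on)
```

In a browser these byte triples would be handed to a WebMIDI output; here they are left as tuples so the sketch stays independent of any MIDI library.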

All told, I now have a perfect re-creation of the original Vocalist closed-voicing sound, enhanced by all the audio post-processing that Live can do.

the setup

a GK-3 hex pickup through a breakout box

Back in the day, I played chords to the VHM5 from an exotic MIDI electric guitar controller, the Zeta Mirror 6. This guitar has a hex (six-channel) pickup, and can send a separate data stream for each string. While I still have that guitar, I also have a Roland GK-3 hex pickup, which is still in production and can be moved between guitars without modifying them. Another thing I like about hex pickups is having access to the original analog signal for each string. These days I run the GK-3 through a SynQuaNon breakout module, which makes the signals available at modular levels. The main benefit of this is that I can connect the analog signals directly to my audio interface, without software drivers that may become unsupported. I have a USB GK-3 interface, but the manufacturer never updated the original 32-bit driver for it.

Contemporary computers can do polyphonic pitch detection on any audio stream, without the use of special controller hardware. While the resulting MIDI stream uses only a single channel, with no distinction between strings, it’s very convenient. The Jam Origin plugin is my favorite way to produce a polyphonic chord stream from audio.

the ROLI Lightpad

My favorite new controller for generating multi-channel chord streams is the ROLI Lightpad. It’s a MIDI Polyphonic Expression (MPE) device, using an entire 16-channel MIDI port for each instrument, and a separate MIDI channel for each note. This enables very expressive use of MIDI channel messages for representing the way a note changes after it starts. The Lightpad sends messages that track the velocity with which each finger strikes the surface, how it moves in X, Y, and Z while on the surface, and the velocity with which it leaves the surface. The surface is also a display; I use it as a five-by-five grid, which presents musical intervals in a way I find much more accessible than that of a traditional piano keyboard. There are several MPE instruments that use this grid, including the LinnStrument and the GeoShred iPad app. The Lightpad is also very portable and modular; many of them can be connected together magnetically.

The main advantage of using MPE for vocal harmonization is that audio-processing state can be associated with each chord voice’s separate channel. For example, the bass voice of a chord progression can have its own spatialization and equalization settings.
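As a rough illustration of per-channel state, the Python sketch below picks out the channel carrying the lowest note and gives it its own settings. Everything here is hypothetical: the `VoiceSettings` fields, the function name, and the example values are placeholders, and in practice the settings would live in Live's devices rather than in a dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical per-voice processing state."""
    pan: float = 0.0           # -1.0 is hard left, 1.0 is hard right
    low_shelf_db: float = 0.0  # equalization boost or cut for low frequencies

def settings_by_channel(note_by_channel, bass, default):
    """Map each MPE channel to settings, treating the lowest note as the bass voice.

    note_by_channel maps a MIDI channel number to the note sounding on it.
    """
    if not note_by_channel:
        return {}
    bass_channel = min(note_by_channel, key=note_by_channel.get)
    return {
        channel: (bass if channel == bass_channel else default)
        for channel in note_by_channel
    }

# Three notes on separate channels; channel 3 carries the lowest note.
bass = VoiceSettings(pan=0.0, low_shelf_db=3.0)
default = VoiceSettings(pan=0.5)
assigned = settings_by_channel({2: 52, 3: 43, 4: 59}, bass, default)
```

Because each note has its own channel, the lookup needs nothing more than the channel number of an incoming message.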

My chord signal path starts with an instrument: a hex-pickup guitar, an ordinary guitar, or a Lightpad. Audio and MIDI data go from the instrument, through a host operating system MIDI interface, through Live where I can detect pitches and record, through another MIDI interface to Caffeine in a web browser, then back to Live and the pitch-shifting plugin. My melody signal path starts with a vocal performance using a microphone, and goes through Live and pitch detection, then through pitch shifting as controlled by the chords.

Let’s Play!

Between this vocal harmonization, control of the Ableton Live API, and the Beatshifting protocol, there is great potential for communal livecoded music performance. If you’re a livecoder interested in music, I’d love to hear from you!
