Archive for context

realtime vocal harmonization with Caffeine

Posted in Uncategorized on 14 July 2021 by Craig Latta

I’ve written a Caffeine class which, in real time, takes detected pitches from a melody and chords, and sends re-voiced versions of the chords to a harmonizer, which renders them using shifted copies of the melody. It’s an example of an aggregate audio plugin, which builds a new feature from other plugins running in Ableton Live.

re-creating a classic

Way way back in 1991, before the Auto-Tune algorithm was popularized in 1998, a Canadian company called IVL Technologies developed a hardware harmonizer, the Vocalist VHM5. It generated five-part vocal harmonies live, from sung melodies and chords played via MIDI. It had a simple but effective model of vocal formants, which enabled it to shift the pitch of a sung note to natural-sounding new pitches, including correcting the pitch of the sung note. It also had very fast pitch detection.

My favorite feature, though, was how it combined those features when voicing chords. In what was called “vocoder mode”, it would adjust the pitches of incoming MIDI chords to be as close as possible to the current pitch of a sung melody, producing a closed voicing. If the melody moved more than half an octave (six semitones) away from a chord voice, the rendered chord voice would adjust by some number of octaves up or down, so as to be within half an octave of the melody. With kinetic melodies and dense chords, this becomes a simple but compelling voice-leading technique. It’s even more compelling when the voices are spatialized in a stereo or 3D audio field, with reverb, reflections, and other post-processing.
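
The half-octave rule can be sketched in a few lines of JavaScript. This is only an illustration of the rule as described above, not the VHM5's algorithm or the Caffeine code (which is Smalltalk); the function names are made up, and pitches are assumed to be MIDI note numbers.

```javascript
// Hypothetical helper: move one chord note by whole octaves (12 semitones)
// until it lies within half an octave (6 semitones) of the melody pitch.
function revoiceNote(note, melody) {
  let shifted = note;
  while (shifted - melody > 6) shifted -= 12;
  while (melody - shifted > 6) shifted += 12;
  return shifted;
}

// Re-voice every note of a chord against the current melody pitch.
function revoiceChord(chord, melody) {
  return chord.map((note) => revoiceNote(note, melody));
}

// C major triad (C3, E3, G3) against a melody on A4 (MIDI note 69).
console.log(revoiceChord([48, 52, 55], 69)); // → [ 72, 64, 67 ]
```

A note exactly six semitones from the melody is left where it is, and a real implementation would also need to keep results inside the MIDI note range.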

It’s also computationally inexpensive. The IVL pitch-detection and shifting algorithms were straightforward for off-the-shelf digital signal processing chips to perform, and the Auto-Tune algorithm is orders of magnitude cheaper. One of the audio plugins I use in the Ableton Live audio environment, Harmony Engine by Antares, implements Auto-Tune’s pitch shifting. Another, MIDI Guitar by Jam Origin, does polyphonic pitch detection. With these plugins, I have all the live MIDI information necessary to implement closed re-voicing, and the pitch shifting for rendering it. I suppose I would call this “automated closed-voice harmonization”.


Caffeine runs in a web browser, which, along with Live, has access to all the MIDI interfaces provided by the host operating system. Using the WebMIDI API, I can receive and schedule MIDI events in Smalltalk, exchanging music information with Live and its plugins. With MIDI as one possible transport layer, I’ve developed a Smalltalk model of music events based upon sequences and simultaneities. One kind of simultaneity is the chord, a collection of notes sounded at the same time. In my implementation, a chord performs its own re-voicing, while also taking care to send a minimum of MIDI messages to Live. For example, only the notes which were adjusted in response to a melodic change are rescheduled. The other notes simply remain on, requiring no sent messages. Caffeine also knows how many pitch-shifted copies of the melody can be created by the pitch-shifting plugin, and culls the least-recently-activated voices from chords, to remain within that number.
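
Two of those ideas, sending only the messages that changed and culling the least-recently-activated voices, can be sketched as follows. This is a hedged JavaScript illustration rather than the Smalltalk chord class; the function names and the `{ note, activatedAt }` voice shape are assumptions.

```javascript
// Hypothetical sketch: compare the sounding notes with a new voicing and
// describe only the MIDI messages needed to get from one to the other.
// Notes present in both voicings stay on and generate no messages.
function voicingMessages(sounding, target) {
  const messages = [];
  for (const note of sounding) {
    if (!target.includes(note)) messages.push({ type: "noteOff", note });
  }
  for (const note of target) {
    if (!sounding.includes(note)) messages.push({ type: "noteOn", note });
  }
  return messages;
}

// Keep at most maxVoices voices, dropping the least recently activated.
// Each voice is assumed to look like { note, activatedAt }.
function cullVoices(voices, maxVoices) {
  return [...voices]
    .sort((a, b) => b.activatedAt - a.activatedAt)
    .slice(0, maxVoices);
}

// Only the voice that moved (60 to 72) produces messages.
console.log(voicingMessages([60, 64, 67], [72, 64, 67]));
```

In a browser, each described message would then be turned into bytes and handed to a WebMIDI output.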

All told, I now have a perfect re-creation of the original Vocalist closed-voicing sound, enhanced by all the audio post-processing that Live can do.

the setup

a GK-3 hex pickup through a breakout box

Back in the day, I played chords to the VHM5 from an exotic MIDI electric guitar controller, the Zeta Mirror 6. This guitar has a hex (six-channel) pickup, and can send a separate data stream for each string. While I still have that guitar, I also have a Roland GK-3 hex pickup, which is still in production and can be moved between guitars without modifying them. Another thing I like about hex pickups is having access to the original analog signal for each string. These days I run the GK-3 through a SynQuaNon breakout module, which makes the signals available at modular levels. The main benefit of this is that I can connect the analog signals directly to my audio interface, without software drivers that may become unsupported. I have a USB GK-3 interface, but the manufacturer never updated the original 32-bit driver for it.

Contemporary computers can do polyphonic pitch detection on any audio stream, without the use of special controller hardware. While the resulting MIDI stream uses only a single channel, with no distinction between strings, it’s very convenient. The Jam Origin plugin is my favorite way to produce a polyphonic chord stream from audio.

the ROLI Lightpad

My favorite new controller for generating multi-channel chord streams is the ROLI Lightpad. It’s a MIDI Polyphonic Expression (MPE) device, using an entire 16-channel MIDI port for each instrument, and a separate MIDI channel for each note. This enables very expressive use of MIDI channel messages for representing the way a note changes after it starts. The Lightpad sends messages that track the velocity with which each finger strikes the surface, how it moves in X, Y, and Z while on the surface, and the velocity with which it leaves the surface. The surface is also a display; I use it as a five-by-five grid, which presents musical intervals in a way I find much more accessible than that of a traditional piano keyboard. There are several MPE instruments that use this grid, including the Linnstrument and the GeoShred iPad app. The Lightpad is also very portable, and modular; many of them can be connected together magnetically.

The main advantage of using MPE for vocal harmonization is associating various audio processing state with each chord voice’s separate channel. For example, the bass voice of a chord progression can have its own spatialization and equalization settings.
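
To make the channel-per-note idea concrete, here is a small JavaScript sketch of MPE-style channel allocation. It assumes an MPE lower zone, where channel 1 is the zone's master channel and channels 2 through 16 are member channels; the class name is made up, and this illustrates the protocol layout rather than anything the Lightpad itself runs.

```javascript
// Sketch of MPE-style channel allocation, assuming a lower zone whose
// member channels are 2 through 16 (channel 1 is the master channel).
class ChannelAllocator {
  constructor(first = 2, last = 16) {
    this.free = [];
    for (let channel = first; channel <= last; channel++) this.free.push(channel);
    this.byNote = new Map(); // note number -> assigned channel
  }

  // Returns the raw bytes of a note-on message on a newly assigned channel.
  noteOn(note, velocity) {
    if (this.free.length === 0) throw new Error("no free member channels");
    const channel = this.free.shift();
    this.byNote.set(note, channel);
    return [0x90 | (channel - 1), note, velocity];
  }

  // Returns the note-off bytes and releases the note's channel for reuse.
  noteOff(note) {
    const channel = this.byNote.get(note);
    if (channel === undefined) return null;
    this.byNote.delete(note);
    this.free.push(channel);
    return [0x80 | (channel - 1), note, 0];
  }
}
```

Because each sounding note owns a channel, per-voice settings such as pan or equalization can be kept in a table keyed by channel number.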

My chord signal path starts with an instrument: a hex guitar, a normal guitar, or a Lightpad. Audio and MIDI data go from the instrument, through a host operating system MIDI interface, through Live where I can detect pitches and record, through another MIDI interface to Caffeine in a web browser, then back to Live and the pitch-shifting plugin. My melody signal path starts with a vocal performance using a microphone, through Live and pitch detection, then through pitch shifting as controlled by the chords.

Let’s Play!

Between this vocal harmonization, control of the Ableton Live API, and the Beatshifting protocol, there is great potential for communal livecoded music performance. If you’re a livecoder interested in music, I’d love to hear from you!

Ableton Livecoding with Caffeine

Posted in Uncategorized on 5 June 2021 by Craig Latta
Livecoding access can tame the complexity of Ableton Live.

I’ve written a proxy system to communicate with Ableton Live from Caffeine, for interactive music composition and performance. Live includes Max for Live (M4L), an embedded version of the Max media programming system. M4L has, in turn, access to both Node.JS, a server-side JavaScript engine embedded as a separate process, and to an internal JS engine extension of its own object system. Caffeine can connect to Node.JS through a websocket, Node.JS can send messages to Max, Max can call user-written JS functions, and those JS functions can invoke the Live Object Model, an API for manipulating Live. This stack of APIs also supports returning results back over the websocket, and establishing callbacks.

getting connected

Caffeine creates a websocket connection to a server running in M4L’s Node.JS, using the JS WebSocket function provided by the web browser. A Caffeine object can use this connection to send a JSON string describing a Live function it would like to invoke. Node.JS passes the JSON string to Max, through an output of a Max object in a Max program, or patcher:

connecting the Node.JS server with JS Live API function invocation

Max is a visual dataflow system, in which objects’ inputs and outputs are connected, and their functions are run by a real-time scheduler. There are two special objects in the patcher above. The first is node.script, which controls the operation of a Node.JS script. It’s running the Node.JS script “caffeine-server.js”, which creates a websocket server. That script has access to a Max API, which it uses to send data through the output of the node.script object.

The second special object is js, which runs “caffeine-max.js”. That script parses the JSON function invocation request sent by Caffeine, invokes the desired Live API function, and sends the result back to Caffeine through the Node.JS server.
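
The parse, invoke, and reply steps can be sketched like this. It is not the contents of “caffeine-max.js”: the request fields (`id`, `target`, `function`, `arguments`) are a hypothetical shape, and a plain table of JavaScript objects stands in for the Live Object Model so that the sketch runs outside Max.

```javascript
// Hypothetical request shape: { id, target, function, arguments }.
// `targets` stands in for the Live objects reachable from Max; in a real
// patcher the call would go through the Live Object Model instead.
function handleRequest(json, targets) {
  const request = JSON.parse(json);
  const target = targets[request.target];
  try {
    if (!target || typeof target[request.function] !== "function") {
      throw new Error("unknown target or function");
    }
    const result = target[request.function](...(request.arguments || []));
    // Echo the request id so the caller can match the reply to its request.
    return JSON.stringify({ id: request.id, result });
  } catch (error) {
    return JSON.stringify({ id: request.id, error: error.message });
  }
}

// A stand-in "song" object with one function.
const targets = { song: { getTempo() { return 120; } } };
console.log(handleRequest('{"id":1,"target":"song","function":"getTempo"}', targets));
```

Errors travel back in the same envelope as results, so the caller never waits forever on a failed invocation.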


With this infrastructure in place, we can create a proxy object system in Caffeine. In class Live, we can write a method which invokes Live functions:

invoking a Live function from Caffeine

This method uses a SharedQueue for each remote message sent; the JS bridge callback process delivers results to them. This lets us nest remote message sends among multiple processes. The JSON data identifies the function and arguments of the invocation, the identifier of the receiving Live object, and the desired Smalltalk class of the result.
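
For readers more at home in JavaScript, the same bookkeeping can be sketched with promises standing in for SharedQueues. This is an analogy under assumptions, not the Smalltalk method: the class name, the JSON field names, and the `send` callback are all hypothetical.

```javascript
// Hypothetical client-side bookkeeping: each request gets a fresh id and a
// pending promise, which is settled when a reply with the same id arrives.
class Invoker {
  constructor(send) {
    this.send = send;         // function that transmits a JSON string
    this.nextId = 1;
    this.pending = new Map(); // request id -> { resolve, reject }
  }

  invoke(target, functionName, args = [], resultClass = "Object") {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.send(JSON.stringify({
      id, target, function: functionName, arguments: args, resultClass,
    }));
    return promise;
  }

  // Call this with each JSON reply received from the server.
  receive(json) {
    const reply = JSON.parse(json);
    const waiter = this.pending.get(reply.id);
    if (!waiter) return;
    this.pending.delete(reply.id);
    if ("error" in reply) waiter.reject(new Error(reply.error));
    else waiter.resolve(reply.result);
  }
}
```

Since every reply carries the id of its request, several invocations can be in flight at once and still be answered independently, which is the property the per-message queues provide.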

The LiveObject proxy class can use this invoking function from its doesNotUnderstand method:

forwarding a message from a proxy
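
JavaScript has a rough counterpart to doesNotUnderstand in its Proxy object, which makes for a compact illustration of the forwarding idea. This is a sketch by analogy, not the LiveObject class; `liveProxy` and the `invoke` callback are hypothetical names.

```javascript
// Sketch: a Proxy whose `get` trap plays the role of doesNotUnderstand,
// turning any unknown property access into a remote invocation.
function liveProxy(targetId, invoke) {
  return new Proxy({}, {
    get(_, name) {
      // Ignore symbols, and `then`, so the proxy is not mistaken for a promise.
      if (typeof name !== "string" || name === "then") return undefined;
      return (...args) => invoke(targetId, name, args);
    },
  });
}

// A stand-in invoke function that just records what would be sent.
const calls = [];
const song = liveProxy("song", (target, fn, args) => {
  calls.push({ target, fn, args });
  return "sent";
});
song.setTempo(120);
console.log(calls); // one recorded invocation of setTempo with [120]
```

The proxy needs no knowledge of the remote API; every function name is discovered at the moment it is called.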

Now that we have message forwarding, we can represent the entire Live API as browsable Smalltalk classes. I always find this of huge benefit when doing mashups with external code libraries, but especially so with Live. The Live API is massive, and while the documentation is complete, it’s not very readable. It’s much more pleasant to learn about the API with the Smalltalk browsing tools. As usual, we can extend the API with composite methods of our own, aggregating multiple Live API calls into one. With this we can effectively extend the Live API with new features.

extending the Live API

One area of Live API extension where I’m working now is in song composition. Live has an Arrangement view, for a traditional recording studio workflow, and a Session view, for interactive performance. I find the “scenes” feature of the Session view very useful for sketching song sections, but Live’s support for playing them in different orders is minimal. With Caffeine objects representing scenes, I can compose larger structures from them, and play them however I like.
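
As a hedged sketch of what composing larger structures from scenes might look like, the following JavaScript flattens a list of named song sections into an order of scene indices. The data shapes and names are assumptions for illustration; actually playing each scene would still go through the Live API.

```javascript
// Hypothetical song structure: `sections` maps section names to scene
// indices, and `arrangement` lists sections with optional repeat counts.
function flattenArrangement(sections, arrangement) {
  const order = [];
  for (const { section, repeat = 1 } of arrangement) {
    if (!(section in sections)) throw new Error(`unknown section: ${section}`);
    for (let i = 0; i < repeat; i++) order.push(sections[section]);
  }
  return order;
}

const sections = { verse: 0, chorus: 1, bridge: 2 };
const arrangement = [
  { section: "verse", repeat: 2 },
  { section: "chorus" },
  { section: "bridge" },
  { section: "chorus", repeat: 2 },
];
console.log(flattenArrangement(sections, arrangement)); // → [ 0, 0, 1, 2, 1, 1 ]
```

Keeping the arrangement as plain data means the same scenes can be reordered or repeated without touching the Session view.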

How would you extend the Live API? How would you simplify it?

The Node.JS server, JS proxying code, and the Max patcher that connects them are available as a self-contained M4L device, which can be applied to any Live track. Look for it in the devices folder of the Caffeine repository.

Caffeine: live web debugging with SqueakJS

Posted in Appsterdam, consulting, Context, Naiad, Smalltalk, Spoon on 26 October 2016 by Craig Latta

In February 2015 I spoke about Bert Freudenberg’s SqueakJS at FOSDEM. We were all intrigued with the potential of this system to change both Smalltalk and web programming. This year I’ve had some time to pursue that potential, and the results so far are pretty exciting.

SqueakJS is a Squeak virtual machine implemented with pure JavaScript. It runs in all the web browsers, and features a bi-directional JavaScript bridge. You can invoke JavaScript functions from Smalltalk code, and pass Smalltalk blocks for JavaScript code to invoke as callbacks. This lets Smalltalk programmers take advantage of the myriad JavaScript frameworks available, as well as the extensive APIs exposed by the browsers themselves.

The most familiar built-in browser behavior is for manipulating the structure of rendered webpages (the Document Object Model, or “DOM”). Equally important is behavior for manipulating the operation of the browser itself. The Chrome Debugging Protocol is a set of JavaScript APIs for controlling every aspect of a web browser, over a WebSocket. The developer tools built into the Chrome browser are implemented using these APIs, and it’s likely that other browsers will follow.

Using the JavaScript bridge and the Chrome Debugging Protocol, I have SqueakJS controlling the web browser running it. SqueakJS can get a list of all the browser’s tabs, and control the execution of each tab, just like the built-in devtools can. Now we can use Squeak’s user interface for debugging and building webpages. We can have persistent inspectors on particular DOM elements, rather than having only the REPL console of the built-in tools. We can build DOM structures as Smalltalk object graphs, complete with scripted behavior.

I am also integrating my previous WebDAV work, so that webpages are manifested as virtual filesystems, and can be manipulated with traditional text editors and other file-oriented tools. I call this a metaphorical filesystem. It extends the livecoding ability of Smalltalk and JavaScript to the proverbial “favorite text editor”.

This all comes together in a project I call Caffeine. I had fun demoing it at ESUG 2016 in Prague. Video to come…

new website for Black Page Digital

Posted in Appsterdam, consulting, Context, GLASS, music, Naiad, Seaside, Smalltalk, Spoon on 21 January 2016 by Craig Latta

I wrote a new website for Black Page Digital, my consultancy in Amsterdam and San Francisco. It features a running Squeak Smalltalk that you can use for livecoding. Please check it out, pass it on, and let me know what you think!

Context status 2015-01-16

Posted in Appsterdam, consulting, Context, Naiad, Smalltalk, Spoon on 16 January 2015 by Craig Latta

Hoi all–

Context is the umbrella project for Naiad (a distributed module system for all Smalltalks), Spoon (a minimal object memory that provides the starting point for Naiad), and Lightning (a remote-messaging framework which performs live serialization, used by Naiad for moving methods and other objects between systems). I intend for it to be a future release of Squeak, and a launcher and module system for all the other Smalltalks. I’m writing Context apps for cloud computing, web services, and distributed computation.

Commits b7676ba2cc and later of the Context git repo have:

  • Support for installable object memories as git submodule repos.
  • Submodule repos for memories for each of the known Smalltalk dialects, with Naiad support pre-loaded. I’m currently working on the submodules for Squeak and Pharo.
  • A web-browser-based console for launching and managing object memories.
  • A WebDAV-based virtual filesystem that enables Smalltalk to appear as a network-attached storage device, and mappings of the system to that filesystem that make Smalltalk accessible from external text editors (e.g., for editing code, managing processes and object memories).
  • Remote code and process browsers.

There’s a live discussion site and a mailing list. The newsgroup is gmane.comp.lang.smalltalk.squeak.context.

Thanks for checking it out!


Smalltalk Reflections episode 7: minimalism

Posted in Appsterdam, consulting, Context, Naiad, Smalltalk, Spoon on 14 January 2015 by Craig Latta

Episode 7 of Smalltalk Reflections is out. The topic is “minimalism”.

Smalltalk Reflections episode three is up

Posted in Appsterdam, consulting, Context, music, Smalltalk, Spoon on 16 December 2014 by Craig Latta

Check it out!

Context release 4 alpha 1

Posted in Appsterdam, consulting, Context, Naiad, Smalltalk, Spoon on 9 December 2014 by Craig Latta

Context 4 alpha 1 is released. This one fixes loading errors in the welcome page, supports remote debugging and process browsing, and makes Naiad and remote messaging support available as Monticello packages. Thanks in advance for reporting bugs!

a most useful virtual machine debugging aid: simulated objects

Posted in Appsterdam, consulting, Context, Smalltalk, Spoon on 2 December 2014 by Craig Latta

Squeak’s virtual machine simulator is extremely useful for debugging. You can use it to inspect and change objects “while time is stopped”, between the execution of individual virtual machine instructions. Traditionally, though, it takes an address-based view of objects. There are several useful utility methods which, given an object address, will print useful information to the Transcript. Wouldn’t it be nicer, though, if you could use normal inspectors to look through the fields of the objects in a simulated virtual machine’s object memory?

I created simulated objects for this purpose. They are instances of a SimulatedObject class; each one has an interpreter and an address. They can print useful information about themselves, like the interpreter can, but they can also modify themselves and interact with each other, changing the interpreter’s object memory appropriately. Are you wondering about the instructions of a compiled method? Would you like to make a few choice modifications to those instructions? A simulated object for that method’s address will help you.

Simulated objects play nicely with Squeak’s object inspectors, and, more importantly, with its object explorers. You feel like you’re inspecting normal objects, except that you can’t send normal messages to them. Or can you? I’m pondering this. It might be useful, for example, to terminate a process in a simulated interpreter’s object memory, without having to do it in another process. Time is stopped, but perhaps you could queue up messages to send when it starts again, through a collaboration between simulated objects and a coordinating object in the memory they describe.

I’ve been using simulated objects recently to chase references with the absolute assurance that I won’t be creating new ones. They’re very useful for debugging virtual machine primitives. Sometimes, when I’m debugging a headless system with a broken remote messaging system, it’s the only user interface I have for inspecting things. And it’s sure a lot nicer than inspecting things in a C debugger.

What will you do with them?

debugging remote exceptions works

Posted in consulting, Context, Smalltalk, Spoon on 20 November 2014 by Craig Latta

a debugger for a remote unhandled exception


I have debugging working for remote unhandled exceptions. My motivating use case was debugging messages not understood by the Context console’s embedded web server. The console is a headless app. In development, I run it with a remote-messaging connection to a headful system. Now, when there is an unhandled exception (like a message not understood), the exception requests that the headful system open a debugger (as its default action).

Before opening the debugger, the headful system replaces the sender of the first relevant context on the headless system with the last relevant context on the headful system, hiding all the remote-messaging-related contexts in between. The picture above shows an example of this. On the headful system, I sent “zork” to an object on the headless system. The debugger shows a continuous context stack which spans the two systems. This all works with little special handling in the debugger because of the complete transparency of remote messaging. It doesn’t matter that the contexts and methods that the debugger is manipulating happen to be remote.
