
a voice-controlled Ableton Live

Posted in Appsterdam, Caffeine, consulting, Context, livecoding, music, Smalltalk, SqueakJS on 5 January 2025 by Craig Latta
These robots know the value of keeping your hands on your instrument!

I’ve gotten up to speed on AI programming, and it didn’t hurt a bit. After learning the OpenAI chat and realtime APIs, I’m able to integrate generative AI text and speech into Caffeine. I got off to a good start by adding the ability to evaluate natural language with the same tools used to evaluate Smalltalk expressions in text editors. For my next application, I’m writing voice control for Ableton Live. This will let me keep my hands on a musical instrument instead of the keyboard or mouse while using Live.

This ties together my new realtime OpenAI client with an enabling technology I wrote previously: programmatic control of Live from a web browser. The musician speaks into a microphone, and a large language model translates their words into code that Caffeine can run on the Live API. With AI, the spoken commands can be relatively specific (“fade out at the end”) or very abstract (“use foreboding chords”).

The realtime OpenAI client uses a WebRTC audio channel to send spoken commands to the language model, and a WebRTC data channel to receive the model's text answers as JSON data. I expect I’ll use system prompts instructing the model to respond using a domain-specific language (DSL) that can be run fairly directly by Caffeine. The DSL will take the form of OpenAI tools, which associate natural-language function descriptions with function signatures. The AI can deduce from conversational context when it should call a function, and the functions run locally (not on the OpenAI servers). I’ll define many functions that control various aspects of Ableton Live; prompts given by the musician will invoke them.
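
To make the tool mechanism concrete, here is a minimal sketch in Python (chosen for brevity; the real client runs in a web browser). It is not Caffeine's implementation. The function names `set_track_volume` and `fade_out`, their parameters, and the `live_log` stand-in for the bridge to Live are all hypothetical. The message shapes (`session.update`, `response.function_call_arguments.done`, `function_call_output`) follow my understanding of the realtime API's JSON events and should be checked against the current OpenAI documentation.

```python
import json

# Hypothetical tool definitions: these names and parameters are illustrative,
# not part of Caffeine or the Ableton Live API.
TOOLS = [
    {
        "type": "function",
        "name": "set_track_volume",
        "description": "Set the volume of a track in the Live set.",
        "parameters": {
            "type": "object",
            "properties": {
                "track": {"type": "string", "description": "Track name"},
                "volume": {"type": "number", "description": "0.0 (silent) to 1.0 (full)"},
            },
            "required": ["track", "volume"],
        },
    },
    {
        "type": "function",
        "name": "fade_out",
        "description": "Fade the master output to silence over a number of bars.",
        "parameters": {
            "type": "object",
            "properties": {"bars": {"type": "integer"}},
            "required": ["bars"],
        },
    },
]

# Stand-in for the bridge to Live: records the commands it would send.
live_log = []

def set_track_volume(track, volume):
    live_log.append(("volume", track, volume))
    return {"ok": True}

def fade_out(bars):
    live_log.append(("fade_out", bars))
    return {"ok": True}

# Functions run locally; the model only names them and supplies arguments.
HANDLERS = {"set_track_volume": set_track_volume, "fade_out": fade_out}

def session_update():
    """Build the message that registers the tools for this session."""
    return {"type": "session.update",
            "session": {"tools": TOOLS, "tool_choice": "auto"}}

def handle_event(raw):
    """Run a function call requested by the model.

    Returns the reply message to send back, or None for other events.
    """
    event = json.loads(raw)
    if event.get("type") != "response.function_call_arguments.done":
        return None
    handler = HANDLERS.get(event["name"])
    if handler is None:
        result = {"error": "unknown function " + event["name"]}
    else:
        # Arguments arrive as a JSON-encoded string.
        result = handler(**json.loads(event["arguments"]))
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(result),
        },
    }
```

In this sketch, the client would send `session_update()` once over the data channel, then pass each incoming data-channel message to `handle_event` and send back any reply it returns. Adding a new voice command means adding one entry to `TOOLS` and one to `HANDLERS`.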

I imagine the DSL will be a distillation of the most commonly used functions in Ableton Live’s very large API, and that it’ll emerge from real use of a digital audio workstation (DAW). What would you want to say to your DAW?