Caffeine in a Deno worker can provide Web APIs to Smalltalk in a native app.
bridging native apps and the Web
We’ve been able to run Caffeine headlessly in a Web Worker for some time now, using NodeJS. I’ve updated this support to use the Deno JavaScript runtime instead of Node. This gives us better access to familiar Web APIs, and a cleaner module system without npm. I’ve also extended the bridging capability of the code that Deno runs. Now, a native Squeak app can start Deno (via class OSProcess), Deno starts Caffeine in a worker, and the two Smalltalk instances can communicate with each other via remote messaging.
I’m using this bridge to let native Squeak participate in WebRTC sessions with other Smalltalks, as part of the Naiad team development system. The same Squeak object memory runs in both the native Squeak and the Deno worker. I’m sure many other interesting use cases will arise, as we explore what native Squeak and Web Squeak can do together!
I’ve written a Caffeine class which, in real time, takes detected pitches from a melody and chords, and sends re-voiced versions of the chords to a harmonizer, which renders them using shifted copies of the melody. It’s an example of an aggregate audio plugin, which builds a new feature from other plugins running in Ableton Live.
re-creating a classic
Way way back in 1991, before the Auto-Tune algorithm was popularized in 1998, a Canadian company called IVL Technologies developed a hardware harmonizer, the Vocalist VHM5. It generated five-part vocal harmonies, live from sung melodies and chords played via MIDI. It had a simple but effective model of vocal formants, which enabled it to shift the pitch of a sung note to natural-sounding new pitches, including correcting the pitch of the sung note. It also had very fast pitch detection.
My favorite feature, though, was how it combined those features when voicing chords. In what was called “vocoder mode”, it would adjust the pitches of incoming MIDI chords to be as close as possible to the current pitch of a sung melody, a technique known as closed voicing. If the melody moved more than half an octave away from a chord voice, the rendered chord voice would adjust by some number of octaves up or down, so as to be within half an octave of the melody. With kinetic melodies and dense chords, this becomes a simple but compelling voice-leading technique. It’s even more compelling when the voices are spatialized in a stereo or 3D audio field, with reverb, reflections, and other post-processing.
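The octave adjustment described above can be sketched in a few lines. This is only an illustration, not the VHM5’s or Caffeine’s actual code; the function names are hypothetical, and pitches are assumed to be MIDI note numbers, so half an octave is six semitones.

```python
HALF_OCTAVE = 6   # semitones
OCTAVE = 12       # semitones


def revoice_note(note, melody):
    """Move one chord note by whole octaves until it is within half an octave of the melody."""
    while melody - note > HALF_OCTAVE:
        note += OCTAVE
    while note - melody > HALF_OCTAVE:
        note -= OCTAVE
    return note


def revoice_chord(chord, melody):
    """Re-voice every note of a chord around the current melody pitch."""
    return [revoice_note(note, melody) for note in chord]
```

For example, `revoice_chord([48, 52, 55], 72)` moves a low C major triad up around the melody note C5, giving `[72, 76, 67]`. A note exactly six semitones from the melody is left where it is, since it has not moved more than half an octave away.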
It’s also computationally inexpensive. The IVL pitch-detection and shifting algorithms were straightforward for off-the-shelf digital signal processing chips to perform, and the Auto-Tune algorithm is orders of magnitude cheaper. One of the audio plugins I use in the Ableton Live audio environment, Harmony Engine by Antares, implements Auto-Tune’s pitch shifting. Another, MIDI Guitar by Jam Origin, does polyphonic pitch detection. With these plugins, I have all the live MIDI information necessary to implement closed re-voicing, and the pitch shifting for rendering it. I suppose I would call this “automated closed-voice harmonization”.
implementation
Caffeine runs in a web browser, which, along with Live, has access to all the MIDI interfaces provided by the host operating system. Using the WebMIDI API, I can receive and schedule MIDI events in Smalltalk, exchanging music information with Live and its plugins. With MIDI as one possible transport layer, I’ve developed a Smalltalk model of music events based upon sequences and simultaneities. One kind of simultaneity is the chord, a collection of notes sounded at the same time. In my implementation, a chord performs its own re-voicing, while also taking care to send a minimum of MIDI messages to Live. For example, only the notes which were adjusted in response to a melodic change are rescheduled. The other notes simply remain on, requiring no sent messages. Caffeine also knows how many pitch-shifted copies of the melody can be created by the pitch-shifting plugin, and culls the least-recently-activated voices from chords, to remain within that number.
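Here is a minimal sketch of the two bookkeeping steps just described: sending only the messages that a re-voicing actually requires, and culling the least-recently-activated voices. It is not the Smalltalk implementation; the function names and the `(pitch, activated_at)` tuple layout are assumptions made for illustration.

```python
def midi_updates(sounding, revoiced):
    """Return (note_offs, note_ons) needed to move from the sounding pitches to the re-voiced ones.

    Pitches present in both sets stay on and need no message at all.
    """
    old, new = set(sounding), set(revoiced)
    return sorted(old - new), sorted(new - old)


def cull_voices(voices, limit):
    """Keep only the `limit` most recently activated voices.

    `voices` is a list of (pitch, activated_at) tuples; this layout is hypothetical.
    """
    by_recency = sorted(voices, key=lambda voice: voice[1], reverse=True)
    return by_recency[:limit]
```

With three notes sounding and one of them re-voiced up an octave, `midi_updates([60, 64, 67], [72, 64, 67])` yields one note-off and one note-on, `([60], [72])`, while the other two notes remain untouched.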
All told, I now have a perfect re-creation of the original Vocalist closed-voicing sound, enhanced by all the audio post-processing that Live can do.
the setup
a GK-3 hex pickup through a breakout box
Back in the day, I played chords to the VHM5 from an exotic MIDI electric guitar controller, the Zeta Mirror 6. This guitar has a hex (six-channel) pickup, and can send a separate data stream for each string. While I still have that guitar, I also have a Roland GK-3 hex pickup, which is still in production and can be moved between guitars without modifying them. Another thing I like about hex pickups is having access to the original analog signal for each string. These days I run the GK-3 through a SynQuaNon breakout module, which makes the signals available at modular levels. The main benefit of this is that I can connect the analog signals directly to my audio interface, without software drivers that may become unsupported. I have a USB GK-3 interface, but the manufacturer never updated the original 32-bit driver for it.
Contemporary computers can do polyphonic pitch detection on any audio stream, without the use of special controller hardware. While the resulting MIDI stream uses only a single channel, with no distinction between strings, it’s very convenient. The Jam Origin plugin is my favorite way to produce a polyphonic chord stream from audio.
the ROLI Lightpad
My favorite new controller for generating multi-channel chord streams is the ROLI Lightpad. It’s a MIDI Polyphonic Expression (MPE) device, using an entire 16-channel MIDI port for each instrument, and a separate MIDI channel for each note. This enables very expressive use of MIDI channel messages for representing the way a note changes after it starts. The Lightpad sends messages that track the velocity with which each finger strikes the surface, how it moves in X, Y, and Z while on the surface, and the velocity with which it leaves the surface. The surface is also a display; I use it as a five-by-five grid, which presents musical intervals in a way I find much more accessible than that of a traditional piano keyboard. There are several MPE instruments that use this grid, including the Linnstrument and the GeoShred iPad app. The Lightpad is also very portable, and modular; many of them can be connected together magnetically.
The main advantage of using MPE for vocal harmonization is associating various audio processing state with each chord voice’s separate channel. For example, the bass voice of a chord progression can have its own spatialization and equalization settings.
My chord signal path starts with an instrument: a hex guitar, a normal guitar, or a Lightpad. Audio and MIDI data go from the instrument, through a host operating system MIDI interface, through Live where I can detect pitches and record, through another MIDI interface to Caffeine in a web browser, then back to Live and the pitch-shifting plugin. My melody signal path starts with a vocal performance using a microphone, through Live and pitch detection, then through pitch shifting as controlled by the chords.
Let’s Play!
Between this vocal harmonization, control of the Ableton Live API, and the Beatshifting protocol, there is great potential for communal livecoded music performance. If you’re a livecoder interested in music, I’d love to hear from you!
I’ve written a Caffeine app implementation of the Beatshifting algorithm, for collaborative remote music performance that is synchronized and out-of-phase. Beatshifting uses network latency as a rhythmic element, using offsets from beats as timestamps, with a shared metronome and score.
I was inspired to write the Beatshifting app by NINJAM, a similar system that has hosted many hours of joyous sessions. There are a few interesting twists I think I can bring to the technology, through late-binding of audio rendering.
NINJAM also synchronizes distributed streams of rhythmic music. It works by using a server that collects an entire measure of audio from each of the performers’ timestamped streams, stamps them all with an upcoming measure number, and sends them back to each performer. Each performer’s system plays the collected measures with the start times aligned. In effect, each performer plays along with what everyone else did a measure ago. Audio needs to reach each performer only by the start of the upcoming measure, rather than fast enough to create the illusion of simultaneity.
Beatshifting gives more control over the session to each performer, and to an audience as well. Each performer can modify not only the local volume levels of the other performers, but also their delays and instruments. Each performer can also change the tempo and time signature of the session. A session can have an audience as well, and each audience member is really a performer who hasn’t played anything yet.
It’s straightforward to have an arbitrary number of participants in a session because Beatshifting takes the form of a web app. Each participant only needs to visit a session link in a web browser, rather than use a special digital audio workstation (DAW) app. By default, Beatshifting uses MIDI event messages instead of audio, using much less bandwidth even with a large group.
To deliver events to each participant’s web browser, Beatshifting uses the Croquet replication service. Croquet is able to replicate and synchronize any JavaScript object in every participant’s web browser, up to 60 times per second. Beatshifting uses this to provide a shared score. Music events like notes and fader movements can be scheduled into the score by any participant, and from code run by the score itself.
One piece of code the score runs broadcasts events indicating that measures have elapsed, so that the web browsers can render metronome clicks. There are three kinds of metronome clicks, for ticks, beats, and measures. For example, with a time signature of 6/8, there are two beats per measure, and three ticks per beat. Each tick is an eighth-note, so each beat is a dotted-quarter note. The sequence of clicks one hears is:
measure
tick
tick
beat
tick
tick
At a tempo of 120 beats per minute, or 360 clicks per 60,000 milliseconds, there are about 167 milliseconds between clicks. Each time a web browser receives a measure-elapsed event, it schedules MIDI events for the next measure’s clicks with the local MIDI output interface. Since each web browser knows the starting time of the session in its output MIDI interface’s timescale, it can calculate the timestamps of all ensuing clicks.
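The click pattern and timestamp arithmetic above can be sketched as follows. This is an illustration only, not the Beatshifting app’s code; all function and parameter names are hypothetical, and times are assumed to be milliseconds in the local MIDI output interface’s timescale.

```python
def click_kinds(beats_per_measure, ticks_per_beat):
    """List the kind of each click in one measure: 'measure', 'beat', or 'tick'."""
    kinds = []
    for beat in range(beats_per_measure):
        for tick in range(ticks_per_beat):
            if beat == 0 and tick == 0:
                kinds.append("measure")   # first click of the measure
            elif tick == 0:
                kinds.append("beat")      # first click of a later beat
            else:
                kinds.append("tick")
    return kinds


def click_interval_ms(clicks_per_minute):
    """Milliseconds between consecutive clicks at a given click rate."""
    return 60_000 / clicks_per_minute


def measure_click_times(session_start_ms, measure_index, clicks_per_measure, interval_ms):
    """Local timestamps of every click in the measure with the given zero-based index."""
    first = session_start_ms + measure_index * clicks_per_measure * interval_ms
    return [first + i * interval_ms for i in range(clicks_per_measure)]
```

Calling `click_kinds(2, 3)` reproduces the 6/8 sequence listed earlier: measure, tick, tick, beat, tick, tick. Because the click times depend only on the session’s local start time and the click interval, each browser can compute them on its own whenever a measure-elapsed event arrives.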
When a performer plays a note, their web browser notes the offset in milliseconds between when the note was played and the time of the most recent click. The web browser then publishes an event-scheduling message, to which the score is subscribed. The score then broadcasts a note-played event to all the web browsers. Again, it’s up to each web browser to schedule a corresponding MIDI note with its local MIDI output interface. The local timestamp of that note is chosen to be the same millisecond offset from some future click point. How far in the future that click is can be chosen based on who played the note, or any other element of the event’s data. Each web browser can also choose other parameters for each event, like instrument, volume level, and panning position.
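A rough sketch of the offset bookkeeping described above, under the assumption that clicks fall at a fixed interval from each browser’s local session start time. The names here are hypothetical, not the Beatshifting protocol’s own.

```python
def beat_offset_ms(played_at_ms, session_start_ms, click_interval_ms):
    """Milliseconds between the most recent click and the moment a note was played."""
    return (played_at_ms - session_start_ms) % click_interval_ms


def schedule_time_ms(offset_ms, now_ms, session_start_ms, click_interval_ms, clicks_ahead):
    """Local timestamp with the same offset from a click `clicks_ahead` clicks in the future.

    `clicks_ahead` should be at least 1 so that the result is never in the past.
    """
    elapsed_clicks = (now_ms - session_start_ms) // click_interval_ms
    click_time = session_start_ms + (elapsed_clicks + clicks_ahead) * click_interval_ms
    return click_time + offset_ms
```

The sending browser computes the offset in its own timescale, and each receiving browser applies it in its own. A receiver might pick `clicks_ahead` per performer, for example from a lookup table keyed by who played the note, which is how two listeners can hear the same notes with different delays while the offsets from the beat stay constant.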
Quantities like tempo are part of the score’s state, and can be changed by any performer or audience member. Croquet ensures that the changed JavaScript variables are synchronized in all the participants’ web browsers.
With so many decisions about how music events are rendered left to each web browser, the mix that each participant hears can be wildly different. The only constants are the millisecond beat offsets of each performer’s notes. I think it’ll be fun to compare recordings of these mixes after the fact, and to make new ones from individual recorded tracks.
There’s no server that any participant needs to set up, and the Croquet service knows nothing of the Beatshifting protocol. This makes it very easy to start and join new sessions.
next steps
The current Beatshifting UI has controls for joining a session, enabling the local scheduling of metronome clicks, and changing the tempo and time signature of a session.
the current Beatshifting UI
If one is using a MIDI output interface connected to a DAW, then one may use the DAW to control instruments, volume, panning, and so on. I’d also like to provide the option of all MIDI event rendering performed by the web browser, and a UI for controlling and recording that. I’ve established the use of the ToneJS audio framework for rendering events, and am now developing the UI.
I led a debut performance of Beatshifting as part of the Netherlands Coding Live concert series, on 23 April 2021.
I’ve written an animated 3D visualization of the Beatshifting algorithm, which can be driven from live session data. This movie is an annotated slow-motion version:
visualizing the Beatshifting algorithm
I’m excited about the creative potential of Beatshifting sessions. Please contact me if you’re interested in playing or coding for this medium!
Naiad keeps livecoders informed of their teammates’ activity, and remembers all history.
topology established
Naiad is Caffeine’s live module system. The goal is to support live versioning of classes and methods as they are edited, from connected teams of developers using Smalltalk or JavaScript IDEs from web browsers and native apps. Naiad keeps each developer informed of events meaningful to their teams and work. It’s comparable to a mashup of GitHub and Slack, and will interoperate with them as well.
The current Naiad prototype uses a relay network of NodeJS servers, each with Caffeine running in a Web Worker thread, and each serving a set of Caffeine-based client IDEs, in web browsers and native apps. The workers keep track of class and method versions, system checkpoints, and teams, using the relays to broadcast events to clients. Clients can request various services of the workers, like joining teams and making checkpoints from object memory snapshots.
These two clients are connected to the same relay server. The client on the left created a new team, by sending a message to the relay’s worker. The worker created the team, and told the relay to notify all of its peers (clients and relays). For now, clients respond by inspecting the new team.
I’ve just made the first system checkpoint, and broadcast the first team event (the creation of a team). Eventually, Naiad will support events for several services, including team chatting and screen-sharing, history management, and application deployment. I’m still eager to hear what events and services you think you would want in a livecoding notification system; please let me know! I expect the first public release of this work to be part of the second 2019 solstice release, on 22 December.
Caffeine can run headlessly in a web browser worker thread, a NodeJS server worker thread, or the NodeJS main thread.
We have all the components we need to connect teams of livecoders, sharing information from their IDEs as they work. What information would we like to share?
proactive conflict resolution
I’d like to share information that makes code integration easier, by spreading awareness of potential conflicts as soon as possible. Imagine, for example, that you’ve found a bug in a longstanding system method, and decide to start editing it. Before the commit of your change (which may still be days or weeks away), someone else on your team also happens to start editing that method. Wouldn’t it be nice to know that both of you are interested in changing the method?
If both of you are connected to a team network, your IDEs can notify each other when a potential conflict situation like this begins, and the two of you can resolve it through discussion. Such a feature could be vital in a team where responsibility for methods and classes isn’t clearly and completely divided between authors.
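As a thought experiment, the bookkeeping behind such a notification could be as small as the sketch below. It is not part of Naiad; the class name, method names, and method identifiers are all hypothetical.

```python
from collections import defaultdict


class EditRegistry:
    """Tracks which developers are currently editing which methods."""

    def __init__(self):
        # Maps a method identifier to the set of developers editing it.
        self._editors = defaultdict(set)

    def start_editing(self, developer, method):
        """Record that a developer began editing; return teammates already editing that method."""
        others = sorted(self._editors[method] - {developer})
        self._editors[method].add(developer)
        return others

    def stop_editing(self, developer, method):
        """Record that a developer committed or abandoned their edit."""
        self._editors[method].discard(developer)
```

A non-empty result from `start_editing` is the moment to notify everyone involved: the newcomer learns who is already working on the method, and the earlier editors learn that someone has joined them.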
The servers in this network can provide history services, too, acting as repositories of all the versions of methods and classes that have been committed by team members. This could aid in unit testing, sharing of works-in-progress, and deployment.
How would you use it?
How would you like to use such a system? How would your needs change when acting as a developer, or as a manager? I’m writing a specification now, and would love to hear your thoughts. Thanks!
Pharo 7 running on the SqueakJS virtual machine in Chrome, debugged by Squeak in a DevTools panel
I’ve updated Caffeine to run Pharo 7; please try it out! There was one virtual machine bug (primitivePerformWithArguments wasn’t manipulating the stack correctly), and I had to turn off a few Pharo features (like libGit support, which uses LibC, something I haven’t faked in the virtual machine yet).
Many thanks to the Pharo hackers in the RMOD team at INRIA Lille, for hosting me at their sprint on Friday, 27 September 2019. It was great hanging out and coding with you all. We’ll get that Pharo Apple Watch screenshot soon. :)
With the latest version of the Caffeine Chrome extension, you can run Caffeine in a Chrome DevTools panel, with access to all the Chrome debugging APIs. I’ve been using it to explore the Netflix video player, for an app I’m writing that enables the viewer to edit narratives by rearranging scenes.
From a quick look at the DOM element tree for the player, it’s apparent that it’s a React app. By following a reference chain from a user interface element (like the skip-forward button), through the bound “this” object of its click-event listener, I found the internal React properties for all the player’s UI elements, and all the player functions they use (for example, for seeking forward in a video).
With those functions in hand, I made a Netflix player class in Smalltalk, which can manipulate the Netflix player React app interactively from Smalltalk code. Other objects I made representing show elements (like scenes, episodes, seasons, and series) can use my player to compile analytic information about shows, and present them in different ways. For example, you could watch an episode of Better Call Saul consisting only of scenes that include a certain character, or that take place at a certain location, or with flashbacks placed in chronological order. This is for a webapp I’m writing called Arc.
I’m eager to see what else you explore using the Caffeine extension in the DevTools!
Caffeine running as a Chrome DevTools panel, debugging the Croquet Studios site, with Hydra graphics in the background.
I’ve updated the Caffeine Chrome extension in the Chrome Web Store. This version, 77.1, makes the entire Caffeine user interface available as a Chrome DevTools panel, and can access all of the Chrome APIs. With Hydra graphics support included, it’s the most convenient and geeky way to access Caffeine, perfect for your next Algorave. :)
I’ve created working Vue versions of the traditional Smalltalk workspace and classes browser, livecoded in the web browser from the full Squeak IDE. These use the vue-draggable-resizable component as the basis of a window system, for dragging and resizing, and the vue-menu component for pop-up context menus. Third-party Vue components are loaded live from the network using http-vue-loader, avoiding all offline build steps (e.g., with webpack). Each Smalltalk devtool UI is expressed as a Vue “single-file component” and loaded live.
When enough of the Smalltalk devtools are available in this format, I can provide an initial Squeak object memory snapshot without the UI process and its supporting code, and without the relatively large bitmaps for the Display, drop-shadows, and fonts. This snapshot will be about two megabytes, down from the 35-megabyte original. (I also unloaded lots of other code in The Big Shakeout, including Etoys and Monticello.) This will greatly improve Caffeine’s initial-page-load and snapshot times.
I’m also eager to develop other apps, like a proper GUI for the Chrome devtools, a better web browser tabs manager, and several end-user apps. Caffeine is becoming an interesting platform!