I was inspired by Dan Tepfer’s piano worlds to explore my own universe of augmented piano playing. Could I write a program that learns in realtime to improvise with my style in the music breaks between my own playing? 🤖🎹
This is pretty much a play-by-play on making PianoAI in 8 days. If you just want to try out PianoAI, go to the GitHub repository for instructions and downloads. Otherwise, stay here to read about my many programming follies.
After watching Dan Tepfer’s video on NPR, I did some freeze frames of his computer and found out he uses the Processing software to connect up to his (very fancy) MIDI keyboard. I pretty easily managed to get together something very similar with my own (not fancy) MIDI keyboard.
As explained in the video, Dan’s augmented piano playing basically allows you to mirror or echo certain notes in specified patterns. A lot of his patterns seem to be song dependent. After thinking, I decided I was interested in something a little different. I wanted a piano accompaniment that learns in realtime to emulate my style and improvise in the spaces between my own playing, on any song.
I’m not the first to make a Piano AI. Google made the A.I. Duet which is a great program. But I wanted to see if I could make an A.I. specifically tuned to my own style.
A journey into AI
My equipment is nothing special. I bought it used from some guy who decided he had to give up trying to learn how to play piano 😢. Pretty much any MIDI keyboard and MIDI adapter will do.
I generally need to do everything at least twice in order to get it right. Also I need to draw everything out clearly in order to actually understand what I’m doing. My process for writing programs then is basically as follows:
- Draw it out on paper.
- Program it in Python.
- Start over.
- Draw it out on paper, again.
- Program it in Go.
Each time I draw out my idea on paper, it takes about three pieces of paper before my idea actually starts to take form and possibly work.
Programming a Piano AI in Python
Once I had an idea of what I was doing, I implemented everything in a set of Python scripts. These scripts are built on pygame which has great support for MIDI. The idea is fairly simple - there are two threads: a metronome and a listener. The listener just records notes played by the host. The metronome ticks along and plays any notes in the queue, or asks the AI to send some new notes if none are in the queue.
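The listener/metronome split can be sketched roughly like this. It is written in Go (the language the project ends up in) rather than Python, it does no real MIDI I/O, and every name in it (Player, Listen, Tick, echoAI) is made up for illustration, not taken from the actual scripts.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Note is a hypothetical, minimal note representation.
type Note struct {
	Pitch    int
	Velocity int
}

// Player holds the recorded history and the queue of notes to play.
// A mutex guards both because listener and metronome run concurrently.
type Player struct {
	mu      sync.Mutex
	history []Note                      // notes recorded by the listener
	queue   []Note                      // notes waiting to be played
	ai      func(history []Note) []Note // pluggable note generator
}

// Listen records a note played by the human.
func (p *Player) Listen(n Note) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.history = append(p.history, n)
}

// Tick is called once per metronome beat. If the queue is empty it asks
// the AI for new notes, then returns the next note to play (if any).
func (p *Player) Tick() (Note, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.queue) == 0 {
		p.queue = p.ai(p.history)
	}
	if len(p.queue) == 0 {
		return Note{}, false
	}
	n := p.queue[0]
	p.queue = p.queue[1:]
	return n, true
}

// echoAI is a stand-in "AI" that simply replays everything heard so far.
func echoAI(history []Note) []Note {
	return append([]Note(nil), history...)
}

func main() {
	p := &Player{ai: echoAI}

	// Listener goroutine: fed from a fixed slice instead of a MIDI device.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for _, n := range []Note{{60, 90}, {64, 80}, {67, 85}} {
			p.Listen(n)
		}
	}()
	<-done

	// Metronome: one tick every 10 ms, four ticks in total.
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for i := 0; i < 4; i++ {
		<-ticker.C
		if n, ok := p.Tick(); ok {
			fmt.Printf("tick %d: play pitch %d velocity %d\n", i, n.Pitch, n.Velocity)
		}
	}
}
```

Swapping echoAI for another function with the same signature is what makes the design pluggable.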
I made it somewhat pluggable, so the AI can be swapped out for different piano augmentations. There is an algorithm for simply echoing, one for playing notes within the chord structure (after it determines the chords), and one for generating piano runs from a Markov chain. Here’s a movie of me playing with the last of these (when my right hand moves away from the keyboard, the AI begins to play until I play again):
There were a couple of things I didn’t like about this. First, it’s not very good. At best, the piano AI accompaniment sounds like a small child trying hard to emulate my own playing (I think there are a couple of reasons for this - basically not taking into account velocity data and transition times). Second, these Python scripts did not work on a Raspberry Pi (the video was shot with me using Windows)! I don’t know why. I had trouble on Python 3.4, so I upgraded to 3.6. With Python 3.6, I still had weird problems: pygame.fastevent.post worked but pygame.fastevent.get did not. I threw up my hands at this and found an alternative.
The alternative is to write this in Go. Go is notably faster than Python - which is quite useful since this is a low-latency application. My ears discern discrepancies greater than 20 milliseconds, so I want to keep processing times down to a minimum. I found a Go MIDI library, so porting was very viable.
Programming a Piano AI in Go
I decided to simplify a little bit, and instead of making many modules with different algorithms, I would focus on the one I’m most interested in: a program that learns in realtime to improvise in the spaces between my own playing. I took out some more sheets of paper and began.
Most of the code is about the same as my previous Python scripts. When writing in Go, I found that spawning threads is so much easier than in Python. Threads are all over this program. There are threads for listening to MIDI, threads for playing notes, threads for keeping track of notes. I was tempted to use the brand new Go 1.9 sync.Map in my threads, but realized I could leverage maps of maps, which goes beyond what sync.Map offers. So I just made a map of maps that is very similar to another sync map store that I wrote (schollz/jsonstore).
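A map of maps shared between threads needs a lock around it. Here is a minimal sketch of that idea using sync.RWMutex. The NoteStore type and its beat -> pitch -> velocity layout are assumptions for illustration, not the structure PianoAI or schollz/jsonstore actually uses.

```go
package main

import (
	"fmt"
	"sync"
)

// NoteStore is a hypothetical map of maps: beat -> pitch -> velocity.
// The RWMutex lets many readers in at once but only one writer.
type NoteStore struct {
	mu   sync.RWMutex
	data map[int]map[int]int
}

func NewNoteStore() *NoteStore {
	return &NoteStore{data: make(map[int]map[int]int)}
}

// Set stores a velocity, creating the inner map on first use.
func (s *NoteStore) Set(beat, pitch, velocity int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.data[beat] == nil {
		s.data[beat] = make(map[int]int)
	}
	s.data[beat][pitch] = velocity
}

// Get looks up a velocity. Reading from a missing (nil) inner map is
// safe in Go and simply reports ok == false.
func (s *NoteStore) Get(beat, pitch int) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[beat][pitch]
	return v, ok
}

func main() {
	s := NewNoteStore()

	// Several goroutines write at once; the lock keeps the maps consistent.
	var wg sync.WaitGroup
	for beat := 0; beat < 8; beat++ {
		wg.Add(1)
		go func(b int) {
			defer wg.Done()
			s.Set(b, 60+b, 100)
		}(beat)
	}
	wg.Wait()

	v, ok := s.Get(3, 63)
	fmt.Println(v, ok)
}
```

Without the lock, concurrent writes to a plain Go map are a data race and can crash the program.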
I attempted to make everything classy (pun intended), so I implemented components (midi, music, ai) as their own objects with their own functions. So far, the MIDI listening works great and seems to respond very fast. I also implemented playback functions and they work too - this is pretty easy.
I started by refactoring all the code into folders because I’d like to reserve the New function for each of the objects. The objects have solidified - there is an AI object for learning / generating licks, a Music object for the piano note models, a Piano object for communicating with MIDI, and a Player object for pulling everything together.
I spent a lot of time with pen and paper figuring out how the AI should work. I realized that there is more than one way to make a Markov chain out of piano notes. Piano notes have four basic properties: pitch, velocity, duration, and lag (time to next note). The basic Markov chain for piano notes would be four different Markov chains, one for each of the properties. It would be illustrated as such:
Here the pitch of the next note (P2) is determined from the pitch of the previous note (P1). The same goes for velocity (V1/V2), duration (D1/D2) and lag (L1/L2). The actual Markov chain simply enumerates the relative frequencies of occurrence of the value of each property and uses a random selector to pick one.
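For one property, such a chain can be sketched as below: count how often each value follows each other value, then pick a successor in proportion to those counts. This is a minimal illustration with made-up names (Chain, Learn, Next), not the code in the repository. You would keep one such chain per property.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Chain is a first-order Markov chain over one integer-valued note
// property. counts[prev][next] is how many times next followed prev.
type Chain struct {
	counts map[int]map[int]int
	rng    *rand.Rand
}

// NewChain takes a seed so that runs are reproducible.
func NewChain(seed int64) *Chain {
	return &Chain{
		counts: make(map[int]map[int]int),
		rng:    rand.New(rand.NewSource(seed)),
	}
}

// Learn records every adjacent pair in seq.
func (c *Chain) Learn(seq []int) {
	for i := 0; i+1 < len(seq); i++ {
		prev, next := seq[i], seq[i+1]
		if c.counts[prev] == nil {
			c.counts[prev] = make(map[int]int)
		}
		c.counts[prev][next]++
	}
}

// Next picks a successor of prev with probability proportional to how
// often it was seen. It reports false if prev never had a successor.
func (c *Chain) Next(prev int) (int, bool) {
	successors := c.counts[prev]
	if len(successors) == 0 {
		return 0, false
	}
	values := make([]int, 0, len(successors))
	total := 0
	for v, n := range successors {
		values = append(values, v)
		total += n
	}
	sort.Ints(values) // fixed order so a seed reproduces the same walk

	// Walk the cumulative counts until the random number is used up.
	r := c.rng.Intn(total)
	for _, v := range values {
		r -= successors[v]
		if r < 0 {
			return v, true
		}
	}
	return 0, false // not reached: the counts sum to total
}

func main() {
	pitches := []int{60, 62, 64, 62, 60, 62, 64, 65, 64, 62, 60}
	chain := NewChain(1)
	chain.Learn(pitches)

	p := 60
	for i := 0; i < 8; i++ {
		next, ok := chain.Next(p)
		if !ok {
			break
		}
		fmt.Print(next, " ")
		p = next
	}
	fmt.Println()
}
```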
However, the piano properties are not necessarily independent: sometimes there is a relationship between the pitch and velocity, or the velocity and the duration of a note. To account for this I’ve allowed for different couplings. You can couple properties to the current or the last value of any other property. Currently I’m only allowing two couplings, because that’s complicated enough. But in theory, you could couple the value of the next pitch to the previous pitch and velocity and duration and lag!
Once I had everything figured out, theoretically, I began to implement the AI. The AI is simply a Markov chain, so it builds a table of relative frequencies of note properties and has a function for sampling from that table using the cumulative probabilities and a random number. At the end of the night, it works! Well, kinda but not really. Here’s a silly video of me playing a lick to it and getting some AI piano runs:
Seems like there is more improvement to be made tomorrow!
There is no improvement to be made that I can find.
But maybe the coupling I tried yesterday is not very good. The coupling I’m most interested in can be visualized as:
In this coupling, the previous pitch determines the next pitch. The next velocity is determined by the previous velocity and the current pitch. The current pitch also determines the duration. And the current duration determines the current lag. This needs to be evaluated in the correct order (pitch, duration, velocity, lag), and that’s left up to the user because I don’t want to program in tree traversals.
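To make the evaluation order concrete, here is a rough sketch of this particular coupling with the order hard-coded. The types and names (Note, Table, Model) and the fallback behavior for unseen contexts are my assumptions for the example, not what PianoAI does.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Note holds the four properties. Units are left unspecified here.
type Note struct {
	Pitch, Velocity, Duration, Lag int
}

// Table counts how often a value followed a context. A context is up to
// two coupled values; single couplings leave the second slot at 0.
type Table map[[2]int]map[int]int

func (t Table) add(ctx [2]int, value int) {
	if t[ctx] == nil {
		t[ctx] = make(map[int]int)
	}
	t[ctx][value]++
}

// pick samples a value for ctx in proportion to its count, or returns
// fallback when the context was never seen.
func (t Table) pick(ctx [2]int, rng *rand.Rand, fallback int) int {
	options := t[ctx]
	if len(options) == 0 {
		return fallback
	}
	values := make([]int, 0, len(options))
	total := 0
	for v, n := range options {
		values = append(values, v)
		total += n
	}
	sort.Ints(values) // fixed order so a seed reproduces the same notes
	r := rng.Intn(total)
	for _, v := range values {
		r -= options[v]
		if r < 0 {
			return v
		}
	}
	return fallback // not reached: the counts sum to total
}

// Model keeps one table per property.
type Model struct {
	pitch, velocity, duration, lag Table
	rng                            *rand.Rand
}

func NewModel(seed int64) *Model {
	return &Model{
		pitch: Table{}, velocity: Table{}, duration: Table{}, lag: Table{},
		rng: rand.New(rand.NewSource(seed)),
	}
}

// Learn records the couplings for every consecutive pair of notes.
func (m *Model) Learn(notes []Note) {
	for i := 1; i < len(notes); i++ {
		prev, cur := notes[i-1], notes[i]
		m.pitch.add([2]int{prev.Pitch, 0}, cur.Pitch)
		m.duration.add([2]int{cur.Pitch, 0}, cur.Duration)
		m.velocity.add([2]int{prev.Velocity, cur.Pitch}, cur.Velocity)
		m.lag.add([2]int{cur.Duration, 0}, cur.Lag)
	}
}

// Next generates a note. The order matters: pitch first, because
// duration and velocity depend on it, and lag last.
func (m *Model) Next(prev Note) Note {
	var n Note
	n.Pitch = m.pitch.pick([2]int{prev.Pitch, 0}, m.rng, prev.Pitch)
	n.Duration = m.duration.pick([2]int{n.Pitch, 0}, m.rng, prev.Duration)
	n.Velocity = m.velocity.pick([2]int{prev.Velocity, n.Pitch}, m.rng, prev.Velocity)
	n.Lag = m.lag.pick([2]int{n.Duration, 0}, m.rng, prev.Lag)
	return n
}

func main() {
	m := NewModel(1)
	m.Learn([]Note{{60, 80, 100, 200}, {62, 90, 150, 300}, {64, 85, 100, 200}, {62, 90, 150, 300}})
	n := Note{60, 80, 100, 200}
	for i := 0; i < 4; i++ {
		n = m.Next(n)
		fmt.Printf("%+v\n", n)
	}
}
```

A more general version would sort the properties by their dependencies instead of hard-coding the order, which is the tree traversal I wanted to avoid.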
Well, I tried this and it sounds pretty bad. I’m massively disappointed in the results. I think I need to try a different machine learning approach. I’m going to make my repo public now; maybe someone will find it and take over for me, because I’m not sure I will continue on it.
I’ve been thinking, and I’m going to try a neural net. (commit b5931ff6).
I just tried a neural net. It went badly, to say the least. I tried several variations too. I tried feeding in the notes as pairs, either each property individually or all the properties as a vector. This didn’t sound good - the timings were way off and the notes were all over the place.
I also tried a neural net where I send the layout of the whole keyboard (commit 20948dfb) and then the keyboard layout that it should transition into. It sounds complicated because I think it is and it didn’t seem to work either.
The biggest problem I noticed with the neural net is that it is hard to get some randomness. I tried introducing a random column as a random vector but it just creates too many spurious notes. Once the AI piano lick begins, it seems to just get stuck in a local minimum and doesn’t explore much anymore. I think in order to make the neural net work, I’d have to do what Google does and try to Deep learn what “melody” is and what a “chord” is and what a “piano” is. Ugh.
I give up.
I did give up. But then, I couldn’t help but rethink the entire project while running in the forest.
What is an AI really? Is it just supposed to play with my level of intelligence? What is my level of intelligence when I play? When I think about my own intelligence, I realize: I’m not very intelligent!
Yes, I’m a simple piano player. I just play notes in scales that belong to the chords. I have some little riffs that I mix in when I feel like it. Actually, the more I think about it I realize that my piano improvisation is like linking up little riffs that I reuse or copy or splice over and over. So the Piano AI should do just that!
I wrote down the idea on the requisite piece of paper and went home to program it. Basically this version of the AI was a Markovian scheme again but greater than first order (i.e. remembering more than just the last note). And the Markov transitions should link up larger segments of notes that are known to be riffs (i.e. based off my playing history). I implemented a new AI for this (commit bc96f512) and tried it out.
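The "greater than first order" part can be illustrated with a plain order-n Markov chain over pitches, where the context is the last n notes instead of just one. This is only a simplified stand-in with hypothetical names (RiffChain, Learn, Improvise). It is not the riff-linking AI from commit bc96f512, and it ignores velocity, duration and lag.

```go
package main

import (
	"fmt"
	"math/rand"
)

// RiffChain is an order-n Markov chain: the key is the last n pitches
// and the value lists every pitch that followed them in the history.
// Repeats are kept, so frequent continuations are picked more often.
type RiffChain struct {
	order int
	next  map[string][]int
	rng   *rand.Rand
}

func NewRiffChain(order int, seed int64) *RiffChain {
	return &RiffChain{
		order: order,
		next:  make(map[string][]int),
		rng:   rand.New(rand.NewSource(seed)),
	}
}

// key turns a window of pitches into a map key, e.g. "[60 62]".
func key(window []int) string {
	return fmt.Sprint(window)
}

// Learn slides a window of length order over the playing history.
func (c *RiffChain) Learn(history []int) {
	for i := 0; i+c.order < len(history); i++ {
		k := key(history[i : i+c.order])
		c.next[k] = append(c.next[k], history[i+c.order])
	}
}

// Improvise continues from start for up to n notes and returns only the
// new notes. It stops early if the current context was never heard.
func (c *RiffChain) Improvise(start []int, n int) []int {
	out := append([]int(nil), start...)
	for i := 0; i < n && len(out) >= c.order; i++ {
		options := c.next[key(out[len(out)-c.order:])]
		if len(options) == 0 {
			break
		}
		out = append(out, options[c.rng.Intn(len(options))])
	}
	return out[len(start):]
}

func main() {
	history := []int{60, 62, 64, 65, 67, 65, 64, 62, 60, 62, 64, 67, 72}
	c := NewRiffChain(2, 1)
	c.Learn(history)
	fmt.Println(c.Improvise([]int{60, 62}, 8))
}
```

With a longer context the output tends to reproduce whole fragments of the history and only branches where two fragments share the same n notes, which is roughly the "linking up little riffs" behavior.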
What the heck! Someone put my Piano AI on Product Hunt today. Oh boy, I’m not done, but I’m almost done so I hope no one tries it today.
With a fresh brain I found a number of problems that were actually pretty easy to fix. I fixed:
- a bug where the maps have notes at the wrong beats (I still don’t know how this happens but I check for it now)
- a bug to prevent the AI from improvising while it’s improvising
- a bug to fix concurrent access to a map
- a bug to prevent improvisation while learning
And I added command-line flags so it’s ready to go. And it actually works! Here are some videos of me teaching it for about 30 seconds and then jamming:
Lots of great feedback on this! Here is a Hacker News discussion which is illuminating.
- There is a great paper about musical composition by Peter Langston which has a “riffology” algorithm that seems similar to mine.
- Dan Tepfer seems to be using SuperCollider for procedural music composition, not Processing.
- There are some great sequence predictors that could be used to make this better.
This is likely not the last day, but if you have more ideas or questions, let me know! Tweet @yakczar.