Adventures in generating music via ChatGPT text prompts

Written by Adrian Holovaty on December 1, 2022

Yesterday OpenAI released ChatGPT, a chatbot that uses a sophisticated language model and knows a lot about the world. It can give high-quality answers on business strategy, talk in funny dialects, write coherent poetry/jokes/limericks and generally provide reasonable answers to any question you throw at it.

I was curious: Let’s see whether this thing can generate music!

Given that ChatGPT communicates via plain text, it would need to encode music in a text-based format. Fortunately, several such formats exist. The de facto modern standard is MusicXML, so I started with that.

My first prompt:

How would you encode the melody to "Mary Had A Little Lamb" as MusicXML?

Its answer:

MusicXML is a widely-used digital language for representing musical notation. It provides a standardized format for storing and sharing musical scores in a way that is readable by both humans and computers.

To encode the melody of "Mary Had A Little Lamb" as MusicXML, you would first need to transcribe the melody into musical notation using a music notation software or a score-writing program that supports MusicXML. Once the melody has been transcribed, the software will automatically generate the corresponding MusicXML code, which you can then save and share with others.

Here is an example of how the melody to "Mary Had A Little Lamb" might be encoded as MusicXML:

That was followed by raw MusicXML:

<score-partwise>
  <part-list>
    <score-part id="P1">
      <part-name>Piano</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>0</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
      <note>
        <pitch>
          <step>A</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
      <note>
        <pitch>
          <step>G</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
      <note>
        <pitch>
          <step>A</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
    </measure>
    <measure number="2">
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
      </note>
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>4</duration>
        <type>quarter</type>
        <dot />
      </note>

Wow, this actually resembles real MusicXML! There’s just one issue — it’s invalid XML, because the <measure>, <part> and <score-partwise> elements aren’t closed. Let’s see whether ChatGPT can fix that:

Could you please repeat that MusicXML, but make sure it's a valid XML document? Some of the XML elements don't have a corresponding closed element.

No dice. It effectively repeated the same thing, with invalid XML.
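Closing the tags by hand is easy enough, but it's also easy to check mechanically. A minimal sketch using Python's standard library (my own helper, not part of ChatGPT's output):

```python
# Check whether a ChatGPT-generated MusicXML string is well-formed XML.
# ElementTree raises ParseError on unclosed or mismatched elements.
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

Running ChatGPT's output through a check like this would flag the unclosed <measure>, <part> and <score-partwise> elements immediately.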

But, no worries — I’m glad my human brain can be put to use. I added the missing XML elements myself, then uploaded that MusicXML into Soundslice to see how it would sound. Here it is:

Not bad! It’s more or less the first two bars of the tune. Just one quibble: ChatGPT wrote the melody in the key of G major but used a C major key signature. But all things considered, this is impressive.

I would have liked more of the music, but perhaps it ran into a limit on how long embedded code examples could be (more on this later). The ABC music format is much simpler and more compact than MusicXML — so, as a second experiment, I asked ChatGPT to use ABC:

Please write the melody of "Happy Birthday" in ABC format.

The response:

Here is the melody of "Happy Birthday" in ABC format:

X:1
T:Happy Birthday
M:4/4
L:1/4
K:C
C C G G | A A G | F F E E | D D C |
C C G G | A A G | G G F F | E E D |
C C C3 | A A A3 | G G G3 | F F F3 |
E E E3 | D D D3 | C C C4 | C C C4 |

In ABC notation, each line begins with a letter followed by one or more characters. In this example, the X:1 line indicates that this is the first tune in the file, the T:Happy Birthday line indicates the title of the tune, the M:4/4 line indicates the time signature (4 beats per measure, with a quarter note as the beat), the L:1/4 line indicates the note length (a quarter note), and the K:C line indicates the key (C major).

The melody is then written as a series of chords, with each chord consisting of one or more notes played simultaneously. The notes are indicated by their letter names (A-G), with accidentals indicated by a ^ (sharp) or _ (flat) symbol before the note letter, and octave indicators indicated by a number after the note letter. The duration of each note is indicated by a number after the note letter, with a / symbol indicating a dotted note. For example, in the first measure of the melody, the chord C C G G indicates that the C major chord (C, E, and G) is played, with each note lasting for a quarter note.

I’m impressed that it came with a longhand explanation. But was the music actually correct?

Soundslice doesn’t import ABC files yet, so I pasted that ABC code into this free ABC viewer and got a PDF in response. Then I ran that PDF through Soundslice’s new PDF sheet music scanner and got the following:

Hmm. This sounds suspiciously like the “ABC song” (which shares its melody with “Twinkle Twinkle Little Star” and “Baa Baa Black Sheep”). Could it be that ChatGPT misinterpreted my request for the ABC music notation format to mean the ABC song?

I tried to disambiguate via my next prompt:

Please write the melody of "Happy Birthday" using the ABC format. Make sure that the melody is "Happy Birthday" and the melody is not the "ABC song".

Alas, the response was exactly the same.
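For what it's worth, here is my own hand-written attempt at encoding the "Happy Birthday" melody in ABC, assuming 3/4 time (the tune is a waltz, not 4/4) and the key of C. This is my sketch, not ChatGPT's output:

```
X:1
T:Happy Birthday
M:3/4
L:1/8
K:C
G3/2G/2 | A2 G2 c2 | B4 G3/2G/2 | A2 G2 d2 | c4 G3/2G/2 |
g2 e2 c2 | B2 A2 f3/2f/2 | e2 c2 d2 | c4 |]
```

The dotted-eighth-plus-sixteenth pickup ("Hap-py") is written as G3/2G/2, since L:1/8 makes the default note length an eighth.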

I’m thinking either ChatGPT doesn’t know enough about the ABC format, or I don’t know the correct incantation to make it avoid encoding the ABC song. No big deal — let’s go back to MusicXML.

My next prompt:

Create a MusicXML document with a G major key signature, containing a G major arpeggio over two octaves, as eighth notes.

The response:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE score-partwise PUBLIC
    "-//Recordare//DTD MusicXML 3.1 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
<score-partwise>
  <part-list>
    <score-part id="P1">
      <part-name>Arpeggio</part-name>
    </score-part>
  </part-list>
  <part id="P1">
    <measure>
      <attributes>
        <divisions>1</divisions>
        <key>
          <fifths>1</fifths>
        </key>
        <time>
          <beats>4</beats>
          <beat-type>4</beat-type>
        </time>
        <clef>
          <sign>G</sign>
          <line>2</line>
        </clef>
      </attributes>
      <note>
        <pitch>
          <step>G</step>
          <octave>3</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>D</step>
          <octave>4</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>G</step>
          <octave>4</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>G</step>
          <octave>3</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>B</step>
          <octave>3</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>D</step>
          <octave>4</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type>
      </note>
      <note>
        <pitch>
          <step>G</step>
          <octave>4</octave>
        </pitch>
        <duration>1</duration>
        <type>eighth</type

Again we have only a partial document. So by now I’m thinking this is a limitation of ChatGPT — either in the frontend UI, or the backend API, or in the model itself. At any rate, I cleaned up the missing elements and here’s what it sounds like:

Again, not bad! It used the correct key signature, correct rhythmic values and correct pitches. I also like how it named the <part> “Arpeggio” — nice attention to detail. The only problem is that I explicitly asked for two octaves, and it just repeated the single-octave arpeggio.
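The two-octave version is easy to generate deterministically. A sketch using Python's xml.etree.ElementTree (the register, starting at G3, is my own choice):

```python
# Build a one-measure MusicXML document containing a two-octave
# ascending G major arpeggio in eighth notes (G3 up to G5), plus
# an eighth rest to fill out the 4/4 measure.
import xml.etree.ElementTree as ET

ARPEGGIO = [("G", 3), ("B", 3), ("D", 4), ("G", 4), ("B", 4), ("D", 5), ("G", 5)]

def build_score():
    score = ET.Element("score-partwise")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Arpeggio"
    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attrs = ET.SubElement(measure, "attributes")
    ET.SubElement(attrs, "divisions").text = "2"   # 2 duration units per quarter
    key = ET.SubElement(attrs, "key")
    ET.SubElement(key, "fifths").text = "1"        # one sharp: G major
    time = ET.SubElement(attrs, "time")
    ET.SubElement(time, "beats").text = "4"
    ET.SubElement(time, "beat-type").text = "4"
    for step, octave in ARPEGGIO:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = "1"  # one eighth note
        ET.SubElement(note, "type").text = "eighth"
    rest = ET.SubElement(measure, "note")           # pad the measure
    ET.SubElement(rest, "rest")
    ET.SubElement(rest, "duration").text = "1"
    ET.SubElement(rest, "type").text = "eighth"
    return ET.tostring(score, encoding="unicode")
```

Of course, writing this kind of generator by hand is exactly the work one hopes the chatbot would do for you.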

For some reason it included the XML declaration and doctype this time. To save space in the output (and hence leave more room for music), I included the text "Omit the XML declaration and doctype at the top of the file" in all subsequent prompts. It correctly obliged.

Let’s complicate things:

Create a MusicXML document with a G major key signature, containing a four-note D7 arpeggio followed by a four-note G major arpeggio. All notes should be eighth notes.

So close! It forgot to sharp the second note, and it only used three notes for the G major arpeggio. But pretty impressive.
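For reference, here are the notes the prompt asked for, spelled out (my own listing, using the sharp spelling for the third of D7):

```python
# The two arpeggios from the prompt, spelled as note names.
D7 = ["D", "F#", "A", "C"]       # dominant 7th: root, major 3rd, 5th, minor 7th
G_MAJOR = ["G", "B", "D", "G"]   # four-note arpeggio: triad plus the octave
```

The F# is the note ChatGPT missed, and the repeated root at the octave is what makes the G major arpeggio four notes rather than three.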

Let’s see whether it understands the concept of transposition:

Create a MusicXML document with an F major key signature, containing the ascending melody notes F and A, followed by the same melody transposed into the key of C. All notes should be eighth notes in a single measure.

Again really close! The key signature is wrong despite my clear instruction, and there’s an extra F-A, but the melody at the end (the last two notes) is indeed the original melody transposed into the key of C.
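The transposition itself is mechanical. A minimal chromatic sketch (sharp-spelled note names; my own helper, not anything ChatGPT produced):

```python
# Transpose a note name by a number of semitones, using sharp spellings.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(note, semitones):
    return NOTE_NAMES[(NOTE_NAMES.index(note) + semitones) % 12]
```

Moving the F-A melody from F major up a perfect fifth (7 semitones) lands it in C major: transpose("F", 7) gives "C", and transpose("A", 7) gives "E", matching the last two notes of ChatGPT's output.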

Seeing all of these results, specifically the types of mistakes ChatGPT makes, leads me to the following conclusions:

  • ChatGPT indeed has some understanding of how to generate MusicXML. It doesn’t simply regurgitate hard-coded examples from its training data.
  • Its knowledge of the ABC format is more limited (understandable, given MusicXML is much more widely used), but perhaps this is due to the unfortunate ambiguity between “ABC format” and “ABC song.”
  • It has basic knowledge of music theory — e.g., knowing which notes comprise a D7 arpeggio or a G major arpeggio.
  • I used silly, overly simplified examples, and I’d like to push the system harder to see how far its knowledge extends. The main barrier is the apparent character limit in ChatGPT’s output, which doesn’t mix well with MusicXML’s verbosity.

I imagine it would be relatively easy to fine-tune OpenAI’s text model on music theory, repertoire and encoding rules of MusicXML — hence producing the ultimate musical chatbot. Call it MozAIrt? H-AI-rmony? Better yet, let’s ask ChatGPT for ideas:

List 10 names for a chatbot that specializes in musical information

The response:

  1. MelodyBot
  2. NoteWise
  3. HarmonyHelper
  4. SongSage
  5. MusicMaster
  6. TuneTactician
  7. RhythmRanger
  8. ScoreSavior
  9. HarmonyHive
  10. MelodyMaven

OK, those are boring. Let’s use a better prompt.

List 10 creative names for a chatbot that specializes in musical information. Every name should be a pun.

Now we’re talking:

  1. Beethoven Bot
  2. Bach to Basics
  3. Chopin It Up
  4. Mozart of the Chat
  5. Tchaik-bot-sky
  6. Brahms and Pieces
  7. Handel Your Business
  8. Schubert Your Heart Out
  9. Dvorak the Chat
  10. Haydn Fun of It

Tchaik-bot-sky it is!
