Video: Alexander Hamilton in IPA

I love musicals, and no musical has loomed larger in popular consciousness in recent years than Hamilton. We don’t have to discuss all the ways in which it’s groundbreaking, but one of the things that’s significant is how Lin-Manuel Miranda has brought a hip-hop sensibility to Broadway in In The Heights and now Hamilton. One of the most prominent ways he’s done that is through his use of rhythm and rhyme.

Inspired by the Wall Street Journal’s analysis of some of Lin-Manuel Miranda’s rhymes, and by Stephen Malinowski’s musical visualisations, I’ve had the puppets do a cover of Alexander Hamilton. In this video, the lyrics of Alexander Hamilton, written in the International Phonetic Alphabet (IPA), scroll by in time with the music.

How I made this video

I recorded the audio in Pro Tools, using the cast recording as a scratch track and timing my vocals to it. Yes, both voices are me. Yes, those are the puppets' voices; any claim that they are controlled and voiced by a human being is fake news. My sister Felicia played the piano track that you hear, which I then laid under the vocal tracks.

The next step was to write out the song in IPA. For the most part, this was fairly straightforward, but there were points where the pronunciation heard on the cast album is not the canonical General American pronunciation, and I had to make a decision as to how to transcribe it.

Props to Lin-Manuel Miranda for writing a song that I still want to listen to after I've transcribed it.

Some broad guidelines I adhered to:

  • If I heard rhotacisation, I transcribed it. If in doubt, I transcribed it. I only transcribed non-rhotic vowels if I was sure I didn’t hear any rhotacisation.
    • e.g. /ə/ instead of /ɚ/ in “bastard”, line 1.
  • If I heard /ɹ/, I transcribed it. If in doubt, I usually transcribed it. I only left it out if:
    • I was sure I didn’t hear it
      • e.g. “smarter” /smɑɾə/ in verse 2
    • I wasn’t sure whether I heard it AND leaving /ɹ/ out of the syllable coda fit the rime.
  • /æ/ is left un-raised, for my sanity’s sake. I personally don’t have /æ̝/, and I’m not comfortable making judgements on whether /æ/ is raised or not, especially when there is no contrast involved.
  • Unstressed syllables tend to be transcribed /ə/ or /ɪ/, but the vowel in “and” is often /æ/, and the vowel in "to" is often /u/. This one was a bit of a judgement call.
  • I transcribed //t// sometimes as /t/, sometimes as /ʔ/, sometimes as /ɾ/, and sometimes as null.
  • Syllabification is based purely on phonotactic rules. This means that sometimes syllable boundaries do not correspond to morpheme boundaries:
    • e.g. “start-er”: /stɑ.ɾə/
    • e.g. “with-out”: /wɪ.ðaʊt/
    • e.g. “drip-pin’”: /dɹɪ.pɪn/
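
For the curious, the syllabification rule can be sketched in a few lines of Python. This is only an illustration of onset maximisation (give each syllable the longest onset that English phonotactics allows), not the tool or process I actually used. The `VOWELS` and `LEGAL_ONSETS` inventories and the `syllabify` function are hypothetical and deliberately tiny, covering just the examples above.

```python
# Hypothetical, deliberately tiny inventories: just enough for the examples.
VOWELS = {"ɑ", "ə", "ɪ", "aʊ"}
LEGAL_ONSETS = (
    {()}                                        # no onset at all
    | {(c,) for c in "ptkbdgfvθðszʃʒhmnlɹwjɾ"}  # single consonants (no ŋ)
    | {("s", "t"), ("d", "ɹ")}                  # a couple of clusters
)

def syllabify(segments):
    """Join segments into a string, with '.' at each syllable boundary."""
    nuclei = [i for i, seg in enumerate(segments) if seg in VOWELS]
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = segments[prev + 1:nxt]
        # The smallest k leaves the longest legal onset for the next syllable;
        # the k consonants before it become the coda of the previous one.
        k = next(k for k in range(len(cluster) + 1)
                 if tuple(cluster[k:]) in LEGAL_ONSETS)
        starts.append(prev + 1 + k)
    ends = starts[1:] + [len(segments)]
    return ".".join("".join(segments[a:b]) for a, b in zip(starts, ends))

print(syllabify(["s", "t", "ɑ", "ɾ", "ə"]))       # stɑ.ɾə
print(syllabify(["w", "ɪ", "ð", "aʊ", "t"]))      # wɪ.ðaʊt
print(syllabify(["d", "ɹ", "ɪ", "p", "ɪ", "n"]))  # dɹɪ.pɪn
```

Note that the function never looks at morphemes, which is exactly why “start-er” comes out as /stɑ.ɾə/.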

Halfway through the transcription, I realised I was a chump for doing this by hand, and I used an IPA transcriber instead. Of course, like any good student taking shortcuts, I checked every syllable of the transcription, and changed it to fit the pronunciation on the cast album where necessary.

The next step took some time to figure out. I love Stephen Malinowski’s work and I wanted my video to create the same sort of visual impression, but with a rigorous theoretical framework undergirding it. I had visions of creating some sort of graphical featural system, with perhaps a colour for each place of articulation, with the colours on a spectrum, so that places of articulation adjacent to each other would have similar colours. Manners of articulation would be represented by shapes, with voiceless stops having the sharpest corners and approximants having the gentlest (in line with the bouba/kiki effect). As for the vowels, I imagined some use of the colour wheel to reflect vowel height and backness, so that vowels closer in the vowel space would be closer in colour, and vowels further apart would have contrasting colours.

I ran out of colours.

Wit aside, the simple fact is that music is a discipline built on repeating mathematical patterns, which allows Stephen Malinowski to take advantage of colour in his videos. On the other hand, linguistics does not rest on a primarily mathematical framework, and articulatory phonetics is especially subject to the limitations imposed by human anatomy. If all of music theory flows from mathematics, all of articulatory phonetics flows from anatomy. Consequently, it’s difficult to build a notation that tries to fit consonants and vowels into some kind of colour-based visual system.

Speaking of visual representations of consonants and vowels: yes, I tried creating a kiddie version of a spectrogram that would show the acoustic relationship between similar consonants and similar vowels. It’s really unintuitive to non-linguists. It’s also really, really ugly:

No. Just no. (The above abomination is supposed to represent "how does a...")

The grown-up spectrogram of "how does a..."

Eventually, I came back to just using IPA. English speakers don’t need a linguistics background to figure out which sound each symbol represents. The visual and phonetic relationship between two syllables that have the same nucleus is immediate and obvious: you know at once that /bɹeɪn/ and /peɪn/ rhyme, even if you don’t know the featural specifications of the vowel or diphthong in the syllable nucleus.

While it would have been nice to highlight, say, sibilant fricatives or /s/ + voiceless stop sequences (which frequently fall on the same beat of consecutive bars), I figured that was too much information to try to visually represent or highlight in a video.

It’s a rap! What matters is rhythm and rhyme. I knew how to represent the rhythm; I just needed to focus on the rhyme. So, I did the logical thing: the syllables in this video are coloured according to their nucleus. That is all there is to it.
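
If you wanted to reproduce the colouring idea in code rather than by hand, a minimal sketch might look like the following. I did not build the video this way; the `NUCLEI` inventory, the `PALETTE` and both function names are assumptions made up for illustration. The idea is simply to find the vowel or diphthong in each syllable and give every syllable with the same nucleus the same colour.

```python
from itertools import cycle

# Assumed inventory. Diphthongs come first so that /eɪ/ is matched as a
# whole rather than as a bare first element.
NUCLEI = ["eɪ", "aʊ", "aɪ", "oʊ", "ɔɪ",
          "ɑ", "æ", "ɛ", "ɪ", "i", "u", "ʊ", "ʌ", "ə", "ɚ", "ɔ"]
PALETTE = ["red", "orange", "gold", "green", "teal", "blue", "purple"]

def nucleus(syllable):
    """Return the first vowel or diphthong found in an IPA syllable."""
    for i in range(len(syllable)):
        for vowel in NUCLEI:
            if syllable.startswith(vowel, i):
                return vowel
    return None  # no vowel found, e.g. a syllabic consonant

def colour_syllables(syllables):
    """Pair each syllable with a colour chosen by its nucleus."""
    colours = {}
    palette = cycle(PALETTE)  # wraps around if the colours run out
    coloured = []
    for syllable in syllables:
        key = nucleus(syllable)
        if key not in colours:
            colours[key] = next(palette)
        coloured.append((syllable, colours[key]))
    return coloured

for syllable, colour in colour_syllables(["bɹeɪn", "peɪn", "wɪ", "ðaʊt"]):
    print(syllable, colour)
```

Here /bɹeɪn/ and /peɪn/ share a colour because they share the nucleus /eɪ/, while /wɪ/ and /ðaʊt/ each get their own.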

This is how the sausage is made.

The rest of it was just mechanics: conditional formatting, stitching screenshots into a long image, displacing each column by the correct number of pixels, sticking the image in Final Cut, calibrating the timing of the scrolling to match up with the timing of the recording, overlaying the puppets…
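
The arithmetic behind the column displacement and the scroll timing is simple enough to sketch. This assumes a constant tempo and a fixed number of pixels per beat, and every number below is hypothetical; in practice I calibrated the scroll by eye and ear in Final Cut rather than with a script.

```python
def column_offsets(beat_positions, pixels_per_beat):
    """Horizontal pixel offset of each syllable column, given its beat position."""
    return [round(beat * pixels_per_beat) for beat in beat_positions]

def scroll_x(t_seconds, bpm, pixels_per_beat):
    """How far the long image should have scrolled after t_seconds."""
    beats_elapsed = t_seconds * bpm / 60
    return beats_elapsed * pixels_per_beat

# Hypothetical values: four syllables at 200 pixels per beat, 120 beats per minute.
print(column_offsets([0, 0.5, 1, 1.75], 200))  # [0, 100, 200, 350]
print(scroll_x(30, 120, 200))                  # 12000.0
```

If the tempo drifts, as it does in any live recording, a single `bpm` value will not hold for the whole song, which is why the final calibration still needs a human.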

Two separate videos were recorded, one with each puppet, against a fluorescent green piece of card. They were recorded separately because even though I have two hands, I lack the coordination to control two puppets at the same time. I put the puppets together in post-production, which is pretty obvious if you know what to look out for.

That’s how I arrived at the video you see here. I actually finished making this video a few months ago, but I wanted to wait till the Articulatory Phonetics series was done before publishing it, so that if anyone was curious about the IPA symbols, they could look up the theory behind the phonetics.

Given the chance, there are a couple of things I might change, but for what is almost a one-person job (once again, shoutout to my sister Felicia for playing the piano part), it’s not too shabby.

I’m happy to take questions about the transcription, the creative process, or the production process. Enjoy.

Articulatory Phonetics 101: Phonation

This is Part 2 of a series covering the basics of articulatory phonetics, the study of how humans physically produce speech sounds. For the full list of posts, see the Articulatory Phonetics 101 Index.

In the last video we looked very briefly and very broadly at breath and the vocal tract. In this video, we’re going to zoom into one very specific and very important part of the vocal tract: the vocal folds.

The Vocal Folds

The vocal folds, perhaps better known as the vocal cords, are the part of the vocal tract that vibrates to produce pitch.

When we sing, it is the frequency of the vocal folds’ vibration that determines whether we are singing high or low.

When you breathe, the vocal folds separate, or abduct. When you speak or sing, the vocal folds come together, or adduct.

Sometimes, the vocal folds vibrate at a particular frequency. That frequency is the pitch of your voice. If your vocal folds are vibrating slowly, that’s a low pitch, and if they’re vibrating quickly, that’s a high pitch.
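
To make the link between frequency and pitch concrete, here is a small sketch that converts a vibration frequency in hertz to the nearest note name. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, which is a common convention but not the only one, and the `nearest_note` function is just an illustrative helper.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    """Name of the equal-tempered note closest to freq_hz (assuming A4 = 440 Hz)."""
    # MIDI note 69 is A4; each semitone is a factor of 2 ** (1 / 12) in frequency.
    midi = round(69 + 12 * math.log2(freq_hz / 440.0))
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"

print(nearest_note(220))  # A3
print(nearest_note(440))  # A4
```

Doubling the frequency of vibration raises the pitch by exactly one octave, which is why 220 Hz and 440 Hz are both an A.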

By the way, if you’re curious what the vocal folds really look like, take a deep breath and google “laryngoscopy”.

Voicing and Phonation

Given this tidbit of information, you might assume that the vocal folds are always vibrating when we speak, but that is not at all the case.

Try this little experiment: place the back of your hand against your throat, and make a hissing sound: “ssssssss”.

Did you feel any vibration? Probably not.

Now say “see” with a very exaggerated hissing sound at the beginning: “sssssssee”.

You should be able to feel the difference there — your vocal folds are not vibrating during the “ssssss” at the beginning, but they are vibrating during the “ee”.

Okay — now try this again, except this time, say “zzzzzzzz”, as if you’re mimicking a bee buzzing.

Do you feel your vocal folds vibrating through the “zzzzzz” this time?

This is the difference between the [s] and [z] sounds: the vocal folds are not vibrating during a [s] sound, but they are during a [z] sound. [s] is what we call a voiceless sound, while [z] is what we call a voiced sound.

In fact, if you hold the back of your hand against your throat and you alternate between “ssssss” and “zzzzzz”, you’ll find that the two sounds are otherwise produced in an identical manner.

The only difference is that during “zzzzzz” the vocal folds are vibrating, but during “ssssss”, they are not. This distinction between voiced and voiceless speech sounds is known as voicing or phonation.

When we describe speech sounds, especially consonants, it’s important to know whether the sounds are voiced or voiceless.

Now, the vibration of the vocal folds is just one way in which the vocal tract can shape airflow to form speech sounds. In the next few videos in this series, we’ll look at what are perhaps the most critical parts of the vocal tract for our purposes, the oral and nasal cavities, and learn how the mouth and tongue help us to speak.

Articulatory Phonetics 101: The Vocal Tract

This is Part 1 of a series covering the basics of articulatory phonetics, the study of how humans physically produce speech sounds. For the full list of posts, see the Articulatory Phonetics 101 Index.

Have you ever wondered how your body physically produces speech? When you think of something to say, how does your body create the sounds that we recognise as words?

That’s what we’re going to look at in this video.

Breath Control

The physical production of a speech sound begins with an intake of breath.

What happens when you breathe in is that your diaphragm contracts. This increases the volume of your lungs, which reduces the air pressure in your lungs. As a result, air rushes into your lungs to occupy the low-pressure space there and equalise the air pressure.

As you release your breath, your diaphragm relaxes, reducing the volume of your lungs and increasing the air pressure, pushing air out of your lungs.
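
The relationship between volume and pressure described here is Boyle's law. As a rough sketch, assuming a fixed amount of air at constant temperature (a simplification, since air starts to flow as soon as the pressure changes), and with an illustrative 10% change in volume rather than a physiological measurement:

```latex
P_1 V_1 = P_2 V_2
\quad\Longrightarrow\quad
P_2 = P_1 \frac{V_1}{V_2}

% Illustrative numbers only: if the lungs expand so that V_2 = 1.1 V_1,
P_2 = \frac{P_1}{1.1} \approx 0.91\, P_1
```

A larger volume means a lower pressure, so air flows in; a smaller volume means a higher pressure, so air flows out.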

The air leaving your lungs passes through an array of organs that shape and alter the flow of the air, which we call the vocal tract.

The Vocal Tract

The air comes up through the trachea, and moves past or through the organs of the vocal tract in this order (more or less):

  • glottis
  • epiglottis
  • pharynx
  • uvula
  • velum
  • tongue (usually divided into four sections: root, body, blade, tip)
  • hard palate
  • alveolar ridge
  • teeth
  • lips

What happens when we speak is that we alter the shape of the vocal tract. All human speech is created by manipulating airflow through the vocal tract, using the parts of the vocal tract that are able to move (primarily the tongue, lower jaw and lips). We’ll look at these different parts of the vocal tract and these movements in more detail in future videos.