Articulatory Phonetics 101: Diphthongs

This is Part 11 of a series covering the basics of articulatory phonetics, the study of how humans physically produce speech sounds. For the full list of posts, see the Articulatory Phonetics 101 Index.

When we last left off, we had just covered the basics of vowel production. We’re almost — almost — done with the basics of articulatory phonetics, but we have one last thing to cover, real quick:

What’s the vowel in “hi”?

As we discussed in the vowel basics article, from the articulatory point of view a vowel is defined by the position of the highest point of the tongue: specifically, how high and how far back it is. When you say “hi”, though, your tongue moves from one position to another, changing the vowel quality as it goes. In this case, the vowel begins as /a/, a low central vowel, and ends as /ɪ/, a high front lax vowel. A vowel like this is a diphthong, a vowel with two (Greek di-) different vowel qualities (Greek phthongos, “tone”).

Phonologically, a diphthong is considered one vowel rather than two. If we want to make it clear that /haɪ/ contains a diphthong, we can draw a tie bar over the two symbols: /ha͡ɪ/. Alternatively, we can write the non-syllabic diacritic ◌̯ under /ɪ/ to show that /ɪ/ does not form a syllable of its own: /haɪ̯/. (If you want to be really particular, the two notations can be used to specify slightly different things, but we won’t get into that here.)

If you ever need to show that two adjacent vowels belong to separate syllables instead of forming a diphthong, you can insert a syllable break, written as a full stop (.), between them: /a.ɪ/.
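If you type transcriptions yourself, all three notations are ordinary Unicode characters. The tie bar is U+0361 COMBINING DOUBLE INVERTED BREVE, the non-syllabic mark is U+032F COMBINING INVERTED BREVE BELOW, and the syllable break is a plain full stop. Below is a minimal Python sketch that assumes plain Unicode strings. The helper names `tie`, `non_syllabic` and `hiatus` are hypothetical and were made up for this example.

```python
import unicodedata

TIE_BAR = "\u0361"       # COMBINING DOUBLE INVERTED BREVE, drawn over two symbols
NON_SYLLABIC = "\u032f"  # COMBINING INVERTED BREVE BELOW, marks a non-syllabic vowel
SYLLABLE_BREAK = "."     # the IPA syllable break is an ordinary full stop

def tie(first: str, second: str) -> str:
    """Hypothetical helper: join two vowels into one diphthong with a tie bar."""
    return first + TIE_BAR + second

def non_syllabic(first: str, second: str) -> str:
    """Hypothetical helper: mark the second vowel as non-syllabic."""
    return first + second + NON_SYLLABIC

def hiatus(first: str, second: str) -> str:
    """Hypothetical helper: keep two vowels in separate syllables."""
    return first + SYLLABLE_BREAK + second

for word in ("h" + tie("a", "ɪ"), "h" + non_syllabic("a", "ɪ"), hiatus("a", "ɪ")):
    # Show each transcription next to the Unicode names of its characters
    print(f"/{word}/", [unicodedata.name(ch) for ch in word])
```

Because the diacritics are combining characters, each one attaches to the character it follows. That is why the tie bar goes between the two vowels, and the non-syllabic mark goes after /ɪ/.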

Diphthongs of English

Depending on what variety of English you speak, you may have different numbers of diphthongs. General American has five diphthongs, and Received Pronunciation has eight diphthongs.

General American diphthongs:

  • /aɪ/ as in “hi”
  • /aʊ/ as in “how”
  • /ɔɪ/ as in “boy”
  • /eɪ/ as in “hay”
  • /oʊ/ as in “hoe” (the gardening implement — minds out of gutters, please)

Received Pronunciation (or the Queen’s English, or BBC English — take your pick) diphthongs:

  • /aɪ/ as in “hi”
  • /aʊ/ as in “how”
  • /ɔɪ/ as in “boy”
  • /eɪ/ as in “hay”
  • /əʊ/ as in “hoe” (General American /oʊ/)
  • /ɪə/ as in “here” (General American /ɪɹ/)
  • /ɛə/ as in “hair” (General American /ɛɹ/)
  • /aə/ as in “hire” (General American /aɪɹ/)

/aɪ/, /aʊ/ and /aə/ begin with the low central vowel /a/, which otherwise does not appear in English.

/eɪ/ and /oʊ/ are of particular note here, because in General American and Received Pronunciation, /e/ and /o/ always appear as diphthongs. In broad transcription, it’s up to you whether to write out the full diphthongs /eɪ/ and /oʊ/ or simply write /e/ and /o/.
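Since the choice is purely notational, you can switch between the two conventions mechanically. This is a small sketch that assumes General American symbols (it leaves RP /əʊ/ alone). The name `broad` is hypothetical.

```python
# Diphthongs that broad transcription may write as a single symbol
# (General American symbols assumed).
COLLAPSIBLE = {"eɪ": "e", "oʊ": "o"}

def broad(transcription: str, full_diphthongs: bool = True) -> str:
    """Hypothetical helper: optionally rewrite /eɪ/ and /oʊ/ as /e/ and /o/.

    Both spellings describe the same pronunciation; only the notation changes.
    """
    if full_diphthongs:
        return transcription
    for diphthong, simple in COLLAPSIBLE.items():
        transcription = transcription.replace(diphthong, simple)
    return transcription

for word in ("heɪ", "hoʊ", "haɪ"):
    # Print each word in both conventions; /aɪ/ is never collapsed
    print(broad(word), broad(word, full_diphthongs=False))
```

Only /eɪ/ and /oʊ/ are touched. /aɪ/, /aʊ/ and /ɔɪ/ have no simple-vowel spelling to fall back on.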


That’s all I’ve got for you today. I’ll have a round-up post next week that quickly revisits all the most important concepts in articulatory phonetics, like a cheat sheet. See you all next week!

Articulatory Phonetics 101: Vowel Basics

This is Part 10 of a series covering the basics of articulatory phonetics, the study of how humans physically produce speech sounds. For the full list of posts, see the Articulatory Phonetics 101 Index.

What’s the difference between vowels and consonants?

For now, I’m going to give you an answer that might sound quite lame or unsatisfactory. Remember that consonants are formed by creating an obstruction or constriction in the airflow in the vocal tract.

Well, vowels are produced with an open vocal tract. It’s open in the sense that there is no obstruction: no closure or turbulence anywhere in the vocal tract.

Yeah, I know what you’re thinking. What about approximants like /j/ and /w/? There’s no closure or turbulence there, either. Well, the answer is that approximants create more of a constriction in the vocal tract than vowels do. That’s it. Or at least that’s our working definition.

Describing vowels: height and backness

Okay, so now we’ve decided what vowels are. Next question: How do we distinguish one vowel from another? I mean, /i/ (as in “feet”), /ɑ/ (as in “fart”), /u/ (as in “food”) are all vowels, but they’re all different vowels. What makes one vowel different from another? Well — here, I’m going to give a technically incorrect but still very useful definition.

Say /ɑ/, “ahhhh”, like you’re at the doctor. Your vocal tract looks approximately like this (the red dot indicates the highest point of the tongue):

A grossly anatomically inaccurate drawing of /ɑ/

Okay, say /æ/ like the vowel in “ash”. Now, your vocal tract looks approximately like this:

A grossly anatomically inaccurate representation of /æ/

Got it? Now say /u/. This time, your vocal tract looks something like this:

A grossly anatomically inaccurate representation of /u/

And finally: say /i/. Now, this is what your vocal tract looks like:

A grossly anatomically inaccurate representation of /i/

We’re going to define vowels based on the highest point of the tongue. So, in /ɑ/, the tongue is low, and the highest point is pretty far back in the vocal tract. In /æ/, the highest point of the tongue is still low, but now it’s much further forward. In /u/, the highest point of the tongue is way up there, and it’s at the back of the vocal tract. And finally, in /i/, the highest point of the tongue is high up here and it’s far forward, almost touching the palate.

These four vowels mark the extremes of what we call the vowel quadrilateral:

Grossly anatomically inaccurate drawings make for nicer vowel quadrilaterals

Now, we have two axes here: vowel height and vowel backness. Vowel height is how high the highest point of the tongue is, and vowel backness is how far back the highest point of the tongue is.

The vowel quadrilateral, with its two axes: vowel height and vowel backness

(A quick but necessary aside for readers with a prior background in linguistics, who may disagree with the choice of low front vowel: Why /æ/ and not /a/ for low front, as the IPA chart suggests? See Ladefoged and Johnson, A Course in Phonetics, 6th Edition (Boston: Wadsworth Cengage, 2010): /æ/ is a little higher and fronter, /a/ is a little lower and backer.

The vowel quadrilateral is an abstraction. Which vowel best fits the low front position in our model? It seems to me that if you want to emphasise the frontness of the low front vowel, that spot should go to /æ/, and /a/ should be considered a low central vowel. If you want to emphasise the lowness of the low front vowel, then the low front vowel should be /a/, and /æ/ is raised (and fronted) relative to /a/.

As far as I can tell, /æ/ is closer to being fully low than /a/ is to being fully front, so it makes sense to me to put /æ/ in the low front position and /a/ in a low central position. Also, this is the way I was taught, so I’m sticking to it out of habit.)

Back vowels in English

Try this: say these vowels aloud.

/ɑ/ as in “father”, /ɔ/ as in “all”, /o/ as in “pole”, /ʊ/ as in “pull”, /u/ as in “pool”.

Do you feel the back of your tongue rising towards the top of your mouth? These are all back vowels, but they vary in height.

/ɑ/ is a low back vowel, /ɔ/ is a low-mid back vowel, /o/ is a high-mid back vowel, /ʊ/ is a high back lax vowel, /u/ is a high back tense vowel.

“Well, technically, that’s not the whole story…” — don’t yell at me, I’m trying to keep things simple.

You might also notice that your mouth went from mostly open to mostly closed. For this reason, we sometimes use “open” and “close” to refer to vowel height, but here we’ll stick with high and low.

We can now fill these vowels in on our vowel quadrilateral:

Vowel quadrilateral: the back vowels of English

Front vowels in English

Okay, now let’s look at front vowels. Say these vowels out loud:

/æ/ as in “sat”, /ɛ/ as in “set”, /e/ as in “cake”, /ɪ/ as in “sit”, /i/ as in “seat”.

It’s the same thing here — this time, you’re moving the body of the tongue higher in the vocal tract, and at the highest point the tongue almost — but not quite — touches the hard palate. Using the same set of distinctions we used for the back vowels, we get:

/æ/, the low front vowel, /ɛ/, the low-mid front vowel, /e/, the high-mid front vowel, /ɪ/, the high front lax vowel, /i/, the high front tense vowel.
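Putting the two series side by side shows that they mirror each other, one vowel per height on each side. Here is a minimal Python sketch that stores each English vowel from the two lists under its (height, backness) label and prints a rough text chart. The names `HEIGHTS`, `VOWELS` and `quadrilateral` are made up for this example. The grid is only schematic, a rectangle rather than the real trapezoid shape.

```python
# Height labels from lowest to highest, as used in the two lists above
HEIGHTS = ["low", "low-mid", "high-mid", "high lax", "high tense"]

# Each English vowel keyed by (height, backness)
VOWELS = {
    ("low", "front"): "æ",        ("low", "back"): "ɑ",
    ("low-mid", "front"): "ɛ",    ("low-mid", "back"): "ɔ",
    ("high-mid", "front"): "e",   ("high-mid", "back"): "o",
    ("high lax", "front"): "ɪ",   ("high lax", "back"): "ʊ",
    ("high tense", "front"): "i", ("high tense", "back"): "u",
}

def quadrilateral() -> list[str]:
    """Return chart rows, highest first, with front vowels on the left."""
    rows = []
    for height in reversed(HEIGHTS):
        front = VOWELS[(height, "front")]
        back = VOWELS[(height, "back")]
        rows.append(f"{height:<10} {front}   {back}")
    return rows

print("\n".join(quadrilateral()))
```

The top row is the high tense pair /i/ and /u/, and the bottom row is the low pair /æ/ and /ɑ/, which matches the four corners of the quadrilateral.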

Our vowel quadrilateral now looks like this:

Mid and central vowels in English

These aren’t all the vowels, though: we also have mid vowels, which are neither high nor low, and central vowels, which are neither front nor back.

Consider the word “fun”: that vowel isn’t as far back or as low as the vowel in “father”. It’s a low-mid central vowel, and it’s written /ʌ/. (Note: This corresponds to what I was taught, not to the IPA, which characterises this vowel as a low-mid back unrounded vowel. There’s a case to be made for either description, but /ʌ/ makes more sense to me analysed as a central vowel.)

Then, there’s the vowel in “bird”. Depending on how you speak, this word is either /bɜd/ (if the vowel has no r-like sound) or /bɝd/ (if the vowel has an r-like sound). The little squiggle in /ɝ/ indicates that it’s an r-coloured, or rhotacised, vowel. For our purposes, we’re going to consider /ɜ/ a mid central vowel.

Okay — now, let’s take a look at the final vowel in “father”.

It sounds the same as the vowel in “bird”, but it occurs in an unstressed syllable. For reasons we’re not going to go into in great detail right now, this unstressed vowel, also a mid central vowel, has a different symbol: /ə/, or /ɚ/, depending on whether your vowel is rhotacised or not.

Now, it’s worth pointing out that many, maybe most, native English speakers won’t use all four of these central vowels. Received Pronunciation, or so-called “BBC English”, doesn’t have /ɝ/ and /ɚ/, and General American doesn’t have /ɜ/. But no matter what variety of English you speak, you will have the vowel /ə/, the single most common vowel in the English language. This mid central vowel is better known as the schwa, and it’s the vowel produced when the vocal tract is in a completely relaxed position.

Well, what’s the difference between /ɜ/ and /ə/, or between /ɝ/ and /ɚ/? It’s simple: if the vowel is stressed, we call it /ɜ/ or /ɝ/, and if it’s unstressed, we call it /ə/ or /ɚ/ (depending on the word and on your dialect, of course). Now, the reasons for this have more to do with phonology than with phonetics per se, so we’ll leave it at that.
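The choice between the four symbols comes down to two yes/no questions, so it can be written as a tiny decision function. This is a minimal sketch that assumes you already know whether the vowel is stressed and whether your accent r-colours it. `central_vowel_symbol` is a hypothetical helper and does not come from any library.

```python
def central_vowel_symbol(stressed: bool, rhotic: bool) -> str:
    """Hypothetical helper: pick the IPA symbol for a mid central vowel.

    Stressed vowels get /ɜ/ or /ɝ/; unstressed ones get /ə/ or /ɚ/.
    `rhotic` is True when the vowel is r-coloured (rhotacised).
    """
    if stressed:
        return "ɝ" if rhotic else "ɜ"
    return "ɚ" if rhotic else "ə"

# The vowel of "bird" (stressed) and the final vowel of "father" (unstressed),
# in an r-colouring accent and a non-r-colouring one
for label, rhotic in (("rhotic", True), ("non-rhotic", False)):
    print(label, central_vowel_symbol(True, rhotic), central_vowel_symbol(False, rhotic))
```

In General American, the r-coloured row applies. In Received Pronunciation, the plain row applies.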

Vowel roundedness

We’re almost — almost — done with vowels, but there’s one more thing we need to talk about. You might have noticed that when you said /ɑ ɔ o ʊ u/, your lips ended up sort of rounded and puckered, while the front vowels /æ ɛ e ɪ i/ didn’t have the same effect. Well, in most languages, back vowels tend to be rounded, while front vowels tend to be unrounded, but that is only a tendency.

If you speak French or Dutch or German or Mandarin, you might be wondering about the vowel /y/, which is written <u> in French and Dutch, <ü> in German, and either <u> or <ü> in the Hanyu Pinyin romanisation of Mandarin. That is a high front rounded vowel: if you say /i/ and round your lips, you get /y/. This vowel exists in tense and lax forms as well, written /y/ and /ʏ/ in IPA respectively. When you’re writing IPA symbols on a vowel quadrilateral, the symbol for the unrounded vowel always goes on the left, and the symbol for the rounded vowel on the right.

Similarly, there’s no reason the back vowels need to be rounded. You can take /u/, and without moving the tongue, relax the lips, so that they become unrounded, and that’s the high back unrounded vowel /ɯ/. You can take /o/, and again without moving the tongue, unround the lips to get /ɤ/, which is a vowel that exists in Mandarin: that’s the vowel in 喝 hē, “to drink”.

This gives us enough information to fully fill out our vowel quadrilateral:

Vowel quadrilateral: the full chart, with unrounded and rounded pairs
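The rounding pairs described above share a tongue position and differ only in the lips. One way to model that is to store each chart cell as an (unrounded, rounded) pair, which mirrors the left/right convention. This is a minimal sketch that covers only the pairs mentioned in this post. `CELLS`, `toggle_rounding` and `cell` are hypothetical names.

```python
# Each cell holds an (unrounded, rounded) pair: unrounded on the left,
# rounded on the right, as on the chart. Only pairs from this post are listed.
CELLS = {
    ("high tense", "front"): ("i", "y"),
    ("high lax", "front"): ("ɪ", "ʏ"),
    ("high tense", "back"): ("ɯ", "u"),
    ("high-mid", "back"): ("ɤ", "o"),
}

def toggle_rounding(symbol: str) -> str:
    """Return the vowel with the same tongue position but opposite lip rounding."""
    for unrounded, rounded in CELLS.values():
        if symbol == unrounded:
            return rounded
        if symbol == rounded:
            return unrounded
    raise KeyError(f"no rounding counterpart listed for /{symbol}/")

def cell(height: str, backness: str) -> str:
    """Format one chart cell as 'unrounded rounded'."""
    unrounded, rounded = CELLS[(height, backness)]
    return f"{unrounded} {rounded}"

print(toggle_rounding("i"))       # rounding /i/ gives /y/
print(toggle_rounding("u"))       # unrounding /u/ gives /ɯ/
print(cell("high-mid", "back"))   # the Mandarin /ɤ/ sits left of /o/
```

Keying the chart by tongue position and storing rounding inside the cell reflects the point of this section: rounding is independent of height and backness.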

Vowel nasality

Okay, I lied: let’s squeeze in one more thing. If you’ve ever tried to learn French or Portuguese, you might have heard of nasal vowels. In French, <bien> isn’t pronounced /bjen/ as it is in Spanish; instead, it’s pronounced /bjɛ̃/, with a nasal vowel and no /n/ at the end of the syllable.

Well — in a nasal vowel, the velum is lowered, which allows air to escape through both the nose and the mouth, and that’s what makes nasal vowels sound different.

Nasal vowels aren’t indicated on the vowel quadrilateral. Instead, to indicate a nasal vowel, we draw a tilde ~ above the vowel, as in /ɛ̃/.
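In Unicode, the nasalisation tilde is U+0303 COMBINING TILDE, placed after the vowel it sits on. This is a minimal Python sketch; `nasalise` is a hypothetical helper. It uses the standard library's `unicodedata.normalize` with NFC, so that a precomposed character is used where Unicode has one.

```python
import unicodedata

TILDE = "\u0303"  # COMBINING TILDE, the IPA nasalisation diacritic

def nasalise(vowel: str) -> str:
    """Hypothetical helper: add a nasalisation tilde to a vowel symbol.

    NFC normalisation merges vowel + tilde into one precomposed character
    where Unicode has one (o + tilde -> õ); otherwise the tilde stays separate.
    """
    return unicodedata.normalize("NFC", vowel + TILDE)

print(nasalise("o"), len(nasalise("o")))   # precomposed: 1 character
print(nasalise("ɛ"), len(nasalise("ɛ")))   # no precomposed form: 2 characters
print("/bj" + nasalise("ɛ") + "/")         # French <bien>
```

Both forms display the same. The difference matters only when comparing or searching strings, which is why normalising first is a reasonable default.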


Whew. That’s all for now — if you made it through this post, you now know the basics of how vowels are articulated. In the next post, we’ll look at vowel sounds like “eye” /aɪ/, “ow” /aʊ/, and “oi” /ɔɪ/, and how they’re different from the ones we’ve seen so far.

Articulatory Phonetics 101: Airstream Mechanism

This is Part 9 of a series covering the basics of articulatory phonetics, the study of how humans physically produce speech sounds. For the full list of posts, see the Articulatory Phonetics 101 Index.

There will eventually be a video for this post, once I figure out how to produce the videos in a way that isn’t terrible.

Cast your mind back to the first post in this series, where I said…

The physical production of a speech sound begins with an intake of breath.

Well… I lied.

You may have heard of the click consonants of languages like Xhosa, or maybe the ejective consonants of Georgian, or the implosive consonants of Sindhi.

With clicks, ejectives and implosives, the air flows differently through the vocal tract. These consonants have a different airstream mechanism.

With all the consonants we’ve been studying, all the consonants of English, the airflow starts with increased air pressure in the lungs forcing air outwards through the vocal tract. This is the pulmonic egressive airstream mechanism: it’s the diaphragm and the lungs that drive the movement of the airstream, and the air moves out of the vocal tract. Hence: pulmonic egressive.

Now, let’s look at click consonants. Listen to this sound, and try to copy it.

Got it? This “tsk” sound is probably a sound you’ve made before, but you might not have known that it’s a speech sound in other languages. (I’m assuming you’re primarily an English speaker, of course.) This is the dental click. Let’s examine what happens here.

Now, when you start to make the sound, two things happen simultaneously. The back of your tongue rises up to touch the velum, and at the same time, the tongue tip rises up to touch the back of the teeth. We’ve now formed a closure in not one, but two places: one at the velum, one at the teeth. Now we have a little pocket of air between the palate and the tongue.

Next, we lower the tongue just slightly, without releasing the two closures at the velum and the teeth, so the air pressure in the air pocket drops.

Then, we release the dental closure, and air rushes into the mouth to fill the low-pressure space. That’s what happens when you produce a click. Because this airstream mechanism is initiated by the tongue, it is a lingual mechanism (from the Latin word “lingua”, meaning “tongue”), and because air is drawn into the vocal tract rather than being forced out of it, we call it an ingressive airstream. The airstream mechanism of click consonants, therefore, is called the lingual ingressive airstream mechanism.

Clicks don’t have to be dental, either: they can be bilabial, alveolar, alveolar lateral, or palatal as well.

What about ejectives like /k’/?

Well, try it — can you copy the sound you’re hearing?

What’s happening here is that first, you close your glottis to form a glottal stop, and at the same time you raise the back of the tongue to touch the velum, forming a velar closure. Now we have a pocket of air trapped between the two closures, in the pharynx.

You see where this is going!

The next step is to raise the glottis. This increases the air pressure in the air pocket. Then, you release the velar closure, and the compressed air is released out of the vocal tract — pretty forcefully! The airstream is initiated at the glottis and pushes air out of the vocal tract, so we call it the glottalic egressive airstream mechanism. To indicate that a consonant is ejective, we mark it with an apostrophe, as in /k’/.
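One typing detail: the IPA ejective mark is U+02BC MODIFIER LETTER APOSTROPHE. It looks almost identical to the typographic apostrophe (U+2019 RIGHT SINGLE QUOTATION MARK) but is a different character. This is a minimal sketch; `ejective` is a hypothetical helper.

```python
import unicodedata

EJECTIVE_MARK = "\u02bc"  # MODIFIER LETTER APOSTROPHE, the IPA ejective diacritic

def ejective(consonant: str) -> str:
    """Hypothetical helper: mark a consonant as ejective (glottalic egressive)."""
    return consonant + EJECTIVE_MARK

for plain in ("p", "t", "k"):
    marked = ejective(plain)
    # Show the marked consonant and the name of the character that marks it
    print(marked, unicodedata.name(marked[-1]))

# The look-alike typographic apostrophe is a different code point
print(unicodedata.name("\u2019"))
```

Using the modifier letter makes the mark count as part of the symbol rather than as punctuation. That helps if you ever search or sort transcriptions.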

One more — implosives like /ɓ/.

If you don’t speak a language that has implosives, they might not seem very remarkable to you — like a slightly different-sounding stop. However, about 13% of the world’s languages have contrastive implosive consonants (source: Wikipedia), so we definitely need to examine them closely.

The articulation of an implosive consonant starts like that of an ejective: there’s a closure at the glottis and at the point of articulation, in this case the lips. Now we have the same air pocket as we did with ejectives. But this time, instead of raising the glottis, we lower it. This reduces the air pressure in the oral cavity, and then, when we release the bilabial closure (which is how linguists say “opening the mouth”), air is pulled into the vocal tract. This is, you guessed it, the glottalic ingressive airstream mechanism. Only a limited number of consonants can be produced this way, and the ones with an IPA symbol are the voiced bilabial /ɓ/, dental /ɗ̪/, alveolar /ɗ/, palatal /ʄ/, velar /ɠ/ and uvular /ʛ/ implosives.

These are the four airstream mechanisms found in the world’s languages. We do use other airstream mechanisms to communicate (a gasp, for example, uses a pulmonic ingressive airstream), but crucially, these four are the ones we use to produce speech sounds.
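Each mechanism name in this post combines two choices: what initiates the airflow (lungs, glottis or tongue) and which way the air moves (out or in). This is a minimal Python sketch that crosses the two and looks up which combinations the post lists as speech mechanisms. The enum and function names are hypothetical and were made up for this example.

```python
from enum import Enum

class Initiator(Enum):
    PULMONIC = "pulmonic"    # lungs and diaphragm drive the air
    GLOTTALIC = "glottalic"  # the closed glottis is raised or lowered
    LINGUAL = "lingual"      # the tongue lowers between two closures

class Direction(Enum):
    EGRESSIVE = "egressive"    # air is pushed out of the vocal tract
    INGRESSIVE = "ingressive"  # air is drawn into the vocal tract

# The four mechanisms used for speech sounds, as described in this post
SPEECH_MECHANISMS = {
    (Initiator.PULMONIC, Direction.EGRESSIVE): "all the consonants of English",
    (Initiator.LINGUAL, Direction.INGRESSIVE): "clicks",
    (Initiator.GLOTTALIC, Direction.EGRESSIVE): "ejectives, e.g. /kʼ/",
    (Initiator.GLOTTALIC, Direction.INGRESSIVE): "implosives, e.g. /ɓ/",
}

def classify(initiator: Initiator, direction: Direction) -> str:
    """Name an airstream mechanism and say which speech sounds use it, if any."""
    name = f"{initiator.value} {direction.value}"
    sounds = SPEECH_MECHANISMS.get((initiator, direction))
    return f"{name}: {sounds}" if sounds else f"{name}: not one of the four speech mechanisms"

for initiator in Initiator:
    for direction in Direction:
        print(classify(initiator, direction))
```

Six combinations are possible, and only four appear in the table. Pulmonic ingressive (the gasp) is one of the two left over.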

And there you have it — a brief introduction to airstream mechanisms. So, we’re finally done with the articulation of consonants, and in the next post, we’ll be looking at vowels. Thanks for reading! See you next time.