Diphthongs in English: Double Vowel Sounds

Say the word "coin" slowly and pay attention to what your tongue does. It doesn't sit in one place — it starts in one position and slides to another, all inside a single syllable. That sliding sound is a diphthong. The term stitches together the Greek roots di ("two") and phthongos ("sound"), and the label fits: you're hearing one vowel melting into a second without a break. English leans on these gliding vowels more than many languages do, and getting them right is a big part of what separates a natural accent from a flat, textbook pronunciation.
Defining the Diphthong
Phonetically, a diphthong is one vowel phoneme made of two parts: a starting point (the nucleus) that carries most of the weight and loudness, and a weaker glide that finishes the sound. Both elements live inside the same syllable. The word "ride," for instance, is a single beat containing the diphthong /aɪ/, even though your tongue travels from a low, open position up toward the front.
Watch out for a common confusion: two vowels that happen to sit next to each other across a syllable boundary (a phenomenon called hiatus) are not a diphthong. "Reality" and "cooperate" show this: in each word the adjacent vowels belong to separate syllables, and each vowel forms its own syllable peak. A true diphthong behaves as one phonological unit from start to finish.
Most descriptions of standard English count eight diphthongs, though the tally wobbles depending on the dialect under study and the framework the analyst prefers. They split into two natural families: closing diphthongs, which climb toward a higher tongue position, and centering diphthongs, which drift toward the relaxed schwa in the middle of the mouth.
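The two-part structure and the two families can be sketched in a few lines of Python. This is only an illustration: the class and function names (`Diphthong`, `by_family`) are invented for the example, and the inventory is the RP-style set of eight described in the sections that follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diphthong:
    nucleus: str  # starting vowel, the more prominent element
    glide: str    # weaker target the tongue moves toward

    @property
    def symbol(self) -> str:
        return f"/{self.nucleus}{self.glide}/"

    @property
    def family(self) -> str:
        # A glide toward schwa is centering; a glide toward high /ɪ/ or /ʊ/ is closing.
        return "centering" if self.glide == "ə" else "closing"


# The eight diphthongs covered in this article (an RP-style inventory).
INVENTORY = [
    Diphthong("e", "ɪ"), Diphthong("a", "ɪ"), Diphthong("ɔ", "ɪ"),
    Diphthong("a", "ʊ"), Diphthong("o", "ʊ"),
    Diphthong("ɪ", "ə"), Diphthong("e", "ə"), Diphthong("ʊ", "ə"),
]


def by_family(inventory):
    """Group diphthong symbols under 'closing' and 'centering'."""
    groups = {"closing": [], "centering": []}
    for d in inventory:
        groups[d.family].append(d.symbol)
    return groups


if __name__ == "__main__":
    for family, symbols in by_family(INVENTORY).items():
        print(family, " ".join(symbols))
```

Classifying by the glide alone is a simplification that works for this inventory; other dialects or analyses may need a finer rule.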
Diphthongs Set Against Pure Vowels
To understand diphthongs, put them next to their simpler cousins. A monophthong is a steady vowel: the tongue sets up in one place and stays there for the duration of the sound. A diphthong refuses to stay still. Compare:
Monophthong: /ɛ/ in "bed" (the tongue holds one position from start to finish)
Diphthong: /aʊ/ in "cloud" (the tongue starts low and slides up and back toward /ʊ/)
In reality, the line is fuzzier than textbooks suggest. Plenty of supposedly "pure" English vowels carry a faint glide. The vowel in "see" often drifts as [ɪi], and the vowel in "throw" regularly surfaces as [oʊ] or [əʊ]. Whether you call those sounds monophthongs or diphthongs depends on how much movement you can hear and which dialect you're measuring.
For learners, the real sticking point is producing enough glide. Speakers coming from languages with a smaller diphthong inventory tend to flatten these sounds into single vowels, and the flattening cuts straight into how intelligible and idiomatic their English sounds.
The Closing Group
Closing diphthongs start somewhere lower in the mouth and finish with the tongue raised. English has five of them, and they split neatly by where the glide lands: three rise toward /ɪ/ and two toward /ʊ/.
/eɪ/ — as in "rain," "make," "stay"
This one opens at a mid-front position and rises toward /ɪ/. English teachers often call it "long A." It is everywhere in the language, and the exact starting height drifts from dialect to dialect — some speakers begin closer to /e/, others closer to /ɛ/.
Common spellings: a-e (bake), ai (paid), ay (say), ea (steak), ei (vein), ey (obey), eigh (sleigh)
/aɪ/ — as in "light," "kind," "buy"
Starting with the mouth wide open in a low, central position, the tongue climbs toward /ɪ/. It is a broad, sweeping move from open to close, and it covers about as much acoustic territory as any diphthong in English.
Common spellings: i-e (bike), y (shy), igh (sigh), ie (lie), uy (guy), eye (eye)
/ɔɪ/ — as in "joy," "spoil," "employ"
The tongue sets up at a mid-back position with rounded lips, then slides forward and up toward /ɪ/. Two things change at once here — the height and the front-back location — which makes the sound particularly distinctive.
Common spellings: oi (point), oy (enjoy)
/aʊ/ — as in "loud," "brown," "cloud"
Beginning from a low, slightly backed starting point, the tongue rises and retreats toward /ʊ/ while the lips gradually round. It mirrors /aɪ/: the two start in nearly the same open position, but /aɪ/ lands at the front of the mouth and /aʊ/ at the back.
Common spellings: ou (shout), ow (brown)
/oʊ/ (or /əʊ/) — as in "row," "boat," "snow"
This diphthong starts mid and rises toward /ʊ/. British RP typically begins from a more centralized schwa-like position (/əʊ/), while American English starts further back and more rounded (/oʊ/). Either way, it matches what schools teach as "long O."
Common spellings: o-e (rope), oa (coat), ow (below), o (go), ough (though)
The Centering Group
Centering diphthongs start from a peripheral vowel position and slide toward the schwa /ə/ parked in the middle of the mouth. You hear them most clearly in non-rhotic accents such as British RP, where they took the place of older vowel-plus-/r/ sequences once the /r/ dropped.
/ɪə/ — as in "beard," "cheer," "idea"
This diphthong glides from /ɪ/ toward /ə/. Most words carrying it historically had a vowel followed by /r/ ("idea" is an exception), and most rhotic accents (General American included) still pronounce that /r/ instead of producing a centering glide.
Common spellings: ear (clear), eer (career), ere (severe), ier (tier)
/eə/ — as in "chair," "where," "stare"
A glide from /e/ to /ə/. Many British speakers have flattened this into a long monophthong /ɛː/, while American speakers use /ɛr/ instead, reflecting the rhotic heritage.
Common spellings: air (hair), are (share), ear (pear), ere (there), eir (their)
/ʊə/ — as in "moor," "tourist," "jury"
Moving from /ʊ/ to /ə/, this is the scarcest of all the English diphthongs and it is losing ground. In many modern accents it has merged into /ɔː/, so words like "tour" can end up rhyming with "thaw."
Common spellings: ure (insure), oor (moor), our (tourist)
How Spelling Signals Diphthongs
English orthography gives you signposts for diphthong pronunciation, though the language is generous with exceptions.
| Diphthong | Common Spellings | Examples |
|---|---|---|
| /eɪ/ | a-e, ai, ay, ea, ei, ey, eigh | bake, paid, say, steak, vein, obey, sleigh |
| /aɪ/ | i-e, y, igh, ie, uy, eye | bike, shy, sigh, lie, guy, eye |
| /ɔɪ/ | oi, oy | point, enjoy |
| /aʊ/ | ou, ow | shout, brown |
| /oʊ/ | o-e, oa, ow, o, ough | rope, coat, below, go, though |
The centering diphthongs show up in spellings that pair a vowel with r, a fossil of the pronunciation that existed before non-rhotic accents started dropping the /r/.
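The spelling signposts can be treated as a rough lookup from letters to candidate sounds. The sketch below uses a subset of the patterns from the table, rewritten as regular expressions. The function name `candidate_diphthongs` and the exact regexes are assumptions made for illustration. In particular, the split spellings such as a-e are approximated as "vowel, one consonant, final e," and y is only counted at the end of a word after a consonant.

```python
import re

# A subset of the article's spelling patterns, expressed as regexes.
# The regex forms are an approximation chosen for this sketch.
SPELLINGS = {
    "eɪ": [r"a[^aeiou]e$", "ai", "ay", "ea", "ey", "eigh"],
    "aɪ": [r"i[^aeiou]e$", r"(?<![aeiou])y$", "igh", "ie", "uy"],
    "ɔɪ": ["oi", "oy"],
    "aʊ": ["ou", "ow"],
    "oʊ": [r"o[^aeiou]e$", "oa", "ow", r"o$"],
}


def candidate_diphthongs(word: str) -> list[str]:
    """Return every diphthong whose spelling pattern appears in the word."""
    word = word.lower()
    return [
        symbol
        for symbol, patterns in SPELLINGS.items()
        if any(re.search(p, word) for p in patterns)
    ]


if __name__ == "__main__":
    for w in ["bake", "shy", "point", "coat", "brown"]:
        print(w, candidate_diphthongs(w))
```

A word like "brown" comes back with two candidates, /aʊ/ and /oʊ/, because ow spells both. The lookup narrows the options, but it over-generates, and only a dictionary or a listener can settle which candidate is right.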
Regional Accents and Vowel Glides
Diphthongs swing wildly from one variety of English to another, which is exactly why they make such reliable markers of where a speaker is from.
North America Compared to Britain
The cleanest contrast shows up in the centering diphthongs. American English holds onto the /r/, producing vowel-plus-/r/ combinations where British RP glides toward /ə/. The /oʊ/ diphthong also parts ways: American speakers start it further back and with rounder lips, while RP speakers begin from a more central [əʊ].
The American South
A famous feature of Southern U.S. English is the flattening of /aɪ/ into [aː], especially before voiced consonants. In that accent, "ride" lands much closer to "rahd." It is one of the most immediately recognizable markers of a Southern voice.
Down Under
Australian English pulls its diphthongs around noticeably. /eɪ/ starts lower, sometimes approaching [aɪ], so an outsider can mishear "day" as "die." The starting point of /aɪ/ also moves further back, toward [ɑɪ], which keeps the two sounds apart and gives the accent its distinctive shape.
London Voices
In Cockney and the Estuary varieties that spread outward from London, the diphthongs swap slots in a systematic rearrangement: /eɪ/ moves to [aɪ], /aɪ/ becomes [ɒɪ], and /aʊ/ shifts to [æə]. The shifts chain together, reshuffling the whole system at once.
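The "all at once" part matters, and a small Python sketch makes the point. The mapping below copies the three shifts just described; the function names and the sample transcriptions are assumptions for illustration. Applying the shifts in a single pass keeps each word's vowel distinct, while applying them one after another lets a shifted vowel get shifted again.

```python
import re

# Reference diphthong -> shifted realization, as described above.
CHAIN_SHIFT = {"eɪ": "aɪ", "aɪ": "ɒɪ", "aʊ": "æə"}

_PATTERN = re.compile("|".join(map(re.escape, CHAIN_SHIFT)))


def apply_shift(transcription: str) -> str:
    """Replace each diphthong exactly once, in a single pass."""
    # The output of one shift (eɪ -> aɪ) is never fed into the next (aɪ -> ɒɪ).
    return _PATTERN.sub(lambda m: CHAIN_SHIFT[m.group(0)], transcription)


def apply_sequentially(transcription: str) -> str:
    """Naive version for contrast: chained replacements let shifts cascade."""
    for old, new in CHAIN_SHIFT.items():
        transcription = transcription.replace(old, new)
    return transcription


if __name__ == "__main__":
    print(apply_shift("deɪ"))         # "day" moves into the old "die" slot
    print(apply_shift("taɪm"))        # "time" moves on to a new slot
    print(apply_sequentially("deɪ"))  # cascades one step too far
```

With the single-pass version, "day" and "die" stay different from each other even though both have moved, which is what lets the shifted system keep all its contrasts.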
Mistakes Learners Make
Killing the Glide
Easily the most frequent problem: flattening the diphthong into a single pure vowel. Pronouncing "cake" as [kek] instead of [keɪk], or "boat" as [bot] instead of [boʊt], is an instant giveaway of a non-native accent.
Splitting Into Two Syllables
The opposite error also happens. Some learners separate the two parts so firmly that a single syllable turns into two. "Toy" should glide as one beat, not break into "to-ee."
Missing the Target Positions
Even a produced glide can be off if it starts or ends in the wrong spot. A diphthong has a specific path, and both endpoints need to land near the right vowel position for the sound to register correctly.
Drills to Train Your Mouth
Exercise 1: Name That Diphthong
Read each word aloud and identify the diphthong it contains: rope, kind, toy, loud, weight, clear, boy, know, where, gown, side, chair.
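If you want to check your answers, here is a small self-scoring sketch in Python. The answer key follows the spelling patterns and the RP-style symbols used in this article, so rhotic speakers will hear a vowel plus /r/ in "clear," "where," and "chair" instead. The names `ANSWER_KEY` and `score` are invented for the example.

```python
# Answers use this article's symbols; other dialects may transcribe differently.
ANSWER_KEY = {
    "rope": "oʊ", "kind": "aɪ", "toy": "ɔɪ", "loud": "aʊ",
    "weight": "eɪ", "clear": "ɪə", "boy": "ɔɪ", "know": "oʊ",
    "where": "eə", "gown": "aʊ", "side": "aɪ", "chair": "eə",
}


def score(guesses: dict[str, str]) -> tuple[int, list[str]]:
    """Return the number of correct guesses and the list of missed words."""
    missed = [w for w, answer in ANSWER_KEY.items() if guesses.get(w) != answer]
    return len(ANSWER_KEY) - len(missed), missed


if __name__ == "__main__":
    correct, missed = score({"rope": "oʊ", "kind": "eɪ", "toy": "ɔɪ"})
    print(f"{correct} correct; review: {', '.join(missed)}")
```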
Exercise 2: Minimal Pairs
Work through these contrasts, where a single vowel change flips one word into another:
cot /ɒ/ vs. coat /oʊ/ (RP /əʊ/)
bed /ɛ/ vs. bayed /eɪ/
sit /ɪ/ vs. site /aɪ/
boil /ɔɪ/ vs. bile /aɪ/
Exercise 3: Stretch the Glide
Pick one diphthong at a time and deliberately draw out the glide. Lock in the opening vowel, hold it for a beat, then slide to the ending position in slow motion. Gradually speed the transition up until it feels fluid. The repetition builds the tongue memory that native pronunciation depends on.
Think of diphthongs as the moving parts of the English vowel engine. They set the melody, give words their characteristic shape, and carry a big chunk of what makes an accent sound native. Get them gliding in the right direction, to the right endpoint, and your spoken English moves from merely accurate to genuinely fluent.