
Connected Speech: What Happens During Ordinary, Spontaneous Speech?

“A word is not just the sum of its individual sounds; just as connected speech is not just the sum of its individual words.”

(Underhill, 1994, p. 58)

Why do so many EFL learners have difficulty with listening comprehension? “English people speak so fast” is a very common complaint I hear from students. The frustration learners experience when trying to comprehend what native speakers are saying, whether in a face-to-face social situation or on television, in films, and in music, often affects even learners at a fairly advanced level. Understandably, not being able to understand what a native speaker is saying can be very demotivating and demoralising, even for the most dedicated EFL learners. Imagine a learner being asked “do you want a cup of tea?” as /dʒəˈwɒnə kʌpə tiː/ instead of the very carefully articulated /duː juː wɒnt eɪ kʌp ɒv tiː/. This is a real problem that an EFL learner may face because a fundamental characteristic of speech is that it is not a one-at-a-time series of sounds (Abbott, 1986). Speech is, in fact, a continuous stream of sounds without clear-cut boundaries between words.

Arguably, speech would be much easier for EFL learners to understand if it were spoken with a clear gap between every word. The difference between what language learners see printed on a page and what native speakers of English actually say in everyday natural speech is significant. Speech is a continuous flow of sounds; deciding where one sound ends and the next one begins is almost impossible, particularly for language learners. “We do not articulate sounds in isolation but connect them in strings” (Dalton and Seidlhofer, 1994, p. 22). This continuous string of sounds is called connected speech and is what occurs in natural spoken English, i.e., rapid colloquial speech. It is what makes a speaker sound native.

The purpose of this article is to focus on what connected speech is and why it is important for EFL teachers to spend time highlighting its importance and use. I shall not be focusing on how connected speech should be taught in the classroom, as this will be the focus of a future article.

Note that the examples and rules I have given in this article apply specifically to Standard British English. However, most of what is written will also apply to other varieties of English.

What is Connected Speech?

A large proportion of what is said by a native speaker of English is distorted. By distorted, I mean that words in a stream of speech are altered somewhat from their citation form. Because natural speech does not consist of discrete, separate words, an EFL learner is very likely to hear many unfamiliar sounds when listening to a native speaker (Maxwell, n.d.). Native speakers link and simplify words so that spoken language flows more smoothly, more rhythmically, and more efficiently. Basically, the more casual or informal the speech, the more the citation form of words will change (Alameen and Levis, 2015). This means that individual word boundaries will not be fully realised. Consequently, the sounds most affected in a stream of speech are those at the ends and beginnings of words. This phenomenon occurs because, as the final sound in a word is being completed, the articulators (or speech organs, i.e., tongue, lips, teeth, alveolar ridge, hard palate, velum, uvula, and glottis; see Ladefoged, 2006) are already anticipating the initial sound of the immediately following word: “individual target articulations are nearly always affected by the articulation of adjacent segments. There is often, therefore, a considerable overlap of articulatory activities in connected speech” (Williamson, 2015, p. 4). For example, the phrase that pen /ðæt pɛn/ sounds more like /ðæp pʰɛn/. Connected speech could be described as a continuous stream of transitions and approximations (Dalton and Seidlhofer, 1994).

Native speakers have little trouble understanding each other. This is because, when listening, they have a number of strategies for dealing with indistinct utterances caused by connected speech. Native speakers take the context into account and assume they are hearing words with which they are familiar in that context. They are able to rely on an immense amount of prior knowledge in order to decipher the messages they hear (Anderson & Lynch, 1988).

EFL learners, on the other hand, are very rarely able to predict what lexical items will be used in any given context. Learners tend to depend almost exclusively on the sounds that they hear. Learners, in general, have little awareness of how words are linked together in natural speech (Temperly, 1987). Arguably, it is the learner’s prior knowledge of citation forms that is responsible for hampering their ability to comprehend what native speakers are saying during natural speech (Cauldwell, 2002).

Teachers and students of English may view connected speech as laziness or sloppiness. This should not be the case, because it is perfectly normal (Celce-Murcia et al., 2010). As Alameen and Levis point out: “even in formal situations, it [connected speech] is completely acceptable, natural and an essential part of speech” (2015, p. 3).

The processes that make up connected speech are assimilation, elision, weakening, contractions, and liaison.

What Happens When Phonemes Meet In Connected Speech?


Assimilation describes how sounds alter each other when they come into contact across word boundaries, i.e., assimilation affects the edges of words. It is a phonological process where a phoneme changes to resemble its neighbours more closely (McCarthy and Smith, 2003). To be more precise, one or more preceding consonants become more similar to a subsequent sound, i.e., the changing sound anticipates the following sound in some manner (see Weisser, 2005). For example, light blue /laɪt bluː/ becomes /laɪp bluː/. This example shows how assimilation is anticipatory in the sense that it goes backwards from a sound to the preceding sound (Roach, 2000): the /b/ in blue influences the previous sound, /t/, in light.

If we take another example, in bed /ɪn bɛd/, we can see that [n] (alveolar) and [b] (bilabial) have different places of articulation; consequently, it is quite challenging to say /ɪn bɛd/ quickly. What tends to happen is that a native speaker produces /ɪm bɛd/ in rapid speech, because [m] and [b] are both bilabial consonants, i.e., they share the same place of articulation (both lips close together).

Further examples of assimilation involve the alveolar consonants /t/, /d/, /n/ at the end of a word, which often assimilate to the phoneme at the beginning of the next word, such as /p/, /b/, /m/, /k/, /g/:

that boy /ðæt bɔɪ/ /ðæp bɔɪ/
good boy /gʊd bɔɪ/ /gʊb bɔɪ/
bin man /bɪn mæn/ /bɪm mæn/
red cross /rɛd krɒs/ /rɛg krɒs/

Before /ʃ/, /s/ can change to /ʃ/, and /z/ to /ʒ/:

this shop /ðɪs ʃɒp/ /ðɪʃ ʃɒp/
boys’ shoes /bɔɪz ʃuːz/ /bɔɪʒ ʃuːz/
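The assimilation patterns listed above are regular enough to be written down as a small rule table. The following Python sketch is only an illustration and does not come from any of the sources cited: the function name `assimilate`, the one-symbol-per-sound string representation, and the /n/ to /ŋ/ mapping before velars are my own assumptions.

```python
# Toy model of anticipatory assimilation across a word boundary.
# Each word is a plain string of IPA symbols, one symbol per sound.

BILABIAL = set("pbm")
VELAR = set("kg")
STRESS_MARKS = "ˈˌ"

# Final alveolar -> (form before a bilabial, form before a velar).
# The n -> ŋ mapping is an assumption added for completeness.
ALVEOLAR_SHIFT = {"t": ("p", "k"), "d": ("b", "g"), "n": ("m", "ŋ")}

# Final /s/ and /z/ before /ʃ/.
SIBILANT_SHIFT = {"s": "ʃ", "z": "ʒ"}


def assimilate(first: str, second: str) -> str:
    """Return `first` with its final consonant adapted to the onset of `second`."""
    final = first[-1]
    onset = second.lstrip(STRESS_MARKS)[0]  # skip any leading stress mark
    if final in ALVEOLAR_SHIFT:
        to_bilabial, to_velar = ALVEOLAR_SHIFT[final]
        if onset in BILABIAL:
            return first[:-1] + to_bilabial
        if onset in VELAR:
            return first[:-1] + to_velar
    if final in SIBILANT_SHIFT and onset == "ʃ":
        return first[:-1] + SIBILANT_SHIFT[final]
    return first  # no assimilation applies


pairs = [("ðæt", "bɔɪ"), ("gʊd", "bɔɪ"), ("bɪn", "mæn"), ("rɛd", "krɒs"), ("ðɪs", "ʃɒp")]
for a, b in pairs:
    print(f"/{a} {b}/ -> /{assimilate(a, b)} {b}/")
```

Run on the pairs above, the sketch reproduces the assimilated forms given in the examples, such as /ðæp bɔɪ/ and /rɛg krɒs/. Real speech is far less mechanical, so this should be read as a summary of the pattern rather than a model of what every speaker does.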


Dalton and Seidlhofer explain why assimilation takes place:

“When we speak at normal speed, individual sound segments follow each other so quickly that the tongue may never reach the ‘ideal position’ connected with a particular sound. It will only approximate to this position before it moves on to the position necessary for the next segment. The exact position of the tongue and other articulators during a segment, therefore, depends on where the tongue is coming from and where it is going to”.

(1994, p28)


Elision, like assimilation, operates at word boundaries as a consequence of rapidly articulated speech. Elision means that a sound is omitted altogether in the middle of a consonant cluster (which is why elision is sometimes referred to as Cluster Reduction; see Williamson, 2015). This can occur in the middle of a word, e.g., sandwich /ˈsændwɪʤ/ becomes /ˈsænwɪʤ/ (or even /ˈsæmwɪʤ/, which illustrates both elision and assimilation), and Christmas /ˈkrɪstməs/ becomes /ˈkrɪsməs/. Elision also occurs when a sound that would be present in an isolated word is omitted in connected speech. As with assimilation, the most common place to find consonant elision is at the end of a syllable, and it mostly affects /t/ and /d/:

kept quiet /kɛpt ˈkwaɪət/ /kɛp ˈkwaɪət/
left luggage /lɛft ˈlʌgɪʤ/ /lɛf ˈlʌgɪʤ/
first three /fɜːst θriː/ /fɜːs θriː/
changed places /ʧeɪnʤd ˈpleɪsɪz/ /ʧeɪnʤ ˈpleɪsɪz/
used car /juːzd kɑː/ /juːz kɑː/
as confused as ever /æz kənˈfjuːzd æz ˈɛvə/ /əz kənˈfjuːz əz ˈɛvə/


(Williamson, 2015; Knowles, 1987; Gimson, 1989)
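As a rough illustration, the cluster cases above can be captured by one simplified rule: a word-final /t/ or /d/ is dropped when it has a consonant on both sides. The Python sketch below is my own simplification, not a rule stated by the authors cited; the names `elide` and `is_consonant` and the vowel inventory are assumptions, and the sketch deliberately does not cover cases such as as confused as ever, where the following word begins with a vowel.

```python
# Toy model of /t/ and /d/ elision between consonants at a word boundary.
# Each word is a plain string of IPA symbols, one symbol per sound.

VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌ")  # assumed vowel inventory for these examples
STRESS_MARKS = "ˈˌ"


def is_consonant(symbol: str) -> bool:
    """Treat anything that is not a vowel or a length mark as a consonant."""
    return symbol not in VOWELS and symbol != "ː"


def elide(first: str, second: str) -> str:
    """Drop a final /t/ or /d/ from `first` when it sits between two consonants."""
    onset = second.lstrip(STRESS_MARKS)[0]  # skip any leading stress mark
    if (
        len(first) >= 2
        and first[-1] in "td"
        and is_consonant(first[-2])
        and is_consonant(onset)
    ):
        return first[:-1]
    return first


pairs = [("kɛpt", "ˈkwaɪət"), ("lɛft", "ˈlʌgɪʤ"), ("fɜːst", "θriː"), ("juːzd", "kɑː")]
for a, b in pairs:
    print(f"/{a} {b}/ -> /{elide(a, b)} {b}/")
```

The rule leaves a phrase like good boy /gʊd bɔɪ/ untouched, because the /d/ follows a vowel; that pair undergoes assimilation rather than elision.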


Weakening refers to words that have a number of pronunciations. In citation form, at least one syllable is fully stressed in a word, meaning there is no reduction of vowel quality (Ladefoged, 2006). But this is not the case in connected speech, as many changes may take place. The word would, for example, has three pronunciations: /wʊd/, /wəd/, /əd/. The first is stressed while the other two are not. Prepositions, conjunctions, modal verbs, relative adverbs, relative pronouns, pronouns, and articles all have strong and weak pronunciation forms (Maxwell, n.d.).

Some further examples are:

Word     Strong form     Weak form(s)
and      /ænd/           /ən/
of       /ɒv/            /əv/
you      /juː/           /jə/
me       /miː/           /mɪ/
she      /ʃiː/           /ʃɪ/
does     /dʌz/           /dəz/
have     /hæv/           /həv/, /əv/
must     /mʌst/          /məst/, /məs/


(Underhill, 1994)
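The choice between strong and weak forms can be thought of as a simple lookup: the strong form when the word is stressed, a weak form otherwise. The short Python sketch below is a hedged illustration using a few entries from the list above; the `FORMS` table layout and the `pronounce` function are my own assumptions, and choosing the first weak form is an arbitrary simplification.

```python
# word -> (strong form, weak forms ordered from more to less careful)
FORMS = {
    "and": ("ænd", ["ən"]),
    "of": ("ɒv", ["əv"]),
    "you": ("juː", ["jə"]),
    "have": ("hæv", ["həv", "əv"]),
    "must": ("mʌst", ["məst", "məs"]),
}


def pronounce(word: str, stressed: bool = False) -> str:
    """Return the strong form if the word is stressed, else its first weak form."""
    strong, weak = FORMS[word]
    return strong if stressed else weak[0]


print(pronounce("of"))                 # unstressed, as in "cup of tea"
print(pronounce("of", stressed=True))  # stressed or citation form
```

In real speech the choice among several weak forms depends on speed and context, which a lookup like this does not attempt to model.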


In English, the majority of contractions occur with personal pronouns, the verb ‘to be’, the auxiliary verbs ‘have’ and ‘will’, and the particle ‘not’.

For example:

I am /aɪ æm/ /aɪm/
I have /aɪ hæv/ /aɪv/
they will /ðeɪ wɪl/ /ðeɪl/
can not /kæn nɒt/ /kɑːnt/
would not have /wʊd nɒt hæv/ /ˈwʊdntəv/



Liaison, which means the ‘linking’ of sounds or words, refers to a sound change that takes place when a speaker inserts a phoneme to make a smoother transition between words and so avoid a ‘hiatus’ (a gap between two adjacent vowel sounds). We could say that an “extra sound intrudes into an utterance to create a seamless, continual quality” (Underhill, 1994, p. 65).

There are three sounds that often do this: /r/ e.g. law and order /lɔː rən ˈɔːdə/, /j/ e.g. I agree /aɪ jəˈgriː/, /w/ e.g. go on /gəʊ wɒn/.

Linking /r/

There are certain accents of English that are described as ‘rhotic’. This means that the [r] that exists in a written word such as car will be pronounced. However, in standard accents such as Received Pronunciation (RP) or Standard British English, which are non-rhotic, the [r] is not pronounced. Linking /r/ refers to the sound a speaker with a non-rhotic accent inserts between two words to link the preceding vowel to the following one.

For example:

her English is excellent /hɜː ˈrɪŋglɪʃ ɪz ˈɛksələnt/
four eggs /fɔː rɛgz/
car engine /kɑː ˈrɛnʤɪn/
brother and sister /ˈbrʌðə rən ˈsɪstə/
far away /fɑː rəˈweɪ/

(Underhill, 1994; Kelly, 2000)

Her Spanish /hɜː ˈspænɪʃ/ and her German /hɜː ˈʤɜːmən/ are examples where the [r] is not pronounced because it occurs before a consonant sound (ibid.).

Intrusive /r/

Intrusive /r/ refers to the phenomenon where a non-rhotic speaker will insert an /r/ sound where two vowel sounds meet, even when it does not appear in the citation form, in order to make the transition from one word to the next much smoother. In other words, there is no justification from the spelling of a word because the word’s spelling does not end in an [r]. This typically occurs when the first word ends in /ə/, /ɑː/ or /ɔː/ (this does not apply to speakers of rhotic varieties of English). For example:

law and order /lɔː rən ˈɔːdə/
I saw it happen /aɪ sɔː rɪt ˈhæpən/
America and Canada /əˈmɛrɪkə rən ˈkænədə/
the media are to blame /ðə ˈmiːdiə rə tə bleɪm/
the idea of it /ði aɪˈdɪə rəv ɪt/
a banana or an apple /ə bəˈnɑːnə rɔː rən ˈæpl/

(Underhill, 1994; Kelly, 2000)

The difference between linking /r/ and intrusive /r/ is that the former is reflected in the written form of English, while the latter is not (Underhill, 1994). In rhotic accents, where [r] appears in the citation form, it is always pronounced (ibid.).

Intrusive /j/ and /w/

A further way in which native speakers create a smoother transition between words is to insert a linking element in the shape of a semi-vowel, /j/ or /w/, to avoid a gap (or hiatus) between two words. Intrusive /j/ occurs between two vowel sounds. Generally, if the lips are spread at the end of the first word, a /j/ phoneme is inserted, for example:

he is /hiː jɪz/
I agree /aɪ jəˈgriː/
I am /aɪ jæm/
I ought /aɪ jɔːt/
they are /ðeɪ jɑː/

As the examples show, the /j/ sound is inserted between a vowel that ends in a high front position at the end of a word, e.g. /iː/ (or the shortened /i/), /eɪ/, /aɪ/, and the vowel at the beginning of the next word: /æ/, /ə/, /ɔː/, /ɪ/, /ɑː/.

If, on the other hand, our lips are rounded at the end of the first word, we insert a /w/ phoneme. /w/ is generally inserted after vowels that end in a high back position, /uː/, /əʊ/, /aʊ/, for example:

you are /juː wɑː/
go off /gəʊ wɒf/
Sue always wants to eat /suː ˈwɔːlweɪz wɒnts tʊ iːt/
how old /haʊ wəʊld/

(Underhill, 1994)
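The three linking sounds described in this section can be summarised as a lookup on the final vowel of the first word, applied only when the next word begins with a vowel. The Python sketch below is a simplified illustration under my own assumptions (the `LINKS` table, the `link` function, and the vowel inventory are not taken from the sources cited), and it models a non-rhotic speaker only.

```python
# Toy model of the linking sound inserted between two vowels at a word boundary.
# Each word is a plain string of IPA symbols in a non-rhotic accent.

VOWELS = set("aeiouæɑɒɔəɛɜɪʊʌ")  # assumed vowel inventory for these examples
STRESS_MARKS = "ˈˌ"

# Word-final vowel -> linking sound. Longer endings come first so that
# "əʊ" is matched before the plain "ə" that would otherwise trigger /r/.
LINKS = [
    ("əʊ", "w"), ("aʊ", "w"), ("uː", "w"),
    ("eɪ", "j"), ("aɪ", "j"), ("iː", "j"), ("i", "j"),
    ("ɑː", "r"), ("ɔː", "r"), ("ə", "r"),
]


def link(first: str, second: str) -> str:
    """Return the sound inserted between the two words, or '' if none applies."""
    onset = second.lstrip(STRESS_MARKS)[0]  # skip any leading stress mark
    if onset not in VOWELS:
        return ""  # linking only happens before a vowel
    for ending, sound in LINKS:
        if first.endswith(ending):
            return sound
    return ""


pairs = [("aɪ", "əˈgriː"), ("gəʊ", "ɒn"), ("lɔː", "ənd"), ("hɜː", "ˈspænɪʃ")]
for a, b in pairs:
    print(f"{a} + {b}: inserts {link(a, b) or 'nothing'}")
```

Because the sketch works from sounds rather than spelling, it cannot tell linking /r/ from intrusive /r/; for a non-rhotic speaker both come out as the same inserted /r/, which is exactly why learners hear them as one phenomenon.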


In this article, I have looked at the processes of connected speech and how these processes can make comprehension of native English challenging for EFL learners. It is not just a challenge for beginners, but also for more advanced learners, due to the changes that occur to individual sounds and words when they are connected together in a stream of speech. Learners should certainly not think that ignorance of vocabulary is the cause of their inability to comprehend what a native speaker is saying. On the contrary, the cause is arguably ignorance of the way native speakers string words together in natural speech as a means to be more economical and rhythmical. Learners need to be made aware of why and how native speakers string words together by spending time in the classroom listening to and learning to decode authentic material. This will be the focus of a future article.

  • Abbott, G. (1986) A new look at phonological ‘redundancy’. ELT Journal, Volume 40, Issue 4, pp. 299-305.

  • Alameen, G. & Levis, J.M. (2015) Connected Speech. The Handbook of English Pronunciation. Wiley Blackwell. pp. 159-174.

  • Anderson, A. & Lynch, T. (1988) Listening. Language and Teaching: A scheme for teacher education. OUP.

  • Celce-Murcia, M., Brinton, D.M. & Goodwin, J.M. (1996) Teaching Pronunciation. CUP.

  • Dalton, C. & Seidlhofer, B. (1994) Pronunciation. OUP.

  • Kelly, G. (2000) How to Teach Pronunciation. Longman.

  • Knowles, G. (1987) Patterns of Spoken English: An Introduction to English Phonetics. London: Longman.

  • Ladefoged, P. (2006) A Course in Phonetics. Thomson Wadsworth.

  • McCarthy, J. & Smith, N. (2003) “Phonological processes: Assimilation”. Oxford International Encyclopaedia of Linguistics. 20. Retrieved from

  • Maxwell, C. (n.d.) Connected Speech Phenomena: Assimilation, Elision, Linking, and Weakening: A Study of Japanese L2 Learners. Asia University. Retrieved 4/7/19.

  • Roach, P. (2000) English Phonetics and Phonology: A practical course. CUP.

  • Temperly, M.S. (1987) Linking and deletion in final consonant clusters. In J. Morley (Ed.), Current perspectives on pronunciation: practices anchored in theory (pp. 59-82).

  • Underhill, A. (1994) Sound Foundations. Macmillan Heinemann.

  • Weisser, M. (2005) Retrieved 6/12/2019

  • Williamson, G. (2015) Connected Speech 101. Retrieved 15/10/2019
