Production of word-initial stops by Singaporean English/Mandarin and Singaporean English/Malay bilingual children

This paper was written for Lisa Davidson and Frans Adriaans’s course LING-UA 54, Learning To Speak: The First and Second Language Acquisition of Sound, in Spring 2014. We spent the semester reading primary research, and the assignment was to design an experiment to test an aspect of first or second language acquisition.

I have Lisa and Frans’s comments on this paper, but apart from one major note below, I have not modified it.


How do simultaneously bilingual children distinguish between phonetic categories in their two languages, especially when some of those categories overlap? When bilingual children begin to produce speech, how accurate are they in producing phones that correspond to the correct language?

Current models of speech perception tend to take a monolingual approach, analyzing subsequent language learning through the lens of the L1’s phonetic inventory. The two dominant models of speech perception are Best’s Perceptual Assimilation Model (Best et al. 2001) and Flege’s Speech Learning Model (1995). In the Perceptual Assimilation Model (PAM), non-native contrasts are assimilated in various ways to native speech sounds, and these non-native distinctions are always parsed relative to the listener’s L1. By contrast, Flege’s Speech Learning Model (SLM) postulates that L1 and L2 phonetic categories coexist in a common phonetic space, and that while the L1 distinctions acquired in childhood remain the primary mechanism for speech perception, these distinctions evolve with subsequent language acquisition to accommodate L2 phonetic categories.

These models, however, do not account for simultaneously bilingual children who receive input from two or more languages from a young age. Historically, the dominant approach to bilingual acquisition has been the unitary language system (ULS) hypothesis, which posits that bilingual children initially do not differentiate between their two languages, treating all input as part of a single system; the languages become fully differentiated only later in linguistic development, at around 4 years of age (Volterra and Taeschner 1978).

More recently, Paradis and Genesee (1996) suggested that code-mixing in simultaneously bilingual children is not necessarily indicative of a unitary language system, but could equally be the result of an interaction between two separate systems. In this model, a bilingual child begins the process of language acquisition with two separate systems, one for each language, and these systems can either remain completely distinct such that it is as if two separate monolingual acquisition processes were happening in parallel (autonomous development), or they can interact with each other (interdependent development). This interaction can take the form of transfer, acceleration or delay. Transfer refers to the systematic employment of part of one language’s grammar in the other language, whereas acceleration and delay refer to the earlier or later acquisition of aspects of a language relative to monolingual acquisition.

Both the ULS and the autonomous and interdependent development models deal with lexical and syntactic acquisition, but they can also provide a framework for bilingual phonological acquisition. The pertinent question is whether bilingual children have a single phonological system for both languages, or two separate systems that may or may not interact. According to the interdependent development model, a bilingual child should have two phonological systems with the possibility of phonological transfer from one language to the other, the specifics of any transfer depending on the child’s relative knowledge of the two languages’ phonologies.

Yet another viewpoint is presented by Vihman (2002), who argued that in the earliest stages of language acquisition there is no evidence that children have either one unitary system or two distinct systems. Although bilingual children do eventually develop two separate phonologies, early acquisition is asystemic: it reflects not only language exposure but also the child’s vocal motor development, with growing facility in speech production being the primary factor shaping a child’s first utterances. Under this model, the differentiation of phonological systems is an emergent phenomenon resulting from word learning in two languages. This hypothesis, however, does not adequately explain phonological transfer from one language to another during the acquisition stage.

Research on simultaneous bilinguals’ production of speech segments that fall within overlapping phonemic categories is relatively limited and inconclusive. Fabiano-Smith and Barlow (2010) compared the phonetic inventories of monolingual English-speaking and monolingual Spanish-speaking 3- to 4-year-olds with those of bilingual Spanish-English children of the same age. They found that the bilingual children’s phonetic inventories matched those of the monolingual children in complexity, and their results suggested that the bilingual children maintained distinct phonological systems with some transfer between the two languages. Fabiano-Smith and Bunta (2012) studied 24 three-year-olds divided into three groups (monolingual Spanish, monolingual English and bilingual Spanish-English). They found that while the monolingual Spanish and monolingual English children produced /p/ and /k/ with different VOTs in their respective languages, the bilingual children’s VOTs were similar in both languages, suggesting that the two phonetic categories occupy the same phonetic space.

The study most relevant to the one proposed in this paper is Johnson and Wilson (2002), which presented preliminary data on VOT length in stops based on lab recordings of a family of four: two bilingual Japanese-English children aged 2;10 and 4;8, with a Japanese-speaking mother and an English-speaking father. The children had similar amounts of exposure to both Japanese and English. The researchers measured the father’s VOT in English, the mother’s VOT in Japanese, and the children’s VOT in both English and Japanese. Both parents’ VOT values matched previously obtained reference VOT values for English and Japanese. The younger child, at age 2;10, showed no significant effect of language on VOT in voiceless stops. However, the older child, at age 4;8, had a significantly longer VOT for English /p/ and /t/ than for Japanese /p/ and /t/.

The study proposed in this paper will take a similar approach, but with a larger participant pool and different stop contrasts. Many Singaporeans now grow up simultaneously bilingual in Singaporean English (SgE) and either Mandarin or Malay, in homes where both parents are also bilingual in the same pair of languages. This presents intriguing possibilities for exploring the development of phonemic categories in simultaneous bilinguals, because the three languages have different categories for word-initial stops. Mandarin has no voiced stops and contrasts aspiration; Malay has no aspirated stops and contrasts voicing. Relatively little experimental data on Singaporean English phonetics exists. Gut (2005) found that in Singaporean English “syllable-initial voiceless plosives /p, t, k/ are often aspirated to a far lesser degree than in British English.” No indication is given of how much shorter the aspiration in Singaporean English is, other than that some speakers’ voiceless stops may be perceived as voiced by speakers of other varieties of English (which would imply a VOT of less than about 40 ms for a voiceless stop).
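The three stop types at issue can be told apart by voice onset time: voicing that begins before the release (lead), shortly after it (short lag), or well after it (long lag). The Python sketch below sorts measured VOTs into those bins and checks each token against the category expected for its stop series. It is a minimal illustration under stated assumptions: the 0 ms and 40 ms boundaries (the latter echoing the 40 ms figure above), the `EXPECTED_CATEGORY` table and the function names are hypothetical choices, not part of the proposed design, and real category boundaries shift with place of articulation.

```python
# Illustrative VOT categories. The 0 ms and 40 ms boundaries are assumptions;
# real category boundaries shift with place of articulation and language.
LONG_LAG_THRESHOLD_MS = 40.0

# Hypothetical expected category for each stop series, following the text
# (Malay contrasts voicing, Mandarin contrasts aspiration).
EXPECTED_CATEGORY = {
    ("Malay", "voiced"): "lead",
    ("Malay", "voiceless"): "short-lag",
    ("Mandarin", "unaspirated"): "short-lag",
    ("Mandarin", "aspirated"): "long-lag",
}


def vot_category(vot_ms: float) -> str:
    """Sort a measured VOT (in milliseconds) into a broad phonetic category."""
    if vot_ms < 0:
        return "lead"        # voicing begins before the release (prevoicing)
    if vot_ms < LONG_LAG_THRESHOLD_MS:
        return "short-lag"   # voiceless unaspirated
    return "long-lag"        # voiceless aspirated


def matches_expected(language: str, series: str, vot_ms: float) -> bool:
    """True if a token's VOT category matches the category expected for its series."""
    return vot_category(vot_ms) == EXPECTED_CATEGORY[(language, series)]


if __name__ == "__main__":
    # Hypothetical tokens, not measurements from any study.
    tokens = [
        ("Malay", "voiced", -85.0),
        ("Mandarin", "aspirated", 72.0),
        ("Mandarin", "aspirated", 18.0),
    ]
    for language, series, vot in tokens:
        print(language, series, vot, vot_category(vot), matches_expected(language, series, vot))
```

Singaporean English is deliberately left out of `EXPECTED_CATEGORY`, since its categories are exactly what the parents’ baseline recordings are meant to establish. In the sample tokens, the 18 ms Mandarin “aspirated” token falls in the short-lag bin and would be flagged as a mismatch.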

By comparing the production of word-initial stops in Singaporean English, Malay and Mandarin by SgE/Mandarin and SgE/Malay bilingual children, this study aims to determine whether children who are still developing phonological systems for these languages develop each system independently and keep them separate, or whether these different phonemic categories interact during this stage of language acquisition.

This study will use Standard Singaporean English rather than Colloquial Singaporean English (also known as Singlish), for two reasons: 1. as a creole, Singlish draws its lexicon from multiple languages, including English, Mandarin and Malay as well as Hokkien and Teochew, both of which have a three-way distinction between word-initial voiceless aspirated, voiceless unaspirated and voiced stops, so using Singlish would introduce too many confounding factors; and 2. anecdotally, Singlish itself may have a three-way contrast between voiceless aspirated, voiceless unaspirated and voiced stops in some highly specific situations (“kope” /kop/ vs “cope” /kʰop/ vs “goat” /got/; “kaki” /kaki/ vs “car key” /kʰakʰi/ vs “gaga” /gaga/).


Participants will be 40 four-year-olds from Singapore and their parents: 20 English/Malay-speaking households and 20 English/Mandarin-speaking households. According to Platt and Weber (1980), educational level correlates highly with sociolect of English in Singapore; to control for this factor, both parents should have some college education, so that the child is exposed to Standard Singaporean English.

The age of four years was chosen because by this age children have usually acquired nearly the entire phonetic inventory of their first languages (Fabiano-Smith and Barlow 2010), but are unlikely to have started mandatory formal schooling, which in Singapore begins at age 6 and is conducted primarily in English. As a result, their language input is likely to be distributed fairly evenly across their two first languages, rather than becoming predominantly English once they reach age 6.

Children from households where a parent or another primary caretaker (for example, a grandparent or nanny) speaks Hokkien or Teochew should be excluded from this study, as both of these languages have a three-way contrast between voiceless aspirated, voiceless unaspirated and voiced stops in word-initial position. While such children would be an interesting group to study for precisely this reason, the potential for confounding factors in the context of this study is too great.


The stimuli are designed to combine word-initial stops with vowels that can occur after a word-initial stop in all of the languages under study. This controls for any effect the following vowel may have on the stop. The words chosen are generally concrete, picturable nouns, adjectives or verbs that a four-year-old can be expected to know or to learn during the experiment, such as “pisang” (banana), “gemuk” (fat) or 哭 kū (cry). In some cases, an abstract but common word has been chosen over concrete words that a four-year-old is unlikely to know. These include 比 bǐ (comparative marker), 不 bù (not) and 个 gè (the most common noun classifier in Mandarin). The Mandarin stimuli are not controlled for tone, because doing so would considerably restrict the set of possible stimuli and make it difficult to find words within a four-year-old’s vocabulary.

In Mandarin, neither [o] nor [e] occurs after /tʰ/ or /kʰ/, so combinations containing those vowels cannot be tested for every stop. Although Mandarin does have 陪 péi /pʰeɪ/ (to accompany), it is not a suitable stimulus for four-year-old subjects. As such, the experiment will not test words with [e] or [o] following the word-initial stop.

Partial list of stimuli (only the initial stop and following vowel are transcribed):

Columns: Malay (voiceless, voiced) | Mandarin (aspirated, unaspirated) | English (voiceless, voiced)
[Table body not recoverable. Cells marked * refer to the note below.]

* Heng and Deterding (2005) found that Singaporean English reduces relatively few vowels to /ə/: 45% of the analyzed tokens were reduced, compared with 100% for the British English control group. Because /ə/ is nonetheless frequent in Singaporean English, it is important to test this combination if possible, but Singaporean English stop + /ə/ tokens are unlikely to be reliably elicited from a word list. Instead, relevant tokens from spontaneous production should be analyzed where available.

[Note: the paper came back from Lisa and Frans with a suggestion to test /ʌ/ if /ə/ could not be reliably elicited, since /ʌ/ is the stressed counterpart of /ə/ in General American. Singaporean English, however, has /ɜ/ as the stressed counterpart of /ə/, which would be an even better test. /ʌ/ and /ɜ/ contrast in Singaporean English, as in the minimal pair bud-bird.]


Because Singaporean English is poorly described, this study will compare children’s production with their parents’ production rather than with a standard set of phonetic features. The parents’ production of the target words will serve as a baseline, establishing the phonemic distinctions in each of the three languages. If a set of parents shows anomalous, statistically significant transfer effects from one language to the other, their child’s results can be excluded or analyzed separately from the main participant pool.

The experiment will take place over two days for each household, to mitigate potential priming effects from testing both languages on the same day. On the first day, each child will be shown pictures designed to elicit the target words in English, in the carrier phrase “this is (a) ________”. If the child does not produce a word, delayed elicitation can be used as a prompt. This will be followed by a spontaneous five-minute conversation between the child and a native speaker of Singaporean English. On the second day, the procedure will be repeated in the other language (either Mandarin or Malay).

The parents will be individually tested in separate rooms at the same time as their child. On the first day, each parent will be given the list of target words in English and asked to read them aloud in the carrier phrase “I say ________”. This will be followed by a spontaneous five-minute conversation with a native speaker of Singaporean English. On the second day, this procedure will be repeated with the other language, with the conversation being carried out with a native speaker of the other language.

After the speech samples have been collected, they will be coded and analyzed (for example, by measuring VOT and noting prevoicing and aspiration in each token) to identify how phonemic distinctions are realized in each of the three languages, and the extent to which each speaker maintains distinct phonemic categories across both languages.
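One way the within-speaker comparison could be quantified is sketched below in Python: for a single speaker, it compares VOTs for a nominally similar stop in the two languages (here, word-initial /b/ in Malay and English) using the mean difference, Welch’s t statistic and Cohen’s d. This is a minimal sketch, not the proposed analysis: the choice of statistics, the helper names (`welch_t`, `cohens_d`, `separation`) and every number are hypothetical, and the VOT values are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic: the mean difference scaled by unpooled standard errors."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d: the mean difference in units of the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def separation(vots_lang_a: list[float], vots_lang_b: list[float]) -> dict[str, float]:
    """Summarize how far apart one speaker's VOTs for a stop are in two languages."""
    return {
        "mean_diff_ms": mean(vots_lang_a) - mean(vots_lang_b),
        "welch_t": welch_t(vots_lang_a, vots_lang_b),
        "cohens_d": cohens_d(vots_lang_a, vots_lang_b),
    }


# Hypothetical VOTs (ms) for word-initial /b/ from one SgE/Malay parent and child.
# These numbers are invented for illustration; they are not data.
parent = separation([-90, -75, -100, -82, -88], [10, 14, 8, 12, 11])  # Malay vs English
child = separation([-60, 12, -70, 9, -55], [11, 15, -40, 9, 13])      # Malay vs English

for who, summary in (("parent", parent), ("child", child)):
    print(who, {name: round(value, 1) for name, value in summary.items()})
```

In these invented numbers the parent’s two /b/ categories are far apart while the child’s overlap; comparing each child’s effect size with their parents’ is one possible way to operationalize “maintaining distinct categories”. For a p-value, SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch’s test.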


I expect the baseline results from the parents’ speech samples to show that for word-initial stops, Mandarin has an aspiration contrast, Malay has a voicing contrast, and Singaporean English has a VOT length distinction with aspiration accompanying longer VOT. It is also possible that Singaporean English has an aspiration contrast with optional voicing, or a voicing contrast with optional aspiration. I do not expect place of articulation or the subsequent vowel to have a significant effect on either the parents’ or children’s voicing, aspiration or VOT in word-initial stops.

Based on this, I would expect the SgE/Mandarin bilingual children to have acquired the aspiration contrast in Mandarin, the SgE/Malay bilingual children to have acquired the voicing contrast in Malay, and both groups of children to have a VOT distinction in Singaporean English. If this is indeed the case, it would suggest that the phonological systems of bilingual children at age 4 are separate and distinct, with minimal transfer from one language to another. It would also suggest that a phonemic category is defined not only by its discrete segmental properties but also by its link to one language or the other in the mind of the speaker.

If the SgE/Mandarin and SgE/Malay children have different contrasts for word-initial stops in Singaporean English from their parents (for example, if SgE/Malay children contrast voicing in English and SgE/Mandarin children contrast aspiration in English, whereas both groups of parents contrast VOT in English), that would suggest interference from Mandarin or Malay in their acquisition of English. Conversely, if the SgE/Malay children produce aspirated stops in Malay or the SgE/Mandarin children produce voiced stops in Mandarin, that would suggest interference from English in their acquisition of Malay or Mandarin. In either case, this result would indicate that the phonological systems of bilingual children are not kept completely separate, at least not at age 4.

These results would not necessarily show whether bilingual children begin language acquisition with one phonological system or two, only whether they end phonological acquisition with separate systems and whether those systems interact at that point. To trace the development of phonological systems in bilingual children more accurately, a longitudinal study from first words to the end of phonological acquisition would be necessary, but many methodological considerations make such an undertaking difficult. For example, the linguistic environment of bilingual children is likely to vary much more widely than that of monolingual children, making direct comparisons between children problematic. Additionally, in the earliest stages of speech production, before individual segments are fully acquired, it is difficult to ascribe utterances to one language or the other.

One more possibility must be considered: if there is a significant effect of language pair on the parents’ English stop distinctions (that is, if SgE/Mandarin and SgE/Malay parents have different stop distinctions in English), that would point to a larger question this study cannot adequately answer, namely dialectal variation in a language arising directly from the great majority of its native speakers being bilingual in other languages.

Other potential follow-up studies include a perception study to see whether bilingual Singaporean children and adults accurately perceive the distinctions among voiceless aspirated, voiceless unaspirated and voiced stops, depending on their own native languages and the language being spoken (for example, can an SgE/Malay speaker accurately perceive the difference between /pʰ/ and /p/ in both Mandarin and English?). Additionally, if a similar production study were replicated in other areas where simultaneous bilingualism is common, such as Quebec, Catalonia or Hong Kong, it could be determined whether the results obtained in Singapore are specific to its linguistic situation, or whether bilingual children as a group maintain separate or merged phonological systems for their two native languages, regardless of what those languages are.


Best, C., McRoberts, G. & Goodell, E. (2001). Discrimination of non-native consonant contrasts varying in perceptual assimilation to the listener’s native phonological system. Journal of the Acoustical Society of America, 109 (2). 775-794.

Fabiano-Smith, L. & Barlow, J. (2010). Interaction in bilingual phonological acquisition: Evidence from phonetic inventories. International Journal of Bilingual Education and Bilingualism, 13 (1). 1-17.

Fabiano-Smith, L. & Bunta, F. (2012). Voice onset time of voiceless bilabial and velar stops in three-year-old bilingual children and their age-matched monolingual peers. Clinical Linguistics and Phonetics, 26 (2). 148-163.

Flege, J. (1995). Second-language speech learning: theory, findings, and problems. In W. Strange (Ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research (pp. 229-273). Timonium, MD: York Press.

Gut, U. (2005). The realisation of final plosives in Singapore English: phonological rules and ethnic differences. In Deterding, D., Brown, A. & Low, E. L. (Eds.), English in Singapore: Phonetic Research On A Corpus (pp. 14-25). Singapore: McGraw-Hill.

Heng, M. G. & Deterding, D. (2005). Reduced vowels in conversational Singapore English. In Deterding, D., Brown, A. & Low, E. L. (Eds.), English in Singapore: Phonetic Research On A Corpus (pp. 54-63). Singapore: McGraw-Hill.

Johnson, C. & Wilson, I. (2002). Phonetic evidence for early language differentiation: Research issues and some preliminary data. International Journal of Bilingualism, 6. 271-289.

Paradis, J. & Genesee, F. (1996). Syntactic acquisition in bilingual children: autonomous or interdependent? Studies in Second Language Acquisition, 18. 1-25.

Platt, J. & Weber, H. (1980). English in Singapore and Malaysia: Status, features, functions. Kuala Lumpur: Oxford University Press.

Vihman, M. M. (2002). Getting started without a system: from phonetics to phonology in bilingual development. International Journal of Bilingualism, 6. 239-254.

Volterra, V. & Taeschner, T. (1978). The acquisition and development of language by bilingual children. Journal of Child Language, 5. 311-326.