Development of the Catalan Periphrastic Past Tense Construction

This essay is a review of the development of Catalan's unusual periphrastic past tense form. It's a little denser than most of the other posts on this site and assumes some prior knowledge of syntax and morphology. If you have any questions, please ask.

In the Catalan language, there are three important periphrastic constructions that can be formed with the auxiliary verb anar (“to go”). One is the anar + present participle construction, which is functionally and formally equivalent to the Spanish ir (“to go”) + present participle construction. It takes the tense in which the auxiliary anar is conjugated and has a progressive aspect. Another is the anar a + infinitive construction, which is formally equivalent to the Spanish ir a + infinitive construction but has a slightly different function: instead of serving as a periphrastic future tense, as in Spanish, the Catalan construction carries the same tense as the conjugated anar and has an inchoative aspect. The third periphrastic construction with anar is the one that interests us here: it is formed with a non-standard present tense conjugation of anar followed by the infinitive, and functions as a periphrastic past tense. In this essay, I will give a brief overview of the characteristics that make this construction linguistically unusual, and outline Antoni Badia’s account of how the periphrastic past tense developed.

The periphrastic past tense construction is of interest to us for several reasons. Firstly, the conjugation used is not the standard present tense conjugation of anar. Anar in the present tense normally conjugates as follows:

Standard present tense conjugation of anar

1st person singular (1s)   vaig      1st person plural (1pl)   anem
2nd person singular (2s)   vas       2nd person plural (2pl)   aneu
3rd person singular (3s)   va        3rd person plural (3pl)   van

The irregular conjugation is due to suppletion: the paradigm combines forms of the Latin verbs amnare and vadere as they evolved into Catalan; an ir- root, derived from Latin ire, also appears in the future and conditional conjugations of standard modern Catalan (“anar”, Diccionari). This kind of suppletion can also be found in many of the Western Romance languages, such as Spanish and Portuguese ir (built from Latin ire, vadere and the perfect forms of esse) and Italian andare (built from Latin amnare and vadere). However, when anar is used as part of the periphrastic past tense, it conjugates as follows:

Present tense conjugation of anar in periphrastic past tense

1st person singular (1s)   vaig      1st person plural (1pl)   vam
2nd person singular (2s)   vas       2nd person plural (2pl)   vau
3rd person singular (3s)   va        3rd person plural (3pl)   van
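
For readers who think better in code, the contrast between the two paradigms can be written out as a small Python sketch. The forms are copied from the two tables above; the names STANDARD_PRESENT, PERIPHRASTIC_AUX, periphrastic_past and differing_persons are labels invented for this illustration and do not come from any of the grammars cited.

```python
# Forms of "anar" copied from the two tables above.
# All names here are illustrative labels, not established terminology.
STANDARD_PRESENT = {
    "1s": "vaig", "2s": "vas", "3s": "va",
    "1pl": "anem", "2pl": "aneu", "3pl": "van",
}
PERIPHRASTIC_AUX = {
    "1s": "vaig", "2s": "vas", "3s": "va",
    "1pl": "vam", "2pl": "vau", "3pl": "van",
}


def periphrastic_past(person: str, infinitive: str) -> str:
    """Build the periphrastic past: auxiliary form of anar + infinitive."""
    return f"{PERIPHRASTIC_AUX[person]} {infinitive}"


def differing_persons() -> list[str]:
    """List the persons whose auxiliary form differs from the standard present."""
    return [
        person
        for person in STANDARD_PRESENT
        if STANDARD_PRESENT[person] != PERIPHRASTIC_AUX[person]
    ]


if __name__ == "__main__":
    print(periphrastic_past("1pl", "portar"))
    print(differing_persons())
```

The only point of the sketch is that the two paradigms diverge in exactly the 1pl and 2pl cells, which is where the historical discussion below concentrates.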

Besides the non-standard conjugation, this periphrastic construction is also significant because of its function. Other Western Romance languages have a formally similar construction that is functionally distinct. In French, for example, the aller (“to go”) + infinitive construction is formally equivalent to the Catalan periphrastic past, but it serves as a periphrastic future tense rather than a past tense. The same is true of the Portuguese ir (“to go”) + infinitive construction. As mentioned previously, Spanish has an ir a + infinitive construction that also functions as a future tense. Among the major Western Romance languages, Catalan is the only one to form a periphrastic past tense with a present tense form of the auxiliary verb “to go”.

Historically, the preterite form was preferred to the periphrastic form. However, in the written language, the periphrastic past tense now occurs in free variation with the simple past (preterite), and in the spoken language, the periphrastic past tense has virtually replaced the simple past, except in the Balearic Islands, where the simple past continues to be used in speech. Grammars of Catalan are very emphatic in saying that the periphrastic past and the simple past are semantically identical, even if one form may be preferred over the other for stylistic or historical reasons. Antoni Badia, for example, says that “today literary Catalan employs both tenses equally, such that the choice has become a stylistic device… It need not even be said that the meaning of the two perfect forms is always identical” (Gramática Catalana, 277). Alarcos Llorach, in his Estudis de Lingüística Catalana, mentions several times while analyzing the preterite form portí (“I carried”, from portar, “to carry”) that it is equivalent to vaig portar (125, 130, 132).

Another reason the periphrastic past tense construction is of interest to us is that it has undergone formal changes since its first appearance in writing. In the earliest written texts, such as the Libre dels feyts del rey en Jacme, also known as the Crònica del Jaume I (written in the 13th or 14th century), it appears that the periphrastic past tense might not have been formed with the present tense of anar, but with the preterite conjugation. In Gramàtica Històrica Catalana, Badia points out that the phrases van ferir (with the auxiliary in the present tense, third person plural) and anà’l ferir (with the auxiliary in the preterite, third person singular) both appear in the Crònica. He argues that since the Provençal and Castilian of that era had periphrastic present tense anar + infinitive and ir + infinitive constructions respectively, it seems likely that van ferir was a periphrastic construction for the present tense, and that the periphrastic past tense construction could instead have been anà’l ferir (370). This suggests that at some point, while the Spanish and Provençal forms involving the conjugated present tense either fell out of use or, in the case of Spanish, evolved to become a periphrastic future tense, the Catalan periphrastic construction evolved in the other direction to become a periphrastic past tense. How did this evolution happen in Catalan, given that it did not happen in any other Western Romance language?

Badia has reconstructed, using “traces that are found in the texts”, a possible early form of the conjugation of anar as used in the periphrastic present and past tenses (Gramàtica Històrica Catalana 370):

Possible historical present tense conjugation of anar

1st person singular (1s)   vau         1st person plural (1pl)   anam
2nd person singular (2s)   vas         2nd person plural (2pl)   anats
3rd person singular (3s)   va / vai    3rd person plural (3pl)   van

Possible historical preterite conjugation of anar

1st person singular (1s)   aní / ané   1st person plural (1pl)   anam
2nd person singular (2s)   anist       2nd person plural (2pl)   anats
3rd person singular (3s)   anà         3rd person plural (3pl)   anaren

Badia suggests that as the preterite form was replaced by the periphrastic form, the fact that the 1pl and 2pl forms of anar in the present tense and simple past were identical, coupled with the “constant mix of the historical present and the perfect in old texts”, caused the present tense to be substituted for the preterite in the auxiliary verb anar, resulting in a conjugation that corresponds to the one still used in the modern Algherese dialect of Catalan (Gramàtica Històrica 371):

Present tense conjugation of anar in modern Algherese, used in the periphrastic past tense

1st person singular (1s)   vaig      1st person plural (1pl)   anam
2nd person singular (2s)   vas       2nd person plural (2pl)   anats
3rd person singular (3s)   va        3rd person plural (3pl)   van
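
The overlap that Badia's argument relies on can be checked mechanically against his two reconstructed paradigms. The sketch below is only an illustration: the forms are the ones listed in the reconstructed present and preterite tables, alternants such as va / vai are stored as sets, and the names HISTORICAL_PRESENT, HISTORICAL_PRETERITE and shared_forms are hypothetical labels of my own choosing.

```python
# Badia's reconstructed forms, as listed in the two historical tables.
# Alternants ("va / vai", "aní / ané") are stored as sets of strings.
HISTORICAL_PRESENT = {
    "1s": {"vau"}, "2s": {"vas"}, "3s": {"va", "vai"},
    "1pl": {"anam"}, "2pl": {"anats"}, "3pl": {"van"},
}
HISTORICAL_PRETERITE = {
    "1s": {"aní", "ané"}, "2s": {"anist"}, "3s": {"anà"},
    "1pl": {"anam"}, "2pl": {"anats"}, "3pl": {"anaren"},
}


def shared_forms(first: dict, second: dict) -> dict:
    """Return, per person, the forms that appear in both paradigms."""
    return {
        person: sorted(first[person] & second[person])
        for person in first
        if first[person] & second[person]
    }


if __name__ == "__main__":
    print(shared_forms(HISTORICAL_PRESENT, HISTORICAL_PRETERITE))
```

Only the 1pl and 2pl cells come back as shared, which is the ambiguity Badia takes as the entry point for the present tense into the auxiliary.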

Badia postulates that once the periphrastic past tense became a definitive substitute for the simple past tense, Catalan speakers began to treat, for example, va venir (“he came”) as a single unit, almost as if it were a synthetic form of the verb with an inflected prefix (Gramàtica Històrica 371). Thus, he argues that the 1pl and 2pl forms became regularized to vam and vau to match the other conjugated forms, resulting in the modern standard Catalan conjugation of vaig, vas, va, vam, vau and van (Gramàtica Històrica 371). Because the regularization was tied specifically to the use of anar as an auxiliary, this did not affect the standard conjugation, creating the distinction we see today in the two different conjugations of anar.

There is an alternative conjugation, used only in the periphrastic anar + infinitive construction, that occurs in free variation in Central Catalan with the periphrastic conjugation shown above in the second verb table:

Alternative present tense conjugation of anar in periphrastic past tense

1st person singular (1s)   vàreig    1st person plural (1pl)   vàrem
2nd person singular (2s)   vares     2nd person plural (2pl)   vàreu
3rd person singular (3s)   va        3rd person plural (3pl)   varen

The variant forms vàreig, vares, vàrem, vàreu and varen, Badia argues, came about because if the present tense 3pl conjugation van were a regular conjugation, this would yield a preterite 3pl conjugation of varen, and consequently a var- root propagated into the 1s, 2s, 1pl and 2pl forms with varying degrees of incidence (Gramàtica Històrica 371). This argument seems plausible because if varen were indeed the preterite 3pl form of a verb var, the preterite 3s form of the verb would remain va (compare preterite 3s conjugation cantà for the verb cantar). Badia also notes that of the var- forms of the verb, the 1s vàreig is the least commonly used, which lines up with his hypothesis of a regularized var- form: in this conjugation, vàreig would be an irregular form (compare preterite 1s conjugation cantí for cantar), so it makes sense that it occurs less often than the others (Gramàtica Històrica 371).
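
The analogy with a regular -ar verb can be made concrete with a rough sketch. It assumes a hypothetical verb var with the stem v-, attaches the regular preterite endings of cantar (cantí, cantares, cantà, cantàrem, cantàreu, cantaren), and compares the result with the var- forms in the table above. Written accents are ignored in the comparison, since a monosyllable like va would not carry one. The function and variable names are invented for this illustration and are not part of Badia's account.

```python
import unicodedata

# Regular preterite endings of an -ar verb, as in cantar:
# cantí, cantares, cantà, cantàrem, cantàreu, cantaren.
REGULAR_AR_PRETERITE = {
    "1s": "í", "2s": "ares", "3s": "à",
    "1pl": "àrem", "2pl": "àreu", "3pl": "aren",
}
# The var- forms of the auxiliary, as listed in the table above.
ATTESTED_VAR_FORMS = {
    "1s": "vàreig", "2s": "vares", "3s": "va",
    "1pl": "vàrem", "2pl": "vàreu", "3pl": "varen",
}


def strip_accents(word: str) -> str:
    """Drop combining accent marks so that only the letters are compared."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def hypothetical_preterite(stem: str) -> dict:
    """Attach the regular -ar preterite endings to a stem."""
    return {person: stem + ending for person, ending in REGULAR_AR_PRETERITE.items()}


def mismatches(stem: str = "v") -> list:
    """Persons where the regular pattern does not predict the attested form."""
    predicted = hypothetical_preterite(stem)
    return [
        person
        for person in predicted
        if strip_accents(predicted[person]) != strip_accents(ATTESTED_VAR_FORMS[person])
    ]


if __name__ == "__main__":
    print(hypothetical_preterite("v"))
    print(mismatches())
```

Under these assumptions, the only cell the regular pattern fails to predict is the 1s, which is consistent with Badia's observation that vàreig is the least used of the var- forms.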

There remains much to be studied in the Catalan periphrastic past tense construction. Given that there are two different conjugations for the same periphrasis in free variation, and the periphrastic form itself is still in free variation with the synthetic form of the verb, at least in the written language, it seems likely that a shift towards one form over the other two will eventually occur, although it will be a slow process. Moreover, it would be instructive to study similar processes of verbal periphrasis in closely related languages, particularly Occitan, to see if there were any unique linguistic characteristics that caused this periphrasis to develop in Catalan but not in any of its neighboring languages.

Polandball: European Language Family Portraits, Explained

I saw this Polandball comic posted on Steve the Vagabond and Silly Linguist’s Facebook page a while ago, and found that it was originally posted by Redditor FlanInACupboard on the Polandball subreddit.

This is pretty amazing work, and a good starting point for understanding the diversity of languages in Europe. It’s also full of delightful little details, which I want to explore further.

In this comic, Polandballs are grouped based on the genealogy of the language they represent. The Scottish flag does double duty, representing both Scots and Scottish Gaelic.

Let’s take a closer look.

The Romance Language Family

The Romance languages can all trace their genealogy back to Latin, who gets pride of place on the wall of the Romance language family home.

From left to right (or West to East), we have:

  • Portuguese
  • Spanish
  • Catalan
  • French
  • Occitan
  • Italian
  • Romani (not a Romance language)
  • Romanian
  • Moldovan

Spanish and Catalan aren’t too happy with each other. The Catalan flag used here is the estelada, which is used to represent Catalan republicanism and separatism, rather than the senyera, which represents Catalonia and Aragon more generally. That’s the political undercurrent. Castile and Catalonia have had a tense relationship going back centuries, and language has been one of the key battlegrounds in the conflict between the two regions.

The relationship between French and Occitan is much more one-sided: French is so dominant in most of Occitania that UNESCO considers the French Occitan dialects to be severely endangered.

As the geographic successor of Rome, Italy takes its spot beneath Latin’s portrait. Then there is a gap between Italy and the next nearest Romance language, Romanian. Romanian and Moldovan are Romance languages in a sea of Slavic languages, and it’s questionable whether Moldovan and Romanian are separate languages.

FlanInACupboard added a nice little detail, too: Romania doesn’t like its Roma peoples. Unlike the other Polandballs in the Romance language family portrait, Romani isn’t a Romance language: it belongs in the Indo-Aryan language family, along with Hindi, Urdu, Bengali, Punjabi, Marathi, Gujarati, and the like. Despite the seeming resemblance between the words “Romance” and “Romani”, they aren’t related, at least as far as we know. The word “Romance” eventually traces back to Roma, the Latin name of the city of Rome (which legend links to its founder, Romulus), while the word Romani can be traced back to the word “Dom” in the Indo-Aryan languages.

Two languages I hoped to find here are missing: Galician and Romansh. Oh well, I guess they didn't make the family reunion this year.

The Germanic Language Family

The Romance languages have a clearly attested ancestor language: Latin. The Germanic languages (and the other language families discussed here) have no such luck: they know that they are all cousins, but they have no direct record of a single shared ancestor, only very strong circumstantial evidence. Part of what historical linguists do is language reconstruction: what might the ancestor language have looked like? We know quite a bit about the Germanic language family, and the reconstructed most recent common ancestor of the Germanic languages is known as Proto-Germanic.

(With the exception of the Uralic languages and Basque, all the languages in this comic are Indo-European languages. The reconstructed most recent common ancestor of these Indo-European languages is known as Proto-Indo-European, or PIE.)

From left to right, the West Germanic languages are:

  • Scots
  • English
  • Frisian
  • Dutch
  • German

From left to right, the North Germanic languages are:

  • Danish
  • Norwegian
  • Swedish
  • Icelandic
  • Faroese

Here’s a nice touch: the Polandballs are arranged based on their linguistic similarities. Each of the West Germanic languages is most similar to the languages on either side of it. I don’t know enough about the North Germanic languages to be sure that that’s also the case, but I’d assume so.

This is not a well-known fact among English speakers, but the language most closely related to English is Frisian (I suppose that does depend on whether you consider Scots a language or a dialect). However, despite their linguistic similarity, English and Frisian are not mutually intelligible. “Goeie” is Frisian for “good day”.

On the other hand, kamelåså is not a Danish word at all. It is a word a Norwegian TV show made up to make fun of how incomprehensible Danish is to Norwegians and to Swedes. Here’s the clip (no Danish, Norwegian or Swedish knowledge needed):

Written Icelandic has undergone remarkably little change for a millennium, almost since the end of the Viking age, so Icelandic gets a Viking hat.

The Celtic Language Family

From left to right, these languages are:

  • Scottish Gaelic
  • Manx
  • Irish Gaelic
  • Welsh
  • Cornish
  • Breton

Scottish Gaelic and Breton are being squashed by English and French respectively, but Scottish Gaelic isn’t too happy about it while Breton is resigned to its fate. In terms of the number of speakers and language revitalisation, I don’t know how accurate this is, but the French state is pretty merciless when it comes to favouring French over regional languages. That said, Breton is still the third-most spoken Celtic language, behind Irish and Welsh.

Cornish has a good story. Cornish Polandball looks pretty dazed, and that’s because Cornish was more or less considered an extinct language in the 20th century. This is disputed, but what’s not in doubt is that Cornish was under threat from English for a very long time (Wikipedia’s Last speaker of the Cornish language page mentions that the last-known monolingual Cornish speaker died in 1676). However, during the death throes of the Cornish language, the Cornish managed to revive the language, and UNESCO no longer considers the Cornish language to be extinct. I know that sounds like a strange sentence, but it’s true.

The Slavic Language Family

From left to right, these languages are:

  • Czech
  • Slovak
  • Polish
  • Bulgarian
  • Macedonian
  • Slovenian
  • Bosnian-Croatian-Serbian
  • Ukrainian
  • Russian
  • Belarusian

I know relatively little about the Slavic languages, so the things that jump out at me here are arguably more political in nature than linguistic. Bosnian, Croatian and Serbian are, from the linguistic point of view, dialectal variants of the same language. However, given Bosnia, Croatia and Serbia’s recent histories with one another, everybody insists on calling them different things.

The other obvious thing to point out is that Czech and Slovak have orthographical and phonological differences, and Slovak doesn't seem too happy about that, though I don't know the details. I also don't know why Russia has a big Σ above it.

The Slavic languages listed here form three dialect continua: Czech, Slovak and Polish; Ukrainian, Belarusian and Russian; Bulgarian, Macedonian, Bosnian-Croatian-Serbian and Slovenian.

I really don’t know anything about the Slavic languages so I will stop here before I embarrass myself.

Oh yes: in Polandball, Poland's colours are traditionally reversed. That's not a mistake.

The Baltic Language Family

From left to right, these languages are:

  • Lithuanian
  • Latgalian
  • Latvian

Okay, I didn’t know about Latgalian. I had to look it up. Latgalian is a language spoken in Eastern Latvia, and Latgalian’s lexicon seems more similar to Lithuanian’s than to Latvian’s. I can’t comment on their grammatical similarities, though.

Lithuanian gets a walking stick and saggy eyes because it is the most conservative living Indo-European language. In other words, it is the language that is most similar to Proto-Indo-European that still has native speakers.


There is no Kosovan language, and the Albanian language is in a branch of its own within the Indo-European family.

This one isn’t linguistic, it’s political. Kosovo has two official languages, Albanian and Serbian. They do not like each other.


Greek is in a branch of its own, the Hellenic branch.


Armenian, like Greek and Albanian, is in a branch of its own within the Indo-European language family.

The Uralic Language Family

From left to right, these languages are:

  • Hungarian
  • Finnish
  • Estonian

The Uralic languages are not Indo-European, and if you look at the geographic distribution of Uralic languages, one of them looks suspiciously out of place:

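How did Hungarian get there? The usual account is that the Magyars migrated west from the region around the Urals and settled in the Carpathian Basin around the end of the 9th century, leaving their linguistic relatives far behind. While Finnish and Estonian are closely related Finnic languages, Hungarian is in a different branch of the Uralic language family. I imagine Hungarian meeting Finnish feels like an adoptee meeting his biological cousin by accident, or something.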


Basque is a language isolate spoken by around 700,000 people in the Basque Country, which spans both Spain and France. It has no known relatives, living or dead. It’s thought to be the oldest living language in Western Europe, and the evidence suggests that Basque was spoken in the area before the arrival of Indo-European languages.

Of course, all this doesn’t mean that Basque is a complete and total orphan that was simply willed into existence. Most likely, Basque is related to unknown or unattested languages that have long since died. However, short of some groundbreaking archaeological research and a Rosetta Stone moment, we may never discover what those languages are.

That wraps up this very, very broad introduction to the language families of Europe, according to Polandball.

"Want" as Future Auxiliary in Singlish

Here's a little something I noticed a while ago. Singlish allows you to do something interesting with the verb "want", but only under certain circumstances.

If you are familiar with German or with the history of the English language, you will notice that what I am about to outline in Singlish parallels what happened in English. German "will" means English "want" (verb). In English, "will" also used to mean "want", and you can still see this sense of the word in "last will and testament", or "I willed it into existence", or "where there's a will, there's a way." Now, of course, we primarily use "will" in English as a future auxiliary, to indicate that an action will happen in the future.

This particular semantic drift most likely occurred because if you want something, it is in the future, not in the present. (As for why something similar didn't happen in German: language change is explainable but not predictable.)

I confess, the first time I was given this explanation, part of me refused to accept it. Of course my brain understood why it was likely, but the fact that "will" is already grammaticalised as a future auxiliary blinded me to the process - I could not conceive of making that leap from "will" as in "want" to "will" as in future tense. I took the explanation at face value but didn't like it.

So imagine how intrigued I was when, while working on a documentary, this gem of a Singlish sentence showed up in the Singaporean musical I was following:

"I eat until I want to bao zha (explode) already!"

Any Singlish speaker will recognise this as a valid construction. "Want" here, however, does not indicate volition. You cannot substitute "would like" in the sentence: "I would like to explode." Neither can you substitute "will": "I eat until I will explode." The only acceptable substitution is the "going to" future: "I eat until I [am] going to explode already."

The first explanation is that Singlish acquired this construction from Chinese, and you can see it in a direct translation:

"I eat until I want to explode already!"

The question, then, is whether Chinese uses this construction to mark the future. The answer is: it does. Sort of.

As with many things in Chinese, the exact meaning is context-dependent, but take me at my word for now:

Lit: I want to go to school already.
Meaning: I'm going to go to school now.

Lit: We want to eat already.
Meaning: We are going to eat now.

Lit: They want to go already.
Meaning: They are going to go now.

Other meanings are possible, but I don't want to get too much into the complexities here. The important thing is, in all three cases, a Singlish speaker can hear the literal translation and accept the dynamic translation as meaning the same thing.

Without actually conducting full-scale research on this, my guess is that it is the "already" inchoative aspect marker that does the trick: when used together with "already", "want" is narrowed from a generalised volition down to a specific, immediate-future time frame. Take away the "already" or the 了, and suddenly volition becomes the preferred interpretation in almost any context:

Lit: I want to go to school.
Meaning: I want to go to school.

Lit: We want to eat.
Meaning: We want to eat.

Lit: They want to go.
Meaning: They want to go.

Curiously, not all verbs can be used in all meanings with this construction. Not to be deliberately crude or morbid, but I think Singlish speakers will recognise these:

"I work until I want to die already."
"I want to vomit blood already."

The volition interpretation is not possible in the above sentences, in the same way it is not possible to willingly want or choose to explode.

The thing is, of course, when the verb has a neutral or positive connotation it is much easier to assume that volition is intended. For example, you would have no issues with the usual English definition of "want" in these sentences:

"I want to sleep already."
"I tired until I want to sleep already."

In fact, in the second sentence above, it is not possible to substitute "want" with the "going to" future. Volition is pretty much the only interpretation that makes sense.

The answer to untangling this construction lies somewhere in Chinese grammar. That's as far as I've worked it out in my head.