Polandball: European Language Family Portraits, Explained

I saw this Polandball comic posted on Steve the Vagabond and Silly Linguist’s Facebook page a while ago, and found that it was originally posted by Redditor FlanInACupboard on the Polandball subreddit.

This is pretty amazing work, and a good starting point for understanding the diversity of languages in Europe. It’s also full of delightful little details, which I want to explore further.

In this comic, Polandballs are grouped based on the genealogy of the language they represent. The Scottish flag does double duty, representing both Scots and Scottish Gaelic.

Let’s take a closer look.

The Romance Language Family

The Romance languages can all trace their genealogy back to Latin, who gets pride of place on the wall of the Romance language family home.

From left to right (or West to East), we have:

  • Portuguese
  • Spanish
  • Catalan
  • French
  • Occitan
  • Italian
  • Romani (not a Romance language)
  • Romanian
  • Moldovan

Spanish and Catalan aren’t too happy with each other. The Catalan flag used here is the estelada, which represents Catalan republicanism and separatism, rather than the senyera, which represents Catalonia and Aragon more generally. That’s the political undercurrent. Castile and Catalonia have had a tense relationship going back centuries, and language has been one of the key battlegrounds in the conflict between the two regions.

The relationship between French and Occitan is much more one-sided: French is so dominant in most of Occitania that UNESCO considers the French Occitan dialects to be severely endangered.

As the geographic successor of Rome, Italy takes its spot beneath Latin’s portrait. Then there is a gap between Italy and the next nearest Romance language, Romanian. Romanian and Moldovan are Romance languages in a sea of Slavic languages, and it’s questionable whether Moldovan and Romanian are separate languages.

FlanInACupboard added a nice little detail, too: Romania doesn’t like its Roma peoples. Unlike the other Polandballs in the Romance language family portrait, Romani isn’t a Romance language: it belongs to the Indo-Aryan branch of Indo-European, along with Hindi, Urdu, Bengali, Punjabi, Marathi, Gujarati, and the like. Despite the seeming resemblance between the words “Romance” and “Romani”, they aren’t related, at least as far as we know. The word “Romance” traces back to Latin Roma, the city of Rome (traditionally said to be named after its legendary founder, Romulus), while the word “Romani” can be traced back to the word “Dom” in the Indo-Aryan languages.

Two languages I hoped to find here are missing: Galician and Romansh. Oh well, I guess they didn't make the family reunion this year.

The Germanic Language Family

The Romance languages have a clearly attested ancestor language: Latin. The Germanic languages (and the other language families discussed here) have no such luck: they know that they are all cousins, but they have no direct proof that they share a single ancestor, only very strong circumstantial evidence. Part of what historical linguists do is language reconstruction: working out what the ancestor language might have looked like. We know quite a bit about the Germanic language family, and the reconstructed most recent common ancestor of the Germanic languages is known as Proto-Germanic.

(With the exception of the Uralic languages and Basque, all the languages in this comic are Indo-European languages. The reconstructed most recent common ancestor of the Indo-European languages is known as Proto-Indo-European, or PIE.)

From left to right, the West Germanic languages are:

  • Scots
  • English
  • Frisian
  • Dutch
  • German

From left to right, the North Germanic languages are:

  • Danish
  • Norwegian
  • Swedish
  • Icelandic
  • Faroese

Here’s a nice touch: the Polandballs are arranged based on their linguistic similarities. Each of the West Germanic languages is most similar to the languages on either side of it. I don’t know enough about the North Germanic languages to be sure that that’s also the case, but I’d assume so.

This is not a well-known fact among English speakers, but the language most closely related to English is Frisian (though I suppose that depends on whether you consider Scots a language or a dialect). However, despite their linguistic similarity, the two are not mutually intelligible. “Goeie” is Frisian for “good day”.

Kamelåså, on the other hand, is not a Danish word at all. It’s a word a Norwegian TV show made up to make fun of how incomprehensible Danish is to Norwegians and Swedes. The clip is worth looking up, and no Danish, Norwegian or Swedish knowledge is needed to enjoy it.

Written Icelandic has changed remarkably little over the past millennium, almost since the end of the Viking Age, so Icelandic gets a Viking hat.

The Celtic Language Family

From left to right, these languages are:

  • Scottish Gaelic
  • Manx
  • Irish Gaelic
  • Welsh
  • Cornish
  • Breton

Scottish Gaelic and Breton are being squashed by English and French respectively, but Scottish Gaelic isn’t too happy about it while Breton is resigned to its fate. In terms of the number of speakers and language revitalisation, I don’t know how accurate this is, but the French state is pretty merciless when it comes to favouring French over regional languages. That said, Breton is still the third-most spoken Celtic language, behind Irish and Welsh.

Cornish has a good story. Cornish Polandball looks pretty dazed, and that’s because Cornish was long considered an extinct language. This is disputed, but what’s not in doubt is that Cornish was under threat from English for a very long time (Wikipedia’s “Last speaker of the Cornish language” page mentions that the last-known monolingual Cornish speaker died in 1676). However, in the 20th century the Cornish managed to revive the language, and in 2010 UNESCO reclassified Cornish from extinct to critically endangered. I know that sounds like a strange sentence, but it’s true.

The Slavic Language Family

From left to right, these languages are:

  • Czech
  • Slovak
  • Polish
  • Bulgarian
  • Macedonian
  • Slovenian
  • Bosnian-Croatian-Serbian
  • Ukrainian
  • Russian
  • Belarusian

I know relatively little about the Slavic languages, so the things that jump out at me here are arguably more political in nature than linguistic. From a linguistic point of view, Bosnian, Croatian and Serbian are standard varieties of the same language. However, given Bosnia, Croatia and Serbia’s recent histories with one another, everybody insists on calling them different things.

The other obvious thing to point out is that Czech and Slovak, though closely related and largely mutually intelligible, differ in spelling and pronunciation, and Slovak doesn't seem too happy about that, though I don't know the details. I also don't know why Russia has a big Σ above it.

The Slavic languages listed here form three dialect continua, corresponding to the three branches of Slavic: West Slavic (Czech, Slovak and Polish), East Slavic (Ukrainian, Belarusian and Russian) and South Slavic (Bulgarian, Macedonian, Bosnian-Croatian-Serbian and Slovenian).

I really don’t know anything about the Slavic languages so I will stop here before I embarrass myself.

Oh yes: in Polandball, Poland's colours are traditionally reversed. That's not a mistake.

The Baltic Language Family

From left to right, these languages are:

  • Lithuanian
  • Latgalian
  • Latvian

Okay, I didn’t know about Latgalian. I had to look it up. Latgalian is spoken in Latgale, in eastern Latvia, and its lexicon seems more similar to Lithuanian’s than to Latvian’s. I can’t comment on their grammatical similarities, though.

Lithuanian gets a walking stick and saggy eyes because it is often described as the most conservative living Indo-European language. In other words, of all the Indo-European languages that still have native speakers, it is the one most similar to Proto-Indo-European.

Albanian

There is no Kosovan language, and the Albanian language is in a branch of its own within the Indo-European family.

This one isn’t linguistic; it’s political. Kosovo has two official languages, Albanian and Serbian, and their speakers do not like each other.

Greek

Greek is in a branch of its own, the Hellenic branch.

Armenian

Armenian, like Greek and Albanian, is in a branch of its own within the Indo-European language family.

The Uralic Language Family

From left to right, these languages are:

  • Hungarian
  • Finnish
  • Estonian

The Uralic languages are not Indo-European, and if you look at the geographic distribution of Uralic languages, one of them looks suspiciously out of place:

Image taken from Wikipedia, used under a CC BY-SA 3.0 license

How did Hungarian get there? Nobody really knows. While Finnish and Estonian are part of a dialect continuum, Hungarian is in a different branch of the Uralic language family. I imagine Hungarian meeting Finnish feels like an adoptee meeting his biological cousin by accident, or something.

Basque

Basque is a language isolate spoken by around 700,000 people in the Basque Country, which spans both Spain and France. It has no known relatives, living or dead. It’s thought to be the oldest living language in Western Europe, and the evidence suggests that Basque was spoken in the area before the arrival of Indo-European languages.

Of course, all this doesn’t mean that Basque is a complete and total orphan that was simply willed into existence. Most likely, Basque is related to unknown or unattested languages that have long since died. However, short of some groundbreaking archaeological research and a Rosetta Stone moment, we may never discover what those languages are.

That rounds off this very, very broad introduction to the language families of Europe, according to Polandball.

Cognates and False Friends

Language learners are often taught that cognates are words in different languages that look or sound similar and mean the same thing.

English: much
Spanish: mucho

False friends, on the other hand, are words in different languages that look or sound similar but mean different things. “They are not cognates,” some language teachers will tell you sternly.

English: constipated
Spanish: constipado -- having a cold

I understand why language teachers say this. Even when they know better, this explanation is usually much more useful than the real explanation, at least from a language learning point of view.

The problem is, it’s just not true. It’s also much less interesting than the correct definition of a cognate.

Cognates are sets of words in different but related languages that share an etymological origin: that is, they descend from the same word.

In the examples I gave above, English “much” and Spanish “mucho” are in fact false cognates: words that look like cognates and have similar meanings but are not related. They just happen to look like each other. English “much” traces its lineage to Proto-Germanic *mekilaz and to Proto-Indo-European *meg-, while Spanish “mucho” goes back to Latin “multus” (from which English gets “multi”) and Proto-Indo-European *mel-.

Taking a look at some Romance languages:

Meaning: to hurt, to injure
Catalan: ferir
French: férir
Portuguese: ferir
Spanish: herir
Latin: ferīre

Meaning: to do, to make
Catalan: fer
French: faire
Portuguese: fazer
Spanish: hacer
Latin: facere

Meaning: oven
Catalan: forn
French: four
Portuguese: forno
Spanish: horno
Latin: furnus
English: furnace

Meaning: to flee
Catalan: fugir
French: fuir
Portuguese: fugir
Spanish: huir
Latin: fugere

These are all cognates, by both the linguistic and the “common” definition of a cognate. It's fairly obvious what's happening here, since we know that Latin is the parent language of Catalan, French, Portuguese and Spanish. We have lots of Latin texts, obviously, and we can trace how each of these languages developed from Latin over time. But even if we didn't know what Latin sounded or looked like, we could still make a pretty good guess: the ancestor language had an initial /f/ that was lost along the way in many areas of what is now Spain (modern Spanish still writes it as a silent h). Once you know this, finding these types of cognates in these Romance languages is a doddle.

If you have enough data, you can use the lexis, grammar and phonology of the child languages to reconstruct their most recent common ancestor using the comparative method (http://en.wikipedia.org/wiki/Comparative_method_(linguistics)). Round up all the Romance languages, run them through the comparative method, and you can reconstruct their most recent common ancestor: something very close to Latin (more precisely, the spoken Vulgar Latin rather than the Classical Latin of the texts).
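As a rough illustration of the first step of the comparative method, here is a toy Python sketch that lines up the four Romance cognate sets above and tallies which word-initial letters correspond across Catalan, French, Portuguese and Spanish. It compares spelling rather than sound, which is a big simplifying assumption, and `initial_correspondences` is a hypothetical helper written for this example, not part of any linguistics library.

```python
from collections import Counter

# The four cognate sets from the tables above, keyed by meaning.
# Assumption: spelling stands in for pronunciation.
COGNATES = {
    "to hurt": {"Catalan": "ferir", "French": "férir", "Portuguese": "ferir", "Spanish": "herir"},
    "to do":   {"Catalan": "fer",   "French": "faire", "Portuguese": "fazer", "Spanish": "hacer"},
    "oven":    {"Catalan": "forn",  "French": "four",  "Portuguese": "forno", "Spanish": "horno"},
    "to flee": {"Catalan": "fugir", "French": "fuir",  "Portuguese": "fugir", "Spanish": "huir"},
}
LANGS = ("Catalan", "French", "Portuguese", "Spanish")


def initial_correspondences(cognates, langs):
    """Count how often each combination of word-initial letters occurs
    across the given languages (one tuple per cognate set)."""
    tally = Counter()
    for forms in cognates.values():
        tally[tuple(forms[lang][0] for lang in langs)] += 1
    return tally


tally = initial_correspondences(COGNATES, LANGS)
print(tally)  # Counter({('f', 'f', 'f', 'h'): 4})
```

The single recurring set, f : f : f : h, is the kind of regular correspondence that lets you posit an initial /f/ in the ancestor. Real reconstruction compares sounds in every position of the word, across far more cognate sets than this.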

If the most recent common ancestor is not attested, then the reconstructed language is called a proto-language (http://en.wikipedia.org/wiki/Proto-language). We don’t have proof that a given proto-language existed, but we can be pretty sure that at some point, a language (or possibly a collection of languages) like it must have existed in order for its descendants to have the phonology, grammar and vocabulary they do.

Here are some cognates in English, Dutch and German:

English: do
Dutch: doen
German: tun

English: door
Dutch: deur
German: Tür

English: day
Dutch: dag
German: Tag

Again, there's a regular pattern here (English and Dutch d corresponding to German t), and this is just one of many patterns that you can find when comparing English, Dutch and German. Gather a large enough corpus across enough related languages, and you can reconstruct Proto-Germanic (http://en.wikipedia.org/wiki/Proto-Germanic_language). (I'm simplifying things a little, mainly because the actual work of linguistic reconstruction is well outside my limited expertise.)

Let’s dig a little deeper. What about cognates in Spanish and German? In theory, with enough data, you could reconstruct the nearest common ancestor of those two languages:

Spanish: pie
German: Fuß

Spanish: padre
German: Vater

No?

Spanish: pie
Catalan: peu
Latin: pes (genitive pedis)
English: foot
Dutch: voet
German: Fuß

Spanish: padre
Catalan: pare
Latin: pater
English: father
Dutch: vader
German: Vater

Hmm.

What you’re seeing here is part of Grimm’s Law (http://en.wikipedia.org/wiki/Grimm%27s_law), one of the most famous laws in historical linguistics, both for being one of the first systematic sound changes to be described and for being named after that Jakob Grimm (http://en.wikipedia.org/wiki/Jacob_Grimm), one half of the pair of brothers who collected some rather creepy fairy tales, when you think about it. Among other things, Grimm’s Law says that sounds that were voiceless stops in Proto-Indo-European became voiceless fricatives in Proto-Germanic. PIE /p/, for example, became /ɸ/ in Proto-Germanic, which is why Latin pater lines up with English father.
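Here is a similarly toy Python sketch of the part of Grimm’s Law described above, extended to the other two voiceless stops (PIE *t and *k, which became /θ/ and /x/ in Proto-Germanic; /x/ later became h in English). It uses Latin as a stand-in for the PIE consonants, since Latin kept these stops, and English spelling as a stand-in for Germanic sounds. Both are simplifying assumptions, and the word list and helper names (`expected_onset`, `follows_grimm`) are made up purely for illustration.

```python
# Assumption: Latin initial p, t, c (= /k/) preserve the PIE voiceless stops,
# and English spelling approximates the Germanic fricatives they became.
GRIMM_VOICELESS = {"p": "f", "t": "th", "c": "h"}

# Inherited (native) English words paired with Latin cognates.
PAIRS = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("tres", "three"),
    ("tu", "thou"),
    ("cornu", "horn"),
    ("centum", "hundred"),
]


def expected_onset(latin_word):
    """Predict how the English cognate should begin, or None if the
    Latin word doesn't start with one of the three voiceless stops."""
    return GRIMM_VOICELESS.get(latin_word[0])


def follows_grimm(latin_word, english_word):
    """True if the English word begins with the predicted fricative."""
    onset = expected_onset(latin_word)
    return onset is not None and english_word.startswith(onset)


for latin, english in PAIRS:
    print(f"{latin:8} -> {english:8} {follows_grimm(latin, english)}")

# A loanword: "paternal" was borrowed from Latin long after the shift.
print(follows_grimm("pater", "paternal"))  # False
```

Every inherited pair matches, while the loanword doesn’t: sound laws describe how words were passed down, so words borrowed later from Latin keep their p.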

Damn. Now I have to finish writing Articulatory Phonetics 101, I say, only slightly upset.

But yes, if you run all these languages through the comparative method, supplemented by other related languages, you eventually get the reconstructed language known as Proto-Indo-European.

Why don’t we compare all the languages of the world and try to reconstruct a sort of Proto-Babel, the language from which all human language descends?

Because if you take, say, English and Mandarin and use the comparative method on them, what you get is — nothing.

Semantic Drift

Here’s a set of cognates for you:

Spanish: actual
Catalan: actual
German: aktuell
Dutch: actueel
English: actual

Wait a second — one of these is not like the others.

Spanish: actual — current
Catalan: actual — current
German: aktuell — current
Dutch: actueel — current
English: actual — actual

Or, more famously:

Spanish: preservativo
German: Präservativ
English: preservative

Just so you don’t make this mistake in Spain or Germany:

Spanish: preservativo — condom
German: Präservativ — condom
English: preservative — preservative

What happened here? These are true cognates but false friends, just like “constipated” and “constipado”: they share an etymology, but at some point, the meaning attached to one or both of the words shifted, leaving very confused language students in their wake.

If you can remember that cognates do not have to have the same meaning, though, you can have a lot of fun with them. They reflect a language’s history and the agility of the human capacity for language in myriad ways. Consider the mental leaps behind this change:

German: wollen -- to want
Dutch: willen -- to want
English: will -- future auxiliary

Or this one:

German: sterben — to die
Dutch: sterven — to die
English: starve — to starve

To come back to the whole “cognates for language learning” thing: I’m not saying you shouldn’t use the terms “cognates” and “false friends”. I am saying that understanding what cognates actually are is, in my not-so-humble opinion, a much more expansive and stimulating way of engaging with a language.