TBD: A digression in four-part harmony reinterpretation phonotactics orthography tts OR Part 1, in which ... Part 2, in which ...
Where in the world was Feiba Peveli? Well, if you know the key, you can tell just from its name: it was at 38° 11′ N, 87° 53′ W, here:
The key is a procedure invented in 1826 to create place names based just on the latitude and longitude of the place. I've been exploring that procedure and it's led me again to the peripheries of my linguistic life.
Stedman Whitwell was a somewhat successful English architect (☞) and even had a model of one of his proposed towns on display in the White House for some time. He was also associated with the utopian movement at New Harmony, Indiana, for which he designed some model towns.
For Whitwell, one of life's annoyances was the fact that there are many places with the same name. He gives several examples, including eight Parises and three Romes, but makes no mention of multiple Londons. So if someone says they are going to Paris, how do you know which one? (☜)
Whitwell came up with an ingenious solution: a procedure that takes a latitude and longitude and "translates" them into two words. While the translations are not unique, every pair of words corresponds to a unique pair of coordinates, so when someone says that they are going to Feiba Peveli, you can figure out exactly where that is.
Here's how it works. Every digit corresponds to two possible letters, one consonant and one vowel:
For each digit, you pick one of the corresponding letters. If the degrees or minutes are less than 10, you first add a leading 0, so "3" becomes "03", for example. Then, if the latitude is south of the equator, you add the letter "s" anywhere you like in the latitude word, and if the longitude is west of the Greenwich Meridian, you add "v" anywhere you like in the longitude word.
So for 38° 11′ N, 87° 53′ W, one possible sequence of steps is:
3 ⇒ F 8 ⇒ ei 1 ⇒ b 1 ⇒ a = Feiba
8 ⇒ P 7 ⇒ ee 5 ⇒ l 3 ⇒ i = Peeli
Insert v for west longitude = Feiba Peveli
There are a lot of other possibilities for 38° 11′ N, 87° 53′ W, like Ipaa Evinuf, Ipba Veinui, etc. In fact, there are 16 possibilities for the latitude word (two choices in each of four positions) and 80 possibilities for the longitude word (5 positions for the "v" in each of the 16 basic words), which makes 1280 two-word combinations. In a linguistic analogy, we might say there are 1280 synonyms for the location 38° 11′ N, 87° 53′ W.
Of course, not all of those "words" are pronounceable. For example, one latitude word consists only of the consonants Fpbb, and a corresponding longitude word is Vpnlf. Whitwell himself gave an unpronounceable example (bdfk for 1 2 3 4), but left it up to us to choose pronounceable names.
But how can we choose pronounceable names? The answer lies in the first of my linguistic peripheries: phonotactics. Phonotactics are the constraints that languages place on combinations of sounds, or to put it another way, on which combinations of sounds are possible words. (☜) One thing that is interesting, and fundamental, about human language is that we have strong ideas ("intuitions") about what is possible and what is not in the languages that we speak well. (☞)
Some of these phonotactic constraints hold across all languages: Fpbb is not a possible word in any language. Other phonotactic constraints are language-specific: English doesn't allow "pn" at the beginning of a word, but French does. So in English the "p" in pneumonia is silent, but French pronounces both the "p" and the "n" in the corresponding pneumonie.
Given my interests in language and in programming, an obvious thing to do was to write a program that uses Whitwell's procedure to generate names from latitude and longitude. More than that, I was interested in generating (mostly) names that Whitwell would have considered pronounceable, in other words, names that follow English phonotactics. My program (which you can try here) proceeds in three steps:
- Generate all the base combinations from Whitwell's table
- Eliminate unsuitable combinations
- Add in the "s" for south latitude and the "v" for west longitude in appropriate places
Steps (2) and (3) both involve phonotactics. In step (2), the program checks each potential word against an approximation of English phonotactics. This approximation is based on the phonotactics of the syllables that make up words: what can come at the beginning (onset) and end (coda) of a syllable. In step (3), "s" and "v" are inserted only where they are phonotactically allowed. While it is conceptually somewhat unsatisfying to have two different ways of handling phonotactic constraints, there you have it.
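To make steps (2) and (3) more concrete, here is a toy syllable-based check. It is a minimal sketch under loud assumptions: the `ONSETS` and `CODAS` sets are small, hypothetical inventories chosen for illustration (they are not the program's lists, and real English allows more), and a word passes if its first consonant cluster is a possible onset, its last is a possible coda, and every cluster in between can be split into a coda followed by an onset. Unlike the program described above, this sketch reuses one test for both steps, deciding where a "v" may go by checking each candidate with the same function.

```python
import re

VOWELS = set("aeiouy")  # Whitwell uses y only as a vowel
# Hypothetical, deliberately small inventories for illustration only.
ONSETS = {"", "b", "d", "f", "k", "l", "m", "n", "p", "r", "s", "t", "v",
          "bl", "br", "dr", "fl", "fr", "kl", "kr", "pl", "pr", "tr",
          "sk", "sl", "sm", "sn", "sp", "st", "skr", "spl", "spr", "str"}
CODAS = {"", "b", "d", "f", "k", "l", "m", "n", "p", "r", "s", "t",
         "ft", "kt", "ld", "lf", "lk", "lm", "lp", "lt", "mp", "nd", "nk",
         "nt", "pt", "rb", "rd", "rf", "rk", "rm", "rn", "rp", "rt",
         "ks", "ps", "ts", "sk", "sp", "st"}


def splits_into_coda_onset(cluster: str) -> bool:
    """Can a word-internal consonant cluster be divided into coda + onset?"""
    return any(cluster[:i] in CODAS and cluster[i:] in ONSETS
               for i in range(len(cluster) + 1))


def pronounceable(word: str) -> bool:
    # Alternate runs of vowels and runs of consonants, e.g. "feiba" -> f|ei|b|a
    chunks = re.findall(r"[aeiouy]+|[^aeiouy]+", word.lower())
    if not any(c[0] in VOWELS for c in chunks):
        return False  # no vowel at all, e.g. "fpbb"
    first, last = chunks[0], chunks[-1]
    if first[0] not in VOWELS and first not in ONSETS:
        return False
    if last[0] not in VOWELS and last not in CODAS:
        return False
    medial = [c for c in chunks[1:-1] if c[0] not in VOWELS]
    return all(splits_into_coda_onset(c) for c in medial)


def insert_v(word: str) -> set[str]:
    """Step (3) sketch: add "v" only where the result still passes the check."""
    candidates = {word[:i] + "v" + word[i:] for i in range(len(word) + 1)}
    return {w for w in candidates if pronounceable(w)}


print(pronounceable("feiba"), pronounceable("fpbb"))  # True False
print(sorted(insert_v("peeli")))  # ['peelvi', 'peveli']
```

With these particular sets "v" is not a possible coda, so peeliv and peevli are rejected while peveli survives; a fuller inventory would change which placements are allowed.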
Getting acceptable names is a bit more complicated than phonotactics, though, from a variety of perspectives. These perspectives led me (further) into the weeds — follow if you will.
Whitwell's discussion of the possible names for 1 2 3 4 says that one of them, adik, "might not be thought such an agreeable and explicit combination" as some of the other names. While it's not clear exactly what Whitwell has in mind, one possibility is that adik doesn't follow English spelling conventions: we would expect adick, with "ck" instead of the simple final "k". (☜) However, given the contemporary use of unconventional spelling, especially in brand names, I have parted ways with Whitwell on that point.
Another aesthetic judgment is how many consecutive vowels to allow. While English orthography allows several consecutive vowel letters (as in queueing), sequences of more than two vowel sounds are unusual, though they do exist (for example, goo-i-est). In Whitwell's system, each vowel letter is pronounced separately, and his examples don't have more than two consecutive vowels. More on both of those aspects below. In addition, in my experiments, potential words with more than two consecutive vowels didn't "sound good", so I stuck with a limit of two.
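The two-vowel limit is easy to state as a filter. This is a minimal sketch, assuming (as above) that y counts as a vowel and that every vowel letter is a separate vowel sound; the function name `within_vowel_limit` is hypothetical. For 38° 11′ N, the latitude word ieiaa (i, ei, a, a) is one of the candidates it rules out.

```python
import re


def within_vowel_limit(word: str, max_run: int = 2) -> bool:
    """Reject words with more than `max_run` consecutive vowel letters (y included)."""
    runs = re.findall(r"[aeiouy]+", word.lower())
    return all(len(run) <= max_run for run in runs)


print(within_vowel_limit("feiba"), within_vowel_limit("ieiaa"))  # True False
```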
Taking into account phonotactics and the two consecutive vowel limitation, I get 4 latitude words and 14 longitude words for a total of 56 synonyms for 38° 11′ N, 87° 53′ W (aka Feiba Peveli):
Even though Whitwell wanted the names to be unambiguous, he struggled with that goal. Let's see what happens when we "decipher" (☞) Feiba Peveli. The problem is the sequence "ei": does it represent the two-digit sequence 2 3, or the single digit 8?
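Deciphering can be sketched as enumerating every way to split a name into entries of the table. This is a minimal sketch, not Whitwell's procedure: `CODE` inverts the same assumed table as above, "s" and "v" are simply dropped, and `readings` is a hypothetical name. For Feiba it finds both readings, the intended 3 8 1 1 and the unwanted 3 2 3 1 1.

```python
# Letters (or vowel pairs) -> digit, inverted from the assumed Whitwell table.
CODE = {
    "b": "1", "d": "2", "f": "3", "k": "4", "l": "5",
    "m": "6", "n": "7", "p": "8", "r": "9", "t": "0",
    "a": "1", "e": "2", "i": "3", "o": "4", "u": "5",
    "y": "6", "ee": "7", "ei": "8", "ie": "9", "ou": "0",
}


def readings(word: str) -> set[str]:
    """Every digit string a name could spell, trying 1- and 2-letter entries."""
    word = word.lower().replace("s", "").replace("v", "")  # drop hemisphere markers

    def split(rest: str) -> set[str]:
        if not rest:
            return {""}
        found = set()
        for size in (1, 2):
            token = rest[:size]
            if len(token) == size and token in CODE:
                found |= {CODE[token] + tail for tail in split(rest[size:])}
        return found

    return split(word)


print(sorted(readings("Feiba")))  # ['32311', '3811']
```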
Whitwell's solution is to restrict how digits are chosen:
Those combinations of the single vowels should be avoided which produce the four double ones ee, ei, ie, ou, these represent 7, 8, 9, 0, and never 22, 23, 32, 45, as they might otherwise have done.
In other words, Whitwell stipulates the meaning of those 4 double vowel sequences, so that Feiba can only mean 38 11.
This stipulation has an interesting consequence. Two paragraphs after the above passage, Whitwell discusses the example of 1 2 3 4, which we saw above, and says that aeio and beio are not allowed due to the above restriction. He doesn't say explicitly why they are not allowed, but we can work out that aeio and beio can only mean 1 8 4, not 1 2 3 4.
It's interesting to note that in all of the examples so far, we don't actually need Whitwell's stipulation, since we know that the relevant parts of the names encode 4 digits (5-digit longitudes always start with a or b, since longitude only goes up to 180). In the case of Feiba, we have 3, then ei, then 1 1, and since there must be 4 digits, ei can only be 8. Similarly, aeio and beio would be allowed and unambiguous without the restriction, and beet, which is not allowed in Whitwell's system, would also be allowed for 1 2 2 0.
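Whitwell's rule can be added to a deciphering sketch as a filter on the ways a name is split: any split in which two adjacent single vowels spell ee, ei, ie or ou is thrown away. This again assumes the reconstructed table, and the names `splits` and `whitwell_readings` are hypothetical. It confirms that aeio and beio can only be read as 1 8 4.

```python
# Letters (or vowel pairs) -> digit, inverted from the assumed Whitwell table.
CODE = {
    "b": "1", "d": "2", "f": "3", "k": "4", "l": "5",
    "m": "6", "n": "7", "p": "8", "r": "9", "t": "0",
    "a": "1", "e": "2", "i": "3", "o": "4", "u": "5",
    "y": "6", "ee": "7", "ei": "8", "ie": "9", "ou": "0",
}
VOWEL_PAIRS = {"ee", "ei", "ie", "ou"}


def splits(word: str) -> list[list[str]]:
    """All ways to cut a name into table entries."""
    if not word:
        return [[]]
    result = []
    for size in (1, 2):
        token = word[:size]
        if len(token) == size and token in CODE:
            result += [[token] + rest for rest in splits(word[size:])]
    return result


def whitwell_readings(word: str) -> set[str]:
    """Readings allowed by Whitwell's rule: adjacent single vowels never form a pair."""
    word = word.lower().replace("s", "").replace("v", "")
    return {
        "".join(CODE[t] for t in parts)
        for parts in splits(word)
        if not any(a + b in VOWEL_PAIRS for a, b in zip(parts, parts[1:]))
    }


print(whitwell_readings("aeio"), whitwell_readings("beio"), whitwell_readings("Feiba"))
# {'184'} {'184'} {'3811'}
```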
However, there are other more problematic examples that Whitwell's restriction doesn't help us with, like Feiet. Here the problem is not combinations of single vowels, but a combination of a single vowel and a double vowel. Feiet could be either F–e–ie–t (3 2 9 0) or F–ei–e–t (3 8 2 0).
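The argument about digit counts, and its limits, can be checked with a similar sketch: keep only the readings with the expected number of digits, with no stipulation at all. The table is the same assumed reconstruction, and `readings_with_length` is a hypothetical name. The digit count alone settles Feiba, aeio and beet, but Feiet still has two 4-digit readings.

```python
# Letters (or vowel pairs) -> digit, inverted from the assumed Whitwell table.
CODE = {
    "b": "1", "d": "2", "f": "3", "k": "4", "l": "5",
    "m": "6", "n": "7", "p": "8", "r": "9", "t": "0",
    "a": "1", "e": "2", "i": "3", "o": "4", "u": "5",
    "y": "6", "ee": "7", "ei": "8", "ie": "9", "ou": "0",
}


def readings_with_length(word: str, n_digits: int) -> set[str]:
    """All readings of `word` with exactly `n_digits` digits (no stipulation)."""
    word = word.lower().replace("s", "").replace("v", "")

    def split(rest: str) -> set[str]:
        if not rest:
            return {""}
        found = set()
        for size in (1, 2):
            token = rest[:size]
            if len(token) == size and token in CODE:
                found |= {CODE[token] + tail for tail in split(rest[size:])}
        return found

    return {r for r in split(word) if len(r) == n_digits}


for name in ["Feiba", "aeio", "beet", "Feiet"]:
    print(name, sorted(readings_with_length(name, 4)))
# Feiba ['3811']
# aeio ['1234']
# beet ['1220']
# Feiet ['3290', '3820']
```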
There are a couple of ways around this type of ambiguity in Whitwell's system. One way is simply to disallow these combinations. If we "encipher" the digits from left to right, then we could say something like:
- If the current word in progress ends in e, then the next digit cannot be enciphered starting with e or i.
- If the current word in progress ends in i, then the next digit cannot be enciphered starting with e.
- If the current word in progress ends in o, then the next digit cannot be enciphered starting with u.
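The three rules can be read as a list of forbidden (last letter so far, first letter of the next choice) pairs. This is a minimal sketch under the same assumed table, with hypothetical names (`FORBIDDEN`, `encipher_all`, `decipher`): it generates every rule-abiding name for a digit string, and deciphers a name by considering only splits that obey the same rules. The forbidden pairs turn out to be exactly the four double vowels ee, ei, ie and ou, so every boundary inside a vowel run is forced and a generated name can be read back in only one way. Feiet is no longer produced for either 3 2 9 0 or 3 8 2 0.

```python
from itertools import product

# Assumed reconstruction of Whitwell's table: digit -> (consonant, vowel).
TABLE = {
    "1": ("b", "a"), "2": ("d", "e"), "3": ("f", "i"), "4": ("k", "o"),
    "5": ("l", "u"), "6": ("m", "y"), "7": ("n", "ee"), "8": ("p", "ei"),
    "9": ("r", "ie"), "0": ("t", "ou"),
}
CODE = {letters: digit for digit, pair in TABLE.items() for letters in pair}

# (last letter so far, first letter of next choice): e then e/i, i then e, o then u.
FORBIDDEN = {("e", "e"), ("e", "i"), ("i", "e"), ("o", "u")}


def obeys_rules(pieces) -> bool:
    # Checking adjacent choices is equivalent to the left-to-right rule,
    # because the word in progress always ends with the previous choice.
    return all((a[-1], b[0]) not in FORBIDDEN for a, b in zip(pieces, pieces[1:]))


def encipher_all(ds: str) -> set[str]:
    """Every name for a digit string that obeys the three rules."""
    return {"".join(p) for p in product(*(TABLE[d] for d in ds)) if obeys_rules(p)}


def decipher(word: str) -> set[str]:
    """Digit strings whose rule-abiding encipherment spells `word`."""
    def split(rest: str, prev: str) -> set[str]:
        if not rest:
            return {""}
        found = set()
        for size in (1, 2):
            token = rest[:size]
            if len(token) < size or token not in CODE:
                continue
            if prev and (prev[-1], token[0]) in FORBIDDEN:
                continue
            found |= {CODE[token] + tail for tail in split(rest[size:], token)}
        return found

    return split(word.lower().replace("s", "").replace("v", ""), "")


print(sorted(encipher_all("22")))  # ['dd', 'de', 'ed']
print(decipher("feiet"))  # set()
```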
An alternative type of approach is to stipulate how sequences of vowels are to be disambiguated. For example, we might say that grouping proceeds from left to right, with two-vowel sequences taking precedence over single vowels. This would disambiguate Feiet as F–ei–e–t (3 8 2 0). Of course, we could just as well have single vowels take precedence over two-vowel sequences. Note that this type of approach tells us how to parse sequences, while the approach above tells us how to generate them. (☞)
Finding this unresolved ambiguity in Whitwell's system led me to think about other systems he could have proposed that would have avoided it (and yes, we are going further and further into those weeds). (☜)
One way to avoid the ambiguity problem is not to have two-vowel sequences in the mapping at all, since they are what leads to the potential ambiguity. However, if we want to keep one set of vowels and one set of consonants, we have another problem: there aren't enough vowel letters in English to cover ten digits. Whitwell was already pushing things by using y exclusively as a vowel, and that still only gave him six vowels: a, e, i, o, u, and y.
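The parsing alternative can be sketched as a greedy left-to-right reader that always tries a two-letter vowel pair before a single vowel. As before, the table is the assumed reconstruction and `greedy_reading` is a hypothetical name. It reads Feiet as 3 8 2 0, and, with the "v" dropped, Peveli and Evinuf both come out as 8 7 5 3.

```python
# Letters (or vowel pairs) -> digit, inverted from the assumed Whitwell table.
CODE = {
    "b": "1", "d": "2", "f": "3", "k": "4", "l": "5",
    "m": "6", "n": "7", "p": "8", "r": "9", "t": "0",
    "a": "1", "e": "2", "i": "3", "o": "4", "u": "5",
    "y": "6", "ee": "7", "ei": "8", "ie": "9", "ou": "0",
}


def greedy_reading(word: str) -> str:
    """Read left to right, preferring a two-letter entry over a one-letter one."""
    word = word.lower().replace("s", "").replace("v", "")
    out, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in CODE:
            out.append(CODE[pair])
            i += 2
        elif word[i] in CODE:
            out.append(CODE[word[i]])
            i += 1
        else:
            raise ValueError(f"{word[i]!r} is not in the table")
    return "".join(out)


for name in ["Feiet", "Feiba", "Peveli", "Evinuf"]:
    print(name, greedy_reading(name))
# Feiet 3820
# Feiba 3811
# Peveli 8753
# Evinuf 8753
```

The flip side is that a name built as F, e, ie, t for 3 2 9 0 would silently be read as 3 8 2 0, so whoever makes up a name has to know the reading rule too.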
In thinking about this new problem, I came up with three approaches:
- Use additional letter(s)
- Use special symbols
- Use accents
In each case, I considered two additional aspects. One aspect is practicality: how easy would the system be to read, write, and learn? (☞) This was a concern for Whitwell as well: he comments explicitly that his system should not be too hard to learn and remember. In addition, I take his choice of letters to indicate an interest in making the system easy to write and print.
The other aspect is aesthetic. How do the words look visually? How do they sound? These are necessarily subjective, so you may have a different opinion of the methods I'll present. However, I myself am not entirely pleased with all of them, either.
Single vowels by additional letters
The issue of ambiguity arises in Whitwell's system only because he reuses vowels in the "double vowels" of 7 – 0. There are actually two separate issues: which vowels and diphthongs are used, and which letters are used to write them. It's not clear whether Whitwell understood this distinction, since he could have used more diphthongs. One solution to the double vowels is to replace one of the vowels with a different letter. For example, we could replace the second vowel in each double-vowel pair with a silent h.
TBD: Single vowels by additional letters ("Single vowels"); single letters with special symbols ("Single letters"); single letters with accents ("Single letters, accents"). Then talk about TTS: 5 vowels with h (= additional letter), 5 vowels with accents.