I’ve decided that each weekend I’ll dig out an object or two from my more distant past and write about it. To kick things off, here’s a challenge which was originally created by the same chap who coined my name.
The text you can see in the image below (at least if you happen to be sighted) is in an unknown script. Your task is obvious, I think.
The only clues you have are that it’s a quote from a book by Ursula LeGuin and it’s nothing whatsoever to do with Tolkien.

Now originally I solved this in under 2 days, without the aid of computers or amphetamines. I reckon that in The Age of the Internet you can do better. I’ll negotiate a suitable prize for the first person who posts the solution.
Tags: cryptanalysis, ursula-leguin
-
I’ve had a brief look at this following your tweet about nobody solving it.
Observation 1. It could be a simple character substitution code, given that at least two “scribbles” are repeated throughout the script (the first and last “scribbles” on the first line, for instance). If this is the case, the standard table of English letter frequencies, starting ETANOISH…, could be of use.
Observation 2. Although you state it has absolutely nothing to do with “Tolkien”, this is clearly a clue in itself, and research into Tolkien is in order. I note that the “Tolkien logo” bears a resemblance to the strokes (scribbles) in the script.
Observation 3. If this is a character substitution code and each line is a word, then the second line could only be the “word” “a” or “I”.
Observation 4. If the second character is an “a” or an “i”, and if each line is a word, then the penultimate line would be a two-character word that could only start with a, e, i, o or u. If the second letter is an “a” or an “i”, this reduces the number of permutations to words such as “it” or “of”.
Observation 5. I had never heard of Ursula LeGuin, but a Google search for “quotes Ursula LeGuin” turns up quite a few interesting observations about life, the universe and everything. None quite match my deliberations, although “if you light a candle you also cast a shadow” has an appropriate place, given that it has an “a” where I might expect to find an “a”.
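For anyone wanting to try Observation 1 mechanically, here is a minimal Python sketch. It assumes each grapheme has been transcribed to an arbitrary label (the labels and the function name `frequency_guess` are invented for illustration) and uses one commonly quoted English frequency order rather than the ETANOISH… table above. With only a few dozen letters in the sample, the ranks will be noisy, so treat the output as a starting guess to be corrected by hand.

```python
from collections import Counter

# Assumption: one commonly quoted order of English letters, most frequent first.
# Published orders differ slightly from source to source.
ENGLISH_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def frequency_guess(symbols):
    """Pair each cipher symbol, most frequent first, with the English letter of
    the same frequency rank. Symbols with equal counts keep first-seen order."""
    ranked = [sym for sym, _ in Counter(symbols).most_common()]
    return dict(zip(ranked, ENGLISH_ORDER))

# Hypothetical transcription: each grapheme replaced by a made-up label.
cipher = ["S1", "S2", "S1", "S3", "S1", "S2"]
print(frequency_guess(cipher))  # {'S1': 'e', 'S2': 't', 'S3': 'a'}
```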
-
Right, the Greek alphabet has 24 letters, alpha through to omega:
http://en.wikipedia.org/wiki/Greek_alphabet
The Phoenician alphabet (linked to above) offers some promise, but there is no clear relation between the symbols in either the Greek or Phoenician alphabets and the script above.
Wikipedia informs me in its article on Egyptian hieroglyphs that there are logographic and alphabetic representations. The Japanese, I seem to recall, tend to mix their logographic symbols with English and/or European characters, which makes travelling in Tokyo quite interesting, as the English bits give an idea of what’s going on.
(working on it.. working on it… It’ll take longer than two days !)
-
It seems to me that some of the letters have an inordinate number of strokes. So I’ll offer the hypothesis that the number of strokes and dots in each letter has some relationship to the plain-text. With the number of dots in parentheses, my stroke counts are:
4(2) 5(2) 13(1) 4(0) 4(2) 6(2) 12(2) 4(0)
14(3)
9(4) 12(0) 4(2) 9(1) 14(0)
6(0) 2(1) 17(2) 4(0) 4(2) 19(1) 4(0)
12(3)
3(3) 4(2) 12(0) 4(2) 5(1) 5(1) 10(2)
Now, this all looks good up until the last line, where two sets of different letters have the same number of strokes and dots – which may be intentional, as a coding device to represent an English (or more correctly, Latin) letter with two or more coded letters, or it may not be, in which case the substitution breaks.
Perhaps there is meaning as to whether the dot is above or below the central horizontal axis of the script, which would allow the two 5(1)’s to be different letters, but the similarity of the two 4(2)’s on the final line would tend to disprove the hypothesis.
That’s about as far as my reasoning takes me, pre-coffee on a Sunday morning.
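The (strokes, dots) hypothesis can be checked a little more systematically. This Python sketch takes the counts listed above exactly as given and reports every signature that turns up more than once, with its line and position. If the hypothesis holds, every grapheme in a group must be the same letter, so each group is a set of shapes to compare by eye. The name `shared_signatures` is just an illustrative choice.

```python
from collections import defaultdict

# (strokes, dots) per grapheme, line by line, as counted in the comment above.
COUNTS = [
    [(4, 2), (5, 2), (13, 1), (4, 0), (4, 2), (6, 2), (12, 2), (4, 0)],
    [(14, 3)],
    [(9, 4), (12, 0), (4, 2), (9, 1), (14, 0)],
    [(6, 0), (2, 1), (17, 2), (4, 0), (4, 2), (19, 1), (4, 0)],
    [(12, 3)],
    [(3, 3), (4, 2), (12, 0), (4, 2), (5, 1), (5, 1), (10, 2)],
]

def shared_signatures(counts):
    """Map each (strokes, dots) signature seen more than once to its
    (line, position) occurrences, both numbered from 1."""
    where = defaultdict(list)
    for line_no, line in enumerate(counts, start=1):
        for pos, sig in enumerate(line, start=1):
            where[sig].append((line_no, pos))
    return {sig: spots for sig, spots in where.items() if len(spots) > 1}

for sig, spots in sorted(shared_signatures(COUNTS).items()):
    print(sig, spots)
```

Run against the counts above, it flags four shared signatures: (4, 0), (4, 2), (5, 1) and (12, 0), including the two 5(1)s on the last line.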
-
I have often found it useful in many problem domains to list out the assumptions one is making, so that they may be validated. So, what are the assumptions in play here? And how can we validate them?
1. That each grapheme in the script is separated from other graphemes by a continuous section of space. This means that the fifth line is one character, not two as @bob_bain has suggested. Can we prove or disprove this? Not really, so let’s keep it as a working assumption.
2. That there is a one-to-one correspondence between a grapheme in the script and a letter in the Latin alphabet – i.e. each grapheme maps to exactly one Latin letter, and each Latin letter maps to exactly one grapheme. Can we prove or disprove this? Yes, because Stilgherrian confirmed it when he said “The language is English, just written in a different alphabet”.
3. That the different lines represent new words – i.e., based on the first assumption, the English quote is of the form “xxxxxxxx x xxxxx xxxxxxx x xxxxxxx”. Hmm, that’s a problem that could be solved rather quickly with a regex and the text of Ursula LeGuin’s works. But that would hardly be sporting. Can we prove or disprove this? Not really, but it’s worthwhile to keep in mind other possibilities: that no word breaks are used (ASENGLISHCANBEWRITTENLIKETHISWITHOUTLOSINGMUCHMEANING), or that another grapheme is used as a “word-break” character – the last character of the first line is a possibility.
More thought required here…. but at least I learnt a new word (“grapheme”).
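For the record, the “regex and the text of Ursula LeGuin’s works” approach from assumption 3 might look like the Python sketch below. It assumes one word per line, which gives lengths 8, 1, 5, 7, 1, 7 (these would need changing if the fifth line turns out to be two characters), and it restricts one-letter words to “a” or “I”. The helper names are illustrative, the sample sentence is invented and is not a Le Guin quote, and contractions such as “don’t” would be split at the apostrophe.

```python
import re

# Word lengths implied by assumptions 1 and 3 (one word per line).
QUOTE_SHAPE = [8, 1, 5, 7, 1, 7]

def word_length_pattern(lengths):
    """Regex for consecutive words with exactly these letter counts, separated
    by any run of non-word characters. One-letter words must be 'a' or 'I'."""
    words = ["[AaI]" if n == 1 else rf"[A-Za-z]{{{n}}}" for n in lengths]
    return re.compile(r"\b" + r"\W+".join(words) + r"\b")

def candidate_quotes(text, lengths=QUOTE_SHAPE):
    """Every run of words in `text` whose lengths fit the pattern."""
    return [m.group(0) for m in word_length_pattern(lengths).finditer(text)]

# Invented sample text, not a Le Guin quotation.
sample = "Whenever I write letters I worried."
print(candidate_quotes(sample))  # ['Whenever I write letters I worried']
```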
-
Ah hah
Currently considering FONTS !
e.g.
http://www.searchfreefonts.com/free/bisaya-1880.htm
Type Stilgherrian into the box and you get symbols with dots and lines, BUT if the solution is a FONT then this isn’t the font being used, as it doesn’t produce the required graphemes.
Microsoft Wingdings doesn’t work either !
Some of these non-latin fonts approach the type of grapheme being considered.
(working on it.. working on it… )
-
Stilgherrian’s comment that “grapheme” is indeed a wond’rous word…
reference http://en.wikipedia.org/wiki/Grapheme
“A grapheme (from the Greek: γράφω, gráphō, “write”) is the fundamental unit in written language. Examples of graphemes include alphabetic letters, Chinese characters, numerical digits, punctuation marks, and all the individual symbols of any of the world’s writing systems.”
…noting that “‘Alphabet’ is an interesting word too, and ONE WHICH DESERVES FURTHER CONSIDERATION”, appears to indicate that we can ignore Chinese characters, numerical digits, punctuation marks, and the odd assortment of individual symbols of any of the world’s writing systems.
This rules out such things as Ideograms http://en.wikipedia.org/wiki/Ideogram
“An ideogram or ideograph (from Greek ἰδέα idea “idea” + γράφω grafo “to write”) is a graphic symbol that represents an idea or concept. Some ideograms are comprehensible only by familiarity with prior convention; others convey their meaning through pictorial resemblance to a physical object, and thus may also be referred to as pictograms”
Over the coming weeks I’m concentrating on fonts !
-
Pingback from Stilgherrian · Script Challenge revisited on 24 August 2009 at 8:11 am
-
Fonts are interesting in their own right and I am delving into aspects of Microsoft Word at Penrith Valley Seniors Computing Club – so even though fonts may not have any direct relevance to this puzzle I will be looking at fonts in the ensuing weeks.
With regard to an aspect of alphabets that we puzzle solvers are overlooking, I dug this out of the Internet last night.
http://www.answers.com/topic/history-of-the-alphabet
No doubt the English class mentioned in your entry this morning can find the aspect of alphabets we seem to be overlooking, noting that alp = “ox” and bet = “house” in the Proto-Canaanite incarnation of this phenomenon.
I await insight from those much younger than myself !
-
Ok, I think I understand the principle, but then I should.
I wonder whether you had to go to the library to sort this out back when it was first set?
I’m up for the challenge.
-
Interesting: the Tolkien script uses those accent-like dots and squiggles for vowels, so for example the first word, transliterated as Ennyn, is really [n][n] with an [underlined acute] meaning [e], and a [pair of dots] meaning [y]. I speculated there was something similar in your example: the first symbol (top left), looking like an F with two dots, recurs in a bunch of places, and there are also similar ones with different dots and frou-frou, like the second symbol on the fourth row. I’m dividing the larger symbols into what I think are letters — for example, the first one on the bottom line may be one of those F things with one dot and a loop, preceded by a smaller L shape. Two letters? Three, counting the loop+dot as a vowel? Perhaps. I should really be working…
-
Argh. I used to have a key to this. Possibly still do, probably in the back of some old mathematics notes. From 1978. My goodness this brings back very old memories. I wonder if I can remember the logic behind the script. As I look at it some of it comes back to me. I remember when Danny was creating it.
-
Stilgherrian may not have found it necessary to go to a library to solve this, as I get the impression that he is a fan of this type of fiction and so has a head start over those of us who haven’t quite got the hang of it. I have therefore been delving into the works of Ursula Le Guin and have discovered that in A Wizard of Earthsea Le Guin introduced a “special language”:
http://stuffedhead.wordpress.com/2009/03/21/88/
Part I: Magic Controlled by Speech
One of the most well known aspects of the Earthsea series is the magic system. Le Guin created a system where wizards use a special language called the Old Speech to work magic. As a passage from A Wizard of Earthsea explains,
“In the world under the sun, and in the other world that has no sun, there is much that has nothing to do with men and men’s speech, and there are powers beyond our power. But magic, true magic, is worked only by those beings who speak the Hardic tongue of Earthsea, or the Old Speech from which it grew.”
Now if a person has read Le Guin’s works then he or she would have a head start over the rest of us methinks…
-
Update: checked out Le Guin in the Galaxy Bookshop. She wrote verse about hexagrams, which leads me to the I Ching.
http://en.wikipedia.org/wiki/I_Ching
The text of the I Ching is a set of oracular statements represented by 64 sets of six lines each called hexagrams (卦 guà). Each hexagram is a figure composed of six stacked horizontal lines (爻 yáo), each line is either Yang (an unbroken, or solid line), or Yin (broken, an open line with a gap in the center). With six such lines stacked from bottom to top there are 2⁶ or 64 possible combinations, and thus 64 hexagrams represented.
The oracular interpretation of the symbolic language based on trigram symbols formed from yang and yin components is well known. However, the inherent numerical language of line change and non-change is relatively unknown.
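The “2⁶ or 64” arithmetic is easy to confirm, and the same sum shows why a stack of yes/no marks could in principle carry an alphabet: five binary marks already give 32 combinations, more than the 26 Latin letters. A small Python sketch, assuming yin = 0 and yang = 1 with the bottom line as the lowest bit (a binary numbering chosen for illustration, not the traditional King Wen order):

```python
from itertools import product

# Each line is yin (broken) or yang (solid); a hexagram stacks six of them,
# listed here from the bottom line upwards.
LINE_KINDS = ("yin", "yang")

hexagrams = list(product(LINE_KINDS, repeat=6))
print(len(hexagrams))  # 64, i.e. 2**6

def hexagram_index(lines):
    """Read a hexagram as a 6-bit number: yang = 1, yin = 0, with the bottom
    line as the least significant bit (an assumed convention)."""
    return sum(1 << i for i, kind in enumerate(lines) if kind == "yang")
```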
==================
ah hah ! Progress !!
-
I’m making a very different set of assumptions to yours, Bob – I think it’s a whole lot simpler than that. I don’t think it’s a numerically based script. I did think about twig runes, which work as a counting code, but I don’t think it’s that.
I have now eliminated quite a few possibilities for words, but haven’t found any definite combinations.
I’m proceeding on a largely grammatical/linguistic basis.
-
And it would be very useful to know the tense, but that would feel like cheating.
-
It’s official — Stilgherrian is cleverer than I am. Two days have elapsed and I’m a long way from a solution — though I’ve tested a fair few possibilities.
I’m not being terribly complicated about it — I only wanted to know the tense since it would give me a clue about expected distributions of word endings in English, but I think I’ve answered that question at least. And since it’s a literary text, I wouldn’t expect standard distribution patterns to necessarily hold true anyway.
I now have about six pages of if/then statements, and a lengthy statement of assumptions. I just don’t have any positive correlations.
Last night I dreamed of Thai elves with glottal stops (which is probably better than Mandalay).
-
@Stilgherrian “Is any of that a clue? I have no idea”
You make mention of vowels, reminding us that alphabets comprise vowels and consonants, which may be the issue we have been overlooking when it comes to alphabets. As an Internet page somewhere notes, there are a considerable number of “vowels” in the International Phonetic Alphabet, which, as an aside, is referred to on Ursula Le Guin’s home page. There are over 40 vowel sounds, I believe.
As I write the word “sounds” I am reminded that alphabets attempt to document sounds that can be produced by the human vocal cords, as opposed to the symbolism of number found in mathematics.
Perhaps we should be looking at intonation — the rise and fall of the strokes and attempting to reconcile this back to human speech — in English.
An approach I took yesterday was to draw a thin blue line through the middle of each row of graphemes and examine the marks above and below the line.
Quatrefoil appears to be attempting a solution using FORTRAN, a computer language which derived its name from FORmula TRANslator. Perhaps we should be using LISP, the language for Artificial Intelligence (“Lots of Irritating Stupid Parentheses”).
-
After a fruitless search I am keyless. I can even picture in my mind’s eye the old notebook I need – Stats 1H. But of all the old books in a box in the cellar, that one is missing. Found lots of books of notes with my handwriting in them, but no recollection of ever taking the course. Applicable Analysis anyone? Actually looked to be pertinent too.
My very faint recollection of the explanation Danny gave me was indeed that relative position above and below the dominant line is a key part of the encoding. I think I have the main structure of the text sorted, but fitting the minor consonants isn’t so trivial. One clue to the provenance of the quote is that I will bet the text we see was inscribed in 1978. So that seriously limits the books from whence it came.
-
@ Stilgherrian
And since it’s your handwriting, can you tell me if a sharp bend is intentionally different from a smooth curve, or is that just a variation in the hand?
-
@Quatrefoil
I’m familiar with simple logic and there is a formal system of symbol manipulation that assists. Some of this can also be performed with a Venn Diagram
http://en.wikipedia.org/wiki/Venn_diagram
To test simple logical statements I find Venn diagrams and Truth tables understandable.
However, as far as “logic” is concerned, as Wikipedia notes:
http://en.wikipedia.org/wiki/Logic
“Just as we have seen there is disagreement over what logic is about, so there is disagreement about what logical truths there are”
As far as digital computers are concerned the breakthrough in logic came with Boole and the application of Boolean algebra to electronic circuits
http://en.wikipedia.org/wiki/Digital_electronics
http://en.wikipedia.org/wiki/Boolean_algebra_(logic)
“Boolean algebra (or Boolean logic) is a logical calculus of truth values, developed by George Boole in the 1840s”
As it was explained to me once in an elementary electronics course, someone put it to George Boole in the 1800s: “That’s fine, George, but what possible use is this?” It wasn’t until the advent of the digital computer that Boolean algebra became valuable, in the AND, OR and NOR gates which control electronic circuitry. Boolean logic is also used in programming, but this is a slightly different concept, as it involves human beings who can and do (too frequently) get it wrong.
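To make the AND, OR and NOR gates concrete, this Python sketch prints their truth tables. The `GATES` dictionary and `truth_table` helper are illustrative names, not a standard library API.

```python
from itertools import product

# Each gate takes two Boolean inputs and returns one Boolean output.
GATES = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "NOR": lambda a, b: not (a or b),
}

def truth_table(gate):
    """Rows of (a, b, output) for every combination of two Boolean inputs."""
    return [(a, b, GATES[gate](a, b)) for a, b in product((False, True), repeat=2)]

for name in GATES:
    # Print 0/1 rather than False/True for readability.
    print(name, [(int(a), int(b), int(out)) for a, b, out in truth_table(name)])
```

NOR on its own is enough to build the others; for example, NOT a is NOR(a, a).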
We also have Fuzzy logic http://en.wikipedia.org/wiki/Fuzzy_logic
This is widely used in the application areas listed – mostly in the electronic sphere.
When it comes to digital computers attempting to emulate the human brain then we enter the world of Neural Networks and Genetic Algorithms
http://en.wikipedia.org/wiki/Neural_network
http://en.wikipedia.org/wiki/Genetic_algorithm
In the end, understanding the world is best left to the human brain, which is the computer most used by our biological species.
It is from this base that I am attempting to understand the symbolism, and I am thinking of attempting to emulate it via sounds, exploring the wave forms produced and comparing them to the symbols in the script. All such wave forms can be built up from sums of sine curves, I believe.
http://en.wikipedia.org/wiki/Sine_wave
Check out the diagrams in the article above and perhaps examine sine, square, triangle and sawtooth.
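Strictly, square, triangle and sawtooth waves are not sine curves themselves, but each can be built up by adding sine curves together (a Fourier series). As a sketch, assuming a square wave that swings between −1 and +1 with period 2π, the partial sums below creep towards 1 at t = π/2 as more odd harmonics are added. The function name is illustrative.

```python
import math

def square_wave_partial_sum(t, terms):
    """Fourier-series approximation of a unit square wave with period 2*pi:
    (4/pi) * sum of sin(k*t)/k over the first `terms` odd values of k."""
    return (4 / math.pi) * sum(
        math.sin(k * t) / k for k in range(1, 2 * terms, 2)
    )

# More harmonics bring the value at t = pi/2 closer to the square wave's +1.
for n in (1, 3, 10, 100):
    print(n, round(square_wave_partial_sum(math.pi / 2, n), 4))
```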
I don’t believe any type of “formal logic” will work in this situation.
After all that I am having difficulty too.
@Stilgherrian Sir. Can we apply for a government grant to help solve this ?
-
@stilgherrian There was a government grant ($20,000 I seem to recall) once awarded for the production of a rather glitzy pornographic magazine.
It had elements of artistic merit.
-
So, after a couple of weeks away, I’ve come back to have a look with fresh eyes.
Two things jump out – Stilgherrian’s comment right at the top that it was “English, but written in a different alphabet”, and the continued hints about “alphabet”. So perhaps the alphabet used has graphemes for sounds that we normally might represent by a digraph in English – for example, “sh” or “ch”. Hmmm….
But of particular interest in Wikipedia is this article on Devanagari, the alphabet used to write Hindi, Urdu and Sanskrit. The article notes that the alphabet is “recognizable by a distinctive horizontal line running along the tops of the letters that links them together”. Now where have we seen that before?
-
Bah, it’s not used for Urdu at all. Some of this damned dust must have got into my brain…
-
My understanding of it is that each character represents a word. The characters are geometric; the letters are represented by the lines branching off the middle line. I figure this because many of the characters have similar ‘parts’ to them (such as the wedge < on the last character on the first line, amongst others, the '4' turn which is juxtaposed to the loopy turn, and of course the dot).
I figure the first character ("double-dot f") and the fifth character (wedge-f) are often-used words such as 'the', 'in' or 'of'. That the characters are words in themselves is supported by several characters with a huge number of strokes (far too many for each to realistically be a single letter).
Your continued hint at 'alphabet' makes me think that in English, our alphabet is one letter per sound. But in other alphabets, the characters represent groups of sounds, whole words, meanings. So that supports my guess that the individual characters don't merely represent a letter. Also, the characters are not grouped together in any way to represent words.
-
@Stil
Thanks for clearing up the difference between ‘sound’ and ‘phoneme’, I should have used that word seeing other people had used it.
Since you keep showing us the difference between ‘sound’ and ‘phoneme’, I’m led to think that this is extremely important. I’m not sure if it’s the correct word, but I’ll use ‘glyph’ to represent the different partitions of a grapheme. One glyph represents either one phoneme (e.g. ‘th’ in ‘then’, a in ‘father’) or a group of phonemes, but I suspect the former, because I don’t think there are enough glyphs to represent all possible groupings of phonemes.
The center-line is obviously relevant, because sometimes a glyph crosses that line, and other times merely sits under or over it. I am not going to count the center-line as a possible phoneme, as it exists in every grapheme.
The one thing I’m having difficulty with is separating the glyphs. Many are easy (the jagged curve in L1G3 (Line 1 Grapheme 3) and at the beginning of L1G7, amongst others). However, some are difficult. Take L1G1: not counting the centerline, there is a south-west quadrant dash, a north-east quadrant dash, a vertical line and two dots. The dots are probably vowel indications, given your love of Quenya and Sindarin, but I’m unsure if each of these is a phoneme in itself, or perhaps joins with another stroke to create a phoneme (e.g. the vertical and south-west dash could join to become ‘d’).
But I figure that L1G3 is made up of the following glyphs (I’ve separated them): http://imgur.com/rxyrW. Perhaps glyphs 2 and 3 belong together as one glyph?
[Stilgherrian writes: To save you having to click through, here's Daniel's (incorrect) breakdown of L1G3.]

-
Far too many or just one or two too many? I can see now that pieces two and three in my picture are in fact one piece.
Regardless, I don’t see myself getting anywhere with this one. There are far too many loaded assumptions for me to get my head around. I’ve sat here for half an hour staring at it and now have more questions than answers.
Why does the downward squiggle in L1G1 sometimes break the center-line, while at other times, as in L4G3 (the second instance of the glyph), the centerline continues?
What difference does it make when a glyph starts below the line, or when the same glyph goes through the line, or when it’s not even on the line?
Are the dots the same glyph, or different glyphs when they’re in different places?
-
A very vague memory (from 30 years ago): this wasn’t designed to be a code. It was designed to be beautiful typography, so ligatures are a reasonable thing. It was designed by a computer scientist. There is underlying structure and rationale to it. There is beauty in that too.
I’m trying to avoid looking at it again. Too much other stuff to worry about without going down that slippery slope.
-
Just in case anybody is reading this while I’m working on the next bit, Stil’s reference to George Bernard Shaw is a reference to the Shavian alphabet, a phonemic/phonetic alphabet. http://en.wikipedia.org/wiki/Shavian_alphabet At first I thought it was in reference to the diacritics that the Tengwar script uses to denote vowels, but the Shavian alphabet does not have these.
Something interesting about the Shavian script is that whether a letter is written above or below the line denotes whether it is a ‘hard’ (voiceless) or ‘soft’ (voiced) sound (e.g. the ‘th’ in ‘thick’ versus the ‘th’ in ‘they’).
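The above/below idea can be modelled in a few lines. This Python sketch assumes my reading of the Shavian chart is right: for the eight consonant pairs that differ only in voicing, the voiceless letter is “tall” (rising above the line) and the voiced one is “deep” (dropping below it). It is an illustration of the principle, not a key to Stilgherrian’s script; the pairs are written in IPA and the names are invented.

```python
# Voiceless/voiced consonant pairs in IPA (plain ASCII "g" for the voiced
# velar stop). Assumption: in Shavian the voiceless member is a tall letter
# and the voiced member is the same shape turned into a deep letter.
VOICING_PAIRS = [
    ("p", "b"), ("t", "d"), ("k", "g"), ("f", "v"),
    ("θ", "ð"), ("s", "z"), ("ʃ", "ʒ"), ("tʃ", "dʒ"),
]

POSITION = {}
for voiceless, voiced in VOICING_PAIRS:
    POSITION[voiceless] = "above"  # tall letter
    POSITION[voiced] = "below"     # deep letter

def stroke_position(phoneme):
    """'above' or 'below' the line for a paired consonant, None otherwise."""
    return POSITION.get(phoneme)

print(stroke_position("θ"), stroke_position("ð"))  # above below
```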
The Elvish language is also phonetic, and uses a similar system to change a hard sound to a soft sound. From what I have briefly read, either a line is added or the letter is spun around…
I have to look more into this.
‘Everything has meaning’ is most likely a reference to the fact that every stroke has meaning, except for one, the centerline. I’m trying to make a list, but in some characters it is still hard to separate the different ‘ligatures’. I still am not sure about L1G1: not counting the middle line, there are three lines plus the two dots. Is this three phonemes? Four? More? Less? Gah. L1G2 is similarly difficult. I’m counting three ligatures – the first vertical line, the second vertical line with the ‘4’ loop, and the two dots. Three phonemes?
I’ll give it to you Stil. I’m hooked.
-
I find myself wondering if the script includes punctuation as well as phonemes?
-
Am I looking at it correctly, then, in that there are three sentences total?
-
I’m onto it. Slow but getting it. The double dots are the most mysterious parts, but I’ve found some charts on the net that are helping me decipher the straight and curved lines. I think I probably know what you mean about the “sound” aspect of this code: when I looked at some of the really squiggly bits I recalled seeing something like it that my mum used to do, but she never taught me how. Grr…
-
I’ve gotten them all broken down into what I think are the various ‘characters’. They’re relatively consistent, so it wasn’t.. awful.
My one confusion with this is the same as @Murfomurf was saying- the dots. I have them classified as numbers. Number two goes 2, 2a, 2b, based on the dots, etc.
@Stilgherrian, I know you’ve given vague hints before, but would you mind hinting *once more* about how you’ve altered the alphabet? You said it was minor, but does ‘minor’ imply that the alteration was only for vowels or consonants- or for a small number of both?
I ask because I’ve gotten ‘spaces’ written out already with each character’s temporarily-assigned number, but am unsure how to proceed from here.
-
Alright, here’s my proposed list of characters.
https://docs.google.com/drawings/d/1g6j7-PVi7C9VwOarvJZE9zIjTOsMt6JBzkBdB7oRiO8/edit?hl=en_US
Murfomurf, were you getting something similar?
-
I am assuming that each glyph is a word. It seems that most have drawn that conclusion.
No one seems to have mentioned the fact that the centre line is not drawn from the start to the end of all “words”. It may not be relevant, but it seems that a horizontal line is present in all “words”, but there are breaks in some.
Is the break symbolic of syllables? L1G3, for example, starts with a character (without the horizontal), then a series of characters (with a horizontal), then a break to the next series of characters (with a new horizontal).
I would also like to explore the idea that the horizontal line actually splits a vertical line in two – in the image above, number 5 is not 1 vertical line, but actually 2 – one above, one below.
Unfortunately I am as yet unable to do anything to develop these thoughts further, but wanted to share to see if anyone had an opinion on them.
-
One thing I haven’t noticed much about in the comments so far, but which may hook back to some early clues: if it is English written in an alphabet, nothing says it is written in a direct analogue to the *current* English alphabet of 26 characters. There are, after all, 44 phonemes in English, and some of those currently represented by pairs of graphemes were at one point represented by single graphemes that are no longer in use (for example, the “thorn” character used in Old English) or as ligatures (the “ae” ligature rather than an “a” followed by an “e”). Jason L. did touch on the latter with the comments about languages whose characters appear to have line-linkages, sort of, and ligatures serve(d) the same basic purpose in English, albeit usually being a typographical convention.
Since the main place that I have seen either of these written out is in older English, which was Tolkien’s primary subject of study as a linguist — especially the thorn case — it seems plausible that attempting to do a character substitution against English written using some or all of the possible variations if you include older characters and/or ligatures might result in a more productive analysis.
Or it could even be as simple as a true phonetic alphabet for English, in which case you need to use a slightly different form of the classic frequency mapping, as several of the high-frequency characters actually have multiple phonemes associated. I don’t know that I’ll have time to delve into it, but some thoughts in case anyone else cares to try the tack.
-
My working assumptions, wrong as they may be:
1. The “clearer” spacing is handled at least roughly as in English: to wit, it designates word boundaries. The “one word per line” just doesn’t make sense to me; it would produce a quote that is far too short for most of the significant things in LeGuin’s work, and there is far too much complexity within a single line for it to realistically represent one word in something written in English. Welsh, perhaps, but not English.
2. The “disconnect spacing”, such as that seen in the third grouping on the first line, does *not* indicate a word break. It may or may not indicate a phoneme break or a syllable break.
3. There is a mapping to either the full IPA or the set of 44 English phonemes, though given that it is a text from English, presumably the English subset of the IPA would be sufficient.
4. There is probably significance to the repeated patterns that appear in different compositions, such as the frequently-seen vertical stroke with a cross-stroke at the top and one or more dots, which appears in both of the first two groupings, the second time with a diagonal linkage, which is then itself repeated *without* the lower portion in the middle of groupings later in the same line. There is also a rounded-linkage variant that appears multiple times later on.
5. That blue line is actually positioned as a midline (or centerline). A baseline generally occurs lower down; roughly around where the left horizontal stroke on the opening grapheme appears, running just below the “leftward V” on the last grapheme of line 1, etc.
6. Beware the serifs. The original appears to have been done using a calligraphic style, possibly a classic ink pen, evident in the smoothly-broadening strokes and small “tails” in some spots that run exactly across the end of the stroke. See the opening of the third grouping on the last line for a very clear example of this. Details are important, but not *all* details necessarily have the same importance. Anyone know if S. is right-handed? The angles are roughly correct for a right-handed person writing in a classic humanist style (with the nib at a 30-to-45 degree angle to the vertical of the writing).
7. The comment S. made highlighting the two dissimilar corners (sharp vs. rounded) is more obvious if you have an understanding of the physical strokes used to produce the two types with a nib pen: assuming a right-handed writer, the sharp form goes to the left at the base of the stroke, then comes back across to the right, while the soft form goes directly to the right and probably involves a slight (possibly not entirely comfortable) twist of the wrist and fingers to keep it from producing an excessive ink blot or having a strange “wobble” to it.
8. Some of the “connective tissue” may be optional; the same character appearing at the start or end of a word may have different components than when it appears in the middle of a word. Consider classic English cursive writing for an example of this.
9. It may or may not be relevant, but the Old Speech from which Hardic is derived was (is) the language of *dragons*. Hardic is the human adaptation of it. The nature of a stroke or dot may have to do with how someone would imagine them being written by dragging or tapping/pressing with a claw, rather than a pen or a human hand. Alternatively, if inspired by runic languages, it may have the same sort of thing going on in a different way.
-
Hello. I’ve reached this site some days ago, following the Google name policy rant. It’s a shame this script challenge is still unsolved after all these years and with all these clues, so I am trying my hand at it. My considerations so far are the following:
1. The text reads left-to-right, top-to-bottom, just like most Western scripts. I take this for certain, because the pen strokes were clearly drawn that way. The centered text of lines 2 and 5 makes it look like a title page, which would be an enormous clue.
2. The text is made up of 29 words, which can be identified by whitespace as per @Joel’s assumptions 1 and 2, which are completely correct in my view. In the following figure I have placed each word in a green-bordered box. Maybe this breakup is not completely correct, but I’m confident it mostly is. The first word is also the most frequent (5 occurrences), and my guess is that it represents the article “the”. The fourth word occurs 4 times, and it’s probably a preposition (of, in, on…).
3. The most prominent feature of the script is the line (code-named “blue line”, painted in blue in my figure) which appears in every word and is sometimes interrupted. I’ll call “ascenders” the strokes above it and “descenders” those below it. Even if a long slash at the beginning of L1W2 really is only one pen stroke, I will analyse it as two strokes, an ascender and a descender. These are standard typography terms. A unique feature of this script is that, while descenders can live either with or without a blue segment above them, an ascender always requires one. The reverse is not true: in two instances, L1W3 and L6W4, blue segments start without being triggered by ascenders.
4. In almost every case where an ascender and a descender are drawn one above the other, possibly in one pen stroke and probably as part of the same letter, one of them is “more complicated” than the other. This prompts me to classify the candidate letters of this script into four groups:
(i) strictly negative: consisting of descenders only. These are the only letters which can live without a blue segment. All other categories require one, because they involve ascenders.
(ii) extended negative: descenders, with an additional ascender (a simple vertical segment)
(iii) strictly positive: ascenders only (a rich inventory of hooks, loops, dotted variants…)
(iv) extended positive, the same as (iii), with an additional descender, which is a vertical segment (sometimes with a dot)
5. Where do I put the simple slash which begins L1W2? It is an extended positive to me, because this script does have ascenders consisting of a single vertical stroke, while I don’t see descenders of this kind. If this is not clear, never mind, it’s only an attempt to rationalize impressions. I might be wrong about this detail, but I’m after an overall picture now.
6. What is urgent now is to break up words into letters, and identify which letters are instances of the same character. My attempt at the first step is this figure, where letters in odd positions appear red, so that they can be distinguished easily from letters in even positions (positions are relative to words). To point to an individual letter I’ll use a notation like L1W2.1, as in the assertion: “L1W2.1 (the first letter of the second word in the first line) is almost certainly an instance of the same character as L2W1.2” (an assertion I hold to be true, BTW…).
7. I have two more pictures. In the first one I erased the blue segments altogether. This is clearest to my eyes when it comes to splitting words into letters, but in this way some information is lost, because I don’t know any more which strictly negative letters had a blue segment above them, and which didn’t. So here is fig. 3, where only blue segments above strictly negative letters are drawn.
8. What about L1W1.1, a horizontal descender with no blue segment above it? Is it a separate letter or part of the extension descender of L1W1.2? I’ve decided for the former, but since it appears only in this one (frequent) word it could also be a special abbreviation, so I don’t care much.
9. I’ve analyzed the initial squiggle of L1W7 as a ligature of two letters, because I think I have instances of those same characters elsewhere in the text, while there are no other instances of the ligature. Again, I might be wrong. Such uncertainties are normal.
10. This post is already too long, and I still have many more observations about the graphic features of this script. I will post them in the near future (I hope), together with my character count, that is, my guess at which letters are instances of the same character, to estimate how many characters are in the script. What is certain is that this script has internal structure: strict characters have corresponding extended ones, dotted characters correspond to dotless ones, and so on. This means that even if not all characters of the script are represented in this text, I might figure out what the missing ones look like. And even before actually counting, I can tell that the number of characters could be much higher than the 26 letters of the English alphabet.
11. So this is my final consideration for now: the script is either an alphabet or a syllabic script, a static encoding (or simple substitution cipher, if you like) of something which is probably not ordinary English spelling (that would be the case if the number of characters were 26 or less). It is more likely that the script is a way to spell English phonetically, just as Deseret or Shavian are, with all the complications that entails: phonemic inventories differ greatly across the English-speaking world, but in any case we need an alphabet with at least 40 characters, like the one we are dealing with here, according to my count. It might be a strict transcription of Danny’s presumably r-dropping Aussie accent, or an attempt to picture some kind of abstract pronunciation standard. More importantly, it is quite possible that the internal structure of the script, or only its least aesthetically motivated parts, maps to the internal structure of either syllables or phonemic inventories, or both. This happens in Tengwar, in Shavian, in Hangul, Ethiopic, Inuktitut and many other scripts.
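The bookkeeping in points 2 and 6 (splitting the text into words and letters, referring to letters as LxWy.z, and spotting the most frequent words) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: each glyph has already been given a label by hand, identical labels mean “same character”, and `parse_ref`, `letter_at` and `word_frequencies` are hypothetical helpers. The toy data is a placeholder, not a transcription of the challenge text.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Optional

class Ref(NamedTuple):
    line: int              # 1-based line number
    word: int              # 1-based word position within the line
    letter: Optional[int]  # 1-based letter position, or None for a whole word

_REF = re.compile(r"L(\d+)W(\d+)(?:\.(\d+))?")

def parse_ref(text: str) -> Ref:
    """Parse a reference such as 'L1W2' or 'L1W2.1'."""
    m = _REF.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an LxWy[.z] reference: {text!r}")
    line, word, letter = m.groups()
    return Ref(int(line), int(word), int(letter) if letter is not None else None)

Doc = List[List[List[str]]]  # doc[line][word][letter] -> hand-assigned glyph label

def letter_at(doc: Doc, ref: Ref) -> str:
    """Look up one glyph label, converting the 1-based reference to 0-based indices."""
    if ref.letter is None:
        raise ValueError("reference points to a whole word, not a letter")
    return doc[ref.line - 1][ref.word - 1][ref.letter - 1]

def word_frequencies(doc: Doc):
    """Count identical words (same sequence of glyph labels), most frequent first."""
    return Counter(tuple(word) for line in doc for word in line).most_common()

# Hypothetical toy transcription; labels are placeholders, not real glyph readings.
toy = [
    [["a", "b"], ["c", "d", "e"], ["a", "b"]],
    [["f"], ["a", "b"]],
]
print(letter_at(toy, parse_ref("L1W2.1")))  # c
print(word_frequencies(toy)[0])             # (('a', 'b'), 3)
```

In a real attempt the top entries of `word_frequencies` are the candidates for “the” and “of”, which is exactly the kind of crib described in point 2.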
More soon!
-
I’m intrigued by this script challenge, but I fear my lack of ability will leave me stumped. There are a lot of comments here, and before I attempt to decipher this text, I felt that I had to come up with my own assumptions first. Based on what has been said so far, have I extracted the following statements correctly?
1. This should be a simple translation. You noted that it is simple “beginner material”
2. The script translates to English, but not in the form we currently use. (Perhaps Anglo Saxon English – 24 characters)
3. The dots indicate a vowel, or the placement of a vowel
4. The “glyphs” represent a sound rather than just a substitution to a different letter
5. There are 3 sentences
6. There is no punctuation
7. There are no numbers
I hope people are still interested in solving this. I realise that I have a lot to learn on the subject so apologies if my comments appear ignorant.
-
@Alex Holsgrove: I know as much as you, but I’d like to comment on your points, just to clarify my own ideas:
1. This should be a simple translation
Yes. Confirmed by Stil several times in this thread.
2. The script translates to English, but not in the form we currently use.
Yes, Stil wrote it’s a quote from Ursula K. LeGuin’s work, in English, but “not strictly a letter-substitution cipher” and what you write satisfies both hints. However I don’t agree with
(Perhaps Anglo Saxon English – 24 characters)
This was indeed mentioned in one comment, but Stil neither confirmed nor denied it. I don’t think that the encoding process involves translating LeGuin into Anglo-Saxon, nor adapting modern English into Anglo-Saxon spelling. As many have written before (including me), a phonetic rendering is more probable. Read the answer to point 4 below.
3. The dots indicate a vowel, or the placement of a vowel
This was mentioned, neither confirmed nor denied. I am not working in this direction. You can if you like.
4. The “glyphs” represent a sound rather than just a substitution to a different letter
This is a possibility and I am working in that direction. Truth is, we don’t know yet.
5. There are 3 sentences
This was also mentioned and not confirmed. Stil hinted that “In this particular example, if there is a new sentence, it would just start on a new line”, so maybe there are 6 sentences, one per line. Or, the whole stuff is a title page with no proper sentences…
6. There is no punctuation
Yes. No punctuation. Confirmed by Stil.
7. There are no numbers
Nobody mentioned that before. I’m in fact working as if there are no numbers, but who knows?
-
I noted the repetition of the first word, but I believe it may be 4 occurrences and one *similar* glyph: the left portion of the horizontal stroke on L4G5 appears to have a “sway” (as in “sway dash”) rather than a straight stroke. Obviously this *could* just be an artifact of the reproduction, but since similarly minor variations have been previously hinted as being important distinctions…
I toyed around with the first word being ‘the’, but it just doesn’t feel right. The structure, and the ways in which it appears elsewhere, make me think it is probably a single phoneme, which pretty drastically reduces the available options. This could be way off base, however.
One idea I toyed with is the possibility that the glyphs have some relationship to the actual *glyphs* of the IPA. In particular, the single-high and single-low dots might relate to the primary and secondary stress markers, placed on the right-hand side of the glyph rather than before it as the IPA does — especially since in some American English dictionaries, the stress marker comes *after* — and a break in the horizontal line might equate to a syllabic separation marker. Alternatively, the double-high-dot might be a primary stress, single-high-dot might be secondary stress, and a low dot might be a syllabic consonant?
Just noticed, while looking at the possibilities of that, that there are only two cases where a low dot happens *without* at least one high dot, and both of those have “crowded” upper areas… but that feels too complex. Real scripts don’t generally move indicators around, and if it were a “displaced” single high dot then there would be no reason to have the “single high dot with low dot” that shows up at least twice. Interestingly, both times *that* pattern appears are on the same base glyph, which appears alone with a single high dot as L4G2. In any case, given the pattern of usage I strongly suspect the dots to be diacriticals of some sort, rather than part of the “base” glyph.
In fact, L4G2 almost *has* to be a single phoneme, and since it is also a full word, that rather drastically limits the possibilities. My guess based on the notion of “glyph shapes being related” would be ['ai] (sorry, I couldn’t figure out if Unicode had a proper representation of this) with the “left rounded bit” coming from the a and the vertical stroke from the i, but there are other possibilities as well.
The same theory might lead to L1G4 (probably the second most repeated grouping) being read as [ʃʌ] (“so”), derived by shortening the “leading tail” on the first glyph, rotating the second glyph 90 degrees counter-clockwise, and adding a midline stroke to indicate that they are a single syllable.
Again, this could be way off base; there is nothing saying the script has any *glyph* resemblance to IPA or any of the other common phonetic alphabets. It is just a theory I tinkered with that produced a couple of plausible mappings.
-
Another thought that just struck me, though it *really* may or may not be significant even if correct: in general, calligraphic forms for letters start near the upper-left, sometimes with a ‘curl’, but then proceed either downward or rightward. L4G2, however, appears to be a continuous ‘flowing’ stroke connecting both the vertical and horizontal strokes through the leftward circle. So either Stil’s hand is *very* good (able to lift the pen and reset to the ‘starting position’ without producing any blotting due to excessive ink on a stem that is very narrow *and* lining it up pretty much perfectly with the curl) or this is actually a single stroke changing from vertical to horizontal through the curve.
The latter explanation would be much more typical of a handwritten script, but it implies that either the vertical is a *rising* stroke, or the horizontal is a *reverse* (leftward) stroke. The possibility of it being a rising stroke is only significant if the script *is* based on ascenders and descenders affecting things, which I’m not convinced of, and then in the context of “what if ascender/descender involves the stroke itself, not the position relative to the centerline?”
Otherwise it is just an interesting quirk of the calligraphy system for the script. Stil, any chance you remember whether you were trying to ‘trace’ it from the prior copy, or actually ‘write’ it out in your own hand (i.e., did you focus on replication of the original image, or of the writing pattern of it)? Just for my own curiosity.
Oh, and now I’m wanting to define a font file for it, dammit. As if I didn’t have enough to fill my time…
-
Toying with the possibility that L4G1 and L4G2 are formed based on the “vertical stroke with a left closed curl at the top” being ɪ, the “zig-zag” at that particular spot being ʒ, and the “horizontal stroke across the top” being ɾ, so that the two groupings are “ɪʒ ɪɾ” (“is it”).
Part of the idea being the combination of hints about it *not* being a cipher so much as especially flowery/fancy calligraphy for an alphabet. I’m assuming the IPA for lack of anything that seems to fit better, but if anyone knows of something with a better match, speak up!
-
Joel, you seem as hooked as I am now. Over the weekend I made a start on removing the “mid-line” and extracting each glyph (forgive my terminology), so I am slowly building a collection of large glyphs, those above the mid-line and those below.
I’m pretty sure it’s not a simple substitution of letters, but I’m quite keen on the idea of substitution into phonetics. Because it’s only 3 sentences, I don’t know how well a frequency analysis would work. Do you think that would be worth pursuing?
Any thoughts on the dotted letters? I translated the Thai that Stil posted “Liver, sweet and very tasty.” – I had to remove the exclamation mark to get it to translate.
-
Clarifying my thought about cipher vs. calligraphy:
A cipher can be “secret writing”, but it can also be properly used to describe any transliteration, even just going from one alphabet to another. For example, saying 0x63697068 0x65720000 is a cipher for “cipher”, even though it is a trivial 1:1 in-order mapping of the ASCII values into hexadecimal. Then again, saying it as 0x83899788 0x85990000 is also a cipher, but is probably rather significantly less obvious to most folks (me included). Figuring that one out is left as an exercise for the reader.
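The trivial 1:1 mapping in the first example can be reproduced in Python. A minimal sketch, assuming the grouping into 32-bit words padded with zero bytes that “0x65720000” suggests; `to_hex_words` and `from_hex_words` are my own hypothetical names. Because it is a fixed, public 1:1 mapping, it is a cipher in the transliteration sense above, not an encryption.

```python
from typing import List

def to_hex_words(text: str, width: int = 4) -> List[str]:
    """Encode `text` as ASCII and group the bytes into zero-padded words of `width` bytes."""
    data = text.encode("ascii")
    data += b"\x00" * ((-len(data)) % width)  # pad the last word with zero bytes
    return ["0x" + data[i:i + width].hex() for i in range(0, len(data), width)]

def from_hex_words(words: List[str]) -> str:
    """Inverse mapping: strip the 0x prefix, join the bytes, drop the zero padding."""
    data = b"".join(bytes.fromhex(w[2:]) for w in words)
    return data.rstrip(b"\x00").decode("ascii")

print(" ".join(to_hex_words("cipher")))  # 0x63697068 0x65720000
```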
By comparison, calligraphy is an ornamented or decorative rendering of a glyph from any alphabet, which I suppose is technically orthogonal to the question of whether something is actually a cipher. Calligraphy may go so far as to rearrange portions of the glyphs somewhat, though that isn’t all that common, but I was assuming that it may have been done to at least some degree in this case.
However, *encryption* does in fact require that the encrypted form be unknown to all but a small group, as the “crypt” root in this case means “hidden”. Encrypted information pretty much requires applying cryptanalytic skills to extract meaningful-to-the-public information from it. If using a 1:1 transliteration (whether that’s 1:1 by grapheme or phoneme), the line is very blurry and basically boils down to “is the alphabet in question sufficiently common or obvious from context that the audience could be expected to know it or readily find it”. So, for example, something based on the IPA (or any of the alphabets that are well known and can encode English phonemes) would not be encrypted, since we’ve been told that it is “plain English” and almost certainly phonetic, while a transliteration using the “dancing men” cipher would be an encryption, or at least bordering on it; it would depend on how much of the audience would be expected to recognize it and know where to find the key for it.
All of that said:
The approach I had been toying with involved assuming that the message was written out in IPA or some close kin (definitely not a cipher in the “secret” sense, though certainly not many folks can sight-read it without a reference; I certainly can’t), but written using a calligraphic style with some additional items like the mid-line. I haven’t decided yet whether this is actually going to bear fruit; it definitely leads to coming up with several possible substitutions in short order, such as the ones I discussed in previous posts, but there are a lot of pieces that I haven’t been able to pin to much of anything yet.
My basic approach was to look at some of the shortest groupings, especially ones that were either repeated stand-alone or repeated as parts of longer groupings, and try to map those as a “hook” into the rest. The idea being that very short groupings represent words that can only have a couple of phonemes, at most, which both drastically reduces the number of possible phoneme combinations that could go into it, and *very* drastically reduces the number of English words that they could possibly form. English actually has remarkably few one-or-two phoneme words, as far as I can come up with, but several of them are very commonly used ones, which makes sense.
My approach to the dots was to assume that they mapped in some fashion to the modifiers for IPA (reduction, primary/secondary stress, one other I’m spacing on at the moment), and, given their positions, that they were probably being treated as diacritics (composed *with* the glyph they modify, the way an umlaut is, rather than preceding it as they do in normally-written IPA). I had toyed with various mappings (double dots being the reduction modifier, since it is two dots, vs. a single low dot being the reduction modifier), but hadn’t come up with anything terribly concrete yet.
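The difference between a mark composed *with* its glyph and one written before it has a direct analogue in Unicode. In this sketch the combining diaeresis (U+0308, the umlaut dots) attaches to the preceding base letter, and NFC normalization fuses “u” + U+0308 into “ü”, whereas the IPA stress marks ˈ and ˌ (U+02C8, U+02CC) are ordinary spacing characters placed before the syllable. It only illustrates the analogy; whether the challenge script’s dots behave this way is exactly the open question.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"  # the umlaut dots, attached to the preceding letter
PRIMARY_STRESS = "\u02c8"       # ˈ, written before the stressed syllable in IPA
SECONDARY_STRESS = "\u02cc"     # ˌ

def is_attached_mark(ch: str) -> bool:
    """True if `ch` is a combining mark, i.e. it composes with the preceding base glyph."""
    return unicodedata.combining(ch) != 0

for ch in (COMBINING_DIAERESIS, PRIMARY_STRESS, SECONDARY_STRESS):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: attached={is_attached_mark(ch)}")

# A base letter plus a combining mark normalizes to a single precomposed character.
print(unicodedata.normalize("NFC", "u" + COMBINING_DIAERESIS))  # ü
```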
Obviously this may or may not be even remotely on base, but if I make the assumption that it is in a completely arbitrary system of writing that was created by the original person who wrote it out… well, I don’t actually enjoy doing cryptograms, generally; I just don’t find them that interesting. So I’m going with the approach that appeals to me, since if I’m right it is satisfying, and if I’m wrong I’m no worse off than if I tried to do it as a phonetic cryptogram.
-
just stumbled across this. has it been solved yet? what is the prize?
-
Got dragged into a major project at work shortly after my last comment, but I haven’t forgotten this, or given up… pondering what other tools I may be able to bring to bear in decomposing the graphemes, at the moment.
-
Same here.
My motivation is “joyous problem solving” and my life is presently organized on a “duty first, pleasure next” basis, so there is not much time left for this challenge. However, I am still on board too.
-
True. I just hadn’t managed to find them in a form I could apply some of those to. Besides, it felt like cheating, if the idea was to actually figure out the script.
-
Perhaps as someone with little skill in phonetics, translation and cryptanalysis, I may be at an advantage by, as you say, keeping things simple.
I had started to just copy each of the “symbols” (again, please excuse my terminology) with the aim of converting them into a “sound” and then simply try and read the sentences.
The hard part would then simply be trying to work out where these “sounds” can appear on their own, or as a part of a word.
Having said that, I think Joel will probably nail it soon…
Would I be right in saying that where that “mid-line” connects the different “symbols”, we are looking at a word? Where they are not connected, and we see isolated “symbols”, these are words like in, at, of and so forth?
-
To answer your final question, the top line has eight words.
That was my count too (see my very first post). Thank you very much for confirming it.
-
A question that I keep running up against: if this is some form of transcription in a phonetic alphabet, is it transcribing the passage as spoken in AuE (Australian English), RP (Received Pronunciation / British English), GA (General American English), or based on a specific reading of it by someone?
UKL doesn’t provide a ‘standard’ pronunciation guide, specifically because she believes that the names should sound like whatever the reader reads them as (or something to that effect, I found it buried in some comments on her website).
The other half of the fun seems to be that there were major changes to IPA in 1989, so even if it *is* based on the IPA, it wouldn’t necessarily be written out with the same symbols today. Or it may not be based on the IPA at all and I’m just barking up the wrong tree…
Line 1 word 8 continues to give me fits, because assuming that the first grapheme maps to either ‘s’ or ‘ʃ’, I cannot come up with any phoneme that both forms a real word *and* has an IPA glyph even remotely close to the one in the image.
-
@Stilgherrian: in my August post I had said that L1W4 appears four times, assuming L1W4 = L1W8, which you confirmed, and also that L1W4 = L4W4 = L4W7, which I take as confirmed as well. You also said that it is a very common word, which is compatible with my guess that it is a preposition (or possibly the verb “is”). I don’t think it is an article: it appears before L4W5, which I believe to be an instance of the same word as L1W1 (in spite of a very small difference in the initial stroke) and is also a frequent word. So I take L4W4 L4W5 to mean something like “of the”, “is a”, “on a”, “of my” or similar: “the of”, “a is” and the like are probably out of the question.
In fact, for me, the RP clue is bigger than L1W4 = L1W8. Thank you very much for it, but please, please, please no more clues. It’s Friday morning now up above here (as opposed to Down Under). Please give me a weekend before further spoilers.
-
I think I have it
… and in retrospect, it was “real beginner-grade material”, as Stil posted in October 2008. In fact, after spending a couple of hours on it before my own first comment in August, I needed only last Sunday and a few more hours last night to break it. I didn’t work on it in the meantime. However, it was clear that Stil was growing impatient (after 66 months!) and he was giving away too many clues. So finally I decided to test my initial hypothesis and it proved right on the first try.
Well, something is still missing from the picture. I could not identify the source work. Googling the last sentence, which is a motto, I found video games and other universes apparently not related to Ursula K. Le Guin. Since the script is a phonetic representation of English, I know how every word is pronounced, but I cannot retrieve the original spelling of three of them: two are fictional place names and the third is a generic classifier for one of them (as if they were “Australia”, “New South Wales” and “Commonwealth”). In my solution text I have used plausible spellings for them, in brackets. The challenge text is very short (110 characters) and it doesn’t cover all sounds of English. The internal structure of the script enables me to figure out how some of the missing sounds would be represented, but unfortunately not all (I was too optimistic in August). There are also other minor doubts.
I am confident, however, that what I’m posting here is a correct solution, and I claim the prize.
-
In 2008 I wrote
Observation 1. It could be a simple character substitution code given that at least two “scribbles” are repeated throughout the script – the first “scribble” and last “scribble” on the first line for instance. If this is the case the commonly used English character frequency of letters table starting ETANOISH… could be of use.
———————–
I can see now that an excellent starting point would be simple word substitution – with 5 “the” and 4 “of” in a set of 28 words with the strokes representing an indication of how each word should be pronounced…. I can find “th” in [Ro"th"mile]
Bob
-
“Glyphs are quite systematic in terms of how they map onto phonemes”
Translation…
“Find an appropriate phonetic font and type the words above into a wordprocessor” ?
I have been searching the 1 million 600 thousand “phonetic fonts” results from Google and even image searched for “phonetic font stilgherrian” where clues to the puzzle can be located possibly in terms of the name of the image file.
Having reached a solution, we wait with bated breath for the ultimate key to the solution and to learn whether or not Rothmile is on the Sydney City Rail network (sigh).
PS Rothschild wasn’t the name of the banker. It was short for “The Shop of the Red Shield Company” established by Amshall Moses Bauer in 1743. Mr. Bauer’s son changed the family name to Rothschild after his father’s death.
Bob
-
“Glyphs are quite systematic in terms of how they map onto phonemes”
Translation…
“Find an appropriate phonetic font and type the words above into a wordprocessor” ?
You’d still have to type it in phonetically… if I get some time, I might sit down and try to transcribe the entire thing to IPA, just out of curiosity at what it would look like, and because it would then be possible to represent it as Unicode glyphs. If I do, I’ll be sure to post it.
-
Here is a hastily written narrative of my decipherment process. Let’s say it’s a first draft of the full report I’ll hopefully be able to write. I have many more observations, charts to clarify many points, and so on. I just haven’t got the time to put them together right now.
When I learned of the challenge at the end of August, all hints pointed to a phonetic writing system for English (I did mention Deseret and Shavian in my first comment), so I started from there, working by the book: I broke up the challenge text (henceforth: the Document) into words, the words into characters, produced fig. 1, and, based on known facts about word frequency in English, I immediately identified three words: “the” and “of”, which I mentioned, and the single-letter L4W2, which had to be the article “a”, the most frequent English single-sound word. In cryptography jargon, such clues are called “cribs”. Then, let me quote myself:
3. The most prominent feature of the script is the line (code named “blue line”, painted in blue in my figure) which appears in every word and is sometimes interrupted. I’ll call “ascenders” the strokes above it and “descenders” those below it; these are standard typography terms. Even if a long slash at the beginning of L1W2 really is only one pen stroke, I will analyse it as two strokes, an ascender and a descender. A unique feature of this script is that, while descenders can live either with or without a blue segment above them, an ascender always requires one. The reverse is not true: in two instances, L1W3 and L6W4, blue segments start without being triggered by ascenders.
4. In almost every case where an ascender and a descender are drawn one above the other, possibly in one pen stroke and probably as part of the same letter, one of them is “more complicated” than the other. This prompts me to classify the candidate letters of this script into four groups:
(i) strictly negative: consisting of descenders only. These are the only letters which can live without a blue segment. All other categories require one, because they involve ascenders.
(ii) extended negative: descenders, with an additional ascender (a simple vertical segment)
(iii) strictly positive: ascenders only (a rich inventory of hooks, loops, dotted variants…)
(iv) extended positive, the same as (iii), with an additional descender, which is a vertical segment (sometimes with a dot)
When I wrote this I wanted to state facts. I didn’t want to share my wild guesses, but I already had an idea in mind: all vowels of my cribs were extended positives, and the two consonants were strict negatives. What if the positives represented vowels and the negatives represented consonants? In that case the odd behavior of those glyphs with respect to the midline would have a fascinating explanation: the midline represented voice! Vowels, i.e. positives, are always voiced, as are sonorant consonants (the extended negatives!), while plosives, fricatives and affricates, the consonants that in English come in voiceless/voiced pairs, had to be represented by the strict negatives, which appeared in the Document both with and without a midline segment above them. If this was true, the script wasn’t simply like Shavian, where the glyphs representing voiced consonants are flipped versions of the ones representing voiceless ones. In this script, voice was really written down with separate penstrokes of its own, as in some kind of spectrogram!
(In the following, I write IPA symbols between slashes, as /hɪə/, both to represent phonemes and to represent their corresponding script letters as I decipher them. I know this is against common IPA usage and I hope this causes no confusion.)
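The midline-as-voice hypothesis boils down to two small rules. A minimal sketch: the category names mirror the four groups above, `read_strict_negative` and `fits_hypothesis` are hypothetical helper names, and no assumption is made about which glyph shape belongs to which obstruent; only the standard English voiceless/voiced pairs are encoded.

```python
# Standard English voiceless -> voiced obstruent pairs.
VOICED = {"p": "b", "t": "d", "k": "g", "f": "v",
          "θ": "ð", "s": "z", "ʃ": "ʒ", "tʃ": "dʒ"}

CATEGORIES = ("positive", "extended_negative", "strict_negative")

def read_strict_negative(voiceless: str, has_midline: bool) -> str:
    """Hypothesis: a midline segment above an obstruent glyph makes it voiced."""
    return VOICED[voiceless] if has_midline else voiceless

def fits_hypothesis(category: str, has_midline: bool) -> bool:
    """Vowels (positives) and sonorants (extended negatives) are always voiced,
    so they must always sit on a midline segment; obstruents may go either way."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return category == "strict_negative" or has_midline

print(read_strict_negative("f", True))   # v, as in the final sound of "of"
print(read_strict_negative("f", False))  # f
```

Applied literally, the rule reads the bare initial dash of “the” (L1W1.1, no midline) as voiceless /θ/ rather than /ð/, which is the anomaly discussed in the next paragraph.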
The hypothesis had to be verified. Of the two crib consonants, L1W4.2, the voiced final /v/ of “of”, did indeed have a “blue” segment above it, and this would imply that L3W4.1 represented /f/, its voiceless counterpart; but the other crib consonant, L1W1.1, the initial /ð/ of “the”, was also voiced yet had no segment above it! The whole construction, however, was too beautiful to be dismissed because of that simple dash. Many writing systems have special exceptions for common words, so I didn’t consider my idea disproved, but I badly needed real data to see if actual English phoneme frequencies matched what I thought I was seeing in the Document. The ETANOISH sequence mentioned by Bob Bain is well known, but it holds for conventional spelling, and I considered it of little use here. Fortunately, one of the most authoritative living English phoneticians, Prof. John C. Wells, whose blog is in my RSS feed, had posted a piece completely written in IPA in June. It was probably long enough to extract significant phoneme occurrence statistics from it. I preferred starting from scratch and counting the phonemes myself, because Wells uses a standard transcription system I’m completely familiar with, while many articles that could be found on the web used somewhat different systems and phoneme counts, were based on different varieties of English and would have required more adaptation work (I was assuming that the Document represented an Australian variety rather similar to Wells’ British English, at least in phoneme distribution if not in realization… I hope I’m upsetting nobody with this sentence).
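Counting phonemes in an IPA text means treating diphthongs, affricates and long vowels as single units. A minimal sketch using greedy longest match; the `MULTI` inventory is an assumption approximating a Wells-style British transcription, not Wells’ own list, and the sample string is a toy, not his article.

```python
from collections import Counter
from typing import Dict, List

# Multi-character phonemes in a Wells-style British transcription.
# This inventory is an assumption and is not exhaustive.
MULTI = {"aɪ", "eɪ", "ɔɪ", "aʊ", "əʊ", "ɪə", "eə", "ʊə",
         "tʃ", "dʒ", "iː", "uː", "ɑː", "ɔː", "ɜː"}
IGNORED = set(" ˈˌ.")  # word spaces, stress marks and syllable dots are not phonemes

def tokenize(ipa: str) -> List[str]:
    """Split an IPA string into phonemes by greedy longest match (at most 2 code points)."""
    phonemes: List[str] = []
    i = 0
    while i < len(ipa):
        if ipa[i] in IGNORED:
            i += 1
        elif ipa[i:i + 2] in MULTI:
            phonemes.append(ipa[i:i + 2])
            i += 2
        else:
            phonemes.append(ipa[i])
            i += 1
    return phonemes

def percentages(ipa: str) -> Dict[str, float]:
    """Relative frequency of each phoneme, in percent, most frequent first."""
    counts = Counter(tokenize(ipa))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {p: 100 * n / total for p, n in counts.most_common()}

print(tokenize("ðə skraɪbz"))  # ['ð', 'ə', 's', 'k', 'r', 'aɪ', 'b', 'z']
```

Greedy matching will wrongly merge a sequence such as /ɪ/ + /ə/ across a syllable boundary; a careful count would need such boundaries marked explicitly in the transcription.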
In any case the timeframe I could dedicate to this matter had expired. The challenge went into my TODO list with the lowest possible priority, and it stayed there for months. Last weekend, I pushed it to the top.
Not surprisingly, 12.38% of my sample consisted of the single phoneme /ə/. It was clear that in the Document no character was so frequent, but Wells’ is a radical transcription, where, for instance, “the” is transcribed either /ðə/ or /ði/ according to its pronunciation. If Danny, the inventor of the script I was deciphering, wanted to keep the same spelling for the same word in all positions, he might have used /ði/ throughout, reducing the frequency of /ə/. Such an approach might also have explained the disturbing fact that the single vowel of L4W2, a candidate for the indefinite article, far from being the commonest, appeared only there. I still don’t know the reason: I now think that that letter means “indefinite article, sometimes /ə/, sometimes /eɪ/”. In any case, the positives made up 43.09% of the Document, and 39.44% of the sample consisted of vowels. Not close, but not far enough apart to disprove the theory. Maybe there were some positives which weren’t vowels. I know now that that was indeed the case: L1W2.1 appears three times, 2.73% of the Document, and represents /h/, not a vowel and not even a voiced sound. But it is a simple slash, somewhat outside of the system, just as /h/ is a somewhat special sound, so it’s OK.
After /ə/, the commonest phonemes in the sample are, in order, /ntɪslkr/. I had to go for consonants, that is, in my hypothesis, negatives. The extended ones had to be sonorant consonants. In English there are seven of them: /m/, /n/, /ŋ/, /w/, /l/, /r/, /j/, and indeed I counted seven extended negatives, all scythe-shaped, one of them dotted, either with a sharp or a rounded angle where the “handle” met the “blade”, at three possible depth levels below the midline: 3 depths × 2 angle types + 1 dotted = 7! Some strict negatives, on the other hand, were the handleless counterparts of those scythes (let them be “sickles”), while the ones I had already identified as /v,f/ and /ð/ had completely different shapes. Wait! The latter were all fricatives… could the former be plosives? In that case, could the three depths correspond to the three places of articulation of English plosives? In that case, the scythe representing /n/ would be at the same depth as the sickle representing /d,t/ (with or without midline), and similarly /m/ with /b,p/ and /ŋ/ with /g,k/!
(If you feel confused, this chart might help.) Frequencies showed where /n/ and /t/ are: at middepth. Labials tend to prefer initial positions, so they had to be the shallow scythes (/m/ and /w/) and sickles (/b,p/), which also showed this preference. The velars were at maximum depth, with a very conveniently final /ŋ/ at L3W4.6, which also showed that the nasals were the rounded scythes, so that /w/, /l/, /r/, /j/ had to be the sharp-angled ones, identifying the dotted one (L4W6.1) with /j/ (I think you all know that /j/ is the initial glide of “you” /juː/, and not the “j” of “Jew” /dʒuː/). There was also a spatial metaphor in this: the closer to the lips a sound is articulated, the closer to the midline its glyph is written. How elegant!
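The grid just described (three depths times two angle types plus one dotted scythe, with sickles sharing the depth of the nasal at the same place of articulation) can be written as a lookup table. This is a sketch of the reading as reported here, not a complete key: the shape names and `consonant_at` are my own hypothetical labels, fricatives are left out because their glyphs have different shapes, and the place names apply to the nasals and plosives only (the deep sharp scythe is /r/, which is not a velar).

```python
# Depth below the midline: 1 = shallow (closest to the midline), 2 = middle, 3 = deep.
PLACE = {1: "labial", 2: "alveolar", 3: "velar"}  # holds for nasals and plosives

ROUNDED_SCYTHE = {1: "m", 2: "n", 3: "ŋ"}               # nasals
SHARP_SCYTHE = {1: "w", 2: "l", 3: "r"}                 # other sonorants, same three depths
SICKLE = {1: ("p", "b"), 2: ("t", "d"), 3: ("k", "g")}  # plosives: (voiceless, voiced)
DOTTED_SCYTHE = "j"                                     # the seventh sonorant

def consonant_at(depth: int, shape: str, has_midline: bool = True) -> str:
    """Read a negative glyph from its depth and shape ('rounded', 'sharp' or 'sickle')."""
    if shape == "rounded":
        return ROUNDED_SCYTHE[depth]
    if shape == "sharp":
        return SHARP_SCYTHE[depth]
    if shape == "sickle":
        voiceless, voiced = SICKLE[depth]
        return voiced if has_midline else voiceless
    raise ValueError(f"unknown shape: {shape!r}")

for depth, place in PLACE.items():
    p, b = SICKLE[depth]
    print(f"{depth} {place:9} nasal={ROUNDED_SCYTHE[depth]} plosive={p}/{b} sharp={SHARP_SCYTHE[depth]}")
```

Each printed row pairs a nasal with the plosives of the same depth, which is the homorganic alignment the decipherment relies on.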
At this point I had most consonants, and I understood why vowels came in strictly positive or in extended form. Since there are so many scythes and sickles that can easily be confused with each other, some of them cut the stem of the following vowel and some don’t, and this is a way to tell them apart, alongside sharpness and depth. For example, the boomerang-shaped vowel L3W4.4 isn’t cut by the preceding /l/ (middepth sharp scythe), but a preceding /r/ (deep sharp scythe) cuts it at L5W1.2. There are three possibilities: cutting, overstriking and joining, as in L6W4, where a shallow sickle (a /b/) joins the same vowel as in the article /ði/. Hey, this is the verb to /bi/! (There are still problems with the choice of strict vs. extended form of vowels, see below.)
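The consonant identifications so far fit in a small lookup table. A sketch under our own encoding: keys are hypothetical (shape, depth, angle) descriptions, “dotted” stands in for the dotted scythe, and each sickle keeps its voiced/voiceless pair merged, as in the text.

```python
# Depth encodes place of articulation: the closer to the lips, the closer to the midline.
PLACE = {"shallow": "labial", "middepth": "alveolar", "deep": "velar"}

CONSONANTS = {
    # Rounded scythes: nasals.
    ("scythe", "shallow", "rounded"): "m",
    ("scythe", "middepth", "rounded"): "n",
    ("scythe", "deep", "rounded"): "ŋ",
    # Sharp-angled scythes: the other sonorants.
    ("scythe", "shallow", "sharp"): "w",
    ("scythe", "middepth", "sharp"): "l",
    ("scythe", "deep", "sharp"): "r",
    ("scythe", "dotted", None): "j",
    # Sickles: plosives, at the same depth as the homorganic nasal.
    ("sickle", "shallow", None): "b,p",
    ("sickle", "middepth", None): "d,t",
    ("sickle", "deep", None): "g,k",
}


def decode(glyphs):
    """Map glyph descriptions to phoneme values, using '?' for anything unidentified."""
    return [CONSONANTS.get(glyph, "?") for glyph in glyphs]


print(decode([("sickle", "shallow", None), ("scythe", "deep", "sharp")]))  # ['b,p', 'r']
```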
Some fricatives were still missing, notably /s/, the commonest of them. A natural candidate was the commonest of the still unidentified glyphs, L1W3.4. It also appeared in a ligature with /k/ at the beginning of L1W7, which could then be read as /skr?b?/. Hmm. “scribes” /skraɪbz/ perhaps? Tempting. This would identify L1W2 as “high” /haɪ/, solving the problem of L1W2.1 and letting me read the initial sequence of L2W1 as /kh/. (/k/ is usually a cutter, as in L1W3, but probably /h/ can’t be cut at all, or /kh/ is a special case. Also, /j/ cuts L4W6.2 but doesn’t cut L6W6.2. Maybe cutting is optional for such an easily identified letter, or maybe there are rules we cannot derive from such a short text. Never mind.)
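Reading /skr?b?/ is a wildcard match against candidate words. A minimal sketch, assuming words are tokenised into phonemes with each diphthong as a single token (one glyph per token); the mini-lexicon is hypothetical and deliberately tiny.

```python
def matches(pattern, word):
    """True if `word` fits `pattern`; None in the pattern marks a still-unread glyph."""
    return len(pattern) == len(word) and all(
        p is None or p == w for p, w in zip(pattern, word)
    )


# /skr?b?/ with each ? standing for one unidentified glyph.
pattern = ["s", "k", "r", None, "b", None]

# Hypothetical phoneme-tokenised candidates.
lexicon = {
    "scribes": ["s", "k", "r", "aɪ", "b", "z"],
    "scrubs": ["s", "k", "r", "ʌ", "b", "z"],
    "scrub": ["s", "k", "r", "ʌ", "b"],
}

print([word for word, phonemes in lexicon.items() if matches(pattern, phonemes)])
# ['scribes', 'scrubs']
```

The pattern alone does not decide between “scribes” and “scrubs”; what made “scribes” convincing was the knock-on reading of L1W2 as “high” /haɪ/.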
A small problem with reading L1W2 L1W3 as “nine scribes”, however, was that /z/ was not represented as in L4W1 and as it should be, by an /s/ below a midline segment, but by a somewhat abbreviated form, easily confused with a final /v/ (the difference is that /v/ hovers below the midline, while the abbreviated final /z/ dangles from it). I still don’t know whether this abbreviated form is always optional or is restricted to cases where /z/ is obviously a suffix (plural, third person, genitive…), but again, it’s not a big problem.
If you have followed me to this point, you are surely able to find out the vowels for yourself. I’ll list a couple of final remarks here:
1. L3W5 is “personage”. I’d pronounce that word /pɜːsənɪdʒ/, but the vowel values I found correspond to /pɜːsɒnædʒ/. This confirms what we had already observed: vowel characters in this script are not precise phonetic representations. In particular, reduced vowel sounds are (often) written as the full vowel they etymologically come from, just as in conventional English spelling. This word is also the only occurrence of /dʒ,tʃ/. We don’t know what /ʒ,ʃ/ look like, since there are no occurrences in the Document, but /dʒ,tʃ/ is composed of the middepth sickle /d,t/ and a final curl. Maybe that final curl alone represents /ʒ,ʃ/.
2. The whole Document is obviously written in an r-dropping variety of English, as shown by L3W2 /rekɔːdz/ “records” (for /re-/ instead of /ri-/ see the vowel comment above). However, L3W1 is “hereby”, and after a unique first vowel that I interpret as /ɪə/ (and is not in extended form, for an unknown reason), there is an /r/ character: /hɪərbaɪ/. The /r/ would be read only before vowels, but Danny decided to write it always, so that the same word is always spelled the same. Also, I don’t know why /aɪ/ is dotted here (and in L5W1, “Rothmile”). Maybe because there is another stressed vowel in those words. I don’t know.
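Danny’s convention (always write /r/, pronounce it only before a vowel) amounts to a simple rewrite rule for an r-dropping accent. A sketch, assuming phoneme-tokenised words with diphthongs as single tokens; the vowel set is a standard RP list in Wells-style notation, not Danny’s own inventory, and `non_rhotic` is a hypothetical helper name.

```python
VOWELS = {
    "i", "iː", "ɪ", "e", "æ", "ɑː", "ɒ", "ɔː", "ʊ", "u", "uː", "ʌ", "ɜː", "ə",
    "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə",
}


def non_rhotic(phonemes, next_phoneme=None):
    """Drop every written /r/ that is not followed by a vowel.

    `next_phoneme` is the first phoneme of the following word, if any, so a
    word-final /r/ survives (as a linking r) when the next word starts with a vowel.
    """
    following = phonemes[1:] + [next_phoneme]
    return [ph for ph, nxt in zip(phonemes, following) if not (ph == "r" and nxt not in VOWELS)]


print("".join(non_rhotic(["h", "ɪə", "r", "b", "aɪ"])))  # hɪəbaɪ ("hereby")
```

Writing the /r/ everywhere and deriving the pronunciation by rule is what lets the same word keep the same spelling in every position.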
3. The final vowel of L4W6 is a bit strange. It could be unique, but I think it is an /ɔː/ as in L3W2 /rekɔːdz/ “records” (strictly positive) or in L6W7 /ɔːlwəz/ “always” (extended positive), so I read that word as /jəʊsentrɔː/ and transliterate it as “Yocentro” or “Yocentror” but I am really in doubt here.
Thank you very much for keeping up with me for such a long post, and may the Sands be with you always.
Dario
(an Italian mathematician by study, sysadmin by trade, amateur linguist by passion) -
Dario,
Many congratulations on solving the challenge. I find these things fascinating, even though I knew I never had much hope of solving it before anyone else (if ever!).
Your prize, whatever it may be, is very well deserved as you’ve clearly put a lot of time and effort into solving this.
Well done again.
@Stilgherrian: thank you for posting this challenge all those months ago. I found your site after seeing the Google name blog post, and stumbled across this. I’ve kept a keen eye on the responses, and I’m sure you’re rather relieved that it’s finally been solved! Do you think you’ll do anything like this again? Many thanks.
Alex
-
Updates on vowels
Thank you for your congratulations.
I’ve looked at the vowels once again and I must conclude that I still don’t know much about them. All I can show you is my vowel chart: under each glyph I’ve put the words where it appears, in ordinary spelling, with the corresponding letters underlined. If a character appears in the Document both in short and long forms (previously I called them “strict” and “extended”), I’ve put them both. I believe they are variants of the same character, but I must confess that all my previous theories about them proved unsatisfactory. At the moment, I still don’t know when to use a short or a long form.
As you can check by pronouncing those words yourself, vowel orthography is not strictly phonetic: let’s say that Danny made some concessions to ordinary spelling. In any case, at least six RP vowels don’t appear in the Document, and I have no idea how to represent them: /ɑː/, /ʌ/, /ɔɪ/, /eə/, /ʊ/ and /ʊə/ (assuming that Danny did not distinguish between /i/ and /iː/, or between /u/ and /uː/; otherwise /iː/ and /u/ are also missing).
There are many uncertainties, I’ve already mentioned some of them. Here, I’ll say only that perhaps, the /ɜː/ glyph in “personage” is only the extended form of the /e/ glyph in “citizen”. A hint that there could be some phonetic meaning to short and long forms?
@Joel: I hope the chart I posted helps you with your question about the vowels in “May the”. It was Danny’s game. While the structure of the consonant glyphs was immediately transparent to me, the same is not true for the vowels.
What is missing now is a consonant chart. I’ll leave it for my next post.
@Stilgherrian: I hope Danny answers your mail soon. I hope he can remember some of the missing bits. If somebody asked me about the secret scripts and languages I’ve invented in my youth, well, there were so many of them I’d be embarrassed… but something I do remember.
Cheers,
Dario -
I’m curious to know what the prize was, if Dario or Stilgherrian would kindly share.

@Quatrefoil: Here’s a fragment from the last line of the sample text. I’ve circled what I think is the variation in “sharp bend” versus “smooth curve” which you refer to. These represent two different things.
