Script Challenge: can you figure this out?

I’ve decided that each weekend I’ll dig out an object or two from my more distant past and write about it. To kick things off, here’s a challenge which was originally created by the same chap who coined my name.

The text you can see in the image below (at least if you happen to be sighted) is in an unknown script. Your task is obvious, I think.

The only clues you have are that it’s a quote from a book by Ursula Le Guin and that it has nothing whatsoever to do with Tolkien.

Image of text in an unknown alphabet

Now originally I solved this in under 2 days, without the aid of computers or amphetamines. I reckon that in The Age of the Internet you can do better. I’ll negotiate a suitable prize for the first person who posts the solution.


  1. Stephen Stockwell’s avatar

    It’s been 14 months, Stil. No takers.

    That tells me that the other 99.96-ish per cent of the population (like me) who aren’t cryptography geeks, will need the help of our protocol droids if we’re to solve this without shelving all our commitments for a month or whatever.


  2. Stilgherrian’s avatar

    Stephen Stockwell: I find looking at the preview of your post and correcting problems before posting to be invaluable. :) OTOH, I could register you so you can edit your comments? Editing the past is quite popular now that everything is digital.

    As far as cryptanalysis goes, this puzzle is real beginner-grade material. I’m quite surprised no-one’s even had a go.


  3. Stephen Stockwell’s avatar

    Cryptanalysis personally = Could. Not. Be. Arsed. Ball-tearer of an interesting subject, though.


  4. Bob Bain’s avatar

    I’ve had a brief look at this following your tweet about nobody solving it.

    Observation 1. It could be a simple character substitution code given that at least two “scribbles” are repeated throughout the script – the first “scribble” and last “scribble” on the first line for instance. If this is the case the commonly used English character frequency of letters table starting ETANOISH… could be of use.

    Observation 2. Although you state it has absolutely nothing to do with “Tolkien” clearly this is a clue in itself and research into Tolkien is clearly in order. I note that the “Tolkien logo” bears a resemblance to the strokes (scribbles) in the script.

    Observation 3. If this is a character substitution code and if each line is a word then the second line could only be the “word” a or i.

    Observation 4. If the second character is an a or an i and if each line is a word then the penultimate line would be a two character word that could only start with an a e i o or u. If the second letter is an a or an i this reduces the number of permutations to words such as it or of etc.

    Observation 5. I had never heard of Ursula LeGuin but Google research into “quotes Ursula LeGuin” results in quite a few interesting observations about life the universe and everything but none quite match my deliberations although “if you light a candle you also cast a shadow” has an appropriate place given that it has an “a” where I might expect to find an “a”.


  5. Stilgherrian’s avatar

    @Bob Bain: Your approach to cryptanalysis is taking the right path, mostly. My Tolkien comment was originally because his Tengwar script, also known as Feanorian letters, is the one which appears most frequently in his work — and the first wrong path which some people have taken when trying to solve the puzzle.

    Sample of Tolkien's Tengwar script: Ennyn Durin Aran Moria: pedo mellon a minno. (The Doors of Durin, Lord of Moria. Speak, friend, and enter.)

    This sample reads “Ennyn Durin Aran Moria: pedo mellon a minno”, which is Sindarin, an elvish language, for “The Doors of Durin, Lord of Moria. Speak, friend, and enter.” I could once read and write those letters, some decades ago, but I never went as far as to learn the languages themselves. I could pronounce the words, but not understand them.

    That LeGuin quote you found is indeed beautiful, but it is not the one. The full context, from the first of her Earthsea trilogy, A Wizard of Earthsea, is some advice to the young wizard Sparrowhawk:

    You must not change one thing, one pebble, one grain of sand, until you know what good and evil will follow on that act. The world is in balance, in Equilibrium. A wizard’s power of Changing and Summoning can shake the balance of the world. It is dangerous, that power. It is most perilous. It must follow knowledge, and serve need. To light a candle is to cast a shadow.

    Now as I say, Bob, your approach is mostly the right path. The language is English, just written in a different alphabet. A letter-frequency attack is the right first step for a letter-substitution cypher. But this is not strictly a letter-substitution cypher.

    One more clue: Not all alphabets have the same number of letters. Why is that?


  6. Bob Bain’s avatar

    Right, the Greek alphabet has 24 letters, alpha and beta through to omega.

    http://en.wikipedia.org/wiki/Greek_alphabet

    The Phoenician alphabet (linked to above) offers some promise but there is no clear relation between the symbols in either the Greek or Phoenician alphabets to the script above.

    Wikipedia informs me in their article on Egyptian hieroglyphs that there are logographic and alphabetic representations. The Japanese I seem to recall tend to mix their logographic symbols with English and/or European characters which makes travelling in Tokyo quite interesting as the English bits give an idea of what’s going on.

    (working on it.. working on it… It’ll take longer than two days !)


  7. Jason Langenauer’s avatar

    It seems to me that some of the letters have an inordinate number of strokes. So, I’ll offer the hypothesis that the number of strokes and dots in each letter has some relationship to the plain-text. With the number of dots in parentheses, my count of the strokes is:

    4(2) 5(2) 13(1) 4(0) 4(2) 6(2) 12(2) 4(0)
    14(3)
    9(4) 12(0) 4(2) 9(1) 14(0)
    6(0) 2(1) 17(2) 4(0) 4(2) 19(1) 4(0)
    12(3)
    3(3) 4(2) 12(0) 4(2) 5(1) 5(1) 10(2)

    Now, this all looks good up until the last line, where two sets of different letters have the same number of strokes and dots – which may be intentional, as a coding device to represent an English (or more correctly, Latin) letter with two or more coded letters, or it may not be, in which case the substitution breaks.

    Perhaps there is meaning as to whether the dot is above or below the central horizontal axis of the script, which would allow the two 5(1)’s to be different letters, but the similarity of the two 4(2)’s on the final line would tend to disprove the hypothesis.
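
    A quick way to see which stroke-and-dot combinations repeat is to tally the counts listed above. This is only a sketch in Python: the `COUNTS` string is copied from the table in this comment, and the names `COUNTS` and `tally` are made up for illustration. It shows how often each pair recurs, not what any grapheme means.

```python
import re
from collections import Counter

# Stroke counts with dots in parentheses, one script line per row,
# copied from the comment above.
COUNTS = """
4(2) 5(2) 13(1) 4(0) 4(2) 6(2) 12(2) 4(0)
14(3)
9(4) 12(0) 4(2) 9(1) 14(0)
6(0) 2(1) 17(2) 4(0) 4(2) 19(1) 4(0)
12(3)
3(3) 4(2) 12(0) 4(2) 5(1) 5(1) 10(2)
"""


def tally(table):
    """Count how many times each (strokes, dots) pair appears."""
    pairs = [(int(s), int(d)) for s, d in re.findall(r"(\d+)\((\d+)\)", table)]
    return Counter(pairs)


if __name__ == "__main__":
    # Print the pairs from most to least common.
    for (strokes, dots), n in tally(COUNTS).most_common():
        print(f"{strokes}({dots}): {n}")
```

    On these figures 4(2) appears six times and 4(0) four times, while most pairs appear only once, which is the sort of lopsided distribution a frequency attack looks for.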

    That’s about as far as my reasoning takes me, pre-coffee on a Sunday morning.


  8. Stilgherrian’s avatar

    @Jason Langenauer: Welcome to the Game, Sir! Now, what is a letter and what is a word? Inordinate numbers indeed — or perhaps not.


  9. Jason Langenauer’s avatar

    I have often found it useful in many problem domains to list out the assumptions one is making, so that they may be validated. So, what are the assumptions in play here? And how can we validate them?

    1. That each grapheme in the script is separated from other graphemes by a continuous section of space. This means that the fifth line is one character, not two as @bob_bain has suggested. Can we prove or disprove this? Not really, so let’s keep it as a working assumption.

    2. That there is a one-to-one correspondence between a grapheme in the script, and a letter in the Latin alphabet – i.e. each grapheme maps to exactly one Latin letter, and each Latin letter maps to exactly one grapheme. Can we prove or disprove this? Yes, because Stilgherrian confirmed it when he said “The language is English, just written in a different alphabet”.

    3. That the different lines represent new words – i.e., based on the first assumption, the English quote is of the form “xxxxxxxx x xxxxx xxxxxxx x xxxxxxx”. Hmm, that’s a problem that could be solved rather quickly with a regex and the text of Ursula Le Guin’s works. But that would hardly be sporting. Can we prove or disprove this? Not really, but it’s worthwhile to keep in mind other possibilities: that no word breaks are used, ASENGLISHCANBEWRITTENLIKETHISWITHOUTLOSINGMUCHMEANING, or another grapheme is used as a “word-break” character – the last character of the first line is a possibility.
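
    For what it’s worth, the regex idea in assumption 3 could be sketched as below. This is Python, and `length_pattern` and `find_candidates` are hypothetical helper names. It assumes one grapheme per letter and one line per word, either of which may be wrong, and you would need to supply the book text yourself.

```python
import re


def length_pattern(lengths):
    """Build a regex matching consecutive words with the given letter counts."""
    words = [r"[A-Za-z]{%d}" % n for n in lengths]
    # \W+ between words forces each word to have exactly the stated length.
    return re.compile(r"\b" + r"\W+".join(words) + r"\b")


def find_candidates(text, lengths):
    """Return every non-overlapping run of words in text that fits the lengths."""
    return [m.group(0) for m in length_pattern(lengths).finditer(text)]


if __name__ == "__main__":
    # A short sample taken from the quote discussed earlier in the thread.
    sample = "To light a candle is to cast a shadow."
    print(find_candidates(sample, [5, 1, 6]))
```

    Under assumptions 1 to 3 the puzzle’s pattern would be `find_candidates(text, [8, 1, 5, 7, 1, 7])`. Note that `\W+` lets a match run across punctuation, so candidates can straddle sentence boundaries.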

    More thought required here…. but at least I learnt a new word (“grapheme”).


  10. Stilgherrian’s avatar

    @Jason Langenauer: “Grapheme” is indeed a wond’rous word. “Alphabet” is an interesting word too, and one which deserves further consideration.


  11. Bob Bain’s avatar

    Ah hah

    Currently considering FONTS !

    e.g.

    http://www.searchfreefonts.com/free/bisaya-1880.htm

    Type Stilgherrian into the box and you get symbols with dots and lines BUT if the solution is a FONT then this isn’t the font that may be being used as it doesn’t produce the required graphemes.

    Microsoft Wingdings doesn’t work either !

    Some of these non-latin fonts approach the type of grapheme being considered.

    (working on it.. working on it… )


  12. Bob Bain’s avatar

    Stilgherrian’s comment that “grapheme” is indeed a wond’rous word…

    reference http://en.wikipedia.org/wiki/Grapheme

    “A grapheme (from the Greek: γράφω, gráphō, “write”) is the fundamental unit in written language. Examples of graphemes include alphabetic letters, Chinese characters, numerical digits, punctuation marks, and all the individual symbols of any of the world’s writing systems.”

    The note that “Alphabet” is an interesting word too, and “ONE WHICH DESERVES FURTHER CONSIDERATION”, appears to indicate that we can ignore Chinese characters, numerical digits, punctuation marks, and the odd assortment of individual symbols of any of the world’s writing systems.

    This rules out such things as Ideograms http://en.wikipedia.org/wiki/Ideogram

    “An ideogram or ideograph (from Greek ἰδέα idea “idea” + γράφω grafo “to write”) is a graphic symbol that represents an idea or concept. Some ideograms are comprehensible only by familiarity with prior convention; others convey their meaning through pictorial resemblance to a physical object, and thus may also be referred to as pictograms”

    Over the coming weeks I’m concentrating on fonts ! :-)


  13. Stilgherrian’s avatar

    @Bob Bain: It’s good that you’re clarifying all the terms — “alphabet”, “grapheme”, “ideogram”, “font”… but pursuing fonts as such is the wrong path. This is a new, unique script, and I reckon the picture in this page would be the only example on the entire Internet.

    I won’t say any more just now, because someone’s high school English class will be having a go at this soon. Not too many clues!


  14. Bob Bain’s avatar

    Fonts are interesting in their own right and I am delving into aspects of Microsoft Word at Penrith Valley Seniors Computing Club – so even though fonts may not have any direct relevance to this puzzle I will be looking at fonts in the ensuing weeks.

    With regard to an aspect of alphabets we the puzzle solvers are overlooking, I dug this out from the Internet last night.

    http://www.answers.com/topic/history-of-the-alphabet

    No doubt the English class mentioned in your entry this morning can find the aspect of alphabets we seem to be overlooking, noting that alp = “ox” and bet = “house” in the Proto-Canaanite invocation of this phenomenon.

    I await insight from those much younger than myself !


  15. Quatrefoil’s avatar

    Ok, I think I understand the principle, but then I should.

    I wonder whether you had to go to the library to sort this out back when it was first set?

    I’m up for the challenge.


  16. Stilgherrian’s avatar

    @Quatrefoil: Glad you’re joining the fray. No, I didn’t need to go to the library.


  17. Eric TF Bat’s avatar

    Interesting: the Tolkien script uses those accent-like dots and squiggles for vowels, so for example the first word, transliterated as Ennyn, is really [n][n] with an [underlined acute] meaning [e], and a [pair of dots] meaning [y]. I speculated there was something similar in your example: the first symbol (top left), looking like an F with two dots, recurs in a bunch of places, and there are also similar ones with different dots and frou-frou, like the second symbol on the fourth row. I’m dividing the larger symbols into what I think are letters — for example, the first one on the bottom line may be one of those F things with one dot and a loop, preceded by a smaller L shape. Two letters? Three, counting the loop+dot as a vowel? Perhaps. I should really be working…


  18. Francis’s avatar

    Argh. I used to have a key to this. Possibly still do, probably in the back of some old mathematics notes. From 1978. My goodness this brings back very old memories. I wonder if I can remember the logic behind the script. As I look at it some of it comes back to me. I remember when Danny was creating it.


  19. Bob Bain’s avatar

    Stilgherrian may not have found it necessary to go to a library to solve this as I get the impression that he is a fan of this type of fiction and so has a head-start over those of us who haven’t quite got the hang of it. I have therefore been delving into the works of Ursula Le Guin and have discovered that in the Wizard of Earthsea Le Guin introduced a “special language”

    http://stuffedhead.wordpress.com/2009/03/21/88/

    Part I: Magic Controlled by Speech

    One of the most well known aspects of the Earthsea series is the magic system. Le Guin created a system where wizards use a special language called the Old Speech to work magic. As a passage from A Wizard of Earthsea explains,

    “In the world under the sun, and in the other world that has no sun, there is much that has nothing to do with men and men’s speech, and there are powers beyond our power. But magic, true magic, is worked only by those beings who speak the Hardic tongue of Earthsea, or the Old Speech from which it grew.”

    Now if a person has read Le Guin’s works then he or she would have a head start over the rest of us methinks…


  20. Bob Bain’s avatar

    Update.. checked out Le Guin in the Galaxy Bookshop. She wrote verse about hexagrams which leads me to the I Ching.

    http://en.wikipedia.org/wiki/I_Ching

    The text of the I Ching is a set of oracular statements represented by 64 sets of six lines each called hexagrams (卦 guà). Each hexagram is a figure composed of six stacked horizontal lines (爻 yáo), each line is either Yang (an unbroken, or solid line), or Yin (broken, an open line with a gap in the center). With six such lines stacked from bottom to top there are 2^6 or 64 possible combinations, and thus 64 hexagrams represented.

    The oracular interpretation of the symbolic language based on trigram symbols formed from yang and yin components is well known. However, the inherent numerical language of line change and non-change is relatively unknown.

    ==================

    ah hah ! Progress !!


  21. Quatrefoil’s avatar

    I’m making a very different set of assumptions to yours Bob – I think it’s a whole lot simpler than that. I don’t think it’s a numerically based script – I did think about twig runes which work as a counting code, but I don’t think it’s that.

    I have now eliminated quite a few possibilities for words, but haven’t found any definite combinations.

    I’m proceeding on a largely grammatical/linguistic basis.


  22. Quatrefoil’s avatar

    And it would be very useful to know a tense, but that would feel like cheating.


  23. Stilgherrian’s avatar

    @Eric TF Bat: Tolkien deliberately designed his languages and writing systems so they were plausible within his pre-history. As I’m sure you know, the narrative of The Lord of the Rings sits at the end of the age of Elves, Dwarves and Hobbits and the beginning of the age of Men.

    In Tengwar, the script I showed before, vowels are indicated much as in Tibetan and other Brahmi-derived scripts. A similar system is used in Thai today.

    ตับหวานอร่อยมากๆ!

    Is any of that a clue? I have no idea.

    @Francis: If you do find the key, please let me know. I’d love to see the original explanation. But email it to me, don’t post it here. Not yet, anyway.

    @Bob Bain and @Quatrefoil: There’s no need to over-complicate it.


  24. Quatrefoil’s avatar

    It’s official — Stilgherrian is cleverer than I am. Two days have elapsed and I’m a long way from a solution — though I’ve tested a fair few possibilities.

    I’m not being terribly complicated about it — I only wanted to know the tense since it would give me a clue about expected distributions of word endings in English, but I think I’ve answered that question at least. And since it’s a literary text, I wouldn’t expect standard distribution patterns to necessarily hold true anyway.

    I now have about six pages of if/then statements, and a lengthy statement of assumptions. I just don’t have any positive correlations.

    Last night I dreamed of Thai elves with glottal stops (which is probably better than Mandalay).


  25. Bob Bain’s avatar

    @Stilgherrian “Is any of that a clue? I have no idea”

    You make mention of vowels, reminding us that alphabets comprise vowels and consonants, which may be the issue we have been overlooking when it comes to alphabets. As it notes on an Internet page somewhere, there are a considerable number of “vowels” in the International Phonetic Alphabet, which as an aside is referred to on Ursula Le Guin’s home page. There are over 40 vowel sounds I believe.

    As I write the word “sounds” I am reminded that alphabets attempt to document sounds that can be produced by the human vocal cords, as opposed to the symbolism of number found in mathematics.

    Perhaps we should be looking at intonation — the rise and fall of the strokes and attempting to reconcile this back to human speech — in English.

    An approach I took yesterday was to draw a thin blue line through the middle of each row of graphemes and examine the marks above and below the line.

    Quatrefoil appears to be attempting a solution using FORTRAN :-), a computer language which derived its name from FORmula TRANslator. Perhaps we should be using LISP, the language for Artificial Intelligence: “Lots of Irritating Stupid Parentheses”.


  26. Stilgherrian’s avatar

    @Quatrefoil: The quote is in relatively straightforward English. No weird Riddley Walker or overly-poetic constructions.

    @Bob Bain: Getting warmer.


  27. Francis’s avatar

    After a fruitless search I am keyless. I can even picture in my mind’s eye the old notebook I need – Stats 1H. But of all the old books in a box in the cellar, that one is missing. Found lots of books of notes with my handwriting in them, but no recollection of ever taking the course. Applicable Analysis anyone? Actually looked to be pertinent too.

    My very faint recollection of the explanation Danny gave me was indeed that relative position above and below the dominant line is a key part of the encoding. I think I have the main structure of the text sorted, but fitting the minor consonants isn’t so trivial. One clue to the provenance of the quote is that I will bet the text we see was inscribed in 1978. So that seriously limits the books from whence it came.


  28. Stilgherrian’s avatar

    @Francis: You correctly date the puzzle. My fuzzy memory was placing it in 1979 at the very latest, but I reckon 1978 is closer to the mark. There is much in your memory which is correct.

    Not relevant to solving the puzzle, but I’ll mention it anyway: The image everyone is looking at here is not in Danny’s original hand, but my own facsimile of it, because the original was damaged in some way. I doubt that it still exists.


    1. Quatrefoil’s avatar

      @ Bob Bain:

      No – I’m almost illiterate in the languages of computers, but I do read a number of medieval and modern languages, and have some training in logic (which is where computing got it from in the first place).

      I’m just testing theories, and yes, I figured that we’re talking about sounds, not letters.

      So my reasoning goes:

      If Graph A = Sound B, then Graph C cannot equal Sound D, because it would result in either a set of sounds that don’t occur next to each other in English, or a grammatical problem – e.g. singular subject with a plural verb.

      The problem in that is working out what constitutes a single graph which signifies a phoneme.

      @ Stilgherrian:

      Yes, I was assuming ordinary English, but literary texts tend to have different rhythms from non-literary ones – such as doubling of adjectives, inversion of word order etc. If I remember correctly, though, Le Guin isn’t an overly flowery writer.

      I suspect that I will figure it out eventually, using trial and error, but the monkeys with the typewriters might get there first (which isn’t to say that anyone who solves it before I do is a monkey).


  29. Quatrefoil’s avatar

    @ Stilgherrian

    And since it’s your handwriting, can you tell me if a sharp bend is intentionally different from a smooth curve, or is that just a variation in the hand?


  30. Stilgherrian’s avatar

    [Image: Fragment of the Script Challenge, showing that different shapes are significant]

    @Quatrefoil: Here’s a fragment from the last line of the sample text. I’ve circled what I think is the variation in “sharp bend” versus “smooth curve” which you refer to. These represent two different things.

    Small details are, in general, important. Here, as in the rest of life. Gosh.


  31. Bob Bain’s avatar

    @Quatrefoil

    I’m familiar with simple logic and there is a formal system of symbol manipulation that assists. Some of this can also be performed with a Venn Diagram

    http://en.wikipedia.org/wiki/Venn_diagram

    To test simple logical statements I find Venn diagrams and Truth tables understandable.

    However as far as “logic” is concerned as Wikipedia notes

    http://en.wikipedia.org/wiki/Logic

    “Just as we have seen there is disagreement over what logic is about, so there is disagreement about what logical truths there are”

    As far as digital computers are concerned the breakthrough in logic came with Boole and the application of Boolean algebra to electronic circuits

    http://en.wikipedia.org/wiki/Digital_electronics

    http://en.wikipedia.org/wiki/Boolean_algebra_(logic)

    “Boolean algebra (or Boolean logic) is a logical calculus of truth values, developed by George Boole in the 1840s”

    As explained to me once in an elementary electronics course, it was put to George Boole in the 1800s: “That’s fine, George, but what possible use is this?” It wasn’t until the advent of the digital computer that Boolean algebra became valuable, in the AND, OR and NOR gates which control electronic circuitry. Boolean logic is also used in programming but this is a slightly different concept as it involves human beings who can and do (too frequently) get it wrong.

    We also have Fuzzy logic http://en.wikipedia.org/wiki/Fuzzy_logic

    This is widely used in the application areas listed – mostly in the electronic sphere.

    When it comes to digital computers attempting to emulate the human brain then we enter the world of Neural Networks and Genetic Algorithms

    http://en.wikipedia.org/wiki/Neural_network
    http://en.wikipedia.org/wiki/Genetic_algorithm

    In the end understanding the world is best left to the human brain which is the computer most used by our biological species.

    It is from this base I am attempting to understand the symbolism and am thinking of attempting to emulate it via sounds and exploring the wave forms produced and comparing this to the symbols in the script. All such wave forms are a subset of a sine curve I believe.

    http://en.wikipedia.org/wiki/Sine_wave

    Check out the diagrams in the article above and perhaps examine sine, square, triangle and sawtooth.

    I don’t believe any type of “formal logic” will work in this situation.

    After all that I am having difficulty too.

    @Stilgherrian Sir. Can we apply for a government grant to help solve this ?


  32. Stilgherrian’s avatar

    @Bob Bain: “Government grant”? To solve a recreational puzzle, based on a made-up writing system of no real value to anyone except a small group of people? Sure! The Australia Council would be your best bet.


  33. Bob Bain’s avatar

    @stilgherrian There was a government grant ($20,000 I seem to recall) once awarded for the production of a rather glitzy pornographic magazine.

    It had elements of artistic merit.


  34. Jason Langenauer’s avatar

    So, after a couple of weeks away, I’ve come back to have a look with fresh eyes.

    Two things jump out – Stilgherrian’s comment right at the top that it was “English, but written in a different alphabet”, and the continued hints about “alphabet”. So perhaps the alphabet used has graphemes for sounds that we normally might represent by a digraph in English – for example, “sh” or “ch”. Hmmm….

    But of particular interest in teh Wikipedia, is this article on Devanagari, the alphabet used to write Hindi, Urdu and Sanskrit. The article notes that the alphabet is “recognizable by a distinctive horizontal line running along the tops of the letters that links them together”. Now where have we seen that before?

    http://en.wikipedia.org/wiki/Devanagari


  35. Jason Langenauer’s avatar

    Bah, it’s not used for Urdu at all. Some of this damned dust must have got into my brain…


  36. Stilgherrian’s avatar

    @Jason Langenauer: There is some sense in what you write… though I doubt that Urdu will help you very much. Not that I know anything about Urdu, or could be arsed searching for same.


  37. Daniel Edmonds’s avatar

    Wow. This has been sitting here for a long time and still no headway. Stil, can you give everybody either the answer or another large hint to get it going?


  38. Stilgherrian’s avatar

    @Daniel Edmonds: I’ll ponder your question on the weekend, Daniel. I must admit, though, there’s a lot of really good clues here already which people seem to have missed.


  39. Daniel Edmonds’s avatar

    My understanding of it is that each character represents a word. The characters are geometric, the letters are represented by the lines branching off of the middle line. I figure this because many of the characters have similar ‘parts’ to them (such as the wedge < on the last character on the first line, amongst others, the '4' turn which is juxtaposed to the loopy turn, and of course the dot).

    I figure the first character ("double-dot f") and the fifth character (wedge-f) are often-used words such as 'the', 'in' or 'of'. That the characters are words in themselves is supported by several characters with a huge number of strokes (far too many for it to realistically be a single letter).

    Your continued hint at 'alphabet' makes me think that in English, our alphabet is one letter-per-sound. But in other alphabets, the characters represent groups of sounds, whole words, meanings. So that supports my guess that the individual characters don't merely represent a letter. Also, the characters are not grouped together in any way to represent words.


  40. Stilgherrian’s avatar

    @Daniel Edmonds: You’ve got some good thinking in that analysis, though not all of it is correct. The sample text is a series of words, in English, separated by white space. And the analysis of word- and letter-frequency is indeed an important tool in cryptanalysis.

    For example, single-alphabet letter-substitution ciphers such as the Caesar cipher, where a given letter of the alphabet is always substituted with another, and always the same letter, are easily broken by remembering that the frequency of letters in typical English text runs ETAOINSHRDLUCYMFWP. Or thereabouts.
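
    To make the frequency idea concrete, here is a rough Python sketch for a Caesar cipher. The scoring is a crude assumption (points by position in the ETAOIN… order quoted above), and the function names are invented for illustration. Bear in mind the puzzle script is not strictly a letter-substitution cipher, so this shows the general technique rather than a way to solve the puzzle.

```python
from collections import Counter
import string

# Frequency order quoted in the comment above; letters it omits score nothing.
ENGLISH_ORDER = "ETAOINSHRDLUCYMFWP"


def letter_ranking(text):
    """Return the letters of text, most frequent first (ties alphabetical)."""
    counts = Counter(c for c in text.upper() if c in string.ascii_uppercase)
    return "".join(sorted(counts, key=lambda c: (-counts[c], c)))


def caesar_shift(text, shift):
    """Shift each ASCII letter by `shift` places, wrapping round the alphabet."""
    out = []
    for c in text:
        if c.isascii() and c.isalpha():
            base = ord("A") if c.isupper() else ord("a")
            out.append(chr((ord(c) - base + shift) % 26 + base))
        else:
            out.append(c)  # leave spaces and punctuation alone
    return "".join(out)


def score(text):
    """Crude fitness (an assumption): common English letters earn more points."""
    weights = {c: len(ENGLISH_ORDER) - i for i, c in enumerate(ENGLISH_ORDER)}
    return sum(weights.get(c, 0) for c in text.upper())


def rank_shifts(ciphertext):
    """Try all 26 shifts; return (shift, candidate) pairs, best score first."""
    candidates = [(k, caesar_shift(ciphertext, -k)) for k in range(26)]
    return sorted(candidates, key=lambda kc: -score(kc[1]))


if __name__ == "__main__":
    secret = caesar_shift("To light a candle is to cast a shadow", 3)
    for shift, candidate in rank_shifts(secret)[:3]:
        print(shift, candidate)
```

    On a short message the top-scoring shift is not guaranteed to be the right one, so it is worth reading the first few candidates.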

    You’re right in saying that different alphabets transcribe sounds in different ways. But you’re wrong in saying that when English is transcribed in the Roman alphabet, the mapping is always one letter per “sound”. It’s better to use the technical word linguists use when discussing phonetics: “phoneme”.

    For example, the musical verb “sing” is usually spelled like that, with four letters in the Roman transcription of English. But in the International Phonetic Alphabet it has only three letters, with only one for the final “ng” because that’s considered to be one phoneme, the so-called “dorsal nasal velar”.

    I’d show you the actual IPA rendering but I’m having trouble copy-and-pasting it into WordPress and it’s too early for me to be bothered working out how to do that. However any decent dictionary should show you the IPA transcription.

    I reckon that’s a lot of clues all at once, given what else I’ve said on this topic.


  41. Daniel Edmonds’s avatar

    @Stil

    Thanks for clearing up the difference between ‘sound’ and ‘phoneme’, I should have used that word seeing other people had used it.

    Since you keep showing us the difference between ‘sound’ and ‘phoneme’, I’m led to think that this is extremely important. I’m not sure if it’s the correct word, but I’ll use ‘glyph’ to represent the different partitions of a grapheme. One glyph represents either one phoneme (i.e. ‘th’ in ‘then’, ‘a’ in ‘father’), or a group of phonemes, but I suspect the former, because I don’t think there are enough glyphs to represent all possible groupings of phonemes.

    The center-line is obviously relevant, because sometimes a glyph crosses that line, or other times merely sits under or over it. I am not going to count the center-line as a possible phoneme as it exists in every grapheme.

    The one thing I’m having difficulty in is separating the glyphs. Many are easy (the jagged curve in L1G3 (Line 1 Grapheme 3) and at the beginning of L1G7, amongst others). However some are difficult – take L1G1 – not counting the centerline, there is a south-west quadrant dash, a north-east quadrant dash, a vertical line and two dots – the dots are probably vowel indications, given your love of Quenya and Sindarin, but I’m unsure if each of these is a phoneme in itself, or perhaps joins with another stroke to create a phoneme (i.e. the vertical and southwest dash could join to become ‘d’).

    But I figure that L1G3 is made up of the following glyphs (I’ve separated them): http://imgur.com/rxyrW; Perhaps glyphs 2 & 3 belong together as one glyph?

    [Stilgherrian writes: To save you having to click through, here's Daniel's (incorrect) breakdown of L1G3.]


  42. Stilgherrian’s avatar

    @Daniel Edmonds: There’s two reasons that I specifically use the word “phoneme”, and only one of them is that I formally studied linguistics and therefore try to stick to the correct technical terminology.

    I was going to say that another word that comes to mind is “cursive”, but having read that Wikipedia entry I think that might be misleading.

    Your breakdown of L1G3 has yielded too many pieces.


  43. Daniel Edmonds’s avatar

    Far too many or just one or two too many? I can see now that pieces two and three in my picture are in fact one piece.

    Regardless, I don’t see myself getting anywhere with this one. There are far too many loaded assumptions for me to get my head around. I’ve sat here for half an hour staring at it and now have more questions than answers.

    Why does the down-ward squiggle in L1G1 sometimes break the center-line, but other times like L4G3 (the second instance of the glyph) the centerline continues?

    What difference does it make when a glyph starts below the line, or when the same glyph goes through the line, or when it’s not even on the line?

    Are the dots the same glyph, or different glyphs when they’re in different places?

    Reply

  44. Francis’s avatar

    A very vague memory (from 30 years ago.) This wasn’t designed to be a code. It was designed to be beautiful typography. So ligands are a reasonable thing. It was designed by a computer scientist. There is underlying structure and rationale to it. There is beauty in that too.

    I’m trying to avoid looking at it again. Too much other stuff to worry about without going down that slippery slope.

    Reply

  45. Stilgherrian’s avatar

    @Francis: Absolutely everything you say there is true, except that the word you were looking for is ligature rather than ligand.

    While this isn’t a code intended to hide meaning, but a script intended to transmit meaning, some techniques of cryptanalysis can help. Turing would have noted that they’re all codes anyway.

    @Daniel Edmonds: Your reading of L1G3 is now correct. You just broke it up into one more piece than was correct, as you suspected to begin with.

    I’m reluctant to give too many more clues, as the conversation already links to everything you’d need, including a chart that… well, perhaps that’s saying too much. But here’s three little tidbits to help you along.

    1. You’re asking the right kind of questions. Everything has meaning, except for the things that don’t. Make a list, and cross off one item. You have already mentioned the item.
    2. While I did say that detailed consideration of Tolkien’s Tengwar script would take you down the wrong path, Tengwar and this script have one specific feature in common. If George Bernard Shaw were still alive, he’d know what I was referring to.
    3. “And thirteen shillings?”

    That’s your ration for the day.

    Reply

  46. Daniel Edmonds’s avatar

    Just in case anybody is reading this while I’m working on the next bit, Stil’s reference to George Bernard Shaw is a reference to the Shavian alphabet, a phonemic/phonetic alphabet. http://en.wikipedia.org/wiki/Shavian_alphabet At first I thought it was in reference to the diacritics that the Tengwar script uses to denote vowels, but the Shavian alphabet does not have these.

    Something interesting about the Shavian script is that whether the letter is written above or below the centerline denotes if it is a hard or a soft sound (i.e. the ‘th’ in ‘they’ and ‘thick’).

    The Elvish language is also phonetic, and uses a similar system to change a hard sound to a soft sound. From what I have briefly read, either a line is added or the number is spun around…

    I have to look more into this.

    ‘Everything has meaning’ is most likely in reference to the idea that every stroke has meaning, except for one, the centerline. I’m trying to make a list, but for some characters it’s still hard to separate the different ‘ligatures’. I still am not sure about L1G1 – not counting the middle line, there are three lines + the two dots. Is this three phonemes? Four? More? Less? Gah. L1G2 is similarly difficult – I’m counting three ligatures – the first vertical line, the second vertical line with the ‘4’ loop and the two dots – three phonemes?

    I’ll give it to you Stil. I’m hooked.

    Reply

  47. Stilgherrian’s avatar

    @Daniel Edmonds: I’m glad you got the Shaw reference. I won’t add anything just for the moment, because I think there’s enough pieces to play with for the time being.

    Reply

  48. Kate’s avatar

    I find myself wondering if the script includes punctuation as well as phonemes?

    Reply

  49. Stilgherrian’s avatar

    @Kate: Nope, there isn’t anything there in the way of punctuation. In this particular example, if there is a new sentence, it would just start on a new line.

    Reply

  50. Aaron (bornfor)’s avatar

    Am I looking at it correctly, then, in that there are three sentences total?

    Reply

  51. Stilgherrian’s avatar

    @Aaron (bornfor): If I remember correctly, that is true, yes. I’m only hedging that answer because my source material is in storage and I’m relying on my memory.

    Reply

  52. Murfomurf’s avatar

    I’m onto it. Slow but getting it. The double dots are the most mysterious parts but I’ve found some charts on the net that are helping me decipher the straight & curved lines. I think I probably know what you mean about the “sound” aspect of this code: when I looked at some of the really squiggly bits I recalled seeing something like it that my mum used to do, but she never taught me how. Grr…

    Reply

  53. Aaron (bornfor)’s avatar

    I’ve gotten them all broken down into what I think are the various ‘characters’. They’re relatively consistent, so it wasn’t.. awful.

    My one confusion with this is the same as @Murfomurf was saying- the dots. I have them classified as numbers. Number two goes 2, 2a, 2b, based on the dots, etc.

    @Stilgherrian, I know you’ve given vague hints before, but would you mind hinting *once more* about how you’ve altered the alphabet? You said it was minor, but does ‘minor’ imply that the alteration was only for vowels or consonants- or for a small number of both?

    I ask because I’ve gotten ‘spaces’ written out already with each character’s temporarily-assigned number, but am unsure how to proceed from here.
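
    The numbering scheme described above (a base shape such as 2, plus dotted variants 2a and 2b) can be sketched in a few lines of Python. This is only an illustration: the label format, the function names and the sample transcription are all made up here, and none of it is taken from the actual chart.

```python
import re
from collections import defaultdict

# Hypothetical label format: a base number plus an optional letter for the
# dot variant, e.g. "2", "2a", "2b".
LABEL = re.compile(r"^(\d+)([a-z]?)$")


def split_label(label):
    """Split a label like '2b' into (base number, dot-variant suffix)."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return int(match.group(1)), match.group(2)


def group_by_base(words):
    """Collect, for each base shape, the dot variants seen in the text."""
    variants = defaultdict(set)
    for word in words:
        for label in word:
            base, dots = split_label(label)
            variants[base].add(dots)
    return {base: sorted(seen) for base, seen in variants.items()}


# Made-up transcription: each inner list is one word of temporary labels.
sample = [["1", "2a"], ["2", "5", "2b"], ["5a", "1"]]
print(group_by_base(sample))
```

    Grouping the variants under their base shape makes it easier to see which shapes ever take dots, which is the kind of pattern the hints keep pointing at.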

    Reply

  54. Stilgherrian’s avatar

    @Murfomurf and @Aaron (bornfor): Having just written an icy-angry response to someone else’s comment, I feel the need to re-balance my karma by giving you three hints.

    These may or may not help.

    • Some characters do indeed have dots. So does “ö”. But not “o”. We’re not speaking German here, but what if an umlaut only had one dot? Is that even a sensible question to ask? Maybe the compass is mis-aligned.
    • “Pale, yes, but with far more hops. It’s a long way, after all!”
    • Everything is probably more consistent than you fear, but nevertheless quirky.

    This is fun.

    Reply

  55. Aaron (bornfor)’s avatar

    Alright, here’s my proposed list of characters.

    https://docs.google.com/drawings/d/1g6j7-PVi7C9VwOarvJZE9zIjTOsMt6JBzkBdB7oRiO8/edit?hl=en_US

    Murfomurf, were you getting something similar?

    Reply

  56. Stilgherrian’s avatar

    @Aaron (bornfor): I like your chart so much I’m going to embed it here on the page so people don’t have to click through.

    I like the blue baseline. What would that be called in other contexts, I wonder? But my mind wanders…

    Reply

  57. The Bigfella’s avatar

    I am assuming that each glyph is a word. It seems that most have drawn that conclusion.

    No one seems to have mentioned the fact that the centre line is not drawn from the start to the end of all “words”. It may not be relevant, but it seems that a horizontal line is present in all “words”, but there are breaks in some.

    Is the break symbolic of syllables? L1G3, for example, starts with a character (without the horizontal), then a series of characters (with a horizontal), then a break to the next series of characters (with a new horizontal).

    I would also like to explore the idea that the horizontal line actually splits a vertical line in two – in the image above, number 5 is not 1 vertical line, but actually 2 – one above, one below.

    Unfortunately I am as yet unable to do anything to develop these thoughts further, but wanted to share to see if anyone had an opinion on them.

    Reply

  58. Joel’s avatar

    One thing I haven’t noticed much about in the comments so far, but which may hook back to some early clues: if it is English written in an alphabet, nothing says it is written in a direct analogue to the *current* English alphabet of 26 characters. There are, after all, 44 phonemes in English, and some of those currently represented by pairs of graphemes were at one point represented by single graphemes that are no longer in use (for example, the “thorn” character used in Old English) or as ligatures (the “ae” ligature rather than an “a” followed by an “e”). Jason L. did touch on the latter with the comments about languages whose characters appear to have line-linkages, sort of, and ligatures serve(d) the same basic purpose in English, albeit usually being a typographical convention.

    Since the main place that I have seen either of these written out is in older English, which was Tolkien’s primary subject of study as a linguist — especially the thorn case — it seems plausible that attempting to do a character substitution against English written using some or all of the possible variations if you include older characters and/or ligatures might result in a more productive analysis.

    Or it could even be as simple as a true phonetic alphabet for English, in which case you need to use a slightly different form of the classic frequency mapping, as several of the high-frequency characters actually have multiple phonemes associated. I don’t know that I’ll have time to delve into it, but some thoughts in case anyone else cares to try the tack.
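
    As a rough sketch of that last idea: if the script is phonemic, the counting has to be done over phoneme symbols rather than letters, so ‘th’ counts once instead of as a ‘t’ and an ‘h’. The Python below assumes the text has already been transcribed into lists of symbols; the sample phrase and its transcription are purely illustrative and have nothing to do with the actual quote.

```python
from collections import Counter


def symbol_frequencies(words):
    """Count symbols across all words.

    Returns (symbol, count, share of total) tuples, most frequent first.
    """
    counts = Counter(symbol for word in words for symbol in word)
    total = sum(counts.values())
    return [(symbol, n, n / total) for symbol, n in counts.most_common()]


# Illustrative phonemic transcription of "the cat sat on the mat".
sample = [
    ["ð", "ə"],
    ["k", "æ", "t"],
    ["s", "æ", "t"],
    ["ɒ", "n"],
    ["ð", "ə"],
    ["m", "æ", "t"],
]

for symbol, n, share in symbol_frequencies(sample):
    print(f"{symbol}\t{n}\t{share:.2f}")
```

    With only a few dozen words the counts will be noisy, so a table like this is better for spotting the two or three most common symbols than for ranking everything.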

    Reply

  59. Joel’s avatar

    My working assumptions, wrong as they may be:

    1. The “clearer” spacing is handled at least roughly as in English: to wit, it designates word boundaries. The “one word per line” just doesn’t make sense to me; it would produce a quote that is far too short for most of the significant things in LeGuin’s work, and there is far too much complexity within a single line for it to realistically represent one word in something written in English. Welsh, perhaps, but not English. :)

    2. The “disconnect spacing”, such as that seen in the third grouping on the first line, does *not* indicate a word break. It may or may not indicate a phoneme break or a syllable break.

    3. There is a mapping to either the full IPA or the set of 44 English phonemes, though given that it is a text from English, presumably the English subset of the IPA would be sufficient.

    4. There is probably significance to the repeated patterns that appear in different compositions, such as the frequently-seen vertical stroke with a cross-stroke at the top and one or more dots, which appears in both of the first two groupings, the second time with a diagonal linkage, which is then itself repeated *without* the lower portion in the middle of groupings later in the same line. There is also a rounded-linkage variant that appears multiple times later on.

    5. That blue line is actually positioned as a midline (or centerline). A baseline generally occurs lower down; roughly around where the left horizontal stroke on the opening grapheme appears, running just below the “leftward V” on the last grapheme of line 1, etc.

    6. Beware the serifs. The original appears to have been done using a calligraphic style, possibly a classic ink pen, evident in the smoothly-broadening strokes and small “tails” in some spots that run exactly across the end of the stroke. See the opening of the third grouping on the last line for a very clear example of this. Details are important, but not *all* details necessarily have the same importance. Anyone know if S. is right-handed? The angles are roughly correct for a right-handed person writing in a classic humanist style (with the nib at a 30-to-45 degree angle to the vertical of the writing).

    7. The comment S. made highlighting the two dissimilar corners (sharp vs. rounded) is more obvious if you have an understanding of the physical strokes used to produce the two types with a nib pen: assuming a right-handed writer, the sharp form goes to the left at the base of the stroke, then comes back across to the right, while the soft form goes directly to the right and probably involves a slight (possibly not entirely comfortable) twist of the wrist and fingers to keep it from producing an excessive ink blot or having a strange “wobble” to it.

    8. Some of the “connective tissue” may be optional; the same character appearing at the start or end of a word may have different components than when it appears in the middle of a word. Consider classic English cursive writing for an example of this.

    9. It may or may not be relevant, but the Old Speech from which Hardic is derived was (is) the language of *dragons*. Hardic is the human adaptation of it. The nature of a stroke or dot may have to do with how someone would imagine them being written by dragging or tapping/pressing with a claw, rather than a pen or a human hand. Alternatively, if inspired by runic languages, it may have the same sort of thing going on in a different way.

    Reply

  60. Stilgherrian’s avatar

    @Joel: That’s a glorious set of working assumptions, containing little that’ll lead you astray. I’d been wondering whether to point out that this is a copy of an original that was done with a nibbed pen, replicated as best I could with a fibre-tip or similar. I am right-handed. There is no need to invoke dragons, or similar.

    Reply

  61. Dario’s avatar

    Hello. I’ve reached this site some days ago, following the Google name policy rant. It’s a shame this script challenge is still unsolved after all these years and with all these clues, so I am trying my hand at it. My considerations so far are the following:

    1. The text reads left-to-right, top-to-bottom, just as most Western scripts. I take this for certain, because penstrokes were clearly drawn that way. The centered text of lines 2 and 5 make it look like a title page, which would be an enormous clue.

    2. The text is made up of 29 words, which can be identified by whitespace as per @Joel’s assumptions 1 and 2, which are completely correct in my view. In the following figure I have placed each word in a green-bordered box. Maybe this breakup is not completely correct, but I’m confident it mostly is. The first word is also the most frequent (5 occurrences) and my guess is that it represents the article “the”. The fourth word occurs 4 times, and it’s probably a preposition (of, in, on …)

    3. The most prominent feature of the script is the line (code named “blue line”, painted in blue in my figure) which appears in every word and is sometimes interrupted. I’ll call “ascenders” the strokes above it and “descenders” those below it. Even if a long slash such as the one at the beginning of L1W2 really is only one pen stroke, I will analyse it as two strokes, an ascender and a descender. These are standard typography terms. A unique feature of this script is that, while descenders can live either with or without a blue segment above them, an ascender always requires one. The reverse is not true: in two instances, L1W3 and L6W4, blue segments start without being triggered by ascenders.

    4. In almost each case where an ascender and a descender are drawn one above the other, possibly in one pen stroke and probably as part of the same letter, one of them is “more complicated” than the other. This prompts me to classify the candidate letters of this script into four groups:
    (i) strictly negative: consisting of descenders only. These are the only letters which can live without a blue segment. All other categories require one, because they involve ascenders.
    (ii) extended negative: descenders, with an additional ascender (a simple vertical segment)
    (iii) strictly positive: ascenders only (a rich inventory of hooks, loops, dotted variants…)
    (iv) extended positive, the same as (iii), with an additional descender, which is a vertical segment (sometimes with a dot)

    5. Where do I put the simple slash which begins L1W2? It is an extended positive to me, because this script does have ascenders consisting of a single vertical stroke, while I don’t see descenders of this kind. If this is not clear, never mind, it’s only an attempt to rationalize impressions. I might be wrong with this detail, but I’m after an overall picture now.

    6. What is urgent now is to break up words into letters, and identify which letters are instances of the same character. My attempt at the first step is this figure, where letters in odd positions appear red, so that they can be distinguished easily from letters in even positions (positions are relative to words). To point to an individual letter I’ll use a notation like L1W2.1, as in the assertion: “L1W2.1 (the first letter of the second word in the first line) is almost certainly an instance of the same character as L2W1.2” (an assertion I hold to be true, BTW…).

    7. I have two more pictures. In the first one I erased the blue segments altogether. This is clearest to my eyes when it comes to splitting words into letters, but in this way some information is lost, because I don’t know any more which strictly negative letters had a blue segment above them, and which didn’t. So here is fig. 3, where only blue segments above strictly negative letters are drawn.

    8. What about L1W1.1, a horizontal descender with no blue segment above it? Is it a separate letter or part of the extension descender of L1W1.2? I’ve decided for the former, but since it appears only in this one (frequent) word it could also be a special abbreviation, so I don’t care much.

    9. I’ve analyzed the initial squiggle of L1W7 as a ligature of two letters, because I think I have instances of those same characters elsewhere in the text, while there are no other instances of the ligature. Again, I might be wrong. Such uncertainties are normal.

    10. This post is already too long, and I have still many more observations about the graphic features of this script. I will post them in the near future (I hope), together with my character count, that is, my guess at which letters are instances of the same character, to estimate how many characters are in the script. What is certain is that this script has internal structure: strict characters have corresponding extended ones, dotted characters correspond to dotless ones, and so on. This means that even if not all characters of the script are represented in this text, I might figure out what the missing ones look like. And even before actually counting, I can tell that the number of characters could be much higher than the 26 letters of the English alphabet.

    11. So this is my final consideration by now: the script is either an alphabet or a syllabic script, a static encoding (or simple substitution cipher, if you like) of something which is probably not ordinary English spelling: this would be the case if the number of characters was 26 or less. It is more likely that the script is a way to spell English phonetically, just as Deseret or Shavian are, with all complications of the case: phonemic inventories differ greatly across the English-speaking world, but in any case we need an alphabet with at least 40 characters, as the one we are dealing with here, according to my count. It might be a strict transcription of Danny’s presumably r-dropping Aussie accent, or an attempt to picture some kind of abstract pronunciation standard. What is more important, it is highly possible that the internal structure of the script, or only its least aesthetically motivated parts, maps to the internal structure of either syllables or phonemic inventories, or both. This happens in Tengwar, in Shavian, in Hangul, Ethiopic, Inuktitut and many other scripts.

    More soon!
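
    The position notation used in this comment (L1W2.1 for a letter, L6W4 for a whole word) is regular enough to handle mechanically. Below is a minimal Python sketch of a parser for it. The `Position` type and the function name are invented for illustration, and the sketch assumes the letter part is optional, as the examples in the comment suggest.

```python
import re
from typing import NamedTuple, Optional


class Position(NamedTuple):
    line: int               # 1-based line of the text
    word: int               # 1-based word within that line
    letter: Optional[int]   # 1-based letter within the word, None for a whole word


# Matches references such as "L1W2.1" (a letter) or "L6W4" (a whole word).
POSITION = re.compile(r"^L(\d+)W(\d+)(?:\.(\d+))?$")


def parse_position(ref):
    """Turn a reference like 'L1W2.1' into a Position tuple."""
    match = POSITION.match(ref)
    if match is None:
        raise ValueError(f"not a position reference: {ref!r}")
    line, word, letter = match.groups()
    return Position(int(line), int(word), int(letter) if letter else None)


print(parse_position("L1W2.1"))
print(parse_position("L6W4"))
```

    Having references in this form makes it straightforward to keep a table of which positions are believed to hold the same character.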

    Reply

  62. Alex Holsgrove’s avatar

    I’m intrigued by this script challenge, but I fear my lack of ability will leave me stumped. There are a lot of comments here, and before I attempt to decipher this text, I felt that I had to come up with my own assumptions first. Based on what has been said so far, have I extracted the following statements correctly?

    1. This should be a simple translation. You noted that it is simple “beginner material”

    2. The script translates to English, but not in the form in which we currently use. (Perhaps Anglo Saxon English – 24 characters)

    3. The dots indicate a vowel, or the placement of a vowel

    4. The “glyphs” represent a sound rather than just a substitution to a different letter

    5. There are 3 sentences

    6. There is no punctuation

    7. There are no numbers

    I hope people are still interested in solving this. I realise that I have a lot to learn on the subject so apologies if my comments appear ignorant.

    Reply

  63. Dario’s avatar

    @Alex Holsgrove: I know as much as you, but I’d like to comment on your points, just to clarify my own ideas:

    1. This should be a simple translation

    Yes. Confirmed by Stil several times in this thread.

    2. The script translates to English, but not in the form in which we currently use.

    Yes, Stil wrote it’s a quote from Ursula K. LeGuin’s work, in English, but “not strictly a letter-substitution cipher” and what you write satisfies both hints. However I don’t agree with

    (Perhaps Anglo Saxon English – 24 characters)

    This was indeed mentioned in one comment but Stil didn’t confirm it (nor denied). I don’t think that the encoding process involves translating LeGuin into Anglo-Saxon, nor adapting modern English into Anglo-Saxon spelling. As many have written before (including me), a phonetic rendering is more probable. Read answer to point 4 below.

    3. The dots indicate a vowel, or the placement of a vowel

    This was mentioned, neither confirmed nor denied. I am not working in this direction. You can if you like.

    4. The “glyphs” represent a sound rather than just a substitution to a different letter

    This is a possibility and I am working in that direction. Truth is, we don’t know yet.

    5. There are 3 sentences

    This was also mentioned and not confirmed. Stil hinted that “In this particular example, if there is a new sentence, it would just start on a new line”, so maybe there are 6 sentences, one per line. Or, the whole stuff is a title page with no proper sentences…

    6. There is no punctuation

    Yes. No punctuation. Confirmed by Stil.

    7. There are no numbers

    Nobody mentioned that before. I’m in fact working as if there are no numbers, but who knows?

    Reply

  64. Joel’s avatar

    I noted the repetition of the first word, but I believe it may be 4 occurrences and one *similar* glyph: the left portion of the horizontal stroke on L4G5 appears to have a “sway” (as in “sway dash”) rather than a straight stroke. Obviously this *could* just be an artifact of the reproduction, but since similarly minor variations have been previously hinted as being important distinctions…

    I toyed around with the first word being ‘the’, but it just doesn’t feel right. The structure, and the ways in which it appears elsewhere, make me think it is probably a single phoneme, which pretty drastically reduces the available options. This could be way off base, however.

    One idea I toyed with is the possibility that the glyphs have some relationship to the actual *glyphs* of the IPA. In particular, the single-high and single-low dots might relate to the primary and secondary stress markers, placed on the right-hand side of the glyph rather than before it as the IPA does — especially since in some American English dictionaries, the stress marker comes *after* — and a break in the horizontal line might equate to a syllabic separation marker. Alternatively, the double-high-dot might be a primary stress, single-high-dot might be secondary stress, and a low dot might be a syllabic consonant?

    Just noticed, while looking at the possibilities of that, that there are only two cases where a low dot happens *without* at least one high dot, and both of those have “crowded” upper areas… but that feels too complex. Real scripts don’t generally move indicators around, and if it were a “displaced” single high dot then there would be no reason to have the “single high dot with low dot” that shows up at least twice. Interestingly, both times *that* pattern appears are on the same base glyph, which appears alone with a single high dot as L4G2. In any case, given the pattern of usage I strongly suspect the dots to be diacriticals of some sort, rather than part of the “base” glyph.

    In fact, L4G2 almost *has* to be a single phoneme, and since it is also a full word, that rather drastically limits the possibilities. My guess based on the notion of “glyph shapes being related” would be ['ai] (sorry, I couldn’t figure out if Unicode had a proper representation of this) with the “left rounded bit” coming from the a and the vertical stroke from the i, but there are other possibilities as well.

    The same theory might lead to L1G4 (probably the second most repeated grouping) being read as [ʃʌ] (“so”), derived by shortening the “leading tail” on the first glyph, rotating the second glyph 90 degrees counter-clockwise, and adding a midline stroke to indicate that they are a single syllable.

    Again, this could be way off base; there is nothing saying the script has any *glyph* resemblance to IPA or any of the other common phonetic alphabets, it is just a theory I tinkered with that produced a couple of plausible mappings.

    Reply

  65. Joel’s avatar

    Another thought that just struck me, though it *really* may or may not be significant even if correct: in general, calligraphic forms for letters start near the upper-left, sometimes with a ‘curl’, but then proceed either downward or rightward. L4G2, however, appears to be a continuous ‘flowing’ stroke connecting both the vertical and horizontal strokes through the leftward circle. So either Stil’s hand is *very* good (able to lift the pen and reset to the ‘starting position’ without producing any blotting due to excessive ink on a stem that is very narrow *and* lining it up pretty much perfectly with the curl) or this is actually a single stroke changing from vertical to horizontal through the curve.

    The latter explanation would be much more typical of a handwritten script, but it implies that either the vertical is a *rising* stroke, or the horizontal is a *reverse* (leftward) stroke. The possibility of it being a rising stroke is only significant if the script *is* based on ascenders and descenders affecting things, which I’m not convinced of, and then in the context of “what if ascender/descender involves the stroke itself, not the position relative to the centerline?”

    Otherwise it is just an interesting quirk of the calligraphy system for the script. Stil, any chance you remember whether you were trying to ‘trace’ it from the prior copy, or actually ‘write’ it out in your own hand (i.e., did you focus on replication of the original image, or of the writing pattern of it)? Just for my own curiosity. :)

    Oh, and now I’m wanting to define a font file for it, dammit. As if I didn’t have enough to fill my time…

    Reply

  66. Stilgherrian’s avatar

    Welcome to the new participants! Just a quick comment today, to clarify a couple of things being discussed.

    • The text is in modern English. No need to pursue Anglo-Saxon or any other forms.
    • The image I created here is an attempt to make my copy look as much like the original as possible, including any idiosyncrasies of the original writer’s hand. However I only had some sort of fine-line marker that made fixed-width strokes. So to recreate the appearance of a brush or nibbed pen, I “drew in” the wider and narrower strokes, making the lines fatter at certain points and so on.
    • I’m happy to confirm that there are three sentences. I think. Certainly they’re normal prose and not any sort of title sequence.

    Reply

  67. Joel’s avatar

    Toying with the possibility that L4G1 and L4G2 are formed based on the “vertical stroke with a left closed curl at the top” being ɪ, the “zig-zag” at that particular spot being ʒ, and the “horizontal stroke across the top” being ɾ, so that the two groupings are “ɪʒ ɪɾ” (“is it”).

    Part of the idea being the combination of hints about it *not* being a cipher so much as especially flowery/fancy calligraphy for an alphabet. I’m assuming the IPA for lack of anything that seems to fit better, but if anyone knows of something with a better match, speak up!

    Reply

  68. Alex Holsgrove’s avatar

    Joel, you seem as hooked as I am now. Over the weekend I made a start on removing the “mid-line” and extracting each glyph (forgive my terminology) – so I am slowly building a collection of large glyphs, those above the mid-line and those below.

    I’m pretty sure it’s not a simple substitution of letters, but I’m quite keen on the idea of substitution into phonetics. Because it’s only 3 sentences, I don’t know how well a frequency analysis would work. Do you think that would be worth pursuing?

    Any thoughts on the dotted letters? I translated the Thai that Stil posted “Liver, sweet and very tasty.” – I had to remove the exclamation mark to get it to translate.

    Reply

  69. Stilgherrian’s avatar

    @Joel: I think the key to the comments about it not being a cipher but an alphabet is that a cipher is intended to hide meaning while an alphabet is intended to make meaning clear. Yes, I realise that’s a category error of some kind. But there is indeed a certain logic.

    @Alex Holsgrove: In a short text, frequency analysis can only get you a little way, yes. But then you might wonder what things are similar to other things, and why.

    Transliterating ตับหวานอร่อยมากๆ phonetically gets you “tub waan aroi maak”. Tub waan is a spicy sweet liver salad from the Isan region of Thailand, and one of my favourite dishes. The word “aroi” means tasty or yummy. And “maak” adds “very” or “many”. So this sentence is how you’d compliment the chef. “The tub waan is delicious!” I have used this sentence many times.

    Reply

  70. Joel’s avatar

    Clarifying my thought about cipher vs. calligraphy:

    A cipher can be “secret writing”, but it can also be properly used to describe any transliteration, even just going from one alphabet to another. For example, saying 0x63697068 0x65720000 is a cipher for “cipher”, even though it is a trivial 1:1 in-order mapping of the ASCII values into hexadecimal. Then again, saying it as 0x83899788 0x85990000 is also a cipher, but is probably rather significantly less obvious to most folks (me included). Figuring that one out is left as an exercise for the reader. :)
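
    For anyone who wants to check the first of those, here is a small Python sketch of the ASCII-to-hexadecimal mapping. The function names and the choice of zero-padded 4-byte groups are assumptions made to match how the value is written above. The second value is deliberately left alone.

```python
def to_hex_words(text, word_bytes=4):
    """Encode ASCII text as zero-padded hex groups, word_bytes bytes each."""
    raw = text.encode("ascii")
    raw += b"\x00" * (-len(raw) % word_bytes)  # pad up to a whole group
    return [
        "0x" + raw[i:i + word_bytes].hex()
        for i in range(0, len(raw), word_bytes)
    ]


def from_hex_words(words):
    """Reverse to_hex_words: join the groups and strip the zero padding."""
    raw = b"".join(bytes.fromhex(word[2:]) for word in words)
    return raw.rstrip(b"\x00").decode("ascii")


print(to_hex_words("cipher"))  # ['0x63697068', '0x65720000']
print(from_hex_words(["0x63697068", "0x65720000"]))  # cipher
```

    The mapping is reversible and needs no secret, which is the point being made: it is a cipher in the broad sense without being encryption.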

    By comparison, calligraphy is an ornamented or decorative rendering of a glyph from any alphabet, which I suppose is technically orthogonal to the question of whether something is actually a cipher. Calligraphy may go so far as to rearrange portions of the glyphs somewhat, though that isn’t all that common, but I was assuming that it may have been done to at least some degree in this case.

    However, *encryption* does in fact require that the encrypted form be unknown to all but a small group, as the “crypt” root in this case means “hidden”. Encrypted information pretty much requires applying cryptanalytic skills to extract meaningful-to-the-public information from it. If using a 1:1 transliteration (whether that’s 1:1 by grapheme or phoneme), the line is very blurry and basically boils down to “is the alphabet in question sufficiently common or obvious from context that the audience could be expected to know it or readily find it”. So, for example, something based on the IPA (or any of the alphabets that are well known and can encode English phonemes) would not be encrypted, since we’ve been told that it is “plain English” and almost certainly phonetic, while a transliteration using the “dancing men” cipher would be an encryption, or at least bordering on it — it would depend on how much of the audience would be expected to recognize it and know where to find the key for it.

    All of that said:

    The approach I had been toying with involved assuming that the message was written out in IPA or some close kin (definitely not a cipher in the “secret” sense, though certainly not many folks can sight-read it without a reference; I certainly can’t), but written using a calligraphic style with some additional items like the mid-line. I haven’t decided yet whether this is actually going to bear fruit; it definitely leads to coming up with several possible substitutions in short order, such as the ones I discussed in previous posts, but there are a lot of pieces that I haven’t been able to pin to much of anything yet.

    My basic approach was to look at some of the shortest groupings, especially ones that were either repeated stand-alone or repeated as parts of longer groupings, and try to map those as a “hook” into the rest. The idea being that very short groupings represent words that can only have a couple of phonemes, at most, which both drastically reduces the number of possible phoneme combinations that could go into it, and *very* drastically reduces the number of English words that they could possibly form. English actually has remarkably few one-or-two phoneme words, as far as I can come up with, but several of them are very commonly used ones, which makes sense.

    My approach to the dots was to assume that they mapped in some fashion to the modifiers for IPA (reduction, primary/secondary stress, one other I’m spacing on at the moment), and given the positions that they were probably being treated as diacritics (composed *with* the glyph they modify, the way an umlaut is, rather than preceding it as they do in normally-written IPA). I had toyed with various mappings (double dots being the reduction modifier, since it is two dots, vs. a single low dot being the reduction modifier), but hadn’t come up with anything terribly concrete yet.

    Obviously this may or may not be even remotely on base, but if I make the assumption that it is in a completely arbitrary system of writing that was created by the original person who wrote it out… well, I don’t actually enjoy doing cryptograms, generally; I just don’t find them that interesting. So I’m going with the approach that appeals to me, since if I’m right it is satisfying, and if I’m wrong I’m no worse off than if I tried to do it as a phonetic cryptogram. :)

    Reply

  71. Stilgherrian’s avatar

    @Joel: All that’s quite sensible stuff. I think you’ve gathered the evidence so far into a coherent view of the problem in front of you. I’m not going to confirm or deny individual details, but I will say that there’s very, very little in what you say that might be leading you in a wrong direction. Also, there’s not all that much in this script that’s calligraphic ornamentation.

    Reply

  72. ryan’s avatar

    just stumbled across this. has it been solved yet? what is the prize?

    Reply

  73. Stilgherrian’s avatar

    @ryan: No, it has not been solved yet. It’ll be flagged rather heavily when it is. As the original post says, “I’ll negotiate a suitable prize for the first person who posts the solution.” That’ll depend to a large extent on who solves it, where they’re located, and how much of an arsehole they are. The more their motivation seems to be about prize-winning rather than joyous problem-solving, the smaller the prize will be.

    Reply

  74. Joel’s avatar

    Got dragged into a major project at work shortly after my last comment, but I haven’t forgotten this, or given up… pondering what other tools I may be able to bring to bear in decomposing the graphemes, at the moment.

    Reply

  75. Dario’s avatar

    Same here.

    My motivation is “joyous problem solving” and my life is presently organized on a “duty first, pleasure next” basis, so there is not much time left for this challenge. However, I am still on board too.

    Reply

  76. Stilgherrian’s avatar

    I still can’t help thinking that people are over-complicating their approach to this. Some of the most basic cryptanalysis techniques should get you almost all of the way there. Particularly these days, where soft copies of the author’s books are available.

    Reply

  77. Joel’s avatar

    True. I just hadn’t managed to find them in a form I could apply some of those to. Besides, it felt like cheating, if the idea was to actually figure out the script. :)

    Reply

  78. Stilgherrian’s avatar

    @Joel: That said, after more than four years maybe a little bit of “cheating” is called for.

    Reply

    1. Joel’s avatar

      For anyone curious, the “cheating” method I was thinking of is simply doing a multi-line regular expression match based on matching up all of the spots where a word appears more than once in the script, based on distance-in-words. There aren’t many, but it also doesn’t take many to narrow the field down a *lot*, quite possibly to a single possibility.

      However, (A) I don’t have access to an online copy of the books that would allow that level of detailed search, and (B) for me it is more fun to ponder the script, anyway. :)
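
      A minimal Python sketch of that idea, under stated assumptions: instead of a literal multi-line regular expression it slides a window over the tokenised text and compares repetition patterns, which amounts to the same test. The function names and the toy corpus are made up for illustration; the pattern `ABCDEFGD` encodes only the confirmed fact that an eight-word line repeats its fourth word in eighth position.

```python
import re


def repetition_signature(words):
    """Number each distinct word by order of first appearance: a b a -> (0, 1, 0)."""
    first_seen = {}
    return tuple(first_seen.setdefault(w, len(first_seen)) for w in words)


def find_matches(corpus, pattern):
    """Return every run of words in corpus whose repetition pattern equals pattern's."""
    tokens = re.findall(r"[a-z']+", corpus.lower())
    target = repetition_signature(pattern)
    n = len(pattern)
    return [
        " ".join(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if repetition_signature(tokens[i:i + n]) == target
    ]


# Toy corpus, not taken from any book.
corpus = "One two three of five six seven of nine."
print(find_matches(corpus, list("ABCDEFGD")))
```

      Adding the repeats from other lines to the pattern narrows the candidates further, at the cost of needing the whole passage in one window.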

      Reply

  79. Alex Holsgrove’s avatar

    Perhaps as someone with little skill in phonetics, translations and cryptanalysis – I may be at an advantage by, as you say, keeping things simple.

    I had started to just copy each of the “symbols” (again, please excuse my terminology) with the aim of converting them into a “sound” and then simply try and read the sentences.

    The hard part would then simply be trying to work out where these “sounds” can appear on their own, or as a part of a word.

    Having said that, I think Joel will probably nail it soon…

    Would I be right in saying that where that “mid-line” connects the different “symbols” – we are looking at a word? Where they are not connected, and we see isolated “symbols”, these are words like in, at, of and so forth?

    Reply

  80. Stilgherrian’s avatar

    @Alex Holsgrove: The technical terms you’re after are: “glyph” for what you call a “symbol” (ordinary people call them “letters” and “numbers” and “glyphs”, but this term covers every atom of writing); and “phoneme” (an individual unit of sound, which may be written using one glyph as in the English “k”, or two glyphs such as the English “ea”).

    To answer your final question, the top line has eight words.

    Reply

  81. dario’s avatar

    “To answer your final question, the top line has eight words.”

    That was my count too (see my very first post). Thank you very much for confirming it.

    Reply

  82. Joel’s avatar

    A question that I keep running up against: if this is some form of transcription in a phonetic alphabet, is it transcribing the passage as spoken in AuE (Australian English), RP (Received Pronunciation / British English), GA (General American English), or based on a specific reading of it by someone?

    UKL doesn’t provide a ‘standard’ pronunciation guide, specifically because she believes that the names should sound like whatever the reader reads them as (or something to that effect, I found it buried in some comments on her website).

    The other half the fun seems to be that there were major changes to IPA in 1989, so even if it *is* based on the IPA, it wouldn’t necessarily be written out with the same symbols today. Or it may not be at all and I’m just barking up the wrong tree…

    Line 1 word 8 continues to give me fits, because assuming that the first grapheme maps to either ‘s’ or ‘ʃ’, I cannot come up with any phoneme that both forms a real word *and* has an IPA glyph even remotely close to the one in the image.

    Reply

  83. Stilgherrian’s avatar

    @Joel: Well, we were living in Adelaide at the time and the local accent for us private-school types is closer to RP than anything else.

    HERE IS THE BIGGEST CLUE OF ALL: Note that in line 1 the word at word 8 also occurs at word 4. And it’s one of the most common words in the English language. And I’m almost entirely sure that it doesn’t begin with either ‘s’ or ‘ʃ’. There is nothing in this script that could be seen as a visual echo of IPA glyphs.

    Reply

  84. dario’s avatar

    @Stilgherrian: in my August post I had said that L1W4 appears four times, assuming L1W4 = L1W8, which you confirmed, and also that L1W4 = L4W4 = L4W7, which I take for confirmed as well. You also said that it is a very common word, which is compatible with my guess that it is a preposition (or possibly the verb “is”). I don’t think it is an article: it appears before L4W5, which I believe to be an instance of the same word as L1W1 (in spite of a very small difference in the initial stroke) and is also a frequent word. So I take L4W4 L4W5 to mean something like “Of the”, “is a”, “on a”, “Of my” or similar: “the of” “a is” and the like are probably out of the question :-)

    In fact, for me, the RP clue is bigger than L1W4 = L1W8. Thank you very much for it, but please, please, please no more clues. It’s Friday morning now up above here (as opposed to Down Under). Please give me a weekend before further spoilers. :-)

    Reply

  85. Stilgherrian’s avatar

    @dario: It’s a deal. No more clues for a while. Besides, to provide more clues than those already in the mix would require me remembering the entire solution. ;) Have fun, folks!

    Reply

  86. dario’s avatar

    I think I have it

    … and in retrospect, it was “real beginner-grade material”, as Stil posted in October 2008. In fact, after a couple of hours before my own first comment in August, I needed only last Sunday and some more hours yesterday night to break it. I didn’t work on it in the meantime. However, it was clear that Stil was growing impatient (after 66 months!) and he was giving away too many clues. So finally I decided to test my initial hypothesis and it proved right at the first try.

    Well, something is still missing from the picture. I could not identify the source work. Googling the last sentence, which is a motto, I found video games and other universes apparently not related to Ursula K. Le Guin. Since the script is a phonetic representation of English, I know how to pronounce but I cannot retrieve the original spelling of three words: two are fictional place names and the third is a generic classifier for one of them (as if they were “Australia”, “New South Wales” and “Commonwealth”). In my solution text I have used plausible spellings for them, in brackets. The challenge text is very short (110 characters) and it doesn’t cover all sounds of English. The internal structure of the script enables me to figure out how some of the missing sounds would be represented, but unfortunately not all (I was too optimistic in August.) There are also other minor doubts.

    I am confident, however, that what I’m posting here is a correct solution, and I claim the prize.

    Reply

  87. Stilgherrian’s avatar

    @dario: Well, Sir, you have it! Congratulations! And to save people having to click through, here’s your image of the solution.

    I must now confess that I seem to have led everyone down the wrong path. This is not the work of Ursula K Le Guin at all! It’s actually a document of some sort related to the fantasy universe of Danny, the guy who developed the script!

    The place names Yocentro (as you’ve styled it, but I think Danny transcribed it differently in the Roman alphabet) and Rothmile trigger memories of a sandy desert planet. The well-wishing of “May the Sands be with you always” fits too.

    I haven’t been in touch with Danny for years, but I’ve just emailed him to see if he can provide any missing pieces. But it was more than 30 years ago…

    All that said, I do remember seeing and handling a document written in this script that was a Le Guin quote. I just gave you all the wrong document. Apologies!

    On more practical matters, dario, “I’ll negotiate a suitable prize for the first person who posts the solution,” I said. I’ll email you privately about that tomorrow Sydney time.

    Reply

    1. Joel’s avatar

      That plus the “not a calligraphic variant of IPA” would explain the utter lack of success I was having (I hadn’t gotten around to beating on it further after the ‘it is not any variant of IPA or related phonetic language’ hint).

      I would argue that a script that has no particular relationship to any common phonetic alphabet does qualify as encrypted (apropos earlier discussion of where you cross from ‘transliterated’ to ‘encrypted’). That said, it is still a fairly pretty set of graphemes, and I’m still curious about the full glyph breakdown / rules of construction…

      For example, what are the rules about the breaks in the midline? The word ‘Sands’, for example, would seem to disprove my original theory that they might represent syllabic breaks, but the only pattern I can really find to them looking at it right now is that they seem to often (always?) be associated with a phoneme that starts with an ‘s’ sound.

      And similar questions remain about the dotting; two very visually similar characters (the second phoneme in each of the first two words of the last line) have what appear to be quite different sounds mapped to them, which as far as I know is fairly atypical of natural writing systems. Do you know/remember if there is a significance to this, or is it just that it happens to be a constructed system and that was how the creator felt like drawing them? :)

      Reply

  88. Bob Bain’s avatar

    In 2008 I wrote

    Observation 1. It could be a simple character substitution code given that at least two “scribbles” are repeated throughout the script – the first “scribble” and last “scribble” on the first line for instance. If this is the case the commonly used English character frequency of letters table starting ETANOISH… could be of use.

    ———————–

    I can see now that an excellent starting point would be simple word substitution – with 5 “the” and 4 “of” in a set of 28 words with the strokes representing an indication of how each word should be pronounced…. I can find “th” in [Ro"th"mile]

    Bob

    Reply

  89. Stilgherrian’s avatar

    One thing I can point out without having received Danny’s reply is that the glyphs are quite systematic in terms of how they map onto phonemes.

    Reply

  90. Bob Bain’s avatar

    “Glyphs are quite systematic in terms of how they map onto phonemes”

    Translation…

    “Find an appropriate phonetic font and type the words above into a wordprocessor” ?

    I have been searching the 1 million 600 thousand “phonetic fonts” results from Google and even image searched for “phonetic font stilgherrian” where clues to the puzzle can be located possibly in terms of the name of the image file.

    Having reached a solution we wait with bated breath for the ultimate key to the solution and whether or not Rothmile is on the Sydney City Rail network (sigh).

    PS Rothschild wasn’t the name of the banker. It was short for “The Shop of the Red Shield Company” established by Amshall Moses Bauer in 1743. Mr. Bauer’s son changed the family name to Rothschild after his father’s death.

    Bob

    Reply

    1. Joel’s avatar

      “Glyphs are quite systematic in terms of how they map onto phonemes”

      Translation…

      “Find an appropriate phonetic font and type the words above into a wordprocessor” ?

      You’d still have to type it in phonetically… if I get some time, I might sit down and try to transcribe the entire thing to IPA, just out of curiosity at what it would look like, and because it would then be possible to represent it as Unicode glyphs. If I do, I’ll be sure to post it.

      Reply

  91. Dario’s avatar

    Here is a hastily written narrative of my decipherment process. Let’s say it’s a first draft of the full report I’ll hopefully be able to write. I have many more observations, charts to clarify many points, and so on. I just haven’t got the time to put them together right now.

    When I learned of the challenge at the end of August, all hints pointed to a phonetic writing system for English (I did mention Deseret and Shavian in my first comment), so I started from there, working by the book: I broke up the challenge text (henceforth: the Document) into words, the words into characters, produced fig. 1, and, based on known facts about word frequency in English, I immediately identified three words: “the” and “of”, which I mentioned, and the single-letter L4W2, which had to be the article “a”, the most frequent English single-sound word. In cryptography jargon, such clues are called “cribs”. Then, let me quote myself:

    3. The most prominent feature of the script is the line (code named “blue line”, painted in blue in my figure) which appears in every word and is sometimes interrupted. I’ll call “ascenders” the strokes above it and “descenders” those below it. Even if a long slash at the beginning of L1W2 really is only one pen stroke, I will analyse it as two strokes, an ascender and a descender. These are standard typography terms. A unique feature of this script is that, while descenders can live either with or without a blue segment above them, an ascender always requires one. The reverse is not true: in two instances, L1W3 and L6W4, blue segments start without being triggered by ascenders.

    4. In almost each case where an ascender and a descender are drawn one above the other, possibly in one pen stroke and probably as part of the same letter, one of them is “more complicated” than the other. This prompts me to classify the candidate letters of this script into four groups:
    (i) strictly negative: consisting of descenders only. These are the only letters which can live without a blue segment. All other categories require one, because they involve ascenders.
    (ii) extended negative: descenders, with an additional ascender (a simple vertical segment)
    (iii) strictly positive: ascenders only (a rich inventory of hooks, loops, dotted variants…)
    (iv) extended positive, the same as (iii), with an additional descender, which is a vertical segment (sometimes with a dot)
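
    The four groups above can be written down as a small decision rule. This is only a sketch of the classification as described: the labels "simple" and "complex" and the function names are assumptions introduced here, and a glyph whose two halves are equally complicated is left unclassified, since the observation only holds "in almost each case".

```python
def classify(ascender, descender):
    """Classify a glyph from its strokes above and below the midline.

    Each argument is None (no stroke), "simple" (a plain vertical segment)
    or "complex" (hook, loop, dotted variant and so on).
    """
    if ascender is None and descender is None:
        raise ValueError("a glyph needs at least one stroke")
    if ascender is None:
        return "strictly negative"
    if descender is None:
        return "strictly positive"
    if descender == "complex" and ascender == "simple":
        return "extended negative"
    if ascender == "complex" and descender == "simple":
        return "extended positive"
    return "unclassified"  # neither half is clearly the more complicated one


def needs_midline(ascender, descender):
    """Any glyph with an ascender requires a midline ("blue") segment."""
    return ascender is not None


print(classify("simple", "complex"))  # extended negative
```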

    When I wrote this I wanted to state facts. I didn’t want to share my wild guesses, but I already had an idea in mind: all vowels of my cribs were extended positives, the two consonants were strict negatives. What if the positives represented vowels and the negatives represented consonants? In that case the odd behavior of those glyphs with respect to the midline would have a fascinating explanation: the midline represented voice! Vowels, i.e. positives, are always voiced, as well as sonorant consonants (the extended negatives!), while plosives, fricatives and affricates, the consonants that in English come in voiceless/voiced pairs, had to be represented by the strict negatives, which appeared in the Document both with and without a midline segment above them. If this was true, it wasn’t simply like Shavian, where the glyphs representing voiced consonants are flipped versions of the ones representing voiceless ones. In this script, voice was really written down with separate penstrokes of its own, as in some kind of spectrogram!

    (In the following, I write IPA symbols between slashes, as /hɪə/, both to represent phonemes and to represent their corresponding script letters as I decipher them. I know this is against common IPA usage and I hope this causes no confusion.)

    The hypothesis had to be verified. Of the two crib consonants, L1W4.2, the voiced final /v/ of “of” had indeed a “blue” segment above it, and this would imply that L3W4.1 represented /f/, its voiceless counterpart, but the other crib consonant, L1W1.1, the initial /ð/ of “the”, was also voiced, but had no segment above it! The whole construction, however, was too beautiful to be dismissed by that simple dash. Many writing systems have special exceptions for common words, so I didn’t consider my idea disproved, but I badly needed real data to see if actual English phoneme frequencies matched what I thought I was seeing in the Document. The ETANOISH sequence mentioned by Bob Bain is well known, but it holds for conventional spelling, and I considered it of little use here. Fortunately, one of the most authoritative living English phoneticians, Prof. John C. Wells, whose blog is in my RSS feed, had posted a piece completely written in IPA in June. It was probably long enough to extract significant phoneme occurrence statistics from it. I preferred starting from scratch and counting the phonemes myself, because Wells uses a standard transcription system I’m completely familiar with, while many articles that could be found on the web used somewhat different systems, different phoneme counts, were based on different varieties of English and would require more adaptation work (I was assuming that the Document represented an Australian variety rather similar to Wells’ British English, at least in phoneme distribution if not in realization… I hope I’m upsetting nobody with this sentence).

    In any case the timeframe I could dedicate to this matter had expired. The challenge went into my TODO list with the lowest possible priority, and it stayed there for months. Last weekend, I pushed it to the top.

    Not surprisingly, 12.38% of my sample consisted of the single phoneme /ə/. It was clear that in the Document no character was so frequent, but Wells’ is a radical transcription, where, for instance, “the” is transcribed either /ðə/ or /ði/ according to its pronunciation. If Danny, the inventor of the script I was deciphering, wanted to keep the same spelling for the same word in all positions, he might have used /ði/ throughout, reducing the frequency of /ə/. Such an approach might also have explained the disturbing fact that the single vowel of L4W2, a candidate for the indefinite article, far from being the commonest, appeared only there. I still don’t know the reason: now, I think that that letter means “indefinite article, sometimes /ə/, sometimes /eɪ/”. In any case, the positives made up 43.09% of the Document, and 39.44% of the sample consisted of vowels. Not close, but not far enough apart to disprove the theory. Maybe there were some positives which weren’t vowels. I know now that that was indeed the case: L1W2.1 appears three times, 2.73% of the Document, and represents /h/, not a vowel and not even a voiced sound. But it is a simple slash, it is somewhat outside of the system just as /h/ is a somewhat special sound, so it’s OK.
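
    The percentages above are plain relative frequencies. As a minimal sketch (the function name and the toy sample are assumptions, not the actual transcription), three occurrences in a 110-character text come out at the 2.73% quoted for L1W2.1:

```python
from collections import Counter


def percentage_shares(symbols):
    """Return {symbol: percentage of the sample}, most frequent first."""
    counts = Counter(symbols)
    total = len(symbols)
    return {s: round(100 * n / total, 2) for s, n in counts.most_common()}


# Toy sample: one letter occurring 3 times in a 110-character text.
sample = ["h"] * 3 + ["other"] * 107
print(percentage_shares(sample)["h"])  # 2.73
```

    The same function applied to a phoneme-by-phoneme list of an IPA text would give the reference frequencies to compare against.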

    After /ə/, the commonest phonemes in the sample are, in order, /ntɪslkr/. I had to go for consonants, that is, in my hypothesis, negatives. The extended ones had to be sonorant consonants. In English there are seven of them: /m/, /n/, /ŋ/, /w/, /l/, /r/, /j/, and indeed I counted seven extended negatives, all scythe-shaped, one of them dotted, either with a sharp or a rounded angle where the “handle” met the “blade”, at three possible depth levels below the midline: 3 depths × 2 angle types + 1 dotted = 7! Some strict negatives, on the other hand, were the handleless counterparts of those scythes (let them be “sickles”), while the one I had already identified as /v,f/ and /ð/ had a completely different shape. Wait! The latter were all fricatives… could the former be plosives? In that case, could the three depths correspond to the three places of articulation of English plosives? In that case, the scythe representing /n/ would be at the same depth of the sickle representing /d,t/ (with or without midline), and similarly /m/ with /b,p/ and /ŋ/ with /g,k/! (if you feel confused, this chart might help). Frequencies showed where /n/ and /t/ are. They are at middepth. Labials tend to prefer initial positions, so they had to be the shallow scythes (/m/ and /w/) and sickles (/b,p/), which also showed this preference. The velars were at maximum depth, with a very conveniently final /ŋ/ at L3W4.6, which also showed that the nasals were the rounded scythes, so that /w/, /l/, /r/, /j/ had to be the sharp-angled ones, identifying the dotted one (L4W6.1) with /j/ (I think you all know that /j/ is the initial glide of “you” /juː/, and not the “j” of “Jew” /dʒuː/)
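
    The consonant grid described in this paragraph can be summarised as a lookup table. This is a sketch of the reading proposed here, not a confirmed key to the script; the shape names, the `decode` function and the use of the midline flag for voicing are labels introduced for illustration.

```python
# Depth below the midline corresponds to place of articulation:
# shallow = labial, mid = alveolar, deep = velar.
PLOSIVES = {"shallow": ("p", "b"), "mid": ("t", "d"), "deep": ("k", "g")}  # (voiceless, voiced)
NASALS = {"shallow": "m", "mid": "n", "deep": "ŋ"}        # rounded scythes
APPROXIMANTS = {"shallow": "w", "mid": "l", "deep": "r"}  # sharp-angled scythes
DOTTED_SCYTHE = "j"


def decode(shape, depth=None, midline=True):
    """Map a consonant glyph description to a phoneme under the hypothesis above."""
    if shape == "sickle":
        voiceless, voiced = PLOSIVES[depth]
        # A midline segment above the sickle marks voicing.
        return voiced if midline else voiceless
    if shape == "rounded scythe":
        return NASALS[depth]
    if shape == "sharp scythe":
        return APPROXIMANTS[depth]
    if shape == "dotted scythe":
        return DOTTED_SCYTHE
    raise ValueError(f"unknown shape: {shape}")


scythes = list(NASALS.values()) + list(APPROXIMANTS.values()) + [DOTTED_SCYTHE]
print(len(scythes))                            # 7
print(decode("sickle", "mid", midline=False))  # t
```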

    There was also a spatial metaphor in this: the closer to the lips a sound is articulated, the closer to the midline its glyph is written. How elegant!

    At this point I had most consonants, and I understood why vowels came in strictly positive or in extended form. Since there are so many scythes and sickles which can be easily confused with each other, some of them cut the stems of the following vowel, some don’t, and this is a way to tell them apart, along with sharpness and depth. For example, the boomerang-shaped vowel L3W4.4 isn’t cut by the preceding /l/ (middepth sharp scythe) but a preceding /r/ (deep sharp scythe) cuts it at L5W1.2. There are three possibilities: cutting, overstriking and joining, as in L6W4, where a shallow sickle (a /b/) joins the vowel of the article /ði/. Hey, this is the verb to /bi/! (There are still problems with the choice of strict vs. extended form of vowels, see below.)

    Some fricatives were still missing, notably /s/, the commonest of them. A natural candidate was the commonest of the still unidentified glyphs, L1W3.4. It also appeared in a ligature with /k/ at the beginning of L1W7, which then could be read as /skr?b?/. Hmmm. “scribes” /skraɪbz/ perhaps? Tempting. This would identify L1W2 as “high” /haɪ/, solving the problem of L1W2.1, and understanding the initial sequence of L2W1 as /kh/. (/k/ is usually a cutter, as in L1W3, but probably /h/ can’t be cut at all, or /kh/ is a special case. Also, /j/ cuts L4W6.2 but doesn’t cut L6W6.2. Maybe cutting is optional for such an easily identified letter, maybe there are rules we cannot derive from such a short text. Never mind.)
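
    Filling a partial reading such as /skr?b?/ is a pattern match over phoneme sequences. The sketch below assumes a tiny hand-made lexicon with RP-style transcriptions (not a real pronunciation dictionary) and treats each "?" as exactly one unknown phoneme, so a diphthong like /aɪ/ counts as a single slot.

```python
def fits(pattern, phonemes):
    """True if phonemes has the same length as pattern and agrees on every known slot."""
    return len(pattern) == len(phonemes) and all(
        known is None or known == actual
        for known, actual in zip(pattern, phonemes)
    )


# Hypothetical mini-lexicon, for illustration only.
LEXICON = {
    "scribes": ["s", "k", "r", "aɪ", "b", "z"],
    "scrubs": ["s", "k", "r", "ʌ", "b", "z"],
    "scribe": ["s", "k", "r", "aɪ", "b"],
    "scraps": ["s", "k", "r", "æ", "p", "s"],
}

pattern = ["s", "k", "r", None, "b", None]  # /skr?b?/, None marks an unknown phoneme
candidates = [word for word, phonemes in LEXICON.items() if fits(pattern, phonemes)]
print(candidates)  # ['scribes', 'scrubs']
```

    More than one word can fit, which is why the reading was only "tempting" until the neighbouring words confirmed the vowel.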

    A small problem with reading L1W2 L1W3 as “nine scribes”, however, was that /z/ was not represented as in L4W1 and as it should be, as an /s/ below a midline segment, but with a somewhat abbreviated form, easily confused with a final /v/ (the difference is that in /v/ the glyph hovers below the midline, while in the abbreviated final /z/ it dangles from it). I still don’t know if such an abbreviated form is always optional or is restricted to the cases when /z/ is obviously a suffix (plural, third person, genitive…), but again, it’s not a big problem.

    If you have followed me to this point, you are surely able to find out the vowels for yourself. I’ll list a couple of final remarks here:

    1. L3W5 is “personage”. I’d pronounce that word /pɜːsənɪdʒ/, but the vowel values I found correspond to /pɜːsɒnædʒ/. This confirms what we had already observed, that vowel characters in this script are not precise phonetic representations. In particular, reduced vowel sounds are (often) written as the full vowel they etymologically come from, just like in conventional English spelling. This word is also the only occurrence of /dʒ,tʃ/. We don’t know what /ʒ,ʃ/ looks like, there are no occurrences in the Document, but /dʒ,tʃ/ is composed of the middepth sickle /d,t/ and a final curl. Maybe that final curl alone represents /ʒ,ʃ/.

    2. The whole Document is obviously written in an r-dropping variety of English, as shown by L3W2 /rekɔːdz/ “records” (for /re-/ instead of /ri-/ see the vowel comment above). However, L3W1 is “hereby”, and after a unique first vowel that I interpret as /ɪə/ (and is not in extended form, for an unknown reason), there is an /r/ character: /hɪərbaɪ/. The /r/ would be read only before vowels, but Danny decided to write it always, so that the same word is always spelled the same. Also, I don’t know why /aɪ/ is dotted here (and in L5W1, “Rothmile”). Maybe because there is another stressed vowel in those words. I don’t know.

    3. The final vowel of L4W6 is a bit strange. It could be unique, but I think it is an /ɔː/ as in L3W2 /rekɔːdz/ “records” (strictly positive) or in L6W7 /ɔːlwəz/ “always” (extended positive), so I read that word as /jəʊsentrɔː/ and transliterate it as “Yocentro” or “Yocentror” but I am really in doubt here.

    Thank you very much for keeping up with me for such a long post, and may the Sands be with you always.

    Dario
    (an Italian mathematician by study, sysadmin by trade, amateur linguist by passion)

    Reply

  92. Alex Holsgrove’s avatar

    Dario,

    Many congratulations on solving the challenge. I find these things fascinating, even if I knew I never had much hope of solving it before anyone else (if ever!).

    Your prize, whatever it may be, is very well deserved as you’ve clearly put a lot of time and effort into solving this.

    Well done again.

    @Stilgherrian – thank you for posting this challenge all those months ago. I found your site after seeing the google name blog post, and stumbled across this. I’ve kept a keen eye on the responses and I’m sure you’re probably rather relieved that it’s finally been solved! Do you think you’ll do anything else like this again? Many thanks.

    Alex

    Reply

  93. dario’s avatar

    Updates on vowels

    Thank you for your congratulations.

    I’ve looked at the vowels once again and I must conclude that I still don’t know much about them. All I can show you is my vowel chart: under each glyph I’ve put the words where it appears, in ordinary spelling, with the corresponding letters underlined. If a character appears in the Document both in short and long forms (previously I called them “strict” and “extended”), I’ve put them both. I believe they are variants of the same character, but I must confess that all my previous theories about them proved unsatisfactory. At the moment, I still don’t know when to use a short or a long form.

    As you can test pronouncing those words yourself, vowel orthography is not strictly phonetic: let’s say that Danny made some concessions to ordinary spelling. In any case, at least six RP vowels don’t appear in the Document and I have no idea how to represent them: /ɑː/, /ʌ/, /ɔɪ/, /eə/, /ʊ/ and /ʊə/ (assuming that Danny did not distinguish between /i/ and /iː/, /u/ and /uː/. Otherwise /iː/ and /u/ are also missing).

    There are many uncertainties, I’ve already mentioned some of them. Here, I’ll say only that perhaps, the /ɜː/ glyph in “personage” is only the extended form of the /e/ glyph in “citizen”. A hint that there could be some phonetic meaning to short and long forms?

    @Joel: I hope the chart I posted helps you with your question about the vowels in “May the”. It was Danny’s game. While the structure of the consonant glyphs was immediately transparent to me, the same is not true for the vowels.

    What is missing now is a consonant chart. I’ll leave it for my next post.

    @Stilgherrian: I hope Danny answers your mail soon. I hope he can remember some of the missing bits. If somebody asked me about the secret scripts and languages I’ve invented in my youth, well, there were so many of them I’d be embarrassed… but something I do remember.

    Cheers,
    Dario

    Reply

  94. Stilgherrian’s avatar

    No, I’m not ignoring all your comments. I’ve been busy. I should get to this stuff tomorrow.

    Reply

  95. Alex Holsgrove’s avatar

    I’m curious to know of the prize if Dario or Stilgherrian would kindly share?

    Reply

  96. Stilgherrian’s avatar

    @Alex Holsgrove and @Dario: This is where I’m embarrassed to say that it completely slipped my mind. My excuse? Dario’s win happened at a time when I was busy, stressed and a tad short of cash. Well, I’ll fix that within the next 48 hours. Stand by!

    Reply
