I’ve been working on the tag cloud page, and one of my attempts to clarify things has revealed a disturbing fact.
I figured that the “category cloud” on the left-hand side of the website already showed that the biggest categories were politics, the Internet, human nature, media and business. I didn’t want the tag cloud to repeat that information, so I decided to remove all the tags that were also the names of categories.
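For the curious, here is roughly what that filtering step amounts to. This is a minimal sketch, not how my blog software actually does it: the function name filter_category_tags is made up, the category names are the five above, the 91 for “john howard” and the 42 for my boyfriend come from this post, and the “boyfriend” tag name and every other count are invented purely for illustration.

```python
def filter_category_tags(tag_counts: dict[str, int], categories: list[str]) -> list[tuple[str, int]]:
    """Hypothetical helper: drop tags that duplicate a category name, then rank the rest."""
    # Compare case-insensitively, so the category "Politics" also knocks out the tag "politics".
    category_names = {c.casefold() for c in categories}
    kept = [(tag, n) for tag, n in tag_counts.items() if tag.casefold() not in category_names]
    # Biggest count first; ties broken alphabetically so the order is stable.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


categories = ["Politics", "Internet", "Human nature", "Media", "Business"]
# Invented counts, except "john howard" (91) and the boyfriend's 42 from this post.
tag_counts = {
    "politics": 300,
    "internet": 150,
    "human nature": 120,
    "john howard": 91,
    "boyfriend": 42,  # stand-in name for his real tag
}
print(filter_category_tags(tag_counts, categories))
```

Once the category-named tags are gone, only “john howard” and the boyfriend are left, in that order.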
Boy, that certainly changed the emphasis!
Even in the reduced screenshot (right), one name dominates. Yes, out of 944 posts, counting this one, 91 are tagged “john howard”.
My own boyfriend comes in a poor second with just 42.
Is that right?
I’ve tried to placate ’Pong. I’ve said that at least he doesn’t frustrate me to the point of inspiring lengthy rants about the destruction of social values and the end of the Enlightenment. I’ve never suggested that he be tried as a war criminal — though after that night-time canal boat ride in Bangkok I may reconsider that.
(Actually, I haven’t told you about that canal boat properly yet. It’s another Unreliable Bangkok piece waiting to be written. I’ve been back in Sydney for two months now. It’s not too late, is it?)
However, this does raise an interesting point about how tags work…
In a traditional information system, you’d plan your keywords in advance. You’d invent a taxonomy (that is, a formal classification system), and then you’d develop a controlled vocabulary (that is, a set of authorised keywords). For example, you’d decide that it’s the “construction” industry, not “building”. Everyone would work off that controlled vocabulary.
Save confusion, y’see.
Everything filed into neat little pigeon-holes.
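To make the old-school approach a little more concrete, here’s a minimal sketch of a controlled vocabulary with a “see” reference that redirects “building” to the authorised term “construction”. The names PREFERRED_TERMS, SEE_REFERENCES, normalise_tag and count_tags are all hypothetical, and a real thesaurus would be far bigger and would also record broader and related terms.

```python
from collections import Counter

# Hypothetical controlled vocabulary: the only keywords anyone is allowed to file under.
PREFERRED_TERMS = {"construction", "politics", "media"}

# Hypothetical "see" references: a non-preferred term points to the authorised one.
SEE_REFERENCES = {"building": "construction"}


def normalise_tag(tag: str) -> str:
    """Map a free-form keyword onto the controlled vocabulary, or reject it."""
    term = tag.strip().casefold()
    term = SEE_REFERENCES.get(term, term)  # follow the "see" reference, if there is one
    if term not in PREFERRED_TERMS:
        raise ValueError(f"{tag!r} is not an authorised keyword")
    return term


def count_tags(tags: list[str]) -> Counter:
    """Count keywords after normalising them, so synonyms share a single tally."""
    return Counter(normalise_tag(t) for t in tags)


print(count_tags(["Building", "construction", "media"]))
```

Because “Building” gets redirected, both posts land in the one “construction” pigeon-hole. Without the “see” reference, as in a folksonomy, the two keywords would be counted separately and each would look half as popular.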
However, in the Brave New World of the Social Internet, no-one bothers with all that. Everyone makes it up as they go along, throws it all into the ether, and with luck it’ll all sort itself out. Or Google will do it for us.
Instead of a taxonomy, you have a folksonomy.
This development mirrors many, many aspects of the post-Industrial Age. In the Industrial Age everything was centrally planned, like the Soviet Economy — one of history’s great success stories, no? Now, everyone just works at it as best they can, and problems are ironed out through group consensus — or just ignored because no-one’s interested.
And by golly gosh, it actually seems to work.
A 2005 study by Nature (which is behind its paywall, so we’ll link to C|Net’s report too) found that the centrally planned, professionally edited Encyclopaedia Britannica is only marginally more accurate in key areas than Wikipedia.
OK, Encyclopaedia Britannica disputed the study, and then Nature bit back. But the core point is that the Wikipedia approach generates a product which is just fine for everyday purposes, and it does so a lot faster, with a relatively small trade-off in accuracy.
So, to get back to my main point… assuming this actually has a point…
Mostly I write about politics. A very broad range of politics. Through my ad hoc assignment of tags to blog posts, I’ve shown that John Winston Howard dominated my political writing. I suspect that everyone else’s was much the same.
Yes, 2007 really was all about JWH, just not in the way he wanted. And now, Sir, can you please bugger off out of my website? Ta.
The problem with controlled vocabularies is always that what you call a spade I call a shovel. That was solved in the olden days to some extent by ‘see’ references, which meant that when I looked up spade, I’d get a nice little note to ‘see shovel’, or perhaps a reference to a broader or related term, ‘see also entrenching tools’. Interestingly, Google doesn’t do that, so we have to be cluey enough to come up with the synonyms on our own, and we seem to be managing quite nicely.
Tagging your own input without recourse to a thesaurus of preferred terms, or the ability to use ‘see’ references, means you’re likely to call the same thing by different names some of the time. You might get, say, 45 hits for ‘spade’ and 44 for ‘shovel’, so each will feature lower on the list than ‘John Howard’, who is always ‘John Howard’, unless of course you also use ‘lying rodent’ as a tag.
@Quatrefoil: Yes, controlled vocabularies only work when everyone is properly trained in their use. That is, when information was all produced and distributed by “professionals”. With the production tools in the hands of “everyone”, Everything Changes™.
Google does do synonym matching, but invisibly. You can turn it off if you want. However Google searches the full content, not just the keywords. Will the foreigners know who’s really saying “Sorry”?
“John Howard” is always “John Howard”, unless he’s the other “John Howard”, the actor. Does “opera” refer to a staged musical production, or a certain web browser?
This has always been, and always will be, a hard problem.