Sunday, August 19, 2007
Moving to hoakz.com
Tuesday, April 3, 2007
Using UUIDs to Prevent Broken Links
I don't know if someone has proposed this before, but how about using UUIDs to prevent broken links?
A UUID (Universally Unique Identifier) is a 128-bit number, usually written as 32 hexadecimal digits divided into five hyphen-separated groups. A UUID has the special quality that it is, for all practical purposes, universally unique. This means two people on opposite sides of the world could each create a UUID at the exact same time and still be practically certain their UUIDs are not identical. In fact, they can create a large number of UUIDs and still be confident that none of them collide. (The same goes for two people on the same server.)
This quality makes UUIDs a perfect tool for assigning unique IDs to web pages or other Internet resources (in fact, to any resource of any kind: your dog, the cutlery in your drawers, you name it).
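A minimal sketch using Python's standard-library `uuid` module (Python is an assumption here; the post itself names no language). It shows the five-group layout and that UUIDs can be created independently, with no coordination between the people creating them. Strictly speaking, random UUIDs are unique with overwhelming probability rather than by guarantee.

```python
import uuid

# uuid4() creates a random (version 4) UUID; only 6 of its 128 bits are fixed.
page_id = uuid.uuid4()

text = str(page_id)        # canonical form: 32 hex digits in five hyphen-separated groups
groups = text.split("-")

print(text)                        # a different value on every run
print([len(g) for g in groups])    # [8, 4, 4, 4, 12]
print(page_id.version)             # 4

# Two UUIDs created independently, with no coordination between the creators.
print(uuid.uuid4() != uuid.uuid4())  # True in practice: a collision is astronomically unlikely
```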
Here is how it could be done:
Step 1: Place the UUID on the page
First, a UUID has to be put on the web page, either in a meta tag or as plain text on the page.
With a meta tag, it could look like this:
<meta name="UUID" content="8523813a-7c47-4cd9-ad78-09c14dfb505f"/>
Or as plain text on the page:
UUID: 8523813a-7c47-4cd9-ad78-09c14dfb505f
Step 2: Find the UUID
The second step is to make sure that every time a program stores a URL to the page (when creating a bookmark, linking from one site to another, etc.), it also stores the UUID.
So, if the page gets lost, whether because the link has changed, the page has been moved, or something similar, the browser (or site) can use the UUID to find the page again.
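A sketch of what "store the UUID along with the URL" could mean for a bookmarking program, assuming the page carries its UUID in one of the two forms from Step 1. It uses Python's standard `html.parser` to read the meta tag and falls back to the plain-text form. The names `Bookmark`, `extract_page_uuid`, and `make_bookmark`, and the example.com URL, are hypothetical.

```python
import re
import uuid
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional

# Matches the plain-text form "UUID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
PLAIN_TEXT_UUID = re.compile(
    r"UUID:\s*([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})"
)


class MetaUUIDParser(HTMLParser):
    """Remembers the content of the first <meta name="UUID" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags like <meta .../> (via handle_startendtag).
        if tag == "meta" and self.found is None:
            a = dict(attrs)
            if (a.get("name") or "").lower() == "uuid" and a.get("content"):
                self.found = a["content"]


def extract_page_uuid(html: str) -> Optional[str]:
    """Return the page's UUID (meta tag first, then plain text), or None."""
    parser = MetaUUIDParser()
    parser.feed(html)
    candidate = parser.found
    if candidate is None:
        m = PLAIN_TEXT_UUID.search(html)
        candidate = m.group(1) if m else None
    if candidate is None:
        return None
    try:
        return str(uuid.UUID(candidate))  # normalise to the lower-case canonical form
    except ValueError:
        return None  # the page contained a malformed value


@dataclass
class Bookmark:
    url: str
    page_uuid: Optional[str]  # stored alongside the URL so the page can be found again


def make_bookmark(url: str, html: str) -> Bookmark:
    return Bookmark(url=url, page_uuid=extract_page_uuid(html))


page = '<html><head><meta name="UUID" content="8523813a-7c47-4cd9-ad78-09c14dfb505f"/></head></html>'
bm = make_bookmark("http://example.com/old-location.html", page)
print(bm.page_uuid)  # 8523813a-7c47-4cd9-ad78-09c14dfb505f
```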
The second step obviously requires a search engine (or some other central registry) that includes UUIDs in its index, since the system needs some kind of central service to keep track of which page link each UUID currently maps to.
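The central registry can be pictured as a mapping from UUID to current URL. This is only an in-memory sketch under that assumption; a real registry would be a shared, persistent service such as a search engine's index, and the class name `UUIDRegistry` and the example URLs are hypothetical.

```python
import uuid
from typing import Dict, Optional


class UUIDRegistry:
    """Hypothetical in-memory stand-in for a central UUID-to-URL registry."""

    def __init__(self):
        self._locations: Dict[str, str] = {}

    def register(self, page_uuid: str, url: str) -> None:
        # Registering the same UUID again overwrites the old URL: that is how a move is recorded.
        self._locations[str(uuid.UUID(page_uuid))] = url

    def lookup(self, page_uuid: str) -> Optional[str]:
        # Normalising the key makes upper- and lower-case spellings of a UUID equivalent.
        return self._locations.get(str(uuid.UUID(page_uuid)))


registry = UUIDRegistry()
pid = "8523813a-7c47-4cd9-ad78-09c14dfb505f"
registry.register(pid, "http://example.com/2007/04/uuid-links.html")
# Later the page moves, and whoever moves it (or a crawler that re-finds it) updates the entry.
registry.register(pid, "http://example.org/articles/uuid-links.html")
print(registry.lookup(pid))  # http://example.org/articles/uuid-links.html
```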
A UUID is not much use as a URL (a locator), since even UUIDs generated on the same host just a few seconds apart are completely different from each other. (This actually depends on the implementation: time-based version 1 UUIDs, for example, embed the host's hardware address. Still, one should not assume that UUIDs from the same host share any similarities.)
This, however, is also one of the strengths of UUIDs, since it means an Internet resource should be possible to locate regardless of its physical location (as opposed to ordinary HTTP URLs, which are tightly bound to their location: they start with the server name).
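To illustrate the remark about implementations, here is a sketch using Python's `uuid` module: time-based version 1 UUIDs from the same host share their node field (normally the host's hardware address, i.e. the last group), while random version 4 UUIDs share no structure. The node is passed explicitly so the shared field is guaranteed; by default `uuid1()` looks up the host's address itself.

```python
import uuid

host = uuid.getnode()  # 48-bit hardware (MAC) address, or a random stand-in if none is found

# Version 1 layout: 60-bit timestamp + 14-bit clock sequence + 48-bit node.
a = uuid.uuid1(node=host)
b = uuid.uuid1(node=host)
print(a)
print(b)
print(a.node == b.node == host)   # True: the last group of both UUIDs is identical
print(a.time < b.time)            # True: the timestamp field increases

# Version 4: random apart from 6 fixed bits, so consecutive values look unrelated.
c = uuid.uuid4()
d = uuid.uuid4()
print(c)
print(d)
```

Because version 1 UUIDs can reveal the generating machine's address, random version 4 UUIDs are the usual choice when, as here, nothing about the origin should be inferred from the identifier.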
Since a UUID is by definition universally unique, you can generate one wherever you are, use it in a page, be practically certain there are no duplicates, and later locate that exact page again by its UUID.
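Generating the ID locally and producing the two forms shown in Step 1 takes only a few lines. This sketch assumes Python; the helper name `page_id_markup` is hypothetical.

```python
import uuid
from typing import Optional, Tuple


def page_id_markup(page_uuid: Optional[str] = None) -> Tuple[str, str]:
    """Return (meta tag, plain-text line) for a page; generates a new UUID if none is given."""
    u = str(uuid.UUID(page_uuid)) if page_uuid else str(uuid.uuid4())  # no coordination needed
    meta_tag = '<meta name="UUID" content="%s"/>' % u
    plain_text = "UUID: " + u
    return meta_tag, plain_text


meta, line = page_id_markup()
print(meta)   # goes in the page's <head>
print(line)   # goes in the visible text
```

Passing an existing UUID (for example when a page is republished elsewhere) keeps the same identifier, which is the point of the scheme: the ID must follow the content, not the location.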
A Google experiment
I've placed the text "UUID: 8523813a-7c47-4cd9-ad78-09c14dfb505f" on this page (several times now). As far as I can tell, Google indexes even arbitrary strings such as UUIDs (the exact string "8523813a-7c47-4cd9-ad78-09c14dfb505f", to be precise). Check out this page with a discussion on how to use UUIDs to make pages unique; it has nothing to do with this discussion, but it is an interesting example of how UUIDs could be used with Google.
By searching for "8523813a-7c47-4cd9-ad78-09c14dfb505f", it should be possible to locate this page (see if it works, but give Google time to index the page first). Update: the above link seems not to work, but this one (searching for the UUID with each "-" replaced by a space, or "+") does.
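The workaround in the update can be expressed as a small URL builder. This is a sketch that assumes Google's `https://www.google.com/search?q=...` URL format; search-engine behaviour may well differ today, and the helper name `uuid_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def uuid_search_url(page_uuid: str, hyphens_as_spaces: bool = True) -> str:
    """Build a web-search URL for a UUID.

    With hyphens_as_spaces=True the five groups are searched as separate words,
    mirroring the workaround described in the update."""
    query = page_uuid.replace("-", " ") if hyphens_as_spaces else page_uuid
    # urlencode() quotes values with quote_plus, so spaces become "+" and "-" is left as-is.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(uuid_search_url("8523813a-7c47-4cd9-ad78-09c14dfb505f"))
# https://www.google.com/search?q=8523813a+7c47+4cd9+ad78+09c14dfb505f
```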
Finally
Locating the page should work regardless of its position, site, or anything else. In fact, as long as the UUID is still there, it should even be possible to place this text in a Word, OpenDocument, PDF, or any other document format Google can index, and still find the text with nothing but the UUID.
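On the indexing side, finding UUIDs does not depend on the document format once its text has been extracted. A sketch, assuming the plain text of each document is already available (converting Word/OpenDocument/PDF files to text is outside its scope); `index_documents` and the locations are hypothetical.

```python
import re
import uuid
from collections import defaultdict
from typing import Dict, Set

UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.IGNORECASE
)


def index_documents(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each UUID found in the documents' text to the locations that contain it.

    `docs` maps a location (URL or file path) to the document's extracted plain text."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for location, text in docs.items():
        for match in UUID_RE.findall(text):
            index[str(uuid.UUID(match))].add(location)  # canonical lower-case key
    return index


docs = {
    "http://example.com/post.html": "Some text. UUID: 8523813a-7c47-4cd9-ad78-09c14dfb505f",
    "file:///home/me/copy.pdf": "Same article, as PDF. UUID: 8523813A-7C47-4CD9-AD78-09C14DFB505F",
}
index = index_documents(docs)
print(sorted(index["8523813a-7c47-4cd9-ad78-09c14dfb505f"]))
# ['file:///home/me/copy.pdf', 'http://example.com/post.html']
```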
Ideally, the end result of this technology would be that there is no visible "search engine in between": whenever a link is lost, the client goes to the central repository or search engine (or some other place), locates the page, and then follows the new link automatically. It should even be smart enough to retry until it finds a link that works.
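The automatic fallback and retry behaviour could be sketched as follows. The two hooks are assumptions: `is_alive` might be an HTTP request in a real browser, and `candidates_for` a query to the central registry or search engine; here they are plain functions so the sketch runs offline. The name `resolve` and all URLs are hypothetical.

```python
import time
from typing import Callable, Iterable, Optional


def resolve(url: str, page_uuid: str,
            is_alive: Callable[[str], bool],
            candidates_for: Callable[[str], Iterable[str]],
            attempts: int = 3, delay: float = 1.0) -> Optional[str]:
    """Return a working URL for the page, falling back to a UUID lookup when the stored link is dead."""
    if is_alive(url):
        return url  # the stored link still works
    for attempt in range(attempts):
        for candidate in candidates_for(page_uuid):
            if candidate != url and is_alive(candidate):
                return candidate  # the caller can update its bookmark to this URL
        if attempt < attempts - 1:
            time.sleep(delay)  # give the registry time to catch up, then ask again
    return None  # nothing found after all attempts


live = {"http://example.org/articles/uuid-links.html"}
registry = {"8523813a-7c47-4cd9-ad78-09c14dfb505f": [
    "http://example.net/mirror/gone.html",          # stale entry
    "http://example.org/articles/uuid-links.html",  # current location
]}
found = resolve("http://example.com/2007/04/uuid-links.html",
                "8523813a-7c47-4cd9-ad78-09c14dfb505f",
                is_alive=lambda u: u in live,
                candidates_for=lambda k: registry.get(k, []),
                delay=0)
print(found)  # http://example.org/articles/uuid-links.html
```

Trying every candidate before sleeping means one stale registry entry does not block the lookup; the outer loop only matters when the registry has not yet learned the new location.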