Jun 28 2007

The problem with people is that they don’t have unique names. To take an extreme case, Yahoo reported a couple of weeks ago that the Chinese authorities were considering a move to try to end the confusion caused by the fact that more than a billion people are now sharing just 100 surnames, and 93 million have the family name Wang.

More pertinently for publishing, author disambiguation has long been a problem when searching bibliographic databases such as PubMed/Medline. A lot of work is being done on automatic ways to disambiguate author names, using clues such as affiliations, email addresses, subjects and co-authorships.
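To make the idea concrete, here is a minimal sketch of how those clues might be combined into a single score for two records that carry the same name. It is not taken from any of the actual disambiguation projects: the `AuthorRecord` structure, the `same_author_score` function and the weights are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorRecord:
    """One author mention on one paper (hypothetical structure)."""
    name: str
    email: str = ""
    affiliation: str = ""
    coauthors: frozenset = field(default_factory=frozenset)
    subjects: frozenset = field(default_factory=frozenset)


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint or empty) to 1.0 (identical)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_author_score(x, y):
    """Rough 0..1 score that two same-name records are one person.

    The weights below are illustrative guesses, not tuned values.
    """
    if x.name.lower() != y.name.lower():
        return 0.0
    if x.email and x.email.lower() == y.email.lower():
        return 1.0  # assumption: a shared email address is decisive
    score = 0.0
    if x.affiliation and x.affiliation.lower() == y.affiliation.lower():
        score += 0.4
    score += 0.4 * jaccard(x.coauthors, y.coauthors)
    score += 0.2 * jaccard(x.subjects, y.subjects)
    return score
```

Two "J Wang" records with different affiliations, no shared co-authors and unrelated subjects would score zero here, while a shared email address alone would score 1.0. A real system would need fuzzier matching of names and affiliations than the exact comparison used in this sketch.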

However, a more “Web 2.0” way to do this has been suggested in the WikiAuthors proposal. The idea here is that a copy of the database (in this case, Medline initially) would be placed on a wiki, and the authors and their colleagues – that is, the scientific community at large – would do the necessary work.

At present the WikiAuthors proposal appears stalled, pending the development of other WikiMedia projects (e.g. WikiProteins).

I was struck by some similarities with Spock, the current hot new search engine. Spock (currently in private beta, but there’s a good overview on Read/WriteWeb) focuses on people search, that is, it treats all search terms as a request to find matching people. Thus searching for “President of the United States” returns George W Bush and Bill Clinton as its first two hits.

The similarity with the WikiAuthors proposal is that Spock will allow users to add tags (in addition to automatically generated tags).

Spock will, however, be much more semantically rich than is proposed in WikiAuthors. Tags will include name, gender, age, occupation and location, among others. The really interesting bit comes from the “relationship” tag, which will link people together. Thus Spock can offer links to related people – in the case of Bill Clinton, this might be Chelsea Clinton (daughter), Bush (successor), Hillary (wife), Gore (VP). This will be a powerful tool if it works as promised.

Looking back the other way, I wondered if it would be useful to tag relationships between authors in a bibliographic database, for instance co-authors, co-workers, student/supervisor, etc. This could give a whole new way of exploring links in the literature, beyond the current approach of following citations.
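As a rough illustration of what such tagging might look like underneath, here is a small sketch of a relationship store. Everything in it is hypothetical (the `AuthorGraph` class, its `tag` and `related` methods, the relation labels); it is not how Spock or any bibliographic database actually works, just one simple way the idea could be modelled.

```python
from collections import defaultdict


class AuthorGraph:
    """Hypothetical store of relationship tags between authors."""

    def __init__(self):
        # author name -> set of (other author, relation label) pairs
        self._edges = defaultdict(set)

    def tag(self, a, b, relation, inverse=None):
        """Tag a as `relation` to b; b gets `inverse`, or the same label."""
        self._edges[a].add((b, relation))
        self._edges[b].add((a, inverse if inverse is not None else relation))

    def related(self, author, relation=None):
        """Names linked to `author`, optionally filtered by relation label."""
        links = self._edges.get(author, ())
        return sorted({other for other, rel in links
                       if relation is None or rel == relation})


graph = AuthorGraph()
graph.tag("A. Author", "B. Author", "co-author")
graph.tag("A. Author", "C. Author", "supervisor of", inverse="student of")
print(graph.related("C. Author", "student of"))
```

Symmetric relationships such as co-authorship use one label in both directions, while student/supervisor needs a different label on each side, which is what the `inverse` argument is for.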
