Tuesday, August 16, 2011

The arXiv is 20

Twenty years ago, a particle physicist named Paul Ginsparg got tired of the way pre-publication research was circulated. After a paper was written, the author (or, in the case of fortunate authors, a secretary) made many copies of it, typed up a circulation list, and sent copies to those on the list. In the process, some copies went to people who had little interest in the subject, while many more who would have been interested, but were not on the list, found out about the paper only when it was published. That could take several months, a serious handicap for those working in fast-developing fields. The particle physicists, always the quickest on their feet, had a partial solution: they sent the first copy to the SLAC (Stanford Linear Accelerator Center) list, which was widely circulated, and hoped the paper would catch interest, resulting in many preprint request cards in their mailbox (not the electronic kind, but the little pigeonhole in their department office!). Physicists in all other fields yawned and didn't bother. Heaven knows what people in other disciplines did!

Then, 20 years ago, in August 1991, things changed. Ginsparg, who was then working at the Los Alamos National Lab, decided to harness the internet, itself the major technological revolution of the last decade of the 20th century, for this purpose. What could be neater and more efficient than uploading your paper to a central archive, neatly catalogued and searchable by area, title, author names and keywords, from which anyone with internet access could download it? An idea this good had to be a thumping success. Submissions to the archive snowballed from the initial 400 in the first six months to 75,000 a year in 2011. Over the same period, the number of distinct users accessing the archive grew to 400,000 a week, downloading an astounding 1 million articles per week. The subject areas multiplied from a cosy community of particle physicists to all areas of theoretical physics, and then across disciplines to include mathematicians, biologists and computer scientists, admittedly those with a physics bias. Fields like medicine started their own archive with the help of publishers, and called it PubMed. The surprise lies not in the number of people who use the archive, but in the fact that a fraction of the scientific population appears to manage without it, even now.

However, the most important thing about the archive was the way it levelled the playing field, at least for those working in theoretical areas. With one internet connection, no place was a backwater any more. The dependence on exorbitantly priced journals was, if not eliminated, greatly reduced. Though submissions to the archive are unrefereed, their publication status and versions are updated after they appear in regular journals, for ease of reference. Papers can be submitted to journals directly from the archive. Mirror sites of the archive improve efficiency and download speeds. Just as the archive was the result of Paul Ginsparg's individual initiative, much of the effort in setting up this amazing framework has come from the tireless work of individual scientists. This is a good place to acknowledge the unstinting efforts of Kapil Paranjape in setting up the Indian mirror site of the archive at the Institute of Mathematical Sciences, Chennai.

Finally, how will the archive evolve from here? It is really difficult to say. Ginsparg has said that better quality control may enable the archive to evolve from a mere repository of information into a powerful, self-maintained knowledge structure. To see if this works, stay tuned until the archive celebrates its silver jubilee in five years.

Tailpiece: Two tales from the late eighties.

A physicist named Joanne Cohn, an early pioneer in the field of matrix models, initially had a personal list of friends to whom she would mail all the preprints she received. Soon her reputation grew, and even people who did not know her would send her their preprints, hoping that she would circulate them to her friends. By the time Ginsparg took over, she had more than a hundred email addresses to which she would forward the preprints. This was not a small number then, so this was true public service!

There was also the time someone from industry came to talk to the physicists at Santa Barbara and suggested that scientists should charge something for papers put up for public consumption (intellectual property rights!). This provoked much merriment. Someone joked that most scientists would pay to have their papers read!

This blog post is by Neelima Gupte and Sumathi Rao.

Anant said...

Dear Neelima, Sumathi,

I could not get myself to comment on AIP for some time, but here is a start. It may not be inappropriate to remind readers of Rahul's observations on the arXiv.

Regards, Anant

Neelima said...

Thanks for the reminder, Anant, and welcome back to As I Please.